@sad_muso you didn't say which emulator you're using. afaik even simple shaders are often very system bus intensive, so the GPU ends up constantly fighting the CPU and everything else for memory bandwidth. afair, lr-genesis-plus-gx really hammers the system bus, so if that's the one you're using, try lr-picodrive instead.
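fwiw, here's a rough back-of-envelope sketch of why a full-screen shader pass can eat bus bandwidth. it assumes 4 bytes per pixel, one texture read and one write per output pixel, and 60 fps. the function name and all the numbers are mine and purely illustrative, not measurements from a pi, and it ignores caching and tiling, so real traffic will differ.

```python
# Rough estimate of framebuffer traffic for one full-screen shader pass.
# All parameters are illustrative assumptions, not measured values.

def pass_bandwidth_bytes_per_sec(width, height, fps,
                                 bytes_per_pixel=4,
                                 reads_per_pixel=1,
                                 writes_per_pixel=1):
    """Bytes moved per second if every output pixel costs the given
    number of texture reads plus framebuffer writes."""
    pixels_per_frame = width * height
    accesses_per_pixel = reads_per_pixel + writes_per_pixel
    bytes_per_frame = pixels_per_frame * bytes_per_pixel * accesses_per_pixel
    return bytes_per_frame * fps


if __name__ == "__main__":
    # Compare two common output resolutions at 60 fps.
    for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
        bps = pass_bandwidth_bytes_per_sec(w, h, fps=60)
        print(f"{name}: about {bps / 1e6:.0f} MB/s")
```

shaders that sample several neighbouring pixels per output pixel push `reads_per_pixel` up, and dropping the output resolution cuts the total in proportion to the pixel count.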
Are they CPU intensive, RAM intensive, or both? Does the Pi 4 cope with shaders better than the 3B?
first part answered above. the pi 4 will cope better.
Do overlays cause any hit on performance, specifically when used in conjunction with shaders?
yes, for whatever reason they seem to.
What is the least expensive way currently available to get curvature and scanlines?
the way you're doing it already - z_pi_crt.