This happens to me when I route the audio from my Pi through my home theater receiver and then into the TV. You are spot on in calling it an 'audio-screensaver', since it seems to kick in whenever absolutely no audio signal is detected for a second or two.
It's a side effect of running audio through fancy electronics, I suppose. If the device "conditioning" your audio signal doesn't have a setting to disable this behavior, then you are stuck with it. There isn't really anything that ES, RetroPie, or RetroArch can do to fix this, since it happens inside the audio conditioning device itself.
On the plus side, this normally doesn't happen much while you are actually playing a game. It mostly shows up when going into or out of games from ES, at loading screens, etc.