EmulationStation freezes after game-select event triggered and video snap plays
-
I've updated the steps after re-testing from scratch. The steps to reproduce no longer include updating the system, RetroPie-Setup, or all of the packages -- just emulationstation.
-
Interesting discovery. When ES becomes unresponsive, I'm actually finding that if I manually kill the omx player, then ES becomes responsive again. It's as if the input event queue (or rendering?) is blocked.
So the steps here are:
- Follow prior steps to get ES unresponsive
- Press down a few times (as if you were expecting to be able to switch games)
- Find the omx player process:
    $ ps aux | grep letterbox
    pi 1888 3.2 0.7 196248 27988 tty1 SLl+ 04:02 0:03 --layer 10010 --loop --no-osd --aspect-mode letterbox --vol -210 -o both --win 326,81,632,258 --orientation 0 /home/pi/.emulationstation/downloaded_media/arcade/videos/88games.mp4
- Kill the process:
    kill 1888
- Notice that the selected game in the gamelist view has actually changed -- the prior inputs have since been processed by ES. You need to be quick to start switching games, otherwise ES will become unresponsive again once the videos start playing.
When I kill the omx player process, I receive the following log message in es_log.txt:
    Apr 24 04:17:39 lvl1: /home/pi/.emulationstation/scripts/game-select/test.sh "arcade" "/home/pi/RetroPie/roms/arcade/10yard.zip" "10-yard Fight" "input" failed with exit code != 0. Terminating processing for this event.
-
My current interpretation of what's happening is that ES kicks off the game-select script but, somehow, the result of that command is blocked on the omx player process. I'm not sure if this is a strange interaction between the use of system (for the script) and fork / execve (for omx) calls within ES.
Perhaps there's something not thread-safe here?
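To make the two launch styles mentioned here concrete, this is a minimal sketch of each. It is not taken from the ES source; `runScriptBlocking` and `spawnChild` are hypothetical names used only for illustration.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Style 1 (the kind of call used for event scripts): std::system() starts a
// shell, runs the command and blocks until that shell has exited.
// Returns the command's exit code, or -1 if it could not be determined.
int runScriptBlocking(const char* command)
{
    const int status = std::system(command);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

// Style 2 (the kind of call used for a long-running player): fork() plus
// exec*() starts the child and returns immediately. Nothing waits here, so
// the caller has to reap the child later, e.g. from a SIGCHLD handler.
pid_t spawnChild(const char* program, char* const argv[])
{
    const pid_t pid = fork();
    if (pid == 0) {
        execvp(program, argv);
        _exit(127);   // only reached if exec failed
    }
    return pid;       // child's pid in the parent, or -1 if fork failed
}
```

Both styles create child processes of the same parent, so a SIGCHLD handler installed for one of them also gets to see the exit of the other.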
-
I'm not sure if this is a strange interaction between the use of system (for the script) and fork / execve (for omx) calls within ES.
Something like this. Using omxplayer invokes an external program, just like the scripting system. I'd expect the script to trigger before the video starts, but there might be some race condition somewhere.
I still can't reproduce the freeze, though when switching between games I've noticed a few messages like:
    ... June 24 06:01:44 lvl1: /home/pi/.emulationstation/scripts/game-select/nop.sh "snes" "/home/pi/RetroPie/roms/snes/TestGame.zip" "TestGame" "input" failed with exit code != 0. Terminating processing for this event.
I'll take a look at the code later on to see if something pops up. Can you also try the latest EmulationStation version - installed with the emulationstation-dev package?
-
@mitu Thanks for helping take a look
I installed emulationstation-dev and am able to reproduce the same.
    $ sudo $HOME/RetroPie-Setup/retropie_packages.sh emulationstation-dev _binary_
    $ emulationstation --help
    EmulationStation, a graphical front-end for ROM browsing. Written by Alec "Aloshi" Lofquist.
    Version 2.11.0rp-dev, built Feb 13 2022 - 16:52:03
-
I've been testing with the dev version, but I still can't reproduce the freeze/crash. Maybe it's video related? Can you try to remove the game-select event script and see if it still happens?
-
I removed the game-select event and the freeze no longer occurs. Here's the output of ffprobe for an example video:
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/home/pi/.emulationstation/downloaded_media/arcade/videos/88games.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 512
        compatible_brands: isomiso2avc1mp41
        title           : 88games
        artist          : EmuMovies
        album_artist    : Circo
        album           : MAME Video Snaps
        date            : 2015
        encoder         : Lavf57.56.101
        comment         : http://emumovies.com
        genre           : Video Snaps
      Duration: 00:00:30.05, start: 0.000000, bitrate: 477 kb/s
        Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480 [SAR 1:1 DAR 4:3], 406 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
        Metadata:
          handler_name    : VideoHandler
        Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 24000 Hz, stereo, fltp, 64 kb/s (default)
        Metadata:
          handler_name    : SoundHandler
I also tried increasing the video delay in the theme from 3 seconds to 5 seconds, but still encountered the issue.
I can see if I can add some additional debugging into a custom build of ES to see if I can gather more information. Understanding the cause of the script sometimes failing with a non-zero exit code might also lead to something insightful.
-
After a little debugging, it looks like this is a race condition between the system() call and the SIGCHLD signal handler.
The SIGCHLD signal handler is preventing system() calls from executing properly. I believe this thread accurately describes the issue I'm seeing: https://stackoverflow.com/questions/17550217/linux-system-sigchld-handling-multithreading
I commented out the line in ES that adds the SIGCHLD signal handler for the omx player and I no longer experience the freeze. Of course, I have zombie child processes that need to get reaped -- but this at least confirmed the theory.
-
Nice find!
It does seem to be a race, perhaps the SIGCHLDs are queued and the system's signal doesn't get delivered until omxplayer is finished playing (forcibly) - that would explain why you get all the queued debug logging when omxplayer is stopped.
Looks like using system for the scripting events is not safe, perhaps a similar approach via fork needs to be implemented.
-
👍 I experimented with a few potential solutions. I'll take another look at the fork approach for the scripting events. I originally tried that early on and didn't have success, though I may not have been doing it correctly. I'll put together a branch for that later today.
I believe I have an alternative working solution here: https://github.com/RetroPie/EmulationStation/compare/master...obrie:game-select-omx-freeze
That solution changes the signal handler to have the video component only reap its own pids. Not sure how well that follows best practices, though.
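For readers who don't want to open the branch, the "only reap its own pids" idea could look roughly like the sketch below. This is not the code from the linked branch; `g_playerPid`, `reapPlayerOnly`, `installPlayerReaper` and `spawnTrackedPlayer` are hypothetical names, and the sketch assumes a pid_t fits into a sig_atomic_t (true on Linux).

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical bookkeeping: pid of the video player child, or -1 if none.
static volatile sig_atomic_t g_playerPid = -1;

// SIGCHLD handler that only ever asks about the player's pid. WNOHANG keeps
// it from blocking, and children started by system() are left for system().
extern "C" void reapPlayerOnly(int /*signum*/)
{
    const int savedErrno = errno;   // waitpid may overwrite errno
    const pid_t pid = g_playerPid;
    int status;
    if (pid > 0 && waitpid(pid, &status, WNOHANG) == pid)
        g_playerPid = -1;
    errno = savedErrno;
}

bool installPlayerReaper()
{
    struct sigaction sa {};
    sa.sa_handler = reapPlayerOnly;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}

// Starts the player and records its pid. SIGCHLD is blocked around fork()
// so the handler cannot run before the pid has been stored.
pid_t spawnTrackedPlayer(const char* program, char* const argv[])
{
    sigset_t chld, old;
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old);

    const pid_t pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &old, nullptr);  // the mask is inherited, so undo it
        execvp(program, argv);
        _exit(127);                               // only reached if exec failed
    }
    if (pid > 0)
        g_playerPid = pid;
    sigprocmask(SIG_SETMASK, &old, nullptr);
    return pid;
}
```

The trade-off is that the handler now depends on shared state (the stored pid), which is probably what makes this feel less clean than a generic reaping loop.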
-
I've come up with a better, simpler solution: https://github.com/RetroPie/EmulationStation/compare/master...obrie:game-select-omx-freeze-alt
The problem is that the SIGCHLD handler is not taking into account that the handler may be called only once when there are multiple child processes to be reaped. In this scenario, the following happens:
- The system gamelist view is rendered and the first game is selected
- After 3 seconds, the emulationstation process is forked, the SIGCHLD handler is registered, and the video starts playing
- When you attempt to navigate down to the next game, both a game-select event is triggered and the video is stopped
- Under the right conditions, these 2 events can happen effectively simultaneously, resulting in 2 child processes that need to be reaped -- the forked omx process and the system() call to the game-select script
- When this happens, a single call is made to the SIGCHLD handler
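The last step can be reproduced outside of ES. The sketch below is a standalone demonstration, not ES code: it holds SIGCHLD back while two children exit, then lets the single pending signal through to a handler that reaps at most one child per call. All names are hypothetical, and the handler uses WNOHANG only so the demo itself cannot hang.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t g_handlerCalls = 0;

// Handler that reaps at most ONE child each time it is called.
extern "C" void reapOneChild(int /*signum*/)
{
    int status;
    g_handlerCalls = g_handlerCalls + 1;
    waitpid(-1, &status, WNOHANG);
}

// Returns how many exited children the handler left behind, or -1 on error.
int countChildrenLeftBehind()
{
    struct sigaction sa {};
    sa.sa_handler = reapOneChild;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, nullptr) != 0)
        return -1;

    // Block SIGCHLD so both exits happen before the handler can run.
    sigset_t chld;
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, nullptr);

    pid_t kids[2];
    for (pid_t& kid : kids) {
        kid = fork();
        if (kid == 0)
            _exit(0);
        if (kid < 0)
            return -1;
    }
    // WNOWAIT: wait until each child has exited, but leave it reapable.
    for (pid_t kid : kids) {
        siginfo_t info;
        waitid(P_PID, static_cast<id_t>(kid), &info, WEXITED | WNOWAIT);
    }

    // Standard signals are not queued: two exits, one pending SIGCHLD.
    sigprocmask(SIG_UNBLOCK, &chld, nullptr);

    int left = 0, status;
    while (waitpid(-1, &status, WNOHANG) > 0)
        ++left;
    return left;
}
```

In a single-threaded process on Linux, calling `countChildrenLeftBehind()` should show the handler running once and one exited child still waiting to be reaped afterwards.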
The VideoPlayerComponent's SIGCHLD handler only ever expects there to be a single child reaped. In this case, however, there are multiple child processes to be reaped.
So the solution here is fairly straightforward: introduce a somewhat standard solution of reaping child processes until there are no more to be reaped.
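The "reap until there are no more" pattern is usually written as a non-blocking waitpid loop. Here is a minimal sketch of that pattern; it is not copied from the linked branch, and `reapAllChildren`, `installChildReaper` and the `g_reaped` counter are hypothetical names (the counter exists only so the effect can be observed).

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical counter, present only to make the handler's work visible.
static volatile sig_atomic_t g_reaped = 0;

// SIGCHLD handler that drains every exited child. One signal delivery may
// stand for several exits, so keep calling waitpid until nothing is left.
extern "C" void reapAllChildren(int /*signum*/)
{
    const int savedErrno = errno;   // waitpid may overwrite errno
    int status;
    // WNOHANG: return right away when children exist but none has exited,
    // instead of blocking inside the handler.
    while (waitpid(-1, &status, WNOHANG) > 0)
        g_reaped = g_reaped + 1;
    errno = savedErrno;
}

bool installChildReaper()
{
    struct sigaction sa {};
    sa.sa_handler = reapAllChildren;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}
```

The loop ends when waitpid returns 0 (children remain, none has exited) or -1 (no children at all), so the handler never sits waiting on a video that is still playing.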
After this change is made, the freezing issue goes away 🎉
@mitu Let me know if you think this is ready for a PR and I'll go ahead and submit one.
-
It looks ok and it's simpler than replacing system in the runSystemCommand function.
-