EmulationStation freezes after game-select event triggered and video snap plays
-
@mitu I tried this with a fresh install and was able to reproduce the issue.
The steps I followed:
- Create sd card from v4.8 rpi4_400 image
- Add arcade roms, scraped media, and gamelist to the system
- Boot up the system and configure a keyboard controller
- Exit EmulationStation
- Update emulationstation:
sudo $HOME/RetroPie-Setup/retropie_packages.sh emulationstation _binary_
- Create
~/.emulationstation/es_settings.cfg
:
<?xml version="1.0"?> <bool name="VideoOmxPlayer" value="true" />
- Create
~/.emulationstation/scripts/game-select/test.sh
:
#!/bin/sh exit 0
- Restart EmulationStation
- Open
arcade
system in ES - Wait for video to play for at least a few seconds (you may need to do this a few times)
- Attempt to do anything (scroll down / up) -- ES is now unresponsive
I can try to remove a few steps to see if there's 1 thing in particular that triggers the issueUpdated. Note that I tested this with just 2 games in the gamelist as well in order to remove that from the equation.Let me know if you have any other thoughts or things to try / logs to pull.
-
I've updated the steps after re-testing from scratch. The steps to reproduce no longer include updating the system, RetroPie-Setup, or all of the packages -- just emulationstation.
-
Interesting discovery. When ES becomes unresponsive, I'm actually finding that if I manually kill the omx player, then ES becomes responsive again. It's as if the input event queue (or rendering?) is blocked.
So the steps here are:
- Follow prior steps to get ES unresponsive
- Press down a few times (as if you were expecting to be able to switch games)
- Find the omx player process
$ ps aux | grep letterbox pi 1888 3.2 0.7 196248 27988 tty1 SLl+ 04:02 0:03 --layer 10010 --loop --no-osd --aspect-mode letterbox --vol -210 -o both --win 326,81,632,258 --orientation 0 /home/pi/.emulationstation/downloaded_media/arcade/videos/88games.mp4
- Kill the process
kill 1888
- Notice the that the selected game in the gamelist view has actually changed -- the prior inputs have since been processed by ES. You need to be quick to start switching games otherwise ES will become unresponsive again once the videos start playing.
When I kill the omx player process, I receive the following log message in
es_log.txt
:Apr 24 04:17:39 lvl1: /home/pi/.emulationstation/scripts/game-select/test.sh "arcade" "/home/pi/RetroPie/roms/arcade/10yard.zip" "10-yard Fight" "input" failed with exit code != 0. Terminating processing for this event.
-
My current interpretation of what's happening is that ES kicks off the
game-select
script but, somehow, the result of that command is blocked on the omx player process. I'm not sure if this is a strange interaction between the use ofsystem
(for the script) andfork / execve
(for omx) calls within ES.Perhaps there's something not thread-safe here?
-
I'm not sure if this is a strange interaction between the use of system (for the script) and fork / execve (for omx) calls within ES.
Something like this. Using
omxplayer
invokes an external program, just like the scripting system. I'd expect the script to trigger before the video starts, but there might be some race condition somewhere.I still can't reproduce the freeze, though switching between games I've noticed a few messages of
... June 24 06:01:44 lvl1: /home/pi/.emulationstation/scripts/game-select/nop.sh "snes" "/home/pi/RetroPie/roms/snes/TestGame.zip" "TestGame" "input" failed with exit code != 0. Terminating processing for this event.
I'll take a look at the code later on to see if something pops up. Can you also try the latest EmulationStation version - installed with the
emulationstation-dev
package ? -
@mitu Thanks for helping take a look
I installed
emulationstation-dev
and am able to reproduce the same.$ sudo $HOME/RetroPie-Setup/retropie_packages.sh emulationstation-dev _binary_ $ emulationstation --help EmulationStation, a graphical front-end for ROM browsing. Written by Alec "Aloshi" Lofquist. Version 2.11.0rp-dev, built Feb 13 2022 - 16:52:03
-
I've been testing with the
dev
version, but I still can't reproduce the freeze/crash. Maybe it's video related ? Can you try to remove thegame-select
event script and see if it still happens ? -
I removed the
game-select
event and the freeze no longer occurs. Here's the output of affprobe
for an example video:Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/home/pi/.emulationstation/downloaded_media/arcade/videos/88games.mp4': Metadata: major_brand : isom minor_version : 512 compatible_brands: isomiso2avc1mp41 title : 88games artist : EmuMovies album_artist : Circo album : MAME Video Snaps date : 2015 encoder : Lavf57.56.101 comment : http://emumovies.com genre : Video Snaps Duration: 00:00:30.05, start: 0.000000, bitrate: 477 kb/s Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x480 [SAR 1:1 DAR 4:3], 406 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default) Metadata: handler_name : VideoHandler Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 24000 Hz, stereo, fltp, 64 kb/s (default) Metadata: handler_name : SoundHandler
I also tried increasing the video delay in the theme from 3 seconds to 5 seconds, but still encountered the issue.
I can see if I can add some additional debugging into a custom build of ES to see if I can gather more information. Understanding the cause of the script sometimes failing with a non-zero exit code might also lead to something insightful.
-
After a little debugging, it looks like this is a race condition between the system() call and the SIGCHLD signal handler.
The
SIGCHLD
signal handler is preventingsystem()
calls from executing properly. I believe this thread accurately describes the issue I'm seeing: https://stackoverflow.com/questions/17550217/linux-system-sigchld-handling-multithreadingI commented out the line in ES that adds the
SIGCHLD
signal handler for the omx player and I no longer experience the freeze. Of course, I have zombie child processes that need to get reaped -- but this at least confirmed the theory. -
Nice find !
It does seem to be race, perhaps theSIGCHLD
s are queued and thesystem
's signal doesn't get delivered untilomxplayer
is finished playing (forcibly) - that would explain why you get all the queued debug logging whenomxplayer
is stopped.
Looks like usingsystem
for the scripting events is not safe, perhaps a similar approach viafork
needs to be implemented. -
👍 I experimented with a few potential solutions. I'll take another look at the
fork
approach for the scripting events. I originally tried that early on and didn't have success though I may not have been doing it correctly. I'll put together a branch for that later today.I believe I have an alternative working solution here: https://github.com/RetroPie/EmulationStation/compare/master...obrie:game-select-omx-freeze
That solution changes the signal handler to have the video component only reap its own pids. Not sure how well that follows best practices, though.
-
I've come up with a better, simpler solution: https://github.com/RetroPie/EmulationStation/compare/master...obrie:game-select-omx-freeze-alt
The problem is that the
SIGCHLD
handler is not taking into account that the handler may be called only once when there are multiple child processes to be reaped. In this scenario, the following happens:- The system gamelist view is rendered and the first game is selected
- After 3 seconds, the emulationstation process is forked, the
SIGCHLD
handler is registered, and the video starts playing - When you attempt to navigate down to the next game, both a
game-select
event is triggered and the video is stopped - Under the right conditions, these 2 events can happen effectively simultaneously resulting in 2 child processes that need to be reaped -- the forked omx process and the
system()
call to theselect-game
script - When this happens, a single call is made to the
SIGCHLD
handler
The VideoPlayerComponent's
SIGCHLD
handler only ever expects there to be a single child reaped. In this case, however, there are multiple child processes to be reaped.So the solution here is fairly straightforward: introduce a somewhat standard solution of reaping child processes until there are no more to be reaped.
After this change is made, the freezing issue goes away 🎉
@mitu Let me know if you think this is ready for a PR and I'll go ahead and submit one.
-
It looks ok and it's simpler than replacing
system
in therunSystemCommand
function. -
-
Contributions to the project are always appreciated, so if you would like to support us with a donation you can do so here.
Hosting provided by Mythic-Beasts. See the Hosting Information page for more information.