[RPi 3] Optimized lr-snes9x using PGO
-
Recently bought a RPi 3 B+ to use as a RetroPie system in my living room and this was the first thing I did. I only tested against the available (RPi 2 optimized) binary download, but I was seeing ~12-25% improvements in some games.
If you're wondering what PGO (profile-guided optimization) is, Wikipedia has a brief entry on it. You can also view GCC's instrumentation documentation here.
List of games used to generate profile information:
- ActRaiser MSU-1 (MSU-1 audio)
- Final Fantasy VI (Opening only)
- Kirby's Dream Land 3 (SA-1)
- Mega Man X2 (Cx4)
- Star Fox (Super FX)
- Super Mario World 2: Yoshi's Island (Super FX 2)
- Super Metroid
Outside of games, I also ran state loading/saving.
You can find the RetroPie-Setup patches I used to build lr-snes9x here. Note that you need to comment out CXXFLAGS, LDFLAGS and uncomment the other CXXFLAGS variable when generating the optimized binary.
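For anyone who hasn't used PGO with GCC before, the two build passes can be sketched roughly as below. This is not the contents of the patches: the pgo_cxxflags helper, the -O2 base level and the profile directory are assumptions made for illustration. Only -fprofile-generate, -fprofile-use and -fprofile-correction are standard GCC options.

```shell
# Hypothetical helper: print the GCC flags for one PGO pass.
# Usage: pgo_cxxflags <generate|use> <profile-dir>
# The -O2 base level and the helper itself are assumptions for illustration.
pgo_cxxflags() {
    phase="$1"
    profile_dir="$2"
    case "$phase" in
        generate)
            # Pass 1: instrumented build; running it writes .gcda profile data.
            echo "-O2 -fprofile-generate=$profile_dir"
            ;;
        use)
            # Pass 2: optimized rebuild that reads the collected profile.
            # -fprofile-correction lets GCC smooth inconsistent counters from threaded runs.
            echo "-O2 -fprofile-use=$profile_dir -fprofile-correction"
            ;;
        *)
            echo "unknown phase: $phase" >&2
            return 1
            ;;
    esac
}
```

The idea is to build once with the generate flags (for both compiling and linking), play the games you want profiled, then rebuild with the use flags pointing at the same directory.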
Explanation for some of the options passed to the compiler and linker:
- -funroll-loops: I've seen recommended for ARM devices from multiple sources. Enabled by default with -fprofile-use.
- -funswitch-loops: I've seen recommended for ARM devices from multiple sources. Enabled by default with -O3.
- -Wl,-O1,--sort-common,--as-needed: Linker optimizations taken from ArchLinux. Unlikely to have a performance impact, may reduce binary size and increases linkage time. --as-needed is the most likely to cause linker issues.
For what it's worth, -O3 was not used because the performance benefit is usually negligible and there are known issues in GCC 6.3.0 with some of the optimizations it enables.
Download: lr-snes9x.tar.gz (42d454b)
To install, extract the contents to /opt/retropie/libretrocores/
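A minimal sketch of the install step, assuming the archive was downloaded to the current directory. The install_core helper is hypothetical; it lists the archive first so a corrupt download fails before anything is written.

```shell
# Hypothetical helper: check a .tar.gz archive, then extract it into a directory.
# Usage: install_core <archive.tar.gz> <destination-dir>
install_core() {
    archive="$1"
    dest="$2"
    # Listing the archive reads it end to end, so a truncated download fails here.
    tar -tzf "$archive" > /dev/null || return 1
    tar -xzf "$archive" -C "$dest"
}

# Example (you will probably need a root shell for this destination):
#   install_core lr-snes9x.tar.gz /opt/retropie/libretrocores/
```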
I'd appreciate feedback on how the performance compares to lr-snes9x built from source.
Notes:
- It's possible the binary could be slower than one generated using the default source script due to poor profiling.
- If you want to generate your own lr-snes9x using the provided patches, expect ~20 minute build time and ~75% performance hit while profiling (3 B+).
- Patches change snes9x git repository to upstream snes9x instead of libretro.
- You still can't achieve 60 fps on demanding games without threaded rendering (3 B+ stock).
- I believe the lowest fps I saw in my tested games was ~53 in Yoshi's Island during the 'Goal' screen at the end of a level.
-
@griever said in [RPi 3] Optimized lr-snes9x using PGO:
~12-25% improvements in some games
Interesting approach, though I guess PGO would be game dependent? How do you measure the performance difference (improvement/regression)?
-
@mitu The benefit should mostly depend on how similar games are to the ones I used to generate profiling information. As for how I calculated the performance difference, I just compared the fps using mostly static screens while waiting for the fps to stabilize in games where I saw the most slowdown to get a very rough estimate. To make it easier, you'll probably want to enable fast forwarding so you can go above 60 fps.
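The arithmetic behind a rough estimate like that can be sketched as follows. The fps_gain_percent helper is hypothetical and the readings are made up for illustration; with fast forwarding off, anything that would run above 60 fps is capped, so the gain would be understated.

```shell
# Hypothetical helper: percentage gain of a new fps reading over a baseline.
# Usage: fps_gain_percent <baseline-fps> <new-fps>
fps_gain_percent() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Illustrative readings, not measurements from this thread.
fps_gain_percent 50 60   # prints 20.0
```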
-
I was able to obtain stable 60 fps on Yoshi's Island in known spots with slowdown on world 1-1 (beanstalk, goal) with the following settings:
video_vsync = "false"
video_threaded = "false"
video_frame_delay = "0"
video_max_swapchain_images = "3"
video_smooth = "false"
The title menu still drops to ~59 fps
Worth noting I'm also running RetroArch master (cfd52f8) along with the above patch to system.sh. I didn't state this in the OP, but I'm using whatever the default video driver is (I assume blob) and the zfast crt curve shader.
I still need to test games with known slowdown like Kirby 3 and Star Fox, but I don't expect them to run fullspeed without threaded video.
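If you'd rather script the settings listed above than edit the file by hand, something like this should work. The set_cfg helper is hypothetical, and the file path in the example is a placeholder for wherever your own retroarch.cfg lives.

```shell
# Hypothetical helper: set key = "value" in a RetroArch-style config file.
# Usage: set_cfg <file> <key> <value>
# Assumes keys and values are simple words (no '|', '&' or '\' characters).
set_cfg() {
    file="$1"
    key="$2"
    value="$3"
    if grep -q "^$key *=" "$file"; then
        # Key exists: rewrite the line in a temp copy, then write it back
        # through the original file so ownership and permissions are kept.
        sed "s|^$key *=.*|$key = \"$value\"|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm "$file.tmp"
    else
        # Key missing: append it.
        printf '%s = "%s"\n' "$key" "$value" >> "$file"
    fi
}

# Example (the file path is a placeholder):
#   set_cfg retroarch.cfg video_vsync false
#   set_cfg retroarch.cfg video_max_swapchain_images 3
```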
-
@griever Very nice!
Kirby3 with stock retroarch 1.7.3 binary. Stock video settings except "reduce slowdown" hack set to compatible. Pi3b+ stock 1400mhz with crt-pi shader at 1080p output res. Level 1-1 briefly tested and the room with the cat and hamster:
lr-snes9x 1.54.1 binary (source fails to build on me):
45fps/40fps in room.
Removing the "reduce slowdown" hack = ~60fps/~53fps in room.
Turning off threaded video and hack = ~50fps/~44fps in room.
Your lr-snes9x 1.56.2 binary:
60fps/53fps in room.
Removing the "reduce slowdown" hack = 60fps everywhere.
Turning off threaded video and hack = 60fps/~53fps in room.
I normally run my pi3b+ at 1500mhz to eliminate most slowdowns on lr-snes9x, but hopefully your build will convince the retropie team to have separate 3b+ optimized builds in the future.
-
@darksavior That's awesome! Thanks for testing!
-
So would this yield any benefit for N64, or am I being dumb?
-
@barcrest I'm not familiar with N64 on the Pi, but PGO usually has some benefit from what I understand. Sadly I believe the Pi is heavily GPU and memory bottlenecked on N64 emulation, so any benefit PGO may provide wouldn't matter in most games, although I could be wrong.
I decided to try to use PGO on lr-snes9x because it's usually within a couple percent from being fullspeed and hoped it could reduce or remove minor frame drops in games where you'd otherwise get fullspeed.
-
Got around to testing lr-snes9x without PGO (but with -O3 -marm -funroll-loops -funswitch-loops and the linker flags).
Idle at Kirby 3 cat and hamster room:
No PGO: 52.9
PGO: full speed
Note: This is odd since I remember getting ~56 last I tested. I did re-test and make sure I was getting full speed.
Star Fox 2 opening:
No PGO: Low ~47, high ~57. Commonly went to ~54
PGO: Low ~52.5, high 60. Commonly went to ~57.5
Yoshi's Island world 1-1:
No PGO: full speed
PGO: full speed
As for why I'm getting higher frames than I remember, these are my best guesses:
- Switched to using -O3 and -marm and rebuilt RetroArch/Emulationstation. (ARM mode should have been the default anyway)
- Updated to a slightly more recent RetroArch
- Firmware/kernel may have been updated, although I believe I'm running the same as previous tests
- Switched to vm.swappiness = 1. Did not appear to be explicitly set before/reported 60
Honestly, none of the above should really matter but I'm somehow getting full speed in that room now. I'm using the same lr-snes9x binary provided in this thread and the same RetroArch settings.
You can view my RetroPie-Setup changes here: https://github.com/RetroPie/RetroPie-Setup/compare/master...GrieverV:unstable
Do note I'm constantly rewriting the git history to modify commits or rebase on top of upstream and I usually don't test my changes until the following day.
edit: I did delete my all/retroarch.cfg and regenerate it with a newer build of RetroArch. I made sure to reconfigure all the performance related options back to what I was using before.
-
@griever I've fixed my compiling problems on the pi. I shouldn't have messed around with the temp folders...
pi3b+ all stock except for crtpi shader. 1080p output res. Retroarch manually updated to 1.7.4 from source. Cat/Hamster room:
1.56.2 built from source = 55.7fps
Your 1.56.2 = ~59.9fps (fullspeed)
-
After some testing, 1.56 (from source or this pgo) has broken savestates with msu1 games. 1.54 binary works.
-
@darksavior if you have a repeatable test case for the broken savestates with msu1 games, and given that it seems to be an upstream bug, maybe it's worth sending them a bug report?
-
@hhromic Not sure what happened, but savestates work again even with this binary. I've only updated to retroarch 1.7.5 lately.
@Griever I'm lost on compiling. Maybe you can provide an lr-snes9x.sh with your modifications to build from source if the retropie team won't add them? Your binary will eventually become obsolete.
-
@darksavior fortunately he provided the patches in his original post as a gist:
https://gist.github.com/GrieverV/b3d1a8e2c23c295b802f9e33286437c6
cheers!
Edit: note that there are two patches, one for lr-snes9x.sh and another for system.sh.
Edit2: Also note that if you don't want to profile the games yourself, @Griever needs to also make available the profiling files.
-
@darksavior As @hhromic said, links to patches are included in the OP. As for the profiling data, I'm not sure how reusable it is and wouldn't want to promote reusing data for older versions of snes9x.
Do be warned that compilation can take 15-20 minutes and profiling will be very, very slow.
edit: Going by every other project I see that uses PGO, I doubt the data is reusable. Automating profiling isn't really feasible without macros and savestates and that's just too much effort for me.
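For anyone unsure how to apply the two patches, here is a minimal sketch using git apply. The apply_patches helper, the checkout location and the patch file names are all placeholders: save the two gist files under whatever names you like and pass their absolute paths.

```shell
# Hypothetical helper: apply patch files, in order, to a git checkout.
# Usage: apply_patches <repo-dir> <patch-file>...
# Patch paths must be absolute, because git -C changes directory first.
apply_patches() {
    repo="$1"
    shift
    for patch_file in "$@"; do
        # --check is a dry run: bail out before modifying files for this
        # patch if it no longer fits the checkout.
        git -C "$repo" apply --check "$patch_file" || return 1
        git -C "$repo" apply "$patch_file" || return 1
    done
}

# Example (placeholder paths and file names):
#   apply_patches "$HOME/RetroPie-Setup" "$HOME/lr-snes9x.patch" "$HOME/system.patch"
```

If a patch fails the check, RetroPie-Setup has probably moved on since the patch was written and the changes need to be made by hand.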
-
Right, profiling data is not entirely reusable when updating the binary. You are right, I overlooked that.
Indeed automating the profiling is not trivial to do :(
-
Pardon my ignorance but how does one go about building lr-snes9x from source with the optimized patches?
-
@Griever Yes, if it's not too much trouble, a brief tutorial would be nice on how to use the patches. I know enough to dive in the .sh installer file and change stuff.
-
With snes9x 1.57 out I finally decided to take a stab at this and manually edit the changes since I have no idea how to use patches. Seems to be working.
-
Did you do anything other than copy Griever's RetroPie-Setup changes? I replaced my system.sh and lr-snes9x.sh files to match his, updated lr-snes9x from source and despite now being on 1.57, the core runs worse than the binary he posted in the original post of this thread.