[RPi 3] Optimized lr-snes9x using PGO
-
@darksavior That's awesome! Thanks for testing!
-
So would this yield any benefit for N64, or am I being dumb?
-
@barcrest I'm not familiar with N64 on the Pi, but from what I understand PGO almost always gives some benefit. Sadly, I believe the Pi is heavily GPU- and memory-bottlenecked on N64 emulation, so any benefit PGO provides wouldn't matter in most games, although I could be wrong.
I decided to try PGO on lr-snes9x because it's usually within a couple of percent of full speed, and I hoped it could reduce or remove the minor frame drops in games that would otherwise run at full speed.
-
Got around to testing lr-snes9x without PGO (but with -O3 -marm -funroll-loops -funswitch-loops and the linker flags).
Idle in the Kirby 3 cat and hamster room:
No PGO: 52.9 fps
PGO: full speed
Note: This is odd, since I remember getting ~56 fps the last time I tested. I did re-test to make sure I was getting full speed.
Star Fox 2 opening:
No PGO: low ~47, high ~57, commonly ~54 fps
PGO: low ~52.5, high 60, commonly ~57.5 fps
Yoshi's Island world 1-1:
No PGO: full speed
PGO: full speed
As for why I'm getting higher frame rates than I remember, these are my best guesses:
- Switched to using -O3 and -marm and rebuilt RetroArch/EmulationStation. (ARM mode should have been the default anyway.)
- Updated to a slightly more recent RetroArch.
- Firmware/kernel may have been updated, although I believe I'm running the same versions as in previous tests.
- Switched to vm.swappiness = 1. It did not appear to be explicitly set before (it reported 60).
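For anyone wanting to replicate the swappiness change, here is a minimal sketch that persists it as a sysctl drop-in file. It assumes the system loads `/etc/sysctl.d/*.conf` at boot (standard on Raspbian). The function name `write_swappiness_conf` and the file name `99-swappiness.conf` are hypothetical, and the target directory can be overridden with `SYSCTL_DIR` so the helper can be tried without root.

```shell
#!/usr/bin/env bash
# Sketch: persist vm.swappiness through a sysctl drop-in file.
# Assumption: /etc/sysctl.d/*.conf is read at boot. Names are hypothetical.
set -euo pipefail

write_swappiness_conf() {
    local value="${1:-1}"                      # default to 1, as in the post
    local dir="${SYSCTL_DIR:-/etc/sysctl.d}"   # override to test without root
    mkdir -p "$dir"
    printf 'vm.swappiness = %s\n' "$value" > "$dir/99-swappiness.conf"
}
```

On the Pi this would be run as root, followed by `sudo sysctl --system` to load the drop-in immediately; `cat /proc/sys/vm/swappiness` shows the value currently in effect.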
Honestly, none of the above should really matter, but I'm somehow getting full speed in that room now. I'm using the same lr-snes9x binary provided in this thread and the same RetroArch settings.
You can view my RetroPie-Setup changes here: https://github.com/RetroPie/RetroPie-Setup/compare/master...GrieverV:unstable
Do note that I'm constantly rewriting the git history to modify commits or rebase on top of upstream, and I usually don't test my changes until the following day.
Edit: I did delete my all/retroarch.cfg and regenerate it with a newer build of RetroArch. I made sure to set all the performance-related options back to what I was using before.
-
@griever I've fixed my compiling problems on the pi. I shouldn't have messed around with the temp folders...
Pi 3B+, all stock except for the crt-pi shader, 1080p output resolution. RetroArch manually updated to 1.7.4 from source. Cat/hamster room:
1.56.2 built from source = 55.7 fps
Your 1.56.2 = ~59.9 fps (full speed)
-
After some testing, 1.56 (built from source or this PGO build) has broken savestates with MSU-1 games. The 1.54 binary works.
-
@darksavior if you have a repeatable test case for the broken savestates with MSU-1 games, and given that it seems to be an upstream bug, it may be worth sending them a bug report.
-
@hhromic Not sure what happened, but savestates work again, even with this binary. The only thing I've updated lately is RetroArch, to 1.7.5.
@Griever I'm lost on compiling. Maybe you could provide an lr-snes9x.sh with your modifications to build from source, if the RetroPie team won't add them? Your binary will eventually become obsolete.
-
@darksavior fortunately he provided the patches in his original post as a gist:
https://gist.github.com/GrieverV/b3d1a8e2c23c295b802f9e33286437c6
cheers!
Edit: note that there are two patches, one for lr-snes9x.sh and another for system.sh.
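For those unsure how to use the patches, a minimal sketch of applying them with `git apply` to an existing RetroPie-Setup checkout. It assumes both gist files have been downloaded locally; the function name `apply_pgo_patches`, the `~/RetroPie-Setup` location and the patch file names are placeholders, not names taken from the gist. The patches are fed through stdin so that relative patch paths still resolve after `git -C` switches directories, and the `--check` dry run reports a patch that no longer applies to a newer RetroPie-Setup without touching the checkout.

```shell
#!/usr/bin/env bash
# Sketch: apply downloaded patch files to a RetroPie-Setup checkout.
# Hypothetical helper; pass whatever names the gist files were saved as.
set -euo pipefail

apply_pgo_patches() {
    local setup_dir="$1"; shift
    # Dry run first: fails without modifying anything if a hunk doesn't apply.
    cat -- "$@" | git -C "$setup_dir" apply --check || return 1
    # Real run: git apply is all-or-nothing by default.
    cat -- "$@" | git -C "$setup_dir" apply
}

# Example (placeholder paths):
#   apply_pgo_patches ~/RetroPie-Setup ~/lr-snes9x.patch ~/system.patch
```

After applying, lr-snes9x would then be rebuilt from source through the RetroPie-Setup package menu.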
Edit2: Also note that if you don't want to profile the games yourself, @Griever would also need to make the profiling files available.
-
@darksavior As @hhromic said, links to the patches are included in the OP. As for the profiling data, I'm not sure how reusable it is, and I wouldn't want to promote reusing data from older versions of snes9x.
Do be warned that compilation can take 15-20 minutes and profiling will be very, very slow.
edit: Going by every other project I've seen that uses PGO, I doubt the data is reusable. Automating profiling isn't really feasible without macros and savestates, and that's just too much effort for me.
-
Right, profiling data is not entirely reusable when updating the binary. You are right, I overlooked that.
Indeed, automating the profiling is not trivial :(
-
Pardon my ignorance but how does one go about building lr-snes9x from source with the optimized patches?
-
@Griever Yes, if it's not too much trouble, a brief tutorial on how to use the patches would be nice. I know enough to dive into the .sh installer file and change stuff.
-
With snes9x 1.57 out I finally decided to take a stab at this and manually edit the changes since I have no idea how to use patches. Seems to be working.
-
Did you do anything other than copy Griever's RetroPie-Setup changes? I replaced my system.sh and lr-snes9x.sh files to match his, updated lr-snes9x from source and despite now being on 1.57, the core runs worse than the binary he posted in the original post of this thread.
-
@MapleStory I had that problem at first until I read his first post which says
"Note that you need to comment out CXXFLAGS, LDFLAGS and uncomment the other CXXFLAGS variable when generating the optimized binary."
Do that and you'll be good to go.
#CXXFLAGS+=" -fprofile-dir=/home/pi/pgo/out -fprofile-generate=/home/pi/pgo/out"
CXXFLAGS+=" -fprofile-dir=/home/pi/pgo/out -fprofile-use=/home/pi/pgo/out"
#LDFLAGS+=" -lgcov"
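Flipping these three lines by hand between the two stages is easy to get wrong. A sketch that does it with GNU `sed -i`, assuming the lines appear in the build script exactly as quoted above (possibly indented): one `-fprofile-generate` line, one `-fprofile-use` line and one `-lgcov` line. The function name `set_pgo_stage` is hypothetical, and the script path must point at the local copy of lr-snes9x.sh.

```shell
#!/usr/bin/env bash
# Sketch: switch the PGO lines in lr-snes9x.sh between build stages.
#   generate: instrumented build that writes profile data while you play
#   use:      final build that reads the collected profile data
# Assumes GNU sed and the three lines quoted in this thread.
set -euo pipefail

set_pgo_stage() {
    local stage="$1" script="$2"
    case "$stage" in
        generate)
            # Uncomment -fprofile-generate and -lgcov, comment out -fprofile-use.
            sed -i -E \
                -e 's/^([[:space:]]*)#(CXXFLAGS\+=".*-fprofile-generate)/\1\2/' \
                -e 's/^([[:space:]]*)(CXXFLAGS\+=".*-fprofile-use)/\1#\2/' \
                -e 's/^([[:space:]]*)#(LDFLAGS\+=" -lgcov")/\1\2/' \
                "$script" ;;
        use)
            # Comment out -fprofile-generate and -lgcov, uncomment -fprofile-use.
            sed -i -E \
                -e 's/^([[:space:]]*)(CXXFLAGS\+=".*-fprofile-generate)/\1#\2/' \
                -e 's/^([[:space:]]*)#(CXXFLAGS\+=".*-fprofile-use)/\1\2/' \
                -e 's/^([[:space:]]*)(LDFLAGS\+=" -lgcov")/\1#\2/' \
                "$script" ;;
        *)
            echo "usage: set_pgo_stage generate|use SCRIPT" >&2
            return 1 ;;
    esac
}
```

Running it twice with the same stage is harmless, because each pattern only matches a line that is in the opposite state.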
-
Thanks, I made the changes and it compiled successfully, but the Kirby's Dreamland 3 cat/hamster room fluctuates between 58.7 and 59 fps and never reaches a clean 60 fps. Griever's binary in the first post, however, maintains a stable 60 fps throughout. Other than compiling lr-snes9x from source after editing the scripts, were there any other updates I should have done?
-
@MapleStory I only tested it against the official build on a non-plus Pi 3B at 1300 MHz, and it was an improvement over that one. It's still not as fast as Griever's build, and I wouldn't know what else would be required to improve it. A Pi 3B+ user should get full speed on 1.57 unless using the slowdown fix hack. This is my first time editing build options, so I'm rather new to this.
-
@MapleStory The final binary uses profiling (performance) data. That is, you run the emulator a few times in the games whose performance you want to improve, and performance data is generated. Then you recompile the binary, and this time the compiler reads that data and produces a new binary with additional optimisations based on it.
That's why @Griever said it's difficult to automate building such a binary: it's practically impossible to automate if you want meaningful results.
-
So, I've been meaning to ask in case anyone had tried it out: if I use the binary above on a pre-RA 1.57 setup, will it still be an improvement/will it at least run, or are the improvements dependent on updating RetroArch as well? I'm not keen on updating it as I have a stable, working setup and I'm under the impression I read a few things changed on the performance front.