[RPi 3] Optimized lr-snes9x using PGO

Category: Ideas and Development
Tags: snes, snes9x, optimization, raspberry pi 3
    Griever
    last edited by Griever 17 Sept 2018, 16:43

    Recently bought a RPi 3 B+ to use as a RetroPie system in my living room and this was the first thing I did. I only tested against the available (RPi 2 optimized) binary download, but I was seeing ~12-25% improvements in some games.

    If you're wondering what PGO (profile-guided optimization) is, Wikipedia has a brief entry on it. You can also view GCC's instrumentation documentation here.

    List of games used to generate profile information:

    • ActRaiser MSU-1 (MSU-1 audio)
    • Final Fantasy VI (Opening only)
    • Kirby's Dream Land 3 (SA-1)
    • Mega Man X2 (Cx4)
    • Star Fox (Super FX)
    • Super Mario World 2: Yoshi's Island (Super FX 2)
    • Super Metroid

    Outside of games, I also ran state loading/saving.

    You can find the RetroPie-Setup patches I used to build lr-snes9x here. Note that when generating the optimized binary you need to comment out the CXXFLAGS and LDFLAGS variables and uncomment the other CXXFLAGS variable.

    Explanation for some of the options passed to the compiler and linker:

    • -funroll-loops
      • I've seen this recommended for ARM devices by multiple sources. Enabled by default with -fprofile-use.
    • -funswitch-loops
      • I've seen this recommended for ARM devices by multiple sources. Enabled by default with -O3.
    • -Wl,-O1,--sort-common,--as-needed
      • Linker optimizations taken from Arch Linux. Unlikely to have a performance impact; they may reduce binary size and increase link time. --as-needed is the most likely to cause linker issues.

    For what it's worth, -O3 was not used because the performance benefit is usually negligible and there are known issues in GCC 6.3.0 with some of the optimizations it enables.
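For readers unfamiliar with the workflow, PGO with GCC is a two-pass build: an instrumented pass compiled with `-fprofile-generate`, a profiling run (playing the games listed above), and a final pass compiled with `-fprofile-use`. The following is a minimal sketch of the flag sets for each pass, not a copy of the patches. The `-O2` base level, the profile directory, and the `pgo_flags` helper are assumptions made for illustration; only the loop and linker flags come from the post.

```python
# Loop flags and linker flags are quoted from the post.
# ASSUMPTION: -O2 as the base level (the post only says -O3 was not used).
BASE_FLAGS = ["-O2", "-funroll-loops", "-funswitch-loops"]
LINKER_FLAGS = ["-Wl,-O1,--sort-common,--as-needed"]


def pgo_flags(phase, profile_dir="/tmp/snes9x-profile"):
    """Return (CXXFLAGS, LDFLAGS) strings for one PGO pass.

    phase: "generate" for the instrumented build, "use" for the optimized build.
    profile_dir: hypothetical directory where GCC stores the .gcda profile files.
    """
    if phase == "generate":
        # The instrumented build needs the flag at both compile and link time.
        pgo = [f"-fprofile-generate={profile_dir}"]
        ldflags = LINKER_FLAGS + pgo
    elif phase == "use":
        # The optimized build reads the profile data collected while playing.
        pgo = [f"-fprofile-use={profile_dir}"]
        ldflags = list(LINKER_FLAGS)
    else:
        raise ValueError("phase must be 'generate' or 'use'")
    return " ".join(BASE_FLAGS + pgo), " ".join(ldflags)


if __name__ == "__main__":
    for phase in ("generate", "use"):
        cxxflags, ldflags = pgo_flags(phase)
        print(f"[{phase}] CXXFLAGS={cxxflags!r} LDFLAGS={ldflags!r}")
```

The same source tree has to be built in both passes, since the profile data is matched against the code it was recorded from.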

    Download: lr-snes9x.tar.gz (42d454b)

    To install, extract the contents to /opt/retropie/libretrocores/

    I'd appreciate feedback on how the performance compares to lr-snes9x built from source.

    Notes:

    • It's possible the binary could be slower than one generated using the default source script due to poor profiling.
    • If you want to generate your own lr-snes9x using the provided patches, expect ~20 minute build time and ~75% performance hit while profiling (3 B+).
    • Patches change snes9x git repository to upstream snes9x instead of libretro.
    • You still can't achieve 60 fps on demanding games without threaded rendering (3 B+ stock).
    • I believe the lowest fps I saw in my tested games was ~53 in Yoshi's Island during the 'Goal' screen at the end of a level.
      mitu Global Moderator
      last edited by 17 Sept 2018, 16:51

      @griever said in [RPi 3] Optimized lr-snes9x using PGO:

      ~12-25% improvements in some games

      Interesting approach, though I guess PGO would be game dependent? How do you measure the performance difference (improvement/regression)?

        Griever @mitu
        last edited by Griever 17 Sept 2018, 16:57

        @mitu The benefit should mostly depend on how similar a game is to the ones I used to generate profiling information. As for how I calculated the performance difference, I got a very rough estimate by comparing the fps on mostly static screens in the games where I saw the most slowdown, waiting for the fps to stabilize each time. To make it easier, you'll probably want to enable fast forwarding so you can go above 60 fps.

          Griever
          last edited by 19 Sept 2018, 00:25

          I was able to obtain stable 60 fps on Yoshi's Island in known spots with slowdown on world 1-1 (beanstalk, goal) with the following settings:

          video_vsync = "false"
          video_threaded = "false"
          video_frame_delay = "0"
          video_max_swapchain_images = "3"
          video_smooth = "false"

          The title menu still drops to ~59 fps

          Worth noting I'm also running RetroArch master (cfd52f8) along with the above patch to system.sh. I didn't state this in the OP, but I'm using whatever the default video driver is (I assume blob) and the zfast crt curve shader.

          I still need to test games with known slowdown like Kirby 3 and Star Fox, but I don't expect them to run fullspeed without threaded video.
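The settings listed above could be applied to an existing retroarch.cfg with a small script. This is only a sketch: the `apply_overrides` helper is hypothetical, and it assumes the usual `key = "value"` line format. The file path is not given in the post (on a RetroPie install it is typically /opt/retropie/configs/all/retroarch.cfg), so the function works on text and leaves reading and writing the file to the caller.

```python
import re

# Values taken from the post above.
OVERRIDES = {
    "video_vsync": "false",
    "video_threaded": "false",
    "video_frame_delay": "0",
    "video_max_swapchain_images": "3",
    "video_smooth": "false",
}

# Matches an uncommented `key =` at the start of a line.
KEY_PATTERN = re.compile(r"\s*([A-Za-z0-9_]+)\s*=")


def apply_overrides(cfg_text, overrides):
    """Return cfg_text with every key in overrides set to its new value.

    Existing lines for a key are rewritten in place; keys that do not
    appear in the file are appended at the end. Other lines are kept as-is.
    """
    seen = set()
    out = []
    for line in cfg_text.splitlines():
        match = KEY_PATTERN.match(line)
        if match and match.group(1) in overrides:
            key = match.group(1)
            out.append(f'{key} = "{overrides[key]}"')
            seen.add(key)
        else:
            out.append(line)
    for key, value in overrides.items():
        if key not in seen:
            out.append(f'{key} = "{value}"')
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = 'video_vsync = "true"\naudio_driver = "alsa"\n'
    print(apply_overrides(sample, OVERRIDES), end="")
```

Back up the original file before writing the result over it, so the previous settings can be restored if a game behaves worse.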

            Darksavior @Griever
            last edited by Darksavior 19 Sept 2018, 01:28

            @griever Very nice!
            Kirby3 with stock retroarch 1.7.3 binary. Stock video settings except "reduce slowdown" hack set to compatible. Pi3b+ stock 1400mhz with crt-pi shader at 1080p output res. Level 1-1 briefly tested and the room with the cat and hamster:

            lr-snes9x 1.54.1 binary (source fails to build for me):
            45fps/40fps in room.
            Removing the "reduce slowdown" hack = ~60fps/~53fps in room.
            Turning off threaded video and hack = ~50fps/ ~44fps in room.

            Your lr-snes9x 1.56.2 binary:
            60fps/53fps in room.
            Removing the "reduce slowdown" hack = 60fps everywhere.
            Turning off threaded video and hack = 60fps/~53fps in room.

            I normally run my pi3b+ at 1500mhz to eliminate most slowdowns on lr-snes9x, but hopefully your build will convince the retropie team to have separate 3b+ optimized builds in the future.

              Griever @Darksavior
              last edited by 19 Sept 2018, 02:13

              @darksavior That's awesome! Thanks for testing!

                Barcrest
                last edited by 21 Sept 2018, 14:54

                So would this yield any benefit for N64, or am I being dumb?

                  Griever @Barcrest
                  last edited by 22 Sept 2018, 07:47

                  @barcrest I'm not familiar with N64 on the Pi, but from what I understand PGO usually has some benefit. Sadly, I believe the Pi is heavily GPU and memory bottlenecked on N64 emulation, so any benefit PGO may provide wouldn't matter in most games, although I could be wrong.

                  I decided to try PGO on lr-snes9x because it's usually within a couple percent of full speed, and I hoped it could reduce or remove minor frame drops in games where you'd otherwise get full speed.

                    Griever
                    last edited by Griever 23 Sept 2018, 08:48

                    Got around to testing lr-snes9x without PGO (but with -O3 -marm -funroll-loops -funswitch-loops and the linker flags)

                    Idle at Kirby 3 cat and hamster room:
                    No PGO: 52.9
                    PGO: full speed

                    Note: This is odd since I remember getting ~56 last I tested. I did re-test and make sure I was getting full speed.

                    Star Fox 2 opening:
                    No PGO: Low ~47, high ~57. Commonly went to ~54
                    PGO: Low ~52.5, high 60. Commonly went to ~57.5

                    Yoshi's Island world 1-1:
                    No PGO: full speed
                    PGO: full speed
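To put the numbers above on a common scale, the relative gain can be computed as (PGO fps - no-PGO fps) / no-PGO fps. The sketch below does that for the figures in this post. Treating "full speed" as exactly 60 fps is an assumption, and `percent_gain` is a hypothetical helper name.

```python
FULL_SPEED_FPS = 60.0  # ASSUMPTION: "full speed" is treated as 60 fps


def percent_gain(baseline_fps, optimized_fps):
    """Relative fps improvement of the optimized build over the baseline, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (optimized_fps - baseline_fps) / baseline_fps * 100.0


# (no PGO, PGO) pairs taken from the results in this post.
results = {
    "Kirby 3, cat and hamster room": (52.9, FULL_SPEED_FPS),
    "Star Fox 2 opening (low)": (47.0, 52.5),
    "Star Fox 2 opening (typical)": (54.0, 57.5),
}

if __name__ == "__main__":
    for scene, (no_pgo, pgo) in results.items():
        print(f"{scene}: {percent_gain(no_pgo, pgo):+.1f}%")
```

This works out to roughly +13.4%, +11.7% and +6.5%. Any result that hits the 60 fps cap is a lower bound, which is why measuring with fast forwarding enabled gives a fairer comparison.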

                    As for why I'm getting higher frames than I remember, these are my best guesses:

                    • Switched to using -O3 and -marm and rebuilt RetroArch/Emulationstation. (ARM mode should have been the default anyway)
                    • Updated to a slightly more recent RetroArch
                    • Firmware/kernel may have been updated, although I believe I'm running the same as previous tests
                    • Switched to vm.swappiness = 1. It did not appear to be explicitly set before and reported 60

                    Honestly, none of the above should really matter but I'm somehow getting full speed in that room now. I'm using the same lr-snes9x binary provided in this thread and the same RetroArch settings.

                    You can view my RetroPie-Setup changes here: https://github.com/RetroPie/RetroPie-Setup/compare/master...GrieverV:unstable
                    Do note I'm constantly rewriting the git history to modify commits or rebase on top of upstream and I usually don't test my changes until the following day.

                    edit: I did delete my all/retroarch.cfg and regenerate it with a newer build of RetroArch. I made sure to reconfigure all the performance related options back to what I was using before.

                      Darksavior @Griever
                      last edited by Darksavior 24 Sept 2018, 02:53

                      @griever I've fixed my compiling problems on the pi. I shouldn't have messed around with the temp folders...

                      pi3b+ all stock except for crtpi shader. 1080p output res. Retroarch manually updated to 1.7.4 from source. Cat/Hamster room:
                      1.56.2 built from source = 55.7fps
                      Your 1.56.2 = ~59.9fps (fullspeed)

                        Darksavior
                        last edited by 4 Oct 2018, 06:14

                        After some testing, 1.56 (from source or this pgo) has broken savestates with msu1 games. 1.54 binary works.

                          hhromic @Darksavior
                          last edited by 4 Oct 2018, 15:12

                          @darksavior if you have a repeatable test case for the broken savestates with msu1 games, and given that it seems to be an upstream bug, maybe it's worth sending them a bug report?

                            Darksavior @hhromic
                            last edited by Darksavior 10 Oct 2018, 12:02 10 Oct 2018, 10:57

                            @hhromic Not sure what happened, but savestates work again even with this binary. I've only updated to retroarch 1.7.5 lately.

                            @Griever I'm lost on compiling. Maybe you can provide an lr-snes9x.sh with your modifications to build from source if the retropie team won't add them? Your binary will eventually become obsolete.

                              hhromic @Darksavior
                              last edited by hhromic 10 Oct 2018, 12:29 10 Oct 2018, 11:09

                              @darksavior fortunately he provided the patches in his original post as a gist:
                              https://gist.github.com/GrieverV/b3d1a8e2c23c295b802f9e33286437c6
                              cheers!

                              Edit: note that there are two patches, one for lr-snes9x.sh and another for system.sh.
                              Edit2: Also note that if you don't want to profile the games yourself, @Griever needs to also make available the profiling files.

                                Griever @Darksavior
                                last edited by Griever 10 Nov 2018, 15:36 11 Oct 2018, 14:27

                                @darksavior As @hhromic said, links to patches are included in the OP. As for the profiling data, I'm not sure how reusable it is and wouldn't want to promote reusing data for older versions of snes9x

                                Do be warned that compilation can take 15-20 minutes and profiling will be very, very slow.

                                edit: Going by every other project I see that uses PGO, I doubt the data is reusable. Automating profiling isn't really feasible without macros and savestates and that's just too much effort for me.

                                  hhromic
                                  last edited by 12 Oct 2018, 11:30

                                  Right, profiling data is not entirely reusable when updating the binary. You are right, I overlooked that.
                                  Indeed automating the profiling is not trivial to do :(

                                    MapleStory
                                    last edited by 18 Oct 2018, 17:21

                                    Pardon my ignorance but how does one go about building lr-snes9x from source with the optimized patches?

                                      Darksavior
                                      last edited by Darksavior 25 Oct 2018, 01:14

                                      @Griever Yes, if it's not too much trouble, a brief tutorial would be nice on how to use the patches. I know enough to dive in the .sh installer file and change stuff.

                                        Darksavior
                                        last edited by Darksavior 11 Oct 2018, 12:33 10 Nov 2018, 12:02

                                        With snes9x 1.57 out I finally decided to take a stab at this and manually edit the changes since I have no idea how to use patches. Seems to be working.

                                          MapleStory @Darksavior
                                          last edited by 13 Nov 2018, 11:51

                                          @Darksavior

                                          Did you do anything other than copy Griever's RetroPie-Setup changes? I replaced my system.sh and lr-snes9x.sh files to match his, updated lr-snes9x from source and despite now being on 1.57, the core runs worse than the binary he posted in the original post of this thread.
