RetroPie forum home

Overclocking the Pi3b+ GPU (Results)

General Discussion and Gaming
Tags: pi3 b+, overclock, gpu
133 Posts 18 Posters 39.5k Views
  • P
    Parabolaralus @Brunnis
    last edited by 8 Feb 2019, 19:09

    @Brunnis Thank you for taking the time to research and post this data!

    • R
      Rion
      last edited by 8 Feb 2019, 20:24

      @Brunnis I have also noticed the slowdowns happening in certain games using ondemand CPU governor.

      I never bothered with Overclocking but just changed to performance instead.

      But it would be interesting to see if there is any way to optimize cpu governor ondemand.


      • B
        Brunnis @dankcushions
        last edited by Brunnis 2 Aug 2019, 21:35 8 Feb 2019, 21:30

        @dankcushions

        no, i'm just articulating what i mean when i say that the data presented is not the "smoking gun", but your personal observations of a stutter is.

        Fair enough.

        i think the sampling_down_factor might be the one we would tweak:

        • sampling_down_factor:

          This parameter controls the rate at which the kernel makes a decision on when to decrease the frequency while running at top speed. When set to 1 (the default) decisions to reevaluate load are made at the same interval regardless of current clock speed. But when set to greater than 1 (e.g. 100) it acts as a multiplier for the scheduling interval for reevaluating load when the CPU is at its top speed due to high load. This improves performance by reducing the overhead of load evaluation and helping the CPU stay at its top speed when truly busy, rather than shifting back and forth in speed. This tunable has no effect on behavior at lower speeds/lower CPU loads.

        I don't think that will work either. The problem is, again, that the average load is too low. Whether we stretch out the sample period over 1, 2, 10 frames, the average load will be close to the same and far below the required 95% that's needed to stay at the highest speed.

        The way I see it, rapid highly periodic loads like these are hard to handle. The same issue occurs when running RetroArch on Windows 10 machines with modern Core processors, so it's not isolated to the Raspberry Pi. The only possible solutions I've been able to come up with so far are to:

        1. Decrease the sample period, so that reactions to load changes can be carried out faster. If the default sample period is really 10 ms, that means more than half the execution time of a frame can be spent at the lower frequency before the CPU is instructed to increase clocks. The sample period would need to be drastically reduced in order to minimize the time spent downclocked after beginning the actual timing-critical work.

        2. Use the "performance" governor. This completely eliminates the inefficiency of needing to sample CPU load before reacting.
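Option 2 could be sketched in shell roughly as follows. This is only an illustration and was not posted in the thread: `set_governor` and the `SYSFS_CPU` override are hypothetical names, and the sketch assumes the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`). Writing to those files requires root.

```shell
# Set the cpufreq governor for every CPU found under a sysfs root.
# SYSFS_CPU is a hypothetical override so the function can be tried
# against a scratch directory; on a real Pi leave it unset.
set_governor() {
    local governor="$1"
    local root="${SYSFS_CPU:-/sys/devices/system/cpu}"
    local f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue      # skip if the glob matched nothing
        echo "$governor" > "$f"
    done
}
```

From a root shell, `set_governor performance` before launching an emulator and `set_governor ondemand` afterwards would switch between the two behaviours discussed here.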

        That's it for me on the topic. I'm fine with using the run command settings to control this, like I always have before. Sometimes it's just fun to try to understand the mechanics behind a behavior. :-)

        @Rion said in Overclocking the Pi3b+ GPU (Results):

        I have also noticed the slowdowns happening in certain games using ondemand CPU governor.

        Even without using video_max_swapchain_images=2?

        @Rion said in Overclocking the Pi3b+ GPU (Results):

        I never bothered with Overclocking but just changed to performance instead.

        Yeah, that's the correct approach. Starting out with overclocking would be bad, since you're then just working against a mechanism that's now even more prone to try to lower the frequency. So, first change the governor, then overclock if performance still isn't good enough. :-)

        @Rion said in Overclocking the Pi3b+ GPU (Results):

        But it would be interesting to see if there is any way to optimize cpu governor ondemand.

        I think the nature of the load makes it hard. It will never be as performant as simply using the "performance" governor. Well, if you tweak the "ondemand" governor so that it considers the emulator load to be high enough to not downclock in between frames, then it will perform the same as the "performance" governor. But then there's no point in doing the optimization in the first place, since it won't save you any power consumption over the "performance" governor anyway!

        • R
          Riverstorm
          last edited by 9 Feb 2019, 01:46

          Performance is one of the first things I turn on with a new build, and has been for the past few years. There are several MAME titles where you can feel the difference between ondemand and performance when gaming. I don't know if it's coincidental, but the more demanding titles seem to really show it, which almost seems contradictory, but maybe another component is bottlenecking them.

          If you open an SSH session and use watch, you can see the CPU frequency constantly yo-yo while playing almost any game. It never stays at the maximum like it does when using the performance setting.

          It takes a minute to turn it on, done! :)
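One way to watch the clock yo-yo, sketched under assumptions: the post does not say exactly what was watched, so the `scaling_cur_freq` path (a standard cpufreq sysfs file that reports kHz) and the helper name `cur_freq_mhz` are my own choices.

```shell
# Print the current CPU clock in MHz. scaling_cur_freq reports kHz.
# The optional argument overrides the sysfs path (handy for a dry run).
cur_freq_mhz() {
    local f="${1:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}"
    local khz
    khz=$(cat "$f") || return 1
    echo "$((khz / 1000)) MHz"
}
```

Over SSH, `watch -n 0.5 cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq` shows the raw kHz value refreshing twice a second while a game runs.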

          • P
            pi2user
            last edited by 9 Feb 2019, 03:14

            Another tool you can use to check up on cpu performance is nmon (nigel's monitor). It is a standard debian package that was originally developed for monitoring enterprise level POWER systems.
            It shows performance on a per-core basis, so you can see how much any individual core is being used. It is interesting to see how load is spread out over all the cores even in a single-core task like retroarch. The system appears to frequently reassign load to different cores, so each core gets a turn at running fully loaded.
            Install and run nmon, then press c for cpu display per-core, and then t for top procs. There's also l for long-term cpu stats but this is overall and not per-core. q quits out of nmon.

            • D
              dankcushions Global Moderator @Brunnis
              last edited by 9 Feb 2019, 07:12

              @Brunnis said in Overclocking the Pi3b+ GPU (Results):

              @dankcushions

              i think the sampling_down_factor might be the one we would tweak:

              • sampling_down_factor:

                This parameter controls the rate at which the kernel makes a decision on when to decrease the frequency while running at top speed. When set to 1 (the default) decisions to reevaluate load are made at the same interval regardless of current clock speed. But when set to greater than 1 (e.g. 100) it acts as a multiplier for the scheduling interval for reevaluating load when the CPU is at its top speed due to high load. This improves performance by reducing the overhead of load evaluation and helping the CPU stay at its top speed when truly busy, rather than shifting back and forth in speed. This tunable has no effect on behavior at lower speeds/lower CPU loads.

              I don't think that will work either. The problem is, again, that the average load is too low. Whether we stretch out the sample period over 1, 2, 10 frames, the average load will be close to the same and far below the required 95% that's needed to stay at the highest speed.

              agree. i think you’d also have to raise that threshold (which is possible)

              Well, if you tweak the "ondemand" governor so that it considers the emulator load to be high enough to not downclock in between frames, then it will perform the same as the "performance" governor. But then there's no point in doing the optimization in the first place, since it won't save you any power consumption over the "performance" governor anyway!

              for those situations, absolutely, but the issue is that using the performance governor puts ALL applications launched via run command at full speed, which includes multithreaded or otherwise low-load applications (kodi, pixel desktop (??), maybe even some emulators like gambatte, etc). i still like the idea of finding a way to make the ondemand governor work for us.

              • B
                Brunnis @dankcushions
                last edited by Brunnis 2 Sept 2019, 14:02 9 Feb 2019, 08:44

                @dankcushions said in Overclocking the Pi3b+ GPU (Results):

                for those situations, absolutely, but the issue is that using the performance governor puts ALL applications launched via run command at full speed, which includes multithreaded or otherwise low-load applications (kodi, pixel desktop (??), maybe even some emulators like gambatte, etc). i still like the idea of finding a way to make the ondemand governor work for us.

                I was still stuck in thinking emulation (for which I’m not convinced ondemand is a great idea). I agree for Kodi and the likes.

                As for trying out an optimization: I’d start out by either testing half as long sampling_rate OR decreasing the up_threshold to something like 50%. It won’t fix all possible performance issues, but there’s a good chance it’s much better than the defaults for the RetroPie use case.
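The two experiments suggested here could be tried with something like the sketch below. It assumes the ondemand tunables live under `/sys/devices/system/cpu/cpufreq/ondemand` and that `sampling_rate` is expressed in microseconds; `ONDEMAND_DIR` and the function names are hypothetical, and the writes need root.

```shell
# Directory holding the ondemand tunables. ONDEMAND_DIR is a hypothetical
# override so the helpers can be tried on a scratch directory first.
ondemand_dir() {
    echo "${ONDEMAND_DIR:-/sys/devices/system/cpu/cpufreq/ondemand}"
}

# Halve whatever sampling_rate is currently set.
halve_sampling_rate() {
    local f rate
    f="$(ondemand_dir)/sampling_rate"
    rate=$(cat "$f") || return 1
    echo "$((rate / 2))" > "$f"
}

# Set up_threshold, the load percentage above which the governor clocks up.
set_up_threshold() {
    echo "$1" > "$(ondemand_dir)/up_threshold"
}
```

Testing one change at a time (`halve_sampling_rate` OR `set_up_threshold 50`) keeps it clear which tunable made the difference. Values written this way do not survive a reboot.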

                • RascasR
                  Rascas
                  last edited by 9 Feb 2019, 16:47

                  Here are the default settings of LibreELEC:
                  https://github.com/LibreELEC/LibreELEC.tv/blob/libreelec-9.0/projects/RPi/initramfs/platform_init
                  I believe they are the same now on Raspbian/RetroPie. The only difference is io_is_busy, which improves performance during heavy reading/writing to the SD card, for example when streaming a torrent. But it's probably not so useful for emulation.

                  • H
                    hhromic @Rascas
                    last edited by 9 Feb 2019, 23:26

                    @Rascas said in Overclocking the Pi3b+ GPU (Results):

                    I believe they are the same now on Raspbian/RetroPie.

                    Can confirm from one of my Raspbian devices:

                    /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy: 0
                    /sys/devices/system/cpu/cpufreq/ondemand/up_threshold: 50
                    /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate: 100000
                    /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor: 50
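hhromic does not say how this listing was produced. A small sketch that prints the tunables in the same `path: value` form might look like this (the function name `show_ondemand` is made up for illustration):

```shell
# Print every ondemand tunable as "path: value".
# The optional argument overrides the directory to read from.
show_ondemand() {
    local dir="${1:-/sys/devices/system/cpu/cpufreq/ondemand}"
    local f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        echo "$f: $(cat "$f")"
    done
}
```

Running `show_ondemand` on another board makes it easy to compare its values against the ones listed above.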
                    
                    • S
                      Silent
                      last edited by 10 Feb 2019, 09:31

                      I really like the snippet with setting individual emulators to performance governor! I'd love to have that set eg. for PSX but I wouldn't necessarily want it for GBC or Kodi. Onstart sounds like a good way to optimally utilize the option!
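The snippet being referred to is not reproduced in this part of the thread. A hypothetical sketch of the idea, assuming RetroPie's `runcommand-onstart.sh` hook receives the system name as its first argument, could start from a helper like this (the function name and the choice of "demanding" systems are illustrative only):

```shell
# Pick a governor from the system name that runcommand passes as $1.
# The list of demanding systems is purely an example.
governor_for_system() {
    case "$1" in
        psx|n64|dreamcast) echo performance ;;
        *)                 echo ondemand ;;
    esac
}
```

Inside the onstart script, the result of `governor_for_system "$1"` would then be written to `scaling_governor` with root rights, so PSX gets the performance governor while GBC or Kodi stay on ondemand.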

                      • quicksilverQ
                        quicksilver
                        last edited by quicksilver 26 Feb 2019, 13:27

                        So after a lot of testing it looks like the two pi3b+'s are stable at 590 MHz and 615 MHz core_freq respectively. Compared to my 3b, which was stable up to 565 MHz core_freq, this is a pretty good jump. My old pi2's were only stable up to 525 and 535 MHz core_freq. They are all supposed to have the same GPU, which means there must be some improvement in the manufacturing process with each generation of pi, or perhaps power regulation is better too. I know that my test sample is fairly small, but there is a clear trend in the 6-7 different pis I have tested: even though the GPU is the same in all models, later models definitely seem to have more GPU overclocking headroom. While this may not mean much to most people, it makes a noticeable difference when trying to run N64 or Dreamcast games that are right on the edge of being playable; in some cases an overclock is just enough to push them into the playable zone.

                        I should also note that while it's often discussed that the CPU is not the bottleneck for N64 emulation on the pi, this is only true of the pi3b and 3b+. Going back and testing my pi2's, there were quite a few N64 games that pushed the CPU past 100% usage, even when overclocked to 1075 MHz.

                        • mituM
                          mitu Global Moderator @quicksilver
                          last edited by 26 Feb 2019, 13:46

                          @quicksilver said in Overclocking the Pi3b+ GPU (Results):

                          I should also note that while it's often discussed that the CPU is not the bottleneck for N64 emulation on the pi, this is only true of the pi3b and 3b+.

                          Well, the PI3B / 3B+ have a slightly different CPU than the Pi2:

                          • PI2 - Broadcom BCM2836 SoC, with quad-core ARM Cortex-A7 900 MHz processor
                          • PI3B/3B+ - Broadcom BCM2837 SoC, with quad-core ARM Cortex-A53 1200 MHz processor
                          • quicksilverQ
                            quicksilver @mitu
                            last edited by 26 Feb 2019, 13:49

                            @mitu correct, just the GPU has remained the same. I just felt it was an important distinction to make, especially for those using older pi devices.

                            • H
                              hhromic @mitu
                              last edited by hhromic 26 Feb 2019, 13:50

                              Also don't forget that the RPI2 has two revisions, one with BCM2836 and another with BCM2837 (underclocked).

                              Ref: https://www.raspberrypi.org/documentation/hardware/raspberrypi/revision-codes/README.md
                              Ref: https://elinux.org/RPi_HardwareHistory

                              • quicksilverQ
                                quicksilver @hhromic
                                last edited by 26 Feb 2019, 13:53

                                @hhromic Interesting! I think I remember hearing that, though I had forgotten about it. Something to the effect that they took the BCM2837s that couldn't handle being clocked at 1200 MHz and used them on the revised pi2's.

                                • B
                                  Brunnis @quicksilver
                                  last edited by 28 Feb 2019, 08:27

                                  @quicksilver Thanks for the results! Out of curiosity, how does the instability manifest itself when you push the core_freq too far? I'm asking because I just changed from using a large heatsink sandwich to a Flirc case and I'm having issues. With the previous heatsink, I had run stability tests for days in total and saw no issues at all. With the Flirc case, I quickly ran into a few issues:

                                  • Crash (black screen) while at Raspbian desktop
                                  • Floating point error while running sysbench (within 30 minutes)
                                  • Tainted kernel (memory paging issue) while idling in RetroPie overnight. EmulationStation still running, but not responding to input.

                                  My current guess is that these issues have mainly cropped up due to a temperature difference. The Flirc case is not as good at cooling the Pi as the heatsink sandwich and this probably pushes the system over the edge. It still doesn't explain the tainted kernel issue at idle, though. Maybe that issue was there before changing case.

                                  I'm going to try three things:

                                  • Increase CPU voltage one step
                                  • Lower core_freq overclock from 600 to 550 MHz
                                  • Increase SDRAM voltage one step
                                  • B
                                    Brunnis
                                    last edited by Brunnis 28 Feb 2019, 10:40

                                    A small FYI (I think we've touched on this before in this thread):

                                    • On my Pi 3B+, only over_voltage=1 has an effect. Increasing over_voltage setting to 2 or more has no effect at all. Maximum allowed CPU voltage is 1.4V and since my Pi already runs at 1.375V, it makes sense that any setting higher than 1 (which adds 0.025V) would have no effect.
                                    • Default SDRAM voltages on my Pi are: sdram_c=1.25V, sdram_i=1.25V, sdram_p=1.225V. The over_voltage_sdram setting assumes 1.20V is default. So, using over_voltage_sdram=1 has no effect on my board (since it's 1.225V, which is the same or lower than the defaults). Using over_voltage_sdram=2 only affects sdram_p (it's raised to 1.25V). Using over_voltage_sdram=3 finally increases voltage on all three SDRAM rails to 1.275V.
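The SDRAM observations above fit a simple rule, sketched here as arithmetic in millivolts. It assumes a 1.20 V baseline with 0.025 V per step, and that a rail is only changed when the requested voltage exceeds its default. Both function names are hypothetical, and the rule is inferred from this one board rather than from firmware documentation.

```shell
# Target SDRAM voltage in millivolts for a given over_voltage_sdram step,
# assuming a 1200 mV baseline and 25 mV per step.
sdram_target_mv() {
    echo "$((1200 + 25 * $1))"
}

# Voltage a rail ends up at: the higher of its default and the target.
# (Assumption: a target at or below the default leaves the rail alone.)
sdram_effective_mv() {
    local default_mv="$1" step="$2" target
    target=$(sdram_target_mv "$step")
    if [ "$target" -gt "$default_mv" ]; then
        echo "$target"
    else
        echo "$default_mv"
    fi
}
```

With the defaults listed above, `sdram_effective_mv 1225 2` gives 1250 (only sdram_p moves at step 2), while `sdram_effective_mv 1250 3` gives 1275 (all rails move at step 3).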

                                    I don't know if the defaults are the same for all Pi 3B+ boards. Check your voltages with:

                                    for id in core sdram_c sdram_i sdram_p ; do echo -e "$id:\t$(vcgencmd measure_volts $id)" ; done
                                    

                                    Please note that you should put a load on the CPU before running this command, otherwise you'll just see the default 1.2V used in downclocked state.

                                    • quicksilverQ
                                      quicksilver @Brunnis
                                      last edited by quicksilver 28 Feb 2019, 12:32

                                      @Brunnis instability from having core freq too high has always shown up for me as a freeze. Quake 3 is running but suddenly locks up, though I can still ssh in and reboot my pi. I should note that my max overclock is only achievable with maximum overvoltage (1.397v, not sure why your pi won't go this high), anything less will freeze very quickly at these speeds. If I overclock v3d frequency too high I get visual artifacts onscreen followed by a freeze.

                                      Twice I encountered an issue where I left my pi idle in EmulationStation and found that it had frozen hours later. One of those times my pi was overheating and my case fan, which is programmed to turn on at 65C, was not on. Not entirely sure what would cause the pi to overheat when idle. But after removing my sdram and CPU overclocks I have not had that issue since. I haven't had time to thoroughly test them yet so I'll just leave them off for now.
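For anyone chasing a similar overheating case, a rough check against the 65C fan threshold could look like the sketch below. It assumes `vcgencmd measure_temp` prints text of the form `temp=48.3'C`; the helper name `temp_at_least` is made up and this is not the fan script mentioned above.

```shell
# Succeeds (exit status 0) when the reported temperature, truncated to
# whole degrees, is at or above the given limit.
# $1: text as printed by `vcgencmd measure_temp`, e.g. temp=48.3'C
# $2: limit in whole degrees C
temp_at_least() {
    local t="${1#temp=}"     # strip the prefix   -> 48.3'C
    local limit="$2"
    t=${t%%.*}               # drop the decimals  -> 48
    t=${t%%"'"*}             # no decimals: drop the 'C suffix
    [ "$t" -ge "$limit" ]
}
```

On the Pi, `temp_at_least "$(vcgencmd measure_temp)" 65 && echo "fan should be on"` can be logged periodically to see whether the temperature really climbs while idle.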

                                      • B
                                        Brunnis @quicksilver
                                        last edited by 28 Feb 2019, 12:53

                                        @quicksilver No, you misunderstood me. My Pi also goes up to 1.39V. However, since default voltage is 1.375V, the only over_voltage setting that does anything is over_voltage=1 (which will give 1.39V). The Pi is not allowed to go over 1.4V, so setting over_voltage to 2 or higher will still result in the same 1.39V as over_voltage=1.

                                        Thanks for sharing your experience. That overheating condition sounds...weird. I have modified a few of my overclock settings and will see if that helps (lowered core_freq from 600 to 550, lowered video related clocks from 400 to 367 and increased SDRAM voltages by 0.025V).

                                        arm_freq=1475
                                        core_freq=550
                                        v3d_freq=367
                                        h264_freq=367
                                        isp_freq=367
                                        sdram_freq=550
                                        over_voltage=1
                                        over_voltage_sdram_c=3
                                        over_voltage_sdram_i=3
                                        over_voltage_sdram_p=2
                                        temp_soft_limit=70
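To confirm that settings like these are actually applied after a reboot, the measured clocks can be compared against the configured ones. A small sketch, assuming `vcgencmd measure_clock <name>` prints a line of the form `frequency(N)=<Hz>`; the helper name `clock_mhz` is hypothetical:

```shell
# Convert the text printed by `vcgencmd measure_clock <name>` to whole MHz.
# $1: a line such as frequency(1)=550000000
clock_mhz() {
    local hz="${1#*=}"       # keep everything after the first "="
    echo "$((hz / 1000000))"
}
```

For example, `clock_mhz "$(vcgencmd measure_clock core)"` should report roughly 550 with the settings above. As with the voltage check, run it while the system is under load, since the clocks drop when the Pi is idle.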

                                        • quicksilverQ
                                          quicksilver @Brunnis
                                          last edited by quicksilver 28 Feb 2019, 14:46

                                          @Brunnis are you sure about the over voltage values? I just did a test and this is what I got on my pi3b+:

                                          No over voltage= 1.344V
                                          Over_voltage 1= 1.369V
                                          Over_voltage 2= 1.388V
                                          Over_voltage 3= 1.394V (not 1.397v like I said earlier)

                                          So for me over_voltage 3 is max setting. Are you sure yours is different?
