    Overclocking the Pi3b+ GPU (Results)

    General Discussion and Gaming
    Tags: pi3 b+, overclock, gpu
    133 Posts 18 Posters 40.8k Views
    • B
      Brunnis
      last edited by Brunnis

      @dankcushions said in Overclocking the Pi3b+ GPU (Results):

      i guess i still don't see a smoking gun with the figures being given, especially when the issue is only apparent using video settings where stutter is a known risk under cpu load situations. however if it's a binary thing to your eyes where the stutter is eliminated once the performance governor is set, i guess that is all that needs to be said.

      Well, in this case the stuttering does not occur because the CPU isn't fast enough, but because the ondemand governor is not able to determine that the CPU should stay at max frequency. That's a pretty big difference in my eyes.

      The figures I posted above show us that the ondemand governor doesn't work as we'd expect and that the resulting performance issue is simply masked by buffering with the default settings. With this testing alone, I can't say for sure that it doesn't affect some marginal games even at default settings. It's certainly possible that only video_max_swapchain_images=2 exposes it. In that case, it would of course be okay to leave the governor at the current default.

      I didn't post this to press for a change of default governor (since BuZz has already said it won't happen). However, I thought the figures were interesting, since the frequency rollercoaster behavior at default settings didn't seem to be common knowledge.
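One way to see the frequency rollercoaster for yourself is to poll the current CPU clock while an emulator is running. The following is a minimal sketch, not the method used for the figures in this thread: the function name `sample_freqs` and its arguments are made up for illustration, and it assumes the standard Linux cpufreq file `scaling_cur_freq` (reported in kHz) is available for cpu0.

```shell
#!/usr/bin/env bash
# Print the cpu0 clock (in MHz) a number of times at a fixed interval.
# Hypothetical helper: name and arguments are illustrative only.
# $1 = number of samples, $2 = seconds between samples,
# $3 = cpufreq directory (defaults to the usual sysfs location)
sample_freqs() {
    local count="$1" interval="$2"
    local dir="${3:-/sys/devices/system/cpu/cpu0/cpufreq}"
    local i=0 khz
    while [ "$i" -lt "$count" ]; do
        read -r khz < "$dir/scaling_cur_freq" || return 1
        echo "$((khz / 1000)) MHz"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example: 20 samples, 0.1 s apart, taken from a second terminal during a game.
# sample_freqs 20 0.1
```

If the output jumps between the minimum and maximum clock while a game is running, the governor is scaling in the middle of emulation.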

      • robertvb83R
        robertvb83 @mitu
        last edited by

        @mitu said in Overclocking the Pi3b+ GPU (Results):

        @robertvb83 said in Overclocking the Pi3b+ GPU (Results):

        updating mame2003-plus almost always freezes or stops with errors..

        What kind of errors? If they're memory-related errors (not enough memory), then you can increase the amount of swap added during compilation to get past them. Do you get the same kind of errors without overclocking?

        I did not have any errors without overclocking!
        this is where compiling ends when overclocked:
        [image: screenshot showing where the compile stops]

        My full size arcade cabinet Robotron vs. Octolyzer

        • dankcushionsD
          dankcushions Global Moderator @Brunnis
          last edited by

          @Brunnis said in Overclocking the Pi3b+ GPU (Results):

          @dankcushions said in Overclocking the Pi3b+ GPU (Results):

          i guess i still don't see a smoking gun with the figures being given, especially when the issue is only apparent using video settings where stutter is a known risk under cpu load situations. however if it's a binary thing to your eyes where the stutter is eliminated once the performance governor is set, i guess that is all that needs to be said.

          Well, in this case the stuttering does not occur because the CPU isn't fast enough, but because the ondemand governor is not able to determine that the CPU should stay at max frequency. That's a pretty big difference in my eyes.

          The figures I posted above show us that the ondemand governor doesn't work as we'd expect and that the resulting performance issue is simply masked by buffering with the default settings.

          forgive me, but i don't think they necessarily show that. they only show that the governor has decided the CPU should be downclocked (to 600) during some tests. we know the emulators in question do not exert a constant load on the cpu (~90% usage). from the earlier information, it looks like the governor should be checking CPU load and making this decision every 0.01 of a second (sampling_rate defaults to 10000 usecs?), so given that fidelity i am now not surprised that you will see it downclocking every so often. it's probably changing the core speed 100s of times a second.

          the performance issue you observe must be caused by this process, agreed, but that stutter is not specifically measured in the above data, if you get what i mean. we need an fps benchmark for that.

          anyway, it seems to me like a good fix might be to increase the sampling_rate to something north of a frame, e.g. over 16667 usec (one frame at 60 Hz). that way, applications that are generally low load will still downclock, but cpu-heavy emulators will stay at full speed. i don't know if that's a good idea, just my initial thought.

          • dankcushionsD
            dankcushions Global Moderator @robertvb83
            last edited by

            @robertvb83 said in Overclocking the Pi3b+ GPU (Results):

            I did not have any errors without overclocking!
            this is where compiling ends when overclocked:

            remember that retropie compiles use 2 of the 4 cores, but emulation mostly uses 1 core, so an overclock that is stable in games can definitely be unstable in compiles.

            • B
              Brunnis @dankcushions
              last edited by

              @dankcushions said in Overclocking the Pi3b+ GPU (Results):

              forgive me, but i don't think they necessarily show that. they only show that the governor has decided the CPU should be downclocked (to 600) during some tests. we know the emulators in question do not exert a constant load on the cpu (~90% usage). from the earlier information, it looks like the governor should be checking CPU load and making this decision every 0.01 of a second (sampling_rate defaults to 10000 usecs?), so given that fidelity i am now not surprised that you will see it downclocking every so often. it's probably changing the core speed 100s of times a second.

              Yes, I think I may have expressed myself a bit unclearly. I agree that the governor probably just behaves according to spec. The "unexpected" part is that it affects the performance in a negative way in certain cases. Ideally, the "ondemand" governor should produce the same (or very close to the same) end result (i.e. performance) as the "performance" governor.

              the performance issue you observe must be caused by this process, agreed, but that stutter is not specifically measured in the above data, if you get what i mean. we need an fps benchmark for that.

              Yeah, we can definitely measure the performance delta, but what would we do with the data? Would it affect the current discussion in any significant way? For an initial discussion on whether the ondemand governor can cope with the load without affecting the result, audio-visual cues are certainly sufficient. The regression in the end result is not exactly subtle.

              anyway, it seems to me like a good fix might be to increase the sampling_rate to something north of a frame, e.g. over 16667 usec (one frame at 60 Hz). that way, applications that are generally low load will still downclock, but cpu-heavy emulators will stay at full speed. i don't know if that's a good idea, just my initial thought.

              They write sample rate in the docs, but I guess they mean period? Increasing the sampling period wouldn't really help. The average load over the period would then often be too low to clock up at all. We'd instead need a really small sample period, so that the downclocked core spins up as fast as possible when the load increases (i.e. the next frame rendering kicks off). The current issue is probably that the default of, say, 10 ms means that once the core is clocked down and the emulator kicks off again, you're spending up to 10 ms rendering the frame at 600 MHz before the governor checks the load and decides to clock back up. Then it's too late and you won't be able to submit the frame on time.
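The timing argument can be made concrete with a little integer arithmetic. This is only a sketch under stated assumptions: a 60 Hz display and the 10 ms (10000 usec) sampling period guessed at earlier in the thread. The function names are invented for the example.

```shell
#!/usr/bin/env bash
# Length of one frame in microseconds for a given refresh rate (integer math).
# $1 = refresh rate in Hz
frame_period_us() {
    echo $((1000000 / $1))
}

# Worst case: the core has just been clocked down when a new frame starts,
# so up to one full sampling period passes before the governor reacts.
# Prints that lag as a percentage of the frame period.
# $1 = sampling period in usec, $2 = refresh rate in Hz
worst_case_lag_pct() {
    local frame_us
    frame_us=$(frame_period_us "$2")
    echo $(($1 * 100 / frame_us))
}

frame_period_us 60            # prints 16666
worst_case_lag_pct 10000 60   # prints 60
```

With these assumed numbers, up to roughly 60% of the frame's time budget can pass at the low clock before the governor raises it again.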

              • quicksilverQ
                quicksilver @hhromic
                last edited by quicksilver

                @hhromic just found this in the official RPI documentation:

                "NOTE: Setting any overclocking parameters to values other than those used by raspi-config may set a permanent bit within the SoC, making it possible to detect that your Pi has been overclocked. The specific circumstances where the overclock bit is set are if force_turbo is set to 1 and any of the over_voltage_* options are set to a value > 0. See the blog post on Turbo Mode for more information."

                So force turbo AND any amount of over voltage applied will set the warranty bit.

                • dankcushionsD
                  dankcushions Global Moderator @Brunnis
                  last edited by

                  @Brunnis said in Overclocking the Pi3b+ GPU (Results):

                  @dankcushions said in Overclocking the Pi3b+ GPU (Results):

                  forgive me, but i don't think they necessarily show that. they only show that the governor has decided the CPU should be downclocked (to 600) during some tests. we know the emulators in question do not exert a constant load on the cpu (~90% usage). from the earlier information, it looks like the governor should be checking CPU load and making this decision every 0.01 of a second (sampling_rate defaults to 10000 usecs?), so given that fidelity i am now not surprised that you will see it downclocking every so often. it's probably changing the core speed 100s of times a second.

                  Yes, I think I may have expressed myself a bit unclearly. I agree that the governor probably just behaves according to spec. The "unexpected" part is that it affects the performance in a negative way in certain cases. Ideally, the "ondemand" governor should produce the same (or very close to the same) end result (i.e. performance) as the "performance" governor.

                  the performance issue you observe must be caused by this process, agreed, but that stutter is not specifically measured in the above data, if you get what i mean. we need an fps benchmark for that.

                  Yeah, we can definitely measure the performance delta, but what would we do with the data? Would it affect the current discussion in any significant way?

                  no, i'm just articulating what i mean when i say that the data presented is not the "smoking gun", but your personal observation of a stutter is.

                  i think the sampling_down_factor might be the one we would tweak:

                  • sampling_down_factor:

                    This parameter controls the rate at which the kernel makes a decision on when to decrease the frequency while running at top speed. When set to 1 (the default) decisions to reevaluate load are made at the same interval regardless of current clock speed. But when set to greater than 1 (e.g. 100) it acts as a multiplier for the scheduling interval for reevaluating load when the CPU is at its top speed due to high load. This improves performance by reducing the overhead of load evaluation and helping the CPU stay at its top speed when truly busy, rather than shifting back and forth in speed. This tunable has no effect on behavior at lower speeds/lower CPU loads.
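For anyone who wants to experiment with this tunable, a sketch follows. It assumes the ondemand tunables are exposed under `/sys/devices/system/cpu/cpufreq/ondemand/`, which is where many kernels put them; on others they live in a per-policy directory, so check your own system first. The helper name `set_ondemand_tunable` and the `SUDO` override are illustrative and not part of RetroPie.

```shell
#!/usr/bin/env bash
# Write one ondemand tunable and print the value read back afterwards.
# $1 = tunable name, $2 = new value,
# $3 = tunables directory (assumed default below; may differ per kernel)
# Set SUDO="" to write without sudo (e.g. when already running as root).
set_ondemand_tunable() {
    local name="$1" value="$2"
    local dir="${3:-/sys/devices/system/cpu/cpufreq/ondemand}"
    if [ ! -e "$dir/$name" ]; then
        echo "no such tunable: $dir/$name" >&2
        return 1
    fi
    echo "$value" | ${SUDO-sudo} tee "$dir/$name" >/dev/null || return 1
    cat "$dir/$name"
}

# Example: hold the top speed for 100 sampling intervals before re-evaluating.
# set_ondemand_tunable sampling_down_factor 100
```

Values written this way do not survive a reboot, which makes it a low-risk way to try different settings.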

                  • H
                    hhromic @quicksilver
                    last edited by

                    @quicksilver said in Overclocking the Pi3b+ GPU (Results):

                    So force turbo or any amount of over voltage applied will set the warranty bit.

                    No, it is force_turbo=1 and over_voltage_* > 0. If you don't use force_turbo, you are fine.

                    The specific circumstances where the overclock bit is set are if force_turbo is set to 1 and any of the over_voltage_* options are set to a value > 0.
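The rule is easy to turn into a quick check of a config file. The sketch below is an illustration only: `sets_warranty_bit` is a made-up helper, it does a plain text scan of uncommented `key=value` lines, and it ignores conditional sections and trailing comments, so treat its answer as a hint rather than a guarantee.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when the file combines force_turbo=1 with any
# over_voltage* option set to a value greater than 0.
# $1 = config file (defaults to /boot/config.txt)
sets_warranty_bit() {
    local cfg="${1:-/boot/config.txt}"
    # Is there an uncommented force_turbo=1 line?
    grep -Eq '^[[:space:]]*force_turbo[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$cfg" || return 1
    # Is any over_voltage, over_voltage_sdram, ... set to a positive value?
    grep -Eq '^[[:space:]]*over_voltage[a-z_]*[[:space:]]*=[[:space:]]*0*[1-9][0-9]*[[:space:]]*$' "$cfg"
}

# Example:
# sets_warranty_bit /boot/config.txt && echo "this combination sets the bit"
```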

                    • quicksilverQ
                      quicksilver @hhromic
                      last edited by

                      @hhromic Ah thank you for the clarification! I completely missed the "AND".

                      • H
                        hhromic
                        last edited by

                        Interesting discussions and investigations guys!
                        I still advocate leaving ondemand as the system default and educating users on how overclocking works and how to use the governor runcommand option, as it's the soundest/safest approach. This topic should definitely be used to update/populate the Wiki entry on the topic.

                        The only improvement I would consider in this subject would be to implement a per-command governor setting in runcommand, similar to how video modes are set currently. For example create a governors.cfg file alongside videomodes.cfg.

                        This would give the flexibility for the governor to be configured per-emulator as necessary, e.g. performance for lr-mupen64plus and default for lr-gambatte, or any other customisation.

                        What do you think, @buzz? It should be fairly easy to code, taking the videomode functionality as a template.

                        • BuZzB
                          BuZz administrators @hhromic
                          last edited by BuZz

                          @hhromic The interface is busy enough as it is. I don't think this warrants that level of configuration. So no thanks.

                          To help us help you - please make sure you read the sticky topics before posting - https://retropie.org.uk/forum/topic/3/read-this-first

                          • H
                            hhromic @BuZz
                            last edited by

                            @BuZz umm I could have sworn there was already a menu entry for the cpu governor in runcommand, but you are right, it is only read from the global options and configured externally in the runcommand scriptmodule. I agree that adding this to the menu would add two more entries to the already crowded interface.

                            The idea was more for these advanced tinker users (like in this topic!), so if you reconsider it in the future, perhaps we can add the functionality without exposing any menu items, i.e. requiring editing the governors config file manually.

                            • BuZzB
                              BuZz administrators @hhromic
                              last edited by

                              @hhromic advanced users can do this via an onstart/onend script if they want.

                              • H
                                hhromic @BuZz
                                last edited by

                                @BuZz you mean adding something like this (and the corresponding reverting snippet in onend):

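                                #!/usr/bin/env bash

                                # runcommand passes the system as $1 and the emulator as $2
                                system="$1"
                                emulator="$2"

                                # Switch every core to the performance governor, for lr-mupen64plus only
                                if [[ "$emulator" == "lr-mupen64plus" ]]; then
                                    for cpu in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
                                        echo performance | sudo tee "$cpu" >/dev/null
                                    done
                                fi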
                                

                                Instead of adding something like this to a governors.cfg file?

                                lr-mupen64plus = performance
                                

                                :)
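For illustration, here is roughly what reading such a file could look like. Note that `governors.cfg` does not exist in RetroPie; it is only the proposal from this post, and the `name = governor` format, the function name and the path in the usage line are all assumptions.

```shell
#!/usr/bin/env bash
# Look up the governor configured for an emulator in a "name = governor" file.
# Prints the governor and succeeds, or prints nothing and fails if not listed.
# $1 = emulator name, $2 = path to the (hypothetical) governors.cfg
governor_for() {
    local emulator="$1" cfg="$2"
    local key value
    [ -r "$cfg" ] || return 1
    # The extra test keeps a final line without a trailing newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Trim surrounding whitespace from both fields.
        key=$(printf '%s\n' "$key" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        value=$(printf '%s\n' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        if [ "$key" = "$emulator" ] && [ -n "$value" ]; then
            echo "$value"
            return 0
        fi
    done < "$cfg"
    return 1
}

# Example (the path is an assumption):
# governor_for lr-mupen64plus /opt/retropie/configs/all/governors.cfg
```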

                                • BuZzB
                                  BuZz administrators
                                  last edited by BuZz

                                  Yes. If you're going to ignore the work involved putting it into runcommand and future maintenance of the code also.

                                  But putting your sarcasm to one side - You can simplify that script on the RPI by just using cpu0 and skipping the loop.

                                  no need for a corresponding reverting snippet either - can just be one line to restore to ondemand.
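Taken together, those two simplifications might look like the sketch below. It is one reading of the suggestion, not code from RetroPie: the helper names and the `GOV_FILE` and `SUDO` overrides are made up, and it relies on the point that writing cpu0's `scaling_governor` is enough on the Pi. As in the script quoted above, the emulator name arrives as `$2`.

```shell
#!/usr/bin/env bash
# Sketch of simplified onstart/onend logic (names are illustrative).
GOV_FILE="${GOV_FILE:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor}"

# Decide which governor an emulator should run with.
pick_governor() {
    case "$1" in
        lr-mupen64plus) echo performance ;;
        *)              echo ondemand ;;
    esac
}

# Write the governor to cpu0 only; no loop over the cores.
# Set SUDO="" to write without sudo.
apply_governor() {
    echo "$1" | ${SUDO-sudo} tee "$GOV_FILE" >/dev/null
}

# In runcommand-onstart.sh:
#   apply_governor "$(pick_governor "$2")"
# In runcommand-onend.sh, the one-line restore:
#   apply_governor ondemand
```

Restoring a fixed `ondemand` assumes that was the governor in use before the emulator started.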

                                  • H
                                    hhromic @BuZz
                                    last edited by hhromic

                                    @BuZz sorry I didn't mean to be rude, I'm genuinely being friendly here. I realise being sarcastic wasn't a good move. Apologies.

                                    Of course I'm not ignoring the work needed to code this functionality, and I was going to volunteer to do it and test it myself if you felt it was a worthwhile addition to the system. I understand your safety/maintainability concerns very well and respect your wishes as the project leader. If you don't think it is worth it, no hard feelings and all good :thumbsup:.

                                    no need for a corresponding reverting snippet either - can just be one line to restore to ondemand.

                                    I was just referring to the nice approach in runcommand, where it saves the current governor and restores it on exit :)

                                    Actually runcommand has all the functionality built-in to set/unset the governor already and is robust, that's why I liked the idea of implementing it in there instead of onstart/onend scripts.
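The save-and-restore idea can be sketched in a few lines. This is not runcommand's actual code, just an illustration of the pattern; the function names, the state-file location and the `SUDO` override are assumptions.

```shell
#!/usr/bin/env bash
GOV_FILE="${GOV_FILE:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor}"
STATE_FILE="${STATE_FILE:-/dev/shm/saved_governor}"   # assumed scratch location

# Remember the current governor, then switch to the requested one.
# $1 = governor to switch to
save_and_set_governor() {
    cat "$GOV_FILE" > "$STATE_FILE" || return 1
    echo "$1" | ${SUDO-sudo} tee "$GOV_FILE" >/dev/null
}

# Put back whatever governor was active before, if one was saved.
restore_governor() {
    [ -s "$STATE_FILE" ] || return 0
    ${SUDO-sudo} tee "$GOV_FILE" < "$STATE_FILE" >/dev/null || return 1
    rm -f "$STATE_FILE"
}

# On emulator start:  save_and_set_governor performance
# On emulator exit:   restore_governor
```

Unlike a hard-coded restore, this puts back whatever governor the user had selected, even if it was not `ondemand`.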

                                    • BuZzB
                                      BuZz administrators @hhromic
                                      last edited by

                                      @hhromic no worries. the functionality in runcommand is technically overkill on the RPI as the cores are not independently controllable (hence why using cpu0 is enough).

                                      • ParabolaralusP
                                        Parabolaralus @Brunnis
                                        last edited by

                                        @Brunnis Thank you for taking the time to research and post this data!

                                        • RionR
                                          Rion
                                          last edited by

                                          @Brunnis I have also noticed the slowdowns happening in certain games using ondemand CPU governor.

                                          I never bothered with Overclocking but just changed to performance instead.

                                          But it would be interesting to see if there is any way to optimize the ondemand cpu governor.

                                          FBNeo rom filtering
                                          Mame2003 Arcade Bezels
                                          Fba Arcade Bezels
                                          Fba NeoGeo Bezels

                                          • B
                                            Brunnis @dankcushions
                                            last edited by Brunnis

                                            @dankcushions

                                            no, i'm just articulating what i mean when i say that the data presented is not the "smoking gun", but your personal observation of a stutter is.

                                            Fair enough.

                                            i think the sampling_down_factor might be the one we would tweak:

                                            • sampling_down_factor:

                                              This parameter controls the rate at which the kernel makes a decision on when to decrease the frequency while running at top speed. When set to 1 (the default) decisions to reevaluate load are made at the same interval regardless of current clock speed. But when set to greater than 1 (e.g. 100) it acts as a multiplier for the scheduling interval for reevaluating load when the CPU is at its top speed due to high load. This improves performance by reducing the overhead of load evaluation and helping the CPU stay at its top speed when truly busy, rather than shifting back and forth in speed. This tunable has no effect on behavior at lower speeds/lower CPU loads.

                                            I don't think that will work either. The problem is, again, that the average load is too low. Whether we stretch out the sample period over 1, 2, 10 frames, the average load will be close to the same and far below the required 95% that's needed to stay at the highest speed.

                                            The way I see it, rapid highly periodic loads like these are hard to handle. The same issue occurs when running RetroArch on Windows 10 machines with modern Core processors, so it's not isolated to the Raspberry Pi. The only possible solutions I've been able to come up with so far are to:

                                            1. Decrease the sample period, so that reactions to load changes can be carried out faster. If the default sample period is really 10 ms, that means more than half the execution time of a frame can be spent at the lower frequency before the CPU is instructed to increase clocks. The sample period would need to be drastically reduced in order to minimize the time spent downclocked after beginning the actual timing-critical work.

                                            2. Use the "performance" governor. This completely eliminates the inefficiency of needing to sample CPU load before reacting.
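The averaging argument can be checked with simple arithmetic. In this sketch the 12 ms of busy time per frame is an invented figure, chosen only to illustrate a periodic load; the 95% threshold is the one mentioned above, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Average CPU load (percent, integer math) over a window of whole frames,
# for a load that is busy for a fixed time in every frame.
# $1 = busy time per frame in usec, $2 = frame period in usec,
# $3 = number of frames in the sampling window
avg_load_pct() {
    local busy_us="$1" frame_us="$2" frames="$3"
    echo $((busy_us * frames * 100 / (frame_us * frames)))
}

avg_load_pct 12000 16666 1    # prints 72
avg_load_pct 12000 16666 10   # prints 72
```

The average stays at 72% however many frames the window spans, well short of 95%, so stretching the window does not by itself keep the clock up.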

                                            That's it for me on the topic. I'm fine with using the run command settings to control this, like I always have before. Sometimes it's just fun to try to understand the mechanics behind a behavior. :-)

                                            @Rion said in Overclocking the Pi3b+ GPU (Results):

                                            I have also noticed the slowdowns happening in certain games using ondemand CPU governor.

                                            Even without using video_max_swapchain_images=2?

                                            @Rion said in Overclocking the Pi3b+ GPU (Results):

                                            I never bothered with Overclocking but just changed to performance instead.

                                            Yeah, that's the correct approach. Starting out with overclocking would be bad, since you're then just working against a mechanism that's now even more prone to try to lower the frequency. So, first change the governor, then overclock if performance still isn't good enough. :-)

                                            @Rion said in Overclocking the Pi3b+ GPU (Results):

                                            But it would be interesting to see if there is any way to optimize the ondemand cpu governor.

                                            I think the nature of the load makes it hard. It will never be as performant as simply using the "performance" governor. Well, if you tweak the "ondemand" governor so that it considers the emulator load to be high enough to not downclock in between frames, then it will perform the same as the "performance" governor. But then there's no point in doing the optimization in the first place, since it won't save you any power over the "performance" governor anyway!
