An input lag investigation
-
Note: Don't read this post (but feel free to discuss the subject in this thread). Instead, read the newer version of this post along with lots of further tests/updates in my thread over at the LibRetro forums: https://forums.libretro.com/t/an-input-lag-investigation/4407
Hi everyone!
I've previously posted some tests of input lag on RetroPie in a thread in the old forum (link). Since then, I've performed quite a lot of tests trying to quantify the input lag in emulation and find out where it comes from. I've also been quiet for a while, for several reasons. First of all, I simply haven't had much time to spare (and this testing is pretty time consuming). I've also spent some time modifying a USB SNES replica with an LED, to get more accurate readings. Additionally, I decided to postpone any further testing until I got my new phone, an iPhone SE, which is capable of 240 FPS recording. Recording at 240 FPS, combined with the LED-rigged controller, allows for much quicker and more accurate measurements than before.
So, having improved the test rig, I decided to re-run some of my previous tests (and perform some additional ones). The aim of this post is to:
- Look at what sort of input lag can be expected from RetroPie, and whether it differs between the OpenGL and Dispmanx video drivers.
- Look at what sort of input lag can be expected from RetroArch in Windows 10.
- Isolate the emulator/game part of the input lag from the rest of the system, in order to see how much lag the emulators actually contribute and whether there are any meaningful differences between them.
Let's begin! :-D
Test systems and equipment
Windows PC:
- Core i7-6700K @ stock frequencies
- Radeon R9 390 8GB (Radeon Software 16.5.2.1, default driver settings except OpenGL triple buffering enabled)
- Windows 10 64-bit
- RetroArch 1.3.4
- lr-nestopia v1.48-WIP
- lr-snes9x-next v1.52.4
- bsnes-mercury-balanced v094
RetroPie:
- Raspberry Pi 3
- RetroPie 3.8.1
- lr-nestopia (need to check version)
- lr-snes9x-next (need to check version)
Monitor (used for all tests): HP Z24i LCD monitor with 1920x1200 resolution, connected to the systems via the DVI port. This monitor supposedly has almost no input lag (~1 ms), but I've only seen one test and I haven't been able to verify this myself. All tests were run at native 1920x1200 resolution.
Gamepad (used for all tests): Modified RetroLink USB SNES replica. I have modified the controller with a red LED connected to the B button. As soon as the B button is pressed, the LED lights up. Below is a photo of the modified controller:
Recording equipment: iPhone SE recording at 240 FPS and 1280x720 resolution.
Configuration
RetroArch on Windows PC:
Default settings except:
Use Fullscreen Mode: On
Windowed Fullscreen Mode: Off
HW Bilinear Filtering: Off
Hard GPU Sync: On
RetroPie:
Default settings, except video_threaded=false.
The video_driver config value was changed between "gl" and "dispmanx" to test the different video drivers.
Note: Just to be clear, vsync was ALWAYS enabled for all tests on both systems.
Test procedure and test cases
The input lag tests were carried out by recording the screen and the LED (connected to the controller's B button), while repeatedly pressing the button to make the character jump. The LED lights up as soon as the button is pressed, but the character on screen doesn't jump until some time after the LED lights up. The time between the LED lighting up and the character on the screen jumping is the input lag. By recording repeated attempts and then analyzing the recordings frame-by-frame, we can get a good understanding of the input lag. Even if the HP Z24i display used during these tests were to add a meaningful and measurable delay (which we can't know for sure), the results should still be reliable for doing relative comparisons.
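For reference, here is a minimal sketch (using hypothetical frame numbers) of the arithmetic that turns the 240 FPS recordings into lag figures:

```python
# Convert a frame-by-frame video measurement into input lag figures.
# Assumes a 240 FPS recording and a 60 Hz display; the frame numbers
# below are hypothetical examples, not actual measurements.

CAMERA_FPS = 240
DISPLAY_HZ = 60

led_on_frame = 1312    # video frame where the controller LED lights up
jump_frame = 1335      # video frame where the character starts to jump

lag_camera_frames = jump_frame - led_on_frame
lag_ms = lag_camera_frames * 1000 / CAMERA_FPS
lag_display_frames = lag_ms * DISPLAY_HZ / 1000

print(f"{lag_camera_frames} camera frames = {lag_ms:.1f} ms "
      f"= {lag_display_frames:.2f} display frames")
# 23 camera frames -> 95.8 ms -> ~5.75 display frames of input lag
```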
I used Mega Man 2, Super Mario World 2: Yoshi's Island and the SNES 240p test suite (v1.02). All NTSC. For each test case, 35 button presses were recorded.
The test cases were as follows:
Test case 1: RetroPie, lr-nestopia, Mega Man 2, OpenGL
Test case 2: RetroPie, lr-nestopia, Mega Man 2, Dispmanx
Test case 3: RetroPie, lr-snes9x-next, Super Mario World 2: Yoshi's Island, OpenGL
Test case 4: RetroPie, lr-snes9x-next, Super Mario World 2: Yoshi's Island, Dispmanx
Test case 5: RetroPie, lr-snes9x-next, 240p test suite, Dispmanx
Test case 6: Windows 10, lr-nestopia, Mega Man 2, OpenGL
Test case 7: Windows 10, lr-snes9x-next, Super Mario World 2: Yoshi's Island, OpenGL
Test case 8: Windows 10, bsnes-mercury-balanced, Super Mario World 2: Yoshi's Island, OpenGL
Test case 9: Windows 10, bsnes-mercury-balanced, 240p test suite, OpenGL
Mega Man 2 was tested at the beginning of Heat Man's stage:
Super Mario World 2: Yoshi's Island was tested at the beginning of stage 1-2:
The 240p test was run in the "Manual Lag Test" screen. I didn't care about the actual lag test results within the program; rather, I wanted to record the time from the controller's LED lighting up until the result appeared on the screen, i.e. the same procedure as when testing the games.
Below is an example of a few video frames from one of the tests (Mega Man 2 on RetroPie with OpenGL driver), showing what the recorded material looked like.
Test results
Note: The 240p test results below have all had 0.5 frames added to them to make them directly comparable to the NES/SNES game results above. Since the character in the NES/SNES game tests is located at the bottom part of the screen while the output from the 240p test is located at the upper part, the additional display scanning delay would otherwise make the 240p results look artificially good.
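As a sketch of that correction (the screen positions here are illustrative assumptions, not measured values):

```python
# Why the 240p test results get +0.5 frames (hypothetical screen positions).
FRAME_MS = 1000 / 60

game_pos = 0.8   # character sits roughly 80% of the way down the screen
test_pos = 0.3   # 240p test output sits roughly 30% of the way down

adjust_ms = (game_pos - test_pos) * FRAME_MS
print(f"extra scanning delay: {adjust_ms:.1f} ms = "
      f"{adjust_ms / FRAME_MS:.1f} frames")
# ~8.3 ms, i.e. the 0.5 frame correction applied to the 240p results
```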
Comments
- Input lag on RetroPie is pretty high with the OpenGL driver, at 6 frames for NES and 7-8 frames for SNES.
- The Dispmanx driver is 1 frame faster than the OpenGL driver on the Raspberry Pi, resulting in 5 frames for NES and 6-7 frames for SNES.
- SNES input lag differs by 1 frame between Yoshi's Island and the 240p test suite. Could this be caused by differences in how the games' code polls and responds to input? Snes9x and bsnes are equally affected by this.
- Input lag in Windows 10, with hard GPU sync enabled in RetroArch, is very low: 4 frames for NES and 5-6 frames for SNES.
- The Windows 10 results indicate that there are no differences in input lag between Snes9x and bsnes.
How much of the input lag is caused by the emulators?
One interesting question remains: how many frames does it take from the point where the emulator receives the input until it has produced a finished frame? This is crucial information in a thread like this, since it tells us how much of the input lag is actually caused by the emulation and how much is caused by the host system.
I have googled endlessly trying to find an answer to this, and I also created a thread on the forum of byuu (the author of bsnes/higan), but no one could provide any good answers.
Instead, I realized that this is incredibly easy to test with RetroArch. RetroArch has the ability to pause a core and advance it frame by frame. Using this feature, I made the following test:
- Pause emulation (press 'p' on the keyboard).
- Press and hold the jump button on the controller.
- Advance emulation frame by frame (press 'k' on the keyboard) until the character jumps.
This simple test seems to work exactly as I had imagined. By not running the emulator in real-time, we remove the input polling time, the waiting for the emulator to kick off the frame rendering, and the frame output. We have isolated the emulator's part of the input lag equation! Without further ado, here are the results (taken from RetroArch on Windows 10, but the host system should be irrelevant):
As can be seen, a minimum of two frames is needed to produce the resulting frame. As expected, SNES emulation is slower than NES emulation and needs 1-2 frames of extra time. This matches exactly the results measured and presented earlier in this post.
From what I can tell, having taken a quick look at the source code for RetroArch, Nestopia and Snes9x, the input is polled after pressing the button to advance the frame, but before running the emulator's main loop. The input should therefore already be available to the emulator when advancing to the first frame. So why is neither emulator able to show the result of the input in the first rendered frame? My guess is that it depends on when the game itself actually reads the input. The input is polled from the host system before the emulator's main loop is kicked off, and some time during the main loop the game reads that input. If the game reads it late, it might not actually act on the input until it renders the next frame. The result would be two frames of lag, as in the case of Nestopia. If this hypothesis is correct, another game running on Nestopia might actually be able to pull off one frame of lag.
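To make the hypothesis concrete, here's a toy model (not actual emulator or RetroArch code) of a game that reads the controller only after this frame's action has already been decided:

```python
# Toy model (NOT actual emulator code) of a game that reads the controller
# late in the frame, after this frame's on-screen action is already decided.

class ToyGame:
    def __init__(self):
        self.pending_action = "idle"   # decided during the previous frame

    def run_frame(self, button_held):
        drawn = self.pending_action                    # show last frame's decision
        self.pending_action = "jump" if button_held else "idle"  # late input read
        return drawn

game = ToyGame()
button_held = False
for frame in range(4):
    if frame == 1:
        button_held = True             # press and hold the button before frame 1
    print(f"frame {frame}: shows '{game.run_frame(button_held)}'")
# The jump first appears in frame 2: two frame advances after the press,
# just like the Nestopia result above.
```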
For the SNES, there's another frame or two of lag being introduced and I can't really explain where it's coming from. My guess right now is that this may actually be a system characteristic which would show up on a real system as well. Maybe an SNES emulator developer can chime in on this?
Final conclusions
So, after all of these tests, how can we summarize the situation? Regarding the emulators, the best we can hope for is probably a 1 frame improvement compared to what we have now. However, I don't think the developers have much motivation to make the necessary changes (and it might get complex), so I'm not holding my breath on that one. In terms of NES and SNES emulation, that leaves us with 2 and 3-4 frames, respectively, as the baseline.
On top of that delay, we'll get 4 ms on average (half the polling interval) for USB polling and 8.33 ms on average (half the frame interval at 60 FPS) between having polled the input and kicking off the emulator. These delays cannot be avoided.
When the frame has been rendered it needs to be output to the display. This is where improvements seem to be possible for the Raspberry Pi/RetroPie combo. Dispmanx already improves by one frame over OpenGL, and with Dispmanx it's as fast as the PC (the one used in this test) running RetroArch under Ubuntu in KMS mode (I've tested this previously). However, since Windows 10 on the same PC is even faster, I suspect that there's still room for improvements in RetroPie/Linux. Again, though, I'm not holding my breath for any improvements here. As it is right now, Windows 10 seems to be able to start scanning out a new frame less than one frame interval after it has been generated. RetroPie (Dispmanx)/Ubuntu (KMS) seems to need one additional frame.
The final piece of the puzzle is the actual display. Even if the display has virtually no signal delay (i.e. time between receiving the signal at the input and starting to change the actual pixels), we still have to deal with the scanning delay and the pixel response time. The screen is updated line by line from the top left to the bottom right, so the average time until we see what we want to see is half a frame interval, i.e. 8.33 ms. A rough budget putting all of these pieces together is sketched below.
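Here's that budget for the Windows 10/NES case (all values are averages or assumptions from the discussion above; the scanout figure in particular is an assumption based on the observed behavior):

```python
# Rough lag budget reproducing the ~4 frame NES figure measured on Windows 10.
# All values are averages/assumptions from the discussion above.

FRAME_MS = 1000 / 60   # 16.67 ms per frame at 60 Hz

budget_ms = {
    "USB polling (avg, half of 8 ms interval)": 4.0,
    "wait for emulator frame start (avg)": FRAME_MS / 2,
    "emulator + game (NES, 2 frames)": 2 * FRAME_MS,
    "scanout starts at next vsync (up to 1 frame)": FRAME_MS,
    "display scanning to character (avg)": FRAME_MS / 2,
}

total = sum(budget_ms.values())
for step, ms in budget_ms.items():
    print(f"{step:46s} {ms:5.1f} ms")
print(f"{'total':46s} {total:5.1f} ms = {total / FRAME_MS:.1f} frames")
# ~70 ms, i.e. a little over 4 frames - in line with the measured result.
```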
Bottom line: Getting 4 frames of input lag with NES and 5-6 with SNES, like we do on Windows 10, is actually quite good and not trivial to improve. RetroPie is only 1 frame slower, but if we could make it match RetroArch on Windows 10 that would be awesome. Anyone up for the challenge? ;-)
Questions for the developers
- Do we know why Dispmanx is one frame faster than OpenGL on the Raspberry Pi?
- Is there any way of disabling the aggressive smoothing/bilinear filtering when using Dispmanx?
- Even when using Dispmanx, the Raspberry Pi is slower than Windows 10 by 1 frame. The same seems to hold true for Linux in KMS mode compared to Windows 10. Maybe it's a Linux thing, or maybe it's a RetroArch thing when running under Linux. I don't expect anyone to actually do this, but it would be interesting if an experienced Linux and/or RetroArch dev were to look into this.
- It would also be nice if any emulator devs could comment on the input handling and whether that's to blame for an extra frame of input lag, as we can now suspect. It would also be interesting to know if the input polling could be delayed until the game actually requests it. I believe byuu made a reference to this in the thread I posted on his forum, by saying: "HOWEVER, the SNES gets a bit shafted here. The SNES can run with or without overscan. And you can even toggle this setting mid-frame. So I have to output the frame at V=241,H=0. Whereas input is polled at V=225,H=256+. One option would be to break out of the emulation loop for an input poll event as well. But we'd complicate the design of all cores just for the sake of potentially 16ms that most humans couldn't possibly perceive (I know, I know, "but it's cumulative!" and such. I'm not entirely unsympathetic.)"
Next steps
I will try to perform a similar lag test using actual NES and SNES hardware. I will try to use a CRT for this test, and I will not have LED-equipped controllers (at least not initially). However, this should be good enough to determine whether there's any meaningful difference in input lag between the two consoles. I don't have a CRT or an SNES right now, so I would need to borrow those. If anyone has easy access to the needed equipment/games (and preferably a 120 or 240 FPS camera), I'd be grateful if you'd help out and perform these tests.
In the meantime, I found this: https://docs.google.com/spreadsheets/d/1L4AUY7q2OSztes77yBsgHKi_Mt4fIqmMR8okfSJHbM4/pub?hl=en&single=true&gid=0&output=html
As you can see, the SNES (NTSC) with Super Mario World on a CRT resulted in 3 frames of input lag. That corresponds very well with the theories and results presented in this post.
Note regarding the HP Z24i display (can be skipped)
I measured a minimum of 3.25 frames of input lag for Nestopia running under Windows 10. Out of those 3.25 frames, 2 come from Nestopia itself. That leaves us with 1.25 frames for controller input and display output. Given that this was the quickest result out of 35 tests, it's pretty safe to assume that it's the result of a nicely aligned USB polling event and emulator frame rendering start, resulting in a minuscule input delay. Let's assume it's 0 ms (it's not, but it's probably close enough). That leaves us with the display output. The only thing we can be sure of is the scanning delay, caused by the display updating the image from top left to bottom right, line by line. Given the position of the character during the Mega Man 2 test, the scanning accounts for approximately 11.9 ms, or 0.7 frames. We're left with 0.5 frames. This could come from measurement errors/uncertainties, frame delay/buffering in the display hardware, other small delays in the system, liquid crystal response time and, of course, some signal delay in the screen. Either way, it more or less proves that this HP display has very low input lag (less than one frame). The arithmetic is summarized below.
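```python
# Breakdown of the fastest measured Windows 10/Nestopia result (3.25 frames).
FRAME_MS = 1000 / 60

total_ms = 3.25 * FRAME_MS   # quickest of the 35 samples, ~54.2 ms
emulator_ms = 2 * FRAME_MS   # Nestopia's own two frames
scanning_ms = 11.9           # scanning down to the character's position

unexplained = total_ms - emulator_ms - scanning_ms
print(f"unexplained: {unexplained:.1f} ms = {unexplained / FRAME_MS:.2f} frames")
# ~8.9 ms (~0.5 frames) left for response time, buffering and measurement error.
```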
-
this is fantastic! :) great job!
Do we know why Dispmanx is one frame faster than OpenGL on the Raspberry Pi?
vanfanel (the person who wrote the retroarch dispmanx driver) is active on the libretro forums so hopefully he can answer
Is there any way of disabling the aggressive smoothing/bilinear filtering when using Dispmanx?
this has literally just been fixed! if you recompile retroarch from source you will get no filtering (unless video_smooth is turned on, of course)
Even when using Dispmanx, the Raspberry Pi is slower than Windows 10 by 1 frame. The same seems to hold true for Linux in KMS compared to Windows 10. Maybe it's a Linux thing or maybe it's a RetroArch thing when running under Linux. I don't expect anyone to actually do this, but it would be interesting if an experienced Linux and/or RetroArch dev were to look into this.
i have no experience in the KMS stuff, but according to vanfanel there is KMS/DRM support coming to raspbian - see http://libretro.com/forums/showthread.php?t=4648 (some general dispmanx info there too). this sounds like a good thing for retropie in general.
It would also be nice if any emulator devs could comment regarding the input handling and if that's to blame for an extra frame of input lag, like we can now suspect. Also, would be interesting to know if the input polling could be delayed until the game actually requests it.
there's some options in retroarch that might be relevant here:
- input driver: it's udev by default, but what about linuxraw? udev is via the device manager... possibly slower than linuxraw?
- poll type behaviour: there's a setting in the RGUI input menu called "poll type behavior" that may help when set to "late" vs "early" or "normal" - what does this do re: input lag? i think 'late' is what you want, but who knows.
-
@dankcushions said in An input lag investigation:
this is fantastic! :) great job!
Thanks a lot! Hopefully it will get things going regarding this tired old subject that never seems to go anywhere. :P
Do we know why Dispmanx is one frame faster than OpenGL on the Raspberry Pi?
vanfanel (the person who wrote the retroarch dispmanx driver) is active on the libretro forums so hopefully he can answer
Yep, he has responded to me over there. I've yet to respond to him, though... Too much to do and too little time.
Is there any way of disabling the aggressive smoothing/bilinear filtering when using Dispmanx?
this has literally just been fixed! if you recompile retroarch from source you will get no filtering (unless video_smooth is turned on, of course)
Yep, vanfanel told me and I tested it right away. Worked like a charm! Nice to actually be able to use dispmanx.
Even when using Dispmanx, the Raspberry Pi is slower than Windows 10 by 1 frame. The same seems to hold true for Linux in KMS compared to Windows 10. Maybe it's a Linux thing or maybe it's a RetroArch thing when running under Linux. I don't expect anyone to actually do this, but it would be interesting if an experienced Linux and/or RetroArch dev were to look into this.
i have no experience in the KMS stuff, but according to vanfanel there is KMS/DRM support coming to raspbian - see http://libretro.com/forums/showthread.php?t=4648 (some general dispmanx info there too). this sounds like a good thing for retropie in general.
Yep, I've been following that as well. Hoping for any improvement, but given the fact that Linux/KMS on a PC couldn't outperform Win 10, I'm not holding my breath.
It would also be nice if any emulator devs could comment regarding the input handling and if that's to blame for an extra frame of input lag, like we can now suspect. Also, would be interesting to know if the input polling could be delayed until the game actually requests it.
there's some options in retroarch that might be relevant here:
- input driver: it's udev by default, but what about linuxraw? udev is via the device manager... possibly slower than linuxraw?
I believe I did a quick and dirty test of linuxraw a while ago. Granted, it was with a 60 FPS camera, but I couldn't detect any difference between that and the default driver. More investigation work might be needed, but I think there might be other avenues to explore first.
- poll type behaviour: there's a setting in the RGUI input menu called "poll type behavior" that may help when set to "late" vs "early" or "normal" - what does this do re: input lag? i think 'late' is what you want, but who knows.
Yeah, it's set to "late" by default. The way the nestopia and snes9x-next cores are currently implemented, I don't think this setting makes a difference. They both poll the input right before kicking off the emulator's main loop anyway. Even if they were designed to poll right when they actually read the input (which seems to be once per frame), this wouldn't help much in many cases anyway. The emulator renders a frame as fast as possible and it usually takes a few milliseconds at most (although certain games can take the majority of a frame interval to render on a Pi). If the frame rendering is fast, reading the input at the beginning or at the end will make little difference.
However, I have found one thing that does make a difference: at least in snes9x-next, the emulator's main loop begins by rendering what was calculated in the previous call to the main loop. Having finished rendering that frame, it continues by reading the input and calculating the next frame. It then exits the main loop, and the new frame isn't rendered until the main loop is executed once again. This causes the emulator to have two frames of lag even in the ideal case.
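Here's a schematic model of that ordering difference (Python pseudocode, not the actual C source):

```python
# Schematic model of the snes9x-next main loop ordering (NOT the actual C code).

def emulate(button_held):
    return "jump frame" if button_held else "idle frame"

def original_loop(presses):
    """Render the PREVIOUS call's result first, then poll input and compute."""
    shown, computed = [], "idle frame"
    for button_held in presses:
        shown.append(computed)           # draw what the last call computed
        computed = emulate(button_held)  # compute now, but show it next call
    return shown

def fixed_loop(presses):
    """Poll input, compute the frame, and render it within the same call."""
    return [emulate(button_held) for button_held in presses]

presses = [False, True, True]    # button held from the second frame onwards
print(original_loop(presses))    # ['idle frame', 'idle frame', 'jump frame']
print(fixed_loop(presses))       # ['idle frame', 'jump frame', 'jump frame']
# The fix makes the response visible one frame earlier.
```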
See this post here for the whole story (and the fix): http://libretro.com/forums/showthread.php?t=5428&p=41376&viewfull=1#post41376
-
Interesting reading and good job.
Some comments:
I think you'll need to measure the switch debouncing time of your game controller, and you'll probably want to exclude it from your data. It has nothing to do with the performance of RetroArch and its underlying drivers with respect to the handling of user input, and I think this timing might contribute a huge percentage of the total delay! Okay, let me explain why I think so - a wee bit of technical detail and some behind-the-scenes of what happens when you press a button on your controller.
When you press a button on the game controller, the MCU (a micro-controller - a kind of small computer that resides in your game controller) will detect the press almost right away. However, due to the nature of mechanical switches, which tend to bounce in an unpredictable manner for a short period of time, the MCU will not immediately acknowledge the press and send a report across the USB bus. Instead it will take many samples, and only if all the results match will the MCU acknowledge and register the press and send a report across USB. That extra sampling time, imposed to counter the switch bounce, is what's referred to as the 'switch debouncing time'.
I think you really want to separate this timing out, so that we can get a clearer picture of how much each component of the system contributes to the delay. As I alluded to earlier, the amount contributed by switch debouncing might surprise you.
(Reference: http://www.ganssle.com/debouncing.htm)
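As a rough sketch of the sampling scheme described above (hypothetical timings; real controller firmware varies):

```python
# Minimal model of sampling-based switch debouncing (hypothetical timings).

def debounced_press(samples, required=5):
    """Register a press only after `required` consecutive matching samples."""
    run = 0
    for t, level in enumerate(samples):  # assume one sample per millisecond
        run = run + 1 if level else 0
        if run == required:
            return t                     # press acknowledged at time t (ms)
    return None

# A bouncy press: the contact chatters for a few ms before settling high.
samples = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
print(f"press registered after {debounced_press(samples)} ms")
# Prints 9: the debounce alone adds several milliseconds before the MCU
# even starts waiting for the next USB polling slot.
```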
-
if anyone is still following this thread, @Brunnis 's lag fix for lr-snes9x-next is now live! update from source to get it :)
-
@dankcushions said in An input lag investigation:
if anyone is still following this thread, @Brunnis 's lag fix for lr-snes9x-next is now live! update from source to get it :)
Thanks for letting people know! I haven't actually tried it on RetroPie myself yet, but I will probably get around to doing it tomorrow. :)
-
I hate to pop in here all ignorant, simply looking for answers.....
But what are the answers?
I mean, I did read your whole post, and what I got from it was you stating that there is ALWAYS going to be some input lag that you CANNOT fix, basically "because Pi." Is this correct? You stated that an actual computer plays SNES better than a Pi?
Have you found any commands that can be entered on the command line, or into a config, to fix things?
What's this about an update to snes9x? Is it a fix for input lag? How do you install it? Is it as simple as going to "update retropie" in the menu system? I'm a bit noobish, I could always use the help, thank you!
-
@scrappbrapp the first post links to a thread in the libretro forums which has more of the technical details.
code fixes have been done on the default snes and NES emulators that reduce the input lag. they can be gotten by updating the individual lr-snes9x-next and lr-fceumm emulators by binary in the 'main' section of the packages section of the retropie-setup script.
-
Hi !
First of all: great job!!! So basically, I guess those 2 extra frames of delay (ogl vs dispmanx and linux vs win) are what I perceive and what bugs me :) Hmm, I'm so addicted to pixel shaders, it'll be difficult to switch to dispmanx! If it were possible to use scanlines with dispmanx, it would be a nice solution.
With the same display and joysticks, I compared the same NES / PC Engine games on my FPGA computer (MiST) and the Raspberry Pi, and the difference is important (meaning: important enough to notice without taking measurements). Some fast PC Engine games like Star Soldier seem unplayable on the Pi (because the ship moves extremely fast), while I perceive no delay on the MiST. In some other games, it's barely noticeable. It mainly depends on the sprite speed, I guess!
Personally, I made the configuration tweaks everyone does (enable hard sync, disable multithreaded video, set frame delay) but I'm unable to measure the effect. It may make things better, but input delay is still noticeable. There are also options to poll input at different times.
It's pretty cool some people are paying attention to this issue. I guess casual gamers don't mind, but it makes a big difference when playing fast arcade games seriously !
-
BTW, Brunnis, thanks a lot for your fix too! Will it get into the main tree? Would be great if libretro devs checked the other cores :)
-
@dankcushions could you walk me through?
I'm taking a guess at how to update... from the main menu, do I go: RetroPie > RetroPie Setup > Manage packages > Update all installed packages, OR Manage main packages > lr-fceumm/lr-snes9x? And then do I choose update from binary OR update from source?
Thanks
-
they can be gotten by updating the individual lr-snes9x-next and lr-fceumm emulators by binary in the 'main' section of the packages section of the retropie-setup script.
-
Well, I guess I followed it correctly, and I think I do see the difference, thank you. When I mash the sword in Zelda, for example, I feel like it's reacting differently than it used to.
What's the difference between updating by binary and by source?
-
First of all, I just want to say great work to the OP for doing such a thorough investigation, and thanks also to the devs for implementing the delay-reducing change into lr-snes9x-next and lr-fceumm.
I recently updated my Retropie from my old 3.7 setup to the latest 3.8.1 just so I could install the updated emulators with these changes, and here are my findings so far:
- Running on the default OpenGL display driver with Floob's Video Manager overlays/shaders applied to most of my emulators (NES, Genesis, SNES... etc.), I didn't really notice a discernible reduction in input delay from before the update.
- Then I switched to the Dispmanx display driver, and I noticed a significant reduction in input delay. It was pretty freaking fantastic, as even tighter platformers like DKC and Super Mario World felt nearly native.
- BUT running on Dispmanx means that I cannot use any of the shaders/overlays that I've enjoyed up until this point.
So now I'm faced with a tough decision. Do I want to enjoy the awesome CRT overlays/shaders that give all of my console emulators that authentic look and feel at the expense of some input delay, or do I want my gameplay to be as responsive as possible but with only a simple bilinear filter applied and no overlays?
I'm having a hard time choosing. Why is it that OpenGL inherently has more input lag than Dispmanx? Is there theoretically anything that can be done to achieve parity between the two?
-
@ScOULaris said in An input lag investigation:
I'm having a hard time choosing. Why is it that OpenGL inherently has more input lag than Dispmanx? Is there theoretically anything that can be done to achieve parity between the two?
The GL driver has to do lots of extra processing (i.e. running the shader) that the Dispmanx driver doesn't. Keep in mind the shader has to be run for every pixel on the screen - which for a 1080p screen is about 2 million times per frame. You can see where the extra lag comes from.
Lag parity isn't possible because of this. With a really fast video card it could be reduced but that's not an option on the Pi.
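To put rough numbers on that (assuming 1080p output at 60 Hz):

```python
# Back-of-the-envelope shader workload at 1080p/60 Hz (assumed resolution).
width, height, refresh = 1920, 1080, 60

pixels_per_frame = width * height           # shader invocations per frame
pixels_per_second = pixels_per_frame * refresh

print(f"{pixels_per_frame:,} invocations per frame, "
      f"{pixels_per_second:,} per second")
# 2,073,600 per frame; over 124 million per second on the Pi's GPU.
```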
-
Question about the NES and SNES input lag fix. Should I remove the video hard sync and frame settings after applying the fix? I know those were previously known ways to reduce the input lag slightly, but will that throw anything off with these new fixes in place?
-
Video hard sync does nothing on the rpi
-
@BuZz Should this be removed from the Git page then?
https://github.com/retropie/retropie-setup/wiki/Overclocking#improving-input-lag-and-delay
-
@silentq yes. The wiki is edited by the community so mistakes do happen.
-
@davej said in An input lag investigation:
@ScOULaris said in An input lag investigation:
I'm having a hard time choosing. Why is it that OpenGL inherently has more input lag than Dispmanx? Is there theoretically anything that can be done to achieve parity between the two?
The GL driver has to do lots of extra processing (i.e. running the shader) that the Dispmanx driver doesn't. Keep in mind the shader has to be run for every pixel on the screen - which for a 1080p screen is about 2 million times per frame. You can see where the extra lag comes from.
Lag parity isn't possible because of this. With a really fast video card it could be reduced but that's not an option on the Pi.
Hm. So I understand how a shader can add more input delay (increasing with the complexity of the shader). Would lowering the RetroArch rendering resolution to 720p instead of the default 1080p for SNES have any impact on input delay, since that's less than half the pixels that need to be processed by the shader?