    A Better Way to Scrape?

    • George Spiggott

      With the current situation putting unprecedented strain on scraping servers I wonder if there are more efficient ways to scrape games that put less strain on their servers?

      Currently running:
      Retropie 4.8.9 on a Pi Zero 2W (Overclock Settings: CPU 1400Mhz)
      ES-DE on a GMKtec K6 (Windows 11, 32GB RAM)

      • serdateclas

        I have just started using RetroPie (as a result of COVID-19 I had to finish my daughter's arcade months ahead of schedule), so you can imagine my frustration with the scrapers.
        For text there are some working options, but I really want those videos in the interface, and all I get are download errors.
        I am considering writing a fairly complex solution: a desktop-based scraper (with an SSH connection to the Pi, since I'll need the CPU power) that would download YouTube gameplay videos and cut about 20 seconds from the middle of each one. It would then detect whether people are talking over the video (as in a review or commented gameplay) and search for another video if so. Finally it would encode with HandBrake to generate compatible videos.
        Say we all start using this: YouTube won't run out of server capacity, but is there a solution (ideally serverless) for matching the hashes to the games?
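
The "cut about 20 seconds from the middle" step could look roughly like the sketch below. It is only an illustration, not an existing tool: the function names are made up, the video duration is assumed to be known already (for example from ffprobe), and the ffmpeg options chosen here (H.264 video, AAC audio, yuv420p pixel format) are an assumption about what plays well in EmulationStation rather than something confirmed in this thread.

```python
from typing import List, Tuple


def middle_clip_window(duration_s: float, clip_s: float = 20.0) -> Tuple[float, float]:
    """Return (start, length) in seconds of a clip centred in the video."""
    # Never ask for more footage than the video actually has.
    length = min(clip_s, duration_s)
    start = max(0.0, (duration_s - length) / 2)
    return start, length


def ffmpeg_cut_args(src: str, dst: str, duration_s: float, clip_s: float = 20.0) -> List[str]:
    """Build an ffmpeg argument list that cuts and re-encodes the middle clip."""
    start, length = middle_clip_window(duration_s, clip_s)
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",   # seek to the start of the window
        "-i", src,
        "-t", f"{length:.3f}",   # keep only the window length
        "-c:v", "libx264", "-pix_fmt", "yuv420p",  # assumed-compatible video settings
        "-c:a", "aac",
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run() to execute it.
    print(ffmpeg_cut_args("gameplay.mp4", "snap.mp4", duration_s=120))
```

Detecting commentary over the gameplay is a separate and much harder problem that this sketch does not attempt.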

        • langest @serdateclas

          @serdateclas said in A Better Way to Scrape?:

          but is there a solution (ideally serverless) for matching the hashes to the games?

          You would need to rely on the meta information you get from YouTube. I doubt they provide game hashes. But I've seen some videos that are tagged with game information, you might need to parse that and match against some scraping db.

          How did you intend to figure out if they talk in the video? Sounds like a non-trivial task.

          • nexusone13 @serdateclas

            Hi @serdateclas, ScreenScraper is having some trouble with its servers, which is why you are seeing video download errors. They are working on it. You can follow the progress on their Discord: https://discord.gg/hCxqCFe

            ARRM (Roms Manager, Scraper for Recalbox, Batocera, Retropie, ES...) : http://jujuvincebros.fr/telechargements2/file/10-arrm-another-recalbox-roms-manager
            Wiki : http://jujuvincebros.fr/wiki/arrm/doku.php?id=start-en
            Discord: https://discord.gg/p7QsBTS

            • ExarKunIv

              Right now it is very hit or miss, but mornings seem to be the best time to get scraping done. I'm running a batch right now and it's pulling both videos and pictures, but yesterday it was doing nothing.

              I think that, other than writing a new program that does not use the scraping sites, we just have to try and wait it out.

              RPi3B+ / 200GB/ RetroPie v4.5.14, RPi4 Model B 4gb / 256gb / RetroPie 4.8.2
              RPi5 4gb / 512gb / RetroPie 4.8.9 -Basic
              Maintainer of RetroPie-Extra .

              • quicksilver

                Using Skyscraper with my ScreenScraper login info in config.ini, I have not had any issues scraping recently.

                • Brigane @quicksilver

                  @quicksilver Do you know if it's possible to add the login info when using Steven Selph's scraper?
                  I kinda like the way I can add scraped data to my gamelist, instead of overwriting the old gamelist each time. I can't seem to find that feature with Skyscraper.

                  Systems: Raspberry Pi 0/2/3 Model B+
                  Os: RetroPie 4.5
                  Frontend: Emulationstation & Attract Mode

                  • quicksilver @Brigane

                    @Brigane What's the advantage of appending to gamelist.xml versus overwriting it?

                    • mitu (Global Moderator) @quicksilver

                      @quicksilver The idea is that the scraper adds artwork/metadata only for the entries that don't have it. It's roughly equivalent to Skyscraper's cache, except that it uses gamelist.xml as the cache to decide whether to scrape an entry or not.

                      @Brigane you can add credentials when scraping using the command line parameters (-ss_user/-ss_password).
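
To make the "only scrape entries that don't have it" idea concrete, here is a minimal sketch that reads a gamelist.xml and reports which games are still missing artwork or video. It is not how either scraper is implemented: the function name is hypothetical, and it assumes the usual EmulationStation layout of `<game>` elements with `<path>`, `<image>` and `<video>` children.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Sequence


def entries_missing_media(gamelist_xml: str,
                          tags: Sequence[str] = ("image", "video")) -> Dict[str, List[str]]:
    """Map each game's <path> to the media tags that are absent or empty."""
    root = ET.fromstring(gamelist_xml)
    missing: Dict[str, List[str]] = {}
    for game in root.iter("game"):
        path = (game.findtext("path") or "").strip()
        # A tag counts as missing if the element is absent or has no text.
        absent = [tag for tag in tags if not (game.findtext(tag) or "").strip()]
        if absent:
            missing[path] = absent
    return missing


SAMPLE = """<gameList>
  <game><path>./a.zip</path><name>A</name><image>./media/a.png</image><video>./media/a.mp4</video></game>
  <game><path>./b.zip</path><name>B</name><image>./media/b.png</image></game>
  <game><path>./c.zip</path><name>C</name></game>
</gameList>"""

if __name__ == "__main__":
    print(entries_missing_media(SAMPLE))
    # {'./b.zip': ['video'], './c.zip': ['image', 'video']}
```

Only the games in that result would need a request to the scraping source; everything else is left untouched.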

                      • Brigane @quicksilver

                        @quicksilver I had some issues and I don't think I'm the only one. The scraper downloaded the video snaps in small sizes so they somehow became corrupted for some of the games, so the video snap didn't show up in Emulationstation. I have manually added new videos to the games I had problems with, and I don't want the scraper to overwrite them again.

                        • nexusone13 @Brigane

                          @Brigane Sometimes you have to recompress videos that don't show up in EmulationStation. You can use HandBrake or ffmpeg, or use ARRM by selecting the ROMs you want to convert: http://jujuvincebros.fr/wiki/arrm/doku.php?id=compress_video_en

                          • quicksilver @Brigane

                            @Brigane I believe there is a way for Skyscraper to scrape your current gamelist. That way you wouldn't lose any custom work you have done. @muldjord is this possible?

                            • muldjord @quicksilver

                              @quicksilver Yes, Skyscraper can cache the current gamelist.xml data using the -s esgamelist scraping module. It grabs the data from the gamelist and saves it in the Skyscraper cache, which lets users regenerate the gamelist with that same data if they want to avoid Skyscraper overwriting it. But I always advise making backups of the current gamelist.xml files, simply because people often misunderstand what it does. For instance, if you have composited screenshots in the current gamelist.xml, Skyscraper will import each one as a screenshot. But because it is already a composite of different artwork, users get confused when they regenerate the gamelist, since Skyscraper also composites from different artwork. So you get sort of a composite inside a composite.

                              So -s esgamelist works best for gamelists containing only raw artwork.
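
For the backup advice, a small sketch like the one below would copy every gamelist.xml under a folder to a timestamped sibling file before any scraper touches it. The function name and the backup naming scheme are made up for illustration, and where your gamelists actually live depends on your setup, so the folder is passed in rather than hard-coded.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def backup_gamelists(root: Path, stamp: Optional[str] = None) -> List[Path]:
    """Copy each gamelist.xml under root to gamelist.xml.<stamp>.bak beside it."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    backups: List[Path] = []
    # sorted() collects the matches first, so new .bak files are not re-scanned.
    for gamelist in sorted(root.rglob("gamelist.xml")):
        target = gamelist.with_name(f"gamelist.xml.{stamp}.bak")
        shutil.copy2(gamelist, target)  # copy2 also preserves timestamps
        backups.append(target)
    return backups


if __name__ == "__main__":
    # Hypothetical location; point this at wherever your gamelists are stored.
    for path in backup_gamelists(Path.home() / ".emulationstation" / "gamelists"):
        print("backed up to", path)
```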

                              • quicksilver @muldjord

                                @muldjord said in A Better Way to Scrape?:

                                But I always advise making backups of the current gamelist.xml files, simply because people often misunderstand what it does. For instance, if you have composited screenshots in the current gamelist.xml, Skyscraper will import each one as a screenshot. But because it is already a composite of different artwork, users get confused when they regenerate the gamelist, since Skyscraper also composites from different artwork. So you get sort of a composite inside a composite.

                                Thanks for the warning. That makes perfect sense, though I probably wouldn't have thought of it.

                                • George Spiggott

                                  I think Skyscraper will allow you to import images from a local folder. As a suggestion would it make sense to make the images that are downloaded from Skyscraper's server available as a torrent file? That would allow many more people to access the scraper while taking images from a local PC.

                                  • muldjord @George Spiggott

                                    @George-Spiggott Skyscraper doesn't have a server. Skyscraper is the scraping software. Maybe you mean ScreenScraper? Which is one of the sources supported inside Skyscraper.

                                    Skyscraper: Scraping software
                                    ScreenScraper: Online game data source

                                    • George Spiggott @muldjord

                                      @muldjord Ok thanks. So the same suggestion but using images from Screenscraper or similar then.

                                      • muldjord @George Spiggott

                                        @George-Spiggott It's not that simple I'm afraid. To upload such a pack would require the license / permission of the sources to be sorted out and that's just not something I have any interest in looking into.

                                        And I'm not really sure it's a great idea either. If a pack was created, you would go from downloading just the data you need per user (as it is now), to downloading all data for all games in a pack even though you don't need all of it. So... It sounds good on paper. But it's a huge waste of bandwidth as I see it.

                                        • George Spiggott @muldjord

                                          @muldjord Yes, there would be bandwidth wastage, but it would be the users' bandwidth rather than the content providers', and as I understand the situation it is currently the providers' bandwidth that is at a premium.

                                          That said I'm certainly not trying to force my ideas on the creators. Thanks both for your work and your valued input.

                                          • OldSchoolGuy @Brigane

                                            @Brigane There is, indeed, a way to use your ScreenScraper login info with Selph's scraper.
                                            https://github.com/sselph/scraper/wiki

                                            In Command Line Flags, under the sub-heading "Full List" and about three quarters of the way down, you will find the options for your login info. You have to run the scraper from the console and use the command line. You can also use -console_src ss to force it to prioritize ScreenScraper.

                                            As a side note, if you have some stubborn ROMs that won't scrape, you can edit a copy of the "Hash.csv" file to include the new hashes and then point the scraper at the updated file with the -hash_file flag. I use this technique all the time to scrape PS1 games that I made into .PBP files, since their hashes are not in the original file.

                                            If you have lots of ROMs that you need hashes for, I wrote a nice little PowerShell script that does a great job of spitting out lowercase hashes for files. You can download it from this link:

                                            https://drive.google.com/file/d/1saYP91JHvkgK8By60BRChSMakSxj3Sno/view?usp=sharing

                                            The script runs on Windows 10 and is safe to use and distribute if you like it or feel it could help someone. The best part is being able to get the hashes output in lowercase, which is a pain in the butt if you are a Windows user.
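
For anyone not on Windows, roughly the same job can be sketched in Python. This is not the PowerShell script linked above, just an illustration with made-up function names. The thread does not say which hash algorithm the scraper's hash file expects, so SHA-1 is used here purely as an assumption; check the scraper wiki before relying on it. Python's `hexdigest()` already returns lowercase hex, so no extra conversion is needed.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_hash(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex digest of a file, read in chunks."""
    digest = hashlib.new(algorithm)  # "sha1" is an assumption, see the note above
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_folder(folder: Path, algorithm: str = "sha1") -> Dict[str, str]:
    """Map each file name directly inside folder to its lowercase digest."""
    return {
        path.name: file_hash(path, algorithm)
        for path in sorted(folder.iterdir())
        if path.is_file()
    }


if __name__ == "__main__":
    # Hypothetical ROM folder; adjust to wherever your .PBP files are.
    for name, digest in hash_folder(Path("roms/psx")).items():
        print(digest, name)
```

The output still has to be merged into a copy of the hash file by hand, since the file's column layout is not described in this thread.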
