    Versatile C++ game scraper: Skyscraper

    • muldjord

      Skyscraper version 2.7.5 released: https://github.com/muldjord/skyscraper

      • Fixed a bug where 'brackets="false"' in config.ini would be flipped (Thanks to Vynce for reporting this)
      • Completely refactored pass procedures for cleaner code and to enable '--query' option
      • Added '--query' command line option. This option requires a single rom file to be passed on the command line as well; otherwise it will be ignored (Thank you to AnalogHero and Vynce for suggesting this)
      • Added scrapers to 'psx' and 'pc' platforms when using Simple Mode

      To elaborate on the "--query" option, this is how it works: For most modules, a search query is sent to the scraping module in a URL format. That means that a filename such as "Rick Dangerous.lha" becomes "rick+dangerous". The '+' here means a space. You could probably also use the URL-encoded space, "rick%20dangerous", but my tests show that most modules expect spaces as '+'. It is this "rick+dangerous" string that you, as the user, can replace by passing your own query, like so:

      $ Skyscraper -p [platform] -s [module] --query "rick+dangerous" [filename]
      

      Remember to also add a filename that you wish to use the override with. Otherwise the query will be ignored.
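To make the filename-to-query step concrete, here is a minimal Python sketch of the transformation described above. It is not Skyscraper's actual code: the function name `filename_to_query` is made up for illustration, and Skyscraper may apply further cleanup to the filename that this sketch leaves out. It only shows how "Rick Dangerous.lha" could end up as "rick+dangerous".

```python
from pathlib import PurePath
from urllib.parse import quote_plus


def filename_to_query(filename: str) -> str:
    """Approximate the search query derived from a rom filename.

    Hypothetical helper: drops the extension, lowercases the name and
    URL-encodes it with '+' for spaces.
    """
    stem = PurePath(filename).stem  # "Rick Dangerous.lha" -> "Rick Dangerous"
    return quote_plus(stem.lower())


print(filename_to_query("Rick Dangerous.lha"))  # rick+dangerous
```

If a module returns the wrong game for the derived string, a hand-written variant of it is what you would pass to '--query'.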

      But not all of the scraping modules are search-name based. For instance, the "screenscraper" module can use a variety of different search methods. So for screenscraper, you also have the option of overriding the checksums it uses to search for a game. This is especially convenient in cases where a filename exists multiple times in their database and your own local file doesn't match any of the connected checksums (maybe you've compressed the rom yourself or whatever).
      In this case you can look up one of the working checksums on screenscraper's website (screenscraper.fr) and override the checksum like these examples:

      $ Skyscraper -p [platform] -s [module] --query sha1=[checksum] [filename]
      $ Skyscraper -p [platform] -s [module] --query md5=[checksum] [filename]
      $ Skyscraper -p [platform] -s [module] --query "sha1=[checksum]&md5=[checksum]&romnom=[exact url encoded filename]" [filename]
      

      The last example combines two of the checksum options and even the "romnom" option, which is "rom name" in French (this is a screenscraper thing, not a Skyscraper thing). You obviously only need one of the checksum options; the example just shows that you can combine them if you really need to.
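As a rough illustration of how such a combined query could be assembled, here is a Python sketch. The helper names `screenscraper_query` and `skyscraper_command` are hypothetical, and percent-encoding the "romnom" value with `urllib.parse.quote` is an assumption about what "exact url encoded filename" means. `shlex.join` is used so that the '&' characters are quoted for the shell instead of being read as "run in background".

```python
import shlex
from urllib.parse import quote


def screenscraper_query(sha1=None, md5=None, romnom=None):
    """Join the given overrides into one '--query' value (hypothetical helper)."""
    parts = []
    if sha1:
        parts.append(f"sha1={sha1}")
    if md5:
        parts.append(f"md5={md5}")
    if romnom:
        # Assumption: the rom name is percent-encoded.
        parts.append(f"romnom={quote(romnom)}")
    if not parts:
        raise ValueError("at least one of sha1, md5 or romnom is required")
    return "&".join(parts)


def skyscraper_command(platform, rom_path, query):
    """Build a shell-safe command line following the examples above."""
    argv = ["Skyscraper", "-p", platform, "-s", "screenscraper",
            "--query", query, rom_path]
    return shlex.join(argv)  # quotes any argument containing '&' or spaces
```

For example, `skyscraper_command("amiga", "/roms/x.lha", screenscraper_query(sha1="a", md5="b"))` yields a command where the query is wrapped in single quotes.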

      The '--query' option is clearly an "experts only" option, but for those that like to go down the rabbit hole, I am your humble servant. Down you go... :D

      And happy scraping! :)

      • AnalogHero

        @muldjord I don't know if I'm doing something wrong, but the new query option isn't working for me. I tried this:

        $ Skyscraper -p amiga -s screenscraper --query md5=0D5E4770B34021A666E9CADF0F39DA75 /home/pi/RetroPie/roms/amiga/BlackViper_v1.0_AGA.lha

        and it outputs
        1/1 (T1) Pass 1 ---- Game 'BlackViper_v1.0_AGA' not found :( ----

        What am I doing wrong?

        EDIT: OK, it works. I just used a different entry from screenscraper.fr: md5=7858DFE8AE9A6725B6CAD18D67FEBA61 worked.

        • muldjord @AnalogHero

          @analoghero Yeah, I ran into this problem with amiga games as well. The checksum needs to come from the same platform; otherwise it won't match, even though you put in a custom checksum. So for instance, if you copy a sha1 from a cd32 game and try to override it for a standard amiga game, it won't be found. It will be found if you use the amiga version. So this should not be an issue, as I assume people will always be using sha1s from the actual platform.

          Btw, Skyscraper differentiates the Amiga platforms from the filenames. If a filename has 'cd32' or 'cdtv' in it, it will switch to that platform for that rom.
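The filename rule can be pictured with a small Python sketch. This is not Skyscraper's implementation: the function name, the case-insensitive match and the returned platform identifiers are all assumptions made for illustration.

```python
def amiga_platform_for(filename: str) -> str:
    """Guess the Amiga sub-platform from a rom filename (hypothetical helper)."""
    name = filename.lower()  # assumption: the match ignores case
    for marker in ("cd32", "cdtv"):
        if marker in name:
            return marker
    return "amiga"
```

Under these assumptions, a name containing "CD32" switches to the cd32 platform, while "BlackViper_v1.0_AGA.lha" stays on plain amiga.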

          • screech

            @muldjord Just checked in the Screenscraper DB ;) "Amiga CD32", "Amiga AGA", "Amiga CDTV",... are children of "Amiga" ;) so if you scrape a CD32 rom on the "Amiga" system, it normally returns the right game referenced under "Amiga CD32" ;)

            It's the same for "MAME", for example: if you scrape a CPS1 game in the MAME system, it will return the right game (no need to specify CPS1 ;) ).

            • muldjord @screech

              @screech Yes, this is true, but switching the platform (in Skyscraper) is necessary for the match to occur. If you scrape the 'amiga' platform with Skyscraper, it will not accept an entry for the 'Amiga CD32' platform because some scraping modules are name search based. So it needs to change the platform in order to accept the returned entry. :)

              With Screenscraper this is less important since it's checksum based. But for the others, it's a necessity since many games are multiplatform, and Skyscraper needs to find the right one. :)

              • tacodog

                I have a couple of questions about the usage of Skyscraper that I was hoping someone could help with.

                My first issue is that I use m3u files for my psx games because they handle multi-disc games organically, giving me just one entry in emulationstation. My m3u file points to ../psx-discs/...(disc 1).bin ../psx-discs/...(disc 2).bin etc. If I scrape by m3u filename, nothing is found. If I check by disc, I get duplicates for each disc, and the gamelist file doesn't point to the m3u. Is there a way to search by filename rather than checksums?

                Secondly, I have scraped all of my systems using screenscraper, and I am missing a handful of files.

                Local database cache stats:
                'screenscraper' module
                  Titles       : 428
                  Platforms    : 428
                  Descriptions : 426
                  Publishers   : 418
                  Developers   : 412
                  Ages         : 171
                  Tags         : 414
                  Ratings      : 279
                  ReleaseDates : 316
                  Covers       : 428
                  Screenshots  : 428
                  Wheels       : 423
                  Marquees     : 423
                  Videos       : 0
                
                

                I'd like to fill in these small gaps (just 3 wheels) from a second source, but don't necessarily need to grab everything all over again. Is there a way to tell Skyscraper to only get the missing files? Or can I see which files are missing so I can try to grab them manually?

                Thanks

                • muldjord @tacodog

                  @tacodog You can scrape single files simply by providing the full or partial path to the rom on command line. You can also use the '--startat' and '--endat' command line options to only scrape a span of roms. As for seeing exactly which roms are missing the data, that is currently not possible.

                  For the m3u issue, the reason it isn't working for those is that the source you are using (ScreenScraper) doesn't have those exact files in their database. You'd need to override the checksum manually, which can be done with '--query'. Try checking out the release notes for 2.7.5 here: https://github.com/muldjord/skyscraper/releases/tag/2.7.5

                  It's described in there. You can also try scraping it with another source, such as "mobygames" or "thegamesdb". They search by name instead of checksum, so if your filenames are good, they should provide good data. Not for wheels though, as those sources don't support wheels.

                  • hermit

                    Would it be possible to scrape MESS roms?
                    Example for Amstrad GX4000:
                    barb2.zip = Barbarian 2
                    brubber.zip = Burnin' Rubber

                    • muldjord @hermit

                      @hermit Skyscraper supports many of the MESS platforms simply by scraping them with their own names such as:

                      $ Skyscraper -p nes -s screenscraper
                      

                      But GX4000 in particular isn't supported.

                      • annomatik

                        Hi! Skyscraper scrapes many games for me, but some of the ones it does not scrape are quite common, which I find odd, e.g. Chaos Engine and Lemmings.

                        Here's a list of some of the games Skyscraper currently doesn't scrape for me:

                        Amiga

                        AlienBreed1_v2.3a_0998.lha (2 + 3D scrap fine)
                        "Bad Ninjas (Homebrew 2018).adf"
                        Cadaver&CadaverThePayoff_v2.2_0900.lha
                        ChaosEngine1_v1.2_AGA_1324.lha
                        ChaosEngine2_v2.1_AGA_0173.lha
                        CrazyCars1_v1.2.lha
                        CrazyCars2_v1.1_0311.lha
                        CrazyCars3_v2.1.lha
                        DefenderOfTheCrown_v3.1_0317.lha
                        DeluxePacMan_v1.0_AGA.lha
                        EyeOfTheBeholder1_v1.5_0116.lha
                        EyeOfTheBeholder2_v1.1_0834.lha
                        GhostsNGoblins_v1.2.lha
                        GhoulsNGhosts_v1.0_2252.lha
                        Gobliiins1_v1.3_0065.lha
                        Gobliins2_v1.4_0185.lha (Goblins3 scraps fine)
                        GreatGianaSisters_v1.6_2945.lha
                        "Inviyya (Homebrew Demo 2018).adf"
                        ItCameFromTheDesert1_v2.0_0014.lha
                        ItCameFromTheDesert2_v2.0.lha
                        JamesPond1_v2.1.lha
                        JamesPond2_v2.0_AGA_1354.lha
                        JamesPond3_v1.0_AGA_0688.lha
                        "Lemmings 3 AGA (1 of 4).adf"
                        Lemmings1_v1.5_Files_2089.lha (Lemmings 2 scraps fine)
                        Llamatron_v1.1_1MB.lha
                        Lotus1_v1.1_0774.lha
                        Lotus2_v1.11_0497.lha (Lotus 3 scraps fine)
                        MegaLoMania_v1.7_0272.lha
                        MontyPythonsFlyingCircus_v1.3_0273.lha
                        Nebulus1_v1.3_0361.lha (Nebulus 2 scraps fine)
                        "Operation Lemming v1.2.adf"
                        Populous1&DataDisks_v1.1_0069&1217.lha
                        Populous2&ChallengeGames_v1.3_0079.lha
                        ProjectX1_v1.3_0886.lha
                        ProjectX2_v1.3_0289.lha
                        ProjectXSE_v1.5_0927.lha
                        RType1_v1.4.1_0940.lha (RType 2 scraps fine)
                        SensibleTrainSpotting_v1.0.lha
                        ShadowOfTheBeast1_v2.2_1357.lha
                        ShadowOfTheBeast2_v1.3a_1359.lha
                        ShadowOfTheBeast3_v1.6_0016.lha
                        Speedball1_v2.0_0581.lha (Speedball 2 scraps fine)
                        SuperMethaneBros_v1.3.lha
                        TitusTheFox_v2.0_0226.lha
                        Turrican1_v2.0_0092.lha ( Turrican 2 + 3 scrap fine)
                        Xenon1_v1.2_0399.lha (Xenon 2 scraps fine)
                        "Zerosphere (Homebrew 2015).adf"

                        NES

                        "Balloon Fight (USA) 2P.nes"
                        "Banana Prince (GER).nes"
                        "Castlevania 2 (USA).nes" (Castlevania 1 scraps fine)
                        "Donkey Kong Jr. (World).nes" (Donkey Kong scraps fine)
                        "Galaga (USA).nes"
                        "Ninja Gaiden 1 (USA).nes"
                        "SCAT (USA) 2P.nes"
                        "StarTropics 1 (USA).nes"
                        "Tetris (Nintendo).nes" (Tetris (Tengen) scraps fine)

                        • muldjord

                          @annomatik Most of the LHAs are not yet in the screenscraper database. It will work for some of them with the openretro module, but overall they are poorly supported at this time (for all scrapers, not just Skyscraper). We are in the process of making them work with both openretro and screenscraper.

                          As for the nes games, try looking them up on the screenscraper site and check whether the checksums for your roms match those in their database. If they do, please provide some documented examples of matches that don't return a result and I can look into this further. Thank you. :)

                          • annomatik

                            thanks, will do. Which checksum tool should I use (either Windows or Raspbian would be fine)?

                            • mitu (Global Moderator) @annomatik

                              @annomatik On Linux you can use md5sum <filename> or sha1sum <filename>; on Windows I think you can use PowerShell to compute them - https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/get-filehash?view=powershell-6.
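If you would rather use one tool on both Windows and Raspbian, the same two checksums can be computed with Python's standard `hashlib` module. This is a minimal sketch; the function name `file_checksums` and the chunk size are arbitrary choices, not part of any tool mentioned in this thread.

```python
import hashlib


def file_checksums(path, chunk_size=1 << 20):
    """Return (md5, sha1) hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read piecewise so large disc images don't have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Note that `hexdigest()` returns lowercase hex, while checksums elsewhere in this thread appear in uppercase, so compare them case-insensitively. For zipped roms, hash the unpacked file if that is what the database entry refers to.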

                              • muldjord

                                Hi guys, please check here: https://retropie.org.uk/forum/topic/19588/skyscraper-now-officially-part-of-retropie-please-test

                                :)

                                • annomatik

                                  ok, here are the check sums for the NES games... which database should I compare them to?

                                  The MD5 sums:

                                  8bff672e12a06787abfa3372f27f2ae4 Balloon Fight (USA) 2P.nes
                                  a1db8a18bae78e276d22e0725062b559 Banana Prince (GER).nes
                                  73fa5a34a5c1579161a8c1e8ef1b2afc Castlevania 2 (USA).nes
                                  e824d5b1c62bc13de3d2716d29225371 Donkey Kong Jr. (World).nes
                                  9ab02f9f92d67a7ae91ebf981c1e2adf Galaga (USA).nes
                                  2e4a6b4553d1f47db8f0e484aa5cfa3e Ninja Gaiden 1 (USA).nes
                                  d8cbc5f2dbf44fae5239a170ad6a092d SCAT (USA) 2P.nes
                                  cd7816986cbae890e15debd5a06e6414 StarTropics 1 (USA).nes
                                  5c4ab6b03431f7298b19bcab58e858f0 Tetris (Nintendo).nes

                                  The SHA1 sums:

                                  3031c47d4481360a911763213fc3ee4a94779d0a Balloon Fight (USA) 2P.nes
                                  ef8214f4bad14443b37121091690416f8375fbfc Banana Prince (GER).nes
                                  990f6b4c12812ab11f2716d82bfda59de742d636 Castlevania 2 (USA).nes
                                  1e6fcc6af295a2e41f8370efcf2440da4f29d017 Donkey Kong Jr. (World).nes
                                  f0c642b45424b85231e8fe678c9b741fbf16cce9 Galaga (USA).nes
                                  82e8d9e63a2f7cdaefc89ca3a8946b793b15db76 Ninja Gaiden 1 (USA).nes
                                  74a15693eebc2c01c7982c0c9a3f294bc26f53a7 SCAT (USA) 2P.nes
                                  ec6ebf666e3871a8dd6f296a6e200a3ca3d82a87 StarTropics 1 (USA).nes
                                  865849aa2064ab38a5959116e697ebbfda9f4795 Tetris (Nintendo).nes

                                  • mitu (Global Moderator) @annomatik

                                    @annomatik said in Versatile C++ game scraper: Skyscraper:

                                    ok, here are the check sums for the NES games... which database should I compare them to?

                                    I posted in your other topic how you can check the data on the Screenscraper.fr site: each game has its own set of names/roms/hashes associated with it (in the 'ROMs/ISOs' tab of the game detail page).

                                    • annomatik

                                      Wow, that's a ton of games there... :-)

                                      I just checked the first couple of games. Balloon Fight, Banana Prince, Castlevania 2, Donkey Kong Jr. : many versions are there, but not with the checksums I have.

                                      So... I can think of at least four solutions:

                                      • Do I really have to keep looking for the right roms? That would probably be a lot of effort.
                                      • I could try adding my own versions to the lists, but why? The meta-data is already there, the roms are working fine, and there are already many different versions of the same game (why not just have a list of different checksums for the same game?)...
                                      • Could Skyscraper just match filenames, not checksums? How would I enable that? And, if not:
                                      • Can I maybe give Skyscraper a pointer to the right meta-data?

                                      In Kodi, it is possible to add a .nfo file, which can contain the link to the details page of a movie / tv show episode. It sits in the exact same location as the video; just instead of mp4/mkv/whatever, it has the extension nfo. In its simplest form, it's just a simple link, but it can also be used for storing all meta-data, if wanted. And if combined with a link, it first takes the provided meta-data from the nfo and gets the rest by scraping via the link.

                                      Maybe, if that's not yet possible, it would be possible to provide direct links somehow?

                                      Thanks!

                                      • tacodog

                                        @muldjord Thanks for the reply about the psx games. I was able to write a script that reads the m3u file to get the cue file, then takes the checksum of the bin/cue files and grabs the correct images using --query. It works great.
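The script itself was not posted, so the following Python sketch is only a guess at the general idea. It takes the first entry of the m3u, follows a cue sheet to its first FILE entry if there is one, and turns the md5 of that file into a '--query' value. The helper names are hypothetical, and using only the first disc and the first track are simplifying assumptions.

```python
import hashlib
import re
from pathlib import Path


def first_disc_image(m3u_path):
    """Return the data file behind the first playlist entry (hypothetical helper)."""
    m3u = Path(m3u_path)
    for line in m3u.read_text().splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        disc = (m3u.parent / entry).resolve()
        if disc.suffix.lower() == ".cue":
            # Assumption: the first FILE line of the cue names the data track.
            match = re.search(r'^\s*FILE\s+"([^"]+)"', disc.read_text(), re.MULTILINE)
            if match:
                return disc.parent / match.group(1)
        return disc
    raise ValueError(f"no disc entries in {m3u}")


def query_for_m3u(m3u_path, chunk_size=1 << 20):
    """Build an 'md5=...' query value from the playlist's first disc image."""
    digest = hashlib.md5()
    with open(first_disc_image(m3u_path), "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"md5={digest.hexdigest()}"
```

The returned string would then go after '--query', with the m3u file as the [filename] argument, following the command pattern from the 2.7.5 release notes. Whether the database holds a checksum for that particular file still has to be checked on screenscraper.fr.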

                                        I have another question/issue. I just scraped my nes roms and Super Mario Bros. 2 seems to grab the Japanese wheel no matter what I do. I've tried --region 'us', 'eu', and 'wor', but I always get the "Super Mario USA" wheel.

                                        Is this an issue with the way I'm running Skyscraper, an issue with the program itself, or an issue with the screenscraper database? Can you reproduce this?

                                        The wheel I want is located here:
                                        https://screenscraper.fr/image.php?gameid=1248&media=wheel&hd=1&region=wor&num=&version=&maxwidth=600&maxheight=300

                                        Skyscraper -p nes --refresh --region 'wor' --unpack -s screenscraper -i . -o . -g . --pretend /mnt/exhd0/roms/nes/Favorites/Super\ Mario\ Bros.\ 2\ \(USA\)\ \(Rev\ A\).zip
                                        

                                        The md5 of my (unzipped) rom is d769d73689ea4684dbb0bd7cf3a24b2f.

                                        Thanks

                                        • muldjord @annomatik

                                          @annomatik Skyscraper already matches just filenames when it looks for games at screenscraper. But it only works if the filename is unique in their database. So if they have 2 roms with the same filename, it won't work. This is not a Skyscraper thing, but a screenscraper thing, so not something I can change. You can try scraping the games with some of the other sources. They are filename search based.

                                          You can, however, override the checksums Skyscraper uses when looking for the games with the '--query' option. Read more about that here: https://github.com/muldjord/skyscraper/releases/tag/2.7.5

                                          It works from 2.7.5 and newer. :)

                                          • muldjord @tacodog

                                            @tacodog I'll need to look into that when I get the time. It could be a bug of some sort. If '--region us' doesn't pull it in, there might be a problem. If the wheel exists in their database for that region, Skyscraper should use it. If it doesn't, it will try other regions in priority order until it finds one that does exist. So either there's a problem with the us wheel in their database, or there's an issue in Skyscraper. Should be easy to reproduce though.
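The intended behaviour can be sketched in a few lines of Python. This is not Skyscraper's code: the function name, the data layout (a dict from region code to media URL) and the fallback order are assumptions for illustration only. The region code "jp" in particular is a guess; only "us", "eu" and "wor" appear in this thread.

```python
def pick_region(available, preferred, fallback_order=("wor", "us", "eu", "jp")):
    """Pick the preferred region's media if present, else the first fallback.

    `available` maps region codes to media URLs. The fallback order is a
    hypothetical example, not Skyscraper's real priority list.
    """
    for region in (preferred, *fallback_order):
        if region in available:
            return region, available[region]
    return None  # no media for any known region
```

With this logic, asking for "us" when only "jp" and "wor" wheels exist would return the "wor" wheel. Getting the Japanese one instead would point either at the data returned for the game or at the selection code.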
