    Versatile C++ game scraper: Skyscraper

    Category: Ideas and Development
    Tags: skyscraper, scraper, gamelist.xml, scraping, github
    1.6k Posts 113 Posters 1.5m Views
    • S
      Silent
      last edited by Silent

      I just had a realization (and tested a case that confirms it): for the Screenscraper module, Skyscraper takes checksums of .cue files. Since a .cue sheet lists the filenames of its .bin files, the checksum changes whenever those files are renamed, which makes queries extremely sensitive to filename changes. Have you considered adding a pass using checksums of the corresponding .bin files?

      Yes, I realize it would be painfully slow to hash a big .bin file, so if anything that should be an opt-in option with a very clear "Please be patient, go get yourself some tea or do something useful for once" message.

      I also realize --query can handle this case, so if you're not a fan of this option (honestly, waiting minutes until the hashes are done may be sub-optimal), maybe we could get something parallel to aliasMap, but for Screenscraper hashes? Some kind of "don't bother calculating, use this hash instead" file, so games with .cue and .bin files could be manually tailored like this.

      EDIT:
      I have great success scraping my PSX roms with --query so yes, IMO a hashMap.csv file identical to how aliases work would be great. Objectively better than making skyscraper hash .bin files together with .cue, as then 1) matching filenames would not be a concern 2) can hash files from PC and do it just once.
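
The "hash once on a PC" idea above could be sketched roughly as follows. This is only an illustration in Python, not part of Skyscraper: the helper names are made up, and the choice of MD5 and SHA-1 is an assumption about which digests would be useful for a lookup. It reads the FILE lines of a .cue sheet and hashes the referenced .bin files in chunks, so large images do not need to fit in memory.

```python
import hashlib
import re
from pathlib import Path


def file_hashes(path, chunk_size=1 << 20):
    """Return (md5, sha1) hex digests, reading the file chunk by chunk."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def bin_files_in_cue(cue_path):
    """List the data files named by FILE "..." lines in a .cue sheet."""
    cue_path = Path(cue_path)
    text = cue_path.read_text(errors="replace")
    names = re.findall(r'^\s*FILE\s+"([^"]+)"', text,
                       flags=re.MULTILINE | re.IGNORECASE)
    # Paths in a .cue sheet are taken relative to the sheet itself here.
    return [cue_path.parent / name for name in names]
```

The digests of the first listed .bin could then be used with --query, or written to the proposed hashMap.csv once such a file exists.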

      EDIT2:
      Just so I don't double post, another unrelated idea - what do you think about allowing Moonlight .ml extensions (once it's part of RetroPie-Setup of course) for all emulators, like .zip and .7z are handled now? Technically you can stream any emulator to pi using it, so it'd be very logical to allow it to be scraped from everywhere. I just set up a ps2 system like this and it works wonders.

      • muldjordM
        muldjord @Silent
        last edited by

        @Silent said in Versatile C++ game scraper: Skyscraper:

        EDIT2:
        Just so I don't double post, another unrelated idea - what do you think about allowing Moonlight .ml extensions (once it's part of RetroPie-Setup of course) for all emulators, like .zip and .7z are handled now? Technically you can stream any emulator to pi using it, so it'd be very logical to allow it to be scraped from everywhere. I just set up a ps2 system like this and it works wonders.

        I wouldn't be opposed to that as long as it makes sense. But let's talk about that when we get there.

        • muldjordM
          muldjord
          last edited by

          I am working towards the 3.0.0 release and I am planning a change which will clarify the gather and combine paths in Skyscraper. Basically the change is simply this:

          Only when scraping with the localdb module will the artwork and game list generator be run.

          In other words: when scraping with anything other than localdb, it won't save a gamelist.xml or composite the artwork. Ever.

          So why am I changing this? For several reasons. I almost always personally forget to add --pretend whenever I gather data from any of the non-localdb modules. This means that hundreds or thousands of image files are written to disk and my gamelist.xml is overwritten with data from just that one source. Processing artwork slows down the scraping process significantly and hammers the SD card for no reason. And when using Skyscraper you should always scrape with localdb after having gathered data from any of the other modules anyways. I've outlined that a bit here.

          With this in place I will of course also give the user better tools to exclude certain sources when scraping any platform. So if you want to scrape from cached but only allow resources from one or more sources, you will be able to do that.
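
The gather-then-combine workflow described above can be sketched like this. It is a rough Python illustration that only builds the command lines and does not run them. It assumes the `-p` (platform) and `-s` (scraping module) options, and the platform and module names are example values.

```python
def build_commands(platform, sources):
    """Return one gather command per source, then a final localdb pass.

    Only the last command would write gamelist.xml and composite artwork
    under the planned behaviour; the earlier ones just fill the cache.
    """
    commands = [["Skyscraper", "-p", platform, "-s", source]
                for source in sources]
    commands.append(["Skyscraper", "-p", platform, "-s", "localdb"])
    return commands


if __name__ == "__main__":
    for argv in build_commands("snes", ["screenscraper", "thegamesdb"]):
        print(" ".join(argv))
```

With the planned change, forgetting --pretend on the gather steps would no longer matter, because only the final localdb step produces output files.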

          Please let me hear your thoughts on this change. I know some of you might be against this change, but please read the above before jumping to conclusions. :)

          • S
            Silent
            last edited by

            Those are very good changes! It IMO makes sense that scraping from online sources should not update your gamelists, as more often than not you'll want to scrape your brand new ROMs from multiple sources and then have localdb output the best of the best.

            This change should make the difference between "scraping" and "generating gamelists" pretty well defined - admittedly, that was a tiny bit confusing for me when I started using Skyscraper, but with this new behaviour it should be clear.

            • S
              sglavach
              last edited by

              Has anyone got the igdb scraping module to work? I have tried passing my credentials with the -u command line option, and I have also tried setting them in config.ini. I have tried both my API key and my userid:pw, and can't get any combination to work. I only have the free account with igdb. Any suggestions are appreciated!

              Thanks!

              • muldjordM
                muldjord @sglavach
                last edited by

                @sglavach All keys given out by igdb currently are APIv3 keys. I am in the process of converting the module to the new API, so until then it won't work.

                • muldjordM
                  muldjord
                  last edited by

                  I've been talking to the good people at IGDB and gotten some things cleared up. The "user-key" provided to me for the API is only meant for developers. As such, the 10k monthly limit should be applied to all Skyscraper users in total, not 1 key per user. So this will be changed to use a hidden key instead. The good thing is that people will then no longer need to supply their own to use it. The caveat is the limit obviously. But to keep the databases stable, we need to adhere to these things. And I certainly will with Skyscraper.

                  • S
                    Silent
                    last edited by

                    That's great news! From what I have seen IGDB has very good resources, so it may improve quality of some scrapes (especially for newer games) significantly.

                    • HalvhjearneH
                      Halvhjearne
                      last edited by

                      really nice tool
                      skyscraper+screenscraper was able to scrape 98% of all my nintendo and sega roms. :D

                      however i do have a problem when scraping fba and mame roms.
                      as example:
                      1941 when scraped with sselph scraper returns the name "1941: Counter Attack (World 900227)" from arcadeitalia and 1941u returns the name "1941: Counter Attack (USA 900227)", but when i scrape it with skyscraper, all 1941 variants have the name "1941: Counter Attack" and its impossible to tell them apart.
                      is there a way to force it to return the name in the same way as sselph's scraper?
                      i noticed that the nintendo and sega games show version and region in the name, but my guess is thats maybe from the filenames?
                      idk

                      anyway, it would also be nice to see a detailed description of the options in config.ini, its rather hard to figure out how to use some of them or what they do.

                      otherwise, keep up the good work :)

                      • muldjordM
                        muldjord @Halvhjearne
                        last edited by muldjord

                        @Halvhjearne It already does so :) It's only in the output that it doesn't show the bracket notes (if they are missing from the game list as well, you might have disabled them with the --nobrackets option or brackets="false" in config.ini). Please reload your ES and check it. It shows up as "1941: Counter Attack (World 900227)" here on my system.
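
To make the effect of disabling brackets concrete, here is a small Python sketch of what removing bracket notes from a title amounts to. It is not Skyscraper's actual implementation. The function name and the regular expression are assumptions for illustration only.

```python
import re

# Matches a "(...)" or "[...]" group together with the whitespace before it.
_BRACKET_NOTES = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]")


def strip_bracket_notes(title):
    """Remove every parenthesised or square-bracketed note from a title."""
    return _BRACKET_NOTES.sub("", title).strip()


if __name__ == "__main__":
    print(strip_bracket_notes("1941: Counter Attack (World 900227)"))
    print(strip_bracket_notes("1941: Counter Attack (USA 900227)"))
```

Both calls give the same title, which is why the 1941 variants become impossible to tell apart once the notes are stripped.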

                        • HalvhjearneH
                          Halvhjearne @muldjord
                          last edited by

                          @muldjord
                          i tried it a few times and every time i scraped with skyscraper, all variants have the same name.

                          its running now with brackets="true" as i thought maybe i got it wrong and it takes a while to scrape, when only using 1 thread, however its almost done now and i will see what came out and if its still the same i will try with brackets="false", but isnt that supposed to be default behaviour?

                          im pretty sure i tried that too with same result or maybe im just confused by now ...

                          • muldjordM
                            muldjord @Halvhjearne
                            last edited by muldjord

                            @Halvhjearne I think you are missing out a bit. If you scrape with the same scraping module twice, it will be really fast, unless you have enabled "--refresh". If refresh is enabled it will rescrape all of the files from the source again. No need for that. Skyscraper has a cache that is much faster. With refresh disabled, it will simply use the already cached data. I recommend skimming the documentation if you haven't already done so. Understanding how the cache works is pretty important if you want to use Skyscraper to its full potential: https://github.com/muldjord/skyscraper . It's a very powerful tool beyond just scraping from a single source.

                            Either way, I am not sure what you have done to your setup, but by default, if nothing in the config has been changed, it will have the USA and WORLD designations for fba and mame roms. I just tested it here and it works perfectly.

                            For a quick use case example, check here: https://github.com/muldjord/skyscraper/blob/master/USECASE.md
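
The relationship between the cache and --refresh described above can be sketched as follows. This is a simplified Python model, not Skyscraper's code: the cache is a plain dictionary, and `fetch` is a hypothetical stand-in for a request to an online source.

```python
def scrape(rom, cache, fetch, refresh=False):
    """Return data for a ROM, contacting the source only when needed.

    With refresh disabled, an existing cache entry is reused as-is.
    With refresh enabled, the source is queried again and the entry replaced.
    """
    if refresh or rom not in cache:
        cache[rom] = fetch(rom)
    return cache[rom]


if __name__ == "__main__":
    requests = []

    def fake_fetch(rom):
        # Stand-in for a slow network request to a scraping source.
        requests.append(rom)
        return {"title": "1941: Counter Attack (World 900227)"}

    cache = {}
    scrape("1941", cache, fake_fetch)   # first run: hits the source
    scrape("1941", cache, fake_fetch)   # second run: served from the cache
    print(len(requests))
```

Under this model, a second run with the same module is fast because nothing is requested again unless refresh is set.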

                            • HalvhjearneH
                              Halvhjearne @muldjord
                              last edited by

                              @muldjord
                              so it only caches the pictures and not the text?

                              • muldjordM
                                muldjord @Halvhjearne
                                last edited by

                                @Halvhjearne No, it caches everything. Including the USA and WORLD designations. But it's a bit much to explain here, try reading through the documentation. I think you'll understand once you have.

                                • HalvhjearneH
                                  Halvhjearne @muldjord
                                  last edited by

                                  @muldjord
                                  i think i got it now, but i am using the "interface" from retropie setup and im pretty sure there is a bug with it, if its supposed to work as you explained.
                                  i just scraped with brackets="true" in config.ini and now it shows up as i wanted it.

                                  in the interface it seems not to care what setting its set to, it just removes brackets with fba and mame games, no matter what, but after setting brackets in config.ini, it now works.

                                  i hope it still scrapes nintendo and sega games as expected tho ... lol

                                  • muldjordM
                                    muldjord @Halvhjearne
                                    last edited by muldjord

                                    @Halvhjearne Have you updated the RetroPie script lately? There was a bug with the bracket option at one point I think. If that doesn't fix it, please report it to @mitu here (he's the one who made the RetroPie script for Skyscraper): https://retropie.org.uk/forum/topic/19588/skyscraper-now-officially-part-of-retropie-please-test

                                    • HalvhjearneH
                                      Halvhjearne @muldjord
                                      last edited by Halvhjearne

                                      @muldjord
                                      it was updated within the last 3 days

                                      edit:
                                      i like that it does not sort "the" under t, although i almost got used to that by now, so i probably wont be able to find the games i play that start with "the" :D

                                      • muldjordM
                                        muldjord @Halvhjearne
                                        last edited by

                                        @Halvhjearne Yeah, that's one of the earliest features I added back a year ago or so. :)
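
Sorting titles without filing them under a leading "The" can be done with a sort key along these lines. This is only a Python illustration of the idea. How Skyscraper achieves it internally is not stated in this thread, and the function name is made up.

```python
def sort_key(title):
    """Case-insensitive sort key that ignores a leading 'The '."""
    lowered = title.casefold()
    if lowered.startswith("the "):
        lowered = lowered[len("the "):]
    return lowered


if __name__ == "__main__":
    titles = ["The Legend of Zelda", "Super Mario World", "Metroid"]
    for title in sorted(titles, key=sort_key):
        print(title)
```

With this key, "The Legend of Zelda" is listed under L, ahead of "Metroid", instead of ending up under T.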

                                        • HalvhjearneH
                                          Halvhjearne
                                          last edited by

                                          i got an api key from igdb and i added userCreds="APIKEY" in config.ini, but for some reason it keeps saying the key doesnt work, even if i run it at the command line with -u APIKEY. am i doing something wrong?

                                          also i noticed that when i set maxFails in config.ini over 200 it will always end when it reaches 42 and that it for some reason always refreshes when scraping mame-mame4all ... is this intended behavior?

                                          • S
                                            sglavach @Halvhjearne
                                            last edited by

                                            @Halvhjearne There are responses to this issue below but in a nutshell IGDB scraping won't work until the next release and you won't need your personal api key as it will use an api key assigned to skyscraper. There will also be a limit to the number of skyscraper api calls per day/month I believe.
