    Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source

    Tags: skyscraper, gamelist.xml, scraping, cli
    • TomFury

      Hello!

      This is a Skyscraper-specific question. I have been fiddling with my RetroPie setup and have fallen in love with the versatility of Skyscraper. I keep adding roms to my RetroPie installation and then run Skyscraper (CLI) to scrape the new entries. For example:

      Skyscraper -p nes -s screenscraper
      

      ... and then generating a new gamelist.xml:

      Skyscraper -p nes
      

      This works great, but as I go I notice that some roms are never recognized (and that is fine: some of these are obscure and I do not expect to find a match at Screenscraper). Example output:

      #51/79, (1/50)
      Elapsed time   : 00:05:23
      Est. time left : 00:02:57
      
      #52/79 (T1) Pass 1 ---- Game 'Duel, The by Bokudono (PD)' not found :( ----
      
      
      'screenscraper' requests remaining: 16946
      
      #52/79, (1/51)
      Elapsed time   : 00:05:27
      Est. time left : 00:02:49
      
      #53/79 (T1) Pass 1 ---- Game 'Air (SMB1 Hack)' not found :( ----
      
      
      'screenscraper' requests remaining: 16946
      
      #53/79, (1/52)
      Elapsed time   : 00:05:31
      Est. time left : 00:02:42
      
      #54/79 (T1) Pass 1 ---- Game 'Amiga! Demo (PD)' not found :( ----
      
      
      'screenscraper' requests remaining: 16944
      
      #54/79, (1/53)
      Elapsed time   : 00:05:35
      Est. time left : 00:02:35
      
      #55/79 (T1) Pass 1 ---- Game 'Atomic Robo-Kid Demo (PD)' not found :( ----
      

      Now, this list of "allowed unknowns" will grow as my rom collection grows. I feel it's unnecessary to hammer the Screenscraper online service for items that I know will not produce a match. My question is whether it's possible to instruct Skyscraper to skip a certain set of files (preferably based on the report that it writes at the end, skipped-nes-screenscraper.txt).

      If it's possible - what would your recommended workflow be for accomplishing this?

      I have been reading through the CLI documentation, where I found that one can use --fromfile as input, but only as a means to tell Skyscraper what to scrape. I'm looking for the inverse option: how to tell Skyscraper what not to scrape.

      The only option I have concluded that might work (haven't tried yet!) is to manually add entries into gamelist.xml and instruct Skyscraper to "skip existing entries". I guess that would work, but there might be a smarter/better way to accomplish this.

      What are your thoughts? :)

      • sleve_mcdichael

        Perhaps with the --excludefiles "PATTERN1,PATTERN2" option?

        https://github.com/muldjord/skyscraper/blob/master/docs/CLIHELP.md#--excludefiles-pattern1-pattern-2

        • TomFury @sleve_mcdichael

          @sleve_mcdichael said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:

          Perhaps with the --excludefiles "PATTERN1,PATTERN2" option?

          https://github.com/muldjord/skyscraper/blob/master/docs/CLIHELP.md#--excludefiles-pattern1-pattern-2

          Aha, that could be useful. I had only been thinking of that option as a generic pattern-matching expression, but of course the patterns could be more literal. I would then have to maintain a list of all these files, where each pattern is the complete file name. Like this perhaps:

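          # The comma inside "Duel, The ..." is escaped as \, so it is not read as a pattern separator (per the --excludefiles help text)
          Skyscraper -p nes -s screenscraper --excludefiles "Duel\, The by Bokudono (PD).nes,Air (SMB1 Hack).nes,Amiga! Demo (PD).nes,Atomic Robo-Kid Demo (PD).nes"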
          

          Perhaps this could be done in a wrapper script (platform and scraper module could be passed as input), with the list extended as my set of excluded roms grows:

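          #!/bin/bash
          # No embedded quotes here: "$EXCLUDED_FILES" is already quoted below, so extra \" would become part of the pattern.
          # Commas inside a file name are escaped as \, (per the --excludefiles help text).
          EXCLUDED_FILES="Duel\, The by Bokudono (PD).nes,Air (SMB1 Hack).nes,Amiga! Demo (PD).nes,Atomic Robo-Kid Demo (PD).nes"
          Skyscraper -p nes -s screenscraper --excludefiles "$EXCLUDED_FILES"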
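
The exclusion list could also be generated instead of maintained by hand. Below is a minimal sketch, assuming the skipped report (skipped-nes-screenscraper.txt) holds one rom per line, possibly with a directory in front; that format is an assumption and is not confirmed in this thread. `build_exclude_pattern` is a hypothetical helper name, and the `\,` escaping follows the --excludefiles help text.

```shell
#!/bin/bash
# build_exclude_pattern FILE  (hypothetical helper, not part of Skyscraper)
# Assumes FILE lists one rom per line, with or without a directory part.
build_exclude_pattern() {
  # 1) drop empty lines
  # 2) strip any leading directory so only the file name remains
  # 3) escape commas as \, so they are not read as pattern separators
  # paste then joins all lines into one comma-separated string.
  sed -e '/^$/d' -e 's|.*/||' -e 's/,/\\,/g' "$1" | paste -sd, -
}
```

It might then be called as `Skyscraper -p nes -s screenscraper --excludefiles "$(build_exclude_pattern skipped-nes-screenscraper.txt)"`. File names that contain wildcard characters such as `*` would still be read as patterns.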
          
          • YFZdude @TomFury

            @tomfury
            If it is not already an option perhaps a request could be made to add a feature to skyscraper to pull a list of excludes from a text file.

            EDIT: Or if it doesn't put a wrench in your setup, you could just add a tag like [exc] to all the rom files you want to skip and then match that one pattern.
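
A minimal sketch of the tagging idea, assuming a bash shell. `tag_rom` is a hypothetical helper and the tag text is arbitrary. It renames `Name.ext` to `Name [exc].ext` and does not overwrite an existing file.

```shell
#!/bin/bash
# tag_rom FILE [TAG]  (hypothetical helper, not part of Skyscraper)
# Renames "Name.ext" to "Name TAG.ext"; TAG defaults to "[exc]".
tag_rom() {
  local file="$1" tag="${2:-[exc]}"
  local dir="." base new
  # Split the path into directory and file name.
  [[ "$file" == */* ]] && dir="${file%/*}"
  base="${file##*/}"
  # Insert the tag before the extension, or append it if there is none.
  case "$base" in
    *.*) new="${base%.*} $tag.${base##*.}" ;;
    *)   new="$base $tag" ;;
  esac
  # -n: never overwrite an existing target file.
  mv -n -- "$file" "$dir/$new"
}
```

A single pattern along the lines of `--excludefiles "*[exc]*"` would then cover all tagged files. Whether square brackets are matched literally or need escaping in Skyscraper's patterns is not confirmed here, so a bracket-free tag may be the safer choice.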

            • TomFury @YFZdude

              @yfzdude said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:

              @tomfury
              If it is not already an option perhaps a request could be made to add a feature to skyscraper to pull a list of excludes from a text file.

              EDIT: Or if it doesn't put a wrench in your setup, you could just add a tag like [exc] to all the rom files you want to skip and then match that one pattern.

              Yes, I will make a request (at the Github project I believe?).

              Clever suggestion though! That's a better option than my first suggestion. Follow-up question/thought on that: If I change the filename for a rom, then I would have to manually edit the gamelist.xml (and cache/<platform>/quickid.xml) to keep the mappings intact?

              • mitu (Global Moderator) @TomFury

                @tomfury said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:

                Yes, I will make a request (at the Github project I believe?).

                You can also ask in this topic.

                • YFZdude @TomFury

                  @tomfury said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:

                  If I change the filename for a rom, then I would have to manually edit the gamelist.xml (and cache/<platform>/quickid.xml) to keep the mappings intact?

                  You can do that if you want to keep details in the gamelist.xml, such as play count, intact.

                  Skyscraper should just add new lines for the new names in the quickid.xml, and the old names would simply become irrelevant. I recently transferred the quickid.xml files from one system to another, along with the cache folder, to avoid having to re-scrape my roms. Then I realized I had some different rom files on the second system. Skyscraper just made extra lines for the files it didn't recognize and didn't try to scrape the ones that no longer existed.

                  • kiro @TomFury

                    @TomFury I've always been puzzled as to why scraping 'unknown' roms should be penalized... This is why I removed this from my scraper. If it is unknown, it is unknown and should be taken as input into the DB, rather than penalizing the consumer of the DB for that. It's so weird...

                    • sleve_mcdichael @kiro

                      @kiro said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:

                      @TomFury I've always been puzzled as to why scraping 'unknown' roms should be penalized... This is why I removed this from my scraper. If it is unknown, it is unknown and should be taken as input into the DB, rather than penalizing the consumer of the DB for that. It's so weird...

                      So if you f--- up and accidentally start to scrape your...I dunno, archive of 9,000 gas station receipts or something, instead of your ROM collection, there's no point at which it goes "you know, all 200 out of the first 200 files we've tried haven't worked, like, at all, are we sure these are actually rom files and not, like, receipts or something?"

                      It's not "penalizing you" it's "protecting the server from you."

                      • kiro @sleve_mcdichael

                        @sleve_mcdichael Well, there are other ways to protect the server, but again, this is a choice, and if you scrape 20000 files of rubbish, my backend won't feel the heat. After all, it is just looking up a SHA: it either exists or not, and a lookup that fails costs the same as one that succeeds... Am I making sense?

                        I just did a test where 250k roms were scraped in around 5 hours (downloading screenshots, videos and marquees where available), so the system seems to be performant enough, I guess...

                        • sleve_mcdichael @kiro

                          @kiro I'm just saying, for the scraper to abort after so many failed attempts is not done as a slight to the user, but as a courtesy to the host whose free services we are using. I don't care if your backside "feels" it or not, I'm sure the server would rather you not. Someone else could be using that spot.

                          • kiro @sleve_mcdichael

                            @sleve_mcdichael Yes, and that's why I created my own backend/frontend....so it is not impacting anyone but me..(and that's why I'm not ready to release the source code until I'm certain my backend is safe enough)

                            • sleve_mcdichael @kiro

                              @kiro sorry I was confused. So you serve your own data then? I thought it was using screenscraper.fr.

                              • kiro @sleve_mcdichael

                                @sleve_mcdichael Exactly, I couldn't override any rules that screenscraper may set in their APIs, such as limits or other restrictions.
