Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source
-
Hello!
This is a Skyscraper-specific question. I have been fiddling with my RetroPie setup and have fallen in love with the versatility of Skyscraper. I keep adding roms to my RetroPie installation, after which I run Skyscraper (CLI) to scrape the new entries. For example:
Skyscraper -p nes -s screenscraper
... and then generating a new gamelist.xml:
Skyscraper -p nes
This works great, but as I go I notice that there are roms that are never recognized (and that is fine - some of these are obscure and I do not expect to find a match at Screenscraper). Example output:
#51/79, (1/50)
Elapsed time : 00:05:23
Est. time left : 00:02:57
#52/79 (T1) Pass 1
---- Game 'Duel, The by Bokudono (PD)' not found :( ----
'screenscraper' requests remaining: 16946
#52/79, (1/51)
Elapsed time : 00:05:27
Est. time left : 00:02:49
#53/79 (T1) Pass 1
---- Game 'Air (SMB1 Hack)' not found :( ----
'screenscraper' requests remaining: 16946
#53/79, (1/52)
Elapsed time : 00:05:31
Est. time left : 00:02:42
#54/79 (T1) Pass 1
---- Game 'Amiga! Demo (PD)' not found :( ----
'screenscraper' requests remaining: 16944
#54/79, (1/53)
Elapsed time : 00:05:35
Est. time left : 00:02:35
#55/79 (T1) Pass 1
---- Game 'Atomic Robo-Kid Demo (PD)' not found :( ----
Now, this list of "allowed unknowns" will grow as my rom collection grows. I feel it's unnecessary to hammer the Screenscraper online service for items that I know will not produce a match. My question is whether it's possible to instruct Skyscraper to skip a certain set of files (preferably based on the report that the script provides at the end; skipped-nes-screenscraper.txt)?
If it's possible - what would your recommended workflow be for accomplishing this?
I have been reading through the CLI documentation, where I found that one can use --fromfile as input, but only as a means to instruct Skyscraper what to scrape. I'm looking for the inverse option: how to instruct Skyscraper what not to scrape.
The only option I have concluded that might work (haven't tried yet!) is to manually add entries into gamelist.xml and instruct Skyscraper to "skip existing entries". I guess that would work, but there might be a smarter/better way to accomplish this.
What are your thoughts? :)
-
Perhaps with the
--excludefiles "PATTERN1,PATTERN2"
option? https://github.com/muldjord/skyscraper/blob/master/docs/CLIHELP.md#--excludefiles-pattern1-pattern-2
-
@sleve_mcdichael said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:
Perhaps with the
--excludefiles "PATTERN1,PATTERN2"
option? https://github.com/muldjord/skyscraper/blob/master/docs/CLIHELP.md#--excludefiles-pattern1-pattern-2
Aha, that could be useful perhaps. I got locked into only using that option as a more generic pattern matching expression, but of course one could be a little bit more literal I guess. Then I would have to populate a list with all these files where the pattern would have to be the complete file name. Like this perhaps:
Skyscraper -p nes -s screenscraper --excludefiles "Duel, The by Bokudono (PD).nes,Air (SMB1 Hack).nes,Amiga! Demo (PD).nes,Atomic Robo-Kid Demo (PD).nes"
Perhaps this could be suitable to do within a wrapper script (platform and scraper module could be passed as input), and the list perpetually populated with more entries as my exclusion list of roms is growing:
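#!/bin/bash
# Comma-separated list of file names to exclude. No embedded quotes here: the variable is quoted once below.
# Caveat (unverified): a comma inside a name, as in "Duel, The ...", may need escaping as \, to avoid being read as a separator.
EXCLUDED_FILES="Duel, The by Bokudono (PD).nes,Air (SMB1 Hack).nes,Amiga! Demo (PD).nes,Atomic Robo-Kid Demo (PD).nes"
Skyscraper -p nes -s screenscraper --excludefiles "$EXCLUDED_FILES"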
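As a rough sketch of how the wrapper idea above could scale, the exclude list could live in a plain text file instead of inside the script. Everything here is an assumption rather than a documented Skyscraper feature: the helper names build_exclude_list and scrape_with_excludes are made up, the list file is assumed to hold one rom path or file name per line, and commas inside a name are assumed to need escaping as \, so they are not read as pattern separators. Only the -p, -s and --excludefiles options come from the posts above.

```shell
#!/bin/bash
# Sketch only: helper names and the list-file format are hypothetical.

# Print a comma-separated exclude list built from a text file ($1)
# that holds one rom path or file name per line.
build_exclude_list() {
    # 1) drop empty lines
    # 2) strip any leading directory part, keeping only the file name
    # 3) escape commas inside a name as \, (assumed escape syntax)
    # then join all lines with commas
    sed -e '/^$/d' -e 's|.*/||' -e 's/,/\\,/g' "$1" | paste -sd, -
}

# Run Skyscraper for a platform ($1) and scraping module ($2),
# excluding every file named in the list file ($3).
scrape_with_excludes() {
    Skyscraper -p "$1" -s "$2" --excludefiles "$(build_exclude_list "$3")"
}
```

With these functions pasted into a script or sourced into the shell, a call could look like scrape_with_excludes nes screenscraper excluded-nes.txt, where excluded-nes.txt is a hypothetical file you maintain by hand or copy from the skipped report, provided that report really lists one file per line.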
-
@tomfury
If it is not already an option, perhaps a request could be made to add a feature to Skyscraper to pull a list of excludes from a text file.
EDIT: Or, if it doesn't throw a wrench in your setup, you could just add a tag like
[exc]
to all the rom files you want to skip and then match that one pattern.
-
@yfzdude said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:
@tomfury
If it is not already an option perhaps a request could be made to add a feature to skyscraper to pull a list of excludes from a text file. EDIT: Or if it doesn't put a wrench in your setup, you could just add a tag like
[exc]
to all the rom files you want to skip and then match that one pattern.
Yes, I will make a request (at the GitHub project, I believe?).
Clever suggestion though! That's a better option than my first suggestion. Follow-up question/thought on that: If I change the filename for a rom, then I would have to manually edit the gamelist.xml (and cache/<platform>/quickid.xml) to keep the mappings intact?
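For what it's worth, the tagging idea could be sketched roughly like this. The helper name tag_rom is made up for illustration, and the sketch assumes every rom file name has an extension. Whether the square brackets in the pattern are matched literally by Skyscraper is also an assumption; check the --excludefiles documentation linked earlier in the thread before relying on it.

```shell
#!/bin/bash
# Sketch only: tag_rom is a hypothetical helper, not part of Skyscraper.

# Rename a rom file ($1) so its name carries an " [exc]" tag before the
# extension, e.g. "Air (SMB1 Hack).nes" -> "Air (SMB1 Hack) [exc].nes".
# Assumes the file name has an extension.
tag_rom() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    case "$base" in
        *"[exc]"*) return 0 ;;   # already tagged, nothing to do
    esac
    stem="${base%.*}"            # file name without the extension
    ext="${base##*.}"            # extension without the dot
    mv -- "$1" "$dir/$stem [exc].$ext"
}
```

After tagging, a single pattern might then cover all of them, along the lines of Skyscraper -p nes -s screenscraper --excludefiles "*[exc]*". Keep in mind that a renamed file looks like a new rom to the gamelist, which is what the follow-up question above is about.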
-
@tomfury said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:
Yes, I will make a request (at the Github project I believe?).
You can also ask in this topic.
-
@tomfury said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:
If I change the filename for a rom, then I would have to manually edit the gamelist.xml (and cache/<platform>/quickid.xml) to keep the mappings intact?
You can do that if you are interested in keeping details in the gamelist.xml, such as play count, intact.
Skyscraper should just add new lines for the new names in the quickid.xml, and the old names would simply become irrelevant. I recently transferred the quickid.xml files from one system to another along with the cache folder to avoid having to re-scrape my roms. Then I realized I had some different rom files on the second system, and Skyscraper just made extra lines for the files it didn't recognize and didn't try to scrape the ones that didn't exist anymore.
-
@TomFury I've always been puzzled as to why scraping 'unknown' roms should be penalized.... this is why I removed this from my scraper... if it is unknown it is unknown and should be taken as input into the DB, and not penalize the consumer of the DB for that... it's so weird...
-
@kiro said in Skyscraper: Recommended workflow for skipping roms that does not exist on scraper source:
@TomFury I've always been puzzled as to why scraping 'unknown' roms should be penalized.... this is why I removed this from my scraper... if it is unknown it is unknown and should be taken as input into the DB, and not penalize the consumer of the DB for that... it's so weird...
So if you f--- up and accidentally start to scrape your...I dunno, archive of 9,000 gas station receipts or something, instead of your ROM collection, there's no point at which it goes "you know, all 200 out of the first 200 files we've tried haven't worked, like, at all, are we sure these are actually rom files and not, like, receipts or something?"
It's not "penalizing you", it's "protecting the server from you."
-
@sleve_mcdichael Well, there are other ways to protect the server, but again, this is a choice, and if you scrape 20000 files of rubbish, my backend won't feel the heat. After all, it is just looking up a SHA: it either exists or not, and if not, the request costs the same as one where it exists... Am I making sense?
I just did a test where 250k roms were scraped in around 5 hours (downloading screenshots, videos and marquees where available), so the system seems to be performant enough, I guess...
-
@kiro I'm just saying, for the scraper to abort after so many failed attempts is not done as a slight to the user, but as a courtesy to the host whose free services we are using. I don't care if your backside "feels" it or not, I'm sure the server would rather you not. Someone else could be using that spot.
-
@sleve_mcdichael Yes, and that's why I created my own backend/frontend....so it is not impacting anyone but me..(and that's why I'm not ready to release the source code until I'm certain my backend is safe enough)
-
@kiro sorry I was confused. So you serve your own data then? I thought it was using screenscraper.fr.
-
@sleve_mcdichael Exactly, I couldn't override any rules that screenscraper may set in terms of limits or others in their APIs