Versatile C++ game scraper: Skyscraper
-
Quick update: I am hopeful I've FINALLY tracked down where the data gets mixed up between games. As I've mentioned earlier, I was at a loss as to where this could happen, since all of my game data structures are in-scope and get thrown out before moving on to the next game. So far I had thought the culprit was my local database, since I share it between threads. But IT IS NOT!!! After much testing and COMPLETELY removing the local database from my loop, it still happened!
So what might be causing this? Well, there's a little thing called caching inside of Qt's own network request thingamabob. And as it turns out, that internal piece of data doesn't seem to always get cleared correctly before I make a new network request. And for some i-have-no-idea-how reason, it sometimes gets skewed and starts always returning the data from the previous game...
So am I 100% certain of this? No, I am not. But I just added code to clear the cache before each new network request, and since compiling it I've tested it 10 times (so far) and haven't been able to make it mix stuff up again. That's why I'm cautiously optimistic.
The longest success streak I was able to get before the cache clearing fix was 3 in a row. So now being at 10 without flaws makes me hopeful. I'll keep testing for the remainder of the day.
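To make the failure mode described above concrete, here is a minimal toy sketch in plain C++. It is not Qt's API and not Skyscraper's code: `ToyFetcher`, `fetch` and `freshReplyArrives` are hypothetical names, and the assumption that a stale reply is handed back when no fresh one arrives is only a model of the symptom. In real Qt code the analogous step might be a call such as `QNetworkAccessManager::clearAccessCache()`, but the thread does not say which call Skyscraper uses.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a network layer that keeps the last reply around.
// This is NOT Qt's API and NOT Skyscraper's code; it only models the symptom.
class ToyFetcher {
public:
    explicit ToyFetcher(bool clearBeforeRequest)
        : clearBeforeRequest_(clearBeforeRequest) {}

    // Simulates one request for 'game'. When 'freshReplyArrives' is false the
    // fetcher hands back whatever is still sitting in its internal buffer.
    std::string fetch(const std::string &game, bool freshReplyArrives) {
        assert(!game.empty());
        if (clearBeforeRequest_) {
            lastReply_.clear(); // the fix: drop the previous reply first
        }
        if (freshReplyArrives) {
            lastReply_ = "data for " + game;
        }
        return lastReply_;
    }

private:
    bool clearBeforeRequest_;
    std::string lastReply_;
};
```

With `ToyFetcher buggy(false)`, a request for "Game B" whose reply never arrives returns the data for "Game A". With `ToyFetcher fixed(true)` the same call returns an empty string, which the caller can at least detect instead of silently storing the wrong game's data.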
If all goes well I'll release 2.3.5 with a fix before I go full caveman the next two weeks without internet.
-
Thanks for being fast. I was thinking this could be a very weird error that would keep it broken for weeks... xD
Maybe this explains the strange broken PNGs I was getting sometimes; if the buffer didn't clear fully, things could get mixed up.
Good work man!
-
So I just got access to MobyGames' API as well, that's pretty cool! They have pretty high restrictions on request usage (which is absolutely understandable), so when I include it, it can mostly be used for command line scraping of just a few games that might not be found by the other modules. It won't be in 2.3.5 though, but I just wanted to let you know the good news.
2.3.5 is almost ready now btw. I've been testing it like crazy since yesterday and I can't get it to mix up game data anymore. So that's good!
-
PLEASE UPDATE ASAP!!! Quick note: I urge you all to completely remove your "[homedir]/.skyscraper/dbs" folder as it contains faulty data. If you don't it will just reuse those data and it will appear as if the mixed entry bug hasn't been fixed at all!
Skyscraper version 2.3.5 released: https://github.com/muldjord/skyscraper
- IMPORTANT: Fixed bug that caused resources to be mixed up between games because Qt's network cache wasn't cleared (Probably not a Qt bug, but a DAMN hard bug to spot either way). All previous Skyscraper releases have this bug, SO PLEASE UPDATE!!!
- Made 'scummvm' parsing look for config in homedir as well ('.scummvmrc')
- Now always removes brackets from returned titles
- Now always shows current scraping module in output
- Optimized search matching even more
- No longer asks user about skipping entries if filenames are provided on command line
This release should make mixed entries a thing of the past. I thought this bug was related to the local database cache (and I thought I had already fixed it). But it was not! It was related to Qt's internal network requests caching their own data and sometimes returning the data of the previous request. After I included a full cache clear on it, I can no longer make it mix up entries. I've tested it well over 15 times on thousands of entries over the last two days. Before this fix I got the problem within 3 tries. I haven't been able to since.
I've also improved the search matching even more, which should bring back some positives that were previously filtered out. Quite a lot, actually. With that said, there will still be the occasional false positive. It can't be prevented 100%. If you are very strict about this, please remember that you can set the "Minimum search percentage match" with the "-m" command line parameter.
If you experience problems with this release, please let me know. Also, I will be without internet for the next couple of weeks, so I won't be able to provide support or comment.
Please update asap! And happy scraping! :)
-
@muldjord Did you make an enquiry to add this scraper to the standard installation of RetroPie?
-
@cyperghost No, I'm not sure how that works. Someone told me to contact someone on the forum, but I haven't had the time to look into it yet.
-
@muldjord Thanks for the update. Quick question: Is ~/skyscraper the new homedir? Before it was ~/.skyscraper. Just want to be sure.
-
@analoghero Homedir is still "~/.skyscraper". I can see the confusion. Perhaps I should change the github instructions a bit.
EDIT: Changed the instructions to "skysource" instead of "skyscraper". It has no functional difference; it's only for clarity, to avoid confusion.
-
@muldjord I've read the documentation but couldn't find this: can I change the folder of the localdb? I am running out of space on my SD card. I already have the ROMs on a mounted external HD, so I thought I could move the localdb to the HD and somehow, in the config file or elsewhere, tell Skyscraper to use the localdb on the HD. Thanks!
-
@analoghero Thanks, I could try a little hacking (modifying the source) to see if I can force the localdb onto the mounted HD. It's beyond my coding skills, but having a Windows port would make this a lot easier. I tried Cygwin, but as muldjord told me, that is not going to work so easily.
-
@bleuge The ability to define the path for the localdb is an option I would also like to see. Great for those of us who use USB sticks and HDDs.
-
@bleuge @Rion Not sure how you are running Skyscraper, but there is a -d switch which may be of interest. If you run the command Skyscraper --help, you can see all the available options.
Here is the relevant output for the -d switch:
-d <folder> Set local resource database folder. (default is '[homedir]/.skyscraper/dbs/[platform]')
-
@dudleydes I should have checked before I wrote. Looks like it isn't hardcoded then.
-
@dudleydes Thank you for pointing that out. 😀
-
Edit - Reference @bleuge request to change the localdb location due to limited space on the SD...as well as my diatribe below:
As suggested, I've "solved" this by mounting a network store in the .skyscraper/dbs folder (via fstab mod, although recommended is through autostart.sh)
This gives the best solution for me, as it caches the downloaded media files in the NAS storage for when I'm home/on my LAN. That's where I would be doing any updates to the scraping anyway. When I take the Pi with me, I don't need the cached files, just those I'm actually using through the final "Skyscraper localdb" command.
I believe the other comments and readme docs mean you could put the localdb on either a separate USB attached drive, or potentially a network share.
However...is it acceptable to just delete the localdb files once you've updated? Assuming you understand the risk that they (obviously) aren't available anymore to reference if you decide to do a full update.
More specifically, when you run updates later for added ROMs, will it delete or corrupt the old information when you overwrite the XML if it can't find the media in the localdb?
In my scenario, with updating initially from import, I could have THREE copies of the same file on the SD. And if we're talking video, that's a lot of space.
1st copy: You import the video files to .skyscraper\import\videos
2nd copy: Run "Skyscraper -p [platform] -s import --videos" and the videos are all copied (and renamed) to the .skyscraper\dbs\[platform]\videos\import folder.
3rd copy: Run "Skyscraper -p [platform] -s localdb --videos" and the videos are all copied again (and renamed back to original) to the roms\[platform]\media\videos folder.
I assume I can delete the 1st copy once I'm done, but at minimum I still have two copies of the video.
Wouldn't a hardlink between the database and the actual video in the rom folder be better at saving space?
Regardless, as above, can I delete the localdb media files once they've been scanned and copied to the ROM folder without risk of when I update it says "oops, no file, better update the XML file to say it doesn't exist?"
Thanks - I tried to make this as clear as possible.
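The hardlink idea raised above could look roughly like this in C++17 with std::filesystem. This is only a sketch of the technique, not how Skyscraper exports files; `linkOrCopy` and `demoLinkCount` are hypothetical names, and the file names in the demo are made up. Keep in mind that hard links only work within a single filesystem that supports them, so this would not help when the localdb sits on a NAS share or a separate drive. In that case the sketch falls back to a regular copy.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Makes 'target' point at the same on-disk data as 'source' instead of
// copying it. Falls back to a regular copy when a hard link cannot be made
// (for example across filesystems, or on FAT-formatted media).
// Returns true if a hard link was created, false if the file was copied.
bool linkOrCopy(const fs::path &source, const fs::path &target) {
    assert(fs::is_regular_file(source));
    std::error_code ec;
    fs::remove(target, ec); // replace an existing target; errors are ignored
    fs::create_hard_link(source, target, ec);
    if (!ec) {
        return true;
    }
    fs::copy_file(source, target, fs::copy_options::overwrite_existing);
    return false;
}

// Small demonstration: writes a throwaway file in the temp directory, links
// it, and reports how many directory entries point at the same data.
std::uintmax_t demoLinkCount() {
    const fs::path dir = fs::temp_directory_path() / "hardlink_sketch";
    fs::remove_all(dir);
    fs::create_directories(dir);
    const fs::path cached = dir / "cached.mp4";     // stands in for the localdb copy
    const fs::path exported = dir / "exported.mp4"; // stands in for the ROM media copy
    { std::ofstream(cached) << "not really a video"; }
    linkOrCopy(cached, exported);
    const std::uintmax_t count = fs::hard_link_count(cached);
    fs::remove_all(dir);
    return count;
}
```

When the link succeeds, both names share one copy of the data, and deleting either name leaves the other intact. The space is only freed once the last name is removed.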
-
@timekills Sure, you can delete the dbs folder. I wouldn't recommend it though, as Skyscraper would need to download everything again if you ever want to rescrape.
Maybe you're better off zipping it and storing it elsewhere, so you can reuse it later.
Btw: What's the best way to get video files for your ROMs? No luck with Skyscraper atm.
-
@analoghero I figured as much, but thank you for confirming. It just seems to me there is a better way to store the database if you don't have a need to keep duplicates. Even a choice with a warning to remove once they've been transferred to the ROM folder (or whatever location). I understand anything short of keeping a copy, either local or on a remote localdb location means redownloading.
Regarding video download location, I don't have a great suggestion ATM because I'm fortunate enough to have downloaded all the video snaps over the years. My go-to is always EmuMovies, as I've had an account with them for years. All the way back to when they actually mailed me a full seven DVD set of snaps for everything they had at that time.
-
https://www.reddit.com/r/RetroPie/comments/828a0p/best_way_to_scrape_games_in_2018
Skyscraper wins ;-)
Also thanks for the -d option. I checked the help, but I don't know why I missed it.
-
@bleuge This makes me happy. :) Thanks to all of you for your support.
I've just arrived home from a trip to Africa. I'll probably take a couple days to rejoin the internet and then I'll get right back on Skyscraper duties. :)