Versatile C++ game scraper: Skyscraper
-
Hi @muldjord, is it possible to "clean up" the media? Let's say I had a bunch of games that were scraped and looking good, but some time later they were deleted for being dupes, or I just didn't want them anymore. Is there an easy way to delete the media associated with those now-deleted games without clearing all of the media folders and cache, and scraping everything again? And can the cache be emptied if the games already have their media in the media folders after generating the gamelist? Because with the cache and media folders combined I believe there's a duplicate of every media file, correct? Thanks for the help.
-
@OldSchool Yes, but only from the cache. Please read the documentation for the --cache command here. Particularly the --cache vacuum option.
Concerning the ES media folder, Skyscraper can't and won't keep track of what is in it. Remember that Skyscraper supports multiple frontends. Keeping track of all that is mindbogglingly difficult. But as long as you haven't made manual changes and have kept your Skyscraper cache, you can just delete the media folder and regenerate the gamelist and media files. Then it will be "clean" again.
-
@muldjord Great! Appreciate the reply.
-
Found a bug in my mobygames cover download code that could result in an endless loop. 3.2.1 just released with this fixed, please update.
-
Skyscraper 3.2.2 released: https://github.com/muldjord/skyscraper
- Added 'bat' scripts to sha1 special handling list (please purge platforms using 'bat' files and rescrape)
- Now discards 'ZZZ(notgame)' results from ScreenScraper
- Fixed double-quote issue when reading titles from scummvm.ini
- Made location of scummvm.ini configurable in config.ini
- Fixed bug in roman and integer numeral conversion functions
- Rewrote the entire 'screenscraper' module to use JSON instead of XML
Fixed some bugs and rewrote the ScreenScraper module from the ground up. I checked out their JSON response format and decided it was a lot cleaner to work with than the XML format. Obviously, with a total rewrite, there can be issues. I've tested it both with unregistered use and with registered use and haven't found any issues beyond what would have happened when using the XML format.
Let me know if you run into problems.
Have fun!
-
Skyscraper 3.2.3 released: https://github.com/muldjord/skyscraper
- MAJOR: Added support for the 'Pegasus' frontend (set with '-f pegasus')
- Now checks for, and removes, double bracket notes in final game list title
- Fixed minor bug in the 3D gamebox effect renderer
- Completely transparent images are no longer saved when compositing (Thank you to metallkopf for getting me to finally fix this)
- Optimized the final game list assembling code to use game entry references instead of copies
- Optimized all cache resource iterations to use const references instead of copies
- Optimized the entire codebase by removing all Qt-centric foreach iterate-by-copy to use references instead
- Fixed bug where 'screenscraper' would only look for ESRB age classification
- Potential faulty JSON replies from Screenscraper are now saved to '~/.skyscraper/screenscraper_error.json' for easier debugging
Most prominent new feature is the support for the Pegasus frontend. Use it with -f pegasus. Keep in mind that the default artwork.xml is not well suited for the default Pegasus theme, so I suggest using this:
<?xml version="1.0" encoding="UTF-8"?>
<artwork>
  <output type="screenshot" width="640"/>
  <output type="cover" width="640" height="480">
    <layer resource="cover" height="480" align="center" valign="middle">
      <gamebox side="wheel" rotate="90"/>
    </layer>
  </output>
  <output type="wheel" height="200"/>
</artwork>
Which will give you this:
Please also remember to disable the EmulationStation linking in the Pegasus UI settings. And equally important, when a platform has been scraped, remember to add the metadata.pegasus.txt to Pegasus afterwards. The games won't show up until you do.
I've also done quite a bit of optimization resulting in a 10% speed boost when generating game lists. Oh, and completely empty and transparent images are no longer saved in the media folders.
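The copy-versus-reference change from the changelog above can be pictured with a small sketch. This is not Skyscraper's code: it uses std::vector and a made-up GameEntry type instead of the Qt containers, and gameEntryCopies is just a counter added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustration-only counter: how many GameEntry copies have been made.
int gameEntryCopies = 0;

// Hypothetical stand-in for a game entry.
struct GameEntry {
    std::string title;

    explicit GameEntry(std::string t) : title(std::move(t)) {}
    // Copy constructor that records every copy so the difference is visible.
    GameEntry(const GameEntry &other) : title(other.title) { ++gameEntryCopies; }
};

// Iterate by value: every loop step copy-constructs a GameEntry.
std::size_t totalTitleLengthByCopy(const std::vector<GameEntry> &entries) {
    std::size_t total = 0;
    for (GameEntry entry : entries) {
        total += entry.title.size();
    }
    return total;
}

// Iterate by const reference: same result, no copies.
std::size_t totalTitleLengthByRef(const std::vector<GameEntry> &entries) {
    std::size_t total = 0;
    for (const GameEntry &entry : entries) {
        total += entry.title.size();
    }
    return total;
}
```

Both functions return the same total; only the first one bumps gameEntryCopies once per entry, which is the kind of overhead that adds up when assembling large game lists.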
Have fun! And please report any problems you might run into. There's a lot of code rewrite going on at the moment, so some bugs could have snuck in.
-
A quick note to anyone reading this: There is currently a bug in the ScreenScraper API that causes some searches to return a faulty JSON document (the raw data format). I've reported this to ScreenScraper.
EDIT: They should have fixed this now, so if you still encounter JSON errors, please create a bug report at the Skyscraper GitHub and attach the screenscraper_error.json file as the Skyscraper output instructs. Thanks! :)
-
@muldjord Yeah, I stumbled upon this yesterday. Thanks for the announcement.
-
@muldjord Hey, thanks for the great work on Skyscraper. Just a quick question: is there a way to try scraping only missing games (skipping games already in the DB and cache)?
-
@dudumaroja Just disable refresh, and it will load all existing games from the cache (which is quite fast). Other than that, you can also use the --startat and --endat command line options to narrow the scope of the scraping run. :) But the most common mistake people make is to enable refresh without knowing what it means. Enabling refresh means that it will always fetch the data from the online source, even though it has already been cached.
-
@muldjord Excuse me for jumping in on that, but I just started using Skyscraper. Will it complete missing resources from already cached games? If, say, a game has everything cached but a Video, will SS download the Video on the next run (without refresh) if it becomes available?
I'm asking because AFAIK, Sselph's scraper which I used up until now doesn't do that for games that are already in the gamelist. It doesn't use a cache like SS does, though.
-
@Clyde No, if a game has previously provided a result for any resource, it won't fill in the holes unless you run a refresh on it. This is how the databases work. If you look up a game on any of the databases, it will give you most, if not all, of the info in one request. At that point it only makes sense to use all of it or not look up the game again at all.
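As a rough picture of that behaviour (a sketch under assumptions, not Skyscraper's actual code): ModuleCache, Resources and fetchOnline are made-up names, the cache here lives in memory only, and on refresh the sketch simply replaces the whole cached result.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical: everything one scraping module returned for one game,
// stored as resource type -> value (e.g. "screenshot" -> "shot.png").
using Resources = std::map<std::string, std::string>;

// Hypothetical per-module cache, keyed by game id.
class ModuleCache {
public:
    // fetchOnline stands in for the single request to the online database.
    explicit ModuleCache(std::function<Resources(const std::string &)> fetchOnline)
        : fetchOnline_(std::move(fetchOnline)) {}

    // Without refresh, a game that already has ANY cached result is served
    // from the cache as-is, holes included. With refresh, the whole result
    // is fetched again and replaces what was cached.
    const Resources &scrape(const std::string &gameId, bool refresh) {
        const auto it = cache_.find(gameId);
        if (!refresh && it != cache_.end()) {
            return it->second;
        }
        cache_[gameId] = fetchOnline_(gameId);
        return cache_[gameId];
    }

private:
    std::function<Resources(const std::string &)> fetchOnline_;
    std::map<std::string, Resources> cache_;
};
```

So if a video shows up at the source after the first scrape, a second call with refresh set to false still returns the cached result without it; only a call with refresh set to true goes back online and picks it up.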
The idea is to use the --cache edit option or the import module feature to fill in the holes. You can use --cache report:missing=<RESOURCE> to figure out what your games are missing. And you can use --cache edit --fromfile <REPORT FILE> afterwards to edit only the games missing any of the resources in the reports.
-
@muldjord Thanks very much. The refresh issue is more of a RetroPie menu problem, I guess. It was enabled by default for me (I never messed with it) and I didn't know about the option until you pointed it out to me. (I was even venturing to learn how to use your program without the retropie-setup menu, had just learned about --startat and was about to use it.)
Another question: I seem to have problems with modified ROMs (translations and fixes), but I can't seem to find an option to scrape these ROMs by name.
Could you provide me with an example of how to do it? I'm using the proper No-Intro names, but the hashes are messed up.
And again, thanks for the great work. I loved the idea of having a cache for personal re-scraping. I set up a network share so it won't waste space on my SD card or USB drive, and it works amazingly. I wish there was a tool so we could access the cache data and make alterations ourselves (and maybe send it back later to ScreenScraper).
-
@muldjord Thanks for the explanation. So, a game can only have all of its resources from one source, which will be overwritten by one single other source on a refresh?
The source in brackets when scraping subsequently (e.g. Screenshot: YES (screenscraper)) made me hope that you can have resources from different sources.
If I do understand you correctly, why can't SS fill in the gaps locally even if the online databases give you all resources in one request? The resources are saved in separate files, after all.
I'm still hoping that I misunderstand you. ;)
-
@Clyde said in Versatile C++ game scraper: Skyscraper:
@muldjord Thanks for the explanation. So, a game can only have all of its resources from one source, which will be overwritten by one single other source on a refresh?
No, it can have one of each resource per scraping module. Please check out the documentation; it's all in there.
The source in brackets when scraping subsequently (e.g. Screenshot: YES (screenscraper)) made me hope that you can have resources from different sources.
It does. :) It even allows for user resources on top of that with the --cache edit option, where you can add new textual resources yourself that will be prioritized above all other resources. You can also prioritize the resources from all of the other databases with the priorities.xml files per platform. It's also documented, you'll find it. :)
If I do understand you correctly, why can't SS fill in the gaps locally even if the online databases give you all resources in one request? The resources are saved in separate files, after all.
It does, but there's no difference between filling in the holes and filling in everything. The point here being that if you are refreshing the data for a game, you will always be interested in filling in everything, since everything is already part of the data from the servers. Only filling in the holes would be a huge waste of database bandwidth.
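To picture the "one of each resource per scraping module" idea together with the prioritization mentioned above, here is a minimal sketch. It is only a model: pickResource, ResourceBySource and the "user" key are made-up names, and it does not reproduce the real priorities.xml format or Skyscraper's own selection code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: for ONE game and ONE resource type (say, the description),
// maps the source name to the value cached from that source.
using ResourceBySource = std::map<std::string, std::string>;

// Picks the value to use: a "user" entry wins if present; otherwise the first
// source in the priority order that has a value. Returns an empty string if
// no source has one.
std::string pickResource(const ResourceBySource &cached,
                         const std::vector<std::string> &priorityOrder) {
    const auto user = cached.find("user");
    if (user != cached.end()) {
        return user->second;
    }
    for (const std::string &source : priorityOrder) {
        const auto it = cached.find(source);
        if (it != cached.end()) {
            return it->second;
        }
    }
    return std::string();
}
```

With values cached from both screenscraper and mobygames and a priority order that lists mobygames first, the mobygames value is picked; as soon as a "user" value exists, it wins regardless of the order.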
-
@dudumaroja said in Versatile C++ game scraper: Skyscraper:
@muldjord Thanks very much. The refresh issue is more of a RetroPie menu problem, I guess. It was enabled by default for me (I never messed with it) and I didn't know about the option until you pointed it out to me.
I'm pretty sure it's not enabled by default unless @mitu changed that. It shouldn't be.
Another question: I seem to have problems with modified ROMs (translations and fixes), but I can't seem to find an option to scrape these ROMs by name.
Depends on the scraping module. Some of the modules use a file name search based query (and will just do so, you don't need to do anything special to enable it), others use checksums. You can read more about all of them here
And again, thanks for the great work. I loved the idea of having a cache for personal re-scraping. I set up a network share so it won't waste space on my SD card or USB drive, and it works amazingly. I wish there was a tool so we could access the cache data and make alterations ourselves (and maybe send it back later to ScreenScraper).
You're welcome. There are plenty of options to edit the cache. :) Check out the --cache help output from a terminal. It can't send it back to the sources though. You can also use the -s import module to add your own artwork resources. That's also well-documented.
-
@muldjord Thanks again – for the information and your very polite RTFM. :) I admit that asking here seemed more convenient than looking it up by myself, which I will do now with the proper amount of guilty conscience.
I also want to thank you for this great piece of work and your ongoing support. Like @dudumaroja, I especially like the caching concept.
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
I'm pretty sure it's not enabled by default unless @mitu changed that. It shouldn't be.
Yes, it's not enabled by default - when the configuration is first generated, the refresh is disabled - https://github.com/RetroPie/RetroPie-Setup/blob/922149b09dfab04408e133114bcb2a97cf7ed87c/scriptmodules/supplementary/skyscraper.sh#L358.
-
@Clyde Yes, caching the data is extremely important for any scraper if you ask me. :) Have fun with it!
@dudumaroja, see mitu's answer above. :)
-
Skyscraper 3.3.0 released: https://github.com/muldjord/skyscraper
- MAJOR: File identification now uses new quick id method for up to 75% faster processing (Thank you to 'langest' for finally making me look into this)
- Added 'pc98' platform (Thank you to 'leosmeira' for suggesting it)
- Added 'pokemini' platform (Thank you to 'leosmeira' for suggesting it)
- Renamed all 'sha1' file id's to 'id' as sha1 was misleading
- Changed relevant defines to constexpr
- ScreenScraper now always prioritizes the 'video-normalized' above 'video' (Thank you to 'JuanVCS' for suggesting this)
- Fixed bug in ScreenScraper retry code which made it retry more than necessary
Most prominent feature is the new quick id concept. So far Skyscraper has calculated the internal cache id's from the data of the roms every time you run Skyscraper. This seemed like a lot of work for files that rarely change, and takes a long time to do, especially for the larger roms such as n64 roms. So, I've made a new system called quick id which is basically id caching that works seamlessly with the cache id system. Here's how it works:
- Whenever a file is processed for anything using the cache id, the cache id is calculated using the "slow" method. And at this time, the file path and file modified time are also saved in a quick id xml file. Then, the next time this particular file needs processing, it will first compare the file path and modified time from the quick id list to the actual file. If the file is still located at the same place and hasn't been modified since last run, it will look up the cache id from the quick id list.
So everything will be "as slow as usual" the first time you process any file, since it is then forced to use the "slow" id technique as it doesn't yet know the id. But for any subsequent processing of that particular file, you will see up to a 75% improvement in speed (the larger the roms, the speedier it will be). A great platform to test it on is the n64 platform. But remember, after updating, you won't see the speed improvement until you've processed the files one way or another once so the id's get cached in the quick id list.
A great way to test this is to update to 3.3.0, then regenerate the gamelist for any platform with Skyscraper -p <PLATFORM> and take note of how long it took. When that's done, do it again and take note of the time again. It should have taken significantly less time the second time you ran it, as it was now able to use the quick id feature.
The quick id list can be vacuumed together with the cache with the --cache vacuum option to avoid having lingering quick id's if you move files around or rename them.
A lot of code has changed in this version. I've fixed some silly bugs in the ScreenScraper module as well which made it retry more times than necessary. If you encounter any problems with any of the above, please let me know. As always I've been testing this quite a bit on my own systems, but it's impossible for me to test every use case.
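For anyone curious, the quick id lookup described above can be sketched roughly like this. It is a simplified model, not the code in 3.3.0: QuickIdList, QuickIdEntry and slowId are made-up names, the list is kept in memory instead of an xml file, and the modified time is passed in by the caller rather than read from disk.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical record: what is remembered about a file after the slow pass.
struct QuickIdEntry {
    std::int64_t modifiedTime; // file's modified time when the id was calculated
    std::string cacheId;       // id calculated from the file data ("slow" method)
};

// Hypothetical quick id list, keyed by file path.
class QuickIdList {
public:
    // slowId stands in for the expensive id calculation that reads the rom data.
    explicit QuickIdList(std::function<std::string(const std::string &)> slowId)
        : slowId_(std::move(slowId)) {}

    // Returns the cache id for a file. The stored id is reused when the path is
    // known and the modified time is unchanged; otherwise the slow method runs
    // and its result is stored for next time.
    std::string idFor(const std::string &path, std::int64_t modifiedTime) {
        const auto it = entries_.find(path);
        if (it != entries_.end() && it->second.modifiedTime == modifiedTime) {
            return it->second.cacheId; // quick path
        }
        const std::string id = slowId_(path); // slow path
        entries_[path] = QuickIdEntry{modifiedTime, id};
        return id;
    }

private:
    std::function<std::string(const std::string &)> slowId_;
    std::map<std::string, QuickIdEntry> entries_;
};
```

The first idFor call for a path pays for the slow calculation; later calls with the same path and modified time skip it, and a changed modified time or a moved file forces the slow method again.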