Versatile C++ game scraper: Skyscraper
-
@Silent You need to use the --query parameter and provide the filename or checksum - look up the usage in the --help output.
-
@Silent Updated doc with info on how to scrape single or a subset of roms. It was already in the localdb doc, but I've added it to the main doc as well.
-
@Silent Just added the alias feature. In 2.9.2 you will be able to add lines to the file
~/.skyscraper/aliasMap.csv
with the following format:
filename without suffix;use this name instead
It will then override the name gotten from the filename and other sources and instead use "use this name instead" whenever a rom with that base name is scraped. This will of course not work for the screenscraper module (it's checksum based) but will work for all search query based modules such as thegamesdb.
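A minimal Python sketch of the lookup described above, assuming one "base name;alias" entry per line. Skyscraper itself is C++, so this is not its actual code: the function names, the skipping of "#" comment lines and the tolerance for surrounding double quotes are all assumptions made for illustration.

```python
from pathlib import Path


def parse_alias_map(text):
    """Parse 'base name;alias' lines into a dict (one alias per base name)."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines, '#' comments and lines without ';' are skipped.
        if not line or line.startswith("#") or ";" not in line:
            continue
        base, alias = line.split(";", 1)
        # Assumption: optional surrounding double quotes are tolerated.
        aliases[base.strip().strip('"')] = alias.strip().strip('"')
    return aliases


def search_name(rom_filename, aliases):
    """Return the alias for a rom's base name, else the base name itself."""
    base = Path(rom_filename).stem  # filename without its suffix
    return aliases.get(base, base)
```

With a made-up entry such as "Dirt 3;DiRT 3", search_name("Dirt 3.ml", aliases) would return the alias, while any file without an entry keeps its own base name. Because a plain dict is used, a later line for the same base name simply replaces the earlier one.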
-
@muldjord Awesome! Two things to nitpick if I may:
- --aliases doesn't seem to have made it to the final cut, but docs still refer to it.
- Shouldn't aliasMap be a multimap, in case multiple aliases are defined for a base ROM? Unless it's contractual that you can only have a single alias per ROM (which should be perfectly sufficient).
-
@Silent said in Versatile C++ game scraper: Skyscraper:
@muldjord Awesome! Two things to nitpick if I may:
- --aliases doesn't seem to have made it to the final cut, but docs still refer to it.
Removed. I've designed it so the "--aliases" option isn't necessary. It will always look up the name in the aliases file. So if you want to disable a name, just remove it from the file again.
- Shouldn't aliasMap be a multimap, in case multiple aliases are defined for a base ROM? Unless it's contractual that you can only have a single alias per ROM (which should be perfectly sufficient).
No. This gives you the option to have 1 alias, not several. Why would you need multiple aliases? This doesn't add a pass to the scraping, it just allows you to overwrite whatever name is fetched from the filename. So this will give you the option to fix your Dirt 3 problem for good.
-
Good news - that worked perfectly!
Bad news - "Hitman™" is still problematic, with an error message like this:
Couldn't calculate sha1 hash sum of rom file 'Hitman���.ml', please check permissions and try again, now exiting...
I think I'm just going to rename that one. Perhaps the Moonlight script should strip such odd characters from filenames.
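Stripping such characters could look roughly like the Python sketch below. It is only an illustration of the idea, not part of the Moonlight script or Skyscraper, and the function names are hypothetical. It simply drops everything outside the ASCII range.

```python
def strip_non_ascii(filename):
    """Drop every character outside the ASCII range from a filename."""
    return "".join(ch for ch in filename if ord(ch) < 128)


def planned_renames(filenames):
    """Map each filename that would change to its sanitized form."""
    renames = {}
    for name in filenames:
        clean = strip_non_ascii(name)
        if clean != name:
            renames[name] = clean
    return renames
```

For the file above, strip_non_ascii would turn the trademark-sign name into "Hitman.ml". The actual renaming (for instance with os.rename) is left out on purpose so the mapping can be checked first.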
-
Skyscraper 2.9.5 released: https://github.com/muldjord/skyscraper
- MAJOR: Added option "--purgedb vacuum" which vacuums all resources not related to your current romset. Remember to make backups of your cache before using this
- MAJOR: Added option "--purgedb all" that purges all resources for the selected platform. Remember to make backups of your cache before using this
- MAJOR: Added "--symlink" option which forces cached videos to be symlinked to destination instead of being copied when scraping with the "localdb" scraping module
- MAJOR: Added "esgamelist" emulationstation gamelist.xml scraping module. Contributed by "mgerhardy". Rewritten by me to better conform to Skyscraper design
- MAJOR: Added aliasMap.csv that forces the use of a title alias when searching for specific filenames
- Removed version bracket tag for Amiga lha files
- Improved getCompareTitle for mame games and lha files
- Code cleanup for sqrNotes
- Added the "ti99" platform. (Thank you to "jhbeskow" for suggesting it)
Lots of cool stuff in this release. A new platform was added (ti99). Several improvements to the --purgedb option made it in, which now allows you to vacuum (--purgedb vacuum) and clear out (--purgedb all) the cache for the currently selected platform.
On top of this the new esgamelist scraping module was also added that scrapes data and artwork from a local EmulationStation gamelist.xml. Originally contributed by mgerhardy, but I had to rewrite most of it to conform better with the Skyscraper code. The idea is that you can now use any scraper to create an EmulationStation gamelist.xml and then use the esgamelist module to import that data into Skyscraper's local database cache. When that's done, you can then make use of it by prioritizing it in the ~/.skyscraper/dbs/[platform]/priorities.xml file and scraping the platform with Skyscraper -p [platform] which defaults to scraping it with -s localdb where the data is stored. Should come in handy. :) So thanks to mgerhardy for providing the initial code and the idea!
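To give an idea of what such an import has to read, here is a small Python sketch that pulls a few fields out of an EmulationStation gamelist.xml. The path, name, desc and image tags are the usual gamelist.xml elements; the sample entry is made up, and the function is an illustration only, not how the esgamelist module is actually written.

```python
import xml.etree.ElementTree as ET

# Made-up sample entry for illustration.
SAMPLE = """<gameList>
  <game>
    <path>./Example Game.zip</path>
    <name>Example Game</name>
    <desc>A made-up entry for illustration.</desc>
    <image>./media/Example Game.png</image>
  </game>
</gameList>
"""


def read_gamelist(xml_text):
    """Return one dict per <game> element with a few common fields."""
    root = ET.fromstring(xml_text)
    entries = []
    for game in root.findall("game"):
        entries.append({
            "path": game.findtext("path"),
            "name": game.findtext("name"),
            "desc": game.findtext("desc"),
            "image": game.findtext("image"),  # None when the tag is missing
        })
    return entries
```

Calling read_gamelist(SAMPLE) gives a single entry whose "name" is "Example Game"; fields that are missing from a game simply come back as None.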
I've also added the new file ~/.skyscraper/aliasMap.csv which is a lookup file that allows you to create an alias for any filename (thanks to silent for suggesting this). So if you're having trouble scraping a game called "Game Name Which Is Waaaaaaay Too Long" you can try adding an alias for it to, for instance, just be "Game Name". The file will be created the first time you run Skyscraper after updating to this version. The file contains instructions on how to use it. I will also document it at some point on GitHub. Maybe later today.
Last of the more prominent features is the --symlink option which will simply symlink any videos from the cache instead of copying them when creating the gamelist for the selected frontend. This will save space, but comes with the caveat that if you remove the video from the cache, the link will break! So please be aware of this. :)
Let me know what you think of these new features.
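The symlink caveat can be reproduced with a few lines of Python. This is a generic sketch of copy versus symlink behaviour on a system that allows symlinks, not Skyscraper's code; place_video and demo are hypothetical names and the file is a dummy.

```python
import os
import shutil
import tempfile


def place_video(cached, dest, symlink=False):
    """Copy a cached video to dest, or symlink it when symlink=True."""
    if os.path.lexists(dest):
        os.remove(dest)
    if symlink:
        os.symlink(os.path.abspath(cached), dest)
    else:
        shutil.copy2(cached, dest)


def demo():
    """Return (link_still_works, copy_still_works) after the cache file is removed."""
    with tempfile.TemporaryDirectory() as tmp:
        cached = os.path.join(tmp, "cached.mp4")
        with open(cached, "wb") as fh:
            fh.write(b"dummy video data")
        link = os.path.join(tmp, "link.mp4")
        copy = os.path.join(tmp, "copy.mp4")
        place_video(cached, link, symlink=True)
        place_video(cached, copy)
        os.remove(cached)
        # os.path.exists follows links, so a dangling link reports False.
        return os.path.exists(link), os.path.exists(copy)
```

After the cached file is deleted the copy is still there, while the link points at nothing.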
Merry Christmas if that's your thing, and happy scraping!
-
Interesting - I've been scraping my newly purchased games via Moonlight, and I ran into a case where:
- Game file was called "Worms W.M.D"
- thegamesdb scraped it fine, mobygames did not because there it is called "Worms: W.M.D."
- I therefore added "Worms W.M.D";"Worms: W.M.D." to aliasMap and scraped again - this time both scrapers worked fine.
In other words, case:
Compare title: 'Worms: W.M.D.' Result title: 'Worms W.M.D'
works fine, but the opposite returns no matches for mobygames.
Can I find out if it's mobygames not matching this query or Skyscraper rejects it because it's not similar enough?
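To make the two cases concrete: "the source returned nothing" and "a result came back but was judged too different" are separate outcomes. The Python sketch below shows one generic way to compare titles. It is not Skyscraper's matching algorithm, and the normalization rule and the 0.8 threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(query, candidates, threshold=0.8):
    """Return the most similar candidate, or None if nothing is close enough."""
    if not candidates:
        return None  # the source returned no results at all
    best = max(candidates, key=lambda c: similarity(query, c))
    # A result exists but may still be rejected as too dissimilar.
    return best if similarity(query, best) >= threshold else None
```

Under this normalization "Worms W.M.D" and "Worms: W.M.D." compare as identical, so with a matcher like this a miss would point at the source's search rather than at the comparison step.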
-
@Silent said in Versatile C++ game scraper: Skyscraper:
Can I find out if it's mobygames not matching this query or Skyscraper rejects it because it's not similar enough?
It says so in the output.
-
This looks awesome! I can't wait to try this scraper out. Do you have any plans to include video scraping options? It would be awesome if you could specify a "maximum video length" and have the scraper cut the videos to that length after download, and also fix the format / codec, etc. I have a script that does it using ffmpeg, but haven't been able to make something that will run on non-Windows.
-
@PC No current plans to expand the video functionality. Currently it relies entirely on the video that the source delivers. But it's, as you mention, pretty easy to mass convert the videos. You can easily just convert all of the videos in the cache with ffmpeg to suit your needs.
I could implement some ffmpeg calls that simply run those commands, but it's an "ugly" solution codewise. So I think I'll let other tools handle this.
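For anyone who wants to do the conversion with an external tool, a cross-platform Python wrapper around ffmpeg might look like the sketch below. It assumes ffmpeg is on the PATH and was built with libx264 and AAC support, that the videos are .mp4 files, and that the source and destination directories are whatever you pass in; none of this is part of Skyscraper.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst, max_seconds):
    """Build an ffmpeg command that re-encodes src and caps its length."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", str(src),          # input video
        "-t", str(max_seconds),  # stop writing output after this many seconds
        "-c:v", "libx264",       # re-encode video as H.264
        "-c:a", "aac",           # re-encode audio as AAC
        str(dst),
    ]


def convert_all(src_dir, dst_dir, max_seconds=30):
    """Convert every .mp4 in src_dir into dst_dir (needs ffmpeg on PATH)."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.mp4")):
        cmd = build_ffmpeg_cmd(src, dst_dir / src.name, max_seconds)
        subprocess.run(cmd, check=True)
```

Writing the results to a separate directory keeps the original videos untouched until the output has been checked.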
-
Mild request: Could --nobrackets be ignored when processing imported data? It should be fair to assume that "User knows better" and stripping brackets there makes no sense.
Use case: User wants to have (U) [!] {BESTVERSION} stripped but still keep some brackets, e.g. Gran Turismo 2 (Arcade). With this, it could just be defined in an import.
-
@Silent It doesn't work like that (it's not source filtered), so that would be a no.
-
I just had a realization (and a case where I tested this) - for Screenscraper module, it's taking checksums of .cue files, which is making queries extremely sensitive to filename changes. Have you considered adding a pass using checksums of corresponding .bin files?
Yes, I realize it would be painfully slow to hash a big .bin file, so if anything that should be an opt-in option with a very clear "Please be patient, go get yourself some tea or do something useful for once" message.
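For reference, hashing a large file does not need much memory if it is read in chunks, it just takes time. A minimal Python sketch follows; the function name and chunk size are arbitrary, and this is not how Skyscraper itself computes checksums.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A .cue sheet names its .bin file in its FILE line, so renaming the .bin and updating the .cue changes the .cue's checksum even though the disc data is identical. Hashing the .bin itself avoids that.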
I also realize --query can handle this case - so if you're not a fan of this option (honestly, waiting minutes till hashes are done may be sub-optimal), maybe we could get something parallel to aliasMap, but for Screenscraper hashes? Some kinda "don't bother calculating, use this hash instead" file, so games with .cue and .bin files could be manually tailored like this.
EDIT:
I have great success scraping my PSX roms with --query so yes, IMO a hashMap.csv file identical to how aliases work would be great. Objectively better than making skyscraper hash .bin files together with .cue, as then 1) matching filenames would not be a concern 2) can hash files from PC and do it just once.
EDIT2:
Just so I don't double post, another unrelated idea - what do you think about allowing Moonlight .ml extensions (once it's part of RetroPie-Setup of course) for all emulators, like .zip and .7z are handled now? Technically you can stream any emulator to pi using it, so it'd be very logical to allow it to be scraped from everywhere. I just set up a ps2 system like this and it works wonders.
-
@Silent said in Versatile C++ game scraper: Skyscraper:
EDIT2:
Just so I don't double post, another unrelated idea - what do you think about allowing Moonlight .ml extensions (once it's part of RetroPie-Setup of course) for all emulators, like .zip and .7z are handled now? Technically you can stream any emulator to pi using it, so it'd be very logical to allow it to be scraped from everywhere. I just set up a ps2 system like this and it works wonders.
I wouldn't be opposed to that as long as it makes sense. But let's talk about that when we get there.
-
I am working towards the 3.0.0 release and I am planning a change which will clarify the gather and combine paths in Skyscraper. Basically the change is simply this:
Only when scraping with the localdb module will the artwork and game list generator be run.
In other words: When scraping with anything other than localdb, it won't save a gamelist.xml and composite the artwork. Ever.
So why am I changing this? For several reasons. I almost always personally forget to add --pretend whenever I gather data from any of the non-localdb modules. This means that hundreds or thousands of image files are written to disk and my gamelist.xml is overwritten with data from just that one source. Processing artwork slows down the scraping process significantly and hammers the SD card for no reason. And when using Skyscraper you should always scrape with localdb after having gathered data from any of the other modules anyways. I've outlined that a bit here.
With this in place I will of course also give the user better tools to exclude certain sources when scraping any platform. So if you want to scrape from the cache but only allow resources from one or more sources, you will be able to do that.
Please let me hear your thoughts on this change. I know some of you might be against this change, but please read the above before jumping to conclusions. :)
-
Those are very good changes! It IMO makes sense that scraping from online sources should not update your gamelists, as more often than not you'll want to scrape your brand new ROMs from multiple sources and then have localdb output the best of the best.
This change should make the difference between "scraping" and "generating gamelists" pretty well defined - admittedly, that was a tiny bit confusing for me when I started using skyscraper, but with this new behaviour it should be clear.
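As a rough picture of the "gather first, combine later" split, the Python sketch below merges data already cached from several sources by picking each field from the highest-priority source that has it. It uses plain dicts and made-up values; it does not reproduce the priorities.xml format or Skyscraper's real combining logic.

```python
def combine(cached, priorities):
    """Pick each field from the highest-priority source that provides it.

    cached:     {source_name: {field: value}}, as gathered earlier
    priorities: {field: [source_name, ...]}, best source first
    """
    result = {}
    fields = {field for data in cached.values() for field in data}
    for field in sorted(fields):
        listed = priorities.get(field, [])
        # Sources without an explicit priority are tried last, in name order.
        order = listed + [s for s in sorted(cached) if s not in listed]
        for source in order:
            value = cached.get(source, {}).get(field)
            if value:  # skip missing or empty values
                result[field] = value
                break
    return result
```

The gathering step only fills the cached structure; nothing is written out until the combining step runs, which is the separation the change is meant to enforce.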
-
Has anyone got the igdb scraping module to work? I have tried passing my credentials with the -u command line option in addition to trying to use config.ini. I have also tried using my api key and my userid:pw and can't get any combination to work. I only have the free account with igdb. Any suggestions are appreciated!
Thanks!
-
@sglavach All keys given out by igdb currently are APIv3 keys. I am in the process of converting the module to the new API, so until then it won't work.
-
I've been talking to the good people at IGDB and gotten some things cleared up. The "user-key" provided to me for the API is only meant for developers. As such, the 10k monthly limit should be applied to all Skyscraper users in total, not 1 key per user. So this will be changed to use a hidden key instead. The good thing is that people will then no longer need to supply their own to use it. The caveat is the limit obviously. But to keep the databases stable, we need to adhere to these things. And I certainly will with Skyscraper.