Versatile C++ game scraper: Skyscraper
-
OK sounds like you have it covered. I look forward to trying it out.
-
Hey there,
Sorry for the late reply, the last couple of days have been quite busy and I have not had much time in front of the PC to continue my build!
I am sure it works fine now, but I will give it a test as soon as I can (should be able to do it later) and clarify it for you.
Cheers!
-
Skyscraper 2.9.0 released: https://github.com/muldjord/skyscraper
Update either from the RetroPie-Setup script or using the update procedure from the Github readme, depending on what version you are using.
Release notes:
- MAJOR: Now looks up Amiga lha files in the Amiberry "whdload_db.xml" and retrieves data from "openretro.org" based on the uuid from the xml
- Added search based fallback pass for Amiga when game isn't found via uuid
- Added "<scanlines>" effect to compositor. Check artwork documentation for more info (Thank you to "jakejm79" for suggesting this)
- "mobygames" module now uses https
- Fixed bug in "openretro" module where "developer" would potentially scrape wrong under certain circumstances
- Improved "description" scraping for "openretro" module
- Improved bracket tag handling for Amiga lha files A LOT
- Fixed minor pass 2 bug when using search based sources
- Added a pass for integer to roman conversion for search based sources (eg. "4" converts to "IV")
- "the" matching now uses regular expression for better precision
- Added "mame-*" platforms to "mameMap" name load list
This is a big update for the Amiga whdload crowd. The best Amiga database around is surely the OpenRetro database. Unfortunately the lha filenames used with whdload gave us a bit of a hard time. This is all a thing of the past now. With the help of HoraceAndTheSpider and Olly, we now have a database which maps pretty much all of the lha files directly to their data inside OpenRetro. And it all works seamlessly. Just pop those lha's in your roms/amiga folder and run "Skyscraper -p amiga -s openretro --refresh" and watch that glorious data stream in.
Lots of other stuff in this release as well. Some rather big improvements to the filename search based scraping. If a game is called "Game Name 6" and no results are returned, an extra pass will now be tried with "Game Name VI" automatically.
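A minimal sketch of what such an extra pass could look like, written in Python rather than Skyscraper's own C++. The function names and the cutoff of 20 are illustrative assumptions and are not taken from the Skyscraper source.

```python
import re
from typing import Optional

# Value/symbol pairs in descending order, including the subtractive forms.
_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Convert an integer in the range 1..3999 to a roman numeral."""
    if not 0 < number < 4000:
        raise ValueError("only 1..3999 can be converted")
    parts = []
    for value, symbol in _ROMAN:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def roman_variant(title: str) -> Optional[str]:
    """Return the title with standalone small integers written as roman
    numerals, or None when nothing changed (so no extra pass is needed)."""
    def replace(match):
        n = int(match.group())
        # Hypothetical cutoff: leave years and other large numbers alone.
        return to_roman(n) if 0 < n <= 20 else match.group()

    variant = re.sub(r"\b\d+\b", replace, title)
    return variant if variant != title else None
```

A caller would only try `roman_variant(title)` after the first search came back empty, and skip the extra pass when it returns `None`.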
Lastly, a new "<scanlines>" effect has been added to the compositor. It comes with two default overlays you can use. But it's also very customizable, so feel free to use your own overlay and fiddle with the compositing method to get the desired result. It's all documented here: https://github.com/muldjord/skyscraper/blob/master/ARTWORK.md#scanlines-effect-node-from-v290-optional
Happy scraping!
-
Regarding your last changes to Roman numerals handling and other permutations of title:
Do you think it makes sense to remove ™, ® and © (also maybe others) from titles prior to scraping? "Hitman™" currently fails to scrape miserably even though it has entries in databases:
https://thegamesdb.net/game.php?id=44303
However, that's also something which will be solved by the "title hints" feature you mentioned is on the roadmap at some point.
EDIT:
FYI, the numerals handling change fixed the way R.C. Pro-Am II is scraped (previously it'd scrape as R.C. Pro-Am, so the first and second game showed the same), so good job :D
-
@Silent Do you have ™, ® and © in your filenames? Or is it in the returned titles?
Title hints? I don't remember what that is. :S
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
Do you have ™, ® and © in your filenames? Or is it in the returned titles?
Filenames - as Hitman presents itself as Hitman™ and so that's how Moonlight saves it. However, I should just rename the file and be done with it - no point bothering you with something this minor.
@muldjord said in Versatile C++ game scraper: Skyscraper:
Title hints? I don't remember what that is. :S
https://github.com/muldjord/skyscraper/issues/77#issuecomment-446739139
This and your reply. At least the way I understood it is that you may consider defining title aliases for those games which are poorly matched (another use case I have for this: nothing in skyscraper detects "DiRT 3 Complete Edition", but "DiRT 3" works fine so I could define it as an alias).
EDIT:
"esgamelist" scraper sounds excellent! Will be super useful for cases where Skyscraper can't scrape the title fine but the ES inbuilt scraper does! The DiRT 3 issue I just mentioned is one such case, for example.
Thank you and mgerhardy!
-
@Silent I don't mention any custom aliases in that. All I mention is the already implemented automatic extra pass for "Game Name II" to "Game Name 2". But you can use the --query option to create custom search queries for special cases. https://github.com/muldjord/skyscraper/releases/tag/2.7.5
But it's not a bad idea to have some kind of aliases.csv file that you could put lines like this in:
"DiRT 3 Complete Edition";"Dirt 3"
Just one per line for instance (it's exactly what I already do for mame games). And it would then check that if you apply a "--aliases" command line option or something like that. I'll make that, should be useful.
-
Also call me stupid, but I don't see how to scrape a single rom. Can I just point to a specific file instead of a directory, like this?
Skyscraper -p nes -s screenscraper /home/pi/RetroPie/roms/nes/MyGreatGame.nes
-
@Silent You need to use the "--query" parameter and provide the filename or checksum - look up the usage in the "--help" output.
-
@Silent Updated doc with info on how to scrape a single rom or a subset of roms. It was already in the localdb doc, but I've added it to the main doc as well.
-
@Silent Just added the alias feature. In 2.9.2 you will be able to add lines to the file "~/.skyscraper/aliasMap.csv" with the following format:
filename without suffix;use this name instead
It will then override the name gotten from the filename and other sources and instead use "use this name instead" whenever a rom with that base name is scraped. This will of course not work for the "screenscraper" module (it's checksum based) but will work for all search query based modules such as "thegamesdb".
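As a rough illustration of the lookup described above, here is a sketch in Python (Skyscraper itself is C++, and this is not its code). It assumes one `base name;alias` pair per line, accepts the quoted form used elsewhere in this thread, and treats lines starting with `#` as comments; that last part is an assumption about the file, as are the function names.

```python
import csv
from pathlib import Path


def load_alias_map(lines):
    """Build a {base name: alias} dictionary from 'name;alias' lines."""
    aliases = {}
    for row in csv.reader(lines, delimiter=";", quotechar='"'):
        # Skip blank lines, malformed lines and (assumed) '#' comments.
        if len(row) != 2 or row[0].lstrip().startswith("#"):
            continue
        # One alias per base name: a later duplicate replaces an earlier one.
        aliases[row[0].strip()] = row[1].strip()
    return aliases


def search_name(rom_path, aliases):
    """Return the alias for a rom if one is defined, else its base name."""
    base_name = Path(rom_path).stem  # filename without its suffix
    return aliases.get(base_name, base_name)
```

With the file opened as `open(path, newline="")`, the file object can be passed directly as `lines`. A plain dictionary is enough here because each base name maps to exactly one replacement name.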
-
@muldjord Awesome! Two things to nitpick if I may:
- "--aliases" doesn't seem to have made it to the final cut, but docs still refer to it.
- Shouldn't aliasMap be a multimap, in case multiple aliases are defined for a base ROM? Unless it's contractual that you can only have a single alias per ROM (which should be perfectly sufficient).
-
@Silent said in Versatile C++ game scraper: Skyscraper:
@muldjord Awesome! Two things to nitpick if I may:
- "--aliases" doesn't seem to have made it to the final cut, but docs still refer to it.
Removed. I've designed it so the "--aliases" option isn't necessary. It will always look up the name in the aliases file. So if you want to disable a name, just remove it from the file again.
- Shouldn't aliasMap be a multimap, in case multiple aliases are defined for a base ROM? Unless it's contractual that you can only have a single alias per ROM (which should be perfectly sufficient).
No. This gives you the option to have 1 alias, not several. Why would you need multiple aliases? This doesn't add a pass to the scraping, it just allows you to overwrite whatever name is fetched from the filename. So this will give you the option to fix your "Dirt 3" problem for good.
-
Good news - that worked perfectly!
Bad news - "Hitman™" is still problematic, with an error message like this:
Couldn't calculate sha1 hash sum of rom file 'Hitman���.ml', please check permissions and try again, now exiting...
I think I'm just going to rename that one. Perhaps the Moonlight script should strip such odd characters from filenames.
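Stripping those symbols from a title is simple enough to sketch. The Python below is only an illustration of the idea; it is not part of Skyscraper or the Moonlight script, and the function name is made up. Note that it cleans a title string only. The sha1 error above happens when the file itself is opened, so the file would still need renaming.

```python
# Symbols mentioned in this thread: trademark, registered, copyright.
_SYMBOLS = str.maketrans("", "", "™®©")


def clean_title(title: str) -> str:
    """Remove trademark-style symbols and collapse leftover whitespace."""
    return " ".join(title.translate(_SYMBOLS).split())
```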
-
Skyscraper 2.9.5 released: https://github.com/muldjord/skyscraper
- MAJOR: Added option "--purgedb vacuum" which vacuums all resources not related to your current romset. Remember to make backups of your cache before using this
- MAJOR: Added option "--purgedb all" that purges all resources for the selected platform. Remember to make backups of your cache before using this
- MAJOR: Added "--symlink" option which forces cached videos to be symlinked to destination instead of being copied when scraping with the "localdb" scraping module
- MAJOR: Added "esgamelist" emulationstation gamelist.xml scraping module. Contributed by "mgerhardy". Rewritten by me to better conform to Skyscraper design
- MAJOR: Added aliasMap.csv that forces the use of a title alias when searching for specific filenames
- Removed version bracket tag for Amiga lha files
- Improved getCompareTitle for mame games and lha files
- Code cleanup for sqrNotes
- Added the "ti99" platform. (Thank you to "jhbeskow" for suggesting it)
Lots of cool stuff in this release. A new platform was added ("ti99"). Several improvements to the "--purgedb" option made it in, which now allows you to vacuum ("--purgedb vacuum") and clear out ("--purgedb all") the cache for the currently selected platform.
On top of this the new "esgamelist" scraping module was also added that scrapes data and artwork from a local EmulationStation gamelist.xml. Originally contributed by mgerhardy, but I had to rewrite most of it to conform better with the Skyscraper code. The idea is that you can now use any scraper to create an EmulationStation gamelist.xml. And then use the "esgamelist" module to import that data into Skyscraper's local database cache. When that's done, you can then make use of it by prioritizing it in the "~/.skyscraper/dbs/[platform]/priorities.xml" file and scraping the platform with "Skyscraper -p [platform]" which defaults to scraping it with "-s localdb" where the data is stored. Should come in handy. :) So thanks to mgerhardy for providing the initial code and the idea!
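For readers who have not looked inside a gamelist.xml, the sketch below shows the kind of data such an import has to read. It is Python, not the module's actual C++ code, and the sample entry and function name are invented for illustration. It relies only on the usual EmulationStation layout of `<game>` elements with a `<path>` child.

```python
import xml.etree.ElementTree as ET

# Invented sample entry in the usual EmulationStation layout.
SAMPLE = """<?xml version="1.0"?>
<gameList>
  <game>
    <path>./MyGreatGame.nes</path>
    <name>My Great Game</name>
    <desc>An example description.</desc>
    <developer>Example Dev</developer>
  </game>
</gameList>
"""


def read_gamelist(xml_text):
    """Return {rom path: {tag: text}} for every <game> entry."""
    games = {}
    for game in ET.fromstring(xml_text).iter("game"):
        path = game.findtext("path")
        if not path:
            continue  # an entry without a path cannot be matched to a rom
        games[path] = {
            child.tag: (child.text or "").strip()
            for child in game
            if child.tag != "path"
        }
    return games
```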
I've also added the new file "~/.skyscraper/aliasMap.csv" which is a lookup file that allows you to create an alias for any filename (thanks to silent for suggesting this). So if you're having trouble scraping a game called "Game Name Which Is Waaaaaaay Too Long" you can try adding an alias for it to, for instance, just be "Game Name". The file will be created the first time you run Skyscraper after updating to this version. The file contains instructions on how to use it. I will also document it at some point at github. Maybe later today.
Last of the more prominent features is the "--symlink" option which will simply symlink any videos from the cache instead of copying them when creating the gamelist for the selected frontend. This will save space, but comes with the caveat that if you remove the video from the cache, the link will break! So please be aware of this. :)
Let me know what you think of these new features.
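The symlink caveat mentioned above is easy to demonstrate. The following Python sketch is not Skyscraper's implementation, and the function name is an assumption; it only contrasts copying a cached video with linking to it.

```python
import os
import shutil
from pathlib import Path


def place_video(cached: Path, destination: Path, symlink: bool) -> None:
    """Put a cached video at destination, either as a link or as a copy."""
    # Remove an earlier file or (possibly dangling) link first.
    if destination.is_symlink() or destination.exists():
        destination.unlink()
    if symlink:
        # Uses no extra disk space, but dangles once the cached file is gone.
        os.symlink(cached.resolve(), destination)
    else:
        shutil.copyfile(cached, destination)
```

After the cached file is deleted, `destination.is_symlink()` is still true for the linked variant while `destination.exists()` becomes false, which is what a broken link looks like from the frontend's side.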
Merry Christmas if that's your thing, and happy scraping!
-
Interesting - I've been scraping my newly purchased games via Moonlight, and I ran into a case where:
- Game file was called "Worms W.M.D"
- thegamesdb scraped it fine, mobygames did not because there it is called "Worms: W.M.D."
- I therefore added
"Worms W.M.D";"Worms: W.M.D."
to aliasMap and scraped again - this time both scrapers worked fine.
In other words, case:
Compare title: 'Worms: W.M.D.' Result title: 'Worms W.M.D'
works fine, but the opposite returns no matches for mobygames.
Can I find out if it's mobygames not matching this query or Skyscraper rejects it because it's not similar enough?
-
@Silent said in Versatile C++ game scraper: Skyscraper:
Can I find out if it's mobygames not matching this query or Skyscraper rejects it because it's not similar enough?
It says so in the output.
-
This looks awesome! I can't wait to try this scraper out. Do you have any plans to include video scraping options? It would be awesome if you could specify a "maximum video length" and have the scraper cut the videos to that length after download, and also fix the format / codec, etc. I have a script that does it using ffmpeg, but haven't been able to make something that will run on non-Windows systems.
-
@PC No current plans to expand the video functionality. Currently it relies entirely on the video that the source delivers. But it's, as you mention, pretty easy to mass convert the videos. You can easily just convert all of the videos in the cache with ffmpeg to suit your needs.
I could implement some ffmpeg calls that simply run those commands, but it's an "ugly" solution codewise. So I think I'll let other tools handle this.
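For anyone who wants to do this outside of Skyscraper, here is a hedged sketch of a cross-platform batch conversion in Python. It assumes ffmpeg is installed and on the PATH. The directory arguments, the ".mp4" filter, the 30 second default and the codec choice are all placeholders, since the thread does not say where the cached videos live or which format is wanted.

```python
import subprocess
from pathlib import Path


def build_trim_command(source: Path, target: Path, max_seconds: int):
    """Return an ffmpeg command that re-encodes and cuts one video."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the target if it exists
        "-i", str(source),       # input file
        "-t", str(max_seconds),  # stop writing after this many seconds
        "-c:v", "libx264",       # video codec (a choice, not a requirement)
        "-c:a", "aac",           # audio codec (likewise)
        str(target),
    ]


def convert_all(video_dir: Path, out_dir: Path, max_seconds: int = 30) -> None:
    """Convert every .mp4 in video_dir into out_dir (both hypothetical)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(video_dir.glob("*.mp4")):
        command = build_trim_command(source, out_dir / source.name, max_seconds)
        subprocess.run(command, check=True)
```

Writing to a separate output directory keeps the originals intact, so back up or swap the files yourself once the results look right.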
-
Mild request: Could "--nobrackets" be ignored when processing imported data? It should be fair to assume that "User knows better" and stripping brackets there makes no sense.
Use case: User wants to have "(U) [!] {BESTVERSION}" stripped but still keep some brackets, eg. "Gran Turismo 2 (Arcade)". With this, it could just be defined in an import.
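The requested behaviour could be sketched like this. It is a Python illustration of the idea only, with made-up function names, and does not reflect how Skyscraper actually handles "--nobrackets" or imported data.

```python
import re
from typing import Optional

# Any (...), [...] or {...} group, together with the whitespace before it.
_BRACKETS = re.compile(r"\s*(\([^)]*\)|\[[^\]]*\]|\{[^}]*\})")


def strip_brackets(title: str) -> str:
    """Remove all bracketed tags from a title."""
    return _BRACKETS.sub("", title).strip()


def final_title(filename_title: str,
                imported_title: Optional[str] = None,
                nobrackets: bool = False) -> str:
    """Pick the title to show: an imported title is trusted as-is, and
    bracket stripping only applies to titles derived from the filename."""
    if imported_title is not None:
        return imported_title
    return strip_brackets(filename_title) if nobrackets else filename_title
```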