Versatile C++ game scraper: Skyscraper
Rion last edited by
Skyscraper 1.5.0 released!!! https://github.com/muldjord/skyscraper
And this is a whopper! Just look at that changelog!
Big ass changelog!
This looks very promising! Bookmarking this now and following this thread.
Skyscraper 1.6.0 release: github.com/muldjord/skyscraper
- Now allows more resources of same type, as long as 'source' differs
- Now allows user to set priorities for local resource sources
- Fixed a bug that would nullify timestamp of local resources
- Optimized LocalDb communication to improve scraping speed
- Added README.md to dbs subfolder
- Added priorities.xml.example file to dbs subfolder. Automatically copies this to new databases when they are created if none already exists.
- Implemented '--cleandb' command line option that removes files with no resource entry
- Implemented '--mergedb' command line option that merges two local databases together
- No longer computes sha1 on the rom data for roms bigger than 50 MB (the Pi runs out of RAM when reading them). For those special cases it computes sha1 on the filename instead.
- Removed default platform when scraping. You are now forced to put in a valid platform with '-p [platform]'
- Added more initial info when running Skyscraper
- Added '--unattend' command line option
- Added 'source' attribute to local database resources
- Removed 'mobygames' descriptions from 'openretro' scraper. Now uses native descriptions.
- Improved cover and screenshot scraping for 'openretro' module
- Disabled filling in missing data when scraping from web sources. User is meant to use 'localdb' scraping module for this.
- Implemented date formats to standardize output and better support EmulationStation requirements
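To illustrate the changelog entry about roms bigger than 50 MB, here is a minimal Python sketch of that fallback. It is not Skyscraper's actual code (Skyscraper is written in C++). The helper name `rom_key`, the `max_size` parameter and the reading of "50 MB" as 50 * 1024 * 1024 bytes are all assumptions made for the example.

```python
import hashlib
from pathlib import Path

# Threshold from the changelog; the exact byte count is an assumption.
MAX_HASHED_SIZE = 50 * 1024 * 1024


def rom_key(path, max_size=MAX_HASHED_SIZE):
    """Return a sha1 hex digest to use as the key for a rom (hypothetical helper)."""
    path = Path(path)
    sha1 = hashlib.sha1()
    if path.stat().st_size > max_size:
        # Big file: hash the file name so the data never has to be read into memory.
        sha1.update(path.name.encode("utf-8"))
    else:
        # Normal case: hash the actual rom data.
        sha1.update(path.read_bytes())
    return sha1.hexdigest()
```

The trade-off is that renaming a big rom changes its key, while a small rom keeps the same key no matter what the file is called.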
The most prominent new additions are '--cleandb', which cleans a local database folder of files that have no entry in the database, and '--mergedb', which allows you to merge two databases together. Combine that with '--updatedb' if you want the source db's resources to take precedence.
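As a rough mental model of merging with and without '--updatedb', the following Python sketch treats each database as a plain dictionary. This is an assumption for illustration only: the real database format, the function name `merge_db` and the `update` parameter are hypothetical and not taken from Skyscraper.

```python
def merge_db(dest, source, update=False):
    """Merge 'source' into a copy of 'dest' (hypothetical model of a db merge).

    Both arguments map a resource key to a resource value. With update=False
    an existing entry in 'dest' is kept. With update=True the entry from
    'source' replaces it.
    """
    merged = dict(dest)
    for key, value in source.items():
        if update or key not in merged:
            merged[key] = value
    return merged


# Keys here are (rom checksum, resource type) pairs, again just for illustration.
dest = {("abc123", "cover"): "cover_from_dest.png"}
source = {
    ("abc123", "cover"): "cover_from_source.png",
    ("abc123", "title"): "Super Fun Game Deluxe",
}

kept = merge_db(dest, source)                   # dest's cover is kept
replaced = merge_db(dest, source, update=True)  # source's cover wins
```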
I've also added a 'source' attribute to any resource in the local databases, which means that you can now have several versions of the same type of resource for each rom in a local database. You can then prioritize them using a 'priorities.xml' file (find an example in './dbs'). A note on this: In the 1.5.0 release resources didn't have a 'source' attribute. When you use these resources with 1.6.0 it will automatically add a 'generic' source to those entries. I recommend deleting the 1.5.0 databases and starting over. I know this is inconvenient, so I apologize for this. With that said, I feel the format of the database now does everything I want it to do, so I don't expect it to change again in the 1.x branch.
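The idea of prioritizing several resources of the same type by their 'source' can be sketched like this in Python. It does not parse 'priorities.xml' (its format is not described here). The priority order is simply passed in as a list, and the function name `pick_resource` and the dictionary layout are assumptions for the example.

```python
def pick_resource(resources, priorities):
    """Pick the resource whose 'source' ranks highest (hypothetical helper).

    resources:  list of dicts with at least a 'source' key.
    priorities: list of source names, most preferred first.
    Sources not listed in 'priorities' rank below every listed source.
    """
    if not resources:
        return None

    def rank(resource):
        try:
            return priorities.index(resource["source"])
        except ValueError:
            return len(priorities)

    return min(resources, key=rank)


covers = [
    {"type": "cover", "source": "generic", "value": "old_cover.png"},
    {"type": "cover", "source": "openretro", "value": "openretro_cover.png"},
    {"type": "cover", "source": "screenscraper", "value": "screenscraper_cover.png"},
]
best = pick_resource(covers, ["screenscraper", "openretro"])
```

In this sketch an entry with the 'generic' source is only chosen when no prioritized source has a resource of that type.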
I've also improved some of the scraping modules quite a bit and fixed a few bugs (and probably created a few new ones :D)
Have fun! As always; comments and feedback are welcome!
This tool is awesome! Thanks loads for all of the hard work and sharing it with the community!
UDb23 last edited by
I will look into vic20 later today. :)
EDIT: I've just had a look at the RetroPie platforms wiki page. I don't see vic20 in there. I know it's a Commodore branded machine, but how does the platform work in RetroPie? Where do the roms reside?
Commodore Vic20 is emulated by Vice; same as Commodore64.
So the system is still Commodore64 in RetroPie, but it can also emulate Vic20 (just a different runcommand option).
Vic20 roms & images are therefore located inside the Commodore64 system folder.
@incunabula Thank you! Enjoy! :)
@udb23 In that case, you should just be able to use the 'c64' platform which also looks for vic20 files. Just run it with "./Skyscraper -p c64". Have you tried that or am I missing something?
UDb23 last edited by
@muldjord ok, will try it out over the weekend and let you know.
Used2BeRX last edited by
@muldjord I still have some more NES games to add and some more synopsis tweaks, but I'm going to be away from this work for a few weeks unfortunately. I won't forget about this though and you'll get the pack of updated NES synopsis.txt files when they're done. I've got this page bookmarked so I can catch up when I'm back to it.
@Used2BeRX No hurries, I'll look into it when you have things ready. :)
Is there a way to change the output of the <name> tag that is written to the gamelist.xml file such that the country information is not shown in ES? It looks like it is currently using the file name of the rom. For example:
Super Fun Game Deluxe (USA, Europe, Japan).zip
should be displayed in the ES game list as
Super Fun Game Deluxe
Is this already possible? I looked through the readme and the command line switches but didn't see anything relevant. Thank you!
@incunabula No, this is not possible at the moment. It is not actually using the file name. What it does is register any round and square bracket "notes", as I call them. It then uses the web result title, and adds the notes back in.
I'll add a '--nobrackets' option in the next release that will disable the bracket tags. :) You will also be able to set 'brackets="false"' in the config.ini file under both [main] and [platform]. That should give you plenty of options for disabling it. :)
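For anyone curious how the bracket "notes" end up in the name, here is a small Python sketch of the behavior described above: collect the round and square bracket notes from the file name, take the title from the web result, and add the notes back in unless brackets are disabled. The regular expression, the function name `game_name` and the `brackets` parameter are assumptions for illustration, not Skyscraper's actual implementation.

```python
import re
from pathlib import Path

# Matches one "(...)" or "[...]" group without nested brackets.
NOTE_PATTERN = re.compile(r"\([^()]*\)|\[[^\[\]]*\]")


def game_name(rom_file, web_title, brackets=True):
    """Build the displayed name from the web title and the file name notes."""
    if not brackets:
        # Equivalent in spirit to disabling the bracket notes.
        return web_title
    notes = NOTE_PATTERN.findall(Path(rom_file).stem)
    return " ".join([web_title] + notes)


rom = "Super Fun Game Deluxe (USA, Europe, Japan).zip"
with_notes = game_name(rom, "Super Fun Game Deluxe")
without_notes = game_name(rom, "Super Fun Game Deluxe", brackets=False)
```

With the example file name from above, `with_notes` keeps the "(USA, Europe, Japan)" note and `without_notes` is just the title.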
Also, stay tuned for attractmode support, also coming in the next release.
Used2BeRX last edited by
@muldjord Hey guys. I won't be around for a few weeks, but I'll get right back to work on the NES synopsis stuff when I do.
I will only have the NES/Famicom/FDS games done at that point, but the problem you mentioned above won't be an issue if you scrape from the top line of the synopsis. I have a spreadsheet that displays all of this, but I wasn't able to get it ready for a public release before it was time to wrap it up unfortunately.
The spreadsheet shows the file name for the roms, synopsis and all associated media. It shows the top line of the synopsis which is the name displayed in the romlist using meleu's script, as well as using the XBox emulators. It also shows which games have a manual, which ones have videos, and the exact dimensions of the raw artwork files for Box Front and Cart images. Every single game has these two images. Hundreds of them were made by me personally for the more obscure games, hundreds more have been touched up to varying degrees, and anywhere from 1,000-1500 of them were cropped slightly to have a uniform look and to get rid of any beat up edges. (Most US games had great restorations done by other people, but a lot of foreign, pirate and other games had some pretty shoddy boxes even if they were HD images).
Anyways.... gotta go for now, but I'll be back soon. Good luck on your project muldjord.
@Used2BeRX Have fun dude! I'll be here when you get back. :)
@muldjord That would be great, thank you! If you need someone to test out new builds or whatever, i'm happy to help.
@incunabula If I can get you to test the current build, that would be great. The tag option I mentioned is in there and I've fixed a bunch of other stuff as well. Any feedback would be very welcome before I release it officially:
OK, i'll give it a go. Something i noticed last night was that my MSX and Game Gear roms had to be unzipped before they could be scraped. All other platforms that i've scraped so far worked fine as zipped.
@incunabula Yes, according to the RetroPie wiki, the GameGear and MSX emulators don't support .zip files. So that's why it isn't included. If you can confirm that it works with those filenames without unzipping them, I can easily add it. Using zipped roms does have a few disadvantages though, so I don't recommend ever using zipped roms.
EDIT: Let me elaborate a bit on that. I use the sha1 checksum for storage of local resources as a means of having a unique key per rom. For best results the actual rom data is preferred.
Another disadvantage is that the 'screenscraper' module uses the sha1 checksum of rom data for identifying them. If they are zipped, it can't do that. And unzipping them internally makes no sense, since zips often contain more than 1 rom.
The only reason I can see to actually zip roms, is that it makes it easier to pack together different roms for the same game. It doesn't save much space because of the type of data anyways, and if you use a zip with multiple roms, it actually costs you space, since you have a bunch of roms inside the zip you won't ever use.
So, those are my thoughts and concerns on the subject. :) Not trying to tell you what to do, just thought I'd give a bit of background for why I think zipped roms are a problem.
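The checksum problem with zipped roms is easy to demonstrate. The following Python sketch packs the same made-up rom data into two zip archives with different compression settings. The archives get different sha1 checksums even though the rom data inside is identical. The file name "game.gg" and the data are invented for the example.

```python
import hashlib
import io
import zipfile


def sha1_hex(data):
    """Return the sha1 hex digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


def make_zip(name, data, compression):
    """Create an in-memory zip archive holding a single file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        archive.writestr(name, data)
    return buffer.getvalue()


def read_member(zip_bytes, name):
    """Extract one file from an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(name)


rom_data = b"pretend rom data " * 1000  # stand-in for real rom contents

stored = make_zip("game.gg", rom_data, zipfile.ZIP_STORED)
deflated = make_zip("game.gg", rom_data, zipfile.ZIP_DEFLATED)

# The archives differ, so their checksums differ.
archives_match = sha1_hex(stored) == sha1_hex(deflated)

# The rom data inside is the same, so its checksum is stable.
contents_match = (
    sha1_hex(read_member(stored, "game.gg"))
    == sha1_hex(read_member(deflated, "game.gg"))
    == sha1_hex(rom_data)
)
```

So a checksum of the zip file identifies that particular archive, not the rom, which is why a key based on the unzipped rom data gives better results.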
incunabula last edited by incunabula
That makes sense - the checksum could be anything depending on what application was used to compress the file, what level of compression was used, etc. Ok, i'm fine with unzipping these roms (rather small file sizes already) but i can confirm that MSX (using lr-bluemsx) and GG (using lr-genesis-plus-gx) both do in fact work with zipped roms.
Is there a way you can add local folders as a scraping option? For instance if i already had a folder of boxart and it has some images that the scraper modules are not returning could it be possible to "scrape" images from a default path like %roms%/boxart ?