Versatile C++ game scraper: Skyscraper
-
@muldjord Hey guys. I won't be around for a few weeks, but I'll get right back to work on the NES synopsis stuff when I do.
I will only have the NES/Famicom/FDS games done at that point, but the problem you mentioned above won't be an issue if you scrape from the top line of the synopsis. I have a spreadsheet that displays all of this, but I wasn't able to get it ready for a public release before it was time to wrap it up unfortunately.
The spreadsheet shows the file name for the roms, synopsis and all associated media. It shows the top line of the synopsis which is the name displayed in the romlist using meleu's script, as well as using the XBox emulators. It also shows which games have a manual, which ones have videos, and the exact dimensions of the raw artwork files for Box Front and Cart images. Every single game has these two images. Hundreds of them were made by me personally for the more obscure games, hundreds more have been touched up to varying degrees, and anywhere from 1,000-1,500 of them were cropped slightly to have a uniform look and to get rid of any beat up edges. (Most US games had great restorations done by other people, but a lot of foreign, pirate and other games had some pretty shoddy boxes even if they were HD images.)
Anyways.... gotta go for now, but I'll be back soon. Good luck on your project muldjord.
-
@Used2BeRX Have fun dude! I'll be here when you get back. :)
-
@muldjord That would be great, thank you! If you need someone to test out new builds or whatever, i'm happy to help.
-
@incunabula If I can get you to test the current build, that would be great. The tag option I mentioned is in there and I've fixed a bunch of other stuff as well. Any feedback would be very welcome before I release it officially:
https://github.com/muldjord/skyscraper/archive/b031b26889c827f2559995fc6592f827e4b00a4b.zip
-
OK, i'll give it a go. Something i noticed last night was that my MSX and Game Gear roms had to be unzipped before they could be scraped. All other platforms that i've scraped so far worked fine as zipped.
-
@incunabula Yes, according to the RetroPie wiki, the GameGear and MSX emulators don't support .zip files. So that's why it isn't included. If you can confirm that it works with those filenames without unzipping them, I can easily add it. Using zipped roms does have a few disadvantages though, so I don't recommend ever using zipped roms.
EDIT: Let me elaborate a bit on that. I use the sha1 checksum for storage of local resources as a means of having a unique key per rom. For best results the actual rom data is preferred.
Another disadvantage is that the 'screenscraper' module uses the sha1 checksum of rom data for identifying them. If they are zipped, it can't do that. And unzipping them internally makes no sense, since zips often contain more than 1 rom.
The only reason I can see to actually zip roms is that it makes it easier to pack together different roms for the same game. It doesn't save much space because of the type of data anyways, and if you use a zip with multiple roms, it actually costs you space, since you have a bunch of roms inside the zip you won't ever use.
So, those are my thoughts and concerns on the subject. :) Not trying to tell you what to do, just thought I'd give a bit of background for why I think zipped roms are a problem.
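To make the checksum point a bit more concrete, here is a minimal sketch of keying a rom by its contents. It is not Skyscraper's code: `contentKey` and `wrapInContainer` are hypothetical names, and 64-bit FNV-1a stands in for sha1 only to keep the example dependency-free. The idea it shows is that the key follows the bytes that get hashed, so the same rom data wrapped by two different tools ends up with two different keys.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for a real checksum: 64-bit FNV-1a over the raw bytes.
// The thread describes sha1 being used; FNV-1a is an assumption made here
// purely so the sketch needs no external library.
std::uint64_t contentKey(const std::vector<unsigned char> &bytes)
{
  std::uint64_t hash = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char b : bytes) {
    hash ^= b;
    hash *= 1099511628211ULL; // FNV prime
  }
  return hash;
}

// Hypothetical helper: mimics an archive tool adding its own bytes around
// the payload by prepending a tool-specific header to the rom data.
std::vector<unsigned char> wrapInContainer(const std::vector<unsigned char> &rom,
                                           const std::string &header)
{
  std::vector<unsigned char> out(header.begin(), header.end());
  out.insert(out.end(), rom.begin(), rom.end());
  return out;
}
```

Hashing the raw rom bytes gives the same key no matter what the file is called, while `contentKey(wrapInContainer(rom, "toolA"))` and `contentKey(wrapInContainer(rom, "toolB"))` differ even though the rom inside is identical.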
-
That makes sense - the checksum could be anything depending on what application was used to compress the file, what level of compression was used, etc. Ok, i'm fine with unzipping these roms (rather small file sizes already) but i can confirm that MSX (using lr-bluemsx) and GG (using lr-genesis-plus-gx) both do in fact work with zipped roms.
-
Is there a way you can add local folders as a scraping option? For instance if i already had a folder of boxart and it has some images that the scraper modules are not returning could it be possible to "scrape" images from a default path like %roms%/boxart ?
-
@incunabula I'm considering options for this, same thing I am working on with Used2BeRX. The problem here of course being that people tend to have varying ways of saving these files so I would need to handle "all of them" so to speak.
It would be really cool if you could sort of "import" a snap folder into the localdb and then just scrape using the 'localdb' module and it would use those snaps. But all local db resources are identified by sha1 checksum, and the snap images are not easily identifiable other than from their filenames. And that poses a big problem.
I think a first implementation will be to define an xml format where your snaps would need to be in. Then you can scrape from that. That would then automatically add it to the localdb. So when you scrape from 'localdb' afterwards, it would use those snaps as well as any other snaps you might have acquired.
EDIT: I realize what I just wrote might seem a bit fuzzy :D Bottom line is that I am considering it. And if I find "the right way"(tm) I will implement it.
-
@incunabula said in Versatile C++ game scraper: Skyscraper:
That makes sense - the checksum could be anything depending on what application was used to compress the file, what level of compression was used, etc. Ok, i'm fine with unzipping these roms (rather small file sizes already) but i can confirm that MSX (using lr-bluemsx) and GG (using lr-genesis-plus-gx) both do in fact work with zipped roms.
Exactly. But I will add .zip to both MSX and GameGear, no problem.
EDIT: zip support is implemented in this release, feel free to test it: https://github.com/muldjord/skyscraper/archive/a820fbd8368372c09c5f3a00338dbf040bc23e43.zip
-
@muldjord I understand, and i don't mean to be pushy by requesting new features. Just suggesting things that might be useful to others. So far 3 out of 3 platforms are working OK using the --nobrackets switch. :)
-
@incunabula Suggestions are always more than welcome. :)
-
Is adding support for other platforms as simple as adding additional if/else statements in the Skyscraper::run() function? I know nothing about C++ but have done a bit of VB .net so i apologize if this is an ignorant question or if i'm using the wrong terminology hehe :)
-
@incunabula Almost. Adding new platforms is probably the easiest thing to do in Skyscraper. :) As long as the platform doesn't have any special demands. If it's just an ordinary platform with ordinary file formats that need to be run, then I can do it real easy. Just let me know which ones you'd like in there.
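For anyone curious what "adding a platform" can boil down to, here is a rough sketch of a table-driven alternative to a chain of if/else statements. It is not how Skyscraper::run() is actually written: `platformFormats` and `isSupported` are hypothetical names, and the extension lists are illustrative guesses rather than Skyscraper's real configuration.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical platform table: platform name -> accepted file extensions.
// The entries are illustrative assumptions, not Skyscraper's actual lists.
const std::map<std::string, std::vector<std::string>> &platformFormats()
{
  static const std::map<std::string, std::vector<std::string>> formats = {
    {"gamegear", {".gg", ".zip"}},
    {"msx", {".rom", ".zip"}},
    {"nes", {".nes", ".zip"}},
  };
  return formats;
}

// Returns true if 'fileName' ends with one of the extensions listed for
// 'platform'. Unknown platforms are rejected.
bool isSupported(const std::string &platform, const std::string &fileName)
{
  const auto it = platformFormats().find(platform);
  if (it == platformFormats().end()) {
    return false;
  }
  return std::any_of(it->second.begin(), it->second.end(),
                     [&fileName](const std::string &ext) {
                       return fileName.size() >= ext.size() &&
                              fileName.compare(fileName.size() - ext.size(),
                                               ext.size(), ext) == 0;
                     });
}
```

With a layout like this, supporting an ordinary platform is one more line in the table, which matches the "just let me know which ones" spirit above. Platforms with special demands would still need their own handling.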
-
Skyscraper 1.7.0 released: https://github.com/muldjord/skyscraper
- MAJOR: Fixed and refined 'attractmode' frontend implementation, now works in a basic manner
- 'attractmode' can now skip existing entries
- 'emulationstation' now properly adds brackets to 'name' on skipped entries
- Added check for 'db.xml' when doing '--cleandb'
- Refactored GameEntry variables
- Changed GameEntry from struct to class
- Added 'Overall title similarity' to final output
- Added 'Overall completeness' to final output
- Code refactoring here, there and everywhere
- Now accepts results where we have low editDistance, but high similarity (For instance "Disney's Darkwing Duck" with fileName "Darkwing Duck").
- Added '--nobrackets' option that disables [] and () tags in the frontend game titles. (Thanks for the feedback 'incunabula')
- Fixed bracket parsing
- Now always uses completeBaseName since some filenames have more than one '.'
- Completely rewrote sorting algorithm. 30 lines became one with a nifty C++11 lambda :D
- Added zip format to GameGear and MSX platforms
- Now uses filenames for output image files again
Have a great weekend everyone!
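Two of the changelog items above (using the complete base name, and dropping [] and () tags) are easy to picture with a small standalone sketch. This is plain standard C++ rather than the Qt code Skyscraper uses, and `stripBrackets` is a hypothetical helper, not the project's function. `completeBaseName` here mimics what Qt's QFileInfo::completeBaseName() does for a bare file name: everything up to, but not including, the last '.'.

```cpp
#include <string>

// Everything up to, but not including, the last '.' in a bare file name.
std::string completeBaseName(const std::string &fileName)
{
  const std::string::size_type dot = fileName.rfind('.');
  return dot == std::string::npos ? fileName : fileName.substr(0, dot);
}

// Hypothetical helper: removes "(...)" and "[...]" groups, collapses the
// spaces they leave behind, and trims leading/trailing spaces.
std::string stripBrackets(const std::string &title)
{
  std::string out;
  int depth = 0; // How many bracket groups we are currently inside
  for (char c : title) {
    if (c == '(' || c == '[') {
      ++depth;
    } else if (c == ')' || c == ']') {
      if (depth > 0) {
        --depth;
      }
    } else if (depth == 0) {
      // Skip leading spaces and avoid doubling spaces between words
      if (c == ' ' && (out.empty() || out.back() == ' ')) {
        continue;
      }
      out += c;
    }
  }
  if (!out.empty() && out.back() == ' ') {
    out.pop_back(); // Trailing space left by a removed tag
  }
  return out;
}
```

A file name such as "Super Mario Bros. 3 (USA) [!].nes" keeps its inner '.' when only the last extension is cut off, and the bracket pass then leaves "Super Mario Bros. 3" as the title.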
-
Skyscraper 1.7.1c released: https://github.com/muldjord/skyscraper
- Moved all source files to 'src' folder
- '[homedir]/.skyscraper' is now default folder for all files used by Skyscraper
- '/usr/local/bin/Skyscraper' is now default location for Skyscraper executable
- Refined '--help' output a bit
- Fixed lemon64 scraping
- Added 'lemonamiga' scraping module
- Added '--skipped' command line option
- Added 'make install' for correct installation of files
This is more of a clean-up release. First of all, I've rewritten the compile and install procedure in the readme on github, so be sure to use it exactly as written. Basically what I've added is a "sudo make install" command, which will take care of installing the files in the correct locations on your Pi. This ensures that each time you compile a new version of Skyscraper, it will install it in the same place, thus preserving all of your configs and local dbs properly. It was a bit of a mess before, sorry about that.
In other news, I've fixed the lemon64 scraper. I've also added the 'lemonamiga' scraping module. I've also added the '--skipped' option which allows you to always include skipped entries in the resulting gamelists. It is especially useful for attractmode.
As always, if you run into trouble, let me know. I expect a few bugs in this release, since I've moved a lot of code around and refactored quite a bit to enable the '--skipped' option.
Have fun!
-
Skyscraper 1.7.2 released: https://github.com/muldjord/skyscraper
- Added 'uvlist' scraping module (http://www.uvlist.net)
- Added rating resource and support
- Added rating support to lemonamiga
- Added rating support to lemon64
- Added rating support to mobygames
- Added rating support to uvlist
Enjoy!
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
I think a first implementation will be to define an xml format where your snaps would need to be in. Then you can scrape from that. That would then automatically add it to the localdb. So when you scrape from 'localdb' afterwards, it would use those snaps aswell as any other snaps you might have acquired.
I would suggest something really simple as first implementation: an "import to localdb" function that will take the scrape images from a local folder. The key as you mentioned should be: image file name = rom filename.
-
@UDb23
Ok, so here's what I am thinking: I will create a folder called "~/.skyscraper/import". In this folder there will be a folder per supported import type. So there will be a folder called "snaps" and a folder called "boxart". If you place files inside this folder with the precise filename of one of the rom files you are going to scrape, and set the scraper to '-s import', it will then import these images into the localdb with the source of "import" or similar. Then, when you scrape using 'localdb' afterwards, it will use those files. So basically the import procedure becomes a scraper alongside 'mobygames' and others. The difference is that it scrapes using files from the "~/.skyscraper/import" folder.
I think this is a pretty good solution. It's pretty straightforward to do this with snaps, boxart, videos and other media files. I'll need to figure out how to do it for other resources. But let me start with media files.
How does that sound?
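The matching rule in that proposal (image file name = rom file name) could look roughly like the sketch below. It is only an illustration: `baseNameOf` and `findImport` are hypothetical names, and it assumes the comparison is made on the name without its final extension, which the posts above do not spell out.

```cpp
#include <string>
#include <vector>

// File name without its final extension ("Zelda.nes" -> "Zelda").
std::string baseNameOf(const std::string &fileName)
{
  const std::string::size_type dot = fileName.rfind('.');
  return dot == std::string::npos ? fileName : fileName.substr(0, dot);
}

// Hypothetical lookup: returns the first file from an import folder listing
// whose base name exactly equals the rom's base name, or "" if none matches.
std::string findImport(const std::string &romFile,
                       const std::vector<std::string> &importFiles)
{
  const std::string wanted = baseNameOf(romFile);
  for (const std::string &candidate : importFiles) {
    if (baseNameOf(candidate) == wanted) {
      return candidate;
    }
  }
  return "";
}
```

The match is deliberately exact, so "Zelda (USA).nes" would pick up "Zelda (USA).png" but not "Zelda.png". That is the price of using file names as the only key.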
-
@UDb23
Aaaand here it is:
Skyscraper 1.7.3 released: https://github.com/muldjord/skyscraper
- Added 'import' scraper, scraping from resources located in '[homedir]/.skyscraper/import' folder
- Added 'developer' support for 'uvlist' scraper
- Improved html unescaping a lot
- Cleaned up xml escaping
The 'import' release. So the big new thing in this update is that I now allow you to import your own 'snaps' and 'boxart' into the local database using the '-s import' scraper. It's a local scraper that scrapes from '[homedir]/.skyscraper/import' resources. Place your images in the folders within with the EXACT name of any rom you will be scraping, and it will import that image into the local database. Then, afterwards, scrape with the '-s localdb' scraper to make use of the artwork in your frontend of choice. (Read more under "Local data import" in the Github readme)
I will expand the import functionality over time to also allow importing of textual data. But for now, snaps and boxart seemed like the most important ones.
Also, I realize that Skyscraper now has a bunch of very abstract features that might seem a bit confusing, so I plan to make some videos explaining how to properly make use of the powerful features Skyscraper now provides. Simply scraping using one scraping module is just the basic usage. You can get much, MUCH better results very simply by also making use of the '-s localdb' scraper AFTER you've scraped with a bunch of web sources. They'll all be stitched together to form near-100% perfect results for the frontend you choose.
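Until those videos exist, here is a loose sketch of what "stitching together" results from several sources could mean. It is an assumption-heavy illustration, not Skyscraper's actual logic: the `Resource` struct, the `stitch` function and the idea of a priority list are all hypothetical. It simply keeps, for each resource type, the value from the most preferred source that has one.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: one cached value of one resource type from one source.
struct Resource {
  std::string type;   // e.g. "cover", "description", "rating"
  std::string source; // e.g. "mobygames", "uvlist", "import"
  std::string value;
};

// For each resource type, keeps the value whose source appears earliest in
// 'priority'. Resources from sources not listed in 'priority' are ignored.
std::map<std::string, Resource> stitch(const std::vector<Resource> &cached,
                                       const std::vector<std::string> &priority)
{
  std::map<std::string, Resource> best;
  std::map<std::string, std::size_t> bestRank;
  for (const Resource &r : cached) {
    const auto pos = std::find(priority.begin(), priority.end(), r.source);
    if (pos == priority.end()) {
      continue; // Source is not in the priority list
    }
    const std::size_t rank = static_cast<std::size_t>(pos - priority.begin());
    const auto known = bestRank.find(r.type);
    if (known == bestRank.end() || rank < known->second) {
      best[r.type] = r;
      bestRank[r.type] = rank;
    }
  }
  return best;
}
```

With a priority of "import" first and "mobygames" second, an imported cover would win over a scraped one, while a description that only 'mobygames' supplied would still be filled in.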
Anyways, more on this when I get the time to make those videos. :D
That's all for now, enjoy! :)