Versatile C++ game scraper: Skyscraper
-
@muldjord I'm actually in the middle of a HUGE project updating the synopsis files and artwork files for the NES, as well as adding 150+ complete translations from romhacking, and I'll be adding a lot of homebrew and other games I've found along the way as well. There are a lot of updates possible because the quality of stuff online has improved so much in the last decade.
I actually intend to make new videos too, since ours were from 8-10 years ago and all in 480p. The stuff on places like emu-movies is "ok", but there are no standards. Volume levels are all over the map; some videos are short while others are too long.
I want to make every video 25-30 seconds long at most, with an equal sound volume, and with about 15-20 seconds of gameplay wrapped up with 5-10 seconds of the title screen animation. This would be like the demo mode on arcade games. This dream might not be possible right now because my computer is very old and needs updating.
Obviously, this would take a ton of time beyond the thousands of hours I've recently put into emulation since the start of the year. I'm considering possibly looking into crowd funding so I can continue on with this work.
How do you handle image and video files? Are there paths to them in the txt files?
No. There is a specific folder structure for the various media that includes Box Front, Cart, Action, Title, 3D Boxart, videos, gamefaqs, manuals and various other media.
Meleu's script not only pulls all of the data from the synopsis files, but also searches these directories for any media that matches the file name of the game synopsis. Case is ignored, and it looks for the proper extensions by file type, i.e. .jpg or .png for images and .mp4 or .wmv for movies.
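For anyone reproducing this kind of lookup, here is a minimal C++17 sketch of case-insensitive media matching. It assumes media files are named `<game name>.<ext>` and that the allowed extensions are passed in lowercase with a leading dot; `matchesMedia` and `findMedia` are hypothetical names, not part of meleu's script or Skyscraper.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Lowercase a copy of a string (ASCII only) so comparisons ignore case.
std::string toLowerCopy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if fileName is "<gameName>.<ext>" for one of the allowed extensions,
// ignoring case on both the name and the extension.
// Assumption: allowedExts are lowercase and include the dot, e.g. ".png".
bool matchesMedia(const std::string &fileName, const std::string &gameName,
                  const std::vector<std::string> &allowedExts) {
    const fs::path p(fileName);
    if (toLowerCopy(p.stem().string()) != toLowerCopy(gameName)) return false;
    const std::string ext = toLowerCopy(p.extension().string());
    return std::find(allowedExts.begin(), allowedExts.end(), ext) != allowedExts.end();
}

// Scan one media folder (e.g. a "Box Front" directory) and return the first match.
// A missing or unreadable folder simply yields no match.
std::optional<fs::path> findMedia(const fs::path &dir, const std::string &gameName,
                                  const std::vector<std::string> &allowedExts) {
    std::error_code ec;
    for (const auto &entry : fs::directory_iterator(dir, ec)) {
        if (entry.is_regular_file() &&
            matchesMedia(entry.path().filename().string(), gameName, allowedExts))
            return entry.path();
    }
    return std::nullopt;
}
```

Lowercasing both sides mirrors the ".toLower" approach mentioned later in the thread; a real implementation would also need to decide what to do when two files differ only by case.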
If you look at the other thread, you can find the file structure being used. It makes sense, so if you get it for one system, you'd easily be able to figure out where media would go for all of the other systems as well.
I plan on doing write-ups for all of this at some point, but I'm just not there yet.
EDIT:
Expect a link soon to a spreadsheet I've been making. It started years back as a compatibility list I made for NES games on the Xbox, and I'm now using it to update all sorts of stuff. It has the dimensions of all of the new boxes and cartridge art, as well as a lot of other info.
@Rion Cool. I'll have to look into that. Any way of making that English without Google translate? I clicked on the US flag for "Région préférée" and that didn't do anything.
-
Skyscraper 1.5.0 released!!! https://github.com/muldjord/skyscraper
And this is a whopper! Just look at that changelog!
- MAJOR: Added support for local database resources
- MAJOR: Added support for video scraping (currently supported in the 'screenscraper' scraping module)
- MAJOR: Added 'localdb' scraping module
- Added video tag in EmulationStation gamelist.xml output. Beware though: the Pis are having a difficult time showing the videos properly.
- Added several new command line options relevant to the new video and localdb features
- Added cover, screenshot and video to the result output, with "YES" or "NO" depending on whether they were found or not
- Fixed a bug where the image tag in gamelist.xml had the wrong path when using a non-default path
- Now uses the sha1 of the rom (or of the filename, for .uae files) as the image filename, in case people have several roms with the same name in subdirs
- Added 'players' scraping for 'mobygames' module and improved screenshot getter even more
The major new feature, of course, is the support for local game resource cache databases, one per platform. As you scrape more data, you will find it all under the default folder "./dbs/[platform]". It works completely seamlessly. The more sources you scrape from, the more complete results you'll get per platform.
And when you finally have results you really like, you can share your database with friends simply by copying the "./dbs/[platform]" folder and handing it to them. They can then use that database with the '-d' option and scrape using the '-s localdb' scraping module. A word of warning though. If your friend copies your database into his own database with the same name, his database will be overwritten! I will implement the '--mergedb' flag soon. But until then, just copy the database to any other location and set it with '-d' when scraping.
Another new feature is video support. As of RetroPie 4.2, EmulationStation supports video. The good news is that it works if you scrape with the '-s screenscraper' module. The bad news is that EmulationStation is having a hard time showing the videos properly. I reckon it will get better over time if they keep developing it. If you are having issues with videos, simply disable them altogether with '--novideos'. Then rescrape with '-s localdb'.
This is by far the most elaborate release of Skyscraper to date. I've done a bunch of testing, haven't had any problems so far. But if you do run into trouble or have suggestions, pleeeease let me know. :)
Also, I'd really like to hear if you like these new features. I'm pretty darned excited about the prospects of sharing local dbs. I plan to run my Amiga setup through a bunch of the scraping modules. Then I'll have a 100% complete set of game details that I can upload for all of you to use.
I'll get around to that soon as well.
And now, I think I deserve to take the weekend off. I've literally worked on this project for 8 hours a day for the past several days. I love it! But it's starting to take its toll. :D
Have a great weekend everyone! And please comment and let me know what you think!
-
@Used2BeRX
Ok, this is no problem. It is easy to find the files as long as the filenames match 100%, and I'll just do .toLower on them.
-
@used2berx said in Versatile C++ game scraper: Skyscraper:
@Rion Cool. I'll have to look into that. Any way of making that English without Google translate? I clicked on the US flag for "Région préférée" and that didn't do anything.
Sorry, no. As of now the translation is lacking on screenscraper.fr. But the people over there are friendly and helpful over IRC.
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
Skyscraper 1.5.0 released!!! https://github.com/muldjord/skyscraper
And this is a whopper! Just look at that changelog!
Big ass changelog!
This looks very promising! Bookmarking this now and following this thread.
-
Skyscraper 1.6.0 release: github.com/muldjord/skyscraper
Changelog:
- Now allows more resources of the same type, as long as their 'source' differs
- Now allows user to set priorities for local resource sources
- Fixed a bug that would nullify timestamp of local resources
- Optimized LocalDb communication to improve scraping speed
- Added README.md to dbs subfolder
- Added priorities.xml.example file to dbs subfolder. Automatically copies this to new databases when they are created if none already exists.
- Implemented '--cleandb' command line option that removes files with no resource entry
- Implemented '--mergedb' command line option that merges two local databases together
- No longer does sha1 for roms bigger than 50 MB (the Pi runs out of RAM when reading them). Instead it does sha1 on the filename for those special cases.
- Removed default platform when scraping. You are now forced to put in a valid platform with '-p [platform]'
- Added more initial info when running Skyscraper
- Added '--unattend' command line option
- Added 'source' attribute to local database resources
- Removed 'mobygames' descriptions from 'openretro' scraper. Now uses native descriptions.
- Improved cover and screenshot scraping for 'openretro' module
- Disabled filling in missing data when scraping from web sources. User is meant to use 'localdb' scraping module for this.
- Implemented date formats to standardize output and better support EmulationStation requirements
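The two hashing rules in these changelogs (filename sha1 for .uae files in 1.5.0, filename sha1 for roms over 50 MB in 1.6.0) boil down to a small decision. Below is a minimal C++17 sketch of that decision only; it leaves out the actual SHA1 computation and assumes "50 MB" means 50 * 1024 * 1024 bytes. `chooseHashInput` and `HashInput` are hypothetical names, not Skyscraper's code.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <filesystem>
#include <string>

// What gets fed into sha1 to build the cache key / image filename.
enum class HashInput { RomContents, FileName };

// Assumption: the changelog's "50 MB" is taken as binary megabytes here.
constexpr std::uintmax_t kMaxRomBytesForSha1 = 50ull * 1024 * 1024;

// .uae files and roms bigger than the limit are hashed by filename, so the
// (RAM-limited) Pi never has to read a huge rom into memory.
HashInput chooseHashInput(const std::filesystem::path &rom, std::uintmax_t sizeBytes) {
    std::string ext = rom.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (ext == ".uae" || sizeBytes > kMaxRomBytesForSha1)
        return HashInput::FileName;
    return HashInput::RomContents;
}
```

In practice `sizeBytes` would come from something like `std::filesystem::file_size(rom)`; a rom of exactly 50 MB is still hashed by contents, since the rule applies to roms "bigger than" the limit.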
Most prominent new additions are '--cleandb', which cleans any local database folder of files that have no entry in the database, and '--mergedb', which allows you to merge two databases together. Combine the latter with '--updatedb' if you want the source db's resources to take preference.
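One way to picture the merge is as two key/value stores where '--updatedb' decides who wins when both contain the same resource. This C++17 sketch is only an illustration: the (rom sha1, type, source) key, the string value and the `mergeDb` function are hypothetical and do not reflect Skyscraper's actual database format.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical key: (rom sha1, resource type, source). Not Skyscraper's real schema.
using ResourceKey = std::tuple<std::string, std::string, std::string>;
using ResourceDb = std::map<ResourceKey, std::string>;

// Merge 'src' into 'dst'.
// updateDb == false: entries already in dst are kept, only new keys are added.
// updateDb == true:  entries from src overwrite matching entries in dst.
void mergeDb(ResourceDb &dst, const ResourceDb &src, bool updateDb) {
    for (const auto &[key, value] : src) {
        if (updateDb)
            dst.insert_or_assign(key, value);
        else
            dst.try_emplace(key, value);  // no-op if the key already exists
    }
}
```

Because 'source' is part of the key, resources of the same type from different sources never collide; only identical (rom, type, source) entries are affected by the preference flag.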
I've also added a 'source' attribute to every resource in the local databases, which means that you can now have several versions of the same type of resource for each rom in a local database. You can then prioritize them using a 'priorities.xml' file (find an example in './dbs'). A note on this: in the 1.5.0 release, resources didn't have a 'source' attribute. When you use those resources with 1.6.0, it will automatically add a 'generic' source to those entries. I recommend deleting the 1.5.0 databases and starting over. I know this is inconvenient, and I apologize for it. With that said, I feel the format of the database now does everything I want it to do, so I don't expect it to change again in the 1.x branch.
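Selecting one resource among several sources by priority could look roughly like this C++17 sketch. Parsing 'priorities.xml' is left out: the priority list is assumed to be an ordered list of source names (highest first), and sources not listed are assumed to rank last. `Resource` and `pickByPriority` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory resource; the real database stores more fields.
struct Resource {
    std::string type;    // e.g. "title", "cover"
    std::string source;  // e.g. "screenscraper", "generic"
    std::string value;
};

// Return the resource of the given type whose source appears earliest in
// 'priorities'. Assumption: unlisted sources rank after all listed ones;
// on a tie, the first resource seen wins.
std::optional<Resource> pickByPriority(const std::vector<Resource> &resources,
                                       const std::string &type,
                                       const std::vector<std::string> &priorities) {
    std::optional<Resource> best;
    std::size_t bestRank = 0;
    for (const auto &r : resources) {
        if (r.type != type) continue;
        std::size_t rank = priorities.size();  // default: ranked last
        for (std::size_t i = 0; i < priorities.size(); ++i) {
            if (priorities[i] == r.source) { rank = i; break; }
        }
        if (!best || rank < bestRank) {
            best = r;
            bestRank = rank;
        }
    }
    return best;
}
```

With an empty priority list this falls back to the first matching resource, which is one plausible default for entries that only carry the auto-added 'generic' source.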
I've also improved some of the scraping modules quite a bit and fixed a few bugs (and probably created a few new ones :D)
Have fun! As always, comments and feedback are welcome!
-
This tool is awesome! Thanks loads for all of the hard work and sharing it with the community!
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
I will look into vic20 later today. :)
EDIT: I've just had a look at the RetroPie platforms wiki page. I don't see vic20 in there. I know it's a Commodore branded machine, but how does the platform work in RetroPie? Where do the roms reside?
The Commodore VIC-20 is emulated by VICE, same as the Commodore 64.
So the system is still Commodore64 in RetroPie, but it can also emulate the VIC-20 (just a different runcommand option).
VIC-20 roms and images are therefore located inside the Commodore64 system folder.
Thanks!!
-
@incunabula Thank you! Enjoy! :)
-
@udb23 In that case, you should just be able to use the 'c64' platform which also looks for vic20 files. Just run it with "./Skyscraper -p c64". Have you tried that or am I missing something?
-
@muldjord ok, will try it out over the weekend and let you know.
-
@udb23
Awesome!
-
@muldjord I still have some more NES games to add and some more synopsis tweaks, but I'm going to be away from this work for a few weeks unfortunately. I won't forget about this though and you'll get the pack of updated NES synopsis.txt files when they're done. I've got this page bookmarked so I can catch up when I'm back to it.
-
@Used2BeRX No hurries, I'll look into it when you have things ready. :)
-
Is there a way to change the output of the <name> tag that is written to the gamelist.xml file such that the country information is not shown in ES? It looks like it is currently using the file name of the rom. For example:
Super Fun Game Deluxe (USA, Europe, Japan).zip
should be displayed in the ES game list as
Super Fun Game Deluxe
Is this already possible? I looked through the readme and the command line switches but didn't see anything relevant. Thank you!
-
@incunabula No, this is not possible at the moment. It is not actually using the file name. What it does is register any round and square bracket "notes", as I call them. It then uses the web result title and adds the notes back in.
I'll add a '--nobrackets' option in the next release that will disable the bracket tags. :) You will also be able to set 'brackets="false"' in the config.ini file under both [main] and [platform]. That should give you plenty of options for disabling it. :)
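The bracket "notes" handling described above can be sketched in C++17 roughly as follows. It assumes the notes are the round and square bracket groups in the rom's base name and that the web title carries no notes of its own; `extractNotes` and `buildName` are hypothetical names, and the `brackets` argument merely stands in for the '--nobrackets' / 'brackets="false"' setting.

```cpp
#include <filesystem>
#include <string>

// Collect every "(...)" and "[...]" group from a rom's base name, in order,
// e.g. "Super Fun Game Deluxe (USA, Europe, Japan).zip" -> "(USA, Europe, Japan)".
// Assumption: groups are not nested.
std::string extractNotes(const std::string &romFileName) {
    const std::string stem = std::filesystem::path(romFileName).stem().string();
    std::string notes;
    char closing = '\0';  // bracket we are waiting for, or '\0' if outside a group
    for (char c : stem) {
        if (closing == '\0' && (c == '(' || c == '[')) {
            closing = (c == '(') ? ')' : ']';
            if (!notes.empty()) notes += ' ';
            notes += c;
        } else if (closing != '\0') {
            notes += c;
            if (c == closing) closing = '\0';
        }
    }
    return notes;
}

// Build the <name> value: the scraped web title, followed by the notes
// unless brackets are disabled.
std::string buildName(const std::string &webTitle, const std::string &romFileName,
                      bool brackets) {
    const std::string notes = brackets ? extractNotes(romFileName) : std::string();
    return notes.empty() ? webTitle : webTitle + " " + notes;
}
```

With `brackets` set to false, the web title "Super Fun Game Deluxe" for "Super Fun Game Deluxe (USA, Europe, Japan).zip" is used as-is, which is the display name asked for above.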
Also, stay tuned for Attract-Mode support, also coming in the next release.
-
@muldjord Hey guys. I won't be around for a few weeks, but I'll get right back to work on the NES synopsis stuff when I do.
I will only have the NES/Famicom/FDS games done at that point, but the problem you mentioned above won't be an issue if you scrape from the top line of the synopsis. I have a spreadsheet that displays all of this, but I wasn't able to get it ready for a public release before it was time to wrap it up unfortunately.
The spreadsheet shows the file name for the roms, synopsis and all associated media. It shows the top line of the synopsis, which is the name displayed in the romlist when using meleu's script, as well as when using the Xbox emulators. It also shows which games have a manual, which ones have videos, and the exact dimensions of the raw artwork files for Box Front and Cart images. Every single game has these two images. Hundreds of them were made by me personally for the more obscure games, hundreds more have been touched up to varying degrees, and anywhere from 1,000-1,500 of them were cropped slightly to have a uniform look and to get rid of any beat-up edges. (Most US games had great restorations done by other people, but a lot of foreign, pirate and other games had some pretty shoddy boxes, even if they were HD images.)
Anyways.... gotta go for now, but I'll be back soon. Good luck on your project muldjord.
-
@Used2BeRX Have fun dude! I'll be here when you get back. :)
-
@muldjord That would be great, thank you! If you need someone to test out new builds or whatever, I'm happy to help.
-
@incunabula If I can get you to test the current build, that would be great. The tag option I mentioned is in there, and I've fixed a bunch of other stuff as well. Any feedback would be very welcome before I release it officially:
https://github.com/muldjord/skyscraper/archive/b031b26889c827f2559995fc6592f827e4b00a4b.zip