Versatile C++ game scraper: Skyscraper
-
@udb23 said in Versatile C++ game scraper: Skyscraper:
@muldjord Really cool functionality!
Could you include an option to import data and images into the localdb from local file(s)?
Yes, I will create a dbtool for this down the line. The local files will have to be Skyscraper formatted, so basically I will make it so you can share your own Skyscraper db with a friend, and you can import his data into your own, making for a more complete set of data. Then you can scrape locally from that afterwards, leaving you with an awesome result! :)
Alternatively, what is the structure of the localdb? Can it be edited manually?
No, it can't be edited manually (well, it can, but the format is so abstract that it isn't really feasible). It is basically just a huge list of <resource> nodes bound to the sha1 checksum of the rom data. It is not meant to be edited manually, and it really shouldn't be necessary: the local database can be seen as an abstract bunch of resources that Skyscraper is an expert at stitching together into the format of choice. For now I support EmulationStation, but I plan to also support AttractMode and Hyperspin at some point. The resulting gamelist.xml file can of course be edited manually.
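To make the "list of <resource> nodes bound to the sha1 checksum" idea a bit more concrete, here is a minimal Python sketch of such a lookup. The XML layout below (a <resources> root with sha1, type and source attributes on each <resource>) is a hypothetical stand-in for illustration, not Skyscraper's actual database format, and the helper names are made up as well.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def rom_sha1(data: bytes) -> str:
    """Hex sha1 digest of the rom contents; this is the lookup key."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical database layout, for illustration only (not Skyscraper's real format).
DD_KEY = rom_sha1(b"stand-in rom bytes")
SAMPLE_DB = f"""<resources>
  <resource sha1="{DD_KEY}" type="title" source="generic">Double Dragon</resource>
  <resource sha1="{DD_KEY}" type="publisher" source="generic">Tradewest</resource>
  <resource sha1="{rom_sha1(b'another rom')}" type="title" source="generic">Other Game</resource>
</resources>"""


def index_resources(xml_text: str) -> dict:
    """Group <resource> nodes by rom checksum: {sha1: {type: value}}.

    If two nodes share a sha1 and type, the later one wins; a db holding
    several sources per type would need a priority rule on top of this.
    """
    index = defaultdict(dict)
    for node in ET.fromstring(xml_text).iter("resource"):
        index[node.get("sha1")][node.get("type")] = node.text or ""
    return dict(index)


def lookup(index: dict, rom_bytes: bytes) -> dict:
    """Return all known resources for a rom, or {} if it was never scraped."""
    return index.get(rom_sha1(rom_bytes), {})
```

With db = index_resources(SAMPLE_DB), calling lookup(db, b"stand-in rom bytes")["publisher"] gives "Tradewest". The point of keying on the checksum rather than the file name is that a renamed rom still finds its resources.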
What are the options concerning the existing ES gamelist? I mean things like overwrite completely, fill in just the missing metadata, only write data for games with no metadata, only write images etc.
Currently you have the option to either scrape from scratch, or to only scrape entries that don't already exist.
Sorry for so many questions ;-)
No problem, I need the feedback :)
Thanks for your hard work.
You are most welcome! I love working on this; I've put in about 4 hours a day for the past month or two. :D
Note: any clue why, after watching your video, I get a video of a girl's first flight in a Cessna?!! :-D
What!? Haha, that's weird.
Maybe it's some tag in your video that makes YouTube propose this?!
-
@used2berx said in Versatile C++ game scraper: Skyscraper:
@muldjord That's pretty sweet. I could have used somebody like you when I was making nearly 10,000 game synopsis files by hand. Lots of copy and paste....
Holy shit! Just thinking about that makes me dizzy :D Great work though.
Too bad I don't know how to put my stuff up where a scraper like yours could grab it for everybody. When I'm done you could be sure that it's just about as perfect as possible and there would be a Box, Cart, Title and Action shot for every single game... even prototypes and translations and hacks and other more obscure roms.
If it's well formatted, which it certainly sounds like it is, I could easily write a scraper module that scrapes from a local resource such as yours.
I'm working on a spreadsheet that details what is done on the NES. I'm almost ready to put it up.
thumbsup
-
Ok, so I've been expanding the scraper quite a bit. I've decided to implement an 'xmlscraper' module, which basically takes a well-formed xml document and uses it as a source for scraping the platform you've chosen.
And it very much sounds like you have this stuff down, so I'd like you to define the XML format for me. Could you send me a file consisting of a single entry, with everything you feel is relevant? I will then enable my scraper to scrape from such a source.
I think this would be really, REALLY cool to have in Skyscraper.
What do you say?
-
@muldjord Sure. I can't do it at the moment, but that's not a problem when I get back to the pi.
You want just a single game entry, or an entire gamelist.xml file?
BTW... my xml files do not have descriptions at the moment. meleu made it so that you could generate them with or without the descriptions, and since I'm running everything on a Pi Zero right now, the <desc> fields are blank. The Pi was running into memory issues and getting very buggy when the descriptions were put in. I could re-run one with the --full command, but that would take longer to get to you. Just let me know.
-
@Used2BeRX
I think I need more information on what it is you are working on. A gamelist.xml won't be useful, at least not if it's the same format as the EmulationStation format. What I would like is for someone to define an xml format for game entries, with nodes and attributes for everything that makes sense for a game.
What format is your source NES database in currently?
-
@muldjord The source files are TXT files.
Double Dragon
Platform: Nintendo Entertainment System
Region: USA
Media: Cartridge
Controller: NES Gamepad
Genre: Beat 'em Up
Gametype: Licensed
Release Year: 1988
Developer: Technos
Publisher: Tradewest
Players: 1 or 2 Alternating
_________________________
Set in a post-apocalyptic New York, Double Dragon is the story of Billy and Jimmy Lee, twin brothers trained in the fighting style of Sou-Setsu-Ken. Together, they manage a small martial arts training school, teaching their students in self-defense. One day, Billy's girlfriend, Marian, is kidnapped off the street by the "Black Warriors", a savage street gang led by a man named Willy. The Black Warriors demand the Lee brothers disclose their martial arts secrets in exchange for Marian's freedom. The Lee brothers set out on a rescue mission to crush the Black Warriors and save Marian. Using whatever techniques they have at their disposal, from the basic punches and kicks to the invulnerable elbow strike, as well any weapon that comes into their hands, the Lee brothers must pursue the gang through the city slum, industrial area and the forest before reaching their hideout to confront the big boss, Willy.
Controls:
A Button: Punch
B Button: Kick
Start Button: Start game, Pause/Unpause
Select Button: Choose between game
Hints:
Vanishing Weapons - When you get a weapon from someone it will vanish if you kill them so make sure to kill that person last.
Reviewer: Egosolus
http://www.consoleclassix.com/nes/double_dragon.html
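Since these synopsis files are plain text, turning one into structured fields is straightforward. A minimal Python sketch, assuming the file has the title on the first line, one "Key: Value" pair per line, and a line of underscores separating the header from the free-text description. The forum post flattens the layout, so this is a guess at it, not meleu's actual script; parse_synopsis is a hypothetical name.

```python
def parse_synopsis(text: str) -> dict:
    """Split a synopsis file into title, header fields and free-text body.

    Assumed layout: title on line 1, then 'Key: Value' lines, then a line of
    underscores, then the description and any trailing notes.
    """
    lines = text.splitlines()
    entry = {"title": lines[0].strip() if lines else "", "fields": {}, "description": ""}
    for i, line in enumerate(lines[1:], start=1):
        stripped = line.strip()
        if stripped and set(stripped) == {"_"}:
            # Everything after the separator is kept verbatim as the description.
            entry["description"] = "\n".join(lines[i + 1:]).strip()
            break
        # Split on the first colon only, so values may contain colons.
        key, sep, value = stripped.partition(":")
        if sep:
            entry["fields"][key.strip()] = value.strip()
    return entry
```

Keeping the body verbatim after the separator means sections like Controls and Hints stay intact; splitting them out further would need more assumptions about the file format.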
meleu's script pulls all the relevant information and makes gamelist.xml entries from it. The files were made for a system on the XBox that is very different from RetroPie, so depending on the category of rom, quite a few of their info fields will not be usable in EmulationStation.
-
@used2berx said in Versatile C++ game scraper: Skyscraper:
Vanishing Weapons - When you get
Txt is easy to parse, that is not a problem. Any complete set of information for any platform would be of interest to me, since I could then take it and create a complete Skyscraper resource database from it. You would of course get all the credit for that database, since it is based on your work.
How do you handle image and video files? Are there paths to them in the txt files?
-
@used2berx You could submit it to screenscraper.fr, for example. Join the IRC channel when you submit a large collection of files at once.
-
@muldjord I'm actually in the middle of a HUGE project updating the synopsis files and artwork files for the NES, as well as adding 150+ complete translations from romhacking, and I'll be adding a lot of homebrew and other games I've found along the way as well. There are a lot of updates possible because the quality of stuff online has improved so much in the last decade.
I actually intend to make new videos too, since ours were from 8-10 years ago and all in 480. The stuff on places like emu-movies is "ok", but there are no standards. Volume levels are all over the map, and some videos are too short while others are too long.
I want to make every video between 25-30 seconds long maximum, with an equal sound volume, and with about 15-20 seconds of gameplay wrapped up with 5-10 seconds of the title screen animation. This would be like the demo mode on arcade games. This dream might not be possible right now because of my very old computer that needs updating.
Obviously, this would take a ton of time beyond the thousands of hours I've recently put into emulation since the start of the year. I'm considering possibly looking into crowd funding so I can continue on with this work.
How do you handle image and video files? Are there paths to them in the txt files?
No. There is a specific folder structure for the various media that includes Box Front, Cart, Action, Title, 3D Boxart, videos, gamefaqs, manuals and various other media.
Meleu's script not only pulls all of the data from the synopsis files, but also searches these directories for any media that matches the file name of the game synopsis. Case is ignored, and it looks for the proper extensions by filetype, i.e. .jpg or .png for images and .mp4 or .wmv for movies.
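The matching rule described above (same base name as the synopsis, case ignored, extension checked per media type) fits in a few lines. A Python sketch that works on a list of file names so it doesn't depend on the folder structure; the extension table simply mirrors the examples in the post, and find_media is a hypothetical helper, not part of meleu's script or Skyscraper.

```python
from pathlib import Path

# Allowed extensions per media kind, taken from the examples in the post.
EXTENSIONS = {
    "image": {".jpg", ".png"},
    "video": {".mp4", ".wmv"},
}


def find_media(game_name: str, candidates: list[str], kind: str) -> list[str]:
    """Return the file names whose stem equals game_name (ignoring case)
    and whose extension is allowed for the given media kind."""
    wanted = game_name.lower()
    allowed = EXTENSIONS[kind]
    return [
        name
        for name in candidates
        if Path(name).stem.lower() == wanted and Path(name).suffix.lower() in allowed
    ]
```

To scan a real media folder, the candidates could come from [p.name for p in Path(folder).iterdir()]. Comparing whole stems rather than prefixes keeps "Double Dragon II.png" from matching "Double Dragon".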
If you look at the other thread, you can find the file structure being used. It makes sense, so if you get it for one system, you'd easily be able to figure out where media would go for all of the other systems as well.
I plan on doing writeups for all of this at some point, but I'm just not there yet.
EDIT:
Expect a link soon to a spreadsheet I've been making. It started out years ago as a compatibility list I made for NES games on the XBox, and I'm now using it to update all sorts of stuff. It has the dimensions of all of the new boxes and cartridge art, as well as a lot of other info.
@Rion Cool. I'll have to look into that. Any way of making that English without Google translate? I clicked on the US flag for "Région préférée" (preferred region) and that didn't do anything.
-
Skyscraper 1.5.0 released!!! https://github.com/muldjord/skyscraper
And this is a whopper! Just look at that changelog!
- MAJOR: Added support for local database resources
- MAJOR: Added support for video scraping (currently supported in the 'screenscraper' scraping module)
- MAJOR: Added 'localdb' scraping module
- Added video tag in EmulationStation gamelist.xml output. Beware though, the Pis are having a difficult time showing the videos properly.
- Added several new command line options relevant to the new video and localdb features
- Added cover, screenshot and video as part of the result output with "YES" or "NO" depending on whether they were found or not
- Fixed a bug where image tag in gamelist.xml had wrong path when using non-default path
- Now uses the sha1 of the rom (or of the filename, for .uae) as the image filename, in case people have several roms with the same name under subdirs
- Added 'players' scraping for 'mobygames' module and improved screenshot getter even more
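A sketch of the sha1-based image naming from the changelog above, in Python. Two details are assumptions: "filename" is taken to mean the bare file name (not the full path), and the rom is hashed in fixed-size chunks, which keeps memory use flat no matter how large the rom is. Neither is confirmed to match Skyscraper's own code, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally so the whole rom never has to sit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_basename(path: Path) -> str:
    """Pick the key used to name cached images.

    .uae files are keyed on the sha1 of their (bare) file name, everything
    else on the sha1 of the rom contents. A checksum instead of the plain
    name keeps same-named roms in different subdirs from colliding.
    """
    if path.suffix.lower() == ".uae":
        return hashlib.sha1(path.name.encode("utf-8")).hexdigest()
    return sha1_of_file(path)
```

Content hashing is what makes the names collision-proof across subdirs; the .uae case falls back to the name because the post implies the file itself isn't the game data.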
The major new feature, of course, is the support for local game resource cache databases, one per platform. As you scrape more data, you will find it all under the default folder "./dbs/[platform]". It works completely seamlessly. The more sources you scrape from, the more complete the results you'll get per platform.
And when you finally have results you really like, you can share your database with friends simply by copying the "./dbs/[platform]" folder and handing it to them. They can then use that database with the '-d' option and scrape using the '-s localdb' scraping module. A word of warning though. If your friend copies your database into his own database with the same name, his database will be overwritten! I will implement the '--mergedb' flag soon. But until then, just copy the database to any other location and set it with '-d' when scraping.
Another new feature is video support. As of RetroPie 4.2, EmulationStation supports video. The good news is that it works if you scrape with the '-s screenscraper' module. The bad news is that EmulationStation is having a hard time showing the videos properly. I reckon it will get better over time if they keep developing it. If you are having issues with videos, simply disable them altogether with '--novideos'. Then rescrape with '-s localdb'.
This is by far the most elaborate release of Skyscraper to date. I've done a bunch of testing, haven't had any problems so far. But if you do run into trouble or have suggestions, pleeeease let me know. :)
Also, I'd really like to hear if you like these new features. I'm pretty darned excited about the prospects of sharing local dbs. I plan to run my Amiga setup through a bunch of the scraping modules. Then I'll have a 100% complete set of game details that I can upload for all of you to use.
I'll get around to that soon as well.
And now, I think, I deserve to take the weekend off. I've literally worked on this project for 8 hours a day for the past several days. I love it! But it's starting to take its toll. :D
Have a great weekend everyone! And please comment and let me know what you think!
-
@Used2BeRX
Ok, this is no problem. It is easy to find the files as long as the filenames match 100%, and I'll just do .toLower on them.
-
@used2berx said in Versatile C++ game scraper: Skyscraper:
@Rion Cool. I'll have to look into that. Any way of making that English without Google translate? I clicked on the US flag for "Région préférée" (preferred region) and that didn't do anything.
Sorry, no. As of now the translation on screenscraper.fr is lacking. But the people over there are friendly and helpful on IRC.
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
Skyscraper 1.5.0 released!!! https://github.com/muldjord/skyscraper
And this is a whopper! Just look at that changelog!
Big ass changelog!
This looks very promising! Bookmarking this now and following this thread.
-
Skyscraper 1.6.0 release: github.com/muldjord/skyscraper
Changelog:
- Now allows more resources of the same type, as long as 'source' differs
- Now allows user to set priorities for local resource sources
- Fixed a bug that would nullify timestamp of local resources
- Optimized LocalDb communication to improve scraping speed
- Added README.md to dbs subfolder
- Added priorities.xml.example file to dbs subfolder. Automatically copies this to new databases when they are created if none already exists.
- Implemented '--cleandb' command line option that removes files with no resource entry
- Implemented '--mergedb' command line option that merges two local databases together
- No longer computes the sha1 of roms bigger than 50 MB (the Pi runs out of RAM when reading them). Instead computes the sha1 of the filename in those special cases.
- Removed default platform when scraping. You are now forced to put in a valid platform with '-p [platform]'
- Added more initial info when running Skyscraper
- Added '--unattend' command line option
- Added 'source' attribute to local database resources
- Removed 'mobygames' descriptions from 'openretro' scraper. Now uses native descriptions.
- Improved cover and screenshot scraping for 'openretro' module
- Disabled filling in missing data when scraping from web sources. User is meant to use 'localdb' scraping module for this.
- Implemented date formats to standardize output and better support EmulationStation requirements
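EmulationStation stores release dates in gamelist.xml as a compact ISO 8601 timestamp such as 19880101T000000, so standardizing dates mostly means coercing whatever a site returns into that shape. A Python sketch; the list of accepted input formats is an assumption for illustration, not the set Skyscraper actually handles, and to_es_date is a hypothetical name.

```python
from datetime import datetime

# Input formats tried in order (assumed examples of what scraping sites return).
# "%B" expects English month names under the default C locale.
INPUT_FORMATS = ["%Y-%m-%d", "%Y-%m", "%Y", "%B %d, %Y"]


def to_es_date(raw: str) -> str:
    """Convert a loosely formatted date to EmulationStation's YYYYMMDDTHHMMSS form.

    A missing month or day defaults to 01 (strptime fills them in). Returns ""
    when nothing matches, so the tag can simply be left empty.
    """
    raw = raw.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y%m%dT%H%M%S")
        except ValueError:
            continue
    return ""
```

Trying the most specific format first matters: "1988-05" must not be read as just a year, and a bare year becomes January 1st of that year.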
The most prominent new additions are '--cleandb', which cleans any local database folder of files that have no entry in the database, and '--mergedb', which allows you to merge two databases together. Combine that with '--updatedb' if you want the source db's resources to take preference.
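One reading of the merge semantics described above: resources missing from the destination are always added, and on a conflict the source db only wins when '--updatedb' is given. A Python sketch of that reading, with each database modeled as a dict keyed by (rom sha1, resource type, source). The data model and the update_existing parameter are hypothetical stand-ins for the real '--mergedb'/'--updatedb' behavior, not Skyscraper's implementation.

```python
from typing import Dict, Tuple

# Hypothetical model: (rom sha1, resource type, source) -> resource value
Key = Tuple[str, str, str]


def merge_dbs(dest: Dict[Key, str], src: Dict[Key, str], update_existing: bool = False) -> Dict[Key, str]:
    """Return a new db with src merged into dest, leaving both inputs untouched.

    Keys only in src are always added. Keys present in both keep dest's value
    unless update_existing is True (the '--updatedb'-style preference).
    """
    merged = dict(dest)
    for key, value in src.items():
        if key not in merged or update_existing:
            merged[key] = value
    return merged
```

Because 'source' is part of the key, two versions of the same resource from different sites never conflict; they simply coexist and are sorted out later by priority.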
I've also added a 'source' attribute to every resource in the local databases, which means that you can now have several versions of the same type of resource for each rom in a local database. You can then prioritize them using a 'priorities.xml' file (find an example in './dbs'). A note on this: in the 1.5.0 release, resources didn't have a 'source' attribute. When you use these resources with 1.6.0, it will automatically add a 'generic' source to those entries. I recommend deleting the 1.5.0 databases and starting over. I know this is inconvenient, and I apologize for it. With that said, I feel the format of the database now does everything I want it to do, so I don't expect it to change again in the 1.x branch.
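With a 'source' on every resource, choosing which version to output becomes "take the first source in priority order that has this resource type". A hypothetical Python sketch: the priority list is a plain list here rather than the real priorities.xml format (see priorities.xml.example in ./dbs for that), and the fallback to an unlisted source such as 'generic' is an assumption, not documented behavior.

```python
from typing import Dict, List, Optional


def pick_resource(candidates: Dict[str, str], priorities: List[str]) -> Optional[str]:
    """Choose one value among several versions of the same resource type.

    candidates maps source name -> value for one rom and one resource type.
    Sources listed in priorities win in that order; if none of them are
    present, fall back to the alphabetically first remaining source so the
    result is deterministic. Returns None when there is nothing at all.
    """
    for source in priorities:
        if source in candidates:
            return candidates[source]
    if candidates:
        return candidates[min(candidates)]
    return None
```

Falling back rather than returning nothing means an entry is never left empty just because its only source wasn't listed in the priorities.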
I've also improved some of the scraping modules quite a bit and fixed a few bugs (and probably created a few new ones :D)
Have fun! As always; comments and feedback are welcome!
-
This tool is awesome! Thanks loads for all of the hard work and sharing it with the community!
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
I will look into vic20 later today. :)
EDIT: I've just had a look at the RetroPie platforms wiki page. I don't see vic20 in there. I know it's a Commodore branded machine, but how does the platform work in RetroPie? Where do the roms reside?
Commodore Vic20 is emulated by Vice, same as the Commodore64.
So the system is still Commodore64 in RetroPie, but it can also emulate the Vic20 (just a different runcommand option).
Vic20 roms & images are therefore located inside the Commodore64 system folder.
Thanks!!
-
@incunabula Thank you! Enjoy! :)
-
@udb23 In that case, you should just be able to use the 'c64' platform which also looks for vic20 files. Just run it with "./Skyscraper -p c64". Have you tried that or am I missing something?
-
@muldjord ok, will try it out over the weekend and let you know.
-
@udb23
Awesome!