Versatile C++ game scraper: Skyscraper
-
@muldjord "PBP" format is a compressed iso which can be used for PSP and PSX.
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
it could improve/learn over time, by scraping the same game/rom/file from different sources, it would add all this info to the database
Concerning local scraping you could adopt the xml file standard (basically an extended set of the tags used by ES) that @meleu and @Used2BeRX are already working on in this thread. This would allow to scrape also the huge info database and images that @Used2BeRX has created (not available anywhere online yet).
-
@omnija said in Versatile C++ game scraper: Skyscraper:
@muldjord "PBP" format is a compressed iso which can be used for PSP and PSX.
Yes, it supports pbp :) As long as the filenames make sense, you should get good results.
-
@udb23 said in Versatile C++ game scraper: Skyscraper:
@muldjord said in Versatile C++ game scraper: Skyscraper:
it could improve/learn over time, by scraping the same game/rom/file from different sources, it would add all this info to the database
Concerning local scraping you could adopt the xml file standard (basically an extended set of the tags used by ES) that @meleu and @Used2BeRX are already working on in this thread. This would allow to scrape also the huge info database and images that @Used2BeRX has created (not available anywhere online yet).
Looked into it and it seems they are going in a bit of a different direction than what I want to do. I am not going to create a database with game entries. I am going to create a database with single resources. This allows me to expand it at any given time, with any piece of information I see fit. So basically you have a primary key, which will be the rom sha1. Then you have a bunch of resources connected to that sha1. But it won't be bound in a <game> node. Instead it will just be a '<resource type="something" sha1="rom checksum" timestamp="something">This is an example</resource>'. The resource type can then be anything. For instance "description" or "players". When reading from the database I will then do a checksum from the source rom file, and then search through the database, connecting any resource that matches the sha1. And then assemble it to give me as much game detail as the database contains.
If later, you scrape using a web source that contains information that the local db doesn't have, it will then simply create a resource for it.
This approach is much more flexible than having a <game> node where all nodes are predefined inside.
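A minimal Python sketch of the resource idea described above, not Skyscraper's actual implementation. The `<resources>` root element, the sample values, and the function names `rom_sha1`, `load_resources` and `assemble` are assumptions made for illustration; only the `<resource type sha1 timestamp>` node shape comes from the post.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical local db contents; the <resources> root and the values are made up.
LOCAL_DB = """
<resources>
  <resource type="description" sha1="{sha1}" timestamp="1">A beat 'em up.</resource>
  <resource type="players" sha1="{sha1}" timestamp="2">2</resource>
  <resource type="players" sha1="0000" timestamp="3">4</resource>
</resources>
"""

def rom_sha1(data: bytes) -> str:
    """Return the hex sha1 checksum that acts as the primary key for a rom."""
    return hashlib.sha1(data).hexdigest()

def load_resources(xml_text: str) -> dict:
    """Group every <resource> node by sha1, then by resource type."""
    db = defaultdict(dict)
    for node in ET.fromstring(xml_text.strip()).iter("resource"):
        db[node.get("sha1")][node.get("type")] = node.text
    return db

def assemble(db: dict, data: bytes) -> dict:
    """Collect all resources whose sha1 matches the checksum of the rom data."""
    return dict(db.get(rom_sha1(data), {}))

rom = b"fake rom contents"
db = load_resources(LOCAL_DB.format(sha1=rom_sha1(rom)))
game = assemble(db, rom)
print(game)
```

Because nothing is tied to a fixed `<game>` layout, a new resource type only needs a new `type` value; the lookup code stays the same.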
So this is what I am planning to do. I will probably also create a stitcher, which allows any user's Skyscraper local db xml file to be stitched together with some other person's file. It will then correlate and implement missing data from that. So users can basically just share the xml files with each other. So if you have a great 'snes' xml local db, you can just send it to your friend, and he'll be able to stitch it into his own local Skyscraper db. And then he can scrape from it using a 'local' scraper module.
Ok, so that was a bit of an essay. I like to keep things simple. And I feel like this approach makes more sense for what I want to achieve with Skyscraper. An expandable database, only containing the details that you actually /have/. This will also save space, not having a bunch of empty xml subnodes in the db.
Thanks for sending me their way. I might do a collaboration with them at some point. But for now, I wanna try out this method instead.
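The stitcher idea from this post could look roughly like the Python sketch below. It assumes each local db has already been loaded into a dict of the form sha1 -> {resource type -> value}, and that on a conflict the user's own data wins. Both choices, and the name `stitch`, are assumptions; the post only says missing data gets filled in.

```python
def stitch(mine: dict, theirs: dict) -> dict:
    """Return a copy of `mine` with resources from `theirs` added where missing."""
    merged = {sha1: dict(resources) for sha1, resources in mine.items()}
    for sha1, resources in theirs.items():
        target = merged.setdefault(sha1, {})
        for rtype, value in resources.items():
            # Only fill gaps; existing local data is never overwritten (assumed policy).
            target.setdefault(rtype, value)
    return merged

my_db = {"aaa": {"description": "My description"}}
friend_db = {
    "aaa": {"description": "Friend's description", "players": "2"},
    "bbb": {"publisher": "Tradewest"},
}
merged = stitch(my_db, friend_db)
print(merged)
```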
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
So this is what I am planning to do. I will probably also create a stitcher, which allows any user's Skyscraper local db xml file to be stitched together with some other person's file. It will then correlate and implement missing data from that. So users can basically just share the xml files with each other. So if you have a great 'snes' xml local db, you can just send it to your friend, and he'll be able to stitch it into his own local Skyscraper db. And then he can scrape from it using a 'local' scraper module.
Sweet; I'd really appreciate that functionality.
I mentioned that thread mainly because it could be a potential source of info and images for your scraper; in the sense you just mentioned above.
-
I've implemented a local db tonight. And it's looking very promising so far! Basically it uses the local db to fill in the blanks. You enable the local db with '--localdb' and can provide an optional filename. If no filename is provided it will use something like 'localdb.xml'. Every time you use any given local db, it will update it with new resources it comes across for any given rom. So the more sources you've scraped, the more data you'll have in the final result. You can then send these databases to your friends, and they can use them and get the same great results, but without putting load on any http servers!
I'm really excited about the potential of this. :)
I do need to figure out how to handle images and videos though. I can't have them inside the xml files, so they need to just be paths to the files. But that means you can't just send your xml file to your friends because the data would be missing. So maybe I'll create a "prepare" cli option, that prepares a folder with the data you have in your local db. Then you can send that folder to your friends. Or something like that. I'll figure it out. :)
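A rough Python sketch of the "fill in the blanks" behaviour described in this post, under stated assumptions: the local db is modelled as an in-memory dict (sha1 -> {resource type -> value}), freshly scraped values take priority in the result, and empty fields are ignored. The function name `scrape_with_localdb` is hypothetical.

```python
def scrape_with_localdb(localdb: dict, sha1: str, scraped: dict) -> dict:
    """Merge a fresh scrape with cached resources and remember anything new."""
    cached = localdb.setdefault(sha1, {})
    # Drop empty fields so they cannot hide data the local db already has.
    fresh = {rtype: value for rtype, value in scraped.items() if value}
    for rtype, value in fresh.items():
        # Store resources the local db has not seen yet for this rom.
        cached.setdefault(rtype, value)
    # Cached data fills the blanks; fresh data wins where both exist (assumed).
    return {**cached, **fresh}

localdb = {}
first = scrape_with_localdb(localdb, "abc123", {"description": "A beat 'em up.", "players": ""})
second = scrape_with_localdb(localdb, "abc123", {"players": "2"})
print(first)
print(second)
```

The second scrape only returned a player count, yet its result also carries the description picked up by the first one.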
-
@udb23 said in Versatile C++ game scraper: Skyscraper:
@muldjord said in Versatile C++ game scraper: Skyscraper:
it could improve/learn over time, by scraping the same game/rom/file from different sources, it would add all this info to the database
Concerning local scraping you could adopt the xml file standard (basically an extended set of the tags used by ES) that @meleu and @Used2BeRX are already working on in this thread. This would allow to scrape also the huge info database and images that @Used2BeRX has created (not available anywhere online yet).
Yeah... I'm a pretty long way from having this stuff online. Currently I've been upgrading just the NES artwork. I've basically put in back to back 100+ hour weeks on it.
Most of the US Licensed artwork is 2100 pixels and the carts are nearly 700 pixels. Japan is a little spotty, but I touched up hundreds of images by hand. There are also box/carts for hundreds of translations and rom hacks as well as prototypes, homebrew and pirate games.
When this is done, it will be the best artwork database for the NES online.
I'd like to do all of the major console systems one day. I'm considering crowd funding so I can continue with this work, otherwise I need to get a job pretty soon and this work may never get done.
I'm also considering re-doing ALL of the videos for each and every game personally. My set is from 8 or so years ago and is only 480. I'm not really happy with the collections online. No offense to anybody who's put them together, but that's what happens when you have a lot of different people doing things. There is no standard. I'd love the opportunity to work for the community and make those 10,000-12,000 videos myself.
I'm so OCD it hurts sometimes. :)
-
Ok, so I am making some pretty cool progress with the local database idea I had. Check it out:
-
@muldjord Really cool functionality!
Could you include an option to import data and images to the localdb from a local file(s) ?
Alternatively what is the structure of the localdb? Can it be edited manually?
What are the options concerning the existing ES gameslist? I mean things like overwrite completely, fill just missing metadata, only write data for games with no metadata, only write images etc.
Sorry for so many questions ;-)
Thanks for your hard work.
Note: any clue why after watching your video I get a video with a girl's first flight in a cessna?!! :-D
Maybe it's some tag in your video that makes YouTube propose this?!
-
@muldjord That's pretty sweet. I could have used somebody like you when I was making nearly 10,000 game synopsis files by hand. Lots of copy and paste....
Too bad I don't know how to put my stuff up where a scraper like yours could grab it for everybody. When I'm done you could be sure that it's just about as perfect as possible and there would be a Box, Cart, Title and Action shot for every single game... even prototypes and translations and hacks and other more obscure roms.
I'm working on a spreadsheet that details what is done on the NES. I'm almost ready to put it up.
-
@udb23 said in Versatile C++ game scraper: Skyscraper:
@muldjord Really cool functionality!
Could you include an option to import data and images to the localdb from a local file(s) ?
Yes, I will create a dbtool for this down the line. The local files will have to be Skyscraper formatted, so basically I will make it so you can share your own Skyscraper db with a friend, and you can import his data into your own, making for a more complete set of data. Then you can scrape locally from that afterwards, leaving you with an awesome result! :)
Alternatively what is the structure of the localdb? Can it be edited manually?
No, it can't be edited manually (well it can, but it's really abstract, so it's not really feasible). It is basically just a huge list of <resource> nodes bound to the sha1 checksum of the rom data. It is not meant to be edited manually. But really it shouldn't be necessary. The local database can be seen as an abstract bunch of resources, that Skyscraper is an expert in stitching together to the format of choice. For now I support EmulationStation. But I plan to also support AttractMode and Hyperspin at some point. The resulting gamelist.xml file can of course be edited manually.
What are the options concerning the existing ES gameslist? I mean things like overwrite completely, fill just missing metadata, only write data for games with no metadata, only write images etc.
Currently you have the option to either scrape from scratch, or to only scrape entries that don't already exist.
Sorry for so many questions ;-)
No problem, I need the feedback :)
Thanks for your hard work.
You are most welcome, I love working on this, about 4 hours a day for the past month or two. :D
Note: any clue why after watching your video I get a video with a girl's first flight in a cessna?!! :-D
What!? Haha, that's weird.
Maybe it's some tag in your video that makes YouTube propose this?!
-
@used2berx said in Versatile C++ game scraper: Skyscraper:
@muldjord That's pretty sweet. I could have used somebody like you when I was making nearly 10,000 game synopsis files by hand. Lots of copy and paste....
Holy shit! Just thinking about that makes me dizzy :D Great work though.
Too bad I don't know how to put my stuff up where a scraper like yours could grab it for everybody. When I'm done you could be sure that it's just about as perfect as possible and there would be a Box, Cart, Title and Action shot for every single game... even prototypes and translations and hacks and other more obscure roms.
If it's well formatted, which it certainly sounds like it is, I could easily write a scraper module that scrapes from a local resource such as yours.
I'm working on a spreadsheet that details what is done on the NES. I'm almost ready to put it up.
thumbsup
-
Ok, so I've been expanding the scraper quite a bit. I've decided to implement a 'xmlscraper' module which basically takes a well-formed xml document and uses that as a source for scraping the platform you've chosen.
And it very much sounds like you have this stuff down, so I'd like you to define the XML format for me. So can you send me a file consisting of a single entry? With everything you feel is relevant. I will then enable my scraper to scrape from such a source.
I think this would be really, REALLY cool to have in Skyscraper.
What do you say?
-
@muldjord Sure. I can't do it at the moment, but that's not a problem when I get back to the pi.
You want just a single game entry, or an entire gamelist.xml file?
BTW.... my xml files do not have a description at the moment. meleu made it so that you could do it with or without the description, and since I'm running everything on a pi zero right now the <desc> fields are blank. The pi was running into memory issues and getting very buggy when the descriptions were put in.
I could re-run one with the --full command, but that would take longer to get to you.
Just let me know.
-
@Used2BeRX I think I need more information on what it is you are working on. A gamelist.xml won't be useful, at least not if it's the same format as the EmulationStation format. What I would like is for someone to define an xml format for game entries, that has nodes and attributes for everything that would make sense for a game.
What format is your source nes database in currently?
-
@muldjord The source files are TXT files.
Double Dragon
Platform: Nintendo Entertainment System
Region: USA
Media: Cartridge
Controller: NES Gamepad
Genre: Beat 'em Up
Gametype: Licensed
Release Year: 1988
Developer: Technos
Publisher: Tradewest
Players: 1 or 2 Alternating
_________________________
Set in a post-apocalyptic New York, Double Dragon is the story of Billy and Jimmy Lee, twin brothers trained in the fighting style of Sou-Setsu-Ken. Together, they manage a small martial arts training school, teaching their students in self-defense. One day, Billy's girlfriend, Marian, is kidnapped off the street by the "Black Warriors", a savage street gang led by a man named Willy. The Black Warriors demand the Lee brothers disclose their martial arts secrets in exchange for Marian's freedom. The Lee brothers set out on a rescue mission to crush the Black Warriors and save Marian. Using whatever techniques they have at their disposal, from the basic punches and kicks to the invulnerable elbow strike, as well any weapon that comes into their hands, the Lee brothers must pursue the gang through the city slum, industrial area and the forest before reaching their hideout to confront the big boss, Willy.
Controls:
A Button: Punch
B Button: Kick
Start Button: Start game, Pause/Unpause
Select Button: Choose between game
Hints:
Vanishing Weapons - When you get a weapon from someone it will vanish if you kill them so make sure to kill that person last.
Reviewer: Egosolus
http://www.consoleclassix.com/nes/double_dragon.html
meleu's script pulls all the relevant information and makes gamelist.xml entries from them. They were made for a system on the XBox that is much different than retropie, so depending on the category of rom there are quite a few info fields in these that will not be usable in EmulationStation.
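A small Python sketch of how such a synopsis file might be parsed. This is not meleu's script. It assumes the file holds the title on its first line, then one "Key: Value" field per line, then a line of underscores, then free text. The function name `parse_synopsis` and the shortened sample are made up for illustration.

```python
def parse_synopsis(text: str) -> dict:
    """Split a synopsis file into its title, header fields and description."""
    lines = [line.strip() for line in text.strip().splitlines()]
    info = {"Title": lines[0]}
    body_start = len(lines)  # no description unless a separator line is found
    for i, line in enumerate(lines[1:], start=1):
        if line and set(line) == {"_"}:
            # A line made only of underscores ends the header block.
            body_start = i + 1
            break
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    info["Description"] = "\n".join(lines[body_start:]).strip()
    return info

SAMPLE = """Double Dragon
Platform: Nintendo Entertainment System
Region: USA
Genre: Beat 'em Up
Release Year: 1988
Players: 1 or 2 Alternating
_________________________
Set in a post-apocalyptic New York, Double Dragon is the story of Billy and Jimmy Lee.
"""

info = parse_synopsis(SAMPLE)
print(info["Title"], "|", info["Release Year"], "|", info["Players"])
```

Fields that EmulationStation has no tag for could simply be ignored by whatever writes the gamelist.xml entry.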
-
@used2berx said in Versatile C++ game scraper: Skyscraper:
Vanishing Weapons - When you get
Txt is easy to parse, that is not a problem. Any complete set of information for any platform would be of interest to me, since I could then take that and create a complete Skyscraper resource database from it. You would of course get all the credit for that database since it is your work it is based on.
How do you handle image and video files? Are there paths to them in the txt files?
-
@used2berx You could submit it to screenscraper.fr for example. Join the IRC channel when you submit a large collection of files at once.
-
@muldjord I'm actually in the middle of a HUGE project updating the synopsis files and artwork files for the NES, as well as adding 150+ complete translations from romhacking and I'll be adding a lot of homebrew and other games I've found along the way as well. There are a lot of updates possible because the quality of stuff online has improved so much in the last decade.
I actually intend to make new videos too since ours were from 8-10 years ago and all in 480. The stuff on places like emu-movies is "ok", but there are no standards. Volume levels are all over the map, some videos are short while other videos are too long.
I want to make every video between 25-30 seconds long maximum, with an equal sound volume, and with about 15-20 seconds of gameplay wrapped up with 5-10 seconds of the title screen animation. This would be like the demo mode on arcade games. This dream might not be possible right now because of my very old computer that needs updating.
Obviously, this would take a ton of time beyond the thousands of hours I've recently put into emulation since the start of the year. I'm considering possibly looking into crowd funding so I can continue on with this work.
How do you handle image and video files? Are there paths to them in the txt files?
No. There is a specific folder structure for the various media that includes Box Front, Cart, Action, Title, 3D Boxart, videos, gamefaqs, manuals and various other media.
Meleu's script not only pulls all of the data from the synopsis files, but it searches these directories for any media that matches the file name of the game synopsis. Case sensitivity is ignored and it looks for proper extensions by filetype, ie: .jpg or .png for images and .mp4 or .wmv for movies.
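The matching rule described above could be sketched in Python as follows. This is only an illustration, not Meleu's script: the function name `find_media` and the way directories are passed in are assumptions, while the folder names and extensions are the ones mentioned in the post.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".png"}
VIDEO_EXTS = {".mp4", ".wmv"}

def find_media(synopsis_file: str, media_dirs: dict) -> dict:
    """Map each media kind to the first file named like the synopsis, ignoring case."""
    wanted = Path(synopsis_file).stem.lower()
    found = {}
    for kind, (directory, extensions) in media_dirs.items():
        for candidate in sorted(Path(directory).iterdir()):
            same_name = candidate.stem.lower() == wanted
            if same_name and candidate.suffix.lower() in extensions:
                found[kind] = candidate
                break
    return found

# Build a throwaway folder structure to try the lookup on.
with tempfile.TemporaryDirectory() as root:
    box = Path(root, "Box Front")
    videos = Path(root, "videos")
    box.mkdir()
    videos.mkdir()
    (box / "DOUBLE DRAGON.PNG").touch()
    (box / "Double Dragon.txt").touch()  # wrong extension, must be skipped
    (videos / "double dragon.mp4").touch()
    media = find_media(
        "Double Dragon.txt",
        {"box": (box, IMAGE_EXTS), "video": (videos, VIDEO_EXTS)},
    )
    names = {kind: path.name for kind, path in media.items()}

print(names)
```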
If you look at the other thread, you can find the file structure being used. It makes sense, so if you get it for one system, you'd easily be able to figure out where media would go for all of the other systems as well.
I plan on doing writeups for all of this at some point, but I'm just not there yet.
EDIT:
Expect a link for a spreadsheet I've been making soon. It was originally a compatibility list for NES games on the XBox years back that I made that I'm using to update all sorts of stuff now. It has the dimensions of all of the new boxes and cartridge art, as well as a lot of other info.
@Rion Cool. I'll have to look into that. Any way of making that English without Google translate? I clicked on the US flag for "Région préférée" and that didn't do anything.
-
Skyscraper 1.5.0 released!!! https://github.com/muldjord/skyscraper
And this is a whopper! Just look at that changelog!
- MAJOR: Added support for local database resources
- MAJOR: Added support for video scraping (currently supported in the 'screenscraper' scraping module)
- MAJOR: Added 'localdb' scraping module
- Added video tag in EmulationStation gamelist.xml output. Beware though, the Pi's are having a difficult time showing the videos properly.
- Added several new command line options relevant to the new video and localdb features
- Added cover, screenshot and video as part of the result output with "YES" or "NO" depending on whether they were found or not
- Fixed a bug where image tag in gamelist.xml had wrong path when using non-default path
- Now uses rom or filename (for .uae) sha1 for image filename, in case people have several roms with the same name under subdirs
- Added 'players' scraping for 'mobygames' module and improved screenshot getter even more
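As a hedged illustration of the sha1-based image naming mentioned in the changelog, here is a short Python sketch. The function name `image_filename` and the ".png" extension are assumptions, and Skyscraper's real naming scheme may differ in detail.

```python
import hashlib

def image_filename(rom_data: bytes, extension: str = ".png") -> str:
    """Name an image after the sha1 of the rom contents instead of the rom's name."""
    return hashlib.sha1(rom_data).hexdigest() + extension

# Two different roms that share the name "game.rom" in separate subdirs.
usa_rom = b"contents of usa/game.rom"
japan_rom = b"contents of japan/game.rom"
print(image_filename(usa_rom))
print(image_filename(japan_rom))
```

Since the name is derived from the data, the two images cannot overwrite each other even though the rom filenames are identical.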
The major new feature, of course, is the support for local game resource cache databases. One per platform. As you scrape more data, you will find it all under the default folder "./dbs/[platform]". It works completely seamlessly. The more sources you scrape from, the more complete results you'll get per platform.
And when you finally have results you really like, you can share your database with friends simply by copying the "./dbs/[platform]" folder and handing it to them. They can then use that database with the '-d' option and scrape using the '-s localdb' scraping module. A word of warning though. If your friend copies your database into his own database with the same name, his database will be overwritten! I will implement the '--mergedb' flag soon. But until then, just copy the database to any other location and set it with '-d' when scraping.
Another new feature is video support. As of RetroPie 4.2 EmulationStation supports video. Good news is that it works if you scrape with the '-s screenscraper' module. Bad news is that EmulationStation is having a hard time showing the videos properly. I reckon it will get better over time if they keep developing it. If you are having issues with videos, simply disable them altogether with '--novideos'. Then rescrape with '-s localdb'.
This is by far the most elaborate release of Skyscraper to date. I've done a bunch of testing, haven't had any problems so far. But if you do run into trouble or have suggestions, pleeeease let me know. :)
Also, I'd really like to hear if you like these new features. I'm pretty darned excited about the prospects of sharing local db's. I plan to run my amiga setup through a bunch of the scraping modules. Then I'll have a 100% complete set of game details that I can then upload for all of you to use.
I'll get around to that soon as well.
And now, I think, I've deserved to take the weekend off. I've literally worked on this project for 8 hours a day for the past several days. I love it! But it's starting to take its toll. :D
Have a great weekend everyone! And please comment and let me know what you think!