Versatile C++ game scraper: Skyscraper
-
@muldjord sounds great, thanks!
Can it also scrape using local metadata and images as source?
Is Vic20 also supported as a system?
-
@udb23 If the local data can be defined somehow, it could easily be implemented to "scrape from local files". I've made the scraping modules very modularized and I could then create a module called "local" containing the rules for the local scraping. Then it would simply build the gamelist.xml from those rules and files instead of using http gets.
I will look into vic20 later today. :)
EDIT: I've just had a look at the RetroPie platforms wiki page. I don't see vic20 in there. I know it's a Commodore branded machine, but how does the platform work in RetroPie? Where do the roms reside?
-
@muldjord thanks.
Mobygames has the best data and images for Vic20. We could define a standard (txt, XML) for local data scraping.
-
Actually the idea of a local scraping source is quite interesting. For instance, it could improve/learn over time: by scraping the same game/rom/file from different sources, it would add all this info to the database, then compare the results for each data field and figure out which one is the best for that particular game. So if one source has a video for a game, it would fetch the video from that resource, but that source might be missing a description. Some other entry would have that information, so it would stitch it together for an almost perfect result.
I love this idea. It'll take a while to implement, but it's definitely interesting. I think I will make it xml based and keep the cover, screenshot, marquee and videos as filesystem links.
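To illustrate the stitching idea above, here is a minimal Python sketch. It is not Skyscraper's actual C++ code: the field names, the `stitch` helper and the sample data are all made up for the example, and "best" is simplified to "first source in the list that has a non-empty value".

```python
from typing import Dict, List, Optional

# Hypothetical field names; the post only mentions description, video,
# cover, screenshot and marquee as examples.
FIELDS = ["description", "cover", "screenshot", "marquee", "video"]


def stitch(results: List[Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Combine per-source results, taking each field from the first source that has it."""
    combined: Dict[str, str] = {}
    for field in FIELDS:
        for result in results:
            value = result.get(field)
            if value:  # skip missing or empty values
                combined[field] = value
                break
    return combined


# Two made-up sources: one has a video but no description, the other the reverse.
source_a = {"video": "videos/game.mp4", "description": None}
source_b = {"description": "A platform game.", "cover": "covers/game.png"}

merged = stitch([source_a, source_b])
```

Here `merged` ends up with the description and cover from the second source and the video from the first. A real implementation would need a smarter rule than list order to decide which source wins.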
-
Does it work with psp pbps?
As far as I know, none of the scrapers currently works (well) with this method yet.
-
@omnija It supports the psp platform. I'm not sure what pbps is. Can you elaborate?
-
@muldjord "PBP" format is a compressed iso which can be used for PSP and PSX.
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
it could improve/learn over time, by scraping the same game/rom/file from different sources, it would add all this info to the database
Concerning local scraping, you could adopt the xml file standard (basically an extended set of the tags used by ES) that @meleu and @Used2BeRX are already working on in this thread. This would also allow scraping the huge info database and images that @Used2BeRX has created (not available anywhere online yet).
-
@omnija said in Versatile C++ game scraper: Skyscraper:
@muldjord "PBP" format is a compressed iso which can be used for PSP and PSX.
Yes, it supports pbp :) As long as the filenames make sense, you should get good results.
-
@udb23 said in Versatile C++ game scraper: Skyscraper:
@muldjord said in Versatile C++ game scraper: Skyscraper:
it could improve/learn over time, by scraping the same game/rom/file from different sources, it would add all this info to the database
Concerning local scraping, you could adopt the xml file standard (basically an extended set of the tags used by ES) that @meleu and @Used2BeRX are already working on in this thread. This would also allow scraping the huge info database and images that @Used2BeRX has created (not available anywhere online yet).
Looked into it and it seems they are going in a bit of a different direction than what I want to do. I am not going to create a database with game entries. I am going to create a database with single resources. This allows me to expand it at any given time, with any piece of information I see fit. So basically you have a primary key, which will be the rom sha1. Then you have a bunch of resources connected to that sha1. But they won't be bound in a <game> node. Instead each one will just be a '<resource type="something" sha1="rom checksum" timestamp="something">This is an example</resource>'. The resource type can then be anything. For instance "description" or "players". When reading from the database I will then compute a checksum of the source rom file, search through the database, and connect any resource that matches the sha1. Then I assemble them to give me as much game detail as the database contains.
If later, you scrape using a web source that contains information that the local db doesn't have, it will then simply create a resource for it.
This approach is much more flexible than having a <game> node where all nodes are predefined inside.
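A rough Python sketch of the lookup described above, for illustration only. Skyscraper itself is written in C++ and its real file layout may differ: the `localdb` root element, the function names and the sample data are assumptions. Only the <resource> node with its type, sha1 and timestamp attributes comes from the post.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict


def rom_sha1(data: bytes) -> str:
    """Return the SHA-1 checksum of the rom data as a hex string."""
    return hashlib.sha1(data).hexdigest()


def assemble(db_xml: str, sha1: str) -> Dict[str, str]:
    """Collect every resource whose sha1 attribute matches the rom checksum."""
    root = ET.fromstring(db_xml)
    details: Dict[str, str] = {}
    for res in root.iter("resource"):
        if res.get("sha1") == sha1:
            details[res.get("type")] = res.text or ""
    return details


# Made-up rom contents and a tiny in-memory database.
rom_data = b"example rom contents"
checksum = rom_sha1(rom_data)
other_checksum = "0" * 40  # belongs to some other rom

db_xml = f"""<localdb>
  <resource type="description" sha1="{checksum}" timestamp="1">This is an example</resource>
  <resource type="players" sha1="{checksum}" timestamp="1">2</resource>
  <resource type="players" sha1="{other_checksum}" timestamp="1">4</resource>
</localdb>"""

game = assemble(db_xml, checksum)
```

Only the two resources carrying the rom's own checksum end up in `game`. Adding a new kind of detail later just means writing another <resource> node with a new type, with no schema change.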
So this is what I am planning to do. I will probably also create a stitcher, which allows any user's Skyscraper local db xml file to be stitched together with some other person's file. It will then correlate the two and fill in missing data from the other file. So users can basically just share the xml files with each other. So if you have a great 'snes' xml local db, you can just send it to your friend, and he'll be able to stitch it into his own local Skyscraper db. And then he can scrape from it using a 'local' scraper module.
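The stitcher could look something like the following Python sketch. This is a guess at the behaviour, not the actual tool: the `localdb` root element and the `stitch_dbs` name are hypothetical, and the rule used here (a resource counts as "missing" when no resource with the same sha1 and type exists yet, and existing entries are never overwritten) is an assumption.

```python
import xml.etree.ElementTree as ET


def stitch_dbs(own_xml: str, other_xml: str) -> ET.Element:
    """Add resources from the other db that the own db does not have yet."""
    own = ET.fromstring(own_xml)
    other = ET.fromstring(other_xml)
    # A resource is identified by the rom checksum plus its type.
    known = {(r.get("sha1"), r.get("type")) for r in own.iter("resource")}
    for res in other.iter("resource"):
        key = (res.get("sha1"), res.get("type"))
        if key not in known:
            own.append(res)
            known.add(key)
    return own


# Made-up databases: the friend's file has a 'players' resource that is missing locally.
mine = (
    '<localdb>'
    '<resource type="description" sha1="aaa" timestamp="1">Mine</resource>'
    '</localdb>'
)
theirs = (
    '<localdb>'
    '<resource type="description" sha1="aaa" timestamp="2">Theirs</resource>'
    '<resource type="players" sha1="aaa" timestamp="2">2</resource>'
    '</localdb>'
)

merged_db = stitch_dbs(mine, theirs)
summary = sorted((r.get("type"), r.text) for r in merged_db.iter("resource"))
```

In this sketch the local description is kept and only the missing 'players' resource is imported. The timestamp attribute could instead be used to prefer the newer entry on a conflict, but the post does not say how conflicts should be resolved.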
Ok, so that was a bit of an essay. I like to keep things simple. And I feel like this approach makes more sense for what I want to achieve with Skyscraper. An expandable database, only containing the details that you actually /have/. This will also save space, not having a bunch of empty xml subnodes in the db.
Thanks for sending me their way. I might do a collaboration with them at some point. But for now, I wanna try out this method instead.
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
So this is what I am planning to do. I will probably also create a stitcher, which allows any user's Skyscraper local db xml file to be stitched together with some other person's file. It will then correlate the two and fill in missing data from the other file. So users can basically just share the xml files with each other. So if you have a great 'snes' xml local db, you can just send it to your friend, and he'll be able to stitch it into his own local Skyscraper db. And then he can scrape from it using a 'local' scraper module.
Sweet; I'd really appreciate that functionality.
I mentioned that thread mainly because it could be a potential source of info and images for your scraper, in the sense you just mentioned above.
-
I've implemented a local db tonight. And it's looking very promising so far! Basically it uses the local db to fill in the blanks. You enable the local db with '--localdb' and can provide an optional filename. If no filename is provided it will use something like 'localdb.xml'. Every time you use any given local db, it will update it with new resources it comes across for any given rom. So the more sources you've scraped, the more data you'll have in the final result. You can then send these databases to your friends, and they can use them and get the same great results, but without putting load on any http servers!
I'm really excited about the potential of this. :)
I do need to figure out how to handle images and videos though. I can't have them inside the xml files, so they need to just be paths to the files. But that means you can't just send your xml file to your friends because the data would be missing. So maybe I'll create a "prepare" cli option, that prepares a folder with the data you have in your local db. Then you can send that folder to your friends. Or something like that. I'll figure it out. :)
-
@udb23 said in Versatile C++ game scraper: Skyscraper:
@muldjord said in Versatile C++ game scraper: Skyscraper:
it could improve/learn over time, by scraping the same game/rom/file from different sources, it would add all this info to the database
Concerning local scraping, you could adopt the xml file standard (basically an extended set of the tags used by ES) that @meleu and @Used2BeRX are already working on in this thread. This would also allow scraping the huge info database and images that @Used2BeRX has created (not available anywhere online yet).
Yeah... I'm a pretty long way from having this stuff online. Currently I've been upgrading just the NES artwork. I've basically put in back to back 100+ hour weeks on it.
Most of the US Licensed artwork is 2100 pixels and the carts are nearly 700 pixels. Japan is a little spotty, but I touched up hundreds of images by hand. There are also boxes/carts for hundreds of translations and rom hacks as well as prototypes, homebrew and pirate games.
When this is done, it will be the best artwork database for the NES online.
I'd like to do all of the major console systems one day. I'm considering crowd funding so I can continue with this work, otherwise I need to get a job pretty soon and this work may never get done.
I'm also considering redoing ALL of the videos for each and every game personally. My set is from 8 or so years ago and is only 480. I'm not really happy with the collections online. No offense to anybody who's put them together, but that's what happens when you have a lot of different people doing things. There is no standard. I'd love the opportunity to work for the community and make those 10,000-12,000 videos myself.
I'm so OCD it hurts sometimes. :)
-
Ok, so I am making some pretty cool progress with the local database idea I had. Check it out:
-
@muldjord Really cool functionality!
Could you include an option to import data and images to the localdb from a local file(s) ?
Alternatively what is the structure of the localdb? Can it be edited manually?
What are the options concerning the existing ES gameslist? I mean things like overwrite completely, fill just missing metadata, only write data for games with no metadata, only write images etc.
Sorry for so many questions ;-)
Thanks for your hard work.
Note: any clue why after watching your video I get a video with a girl's first flight in a cessna?!! :-D
Maybe it's some tag in your video that makes YouTube suggest this?!
-
@muldjord That's pretty sweet. I could have used somebody like you when I was making nearly 10,000 game synopsis files by hand. Lots of copy and paste....
Too bad I don't know how to put my stuff up where a scraper like yours could grab it for everybody. When I'm done you could be sure that it's just about as perfect as possible and there would be a Box, Cart, Title and Action shot for every single game... even prototypes and translations and hacks and other more obscure roms.
I'm working on a spreadsheet that details what is done on the NES. I'm almost ready to put it up.
-
@udb23 said in Versatile C++ game scraper: Skyscraper:
@muldjord Really cool functionality!
Could you include an option to import data and images to the localdb from a local file(s) ?
Yes, I will create a dbtool for this down the line. The local files will have to be Skyscraper formatted, so basically I will make it so you can share your own Skyscraper db with a friend, and you can import his data into your own, making for a more complete set of data. Then you can scrape locally from that afterwards, leaving you with an awesome result! :)
Alternatively what is the structure of the localdb? Can it be edited manually?
No, it can't be edited manually (well it can, but it's really abstract, so it's not really feasible). It is basically just a huge list of <resource> nodes bound to the sha1 checksum of the rom data. It is not meant to be edited manually. But really it shouldn't be necessary. The local database can be seen as an abstract bunch of resources that Skyscraper is an expert in stitching together into the format of choice. For now I support EmulationStation. But I plan to also support AttractMode and Hyperspin at some point. The resulting gamelist.xml file can of course be edited manually.
What are the options concerning the existing ES gameslist? I mean things like overwrite completely, fill just missing metadata, only write data for games with no metadata, only write images etc.
Currently you have the option to either scrape from scratch, or to only scrape entries that don't already exist.
Sorry for so many questions ;-)
No problem, I need the feedback :)
Thanks for your hard work.
You are most welcome, I love working on this, about 4 hours a day for the past month or two. :D
Note: any clue why after watching your video I get a video with a girl's first flight in a cessna?!! :-D
What!? Haha, that's weird.
Maybe it's some tag in your video that makes YouTube suggest this?!
-
@used2berx said in Versatile C++ game scraper: Skyscraper:
@muldjord That's pretty sweet. I could have used somebody like you when I was making nearly 10,000 game synopsis files by hand. Lots of copy and paste....
Holy shit! Just thinking about that makes me dizzy :D Great work though.
Too bad I don't know how to put my stuff up where a scraper like yours could grab it for everybody. When I'm done you could be sure that it's just about as perfect as possible and there would be a Box, Cart, Title and Action shot for every single game... even prototypes and translations and hacks and other more obscure roms.
If it's well formatted, which it certainly sounds like it is, I could easily write a scraper module that scrapes from a local resource such as yours.
I'm working on a spreadsheet that details what is done on the NES. I'm almost ready to put it up.
thumbsup
-
Ok, so I've been expanding the scraper quite a bit. I've decided to implement a 'xmlscraper' module which basically takes a well-formed xml document and uses that as a source for scraping the platform you've chosen.
And it very much sounds like you have this stuff down, so I'd like you to define the XML format for me. So can you send me a file consisting of a single entry? With everything you feel is relevant. I will then enable my scraper to scrape from such a source.
I think this would be really, REALLY cool to have in Skyscraper.
What do you say?
-
@muldjord Sure. I can't do it at the moment, but that's not a problem when I get back to the pi.
You want just a single game entry, or an entire gamelist.xml file?
BTW.... my xml files do not have a description at the moment. meleu made it so that you could do it with or without the description, and since I'm running everything on a pi zero right now the <desc> fields are blank. The pi was running into memory issues and getting very buggy when the descriptions were put in.
I could re-run one with the --full command, but that would take longer to get to you.
Just let me know.