Versatile C++ game scraper: Skyscraper
-
@chipsnblip ok I'll stop spamming stuff on this thread because I know that he works hard on this stuff. And @muldjord sorry for being so annoying.
-
@muldjord First of all, love what you’re doing with Skyscraper. It’s a cracking scraper – fast, efficient, and the Simple mode works very well. Keep up the good work!
I have a question which I’ve not been able to find the answer to after much searching and experimenting:
The gamelist.xml file produced by Skyscraper is populating the <image> node with screenshots, whereas my theme (Art book) requires the boxart / cover to populate the <image> node (it ignores the <cover> node entirely). I’m keen to keep all the image types in the db, so don’t want to simply copy / replace the covers onto the screenshots, and try as I may I cannot get the artwork.xml to produce the required <image> node. The other scrapers I use put the boxart / cover in the <image> node, but I’m keen to use Skyscraper as my default from now on.
I’m assuming that the nodes and their scraped content are hard-coded by Skyscraper, but is there any way I can edit the config / src files to get it to populate the <image> node with the cover art? I tried editing the code in emulationstation.cpp to do just that but it doesn’t seem to have worked. Have I got a config setting wrong somewhere?
My workaround for now is a series of find / replace tasks in a text editor, but I’d love to be able to get this excellent scraper to output the perfect xml file for my theme.
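For illustration only, the find / replace workaround mentioned above could be scripted roughly as follows. This is a minimal sketch, not part of Skyscraper: it assumes each entry in gamelist.xml is a <game> element holding <image> and <cover> child nodes (the node names come from this post; the function name and the sample root element are hypothetical), and it only rewrites the text of the XML, leaving the image files themselves untouched.

```python
import xml.etree.ElementTree as ET


def use_cover_as_image(gamelist_xml: str) -> str:
    """Copy each game's <cover> path into its <image> node.

    Assumes a layout like <gameList><game><image/><cover/></game></gameList>;
    adjust the node names if your gamelist.xml differs.
    """
    root = ET.fromstring(gamelist_xml)
    for game in root.iter("game"):
        cover = game.find("cover")
        # Skip games that have no cover path to copy from.
        if cover is None or not (cover.text or "").strip():
            continue
        image = game.find("image")
        if image is None:
            # Create the <image> node if the entry lacks one.
            image = ET.SubElement(game, "image")
        image.text = cover.text
    return ET.tostring(root, encoding="unicode")
```

Because only the <image> text is changed, the screenshots stay on disk and in the db, so the original gamelist.xml can be regenerated at any time.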
-
@aphyx Check the artwork.xml documentation. You can add an <artwork type="screenshot"> node that has the <layer type="cover"> inside. It will then export the cover as the screenshot (which populates the <image> node as you request).
Concerning the <cover> node, I can't remember whether I currently export it. I think I do, but I'm on vacation, so I'll have to check when I get back home in a week's time. :)
-
So, I'm currently working on the option of providing filename(s) on the command line. My current approach is this: if one or more filenames are provided on the command line, it will scrape those using the platform provided with '-p'. It will NOT alter the game list you have, and it will NOT process artwork files. It WILL cache or update the local data and media for the games provided, meaning that a later scrape with '-s localdb' will make use of the data.
The usefulness, as I see it, is that it would allow you to experiment with changing the filename to get a better result from the scraping. And when you've found one that works, you can scrape everything again.
Would this be useful to you guys? Comments? Suggestions? It's been requested quite a few times, so I'm very interested in feedback on this.
-
Skyscraper 2.2.6 released: https://github.com/muldjord/skyscraper
- Now always caches resources locally, even if pretend is set
- Optimized 'simple mode' generated script. Now has '--pretend' set for all non-local modules to avoid artwork processing on those runs. This is a lot faster and provides the same result
- Added the possibility to supply one or more filenames on the command line - it will then ONLY scrape those particular files. Platform still has to be set with '-p' for this to work
- Fixed bug where [tags] would be appended twice when using '--forcefilename'
The much requested feature of providing filenames on command line is implemented in this release. When used, it will scrape those files exclusively and cache the resulting data in the local database cache. To make use of the data afterwards you need to rescrape the entire platform using '-s localdb'. Please give it a go and let me know what you think.
Happy scraping! :)
-
@muldjord whatever I do, I just can't get the textual data to scrape. I've created the definitions.dat folder and made the ROM base.txt file. What am I doing wrong? These are the files that I have made:
Description: ###DESCRIPTION###
Developer : ###DEVELOPER###
Publisher : ###PUBLISHER###
Rating : ###RATING###
Genre : ###TAGS###
Description: Being a real-estate magnate used to be hard work. Thanks to technology, the computer does all the hard stuff, like rolling dice, for you. Freed of the anxiety of arguing over leaners, you have plenty of time to strategize your next financial move. Up to four players, or you and three computer opponents, take turns around the board. A player wins when the others go belly up. You can also play a quick game mode or a time limit game. In any of the game modes, you can institute your own house rules. Some of those could include awarding a person for landing on free parking, dealing out some properties at random to begin the game, or changing the number of properties you have to own in a group before you can build houses. The easy-to-use interface makes it easy to manage your finances. When you have multiple players, the games can take a long time to finish, but that's the nature of Monopoly.
Developer : Takara
Publisher : Destination Software Inc
Rating : 3.5
Genre : Strategy
The name of the file is Monopoly (U).txt
-
EDIT: Nevermind, read your post wrong. Looks ok to me, not sure what is wrong. I recommend using an xml based format instead, try that.
-
@muldjord tried that and it is the exact same result (i tried it first)
-
@muldjord what should I do?
-
All I can say is that if you've followed the documentation completely, it WILL work. I'm assuming you might have something wrong with where you've placed the files, what you've named them or what command line you run Skyscraper with. And since I don't know any of these things, I can't tell you what to do.
-
@muldjord I sent pictures, and the command I use is Skyscraper -p gba -s import
-
Visit my Twitter. Take a look at Sammy boy (@samsaju04_boy): https://twitter.com/samsaju04_boy?s=09
-
@muldjord the pictures are on Twitter - the first thing you see
-
Please see next post.
-
Please try rerunning it with 'Skyscraper -p gba -s import --updatedb'. That might be the problem.
-
Skyscraper 2.2.6a released: https://github.com/muldjord/skyscraper
- Now always sets '--updatedb' when using 'import' scraping module
Get this release, then it'll work without '--updatedb'
-
@muldjord Thank you so much!
-
Hi @muldjord, I used your software for scraping the images for my ROMs and I loved it.
It seems to work almost OK, but sometimes when game names are nearly the same, e.g. gameName 1 and gameName 2, it messes up the descriptions for these games. There might be a chance that these descriptions are already messed up in the location it gets them from in the first place, dunno.
When I checked Lakka, which is another retro gaming frontend, it seems that they get images and thumbnails from libretro. Dunno whether it is based only on the name of the game.
Any chance of also using the location Lakka uses for the game images? :)
https://github.com/libretro/libretro-thumbnails/tree/master/
-
@jura It doesn't mess them up, rather it sometimes gets faulty results from the sources, which is a side-effect of the highly automated way Skyscraper works. For instance, if you have a relatively unknown game called "Star Blob" and a game called "Star Fox", it will quite likely get a correct hit for "Star Fox" since it's a well-known game. BUT, it will probably also return the data for "Star Fox" when it searches for "Star Blob" since that's how the source search engines work. So it doesn't mess them up, it just, sometimes, gets faulty results where it seems like the previous game has "spilled over" into the next entry, when in fact it only does so because the names are closely related. In this example by the "Star" in the name.
You can force it to not accept these by changing the minimum match percentage with the '-m' flag on the command line. :) It defaults to 50%, which means that "Star Blob" and "Star Fox" give a positive result.
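To make the effect of a minimum match percentage concrete, here is a small sketch. It is not Skyscraper's actual matching code: it uses Python's difflib.SequenceMatcher as a stand-in similarity measure, and the function names and the 80% value are hypothetical, chosen only to show how raising the threshold rejects a near-miss title.

```python
from difflib import SequenceMatcher


def match_percent(search_name: str, result_name: str) -> float:
    """Return a 0-100 similarity score between two game titles.

    Stand-in metric for illustration; Skyscraper may score matches differently.
    """
    ratio = SequenceMatcher(None, search_name.lower(), result_name.lower()).ratio()
    return ratio * 100.0


def is_accepted(search_name: str, result_name: str, min_match: float = 50.0) -> bool:
    """Accept a search result only if it meets the minimum match percentage."""
    return match_percent(search_name, result_name) >= min_match


# With a 50% threshold the shared "Star " prefix is enough for a hit;
# a stricter threshold (here 80%, an arbitrary example) rejects it.
print(is_accepted("Star Blob", "Star Fox", 50))
print(is_accepted("Star Blob", "Star Fox", 80))
```

With this stand-in metric the two titles score roughly 70%, so the first call accepts the result and the second rejects it.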
I can't use a github repository for scraping, simply because I do not have the permission from github to do so. Github is not meant to be used for game media scraping and I highly doubt they would allow it if they knew people used it for that purpose.
EDIT: On a sidenote, I plan on completely rewriting the way matches are achieved, so matches will improve in a future version. No ETA though.
-
A huge thank you to @chipsnblip for spreading the word about Skyscraper on reddit. I notice these things, and I appreciate it a lot! :)