Versatile C++ game scraper: Skyscraper
-
@AnalogHero A GUI is something I'd really like to have. I am an experienced GUI programmer as well, so I certainly have the skills for it. I just need to get the command line version to a level where I feel I can leave it for a while. Then I might look into a GUI. No promises, but it's certainly something I'd like to do.
-
@muldjord Maybe I found a bug. I can't scrape mastersystem; it says
no entries to scrape
although I have games in there for sure. Paths seem to be correct. Also, it looks like gamesdatabase has blocked me. Does anyone know if they will ever unblock me again?
-
@AnalogHero I got unblocked from gamesdatabase again, but never used it since. I took the hint :)
If it says no entries to scrape, you probably chose to skip existing? Or maybe your rom suffixes aren't supported. Where are your files located and what suffix do they have? 'mastersystem' only supports '.sms' according to the official RetroPie wiki, so that's what Skyscraper looks for.
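To illustrate the suffix point, here is a minimal sketch (not Skyscraper's actual code) of how a scraper that only accepts certain suffixes can end up with nothing to scrape. The `SUPPORTED` table and the `scrapable` function are hypothetical names made up for this example; only the '.sms' entry is taken from the post.

```python
from pathlib import PurePath

# Hypothetical platform-to-suffix table; only '.sms' comes from the post above.
SUPPORTED = {"mastersystem": {".sms"}}

def scrapable(platform, filenames):
    """Return the files whose suffix is accepted for the given platform."""
    allowed = SUPPORTED.get(platform, set())
    return [f for f in filenames if PurePath(f).suffix.lower() in allowed]

# A folder that only contains zipped roms yields an empty list.
files = ["Alex Kidd in Miracle World.zip", "Sonic the Hedgehog.zip"]
matches = scrapable("mastersystem", files)
if not matches:
    print("no entries to scrape")
```

With a table like this, the files are found on disk but filtered out before any scraping starts, which would explain the message even when the paths are correct.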
If you're using zipped rom files I suggest unzipping them, even if the emulator supports zipped files. There are a couple of disadvantages to zipped roms. The major one is that many pieces of software use the sha1 checksum of the rom data to recognize the rom. If you use a zipped rom, this feature is completely negated since a zipped rom can vary in size and content. It is always best to use the raw rom files in my opinion. But it is just that, an opinion. :) I merely wanted to point out why I feel zipped roms don't make much sense from a frontend and scraping perspective.
-
@muldjord Yes, they are zipped. Sure, I can extract them. I scraped my other systems, which are also zipped, without any issues.
-
@AnalogHero
Yes, I allow zipped files for the ones where the official RetroPie documentation says that they are supported. But my points stand, in my opinion it is not a good idea to ever use zipped roms unless they are dos games or similar. If they are simply one or more .nes, .sms, .gg or other rom files zipped up, it just brings a lot of disadvantages. You even lose space in some cases, because zipped roms often contain several versions of the rom inside, where a bunch of those are broken or not for the region you want. So if people instead unzip just the one they need, they would even save a lot of space on the Pi. So yeah, I have no idea why people use them zipped. I'm guessing it's because that's how they downloaded them and they don't know about the disadvantages. :)
-
@muldjord I just unzipped them. It doesn't make a lot of difference in file size, even if you have full collections, since I use a cleaned romset with one rom per game.
-
@analoghero Awesome!
-
@incunabula said in Versatile C++ game scraper: Skyscraper:
Is there a way you can add local folders as a scraping option? For instance if i already had a folder of boxart and it has some images that the scraper modules are not returning could it be possible to "scrape" images from a default path like %roms%/boxart ?
I found a minute to check in on things, so I can't really talk much, but I did want to reply to this question.
For the NES/FDS I have a great deal of folders separating roms between Licensed, Unlicensed, Translations, Pirates, Prototypes, Hacks, etc...
I spent a few days making great images for all of these folders. I have some examples in the other thread, but you'd have to go back a bit to see them since that was a few months back now. I think some of the coolest stuff is when there are multiple nationalities for a category and I put a backdrop of that country's flag behind the image of the particular system so when you're scrolling through the list of folders the console image remains the same but the flag behind it changes. :)
There are synopsis entries for each of these folders as well, so when I run meleu's script it will pick up these images and display them. They've already been tested and there are no errors for any of the folders or games.
Unfortunately, this is just for the NES and FDS at the moment. It might be YEARS before I get around to finishing this for all the major systems since I'm going to have to find a job soon. :(
Anyway.... gotta run. Just wanted to give you that update.
Things are looking to be pretty busy for me until maybe the end of September. I will still be working on this stuff in my down time, but I won't be online much at all until things calm down.
Later
P.S. If you had any questions about how these images are stored, made and/or how meleu's script works with the synopsis with them just leave me a reply and I'll respond when I get back and see it.
-
@muldjord said in Versatile C++ game scraper: Skyscraper:
Yes, I allow zipped files for the ones where the official RetroPie documentation says that they are supported. But my points stand, in my opinion it is not a good idea to ever use zipped roms unless they are dos games or similar. If they are simply one or more .nes, .sms, .gg or other rom files zipped up, it just brings a lot of disadvantages. You even lose space in some cases, because zipped roms often contain several versions of the rom inside, where a bunch of those are broken or not for the region you want. So if people instead unzip just the one they need, they would even save a lot of space on the Pi.
What disadvantages do you see when you have a particular rom you intended to be in the zip, and only that rom?
I've been zipping roms for about 10 years since I started work on the XBox and I've never noticed any negative effects. I also torrentzip them, which has its own benefits you can't get without zipping.
The only exception to this rule I've seen so far is on the RetroPie for the Atari5200/800 and the Odyssey 2 systems since they both currently use emulators that don't support zip files.
The size savings might not be all that much when you're talking about NES and Atari games, but when you start working with SNES and GBA and N64 games you can actually save a great deal of space by zipping your roms. (Especially when you have every licensed game that's in English, as well as oftentimes double or triple that library with all the other categories of roms.)
Just curious what disadvantages you see this bringing other than the one you mentioned when you have multiple versions of a game in the same zip file.
-
@Used2BeRX I mentioned the checksum problem and multiple roms in a zip problem. If you zip a rom, depending on what zip encoding you use, the rom will differ in size. That makes it very hard to do meaningful sha1 checksumming for use when scraping or in general identifying which rom you are looking at. Especially when caching data locally for scraping.
For instance, many roms have a [!] version which is considered a "good dump". This means the rom is pretty much perfect. That one version is THE version of that game for that region. That's just ONE file. Now, someone decides to zip it. They zip it using X software. It suddenly becomes a different file when you look at the contents. They create a nice rom pack with this zip in it. Someone else downloads it. Some other dude does the same thing, but with a different zip encoding. This file also gets shared. Now you have THREE versions of the same rom file. You can probably see where I'm going with this.
For identifying roms, this is a bit of a problem because I can't cache data for X rom and assume the same cached data will be used for the same game when I give my locally cached data to someone else. Because he might have zipped his file! So when he scrapes the exact same rom, because his was zipped one way or another, it won't be recognized.
People can do what they want, I don't personally care. It just messes up a bunch of stuff for scrapers who want to not hit the web sources so hard and try to cache things to prevent that.
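A small sketch of the checksum problem, under the assumption that "zip encoding" simply means the compression method. The `zip_bytes` and `sha1` helpers and the stand-in rom data are made up for the example. The same rom bytes are zipped two ways, and each archive ends up with its own sha1, different from the sha1 of the raw rom.

```python
import hashlib
import io
import zipfile

def zip_bytes(name, data, method):
    """Zip one file in memory with the given compression method."""
    buf = io.BytesIO()
    # Fixed timestamp, so the compression method is the only difference.
    info = zipfile.ZipInfo(name, date_time=(2018, 1, 1, 0, 0, 0))
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, data, compress_type=method)
    return buf.getvalue()

def sha1(data):
    return hashlib.sha1(data).hexdigest()

rom = bytes(range(256)) * 64  # stand-in for the raw rom data
zip_a = zip_bytes("game.nes", rom, zipfile.ZIP_STORED)
zip_b = zip_bytes("game.nes", rom, zipfile.ZIP_DEFLATED)

# Three different checksums for what is really one and the same rom.
print("raw rom :", sha1(rom))
print("zip (a) :", sha1(zip_a))
print("zip (b) :", sha1(zip_b))
```

A cache keyed on the checksum of the file on disk would treat these as three unrelated files.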
-
Skyscraper 1.7.4 released: https://github.com/muldjord/skyscraper
- Added textual import with 'import' scraper using '[homedir]/.skyscraper/import/definitions.dat' file
- Added video import with 'import' scraper
- Improved 'uvlist' description scraping
- Now properly handles empty nodes in EmulationStation gamelist.xml export
Be sure to read the readmes thoroughly. Everything is explained in there. Enjoy!
-
@muldjord I see.
Torrentzip eliminates this problem. Not sure if you've heard of that program before.
EDITED TO ADD:
You can be sure everything I do is torrentzipped, so any datfiles I would make would be for torrentzipped stuff. That way as long as people found the right stuff and used the dats in romcenter or clrmamepro theirs would be identical as well.
-
@muldjord @Used2BeRX Another workaround for this 'problem' would be that the scraper unzips before scraping, generates the checksum and cleans the temp folder afterwards. That would create some CPU overhead for sure.
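A minimal sketch of that workaround, assuming each zip holds exactly one rom. It is not how Skyscraper does it; `rom_sha1` and `make_zip` are hypothetical helpers. The archive is read in memory, so there is no temp folder to clean up, and the checksum is taken over the rom data inside rather than over the zip file itself.

```python
import hashlib
import io
import zipfile

def make_zip(name, data, method):
    """Build a one-file zip in memory (helper for the demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr(name, data)
    return buf.getvalue()

def rom_sha1(file_bytes):
    """sha1 of the rom data, looking inside the file if it is a single-rom zip."""
    buf = io.BytesIO(file_bytes)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            names = [n for n in zf.namelist() if not n.endswith("/")]
            if len(names) != 1:
                raise ValueError("expected exactly one rom inside the zip")
            rom = zf.read(names[0])  # decompressed rom data
    else:
        rom = file_bytes  # already a raw rom
    return hashlib.sha1(rom).hexdigest()

raw = bytes(range(256)) * 64  # stand-in for the raw rom data
stored = make_zip("game.sfc", raw, zipfile.ZIP_STORED)
deflated = make_zip("game.sfc", raw, zipfile.ZIP_DEFLATED)

# All three give the same checksum, however the rom was packed.
print(rom_sha1(raw) == rom_sha1(stored) == rom_sha1(deflated))
```

The cost is the extra decompression for every file, which is the CPU overhead mentioned above.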
Fun fact: I scraped my zipped snes folder with about 700 games in it with various scrapers. End result: about 15 skipped games.
Did the same thing again but this time unzipped them. Result: almost the same as above.
Can you explain how the rom names and the checksum are compared to the web source? From what I see only the name has to match (adjustable with the -m flag).
-
@AnalogHero Only 'screenscraper' uses the sha1 checksum for identifying the game. The other web sources use the file name. Read the readme about the local database cache for more information on why sha1 is important for Skyscraper beyond that. It's all explained in great detail in there. :)
-
@muldjord A little update....
The zip function is going to be even more valuable for my set. For the hacks and translations, the file inside will have quite a bit of information about the rom that the zipfile itself will not have.
So far, it would look something like this inside the zip:
[game name] (Eng-Trans, [hacker-name], [patch version], [patch release date]).nes
I'm also considering whether or not to add the original CRC and finished CRC to this file name as well. That's a ton of great information for people if somebody wanted to upgrade a particular translation down the road, so they could compare this info and see if there was a more current release. Also, this should overcome any problems there are with long file names. For example, XBox uses FATX which is limited to only 42 characters including the extension, but I tested a few of these and they work fine. They will all be tested on the Pi at some point as well.
I'm considering having the name of the rom file inside the zip for official releases be the no-intro file name as well... we'll see how much time I have. I will not be involved in supplying any roms to anybody, but my intention is to make it as easy as possible to use the work I've done for the end users.
-
Yeah, so video howtos ain't gonna happen anytime soon. Just spent 4 hours in video recording hell and I am not going back. It is clearly not where my talent lies. Deleted everything, too inconsistent and... frankly, crap. I'm just about ready to throw my computer out the window. Not going to, but man, the guys on Youtube who know how to do this? Mad respect from me. It is friggin' HARD! So many details go wrong all the time. Stumbling over words, forgetting commands, technical problems, having to reset all the time after each "take"...
Just trying to figure out decent examples of what I want to convey in the videos is really, really difficult.
If anyone wants to help out on this front, let me know.
-
@Used2BeRX Basically Skyscraper is "feature complete" as of 1.7.4. I have implemented the "import" scraper which allows anyone to import their own data (artwork and textual) and define the format in the '[homedir]/.skyscraper/import/definitions.dat' file. I reckon an importer for your data can be made from this so feel free to do so.
Aside from reported bug fixes, I'm gonna take a break from the project now. The requests are getting very specific, which is fine, but a lot of it is beyond the scope of what I want for Skyscraper. The importer was made in a way so that it can fit basically any custom format of information, as long as it is contained in single text files and artwork files named after the roms you wish to scrape.
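As a rough illustration of the "named after the roms" rule, here is a sketch that pairs import files with roms by their base name. It does not model the definitions.dat format or Skyscraper's real folder layout; `match_imports` and the file names are made up for the example.

```python
from pathlib import PurePath

def match_imports(roms, import_files):
    """Group import files under the rom that shares their base name."""
    by_stem = {}
    for f in import_files:
        by_stem.setdefault(PurePath(f).stem, []).append(f)
    return {rom: by_stem.get(PurePath(rom).stem, []) for rom in roms}

roms = ["Contra (USA).nes", "Metroid (USA).nes"]
imports = ["Contra (USA).txt", "Contra (USA).png", "Readme.txt"]

matched = match_imports(roms, imports)
for rom, found in matched.items():
    print(rom, "->", found or "nothing to import")
```

Anything whose base name does not match a rom exactly (here `Readme.txt`) is simply ignored, so the naming has to be consistent.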
Everything is detailed in the github readmes, so feel free to check those out.
That is all for now. :)
-
@muldjord Could you improve the way neogeo games are handled? Results are pretty bad, since the naming of neogeo roms is like mame roms. For example you can't scrape mslug.zip or sonicwi3.zip. They don't match with Metal Slug or Aero Fighters 3 / Sonic Wings 3. And in this case you can't unzip them. It would be a mess.
EDIT: I just read that you're taking a break, so nvm!
-
@AnalogHero Actually that is the one thing I would like to work on. I even created the mamemap.csv file for this purpose some time back. All I need to do is to look up the name in that file before scraping and use that instead of the actual filename. I'll think about it over the next couple of days and try to work it in. If it works well, I'll release it with 1.7.5 sometime soon.
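The lookup described here could be sketched roughly like this. The two-column, semicolon-separated layout in `SAMPLE` is an assumption made for the example and may not match the real mamemap.csv; `load_map` and `search_name` are hypothetical names.

```python
import csv
import io
from pathlib import PurePath

# Assumed layout: short rom name;real title. The real file may differ.
SAMPLE = "mslug;Metal Slug\nsonicwi3;Sonic Wings 3\n"

def load_map(text):
    """Parse 'short name;real title' rows into a dict."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return {row[0]: row[1] for row in reader if len(row) >= 2}

def search_name(filename, name_map):
    """Use the mapped title if there is one, else fall back to the file name."""
    stem = PurePath(filename).stem
    return name_map.get(stem, stem)

name_map = load_map(SAMPLE)
print(search_name("mslug.zip", name_map))     # Metal Slug
print(search_name("sonicwi3.zip", name_map))  # Sonic Wings 3
print(search_name("unknown.zip", name_map))   # unknown
```

Falling back to the plain file name keeps the behaviour unchanged for roms that are not in the map.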
-
Skyscraper 1.8.0 released: https://github.com/muldjord/skyscraper
- Added 'arcadedb' scraper module with video support
- Vastly improved scraping of 'neogeo' and 'arcade' platforms in general by mapping the filenames to real names from mameMap.csv
- Improved 'neogeo' and 'arcade' search platform matching
Apparently my idea of taking a break from a project is to keep working on it... :D Anyways, 1.8.0 is here! And the big news this time around is vastly improved scraping of 'neogeo' and 'arcade' and also a new scraping module using the data from http://adb.arcadeitalia.net/ . This module also supports video!
Enjoy!