Versatile C++ game scraper: Skyscraper
-
@bleuge I'm not gonna work on Skyscraper for a while :)
You can easily get the missing games simply by scraping the platform with any scraper and adding --pretend. It will load all of the data from cache, so it doesn't use network bandwidth from the sources. And the ones it doesn't have it will add to the [homedir]/.skyscraper/skipped-[module].txt files. :)
And yes, you have the basic idea of Skyscraper correct. The idea is, as you say, that you can change any of the options for Skyscraper and simply rescrape with localdb. If you do so, you will get 99% the same result as scraping with the web sources all over again, so it saves a tonne of bandwidth for the sources and is a lot faster. In theory, you can redo your entire collection in EmulationStation just by rescraping with localdb. I don't think many other scrapers give you this option, if any.
The reason you have to provide a path for localdb is that they are platform specific. Having one database for all platforms would make it extremely slow, since I have to do lookups in ALL resources for each and every rom. It would make Skyscraper almost unusable.
Nothing keeps you from setting a "wrong" platform with --dbstats, though. If you also use '-d', it will display the stats correctly. :) I can see how that's confusing, but just think of it in a broader perspective. A localdb cache is not really platform specific. It is rom collection specific. But since everything is tied to platforms when you talk emulation, so is Skyscraper.
EDIT: MobyGames is high on my priority list. Just keep in mind that they have very strict connection limits, which basically means it is only usable to scrape about 10 roms an hour. But you could use it to fill in the gaps I think. So if you have one game that just doesn't scrape with the others, then you could scrape it on command line with MobyGames and maybe get a hit.
But yeah, for now, I'm taking a break. :)
-
@analoghero Yes, that is exactly correct. Skyscraper is 100% capable of scraping offline if all games are in the local cache. :)
-
@muldjord hehe OK, understood, you need a very well deserved rest!
Thanks for all the good work and enjoy!
P.S. Yes, Skyscraper is unique in featuring a localdb. That was a genius idea.
-
@muldjord I know you're taking a break, and I don't want to disturb you, so consider this something to look at whenever you feel like it. When scraping Amiga, the names of the files (minus extension) are displayed in ES instead of the game name. Example: Agony_v1.3_0960 should be Agony.
I don't know if other systems are affected too.
-
@analoghero Either you're using '--forcefilename' or it hasn't been found. Those are the only two reasons for a game to show up with the filename. :) Otherwise, if it was found as "Agony" it will be shown as "Agony".
-
@muldjord Strange issue. I never used --forcefilename.
#4/63 (T2) ---- Game 'Agony_v1.3_0960' found! :) ----
Scraper: localdb
Search match: 100 %
Compare title: 'Agony'
Result title: 'Agony_v1.3_0960' (import)
Platform: 'Amiga' (thegamesdb)
Release Date: '1992-01-01' (openretro)
Developer: 'Art and Magic' (openretro)
Publisher: 'Psygnosis' (openretro)
Players: '1' (openretro)
Tags: 'Animalprotagonist, Autoscroll, Horizontal, Powerup, Shootemup, Sideways' (openretro)
Rating (0-1): '0.7' (thegamesdb)
Cover: YES (openretro)
Screenshot: YES (openretro)
Wheel: NO ()
Marquee: NO ()
Video: YES (import)
Description: (thegamesdb)
This horizontally-scrolling shoot 'em up features six long levels, all with detailed and mellow background graphics, aiming for a less hectic feel than contemporaries such as Project X. As a magician's apprentice, you have been turned into an owl to give you the best chance of destroying the many dark creatures to be faced, and thus discovering the secret of cosmic strength. These dark creatures include piranhas, giant ants and mosquitoes. Extra weapons and invincibility periods can be collected. The technical details include 3 layers of multi-directional parallax scrolling, background animation, and different title and in-game music.
Elapsed time: 00:00:04
Estimated time: 00:01:10
Here's an example from my gamelist.xml:
<?xml version="1.0"?>
<gameList>
  <game>
    <path>/home/pi/RetroPie/roms/amiga/Agony_v1.3_0960.lha</path>
    <name>Agony_v1.3_0960</name>
    <cover />
    <image>/home/pi/RetroPie/roms/amiga/media/screenshots/Agony_v1.3_0960.png</image>
    <marquee />
    <video>/home/pi/RetroPie/roms/amiga/media/videos/Agony_v1.3_0960.mp4</video>
    <rating>0.7</rating>
    <desc>This horizontally-scrolling shoot 'em up features six long levels, all with detailed and mellow background graphics, aiming for a less hectic feel than contemporaries such as Project X. As a magician's apprentice, you have been turned into an owl to give you the best chance of destroying the many dark creatures to be faced, and thus discovering the secret of cosmic strength. These dark creatures include piranhas, giant ants and mosquitoes. Extra weapons and invincibility periods can be collected. The technical details include 3 layers of multi-directional parallax scrolling, background animation, and different title and in-game music.</desc>
    <releasedate>19920101</releasedate>
    <developer>Art and Magic</developer>
    <publisher>Psygnosis</publisher>
    <genre>Animalprotagonist, Autoscroll, Horizontal, Powerup, Shootemup, Sideways</genre>
    <players>1</players>
  </game>
-
@analoghero Ah, that's because the title is prioritized from the "import" module. Just remove the "<source>import</source>" line from the "<order type="title">...</order>" in your priorities.xml under "[homedir]/.skyscraper/dbs/[platform]/priorities.xml" and rescrape with localdb.
:)
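For anyone following along, the effect of such a per-resource source order can be sketched in a few lines of Python. This is only an illustration of the idea, not Skyscraper's actual code: the function name `pick_by_priority` and the fallback rule (the first cached value wins when no listed source has data) are assumptions made for the example. The titles and source names are taken from the Agony output above.

```python
def pick_by_priority(candidates, order):
    """Pick one value for a resource (e.g. the title) from several sources.

    candidates: dict mapping source name -> cached value
    order:      list of source names, highest priority first
    Returns a (value, source) tuple, or (None, None) if nothing is cached.
    """
    # First try the sources in the configured order.
    for source in order:
        value = candidates.get(source)
        if value:
            return value, source
    # ASSUMPTION for this sketch: with no usable entry in the order,
    # fall back to the first cached value in insertion order.
    for source, value in candidates.items():
        if value:
            return value, source
    return None, None


if __name__ == "__main__":
    titles = {"import": "Agony_v1.3_0960", "thegamesdb": "Agony"}
    print(pick_by_priority(titles, []))              # no order configured
    print(pick_by_priority(titles, ["thegamesdb"]))  # thegamesdb listed first
```

With an empty order the imported filename wins in this sketch, and listing thegamesdb for the title makes "Agony" win instead.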
-
@muldjord Yes, you're right! Although there was no <source>import</source> tag under the order type title in the priorities.xml, I added <source>thegamesdb</source>. Now it uses the correct title.
Thank you for your help.
-
@analoghero Glad I could help. :)
-
@muldjord I'm having an issue with a few of my roms scraping incorrectly in simple mode.
They are showing incorrect names in EmulationStation.
I can give specific examples if that helps.
-
@maroonout09 It is not uncommon for a few roms to scrape incorrectly. I assume the names they are scraped as are quite close to the ones you expect. Skyscraper is based on filename searches for some modules and checksum searches for others, and uses several different tricks to try to be as precise as possible. But there will be false positives, it cannot be avoided.
But yes, please give examples and also what version of Skyscraper you are running (important). I would like to make sure it is the expected behaviour and not something else entirely.
Quick note: If you want to avoid false positives completely, set '-m 100' on the command line or 'minMatch' in '[homedir]/.skyscraper/config.ini'. Then it will only allow 100% correct results. But keep in mind that you will also lose a lot of the correct results if you do so. It's a bit of a balancing act.
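To make the balancing act a bit more concrete, here is a minimal Python sketch of a minimum-match filter. It is not how Skyscraper computes its match percentage: the similarity measure (`difflib.SequenceMatcher` from the Python standard library), the function names and the threshold of 80 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def match_percent(search: str, result: str) -> int:
    """Similarity of two titles as a whole percentage (0-100).

    ASSUMPTION: uses difflib's ratio, not Skyscraper's own algorithm.
    """
    ratio = SequenceMatcher(None, search.lower(), result.lower()).ratio()
    return round(ratio * 100)


def accept(search: str, result: str, min_match: int) -> bool:
    """Accept a search result only if it reaches the minimum match."""
    return match_percent(search, result) >= min_match


if __name__ == "__main__":
    for result in ("Agony", "Agony 2"):
        for threshold in (80, 100):  # hypothetical thresholds
            print(result, threshold, accept("Agony", result, threshold))
```

A lower threshold lets near misses through (which is where false positives come from), while a threshold of 100 only accepts exact matches and so drops results that differ by as little as a subtitle.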
-
@muldjord Here are the games that I found that were scraped incorrectly:
Filename: Pokemon_-_Yellow_Version.gbc
Scraped Name: Robopon: Sun Version
Comments: The scrape also included the description for Robopon: Sun Version, and for some reason, the images for Pokemon: Gold Version.
Filename: Super_Mario_Advance.gba
Scraped Name: Chaoji Maliou Shijie
Comments: The scrape had the correct description and images.
Filename: Super_Mario_Advance_3_-_Yoshi's_Island.gba
Scraped Name: Yaoxi Dao
Comments: The scrape had the correct description and images.
Filename: Wario_Land_4.gba
Scraped Name: Waliou Xunbao Ji
Comments: The scrape had the correct description and images.
I think those may have been the only ones that scraped incorrectly.
I'm using Skyscraper v2.4.3.
-
@maroonout09
Just tested all of them. These are the reasons and what you can do about it:
Pokemon_-_Yellow_Version.gbc:
It returns a match for Robopon: Sun Version because of the "-" in the filename (it is included in the search, which messes with it; I will consider removing these dashes automatically in 2.4.4). And since that name matches 83%, it accepts it. You can make it work by renaming the file to "Pokemon_Yellow.gbc".
Super_Mario_Advance.gba / Super_Mario_Advance_3_-_Yoshi's_Island.gba / Wario_Land_4.gba:
These titles are actually correct; they are just the 'wor' region titles, which are what ScreenScraper returns for them. I was not aware that the 'wor' titles were sometimes the Japanese titles, so I'll prioritize the 'eu' and 'us' titles higher for the next release (2.4.4). In the meantime, please set the region manually with '--region us' or '--region eu' to prevent this from happening.
Thank you for reporting this, I appreciate it.
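As a rough idea of what removing those dashes from the search could look like, here is a small Python sketch. It is an assumption about one possible approach, not the code planned for 2.4.4, and the function name `search_name` is made up for this example.

```python
import re
from pathlib import Path


def search_name(filename: str) -> str:
    """Turn a rom filename into a plain search string.

    ASSUMPTION: a simplified cleanup for illustration only.
    """
    stem = Path(filename).stem              # drop the file extension
    stem = stem.replace("_", " ")           # underscores act as spaces
    stem = re.sub(r"\s-\s", " ", stem)      # drop dashes standing on their own
    return re.sub(r"\s+", " ", stem).strip()


if __name__ == "__main__":
    print(search_name("Pokemon_-_Yellow_Version.gbc"))
    print(search_name("Super_Mario_Advance_3_-_Yoshi's_Island.gba"))
```

Only dashes surrounded by whitespace are removed, so a hyphen inside a word is left alone.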
-
@muldjord For Amiga: Deluxe Pacman is scraped as Deluxe Pac Man and is not found. Rock n Roll is not found either.
With .lha files it doesn't add [AGA] anymore. Not really important though.
Edit: Shame that we can't use LemonAmiga or HOL.
-
@analoghero [AGA]'s will be back in 2.4.4. :) And so will [CD32], [CDTV] and [Demo].
You can change the filenames of your lha's if you want better results. Try changing "DeluxePacManxxx.lha" to "DeluxePacmanxxx.lha" for instance, that might fix it. But for now many Amiga games with the .lha suffix will scrape incorrectly, since I have to convert the filenames on the fly to add spaces, and that is just bound to be a problem.
I'm working with Dom from the Amiberry team for a better solution in the future. But for now, this will have to do. I also would like to point out that Skyscraper is the only scraper to even support the .lha's at this point, so I guess anything is better than nothing. Skyscraper scrapes about 75% of the lha's at the moment.
EDIT: Agreed, I actually supported LemonAmiga and HOL half a year ago, but had to remove support since I couldn't get official permission to scrape from their sites... :S I never got a reply to my emails if I recall correctly. And without permission I won't use them of course.
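To show why the capitalisation in an .lha filename matters, here is a small Python sketch of the kind of on-the-fly conversion described above. It is a guess at the general idea, not Skyscraper's actual conversion: the function name `lha_title` and the two rules (cut at the first underscore, add a space before a capital that follows a lowercase letter or digit) are assumptions for this example.

```python
import re
from pathlib import Path


def lha_title(filename: str) -> str:
    """Guess a searchable title from a CamelCase .lha filename.

    ASSUMPTION: simplified rules for illustration only.
    """
    stem = Path(filename).stem        # "Agony_v1.3_0960.lha" -> "Agony_v1.3_0960"
    base = stem.split("_", 1)[0]      # keep the part before the first underscore
    # Insert a space wherever a capital follows a lowercase letter or digit.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", base)


if __name__ == "__main__":
    for name in ("DeluxePacMan.lha", "DeluxePacman.lha", "Agony_v1.3_0960.lha"):
        print(name, "->", lha_title(name))
```

Under these rules a capital "M" produces "Deluxe Pac Man" while a lowercase "m" produces "Deluxe Pacman", which is why renaming the file can change whether the game is found.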
-
@muldjord Yes, I know that they were once supported but then removed. I think they assume a scraper for a well-known platform such as RetroPie will cause a lot of traffic. Good idea just to rename the files. Will try that. :)
-
@muldjord Thank you very much for your help!
-
@maroonout09 You're welcome. Good luck with it! :)
-
Just for reference, we have also been testing this on our RetroPie base image for the Odroid XU4, and it works well. The only item of note is that with that board a lot more folks use a small eMMC or microSD card for the base system and an external drive for their games and media. As far as we can tell, the db stores duplicates in the cache for quicker results when rescraping, so it is easy to fill up the remaining space on the OS "drive" very quickly. Excellent work though on the quality of the metadata returned for the gamelists, and of the media itself.
-
@fnkngrv Thank you, glad you like it. You can change the dbFolder with '-d', and I will make sure it can be set in the config.ini file for the next release as well. Then you can create a config.ini and add 'dbFolder="[db base folder]"' to the main section of it, and it will put the cache there for all platforms in subfolders. That should give you the flexibility you are looking for. Will be in 2.4.4.
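As an illustration of the "base folder plus one subfolder per platform" idea, here is a minimal Python sketch that reads such a setting and works out where a platform's cache would end up. It is not Skyscraper code: the section name `[main]`, the function name `cache_folder` and the example path `/mnt/usb/skyscraper-dbs` are assumptions, and the real config.ini handling in 2.4.4 may differ.

```python
import configparser
from pathlib import Path


def cache_folder(config_text: str, platform: str, default_base: str) -> Path:
    """Return the cache folder for one platform.

    ASSUMPTION: 'dbFolder' lives in a section called [main]; if it is
    missing, the default base folder is used instead.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep 'dbFolder' case-sensitive
    parser.read_string(config_text)
    base = parser.get("main", "dbFolder", fallback=default_base)
    return Path(base.strip('"')) / platform  # one subfolder per platform


if __name__ == "__main__":
    # Hypothetical config pointing the cache at an external drive.
    config = '[main]\ndbFolder="/mnt/usb/skyscraper-dbs"\n'
    print(cache_folder(config, "amiga", "/home/pi/.skyscraper/dbs"))
```

Pointing the base folder at the external drive would keep the cache off the small eMMC or microSD card while still giving each platform its own subfolder.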