Versatile C++ game scraper: Skyscraper
-
Skyscraper is so flexible that I think many users don't use it to its full potential xD
I think this is correct: I have already scraped my sets, and my localdb is around 15-16GB. If I want to do things like exporting only the covers or rebuilding a graphic composition, I just need to modify the artwork.xml file and do a -s localdb rescrape, am I right? Will the graphics be rebuilt and overwritten in the media folders?
Thanks
-
@bleuge I'd say yes. That's one reason to have a good localdb. If you already have all the needed images downloaded, you can edit artwork.xml and run Skyscraper with -s localdb. Not tested, but it should work offline, too.
-
@muldjord Regarding --dbstats
As I have my localdb mounted on an external HD, I must use
Skyscraper --dbstats -p arcade -d /home/pi/RetroPie/dbs/arcade
I thought I could just use -d /home/pi/RetroPie/dbs, but I must include the platform as well.
What was strange is that while testing --dbstats I typed a wrong command line:
Skyscraper --dbstats -p arcade -d /home/pi/RetroPie/dbs/c64
So, checking the c64 localdb with the arcade platform still showed data. I don't know exactly what this means.
Don't you think it would be best to simply use the localdb path 'without' the platform (since you have already stated which platform you want stats for)? Then this "bug"? "feature"? would always show the correct data.
Also, I know you have tons of ideas pending for Skyscraper, but I have two requests.
Maybe --dbstats could run without specifying a platform and directly show the stats for the entire localdb?
Also, I've been thinking about how I could find out which games are missing data. I could process the gamelist with an external tool, but could this be interesting in Skyscraper?
A run of dbstats without platforms between scraping sessions could show users how much new data they have gathered in the new scraping session. And you could know how complete your data sets are with a --showmissing command.
Example:
Skyscraper --showmissing -> show a report of missing data/covers/etc. for all platforms
Skyscraper --showmissing --verbosity 1 -> show missing data/covers/etc. for every game on each platform
I've been building a set of what I call "the best games" for console/arcade platforms; I've worked on this for many months. Some platforms, such as SNES or NES, need to be reduced because they contain more than 200 games in my initial selection, and I think that's too much. I've also included translations, interesting hacks, etc...
Sorry for trying to give you more work xD But I love Skyscraper, and as far as I know it's currently the best scraper around by a wide margin.
The day you implement MobyGames I think I'm going to explode with excitement (imagine scraping TDC#14), that would be totally crazy!
As always, thank you very much, and sorry for the walls of text.
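Until something like the proposed --showmissing exists, a rough version of that report can be built outside Skyscraper from an EmulationStation gamelist.xml. This is a minimal Python sketch, assuming missing media show up as empty (`<cover />`) or absent elements; `missing_media` and the `MEDIA_TAGS` list are hypothetical names chosen for illustration, not part of Skyscraper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical selection of gamelist.xml media tags to check; adjust to taste.
MEDIA_TAGS = ("cover", "image", "marquee", "video")


def missing_media(gamelist_xml: str) -> Dict[str, List[str]]:
    """Map each game's <name> to the media tags that are absent or empty."""
    root = ET.fromstring(gamelist_xml)
    report: Dict[str, List[str]] = {}
    for game in root.iter("game"):
        name = game.findtext("name", default="?")
        # findtext() returns None for a missing element and "" for an empty one.
        missing = [tag for tag in MEDIA_TAGS
                   if not (game.findtext(tag) or "").strip()]
        if missing:
            report[name] = missing
    return report
```

Running it per platform (for example on `open(".../amiga/gamelist.xml").read()`) and printing `len(report)` gives the per-platform summary; printing the dict itself gives the per-game detail.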
-
@bleuge I'm not gonna work on Skyscraper for a while :)
You can easily get the missing games simply by scraping the platform with any scraper and adding --pretend. It will load all of the data from the cache, so it doesn't use network bandwidth from the sources. Any roms it doesn't have will be added to the [homedir]/.skyscraper/skipped-[module].txt files. :)
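To tally the result of such a --pretend run, a short Python sketch can read the skipped-[module].txt files. It assumes (not verified in this thread) that each file lists one skipped rom per line and sits directly in [homedir]/.skyscraper; `read_skipped` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict, List


def read_skipped(skyscraper_dir: Path) -> Dict[str, List[str]]:
    """Collect non-empty lines from every skipped-*.txt file, keyed by module.

    Assumption: one skipped rom entry per line.
    """
    result: Dict[str, List[str]] = {}
    for f in sorted(skyscraper_dir.glob("skipped-*.txt")):
        module = f.stem[len("skipped-"):]  # "skipped-screenscraper" -> "screenscraper"
        lines = f.read_text(encoding="utf-8", errors="replace").splitlines()
        result[module] = [ln.strip() for ln in lines if ln.strip()]
    return result
```

For example, `read_skipped(Path.home() / ".skyscraper")` returns something like `{"screenscraper": [...]}`, and the length of each list is the number of roms still missing for that module.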
And yes, you have the basic idea of Skyscraper correct. The idea is, as you say, that you can change any of the options for Skyscraper, and simply rescrape with localdb. If you do so, you will get 99% the same result as scraping with the web sources all over again. So it saves a tonne of bandwidth for the sources and is a lot faster. So basically you can, in theory, redo your entire collection in Emulationstation just by rescraping with localdb. I don't think many other scrapers give you this option, if any.
The reason you have to provide a path for localdb is that they are platform specific. Having one database for all platforms would make it extremely slow, since I have to do lookups in ALL resources for each and every rom. It would make Skyscraper almost unusable.
Nothing keeps you from setting a "wrong" platform with --dbstats, though; if you also use '-d', it will still display the stats correctly. :) I can see how that's confusing, but think of it in a broader perspective. A localdb cache is not really platform specific; it is rom collection specific. But since everything is tied to platforms when you talk emulation, so is Skyscraper.
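Since each localdb is per platform, stats for a whole collection could be gathered by running the existing --dbstats command once per localdb folder. A minimal Python wrapper, assuming every subfolder of the dbs root (e.g. /home/pi/RetroPie/dbs) is named after its platform, as in the arcade/c64 commands above; that naming is an assumption, not a Skyscraper requirement, and `dbstats_commands`/`run_all` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List


def dbstats_commands(dbs_root: Path) -> List[List[str]]:
    """Build one 'Skyscraper --dbstats' command per platform localdb folder."""
    return [
        ["Skyscraper", "--dbstats", "-p", d.name, "-d", str(d)]
        for d in sorted(dbs_root.iterdir())
        if d.is_dir()  # skip stray files in the dbs root
    ]


def run_all(dbs_root: Path) -> None:
    """Run the commands one after another; output goes to the terminal."""
    for cmd in dbstats_commands(dbs_root):
        subprocess.run(cmd, check=False)
```

Usage: `run_all(Path("/home/pi/RetroPie/dbs"))`. Keeping the command building separate from running makes it easy to print the commands first and check them.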
EDIT: MobyGames is high on my priority list. Just keep in mind that they have very strict connection limits, which basically means it is only usable to scrape about 10 roms an hour. But you could use it to fill in the gaps I think. So if you have one game that just doesn't scrape with the others, then you could scrape it on command line with MobyGames and maybe get a hit.
But yeah, for now, I'm taking a break. :)
-
@analoghero Yes, that is exactly correct. Skyscraper is 100% capable of scraping offline if all games are in the local cache. :)
-
@muldjord hehe ok, understood, you need a very well-deserved rest!
Thanks for all the good work and enjoy!
P.S. Yes, Skyscraper is unique in featuring a localdb. That was a genius idea.
-
@muldjord I know you're taking a break and I don't want to disturb you, so consider this something to look at whenever you like. When scraping Amiga, the filenames (minus extension) are displayed in ES instead of the game name. Example: Agony_v1.3_0960 should be Agony.
I don't know if other systems are affected too.
-
@analoghero Either you're using '--forcefilename' or it hasn't been found. Those are the only two reasons for a game to show up with the filename. :) Otherwise, if it was found as "Agony" it will be shown as "Agony".
-
@muldjord Strange issue. I never used --forcefilename.
#4/63 (T2) ---- Game 'Agony_v1.3_0960' found! :) ----
Scraper: localdb
Search match: 100 %
Compare title: 'Agony'
Result title: 'Agony_v1.3_0960' (import)
Platform: 'Amiga' (thegamesdb)
Release Date: '1992-01-01' (openretro)
Developer: 'Art and Magic' (openretro)
Publisher: 'Psygnosis' (openretro)
Players: '1' (openretro)
Tags: 'Animalprotagonist, Autoscroll, Horizontal, Powerup, Shootemup, Sideways' (openretro)
Rating (0-1): '0.7' (thegamesdb)
Cover: YES (openretro)
Screenshot: YES (openretro)
Wheel: NO ()
Marquee: NO ()
Video: YES (import)
Description: (thegamesdb)
This horizontally-scrolling shoot 'em up features six long levels, all with detailed and mellow background graphics, aiming for a less hectic feel than contemporaries such as Project X. As a magician's apprentice, you have been turned into an owl to give you the best chance of destroying the many dark creatures to be faced, and thus discovering the secret of cosmic strength. These dark creatures include piranhas, giant ants and mosquitoes. Extra weapons and invincibility periods can be collected. The technical details include 3 layers of multi-directional parallax scrolling, background animation, and different title and in-game music.
Elapsed time: 00:00:04
Estimated time: 00:01:10
Here's an example from my gamelist.xml:
<?xml version="1.0"?>
<gameList>
  <game>
    <path>/home/pi/RetroPie/roms/amiga/Agony_v1.3_0960.lha</path>
    <name>Agony_v1.3_0960</name>
    <cover />
    <image>/home/pi/RetroPie/roms/amiga/media/screenshots/Agony_v1.3_0960.png</image>
    <marquee />
    <video>/home/pi/RetroPie/roms/amiga/media/videos/Agony_v1.3_0960.mp4</video>
    <rating>0.7</rating>
    <desc>This horizontally-scrolling shoot 'em up features six long levels, all with detailed and mellow background graphics, aiming for a less hectic feel than contemporaries such as Project X. As a magician's apprentice, you have been turned into an owl to give you the best chance of destroying the many dark creatures to be faced, and thus discovering the secret of cosmic strength. These dark creatures include piranhas, giant ants and mosquitoes. Extra weapons and invincibility periods can be collected. The technical details include 3 layers of multi-directional parallax scrolling, background animation, and different title and in-game music.</desc>
    <releasedate>19920101</releasedate>
    <developer>Art and Magic</developer>
    <publisher>Psygnosis</publisher>
    <genre>Animalprotagonist, Autoscroll, Horizontal, Powerup, Shootemup, Sideways</genre>
    <players>1</players>
  </game>
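To find every entry affected this way instead of spotting them one at a time in ES, a short Python sketch can list games whose `<name>` is just the rom filename without its extension. It only reads the `<path>` and `<name>` elements seen in the example above; `names_equal_to_filename` is a hypothetical helper. It can also flag games whose real title happens to match the filename, so treat the result as a list to review.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import List


def names_equal_to_filename(gamelist_xml: str) -> List[str]:
    """Return <path> values of games whose <name> equals the file stem."""
    root = ET.fromstring(gamelist_xml)
    hits: List[str] = []
    for game in root.iter("game"):
        path = game.findtext("path", default="")
        name = game.findtext("name", default="")
        # PurePosixPath(...).stem drops only the last suffix, e.g. ".lha".
        if path and name == PurePosixPath(path).stem:
            hits.append(path)
    return hits
```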
-
@analoghero Ah, that's because the title is prioritized from the "import" module. Just remove the "<source>import</source>" line from the "<order type="title">...</order>" section in your priorities.xml under "[homedir]/.skyscraper/dbs/[platform]/priorities.xml" and rescrape with localdb.
:)
-
@muldjord Yes, you're right! Although there was no <source>import</source> tag under the title order in priorities.xml, I added <source>thegamesdb</source>. Now it uses the correct title.
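For collections with many platforms, the same priorities.xml change could be scripted. This Python sketch assumes the file has a `<priorities>` root holding `<order type="...">` elements with `<source>` children, as the tags quoted above suggest; `prefer_source_for_title` is a hypothetical name. The standard-library parser drops XML comments and the declaration when re-serializing, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def prefer_source_for_title(priorities_xml: str, preferred: str,
                            drop: str = "import") -> str:
    """Remove `drop` from the title order and list `preferred` first."""
    root = ET.fromstring(priorities_xml)
    for order in root.iter("order"):
        if order.get("type") != "title":
            continue
        # Remove the unwanted source and any existing copy of the preferred one.
        for src in list(order.findall("source")):
            if (src.text or "").strip() in (drop, preferred):
                order.remove(src)
        new = ET.Element("source")
        new.text = preferred
        order.insert(0, new)  # first entry = highest priority
    return ET.tostring(root, encoding="unicode")
```

After writing the returned text back to priorities.xml, a rescrape with localdb picks up the new order.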
Thank you for your help.
-
@analoghero Glad I could help. :)
-
@muldjord I'm having an issue with a few of my roms scraping incorrectly in simple mode.
They are showing incorrect names in EmulationStation.
I can give specific examples if that helps.
-
@maroonout09 It is not uncommon for a few roms to scrape incorrectly. I assume the names they are scraped as are quite close to the ones you expect. Skyscraper is based on filename searches for some modules and checksum searches for others, and it uses several different tricks to be as precise as possible. But there will be false positives; it cannot be avoided.
But yes, please give examples and also what version of Skyscraper you are running (important). I would like to make sure it is the expected behaviour and not something else entirely.
Quick note: If you want to avoid false positives completely, set '-m 100' on the command line or set 'minMatch' to 100 in '[homedir]/.skyscraper/config.ini'. Then it will only allow 100% correct results. But keep in mind that you will also lose a lot of the correct results if you do so. It's a bit of a balancing act.
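The balancing act can be illustrated with a generic title-similarity threshold. This Python sketch uses difflib and is not how Skyscraper computes its match percentage (that algorithm isn't described here); `match_percent` and `accept` are hypothetical names, and `min_match` only plays the role that '-m' / 'minMatch' plays.

```python
from difflib import SequenceMatcher


def match_percent(query: str, candidate: str) -> int:
    """Case-insensitive similarity of two titles as a 0-100 score."""
    ratio = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
    return round(ratio * 100)


def accept(query: str, candidate: str, min_match: int) -> bool:
    """Accept a result only if its score reaches the minimum."""
    return match_percent(query, candidate) >= min_match
```

With this scoring, "Agony" vs "Agony_v1.3" scores 67: accepted at a minimum of 60, rejected at 100. A strict minimum removes false positives but also rejects near-identical correct titles.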
-
@muldjord Here are the games that I found that were scraped incorrectly:
Filename: Pokemon_-_Yellow_Version.gbc
Scraped Name: Robopon: Sun Version
Comments: The scrape also included the description for Robopon: Sun Version, and for some reason, the images for Pokemon: Gold Version.
Filename: Super_Mario_Advance.gba
Scraped Name: Chaoji Maliou Shijie
Comments: The scrape had the correct description and images.
Filename: Super_Mario_Advance_3_-_Yoshi's_Island.gba
Scraped Name: Yaoxi Dao
Comments: The scrape had the correct description and images.
Filename: Wario_Land_4.gba
Scraped Name: Waliou Xunbao Ji
Comments: The scrape had the correct description and images.
I think those may have been the only ones that scraped incorrectly.
I'm using Skyscraper v2.4.3.
-
@maroonout09
Just tested all of them; here are the reasons and what you can do about it:
Pokemon_-_Yellow_Version.gbc:
It returns a match for Robopon: Sun Version because of the "-" in the filename (it is included in the search, which messes with it; I will consider removing these dashes automatically in 2.4.4). And since that name matches 83%, it is accepted. You can make it work by renaming the file to "Pokemon_Yellow.gbc".
Super_Mario_Advance.gba / Super_Mario_Advance_3_-_Yoshi's_Island.gba / Wario_Land_4.gba:
These titles are actually correct; they are just the 'wor' region titles, which is what ScreenScraper returns for them. I was not aware that the 'wor' titles were sometimes set to the Japanese titles, so I'll prioritize the 'eu' and 'us' titles higher in the next release (2.4.4). In the meantime, please set the region manually with '--region us' or '--region eu' to prevent this from happening.
Thank you for reporting this, I appreciate it.
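Both fixes discussed above can be sketched in a few lines of Python: turning a filename into a search string without stand-alone dashes, and picking a title by region preference with 'us' and 'eu' ahead of 'wor'. These are hypothetical helpers (`search_query`, `pick_title`), not Skyscraper's code; the region codes are the ones mentioned in this thread, and the preference order is an assumption.

```python
from pathlib import PurePath
from typing import Mapping, Optional, Sequence

# Hypothetical preference order: 'us' and 'eu' before 'wor'.
DEFAULT_PRIORITY = ("us", "eu", "wor")


def search_query(filename: str) -> str:
    """Underscores become spaces and stand-alone '-' words are dropped."""
    words = PurePath(filename).stem.replace("_", " ").split()
    return " ".join(w for w in words if w != "-")


def pick_title(titles: Mapping[str, str],
               priority: Sequence[str] = DEFAULT_PRIORITY) -> Optional[str]:
    """Return the title of the first preferred region that has one."""
    for region in priority:
        title = titles.get(region)
        if title:
            return title
    # Fall back to any available title rather than returning nothing.
    return next(iter(titles.values()), None)
```

For instance, "Pokemon_-_Yellow_Version.gbc" becomes "Pokemon Yellow Version", and `{"wor": "Waliou Xunbao Ji", "us": "Wario Land 4"}` yields "Wario Land 4". Passing `("wor", "us")` as the priority reproduces the behaviour reported above.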
-
@muldjord For Amiga: Deluxe Pacman is scraped as Deluxe Pac Man and not found. Rock n Roll is not found either.
With .lha files it doesn't add [AGA] anymore. Not really important, though.
Edit: Shame that we can't use LemonAmiga or HOL.
-
@analoghero [AGA]'s will be back in 2.4.4. :) And so will [CD32], [CDTV] and [Demo].
You can change the filenames of your lha's if you want better results. Try changing "DeluxePacManxxx.lha" to "DeluxePacmanxxx.lha", for instance; that might fix it. But for now, many Amiga games with the .lha suffix will scrape incorrectly, since I have to convert the filenames on the fly to add spaces, and that is bound to cause problems.
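Why renaming helps can be seen with a simple case-boundary split, one common way to add spaces to run-together names. This Python sketch only illustrates that general idea; Skyscraper's actual conversion isn't shown in this thread, and `split_camel_case` is a hypothetical name.

```python
import re


def split_camel_case(stem: str) -> str:
    """Insert a space wherever a lowercase letter is followed by an uppercase one."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", stem)
```

`split_camel_case("DeluxePacMan")` gives "Deluxe Pac Man", while `split_camel_case("DeluxePacman")` gives "Deluxe Pacman", because the lowercase "m" creates no boundary. That is the same kind of difference the suggested rename relies on.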
I'm working with Dom from the Amiberry team for a better solution in the future. But for now, this will have to do. I also would like to point out that Skyscraper is the only scraper to even support the .lha's at this point, so I guess anything is better than nothing. Skyscraper scrapes about 75% of the lha's at the moment.
EDIT: Agreed, I actually supported LemonAmiga and HOL half a year ago, but had to remove support since I couldn't get official permission to scrape from their sites... :S I never got a reply to my emails if I recall correctly. And without permission I won't use them of course.
-
@muldjord Yes, I know they were once supported but then removed. I think they assume a scraper for a well-known platform such as RetroPie will cause a lot of traffic. Good idea to just rename the files. Will try that. :)
-
@muldjord Thank you very much for your help!