Versatile C++ game scraper: Skyscraper
-
Skyscraper 3.4.0 released: https://github.com/muldjord/skyscraper
- Disabled config.ini migration as most people should be migrated now
- Added 'launcher' option to 'simple mode' when using 'pegasus' frontend
- Added 'excludeFiles' config option that allows excluding certain files when scraping (Thank you to 'timothybrown' for suggesting this)
- Added 'includeFiles' config option that allows only including certain files when scraping
- Added '--excludefiles' cli option that allows excluding certain files when scraping
- Added '--includefiles' cli option that allows only including certain files when scraping
- If 'noresize' is set, all images are now saved to cache in their original format and size instead of always being converted to PNGs (Thank you to 'krkroft' for requesting this)
- Added 'jpgQuality' config option that sets the default JPG quality (0-100) when '--noresize' is NOT set. Screenshots and images with transparency are still saved as PNGs.
- Upped 'screenscraper' request limiter to 1.2 seconds per request to avoid 'maximum threads per minute reached' error message
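For anyone curious what a limiter like this does in practice, here is a minimal Python sketch. It is not Skyscraper's actual code (Skyscraper is written in C++); the class name `RequestLimiter` and its parameters are made up for illustration. The only number taken from the changelog is the 1.2 second spacing, which works out to at most 50 requests per minute.

```python
import time


class RequestLimiter:
    """Hypothetical limiter: enforces a minimum spacing between requests."""

    def __init__(self, min_interval=1.2, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between two requests
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until the spacing is satisfied; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` right before each request is enough. If every scraping thread gets its own limiter, each thread is held to the spacing individually.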
Quite a few new things made it into this release. The most prominent one is that Skyscraper now handles the --noresize option a bit differently. The option makes Skyscraper save all media "as is" from the sources, without resizing them. But in older versions I always converted all artwork resources to PNG when saving them, which caused the 4000x3000 cover artwork for some games to take up huge amounts of space. So when user 'krcroft' requested that the artwork use the format that the sources served to me instead, I thought about that for half a year... hrm... Anyway, I completely rewrote the artwork acquiring code and internal structures so it will now, in fact, save the artwork exactly as is when using the --noresize or cacheResize="false" options. While doing this I also optimized how Skyscraper works by default, so the sizes of any artwork will now be a bit bigger and take up less space, pretty much without any loss of quality. So it's a win-win on all fronts! You can save even more space by setting and lowering the jpgQuality="95" config.ini option.
Another quite interesting feature is the --excludefiles / excludeFiles= option and the companion --includefiles / includeFiles= option. These will allow you to mask out any files within the scraping scope. So if you have a bunch of files with [BIOS] in their name, you can now mask these out by entering excludeFiles="*[BIOS]*" in config.ini, or similar for the CLI option. This can be set on several levels, check the documentation for details.
Lastly, I rarely scrape a huge number of files these days as I basically have all of my data in my cache now. So it was only today, when testing this release, that I stumbled upon an error when scraping with the 'screenscraper' module. Turns out my 60 requests per minute limiter was too close to the edge and sometimes ScreenScraper would actually give back a "too many requests" error. So I've upped the limiter a bit to 1.2 seconds per request per thread. This fixed the issue. So if you've been having JSON errors from screenscraper, please try it again.
As I have rewritten quite a lot of code for this release I've also tested it quite thoroughly. I have not found any errors. But software has bugs, so please let me know if this version gives you any problems.
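Regarding the exclude mask: as a rough illustration of how a pattern such as `*[BIOS]*` can be matched against filenames, here is a small Python sketch. It assumes that only the asterisk is a wildcard and that the square brackets are literal characters, which is what the example above implies. The function names are hypothetical and this is not how Skyscraper itself implements the option.

```python
import re


def mask_to_regex(mask):
    """Turn an asterisk mask into a regex; every other character is literal."""
    literal_parts = [re.escape(part) for part in mask.split("*")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


def is_excluded(filename, mask):
    """True if the whole filename matches the mask."""
    return mask_to_regex(mask).fullmatch(filename) is not None


def filter_files(filenames, exclude_mask):
    """Keep only the filenames that do not match the exclude mask."""
    return [name for name in filenames if not is_excluded(name, exclude_mask)]
```

With the mask `*[BIOS]*`, a file named `[BIOS] Sega CD (USA).zip` would be dropped and `Sonic.zip` would be kept. Note that Python's own `fnmatch` module would not work for this, because it reads `[BIOS]` as a character set and would match any name containing one of those letters.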
-
Thank you so much for your ongoing effort!
-
Is anyone else having trouble scraping through the 'screenscraper' module right now? I just keep getting a "maximum threads allowed already used : 5/5" and I keep getting libpng errors... Something is going on. Please let me know if you are also seeing these issues.
-
@muldjord I scraped some PSP ROMs about 6 hours ago and didn't have any issues. So if there is a problem it just started recently.
-
Thanks, yeah, I just tried it again and now it seems to be working... Oh well. :)
EDIT: I asked on their Discord and it turned out to be a bug in the API, which they've now fixed.
-
Skyscraper 3.4.1 released: https://github.com/muldjord/skyscraper
- Further optimized artwork space requirements. Now checks if original takes up less space than resized artwork, then forces use of original for those cases
- The 'thegamesdb' module now also supports wheel and marquee for the games that have them (Thank you to 'tv21' for pointing this out)
- Updated developer and publisher json list for 'thegamesdb'
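The first item boils down to a simple size comparison. A minimal Python sketch of the idea, assuming both versions of an image are available as encoded bytes (the function name is hypothetical and this is not the actual Skyscraper code):

```python
def choose_artwork(original: bytes, resized: bytes) -> bytes:
    """Return the original only if it is strictly smaller than the resized copy."""
    return original if len(original) < len(resized) else resized
```

On a tie this sketch keeps the resized version, which is an arbitrary choice made here for illustration.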
'thegamesdb' now supports retrieving 'wheel' and 'marquee' artwork resource types. And I optimized the artwork resource export pipeline a bit further. If you scraped with 3.4.0 there's no need to redo it, it's only a small difference.
-
Any chance you can update the Windows version to 3.4.1? Unless you have an update in the pipeline.
-
@LiveFastCyYoung Sure, I've updated it to the latest 3.4.2 now. Get it here: http://www.muldjord.com/downloads/Skyscraper_3.4.2_unsupported_win_version.zip
-
@muldjord Much appreciated!
-
Skyscraper 3.4.3 released: https://github.com/muldjord/skyscraper
- Implemented workaround to incorrectly formatted JSON returned from 'screenscraper' when checking user credentials
- All arcade platforms now use 'flyer' from 'screenscraper' for cover artwork instead of 'box-2D'
There have been issues logging in to 'screenscraper' lately. This stems from a bug in their code which formats the returned JSON incorrectly after checking the credentials. This leads to a parse error in the Skyscraper JSON parser, which then leads to a failed login. I've implemented a workaround that fixes the JSON before parsing it. I have reported the bug to them, and I hope they fix it soon so I can remove this workaround again.
User @aidy80-s suggested I use the 'flyer' artwork from 'screenscraper' for the arcade platforms. This was a brilliant idea, so this has now been implemented as well.
-
Have you ever looked into adding the LaunchBox DB as a scraping source?
-
@LiveFastCyYoung said in Versatile C++ game scraper: Skyscraper:
Have you ever looked into adding the LaunchBox DB as a scraping source?
Yes, several times actually. I can't remember exactly why, but as I recall it is not an open API like other sources use.
-
Certain files get scraped as ZZZnotagame instead of just being ignored (especially +StartDOSbox.sh). It's really annoying to have to manually rename these after scraping. Is there any way to avoid this?
-
@quicksilver Hmm, I already have a filter to avoid these, but maybe some of them go under the radar. Can you provide a specific filename for an entry that returns this?
-
@muldjord Ok cool, if that is the case let me try doing a cache refresh and then rescrape first to see if it's fixed now and I'll report back.
-
Refreshing the cache and rescraping took care of the issue. Thanks for the quick response.
Edit: Just as a side note to anyone else having the same issue: you'll need to purge your cache for that specific system, otherwise when you rescrape it will grab the ZZZnotagame info from your cache and cause the problem all over again.
-
@muldjord I cannot figure it out. I don't have the credentials commented out, they are properly written, yet I continue to get a "Received invalid ScreenScraper server response, maybe their server is having issues, forcing 1 thread..." and then nothing but empty JSON scrapes. Please help! This is with version 3.4.3 BTW.
-
@AlCzervik It works fine, just tested it. Maybe their servers were having issues (they are having a lot of issues in general because of so many people scraping at the moment), please try again.
-
@muldjord I'll take another look but are there other things I can check that could cause that issue? Is there a way I can test the API?
-
@AlCzervik Hmm, actually that response is also shown if your credentials are wrong. It needs to look exactly like this in the config.ini:
[screenscraper]
userCreds="user:pass"
But with your own user and pass of course. If you've entered that correctly and don't have any stray configs messing with that, it will work.
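If you want to double-check the entry without eyeballing the file, a small Python sketch along these lines can flag the obvious mistakes. It only tests the shape described above (a [screenscraper] section, a userCreds entry that is not commented out, a quoted value of the form user:pass). The function name is hypothetical, it does not talk to ScreenScraper at all, and it assumes the file is plain INI where every key sits under a section header.

```python
import configparser


def check_screenscraper_creds(ini_text):
    """Return a list of problems found; an empty list means the entry looks fine."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # do not split the value on the ':' in user:pass
        strict=False,        # tolerate duplicate sections or keys
        interpolation=None,  # a '%' in a password must not be interpreted
    )
    parser.read_string(ini_text)
    if not parser.has_section("screenscraper"):
        return ["no [screenscraper] section found"]
    if not parser.has_option("screenscraper", "userCreds"):
        return ["no userCreds entry under [screenscraper] (is it commented out?)"]
    value = parser.get("screenscraper", "userCreds").strip()
    problems = []
    # Assumption taken from the post above: the value is wrapped in double quotes.
    if not (len(value) >= 2 and value.startswith('"') and value.endswith('"')):
        problems.append("value is not wrapped in double quotes")
    user, sep, password = value.strip('"').partition(":")
    if not sep or not user or not password:
        problems.append("value is not of the form user:pass")
    return problems
```

Feed it the text of your config.ini, for example `check_screenscraper_creds(open("config.ini").read())`, and look at what comes back. An empty list only means the formatting is plausible, not that the username and password are accepted by the server.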