Trouble scraping ZX Spectrum titles in Skyscraper from World of Spectrum
@sith-lord-goz Hi, thanks for the heads-up. It might be that the website changed the layout and/or the software. It seems to me they have an issue on their side.
Very likely on their side, cf.
It seems reporting via their forum is also not possible atm.
Anyway, the situation is manifold.
- The search URL Skyscraper uses is no longer provided, as they are redoing their website
- Currently Skyscraper uses HTML-scraping to retrieve the game metadata from their site, which must be overhauled due to their website redesign
- Scraping via a web API would be preferred; however, their web API is WIP [1] as I understand it. Textual metadata can be queried via the web API, but it lacks the ability to request media files. It also limits searches to 10 matching results.
- Also, I did not find any information on how to ask for an API key to bypass the 10-result limit.
TL;DR: consider the WoS scraper broken for the above reasons.
Once the situation is settled on their site, and as my time permits, I will adapt and fix the WoS scraper. As a workaround: use other scraping modules of Skyscraper (screenscraper, mobygames, ...) or use the import scraper with manually collected Spectrum game info to populate your Spectrum gamelist.
Thank you so much @Lolonois for digging into that and confirming what I saw too. Hopefully they will get it fixed up and eventually make it easier to integrate w/Skyscraper. For now yup I can fallback to the other scrapers. Keep up the great work - cheers!
You may also use this Python script to scrape the majority of information from the worldofspectrum site ad interim.
Usage: See comments in header of script.
@Lolonois This script is great thanks, definitely good for filling in the ~600 gaps in the 3,800 total ZX games I scraped from screenscraper. I think maybe the other dependency is python3-bs4? In the header, python3-requests is listed twice.
Thanks again
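A quick way to check the dependency question before running the script is sketched below. It assumes the script imports the `requests` and `bs4` modules (packaged on Debian-based systems as python3-requests and python3-bs4); the actual list is in the script header, so adjust the names if it differs. The helper name `missing_modules` is made up for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed dependencies of the scraping script; verify against its header.
    missing = missing_modules(["requests", "bs4"])
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All assumed dependencies are available.")
```

If anything is reported missing, installing the matching distribution package (or using pip) should resolve it.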
UPDATE: it's interesting that the results on the WoS archive page do not match the Infoseek results... at least not for a test search string "Force, The". This term will return "The Force" game entries in the archive, but not in Infoseek. Could be the comma, or the way it treats spaces? Just something I noticed.
@sith-lord-goz Thanks for the notes. I have updated the gist.
The "Force, The" thing: most likely some logic sits between the browser search and the point where it hits the API or DB. Things are not final on their side, and it is evidently not yet ensured that the same input yields the same output regardless of which search route is used.
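Until both search routes behave the same, one client-side workaround is to normalize catalogue-style titles such as "Force, The" before searching. The following is only a minimal sketch of that idea; it is not what the WoS site or Skyscraper does internally, and the function name `move_trailing_article` is made up for this example.

```python
import re

# Matches "<title>, <article>" where the article is the last thing in the string.
_TRAILING_ARTICLE = re.compile(r"\s*(.+?)\s*,\s*(the|an|a)\s*", re.IGNORECASE)


def move_trailing_article(title):
    """Turn 'Force, The' into 'The Force'; leave other titles unchanged."""
    match = _TRAILING_ARTICLE.fullmatch(title)
    if match is None:
        return title.strip()
    return f"{match.group(2)} {match.group(1)}"


if __name__ == "__main__":
    for raw in ("Force, The", "Force,The", "Manic Miner"):
        print(repr(raw), "->", repr(move_trailing_article(raw)))
```

Trying both the original and the normalized form against each search route would show whether the comma is really what makes the results differ.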
Sorry if this is a little off-topic for this thread - I can start a new one if that's better... but how would I know which scrapers support the --query param, and which values for that param I can/should use? I can see the readme examples for screenscraper (romnom, md5 etc.) but do some/all the other scrapers support it?
For instance, I see IGDB has an "IGDB ID" for each game - so in theory if I pick one (like ) and I can't find a match in WoS... then would:
Skyscraper -p zxspectrum -s igdb --query="<IGDBID>" <path-to-this-rom>
do anything? (Or whatever format of the --query param should be for this scraper)
From my memory the IGDB scraper module does not honor the game id, yet.
Did you try it btw. with
? If a scraper supports search modes other than text, this is described in the respective documentation of the scraper module.
Yeah I tried that, and various combos of "IGDBID=" and "id=" etc. No big deal, just curious, thanks.
@sith-lord-goz You may try with Skyscraper 3.16 which adds
. However, the Game filename must have some similarity with the IGDB Game Title. Also the GameBase DB import from the SpeccyMania GameBase may come in handy.
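To get a rough feel for whether a ROM filename is "similar enough" to a game title, a comparison like the one below can help when renaming files. This is only an illustrative sketch using Python's standard `difflib`; it is an assumption for demonstration and not the matching logic Skyscraper actually uses. The function names and the sample filenames are made up.

```python
import os
import re
from difflib import SequenceMatcher


def clean_filename(filename):
    """Strip the extension and bracketed tags like '(1984)' or '[a]' from a ROM filename."""
    stem, _ext = os.path.splitext(filename)
    stem = re.sub(r"[\(\[].*?[\)\]]", "", stem)  # drop (...) and [...] tags
    return re.sub(r"\s+", " ", stem).strip().lower()


def title_similarity(filename, title):
    """Return a similarity ratio between 0.0 and 1.0 for a filename and a game title."""
    return SequenceMatcher(None, clean_filename(filename), title.strip().lower()).ratio()


if __name__ == "__main__":
    rom = "Jet Set Willy (1984)(Software Projects).tzx"
    for candidate in ("Jet Set Willy", "Manic Miner"):
        print(candidate, round(title_similarity(rom, candidate), 2))
```

A low ratio for the intended title is a hint that renaming the file closer to the title may improve the match.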