Trouble scraping ZX Spectrum titles in Skyscraper from World of Spectrum
-
Hi @Lolonois -
I am trying to use the worldofspectrum scraper within Skyscraper to pull info for my ZX Spectrum titles, and no matter the filename, it does not find any results. I am sure I used this years ago out of the box without any problem. Could it be that WoS has changed their API?
I see this is the public page: https://worldofspectrum.org/infoseek?q=deathchase (for example, if I search for "deathchase"). Strangely, I see an accordion for "Software" with (1) hit, but I can't expand it to see the results, and it doesn't seem to show the game info anywhere.
Am I doing something wrong? I searched this forum, the issues on github, and the WoS forum - but nothing came up.
Thank you for your help!
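For anyone testing this by hand: the public Infoseek URL above takes the search term in its `q` parameter. A minimal sketch for building that URL from a ROM filename follows. It assumes TOSEC-style names where everything from the first "(" or "[" onward is metadata; the helper names `title_from_filename` and `infoseek_url` are hypothetical and this is not how Skyscraper itself builds its queries.

```python
from pathlib import Path
import re
from urllib.parse import urlencode

# Public search page seen in the browser (not necessarily what Skyscraper queries).
INFOSEEK_BASE = "https://worldofspectrum.org/infoseek"


def title_from_filename(filename: str) -> str:
    """Hypothetical helper: drop the extension and TOSEC-style (...)/[...] tags."""
    stem = Path(filename).stem
    # Assumption: metadata starts at the first "(" or "[" in the name.
    return re.sub(r"\s*[\(\[].*$", "", stem).strip()


def infoseek_url(title: str) -> str:
    """Hypothetical helper: URL-encode the title into the q= parameter."""
    return f"{INFOSEEK_BASE}?{urlencode({'q': title})}"


print(infoseek_url(title_from_filename("Deathchase (1983)(Micromega).tzx")))
```

Note that `urlencode` encodes spaces as `+` and commas as `%2C`, so "Force, The" becomes `q=Force%2C+The`.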
-
@sith-lord-goz Hi, thanks for the heads-up. It might be that the website changed the layout and/or the software. It seems to me they have an issue on their side.
Very likely on their side, cf. https://worldofspectrum.org/forums/discussion/comment/1019618/#Comment_1019618
-
It seems reporting via their forum is also not possible at the moment.
Edit:
Anyway, the situation has several aspects.
- The search URL Skyscraper uses is no longer provided by worldofspectrum.org while they redo their website
- Currently Skyscraper uses HTML scraping to retrieve the game metadata from their site, which must be overhauled due to their website redesign
- Scraping via a web API would be preferable; however, their web API is WIP [1] as I understand it. Textual metadata can be queried via the web API, but it lacks the ability to request media files. It also limits search results to 10 matches.
- Also, I did not find any information on how to request an API key to bypass the 10-result limit.
TL;DR consider the WoS scraper broken for above reasons.
Once the situation is settled on their side, and as my time permits, I will adapt and fix the WoS scraper. As a workaround: use other scraping modules of Skyscraper (screenscraper, mobygames, ...) or use the import scraper with manually collected Spectrum game info to populate your Spectrum gamelist.
-
Thank you so much @Lolonois for digging into that and confirming what I saw too. Hopefully they will get it fixed up and eventually make it easier to integrate w/Skyscraper. For now yup I can fallback to the other scrapers. Keep up the great work - cheers!
-
In the interim, you may also use this Python script to scrape most of the information from the worldofspectrum site:
https://gist.github.com/Gemba/c323e8036b921a1aa2fb927bb4958928
Usage: See comments in header of script.
-
@Lolonois This script is great thanks, definitely good for filling in the ~600 gaps in the 3,800 total ZX games I scraped from screenscraper. I think maybe the other dependency is python3-bs4? In the header, python3-requests is listed twice.
Thanks again
UPDATE: it's interesting that the results on the WoS archive page https://worldofspectrum.org/archive/software/games do not match the Infoseek results... at least not for a test search string "Force, The". This term will return "The Force" game entries in the archive, but not in InfoSeek. Could be the comma, or the way it treats spaces? Just something I noticed.
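If the difference is the archive-style "Title, The" ordering, one client-side workaround is to try both forms of the title. The sketch below only assumes that the trailing article is one of "The", "A" or "An" after a comma; the helper `search_variants` is hypothetical and not part of Skyscraper or the gist.

```python
import re

# Archive-style sorted titles move the leading article to the end, e.g. "Force, The".
_TRAILING_ARTICLE = re.compile(
    r"^(?P<title>.+?),\s*(?P<article>The|A|An)$", re.IGNORECASE
)


def search_variants(title: str) -> list[str]:
    """Hypothetical helper: return the title plus its 'The Force' style variant."""
    title = " ".join(title.split())  # collapse runs of whitespace
    variants = [title]
    match = _TRAILING_ARTICLE.match(title)
    if match:
        # Move the article back to the front: "Force, The" -> "The Force".
        variants.append(f"{match.group('article')} {match.group('title')}")
    return variants


print(search_variants("Force, The"))
```

Each variant can then be sent as a separate query and the hits merged; titles without a trailing article are passed through unchanged.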
-
@sith-lord-goz Thanks for the notes. I have updated the gist.
The "Force, The" thing: most likely there is some logic applied to the browser search before the query hits the API or DB. Things are not final on their side, and they have evidently not yet made sure that the same input yields the same output regardless of which search route is used.