    Trouble scraping ZX Spectrum titles in Skyscraper from World of Spectrum

    Help and Support
    Tags: zxspectrum, worldofspectrum, skyscraper, scraping
      Lolonois @sith.lord.goz

      @sith-lord-goz Hi, thanks for the heads-up. It might be that the website changed the layout and/or the software. It seems to me they have an issue on their side.

      Very likely on their side, cf. https://worldofspectrum.org/forums/discussion/comment/1019618/#Comment_1019618

        Lolonois

        It seems that reporting the problem via their forum is also not possible at the moment.

        Edit:

        Anyway, the situation is manifold:

        1. The search URL Skyscraper uses is no longer provided by worldofspectrum.org, as they are redoing their website.
        2. Skyscraper currently uses HTML scraping to retrieve the game metadata from their site, and this must be overhauled because of their website redesign.
        3. Scraping via a web API would be preferred; however, their web API is WIP [1] as I understand it. Textual metadata can be queried via the web API, but it lacks the ability to request media files. It also limits searches to 10 matching results.
        4. I also did not find any information on how to request an API key to bypass the 10-result limit.

        TL;DR: consider the WoS scraper broken for the reasons above.
        Once the situation is settled on their side, and as my time permits, I will adapt and fix the WoS scraper.

        As a workaround, use other scraping modules of Skyscraper (screenscraper, mobygames, ...) or use the import scraper with manually collected Spectrum game info to populate your Spectrum gamelist.
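The workaround can be sketched as a small Python helper that assembles one Skyscraper call per fallback module, followed by the call that generates the gamelist from the cache. This is only an illustration: the helper names are made up for this example, the module list is taken from the post, and it assumes `Skyscraper` is on the PATH with the `zxspectrum` platform configured.

```python
import shlex

# Module names as mentioned in the post; adjust to the modules you have
# credentials or configuration for.
FALLBACK_MODULES = ["screenscraper", "mobygames"]


def build_scrape_commands(platform, modules=FALLBACK_MODULES):
    """Return one Skyscraper argument list per scraping module (hypothetical helper)."""
    return [["Skyscraper", "-p", platform, "-s", module] for module in modules]


def build_gamelist_command(platform):
    """Return the argument list for the gamelist generation step.

    Running Skyscraper without -s builds the gamelist from cached data.
    """
    return ["Skyscraper", "-p", platform]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in build_scrape_commands("zxspectrum"):
        print(shlex.join(argv))
    print(shlex.join(build_gamelist_command("zxspectrum")))
```

To actually run the calls, each argument list could be passed to `subprocess.run(argv, check=True)`.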

        [1] https://worldofspectrum.org/using-the-api/software

          sith.lord.goz @Lolonois

          Thank you so much @Lolonois for digging into that and confirming what I saw too. Hopefully they will get it fixed up and eventually make it easier to integrate with Skyscraper. For now, yup, I can fall back to the other scrapers. Keep up the great work - cheers!

            Lolonois

            In the interim, you may also use this Python script to scrape the majority of the information from the worldofspectrum site:

            https://gist.github.com/Gemba/c323e8036b921a1aa2fb927bb4958928

            Usage: See comments in header of script.

              sith.lord.goz @Lolonois

              @Lolonois This script is great thanks, definitely good for filling in the ~600 gaps in the 3,800 total ZX games I scraped from screenscraper. I think maybe the other dependency is python3-bs4? In the header, python3-requests is listed twice.

              Thanks again
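A quick way to check which dependencies are missing before running a script like this is to test whether the modules can be imported. The sketch below assumes the two Debian packages named in this post (`python3-requests` and `python3-bs4`) and the modules they provide; it is not taken from the gist itself, and the function name is made up for illustration.

```python
import importlib.util

# Debian / Raspberry Pi OS package name -> Python module it provides.
# The package list is an assumption based on this post, not on the gist.
PACKAGES = {
    "python3-requests": "requests",
    "python3-bs4": "bs4",
}


def missing_packages(packages=PACKAGES):
    """Return the package names whose module cannot be found (hypothetical helper)."""
    return [
        package
        for package, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, install with: sudo apt install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```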

              UPDATE: it's interesting that the results on the WoS archive page https://worldofspectrum.org/archive/software/games do not match the Infoseek results... at least not for the test search string "Force, The". This term returns "The Force" game entries in the archive, but not in Infoseek. Could be the comma, or the way it treats spaces? Just something I noticed.
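When one search route wants "Force, The" and another wants "The Force", a client can simply try both spellings. The sketch below generates the alternative form of a title with a leading or trailing article. It only illustrates the idea and makes no claim about how the WoS archive or Infoseek actually process search terms; the function name is made up for this example.

```python
import re

# Articles considered when flipping a title; English only, for illustration.
_TRAILING = re.compile(r"(.+?),\s*(The|A|An)", re.IGNORECASE)
_LEADING = re.compile(r"(The|A|An)\s+(.+)", re.IGNORECASE)


def search_variants(title):
    """Return the title plus its article-flipped variant, if there is one."""
    variants = [title]
    trailing = _TRAILING.fullmatch(title)
    if trailing:
        # "Force, The" -> "The Force"
        variants.append(f"{trailing.group(2)} {trailing.group(1)}")
    else:
        leading = _LEADING.fullmatch(title)
        if leading:
            # "The Force" -> "Force, The"
            variants.append(f"{leading.group(2)}, {leading.group(1)}")
    return variants


if __name__ == "__main__":
    for name in ("Force, The", "The Force", "Ant Attack"):
        print(name, "->", search_variants(name))
```

Each variant could then be tried in turn until one search route returns a match.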

                Lolonois @sith.lord.goz

                @sith-lord-goz Thanks for the notes. I have updated the gist.

                The "Force, The" thing: most likely there is some logic between the browser search and the point where the query hits the API or DB. Things are not final on their side, and it is obviously not yet guaranteed that the same input yields the same output regardless of which search route is used.

                  sith.lord.goz @Lolonois

                  Sorry if this is a little off-topic for this thread - I can start a new one if that's better... but how would I know which scrapers support the --query param, and which values for that param I can/should use? I can see the readme examples for screenscraper (romnom, md5 etc.) but do some/all the other scrapers support it?

                  For instance, I see IGDB has an "IGDB ID" for each game - so in theory if I pick one (like https://www.igdb.com/games/3d-deathchase ) and I can't find a match in WoS... then would:

                  Skyscraper -p zxspectrum -s igdb --query="<IGDBID>" <path-to-this-rom>

                  do anything? (Or whatever format of the --query param should be for this scraper)

                    Lolonois

                    From memory, the IGDB scraper module does not honor the game ID yet.

                    By the way, did you try it with --query=<igdb-game-id>?

                    If a scraper supports search modes other than text, this is described in the documentation of the respective scraper module.

                      sith.lord.goz @Lolonois

                      Yeah, I tried that, and various combos of "IGDBID=", "id=" etc. No big deal, just curious, thanks.

                        Lolonois @sith.lord.goz

                        @sith-lord-goz You may try Skyscraper 3.16, which adds --query="id=<IGDB-id>". However, the game filename must have some similarity to the IGDB game title. The GameBase DB import from the SpeccyMania GameBase may also come in handy.
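As a rough illustration of the two points above, the sketch below assembles a Skyscraper call with the `id=` query and estimates how close a ROM filename is to a title. The similarity measure is Python's `difflib`, used here only as a stand-in: it is not Skyscraper's own matching logic, and the IGDB ID, ROM path and function names in the example are hypothetical placeholders.

```python
import difflib
from pathlib import Path


def build_igdb_id_command(platform, igdb_id, rom_path):
    """Return the argument list for an IGDB lookup by ID (hypothetical helper)."""
    return [
        "Skyscraper",
        "-p", platform,
        "-s", "igdb",
        f"--query=id={igdb_id}",
        str(rom_path),
    ]


def rough_similarity(rom_path, igdb_title):
    """Return a 0..1 score comparing the filename (without extension) to a title.

    This uses difflib as a stand-in and does not reproduce Skyscraper's matching.
    """
    stem = Path(rom_path).stem.lower()
    return difflib.SequenceMatcher(None, stem, igdb_title.lower()).ratio()


if __name__ == "__main__":
    rom = "roms/3D Deathchase.tzx"  # placeholder path
    print(build_igdb_id_command("zxspectrum", 12345, rom))  # 12345 is a made-up ID
    print(rough_similarity(rom, "3D Deathchase"))
```

If the score for a file is low, renaming the ROM closer to the IGDB title before scraping is one option to try.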
