
Versatile C++ game scraper: Skyscraper


  • Global Moderator

    @muldjord said in Versatile C++ game scraper: Skyscraper:

    I'm pretty sure it's not enabled by default unless @mitu changed that. It shouldn't be.

    Yes, it's not enabled by default - when the configuration is first generated, the refresh is disabled - https://github.com/RetroPie/RetroPie-Setup/blob/922149b09dfab04408e133114bcb2a97cf7ed87c/scriptmodules/supplementary/skyscraper.sh#L358.



  • @Clyde Yes, caching the data is extremely important for any scraper if you ask me. :) Have fun with it!

    @dudumaroja, see mitu's answer above. :)



  • Skyscraper 3.3.0 released: https://github.com/muldjord/skyscraper

    • MAJOR: File identification now uses new quick id method for up to 75% faster processing (Thank you to 'langest' for finally making me look into this)
    • Added 'pc98' platform (Thank you to 'leosmeira' for suggesting it)
    • Added 'pokemini' platform (Thank you to 'leosmeira' for suggesting it)
    • Renamed all 'sha1' file id's to 'id' as sha1 was misleading
    • Changed relevant defines to constexpr
    • ScreenScraper now always prioritizes the 'video-normalized' above 'video' (Thank you to 'JuanVCS' for suggesting this)
    • Fixed bug in ScreenScraper retry code which made it retry more than necessary

    The most prominent feature is the new quick id concept. Until now, Skyscraper has calculated the internal cache id's from the rom data every time you run it. That is a lot of work for files that rarely change, and it takes a long time, especially for larger roms such as n64 roms. So I've made a new system called quick id, which is basically an id cache that works seamlessly with the cache id system. Here's how it works:

    • Whenever a file is processed for anything that uses the cache id, the cache id is calculated using the "slow" method. At that time, the file path and file modified time are also saved in a quick id xml file. The next time this particular file needs processing, Skyscraper first compares the file path and modified time from the quick id list to the actual file. If the file is still in the same place and hasn't been modified since the last run, it looks up the cache id from the quick id list.

    So everything will be "as slow as usual" the first time you process any file, since it is then forced to use the "slow" id technique as it doesn't yet know the id. But for any subsequent processing of that particular file, you will see up to a 75% improvement in speed (the larger the roms, the speedier it will be). A great platform to test it on is the n64 platform. But remember, after updating, you won't see the speed improvement until you've processed the files one way or another once so the id's get cached in the quick id list.
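A minimal sketch of the quick id idea described above, written in Python rather than Skyscraper's own C++. Everything here is hypothetical: the names `slow_id` and `get_cache_id`, the in-memory dict standing in for the quick id xml file, and the use of SHA-256 as the "slow" id calculation are assumptions for illustration, not Skyscraper's actual code.

```python
import hashlib
import os
import tempfile


def slow_id(path):
    """Hash the whole file; a stand-in for the 'slow' cache id calculation."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_cache_id(path, quick_ids):
    """Return (cache_id, used_quick_id) and record the id on a quick id miss."""
    mtime = os.path.getmtime(path)
    entry = quick_ids.get(path)
    # Same path and unchanged modified time: reuse the stored id.
    if entry is not None and entry["mtime"] == mtime:
        return entry["id"], True
    # Unknown or modified file: fall back to the slow method and remember it.
    cache_id = slow_id(path)
    quick_ids[path] = {"mtime": mtime, "id": cache_id}
    return cache_id, False


def demo():
    quick_ids = {}  # hypothetical stand-in for the quick id xml file
    with tempfile.TemporaryDirectory() as folder:
        rom = os.path.join(folder, "game.z64")
        with open(rom, "wb") as handle:
            handle.write(b"\x00" * 1024)
        first = get_cache_id(rom, quick_ids)   # slow: id not known yet
        second = get_cache_id(rom, quick_ids)  # quick: path and mtime match
        os.utime(rom, (0, 12345))              # simulate a modified file
        third = get_cache_id(rom, quick_ids)   # slow again: mtime changed
    return first, second, third
```

Calling `demo()` processes the same file three times: the first and third calls take the slow path, and only the second is served from the quick id list.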

    A great way to test this is to update to 3.3.0, then regenerate the game list for any platform with Skyscraper -p <PLATFORM> and take note of how long it took. When that's done, do it again and note the time again. The second run should take significantly less time, as it can now use the quick id feature.
    The quick id list can be vacuumed together with the cache using the --cache vacuum option, to avoid lingering quick id's if you move files around or rename them.

    A lot of code has changed in this version. I've also fixed some silly bugs in the ScreenScraper module which made it retry more times than necessary. If you encounter any problems with any of the above, please let me know. As always I've been testing this quite a bit on my own systems, but it's impossible for me to test every use case.



  • @muldjord said in Versatile C++ game scraper: Skyscraper:

    • ScreenScraper now always prioritizes the 'video-normalized' above 'video'

    If I wanted to purge all videos from the cache to re-scrape their normalized versions, would this be the right command? (run from wherever the Skyscraper executable is located)

    Skyscraper -p PLATFORM --cache purge:video
    


  • @Clyde Almost. You need to run Skyscraper -p PLATFORM --cache purge:t=video,m=screenscraper to purge all videos from ScreenScraper. But you can just rescrape it with --refresh. Then it will overwrite the old videos automatically.



  • @muldjord Thank you. But wouldn't --refresh re-scrape all resources instead of only the videos? Although the videos are the biggest resource, I would like to avoid re-scraping every other resource as well.



  • @Clyde Long story short, that's not how it works. It always downloads "everything or nothing" due to my current implementation of the cache and how the online databases work.



  • @muldjord That's what I feared after reading the docs, thanks again. One last question (I hope): If I purge the videos only, then only they would be re-scraped in a normal scraping run without --refresh, right?

    If so, I'd rather use that, since I would very much like to have the normalized videos, but with as little stress to the SS servers as possible. I'm already donating 5€/month to them, but I don't see that as a free ticket to do whatever I like. 😇

    If it ever happens that you don't know what else to do on Skyscraper, more control over refreshing the cache would be nice (if the online servers allow it). 😊



  • @Clyde said in Versatile C++ game scraper: Skyscraper:

    @muldjord That's what I feared after reading the docs, thanks again. One last question (I hope): If I purge the videos only, then only they would be re-scraped in a normal scraping run without --refresh, right?

    No. ;) Then it will just fetch the game from the cache without the video. Entries are only ever redownloaded from the online sources if --refresh is set. Otherwise it fetches what it has from the cache.

    If it ever happens that you don't know what else to do on Skyscraper, more control over refreshing the cache would be nice (if the online servers allow it). 😊

    I am painfully aware :D But it's not a small change as my current implementation doesn't support this. I have ideas for it, and it might happen at some point, but don't hold your breath for it. :) Maybe for 4.0.0 at some point down the line.
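The cache behaviour described in this exchange can be sketched roughly as follows. This is a simplified Python model, not Skyscraper's implementation: `scrape_game`, `fetch_all_online` and the dict-based cache are hypothetical names, and fetching a game that is not in the cache at all is an assumption added to make the sketch complete.

```python
def scrape_game(game_id, cache, fetch_all_online, refresh=False):
    """Return the entry for game_id, going online only when needed."""
    # A cached entry is used as-is unless a refresh is requested,
    # even if some of its resources (such as the video) were purged.
    if game_id in cache and not refresh:
        return dict(cache[game_id])
    # "Everything or nothing": a download always replaces the whole entry.
    entry = fetch_all_online(game_id)
    cache[game_id] = dict(entry)
    return entry


def demo():
    calls = []

    def fetch_all_online(game_id):
        calls.append(game_id)
        return {"title": "Example Game", "video": "video-normalized.mp4"}

    cache = {"example": {"title": "Example Game", "video": "video.mp4"}}
    del cache["example"]["video"]  # models purging only the video

    without_refresh = scrape_game("example", cache, fetch_all_online)
    calls_before_refresh = len(calls)
    with_refresh = scrape_game("example", cache, fetch_all_online, refresh=True)
    return without_refresh, calls_before_refresh, with_refresh, len(calls)
```

In this model, purging the video and scraping without a refresh returns the entry without a video and makes no online request; only the refresh run downloads the entry again, and it downloads all of it.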



  • Thanks a final time in this matter. I'll register this as one of the many things we'll have to live with in an imperfect world, then. :)



  • Skyscraper 3.3.1 released: https://github.com/muldjord/skyscraper

    • Added new '--cache edit:new=<TYPE>' option for efficient batch adding of a resource of a certain type
    • Improved ctrl+c handling
    • Protected cache write calls from ctrl+c obstruction

    If you've created reports with missing textual resources for certain types, it is now easy to add those resources using Skyscraper -p <PLATFORM> --cache edit:new=<TYPE> --fromfile <REPORTFILE>. This will only edit the roms from the report, and move directly through each of them asking for that particular resource type to be added. Much easier and faster than before.



  • @muldjord I just looked at your priorities.xml.example. The order of entries for each section seems rather random. Is it so for exemplary reasons, or what did you base the arrangements on? It doesn't always seem to follow your suggestions in https://github.com/muldjord/skyscraper/blob/master/docs/SCRAPINGMODULES.md.

    Also, this file doesn't seem to be installed with the Retropie build of Skyscraper. I had to get it directly from your Github repository. Is this intentional, and if so, why?

    Thanks in advance for any light shed on these.



  • @Clyde said in Versatile C++ game scraper: Skyscraper:

    @muldjord ...what did you base the arrangements on?

    It is based on a bunch of tests I did two and a half years ago, plus some adjustments I've made since then after adding new modules. Basically it is a subjective thing, and people are supposed to define it themselves. It works well for me as it is, but since it is a global default it will of course work better for some than for others. Keep in mind that if modules are missing from the file, it doesn't mean their data won't be used; it is then just decided by timestamp.
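One way to picture the priority and timestamp rules is the Python sketch below. It is an illustration under assumptions, not Skyscraper's code: `pick_resource` and the candidate dicts are hypothetical, and treating "decided by timestamp" as "the newest resource wins" is a guess, since the post does not say which direction is used.

```python
def pick_resource(candidates, priority_order):
    """Pick one cached resource from several scraping modules."""
    by_source = {candidate["source"]: candidate for candidate in candidates}
    # The first module in the priority list that has the resource wins.
    for source in priority_order:
        if source in by_source:
            return by_source[source]
    if not candidates:
        return None
    # No listed module has it: fall back to timestamp (assumed newest first).
    return max(candidates, key=lambda candidate: candidate["timestamp"])


def demo():
    candidates = [
        {"source": "screenscraper", "timestamp": 100, "value": "a"},
        {"source": "thegamesdb", "timestamp": 200, "value": "b"},
    ]
    return (
        pick_resource(candidates, ["thegamesdb", "screenscraper"])["value"],
        pick_resource(candidates, ["screenscraper"])["value"],
        pick_resource(candidates, [])["value"],
        pick_resource([], ["screenscraper"]),
    )
```

With an empty priority list the sketch still returns a resource, which mirrors the point that data from modules missing in the file is not discarded.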

    It doesn't always seem to follow your suggestions in https://github.com/muldjord/skyscraper/blob/master/docs/SCRAPINGMODULES.md.

    If you have suggestions for changes to the defaults I'll gladly hear about it. Just keep in mind that it has to work well for all platforms.

    Also, this file doesn't seem to be installed with the Retropie build of Skyscraper. I had to get it directly from your Github repository. Is this intentional, and if so, why?

    I thought it did. Maybe @mitu can shed some light on this. He's the script author.


  • Global Moderator

    @muldjord said in Versatile C++ game scraper: Skyscraper:

    I thought it did. Maybe @mitu can shed some light on this. He's the script author.

    It's not included because at that time it didn't exist (?), but since it should be copied to the cache folder and that folder might not exist during installation, I can maybe add it directly to the normal Skyscraper folder.



  • @mitu Skyscraper looks for it in the cache folder, and copies it to the platform subfolders inside the cache folder as a default. If it doesn't exist in the cache folder it just ignores it and only ever prioritizes by timestamp. That can cause some weird behaviour. The best result would be achieved if a cache folder was created upon installation and the priorities.xml.example file was copied there.

    EDIT: Alternatively I can make Skyscraper look for it in ~/.skyscraper directly as well and copy it from there. Let me know if creating a cache folder is an issue, then I will implement this.


  • Global Moderator

    @muldjord said in Versatile C++ game scraper: Skyscraper:

    EDIT: Alternatively I can make Skyscraper look for it in ~/.skyscraper directly as well and copy it from there. Let me know if creating a cache folder is an issue, then I will implement this.

    No need to modify the behaviour - I'll check the scriptmodule again and we'll find a way to add it.



  • Thank you both very much. I'm happy that I could contribute a tiny little bit by mentioning the missing file. :) I stumbled upon it when I wondered how skyscraper would choose which artwork it will use if multiple sources have been scraped, which I did for the first time last weekend. I happily noticed that Skyscraper will fill the slots from different sources in case of missing content from the preferred one.

    @muldjord Now that I know more thanks to you, I'll look into the priorities and decide if I want to change any of them. If I come up with anything I deem useful universally, I'll post it here.



  • @muldjord said in Versatile C++ game scraper: Skyscraper:

    Skyscraper looks for it in the cache folder, and copies it to the platform subfolders inside the cache folder as a default.

    When does it do that? And does it update the copies if the example file was changed?

    Thanks.



  • @Clyde said in Versatile C++ game scraper: Skyscraper:

    When does it do that?

    Whenever you run it and it doesn't already have an existing priorities.xml in the platform cache/<platform> folder it will look for cache/priorities.xml.example and copy it to the folder as priorities.xml to have a decent default to use.

    And does it update the copies if the example file was changed?

    No. If one already exists, it never copies it.
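The copy-if-missing rule from these two answers could look roughly like this in Python. The function name `ensure_priorities`, its return values and the placeholder file contents are hypothetical; only the file names and the "never overwrite an existing copy" behaviour come from the posts above.

```python
import os
import shutil
import tempfile


def ensure_priorities(cache_dir, platform):
    """Copy priorities.xml.example into cache/<platform> if none exists yet."""
    platform_dir = os.path.join(cache_dir, platform)
    target = os.path.join(platform_dir, "priorities.xml")
    example = os.path.join(cache_dir, "priorities.xml.example")
    if os.path.exists(target):
        return "kept"      # an existing file is never overwritten
    if not os.path.exists(example):
        return "missing"   # nothing to copy, so no priorities file is created
    os.makedirs(platform_dir, exist_ok=True)
    shutil.copyfile(example, target)
    return "copied"


def demo():
    results = []
    with tempfile.TemporaryDirectory() as cache_dir:
        example = os.path.join(cache_dir, "priorities.xml.example")
        target = os.path.join(cache_dir, "n64", "priorities.xml")
        results.append(ensure_priorities(cache_dir, "n64"))  # no example yet
        with open(example, "w") as handle:
            handle.write("example v1")
        results.append(ensure_priorities(cache_dir, "n64"))  # first copy
        with open(example, "w") as handle:
            handle.write("example v2")                       # example changes
        results.append(ensure_priorities(cache_dir, "n64"))  # copy is kept
        with open(target) as handle:
            content = handle.read()
    return results, content
```

After the example file changes, the platform copy still holds the old contents, so an updated example only reaches a platform if its priorities.xml is removed first.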



  • @muldjord Thanks!

    (Sometimes, I want to say that literally instead of "only" upvoting the helpful post. 😉 )


