
Versatile C++ game scraper: Skyscraper



  • Skyscraper 1.7.2 released: github.com/muldjord/skyscraper

    • Added 'uvlist' scraping module (http://www.uvlist.net)
    • Added rating resource and support
    • Added rating support to lemonamiga
    • Added rating support to lemon64
    • Added rating support to mobygames
    • Added rating support to uvlist

    Enjoy!



  • @muldjord said in Versatile C++ game scraper: Skyscraper:

    I think a first implementation will be to define an xml format that your snaps would need to be in. Then you can scrape from that. That would then automatically add it to the localdb. So when you scrape from 'localdb' afterwards, it would use those snaps as well as any other snaps you might have acquired.

    I would suggest something really simple as a first implementation: an "import to localdb" function that will take the scrape images from a local folder. The key, as you mentioned, should be: image file name = rom filename.



  • @UDb23
    Ok, so here's what I am thinking: I will create a folder called "~/.skyscraper/import". In this folder there will be a folder per supported import type. So there will be a folder called "snaps" and a folder called "boxart". If you place files inside these folders with the precise filename of one of the rom files you are going to scrape, and set the scraper to '-s import', it will then import these images into the localdb with the source of "import" or similar. Then, when you scrape using 'localdb' afterwards, it will use those files. So basically the import procedure becomes a scraper alongside 'mobygames' and others. The difference is that it scrapes using files from the "~/.skyscraper/import" folder.

    I think this is a pretty good solution. It's pretty straightforward to do this with snaps, boxart, videos and other media files. I'll need to figure out how to do it for other resources. But let me start with media files.

    How does that sound?
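
    As a rough illustration of the layout proposed above, here is a small Python sketch (Skyscraper itself is C++, so this is not its code). It assumes the folder names from this post, 'snaps' and 'boxart', and assumes the image takes the rom's base name while keeping its own suffix. The helper name `import_target` is hypothetical.

```python
from pathlib import Path

# Folder names taken from the post above; the names Skyscraper ends up
# using may differ.
IMPORT_TYPES = ("snaps", "boxart")


def import_target(import_root, kind, rom_path, image_path):
    """Return where an image would need to live so its name matches the rom.

    Hypothetical helper: the base name comes from the rom file, the
    suffix comes from the image file.
    """
    if kind not in IMPORT_TYPES:
        raise ValueError(f"unsupported import type: {kind}")
    rom_stem = Path(rom_path).stem          # file name without its last suffix
    image_suffix = Path(image_path).suffix  # e.g. ".png"
    return Path(import_root) / kind / (rom_stem + image_suffix)


if __name__ == "__main__":
    root = Path.home() / ".skyscraper" / "import"
    print(import_target(root, "snaps", "Super Game (USA).nes", "shot01.png"))
```

    Copying the image to the returned path (for example with `shutil.copy`) would stage it for a later '-s import' run.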



  • @UDb23
    Aaaand here it is:

    Skyscraper 1.7.3 released: https://github.com/muldjord/skyscraper

    • Added 'import' scraper, scraping from resources located in '[homedir]/.skyscraper/import' folder
    • Added 'developer' support for 'uvlist' scraper
    • Improved html unescaping a lot
    • Cleaned up xml escaping

    The 'import' release. So the big new thing in this update is that I now allow you to import your own 'snaps' and 'boxart' into the local database using the '-s import' scraper. It's a local scraper that scrapes from '[homedir]/.skyscraper/import' resources. Place your images in the folders within with the EXACT name of any rom you will be scraping, and it will import that image into the local database. Then, afterwards, scrape with the '-s localdb' scraper to make use of the artwork in your frontend of choice. (Read more under "Local data import" in the Github readme)

    I will expand the import functionality over time to also allow importing of textual data. But for now, snaps and boxart seemed like the most important ones.
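
    Since the names have to match exactly, it can help to check which roms still lack an image before running the import. The following Python sketch is only an illustration, not part of Skyscraper. It assumes names are compared without their file suffix (as clarified later in this thread), and the function name `missing_imports` is made up for this example.

```python
from pathlib import Path


def missing_imports(rom_dir, import_dir):
    """List rom files that have no same-named image in import_dir.

    Names are compared without their suffix, and the comparison is
    case-sensitive because the match has to be exact.
    """
    rom_dir = Path(rom_dir).expanduser()
    import_dir = Path(import_dir).expanduser()
    image_stems = {p.stem for p in import_dir.iterdir() if p.is_file()}
    return sorted(
        rom.name
        for rom in rom_dir.iterdir()
        if rom.is_file() and rom.stem not in image_stems
    )
```

    For example, `missing_imports("~/RetroPie/roms/nes", "~/.skyscraper/import/boxart")` would return the rom file names that have no boxart image to import yet. The rom path here is just an example location.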

    Also, I realize that Skyscraper now has a bunch of very abstract features that might seem a bit confusing, so I plan to make some videos explaining how to properly make use of the powerful features Skyscraper now provides. Simply scraping using one scraping module is just the basic usage. You can get much, MUCH better results very simply by also making use of the '-s localdb' scraper AFTER you've scraped with a bunch of web sources. They'll all be stitched together to form near-100% perfect results for the frontend you choose.

    Anyways, more on this when I get the time to make those videos. :D

    That's all for now, enjoy! :)



  • @muldjord Great news ! Thanks .
    This week I should be able to dedicate time to test both VIC20 scraping and local image import functionalities.



  • @UDb23 Awesome! I'll probably expand it further during the coming days. I have some nifty ideas that will also allow for textual imports using a format defined in a definition file the user creates. More on this later. :)



  • Hi,

    First of all, I must say thank you for this tool. It works great.
    I have a question about the localdb. If I scrape a system, it creates a folder with screenshots and covers in the dbs folder. Now let's say that after the first run I end up with a few games missing either box art or a screenshot.
    How can I update the games which are missing something? Can you give some command line examples of how to do that?



  • @AnalogHero
    Thank you, glad you find it useful!

    It is quite simple. There are two options for getting the remaining screenshots and covers.

    1. Scrape with more of the webscrapers, for instance 'Skyscraper -p [platform] -s mobygames' or 'Skyscraper -p [platform] -s screenscraper'. Then rescrape with 'Skyscraper -p [platform] -s localdb' afterwards. Each time you scrape with a web scraper, all of the data is cached. So the more web scrapers you've scraped a given platform with, the more complete results you'll get when you rescrape with 'localdb' afterwards.

    2. The second option is to import your own covers and screenshots. You do this by placing your files in the '[homedir]/.skyscraper/import' subfolders. Then run "Skyscraper -p [platform] -s import". As long as the filenames of your image files match EXACTLY the rom filenames (without the suffix of .nes or similar of course), it will then import the images you provided into the localdb. Then rescrape with 'Skyscraper -p [platform] -s localdb' and it will then use the images you've imported.

    Also, be sure to read the entire readme on github. It is all explained in more detail there. I plan to do some easy-to-understand videos when I get the time and motivation (I don't really like making videos, I'm more of a coder-dude).

    Hope this made sense. :)
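
    The order described above (any number of web scrapers first, optionally the 'import' scraper, then 'localdb' last) can be sketched as a small Python helper that only builds the command lines. It uses nothing beyond the '-p' and '-s' options quoted in this post. The function name `scrape_plan` and the platform name 'nes' are assumptions for the sake of the example.

```python
import shlex


def scrape_plan(platform, web_modules, use_import=False):
    """Build the command lines for the 'web scrapers first, localdb last' order."""
    modules = list(web_modules)
    if use_import:
        modules.append("import")   # local files from the import folder
    modules.append("localdb")      # always last: combines everything cached so far
    return [["Skyscraper", "-p", platform, "-s", module] for module in modules]


if __name__ == "__main__":
    for argv in scrape_plan("nes", ["mobygames", "screenscraper"]):
        print(shlex.join(argv))
```

    Each returned list could be handed to `subprocess.run` one after the other. The sketch deliberately stops at printing so nothing is scraped by accident.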



  • Ahh, I see. I wasn't sure how the cached data would be handled after scraping with more than one web source. Thanks for your help.

    Edit: Wow. This works amazingly well. I was using sselph's scraper, but I liked the mixed screenshot/boxart/logo from the screenscraper tool. But screenscraper is Windows and the pictures are a little too small when using the nes mini theme from @ruckage .
    Now this is a perfect solution for me.

    If I had one wish free: a small gui would be nice (with info on what's in your localdb, stats about your scraped games, and what's missing).



  • @AnalogHero A gui is something I'd really like to have. I am an experienced gui programmer as well, so I certainly have the skills for it. I just need to get the command line version to a level where I feel I can leave it for a while. Then I might look into a gui. No promises, but it's certainly something I'd like to do.



  • @muldjord Maybe I found a bug. I can't scrape mastersystem; it says no entries to scrape although I have games in there for sure. Paths seem to be correct.

    Also, it looks like gamesdatabase has blocked me. Does anyone know if they will ever unblock me again?



  • @AnalogHero I got unblocked from gamesdatabase again, but never used it since. I took the hint :)

    If it says no entries to scrape, maybe you chose to skip existing entries? Or maybe your rom suffixes aren't supported. Where are your files located and what suffix do they have? 'mastersystem' only supports '.sms' according to the official RetroPie wiki, so that's what Skyscraper looks for.

    If you're using zipped rom files I suggest unzipping them, even if the emulator supports zipped files. There are a couple of disadvantages to zipped roms. The major one is that many pieces of software use the sha1 checksum of the rom data to recognize the rom. If you use a zipped rom, this feature is completely negated since a zipped rom can vary in size and content. It is always best to use the raw rom files in my opinion. But it is just that, an opinion. :) I merely wanted to point out why I feel zipped roms don't make much sense from a frontend and scraping perspective.
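
    To illustrate why a folder full of '.zip' files can end up as "no entries to scrape", here is a minimal Python sketch of suffix filtering. It is not Skyscraper's actual code. The table only contains the one pairing stated in this post ('mastersystem' and '.sms'), and the names `SUPPORTED_SUFFIXES` and `scrapeable` are invented for the example.

```python
from pathlib import Path

# Illustrative only: the post states that 'mastersystem' looks for '.sms'.
# The real list lives inside Skyscraper and may change between versions.
SUPPORTED_SUFFIXES = {"mastersystem": {".sms"}}


def scrapeable(platform, filenames):
    """Return the file names whose suffix is accepted for the platform."""
    allowed = SUPPORTED_SUFFIXES[platform]
    return [name for name in filenames if Path(name).suffix in allowed]


if __name__ == "__main__":
    # A folder containing only zipped roms leaves nothing to scrape.
    print(scrapeable("mastersystem", ["Game A.zip", "Game B.zip"]))
```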



  • @muldjord Yes, they are zipped. Sure, I can extract them. I scraped my other systems, which are also zipped, without any issues.



  • @AnalogHero
    Yes, I allow zipped files for the ones where the official RetroPie documentation says that they are supported. But my points stand: in my opinion it is not a good idea to ever use zipped roms unless they are dos games or similar. If they are simply one or more .nes, .sms, .gg or other rom files zipped up, it just brings a lot of disadvantages. You even lose space in some cases, because zipped roms often contain several versions of the rom inside, where a bunch of those are broken or not for the region you want. So if people instead unzip just the one they need, they would even save a lot of space on the Pi.

    So yeah, I have no idea why people use them zipped. I'm guessing it's because that's how they downloaded them and they don't know about the disadvantages. :)



  • @muldjord I just unzipped them. It doesn't make a lot of difference in file size, even with full collections, since I use a cleaned romset with one rom per game.



  • @analoghero Awesome!



  • @incunabula said in Versatile C++ game scraper: Skyscraper:

    Is there a way you can add local folders as a scraping option? For instance, if I already had a folder of boxart and it has some images that the scraper modules are not returning, could it be possible to "scrape" images from a default path like %roms%/boxart ?

    I found a minute to check in on things, so I can't really talk much, but I did want to reply to this question.

    For the NES/FDS I have a great deal of folders separating roms between Licensed, Unlicensed, Translations, Pirates, Prototypes, Hacks, etc...

    I spent a few days making great images for all of these folders. I have some examples in the other thread, but you'd have to go back a bit to see them since that was a few months back now. I think some of the coolest stuff is when there are multiple nationalities for a category and I put a backdrop of that country's flag behind the image of the particular system so when you're scrolling through the list of folders the console image remains the same but the flag behind it changes. :)

    There are synopsis entries for each of these folders as well, so when I run meleu's script it will pick up these images and display them. They've already been tested and there are no errors for any of the folders or games.

    Unfortunately, this is just for the NES and FDS at the moment. It might be YEARS before I get around to finishing this for all the major systems since I'm going to have to find a job soon. :(

    Anyway.... gotta run. Just wanted to give you that update.

    Things are looking to be pretty busy for me up until maybe through September. I will still be working on this stuff in my down time, but I won't be online much at all until things calm down.

    Later

    P.S. If you have any questions about how these images are stored or made, or how meleu's script uses the synopsis entries with them, just leave me a reply and I'll respond when I get back and see it.



  • @muldjord said in Versatile C++ game scraper: Skyscraper:

    Yes, I allow zipped files for the ones where the official RetroPie documentation says that they are supported. But my points stand: in my opinion it is not a good idea to ever use zipped roms unless they are dos games or similar. If they are simply one or more .nes, .sms, .gg or other rom files zipped up, it just brings a lot of disadvantages. You even lose space in some cases, because zipped roms often contain several versions of the rom inside, where a bunch of those are broken or not for the region you want. So if people instead unzip just the one they need, they would even save a lot of space on the Pi.

    What disadvantages do you see when you have a particular rom you intended to be in the zip, and only that rom?

    I've been zipping roms for about 10 years since I started work on the XBox and I've never noticed any negative effects. I also torrentzip them, which has its own benefits you can't get without zipping.

    The only exception to this rule I've seen so far is on the RetroPie for the Atari5200/800 and the Odyssey 2 systems since they both currently use emulators that don't support zip files.

    The size savings might not be all that much when you're talking about NES and Atari games, but when you start working with SNES, GBA and N64 games you can actually save a great deal of space by zipping your roms. (Especially when you have every licensed game that's in English, and often double or triple that library with all the other categories of roms.)

    Just curious what disadvantages you see this bringing other than the one you mentioned when you have multiple versions of a game in the same zip file.



  • @Used2BeRX I mentioned the checksum problem and multiple roms in a zip problem. If you zip a rom, depending on what zip encoding you use, the rom will differ in size. That makes it very hard to do meaningful sha1 checksumming for use when scraping or in general identifying which rom you are looking at. Especially when caching data locally for scraping.

    For instance, many roms have a [!] version which is considered a "good dump". This means the rom is pretty much perfect. That one version is THE version of that game for that region. That's just ONE file. Now, someone decides to zip it. They zip it using X software. It suddenly becomes a different file when you look at the contents. They create a nice rom pack with this zip in it. Someone else downloads it. Some other dude does the same thing, but with a different zip encoding. This file also gets shared. Now you have THREE versions of the same rom file. You can probably see where I'm going with this.

    For identifying roms, this is a bit of a problem because I can't cache data for X rom and assume the same cached data will be used for the same game when I give my locally cached data to someone else. Because he might have zipped his file! So when he scrapes the exact same rom, because his was zipped one way or another, it won't be recognized.

    People can do what they want, I don't personally care. It just messes up a bunch of stuff for scrapers who want to not hit the web sources so hard and try to cache things to prevent that.
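
    The checksum point can be reproduced with a short Python sketch. It zips the same stand-in "rom" bytes twice with two different compression methods and compares sha1 checksums. The data and the file name 'game.sms' are made up, and this only demonstrates the general effect, not how Skyscraper computes its checksums.

```python
import hashlib
import io
import zipfile


def sha1(data):
    """Return the sha1 checksum of a bytes object as a hex string."""
    return hashlib.sha1(data).hexdigest()


def zip_bytes(name, data, compression):
    """Zip a single file in memory with the given compression method."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Fixed timestamp, so the compression method is the only difference.
        info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
        info.compress_type = compression
        archive.writestr(info, data)
    return buffer.getvalue()


rom = b"pretend rom data " * 1000  # stand-in for a real rom dump

stored = zip_bytes("game.sms", rom, zipfile.ZIP_STORED)
deflated = zip_bytes("game.sms", rom, zipfile.ZIP_DEFLATED)

print("raw rom      :", sha1(rom))
print("zip, stored  :", sha1(stored))
print("zip, deflated:", sha1(deflated))

# Hashing the extracted member gives the raw rom's checksum again.
with zipfile.ZipFile(io.BytesIO(deflated)) as archive:
    print("extracted    :", sha1(archive.read("game.sms")))
```

    The three files all carry the same rom, yet the first three checksums differ from each other. Only the checksum of the extracted data matches the raw rom, which is why a checksum taken over the zip file itself is not a reliable key for cached data.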



  • Skyscraper 1.7.4 released: https://github.com/muldjord/skyscraper

    • Added textual import with 'import' scraper using '[homedir]/.skyscraper/import/definitions.dat' file
    • Added video import with 'import' scraper
    • Improved 'uvlist' description scraping
    • Now properly handles empty nodes in EmulationStation gamelist.xml export

    Be sure to read the readme thoroughly. Everything is explained in there. Enjoy!


