    Versatile C++ game scraper: Skyscraper

    • S
      SammyBoy
      last edited by

      @chipsnblip ok I'll stop spamming stuff on this thread because I know that he works hard on this stuff. And @muldjord sorry for being so annoying.

      • A
        aphyx
        last edited by

        @muldjord First of all, love what you’re doing with Skyscraper. It’s a cracking scraper – fast, efficient, and the Simple mode works very well. Keep up the good work!

        I have a question which I’ve not been able to find the answer to after much searching and experimenting:

        The gamelist.xml file produced by Skyscraper populates the <image> node with screenshots, whereas my theme (Art book) requires the boxart / cover in the <image> node (it ignores the <cover> node entirely). I’m keen to keep all the image types in the db, so I don’t want to simply copy the covers over the screenshots, and try as I may, I cannot get the artwork.xml to produce the required <image> node. The other scrapers I use put the boxart / cover in the <image> node, but I’m keen to use Skyscraper as my default from now on.

        I’m assuming that the nodes and their scraped content are hard-coded by Skyscraper, but is there any way I can edit the config / src files to get it to populate the <image> node with the cover art? I tried editing the code in emulationstation.cpp to do just that but it doesn’t seem to have worked. Have I got a config setting wrong somewhere?

        My workaround for now is a series of find / replace tasks in a text editor, but I’d love to be able to get this excellent scraper to output the perfect xml file for my theme.
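The find / replace workaround described above could be scripted. Below is a minimal Python sketch, not part of Skyscraper, that copies each game's <cover> path into its <image> node. It assumes the gamelist contains <game> entries with both <image> and <cover> children (whether Skyscraper exports <cover> is not confirmed in this thread), and the sample paths are made up for illustration.

```python
import xml.etree.ElementTree as ET


def use_cover_as_image(gamelist_xml: str) -> str:
    """Return gamelist XML in which each <image> holds the <cover> path."""
    root = ET.fromstring(gamelist_xml)
    for game in root.iter("game"):
        cover = game.find("cover")
        if cover is None or not (cover.text or "").strip():
            continue  # no cover for this game: leave <image> untouched
        image = game.find("image")
        if image is None:
            image = ET.SubElement(game, "image")
        image.text = cover.text
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample entry; real paths depend on your setup.
sample = """<gameList>
  <game>
    <path>./Monopoly (U).gba</path>
    <image>./media/screenshots/Monopoly (U).png</image>
    <cover>./media/covers/Monopoly (U).png</cover>
  </game>
</gameList>"""

print(use_cover_as_image(sample))
```

The original screenshot files stay on disk; only the path written to <image> changes, so the other image types are kept.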

        • muldjordM
          muldjord @aphyx
          last edited by muldjord

          @aphyx Check the artwork.xml documentation. You can add an <artwork type="screenshot"> node that has the <layer type="cover"> inside. It will then export the cover as the screenshot (which populates the <image> node as you request).

          Concerning the <cover> node, I can't remember if I export that currently. I think I do, but I am on vacation at the moment, so I'll have to check when I get back home in a week's time. :)

          • muldjordM
            muldjord
            last edited by muldjord

            So, I'm currently working on the option of providing filename(s) on the command line. My current approach is this: if one or more filenames are provided on the command line, it will scrape those using the platform provided with '-p'. It will NOT alter the game list you have, and it will NOT process artwork files. It WILL cache or update the local data and media for the games provided, meaning that a later scraping with '-s localdb' will make use of the data.

            The usefulness, as I see it, is that it would allow you to experiment with changing the filename for a better result from the scraping. And when you've found one that works, you can scrape everything again.

            Would this be useful to you guys? Comments? Suggestions? It's been requested quite a few times, so I'm very interested in feedback on this.

            • muldjordM
              muldjord
              last edited by

              Skyscraper 2.2.6 released: https://github.com/muldjord/skyscraper

              • Now always caches resources locally, even if pretend is set
              • Optimized 'simple mode' generated script. Now has '--pretend' set for all non-local modules to avoid artwork processing on those runs. This is a lot faster and provides the same result
              • Added the possibility to supply one or more filenames on the command line - it will then ONLY scrape those particular files. Platform still has to be set with '-p' for this to work
              • Fixed bug where [tags] would be appended twice when using '--forcefilename'

              The much requested feature of providing filenames on command line is implemented in this release. When used, it will scrape those files exclusively and cache the resulting data in the local database cache. To make use of the data afterwards you need to rescrape the entire platform using '-s localdb'. Please give it a go and let me know what you think.
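The two-step workflow described above (scrape selected files into the cache, then rebuild from the cache) can be sketched as a small Python helper that only assembles the command lines. The '-p' and '-s localdb' flags come from this thread; the position of the filenames after the other options, the scraping module name and the ROM filename in the example are assumptions, so check them against the Skyscraper documentation before running anything.

```python
import shlex


def build_commands(platform, module, rom_files):
    """Build the two Skyscraper invocations as argument lists (nothing is executed)."""
    # Step 1: scrape only the given files; results are cached locally.
    # Assumption: filenames are simply appended after the options.
    scrape_selected = ["Skyscraper", "-p", platform, "-s", module, *rom_files]
    # Step 2: rescrape the whole platform from the local cache.
    apply_cache = ["Skyscraper", "-p", platform, "-s", "localdb"]
    return scrape_selected, apply_cache


# "screenscraper" and the ROM name are placeholders; use your own module and files.
first, second = build_commands("gba", "screenscraper", ["Monopoly (U).gba"])
print(shlex.join(first))
print(shlex.join(second))
```

Printing with shlex.join shows the commands with shell quoting applied, which matters for filenames containing spaces or parentheses.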

              Happy scraping! :)

              • S
                SammyBoy
                last edited by SammyBoy

                @muldjord Whatever I do, I just can't get the textual data to scrape. I've created the definitions.dat file and made the .txt file named after the ROM's base name. What am I doing wrong? These are the files that I have made:

                Description: ###DESCRIPTION###
                Developer : ###DEVELOPER###
                Publisher : ###PUBLISHER###
                Rating : ###RATING###
                Genre : ###TAGS###

                Description: Being a real-estate magnate used to be hard work. Thanks to technology, the computer does all the hard stuff, like rolling dice, for you. Freed of the anxiety of arguing over leaners, you have plenty of time to strategize your next financial move. Up to four players, or you and three computer opponents, take turns around the board. A player wins when the others go belly up. You can also play a quick game mode or a time limit game. In any of the game modes, you can institute your own house rules. Some of those could include awarding a person for landing on free parking, dealing out some properties at random to begin the game, or changing the number of properties you have to own in a group before you can build houses. The easy-to-use interface makes it easy to manage your finances. When you have multiple players, the games can take a long time to finish, but that's the nature of Monopoly.
                Developer : Takara
                Publisher : Destination Software Inc
                Rating : 3.5
                Genre : Strategy

                The name of the file is Monopoly (U).txt
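To illustrate how a template with ###TOKEN### placeholders can line up with a data file like the one above, here is a minimal Python sketch. It is not Skyscraper's actual import parser, only one possible way to do the matching; the function names are made up, and the description text is shortened from the post.

```python
import re

TOKEN = re.compile(r"###([A-Z]+)###")


def template_to_regex(template):
    """Turn a ###TOKEN### template into a regex with one named group per token."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text between tokens
        parts.append(f"(?P<{m.group(1).lower()}>.*?)")     # captured field value
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts) + r"\s*\Z", re.DOTALL)


def parse_textual(template, text):
    """Return a dict of field values, or an empty dict if the text does not fit the template."""
    match = template_to_regex(template.strip()).match(text.strip())
    return match.groupdict() if match else {}


definitions = """Description: ###DESCRIPTION###
Developer : ###DEVELOPER###
Publisher : ###PUBLISHER###
Rating : ###RATING###
Genre : ###TAGS###"""

game_file = """Description: Being a real-estate magnate used to be hard work.
Developer : Takara
Publisher : Destination Software Inc
Rating : 3.5
Genre : Strategy"""

print(parse_textual(definitions, game_file))
```

With this approach the literal text around each token (including spacing such as "Developer : ") has to match exactly, so a stray space or a differently spelled label makes the whole match fail.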

                • muldjordM
                  muldjord @SammyBoy
                  last edited by muldjord

                  EDIT: Never mind, I read your post wrong. It looks OK to me, so I'm not sure what is wrong. I recommend trying an XML-based format instead.

                  • S
                    SammyBoy @muldjord
                    last edited by

                    @muldjord I tried that and got the exact same result (I tried it first).

                    • S
                      SammyBoy
                      last edited by

                      @muldjord what should I do?

                      • muldjordM
                        muldjord
                        last edited by

                        All I can say is that if you've followed the documentation completely, it WILL work. I'm assuming you might have something wrong with where you've placed the files, what you've named them or what command line you run Skyscraper with. And since I don't know any of these things, I can't tell you what to do.

                        • S
                          SammyBoy
                          last edited by

                          @muldjord I sent pictures, and the command I use is Skyscraper -p gba -s import

                          • S
                            SammyBoy
                            last edited by

                            Visit my Twitter Take a look at Sammy boy (@samsaju04_boy): https://twitter.com/samsaju04_boy?s=09

                            • S
                              SammyBoy
                              last edited by

                              @muldjord the pictures are on Twitter - the first thing you see

                              • muldjordM
                                muldjord
                                last edited by muldjord

                                Please see next post.

                                • muldjordM
                                  muldjord @muldjord
                                  last edited by muldjord

                                  Please try rerunning it with 'Skyscraper -p gba -s import --updatedb'. That might be the problem.

                                  • muldjordM
                                    muldjord
                                    last edited by

                                    Skyscraper 2.2.6a released: https://github.com/muldjord/skyscraper

                                    • Now always sets '--updatedb' when using 'import' scraping module

                                    Get this release, then it'll work without '--updatedb'

                                    • S
                                      SammyBoy @muldjord
                                      last edited by

                                      @muldjord Thank you so much!

                                      • juraJ
                                        jura
                                        last edited by

                                        Hi @muldjord, I used your software to scrape images for my ROMs and I loved it.

                                        It seems to work almost OK, but sometimes, when game names are nearly the same (e.g. gameName 1 and gameName 2), it mixes up the descriptions for these games. There's a chance the descriptions are already mixed up at the source it gets them from, dunno.

                                        When I checked Lakka, another retro gaming frontend, it seems they get images and thumbnails from libretro. Dunno if that is based only on the name of the game.

                                        Any chance of also using the location Lakka uses for the game images? :)

                                        https://github.com/libretro/libretro-thumbnails/tree/master/


                                        • muldjordM
                                          muldjord @jura
                                          last edited by muldjord

                                          @jura It doesn't mess them up; rather, it sometimes gets faulty results from the sources, which is a side effect of the highly automated way Skyscraper works. For instance, if you have a relatively unknown game called "Star Blob" and a game called "Star Fox", it will quite likely get a correct hit for "Star Fox" since it's a well-known game. BUT it will probably also return the data for "Star Fox" when it searches for "Star Blob", since that's how the source search engines work. So it can look as if the previous game has "spilled over" into the next entry, when in fact it only happens because the names are closely related, in this example through the shared "Star" in the name.

                                          You can force it to reject these by changing the minimum match percentage with the '-m' flag on the command line. :) It defaults to 50%, which means that "Star Blob" and "Star Fox" give a positive result.
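To show why a 50% threshold lets "Star Fox" through for "Star Blob", here is a rough Python sketch. Skyscraper's real matching algorithm is not described in this thread, so the similarity measure (difflib's SequenceMatcher from the standard library) is a stand-in, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def match_percentage(wanted, found):
    """Similarity of two titles as a percentage (stand-in metric, not Skyscraper's)."""
    return 100.0 * SequenceMatcher(None, wanted.lower(), found.lower()).ratio()


def accept(wanted, found, min_match=50):
    """Accept a search result only if it reaches the minimum match percentage."""
    return match_percentage(wanted, found) >= min_match


print(accept("Star Blob", "Star Fox"))                # default threshold of 50
print(accept("Star Blob", "Star Fox", min_match=80))  # stricter threshold
```

With this stand-in metric the shared "Star " prefix alone is enough to clear 50%, while a stricter threshold such as 80 rejects the pair. The trade-off is that a high threshold will also reject legitimate hits whose titles differ slightly between sources.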

                                          I can't use a GitHub repository for scraping, simply because I do not have permission from GitHub to do so. GitHub is not meant to be used for game media scraping, and I highly doubt they would allow it if they knew people used it for that purpose.

                                          EDIT: On a side note, I plan on completely rewriting the way matches are achieved, so matches will improve in a future version. No ETA though.

                                          • muldjordM
                                            muldjord
                                            last edited by muldjord

                                            A huge thank you to @chipsnblip for spreading the word about Skyscraper on reddit. I notice these things, and I appreciate it a lot! :)
