Versatile C++ game scraper: Skyscraper
-
@muldjord Sentinel_v1.0.lha
I don't mean to keep bringing it up, but the same file works fine (as far as scraping goes) in FS-UAE Arcade (which also scrapes from OpenRetro.org).
-
@LineOf7s said in Versatile C++ game scraper: Skyscraper:
Sentinel_v1.0.lha
That's ok. It's the only way I can get it fixed if there's an error. It seems this particular file name doesn't have a UUID to search by, so Skyscraper does an ordinary file name search for it. That finds 2 results: one has pretty much only the title, and the other has all the data. And Skyscraper chooses the wrong one. I'll need to look into this.
-
@muldjord I haven't looked yet into exactly how it works (perhaps I should before I ask), but is this something the use of --query might be able to deal with?
-
@LineOf7s It's because I also include the "unpublished=1" entries when I search. FS-UAE probably doesn't. I include them since some of them are useful for demos and other unreleased / unfinished stuff (I had requests for including them). But I've just fixed this by changing how Skyscraper distinguishes between the two entries. I also found a minor bug in the whdLoadMap lookup, so this was pretty relevant. :) Anyways, it will be in 3.0.1 which I guess I could just release now.
-
Hooray! Something tangible you could tweak! :) I love it when that happens. Thanks for your efforts.
-
EDIT: Remember to use the --refresh or --cache refresh option when testing it. Otherwise it'll use the old cached "no screenshot" entry.
3.0.1 is out now. Please retest with the same file. It should work as expected now. Just keep in mind that this can happen for others. It all depends on whether they have two entries with the same name where one is unpublished. But as long as the unpublished one has a bracket note behind it, it should work for you (since it will choose the one without brackets which should be the complete result). Which is the case for the Sentinel release.
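For anyone curious, the rule described here (two entries with the same name, pick the one without a bracket note) could be sketched roughly like this. It is only an illustration in Python, not Skyscraper's actual C++ code; the function name pick_result, the regular expression and the example titles are all made up for the sketch.

```python
import re

# Hypothetical pattern for a trailing bracket note such as "[...]" or "(...)".
# This is an assumption for illustration, not taken from Skyscraper's source.
BRACKET_NOTE = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]\s*$")


def pick_result(query, titles):
    """Return the matching title, preferring one without a bracket note."""
    wanted = query.strip().lower()
    # Keep only titles whose base name (bracket note stripped) equals the query.
    same_name = [
        t for t in titles
        if BRACKET_NOTE.sub("", t).strip().lower() == wanted
    ]
    if not same_name:
        return None
    # False sorts before True, so a title without a bracket note wins.
    return min(same_name, key=lambda t: bool(BRACKET_NOTE.search(t)))


if __name__ == "__main__":
    # Made-up titles; the real search results may be labelled differently.
    print(pick_result("The Sentinel", ["The Sentinel [unpublished]", "The Sentinel"]))
```

If both entries carry a bracket note, this sketch simply returns the first one, so it would not help in that case.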
-
@muldjord 3.0.1 worked a treat. There may or may not have been others affected too, but I can confirm that The Sentinel now has cover art and a screenshot. Huzzah!
-
@muldjord @mitu I have some custom folders in my rom structure, but they all contain roms from scrapable categories…
For example, I have a dedicated "adult" folder with some of the fba adult content, so as far as scraping goes this is simply arcade. However, I cannot scrape them because recognition is done via the rom folder name, I guess… Is there a way to tell Skyscraper that I want to scrape "adult" as "arcade"?
Another example is "satview" for separate Satellaview roms. This would simply be scraped as "snes".
I am using the GUI for scraping and I am on RetroPie 4.
-
@robertvb83 said in Versatile C++ game scraper: Skyscraper:
is there a way to tell Skyscraper that i want to scrape "adult" as "arcade"?
I don't think so, right now. You can work around it by using a well-known folder/platform name (mame-libretro, mame-mame4all), but as a general feature, I don't think it works right now.
Maybe something can be added, similar to the existing aliasMap, but for platforms, i.e. nes-hacks, nes-homebrew for NES, cps2, cps3, cave for arcade, etc.
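To make the suggestion concrete, a platform alias lookup could look something like the sketch below. This is purely hypothetical: Skyscraper did not offer such a feature at the time of this thread, and the names PLATFORM_ALIASES and resolve_platform are invented for the example. The entries are just the folder names mentioned in this discussion.

```python
# Hypothetical folder-name-to-platform aliases (not an existing Skyscraper feature).
PLATFORM_ALIASES = {
    "adult": "arcade",
    "satview": "snes",
    "nes-hacks": "nes",
    "nes-homebrew": "nes",
    "cps2": "arcade",
    "cps3": "arcade",
    "cave": "arcade",
}


def resolve_platform(folder_name):
    """Map a rom folder name to the platform it should be scraped as."""
    key = folder_name.strip().lower()
    # Unknown folder names fall through unchanged, so "snes" stays "snes".
    return PLATFORM_ALIASES.get(key, key)


if __name__ == "__main__":
    for folder in ("adult", "satview", "snes"):
        print(folder, "->", resolve_platform(folder))
```

The gamelist and media would still be generated per folder; only the platform used for the scraping lookup would change.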
-
@robertvb83 Just add it to the fba folder instead. So it's inside of roms/fba/<FOLDER>
-
@muldjord Thanks for the hint... but this will not lead to a separate gamelist and also no separate media folders, right? So I have to copy the whole gamelist and change the paths manually?
-
@robertvb83 Correct. It will create one game list, but the subfolders can be entered and will show the game inside of them. You'll figure it out. Try playing around with it and see what you can come up with. I think that's easier than me trying to explain all the ways you could go about doing this.
-
I've been working behind the scenes with the good people of IGDB. This has resulted in Skyscraper becoming part of the partner program, which ups the request limit for the module to 50000 per month. They also let me know that they are working on improving the artwork situation for their database during 2019. This basically means that at some point we will be able to distinguish platforms for their screenshots and other artwork, which isn't possible at the moment (which is why all artwork is disabled for the module). So this is some pretty awesome news on all accounts.
-
Great! With 50k requests per month, it should probably be fairly safe to expose IGDB in the RetroPie-Setup script?
No biggie if not, just making an observation.
-
@Silent I'm thinking about it, but the problem here is just "letting loose". If someone decides to scrape 5000 roms on it we wouldn't be able to stop them and... Well, you can guess the rest. A couple of days and the requests would be gone.
It's a little easier to manage with thegamesdb, where the requests are per IP.
EDIT: But I do believe it would be worth trying it out and see how it goes. It will have to wait until the next release though, as I need to remove the 5 roms per run limit.
EDIT2: I'll up the limit to 250 roms per scraping run. This should be adequate for most users for a platform. If users have more, it will just quit on them.
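A quick back-of-the-envelope calculation shows why the per-run limit matters. The 50000 and 250 figures come from the posts above; the assumption of one request per rom is made up for illustration, and the real module may well use more.

```python
MONTHLY_REQUEST_LIMIT = 50_000  # partner program limit mentioned above
ROMS_PER_RUN_LIMIT = 250        # per-run cap from EDIT2 above
REQUESTS_PER_ROM = 1            # ASSUMPTION for illustration only


def runs_until_exhausted(roms_per_run,
                         limit=MONTHLY_REQUEST_LIMIT,
                         requests_per_rom=REQUESTS_PER_ROM):
    """How many scraping runs of this size fit into the shared monthly limit."""
    return limit // (roms_per_run * requests_per_rom)


if __name__ == "__main__":
    # Uncapped runs of 5000 roms versus runs capped at 250 roms.
    print(runs_until_exhausted(5000))
    print(runs_until_exhausted(ROMS_PER_RUN_LIMIT))
```

Under that assumption, ten uncapped 5000-rom runs would use up the whole month for everyone, while the 250-rom cap leaves room for 200 full runs. The cap limits how much a single run can take; it does not stop someone from running it repeatedly.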
-
@muldjord Let's say I have 100 roms that I want to scrape. I already have videos for 95 of those in the input folder. Gathering the local videos will fill the cache with those 95 videos.
How can I now search for videos for the remaining 5? If I run the scrape again via gathering from screenscraper or arcadedb, it will again download all the videos for 100 roms. This is not what I want; it would be nice to be able to "only fill the gaps". Especially for videos there is often no need for redundancy, but I would prefer a "fill the gaps" mode for other artwork too.
-
If I run the scrape again via gathering from screenscraper or arcadedb, it will again download all the videos for 100 roms.
No it won't. If the videos are already scraped, they are detected from the cache and not fetched again. You will notice that it is a lot faster to scrape a platform the second time around, since it will detect that all data is already in the cache and just show what data it has on each file (check the output that says "From cache: YES"; if it says "NO", maybe you have enabled --refresh, which I do not recommend as it forces all data to be redownloaded, the opposite of what you are asking). The exception is the files that aren't cached, where it will retry from the source.
And if you want to just fetch data for the remaining 5 roms, just use the --startat and --endat options or add the file names to the command line. This will refetch the data for just those files.
EDIT: But I see room for improvement here. Clearly you are confused about the output as it is right now. So I will ponder on this and see if I can make it clearer in a future version of Skyscraper. I'll add it to the roadmap.
EDIT2: I realize that I misunderstood you. You are correct: if the 95 videos came from the import module, then it will download videos for all roms if you then use screenscraper or arcadedb. But as mentioned, if you are just missing 5, just use command line file names or --startat and --endat.
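The behaviour in EDIT2 makes sense if you picture the cache as keeping resources per source module. The toy model below illustrates that idea; it is a simplified Python sketch with invented names (ResourceCache, needs_fetch, missing_everywhere), not Skyscraper's real cache format or code.

```python
class ResourceCache:
    """Toy cache that remembers which module supplied each resource."""

    def __init__(self):
        # (rom, resource_type) -> set of module names that provided it
        self._entries = {}

    def add(self, rom, resource_type, module):
        self._entries.setdefault((rom, resource_type), set()).add(module)

    def needs_fetch(self, rom, resource_type, module):
        """True if this particular module has not cached the resource yet."""
        return module not in self._entries.get((rom, resource_type), set())

    def missing_everywhere(self, roms, resource_type):
        """Roms with no such resource from any module, i.e. "the gaps"."""
        return [r for r in roms if not self._entries.get((r, resource_type))]


if __name__ == "__main__":
    roms = [f"rom{i}" for i in range(100)]
    cache = ResourceCache()
    for rom in roms[:95]:
        cache.add(rom, "video", "import")  # 95 local videos gathered first

    # A different module still sees every video as not yet fetched by it...
    print(sum(cache.needs_fetch(r, "video", "screenscraper") for r in roms))
    # ...while only these roms have no video at all.
    print(cache.missing_everywhere(roms, "video"))
```

In this model, the list returned by missing_everywhere corresponds to the files you would pass on the command line (or bracket with --startat and --endat) to fill only the gaps.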
-
@muldjord Hmm, but what is the point in redownloading videos with each scraping module... I am not using this via the command line because I like the lazy GUI way... so I end up with a lot of wasted disk space (at least for videos, it doesn't really matter for images).
Don't get me wrong... I am not really complaining. I know such things are not just coded in 5 minutes and I am very happy with your tool and could not thank you enough. I will just switch off the video download in any second scraping module run.
-
@robertvb83 Skyscraper's cache is the basis on which you would generate any game list. This means that you can, at any time, change anything in the general config, artwork config, the resource priorities, the frontend you wish to generate game lists for. Basically anything can be changed and you can regenerate the game lists as if you just re-downloaded everything. But this only works if the cache is consistent with the data from the sources. This is the basic principle of Skyscraper.
You could consider simply generating the ES gamelist.xml with videos. Then remove all videos from the cache (just the videos! Not everything as the ES gamelists don't support all data) and then use the esgamelist module to reimport the data back in. Then all videos will be in the cache from the esgamelist module and from no other modules. I would strongly suggest making a backup of the cache before you try this though.