A Better Way to Scrape?
-
@quicksilver Do you know if it's possible to add the login info when using Steven Selph Scraper?
I kinda like the way I can add scraped data to my gamelist, instead of overwriting the old gamelist each time. I can't seem to find that feature with Skyscraper.
-
@Brigane what's the advantage for appending the gameslist.xml vs overwriting?
-
@quicksilver The idea is that the scraper would add artwork/metadata only for the entries that don't have it. It's kind of the equivalent of Skyscraper's cache, except that it uses the gamelist.xml as a cache to decide whether to scrape an entry or not.
@Brigane you can add credentials when scraping by using the command line parameters (-ss_user/-ss_password).
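To illustrate the "only scrape what's missing" idea above, here is a minimal sketch of how a gamelist.xml could be checked for entries that lack artwork or video. It is not how either scraper actually implements it; the function name entries_missing_media and the sample data are made up for the example, and it assumes the usual EmulationStation layout of game entries with path, image and video elements.

```python
import xml.etree.ElementTree as ET


def entries_missing_media(gamelist_xml, tags=("image", "video")):
    """Return the <path> of every <game> that lacks at least one of `tags`.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(gamelist_xml)
    missing = []
    for game in root.iter("game"):
        # A tag counts as missing when the element is absent or empty.
        if any(not (game.findtext(tag) or "").strip() for tag in tags):
            missing.append(game.findtext("path", default=""))
    return missing


# Made-up sample data in the usual gamelist.xml shape.
SAMPLE = """<gameList>
  <game><path>./a.zip</path><name>A</name><image>./media/a.png</image><video>./media/a.mp4</video></game>
  <game><path>./b.zip</path><name>B</name><image>./media/b.png</image></game>
  <game><path>./c.zip</path><name>C</name></game>
</gameList>"""

if __name__ == "__main__":
    for path in entries_missing_media(SAMPLE):
        print("would scrape:", path)
```

Entries that already have everything (like a.zip in the sample) are skipped, so manually added media stays untouched.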
-
@quicksilver I had some issues and I don't think I'm the only one. The scraper downloaded the video snaps in small sizes, and they somehow became corrupted for some of the games, so the video snaps didn't show up in EmulationStation. I have manually added new videos to the games I had problems with, and I don't want the scraper to overwrite them again.
-
@Brigane Sometimes you have to recompress videos that don't show up in EmulationStation (you can use HandBrake or ffmpeg, or you can use ARRM to do this by selecting the roms you want to convert: http://jujuvincebros.fr/wiki/arrm/doku.php?id=compress_video_en ).
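For the ffmpeg route, a rough sketch of what such a recompress call could look like, built from Python. The target settings (H.264 video, yuv420p pixel format, 640 pixels wide, AAC audio) are my assumption of a safe baseline and are not taken from this thread or from ARRM; recompress_command is a hypothetical helper and the file names are placeholders. The sketch only prints the command, it does not run it.

```python
import shlex


def recompress_command(src, dst, width=640):
    """Build an ffmpeg argument list for re-encoding one video snap.

    Hypothetical helper; the encoding settings are assumptions.
    """
    return [
        "ffmpeg",
        "-i", str(src),
        "-c:v", "libx264",            # re-encode the video stream as H.264
        "-pix_fmt", "yuv420p",        # widely supported pixel format
        "-vf", f"scale={width}:-2",   # fixed width, even height, keeps aspect
        "-c:a", "aac",                # re-encode the audio stream as AAC
        str(dst),
    ]


if __name__ == "__main__":
    # Placeholder file names; pass the list to subprocess.run() to execute it.
    print(shlex.join(recompress_command("snap_in.mp4", "snap_out.mp4")))
```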
-
@quicksilver Yes, Skyscraper can cache the current gamelist.xml data using the -s esgamelist scraping module. It will grab the data from the gamelist and save it in the Skyscraper cache, in turn allowing users to regenerate the gamelist with that same data in case they want to avoid Skyscraper overwriting it. But I always advise making backups of the current gamelist.xml files, simply because people often misunderstand what it does. For instance, if you have composited screenshots in the current gamelist.xml, Skyscraper will import them as screenshots. But because they are already composited from different artwork, users get confused when they regenerate the gamelist, since Skyscraper also composites from different artwork. So you get sort of a composite inside a composite.
So -s esgamelist works best for gamelists containing only raw artwork.
-
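On the advice above to back up gamelist.xml before importing or regenerating anything, a minimal sketch of a timestamped copy. The helper name backup_gamelist and the backup naming scheme are made up for the example, and the demo uses a temporary folder so it doesn't touch a real setup; point it at your own gamelist.xml path instead.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_gamelist(gamelist):
    """Copy a gamelist.xml next to itself with a timestamp suffix.

    Hypothetical helper; returns the path of the backup copy.
    """
    gamelist = Path(gamelist)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = gamelist.with_name(f"{gamelist.name}.{stamp}.bak")
    shutil.copy2(gamelist, backup)  # copy2 also keeps the modification time
    return backup


if __name__ == "__main__":
    # Demo on a throwaway file in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "gamelist.xml"
        original.write_text("<gameList></gameList>")
        print("backup written to", backup_gamelist(original))
```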
@muldjord said in A Better Way to Scrape?:
But I always advise making backups of the current gamelist.xml files, simply because people often misunderstand what it does. For instance, if you have composited screenshots in the current gamelist.xml, Skyscraper will import them as screenshots. But because they are already composited from different artwork, users get confused when they regenerate the gamelist, since Skyscraper also composites from different artwork. So you get sort of a composite inside a composite.
Thanks for the warning. That makes perfect sense, though I probably wouldn't have thought of it.
-
I think Skyscraper will allow you to import images from a local folder. As a suggestion would it make sense to make the images that are downloaded from Skyscraper's server available as a torrent file? That would allow many more people to access the scraper while taking images from a local PC.
-
@George-Spiggott Skyscraper doesn't have a server. Skyscraper is the scraping software. Maybe you mean ScreenScraper? Which is one of the sources supported inside Skyscraper.
Skyscraper: Scraping software
ScreenScraper: Online game data source
-
@muldjord Ok thanks. So the same suggestion but using images from Screenscraper or similar then.
-
@George-Spiggott It's not that simple I'm afraid. To upload such a pack would require the license / permission of the sources to be sorted out and that's just not something I have any interest in looking into.
And I'm not really sure it's a great idea either. If a pack was created, you would go from downloading just the data you need per user (as it is now), to downloading all data for all games in a pack even though you don't need all of it. So... It sounds good on paper. But it's a huge waste of bandwidth as I see it.
-
@muldjord Yes, there would be bandwidth wastage, but it would be the users' bandwidth rather than the content providers', and as I understand the situation it is currently their bandwidth that is at a premium.
That said, I'm certainly not trying to force my ideas on the creators. Thanks both for your work and your valued input.
-
@Brigane There is, indeed, a way to use log-in info for ScreenScraper when using Selph Scraper.
https://github.com/sselph/scraper/wiki
In Command Line Flags, under the sub-heading "Full List" and about 3/4 of the way down the list of flags, you will find the options to insert your log-in info. You have to run the scraper from the console and use the command line. You can also use -console_src ss to force it to prioritize ScreenScraper.
As a side note, if you have some stubborn roms that won't scrape, you can edit a copy of the "Hash.csv" file to include the new hashes and then point the scraper at the updated file with the -hash_file flag. I use this technique all the time to scrape ps1 games that I made into .PBP files, since their hashes are not in the original file.
If you have lots of roms that you need to get hashes for and add to the hash file, I wrote a nice little PowerShell script that does a great job of spitting out hashes for files in lowercase. You can download it from this link:
https://drive.google.com/file/d/1saYP91JHvkgK8By60BRChSMakSxj3Sno/view?usp=sharing
The script runs on Windows 10 and is safe to use and distribute if you like it or feel it could help someone. The best part is being able to get the hashes output in lowercase, which is a pain in the butt if you are a Windows user.
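For anyone not on Windows, a rough Python equivalent of the lowercase-hash idea. This is not the PowerShell script linked above, just a sketch under the assumption that the hashes in the hash file are SHA-1 hex digests of the rom file; for some systems the scraper may hash the rom differently (for example without a header), so treat the output as a starting point. The function name sha1_lower is made up, and the sketch deliberately doesn't write Hash.csv rows since the column layout isn't shown in this thread.

```python
import hashlib
import sys


def sha1_lower(path, chunk_size=1 << 20):
    """Return the SHA-1 of a file as a lowercase hex string.

    Hypothetical helper; reads in chunks so large roms don't fill memory.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as rom:
        for chunk in iter(lambda: rom.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # hexdigest() is always lowercase


if __name__ == "__main__":
    # Usage: python hashes.py game1.pbp game2.pbp ...
    for rom_path in sys.argv[1:]:
        print(sha1_lower(rom_path), rom_path)
```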
-
@OldSchoolGuy This topic came up while I was searching for solutions. I'm using Skyscraper on RetroPie and it seems like every single rom on every single site that I scrape returns "no roms found", even when a manual check confirms that the roms are actually present on the respective sites. I was wondering if there is a different way to get that information (info, jpgs and mp4s), for example a torrent.
I don't care if this requires a ton of manual work.
I thought there could even be a torrent that directly contains the Skyscraper cached information.
Any news?
Thank you very much
-
@beppi give this a try https://github.com/zayamatias/retroscraper