Scraper not working
-
@thestarglider Can you elaborate on this? What exact file (exact filename, md5sum, sha1sum) for Tomb Raider is showing up as Revelations?
-
@muldjord said in Scraper not working:
Can you elaborate on this? What exact file (exact filename, md5sum, sha1sum) for Tomb Raider is showing up as Revelations?
Much of what you asked is gibberish to me! The filename is and has always been "Tomb Raider (USA).PBP". Up until around two months ago, a scrape of PSX games would correctly come back as Tomb Raider, but now it flags up as Revelation.
It also completely misses Tomb Raider II, and somehow hides it from view in RetroPie.
-
@thestarglider I've looked into this particular case, and it seems they've changed their match algorithm internally. It probably does not match on the checksum of your file because it is unknown to them (sorry for gibberish, look it up) so instead it tries to match for filename directly. And for some reason ScreenScraper decides that Revelations and actually sometimes Tomb Raider III is a match. There's no simple fix for that on my end, as it is the scraping source that gets it wrong, not Skyscraper.
-
@muldjord Yeah, I kinda see what you mean. I finally found Tomb Raider II. Check these cock-ups!
Tomb Raider 2 - Scraped as Coolboarders 2001
Destruction Derby - Scraped as Destruction Derby Raw
Tomb Raider - Scraped as Tomb Raider, The Last Revelation
Grand Theft Auto - Scraped as Grand Theft Auto 2
The videos, marquees and images are all for the scraped games, but are on file in the Downloaded Media folder under the correct game names. The whole thing is screwed. I will just have to delete the gamelist.xml file from the Pi, keep the PSX game list as the default unscraped games and not scrape any more games in the meantime.
-
@thestarglider In general though psx is a difficult platform to scrape as many people create their own pbp files. This means that the checksums have a high chance of being unique to your local file, which in turn means that ScreenScraper doesn't recognize it from the checksum. So it's left with trying to match it with the filename only. And that is all down to their internal algorithm which doesn't seem to be doing a good job.
To explain checksums a bit more: a checksum of a file is like a fingerprint of a person. It is a small piece of data, and if it matches the one on record for a file, you can be all but certain it's the right file.
A checksum is made by summarizing all of the data in a file in a certain way, which gives back a tiny piece of data that has a very high chance of only being generated from that particular file. This works well for well-known roms on platforms where the roms are always the same. That is not the case for cd-rom based systems, as people have a tendency to make their own compressed versions of the games. That renders any checksum useless with ScreenScraper, forcing it to rely on the filename only.
I'm guessing you are having a lot more success with rom based platforms such as snes and megadrive, right? That is the reason.
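To make the fingerprint idea concrete, here is a minimal sketch using throwaway stand-in files (not real game images, and the file names are made up for illustration). It shows that an exact copy gets the same checksum, while a file with even slightly different contents gets a different one, which is what happens when a pbp is repacked locally.

```shell
# Throwaway stand-in files; the names and contents are illustrative only.
workdir=$(mktemp -d)
printf 'same game data' > "$workdir/original.bin"
printf 'same game data' > "$workdir/copy.bin"
printf 'same game data, repacked' > "$workdir/repacked.bin"

# sha1sum prints "<checksum>  <filename>"; keep only the checksum field.
checksum_of() {
    sha1sum "$1" | cut -d ' ' -f 1
}

sum_original=$(checksum_of "$workdir/original.bin")
sum_copy=$(checksum_of "$workdir/copy.bin")
sum_repacked=$(checksum_of "$workdir/repacked.bin")

# original and copy match; repacked does not.
echo "original: $sum_original"
echo "copy:     $sum_copy"
echo "repacked: $sum_repacked"

rm -rf "$workdir"
```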
-
@muldjord I just re-ripped my copy of the game, and also (ahem) tested two other downloads of the game on the Pi.
All reported as Tomb Raider - The Last Revelation.
Don't get me wrong, I know it's not Skyscraper at all. But for ScreenScraper to read something like Tomb Raider 2 as Coolboarders 2001 is frankly insane.
Oh well, perhaps one day it'll work again like it used to!
-
@thestarglider from what I can tell, the database information is correct. I know that they were having server issues as well with heavy load. I added a lot of information to games, it was approved, and the updated game data is showing up, but it's not coming up during scrapes.
-
@gokuh306 It's not the data in the database that's incorrect. It's how they do a filename search match that turns up faulty results.
@thestarglider Could you perhaps run the following command and paste the output for me? That can help me conclude whether the checksums are unique or not:
sha1sum FILENAME.pbp
Just run that command on all your versions of the Tomb Raider pbp files. Then paste it here with the exact filenames and checksums (the output of the sha1sum command).
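If there are several files to check, a small loop saves typing each name. This is only a sketch: `checksum_pbp_files` is a helper name made up here, and the directory in the usage line is an assumption about where the roms live.

```shell
# Print the SHA-1 checksum of every .pbp / .PBP file in a directory.
# Quoting "$f" keeps filenames with spaces or parentheses intact.
checksum_pbp_files() {
    dir=${1:-.}
    for f in "$dir"/*.pbp "$dir"/*.PBP; do
        # Skip the literal pattern when nothing matched it.
        [ -f "$f" ] || continue
        sha1sum "$f"
    done
}
```

For example, `checksum_pbp_files ~/RetroPie/roms/psx` (assuming that is the psx rom folder) prints one checksum and filename per line, ready to paste.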
Btw, if you're having these issues, you can manually look up a correct checksum at ScreenScraper for any rom connected to the game you want, and then scrape the game with:
Skyscraper -p psx -s screenscraper --query sha1=CHECKSUMFOUNDATSCREENSCRAPER FILENAME.pbp
That will force it to register the game as the one linked to that particular checksum.
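Since a mistyped checksum would just produce another wrong match, it can help to sanity-check the pasted value first. The sketch below only builds and prints the command from the post above without running it. `build_query_command` is a made-up helper name, and the check relies on the fact that a SHA-1 checksum is always 40 hexadecimal characters.

```shell
# Build (but do not run) the Skyscraper command for a checksum copied
# from ScreenScraper. Uses only the options shown in the post above.
build_query_command() {
    checksum=$1
    romfile=$2
    # Reject anything containing a non-hexadecimal character.
    case $checksum in
        *[!0-9a-fA-F]*)
            echo "not a hex checksum: $checksum" >&2
            return 1
            ;;
    esac
    # A SHA-1 checksum is exactly 40 characters long.
    if [ "${#checksum}" -ne 40 ]; then
        echo "expected 40 characters, got ${#checksum}" >&2
        return 1
    fi
    printf 'Skyscraper -p psx -s screenscraper --query sha1=%s "%s"\n' "$checksum" "$romfile"
}
```

Calling `build_query_command CHECKSUM "Tomb Raider (USA).PBP"` with a real 40-character checksum prints the full command line, which can then be copied and run. The sketch assumes the filename itself contains no double quotes.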
-
My original file: 6c76668ebf7dd8290668ba4d81a6d1dbd1facfa8 TombRaider.PBP
Another file: 17291f067313e67f40a9bbdeadb81d27cc397218 TombRaider1.PBP
(The games were actually 'Tomb Raider (USA).PBP' and 'Tomb Raider.PBP', but the command you asked me to use seemed to have a problem with filenames that include spaces.)
I deleted the third one I had, and the source for it is no longer available.
Frankly though it's too much effort for me to have to check each of the games that no longer scrape properly and do all you suggest just to get them right. I'd rather just remove any scrape info from the games altogether and leave the default unscraped lists up.
-
@thestarglider No, that's just because you need to escape filenames with spaces. That is a general command line thing, not specific to that command. You can do
"Filename with space.pbp"
or
Filename\ with\ space.pbp
Both will work. But thanks, I'll look into them. I'm asking because I want to rule out that this is a Skyscraper issue.
EDIT: Ok, so I checked those two, and my suspicion was correct. The checksums don't exist in their database, so it relies on the filenames entirely. And it does poorly at that, which causes the faulty results. I would have to do further local tests to rule them out. I might do that in a future release.
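On the filename escaping: a quick sketch with a throwaway stand-in file shows both forms side by side. One detail worth noting for a name like "Tomb Raider (USA).PBP" is that the parentheses are special to the shell as well, so in the backslash form they need escaping along with the spaces.

```shell
# Stand-in file in a temporary folder; the contents are illustrative only.
workdir=$(mktemp -d)
printf 'stand-in data' > "$workdir/Tomb Raider (USA).PBP"

# Form 1: wrap the whole name in double quotes.
quoted=$(sha1sum "$workdir/Tomb Raider (USA).PBP")

# Form 2: backslash-escape each space, and the parentheses too,
# because ( and ) are also special characters to the shell.
escaped=$(sha1sum "$workdir"/Tomb\ Raider\ \(USA\).PBP)

# Both lines show the same checksum and the same filename.
echo "$quoted"
echo "$escaped"

rm -rf "$workdir"
```

Quoting is usually the easier of the two, since nothing inside the name has to be escaped by hand.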
-
@thestarglider Great news!!! Please update Skyscraper to 3.4.5 and try the Tomb Raider games again. I just had a chat with the ScreenScraper guys and it seems this was in fact an issue, and it has now been fixed in their code (as I suspected it was not a Skyscraper issue, but please update to 3.4.5 anyway)! Scraping
Tomb Raider (USA).pbp
will now result in the correct game. It should also work for all other games now, unless the filename is not in their database of course. Let me know if you test this out. I've tested it myself and it seems to work. :)
-
Is ScreenScraper down? I tried to scrape a few and it's not working for me. I scraped a month back and it was fine.
-
the website seems to be completely down at the moment
-
@dman Yes, their API is entirely down atm.
-
Website is working, so API might also be working now.
-
API is working and enabled for non registered users. The built-in scraper should also work now.
-
It does appear that a lot of filename and CRC data is missing, for now. I have a lot of games that scraped fine before not being found.
-
still not working for me but it may be something else
-
@muldjord success! I updated and scraped and all the culprits were found correctly (two weren't but they always were tricky ones to scrape so not an issue).
Many thanks for the heads up!