Versatile C++ game scraper: Skyscraper
-
2.3.1 is coming along nicely. I've implemented the scummvm.ini parsing, which is really cool, so thank you for that suggestion, @stoo.
I've also implemented numeral checking on titles, which basically means that if a file is called "Blah 4" and the returned title is "Blargh", it won't even compare them; it just skips the result. The default numeral is "1", so "Blah 1" and "Blah" are a match.
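For anyone curious what such a numeral check could look like, here is a minimal sketch. It is not Skyscraper's actual code: the function names `titleNumeral` and `numeralsMatch` are made up for illustration, and the Roman numeral table is a small assumed subset.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: returns the sequel numeral found in a title,
// or 1 when the title carries none (so "Blah" is treated like "Blah 1").
int titleNumeral(const std::string &title)
{
  // Illustrative subset of Roman numerals, not an exhaustive table.
  static const std::map<std::string, int> roman = {
    {"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
    {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}, {"X", 10}};

  std::istringstream words(title);
  std::string word;
  int numeral = 1; // default numeral
  while (words >> word) {
    // Arabic numeral: a word of one or two digits (longer runs are likely years).
    bool allDigits = word.size() <= 2;
    for (unsigned char c : word) {
      if (!std::isdigit(c)) allDigits = false;
    }
    if (allDigits) {
      numeral = std::stoi(word);
      continue;
    }
    // Roman numeral: exact match against the table above.
    auto it = roman.find(word);
    if (it != roman.end()) numeral = it->second;
  }
  return numeral;
}

// Two titles are only worth comparing further if their numerals agree.
bool numeralsMatch(const std::string &fileTitle, const std::string &resultTitle)
{
  return titleNumeral(fileTitle) == titleNumeral(resultTitle);
}
```

With this sketch, `numeralsMatch("Blah 1", "Blah")` is true, while `numeralsMatch("Blah 4", "Blah")` is false, so the second pair would be skipped before any title comparison.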
-
Skyscraper 2.3.1 released: https://github.com/muldjord/skyscraper
- Fixed 'players' tag to always conform to a 1-digit format
- Now filters out ".hack-Link" results from 'screenscraper' to avoid bad localdb data
- Added note to output about how many new resources have been added during scraping run
- Added 'color="#fffff"' option to stroke effect for the geeky people (including me of course)
- Conformed 'game tags' to 'Platform, Action' format
- Fixed so 'localdb' folder isn't created inside dbs media folders
- Optimized the mameMap a bit
- Improved the searchMatch system to also consider numerals
- Now looks up 'scummvm' dummy files in 'scummvm.ini' and uses the correct game name
This release contains some user requests, the screenscraper 'hack-Link' fix and a bunch of optimizations. Most prominently, Skyscraper is now fully aware of game numerals ("Game 4" or "Game IV") and acts upon them when comparing results. This should mark the end of game sequels being matched with results that don't have the same numeral in the title as the filename.
A quick note: You might notice fewer "game found" results with this release. This is intentional. I've changed the default minimum match percentage to 65 (from 50 before) to eliminate more false positives. Combined with the stricter numeral checking, that will result in fewer false positives, which might make it look like it finds fewer correct results. That should not be the case; the results are just more precise.
Let me know what you think and happy scraping!
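To illustrate how a minimum match percentage can weed out false positives, here is a rough sketch. The thread does not say how Skyscraper computes its percentage, so the edit-distance metric below is purely an assumption, and `editDistance`, `matchPercent` and `isAcceptable` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic Levenshtein edit distance, using two rolling rows.
std::size_t editDistance(const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Assumed metric: share of the longer title left untouched by the edits.
int matchPercent(const std::string &a, const std::string &b)
{
  std::size_t longest = std::max(a.size(), b.size());
  if (longest == 0) return 100;
  return static_cast<int>(100 * (longest - editDistance(a, b)) / longest);
}

// A result is kept only if it reaches the minimum match percentage.
bool isAcceptable(const std::string &fileTitle, const std::string &resultTitle,
                  int minMatch = 65)
{
  return matchPercent(fileTitle, resultTitle) >= minMatch;
}
```

Under this assumed metric, a pair that scores 50 would have passed the old default but is rejected at 65, which is the kind of borderline hit the new default is meant to drop.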
-
Thanks very much!!! Skyscraper is getting better and better!
-
Skyscraper is great. But I still get the ".hack-Link" results from screenscraper with version 2.3.1.
-
@jwcbronski Yes, you need to rerun it with '--updatedb'. Otherwise it uses the cached results (which still contain the hack-link entries). Running it with '--updatedb' will refetch the data from screenscraper and overwrite the faulty hack-link results, effectively removing them.
-
@muldjord I deleted the whole .skyscraper folder before I installed and ran Skyscraper 2.3.1. So there were no cached results. I started from scratch and got the ".hack-Link" results again. That was just 2 hours ago.
-
@jwcbronski Crap. Just for good measure, can you please run "Skyscraper" and visually read the version number to verify 100% that you are in fact running 2.3.1? Just so I don't start spending a lot of time creating a new fix for no reason.
Problem here being that I can't test the fix myself since I don't get these faulty results. So I made the fix blind. But it really should work. I check every result from screenscraper and compare it to ".hack-Link" and then skip it.
Could you please provide a snippet of the output from Skyscraper when it delivers the faulty results? Then I can use that to work on a new fix.
EDIT: Also, anyone else still having the issue?
-
@muldjord I just started Skyscraper and it is v2.3.1. Here's a gamelist entry:
<game>
  <path>/home/pi/RetroPie/roms/c64/Supermacy.d64</path>
  <name>.hack//Link</name>
  <cover>/home/pi/RetroPie/roms/c64/media/covers/Supermacy.png</cover>
  <image />
  <marquee />
  <rating />
  <desc>The first game in the .hack series for PSP (and the planned final game for the franchise), .hack//LINK logs player into a new version of its virtual landscape called The World R:X (the "R" stands for "Revision"). Set 10 years after the last .Hack, players take control of Tokio Kuryu, a second year junior-high student. Presented through manga-style visuals, the game's story promises to clear up the mysteries from past entries. Over 100 characters from past .hack games, anime, manga, and books will make an appearance. Gameplay promises to retain the basics of past titles, with players facing off in battle against enemies as they explore dungeons. The difference here is that you move around in a party of two, with the CPU-controlling the other character. The game will include 33 such CPU-controlled characters. For the PSP game, the battle system has been changed to a more action-heavy combat system.</desc>
  <releasedate />
  <developer>Bandai Namco</developer>
  <publisher>CyberConnect2</publisher>
  <genre>Role playing games</genre>
  <players />
</game>
-
@muldjord I scraped everything new except for c64 and it worked fine. I've now tested c64, and it also gave me hack-Link results when scraping with
Skyscraper -p c64 -s screenscraper --updatedb
As I said, the strange thing is that every other platform I have worked fine with -s screenscraper (except for amiga, which we discussed earlier).
-
@jwcbronski Oh, I see the problem... There's more than one way it'll return the hack-link entry. I only filter on ".hack-Link", not ".hack//Link". I wonder how many variants there are then... Anyways, I'll create a more robust filter that simply looks for "hack" and "Link" and filters all of those.
Thank you for your help on this.
-
@analoghero Yes, it appears that the problems on screenscraper's end persist and even seem to be broader than I first thought. Anyways, 2.3.2 coming up... I want this fix out there asap.
-
@muldjord Glad I could help. Have you seen this thread on GitHub?
https://github.com/sselph/scraper/issues/214
They also talk about ".hack//Link".
-
@jwcbronski Thank you, yes I glanced over that thread just earlier today. I thought the "//" was just a spelling error. But it would seem that it actually sometimes returns one and sometimes the other.
Either way... Release is ready.
-
Skyscraper 2.3.2 released: https://github.com/muldjord/skyscraper
- Added support for 'wii' and 'gc' platforms
- Added '.chd' format to a bunch of platforms
- Added more robust filtering of the faulty screenscraper 'hack-Link' results
It now looks for "hack" and "Link" and if both exist in the title it skips it. So it'll work for both ".hack-Link" and ".hack//Link". Please let me know if the issue persists in any form.
Also added two new platforms per user request and a bunch of file formats to new and existing platforms. :)
Happy scraping!
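The filter described in this release can be sketched roughly as follows. This is only an illustration of the idea (both substrings must be present), not the code shipped in 2.3.2; the function name `isFaultyHackLink` is hypothetical, and the case-sensitive comparison simply mirrors the spelling used in this post.

```cpp
#include <string>

// Returns true when a result title contains both "hack" and "Link",
// which covers ".hack-Link" as well as ".hack//Link".
bool isFaultyHackLink(const std::string &title)
{
  return title.find("hack") != std::string::npos &&
         title.find("Link") != std::string::npos;
}
```

A scraper loop would then skip any result for which `isFaultyHackLink(result)` is true. The obvious trade-off is that a legitimate title containing both words gets skipped as well.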
-
Curious, what happens if you want to scrape the game .hack//Link?
-
@livefastcyyoung Haha, yeah, I thought about that myself and it simply won't. I could do some further checks, for instance check if the platform is "psp" and then allow it anyways, but what if other psp results are faulty? Of course there is a way to get around all of that, but frankly I don't feel like it's worth plastering my code with all sorts of weird checks, just to let people scrape that one game. :) So I hope people are ok with that. At least until screenscraper fixes the problem and I can remove the checks again.
-
Users with scraping experience: have you edited or used the priorities.xml file, either in general or per platform?
Any tips about the best sources for each platform? I am thinking of deleting my localdb and rescraping everything, as I think I have some kind of mess after all the tests and learning about this in the last few weeks.
Thanks
-
@bleuge I didn't edit any of those except for amiga. I'm also not sure about that. I'm always using all sources.
-
By popular request Skyscraper is now upgradeable. To get the updateable release go to https://github.com/muldjord/skyscraper and follow the installation instructions. After you've installed it you can update it using the new update_skyscraper.sh script. :)
-
@muldjord I sometimes see this text in the output:
'.known option 'pretend
Literal text. Is this right?