Versatile C++ game scraper: Skyscraper
-
.lha support for Amiberry has already been added to the development branch ;)
-
I'm currently working on an "add spaces to filename" solution for .lha scraping on the Amiga. Most .lha files come with names in the format "ThisIsAGameName3_v1.2.lha", so figuring out rules to turn this into "This Is A Game Name 3" for searching is a bit fiddly. That example is easy. But what about "3DPool", "4x4Driving" or "ABCGame"? As you might guess, those break the simple rules and return "3 D Pool", "4x 4 Driving" and "A B C Game". I can then add special rules. For instance, I can check whether there's a '3' before a 'D' and, if so, not insert a space before the 'D'. But then what about "Game3Deluxe"? That would become "Game 3Deluxe"... So I add a special rule for "Deluxe", and so on. It's a tug of war: adding a new rule breaks others. But so far I've found a pretty good middle ground. Some files are just named badly. Some games have odd numbers at the end that aren't sequel numbers, such as "ThisGame45", where 45 seems to be some sort of versioning.
Btw, user credentials for 'screenscraper' have been implemented and will be in 2.3.7. You can then enter your own ScreenScraper user and use as many threads as that user allows.
2.3.7 NOT released yet, just to clarify. This is just a progress update. :)
-
@muldjord What if you create a txt file from a complete WHDLoad directory, insert spaces with a tool, and then check whether it's correct? Once you have a txt file, use it as a lookup table, like BrianTheLionAGA.lha = Brian the Lion AGA.lha. It's a little bit of work, but maybe more accurate than checking every filename.
-
@analoghero Thank you for the suggestion. Yes, it will eventually become something like that. It's the same thing I currently do to translate the MAME names to actual game names. The problem with these lookup tables is that they are fixed. So if someone decides to change the filename of an .lha file, it stops working. I could then use the SHA-1 of the files instead, but those might change as well depending on where people got them from.
For the time being I think I'll stick with the automated method until I can come up with one that uses the slave files from inside the lha archives. That would work better for a lookup method.
-
@muldjord I downloaded .lha files back in the day on my real Amiga, and I have them stored on an SFS-formatted Amiga hard disk along with Workbench. I'm waiting for the day I can somehow connect that to the Pi, as I don't have a PC with IDE connections anymore.
Problem is: all my WHDLoad files are zips, so I can't help you with testing :(
-
@analoghero No worries I have plenty of files to test on :)
-
Just implemented '--startat' and '--endat' options that allow the user to define which files to begin and end at when scraping. In this mode it ONLY caches data; it doesn't change the game list.
This allows you to only scrape a span of files. So if you have 1200 files, but only want to scrape the middle 60, then you can do so by, alphabetically, defining the first and last file in that span. Makes sense? Useful? I know I will be using it myself at least. :D
-
@muldjord That will be very useful. It would be great for me where I'm trying to merge some files into a set, but unfortunately those files aren't in a contiguous group (i.e. some start with a, some b, some c, etc.) However, I will use it to portion out updates on some of the 6,000 ROM set systems so I can do it by letter and check rather than have it run and then go back to see where it stopped, what was missed, etc.
Thanks!
-
@timekills When I add ROMs to a system, I just run Skyscraper and select yes to skip existing entries.
The flip side is that you can only do this once, with a single scraper module. If you want to scrape with another module, you have to scrape all the files again.
-
Skyscraper version 2.3.7 released: https://github.com/muldjord/skyscraper
- Now checks for .lha suffix and adds spaces where appropriate to get better results
- Improved returned image data validity check (libpng errors still happen, but can be ignored)
- Rewrote the worker to main thread communication a bit
- Implemented '--startat' option that tells Skyscraper the first file to scrape
- Implemented '--endat' option that tells Skyscraper the last file to scrape
- Added thread id to terminal output
- Applied serverside artwork size limit to openretro module to avoid running out of memory
- Improved network communication class
Another "bits and pieces" release. The inclusion of '--startat' and '--endat' should make it easier to do scraping of a subset of your games. This has been requested a few times and I've made use of it myself A LOT during testing.
2.3.6 had some issues with the new openretro parser since some of the returned covers are INSANELY large. Like, 10000x10000 resolution. That, in conjunction with the new alphabetized queue system, made it eat up ALL of the Pi's RAM really fast, which in turn made the kernel kill it off to ensure system stability. This has been fixed serverside, simply by asking for a resize before the image reaches Skyscraper.
The changes to the network communicator are kind of beta, which means that I've tested them and haven't seen any problems. But I've removed the clearAccessCache call again, since it should no longer be necessary. That call ensured that data didn't get mixed up, but it was a bit of a workaround. My new code should ensure that data doesn't get mixed up without using that call. So pleeeeease, if you encounter game media getting mixed up between games, let me know! It shouldn't happen, though.
This was a bit of a tough one. I've spent the last 4 days trying to figure out what the hell was causing the crashes, until I stumbled upon those insane cover artwork resolutions from openretro. Then it became quite clear that Skyscraper wasn't the problem at all...
Anyways, all my testing has gone well on my end, so please do update and try it out.
Happy scraping! :)
-
Skyscraper version 2.3.8 released: https://github.com/muldjord/skyscraper
- Implemented user credentials ('-u user:password') to set up threads for 'screenscraper' module
- Made sure artwork output gets exported, even if entry has no base artwork resource
- Changed 'verbose' to 'verbosity' to allow levels and made terminal output more useful overall
- Added '--dbstats' command line option that prints stats for the selected local database cache
- Added '--purgedb' command line option that allows purging resources from localdb
- Fixed bugs in mergedb command line option
- Fixed bug in Simple Mode where 'attractmode' would not work properly (thank you Humayun)
Fixed a few bugs, added a few options, and FINALLY included the '-u' option properly. If you have a ScreenScraper user, please provide that using the format '-u user:password'. Remember that you can set this in '[homedir]/.skyscraper/config.ini' so you don't have to have it on the command line.
You can now check the stats of the local database cache with '--dbstats'. If you want to purge stuff from it, do this with '--purgedb' by adding 'm:[module]' and/or 't:[type]'. You can also have both comma-separated.
Example: 'Skyscraper -p amiga --purgedb m:thegamesdb,t:cover'. This will purge all covers from thegamesdb module from the Amiga platform's local database cache.
Another example: 'Skyscraper -p amiga --purgedb m:thegamesdb'. This will purge all resources from thegamesdb module from the Amiga platform's local database cache.
Last example: 'Skyscraper -p amiga --purgedb t:cover'. This will purge all cover resources from any module from the Amiga platform's local database cache.
Lastly, I've changed '--verbose' to '--verbosity [level]' where [level] can be 0-3. The higher the level, the more output it will give you while scraping.
As always, please report bugs if you encounter any.
Happy scraping! :)
-
@muldjord Thanks for updating and new additions xD
I was going to write that, for some reason, my initial tests with 2.3.7 didn't work for me. This line, for example:
Skyscraper -d /home/pi/RetroPie/dbs/arcade --nosubdirs --noresize --updatedb -t 8 -p arcade --videos --unattend --skipped -s arcadedb --pretend
just told me that the game 720.zip (the first ROM in the arcade folder) was not found, printed the elapsed and remaining time, and then hung there: no more output, nothing. After 10 minutes I hit Ctrl+C.
I thought my roms or dbs folders were broken, but they are fine.
Anyway, I updated to 2.3.7 a couple of hours ago, and now... I'll test version 2.3.8.
I remember you said that generating a log for debugging purposes was a bit difficult in the current Skyscraper, but if it could be added, wouldn't debugging and spotting errors be much easier? What do you think?
Thank you very much for such fast support!
Edit: tested 2.3.8, same thing. I'll try other command lines for other platforms; no idea what the hell is going wrong here... Also, I don't know why it can't find 720 in ArcadeDB...
The message I am getting is:
#1/141 (T1) Pass 1 ---- Game '720' not found :( ----
Then the times, and that's all...
Edit 2: skipped arcadedb and tried:
Skyscraper -d /home/pi/RetroPie/dbs/arcade --nosubdirs --noresize -t 8 -p arcade --videos --unattend --skipped -s openretro --pretend
It's working.
Btw, I use a generic bash script tailored to my platforms, and I use -t 8 for all of them. I know this number is adjusted for specific sites, so is there any problem with using -t 8 generically?
-
@bleuge Something seems to be wrong with ArcadeDB, not sure what. I'll investigate.
EDIT: You can use -t 8 if you prefer; it'll adjust accordingly. Be aware that you might run into problems with ScreenScraper, though, as I am a bit unsure whether the 4-thread limit I currently have set is correct. If I were you, I would provide my ScreenScraper user credentials in the config.ini so they are always used correctly.
EDIT2: I found the error with ArcadeDB. I had changed some stuff elsewhere that broke it... I'll release a patch shortly that will bring it back in working order.
-
2.3.9 out, please try that
-
@muldjord So fast, thank you again!
-
@muldjord Compiling the new version right now. Couldn't resist trying out Amiberry's new .lha feature. I read your conversation with @HoraceAndSpider in the Facebook WHDLoad group and edited the launch script as you said.
I know Amiberry is off-topic here, but I'm excited. :) I think I will swap out my .uae files and gamedata for .lha files.
-
Skyscraper keeps hanging and I have no idea why.
This time it stalls while scraping with localdb, halfway through the ROMs for a platform. How is this possible? If the scraper is localdb, I'd guess no network connections are involved that could stall; it just gets files from the localdb and copies them to the output folder?
Has anyone had this kind of problem? Thanks
Edit: I am using dbs and roms folders mounted from an external HD, but that works perfectly.
Edit 2: I re-ran my scripts and it hangs at exactly the same game:
Scraper: localdb
Search match: 100%
Compare title:'Seaquest'
.....
rest of data...
Estimated time: 00:00:28
And it hangs there.
Any idea? Maybe I need to re-scrape everything and delete my dbs?
-
@bleuge Hangs here too. Same situation.
Edit: It also hangs when I try to import something. I put one video file in the import folder and ran skyscraper with -p amiga -s import --videos. It says "Sit back, relax and let me do the work! :)" forever.
My dbs folder is stored in the normal folder not on a network drive.
Edit 2: It looks like it crashes when it can't find resources for a game in the dbs folder. In my amiga folder I have 74 games and 2 demos, and it always crashes when it tries to scrape a demo.
#74/76 (T1) ---- Game 'Worms Directors Cut [AGA]' found! :) ----
Scraper: localdb
Search match: 100 %
Compare title: 'Worms Directors Cut'
Result title: 'Worms Directors Cut' (import)
Platform: 'aga' (import)
Release Date: '1997-01-01' (openretro)
Developer: 'Team 17' (openretro)
Publisher: 'Team 17' (openretro)
Players: '8' (openretro)
Tags: 'Artillery, Multidirectional, Puzzle, Sideways' (openretro)
Rating (0-1): '' ()
Cover: YES (openretro)
Screenshot: YES (openretro)
Wheel: NO ()
Marquee: NO ()
Video: YES (import)
Description: ()
Elapsed time: 00:01:00
Estimated time: 00:01:02
-
I'm really sorry about this, guys. I do my best to test things out. Apparently not well enough. Talk about demotivating: these hangs seem to be the bane of Skyscraper even though I keep fixing them. I'll look into it...
-
2.4.0 out, please test and report back. Import and localdb halts should be fixed...