[SOFT] Universal XML Scraper V2 - Easy Scrape with High Quality picture
-
I tried to use this scraper tonight with a 4-Mix and 3 threads configured to work with no connection to the Pi3 (ie local folders only) and it took over two hours to do 100 games. At first, it was pretty fast, but after a few minutes it just slowed to a painful crawl.
I noticed in the logs that it was waiting a lot.
Also, there were a number of times where the game's Wheel Image wasn't applied and others where all I received was a corrupt image despite assets being available on the scraper site. I assume the application just timed out. I am in Florida.
Anyone have any suggestions or could it indeed just be the distance?
-
@RedDog said in [SOFT] Universal XML Scraper V2 - Easy Scrape with High Quality picture:
I tried to use this scraper tonight with a 4-Mix and 3 threads configured to work with no connection to the Pi3 (ie local folders only) and it took over two hours to do 100 games. At first, it was pretty fast, but after a few minutes it just slowed to a painful crawl.
I noticed in the logs that it was waiting a lot.
Also, there were a number of times where the game's Wheel Image wasn't applied and others where all I received was a corrupt image despite assets being available on the scraper site. I assume the application just timed out. I am in Florida.
Anyone have any suggestions or could it indeed just be the distance?
So, after watching the Windows Task Manager during a few scraping sessions, I found that Chrome was stealing a lot more RAM than I realized. After closing that down, the scraping improved, but there are still problems. When starting a session, the imaging is very quick at about one image a second, but after the first 15 or 20, it starts to slow down at an exponential rate. I found that after 10 minutes or so, it was better to cancel the process and start it again.
I also found what I assume is an issue with the scraper. I have my Mix set to the 4-image option. If I set it to 'Picture Only' in the General Configuration, the scraper folders will show the program creating the 4-mix image, but then it only brings the background image to the destination folder and not the final 4-mix image.
Another issue I noticed is that occasionally, the thread number in the General Configuration will change from my setting (3) to 1 or empty. When empty, it appears to act as if it were a 1.
Also, does anyone know why final images might be missing Wheel images if one exists on the screenscraper.fr site? Maybe an incompatibility with the image or a time out? For example: Gain Ground for megadrive. If you try to scrape a 4-mix, the resulting image will have a Box, a Cartridge, and Screenshot, but no Wheel image...and there is a Wheel image on the site.
-
Sup @RedDog? :) Fellow GameEx'er? I don't think the mix will create with a wheel image. The 4-mix is a combination of ScreenShot, Box, Cart, and the "marquee" game name at the right. If you have "Wheel" selected as an option, I believe it downloads that as a separate file like "filename-wheel.jpg".
@screech I think your xml generator is still a bit buggy. I'm going to try and scrape again, but I noticed tonight that the scraped gamelist.xml it created for Nintendo FDS created a bunch of unnecessary blank <image/> lines. Please see here:
https://retropie.org.uk/forum/post/84974
Essentially, I fixed the problem I was having by removing those lines from my xml. But it shouldn't have created them in the first place. Can you check into it? :)
-
@hansolo77 said in [SOFT] Universal XML Scraper V2 - Easy Scrape with High Quality picture:
Sup @RedDog? :) Fellow GameEx'er? I don't think the mix will create with a wheel image. The 4-mix is a combination of ScreenShot, Box, Cart, and the "marquee" game name at the right. If you have "Wheel" selected as an option, I believe it downloads that as a separate file like "filename-wheel.jpg".
Hey Han. Yup, it's me. Thanks for the tip on the Marquee.
-
@screech I should also mention that every scrape I have done for PSP has come up with no data and empty image files. The result is 15 items not found. Is it possible that the scraper isn't connecting up to the PSP area on the site?
Also, may I suggest that if an image is not found, a placeholder image be copied over instead of an empty one, perhaps something that says "Image Not Found". For some time, I assumed that some of the images were being corrupted when they were built, so I kept running them over and over hoping for a successful build. Then I finally realized that they were files that could not find a match.
-
I love the "image not found" idea. It would make it easier for people (cough, me) to get their own images off the net as well. Just rename your image to the created one and the xml file already points to it so you don't need to edit it.
My problem is that it found all of my PSX games and got images, but none of them show in ES at all. The games are in folders to keep them tidy, so I took them out and the images still don't show. I ended up using the built-in scraper to grab pictures and they work fine for now, so it's all good.
ALMOST all of my games are scraped now. I ended up getting no-intro sets as they just work, and used the built-in scraper to grab my last few Amiga and Atari 2600 ones. Just missing a few that don't seem to be in any databases. I might try my hand at submitting some games to TheGamesDB to fill in the gaps.
-
@RedDog: UXS only scrapes about 1/3 of my PSP titles.
@screech: I love your work mate, as always. I had an idea though: the built-in ES scraper gives you the option to verify which game is being scraped, or to pick one from a list. Do you think it would be viable to include an option like this in UXS?
My PSP and PSX systems both use .pbp files, so I imagine the scraper is going off file names. However, lots of those file names have the colon ":" character, which Windows won't let you put in filenames, so the scraper skips over them. It would be great if the scraper would pause and then give you a list to choose from. Even if it was just a list of every game in the system, rather than UXS trying to work out which games would be the most likely.
Obviously this wouldn't be default behaviour, but maybe a checkbox in the General Settings. Maybe UXS would scrape all the ones it can automatically, then at the end it gives you the option to manually select the games it couldn't find on its own.
-
@hansolo77
SSH: Normally it works now (several confirmations from different users), so if it doesn't work on your computer, something else is wrong (try using PuTTY to test). And it's normal that it doesn't ask to kill ES when you scrape locally...
Blank <image/>: I need to run some tests. I don't know why it does this (for now, you are the only one I know of with this issue).
@brunocedup
Video quality: It depends on what is in the DB... I haven't checked every video ^^. But normally they are in the "original" resolution (SNES has a bigger resolution than Master System, which may be the reason).
@ghostlywindmill
Problem with autoconf: Can you check that you can access the different paths on your Samba share from your computer? It looks like UXS can't access those folders. You can try the alternate path (in the rom folder); there is a new option for that in the advanced configuration menu.
@mattrixk
Neogeo ROMs in MAME: That's not good news :( It means there are some Neogeo ROMs in the MAME system on the ScreenScraper DB... I need to do something about that ;)
UXS improvement: Sorry, for now it's impossible. The API can match only on CRC or on filename... No other choice (for now).
@A2ra3L
Micro disconnection? I don't know why it's "very long"... What system are you scraping? (A CD one? Then it's fairly normal, since the ISO/BIN files are very big to hash.)
One thing: at about midnight (French time) the server does a lot of stuff and can be very slow for about 10 to 20 minutes...
PSX without pictures: When you scraped PSX, was your EmulationStation killed? Can you check your psx gamelist.xml and the image files?
And FYI, UXS doesn't use TheGamesDB but the ScreenScraper DB ;) so if you enjoy the UXS results and want to fill in what's missing, you are welcome at ScreenScraper too ^^
@RedDog
Same as above, maybe it's the "midnight" stuff that breaks your scrape?
For wheels, we still have a lot of work to do to associate the ones we have with the correct country ROM (for example, a wheel is wrongly referenced as "Japan" and the ROM is "US", so the wheel may not be downloaded; just check the fallback in the general menu).
PSP: never tried (I don't have any PSP ROMs ^^). I need to run some tests ;) (But for now, the PSP DB isn't so "big" on ScreenScraper; we need some contributors for this system ;)
-
It started working again after a while. Must have been my connection, or the website was busy. But the PSX games confused me. ES was killed, and the xml file for the PSX games showed the same filenames and locations as the png images, so they should have worked. All my other systems worked fine. The strange thing is the PSX images had doubled names. For example, Ape Escape's png file was called "ape escape_ape escape.png". Only the PSX had this issue.
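For anyone wanting to double-check a gamelist against the image folder without eyeballing it, here is a minimal Python sketch. It assumes the usual EmulationStation layout (a `<gameList>` root with `<game>` entries holding `<name>` and `<image>`), and it assumes relative image paths are relative to the ROM folder you pass in as `rom_dir`. The function name `find_missing_images` and the demo data are made up for illustration; this is not part of UXS.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def find_missing_images(gamelist_path, rom_dir):
    """Return (game name, image path) pairs whose image file is not on disk."""
    missing = []
    for game in ET.parse(gamelist_path).getroot().iter("game"):
        image = (game.findtext("image") or "").strip()
        if not image:
            continue  # no image listed for this game, nothing to check
        # Assumption: "./..." paths are relative to the ROM folder; "~" is the home dir.
        resolved = os.path.normpath(os.path.join(rom_dir, os.path.expanduser(image)))
        if not os.path.isfile(resolved):
            missing.append((game.findtext("name") or "?", image))
    return missing


# Demo with throwaway data: the xml points at a doubled name that is not on disk.
with tempfile.TemporaryDirectory() as rom_dir:
    open(os.path.join(rom_dir, "ape escape.png"), "wb").close()
    gamelist = os.path.join(rom_dir, "gamelist.xml")
    with open(gamelist, "w", encoding="utf-8") as f:
        f.write(
            "<gameList><game><name>Ape Escape</name>"
            "<image>./ape escape_ape escape.png</image></game></gameList>"
        )
    print(find_missing_images(gamelist, rom_dir))
    # [('Ape Escape', './ape escape_ape escape.png')]
```

If the list comes back empty, the paths in the xml do exist, and the problem is more likely on the ES side. Note that absolute Pi paths in the xml will not resolve if you run this on a Windows copy of the files.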
-
@RedDog
Same as above, maybe it's the "midnight" stuff that breaks your scrape?
For wheels, we still have a lot of work to do to associate the ones we have with the correct country ROM (for example, a wheel is wrongly referenced as "Japan" and the ROM is "US", so the wheel may not be downloaded; just check the fallback in the general menu).
PSP: never tried (I don't have any PSP ROMs ^^). I need to run some tests ;) (But for now, the PSP DB isn't so "big" on ScreenScraper; we need some contributors for this system ;)
@screech I assume the 'midnight stuff' has to do with server maintenance or some such thing around midnight CEST. I am in Florida (USA) and we are 6 hours behind CEST. The thing is that for the most part, I have done the majority of my scrape tests in the later evening, which puts the times after midnight CEST. I had been fighting with it for several days. However, last night around 2:00am to 3:00am CEST they worked fantastically. I do not think I did anything to make a difference, but I had two 700 item sets finish with a 4-Mix at about 10 minutes each.
As for PSP, I have to disagree with you. While the PSP database may need some lovin, it does not explain why some of my titles that do exist on the web site will not pull. Let me give an example:
SSX on Tour.iso returns an empty image. When I look up the game on Screenscraper.fr:
- The Name of the Game is "SSX on Tour".
- The Game Name (by Region) is "SSX on Tour" for USA.
- A Publisher, Developer, Rating, Synopsis (English), Screenshot, and Logo exist.
I don't think the matching criteria could get much better than this, yet I still get no Metadata or images.
I also wanted to say that you've got a really great little program here and it has a lot of potential for this hobby. I hope this all comes over as constructive feedback and not negative. :) Keep up the good work. I know a lot of people appreciate what you have put together.
-
How do I manually download the images hosted at screenscraper.fr? I just completed a massive scan of my Playstation (PSX) games, and all it downloaded was screenshots. I have UXS configured to download 2D art (box art) with screenshots as a 2nd choice. But it completely ignored the boxes. I only have like 40 games, so I can get the art manually if need be.
-
@hansolo77 I've never found a way to get the full size images from screenscraper.fr
Right-click and save only gave me small thumbnails, so I got my missing covers from other sites like gamesdb or mobygames
-
That's what I had to do too.. oh well.
-
Just going through all my gamelist.xml files for each system.. looks like every single system has a random assortment of <image/> tags left over, right after the </desc>. When I say "random" I mean, it's not there for every single game.. just randomly scattered through the list, but always in the same location. What I don't understand is why the / is located AFTER the word "image" when it should be before. What's more.. why is it creating that line by itself in the first place? I only notice a problem with rendering the gamelist inside EmulationStation if there are no associated videos with the game. If there is a video, there isn't a problem. But when no video exists, the game list displayed doesn't have any art.
For a bit of history, if it helps.. I had a previous version of UXS create the original gamelist.xml. Then after a few subsequent updates, I've had it set to just UPDATE the existing xml to add back in the missing ROMs. So maybe the "bug" exists in the function to update? I have yet to try, but maybe it's not even a problem with the most recent version of UXS. It could be something with the versions prior. And creating a NEW list instead of updating might not have this problem either.
-
@hansolo77 said in [SOFT] Universal XML Scraper V2 - Easy Scrape with High Quality picture:
Just going through all my gamelist.xml files for each system.. looks like every single system has a random assortment of <image/> tags left over, right after the </desc>. When I say "random" I mean, it's not there for every single game.. just randomly scattered through the list, but always in the same location. What I don't understand is why the / is located AFTER the word "image" when it should be before. What's more.. why is it creating that line by itself in the first place? I only notice a problem with rendering the gamelist inside EmulationStation if there are no associated videos with the game. If there is a video, there isn't a problem. But when no video exists, the game list displayed doesn't have any art.
@hansolo77 I noticed the same thing when I was doing some manual editing. I got the feeling that the slash after the data object just represents an empty object. For instance, instead of using <marquee></marquee> it uses <marquee/>. I admit that it would make more sense to me for nothing to be present for an empty data object; smaller data files, less to parse, etc...but maybe ES isn't written with that in mind.
EDIT: Tested it with the empty data sets removed. ES doesn't care. It worked fine without them. Maybe it is something that will be changed later.
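If you would rather script that cleanup than edit each gamelist.xml by hand, something like the following Python sketch should do it. It assumes the standard `<gameList>`/`<game>` layout, and the list of tags to strip (`image`, `marquee`, `video`) is my own guess at what is worth removing; `strip_empty_tags` and the sample data are made up for illustration. Back up your gamelist first.

```python
import xml.etree.ElementTree as ET


def strip_empty_tags(xml_text, tags=("image", "marquee", "video")):
    """Remove empty elements such as <image/> from every <game> entry.

    Returns the cleaned XML text and the number of elements removed.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    for game in root.iter("game"):
        for child in list(game):  # copy the list, since we remove while looping
            is_empty = not (child.text or "").strip() and len(child) == 0
            if child.tag in tags and is_empty:
                game.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Made-up sample: the first game has a stray <image/> right after </desc>.
SAMPLE = """<gameList>
  <game>
    <path>./Gain Ground.zip</path>
    <name>Gain Ground</name>
    <desc>Example description.</desc>
    <image/>
  </game>
  <game>
    <path>./Other.zip</path>
    <name>Other</name>
    <image>./images/Other.png</image>
  </game>
</gameList>"""

cleaned, removed = strip_empty_tags(SAMPLE)
print(removed)  # 1
```

Entries that actually contain a path are left alone, so only the self-closing leftovers go away.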
-
Dear all,
I tried Universal XML Scraper to scrape the Neogeo Samurai Shodown series (fba folder) for Mix3 (mixed 3 images) on Retropie. I found the media on screenscraper.fr. However, when the scrape finished, only Samurai Shodown V and Samurai Shodown V Special Zero had a full scrape (all information and a mix of 3D boxed art with screenshot and wheel). Could you help me scrape the others? I even tried the sselph scraper on the command line with the mix3 option and it did not work; it just came up with simple 2D box art. That's strange. Thanks for your help!
For example, Samurai Shodown 2 neogeo on the screenscraper.fr link
https://www.screenscraper.fr/gameinfos.php?plateforme=142&gameid=37630
-
The scraper works (by default) by checking the CRC and then the filename. If your ROM doesn't match the information in their databases, that's why it doesn't pull it even if it exists. Double-check the name. I managed to force a match by renaming several of my ROMs to match what the database had for its name. A few of my games were listed multiple times in their database, with many different CRC results but none that matched what I had. So I ended up renaming it and then it worked. If doing that doesn't work either, you can always ask @screech to add your CRC to their list. I've not done that, but he's said in the past it's no problem.
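To see what CRC your own file has, so you can compare it with the values listed on the site, a minimal Python sketch is below. It assumes the CRC in question is the standard CRC-32 (the kind `zlib.crc32` computes) taken over the whole file; the helper name `crc32_of_file` is made up here, and I can't confirm this is exactly how UXS hashes things.

```python
import os
import tempfile
import zlib


def crc32_of_file(path, chunk_size=1024 * 1024):
    """Return the CRC-32 of a file as 8 uppercase hex digits."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # read in chunks so big ISO/BIN files fit in RAM
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08X")


# Demo on a throwaway file; "123456789" is the usual CRC-32 check string.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"123456789")
print(crc32_of_file(tmp.name))  # CBF43926
os.remove(tmp.name)
```

Point it at a ROM instead of the temp file and compare the result with the CRC entries on the game's page.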
-
Thanks for your quick reply. I just used another romset and it worked with a full scrape. Voilà!
-
@hansolo77 said in [SOFT] Universal XML Scraper V2 - Easy Scrape with High Quality picture:
I managed to force a match by renaming several of my ROMs to match what the database had for its name
I need to do that for a bunch of my PSX and PSP games, but they all have the ":" character, which Windows won't let me put in the filename, so they will never scrape.
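One possible workaround, as a rough Python sketch: build a Windows-safe version of the database title by swapping ":" for " - " and dropping the other characters Windows rejects in filenames. The " - " substitution is just a common naming habit I'm assuming here, and `windows_safe_name` and the example title are made up. Whether the scraper's filename matching would accept the substituted name is untested, so treat this as an experiment rather than a fix.

```python
import re

# Characters Windows does not allow in file names.
FORBIDDEN = '<>:"/\\|?*'


def windows_safe_name(title):
    """Turn a game title into a name Windows will accept as a file name."""
    # Assumption: replace a colon (and the spaces around it) with " - ".
    name = re.sub(r"\s*:\s*", " - ", title)
    # Drop any remaining forbidden or control characters.
    name = "".join(ch for ch in name if ch not in FORBIDDEN and ord(ch) >= 32)
    # Windows also rejects names that end in a space or a dot.
    return name.rstrip(" .")


print(windows_safe_name("Some Game: The Sequel"))  # Some Game - The Sequel
```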
-
I'm noticing it's not actually shutting down EmulationStation to scrape like it used to, or letting me do it, and none of the SSH commands are working. Anyone having the same issues?
Mike