How to Scrape Using Sselph Scraper: Tutorial with Command Examples
-
How to Scrape 101
This tutorial will:
- Show the commands I use to scrape existing media sets with already matched higher resolution images
- Provide example commands for pulling more accurate metadata from multiple databases
- Describe what conditions or parameters are involved
Data Types & Paths
Folder Structure
RetroPie
Data Type    Directory
---------    ---------
roms         RetroPie\home\pi\RetroPie\[system_name]
images       RetroPie\home\pi\RetroPie\[system_name]\[images]
Concrete Examples
Data Type    Directory
---------    ---------
roms         RetroPie\home\pi\RetroPie\atari5200\
images       RetroPie\home\pi\RetroPie\atari5200\images\
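As a minimal sketch, the folder layout above can be written as a small helper. The function name `system_paths` is hypothetical, and the base path is copied from the tables above on the assumption that it is a Windows-style share path; adjust it to however the RetroPie folders appear on your machine.

```python
from pathlib import PureWindowsPath

# Base path as written in the tables above (assumed to be a Windows-style share path).
BASE = PureWindowsPath(r"RetroPie\home\pi\RetroPie")

def system_paths(system_name, image_folder="images"):
    """Return the (roms, images) directories for one system."""
    roms = BASE / system_name
    return roms, roms / image_folder

roms_dir, images_dir = system_paths("atari5200")
print(roms_dir)
print(images_dir)
```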
Step 1: Identify Your Romset
Naming Conventions: MAME versus MESS versus No-intro
Parenthesis Types & Meanings
Country Code           Meaning
------------           -------
(U)/(USA)              USA
(E)/(Europe)           Europe
(J)/(Japan)            Japan
(G)                    Germany
(HK)                   Hong Kong
(I)                    Italy
(JU)                   USA/Japan
(JE)                   Europe/Japan
(JUE)                  USA/Europe/Japan
Standardized Code      Meaning
-----------------      -------
[a*]                   Alternative Version
[b*]                   Bad Dump
(Demo)                 Demo
[h*]                   Hack
[f*]                   Fixed to run better on copiers or emulators
(m#)                   Multi-language (# of Languages)
[o]                    Over-dump
[p]                    Pirate
(Proto)/(Prototype)    Prototype
(Rev)                  Revision (# of Revision)
[t]                    Trained
[T]                    Translation
(Unl)                  Unlicensed
ZZZ_                   Unclassified
(-)                    Unknown Year
[!]                    Verified Good Dump
(M#)                   Multilanguage (# of Languages)
(###)                  Checksum
(??k)                  ROM Size
Concrete Examples
Rom Set Type    Naming Convention
------------    -----------------
No-intro        Street Fighter 2 (USA)
MESS            dkong
MAME            sfzhr1
Intended Meaning
Rom Name                  Description
--------                  -----------
Street Fighter 2 (USA)    Street Fighter 2 USA version (Parent Rom)
dkong                     Donkey Kong (Parent Rom)
sfzhr1                    Street Fighter Zero Hack Revision 1 (Clone)
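To make the tag conventions concrete, here is a rough Python sketch that splits a No-intro style name into its title and the meaning of each tag. The dictionaries hold only a subset of the codes from the tables above, and the function name `describe_rom` is hypothetical. It expects the rom name without a file extension. MAME/MESS short names such as dkong carry no tags, so they come back unchanged.

```python
import re

# Subset of the country codes and standardized codes from the tables above.
COUNTRY_CODES = {
    "U": "USA", "USA": "USA",
    "E": "Europe", "Europe": "Europe",
    "J": "Japan", "Japan": "Japan",
    "G": "Germany", "HK": "Hong Kong", "I": "Italy",
    "JU": "USA/Japan", "JE": "Europe/Japan", "JUE": "USA/Europe/Japan",
}
BRACKET_CODES = {
    "a": "Alternative Version", "b": "Bad Dump", "h": "Hack",
    "f": "Fixed to run better on copiers or emulators",
    "o": "Over-dump", "p": "Pirate", "t": "Trained",
    "T": "Translation", "!": "Verified Good Dump",
}

# Matches either a (parenthesised) tag or a [bracketed] tag.
TAG_RE = re.compile(r"\(([^)]*)\)|\[([^\]]*)\]")

def describe_rom(rom_name):
    """Return (title, [tag meanings]) for a rom name without extension."""
    title = TAG_RE.sub("", rom_name).strip()
    meanings = []
    for paren, bracket in TAG_RE.findall(rom_name):
        if paren:
            # Unknown parenthesised tags (e.g. Proto, Rev 1) are kept as written.
            meanings.append(COUNTRY_CODES.get(paren, paren))
        elif bracket:
            # Bracket codes may carry a number, e.g. [h1]; look up the first character.
            meanings.append(BRACKET_CODES.get(bracket[0], bracket))
    return title, meanings

print(describe_rom("Street Fighter 2 (USA)"))
```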
Step 2: Do You Have the Corresponding Rom Set or Do You Need to Rebuild?
Scraper & Database Limitations: No-intro over MESS
Scraper Versus Databases: Supported Systems
Sselph's Scraper is written in Go and matches roms against hash files, so the supported systems below correspond to the hash files for the respective database and to the naming conventions that the metadata and media use.
Scrapers
Sselph's Scraper
It's lightweight and fast, with a high success rate for obtaining metadata.
Rom Set            Supported Systems
-------            -----------------
No-intro           NES
No-intro           SNES
No-intro           N64
No-intro           GB
No-intro           GBC
No-intro           GBA
No-intro           MD
No-intro           SMS
No-intro           32X
No-intro           GG
No-intro           PCE
No-intro           A2600
No-intro           LNX
MAME               MAME/FBA
No-intro/TOSEC*    Dreamcast (bin/gdi)
No-intro/TOSEC*    PSX (bin/cue)
No-intro           ScummVM
No-intro/TOSEC*    SegaCD ROMs
Screenzone's Universal XML Scraper
A caveat of this scraper, available here, is that it isn't as quick as Sselph's scraper; however, it has a GUI that supports several frontends, and it may be more user-friendly for novices.
Rom Set            Supported Systems
-------            -----------------
No-intro & MAME    See Below
Databases
Media and Metadata
Provider Link Screenzone Gamesdb
Rebuilding Your Roms Before Scraping
Using CLRMAMEPro to Rebuild Your Rom Set
Rebuilding has advantages and disadvantages depending on what you are trying to achieve. Rom collectors may like to have a complete set that is pruned to include parent roms only, whereas others may want to keep Unl/Proto/Beta/Demo releases without duplicates.
Level of Expertise                Guidance
------------------                --------
Beginner's Level: How To Guide    CLRMAMEPro Quick-Start Guide
Beginner's Level                  Guide by Herb Fargus
Adept Level: How to Guide         Enter the Matrix
Step 3: Scraping
High Quality versus Quick and Grainy
Higher resolution artwork requires you to match your rom set and your existing media set before using the scraper to pull the missing metadata and create the gamelist.xml. A quick and easy alternative is to let the scraper pull whatever matching media is available at the resolutions provided by the respective database. Personally, I prefer the media I already have over what the scraper has been able to pull in so far. Having said that, the Screenzone media is of high quality and definitely worth using as a free, main source of images.
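Before scraping with images you already own, it can help to check which roms still lack a matching image. The sketch below is only an illustration: the function name `roms_missing_images` is hypothetical, and it assumes each image is named after the rom file (minus its extension) plus an optional suffix and the image format, which is how I read the -image_suffix and -img_format flag descriptions.

```python
from pathlib import Path

def roms_missing_images(rom_dir, image_dir, image_suffix="", img_format="png"):
    """List rom files that have no image named <rom name><suffix>.<format>."""
    rom_dir, image_dir = Path(rom_dir), Path(image_dir)
    missing = []
    for rom in sorted(rom_dir.iterdir()):
        # Skip sub-folders (such as the images folder) and an existing gamelist.
        if not rom.is_file() or rom.name == "gamelist.xml":
            continue
        expected = image_dir / f"{rom.stem}{image_suffix}.{img_format}"
        if not expected.exists():
            missing.append(rom.name)
    return missing
```

For example, `roms_missing_images("atari5200", "atari5200/images")` would return the rom files in that folder that still need artwork; the folder names here are placeholders.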
Sselph's Scraper Flags
Flag                    Effect
----                    ------
-add_not_found          If true, add roms that are not found as an empty gamelist entry.
-append                 If the gamelist file already exists, skip files that are already listed and only append new files.
-console_img string     Comma-separated order of preferred images: s=snapshot, b=boxart, f=fanart, a=banner, l=logo, 3b=3D boxart (default "b").
-download_images        If false, the scraper won't download any images; instead, it checks whether the expected file is already stored locally (default true).
-extra_ext string       Comma-separated list of extensions to also include in the scraper.
-gdb_img string         Deprecated, see console_img. (This will be removed soon.)
-hash_file file         The file containing hash information.
-image_dir directory    The directory to place downloaded images in locally (default "images").
-image_path path        The path to use for images in gamelist.xml (default "images").
-image_suffix suffix    The suffix added after the rom name when creating image files (default "-image").
-img_format             The format images are written in, "jpg" or "png" (default "jpg").
-img_workers N          Use N worker threads to process images. If 0, the same value as -workers is used.
-lang string            The order to choose for language if there is more than one for a value (en, fr, es, de, pt) (default "en").
-mame                   If true, run in MAME mode (default false).
-mame_img string        Comma-separated order of preference for images: s=snap, t=title, m=marquee, c=cabinet (default "t,m,s,c").
-max_width width        The max width of images; larger images will be resized (default 400).
-missing file           The file where information about ROMs that weren't scraped is added.
-nested_img_dir         Use a nested img directory structure that matches rom structure.
-no_thumb               Don't add thumbnails to the gamelist.
-output_file file       The filename the XML outputs to (default "gamelist.xml").
-overview_len N         If set, truncate the overview of roms to N characters + ellipsis.
-refresh                Attempt to download information again, without removing roms that are not scraped.
-region string          The order to choose for region if there is more than one for a value (us, eu, jp, fr, xx) (default "us,eu,jp,fr,xx").
-retries N              Retry a rom N times on an error (default 2).
-rom_dir directory      The directory containing the rom files to process (default ".").
-rom_path path          The path to use for roms in gamelist.xml (default ".").
-scrape_all             If true, scrape all systems listed in es_systems.cfg; all dir/path flags will be ignored.
-skip_check             Skip the check if thegamesdb.net is up.
-start_pprof            If true, start the pprof service used to profile the application.
-strip_unicode          If true, remove all non-ASCII characters (default true).
-thumb_only             Download the thumbnail for both the image and thumb (faster).
-thumb_suffix           The suffix added after the rom name when creating thumb files (default "-thumb").
-use_filename           If true, use the filename minus the extension as the game title in xml.
-use_gdb                Use the hash.csv and theGamesDB metadata (default true).
-use_nointro_name       Use the name in the No-Intro DB instead of the one in the GDB (default true).
-use_ovgdb              Use the OpenVGDB if the hash isn't in hash.csv.
-use_ss                 Use ScreenScraper.fr as a datasource.
-version                Print the release version and exit.
-workers N              Use N worker threads to process roms (default 1).
Commandline Examples
*** I'll add some more examples to this at a later date; I mainly use the first one to pull metadata to match my rom set and artwork in images ***
Result: Scrape, Images Already Matched
Command: scraper -add_not_found=true -append=true -download_images=false -image_dir="images" -image_path="images" -image_suffix="" -img_format="png" -img_workers=0 -no_thumb=true -thumb_only=false -thumb_suffix="" -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4
Result: Scrape MAME, Images Not Already Matched
Command: scraper -add_not_found=true -append=true -download_images=true -image_dir="images" -image_path="images" -image_suffix="" -img_format="png" -img_workers=0 -mame=true -no_thumb=true -thumb_only=false -thumb_suffix="" -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4
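The two commands above differ only in -download_images and -mame, so if you run them from a script it may be easier to build the argument list once. This is just a sketch: the helper name `build_scraper_command` is hypothetical, and it only reproduces the flags and values from the two commands above.

```python
import shlex

def build_scraper_command(download_images, mame=False, workers=4):
    """Build the argument list for the two example commands above."""
    def flag(name, value):
        # Booleans are written as -name=true / -name=false, as in the examples.
        if isinstance(value, bool):
            value = "true" if value else "false"
        return f"-{name}={value}"

    args = [
        "scraper",
        flag("add_not_found", True),
        flag("append", True),
        flag("download_images", download_images),
        flag("image_dir", "images"),
        flag("image_path", "images"),
        flag("image_suffix", ""),
        flag("img_format", "png"),
        flag("img_workers", 0),
    ]
    if mame:
        args.append(flag("mame", True))
    args += [
        flag("no_thumb", True),
        flag("thumb_only", False),
        flag("thumb_suffix", ""),
        flag("use_ss", True),
        flag("use_gdb", True),
        flag("use_ovgdb", True),
        flag("workers", workers),
    ]
    return args

# Images already matched: do not download anything.
print(shlex.join(build_scraper_command(download_images=False)))
# MAME set without matched images: download and run in MAME mode.
print(shlex.join(build_scraper_command(download_images=True, mame=True)))
```

The returned list can be passed to `subprocess.run(args, cwd=rom_folder)`, assuming the scraper binary is on your PATH and `rom_folder` is the system's rom directory, since -rom_dir defaults to the current directory.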
Useful Links
MESS
Content                                                   Link
-------                                                   ----
Supported Systems 2176 (parents: 1060, clones: 1116)      MESS Supported Systems List
Rom Lists in CSV + XML                                    MESS Software Lists
No-intro
What Is It?                                               Link
-----------                                               ----
Used to Update & Rebuild Rom Sets with CLRMAMEPro         No-intro DATs
Tools
What                                             Where
----                                             -----
Rom Set Rebuilder                                CLRMAMEPro
Sselph's Scraper                                 Download
Sselph's Scraper Information                     Github
Universal XML Scraper (RetroPie Version)         Universal XML Scraper
XML Editor                                       Universal XML Editor
-
@DrMaxwell Thanks a lot! Hope you can develop this further!
-
@tien_huu_1408 I'll try to add more to this using the Screenzone scraper options as well.
@sselph Does the -append=true flag now remove roms that cannot be found from the gamelist?
-
Thanks for this post! Since this scraper uses file hashes, it is crucial to have the exact files matching the ones in the database being used. This post is the only place I could find that mentions the exact ROM set you need to match the DB. So it seems to be No-Intro centered?
How does it work for Dreamcast and PlayStation? As far as I know (dat-magic), these systems are not covered by No-Intro.
Also, TOSEC is mentioned in the list, but at least the TOSEC Dreamcast games are in a multi-file format (GDI + track01.bin + track02.raw), which makes them quite uncomfortable to use in RetroPie. I cannot just put them in the roms folder because of duplicated filenames :(
-
@vbs I'd recommend taking a look here:
https://forum.recalbox.com/topic/4690/soft-universal-xml-scraper-v2
-