How to Scrape Using Sselph Scraper: Tutorial with Command Examples
DrMaxwell last edited by DrMaxwell
How to Scrape 101
This tutorial will cover:
- Commands I use to scrape existing media sets that are already matched to higher resolution images
- Example commands for pulling more accurate metadata from multiple databases
- The conditions and parameters involved
Data Types & Paths
Data Type Directory roms
RetroPie\home\pi\RetroPie\ [system_name] \ [images]
Data Type Directory roms
Step 1: Identify Your Romset
Naming Conventions: MAME versus MESS versus No-intro
Parenthesis Types & Meanings
| Country Code | Meaning |
| --- | --- |
| (U)/(USA) | USA |
| (E)/(Europe) | Europe |
| (J)/(Japan) | Japan |
| (G) | Germany |
| (HK) | Hong Kong |
| (I) | Italy |
| (JU) | USA/Japan |
| (JE) | Europe/Japan |
| (JUE) | USA/Europe/Japan |

| Standardized Code | Meaning |
| --- | --- |
| [a*] | Alternative Version |
| [b*] | Bad Dump |
| (Demo) | Demo |
| [h*] | Hack |
| [f*] | Fixed to run better on copiers or emulators |
| (m#) | Multi-language (# of Languages) |
| [o] | Over-dump |
| [p] | Pirate |
| (Proto)/(Prototype) | Prototype |
| (Rev) | Revision (# of Revision) |
| [t] | Trained |
| [T] | Translation |
| (Unl) | Unlicensed |
| ZZZ_ | Unclassified |
| (-) | Unknown Year |
| [!] | Verified Good Dump |
| (M#) | Multilanguage (# of Languages) |
| (###) | Checksum |
| (??k) | ROM Size |
| Rom Set Type | Naming Convention |
| --- | --- |
| No-intro | Street Fighter 2 (USA) |
| MESS | dkong |
| MAME | sfzhr1 |

| Rom Name | Description |
| --- | --- |
| Street Fighter 2 (USA) | Street Fighter 2 USA version (Parent Rom) |
| dkong | Donkey Kong (Parent Rom) |
| sfzhr1 | Street Fighter Zero Hack Revision 1 (Clone) |
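To make the difference between the naming conventions concrete, here is a minimal shell sketch that pulls the first parenthesised tag out of a rom name and guesses which convention the name follows. The function names rom_region and rom_set_style are made up for this example, and the rule "lowercase short name means MAME/MESS" is only a rough heuristic, not something the scraper itself does.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of any scraper.

# Print the text inside the first (...) pair of a rom name, e.g. "USA".
# Prints nothing when the name has no parenthesised tag.
rom_region() {
  printf '%s\n' "${1##*/}" | sed -n 's/^[^(]*(\([^()]*\)).*$/\1/p'
}

# Guess the naming convention from the file name alone (rough heuristic).
rom_set_style() {
  name=${1##*/}
  if [ -n "$(rom_region "$name")" ]; then
    echo "no-intro"            # has a tag such as (USA) or (Europe)
  else
    case $name in
      *[!a-z0-9_.]*|"") echo "unknown" ;;
      *) echo "short-name" ;;  # MAME/MESS style, e.g. dkong or sfzhr1
    esac
  fi
}

# Example:
#   rom_region "Street Fighter 2 (USA).zip"   # prints: USA
#   rom_set_style "dkong"                     # prints: short-name
```

Only the first tag is returned, so a name with several tags, such as a region followed by (Rev) or (Unl), would need extra handling.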
Step 2: Do You have the Corresponding Rom Set or Do You Need to Rebuild?
Scraper & Database Limitations: No-intro over MESS
Scraper Versus Databases: Supported Systems
Sselph's Scraper is written in Go and identifies roms through hash files, so the supported systems below are the ones those hash files cover on the respective database, along with the naming conventions used by its metadata and media.
It's lightweight and fast, with a high efficiency rate for retrieving metadata.
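Because matching is done on file hashes, a rom whose contents differ from the database copy may not be found even if its name looks right. As a rough check you can hash a file yourself and compare it against a DAT or hash list. The sketch below assumes SHA-1 and the GNU sha1sum tool; rom_sha1 and hash_is_known are hypothetical helper names, and the scraper may process some formats before hashing, so treat a mismatch as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; assumes GNU coreutils sha1sum.

# Print the SHA-1 of a file as lowercase hex.
rom_sha1() {
  sha1sum "$1" | cut -d ' ' -f 1
}

# Succeed when the hash of rom file $1 appears anywhere in text file $2
# (for example a local hash list or DAT); the match is case-insensitive.
hash_is_known() {
  grep -qi "$(rom_sha1 "$1")" "$2"
}

# Example (file names are placeholders):
#   hash_is_known "Street Fighter 2 (USA).zip" known_hashes.csv && echo "match"
```

Note that hashing a zip archive hashes the archive, not the rom inside it, so an unzipped copy is the safer thing to compare.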
| Rom Set | Supported Systems |
| --- | --- |
| No-intro | NES |
| No-intro | SNES |
| No-intro | N64 |
| No-intro | GB |
| No-intro | GBC |
| No-intro | GBA |
| No-intro | MD |
| No-intro | SMS |
| No-intro | 32X |
| No-intro | GG |
| No-intro | PCE |
| No-intro | A2600 |
| No-intro | LNX |
| MAME | MAME/FBA |
| No-intro/TOSEC* | Dreamcast (bin/gdi) |
| No-intro/TOSEC* | PSX (bin/cue) |
| No-intro | ScummVM |
| No-intro/TOSEC* | SegaCD ROMs |
Screenzone's Universal XML Scraper
A caveat of this scraper, available here, is that it isn't as quick as Sselph's scraper; however, it has a GUI that supports several frontends, and it may be more user-friendly for novices.
| Rom Set | Supported Systems |
| --- | --- |
| No-intro & MAME | See Below |
Media and Metadata
Provider Link Screenzone Gamesdb
Rebuilding Your Roms Before Scraping
Using CLRMAMEPro to Rebuild Your Rom Set
Rebuilding has advantages and disadvantages depending on what you are trying to achieve. Rom collectors may prefer a complete set pruned to parent roms only, whereas others may want Unl/Proto/Beta/Demo releases without duplicates.
| Level of Expertise | Guidance |
| --- | --- |
| Beginner's Level: How To Guide | CLRMAMEPro Quick-Start Guide |
| Beginner's Level | Guide by Herb Fargus |
| Adept Level: How to Guide | Enter the Matrix |
Step 3: Scraping
High Quality versus Quick and Grainy
Higher resolution artwork requires you to match your rom set and your existing media set before using the scraper to pull the missing metadata and create the gamelist.xml. A quick and easy alternative is to let the scraper pull whatever matching media is available, at the resolutions provided by the respective database. Personally, I prefer the media I already have over what the scraper has been able to pull in so far. Having said that, the Screenzone media is of high quality and definitely worth using as a free, main source of images.
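If you go the "match first" route, it helps to know which roms still lack a same-named image before you run the scraper with image downloads turned off. The sketch below is one possible check in plain shell. The function name roms_missing_images is made up for this example, and it assumes images carry the rom's name with no suffix (as with -image_suffix="" in the commands in this tutorial) and a png extension by default.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# List rom files in directory $1 that have no same-named image in directory $2.
# $3 is the image extension (default "png").
roms_missing_images() {
  rom_dir=$1
  img_dir=$2
  ext=${3:-png}
  for rom in "$rom_dir"/*; do
    [ -f "$rom" ] || continue        # skip sub-directories such as images/
    base=${rom##*/}
    case $base in
      *.xml) continue ;;             # skip gamelist.xml itself
    esac
    stem=${base%.*}                  # file name without its extension
    [ -f "$img_dir/$stem.$ext" ] || printf '%s\n' "$base"
  done
}

# Example (run from inside a system's rom folder):
#   roms_missing_images . images png
```

An empty result suggests every rom has a matching image, so a metadata-only scrape should pick them all up.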
Sselph's Scraper Flags
If true, add roms that are not found as an empty gamelist entry.
If the gamelist file already exists, skip files that are already listed and only append new files.
Comma separated order of preferred images, s=snapshot, b=boxart, f=fanart, a=banner, l=logo, 3b=3D boxart. (default "b")
If false, the scraper won't download any images; instead, it checks whether the expected file is already stored locally. (default "true")
Comma separated list of extensions to also include in the scraper.
Deprecated, see console_img. (This will be removed soon).
The file containing hash information.
The directory to place downloaded images in locally (default "images").
The path to use for images in gamelist.xml (default "images").
The suffix added after the rom name when creating image files (default "-image").
The format images are written ("jpg" or "png")(default "jpg").
Use N worker threads to process images. If 0, the same value as the rom worker threads is used.
The order to choose for language if there is more than one for a value. (en, fr, es, de, pt) (default "en")
If true, run in MAME mode (default="false").
Comma separated order of preference for images, s=snap, t=title, m=marquee, c=cabinet. (default "t,m,s,c")
The max width of images; Larger images will be resized. (default 400)
The file where information about ROMs that weren't scraped is added.
Use a nested img directory structure that matches rom structure.
Don't add thumbnails to the gamelist.
The filename the XML outputs to (default "gamelist.xml").
If set it will truncate the overview of roms to N characters + ellipsis.
Attempts to download information again, but won't remove roms that are not scraped.
The order to choose for region if there is more than one for a value (us, eu, jp, fr, xx) (default "us,eu,jp,fr,xx").
Retry a rom N times on an error (default 2).
The directory containing the rom files to process (default ".").
The path to use for roms in gamelist.xml (default ".").
If true, scrape all systems listed in es_systems.cfg; all dir/path flags will be ignored.
Skip the check if thegamesdb.net is up.
If true, start the pprof service used to profile the application.
If true, remove all non-ascii characters. (default true)
Download the thumbnail for both the image and thumb (faster).
The suffix added after rom name when creating thumb files (default "-thumb").
If true, use the filename minus the extension as the game title in xml.
Use the hash.csv and theGamesDB metadata (default true).
Use the name in the No-Intro DB instead of the one in the GDB (default true).
Use the OpenVGDB if the hash isn't in hash.csv.
Use the ScreenScraper.fr as a datasource.
Print the release version and exit.
Use N worker threads to process roms (default 1).
*** I'll add some more examples to this at a later date; I mainly use the first one to pull metadata to match my rom set and artwork in images ***
Result: Scrape, Images Already Matched
Command:
scraper -add_not_found=true -append=true -download_images=false -image_dir="images" -image_path="images" -image_suffix="" -img_format="png" -img_workers=0 -no_thumb=true -thumb_only=false -thumb_suffix="" -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4
Result: Scrape MAME, Images Not Already Matched
Command:
scraper -add_not_found=true -append=true -download_images=true -image_dir="images" -image_path="images" -image_suffix="" -img_format="png" -img_workers=0 -mame=true -no_thumb=true -thumb_only=false -thumb_suffix="" -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4
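The commands above are run from inside a single system's rom folder. The flag list mentions an option that scrapes every system in es_systems.cfg, but it ignores the dir/path flags, so a small loop is one way to keep them in effect across several systems. This is only a sketch: scrape_all_systems and DRY_RUN are names invented for this example, the roms root you pass in is an assumption about your own layout, and by default it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical wrapper for illustration only.
# Flags copied from the "images already matched" command in this tutorial
# (empty suffixes are written as -image_suffix= and -thumb_suffix=).
SCRAPER_FLAGS='-add_not_found=true -append=true -download_images=false -image_dir=images -image_path=images -image_suffix= -img_format=png -img_workers=0 -no_thumb=true -thumb_only=false -thumb_suffix= -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4'

# Run (or, by default, just print) the scraper once per sub-folder of $1.
scrape_all_systems() {
  roms_root=$1
  for sys_dir in "$roms_root"/*/; do
    [ -d "$sys_dir" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'cd %s && scraper %s\n' "$sys_dir" "$SCRAPER_FLAGS"
    else
      # Word splitting of $SCRAPER_FLAGS is intentional here.
      ( cd "$sys_dir" && scraper $SCRAPER_FLAGS )
    fi
  done
}

# Example (the path is an assumption about a typical RetroPie setup):
#   scrape_all_systems "$HOME/RetroPie/roms"             # prints the commands
#   DRY_RUN=0 scrape_all_systems "$HOME/RetroPie/roms"   # actually runs them
```

For MAME folders you would still want the second command, with -mame=true and -download_images=true, instead of this flag set.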
| Content | Link |
| --- | --- |
| Supported Systems 2176 (parents: 1060, clones: 1116) | MESS Supported Systems List |
| Rom Lists in CSV + XML | MESS Software Lists |
| What Is It? | Link |
| --- | --- |
| Used to Update & Rebuild Rom Sets with CLRMAMEPro | No-intro DATs |
| What | Where |
| --- | --- |
| Rom Set Rebuilder | CLRMAMEPro |
| Sselph's Scraper | Download |
| Sselph's Scraper Information | Github |
| Universal XML Scraper (RetroPie Version) | Universal XML Scraper |
| XML Editor | Universal XML Editor |
@DrMaxwell Thanks a lot! Hope you could develop!
vbs last edited by
Thanks for this post! Since this scraper uses file hashes, it is crucial to have files that exactly match the ones in the database being used. This post is the only place I could find that mentions the exact ROM set you need to match the DB. So it seems to be No-Intro centered?
How does it work for Dreamcast and Playstation? As far as I know (dat-magic), these systems are not covered by No-Intro.
Also, TOSEC is mentioned in the list, but at least the TOSEC Dreamcast games are in a multi-file format (GDI + track01.bin + track02.raw), which makes them quite uncomfortable to use in RetroPie. I cannot just put them in the roms folder because of duplicated filenames :(
@vbs I'd recommend you to take a look here: