
How to Scrape Using Sselph Scraper: Tutorial with Command Examples



  • How to Scrape 101

    This tutorial will cover:

    • Commands I use to scrape against an existing media set of higher-resolution images that are already matched to the roms
    • Example commands for pulling more accurate metadata from multiple databases
    • The conditions and parameters involved

    Data Types & Paths

    Folder Structure

    RetroPie

    Data Type Directory
    roms RetroPie\home\pi\RetroPie\ [system_name]
    images RetroPie\home\pi\RetroPie\ [system_name] \ [images]

    Concrete Examples

    Data Type Directory
    roms RetroPie\home\pi\RetroPie\atari5200\
    images RetroPie\home\pi\RetroPie\atari5200\images\

    Step 1: Identify Your Romset

    Naming Conventions: MAME versus MESS versus No-intro

    Parenthesis Types & Meanings

    Country Code Meaning
    (U)/(USA) USA
    (E)/(Europe) Europe
    (J)/(Japan) Japan
    (G) Germany
    (HK) Hong Kong
    (I) Italy
    (JU) USA/Japan
    (JE) Europe/Japan
    (JUE) USA/Europe/Japan

    Standardized Code Meaning
    [a*] Alternative Version
    [b*] Bad Dump
    (Demo) Demo
    [h*] Hack
    [f*] Fixed to run better on copiers or emulators
    (m#) Multi-language (# of Languages)
    [o] Over-dump
    [p] Pirate
    (Proto)/(Prototype) Prototype
    (Rev) Revision (# of Revision)
    [t] Trained
    [T] Translation
    (Unl) Unlicensed
    ZZZ_ Unclassified
    (-) Unknown Year
    [!] Verified Good Dump
    (M#) Multilanguage (# of Languages)
    (###) Checksum
    (??k) ROM Size
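
    To make the tag tables above concrete, here is a minimal Python sketch that splits a rom filename into its title and its parenthesis/bracket tags. The function name split_rom_name and the sample filename are made up for illustration, and the sketch only separates the tags; it does not interpret them.

```python
import re

# Matches one parenthesised or bracketed tag, e.g. "(USA)" or "[!]".
TAG_PATTERN = re.compile(r"(\([^)]*\)|\[[^\]]*\])")


def split_rom_name(filename):
    """Return (title, tags) for a rom filename (hypothetical helper).

    The title is everything before the first tag. Tags keep their
    brackets so that "(U)" and "[!]" stay distinguishable.
    """
    # Strip a short trailing extension such as ".zip" or ".nes" only.
    stem = re.sub(r"\.[A-Za-z0-9]{1,4}$", "", filename)
    tags = TAG_PATTERN.findall(stem)
    title = TAG_PATTERN.split(stem, maxsplit=1)[0].strip()
    return title, tags


print(split_rom_name("Street Fighter 2 (USA) [!].zip"))
# → ('Street Fighter 2', ['(USA)', '[!]'])
```

    MAME/MESS short names such as dkong carry no tags, so they come back with an empty tag list.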

    Concrete Examples

    Rom Set Type Naming Convention
    No-intro Street Fighter 2 (USA)
    MESS dkong
    MAME sfzhr1

    Intended Meaning

    Rom Name Description
    Street Fighter 2 (USA) Street Fighter 2 USA version (Parent Rom)
    dkong Donkey Kong (Parent Rom)
    sfzhr1 Street Fighter Zero Hack Revision 1 (Clone)

    Step 2: Do You have the Corresponding Rom Set or Do You Need to Rebuild?

    Scraper & Database Limitations: No-intro over MESS

    Scraper Versus Databases: Supported Systems
    Sselph's Scraper is written in Go and identifies roms by looking up their hashes in hash files, so the supported systems listed below are the ones covered by the hash files for the respective database, and they follow the naming conventions used by that database's metadata and media.
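
    Because matching is done on file hashes rather than file names, it can help to check what hash one of your roms actually has. The sketch below is a rough illustration only: it assumes a SHA-1 digest of the raw file, whereas the real scraper may preprocess some formats (for example headers or archives) before hashing, so a mismatch here is not conclusive. The helper name sha1_of_file is hypothetical.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks.

    Assumption: the digest is taken over the raw, unmodified file.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read until read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    Calling sha1_of_file on a rom and comparing the result with the entry in a No-intro DAT is a quick way to see whether the file is the exact dump the database expects.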

    Scrapers

    Sselph's Scraper
    It's lightweight and fast, with a high success rate at retrieving metadata.

    Rom Set Supported Systems
    No-intro NES
    No-intro SNES
    No-intro N64
    No-intro GB
    No-intro GBC
    No-intro GBA
    No-intro MD
    No-intro SMS
    No-intro 32X
    No-intro GG
    No-intro PCE
    No-intro A2600
    No-intro LNX
    MAME MAME/FBA
    No-intro/ TOSEC* Dreamcast(bin/gdi)
    No-intro/ TOSEC* PSX(bin/cue)
    No-intro ScummVM
    No-intro/ TOSEC* SegaCD ROMs

    Screenzone's Universal XML Scraper
    A caveat of this scraper, available here, is that it isn't as quick as Sselph's scraper; however, it has a GUI that supports several frontends, and it may be more user-friendly for novices.

    Rom Set Supported Systems
    No-intro & MAME See Below

    Databases

    Media and Metadata

    Provider Link
    Screenzone Shiny
    Gamesdb Not So Shiny

    Rebuilding Your Roms Before Scraping

    Using CLRMAMEPro to Rebuild Your Rom Set

    Rebuilding has advantages and disadvantages depending on what you are trying to achieve. Rom collectors may like to have a complete set that is pruned to include parent roms only, whereas others may want to have Unl/Proto/Beta/Demo releases without duplicates.
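
    As a rough illustration of what "without duplicates" can mean, the Python sketch below keeps one file per game title, chosen by a region preference. This is a filename-based approximation, not what CLRMAMEPro does (it works from DAT files and their parent/clone data); the function name, the region order, and the sample filenames are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical preference order; adjust to taste.
REGION_ORDER = ["USA", "Europe", "Japan"]


def pick_one_per_title(filenames, region_order=REGION_ORDER):
    """Group filenames by title and keep the preferred region of each."""
    groups = defaultdict(list)
    for name in filenames:
        # The title is everything before the first "(" or "[".
        title = re.split(r"\s*[\(\[]", name, maxsplit=1)[0]
        groups[title].append(name)

    def rank(name):
        # Lower rank wins; files with no listed region sort last.
        for position, region in enumerate(region_order):
            if f"({region})" in name:
                return position
        return len(region_order)

    return {title: min(names, key=rank) for title, names in groups.items()}


print(pick_one_per_title(["Game (Europe).zip", "Game (USA).zip", "Other (Japan).zip"]))
```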

    Level of Expertise Guidance
    Beginner's Level: How To Guide CLRMAMEPro Quick-Start Guide
    Beginner's Level Guide by Herb Fargus
    Adept Level: How to Guide Clrmame Enter the Matrix

    Step 3: Scraping

    High Quality versus Quick and Grainy

    Higher-resolution artwork requires you to match your rom set and your existing media set before using the scraper to pull the missing metadata and create the gamelist.xml. A quick and easy alternative is to let the scraper pull whatever matching media is available at the resolutions provided by the respective database. Personally, I prefer the media I already have over what the scraper has been able to pull in so far. Having said that, the Screenzone media is of high quality and definitely worth using as a free, main source of images.
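
    Before scraping against an existing media set, it can be useful to see which roms have no matching image yet. The sketch below assumes each image sits in an images subfolder of the rom folder and is named after the rom (filename minus extension) plus an optional suffix, as the -image_dir and -image_suffix flags described further down suggest. The helper name and the list of image extensions are assumptions for illustration.

```python
from pathlib import Path

# Assumed image types; extend if your media set uses others.
IMAGE_EXTENSIONS = {".png", ".jpg"}


def roms_missing_images(rom_dir, image_dir="images", image_suffix=""):
    """Return sorted rom filenames that have no matching image file."""
    rom_dir = Path(rom_dir)
    image_root = rom_dir / image_dir
    image_stems = set()
    if image_root.is_dir():
        image_stems = {
            p.stem for p in image_root.iterdir()
            if p.suffix.lower() in IMAGE_EXTENSIONS
        }
    missing = []
    for rom in sorted(rom_dir.iterdir()):
        # Skip sub-directories and the gamelist itself.
        if not rom.is_file() or rom.name == "gamelist.xml":
            continue
        if rom.stem + image_suffix not in image_stems:
            missing.append(rom.name)
    return missing
```

    Running roms_missing_images on a system folder such as atari5200 gives a list of the roms whose artwork still needs to be renamed or sourced.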

    Sselph's Scraper Flags

    Flag Effect
    -add_not_found If true, add roms that are not found as an empty gamelist entry.
    -append If the gamelist file already exists, skip files that are already listed and only append new files.
    -console_img string Comma-separated order of preferred images: s=snapshot, b=boxart, f=fanart, a=banner, l=logo, 3b=3D boxart. (default "b")
    -download_images If false, the scraper won't download any images; instead, it checks whether the expected file is already stored locally. (default true)
    -extra_ext string Comma-separated list of extensions to also include in the scraper.
    -gdb_img string Deprecated, see console_img. (This will be removed soon).
    -hash_file file The file containing hash information.
    -image_dir directory The directory to place downloaded images in locally (default "images").
    -image_path path The path to use for images in gamelist.xml (default "images").
    -image_suffix suffix The suffix added after the rom name when creating image files (default "-image").
    -img_format The format images are written ("jpg" or "png")(default "jpg").
    -img_workers N Use N worker threads to process images. If 0, then it applies the same value as workers thread.
    -lang string The order to choose for language if there is more than one for a value. (en, fr, es, de, pt) (default "en")
    -mame If true, run in MAME mode. (default false)
    -mame_img string Comma-separated order of preference for images: s=snap, t=title, m=marquee, c=cabinet. (default "t,m,s,c")
    -max_width width The max width of images; Larger images will be resized. (default 400)
    -missing file The file where information about ROMs that weren't scraped is added.
    -nested_img_dir Use a nested img directory structure that matches rom structure.
    -no_thumb Don't add thumbnails to the gamelist.
    -output_file file The filename the XML outputs to (default "gamelist.xml").
    -overview_len N If set it will truncate the overview of roms to N characters + ellipsis.
    -refresh Attempts to download information again, but won't remove roms that are not scraped.
    -region string The order to choose for region if there is more than one for a value (us, eu, jp, fr, xx) (default "us,eu,jp,fr,xx").
    -retries N Retry a rom N times on an error (default 2).
    -rom_dir directory The directory containing the rom files to process (default ".").
    -rom_path path The path to use for roms in gamelist.xml (default ".").
    -scrape_all If true, scrape all systems listed in es_systems.cfg; all dir/path flags will be ignored.
    -skip_check Skip the check if thegamesdb.net is up.
    -start_pprof If true, start the pprof service used to profile the application.
    -strip_unicode If true, remove all non-ascii characters. (default true)
    -thumb_only Download the thumbnail for both the image and thumb (faster).
    -thumb_suffix The suffix added after rom name when creating thumb files (default "-thumb").
    -use_filename If true, use the filename minus the extension as the game title in xml.
    -use_gdb Use the hash.csv and theGamesDB metadata (default true).
    -use_nointro_name Use the name in the No-Intro DB instead of the one in the GDB (default true).
    -use_ovgdb Use the OpenVGDB if the hash isn't in hash.csv.
    -use_ss Use the ScreenScraper.fr as a datasource.
    -version Print the release version and exit.
    -workers N Use N worker threads to process roms (default 1).

    Commandline Examples

    *** I'll add some more examples to this at a later date; I mainly use the first one to pull metadata to match my rom set and artwork in images ***

    Result: Scrape, Images Already Matched
    Command: scraper -add_not_found=true -append=true -download_images=false -image_dir="images" -image_path="images" -image_suffix="" -img_format="png" -img_workers=0 -no_thumb=true -thumb_only=false -thumb_suffix="" -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4

    Result: Scrape MAME, Images Not Already Matched
    Command: scraper -add_not_found=true -append=true -download_images=true -image_dir="images" -image_path="images" -image_suffix="" -img_format="png" -img_workers=0 -mame=true -no_thumb=true -thumb_only=false -thumb_suffix="" -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4
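
    Long flag lists like these are easy to mistype, so one option is to keep the flags in a small Python dictionary and build the command from it. The sketch below reproduces the first command above; the helper name build_scraper_command is made up, and it assumes the executable is called scraper and is on your PATH. It only prints the command; running it is left commented out.

```python
import shlex


def build_scraper_command(flags, executable="scraper"):
    """Turn a {flag: value} dict into an argument list for the scraper."""
    args = [executable]
    for name, value in flags.items():
        if isinstance(value, bool):
            # The flags above are written as =true / =false.
            value = "true" if value else "false"
        args.append(f"-{name}={value}")
    return args


# Flags copied from the "images already matched" command above.
flags = {
    "add_not_found": True,
    "append": True,
    "download_images": False,
    "image_dir": "images",
    "image_path": "images",
    "image_suffix": "",
    "img_format": "png",
    "img_workers": 0,
    "no_thumb": True,
    "thumb_only": False,
    "thumb_suffix": "",
    "use_ss": True,
    "use_gdb": True,
    "use_ovgdb": True,
    "workers": 4,
}

command = build_scraper_command(flags)
print(shlex.join(command))

# To actually run it from a rom folder (assumption: scraper is installed):
# import subprocess
# subprocess.run(command, cwd="path/to/system/roms", check=True)
```

    Since -rom_dir defaults to the current directory, the command is meant to be run from inside the system's rom folder. For the MAME variant, add "mame": True and set "download_images" to True.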

    Useful Links

    MESS

    Content Link
    Supported Systems 2176 (parents: 1060, clones: 1116) MESS Supported Systems List
    Rom Lists in CSV + XML MESS Software Lists

    No-intro

    What Is It? Link
    Used to Update & Rebuild Rom Sets with CLRMAMEPro No-intro DATs

    Tools

    What Where
    Rom Set Rebuilder CLRMAMEPro
    Sselph's Scraper Download
    Sselph's Scraper Information Github
    Universal XML Scraper (RetroPie Version) Universal XML Scraper
    XML Editor Universal XML Editor

    References

    1. Screen Scraper
    2. Sselph's Scraper
    3. Project Retrobution


  • @DrMaxwell Thanks a lot! I hope you can develop this further!



  • @tien_huu_1408 I'll try to add more to this using the Screenzone scraper options as well.
    @sselph Does the -append=true flag now remove roms that cannot be found from the gamelist?



  • Thanks for this post! Since this scraper uses file hashes, it is crucial to have files that exactly match the ones in the database being used. This post is the only place I could find that mentions the exact ROM set you need to match the DB. So it seems to be No-Intro centered?
    How does it work for Dreamcast and PlayStation? As far as I know (dat-magic), these systems are not covered by No-Intro.
    TOSEC is also mentioned in the list, but at least the TOSEC Dreamcast games are in a multi-file format (GDI + track01.bin + track02.raw), which makes them quite awkward to use in RetroPie. I cannot just put them in the roms folder because of duplicate filenames :(




