    How to Scrape Using Sselph Scraper: Tutorial with Command Examples

      DrMaxwell

      How to Scrape 101

      This tutorial will:

      • Show the commands I use to scrape against existing media sets whose higher-resolution images are already matched to the roms
      • Provide example commands for pulling more accurate metadata from multiple databases
      • Describe the conditions and parameters involved

      Data Types & Paths

      Folder Structure

      RetroPie

      Data Type   Directory
      roms        RetroPie\home\pi\RetroPie\[system_name]
      images      RetroPie\home\pi\RetroPie\[system_name]\[images]

      Concrete Examples

      Data Type Directory
      roms RetroPie\home\pi\RetroPie\atari5200\
      images RetroPie\home\pi\RetroPie\atari5200\images\
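
The layout above can be expressed as a small helper. This is only a sketch: the root path is copied from the table as written, and `system_paths` is a hypothetical name, not part of any scraper.

```python
from pathlib import PureWindowsPath

# Root as written in the table above; adjust it to match your own setup.
SHARE_ROOT = PureWindowsPath(r"RetroPie\home\pi\RetroPie")


def system_paths(system_name, image_dir="images"):
    """Return (rom_dir, image_dir) for one system, following the table layout."""
    rom_dir = SHARE_ROOT / system_name
    return rom_dir, rom_dir / image_dir


roms, images = system_paths("atari5200")
print(roms)
print(images)
```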

      Step 1: Identify Your Romset

      Naming Conventions: MAME versus MESS versus No-intro

      Parenthesis Types & Meanings

      Country Code Meaning
      (U)/(USA) USA
      (E)/(Europe) Europe
      (J)/(Japan) Japan
      (G) Germany
      (HK) Hong Kong
      (I) Italy
      (JU) USA/Japan
      (JE) Europe/Japan
      (JUE) USA/Europe/Japan

      Standardized Code Meaning
      ------------- -------------
      [a*] Alternative Version
      [b*] Bad Dump
      (Demo) Demo
      [h*] Hack
      [f*] Fixed to run better on copiers or emulators
      (m#) Multi-language (# of Languages)
      [o] Over-dump
      [p] Pirate
      (Proto)/(Prototype) Prototype
      (Rev) Revision (# of Revision)
      [t] Trained
      [T] Translation
      (Unl) Unlicensed
      ZZZ_ Unclassified
      (-) Unknown Year
      [!] Verified Good Dump
      (M#) Multilanguage (# of Languages)
      (###) Checksum
      (??k) ROM Size
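
To make the codes concrete, here is a rough sketch that splits a file name into its title, region and flags. It only knows a subset of the codes listed above, and `parse_rom_name` is a name made up for this example; it is not how any scraper parses names.

```python
import re
from pathlib import Path

# Subset of the codes from the tables above; extend as needed.
COUNTRY_CODES = {
    "U": "USA", "USA": "USA",
    "E": "Europe", "Europe": "Europe",
    "J": "Japan", "Japan": "Japan",
    "G": "Germany", "HK": "Hong Kong", "I": "Italy",
    "JU": "USA/Japan", "JE": "Europe/Japan", "JUE": "USA/Europe/Japan",
}
BRACKET_CODES = {
    "a": "Alternative Version", "b": "Bad Dump", "h": "Hack",
    "f": "Fixed", "o": "Over-dump", "p": "Pirate",
    "t": "Trained", "T": "Translation", "!": "Verified Good Dump",
}

# Matches "(...)" in group 1 or "[...]" in group 2.
TAG_RE = re.compile(r"\(([^)]*)\)|\[([^\]]*)\]")


def parse_rom_name(filename):
    """Split a rom file name (with extension) into title, regions and flags."""
    stem = Path(filename).stem
    title = re.sub(r"\s{2,}", " ", TAG_RE.sub("", stem)).strip()
    regions, flags = [], []
    for paren, bracket in TAG_RE.findall(stem):
        if paren:
            if paren in COUNTRY_CODES:
                regions.append(COUNTRY_CODES[paren])
            else:
                flags.append(paren)  # e.g. Unl, Proto, Rev 1
        elif bracket:
            # The first character selects the code, so [b1] counts as a bad dump.
            flags.append(BRACKET_CODES.get(bracket[0], bracket))
    return {"title": title, "regions": regions, "flags": flags}


print(parse_rom_name("Street Fighter 2 (USA) [!].zip"))
```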

      Concrete Examples

      Rom Set Type Naming Convention
      No-intro Street Fighter 2 (USA)
      MESS dkong
      MAME sfzhr1

      Intended Meaning

      Rom Name Description
      Street Fighter 2 (USA) Street Fighter 2 USA version (Parent Rom)
      dkong Donkey Kong (Parent Rom)
      sfzhr1 Street Fighter Zero, Hispanic region, Revision 1 (Clone)
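
A quick way to tell which convention a file follows is to look at the shape of its name. The check below is a heuristic I am assuming for illustration (tags in brackets versus a short lowercase name); it cannot tell MAME from MESS, and `guess_set_type` is a hypothetical helper.

```python
import re

# Short driver-style names: lowercase letters, digits and underscores only.
SHORT_NAME = re.compile(r"^[a-z0-9_]+$")


def guess_set_type(stem):
    """Guess the naming convention from a rom name without its extension."""
    if "(" in stem or "[" in stem:
        return "No-intro style"
    if SHORT_NAME.match(stem):
        return "MAME/MESS style"
    return "unknown"


for name in ("Street Fighter 2 (USA)", "dkong", "sfzhr1"):
    print(name, "->", guess_set_type(name))
```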

      Step 2: Do You Have the Corresponding Rom Set or Do You Need to Rebuild?

      Scraper & Database Limitations: No-intro over MESS

      Scraper Versus Databases: Supported Systems
      Sselph's Scraper is written in Go and identifies roms through hash files. The supported systems listed below are therefore the ones those hash files cover, along with the rom set naming convention that each database uses for its metadata and media.
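
Because matching is hash based, a rom that differs by even one byte from the database copy will not be found. The sketch below hashes a file so you can compare it against a DAT entry by hand. It assumes SHA-1 over the whole file; the scraper itself may treat headers or archives differently, so take this as an illustration rather than its exact method.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (hypothetical file name):
# print(sha1_of_file("Street Fighter 2 (USA).sfc"))
```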

      Scrapers

      Sselph's Scraper
      It's lightweight and fast, with a high success rate at matching metadata.

      Rom Set Supported Systems
      No-intro NES
      No-intro SNES
      No-intro N64
      No-intro GB
      No-intro GBC
      No-intro GBA
      No-intro MD
      No-intro SMS
      No-intro 32X
      No-intro GG
      No-intro PCE
      No-intro A2600
      No-intro LNX
      MAME MAME/FBA
      No-intro/TOSEC* Dreamcast (bin/gdi)
      No-intro/TOSEC* PSX (bin/cue)
      No-intro ScummVM
      No-intro/TOSEC* SegaCD ROMs

      Screenzone's Universal XML Scraper
      One caveat of this scraper (listed under Tools below) is that it isn't as quick as Sselph's scraper; however, it has a GUI that supports several frontends, and it may be more user-friendly for novices.

      Rom Set Supported Systems
      No-intro & MAME See Below

      Databases

      Media and Metadata

      Provider Link
      Screenzone Shiny
      Gamesdb Not So Shiny

      Rebuilding Your Roms Before Scraping

      Using CLRMAMEPro to Rebuild Your Rom Set

      Rebuilding has advantages and disadvantages depending on what you are trying to achieve. Rom collectors may want a complete set that is pruned to include parent roms only, whereas others may want to keep Unl/Proto/Beta/Demo releases without duplicates.

      Level of Expertise Guidance
      Beginner's Level: How To Guide CLRMAMEPro Quick-Start Guide
      Beginner's Level Guide by Herb Fargus
      Adept Level: How To Guide Clrmame Enter the Matrix

      Step 3: Scraping

      High Quality versus Quick and Grainy

      Higher-resolution artwork requires you to match your rom set and your existing media set before using the scraper to pull the missing metadata and create the gamelist.xml. A quick and easy alternative is to let the scraper pull whatever matching media is available, at the resolutions the respective database provides. Personally, I prefer the media I already have over what the scraper has been able to pull so far. Having said that, the Screenzone media is of high quality and definitely worth using as a free, main source of images.
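
Before scraping against an existing media set, it helps to know which roms have no matching image yet. This is a minimal sketch, assuming images sit in an `images` subfolder and are named after the rom (plus an optional suffix). `find_unmatched_roms` and its default extensions are assumptions for illustration only.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg"}


def find_unmatched_roms(rom_dir, image_dir="images", image_suffix="",
                        rom_extensions=(".zip",)):
    """List rom file names in rom_dir that have no same-named image."""
    rom_dir = Path(rom_dir)
    image_root = rom_dir / image_dir
    image_stems = set()
    if image_root.is_dir():
        image_stems = {p.stem for p in image_root.iterdir()
                       if p.suffix.lower() in IMAGE_EXTENSIONS}
    unmatched = []
    for rom in sorted(rom_dir.iterdir()):
        # Skip folders and anything that is not a rom (gamelist.xml, etc.).
        if not rom.is_file() or rom.suffix.lower() not in rom_extensions:
            continue
        if rom.stem + image_suffix not in image_stems:
            unmatched.append(rom.name)
    return unmatched


# Example (hypothetical path):
# print(find_unmatched_roms("/home/pi/RetroPie/roms/atari5200"))
```

Anything this reports would need its image renamed or added before a run with -download_images=false can link it.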

      Sselph's Scraper Flags

      Flag Effect
      -add_not_found If true, add roms that are not found as an empty gamelist entry.
      -append If the gamelist file already exists, skip files that are already listed and only append new files.
      -console_img string Comma separated order of preferred images, s=snapshot, b=boxart, f=fanart, a=banner, l=logo, 3b=3D boxart. (default "b")
      -download_images If false, the scraper won't download any images; instead, it checks whether the expected file is already stored locally. (default true)
      -extra_ext string Comma separated list of extensions to also include in the scraper.
      -gdb_img string Deprecated, see console_img. (This will be removed soon).
      -hash_file file The file containing hash information.
      -image_dir directory The directory to place downloaded images in locally (default "images").
      -image_path path The path to use for images in gamelist.xml (default "images").
      -image_suffix suffix The suffix added after the rom name when creating image files (default "-image").
      -img_format The format images are written in ("jpg" or "png") (default "jpg").
      -img_workers N Use N worker threads to process images. If 0, the same value as -workers is used.
      -lang string The order to choose for language if there is more than one for a value. (en, fr, es, de, pt) (default "en")
      -mame If true, run in MAME mode (default false).
      -mame_img string Comma separated order of preference for images, s=snap, t=title, m=marquee, c=cabinet. (default "t,m,s,c")
      -max_width width The max width of images; Larger images will be resized. (default 400)
      -missing file The file where information about ROMs that weren't scraped is added.
      -nested_img_dir Use a nested img directory structure that matches rom structure.
      -no_thumb Don't add thumbnails to the gamelist.
      -output_file file The filename the XML outputs to (default "gamelist.xml").
      -overview_len N If set it will truncate the overview of roms to N characters + ellipsis.
      -refresh Attempt to download information again; roms that are not scraped won't be removed.
      -region string The order to choose for region if there is more than one for a value (us, eu, jp, fr, xx) (default "us,eu,jp,fr,xx").
      -retries N Retry a rom N times on an error (default 2).
      -rom_dir directory The directory containing the roms file to process (default ".").
      -rom_path path The path to use for roms in gamelist.xml (default ".").
      -scrape_all If true, scrape all systems listed in es_systems.cfg; all dir/path flags will be ignored.
      -skip_check Skip the check if thegamesdb.net is up.
      -start_pprof If true, start the pprof service used to profile the application.
      -strip_unicode If true, remove all non-ascii characters. (default true)
      -thumb_only Download the thumbnail for both the image and thumb (faster).
      -thumb_suffix The suffix added after rom name when creating thumb files (default "-thumb").
      -use_filename If true, use the filename minus the extension as the game title in xml.
      -use_gdb Use the hash.csv and theGamesDB metadata (default true).
      -use_nointro_name Use the name in the No-Intro DB instead of the one in the GDB (default true).
      -use_ovgdb Use the OpenVGDB if the hash isn't in hash.csv.
      -use_ss Use the ScreenScraper.fr as a datasource.
      -version Print the release version and exit.
      -workers N Use N worker threads to process roms (default 1).
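
With this many flags, it can be easier to keep them in one place and generate the command from that. The sketch below turns a dictionary into a `-flag=value` argument list. `build_scraper_args` is a hypothetical helper, and the executable name `scraper` is taken from the examples in this post.

```python
def build_scraper_args(options, executable="scraper"):
    """Turn {"flag": value} into ["scraper", "-flag=value", ...]."""
    args = [executable]
    for flag, value in options.items():
        if isinstance(value, bool):
            # Write booleans as the literal words true/false.
            value = "true" if value else "false"
        args.append(f"-{flag}={value}")
    return args


options = {
    "download_images": False,
    "img_format": "png",
    "use_ss": True,
    "workers": 4,
}
print(" ".join(build_scraper_args(options)))
```

The resulting list can be passed to `subprocess.run(args, cwd=rom_dir)` if you want to run it from Python, since the scraper defaults to the current directory for -rom_dir.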

      Commandline Examples

      *** I'll add some more examples to this at a later date; I mainly use the first one to pull metadata to match my rom set and artwork in images ***

      Result: Scrape, Images Already Matched
      Command: scraper -add_not_found=true -append=true -download_images=false -image_dir="images" -image_path="images" -image_suffix="" -img_format="png" -img_workers=0 -no_thumb=true -thumb_only=false -thumb_suffix="" -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4

      Result: Scrape MAME, Images Not Already Matched
      Command: scraper -add_not_found=true -append=true -download_images=true -image_dir="images" -image_path="images" -image_suffix="" -img_format="png" -img_workers=0 -mame=true -no_thumb=true -thumb_only=false -thumb_suffix="" -use_ss=true -use_gdb=true -use_ovgdb=true -workers=4
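
After a run, you can check the resulting gamelist.xml for entries that ended up without an image. This sketch assumes the usual EmulationStation layout of `<game>` entries with `<path>` and `<image>` children; `games_without_image` is a made-up helper, so verify the element names against your own file.

```python
import xml.etree.ElementTree as ET


def games_without_image(gamelist_xml):
    """Return the <path> of every <game> entry with a missing or empty <image>."""
    root = ET.fromstring(gamelist_xml)
    missing = []
    for game in root.iter("game"):
        image = game.findtext("image", default="").strip()
        if not image:
            missing.append(game.findtext("path", default=""))
    return missing


sample = """<gameList>
  <game><path>./A.zip</path><name>A</name><image>./images/A.png</image></game>
  <game><path>./B.zip</path><name>B</name></game>
</gameList>"""
print(games_without_image(sample))
```

For a real file, read it first, for example with `open("gamelist.xml", encoding="utf-8").read()`.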

      Useful Links

      MESS

      Content Link
      Supported Systems 2176 (parents: 1060, clones: 1116) MESS Supported Systems List
      Rom Lists in CSV + XML MESS Software Lists

      No-intro

      What Is It? Link
      Used to Update & Rebuild Rom Sets with CLRMAMEPro No-intro DATs

      Tools

      What Where
      Rom Set Rebuilder CLRMAMEPro
      Sselph's Scraper Download
      Sselph's Scraper Information Github
      Universal XML Scraper (RetroPie Version) Universal XML Scraper
      XML Editor Universal XML Editor

      References

      1. Screen Scraper
      2. Sselph's Scraper
      3. Project Retrobution
        tien_huu_1408 @DrMaxwell

        @DrMaxwell Thanks a lot! I hope you can expand on this!

          DrMaxwell @tien_huu_1408

          @tien_huu_1408 I'll try to add more to this using the Screenzone scraper options as well.
          @sselph Does the -append=true flag now remove roms that cannot be found from the gamelist?

            vbs

            Thanks for this post! Since this scraper uses file hashes, it is crucial to have files that exactly match the ones in the database being used. This post is the only place I could find that mentions the exact ROM set you need to match the DB. So it seems to be No-Intro centered?
            How does it work for Dreamcast and PlayStation? As far as I know (dat-magic), these systems are not covered by No-Intro.
            Also, TOSEC is mentioned in the list, but at least the TOSEC Dreamcast games are in a multi-file format (GDI + track01.bin + track02.raw), which makes them quite awkward to use in RetroPie. I cannot just put them in the roms folder because of duplicated filenames :(

              paradadf @vbs

              @vbs I'd recommend you to take a look here:
              https://forum.recalbox.com/topic/4690/soft-universal-xml-scraper-v2
