[SOFT] mamescraper - fast and simple scraper for MAME
-
mamescraper is an open source, fast and simple scraper that can scrape MAME game information and images from 'mame.bigode.net' or 'adb.arcadeitalia.net' and generate an XML file for use with EmulationStation.
It should run on all platforms (Windows, Linux, Mac, etc.) that have Python 2 installed. The program uses only the Python standard library for maximum compatibility.
For Windows, a self-contained executable file is available for download.
mamescraper can:
- Download images (flyers and titles)
- Run with multiple workers to increase download/scrape speed
- Read an existing XML file to scrape missing games only
Currently it supports two sources with drastically different scraping methodologies.
The default source (bigode) is a lot faster because the scraper downloads an entire MAME database (1.1MB compressed) and scrapes all the game information in one go. Besides that, this source uses a CDN and a very fast web server to serve the images, resulting in faster responses and download speeds overall.
The adb source takes a more traditional approach: for each game found, the scraper makes an HTTP request to get the information needed and then downloads the appropriate image.
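To make the difference between the two approaches concrete, here is a rough Python sketch. It is not the actual mamescraper code: `fetch_database` and `fetch_game` are hypothetical callables standing in for the real network requests, and the only point is how many round trips each approach needs.

```python
def scrape_bulk(roms, fetch_database):
    """Bulk approach: one request for the whole database, then local lookups."""
    database = fetch_database()  # single network round trip
    return {rom: database.get(rom) for rom in roms}


def scrape_per_game(roms, fetch_game):
    """Per-game approach: one request for every rom in the list."""
    return {rom: fetch_game(rom) for rom in roms}


if __name__ == "__main__":
    # Hypothetical in-memory stand-ins for the two remote sources.
    fake_db = {"pacman": "Pac-Man", "galaga": "Galaga"}
    requests_made = {"bulk": 0, "per_game": 0}

    def fetch_database():
        requests_made["bulk"] += 1
        return fake_db

    def fetch_game(rom):
        requests_made["per_game"] += 1
        return fake_db.get(rom)

    roms = ["pacman", "galaga", "unknown"]
    scrape_bulk(roms, fetch_database)
    scrape_per_game(roms, fetch_game)
    print(requests_made)
```

With the bulk approach the number of information requests stays at one no matter how many roms you have, while the per-game approach grows with the size of the set.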
In my tests, I was able to scrape an entire mame 037b5 set (2241 roms) in less than two minutes using the default 'bigode' source and 10 worker threads.
A useful approach is to run the scraper using the default 'bigode' source (since it's faster) and, if a game is not found, run the scraper again in 'append' mode using 'adb' as the source to scrape only the missing games.
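For the curious, the idea behind 'append' mode can be sketched roughly like this in Python. This is an illustration under assumptions, not the scraper's real code: it only relies on the EmulationStation gamelist layout, where each `<game>` entry has a `<path>` such as `./pacman.zip`. The function names `scraped_roms` and `missing_roms` are made up for this example.

```python
import os
import xml.etree.ElementTree as ET


def scraped_roms(gamelist_path):
    """Return the rom file names already listed in a gamelist.xml file."""
    if not os.path.exists(gamelist_path):
        return set()
    root = ET.parse(gamelist_path).getroot()
    # Each <game> entry stores its file as <path>./name.zip</path>.
    return {
        os.path.basename(path.text.strip())
        for path in root.findall("game/path")
        if path.text
    }


def missing_roms(roms_dir, gamelist_path, extension=".zip"):
    """List rom files in roms_dir that have no entry in the gamelist yet."""
    done = scraped_roms(gamelist_path)
    return sorted(
        name
        for name in os.listdir(roms_dir)
        if name.lower().endswith(extension) and name not in done
    )
```

Calling `missing_roms("roms", "roms/gamelist.xml")` would then give the list of games that a second pass with another source still has to look up.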
Screenshot of the scraper in action:
Download from releases.
More info, usage and source code on GitHub.
A huge thanks to AntoPISA creator of Progetto Snaps for the images and tons of mame resources.
Also, a huge thanks to Motoschifo creator of Arcade Database for the awesome arcade database website.
Enjoy ;)
-
Absolutely fantastic bit of work - thank you very much! The worst thing about setting up a decent system is the scraping of game info; it seems to take forever. This took seconds on my Mac. I really can't thank you enough!
-
@chubsta thanks for the words!
I am glad to know someone found it useful :)
-
Worked perfectly and very fast for my MAME roms. But I have some roms without images; how do I run the adb source? I placed the .exe in my rom folder and ran it. Now when I try to run it again, it won't start.
-
@captainvelvet It is explained in the examples section:
It is useful to scrape from both sources if a game is not found. Just run the scraper a second time with append enabled and a different source:
$ mamescraper -a -s adb
Sorry for the late reply :)
-
I just updated the script to reflect the changes made last month to the Arcade Database API.
Both sources are working fine now.
-
You should REALLY think about lowering the number of workers from ten to something much more reasonable, like 2 or 3. Otherwise, sites that provide this info will just decide to stop allowing scrapers at all because they are getting hit so hard by them, which has happened many times in the past.
-
@zerojay The default number of workers is 5 (not 10), and in the last update I forced the number of worker threads to 1 when using the adb (Arcade Database) source to comply with their new rules.
When scraping with the default "bigode" source you can use any number of worker threads you want, since this source is maintained by me and I really don't care how many workers people use.
One important thing to note is that since the program is open source, it is really easy to remove the restriction on the number of workers when using adb.
So, in the end, it is really up to the server administrator to implement any kind of restriction on the number of concurrent HTTP connections per user :)
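Just to illustrate the idea, a per-source cap on worker threads could look something like the sketch below. This is not the real mamescraper code: it is written for Python 3 (in Python 2 the module is called `Queue`), and `effective_workers` and `run_workers` are names made up for this example.

```python
import queue
import threading


def effective_workers(source, requested):
    """Cap the thread count at 1 for 'adb'; other sources keep the request."""
    return 1 if source == "adb" else max(1, requested)


def run_workers(jobs, handler, workers):
    """Process jobs with a fixed number of threads and collect the results."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)
    results = []
    lock = threading.Lock()

    def worker():
        # Each thread pulls jobs until the queue is empty, then exits.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return
            outcome = handler(job)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A cap like this lives entirely on the client side, which is why anyone editing the source can lift it.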
-
What is the full list of commands for Windows?
-
@blockaboots Same as Linux:
$ mamescraper -help
Usage: mamescraper [options]

scrap mame games information and images from 'mame.bigode.net' or
'adb.arcadeitalia.net'

Options:
  --version           show program's version number and exit
  -h, --help          show this help message and exit
  -a, --append        scrap only missing roms from output file and append it
                      to the file (default: disabled)
  -d ROMS_DIR         directory containing the games (default: current
                      directory)
  -e IMAGES_DIR_NAME  directory name to download the images (default: images)
  -f FORMAT           file format of the games: 'zip' or '7z' (default: zip)
  -i IMAGES           images type: 'mixed', 'title' or 'flyer' - mixed will
                      download a flyer and fallback to title if a flyer is not
                      found (default: mixed)
  -o OUTPUT_FILE      the xml file that will be created (default:
                      gamelist.xml)
  -s SOURCE           information and images source: 'bigode' or 'adb'
                      (default: bigode)
  -w WORKERS          number of workers threads to use (default: 5)

GitHub has this information and some useful use cases.