Manually generate Mame2003 gamelist?
-
@caver01 Some of it has been done, but I want to pull from as many official sources as possible rather than user-created data, which is often incomplete and fraught with errors.
-
@herb_fargus That makes sense. I assumed the scrapers that build XML were using official data sources, but from some of my own investigations I recall sites requesting help populating their databases.
-
@caver01 It pulls from some official sources for hash matching and such, but as far as content goes there are no official sources for some of the metadata, like ratings, since they are subjective.
Really the only technically proper way to handle MAME is to set up a versioned database that takes into account all the many changes through each revision. Clrmamepro handles some of that, but there isn't any real API for it, at least not a complete one. Most are static and based on the latest MAME version.
-
@herb_fargus Ok, so maybe this is just restating the obvious, but the XML generation seems like the easier part (for me), and I am happy to contribute what I have done via spreadsheets if that's helpful. The real work is populating the columns of metadata. All I have done is construct a lookup mechanism. I imagine there are script wizards who can one-up me on that using just the command line.
-
@caver01 Yes, populating the tags and generating the XML for the static mame2003 list is where I'd like to start. If you want to do a Google Sheets collaboration with what you have, I'd like to do that so we have a central place to work on it.
-
@herb_fargus, @caver01 Some time ago I wrote an Excel VBA scraper for MobyGames to collect Vic20 games.
I could modify it to pull their Arcade game metadata and provide it in xls or csv format, adding a column with the related MAME .78 rom name.
MobyGames lists about 1800 roms (so unfortunately it is not a complete set).
-
@UDb23 It's a good way to get the bulk, and it could be done. If you haven't noticed by now, I'm very particular about the quality and completeness of my data.
First I'd like to identify any complete sources we have, which means starting with MAME itself and extracting the information (dats/XML) directly from the MAME binary. Those are simple enough to parse with basic regex/sed.
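As a rough illustration of pulling fields out of a MAME-generated dat, here is a minimal Python sketch. It assumes the dat is XML with one `<game>` element per ROM carrying `description`, `year` and `manufacturer` children; the sample text and the `parse_dat` name are made up for the example, so check them against the real file before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed shape of a MAME XML dat.
SAMPLE_DAT = """<mame>
  <game name="1941">
    <description>1941 - Counter Attack (World)</description>
    <year>1990</year>
    <manufacturer>Capcom</manufacturer>
  </game>
  <game name="1941j" cloneof="1941">
    <description>1941 - Counter Attack (Japan)</description>
    <year>1990</year>
    <manufacturer>Capcom</manufacturer>
  </game>
</mame>"""


def parse_dat(xml_text):
    """Return {romname: fields} for every <game> element in the dat."""
    games = {}
    root = ET.fromstring(xml_text)
    for game in root.iter("game"):
        games[game.get("name")] = {
            "description": game.findtext("description", default=""),
            "year": game.findtext("year", default=""),
            "manufacturer": game.findtext("manufacturer", default=""),
            # Empty string means the entry is a parent set, not a clone.
            "cloneof": game.get("cloneof", ""),
        }
    return games


games = parse_dat(SAMPLE_DAT)
print(games["1941"]["description"])
```

Keying the result by ROM name keeps it usable as a lookup table for the spreadsheet columns.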
The next level of completeness is progettosnaps (which also hosts the dats generated from the MAME executables); primarily, he has full sets of snaps.
The issue I ran into last night is that his snaps are for .170, not .78, so I had to use a rename dat to fix the filenames that didn't match. There are also some ROMs, hacks and bootlegs, that have since been dropped from MAME because they aren't really "real" MAME games and don't fit its archival purpose. So I hashed out the rest of those.
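The renaming step could look something like the Python sketch below. It is only an outline: `plan_renames`, the `rename_map` dictionary (new ROM name to old ROM name, as you might derive from a rename dat) and the example names are all hypothetical, and it only plans the renames rather than touching any files.

```python
from pathlib import Path


def plan_renames(snap_names, rename_map, wanted):
    """Map snap filenames from a newer set onto an older set's ROM names.

    snap_names: filenames such as "newname.png" from the newer snap pack
    rename_map: {new_romname: old_romname}, hypothetical rename data
    wanted:     set of ROM names present in the older target set
    Returns (renames, missing): {old_filename: new_filename} and the
    sorted list of wanted ROM names that no snap covers.
    """
    renames = {}
    for filename in snap_names:
        stem = Path(filename).stem
        suffix = Path(filename).suffix
        # Fall back to the unchanged name when the ROM was never renamed.
        target = rename_map.get(stem, stem)
        if target in wanted:
            renames[filename] = target + suffix
    covered = {Path(name).stem for name in renames.values()}
    missing = sorted(wanted - covered)
    return renames, missing


renames, missing = plan_renames(
    ["1941.png", "newname.png", "extra.png"],
    {"newname": "oldname"},
    {"1941", "oldname", "droppedhack"},
)
print(renames)
print(missing)
```

The `missing` list is where the dropped hacks and bootlegs would show up, since no snap in the newer pack maps onto them.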
Then of course there are ES tags that are arbitrary or have no official MAME source; that's where random scraping or user content is necessary.
Clrmamepro is really supposed to be able to handle rebuilds like this, but it only works if you have proper dats, and there aren't many dats for metadata beyond plain ROM info, so it becomes a bit more of a manual process.
-
@UDb23 Wow, that would be a big help for me, but part of this project is to ensure the details come from a known or respected source. Do we know where the original source data comes from for the Arcade section of MobyGames?
-
@caver01 I could also get the descriptions from progettoemma.net. It seems to be reliable. See example here; the metadata is in Italian or English.
It contains all MAME roms, and I already did the .78 vs. current MAME rom set renaming check for the MAME resolution DB.
-
You'll note that almost all the content from progetto emma is derived from the dats and the history.dat.
Quick update: I set up some folders in Google Drive, one for mame 2000 and the other for mame 2003. I've taken the original dat (I generated one based on imame4all rather than the default 0375b dat, which has a minor difference of 30 roms). Each folder has a dat and an XML-ish file (technically a HyperSpin db generated from datutils) that has simplified tags for the relevant information that can be pulled from the dat files. I still need to parse them out into their respective columns in the spreadsheets.
Once those are parsed out, you'll see that each sheet has columns whose heading is the theoretical source of that column's contents (e.g. desc comes from history.dat). The ES tags are also there as column heads, so we know what they correspond to in EmulationStation. Now it's just a matter of setting the paths and parsing everything into the sheets. If you want access to them, let me know and I'll add you.
https://drive.google.com/open?id=0B2TMeZ6iEFvHMlAzR29lQl9mTzg
-
@herb_fargus Looks like a good setup. I have a couple of questions/observations:
Description vs. Name vs. Path (ROM)
It looks like many of the sources use the header "Description" for the long name of the game. For example, the Name header in the source data might be "1941", which corresponds to the ROM zip filename, and the Description header is the long version of the name, like "1941 - Counter Attack (World)". These headers are slightly offset from what we need in ES. Our Name is actually the long name (the Description above), and the Name above (ROM.zip) is just part of the path. Then, if I am not mistaken, our Desc is supposed to be the paragraph of text that describes the game--something like "1941 is a vertical shooter in which you fly a World War II airplane into battle. . ." and so on. You probably already see the same thing. Also, it might be handy to add a ROM.zip column even though it's not specifically used in the XML as-is. It can be used on every row in many ways, both as a lookup key for pulling in the other values for each row and as reference text to construct the Path and Image values. I am just "thinking" out loud here about what I would do to start building formulas to get the data into Sheet 1.
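To make the offset between the source headers and the ES tags concrete, here is a small Python sketch. The function name `to_es_fields` and the `./images` default are assumptions for illustration; the point is only that the ROM name drives `path` and `image`, the dat's Description becomes `name`, and the long text becomes `desc`.

```python
def to_es_fields(rom_name, dat_description, history_text="", image_dir="./images"):
    """Translate dat-style fields into EmulationStation gamelist tags.

    rom_name:        dat "Name", i.e. the zip filename without extension
    dat_description: dat "Description", i.e. the long game title
    history_text:    paragraph describing the game (e.g. from history.dat)
    """
    return {
        "path": f"./{rom_name}.zip",             # built from the ROM name
        "name": dat_description,                 # long title, not the ROM name
        "desc": history_text,                    # descriptive paragraph
        "image": f"{image_dir}/{rom_name}.png",  # also built from the ROM name
    }


fields = to_es_fields("1941", "1941 - Counter Attack (World)")
print(fields["path"], "|", fields["name"])
```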
Take a look at my sheet. It started with MasterList, which is a copy of the compatibility list from AdvanceMAME (.106). The second sheet is very similar to your Sheet1 with the XML columns, and finally there is a Rebuilt XML sheet, which uses a formula to look up the games I have marked in the MasterList and builds out specific XML rows. Notice how everything in the XML Columns sheet is also indexed, as this allows me to block out the "stacked" XML rows (with 1, 1, 1, 1, 2, 2, 2, 2, etc.) as a way to effectively transpose the columns into rows. To make this work, I simply mark a game in the MasterList as one I want to install (by placing an "X" in the R-Cade column) and it generates the XML content on the fly. Then I simply select columns C, D and E from the Rebuilt XML sheet, copy them, and paste them into a text file.
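For anyone who would rather script it, the same "mark with an X, then stack the columns into XML rows" idea might be sketched in Python like this. The row dictionaries stand in for spreadsheet rows, the tag list is trimmed to four tags, and `build_gamelist_lines` is a made-up name; only the "R-Cade" marker column comes from my sheet.

```python
from xml.sax.saxutils import escape


def build_gamelist_lines(rows, marker_column="R-Cade"):
    """Emit one XML line per tag for every row marked with an "X"."""
    lines = ["<gameList>"]
    for row in rows:
        # Skip games that are not marked for installation.
        if row.get(marker_column, "").strip().upper() != "X":
            continue
        lines.append("  <game>")
        for tag in ("path", "name", "desc", "image"):
            # escape() protects characters such as & and < in the text.
            lines.append(f"    <{tag}>{escape(row.get(tag, ''))}</{tag}>")
        lines.append("  </game>")
    lines.append("</gameList>")
    return lines


rows = [
    {"R-Cade": "X", "path": "./1941.zip", "name": "1941 - Counter Attack (World)"},
    {"R-Cade": "", "path": "./other.zip", "name": "Not selected"},
]
print("\n".join(build_gamelist_lines(rows)))
```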
-
I agree there should also be a column with the rom name. This allows associating other data related to that ROM; e.g. I'd like to integrate my resolution DB mentioned above and create a "global" MAME .78 data file.
Concerning descriptions, meaning the paragraph of text that describes the game: as @herb_fargus said, they can be found in the history.dat file (which will populate column C in his Google sheet).
-
Agreed. It would be good to have a rom name column, which should essentially be the files stripped of their extensions. I was thinking of filling out the paths, filtering, etc., just so it's a little clearer which attributes go where.
-
@herb_fargus said in Manually generate Mame2003 gamelist?:
should essentially be the files stripped of their extensions
perfect.
-
Hi,
I have a little PHP script which I use to parse history.dat to generate or update gamelist.xml. It's not a scraper, but it's been handy for me to populate the game info quickly without spending a long time scraping.
I just provide a snap directory and a history.dat file and run my script on the Pi; it takes about a minute to run for a large history.dat plus a large directory.
It might need tidying up a bit and some instructions, but I am happy to post it.
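For readers who have not looked inside history.dat, the general idea of the parsing can be sketched in a few lines of Python. This is not the script itself: it assumes entries of the form `$info=rom1,rom2,` followed by `$bio`, free text, and `$end`, and the sample entry and the `parse_history` name are invented for the example.

```python
# Hypothetical entry in the assumed history.dat layout.
SAMPLE_HISTORY = """$info=examplegame,exampleclone,
$bio

Example Game (c) 1990 Example Corp.

A made-up description used only for this sketch.
$end
"""


def parse_history(text):
    """Return {romname: bio_text} for each $info/$bio/$end entry."""
    entries = {}
    roms, bio, in_bio = [], [], False
    for line in text.splitlines():
        if line.startswith("$info="):
            # The first name is the parent set, the rest are clones.
            names = line[len("$info="):].split(",")
            roms = [name.strip() for name in names if name.strip()]
        elif line.startswith("$bio"):
            in_bio, bio = True, []
        elif line.startswith("$end"):
            if in_bio:
                block = "\n".join(bio).strip()
                for rom in roms:
                    entries[rom] = block
            roms, in_bio = [], False
        elif in_bio:
            bio.append(line)
    return entries


history = parse_history(SAMPLE_HISTORY)
print(sorted(history))
```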
-
@kixut That sounds very useful! I wouldn't mind giving it a go.
@caver01 I've also updated my sheets, filling in the content I have. The catver and history dats should get most of the rest, I think. I also wasn't sure what the distinction between developer and publisher is in the metadata: sometimes it's both, sometimes it's only one and not the other, and sometimes it may technically be neither... a bit confusing.
You'll also note my paths are relative, as I intend for this to be a package someone can just dump in their rom folder and have everything show up as scraped, since everyone should more or less have the same romset.
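As a sketch of how the catver data might feed a genre column, here is a small Python parser. It assumes catver.ini uses a `[Category]` section with `romname=Genre` lines; that layout, the sample values and the `parse_catver` name are assumptions to verify against the actual file.

```python
# Hypothetical excerpt in the assumed catver.ini layout.
SAMPLE_CATVER = """[Category]
examplegame=Shooter / Example
exampleclone=Shooter / Example

[VerAdded]
examplegame=0.01
"""


def parse_catver(text):
    """Return {romname: category} from the [Category] section only."""
    categories = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ini-style comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Category" and "=" in line:
            rom, genre = line.split("=", 1)
            categories[rom.strip()] = genre.strip()
    return categories


print(parse_catver(SAMPLE_CATVER))
```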
-
@herb_fargus said in Manually generate Mame2003 gamelist?:
@caver01 I've also updated my sheets, filling in the content I have. The catver and history dats should get most of the rest, I think. I also wasn't sure what the distinction between developer and publisher is in the metadata: sometimes it's both, sometimes it's only one and not the other, and sometimes it may technically be neither... a bit confusing.
You'll also note my paths are relative, as I intend for this to be a package someone can just dump in their rom folder and have everything show up as scraped, since everyone should more or less have the same romset.
Looks good to me! It's very close now. Did you see how I used a similar sheet to build out a new one with the XML text? Maybe there's an easier way, but I am happy to try. That PHP script sounds promising.
-
Hi,
I've added a few comments now; I hope it helps someone. I haven't had time to retest, but let me know if anything needs to be explained.
To use it, PHP first needs to be installed. The script needs to be on the Pi with a history.dat file in the same directory, and $system needs to be set in the source (to select the rom dir). Then run it with php -f file.php. See the script comments for more details.
Edit3: I would love to see the Japanese names of games displayed correctly, but either the character encoding is getting messed up somewhere in my script, or EmulationStation doesn't handle the character encoding or doesn't support the character set. I'm not sure at the moment.
Edit2: The comments imply that the script generates gamelist.xml, but it really updates an existing gamelist.xml file, so the file is expected to exist already. The script fills in the missing gamelist.xml info with data from the history.dat file.
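The "fill in what is missing in an existing gamelist.xml" idea, reduced to its core, might look like the Python sketch below. It is a simplified illustration rather than a port of the script: `fill_missing` and the `metadata` dictionary are hypothetical, and it only ever writes to tags that are absent or empty.

```python
import os
import xml.etree.ElementTree as ET


def fill_missing(gamelist_xml, metadata):
    """Fill absent or empty tags of each <game> from metadata.

    metadata: {romname: {tag: value}}, e.g. built from history.dat
    Returns (updated_xml_text, number_of_games_changed).
    """
    root = ET.fromstring(gamelist_xml)
    updated = 0
    for game in root.iter("game"):
        path = game.findtext("path", default="")
        # "./1941.zip" -> "1941"
        rom = os.path.splitext(os.path.basename(path))[0]
        changed = False
        for tag, value in metadata.get(rom, {}).items():
            if not value:
                continue
            node = game.find(tag)
            if node is None:
                node = ET.SubElement(game, tag)
            if not (node.text or "").strip():
                node.text = value  # existing values are left untouched
                changed = True
        if changed:
            updated += 1
    return ET.tostring(root, encoding="unicode"), updated


xml_in = "<gameList><game><path>./1941.zip</path><name>Keep Me</name></game></gameList>"
xml_out, count = fill_missing(xml_in, {"1941": {"name": "Other", "desc": "A shooter."}})
print(count)
```

Backing up gamelist.xml before writing the result out is still a good idea, as is exiting EmulationStation first so it does not overwrite the file.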
Edit: derail alert
On another note, a little while back I upgraded my Pimoroni Picade with the latest version of RetroPie and also wanted to sort my roms out. I wanted to do tasks such as filling in info from history.dat, moving clones into a subdirectory, or selecting games and moving a big selection into a subdirectory, along with various other rom management tasks. I thought that a good solution for this might be a web app served from the Pi, so the whole thing can be managed remotely. I have a lot of experience in this, so I should be able to put something together quite quickly, but I haven't looked at what management solutions are already available. I don't want to reinvent the wheel, but I would be interested to hear if anyone thinks there is a need for that type of solution.
<?php
/**************************************************************
/
/ Script to populate gamelist.xml from history.dat for retropie
/
/ By Garry Pankhurst 12/12/2016
/
/ Populates :-
/   Title
/   Description
/   Publisher
/   Date published
/   Programmer
/   Players
/   Imagefile if {{snap directory}}/{{gamerome}}.png exists
/     that matches either the main rom or clone
****************************************************************
Instructions :-
1) Install php
     sudo apt-get install php5
2) Change $system below to which system (rom directory) where you want to update the gamelist.xml file
3) Set $listpath and $snappath , see comments below
4) make sure that you either exit Emulation Station or Save Metadata on Exit is turned off (otherwise changes will be overwritten)
5) back up your gamelist.xml file
6) Save this file to the same place on the pi where there is a history.dat file as "thisfile.php"
   from there, at the command prompt run "php -f thisfile.php"
****************************************************************/

// uncomment a system
//$system = "mame";
//$system = "mame-advmame";
//$system = "fba";
//$system = "arcade";
$system = "mame-libretro";

// you dont need to change this :-
$path = "/home/pi/RetroPie/roms/".$system;

// do you keep your list path with your roms ?
$listpath = $path."/gamelist.xml";
// or here ?
//$listpath = "/home/pi/.emulationstation/gamelists/".$system."/gamelist.xml";

// path to the snap directory from roms directory - or can use absolute path
$snappath = "./images";

$cwd = getcwd();   // remember starting current directory
chdir($path);      // go to where roms are

$gameslist = simplexml_load_file($listpath); // load gamelist.xml
$history = array(); // start an empty history array

// strip junk from history.dat attribute
function sss($string) {
    return preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $string);
}

function get_attr($attr, $line) {
    if(preg_match('/^'.$attr.' : /i', $line))
        return rtrim(preg_replace('/^'.$attr.' : /i', '', $line));
}

function scrape_history() {
    global $history, $path, $cwd, $snappath;

    echo "Loading history...\n";

    // open history, go through each line and add every game as an
    // object to the $history associative (hash) array
    $fp = fopen($cwd."/history.dat","r");
    if(!$fp) {
        echo "Cannot open history.dat";
        exit(0);
    }

    $line = fgets($fp);
    while(!feof($fp)) {
        // find $info
        while(!feof($fp) && !preg_match("/^\\\$info/", $line))
            $line = fgets($fp);
        if(feof($fp)) break;

        // array of zips, the first zip being the original
        // the following are the clones
        $zips=preg_split("/[\s,]+/", preg_replace("/^\\\$info=/","",$line), -1, PREG_SPLIT_NO_EMPTY);

        // build a $game new object from history.dat
        $game = (object)new StdClass;
        //$game = array(flag => true);

        // skip to "bio"
        while(!feof($fp) && !preg_match("/^\\\$bio/", $line))
            $line = fgets($fp);
        if(feof($fp)) break;

        // first lines after a bio line
        // get title, publisher and release date from bio
        $line = fgets($fp); // blank
        $line = fgets($fp); // title

        $game->title = sss(rtrim(preg_replace("/\(c\).*/", "", $line)));
        $game->publisher = rtrim(rtrim(preg_replace("/.*\(c\).*[0-9] */", "", $line)),'.');
        $relyear = preg_replace("|.*\(c\).*[ /]([1-2][0-9][0-9?][0-9?])[\. ].*|", "\\1", $line);
        if($relyear==$line)
            $relyear = preg_replace("|.*\(c\).*[ /]([1-2][0-9][0-9?][0-9?])$|", "\\1", $line);
        $relmonth = preg_replace("|.*\(c\).* ([0-9][0-9])/([1-2][0-9][0-9][0-9]) .*|", "\\1", $line);
        if($relmonth==$line) {
            //echo "bad rel month [$line]\n";
            //exit;
            $relmonth='01';
        }
        if($relyear==$line) {
            echo "bad rel year [".rtrim($line)."]\n";
        }
        else
            $game->releasedate = rtrim($relyear).rtrim($relmonth).'01T000000';

        $line='';
        $bio = ""; // use 'just' the rest of bio as description
        $gameid='';
        $players='';
        $programmer='';
        $b=0;
        while(!feof($fp) && !preg_match("/^\\\$end/", $line)) {
            // just first 10 lines will do
            if($b++<10) $bio .= $line;
            $line = fgets($fp);
            //if(!isset($game->gameid))
            //    $game->gameid = get_attr('Game ID', $line);
            if(!isset($game->players))
                $game->players = get_attr('Players', $line);
            if(!isset($game->programmer))
                $game->programmer = sss(get_attr('Programmer', $line));
            if(!isset($game->programmer))
                $game->programmer = sss(get_attr('Programmers', $line));
        }
        $game->bio = trim(sss($bio));

        // find first image/clone image in set
        // this image will be used for clones
        // if no clone image exists for any clone
        for($i=0; $i<count($zips); $i++) {
            if(!isset($game->image) && file_exists($snappath."/".$zips[$i].".png"))
                $game->image = $snappath."/".$zips[$i].".png";
        }

        // Add a game entry to the history array
        // for each zip
        for($i=0; $i<count($zips); $i++) {
            if(!$zips[$i]) continue;
            $g = clone $game;
            $g->clone = ($i>0);

            // lets not store bio for clones
            if($g->clone) {
                unset($g->bio);
            }

            // override main image for the set if a
            // specific matching clone image exists
            if(file_exists("./images/".$zips[$i].".png"))
                $g->image = "./images/".$zips[$i].".png";

            if(preg_match('/\?\?/', $game->title))
                $g->title = $zips[$i];
            elseif($i>0)
                $g->title .= ' ('.$zips[$i].')';

            $history[$zips[$i]] = $g;
        }

        //print "loaded ".$game->title."\n";
        $line = fgets($fp);
    }
    fclose($fp);
}

function merge_history($game) {
    global $c, $u, $history;

    $name = basename($game->path, ".zip");
    if(!isset($history[$name])) {
        print "not found ".$name."\n";
        return;
    }
    $gamehist = $history[$name];
    $u=true;

    if($game->name != $gamehist->title) { $game->name = $gamehist->title; $u=true; }
    //if(!$game->desc) { $game->desc = $gamehist->bio; $u=true; }
    if(!$gamehist->clone) { $game->desc = $gamehist->bio; $u=true; }

    // update if missing or different
    if(!$game->publisher) { $game->publisher = $gamehist->publisher; $u=true; }
    if($game->publisher != $gamehist->publisher) { $game->publisher = $gamehist->publisher; $u=true; }
    if(!$game->releasedate) { $game->releasedate = $gamehist->releasedate; $u=true; }
    if($game->releasedate != $gamehist->releasedate) { $game->releasedate = $gamehist->releasedate; $u=true; }
    if(!$game->players) { $game->players = $gamehist->players; $u=true; }
    if($game->players != $gamehist->players) { $game->players = $gamehist->players; $u=true; }
    if(!$game->developer) { $game->developer = $gamehist->programmer; $u=true; }
    if(!$game->image && isset($gamehist->image)) { $game->image = $gamehist->image; $u=true; }
    if(isset($gamehist->image) && (!$game->image ||$game->image != $gamehist->image)) { $game->image = $gamehist->image; $u=true; }

    if($u) {
        echo "-------------------------\n";
        //echo " Game ID = " . $gamehist->gameid . "\n";
        echo " Title = " . $gamehist->title . "\n";
        echo " Publisher = " . $gamehist->publisher . "\n";
        echo " Developer = " . $gamehist->programmer . "\n";
        echo " Released = " . $gamehist->releasedate . "\n";
        echo " Players = " . $gamehist->players . "\n";
        echo "-------------------------\n";
    }
}

// first build our history.dat file into an quickly accessible
// array of games (indexed by rom name)
scrape_history();

// then go through the existing gamelist.xml file
//
// for each game that still exists
// merge any missing information gathered from the history.dat file
$c=0;
for($i=0; $i<count($gameslist->game); $i++) {
    $u = false;
    $game = $gameslist->game[$i];
    $name = basename($game->path, ".zip");
    $games[$name.".zip"] = $i;
    echo "$i, $c) " . $name . " : " . $game->name;

    // if the rom exists then merge any better data from history.dat
    if(isset($game->path) && file_exists($game->path)) {
        merge_history($game);
        if($u) {
            echo " - updated";
            $c++;
            // save every 50 games
            if( ($c % 50) == 0 ) {
                echo "\nSaving ...";
                $gameslist->asXml($listpath);
            }
        }
    }
    echo "\n";
}

// we've checked whatever was in gamelist.xml
// now look for any new roms in the rom directory
if($dh = opendir($path)) {
    while(($file = readdir($dh)) !== false) {
        $i++;
        if(preg_match("/\.zip/", $file) && !isset($games[$file])) {
            echo "$i, $c) insert : filename = " . $file . "\n";
            $game = $gameslist->addChild('game');
            //$game->addChild('path', $path.'/'.$file);
            $game->addChild('path', './'.$file);
            merge_history($game);
            $c++;
            // Save every 50
            if( ($c % 50) == 0 ) {
                echo "Saving ...\n";
                $gameslist->asXml($listpath);
            }
        }
    }
    closedir($dh);
}

// final save, we're done
$gameslist->asXml($listpath);
?>
-
I still need to check out the PHP script; I haven't had a free moment yet.
As an aside on generating the XML, since I don't have the greatest scripting skills, here's a quick hack: you can download the sheet as a csv, upload that csv here: http://www.convertcsv.com/csv-to-xml.htm, tweak a few settings, and it pukes out a functional gamelist.xml.
Once the spreadsheets are filled, it should be simple enough to generate :)
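If someone prefers to do the csv step locally rather than through a website, a minimal Python sketch could look like this. It assumes the csv column headers are already the ES tag names (path, name, desc, ...); `csv_to_gamelist` is a made-up name and empty cells are simply skipped.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_gamelist(csv_text):
    """Turn csv rows into <game> entries under a <gameList> root."""
    root = ET.Element("gameList")
    for row in csv.DictReader(io.StringIO(csv_text)):
        game = ET.SubElement(root, "game")
        for tag, value in row.items():
            # Skip empty cells and any stray cells beyond the header row.
            if tag and value:
                ET.SubElement(game, tag).text = value
    return ET.tostring(root, encoding="unicode")


sample_csv = "path,name,desc\n./1941.zip,1941 - Counter Attack (World),\n"
print(csv_to_gamelist(sample_csv))
```

ElementTree takes care of escaping characters like & in titles, which is easy to miss when assembling XML by hand.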
-
Very interesting web tool, will try it; thanks!