Manually generate Mame2003 gamelist?
-
@herb_fargus The timing for your project happened to match my own research. I have been using my own manually generated gamelist for a while, but I need to add some other metadata like descriptions and genre. Thanks for the links to sources. I was about to ask for advice on those!
As for actually generating the XML, I built mine in a Google Sheet with a series of formulas. It starts with a copy of the compatibility list for MAME2003 linked in the wiki here, then I added sheets that list the other metadata columns like path, image, description, and so on. Finally, because I don't have EVERY ROM installed, I do a lookup and generate a list of ROMs I marked, then I build out the XML using a series of VLOOKUP functions, basically constructing the XML lines, game by game, only it's essentially automatic. If I mark a new ROM for installation, it appears in the XML sheet. I then copy all rows from the XML sheet and paste them into the text file.
It's not super elegant, but it works.
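For anyone who would rather script it than use a sheet, here is a minimal Python sketch of the same idea: take one row per ROM, keep only the marked ones, and write out the `<game>` entries. The row layout, the `marked` flag, the sample rows and the `./images/` folder are my own assumptions for illustration, not the actual columns of the sheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for the spreadsheet: one dict per ROM.
ROWS = [
    {"rom": "1941", "name": "1941 - Counter Attack (World)", "marked": True},
    {"rom": "pacman", "name": "Pac-Man (Midway)", "marked": False},
]

def build_gamelist(rows):
    """Return gamelist XML text containing only the rows marked for install."""
    root = ET.Element("gameList")
    for row in rows:
        if not row["marked"]:
            continue  # skip ROMs that are not marked
        game = ET.SubElement(root, "game")
        ET.SubElement(game, "path").text = f"./{row['rom']}.zip"
        ET.SubElement(game, "name").text = row["name"]
        # Assumed image location; adjust to wherever the snaps really live.
        ET.SubElement(game, "image").text = f"./images/{row['rom']}.png"
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

GAMELIST_XML = build_gamelist(ROWS)
print(GAMELIST_XML)
```

Letting the XML library write the tags means characters like `&` in game names get escaped automatically, which is easy to miss when pasting text out of a sheet.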
-
@herb_fargus Any ideas how you are going to parse the history.dat? I opened it with a text editor to see if I could systematically grab descriptions, but there is a LOT of detail in that file. It was too much to hope that each description would sit on a single line with ROM,DESCRIPTION in front of it!
-
@caver01 I've been trying to think of a solution for that as well. From what I gather it's more or less in the same format throughout, though some entries may not even have a description equivalent, as it seems to defer to the parent for the descriptions of some clones and bootlegs. My preliminary thought was to do something like we do with the ini configs in the setup script, where it parses a text block based on keywords before and after, but that could get messy.
Overall, as far as metadata goes, I like your idea in that we can generate an expanded compatibility list, if you will, in one sheet and then parse that out to a gamelist XML once complete. I have some dat tools to split what's available (that's how I created the compatibility lists in the first place).
But yeah if you'd like to help out perhaps we can come to a solution
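A rough Python sketch of the "keywords before and after" idea. It assumes each history.dat entry starts with a `$info=` line listing one or more ROM names, opens the text with `$bio`, and closes it with `$end`. That layout is my assumption, and the sample text below is made up, so check both against the real file.

```python
import re

# Made-up sample in the layout history.dat is assumed to use.
SAMPLE = """\
$info=1941,1941j,
$bio

1941 - Counter Attack (c) 1990 Capcom.

A vertically scrolling shooter.

$end
"""

# One entry: the ROM list on the $info line, then everything between $bio and $end.
ENTRY = re.compile(
    r"^\$info=(?P<roms>[^\n]*)\n\$bio\n(?P<bio>.*?)^\$end",
    re.S | re.M,
)

def parse_history(text):
    """Map every ROM name on an $info line to the text of that entry."""
    descriptions = {}
    for match in ENTRY.finditer(text):
        bio = match.group("bio").strip()
        for rom in match.group("roms").split(","):
            rom = rom.strip()
            if rom:  # the trailing comma leaves an empty item
                descriptions[rom] = bio
    return descriptions

HISTORY = parse_history(SAMPLE)
print(sorted(HISTORY))
```

If several ROM names share one `$info=` line, each of them gets the same text here. Clones that are not listed anywhere would still need a fallback to the parent.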
-
@herb_fargus If we have a sheet of metadata (or simply more columns in the compatibility list to accommodate our XML needs), I can apply my formula to generate the XML, which can then be copy/pasted into a text file.
Come to think of it, it wouldn't have to be dynamic. Mine today is lacking a few columns, but it dynamically builds the XML in a worksheet based on my selected games--but we could just as easily build a master gamelist XML file that contains the details for every possible ROM in the set regardless of whether or not the ROM is installed. The idea would be that if you just drop an image into the right folder using the right filename it will display. It's a bit like "scraping the possibility" of a ROM, rather than an actual scrape of what is really there. Then, just add images.
It strikes me that some of this work might be done already. I seem to recall an earlier build when I installed an entire ROMset and started the scraper. I quickly ran out of patience with image downloads, but I did end up with a gamelist.xml that had a lot of the details already there. That might be a better starting place for columns like genre, description, publisher, etc.
Or. . .am I remembering it wrong? Can we leverage a complete romset and scrape that to get a good starting set of XML metadata?
-
@caver01 some has been done but I want to pull from as many official sources as possible rather than user created data as it's often incomplete and fraught with errors
-
@herb_fargus That makes sense. I assumed the scrapers that build XML were using official data sources, but from some of my own investigations I recall sites requesting help populating their databases.
-
@caver01 it pulls from some official sources for hash matching and such but as far as content there aren't official sources for some of the metadata like ratings as they are subjective.
Really the only technically proper way to handle mame is to set up a versioned database that takes into account all the many changes through each revision. Clrmamepro handles some but there isn't any real API or anything for it at least not any complete ones. Most are static based on the latest mame version
-
@herb_fargus Ok, so maybe this is just restating the obvious, but the XML generation seems like the easier part (for me) and I am happy to contribute what I have done via spreadsheets if that's helpful. The real work is populating the columns of metadata. All I have done is construct a lookup mechanism. I imagine there are script wizards that can one-up me on that just using command line.
-
@caver01 Yes, populating the tags and XML generation for the static mame2003 list is where I'd like to start. If you want to do a Google Sheets collaboration with what you have, I'd like to do that so we have a central place to work on it.
-
@herb_fargus, @caver01 Some time ago I wrote an Excel VBA scraper for Mobygames for Vic20 games.
I could modify it to get their Arcade games metadata and provide it in xls or csv format, adding a column with the related MAME .78 ROM name.
Mobygames lists about 1800 ROMs (so unfortunately it is not a complete set).
-
@UDb23 it's a good way to get the bulk, could be done. If you haven't noticed by now I'm very particular about the quality/completeness of my data.
First I'd like to identify any complete sources we have, which means starting with MAME itself and extracting information/dats/XML directly from the MAME binary. Those are simple enough to parse with basic regex/sed.
Next level of completion of sorts is progettosnaps (which also has a source of the dats from the mame executables) primarily he has full sets of snaps.
The issue I ran into last night is that he has snaps for .170, not .78, so I had to use a rename dat to fix the filenames. That still leaves some ROMs unaccounted for: hacks/bootlegs that have been dropped from MAME, as they aren't really "real" MAME games and don't fit its archival purposes. So I hashed out the rest of those.
Then of course there are ES tags that are arbitrary or that there is no official mame source for them, that's where random scraping/user content is necessary.
Clrmamepro really is supposed to be able to handle rebuilds like this but it really only works if you have proper dats and there aren't a whole lot of dats for metadata outside of just rom info so... Makes it a bit more of a manual process.
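To give an idea of how basic the regex parsing can be, here is a small Python sketch. It assumes the dat uses the old-style `game ( ... )` text blocks with one `key value` pair per line. Both the layout and the sample entries are my assumptions for illustration, so verify them against an actual dat before relying on this.

```python
import re

# Illustrative sample in the assumed old-style dat layout.
SAMPLE_DAT = '''\
game (
    name 1941
    description "1941 - Counter Attack (World)"
    year 1990
    manufacturer "Capcom"
)
game (
    name 1941j
    description "1941 - Counter Attack (Japan)"
    year 1990
    manufacturer "Capcom"
    cloneof 1941
)
'''

# A block runs from "game (" to a ")" in the first column.
BLOCK = re.compile(r"^game \(\n(.*?)^\)", re.S | re.M)
# A field is a known key followed by a quoted string or a bare word.
FIELD = re.compile(
    r'^\s*(name|description|year|manufacturer|cloneof)\s+(?:"([^"]*)"|(\S+))\s*$',
    re.M,
)

def parse_dat(text):
    """Return one dict of fields per game block."""
    games = []
    for block in BLOCK.finditer(text):
        fields = {}
        for key, quoted, bare in FIELD.findall(block.group(1)):
            fields[key] = quoted or bare
        games.append(fields)
    return games

GAMES = parse_dat(SAMPLE_DAT)
for game in GAMES:
    print(game["name"], "|", game["description"])
```

The `cloneof` field is what you would use to fall back to the parent when a clone has no metadata of its own.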
-
@UDb23 Wow, that would be a big help for me, but part of this project is to ensure the details come from a known or respected source. Do we know where the original source data comes from for the Arcade section of MobyGames?
-
@caver01 I could also get the descriptions from progettoemma.net. It seems to be reliable. See example here; metadata in Italian or English.
It contains all MAME ROMs, and I already did the .78 vs. current MAME ROM set renaming check for the MAME resolution DB.
-
You'll note that almost all the content from progetto emma is derived from the dats and the history dat.
Quick update: I set up some folders in Google Drive, one for MAME 2000 and one for MAME 2003. I've taken the original dat (I generated one based on imame4all rather than the default 0.37b5 dat, which has a minor difference of 30 ROMs). Each folder has a dat and an XML-ish file (technically a hyperspin db generated from datutils) that has simplified tags for the relevant information that can be pulled from the dat files. I still need to parse them out into their respective columns in the spreadsheets.
Once those are parsed out, you'll see I have columns in each sheet where the head is the theoretical source of the contents of that column (e.g. desc comes from the history.dat). The ES tags are also there as column heads so we know what they correspond to in EmulationStation. Now it's just a matter of setting the paths and parsing everything into the sheets. If you want access to them, let me know and I'll add you.
https://drive.google.com/open?id=0B2TMeZ6iEFvHMlAzR29lQl9mTzg
-
@herb_fargus Looks like a good setup. I have a couple of questions/observations:
Description vs. Name vs. Path (ROM)
It looks like many of the sources use the header "Description" as the long name of the game. For example, the Name header in the source data might be "1941" which corresponds to the ROM zip filename, and the Description header is the long version of the name, like "1941 - Counter Attack (World)". These headers are slightly offset from what we need in ES. Our Name is actually the long name or Description above, and the Name above (ROM.zip) is just part of the path. Then, if I am not mistaken, our Desc is supposed to be the paragraph of text that describes the game--something like "1941 is a vertical shooter in which you fly a World War II airplane into battle. . ." and so on. You probably already see the same thing. Also, it might be handy to add a ROM.zip column even though it's not specifically used in the XML as-is. This can be used on every row in many ways, both as a lookup key for pulling in the other values for each row and as reference text to construct Path and Image values. I am just "thinking" out loud here about what I would do to start building formulas to get the data into Sheet 1.
Take a look at my sheet. It started with MasterList, which is a copy of the compatibility list from AdvanceMAME (.106). The second sheet is very similar to your Sheet1 with the XML columns, and finally there is a Rebuilt XML sheet which uses a formula to look up games I have marked in the MasterList and builds out specific XML rows. Notice how everything in the XML Columns sheet is also indexed, as this allows me to block out the "stacked" XML rows (with 1, 1, 1, 1, 2, 2, 2, 2, etc.) as a way to effectively transpose the columns into rows. To make this work, I simply mark a game in the MasterList as one I want to install (by placing an "X" in the R-Cade column) and it generates the XML content on the fly. Then, I simply select the C, D, E columns from the Rebuilt XML sheet and copy, paste into a text file.
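To make the header offset and the "stacked" rows concrete, here is a small Python sketch. The function names, the `./images/` folder and the sample values are hypothetical. It only shows the source Name becoming part of `path`, the source Description becoming the ES `name`, and one game being expanded into several rows that share an index.

```python
from xml.sax.saxutils import escape

def to_es_fields(source_row, history):
    """Map dat-style headers onto the EmulationStation tags."""
    rom = source_row["name"]                # dat "name" is the ROM zip name
    return {
        "path": f"./{rom}.zip",
        "name": source_row["description"],  # dat "description" is the long title
        "desc": history.get(rom, ""),       # the paragraph, e.g. from history.dat
        "image": f"./images/{rom}.png",     # assumed image folder
    }

def stacked_rows(index, fields):
    """Turn one game's columns into stacked XML rows that share one index."""
    rows = [(index, "<game>")]
    rows += [(index, f"  <{tag}>{escape(value)}</{tag}>") for tag, value in fields.items()]
    rows.append((index, "</game>"))
    return rows

FIELDS = to_es_fields(
    {"name": "1941", "description": "1941 - Counter Attack (World)"},
    {"1941": "A vertical shooter."},
)
STACKED = stacked_rows(1, FIELDS)
for index, line in STACKED:
    print(index, line)
```

The shared index plays the same role as the 1, 1, 1, 1, 2, 2, 2, 2 column in the sheet: it ties each output row back to the game it came from.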
-
I agree there should also be a column with the ROM name. This allows associating other data related to that ROM; e.g. I'd like to integrate my resolution DB mentioned above and create a "global" MAME .78 data file.
Concerning Descriptions, meaning the text (paragraph) that describes the game, as @herb_fargus said, it can be found in the History.dat file (will populate column C in his google sheet).
-
Agreed. It would be good to have a ROM name column, which should essentially be the files stripped of their extensions. I was thinking of filling out the paths, filtering, etc., just so it's a little clearer which attributes go where.
-
@herb_fargus said in Manually generate Mame2003 gamelist?:
should essentially be the files stripped of their extensions
perfect.
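For reference, a tiny Python sketch of deriving that column from a list of filenames. The function name and the sample filenames are just for illustration.

```python
from pathlib import Path

def rom_names(filenames):
    """Return the sorted ROM names for the .zip files in a list of filenames."""
    return sorted(
        Path(name).stem              # drop the directory part and the extension
        for name in filenames
        if Path(name).suffix.lower() == ".zip"
    )

NAMES = rom_names(["pacman.zip", "1941.ZIP", "readme.txt", "roms/dkong.zip"])
print(NAMES)
```

Note that `stem` only removes the last extension, which is fine for plain `romname.zip` files.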
-
Hi,
I have a little PHP script which I use to parse history.dat to generate or update gamelist.xml. It's not a scraper, but it's been handy for me to populate the game info quickly without spending a long time scraping.
I just provide a snap directory and a history.dat file and run my script on the Pi. It takes about a minute to run for a large history.dat + large directory.
It might need tidying up a bit, with instructions, but I am happy to post it.
-
@kixut That sounds very useful! I wouldn't mind giving it a go.
@caver01 I've also updated my sheets, filling in the content I have. The catver and history dat should get the rest, mostly? I also wasn't sure what the distinction between developer and publisher was in the metadata. It seems sometimes it's both, sometimes it's only one and not the other, and sometimes it may technically be neither... a bit confusing.
You'll also note my paths are relative as I intend for this to just be a package someone can dump in their romlist and just have it automatically scraped that way as everyone should more or less have the same romset.
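For the genre column, something like the following Python sketch could pull the values out of catver. It assumes the file is ini-style with a `[Category]` section of `romname=Genre` lines. That layout and the sample content are my assumptions, so check them against the real file.

```python
# Illustrative sample in the assumed catver layout.
SAMPLE_CATVER = """\
[Category]
1941=Shooter / Flying Vertical
pacman=Maze

[VerAdded]
1941=0.26
"""

def parse_catver(text):
    """Return {rom: genre} from the [Category] section only."""
    genres = {}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ini comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "Category" and "=" in line:
            rom, genre = line.split("=", 1)
            genres[rom.strip()] = genre.strip()
    return genres

GENRES = parse_catver(SAMPLE_CATVER)
print(GENRES)
```

The ROM name is the lookup key again, so the result can be joined to the other columns the same way as the history.dat descriptions.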