ckmame: repository: 5f1878370a8b docs/xmlheaders.txt

docs/xmlheaders.txt @ 955:5f1878370a8b

Clrmamepro header skip documentation.
author Dieter Baron <dillo@danbala.tuwien.ac.at>
date Mon, 09 Apr 2007 20:39:54 +0000

XML driven header support
=========================

So... what's the main problem with headers anyway? They affect the file's hash
values, such as crc32, sha1 or md5. A datfile usually lists only the hash
values of the pure ROM data, without the header. Before clrmamepro 3.90, the
program calculated checksums over the full file only, so if a header was
present, it was included in the calculation and you ended up with a different
hash value.

From clrmamepro 3.90 onwards, the program can perform additional checks to
detect headers, skip them and calculate the hashes over the pure data.

Now, how are such checks defined? Do I have to code a plugin for clrmamepro?

While I was thinking about a way to support headers, I quickly dropped the
idea of a plugin system (not to mention the security risks that generally come
with plugins). Then Mike (Logiqx) came up with the idea of defining header
detection rules in XML. I liked the idea as soon as I read his mail and started
thinking about what is needed to detect headers and how to describe these
operations in XML.

Basically, you need one or more tests that return the start and end offsets of
the real data block within a given file, plus optional operations that do byte
or word swaps on that resulting block. clrmamepro then calculates the hash
values over that 'real data' block only.
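
To make the idea concrete, here is a minimal Python sketch (not code from
clrmamepro or ckmame; `hash_real_data` is a hypothetical name) that computes
crc32, sha1 and md5 over only the bytes between a start and an end offset,
assuming those offsets have already been determined by a detector. The 16-byte
header in the sample data is made up for illustration.

```python
import hashlib
import zlib


def hash_real_data(data: bytes, start: int, end: int) -> dict:
    """Hash only data[start:end], the 'real data' block (hypothetical helper)."""
    block = data[start:end]
    return {
        "size": len(block),  # the reported size excludes the skipped header
        "crc32": format(zlib.crc32(block) & 0xFFFFFFFF, "08x"),
        "sha1": hashlib.sha1(block).hexdigest(),
        "md5": hashlib.md5(block).hexdigest(),
    }


rom = b"ROM DATA" * 4            # 32 bytes of 'pure' ROM data
headered = b"H" * 16 + rom       # the same data behind a made-up 16-byte header

full = hash_real_data(headered, 0, len(headered))
skipped = hash_real_data(headered, 16, len(headered))
pure_sha1 = hashlib.sha1(rom).hexdigest()

print(full["sha1"] == pure_sha1)     # False: the header changes the hash
print(skipped["sha1"] == pure_sha1)  # True: same hash as the pure data
print(skipped["size"])               # 32
```

Skipping the header makes the hash match the one a datfile would list for the
pure data, and the size shrinks accordingly.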


How to use the header support:
==============================


What do I need?
---------------

If you have a datfile for a system whose files commonly come with headers, you
need an XML file that describes that system: what to skip, what to keep, etc.
For the syntax of such files, look at the end of this text file.


Where to place such header definitions?
---------------------------------------

When you use clrmamepro 3.90 for the first time, you'll see that a new
subfolder named "headers" was created in your clrmamepro folder. That's where
clrmamepro looks for *.XML files at startup. So if anyone provides XML header
detection files, put them there and restart clrmamepro to load them.


OK, I have XML files in there. How do I use them?
-------------------------------------------------

Start clrmamepro, load a profile and open the settings window. In the combo
box you usually use for selecting rompaths etc., you'll find a new entry,
"headers". If you select it, you'll see all currently available header
formats listed, and you can enable/disable them there as well. This is a
per-profile option: your enable/disable selection is remembered in the
currently loaded profile.

If you don't see one of your files in there, it probably didn't pass a
validation check. Contact the author of that XML file and ask what's wrong
with it.

Every enabled header detector is then used whenever a file's checksum is
calculated. This includes the scanner's checksum check, the scanner's name
check, the rebuilder, the merger and several other parts.

Of course, you should know which header detector belongs to which
profile/datfile, and you should enable only exactly THAT format, e.g. an NES
header detector for an NES datfile.


Some additional information:
----------------------------

Header detection returns a file size that is the size of the 'real data' in
the file, so it generally differs from the original file size. A datfile for
files with headers should therefore either list no size information at all
(size "-") or use the size of the data block (without the header).


In which parts does the header detector play a role?
----------------------------------------------------

Currently, the variable hash calculation is active in the Scanner, Rebuilder,
Merger and Dir2Dat. Keep in mind that it affects file sizes too: if a header
was detected, the file size is set to the size of the 'real' data block.


Performance:
------------

Header detection doesn't come for free. On unzipped files it's just some
additional checks, but if you scan zipped files, the data has to be
decompressed internally, which takes some time.


That's basically it for the common user. Header gurus should read on, because
now the interesting part is coming: how do I write such a header XML file?


XML format
==========

The general structure of such an XML file:

    <?xml version="1.0"?>

    <detector>

        <name>...</name>
        <author>...</author>
        <version>...</version>

        <rule ...>
            <test .../>
            <test .../>
            ...
        </rule>

        <rule ...></rule>
        <rule ...></rule>
        ...

    </detector>

Here, <test .../> stands for one of the test elements described below (data,
or, xor, and, file).

So, besides some meta information, you get a list of rules, each of which
contains a list of tests.
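
For illustration only, the structure above can be read with Python's standard
`xml.etree.ElementTree` module. The sketch below (the `load_detector` function
and its dictionary layout are hypothetical, not clrmamepro's own parser)
collects the meta information and the rules with their tests, and rejects
files that lack the elements this text marks as required. clrmamepro's actual
validation may check more than this.

```python
import xml.etree.ElementTree as ET

DETECTOR_XML = """<?xml version="1.0"?>
<detector>
    <name>Example</name>
    <author>someone</author>
    <version>1.0</version>
    <rule start_offset="80" end_offset="EOF">
        <data offset="64" value="41435455414C2043" result="true"/>
        <file size="PO2" result="false"/>
    </rule>
    <rule start_offset="0" end_offset="EOF">
        <file size="PO2" result="true"/>
    </rule>
</detector>"""


def load_detector(xml_text: str) -> dict:
    """Collect meta information and rules (with their tests) from a detector."""
    root = ET.fromstring(xml_text)
    if root.tag != "detector":
        raise ValueError("root element must be <detector>")
    name = root.findtext("name")
    if not name:
        raise ValueError("<name> is required")
    rules = []
    for rule in root.findall("rule"):
        rules.append({
            # documented defaults: start 0, end EOF, operation none
            "start_offset": rule.get("start_offset", "0"),
            "end_offset": rule.get("end_offset", "EOF"),
            "operation": rule.get("operation", "none"),
            # each child element is one test: data, and, or, xor or file
            "tests": [(test.tag, dict(test.attrib)) for test in rule],
        })
    if not rules:
        raise ValueError("at least one <rule> is required")
    return {"name": name, "author": root.findtext("author"),
            "version": root.findtext("version"), "rules": rules}


detector = load_detector(DETECTOR_XML)
print(detector["name"], len(detector["rules"]))           # Example 2
print([tag for tag, _ in detector["rules"][0]["tests"]])  # ['data', 'file']
```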

A rule is 'fulfilled' if all of its tests succeed (logical AND). Rules are
combined with a logical OR. If a rule is fulfilled, its attributes define the
real data block, i.e. they skip the header/footer part.

As soon as a rule is fulfilled, no more rules are tested. If no rule can be
applied successfully to the current file, the default values (start = 0,
end = EOF) are used.
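
The evaluation order described above can be sketched in a few lines of Python.
This is an illustrative model, not clrmamepro's implementation: each test is
assumed to be a callable that takes the file contents and returns True or
False, offsets are assumed to be already resolved to absolute positions, and
`None` stands for EOF. The names `apply_detector` and `has_magic` are
hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Test = Callable[[bytes], bool]
# (start offset, end offset or None for EOF, tests of the rule)
Rule = Tuple[int, Optional[int], List[Test]]


def apply_detector(data: bytes, rules: List[Rule]) -> Tuple[int, int]:
    """Return (start, end) of the real data block for the first fulfilled rule."""
    for start, end, tests in rules:
        # all() ANDs the tests and stops at the first one that fails;
        # a rule without tests is therefore always fulfilled
        if all(test(data) for test in tests):
            # rules are ORed: the first fulfilled rule wins, the rest are skipped
            return start, (len(data) if end is None else end)
    # no rule fulfilled: default start = 0, end = EOF
    return 0, len(data)


def has_magic(data: bytes) -> bool:
    """Hypothetical test: the file starts with the bytes 'HDR!'."""
    return data[:4] == b"HDR!"


rules = [(0x10, None, [has_magic])]
print(apply_detector(b"HDR!" + bytes(28), rules))  # (16, 32)
print(apply_detector(bytes(32), rules))            # (0, 32)
```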


Detector:
---------

* <name> (required)

  A unique name for the system, like 'NES'. This name is shown in the settings
  window where the headers are listed. Since it's a unique name, it shouldn't
  be changed without a reason.

* <author> (optional)

  Author-specific information, possibly including a contact address for
  reporting issues.

* <version> (optional)

  Some information about the version and status of the XML file.

* <rule> (required)

  At least one rule has to be specified. The rule element holds information
  about the 'real data' start/end and an optional operation on the data.


Rules:
------

Rules can contain tests (we'll come to those later). If a rule is fulfilled,
the given start and end offset values are used for the subsequent hash
calculation.

Example:

    <rule start_offset="80" end_offset="-800" operation="byteswap">

- start_offset (optional, default = 0)
  hexadecimal value (max. 64bit) that gives the start of the real data

- end_offset (optional, default = "EOF")
  hexadecimal value (max. 64bit) that gives the end of the real data

- you can use "EOF" to specify the end of the file. For example:
      <rule start_offset="0" end_offset="EOF">

- you can use negative offsets to indicate an offset relative to the end of
  the file. Use "-" for that, e.g. end_offset="-800" means 0x800 = 2048 bytes
  before the end.

- operation: none, bitswap, byteswap, wordswap (optional, default = none)

  'none' leaves the data unchanged before the hash is calculated.

  'byteswap' performs a byte swap (or 16bit word swap if you prefer that
  term): byte sequence 01|02 becomes 02|01. An even file size is required.

  'wordswap' performs a 32bit word swap: bytes 01|02|03|04 become
  04|03|02|01. The file size must be a multiple of 4.

  'bitswap' reverses the bit order of each byte: bit 7 <-> bit 0,
  bit 6 <-> bit 1, etc.
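
The three operations can be modelled as follows. This Python sketch is an
illustration under assumptions, not clrmamepro's code: it applies the
operation to the already cut-out real data block, and it raises an error when
the size requirement is not met, since this text only states that the
requirement exists, not what happens otherwise. The function names and the
`OPERATIONS` table are hypothetical.

```python
def byteswap(block: bytes) -> bytes:
    """01|02 -> 02|01: swap the two bytes of every 16-bit word."""
    if len(block) % 2:
        raise ValueError("byteswap requires an even size")
    out = bytearray(block)
    out[0::2], out[1::2] = block[1::2], block[0::2]
    return bytes(out)


def wordswap(block: bytes) -> bytes:
    """01|02|03|04 -> 04|03|02|01: reverse the four bytes of every 32-bit word."""
    if len(block) % 4:
        raise ValueError("wordswap requires a size that is a multiple of 4")
    return b"".join(block[i:i + 4][::-1] for i in range(0, len(block), 4))


# lookup table mapping every byte value to its bit-reversed value
BIT_REVERSED = bytes(int(f"{value:08b}"[::-1], 2) for value in range(256))


def bitswap(block: bytes) -> bytes:
    """Reverse the bit order of every byte: bit 7 <-> bit 0, bit 6 <-> bit 1, ..."""
    return block.translate(BIT_REVERSED)


OPERATIONS = {
    "none": lambda block: block,
    "byteswap": byteswap,
    "wordswap": wordswap,
    "bitswap": bitswap,
}

print(OPERATIONS["byteswap"](bytes([1, 2, 3, 4])).hex())  # 02010403
print(OPERATIONS["wordswap"](bytes([1, 2, 3, 4])).hex())  # 04030201
print(OPERATIONS["bitswap"](bytes([0x01, 0xF0])).hex())   # 800f
```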

With that basic knowledge about rules, we can already define a constant
header skip:

Example:

    <rule start_offset="80" end_offset="EOF"/>

This always skips the first 0x80 bytes and calculates the hash over the rest.
A rule without any tests is always 'fulfilled'.


Tests:
------

Rules can contain zero, one or more test statements, which actually 'test'
things. All tests of a rule have to signal 'true' to fulfill the rule.

There are 3 groups of tests: data tests, boolean tests and file tests.

- data tests:

  Examples:

      <data offset="1" value="415441524937383030" result="true"/>
      <data offset="-200" value="08AF" result="false"/>

  offset (optional, hex value (max. 64bit), default = 0)
  value  (required, hex value, must not be empty)
  result (optional, true|false, default = true)

  offset can be negative (relative to the end of the file), and it can be
  set to "EOF" to specify the end of the file.

  The syntax is pretty easy: the offset attribute gives a hexadecimal offset
  at which to test for the byte sequence given in the value attribute.

  In the first example above, we seek to offset 1 in the file, read 9 bytes
  (the byte size of the value attribute) and compare the read bytes with the
  value given in the value attribute. If they match, the test signals 'true'.

  The result attribute can be used to invert the result of the test. So if
  you want to test for e.g. a value != "12", you set value="12" and
  result="false".

- boolean tests:

  Examples:

      <or offset="10" mask="1f54" value="4154" result="true"/>
      <xor offset="10" mask="1f54" value="4154" result="true"/>
      <and offset="10" mask="1f54" value="4154" result="true"/>

  The value, offset and result attributes work the same way as in data tests
  (including EOF and negative offsets). The mask attribute, however, defines
  a bitmask that is applied (byte by byte) to the read data before it's
  compared to the value byte sequence. Depending on the test used, either a
  bitwise OR, XOR or AND operation is performed. The mask must have the same
  byte size as the value byte sequence.

- file tests:

  Examples:

      <file size="1000" result="true" operator="less"/>
      <file size="PO2" result="false"/>

  size     (required, "PO2" or a hex value)
  result   (as mentioned above)
  operator (optional, equal|less|greater, default = equal)

  A file test checks a file against a given file size. The size attribute
  holds the hex value to test against (in the first example, 0x1000 = 4096
  bytes). You can also use "PO2" instead of a numeric file size, which tests
  whether the file size is a power of 2 (1, 2, 4, ..., 1024, 2048, ...).

  The operator selects an equal, less-than or greater-than file size
  comparison. Operators are ignored when testing for "PO2".
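
A possible reading of the three test kinds as Python functions is sketched
below. It is not taken from clrmamepro: offsets are assumed to be already
resolved to absolute positions, mask and value are the hex strings from the
XML, and a read outside the file raises the hypothetical `IllegalRead`
exception, because this text says such reads make the detector fall back to
the defaults rather than merely failing one test. The sample file reuses the
first data example, whose value bytes spell "ATARI7800" in ASCII.

```python
from operator import and_, or_, xor


class IllegalRead(Exception):
    """Seek/read outside the file: the caller falls back to start = 0, end = EOF."""


def read_bytes(data: bytes, offset: int, length: int) -> bytes:
    if offset < 0 or offset + length > len(data):
        raise IllegalRead(f"cannot read {length} bytes at {offset:#x}")
    return data[offset:offset + length]


def data_test(data: bytes, offset: int, value: str, result: bool = True) -> bool:
    expected = bytes.fromhex(value)
    return (read_bytes(data, offset, len(expected)) == expected) == result


def boolean_test(data: bytes, kind: str, offset: int, mask: str, value: str,
                 result: bool = True) -> bool:
    mask_bytes, expected = bytes.fromhex(mask), bytes.fromhex(value)
    if len(mask_bytes) != len(expected):
        raise ValueError("mask and value must have the same byte size")
    combine = {"and": and_, "or": or_, "xor": xor}[kind]
    chunk = read_bytes(data, offset, len(expected))
    # apply the mask byte by byte, then compare with the value
    masked = bytes(combine(b, m) for b, m in zip(chunk, mask_bytes))
    return (masked == expected) == result


def file_test(data: bytes, size: str, operator: str = "equal",
              result: bool = True) -> bool:
    n = len(data)
    if size == "PO2":
        matched = n > 0 and (n & (n - 1)) == 0  # operator is ignored for PO2
    else:
        limit = int(size, 16)
        matched = {"equal": n == limit, "less": n < limit,
                   "greater": n > limit}[operator]
    return matched == result


rom = b"\x00ATARI7800" + bytes(6)  # 16 bytes
print(data_test(rom, 1, "415441524937383030"))                # True
print(boolean_test(rom, "and", 1, "f0", "40"))                # True (0x41 & 0xf0)
print(file_test(rom, "PO2"), file_test(rom, "1000", "less"))  # True True
```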


Things to remember:
-------------------

- offsets are written as unsigned hexadecimal values (max. 64bit)
- a "-" in front of an offset makes it relative to EOF
- offsets set to "EOF" refer to the physical end of a file
- mask and value attributes must be of equal size
- the result attribute can be used to invert a test's result
- a rule fails as soon as one of its tests fails
- a detector is fulfilled as soon as one of its rules succeeds
- if no rule is fulfilled or illegal seeks/reads are performed, the default
  (start = 0, end = EOF) is used for the hash calculation
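
The offset rules above can be turned into absolute positions as in the
following Python sketch. It is an illustration under assumptions rather than
clrmamepro's behaviour: `resolve_offset` and `real_data_range` are
hypothetical names, the 64bit limit is not checked, and treating a resolved
range that starts before 0, ends after EOF or ends before it starts as an
illegal seek (and therefore falling back to the default) is this sketch's own
interpretation.

```python
def resolve_offset(text: str, file_size: int) -> int:
    """Turn an offset attribute such as '80', '-800' or 'EOF' into a position."""
    if text.upper() == "EOF":
        return file_size                        # physical end of the file
    if text.startswith("-"):
        return file_size - int(text[1:], 16)    # hex distance back from EOF
    return int(text, 16)                        # plain hex offset from the start


def real_data_range(start: str, end: str, file_size: int) -> tuple:
    """Resolve a rule's start/end offsets, falling back to the whole file."""
    first = resolve_offset(start, file_size)
    last = resolve_offset(end, file_size)
    if not 0 <= first <= last <= file_size:
        return 0, file_size                     # default: start = 0, end = EOF
    return first, last


print(real_data_range("80", "EOF", 0x4000))   # (128, 16384)
print(real_data_range("10", "-40", 0x4000))   # (16, 16320)
print(real_data_range("80", "-800", 0x100))   # (0, 256): range would be negative
```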


Examples:
=========

Well... now that you know the basics, let's continue with some simple
examples:


Example 1:
----------

    <rule start_offset="80" end_offset="EOF">
        <data offset="64" value="41435455414C2043" result="true"/>
        <file size="PO2" result="false"/>
    </rule>

    <rule start_offset="0" end_offset="EOF">
        <file size="PO2" result="true"/>
    </rule>

These two rules check

       (offset 0x64 holds the byte sequence 41435455414C2043
        AND the file size is NOT a power of 2)
    OR (the file size IS a power of 2)

Depending on which rule applies, the start offset is set to 0x80 or 0x0, and
the end is always the end of the file.
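
Example 1 can be written out by hand in Python to show how the two rules
combine. This is only a sketch: `example1` and `is_power_of_two` are
hypothetical helpers, and the sample data is made up. The byte sequence
41435455414C2043 is the ASCII text "ACTUAL C". For a file too short to hold
it, the slice below simply fails the comparison, which here gives the same
result as the documented fallback for illegal reads.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def example1(data: bytes) -> tuple:
    size = len(data)
    # rule 1: "ACTUAL C" at 0x64 AND size is not a power of 2 -> skip 0x80 bytes
    if (data[0x64:0x64 + 8] == bytes.fromhex("41435455414C2043")
            and not is_power_of_two(size)):
        return 0x80, size
    # rule 2: size is a power of 2 -> whole file
    if is_power_of_two(size):
        return 0x0, size
    # no rule fulfilled -> default range, which is also the whole file
    return 0, size


header = bytes(0x64) + b"ACTUAL C" + bytes(0x80 - 0x64 - 8)  # 0x80 bytes
print(example1(header + bytes(0x2000)))  # (128, 8320)
print(example1(bytes(0x2000)))           # (0, 8192)
```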


Example 2:
----------

    <rule start_offset="10" end_offset="-40">
        <and offset="-2" mask="f0" value="20" result="false"/>
    </rule>

We look at offset (EOF - 0x2), read one byte, do a bitwise AND with 0xf0 and
compare the result against 0x20. If they are NOT equal, the real data block of
the file starts at offset 0x10 and ends at EOF - 0x40.
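
The same check in Python, as a sketch with a hypothetical `example2`
function. Falling back to the default range for files too short for the
offsets is an assumption that only loosely follows the 'illegal seeks/reads'
rule above.

```python
def example2(data: bytes) -> tuple:
    size = len(data)
    if size < 0x2:
        return 0, size                     # reading at EOF - 0x2 would be illegal
    byte = data[size - 0x2]                # offset="-2": one byte at EOF - 0x2
    test_passed = (byte & 0xF0) != 0x20    # result="false" inverts the comparison
    if not test_passed:
        return 0, size                     # rule not fulfilled: default range
    start, end = 0x10, size - 0x40         # start_offset="10", end_offset="-40"
    if end < start:
        return 0, size                     # assumption: impossible range -> default
    return start, end


data = bytearray(0x100)
data[-2] = 0x25
print(example2(bytes(data)))  # (0, 256): 0x25 & 0xf0 == 0x20, so the test fails
data[-2] = 0x35
print(example2(bytes(data)))  # (16, 192)
```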


Example 3:
----------

    <rule start_offset="0" end_offset="EOF" operation="wordswap">
        <data offset="0" value="504b0304" result="true"/>
    </rule>

If we find the byte sequence 0x504b0304 at offset 0, we use the full file for
the hash calculation, but perform a 32bit word swap before the actual
calculation is done.
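
In Python, Example 3 amounts to the following sketch. `example3_sha1` is a
hypothetical name, and skipping the swap when the size is not a multiple of 4
is an assumption, since this text only says such a size is required. The
bytes 50 4b 03 04 happen to be the signature that starts a ZIP local file
header ("PK\x03\x04").

```python
import hashlib


def wordswap(block: bytes) -> bytes:
    """Reverse the byte order inside every 32-bit word."""
    return b"".join(block[i:i + 4][::-1] for i in range(0, len(block), 4))


def example3_sha1(data: bytes) -> str:
    # data test: bytes 50 4b 03 04 at offset 0
    if data[:4] == bytes.fromhex("504b0304") and len(data) % 4 == 0:
        data = wordswap(data)        # operation="wordswap" on the whole file
    return hashlib.sha1(data).hexdigest()


sample = bytes.fromhex("504b0304") + b"abcd"
print(wordswap(sample).hex())  # 04034b5064636261
print(example3_sha1(sample) == hashlib.sha1(wordswap(sample)).hexdigest())  # True
```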


Example 4:
----------

    <rule start_offset="80" end_offset="EOF">
        <data offset="64" value="41" result="true"/>
    </rule>
    <rule start_offset="80" end_offset="EOF">
        <data offset="64" value="42" result="true"/>
    </rule>
    <rule start_offset="80" end_offset="EOF">
        <data offset="64" value="43" result="true"/>
    </rule>
    <rule start_offset="80" end_offset="EOF">
        <data offset="64" value="44" result="true"/>
    </rule>

If the byte at offset 0x64 is 0x41, 0x42, 0x43 or 0x44, we set the start to
0x80.


Example 5:
----------

    <rule start_offset="0" end_offset="EOF">
        <file size="PO2" result="false"/>
        <file size="1000" result="true" operator="greater"/>
    </rule>

If the file size isn't a power of two and it is greater than 0x1000 bytes, we
take the whole file.
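
Because this rule's offsets equal the defaults, its only effect is that no
later rule is tested for such files. A small Python sketch (hypothetical
helper names) shows which file sizes fulfill it. The power-of-two check uses
the common `n & (n - 1) == 0` bit trick, which is this sketch's choice and not
necessarily how clrmamepro implements "PO2".

```python
def is_power_of_two(n: int) -> bool:
    # a power of two has exactly one bit set, so clearing the lowest set bit gives 0
    return n > 0 and (n & (n - 1)) == 0


def example5_fulfilled(size: int) -> bool:
    # <file size="PO2" result="false"/> AND <file size="1000" operator="greater"/>
    return not is_power_of_two(size) and size > 0x1000


# prints True only for 0x1001 and 0x3000
for size in (0x800, 0x1000, 0x1001, 0x2000, 0x3000):
    print(hex(size), example5_fulfilled(size))
```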


Example 6:
----------

    <rule start_offset="80" end_offset="-80" operation="byteswap">
        <data offset="64" value="41435455414C2043" result="true"/>
        <xor offset="20" mask="f0f0f0f0f0" value="2020202020" result="false"/>
        <file size="PO2" result="false"/>
    </rule>

Well... pretty much nonsense, but possible ;) Any combination of tests, useful
or not, can be used. Feel free to play with it.


Forcing datfiles to use headers:
--------------------------------

By specifying the XML file name in the datfile header (with the tag 'header'),
you can bind a datfile to this definition file. If the XML file isn't
available or not enabled, you'll see a warning after loading the datfile.

Example:

    header nes.xml


Hints:
------

Make use of the fact that rules and tests are evaluated in the order in which
they are written. As soon as a rule is fulfilled, the remaining rules are
skipped, and as soon as a test fails, the remaining tests of that rule are
skipped, too.

If you want to write a description that works on so-called overdumps, you can
add several rules that test for different file sizes (in decreasing order)
and set the start/end offsets accordingly. For example, if a test for > 64k
succeeds, limit the data to 64k; otherwise test for > 32k, ... > 16k and so on.
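
Writing such a cascade by hand is repetitive, so it can be generated. The
following Python sketch uses the standard `xml.etree.ElementTree` module to
emit one rule per size, largest first. The function name, the detector name
and the chosen sizes are hypothetical, and the output should still be checked
against clrmamepro's validation.

```python
import xml.etree.ElementTree as ET


def overdump_detector(name: str, sizes) -> str:
    """Build rules that cut a file down to the largest listed size below its own."""
    detector = ET.Element("detector")
    ET.SubElement(detector, "name").text = name
    for size in sorted(sizes, reverse=True):   # test the biggest size first
        rule = ET.SubElement(detector, "rule",
                             start_offset="0", end_offset=format(size, "x"))
        # fulfilled if the file is bigger than size -> keep only the first size bytes
        ET.SubElement(rule, "file", size=format(size, "x"),
                      result="true", operator="greater")
    return ET.tostring(detector, encoding="unicode")


xml_text = overdump_detector("Overdump example", [0x4000, 0x10000, 0x8000])
print(xml_text)
```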


Credits
=======

- Many thanks to Logiqx, who came up with the XML idea and had some other
  nice ideas about what's needed in it, and to Cowering for telling me what
  he needs and how some header detectors actually work.