Since the built-in parser allows syntax and grammar errors to be ignored, it must provide a mechanism for recovering from such errors to continue the parse. In general, there is no single, clear approach for recovering from any given error, so the following table documents the approach taken by the parser in each case.
Condition | Code | Recovery action |
---|---|---|
Wrong character encoding | | ignore the problem |
CIF 2.0 must consist of Unicode character data encoded in UTF-8, and this error is emitted if the input is recognized to be encoded differently. That this error is emitted at all generally means that the parser has identified the signature of a different (known) encoding, so it can recover by reading the data according to the detected encoding, even though such input does not comply with the CIF 2.0 specifications. | ||
Disallowed input character | | substitute a replacement character |
This error indicates that an input character outside the allowed set was read. The parser can recover by accepting the character. Which characters are allowed depends on which version of CIF is being parsed. | ||
Missing whitespace | | assume the omitted whitespace |
Whitespace separation is required between most CIF grammatical units. In some cases, the parser can recognize the omission of such whitespace, resulting in this error. In particular, this is the error that will be reported when a CIF1-style string with an embedded delimiter is encountered when parsing in CIF 2 mode. If the opening delimiter of a table is omitted, then this error will occur at the trailing colon of each table key. | ||
Invalid block code | | use the block code anyway |
Although the API's CIF manipulation functions will not allow blocks with invalid codes to be created directly by client programs, the parser can and will create such blocks to accommodate inputs that use such codes. The result is not a valid instance of the CIF data model. | ||
Duplicate block code | | reopen the specified block |
Block codes must be unique within a given CIF. To handle a duplicate block code, the parser reopens the specified block and parses the following contents into it. This may well lead to additional errors being reported. | ||
Missing block header | | parse into an anonymous block |
To handle data that appear prior to any block header, the parser creates a data block with an empty name and parses into that. The data are available via that name, but the result is not a valid instance of the CIF data model. | ||
Invalid frame code | | use the frame code anyway |
Although the API's CIF manipulation functions will not allow save frames with invalid codes to be created directly by client programs, the parser can and will create such frames to accommodate inputs that use such codes. The result is not a valid instance of the CIF data model. | ||
Duplicate frame code | | reopen the specified frame |
Save frame codes must be unique within their containing block or save frame. To handle a duplicate frame code, the parser reopens the specified frame and parses the following contents into it. This may well lead to additional errors being reported. | ||
Disallowed save frame | | accept the save frame |
This error occurs when a save frame header is encountered while parsing with save frame support completely disabled. The parser recovers by parsing the frame as if save frame support were enabled at the default level. | ||
Unterminated save frame | | assume the missing terminator |
This error occurs when a data block header is encountered while parsing a save frame, or when a save frame header is encountered while parsing a save frame with nested frames disabled (the default). The parser recovers by assuming the missing save frame terminator at the position where the error is detected. | ||
Unterminated save frame at end-of-file | | assume the missing terminator |
This is basically the same as the CIF_NO_FRAME_TERM case, but triggered when the end of input occurs while parsing a save frame. This case is distinguished in part because it may indicate a truncated input. | ||
Unexpected save frame terminator | | ignore |
If a save frame terminator is encountered outside the scope of a save frame, the parser recovers by ignoring it. This condition cannot be distinguished from the alternative that a save frame header is given without any frame code. | ||
Duplicate data name | | parse and drop the item |
If a duplicate item name is encountered then it and its associated value(s) are dropped, including when the duplicate appears in a loop. | ||
Unexpected value | | ignore the value |
This error occurs when a value appears outside a list or loop without being paired with a data name or (in a table) a key. | ||
Unexpected (closing) delimiter | | ignore the delimiter |
This error occurs when a list or table closing delimiter appears without a matching opening delimiter preceding it. This can happen when such a delimiter appears in the middle of a whitespace-delimited data value. | ||
Missing data value | | use a synthetic unknown value |
This error occurs when a data name or table key appears without a paired value. The parser recovers by synthesizing an unknown-value placeholder. | ||
Empty loop header | | ignore |
This occurs when the | ||
Truncated loop packet | | fill out the packet with unknown values |
This error occurs when the number of data values in a loop is not an integral multiple of the number of data names. In such cases, the parser can recover by filling out the final packet with unknown-value placeholders. | ||
Empty loop | | accept |
This occurs when a valid loop header is not followed by any values. The parser recovers by accepting the empty loop, which can be accommodated by the API's internal CIF representation. The result is not a valid instance of the CIF data model, however. | ||
Unterminated list or table | | assume the missing delimiter |
If the closing delimiter of a list or table is omitted, then the parser recovers by assuming the terminator to appear at the point where its absence is recognized. | ||
Missing table key | | drop the value |
If a value appears inside a table without an associated key, then the parser recovers by dropping it. | ||
Missing table key | | use a NULL key |
If a table entry contains a (colon, value) without any key representation at all (not even an empty string), then the parser can recover by using a NULL key. The result is not a valid instance of the CIF data model. | ||
Unquoted table key | | accept |
This case is distinguished from the | ||
Text block as a table key | | accept |
CIF 2.0 does not allow text blocks to be used as table keys, but this is a somewhat artificial restriction. If the parser encounters a table key quoted with newline/semicolon delimiters then it can recover by accepting that key as valid. | ||
Missing text prefix | | accept |
The text prefixing protocol requires every line of a prefixed text field to start with the chosen prefix. If any line fails to do so then the parser can typically recover by simply accepting that line verbatim. | ||
Invalid unquoted value | | accept |
A whitespace-delimited data value has a restricted character repertoire and a more-restricted first character. When the parser recognizes that one of these restrictions has not been obeyed, it can recover by accepting the value as-is. | ||
Unquoted reserved word | | drop |
The strings 'data_' (without a block code), 'stop_', and 'global_' are reserved and must not appear as unquoted complete words in CIFs. If the parser encounters one, it can recover by dropping it. | ||
Overlength line | | drop |
If a CIF input line exceeds the allowed number of characters (2048 in CIF 1.1 and CIF 2.0) then the parser can recover by ignoring the problem. Note that the limit is expressed in Unicode characters -- not bytes, nor even | ||
Missing endquote | | assume the quote |
When a (single-) apostrophe-quoted or quotation-mark-quoted string is not terminated before the end of the line on which it begins, the parser can recover by assuming the missing delimiter at the end of the line. | ||
Unterminated multiline string | | assume the closing delimiter |
When a text block or triple-apostrophe-quoted or triple-quotation-mark-quoted string is not terminated before the end of the input, the parser can recover by assuming the missing delimiter at the end of the input. In such cases, that is often much more text than the value was meant to include, but there is no reliable way to determine where it was supposed to end. | ||
Disallowed first character | | accept |
There are slightly different rules for the first character of a CIF than for others, in that a Unicode byte-order mark (U+FEFF) is allowed there. Moreover, an unexpected character at that position can be an indication of a mis-identified character encoding. The parser can recover by accepting the character, but that will result in at least one subsequent error. |
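
To make the "truncated loop packet" recovery in the table more concrete, the following Python sketch pads a flat sequence of loop values so that its length becomes an integral multiple of the number of data names. It is an illustration only, not the parser's implementation: the function name `pad_truncated_loop`, the data names in the usage lines, and the use of the string `"?"` as a stand-in for an unknown-value placeholder are all assumptions made for this example.

```python
from typing import List, Sequence

# Assumption: "?" stands in for the parser's unknown-value placeholder.
UNKNOWN = "?"


def pad_truncated_loop(names: Sequence[str], values: Sequence[str]) -> List[List[str]]:
    """Split loop values into packets, completing a short final packet.

    Each packet holds one value per data name. If the last packet is
    truncated, it is filled out with UNKNOWN placeholders.
    """
    width = len(names)
    if width == 0:
        raise ValueError("a loop needs at least one data name")
    padded = list(values)
    # Number of values missing from the final packet (0 if it is complete).
    shortfall = -len(padded) % width
    padded.extend([UNKNOWN] * shortfall)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


# Hypothetical three-name loop whose second packet is one value short.
packets = pad_truncated_loop(["_a", "_b", "_c"], ["1", "2", "3", "4", "5"])
```

Here the second packet lacks its final value, so it is completed with a single placeholder; a loop with no values at all yields no packets, which corresponds to the separate "empty loop" case.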
Copyright 2014, 2015 John C. Bollinger