CIF utility functions

Functions

int cif_get_api_version (char **version)
 Provides a string representation of the CIF API's version number.
UChar * cif_u_strdup (const UChar *str)
 Creates and returns a duplicate of a Unicode string.
int cif_normalize (const UChar *src, int32_t srclen, UChar **normalized)
 Converts (the initial part of) the specified Unicode string to a case-folded normalized form.
int cif_cstr_to_ustr (const char *cstr, int32_t srclen, UChar **ustr)
 Converts (the initial part of) a C String to a Unicode string via an ICU default converter.
int cif_analyze_string (const UChar *str, int allow_unquoted, int allow_triple_quoted, int32_t length_limit, struct cif_string_analysis_s *result)
 Analyzes a Unicode string with a view toward determining how it can be represented in CIF format as a data value.
int cif_is_reserved_string (const UChar *str)
 Determines whether a given string takes a reserved form that must not be presented whitespace-delimited in a CIF.

Function Documentation

int cif_analyze_string ( const UChar *  str,
int  allow_unquoted,
int  allow_triple_quoted,
int32_t  length_limit,
struct cif_string_analysis_s *  result 
)

Analyzes a Unicode string with a view toward determining how it can be represented in CIF format as a data value.

In particular, the analysis recommends a (possibly empty) delimiter for the value, and it evaluates several other properties that may affect how the value must be presented, such as whether the line-folding or text-prefixing protocol may need to be applied to it. The results are recorded in a structure supplied by the caller.

This function considers delimiters in the following order: no delimiter (other than whitespace), apostrophe or quotation mark, triple apostrophe or triple quotation mark, and newline/semicolon (text field). It recommends the first of these that is not excluded by the function arguments and that is consistent with the input string. It applies CIF 2.0 rules in all cases, so it will recommend text-field (or triple-quoted) form for some values that could be expressed in CIF 1.1 with just apostrophe or quotation-mark delimiters.

For generality, this function always recommends one of the quoted forms for strings beginning with a semicolon. Some such strings can be presented whitespace-delimited, but not at the beginning of a line. Since the position at which a string may be presented is not a property of the string itself, this function considers it unsafe to use whitespace-delimited form to present any string beginning with a semicolon.

Parameters:
[in] str a NUL-terminated Unicode string to analyze
[in] allow_unquoted zero if whitespace-delimited form is not an acceptable alternative, otherwise nonzero
[in] allow_triple_quoted zero if triple-quoted form is not an acceptable alternative (i.e. for CIF 1.1 format), otherwise nonzero
[in] length_limit the line-length limit with which the formatted result must conform
[in,out] result a pointer to a struct cif_string_analysis_s to be filled in with the analysis result; must not be NULL
Returns:
CIF_OK on success, or an error code (typically CIF_ERROR) on failure
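The delimiter order described above can be illustrated with a small, self-contained sketch. It is not the CIF API's implementation: it works on plain `char` strings rather than `UChar`, the names `sketch_choose_delimiter`, `enum sketch_delim`, and the helpers are hypothetical, and the CIF 2.0 rules it applies are simplified assumptions (a whitespace-delimited value may not be empty, may not contain whitespace or any of `[]{}`, and may not take a reserved form; a single-quoted value may contain neither its delimiter nor a line break; a triple-quoted value may neither contain its delimiter sequence nor end with the delimiter character). Line-length limits, line folding, and text prefixing are ignored.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Candidate delimiters, in the order the analysis is described as trying them. */
enum sketch_delim {
    DELIM_NONE,         /* whitespace-delimited */
    DELIM_APOSTROPHE,   /* '...' */
    DELIM_QUOTE,        /* "..." */
    DELIM_TRIPLE_APOS,  /* '''...''' */
    DELIM_TRIPLE_QUOTE, /* """...""" */
    DELIM_TEXT_FIELD    /* newline/semicolon */
};

/* Case-insensitive (ASCII) test of whether s begins with prefix. */
static bool starts_with_ci(const char *s, const char *prefix) {
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix)) {
            return false;
        }
    }
    return true;
}

/* Assumed, simplified reserved-form test: leading character or reserved word. */
static bool is_reserved(const char *s) {
    if (*s != '\0' && strchr("_#$'\";[]{}", *s) != NULL) return true;
    if (starts_with_ci(s, "data_") || starts_with_ci(s, "save_")) return true;
    return (strlen(s) == 5 && (starts_with_ci(s, "loop_") || starts_with_ci(s, "stop_")))
        || (strlen(s) == 7 && starts_with_ci(s, "global_"))
        || strcmp(s, "?") == 0 || strcmp(s, ".") == 0;
}

/* Returns the first acceptable delimiter under the simplified rules above. */
enum sketch_delim sketch_choose_delimiter(const char *s, bool allow_unquoted,
                                          bool allow_triple_quoted) {
    size_t len = strlen(s);
    bool has_newline = strpbrk(s, "\r\n") != NULL;
    bool ends_apos = len > 0 && s[len - 1] == '\'';
    bool ends_quote = len > 0 && s[len - 1] == '"';

    if (allow_unquoted && len > 0 && strpbrk(s, " \t\r\n[]{}") == NULL
            && !is_reserved(s)) {
        return DELIM_NONE;
    }
    if (!has_newline) {
        /* CIF 2.0: a single-quoted value may not contain its delimiter at all. */
        if (strchr(s, '\'') == NULL) return DELIM_APOSTROPHE;
        if (strchr(s, '"') == NULL) return DELIM_QUOTE;
    }
    if (allow_triple_quoted) {
        if (strstr(s, "'''") == NULL && !ends_apos) return DELIM_TRIPLE_APOS;
        if (strstr(s, "\"\"\"") == NULL && !ends_quote) return DELIM_TRIPLE_QUOTE;
    }
    return DELIM_TEXT_FIELD;  /* text prefixing, if needed, is not modeled here */
}
```

For example, `it's "here"` could be written `'it's "here"'` under CIF 1.1 rules, but because this sketch, like the function it imitates, applies CIF 2.0 rules, it selects triple apostrophes for that value, or a text field when triple-quoted form is disallowed.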
int cif_cstr_to_ustr ( const char *  cstr,
int32_t  srclen,
UChar **  ustr 
)

Converts (the initial part of) a C String to a Unicode string via an ICU default converter.

This function is most applicable to C strings obtained from external input, rather than to strings appearing in C source code. ICU will normally try to guess what converter is appropriate for default text, but the converter it will use can be influenced via ucnv_setDefaultName() (warning: the default converter name is global). On successful conversion, the output Unicode string will be NUL terminated.

Parameters:
[in] cstr the C string to convert; may be NULL, in which case the conversion result is likewise NULL. If it is not NULL and srclen is -1, then it must be terminated by a NUL byte; otherwise termination is optional
[in] srclen the input string length, or -1 if the string consists of all bytes up to a NUL terminator
[in,out] ustr a pointer to a location to record the result; must not be NULL. If a non-NULL pointer is written here by this function (which can happen only on success), then the caller assumes ownership of the memory it references.
Returns:
CIF_OK on success, or an error code (typically CIF_ERROR) on failure
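The `srclen` and NULL-input conventions above can be sketched without ICU. In this hypothetical `sketch_latin1_to_ustr`, an ISO-8859-1 decoding (in which each byte value equals its Unicode code point) stands in for the ICU default converter, and `sketch_uchar` stands in for ICU's `UChar`; the return convention (0 for success) is likewise an assumption, not the CIF API's.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t sketch_uchar;  /* stand-in for ICU's UChar (a UTF-16 code unit) */

/* Returns 0 on success, nonzero on failure (assumed convention). */
int sketch_latin1_to_ustr(const char *cstr, int32_t srclen, sketch_uchar **ustr) {
    if (ustr == NULL) return 1;
    if (cstr == NULL) {          /* NULL input yields a NULL result */
        *ustr = NULL;
        return 0;
    }
    /* srclen == -1 (any negative here): convert up to the NUL terminator;
       otherwise convert exactly srclen bytes, terminated or not. */
    size_t n = (srclen < 0) ? strlen(cstr) : (size_t) srclen;
    sketch_uchar *out = malloc((n + 1) * sizeof *out);
    if (out == NULL) return 1;
    for (size_t i = 0; i < n; i++) {
        out[i] = (unsigned char) cstr[i];  /* Latin-1 byte == code point */
    }
    out[n] = 0;                  /* output is always NUL-terminated */
    *ustr = out;                 /* caller now owns the buffer */
    return 0;
}
```

A real converter may map one input byte to several output code units, or several bytes to one, so an ICU-based implementation cannot size its output buffer this simply.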
int cif_get_api_version ( char **  version  ) 

Provides a string representation of the CIF API's version number.

Creates a character array, fills it with a string representation of the CIF API's version number, and records a pointer to it in the location to which the argument points. The caller is responsible for freeing the version string when it is no longer needed.

Parameters:
[in,out] version a pointer to the location where a pointer to the version string should be recorded; must not be NULL
Returns:
CIF_OK on success, CIF_ARGUMENT_ERROR if version is NULL, or CIF_MEMORY_ERROR if space cannot be allocated for the version string
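The ownership convention for the version string follows a common C pattern: the function allocates, and the caller frees. The following sketch mirrors the documented argument check and error cases using hypothetical names (`sketch_get_version` and the `SKETCH_*` codes stand in for `cif_get_api_version()` and the `CIF_*` codes, whose numeric values are not given here); the version text `"x.y.z"` is a placeholder, not the library's actual version.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for CIF_OK, CIF_ARGUMENT_ERROR, CIF_MEMORY_ERROR. */
enum { SKETCH_OK = 0, SKETCH_ARGUMENT_ERROR = 1, SKETCH_MEMORY_ERROR = 2 };

/* Placeholder version text; a real library would report its own version. */
static const char SKETCH_VERSION[] = "x.y.z";

int sketch_get_version(char **version) {
    if (version == NULL) return SKETCH_ARGUMENT_ERROR;

    char *copy = malloc(sizeof SKETCH_VERSION);   /* includes the NUL */
    if (copy == NULL) return SKETCH_MEMORY_ERROR;

    memcpy(copy, SKETCH_VERSION, sizeof SKETCH_VERSION);
    *version = copy;                               /* ownership passes to the caller */
    return SKETCH_OK;
}
```

A caller would check the result code before using the string and release it afterwards, presumably with `free()`, since the documentation says only that the caller is responsible for freeing it.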
int cif_is_reserved_string ( const UChar *  str  ) 

Determines whether a given string takes a reserved form that must not be presented whitespace-delimited in a CIF.

This function looks for individual reserved characters at the beginning of the string, and compares the string overall to several reserved words and forms. Unlike most CIF API functions, this one does not return a CIF API result code.

Although CIF whitespace and certain other characters cannot appear anywhere in a data value that is presented whitespace-delimited, strings containing those characters are not for that reason considered "reserved" for the purposes of this function.

Parameters:
[in] str a NUL-terminated Unicode string to analyze; must not be NULL
Returns:
non-zero if and only if the string has a reserved form
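The kinds of checks described above can be sketched as follows. The rules encoded here are assumptions for illustration, not a transcription of `cif_is_reserved_string()`: a value is treated as reserved if it begins with one of `_ # $ ' " ; [ ] { }`, begins (case-insensitively) with `data_` or `save_`, equals `loop_`, `global_`, or `stop_` ignoring case, or is exactly `?` or `.`. The function name `sketch_is_reserved` is hypothetical, and it operates on ASCII `char` strings.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive (ASCII) test of whether s begins with prefix. */
static int prefix_ci(const char *s, const char *prefix) {
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix)) {
            return 0;
        }
    }
    return 1;
}

/* Returns nonzero if and only if s has one of the (assumed) reserved forms. */
int sketch_is_reserved(const char *s) {
    static const char *const words[] = { "loop_", "global_", "stop_" };
    size_t i;

    /* Reserved leading characters. */
    if (*s != '\0' && strchr("_#$'\";[]{}", *s) != NULL) return 1;
    /* Data block and save frame header prefixes. */
    if (prefix_ci(s, "data_") || prefix_ci(s, "save_")) return 1;
    /* Whole reserved words, ignoring case. */
    for (i = 0; i < sizeof words / sizeof words[0]; i++) {
        if (strlen(s) == strlen(words[i]) && prefix_ci(s, words[i])) return 1;
    }
    /* Unquoted ? and . denote unknown and inapplicable values. */
    return strcmp(s, "?") == 0 || strcmp(s, ".") == 0;
}
```

Consistent with the paragraph on whitespace, `"a b"` is not reported as reserved by this sketch, even though it cannot be presented whitespace-delimited.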
int cif_normalize ( const UChar *  src,
int32_t  srclen,
UChar **  normalized 
)

Converts (the initial part of) the specified Unicode string to a case-folded normalized form.

The normalized form is that obtained by converting to Unicode normalization form NFD, applying the Unicode case-folding algorithm to the result (with default handling of Turkic dotless i), and renormalizing the case-folded form to Unicode normalization form NFC. The result string, if provided, becomes the responsibility of the caller. If not NULL, it is guaranteed to be NUL-terminated.

The normalized form output by this function is suitable for comparing CIF "case-insensitive" strings for equivalence, as equivalent strings will have identical normalized forms. This accounts not only for case folding itself, but also for combining marks, including sequences thereof. It does not, however, erase distinctions between different Unicode characters that are typically rendered similarly (so-called "compatibility equivalents"), as that would constitute a semantic change.
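Written as a composition, the normalization described above and the resulting equivalence test are as follows, where fold denotes Unicode default case folding. As a worked example, the precomposed Å (U+00C5), the decomposed sequence A followed by combining ring above (U+0041 U+030A), and å (U+00E5) all normalize to U+00E5, so they compare as equivalent.

```latex
\operatorname{norm}(s) = \mathrm{NFC}\bigl(\operatorname{fold}(\mathrm{NFD}(s))\bigr),
\qquad
s \equiv t \iff \operatorname{norm}(s) = \operatorname{norm}(t)

\text{U+00C5}
\xrightarrow{\mathrm{NFD}} \text{U+0041 U+030A}
\xrightarrow{\operatorname{fold}} \text{U+0061 U+030A}
\xrightarrow{\mathrm{NFC}} \text{U+00E5}
```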

Parameters:
[in] src the Unicode string to normalize; must not be NULL
[in] srclen the maximum length of the input to normalize; if less than zero then the whole string is normalized up to the first NUL character (which otherwise does not need to be present); must not exceed the actual number of UChars in the source string
[in,out] normalized a pointer to a location to record the result; if NULL then the result is discarded, but the return code still indicates whether normalization was successful. If non-NULL, then the pointer at the specified location is overwritten, and the caller assumes responsibility for freeing the memory to which the new value points.
Returns:
CIF_OK on success, or an error code (typically CIF_ERROR) on failure
UChar* cif_u_strdup ( const UChar *  str  ) 

Creates and returns a duplicate of a Unicode string.

It is sometimes useful to duplicate a Unicode string, but ICU does not provide an analog of strdup() for that purpose. The CIF API therefore provides its own, and makes it available for general use.

Behavior is undefined if the argument is not terminated by a NUL (Unicode) character.

Parameters:
[in] str the NUL-terminated Unicode string to duplicate; the caller retains ownership of this object
Returns:
Returns a pointer to the duplicate, or NULL on failure or if the argument is NULL. Responsibility for the duplicate, if any, belongs to the caller.
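A strdup() analog for 16-bit Unicode strings needs only to count code units up to the terminating NUL and copy them, including the terminator. The sketch below uses a hypothetical `sketch_uchar` typedef in place of ICU's `UChar` so that it compiles without ICU; it mirrors the documented NULL-argument behavior but is not the CIF API's source. With ICU available, `u_strlen()` could replace the counting loop.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t sketch_uchar;  /* stand-in for ICU's UChar */

/* Hypothetical analog of cif_u_strdup() for NUL-terminated 16-bit strings. */
sketch_uchar *sketch_u_strdup(const sketch_uchar *str) {
    if (str == NULL) return NULL;               /* NULL in, NULL out */

    size_t len = 0;
    while (str[len] != 0) len++;                /* count code units before the NUL */

    sketch_uchar *dup = malloc((len + 1) * sizeof *dup);
    if (dup != NULL) {
        memcpy(dup, str, (len + 1) * sizeof *dup);  /* copy including the NUL */
    }
    return dup;                                 /* caller owns the copy; NULL on failure */
}
```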

Copyright 2014, 2015 John C. Bollinger