The TDEStringMatcher class provides string matching against a list of one
or more match patterns along with associated options. A single pattern with
its associated options will be referred to herein as a "match specification".
Current match specification options include:

* Type of match pattern:

    REGEX: Pattern is a regular expression.

    WILDCARD: Pattern is a wildcard expression like those used in POSIX
    shell file globbing.

    SUBSTRING: Pattern is a simple substring that matches any string in
    which it occurs. Substring characters do not have any other meaning
    that controls matching.

* Alphanumeric character handling in a pattern:

    NONE: Each unescaped alphanumeric character in a pattern is distinct
    and will match only itself.

    CASE INSENSITIVE: Each unescaped letter in a pattern will match its
    lower and upper case variants.

    EQUIVALENCE: Each unescaped variant of an alphanumeric character will
    match all stylistic and accented variations of that character.

* Desired outcome of matching:

    TRUE: Match succeeds if a string matches the match pattern.

    FALSE: Match succeeds if a string does NOT match the match pattern.

Applications may set and get match specification lists either directly or
indirectly (using an encoded match specifications string).

The matching functions provided are:

    matchAny(): strings match if they match any pattern in the list.

    matchAll(): strings match only if they match all patterns in the list.

MATCH SPECIFICATIONS STRING

The TDEStringMatcher class provides applications an encoded match
specifications string solely intended to be used for storing and
retrieving match specifications. These strings are formatted as follows:

    OptionString PatternString [ OptionString PatternString ...]

Option strings may contain only the following characters:

    'r' - Match pattern is a regular expression [default]
    'w' - Match pattern is a wildcard expression
    's' - Match pattern is a simple substring
    'c' - Letter case variants are distinct (i.e. case-sensitive) [default]
    'i' - Letter case variants are equivalent (i.e. case-insensitive)
    'e' - All letter & number character variants are equivalent
    '=' - Match succeeds if pattern matches [default]
    '!' - Match succeeds if pattern does NOT match (inverted match)

Option strings should ideally contain exactly 3 characters indicating match
pattern type, alphanumeric character handling, and desired outcome of
matching. Specifying fewer option characters is possible but may result in
unexpected inferred values. Specifying additional and possibly
contradictory option characters is also possible, with later characters
overriding earlier ones.

Pattern strings may not be empty. Invalid pattern strings will cause the
entire match specifications string to be rejected.

Match specifications strings that are stored in TDE configuration files
will be modified as follows:

    '\' characters in the original pattern are encoded as '\\'
    The separator between fields is encoded as '\t'

Using file name matching as an example, the match specifications string:

    wc= .* rc= ~$ se! e ri= ^a.+\.[0-9]+$

encoded in a TDE configuration file as:

    wc=\t.*\trc=\t~$\tse!\te\tri=\t^a.+\\.[0-9]+$

will match file names as follows:

* All "dotfiles" would be matched with wildcard matching.

* All file names ending with '~' (e.g. kwrite backup names) would be
  matched with case-sensitive regex matching.

* All file names that do NOT contain an equivalent variant of the letter
  'e' (e.g. 'e','ê','Ě','E') would be matched with substring matching.

* All file names starting with the letter 'a' or 'A' and ending with '.'
  followed by one or more numeric digits would be matched with
  case-insensitive regex matching.

IMPLEMENTATION NOTES:

* Regular expressions are currently supported by TQRegExp and are thereby
  subject to its limitations and bugs. This may be changed in the future
  (e.g. direct access to PCRE2, porting of Qt 5.x QRegularExpression).

* Wildcard pattern matching on GLIBC systems is done using the fnmatch
  function with GNU extended patterns supported. Consult the fnmatch(3)
  and glob(7) manual pages for more information. On non-GLIBC systems,
  basic (not extended) wildcard patterns are converted to basic regular
  expressions and processed by the underlying regular expression engine.

* Simple substrings are also supported as match patterns. These are
  currently processed by the TQString.find() function. In the future,
  these may be converted and processed by the underlying regex engine,
  depending on the tradeoff between code simplification and efficiency.

* Alphanumeric equivalence is conceptually similar to [=x=] POSIX
  equivalence class bracket expressions (which are not supported) but is
  intended to apply globally in patterns. The following caveats apply
  when this option is used:

    - There is potentially significant overhead because match patterns
      and match strings must be converted prior to matching. Conversion
      requires character-by-character lookup and replacement using a
      pre-built table.

    - The table contains equivalents for [0-9A-Z] which should work well
      for Latin-derived languages. It also contains support for other
      numeric and non-Latin letter characters, whose efficacy is less
      certain.

    - Due to the 16-bit size limitation of TQChar, the table does not
      contain mappings for codepoints greater than U+FFFF.
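
The match specifications string format described above can be illustrated
with a small standalone sketch. It is not the TDEStringMatcher
implementation: the names MatchSpec, PatternType, CharHandling,
decodeConfigValue(), applyOptions(), parseSpecs() and demo() are all
hypothetical, and plain std::string is used in place of TQString. The
sketch assumes that the field separator is a tab character (inferred from
its '\t' encoding), that omitted option characters fall back to the
documented defaults, and that an unknown option character causes
rejection; the real class may behave differently on those points.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the TDEStringMatcher API.
enum class PatternType { Regex, Wildcard, Substring };
enum class CharHandling { None, CaseInsensitive, Equivalence };

struct MatchSpec {
    PatternType type = PatternType::Regex;       // 'r' [default]
    CharHandling handling = CharHandling::None;  // 'c' [default]
    bool expectMatch = true;                     // '=' [default]
    std::string pattern;
};

// Undo the configuration-file encoding: "\\" becomes '\', "\t" becomes a tab.
std::string decodeConfigValue(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            char next = in[++i];
            if (next == 't') out += '\t';
            else if (next == '\\') out += '\\';
            else { out += '\\'; out += next; }  // unknown escape: keep as-is
        } else {
            out += in[i];
        }
    }
    return out;
}

// Apply option characters left to right, so later ones override earlier ones.
// Assumption: any character outside the documented set rejects the string.
bool applyOptions(const std::string& options, MatchSpec& spec) {
    for (char c : options) {
        switch (c) {
            case 'r': spec.type = PatternType::Regex; break;
            case 'w': spec.type = PatternType::Wildcard; break;
            case 's': spec.type = PatternType::Substring; break;
            case 'c': spec.handling = CharHandling::None; break;
            case 'i': spec.handling = CharHandling::CaseInsensitive; break;
            case 'e': spec.handling = CharHandling::Equivalence; break;
            case '=': spec.expectMatch = true; break;
            case '!': spec.expectMatch = false; break;
            default: return false;
        }
    }
    return true;
}

// Split a decoded string on tabs (assumed separator) into option/pattern
// pairs. Returns false if the whole string must be rejected.
bool parseSpecs(const std::string& text, std::vector<MatchSpec>& out) {
    out.clear();
    std::vector<std::string> fields;
    std::string current;
    for (char c : text) {
        if (c == '\t') { fields.push_back(current); current.clear(); }
        else current += c;
    }
    fields.push_back(current);
    if (fields.size() % 2 != 0) return false;       // unpaired option string
    for (std::size_t i = 0; i < fields.size(); i += 2) {
        MatchSpec spec;
        if (!applyOptions(fields[i], spec)) return false;
        if (fields[i + 1].empty()) return false;    // patterns may not be empty
        spec.pattern = fields[i + 1];
        out.push_back(spec);
    }
    return true;
}

// Usage sketch: decode and parse the configuration-file example from the text.
void demo() {
    std::vector<MatchSpec> specs;
    bool ok = parseSpecs(
        decodeConfigValue(R"(wc=\t.*\trc=\t~$\tse!\te\tri=\t^a.+\\.[0-9]+$)"),
        specs);
    assert(ok && specs.size() == 4);
    assert(specs[2].handling == CharHandling::Equivalence);
    assert(!specs[2].expectMatch);
    assert(specs[3].pattern == R"(^a.+\.[0-9]+$)");
}
```

The sketch only decodes and validates the specification list. Evaluating
each pattern (regex, wildcard or substring) and combining the results as
matchAny() or matchAll() would do is left out on purpose, since those
steps depend on TQRegExp, fnmatch and the equivalence table described in
the notes above.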