The TDEStringMatcher class provides string matching against a list of one
or more match patterns along with associated options. A single pattern with
its associated options will be referred to herein as a "match specification".
Current match specification options include:

* Type of match pattern:

    REGEX: Pattern is a regular expression whose syntax is currently
    limited to that supported by the TQRegExp class.

    WILDCARD: Pattern is a wildcard expression of the kind used in POSIX
    shell file globbing.

    SUBSTRING: Pattern is a simple substring that matches any string in
    which it occurs. Substring characters have no special meaning that
    controls matching.

* Alphanumeric character handling in a pattern:

    NONE: Each unescaped alphanumeric character in a pattern is distinct
    and will match only itself.

    CASE INSENSITIVE: Each unescaped letter in a pattern will match its
    lower and upper case variants.

    EQUIVALENCE: Each unescaped alphanumeric character in a pattern will
    match all stylistic and accented variations of that character.

* Desired outcome of matching:

    TRUE: Match succeeds if a string matches the match pattern.

    FALSE: Match fails if a string matches the match pattern.

A list of match specifications may be codified in a string formatted as a
vertical tab (VT) separated list of substrings as follows:

    OptionString PatternString [ OptionString PatternString ...]

Non-empty option strings may contain only the following characters:

    'r' - Match pattern is a regular expression [default]
    'w' - Match pattern is a wildcard expression
    's' - Match pattern is a simple substring
    'c' - Letter case variants are distinct (i.e. case-sensitive) [default]
    'i' - Letter case variants are equivalent (i.e. case-insensitive)
    'e' - All letter & number character variants are equivalent
    '=' - Match succeeds if pattern matches [default]
    '!' - Match fails if pattern matches (inverted match)

Options set in an option string remain in effect until subsequently
overridden.
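
The string format described above can be illustrated with a small parser.
The following is only a sketch under stated assumptions, not the class's
actual implementation: it uses std::string instead of TQString, and the
names MatchSpec, PatternType, AnCharHandling, splitOnVT and
parseMatchSpecString are hypothetical. It shows how options carry forward
from one specification to the next and how an empty option string still
occupies a field.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the TDEStringMatcher API.
enum class PatternType { Regex, Wildcard, Substring };
enum class AnCharHandling { None, CaseInsensitive, Equivalence };

struct MatchSpec {
    PatternType type = PatternType::Regex;           // 'r' is the default
    AnCharHandling handling = AnCharHandling::None;  // 'c' is the default
    bool expectMatch = true;                         // '=' is the default
    std::string pattern;
};

// Split on VT ('\v'), keeping empty fields so that an empty option
// string stays paired with the pattern that follows it.
static std::vector<std::string> splitOnVT(const std::string &s) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = s.find('\v', start);
        if (pos == std::string::npos) {
            fields.push_back(s.substr(start));
            return fields;
        }
        fields.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// Parse "OptionString VT PatternString [VT OptionString VT PatternString ...]".
// Returns false (leaving 'out' untouched) on an odd field count, an unknown
// option character, or an empty pattern. Patterns are stored verbatim.
bool parseMatchSpecString(const std::string &s, std::vector<MatchSpec> &out) {
    const std::vector<std::string> fields = splitOnVT(s);
    if (fields.size() % 2 != 0) return false;

    std::vector<MatchSpec> specs;
    MatchSpec current;  // options persist until overridden
    for (std::size_t i = 0; i < fields.size(); i += 2) {
        for (char c : fields[i]) {
            switch (c) {
            case 'r': current.type = PatternType::Regex; break;
            case 'w': current.type = PatternType::Wildcard; break;
            case 's': current.type = PatternType::Substring; break;
            case 'c': current.handling = AnCharHandling::None; break;
            case 'i': current.handling = AnCharHandling::CaseInsensitive; break;
            case 'e': current.handling = AnCharHandling::Equivalence; break;
            case '=': current.expectMatch = true; break;
            case '!': current.expectMatch = false; break;
            default: return false;  // not a valid option character
            }
        }
        if (fields[i + 1].empty()) return false;  // patterns may not be empty
        current.pattern = fields[i + 1];
        specs.push_back(current);
    }
    out = specs;
    return true;
}
```

Keeping empty fields while splitting is the important design choice here:
dropping them would shift every later option string onto a pattern slot.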
While option strings may be empty, pattern strings may not be empty.
Backslash characters in pattern strings should be represented by "\\"; all
other characters should be specified literally.

The following is an example of a string representing a match specification
list intended to apply to file names:

    w .* e e* cr ~$ \\.[0-9]+

The corresponding match specification list will match as follows:

* All "dotfiles" would be matched with wildcard matching.
* All file names beginning with an equivalent variant of the letter 'e'
  (e.g. 'e','ê','Ě','E') would be matched with wildcard matching.
* All file names ending with '~' (e.g. kwrite backup names) would be
  matched with case-sensitive regex matching.
* All file names having a numeric filename suffix (e.g. wget backup files)
  would be matched with case-sensitive regex matching.

Applications may set and get match specification lists either directly or
indirectly (as a match specification list string).

The matching functions provided are:

    matchAny(): strings match if they match any pattern in the list.
    matchAll(): strings match only if they match all patterns in the list.

IMPLEMENTATION NOTES:

* Wildcard match patterns are currently limited to POSIX wildcards.
  Extended wildcard-like expressions are not currently supported (e.g.
  Bash globstar, extglob, brace expansion).
* Wildcard match patterns are currently converted to regular expressions
  and processed as such instead of using dedicated functions such as
  fnmatch(3) or glob(3). This may change in the future.
* Regular expressions are currently supported by TQRegExp and are thereby
  subject to its limitations and bugs. This may be changed in the future
  (e.g. direct access to pcre2(3), porting of Qt QRegularExpression).
* Simple substrings are also supported as match patterns. These are
  currently processed by the TQString.find() function.
  In the future, these may be converted and processed by the underlying
  regex engine, depending on the tradeoff between code simplification and
  efficiency.
* Alphanumeric equivalence is conceptually similar to [=x=] POSIX
  equivalence class bracket expressions (which are not supported) but is
  intended to apply globally in patterns. The following are caveats when
  this option is utilized:
  - There is potentially significant overhead because match patterns and
    match strings must be converted prior to matching. Conversion requires
    character-by-character lookup and replacement using a pre-built table.
  - The table contains equivalents for [0-9A-Z], which should work well for
    Latin-derived languages. It also contains support for other numeric and
    non-Latin letter characters, the efficacy of which is not as certain.
  - Due to the 16-bit size limitation of TQChar, the table does not contain
    mappings for codepoints greater than U+FFFF.
* The choice of VT as the match specification string separator was based on
  the following considerations:
  - It is a control character that is unlikely to occur in a pattern. If it
    is desired to match the VT character, then that can be done in a
    regular expression by specifying '\v'.
  - Unlike the potentially more readable HT, TDEConfig will not attempt to
    escape it when storing strings containing it.
  - Unlike with other control characters, files containing text with that
    character are less likely to be misidentified as binary.
  - Most common text editors (and `less`) represent that character as a
    symbol that can be copied and pasted.
  - Text containing that character displays predictably in a terminal.
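
The implementation notes state that wildcard patterns are converted to
regular expressions rather than handed to fnmatch(3) or glob(3). The
following is a minimal sketch of what such a conversion might look like.
It is an assumption-laden illustration, not the conversion TDEStringMatcher
performs: it targets std::regex (ECMAScript grammar) instead of TQRegExp,
the names wildcardToRegex and wildcardMatch are hypothetical, and it does
not handle POSIX character classes such as [[:alpha:]] or the shell's
special treatment of leading dots and '/'.

```cpp
#include <regex>
#include <string>

// True if c is special in an ECMAScript regex outside a character class.
static bool isRegexMeta(char c) {
    return std::string(".^$+(){}|\\[]*?").find(c) != std::string::npos;
}

// Append c to re so that it matches only itself.
static void appendLiteral(std::string &re, char c) {
    if (isRegexMeta(c)) re += '\\';
    re += c;
}

// Translate a basic POSIX wildcard ('*', '?', '[...]', '[!...]', '\x')
// into an ECMAScript regular expression string.
std::string wildcardToRegex(const std::string &glob) {
    std::string re;
    const std::string::size_type n = glob.size();
    for (std::string::size_type i = 0; i < n; ++i) {
        const char c = glob[i];
        if (c == '*') {
            re += ".*";                       // any run of characters
        } else if (c == '?') {
            re += '.';                        // exactly one character
        } else if (c == '\\' && i + 1 < n) {
            appendLiteral(re, glob[++i]);     // backslash quotes the next char
        } else if (c == '[') {
            std::string::size_type first = i + 1;
            const bool negated = first < n && glob[first] == '!';
            if (negated) ++first;
            // A ']' directly after '[' or '[!' is a literal set member.
            std::string::size_type close = first;
            if (close < n && glob[close] == ']') ++close;
            while (close < n && glob[close] != ']') ++close;
            if (close >= n) {                 // no closing ']': literal '['
                re += "\\[";
                continue;
            }
            re += negated ? "[^" : "[";
            for (std::string::size_type k = first; k < close; ++k) {
                const char m = glob[k];
                // Simplification: a backslash inside a set is kept literal.
                if (m == '\\' || m == ']' || m == '[' || m == '^') re += '\\';
                re += m;                      // '-' is kept so ranges work
            }
            re += ']';
            i = close;
        } else {
            appendLiteral(re, c);
        }
    }
    return re;
}

// Match the whole of text against the wildcard. std::regex may throw
// std::regex_error for malformed sets such as a reversed range.
bool wildcardMatch(const std::string &glob, const std::string &text) {
    const std::regex re(wildcardToRegex(glob));
    return std::regex_match(text, re);
}
```

With this sketch, the wildcard ".*" becomes the regular expression "\..*",
so it matches ".bashrc" but not "bashrc".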