From 46fedcbe96ed7aff1bbebf468da28222cd74dac6 Mon Sep 17 00:00:00 2001 From: Vincent Reher Date: Sun, 15 Jan 2023 17:44:39 -0800 Subject: [PATCH] Introduced an implementation-agnostic interface to TSM centered around a structure describing a match pattern and associated options such as: - Pattern type: regex, wildcard (see Note 1), and substring (new). - Alphanumeric character handling: case-sensitive, case-insensitive, equivalent character matching (new - see Note 2). - Match goal: "matching" vs. "inverted matching" (new). Introduced additional "get*" and "set*" functions to accommodate the new list-of-structures interfaces to TSM. Incorporated most of MichelleC's feedback through 2022-12-04 on previously committed code. Other code cleanup. See file tdecore/README.tdestringmatcher for additional information. Note 1: The implementation of wildcard support no longer uses the TQRegExp.setWildcard() function for regex conversion or the TQRegExp.exactMatch() function for matching. Note 2: Full implementation of equivalent character matching is postponed to a future commit. Signed-off-by: Vincent Reher --- tdecore/README.tdestringmatcher | 160 ++++++--- tdecore/tdeglobal.cpp | 5 +- tdecore/tdestringmatcher.cpp | 555 ++++++++++++++++++++++++++------ tdecore/tdestringmatcher.h | 103 +++++- tdeio/tdeio/tdefileitem.cpp | 48 ++- tdeio/tdeio/tdefileitem.h | 26 +- 6 files changed, 685 insertions(+), 212 deletions(-) mode change 100644 => 100755 tdecore/tdestringmatcher.cpp diff --git a/tdecore/README.tdestringmatcher b/tdecore/README.tdestringmatcher index b5312d17d..5b2baf837 100644 --- a/tdecore/README.tdestringmatcher +++ b/tdecore/README.tdestringmatcher @@ -1,63 +1,131 @@ -The TDEStringMatcher class provides string matching against a list of -one or more TQRegExp objects. The matching functions currently provided -are: +The TDEStringMatcher class provides string matching against a list of one +or more match patterns along with associated options. 
A single pattern with +its associated options will be referred to herein as a "match specification". - matchAny(): strings match if they match any TQRegExp object in list. - matchAll(): strings match only if the match all TQRegExp objects in list. +Current match specification options include: -The list of TQRegExp objects is constructed using a specially formatted string -that is passed by applications via the generatePatternList() function. This -string is referred to as the "patternString" and is formatted as follows: + * Type of match pattern: - * The first character of the patternString defines the character that - is used to split the remainder of the string into match specification - strings. It is recommended (but not required) that this be a character - that does not occur in a match pattern (see below). + REGEX: Pattern is a regular expression whose syntax is + currently limited to that supported by the TQRegExp class. + WILDCARD: Pattern is a wildcard expression used in POSIX + shell file globbing. + SUBSTRING: Pattern is a simple substring that matches any + string in which it occurs. Substring characters do not + have any other meaning that controls matching. - * Each match specification string consists of an initial character that - designates the match specification type followed by the corresponding - match specification itself. + * Alphanumeric character handling in a pattern: - * Match specification strings starting with 'o' are followed by zero or - or more letters, each of which are interpreted as setting or unsetting - a match option. Match options remain in effect until explicitly overridden - by subsequent match options. The match option letters that are currently - recognized and processed are: + NONE: each unescaped alphanumeric character in a pattern + is distinct and will match only itself. + CASE INSENSITIVE: each unescaped letter in a pattern + will match its lower and upper case variants. 
+ EQUIVALENCE: each unescaped variant of an alphanumeric + character will match all stylistic and accented + variations of that character. - 'w' - Match patterns are to be interpreted as "wildcards" - 'r' - Match patterns are to be interpreted as "regexes" (TQRegExp default) - 'c' - Matching will be case-sensitive (TQRegExp default) - 'c' - Matching will be case-INsensitive - 'm' - Regex matching will be "minimal" versus "greedy" - 'g' - Regex matching will be "greedy" (TQRegExp default) + * Desired outcome of matching: - * Match specification strings starting with 'p' are followed by one or - more characters that constitute a match pattern. Each match pattern - together with the match options in effect are used to construct - a single TQRegExp object to be used for string matching. This object - will be validated before acceptance. + TRUE: match succeeds if a string matches the match pattern. + FALSE: match fails if a string matches the match pattern. - * Match specification strings starting with any other character are ignored. +A list of match specifications may be codified in a string formatted as +a vertical tab (VT) separated list of substrings as follows: -Some examples of patternString: + OptionString PatternString [ OptionString PatternString ...] - * |ow|p.*|p*~ - Dotfiles and kwrite backup files will be matched via wildcards +Non-empty option strings may contain only the following characters: - * /or/p[.][0-9]+$/ow/p.* - Files with a version number suffix will be matched via regex - and dotfiles will be matched via wildcard + 'r' - Match pattern is a regular expression [default] + 'w' - Match pattern is a wildcard expression + 's' - Match pattern is a simple substring + 'c' - Letter case variants are distinct (i.e. case-sensitive) [default] + 'i' - Letter case variants are equivalent (i.e. case-insensitive) + 'e' - All letter & number character variants are equivalent + '=' - Match succeeds if pattern matches [default] + '!'
- Match fails if pattern matches (inverted match) -Current and potential use of the TDEStringMatcher class include: +Options set in an option string remain in effect until subsequently overridden. - * Expansion of definition of "hidden" files, addressing issue # 270. +While option strings may be empty, pattern strings may not be empty. Backslash +characters in pattern strings should be represented by "\\"; all other characters +should be specified literally. - * Creation of a subclass that provides string parsing functions. +The following is an example of a string representing a match specification list +intended to apply to file names: -Miscellaneous implementation notes: + w .* e e* cr ~$ \\.[0-9]+ - * It is the responsibility of applications that use this class to provide - patternString editing (perhaps via UI), storage, and retrieval. +The corresponding match specification list will match as follows: - * Regex patterns use TQRegExp::search for matching but wildcard patterns - must use TQRegExp::exactMatch in order for matching to work correctly. + * All "dotfiles" would be matched with wildcard matching. + * All file names beginning with an equivalent variant of the letter 'e' + (e.g. 'e','ê','Ě','E') would be matched with wildcard matching. + * All file names ending with '~' (e.g. kwrite backup names) would be + matched with case-sensitive regex matching. + * All file names containing a '.' followed by digits (e.g. wget backup + files) would be matched with case-sensitive regex matching. + +Applications may set and get match specification lists either directly or +indirectly (as a match specification list string). The matching functions +provided are: + + matchAny(): strings match if they match any pattern in the list. + matchAll(): strings match only if they match all patterns in the list. + +IMPLEMENTATION NOTES: + + * Wildcard match patterns are currently limited to POSIX wildcards. + Extended wildcard-like expressions are not currently supported + (e.g.
Bash globstar, extglob, brace expansion). + + * Wildcard match patterns are currently converted to regular + expressions and processed as such instead of using dedicated + functions such as fnmatch(3) or glob(3). This may change in + the future. + + * Regular expressions are currently supported by TQRegExp and are + thereby subject to its limitations and bugs. This may change + in the future (e.g. direct access to pcre2(3), porting of Qt + QRegularExpression). + + * Simple substrings are also supported as match patterns. These are + currently processed by the TQString.find() function. In the future, + these may be converted and processed by the underlying regex engine, + depending on the tradeoff between code simplification and efficiency. + + * Alphanumeric equivalence is conceptually similar to [=x=] POSIX + equivalence class bracket expressions (which are not supported) + but is intended to apply globally in patterns. The following + are caveats when this option is utilized: + + - There is potentially significant overhead due to the fact that + match patterns and match strings must be converted prior to + matching. Conversion requires character-by-character lookup + and replacement using a pre-built table. + + - The table contains equivalents for [0-9A-Z], which should work + well for Latin-derived languages. It also contains support for + other numeric and non-Latin letter characters, the efficacy of + which is less certain. + + - Due to the 16-bit size limitation of TQChar, the table does not + contain mappings for code points greater than U+FFFF. + + * The choice of VT as the match specification string separator was + based on the following considerations: + + - It is a control character that is unlikely to occur in a pattern. + If it is desired to match the VT character, then that can be done + in a regular expression by specifying '\v'.
+ + - Unlike the potentially more readable HT, TDEConfig will not attempt + to escape it when storing strings containing it. + + - Files containing text with that character are less likely to be + misidentified as binary than files containing other control characters. + + - Most common text editors (and `less`) represent that character as + a symbol that can be copied and pasted. + + - Text containing that character displays predictably in a terminal. diff --git a/tdecore/tdeglobal.cpp b/tdecore/tdeglobal.cpp index 6eca24169..bb65b7743 100644 --- a/tdecore/tdeglobal.cpp +++ b/tdecore/tdeglobal.cpp @@ -138,9 +138,10 @@ TDEStringMatcher *TDEGlobal::hiddenFileMatcher() TSMTRACE << "TDEGlobal::hiddenFileMatcher(): Global HFM initialization STARTED" << endl; _hiddenFileMatcher = new TDEStringMatcher(); TDEGlobal::config()->setGroup( "General" ); - TQString settings = TDEGlobal::config()->readEntry( "globalHiddenFileSpec", "/oW/.*" ); + char settingsDefault[5] = { 'w', SEP, '.', '*', 0 }; // wildcard match of dotfiles + TQString settings = TDEGlobal::config()->readEntry( "globalHiddenFileSpec", settingsDefault ); TSMTRACE << "TDEGlobal::hiddenFileMatcher(): using retrieved patternString = '" << settings << "'" << endl; - _hiddenFileMatcher->generatePatternList( settings ); + _hiddenFileMatcher->setMatchSpecs( settings ); } return _hiddenFileMatcher; diff --git a/tdecore/tdestringmatcher.cpp b/tdecore/tdestringmatcher.cpp old mode 100644 new mode 100755 index 74e0cd7af..61624245a --- a/tdecore/tdestringmatcher.cpp +++ b/tdecore/tdestringmatcher.cpp @@ -3,167 +3,512 @@ #include #include +typedef TQValueVector<TQRegExp> RegexList; + class TDEStringMatcher::TDEStringMatcherPrivate { public: - TQString patternString; -}; // FIXME: This may be too small to warrant a private class :\ + + // Properties that may be set / accessed through the TSM interface + TQString m_matchSpecString; + MatchSpecList m_matchSpecList; + + // Properties used only by the implementation + RegexList m_regexList; + /* Individual
TQRegExp objects are not needed for matching when + a PatternType doesn't require a regex engine, + but we may "borrow" the TQRegExp.pattern() field to store + a "converted" version of the pattern. + */ +}; TDEStringMatcher::TDEStringMatcher() { - p = new TDEStringMatcherPrivate; TSMTRACE << "TDEStringMatcher::TDEStringMatcher: New instance created: " << this << endl; + p = new TDEStringMatcherPrivate; } TDEStringMatcher::~TDEStringMatcher() { - patternList.setAutoDelete( true ); - patternList.clear(); + p->m_matchSpecList.clear(); + p->m_regexList.clear(); delete p; TSMTRACE << "TDEStringMatcher::TDEStringMatcher: Instance destroyed: " << this << endl; } -TQString TDEStringMatcher::getPatternString() +//================================================================================================ +// Match specification output functions +//================================================================================================ + +TQString TDEStringMatcher::getMatchSpecString() { - return p->patternString; + return p->m_matchSpecString; } -bool TDEStringMatcher::generatePatternList( TQString newPatternString ) +MatchSpecList TDEStringMatcher::getMatchSpecs() { - if ( newPatternString == p->patternString ) + return p->m_matchSpecList; +} + + +//================================================================================================ +// Match specification input functions +//================================================================================================ + +bool TDEStringMatcher::setMatchSpecs( MatchSpecList newMatchSpecList ) +{ + + RegexList newRegexList; + + TQString optionString = "rc" ; // start with defaults + TQStringList newMatchSpecs; + + TQRegExp rxWork; + + TSMTRACE << "TDEStringMatcher::setPatterns: validating match specification list" << endl; + + for ( MatchSpec matchSpec : newMatchSpecList ) { + + if ( matchSpec.pattern.isEmpty() ) { + TSMTRACE << " Error: empty pattern!"
<< endl; + newRegexList.clear(); + return false; + } + if ( matchSpec.pattern.find( TQChar(SEP) ) >= 0 ) { + TSMTRACE << " Error: pattern contains reserved separator character" << endl; + newRegexList.clear(); + return false; + } + + switch ( matchSpec.patternType ) { + + // The following pattern types will be using TQRegExp functions for matching + case PatternType::REGEX : + optionString += TQChar('r'); + rxWork.setPattern( matchSpec.pattern ); + break; + case PatternType::WILDCARD : + optionString += TQChar('w'); + rxWork.setPattern( wildcardToRegex( matchSpec.pattern ) ); + break; + + // The following pattern types will be using TQString functions for matching + case PatternType::SUBSTRING : + optionString += TQChar('s'); + rxWork.setPattern( matchSpec.pattern ); // we will "borrow" this field + break; + + default: + newRegexList.clear(); + TSMTRACE << " Error: pattern type out of range" << endl; + return false; + } + + switch ( matchSpec.ancHandling ) { + + case ANCHandling::CASE_SENSITIVE : + optionString += TQChar('c'); + rxWork.setCaseSensitive( true ); + break; + case ANCHandling::CASE_INSENSITIVE : + optionString += TQChar('i'); + rxWork.setCaseSensitive( false ); + break; + case ANCHandling::EQUIVALENCE : + optionString += TQChar('e'); + rxWork.setCaseSensitive( false ); + // FIXME TBD: This is where we will be converting each (unescaped) + // alphanumeric character in rxWork.pattern to its "least" equivalent. + break; + default: + newRegexList.clear(); + TSMTRACE << " Error: alphanumeric character handling specification out of range" << endl; + return false; + } + + if ( matchSpec.wantMatch ) + optionString += TQChar('='); + else + optionString += TQChar('!'); + // Substring patterns are not regexes, so only validate the other pattern types + if ( matchSpec.patternType != PatternType::SUBSTRING && !
rxWork.isValid() ) { + TSMTRACE << " Error: invalid pattern syntax'" << endl; + newRegexList.clear(); + return false; + } + + // This particular match specification is good + + newMatchSpecs.append( optionString ); + newMatchSpecs.append( matchSpec.pattern ); + newRegexList.append( rxWork ); + optionString = ""; + } + + // All proposed match specifications are good, update everything accordingly + + p->m_matchSpecList.clear(); p->m_matchSpecList = newMatchSpecList; + p->m_regexList.clear(); p->m_regexList = newRegexList; + p->m_matchSpecString = newMatchSpecs.join( TQChar(SEP) ); + emit patternsChanged(); + + return true; +} + +//================================================================================================= + +bool TDEStringMatcher::setMatchSpecs( TQString newMatchSpecString ) +{ + MatchSpecList newMatchSpecList; + RegexList newRegexList; + + TQRegExp rxWork; // single working copy == each pattern inherits previous options + + MatchSpec matchSpec = { + PatternType::DEFAULT, + ANCHandling::DEFAULT, + true, // seeking matches, not non-matches + "" + }; + + if ( newMatchSpecString == p->m_matchSpecString ) return true; - TSMTRACE << "TDEStringMatcher::generatePatternList: Proposed pattern string: <" << newPatternString << ">" << endl; - if ( newPatternString.length() < 2 ) { - TSMTRACE << " Input string too short to be interpreted, patterns will be cleared" << endl; - patternList.clear(); - p->patternString = "" ; -#ifdef TSMSIGNALS + TSMTRACE << "TDEStringMatcher::setPatterns: Proposed match specification string: <" << newMatchSpecString << ">" << endl; + + if ( newMatchSpecString.isEmpty() ) { + TSMTRACE << " Empty pattern string => match specifications will be cleared" << endl; + p->m_matchSpecList.clear(); + p->m_regexList.clear(); + p->m_matchSpecString = ""; emit patternsChanged(); -#endif // TSMSIGNALS return true; } - TQChar patternStringDivider = newPatternString[0]; - TSMTRACE << " patternStringDivider = '" << patternStringDivider << "'" 
<< endl; - TQStringList specList = TQStringList::split( patternStringDivider, newPatternString.mid(1), true ); - - TQRegExp rxWork; - TQPtrList rxPatternList; - - for ( const TQString &specification : specList ) { - TSMTRACE << " Processing specification string: '" << specification << "'" << endl; - TQChar specificationType = specification[0].lower(); - switch ( specificationType ) { - case 'o' : { - TQString optionString = specification.mid(1).lower(); - TSMTRACE << " Processing match option string: '" << optionString << "'" << endl; - for ( int i = 0 ; i < optionString.length() ; i++ ) { - TQChar optionChar = optionString[i]; - TSMTRACE << " Option character: '" << optionChar << "'" << endl; - switch ( optionChar ) { - case 'w' : rxWork.setWildcard( true ); break; - case 'r' : rxWork.setWildcard( false ); break; - case 'c' : rxWork.setCaseSensitive( true ); break; - case 'i' : rxWork.setCaseSensitive( false ); break; - case 'm' : rxWork.setMinimal( true ); break; - case 'g' : rxWork.setMinimal( false ); break; - default: break; - } - } - TSMTRACE << " Wildcard/CaseSensitive settings: " << rxWork.wildcard() << "/" << rxWork.caseSensitive() << endl; + TQStringList newMatchSpecs = TQStringList::split( SEP, newMatchSpecString, true ); + + if ( newMatchSpecs.count() % 2 != 0 ) { + TSMTRACE << " Error: match specification string must contain an even number of components" << endl; + return false; + } + TSMTRACE << newMatchSpecs.count() << endl; + + bool processingPattern = false; // expected format: option string , pattern string, ... 
+ + for ( TQString &specification : newMatchSpecs ) { + + if ( specification.find( TQChar(SEP) ) >= 0 ) { + TSMTRACE << " Error: match specification string contains reserved separator character" << endl; + newMatchSpecList.clear(); + newRegexList.clear(); + return false; + } + + if ( processingPattern ) { + TSMTRACE << " Processing match pattern string: '" << specification << "'" << endl; + + if ( specification.isEmpty() ) { + TSMTRACE << " Error: empty patterns are not allowed" << endl; + newMatchSpecList.clear(); + newRegexList.clear(); + return false; } - break; - case 'p' : { - TQString pattern = specification.mid(1); - TSMTRACE << " Processing match pattern: '" << pattern << "'" << endl; - if ( pattern.isEmpty() ) { - TSMTRACE << " Empty patterns are not allowed" << endl; - rxPatternList.clear(); - return false; - } - rxWork.setPattern( pattern ); - if (! rxWork.isValid() ) { - TSMTRACE << " Invalid pattern" << endl; - rxPatternList.clear(); - return false; - } - TQRegExp *rxPattern = new TQRegExp( rxWork ); - rxPatternList.append( rxPattern ); + // Prepare regex + + switch ( matchSpec.patternType ) { + + // The following pattern types will be using TQRegExp functions for matching + case PatternType::REGEX : + rxWork.setPattern( specification ); + break; + case PatternType::WILDCARD : + rxWork.setPattern( wildcardToRegex( specification ) ); + break; + + // The following pattern types will be using TQString functions for matching + case PatternType::SUBSTRING : + rxWork.setPattern( specification ); // used for storage only + break; + + default: + continue; // should not arise } - break; - default : - TSMTRACE << " Ignoring unknown specification type '" << specificationType << "'" << endl; - //-Relax, don't overreact: rxPatternList.clear(); - //-Relax, don't overreact: return false; - break; + switch ( matchSpec.ancHandling ) { + case ANCHandling::CASE_SENSITIVE : + rxWork.setCaseSensitive( true ); + break; + case ANCHandling::CASE_INSENSITIVE : + 
rxWork.setCaseSensitive( false ); + break; + case ANCHandling::EQUIVALENCE : + rxWork.setCaseSensitive( false ); + // FIXME TBD: This is where we will be converting each (unescaped) + // alphanumeric character in rxWork.pattern to its "least" equivalent. + break; + default: + continue; // should not arise + } + + // Test regex (substring patterns are not regexes + // and therefore need no validation) + + if ( matchSpec.patternType != PatternType::SUBSTRING && ! rxWork.isValid() ) { + TSMTRACE << " Error: invalid pattern syntax" << endl; + newMatchSpecList.clear(); + newRegexList.clear(); + return false; + } + + // if (! rxWork.isReallyWhatUserIntended() ) { HA HA + + TSMTRACE << " Final CaseSensitive setting: " << rxWork.caseSensitive() << endl; + + matchSpec.pattern = specification; + newMatchSpecList.push_back( matchSpec ); + newRegexList.append( rxWork ); + processingPattern = false; // next spec should be an option string + continue; + } + + specification = specification.lower(); + TSMTRACE << " Processing match option string: '" << specification << "'" << endl; + for ( int i = 0 ; i < specification.length() ; i++ ) { + TQChar optionChar = specification[i]; + TSMTRACE << " Option character: '" << optionChar << "'" << endl; + + switch ( optionChar ) { + case 'r' : matchSpec.patternType = PatternType::REGEX ; break; + case 'w' : matchSpec.patternType = PatternType::WILDCARD ; break; + case 's' : matchSpec.patternType = PatternType::SUBSTRING ; break; + case 'c' : matchSpec.ancHandling = ANCHandling::CASE_SENSITIVE ; break; + case 'i' : matchSpec.ancHandling = ANCHandling::CASE_INSENSITIVE; break; + case 'e' : matchSpec.ancHandling = ANCHandling::EQUIVALENCE ; break; + case '=' : matchSpec.wantMatch = true ; break; + case '!' : matchSpec.wantMatch = false ; break; + default: + // We reserve ALL other possible option characters for future use!
+ TSMTRACE << " Error: invalid option character" << endl; + return false; + } } + + processingPattern = true; // next spec should be a pattern string } - // patternList.clear(); // no need to do this? - patternList.setAutoDelete( true ); - patternList = rxPatternList; - p->patternString = newPatternString; - // rxPatternList.clear(); // no need to do this? + p->m_matchSpecList.clear(); p->m_matchSpecList = newMatchSpecList; + p->m_regexList.clear(); p->m_regexList = newRegexList; + p->m_matchSpecString = newMatchSpecString; - TSMTRACE << " Final patternString: '" << p->patternString << "'" << endl; - TSMTRACE << " Number of regex match patterns in list: '" << patternList.count() << "'" << endl; + //newRegexList.clear(); // no need to do this? -#ifdef TSMSIGNALS + TSMTRACE << " Final patternString: '" << p->m_matchSpecString << "'" << endl; + TSMTRACE << " Number of regex match patterns in list: '" << p->m_regexList.count() << "'" << endl; TSMTRACE << " Notifying slots of pattern change" << endl; emit patternsChanged(); TSMTRACE << " All slots have been notified" << endl; -#endif // TSMSIGNALS - TSMTRACE << "TDEStringMatcher::generatePatternList: Patterns were successfully regenerated" << endl << endl; + TSMTRACE << "TDEStringMatcher::setPatterns: Patterns were successfully regenerated" << endl << endl; return true; } +//================================================================================================ +// Match functions +//================================================================================================ + bool TDEStringMatcher::matchAny( const TQString& stringToMatch ) { - //-Debug: TSMTRACE << "Attempting to match string '" << stringToMatch << "' against stored patterns" << endl; - for ( const TQRegExp *rxPattern : patternList ) { - if ( - ( rxPattern->wildcard() && rxPattern->exactMatch( stringToMatch ) ) || - ( ! 
rxPattern->wildcard() && rxPattern->search( stringToMatch ) >= 0) - ) + TSMTRACE << "Attempting to match string '" << stringToMatch << "' against stored patterns" << endl; + if ( p->m_matchSpecList.isEmpty() ) { + //-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl; + return false; //FIXME: or should that be true per MicheleC's comment? + } + + TQString equivalentString; + + for ( size_t index = 0 ; index < p->m_matchSpecList.count() ; index++ ) + { + TQString matchThis = stringToMatch; + if ( p->m_matchSpecList[index].ancHandling == ANCHandling::EQUIVALENCE ) { - //-Debug: TSMTRACE << "String matched pattern: '" << rxPattern->pattern() << "'" << endl; - return true; + if ( equivalentString.isNull() ) { + // FIXME TBD: This is where we will be converting each alphanumeric + // character in stringToMatch to its "least" equivalent and storing + // the result in equivalentString. Until then, we'll just do: + equivalentString = stringToMatch; + } + matchThis = equivalentString; + } + + switch ( p->m_matchSpecList[index].patternType ) { + + case PatternType::REGEX : + case PatternType::WILDCARD : + if ( + ( p->m_regexList[index].search( matchThis ) >= 0 ) // was there a match? + == p->m_matchSpecList[index].wantMatch // is that what we were looking for? + ) { + TSMTRACE << "Match succeeded with regex pattern: '" << p->m_regexList[index].pattern() << "'" << endl; + return true; + } + break; + + case PatternType::SUBSTRING : + bool cs = ! (bool) p->m_matchSpecList[index].ancHandling; + if ( + ( matchThis.find( p->m_matchSpecList[index].pattern, 0, cs ) >= 0 ) // was there a match? + == p->m_matchSpecList[index].wantMatch // is that what we were looking for? + ) { + TSMTRACE << "Match succeeded with substring: '" << p->m_matchSpecList[index].pattern << "'" << endl; + return true; + } + break; } - } - if ( patternList.isEmpty() ) { - //-Debug: TSMTRACE << "Match failed on empty pattern list!" 
<< endl; - return false; - } - else { - //-Debug: TSMTRACE << "Match failed, no pattern matched!" << endl; - return false; } + //-Debug: TSMTRACE << "Match failed, no pattern matched!" << endl; + return false ; } bool TDEStringMatcher::matchAll( const TQString& stringToMatch ) { - //-Debug: TSMTRACE << "Attempting to match string '" << stringToMatch << "' against ALL stored patterns" << endl; - for ( const TQRegExp *rxPattern : patternList ) { - if ( ! - ( rxPattern->wildcard() && rxPattern->exactMatch( stringToMatch ) ) || - ( ! rxPattern->wildcard() && rxPattern->search( stringToMatch ) >= 0) - ) + //-Debug: TSMTRACE << "Attempting to match string '" << stringToMatch << "' against stored patterns" << endl; + if ( p->m_matchSpecList.isEmpty() ) { + //-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl; + return false; //FIXME: or should that be true per MicheleC's comment? + } + + TQString equivalentString; + + for ( size_t index = 0 ; index < p->m_matchSpecList.count() ; index++ ) + { + TQString matchThis = stringToMatch; + if ( p->m_matchSpecList[index].ancHandling == ANCHandling::EQUIVALENCE ) { + if ( equivalentString.isNull() ) { + // FIXME TBD: This is where we will be converting each alphanumeric + // character in stringToMatch to its "least" equivalent and storing + // the result in equivalentString. Until then, we'll just do: + equivalentString = stringToMatch; + } + matchThis = equivalentString; + } + + // Fail unless this pattern gives the outcome we were looking for (substrings use TQString::find()) + bool matched; + if ( p->m_matchSpecList[index].patternType == PatternType::SUBSTRING ) + matched = matchThis.find( p->m_matchSpecList[index].pattern, 0, + p->m_matchSpecList[index].ancHandling == ANCHandling::CASE_SENSITIVE ) >= 0; + else + matched = p->m_regexList[index].search( matchThis ) >= 0; + + if ( matched != p->m_matchSpecList[index].wantMatch ) { //-Debug: TSMTRACE << "String failed to match pattern: '" << rxPattern->pattern() << "'" << endl; return false; } } + //-Debug: TSMTRACE << "Match succeeded, all patterns matched!"
<< endl; + return true; +} - if ( patternList.isEmpty() ) { - //-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl; - return false; +//================================================================================================ +// Utility functions +//================================================================================================ + +/* + The following code is a modified copy of that found in tqt3/src/tools/qregexp.cpp. +*/ +TQString TDEStringMatcher::wildcardToRegex( const TQString& wildcardPattern ) +{ + int wclen = wildcardPattern.length(); + TQString rx = TQString::fromLatin1( "" ); + int i = 0; + const TQChar *wc = wildcardPattern.unicode(); + while ( i < wclen ) { + TQChar c = wc[i++]; + switch ( c.unicode() ) { + case '*': + rx += TQString::fromLatin1( ".*" ); + break; + case '?': + rx += TQChar( '.' ); + break; + case '$': + case '(': + case ')': + case '+': + case '.': + case '\\': + case '^': + case '{': + case '|': + case '}': + rx += TQChar( '\\' ); + rx += c; + break; + case '[': + rx += c; + /* The TQt original treats '^' as negation, but POSIX specifies '!': + if ( wc[i] == TQChar('^') ) + rx += wc[i++]; + */ + if ( i < wclen && wc[i] == TQChar('!') ) { + rx += TQChar('^'); + i++; + } else if ( i < wclen && wc[i] == TQChar('^') ) { + rx += TQChar( '\\' ); + rx += wc[i++]; + } + if ( i < wclen ) { + if ( wc[i] == TQChar(']') ) // a leading ']' is a literal set member + rx += wc[i++]; + while ( i < wclen && wc[i] != TQChar(']') ) { + if ( wc[i] == '\\' ) + rx += TQChar( '\\' ); + rx += wc[i++]; + } + } + break; + default: + rx += c; + } } - else { - //-Debug: TSMTRACE << "Match succeeded, all patterns matched!"
<< endl; - return true; + /* Wildcard patterns must match entire string */ + return TQChar('^') + rx + TQChar('$'); + /* TBD: Add support for extglob */ +} + +TQString TDEStringMatcher::escapeRegexChars( const TQString& basicString ) +{ + int wclen = basicString.length(); + TQString outputString = TQString::fromLatin1( "" ); + int i = 0; + const TQChar *wc = basicString.unicode(); + while ( i < wclen ) { + TQChar c = wc[i++]; + switch ( c.unicode() ) { + case '+': + case '.': + case '^': + case '(': + case ')': + case '[': + case ']': + case '{': + case '}': + case '|': + case '$': + case '?': + case '*': + case '\\': + outputString += TQChar( '\\' ); + outputString += c; + break; + default: + outputString += c; + } } + return outputString; } +//================================================================================================ + #include "tdestringmatcher.moc" diff --git a/tdecore/tdestringmatcher.h b/tdecore/tdestringmatcher.h index f34492e2c..c65a2dac4 100644 --- a/tdecore/tdestringmatcher.h +++ b/tdecore/tdestringmatcher.h @@ -3,12 +3,54 @@ #include "tdelibs_export.h" -#include -#include #include +#include + +#define TSMTRACE kdWarning() << " " + +/** + * Enumeration used by the TDEStringMatcher class + * defining types of patterns to be matched + */ +enum class PatternType: uchar +{ + REGEX = 0, + WILDCARD = 1, +//EXTGLOB = 2, // RESERVED + SUBSTRING = 2, + DEFAULT = REGEX +}; + +/** + * Enumeration used by the TDEStringMatcher class + * defining special handling of alphanumeric characters + */ +enum class ANCHandling: uchar +{ + CASE_SENSITIVE = 0, // No handling + CASE_INSENSITIVE = 1, // Alphabetic case variants are same + EQUIVALENCE = 2, // Alphanumeric equivalents are same + DEFAULT = CASE_SENSITIVE +}; + +/** + * Structure used by the TDEStringMatcher class + * representing properties of a single match specification. + */ +struct MatchSpec +{ + PatternType patternType; + ANCHandling ancHandling; + bool wantMatch; // "matching" vs.
"not matching" + TQString pattern; +}; + +/** + * Container used in a TDEStringMatcher object + * representing multiple match specifications. + */ +typedef TQValueVector<MatchSpec> MatchSpecList; -#define TSMTRACE kdDebug() << " " -#define TSMSIGNALS /** * @@ -23,32 +65,56 @@ public: ~TDEStringMatcher(); /** - Use @param newPatternString to generate @property patternList. Refer to - file README.tdestringmatcher for more information on how the input - string should be formatted. + @return the list of currently defined match specifications. */ - bool generatePatternList( TQString newPatternString ); + MatchSpecList getMatchSpecs(); /** - Return pattern string from which @property patternList was created. - String is stored in @property TDEStringMatcherPrivate::patternString. + @return the string encoding the list of currently defined match specifications. */ - TQString getPatternString(); + TQString getMatchSpecString(); /** - Methods that determine whether or not @param stringToMatch match - any/all of the TQRegExp objects contained in @property patternList. + Use @param newMatchSpecList to generate the internal list of match + specifications to be used for pattern matching. + */ + bool setMatchSpecs( MatchSpecList newMatchSpecList ); + + /** + Use the specially encoded @param newMatchSpecString to generate the internal + list of match specifications to be used for pattern matching. Refer + to file README.tdestringmatcher in tdelibs/tdecore source code for + more information on how the input string should be formatted. + */ + bool setMatchSpecs( TQString newMatchSpecString ); + + /** + @return whether or not @param stringToMatch matches any of + the current match specifications. */ bool matchAny( const TQString& stringToMatch ); + + /** + @return whether or not @param stringToMatch matches all of + the current match specifications.
+ */ bool matchAll( const TQString& stringToMatch ); -signals: + /** + Utility function for converting a wildcard pattern string + to a regular expression pattern string. + */ + TQString wildcardToRegex( const TQString& wildcardPattern ); - void patternsChanged(); + /** + Utility function for escaping all regex-specific characters. + */ + TQString escapeRegexChars( const TQString& basicString ); -protected: - TQPtrList patternList; +signals: + + void patternsChanged(); private: @@ -57,4 +123,7 @@ private: }; +// Use vertical tab (VT) as the match specification string separator +inline constexpr char SEP { 0x0B }; + #endif diff --git a/tdeio/tdeio/tdefileitem.cpp b/tdeio/tdeio/tdefileitem.cpp index 2953eff38..a5ad44a5e 100644 --- a/tdeio/tdeio/tdefileitem.cpp +++ b/tdeio/tdeio/tdefileitem.cpp @@ -206,6 +206,8 @@ void KFileItem::init( bool _determineMimeTypeOnDemand ) } // Initialize hidden file matching apparatus + TSMTRACE << "KFileItem::init(): Initialization for '" << m_url.fileName() << "' almost complete, initializing the hidden file matcher" << endl; + m_pHiddenFileMatcher = nullptr; // must be null before the next call or it will segfault setHiddenFileMatcher( TDEGlobal::hiddenFileMatcher() ); } @@ -833,53 +835,46 @@ bool KFileItem::isWritable() const return true; } -void KFileItem::resetHiddenFileMatcher() -{ - setHiddenFileMatcher( TDEGlobal::hiddenFileMatcher() ); -} - void KFileItem::setHiddenFileMatcher( TDEStringMatcher *hiddenFileMatcher ) { - TSMTRACE << "KFileItem::setHiddenFileMatcher(...) called for " << m_url.fileName() << " [" << hiddenFileMatcher->getPatternString() << "]" <
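The match specification string format described in README.tdestringmatcher (VT-separated pairs of option string and pattern string, options carrying over until overridden, '!' inverting a match, wildcards anchored to the whole string) can be prototyped without TQt. The following is a minimal sketch in standard C++17, not the TDEStringMatcher implementation. It assumes std::regex with the ECMAScript grammar in place of TQRegExp, approximates the not-yet-implemented 'e' (equivalence) option as case-insensitive matching, and uses hypothetical names (Kind, Spec, parseSpecString, specSatisfied, matchAny, matchAll) chosen for illustration only.

```cpp
// Hypothetical standalone sketch (standard C++17, no TQt); no names here belong to TDEStringMatcher.
#include <algorithm>
#include <cctype>
#include <optional>
#include <regex>
#include <string>
#include <vector>

enum class Kind { Regex, Wildcard, Substring };
struct Spec {
    Kind kind = Kind::Regex;    // 'r' (default), 'w', 's'
    bool caseSensitive = true;  // 'c' (default), 'i'; 'e' is approximated as 'i' (assumption)
    bool wantMatch = true;      // '=' (default), '!' inverts the outcome
    std::string pattern;
    std::regex rx;              // compiled once at parse time; unused for substrings
};

// Translate a POSIX-style wildcard into an anchored ECMAScript regex.
std::string wildcardToRegex(const std::string& wc) {
    std::string rx = "^";
    for (std::size_t i = 0; i < wc.size(); ++i) {
        char c = wc[i];
        if (c == '*') { rx += ".*"; continue; }
        if (c == '?') { rx += '.'; continue; }
        if (c == '[') {
            std::size_t j = i + 1;
            std::string cls = "[";
            if (j < wc.size() && wc[j] == '!') { cls += '^'; ++j; }         // POSIX negation
            else if (j < wc.size() && wc[j] == '^') { cls += "\\^"; ++j; }  // literal '^'
            if (j < wc.size() && wc[j] == ']') { cls += "\\]"; ++j; }       // leading ']' is literal
            for (; j < wc.size() && wc[j] != ']'; ++j) {
                if (wc[j] == '\\') cls += '\\';
                cls += wc[j];
            }
            if (j < wc.size()) { rx += cls + "]"; i = j; }  // closed bracket expression
            else rx += "\\[";                               // unterminated: literal '['
            continue;
        }
        if (std::string("$()+.\\^{}|]").find(c) != std::string::npos) rx += '\\';
        rx += c;
    }
    return rx + "$";
}

// Split on VT; components alternate: option string, pattern string, ...
std::optional<std::vector<Spec>> parseSpecString(const std::string& s) {
    std::vector<Spec> specs;
    if (s.empty()) return specs;  // an empty string clears the list
    std::vector<std::string> parts;
    for (std::size_t start = 0, pos = 0; pos != std::string::npos; start = pos + 1) {
        pos = s.find('\v', start);
        parts.push_back(s.substr(start, pos == std::string::npos ? pos : pos - start));
    }
    if (parts.size() % 2 != 0) return std::nullopt;
    Spec cur;  // options carry over to later specs until overridden
    for (std::size_t i = 0; i < parts.size(); i += 2) {
        for (char o : parts[i]) {
            switch (std::tolower(static_cast<unsigned char>(o))) {
                case 'r': cur.kind = Kind::Regex; break;
                case 'w': cur.kind = Kind::Wildcard; break;
                case 's': cur.kind = Kind::Substring; break;
                case 'c': cur.caseSensitive = true; break;
                case 'i': case 'e': cur.caseSensitive = false; break;  // 'e': assumption
                case '=': cur.wantMatch = true; break;
                case '!': cur.wantMatch = false; break;
                default: return std::nullopt;  // other option characters are reserved
            }
        }
        if (parts[i + 1].empty()) return std::nullopt;
        cur.pattern = parts[i + 1];
        if (cur.kind != Kind::Substring) {  // substrings are never parsed as regexes
            auto flags = cur.caseSensitive ? std::regex::ECMAScript
                                           : std::regex::ECMAScript | std::regex::icase;
            try {
                cur.rx = std::regex(cur.kind == Kind::Wildcard ? wildcardToRegex(cur.pattern) : cur.pattern, flags);
            } catch (const std::regex_error&) { return std::nullopt; }
        }
        specs.push_back(cur);
    }
    return specs;
}
static std::string lower(std::string s) {
    for (char& ch : s) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return s;
}
bool specSatisfied(const Spec& sp, const std::string& str) {
    bool found = sp.kind != Kind::Substring ? std::regex_search(str, sp.rx)
               : sp.caseSensitive ? str.find(sp.pattern) != std::string::npos
               : lower(str).find(lower(sp.pattern)) != std::string::npos;
    return found == sp.wantMatch;  // '!' specs succeed when the pattern does NOT match
}
bool matchAny(const std::vector<Spec>& specs, const std::string& str) {
    return std::any_of(specs.begin(), specs.end(), [&](const Spec& sp) { return specSatisfied(sp, str); });
}
bool matchAll(const std::vector<Spec>& specs, const std::string& str) {  // empty list: false
    return !specs.empty() && std::all_of(specs.begin(), specs.end(), [&](const Spec& sp) { return specSatisfied(sp, str); });
}
```

In this sketch each regex is compiled once while parsing, which parallels the patch's idea of keeping a regex list alongside the match specification list. Substring patterns skip regex validation entirely, so a substring such as `a(b` is accepted while the same text given as a regex pattern is rejected.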