Introduced an implementation-agnostic interface to TSM centered around
a structure describing a match pattern and associated options such as:

  - Pattern type: regex, wildcard (see Note 1), and substring (new).
  - Alphanumeric character handling: case-sensitive, case-insensitive,
    equivalent character matching (new - see Note 2).
  - Match goal: "matching" vs. "inverted matching" (new).

Introduced additional "get*" and "set*" functions to accommodate the
new list-of-structures interfaces to TSM.

Incorporated most of MicheleC's feedback through 2022-12-04
on previously committed code. Other code cleanup.

See file tdecore/README.tdestringmatcher for additional information.

Note 1: The implementation of wildcard support no longer uses
the TQRegExp.setWildcard() function for regex conversion or
the TQRegExp.exactMatch() function for matching.

Note 2: Full implementation of equivalent character matching is
postponed to a future commit.

Signed-off-by: Vincent Reher <tde@4reher.org>
pull/179/head
Vincent Reher 3 years ago
parent 78905ffa6a
commit 46fedcbe96

@ -1,63 +1,131 @@
The TDEStringMatcher class provides string matching against a list of
one or more TQRegExp objects. The matching functions currently provided
are:
The TDEStringMatcher class provides string matching against a list of one
or more match patterns along with associated options. A single pattern with
its associated options will be referred to herein as a "match specification".
matchAny(): strings match if they match any TQRegExp object in list.
matchAll(): strings match only if they match all TQRegExp objects in list.
Current match specification options include:
The list of TQRegExp objects is constructed using a specially formatted string
that is passed by applications via the generatePatternList() function. This
string is referred to as the "patternString" and is formatted as follows:
* Type of match pattern:
* The first character of the patternString defines the character that
is used to split the remainder of the string into match specification
strings. It is recommended (but not required) that this be a character
that does not occur in a match pattern (see below).
REGEX: Pattern is a regular expression whose syntax is
currently limited to that supported by the TQRegExp class.
WILDCARD: Pattern is a wildcard expression used in POSIX
shell file globbing.
SUBSTRING: Pattern is a simple substring that matches any
string in which it occurs. Substring characters do not
have any other meaning that controls matching.
* Each match specification string consists of an initial character that
designates the match specification type followed by the corresponding
match specification itself.
* Alphanumeric character handling in a pattern:
* Match specification strings starting with 'o' are followed by zero or
more letters, each of which is interpreted as setting or unsetting
a match option. Match options remain in effect until explicitly overridden
by subsequent match options. The match option letters that are currently
recognized and processed are:
NONE: each unescaped alphanumeric character in a pattern
is distinct and will match only itself.
CASE INSENSITIVE: each unescaped letter in a pattern
will match its lower and upper case variants.
EQUIVALENCE: Each unescaped variant of an alphanumeric
character will match all stylistic and accented
variations of that character.
'w' - Match patterns are to be interpreted as "wildcards"
'r' - Match patterns are to be interpreted as "regexes" (TQRegExp default)
'c' - Matching will be case-sensitive (TQRegExp default)
'i' - Matching will be case-INsensitive
'm' - Regex matching will be "minimal" versus "greedy"
'g' - Regex matching will be "greedy" (TQRegExp default)
* Desired outcome of matching
* Match specification strings starting with 'p' are followed by one or
more characters that constitute a match pattern. Each match pattern
together with the match options in effect are used to construct
a single TQRegExp object to be used for string matching. This object
will be validated before acceptance.
TRUE: match succeeds if a string matches the match pattern.
FALSE: match fails if a string matches the match pattern.
* Match specification strings starting with any other character are ignored.
A list of match specifications may be codified in a string formatted as
a vertical tab (VT) separated list of substrings as follows:
Some examples of patternString:
OptionString <VT> PatternString [ <VT> OptionString <VT> PatternString ...]
* |ow|p.*|p*~
Dotfiles and kwrite backup files will be matched via wildcards
Non-empty option strings may contain only the following characters:
* /or/p[.][0-9]+$/ow/p.*
Files with a version number suffix will be matched via regex
and dotfiles will be matched via wildcard
'r' - Match pattern is a regular expression [default]
'w' - Match pattern is a wildcard expression
's' - Match pattern is a simple substring
'c' - Letter case variants are distinct (i.e. case-sensitive) [default]
'i' - Letter case variants are equivalent (i.e. case-insensitive)
'e' - All letter & number character variants are equivalent
'=' - Match succeeds if pattern matches [default]
'!' - Match fails if pattern matches (inverted match)
Current and potential use of the TDEStringMatcher class include:
Options set in option string remain in effect until subsequently overridden.
* Expansion of definition of "hidden" files, addressing issue # 270.
While option strings may be empty, pattern strings may not be empty. Backslash
characters in pattern strings should be represented by "\\"; all other characters
should be specified literally.
* Creation of a subclass that provides string parsing functions.
The following is an example of a string representing a match specification list
intended to apply to file names:
Miscellaneous implementation notes:
w .* e e* cr ~$ \\.[0-9]+
* It is the responsibility of applications that use this class to provide
patternString editing (perhaps via UI), storage, and retrieval.
The corresponding match specification list will match as follows:
* Regex patterns use TQRegExp::search for matching but wildcard patterns
must use TQRegExp::exactMatch in order for matching to work correctly.
* All "dotfiles" would be matched with wildcard matching.
* All filenames beginning with an equivalent variant of the letter 'e'
(e.g. 'e','ê','Ě','') would be matched with wildcard matching.
* All file names ending with '~' (e.g. kwrite backup names) would be
matched with case-sensitive regex matching.
* All file names having a numeric digit filename suffix (e.g. wget backup
files) would be matched with case-sensitive regex matching.
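The decoding rules described above (alternating option strings and pattern strings, options persisting until overridden, empty patterns rejected) can be sketched outside of TQt as follows. This is only an illustration under stated assumptions: it uses std::string instead of TQString, and the names `parseMatchSpecString` and `splitKeepEmpty` are hypothetical, not part of the TDEStringMatcher API.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-ins for the declarations in tdestringmatcher.h.
enum class PatternType { REGEX, WILDCARD, SUBSTRING };
enum class ANCHandling { CASE_SENSITIVE, CASE_INSENSITIVE, EQUIVALENCE };

struct MatchSpec {
    PatternType patternType = PatternType::REGEX;          // 'r' is the default
    ANCHandling ancHandling = ANCHandling::CASE_SENSITIVE; // 'c' is the default
    bool wantMatch = true;                                 // '=' is the default
    std::string pattern;
};

constexpr char SEP = 0x0B; // vertical tab

// Split on SEP, keeping empty fields (an empty option string is legal).
static std::vector<std::string> splitKeepEmpty(const std::string& s) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = s.find(SEP, start);
        if (end == std::string::npos) {
            fields.push_back(s.substr(start));
            return fields;
        }
        fields.push_back(s.substr(start, end - start));
        start = end + 1;
    }
}

// Returns false on malformed input; `out` is only modified on success.
bool parseMatchSpecString(const std::string& input, std::vector<MatchSpec>& out) {
    std::vector<MatchSpec> result;
    if (!input.empty()) {
        std::vector<std::string> fields = splitKeepEmpty(input);
        if (fields.size() % 2 != 0) return false; // need option/pattern pairs
        MatchSpec current; // options carry over from one pair to the next
        for (std::size_t i = 0; i < fields.size(); i += 2) {
            for (char raw : fields[i]) {
                switch (std::tolower(static_cast<unsigned char>(raw))) {
                    case 'r': current.patternType = PatternType::REGEX; break;
                    case 'w': current.patternType = PatternType::WILDCARD; break;
                    case 's': current.patternType = PatternType::SUBSTRING; break;
                    case 'c': current.ancHandling = ANCHandling::CASE_SENSITIVE; break;
                    case 'i': current.ancHandling = ANCHandling::CASE_INSENSITIVE; break;
                    case 'e': current.ancHandling = ANCHandling::EQUIVALENCE; break;
                    case '=': current.wantMatch = true; break;
                    case '!': current.wantMatch = false; break;
                    default: return false; // all other characters are reserved
                }
            }
            if (fields[i + 1].empty()) return false; // empty patterns not allowed
            current.pattern = fields[i + 1];
            result.push_back(current);
        }
    }
    out = result;
    return true;
}
```

Note that the second specification in the README example ("e" followed by "e*") stays a wildcard only because the 'w' set by the first option string is still in effect.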
Applications may set and get match specification lists either directly or
indirectly (as a match specification list string). The matching functions
provided are:
matchAny(): strings match if they match any pattern in list.
matchAll(): strings match only if they match all patterns in list.
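How the "match goal" option interacts with matchAny() and matchAll() can be illustrated with a small standalone sketch. It assumes std::regex in place of TQRegExp, and the `Rule` type is hypothetical. A rule counts as satisfied when the search result equals its `wantMatch` flag, so an inverted rule is satisfied by strings that do not match its pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one match specification.
struct Rule {
    std::regex rx;
    bool wantMatch; // true: '=' (matching), false: '!' (inverted matching)
};

// True if at least one rule is satisfied.
bool matchAny(const std::vector<Rule>& rules, const std::string& s) {
    for (const Rule& r : rules) {
        if (std::regex_search(s, r.rx) == r.wantMatch)
            return true;
    }
    return false; // also the result for an empty rule list
}

// True only if every rule is satisfied.
bool matchAll(const std::vector<Rule>& rules, const std::string& s) {
    if (rules.empty())
        return false; // assumption: mirrors the current empty-list behavior
    for (const Rule& r : rules) {
        if (std::regex_search(s, r.rx) != r.wantMatch)
            return false;
    }
    return true;
}
```

With the rules "starts with a dot" (wanted) and "ends with a tilde" (inverted), matchAll() accepts ".bashrc" but rejects ".bashrc~", while matchAny() accepts "notes.txt" because the inverted rule alone is satisfied.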
IMPLEMENTATION NOTES:
* Wildcard match patterns are currently limited to POSIX wildcards.
Extended wildcard-like expressions are not currently supported
(e.g. Bash globstar, extglob, brace expansion).
* Wildcard match patterns are currently converted to regular
expressions and processed as such instead of using dedicated
functions such as fnmatch(3) or glob(3). This may change in
the future.
* Regular expressions are currently supported by TQRegExp and are
thereby subject to its limitations and bugs. This may be changed
in the future (e.g. direct access to pcre2(3), porting of Qt
QRegularExpression).
* Simple substrings are also supported as match patterns. These are
currently processed by the TQString.find() function. In the future,
these may be converted and processed by the underlying Regex engine,
depending on the tradeoff between code simplification and efficiency.
* Alphanumeric equivalence is conceptually similar to [=x=] POSIX
equivalence class bracket expressions (which are not supported)
but is intended to apply globally in patterns. The following
are caveats when this option is utilized:
- There is potentially significant overhead due to the fact that
match patterns and match strings must be converted prior to
matching. Conversion requires character-by-character lookup
and replacement using a pre-built table.
- The table contains equivalents for [0-9A-Z] which should work
well for Latin-derived languages. It also contains support for
other numeric and non-latin letter characters, the efficacy of
which is not as certain.
- Due to the 16-bit size limitation of TQChar, the table does not
contain mappings for codepoints greater than U+FFFF.
* The choice of VT as the match specification string separator was
based on the following considerations:
- It is a control character that is unlikely to occur in a pattern.
If it is desired to match the VT character, then that can be done
in a regular expression by specifying '\v'.
- Unlike the potentially more readable HT, TDEConfig will not attempt
to escape it when storing strings containing it.
- Unlike other control characters, files containing text with that
character are less likely to be misidentified as binary.
- Most common text editors (and `less`) represent that character as
a symbol that can be copied and pasted.
- Text containing that character displays predictably in a terminal.

@ -138,9 +138,10 @@ TDEStringMatcher *TDEGlobal::hiddenFileMatcher()
TSMTRACE << "TDEGlobal::hiddenFileMatcher(): Global HFM initialization STARTED" << endl;
_hiddenFileMatcher = new TDEStringMatcher();
TDEGlobal::config()->setGroup( "General" );
TQString settings = TDEGlobal::config()->readEntry( "globalHiddenFileSpec", "/oW/.*" );
char settingsDefault[5] = { 'w', SEP, '.', '*', 0 }; // wildcard match of dotfiles
TQString settings = TDEGlobal::config()->readEntry( "globalHiddenFileSpec", settingsDefault );
TSMTRACE << "TDEGlobal::hiddenFileMatcher(): using retrieved patternString = '" << settings << "'" << endl;
_hiddenFileMatcher->generatePatternList( settings );
_hiddenFileMatcher->setMatchSpecs( settings );
}
return _hiddenFileMatcher;

@ -3,167 +3,512 @@
#include <tqregexp.h>
#include <kdebug.h>
typedef TQValueVector<TQRegExp> RegexList;
class TDEStringMatcher::TDEStringMatcherPrivate {
public:
TQString patternString;
}; // FIXME: This may be too small to warrant a private class :\
// Properties that may be set / accessed through the TSM interface
TQString m_matchSpecString;
MatchSpecList m_matchSpecList;
// Properties that are implementation-only
RegexList m_regexList;
/* Individual TQRegExp objects are not used to process a
PatternType that doesn't require a regex engine for matching,
but we may "borrow" the TQRegExp.pattern() field to store
a "converted" version of the pattern.
*/
};
TDEStringMatcher::TDEStringMatcher()
{
p = new TDEStringMatcherPrivate;
TSMTRACE << "TDEStringMatcher::TDEStringMatcher: New instance created: " << this << endl;
p = new TDEStringMatcherPrivate;
}
TDEStringMatcher::~TDEStringMatcher()
{
patternList.setAutoDelete( true );
patternList.clear();
p->m_matchSpecList.clear();
p->m_regexList.clear();
delete p;
TSMTRACE << "TDEStringMatcher::~TDEStringMatcher: Instance destroyed: " << this << endl;
}
TQString TDEStringMatcher::getPatternString()
//================================================================================================
// Match specification output functions
//================================================================================================
TQString TDEStringMatcher::getMatchSpecString()
{
return p->patternString;
return p->m_matchSpecString;
}
MatchSpecList TDEStringMatcher::getMatchSpecs()
{
return p->m_matchSpecList;
}
//================================================================================================
// Match specification input functions
//================================================================================================
bool TDEStringMatcher::setMatchSpecs( MatchSpecList newMatchSpecList )
{
RegexList newRegexList;
TQString optionString = "rc" ; // start with defaults
TQStringList newMatchSpecs;
TQRegExp rxWork;
TSMTRACE << "TDEStringMatcher::setPatterns: validating match specification list" << endl;
for ( MatchSpec matchSpec : newMatchSpecList ) {
if ( matchSpec.pattern.isEmpty() ) {
TSMTRACE << " Error: empty pattern!" << endl;
newRegexList.clear();
return false;
}
if ( matchSpec.pattern.find( TQChar(SEP) ) >= 0 ) {
TSMTRACE << " Error: pattern contains reserved separator character" << endl;
newRegexList.clear();
return false;
}
switch ( matchSpec.patternType ) {
// The following pattern types will be using TQRegExp functions for matching
case PatternType::REGEX :
optionString += TQChar('r');
rxWork.setPattern( matchSpec.pattern );
break;
case PatternType::WILDCARD :
optionString += TQChar('w');
rxWork.setPattern( wildcardToRegex( matchSpec.pattern ) );
break;
// The following pattern types will be using TQString functions for matching
case PatternType::SUBSTRING :
optionString += TQChar('s');
rxWork.setPattern( matchSpec.pattern ); // we will "borrow" this field
break;
default:
newRegexList.clear();
TSMTRACE << " Error: pattern type out of range" << endl;
return false;
}
switch ( matchSpec.ancHandling ) {
case ANCHandling::CASE_SENSITIVE :
optionString += TQChar('c');
rxWork.setCaseSensitive( true );
break;
case ANCHandling::CASE_INSENSITIVE :
optionString += TQChar('i');
rxWork.setCaseSensitive( false );
break;
case ANCHandling::EQUIVALENCE :
optionString += TQChar('e');
rxWork.setCaseSensitive( true );
// FIXME TBD: This is where we will be converting each (unescaped)
// alphanumeric character in rxWork.pattern to its "least" equivalent.
break;
default:
newRegexList.clear();
TSMTRACE << " Error: alphabetic character handling specification out of range" << endl;
return false;
}
if ( matchSpec.wantMatch )
optionString += TQChar('=');
else
optionString += TQChar('!');
if (! rxWork.isValid() ) {
TSMTRACE << " Error: invalid pattern syntax" << endl;
newRegexList.clear();
return false;
}
// This particular match specification is good
newMatchSpecs.append( optionString );
newMatchSpecs.append( matchSpec.pattern );
newRegexList.append( rxWork );
optionString = "";
}
// All proposed match specifications are good, update everything accordingly
p->m_matchSpecList.clear(); p->m_matchSpecList = newMatchSpecList;
p->m_regexList.clear(); p->m_regexList = newRegexList;
p->m_matchSpecString = newMatchSpecs.join( TQChar(SEP) );
emit patternsChanged();
return true;
}
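The encoding step performed at the end of setMatchSpecs( MatchSpecList ) can be sketched in isolation. This illustration assumes std::string instead of TQString, and `encodeMatchSpecs` is a hypothetical name. It writes all three options explicitly for every specification rather than relying on defaults or carry-over.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the declarations in tdestringmatcher.h.
enum class PatternType { REGEX, WILDCARD, SUBSTRING };
enum class ANCHandling { CASE_SENSITIVE, CASE_INSENSITIVE, EQUIVALENCE };

struct MatchSpec {
    PatternType patternType;
    ANCHandling ancHandling;
    bool wantMatch;
    std::string pattern;
};

constexpr char SEP = 0x0B; // vertical tab

// Produce "options<VT>pattern<VT>options<VT>pattern..." for a list of specs.
std::string encodeMatchSpecs(const std::vector<MatchSpec>& specs) {
    std::string out;
    for (std::size_t i = 0; i < specs.size(); ++i) {
        const MatchSpec& s = specs[i];
        if (i > 0) out += SEP;
        switch (s.patternType) {
            case PatternType::REGEX:     out += 'r'; break;
            case PatternType::WILDCARD:  out += 'w'; break;
            case PatternType::SUBSTRING: out += 's'; break;
        }
        switch (s.ancHandling) {
            case ANCHandling::CASE_SENSITIVE:   out += 'c'; break;
            case ANCHandling::CASE_INSENSITIVE: out += 'i'; break;
            case ANCHandling::EQUIVALENCE:      out += 'e'; break;
        }
        out += s.wantMatch ? '=' : '!';
        out += SEP;
        out += s.pattern;
    }
    return out;
}
```

Writing every option out makes each pair self-describing, at the cost of a slightly longer string than one that omits options still in effect.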
bool TDEStringMatcher::generatePatternList( TQString newPatternString )
//=================================================================================================
bool TDEStringMatcher::setMatchSpecs( TQString newMatchSpecString )
{
if ( newPatternString == p->patternString )
MatchSpecList newMatchSpecList;
RegexList newRegexList;
TQRegExp rxWork; // single working copy == each pattern inherits previous options
MatchSpec matchSpec = {
PatternType::DEFAULT,
ANCHandling::DEFAULT,
true, // seeking matches, not non-matches
""
};
if ( newMatchSpecString == p->m_matchSpecString )
return true;
TSMTRACE << "TDEStringMatcher::generatePatternList: Proposed pattern string: <" << newPatternString << ">" << endl;
if ( newPatternString.length() < 2 ) {
TSMTRACE << " Input string too short to be interpreted, patterns will be cleared" << endl;
patternList.clear();
p->patternString = "" ;
#ifdef TSMSIGNALS
TSMTRACE << "TDEStringMatcher::setPatterns: Proposed match specification string: <" << newMatchSpecString << ">" << endl;
if ( newMatchSpecString.isEmpty() ) {
TSMTRACE << " Empty pattern string => match specifications will be cleared" << endl;
p->m_matchSpecList.clear();
p->m_regexList.clear();
p->m_matchSpecString = "";
emit patternsChanged();
#endif // TSMSIGNALS
return true;
}
TQChar patternStringDivider = newPatternString[0];
TSMTRACE << " patternStringDivider = '" << patternStringDivider << "'" << endl;
TQStringList specList = TQStringList::split( patternStringDivider, newPatternString.mid(1), true );
TQRegExp rxWork;
TQPtrList<TQRegExp> rxPatternList;
for ( const TQString &specification : specList ) {
TSMTRACE << " Processing specification string: '" << specification << "'" << endl;
TQChar specificationType = specification[0].lower();
switch ( specificationType ) {
case 'o' : {
TQString optionString = specification.mid(1).lower();
TSMTRACE << " Processing match option string: '" << optionString << "'" << endl;
for ( int i = 0 ; i < optionString.length() ; i++ ) {
TQChar optionChar = optionString[i];
TSMTRACE << " Option character: '" << optionChar << "'" << endl;
switch ( optionChar ) {
case 'w' : rxWork.setWildcard( true ); break;
case 'r' : rxWork.setWildcard( false ); break;
case 'c' : rxWork.setCaseSensitive( true ); break;
case 'i' : rxWork.setCaseSensitive( false ); break;
case 'm' : rxWork.setMinimal( true ); break;
case 'g' : rxWork.setMinimal( false ); break;
default: break;
TQStringList newMatchSpecs = TQStringList::split( SEP, newMatchSpecString, true );
if ( newMatchSpecs.count() % 2 != 0 ) {
TSMTRACE << " Error: match specification string must contain an even number of components" << endl;
return false;
}
TSMTRACE << newMatchSpecs.count() << endl;
bool processingPattern = false; // expected format: option string , pattern string, ...
for ( TQString &specification : newMatchSpecs ) {
if ( specification.find( TQChar(SEP) ) >= 0 ) {
TSMTRACE << " Error: match specification string contains reserved separator character" << endl;
newMatchSpecList.clear();
newRegexList.clear();
return false;
}
TSMTRACE << " Wildcard/CaseSensitive settings: " << rxWork.wildcard() << "/" << rxWork.caseSensitive() << endl;
if ( processingPattern ) {
TSMTRACE << " Processing match pattern string: '" << specification << "'" << endl;
if ( specification.isEmpty() ) {
TSMTRACE << " Error: empty patterns are not allowed" << endl;
newMatchSpecList.clear();
newRegexList.clear();
return false;
}
// Prepare regex
switch ( matchSpec.patternType ) {
// The following pattern types will be using TQRegExp functions for matching
case PatternType::REGEX :
rxWork.setPattern( specification );
break;
case PatternType::WILDCARD :
rxWork.setPattern( wildcardToRegex( specification ) );
break;
case 'p' : {
TQString pattern = specification.mid(1);
TSMTRACE << " Processing match pattern: '" << pattern << "'" << endl;
if ( pattern.isEmpty() ) {
TSMTRACE << " Empty patterns are not allowed" << endl;
rxPatternList.clear();
return false;
// The following pattern types will be using TQString functions for matching
case PatternType::SUBSTRING :
rxWork.setPattern( specification ); // used for storage only
break;
default:
continue; // should not arise
}
switch ( matchSpec.ancHandling ) {
case ANCHandling::CASE_SENSITIVE :
rxWork.setCaseSensitive( true );
break;
case ANCHandling::CASE_INSENSITIVE :
rxWork.setCaseSensitive( false );
break;
case ANCHandling::EQUIVALENCE :
rxWork.setCaseSensitive( false );
// FIXME TBD: This is where we will be converting each (unescaped)
// alphanumeric character in rxWork.pattern to its "least" equivalent.
break;
default:
continue; // should not arise
}
rxWork.setPattern( pattern );
// Test regex
if (! rxWork.isValid() ) {
TSMTRACE << " Invalid pattern" << endl;
rxPatternList.clear();
TSMTRACE << " Error: invalid pattern syntax" << endl;
newMatchSpecList.clear();
newRegexList.clear();
return false;
continue;
}
TQRegExp *rxPattern = new TQRegExp( rxWork );
rxPatternList.append( rxPattern );
// if (! rxWork.isReallyWhatUserIntended() ) { HA HA
TSMTRACE << " Final Wildcard/CaseSensitive settings: " << rxWork.wildcard() << "/" << rxWork.caseSensitive() << endl;
matchSpec.pattern = specification;
newMatchSpecList.push_back( matchSpec );
newRegexList.append( rxWork );
processingPattern = false; // next spec should be an option string
continue;
}
break;
default :
TSMTRACE << " Ignoring unknown specification type '" << specificationType << "'" << endl;
//-Relax, don't overreact: rxPatternList.clear();
//-Relax, don't overreact: return false;
break;
specification = specification.lower();
TSMTRACE << " Processing match option string: '" << specification << "'" << endl;
for ( int i = 0 ; i < specification.length() ; i++ ) {
TQChar optionChar = specification[i];
TSMTRACE << " Option character: '" << optionChar << "'" << endl;
switch ( optionChar ) {
case 'r' : matchSpec.patternType = PatternType::REGEX ; break;
case 'w' : matchSpec.patternType = PatternType::WILDCARD ; break;
case 's' : matchSpec.patternType = PatternType::SUBSTRING ; break;
case 'c' : matchSpec.ancHandling = ANCHandling::CASE_SENSITIVE ; break;
case 'i' : matchSpec.ancHandling = ANCHandling::CASE_INSENSITIVE; break;
case 'e' : matchSpec.ancHandling = ANCHandling::EQUIVALENCE ; break;
case '=' : matchSpec.wantMatch = true ; break;
case '!' : matchSpec.wantMatch = false ; break;
default:
// We reserve ALL other possible option characters for future use!
TSMTRACE << " Error: invalid option character" << endl;
return false;
}
}
// patternList.clear(); // no need to do this?
patternList.setAutoDelete( true );
patternList = rxPatternList;
p->patternString = newPatternString;
// rxPatternList.clear(); // no need to do this?
processingPattern = true; // next spec should be a pattern string
}
p->m_matchSpecList.clear(); p->m_matchSpecList = newMatchSpecList;
p->m_regexList.clear(); p->m_regexList = newRegexList;
p->m_matchSpecString = newMatchSpecString;
TSMTRACE << " Final patternString: '" << p->patternString << "'" << endl;
TSMTRACE << " Number of regex match patterns in list: '" << patternList.count() << "'" << endl;
//newRegexList.clear(); // no need to do this?
#ifdef TSMSIGNALS
TSMTRACE << " Final patternString: '" << p->m_matchSpecString << "'" << endl;
TSMTRACE << " Number of regex match patterns in list: '" << p->m_regexList.count() << "'" << endl;
TSMTRACE << " Notifying slots of pattern change" << endl;
emit patternsChanged();
TSMTRACE << " All slots have been notified" << endl;
#endif // TSMSIGNALS
TSMTRACE << "TDEStringMatcher::generatePatternList: Patterns were successfully regenerated" << endl << endl;
TSMTRACE << "TDEStringMatcher::setPatterns: Patterns were successfully regenerated" << endl << endl;
return true;
}
//================================================================================================
// Match functions
//================================================================================================
bool TDEStringMatcher::matchAny( const TQString& stringToMatch )
{
//-Debug: TSMTRACE << "Attempting to match string '" << stringToMatch << "' against stored patterns" << endl;
for ( const TQRegExp *rxPattern : patternList ) {
if (
( rxPattern->wildcard() && rxPattern->exactMatch( stringToMatch ) ) ||
( ! rxPattern->wildcard() && rxPattern->search( stringToMatch ) >= 0)
)
TSMTRACE << "Attempting to match string '" << stringToMatch << "' against stored patterns" << endl;
if ( p->m_matchSpecList.isEmpty() ) {
//-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl;
return false; //FIXME: or should that be true per MicheleC's comment?
}
TQString equivalentString;
for ( size_t index = 0 ; index < p->m_matchSpecList.count() ; index++ )
{
TQString matchThis = stringToMatch;
if ( p->m_matchSpecList[index].ancHandling == ANCHandling::EQUIVALENCE )
{
//-Debug: TSMTRACE << "String matched pattern: '" << rxPattern->pattern() << "'" << endl;
if ( equivalentString.isNull() ) {
// FIXME TBD: This is where we will be converting each alphanumeric
// character in stringToMatch to its "least" equivalent and storing
// the result in equivalentString. Until then, we'll just do:
equivalentString = stringToMatch;
}
matchThis = equivalentString;
}
switch ( p->m_matchSpecList[index].patternType ) {
case PatternType::REGEX :
case PatternType::WILDCARD :
if (
( p->m_regexList[index].search( matchThis ) >= 0 ) // was there a match?
== p->m_matchSpecList[index].wantMatch // is that what we were looking for?
) {
TSMTRACE << "Match succeeded with regex pattern: '" << p->m_regexList[index].pattern() << "'" << endl;
return true;
}
break;
case PatternType::SUBSTRING :
bool cs = ! (bool) p->m_matchSpecList[index].ancHandling;
if (
( matchThis.find( p->m_matchSpecList[index].pattern, 0, cs ) >= 0 ) // was there a match?
== p->m_matchSpecList[index].wantMatch // is that what we were looking for?
) {
TSMTRACE << "Match succeeded with substring: '" << p->m_matchSpecList[index].pattern << "'" << endl;
return true;
}
break;
}
if ( patternList.isEmpty() ) {
//-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl;
return false;
}
else {
//-Debug: TSMTRACE << "Match failed, no pattern matched!" << endl;
return false;
}
return false ;
}
bool TDEStringMatcher::matchAll( const TQString& stringToMatch )
{
//-Debug: TSMTRACE << "Attempting to match string '" << stringToMatch << "' against ALL stored patterns" << endl;
for ( const TQRegExp *rxPattern : patternList ) {
if ( !
( rxPattern->wildcard() && rxPattern->exactMatch( stringToMatch ) ) ||
( ! rxPattern->wildcard() && rxPattern->search( stringToMatch ) >= 0)
)
//-Debug: TSMTRACE << "Attempting to match string '" << stringToMatch << "' against stored patterns" << endl;
if ( p->m_matchSpecList.isEmpty() ) {
//-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl;
return false; //FIXME: or should that be true per MicheleC's comment?
}
TQString equivalentString;
for ( size_t index = 0 ; index < p->m_matchSpecList.count() ; index++ )
{
//-Debug: TSMTRACE << "String failed to match pattern: '" << rxPattern->pattern() << "'" << endl;
return false;
TQString matchThis = stringToMatch;
if ( p->m_matchSpecList[index].ancHandling == ANCHandling::EQUIVALENCE )
{
if ( equivalentString.isNull() ) {
// FIXME TBD: This is where we will be converting each alphanumeric
// character in stringToMatch to its "least" equivalent and storing
// the result in equivalentString. Until then, we'll just do:
equivalentString = stringToMatch;
}
matchThis = equivalentString;
}
if ( patternList.isEmpty() ) {
//-Debug: TSMTRACE << "Match failed on empty pattern list!" << endl;
if (
( p->m_regexList[index].search( matchThis ) >= 0 ) // was there a match?
!= p->m_matchSpecList[index].wantMatch // is that NOT what we were looking for?
) {
//-Debug: TSMTRACE << "String failed to match pattern: '" << rxPattern->pattern() << "'" << endl;
return false;
}
else {
if ( p->m_regexList[index].search( matchThis ) < 0 ) {
//-Debug: TSMTRACE << "String failed to match pattern: '" << rxPattern->pattern() << "'" << endl;
return false;
}
}
//-Debug: TSMTRACE << "Match succeeded, all patterns matched!" << endl;
return true;
}
//================================================================================================
// Utility functions
//================================================================================================
/*
The following code is a modified copy of that found in tqt3/src/tools/qregexp.cpp.
*/
TQString TDEStringMatcher::wildcardToRegex( const TQString& wildcardPattern )
{
int wclen = wildcardPattern.length();
TQString rx = TQString::fromLatin1( "" );
int i = 0;
const TQChar *wc = wildcardPattern.unicode();
while ( i < wclen ) {
TQChar c = wc[i++];
switch ( c.unicode() ) {
case '*':
rx += TQString::fromLatin1( ".*" );
break;
case '?':
rx += TQChar( '.' );
break;
case '$':
case '(':
case ')':
case '+':
case '.':
case '\\':
case '^':
case '{':
case '|':
case '}':
rx += TQChar( '\\' );
rx += c;
break;
case '[':
rx += c;
/* This is not correct, POSIX states that negation character is '!'
if ( wc[i] == TQChar('^') )
rx += wc[i++];
*/
if ( wc[i] == TQChar('!') ) {
rx += TQChar('^');
i++;
} else if ( wc[i] == TQChar('^') ) {
rx += TQChar( '\\' );
rx += wc[i++];
}
if ( i < wclen ) {
if ( wc[i] == TQChar(']') ) // a leading ']' is a literal member of the set
rx += wc[i++];
while ( i < wclen && wc[i] != TQChar(']') ) {
if ( wc[i] == '\\' )
rx += TQChar( '\\' );
rx += wc[i++];
}
}
break;
default:
rx += c;
}
}
/* Wildcard patterns must match entire string */
return TQChar('^') + rx + TQChar('$');
/* TBD: Add support for extglob */
}
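For readers without a TQt build at hand, the conversion above can be approximated with the standard library. The following is a sketch, not the committed implementation: it assumes std::string and std::regex (ECMAScript syntax) instead of TQString and TQRegExp, and it checks the leading ']' of a bracket expression against the wildcard input.

```cpp
#include <regex>
#include <string>

// Convert a POSIX-style wildcard pattern to an anchored regular expression.
std::string wildcardToRegex(const std::string& wc) {
    std::string rx;
    const std::size_t n = wc.size();
    std::size_t i = 0;
    while (i < n) {
        const char c = wc[i++];
        switch (c) {
            case '*': rx += ".*"; break;
            case '?': rx += '.';  break;
            case '$': case '(': case ')': case '+': case '.':
            case '\\': case '^': case '{': case '|': case '}':
                rx += '\\';       // regex metacharacters are literal in globs
                rx += c;
                break;
            case '[':
                rx += c;
                if (i < n && wc[i] == '!') {        // POSIX negation
                    rx += '^';
                    ++i;
                } else if (i < n && wc[i] == '^') { // literal '^' in a glob set
                    rx += '\\';
                    rx += wc[i++];
                }
                if (i < n && wc[i] == ']')          // leading ']' is a member
                    rx += wc[i++];
                while (i < n && wc[i] != ']') {
                    if (wc[i] == '\\')
                        rx += '\\';
                    rx += wc[i++];
                }
                break;
            default:
                rx += c;
        }
    }
    return "^" + rx + "$"; // wildcard patterns must match the entire string
}
```

Because the result is anchored at both ends, a plain search with the converted pattern behaves like an exact match, which is why the wildcard case no longer needs TQRegExp.exactMatch().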
static TQString escapeRegexChars( const TQString& basicString )
{
int wclen = basicString.length();
TQString outputString = TQString::fromLatin1( "" );
int i = 0;
const TQChar *wc = basicString.unicode();
while ( i < wclen ) {
TQChar c = wc[i++];
switch ( c.unicode() ) {
case '+':
case '.':
case '^':
case '(':
case ')':
case '[':
case ']':
case '{':
case '}':
case '|':
case '$':
case '?':
case '*':
case '\\':
outputString += TQChar( '\\' );
outputString += c;
break;
default:
outputString += c;
}
}
return outputString;
}
//================================================================================================
#include "tdestringmatcher.moc"

@ -3,12 +3,54 @@
#include "tdelibs_export.h"
#include <tqstring.h>
#include <tqptrlist.h>
#include <tqobject.h>
#include <tqvaluevector.h>
#define TSMTRACE kdWarning() << "<TSMTRACE> "
/**
* Enumeration used by the TDEStringMatcher class
* defining types of patterns to be matched
*/
enum class PatternType: uchar
{
REGEX = 0,
WILDCARD = 1,
//EXTGLOB = 2, // RESERVED
SUBSTRING = 2,
DEFAULT = REGEX
};
/**
* Enumeration used by the TDEStringMatcher class
* defining special handling of alphanumeric characters
*/
enum class ANCHandling: uchar
{
CASE_SENSITIVE = 0, // No handling
CASE_INSENSITIVE = 1, // Alphabetic case variants are same
EQUIVALENCE = 2, // Alphanumeric equivalents are same
DEFAULT = CASE_SENSITIVE
};
/**
* Structure used by the TDEStringMatcher class
* representing properties of a single match specification.
*/
struct MatchSpec
{
PatternType patternType;
ANCHandling ancHandling;
bool wantMatch; // "matching" vs. "not matching"
TQString pattern;
};
/**
* Container used in a TDEStringMatcher object
* representing multiple match specifications.
*/
typedef TQValueVector<MatchSpec> MatchSpecList;
#define TSMTRACE kdDebug() << "<TSMTRACE> "
#define TSMSIGNALS
/**
*
@ -23,32 +65,56 @@ public:
~TDEStringMatcher();
/**
Use @param newPatternString to generate @property patternList. Refer to
file README.tdestringmatcher for more information on how the input
string should be formatted.
@return list of currently defined match specifications.
*/
bool generatePatternList( TQString newPatternString );
MatchSpecList getMatchSpecs();
/**
Return pattern string from which @property patternList was created.
String is stored in @property TDEStringMatcherPrivate::patternString.
@return string encoding list of currently defined match specifications.
*/
TQString getPatternString();
TQString getMatchSpecString();
/**
Methods that determine whether or not @param stringToMatch match
any/all of the TQRegExp objects contained in @property patternList.
Use @param newMatchSpecList to generate the internal list of match
specifications to be used for pattern matching.
*/
bool setMatchSpecs( MatchSpecList newMatchSpecList );
/**
Use specially encoded @param newMatchSpecString to generate the internal
list of match specifications to be used for pattern matching. Refer
to file README.tdestringmatcher in tdelibs/tdecore source code for
more information on how the input string should be formatted.
*/
bool setMatchSpecs( TQString newMatchSpecString );
/**
@return whether or not @param stringToMatch matches any of
the current match specifications.
*/
bool matchAny( const TQString& stringToMatch );
/**
@return whether or not @param stringToMatch matches all of
the current match specifications.
*/
bool matchAll( const TQString& stringToMatch );
signals:
/**
Utility function for converting a wildcard pattern string
to a regular expression pattern string.
*/
TQString wildcardToRegex( const TQString& wildcardPattern );
void patternsChanged();
/**
Utility function for escaping all regex-specific characters.
*/
TQString escapeRegexChars( const TQString& basicString );
protected:
TQPtrList<TQRegExp> patternList;
signals:
void patternsChanged();
private:
@ -57,4 +123,7 @@ private:
};
// Use vertical tab as m_matchSpecString separator
inline constexpr char SEP { 0x0B };
#endif

@ -206,6 +206,8 @@ void KFileItem::init( bool _determineMimeTypeOnDemand )
}
// Initialize hidden file matching apparatus
TSMTRACE << "KFileItem::init(): Initialization for '" << m_url.fileName() << "' almost complete, initializing the hidden file matcher" << endl;
m_pHiddenFileMatcher = nullptr; // must be set first: setHiddenFileMatcher() reads the old pointer and would otherwise segfault
setHiddenFileMatcher( TDEGlobal::hiddenFileMatcher() );
}
@ -833,53 +835,46 @@ bool KFileItem::isWritable() const
return true;
}
void KFileItem::resetHiddenFileMatcher()
{
setHiddenFileMatcher( TDEGlobal::hiddenFileMatcher() );
}
void KFileItem::setHiddenFileMatcher( TDEStringMatcher *hiddenFileMatcher )
{
TSMTRACE << "KFileItem::setHiddenFileMatcher(...) called for " << m_url.fileName() << " [" << hiddenFileMatcher->getPatternString() << "]" <<endl ;
TSMTRACE << "KFileItem::setHiddenFileMatcher(...) called for " << m_url.fileName() << endl ;
if ( hiddenFileMatcher == m_pHiddenFileMatcher )
return;
if ( hiddenFileMatcher == 0 || hiddenFileMatcher == nullptr ) {
kdWarning() << "KFileItem::setHiddenFileMatcher: refusing to process null pointer passed by caller" << endl;
return;
}
#ifdef TSMSIGNALS
if ( m_pHiddenFileMatcher != 0 && m_pHiddenFileMatcher != nullptr ) {
TSMTRACE << " Attempting to disconnect slots from hidden file matcher signals ... " << endl;
if ( disconnect( m_pHiddenFileMatcher, 0, 0, 0 ) )
TSMTRACE << " Attempting to disconnect slots from hidden file matcher (" << m_pHiddenFileMatcher << ") signals ... " << endl;
if ( (m_pHiddenFileMatcher != nullptr) && disconnect( m_pHiddenFileMatcher, 0, this, 0 ) )
TSMTRACE << " ... all slots successfully disconnected" << endl;
}
#endif // TSMSIGNALS
TSMTRACE << " Changing hidden file matcher from " << m_pHiddenFileMatcher << " to " << hiddenFileMatcher << endl;
m_pHiddenFileMatcher = hiddenFileMatcher;
#ifdef TSMSIGNALS
if ( hiddenFileMatcher == nullptr ) {
kdWarning() << "KFileItem::setHiddenFileMatcher: called with null pointer, nothing will be hidden any more" << endl;
return;
}
TSMTRACE << " New pattern string: " << hiddenFileMatcher->getMatchSpecString() << endl ;
TSMTRACE << " Attempting to reconnect slots to hidden file matcher signals ... " << endl;
if ( connect( m_pHiddenFileMatcher, TQT_SIGNAL( destroyed() ), this, TQT_SLOT( resetHiddenFileMatcher() ) ) )
TSMTRACE << " Connected slot resethiddenFileMatcher() to signal destroyed()" << endl;
if ( connect( m_pHiddenFileMatcher, TQT_SIGNAL( destroyed() ), this, TQT_SLOT( setHiddenFileMatcher() ) ) )
TSMTRACE << " Connected slot setHiddenFileMatcher() to signal destroyed()" << endl;
if ( connect( m_pHiddenFileMatcher, TQT_SIGNAL( patternsChanged() ), this, TQT_SLOT( reEvaluateHidden() ) ) )
TSMTRACE << " Connected slot reEvaluateHidden() to signal patternsChanged()" << endl;
#endif // TSMSIGNALS
TSMTRACE << "KFileItem::setHiddenFileMatcher(...) finished, calling reEvaluateHidden()" <<endl ;
TSMTRACE << "KFileItem::setHiddenFileMatcher(...) finished, calling reEvaluateHidden()" << endl ;
reEvaluateHidden();
}
void KFileItem::reEvaluateHidden()
{
TSMTRACE << "KFileItem::reEvaluateHidden() called for " << m_url.fileName() <<endl ;
if ( !m_url.isEmpty() )
if ( m_pHiddenFileMatcher == nullptr ) // abnormal
m_bHiddenByMatcher = false;
else if ( !m_url.isEmpty() )
m_bHiddenByMatcher = m_pHiddenFileMatcher->matchAny( m_url.fileName() );
else // should never happen
m_bHiddenByMatcher = m_pHiddenFileMatcher->matchAny( m_strName );
TSMTRACE << "KFileItem::reEvaluateHidden() completed for " << m_url.fileName() << " [" << m_bHiddenByMatcher << "]" <<endl ;
TSMTRACE << "KFileItem::reEvaluateHidden() completed for " << m_url.fileName() << " [" << m_bHiddenByMatcher << "]" << endl ;
}
bool KFileItem::isHidden() const
@ -1120,6 +1115,9 @@ void KFileItem::assign( const KFileItem & item )
// note: m_extra is NOT copied, as we'd have no control over who is
// deleting the data or not.
// We have to do this (vs. copying properties) to establish new signal/slot connections
setHiddenFileMatcher( item.m_pHiddenFileMatcher );
// We had a mimetype previously (probably), so we need to re-determine it
determineMimeType();
@ -1326,6 +1324,4 @@ TQDataStream & operator>> ( TQDataStream & s, KFileItem & a )
return s;
}
//#ifdef TSMSIGNALS
#include "tdefileitem.moc"
//#endif // TSMSIGNALS

@ -230,25 +230,19 @@ public:
*/
bool isWritable() const;
/**
* Sets object that encapsulates criteria for determining whether or not
* a filesystem entity is hidden based on characteristics of its name.
* Object is stored in @property m_pHiddenFileMatcher.
*/
void setHiddenFileMatcher( TDEStringMatcher *hiddenFileMatcher );
public slots:
/**
* Sets @property m_pHiddenFileMatcher to the global hidden file matcher.
* Sets object that encapsulates criteria for determining whether or not
* a filesystem entity is hidden based on characteristics of its name.
* This object will be referred to as the "hidden file matcher".
*/
void resetHiddenFileMatcher();
void setHiddenFileMatcher( TDEStringMatcher *hiddenFileMatcher = nullptr );
/**
* Checks whether or not the current filesystem object is "hidden" by
* calling the matchAny() method of the TDEStringMatcher object stored
* in @property m_pHiddenFileMatcher. Result of this check is cached in
* @property m_bHiddenByMatcher.
* Checks whether or not the current filesystem object is "hidden"
* according to the current hidden file matcher. The result is cached
* for use by function isHidden().
*/
void reEvaluateHidden();
@ -688,7 +682,7 @@ private:
bool m_bMimeTypeKnown:1;
// Auto: always check if hidden.
// Auto: check if item is hidden.
enum { Auto, Hidden, Shown } m_hidden:3;
/**
@ -699,9 +693,9 @@ private:
/**
* Object that encapsulates criteria for determining whether or not
* this filesystem entity is hidden based on characteristics of its
* name. This property is set by method setHiddenFileMatcher().
* name. Referred to as the "hidden file matcher".
*/
TDEStringMatcher *m_pHiddenFileMatcher = nullptr;
TDEStringMatcher *m_pHiddenFileMatcher;
// For special case like link to dirs over FTP
TQString m_guessedMimeType;
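
Illustration (not part of this commit): the matchAny()/matchAll() semantics
described in the header comments above can be sketched in standard C++ without
TQt. Everything in this sketch is an assumption made for illustration: the
MatchSpec layout, the PatternType enum, the free-function forms of
wildcardToRegex() and escapeRegexChars(), the use of std::regex in place of
TQRegExp, and the choice to treat regex and substring patterns as unanchored
searches while wildcard patterns must cover the whole string. It is not the
TDEStringMatcher implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for one "match specification": a pattern plus options.
enum class PatternType { Regex, Wildcard, Substring };

struct MatchSpec {
    PatternType type;
    bool caseSensitive;
    bool wantMatch;       // false selects "inverted matching"
    std::string pattern;
};

// Prefix every regex-special character with a backslash.
std::string escapeRegexChars(const std::string& basicString) {
    static const std::string special = R"re(\^$.|?*+()[]{})re";
    std::string out;
    for (char c : basicString) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Convert a simple wildcard pattern ('*' and '?' only) to a regex string.
std::string wildcardToRegex(const std::string& wildcardPattern) {
    std::string out;
    for (char c : wildcardPattern) {
        if (c == '*') out += ".*";
        else if (c == '?') out += '.';
        else out += escapeRegexChars(std::string(1, c));
    }
    return out;
}

// Evaluate a single specification, honoring case handling and inversion.
bool specMatches(const MatchSpec& spec, const std::string& s) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (!spec.caseSensitive) flags = flags | std::regex::icase;
    bool found = false;
    switch (spec.type) {
    case PatternType::Regex:      // assumption: unanchored search
        found = std::regex_search(s, std::regex(spec.pattern, flags));
        break;
    case PatternType::Wildcard:   // assumption: must match the whole string
        found = std::regex_match(s, std::regex(wildcardToRegex(spec.pattern), flags));
        break;
    case PatternType::Substring:  // literal text anywhere in the string
        found = std::regex_search(s, std::regex(escapeRegexChars(spec.pattern), flags));
        break;
    }
    return found == spec.wantMatch;
}

// True if at least one specification is satisfied (false for an empty list).
bool matchAny(const std::vector<MatchSpec>& specs, const std::string& s) {
    return std::any_of(specs.begin(), specs.end(),
                       [&](const MatchSpec& m) { return specMatches(m, s); });
}

// True only if every specification is satisfied (true for an empty list).
bool matchAll(const std::vector<MatchSpec>& specs, const std::string& s) {
    return std::all_of(specs.begin(), specs.end(),
                       [&](const MatchSpec& m) { return specMatches(m, s); });
}
```

With a list holding a case-sensitive wildcard ".*" and a case-insensitive
substring "BACKUP", matchAny() accepts both ".bashrc" and "old-backup.txt",
whereas matchAll() accepts only names that satisfy both, such as ".backup".
The behavior for an empty list simply follows std::any_of and std::all_of and
may differ from what TDEStringMatcher does.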
