<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>TQRegExp Class</title>
<style type="text/css"><!--
fn { margin-left: 1cm; text-indent: -1cm; }
a:link { color: #004faf; text-decoration: none }
a:visited { color: #672967; text-decoration: none }
body { background: #ffffff; color: black; }
--></style>
</head>
<body>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr bgcolor="#E5E5E5">
<td valign=center>
<a href="index.html">
<font color="#004faf">Home</font></a>
| <a href="classes.html">
<font color="#004faf">All Classes</font></a>
| <a href="mainclasses.html">
<font color="#004faf">Main Classes</font></a>
| <a href="annotated.html">
<font color="#004faf">Annotated</font></a>
| <a href="groups.html">
<font color="#004faf">Grouped Classes</font></a>
| <a href="functions.html">
<font color="#004faf">Functions</font></a>
</td>
<td align="right" valign="center"><img src="logo32.png" align="right" width="64" height="32" border="0"></td></tr></table><h1 align=center>TQRegExp Class Reference</h1>
|
|
|
|
<p>The TQRegExp class provides pattern matching using regular expressions.
|
|
<a href="#details">More...</a>
|
|
<p>All the functions in this class are <a href="threads.html#reentrant">reentrant</a> when TQt is built with thread support.</p>
|
|
<p><tt>#include &lt;<a href="tqregexp-h.html">tqregexp.h</a>&gt;</tt>
|
|
<p><a href="tqregexp-members.html">List of all member functions.</a>
|
|
<h2>Public Members</h2>
|
|
<ul>
|
|
<li class=fn>enum <a href="#CaretMode-enum"><b>CaretMode</b></a> { CaretAtZero, CaretAtOffset, CaretWontMatch }</li>
|
|
<li class=fn><a href="#TQRegExp"><b>TQRegExp</b></a> ()</li>
|
|
<li class=fn><a href="#TQRegExp-2"><b>TQRegExp</b></a> ( const TQString & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE )</li>
|
|
<li class=fn><a href="#TQRegExp-3"><b>TQRegExp</b></a> ( const TQRegExp & rx )</li>
|
|
<li class=fn><a href="#~TQRegExp"><b>~TQRegExp</b></a> ()</li>
|
|
<li class=fn>TQRegExp & <a href="#operator-eq"><b>operator=</b></a> ( const TQRegExp & rx )</li>
|
|
<li class=fn>bool <a href="#operator-eq-eq"><b>operator==</b></a> ( const TQRegExp & rx ) const</li>
|
|
<li class=fn>bool <a href="#operator!-eq"><b>operator!=</b></a> ( const TQRegExp & rx ) const</li>
|
|
<li class=fn>bool <a href="#isEmpty"><b>isEmpty</b></a> () const</li>
|
|
<li class=fn>bool <a href="#isValid"><b>isValid</b></a> () const</li>
|
|
<li class=fn>TQString <a href="#pattern"><b>pattern</b></a> () const</li>
|
|
<li class=fn>void <a href="#setPattern"><b>setPattern</b></a> ( const TQString & pattern )</li>
|
|
<li class=fn>bool <a href="#caseSensitive"><b>caseSensitive</b></a> () const</li>
|
|
<li class=fn>void <a href="#setCaseSensitive"><b>setCaseSensitive</b></a> ( bool sensitive )</li>
|
|
<li class=fn>bool <a href="#wildcard"><b>wildcard</b></a> () const</li>
|
|
<li class=fn>void <a href="#setWildcard"><b>setWildcard</b></a> ( bool wildcard )</li>
|
|
<li class=fn>bool <a href="#minimal"><b>minimal</b></a> () const</li>
|
|
<li class=fn>void <a href="#setMinimal"><b>setMinimal</b></a> ( bool minimal )</li>
|
|
<li class=fn>bool <a href="#exactMatch"><b>exactMatch</b></a> ( const TQString & str ) const</li>
|
|
<li class=fn>int match ( const TQString & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const <em>(obsolete)</em></li>
|
|
<li class=fn>int <a href="#search"><b>search</b></a> ( const TQString & str, int offset = 0, CaretMode caretMode = CaretAtZero ) const</li>
|
|
<li class=fn>int <a href="#searchRev"><b>searchRev</b></a> ( const TQString & str, int offset = -1, CaretMode caretMode = CaretAtZero ) const</li>
|
|
<li class=fn>int <a href="#matchedLength"><b>matchedLength</b></a> () const</li>
|
|
<li class=fn>int <a href="#numCaptures"><b>numCaptures</b></a> () const</li>
|
|
<li class=fn>TQStringList <a href="#capturedTexts"><b>capturedTexts</b></a> ()</li>
|
|
<li class=fn>TQString <a href="#cap"><b>cap</b></a> ( int nth = 0 )</li>
|
|
<li class=fn>int <a href="#pos"><b>pos</b></a> ( int nth = 0 )</li>
|
|
<li class=fn>TQString <a href="#errorString"><b>errorString</b></a> ()</li>
|
|
</ul>
|
|
<h2>Static Public Members</h2>
|
|
<ul>
|
|
<li class=fn>TQString <a href="#escape"><b>escape</b></a> ( const TQString & str )</li>
|
|
</ul>
|
|
<hr><a name="details"></a><h2>Detailed Description</h2>
|
|
|
|
|
|
|
|
The TQRegExp class provides pattern matching using regular expressions.
|
|
<p>
|
|
|
|
|
|
|
|
<!-- index regular expression --><a name="regular-expression"></a>
|
|
<p> Regular expressions, or "regexps", provide a way to find patterns
|
|
within text. This is useful in many contexts, for example:
|
|
<p> <center><table cellpadding="4" cellspacing="2" border="0">
|
|
<tr bgcolor="#f0f0f0"> <td valign="top">Validation
|
|
<td valign="top">A regexp can be used to check whether a piece of text
|
|
meets some criteria, e.g. is an integer or contains no
|
|
whitespace.
|
|
<tr bgcolor="#d0d0d0"> <td valign="top">Searching
|
|
<td valign="top">Regexps provide a much more powerful means of searching
|
|
text than simple string matching does. For example we can
|
|
create a regexp which says "find one of the words 'mail',
|
|
'letter' or 'correspondence' but not any of the words
|
|
'email', 'mailman', 'mailer', 'letterbox', etc."
|
|
<tr bgcolor="#f0f0f0"> <td valign="top">Search and Replace
|
|
<td valign="top">A regexp can be used to replace a pattern with a piece of
text, for example replace all occurrences of '&amp;' with
'&amp;amp;' except where the '&amp;' is already followed by 'amp;'.
|
|
<tr bgcolor="#d0d0d0"> <td valign="top">String Splitting
|
|
<td valign="top">A regexp can be used to identify where a string should be
|
|
split into its component fields, e.g. splitting tab-delimited
|
|
strings.
|
|
</table></center>
|
|
<p> We present a very brief introduction to regexps, a description of
|
|
TQt's regexp language, some code examples, and finally the function
|
|
documentation itself. TQRegExp is modeled on Perl's regexp
|
|
language, and also fully supports Unicode. TQRegExp can also be
|
|
used in the weaker 'wildcard' (globbing) mode which works in a
|
|
similar way to command shells. A good text on regexps is <em>Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools</em> by Jeffrey E. Friedl, ISBN 1565922573.
|
|
<p> Experienced regexp users may prefer to skip the introduction and
|
|
go directly to the relevant information.
|
|
<p> In multi-threaded programs, note that TQRegExp depends on
<a href="tqthreadstorage.html">TQThreadStorage</a> internally. For that reason, TQRegExp should only be
used with threads started with <a href="tqthread.html">TQThread</a>, i.e. not with threads
started with platform-specific APIs.
|
|
<p> <!-- toc -->
|
|
<ul>
|
|
<li><a href="#1"> Introduction
|
|
</a>
|
|
<li><a href="#1-1"> Characters and Abbreviations for Sets of Characters
|
|
</a>
|
|
<li><a href="#1-2"> Sets of Characters
|
|
</a>
|
|
<li><a href="#1-3"> Quantifiers
|
|
</a>
|
|
<li><a href="#1-4"> Capturing Text
|
|
</a>
|
|
<li><a href="#1-5"> Assertions
|
|
</a>
|
|
<li><a href="#1-6"> Wildcard Matching (globbing)
|
|
</a>
|
|
<li><a href="#1-7"> Notes for Perl Users
|
|
</a>
|
|
<li><a href="#1-8"> Code Examples
|
|
</a>
|
|
</ul>
|
|
<!-- endtoc -->
|
|
|
|
<p> <h3> Introduction
|
|
</h3>
|
|
<a name="1"></a><p> Regexps are built up from expressions, quantifiers, and assertions.
|
|
The simplest form of expression is simply a character, e.g.
|
|
<b>x</b> or <b>5</b>. An expression can also be a set of
|
|
characters. For example, <b>[ABCD]</b> will match an <b>A</b> or
a <b>B</b> or a <b>C</b> or a <b>D</b>. As a shorthand we could
write this as <b>[A-D]</b>. If we want to match any of the
capital letters in the English alphabet we can write
|
|
<b>[A-Z]</b>. A quantifier tells the regexp engine how many
|
|
occurrences of the expression we want, e.g. <b>x{1,1}</b> means
|
|
match an <b>x</b> which occurs at least once and at most once.
|
|
We'll look at assertions and more complex expressions later.
|
|
<p> Note that in general regexps cannot be used to check for balanced
brackets or tags. For example, if you want to match an opening HTML
<tt>&lt;b&gt;</tt> and its closing <tt>&lt;/b&gt;</tt>, you can only use a regexp if you
know that these tags are not nested; the HTML fragment <tt>&lt;b&gt;bold &lt;b&gt;bolder&lt;/b&gt;&lt;/b&gt;</tt> will not match as expected. If you know the
maximum level of nesting it is possible to create a regexp that
will match correctly, but for an unknown level of nesting, regexps
will fail.
|
|
<p> We'll start by writing a regexp to match integers in the range 0
|
|
to 99. We will require at least one digit so we will start with
|
|
<b>[0-9]{1,1}</b> which means match a digit exactly once. This
|
|
regexp alone will match integers in the range 0 to 9. To match one
|
|
or two digits we can increase the maximum number of occurrences so
|
|
the regexp becomes <b>[0-9]{1,2}</b> meaning match a digit at
|
|
least once and at most twice. However, this regexp as it stands
|
|
will not match correctly. This regexp will match one or two digits
|
|
<em>within</em> a string. To ensure that we match against the whole
|
|
string we must use the anchor assertions. We need <b>^</b> (caret)
|
|
which when it is the first character in the regexp means that the
|
|
regexp must match from the beginning of the string. And we also
|
|
need <b>$</b> (dollar) which when it is the last character in the
|
|
regexp means that the regexp must match until the end of the
|
|
string. So now our regexp is <b>^[0-9]{1,2}$</b>. Note that
|
|
assertions, such as <b>^</b> and <b>$</b>, do not match any
|
|
characters.
|
|
<p> If you've seen regexps elsewhere they may have looked different from
|
|
the ones above. This is because some sets of characters and some
|
|
quantifiers are so common that they have special symbols to
|
|
represent them. <b>[0-9]</b> can be replaced with the symbol
|
|
<b>\d</b>. The quantifier to match exactly one occurrence,
|
|
<b>{1,1}</b>, can be replaced with the expression itself. This means
|
|
that <b>x{1,1}</b> is exactly the same as <b>x</b> alone. So our 0
|
|
to 99 matcher could be written <b>^\d{1,2}$</b>. Another way of
|
|
writing it would be <b>^\d\d{0,1}$</b>, i.e. from the start of the
|
|
string match a digit followed by zero or one digits. In practice
|
|
most people would write it <b>^\d\d?$</b>. The <b>?</b> is a
|
|
shorthand for the quantifier <b>{0,1}</b>, i.e. a minimum of no
occurrences and a maximum of one occurrence. This is used to make an
|
|
expression optional. The regexp <b>^\d\d?$</b> means "from the
|
|
beginning of the string match one digit followed by zero or one
|
|
digits and then the end of the string".
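<p> The effect of the anchors can be checked with a small sketch. It uses the C++ standard library's <tt>std::regex</tt> (ECMAScript grammar) in place of TQRegExp so that it compiles without TQt; the helper names are hypothetical, and <tt>std::regex_search</tt> returning <tt>true</tt> stands in for <a href="#search">search</a>() returning something other than -1.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
// std::regex_search stands in for TQRegExp::search() != -1.
bool containsMatch(const std::string &pattern, const std::string &text)
{
    return std::regex_search(text, std::regex(pattern));
}

// Unanchored: finds one or two digits anywhere *within* the string.
bool unanchoredDigits(const std::string &text)
{
    return containsMatch("[0-9]{1,2}", text);
}

// Anchored: the whole string must consist of one or two digits.
bool anchoredDigits(const std::string &text)
{
    return containsMatch("^\\d\\d?$", text);
}
```

<p> For example, <tt>unanchoredDigits("123")</tt> is true because <b>[0-9]{1,2}</b> finds '12' inside the string, while <tt>anchoredDigits("123")</tt> is false.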
|
|
<p> Our second example is matching the words 'mail', 'letter' or
|
|
'correspondence' but without matching 'email', 'mailman',
|
|
'mailer', 'letterbox' etc. We'll start by just matching 'mail'. In
|
|
full the regexp is, <b>m{1,1}a{1,1}i{1,1}l{1,1}</b>, but since
|
|
each expression itself is automatically quantified by <b>{1,1}</b>
|
|
we can simply write this as <b>mail</b>; an 'm' followed by an 'a'
|
|
followed by an 'i' followed by an 'l'. The symbol '|' (bar) is
|
|
used for <em>alternation</em>, so our regexp now becomes
|
|
<b>mail|letter|correspondence</b> which means match 'mail' <em>or</em>
|
|
'letter' <em>or</em> 'correspondence'. Whilst this regexp will find the
|
|
words we want it will also find words we don't want such as
|
|
'email'. We will start by putting our regexp in parentheses,
|
|
<b>(mail|letter|correspondence)</b>. Parentheses have two effects,
|
|
firstly they group expressions together and secondly they identify
|
|
parts of the regexp that we wish to <a href="#capturing-text">capture</a>. Our regexp still matches any of the three words but now
|
|
they are grouped together as a unit. This is useful for building
|
|
up more complex regexps. It is also useful because it allows us to
|
|
examine which of the words actually matched. We need to use
|
|
another assertion, this time <b>\b</b> "word boundary":
|
|
<b>\b(mail|letter|correspondence)\b</b>. This regexp means "match
|
|
a word boundary followed by the expression in parentheses followed
|
|
by another word boundary". The <b>\b</b> assertion matches at a <em>position</em> in the text, not at a <em>character</em>. A word
boundary is a position between a word character and a non-word
character (such as a space or a newline), or the beginning or end of
the string when it is adjacent to a word character.
|
|
<p> For our third example we want to replace ampersands with the HTML
entity '&amp;amp;'. The regexp to match is simple: <b>&amp;</b>, i.e.
match one ampersand. Unfortunately this will mess up our text if
some of the ampersands have already been turned into HTML
entities. So what we really want to say is replace an ampersand
provided it is not followed by 'amp;'. For this we need the
negative lookahead assertion and our regexp becomes:
<b>&amp;(?!amp;)</b>. The negative lookahead assertion is introduced
with '(?!' and finishes at the ')'. It means that the text it
contains, 'amp;' in our example, must <em>not</em> follow the expression
that precedes it.
|
|
<p> Regexps provide a rich language that can be used in a variety of
|
|
ways. For example suppose we want to count all the occurrences of
|
|
'Eric' and 'Eirik' in a string. Two valid regexps to match these
|
|
are <b>\b(Eric|Eirik)\b</b> and <b>\bEi?ri[ck]\b</b>. We need
|
|
the word boundary '\b' so we don't get 'Ericsson' etc. The second
|
|
regexp actually matches more than we want: 'Eric', 'Erik', 'Eiric'
|
|
and 'Eirik'.
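<p> Counting occurrences amounts to looping over all matches. A minimal sketch, using <tt>std::regex</tt> and <tt>std::sregex_iterator</tt> rather than TQRegExp, with a hypothetical helper <tt>countMatches</tt>, lets the two regexps be compared on the same input:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of `pattern`
// in `text` by walking them with std::sregex_iterator.
std::size_t countMatches(const std::string &pattern, const std::string &text)
{
    const std::regex rx(pattern);
    std::sregex_iterator it(text.begin(), text.end(), rx);
    const std::sregex_iterator end;
    std::size_t count = 0;
    for (; it != end; ++it)
        ++count;
    return count;
}
```

<p> On the text "Eric Erik Eiric Eirik", <b>\b(Eric|Eirik)\b</b> finds two matches while <b>\bEi?ri[ck]\b</b> finds all four names, which is the over-matching described above.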
|
|
<p> We will implement some of the examples above in the
|
|
<a href="#code-examples">code examples</a> section.
|
|
<p> <a name="characters-and-abbreviations-for-sets-of-characters"></a>
|
|
<h3> Characters and Abbreviations for Sets of Characters
|
|
</h3>
|
|
<a name="1-1"></a><p> <center><table cellpadding="4" cellspacing="2" border="0">
|
|
<tr bgcolor="#a2c511"> <th valign="top">Element <th valign="top">Meaning
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>c</b>
|
|
<td valign="top">Any character represents itself unless it has a special
|
|
regexp meaning. Thus <b>c</b> matches the character <em>c</em>.
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>\c</b>
|
|
<td valign="top">A character that follows a backslash matches the character
|
|
itself except where mentioned below. For example if you
|
|
wished to match a literal caret at the beginning of a string
|
|
you would write <b>\^</b>.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\a</b>
|
|
<td valign="top">This matches the ASCII bell character (BEL, 0x07).
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>\f</b>
|
|
<td valign="top">This matches the ASCII form feed character (FF, 0x0C).
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\n</b>
|
|
<td valign="top">This matches the ASCII line feed character (LF, 0x0A, Unix newline).
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>\r</b>
|
|
<td valign="top">This matches the ASCII carriage return character (CR, 0x0D).
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\t</b>
|
|
<td valign="top">This matches the ASCII horizontal tab character (HT, 0x09).
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>\v</b>
|
|
<td valign="top">This matches the ASCII vertical tab character (VT, 0x0B).
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\xhhhh</b>
|
|
<td valign="top">This matches the Unicode character corresponding to the
|
|
hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo
|
|
(i.e., \zero ooo) matches the ASCII/Latin-1 character
|
|
corresponding to the octal number ooo (between 0 and 0377).
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>. (dot)</b>
|
|
<td valign="top">This matches any character (including newline).
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\d</b>
|
|
<td valign="top">This matches a digit (<a href="tqchar.html#isDigit">TQChar::isDigit</a>()).
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>\D</b>
|
|
<td valign="top">This matches a non-digit.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\s</b>
|
|
<td valign="top">This matches a whitespace (<a href="tqchar.html#isSpace">TQChar::isSpace</a>()).
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>\S</b>
|
|
<td valign="top">This matches a non-whitespace.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\w</b>
|
|
<td valign="top">This matches a word character (<a href="tqchar.html#isLetterOrNumber">TQChar::isLetterOrNumber</a>() or '_').
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>\W</b>
|
|
<td valign="top">This matches a non-word character.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\<em>n</em></b>
<td valign="top">The <em>n</em>-th <a href="#capturing-text">backreference</a>,
e.g. \1, \2, etc.
|
|
</table></center>
|
|
<p> <em>Note that the C++ compiler transforms backslashes in strings so to include a <b>\</b> in a regexp you will need to enter it twice, i.e. <b>\\</b>.</em>
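<p> As a sketch of this rule, the two C++ literals below denote the same regexp, <b>^\d+$</b>. The second is a raw string literal, which leaves backslashes alone but requires a C++11 compiler. The matching is done with <tt>std::regex_match</tt> rather than TQRegExp, and all names are hypothetical.

```cpp
#include <regex>
#include <string>

// Both literals denote the same five-character pattern: ^\d+$
// In an ordinary literal each backslash meant for the regexp is doubled;
// a raw string literal R"(...)" passes backslashes through unchanged.
const std::string escapedPattern = "^\\d+$";
const std::string rawPattern = R"(^\d+$)";

// True if `text` consists of one or more digits and nothing else.
bool isAllDigits(const std::string &text)
{
    return std::regex_match(text, std::regex(rawPattern));
}
```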
|
|
<p> <a name="sets-of-characters"></a>
|
|
<h3> Sets of Characters
|
|
</h3>
|
|
<a name="1-2"></a><p> Square brackets are used to match any character in the set of
|
|
characters contained within the square brackets. All the character
|
|
set abbreviations described above can be used within square
|
|
brackets. Apart from the character set abbreviations and the
|
|
following two exceptions no characters have special meanings in
|
|
square brackets.
|
|
<p> <center><table cellpadding="4" cellspacing="2" border="0">
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>^</b>
|
|
<td valign="top">The caret negates the character set if it occurs as the
|
|
first character, i.e. immediately after the opening square
|
|
bracket. For example, <b>[abc]</b> matches 'a' or 'b' or 'c',
|
|
but <b>[^abc]</b> matches anything <em>except</em> 'a' or 'b' or
|
|
'c'.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>-</b>
|
|
<td valign="top">The dash is used to indicate a range of characters, for
|
|
example <b>[W-Z]</b> matches 'W' or 'X' or 'Y' or 'Z'.
|
|
</table></center>
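<p> A minimal sketch of negation and ranges, using <tt>std::regex_match</tt> (whole-string matching, similar in spirit to <a href="#exactMatch">exactMatch</a>()) as a stand-in for TQRegExp. The helper <tt>allInSet</tt> is hypothetical; it appends <b>*</b> so that every character of the string must belong to the set.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if every character of `text` belongs to the
// character set `charClass`, e.g. "[abc]" or "[^abc]". The appended *
// lets the set repeat; std::regex_match requires the whole string to match.
bool allInSet(const std::string &charClass, const std::string &text)
{
    return std::regex_match(text, std::regex(charClass + "*"));
}
```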
|
|
<p> Using the predefined character set abbreviations is more portable
|
|
than using character ranges across platforms and languages. For
|
|
example, <b>[0-9]</b> matches a digit in Western alphabets but
|
|
<b>\d</b> matches a digit in <em>any</em> alphabet.
|
|
<p> Note that in most regexp literature sets of characters are called
|
|
"character classes".
|
|
<p> <a name="quantifiers"></a>
|
|
<h3> Quantifiers
|
|
</h3>
|
|
<a name="1-3"></a><p> By default an expression is automatically quantified by
|
|
<b>{1,1}</b>, i.e. it should occur exactly once. In the following
|
|
list <b><em>E</em></b> stands for any expression. An expression is a
|
|
character or an abbreviation for a set of characters or a set of
|
|
characters in square brackets or any parenthesised expression.
|
|
<p> <center><table cellpadding="4" cellspacing="2" border="0">
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b><em>E</em>?</b>
|
|
<td valign="top">Matches zero or one occurrence of <em>E</em>. This quantifier
|
|
means "the previous expression is optional" since it will
|
|
match whether or not the expression occurs in the string. It
|
|
is the same as <b><em>E</em>{0,1}</b>. For example <b>dents?</b>
|
|
will match 'dent' and 'dents'.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b><em>E</em>+</b>
|
|
<td valign="top">Matches one or more occurrences of <em>E</em>. This is the same
|
|
as <b><em>E</em>{1,MAXINT}</b>. For example, <b>0+</b> will match
|
|
'0', '00', '000', etc.
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b><em>E</em>*</b>
|
|
<td valign="top">Matches zero or more occurrences of <em>E</em>. This is the same
|
|
as <b><em>E</em>{0,MAXINT}</b>. The <b>*</b> quantifier is often
|
|
used by mistake. Since it matches <em>zero</em> or more
|
|
occurrences it will match no occurrences at all. For example
|
|
if we want to match strings that end in whitespace and use
|
|
the regexp <b>\s*$</b> we would get a match on every string.
|
|
This is because we have said find zero or more whitespace
|
|
followed by the end of string, so even strings that don't end
|
|
in whitespace will match. The regexp we want in this case is
|
|
<b>\s+$</b> to match strings that have at least one
|
|
whitespace at the end.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b><em>E</em>{n}</b>
|
|
<td valign="top">Matches exactly <em>n</em> occurrences of the expression. This
|
|
is the same as repeating the expression <em>n</em> times. For
|
|
example, <b>x{5}</b> is the same as <b>xxxxx</b>. It is also
|
|
the same as <b><em>E</em>{n,n}</b>, e.g. <b>x{5,5}</b>.
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b><em>E</em>{n,}</b>
|
|
<td valign="top">Matches at least <em>n</em> occurrences of the expression. This
|
|
is the same as <b><em>E</em>{n,MAXINT}</b>.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b><em>E</em>{,m}</b>
|
|
<td valign="top">Matches at most <em>m</em> occurrences of the expression. This
|
|
is the same as <b><em>E</em>{0,m}</b>.
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b><em>E</em>{n,m}</b>
|
|
<td valign="top">Matches at least <em>n</em> occurrences of the expression and at
|
|
most <em>m</em> occurrences of the expression.
|
|
</table></center>
|
|
<p> (MAXINT is implementation dependent but will not be smaller than
|
|
1024.)
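<p> The <b>\s*$</b> pitfall described in the table can be reproduced with a short sketch. It uses <tt>std::regex_search</tt> instead of TQRegExp's <a href="#search">search</a>(), and the function names are hypothetical:

```cpp
#include <regex>
#include <string>

// \s*$ may match zero whitespace characters just before the end of the
// string, so this search succeeds on every input.
bool searchWithStar(const std::string &text)
{
    return std::regex_search(text, std::regex("\\s*$"));
}

// \s+$ needs at least one whitespace character at the end.
bool searchWithPlus(const std::string &text)
{
    return std::regex_search(text, std::regex("\\s+$"));
}
```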
|
|
<p> If we wish to apply a quantifier to more than just the preceding
|
|
character we can use parentheses to group characters together in
|
|
an expression. For example, <b>tag+</b> matches a 't' followed by
|
|
an 'a' followed by at least one 'g', whereas <b>(tag)+</b> matches
|
|
at least one occurrence of 'tag'.
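<p> The difference between quantifying a single character and quantifying a parenthesised group shows up when the whole string must match. A sketch using <tt>std::regex_match</tt> in place of TQRegExp, with a hypothetical helper <tt>wholeMatch</tt>:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches all of `text`.
bool wholeMatch(const std::string &pattern, const std::string &text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<p> Here <b>tag+</b> accepts 'taggg' but not 'tagtag', and <b>(tag)+</b> does the reverse.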
|
|
<p> Note that quantifiers are "greedy". They will match as much text
|
|
as they can. For example, <b>0+</b> will match as many zeros as it
|
|
can from the first zero it finds, e.g. '2.<u>000</u>5'.
|
|
Quantifiers can be made non-greedy, see <a href="#setMinimal">setMinimal</a>().
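<p> The sketch below reproduces the greedy match on '2.0005' with <tt>std::regex</tt>. The ECMAScript grammar used by <tt>std::regex</tt> also accepts Perl-style non-greedy quantifiers such as <b>0+?</b>, which TQRegExp does not; in TQRegExp the non-greedy behavior is switched on for the whole pattern with setMinimal(TRUE). The helper name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern`
// in `text`, or an empty string if there is no match.
std::string firstMatch(const std::string &pattern, const std::string &text)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str(0);
    return std::string();
}
```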
|
|
<p> <a name="capturing-text"></a>
|
|
<h3> Capturing Text
|
|
</h3>
|
|
<a name="1-4"></a><p> Parentheses allow us to group elements together so that we can
|
|
quantify and capture them. For example if we have the expression
|
|
<b>mail|letter|correspondence</b> that matches a string we know
|
|
that <em>one</em> of the words matched but not which one. Using
|
|
parentheses allows us to "capture" whatever is matched within
|
|
their bounds, so if we used <b>(mail|letter|correspondence)</b>
|
|
and matched this regexp against the string "I sent you some email"
|
|
we can use the <a href="#cap">cap</a>() or <a href="#capturedTexts">capturedTexts</a>() functions to extract the
|
|
matched characters, in this case 'mail'.
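<p> A sketch of extracting the captured word, with <tt>std::smatch</tt> standing in for <a href="#cap">cap</a>(). The helper <tt>capture</tt> is hypothetical and returns an empty string when nothing matches:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns capture `nth` of the first match of
// `pattern` in `text` (0 is the whole match, as with cap(0)), or an
// empty string if there is no match or no such capture.
std::string capture(const std::string &pattern, const std::string &text,
                    std::size_t nth)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)) && nth < m.size())
        return m.str(nth);
    return std::string();
}
```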
|
|
<p> We can use captured text within the regexp itself. To refer to the
|
|
captured text we use <em>backreferences</em> which are indexed from 1,
|
|
the same as for cap(). For example we could search for duplicate
|
|
words in a string using <b>\b(\w+)\W+\1\b</b> which means match a
|
|
word boundary followed by one or more word characters followed by
|
|
one or more non-word characters followed by the same text as the
|
|
first parenthesised expression followed by a word boundary.
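<p> The duplicate-word regexp can be tried out as follows. This is a sketch using <tt>std::regex</tt>, whose ECMAScript grammar uses the same <b>\1</b> backreference syntax; the function name is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first word that is immediately
// repeated (separated only by non-word characters), or "" if none.
std::string firstDuplicateWord(const std::string &text)
{
    static const std::regex rx("\\b(\\w+)\\W+\\1\\b");
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m.str(1);
    return std::string();
}
```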
|
|
<p> If we want to use parentheses purely for grouping and not for
|
|
capturing we can use the non-capturing syntax, e.g.
|
|
<b>(?:green|blue)</b>. Non-capturing parentheses begin '(?:' and
|
|
end ')'. In this example we match either 'green' or 'blue' but we
|
|
do not capture the match so we only know whether or not we matched
|
|
but not which color we actually found. Using non-capturing
|
|
parentheses is more efficient than using capturing parentheses
|
|
since the regexp engine has to do less book-keeping.
|
|
<p> Both capturing and non-capturing parentheses may be nested.
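<p> The difference shows up in the number of capturing groups a pattern defines. In this sketch <tt>std::regex::mark_count()</tt> plays a role similar to <a href="#numCaptures">numCaptures</a>(); the helper name is hypothetical:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: number of capturing groups in `pattern`.
// Groups written (?:...) group without capturing and are not counted.
std::size_t captureGroups(const std::string &pattern)
{
    return std::regex(pattern).mark_count();
}
```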
|
|
<p> <a name="assertions"></a>
|
|
<h3> Assertions
|
|
</h3>
|
|
<a name="1-5"></a><p> Assertions make some statement about the text at the point where
|
|
they occur in the regexp but they do not match any characters. In
|
|
the following list <b><em>E</em></b> stands for any expression.
|
|
<p> <center><table cellpadding="4" cellspacing="2" border="0">
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>^</b>
|
|
<td valign="top">The caret signifies the beginning of the string. If you
|
|
wish to match a literal <tt>^</tt> you must escape it by
|
|
writing <b>\^</b>. For example, <b>^#include</b> will only
|
|
match strings which <em>begin</em> with the characters '#include'.
|
|
(When the caret is the first character of a character set it
|
|
has a special meaning, see <a href="#sets-of-characters">Sets of
|
|
Characters</a>.)
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>$</b>
|
|
<td valign="top">The dollar signifies the end of the string. For example
|
|
<b>\d\s*$</b> will match strings which end with a digit
|
|
optionally followed by whitespace. If you wish to match a
|
|
literal <tt>$</tt> you must escape it by writing
|
|
<b>\$</b>.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>\b</b>
|
|
<td valign="top">A word boundary. For example the regexp
|
|
<b>\bOK\b</b> means match immediately after a word
|
|
boundary (e.g. start of string or whitespace) the letter 'O'
|
|
then the letter 'K' immediately before another word boundary
|
|
(e.g. end of string or whitespace). But note that the
|
|
assertion does not actually match any whitespace so if we
|
|
write <b>(\bOK\b)</b> and we have a match it will only
|
|
contain 'OK' even if the string is "It's <u>OK</u> now".
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>\B</b>
|
|
<td valign="top">A non-word boundary. This assertion is true wherever
|
|
<b>\b</b> is false. For example if we searched for
|
|
<b>\Bon\B</b> in "Left on" the match would fail (space
|
|
and end of string aren't non-word boundaries), but it would
|
|
match in "t<u>on</u>ne".
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>(?=<em>E</em>)</b>
|
|
<td valign="top">Positive lookahead. This assertion is true if the
|
|
expression matches at this point in the regexp. For example,
|
|
<b>const(?=\s+char)</b> matches 'const' whenever it is
|
|
followed by 'char', as in 'static <u>const</u> char *'.
|
|
(Compare with <b>const\s+char</b>, which matches 'static
|
|
<u>const char</u> *'.)
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>(?!<em>E</em>)</b>
|
|
<td valign="top">Negative lookahead. This assertion is true if the
|
|
expression does not match at this point in the regexp. For
|
|
example, <b>const(?!\s+char)</b> matches 'const' <em>except</em>
|
|
when it is followed by 'char'.
|
|
</table></center>
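<p> The assertions in the table can be exercised with <tt>std::regex</tt>, which supports <b>\b</b>, <b>\B</b>, <b>(?=<em>E</em>)</b> and <b>(?!<em>E</em>)</b> with the same meaning. In this sketch (hypothetical helper names), <tt>matchedText</tt> shows that the text checked by a lookahead is not part of the match:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches somewhere in `text`.
bool found(const std::string &pattern, const std::string &text)
{
    return std::regex_search(text, std::regex(pattern));
}

// Hypothetical helper: the text of the first match, or "" if none.
std::string matchedText(const std::string &pattern, const std::string &text)
{
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str(0)
                                                          : std::string();
}
```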
|
|
<p> <a name="wildcard-matching"></a>
|
|
<h3> Wildcard Matching (globbing)
|
|
</h3>
|
|
<a name="1-6"></a><p> Most command shells such as <em>bash</em> or <em>cmd.exe</em> support "file
|
|
globbing", the ability to identify a group of files by using
|
|
wildcards. The <a href="#setWildcard">setWildcard</a>() function is used to switch between
|
|
regexp and wildcard mode. Wildcard matching is much simpler than
|
|
full regexps and has only four features:
|
|
<p> <center><table cellpadding="4" cellspacing="2" border="0">
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>c</b>
|
|
<td valign="top">Any character represents itself apart from those mentioned
|
|
below. Thus <b>c</b> matches the character <em>c</em>.
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>?</b>
|
|
<td valign="top">This matches any single character. It is the same as
|
|
<b>.</b> in full regexps.
|
|
<tr bgcolor="#f0f0f0"> <td valign="top"><b>*</b>
|
|
<td valign="top">This matches zero or more of any characters. It is the
|
|
same as <b>.*</b> in full regexps.
|
|
<tr bgcolor="#d0d0d0"> <td valign="top"><b>[...]</b>
|
|
<td valign="top">Sets of characters can be represented in square brackets,
|
|
similar to full regexps. Within the character class, like
|
|
outside, backslash has no special meaning.
|
|
</table></center>
|
|
<p> For example if we are in wildcard mode and have strings which
|
|
contain filenames we could identify HTML files with <b>*.html</b>.
|
|
This will match zero or more characters followed by a dot followed
|
|
by 'h', 't', 'm' and 'l'.
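<p> One way to see how the four wildcard features relate to full regexps is to translate a wildcard pattern into one. The sketch below is a hypothetical converter targeting <tt>std::regex</tt>; it is not TQRegExp's own implementation of <a href="#setWildcard">setWildcard</a>(). It assumes every <b>[</b> is closed, and, unlike TQRegExp, the <b>.</b> it produces does not match a newline.

```cpp
#include <regex>
#include <string>

// Hypothetical converter from a wildcard pattern to an ECMAScript regexp:
//   ?      becomes .   (any single character)
//   *      becomes .*  (zero or more characters)
//   [...]  is copied as a character set; a backslash inside it stays literal
//   other characters are escaped where needed so they match literally
// Assumes every [ has a matching ].
std::string wildcardToRegex(const std::string &glob)
{
    const std::string special = "\\^$.|+(){}]";
    std::string rx;
    bool inSet = false;
    for (char c : glob) {
        if (inSet) {
            if (c == '\\')
                rx += "\\\\";          // literal backslash inside the set
            else
                rx += c;
            if (c == ']')
                inSet = false;
            continue;
        }
        switch (c) {
        case '?': rx += '.';  break;
        case '*': rx += ".*"; break;
        case '[': rx += '['; inSet = true; break;
        default:
            if (special.find(c) != std::string::npos)
                rx += '\\';            // escape regexp metacharacters
            rx += c;
        }
    }
    return rx;
}

// True if the whole of `text` matches the wildcard pattern `glob`.
bool wildcardMatch(const std::string &glob, const std::string &text)
{
    return std::regex_match(text, std::regex(wildcardToRegex(glob)));
}
```

<p> With this converter, <b>*.html</b> becomes <b>.*\.html</b>, so the dot in the wildcard pattern only matches a literal dot.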
|
|
<p> <a name="perl-users"></a>
|
|
<h3> Notes for Perl Users
|
|
</h3>
|
|
<a name="1-7"></a><p> Most of the character class abbreviations supported by Perl are
|
|
supported by TQRegExp, see <a href="#characters-and-abbreviations-for-sets-of-characters">characters
|
|
and abbreviations for sets of characters</a>.
|
|
<p> In TQRegExp, apart from within character classes, <tt>^</tt> always
|
|
signifies the start of the string, so carets must always be
|
|
escaped unless used for that purpose. In Perl the meaning of caret
|
|
varies automagically depending on where it occurs so escaping it
|
|
is rarely necessary. The same applies to <tt>$</tt> which in
|
|
TQRegExp always signifies the end of the string.
|
|
<p> TQRegExp's quantifiers are the same as Perl's greedy quantifiers.
|
|
Non-greedy matching cannot be applied to individual quantifiers,
|
|
but can be applied to all the quantifiers in the pattern. For
|
|
example, to emulate the Perl regexp <b>ro+?m</b>, use:
|
|
<pre>
TQRegExp rx( "ro+m" );
rx.<a href="#setMinimal">setMinimal</a>( TRUE );
</pre>
|
|
|
|
<p> The equivalent of Perl's <tt>/i</tt> option is
|
|
<a href="#setCaseSensitive">setCaseSensitive</a>(FALSE).
|
|
<p> Perl's <tt>/g</tt> option can be emulated using a <a href="#cap_in_a_loop">loop</a>.
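<p> As a sketch of such a loop, the function below collects every match by searching again from the end of the previous one. It uses <tt>std::regex_search</tt> rather than TQRegExp's <a href="#search">search</a>(), and the name <tt>allMatches</tt> is hypothetical. The <tt>match_prev_avail</tt> flag tells the engine that characters before the new starting point exist, so that <b>\b</b> is evaluated against the real preceding character.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper emulating Perl's /g: returns every match of
// `pattern` in `text`, searching again from the end of the previous match.
std::vector<std::string> allMatches(const std::string &pattern,
                                    const std::string &text)
{
    const std::regex rx(pattern);
    std::vector<std::string> result;
    std::smatch m;
    auto start = text.cbegin();
    while (true) {
        // After the first search, tell the engine that the characters before
        // `start` exist, so \b looks at the real preceding character.
        const auto flags = (start == text.cbegin())
                               ? std::regex_constants::match_default
                               : std::regex_constants::match_prev_avail;
        if (!std::regex_search(start, text.cend(), m, rx, flags))
            break;
        result.push_back(m.str(0));
        if (m[0].second != m[0].first) {
            start = m[0].second;          // continue after a non-empty match
        } else if (m[0].second != text.cend()) {
            start = m[0].second + 1;      // step over an empty match
        } else {
            break;                        // empty match at the end: done
        }
    }
    return result;
}
```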
|
|
<p> In TQRegExp <b>.</b> matches any character, therefore all TQRegExp
|
|
regexps have the equivalent of Perl's <tt>/s</tt> option. TQRegExp
|
|
does not have an equivalent to Perl's <tt>/m</tt> option, but this
|
|
can be emulated in various ways for example by splitting the input
|
|
into lines or by looping with a regexp that searches for newlines.
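<p> A minimal sketch of the line-splitting approach, using <tt>std::istringstream</tt> and <tt>std::regex</tt> instead of TQt classes (the helper name is hypothetical): each line is searched on its own, so <b>^</b> and <b>$</b> refer to the start and end of that line.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper emulating Perl's /m by testing each line on its own,
// so ^ and $ refer to the start and end of that line.
// Returns how many lines of `text` contain a match for `pattern`.
std::size_t countMatchingLines(const std::string &pattern,
                               const std::string &text)
{
    const std::regex rx(pattern);
    std::istringstream in(text);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line))
        if (std::regex_search(line, rx))
            ++count;
    return count;
}
```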
|
|
<p> Because TQRegExp is string oriented there are no \A, \Z or \z
|
|
assertions. The \G assertion is not supported but can be emulated
|
|
in a loop.
|
|
<p> Perl's $& is <a href="#cap">cap</a>(0) or <a href="#capturedTexts">capturedTexts</a>()[0]. There are no TQRegExp
|
|
equivalents for $`, $' or $+. Perl's capturing variables, $1, $2,
|
|
... correspond to cap(1) or capturedTexts()[1], cap(2) or
|
|
capturedTexts()[2], etc.
|
|
<p> To substitute a pattern use <a href="tqstring.html#replace">TQString::replace</a>().
|
|
<p> Perl's extended <tt>/x</tt> syntax is not supported, nor are
|
|
directives, e.g. (?i), or regexp comments, e.g. (?#comment). On
|
|
the other hand, C++'s rules for literal strings can be used to
|
|
achieve the same:
|
|
<pre>
TQRegExp mark( "\\b" // word boundary
"[Mm]ark" // the word we want to match
);
</pre>
|
|
|
|
<p> Both zero-width positive and zero-width negative lookahead
|
|
assertions (?=pattern) and (?!pattern) are supported with the same
|
|
syntax as Perl. Perl's lookbehind assertions, "independent"
|
|
subexpressions and conditional expressions are not supported.
|
|
<p> Non-capturing parentheses are also supported, with the same
|
|
(?:pattern) syntax.
|
|
<p> See <a href="tqstringlist.html#split">TQStringList::split</a>() and <a href="tqstringlist.html#join">TQStringList::join</a>() for equivalents
|
|
to Perl's split and join functions.
|
|
<p> Note: because C++ transforms \'s they must be written <em>twice</em> in
|
|
code, e.g. <b>\b</b> must be written <b>\\b</b>.
|
|
<p> <a name="code-examples"></a>
|
|
<h3> Code Examples
|
|
</h3>
|
|
<a name="1-8"></a><p> <pre>
TQRegExp rx( "^\\d\\d?$" ); // match integers 0 to 99
rx.<a href="#search">search</a>( "123" ); // returns -1 (no match)
rx.<a href="#search">search</a>( "-6" ); // returns -1 (no match)
rx.<a href="#search">search</a>( "6" ); // returns 0 (matched at position 0)
</pre>
|
|
|
|
<p> The third string matches '<u>6</u>'. This is a simple validation
|
|
regexp for integers in the range 0 to 99.
|
|
<p> <pre>
|
|
TQRegExp rx( "^\\S+$" ); // match strings without whitespace
|
|
rx.<a href="#search">search</a>( "Hello world" ); // returns -1 (no match)
|
|
rx.<a href="#search">search</a>( "This_is-OK" ); // returns 0 (matched at position 0)
|
|
</pre>
|
|
|
|
<p> The second string matches '<u>This_is-OK</u>'. We've used the
|
|
character set abbreviation '\S' (non-whitespace) and the anchors
|
|
to match strings which contain no whitespace.
|
|
<p> In the following example we match strings containing 'mail',
'letter' or 'correspondence', but only as whole words, i.e. not
'email':
|
|
<p> <pre>
|
|
TQRegExp rx( "\\b(mail|letter|correspondence)\\b" );
|
|
rx.<a href="#search">search</a>( "I sent you an email" ); // returns -1 (no match)
|
|
rx.<a href="#search">search</a>( "Please write the letter" ); // returns 17
|
|
</pre>
|
|
|
|
<p> The second string matches "Please write the <u>letter</u>". The
|
|
word 'letter' is also captured (because of the parentheses). We
|
|
can see what text we've captured like this:
|
|
<p> <pre>
|
|
<a href="tqstring.html">TQString</a> captured = rx.cap( 1 ); // captured == "letter"
|
|
</pre>
|
|
|
|
<p> This will capture the text from the first set of capturing
|
|
parentheses (counting capturing left parentheses from left to
|
|
right). The parentheses are counted from 1 since <a href="#cap">cap</a>( 0 ) is the
|
|
whole matched regexp (equivalent to '&' in most regexp engines).
|
|
<p> <pre>
|
|
TQRegExp rx( "&amp;(?!amp;)" );      // match ampersands but not &amp;amp;
<a href="tqstring.html">TQString</a> line1 = "This &amp; that";
line1.<a href="tqstring.html#replace">replace</a>( rx, "&amp;amp;" );
// line1 == "This &amp;amp; that"
<a href="tqstring.html">TQString</a> line2 = "His &amp;amp; hers &amp; theirs";
line2.<a href="tqstring.html#replace">replace</a>( rx, "&amp;amp;" );
// line2 == "His &amp;amp; hers &amp;amp; theirs"
|
|
</pre>
|
|
|
|
<p> Here we've passed the TQRegExp to <a href="tqstring.html">TQString</a>'s replace() function to
|
|
replace the matched text with new text.
|
|
<p> <pre>
|
|
<a href="tqstring.html">TQString</a> str = "One Eric another Eirik, and an Ericsson."
|
|
" How many Eiriks, Eric?";
|
|
TQRegExp rx( "\\b(Eric|Eirik)\\b" ); // match Eric or Eirik
|
|
int pos = 0; // where we are in the string
|
|
int count = 0; // how many Eric and Eirik's we've counted
|
|
while ( pos >= 0 ) {
|
|
pos = rx.<a href="#search">search</a>( str, pos );
|
|
if ( pos >= 0 ) {
|
|
pos++; // move along in str
|
|
count++; // count our Eric or Eirik
|
|
}
|
|
}
|
|
</pre>
|
|
|
|
<p> We've used the <a href="#search">search</a>() function to repeatedly match the regexp in
the string. Note that instead of moving forward one character
at a time with <tt>pos++</tt>, we could have written <tt>pos += rx.matchedLength()</tt> to skip over the already matched string. The
count will equal 3, matching 'One <u>Eric</u> another
<u>Eirik</u>, and an Ericsson. How many Eiriks, <u>Eric</u>?'; it
doesn't match 'Ericsson' or 'Eiriks' because in those words 'Eric' and
'Eirik' are followed by more word characters, so the closing <b>\b</b>
(word boundary) cannot match.
|
|
<p> One common use of regexps is to split lines of delimited data into
|
|
their component fields.
|
|
<p> <pre>
|
|
str = "Trolltech AS\twww.trolltech.com\tNorway";
|
|
<a href="tqstring.html">TQString</a> company, web, country;
|
|
rx.setPattern( "^([^\t]+)\t([^\t]+)\t([^\t]+)$" );
|
|
if ( rx.search( str ) != -1 ) {
|
|
company = rx.cap( 1 );
|
|
web = rx.cap( 2 );
|
|
country = rx.cap( 3 );
|
|
}
|
|
</pre>
|
|
|
|
<p> In this example our input lines have the format company name, web
|
|
address and country. Unfortunately the regexp is rather long and
|
|
not very versatile -- the code will break if we add any more
|
|
fields. A simpler and better solution is to look for the
|
|
separator, '\t' in this case, and take the surrounding text. The
|
|
<a href="tqstringlist.html">TQStringList</a> split() function can take a separator string or regexp
|
|
as an argument and split a string accordingly.
|
|
<p> <pre>
|
|
<a href="tqstringlist.html">TQStringList</a> field = TQStringList::<a href="tqstringlist.html#split">split</a>( "\t", str );
|
|
</pre>
|
|
|
|
<p> Here field[0] is the company, field[1] the web address and so on.
|
|
<p> To imitate the matching of a shell we can use wildcard mode.
|
|
<p> <pre>
|
|
TQRegExp rx( "*.html" ); // invalid regexp: * doesn't quantify anything
|
|
rx.<a href="#setWildcard">setWildcard</a>( TRUE ); // now it's a valid wildcard regexp
|
|
rx.<a href="#exactMatch">exactMatch</a>( "index.html" ); // returns TRUE
|
|
rx.<a href="#exactMatch">exactMatch</a>( "default.htm" ); // returns FALSE
|
|
rx.<a href="#exactMatch">exactMatch</a>( "readme.txt" ); // returns FALSE
|
|
</pre>
|
|
|
|
<p> Wildcard matching can be convenient because of its simplicity, but
|
|
any wildcard regexp can be defined using full regexps, e.g.
|
|
<b>.*\.html$</b>. Notice that we can't match both <tt>.html</tt> and <tt>.htm</tt> files with a wildcard unless we use <b>*.htm*</b> which will
|
|
also match 'test.html.bak'. A full regexp gives us the precision
|
|
we need, <b>.*\.html?$</b>.
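<p> To make the relationship between the two notations concrete, here is a sketch that translates a simple wildcard into a full regexp, mapping <b>*</b> to <b>.*</b> and <b>?</b> to <b>.</b> and escaping every other special character, and then tests it with <tt>std::regex_match</tt> from the C++ standard library. The helper names are hypothetical, bracketed sets such as <b>[a-z]</b> are deliberately not handled (they are matched literally here), and this is not TQRegExp's own conversion.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turn a shell-style wildcard into an ECMAScript regexp.
// Only '*' and '?' are special; every other character is matched literally.
std::string wildcardToRegexp(const std::string& wildcard) {
    const std::string special = "\\^$.|?*+()[]{}";
    std::string re;
    for (char c : wildcard) {
        if (c == '*') {
            re += ".*";                 // any run of characters
        } else if (c == '?') {
            re += '.';                  // exactly one character
        } else {
            if (special.find(c) != std::string::npos)
                re += '\\';             // escape regexp metacharacters
            re += c;
        }
    }
    return re;
}

// Like exactMatch(): the whole string must match, not just part of it.
bool wildcardMatch(const std::string& wildcard, const std::string& str) {
    return std::regex_match(str, std::regex(wildcardToRegexp(wildcard)));
}
```

<p> With this sketch, <tt>wildcardMatch("*.htm*", "test.html.bak")</tt> is true, which is the imprecision described above; the full regexp <b>.*\.html?$</b> avoids it.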
|
|
<p> TQRegExp can match case insensitively using <a href="#setCaseSensitive">setCaseSensitive</a>(), and
|
|
can use non-greedy matching (see <a href="#setMinimal">setMinimal</a>()). By default TQRegExp
|
|
uses full regexps but this can be changed with <a href="#setWildcard">setWildcard</a>().
|
|
Searching can be forward with <a href="#search">search</a>() or backward with
|
|
<a href="#searchRev">searchRev</a>(). Captured text can be accessed using <a href="#capturedTexts">capturedTexts</a>()
|
|
which returns a string list of all captured strings, or using
|
|
<a href="#cap">cap</a>() which returns the captured string for the given index. The
|
|
<a href="#pos">pos</a>() function takes a match index and returns the position in the
|
|
string where the match was made (or -1 if there was no match).
|
|
<p> <p>See also <a href="tqregexpvalidator.html">TQRegExpValidator</a>, <a href="tqstring.html">TQString</a>, <a href="tqstringlist.html">TQStringList</a>, <a href="misc.html">Miscellaneous Classes</a>, <a href="shared.html">Implicitly and Explicitly Shared Classes</a>, and <a href="tools.html">Non-GUI Classes</a>.
|
|
|
|
<p> <a name="member-function-documentation"></a>
|
|
|
|
<hr><h2>Member Type Documentation</h2>
|
|
<h3 class=fn><a name="CaretMode-enum"></a>TQRegExp::CaretMode</h3>
|
|
|
|
<p> The CaretMode enum defines the different meanings of the caret
|
|
(<b>^</b>) in a <a href="tqregexp.html#regular-expression">regular expression</a>. The possible values are:
|
|
<ul>
|
|
<li><tt>TQRegExp::CaretAtZero</tt> -
|
|
The caret corresponds to index 0 in the searched string.
|
|
<li><tt>TQRegExp::CaretAtOffset</tt> -
|
|
The caret corresponds to the start offset of the search.
|
|
<li><tt>TQRegExp::CaretWontMatch</tt> -
|
|
The caret never matches.
|
|
</ul>
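<p> CaretAtZero and CaretAtOffset only behave differently when a search starts at a non-zero offset, while CaretWontMatch disables <b>^</b> altogether. The following sketch models the three modes with the C++ standard library's <tt>std::regex</tt> rather than TQRegExp: searching the range that starts at the offset lets <b>^</b> match there, and the <tt>match_not_bol</tt> flag prevents that whenever the chosen mode says the caret cannot match at the offset. The enum <tt>CaretModeSketch</tt> and the function <tt>searchFrom</tt> are hypothetical, and unlike <a href="#search">search</a>() the sketch does not accept negative offsets.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for TQRegExp::CaretMode (not the real enum).
enum class CaretModeSketch { AtZero, AtOffset, WontMatch };

// Returns the index of the first match at or after `offset`, or -1.
int searchFrom(const std::string& str, const std::regex& re, int offset,
               CaretModeSketch mode) {
    if (offset < 0 || offset > static_cast<int>(str.size()))
        return -1;
    // Searching the range [offset, end) lets '^' match at `offset`.
    // Only allow that if the chosen mode puts the caret there.
    bool caretAllowed = mode == CaretModeSketch::AtOffset ||
                        (mode == CaretModeSketch::AtZero && offset == 0);
    std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default;
    if (!caretAllowed)
        flags |= std::regex_constants::match_not_bol;  // '^' never matches
    std::smatch m;
    if (!std::regex_search(str.begin() + offset, str.end(), m, re, flags))
        return -1;
    // m.position(0) is relative to the start of the searched range.
    return offset + static_cast<int>(m.position(0));
}
```

<p> For the pattern <b>^a</b> and the string "aba" searched from offset 2, the sketch returns 2 in the AtOffset mode and -1 in the AtZero and WontMatch modes.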
|
|
<hr><h2>Member Function Documentation</h2>
|
|
<h3 class=fn><a name="TQRegExp"></a>TQRegExp::TQRegExp ()
|
|
</h3>
|
|
Constructs an empty regexp.
|
|
<p> <p>See also <a href="#isValid">isValid</a>() and <a href="#errorString">errorString</a>().
|
|
|
|
<h3 class=fn><a name="TQRegExp-2"></a>TQRegExp::TQRegExp ( const <a href="tqstring.html">TQString</a> & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE )
|
|
</h3>
|
|
Constructs a <a href="tqregexp.html#regular-expression">regular expression</a> object for the given <em>pattern</em>
|
|
string. The pattern must be given using wildcard notation if <em>wildcard</em> is TRUE (default is FALSE). The pattern is case
|
|
sensitive, unless <em>caseSensitive</em> is FALSE. Matching is greedy
|
|
(maximal), but can be changed by calling <a href="#setMinimal">setMinimal</a>().
|
|
<p> <p>See also <a href="#setPattern">setPattern</a>(), <a href="#setCaseSensitive">setCaseSensitive</a>(), <a href="#setWildcard">setWildcard</a>(), and <a href="#setMinimal">setMinimal</a>().
|
|
|
|
<h3 class=fn><a name="TQRegExp-3"></a>TQRegExp::TQRegExp ( const <a href="tqregexp.html">TQRegExp</a> & rx )
|
|
</h3>
|
|
Constructs a <a href="tqregexp.html#regular-expression">regular expression</a> as a copy of <em>rx</em>.
|
|
<p> <p>See also <a href="#operator-eq">operator=</a>().
|
|
|
|
<h3 class=fn><a name="~TQRegExp"></a>TQRegExp::~TQRegExp ()
|
|
</h3>
|
|
Destroys the <a href="tqregexp.html#regular-expression">regular expression</a> and cleans up its internal data.
|
|
|
|
<h3 class=fn><a href="tqstring.html">TQString</a> <a name="cap"></a>TQRegExp::cap ( int nth = 0 )
|
|
</h3>
|
|
Returns the text captured by the <em>nth</em> subexpression. The entire
|
|
match has index 0 and the parenthesized subexpressions have
|
|
indices starting from 1 (excluding non-capturing parentheses).
|
|
<p> <pre>
|
|
TQRegExp rxlen( "(\\d+)(?:\\s*)(cm|inch)" );
|
|
int pos = rxlen.<a href="#search">search</a>( "Length: 189cm" );
|
|
if ( pos > -1 ) {
|
|
<a href="tqstring.html">TQString</a> value = rxlen.<a href="#cap">cap</a>( 1 ); // "189"
|
|
<a href="tqstring.html">TQString</a> unit = rxlen.<a href="#cap">cap</a>( 2 ); // "cm"
|
|
// ...
|
|
}
|
|
</pre>
|
|
|
|
<p> The order of elements matched by <a href="#cap">cap</a>() is as follows. The first
element, cap(0), is the entire matching string. Each subsequent
element corresponds to the next capturing opening parenthesis.
Thus cap(1) is the text captured by the first pair of capturing
parentheses, cap(2) the text captured by the second, and so on.
|
|
<p> <a name="cap_in_a_loop"></a>
|
|
Some patterns may lead to a number of matches which cannot be
|
|
determined in advance, for example:
|
|
<p> <pre>
|
|
TQRegExp rx( "(\\d+)" );
|
|
<a href="tqstring.html">TQString</a> str = "Offsets: 12 14 99 231 7";
|
|
<a href="tqstringlist.html">TQStringList</a> list;
|
|
int pos = 0;
|
|
while ( pos >= 0 ) {
|
|
pos = rx.<a href="#search">search</a>( str, pos );
|
|
if ( pos > -1 ) {
|
|
list += rx.<a href="#cap">cap</a>( 1 );
|
|
pos += rx.<a href="#matchedLength">matchedLength</a>();
|
|
}
|
|
}
|
|
// list contains "12", "14", "99", "231", "7"
|
|
</pre>
|
|
|
|
<p> <p>See also <a href="#capturedTexts">capturedTexts</a>(), <a href="#pos">pos</a>(), <a href="#exactMatch">exactMatch</a>(), <a href="#search">search</a>(), and <a href="#searchRev">searchRev</a>().
|
|
|
|
<p>Examples: <a href="archivesearch-example.html#x479">network/archivesearch/archivedialog.ui.h</a> and <a href="regexptester-example.html#x2485">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn><a href="tqstringlist.html">TQStringList</a> <a name="capturedTexts"></a>TQRegExp::capturedTexts ()
|
|
</h3>
|
|
Returns a list of the captured text strings.
|
|
<p> The first string in the list is the entire matched string. Each
|
|
subsequent list element contains a string that matched a
|
|
(capturing) subexpression of the regexp.
|
|
<p> For example:
|
|
<pre>
|
|
TQRegExp rx( "(\\d+)(\\s*)(cm|inch(es)?)" );
|
|
int pos = rx.<a href="#search">search</a>( "Length: 36 inches" );
|
|
<a href="tqstringlist.html">TQStringList</a> list = rx.<a href="#capturedTexts">capturedTexts</a>();
|
|
// list is now ( "36 inches", "36", " ", "inches", "es" )
|
|
</pre>
|
|
|
|
<p> The above example also captures elements that may be present but
|
|
which we have no interest in. This problem can be solved by using
|
|
non-capturing parentheses:
|
|
<p> <pre>
|
|
TQRegExp rx( "(\\d+)(?:\\s*)(cm|inch(?:es)?)" );
|
|
int pos = rx.<a href="#search">search</a>( "Length: 36 inches" );
|
|
<a href="tqstringlist.html">TQStringList</a> list = rx.<a href="#capturedTexts">capturedTexts</a>();
|
|
// list is now ( "36 inches", "36", "inches" )
|
|
</pre>
|
|
|
|
<p> Note that if you want to iterate over the list, you should iterate
|
|
over a copy, e.g.
|
|
<pre>
|
|
<a href="tqstringlist.html">TQStringList</a> list = rx.capturedTexts();
|
|
TQStringList::Iterator it = list.<a href="tqvaluelist.html#begin">begin</a>();
|
|
while( it != list.<a href="tqvaluelist.html#end">end</a>() ) {
|
|
myProcessing( *it );
|
|
++it;
|
|
}
|
|
</pre>
|
|
|
|
<p> Some regexps can match an indeterminate number of times. For
|
|
example if the input string is "Offsets: 12 14 99 231 7" and the
|
|
regexp, <tt>rx</tt>, is <b>(\d+)+</b>, we would hope to get a list of
|
|
all the numbers matched. However, after calling
|
|
<tt>rx.search(str)</tt>, <a href="#capturedTexts">capturedTexts</a>() will return the list ( "12",
|
|
"12" ), i.e. the entire match was "12" and the first subexpression
|
|
matched was "12". The correct approach is to use <a href="#cap">cap</a>() in a <a href="#cap_in_a_loop">loop</a>.
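<p> Readers who also use the C++ standard library may find it helpful that the <a href="#cap_in_a_loop">cap() loop</a> corresponds to iterating with <tt>std::sregex_iterator</tt>, which restarts the search after each match. The sketch below is standard-library code, not TQRegExp, and the helper name <tt>allCaptures</tt> is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect capture group 1 of every successive match, as the cap() loop does.
std::vector<std::string> allCaptures(const std::string& str,
                                     const std::regex& re) {
    std::vector<std::string> out;
    // A default-constructed sregex_iterator marks the end of the matches.
    for (std::sregex_iterator it(str.begin(), str.end(), re), end;
         it != end; ++it)
        out.push_back((*it)[1].str());
    return out;
}
```

<p> Applied to "Offsets: 12 14 99 231 7" with the pattern <b>(\d+)</b>, it collects "12", "14", "99", "231" and "7", the same list as the cap() loop.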
|
|
<p> The order of elements in the string list is as follows. The first
element is the entire matching string. Each subsequent element
corresponds to the next capturing opening parenthesis. Thus
capturedTexts()[1] is the text captured by the first pair of capturing
parentheses, capturedTexts()[2] the text captured by the second, and so on
(corresponding to $1, $2, etc., in some other regexp languages).
|
|
<p> <p>See also <a href="#cap">cap</a>(), <a href="#pos">pos</a>(), <a href="#exactMatch">exactMatch</a>(), <a href="#search">search</a>(), and <a href="#searchRev">searchRev</a>().
|
|
|
|
<h3 class=fn>bool <a name="caseSensitive"></a>TQRegExp::caseSensitive () const
|
|
</h3>
|
|
Returns TRUE if case sensitivity is enabled; otherwise returns
|
|
FALSE. The default is TRUE.
|
|
<p> <p>See also <a href="#setCaseSensitive">setCaseSensitive</a>().
|
|
|
|
<h3 class=fn><a href="tqstring.html">TQString</a> <a name="errorString"></a>TQRegExp::errorString ()
|
|
</h3>
|
|
Returns a text string that explains why a regexp pattern is
invalid, if it is invalid; otherwise returns "no error occurred".
|
|
<p> <p>See also <a href="#isValid">isValid</a>().
|
|
|
|
<p>Example: <a href="regexptester-example.html#x2486">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn><a href="tqstring.html">TQString</a> <a name="escape"></a>TQRegExp::escape ( const <a href="tqstring.html">TQString</a> & str )<tt> [static]</tt>
|
|
</h3>
|
|
Returns the string <em>str</em> with every regexp special character
|
|
escaped with a backslash. The special characters are $, (, ), *, +,
|
|
., ?, [, \, ], ^, {, | and }.
|
|
<p> Example:
|
|
<pre>
|
|
s1 = TQRegExp::<a href="#escape">escape</a>( "bingo" ); // s1 == "bingo"
|
|
s2 = TQRegExp::<a href="#escape">escape</a>( "f(x)" ); // s2 == "f\\(x\\)"
|
|
</pre>
|
|
|
|
<p> This function is useful to construct regexp patterns dynamically:
|
|
<p> <pre>
|
|
TQRegExp rx( "(" + TQRegExp::escape(name) +
|
|
"|" + TQRegExp::escape(alias) + ")" );
|
|
</pre>
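<p> To show what escaping amounts to, here is a minimal sketch of the same transformation on a <tt>std::string</tt>, using the list of special characters given above. It is an illustration written for this page, not TQRegExp's actual implementation, and the function name <tt>escapeSketch</tt> is hypothetical.

```cpp
#include <string>

// Prefix each regexp special character listed for TQRegExp::escape()
// ($ ( ) * + . ? [ \ ] ^ { | }) with a backslash; copy everything else.
std::string escapeSketch(const std::string& str) {
    const std::string special = "$()*+.?[\\]^{|}";
    std::string out;
    out.reserve(str.size() * 2);
    for (char c : str) {
        if (special.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}
```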
|
|
|
|
|
|
<h3 class=fn>bool <a name="exactMatch"></a>TQRegExp::exactMatch ( const <a href="tqstring.html">TQString</a> & str ) const
|
|
</h3>
|
|
Returns TRUE if <em>str</em> is matched exactly by this <a href="tqregexp.html#regular-expression">regular expression</a>; otherwise returns FALSE. You can determine how much of
|
|
the string was matched by calling <a href="#matchedLength">matchedLength</a>().
|
|
<p> For a given regexp string R, <a href="#exactMatch">exactMatch</a>("R") is equivalent to
<a href="#search">search</a>("^(?:R)$"), since exactMatch() effectively encloses the regexp
in start-of-string and end-of-string anchors, except that it
sets matchedLength() differently. (The non-capturing parentheses
matter when R contains an alternation such as <b>a|b</b>.)
|
|
<p> For example, if the regular expression is <b>blue</b>, then
|
|
exactMatch() returns TRUE only for input <tt>blue</tt>. For inputs <tt>bluebell</tt>, <tt>blutak</tt> and <tt>lightblue</tt>, exactMatch() returns FALSE
|
|
and matchedLength() will return 4, 3 and 0 respectively.
|
|
<p> Although const, this function sets matchedLength(),
|
|
<a href="#capturedTexts">capturedTexts</a>() and <a href="#pos">pos</a>().
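<p> The C++ standard library draws the same distinction: <tt>std::regex_match</tt> requires the whole string to match, like exactMatch(), while <tt>std::regex_search</tt> accepts a match anywhere, like search(). The sketch below uses those standard functions, not TQRegExp, and does not model matchedLength(); the helper names <tt>exact</tt> and <tt>found</tt> are hypothetical. It also shows why a pattern containing an alternation should be grouped before anchoring: <b>^a|b$</b> means "a at the start or b at the end", not "exactly a or b".

```cpp
#include <regex>
#include <string>

// Whole-string match, comparable to exactMatch().
bool exact(const std::string& pattern, const std::string& str) {
    return std::regex_match(str, std::regex(pattern));
}

// Match anywhere in the string, comparable to search() != -1.
bool found(const std::string& pattern, const std::string& str) {
    return std::regex_search(str, std::regex(pattern));
}
```

<p> With these helpers, <tt>exact("blue", "bluebell")</tt> is false while <tt>found("blue", "bluebell")</tt> is true, and <tt>found("^a|b$", "ab")</tt> is true even though <tt>exact("a|b", "ab")</tt> is false.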
|
|
<p> <p>See also <a href="#search">search</a>(), <a href="#searchRev">searchRev</a>(), and <a href="tqregexpvalidator.html">TQRegExpValidator</a>.
|
|
|
|
<h3 class=fn>bool <a name="isEmpty"></a>TQRegExp::isEmpty () const
|
|
</h3>
|
|
Returns TRUE if the pattern string is empty; otherwise returns
|
|
FALSE.
|
|
<p> If you call <a href="#exactMatch">exactMatch</a>() with an empty pattern on an empty string
|
|
it will return TRUE; otherwise it returns FALSE since it operates
|
|
over the whole string. If you call <a href="#search">search</a>() with an empty pattern
|
|
on <em>any</em> string it will return the start offset (0 by default)
|
|
because the empty pattern matches the 'emptiness' at the start of
|
|
the string. In this case the length of the match returned by
|
|
<a href="#matchedLength">matchedLength</a>() will be 0.
|
|
<p> See <a href="tqstring.html#isEmpty">TQString::isEmpty</a>().
|
|
|
|
<h3 class=fn>bool <a name="isValid"></a>TQRegExp::isValid () const
|
|
</h3>
|
|
Returns TRUE if the <a href="tqregexp.html#regular-expression">regular expression</a> is valid; otherwise returns
|
|
FALSE. An invalid regular expression never matches.
|
|
<p> The pattern <b>[a-z</b> is an example of an invalid pattern, since
|
|
it lacks a closing square bracket.
|
|
<p> Note that the validity of a regexp may also depend on the setting
|
|
of the wildcard flag, for example <b>*.html</b> is a valid
|
|
wildcard regexp but an invalid full regexp.
|
|
<p> <p>See also <a href="#errorString">errorString</a>().
|
|
|
|
<p>Example: <a href="regexptester-example.html#x2487">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn>int <a name="match"></a>TQRegExp::match ( const <a href="tqstring.html">TQString</a> & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const
|
|
</h3> <b>This function is obsolete.</b> It is provided to keep old source working. We strongly advise against using it in new code.
|
|
<p> Attempts to match in <em>str</em>, starting from position <em>index</em>.
|
|
Returns the position of the match, or -1 if there was no match.
|
|
<p> The length of the match is stored in <em>*len</em>, unless <em>len</em> is a
|
|
null pointer.
|
|
<p> If <em>indexIsStart</em> is TRUE (the default), the position <em>index</em> in
|
|
the string will match the start of string anchor, <b>^</b>, in the
|
|
regexp, if present. Otherwise, position 0 in <em>str</em> will match.
|
|
<p> Use <a href="#search">search</a>() and <a href="#matchedLength">matchedLength</a>() instead of this function.
|
|
<p> <p>See also <a href="tqstring.html#mid">TQString::mid</a>() and <a href="tqconststring.html">TQConstString</a>.
|
|
|
|
<p>Example: <a href="qmag-example.html#x1791">qmag/qmag.cpp</a>.
|
|
<h3 class=fn>int <a name="matchedLength"></a>TQRegExp::matchedLength () const
|
|
</h3>
|
|
Returns the length of the last matched string, or -1 if there was
|
|
no match.
|
|
<p> <p>See also <a href="#exactMatch">exactMatch</a>(), <a href="#search">search</a>(), and <a href="#searchRev">searchRev</a>().
|
|
|
|
<p>Examples: <a href="archivesearch-example.html#x480">network/archivesearch/archivedialog.ui.h</a> and <a href="regexptester-example.html#x2488">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn>bool <a name="minimal"></a>TQRegExp::minimal () const
|
|
</h3>
|
|
Returns TRUE if minimal (non-greedy) matching is enabled;
|
|
otherwise returns FALSE.
|
|
<p> <p>See also <a href="#setMinimal">setMinimal</a>().
|
|
|
|
<h3 class=fn>int <a name="numCaptures"></a>TQRegExp::numCaptures () const
|
|
</h3>
|
|
Returns the number of captures contained in the <a href="tqregexp.html#regular-expression">regular expression</a>.
|
|
|
|
<p>Example: <a href="regexptester-example.html#x2489">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn>bool <a name="operator!-eq"></a>TQRegExp::operator!= ( const <a href="tqregexp.html">TQRegExp</a> & rx ) const
|
|
</h3>
|
|
|
|
<p> Returns TRUE if this <a href="tqregexp.html#regular-expression">regular expression</a> is not equal to <em>rx</em>;
|
|
otherwise returns FALSE.
|
|
<p> <p>See also <a href="#operator-eq-eq">operator==</a>().
|
|
|
|
<h3 class=fn><a href="tqregexp.html">TQRegExp</a> & <a name="operator-eq"></a>TQRegExp::operator= ( const <a href="tqregexp.html">TQRegExp</a> & rx )
|
|
</h3>
|
|
Copies the <a href="tqregexp.html#regular-expression">regular expression</a> <em>rx</em> and returns a reference to the
|
|
copy. The case sensitivity, wildcard and minimal matching options
|
|
are also copied.
|
|
|
|
<h3 class=fn>bool <a name="operator-eq-eq"></a>TQRegExp::operator== ( const <a href="tqregexp.html">TQRegExp</a> & rx ) const
|
|
</h3>
|
|
Returns TRUE if this <a href="tqregexp.html#regular-expression">regular expression</a> is equal to <em>rx</em>;
|
|
otherwise returns FALSE.
|
|
<p> Two TQRegExp objects are equal if they have the same pattern
|
|
strings and the same settings for case sensitivity, wildcard and
|
|
minimal matching.
|
|
|
|
<h3 class=fn><a href="tqstring.html">TQString</a> <a name="pattern"></a>TQRegExp::pattern () const
|
|
</h3>
|
|
Returns the pattern string of the <a href="tqregexp.html#regular-expression">regular expression</a>. The pattern
|
|
has either regular expression syntax or wildcard syntax, depending
|
|
on <a href="#wildcard">wildcard</a>().
|
|
<p> <p>See also <a href="#setPattern">setPattern</a>().
|
|
|
|
<h3 class=fn>int <a name="pos"></a>TQRegExp::pos ( int nth = 0 )
|
|
</h3>
|
|
Returns the position of the <em>nth</em> captured text in the searched
|
|
string. If <em>nth</em> is 0 (the default), <a href="#pos">pos</a>() returns the position
|
|
of the whole match.
|
|
<p> Example:
|
|
<pre>
|
|
TQRegExp rx( "/([a-z]+)/([a-z]+)" );
|
|
rx.<a href="#search">search</a>( "Output /dev/null" ); // returns 7 (position of /dev/null)
|
|
rx.<a href="#pos">pos</a>( 0 ); // returns 7 (position of /dev/null)
|
|
rx.<a href="#pos">pos</a>( 1 ); // returns 8 (position of dev)
|
|
rx.<a href="#pos">pos</a>( 2 ); // returns 12 (position of null)
|
|
</pre>
|
|
|
|
<p> For zero-length matches, pos() always returns -1. (For example, if
|
|
<a href="#cap">cap</a>(4) would return an empty string, pos(4) returns -1.) This is
|
|
due to an implementation tradeoff.
|
|
<p> <p>See also <a href="#capturedTexts">capturedTexts</a>(), <a href="#exactMatch">exactMatch</a>(), <a href="#search">search</a>(), and <a href="#searchRev">searchRev</a>().
|
|
|
|
<h3 class=fn>int <a name="search"></a>TQRegExp::search ( const <a href="tqstring.html">TQString</a> & str, int offset = 0, <a href="tqregexp.html#CaretMode-enum">CaretMode</a> caretMode = CaretAtZero ) const
|
|
</h3>
|
|
Attempts to find a match in <em>str</em> from position <em>offset</em> (0 by
|
|
default). If <em>offset</em> is -1, the search starts at the last
|
|
character; if -2, at the next to last character; etc.
|
|
<p> Returns the position of the first match, or -1 if there was no
|
|
match.
|
|
<p> The <em>caretMode</em> parameter specifies whether <b>^</b>
should match at index 0, at <em>offset</em>, or not at all; see <a href="#CaretMode-enum">CaretMode</a>.
|
|
<p> You might prefer to use <a href="tqstring.html#find">TQString::find</a>(), <a href="tqstring.html#contains">TQString::contains</a>() or
|
|
even <a href="tqstringlist.html#grep">TQStringList::grep</a>(). To replace matches use
|
|
<a href="tqstring.html#replace">TQString::replace</a>().
|
|
<p> Example:
|
|
<pre>
|
|
<a href="tqstring.html">TQString</a> str = "offsets: 1.23 .50 71.00 6.00";
|
|
TQRegExp rx( "\\d*\\.\\d+" ); // primitive floating point matching
|
|
int count = 0;
|
|
int pos = 0;
|
|
while ( (pos = rx.<a href="#search">search</a>(str, pos)) != -1 ) {
|
|
count++;
|
|
pos += rx.<a href="#matchedLength">matchedLength</a>();
|
|
}
|
|
// search() returns 9, 14, 18 and finally 24; count ends up as 4
|
|
</pre>
|
|
|
|
<p> Although const, this function sets <a href="#matchedLength">matchedLength</a>(),
|
|
<a href="#capturedTexts">capturedTexts</a>() and <a href="#pos">pos</a>().
|
|
<p> <p>See also <a href="#searchRev">searchRev</a>() and <a href="#exactMatch">exactMatch</a>().
|
|
|
|
<p>Examples: <a href="archivesearch-example.html#x481">network/archivesearch/archivedialog.ui.h</a> and <a href="regexptester-example.html#x2490">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn>int <a name="searchRev"></a>TQRegExp::searchRev ( const <a href="tqstring.html">TQString</a> & str, int offset = -1, <a href="tqregexp.html#CaretMode-enum">CaretMode</a> caretMode = CaretAtZero ) const
|
|
</h3>
|
|
Attempts to find a match backwards in <em>str</em> from position <em>offset</em>. If <em>offset</em> is -1 (the default), the search starts at the
|
|
last character; if -2, at the next to last character; etc.
|
|
<p> Returns the position of the first match, or -1 if there was no
|
|
match.
|
|
<p> The <em>caretMode</em> parameter specifies whether <b>^</b>
should match at index 0, at <em>offset</em>, or not at all; see <a href="#CaretMode-enum">CaretMode</a>.
|
|
<p> Although const, this function sets <a href="#matchedLength">matchedLength</a>(),
|
|
<a href="#capturedTexts">capturedTexts</a>() and <a href="#pos">pos</a>().
|
|
<p> <b>Warning:</b> Searching backwards is much slower than searching
|
|
forwards.
|
|
<p> <p>See also <a href="#search">search</a>() and <a href="#exactMatch">exactMatch</a>().
|
|
|
|
<h3 class=fn>void <a name="setCaseSensitive"></a>TQRegExp::setCaseSensitive ( bool sensitive )
|
|
</h3>
|
|
Sets case sensitive matching to <em>sensitive</em>.
|
|
<p> If <em>sensitive</em> is TRUE, <b>\.txt$</b> matches <tt>readme.txt</tt> but
|
|
not <tt>README.TXT</tt>.
|
|
<p> <p>See also <a href="#caseSensitive">caseSensitive</a>().
|
|
|
|
<p>Example: <a href="regexptester-example.html#x2491">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn>void <a name="setMinimal"></a>TQRegExp::setMinimal ( bool minimal )
|
|
</h3>
|
|
Enables or disables minimal matching. If <em>minimal</em> is FALSE,
matching is greedy (maximal), which is the default.
|
|
<p> For example, suppose we have the input string "We must be
&lt;b&gt;bold&lt;/b&gt;, very &lt;b&gt;bold&lt;/b&gt;!" and the pattern
<b>&lt;b&gt;.*&lt;/b&gt;</b>. With the default greedy (maximal) matching,
the match is "We must be <u>&lt;b&gt;bold&lt;/b&gt;, very
&lt;b&gt;bold&lt;/b&gt;</u>!". But with minimal (non-greedy) matching the
first match is: "We must be <u>&lt;b&gt;bold&lt;/b&gt;</u>, very
&lt;b&gt;bold&lt;/b&gt;!" and the second match is "We must be &lt;b&gt;bold&lt;/b&gt;,
very <u>&lt;b&gt;bold&lt;/b&gt;</u>!". In practice we might use the pattern
<b>&lt;b&gt;[^&lt;]+&lt;/b&gt;</b> instead, although this will still fail for
|
|
nested tags.
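<p> For comparison, the C++ standard library's ECMAScript grammar marks an individual quantifier as non-greedy by appending <b>?</b>, whereas on this page minimal matching is a setting of the whole TQRegExp object. The sketch below uses <tt>std::regex</tt>, not TQRegExp, to reproduce the greedy and minimal first matches for the bold-tag input; the function name <tt>firstMatch</tt> is hypothetical.

```cpp
#include <regex>
#include <string>

// Return the first match of `pattern` in `text`, or "" if there is none.
std::string firstMatch(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str(0);
    return "";
}

// The input string from the setMinimal() example.
const std::string input = "We must be <b>bold</b>, very <b>bold</b>!";
```

<p> Here the greedy pattern <tt>&lt;b&gt;.*&lt;/b&gt;</tt> returns the whole span from the first opening tag to the last closing tag, while the lazy form <tt>&lt;b&gt;.*?&lt;/b&gt;</tt> and the character-class form <tt>&lt;b&gt;[^&lt;]+&lt;/b&gt;</tt> both return only the first bold element.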
|
|
<p> <p>See also <a href="#minimal">minimal</a>().
|
|
|
|
<p>Examples: <a href="archivesearch-example.html#x482">network/archivesearch/archivedialog.ui.h</a> and <a href="regexptester-example.html#x2492">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn>void <a name="setPattern"></a>TQRegExp::setPattern ( const <a href="tqstring.html">TQString</a> & pattern )
|
|
</h3>
|
|
Sets the pattern string to <em>pattern</em>. The case sensitivity,
|
|
wildcard and minimal matching options are not changed.
|
|
<p> <p>See also <a href="#pattern">pattern</a>().
|
|
|
|
<h3 class=fn>void <a name="setWildcard"></a>TQRegExp::setWildcard ( bool wildcard )
|
|
</h3>
|
|
Sets the wildcard mode for the <a href="tqregexp.html#regular-expression">regular expression</a>. The default is
|
|
FALSE.
|
|
<p> Setting <em>wildcard</em> to TRUE enables simple shell-like wildcard
|
|
matching. (See <a href="#wildcard-matching">wildcard matching
|
|
(globbing)</a>.)
|
|
<p> For example, <b>r*.txt</b> matches the string <tt>readme.txt</tt> in
|
|
wildcard mode, but does not match <tt>readme</tt>.
|
|
<p> <p>See also <a href="#wildcard">wildcard</a>().
|
|
|
|
<p>Example: <a href="regexptester-example.html#x2493">regexptester/regexptester.cpp</a>.
|
|
<h3 class=fn>bool <a name="wildcard"></a>TQRegExp::wildcard () const
|
|
</h3>
|
|
Returns TRUE if wildcard mode is enabled; otherwise returns FALSE.
|
|
The default is FALSE.
|
|
<p> <p>See also <a href="#setWildcard">setWildcard</a>().
|
|
|
|
<!-- eof -->
|
|
<hr><p>
|
|
This file is part of the <a href="index.html">TQt toolkit</a>.
|
|
Copyright © 1995-2007
|
|
<a href="http://www.trolltech.com/">Trolltech</a>. All Rights Reserved.<p><address><hr><div align=center>
|
|
<table width=100% cellspacing=0 border=0><tr>
|
|
<td>Copyright © 2007
|
|
<a href="troll.html">Trolltech</a><td align=center><a href="trademarks.html">Trademarks</a>
|
|
<td align=right><div align=right>TQt 3.3.8</div>
|
|
</table></div></address></body>
|
|
</html>
|