<quote>1</quote>,<quote>2</quote>,<quote>3</quote> or
<quote>8</quote>).</para>
<para>As capital letters are different characters from their
non-capital equivalents, to create a caseless character class matching
<quote>a</quote> or <quote>b</quote>, in any case, you need to write it
<userinput>[aAbB]</userinput>.</para>
<para>It is of course possible to create a <quote>negative</quote>
class matching <quote>anything but</quote>. To do so, put a caret
(<literal>^</literal>) at the beginning of the class: </para>
<para><userinput>[^abc]</userinput> will match any character
<emphasis>but</emphasis> <quote>a</quote>, <quote>b</quote> or
<quote>c</quote>.</para>
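<para>As a rough illustration, the classes above can be tried out with
Python's <literal>re</literal> module. This is only a sketch: Python is
used here as a stand-in engine, and the helper name
<literal>matched_chars</literal> is hypothetical, not something defined
by this documentation.</para>

```python
import re

def matched_chars(pattern, text):
    """Return every single character in text that the class matches."""
    return re.findall(pattern, text)

# Caseless class for "a" or "b": both cases must be listed explicitly.
caseless = matched_chars(r"[aAbB]", "A cab, Bob")

# Negative class: any character but "a", "b" or "c".
negated = matched_chars(r"[^abc]", "abcde")

print(caseless)
print(negated)
```

<para>Note that the <quote>c</quote> in the first sample string is
skipped, since it is not listed in the class.</para>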
<para>In addition to literal characters, some abbreviations are
defined, making life still a bit easier:
<variablelist>
<varlistentry>
<term><userinput>\a</userinput></term>
<listitem><para> This matches the <acronym>ASCII</acronym> bell character (BEL, 0x07).</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\f</userinput></term>
<listitem><para> This matches the <acronym>ASCII</acronym> form feed character (FF, 0x0C).</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\n</userinput></term>
<listitem><para> This matches the <acronym>ASCII</acronym> line feed character (LF, 0x0A, Unix newline).</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\r</userinput></term>
<listitem><para> This matches the <acronym>ASCII</acronym> carriage return character (CR, 0x0D).</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\t</userinput></term>
<listitem><para> This matches the <acronym>ASCII</acronym> horizontal tab character (HT, 0x09).</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\v</userinput></term>
<listitem><para> This matches the <acronym>ASCII</acronym> vertical tab character (VT, 0x0B).</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\xhhhh</userinput></term>
<listitem><para> This matches the Unicode character corresponding to
the hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo (i.e.,
\zero ooo) matches the <acronym>ASCII</acronym>/Latin-1 character
corresponding to the octal number ooo (between 0 and
0377).</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>.</userinput> (dot)</term>
<listitem><para> This matches any character (including newline).</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\d</userinput></term>
<listitem><para> This matches a digit. Equal to <literal>[0-9]</literal></para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\D</userinput></term>
<listitem><para> This matches a non-digit. Equal to <literal>[^0-9]</literal> or <literal>[^\d]</literal></para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\s</userinput></term>
<listitem><para> This matches a whitespace character. Practically equal to <literal>[ \t\n\r]</literal></para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\S</userinput></term>
<listitem><para> This matches a non-whitespace. Practically equal to <literal>[^ \t\r\n]</literal>, and equal to <literal>[^\s]</literal></para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\w</userinput></term>
<listitem><para>Matches any <quote>word character</quote> - in this case any letter or digit. Note that
underscore (<literal>_</literal>) is not matched, unlike in Perl regular expressions, where it is.
Equal to <literal>[a-zA-Z0-9]</literal></para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\W</userinput></term>
<listitem><para>Matches any non-word character - anything but letters or numbers.
Equal to <literal>[^a-zA-Z0-9]</literal> or <literal>[^\w]</literal></para></listitem>
</varlistentry>
</variablelist>
</para>
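<para>The abbreviations can be compared with their spelled-out classes
in a short sketch. It assumes Python's <literal>re</literal> module as
a stand-in engine, which does not behave exactly as described here: in
particular, Python's <literal>\w</literal> also matches the underscore,
so the explicit class is shown next to it. The sample strings are made
up for illustration.</para>

```python
import re

# re.ASCII restricts \d, \s and \w to ASCII, which is closer to the
# ranges given in the list above.
sample = "Room 42,\tfloor_3\n"

digits = re.findall(r"\d", sample, re.ASCII)
same_digits = re.findall(r"[0-9]", sample)

whitespace = re.findall(r"\s", sample, re.ASCII)

# Assumption: Python's \w includes the underscore, so the explicit
# class is used to mirror the definition given in this document.
word_chars_python = re.findall(r"\w", "a_1", re.ASCII)
word_chars_listed = re.findall(r"[a-zA-Z0-9]", "a_1")

print(digits == same_digits)
print(whitespace)
print(word_chars_python, word_chars_listed)
```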
<para>The abbreviated classes can be put inside a custom class. For
example, to match a word character, a blank or a dot, you could write
<userinput>[\w \.]</userinput>.</para>
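<para>A minimal sketch of such a combined class, again assuming
Python's <literal>re</literal> module as the engine (where
<literal>\w</literal> would additionally accept an underscore). The
sample text is arbitrary.</para>

```python
import re

# A word character, a blank or a dot, as in the example above.
pattern = re.compile(r"[\w \.]", re.ASCII)

text = "v1.2 ok!"
kept = "".join(pattern.findall(text))
rejected = [ch for ch in text if not pattern.fullmatch(ch)]

print(kept)
print(rejected)
```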
<note> <para>The POSIX notation of classes, <userinput>[:&lt;class
name&gt;:]</userinput>, is currently not supported.</para> </note>
<sect3>
<title>Characters with special meanings inside character classes</title>
<para>The following characters have a special meaning inside the
<quote>[]</quote> character class construct, and must be escaped to be
literally included in a class:</para>
<variablelist>
<varlistentry>
<term><userinput>]</userinput></term>
<listitem><para>Ends the character class. Must be escaped unless it is the very first character in the
class (may follow an unescaped caret)</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>^</userinput> (caret)</term>
<listitem><para>Denotes a negative class, if it is the first character. Must be escaped to match literally if it is the first character in the class.</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>-</userinput> (dash)</term>
<listitem><para>Denotes a logical range. Must always be escaped within a character class.</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\</userinput> (backslash)</term>
<listitem><para>The escape character. Must always be escaped.</para></listitem>
<listitem><para>similar to <literal>{0,1}</literal>, zero or 1 occurrence.</para></listitem>
</varlistentry>
</variablelist>
</para>
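<para>Escaping special characters inside a class could look like the
following sketch. It assumes Python's <literal>re</literal> module,
whose rules inside classes are somewhat more lenient than those listed
here; escaping every special character, as done below, is accepted
there as well.</para>

```python
import re

# Each special character is escaped so that it is taken literally:
# \] closing bracket, \^ caret, \- dash, \\ backslash.
specials = re.compile(r"[\]\^\-\\]")

text = r"a]b^c-d\e"
found = specials.findall(text)

# Unescaped, the dash denotes a range: [a-c] matches a, b and c only.
in_range = re.findall(r"[a-c]", "a-d")

print(found)
print(in_range)
```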
<sect2>
<title>Greed</title>
<para>When using quantifiers with no maximum, regular expressions
default to matching as much of the searched string as possible, commonly
known as <emphasis>greedy</emphasis> behavior.</para>
<para>Modern regular expression software provides the means of
<quote>turning off greediness</quote>, though in a graphical
environment it is up to the interface to provide you with access to
this feature. For example, a search dialog providing a regular
expression search could have a check box labeled <quote>Minimal
matching</quote>, and it ought to indicate whether greediness is the
default behavior.</para>
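<para>The difference between greedy and minimal matching can be
sketched as follows. The example assumes Python's
<literal>re</literal> module, where greediness is turned off per
quantifier by appending a question mark; an application with a
<quote>Minimal matching</quote> option may expose this differently.</para>

```python
import re

text = "say (one) and (two)"

# Greedy: \(.*\) runs from the first "(" to the last ")".
greedy = re.search(r"\(.*\)", text).group(0)

# Minimal: in Python a trailing "?" on the quantifier turns greed off,
# so the match stops at the first ")" that allows a match.
minimal = re.search(r"\(.*?\)", text).group(0)

print(greedy)
print(minimal)
```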
</sect2>
<sect2>
<title>In context examples</title>
<para>Here are a few examples of using quantifiers:</para>
<variablelist>
<varlistentry>
<term><userinput>^\d{4,5}\s</userinput></term>
<listitem><para>Matches the digits and the following whitespace character in <quote>1234 go</quote> and <quote>12345 now</quote>, but matches nothing in <quote>567 eleven</quote>
or in <quote>223459 somewhere</quote>.</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>\s+</userinput></term>
<listitem><para>Matches one or more whitespace characters</para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>(bla){1,}</userinput></term>
<listitem><para>Matches all of <quote>blablabla</quote> and the <quote>bla</quote> in <quote>blackbird</quote> or <quote>tabla</quote></para></listitem>
</varlistentry>
<varlistentry>
<term><userinput>/?></userinput></term>
<listitem><para>Matches <quote>/></quote> in <quote>&lt;closeditem/></quote> as well as
<quote>></quote> in <quote>&lt;openitem></quote>.</para></listitem>
</varlistentry>
</variablelist>
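<para>Two of the examples above can be checked with a short sketch,
assuming Python's <literal>re</literal> module as the engine. The
helper <literal>first_match</literal> is a hypothetical name used only
for this illustration.</para>

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when nothing matches."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

# Four or five digits at the start, followed by one whitespace character.
four_or_five = r"^\d{4,5}\s"
samples = ["1234 go", "12345 now", "567 eleven", "223459 somewhere"]
results = {s: first_match(four_or_five, s) for s in samples}

# One or more repetitions of "bla".
bla = r"(bla){1,}"
bla_results = [first_match(bla, s) for s in ["blablabla", "blackbird", "tabla"]]

print(results)
print(bla_results)
```

<para>The first pattern fails on <quote>223459 somewhere</quote>
because neither the fourth nor the fifth digit is followed by
whitespace.</para>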
</sect2>
</sect1>
<sect1 id="assertions">
<title>Assertions</title>
<para><emphasis>Assertions</emphasis> allow a regular expression to
match only under certain controlled conditions.</para>
<para>An assertion does not need a character to match; rather, it
investigates the surroundings of a possible match before acknowledging
it. For example, the <emphasis>word boundary</emphasis> assertion does
not try to find a non-word character next to a word character at its
position; instead, it makes sure that there is no word character
there. This means that the assertion can match where there is no
character at all, i.e. at the ends of a searched string.</para>
<para>Some assertions actually do have a pattern to match, but the
part of the string matching that pattern will not be a part of the
result of the match of the full expression.</para>
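<para>Both points can be sketched with Python's <literal>re</literal>
module, used here as an assumed stand-in engine. The
<literal>\b</literal> word boundary and the <literal>(?=...)</literal>
lookahead are written in Python's syntax, and the sample strings are
made up.</para>

```python
import re

# Word boundary: zero-width, so it also succeeds at both ends of the
# string, where there is no neighbouring character at all.
whole = re.search(r"\bcat\b", "cat")
inside = re.search(r"\bcat\b", "concatenate")

# Lookahead: "bar" must follow, but it is not part of the result.
ahead = re.search(r"foo(?=bar)", "foobar")

print(whole.span(), inside, ahead.group(0))
```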
<para>Regular expressions as documented here support the following
assertions:
<variablelist>
<varlistentry>
<term><userinput>^</userinput> (caret: beginning of
string)</term>
<listitem><para>Matches the beginning of the searched
string.</para> <para>The expression <userinput>^Peter</userinput> will
match at <quote>Peter</quote> in the string <quote>Peter, hey!</quote>
but not in <quote>Hey, Peter!</quote> </para> </listitem>
</varlistentry>
<varlistentry>
<term><userinput>$</userinput> (end of string)</term>
<listitem><para>Matches the end of the searched string.</para>
<para>The expression <userinput>you\?$</userinput> will match the
final <quote>you?</quote> in the string <quote>You didn't do that, did you?</quote> but
nowhere in <quote>You didn't do that, right?</quote></para>