<appendix id="highlight">
|
|
<appendixinfo>
|
|
<authorgroup>
|
|
<author><personname><firstname></firstname></personname></author>
|
|
<!-- TRANS:ROLES_OF_TRANSLATORS -->
|
|
</authorgroup>
|
|
</appendixinfo>
|
|
<title>Working with Syntax Highlighting</title>
|
|
|
|
<sect1 id="highlight-overview">
|
|
|
|
<title>Overview</title>
|
|
|
|
<para>Syntax Highlighting is what makes the editor automatically
|
|
display text in different styles/colors, depending on the function of
|
|
the string in relation to the purpose of the file. In program source
|
|
code for example, control statements may be rendered bold, while data
|
|
types and comments get different colors from the rest of the
|
|
text. This greatly enhances the readability of the text, and thus
|
|
helps the author to be more efficient and productive.</para>
|
|
|
|
<mediaobject>
|
|
<imageobject><imagedata format="PNG" fileref="highlighted.png"/></imageobject>
|
|
<textobject><phrase>A Perl function, rendered with syntax
|
|
highlighting.</phrase></textobject>
|
|
<caption><para>A Perl function, rendered with syntax highlighting.</para>
|
|
</caption>
|
|
</mediaobject>
|
|
|
|
<mediaobject>
|
|
<imageobject><imagedata format="PNG" fileref="unhighlighted.png"/></imageobject>
|
|
<textobject><phrase>The same Perl function, without
|
|
highlighting.</phrase></textobject>
|
|
<caption><para>The same Perl function, without highlighting.</para></caption>
|
|
</mediaobject>
|
|
|
|
<para>Of the two examples, which is easier to read?</para>
|
|
|
|
<para>&kate; comes with a flexible, configurable and capable system
|
|
for doing syntax highlighting, and the standard distribution provides
|
|
definitions for a wide range of programming, scripting and markup
|
|
languages and other text file formats. In addition you can
|
|
provide your own definitions in simple &XML; files.</para>
|
|
|
|
<para>&kate; will automatically detect the right syntax rules when you
open a file, based on the &MIME; Type of the file, determined by its
extension, or, if it has none, the contents. If &kate; makes a
bad choice, you can manually set the syntax to use from the
<menuchoice><guimenu>Documents</guimenu><guisubmenu>Highlight
Mode</guisubmenu></menuchoice> menu.</para>
|
|
|
|
<para>The styles and colors used by each syntax highlight definition
can be configured using the <link
linkend="config-dialog-editor-appearance">Appearance</link> page of the
<link linkend="config-dialog">Config Dialog</link>, while the &MIME; Types
it should be used for are handled by the <link
linkend="config-dialog-editor-highlighting">Highlight</link>
page.</para>
|
|
|
|
<note>
|
|
<para>Syntax highlighting is there to enhance the readability of
|
|
correct text, but you cannot trust it to validate your text. Marking
|
|
text for syntax is difficult depending on the format you are using,
|
|
and in some cases the authors of the syntax rules will be proud if 98%
|
|
of text gets correctly rendered, though most often you need a rare
|
|
style to see the incorrect 2%.</para>
|
|
</note>
|
|
|
|
<tip>
|
|
<para>You can download updated or additional syntax highlight
|
|
definitions from the &kate; website by clicking the
|
|
<guibutton>Download</guibutton> button in the <link
|
|
linkend="config-dialog-editor-highlighting">Highlight Page</link> of the <link
|
|
linkend="config-dialog">Config Dialog</link>.</para>
|
|
</tip>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="katehighlight-system">
|
|
|
|
<title>The &kate; Syntax Highlight System</title>
|
|
|
|
<para>This section will discuss the &kate; syntax highlighting
|
|
mechanism in more detail. It is for you if you want to know about
|
|
it, or if you want to change or create syntax definitions.</para>
|
|
|
|
<sect2 id="katehighlight-howitworks">
|
|
|
|
<title>How it Works</title>
|
|
|
|
<para>Whenever you open a file, one of the first things the &kate;
|
|
editor does is detect which syntax definition to use for the
|
|
file. While reading the text of the file, and while you type away in
|
|
it, the syntax highlighting system will analyze the text using the
|
|
rules defined by the syntax definition and mark in it where different
|
|
contexts and styles begin and end.</para>
|
|
|
|
<para>When you type in the document, the new text is analyzed and marked on the
|
|
fly, so that if you delete a character that is marked as the beginning or end
|
|
of a context, the style of surrounding text changes accordingly.</para>
|
|
|
|
<para>The syntax definitions used by the &kate; Syntax Highlighting System are
|
|
&XML; files, containing
|
|
<itemizedlist>
|
|
<listitem><para>Rules for detecting the role of text, organized into context blocks</para></listitem>
|
|
<listitem><para>Keyword lists</para></listitem>
|
|
<listitem><para>Style Item definitions</para></listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>When analyzing the text, the detection rules are evaluated in
|
|
the order in which they are defined, and if the beginning of the
|
|
current string matches a rule, the related context is used. The start
|
|
point in the text is moved to the final point at which that rule
|
|
matched and a new loop of the rules begins, starting in the context
|
|
set by the matched rule.</para>
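<para>The effect of rule order can be sketched with a small context. This
is only an illustration: the context names and the attribute names
<emphasis>Comment</emphasis> and <emphasis>Operator</emphasis> are assumptions
made up for the example, and matching style items are assumed to exist.</para>
<programlisting>
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <!-- Tried first: two slashes in a row start a line comment -->
  <Detect2Chars attribute="Comment" context="line comment" char="/" char1="/" />
  <!-- Tried only if the rule above did not match: a single slash -->
  <DetectChar attribute="Operator" context="#stay" char="/" />
</context>
<context attribute="Comment" lineEndContext="#pop" name="line comment" />
</programlisting>
<para>If the two rules were swapped, the single slash rule would match first
and the comment rule would never be reached.</para>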
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="highlight-system-rules">
|
|
<title>Rules</title>
|
|
|
|
<para>The detection rules are the heart of the highlighting detection
|
|
system. A rule is a string, character or <link
|
|
linkend="regular-expressions">regular expression</link> against which
|
|
to match the text being analyzed. It contains information about which
|
|
style to use for the matching part of the text. It may switch the
|
|
working context of the system either to an explicitly mentioned
|
|
context or to the previous context used by the text.</para>
|
|
|
|
<para>Rules are organized in context groups. A context group is used
|
|
for main text concepts within the format, for example quoted text
|
|
strings or comment blocks in program source code. This ensures that
|
|
the highlighting system does not need to loop through all rules when
|
|
it is not necessary, and that some character sequences in the text can
|
|
be treated differently depending on the current context.
|
|
</para>
|
|
|
|
<para>Contexts may be generated dynamically to allow the usage of instance
|
|
specific data in rules.</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="highlight-context-styles-keywords">
|
|
<title>Context Styles and Keywords</title>
|
|
|
|
<para>In some programming languages, integer numbers are treated
|
|
differently than floating point ones by the compiler (the program that
|
|
converts the source code to a binary executable), and there may be
|
|
characters having a special meaning within a quoted string. In such
|
|
cases, it makes sense to render them differently from the surroundings
|
|
so that they are easy to identify while reading the text. So even if
|
|
they do not represent special contexts, they may be seen as such by
|
|
the syntax highlighting system, so that they can be marked for
|
|
different rendering.</para>
|
|
|
|
<para>A syntax definition may contain as many styles as required to
|
|
cover the concepts of the format it is used for.</para>
|
|
|
|
<para>In many formats, there are lists of words that represent a
specific concept. For example, in programming languages, control
statements are one concept, data type names another, and built-in
functions of the language a third. The &kate; Syntax Highlighting
System can use such lists to detect and mark words in the text to
emphasize concepts of the text formats.</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="kate-highlight-system-default-styles">
|
|
<title>Default Styles</title>
|
|
|
|
<para>If you open a C++ source file, a &Java; source file and an
|
|
<acronym>HTML</acronym> document in &kate;, you will see that even
|
|
though the formats are different, and thus different words are chosen
|
|
for special treatment, the colors used are the same. This is because
|
|
&kate; has a predefined list of Default Styles which are employed by
|
|
the individual syntax definitions.</para>
|
|
|
|
<para>This makes it easy to recognize similar concepts in different
|
|
text formats. For example comments are present in almost any
|
|
programming, scripting or markup language, and when they are rendered
|
|
using the same style in all languages, you do not have to stop and
|
|
think to identify them within the text.</para>
|
|
|
|
<tip>
|
|
<para>All styles in a syntax definition use one of the default
styles. A few syntax definitions use more styles than there are
defaults, so if you use a format often, it may be worth launching the
configuration dialog to see if some concepts are using the same
style. For example, there is only one default style for strings, but as
the Perl programming language operates with two types of strings, you
can enhance the highlighting by configuring those to be slightly
different. All <link linkend="kate-highlight-default-styles">available default styles</link>
will be explained later.</para>
|
|
</tip>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="katehighlight-xml-format">
|
|
<title>The Highlight Definition &XML; Format</title>
|
|
|
|
<sect2>
|
|
<title>Overview</title>
|
|
|
|
<para>This section is an overview of the Highlight Definition &XML;
|
|
format. Based on a small example it will describe the main components
|
|
and their meaning and usage. The next section will go into detail with
|
|
the highlight detection rules.</para>
|
|
|
|
<para>The formal definition, aka the <acronym>DTD</acronym> is stored
|
|
in the file <filename>language.dtd</filename> which should be
|
|
installed on your system in the folder
|
|
<filename>$<envar>TDEDIR</envar>/share/apps/katepart/syntax</filename>.
|
|
</para>
|
|
|
|
<variablelist>
|
|
<title>Main sections of &kate; Highlight Definition files</title>
|
|
|
|
<varlistentry>
|
|
<term>A highlighting file contains a header that sets the XML version and the doctype:</term>
|
|
<listitem>
|
|
<programlisting>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>The root of the definition file is the element <userinput>language</userinput>.
|
|
Available attributes are:</term>
|
|
|
|
<listitem>
|
|
<para>Required attributes:</para>
|
|
<para><userinput>name</userinput> sets the name of the language. It appears in the menus and dialogs afterwards.</para>
|
|
<para><userinput>section</userinput> specifies the category.</para>
|
|
<para><userinput>extensions</userinput> defines file extensions, like "*.cpp;*.h"</para>
|
|
|
|
<para>Optional attributes:</para>
|
|
<para><userinput>mimetype</userinput> associates files with this definition based on their &MIME; Type.</para>
<para><userinput>version</userinput> specifies the current version of the definition file.</para>
<para><userinput>kateversion</userinput> specifies the latest supported &kate; version.</para>
<para><userinput>casesensitive</userinput> defines whether the keywords are case sensitive or not.</para>
<para><userinput>priority</userinput> is necessary if another highlight definition file uses the same extensions. The higher priority will win.</para>
<para><userinput>author</userinput> contains the name of the author and their email address.</para>
<para><userinput>license</userinput> contains the license, usually LGPL, Artistic, GPL or another.</para>
<para><userinput>hidden</userinput> defines whether the name should appear in &kate;'s menus.</para>
|
|
<para>So the next line may look like this:</para>
|
|
<programlisting>
<language name="C++" version="1.00" kateversion="2.4" section="Sources" extensions="*.cpp;*.h" >
</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>Next comes the <userinput>highlighting</userinput> element, which
|
|
contains the optional element <userinput>list</userinput> and the required
|
|
elements <userinput>contexts</userinput> and <userinput>itemDatas</userinput>.</term>
|
|
<listitem>
|
|
<para><userinput>list</userinput> elements contain a list of keywords. In
|
|
this case the keywords are <emphasis>class</emphasis> and <emphasis>const</emphasis>.
|
|
You can add as many lists as you need.</para>
|
|
<para>The <userinput>contexts</userinput> element contains all contexts.
The first context is by default the start of the highlighting. There are
two rules in the context <emphasis>Normal Text</emphasis>: one matches
the list of keywords with the name <emphasis>somename</emphasis>, and the
other detects a quote and switches the context to <emphasis>string</emphasis>.
To learn more about rules, read the next chapter.</para>
|
|
<para>The third part is the <userinput>itemDatas</userinput> element. It
|
|
contains all color and font styles needed by the contexts and rules.
|
|
In this example, the <userinput>itemData</userinput> <emphasis>Normal Text</emphasis>,
|
|
<emphasis>String</emphasis> and <emphasis>Keyword</emphasis> are used.
|
|
</para>
|
|
<programlisting>
  <highlighting>
    <list name="somename">
      <item> class </item>
      <item> const </item>
    </list>
    <contexts>
      <context attribute="Normal Text" lineEndContext="#pop" name="Normal Text" >
        <keyword attribute="Keyword" context="#stay" String="somename" />
        <DetectChar attribute="String" context="string" char="&quot;" />
      </context>
      <context attribute="String" lineEndContext="#stay" name="string" >
        <DetectChar attribute="String" context="#pop" char="&quot;" />
      </context>
    </contexts>
    <itemDatas>
      <itemData name="Normal Text" defStyleNum="dsNormal" />
      <itemData name="Keyword" defStyleNum="dsKeyword" />
      <itemData name="String" defStyleNum="dsString" />
    </itemDatas>
  </highlighting>
</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>The last part of a highlight definition is the optional
|
|
<userinput>general</userinput> section. It may contain information
|
|
about keywords, code folding, comments and indentation.</term>
|
|
|
|
<listitem>
|
|
<para>The <userinput>comment</userinput> section defines the
string that introduces a single line comment. You can also define
multiline comments using <emphasis>multiLine</emphasis> with the
additional attribute <emphasis>end</emphasis>. This is used if the
user presses the corresponding shortcut for <emphasis>comment/uncomment</emphasis>.</para>
<para>The <userinput>keywords</userinput> section defines whether
keyword lists are case sensitive or not. Other attributes will be
explained later.</para>
|
|
<programlisting>
  <general>
    <comments>
      <comment name="singleLine" start="#"/>
    </comments>
    <keywords casesensitive="1"/>
  </general>
</language>
</programlisting>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="kate-highlight-sections">
|
|
<title>The Sections in Detail</title>
|
|
<para>This part will describe all available attributes for contexts,
|
|
itemDatas, keywords, comments, code folding and indentation.</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>The element <userinput>context</userinput> belongs in the group
<userinput>contexts</userinput>. A context itself defines context-specific
rules, such as what should happen if the highlight system reaches the end of a
line. Available attributes are:</term>
|
|
|
|
|
|
<listitem>
|
|
<para><userinput>name</userinput> the context name. Rules will use this name
|
|
to specify the context to switch to if the rule matches.</para>
|
|
<para><userinput>lineEndContext</userinput> defines the context the highlight
system switches to if it reaches the end of a line. This may either be the name
of another context, <userinput>#stay</userinput> to not switch the context
(i.e. do nothing) or <userinput>#pop</userinput>, which leaves this
context. It is possible to use for example <userinput>#pop#pop#pop</userinput>
to pop three times.</para>
<para><userinput>lineBeginContext</userinput> defines the context to switch to
if the beginning of a line is encountered. Default: #stay.</para>
|
|
<para><userinput>fallthrough</userinput> defines if the highlight system switches
|
|
to the context specified in fallthroughContext if no rule matches.
|
|
Default: <emphasis>false</emphasis>.</para>
|
|
<para><userinput>fallthroughContext</userinput> specifies the next context
|
|
if no rule matches.</para>
|
|
<para><userinput>dynamic</userinput> if <emphasis>true</emphasis>, the context
|
|
remembers strings/placeholders saved by dynamic rules. This is needed for HERE
|
|
documents for example. Default: <emphasis>false</emphasis>.</para>
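<para>As a minimal sketch of how these attributes combine, the following
context could be entered after a hypothetical <emphasis>import</emphasis>
keyword. The context name <emphasis>after import</emphasis> and the attribute
name <emphasis>Module</emphasis> are assumptions made up for this example.</para>
<programlisting>
<context attribute="Normal Text" name="after import"
         lineEndContext="#pop" lineBeginContext="#stay"
         fallthrough="true" fallthroughContext="#pop">
  <!-- Skip whitespace and remain in this context -->
  <RegExpr attribute="Normal Text" context="#stay" String="\s+" />
  <!-- Highlight the module name, then leave the context -->
  <RegExpr attribute="Module" context="#pop" String="[A-Za-z_][A-Za-z0-9_.]*" />
</context>
</programlisting>
<para>If neither rule matches, the fallthrough setting pops back to the
previous context, and so does reaching the end of the line.</para>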
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>The element <userinput>itemData</userinput> is in the group
<userinput>itemDatas</userinput>. It defines the font style and colors.
So it is possible to define your own styles and colors; however, we
recommend sticking to the default styles if possible so that the user
will always see the same colors used in different languages. Sometimes,
though, there is no other way and it is necessary to change color
and font attributes. The attributes name and defStyleNum are required,
the others are optional. Available attributes are:</term>
|
|
|
|
<listitem>
|
|
<para><userinput>name</userinput> sets the name of the itemData.
|
|
Contexts and rules will use this name in their attribute
|
|
<emphasis>attribute</emphasis> to reference an itemData.</para>
|
|
<para><userinput>defStyleNum</userinput> defines which default style to use.
|
|
Available default styles are explained in detail later.</para>
|
|
<para><userinput>color</userinput> defines a color. Valid formats are
|
|
'#rrggbb' or '#rgb'.</para>
|
|
<para><userinput>selColor</userinput> defines the selection color.</para>
|
|
<para><userinput>italic</userinput> if <emphasis>true</emphasis>, the text will be italic.</para>
|
|
<para><userinput>bold</userinput> if <emphasis>true</emphasis>, the text will be bold.</para>
|
|
<para><userinput>underline</userinput> if <emphasis>true</emphasis>, the text will be underlined.</para>
|
|
<para><userinput>strikeout</userinput> if <emphasis>true</emphasis>, the text will be struck out.</para>
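<para>A short sketch of two style items follows. The first relies entirely
on a default style, the second overrides some properties. The name
<emphasis>Special String</emphasis> and the color values are assumptions
chosen only for illustration.</para>
<programlisting>
<itemDatas>
  <!-- Uses the default style unchanged -->
  <itemData name="Normal Text" defStyleNum="dsNormal" />
  <!-- Based on dsString, with its own colors and bold text -->
  <itemData name="Special String" defStyleNum="dsString"
            color="#ff6c6c" selColor="#ffffff" bold="true" italic="false" />
</itemDatas>
</programlisting>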
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>The element <userinput>keywords</userinput> in the group
|
|
<userinput>general</userinput> defines keyword properties. Available attributes are:</term>
|
|
|
|
<listitem>
|
|
<para><userinput>casesensitive</userinput> may be <emphasis>true</emphasis>
or <emphasis>false</emphasis>. If <emphasis>true</emphasis>, all keywords
are matched case-sensitively.</para>
<para><userinput>weakDeliminator</userinput> is a list of characters that
do not act as word delimiters. For example the dot <userinput>'.'</userinput>
is a word delimiter. If a keyword in a <userinput>list</userinput> contains
a dot, it will only match if you specify the dot as a weak delimiter.</para>
|
|
<para><userinput>additionalDeliminator</userinput> defines additional delimiters.</para>
|
|
<para><userinput>wordWrapDeliminator</userinput> defines characters after which a
|
|
line wrap may occur.</para>
|
|
<para>Default delimiters and word wrap delimiters are the characters
|
|
<userinput>.():!+,-<=>%&*/;?[]^{|}~\</userinput>, space (<userinput>' '</userinput>)
|
|
and tabulator (<userinput>'\t'</userinput>).</para>
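<para>The following sketch assumes a keyword list containing an entry with a
dot, such as the made-up keyword <emphasis>file.open</emphasis>. Declaring the
dot as a weak delimiter lets that keyword match as a whole. The list name
<emphasis>builtins</emphasis> and the additional delimiter are assumptions
for the example.</para>
<programlisting>
<list name="builtins">
  <item> file.open </item>
</list>

<general>
  <!-- '.' no longer splits words, '#' additionally does -->
  <keywords casesensitive="false" weakDeliminator="." additionalDeliminator="#" />
</general>
</programlisting>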
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>The element <userinput>comment</userinput> in the group
|
|
<userinput>comments</userinput> defines comment properties which are used
|
|
for <menuchoice><guimenu>Tools</guimenu><guimenuitem>Comment</guimenuitem></menuchoice> and
|
|
<menuchoice><guimenu>Tools</guimenu><guimenuitem>Uncomment</guimenuitem></menuchoice>.
|
|
Available attributes are:</term>
|
|
|
|
<listitem>
|
|
<para><userinput>name</userinput> is either <emphasis>singleLine</emphasis>
|
|
or <emphasis>multiLine</emphasis>. If you choose <emphasis>multiLine</emphasis>
|
|
the attributes <emphasis>end</emphasis> and <emphasis>region</emphasis> are
|
|
required.</para>
|
|
<para><userinput>start</userinput> defines the string used to start a comment.
|
|
In C++ this would be "/*".</para>
|
|
<para><userinput>end</userinput> defines the string used to close a comment.
|
|
In C++ this would be "*/".</para>
|
|
<para><userinput>region</userinput> should be the name of the foldable
multiline comment. If you have <emphasis>beginRegion="Comment"</emphasis>
... <emphasis>endRegion="Comment"</emphasis> in your rules, you should use
<emphasis>region="Comment"</emphasis>. This way uncomment works even if you
do not select all the text of the multiline comment. The cursor only needs
to be inside the multiline comment.</para>
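<para>For a C++ like language, the comment properties and the rules they
refer to might look roughly as follows. The context name
<emphasis>block comment</emphasis> is an assumption for this sketch, and the
two rules would live in different contexts of the definition.</para>
<programlisting>
<!-- In the general section -->
<comments>
  <comment name="singleLine" start="//" />
  <comment name="multiLine" start="/*" end="*/" region="Comment" />
</comments>

<!-- In the contexts: the rule that opens the foldable comment region -->
<Detect2Chars attribute="Comment" context="block comment" char="/" char1="*" beginRegion="Comment" />
<!-- Inside the block comment context: the rule that closes it again -->
<Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" endRegion="Comment" />
</programlisting>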
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>The element <userinput>folding</userinput> in the group
|
|
<userinput>general</userinput> defines code folding properties.
|
|
Available attributes are:</term>
|
|
|
|
<listitem>
|
|
<para><userinput>indentationsensitive</userinput> if <emphasis>true</emphasis>, the code folding markers
are added based on indentation, as in the scripting language Python. Usually you
do not need to set it, as it defaults to <emphasis>false</emphasis>.</para>
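<para>A definition for an indentation based language might enable this as
sketched below. The fragment shows only the folding element inside the general
section.</para>
<programlisting>
<general>
  <!-- Derive folding markers from indentation instead of region rules -->
  <folding indentationsensitive="true" />
</general>
</programlisting>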
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>The element <userinput>indentation</userinput> in the group
<userinput>general</userinput> defines which indenter will be used; however, we strongly
recommend omitting this element, as the indenter will usually be set by either defining
a File Type or by adding a mode line to the text file. If you specify an indenter,
you will force a specific indentation on the user, which they might not like at all.
Available attributes are:</term>
|
|
|
|
<listitem>
|
|
<para><userinput>mode</userinput> is the name of the indenter. Available indenters
|
|
right now are: <emphasis>normal, cstyle, csands, xml, python</emphasis> and
|
|
<emphasis>varindent</emphasis>.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
</variablelist>
|
|
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="kate-highlight-default-styles">
|
|
<title>Available Default Styles</title>
|
|
<para>Default Styles were <link linkend="kate-highlight-system-default-styles">already explained</link>.
In short, default styles are predefined font and color styles.</para>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>The available default styles are:</term>
|
|
<listitem>
|
|
<para><userinput>dsNormal</userinput>, used for normal text.</para>
|
|
<para><userinput>dsKeyword</userinput>, used for keywords.</para>
|
|
<para><userinput>dsDataType</userinput>, used for data types.</para>
|
|
<para><userinput>dsDecVal</userinput>, used for decimal values.</para>
|
|
<para><userinput>dsBaseN</userinput>, used for values with a base other than 10.</para>
|
|
<para><userinput>dsFloat</userinput>, used for float values.</para>
|
|
<para><userinput>dsChar</userinput>, used for a character.</para>
|
|
<para><userinput>dsString</userinput>, used for strings.</para>
|
|
<para><userinput>dsComment</userinput>, used for comments.</para>
|
|
<para><userinput>dsOthers</userinput>, used for 'other' things.</para>
|
|
<para><userinput>dsAlert</userinput>, used for warning messages.</para>
|
|
<para><userinput>dsFunction</userinput>, used for function calls.</para>
|
|
<para><userinput>dsRegionMarker</userinput>, used for region markers.</para>
|
|
<para><userinput>dsError</userinput>, used for error highlighting and wrong syntax.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="kate-highlight-rules-detailled">
|
|
<title>Highlight Detection Rules</title>
|
|
|
|
<para>This section describes the syntax detection rules.</para>
|
|
|
|
<para>Each rule can match zero or more characters at the beginning of
the string it is tested against. If the rule matches, the matching
characters are assigned the style or <emphasis>attribute</emphasis>
defined by the rule, and a rule may ask that the current context is
switched.</para>
|
|
|
|
<para>A rule looks like this:</para>
|
|
|
|
<programlisting><RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] /></programlisting>
|
|
|
|
<para>The <emphasis>attribute</emphasis> identifies the style to use
|
|
for matched characters by name, and the <emphasis>context</emphasis>
|
|
identifies the context to use from here.</para>
|
|
|
|
<para>The <emphasis>context</emphasis> can be identified by:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>An <emphasis>identifier</emphasis>, which is the name of the other
|
|
context.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>An <emphasis>order</emphasis> telling the engine to stay in the
|
|
current context (<userinput>#stay</userinput>), or to pop back to a
|
|
previous context used in the string (<userinput>#pop</userinput>).</para>
|
|
<para>To go back more steps, the #pop keyword can be repeated:
|
|
<userinput>#pop#pop#pop</userinput></para>
|
|
</listitem>
|
|
</itemizedlist>
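<para>The following sketch shows contexts stacked three deep, with a
repeated <userinput>#pop</userinput> used to return two levels at once. The
context names <emphasis>parens</emphasis> and <emphasis>string</emphasis> are
assumptions for the example.</para>
<programlisting>
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <DetectChar attribute="Normal Text" context="parens" char="(" />
</context>
<context attribute="Normal Text" lineEndContext="#stay" name="parens">
  <DetectChar attribute="String" context="string" char="&quot;" />
  <DetectChar attribute="Normal Text" context="#pop" char=")" />
</context>
<!-- A closing quote pops back to "parens"; an unterminated string at the
     end of the line pops twice, straight back to "Normal Text" -->
<context attribute="String" lineEndContext="#pop#pop" name="string">
  <DetectChar attribute="String" context="#pop" char="&quot;" />
</context>
</programlisting>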
|
|
|
|
<para>Some rules can have <emphasis>child rules</emphasis> which are
|
|
then evaluated only if the parent rule matched. The entire matched
|
|
string will be given the attribute defined by the parent rule. A rule
|
|
with child rules looks like this:</para>
|
|
|
|
<programlisting>
<RuleName (attributes)>
  <ChildRuleName (attributes) />
  ...
</RuleName>
</programlisting>
|
|
|
|
|
|
<para>Rule-specific attributes vary and are described in the
following sections.</para>
|
|
|
|
|
|
<itemizedlist>
|
|
<title>Common attributes</title>
|
|
<para>All rules have the following attributes in common; they are
available wherever <userinput>(common attributes)</userinput> appears.
<emphasis>attribute</emphasis> and <emphasis>context</emphasis>
are required attributes, all others are optional.
</para>
|
|
|
|
<listitem>
|
|
<para><emphasis>attribute</emphasis>: An attribute maps to a defined <emphasis>itemData</emphasis>.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><emphasis>context</emphasis>: Specify the context to which the highlighting system switches if the rule matches.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><emphasis>beginRegion</emphasis>: Start a code folding block. Default: unset.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><emphasis>endRegion</emphasis>: Close a code folding block. Default: unset.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><emphasis>lookAhead</emphasis>: If <emphasis>true</emphasis>, the
highlighting system will not process the match's length, so the matched
text is not consumed.
Default: <emphasis>false</emphasis>.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><emphasis>firstNonSpace</emphasis>: Match only, if the string is
|
|
the first non-whitespace in the line. Default: <emphasis>false</emphasis>.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para><emphasis>column</emphasis>: Match only, if the column matches. Default: unset.</para>
|
|
</listitem>
|
|
</itemizedlist>
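<para>The optional common attributes might be used as in the sketch below.
All attribute and context names (<emphasis>Symbol</emphasis>,
<emphasis>Preprocessor</emphasis>, <emphasis>Brace</emphasis> and so on) are
assumptions for illustration, and the example assumes that columns are counted
from zero.</para>
<programlisting>
<!-- Braces open and close a folding region named "Brace" -->
<DetectChar attribute="Symbol" context="#stay" char="{" beginRegion="Brace" />
<DetectChar attribute="Symbol" context="#stay" char="}" endRegion="Brace" />
<!-- Matches only if '#' is the first non-whitespace character in the line -->
<DetectChar attribute="Preprocessor" context="preprocessor" char="#" firstNonSpace="true" />
<!-- Matches only at the very start of the line -->
<DetectChar attribute="Comment" context="line comment" char="C" column="0" />
<!-- Leaves the context when ')' is seen, without consuming the character -->
<DetectChar attribute="Normal Text" context="#pop" char=")" lookAhead="true" />
</programlisting>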
|
|
|
|
<itemizedlist>
|
|
<title>Dynamic rules</title>
|
|
<para>Some rules allow the optional attribute <userinput>dynamic</userinput>
|
|
of type boolean that defaults to <emphasis>false</emphasis>. If dynamic is
|
|
<emphasis>true</emphasis>, a rule can use placeholders representing the text
|
|
matched by a <emphasis>regular expression</emphasis> rule that switched to the
|
|
current context in its <userinput>string</userinput> or
|
|
<userinput>char</userinput> attributes. In a <userinput>string</userinput>,
|
|
the placeholder <replaceable>%N</replaceable> (where N is a number) will be
|
|
replaced with the corresponding capture <replaceable>N</replaceable>
|
|
from the calling regular expression. In a
|
|
<userinput>char</userinput> the placeholder must be a number
|
|
<replaceable>N</replaceable> and it will be replaced with the first character of
|
|
the corresponding capture <replaceable>N</replaceable> from the calling regular
|
|
expression. Whenever a rule allows this attribute it will contain a
|
|
<emphasis>(dynamic)</emphasis>.</para>
|
|
|
|
<listitem>
|
|
<para><emphasis>dynamic</emphasis>: may be <emphasis>(true|false)</emphasis>.</para>
|
|
</listitem>
|
|
</itemizedlist>
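<para>A HERE document is a typical use. In the sketch below, the regular
expression captures the terminator word, and a dynamic rule in the new context
uses <replaceable>%1</replaceable> to find that same word again. The context
name <emphasis>here document</emphasis> is an assumption, and the expressions
are simplified for illustration.</para>
<programlisting>
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <!-- Capture 1 holds the terminator word that follows the two '<' characters -->
  <RegExpr attribute="Keyword" context="here document" String="&lt;&lt;(\w+)" />
</context>
<context attribute="String" lineEndContext="#stay" name="here document" dynamic="true">
  <!-- %1 is replaced with the text of capture 1 from the calling rule -->
  <RegExpr attribute="Keyword" context="#pop" String="^%1$" dynamic="true" />
</context>
</programlisting>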
|
|
|
|
<sect2 id="highlighting-rules-in-detail">
|
|
<title>The Rules in Detail</title>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term>DetectChar</term>
|
|
<listitem>
|
|
<para>Detect a single specific character. Commonly used for example to
|
|
find the ends of quoted strings.</para>
|
|
<programlisting><DetectChar char="(character)" (common attributes) (dynamic) /></programlisting>
|
|
<para>The <userinput>char</userinput> attribute defines the character
|
|
to match.</para>
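<para>Besides a literal character, a dynamic <userinput>DetectChar</userinput>
can refer to a capture of the calling regular expression. The sketch below
assumes a made-up string syntax in which <emphasis>q</emphasis> is followed by
a delimiter character that also ends the string.</para>
<programlisting>
<context attribute="Normal Text" lineEndContext="#stay" name="Normal Text">
  <!-- Capture 1 is the delimiter character chosen by the author of the text -->
  <RegExpr attribute="String" context="delimited string" String="q([!|/])" />
</context>
<context attribute="String" lineEndContext="#stay" name="delimited string" dynamic="true">
  <!-- char="1" stands for the first character of capture 1 -->
  <DetectChar attribute="String" context="#pop" char="1" dynamic="true" />
</context>
</programlisting>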
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Detect2Chars</term>
|
|
<listitem>
|
|
<para>Detect two specific characters in a defined order.</para>
|
|
<programlisting><Detect2Chars char="(character)" char1="(character)" (common attributes) (dynamic) /></programlisting>
|
|
<para>The <userinput>char</userinput> attribute defines the first character to match,
|
|
<userinput>char1</userinput> the second.</para>
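<para>A common use is an escaped quote inside a string context, as in this
sketch. The attribute name <emphasis>String Char</emphasis> is an assumption.
Placing this rule before the rule that ends the string keeps the escaped quote
from closing it.</para>
<programlisting>
<context attribute="String" lineEndContext="#stay" name="string">
  <!-- Backslash followed by a quote: stay inside the string -->
  <Detect2Chars attribute="String Char" context="#stay" char="\" char1="&quot;" />
  <!-- A plain quote ends the string -->
  <DetectChar attribute="String" context="#pop" char="&quot;" />
</context>
</programlisting>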
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>AnyChar</term>
|
|
<listitem>
|
|
<para>Detect one character of a set of specified characters.</para>
|
|
<programlisting><AnyChar String="(string)" (common attributes) /></programlisting>
|
|
<para>The <userinput>String</userinput> attribute defines the set of
|
|
characters.</para>
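<para>For example, a single rule could mark any kind of bracket, assuming a
style item named <emphasis>Symbol</emphasis> exists:</para>
<programlisting>
<!-- Matches exactly one of the listed characters -->
<AnyChar attribute="Symbol" context="#stay" String="{}[]()" />
</programlisting>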
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>StringDetect</term>
|
|
<listitem>
|
|
<para>Detect an exact string.</para>
|
|
<programlisting><StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) /></programlisting>
|
|
<para>The <userinput>String</userinput> attribute defines the string
to match. The <userinput>insensitive</userinput> attribute defaults to
<emphasis>false</emphasis> and is passed to the string comparison
function. If the value is <emphasis>true</emphasis>, case-insensitive
comparison is used.</para>
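<para>A sketch for a language whose block keyword may be written in any case.
The string <emphasis>BEGIN</emphasis> and the context name
<emphasis>block</emphasis> are assumptions for the example.</para>
<programlisting>
<!-- Matches BEGIN, begin, Begin and so on -->
<StringDetect attribute="Keyword" context="block" String="BEGIN" insensitive="true" />
</programlisting>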
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>RegExpr</term>
|
|
<listitem>
|
|
<para>Matches against a regular expression.</para>
|
|
<programlisting><RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) /></programlisting>
|
|
<para>The <userinput>String</userinput> attribute defines the regular
|
|
expression.</para>
|
|
<para><userinput>insensitive</userinput> defaults to
|
|
<emphasis>false</emphasis> and is passed to the regular expression
|
|
engine.</para>
|
|
<para><userinput>minimal</userinput> defaults to
|
|
<emphasis>false</emphasis> and is passed to the regular expression
|
|
engine.</para>
|
|
<para>Because the rules are always matched against the beginning of
|
|
the current string, a regular expression starting with a caret
|
|
(<literal>^</literal>) indicates that the rule should only be
|
|
matched against the start of a line.</para>
|
|
<para>See <link linkend="regular-expressions">Regular Expressions</link>
|
|
for more information on those.</para>
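<para>The two sketches below illustrate the optional attributes. The
attribute names <emphasis>Variable</emphasis> and
<emphasis>Preprocessor</emphasis> are assumptions, and the expressions are
kept deliberately simple.</para>
<programlisting>
<!-- minimal="true" stops at the first closing brace rather than the last -->
<RegExpr attribute="Variable" context="#stay" String="\$\{.*\}" minimal="true" />
<!-- The caret restricts the rule to the start of a line -->
<RegExpr attribute="Preprocessor" context="#stay" String="^#include\b" insensitive="false" />
</programlisting>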
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>keyword</term>
|
|
<listitem>
|
|
<para>Detect a keyword from a specified list.</para>
|
|
<programlisting><keyword String="(list name)" (common attributes) /></programlisting>
|
|
<para>The <userinput>String</userinput> attribute identifies the
|
|
keyword list by name. A list with that name must exist.</para>
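<para>The pairing of list and rule can be sketched as follows. The list name
<userinput>controlflow</userinput>, its items and the attribute name
<userinput>Keyword</userinput> are assumptions chosen for illustration.
<programlisting>
<list name="controlflow">
  <item>if</item>
  <item>else</item>
  <item>while</item>
</list>

<keyword attribute="Keyword" context="#stay" String="controlflow" />
</programlisting></para>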
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Int</term>
|
|
<listitem>
|
|
<para>Detect an integer number.</para>
|
|
<para><programlisting><Int (common attributes) (dynamic) /></programlisting></para>
|
|
<para>This rule has no specific attributes. Child rules are typically
used to detect combinations of <userinput>L</userinput> and
<userinput>U</userinput> after the number, indicating the integer type
in program code. In practice all rules are accepted as child rules, although
the <acronym>DTD</acronym> only allows the child rule <userinput>StringDetect</userinput>.</para>
|
|
<para>The following example matches integer numbers followed by the character 'L'.
|
|
<programlisting>
|
|
<Int attribute="Decimal" context="#stay" >
|
|
<StringDetect attribute="Decimal" context="#stay" String="L" insensitive="true"/>
|
|
</Int>
|
|
</programlisting></para>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Float</term>
|
|
<listitem>
|
|
<para>Detect a floating point number.</para>
|
|
<para><programlisting><Float (common attributes) /></programlisting></para>
|
|
<para>This rule has no specific attributes. <userinput>AnyChar</userinput> is
allowed as a child rule and is typically used to detect suffix combinations; see
the rule <userinput>Int</userinput> for reference.</para>
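<para>By analogy with the <userinput>Int</userinput> example, a floating point
number followed by the suffix <userinput>f</userinput> or
<userinput>F</userinput> could be matched roughly like this (the attribute
name <userinput>Float</userinput> is an assumption):
<programlisting>
<Float attribute="Float" context="#stay">
  <AnyChar attribute="Float" context="#stay" String="fF" />
</Float>
</programlisting></para>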
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>HlCOct</term>
|
|
<listitem>
|
|
<para>Detect an octal number representation.</para>
|
|
<para><programlisting><HlCOct (common attributes) /></programlisting></para>
|
|
<para>This rule has no specific attributes.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>HlCHex</term>
|
|
<listitem>
|
|
<para>Detect a hexadecimal number representation.</para>
|
|
<para><programlisting><HlCHex (common attributes) /></programlisting></para>
|
|
<para>This rule has no specific attributes.</para>
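<para>When several number rules share one context, their order matters,
because the first matching rule wins. As a sketch (the attribute names are
assumptions), placing the more specific rules first should keep
<userinput>Int</userinput> from consuming the leading
<userinput>0</userinput> of a number such as <userinput>0x1F</userinput>:
<programlisting>
<Float attribute="Float" context="#stay" />
<HlCOct attribute="Octal" context="#stay" />
<HlCHex attribute="Hex" context="#stay" />
<Int attribute="Decimal" context="#stay" />
</programlisting></para>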
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>HlCStringChar</term>
|
|
<listitem>
|
|
<para>Detect an escaped character.</para>
|
|
<para><programlisting><HlCStringChar (common attributes) /></programlisting></para>
|
|
<para>This rule has no specific attributes.</para>
|
|
|
|
<para>It matches literal representations of characters commonly used in
|
|
program code, for example <userinput>\n</userinput>
|
|
(newline) or <userinput>\t</userinput> (TAB).</para>
|
|
|
|
<para>The following characters will match if they follow a backslash
(<literal>\</literal>):
<userinput>abefnrtv"'?\</userinput>. Additionally, escaped
hexadecimal numbers such as <userinput>\xff</userinput> and
escaped octal numbers such as <userinput>\033</userinput> will
match.</para>
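<para>This rule is typically placed inside a string context so that escape
sequences are styled differently from the surrounding text. A minimal sketch
for single-quoted strings, with hypothetical context and attribute names:
<programlisting>
<context attribute="String" lineEndContext="#pop" name="String">
  <HlCStringChar attribute="String Char" context="#stay" />
  <DetectChar attribute="String" context="#pop" char="'" />
</context>
</programlisting></para>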
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>HlCChar</term>
|
|
<listitem>
|
|
<para>Detect a C character.</para>
|
|
<para><programlisting><HlCChar (common attributes) /></programlisting></para>
|
|
<para>This rule has no specific attributes.</para>
|
|
|
|
<para>It matches C characters enclosed in single quotes (for example <userinput>'c'</userinput>).
The quotes may enclose a simple character or an escaped character.
See HlCStringChar for the escaped character sequences that are matched.</para>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>RangeDetect</term>
|
|
<listitem>
|
|
<para>Detect a string with defined start and end characters.</para>
|
|
<programlisting><RangeDetect char="(character)" char1="(character)" (common attributes) /></programlisting>
|
|
<para><userinput>char</userinput> defines the character starting the range,
|
|
<userinput>char1</userinput> the character ending the range.</para>
|
|
<para>This is useful for detecting, for example, short quoted strings and the
like. Note, however, that since the highlighting engine works on one line at
a time, this rule will not find strings spanning a line break.</para>
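<para>As an illustration, a section header such as
<userinput>[General]</userinput> in a configuration file could be matched
with a rule like this (the attribute name <userinput>Section</userinput> is
an assumption):
<programlisting>
<RangeDetect attribute="Section" context="#stay" char="[" char1="]" />
</programlisting></para>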
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>LineContinue</term>
|
|
<listitem>
|
|
<para>Matches at end of line.</para>
|
|
<programlisting><LineContinue (common attributes) /></programlisting>
|
|
<para>This rule has no specific attributes.</para>
|
|
<para>This rule is useful for switching context at end of line, if the last
|
|
character is a backslash (<userinput>'\'</userinput>). This is needed for
|
|
example in C/C++ to continue macros or strings.</para>
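<para>A sketch of the usual arrangement, with hypothetical context and
attribute names: the context normally ends at the end of the line
(<userinput>lineEndContext="#pop"</userinput>), but a trailing backslash
keeps it active for the following line.
<programlisting>
<context attribute="Preprocessor" lineEndContext="#pop" name="Preprocessor">
  <LineContinue attribute="Preprocessor" context="#stay" />
</context>
</programlisting></para>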
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>IncludeRules</term>
|
|
<listitem>
|
|
<para>Include rules from another context or language/file.</para>
|
|
<programlisting><IncludeRules context="contextlink" [includeAttrib="true|false"] /></programlisting>
|
|
|
|
<para>The <userinput>context</userinput> attribute defines which context to include.</para>
|
|
<para>If it is a simple string, all rules defined in that context are included into the current context, for example:
|
|
<programlisting><IncludeRules context="anotherContext" /></programlisting></para>
|
|
|
|
<para>
|
|
If the string begins with <userinput>##</userinput> the highlight system
|
|
will look for another language definition with the given name, example:
|
|
<programlisting><IncludeRules context="##C++" /></programlisting></para>
|
|
<para>If the <userinput>includeAttrib</userinput> attribute is
<emphasis>true</emphasis>, the destination attribute is changed to that of
the source. This is required, for example, to make commenting work when text
matched by the included context is highlighted differently from the host
context.
|
|
</para>
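<para>Both options can be combined. The following sketch assumes that a
language definition named <userinput>Alerts</userinput> is installed; the
name is used here only for illustration.
<programlisting>
<IncludeRules context="##Alerts" includeAttrib="true" />
</programlisting></para>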
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>DetectSpaces</term>
|
|
<listitem>
|
|
<para>Detect whitespace.</para>
|
|
<programlisting><DetectSpaces (common attributes) /></programlisting>
|
|
|
|
<para>This rule has no specific attributes.</para>
|
|
<para>Use this rule if you know that there can be several whitespace characters
ahead, for example at the beginning of indented lines. This rule will skip all
whitespace at once, instead of testing multiple rules and skipping one
character at a time due to no match.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term>DetectIdentifier</term>
|
|
<listitem>
|
|
<para>Detect identifier strings (as a regular expression: [a-zA-Z_][a-zA-Z0-9_]*).</para>
|
|
<programlisting><DetectIdentifier (common attributes) /></programlisting>
|
|
|
|
<para>This rule has no specific attributes.</para>
|
|
<para>Use this rule to skip a string of word characters at once, rather than
testing with multiple rules and skipping one character at a time due to no match.</para>
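<para>A possible layout, sketched with hypothetical context, list and attribute
names: <userinput>DetectSpaces</userinput> comes first so that indentation is
skipped cheaply, and <userinput>DetectIdentifier</userinput> follows the
keyword rule so that ordinary identifiers are consumed in one step without
hiding keywords.
<programlisting>
<context attribute="Normal Text" lineEndContext="#stay" name="Normal">
  <DetectSpaces />
  <keyword attribute="Keyword" context="#stay" String="controlflow" />
  <DetectIdentifier />
</context>
</programlisting></para>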
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Tips & Tricks</title>
|
|
|
|
<itemizedlist>
|
|
<para>Once you have understood how context switching works, it will be
easy to write highlight definitions. You should, however, carefully check which
rule you choose in which situation. Regular expressions are very powerful, but
they are slow compared to the other rules. So you may want to consider the
following tips.
|
|
</para>
|
|
|
|
<listitem>
|
|
<para>If you only match two characters, use <userinput>Detect2Chars</userinput>
instead of <userinput>StringDetect</userinput>. Likewise, use
<userinput>DetectChar</userinput> to match a single character.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Regular expressions are easy to use, but often there is another, much
faster way to achieve the same result. Suppose you only want to match
the character <userinput>'#'</userinput> if it is the first non-whitespace
character in the line. A regular expression based solution would look like this:
<programlisting><RegExpr attribute="Macro" context="macro" String="^\s*#" /></programlisting>
You can achieve the same result much faster by using:
<programlisting><DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" /></programlisting>
If you want to match the regular expression <userinput>'^#'</userinput>, you
can still use <userinput>DetectChar</userinput> with the attribute <userinput>column="0"</userinput>.
The attribute <userinput>column</userinput> counts characters, so a tab still counts as only one character.
|
|
</para>
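<para>Written out, the <userinput>column</userinput> variant could look like
the following sketch, which reuses the attribute and context names from the
rules above:
<programlisting>
<DetectChar attribute="Macro" context="macro" char="#" column="0" />
</programlisting></para>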
|
|
</listitem>
|
|
<listitem>
|
|
<para>You can switch contexts without processing characters. Assume that you
want to switch context when you meet the string <userinput>*/</userinput>, but
need to process that string in the next context. The rule below will match, and
the <userinput>lookAhead</userinput> attribute will cause the highlighter to
keep the matched string for the next context.
|
|
<programlisting><Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" /></programlisting>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Use <userinput>DetectSpaces</userinput> if you know that runs of whitespace occur frequently.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Use <userinput>DetectIdentifier</userinput> instead of the regular expression <userinput>'[a-zA-Z_]\w*'</userinput>.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Use default styles whenever you can. This way the user will find a familiar environment.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>Look into other XML files to see how other people implement tricky rules.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>You can validate every XML file by using the command
|
|
<command>xmllint --dtdvalid language.dtd mySyntax.xml</command>.</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>If you repeat a complex regular expression very often, you can use
<emphasis>ENTITIES</emphasis>. Example:</para>
|
|
<programlisting>
|
|
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!DOCTYPE language SYSTEM "language.dtd"
|
|
[
|
|
<!ENTITY myref "[A-Za-z_:][\w.:_-]*">
|
|
]>
|
|
</programlisting>
|
|
<para>Now you can use <emphasis>&myref;</emphasis> instead of the regular
|
|
expression.</para>
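<para>As a sketch, a rule using the entity declared above might read as
follows. The attribute name <userinput>Identifier</userinput> is hypothetical,
and the trailing <userinput>\s*=</userinput> only illustrates that an entity
can be combined with further pattern text.
<programlisting>
<RegExpr attribute="Identifier" context="#stay" String="&myref;\s*=" />
</programlisting></para>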
|
|
</listitem>
|
|
</itemizedlist>
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
</appendix>
|