|
|
|
<appendix id="highlight">
|
|
|
|
<title>Working with Syntax Higlighting</title>
|
|
|
|
|
|
|
|
<sect1 id="highlight-overview">
|
|
|
|
|
|
|
|
<title>Overview</title>
|
|
|
|
|
|
|
|
<para>Syntax Highlighting is what makes the editor automatically
|
|
|
|
display text in different styles/colors, depending on the function of
|
|
|
|
the string in relation to the purpose of the file. In program source
|
|
|
|
code for example, control statements may be rendered bold, while data
|
|
|
|
types and comments get different colors from the rest of the
|
|
|
|
text. This greatly enhances the readability of the text, and thus
|
|
|
|
helps the author to be more efficient and productive.</para>
|
|
|
|
|
|
|
|
<mediaobject>
|
|
|
|
<imageobject><imagedata format="PNG" fileref="highlighted.png"/></imageobject>
|
|
|
|
<textobject><phrase>A perl function, rendered with syntax
|
|
|
|
highlighting.</phrase></textobject>
|
|
|
|
<caption><para>A perl function, rendered with syntax highlighting.</para>
|
|
|
|
</caption>
|
|
|
|
</mediaobject>
|
|
|
|
|
|
|
|
<mediaobject>
|
|
|
|
<imageobject><imagedata format="PNG" fileref="unhighlighted.png"/></imageobject>
|
|
|
|
<textobject><phrase>The same perl function, without
|
|
|
|
highlighting.</phrase></textobject>
|
|
|
|
<caption><para>The same perl function, without highlighting.</para></caption>
|
|
|
|
</mediaobject>
|
|
|
|
|
|
|
|
<para>Of the two examples, which is easiest to read?</para>
|
|
|
|
|
|
|
|
<para>&kate; comes with a flexible, configurable and capable system
|
|
|
|
for doing syntax highlighting, and the standard distribution provides
|
|
|
|
definitions for a wide range of programming languages, markup and
|
|
|
|
scripting languages and other text file formats. In addition you can
|
|
|
|
provide your own definitions in simple &XML; files.</para>
|
|
|
|
|
|
|
|
<para>&kate; will automatically detect the right syntax rules when you
|
|
|
|
open a file, based on the &MIME; Type of the file, determined by its
|
|
|
|
extension, or, if it has none, the contents. Should you experience a
|
|
|
|
bad choice, you can manually set the syntax to use from the
|
|
|
|
<menuchoice><guimenu>Documents</guimenu><guisubmenu>Highlight
|
|
|
|
Mode</guisubmenu></menuchoice> menu.</para>
|
|
|
|
|
|
|
|
<para>The styles and colors used by each syntax highlight definition,
|
|
|
|
as well as which &MIME;types it should be used for, can be configured
|
|
|
|
using the <link linkend="config-dialog-editor-hl"> Highlight</link>
|
|
|
|
page of the <link linkend="config-dialog">Config Dialog</link>.</para>
|
|
|
|
|
|
|
|
<note>
|
|
|
|
<para>Syntax highlighting is there to enhance the readability of
|
|
|
|
correct text, but you cannot trust it to validate your text. Marking
|
|
|
|
text for syntax is difficult depending on the format you are using,
|
|
|
|
and in some cases the authors of the syntax rules will be proud if 98%
|
|
|
|
of text gets correctly rendered, though most often you need a rare
|
|
|
|
style to see the incorrect 2%.</para>
|
|
|
|
</note>
|
|
|
|
|
|
|
|
<tip>
|
|
|
|
<para>You can download updated or additional syntax highlight
|
|
|
|
definitions from the &kate; website by clicking the
|
|
|
|
<guibutton>Download</guibutton> button in the <link
|
|
|
|
linkend="config-dialog-editor-hl">Highlight Page</link> of the <link
|
|
|
|
linkend="config-dialog">Config Dialog</link>.</para>
|
|
|
|
</tip>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="katehighlight-system">
|
|
|
|
|
|
|
|
<title>The &kate; Syntax Highlight System</title>
|
|
|
|
|
|
|
|
<para>This section will discuss the &kate; syntax highlighting
|
|
|
|
mechanism in more detail. It is for you if you want to know know about
|
|
|
|
it, or if you want to change or create syntax definitions.</para>
|
|
|
|
|
|
|
|
<sect2 id="katehighlight-howitworks">
|
|
|
|
|
|
|
|
<title>How it Works</title>
|
|
|
|
|
|
|
|
<para>Whenever you open a file, one of the first things the &kate;
|
|
|
|
editor does is detect which syntax definition to use for the
|
|
|
|
file. While reading the text of the file, and while you type away in
|
|
|
|
it, the syntax highlighting system will analyze the text using the
|
|
|
|
rules defined by the syntax definition and mark in it where different
|
|
|
|
contexts and styles begin and end.</para>
|
|
|
|
|
|
|
|
<para>When you type in the document, the new text is analyzed and marked on the
|
|
|
|
fly, so that if you delete a character that is marked as the beginning or end
|
|
|
|
of a context, the style of surrounding text changes accordingly.</para>
|
|
|
|
|
|
|
|
<para>The syntax definitions used by the &kate; syntax highlighting system are
|
|
|
|
&XML; files, containing
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem><para>Rules for detecting the role of text, organized into context blocks</para></listitem>
|
|
|
|
<listitem><para>Keyword lists</para></listitem>
|
|
|
|
<listitem><para>Style Item definitions</para></listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>When analyzing the text, the detection rules are evaluated in
|
|
|
|
the order in which they are defined, and if the beginning of the
|
|
|
|
current string matches a rule, the related context is used. The start
|
|
|
|
point in the text is moved to the final point at which that rule
|
|
|
|
matched and a new loop of the rules begins, starting in the context
|
|
|
|
set by the matched rule.</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="highlight-system-rules">
|
|
|
|
<title>Rules</title>
|
|
|
|
|
|
|
|
<para>The detection rules are the heart of the highlighting detection
|
|
|
|
system. A rule is a string, character or <link
|
|
|
|
linkend="regular-expressions">regular expression</link> against which
|
|
|
|
to match the text being analyzed. It contains information about which
|
|
|
|
style to use for the matching part of the text. It may switch the
|
|
|
|
working context of the system either to an explicitly mentioned
|
|
|
|
context or to the previous context used by the text.</para>
|
|
|
|
|
|
|
|
<para>Rules are organized in context groups. A context group is used
|
|
|
|
for main text concepts within the format, for example quoted text
|
|
|
|
strings or comment blocks in program source code. This ensures that
|
|
|
|
the highlighting system does not need to loop through all rules when
|
|
|
|
it is not necessary, and that some character sequences in the text can
|
|
|
|
be treated differently depending on the current context.
|
|
|
|
</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="highlight-context-styles-keywords">
|
|
|
|
<title>Context Styles and Keywords</title>
|
|
|
|
|
|
|
|
<para>In some programming languages, integer numbers are treated
|
|
|
|
differently than floating point ones by the compiler (the program that
|
|
|
|
converts the source code to a binary executable), and there may be
|
|
|
|
characters having a special meaning within a quoted string. In such
|
|
|
|
cases, it makes sense to render them differently from the surroundings
|
|
|
|
so that they are easy to identify while reading the text. So even if
|
|
|
|
they do not represent special contexts, they may be seen as such by
|
|
|
|
the syntax highlighting system, so that they can be marked for
|
|
|
|
different rendering.</para>
|
|
|
|
|
|
|
|
<para>A syntax definition may contain as many styles as required to
|
|
|
|
cover the concepts of the format it is used for.</para>
|
|
|
|
|
|
|
|
<para>In many formats, there are lists of words that represent a
|
|
|
|
specific concept. For example in programming languages, the control
|
|
|
|
statements is one concept, data type names another, and built in
|
|
|
|
functions of the language a third. The &kate; Syntax Highlighting
|
|
|
|
System can use such lists to detect and mark words in the text to
|
|
|
|
emphasize concepts of the text formats.</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<sect2 id="kate-highlight-system-default-styles">
|
|
|
|
<title>Default Styles</title>
|
|
|
|
|
|
|
|
<para>If you open a C++ source file, a &Java; source file and an
|
|
|
|
<acronym>HTML</acronym> document in &kate;, you will see that even
|
|
|
|
though the formats are different, and thus different words are chosen
|
|
|
|
for special treatment, the colors used are the same. This is because
|
|
|
|
&kate; has a predefined list of Default Styles, that are employed by
|
|
|
|
the individual syntax definitions.</para>
|
|
|
|
|
|
|
|
<para>This makes it easy to recognize similar concepts in different
|
|
|
|
text formats. For example comments are present in almost any
|
|
|
|
programming, scripting or markup language, and when they are rendered
|
|
|
|
using the same style in all languages, you do not have to stop and
|
|
|
|
think to identify them within the text.</para>
|
|
|
|
|
|
|
|
<tip>
|
|
|
|
<para>All styles in a syntax definition use one of the default
|
|
|
|
styles. A few syntax definitions use more styles that there are
|
|
|
|
defaults, so if you use a format often, it may be worth launching the
|
|
|
|
configuration dialog to see if some concepts are using the same
|
|
|
|
style. For example there is only one default style for strings, but as
|
|
|
|
the perl programming language operates with two types of strings, you
|
|
|
|
can enhance the highlighting by configuring those to be slightly
|
|
|
|
different.</para>
|
|
|
|
</tip>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="katehighlight-xml-format">
|
|
|
|
<title>The Highlight Definition &XML; Format</title>
|
|
|
|
|
|
|
|
<sect2>
|
|
|
|
<title>Overview</title>
|
|
|
|
|
|
|
|
<para>This section is an overview of the Highlight Definition &XML;
|
|
|
|
format. It will describe the main components and their meaning and
|
|
|
|
usage, and go into detail with the detection rules.</para>
|
|
|
|
|
|
|
|
<para>The formal definition, aka the <acronym>DTD</acronym> is stored
|
|
|
|
in the file <filename>language.dtd</filename> which should be
|
|
|
|
installed on your system in the directory
|
|
|
|
<filename>$<envar>TDEDIR</envar>/share/apps/kate/syntax</filename>.</para>
|
|
|
|
|
|
|
|
<variablelist>
|
|
|
|
<title>Main components of &kate; Highlight Definitions</title>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>The General Section</term>
|
|
|
|
<listitem>
|
|
|
|
<para>The General Section contains information on the comment format
|
|
|
|
of the described language, and defines whether keywords are case
|
|
|
|
sensitive.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>Highlighting</term>
|
|
|
|
<listitem>
|
|
|
|
<para>The Highlighting section contains all data required to analyze
|
|
|
|
and render the text. This includes:</para>
|
|
|
|
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
|
|
<term>ItemDatas</term>
|
|
|
|
<listitem><para>Contains ItemData elements, each defining a
|
|
|
|
style.</para></listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>Keyword lists</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Each list has a name, and may contain any number of items.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>Contexts</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Contains contexts, which again contain the syntax detection rules.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
</variablelist>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1 id="kate-highlight-rules-detailled">
|
|
|
|
<title>Highlight Detection Rules</title>
|
|
|
|
|
|
|
|
<para>This section describes the syntax detection rules.</para>
|
|
|
|
|
|
|
|
<para>Each rule can match zero or more characters at the beginning of
|
|
|
|
the string they are asked to test. If the rule matches, the matching
|
|
|
|
characters are assigned the style or <emphasis>attribute</emphasis>
|
|
|
|
defined by the rule, and a rule may ask that the current context is
|
|
|
|
switched.</para>
|
|
|
|
|
|
|
|
<para>The <emphasis>attribute</emphasis> and
|
|
|
|
<emphasis>context</emphasis> attributes are common to all
|
|
|
|
rules.</para>
|
|
|
|
|
|
|
|
<para>A rule looks like this:</para>
|
|
|
|
|
|
|
|
<programlisting><RuleName attribute="(identifier)" context="(identifier|order)" [rule specific attributes] /></programlisting>
|
|
|
|
|
|
|
|
<para>The <emphasis>attribute</emphasis> identifies the style to use
|
|
|
|
for matched characters by name or index, and the
|
|
|
|
<emphasis>context</emphasis> identifies the context to use from
|
|
|
|
here.</para>
|
|
|
|
|
|
|
|
<para>The <emphasis>attribute</emphasis> can be identified either by
|
|
|
|
name, or by its zero-based index in the ItemDatas group.</para>
|
|
|
|
|
|
|
|
<para>The <emphasis>context</emphasis> can be identified by:</para>
|
|
|
|
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>An <emphasis>identifier</emphasis>, currently only its zero-based
|
|
|
|
index in the contexts group.</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>An <emphasis>order</emphasis> telling the engine to stay in the
|
|
|
|
current context (<userinput>#stay</userinput>), or to pop back to a
|
|
|
|
previous context used in the string
|
|
|
|
(<userinput>#pop</userinput>).</para>
|
|
|
|
<para>To go back more steps, the #pop keyword can be repeated:
|
|
|
|
<userinput>#pop#pop#pop</userinput></para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
|
|
|
|
<para>Some rules can have <emphasis>child rules</emphasis> which are
|
|
|
|
then evaluated if and only if the parent rule matched. The entire
|
|
|
|
matched string will be given the attribute defined by the parent
|
|
|
|
rule. A rule with child rules looks like this:</para>
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
<RuleName (attributes)>
|
|
|
|
<ChildRuleName (attributes) />
|
|
|
|
...
|
|
|
|
</RuleName>
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
|
|
|
|
<para>Rule specific attributes varies and are described in the
|
|
|
|
following list.</para>
|
|
|
|
|
|
|
|
<variablelist>
|
|
|
|
<title>The Rules in Detail</title>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>DetectChar</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect a single specific character. Commonly used for example to
|
|
|
|
find the ends of quoted strings.</para>
|
|
|
|
<programlisting><DetectChar char="(character)" (common attributes) /></programlisting>
|
|
|
|
<para>The <userinput>char</userinput> attribute defines the character
|
|
|
|
to match.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>Detect2Chars</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect two specific characters in a defined order.</para>
|
|
|
|
<programlisting><Detect2Chars char="(character)" char1="(character)" (common attributes) /></programlisting>
|
|
|
|
<para>The <userinput>char</userinput> attribute defines the first character to match,
|
|
|
|
<userinput>char1</userinput> the second.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>AnyChar</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect one character of a set of specified characters.</para>
|
|
|
|
<programlisting><AnyChar String="(string)" (common attributes) /></programlisting>
|
|
|
|
<para>The <userinput>String</userinput> attribute defines the set of
|
|
|
|
characters.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>StringDetect</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect an exact string.</para>
|
|
|
|
<programlisting><StringDetect String="(string)" [insensitive="TRUE|FALSE;"] (common attributes) /></programlisting>
|
|
|
|
<para>The <userinput>String</userinput> attribute defines the string
|
|
|
|
to match. The <userinput>insensitive</userinput> attribute defaults to
|
|
|
|
<userinput>FALSE</userinput> and is fed to the string comparison
|
|
|
|
function. If the value is <userinput>TRUE</userinput> insensitive
|
|
|
|
comparing is used.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>RegExpr</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Matches against a regular expression.</para>
|
|
|
|
<programlisting><RegExpr String="(string)" [insensitive="TRUE|FALSE;"] [minimal="TRUE|FALSE"] (common attributes) /></programlisting>
|
|
|
|
<para>The <userinput>String</userinput> attribute defines the regular
|
|
|
|
expression.</para>
|
|
|
|
<para><userinput>insensitive</userinput> defaults to
|
|
|
|
<userinput>FALSE</userinput> and is fed to the regular expression
|
|
|
|
engine.</para>
|
|
|
|
<para><userinput>minimal</userinput> defaults to
|
|
|
|
<userinput>FALSE</userinput> and is fed to the regular expression
|
|
|
|
engine.</para>
|
|
|
|
<para>Because the rules are always matched against the beginning of
|
|
|
|
the current string, a regular expression starting with a caret
|
|
|
|
(<literal>^</literal>) indicates that the rule should only be
|
|
|
|
matched against the start of a line.</para>
|
|
|
|
<para>See <link linkend="regular-expressions">Regular
|
|
|
|
Expressions</link> for more information on those.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>Keyword</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect a keyword from a specified list.</para>
|
|
|
|
<programlisting><keyword String="(list name)" (common attributes) /></programlisting>
|
|
|
|
<para>The <userinput>String</userinput> attribute identifies the
|
|
|
|
keyword list by name. A list with that name must exist.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>Int</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect an integer number.</para>
|
|
|
|
<para><programlisting><Int (common attributes) /></programlisting></para>
|
|
|
|
<para>This rule has no specific attributes. Child rules are typically
|
|
|
|
used to detect combinations of <userinput>L</userinput> and
|
|
|
|
<userinput>U</userinput> after the number, indicating the integer type
|
|
|
|
in program code.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>Float</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect a floating point number.</para>
|
|
|
|
<para><programlisting><Float (common attributes)
|
|
|
|
/></programlisting></para>
|
|
|
|
<para>This rule has no specific attributes.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>HlCOct</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect an octal point number representation.</para>
|
|
|
|
<para><programlisting><HlCOct (common attributes) /></programlisting></para>
|
|
|
|
<para>This rule has no specific attributes.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>HlCHex</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect a hexadecimal number representation.</para>
|
|
|
|
<para><programlisting><Int (common attributes) /></programlisting></para>
|
|
|
|
<para>This rule has no specific attributes.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>HlCStringChar</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect an escaped character.</para>
|
|
|
|
<para><programlisting><HlCStringChar (common attributes)
|
|
|
|
/></programlisting></para>
|
|
|
|
<para>This rule has no specific attributes.</para>
|
|
|
|
|
|
|
|
<para>It matches letteral representations of invisible characters
|
|
|
|
commonly used in program code, for example <userinput>\n</userinput>
|
|
|
|
(newline) or <userinput>\t</userinput> (TAB).</para>
|
|
|
|
|
|
|
|
<para>The following characters will match if they follow a backslash
|
|
|
|
(<literal>\</literal>):
|
|
|
|
<userinput>abefnrtv"'?</userinput>. Additionally, escaped
|
|
|
|
hexadecimal numbers like for example <userinput>\xff</userinput> and
|
|
|
|
escaped octal numbers, for example <userinput>\033</userinput> will
|
|
|
|
match.</para>
|
|
|
|
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>RangeDetect</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Detect a string with defined start and end characters.</para>
|
|
|
|
<programlisting><RangeDetect char="(character)" char1="(character)" (common attributes) /></programlisting>
|
|
|
|
<para><userinput>char</userinput> defines the character starting the range,
|
|
|
|
<userinput>char2</userinput> the character ending the range.</para>
|
|
|
|
<para>Usefull to detect for example small quoted strings and the like, but note that
|
|
|
|
since the hl engine works on one line at a time, this will not find strings spanning over a line break.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>LineContinue</term>
|
|
|
|
<listitem>
|
|
|
|
<para>Matches at end of line.</para>
|
|
|
|
<programlisting><LineContinue (common attributes) /></programlisting>
|
|
|
|
<para>This rule has no specific attributes.</para>
|
|
|
|
<para>This rule is usefull for switching context at end of line.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
</variablelist>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
</appendix>
|