|
|
KPilot PalmDoc Conduit bookmark Specification
|
|
|
=============================================
|
|
|
|
|
|
(c) 2003 Reinhold Kainhofer, reinhold@kainhofer.com
|
|
|
|
|
|
This document is licensed under the FDL (Free Documentation License)
|
|
|
as published by the FSF. Any version of the FDL can be applied
|
|
|
at your convenience.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The PalmDoc conduit has three ways to indicate bookmarks for a text:
|
|
|
-) Inline tags of the form <* bookmarkname *>
|
|
|
-) Endtags of the form <bookmarkname> at the end of the document
|
|
|
-) Regular expressions in a separate textname.bmk file
|
|
|
(textname.bmk ist the filename of the text with the .txt replaced by .bmk)
|
|
|
|
|
|
|
|
|
In the design of the .bmk file, I tried to stay close to the
|
|
|
syntac of MakeDocJ bookmark files, but it turned out that I
|
|
|
needed to extend the syntax a little. Also, MakeDocJ uses Java
|
|
|
RegExps, while the PalmDoc conduit uses the QRegExp, which have
|
|
|
some slight differences (especially concerning the ^ and $
|
|
|
patterns as well as backreferences). So if you used MakeDocJ,
|
|
|
the .bmk file syntax will be quite familiar, but you will still
|
|
|
have to adapt your bookmark files for Qt regular expressions
|
|
|
instead of Java regular expressions
|
|
|
|
|
|
|
|
|
|
|
|
1) INLINE TAGS
|
|
|
|
|
|
Whenever a tag of the form <* someText *> appears in the text,
|
|
|
this sequence is removed from the text, and a bookmark is set
|
|
|
there with the bookmark name "someText" (the part between the
|
|
|
<* and the *>).
|
|
|
|
|
|
|
|
|
2) ENDTAGS
|
|
|
|
|
|
If the text ends with tags of the form <someText>, the string
|
|
|
in braces is used as bookmark name, and wherever it appears in
|
|
|
the text, a bookmark is set.
|
|
|
After the > any number of whitespace is allowed, but no other
|
|
|
characters like letters, numbers, or punctuation. Also, inside
|
|
|
the braces no line break must occur. The conduit searches the
|
|
|
text from the end and if it finds a line break inside a <...>
|
|
|
sequence, the tag and everything before it is assumed to belong
|
|
|
to the text and doesn't form a bookmark tag.
|
|
|
Between endtags any number of whitespace (spaces, tabs, line
|
|
|
feeds etc.) is allowed.
|
|
|
|
|
|
As an example, assume you have a text ending in:
|
|
|
... the bad guy was punished, and they lived happily
|
|
|
ever after!
|
|
|
<Tag with
|
|
|
line feed>
|
|
|
<bad guy> <princess>
|
|
|
<married>
|
|
|
|
|
|
The conduit starts at the end, ignores all whitespace between
|
|
|
the tags, so it finds the tags "married", "princess", and "bad guy".
|
|
|
The "Tag with line feed" has a line feed, so it is assumed to belong
|
|
|
to the text.
|
|
|
Assume now you have a text ending in:
|
|
|
... the bad guy was punished, and they lived happily
|
|
|
ever after!
|
|
|
<bad guy> The End <princess>
|
|
|
<married>
|
|
|
|
|
|
Here, only "married" and "princess" are found as bookmarks. Because
|
|
|
of the letters before the "princess" tags, the search for the
|
|
|
bookmarks ends at the letter "d" of "The End" (the conduit starts
|
|
|
from the end and moves backward until it finds some text which
|
|
|
cannot be seen as a endtag.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
3) REGULAR EXPRESSIONS IN A SEPARATE FILE
|
|
|
|
|
|
This is by far the most complex way to specify bookmarks, but
|
|
|
it is also the mose powerful.
|
|
|
If you have a text with filename "My fairy tale.txt", the
|
|
|
bookmarks will be specified in a file called "My fairy tale.bmk"
|
|
|
(just the text filename with the .txt replaced by .bmk). This
|
|
|
file contains the bookmark definitions, one in each line. Lines
|
|
|
starting with a # are seen as comments, and empty lines are also
|
|
|
ignored.
|
|
|
|
|
|
|
|
|
In the .bmk file, each bookmark line has one of the following syntaces
|
|
|
(I will explain all fields later on). Fields in [..] are optional:
|
|
|
|
|
|
bmkName
|
|
|
bmkPosition, bmkName
|
|
|
+, bmkPatternRegExp[, bmkNameAsString[, firstIncludedBmk[, lastIncludedBmk]]]
|
|
|
+, bmkPatternRegExp[, bmkNameIndexOfSubexpression[, firstIncludedBmk[, lastIncludedBmk]]]
|
|
|
-, bmkPatternRegExp[, bmkNameAsString]
|
|
|
-, bmkPatternRegExp[, bmkNameIndexOfSubexpression]
|
|
|
|
|
|
If the first field is a string, it is used as the bookmark name
|
|
|
and pattern to search for.
|
|
|
If the first field is a number, it means the position of the
|
|
|
bookmark, and the second field is the name of the bookmark.
|
|
|
If the first field is either + or -, the second field gives
|
|
|
a regular expression that is used to find the position of the
|
|
|
bookmark. If the first field is a -, the search is done only
|
|
|
once and only the first match will be added as bookmark. If
|
|
|
the first field is a +, the search is done until the regular
|
|
|
expression can no longer be found (the fourth and fifth fields
|
|
|
can be used to include only a certain range of hits). If there
|
|
|
is a third field, and it is a string, it gives the name of the
|
|
|
bookmark as a regular expression (i.e. \1 are replaced by the
|
|
|
first subexpression of the search, where subexpressions are
|
|
|
specified by round brackets in the regexp of the second field).
|
|
|
If there is a third field, and it is a number, it gives the index
|
|
|
of the subexpression of bmkPatternRegExp that is used as the
|
|
|
bookmark name.
|
|
|
If there is no third field, the whole matched text will be used
|
|
|
as bookmark name.
|
|
|
The optional fourth and fifth fields can be used to set bookmarks
|
|
|
only after the first few ocurrences of the regexp in the text, and
|
|
|
to stop the search after the expression has been found a certain
|
|
|
number of times.
|
|
|
|
|
|
|
|
|
|
|
|
If the PDB->PC sync is set up to store the bookmarks in a bookmark file,
|
|
|
it will create a file "My fairy tale.bm" (no "k") with entries of the form
|
|
|
position,bmkName
|
|
|
The .bmk file will be used if it exists, but if no .bmk file exists, the .bm file
|
|
|
will be used. This way you can override the bookmark settings, while
|
|
|
at the same time the PDB->TXT sync does not destroy your possibly
|
|
|
existing .bmk file.
|
|
|
|
|
|
|
|
|
|
|
|
Examples:
|
|
|
|
|
|
1) Imagine you have a line like:
|
|
|
frog princess
|
|
|
In this case, the text is searched for "frog princess", and a
|
|
|
bookmark is set whenever "frog princess" occurs in the text.
|
|
|
The name of each of these bookmarks will be "frog princess".
|
|
|
|
|
|
2) A bookmark line:
|
|
|
55, Bookmark at offset 55
|
|
|
Here, a bookmark will be set at offset 55 (55th character of
|
|
|
the text), and it will have the name "Bookmark at offs" (truncated
|
|
|
to 16 characters)
|
|
|
|
|
|
3) A bookmark line
|
|
|
-,Chapter \d+
|
|
|
causes a bookmark to be set at the first ocurrence of "Chapter XXX",
|
|
|
where XXX denotes one or more digits. The bookmark name will be
|
|
|
"Chapter XXX" (XXX replaced by the actual digits).
|
|
|
|
|
|
4) A bookmark line
|
|
|
+,Chapter \d+
|
|
|
causes bookmarks to be set wherever "Chapter XXX" (XXX being one
|
|
|
or more digits) appears in the text. The bookmark name will again
|
|
|
be "Chapter XXX", but the search does not stop after the first hit.
|
|
|
|
|
|
5) A bookmark line
|
|
|
+,\n\s*(Chapter \d+)\D+, 1
|
|
|
causes a bookmark to be set whenever a new line starts with
|
|
|
"Chapter XXX" (whitespace is allowed before the "Chapter"), and
|
|
|
uses the first subexpression in (..) as the bookmark name. If you
|
|
|
have a passage
|
|
|
Chapter 15: here it starts
|
|
|
The regular expression will match, so a bookmark will be set there
|
|
|
and the subexpression "Chapter 15" (which matches the (Chapter \d+) )
|
|
|
will be used as bookmark text.
|
|
|
|
|
|
6) A bookmark line
|
|
|
+,\n\s*Part (\d+),\1\. part
|
|
|
sets a bookmark whenever a line starts with "Part XXX". The XXX
|
|
|
will be stored as the first matched subexpression. The third field
|
|
|
"\1\. part" is the regular expression for the bookmark name, where
|
|
|
\1 is replaced by the first matched subexpression of the search (XXX
|
|
|
in this case). So if a line starts with " Part 17: ", the bookmark
|
|
|
name will be "17. part".
|
|
|
|
|
|
7) A bookmark line
|
|
|
+,Table (\d+): ,\1\. Tabelle,5,25
|
|
|
will match whenever "Table XXX: " appears in the text, and the bookmark
|
|
|
name will be "XXX. Tabelle". However, the fourth field means that the
|
|
|
first four hits are ignored (the 5th hit is the first hit to be included
|
|
|
as a bookmark), and the fifth field means that all further hits after the
|
|
|
25th will be ignored, too.
|
|
|
|
|
|
8) In law texts, I use a regular expression
|
|
|
+,\n *(<28>\.? *\d+[a-z]?\.?) +, 1
|
|
|
to search for all paragraphs starting like "<22>. 15. " or " <20>23 ", and set
|
|
|
a bookmark there using only the part from the <20> to the last digit or the
|
|
|
full stop after the last digit (the pattern between the (), in our two
|
|
|
cases the bookmark names will be "<22>. 15." and "<22>23" ).
|