17/02/2004

This file is intended to provide some information about the internal
design of VPL as well as all its oddities ;-)

Summary:
1- A bit of history.
2- Some definitions.
3- A quick overview of the Quanta/TDEHTML stuff interacting with VPL.
4- Basic design and interaction with Quanta.
5- VPL Classes
6- Synchronizations
7- TODO

If you find an error (shouldn't be so hard ;-), could you please report
it to me?

1) History

In early 2003, I was looking for a good HTML WYSIWYG editor, and I didn't
find what I wanted! So I decided to code one. After a quick search, I found
a dead project, Kafka, in kdenonbeta, which was supposed to become a
full-featured WYSIWYG editor based on tdehtml. But at that time (2000-2001
I think) tdehtml wasn't ready, so it was abandoned. Meanwhile tdehtml has
been greatly improved, partially thanks to the Apple Safari merging.

Then I started to hack on kafka a bit, adding basic cursor navigation,
insertion/deletion, and so on... But I quickly realised that it would be
too hard and take too long for me alone to come to a decent editor. So I
looked for an existing project to join, and I chose Quanta+, basically
because it was (and still is, in my humble opinion) the best HTML editor in
the KDE environment. It seems I came to Quanta+ at exactly the right time:
they were considering adding WYSIWYG capabilities!

So for one year now, I have been coding VPL during my free time, and I am
not far from a stable status.

2) Some definitions

First let us quickly define some things in order to better understand the
next parts.

* XML (http://www.w3.org/XML/): Defined by the W3C (http://www.w3.org/),
  it is widely used as the next-generation way to exchange and store data.
  Many file formats are based on it, e.g. OpenOffice files, Quanta's data
  files, and recent HTML files. Just open one of Quanta's .tag files to see
  what it looks like (quanta/data/dtep/**/**.tag).
* SGML (more info here: http://www.w3.org/MarkUp/SGML/): The ancestor of
  XML. It is less strict, but looks like XML. The old HTML file formats are
  based on it.

* DTD: Document Type Definition. It defines what an XML file should look
  like, e.g. which elements are allowed in it. For example, when we speak of
  HTML, we usually speak of the HTML DTD, which tells us which elements
  exist (A/IMG/TABLE/...) and how to use them (TBODY inside TABLE,...).

* HTML (http://www.w3.org/MarkUp/): Hey, we all know what it is!! Yep, but
  for some people (/me looking at myself one year ago), there is only one
  sort of HTML. In fact, the current version of HTML is 4.01, and there are
  three versions of the HTML DTD: HTML transitional, HTML strict and HTML
  frameset. HTML transitional includes all the elements plus the deprecated
  ones, HTML strict includes all the elements minus the deprecated ones, and
  HTML frameset includes all the elements necessary to build frames. These
  HTML DTDs use SGML, which is why they are not recommended. Instead the
  following DTDs are recommended:

* XHTML (http://www.w3.org/TR/xhtml1/): We have the XHTML 1.0
  Transitional/Strict/Frameset DTDs, which are basically the same thing as
  the HTML Transitional/Strict/Frameset DTDs, but using XML. And finally we
  have XHTML 1.1, and the upcoming XHTML 2.0.

* CSS (http://www.w3.org/Style/CSS/): It is a way to add style (e.g. fonts,
  colors,...) to a web page. It was created in order to separate the
  contents (the information) from the style.

* DOM (http://www.w3.org/DOM/): A sort of "treeview" of an XML/SGML file.
  E.g.
    <html><body>text<img href="boo"> </body></html>
  has for DOM representation:
    HTML
     *-- BODY
          *-- #text (text)
          *-- IMG
               *-- attribute (name:href, value:boo)

* DTEP (stands for Document Type Editing Package): It is Quanta's way to
  store the DTD information (it also includes supplemental elements like
  toolbars and more - see the .tag files in quanta/data/dtep). Why not use
  the DTD files directly? Because they don't contain all we want (no
  descriptions) and are written in a very odd way (just take a look... You
  will get sick soon :)

3) A quick overview of the Quanta/TDEHTML stuff interacting with VPL.

First, the most important thing: the parser. Defined in the quanta/parser/
directory, it is composed of the Node class, the Tag class, the Parser class
and the QTag class.

The parser either reads and parses a document (Parser::parse) or rebuilds
from an already parsed document (Parser::rebuild) a Node tree, which is
basically a DOM-like representation of the document, except that even
closing Tags and empty text are represented (as well as server-side
scripting elements like PHP). In fact, everything is put in the tree so that
we can get back the original SGML/XML file from the tree. From now on, I
will call it the Node tree.

For example
    <html><body>text<img href="boo"> </body></html>
has for Node tree:
    HTML
     *-- BODY
          *-- #text (text)
          *-- IMG (attr1 name:href, value:boo)
          *-- Empty text ( )
     *-- /BODY
    /HTML

The Node class handles the pointers to the parent, next, previous and first
child Node. *Each* Node has a valid pointer to a Tag. The Tag takes care of
remembering all the information concerning the Tag itself (like the
attributes, the type, etc.).

One QTag per Element is created from the .tag files when Quanta is started.
Each QTag contains all the DTD information about the Tag. E.g. the "IMG"
QTag says that it is a single Tag, and what its attributes are. You can get
a QTag with QuantaCommon::tagFromDTD, but don't delete the QTag!

Now to tdehtml. The class TDEHTMLPart is the HTML renderer widget of
konqueror. It internally works with a Node tree (another? Yep!)
but these Nodes are real DOM::Nodes. (From now on, I will call it the
DOM::Node tree.) Each of the DOM::Nodes is tdehtml-internally linked to a
rendering Node, i.e. a change made to one DOM::Node will update the HTML
rendering. See /path/to/kde/include/dom/*.h and, in the tdelibs cvs module,
the nice tdelibs/tdehtml/DESIGN.html.

WARNING about DOM::Nodes: they are just interfaces!!

4) Basic design and interaction with Quanta.

Now we will enter VPL itself. VPL stands for Visual Page Layout, but you may
as well call it WYSIWYG (What You See Is What You Get).
(Eric's note: Except of course that HTML only suggests layout, as opposed to
a desktop publishing program, unless you use absolute CSS very carefully. So
WYSIWYG really is a fictitious misnomer with HTML.) ;-)

First, keep in mind that when editing an HTML file in Quanta, the Node tree
is always up to date. Loading a new file or switching tabs calls
Parser::parse, and typing a letter calls Parser::rebuild.

We can then see the VPL design like this:

    Source (XML file) <=> Node tree <=> DOM::Node tree.

So when a change is made to the source file, Parser::rebuild is called and
synchronizes (not really, we will see this later) the corresponding
DOM::Node. Conversely, when a DOM::Node is modified, the corresponding Node
is synchronized, and the source file is modified. Of course, it is a little
more complicated, but let's see this later.

5) VPL classes.

VPL has several classes. Note that it is sometimes not really object
oriented, but I will clean that up soon.

* KafkaWidget (kafkahtmlpart.[h|cpp]): Derived from TDEHTMLPart, it uses the
  caret mode implemented by Leo Savernik in tdehtml (that means we don't
  have to care about cursor navigation). It handles every keypress in order
  to edit the widget (backspace/delete/return/