17/02/2004 This file is intented to provide some informations about the internal design of VPL as well as all its oddities ;-) Summary: 1- A bit of history. 2- Some definitions. 3- A quick overview of the Quanta/KHTML stuff interacting with VPL. 4- Basic design and interaction with Quanta. 5- VPL Classes 6- Synchronizations 7- TODO If you find an error (shouldn't be so hard ;-), could you report me please? 1) History In early 2003, I was looking for a good HTML WYSIWYG editor, and I didn't find what I wanted! So I decided to code one. After a quick search, I've found a dead project, Kafka, in kdenonbeta, which was supposed to become an full-featured WYSIWYG editor based on khtml. But at this time (2000-2001 I think) khtml wasn't ready. So it was abandonned. Meanwhile khtml have been greatly improved, partially thanks to the Apple Safari merging. Then I started to hack kafka a bit, adding basic cursor navigation, insertion/deletion, and so on... But I quickly realised that it would be too hard and too long for me alone to come to a decent editor. So I was looking to join an existing project, and I choose Quanta+, basically because it was (and still is, in my humble opinion) the best HTML editor in the KDE environment. It seemed I came to Quanta+ exactly at the best time: they were considering to add WYSIWYG capabilities! So for now one year, I've been coded VPL during my free time, and I am not far from a stable status. 2) Some definitions First let us quickly define some things in order to better understand the next parts. * XML (http://www.w3.org/XML/): Defined by the W3C (http://www.w3.org/), it is widely used as the next generation way to exchange and store data. Many file formats are based on it, e.g. OpenOffice files, Quanta's data files, and recent HTML files. Just open one of quanta .tag file to see what it looks like (quanta/data/dtep/**/**.tag). * SGML (more infos here: http://www.w3.org/MarkUp/SGML/): The ancestor of XML, is less strict, but looks like XML. The old HTML file formats are based on him. * DTD : Document Type Definition, define how a XML file should look like e.g. which elements are allowed in one. For example when we speak of HTML, we usually speak of the HTML DTD, which tells us what elements exists (A/IMG/TABLE/...) and how to use them (TBODY inside TABLE,...). * HTML (http://www.w3.org/MarkUp/): Hey, we all know what it is!! Yep, but for some people (/me looking at myself one year ago), it only exists one sort of HTML. In fact, the current version of HTML is 4.01, and it exists three versions of HTML DTD: HTML transitional, HTML strict and HTML frameset. HTML transitional includes all the elements plus the deprecated ones, HTML strict includes all the elements minus the deprecated ones and the HTML frameset includes all the elements necessary to build some frames. These HTML DTDs are using SGML, that is why there are not recommended. Instead the following DTDs are recommended: * XHTML (http://www.w3.org/TR/xhtml1/): We have the XHTML 1.0 Transitional/Strict/Frameset DTDs which are basically the same thing that the HTML Transitional/Strict/Frameset DTDs but it is using XML. And finally we have XHTML 1.1, and the upcoming XHTML 2.0. * CSS (http://www.w3.org/Style/CSS/): It is a way to add style (e.g. fonts, color,...) to a web page. It was created in order to separate the contents (the information) from the style. * DOM (http://www.w3.org/DOM/) is a sort of "treeview" of a XML/SGML file. E.g. text has for DOM representation: HTML *-- BODY *-- #text (text) *-- IMG *-- attribute (name:href, value:boo) * DTEP : (stands for Document Type Editing Package) It is Quanta's way to store the DTD information (and also includes supplemental elements like toolbars and more - see the .tag files in quanta/data/dtep). Why not use the DTD file directly? Because it doesn't contains all we want (no descriptions) and are written in a very odd way (just take a look... You will get sick soon :) 3) A quick overview of the Quanta/KHTML stuff interacting with VPL. First, the most important thing: the parser. Defined in the quanta/parser/ directory, it is composed of the Node class, the Tag class, the Parser class and the QTag class. The parser reads and parses (Parser::parse) or rebuilds from an already parsed document (Parser::rebuild) a Node Tree, which is basically a DOM like representation of the document, but even closing Tags and empty text are represented (as well as server side scripting elements like PHP.) In fact, everything is put in the tree so that we can get back the original SGML/XML file from the tree. From now, I call it the Node tree. For example text has for Node tree: HTML *-- BODY *-- #text (text) *-- IMG (attr1 name:href, value:boo) *-- Empty text ( ) *-- /BODY /HTML The Node class handle the pointers to the parent, next, previous and first child Node. *Each* Node has a valid pointer to a Tag. The Tag takes care to remember all the information concerning the Tag itself, like the attributes, the type, etc...) One QTag per Element is created from the .tag files when Quanta is started. Each QTag contains all the DTD information about the Tag. E.g. the "IMG" Qtag says that it is a single Tag, and what are its attributes. You can get a QTag with QuantaCommon::tagFromDTD, but don't delete the QTag! Now to khtml. The class KHTMLPart is the HTML renderer widget of konqueror. It internally works with a Node Tree (another? Yep!) but these Nodes are real DOM::Nodes. (From now, I will call it the DOM::Node tree) Each of the DOM Nodes is khtml-internally linked to a rendering Node i.e. a change made to one DOM::Node will update the HTML rendering cf /path/to/kde/include/dom/*.h and also in the kdelibs cvs module, cf the nice kdelibs/khtml/DESIGN.html. WARNING about DOM::Nodes, they are just interfaces!! 4) Basic design and interaction with Quanta. Now we will enter VPL itself. VPL stands for Visual Page Layout, but you may as well call it WYSIWYG (What you See Is What You Get). (Eric's note: Except of course that HTML only suggests layout as opposed to a desktop publishing program unless you use absolute CSS very carefully. So WYSIWYG really is a ficticious misnomer with HTML.) ;-) First have in mind that when editing a HTML file in Quanta, the Node Tree is always up to date. Loading a new file/switching tabs calls Parser::parse, and typing a letter calls Parser::rebuild. Then we can see the VPL design as this: Source (XML file) <=> Node tree <=> DOM::Node tree. Then when a change is made to the source file, Parser::rebuild is called and synchronize (not really, we will see this later) the corresponding DOM::Node. In the opposite, when a DOM::Node is modified, the corresponding Node is synchronized, and the source file is modified. Of course, it is a little more complicated, but let's see this later. 5) VPL classes. VPL has several classes, but note sometimes it is not really object oriented, but I will clean up soon. * KafkaWidget(kafkahtmlpart.[h|cpp]): Derived from KHTMLPart, it uses the caret mode implemented by Leo Savernik in khtml (that means we don't have to care about cursor navigation). It handles every keypress in order to edit the widget (backspace/delete/return/) and modify only the DOM::Node tree (not the Node tree). * KafkaDocument(wkafkapart.[h|cpp]): It takes care to load the DOM::Node tree from the Node tree, and when a change is made to the DOM::Node tree, it apply it in the Node tree. It basically takes care of the synchronization of the trees. * kafkaCommon(kafkacommon.[h|cpp]): A lot of useful functions, some need to be moved... Everything for Node modification, Node tree modification, DOM::Node tree, DOM::Node modification is here. * undoRedo(undoredo.[h|cpp]): Not functional yet, but it is intended to provide undo/redo functionality to both VPL and Quanta. But you are invited to use its structures. See section ? for more informations. * kNodeAttrs(nodeproperties.[h|cpp]): We can easily put a link to a DOM::Node from a Node in the Node class, but the opposite is impossible (we can't derive them). So we have a link DOM::Node => kNodeAttrs => Node (thanks to a QPtrDict). And we also have some informations about the way to handle this node when editing VPL. (Be careful, one Node can be linked against several Nodes. See after.) * NodeEnhancer(nodeenhancer.[h|cpp]): It is an interface class. Its aim it to "transform" or "enhance" DOM::Nodes when DOM::Nodes are synchronized from Nodes. Sometimes it is to add some style (e.g. adding a red dotted border to FORM elements) but sometimes it is essential (e.g. KHTML won't accept a TABLE without a TBODY even if it is DTD valid, so we had to manually add a TBODY. It explain why some Nodes can point to more than 1 Nodes.) * HTMLEnhancer(htmlenhancer.[h|cpp]): Derived from NodeEnhancer, it apply transformations for HTML files. * htmlDocumentProperties(htmldocumentproperties.[h|cpp]): A simple quick start dialog, which needs some work. 6) Synchronizations So basically, we have the following design: In whatever views, changes are made. These changes are directly applied to the Node tree. But we will wait xxx ms before an update of the opposite view (configurable). In fact, it will *reload* the opposite view. Since the beginning, I wanted to update only the modified Nodes (UndoRedo's job) but I did some bad work and almost everything is commented now. So we are coming back to UndoRedo. Even if it is still not working, we should use it. You may have noticed that every function which modifies the Node tree has a strange parameter, a NodeModifsSet. In fact, when you start modifying the Node tree, everything must be recorded. It has two aims: * To provide undo/redo capabilities. * To know which Nodes have been modified and thus when synchronizing a view from another, to update only the modified Nodes. Finally an important boolean which needs to be mentioned: Tag::cleanStrBuilt This boolean specifies if the string stored in Tag::tagStr() is valid. In fact, when a change is made to a DOM::Node and a Node is being modified, only its members are modified (the attrs, the name, the type), but the tag string is not rebuilt (to save some CPU). When reloading the source view, every tag with the Tag::cleanStrBuilt set to false has its string rebuilt. For text Nodes, if: Tag::cleanStrBuilt == false, the text contains no entities e.g. "boo a" as seen in the VPL view. Tag::cleanStrBuilt == true, the text is parsed e.g. "boo a" as seen in the source view. Question: Why not (internally) directly modify the source view in dialogs (e.g. reading and modifying the HTML markup)? Answer: Because it will limit this dialog to the source view only! When editing the document in whatever view: VPL, source but also others like the node treeview we have at the upper left, the only thing always in synchronization is the Node tree!! A change made in the source calls Parser::rebuild, a change in VPL update the Node tree. So a dialog working on Nodes is nicer, because it will update the Node tree, and each view will be able to update itself from the Node Tree. For the moment, the only dialog working this way is the Document Properties Dialog. The Table editor dialog, for example, doesn't work in VPL because it is based on the source view. Question: But when a change is made in VPL, why not directly updating the source? Answer: Because we will end up with a very unresponsive VPL! Updating two Nodes tree and a source view require a bit of CPU! 7)TODO * Implement some missing things: copy/paste, bad cursor behaviour,... * Port the quanta dialogs (table editor,...) so that they can work in VPL * Make VPL a KPart * Make undo/redo works * Put a error system * Make VPL works for non-HTML DTDs! * Visual PHP edition (??) * Complete this file ;-) You can tell me what needs to be more explained! HAPPY HACKING! Nicolas Deschildre