You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

130 lines
4.0 KiB

-------------------------------------------------------------------------------
This document is too incomplete to be of much use.
Patches are welcome!
Theory of operation
-------------------
Uncrustify goes through several steps to reformat code.
The first step, parsing, is the most complex and important.
Step 1 - Tokenize
-----------------
C code must be understood to some degree to be able to be properly indented.
The parsing step reads in a text buffer and breaks it into chunks and puts
those chunks in a list.
When a chunk is parsed, the original column and line are recorded.
These are the chunks that are parsed:
- punctuators
- numbers
- words (keywords, variables, etc)
- comments
- strings
- whitespace
- preprocessors
See token_enum.h for a complete list.
See punctuators.cpp and keywords.cpp for examples of how they are used.
In the code, chunk types are prefixed with 'CT_'.
The CT_WORD token is changed into a more specific token using the lookup table
in keywords.cpp
Step 2 - Tokenize Cleanup
-------------------------
The second step is to change the token type for certain constructs that need
to be adjusted early on.
For example, the '<' token can be either a CT_COMPARE or CT_ANGLE_OPEN.
Both are handled very differently.
If a CT_WORD follows CT_ENUM/CT_STRUCT/CT_UNION, then it is marked as a CT_TYPE.
Basically, anything that doesn't depend on the nesting level can be done at this
stage.
Step 3 - Brace Cleanup
-------------------------
This is possibly the most difficult step.
do/if/else/for/switch/while bodies are examined and virtual braces are added.
Brace parent types are set.
Statement start and expression starts are labeled.
And #ifdef constructs are handled.
This step determines the levels (m_braceLevel, level and m_ppLevel).
REVISIT:
The code in brace_cleanup.cpp needs to be reworked to take advantage of being
able to scan forward and backward. The original code was going to be merged
into tokenize.cpp, but that was WAY too complex.
Step 4 - Fix Symbols (combine.cpp)
----------------------------------
This step is no longer properly named.
In the original design, neighboring chunks were to be combined into longer
chunks. This proved to be a silly idea. But the name of the file stuck.
This is where most of the interesting identification stuff goes on.
Colons type are detected, variables are marked, functions are labeled, etc.
Also, all the punctuators are classified. Ie, CT_MINUS become CT_NEG or CT_ARITH.
- Types are marked.
- Functions are marked.
- Parenthesis and braces are marked where appropriate.
- finds and marks casts
- finds and marks variable definitions (for aligning)
- finds and marks assignments that may be aligned
- changes CT_INCDEC_AFTER to CT_INCDEC_BEFORE
- changes CT_STAR to either CT_PTR_TYPE, CT_DEREF or CT_ARITH
- changes CT_MINUS to either CT_NEG or CT_ARITH
- changes CT_PLUS and CT_ADDR to CT_ARITH, if needed
- other stuff?
Casts
-----
Casts are detected as follows:
- paren pair not part of if/for/etc nor part of a function
- contains only CT_QUALIFIER, CT_TYPE, '*', and no more than one CT_WORD
- is not followed by CT_ARITH
Tough cases:
(foo) * bar;
If uncertain about a cast like this: (foo_t), some simple rules are applied.
If the word ends in '_t', it is a cast, unless followed by '+'.
If the word is all caps (FOO), it is a cast.
If you use custom types (very likely) that aren't detected properly (unlikely),
the add them to the config file like so: (example Using C-Sharp types)
type UInt32 UInt16 UInt8 Byte
type Int32 Int16 Int8
Step 6+ Everything else
-------------------------
From this point on, many filters are run on the chunk list to change the
token columns.
indent.cpp sets the left-most column.
align.cpp set the column for individual chunks.
space.cpp sets the spacing between chunks.
Others insert newlines, change token position, etc.
Last Step - Output
-------------------------
At the final step the list is printed to the output.
Everything except comments are printed as-is.
Comments are reformatted in the output stage.