-------------------------------------------------------------------------------
This document is too incomplete to be of much use.
Patches are welcome!


Theory of operation
-------------------

Uncrustify goes through several steps to reformat code.
The first step, parsing, is the most complex and important.


Step 1 - Tokenize
-----------------
C code must be understood to some degree before it can be indented properly.
The parsing step reads in a text buffer, breaks it into chunks, and puts
those chunks in a list.

When a chunk is parsed, the original column and line are recorded.

These are the chunks that are parsed:
- punctuators
- numbers
- words (keywords, variables, etc.)
- comments
- strings
- whitespace
- preprocessors

See token_enum.h for a complete list.
See punctuators.cpp and keywords.cpp for examples of how they are used.

In the code, chunk types are prefixed with 'CT_'.
The CT_WORD token is changed into a more specific token using the lookup table
in keywords.cpp.
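The tokenize step described above can be sketched roughly as follows. This is
an illustration only, not the code in tokenize.cpp: the Chunk struct, the
tokenize() function, the kKeywords table and the CT_PUNCT, CT_IF and CT_RETURN
names are simplified stand-ins chosen for this example. The sketch skips
whitespace instead of recording it, and it does not handle comments, strings
or preprocessor lines.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-ins for the chunk types; token_enum.h has the real list.
enum TokenType { CT_WORD, CT_NUMBER, CT_PUNCT, CT_IF, CT_RETURN };

struct Chunk
{
   TokenType   type;
   std::string text;
   std::size_t orig_line; // 1-based line in the input buffer
   std::size_t orig_col;  // 1-based column in the input buffer
};

// Hypothetical keyword table, standing in for the lookup table in keywords.cpp.
static const std::unordered_map<std::string, TokenType> kKeywords = {
   { "if", CT_IF }, { "return", CT_RETURN },
};

static bool is_word_char(char c)
{
   return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Break the buffer into chunks, recording where each chunk started.
std::vector<Chunk> tokenize(const std::string &buf)
{
   std::vector<Chunk> chunks;
   std::size_t        line = 1, col = 1, i = 0;

   while (i < buf.size())
   {
      char ch = buf[i];
      if (ch == '\n') { ++line; col = 1; ++i; continue; }
      if (std::isspace(static_cast<unsigned char>(ch))) { ++col; ++i; continue; }

      std::size_t start = i;
      TokenType   type;
      if (std::isdigit(static_cast<unsigned char>(ch)))
      {
         while (i < buf.size() && std::isdigit(static_cast<unsigned char>(buf[i]))) { ++i; }
         type = CT_NUMBER;
      }
      else if (is_word_char(ch))
      {
         while (i < buf.size() && is_word_char(buf[i])) { ++i; }
         type = CT_WORD;
      }
      else
      {
         ++i; // every other character is a one-character punctuator here
         type = CT_PUNCT;
      }
      std::string text = buf.substr(start, i - start);
      assert(!text.empty());

      if (type == CT_WORD)
      {
         // Change CT_WORD into a more specific type if it is a keyword.
         auto it = kKeywords.find(text);
         if (it != kKeywords.end()) { type = it->second; }
      }
      chunks.push_back({ type, text, line, col });
      col += text.size();
   }
   return chunks;
}
```

Calling tokenize("if (x)\n  return 42;") yields seven chunks, each carrying the
line and column where it started in the input.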


Step 2 - Tokenize Cleanup
-------------------------

The second step changes the token type of certain constructs that need to be
adjusted early on.
For example, the '<' token can be either a CT_COMPARE or a CT_ANGLE_OPEN.
The two are handled very differently.
If a CT_WORD follows CT_ENUM/CT_STRUCT/CT_UNION, then it is marked as a CT_TYPE.
Basically, anything that doesn't depend on the nesting level can be done at this
stage.
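The enum/struct/union rule above could look something like the following
sketch. The Chunk struct, the function names and the CT_OTHER catch-all are
hypothetical and only serve this example; the real cleanup code works on the
real chunk list and covers many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified subset of the chunk types; CT_OTHER is a hypothetical catch-all.
enum TokenType { CT_WORD, CT_TYPE, CT_ENUM, CT_STRUCT, CT_UNION, CT_OTHER };

struct Chunk
{
   TokenType   type;
   std::string text;
};

// Retype chunks[i] as CT_TYPE if it is a CT_WORD directly after
// enum/struct/union. Only the previous chunk is looked at, so no nesting
// information is needed.
void mark_type_after_tag(std::vector<Chunk> &chunks, std::size_t i)
{
   assert(i > 0 && i < chunks.size());
   TokenType prev = chunks[i - 1].type;

   if (  chunks[i].type == CT_WORD
      && (prev == CT_ENUM || prev == CT_STRUCT || prev == CT_UNION))
   {
      chunks[i].type = CT_TYPE;
   }
}

// Apply the rule to every chunk that has a predecessor.
void mark_tagged_types(std::vector<Chunk> &chunks)
{
   for (std::size_t i = 1; i < chunks.size(); ++i)
   {
      mark_type_after_tag(chunks, i);
   }
}
```

For the chunks of "struct foo x;", only 'foo' is retyped; 'x' stays a CT_WORD
and is left for a later step.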


Step 3 - Brace Cleanup
----------------------

This is possibly the most difficult step.
do/if/else/for/switch/while bodies are examined and virtual braces are added.
Brace parent types are set.
Statement starts and expression starts are labeled.
#ifdef constructs are also handled.

This step determines the levels (m_braceLevel, level and m_ppLevel).

REVISIT:
The code in brace_cleanup.cpp needs to be reworked to take advantage of being
able to scan forward and backward. The original code was going to be merged
into tokenize.cpp, but that was WAY too complex.
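As a rough illustration of the level tracking mentioned in this step, the
sketch below assigns two nesting counters to each chunk. It assumes that
'level' counts nesting in braces, parentheses and square brackets, and that
the brace level counts braces only; that reading is an assumption, not
something this document states. The Chunk struct and assign_levels() are
hypothetical, and virtual braces, parent types and preprocessor levels are
left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, stripped-down chunk for this sketch.
struct Chunk
{
   std::string text;
   std::size_t level;       // assumed: nesting in '{', '(' or '['
   std::size_t brace_level; // assumed: nesting in braces only
};

// Walk the list once. An opener and its matching closer both get the level
// of the enclosing scope; everything between them is one level deeper.
void assign_levels(std::vector<Chunk> &chunks)
{
   std::size_t level       = 0;
   std::size_t brace_level = 0;

   for (Chunk &pc : chunks)
   {
      const std::string &t = pc.text;

      if (t == "}" || t == ")" || t == "]")
      {
         assert(level > 0); // the sketch assumes balanced input
         --level;
         if (t == "}")
         {
            assert(brace_level > 0);
            --brace_level;
         }
      }
      pc.level       = level;
      pc.brace_level = brace_level;

      if (t == "{" || t == "(" || t == "[")
      {
         ++level;
         if (t == "{") { ++brace_level; }
      }
   }
}
```

For the chunks of "f ( a ) { x ; }", 'a' ends up at level 1 with brace level 0,
while 'x' is at level 1 with brace level 1.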


Step 4 - Fix Symbols (combine.cpp)
----------------------------------

This step is no longer properly named.
In the original design, neighboring chunks were to be combined into longer
chunks. This proved to be a silly idea, but the name of the file stuck.

This is where most of the interesting identification stuff goes on.
Colon types are detected, variables are marked, functions are labeled, etc.
Also, all the punctuators are classified. For example, CT_MINUS becomes CT_NEG
or CT_ARITH.

- Types are marked.
- Functions are marked.
- Parentheses and braces are marked where appropriate.
- Casts are found and marked.
- Variable definitions are found and marked (for aligning).
- Assignments that may be aligned are found and marked.
- CT_INCDEC_AFTER is changed to CT_INCDEC_BEFORE.
- CT_STAR is changed to either CT_PTR_TYPE, CT_DEREF or CT_ARITH.
- CT_MINUS is changed to either CT_NEG or CT_ARITH.
- CT_PLUS and CT_ADDR are changed to CT_ARITH, if needed.
- other stuff?
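To make the CT_MINUS item above concrete, here is one possible way to split
it into CT_NEG and CT_ARITH. The rule used (a minus that follows something
that can end an operand is arithmetic, otherwise it is a negation) is an
assumption made for this sketch, not necessarily the rule combine.cpp applies.
The Chunk struct, the function names and the reduced type list are likewise
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reduced set of chunk types, just enough for this sketch.
enum TokenType
{
   CT_WORD, CT_NUMBER, CT_PAREN_OPEN, CT_PAREN_CLOSE,
   CT_ASSIGN, CT_MINUS, CT_NEG, CT_ARITH
};

struct Chunk
{
   TokenType   type;
   std::string text;
};

// Assumed rule: a minus directly after a word, a number or a closing paren
// is a binary operator (CT_ARITH); any other minus is unary (CT_NEG).
void classify_minus(std::vector<Chunk> &chunks, std::size_t i)
{
   assert(i < chunks.size() && chunks[i].type == CT_MINUS);
   bool after_operand = false;

   if (i > 0)
   {
      TokenType prev = chunks[i - 1].type;
      after_operand = (prev == CT_WORD || prev == CT_NUMBER || prev == CT_PAREN_CLOSE);
   }
   chunks[i].type = after_operand ? CT_ARITH : CT_NEG;
}

// Classify every CT_MINUS in the list.
void classify_all_minus(std::vector<Chunk> &chunks)
{
   for (std::size_t i = 0; i < chunks.size(); ++i)
   {
      if (chunks[i].type == CT_MINUS) { classify_minus(chunks, i); }
   }
}
```

In "a = - b - 1", the first minus follows '=' and becomes CT_NEG; the second
follows the word 'b' and becomes CT_ARITH.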


Casts
-----
Casts are detected as follows:
- paren pair not part of if/for/etc. nor part of a function
- contains only CT_QUALIFIER, CT_TYPE, '*', and no more than one CT_WORD
- is not followed by CT_ARITH

Tough cases:
  (foo) * bar;
This is either a cast of '*bar' to type foo, or the variable foo multiplied
by bar.

If uncertain about a cast like this: (foo_t), some simple rules are applied.
If the word ends in '_t', it is a cast, unless followed by '+'.
If the word is all caps (FOO), it is a cast.
If you use custom types (very likely) that aren't detected properly (unlikely),
then add them to the config file like so (example using C-Sharp types):
  type UInt32 UInt16 UInt8 Byte
  type Int32 Int16 Int8
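The two fallback rules for an uncertain cast ('_t' suffix and all caps) can be
written down as a small predicate. This is a sketch of the rules as stated in
this section, not the actual detection code; looks_like_cast() and its
parameters are hypothetical names, and a word that matches neither rule is
simply reported as "not a cast" here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Guess whether '(word)' is a cast, using only the simple rules above.
// 'followed_by_plus' says whether the chunk after the ')' is a '+'.
bool looks_like_cast(const std::string &word, bool followed_by_plus)
{
   assert(!word.empty());

   // Rule 1: a word ending in '_t' is a cast, unless followed by '+'.
   if (word.size() >= 2 && word.compare(word.size() - 2, 2, "_t") == 0)
   {
      return !followed_by_plus;
   }

   // Rule 2: an all-caps word (FOO) is a cast. Digits and underscores are
   // allowed, but at least one letter is required and none may be lower case.
   bool has_upper = false;
   for (unsigned char ch : word)
   {
      if (std::islower(ch)) { return false; }
      if (std::isupper(ch)) { has_upper = true; }
   }
   return has_upper;
}
```

With these rules, looks_like_cast("foo_t", false) and looks_like_cast("FOO",
false) are true, while looks_like_cast("foo_t", true) and
looks_like_cast("foo", false) are false.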


Step 6+ Everything else
-----------------------

From this point on, many filters are run on the chunk list to change the
token columns.

indent.cpp sets the left-most column.
align.cpp sets the column for individual chunks.
space.cpp sets the spacing between chunks.
Others insert newlines, change token position, etc.


Last Step - Output
------------------

At the final step the list is printed to the output.
Everything except comments is printed as-is.
Comments are reformatted in the output stage.