To fix some problems with lexing in Scintilla and to add new capabilities, there are going to be some major changes. It is likely these will go into the release after the next one. This release will be called 2.20 as the changes are not completely backwards compatible.
A problem with current Scintilla is that lexers and lexer options such as properties and keywords are attached to the view (ScintillaBase) object rather than the Document object. When two views are showing one document then it is possible for two different lexers to be called to style the text leading to arbitrary and confusing results.
To fix this, lexer state is being moved from ScintillaBase to Document although the state is still being set up by ScintillaBase as it is providing the API to client code.
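The ownership change can be pictured with a minimal sketch. The Doc, View and LexerState types below are hypothetical stand-ins invented for illustration, not the real Scintilla classes; the only point is that the view keeps the API while the state lives with the document.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins; these are not the real Scintilla classes.
struct LexerState {
    std::string language;            // which lexer styles the document
};

struct Doc {
    LexerState lexState;             // lexer state lives with the document
};

struct View {
    std::shared_ptr<Doc> doc;
    // The view still exposes the API but forwards to the document.
    void SetLexerLanguage(const std::string &name) { doc->lexState.language = name; }
    const std::string &LexerLanguage() const { return doc->lexState.language; }
};

// Two views on one document can no longer disagree about the lexer.
inline bool ViewsAgree() {
    auto doc = std::make_shared<Doc>();
    View a{doc};
    View b{doc};
    a.SetLexerLanguage("cpp");
    return b.LexerLanguage() == "cpp";
}
```

Setting the language through either view changes what both views see, which is the behaviour the move is meant to guarantee.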
This will change the scope of some settings so may require changes to applications. Applications that only set up properties or word lists at initialisation or when changing languages will have to repeat these for each document. Conversely, there will no longer be a need to set parameters for each view on a document or when switching between documents on a view since documents retain settings.
Some languages may benefit from features like styling local variables differently to global variables or showing fields that are not present in a structure in an error style. These sorts of features require that something like a symbol table is maintained by the lexer.
Lexers currently have only limited space to store information about each document: the document's style bytes and line state (a single integer per line). There are some other locations that could be used, like unused bits in folding state, but using these for lexer state may not be compatible with future changes. This makes it too difficult to implement a symbol table with only the current features.
The solution is to create a lexer object which can contain arbitrary additional data. Each document has a separate lexer object. Lexer objects implement the ILexer interface.
The lexer object may contain any data required for the functioning of the lexer. This can include information extracted from the document such as a list of functions.
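As a rough illustration of a lexer object carrying extracted data, the sketch below keeps a set of function names. SymbolLexer, Scan and IsFunction are hypothetical names, and the detection rule (an identifier immediately followed by an opening parenthesis) is a deliberate simplification, not how any Scintilla lexer works.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical lexer object; a real ILexer implementation has more methods.
class SymbolLexer {
    std::set<std::string> functions;   // data extracted from the document
public:
    // Record every identifier that is immediately followed by '('.
    void Scan(const std::string &text) {
        functions.clear();
        std::string word;
        for (char ch : text) {
            if (std::isalnum(static_cast<unsigned char>(ch)) || ch == '_') {
                word += ch;
            } else {
                if (ch == '(' && !word.empty())
                    functions.insert(word);
                word.clear();
            }
        }
    }
    bool IsFunction(const std::string &name) const {
        return functions.count(name) != 0;
    }
};
```

Because each document has its own lexer object, each document gets its own symbol set without using style bytes or line state.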
There is currently no way for a lexer to indicate that changing a property or keyword list should cause restyling. In SciTE, you can for example add a keyword to keywordclass.cpp, then return to a C++ document and not see any change to existing styles. Only lexing done after the addition will use the new keywords. Lexer objects will be responsible for storing properties and word lists. They provide PropertySet and WordListSet methods to receive these parameters and return a position where lexing should be restarted from (normally 0 although lexers may be more intelligent about this) or -1 if the change does not affect lexing or folding.
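The return convention can be sketched as follows. PropLexer is a hypothetical class and the parameter lists are assumptions made for this example; only the meaning of the return value (a restart position, or -1 for no effect) comes from the description above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lexer fragment; names mirror the text but this is not the real interface.
class PropLexer {
    std::map<std::string, std::string> props;
    std::string keywords[2];
public:
    // Returns 0 to request restyling from the start, or -1 when nothing changed.
    int PropertySet(const char *key, const char *val) {
        std::string &slot = props[key];
        if (slot == val)
            return -1;
        slot = val;
        return 0;
    }
    // Same convention for word lists; out-of-range sets are ignored.
    int WordListSet(int n, const char *wl) {
        if (n < 0 || n >= 2 || keywords[n] == wl)
            return -1;
        keywords[n] = wl;
        return 0;
    }
};
```

A caller would restyle from the returned position only when it is not -1, so repeating an unchanged setting costs nothing.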
Release is called to destroy the lexer object.
PrivateCall allows for direct communication between the application and a lexer. An example would be where an application maintains a single large data structure containing symbolic information about system headers (like Windows.h) and provides this to the lexer where it can be applied to each document. This avoids the costs of constructing the system header information for each document. This is invoked with the SCI_PRIVATELEXERCALL API.
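A sketch of the shared-header-data idea is shown here. The operation code opSetSystemSymbols, the SystemSymbols structure and the HeaderAwareLexer class are all invented for illustration, and the shape of PrivateCall (an operation number plus an untyped pointer) is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Built once by the application and shared by every document's lexer.
struct SystemSymbols {
    std::set<std::string> names;
};

// Hypothetical operation code, invented for this sketch.
const int opSetSystemSymbols = 1;

class HeaderAwareLexer {
    const SystemSymbols *shared = nullptr;   // not owned by the lexer
public:
    // Assumed shape: an operation number plus an untyped pointer.
    void *PrivateCall(int operation, void *pointer) {
        if (operation == opSetSystemSymbols)
            shared = static_cast<const SystemSymbols *>(pointer);
        return nullptr;
    }
    bool IsSystemSymbol(const std::string &name) const {
        return shared && shared->names.count(name) != 0;
    }
};
```

The application keeps ownership of the shared table, so it must outlive every lexer object that was given a pointer to it.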
Currently lexers interact with the document through a concrete class derived from the Accessor abstract base class with Accessor providing most of the functionality and the derived class implementing communication with the document. This is either direct (DocumentAccessor) for lexers linked into Scintilla or through messages (WindowAccessor) for external lexers. In the new scheme, the only way of performing communications with the document is through the IDocument interface which can be used for external lexers as well as lexers linked into Scintilla.
This avoids dependence on GUI windowing code and makes it easier to move a lexer between being housed in a shared library and being linked into Scintilla. It could also be used for lexers housed within application code although this has not yet been implemented.
Since IDocument is an interface, it can be used across build boundaries (such as between two DLLs) where the implementation cannot be seen from the client so cannot be optimized by the compiler. This gets in the way of efficient buffering, so the task of buffering is moved to a helper class that is local to the lexer. Example helper classes are the simple LexAccessor and its subclass Accessor which provides more services. These may be used by lexers or lexers may create their own helper classes.
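The buffering idea can be sketched with a small lexer-local reader. IDocSketch is a hypothetical cut-down interface with two assumed methods, BufferedReader is an invented helper rather than the real LexAccessor, and StringDoc exists only to count how often the interface is crossed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical cut-down document interface; the real IDocument has more methods.
class IDocSketch {
public:
    virtual ~IDocSketch() = default;
    virtual int Length() const = 0;
    virtual void GetCharRange(char *buffer, int position, int length) const = 0;
};

// Lexer-local helper: one virtual call fills a window, later reads are inline.
class BufferedReader {
    static constexpr int bufferSize = 64;
    const IDocSketch &doc;
    char buf[bufferSize];
    int startPos = 0;
    int endPos = 0;                  // buffer holds [startPos, endPos)
    void Fill(int position) {
        startPos = position;
        endPos = std::min(position + bufferSize, doc.Length());
        doc.GetCharRange(buf, startPos, endPos - startPos);
    }
public:
    explicit BufferedReader(const IDocSketch &doc_) : doc(doc_) {}
    char operator[](int position) {
        if (position < 0 || position >= doc.Length())
            return ' ';              // outside the document: return a filler character
        if (position < startPos || position >= endPos)
            Fill(position);
        return buf[position - startPos];
    }
};

// Test double that counts crossings of the interface.
class StringDoc : public IDocSketch {
    std::string text;
public:
    mutable int calls = 0;
    explicit StringDoc(std::string s) : text(std::move(s)) {}
    int Length() const override { return static_cast<int>(text.size()); }
    void GetCharRange(char *buffer, int position, int length) const override {
        ++calls;
        std::memcpy(buffer, text.data() + position, static_cast<std::size_t>(length));
    }
};
```

Reading 64 consecutive characters costs one virtual call instead of 64, which is the saving the helper class is there to provide.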
The use of interfaces between components is similar to COM or XPCOM. Using actual COM or XPCOM would add complexity. The interfaces are defined as C++ but can be emulated by C and probably by other languages that are compatible with COM. SCI_METHOD is defined to be whatever is needed to specify a reasonable calling convention on each platform so that each side of the interface can call the other. This is currently __stdcall on Windows and is unspecified on Unix.
The ILexer and IDocument interfaces may be expanded in the future with extended versions (ILexer2...). The Version method indicates which interface is implemented and thus which methods may be called.
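One way a caller might use Version is sketched below. The interfaces ILexerSketch and ILexerSketch2, the constants lvOriginal and lvExtended, and the ExtraFeature method are all hypothetical; the sketch also uses a virtual destructor for brevity where the text describes a Release method.

```cpp
#include <cassert>

// Hypothetical version constants and interfaces, invented for this sketch.
enum { lvOriginal = 0, lvExtended = 1 };

class ILexerSketch {
public:
    virtual ~ILexerSketch() = default;
    virtual int Version() const = 0;
    virtual void Lex() = 0;
};

// An extended interface adds methods after the existing ones.
class ILexerSketch2 : public ILexerSketch {
public:
    virtual int ExtraFeature() = 0;
};

// The caller downcasts only after Version confirms the extended interface.
inline int CallExtraIfAvailable(ILexerSketch *lexer) {
    if (lexer->Version() >= lvExtended)
        return static_cast<ILexerSketch2 *>(lexer)->ExtraFeature();
    return -1;
}

class OldLexer : public ILexerSketch {
public:
    int Version() const override { return lvOriginal; }
    void Lex() override {}
};

class NewLexer : public ILexerSketch2 {
public:
    int Version() const override { return lvExtended; }
    void Lex() override {}
    int ExtraFeature() override { return 42; }
};
```

An older lexer keeps working with a newer host because the host never calls a method beyond what the reported version promises.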
Scintilla tries to minimize the consequences of modifying text to only relex and redraw the line of the change where possible. Lexer objects contain their own private extra state which can affect later lines. For example, if the C++ lexer is greying out inactive code segments then changing the statement #define BEOS 0 to #define BEOS 1 may require restyling and redisplaying later parts of the document. The lexer can call ChangeLexerState to signal to the document that it should relex and display more.
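The BEOS example might be handled along these lines. DocSketch and PreprocLexer are hypothetical, and the (start, end) parameters given to ChangeLexerState here are an assumption, since the text names the call but not its arguments.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical document side; the (start, end) parameters are an assumption.
struct DocSketch {
    int length = 0;
    int relexFrom = -1;
    int relexTo = -1;
    void ChangeLexerState(int start, int end) { relexFrom = start; relexTo = end; }
};

class PreprocLexer {
    std::map<std::string, std::string> defines;   // private state affecting later lines
public:
    // Called when lexing meets "#define name value" at position pos.
    void SeeDefine(DocSketch &doc, int pos, const std::string &name, const std::string &value) {
        auto it = defines.find(name);
        const bool changed = (it == defines.end()) || (it->second != value);
        defines[name] = value;
        if (changed)
            doc.ChangeLexerState(pos, doc.length);   // later text may flip active/inactive
    }
};
```

Relexing the same unchanged line does not trigger a wider restyle; only a real change to the stored definition does.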
SetErrorStatus is used to notify the document of exceptions. Exceptions should not be thrown over build boundaries as the two sides may be built with different compilers or incompatible exception options.
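A sketch of catching at the boundary and reporting through the document follows. The status codes, the DocStatusSketch type, and the StyleRange and LexBoundary functions are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

// Hypothetical status codes and document stand-in.
enum { statusOk = 0, statusFailure = 1, statusBadAlloc = 2 };

struct DocStatusSketch {
    int status = statusOk;
    void SetErrorStatus(int s) { status = s; }
};

// Work that may throw inside the lexer's own build.
inline void StyleRange(bool fail) {
    if (fail)
        throw std::runtime_error("lexer failure");
}

// Boundary method: catch everything and report through the document instead.
inline void LexBoundary(DocStatusSketch &doc, bool fail) noexcept {
    try {
        StyleRange(fail);
    } catch (const std::bad_alloc &) {
        doc.SetErrorStatus(statusBadAlloc);
    } catch (...) {
        doc.SetErrorStatus(statusFailure);
    }
}
```

No exception leaves LexBoundary, so the two sides never need to agree on exception handling.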
External lexers will require changes. They will have to implement a lexer object factory function (exposed through GetLexerFactory) instead of the current Lex and Fold functions. Once a lexer object has been created, it is called exactly the same as internal lexer objects.
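The factory arrangement could look roughly like this. ILexerMini is a hypothetical two-method interface, MyLexer is an invented lexer, and the index parameter and return type given to GetLexerFactory are assumptions; a real external lexer would also need platform-specific export declarations that are omitted here.

```cpp
#include <cassert>

// Hypothetical two-method interface standing in for the full lexer interface.
class ILexerMini {
public:
    virtual void Release() = 0;      // destroys the lexer object
    virtual int Lex(int startPos, int length) = 0;
protected:
    ~ILexerMini() = default;         // destruction goes through Release only
};

class MyLexer final : public ILexerMini {
public:
    static int live;                 // count of live objects, for demonstration
    MyLexer() { ++live; }
    ~MyLexer() { --live; }
    void Release() override { delete this; }
    // Placeholder for styling: returns the end position of the range.
    int Lex(int startPos, int length) override { return startPos + length; }
    static ILexerMini *Factory() { return new MyLexer(); }
};
int MyLexer::live = 0;

typedef ILexerMini *(*LexerFactory)();

// Assumed shape of the exposed entry point: one factory per lexer index.
LexerFactory GetLexerFactory(unsigned int index) {
    return index == 0 ? MyLexer::Factory : nullptr;
}
```

The host asks for a factory, creates one lexer object per document, and later calls Release rather than delete, so allocation and deallocation both happen on the lexer's side of the boundary.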
Existing lexers do not have to change much as the LexerModule and LexerSimple classes provide a very similar environment. The set of headers used by lexers has changed but is fairly consistent among lexers so can just be copied from a lexer included with Scintilla. Lexers should not include Platform.h and only use headers from the include and lexlib directories. Using headers from the src, win32, or gtk directories makes the code dependent on features that may change so should not be done.
A lexer may be converted to an ILexer implementing class by defining a class derived from ILexer, defining a factory function, and changing the LexerModule to use the factory function rather than lexing and folding functions. Initially it is simplest to derive the class from LexerBase as this provides some default functionality including standard property set and word lists. Later these should be overridden to optimize changes to parameters.
Around 60 lines of boiler-plate additional code are needed to convert an existing lexer into an external lexer that implements ILexer.
An implementation of all this is available from http://www.scintilla.org/nulex.zip
Additional directories have been used to impose some more order on the source code. Lexers have been moved into the lexers directory and classes used by lexers are in the lexlib directory. The build files work for Windows and GTK+, but those for OS X have not been updated.
The C++ lexer included has some code to show whether or not code is active based on preprocessor state with inactive code shown in different styles to active code. This is turned on with lexer.cpp.track.preprocessor=1 and keywords5 containing a set of preprocessor definitions in the form <var>=<value> <var>=<value> ... Definitions within the source will be picked up if lexer.cpp.update.preprocessor=1. Both these options have some cost in terms of speed and memory. The inactive states are 64 greater than their active counterparts.
Example properties to achieve above:
lexer.cpp.track.preprocessor=1
lexer.cpp.update.preprocessor=1
keywords5.$(file.patterns.cpp)=\
PLAT_GTK=1 \
_MSC_VER \
PLAT_GTK_WIN32=1
# White space
style.cpp.64=fore:#808080,fore:#C0C0C0
# Comment: /* */.
style.cpp.65=$(style.cpp.1),fore:#90B090
style.cpp.66=$(style.cpp.2),fore:#90B090
style.cpp.67=$(style.cpp.3),fore:#D0D0D0
style.cpp.68=$(style.cpp.4),fore:#90B0B0
style.cpp.69=$(style.cpp.5),fore:#9090B0
style.cpp.70=$(style.cpp.6),fore:#B090B0
style.cpp.71=$(style.cpp.7),fore:#B090B0
style.cpp.72=$(style.cpp.8),fore:#C0C0C0
style.cpp.73=$(style.cpp.9),fore:#B0B090
style.cpp.74=$(style.cpp.10),fore:#B0B0B0
style.cpp.75=$(style.cpp.11),fore:#B0B0B0
style.cpp.76=$(style.cpp.12),fore:#000000
style.cpp.77=$(style.cpp.13),fore:#007F00
style.cpp.78=$(style.cpp.14),fore:#7FAF7F
style.cpp.79=$(style.cpp.15),fore:#C0C0C0
style.cpp.80=$(style.cpp.16),fore:#C0C0C0
style.cpp.81=$(style.cpp.17),fore:#C0C0C0
style.cpp.82=$(style.cpp.18),fore:#C0C0C0
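The style numbering above follows from the offset of 64. The helper names below are hypothetical, and the bit test in IsInactive assumes that all active styles are below 64, which holds for the styles 0 to 18 listed here.

```cpp
#include <cassert>

// Offset taken from the text: inactive styles are 64 above the active ones.
const int inactiveOffset = 64;

// Map an active style to its inactive counterpart, for example 1 to 65.
inline int InactiveStyle(int activeStyle) { return activeStyle + inactiveOffset; }

// Assumes active styles are below 64, so bit 6 marks an inactive style.
inline bool IsInactive(int style) { return (style & inactiveOffset) != 0; }

// Recover the active style that an inactive style was derived from.
inline int ActiveStyle(int style) { return style & ~inactiveOffset; }
```

This is why style.cpp.65 builds on $(style.cpp.1) and style.cpp.82 on $(style.cpp.18) in the properties above.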