Scintilla Future Features

Page last modified April 17 2001.

This page provides access to the source of the current beta version of Scintilla/SciTE (600K) and the Windows executable (320K), and discusses upcoming features. This page may be updated without the main Scintilla page being updated, and the beta download is refreshed whenever a new build is available, without any change to this page. New beta downloads are announced on the Scintilla interest mailing list.

Further out

Typesafe Scintilla core

Currently Scintilla provides an interface based on the Windows [pass in an int and two int/pointers and receive an int/pointer] convention and also uses this convention internally. This makes it difficult to port to typesafe VMs like Java and .NET. Separating a typesafe core from a non-typesafe API layer looks feasible, and I'd like Scintilla to run on these VMs as they look important for the future. Typesafe code is also inherently better, being more verifiable both at compile time and by runtime instrumentation. For .NET, managed C++ can be used (I don't see much benefit to C#). For Java, it should be possible to combine a restricted C++ subset, some coding conventions and some comment directives to allow automatic translation into Java source. The typesafe core does not appear to appeal enough to any commercial users to be sponsored, but it's interesting to me, so it is likely to be the next major feature implemented on my own time.
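
As a rough sketch of the separation described above, the following shows a small typed core with a thin message layer in front of it. All names and message numbers here (TypesafeCore, SendMessageShim, MSG_INSERTTEXT and so on) are hypothetical and are not Scintilla's real API; the point is only that pointer reinterpretation is confined to one function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical typesafe core: every operation has a fully typed signature.
class TypesafeCore {
public:
    void InsertText(std::size_t position, const std::string &text) {
        if (position > buffer_.size()) position = buffer_.size();  // clamp to end
        buffer_.insert(position, text);
    }
    std::size_t Length() const { return buffer_.size(); }
    char CharAt(std::size_t position) const {
        return position < buffer_.size() ? buffer_[position] : '\0';
    }
private:
    std::string buffer_;
};

// Hypothetical message numbers, chosen for this sketch only.
enum Message { MSG_INSERTTEXT = 1, MSG_GETLENGTH = 2, MSG_GETCHARAT = 3 };

// Non-typesafe API layer: the only place where an integer is treated as a pointer.
std::intptr_t SendMessageShim(TypesafeCore &core, int msg,
                              std::uintptr_t wParam, std::intptr_t lParam) {
    switch (msg) {
    case MSG_INSERTTEXT: {
        const char *text = reinterpret_cast<const char *>(lParam);
        if (text) core.InsertText(static_cast<std::size_t>(wParam), text);
        return 0;
    }
    case MSG_GETLENGTH:
        return static_cast<std::intptr_t>(core.Length());
    case MSG_GETCHARAT:
        return static_cast<unsigned char>(core.CharAt(static_cast<std::size_t>(wParam)));
    default:
        return 0;  // unknown messages are ignored in this sketch
    }
}
```

A port to a typesafe VM would, under this assumption, translate only TypesafeCore and supply its own typed front end in place of the shim.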

More styling bits

One of the very first modifications I received to Scintilla was from Philippe, dropping the styling bytes to allow handling larger files. At least one user wants more than 8 bits of styling information per character, so this will probably become settable to 0, 8, 16 or 32 style bits per character. This may mean separating the document and style buffers rather than interleaving them, as a variable stride would complicate the code more than I like. There may be some performance cost to this, although it could go either way.
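
One possible shape for a style buffer held apart from the text is sketched below. The class name StyleBuffer and its methods are assumptions made for illustration, not existing Scintilla code. The width is fixed per buffer at construction, so the text buffer itself never needs a variable stride.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical style buffer stored separately from the document text.
class StyleBuffer {
public:
    // bits should be 0, 8, 16 or 32; any other value is treated as 0 (no styling stored).
    explicit StyleBuffer(int bits) : bytesPerStyle_(BytesFor(bits)) {}

    void Resize(std::size_t characters) { data_.resize(characters * bytesPerStyle_); }

    // Stores the low bits of style, least significant byte first.
    // No bounds checking: position must be below the size given to Resize.
    void Set(std::size_t position, std::uint32_t style) {
        for (std::size_t i = 0; i < bytesPerStyle_; i++)
            data_[position * bytesPerStyle_ + i] = static_cast<std::uint8_t>(style >> (8 * i));
    }

    std::uint32_t Get(std::size_t position) const {
        std::uint32_t style = 0;
        for (std::size_t i = 0; i < bytesPerStyle_; i++)
            style |= static_cast<std::uint32_t>(data_[position * bytesPerStyle_ + i]) << (8 * i);
        return style;
    }

    std::size_t MemoryUsed() const { return data_.size(); }

private:
    static std::size_t BytesFor(int bits) {
        return (bits == 8 || bits == 16 || bits == 32) ? static_cast<std::size_t>(bits / 8) : 0;
    }
    std::size_t bytesPerStyle_;
    std::vector<std::uint8_t> data_;
};
```

With 0 bits the buffer occupies no memory at all, which corresponds to the large-file case, while 16 or 32 bits serve users who need more styling information.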

Per document lexer state and scalable lexer state

Currently, lexer state can be stored in the styling bytes or in one int per line of line state. It should also be possible for the lexer to store state information attached to the document. Further, the management of the per character, per line and per document state should be handled simply by the environment, so the lexer can say something like [remember that a here document delimited with "EOF" started at character 6276] and then be able to say [that here document ends at character 6354]. This is much like the named/described ranges possible in other editors. However, I see this as (at least initially) a feature only available to lexers, as the lifetime of lexing information is handled by the environment (it's a forward 'cascade') whereas persistent named ranges require maintenance when insertions and deletions are performed. One problem here could be with multiple pass lexers, such as those that first perform a styling pass and then perform a much slower error detection pass to place indicators on errors or other bad code.
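
The here document example above could be supported by something like the following sketch. LexerRangeStore and its methods are hypothetical names; the end position is treated as inclusive, nested ranges are not handled, and the forward 'cascade' is modelled by simply discarding or reopening anything at or after the first changed character so that the lexer rebuilds it.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Marker for a range whose end the lexer has not yet reported.
constexpr std::size_t kOpen = static_cast<std::size_t>(-1);

struct LexerRange {
    std::string description;  // e.g. "heredoc EOF"
    std::size_t end;          // last character of the range, or kOpen
};

// Hypothetical per-document store of lexer-described ranges, keyed by start position.
class LexerRangeStore {
public:
    void Begin(std::size_t start, const std::string &description) {
        ranges_[start] = LexerRange{description, kOpen};
    }

    // Closes the range that began at start; returns false if there is none.
    bool End(std::size_t start, std::size_t end) {
        auto it = ranges_.find(start);
        if (it == ranges_.end()) return false;
        it->second.end = end;
        return true;
    }

    // Forward cascade: drop ranges starting at or after position and reopen
    // earlier ranges that reached it, so relexing from position recreates them.
    void InvalidateFrom(std::size_t position) {
        ranges_.erase(ranges_.lower_bound(position), ranges_.end());
        for (auto &entry : ranges_)
            if (entry.second.end != kOpen && entry.second.end >= position)
                entry.second.end = kOpen;
    }

    // Returns the nearest range starting at or before position if it covers position.
    const LexerRange *Find(std::size_t position) const {
        auto it = ranges_.upper_bound(position);
        if (it == ranges_.begin()) return nullptr;
        --it;
        const LexerRange &range = it->second;
        return (range.end == kOpen || position <= range.end) ? &range : nullptr;
    }

private:
    std::map<std::size_t, LexerRange> ranges_;
};
```

Because invalidation only ever moves forward from the edit point, the store needs no position adjustment on insertion or deletion, which is the maintenance burden that persistent named ranges would carry.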

View state save/restore

It has been pointed out that folding is not retained when you change buffers in SciTE. This is because folding is stored in the view rather than the document, to allow multiple views of one document to have different folding. Other applications also use the same approach as SciTE of having one Scintilla view object and switching any of the open documents into this view to make them visible. Scintilla could have a way of persisting all internal view state into a string/stream/blob that can be given back to Scintilla to ensure its view is in the same state as when the blob was retrieved. This feature would, for ease of implementation, only work where the document had not changed between the view state save and restore, and would also not allow real persistence such as to a file.
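
A minimal sketch of such a save/restore pair follows. The ViewState structure, its fields and both function names are assumptions for illustration; real view state would hold far more. The document length is used here only as a crude stand-in for a check that the document has not changed.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical subset of view state: scroll position and contracted fold lines.
struct ViewState {
    std::size_t documentLength = 0;   // recorded so a stale blob can be rejected
    std::size_t firstVisibleLine = 0;
    std::vector<std::size_t> contractedLines;
};

// Serialises the view state into an opaque, space-separated blob.
std::string SaveViewState(const ViewState &view) {
    std::ostringstream out;
    out << view.documentLength << ' ' << view.firstVisibleLine << ' '
        << view.contractedLines.size();
    for (std::size_t line : view.contractedLines) out << ' ' << line;
    return out.str();
}

// Restores a blob into view. Returns false and leaves view untouched if the
// blob is malformed or was taken when the document had a different length.
bool RestoreViewState(const std::string &blob, std::size_t currentDocumentLength,
                      ViewState &view) {
    std::istringstream in(blob);
    ViewState loaded;
    std::size_t count = 0;
    if (!(in >> loaded.documentLength >> loaded.firstVisibleLine >> count)) return false;
    if (loaded.documentLength != currentDocumentLength) return false;
    for (std::size_t i = 0; i < count; i++) {
        std::size_t line = 0;
        if (!(in >> line)) return false;
        loaded.contractedLines.push_back(line);
    }
    view = loaded;
    return true;
}
```

An application switching buffers would, under this assumption, call SaveViewState before swapping a document out of the view and RestoreViewState after swapping it back in.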

Lexical state for internationalised text

When using a localisation scheme such as GNU's gettext, strings that are localised are marked like '_("English form")'. To make it easier to look through files for localised strings and to differentiate them from non-localised strings, they could be shown in an alternative style.
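
The idea can be sketched as a single-line styler that gives a string its own style when it is the direct argument of '_('. The style names and the StyleLine function are hypothetical, and the sketch handles only double-quoted strings on one line.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical style numbers for this sketch.
enum Style { STYLE_DEFAULT = 0, STYLE_STRING = 1, STYLE_STRING_LOCALISED = 2 };

static bool IsIdentifierChar(char ch) {
    return ch == '_' || std::isalnum(static_cast<unsigned char>(ch)) != 0;
}

// Returns one style per character of line.
std::vector<Style> StyleLine(const std::string &line) {
    std::vector<Style> styles(line.size(), STYLE_DEFAULT);
    std::size_t i = 0;
    while (i < line.size()) {
        if (line[i] != '"') { i++; continue; }
        // Look back past spaces for "_(" that is not the tail of a longer identifier.
        std::size_t back = i;
        while (back > 0 && line[back - 1] == ' ') back--;
        const bool localised = back >= 2 && line[back - 1] == '(' && line[back - 2] == '_'
            && (back == 2 || !IsIdentifierChar(line[back - 3]));
        const Style style = localised ? STYLE_STRING_LOCALISED : STYLE_STRING;
        // Find the closing quote, skipping backslash escapes.
        std::size_t end = i + 1;
        while (end < line.size() && line[end] != '"') {
            if (line[end] == '\\' && end + 1 < line.size()) end++;
            end++;
        }
        if (end < line.size()) end++;  // include the closing quote
        for (std::size_t j = i; j < end; j++) styles[j] = style;
        i = end;
    }
    return styles;
}
```

For a line such as x = _("Hi") + "no"; the first string receives the localised style and the second the ordinary string style.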

Merge similar lexers

The C++, Lua and Pascal lexers are fairly similar, differing mostly in simple features such as the strings used to group code into blocks and to mark comments. Merging these lexers would simplify maintenance, reduce code, and allow other similar languages such as Ada and Modula to be supported easily.
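
One way a merged lexer might be parameterised is sketched below for the block structure alone. LanguageSpec, the two example specifications and FoldLevelChange are hypothetical, and the token lists are deliberately simplified: strings and block comments are ignored, and only C++ and Lua are shown.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of the features that differ between similar languages.
struct LanguageSpec {
    std::string lineComment;              // empty if the language has none
    std::vector<std::string> blockOpen;   // tokens that start a block
    std::vector<std::string> blockClose;  // tokens that end a block
};

// Simplified example specifications, not complete language definitions.
const LanguageSpec cppSpec{"//", {"{"}, {"}"}};
const LanguageSpec luaSpec{"--", {"function", "do", "if", "repeat"}, {"end", "until"}};

static bool IsWordChar(char ch) {
    return ch == '_' || std::isalnum(static_cast<unsigned char>(ch)) != 0;
}

static bool Contains(const std::vector<std::string> &tokens, const std::string &token) {
    return std::find(tokens.begin(), tokens.end(), token) != tokens.end();
}

// Net change in block nesting caused by one line, driven entirely by the spec.
int FoldLevelChange(const LanguageSpec &spec, const std::string &line) {
    int change = 0;
    std::size_t i = 0;
    while (i < line.size()) {
        if (!spec.lineComment.empty() &&
            line.compare(i, spec.lineComment.size(), spec.lineComment) == 0)
            break;  // the rest of the line is a comment
        // A token is either a whole word or a single non-word character.
        std::size_t end = i + 1;
        if (IsWordChar(line[i]))
            while (end < line.size() && IsWordChar(line[end])) end++;
        const std::string token = line.substr(i, end - i);
        if (Contains(spec.blockOpen, token)) change++;
        if (Contains(spec.blockClose, token)) change--;
        i = end;
    }
    return change;
}
```

Supporting a further language would then, under this assumption, mean adding another LanguageSpec value rather than another lexer.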