SinkWorld

Portability

There are demonstration GUI front ends for the current code base. These are source code display applets, one written in Java and the other in C#. The demonstrations download reasonably quickly over a cable modem but may be too slow to use over a dial-up line.

The Java version (35K) uses Swing, so it needs a browser with a recent version of Java installed. It has worked in Internet Explorer 6 on Windows 2000, and in Mozilla on both Windows 2000 and Linux. The Java Runtime Environment 1.3 was installed to provide a JVM. Use the '+' and '-' keys to zoom in and out on the text. The 'l' key loads a small file from the server and the 'o' key loads a large file. You may need to click on the applet before the keyboard will work.

The C# version (45K) uses .NET, so it needs a recent version of Internet Explorer (tested on 6.0) with .NET 1.0 installed. Either the .NET SDK (140 Meg download, tested) or the .NET Framework (22 Meg download) should work. The '+' and '-' keys zoom as in the Java version, but only when the control has focus, which must be given to it by pressing the tab key until the control's background is yellow. A security restriction in .NET stops the control from claiming focus when clicked on. There is a control bar at the top that calls the control. The first combo box downloads pieces of source from the web site and displays them. Code fragments can be pasted into the next control, a textarea, and then copied into the source displayer with the Set button. The TextHintMode is for experimenting with some of the graphics options provided by .NET.

Managed Execution

Managed execution platforms such as the Java Virtual Machine and the .NET Framework have become very important to many developers and are likely to dominate software development in the future.

These platforms enhance safety by allowing components to interact with a greater (and sometimes scalable) degree of control imposed on them by the platform. While this has some benefits for avoiding crashes in stand-alone applications, it is particularly beneficial when combining components inside applications such as web browsers.

This is of interest to Scintilla in allowing low footprint development environments. Komodo has demonstrated an IDE built on the framework of a browser but is itself a heavyweight installation. Another approach to development is taken by ZOPE where much of the development can be performed within any web browser. This does not provide the rich user experience available from 'thick-client' software. A middle ground can be reached by using technologies such as Dynamic HTML and applets within the browser. These technologies are becoming more viable as browsers are enhanced and bandwidth increases.

It is probable that Scintilla can be implemented in a 40-80K applet that will download over a cable modem or similar in a second or so making it a viable UI element within browser based development environments.

Plugin code, such as extra lexers, can be used to enhance Scintilla. When used within a managed platform, Scintilla can be allowed to safely download and integrate these components as needed for handling the different languages that may be used.

It is unclear, at least to me, which of these platforms will be most successful or if another platform will be even more popular. By targeting both managed platforms and the current unmanaged platforms, code can be assured of running in the future, no matter which platform wins. This preserves the investment made in Scintilla code as well as in code that uses Scintilla, which can then be migrated to another platform more easily.

.NET differentiates between several management concepts. Managed code runs on .NET using the memory management built into .NET. Verifiable managed code is a subset of managed code that can be verified as correct and so can be trusted to run without compromising the user's security. The aim with SinkWorld is to produce verifiable managed code.

Portability isn't hard, it's just work

Scintilla was originally written to run on Win32 and has since been ported to the GTK+, wxWindows and Qt toolkits. The platform adaptation layer for one platform is less than 20% of the size of the platform independent code. While there are sometimes complexities in dealing with the unique aspects of a platform, the bulk of this code is fairly straightforward, providing a simple library for performing basic graphics and UI operations, dealing with the platform's version of events and providing an API in the style of the platform. The platform independent code is more intricate, with complex control paths and optimisations making it harder to work on.

All GUI environments offer similar capabilities. Where they differ, providing only lowest common denominator functionality can be avoided by building rich functionality out of lower level functions and by allowing functionality that only exists on one platform. An example of this in the current version of Scintilla is that Unicode is only supported on Windows. Once Unicode is well supported on other platforms then Scintilla can be changed to provide that feature. If a platform doesn't provide its own Unicode support then it is also possible, although it would be a significant amount of work, to develop Unicode support within the platform adaptation layer.

Target Languages

To target these different platforms, as well as the platforms currently targeted by Scintilla, different programming languages must be used. C++ cannot be used on the JVM or for verifiable managed code on .NET as it is an unsafe language, using machine address pointers with no bounds checking. .NET does allow mixing unsafe and safe code, but using some unsafe code means that the assembly is unsafe. Managed C++ can be used on .NET but not to produce verifiable managed code. There is no C# compiler available for the JVM, and while a Java compiler for .NET has been announced, no details about availability, price, or whether it will remain supported in the long term have been revealed.

Implementing and supporting a version of Scintilla in multiple languages would take quite a bit of effort. This effort could be minimised if it were possible to compile for unmanaged platforms, the JVM, and .NET from one code base.

The C++, Managed C++, C#, and Java languages are similar enough that very similar code can be written in each. Here is an implementation of a split buffer, the core class used to store text in Scintilla, in all four languages:

A combined difference view uses background colour to indicate how the 4 files differ.
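
For readers unfamiliar with the data structure, the following is a minimal sketch of a split (gap) buffer. It is written in Python to match the XLang.py tooling and is not one of the four implementations referred to above; the class name SplitBuffer and every method and field name are assumptions made for this illustration. The idea is that the text is stored in one array with an unused gap that is moved to the edit position, so repeated edits near the same place only touch the bytes next to the gap.

```python
class SplitBuffer:
    """Illustrative gap buffer: text is stored around a movable gap."""

    def __init__(self, initial_size=8):
        self.body = bytearray(initial_size)  # storage, including the gap
        self.length = 0                      # number of bytes of text
        self.part1_length = 0                # bytes of text before the gap
        self.gap_length = initial_size       # size of the unused gap

    def _gap_to(self, position):
        """Move the gap so that it starts at position."""
        gap_end = self.part1_length + self.gap_length
        if position < self.part1_length:
            # Move the text between position and the gap to after the gap.
            count = self.part1_length - position
            self.body[gap_end - count:gap_end] = self.body[position:self.part1_length]
        elif position > self.part1_length:
            # Move the text just after the gap down to the start of the gap.
            count = position - self.part1_length
            self.body[self.part1_length:position] = self.body[gap_end:gap_end + count]
        self.part1_length = position

    def _room_for(self, extra):
        """Grow the storage if the gap cannot hold extra more bytes."""
        if self.gap_length >= extra:
            return
        self._gap_to(self.length)  # with the gap at the end, growing extends it
        new_size = max(len(self.body) * 2, self.length + extra)
        self.body.extend(bytes(new_size - len(self.body)))
        self.gap_length = new_size - self.length

    def insert(self, position, data):
        if not 0 <= position <= self.length:
            raise IndexError(position)
        self._room_for(len(data))
        self._gap_to(position)
        self.body[position:position + len(data)] = data
        self.part1_length += len(data)
        self.gap_length -= len(data)
        self.length += len(data)

    def delete(self, position, count):
        if position < 0 or count < 0 or position + count > self.length:
            raise IndexError(position)
        self._gap_to(position)
        self.gap_length += count  # the deleted bytes become part of the gap
        self.length -= count

    def char_at(self, position):
        if not 0 <= position < self.length:
            raise IndexError(position)
        if position < self.part1_length:
            return self.body[position]
        return self.body[position + self.gap_length]

    def text(self):
        gap_end = self.part1_length + self.gap_length
        return bytes(self.body[:self.part1_length] + self.body[gap_end:])
```

A caller would insert with b.insert(5, b",") and read back with b.text(). The bounds checks raise IndexError here only because that is the usual Python convention, not because the original classes behave that way.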

The languages are so similar that it is possible to automatically translate from one to the other with a small amount of code and this has been the major focus of initial SinkWorld development.

The above module was the first to be translated by hand from C++, before the XLang.py translation tool was started, so code differences, such as * versus [], were minimised. With automatic translation, more idiomatic code can be generated.

C~ - a C++ subset translatable to Java and C#

It may be possible to translate Java or C# into C++ but C++, especially the preprocessor, allows a degree of freedom in writing valid source code not available in C# or Java. The preprocessor allows inserting instructions to the translation program that create valid C++ code or are empty and so disappear after preprocessing. This is the reason that the work done so far has been using a subset of C++ as the primary source representation of components. This freedom has not been needed so far as the development has encountered fewer problems than expected but this technique may be needed later.

C++ is a very complex language and most developers and projects keep to a subset of C++. Scintilla has used a subset that does not include exceptions, templates or run-time type information. As Java and C# use exceptions widely, SinkWorld code will have to deal with exceptions and so the C++ version of the code will also need to handle exceptions. Templates or generics are not yet available for C#/.NET and are not standardised for Java, so any use of templates will require that the language translator expand templates. Templates do not appear necessary although it could, for example, be useful to parameterise the cbx class shown above by the type of data stored in the buffer to allow a choice between 8 bit characters and 16 bit characters. The run-time type information available in Java and .NET is very complete and useful but is far beyond that provided by C++, so will not be used in SinkWorld.

There is now a simple template expansion program, Template.py, which is used to create 1, 2, and 4 byte wide versions of cbx to be used as styling buffers depending on the amount of styling information required by an application.
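
How Template.py works internally is not described here, so the following is only a sketch of the general idea of textual template expansion, using Python's standard string.Template. The placeholder names $CLASS and $ELEMENT, the generated class names, the template text, and the mapping from byte width to C++ type are all assumptions chosen for illustration.

```python
from string import Template

# Hypothetical template text; $CLASS and $ELEMENT are placeholders
# invented for this sketch, not the syntax used by Template.py.
BUFFER_TEMPLATE = Template("""\
class $CLASS {
    $ELEMENT *body;
    int length;
public:
    $ELEMENT ValueAt(int position);
};
""")

# Assumed mapping from element width in bytes to a C++ type of that
# width on common 32 bit platforms.
WIDTHS = {1: "unsigned char", 2: "unsigned short", 4: "unsigned int"}


def expand(template, widths):
    """Return one expanded class text per element width."""
    expanded = {}
    for width, element in widths.items():
        name = "Buffer%d" % width
        expanded[name] = template.substitute(CLASS=name, ELEMENT=element)
    return expanded
```

Calling expand(BUFFER_TEMPLATE, WIDTHS) yields three class texts that differ only in their name and element type, which is the effect the translator needs when the target language has no templates or generics.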

The subset of C++ used in SinkWorld does not yet have a formal definition, and is instead defined by what the tools allow. The subset will expand over time as more code is converted to be translatable. Since this subset may need to be discussed, and may be defined formally, it is called C~, pronounced C-wriggle.

Using exceptions in the C++ version does lose some portability to platforms such as WinCE which do not support exceptions. It may be possible to use the Java or C# translations for these platforms. WinCE is supposed to support .NET.

In some places C++ does not differentiate as much as Java or C#. Java and C# have more forms of privacy than C++ and differentiate between interfaces and classes. As C# and Java only allow implementation inheritance from one super class, the C~ input must also restrict implementation inheritance and differentiate between classes and interfaces. This will be by a convention where interface names start with 'I' which is already used by some developers, particularly those using COM.

C~ is a subset of C++ rather than a complete new language as it is easier to use other tools such as debuggers on the code if the primary representation is a current programming language. Developing a new language would also redirect effort towards defining and improving the new language rather than working on the classes.

XLang - an automatic translator

The XLang translator reads and tokenises C++ (C~) source code and then performs a series of lexical transformations to change C++ syntax into C# and Java syntax.

Translation is done line by line so some syntax elements such as method declarations must currently be on one line to be handled. The transformations are mostly substitutions of sequences of lexemes into other sequences of lexemes using a small lexeme sequence description language.

Currently, all class methods must be defined within the class definition, so all of a class is defined in one place. This should be improved in the future with the normal class header and implementation files merged to construct the Java/C# code. There is already another translator, C2J++, which can perform this merging.

Methods may now be defined either in the header or declared in a header and defined in a source file.

The description language consists of a stream of lexeme descriptions separated by "~" with a literal "~" represented by "~~". Each lexeme description starts with a letter designating the lexical class and an optional value. An empty value matches any value. The lexical class letters are listed in CxxTokens.py. "kprivate~o:" matches the keyword "private" followed by the operator ":". A very simple translation would change the C++ "NULL" constant identifier to the Java/C# "null" with
tokens.substTokenStrings("iNULL", "null")
A more complex translation takes constants defined as enumerations and produces Java/C# "static int" declarations, so "enum { SCE_C_DEFAULT = 0 };" becomes "static int SCE_C_DEFAULT = 0;" with
tokens.substTokenStrings("kenum~o{~i~o=~m~o}", "static int $2 = $4", 1)
which uses the final "ignore whitespace" argument to handle situations where optional whitespace may appear between elements. The "$n" substitutions refer to the values that match each lexeme.
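
The matching described above can be sketched in a few lines of Python. This is not the XLang implementation: the functions parse_description, match_at and subst_token_strings are hypothetical, tokens are assumed to be (class letter, value) pairs, and the class letter "w" for whitespace is an assumption (the real letters are those listed in CxxTokens.py). The "$n" references are taken to count lexemes from zero, which is what the enum example implies.

```python
import re


def parse_description(description):
    """Split on "~", treating "~~" as a literal "~".

    Returns (lexical class, value) pairs; an empty value matches anything.
    """
    parts, current, i = [], "", 0
    while i < len(description):
        if description.startswith("~~", i):   # escaped literal "~"
            current += "~"
            i += 2
        elif description[i] == "~":           # separator between lexemes
            parts.append(current)
            current = ""
            i += 1
        else:
            current += description[i]
            i += 1
    parts.append(current)
    return [(part[0], part[1:]) for part in parts if part]


def match_at(tokens, start, pattern, ignore_whitespace):
    """Return (matched values, end index), or None if there is no match."""
    values, i = [], start
    for lexical_class, value in pattern:
        # Optionally skip whitespace tokens between pattern elements.
        while ignore_whitespace and values and i < len(tokens) and tokens[i][0] == "w":
            i += 1
        if i >= len(tokens):
            return None
        token_class, token_value = tokens[i]
        if token_class != lexical_class or (value and token_value != value):
            return None
        values.append(token_value)
        i += 1
    return values, i


def subst_token_strings(tokens, description, replacement, ignore_whitespace=False):
    """Replace each match of description and return the resulting text."""
    pattern = parse_description(description)
    out, i = [], 0
    while i < len(tokens):
        matched = match_at(tokens, i, pattern, ignore_whitespace)
        if matched is None:
            out.append(tokens[i][1])
            i += 1
        else:
            values, i = matched
            # "$n" is replaced by the value matched by lexeme n.
            out.append(re.sub(r"\$(\d+)", lambda m: values[int(m.group(1))], replacement))
    return "".join(out)
```

With the token list for "enum { SCE_C_DEFAULT = 0 };" as input, the description "kenum~o{~i~o=~m~o}" and the replacement "static int $2 = $4", this sketch leaves the trailing ";" untouched because it lies outside the matched sequence.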

By acting on lexemes XLang can avoid some of the context sensitivity problems that would occur if it used regular expressions to match input. However, this means that it does not have many of the features of regular expressions such as matching "any number" or "one or more" elements which would be useful when matching constructs such as a list of type keywords like "const unsigned long".
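
A small sketch of the context sensitivity problem: a regular expression applied to raw text cannot tell an identifier from the same characters inside a string literal, while a substitution applied to lexemes can. The tokeniser below is a deliberately tiny stand-in written for this example, not the one XLang uses, and the names SOURCE, TOKEN and translate are hypothetical.

```python
import re

SOURCE = 'p = NULL; puts("NULL pointer");'

# Naive textual substitution: also rewrites the word inside the string literal.
naive = re.sub(r"\bNULL\b", "null", SOURCE)

# Minimal tokeniser for this sketch: a whole string literal, an identifier,
# or any other single character becomes one token.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|.', re.DOTALL)


def translate(source):
    """Replace the identifier NULL with null, leaving string literals alone."""
    out = []
    for token in TOKEN.findall(source):
        out.append("null" if token == "NULL" else token)
    return "".join(out)
```

Here naive contains "null pointer" inside the string literal, whereas translate(SOURCE) changes only the identifier.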

Current code

The current code can be downloaded. It is also available from the sinkworld module in the Scintilla CVS. This code only works completely on Windows NT and Windows 2000. Building all the code requires quite a few tools: Microsoft .NET Beta 1, Python, Java, JUnit, NUnit, and CppUnit. It should be possible to translate to Java and build that without .NET installed with "make mj.class". To allow deployment to users without all these tools in the future there will also be distributions with all the code generation already performed so someone using the Java translation will only need a Java compiler.

Four classes have been implemented:

These classes are implemented in C++ and Managed C++ and the C++ implementations are automatically converted to C# and Java implementations. There is also some support code:

The Managed C++ version has not been updated with the new classes.

The use of unit testing frameworks is very important in this project to ensure that the automatically translated code has the same behaviour in each translation.

Open Questions

The current code performs well, with the slowest translation running at around one third of C++ speed. This code is faster than Scintilla because of better data structures.