Implementing syntax highlighting can be a painful task. Lots of regular expressions and thinking may be required to master this.
While searching for an easy way to implement syntax highlighting for NSTextView, I found an approach that is both simple and fast at runtime. Using flex (a tool with its roots in the 1970s) to define the rules for what should be highlighted makes the job a lot easier and more structured.
flex is a free, open-source version of lex, the lexical analyzer generator (buzzword jackpot!) developed by Mike Lesk and Eric Schmidt in 1975. From a set of rules (mainly regular expressions) it generates a C program that scans through a text and does something with it, e.g. divides it into tokens. Tokens are chunks of characters with a special meaning (as defined in our rules). Confused? Read on, this concept will become very clear soon.
In this article I want to show you how we can use this ancient but very powerful tool to implement basic syntax highlighting within an NSTextView.
Overview
First, let’s have a look at the goals. In the demo application (see below) we have a single window with an NSTextView where we can write some text. This text is parsed and highlighted in the following way:
- positive real numbers will be colored green
- negative real numbers will be colored red
A real number can be entered in the following formats:
- 12
- -3.54
- 23e2
- 23E2
- 423.2e-3
- -5e-1
For the regular expression gurus among you, the format can be defined like this:
[-+]?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?
In addition we want to sum up all found real numbers and show the result in our application.
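To make the pattern and the summing step concrete, here is a minimal sketch in plain C. It is not the demo application's code: it assumes POSIX regex.h as a stand-in for the flex scanner, and the function name sum_reals is made up for illustration.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sums every substring of `text` that matches the
   real number pattern from the article. Returns 0.0 if the pattern
   cannot be compiled. */
double sum_reals(const char *text)
{
    static const char *pattern =
        "[-+]?[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?";
    regex_t re;
    regmatch_t m;
    double total = 0.0;
    const char *p = text;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0.0;

    /* A match always contains at least one digit, so rm_eo > 0 and the
       loop makes progress on every iteration. */
    while (regexec(&re, p, 1, &m, 0) == 0) {
        char buf[64];
        size_t len = (size_t)(m.rm_eo - m.rm_so);

        if (len >= sizeof buf)          /* truncate absurdly long numbers */
            len = sizeof buf - 1;
        memcpy(buf, p + m.rm_so, len);
        buf[len] = '\0';

        total += strtod(buf, NULL);     /* understands the same formats */
        p += m.rm_eo;                   /* continue after this match */
    }

    regfree(&re);
    return total;
}
```

Calling sum_reals("12 -3.54 23e2") adds 12, -3.54 and 2300. The approach described in this article does the matching inside the generated scanner instead, so no regular expression has to be compiled at runtime.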

I recommend downloading the source code and reading the comments there for additional information.
This application consists of three parts:
- NSTextView (for the text) and NSTextField (to show the total value),
- the NSTextStorage delegate and
- the actual scanner created with flex
As you can see, we don’t need to subclass NSTextView to add highlighting; everything is done in the NSTextStorage delegate and the scanner.
NSTextView and NSTextField
The NSTextView is only used for the user to input some text. We don’t need to change its behavior nor do we need to subclass it.
The NSTextField at the bottom is used to show the total of the collected real numbers. Nothing special here either.
NSTextStorage delegate
The NSTextStorage delegate is responsible for passing the entered text to the scanner, retrieving the token information and adding the string attributes (e.g. the color) to the original text.
The main part is a while-loop which fetches all tokens. If a token indicates a real number, an NSForegroundColorAttributeName attribute is added to the text. In principle, that’s all we have to do in the delegate.
The scanner
The scanner is the heart of this implementation. It takes an ordinary C string and scans through it, returning a token whenever something interesting is found. Actually, it returns tokens for everything it finds (whether it is text or a real number). The delegate needs this behavior to determine the range of each token.
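Why tokens for everything? A rough sketch in C may help. It assumes the scanner reports only a token type and a length, so the caller has to add up the lengths of all tokens, text included, to know where each real number starts. The hand-written next_token below is only a stand-in for the flex-generated scanner, and all names (TokenType, RealRange, find_real_ranges) are hypothetical. Offsets are counted in bytes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; the names in the real scanner may differ. */
typedef enum { TOKEN_END, TOKEN_TEXT, TOKEN_REAL } TokenType;

/* A range to highlight, measured in bytes from the start of the text. */
typedef struct {
    size_t location;
    size_t length;
    int    is_negative;   /* nonzero: color red, zero: color green */
} RealRange;

/* Counts the decimal digits at the start of s. */
static size_t count_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Returns the length of a real number at the start of s, or 0 if none. */
static size_t real_length(const char *s)
{
    size_t i = 0, n;

    if (s[i] == '+' || s[i] == '-')
        i++;
    if ((n = count_digits(s + i)) == 0)
        return 0;
    i += n;
    if (s[i] == '.' && (n = count_digits(s + i + 1)) > 0)
        i += 1 + n;                      /* optional fraction */
    if (s[i] == 'e' || s[i] == 'E') {    /* optional exponent */
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-')
            j++;
        if ((n = count_digits(s + j)) > 0)
            i = j + n;
    }
    return i;
}

/* Stand-in for the scanner: reports type and byte length of the token
   at *cursor and advances the cursor past it. */
static TokenType next_token(const char **cursor, size_t *length)
{
    TokenType type = TOKEN_REAL;

    if (**cursor == '\0')
        return TOKEN_END;
    *length = real_length(*cursor);
    if (*length == 0) {                  /* anything else: 1-byte text token */
        *length = 1;
        type = TOKEN_TEXT;
    }
    *cursor += *length;
    return type;
}

/* Fetches all tokens and records the range of every real number.
   Returns the number of ranges written to `out` (at most `max`). */
size_t find_real_ranges(const char *text, RealRange *out, size_t max)
{
    const char *cursor = text;
    size_t offset = 0, length = 0, count = 0;
    TokenType type;

    while ((type = next_token(&cursor, &length)) != TOKEN_END) {
        if (type == TOKEN_REAL && count < max) {
            out[count].location = offset;
            out[count].length = length;
            out[count].is_negative = (text[offset] == '-');
            count++;
        }
        offset += length;                /* text tokens advance the offset too */
    }
    return count;
}
```

For the input "a 12 b -3.5" this yields two ranges: location 2 with length 2, and location 7 with length 4, the second one flagged as negative. In the delegate, each such range would receive the color attribute.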
To write such a scanner, flex comes in very handy, because you can define the behavior of the scanner with a handful of rules. Flex takes these rules and generates a fairly large C file (lex.yy.c), which is the actual scanner.
Xcode supports flex files directly: just add a *.l file to the project. During the build, Xcode runs flex to “translate” the file to C code and then compiles that C code. We don’t have to do anything else.
I suggest reading the flex manual to get familiar with flex, as I won’t explain it in detail. Flex is a really cool tool, and spending some time with it won’t hurt.
Unfortunately there is one problem with flex: it doesn’t know about UTF-8. As you might know, a non-ASCII character takes 2 to 4 bytes in UTF-8, and flex treats every single byte as one character.
But there is a workaround for this problem: we can capture all those bytes and return them as one “byte string”. Our delegate reads those bytes and recognizes them as one UTF-8 character.
The designers of UTF-8 were clever and encoded the length of a UTF-8 character in its first byte. So, after reading the first byte we can determine how many bytes follow (more information here, page 103). This buffer mechanism requires some additional variables and functions in the scanner.
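The first-byte trick can be sketched in a few lines of C. This is an illustration only, not the code from the demo scanner, and the function name utf8_sequence_length is made up. It merely inspects the leading bits and does not validate the sequence.

```c
#include <stddef.h>

/* Returns the total number of bytes in the UTF-8 sequence that starts
   with `first`. Returns 0 for a continuation byte (10xxxxxx) or a byte
   with five or more leading 1 bits. Overlong lead bytes such as 0xC0
   are not rejected; this sketch only looks at the bit pattern. */
size_t utf8_sequence_length(unsigned char first)
{
    if (first < 0x80)           return 1;   /* 0xxxxxxx: plain ASCII   */
    if ((first & 0xE0) == 0xC0) return 2;   /* 110xxxxx: 2-byte char   */
    if ((first & 0xF0) == 0xE0) return 3;   /* 1110xxxx: 3-byte char   */
    if ((first & 0xF8) == 0xF0) return 4;   /* 11110xxx: 4-byte char   */
    return 0;                               /* not a valid first byte  */
}
```

For example, “ä” is encoded as 0xC3 0xA4, so the first byte tells us to expect two bytes in total, i.e. one more byte to buffer before the character is complete.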
The regular expression mentioned above can easily be split up into more readable parts in the flex.l file. This way, changes to the rules are easy to apply.
Download
The demo application source code can be downloaded via my subversion repository (http://www.stiefels.net/svn/projects/Capture%20Reals/).
A universal binary is also available here.
Summary
If you’re interested in adding really fast syntax highlighting to your application, flex is one option worth considering. It may not be the perfect solution (UTF-8 workaround, missing runtime flexibility), but it is very easy to implement and may fit your requirements.
I hope this article was at least a little bit helpful. Please let me know what you think about it.
Thanks for reading!
