Syntax Tokenizer Extension for Saxon [Oliver's Site]


Syntax Tokenizer Extension for Saxon

This extension is no longer maintained. The current version works only with Saxon 6.5, not with Saxon 8.x.

This extension for Saxon 6.5 provides an XSLT extension function for tokenizing programming code fragments. The extension function can be used to perform on-the-fly syntax highlighting during the transformation process.

How does it work? The function returns an interim node set containing a tokenized representation of the code. The stylesheet can then process this node set, e.g. to generate formatted, syntax-highlighted output.

Download and Installation

Download the saxon6-tokenizer.zip distribution. After downloading, unzip the file and make sure that saxon6tokenizer.jar is on your classpath the next time you run Saxon.

Note: The extension works only with Saxon 6.5 (and maybe some other versions of the 6.x series). It does not work with Saxon 8.

Usage

To use the extension in Saxon, you have to associate the extension’s namespace java:com.oliverh.xsltext.tokenizer.Saxon6Tokenizer (that’s the Java class defining the extension) with a namespace prefix of your choice. I will use the prefix syn in the following examples.

If the namespace is properly bound, the following two extension functions are available in XPath expressions:

Both arguments are interpreted as string values.

Tokenized Node-set

The function splits the given program fragment into lines and tokens and returns them as a node set. The node set consists of <line> elements, which in turn contain nested <token> elements and text nodes.

Each <token> element has a class attribute describing the type of the token. Possible values are keyword, literal, operator and comment.
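
As a rough illustration of how a stylesheet might process such a node set, the following Java sketch runs a small XSLT 1.0 stylesheet over a hand-written sample in the <line>/<token> shape described above. It uses only the XSLT processor bundled with the JDK, so it needs neither Saxon nor the extension. The class name TokenHighlightSketch, the <code> wrapper element, the second sample line and the choice of <pre>/<span> output are all assumptions made for this example and are not part of the extension. In a real stylesheet, the templates would be applied to the result of syn:tokenize rather than to a parsed document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of the tokenizer distribution.
class TokenHighlightSketch {

    // Hand-written sample in the <line>/<token> shape; NOT real tokenizer output.
    // The <code> root is an assumption so the sample is a well-formed document.
    static final String SAMPLE =
        "<code>\n"
      + "<line><token class='comment'>// Print a hello message</token></line>\n"
      + "<line><token class='keyword'>return</token> x<token class='operator'>;</token></line>\n"
      + "</code>";

    // Illustrative stylesheet: each <token> becomes a <span> carrying the same
    // class value; plain text nodes are copied by the built-in template rules.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/code'>"
      + "<pre><xsl:apply-templates select='line'/></pre>"
      + "</xsl:template>"
      + "<xsl:template match='line'>"
      + "<xsl:apply-templates/><xsl:text>&#10;</xsl:text>"
      + "</xsl:template>"
      + "<xsl:template match='token'>"
      + "<span class='{@class}'><xsl:value-of select='.'/></span>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given tokenized XML and returns the markup.
    static String highlight(String tokenizedXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(tokenizedXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(highlight(SAMPLE));
    }
}
```

The class values then only need matching CSS rules (for example one rule per token type) to produce coloured output.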

Supported Programming Languages

The extension uses the free jEdit Syntax Package as its tokenizer engine, which accepts the following programming language names:

Batch, CC, C, Eiffel, HTML, IDL, JavaScript, Java, Makefile, PHP, Patch, Perl, Props, Python, SQL, ShellScript, TSQL, TeX, XML, XPath

For more information about the tokenizer, have a look at the homepage of the jEdit Syntax Package. If you know of a better tokenizer engine, please let me know.

Example

Consider the following Java code fragment:

// Print a hello message
public void sayHello() {
  System.out.println("Hello!");
}

If you pass this fragment to the syn:tokenize extension function with the value Java as the second argument, the function returns a node set that corresponds to:

<line><token class='comment'>// Print a hello message</token></line>
<line><token class='keyword'>void</token> sayHello<token class='operator'>(</token> 

For further details, have a look at test/test.xsl.

