====== Syntax Tokenizer Extension for Saxon ======

This extension is no longer maintained. The current version works only with Saxon 6.5, not with Saxon 8.x.

This extension for Saxon 6.5 provides an XSLT extension function for tokenizing programming code fragments. The extension function can be used to perform on-the-fly syntax highlighting during the transformation process.

How does it work? The function returns an interim node-set containing a tokenized representation of the code. The stylesheet can then process this node-set, e.g. to generate formatted, syntax-highlighted output.

===== Download and Installation =====

Download the [[http://oliverh.com/files/saxon6-tokenizer.zip|saxon6-tokenizer.zip]] distribution. After downloading, unzip the file and make sure that the saxon6tokenizer.jar file is on your classpath the next time you run Saxon.

Note: The extension works only with Saxon 6.5 (and maybe some other versions of the 6.x series). It does **not** work with Saxon 8.

===== Usage =====

To use the extension in Saxon, you have to associate the extension's namespace ''java:com.oliverh.xsltext.tokenizer.Saxon6Tokenizer'' (that's the Java class defining the extension) with a namespace prefix of your choice. I will use the prefix ''syn'' in the following examples.

If the namespace is properly bound, the following two extension functions are available in XPath expressions:

  * ''syn:tokenize(program,language)'': Returns a node-set containing the tokenized representation of the given program (for more details, see below).
  * ''syn:supportsSyntax(language)'': Returns a boolean value indicating whether the given programming language is supported by the tokenizer.

Both arguments are interpreted as string values.

===== Tokenized Node-set =====

The function splits the given program fragment into lines and tokens and returns them as a node-set. The node-set consists of '''' elements which themselves contain nested '''' elements and text nodes.
The '''' elements have an attribute ''class'' describing the type of the token. Possible values are ''keyword'', ''literal'', ''operator'' and ''comment''.

===== Supported Programming Languages =====

The extension uses the free [[http://sourceforge.net/projects/jedit-syntax/|jEdit Syntax Package]] as its tokenizer engine, which accepts the following programming language names:

> Batch, CC, C, Eiffel, HTML, IDL, JavaScript, Java, Makefile, PHP, Patch, Perl, Props, Python, SQL, ShellScript, TSQL, TeX, XML, XPath

For more information about the tokenizer, have a look at the [[http://sourceforge.net/projects/jedit-syntax/|homepage]] of the jEdit Syntax Package. If you know a better tokenizer engine, please let me know.

==== Example ====

Consider the following Java code fragment:

<code java>
// Print a hello message
public void sayHello() {
  System.out.println("Hello!");
}
</code>

If you provide this fragment to the ''syn:tokenize'' extension function with the value ''Java'' as the second argument, the function will return a node-set which corresponds to:

// Print a hello message void sayHello(

For further details, have a look at test/test.xsl.
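
The namespace binding and the two function calls described under Usage could be combined in a stylesheet roughly like the sketch below. It is an illustration, not the contents of test/test.xsl. The input element ''listing'' and its ''language'' attribute are hypothetical. The templates do not rely on the element names of the returned node-set: they assume only that token elements carry a ''class'' attribute and that every other element in the node-set represents a line. Emitting a line break after each line element is likewise an assumption.

<code xml>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:syn="java:com.oliverh.xsltext.tokenizer.Saxon6Tokenizer"
    exclude-result-prefixes="syn">

  <!-- Hypothetical input: <listing language="Java">...code...</listing> -->
  <xsl:template match="listing">
    <pre>
      <xsl:choose>
        <!-- Tokenize only if the tokenizer knows the language -->
        <xsl:when test="syn:supportsSyntax(@language)">
          <xsl:apply-templates select="syn:tokenize(., @language)" mode="highlight"/>
        </xsl:when>
        <!-- Otherwise fall back to the plain text of the listing -->
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </pre>
  </xsl:template>

  <!-- Assumption: any element with a class attribute is a token -->
  <xsl:template match="*[@class]" mode="highlight">
    <span class="{@class}">
      <xsl:apply-templates mode="highlight"/>
    </span>
  </xsl:template>

  <!-- Assumption: every other element is a line; end it with a line break -->
  <xsl:template match="*" mode="highlight">
    <xsl:apply-templates mode="highlight"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
</code>

Text nodes between tokens are copied by the built-in template rules, so only the token elements need their own template. The ''class'' values (''keyword'', ''literal'', ''operator'', ''comment'') end up as CSS class names on the generated ''span'' elements and can be styled in the output document.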