Package eu.bandm.tools.util3
Class XalanSurrogateRepair
java.lang.Object
eu.bandm.tools.util3.XalanSurrogateRepair
Xalan has a severe bug: when serializing to a UTF-8 stream, the two code units
of a surrogate pair (= two consecutive 16-bit words = two adjacent Java chars)
are encoded separately, as two numeric character references. (The same happens
with a UTF-16 stream!) This is wrong.
This code repairs the resulting file, but only for this special case:
It does not consider "CDATA-SECTIONS", etc.
It does not decode hexadecimal numeric character references.
There is no error handling.
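The repair described above can be illustrated with a minimal sketch. It is not the implementation of this class (which works on tokens from a hand-patched ANTLR lexer); the class name SurrogateRefRepairSketch and the method repair are hypothetical, and a regular expression stands in for the lexer. Like the original, it handles only decimal numeric character references, ignores CDATA sections, and does no error handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of eu.bandm.tools.util3.
final class SurrogateRefRepairSketch {

    // Decimal numeric character reference, e.g. "&#55357;".
    // At most 7 digits, so Integer.parseInt cannot overflow.
    private static final Pattern REF = Pattern.compile("&#(\\d{1,7});");

    // Replaces each adjacent pair "&#high;&#low;" of surrogate references
    // with one reference to the combined code point.
    static String repair(String xml) {
        StringBuilder out = new StringBuilder();
        Matcher first = REF.matcher(xml);
        Matcher second = REF.matcher(xml);
        int pos = 0;
        while (first.find(pos)) {
            // Copy the text before the reference unchanged.
            out.append(xml, pos, first.start());
            int high = Integer.parseInt(first.group(1));
            if (high >= 0xD800 && high <= 0xDBFF) {
                // Look for a second reference starting exactly where the first ends.
                second.region(first.end(), xml.length());
                if (second.lookingAt()) {
                    int low = Integer.parseInt(second.group(1));
                    if (low >= 0xDC00 && low <= 0xDFFF) {
                        int codePoint = Character.toCodePoint((char) high, (char) low);
                        out.append("&#").append(codePoint).append(';');
                        pos = second.end();
                        continue;
                    }
                }
            }
            // Not a surrogate pair: keep the reference as it is.
            out.append(xml, first.start(), first.end());
            pos = first.end();
        }
        // Copy the remainder after the last reference.
        out.append(xml, pos, xml.length());
        return out.toString();
    }
}
```

For example, U+1F600 is represented in UTF-16 by the surrogates 55357 and 56832, so repair("a&#55357;&#56832;b") yields "a&#128512;b". A lone surrogate reference or a hexadecimal reference is left untouched.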
-
Field Summary
Modifier and Type, Field, Description
protected static XalanSurrogateRepairLexer_HandPatched lexer
protected static antlr.Token nextToken
-
Constructor Summary
-
Method Summary
-
Field Details
-
lexer
protected static XalanSurrogateRepairLexer_HandPatched lexer
-
nextToken
protected static antlr.Token nextToken
-
-
Constructor Details
-
XalanSurrogateRepair
public XalanSurrogateRepair()
-
-
Method Details
-
consume
protected static void consume()
-
LA
protected static boolean LA(int t)
-
main
public static void main(String[] args) throws FileNotFoundException, IOException, UnsupportedEncodingException
-