Andreas Rozek      

LuaJava_09 - what happens to Unicode strings?

Lua strings are sequences of 8-bit bytes, while Java strings are Unicode-based. So what happens to Java strings which cannot be represented using 8-bit characters? LuaJava_09 looks into this question.


LuaJava_09

The script itself is extremely simple and should not require any further explanation. The accompanying Java source may be compiled using
 

  javac LuaJava_09.java

and does not need the luajava package. The resulting class file should be copied to a place where it can be found by the Java class loader (e.g., into a directory which is automatically scanned by the Java extension mechanism).

After an invocation of the form
 

  java luna.LuaJava LuaJava_09.lua

the script produces the following output

LuaJava_09 - what happens to Unicode strings?

  Lua2Java('abc')                -> \u0061\u0062\u0063
  Java2Lua('\u0161\u0262\u0363') -> š??
    (length = 6, content = "\197\161\201\162\205\163")
    (for a comparison "\97\98\99" = "abc")

This output is probably not what one might expect at first sight:

  • Lua strings are properly converted to their Java counterparts - that's easy enough;
  • Java (i.e. Unicode) strings containing characters beyond the 8-bit range do not arrive in Lua character by character:
    • the resulting Lua string is longer (6 bytes) than the Java string (3 characters);
    • the (byte) codes in the Lua string look somewhat random at first, but they match the UTF-8 encoding of the original characters: \u0161 becomes "\197\161", \u0262 becomes "\201\162" and \u0363 becomes "\205\163".

    What the author has not checked so far is how this conversion is carried out internally. The observed bytes are at least consistent with Lua getting a UTF-8 encoded string from Java. This preserves the original contents, and the string remains readable as long as the original (Unicode) characters are within the range 0..127 (decimal).
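
The UTF-8 assumption can be checked in plain Java, without Lua or the luajava package. The following is only a minimal sketch: the class name Utf8Check and its two helper methods are made up for this illustration and are not part of LuaJava_09.java. It encodes the test string explicitly as UTF-8 and prints the resulting bytes in Lua's decimal escape notation, so that they can be compared with the output shown above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only (not the author's LuaJava_09.java).
final class Utf8Check {

    // Render each UTF-16 code unit of a Java string as a backslash-u escape,
    // in the same notation as the "Lua2Java" line of the script output.
    public static String toJavaEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    // Render the UTF-8 bytes of a Java string as Lua-style decimal escapes.
    public static String toLuaEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('\\').append(b & 0xFF); // mask to get the unsigned byte value
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String java = "\u0161\u0262\u0363";
        byte[] utf8 = java.getBytes(StandardCharsets.UTF_8);

        System.out.println("Java length   = " + java.length());   // 3
        System.out.println("UTF-8 length  = " + utf8.length);     // 6
        System.out.println("content       = \"" + toLuaEscapes(java) + "\"");
        // content       = "\197\161\201\162\205\163"
        System.out.println("comparison    = \"" + toLuaEscapes("abc") + "\"");
        // comparison    = "\97\98\99"

        // Decoding the same bytes as UTF-8 restores the original string.
        String back = new String(utf8, StandardCharsets.UTF_8);
        System.out.println("round trip ok = " + back.equals(java)); // true
    }
}
```

The printed length and content are identical to what the Lua script reports. This shows only that the bytes are what an explicit UTF-8 encoding produces; it does not show how the conversion is actually performed between Java and Lua.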

Source Code

The source code of this script and its accompanying Java class is publicly available:

Disclaimer

Please, also consider the author's Disclaimer!


http://www.Andreas-Rozek.de/LuaJava/Acquainting/LuaJava_09_en.html   (last Modification: 26.11.2004)