Computers primarily deal with numbers. Any character or graphic that you see is represented by an internal numeric code. The set of codes that represents a given set of characters is referred to as a character set. Computers use the codes to find the correct symbol in a font for display.
Traditional computer character sets such as ASCII comprised only 128 different codes (including some that had special meaning and did not directly translate to a symbol). This was fine for the display terminals of the time, but internationalisation required more. ISO-8859-1 (aka Latin-1) expanded the set of available symbols, but human languages together use literally tens of thousands of different symbols.
), and has undergone several revisions since then. At last count it supports more than 90,000 codes, with more being added. While several standards currently compete for technical, historical and political reasons, Unicode is the most complete character code representation available at the present time. For a more in-depth history of Unicode try
You may have noticed that some SL residents chat in languages other than English. Some of our Japanese residents, for example, chat in their native language, with their native character sets. If you have the proper
you will see the characters in their native Japanese character set. If you do not, you'll see a series of small grey blocks in your chat history (your operating system's way of telling you that it cannot find a font containing symbols that match the character codes).
Not every part of SL is happy with Unicode. Although the standard is some years old now, adoption has been slow. Generally you will find that Unicode works in chat, IM, and the email/IM gateway. Other parts, such as object names and inventory search, may not understand characters outside Latin-1.
The encoding standard that SL uses is UTF-8. Raw UTF-8 strings can be built with the use of
. SL isn't fully compliant with UTF-8 yet.
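As a rough illustration of building a string from raw UTF-8 bytes, the sketch below assumes that percent-escaped bytes passed to llUnescapeURL are decoded as UTF-8. Whether that is the function referred to above is an assumption, and the euro sign is just an arbitrary example character.

```lsl
default
{
    state_entry()
    {
        // U+20AC EURO SIGN is the three UTF-8 bytes E2 82 AC.
        string euro = llUnescapeURL("%E2%82%AC");
        llOwnerSay(euro);
        // Escape it again to inspect the byte sequence behind the character.
        llOwnerSay(llEscapeURL(euro));
    }
}
```

Whether the character displays correctly still depends on the viewer having a font that contains it.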
These two functions will make you wish you were never born.
returns the Unicode value of the character. If the character is not valid UTF-8, the function returns the first byte with the sign bit set, so the result is negative.
returns the UTF-8 encoded version of the Unicode integer. Standards compliant.
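The integer-to-UTF-8 direction can be sketched as follows. This is not the implementation of the functions described above; hexByte and codePointToUTF8 are hypothetical names used only for illustration, and the sketch assumes llUnescapeURL decodes the percent-escaped bytes as UTF-8. It does no validation (surrogates and out-of-range values are not rejected), and code point 0 cannot be represented in an LSL string.

```lsl
string HEX_DIGITS = "0123456789ABCDEF";

// Hypothetical helper: formats one byte (0..255) as "%XX".
string hexByte(integer b)
{
    integer hi = (b >> 4) & 0xF;
    integer lo = b & 0xF;
    return "%" + llGetSubString(HEX_DIGITS, hi, hi)
               + llGetSubString(HEX_DIGITS, lo, lo);
}

// Hypothetical helper: encodes a Unicode code point as a UTF-8 string.
string codePointToUTF8(integer cp)
{
    // 1 byte: 0xxxxxxx
    if (cp < 0x80)
        return llUnescapeURL(hexByte(cp));
    // 2 bytes: 110xxxxx 10xxxxxx
    if (cp < 0x800)
        return llUnescapeURL(hexByte(0xC0 | (cp >> 6))
                           + hexByte(0x80 | (cp & 0x3F)));
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    if (cp < 0x10000)
        return llUnescapeURL(hexByte(0xE0 | (cp >> 12))
                           + hexByte(0x80 | ((cp >> 6) & 0x3F))
                           + hexByte(0x80 | (cp & 0x3F)));
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return llUnescapeURL(hexByte(0xF0 | (cp >> 18))
                       + hexByte(0x80 | ((cp >> 12) & 0x3F))
                       + hexByte(0x80 | ((cp >> 6) & 0x3F))
                       + hexByte(0x80 | (cp & 0x3F)));
}

default
{
    state_entry()
    {
        // 0x20AC is the code point of the euro sign.
        llOwnerSay(codePointToUTF8(0x20AC));
    }
}
```

Going through percent-escapes keeps the bit manipulation in plain integers and leaves the byte-to-character step to the built-in function.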
// Counts the UTF-8 bytes in a string.
// llEscapeURL turns every byte other than ASCII letters and digits into "%XX",
// and "%" itself becomes "%25", so each "%" in the escaped string marks
// exactly one escaped byte.
// Splitting on "%" alone replaces the earlier list of every possible %XX token,
// which could not work because llParseString2List and llParseStringKeepNulls
// only honour the first 8 separators.
integer stringBytes(string n) {
    n = llEscapeURL(n);
    // Length with every "%" removed: 1 per unescaped char, 2 per escaped byte.
    integer withoutPercent = llStringLength((string)llParseString2List(n, ["%"], []));
    // The escaped length is 1 per unescaped char and 3 per escaped byte, so
    // bytes = 2 * withoutPercent - escaped length.
    return 2 * withoutPercent - llStringLength(n);
}