Customised Java UTF-16

I have implemented a customized encoding mechanism for Java's UTF-16. Does this implementation support all characters?

    public class Encoding {
        public static void main(String[] args) {
            byte[] arr = new byte[1000];
            String str = "abcde"; // this encoding works even for supplementary characters
            Encode(arr, 0, str);
            System.out.println(Decode(arr, str.length()));
        }

        public static byte[] Encode(byte[] ByteArray, int offset, String str) …
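To support all characters, a UTF-16 encoder has to split every code point above U+FFFF into a surrogate pair, and the decoder has to reassemble it. The sketch below shows that logic in Python rather than Java. The helper names `encode_utf16be` and `decode_utf16be` are hypothetical and are not the asker's `Encode`/`Decode`. It assumes big-endian output with no byte order mark and checks the result against Python's built-in codec.

```python
def encode_utf16be(text: str) -> bytes:
    """Hypothetical helper: encode text as UTF-16BE by hand (no BOM)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"lone surrogate U+{cp:04X} cannot be encoded")
        if cp <= 0xFFFF:
            # BMP character: a single 16-bit code unit
            out += cp.to_bytes(2, "big")
        else:
            # Supplementary character: split into a high/low surrogate pair
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out += high.to_bytes(2, "big") + low.to_bytes(2, "big")
    return bytes(out)


def decode_utf16be(data: bytes) -> str:
    """Hypothetical helper: decode bytes produced by encode_utf16be."""
    if len(data) % 2:
        raise ValueError("odd number of bytes")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    chars = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            chars.append(chr(cp))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            chars.append(chr(u))
            i += 1
    return "".join(chars)


sample = "abcde \U0001F600"  # includes a supplementary character (U+1F600)
encoded = encode_utf16be(sample)
assert encoded == sample.encode("utf-16-be")  # agrees with the built-in codec
print(decode_utf16be(encoded) == sample)  # True
```

One thing worth checking in the Java version: `String.length()` returns the number of UTF-16 code units, so a supplementary character contributes 2 to `str.length()`, while `str.codePointCount(0, str.length())` returns the number of code points.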

Matching accented characters with JavaScript regexes

Here's a fun snippet I ran into today:

    /\ba/.test("a")  -> true
    /\bà/.test("à")  -> false

However:

    /à/.test("à")    -> true

Firstly, wtf? Secondly, if I want to match an accented character at the start of a word, how can I do that? (I'd really like to avoid using over-the-top selectors like /(?:^|\s|'|\(\) ….)

Answer

This worked for …
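The behaviour comes from `\b` being defined in terms of `\w`, which in JavaScript covers only ASCII letters, digits and underscore, so "à" is not a word character and there is no boundary before it. The sketch below reproduces this in Python, where the `re.ASCII` flag gives the same ASCII-only `\w`, and then shows a negative lookbehind as an alternative to listing separators by hand. It is a Python illustration, not a JavaScript fix; in engines that support ES2018 lookbehind and `\p{…}` escapes, roughly the same idea can be written as `/(?<![\p{L}\p{N}_])à/u`.

```python
import re

# ASCII-only \w (what JavaScript's \b is built on) does not include "à",
# so there is no word boundary in front of it.
print(re.search(r"\bà", "à", re.ASCII))      # None

# Python 3 str patterns are Unicode-aware by default, so \w includes "à"
# and \b behaves the way the question expected.
print(re.search(r"\bà", "à") is not None)   # True

# "Start of a word" without enumerating separators: the character must
# not be preceded by a (Unicode) word character.
start_of_word = re.compile(r"(?<!\w)à")
print([m.start() for m in start_of_word.finditer("là à")])  # [3]
```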

How can I display a ‘Reload’ symbol in HTML without loading an image via HTTP?

I would like to display a 'refresh' symbol in an HTML/JavaScript app I'm creating, but I do not want to make any HTTP requests to load an image. How can I do this reliably across all major browsers? The closest Unicode character I could find is ↺ (&#8634;), but the arrow is pointing the wrong …
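A symbol can be embedded without any image request by writing it as a numeric character reference, which works regardless of the page's declared encoding. ↺ is U+21BA (ANTICLOCKWISE OPEN CIRCLE ARROW); its clockwise counterpart is U+21BB. The sketch below, in Python, builds such a reference; the helper `html_char_ref` and the `<button>` markup are hypothetical, and whether the glyph actually renders still depends on the fonts installed on the user's system.

```python
import html


def html_char_ref(ch: str) -> str:
    """Hypothetical helper: decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


reload_arrow = "\u21BB"  # U+21BB CLOCKWISE OPEN CIRCLE ARROW, the mirror of U+21BA
markup = f'<button title="Reload">{html_char_ref(reload_arrow)}</button>'
print(markup)  # <button title="Reload">&#8635;</button>

# A browser turns the reference back into the character; html.unescape does the same.
assert html.unescape(html_char_ref(reload_arrow)) == "\u21BB"
```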

How can I use io.StringIO() with the csv module?

I tried to backport a Python 3 program to 2.7, and I'm stuck with a strange problem:

    >>> import io
    >>> import csv
    >>> output = io.StringIO()
    >>> output.write("Hello!")  # Fail: io.StringIO expects Unicode
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unicode argument expected, got 'str'
    >>> output.write(u"Hello!")  # This …
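In Python 3 the `csv` module writes `str`, which is exactly what `io.StringIO` accepts, so the combination works directly. In Python 2.7 the `csv` module writes byte strings (`str`), which `io.StringIO` rejects with the `TypeError` shown above; there the usual workaround is `io.BytesIO` (or the older `StringIO.StringIO`), decoding afterwards if text is needed. The sketch below shows only the Python 3 side; the helper name `rows_to_csv` is hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Hypothetical helper: write rows to an in-memory text buffer, return the CSV text."""
    buffer = io.StringIO()       # Python 3: csv.writer emits str, which StringIO accepts
    writer = csv.writer(buffer)  # default "excel" dialect, lines end with "\r\n"
    writer.writerows(rows)
    return buffer.getvalue()


print(repr(rows_to_csv([["name", "value"], ["Hello!", 1]])))
# 'name,value\r\nHello!,1\r\n'
```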

How do I compare a Unicode string that has different bytes, but the same value?

I'm comparing Unicode strings between JSON objects. They have the same value:

    a = '人口じんこうに膾炙かいしゃする'
    b = '人口じんこうに膾炙かいしゃする'

But they have different Unicode representations:

    String a: u'\u4eba\u53e3\u3058\u3093\u3053\u3046\u306b\u81be\u7099\u304b\u3044\u3057\u3083\u3059\u308b'
    String b: u'\u4eba\u53e3\u3058\u3093\u3053\u3046\u306b\u81be\uf9fb\u304b\u3044\u3057\u3083\u3059\u308b'

How can I compare two Unicode strings by their value?

Answer

Unicode normalization will get you there for this one:

    >>> import …
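The two strings differ only at index 8: `a` has U+7099 (炙) while `b` has U+F9FB, a CJK compatibility ideograph whose canonical decomposition is U+7099. Normalizing both sides to the same form before comparing makes them equal. A minimal sketch with Python's `unicodedata` module; the helper `same_text` and its default of NFC are illustrative choices, not taken from the original answer.

```python
import unicodedata

a = "\u4eba\u53e3\u3058\u3093\u3053\u3046\u306b\u81be\u7099\u304b\u3044\u3057\u3083\u3059\u308b"
b = "\u4eba\u53e3\u3058\u3093\u3053\u3046\u306b\u81be\uf9fb\u304b\u3044\u3057\u3083\u3059\u308b"


def same_text(x: str, y: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after Unicode normalization."""
    return unicodedata.normalize(form, x) == unicodedata.normalize(form, y)


print(a == b)           # False: the code points differ at index 8
print(same_text(a, b))  # True: U+F9FB normalizes to U+7099
```

Canonical forms (NFC/NFD) are usually enough for cases like this one; the compatibility forms (NFKC/NFKD) fold more aggressively and can merge characters that are meant to stay distinct.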

JSON and escaping characters

I have a string that gets serialized to JSON in JavaScript and then deserialized in Java. It looks like if the string contains a degree symbol, I get a problem. I could use some help figuring out whom to blame: is it the SpiderMonkey 1.8 implementation (which has a built-in JSON implementation)? Is …
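The question is cut off, so the actual cause is not shown here. One common cause of this symptom is an encoding mismatch: the JSON text is encoded as UTF-8 on one side and decoded as a single-byte charset such as Latin-1 on the other. The Python sketch below illustrates both that failure and the escaping option that sidesteps it; the sample payload is hypothetical. On the Java side, the analogous check is that the bytes are decoded with the charset they were encoded with, for example `new String(bytes, StandardCharsets.UTF_8)`.

```python
import json

payload = {"temperature": "21\u00b0C"}  # hypothetical data containing U+00B0 DEGREE SIGN

# By default json.dumps escapes non-ASCII characters, so the wire text is pure ASCII.
ascii_json = json.dumps(payload)
print(ascii_json)  # {"temperature": "21\u00b0C"}

# Without escaping, the JSON text contains the raw character, and the bytes on
# the wire depend on the encoding both sides agree on.
raw_json = json.dumps(payload, ensure_ascii=False)
utf8_bytes = raw_json.encode("utf-8")

# Decoding with the same encoding round-trips cleanly...
assert json.loads(utf8_bytes.decode("utf-8")) == payload
# ...but decoding UTF-8 bytes as Latin-1 yields the classic "Â°" mojibake.
print(json.loads(utf8_bytes.decode("latin-1"))["temperature"])  # 21Â°C
```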