Avoid octal escapes, use raw bytes instead

gagern · gagern · commit ee31c97b258f · 2015-05-29T10:11:45.000+02:00
There was a bug where the hex-to-octal conversion would wrongly match the
text \\x01, i.e. an escaped backslash followed by x01, and rewrite it.  But
support for octal escape sequences is optional in any case, and forbidden in
strict mode, so we should avoid using them.
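
The mismatch can be reproduced in isolation.  The following is a sketch
only: it assumes Python 3, where repr() of a str doubles backslashes the
same way the Python 2 repr() of a byte string did, and the name "data" is
made up for the illustration.  The lambda and the regular expression are
copied from the removed lines.

```python
import re

# Four separate characters: backslash, x, 0, 1 (not the single byte 0x01).
data = '\\x01'
s = repr(data)  # quoted text with the backslash doubled

# Conversion as in the removed code.
hex_to_octal = lambda x: '\\%o' % int(x.group(1), 16)
broken = re.sub(r'\\x([0-1][0-9A-Fa-f])(?:(?=[^0-9])|$)', hex_to_octal, s)

print(s)       # '\\x01'
print(broken)  # '\\1'
```

The regex matches starting at the second backslash, so the literal that
used to describe four characters now describes only two (a backslash and
the digit 1).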

As per the ECMAScript 5.1 spec, any source character (which may be any
Unicode code point) can be used inside a string literal, with the exception
of backslash, line terminator or the quoting character.  So we do just that:
dump a lot of raw bytes into the string literal and escape only what needs
to be escaped.
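
A minimal sketch of that escaping step, assuming Python 3 and a
hypothetical helper name (to_js_string_literal); the replacements mirror
the ones added in the diff, and the final encoding to UTF-8 is left out:

```python
def to_js_string_literal(membytes):
    """Wrap byte values in a single-quoted JavaScript string literal."""
    # One character per byte, with code points 0 through 255.
    s = ''.join(map(chr, membytes))
    # Backslashes must be escaped first, otherwise the backslashes
    # introduced by the later replacements would be doubled as well.
    s = s.replace('\\', '\\\\').replace("'", "\\'")
    # Line terminators may not appear raw inside a string literal.
    s = s.replace('\n', '\\n').replace('\r', '\\r')
    return "'%s'" % s

literal = to_js_string_literal(b"a'b\\c\n")
```

Everything else, including control characters such as \x00, is passed
through unchanged.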

There is one catch, though: sources are usually encoded in UTF-8, in which
case we can't exactly plug in raw bytes, but have to use UTF-8 sequences for
the range \x80 through \xff.  This may cause problems if the source file is
NOT interpreted as UTF-8.
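
To illustrate the catch, here is a small sketch, again assuming Python 3,
with variable names chosen only for this example.  It shows how bytes
above \x7f grow into two-byte UTF-8 sequences, and what a reader that
assumes Latin-1 would see instead:

```python
raw = bytes([0x7f, 0x80, 0xff])

# Map each byte to the code point of the same value, then encode the
# resulting text the way a UTF-8 source file would store it.
encoded = raw.decode('latin1').encode('utf8')

# Read back as UTF-8: the original three values are recovered.
as_utf8 = [ord(c) for c in encoded.decode('utf8')]

# Read back as Latin-1: every byte becomes its own character.
as_latin1 = [ord(c) for c in encoded.decode('latin1')]

print(encoded)    # b'\x7f\xc2\x80\xc3\xbf'
print(as_utf8)    # [127, 128, 255]
print(as_latin1)  # [127, 194, 128, 195, 191]
```
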
diff --git a/emcc b/emcc
@@ -1337,10 +1337,11 @@ try:
           return '';
       membytes = ''.join(map(chr, membytes))
       if not memory_init_file:
-        s = repr(membytes)
-        hex_to_octal = lambda x: '\\%o' % int(x.group(1), 16)
-        s = re.sub(r'\\x([0-1][0-9A-Fa-f])(?:(?=[^0-9])|$)', hex_to_octal, s)
-        return 'var memoryInitializer = %s;' % s
+        s = membytes
+        s = s.replace('\\', '\\\\').replace("'", "\\'")
+        s = s.replace('\n', '\\n').replace('\r', '\\r')
+        s = s.decode('latin1').encode('utf8')
+        return "var memoryInitializer = '%s';" % s
       open(memfile, 'wb').write(membytes)
       if DEBUG:
         # Copy into temp dir as well, so can be run there too