Firefox can pull your silky voice in over a microphone and generate Ogg/Vorbis. This format is small enough to allow anyone to upload 3 minutes of audio in as long as it takes to send a large photo by email.
Chrome, however, only gives us PCM data. So what are you gonna do? Spend an hour uploading WAV files to a server? No thanks.
I’m going to use Emscripten to compile libOgg + libVorbis into JavaScript, and then encode the PCM data arriving from the mic into a more bandwidth-friendly format. Also: it would scale better not to have the web server do all of the audio encoding :)
Build step:
emcc -O2 -s ASM_JS=1 -s EXPORTED_FUNCTIONS='["_lexy_encoder_start", "_lexy_encoder_write", "_lexy_encoder_finish", "_lexy_test", "_lexy_write_test", "_lexy_get_buffer_length", "_lexy_get_buffer"]' -Iincludes -Llibs -lvorbis -lvorbisenc -logg vorbis.cpp -o vorbis.small.js
I have a tiny wrapper that performs vorbis encoding, feeds packets into ogg, and outputs encoded data into a buffer that can then be played via <audio> or pushed to a server via xmlhttp. The above build step compiles and exports the minimal amount of Javascript to support only the features of the libraries that I’m using.
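As a rough illustration of the "outputs encoded data into a buffer" step, here is a minimal sketch of copying encoded bytes out of an Emscripten-style heap. The wrapper's source is not shown in this post, so the signatures are assumptions: `_lexy_get_buffer` is taken to return a byte offset into the heap and `_lexy_get_buffer_length` the number of encoded bytes. `mockModule` and `copyFromHeap` are hypothetical stand-ins, not the real generated module.

```javascript
// Copy `length` bytes starting at byte offset `ptr` out of an Emscripten-style heap.
// slice() returns a copy, so the result stays valid if the heap is overwritten later.
function copyFromHeap(heapU8, ptr, length) {
  return heapU8.slice(ptr, ptr + length);
}

// Hypothetical stand-in for the generated module (assumed signatures, see above).
var mockModule = {
  HEAPU8: new Uint8Array(32),
  _lexy_get_buffer: function () { return 8; },        // assumed: pointer to encoded data
  _lexy_get_buffer_length: function () { return 4; }  // assumed: encoded byte count
};

// Pretend the encoder wrote "OggS" (the Ogg page capture pattern) at offset 8.
mockModule.HEAPU8.set([0x4f, 0x67, 0x67, 0x53], 8);

var encoded = copyFromHeap(
  mockModule.HEAPU8,
  mockModule._lexy_get_buffer(),
  mockModule._lexy_get_buffer_length()
);

// In a browser the copy could then be wrapped for playback or upload, e.g.:
//   var blob = new Blob([encoded], { type: "audio/ogg" });
//   audioElement.src = URL.createObjectURL(blob);
```

Copying rather than keeping a view into the heap is a deliberate choice here: the heap is shared scratch space, so a view could change underneath you on the next encoder call.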
The Emscripten overhead/bootstrap is 200KB
A full build exporting all ogg/vorbis functions: 4MB
Minimal build exporting just the critical parts (above) : 1.6MB
1.6MB isn’t awful. Gzip it and it’s fairly manageable. But we can do better!
Crack open the generated Javascript
Scanning through, the first thing I notice is gigantic blobs of binary data (I assume these are functions from the ogg/vorbis libraries). These account for a good 1MB of the generated file. A good target.
It looks like this:
Each byte is represented as an integer (0-255), written out as an ASCII string. This means a single byte like “119” takes 3 bytes on disk, plus one more for the comma that separates it from its neighbour. My first thought: shrink it by representing it in pure binary form. But you can’t just write binary into a JavaScript file; that will generate random characters and break the code. So you need to encode it somehow. Base64 is fast and well supported, so it’s worth trying.
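The size argument can be checked with a little arithmetic. The sketch below assumes the blob is written as a comma-separated decimal list and compares that with padded base64, which needs 4 characters for every 3 bytes. The helper names are made up for illustration.

```javascript
// Characters needed to write the bytes as a comma-separated decimal list,
// e.g. "119,0,255".
function decimalListLength(bytes) {
  var total = 0;
  for (var i = 0; i < bytes.length; i++) {
    total += String(bytes[i]).length;
  }
  // One comma between each pair of neighbours.
  return total + Math.max(bytes.length - 1, 0);
}

// Characters needed for the same bytes in padded base64:
// every group of up to 3 bytes becomes 4 characters.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

var sample = new Uint8Array([119, 0, 255, 7, 42, 200]);
var asDecimal = decimalListLength(sample); // 18 characters
var asBase64 = base64Length(sample.length); // 8 characters
```

How much is saved on a real blob depends on its contents: long runs of small values such as 0 are cheap in decimal form, while base64 costs the same for every byte.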
How do we do that? Need to find the code in Emscripten responsible for generating these blobs. Scan the emscripten folder for the “memory initializer” comment and see where that takes us. Turns out: tools/shared.py
aleckz@horizon:~/tests/emscripten$ fgrep "memory initializer */ allocate" * -r
tools/shared.py: return '/* memory initializer */ allocate(["%s"], %i, "i8", ALLOC_NONE, Runtime.GLOBAL_BASE%s);' % (
This method generates the big blobs. Tweak it to use base64 encoding:
    @staticmethod
    def replace_initializers(src, inits):
        class State:
            first = True
        def rep(m):
            if not State.first: return ''
            # write out all the new initializers in place of the first old one
            State.first = False
            def gen_init(init):
                offset, contents = init
                return '/* memory initializer (tweaked) */ allocate("%s", %i, "i8", ALLOC_NONE, Runtime.GLOBAL_BASE%s);' % (
                    #','.join(contents),
                    base64.b64encode(''.join(struct.pack('B',int(i)) for i in contents)),
                    len(contents),
                    '' if offset == 0 else ('+%d' % offset)
                )
            return '\n'.join(map(gen_init, inits))
        return re.sub(JS.memory_initializer_pattern, rep, src)
Original output was done via: ','.join(contents),
Raw binary (bad idea) can be done via: ''.join(struct.pack('B',int(i)) for i in contents),
Base64 encoded binary (nice and safe) via: base64.b64encode(''.join(struct.pack('B',int(i)) for i in contents)),
Disclaimer: I don’t know python at all, so the above code may be hazardous to your health.
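To make the transformation concrete, here is the same idea sketched in JavaScript rather than Python: take the initializer contents as a list of decimal strings, pack each one into a single byte, and base64-encode the result. `initializerToBase64` and `bytesToBase64` are hypothetical helper names, not part of Emscripten. The encoder is hand-rolled only so the sketch runs unchanged in Node and in browsers.

```javascript
var B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a byte array as padded base64, 3 input bytes -> 4 output characters.
function bytesToBase64(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i += 3) {
    var hasB1 = i + 1 < bytes.length;
    var hasB2 = i + 2 < bytes.length;
    var b0 = bytes[i];
    var b1 = hasB1 ? bytes[i + 1] : 0;
    var b2 = hasB2 ? bytes[i + 2] : 0;
    out += B64_ALPHABET.charAt(b0 >> 2);
    out += B64_ALPHABET.charAt(((b0 & 3) << 4) | (b1 >> 4));
    // Missing trailing bytes are marked with "=" padding.
    out += hasB1 ? B64_ALPHABET.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "=";
    out += hasB2 ? B64_ALPHABET.charAt(b2 & 63) : "=";
  }
  return out;
}

// Mirror of the Python one-liner: decimal strings -> bytes -> base64 text.
function initializerToBase64(contents) {
  var bytes = new Uint8Array(contents.length);
  for (var i = 0; i < contents.length; i++) {
    bytes[i] = parseInt(contents[i], 10);
  }
  return bytesToBase64(bytes);
}

var encodedBlob = initializerToBase64(["79", "103", "103", "83"]); // "T2dnUw=="
```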
Next: modifying the allocate() function to decode base64 encoded binary.
We just changed the parameters fed into the allocate() method, so we need to modify it to be able to deal with base64 encoded data.
But I don’t want to modify the global allocate(), since that may have unintended consequences. Rather, let’s make an alternative one to use with our large binary blobs.
1) Search-and-replace /* memory initializer */ allocate with /* memory initializer (tweaked) */ allocateBase64Encoded
2) Create this modified version of allocate() called allocateBase64Encoded():
function b64ToUint6 (nChr) {
  return nChr > 64 && nChr < 91 ?
      nChr - 65
    : nChr > 96 && nChr < 123 ?
      nChr - 71
    : nChr > 47 && nChr < 58 ?
      nChr + 4
    : nChr === 43 ?
      62
    : nChr === 47 ?
      63
    :
      0;
}

function base64DecToArr (sBase64, nBlocksSize) {
  var
    sB64Enc = sBase64.replace(/[^A-Za-z0-9\+\/]/g, ""), nInLen = sB64Enc.length,
    nOutLen = nBlocksSize ? Math.ceil((nInLen * 3 + 1 >> 2) / nBlocksSize) * nBlocksSize : nInLen * 3 + 1 >> 2,
    taBytes = new Uint8Array(nOutLen);

  for (var nMod3, nMod4, nUint24 = 0, nOutIdx = 0, nInIdx = 0; nInIdx < nInLen; nInIdx++) {
    nMod4 = nInIdx & 3;
    nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << 18 - 6 * nMod4;
    if (nMod4 === 3 || nInLen - nInIdx === 1) {
      for (nMod3 = 0; nMod3 < 3 && nOutIdx < nOutLen; nMod3++, nOutIdx++) {
        taBytes[nOutIdx] = nUint24 >>> (16 >>> nMod3 & 24) & 255;
      }
      nUint24 = 0;
    }
  }
  return taBytes;
}

// special base64 allocator
// assumes: allocator = ALLOC_NONE and types = i8
function allocateBase64Encoded(slab, length, types, allocator, ptr) {
  HEAPU8.set(base64DecToArr(slab), ptr);
}
The above allocator is extremely stripped down. It works with a Base64 string to Uint8Array decoder I borrowed from the Mozilla docs, and only for ALLOC_NONE “i8” (unsigned char) slabs. The above snippet can be injected into Emscripten’s src/preamble.js or manually into generated code on an individual basis.
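A quick way to sanity-check the approach is to decode a small base64 slab into a stand-in heap and confirm the bytes land at the requested offset. This sketch makes a few assumptions: a plain `Uint8Array` plays the role of Emscripten’s `HEAPU8`, decoding uses `atob` (falling back to Node’s `Buffer`) instead of the Mozilla decoder, and `writeBase64ToHeap` is a hypothetical name.

```javascript
// Decode a base64 string into bytes: atob where available, Buffer otherwise.
function decodeBase64(b64) {
  if (typeof atob === "function") {
    var binary = atob(b64);
    var bytes = new Uint8Array(binary.length);
    for (var i = 0; i < binary.length; i++) {
      bytes[i] = binary.charCodeAt(i);
    }
    return bytes;
  }
  return new Uint8Array(Buffer.from(b64, "base64"));
}

// Same shape as the stripped-down allocator: decode the slab and copy it
// into the heap starting at byte offset ptr. Returns the number of bytes written.
function writeBase64ToHeap(heap, slab, ptr) {
  var bytes = decodeBase64(slab);
  heap.set(bytes, ptr);
  return bytes.length;
}

var fakeHeap = new Uint8Array(16); // stand-in for HEAPU8
var written = writeBase64ToHeap(fakeHeap, "T2dnUw==", 8);
// fakeHeap[8..11] now holds 79, 103, 103, 83 ("OggS"); the rest is untouched.
```

Comparing the returned byte count with the `length` argument passed to the allocator would be a cheap guard against a truncated or corrupted blob.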
Resulting Javascript blobs:
No more ASCII integer arrays, but rather a nice base64 string. And we save a good 600KB!
Minimal build using base64 encoding: 1.0MB
Shrinking the binary libraries
libvorbisenc.so takes up 440KB, libvorbis.so 220KB, libogg.so 30KB. So let’s tackle the fat library first.
Running llvm-ar t libvorbisenc.a tells us that the entire archive is just libvorbisenc.o. Taking a look at libvorbisenc.cpp we can spot a lot of large headers responsible for its 440KB weight. Let’s gut some of these headers:
We will only support 44kHz. Delete vorbisenc.o and rebuild:
-rw-rw-r-- 1 aleckz aleckz 172K Feb 18 01:34 vorbisenc.o
-rw-rw-r-- 1 aleckz aleckz 440K Feb 17 14:16 vorbisenc.o.original
The encoder object shrank by a good 270KB!
Minimal gutted base64 build: 730KB
Gzipped: 200KB! This is VERY manageable.
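To reproduce the gzipped figure for your own build, something like the following can measure it. This is a sketch assuming Node.js is available; it is not part of the original build, and `measureGzip` is a hypothetical helper. A repetitive stand-in string is used here so the snippet runs without the generated file.

```javascript
var zlib = require("zlib");

// Return the raw and gzipped byte sizes of a chunk of JavaScript source.
function measureGzip(source) {
  var raw = Buffer.from(source, "utf8");
  var gzipped = zlib.gzipSync(raw);
  return { rawBytes: raw.length, gzipBytes: gzipped.length };
}

// Stand-in input: 2000 copies of one line. For a real build, load the
// generated file instead, e.g. require("fs").readFileSync(path, "utf8").
var sampleSource = new Array(2001).join("HEAPU8.set(bytes, ptr);\n");
var sizes = measureGzip(sampleSource);
console.log(sizes.rawBytes + " bytes raw, " + sizes.gzipBytes + " bytes gzipped");
```

The number a browser actually downloads depends on the server’s compression settings, so treat the result as an estimate.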
Awesome!!!! 730 KB with an encoder in JS is pretty awesome!!
Hi, this is good work!
Can you create a step-by-step howto and publish this creation (Linux, I use Ubuntu 12.04 64-bit), e.g. on GitHub or here?
Or put the compiled js files on the blog and make them downloadable?
The big question is encoder performance: do you have performance data?
I tried a Speex encoder built into a realtime HTML5 upstream application. The main performance results are good. My main laptop is a 4-core and it works fine on this. The Speex source is chunked using block parsing (4096 samples per block), uploaded to the server, and restreamed to many clients.
can you share how you implemented the speex.js encoding? I am attempting this, and am having issues, any help much much appreciated!
failing that, some more explanation as to how to get the emscripten method working would be so helpful!
At the beginning of your article you say
“Firefox can pull your silky voice in over a microphone and generate Ogg/Vorbis.”
How ? I’m pretty interested by it, any link or piece of code would be really helpful, thanks
Very interesting, but useless without detailed information.
I’m very interested in this as well. Have you published this library at all?
I’m interested in this too!!!
Could you put the compiled js files on the blog and make them downloadable?
Great work! and nice post ;)
Thanks!
Hey, i’ve tried all of the above and I get this when I run the emcc -O2 command above:
warning: unresolved symbol: vorbis_analysis_buffer
warning: unresolved symbol: vorbis_encode_init_vbr
warning: unresolved symbol: ogg_stream_packetin
warning: unresolved symbol: vorbis_comment_clear
warning: unresolved symbol: vorbis_info_clear
warning: unresolved symbol: vorbis_block_clear
warning: unresolved symbol: vorbis_comment_add_tag
warning: unresolved symbol: ogg_stream_flush
warning: unresolved symbol: ogg_stream_clear
warning: unresolved symbol: vorbis_bitrate_flushpacket
warning: unresolved symbol: vorbis_analysis_init
warning: unresolved symbol: ogg_stream_pageout
warning: unresolved symbol: vorbis_analysis_headerout
warning: unresolved symbol: vorbis_analysis_wrote
warning: unresolved symbol: vorbis_analysis
warning: unresolved symbol: vorbis_info_init
warning: unresolved symbol: vorbis_dsp_clear
warning: unresolved symbol: vorbis_analysis_blockout
warning: unresolved symbol: ogg_stream_init
warning: unresolved symbol: vorbis_block_init
warning: unresolved symbol: vorbis_bitrate_addblock
warning: unresolved symbol: vorbis_comment_init
I built libogg and libvorbis from source using emconfigure and placed them in the library directory as above, but it cannot find the symbols when linking…
Oh no!
D.
Can’t you just provide working .js file or real demo on the web? Prove that you actually got it working :)
Check out https://github.com/Garciat/libvorbis.js for a working distribution of the library.
Amazing work, I’m trying to do this but converting to m4a files. I only found ffmpeg, but it is too big and slow :(