“No Input” game: Talk to the Cat

While researching motion detection and voice activation, as I have been doing on and off for a couple of weeks now, I recently came across PocketSphinx. It’s one of the very few open source speech recognition libraries I have found and, conveniently enough, one that also has a JavaScript port. Voice commands in the browser!

However, as the demo experiment I put together shows, it is not ready for widespread deployment on websites or as an interface layer for games. When it works (and that “when” depends on things the software can’t account for, like hardware and browser issues), it is quite good, though. In my testing, it recognized the words I had set up most of the time.

Which brings me to another issue with the JavaScript port. Because the dictionary is a large file (over 100,000 entries), you have to look up each word you want and then hard-code its values. That’s not too much of a problem in itself, but it does mean a slow testing process: picking words, testing them, and then sometimes finding out that the command words you happened to pick sound too similar.
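One way to catch sound-alike words before sitting through another load is to compare their pronunciations directly. The sketch below is only an illustration, not part of PocketSphinx.js: the word list, the function names (phonemeDistance, findConfusablePairs) and the distance threshold are all my own assumptions. It treats each pronunciation as a list of CMUdict-style phonemes and uses plain edit distance as a rough stand-in for "how similar do these sound".

```javascript
// Hypothetical command words, each hard-coded with a CMUdict-style
// pronunciation (phonemes separated by spaces). Illustration only.
const commandWords = [
  ["CAT", "K AE T"],
  ["HAT", "HH AE T"],
  ["FEED", "F IY D"],
  ["SLEEP", "S L IY P"],
];

// Levenshtein edit distance computed over phonemes rather than letters.
// A small distance suggests two words may be easy to confuse.
function phonemeDistance(a, b) {
  const p = a.split(" ");
  const q = b.split(" ");
  // prev[j] holds the distance between the first i-1 phonemes of p
  // and the first j phonemes of q.
  let prev = Array.from({ length: q.length + 1 }, (_, j) => j);
  for (let i = 1; i <= p.length; i++) {
    const curr = [i];
    for (let j = 1; j <= q.length; j++) {
      const cost = p[i - 1] === q[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a phoneme from p
        curr[j - 1] + 1, // insert a phoneme from q
        prev[j - 1] + cost // substitute (or match)
      );
    }
    prev = curr;
  }
  return prev[q.length];
}

// Return every pair of words whose phoneme distance is below minDistance.
// The threshold is an assumption; tune it against real recognition results.
function findConfusablePairs(words, minDistance) {
  const pairs = [];
  for (let i = 0; i < words.length; i++) {
    for (let j = i + 1; j < words.length; j++) {
      const d = phonemeDistance(words[i][1], words[j][1]);
      if (d < minDistance) {
        pairs.push([words[i][0], words[j][0], d]);
      }
    }
  }
  return pairs;
}

console.log(findConfusablePairs(commandWords, 2));
```

With this list, CAT and HAT differ by a single phoneme and get flagged, while FEED and SLEEP are far enough apart to pass. Edit distance is a crude measure (it treats every phoneme swap as equally bad), so it is a first filter rather than a replacement for actually testing with the recognizer.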

I went through a whole game development iteration where I was going in one direction, found my words were too close together phonetically, and then needed to swap them out. Each time, I would wait for everything to load, try out some new words, and then go back to search for different combinations.

It is that loading, actually, that will be the death of most projects that might want to use PocketSphinx.js. The Emscripten-compiled file is rather large, at several megabytes. The project, when it works, is very impressive, but its size means a lot of downloading (and thus waiting) for clients. And for mobile projects, a download of several megabytes, with all the waiting and loading time that comes with it, is just a No Go.