Hacking Google Voice API in Linux

You have probably seen the voice-aware input fields that arrived with the new Google Chrome release about a month ago. Yeah, it’s a cool way to enter text without typing for long seconds, with the opportunity to get search results for “laughable clothes” when you say “fashionable clothes”. Seriously, I cannot see how this is useful, especially on desktop PCs.

But there’s a good guy on the internet who happily made good use of it. He wrote a shell script that listens to your voice and uses the Google Voice API to decode it and convert it to text. I will explain his hack here so you can all make good use of it.

First we need a URL for the API, so we define the API variable:

API="http://www.google.com/speech-api/v1/recognize?lang=en"

Note the lang parameter at the end of it. Our script becomes more flexible if it can handle multiple languages, so let’s put the language in a variable, or better, get it passed as an argument 🙂

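# Use the first argument as the language code, defaulting to English.
# The variable is named SPEECH_LANG because LANG is the shell's locale variable.
if [ -z "$1" ]; then
    echo "No language supplied, using en"
    SPEECH_LANG="en"
else
    echo "Using $1 as language"
    SPEECH_LANG="$1"
fi
API="http://www.google.com/speech-api/v1/recognize?lang=$SPEECH_LANG"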
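
The same default can also be written with shell parameter expansion. Here is a minimal sketch of that idea. Note that build_api_url is a made-up helper name for illustration, it is not part of the original script:

```shell
# build_api_url is a hypothetical helper, not part of the original script.
# It falls back to "en" when no language code is given.
build_api_url() {
    lang="${1:-en}"
    printf '%s\n' "http://www.google.com/speech-api/v1/recognize?lang=$lang"
}

# Pass the script's first argument through (empty when none was given).
API=$(build_api_url "${1:-}")
echo "API endpoint: $API"
```

The `${1:-en}` form means "use the first argument, or en if it is unset or empty", so the if/else goes away.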

Now we need to send a sound file containing our voice to this URL. It’s not quite that simple of course. We need:

  • arecord to record our voice over the mic
  • flac to convert the file format
  • wget to interact with the api
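
If you are not sure whether these tools are on your PATH, a small check like the following can tell you. This is only a sketch, and check_tools is a hypothetical helper name I made up for it:

```shell
# check_tools is a hypothetical helper: it reports every command
# from its argument list that cannot be found on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

In the script you could call it as `check_tools arecord flac wget || exit 1` before recording anything.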

Make sure these 3 packages are installed. If not, you can always use your package manager, like apt-get, to install them. We convert the file to FLAC because that is the format the API itself requires. Now let’s mix things together!

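# Record 3 seconds from the mic, encode the audio to out.flac, then POST that file to the API.
# Only wget writes to stdout here, so JSON ends up holding the API response.
JSON=$(arecord -f cd -t wav -d 3 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac;\
wget -O - -o /dev/null --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" "$API")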

As you can see, we’ve done well so far, and the script receives the response in JSON format, so we need to parse it using sed and awk. I already wrote an article about sed here, you may want to check it out. This may look freaky but it does the job:

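# Strip the braces, put each comma-separated field on its own line,
# then take the text after the second colon of the third field and drop the quotes.
UTTERANCE=$(echo "$JSON" \
  | sed -e 's/[{}]//g' \
  | awk '{ n = split($0, a, ","); for (i = 1; i <= n; i++) print a[i]; exit }' \
  | awk -F: 'NR==3 { print $3; exit }' \
  | sed -e 's/"//g')
echo "utterance: $UTTERANCE"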
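
The pipeline above depends on the position of the fields in the response. A slightly less fragile option is to pull the value out by its field name. The sketch below assumes the response contains a field of the form "utterance":"some text". Both the helper name extract_utterance and the SAMPLE string are made up for illustration, the sample is not captured API output:

```shell
# extract_utterance is a hypothetical helper. It assumes the response holds
# a field like "utterance":"some text" with no escaped quotes inside the text.
extract_utterance() {
    printf '%s\n' "$1" | sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p'
}

# SAMPLE is an assumed response shape, written by hand for this example.
SAMPLE='{"status":0,"id":"abc","hypotheses":[{"utterance":"list directory","confidence":0.9}]}'
extract_utterance "$SAMPLE"
```

When the field is missing, sed prints nothing, so an empty result can be treated as "nothing recognized".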

Yeah, now our script echoes the text! That seems pretty geeky, but how can this be useful? Controlling our PC maybe? Why not! To do that we must define strings to compare the final text against. If the text matches one of them, the script executes the corresponding command.

CMD_LIST_DIRECTORY="list directory"
CMD_WHOAMI="who am i"
# grep -i ignores case, -x requires the whole line to match, -q suppresses output.
if echo "$UTTERANCE" | grep -iqx "$CMD_LIST_DIRECTORY"; then
    ls .
elif echo "$UTTERANCE" | grep -iqx "$CMD_WHOAMI"; then
    whoami
fi

We can define any number of commands. I will be working on using arrays for this (maybe one of you can do it for us 🙂 ). You can find a complete script here if you are too lazy to save a new file :p
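
For the array idea, here is one possible sketch using two parallel bash arrays (so it needs bash, not plain sh). The names PHRASES, ACTIONS and run_voice_command are my own picks for illustration, they are not taken from the complete script:

```shell
#!/usr/bin/env bash
# Parallel arrays: PHRASES[i] is the spoken phrase, ACTIONS[i] the command to run.
PHRASES=("list directory" "who am i")
ACTIONS=("ls ." "whoami")

# run_voice_command is a hypothetical helper. It compares the utterance to each
# phrase, ignoring case, and runs the command stored at the same index.
run_voice_command() {
    local utterance i
    # Lowercase the input so the comparison is case-insensitive.
    utterance=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    for i in "${!PHRASES[@]}"; do
        if [ "$utterance" = "${PHRASES[$i]}" ]; then
            # Unquoted on purpose: "ls ." splits into the command and its argument.
            ${ACTIONS[$i]}
            return 0
        fi
    done
    echo "no command matches: $1" >&2
    return 1
}
```

With this in place the if/elif chain shrinks to `run_voice_command "$UTTERANCE"`, and adding a command means adding one entry to each array.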

Guess what, we just made good use of Google Voice API! I will leave you to test it, improve it and why not share it. Your comments are welcome.

8 thoughts on “Hacking Google Voice API in Linux”

  1. Nice little tutorial. If possible, how would you extend this so that the voice is being read at all times and can stream the output? Having to type the command is perhaps circumventable?
    Xarlos.

  2. I am looking for some help on implementing this code, I can pay for lessons on Skype etc please email me or leave response to contact
