Sphinx

Sphinx Description

From Sourceforge:

Sphinx is a speaker-independent large vocabulary continuous speech recognizer under Berkeley's style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition system.

Sphinx project pages

The CMU Sphinx Group Open Source Speech Recognition Engines
Sphinx Sourceforge Project
CMU Sphinx: Open Source Speech Recognition

Sphinx and Asterisk


A detailed example and code (link mirror) for a client/server approach. Built for use with Asterisk, but could work with any number of projects. This should get you up and running in no time!

There are also some discussions on the Asterisk-Users list, for example here.

Quote by Stephan A. Edelman (July 2006)

It is fairly easy to integrate Asterisk with Sphinx, the only trouble is that you
need to have an Acoustic Model (AM) for 8KHz, which are not (yet) readily available.

There is a Language Model (LM) and AM included with Sphinx, but it is designed for a sampling rate of 16KHz and
therefore does not work with Asterisk. Even if you use sophisticated upsampling techniques (sinc
interpolation - sox) to create 16KHz for use with Sphinx, the recognition rate is absolutely dismal.
Sphinx does a fine job on native 16KHz samples, just not samples created by upsampling.

CMU does have an 8KHz LM and AM available which they created for their Communicator project (phone based
airplane reservation system), but the AM is for Sphinx2, which has considerably poorer recognition
speed and accuracy compared to Sphinx3 or Sphinx4. I haven't had a chance to try and convert the AM to
Sphinx3, but I believe it can be done.


Contrary to the quote above, the LM is independent of the sampling frequency; only the AM depends on it. What is really needed is for someone to take the original telephony recordings, or any other speech corpus (e.g. http://www.voxforge.org/), downsample the recordings to 8 kHz if needed, and retrain the AM on that data. The old AM fails on 8 kHz audio because an AM looks for patterns in the frequency domain: a model trained on 16 kHz data expects patterns across the 0-8 kHz band, while a signal upsampled from 8 kHz to 16 kHz contains nothing above the 4 kHz boundary. No upsampling technique can create those missing patterns out of thin air.
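
The point about the empty 4-8 kHz band can be checked numerically. The following is a minimal sketch in Python (chosen only for illustration; it is not part of Sphinx or the tutorial). The helper names, the test tones, and the use of simple linear interpolation are all assumptions made for this example. It compares the share of signal power above 4 kHz in a native 16 kHz signal with the share in a signal upsampled from 8 kHz.

```python
import cmath
import math


def dft_power(samples):
    """Power of each DFT bin from 0 Hz up to the Nyquist frequency (naive O(n^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))) ** 2
        for k in range(n // 2 + 1)
    ]


def share_above(samples, rate, cutoff_hz):
    """Fraction of the total signal power that lies above cutoff_hz."""
    power = dft_power(samples)
    n = len(samples)
    high = sum(p for k, p in enumerate(power) if k * rate / n > cutoff_hz)
    return high / sum(power)


def tone(freq_hz, rate, count):
    """A sine wave of freq_hz sampled at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(count)]


def upsample_2x_linear(samples):
    """Double the sample rate by linear interpolation (wrapping at the end)."""
    out = []
    n = len(samples)
    for i, x in enumerate(samples):
        out.append(x)
        out.append((x + samples[(i + 1) % n]) / 2)
    return out


# Native 16 kHz signal: a 1 kHz tone plus a 6 kHz tone of equal amplitude.
native_16k = [a + b for a, b in zip(tone(1000, 16000, 128),
                                    tone(6000, 16000, 128))]

# The same source at 8 kHz: the 6 kHz component cannot be represented at all,
# so only the 1 kHz tone is left. Upsampling does not bring it back.
telephone_8k = tone(1000, 8000, 64)
upsampled_16k = upsample_2x_linear(telephone_8k)

print("native 16 kHz, power above 4 kHz:   ", share_above(native_16k, 16000, 4000))
print("upsampled 8 kHz, power above 4 kHz: ", share_above(upsampled_16k, 16000, 4000))
```

In the native signal half of the power sits above 4 kHz, while in the upsampled signal only a small interpolation residue does. An acoustic model trained on the first kind of input finds nothing to match in that band of the second.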


Example


The tutorial on the Asterisk/Sphinx integration above does not provide a complete working sample. The following AGI script, called from Asterisk, works with the rest of the tutorial as described:

- sphinx.agi



#!/usr/bin/perl
# sphinx.agi
# Copyright (c) 2005 Josh McAllister
#
# This program is free software; you can redistribute it and/or modify
# it under the same terms as Perl itself.
#
# Written by Josh McAllister

use Asterisk::AGI;
my $AGI = new Asterisk::AGI;
%input = $AGI->ReadParse();

sub asr {
use IO::Socket;
use FileHandle;
use IPC::Open2;
my $file = shift or return undef;
my $host = shift || 'localhost';
my $port = shift || '1069';
my $fh;

my $remote =  IO::Socket::INET->new(
		Proto    => "tcp",
		PeerAddr => "$host",
		PeerPort => "$port",
		) or return undef;

#Idea here being that you can pass a reference to an existing file handle... not yet implemented, just pass a filename.
if (ref $file) {
   $fh = $file;    # assign the outer $fh; a second "my" here would shadow it
} else {
   open ($fh, '<', $file) || return undef;
}

# Take the type from the file extension; a failed match leaves $type undefined.
my ($type) = $file =~ /\.(gsm|wav)$/;
unless (defined $type) {
   warn "Unknown file type ($file)";
   return undef;
}
#print "FTYPE: $type\n";
$pid = open2(*SOXIN, *SOXOUT, "sox -t $type - -s -r 8000 -w -t wav - 2>/dev/null") || warn ("Could not open2.\n");

binmode $fh;
binmode SOXIN;
binmode SOXOUT;
binmode $remote;

while (defined(my $b = read $fh, my($buf), 4096)) {
   last if $b == 0;
   $count += $b;
   print SOXOUT $buf;
}
close SOXOUT;

$count = 0;
my $sox = undef;
while (defined(my $b = read SOXIN, my($buf), 4096)) {
   last if $b == 0;
   $count += $b;
   $sox .= $buf;
}
print $remote length($sox) . "\n";
print $remote "$sox";
close SOXIN;

#print "DEBUG: Waiting for result.\n";
   
$count=0;
while (defined(my $b = read $remote, my($buf), 4096)) {
   last if $b == 0;
   $count += $b;
   $result .= $buf;
}

close $fh;
close $remote;

return "$result";
}

sub confirm {
      # Declare the counters once, outside the loop.
      my $tries = 0;
      my $gotresp = 0;
      while ($tries <= 3) {
         $tries++;

         $AGI->stream_file("vr/say_yes_no",'""');

         $AGI->stream_file("beep",'""');
         $AGI->record_file("/tmp/$$", 'gsm', '0',3000);
         $AGI->stream_file("beep",'""');


         #Here is where the magic happens: this calls the asr sub from sphinx-netclient.pl.
         #Again, that sub needs to be in this same script. $vresponse will contain the
         #transcription of what the caller said.

         my $vresponse = asr("/tmp/$$.gsm");
         $AGI->verbose("CONFIRM: $vresponse");

         next if $vresponse !~ /YES|NO|ACCEPT|CANCEL/i;

         $gotresp = 1;


         if ($vresponse =~ /NO|CANCEL/i) {
            sleep 1;
            $AGI->stream_file("cancelled",'""');
            return undef;
         } else {
	    $AGI->set_variable('RESPONSE', 'YES');
            return 1;
         }

      }

      if (! $gotresp) {
         sleep 1;
         $AGI->stream_file("invalid_selection",'""');
         return undef;
      }
}

$AGI->stream_file("vr/green_eggs_ham",'""');
unless ( confirm() ) {
   #They said no
   $AGI->set_variable('RESPONSE', 'NO');
   exit;
}
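
The decision logic inside confirm() can also be looked at in isolation. Below is a minimal Python sketch of it (Python is used only for illustration; the function name classify and its return values are assumptions, not part of the script above). It keeps the same precedence, with NO/CANCEL checked before YES/ACCEPT, but matches whole words only, so a transcription such as "I KNOW" is not mistaken for "NO".

```python
import re

# Whole-word, case-insensitive patterns for the two kinds of answer.
_NEGATIVE = re.compile(r"\b(NO|CANCEL)\b", re.IGNORECASE)
_POSITIVE = re.compile(r"\b(YES|ACCEPT)\b", re.IGNORECASE)


def classify(transcript):
    """Map a transcription to "no", "yes", or None when neither is heard."""
    if _NEGATIVE.search(transcript):
        return "no"       # a refusal wins if both kinds of word are present
    if _POSITIVE.search(transcript):
        return "yes"
    return None           # caller should be prompted again


for heard in ("YES", "no thanks", "I KNOW", ""):
    print(repr(heard), "->", classify(heard))
```

Whether whole-word matching is the right choice depends on the vocabulary in the language model in use; with a grammar limited to a handful of words the simpler substring match in the Perl script may be sufficient.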



Integrating Sphinx with phpAGI


If you have the server from the tutorial above up and running (it really is quite easy to do), you can connect to it from PHP with this function, which should be added to the phpAGI script:



    function sphinx($filename='', $timeout=3000, $service_port = 1069, $address = '127.0.0.1'){
		
		/* if a recording has not been passed in we create one */
		if ($filename=="") {
			$filename = "/var/lib/asterisk/sounds/sphinx_".$this->request['agi_uniqueid'];
			$extension = "wav";
			$this->stream_file('beep', 3000, 5);
			$this->record_file($filename, $extension, '0',$timeout);
			$filename=$filename.'.'.$extension;
		}	
			
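		/* Create a TCP/IP socket. socket_create() returns false on failure. */
		$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
		if ($socket === false) {
			return false;
		}
		
		/* socket_connect() also returns false on failure. */
		$result = socket_connect($socket, $address, $service_port);
		if ($result === false) {
		   socket_close($socket);
		   return false;
		}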
		
		//open the file and read in data
		$handle = fopen($filename, "rb");
		if ($handle === false) {
			socket_close($socket);
			return false;
		}
		$data = fread($handle, filesize($filename));
		fclose($handle);
		
		socket_write($socket, filesize($filename)."\n");
		socket_write($socket, $data);
		
		$response = socket_read($socket, 2048);
		
		socket_close($socket);
		
		unlink($filename);
		return $response;
   }
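
Both the Perl and the PHP client above send the same thing to the server: the length of the audio in bytes as a decimal number, a newline, and then the audio itself, after which they read the transcription back. The following Python sketch shows that framing as inferred from the two clients on this page; it is not taken from the server's documentation. The function names frame_request and recognize and the timeout parameter are assumptions made for this example; port 1069 is the default used by the clients above.

```python
import socket


def frame_request(audio: bytes) -> bytes:
    """Build one request: ASCII byte count, newline, then the raw audio."""
    return str(len(audio)).encode("ascii") + b"\n" + audio


def recognize(audio: bytes, host: str = "localhost", port: int = 1069,
              timeout: float = 10.0) -> str:
    """Send audio to the recognition server and return the text it replies with."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(frame_request(audio))
        chunks = []
        while True:
            # Like the Perl client, read until the server closes the connection.
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace").strip()


# Framing only; this line does not need a running server.
print(frame_request(b"RIFF...."))
```

With the server from the tutorial running, a call such as recognize(open("/tmp/sample.wav", "rb").read()) would return the transcription; the path here is only a placeholder.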



See also


Created by: liawagner, Last modification: Fri 26 of Nov, 2010 (15:02 UTC) by atheos