
Text to Speech without Internet


KeviNH


With a little effort and a Unix-like OS, I've come up with a simple solution for text-to-speech that doesn't rely on calling out to an Internet website every time.

 

You'll need the Networking Module and the following free software, installed on a Unix machine that is always up (I use a VM):

  • espeak
  • sox
  • mplayer (or madplayer, or any other MP3 player)
  • Perl (usually built in)
  • rtunes (only if you want to use Airplay; OpenBSD only)
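Before starting the daemon, it can help to confirm the tools are actually installed. A minimal sketch (not part of the original script) that looks each program up on PATH; the tool names are taken from the list above, so swap in madplayer or whatever player you use:

```perl
#!/usr/bin/perl
# Sketch: report which of the helper programs are on PATH.
# Tool names are assumptions taken from the list; adjust for your setup.
use strict;
use warnings;
use File::Spec;

# Return the full path of an executable found on PATH, or undef.
sub find_in_path {
    my ($prog) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $prog);
        # "_" reuses the stat result from the -f test.
        return $candidate if -f $candidate && -x _;
    }
    return;
}

for my $tool (qw(espeak sox mplayer perl)) {
    my $where = find_in_path($tool);
    printf "%-8s %s\n", $tool, defined $where ? $where : "NOT FOUND";
}
```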

 

The attached script uses the above tools to run a listener on UDP/15157 on your Unix box.
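To test the listener without the ISY, you can send it a line of text from any machine. This is a hypothetical helper, `say_text`, sketched with Perl's core IO::Socket::INET module; the host and port are whatever your daemon uses:

```perl
#!/usr/bin/perl
# Sketch of a client that sends one line of text to the daemon over UDP,
# roughly what a Networking UDP resource does. Host/port are assumptions.
use strict;
use warnings;
use IO::Socket::INET;

sub say_text {
    my ($host, $port, $text) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'udp',
    ) or die "Cannot create UDP socket: $!";
    # The daemon reads line by line, so make sure the text ends in a newline.
    $text .= "\n" unless $text =~ /\n\z/;
    my $sent = $sock->send($text);
    close($sock);
    return $sent;    # number of bytes sent, or undef on error
}
```

For example, `say_text("192.168.1.20", 15157, "The garage door is open")`, where 192.168.1.20 is a placeholder for your Unix box's address.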

 

With the UDI Networking module, you can create a UDP resource that uses the IP address of the Unix machine, port 15157, and plain text. Any text you type in the resource box will be spoken. Because the script caches the generated audio, there is a slight delay only the first time a new piece of text is played, while espeak converts it.
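The caching works by naming the audio files after an MD5 hash of the filtered text, so repeating the same text reuses the existing MP3. A minimal sketch of just that naming step, pulled out into hypothetical `sanitize` and `cache_names` helpers that mirror the daemon's regex:

```perl
#!/usr/bin/perl
# Sketch of the cache naming: identical (filtered) text always hashes
# to the same file names, so espeak only runs once per phrase.
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Same filter as the daemon: anything other than word characters,
# whitespace and . ? , < > / ; is replaced with a space.
sub sanitize {
    my ($text) = @_;
    $text =~ s/[^\w\s\.\?\,\<\>\/\;]/ /g;
    return $text;
}

# Return the text, WAV and MP3 file names for a piece of text.
sub cache_names {
    my ($text) = @_;
    my $key = md5_hex(sanitize($text));
    return ("$key.txt", "$key.wav", "$key.mp3");
}

my ($txt, $wav, $mp3) = cache_names("Front door opened\n");
print "$mp3\n";
```

Because disallowed punctuation becomes a space before hashing, `Hi!` and `Hi ` end up sharing one cache entry.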

 

The example script is a bit of a kludge: there's no security or access control, and it will eventually use all your disk space. With just a little more effort, the same pieces could be used in an SSL-enabled CGI script with authentication.
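One way to keep the spool from filling the disk is a periodic cleanup job. This hypothetical `prune_spool` helper is a sketch, assuming the file names the daemon creates (32 hex digits plus .txt, .wav or .mp3) and an age limit you pick yourself:

```perl
#!/usr/bin/perl
# Sketch: delete cached spool files older than $max_age_days.
# The file-name pattern and the age limit are assumptions.
use strict;
use warnings;

sub prune_spool {
    my ($dir, $max_age_days) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $removed = 0;
    for my $name (readdir $dh) {
        # Only touch files that look like the daemon's cache files.
        next unless $name =~ /\A[0-9a-f]{32}\.(?:txt|wav|mp3)\z/;
        my $path = "$dir/$name";
        # -M is the age in days since last modification, relative to
        # the time this script started.
        if (-f $path && -M _ > $max_age_days) {
            $removed++ if unlink $path;
        }
    }
    closedir $dh;
    return $removed;
}
```

Calling something like `prune_spool("$ENV{HOME}/spool", 30)` from a nightly cron job would do. It ages files by modification time, so even frequently used phrases are eventually deleted; espeak simply regenerates them the next time they are spoken.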


Can't attach the file; all file extensions are denied.

 

Note: When creating a "Resource" in Networking, make sure your string of text ends with a carriage return (hit "Enter" at the end of the line). Without the newline, the listener won't recognize that there is data to process. I could fix this by using a low-level read() call in the loop, maybe in the next version.
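A sketch of that low-level read, using Perl's recv() on the same kind of socket the daemon creates. Each call returns one whole datagram, so a trailing newline is no longer required; the helper names and the 65535-byte buffer size are assumptions:

```perl
#!/usr/bin/perl
# Sketch: read whole UDP datagrams with recv() instead of line-based <UDP>.
use strict;
use warnings;
use Socket;

sub open_listener {
    my ($port) = @_;
    socket(my $sock, PF_INET, SOCK_DGRAM, getprotobyname("udp"))
        or die "socket: $!";
    bind($sock, sockaddr_in($port, INADDR_ANY)) or die "bind: $!";
    return $sock;
}

# Block until one datagram arrives; return its text with any trailing
# CR/LF characters removed.
sub next_message {
    my ($sock) = @_;
    my $buf = '';
    defined(recv($sock, $buf, 65535, 0)) or die "recv: $!";
    $buf =~ s/[\r\n]+\z//;
    return $buf;
}
```

In the daemon, the line-based `<UDP>` loop would become something like `my $sock = open_listener($PORT); while (1) { my $string = next_message($sock); ... }`.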

 

It's a really simple daemon. The toughest part was the Airplay connectivity, which I think might work only on OpenBSD.

 

There is no authentication around the daemon at all, but aside from making your speakers spew profanity, I don't anticipate any obvious security risks as long as you run the script as an unprivileged user. It only needs to be able to play audio, and when playing via Airplay over the network, not even that.
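If you want at least some access control, a simple option is to accept text only from known addresses, such as your ISY's. This is a sketch with a placeholder IP; it needs the sender address that recv() returns, which the daemon's line-based `<UDP>` read does not provide:

```perl
#!/usr/bin/perl
# Sketch: only accept datagrams from allowlisted source addresses.
# 192.168.1.50 is a placeholder. UDP source addresses can be spoofed,
# so this only keeps out casual senders, it is not real authentication.
use strict;
use warnings;
use Socket;

my %ALLOWED = map { $_ => 1 } qw(192.168.1.50);

# recv() returns the packed sender address; decode it and check the list.
sub sender_allowed {
    my ($packed_peer) = @_;
    my (undef, $ip) = sockaddr_in($packed_peer);
    return exists $ALLOWED{ inet_ntoa($ip) };
}
```

In a recv()-based loop this would look like `my $peer = recv($sock, $buf, 65535, 0); next unless sender_allowed($peer);`.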

 

#! /usr/bin/perl
#
# Simple text to speech network daemon
#
# Because saying a large amount of text can take quite some time,
# We use UDP so the sender doesn't need to wait for us to finish.

$PORT=15157;    # We listen on this UDP port.  Change this
$AIRPLAY="10.3.3.5"; # Change to IP address of your airplay device

use Socket;
use Digest::MD5 qw(md5 md5_hex md5_base64);

# Feel free to play with the voice.
$espeak="espeak -ven-us+m2";

# Can replace this with any MP3 player, any output device.
$mplayer="mplayer -really-quiet -ao rtunes:device=$AIRPLAY:af=inet";

# I use sox to pad with silence before/after speech
# and because it will upconvert the espeak WAV to airplay-compatible audio.
#
$sox="sox";

# If you're using Airplay via rtunes, don't change the channels, rate, or padding.
$soxformat="channels 2 rate 44100 pad 1 3";

$spool=$ENV{HOME}."/spool/";
mkdir($spool);
chdir($spool) || die "I need spool directory $spool - $!";


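# Network listener
socket(UDP, PF_INET, SOCK_DGRAM, getprotobyname("udp")) || die "socket: $!";
bind(UDP, sockaddr_in($PORT, INADDR_ANY)) || die "bind to UDP port $PORT: $!";

# Loop forever, reading one newline-terminated line of text per iteration
while ($string=<UDP>) {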
       $string=~s/[^\w\s\.\?\,\<\>\/\;]/ /g;
       $short=md5_hex($string);

       $wav=$short. ".wav";
       $mp3=$short. ".mp3";

       unless(-s $wav || -s $mp3) {
               open(SHORT,">$short.txt") || die $!;
               print SHORT $string,"\n";
               close(SHORT);

               system("$espeak -f $short.txt -w $wav");
               }

       # Change this call if you want to pad with additional sound effects,
       # For example, Sox can prepend a tone WAV file before the speech.
       system("$sox $wav $mp3 $soxformat") unless(-s $mp3);

       # This doesn't have to be mplayer, any player should work.
       system("$mplayer $mp3");

       unlink($wav);
       }
exit(0);

