RC SYSTEMS
RC8650 VOICE SYNTHESIZER
DoubleTalk RC8650
CMOS, 3.3 Volt / 5 Volt
Voice Synthesizer Chipset
FEATURES
• Integrated text-to-speech processor:
– High voice quality, unlimited vocabulary
– Converts any ASCII text into speech automatically
– Add/modify messages by simply editing a text file
– On-the-fly control of speed, pitch, volume, etc.
• Playback of sound files:
– Real-time PCM and ADPCM
– Prerecorded on chip, up to 15 minutes
• Tone generation:
– Three-voice musical
– Dual sinusoidal
– DTMF (Touch-Tone) dialer
• On-chip A/D converter:
– Four channels, 8-bit resolution
– One-shot, continuous, single sweep, and continuous sweep modes of operation
– Software and hardware triggering
– Support for external op amp
• Analog and digital audio outputs
• Serial and bus interfaces
• User programmable greeting and default settings
• Flexible user exception dictionary
• In-circuit, field programmable
• 2 KB input buffer for virtually no-overhead operation
• Available in 3.3 V and 5 V versions
• Low power (typ @ 3.3 V):
– 11 mA active
– 0.7 mA idle
– 2 µA standby
GENERAL DESCRIPTION
The RC8650 integrates a text-to-speech (TTS) processor, real-time and prerecorded audio playback, multiple tone generators, and a telephone dialer into an easy-to-use chipset. The integrated text-to-speech processor uses RC Systems’ DoubleTalk™ TTS technology, which is based on a patented voice concatenation technique using real human voice samples. The DoubleTalk TTS processor also gives the user unprecedented real-time control of the speech signal, including pitch, volume, tone, speed, expression, and articulation.
Using a standard serial or bus interface, any ASCII text can be streamed to the RC8650 for automatic conversion into speech. Real-time and prerecorded audio playback modes augment the TTS capabilities for applications requiring very high voice quality and a relatively small, fixed vocabulary, or applications requiring special sounds or sound effects. Integrated musical and sinusoidal tone generators, a Touch-Tone dialer, and a four-channel A/D converter further enhance the RC8650’s attractiveness by providing these often-needed functions on chip. The audio output is delivered in both analog and digital PCM formats: the analog output can drive a speaker, and the digital output can feed a digital audio stream.
The RC8650 includes integrated nonvolatile memory for the storage and on-demand playback of up to 15 minutes of prerecorded speech and sounds. Additional on-chip memory enables the user to store a power-on “greeting” message that is automatically played whenever the chipset is powered up, as well as configure the chip’s default settings. A special memory area is also provided for storing a custom pronunciation dictionary, allowing the pronunciation of virtually any character string to be redefined. All of these features can be programmed and updated by the user via the integrated serial port, even in the field after the RC8650 has been integrated into the end product.
The RC8650 chipset consists of two surface-mount devices. Both operate from a +3.3 V or +5 V supply and consume very little power. Most applications require only the addition of a low-pass filter/audio power amplifier to implement a fully functional system.
APPLICATIONS
• Robotics
• Talking OCR systems
• Talking pagers and PDAs
• GPS navigation systems
• Vending, ticketing and ATM machines
• Remote diagnostic reporting
• Dial-up information systems
• Handheld barcode readers
• Electronic test and measurement
• Security systems
• Aids for the orally or visually disabled
• Meeting federal ADA requirements
DoubleTalk RC8650 User’s Manual Rev F1
Revised 2/18/02
© 1999-2001 RC Systems, Incorporated
SECTION 1: SPECIFICATIONS
PINOUTS
Figure 1.1. Pin Assignments (top views: RC8650FP, 100-lead QFP, 14 mm x 20 mm; RC46xxFP, 48-lead TSOP, 12 mm x 20 mm)
PIN DESCRIPTIONS
Table 1.1. Pin Descriptions
Pin Name
Type
Name and Function
IC0–IC32
INPUT/
OUTPUT
CHIPSET INTERCONNECTS: Interconnections between the RC8650 and RC46xx chips. IC0 connects
to IC0, IC1 to IC1, etc. IC30–IC32 must have a 47 kΩ pullup resistor to VCC. No other connections should
be made to these pins.
AO0
AO1
OUTPUT
ANALOG OUTPUT: Channel 0 and 1 digital-to-analog (D/A) converter outputs. The output voltage ranges from 0 V to AVREF and rests at AVREF/2 when idle. Single channel systems must use AO0.
TS0
TS1
OUTPUT
TALK STATUS: Indicates whether a voice channel is active. TSn can be used to enable external
devices such as a transmitter, telephone, or audio amplifier. The polarity of these pins is programmable, and they
can be activated automatically or under program control. Single channel systems must use TS0.
SUSP0#
SUSP1#
INPUT
SUSPEND: Suspends audio output when Low. These pins affect only the corresponding AO pin; they do
not affect the digital audio output DAOUT pin (use DARTS# to control DAOUT). Single channel systems
must use SUSP0#. Connect these pins to a High level if not used.
AS0
AS1
OUTPUT
AUDIO SYNC: Outputs a clock signal in synchronization with the updating of analog outputs AO0 and
AO1. The pin changes state whenever the corresponding D/A converter is updated. Single channel
systems must use AS0.
DAOUT
OUTPUT
DIGITAL AUDIO OUTPUT: Provides the same 8-bit digital audio stream that is fed to the internal D/A
converters. This pin can be programmed to be a CMOS or open-drain output. The communication
protocol is programmable, and can operate in synchronous or asynchronous mode.
DACLK
INPUT
DIGITAL AUDIO CLOCK: This pin is used to clock data out of the DAOUT pin and data into the DAIN pin
in the synchronous digital audio output mode. DACLK can be programmed to transfer data on either the
rising edge or falling edge of the clock. Connect this pin to a High level if not used.
DAIN
INPUT
DIGITAL AUDIO CONTROL INPUT: This pin is used to control the operation of the DAOUT pin in a
multi-channel system. Reserved for a future product; connect this pin to a High level.
DARTS#
INPUT
DIGITAL AUDIO REQUEST TO SEND: A Low on this pin enables transmission from the DAOUT pin; a
High suspends transmission. DARTS# may be used in both the synchronous and asynchronous transfer
modes. Connect this pin to a Low level if not used.
PIO0–PIO7
INPUT/
OUTPUT
PERIPHERAL INPUT/OUTPUT BUS: Eight bit bidirectional peripheral bus. Data is input from a
peripheral when PRD# is active. Status information is output when STS# is active. PIO0–PIO7 also
connect to the RC46xx chip. Text, data and commands can be sent to the RC8650 over this bus.
STS#
OUTPUT
STATUS: Controls the transfer of status information from the RC8650 to a peripheral. Status information is
driven on the PIO0–PIO7 pins when STS# is Low. STS# is active only when there is new status information.
PRD#
OUTPUT
PERIPHERAL READ: Controls the transfer of data from a peripheral to the RC8650. Data is read from
the PIO0–PIO7 pins when PRD# is Low. If a connection is made to PRD#, it must also have a 47 kΩ
pullup resistor to VCC.
PWR#
INPUT
PERIPHERAL WRITE: Controls the writing of peripheral data to the RC8650. Data on the PIO0–PIO7
pins is read by the RC8650 on the rising edge of PWR#. Sufficient time must be given for the RC8650 to
process the data before writing additional data—RDY# (or Status Register bit SR.4) should be used for
this purpose. Connect this pin to a High level if not used.
RDY#
OUTPUT
READY: RDY# High indicates that the RC8650 is busy processing the last byte that was written over the
Peripheral I/O Bus. Wait for RDY# to be Low before attempting to write more data. RDY# goes High
briefly after each write operation over the PIO0–PIO7 bus, acknowledging receipt of each byte. If the
RC8650’s input buffer becomes full as a result of the last write operation, RDY# will remain High until
room becomes available. Note that RDY# can also be read from Status Register bit SR.4.
Table 1.1. Pin Descriptions (Continued)
Pin Name
Type
Name and Function
AN0–AN3
INPUT
A/D CONVERTER INPUTS: Analog-to-digital converter input pins. Leave any unused pins unconnected.
ADTRG
INPUT
A/D CONVERTER TRIGGER: Starts A/D conversion when hardware triggering is selected. Minimum
Low pulse width is 200 ns. Leave this pin unconnected if not used.
AMPIN
INPUT
AMPOUT
OUTPUT
A/D CONVERTER AMPLIFIER: Connecting an operational amplifier between these pins allows the input
voltage to all four A/D converter input pins to be amplified with one operational amplifier. Leave these pins
unconnected if not used.
RXD
INPUT
RECEIVE DATA: Asynchronous serial data input used to read text, data and commands into the RC8650.
Connect this pin to a High level if not used.
TXD
OUTPUT
TRANSMIT DATA: Asynchronous serial data output used to read information out of the RC8650.
CTS#
OUTPUT
CLEAR TO SEND: The CTS# pin is Low when the RC8650 is able to accept data. CTS# acknowledges
each byte received on the RXD pin by going High briefly. If the RC8650’s input buffer becomes full as a
result of the last byte received, CTS# will remain High until room becomes available.
BRD
INPUT
BAUD RATE DETECT: BRD is used by the RC8650 to sample the host’s serial data stream in order to
determine its baud rate. BRD is normally connected to the RXD pin. The BRS0–BRS2 pins affect the
operation of BRD. Connect this pin to a High level if not used.
BRS0–
BRS2
INPUT
BAUD RATE SELECT: Programs the asynchronous serial port’s baud rate. Both the RXD and TXD pins
are programmed to the baud rate set by these pins. Setting BRS0–BRS2 to a High level will allow the
RC8650 to automatically detect the baud rate with the BRD pin. Connect to a High level if not used.
STBY#
INPUT
STANDBY/INIT: Dual function pin which either puts the RC8650 in standby mode or initializes its internal
parameter memory. STBY# must be High on the rising edge of RESET#.
Driving STBY# Low for 250 ms or longer causes the RC8650 to enter Standby mode. All peripheral
and serial port handshake lines are driven to their false (“not ready”) states, and the input buffer is
cleared. During standby, the RC8650 draws the minimum possible current (2 µA typ), but it is not able to
respond to any input pin except STBY# and RESET#. Returning STBY# High causes the RC8650 to
enter Idle mode (1 mA typ); the handshake lines are re-asserted and the RC8650 will be able to accept
input again. If the RC8650 entered standby due to a Sleep Timer event, driving STBY# Low for 250 ns or
longer then High will return the RC8650 to Idle mode.
Driving STBY# Low for less than 250 ms initializes the RC8650’s non-volatile parameter memory.
The greeting message and user dictionary are erased, and all voice parameters are restored to their
factory default settings. The prerecorded audio memory is not affected. The RC8650 then announces its
version number via the AO0 pin.
Connect this pin to a High level if not used.
SEL1–
SEL5
INPUT
SELECT: Programs the channel pair that the RC8650 is to respond to in a multi-channel system. These
pins are reserved for a future product; connect SEL1–SEL5 to a Low level to ensure upward compatibility.
RESET#
INPUT
RESET: A Low immediately terminates all activity and sets all pins to a known state. RESET# must be
held Low a minimum of 3 µs after VCC has stabilized in the proper voltage range. All pins will be valid
within 2 ms after reset.
ACLR#
INPUT
ANALOG CLEAR: A Low initializes the D/A and A/D converters within the RC8650. Connect ACLR# to
RESET#.
Table 1.1. Pin Descriptions (Continued)
Pin Name
Type
Name and Function
XIN
INPUT
XOUT
OUTPUT
CLOCK INPUT/OUTPUT: These pins connect to the internal clock generating circuit. All timing for the
RC8650 and RC46xx chips is derived from this circuit. Connect a 7.3728 MHz crystal between XIN
and XOUT. Alternatively, an external 7.3728 MHz square wave may be applied to XIN, with XOUT left unconnected.
VCC
POWER: Power supply connection, +5 V ±10% or +3.3 V ±0.3 V depending on the version.
VSS
GROUND: Connect these pins to system ground.
AVCC
ANALOG POWER: Power supply input for the D/A and A/D converters. Connect this pin to VCC.
AVSS
ANALOG GROUND: Ground input for the D/A and A/D converters. Connect this pin to VSS.
AVREF
ANALOG REFERENCE VOLTAGE: Reference voltage for the D/A and A/D converters. Connect this
pin to VCC. Caution: any noise present on this pin will appear on the AO output pins.
NC
NO CONNECT: NC pins must remain unconnected. Connection of NC pins may result in component
failure or incompatibility with future product enhancements.
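The dual role of STBY# is easy to get wrong: a standby request that is released too early erases the greeting message and user dictionary instead. The C sketch below encodes the pulse-length rules from the STBY# description as a pure function that a host driver could use to sanity-check its own timing. The names `stby_effect` and `stby_pulse_effect` are hypothetical, and treating a pulse shorter than 250 ns during Sleep Timer standby as having no effect is an assumption, since the table does not say what such a pulse does.

```c
#include <stdint.h>

/* Possible outcomes of driving STBY# Low and then High again (Table 1.1). */
typedef enum {
    STBY_NO_EFFECT,    /* assumed: too short to wake from Sleep Timer standby */
    STBY_WAKE,         /* Sleep Timer standby -> Idle mode */
    STBY_INIT_PARAMS,  /* erase greeting and dictionary, restore factory defaults */
    STBY_STANDBY       /* Standby while Low; Idle after STBY# returns High */
} stby_effect;

#define STBY_STANDBY_MIN_NS 250000000ull  /* 250 ms */
#define STBY_WAKE_MIN_NS    250ull        /* 250 ns */

/* Effect of holding STBY# Low for low_ns nanoseconds, then releasing it.
   in_sleep_timer_standby is nonzero if the chip entered standby on its own
   via the Sleep Timer. */
stby_effect stby_pulse_effect(uint64_t low_ns, int in_sleep_timer_standby) {
    if (in_sleep_timer_standby)
        return low_ns >= STBY_WAKE_MIN_NS ? STBY_WAKE : STBY_NO_EFFECT;
    return low_ns >= STBY_STANDBY_MIN_NS ? STBY_STANDBY : STBY_INIT_PARAMS;
}
```

A host that wants Standby would typically hold STBY# Low with some margin above 250 ms; the size of that margin is a design choice rather than a datasheet value. STBY# must also be High on the rising edge of RESET#.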
BLOCK DIAGRAM
Figure 1.2. Functional Block Diagram (RC8650/RC46xx chipset blocks: bus interface; asynchronous serial interface; DoubleTalk text-to-speech processor; tone generators: musical, sinusoidal, Touch-Tone; clock generator; analog audio interface; digital audio interface; channel decoder; A/D converter; rewritable nonvolatile memory for the greeting message/default settings (234 bytes), exception dictionary (16 KB), and recorded audio (0 / 130 / 390 / 910 s max))
FUNCTIONAL DESCRIPTION
The RC8650 chipset includes a number of features that make it well suited to any design requiring voice output. The RC8650’s major features are described below.
Text-to-Speech Synthesizer
The RC8650 provides text-to-speech conversion with its integrated DoubleTalk text-to-speech synthesizer. Any English text written to the RC8650 is automatically converted into speech. Commands can be embedded in the input stream to dynamically control the voice, even at the phoneme level (phonemes are the basic sound units of speech).
A greeting message can be stored in the RC8650, which is automatically spoken immediately after the RC8650 is reset. Any of the commands recognized by the RC8650 may be included in the greeting message, which can be used to set up custom default settings and/or play back a prerecorded message or tone sequence. An integrated nonvolatile memory area is also provided for storing a custom pronunciation dictionary, allowing the pronunciation of any character string to be redefined.
Musical Tone Generator
An integrated, three-voice musical tone generator can generate up to three tones simultaneously over a four-octave range. Anything from simple tones to attention-getting sounds can be created easily.
Touch-Tone Generator
The RC8650 includes an integrated DTMF (Touch-Tone) generator. This is useful in telephony applications where standard DTMF tones are used to signal a remote receiver or modem, or to access the public switched telephone network.
Sinusoidal Tone Generator
A precision, dual sinusoidal tone generator can synthesize the tones often used in signaling applications. The tone frequencies can be set independently, allowing signals such as call-progress tones to be generated.
Recorded Audio Playback
Up to 15 minutes of prerecorded speech and sound effects can be stored in the RC8650 for later playback. Additionally, the RC8650 can play back eight-bit PCM and ADPCM audio in real time, such as speech and/or sound effects stored in an external memory or file system.
Versatile I/O
All data is sent to the RC8650 through its built-in serial and/or parallel ports. For maximum flexibility, including in-field product upgrade/update capability, use of the serial port is recommended whenever possible.
The RC8650’s audio output is available in both analog and digital formats. The analog output should be used in applications where no further processing of the audio signal is required, such as driving a speaker or headphones (the output still needs to be filtered and amplified, however). The digital output is for applications that require further processing of the audio signal, such as digital mixing or creating sound files for later playback.
RECOMMENDED CONNECTIONS
Power/Ground
Power and ground connections are made to multiple VCC and VSS pins of the RC8650 and RC46xx chips. Every VCC pin must be connected to power, and every VSS pin must be connected to ground. Decoupling capacitors should be placed as close as possible to both chips. In particular, make sure adequate decoupling is placed on the AVCC and AVREF pins, as noise present on these pins will also appear on the AO output pins.
Connect any unused input pins to an appropriate signal level. Leave any unused output pins and all NC pins unconnected.
Chip Interconnects
Pins IC0 through IC32 and PIO0 through PIO7 must be connected between the RC8650 and RC46xx chips. IC30, IC31, and IC32 must have a 47 kΩ pullup resistor to VCC.
Clock Generator
The RC8650 has an internal oscillator and clock generator that can be controlled by either an external 7.3728 MHz crystal or an external 7.3728 MHz clock source. Because the serial port baud rate is derived from the clock generator, ceramic resonators are not recommended due to their relatively wide frequency tolerances. If an external clock is used, connect it to the XIN pin and leave XOUT unconnected. See Figure 1.3 for recommended clock connections.
Figure 1.3. Clock Connections (crystal: 7.3728 MHz crystal between XIN and XOUT with two 22 pF capacitors; external clock: applied to XIN, XOUT not connected)
INTERFACING THE RC8650
The RC8650 contains both asynchronous serial and eight-bit bus interfaces. All text, commands, tone generator data, real-time audio data, etc., are transmitted to the RC8650 via one of these ports. For maximum flexibility, use of the serial port is recommended whenever possible. Not all RC8650 functions are supported through the bus interface. In particular, index markers, operating system updates, chipset identification, current operating settings, A/D conversion, and prerecorded audio downloads are supported only through the serial interface.
Table 1.2. Default Baud Rate Options
BRS2 | BRS1 | BRS0 | Baud Rate
L | L | L | 300
L | L | H | 600
L | H | L | 1200
L | H | H | 2400
H | L | L | 4800
H | L | H | 9600
H | H | L | 19200
H | H | H | Auto-detect
Serial Interface
The serial port operates with 8 data bits, 1 or more stop bits, no
parity, and any standard baud rate between 300 and 115200 bps.
The automatic baud rate detection mechanism is enabled when
the BRS0–BRS2 pins are all at a High logic level and the BRD pin
is connected to RXD. The baud rate is determined by the shortest
High or Low period detected in the input stream. This period is
assumed to be one bit time (the bit period) of the incoming data.
A typical RS-232C interface is shown in Figure 1.4. Note that the
MAX232A transceiver is not required if the host system’s serial
port operates at 0/+5 V logic levels (which most microprocessors
and microcontrollers do). The RC8650’s serial port may be connected directly to the host system in this case.
In order for the RC8650 to determine the incoming baud rate,
there must be at least one isolated “1” or “0” in the input character.
The CR character, 0Dh, is recommended for locking the baud
rate. The character is not otherwise processed by the RC8650; it is
discarded.
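A minimal C sketch of the auto-detect idea, assuming 8N1 framing and a ±3% acceptance window chosen here purely for illustration (the datasheet does not state the RC8650's validity criterion or its exact list of standard rates, so `kStdRates` is an assumed list). `shortest_run_8n1()` shows why the choice of character matters: CR (0Dh) contains a one-bit run, so its shortest period equals one bit time, whereas 00h would make the shortest period nine bit times and yield a rate far too low. Counting the start bit as part of the character is itself an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed list of "standard" rates in the 300 to 115200 bps range. */
static const uint32_t kStdRates[] = {300, 600, 1200, 2400, 4800, 9600,
                                     14400, 19200, 28800, 38400, 57600, 115200};

/* Shortest run of equal levels, in bit times, in an 8N1 frame (start bit 0,
   data LSB first, stop bit 1). The final High run merges with the idle line
   and is not counted. A result of 1 means the frame has an isolated bit. */
int shortest_run_8n1(uint8_t ch) {
    int bits[10];
    bits[0] = 0;                                   /* start bit */
    for (int i = 0; i < 8; i++) bits[i + 1] = (ch >> i) & 1;
    bits[9] = 1;                                   /* stop bit */
    int shortest = 10, run = 1;
    for (int i = 1; i < 10; i++) {
        if (bits[i] == bits[i - 1]) { run++; continue; }
        if (run < shortest) shortest = run;        /* a complete run just ended */
        run = 1;
    }
    return shortest;
}

/* Treat the shortest measured High or Low period as one bit time and snap
   the result to a standard rate within tol_pct percent; 0 means invalid. */
uint32_t estimate_baud(double shortest_period_us, double tol_pct) {
    if (shortest_period_us <= 0.0) return 0;
    double measured = 1e6 / shortest_period_us;    /* bits per second */
    for (size_t i = 0; i < sizeof kStdRates / sizeof kStdRates[0]; i++) {
        double err = (measured - kStdRates[i]) * 100.0 / kStdRates[i];
        if (err < 0) err = -err;
        if (err <= tol_pct) return kStdRates[i];
    }
    return 0;
}
```

For example, a shortest period of 104.17 µs snaps to 9600 bps, while nine times that period (what 00h would produce at 9600 bps) matches no standard rate within the window.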
The CTS# pin should be used to control the flow of serial data to
the RC8650. It is not necessary to check CTS# before transmitting
every byte, however. All data is routed through a high-speed 16-byte
buffer within the RC8650 before being stored in the primary
buffer. CTS# may be checked every eight bytes with no risk of
data loss.
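The flow-control rule can be sketched in C as follows. The `rc8650_uart` hooks are hypothetical stand-ins for whatever UART and GPIO access the host provides; the sketch waits for CTS# Low before each group of eight bytes, relying on the 16-byte front-end buffer described above. A small simulated port is included so the routine can be desk-checked without hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host hooks; names are illustrative, not an RC Systems API. */
typedef struct {
    bool (*cts_low)(void *ctx);             /* true while CTS# is Low (able to accept data) */
    void (*tx_byte)(void *ctx, uint8_t b);  /* send one byte to the RC8650's RXD pin */
    void *ctx;
} rc8650_uart;

/* Per the text, checking CTS# every eight bytes risks no data loss. */
#define RC8650_CTS_CHECK_INTERVAL 8u

void rc8650_serial_send(const rc8650_uart *u, const uint8_t *buf, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i % RC8650_CTS_CHECK_INTERVAL == 0) {
            /* Block until CTS# is Low before each group of up to 8 bytes. */
            while (!u->cts_low(u->ctx)) { }
        }
        u->tx_byte(u->ctx, buf[i]);
    }
}

/* Simulated port for desk-checking: CTS# reads High for the first
   high_polls polls, then Low; polls and bytes sent are counted. */
typedef struct { int high_polls, polls, sent; } sim_uart;

static bool sim_cts_low(void *c) {
    sim_uart *s = c;
    s->polls++;
    return s->polls > s->high_polls;
}

static void sim_tx(void *c, uint8_t b) {
    (void)b;
    ((sim_uart *)c)->sent++;
}
```

Sending 20 bytes this way polls CTS# only before bytes 0, 8, and 16, instead of before every byte.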
If the measured bit period is determined to be a valid baud rate,
the RC8650 acknowledges lock acquisition by transmitting the
ASCII character “l” (6Ch) on the TXD pin.
Baud rate selection
The serial port’s baud rate can be programmed using any of three
methods: pin strapping, auto-detect, and by command. Pin
strapping sets the baud rate according to the logic levels
present on the BRS0–BRS2 pins, as shown in Table 1.2. Auto-detect enables the serial port to automatically detect the baud
rate of the incoming data. The baud rate command (described
in Section 2) allows the baud rate to be changed at any time,
effectively overriding the first two methods. Note that pin strapping
cannot be used to program baud rates higher than 19200; to do
this, auto-detection or the baud rate command must be used.
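Table 1.2's pin strapping follows a simple pattern: reading BRS2:BRS1:BRS0 as a 3-bit number n (H = 1), codes 0 through 6 select 300 × 2^n bps and code 7 selects auto-detect. A small C helper, with hypothetical names, that decodes the strapping and reports when a rate cannot be strapped:

```c
#include <stdint.h>

/* Table 1.2, indexed by BRS2:BRS1:BRS0 (H = 1). 0 marks auto-detect. */
static const uint32_t kStrapBaud[8] = {300, 600, 1200, 2400, 4800, 9600, 19200, 0};

/* Baud rate selected by the BRS pins (each argument 0 = Low, 1 = High);
   returns 0 when all three are High (auto-detect). */
uint32_t rc8650_strap_baud(int brs2, int brs1, int brs0) {
    unsigned idx = ((unsigned)(brs2 & 1) << 2) |
                   ((unsigned)(brs1 & 1) << 1) |
                   (unsigned)(brs0 & 1);
    return kStrapBaud[idx];
}

/* Pin pattern (BRS2:BRS1:BRS0 as bits 2..0) that straps `baud`, or -1 if
   the rate cannot be strapped, e.g. anything above 19200; auto-detect or
   the baud rate command is needed in that case. */
int rc8650_strap_pins_for(uint32_t baud) {
    for (int idx = 0; idx < 7; idx++)
        if (kStrapBaud[idx] == baud) return idx;
    return -1;
}
```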
Figure 1.5. Baud Rate Detection Timing (start bit of the lock-on character on RXD; CTS# High during baud rate validation, ≈75 ms; 6Ch then transmitted on TXD)
Figure 1.4. RS-232C Interface (RC8650 RXD, BRD, TXD, and CTS# connected through a MAX232A transceiver with 0.1 µF capacitors to a DB9 RS-232C serial port; use a straight-through cable)
Note: The measurement cycle ends when there have been no
High-to-Low or Low-to-High transitions on the BRD pin for at least
75 ms. Consequently, the RC8650 will ignore any data sent to it for
a period of 75 ms after the “lock-on” character has been received.
The CTS# pin is driven High during this time, and the acknowledgment character is not transmitted until the RC8650 is actually
ready to accept data. See Figure 1.5.
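Putting the lock-on sequence together, a host might send CR and then wait for the acknowledgment character rather than a fixed delay, since validation ends only after the line has been quiet for about 75 ms. This is a sketch only: the `host_uart` hooks are hypothetical, the timeout and retry count are the caller's choice, and accepting "L" as well as "l" is an assumption based on the statement that the status character depends on the VC bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host UART hooks (illustrative names, not an RC Systems API). */
typedef struct {
    void (*tx_byte)(void *ctx, uint8_t b);
    int  (*rx_byte)(void *ctx, uint32_t timeout_ms);  /* byte, or -1 on timeout */
    void *ctx;
} host_uart;

#define RC8650_LOCK_CHAR 0x0Du  /* CR: recommended lock-on character, discarded by the RC8650 */
#define RC8650_LOCK_ACK  0x6Cu  /* 'l', the acknowledgment cited in the text */

/* Send the lock-on character and wait for the acknowledgment before sending
   anything else, because input is ignored for ~75 ms after the lock-on
   character. timeout_ms should comfortably exceed 75 ms. */
bool rc8650_lock_baud(const host_uart *u, uint32_t timeout_ms, int attempts) {
    for (int a = 0; a < attempts; a++) {
        u->tx_byte(u->ctx, RC8650_LOCK_CHAR);
        /* Skip a bounded number of unrelated bytes (e.g. other status messages). */
        for (int k = 0; k < 16; k++) {
            int c = u->rx_byte(u->ctx, timeout_ms);
            if (c < 0) break;  /* timed out: send CR again */
            if (c == (int)RC8650_LOCK_ACK || c == 'L') return true;  /* 'L' is assumed */
        }
    }
    return false;
}

/* Simulated chip for desk-checking: acknowledges once lock_on_cr CRs arrived. */
typedef struct { int crs, lock_on_cr; bool acked; } sim_chip;

static void sim_tx(void *c, uint8_t b) {
    sim_chip *s = c;
    if (b == RC8650_LOCK_CHAR) s->crs++;
}

static int sim_rx(void *c, uint32_t timeout_ms) {
    sim_chip *s = c;
    (void)timeout_ms;
    if (!s->acked && s->crs >= s->lock_on_cr) {
        s->acked = true;
        return (int)RC8650_LOCK_ACK;
    }
    return -1;  /* nothing received before the timeout */
}
```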
Because the RC8650 can take up to 15 µs to accept data written
to it (AC Characteristics, tYHWH parameter), software drivers
should wait for RDY to drop to 0 after a byte is written in order to
avoid overwriting it with the next data byte. Not doing so could
result in the loss of data. Waiting for RDY to drop to 0 ensures that
RDY will not falsely show that the RC8650 is ready the next time
the driver is called.
If a system interrupt can occur while waiting for RDY to become 0,
or if RDY cannot otherwise be checked at least once every 8 µs, a
software timeout should be enforced to avoid hanging in the
wait loop. The time RDY stays 0 is relatively short (8 µs min.) and
can be missed if interrupted. The timeout should be at least 15 µs,
which is the maximum time for RDY to drop to 0 after a byte
of data is written. In non-time-critical applications, the output routine could
simply delay 15 µs or longer before exiting, without checking for
RDY = 0 at all.
Status messages
The serial port provides real-time operating status information via
the TXD pin. Status is reported as one-byte messages, shown
in Table 1.3. Each message directly correlates to a status flag in
the Status Register (Table 1.4). The specific character used, and
whether it is transmitted at all, are functions of the VC and STM bits
of the Protocol Options Register. (The Protocol Options Register is
described in Section 2.) For information about how to obtain reading-progress status, see the Index Marker command description.
Figure 1.6 illustrates the recommended method of writing data to
the RC8650’s bus interface. This method should be used for writing all types of data, including text, commands, tone generator
and real-time audio data.
Table 1.3. Status Messages
Event
VC = 0
VC = 1
Output has started
“B”
Output has stopped
“E”
“s”
–
Buffer almost full
(<100 bytes available)
–
Sleep/Standby mode
confirmation
“S”
Baud rate lock
confirmation
“L”
Yes
“t”
Buffer almost empty
(<100 bytes remaining)
Requires
STM = 1
“e”
“f”
“p”
“l”
Yes
Yes
Yes
No
No
Bus/Printer Interface
The RC8650’s bus interface allows it to be connected to a microprocessor or microcontroller in the same manner as a static RAM
or I/O device, as shown in Figure 1.7. The microprocessor controls all transactions with the RC8650 over the system data bus
using the RD and WR# signals. RD controls the reading of the
RC8650’s Status Register; WR# controls the transfer of data into
the RC8650. The Status Register bits and their definitions are
shown in Table 1.4.
A registered bus transceiver is required for communication between the RC8650 and microprocessor; two 74HCT374s placed
back to back may be substituted for the 74HCT652 shown in the
figure. Prior to each write operation to the RC8650, the host processor should verify that the RC8650 is ready by testing the RDY
status flag.
The RC8650 can also be interfaced to a PC’s printer port as shown
in Figure 1.7. A 74HCT374 can be used in place of the 74HCT652,
since bidirectional communication is not necessary. Handshaking is performed automatically via the BUSY pin.
Figure 1.6. Recommended Method of Writing Data Via
the Bus Interface
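The flow of Figure 1.6 translates fairly directly into a polled C routine. This is a sketch under assumptions: the `rc8650_bus` hooks (status read, data write, microsecond clock) are hypothetical and depend on how the chip is wired to the host, and the 15 µs bound is the tYHWH figure quoted in the text. A simulated device is included so the control flow can be checked without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define RC8650_SR_RDY         0x10u  /* Status Register bit SR.4: 1 = ready */
#define RC8650_ACK_TIMEOUT_US 15u    /* max time for RDY to drop (tYHWH) */

/* Hypothetical board hooks; the real access method depends on the wiring,
   e.g. through the registered transceiver of Figure 1.7. */
typedef struct {
    uint8_t  (*read_status)(void *ctx);           /* RD: read the Status Register */
    void     (*write_data)(void *ctx, uint8_t b); /* WR#: write one data byte */
    uint32_t (*now_us)(void *ctx);                /* free-running microsecond clock */
    void *ctx;
} rc8650_bus;

/* Write one byte using the flow of Figure 1.6. Returns true if RDY was seen
   dropping to 0 (acknowledgment), false if the 15 us timeout expired. */
bool rc8650_bus_write(const rc8650_bus *bus, uint8_t b) {
    /* Wait for RDY = 1; reserved bits are ignored by testing SR.4 only. */
    while ((bus->read_status(bus->ctx) & RC8650_SR_RDY) == 0) { }
    bus->write_data(bus->ctx, b);
    /* Wait for RDY = 0 so the next call cannot see a stale RDY = 1. The
       unsigned subtraction keeps the timeout correct across clock wraparound. */
    uint32_t start = bus->now_us(bus->ctx);
    while (bus->read_status(bus->ctx) & RC8650_SR_RDY) {
        if ((uint32_t)(bus->now_us(bus->ctx) - start) >= RC8650_ACK_TIMEOUT_US)
            return false;  /* 0 pulse missed or not seen within 15 us */
    }
    return true;
}

/* Simulated RC8650 for desk-checking: RDY reads 0 for busy_reads status
   reads after each write; the fake clock advances 1 us per status read. */
typedef struct { int busy_left, busy_reads, writes; uint32_t t; uint8_t last; } sim_dev;

static uint8_t sim_status(void *c) {
    sim_dev *d = c;
    d->t++;
    if (d->busy_left > 0) { d->busy_left--; return 0x00; }
    return RC8650_SR_RDY;
}

static void sim_write(void *c, uint8_t b) {
    sim_dev *d = c;
    d->last = b;
    d->writes++;
    d->busy_left = d->busy_reads;
}

static uint32_t sim_now(void *c) { return ((sim_dev *)c)->t; }
```

A false return is not necessarily an error: it means the brief 0 pulse on RDY was not observed within 15 µs, which, per the text above, is the point at which the byte can be treated as accepted.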
Table 1.4. Bus Interface Status Register Bit Definitions
Bit | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Flag | R | TS | R | RDY | AF | AE | STBY | R
Status Register Bit
Description
SR.7 = RESERVED (R)
Reserved for future use. Mask out when polling the Status Register.
SR.6 = TALK STATUS (TS)
1 = Talking
0 = Idle
The TS bit has the same meaning as the TS pin. “1” means that the RC8650 is producing
output; “0” means output has ceased. The TS bit is not affected by the TS Pin Control
command, which affects only the TS pin.
SR.5 = RESERVED (R)
Reserved for future use. Mask out when polling the Status Register.
SR.4 = READY STATUS (RDY)
1 = Ready
0 = Busy
The RDY bit has the same meaning as the RDY# pin. The RC8650 sets RDY to “1” to
indicate that it is ready to receive data. RDY drops to “0” momentarily after each write
operation over the PIO bus, acknowledging receipt of each character.
SR.3 = ALMOST FULL (AF)
1 = Buffer almost full
0 = Buffer not almost full
This bit is “1” anytime there are fewer than 100 bytes available in the input buffer. AF is
always “0” in the real time audio playback mode and when using the musical tone
generator.
SR.2 = ALMOST EMPTY (AE)
1 = Buffer almost empty
0 = Buffer not almost empty
This bit is “1” anytime there are fewer than 100 bytes remaining in the input buffer. AE is
always “1” in the real time audio playback mode and when using the musical tone
generator.
SR.1 = STANDBY MODE (STBY)
1 = RC8650 is in Standby mode
0 = RC8650 not in Standby mode
This bit is “1” when the RC8650 has entered Standby mode. Standby mode is entered
either by setting the STBY# pin Low or from the Sleep Timer.
SR.0 = RESERVED (R)
Reserved for future use. Mask out when polling the Status Register.
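When polling the Status Register in software, masking out the reserved bits keeps the code robust against future use of SR.7, SR.5, and SR.0. A C sketch with hypothetical names, using the bit positions from Table 1.4; `rc8650_status_changed()` shows where the mask matters, when comparing successive readings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status Register bit positions from Table 1.4. */
#define RC8650_SR_TS      (1u << 6)  /* 1 = talking */
#define RC8650_SR_RDY     (1u << 4)  /* 1 = ready for the next byte */
#define RC8650_SR_AF      (1u << 3)  /* 1 = fewer than 100 bytes available */
#define RC8650_SR_AE      (1u << 2)  /* 1 = fewer than 100 bytes remaining */
#define RC8650_SR_STBY    (1u << 1)  /* 1 = in Standby mode */
#define RC8650_SR_DEFINED (RC8650_SR_TS | RC8650_SR_RDY | RC8650_SR_AF | \
                           RC8650_SR_AE | RC8650_SR_STBY)

typedef struct {
    bool talking, ready, almost_full, almost_empty, standby;
} rc8650_status;

/* Decode one raw Status Register read; reserved bits are never consulted. */
rc8650_status rc8650_decode_status(uint8_t raw) {
    rc8650_status s = {
        .talking      = (raw & RC8650_SR_TS) != 0,
        .ready        = (raw & RC8650_SR_RDY) != 0,
        .almost_full  = (raw & RC8650_SR_AF) != 0,
        .almost_empty = (raw & RC8650_SR_AE) != 0,
        .standby      = (raw & RC8650_SR_STBY) != 0,
    };
    return s;
}

/* True if any defined flag differs between two reads; changes confined to
   the reserved bits 7, 5 and 0 are ignored. */
bool rc8650_status_changed(uint8_t prev, uint8_t now) {
    return ((prev ^ now) & RC8650_SR_DEFINED) != 0;
}
```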
Figure 1.7. Bus/Printer Interface (bus interface: host DB0–DB7, RD, and WR# connected to RC8650 PIO0–PIO7, STS#, PRD#, PWR#, and RDY# through a 74HCT652 registered transceiver; printer interface: Centronics-compatible DB25 printer port signals DATA0–DATA7, STB#, and BUSY)
Figure 1.8. Method of Capturing Status Information for Driving External Circuitry (a 74HCT374 octal latch captures the status flags TS, RDY, AF, AE, and STBY driven on PIO0–PIO7 during STS#, providing latched status outputs)
Analog Audio Output
The analog output pins AO0 and AO1 are high-impedance (10 kΩ typ) outputs from the RC8650’s internal D/A converters. When using these outputs, the addition of an external low-pass filter is highly recommended.
The circuit shown in Figure 1.9 is a low-pass filter/power amplifier capable of delivering 675 mW into an 8 Ω load. The circuit is representative of one channel (use AO0 in single-channel systems). The amplifier’s shutdown pin can be controlled by the corresponding channel’s TS pin to minimize current drain when the channel is inactive.
Digital Audio Output
The digital audio pin DAOUT outputs the RC8650’s audio signal as a digital audio stream consisting of 8 data bits per sample. The normalized sample rate for all text-to-speech modes and the DTMF generator is 10,500 samples/s, an 84 kbps (10,500 bytes/s) stream. The sample rates of the sinusoidal generator and of the prerecorded and real-time audio playback modes are user programmable, so their normalized rates will vary. See the Pin Descriptions and the Audio Control Register command description for further details.
(Circuit: AO0 feeds a 3 kHz low-pass filter ahead of an LM4862 power amplifier driving an 8 Ω speaker; the LM4862 shutdown pin is driven by TS0, which must be programmed for active Low.)
Figure 1.9. 3 kHz Low-Pass Filter/Power Amplifier
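The DAOUT figures quoted above can be cross-checked with a short calculation, assuming "normalized" means the raw 8-bit sample payload without any framing bits added in the asynchronous mode. The resulting 95.2 µs sample period is consistent with the 95 µs nominal update interval shown in Figure 1.13, assuming the analog outputs are updated at the same rate in text-to-speech mode.

```latex
\begin{aligned}
f_s &= 10\,500\ \text{samples/s}, \qquad 8\ \text{bits/sample} \\
R &= f_s \times 8\ \text{bits} = 84\,000\ \text{bit/s} = 84\ \text{kbps} \\
T_s &= \frac{1}{f_s} = \frac{1}{10\,500}\ \text{s} \approx 95.2\ \mu\text{s} \\
B_{60\,\text{s}} &= f_s \times 60\ \text{s} = 630\,000\ \text{bytes per minute of speech}
\end{aligned}
```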
ELECTRICAL SPECIFICATIONS
Figure 1.10. Typical Operating Circuit/Test Circuit (U1 RC8650FP and U2 RC46xxFP with IC0–IC32 and PIO0–PIO7 interconnected; Y1 7.3728 MHz crystal with C1 and C2, 22 pF; C3 and C4, 0.1 µF decoupling; R1–R5, 47 kΩ pullups; switches SW1 and SW2; +5 V supply; unused I/O pins terminated to the appropriate logic level)
ABSOLUTE MAXIMUM RATINGS*
Supply voltage, VCC and AVCC: –0.3 V to +6.5 V
DC input voltage, VI: –0.3 V to VCC + 0.3 V
Operating temperature, TA: 0 °C to +70 °C
Storage temperature, TS: –55 °C to +125 °C
* WARNING: Stresses greater than those listed under “Absolute Maximum Ratings” may cause permanent damage to the device. This is a stress rating only; operation of the device at any conditions beyond those indicated in the operational sections of these specifications is not implied. Exposure to absolute maximum rating conditions for extended periods may affect device reliability.
DC CHARACTERISTICS
TA = 0 °C to +70 °C, VCC = AVCC = AVREF = 3.3 V / 5 V, VSS = AVSS = 0 V, XIN = 7.3728 MHz
Symbol
3.3 ± 0.3 V
Parameter
Min
VIL
VIH
VIA
1
Input voltage, Low
Input voltage, High
VHYR
Analog input voltage (AN0-3)
Input hysterisis, RESET#
VOL
Output voltage, Low
VOH
Output voltage, High
IIL
Input load current
RO
Analog output resistance
(AO0-1)
ICC
Supply current
Active
Idle
Standby
Program (Note 1)
5 V ± 10%
Typ
Typ
Unit
Test Conditions
Max
Max
Min
0
0.8VCC
0.2VCC
0
0.8VCC
0.2VCC
VCC
0
0.2
AVREF
0
0.2
AVREF
V
V
1.8
V
0.5
V
V
IOL = 1 mA
IOH = –1 mA
±5
µA
VIN = VSS to VCC
10
20
kΩ
17
1
2
35
2
25
70
mA
mA
µA
mA
VCC
1.8
0.5
VCC – 0.5
VCC – 0.5
±4
4
10
20
11
0.7
2
20
1.5
15
50
4
V
All outputs open;
all inputs = VCC or
VSS; AVCC and
AVREF currents
included
Note 1: Applies during internal programming operations: greeting message, dictionary and prerecorded sound file downloads, and microcode updates.
AC CHARACTERISTICS
TA = 0 °C to +70 °C, VCC = AVCC = AVREF = 3.3 V / 5 V, VSS = AVSS = 0 V
External Clock Input Timing
                                               3.3 ± 0.3 V                5 V ± 10%
Symbol  Parameter                              Min     Nom     Max        Min     Nom     Max        Unit
fC      External clock input frequency         7.3359  7.3728  7.4097     7.3359  7.3728  7.4097     MHz
tWCL    External clock input Low pulse width   60      67.8               40      67.8               ns
tWCH    External clock input High pulse width  60      67.8               40      67.8               ns
tCR     External clock rise time                               18                         15         ns
tCF     External clock fall time                               18                         15         ns
Figure 1.11. External Clock Waveform (XIN; tWCL, tWCH, tCR, tCF)
Bus Interface Timing
Symbol
3.3 ± 0.3 V
Parameter
Min
tWSL
tDVSL
1
Max
5 V ± 10%
Min
Unit
Max
215
tDHSH
STS# pulse width Low
STS# Low to data valid
Data hold from STS# going High
250
5
5
ns
ns
ns
tWRL
tDVRH
tDHRH
PRD# pulse width Low
Data setup to PRD# going High
Data hold from PRD# going High
215
85
0
250
40
0
ns
ns
ns
tWWL
tDVWH
tDHWH
tYHWH
tWYH
PWR# pulse width Low
Data setup to PWR# going High
Data hold from PWR# going High
RDY# High from PWR# going High (Note 1)
RDY# pulse width High (Note 1)
380
–2
15
250
–2
15
ns
µs
µs
µs
µs
155
150
15
8
15
8
Note 1: tYHWH and tWYH apply to both the RDY# pin and the RDY status flag.
Figure 1.12. Bus Interface Waveforms (STS#, PRD#, PWR#, RDY#, PIO0–PIO7)
Analog Audio Timing
Figure 1.13. Analog Audio Waveforms (AOi, ASi, SUSPi#; audio suspended, then resumed; 95 µs nom.)
Digital Audio Timing
Symbol
Parameter
Min
tCYC
tWCL
DACLK cycle time
DACLK pulse width Low
tWCH
DACLK pulse width High
tDVCL
tDHCL
fS
DACLK Low to data valid
Data hold from DACLK going Low
TTS and DTMF generator internal sampling rate
Max
Unit
200
ns
100
100
ns
0
10.5
ns
80
ns
10.5
ns
kHz
Notes
Nominal
Figure 1.14. Digital Audio Waveforms (DACLK, DAOUT; tCYC, tWCH, tWCL, tDVCL, tDHCL)
Standby Timing
Symbol
3.3 ± 0.3 V
Parameter
Min
tWSBL
STBY# pulse width Low
To enter Standby mode
To reinitialize parameter memory
To exit Sleep mode
5 V ± 10%
Max
250
Min
250
250
380
Unit
Max
250
250
ms
ms
ns
Figure 1.15. Standby Waveform (STBY#, tWSBL)
PACKAGE INFORMATION
100 Pin Plastic 14 x 20 mm QFP (measured in millimeters)
(Package outline drawing with seating plane and Detail A)
48 Pin Plastic 12 x 20 mm TSOP (measured in millimeters)
(Package outline drawing with seating plane and Detail A)
Recommended PCB Layouts (measured in millimeters)
(Land pattern drawings for the QFP and TSOP packages)
ORDERING INFORMATION
Part number format: RC86L50-0
VCC RANGE (letter after “RC86”): BLANK = 5 V ± 10%; L = 3.3 ± 0.3 V
RECORDED AUDIO CAPACITY (digit after the dash): 0 = 0 sec; 1 = 130 sec; 2 = 390 sec; 3 = 910 sec
VALID COMBINATIONS†:
RC8650-0 (RC4641FP)
RC8650-1 * (RC4651FP)
RC86L50-0 (RC46L41FP)
RC86L50-1 (RC46L51FP)
RC86L50-2 (RC46L61FP)
RC86L50-3 (RC46L71FP)
† All chipset versions come with RC8650FP. Companion chip is shown in parentheses.
* Denotes standard product.
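The ordering code can be decoded mechanically. The Python sketch below assumes a host tool has a part number as a string and wants to check it against the valid combinations. The function name `decode_part_number` and its return tuple are hypothetical. The table contents come from the ordering information above.

```python
import re
from typing import Tuple

# Valid combinations and their companion chips, as listed under VALID COMBINATIONS.
COMPANION = {
    "RC8650-0": "RC4641FP",
    "RC8650-1": "RC4651FP",
    "RC86L50-0": "RC46L41FP",
    "RC86L50-1": "RC46L51FP",
    "RC86L50-2": "RC46L61FP",
    "RC86L50-3": "RC46L71FP",
}
# RECORDED AUDIO CAPACITY digit -> seconds of prerecorded audio.
CAPACITY_SEC = {"0": 0, "1": 130, "2": 390, "3": 910}


def decode_part_number(part: str) -> Tuple[float, int, str]:
    """Return (nominal VCC in volts, recorded audio seconds, companion chip)."""
    match = re.fullmatch(r"RC86(L?)50-([0-3])", part)
    if match is None or part not in COMPANION:
        raise ValueError(f"{part!r} is not a valid RC8650 combination")
    vcc = 3.3 if match.group(1) == "L" else 5.0  # "L" = 3.3 V part, blank = 5 V
    return vcc, CAPACITY_SEC[match.group(2)], COMPANION[part]
```

For example, `decode_part_number("RC86L50-2")` yields a 3.3 V part with 390 seconds of recorded audio and the RC46L61FP companion chip.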
SECTION 2: PRINCIPLES OF OPERATION
This section describes the operating characteristics of the
DoubleTalk RC8650 chipset.
TRANSLATION ACCURACY
Because the RC8650 must handle the highly irregular spelling
system of English, as well as proper names, acronyms, technical
terms, and borrowed foreign words, there inevitably will be words
that it will mispronounce. If a word is mispronounced, there are
three techniques for correcting it:
1. Spell the word phonetically for the desired pronunciation.
2. Redefine the way the word should be pronounced by creating
an exception for it in the RC8650’s exception dictionary. This
method allows words to be corrected without having to modify
the original text, and it automatically corrects all instances of
the word. Exception dictionaries are covered in detail in Section 4.
3. Use the RC8650’s Phoneme mode.
The first technique is the easiest way to fine-tune word pronunciations: by tricking the RC8650 into the desired pronunciation.
Among the more commonly mispronounced words are compound words (baseball), proper names (Sean), and foreign loan
words (chauffeur). Compound words can usually be corrected by
separating the two words with a space, so that “baseball” becomes “base ball.” Proper names and foreign words may require
a bit more creativity, so that “Sean” becomes “Shon,” and “chauffeur” becomes “show fur.” Heteronyms (words with identical spelling but different meanings and pronunciations) can also be
modified using this technique. For example, if the word read is to
be pronounced “reed” instead of “red,” it can simply be respelled
as “reed.”
OPERATING MODES
The RC8650 has six primary operating modes and two low-power
modes designed to achieve maximum functionality and flexibility.
The operating mode can be changed at any time, even on the fly.
Note The RC8650 will not begin speaking until it receives a CR
(ASCII 13) or Null (ASCII 00) character; this ensures that a complete contextual analysis can be performed on the input text. If it is
not possible for the application to send a CR or Null at the end of
each text message, use the Timeout Delay command (nY).
The RC8650 does not make any distinction between uppercase
and lowercase characters. Text and commands may be sent as
all uppercase, all lowercase, or any combination thereof.
Text mode. In this mode, all text sent to the RC8650 is spoken
normally. Punctuation is also taken into consideration by the intonation generation algorithms. This is the default operating mode.
Character mode. This mode causes the RC8650 to translate
input text on a character-by-character basis; i.e., text will be
spelled instead of spoken as words.
Phoneme mode. This mode disables the RC8650’s text-to-phonetics translator, allowing the RC8650’s phonemes to be directly accessed.
Real Time Audio Playback mode. In this mode, data sent to
the RC8650 is written directly to its audio buffer. This requires a
high data rate, but provides the capability of producing the highest quality speech, as well as sound effects. PCM and ADPCM
data types are supported.
Prerecorded Audio Playback mode. This mode allows recorded speech and sound effects to be stored on-chip and
played back at a later time. PCM and ADPCM data types are
supported.
Tone Generator modes. These modes activate the
RC8650’s musical tone generator, sinusoidal generator, or DTMF
generator. They can be used to generate audible prompts, music,
and signaling tones, or to dial a telephone.
Idle mode. To help conserve power in battery-powered systems, the RC8650 automatically enters a reduced-power state
whenever it is inactive. Data can still be read from and written to the
RC8650 while in this mode. Current draw is typically 1 mA.
Standby mode. This mode powers down the RC8650; current
draw is typically only 2 µA. Standby mode can be invoked
from either the STBY# pin or with the Sleep command. Data cannot be read from or written to the RC8650 in this mode.
COMMANDS
The commands described in the following pages provide a
simple yet flexible means of controlling the RC8650 under software control. They can be used to vary voice attributes, such as
the volume or pitch, to suit the requirements of a particular application or listener’s preferences. Commands are also used to
change operating modes.
Commands can be freely intermixed with the text that is to be
spoken, allowing the voice to be dynamically controlled. Commands affect only the data that follows them in the data stream.
Command Syntax
All RC8650 commands are composed of the command character, a parameter n consisting of a one- to four-digit number string,
and a single string literal that uniquely identifies the command.
Some commands simply enable or disable a feature of the
RC8650 and do not require a parameter. The general command
format is:
<command character>[<number string>]<string literal>
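The general command format maps directly onto a byte string. The Python sketch below shows one way a host program might encode commands and terminate a text message with CR so that it is spoken. It makes these assumptions:

- Serial port I/O is omitted.
- The default Control-A command character is in use.
- A relative parameter's sign is sent as an ASCII "+" or "-".
- The helper names `build_command` and `build_message` are hypothetical.

```python
import re
from typing import Iterable, Optional, Union

CONTROL_A = b"\x01"  # default command character (ASCII 01)
CR = b"\r"           # ASCII 13: tells the RC8650 to translate buffered text


def build_command(literal: str, n: Optional[Union[int, str]] = None,
                  cmd_char: bytes = CONTROL_A) -> bytes:
    """Encode <command character>[<number string>]<string literal>.

    n may be an int (absolute), a signed string such as "+2" (relative),
    or None for commands that take no parameter.
    """
    number = "" if n is None else str(n)
    # One to four digits, optionally preceded by a sign for relative values.
    if number and re.fullmatch(r"[+-]?\d{1,4}", number) is None:
        raise ValueError(f"bad parameter {n!r}: expected a 1- to 4-digit number")
    return cmd_char + number.encode("ascii") + literal.encode("ascii")


def build_message(commands: Iterable[bytes], text: str) -> bytes:
    """Prefix text with commands and end it with CR so it is spoken."""
    return b"".join(commands) + text.encode("ascii") + CR
```

For example, `build_message([build_command("P", 40), build_command("V", 7)], "Hello")` produces `b"\x0140P\x017VHello\r"`. Every command in that byte string carries its own command character.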
If two or more commands are to be used together, each must be
prefaced with the command character. This is the only way the
RC8650 knows to treat the remaining characters as a command,
rather than text that should be spoken. For example, the following
commands program pitch level 40 and volume level 7 (Control-A
is the default command character):
Control-A “40P” Control-A “7V”
The command character
The default RC8650 command character is Control-A (ASCII 01).
The command character itself can be spoken by the RC8650 by
sending it twice in a row: Control-A Control-A. This special
command allows the command character to be spoken without
affecting the operation of the RC8650, and without having to
change to another command character and then back again.
Changing the command character
The command character can be changed to another control character (ASCII 01-26) by sending the current command character,
followed by the new character. To change the command character to Control-D, for example, issue the command Control-A
Control-D. To change it back, issue the command Control-D
Control-A. It’s generally a good idea to change the command
character if the forthcoming text contains characters that may
otherwise be interpreted as command characters (and hence
commands).
The command character can be unconditionally reset to Control-A by sending Control-^ (ASCII 30) to the RC8650 while operating
in the Text, Character, or Phoneme modes.
Command parameters
Command parameters are composed of one- to four-digit number
strings. The RC8650 supports two types of parameters: absolute
and relative. Absolute parameters explicitly specify the
parameter’s new value, such as 9S or 3B. Relative parameters
specify a displacement from a parameter’s current value, not the
actual new value itself.
Relative parameters can specify either a positive or negative displacement from a parameter’s current value. For example, the
Volume command +2V increases the volume level by two
(V+2→V). If the current volume is 4, the volume will increase to 6
after the command has executed. The command –2V will have a
similar effect, except the volume will be decreased by two.
If the value of a parameter falls outside the command’s range, the
value will either wrap around or saturate, depending on the setting
of the SAT bit of the Protocol Options Register. For example, if
parameters are programmed to wrap, the current volume is 7 and
the command +4V is issued, the resultant volume will be (7+4)–10
= 1, since the volume range is 0-9. If parameters are programmed
to saturate, the resultant volume would be 9 instead.
When writing application programs for the RC8650, it is recommended that relative parameters be used for temporarily changing voice attributes (such as raising the pitch of a word), using
absolute-parameter commands only once in the program’s initialization routine. This way, if the base value of an attribute needs to
be changed, it only needs to be changed in the initialization routine.
TTS COMMANDS
This section describes the software commands that affect the
text-to-speech synthesizer.
Text Mode/Delay (T/nT)
This command places the RC8650 in the Text operating mode.
The optional delay parameter n is used to create a variable pause
between words. The shortest delay, 0 (the default), is used for
normal speech. For users not accustomed to synthetic speech,
the synthesizer’s intelligibility may be improved by introducing a
delay. The longest delay that can be specified is 15. If the delay
parameter is omitted, the current (last set) value will be used and
the exception dictionary will be disabled. This feature is useful for
returning from another operating mode or disabling the exception
dictionary (see Enable Exceptions command).
Character Mode/Delay (C/nC)
This command puts the RC8650 in the Character operating
mode. The optional delay parameter n is used to create a variable
pause between characters. Values between 0 (the default) and
15 provide pauses from shortest to longest, respectively. Values
between 16 and 31 provide the same range of pauses, but control
characters will not be spoken. If the delay parameter is omitted,
the current value will be used and the exception dictionary will be
disabled.
Phoneme Mode (D)
This command disables the text-to-phonetics translator, allowing
the RC8650’s phonemes to be accessed directly. Table 2.1 lists
the phonemes that can be produced by the RC8650.
When concatenating two or more phonemes, each phoneme
must be delimited by a space. For example, the word “computer”
would be represented phonetically as
K AX M P YY UW DX ER
Phoneme attribute tokens
The RC8650 supports a number of phoneme attribute tokens that
can be used in addition to the standard commands. These tokens
do not require the command character or any parameters, but
can only be used in Phoneme mode.
As indicated in Table 2.2, the / and \ tokens temporarily increase
and decrease the pitch by m steps. Besides being temporary, the
pitch tokens differ from the Pitch command in two ways: the effective pitch range is extended beyond the
normal 0-99 range by approximately ±20 steps, and if the pitch
should fall out of range, it will always saturate, regardless of the
Protocol Options Register SAT setting.
All other phoneme attribute token commands remain in effect until
explicitly changed.
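The wrap and saturate rules for relative command parameters, which are selected by the SAT bit (unlike the pitch tokens, which always saturate), can be modeled in a few lines. This Python sketch uses a hypothetical function name. It also assumes that a negative displacement wraps the same way as the documented positive case, which the text does not state explicitly.

```python
def apply_relative(current: int, delta: int, lo: int, hi: int, saturate: bool) -> int:
    """Resolve a relative parameter (e.g. +4V) against the current value.

    saturate mirrors the SAT bit: True clamps the result to [lo, hi];
    False wraps around the range, so volume 7 with +4V gives (7 + 4) - 10 = 1.
    Wrapping of negative displacements is assumed to be symmetric.
    """
    value = current + delta
    if saturate:
        return max(lo, min(hi, value))
    size = hi - lo + 1               # number of legal values, e.g. 10 for 0-9
    return lo + (value - lo) % size  # Python's % is non-negative here
```

With the volume range 0-9, `apply_relative(7, 4, 0, 9, saturate=False)` returns 1 and `apply_relative(7, 4, 0, 9, saturate=True)` returns 9, matching the +4V example.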
Applications of Phoneme mode
Phoneme mode is useful for creating customized speech, when
the normal text-to-speech modes are inappropriate for producing
the desired voice effect. For example, Phoneme mode should be
used to change the stress or emphasis of specific words in a
phrase. This is because Phoneme mode allows voice attributes to
be modified on phoneme boundaries within each word, whereas
Text mode allows changes only at word boundaries. This is illustrated in the following BASIC program examples.
100 A$ = CHR$(1)
105 LPRINT A$;"D";A$;"M"
110 LPRINT "//H AW -/D>/EH R +<\\YY UW S P \IY K T UW \M IY DH AE T -\W EY .+/"
Note in line 105 that expression is disabled, since the pitch variations due to the internal intonation algorithms would otherwise interfere with the pitch tokens. Compare this with the same phrase
produced in Text mode with expression enabled:
100 A$ = CHR$(1)
105 LPRINT A$;"T";A$;"E"
110 LPRINT "How dare you speak to me that way!"
Phoneme mode is also useful in applications that provide their
own text-to-phoneme translation, such as the front end of a custom text-to-speech system.
Table 2.1. DoubleTalk Phoneme Symbols
Phoneme  Example             Phoneme  Example
Symbol   Word                Symbol   Word
A        das (Spanish)       M        me
AA       cot                 N        new
AE       cat                 NG       rung
AH       cut                 NY       niño (Spanish)
AW       cow                 O        no (Spanish)
AX       bottom              OW       boat
AY       bite                OY       boy
B        bib                 P        pop
CH       church              PX       spot
D        did                 R        ring
DH       either              RR       tres (Spanish)
DX       city                S        sell
E        ser (Spanish)       SH       shell
EH       bet                 T        tin
EI       mesa (Spanish)      TH       thin
ER       bird                TX       stick
EW       acteur (French)     U        uno (Spanish)
EY       bake                UH       book
F        fee                 UW       boot
G        gag                 V        valve
H        he                  W        we
I        libro (Spanish)     WH       when
IH       bit                 Y        mayo (Spanish)
IX       rabbit              YY       you
IY       beet                Z        zoo
J        age                 ZH       vision
K        cute                space    variable pause *
KX       ski                 ,        medium pause
L        long                .        long pause
* Normally used between words; duration determined by nT command
Table 2.2. Phoneme Attribute Tokens
Symbol  Function
nn      Set pitch to 'nn' (0-99)
/       Increase pitch m steps *
\       Decrease pitch m steps *
+       Increase speed 1 step
-       Decrease speed 1 step
>       Increase volume 1 step
<       Decrease volume 1 step
* Step size determined by nE command; m ≈ 2n
Speed (nS)
The synthesizer’s speech rate can be adjusted with this command, from 0S (slowest) through 9S (fastest). The default rate is 1S
(5S if the VC bit of the Protocol Options Register is set to 0).
Voice (nO)
The text-to-speech synthesizer has eight standard voices and a
number of individual voice controls that can be used to independently vary the voice characteristics. Voices are selected with the
commands 0O through 7O, shown in Table 2.3. Because this
command alters numerous internal voice parameters (pitch, expression, tone, etc.), it should precede any individual voice control commands.
Table 2.3. Voice Presets
n   Voice Name
0   Perfect Paul (default)
1   Vader
2   Big Bob
3   Precise Pete
4   Ricochet Randy
5   Biff
6   Skip
7   Robo Robert
Articulation (nA)
This command adjusts the articulation level, from 0A through 9A.
Excessively low articulation values tend to make the voice sound
slurred; very high values, on the other hand, can make the voice
sound choppy. The default articulation is 5A.
Expression (E/nE)
Expression, or intonation, is the variation of pitch within a sentence
or phrase. When expression is enabled (n > 0), the RC8650 attempts to mimic the pitch patterns of human speech. For example,
when a sentence ends with a period, the pitch drops at the end of
the sentence; a question mark will cause the pitch to rise.
The optional parameter n determines the degree of intonation. 0E
provides no intonation (monotone), whereas 9E is very animated
sounding. 5E is the default setting. If the parameter is omitted, the
current (last set) value will be used. This is useful for re-enabling
intonation after a Monotone command.
Monotone (M)
This command disables all intonation (expression), causing the
RC8650 to speak in a monotonic voice. Intonation should be disabled whenever manual intonation is applied using the Pitch command or phoneme attribute tokens. Note that this command is
equivalent to the 0E command.
Formant Frequency (nF)
This command adjusts the synthesizer’s overall frequency response (vocal tract formant frequencies), over the range 0F
through 9F. By varying the frequency, voice quality can be fine-tuned or voice type changed. The default frequency is 5F.
Pitch (nP)
This command varies the synthesizer’s pitch over a wide range,
which can be used to change the average pitch during speech
production, produce manual intonation, or create sound effects
(including singing). Pitch values can range from 0P through 99P;
the default is 50P.
Volume (nV)
This is a global command that controls the RC8650’s output volume level, from 0V through 9V. 0V yields the lowest possible volume; maximum volume is attained at 9V. The default volume is 5V.
The Volume command can be used to set a new listening level,
create emphasis in speech, or change the output level of the tone
generators.
Tone (nX)
The synthesizer supports three tone settings, bass (0X), normal
(1X) and treble (2X), which work much like the bass and treble
controls on a stereo. The best setting to use depends on the
speaker being used and personal preference. Normal (1X) is the
default setting.
Reverb (nR)
This command is used to add reverberation to the voice. 0R (the
default) introduces no reverb; increasing values of n correspondingly increase the reverb delay and effect. 9R is the maximum
setting.
Punctuation Filter (nB)
Depending on the application, it may be desirable to limit the
reading of certain punctuation characters. For example, if the
RC8650 is used to proofread documents, the application may call
for only unusual punctuation to be read. On the other hand, an
application that orally echoes keyboard entries for a blind user
may require that all punctuation be spoken.
The RC8650 supports four primary levels of punctuation filtering
as shown in Table 2.4. These levels determine which punctuation
characters will be spoken and which will not. In addition to the four
base levels, the command can be expanded to control how number strings will be read. This is done by ORing the values 04h and/or
08h to the base parameter range, as described below.
Table 2.4. Punctuation Filter
n   Punctuation Spoken
0   All
1   Most (all but CR, LF, Space)
2   Some ($%&#@=+*^|\<>)
3   None
Effect on number strings
The values of n listed in Table 2.4 cause number strings to be read
one digit at a time (e.g., 0123 = “zero one two three”). ORing 04h
to the values listed in the table (n = 4-7) forces number strings to
be read as numbers (0123 = “one hundred twenty three”). Values n = 6
and n = 7 also force currency strings to be read as they are normally spoken; for example, $11.95 is read as “eleven dollars and
ninety five cents.” Finally, ORing 08h to these values (n = 8-15)
disables leading zero suppression; number strings beginning
with zero will always be read one digit at a time.
The default filter setting is 6B (Some punctuation, Numbers mode,
leading zero suppression enabled).
CONTROL COMMANDS
Timeout Delay (nY)
The RC8650 defers translating the contents of its input buffer until
a CR or Null is received. This ensures that text is spoken smoothly
from word to word and that the proper intonation is given to the
beginnings and endings of sentences. If text is sent to the RC8650
without a CR or Null, it will remain untranslated in the input buffer
indefinitely.
The RC8650 contains a programmable timer that is able to force
the RC8650 to translate its buffer contents after a preset time interval. The timer is enabled only if the Timeout Delay parameter n is
non-zero, the RC8650 is not active (not talking), and the input
buffer contains no CR or Null characters. Any characters sent to
the RC8650 before timeout will automatically restart the timer.
The Timeout parameter n specifies the number of 200 millisecond
periods in the delay time, which can range from 200 milliseconds
to 3 seconds. The default value is 0Y, which disables the timer.
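Both the Punctuation Filter and Timeout Delay parameters are small encodings that a host program can compute rather than hard-code. The Python sketch below shows both; the function names are hypothetical. The punctuation value is a base level (0-3) ORed with 04h and/or 08h. The timeout value is the number of 200 ms periods, with 0 meaning "wait for CR/Null".

```python
def punctuation_filter_param(level: int, read_numbers: bool = False,
                             keep_leading_zeros: bool = False) -> int:
    """n for the nB command: base level 0-3 ORed with 04h and/or 08h."""
    if not 0 <= level <= 3:
        raise ValueError("base level must be 0-3 (Table 2.4)")
    n = level
    if read_numbers:
        n |= 0x04  # read number strings as numbers (n = 4-7)
    if keep_leading_zeros:
        n |= 0x08  # disable leading zero suppression (n = 8-15)
    return n


def timeout_param(delay_ms: int) -> int:
    """n for the nY command: number of 200 ms periods; 0 disables the timer."""
    if delay_ms == 0:
        return 0
    if delay_ms % 200 != 0 or not 200 <= delay_ms <= 3000:
        raise ValueError("delay must be 0 or a multiple of 200 ms up to 3000 ms")
    return delay_ms // 200
```

`punctuation_filter_param(2, read_numbers=True)` gives 6, the documented 6B default. `timeout_param(1000)` gives 5, i.e. the command 5Y.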
Table 2.5. Timeout Delays
n     Delay
0     Indefinite (wait for CR/Null)
1     200 milliseconds
2     400 milliseconds
...   ...
15    3000 milliseconds (3 sec.)
Index Marker (nI)
Index markers are nonspeaking “bookmarks” that can be used to
keep track of where the RC8650 is reading within a passage of
text. The parameter n is any number between 0 and 99; thus, up to
100 unique markers may be active at any given time.
When the RC8650 has spoken the text up to a marker, it transmits
the marker number to the host via the TXD pin. Note that this value
is a binary number between 0 and 99, not a literal number string as
was used in the command to place the marker. This allows the
marker to be transmitted as a one-byte value.
Sleep Timer (nQ)
This command activates the RC8650’s sleep timer. If the user forgets to turn off the system’s power at the end of the day, for example, the sleep timer can be used to force the RC8650 into
Standby mode automatically. An audible “reminder” tone can
even be programmed to sound every ten minutes to remind the
user that the power was left on, before shutdown occurs.
The sleep timer is reset anytime the RC8650 is active, or more
precisely, whenever the TS pin is asserted. The timer begins running on the falling edge of TS. In this way, the RC8650 will not shut
itself down during normal use, as long as the programmed timer
interval is longer than the maximum time the RC8650 is inactive.
The command parameter n determines when Standby mode will
be entered. You can place the RC8650 in Standby mode immediately, program the sleep timer to any of 15 ten-minute intervals (10
to 150 minutes), or disable the sleep timer altogether.
Table 2.6. Sleep Timer
n     Delay
0     Sleep timer disabled
1     10 min
...   ...
15    150 min
16    0 (immediate)
17    10 min w/reminder
...   ...
31    150 min w/reminder
Note that the delay interval is simply n x 10 minutes for 0 < n < 16.
ORing 10h to these values (16 < n < 32) also enables the reminder tone, which sounds at the end of each ten-minute interval.
Programming n = 0 disables the sleep timer, which is the default
setting. Setting n = 16 forces the RC8650 to enter Standby mode
as soon as all output has ceased.
If the sleep timer is allowed to expire, the RC8650 will emit the
ASCII character “p” from the TXD pin and the STBY status flag will
be set to 1, just before entering Standby mode. This enables the
host to detect that the RC8650 has entered Standby mode.
Once the RC8650 has entered Standby mode, it can be re-awakened only by a hardware reset or by driving the STBY# pin Low for
250 ns or longer, then High again. All of the RC8650 handshake
signals (BUSY, DTR#, and RDY#) are forced to their “not ready”
states when the RC8650 is in Standby.
Baud Rate (nH)
The serial port’s baud rate may be programmed to any of the rates
listed in Table 2.7. If included as part of the greeting message, the
command will effectively override the default baud rate set by the
BRS pins.
Table 2.7. Programmable Baud Rates
n     Baud Rate
0     300
1     600
2     1200
3     2400
4     4800
5     9600
6     19200
7     Auto-detect
8     38400
9     57600
10    115200
TS Pin Control (nK)
The TS pins provide talk status information for each audio channel, which can be used to activate a transmitter, take a telephone
off hook, enable an audio power amplifier, etc., at the desired time.
Each pin’s state and polarity can be independently configured, as
shown in Table 2.8. The programming of the TS pins does not affect
the Status Register TS flag in any way. The default setting is 1K.
If a TS pin is programmed High or Low, it will remain so until
changed otherwise. This feature can be used to activate a transmitter, for example, before speech output has begun. In the automatic mode, the TS pin is asserted as soon as output begins; it will
return to its false state when all output has ceased. Note that because RC8650 commands work synchronously, the TS pin will not
change state until all text and commands, up to the TS Pin Control
command, have been spoken and/or executed.
Table 2.8. TS Pin Control
n   TS Mode/Polarity
0   Automatic/Active Low
1   Automatic/Active High
2   Forced Low
3   Forced High
Protocol Options Register (nG)
This command controls various internal RC8650 operating parameters. The command parameter n is calculated by ORing together the individual control bits shown in Table 2.9. For example,
193G (193 = 128 + 64 + 1) disables V8600 emulation, enables all
status messages and specifies that parameters should saturate.
128G is the default setting.
Bit POR.7 (VC) programs the RC8650 to emulate RC Systems’
original V8600 voice synthesizer module. When this bit is set to 0
(which V8600 application programs do, as this bit was undefined
in the V8600), the overall voice speed range is reduced and the
default speed is changed from 1S to 5S, matching the characteristics of the V8600. The serial port status messages are also affected by the setting of this bit.
Note Relative parameters work differently than usual with this
command. Instead of specifying a displacement from the
register’s current value, relative parameters allow you to set (“+”)
and clear (“–”) individual register bits. For example, +65G sets bits
POR.0 and POR.6; –65G clears POR.0 and POR.6.
Table 2.9. Protocol Options Register Bit Definitions
Bit    7    6    5     4   3   2   1   0
Name   VC   SAT  DDUR  R   R   R   R   STM
Protocol Options Register Bit
Description
POR.7 = V8600 COMPATIBILITY (VC)
1 = Compatibility disabled
0 = Compatibility enabled
Emulates RC Systems’ V8600 voice synthesizer module when set to “0.” Overall
voice speed range and serial port status responses are adjusted to that of the V8600.
Default: “1” (in the V8600A module, this bit defaults to “0”).
POR.6 = SATURATE (SAT)
1 = Parameters saturate
0 = Parameters wrap
Determines whether command parameters wrap or saturate when their range has
been exceeded. Default: “0.”
POR.5 = DTMF DURATION (DDUR)
1 = 500 ms
0 = 100 ms
Determines DTMF (Touch-Tone) generator burst duration. When set to “1,” tone
bursts are 500 ms long; when “0,” 100 ms. Default: “0.”
POR.4 = RESERVED (R)
Reserved for future use. Write “0” to ensure future compatibility.
POR.3 = RESERVED (R)
Reserved for future use. Write “0” to ensure future compatibility.
POR.2 = RESERVED (R)
Reserved for future use. Write “0” to ensure future compatibility.
POR.1 = RESERVED (R)
Reserved for future use. Write “0” to ensure future compatibility.
POR.0 = STATUS MESSAGES (STM)
1 = Enabled
0 = Disabled
Enables and disables the transmission of certain status messages from the TXD pin.
Default: “0.”
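Because the nG parameter is simply the OR of the bits in Table 2.9, a host program can build it from named masks instead of magic numbers. The constant names and `por_command` in the Python sketch below are hypothetical; the bit positions follow the table. Reserved bits POR.4-1 are left at 0, as the table requires.

```python
# Masks for the Protocol Options Register, per Table 2.9.
POR_VC = 0x80    # POR.7: 1 = V8600 compatibility disabled
POR_SAT = 0x40   # POR.6: 1 = parameters saturate, 0 = wrap
POR_DDUR = 0x20  # POR.5: 1 = 500 ms DTMF bursts, 0 = 100 ms
POR_STM = 0x01   # POR.0: 1 = status messages enabled
# POR.4-POR.1 are reserved; they are never set here.


def por_command(*flags: int) -> str:
    """Return the absolute nG command text for the given flag masks."""
    value = 0
    for flag in flags:
        value |= flag
    return f"{value}G"
```

`por_command(POR_VC, POR_SAT, POR_STM)` returns "193G", the example given for the Protocol Options Register. `por_command(POR_VC)` returns the "128G" default.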
ADC Control Register (n$)
The ADC Control Register controls the operation of the integrated
analog-to-digital converter. All ADC results are transferred via the
TXD pin.
Operation of the ADC is not mutually exclusive with other RC8650
functions. The ADC can operate concurrently with text-to-speech,
tone generation, audio playback, etc. The effective sampling rate
is one-tenth the serial port baud rate (115200 baud = 11.5 ksps).
The following is an overview of the ADC:
– Four channels, 8-bit resolution (±2 LSB precision)
– One-shot, continuous, single sweep, and continuous sweep modes of operation
– Selectable software or hardware triggering
– Support for external amplification/signal conditioning of all four ADC channels
Figure 2.1 is a functional block diagram of the ADC input stage;
Figure 2.2 illustrates the ADC in operation. Table 2.10 lists the
definitions of each bit of the ADC Control Register. The default
register setting is 0$.
Note Relative parameters work differently than usual with this
command. Instead of specifying a displacement from the
register’s current value, relative parameters allow you to set (“+”)
and clear (“–”) individual register bits. For example, +34$ sets bits
ADR.1 and ADR.5; –16$ clears ADR.4.
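For the register commands (nG, n$, and nN), a signed parameter acts as a bit mask rather than a displacement. The Python sketch below models that rule so a host can track the register contents it expects after each command. It makes two assumptions: the sign is sent as an ASCII "+" or "-", and registers are 8 bits wide. The function name is hypothetical.

```python
def apply_register_param(current: int, param: str) -> int:
    """New value of an 8-bit register (POR, ADR, or ACR) after a command.

    "+n" sets the bits that are 1 in n, "-n" clears them, and an unsigned
    number replaces the register contents (absolute parameter).
    """
    if param.startswith("+"):
        return (current | int(param[1:])) & 0xFF
    if param.startswith("-"):
        return current & ~int(param[1:]) & 0xFF
    return int(param)
```

For example, `apply_register_param(0, "+34")` returns 34 (ADR.1 and ADR.5 set). `apply_register_param(0x3A, "-16")` returns 0x2A (ADR.4 cleared, other bits untouched).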
Table 2.10. ADC Control Register Definitions
Bit    7   6    5    4     3    2   1   0
Name   R   AMP  TRG  CONT  SWP  R   CH  CH
ADC Control Register Bit
Description
ADR.7 = RESERVED (R)
Reserved for future use. Write “0” to ensure future compatibility.
ADR.6 = EXTERNAL AMPLIFIER (AMP)
1 = Amp connected
0 = Amp not connected
Set this bit to “1” to use an operational amplifier connected between the
AMPIN and AMPOUT pins. Connecting an op amp and enabling this function
allows the voltage input to each ADC input pin to be amplified with one op
amp. Default: “0.”
ADR.5 = TRIGGER SOURCE (TRG)
1 = Hardware trigger (ADTRG pin)
0 = Software trigger
Setting this bit to “1” enables hardware triggering of the ADC. The ADC will
not begin operating until the ADTRG pin changes from a High to a Low level.
When TRG is “0” the ADC will begin operating whenever the ADR register is
written to. Default: “0.”
ADR.4 = CONTINUOUS MODE (CONT)
1 = Continuous mode
0 = One-shot mode
Setting this bit to “1” causes the ADC to operate continuously. If a single
channel is selected for measurement (ADR.3 = 0), that channel will be read
repeatedly. If sweep mode is selected (ADR.3 = 1), the active input channels
will be continuously read in a cyclic fashion. Clearing this bit while the ADC is
operating will stop the ADC. Default: “0.”
ADR.3 = SWEEP MODE (SWP)
1 = Sweep mode
0 = Single-channel mode
This bit determines whether a single channel or multiple input channels will
be read. When Sweep mode is selected, ADR.1–0 determine which input
channels will be scanned. Default: “0.”
ADR.2 = RESERVED (R)
Reserved for future use. Write “0” to ensure future compatibility.
ADR.1–0 = CHANNEL SELECT (CH)
These bits determine which input channel(s) will be read by the ADC.
Default: “00.”
When ADR.3 = 0:
00 = AN0
01 = AN1
10 = AN2
11 = AN3
When ADR.3 = 1:
00 = undefined
01 = AN0–AN1 sweep
10 = undefined
11 = AN0–AN3 sweep
NOTES:
1. The AMPOUT pin can be used as a fifth ADC input if an external op amp is not used. Set ADR.6 = 1 to select the AMPOUT pin
for conversion.
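The bit definitions in Table 2.10 can be turned into an absolute n$ command as follows. In this Python sketch, the constant and function names are hypothetical and the bit values follow the table. Undefined sweep codes (00 and 10 with ADR.3 = 1) are rejected. The sampling-rate helper applies the stated one-tenth-of-baud-rate rule.

```python
# Masks for the ADC Control Register, per Table 2.10.
ADR_AMP = 0x40   # ADR.6: external op amp between AMPIN and AMPOUT
ADR_TRG = 0x20   # ADR.5: hardware trigger on ADTRG High-to-Low
ADR_CONT = 0x10  # ADR.4: continuous mode (0 = one-shot)
ADR_SWP = 0x08   # ADR.3: sweep mode (0 = single channel)

SINGLE_CHANNEL = {"AN0": 0b00, "AN1": 0b01, "AN2": 0b10, "AN3": 0b11}
SWEEP_GROUP = {"AN0-AN1": 0b01, "AN0-AN3": 0b11}  # codes 00 and 10 undefined


def adc_command(channels: str, continuous: bool = False,
                hardware_trigger: bool = False, external_amp: bool = False) -> str:
    """Return the absolute n$ command text for the requested configuration."""
    if channels in SINGLE_CHANNEL:
        value = SINGLE_CHANNEL[channels]
    elif channels in SWEEP_GROUP:
        value = ADR_SWP | SWEEP_GROUP[channels]
    else:
        raise ValueError(f"unknown channel selection {channels!r}")
    if continuous:
        value |= ADR_CONT
    if hardware_trigger:
        value |= ADR_TRG
    if external_amp:
        value |= ADR_AMP
    return f"{value}$"


def adc_sample_rate(baud: int) -> float:
    """Effective samples per second: one-tenth of the serial baud rate."""
    return baud / 10
```

`adc_command("AN2", continuous=True)` returns "18$", the CONT = CH1 = 1, CH0 = 0 case shown in Figure 2.2. `adc_sample_rate(115200)` returns 11520.0, the 11.5 ksps figure quoted earlier.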
25
RC SYSTEMS
RC8650 VOICE SYNTHESIZER
Figure 2.1. ADC Input Block Diagram [AN0–AN3 feed a 4-1 MUX selected by CH1 and CH0; with AMP = 0 the MUX output goes directly to the ADC circuit, with AMP = 1 it is routed out through AMPIN and returns via AMPOUT]
Figure 2.2. ADC Transfer Timing [TXD and ADTRG waveforms for four settings: TRG = SWP = CONT = 0, CH1 = CH0 = 0 (AN0 read once); CONT = CH1 = 1, CH0 = 0 (AN2 read repeatedly until CONT = 0); TRG = CH0 = 1, SWP = CONT = CH1 = 0 (AN1 read on ADTRG); SWP = CONT = 1 (AN0 and AN1 swept until CONT = 0)]
Audio Control Register (nN)
The Audio Control Register determines whether the audio stream will be output as an analog signal on the AO pins or as serial digital data on the DAOUT pin. See Table 2.11 for the definition of each register bit. The default register setting is 0N.
In the digital audio modes, data is transferred from the DAOUT pin in 8-bit linear, offset binary format (midscale = 80h). The DARTS# pin can be used to regulate the flow of data; it must be Low for transfers to begin. In the synchronous mode, do not attempt to read the data at an average rate faster than 10 kbytes/sec. At clock rates above 80 kHz the host must pause between reading each byte in order to keep the average transfer rate from exceeding 10 kbytes/sec.
Figure 2.3 illustrates the synchronous data transfer mode. Note how either DARTS# or DACLK can be used to regulate the flow of data from the RC8650.
Note Relative parameters work differently than usual with this command. Instead of specifying a displacement from the register’s current value, relative parameters allow you to set (“+”) and clear (“–”) individual register bits. For example, +40N sets bits ACR.3 and ACR.5; –5N clears ACR.0 and ACR.2.
Table 2.11. Audio Control Register Definitions
Bit     7     6     5     4     3     2     1     0
        AM    TM    DPC   TF    TCP   BR    BR    BR
Audio Control Register Bit
Description
ACR.7 = AUDIO MODE (AM)
1 = Digital
0 = Analog
Set this bit to “0” to direct the audio stream to the AO pin (analog). Set the bit
to “1” to direct output to the DAOUT pin (digital). Default: “0.”
ACR.6 = TRANSFER MODE (TM)
1 = Synchronous
0 = Asynchronous
In the asynchronous transfer mode the data rate and timing are controlled
by the internal bit rate generator (ACR.2–0). Data is output on the DAOUT
pin and formatted as 1 start bit, 8 data bits (LSB first), and 1 stop bit.
In the synchronous transfer mode the data rate and timing are controlled
by the host with the DACLK pin. Data is output from the DAOUT pin as 8 bit
data frames.
Default: “0.”
ACR.5 = DAOUT PIN CONTROL (DPC)
1 = Open-drain
0 = CMOS
Set this bit to “1” to configure the DAOUT pin as an open-drain output, or to
“0” for a CMOS output. The open-drain configuration should be used when
wire-or’ing two or more DAOUT pins together. Default: “0.”
ACR.4 = TRANSFER FORMAT (TF)
1 = MSB first
0 = LSB first
Set this bit to “1” to have the 8 bit data frames transmitted most-significant bit
first, or to “0” for least-significant bit first. Valid only in the synchronous
transfer mode. Default: “0.”
ACR.3 = TRANSFER CLOCK POLARITY (TCP)
1 = Rising edge
0 = Falling edge
Set this bit to “1” to clock data out of the DAOUT pin on the rising edge of the
DACLK pin, or to “0” to clock data on the falling edge. Valid only in the
synchronous transfer mode. Default: “0.”
ACR.2–0 = BIT RATE (BR)
000 = 2400
001 = 4800
010 = 9600
011 = 14400
100 = 19200
101 = 28800
110 = 57600
111 = 115200
These bits determine the bit rate used in the asynchronous transfer mode.
Valid only in the asynchronous transfer mode. Default: “000.”
NOTES:
1. ACR.6–ACR.0 are valid only when ACR.7 = 1.
2. ACR.4–ACR.3 are valid only when ACR.7 = 1 and ACR.6 = 1.
3. ACR.2–ACR.0 are valid only when ACR.7 = 1 and ACR.6 = 0.
Figure 2.3. Synchronous Digital Audio Transfer Timing [DARTS#, DACLK and DAOUT waveforms: transmission starts when DARTS# is Low and stops when DARTS# goes High or when DACLK stops; bits are shifted out D0 first when TF = TCP = 0 and D7 first when TF = TCP = 1]
Load Exception Dictionary (L)
This command purges the RC8650’s exception dictionary and stores subsequent output from the host in the RC8650’s dictionary memory. The maximum dictionary size is 16 KB.
Exception dictionaries must be compiled into the format used by the RC8650 before they can be used. The RC8650 Studio software, available from RC Systems, includes a dictionary editor and compiler for performing this task. The creation of exception dictionaries is covered in detail in Section 4.
Enable Exceptions (U)
The exception dictionary is enabled with this command. If the RC8650 is in Phoneme mode, or if an exception dictionary has not been loaded, the command will have no effect. The exception dictionary can be disabled by issuing one of the mode commands D, T, or C.
Reinitialize (@)
This command reinitializes the RC8650 by clearing the input buffer and restoring the voice parameters to their factory default settings. Neither the exception dictionary, the prerecorded audio, nor the greeting message is affected.
Zap Commands (Z)
This command prevents the RC8650 from honoring subsequent commands, causing it to read commands aloud as they are encountered (useful in debugging). Any pending commands in the input buffer will still be honored. The only way to restore command recognition after the Zap command has been issued is to write Control-^ (ASCII 30) to the RC8650 or perform a hardware reset.
Clear (Control-X), Skip (Control-Y)
The Clear command stops the RC8650 and flushes its input buffer of all text and commands. The Skip command skips to the next sentence in the buffer. Neither command affects the RC8650’s settings.
Note that the format of these commands is unique in that the command character (Control-A) is not used with them. The Control-X (ASCII 24) and Control-Y (ASCII 25) characters are written directly to the RC8650, which enables it to react immediately, even if its input buffer is full. To be most effective, the states of the RC8650 handshaking signals should be ignored.
TONE GENERATION COMMANDS
Musical/Sinusoidal Tone Generators (J/nJ)
The musical and sinusoidal tone generators are activated with these commands. Refer to Section 3 for detailed information.
DTMF Generator (n*)
The DTMF (Touch-Tone) generator generates the 16 standard tone pairs commonly used in telephone systems. Each tone is 100 ms in duration, followed by a 100 ms inter-digit pause (both durations can be extended to 500 ms by setting the DDUR bit of the Protocol Options Register), which more than satisfies telephone signaling requirements. The mapping of the command parameter n to the buttons on a telephone is shown in Table 2.12.
The “pause” tone can be used to generate longer inter-digit delays in phone number strings, or to create silent periods in the
RC8650’s output. The generator’s output level can be adjusted
with the Volume command (nV). DTMF commands may be intermixed with text and other commands without restriction.
Table 2.12. DTMF Dialer Button Map
n       Button
0–9     0–9
10      *
11      #
12      A
13      B
14      C
15      D
16      pause
AUDIO PLAYBACK COMMANDS
Prerecorded Audio Playback Mode (n&)
A virtually unlimited number of sound files can be stored in the RC8650, limited only by the amount of available on-chip audio memory. RC8650 Studio, a Windows-based application available from RC Systems, makes it easy to arrange and manage standard Windows wave files that can be downloaded into the RC8650.
Each sound file (word, phrase, or sound effect) is automatically assigned a record number, beginning with zero. The first file is record 0, the second is record 1, and so on. The playback command plays records in any order, using n to specify the desired record.
The playback level can be adjusted with the Volume (nV) command. A volume setting of 5 will cause the files to be played back at their original volume level.
Text and/or commands may be freely intermixed with the playback command. For example,
^A “11*” “Hello” ^A “–3V” ^A “3&” ^A “+3V” ^A “9&”
plays the Touch-Tone “#” key and says “hello” at the current volume setting, followed by the fourth sound file at a reduced volume level, and finally the tenth sound file at the original volume level.
Real Time Audio Playback Mode (n#/n%)
This mode allows audio samples to be written directly to the RC8650’s digital-to-analog converter (DAC) via the RC8650’s serial and parallel ports. All data sent to the RC8650 is routed directly to the RC8650’s internal audio buffer; the RC8650 then outputs samples from the buffer to the DAC at the rate programmed by n. Because the audio data is buffered within the RC8650, the output sampling rate is independent of the data rate into the RC8650, as long as the input rate is equal to or greater than the programmed sampling rate.
The RC8650 supports PCM and ADPCM audio data formats. ADPCM data is audio data that has been compressed using utility software available from RC Systems (this software can also convert Wave files to PCM and ADPCM formats for the RC8650). ADPCM compression yields data files that are half the size of PCM files, thereby reducing the required data bandwidth and storage requirements.
The output sampling rate can be programmed to any rate between 4 and 11 kHz (32,000–88,000 bps) by choosing the appropriate parameter value. The relationship between the command parameter n and the sampling rate fs is
n = 155 – 617/fs
fs = 617/(155 – n)
where fs is measured in kHz. For example, to program an 8 kHz sampling rate, choose n = 78 (155 – 617/8 = 77.9, rounded to 78). The range of n is 0–99, hence fs can range from 4 to 11 kHz.
The following procedure should be used for sending PCM or ADPCM audio data to the RC8650 in real time:
1) Program the desired volume level with the Volume (nV) command. A volume setting of 5 will cause the data to be played back at its original volume level. This step is optional.
2) Issue the Real Time Audio Playback Mode command n# if PCM data is being sent, or n% for ADPCM data. The TS pin and TS flag will be asserted at this time.
3) If the RC8650’s serial port is being used for transferring the audio data, change the host system’s baud rate to 115,200 baud at this time.
4) Begin transferring the audio data to the RC8650. The same methods employed for sending ASCII data to the text-to-speech synthesizer should be used. PCM data must be sent to the RC8650 as linear, eight-bit signed data (–127 to +127, 0 = midscale).
5) After the last byte of audio data has been sent to the RC8650, send the value 80h (–128). This signals the RC8650 to terminate Real Time Audio Playback mode and return to the text-to-speech mode of operation. Note that up to 1024 bytes of data may still be in the audio buffer, so the RC8650 may continue producing sound for as long as 0.25 second (at a 4 kHz sampling rate) after the last byte of data has been sent. The TS pin/flag will not be cleared until all of the audio data has been output to the DAC, at which time the RC8650 will again be able to accept data from the host.
If the host’s serial port baud rate was changed in step 3, it should now be changed back to its original rate.
MISCELLANEOUS COMMANDS
Write Greeting Message (255W)
Anytime the RC8650 is reset, an optional user-defined greeting message is automatically played. The message may consist of any text/command sequence up to 234 characters in length. Modal commands can be included, such as tone generator and audio playback commands.
Note The exception dictionary is erased whenever a new greeting message is written to the RC8650.
To create a new greeting message, perform the following steps:
1) Write the command Control-A “255W”.
2) Write the exact text/command sequence you want to store, up to 234 characters. For example, the string
Control-A “3S” Control-A “2O” “ready”
will program the RC8650 to use voice speed 3, Big Bob’s voice, and say “ready” whenever it is reset.
3) Write a Null (ASCII 00) to terminate the command and store the greeting in the RC8650’s nonvolatile memory.
Chipset Identification (6?)
This command returns RC8650 system information that is used during factory testing. Eight bytes are transmitted via the TXD pin. The only information that may be of relevance to an application is the internal microcode revision number, which is conveyed in the last two bytes in packed-BCD format. For example, 13h 01h would be returned if the version number were 1.13.
Interrogate (12?)
This command retrieves the current operating settings of the RC8650. Table 2.13 lists the parameters in the order they are transmitted from the TXD pin, the command(s) that control each parameter, and each parameter’s range. The parameters are organized as a byte array of one byte per parameter.
Table 2.13. Parameters Returned by Interrogate Command
Parameter             Cmd      Range
Mode                  C/D/T    0=Char; 1=Phon; 2=Text
Punc filter           nB       0-15
Formant freq          nF       0-9
Pitch                 nP       0-99
Speed                 nS       0-9
Volume                nV       0-9
Tone                  nX       0-2
Expression            nE       0-9
Dict loaded           L        1=loaded; 0=not loaded
Dict status           U        1=enabled; 0=disabled
Input buffer size     –        x100 bytes
Articulation          nA       0-9
Reverb                nR       0-9
TS pin control        nK       0-3
POR register          nG       0-255
ACR register          nN       0-255
Rec audio capacity    –        x16K bytes
Sleep delay           nQ       0-31
Timeout delay         nY       0-15
Char mode delay       nC       0-31
Text mode delay       nT       0-15
Voice                 nO       0-7
ADR register          n$       0-255
COMMAND SUMMARY
Table 2.14. RC8650 Command Summary
Command   Function                              n Range   Default
nA        Articulation                          0-9       5
nB        Punctuation filter                    0-15      6
C/nC      Character mode/delay                  0-31      0
D         Phoneme mode                          –         –
E/nE      Expression                            0-9       5
nF        Formant frequency                     0-9       5
nG        Protocol Options Register             0-255     128
nH        Baud rate                             0-10      –
nI        Index marker                          0-99      –
J/nJ      Musical/sinusoidal tone generators    0-99      –
nK        TS pin control                        0-3       1
L         Load exception dictionary             –         –
M         Monotone                              –         –
nN        Audio Control Register                0-255     0
nO        Voice                                 0-7       0
nP        Pitch                                 0-99      50
nQ        Sleep timer                           0-31      0
nR        Reverb                                0-9       0
nS        Speed                                 0-9       2
T/nT      Text mode/delay                       0-15      0
U         Enable exception dictionary           –         –
nV        Volume                                0-9       5
W         Write greeting message                255       –
nX        Tone                                  0-2       1
nY        Timeout delay                         0-15      0
Z         Zap commands                          –         –
@         Reinitialize                          –         –
n*        DTMF generator                        0-16      –
n#/n%     Real time audio playback              0-99      –
n&        Prerecorded audio playback            0-9999    –
n$        ADC Control Register                  0-255     –
n?        Chipset ID/Interrogate                6/12      –
SECTION 3: MUSICAL & SINUSOIDAL TONE GENERATORS
MUSICAL TONE GENERATOR
The RC8650 contains a three-voice tone generator that can be used for creating music and sound effects. This section explains how to program the generator.
The musical tone generator is activated with the J command (no parameter). Once activated, all data output to the RC8650 is directed to the tone generator.
The tone generator is controlled with four 4-byte data and command frames, called Initialize, Voice, Play, and Quit. With these, the programmer can control the volume, duration, and frequencies of the three voices.
Note The RC8650 assumes that tone generator data will immediately follow the J command; therefore, be sure not to terminate the command with a CR or Null.
Note The musical tone generator output is available only from the AO pins. Digital audio output is not possible.
Byte                  0     1     2     3
Initialize command    0     KA    KTL   KTH
Voice frame           KD    K1    K2    K3
Play command          0     0     1     1
Quit command          0     0     0     0
Figure 3.1. Musical Tone Generator Command Formats
Initialize Command
The Initialize command sets up the tone generator’s relative amplitude and tempo (speed). The host must issue this command to initialize the tone generator before sending any Voice frames. The Initialize command may, however, be issued anytime afterward to change the volume or tempo on the fly.
Initialize command format
The Initialize command consists of a byte of zero and three parameters. The parameters are defined as follows:
KA     Voice amplitude (1-255)
KTL    Tempo, low byte (0-255)
KTH    Tempo, high byte (0-255)
The range of the tempo KT (KTL and KTH) is 1-65,535 (1–FFFFh); the larger the value, the slower the overall speed of play. The amplitude and tempo affect all three voices, and stay in effect until another Initialize command is issued. If the command is issued between Voice frames to change the volume or tempo on the fly, only the Voice frames following the command will be affected.
Voice Frame
Voice frames contain the duration and frequency (pitch) information for each voice. All Voice frames are stored in a 2K buffer within the RC8650, but are not played until the Play command is issued. If the Voice frames exceed 2K bytes in total, the RC8650 will automatically begin playing the data.
Voice frame format
Voice frames are composed of three frequency time constants (K1-K3) and a duration byte (KD), which specifies how long the three voices are to be played. The relationship between the time constant Ki and the output frequency fi is:
fi = 16,768/Ki
where fi is in Hertz and Ki = 4-255. Setting Ki to zero will silence voice i during the frame.
KD may be programmed to any value between 1 and 255; the larger it is made, the longer the voices will play during the frame.
The task of finding Ki for a particular musical note is greatly simplified by using Table 3.1. The tone generator can cover a four-octave range, from C two octaves below Middle C (Ki = 255), to D two octaves above Middle C (Ki = 14). Ki values less than 14 are not recommended.
Table 3.1. Musical Note Pitch/Ki Values
Note    Ki           Note    Ki
C       255 (FFh)    D       57 (39h)
C#      241 (F1h)    D#      54 (36h)
D       228 (E4h)    E       51 (33h)
D#      215 (D7h)    F       48 (30h)
E       203 (CBh)    F#      45 (2Dh)
F       192 (C0h)    G       43 (2Bh)
F#      181 (B5h)    G#      40 (28h)
G       171 (ABh)    A       38 (26h)
G#      161 (A1h)    A#      36 (24h)
A       152 (98h)    B       34 (22h)
A#      144 (90h)    C       32 (20h)
B       136 (88h)    C#      30 (1Eh)
C       128 (80h)    D       28 (1Ch)
C#      121 (79h)    D#      27 (1Bh)
D       114 (72h)    E       25 (19h)
D#      107 (6Bh)    F       24 (18h)
E       101 (65h)    F#      23 (17h)
F       96 (60h)     G       21 (15h)
F#      90 (5Ah)     G#      20 (14h)
G       85 (55h)     A       19 (13h)
G#      81 (51h)     A#      18 (12h)
A       76 (4Ch)     B       17 (11h)
A#      72 (48h)     C       16 (10h)
B       68 (44h)     C#      15 (0Fh)
C-Mid   64 (40h)     D       14 (0Eh)
C#      60 (3Ch)
For example, the Voice frame
DATA 24,64,0,0
will play Middle C using voice 1 (K1 = 64). Since K2 and K3 are zero, voices 2 and 3 will be silent during the frame. The duration of the note is a function of both the tempo KT and duration KD, which in this case is 24.
As another example,
DATA 48,64,51,43
plays a C-E-G chord, for a duration twice as long as the previous example.
Choosing note durations and tempo
Table 3.2 lists suggested KD values for each of the standard musical note durations. This convention permits shorter (1/64th note) and intermediate note values to be played, while maintaining the same degree of accuracy. This is important when, for example, a thirty-second note is to be played staccato, or a note is dotted (multiplying its length by 1.5).
Table 3.2. Musical Note Duration/KD Values
Note Duration     KD
Whole             192 (C0h)
Half              96 (60h)
Quarter           48 (30h)
Eighth            24 (18h)
Sixteenth         12 (0Ch)
Thirty-second     6 (06h)
Using the suggested values, it turns out that most musical scores sound best when played at a tempo of 255 or faster (i.e., KTH = 0). Of course, the “right” tempo is the one that sounds the best.
Play Command
The Play command causes the voice data in the input buffer to begin playing. Additional Initialize commands and Voice frames may be sent to the RC8650 while the tone generator is operating. The TS pin and TS flag are asserted at this time, enabling the host to synchronize to the playing of the tone data. TS becomes inactive after all of the data has been played.
Quit Command
The Quit command marks the end of the tone data in the input buffer. The RC8650 will play the contents of the buffer up to the Quit command, then return to the text-to-speech mode that was in effect when the tone generator was activated. Once the Quit command has been issued, the RC8650 will not accept any more data until the entire buffer has been played.
Example Tune
The Basic program shown in Figure 3.2 reads tone generator data from a list of DATA statements and LPRINTs each value to the RC8650. The program assumes that the RC8650 is connected to a PC’s printer port, although output could be redirected to a COM port with the DOS MODE command.
The astute reader may have noticed some “non-standard” note durations in the DATA statements, such as the first two Voice frames in line 240. According to the original music, some voices were not to be played as long as the others during the beat. The F-C-F notes in the first frame are held for 46 counts, while the low F and C in the second frame are held for two additional counts. Adding the duration (first and fifth) bytes together, the low F and C do indeed add up to 48 counts (46 + 2), which is the standard duration of a quarter note.
100 LPRINT                                   ' ensure serial port baud rate is locked
110 LPRINT CHR$(1);"J";                      ' activate tone generator
120 READ B0,B1,B2,B3                         ' read a frame (4 bytes)
130 LPRINT CHR$(B0); CHR$(B1); CHR$(B2); CHR$(B3);
140 IF B0 + B1 + B2 + B3 > 0 THEN 120        ' loop until Quit
150 END
160 '
170 '
180 ' Data Tables:
190 '
200 ' Init (volume = 255, tempo = 86)
210 DATA 0,255,86,0
220 '
230 ' Voice data
240 DATA 46,48,64,192, 2,0,64,192, 48,48,0,0, 48,40,0,0, 48,36,0,0
250 DATA 94,24,34,0, 2,24,0,0, 24,0,36,0, 24,0,40,0, 48,0,48,0
260 DATA 48,40,0,192, 46,36,0,0, 2,0,0,0, 48,36,0,0, 48,24,34,0
270 DATA 46,24,34,0, 2,0,34,0, 46,24,34,0, 2,24,0,0, 24,0,36,0
280 DATA 24,0,40,0, 48,0,48,0
290 '
300 ' Play, Quit
310 DATA 0,0,1,1, 0,0,0,0
Figure 3.2. Example Musical Tone Generator Program
SINUSOIDAL TONE GENERATOR
The musical tone generator is capable of producing three tones simultaneously, and works well in applications which require neither precise frequencies nor a “pure” (clean) output. The output is a pulse train rich in harmonic energy, which tends to sound more interesting than pure sinusoids in music applications.
The sinusoidal tone generator enables the simultaneous generation of two sinusoidal waveforms. Applications for this generator range from generating simple tones to telephone call-progress tones (such as a dial tone or busy signal). The frequency range is 0 to 2746 Hz, with a resolution of 4 to 11 Hz.
The sinusoidal tone generator is activated with the command nJ, where n is an ASCII number between 0 and 99. Note the similarity to the musical tone generator command, J, which uses no parameter. The parameter n programs the internal sampling rate, much like the Real Time Audio Playback command does; in fact, the sampling rate fs has the same relationship to n as the Real Time Audio Playback command:
fs = 617 / (155 – n)   (kHz)
Immediately following the nJ command are three binary parameter bytes:
nJ Kd K1 K2
where Kd determines the tone duration, and K1 and K2 set the output frequencies of generators 1 and 2, respectively.
The tone duration and frequencies are not only functions of these parameters, but of n as well. The output amplitude is a function of the Volume command (nV). The command and parameter values are buffered within the RC8650, and can be intermixed with text and other commands without restriction.
The tone duration Td is calculated as follows:
Td = Kd x 256 / fs (sec)
where fs is expressed in Hz and 0 ≤ Kd ≤ 255. Substituting fs = 617,000 / (155 – n) Hz (that is, 617 / (155 – n) kHz) into the above equation,
Td = Kd x (155 – n) / 2410 (sec)
Setting Kd = 1 yields the shortest duration; Kd = 0 (treated as 256) the longest. Depending on the value of n, Td can range from 23 ms to 16.5 sec.
The tone frequencies F1 and F2 are computed as follows:
Fi = Ki x fs / 1024 (Hz)
where fs is expressed in Hz and 0 ≤ Ki ≤ 255. Substituting fs = 617,000 / (155 – n) Hz into this equation,
Fi = Ki x 603 / (155 – n) (Hz)
Depending on the value of n, Fi can range from 0 Hz to 2746 Hz.
If only one tone is to be generated, the other tone frequency may be set to 0 (Ki = 0), or equal in frequency. Note, however, that due to the additive nature of the tone generators, the output amplitude from both generators running at the same frequency will be twice that of just one generator running. Both K1 and K2 may be set to 0 to generate silence.
Note that the frequency step size and frequency range are strictly functions of n. In general, the larger n is, the larger the step size and range will be. The parameter Ki can be thought of as a multiplier, which when multiplied by the step size, yields the output frequency. For example, setting n = 95 (corresponding to an internal sampling rate of 10.28 kHz) results in a frequency step size of 603 / (155 – 95) Hz, or 10 Hz. Thus, the output frequency range spans 0 Hz to 255 x 10 Hz, or 2550 Hz, in 10 Hz steps.
As an example, suppose your application needed to generate the tone pair 440/350 Hz (a dial tone) for, say, 2.5 seconds. We will choose n = 95, because it yields a convenient step size of 10 Hz.
The tone duration parameter Kd is calculated as follows:
Kd = 2410 x Td / (155 – n)
Substituting Td = 2.5 (sec) and n = 95,
Kd = 2410 x 2.5 / (155 – 95) = 100.4 ≈ 100
K1 (440 Hz) is computed as follows:
K1 = F1 x (155 – n) / 603
   = 440 x (155 – 95) / 603 = 43.8 ≈ 44
In like manner, K2 (350 Hz) is computed to be 35.
In order to embed the command in a text file, the computed values must be converted into their ASCII equivalents: 100 = “d”, 44 = “,” and 35 = “#”. The complete command becomes
^A95Jd,#
which can be embedded within normal text for the synthesizer.
SECTION 4: EXCEPTION DICTIONARIES
Exception dictionaries make it possible to alter the way the RC8650 interprets character strings it receives. This is useful for correcting mispronounced words, triggering the generation of tones and/or the playback of prerecorded sounds, or even speaking in a foreign language. In some cases, an exception dictionary may even negate the need for a text pre-processor in applications that cannot provide standard text strings. This section describes how to create exception dictionaries for the RC8650.
Exception dictionaries can be created and edited with a word processor or text editor that stores documents as standard text (ASCII) files. However, the dictionary must be compiled into the internal format used by the RC8650 before it can be used. The RC8650 Studio software, available from RC Systems, includes a dictionary editor and compiler.
EXCEPTION SYNTAX
The text-to-speech modes of the RC8650 utilize an English lexicon and letter-to-sound rules to convert text the RC8650 receives into speech. The pronunciation rules determine which sounds, or phonemes, each character will receive based on its relative position within each word. The integrated DoubleTalk text-to-speech engine analyzes text by applying these rules to each word or character, depending on the operating mode in use. Exception dictionaries augment this process by defining exceptions for (or even replacing) these built-in rules.
Exceptions have the general form
L(F)R=P
which means “the text fragment F, occurring with left context L and right context R, gets the pronunciation P.” All three parts of the exception to the left of the equality sign must be satisfied before the text fragment will receive the pronunciation given by the right side of the exception.
The text fragment defines the input characters that are to be translated by the exception, and may consist of any combination of letters, numbers, and symbols. Empty (null) text fragments may be used to generate sound based on a particular input pattern, without actually translating any of the input text. The text fragment (if any) must always be contained within parentheses.
Characters to the left of the text fragment specify the left context (what must come before the text fragment in the input string), and characters to the right define the right context. Both contexts are optional, so an exception may contain neither, either, or both contexts. There are also 15 special symbols, or context tokens, that can be used in an exception’s context definitions (Table 4.1).
Table 4.1. Context Tokens
Symbol   Definition
#        A vowel: a, e, i, o, u, y
+        A front vowel: e, i, y
^        A consonant: b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, z
*        One or more consonants
:        Zero or more consonants
?        A voiced consonant: b, d, g, j, l, m, n, r, v, w, z
@        One of: d, j, l, n, r, s, t, z, ch, sh, th
!        One of: b, c, d, f, g, p, t
%        A suffix: able(s), ably, e(s), ed(ly), er(s), ely, eless, ement(s), eness, ing(s), ingly (must also be followed by a nonalphabetic character)
&        A sibilant: c, g, j, s, x, z, ch, sh
$        A nonalphabetic character (number, space, etc.)
\        One or more non-printing characters (spaces, controls, line breaks, etc.)
|        A digit (0-9)
~        One or more digits
`        Wildcard (matches any character)
Note that although context tokens are, by definition, valid only within the left and right context definitions, the wildcard token may also be used within text fragments. Any other context token appearing within a text fragment will be treated as a literal character.
The right side of an exception (P) specifies the pronunciation that the text fragment is to receive, which may consist of any combination of phonemes (Table 2.1), phoneme attribute tokens (Table 2.2), and commands (Table 2.14). Using the tone generator and prerecorded audio playback commands, virtually limitless combinations of speech, tones, and sound effects can be triggered from any input text pattern. If no pronunciation is given, the text fragment will be silent.
A dictionary file may also contain comments, but they must be on lines by themselves (i.e., they cannot be on the same line as an exception). Comment lines begin with a semicolon character (;), so that the compiler will know to skip over them.
An example of an exception is
C(O)N=AA
which states that o after c and before n gets the pronunciation AA, the o-sound in cot. For example, the o in conference, economy, and icon would be pronounced according to this exception.
Another example is
(O)+=OW
(O)=UW
$R(H)=
The first exception states that o followed by e, i, or y is to be pronounced OW, the o-sound in boat. The second exception does not
place any restriction on what must come before or after o, so o in
any context will receive the UW pronunciation. If the exceptions
were reversed, the (O)+ exception would never be reached because the (O) exception will always match o in any context. In
general, tightly-defined exceptions (those containing many context restrictions) should precede loosely-defined exceptions
(those with little or no context definitions).
which states that h after initial r is silent, as in the word rhyme (the $ context token represents any non-alphabetic character, such as a space between words; see Table 4.1).
Punctuation, numbers, and most other characters can be redefined with exceptions as well:
(5)=S I NG K O           (Spanish five)
(CHR$)=K EH R IX K T ER  (Basic function)
THE TRANSLATION ALGORITHM
In order to better understand how an exception dictionary works, it is helpful to understand how the DoubleTalk text-to-speech engine processes text.
Algorithms within the DoubleTalk engine analyze input text one character at a time, from left to right. For each character, a list of pronunciation rules is searched sequentially until a rule is found that matches the character in the correct position and context. The algorithm then assigns the pronunciation given by the right side of the rule to the input character(s) bracketed in the rule (the text fragment), and moves past them. This process continues until all of the input text has been converted to phonetic sounds.
The following example illustrates how the algorithm works by translating the word receive.
The algorithm begins with the letter r and searches the R pronunciation rules for a match. The first rule that matches is $(RE)^#=R IX, because the r in receive is an initial r and is followed by an e, a consonant (c), and a vowel (e). Consequently, the text fragment re receives the pronunciation R IX, and the scan moves past re to the next character, the c. (The e is not the next scan character because it was inside the parentheses with the r; the text fragment re as a whole receives the pronunciation R IX.)
The first match among the C rules is (C)+=S, because c is followed by an e, i, or y. C thus receives the pronunciation S, and processing continues with the second e.
(EI)=IY is the first rule to match the second e, so ei receives the sound IY. Processing resumes at the v, which matches the default V rule, (V)=V.
The final e matches the rule #:(E)$=, which applies when e is final and follows zero or more consonants and a vowel. Consequently, e receives no sound and processing continues with the following word or punctuation, if any. Thus, the entire phoneme string for the word receive is R IX S IY V.
RULE PRECEDENCE
Since DoubleTalk uses its translation rules in a sequential manner, the position of each exception relative to the others must be carefully considered. For example, consider the following pair of exceptions:
(RAT)=R AE T
(RATING)=R EY T IH NG
(R)=R
This is an example of how not to organize exceptions. The exception (RATING) will never be used because (RAT) will always match first. According to these exceptions, the word rating would be pronounced “rat-ing.”
It can be beneficial to group exceptions by the first character of the text fragments, that is, all of the A exceptions in one group, all the B exceptions in a second group, and so on. This gives the dictionary a cleaner overall appearance, and can be helpful if you need to troubleshoot problems in your dictionary.
TEXT NOT MATCHED BY THE DICTIONARY
It is possible that some input text may not match anything in a dictionary, depending on the nature of the dictionary. For example, if a dictionary was written to handle unusual words, only those words would be included in the dictionary. On the other hand, if a dictionary defined the pronunciation for another language, it would be comprehensive enough to handle all types of input. In any case, if an exception is not found for a particular character, the English pronunciation will be given to that character according to the built-in pronunciation rules.
Generally, the automatic switchover to the built-in rules is desirable if the dictionary is used to correct mispronounced words, since by definition the dictionary is defining exceptions to the built-in rules. If the automatic switchover is not desired, however, there are two ways to prevent it from occurring. One way is to end each group of exceptions with an unconditional exception that matches any context. For example, to ensure that the letter “a” will always be matched, end the A exception group with the exception (A)=pronunciation. This technique works well to ensure matches for specific characters, such as certain letters or numbers.
The other way applies if the exception dictionary is to replace the built-in rules entirely: end the dictionary with the following exception:
()=
This special exception causes unmatched characters to be ignored (receive no sound), rather than receive the pronunciation defined by the built-in rules.
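The scan described under THE TRANSLATION ALGORITHM, together with the two no-match behaviors, can be modelled in a few lines. The Python sketch below is a simplified stand-in, not the DoubleTalk engine. It supports only the context tokens used in this section ($ any non-alphabetic character, ^ one consonant, # one or more vowels, : zero or more consonants, + e, i, or y), assumes A, E, I, O, and U are the vowels, and uses a hypothetical `stand_in_builtin` function in place of the RC8650's built-in English rules. A bare ()= is treated as a catch-all that consumes one character silently.

```python
from typing import Callable, List, Tuple

VOWELS = "AEIOU"  # assumption: Y is treated as a consonant in this sketch

Rule = Tuple[str, str, str, str]  # (left context, text fragment, right context, phonemes)


def _char(text: str, pos: int) -> str:
    """Character at pos; positions outside the text behave like a space."""
    return text[pos] if 0 <= pos < len(text) else " "


def _is_consonant(ch: str) -> bool:
    return ch.isalpha() and ch not in VOWELS


def _match(pat: str, text: str, pos: int, step: int) -> bool:
    """Match context tokens in pat from pos, walking right (step=1) or left (step=-1)."""
    if not pat:
        return True
    tok, rest = pat[0], pat[1:]
    if tok == "#":  # one or more vowels; every run length is tried
        p = pos
        while _char(text, p) in VOWELS:
            p += step
            if _match(rest, text, p, step):
                return True
        return False
    if tok == ":":  # zero or more consonants
        p = pos
        while True:
            if _match(rest, text, p, step):
                return True
            if not _is_consonant(_char(text, p)):
                return False
            p += step
    ch = _char(text, pos)
    if tok == "$":
        ok = not ch.isalpha()  # any non-alphabetic character
    elif tok == "^":
        ok = _is_consonant(ch)  # exactly one consonant
    elif tok == "+":
        ok = ch in "EIY"  # e, i, or y
    else:
        ok = ch == tok  # any other character must match literally
    return ok and _match(rest, text, pos + step, step)


def parse(rule: str) -> Rule:
    """Split 'LEFT(FRAGMENT)RIGHT=PHONEMES' into its four parts."""
    lp = rule.index("(")
    rp = rule.index(")", lp)
    eq = rule.index("=", rp)
    return rule[:lp], rule[lp + 1:rp], rule[rp + 1:eq], rule[eq + 1:].strip()


def stand_in_builtin(ch: str) -> str:
    """Hypothetical placeholder for the built-in English rules: tag letters, drop the rest."""
    return f"<{ch}>" if ch.isalpha() else ""


def translate(text: str, rules: List[str],
              builtin: Callable[[str], str] = stand_in_builtin) -> str:
    """Scan left to right; the first rule that matches wins and consumes its fragment."""
    parsed = [parse(r) for r in rules]
    text = text.upper()
    out: List[str] = []
    i = 0
    while i < len(text):
        for left, frag, right, phonemes in parsed:
            if not frag:
                if left or right:
                    continue  # context-only insertions are not modelled in this sketch
                i += 1  # bare ()= catch-all: the character receives no sound
                break
            if (text.startswith(frag, i)
                    and _match(left[::-1], text, i - 1, -1)
                    and _match(right, text, i + len(frag), 1)):
                if phonemes:
                    out.append(phonemes)
                i += len(frag)
                break
        else:  # no exception matched: hand this character to the built-in rules
            sound = builtin(text[i])
            if sound:
                out.append(sound)
            i += 1
    return " ".join(out)


RECEIVE_RULES = ["$(RE)^#=R IX", "(C)+=S", "(EI)=IY", "(V)=V", "#:(E)$="]
print(translate("receive", RECEIVE_RULES))  # R IX S IY V
print(translate("rating", ["(RAT)=R AE T", "(RATING)=R EY T IH NG", "(R)=R"]))  # R AE T <I> <N> <G>
print(translate("rating", ["(RATING)=R EY T IH NG", "(RAT)=R AE T", "(R)=R", "()="]))  # R EY T IH NG
```

The left context is reversed and walked right to left from the character before the fragment, so a pattern such as #: reads naturally in the dictionary yet is checked outward from the fragment.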
EFFECT ON PUNCTUATION
Punctuation defined in the exception dictionary has priority over the Punctuation Filter command. Any punctuation defined in the dictionary will be used, regardless of the Punctuation Filter setting. However, if the dollar sign character ($) is defined within the text fragment of any exception, currency strings will not be read as dollars and cents.
CHARACTER MODE EXCEPTIONS
Exceptions are defined independently for the Character and Text modes of operation. The beginning of the Character mode exceptions is marked by inserting the letter C just before the first Character mode exception. No exceptions before this marker will be used when the RC8650 is in Character mode, nor will any exceptions after the marker be used in Text mode. For example:
.      (Text mode exceptions)
.
.
()=    (optional; used if built-in rules are not to be used in no-match situations)
C      (Character mode exceptions marker)
.      (Character mode exceptions)
.
.
()=    (optional; used if built-in rules are not to be used in no-match situations)
APPLICATIONS
The following examples illustrate some ways in which the exception dictionary can be used.
Correcting Mispronounced Words
The most obvious of all applications is correcting mispronounced words.
S(EAR)CH=ER
$(OK)$=OW K EY
The first exception corrects the pronunciation of all words containing search (search, searched, research, etc.). As this exception illustrates, it is only necessary to define the problem word in its root form, and only the part of the word that is mispronounced (ear, in this case). The second exception corrects the word ok, but because of its left and right contexts, it will not cause other words (joke, look, etc.) to be incorrectly translated.
No Cussing, Please
The reading of specific characters or words can be suppressed by writing exceptions in which no pronunciation is given.
(????)=    (YOU fill in the blanks!)
When Zero Isn’t Really Zero
When reading addresses or lists of numbers, we humans often substitute the word “oh” for the digit 0. For example, we might say 1020 North Eastlake as “one oh two oh North Eastlake.” The digit 0 can be redefined in this manner with the following exception:
(0)=OW
Arithmetic Operators
Some characters may have more than one name; for example, the character “/” may be read as “slash” or “divided by,” depending on the context. Such characters can be redefined if their default names don’t fit the application. For example, the arithmetic operators (/, *, ^, etc.) can be defined for mathematical applications with the following exceptions:
\(/)\=D IX V AY D IX D B AY
\(*)\=M AH L T AX P L AY D B AY
\(^)\=R EY Z D T UW
.
.
etc.
Acronyms and Abbreviations
Acronyms and abbreviations can be defined so the words they represent will be spoken.
$(KW)$=K IH L AH W AA T
$(DR)$=D AA K T ER
$(TV)$=T EH L AX V IH ZH IX N
String Parsing & Decryption
Sometimes the data that we would like to have read is not available in a “ready-to-read” format. For example, the output of a GPS receiver may look something like this:
$GPGGA,123456,2015.2607,N,...
The first 14 characters of the string contain a fixed header and variable time data, which we don’t care about. The following exception ensures that the first 14 characters are not read:
($GPGGA,``````,)=
In addition, the following exceptions handle the “degrees” and “minutes” components of the latitudinal coordinate:
,\\()\\.=D IX G R IY Z , ,
(.)=M IH N IH T S , ,
(,N,)=N OW R TH L AE T IH T UW D
The four exceptions taken together will translate the example string as “20 degrees, 15 minutes, north latitude.” (Additional exceptions for handling the seconds component, and the digits themselves, are not shown for clarity.)
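Since these exceptions only reshape how characters are spoken, it can be handy to compute the intended phrase on a host first and compare it with what the RC8650 says. The Python sketch below is a hypothetical test helper, not something the RC8650 runs. It assumes the standard NMEA ddmm.mmmm layout for the latitude field and, like the exceptions shown, ignores the digits after the decimal point. It also confirms that the header and time fields account for the 14 characters skipped by the first exception.

```python
HEADER = "$GPGGA,123456,"  # header and time fields skipped by ($GPGGA,``````,)=


def spoken_latitude(sentence: str) -> str:
    """Phrase the four exceptions above are meant to produce for a GGA latitude.

    Assumes the NMEA layout ddmm.mmmm for the latitude field; the fractional
    minutes are ignored, as in the example.
    """
    fields = sentence.split(",")
    if fields[0] != "$GPGGA" or len(fields) < 4:
        raise ValueError("expected a $GPGGA sentence with a latitude field")
    latitude, hemisphere = fields[2], fields[3]
    degrees, minutes = int(latitude[:2]), int(latitude[2:4])
    direction = {"N": "north", "S": "south"}[hemisphere]
    return f"{degrees} degrees, {minutes} minutes, {direction} latitude"


print(len(HEADER))  # 14
print(spoken_latitude("$GPGGA,123456,2015.2607,N,..."))
# 20 degrees, 15 minutes, north latitude
```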
Heteronyms
Heteronyms are words that are spelled the same but are pronounced differently depending on the context, such as read (“reed” and “red”) and wind (“the wind blew” and “wind the clock”). Exceptions can be used to resolve these ambiguities by including non-printing (Control) characters in the text fragment of the exception.
Suppose a line of text required the word “close” to be pronounced as it is in “a close call,” instead of as in “close the window.” The following exception changes the way the s will sound:
(^DCLOSE)=K L OW S
Note the Control-D character (^D) in the text fragment. Although it is a non-printing character, the translation algorithms treat it as they would any printing character. Thus, the string “^Dclose” will be pronounced with the s receiving the “s” sound, wherever it appears in the text stream. Plain “close” (without the Control-D) will be unaffected; the s will still receive the “z” sound. It does not matter where you place the Control character in the word, as long as you use it the same way in your application’s text. You may use any non-printing character (except LF and CR) in this manner.
Foreign Languages
Dictionaries can be created that enable the RC8650 to speak in foreign languages. It’s not as difficult as it may seem; all that is required in most cases is a pronunciation guide and a bit of patience. If you don’t have a pronunciation guide for the language you’re interested in, check your local library. Most libraries have foreign language dictionaries that include pronunciation guides, which make it easy to transcribe the pronunciation rules into exception form.
Language Translation
Exception dictionaries even allow the RC8650 to read foreign language text in English! The following exceptions demonstrate how this can be done with three example Spanish/English words:
(GRANDE)=L AA R J
(BIEN)=F AY N
(USTED)=YY UW
The sense of translation can also be reversed:
(LARGE)=G RR A N D EI
(FINE)=B I EI N
(YOU)=U S T EI DH
Message Macros
Certain applications may not be able to send text strings to the RC8650. An example of such an application is one that is only able to output a four-bit control word and strobe. Sixteen unique output combinations are possible, but this is scarcely enough to represent the entire ASCII character set.
You can, however, assign an entire spoken phrase to a single ASCII character with the exception dictionary. By driving four of the data bus lines of the bus interface (see Figure 1.7) and hardwiring the remaining four to the appropriate logic levels, virtually any set of 16 ASCII characters can be generated, which in turn can be interpreted by the exception dictionary.
For example, by connecting the four control bits to DB0 through DB3, DB4 and DB5 to VCC, DB6 and DB7 to ground, and the strobe to PWR#, ASCII codes 30h through 3Fh (corresponding to the digits “0” through “9” and the six ASCII characters following them) can be generated by the four control bits. Message strings would then be assigned to each of these ASCII characters. For example, you could make the character “0” (corresponding to all four control bits = 0) say “please insert quarter” with the following dictionary entry:
(0)=P L IY Z IH N S ER T K W OW R T ER
The Timeout timer should also be activated (1Y, for example) in order for the “message” to be executed. Otherwise, the RC8650 will wait indefinitely for a CR/Null character that will never come. The timer command could be included in the greeting message.
TIPS
Make sure that your exceptions aren’t so broad in nature that they do more harm than good. Exceptions intended to fix broad classes of words, such as word endings, are particularly notorious for ruining otherwise correctly pronounced words.
Take care in how your exceptions are organized. Remember, an exception’s position relative to others is just as important as the content of the exception itself.
Exception Anomalies
On rare occasions, an exception may not work as expected. This occurs when the built-in pronunciation rules get control before the exception does. The following example illustrates how this can happen.
Suppose an exception redefined the o in the word “process” to have the long “oh” sound, the way it is pronounced in many parts of Canada. Since the word is otherwise pronounced correctly, the exception redefines only the o:
PR(O)CESS=OW
But much to our horror, the RC8650 simply refuses to take on the new Canadian accent. It so happens the RC8650 has a built-in rule which looks something like this:
$(PRO)=P R AA
This rule translates a group of three characters, instead of only one as most of the built-in rules do. No exception matches the initial p, so the built-in rules take over at that point. Because the text fragment PRO is translated as a group, the o is processed along with the initial “pr,” and consequently the exception never gets a shot at the o.
If you suspect this may be happening with one of your exceptions, include more of the left-hand side of the word in the text fragment (in the example above, (PRO)=P R OW would work).
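The Python sketch below models why the exception loses. It is a toy, not DoubleTalk's engine, and rests on several assumptions: at each scan position the exceptions are tried first and the built-in rules only when no exception matches (as described under TEXT NOT MATCHED BY THE DICTIONARY); the built-in list holds nothing but the $(PRO) rule quoted above; contexts are limited to $ and literal characters; and letters covered by neither list are simply tagged. Because nothing in the dictionary matches the p, the built-in rule consumes pro in one step and the o is never offered to PR(O)CESS=OW. Widening the exception's text fragment to (PRO) lets it claim the p first.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str, str, str]  # (left context, text fragment, right context, phonemes)


def parse(rule: str) -> Rule:
    """Split 'LEFT(FRAGMENT)RIGHT=PHONEMES' into its four parts."""
    lp = rule.index("(")
    rp = rule.index(")", lp)
    eq = rule.index("=", rp)
    return rule[:lp], rule[lp + 1:rp], rule[rp + 1:eq], rule[eq + 1:].strip()


def _context_ok(ctx: str, text: str, pos: int, step: int) -> bool:
    """Literal context match, except that $ accepts any non-alphabetic character."""
    for tok in ctx:
        ch = text[pos] if 0 <= pos < len(text) else " "  # text edges act like spaces
        if (ch.isalpha() if tok == "$" else ch != tok):
            return False
        pos += step
    return True


def _first_match(rules: List[Rule], text: str, i: int) -> Optional[Tuple[str, str]]:
    for left, frag, right, phonemes in rules:
        if (text.startswith(frag, i)
                and _context_ok(left[::-1], text, i - 1, -1)
                and _context_ok(right, text, i + len(frag), 1)):
            return frag, phonemes
    return None


def translate(text: str, exceptions: List[str], builtin: List[str]) -> str:
    """At each position try the exceptions first; fall back to the built-in rules."""
    exc, base = [parse(r) for r in exceptions], [parse(r) for r in builtin]
    text, out, i = text.upper(), [], 0
    while i < len(text):
        hit = _first_match(exc, text, i) or _first_match(base, text, i)
        frag, phonemes = hit if hit else (text[i], f"<{text[i]}>")  # <x>: not covered here
        if phonemes:
            out.append(phonemes)
        i += max(len(frag), 1)  # always advance, even on an empty fragment
    return " ".join(out)


BUILTIN = ["$(PRO)=P R AA"]  # only the built-in rule quoted above
print(translate("process", ["PR(O)CESS=OW"], BUILTIN))  # P R AA <C> <E> <S> <S>
print(translate("process", ["(PRO)=P R OW"], BUILTIN))  # P R OW <C> <E> <S> <S>
```

Without the built-in $(PRO) rule, the original PR(O)CESS=OW exception would reach the o as intended, which is a quick way to confirm that the built-in rule, not the exception, is at fault.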
SECTION 5: RC8650 EVALUATION KIT
The RC8650 Evaluation Kit comes with everything required to
evaluate and develop applications for the RC8650 chipset using a
Windows-based PC. The included RC8650 Studio™ software
provides an integrated development environment with the following features:
• Read any text, either typed or from a file
• Easy access to the various RC8650 voice controls
• Manage collections of sound files and store them in the RC8650
• Exception dictionary editor/compiler, and much more...
EVALUATION KIT CONTENTS
The following components are included in the DoubleTalk RC8650 Evaluation Kit:
• Printed circuit board containing the RC8650-1 chipset
• AC power supply
• Speaker
• Serial cable
• RC8650 Studio™ development software CD
The evaluation board can also be used in stand-alone environments by simply printing the desired text and commands to it via
the onboard RS-232 serial or parallel ports.
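A host program only has to write ASCII text to the board. The Python sketch below assumes, as the Message Macros discussion implies, that a CR (or Null) tells the RC8650 to act on the text it has received; `speak` is a hypothetical helper that accepts any writable binary stream. On a PC, an object created with the third-party pyserial package (`serial.Serial(port, baudrate)`) could be passed in, with the port name and baud rate chosen to match your setup (see Table 5.3).

```python
import io
from typing import BinaryIO


def speak(port: BinaryIO, text: str, terminator: bytes = b"\r") -> int:
    """Write ASCII text followed by a message terminator; return the byte count sent."""
    payload = text.encode("ascii") + terminator  # non-ASCII text raises UnicodeEncodeError
    port.write(payload)
    return len(payload)


buf = io.BytesIO()  # stands in for an open serial port
speak(buf, "Hello from the evaluation board.")
print(buf.getvalue())  # b'Hello from the evaluation board.\r'
```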
EVAL BOARD OUTLINE
Figure: Evaluation board outline, showing U1, U2, J1 (DC power input, 8–25 VDC), J201 (speaker output), P1 (audio output & control), P2 (A/D converter), P101 (RS-232 interface), P102 (TTL serial interface), P103 (printer/bus interface), JP1-JP3 (baud rate select), JP4-JP6, and switches SW1 and SW2 (RESET and STANDBY/INIT).
CONNECTOR PIN ASSIGNMENTS & SCHEMATICS
Table 5.1. P1 Pin Assignments (Audio Output & Control)
Pin No.  Pin Name   Pin No.  Pin Name
1        AO0        9        AS0
2        AO1        10       AS1
3        SP+0       11       SUSP0
4        SP+1       12       SUSP1
5        SP–0       13       DAOUT
6        SP–1       14       DARTS#
7        TS0        15       DACLK
8        TS1        16       GND
Table 5.2. P2 Pin Assignments (A/D Converter)
Pin No.  Pin Name   Pin No.  Pin Name
1        AN0        6        GND
2        GND        7        AN3
3        AN1        8        GND
4        GND        9        ADTRG
5        AN2        10       GND
Table 5.3. JP1-JP3 Pin Assignments (Baud Rate)
Selectable baud rates: 300, 600, 1200, 2400, 4800, 9600, 19200, Auto-detect (default)
“X” denotes jumper installed
Table 5.4. P101 Pin Assignments (RS-232 Serial Interface)
Pin No.  Pin Name   Pin No.  Pin Name
1        NC         6        DSR
2        RXD        7        RTS
3        TXD        8        CTS
4        NC         9        NC
5        GND
Table 5.5. P102 Pin Assignments (TTL Serial Interface)
Pin No.  Pin Name   Pin No.  Pin Name
1        GND        3        TXD
2        CTS        4        RXD
JP4-JP6 must be open in order to use the TTL interface.
Table 5.6. P103 Pin Assignments (Printer/Bus Interface)
Pin No.  Pin Name   Pin No.  Pin Name
1        STB#       14       GND
2        AFD#       15       DATA6
3        DATA0      16       GND
4        ERROR#     17       DATA7
5        DATA1      18       GND
6        INIT#      19       ACK#
7        DATA2      20       GND
8        SLCTIN#    21       BUSY
9        DATA3      22       GND
10       GND        23       PE
11       DATA4      24       GND
12       GND        25       SLCT
13       DATA5      26       RD#
Schematic: DoubleTalk Eval PCB (Chip Set), Revision B, 3/10/01, © RC Systems, Inc. Shows U1 (RC8650FP) and U2 (RC4651FP) with crystal Y1 (7.3728 MHz).
Schematic: DoubleTalk Eval PCB (I/F), Revision B, 3/10/01, © RC Systems, Inc. Shows U101 (MAX202), U102 (74HCT652), and connectors P101 (RS-232C serial port, DB9), P102 (TTL serial port), and P103 (Centronics compatible parallel port).
Note: P103 may be connected directly to a PC compatible parallel port via a ribbon cable with a 26-pin dual row socket connector to a DB25 male connector.
Open JP4-JP6 when using the TTL port.
Schematic: DoubleTalk Eval PCB (Audio), Revision B, 3/10/01, © RC Systems, Inc. Shows the A/D conditioning circuit and the two filter/amplifier channels (U201 and U301, LM4861, with jacks J201 and J301). Some components shown are not installed on the eval board.
Schematic: DoubleTalk Eval PCB (P/S), Revision B, 3/10/01, © RC Systems, Inc. Shows the 8–25 VDC input at DC jack J1, diode D1 (1N4004), regulator U4 (MC78M05CDT), and reset switch SW1.
Specifications written in this publication are believed to be accurate, but are not guaranteed to be entirely free of error. RC Systems reserves the right
to make changes in the devices or the device specifications described in this publication without notice. RC Systems advises its customers to obtain
the latest version of device specifications to verify, before placing orders, that the information being relied upon by the customer is current.
In the absence of written agreement to the contrary, RC Systems assumes no liability relating to the sale and/or use of RC Systems products
including fitness for a particular purpose, merchantability, for RC Systems applications assistance, customer’s product design, or infringement of
patents or copyrights of third parties by or arising from use of devices described herein. Nor does RC Systems warrant or represent that any license,
either express or implied, is granted under any patent right, copyright, or other intellectual property right of RC Systems covering or relating to any
combination, machine, or process in which such devices might be or are used. RC Systems products are not intended for use in medical, life saving,
or life sustaining applications.
Applications described in this publication are for illustrative purposes only, and RC Systems makes no warranties or representations that the devices
described herein will be suitable for such applications.
RC SYSTEMS
1609 England Avenue, Everett, WA 98203
Phone: (425) 355-3800 Fax: (425) 355-1098
Internet: http://www.rcsys.com