msm6679al-110 Oki Semiconductor, msm6679al-110 Datasheet - Page 20


A typical target accuracy of 97% is achieved with a 3% E_RATE, composed of a 1.5% E_SUB rate and a 3% E_REJ rate.
SD Recognition
In SD recognition mode, the MSM6679AL-110 can be trained to recognize up to 61 words. The
MSM6679AL-110 can support multiple speakers by switching vocabularies, but only one
speaker’s vocabulary should be active at one time.
The end user enrolls a phrase in the MSM6679AL-110’s vocabulary by recording the phrase three
times or more. The host Micro Controller Unit (MCU) controls the number of times each phrase
is enrolled. Generally, higher recognition accuracy is achieved with each additional enrollment.
The word set is made more robust by pronouncing each phrase slightly differently during initial
enrollment.
In addition to enrollment training, adaptive template updating can drive the accuracy towards
100%. The host MCU updates templates by first asking the speaker to confirm a recognized
phrase with a “yes” or “no” response, and subsequently updating the template for the corresponding
words. The use of name tags (see next paragraph) facilitates this process.
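The enrollment and confirmation flow described above can be sketched from the host MCU's side. This is a minimal Python illustration of the bookkeeping only: the class HostVocabulary, its methods, and the string samples are hypothetical stand-ins, not the MSM6679AL-110 command set. The two limits are taken from the text (at least three recordings per phrase, at most 61 words).

```python
MIN_ENROLLMENTS = 3  # "three times or more", per the text
MAX_WORDS = 61       # SD vocabulary limit, per the text


class HostVocabulary:
    """Hypothetical host-side record of one speaker's enrolled phrases."""

    def __init__(self):
        self.templates = {}  # name tag -> list of enrollment samples

    def enroll(self, tag, samples):
        """Store a phrase once the host has collected enough recordings."""
        if len(samples) < MIN_ENROLLMENTS:
            raise ValueError("need at least 3 recordings per phrase")
        if tag not in self.templates and len(self.templates) >= MAX_WORDS:
            raise ValueError("vocabulary is full")
        self.templates[tag] = list(samples)

    def adapt(self, tag, sample, answer):
        """Update the template only if the speaker confirmed with 'yes'."""
        if answer == "yes":
            self.templates[tag].append(sample)
            return True
        return False
```

A host might call enroll("call home", [take1, take2, take3]) once, then call adapt("call home", new_take, answer) after each confirmed recognition. Keeping one HostVocabulary per speaker matches the advice that only one speaker's vocabulary be active at a time.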
MSM6679AL-110 Voice Recognition Processor
The samples should be generated from a randomly-ordered list, with each word spoken twice
and with a dummy word at the beginning and end. There must be more than 2 seconds between samples
for accurate data processing. To provide the audio fidelity required for high-quality recognition
training, a DAT recorder, together with the microphone that will be used in the final application,
is required. To ensure data integrity, data is submitted to Oki after collecting samples from the
first 20 speakers for initial screening. If acceptable, then the remaining collection may proceed.
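The prompt-list rule above (random order, each word twice, a dummy word at both ends) can be sketched in a few lines of Python. The function name build_prompt_list, the placeholder dummy word, and the seed parameter are assumptions made for illustration. The sketch also assumes that the two utterances of a word may be scattered through the list rather than spoken back to back, which the text does not specify.

```python
import random


def build_prompt_list(words, dummy="dummy", seed=None):
    """Return a shuffled prompt list: each word twice, a dummy word at both ends."""
    prompts = list(words) * 2             # each word is spoken twice
    random.Random(seed).shuffle(prompts)  # randomly ordered list
    return [dummy] + prompts + [dummy]    # dummy word at beginning and end
```

For example, build_prompt_list(["Stop", "Top", "Halt", "First"]) returns ten prompts per speaker. Passing a different seed for each speaker gives each one a different but reproducible ordering.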
If substitution errors are possible, collection of spare words during initial collection is
recommended. For example, alternate words to “Stop” and “Top” could be “Halt” and “First.”
Collections should contain a wide variety of the background sound conditions that will exist
during actual usage. For example, if the collection is for use in an automobile, conditions such
as vehicle speed, road conditions, various window opening positions, heater or AC blower
speeds and radio volumes should be varied during the collection. The signal-to-noise ratio
should be maintained at 20 dB.
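The text does not say how the signal-to-noise ratio is to be measured. As a rough check during collection, the sketch below assumes the conventional amplitude definition, SNR = 20 log10(signal RMS / noise RMS). The function name snr_db and its inputs are hypothetical.

```python
import math


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from RMS amplitudes (assumed definition)."""
    return 20.0 * math.log10(signal_rms / noise_rms)
```

Under this definition, 20 dB corresponds to a speech RMS level ten times the background RMS level.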
To achieve high accuracy rates, phrase selection, data collection, background initialization
strategy, and control software need careful consideration. There are no published standards for
recognition accuracy.
Oki defines accuracy by:
Accuracy = 100% - E_RATE
E_RATE = E_SUB + 1/2 E_REJ
with the following definitions:
Parameters for Recognition Accuracy
Name                      Symbol   Condition
Substitution Error        E_SUB    Most critical type of error, e.g., say "Five", recognize "Nine"
Rejection Error           E_REJ    Word not recognized, opportunity for operator to repeat
Gap Error                 E_GAP    Word spoken before recognizer ready
Time-Out Error            E_TME    Word length is too long
Spurious Response Error   E_SPU    Sound or invalid word classified as a valid word (i.e., drop handset or speak wrong word)
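The accuracy definition, Accuracy = 100% - E_RATE with E_RATE = E_SUB + 1/2 E_REJ, can be checked numerically. The Python sketch below is an illustration only: the function names are hypothetical, and the trial counts in the usage note are invented to reproduce the datasheet's typical figures.

```python
def percent(count, total):
    """Convert an error count over a number of trials into a percentage."""
    return 100.0 * count / total


def error_rate(e_sub, e_rej):
    """E_RATE = E_SUB + 1/2 E_REJ, with all values in percent."""
    return e_sub + 0.5 * e_rej


def accuracy(e_sub, e_rej):
    """Accuracy = 100% - E_RATE, in percent."""
    return 100.0 - error_rate(e_sub, e_rej)
```

With a 1.5% substitution rate and a 3% rejection rate, error_rate(1.5, 3.0) is 3.0 and accuracy(1.5, 3.0) is 97.0, matching the typical target. The same rates would result from, say, 3 substitutions and 6 rejections in 200 hypothetical trials. Rejections count at half weight in this formula, while substitutions count in full.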
