NEURAL NETWORK SYNTHESISER OF PAUSE DURATION FOR MANDARIN TEXT-TO-SPEECH


Conclusion: This Letter has described ROM-based algorithms and architectures for computing x sin(θ) and x cos(θ). Some properties of sine and cosine functions are used to reduce the required table size. The proposed architecture can be pipelined to provide an x sin(θ) or x cos(θ) per addition or table look-up time. Simulation results show that acceptable precision can be achieved in a feasible ROM size. An application of the proposed method is also presented. The proposed coordinate rotator is pipelined to provide a throughput of 0.25 rotations per addition or table look-up time.

20th January 1992 H.-M. Jong, L.-G. Chen and T.-D. Chiueh (Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, Republic of China)

References

1 HWANG, K. (Ed.): 'Computer arithmetic' (John Wiley & Sons, New York, 1979), pp. 201-206

2 TAYLOR, E. I., GILL, R., JOSEPH, I., and RADKE, J.: 'A 20 bit logarithmic number system processor', IEEE Trans., 1988, C-37, pp. 190-200

3 CONSIDINE, V.: 'CORDIC trigonometric function generator for DSP'. Proc. IEEE ICASSP, 1989, pp. 2381-2384

NEURAL NETWORK SYNTHESISER OF PAUSE DURATION FOR MANDARIN TEXT-TO-SPEECH

Shaw-Hwa Hwang and Sin-Horng Chen

Indexing terms: Speech synthesis, Signal processing, Neural networks

A neural network based approach to pause-duration synthesis for Mandarin text-to-speech is proposed. It uses an MLP to replace explicit synthesis rules for generating pause duration from input text. By properly training the MLP using a large set of utterances, phonological rules for producing pause duration are automatically learned. Experimental results confirmed that this is a promising approach.

Introduction: In Mandarin Chinese, each character is pronounced as a monosyllable. Pause duration between two successive monosyllables plays an important role in the naturalness of sentential speech; pause duration is thus important prosodic information in synthesising Mandarin text-to-speech. Traditionally, it is synthesised by a rule-based approach [1, 2]. Phonological rules are invoked to imitate the human pronunciation process of generating pause duration from a given text. Although a rule-based approach is simple, the process of rule inference is tedious. Besides, because various linguistic features may interactively affect the pronunciation of pause duration, rules are usually incomplete.

In this Letter, a neural network based approach to pause-duration synthesis is proposed. The basic idea is similar to the NETtalk used for assigning parameters of allophones to each English character according to the context [3]. In our method, a multilayer perceptron (MLP) is employed to replace explicit synthesis rules to generate pause duration according to the context. By properly training the MLP with a large training set using the error back-propagation algorithm, phonological rules are expected to be automatically deduced and implicitly memorised. The MLP can hence be taken as a mechanism of pause-duration synthesis. Intensive study of linguistics for rule inference is therefore unnecessary.

Proposed system: Fig. 1 shows the block diagram of the proposed system. It consists of two main parts. The first one is a text analysis in which some linguistic features representing the context are extracted. The second one is a single-output MLP serving as the mechanism of generating pause duration from these linguistic features. It is noted that the nonlinear operation in the output node of the MLP is removed so that multilevel values of pause duration are generated linearly.
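A single-output MLP with a linear output node, trained by error back-propagation, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the Letter does not give the hidden-layer size, learning rate, or weight initialisation, so `n_hidden`, `lr`, the sigmoid hidden units, and the function names `init_mlp`, `forward` and `train_epoch` are all assumptions made for the example.

```python
import math
import random

def init_mlp(n_in=15, n_hidden=8, seed=0):
    """Small random weights. n_hidden is an assumed value, not from the Letter."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """Return (hidden activations, output). The output node is linear (no squashing)."""
    h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
         for row, b in zip(net["W1"], net["b1"])]
    y = sum(w * hj for w, hj in zip(net["W2"], h)) + net["b2"]
    return h, y

def train_epoch(net, samples, lr=0.02):
    """One pass of per-sample error back-propagation; returns the mean squared error
    measured on each sample just before its update."""
    total = 0.0
    for x, target in samples:
        h, y = forward(net, x)
        err = y - target                      # gradient of 0.5 * err**2 w.r.t. y
        total += err * err
        for j, hj in enumerate(h):
            # Back-propagate through the sigmoid of hidden unit j,
            # using W2[j] before it is updated.
            delta = err * net["W2"][j] * hj * (1.0 - hj)
            net["W2"][j] -= lr * err * hj
            net["b1"][j] -= lr * delta
            net["W1"][j] = [w - lr * delta * xi for w, xi in zip(net["W1"][j], x)]
        net["b2"] -= lr * err
    return total / len(samples)
```

Here each sample is a pair `(features, pause_duration)`, with the features a list of 15 binary values and the target a duration in frames. Leaving the output node linear lets the network emit any duration value rather than one confined to the range of a sigmoid.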


Although linguistic features on various levels may affect the pronunciation of pause duration, in this study only some relevant linguistic features, listed below, are extracted from neighbouring context in the text analysis:

(1) The type of initial of the ensuing syllable: six broad types of initial listed below were used:

(a) /m, n, l, r, 'null'/
(b) /h, sh, shi/
(c) /b, d, g/
(d) /tz, j, ji/
(e) /p, t, k/
(f) /ts, ch, chi, f, s/

(2) The tone of the ensuing syllable: five lexical tones were used

(3) One positional parameter: is the ensuing syllable the ending syllable of a sentence?

(4) Two phrasal parameters:

(a) does the processing location precede a polysyllabic phrase?
(b) does the processing location lie within a polysyllabic phrase?

(5) Others: does there exist an intentional pause or breath?

From the above discussions, a total of 15 binary contextual features were used. We note that the last feature is used to compensate for the effect of unusual pauses that occasionally occurred in the training utterances.
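The count of 15 binary features is consistent with six bits for the initial type, five for the tone, one positional flag, two phrasal flags and one flag for intentional pauses (6 + 5 + 1 + 2 + 1 = 15). One possible encoding is sketched below. The Letter does not state how the features were coded, so the one-hot layout, the bit order, the tone numbering 1 to 5, the inclusion of /g/ in class (c), and the name `encode_context` are assumptions made for illustration.

```python
# Six broad initial classes, in the Letter's own notation.
# The membership of /g/ in class (c) is an assumption.
INITIAL_CLASSES = [
    {"m", "n", "l", "r", "null"},       # (a)
    {"h", "sh", "shi"},                 # (b)
    {"b", "d", "g"},                    # (c)
    {"tz", "j", "ji"},                  # (d)
    {"p", "t", "k"},                    # (e)
    {"ts", "ch", "chi", "f", "s"},      # (f)
]

def encode_context(initial, tone, sentence_final,
                   precedes_phrase, within_phrase, intentional_pause):
    """Return a list of 15 binary features for one inter-syllable location."""
    class_bits = [int(initial in group) for group in INITIAL_CLASSES]
    if sum(class_bits) != 1:
        raise ValueError(f"unknown initial: {initial!r}")
    if tone not in range(1, 6):
        raise ValueError(f"tone must be 1..5, got {tone!r}")
    tone_bits = [int(tone == t) for t in range(1, 6)]
    flags = [int(bool(v)) for v in
             (sentence_final, precedes_phrase, within_phrase, intentional_pause)]
    return class_bits + tone_bits + flags
```

For example, `encode_context("p", 3, False, True, False, False)` yields a vector with exactly three bits set: class (e), tone 3, and the "precedes a polysyllabic phrase" flag.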

[Figure: input text -> text analysis -> linguistic features -> MLP -> pause duration]

Fig. 1 Block diagram of proposed system

Simulations: The performance of the proposed approach was examined by simulations. Two sets of sentential utterances spoken by a female announcer were recorded from TV news. The first database comprising 2278 monosyllables was employed to train the MLP and the second database comprising 584 syllables was used for outside testing. All utterances are natural and fluent. They were manually segmented into syllable periods for extracting pause durations. Contextual features were also manually extracted.

Table 1 lists the average mismatch error between original and synthesised pause durations. Average mismatch errors of 1.84 frames (6.9 ms) and 1.88 frames (7.1 ms) were achieved for the inside and the outside tests, respectively. Comparing these results with the statistics of the training database, with mean 7.33 frames and average absolute variation 8.10 frames, the system performed quite well. By closely analysing experimental results, we found that a very high hit rate, 95.83%, of correct determination as to whether a pause occurred or not was achieved for the inside test and 87.14% for the outside test.

Table 1 AVERAGE MISMATCH ERROR BETWEEN ORIGINAL AND SYNTHESISED PAUSE DURATIONS

Statistics of pause duration
    Mean                       7.33 frames
    Absolute mean variation    8.10 frames
Average mismatch error
    Inside test                1.84 frames
    Outside test               1.88 frames

1 frame = 3.75 ms.
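The two figures of merit reported above could be computed along the following lines. This is a sketch under stated assumptions: the Letter does not define the average mismatch error or the hit rate precisely, so the code takes the former to be the mean absolute difference in frames and the latter to be the fraction of locations where original and synthesised durations agree on whether a pause is present. The `threshold` parameter and both function names are hypothetical.

```python
FRAME_MS = 3.75  # frame length stated with Table 1

def mean_abs_error_frames(original, synthesised):
    """Mean absolute difference between two duration sequences, in frames."""
    return sum(abs(o - s) for o, s in zip(original, synthesised)) / len(original)

def pause_hit_rate(original, synthesised, threshold=0.5):
    """Fraction of locations where 'pause present' (duration > threshold frames)
    agrees between the original and the synthesised durations.
    The threshold value is an assumption, not taken from the Letter."""
    hits = sum((o > threshold) == (s > threshold)
               for o, s in zip(original, synthesised))
    return hits / len(original)

# Toy durations in frames, invented purely for illustration.
original = [0.0, 8.0, 0.0, 12.0]
synthesised = [1.0, 6.0, 0.0, 13.0]
error_frames = mean_abs_error_frames(original, synthesised)
error_ms = error_frames * FRAME_MS
hit_rate = pause_hit_rate(original, synthesised)
```

With the frame length of 3.75 ms, an error of 1.84 frames converts to 6.9 ms, matching the value quoted for the inside test.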

Conclusions: A novel neural network based approach for synthesising pause duration for Mandarin text-to-speech has been presented. It simply uses an MLP to replace explicit synthesis rules as the mechanism of pause duration synthesis. Experimental results showed that, by this approach, reasonably good pause information can be efficiently generated using only some simple contextual features. It is thus a promising approach.

3rd February 1992 Shaw-Hwa Hwang and Sin-Horng Chen (Department of Communication Engineering and Center for Telecommunications Research, National Chiao Tung University, Taiwan, Republic of China)

References

1 LEE, L. S., TSENG, C. Y., and OUH-YOUNG, M.: ‘The synthesis rules in a Chinese text-to-speech system’, IEEE Trans., 1989, ASP-37, pp. 1309-1319

2 ZHANG, J.: ‘Acoustic parameters and phonological rules of a text-to-speech system for Chinese’. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Japan, 1986, pp. 2023-2026

3 SEJNOWSKI, T. J., and ROSENBERG, C. R.: ‘NETtalk: a parallel network that learns to read aloud’. TR. JHU/EECS-86/01, The Johns Hopkins University Electrical Engineering and Computer Science, 1986

LiNbO3 WAVEGUIDE SHG DEVICE WITH FERROELECTRIC-DOMAIN-INVERTED GRATING FORMED BY ELECTRON-BEAM SCANNING

M. Fujimura, T. Suhara and H. Nishihara

Indexing terms: Integrated optics, Nonlinear optics, Optical waveguides, Harmonic generation, Electron beam lithography

A ferroelectric-domain-inverted grating was fabricated by electron beam scanning in LiNbO3. A waveguide second harmonic generation (SHG) device with the grating was fabricated and demonstrated for the first time. The experiments were performed using a CW Nd:YAG laser, and normalised SHG conversion efficiency of 50%/W was obtained.

Introduction: The LiNbO3 waveguide quasiphase-matching (QPM) second harmonic generation (SHG) device with ferroelectric-domain-inverted grating is one of the most promising devices for a compact short-wavelength coherent light source [1-5]. Formation of the domain-inverted grating is an important process in the fabrication of such devices. Inversion techniques such as Ti indiffusion into the +z face of LiNbO3 [1, 2], Li2O outdiffusion from the +z face [3], and SiO2 cladding on the +z face and heat treatment [4] have been used to form the domain-inverted gratings in waveguide SHG devices, and waveguide SHG experiments have been performed.

It has been reported recently that electron beam (EB) scanning on the -z face induced the inversion without poling field at room temperature [6, 7]. This technique provided domain walls perpendicular to the surface and continued to the +z face. Bulk SHG experiments have been performed [7]. Such an inversion structure allows larger cross-sectional overlap between guided waves and the domain-inverted grating, which is an important requirement for waveguide SHG devices. Effective application of the inversion structure for implementing efficient waveguide SHG devices is expected. This Letter reports a waveguide SHG device fabricated by EB scanning inversion for the first time.

ELECTRONICS LETTERS 9th April 1992 Vol. 28 No. 8

Domain inversion: The +z face of LiNbO3 of 0.5 mm thickness was coated with an Au film of 50 nm and EB was scanned on the -z face at room temperature to form domain inverted gratings of 6.4 μm period. The EB acceleration voltage, current, and diameter were 20 kV, 0.3 nA, and ~0.3 μm, respectively. The inversion pattern was observed after etching in a mixture of HF : HNO3, which etches only the -z face of LiNbO3.

Fig. 1 shows an SEM photograph of an etched top surface. Although the EB was scanned continuously along the grating line, the inversion occurred in segmented regions. The separation between two adjacent segments was ~1 μm, and it was independent of EB scanning speed that ranged from 0.02 to 0.1 mm/s. Because the electric field due to accumulated electrons may be related to the inversion pattern, EB scanning by another scanning mode may result in a different inversion pattern. EB scanning by a dotted-line-like mode was tried. The inversion pattern tended to be segmented similarly to the results for the continuous scanning mode. For large EB dot spacing, the separation of the domain segments corresponded to the EB spacing. However, for EB dot spacing less than 1 μm, the domain separation was 1 μm and independent of the EB dot spacing.

[Photograph annotation: segment separation]

Fig. 1 SEM photograph of domain inverted grating fabricated by electron beam scanning (after etching)

It was found that excessive accumulation of electrons limited the area of the domain-inverted grating. Gratings up to a few square millimetres in area were successfully fabricated. However, for EB scanning over larger areas, domain inverted grating was obtained only in a part of the scanned area and no inversion occurred in the rest of the scanned area.

Device fabrication: A prototype waveguide QPM-SHG device was fabricated by EB scanning inversion. The device had a fanout domain-inverted grating and a channel waveguide array, as shown in Fig. 2, to compensate for residual phase mismatch [2]. The fanout grating was divided into three parts to obtain SHG interaction length of 3.3 mm against the grating area limitation mentioned before. The grating period ranged from 5.76 to 7.04 μm.
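For orientation, the first-order QPM condition that ties the grating period to the guided-mode indices can be written as below. This is the standard textbook relation and is not quoted from the Letter; the symbols Λ (grating period), λ (fundamental wavelength), β and N (propagation constants and effective indices of the fundamental and SH guided modes) are introduced here only for illustration.

```latex
\Lambda \;=\; \frac{2\pi}{\beta_{2\omega} - 2\beta_{\omega}}
        \;=\; \frac{\lambda}{2\,\bigl(N_{2\omega} - N_{\omega}\bigr)},
\qquad
\beta_{\omega} = \frac{2\pi N_{\omega}}{\lambda},
\quad
\beta_{2\omega} = \frac{4\pi N_{2\omega}}{\lambda}
```

Under this relation, a period of 6.4 μm at λ = 1.06 μm would correspond to an index difference of roughly 0.083. Since the exact effective indices depend on the waveguide, a grating whose period spans a range such as 5.76 to 7.04 μm plausibly lets at least one channel of the array fall close to exact phase matching.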

The domain-inverted grating was formed by dotted-line-like EB scanning with EB dot spacing 2.3 μm and scanning speed 0.03 mm/s. The channel waveguide array was fabricated by selective proton-exchange in pure benzoic acid and annealing [4].

SHG experiments: CW Nd:YAG laser light of 1.06 μm wavelength was end-coupled through a ×20 lens and a TM-like fundamental guided-mode was excited in one of the channel waveguides. A guided-mode SH wave by first order QPM was obtained from the channel with the domain-inverted grating