
Voice and Text Messaging: A Concept to Integrate the Services of Telephone and Data Networks

Lin-shan Lee and Ming Ouh-young

Dept. of Electrical Engineering National Taiwan University Taipei, Taiwan

Abstract

Although ISDN is one of the major trends in telecommunication development, an intermediate technology that can immediately improve the services of the current networks, before the implementation of ISDN is complete, is also highly desirable. In this paper, the concept of a Voice and Text Messaging System (VTM) is proposed, which can integrate the distinct services of the telephone and data networks very quickly. In Taiwan, Rep. of China, the telephone network has very wide coverage and a large number of users, while the data network has a very limited number of subscribers because each of them has to possess a terminal. The center of the VTM described here is a Chinese text-to-speech system which can transform any Chinese text processed in the data network into Mandarin voice, so that this voice signal can be transmitted through the telephone network and received by the telephone network users. These users can key in their instructions, such as choice of information, text processing, and forward and backward skipping, by pressing the touch-tone buttons of the telephone set. The electronic mail and database information services provided by the data network therefore become a portion of the voice mail and message services provided by the telephone network. The large number of telephone network users, even without a terminal, can thus be served by both networks.

I. Introduction

The information age now arriving can be characterized as a time of exploding demand for communication applications and services of all kinds. These demands create both great opportunities and difficult challenges for the development of advanced telecommunication technologies. One of the major trends for this development is expected to be the Integrated Services Digital Network (ISDN), in which a full range of voice, data, text and video services will be supported. However, the current telecommunication networks were planned several decades ago, and it takes time for these networks to be eventually changed into ISDN. It is therefore also highly desirable to have some intermediate technology which can very quickly improve the services the current networks can offer without actually completing the implementation of ISDN. In this paper, the concept of a Voice and Text Messaging System (VTM) is proposed, which can provide all current telephone network users with access to all information services given by the current data network. Although the concept is described in this paper in terms of handling information in the Chinese language, it is by no means limited to any country or any language.

(Tel) (02) 392-2444

In Taiwan, Rep. of China, the public switched telephone network has long been very well developed. On average about every three persons share a telephone set, and the network covers almost the entire country, from the mountains to the islands. On the other hand, the public switched data network was established several years ago, but its users have always been very limited in number. Although this data network is supported by very good database information services, only the very few subscribers possessing terminals or computers can have access to it. Because these two networks are completely different, one serving voice and the other data, they cannot communicate with each other. The idea of the Voice and Text Messaging System (VTM) is to provide a channel for the large population of telephone network users to have access to the data network. The center of VTM is a Chinese text-to-speech system which can transform any Chinese text processed in the data network into Mandarin voice, as if read by a human being, so that this voice can be transmitted through the telephone network and received by the telephone network users. The users can "key in" the instructions for handling the text by pressing the touch-tone buttons of the telephone set. Operations such as choice of information, message processing, forward and backward skipping, etc. can all be made available. In this way, the large number of users all over the country, even without a terminal, can be served by the data network through a telephone set. The electronic mail services and voice mail services can thus be combined, and the services of the networks improved. This is considered an alternative technology which can improve the network services immediately, before going to ISDN and without having to make any modification to the current networks. The detailed concepts, approaches, design and operations of this system are discussed in the following sections.

II. The Basic Concept

The major elements of the current telecommunication networks in Taiwan, Rep. of China [1] are depicted in Fig. 1. The public switched telephone network is well developed, having widespread coverage all over the country and a very large number of users. Some computers and terminals can also make use of this network through a modem. Voice mail services have recently been introduced, provided by a voice information center and a voice message service center. The user can receive his voice mail from his voice mail box very conveniently. On the other hand, the public switched data network is relatively new, having only a limited number of users because every user has to possess a terminal or computer. Electronic mail services and database information services are provided in this network through the text message service center and the database information center; the latter can provide much useful public information stored in the database. The above two networks are mutually independent, one serving voice and the other data. They cannot communicate with each other. Only the very small number of users who are subscribers of both networks, and who possess both a telephone set and a terminal or computer, can have access to both networks. Therefore most of the large number of users of the telephone network cannot be served by the data network.

The concept of the Voice and Text Messaging System (VTM) is shown in Fig. 2, which provides a channel for communication between the above two networks. The center of VTM is a Chinese text-to-speech system [2, 3] which can transform arbitrary Chinese text into Mandarin voice as if read by a human being. Such a text-to-speech system has been successfully developed and implemented, and it is the key to the concept of VTM. Although the quality of the synthesized speech is not very satisfactory yet, the intelligibility is very high even after transmission through telephone channels. Therefore all text information processed in the data network, including the electronic mail and database information, can be transformed into voice, transmitted through the telephone network, and received by the telephone network users, as long as the text is in Chinese. In this way, no modifications have to be made to the current networks, and the voice mail, electronic mail and database information services remain the same. In addition, the electronic mail and database information services can be transformed into voice and become a portion of the voice mail service. For example, the subscribers of the data network can send their electronic mail to a telephone network user who does not have a terminal or computer, because the mail can be transformed into voice and stored in the receiver's voice mail box. Also, the telephone network users can make use of the database information services by listening to the information stored in the database and read by the text-to-speech system. Therefore the large number of users can all be served by the data network through a telephone set, even without a terminal, and the electronic mail and database information services can be included in the voice mail service. The services of the telecommunication networks can thus be immediately improved.

A nice feature of information in the form of text is that the text can be read by the users selectively and repeatedly. The readers can easily skip the parts that are not interesting and repeat the parts of special interest to them. This feature should therefore also be implemented in VTM. Operations on the text such as choice of information, message processing, and forward and backward skipping can be accomplished. The instructions of the users can be keyed in by pressing the touch-tone buttons of the telephone set. Of course, the convenience and efficiency achieved cannot be the same as for text processing on the data network, because a user sitting before a terminal can read many words on the screen at a glance, but can only listen to the voice word by word. Also, there are many other special features of text information which cannot be obtained in VTM. For example, if a telephone user would like to have a copy of the text, he has to go to a different place where a printer is available to receive the copy. Furthermore, many types of information such as graphs, figures, tables, paragraphs, the size of the characters or letters, punctuation marks, etc. are very difficult to present in voice. In other words, when the electronic mail and database information services are extended to the entire telephone network, the level of these services in the telephone network inevitably cannot be as high as that in the data network.
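The selective and repeated listening described above can be illustrated with a small keypad-driven reader. This is only a sketch under assumptions that are not in the paper: the key assignments ("4" for backward, "5" for repeat, "6" for forward), the sentence-level skipping unit, and the names `SpokenTextReader` and `press` are all hypothetical.

```python
class SpokenTextReader:
    """Steps through a text one sentence at a time, driven by keypad digits."""

    # Hypothetical key assignments; the paper does not specify which keys are used.
    KEY_BACK, KEY_REPEAT, KEY_FORWARD = "4", "5", "6"

    def __init__(self, sentences):
        if not sentences:
            raise ValueError("need at least one sentence")
        self.sentences = list(sentences)
        self.position = 0  # index of the sentence currently being read

    def current(self):
        return self.sentences[self.position]

    def press(self, key):
        """Apply one touch-tone key and return the sentence to be spoken next."""
        if key == self.KEY_FORWARD:
            # Skip forward, but never past the last sentence.
            self.position = min(self.position + 1, len(self.sentences) - 1)
        elif key == self.KEY_BACK:
            # Skip backward, but never before the first sentence.
            self.position = max(self.position - 1, 0)
        elif key == self.KEY_REPEAT:
            pass  # stay on the same sentence so it is read again
        else:
            raise ValueError(f"unassigned key: {key!r}")
        return self.current()
```

In a real system the returned sentence would be handed to the text-to-speech system rather than returned as a string, and the skipping unit could equally be a paragraph or a fixed length of text.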


III. The System Configuration and Operation Procedures

The simplified block diagram of VTM is shown in Fig. 3. The central part of the system consists of a Chinese text-to-speech system, which will be described in detail later, and a control center which is in charge of the control, management, and operation of the complete system. The telephone network interface receives all messages and instructions from the telephone network and transfers them to the control center, and the data network interface receives all messages and instructions from the data network and transfers them to the text buffer and the control center. The text information to be read by the text-to-speech system is temporarily stored in the text buffer after being obtained from the data network by the data network interface. The voice information synthesized by the text-to-speech system is then temporarily stored in the voice buffer before being transmitted to the telephone network through the telephone network interface. The control center is responsible for the processing of all instructions and messages from both networks and the control of all operations in the system.
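The buffering path in this block diagram (data network interface, text buffer, text-to-speech system, voice buffer, telephone network interface) can be sketched as two queues around a synthesis step. The following is a toy model only: the class `VTM`, its method names, and the stand-in function `fake_text_to_speech` are hypothetical, and strings are used in place of real speech signals.

```python
from collections import deque


def fake_text_to_speech(text):
    """Stand-in for the Chinese text-to-speech system: returns a tagged string, not audio."""
    return f"<voice:{text}>"


class VTM:
    """Toy model of the text buffer -> text-to-speech -> voice buffer path."""

    def __init__(self, synthesize=fake_text_to_speech):
        self.text_buffer = deque()   # filled by the data network interface
        self.voice_buffer = deque()  # drained by the telephone network interface
        self.synthesize = synthesize

    def receive_from_data_network(self, text):
        """Data network interface: store incoming text until it can be read."""
        self.text_buffer.append(text)

    def run_control_center(self):
        """Synthesize every pending text, preserving arrival order."""
        while self.text_buffer:
            self.voice_buffer.append(self.synthesize(self.text_buffer.popleft()))

    def send_to_telephone_network(self):
        """Telephone network interface: next synthesized item, or None if nothing is ready."""
        return self.voice_buffer.popleft() if self.voice_buffer else None
```

The two buffers decouple the rate at which text arrives from the rate at which speech is synthesized and played out, which is the role the text assigns to them.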

When a telephone network user would like to use the database information provided by the data network, he can call VTM and key in the instructions. These instructions are transferred to and processed by the control center. The control center then notifies the data network of the desired services via the data network interface. The text information is then provided by the data network, stored in the text buffer, read by the text-to-speech system, and sent to the user through the telephone network interface and the telephone network. The feedback information that usually appears on the screen in the data network, such as "The requested information is not found, try again" or "For further information, please press 62", can be provided by synthesized voice through the telephone network. The telephone network user can also skip forward or backward over a given length of text, or choose to repeat a desired piece of text information, by pressing previously assigned numbers. On the other hand, when a data network subscriber would like to send electronic mail to a telephone network user, the mail is sent to VTM. His instructions are transferred to the control center through the data network interface. The text of the mail is transformed into voice by the text-to-speech system and transmitted to the voice mail box by the telephone network. The telephone network user receives the mail just as voice mail, except that the voice is synthetic. He can also select, skip or repeat a given part of the mail by pressing previously assigned numbers. Of course, if such a system is to be actually implemented and operated in practical telecommunication networks, special consideration should also be given to the traffic needs, the necessary number of channels, and the implied hardware design structures.
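The two operation procedures above can be summarized in a short sketch: a database request keyed in by a telephone user, and an electronic mail delivered into a voice mail box. The function names, the use of dictionaries for the database and the mail boxes, and the topic codes are assumptions made for illustration; only the feedback message is taken from the text.

```python
NOT_FOUND = "The requested information is not found, try again"


def handle_database_request(database, topic_code):
    """Return the text to be spoken for a topic code keyed in by a telephone user."""
    # Feedback that would appear on a terminal screen is spoken instead.
    return database.get(topic_code, NOT_FOUND)


def deliver_mail(voice_mailboxes, phone_number, mail_text, synthesize):
    """Convert an electronic mail to voice and store it in the recipient's voice mail box.

    Returns the number of messages now waiting in that mail box.
    """
    voice_mailboxes.setdefault(phone_number, []).append(synthesize(mail_text))
    return len(voice_mailboxes[phone_number])
```

Here `synthesize` stands for whatever converts text to voice; in the system described it would be the Chinese text-to-speech system, and the mail box would hold speech signals rather than Python objects.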

IV. The Chinese Text-to-speech System

Here we very briefly describe the Chinese text-to-speech system, which is in fact the center of VTM. This system can transform arbitrary given text in Chinese into Mandarin voice, and has been successfully implemented with satisfactory performance. The design approach is based on a syllable concatenation model, chosen because of special considerations regarding the characteristics of the Chinese language [2, 3].

There are at least some 13,000 commonly used Chinese characters, and each character is monosyllabic. There are at least some 60,000 commonly used words in Chinese, each composed of from one to several characters. However, the total number of different syllables in Mandarin speech is only about 1300. The use of syllables as the basic units to synthesize Mandarin Chinese therefore becomes a very natural choice. Speech waveforms for Chinese sentences can be synthesized directly by simply concatenating the syllables in the sentences and adjusting the parameters describing the acoustic properties of these syllables. Another very important special feature of the Mandarin Chinese language is the existence of lexical tones. Chinese is a tonal language. There are basically four different tones, i.e., the high-level tone (usually referred to as the first tone), the mid-rising tone (the second tone), the mid-falling-rising tone (the third tone), and the high-falling tone (the fourth tone). It has been shown that the primary difference among the four tones is in the pitch contours, and in fact there exist standard patterns for the pitch contours which will produce the four tones. If the differences among the syllables due to lexical tones are disregarded, only 418 syllables are required to generate all the pronunciations for Mandarin Chinese.

The block diagram of the Chinese text-to-speech system based on the above syllable concatenation concept is shown in Fig. 4. The database stores the LPC coefficients for the 418 first-tone syllables and the standard patterns for the pitch contours of the four lexical tones. The synthesis rules are a set of general rules which determine how the parameters describing the acoustic properties of the syllables should be adjusted when the syllables are concatenated to form unrestricted sentences with arbitrary text. These rules are the key technology for obtaining synthesized voice with satisfactory quality. They are summarized very briefly in the following:
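The database organization described above, one set of stored parameters per tone-independent syllable plus four standard pitch patterns, can be sketched as a simple lookup. The three-point pitch shapes below are placeholders chosen only to suggest the tone names, not measured data, and the names `SyllableUnit` and `lookup` are hypothetical. Reusing the stored first-tone parameters unchanged for every tone is a simplifying assumption of this sketch.

```python
from dataclasses import dataclass

BASE_SYLLABLES = 418   # tone-independent syllables, as stated in the text
LEXICAL_TONES = 4

# Upper bound if every base syllable occurred with every tone; the text says
# only about 1300 different syllables actually occur in Mandarin speech.
MAX_TONAL_SYLLABLES = BASE_SYLLABLES * LEXICAL_TONES  # 1672

# Placeholder pitch-contour shapes (relative pitch at three points), NOT measured data.
PITCH_PATTERNS = {
    1: (1.0, 1.0, 1.0),   # high-level
    2: (0.6, 0.8, 1.0),   # mid-rising
    3: (0.6, 0.3, 0.7),   # mid-falling-rising
    4: (1.0, 0.6, 0.2),   # high-falling
}


@dataclass
class SyllableUnit:
    name: str
    tone: int
    lpc: tuple      # stored spectral parameters of the first-tone syllable
    pitch: tuple    # pitch contour selected by the lexical tone


def lookup(database, name, tone):
    """Combine stored first-tone parameters with the pitch pattern of the requested tone."""
    if tone not in PITCH_PATTERNS:
        raise ValueError(f"unknown tone: {tone}")
    return SyllableUnit(name, tone, database[name], PITCH_PATTERNS[tone])
```

Storing 418 parameter sets and four pitch patterns, instead of one entry for each of the roughly 1300 tonal syllables, is what makes the syllable a compact synthesis unit.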

(1) The Tone Concatenation Rules

When syllables with different tones are concatenated in natural speech, the standard patterns for the pitch contours are subject to various modifications. For example, if a fourth tone precedes another fourth tone without any pause between them, the first fourth tone is modified such that the slope of its pitch contour is decreased by 20%. Also, if a fourth tone is followed by a third tone, the third tone is modified such that its entire pitch contour is shifted up to make a continuous contour connecting with that of the preceding syllable, etc.
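The two example rules above can be illustrated on pitch contours represented as short lists of pitch values. This is a minimal sketch under assumptions: the paper does not say how the slope reduction is computed, so here the excursion from the starting pitch is scaled by 0.8, and the shift for the third tone is taken to be whatever offset makes its first point meet the last point of the preceding contour. The function names are hypothetical.

```python
def reduce_slope(contour, factor=0.8):
    """Scale a contour's excursion from its starting value (0.8 gives 20% less slope)."""
    start = contour[0]
    return [start + factor * (p - start) for p in contour]


def shift_to_join(previous, contour):
    """Shift a whole contour so its first point meets the last point of the previous one."""
    offset = previous[-1] - contour[0]
    return [p + offset for p in contour]


def apply_tone_rules(syllables):
    """Apply both rules to a list of (tone, contour) pairs spoken without pauses."""
    result = [list(contour) for _, contour in syllables]
    for i in range(len(syllables) - 1):
        tone, next_tone = syllables[i][0], syllables[i + 1][0]
        if tone == 4 and next_tone == 4:
            # A fourth tone before another fourth tone gets a flatter contour.
            result[i] = reduce_slope(result[i])
        elif tone == 4 and next_tone == 3:
            # A third tone after a fourth tone is shifted to continue the contour.
            result[i + 1] = shift_to_join(result[i], result[i + 1])
    return result
```

For instance, with the illustrative contours `[200, 150, 100]` for a fourth tone and `[90, 70, 110]` for a following third tone, the third tone is shifted up by 10 so that it starts at 100.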

(2) Special Sandhi Rules for the Third Tone

The third tone has a "mid-falling-rising" pattern for the fundamental frequency. However, such a third tone is produced in full only on special occasions. In many cases only the first half or the second half of the third tone is produced, depending on the syllable it precedes.

(3) Stress Rules and Intonation Patterns

When two syllables form a word, the stress is in general assigned to the second syllable, although exceptions exist in some cases. When more than two syllables form a word, the primary stress is given to the last syllable and the secondary stress to the first syllable, while those in between are least stressed. Also, the intonation pattern of a declarative sentence is in general declining, etc.
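The default stress assignment stated above can be written down directly. The sketch below covers only the general rule, not the exceptions the text mentions; the label strings and the treatment of a one-syllable word (given primary stress here) are assumptions, and `assign_stress` is a hypothetical name.

```python
def assign_stress(syllable_count):
    """Return one stress label per syllable of a word, following the default rules."""
    if syllable_count < 1:
        raise ValueError("a word has at least one syllable")
    if syllable_count == 1:
        return ["primary"]  # assumption: the text does not cover one-syllable words
    if syllable_count == 2:
        # Stress falls on the second syllable of a two-syllable word.
        return ["least", "primary"]
    # Longer words: secondary on the first, primary on the last, least in between.
    return ["secondary"] + ["least"] * (syllable_count - 2) + ["primary"]
```

A synthesizer would then map these labels to adjustments of duration, pitch and energy for each syllable.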

(4) Syllable Duration Rules, Pause Insertion Rules and Energy Modification Rules

The duration of each syllable should be adjusted according to different factors such as the tone, the initial consonant, the word it forms, etc. Also, pauses of different lengths should be assigned to different punctuation marks and syntactic boundaries. The energy level of each syllable should likewise be modified based on different considerations such as the tone, stress, etc.

The flow chart of the complete text-to-speech system is shown in Fig. 5. The system first extracts the parameters for the syllables from the database according to the input text. The syllable durations are then defined, pauses inserted, pitch periods adjusted, and energy modified, and the speech synthesizer finally produces the output speech. The system is implemented using Digital Signal Processors with the aid of a personal computer. The intelligibility of the synthesized speech has been tested and found to be very high, even after transmission through telephone channels. This is why this system can be used to develop VTM.
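The order of the processing steps in this flow can be sketched as a chain of stages applied to a list of syllable units. The stages below are placeholders that only record that they ran; they do not implement the actual duration, pause, pitch or energy rules, and all names (`extract_parameters`, `make_stage`, `text_to_speech`) are hypothetical.

```python
def extract_parameters(syllables, database):
    """Fetch the stored parameters for each syllable of the input text."""
    return [{"syllable": s, "params": database[s], "steps": []} for s in syllables]


def make_stage(name):
    """Build a placeholder stage that only records that it ran."""
    def stage(units):
        return [{**u, "steps": u["steps"] + [name]} for u in units]
    return stage


# Stage order as described in the text: duration, pause, pitch, energy.
STAGES = [make_stage(n) for n in ("duration", "pause", "pitch", "energy")]


def text_to_speech(syllables, database):
    """Run every stage in order and return the adjusted units."""
    units = extract_parameters(syllables, database)
    for stage in STAGES:
        units = stage(units)
    return units  # a real system would pass these to the speech synthesizer
```

Keeping each rule group as a separate stage mirrors the flow chart and lets any one set of rules be refined without touching the others.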

V. Conclusion

The concept of a Voice and Text Messaging System (VTM) is described in this paper. It can help telephone network users to have access to the data network services, and make the electronic mail and database information services a portion of the voice mail service. The services of the telecommunication networks can thus be improved immediately.

References

[1] Proceedings of the Telecommunications Laboratory, Telecommunications Laboratory, Directorate General for Telecommunications, Rep. of China.

[2] Ming Ouh-young, Chiu-yu Tseng, Lin-shan Lee, "Design Considerations and Preliminary Results for a Chinese Text-to-Speech System," 1984 International Computer Symposium, Dec. 1984, Tamkang University, Taipei, Taiwan, Rep. of China, pp. 1331-1341.

[3] Ming Ouh-young, Chiu-yu Tseng, Lin-shan Lee, "A Chinese Text-to-Speech System Based on A Syllable Concatenation Model," 1986 International Conference on Acoustic, Speech and Signal Processing, Apr. 1986, Tokyo, Japan, pp. 2439-2442.

Fig. 1. The current telephone network and data network. No communication between the two.

Fig. 2. The concept of the Voice and Text Messaging System (VTM).

Fig. 3. The system configuration of VTM.

Fig. 4. The block diagram of the Chinese text-to-speech system based on the syllable concatenation concept.

Fig. 5. The flow chart of the text-to-speech synthesis algorithm.
