TV-news archive - A Study of Digital Watermarking for Internet Multimedia and Its Applications

A fully automated Web-based TV-news system [40, 41] consists of three modules: (1) TV news video acquisition, (2) news content analysis, and (3) a user interface for news query, search, and retrieval. Figure 6.16 depicts the overall architecture and the interaction of these three modules. The major tasks of the acquisition module are to record TV news programs in a suitable video format and to fetch related news text from the Web. The content analysis module segments the recorded news video into story-based units and extracts a news title and keywords from each story unit. The most important task of the user interface module is to provide a friendly querying and browsing environment for retrieving news stories of interest.

The overall content processing and analysis are briefly described as follows. First, a TV news program is captured and encoded into a streaming video format. The recorded streaming video is named and tagged, and then stored in a database. In the meantime, a shot detector segments the streaming video into scene-based shots for key-frame extraction and generation. Within a shot, speaker identification techniques are then applied to detect anchor frames. The closed captions in the anchor frames are extracted and recognized by video OCR techniques [41] as candidates for the title and keywords of each story unit. The extracted keywords can then be matched against (1) Internet news stories, to construct links between TV news stories and Internet news, and (2) the words of users’ query text, to retrieve the news stories they are interested in.
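The keyword-matching step can be illustrated with a minimal sketch. It assumes that keywords are plain strings and that similarity is measured by Jaccard overlap; the function names, the threshold value, and the sample stories are all hypothetical and are not taken from the system described above.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_stories(tv_keywords, web_stories, threshold=0.3):
    """Return (title, score) pairs for web stories whose keyword overlap
    with the TV story meets the threshold, best match first.

    The threshold is an illustrative assumption, not a tuned value."""
    scored = [(title, jaccard(tv_keywords, kws))
              for title, kws in web_stories.items()]
    matches = [(title, score) for title, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical keywords extracted from one TV story and two web stories.
tv_story = ["election", "mayor", "taipei"]
web = {
    "Mayor election results": ["election", "mayor", "results"],
    "Baseball finals": ["baseball", "finals"],
}
links = link_stories(tv_story, web)
```

The same scoring function could serve a user query by treating the query words as the first keyword set.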

6.7 Summary

This chapter addresses techniques and possible applications of fully automated information mining on a multimedia TV-news archive. The proposed automated information mining comprises the following processes: (1) segmenting a TV-news program video recording into scene clips, (2) using video OCR to extract and recognize closed-caption and/or image characters as keywords for each scene, (3) using the keywords to generate semantic labels for each scene, and (4) separating commercial video clips from news clips. Information associated with the various labels and scenes (e.g., the starting and ending time of a scene) is stored in the proposed news information tree. Statistical analysis of the data items in the news information tree can reveal hidden information, such as popular channels and the evolution of hot news stories. This information can help the general public find their preferred news channels, search for persons in the news focus, track hot news stories, and so on.
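As a rough illustration of how such a tree supports statistical analysis, the sketch below stores scenes under a date and a channel and counts, per date, how many scenes mention a given keyword. The date/channel/scene layout, the class names, and the sample data are assumptions made for illustration; the actual tree structure is the one shown in Figure 6.5.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Scene:
    label: str      # semantic label, e.g. "anchor" or "interview"
    start: float    # starting time of the scene, in seconds
    end: float      # ending time of the scene, in seconds
    keywords: list = field(default_factory=list)


@dataclass
class NewsTree:
    # Hypothetical layout: date -> channel -> list of scenes.
    days: dict = field(default_factory=dict)

    def add(self, date, channel, scene):
        self.days.setdefault(date, {}).setdefault(channel, []).append(scene)

    def keyword_timeline(self, keyword):
        """Count, per date, the scenes whose keywords include `keyword`."""
        timeline = Counter()
        for date, channels in self.days.items():
            for scenes in channels.values():
                timeline[date] += sum(keyword in s.keywords for s in scenes)
        return dict(timeline)


# Illustrative data only.
tree = NewsTree()
tree.add("2004-03-01", "ch1", Scene("anchor", 0, 30, ["elect"]))
tree.add("2004-03-01", "ch2", Scene("interview", 30, 90, ["elect", "mayor"]))
tree.add("2004-03-02", "ch1", Scene("anchor", 0, 20, ["sport"]))
timeline = tree.keyword_timeline("elect")
```

Plotting such a per-date count over a longer period would give a life-cycle curve of the kind described for a news event.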

The proposed web-based TV news archive was also used as a test bed for the proposed image watermarking and video fingerprinting methods. Watermarks can be embedded automatically during the key-frame extraction procedure, and shot-change information can serve as a signal for the server to change clients’ descrambling keys (and thus their fingerprints) to increase security [34]. Computational efficiency is an important issue in multimedia applications. Our testing shows that the proposed watermarking and fingerprinting methods are efficient enough for real-world multimedia applications.

Figure 6.5: The data structure of a news information tree.

Figure 6.6: The flow diagram of the proposed news story analysis and information extraction processes. (Diagram stages: Detect Commercial, Detect Weather Report, Detect Anchor, Extract News Stories. Shot labels: A: anchor, C: commercial, W: weathercast.)

Figure 6.7: The flow diagram of a BIC-based audio segmentation method. (Diagram stages: Audio Segmentation Based on BIC, Audio Clustering, Anchor’s Cluster. X: audio feature stream.)

Figure 6.8: The general structure of a news story. An on-site scene story contains three major kinds of news content: locations, interviews, and tables or quoted words.

Figure 6.9: On-site scene segmentation flow. (Diagram steps: Mark Newshawk Speaking Periods; Mark Data Chart from Newshawk’s Period; Mark the Rest of Newshawk Periods as Locality; Mark the Rest as Interview.)

Figure 6.10: Information flow of the generation of a news information tree.

Figure 6.11: An example of a locality scene frame. The locality scene shows where the news occurred and what happened. Thus, the location information and the event description can be retrieved from the closed captions of a locality scene. (Frame annotations: Location, Event.)

Figure 6.12: An example of an interview scene frame. The interview scene presents the point of view of the persons in the news. Thus, the interviewee’s name and opinion can be extracted from the screen characters or closed captions. (Frame annotations: Interviewee, Interviewee’s Point.)

Figure 6.13: An example of a data chart scene frame. The data chart scene presents information in an organized manner. Additional information is also available from the on-screen characters. (Frame annotations: Chart Subject, Column Index, Row Index.)

Figure 6.14: Three sets of (representative) keywords are used to associate the appearing frequency of social, political and sport news in a news program. (Plot legend: "politics", "society", "sport".)

Figure 6.15: The life cycle of a specific news event over a period of time. (Plot legend: "elect".)

Figure 6.16: The overall architecture and information processing flow of the proposed fully automated web-based TV-news system.

Chapter 7 Conclusions and future works

7.1 Conclusions

In this dissertation, progressive image watermarking schemes and video fingerprinting schemes for a web-based multimedia news archive were proposed. Progressive transmission of images is very useful and widely used in many applications, especially in image transmission over the Internet. We first propose a progressive image watermarking scheme in which the watermark is embedded in such a way that part of it can be retrieved even while the watermarked image is still being transmitted.

As transmission progresses, the bit error rate of the retrieved watermark decreases. The proposed methods can not only verify the watermarked image progressively, but also intelligently select watermark modification values. The significance of the proposed method is that it can protect digital rights at the distribution side rather than at the user side. Moreover, once a partially received image is determined to be free of suspected piracy, the progressive detection process can be terminated to save computational resources as well as network bandwidth.
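The idea of a decreasing bit error rate with early termination can be sketched as follows. This is only an illustration: it assumes the watermark is a bit sequence, that one estimate of it is available after each transmission stage, and that detection stops as soon as the bit error rate drops to a threshold. The function names, the threshold, and the stopping rule are assumptions, not the decision rule used in the dissertation.

```python
def bit_error_rate(extracted, reference):
    """Fraction of positions where the extracted bits differ from the reference."""
    if len(extracted) != len(reference):
        raise ValueError("bit sequences must have equal length")
    errors = sum(e != r for e, r in zip(extracted, reference))
    return errors / len(reference)


def progressive_detect(stages, reference, accept_ber=0.1):
    """Scan the watermark estimates stage by stage and stop at the first
    stage whose bit error rate is at or below accept_ber.

    Returns (stage_index, ber) for the stopping stage, or (None, last_ber)
    if no stage reaches the threshold. accept_ber is a hypothetical value."""
    ber = None
    for index, extracted in enumerate(stages):
        ber = bit_error_rate(extracted, reference)
        if ber <= accept_ber:
            return index, ber
    return None, ber


# Illustrative data: estimates improve as more of the image arrives.
reference = [1, 0, 1, 1, 0, 0, 1, 0]
stages = [
    [0, 1, 1, 1, 0, 1, 1, 0],  # early stage: 3 of 8 bits wrong
    [1, 0, 1, 1, 0, 1, 1, 0],  # later stage: 1 of 8 bits wrong
    [1, 0, 1, 1, 0, 0, 1, 0],  # all bits correct
    [1, 0, 1, 1, 0, 0, 1, 0],  # never examined if detection stops earlier
]
result = progressive_detect(stages, reference)
```

Stopping at the third stage means the fourth stage never has to be received or processed, which is where the savings in computation and bandwidth would come from.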

We also propose a new video scrambling and fingerprinting approach for digital media rights protection. The proposed method contains two parts: (1) video scrambling at the server side, and (2) fingerprint embedding at the client side. First, a content server scrambles and multicasts video contents to end users. Then, by applying a (v, k, 1)-BIBD scheme, the server partitions a descrambling key into v = O(√n) descrambling subkeys and multicasts them to n users. On receiving the descrambling subkeys from the content server, each user combines these subkeys into a descrambling key embedded with a fingerprint. Descrambling with this key turns the scrambled video into a fingerprinted video designated to that user. In general, embedding fingerprints introduces some noise into the video contents. According to the experimental results, when the fingerprint consists of fewer than 15 watermarks or the watermark strength α is less than 0.4, the PSNR of the video frames is 35 or higher, which is visually acceptable.
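For readers unfamiliar with block designs, the following sketch builds the smallest nontrivial (v, k, 1)-BIBD, the (7, 3, 1) design, and checks its defining property that every pair of points lies in exactly one block. Reading the points as subkey indices and each block as the set of subkeys assigned to one user is an assumption made here for illustration; the helper names are hypothetical, and the sketch does not reproduce the dissertation's key construction.

```python
from itertools import combinations


def cyclic_blocks(base, v):
    """Develop a base block modulo v into v blocks."""
    return [frozenset((x + shift) % v for x in base) for shift in range(v)]


def is_bibd_lambda1(blocks, v):
    """Check that every pair of the v points lies in exactly one block."""
    counts = {pair: 0 for pair in combinations(range(v), 2)}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return all(c == 1 for c in counts.values())


# (7, 3, 1)-BIBD developed from the difference set {0, 1, 3} modulo 7.
# Assumption for illustration: points are subkey indices, blocks are users.
blocks = cyclic_blocks((0, 1, 3), 7)
valid = is_bibd_lambda1(blocks, 7)

# With lambda = 1, two distinct blocks can share at most one point.
max_shared = max(len(a & b) for a, b in combinations(blocks, 2))
```

A (v, k, 1)-BIBD has v(v - 1) / (k(k - 1)) blocks, so for fixed k the number of blocks grows quadratically in v. Under the block-per-user reading assumed above, this is consistent with needing only v = O(√n) subkeys for n users.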

Finally, integrated information mining techniques for a multimedia TV-news archive are addressed. The approach utilizes techniques from the fields of acoustic, image, and video analysis to obtain information on news story titles, newsmen, and scene identification. By using acoustic analysis, a news program can be partitioned into news and commercial clips with 90% accuracy on a data set of 400 hours of TV news recorded off the air from July 2003 to August 2004. By applying speaker identification and/or image detection techniques, news stories can be segmented with an accuracy of 96%. On-screen captions or subtitles are recognized by OCR techniques to produce the text title of each news story.

The extracted title words can be used to link to, or navigate among, related news contents on the WWW. In cooperation with facial and scene analysis and recognition techniques, the OCR results can support multimodal user queries on specific news stories.

The proposed web-based TV news archive was also used as a test bed for the proposed image watermarking and video fingerprinting methods. Watermarks can be embedded automatically when key frames are extracted, and shot-change information can serve as a signal to change clients’ fingerprints to increase security. Computational efficiency is an important issue in multimedia applications. Our testing shows that the proposed watermarking and fingerprinting methods are efficient enough for real-world applications.

7.2 Future works

Following the research described in this dissertation, future work can focus on the following topics:

• A watermark detection system can be developed to actively detect watermarked images on the Internet. Moreover, information about images, such as the URL and a few wavelet coefficients, can be cached so that users can query for their watermarked images with the watermark.

• Up to the present, most image watermarking approaches have embedded watermarks in fixed frequency subbands. However, as shown by the experimental results in Section 3.4 and Section 4.4, the proper subbands for watermark embedding differ among images. Thus, based on the proposed methods, a progressive watermarking scheme that can select frequency subbands adaptively should be developed.

• Up to now, video multicasting has not been broadly used on the Internet. After a commonly used protocol or standard appears, we will modify the proposed fingerprinting method slightly to fit the multicast method.

• The proposed TV news archive is mainly for Chinese news in Taiwan. In the future, TV news in other languages will be processed, recorded and mined as well.

Bibliography

[1] I. J. Cox, M. L. Miller, and J. A. Bloom, Digital Watermarking: Principles & Practice. New York: Morgan Kaufmann Publishers, October 2001.

[2] I. J. Cox, M. L. Miller, J. A. Bloom, J. Fridrich, and T. Kalker, Digital Watermarking and Steganography. London: Elsevier Science & Technology, November 2007.

[3] E. Koch and J. Zhao, “Towards robust and hidden image copyright labeling,” in Proc. of 1995 IEEE Workshop on Nonlinear Signal and Image Processing, Halkidiki, Greece, June 1995, pp. 452–455.

[4] C.-T. Hsu and J.-L. Wu, “Hidden digital watermarks in images,” IEEE Trans. Image Processing, vol. 8, no. 1, pp. 58–68, January 1999.

[5] J. Zhao and E. Koch, “Embedding robust labels into images for copyright protection,” in International Congress on Intellectual Property Rights for Specialised Information, Knowledge and New Technologies, Vienna, Austria, 21-25 1995.

[6] F. Y. Duan, I. King, L.-W. W. Chan, and L. Xu, “Intra-block max-min algorithm for embedding robust digital watermark into images,” Multimedia Information Analysis and Retrieval, pp. 255–264, 1998.

[7] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, “Techniques for data hiding,” IBM Systems Journal, vol. 35, no. 3,4, pp. 313–336, 1996.

[8] J. K. Su, J. J. Eggers, and B. Girod, “Capacity of digital watermarks subjected to an optimal collusion attack,” in Proc. Eur. Signal Process. Conf., vol. 4, September 2000, pp. 1981–1984.

[9] D. Boneh and J. Shaw, “Collusion-secure fingerprinting for digital data,” IEEE Trans. Information Theory, vol. 44, no. 5, pp. 1897–1905, September 1998.

[10] B. P. Hans-Jurgen Guth, “Error- and collusion-secure fingerprinting for digital data,” in Prelim. Proc. 3rd Intl. Information Hiding Workshop, Dresden, Germany, October 1999, pp. 134–145.

[11] S. Voloshynovskiy, S. Pereira, T. Pun, J. J. Eggers, and J. K. Su, “New paradigms for effective multicasting and fingerprinting of entertainment media,” IEEE Communications Magazine, vol. 43, no. 6, pp. 77–84, June 2005.

[12] M. Kutter and F. A. P. Petitcolas, “A fair benchmark for image watermarking systems,” in Proc. SPIE Security and Watermarking of Multimedia Contents, vol. 3657, San Jose, CA, USA, 25–27 Jan. 1999, pp. 226–239.

[13] W. Zhu, Z. Xiong, and Y. Q. Zhang, “Multiresolution watermarking for images and video,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 4, pp. 545–550, 1999.

[14] Y.-K. Chee, “Survey of progressive image transmission methods,” International Journal of Imaging Systems and Technology, vol. 10, no. 1, pp. 3–19, 1999.

[15] S. Voloshynovskiy, S. Pereira, T. Pun, J. J. Eggers, and J. K. Su, “Attacks on digital watermarks: Classification, estimation-based attacks, and benchmarks,” IEEE Communications Magazine, vol. 39, no. 8, pp. 118–126, August 2001.

[16] T. P. chun Chen and T. Chen, “Progressive image watermarking,” in IEEE International Conference on Multimedia and Expo, vol. 2, July 2000, pp. 1025–1028.

[17] A. Jayawardena and P. Lenders, “Embedding multiresolution binary images into wavelet domain multiresolution binary watermark channels for copyright enforcement,” in Proceedings of the Acoustics, Speech, and Signal Processing 2000, vol. 4, June 2000, pp. 1983–1986.

[18] Z.-N. Li and M. S. Drew, Eds., Fundamentals of Multimedia. Prentice-Hall, October 2003.

[19] Y.-H. Chen, J.-M. Su, H. Fu, H.-C. Huang, and H. Pao, “Adaptive watermarking using relationships between wavelet coefficients,” in IEEE International Symposium on Circuits and Systems, vol. 5, May 2005, pp. 4979–4982.

[20] E. K. P. Chong and S. H. Zak, An Introduction to Optimization, 2nd ed. New York: John Wiley and Sons, 2001.

[21] J. C.-I. Chuang and M. A. Sirbu, “Pricing multicast communication: A cost-based approach,” Telecommunication Systems, vol. 17, no. 3, pp. 281–297, 2001.

[22] H. Zhao and K. Liu, “Bandwidth efficient fingerprint multicast for video streaming,” in Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, May 2004, pp. 849–852.

[23] T.-L. Wu and S. F. Wu, “Selective encryption and watermarking of MPEG video,” in International Conference on Image Science, Systems and Technology, CISST97, June 1997, pp. 261–269.

[24] H. hua Chu, L. Qiao, and K. Nahrstedt, “A secure multicast protocol with copyright protection,” ACM Computer Communication Review, vol. 32, Issue 2, pp. 42–60, April 2002.

[25] R. Parviainen and P. Parnes, “Large scale distributed watermarking of multicast media through encryption,” in Communications and Multimedia Security, ser. IFIP Conference Proceedings, R. Steinmetz, J. Dittmann, and M. Steinebach, Eds., vol. 192. Darmstadt, Germany: Kluwer, May 2001.

[26] D. Thanos, “Coin-video: A model for the dissemination of copyrighted video streams over open networks,” in Information Hiding: 4th International Workshop. Pittsburgh, PA, USA: Lecture Notes in Computer Science, April 2001, pp. 169–184.

[27] B. M. Macq and J.-J. Quisquater, “Cryptology for digital TV broadcasting,” Proceedings of the IEEE, vol. 83, no. 6, pp. 944–957, June 1995.

[28] F. Hartung and B. Girod, “Digital watermarking of mpeg-2 coded video in the bitstream domain,” in Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, April 1997, pp. 2621–2624.

[29] J. A. Bloom, “Security and rights management in digital cinema,” in Proceedings of the 2003 International Conference on Multimedia and Expo, vol. 2, July 2003, pp. 621–624.

[30] P. Judge and M. Ammar, “Whim: Watermarking multicast video with a hierarchy of intermediaries,” Computer Networks, vol. 39, no. 6, pp. 699–712, August 2002.

[31] W. Luh and D. Kundur, Digital Media Fingerprinting: Techniques and Trends. CRC, 2004, ch. 19.

[32] D. Kundur and K. Karthik, “Video fingerprinting and encryption principles for digital rights management,” Proceedings of the IEEE, vol. 92, no. 6, pp. 918–932, June 2004.

[33] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, “Secure spread spectrum watermarking for multimedia,” IEEE Trans. Image Processing, vol. 6, no. 12, pp. 1673–1687, December 1997.

[34] K. Su, D. Kundur, and D. Hatzinakos, “Statistical invisibility for collusion-resistant digital video watermarking,” IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 43–51, 2005.

[35] C. C. Lindner and C. A. Rodger, Design Theory. Boca Raton: CRC Press LLC, 1997.

[36] J. H. Dinitz and D. R. Stinson, Eds., Contemporary Design Theory: A Collection of Surveys. New York: Wiley, 1992.

[37] C. J. Colbourn and J. H. Dinitz, The CRC Handbook of Combinatorial Designs. Boca Raton: CRC Press, 2006.

[38] T.-S. Chen, C.-C. Chang, and M.-S. Hwang, “A virtual image cryptosystem based upon vector quantization,” IEEE Transactions on Image Processing, vol. 7, no. 10, pp. 1485–1488, 1998.

[39] S. Huffman, T.-E. Yang, L. Yan, and K. Sanders, “Genie out of the bottle: Three u.s. networks report tiananmen square,” in Proceedings of the annual meeting of Association for Education in Journalism and Mass Communication, Minneapolis, Minnesota, USA, 1990.

[40] Y. Xu, Y. Chen, C. Tseng, P. Lai, R. Hsieh, Y. Lu, Y. Shen, and H.-C. Fu, “Multimedia TV news browsing system,” in Proceedings. IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, ROC, June 2004.

[41] P. S. Lai, L. Y. Lai, T. C. Tseng, Y. H. Chen, and H.-C. Fu, “A fully automated web-based TV-news system,” in Proceedings of PCM2004, Tokyo, Japan, Dec. 2004.

[42] “Informedia.” [Online]. Available: http://www.informedia.cs.cmu.edu/

[43] “Informedia:vace-ii.” [Online]. Available: http://www.informedia.cs.cmu.edu/arda/vaceII.html

[44] N. C. Wah, Analysis of Spatio-Temporal Slices for Video Content Representation. PhD Thesis, Hong Kong University of Science & Technology, 2000.

[45] S.-S. Cheng, Y. hong Chen, C.-L. Tseng, H.-C. Fu, and H.-T. Pao, “A self-growing probabilistic decision-based neural network with applications to anchor/speaker identification,” in Proceedings of the Second International Conference on Hybrid Intelligent Systems (HIS02), Santiago, Chile, 2002.

[46] T. Sato, T. Kanade, E. Hughes, and M. Smith, “Video optical character recognition for digital news archive,” in Proceedings of Workshop on Content-Based Access of Image and Video Databases, Los Alamitos, CA, 1998, pp. 52–60.

[47] H.-C. Fu, H.-Y. Chang, Y. Y. Xu, and H.-T. Pao, “User adaptive handwriting recognition by self-growing probabilistic decision-based neural networks,” IEEE Transactions on Neural Networks, vol. 11, no. 6, p. 1373, 2000.

[48] P.-S. L. Tzu-Yang Huang and H.-C. Fu, “A shot-based video clip search method,” in Proceedings of CVGIP2004, Taipei, Hualien, ROC, August 2004.

[49] C. Fraley and A. E. Raftery, “How many clusters? which clustering method? answers via model-based cluster analysis.” Computer Journal, vol. 41, pp. 578–588, 1998.

[50] S.-Y. Sun, C. L. Tseng, Y. H. Chen, S. C. Chuang, and H. C. Fu, “Cluster-based support vector machine in text-independent speaker identification,” in Proceedings of International Joint Conference on Neural Networks IJCNN 2004, Budapest, Hungary, 2004.

[51] L. Zhu, A. Rao, and A. Zhang, “Theory of keyblock-based image retrieval,” ACM Trans. on Information Systems, vol. 20, no. 2, pp. 224–257, April 2002.
