The Design and Development of Web-based Chinese Ancient
Book Automatic Version Comparison System
Student: Pao-Huang Chen    Advisor: Dr. Ming-Jiu Hwang
A Thesis
Submitted to College of Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements
for the Degree of
Master of Science
in Digital Library
January 2012
Hsinchu, Taiwan, Republic of China
Degree Program of Computer Science National Chiao Tung University
Abstract
Chinese civilization, with a history of some five thousand years, has left behind many precious cultural relics and books. Ancient books in particular were copied and handed down through many dynasties, so a single work often survives in several versions, and the differences among them confuse and burden readers and researchers. This study therefore uses the strengths of computers (speed, memory, and comparison) together with the borderless reach of the Internet to design a systematic platform that is easy for researchers and general users to operate. Once texts have been entered, the system automatically shows the words or paragraphs that differ between versions of an ancient book and helps researchers annotate and collate them. Users can also find many errors that are difficult to detect by hand. The system thereby improves the quality and quantity of textual collation and achieves the goal of automatic version comparison.
1.1 1.2 1.3 1.4 1.5
1.1
1. 2. 3.
1.2
1. 2. 3. 4. 5.
1.3
1. (.txt) 2. 3. 4.
1.4
[Figure 1-1 stages: defining the research topic; system design; system evaluation; conclusions and recommendations.]
1-1
1.5
1. 2. 3. 4. 5.
2.1 2.2 2.3
2.1
2-1 2-1 1994[2] 2008[3] 1997[5] 2010[10] [14] [13] 2003[12] 1997[11] [8] / 2007[] / 2007[15] / / 2007[16] Time Topic 20072.1.1 1911 1 1911 2 1912 3 1911 4 2-1 1 2010 http://www.hudong.com/wiki/ 13 2012 6 14 2 1994 3-5 3 2000 10 4 2001 500 1
2-1 1994 1911 2000 1912 2001 1911 2010 1911 5 2.1.2 6 5 2000 iii 6 2000 58
1. 2. , 7 1. 2. 3. 4. 5. 6. 7. 8 1. 7 1997 http://cdp.sinica.edu.tw/paper/1997/19970301_2.html 2011 11 20 8 2000 2
2. 1. 9 1984 1990 1989 9 1997 http://hanji.sinica.edu.tw/index.html 2011 10 21
10 (1) (2) (3) (4) 10 2000 60
2-2 http://hanchi.ihp.sinica.edu.tw/ 2. 1998 (1924~1934) 12 (1) App HTMLhelp HTML 11 2000 60 12 2000 61
(2) (3) (4) HTMLhelp 2-3 http://www.cbeta.org/result/search.htm 3. 1989
CHANT Center 13 1980 1990 14 15 (1) (2) Unicode Unicode (3) (Big5) (GBK) 13 2007 114-116 2007 7 14 2007 114-116 2007 7 15 2000 62
2-4
2.1.3
16 17 (Theodotus) (St. Jerome, 347-420) 1675 (Jakob Griesbach) (Immanuel Bekker) (Karl Lachmann)
18 16 1998 3 17 2002 97 24 1 18 2010 119 2010 8
19 20 21 19 1997 1 1997 7 20 1998 22 21 2003 7
2.1.4 1. 2. 22 1931 22 2002 97 24 1
1. 23 2. 24 3. 25 23 1997 118 24 1997 119 25 1997 120
4. 26 5. 27 26 1997 121 27 1994 179
2.1.5 28 1. 2. 3. 4. 29 1. 2. 3. 2-5 2-6 28 2007 142 2007 3 29 2007 84 21 2
2-5 2007
[Figure 2-6 flowchart labels: book1, book2, s1, s2, w, n, location1, location2; Yes/No decision branches on location1 = location2, location1 < location2, location1 > location2, and on location2 = 1 versus location2 > 1.]
2-6 2007
2.2
1. 2. 3.
Ancient Book Automatic Collation System
1. 2. 3.
2.3
AppServ PHP MySQL
2.3.1 AppServ
AppServ Apache PHP MySQL phpMyAdmin Windows Linux
2-7 Apache Service PHP MySQL User AppServ 2-7 1. Apache HTTP Server PHP Apache HTTP Server 2. PHP HTML IE FireFox Chrome HTML PHP 2.3.2 Apache Web Server PHP Service MySQL Database phpMyAdmin Service
2-8 PHP
http://www.php.net/source.php?url=/downloads.php 3. MySQL
2-9 MySQL http://www.tutorialsweb.com/sql/working-with-mysql.htm 4. phpMyAdmin MySQL Web PHP Web MySQL
2-10 phpMyAdmin
2.3.2 PHP
PHP (PHP: Hypertext Preprocessor)
(command line interface) (GUI)
1995 PHP PHP Group PHP PHP PHP 2007 4 PHP 30 31 30 2011 PHP http://zh.wikipedia.org/wiki/PHP 2011 10 30 31
Usage Stats for April 2007
PHP: 20,917,850 domains, 1,224,183 IP addresses Source: Netcraft
2-11 PHP
http://www.php.net/usage.php
CGI (Common Gateway Interface)
PHP
CGI Web CGI
CGI CGI
C Perl ASP Java PHP 32
32
2-12 CGI PHP 5 MySQL 5 CGI CGI PHP 33
1. PHP5 XML MySQL extension
2. PHP UNIX Windows
3. PHP HTML HTML JavaScript
33 2006 PHP5 MySQL5 1-3~1-5
HTML PHP WWW Server PHP DataBase Server
PHP
4. PHP Open Source GPL (General Public License)
5. PHP
2-2 PHP
Adabas D     Ingres          Oracle     dBase
InterBase    Ovrimos         Empress    FrontBase
PostgreSQL   FilePro         mSQL       Solid
Hyperwave    Direct MS-SQL   Sybase     IBM DB2
MySQL        Velocis         Informix   ODBC
Unix dbm
6. PHP IMAP SNMP NNTP POP3
7. PHP HTML PDF XHTML XML
8. PHP (1) (Server-side) PHP Web
Web Web Web PHP (2) PHP PHP (3) PHP PHP GUI
2.3.3 MySQL
MySQL (RDBMS) MySQL AB Internet MySQL Google Facebook 34 (database) (table) (record) (column) 35
34 2011 MySQL http://zh.wikipedia.org/wiki/MySQL 2011 10 30
35 Mark Maslakowski 2001 MySQL 21 4-5
2-13
MySQL21
MySQL 36 37
1. C C++
2. FreeBSD HP-UX Linux Mac OS Novell NetWare NetBSD OpenBSD OS/2 Warp Solaris Windows
3. GNU Automake Autoconf Libtool
4. C C++ C# VB.NET Delphi Java Perl PHP Python Ruby API
36
MySQLPress 1995 MySQL - MySQL AB MySQL 1-5
37 2011 MySQL http://zh.wikipedia.org/wiki/MySQL 2011 10 30
[Figure 2-13 diagram labels: Database, Table, Column, Field.]
5. CPU
6. SQL
7. GB 2312 BIG5 Shift JIS
8. (MyISAM)
9. (hash)
3.1
3-1 38 Ancient book documents
3.2
MVC
(Model) (View) (Controller)
3-2 View
1. Model View
2. View (User Interface) Model Controller
3. Controller (Request) Model View
3-2 MVC
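The Model, View, and Controller roles named above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the system described in this thesis uses PHP. The class names `TextModel`, `TextView`, and `TextController` and the request key `"version"` are hypothetical and do not come from the thesis. The sketch only shows how the three roles divide the work.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class TextModel:
    """Model: owns the data (an in-memory stand-in for the database)."""
    versions: dict = field(default_factory=dict)

    def add_version(self, name: str, text: str) -> None:
        self.versions[name] = text

    def get_version(self, name: str) -> str:
        return self.versions[name]


class TextView:
    """View: turns data handed over by the controller into HTML for the user."""

    def render(self, name: str, text: str) -> str:
        return f"<h1>{escape(name)}</h1><p>{escape(text)}</p>"


class TextController:
    """Controller: receives a request, queries the model, and selects the view."""

    def __init__(self, model: TextModel, view: TextView) -> None:
        self.model = model
        self.view = view

    def handle(self, request: dict) -> str:
        name = request["version"]  # hypothetical request key
        if name not in self.model.versions:
            return "<p>version not found</p>"
        return self.view.render(name, self.model.get_version(name))
```

Because the view never touches storage and the model never produces HTML, either side can be replaced (for example, swapping the in-memory dictionary for a database) without changing the other.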
3.3
3-3 1. (HTML) 2.
[Figure 3-3 labels: Client Browser (Presentation Layer); Web Server (Business Logic Layer) with Request, Controller, View (PHP), Model, and Forward; data processing; Database (Data Access Layer).]
3.
4.
5.
3-3
3.3.1
3-4 1.
2.
3-4
3.3.2
3-5 1. Source Data; ancient book document classification
3-6 2. 3-5 Method selection; ancient book collation
3-6
0
2
+1 +1
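The stated goal of the system is to show automatically where two versions of the same text differ. The sketch below illustrates that idea at the character level with Python's standard `difflib` module. It is not the comparison procedure of this thesis, whose details are not recoverable from the flowchart. The function name `compare_versions` and the sample strings are illustrative assumptions.

```python
from difflib import SequenceMatcher


def compare_versions(text_a: str, text_b: str) -> list:
    """Return one (operation, position_in_a, segment_a, segment_b) tuple per difference.

    operation is "replace", "delete", or "insert", as reported by difflib.
    """
    # autojunk=False keeps frequent characters from being ignored in long texts.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (tag, a_start, text_a[a_start:a_end], text_b[b_start:b_end])
            )
    return differences


if __name__ == "__main__":
    # Illustrative sample strings differing in a single character.
    for difference in compare_versions("學而時習之不亦說乎", "學而時習之不亦悅乎"):
        print(difference)  # prints ('replace', 7, '說', '悅')
```

Each returned tuple gives the position in the first version, which is enough for a front end to highlight the differing characters side by side.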
3.3.3
3-7
JavaScript TextRange
TextRange Move 3-8 39
1. (character) (word) (sentence)
2. 3.
39
3-8
3.3.4
3-9
3.3.5
3-10 Source Data; comparison word and phrase library
3-10
3.4
10 3-11
1. admin (Table 3-1)
   systemID         int(11)
   user             varchar(100)
   pw               varchar(100)
   group            tinyint(4)
2. category (Table 3-2)
   id               int(11)
   name             varchar(100)
   description      text
3. subcategory (Table 3-3)
   id               int(11)
   name             varchar(100)
   description      text
   categoryID       int(11)
4. title (Table 3-4)
   id               int(11)
   name             varchar(100)
   description      text
   categoryID       int(11)
   subcategoryID    int(11)
   hierarchy        tinyint(4)
5. content (Table 3-5)
   id               int(11)
   description      text
   categoryID       int(11)
   subcategoryID    int(11)
   titleID          int(11)
   content          text
6. notes (Table 3-6)
   id               int(11)
   notetitle        text
   notecontent      text
   author           varchar(200)
   contentID        varchar(100)
   share            tinyint(4)
7. notes_compare (Table 3-7)
   id               int(11)
   content1         tinyint(4)
   content2         tinyint(4)
   similar_str      varchar(500)
   notes            text
   author           varchar(100)
   share            tinyint(4)
8. tools_title (Table 3-8)
   id               int(11)
   name             varchar(100)
   description      text
9. tools_detail (Table 3-9)
   id               int(11)
   name1            varchar(100)
   name2            varchar(100)
   description      text
   titleID          varchar(100)
   example          varchar(500)
   source           varchar(200)
10. collections (Table 3-10)
   id               int(11)
   collector        varchar(100)
   notesID          varchar(100)
   notes_compareID  varchar(100)
[Figure 3-11, entity-relationship diagram of the ten tables: category, subcategory, title, content, tools_title, tools_detail, notes, notes_compare, collections, admin; relationships are marked 1:1, 1:n, and m:n.]
3-11
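The category, subcategory, title, and content tables listed above can be exercised with a short sketch. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL database of the actual system. The table and column names follow the listing. The primary keys, the foreign-key references, and the helper names `build_database` and `contents_for_title` are assumptions added for illustration.

```python
import sqlite3

# Column names follow the thesis; PRIMARY KEY and REFERENCES clauses are assumed.
SCHEMA = """
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    description TEXT
);
CREATE TABLE subcategory (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    description TEXT,
    categoryID INTEGER REFERENCES category(id)
);
CREATE TABLE title (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    description TEXT,
    categoryID INTEGER REFERENCES category(id),
    subcategoryID INTEGER REFERENCES subcategory(id),
    hierarchy TINYINT
);
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    description TEXT,
    categoryID INTEGER REFERENCES category(id),
    subcategoryID INTEGER REFERENCES subcategory(id),
    titleID INTEGER REFERENCES title(id),
    content TEXT
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database holding the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def contents_for_title(conn: sqlite3.Connection, title_name: str) -> list:
    """Return the stored text of every content row filed under a title."""
    rows = conn.execute(
        "SELECT content.content FROM content "
        "JOIN title ON content.titleID = title.id "
        "WHERE title.name = ? ORDER BY content.id",
        (title_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Under these assumptions, storing two versions of a passage as two content rows that share a titleID lets a single query retrieve both for comparison.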
4.1 4.2 4.3
4.1.
4-1 Windows Server 2003 AppServ Apache Httpd MySQL phpMyAdmin PHP Apache AppServ MySQL PHP HTML CSS JavaScript AppServ AppServ Apache PHP HTML CSS MySQL
4-1
4.2.
4-2 Internet Apache Server PHP JavaScript HTML MySQL
4-3
4-3
4.2.1
1.
4-4
4-5
(.txt)
4-6
4-7
4-8 4-9
4-10
4-8
4-9
4-10
4-11
4.2.2
4-12
4-12 1.
4-14
4-15
4-13
4-15 2. 4-16 26 160 _ _ _ _ 4-17 4-18
4-16
4-18 3.
4-19
4-20 _
4-19
4-21
4.2.3
1.
4-22 2.
4-23
4-24
4-24
4.2.4
4-25 4-26 4-27
4-26
4-27
4.2.5
4-28
4.3
4-1 4-1
5.1 5.2
1. 2. 3. 4.
5.2
1.2.
3.
(.txt)
1. http://www.hudong.com/wiki/ #13, available at 2012.6.14.
2. 1994
3. 2000
4. 500 2001
5. http://cdp.sinica.edu.tw/paper/1997/19970301_1.htm, available at 2011.11.20.
6. http://hanji.sinica.edu.tw/index.html, available at 2011.10.21.
7. 2007 7 2007
8. 1998
9. 24 1 2002
10. 2010 8 2010
11. 1997 7 1997
12. 2003
13. 1997
14. 1994
15. 2007 3 2007
16. 21 2 2007
17. PHP, http://zh.wikipedia.org/wiki/PHP, available at 2011.10.30.
18. PHP PHP Usage Stats, http://www.php.net/usage.php, available at 2011.10.30.
19. PHP5 MySQL5 2006
20. MySQL, http://zh.wikipedia.org/wiki/MySQL, available at 2011.10.30.
21. Mark Maslakowski MySQL 21 2001
22. MySQLPress MySQL - MySQL AB MySQL 1995
23. 2011
24. 1995
25. 1994
26. 1994
27. R. Kawase, E. Herder, and W. Nejdl, “A comparison of paper-based and online annotations in the workplace,” EC-TEL, pp. 240–253, Proceedings of the 4th European Conference on Technology Enhanced Learning. Springer-Verlag, 2009.
28. PL Patrick Rau, et al. “Developing web annotation tools for learners and instructors,” Interacting with Computers, Volume 16, Issue 2,