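A Story-Oriented Collaborative Production Platform for Dramatic Cultural and Creative Content

National Science Council, Executive Yuan: Final Report of a Funded Research Project

Project type: Individual project

Project number: NSC 101-2221-E-011-145-

Project period: August 1, 2012 to October 31, 2013 (ROC 101/08/01 to 102/10/31)

Institution: Department of Information Management, National Taiwan University of Science and Technology

Principal investigator: 林伯慎 (Bor-shen Lin)

Project participants (master's students, part-time assistants): 葉文智, 紀至鍇, 劉易昇

Report attachments: report on attending an international conference, and the published papers

Handling:

1. Public information: this project may be searched publicly.

2. Has this research found anything that seriously harms the public interest: No

3. Is this report recommended as a reference for government policy: No

January 29, 2014 (ROC 103/01/29)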


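Abstract (translated from the Chinese): This project aims to develop a collaborative production platform for cultural and creative content in dramatic form. By sharing resources quickly over the Internet, the platform lets people with different expertise and skills divide the work, lowers the barriers to producing dramatic content, and makes collaborative creation practical. In both Eastern and Western drama, the story is the core of a play. That is, the dialogue, expressions, and actions of the characters and the development of the plot are what playwrights and audiences care about most, so creating dramatic content in a story-oriented way is the most natural approach. This is why the platform is positioned as "story-oriented." The story-oriented concept differs from the page concept of conventional slide-authoring systems and the frame concept of animation systems, and the authoring interface differs accordingly. In addition, the technical skills that drama production requires often become barriers to creation and keep digital content production from becoming widespread. The platform is therefore also positioned as "collaborative": it is meant to give creators a friendly, open, sharing-based, and cooperative production environment that reduces barriers caused by specialized knowledge or operating skills and lowers the time and cost of producing drama-style digital content.

The platform consists of two subsystems: a creative-element sharing system and a director production system. The sharing system lets creators with different expertise share their work, such as scripts or character figures, so that drama creators can draw on the expertise of others to produce richer works.

The director production system uses Java Swing for its graphical user interface, the Rhino JavaScript engine for immediate rehearsal, and the Xuggle library for recording animation files. Its interface offers a simple, intuitive way to edit dialogue, expressions, blocking (stage movement), and motion effects, to record voices, and to rehearse the edited result immediately. As a demonstration, we used the platform to produce a cartoon-style animation adapted from the fairy tale Little Red Riding Hood.

Keywords (translated from the Chinese): drama content creation, story-oriented, collaborative creation, digital content, cartoon production

English abstract: The aim of the project is to develop a platform for synergetic creation of digital content for cartoon-like drama. This platform can reduce the barrier of creating animations and facilitate the cooperation among people with different expertise and skill by providing a story-oriented integrated interface and a resource-sharing environment through the Internet. This can effectively reduce the time and cost of making digital content.

This platform consists of two subsystems: one is a resource-sharing system and the other is a story-editing system. The design elements, which include scripts, animations of characters, or pictures, can be uploaded to the database in the resource-sharing system, and then accessed by the story-editing system. The creators can make use of the resources contributed by others and create their own dramas using the story-editing interface. The editing interface can be used to set the characters, change their expressions, modify the dialogue, record the voices, set the movements of the characters or the animation effects, and so on. The edited story can be rehearsed immediately on the central stage and recorded as a video. A short drama adapted from 'Little Red Riding Hood' was made on the platform to demonstrate its functions.

English keywords: Drama creation, Story-oriented, Synergetic creation, Digital content, Creation of cartoon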


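2. Project Report

2.1 Research Background

The growth of the Internet has brought upheaval and restructuring to many industries, and for the cultural and creative industries it has brought both disruption and new opportunities. These industries cover a wide range: film, performing arts, advertising, programs, teaching materials, games, and design are all areas where cultural creativity can be applied. Combining information technology and Internet technology to extend the forms and content of services in these industries can push the cultural and creative ecosystem toward richer and more diverse development. This project therefore takes a collaborative production platform for dramatic cultural and creative content as its entry point for technical development. Many cultural forms have dramatic content, including comics, cartoons, animation, films, skits, crosstalk (xiangsheng), and stage plays. Drama-style digital content thus has a very wide range of applications and large room for growth, for example in advertising, promotional materials, teaching materials, e-books, and presentations. If fast resource sharing over the Internet can lower technical barriers so that people with different knowledge, skills, and expertise can collaborate, complement one another, and spark ideas, the cost of producing digital content should fall and the development of cultural creativity should benefit.

Software for editing images or making animation is already on the market. Such products, however, usually involve complicated operations and a steep learning curve, which creates a "skill barrier" to digital content creation. Moreover, in drama creation, the low-level, tedious, technique-heavy operations these tools demand force creators to break the "flow" of creative thinking, which greatly disrupts the creative process and degrades the result; this is an "interface barrier." For example, a creator who is working out a plot but is distracted by editing the details of one animation may be unable to concentrate on the characters' emotions and dialogue, and the quality of the work suffers. Overcoming the skill barrier and the interface barrier is an important issue in the production of drama-style cultural and creative content. In a play, the characters' dialogue, emotions, and actions are usually what the scriptwriter, director, and audience care about most; scenes, props, blocking, and other details come second. This is why we position the project as story-oriented: the story is the core of a drama. If the authoring interface for drama animation supports the main concepts of a story, including characters, casting, scenes, props, dubbing, blocking, and dialogue, the entry barrier for creators can be lowered. The project therefore aims to provide a friendly, open, sharing-based, and cooperative platform for producing dramatic content that reduces the barriers caused by technology or user interfaces and lowers the time and cost of digital content production.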


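2.2 Research Objectives

Producing a dramatic performance requires people with different specialties. Film production, for example, may involve a scriptwriter (composer), actors, engineers, a director, a photographer, an art designer, and voice actors (dub); animation, cartoons, and comics likewise involve several kinds of creators working together. For the creation of dramatic content, we simplify the creators' roles into four: the story creator (composer) produces the original script; the figure designer (designer) designs scenes, props, and character figures, including expressions and actions; the director or producer (director) handles integration and the overall presentation; and the voice actor (dub) records the narration and dialogue. The relationships among these four roles are shown in Figure 2.2.1.

Figure 2.2.1 Division of labor among creators on the drama content production platform (figure labels: story creation, figure design, dubbing, director; original script, figure library of characters, scenes, and props, sound files; post-production script, production platform; animated cartoon, movie file)

The objective of this project is to build, in the Internet environment, a collaborative production platform for dramatic cultural and creative content. Participants can contribute according to their interests and expertise, including script writing, character design, scene and prop design, dubbing, and directing; they can discuss with one another, divide the work, and each do what they do best, forming an environment for joint creation. The platform provides system interfaces that integrate, process, and render the elements designed and created by each group of users, and returns the resulting theater animation to the users. This integrated architecture divides the complex work of theater animation effectively and simplifies its steps, reducing production time and cost and diversifying the sources of content, so that digital content production can become widespread under simple, low-cost, fast, and diverse conditions. The digital content that the platform can create includes cartoons, comics, and animation, and can be applied to drama, performance, advertising, promotional materials, teaching materials, and presentations. Its main features are as follows: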

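- Story-oriented: scripts are edited in terms of the story rather than the pages, frames, or timelines used by typical editing software. On a story row, the user can pick a character in the graphical interface, record its dialogue, choose expressions and actions, and set its blocking; props and scenes can also be changed, and narration can be recorded. With a story-oriented interface, creators can concentrate on story-related work without having their creative flow broken by trivial operations.

- Collaborative creation: creators can upload original scripts, character figures, sounds, props, scenes, and post-production theater projects to the system's resource library, and download them from it. Shared resources let creators' skills complement one another, enabling cooperative and collaborative creation and saving production cost and time.

- Creative stimulation: through collaborative creation, a piece of work can be presented in several different forms of digital content. An original script, for example, may be rendered as a comic, a cartoon, or an animation, and a character figure created by a designer may be used in many dramas. This opens up diverse possibilities for dramatic expression.

- Creation support: we plan to study a natural-language query interface that automatically generates candidate combinations of characters and scenes and selects expressions automatically from the lines. We also plan to develop photo cartoonization techniques that convert faces in photos into cartoon or comic form and store them in the character figure library.

In this year's project we completed the basic director production system and developed the network sharing mechanism. Using fairy-tale teaching material in cartoon animation form as an example, we verified the practicality and ease of use of the director production system, and we verified the feasibility of collaborative creation through the search and sharing of scripts, character figures, and scene images.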

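2.3 System Description

2.3.1 System Architecture

The collaborative production platform consists of two subsystems: a creative-element sharing system and a director production system. The architecture is shown in Figure 2.3.1 and described below:

1. Creative-element sharing system: lets story creators and figure designers upload original scripts and character animations over the network. This part uses the jQuery library on the client side and JSP pages on the server side; data is transferred in JSON format and stored in a MySQL database. The original scripts and character animations in the database are made available to clients as web services through a GlassFish server.

2. Director production system: uses Java Swing to provide a graphical interface for editing the story, together with the Rhino JavaScript engine to interpret the script and support immediate rehearsal and video recording. Through the director platform interface, directors and voice actors can request scripts and character animations from the database via the GlassFish server, then edit the story, record voices, and add animation effects on the director platform. Finally, the rehearsed drama is recorded as a video file through the Xuggle API.

This integrated architecture divides the complex work of theater animation effectively and simplifies its steps, which lowers production time and cost and diversifies the sources of content.

Figure 2.3.1 System architecture of the collaborative production platform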

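2.3.2 Director Production System

The director production system mainly provides the interface with which directors and voice actors produce a drama. Through a web service and query interfaces, three kinds of creative elements are shared as material for production: scripts, character figures, and scene images. The director can load a script, character figures, and scenes and, as the story requires, modify a character's dialogue, actions, and expressions, change the scene, add narration, add other effects and blocking, or ask voice actors to record the characters' lines. A finished script can be saved as a theater project, which records the script, the character animation image files, the scene image files used, and the recorded audio files. A theater project can be reloaded in the director production system to adjust details, rehearse, or record an animated video.

Figure 2.3.2 Main flow of the director production system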

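2.3.3 Script Sharing Flow

A JSP web page lets users upload an original script as a text file; the flow is shown in Figure 2.3.3, and the format of an original script is illustrated by the example at the upper left of that figure. When the web server receives an uploaded script, it parses the character names, actions, and dialogue, converts them to JSON, and stores them in the script database. When uploading, the user can attach keyword tags to the script to make searching and filtering easier; a Little Red Riding Hood script, for example, might be tagged "小紅帽" (Little Red Riding Hood) and "童話" (fairy tale). In the director production system, the user runs the script library search and enters keywords; the production system sends the search request to the GlassFish application server, the server returns the list of scripts related to the keywords, and the production system presents the list to the user. When the user selects a script, it is downloaded and imported into the editing area of the director production system, where editing can begin, as shown at the lower left of Figure 2.3.3.

Figure 2.3.3 Script sharing flow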
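The parsing step in this flow can be sketched in a few lines of Python. This is only an illustration under assumptions: the row format (narration in parentheses, dialogue as `Name[action]:line`) is inferred from the sample script in Figure 2.4.1, and the function names `parse_script` and `script_to_json` as well as the JSON field names are hypothetical, not the platform's actual server code.

```python
import json
import re

# Hypothetical row pattern: Name[optional action]: dialogue
# Both ASCII and full-width colons are accepted.
ROW = re.compile(
    r"^(?P<name>[^\[\]:：()（）]+?)\s*"
    r"(?:\[(?P<action>[^\]]*)\])?\s*[:：]\s*(?P<line>.*)$"
)

def parse_script(text):
    """Split a plain-text script into narration rows and dialogue rows."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # A line wrapped in parentheses is treated as narration.
        if line[0] in "(（" and line[-1] in ")）":
            rows.append({"type": "narration", "text": line[1:-1].strip()})
            continue
        m = ROW.match(line)
        if m is None:
            raise ValueError(f"unrecognized script line: {line!r}")
        rows.append({
            "type": "dialogue",
            "character": m.group("name").strip(),
            "action": m.group("action"),  # None when no [action] is given
            "line": m.group("line").strip(),
        })
    return rows

def script_to_json(text, tags):
    """Bundle the parsed rows with the uploader's keyword tags (assumed layout)."""
    return json.dumps({"tags": list(tags), "rows": parse_script(text)},
                      ensure_ascii=False)
```

Calling `script_to_json(script_text, ["小紅帽", "童話"])` would give one JSON document per uploaded script, which a keyword search could later filter by its `tags` field.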

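2.3.4 Character Figure Sharing Flow

The figure upload function is a JSP web interface through which designers upload several animated image files (GIF format) for a character; the flow is shown in Figure 2.3.4. Taking a myna bird as an example, in addition to the default portrait (photo.gif) we designed four figure images for it, one each for joy, anger, sorrow, and delight, as shown at the upper left of Figure 2.3.4. Through the upload page, these five figures are named "九官鳥" (myna bird), stored in the character database, and tagged with keywords such as "九官鳥" and "動物" (animal); the designer can also set whether sharing is allowed. Each designer can upload and browse multiple character figures, as shown at the upper right of Figure 2.3.4. Uploaded figures can be found by search in the director production system and downloaded into it; the search interface is shown at the lower right of Figure 2.3.4. The director can specify which character figure plays each role in the script and edit the character's emotion or posture according to the story or dialogue; the story editing interface is shown at the lower left of Figure 2.3.4.

Figure 2.3.4 Character figure sharing flow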
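The tagging and sharing rules above can be illustrated with a small in-memory search. This is a sketch, not the platform's database query: the `Figure` class, its field names, and the rule that an unshared figure stays visible to its own uploader are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Figure:
    name: str                      # e.g. "九官鳥"
    owner: str                     # uploader's member account (hypothetical field)
    tags: set                      # keyword tags given at upload time
    shared: bool = True            # whether other members may find the figure
    images: dict = field(default_factory=dict)  # expression -> GIF file name

def search_figures(library, keyword, user):
    """Return figures that match the keyword and that this user may see."""
    hits = []
    for fig in library:
        # Assumption: private figures remain visible to their uploader.
        visible = fig.shared or fig.owner == user
        matches = keyword == fig.name or keyword in fig.tags
        if visible and matches:
            hits.append(fig)
    return hits
```

With a shared myna bird and a private wolf both tagged "動物", a search by another member returns only the myna bird, while the wolf's uploader sees both.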

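2.3.5 Main Screen of the Director Production System

The main editing screen is divided into five areas: the central stage, the figure library, the scene library, the toolbar, and the story rows, as shown in Figure 2.3.5. All the tools needed to edit a theater project are shown on the main screen, so the director does not have to search through menus for the right work area or button. Apart from the central stage and the story rows, which are joined together, all buttons are placed on the toolbar. The toolbar is a floating window, so the director can move the buttons to a preferred position, which makes editing more efficient.

The central stage presents the scene and is used to edit character blocking and to preview rehearsals. Every character and scene on the central stage must be controlled by a corresponding story row, so the central stage and the story rows below it are placed in the same window, making it easy for the director to select the relevant row and set actions. The central stage is scaled to the size of the background image. The figure library on the left loads characters from the network or the local machine and lets the director add suitable characters to the central stage for casting. The scene library on the right shows the scenes loaded in the current project, from which the director can switch scenes as the story requires. The toolbar contains the common editing buttons, grouped into four categories: story row management, recording control, animation settings, and rehearsal and video recording. Each category is marked with a block of a different color. The story row management category covers adding and deleting story rows and loading and saving theater projects; the other categories support the remaining editing functions.

Figure 2.3.5 Main screen of the director production system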

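2.3.6 Story Editing

The story rows at the lower left of Figure 2.3.5 represent the sequence in which the story unfolds, and this area is used to edit the story. Each line of dialogue or action by a character in the script is recorded as one story row. From the story rows the director can see clearly which line or action comes next, and can insert a row at any position to carry out a required action, such as adding narration, changing the background, or adding a character.

The data structure of a story row contains the row ID, character name, expression or posture, line of dialogue, audio, animation effect, time, exponent, action combination, and control information, which together control the behavior of that row. The expression or posture field holds the expression image file of the character figure (a static image or an animated GIF), one file per expression. The audio field shows whether the row has a voice recording or background music, which can be played back for checking. After being interpreted, the story rows are executed one at a time. For example, if a row's character is the myna bird, its expression is set to "joy," its line is "大家好" (Hello, everyone), and the corresponding speech has been recorded, then when the rehearsal reaches that row the myna bird changes to the "joy" expression, says "Hello, everyone," and a speech bubble reading "Hello, everyone" is displayed.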
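To make the row-by-row execution concrete, the following sketch models a story row and a rehearsal loop in Python. It is an illustration only: the report's system interprets rows with the Rhino JavaScript engine, whereas the `StoryRow` fields, the `rehearse` generator, and the event names here are hypothetical and cover only a subset of the fields listed above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StoryRow:
    row_id: int
    character: str
    expression: Optional[str] = None  # key of an expression image, e.g. "joy"
    line: Optional[str] = None        # dialogue text shown in the speech bubble
    audio: Optional[str] = None       # recorded audio file, if any
    effect: Optional[str] = None      # animation effect name, if any
    time_ms: int = 0                  # animation duration in milliseconds
    exponent: float = 1.0             # speed-profile exponent (1 = linear)

def rehearse(rows, start=0, end=None):
    """Yield stage events for rows[start:end], one row at a time."""
    for row in rows[start:end]:
        if row.expression is not None:
            yield ("set_expression", row.character, row.expression)
        if row.audio is not None:
            yield ("play_audio", row.character, row.audio)
        if row.line is not None:
            yield ("show_bubble", row.character, row.line)
```

The `start` and `end` arguments mirror the idea of rehearsing only a selected range of rows instead of the whole drama.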

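Besides dialogue, the buttons in the animation settings category can set animation effects for a character, such as shaking, entering and exiting, scaling, and blocking. When the director selects a story row and presses the blocking button, the system displays a red target marker on that row's character as a reference point, as shown in Figure 2.3.6. Starting from that reference point, the director draws the movement path on the central stage with the mouse. In Figure 2.3.6 the black curve is the path drawn by the director; when the row is executed, the myna bird moves along it. The animation also uses two parameters on the story row, "exponent" and "time," to change the speed profile and the duration. An animation parameter such as position varies with time according to a function controlled by an exponent. "Exponent" is the exponent value of that function, and a value of 1 means the change is linear. "Time" is the duration of the animation in milliseconds (ms).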
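One way to read the "exponent" and "time" parameters is as a power-law easing along the drawn path. The sketch below assumes the progress along the path is u = (t / T) ** exponent, which matches the statement that an exponent of 1 gives linear change; the actual formula used by the platform is not given in the report, and the function names are hypothetical.

```python
import math

def path_length_prefix(points):
    """Cumulative arc length at each vertex of a polyline."""
    total = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total.append(total[-1] + math.hypot(x1 - x0, y1 - y0))
    return total

def position_at(points, t_ms, duration_ms, exponent=1.0):
    """Position on a non-empty path at time t_ms, with u = (t / T) ** exponent."""
    if len(points) == 1 or duration_ms <= 0:
        return points[-1]
    # Clamp normalized time to [0, 1], then apply the assumed easing.
    u = min(max(t_ms / duration_ms, 0.0), 1.0) ** exponent
    prefix = path_length_prefix(points)
    target = u * prefix[-1]
    for i in range(1, len(points)):
        if target <= prefix[i]:
            seg = prefix[i] - prefix[i - 1]
            f = 0.0 if seg == 0 else (target - prefix[i - 1]) / seg
            (x0, y0), (x1, y1) = points[i - 1], points[i]
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
    return points[-1]
```

Under this assumption, an exponent above 1 makes the character start slowly and speed up, while an exponent below 1 does the opposite.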

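Figure 2.3.6 Story editing screen

2.3.7 Rehearsal and Recording

While editing the story, the director can rehearse at any time by using the rehearsal buttons in the rehearsal preview area. Besides previewing the whole drama, the director can rehearse a single act or a selected range of story rows. If the recording switch is turned on during rehearsal, the system uses the Xuggle package to encode the displayed images and the sound and write them synchronously into a video stream that is output to a file.

Figure 2.3.7 Rehearsal and recording function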

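2.4 Demonstration Example

We used the fairy tale Little Red Riding Hood as the story and produced a simple cartoon-style animation on the platform as a demonstration. First, the scriptwriter writes the story as a formatted original script; part of the original script is shown in Figure 2.4.1.

(Grandmother lives in the forest outside the village, a long way from Little Red Riding Hood's house.)

(Little Red Riding Hood had just entered the forest when she met a wolf. She did not know the wolf was a wicked creature, so she was not at all afraid of it.)

Wolf[smiling]:Hello, Little Red Riding Hood

Little Red Riding Hood:Hello, Mr. Wolf

Wolf:Where are you going so early, Little Red Riding Hood

Little Red Riding Hood:I am going to Grandmother's house.

Wolf:What do you have under your apron

Little Red Riding Hood:Cake and wine. We baked some cakes at home yesterday. Poor Grandmother is ill and needs something good to eat to get well

Wolf:Where does your grandmother live, Little Red Riding Hood

Little Red Riding Hood:A good way further into the woods. Her house stands under three big oak trees, with a walnut hedge below. You must know it

Wolf[scheming to himself]:This tender young thing will surely taste better than the old woman. I must be cunning and make sure neither of them escapes my grasp

Figure 2.4.1 Part of the original script of Little Red Riding Hood

Next, the scriptwriter uploads this original script to the shared database through the web interface, and a drama creator searches for "小紅帽" (Little Red Riding Hood) in the director production system and downloads the script into the integrated theater environment. Likewise, figure designers upload images of the character figures for Little Red Riding Hood, the Wolf, and Grandmother to the shared database, from which the production platform imports them into the integrated production environment. In the production environment, the creator then specifies which character figure plays each role in the script and starts editing the imported story rows. During editing, the creator can add and delete story rows, modify dialogue, set expressions, record voices, set blocking animations, record narration, and change the backdrop directly in the story row editing area, and can rehearse any range of the story at any time to check whether the result is as expected. When the drama is finished, the recording function saves the performance as a video file, and the entire production environment can be saved as a theater project for later sharing or modification. Figure 2.4.2 shows some of the scenes recorded in the Little Red Riding Hood theater project. Although we have completed an initial demonstration of the integrated editing functions, rich creative material for an animation still requires figure designers with professional art skills. We therefore plan to open the platform to amateur creators in the future and to develop supporting techniques for figure design, so as to build a friendly, sharing-oriented environment for collaborative creation.

Figure 2.4.2 Selected scenes from the Little Red Riding Hood theater project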

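3. Project Results

Following the planned requirements and development schedule, the project has completed the following work items: development of the production platform for drama-style content; development of the resource sharing mechanism and interfaces; two conference papers and one SCI journal paper; and a demonstration example produced with the platform. Each is summarized below.

Development of the production platform for drama-style content:

The director production platform uses Java Swing for the graphical interface, the Rhino JavaScript Engine for story interpretation and rehearsal, and the Xuggle API to record the rehearsed drama animation as a video file. On the platform, creators can search the shared database and download scripts and character figures, then edit the story, actions, and dialogue in the story rows, record voices, and set blocking or animation effects to stage the drama. The edited story can be rehearsed on the central stage by range, by act, or in full, so creators see the result immediately. Theater projects in progress can be saved and loaded, which makes it easy for creators to share and learn from one another's work.

Development of the resource sharing mechanism and interfaces:

We provide web interfaces through which script creators and figure designers upload original scripts or character figures to the shared database. An original script is a plain text file in a specific format; a character figure is a set of GIF files, each either a still image or a multi-frame animation. The shared database is exposed through the GlassFish application server as a RESTful Web Service (Java EE 6) for use by the director production system. When producing a drama, creators can search the shared database for scripts and character figures by keyword, or search for web images to use as scenes, and download them into the integrated editing environment: for example, searching the script library for "小紅帽" (Little Red Riding Hood), the figure library for "動物" (animal), or Google Images for "城堡" (castle). The shared database is managed by membership: users must register and log in before uploading scripts or character figures, and when uploading they choose whether to share the resource or keep it for personal use. Shared resources can be searched and downloaded from the director platform.

Two conference papers and one journal paper:

Although the project focuses on developing the content production platform, related research along this line has also been productive: two papers were presented at international conferences, and one paper submitted to an SCI journal has been accepted. One conference paper, on design pattern search and style analysis for comic-style figures, was published at the 2nd International Conference on Innovation, Communication and Engineering; the other, on an elastic matching method for searching shapes by hand-drawn sketches, was published at Technologies and Applications of Artificial Intelligence in 2013. For this platform, the design pattern search method of the first paper can be applied to searching and matching props, characters, and scenes, and the sketch-based shape retrieval method of the second paper can be applied to retrieving character figures. An extension of the first paper was developed into the journal paper "Design Pattern Retrieval and Style Analysis for Content Creation of Comic Figures," which was submitted to the SCI journal Mathematical Problems in Engineering (I/F 1.383) and has been accepted. The two international conference papers are attached.

Demonstration example produced with the platform:

Using the classic fairy tale Little Red Riding Hood as the example, we produced a cartoon-style animation project on the platform and recorded it as a video; we also placed several scripts and related character figures in the shared database as examples. In producing this demonstration we experienced first-hand that the platform is practical, simple, and fun, and easy for ordinary users to pick up. If more script and figure creators can be attracted to share scripts and character figures, and figure design support is added, the platform's adoption should benefit considerably.

Other related results:

With the basic director production and sharing platform complete, we hope to offer creators richer figure design support so that they can design varied character figures more conveniently. For example, a creator could load an image (such as Pikachu or Hello Kitty) and use matting and morphing to quickly adjust its expression or turn it into a simple animation, producing more varied actions and expressions, or could find decorative elements (such as a hat or an umbrella) in the shared library to give a character more variety. We have co-published a matting technique and completed an initial interface prototype of the figure design subsystem; further research is needed to improve the results of matting and morphing. Cartoonization of photos of real people is another important application technique, which we will add to the figure design support system of the cartoon theater in the future. Finally, we have integrated speech recognition with the animated theater, so that during script execution a recognition result (a keyword) can trigger a jump to a different range of story rows. This enables two-way interactive media in which the story changes with the user's intent, realizing multimodal situated dialogue applications.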

Pattern Search and Style Analysis for the Design of Comic Figures

Bor-shen Lin and Shang-te Tsai

National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

ABSTRACT: To place a few objects within limited space is a common issue for quite a few design tasks, such as the furnishings of the furniture, the collocation of dressings, or the design of the comic figures. Though there are many design elements shared on the Internet nowadays, search methods for design patterns are still lacking. This makes it difficult for the designers to refer to similar patterns, or to compare and learn from them. In this paper, the architecture for representing, comparing, searching and analyzing the design patterns of digital contents is proposed. Through the approach, the designers can view similar patterns efficiently when learning, and see how similar elements are utilized differently and innovatively by other designers. In addition, the patterns can be clustered, analyzed and summarized automatically, which makes it easier to perform the style analysis. The proposed approach has been verified successfully on a design support system for the creation of comic figures. Experimental results show that it not only appears promising in the domain of digital content creation, but exhibits great commercial potential in the era of design and education.

1 INTRODUCTION

To place a few objects within limited space is a common issue for quite a few design tasks, such as the furnishings of the furniture, the collocation of dressings, or the layout of the comic figures [1-3]. In such tasks, users' goals are usually implicit and vague, and require clarification through self-exploration or through the interaction between the customer and the designer. This is mainly because the personal taste and the design styles are often implicit, not easy to describe, and difficult to match. For the collocation of personal dressings, for example, a customer might have only rough ideas, which might include some tendencies of the choices for the colors or accessories due to fashion. In such a case it is not easy for the clerks to suggest an appropriate collocation of dressings because the preference of the customer usually cannot be described explicitly.

With the advances of computer technologies, there are more and more design elements that are digitized and accessible on the Internet, such as sketches, photos, images, or 3D models. These elements can be well represented with specific data structures [3,4] and organized in the design. However, the lack of search methods for the design patterns makes it difficult for the designers to refer to similar patterns, or to compare and learn from them. Provided the patterns can be compared and searched efficiently in the network environment, the learning of design can be facilitated largely while the co-creation among designers becomes more feasible [5]. In addition, the style analysis for a group of patterns is still difficult nowadays. Conventionally, the design styles can be analyzed or summarized only manually by experienced designers or experts. If the design styles can be analyzed and summarized automatically, it might inspire the designers and facilitate the learning process effectively.

In this paper, the architecture for representing, comparing, searching and analyzing the design patterns of digital contents is proposed, as shown in Fig. 1. Through the approach, the designers can view similar design patterns efficiently during the period of learning, and see how those elements they use are utilized differently and innovatively by other designers. In addition, based on this approach, a lot of patterns can be clustered, analyzed and summarized automatically, which makes it easier to perform the style analysis. As a result, the designers can view more patterns systematically and obtain a broader view of the styles efficiently. The proposed approach has been verified successfully on a design support system for the creation of comic figures. Experimental results show that it not only appears promising in the domain of digital content creation, but exhibits great commercial potential in the era of design and education.

Figure 1. Basic architecture for pattern analysis. (Blocks: database of design elements; web UI for comic design; comic patterns; extraction of attributes; pattern features {p_i}; query pattern q; comparison s(q, p_i); N-best patterns ranking; pattern clustering; pattern clusters; style analysis; summary of styles.)

2 FEATURE EXTRACTION AND PATTERN SEARCH

According to the architecture shown in Fig. 1, the comic patterns are first composed by the designers through the web user interface. The attributes of objects in every pattern are then extracted and represented as the feature vector. The similarity and the distance can then be defined such that the comparison between two patterns can be performed. The search for the query pattern can hence be conducted accordingly to obtain the N-best patterns.

2.1 Web UI for Designing Comic Figures

First, a relational database of design elements with more than 4,000 colored images was built. Every image in the database was named and tagged manually with several keywords. The images can be classified into five types according to their contents, which include the heads of famous stars, the bodies with various actions and dressings, the shadows of different shapes, the props, and the background scenes. A web interface based on client and server architecture was then provided for the designers to compose the comic patterns, as shown in Fig. 2.

Each composed pattern is the combination of multi-layered image objects that could be the fonts or the images of the five types. As can be seen in Fig. 2, the example comic pattern contains the image objects for the head of Obama, a body with a nice suit, a luxury car, two shadows, the scene of the White House, and a banner with the title "Yes! We Can".

The designers can flexibly adjust the attributes of the elements, such as the size, location, tags, font text, type and color, and so on, while applying special effects such as mirroring. The edited patterns were output and saved for further analysis.

2.2 Feature Extraction

In order to perform the comparison between the design patterns, the attributes for every pattern need to be extracted and represented structurally. Since every pattern contains a variable number of heterogeneous objects with different attributes, conventional feature representation based on the bag-of-words model seems too simple and inappropriate. Here the attributes edited for the image objects as mentioned in the previous section, together with the keywords for the objects, such as president, Obama, car, Cadillac, etc., were extracted and represented as the feature for the pattern. The feature for each pattern therefore contains a set of heterogeneous objects with various attributes, and is denoted as the "pattern feature".

2.3 Similarity and Search

Because the pattern feature is not based on the bag-of-words model, the similarity between two patterns cannot be formulated as the typical cosine similarity. Assume the features of two patterns are p and q, where $p = \{p_i\}$, $q = \{q_j\}$, and the $p_i$'s and $q_j$'s are the image objects in the two patterns, respectively. The number of elements in p and q, and the types (e.g. head or scene) of the image objects $p_i$ and $q_j$, might be different, as shown in Fig. 3.

The similarity between $p_i$ and $q_j$ should be 0 if the two objects are of different types, and otherwise a score that measures the "similar attributes," as depicted below.

$$ s(p_i, q_j) = \sum_{a} s_a(x, y) \qquad (1) $$

if $p_i$ and $q_j$ are of the same type, and 0 elsewhere.

Note $s_a(x, y)$ is some positive score when the values x and y of an attribute $a$ for $p_i$ and $q_j$ meet the criteria of similarity, and is 0 otherwise. The criteria of similarity for the attributes could be defined flexibly according to the characteristics of the attributes, as shown in Table 1. Here the criterion for the location attribute, for example, is whether the two objects are close enough, while the criterion for the font color attribute is whether the distance between two font colors in color space is below some threshold.

In addition, a higher score could be assigned to a more important attribute when necessary. For the tag attribute, for example, the number of hit keywords can be taken into account for scoring. In this way, the similarity between two objects can be measured numerically by summing the similarities of the attributes. With the similarity $s(p_i, q_j)$ depicted in Eq. (1), the most matched object in q for every object in p can be found.

$$ j^{*}(i) = \arg\max_{j} s(p_i, q_j). \qquad (2) $$

Note the object $p_i$ and the object $q_{j^{*}(i)}$ must be of the same type. The similarity between p and q can then be defined as below.

$$ s(p, q) = \sum_{i} s(p_i, q_{j^{*}(i)}). \qquad (3) $$

Figure 2. A web interface for composing the comic pattern.

Figure 3. Similarity between two patterns.

Table 1. Attributes and criteria of similarity

Attribute       Criteria of similarity
id              the same image
location        distance is less than 30 pixels
size            difference is smaller than 10 pixels
transparency    difference of alpha is smaller than 30 (0-255)
color change    difference for specific band of RGB is lower than 30 (0-255)
name/tags       contains the same keyword
mirror          mirror effect is applied to both
font size       difference is smaller than 8 points
font type       the same font type
font text       contains the same keyword
font color      Euclidean distance of two colors in RGB space is smaller than 30 (0-255)

Furthermore, with the similarity between two patterns as depicted above, a query pattern can be compared with all patterns in the database one by one such that the N-best patterns can be obtained, as shown in the central part of Fig. 1. Note the definition of similarity in Eq. (3) is asymmetric, which is all right for the search problem. When applied to a clustering algorithm, however, a symmetric similarity or distance needs to be utilized instead, as defined below.

. (4)

.
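The object-level and pattern-level similarities of Eqs. (1)-(3) and the N-best search can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: only four of the Table 1 criteria are included, the unit scores are assumed, the dictionary keys are hypothetical, and `symmetric_similarity` simply averages both directions, which is one common symmetrization and not necessarily the form used in Eq. (4).

```python
def attribute_scores(a, b):
    """Per-attribute scores for two objects of the same type (subset of Table 1)."""
    scores = {}
    if a["id"] == b["id"]:
        scores["id"] = 1.0
    dx, dy = a["x"] - b["x"], a["y"] - b["y"]
    if (dx * dx + dy * dy) ** 0.5 < 30:        # location: closer than 30 pixels
        scores["location"] = 1.0
    if abs(a["size"] - b["size"]) < 10:        # size: differs by under 10 pixels
        scores["size"] = 1.0
    hits = len(set(a["tags"]) & set(b["tags"]))
    if hits:
        scores["tags"] = float(hits)           # score grows with shared keywords
    return scores

def object_similarity(a, b):
    """Eq. (1): sum of attribute scores, or 0 for objects of different types."""
    if a["type"] != b["type"]:
        return 0.0
    return sum(attribute_scores(a, b).values())

def pattern_similarity(p, q):
    """Eqs. (2)-(3): match each object in p to its best object in q and sum."""
    return sum(max((object_similarity(pi, qj) for qj in q), default=0.0)
               for pi in p)

def symmetric_similarity(p, q):
    """Assumed symmetrization: the average of both directions."""
    return 0.5 * (pattern_similarity(p, q) + pattern_similarity(q, p))

def n_best(query, patterns, n=10):
    """Rank named database patterns by similarity to the query; keep the top n."""
    ranked = sorted(patterns.items(),
                    key=lambda kv: pattern_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:n]]
```

The asymmetry of Eq. (3) shows up whenever the two patterns have different numbers of matching objects: a pattern with two copies of a head scores twice as much against a single-head pattern as the reverse comparison does.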

3 CLUSTERING AND STYLE ANALYSIS

Traditionally, a design style usually refers to a group of similar design patterns, in which implicit rules of attributes exist. To analyze the design styles therefore means to find out the groups of patterns and summarize their characteristics. In this section, the clustering algorithm was used to generate the clusters of patterns, on which the summarization for design styles could be further performed, as depicted on the right hand side of Fig. 1.

3.1 Clustering of Design Patterns

First, agglomerative clustering is conducted for the patterns. During the bottom-up agglomeration, the distances for all pairs of clusters were computed in order to decide the closest pair for merging. Note that the distance might be the maximum distance, the minimum distance, or the average distance, among all pairs of patterns. Here in this paper the maximum distance is used, as illustrated in Fig. 4.

When the agglomeration was finished, a dendrogram was built. A threshold for distance can then be used to split the dendrogram into clusters. The threshold stands for the upper bound of maximum distances among the patterns for all clusters, and can be adjusted to obtain desired number of clusters.

Agglomerative clustering is flexible with respect to the number of clusters and, by use of local connectivity, can better deal with outliers, from which partitioning clustering algorithms, such as the k-means algorithm, usually suffer. The limitation of agglomerative clustering is that it requires more computation for pair-wise distances and is suitable for a moderate amount of data only.
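The clustering procedure described in this section can be sketched as a short Python function. It is a minimal illustration assuming a precomputed symmetric distance matrix between patterns; the function name is hypothetical, and the quadratic search for the closest pair is kept simple rather than efficient.

```python
def complete_linkage_clusters(dist, threshold):
    """Agglomerative clustering with maximum (complete) linkage.

    dist is a symmetric matrix of pairwise pattern distances. Clusters are
    merged bottom-up until the closest pair is farther apart than threshold,
    which corresponds to cutting the dendrogram at that height.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Maximum distance among all pairs of patterns in the two clusters.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Raising the threshold yields fewer, larger clusters, and lowering it yields more, tighter ones, which is how the desired number of clusters can be tuned.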

3.2 Style Analysis

After the clustering, a few clusters of design patterns were produced. Since every cluster contains a set of patterns that are similar to one another, it is possible to collect the statistics of the attributes so as to find out the key attributes that contribute most to the similarity of the patterns in the cluster. This is the issue of summarization in data mining. However, since the attribute variables of image objects here are discrete instead of numerical, it is infeasible to summarize the cluster based on the means, variances, or ranges for the variables. In this section, an analysis approach based on the similarities of attributes was proposed, based on which the analysis and summarization on a cluster can be conducted.

As depicted in Eq. (1), the similarity between two objects, $p_i$ and $q_j$, is accumulated over their common attributes. To perform automatic style analysis on a cluster, the similarities between pairs of patterns in the cluster in Eqs. (2) and (3) can be computed again, and the statistics for the attributes can be collected at the same time. Meanwhile, the occurrences of keywords for names, tags and font texts can be recorded, and accordingly the style of the cluster can be summarized.
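The statistics collection can be pictured with the following Python sketch. It assumes the per-attribute scores of every matched object pair in a cluster have already been gathered as dictionaries; the function name and input layout are hypothetical, and the percentages are simply each attribute's share of the accumulated similarity.

```python
from collections import Counter

def summarize_cluster(pair_scores, keywords, top=3):
    """Summarize one cluster.

    pair_scores: list of {attribute: score} dicts, one per matched object pair.
    keywords: list of keyword lists, one per pattern in the cluster.
    Returns (percentage contribution per attribute, most frequent keywords).
    """
    totals = Counter()
    for scores in pair_scores:
        totals.update(scores)          # adds the scores attribute by attribute
    grand = sum(totals.values())
    contributions = (
        {a: 100.0 * v / grand for a, v in totals.items()} if grand else {}
    )
    frequency = Counter(k for words in keywords for k in words)
    return contributions, [k for k, _ in frequency.most_common(top)]
```

The two return values correspond to the two kinds of summary reported for each cluster: the attribute contributions and the high-frequency keywords.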

4 EXPERIMENTS

709 comic patterns designed by 66 designers were collected from the web interface. The approach illustrated in Section 2 was first used to search for the 10-best patterns that match the query pattern most, and the results are shown in Fig. 5. The top-left icons in Fig. 5 are the query patterns while the rest are the retrieved patterns. As can be observed in this figure, the proposed approach is able to find out similar patterns that contain the same elements or the elements with similar attributes (the similar size, location, tags, etc.). Through the pattern search, the designers can learn efficiently how those elements they use are utilized differently and innovatively by other designers, which can help to enhance design knowledge and skills effectively.

Figure 4. Agglomerative clustering and dendrogram.

Besides, the clustering algorithm depicted in Section 3 was conducted on the patterns and 46 clusters were finally produced. Three example clusters, each with a part of its patterns, are displayed in Fig. 6. As can be observed, the patterns in every cluster (row) exhibit a high degree of similarity. The patterns in the first cluster, for example, have a common head and body action, while those in the second cluster share the same background and body. The visual cues are very prominent and can help the learners to draw conclusions on the style of the cluster intuitively.

When style analysis was further applied, the statistics and contributions of the attributes for every cluster could be obtained. For the three clusters in Fig. 6, the corresponding contributions of the attributes and the summary of keywords are further displayed in Table 2. As can be seen in this table, cluster 3 has relatively high similarity on the size attribute, while cluster 1 has the highest similarity on the tags due to more common elements. The keywords with high frequency are summarized in the last row, which indicates the style of the cluster.

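Table 2. Contributions of the attributes to similarity.

Attribute     Cluster 1   Cluster 2   Cluster 3
id            24.36%      20.23%      23.08%
location      5.95%       5.84%       7.69%
size          12.18%      14.01%      23.08%
name          12.18%      13.23%      11.54%
tags          42.78%      40.08%      34.62%
font size     1.42%       0.78%       0
font type     0.85%       0.39%       0
font color    0.28%       5.45%       0

Summary of keywords with high frequency:
Cluster 1: 江南大叔 (Gangnam uncle), 朴載相 (Park Jae-sang), 騎馬舞西裝 (horse-riding dance suit), 騎馬舞 (horse-riding dance)
Cluster 2: 太空基地 (space base), 鋼鐵人 2 (Iron Man 2), 鋼鐵人 (Iron Man)
Cluster 3: 財神爺 (God of Wealth), 周星馳 (Stephen Chow), 戴著財神帽 (wearing a God of Wealth hat)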

5 CONCLUSIONS

This paper proposed the architecture for representing, comparing, searching and analyzing design patterns. The approach can facilitate the learning of design, and has been verified successfully on a design support system for the creation of comic figures. It not only appears promising in the domain of digital content creation, but exhibits great commercial potential in the era of design and education.

6 REFERENCES

[1] Rahul Swaminathan, Robert Schleicher, Simon Burkard, Renato Agurto and Steven Koleczko. (2013). Happy Measure: Augmented Reality for Mobile Virtual Furnishing. International Journal of Mobile Human Computer Interaction. pp.16-44.

[2] Zulikha Jamaludin. (2011). Designing Interface for Girls: Looking at the 'Form' and 'Content' Factors in a Comic Container. User Science and Engineering (i-USEr). pp.36-41. ISBN:978-1-4577-1654-6.

[3] DaeKyu Jung, Hui Yong Kim, Han-Kyu Lee, JeHo Nam and Jin Woo Hong. (2008). Development of a Packaging Storage Format for Electronic Comic Services with Metadata. Consumer Electronics. ICCE 2008. Digest of Technical Papers. ISBN:978-1-4244-1458-1.

[4] Kohei Arai and Tolle Herman. (2010). Method for Automatic E-Comic Scene Frame Extraction for Reading Comic on Mobile Devices. Information Technology: New Generations (ITNG). pp.370-375. ISBN:978-1-4244-6270-4.

[5] Thomas Kohler, Johann Fueller, Kurt Matzler, and Daniel Stieger. (2011). Co-Creation in Virtual Worlds: The Design of the User Experience. MIS Quarterly 35.3. pp.773-788.


Figure 5. Example of top-10 search results for 2 queries.

Figure 6. Example three clusters with a part of patterns.

Elastic Warping of Radial Features for Shape Alignment in Sketch Retrieval

Bor-Shen Lin, Wen-Chi Yeh

National Taiwan University of Science and Technology, Taipei, Taiwan

Yen-Chun Lin

Chang Jung Christian University, Tainan, Taiwan

Abstract—Query by sketch, which allows users to issue the query intuitively and flexibly, is an easy way of retrieving objects. In this paper, we propose an elastic warping scheme of radial features for shape alignment in the spatial domain based on the dynamic time warping algorithm. This scheme can align flexibly the non-silhouette shapes consisting of point sets with adjustable constraints. Through the optimization of the algorithm, the scheme can achieve significantly better performance than direct matching of shapes, and exhibit error tolerance for translation, scaling, or rotation. It can also apply to the retrieval of real images containing natural or artificial objects. Experimental results show that the approach can retrieve the target images successfully. Furthermore, it can find totally different images with similar shapes.

Keywords-sketch retrieval; radial feature; shape alignment; elastic warping; shape recognition

I. INTRODUCTION

Content-based image retrieval (CBIR) [1] has been an important research area for more than a decade because the information an image can represent is tremendous and often cannot be described precisely with words. Through CBIR, the user is allowed to retrieve images in intuitive and flexible ways, e.g., query by example and query by sketch [2-4].

Query by example allows users to issue the query with example images, while query by sketch gives users the freedom of issuing the query with the primitive features such as shapes or colors. In some applications like personal sketch book, query by sketch is more attractive because it gives the user larger space for imagination when the user may draw a sketch by hand freely [4]. This is desirable because the goals of image retrieval include not only obtaining the target images but finding out interesting images with similar features in the database.

In most CBIR systems, images are usually matched according to the structural or statistical information such as shape, texture, color, and spatial relationships. The features obtained from statistics, such as histograms or trigrams [5-7], often contain the global statistical information but lose the spatial information, so they are usually appropriate only for image retrieval based on color or texture.

To retrieve similar objects based on the shape, some contour-based or region-based shape descriptors with associated matching schemes have been proposed [8-14]. Many approaches were proposed for matching the contours. The curvature scale space method, for instance, extracts curvature scale space maxima from the contour such that the shapes can be compared [10], while the active shape model aligns two contours directly through a linear transformation [11]. The moment Fourier descriptor, on the other hand, can match two regions according to the features derived from the moments of line segments [12]. There is also an elastic matching method that tries to match a contour with the gradient distribution of an image [13]. Among these approaches, few preserve and make use of precise spatial information, and only a few of them are appropriate for the alignment of non-silhouette shapes [14].

In this paper, an elastic warping scheme of radial features for shape alignment in the spatial domain, based on the dynamic time warping algorithm [15], is proposed. This scheme can flexibly align non-silhouette shapes consisting of point sets, subject to adjustable constraints. Through the optimization of the algorithm, it is verified that the scheme can achieve significantly better performance than direct matching of shapes, and exhibit error tolerance for translation, scaling, or rotation. The scheme is also applied to the retrieval of real images containing natural or artificial objects. Experimental results show that the approach is able to retrieve target images successfully. Furthermore, it can find totally different images with similar shapes.

II. SKETCH REPRESENTATION

A. Sketch Normalization

A sketch is represented as a set of 2D points, which is denoted as point set P. The centroid of the point set P can then be computed as the average of all points in P, as follows.

$$ \mathbf{c} = \frac{1}{|P|} \sum_{\mathbf{p} \in P} \mathbf{p}, \tag{1} $$

where |P| is the number of points in P. In order for the query sketch to be translation invariant, all points are shifted to a coordinate system whose origin is the centroid. That is,

$$ \tilde{\mathbf{p}} = \mathbf{p} - \mathbf{c} \quad \text{for every } \mathbf{p} \in P. \tag{2} $$

The shifted points can be further represented in the polar coordinate system such that the calibration for rotation between the query sketch and the target sketch can be done with less effort. If point $\tilde{\mathbf{p}}$ is represented as $(x, y)$ in the Cartesian coordinate system, then the polar coordinate $(r, \theta)$ can be computed by

$$ r = (x^2 + y^2)^{1/2}, \qquad \theta = \tan^{-1}(y/x). \tag{3} $$

The derived point set $\tilde{P} = \{(r, \theta)\}$ can then be used to derive the radial feature in the following section.

In addition, the query sketch and the target sketch might be of different scale. To normalize the scale, the radius can be divided by the standard deviation of radius as described below.

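$$ r' = r/\sigma, \qquad \sigma = \sqrt{\frac{1}{|P|} \sum_{\mathbf{p} \in P} |\mathbf{p} - \mathbf{c}|^2}. \tag{4} $$

The new point set $\tilde{P}' = \{(r', \theta)\}$, with normalized radii, can also be used to compute the radial feature. The point sets $\tilde{P}$ and $\tilde{P}'$ are referred to as no-scaling and scaling, respectively, in the experiments of later sections.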
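The normalization steps in (1) to (4) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `normalize_sketch` and the `scale` flag are hypothetical, `math.atan2` is used in place of $\tan^{-1}(y/x)$ so that the quadrant of each point is resolved, and $\sigma$ is taken to be the root-mean-square distance of the points from the centroid.

```python
import math

def normalize_sketch(points, scale=True):
    """Return polar coordinates (r, theta) of 2D points relative to their centroid.

    points: list of (x, y) tuples. theta is mapped into [0, 2*pi).
    With scale=True the radii are divided by sigma (assumed here to be the
    RMS distance from the centroid), giving the "scaling" variant.
    """
    n = len(points)
    # Centroid: average of all points.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Shift so that the centroid becomes the origin (translation invariance).
    shifted = [(x - cx, y - cy) for x, y in points]
    # Convert to polar form; atan2 is an assumption that resolves the quadrant.
    polar = [(math.hypot(x, y), math.atan2(y, x) % (2 * math.pi))
             for x, y in shifted]
    if scale:
        sigma = math.sqrt(sum(r * r for r, _ in polar) / n)
        if sigma > 0:
            polar = [(r / sigma, theta) for r, theta in polar]
    return polar
```

Calling `normalize_sketch(points, scale=False)` corresponds to the no-scaling point set, and the default call to the scaling point set.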

B. Radial Features

A normalized point set, $\tilde{P}$ or $\tilde{P}'$, can be further converted into the radial feature $\vec{S}$ that consists of a sequence of point sets,

$$ \vec{S} = \{S_k\}_{k=0}^{N-1}, \qquad S_k = \{\, r : \theta = 2\pi k/N \,\}, \tag{5} $$

where N is the angular sampling rate. As described above, the points in the normalized point set are divided into N subsets, $S_k$, according to the directions of the points relative to the centroid, which correspond to $\theta = 2\pi k/N$. The angular sampling rate N influences both the quantization errors on the angle and the quality of the radial feature. If N is 360, for example, the value of k could be 0, 1, through 359, which accounts for the precision of 1 degree. If the sampling rate is too low, the quantization errors would be high, which might degrade the retrieval performance.

Two similar sketches without normalization of radius and their corresponding radial features are shown in Fig. 1 (N = 360). In Fig. 1 the horizontal axis is the angle index in degrees, while the vertical axis is the radius in pixels. It can be seen in this figure that the point sets $S_k$ for individual angles might contain different numbers of points. The point set $S_{60}$ at degree 60, for instance, contains two points, while the point set $S_0$ at degree 0 contains only one point. Therefore, the radial feature is a sequence of point sets with a variable number of points. In addition, it can be observed in Fig. 1 that the two similar sketches on the left-hand side have similar but slightly different radial features on the right-hand side. How two radial features can be matched appropriately with each other will be discussed in later sections.
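As a rough illustration of how a radial feature might be built, the sketch below groups the radii of polar points into N angular bins. The function name `radial_feature` and the parameter `n_angles` are hypothetical, and rounding each angle to the nearest sampling direction is an assumption; the text does not state which quantization rule is used.

```python
import math

def radial_feature(polar_points, n_angles=360):
    """Group radii by quantized angle.

    polar_points: list of (r, theta) with theta in radians.
    Returns a list of n_angles lists; entry k holds the radii of the
    points whose angle is closest to 2*pi*k/n_angles.
    """
    bins = [[] for _ in range(n_angles)]
    for r, theta in polar_points:
        # Nearest-direction quantization (assumed); wrap 2*pi back to bin 0.
        k = int(round(theta / (2 * math.pi) * n_angles)) % n_angles
        bins[k].append(r)
    return bins
```

Bins may be empty or hold several radii, which matches the observation that the point sets for individual angles contain different numbers of points.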

Figure 1. Example sketches with corresponding features.

III. SKETCH ALIGNMENT AND RETRIEVAL

A. Elastic Alignment

In an information retrieval system, it is a fundamental issue to compare the query with every document in the database based on the features. It is hence indispensable to define the similarity or distance between the radial features of the sketches here. Assume the angular sampling rate is held constant. The radial features for the query and the document are then sequences of the same length, denoted by $\vec{Q} = \{Q_i\}_{i=0}^{N-1}$ and $\vec{D} = \{D_j\}_{j=0}^{N-1}$, respectively. Since the two sequences are of the same length, it is easy to compute the distance between them by straight alignment. However, since the sketches are drawn by hand freely, there might be some, if not many, warping errors between two sketches even though they look similar, as can be observed from Fig. 1. That is to say, straight alignment between two sequences is perhaps not the best strategy.

To solve this issue, an elastic warping scheme based on dynamic time warping (DTW) is proposed, as depicted in Fig. 2. The goal of DTW is to find the optimally matched path with the smallest path distance between two sequences. This method allows every node in the search space to remain in the same state, horizontally or vertically, or to progress to the next state, as depicted in Fig. 2. Since the path is continuous on the state space, path continuity is the basic constraint on the alignment of the two sequences.

To further limit the degree of warping, two mechanisms are applied: a maximum number of repeated states and a penalty. The penalty is a soft constraint that penalizes (increases) the path distance whenever a repeated state occurs, while the maximum number of repeated states puts a hard constraint on the total number of repeated states for any candidate path. These two constraints can be adjusted appropriately to obtain better alignment between two sequences of radial features such that the distance can be more discriminative.


Figure 2. Elastic warping based on dynamic time warping algorithm.

* the number of repeated states is increased, too, and the term is ignored in case it exceeds a maximum value.
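The warping recursion with its two constraints can be sketched as follows. This is a minimal sketch, not the exact algorithm of Fig. 2: the names `elastic_dtw`, `dist`, `penalty`, and `max_repeats` are hypothetical, the table is padded with an extra row and column so that the final cell has index (N, N), and each cell simply remembers the repeat count of its best predecessor, following the footnote above, rather than searching over all repeat counts. The result is the path distance divided by the path length.

```python
import math

def elastic_dtw(query, doc, dist, penalty=0.0, max_repeats=None):
    """Length-normalized DTW distance with a repeat penalty and a repeat limit.

    dist(a, b) is any local distance between one element of each sequence.
    """
    n, m = len(query), len(doc)
    inf = math.inf
    # Each cell holds (accumulated distance, repeated states used, path length).
    table = [[(inf, 0, 0)] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0.0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(query[i - 1], doc[j - 1])
            best = (inf, 0, 0)
            # Diagonal move: both sequences advance, no repeated state.
            cost, reps, length = table[i - 1][j - 1]
            if cost < inf:
                best = (cost + local, reps, length + 1)
            # Vertical and horizontal moves repeat a state of one sequence.
            for pi, pj in ((i - 1, j), (i, j - 1)):
                cost, reps, length = table[pi][pj]
                if cost == inf:
                    continue
                if max_repeats is not None and reps + 1 > max_repeats:
                    continue  # hard constraint: this term is ignored
                cand = (cost + local + penalty, reps + 1, length + 1)
                if cand[0] < best[0]:
                    best = cand
            table[i][j] = best
    total, _, length = table[n][m]
    return total / length if length else inf
```

Setting `max_repeats=0` reduces the search to straight alignment along the diagonal, which gives a convenient baseline when tuning the two constraints.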

In addition, in the dynamic time warping algorithm, the distance function, $\mathrm{dist}(Q_i, D_j)$, between two elements in the two sequences needs to be defined. Since the elements of the feature sequence here are the point sets with variable number of points, it is necessary to define the distance function for two point sets with probably different number of points. To make the distance robust to the shapes of sketches with sophisticated internal texture, in this paper a fuzzy distance function is defined as the minimum among the distances between any pairwise points in $Q_i$ and $D_j$,

$$ \mathrm{dist}(Q_i, D_j) = \min_{u \in Q_i,\ v \in D_j} |u - v|, \tag{6} $$

and the computation is further illustrated in Fig. 3. The distance function in (6) can be applied to the warping algorithm described in Fig. 2 so as to obtain the optimal path with associated path distance. Finally, the distance between the two sequences $\vec{Q}$ and $\vec{D}$ is then the path distance normalized by the path length L.

$$ d_{DTW}(\vec{Q}, \vec{D}) = D(N, N)/L. \tag{7} $$

B. Sketch Retrieval Scheme

With the feature representation and distance measure for two sketches as described previously, it is possible to build the sketch database and the retrieval system, as shown in Fig. 4. Every sketch in the database, here denoted as sketch document $D_l$, must first be represented in the form of a point set as illustrated in Section II, where l is the index of the document. The sketch documents, $\{D_l\}$, are then converted into the radial features, $\{\vec{D}_l\}$, as shown on the right-hand side of Fig. 4. A query sketch denoted as Q, on the other hand, can also be converted into the radial feature, $\vec{Q}$. The distances between the query feature and the document features can then be computed one by one, and further ranked to find the N-best sketches that have the lowest distances to the query sketch.

Figure 3. Computation of distance between two point sets.
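The minimum-over-pairs distance between two point sets can be written compactly, as in the sketch below. The function name `point_set_distance` is hypothetical, the elements are assumed to be scalar radii taken at the same angle index, and the `empty_value` parameter is an assumption added for illustration, since the text does not say how an empty point set should be treated.

```python
import math

def point_set_distance(q_set, d_set, empty_value=math.inf):
    """Smallest absolute difference between any radius in q_set and any in d_set."""
    if not q_set or not d_set:
        # Behaviour for empty sets is not specified in the text (assumption).
        return empty_value
    return min(abs(u - v) for u in q_set for v in d_set)
```

Because only the closest pair counts, extra points inside a shape (for example internal texture) do not increase the distance as long as one pair of radii agrees.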

Figure 4. Architecture of the sketch retrieval system.
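The ranking stage of the retrieval architecture can be outlined as below. This is a hedged sketch: `retrieve_n_best`, `distance`, and `n_best` are hypothetical names, and the distance function is passed in so that any sequence distance can be plugged in.

```python
import heapq

def retrieve_n_best(query_feature, doc_features, distance, n_best=10):
    """Return the indices of the n_best documents with the lowest distance."""
    # Pair each distance with its document index; ties fall back to the index.
    scored = ((distance(query_feature, feat), idx)
              for idx, feat in enumerate(doc_features))
    return [idx for _, idx in heapq.nsmallest(n_best, scored)]
```

The document features only need to be extracted once when the database is built, while the query feature is extracted at query time.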

C. Evaluation Metrics

Conventionally, precision and recall are popular metrics for evaluating the performance of information retrieval systems. However, the precision can only account for the N-best inclusion rate without considering the ranking indexes of the included items. Mean average precision (MAP), on the other hand, is also a widely used metric that can tell whether the rankings of included items are good or not. The average precision for a query is computed by

$$ AP = \frac{1}{R} \sum_{i=1}^{R} \frac{i}{r(i)}, \tag{8} $$

where R is the number of relevant items included in the N-best output, and r(i) is the ranking index of the i-th relevant item. For example, when the relevant items in the 10-best output are located at ranks 1, 2, and 3, the AP is $\frac{1}{3}\left(\frac{1}{1} + \frac{2}{2} + \frac{3}{3}\right) = 1$. If they are located at ranks 8, 9, and 10, the AP becomes $\frac{1}{3}\left(\frac{1}{8} + \frac{2}{9} + \frac{3}{10}\right) \cong 0.2157$. The mean of the APs for all $N_Q$ queries can then be computed:

$$ MAP = \frac{1}{N_Q} \sum_{j=1}^{N_Q} AP_j. \tag{9} $$

D. Direct Matching of Point Sets

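In the proposed retrieval scheme shown in Fig. 4, the distance between two sketches is computed based on the distance for the radial features. It is also possible to compute the distance based on the match of normalized point sets ($\tilde{P}$ or $\tilde{P}'$) as defined in Section II. That is,

$$ d_{DM}(\tilde{Q}, \tilde{D}) = \frac{1}{|\tilde{Q}|} \sum_{\mathbf{q} \in \tilde{Q}} d_{EU}(\mathbf{q}, \mathbf{d}^*), \qquad \mathbf{d}^* = \arg\min_{\mathbf{d} \in \tilde{D}} d_{EU}(\mathbf{q}, \mathbf{d}). \tag{10} $$

Note that $\mathbf{d}^*$ is the point in document $\tilde{D}$ which is the closest to point $\mathbf{q}$ in query $\tilde{Q}$, and $d_{EU}(\mathbf{x}, \mathbf{y})$ is the Euclidean distance between the two points. The distance $d_{DM}(\tilde{Q}, \tilde{D})$ can replace the distance $d(\vec{Q}, \vec{D})$ in Fig. 4, and will be compared with DTW in a later section.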
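For comparison, direct matching of two point sets can be sketched as a mean nearest-neighbour distance. The function name `direct_matching_distance` is hypothetical, and the points are assumed to be given as centroid-shifted Cartesian pairs so that the Euclidean distance applies directly.

```python
import math

def direct_matching_distance(query_points, doc_points):
    """Mean Euclidean distance from each query point to its nearest document point."""
    total = 0.0
    for qx, qy in query_points:
        # Nearest document point to this query point.
        total += min(math.hypot(qx - dx, qy - dy) for dx, dy in doc_points)
    return total / len(query_points)
```

Note that this measure is not symmetric: swapping the query and the document generally changes the value, because the average is taken over the query points only.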
