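Codes for Intellectual Property Protection

National Chiao Tung University

Department of Applied Mathematics

Master's Thesis

On Codes for Copyright Protection

Graduate student: Li, Guan-Cheng

Advisor: Professor Michael Fuchs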

On Codes for Copyright Protection

Student: Li, Guan-Cheng

Advisor: Michael Fuchs

National Chiao Tung University

Department of Applied Mathematics

Master's Thesis

A Thesis

Submitted to Department of Applied Mathematics College of Science

National Chiao Tung University in Partial Fulfillment of the Requirements

for the Degree of Master

in

Applied Mathematics June 2006

Hsinchu, Taiwan, Republic of China

Li, Guan-Cheng

On Codes for Copyright Protection

Master Thesis

In Partial Fulfillment of Requirement

For the Degree of Master

Advisor:

Professor Michael Fuchs

Submitted to

Institute of Applied Mathematics

College of Science

National Chiao Tung University

Hsinchu, Taiwan, Republic of China

Contents

Preface

Acknowledgement

1 Introduction

2 Definitions and Basics
2.1 Some Coding Theory
2.1.1 Hamming Distance
2.1.2 Hamming Weight
2.1.3 Minimum Distance
2.1.4 Error Correcting Code
2.1.5 Code Composition
2.2 Descendence
2.3 Frameproof code
2.4 Secure Frameproof code
2.4.1 Separating Weights
2.5 Identifiable parent property code
2.6 Traceability code
2.7 Relations

3 Hash Families and Codes
3.1 Hash Functions
3.2 Perfect Hash Families
3.3 Separating Hash Families
3.4 Difference Matrices
3.5 Set Systems
3.6 Sandwich Free Families
3.7 Secure Codes

4 Unreadable Marks and PTT
4.1 Unreadable Marks
4.2 Probabilistic Traitor Tracing

5 Constructions of SFP Codes
5.1 Hadamard Matrices and Jacobsthal Matrices
5.3 Concatenation Method
5.4 Conversion from Hash Families
5.5 Linear Codes
5.6 Comparisons

6 Summary

A Acronyms

Preface

Illegal duplication is a major problem in many areas. For digital media, duplication is especially easy because copying is immediate and no information is degraded in the process. In addition, the growth of the Internet makes it possible to distribute the material on a much larger scale than before. Because of both technical and legal issues, it is often difficult to find and prosecute the pirates. Hence, protecting digital copies is a complicated task. Recently, electronic fingerprinting was devised as a method to discourage people from illegally redistributing their legally purchased copy.

Electronic fingerprinting deals with the problem of object identification through the use of electronic marks that are unique to each object. We consider fingerprinting for two purposes: protecting innocent users from being framed, and tracing illegitimately copied and distributed data, so-called pirate copies.

We examine the possibilities of designing fingerprinting codes that are resistant to tampering. We show that, under certain assumptions, we are often able to protect blameless users and even trace back the criminals.

Also, with the model we describe, the result of tracing should be reliable. That is, our tracing may fail in the sense that no pirates are identified, but we should not mistakenly accuse an innocent user. In this thesis, we mainly focus on a number of code constructions and discuss their mathematical properties against piracy.

Acknowledgement

First of all, I am especially grateful to my supervisor, Professor Michael Fuchs, for his unfailing patience in giving me professional guidance on mathematics research. He is an enthusiastic mentor who taught me not only the essence of the theory but also how to articulate it clearly and convincingly. This is particularly valuable for me since my traditional education trained me to be a problem solver rather than a seller of my theory.

I want to thank Professor Kar-Kin Zao of the Institute of Computer Science for the inspiration provided by his course “Internet Security”, which gave me related ideas on electronic security.

I would also like to thank Professor Ta-Yuan Huang for his endless encouragement, in particular regarding the undergraduate training in algebra in his class, which later introduced me to wonderful fields of mathematics and computer science.

Since this is my first time typesetting in LaTeX, I owe many thanks to my senior classmates for giving me advice about the typesetting of a thesis. Finally, I would like to express my appreciation for the support of my beloved girlfriend, Yu-Han, and my family, and I ascribe the completion of my degree to my family members.

Li, Guan-Cheng
National Chiao Tung University
May 16, 2006


Chapter 1 Introduction

Conventional mechanisms for copyright protection are incapable of handling digital data owing to the essentially different nature of such documents. This motivates the development of other means of deterring pirates from illegally redistributing products. Digital fingerprinting, for example, can serve this purpose. A fingerprint is a sequence of numbers added to digital data that can be detected or extracted later to make an assertion about the data. Fingerprints can be applied in several areas, including:

• Ownership assertion
• Authentication and integrity verification
• Content labeling
• Digital watermarking
• Access control protocols
• Content protection
• Detection of copyright violations
• Secure on-line multimedia distribution
• Resource usage control
• Trust and trust management

With digital fingerprinting, a publisher embeds a unique fingerprint into each distributed copy of a document and keeps a database of sold copies and their corresponding buyers. If an illegally distributed copy is discovered, the publisher wants to trace it back to the unauthorized user by comparing its fingerprint with the database. Because each fingerprint is unique, the pirates have to distort the marks in the document. In order to redistribute illegal copies anonymously, a pirate may try different types of attacks to uncover the fingerprint. Assuming that the pirate has access to a single copy of the document that has been marked for him, he may try to restore the original document by identifying and removing the fingerprint. However, such an attack is unlikely to succeed if the fingerprint is hidden carefully and scattered all over the document. A stronger attack results if several pirates collude and compare their independently marked copies. They can identify the hidden fingerprint by locating the differences among their copies, replace the marks at these positions with other feasible marks, combine their copies into several new ones whose fingerprints differ from those of all the pirates, and resell the pirated products without worrying about being caught. The copies obtained by such replacements are called the descendants of the coalition, as will be made precise in Chapter 2.

Frameproof codes were introduced by Boneh and Shaw [8] as a method of digital fingerprinting which prevents a coalition of a specified size $w$ from framing a user not in the coalition. Several constructions of $w$-frameproof codes were introduced later on, mainly by Stinson, Wei, Encheva, and Cohen [12,14,30].

Besides the design of frameproof codes against piracy, an efficient traitor tracing algorithm is needed in order to identify the offenders. The traitor tracing problem was introduced by Chor, Fiat and Naor for broadcast encryption systems, where the data should be accessible only to authorized users. When an illegal copy produced by a group of authorized users of the copyrighted material is detected, traitor tracing schemes make it possible to trace back at least one of its producers. In particular, these schemes are suitable for pay-per-view TV applications. We consider, as an example, a pay-per-view movie scenario introduced by Fiat and Tassa. In this scenario, the content is divided into $n$ segments. Each of these segments is marked with one of $q$ different symbols. Each user receives a differently marked copy of the content. The ordered sequence of marks of each copy can be written as a $q$-ary vector of length $n$. A coalition of colluding users can make an illegal copy by combining different segments of their data and broadcasting it. After an illegal copy is detected, traitor tracing schemes attempt to reveal at least one traitor. Practical applications need to accommodate as many users as possible when the number of symbols available for marking the data is restricted. On the other hand, some digits of the codewords, whether registered or pirated, might be erased or become undetectable, either accidentally or deliberately. Therefore, codewords may need to differ in more than one position in order to be fault-tolerant.

Several codes providing some form of traceability have been designed for use in these schemes and have been studied extensively in recent years. The weak forms are frameproof (FP) codes and secure frameproof (SFP) codes. Stronger forms include identifiable-parent-property (IPP) codes, introduced by Hollmann, van Lint, Linnartz and Tolhuizen [21], and traceability (TA) codes, introduced by Chor, Fiat and Naor [10]. Such codes allow the tracing of at least one parent of any illegal copy when the size of the coalition of colluders does not exceed some given number $w$. Their combinatorial properties and related structures have been studied by Hollmann et al., Staddon, Stinson and Wei, Barg et al., and Sarkar [28,30,31,21].

As a matter of fact, TA codes turn out to be a subclass of IPP codes, IPP codes are a subclass of SFP codes, and SFP codes are a subclass of FP codes. These codes will be defined formally in Chapter 2. Their relationship with hash families will be treated in Chapter 3.

The aim of this thesis is to study the above codes in the presence of unreadable marks. In such a situation, Boneh and Shaw [8] pointed out that codes with traitor tracing properties do not exist. This will be made precise in Chapter 4. They provided an alternative, slightly weaker form of traceability codes by using randomness and probabilistic traitor tracing. Their work is important from an application point of view because it trades off some accuracy for a fast traitor-tracing algorithm that works when undetectable marks exist. Hence, IPP and TA codes are interesting only from a theoretical point of view and are less applicable because they do not tolerate undetectable marks. The probabilistic traitor tracing (PTT) algorithm due to Boneh and Shaw will be presented in the second half of Chapter 4.

However, it should be pointed out that if there are too many unreadable marks, then even the probabilistic approach fails. An extreme case would be a codeword filled with unreadable marks, which the distributor cannot recognize at all, let alone trace back. However, pirated products with many unreadable marks will soon be detected by the distributor, and in practical situations the pirates will scatter only a few unreadable marks over the products in order to convince customers that the pirated products are copyrighted ones.

On the other hand, FP and SFP codes are immune to undetectable marks. Since SFP is stronger than FP, SFP codes find more practical applications, such as multi-license distribution. In such a scenario a distributor sells his products to an institution instead of an individual. The distributor then gives a few codewords as a base from which more codewords are generated for the employees of the institution. The distributor certainly hopes that the base codewords exhibit the secure frameproof property so that the codewords authorized to each institution can be treated independently.

We conclude the introduction with a sketch of the thesis. In Chapter 2, we provide the basic definitions, which are more general than the original definitions given by Stinson in [30]. Chapter 3 is dedicated to the relationship between hash families and codes. In Chapter 4, we study unreadable marks and the probabilistic approach, and prove that IPP and TA codes do not exist in the presence of unreadable marks. Finally, in Chapter 5, we investigate explicit constructions of SFP codes. Most of the results in Chapters 4 and 5 are taken from the literature. However, we have tried to increase clarity by adding more details and giving simplified proofs of many results. Moreover, we have tried to give a complete picture by incorporating all presently known results concerning codes for copyright protection in the presence of unreadable marks.


Chapter 2 Definitions and Basics

2.1 Some Coding Theory

Throughout the thesis, we denote by $N$ the code length, by $n$ the code size, and by $q$ the size of the alphabet of a code $C$.

2.1.1 Hamming Distance

Definition: The Hamming distance $d_H$ between two codewords is the number of positions in which their entries differ.

Example 2.1. $d_H(11001, 01101) = 2$.

2.1.2 Hamming Weight

Definition: The Hamming weight of a codeword is the number of its nonzero entries.

Example 2.2. The Hamming weight of $(1, 0, 1, 1, 0)$ is usually denoted by $\mathrm{weight}(1, 0, 1, 1, 0) = 3$.

2.1.3 Minimum Distance

Definition: The minimum distance of a code $C \subseteq P^N$ is the least Hamming distance $d_H(x, y)$ between any pair of different codewords $x, y \in C$.
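The three notions defined so far can be checked mechanically. The following is a minimal Python sketch; the function names are illustrative choices and not part of the thesis, and codewords are assumed to be strings over an alphabet whose zero symbol is "0".

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))

def hamming_weight(x, zero="0"):
    """Number of entries different from the zero symbol."""
    return sum(a != zero for a in x)

def minimum_distance(code):
    """Least Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

print(hamming_distance("11001", "01101"))  # 2, as in Example 2.1
print(hamming_weight("10110"))             # 3, as in Example 2.2
```

The minimum distance is computed by brute force over all pairs, which is adequate for the small codes used as examples here.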

2.1.4 Error Correcting Code

Definition: An $(N, k, d)_q$-Error Correcting Code (ECC) is a $q$-ary linear code of dimension $k$, code length $N$, and minimum Hamming distance $d$ between any two codewords. It follows that the code rate is $R = k/N$ and the code size is $q^k$. In some situations we also specify by $D$ the maximum Hamming distance between any two codewords. Normally we omit the subscript in the binary case. In the nonlinear case, an $(N, n, q)$ code is a $q$-ary code of length $N$ with code size $n$. Its rate is computed as $N^{-1} \log_q n$. The following two nonlinear codes are of interest for practical applications. One is the constant-weight code, a binary code whose codewords all have a fixed number of 1's. The other is the equidistant code, a code in which any two codewords have the same Hamming distance. We introduce some more terminology for linear ECCs as follows:

Theorem 2.1 (Singleton Bound). For a code $C : P^k \to P^N$ with minimum distance $d$, $N \ge k + d - 1$.

Codes attaining equality in the Singleton Bound are called Maximum Distance Separable (MDS) codes.

A code $C$ with odd $d$ is said to be a Perfect Code if for every word $w$ of length $N$ not in $C$, there is a unique codeword $w' \in C$ such that $d_H(w, w') \le (d-1)/2$.

2.1.5 Code Composition

Definition: Let $A$ be an $(N_2, n_2, q_2)$ code over an alphabet $Q_2$ with $|Q_2| = q_2$, say $Q_2 = \{a_1, \ldots, a_{q_2}\}$, and let $B = \{b_1, \ldots, b_{q_2}\}$ be an $(N_1, q_2, q_1)$ code. Let $\theta : Q_2 \to B$ be the one-to-one mapping defined by $\theta(a_i) = b_i$ for $1 \le i \le q_2$. For any codeword $a = (a_1, \ldots, a_{N_2}) \in A$ we denote by $\tilde{a} = (\theta(a_1), \ldots, \theta(a_{N_2})) = (b_1, \ldots, b_{N_2})$ the $q_1$-ary sequence of length $N_1 N_2$ obtained from $a$ by using $\theta$. The set

$$A \star B = \{\tilde{a} = (b_1, \ldots, b_{N_2}) \mid (a_1, \ldots, a_{N_2}) \in A\}$$

is called the $(N_1 N_2, n_2, q_1)$ concatenation code of $A$ and $B$, with inner code $A$ and outer code $B$.
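As an illustration of the composition, the sketch below replaces every symbol of a codeword of A by the codeword of B assigned to it by the map θ. The function name and the toy codes are hypothetical and chosen only for this example; codewords are assumed to be strings.

```python
def concatenate(code_a, code_b, alphabet_a):
    """Replace every symbol of each codeword of A by a codeword of B.

    theta maps the i-th symbol of alphabet_a to the i-th codeword of B.
    """
    if len(code_b) != len(alphabet_a):
        raise ValueError("B needs exactly one codeword per symbol of A's alphabet")
    theta = dict(zip(alphabet_a, code_b))
    return ["".join(theta[s] for s in word) for word in code_a]

# Hypothetical toy data: A is a ternary code of length 2, B a binary code of length 3.
A = ["01", "12", "20"]
B = ["000", "011", "101"]
print(concatenate(A, B, "012"))  # ['000011', '011101', '101000']
```

The result has length 3 · 2 = 6, the same number of codewords as A, and the alphabet of B, in line with the parameters stated in the definition.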

2.2 Descendence

Certain properties of the codes discussed above can be formulated in mathematical notation. In what follows, let $C$ be a code of length $N$ over an alphabet $Q$ with $|Q| = q$.

We denote by “?” an unreadable mark deliberately or accidentally inserted into a pirated codeword. For any subset of codewords $C_0 \subseteq C$, we define the set of descendants of $C_0$, denoted $\mathrm{desc}(C_0)$, by

$$\mathrm{desc}(C_0) := \left\{ x \in (Q \cup \{?\})^N : x_i \in \begin{cases} \{a_i : a \in C_0\}, & \text{if } |\{a_i : a \in C_0\}| = 1; \\ \{a_i : a \in C_0\} \cup \{?\}, & \text{otherwise} \end{cases} \right\}.$$

Namely, the set $\mathrm{desc}(C_0)$ consists of the $N$-tuples, possibly containing some unreadable marks, that could be produced by a coalition holding the codewords in $C_0$. If in a certain position all codewords of the coalition agree, then only that symbol can be used in that position. Otherwise, the coalition can choose any of the symbols it sees in that position, or a question mark.

Let $w \in \mathbb{N}$ be the maximum number of codewords a coalition can hold. We define the $w$-descendant code of $C$, denoted $\mathrm{desc}_w(C)$, as follows:

$$\mathrm{desc}_w(C) := \bigcup_{C_0 \subseteq C,\ |C_0| \le w} \mathrm{desc}(C_0).$$

In other words, the set $\mathrm{desc}_w(C)$ consists of the $N$-tuples that could be produced by some coalition of size at most $w$ by comparing the codewords its members jointly hold.

Example 2.3. Let $C = \{(1, 2, 0, 1, 1), (2, 2, 0, 1, 0)\}$. Then

$$\mathrm{desc}_2(C) = \left\{ \left( \begin{Bmatrix} 1 \\ 2 \\ ? \end{Bmatrix}, 2, 0, 1, \begin{Bmatrix} 0 \\ 1 \\ ? \end{Bmatrix} \right) \right\},$$

and $|\mathrm{desc}_2(C)| = 9$.
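The descendant sets can be enumerated directly from the definition. The following brute-force sketch is only meant for small examples; the names `desc`, `desc_w` and `UNREADABLE` are illustrative, and codewords are assumed to be tuples.

```python
from itertools import combinations, product

UNREADABLE = "?"

def desc(coalition):
    """All words a coalition can produce, including unreadable marks."""
    choices = []
    for column in zip(*coalition):
        symbols = set(column)
        if len(symbols) > 1:          # detectable position: any seen symbol or "?"
            symbols.add(UNREADABLE)
        choices.append(sorted(symbols, key=str))
    return set(product(*choices))

def desc_w(code, w):
    """Union of desc(C0) over all non-empty coalitions C0 of size at most w."""
    result = set()
    for size in range(1, w + 1):
        for coalition in combinations(code, size):
            result |= desc(coalition)
    return result

C = [(1, 2, 0, 1, 1), (2, 2, 0, 1, 0)]
print(len(desc_w(C, 2)))  # 9, matching Example 2.3
```

The count 9 arises because the first and the last position each admit three choices (two symbols and the unreadable mark), while the three middle positions are fixed.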

Remark 2.1. Two pirated codewords (1,0,0,?,?) and (1,0,1,?,?) are obviously different because of the third entry. However, when given two pirated codewords (1,0,1,?,?) and (1,0,1,?,?), we still treat them as different, although they might turn out to be the same codeword.

Next, we give the definitions concerning the mathematical properties required by FP, SFP, IPP, and TA codes.

2.3 Frameproof code

Definition: $C$ is a $w$-frameproof (FP) code provided that for all $x \in \mathrm{desc}_w(C)$ and all $C_i \subseteq C$ with $|C_i| \le w$, $x \in \mathrm{desc}(C_i) \cap C$ implies $x \in C_i$.

Roughly speaking, a code is $w$-frameproof if no coalition of size at most $w$ can frame another user not in the coalition by producing the codeword held by that user.
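For small codes, the frameproof property can be tested by exhaustive search over all coalitions. The sketch below assumes distinct codewords given as strings; the function names and the two toy codes are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def can_frame(coalition, victim):
    """True if the coalition can output the victim's codeword exactly."""
    return all(v in {c[i] for c in coalition} for i, v in enumerate(victim))

def is_frameproof(code, w):
    """Brute-force test of the w-frameproof property."""
    for size in range(1, w + 1):
        for coalition in combinations(code, size):
            for victim in code:
                if victim not in coalition and can_frame(coalition, victim):
                    return False
    return True

print(is_frameproof(["100", "010", "001"], 2))  # True
print(is_frameproof(["00", "11", "01"], 2))     # False: 00 and 11 can produce 01
```

A victim's codeword contains no unreadable marks, so only the readable symbols seen by the coalition matter in this test.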

2.4 Secure Frameproof code

Definition: $C$ is a $w$-secure frameproof (SFP) code provided that for all $x \in \mathrm{desc}_w(C) \cap Q^N$, $x \in \mathrm{desc}(C_i) \cap \mathrm{desc}(C_j)$ implies that $C_i \cap C_j \ne \emptyset$, where $i \ne j$.

In other words, a code is $w$-secure frameproof if no coalition of size at most $w$ can frame a disjoint coalition of size at most $w$ by producing an $N$-tuple that could also have been produced by the second coalition. That is, for any two disjoint coalitions $C_1$ and $C_2$ of size at most $w$, we know that they cannot produce the same false fingerprint, i.e., $\mathrm{desc}(C_1) \cap \mathrm{desc}(C_2) \cap Q^N = \emptyset$.

Remark 2.2. Note that FP and SFP codes are resistant to the threat of unreadable marks: if innocent users are safe from being framed by colluded codewords, they are even safer from being framed by codewords with unreadable marks, under the assumption mentioned in Remark 2.1.

2.4.1 Separating Weights

Here, we do not look at the unreadable marks.

Definition: The separating weight $\lambda_w$ of two coalitions is the least number of positions in which their descendances are separated. The normalized separating weight is $\tau_w := \lambda_w / N$, where $N$ is the code length.

Obviously, a code is $w$-SFP if and only if $\lambda_w > 0$.

Sometimes $\lambda_w$ is increased by various means, such as the concatenation method, in order to cope with undetectable marks. Namely, if an unreadable mark occurs in a supposedly separating position, other positions can serve as a backup in order to separate codewords correctly.

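Example 2.4. The code $\{1122334, 2112433, 1212343\}$ is a 2-SFP code with $\lambda_2 = 2$. Denote the codeword of user 1 by $u^{(1)}$, that of user 2 by $u^{(2)}$, and that of user 3 by $u^{(3)}$. The coalitions of two users can produce

$$\mathrm{desc}_2\left(u^{(1)}, u^{(2)}\right) = \left( \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}, 1, \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}, 2, \begin{Bmatrix} 3 \\ 4 \end{Bmatrix}, 3, \begin{Bmatrix} 3 \\ 4 \end{Bmatrix} \right),$$

$$\mathrm{desc}_2\left(u^{(2)}, u^{(3)}\right) = \left( \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}, \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}, 1, 2, \begin{Bmatrix} 3 \\ 4 \end{Bmatrix}, \begin{Bmatrix} 3 \\ 4 \end{Bmatrix}, 3 \right),$$

$$\mathrm{desc}_2\left(u^{(1)}, u^{(3)}\right) = \left( 1, \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}, \begin{Bmatrix} 1 \\ 2 \end{Bmatrix}, 2, 3, \begin{Bmatrix} 3 \\ 4 \end{Bmatrix}, \begin{Bmatrix} 3 \\ 4 \end{Bmatrix} \right).$$

Note that the coalition of users 1 and 2 cannot frame user 3 because of the second and sixth entries, the coalition of users 2 and 3 cannot frame user 1 because of the third and seventh entries, and the coalition of users 1 and 3 cannot frame user 2 because of the first and fifth entries.

Note that the separating weight of this code is $\lambda_2 = 2$ because disjoint coalitions are separated in at least two positions. The normalized separating weight is therefore $\tau_2 = 2/7$.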
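The separating weight of Example 2.4 can be recomputed by brute force. In this sketch a position is taken to separate two disjoint coalitions when the sets of symbols they hold in that position are disjoint, and unreadable marks are ignored as in this subsection. This reading of the definition, as well as the function names, is an assumption made for illustration.

```python
from itertools import combinations

def separating_positions(c1, c2):
    """Positions where the symbols used by c1 and by c2 are disjoint."""
    return [i for i in range(len(c1[0]))
            if not {c[i] for c in c1} & {c[i] for c in c2}]

def separating_weight(code, w):
    """Minimum number of separating positions over disjoint coalitions of size <= w."""
    coalitions = [c for size in range(1, w + 1) for c in combinations(code, size)]
    return min(len(separating_positions(c1, c2))
               for c1, c2 in combinations(coalitions, 2)
               if not set(c1) & set(c2))

code = ["1122334", "2112433", "1212343"]
lam = separating_weight(code, 2)
print(lam, lam > 0)  # 2 True
```

A positive value means that every pair of disjoint coalitions is separated in at least one position, which is the 2-SFP property for this code; the positions are reported with indices starting at 0.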

2.5 Identifiable parent property code

Definition: $C$ is a $w$-identifiable parent property (IPP) code provided that for all $x \in \mathrm{desc}_w(C)$, it holds that

$$\bigcap_{i \,:\, x \in \mathrm{desc}(C_i)} C_i \ne \emptyset.$$

A code enjoys the $w$-identifiable parent property if no coalition of size at most $w$ can produce an $N$-tuple that cannot be traced back to at least one member of the coalition. In such a code, whenever a word belongs to the descendance of a coalition of size at most $w$, at least one of the parents in the coalition can be identified.
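The intersection in the definition can be evaluated by enumerating all coalitions that are able to produce a given word. The following sketch does this by brute force; the function names are illustrative, codewords are assumed to be strings, and "?" denotes the unreadable mark.

```python
from itertools import combinations

def can_produce(coalition, x, unreadable="?"):
    """True if x lies in desc(coalition)."""
    for i, symbol in enumerate(x):
        seen = {c[i] for c in coalition}
        if symbol not in seen and not (symbol == unreadable and len(seen) > 1):
            return False
    return True

def identifiable_parents(code, w, x):
    """Codewords common to every coalition of size <= w that can produce x."""
    suspects = [set(c) for size in range(1, w + 1)
                for c in combinations(code, size) if can_produce(c, x)]
    if not suspects:
        raise ValueError("x is not in desc_w(code)")
    return set.intersection(*suspects)

code = ["1122334", "2112433", "1212343"]
print(identifiable_parents(code, 2, "1122334"))  # {'1122334'}
print(identifiable_parents(code, 2, "1112333"))  # set()
```

The second call returns the empty set because each of the three pairs of users can produce the word 1112333, so this particular code is not 2-IPP even though it is 2-SFP.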


2.6 Traceability code

Definition: For $x, y \in Q^N$, define $I(x, y) = \{i : x_i = y_i\}$. $C$ is a $w$-traceability (TA) code provided that, for all $x \in \mathrm{desc}_w(C)$, $x \in \mathrm{desc}(C_i)$ implies that there is at least one codeword $y \in C_i$ such that $|I(x, y)| > |I(x, z)|$ for any $z \in C \setminus C_i$.

In fact, $|I(x, y)|$ measures the closeness of two codewords and can also be expressed as $N - d_H(x, y)$, where $N$ denotes the length of the codewords and $d_H(x, y)$ is the Hamming distance of the two codewords.

A code enjoying the $w$-traceability property allows an efficient (i.e., linear-time) algorithm to determine an identifiable parent. More precisely, if we compare an illegal word with each codeword in $C$, then the codeword closest to the illegal one is one of the parents in the coalition. Note that the TA property is much stronger than the IPP property, for which tracing requires comparisons with $\binom{n}{w}$ sets, resulting in a nonlinear running time.
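The linear-time tracing described above amounts to a nearest-codeword search. The sketch below illustrates it on a hypothetical toy code, namely the evaluations of the polynomials a + bt over the integers modulo 5; this code has length 5 and minimum distance 4, and 4 > 5(1 − 1/4) is a known sufficient condition for 2-traceability when no marks are unreadable. The code and the function names are assumptions chosen for illustration.

```python
def agreement(x, y):
    """|I(x, y)|: number of positions where x and y coincide."""
    return sum(a == b for a, b in zip(x, y))

def trace_nearest(code, pirate):
    """Return the codeword agreeing with the pirate word in the most positions."""
    return max(code, key=lambda c: agreement(pirate, c))

# Hypothetical toy code: all words (a + b*t mod 5) for t = 0..4.
code = [tuple((a + b * t) % 5 for t in range(5)) for a in range(5) for b in range(5)]

parent1 = (0, 1, 2, 3, 4)              # a = 0, b = 1
parent2 = (1, 1, 1, 1, 1)              # a = 1, b = 0
pirate = parent1[:4] + parent2[4:]     # first four marks from parent1, last from parent2
print(trace_nearest(code, pirate))     # (0, 1, 2, 3, 4)
```

Only one pass over the code is needed, in contrast to the enumeration of coalitions required for a general IPP code.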

Remark 2.3. It has to be made clear that IPP and TA codes are vulnerable to unreadable marks: by definition, nothing can be said if there are “?” entries, let alone identifying or tracing the parents. This will be justified at the beginning of Chapter 4, where we show that such codes in fact do not exist.

Remark 2.4. If there are no unreadable marks in the pirated codewords, then IPP and TA codes can exist. However, constructions of IPP and TA codes will not be treated because they are only of theoretical interest owing to their intolerance of unreadable marks.

In the sequel, we point out the relationships between these codes.

2.7 Relations

1. $w$-SFP implies $w$-FP. This is self-explanatory if we treat an individual as a coalition of its own. Let one coalition $A$ be of size at most $w$ and let the other coalition $B$ consist of a single individual. The $w$-SFP property assures that two disjoint coalitions of size at most $w$ cannot produce the same codeword. The coalition $B$ is a trivial coalition since the descendance of $B$ is $B$ itself, which therefore cannot be framed by coalition $A$ by the definition of SFP.

2. $w$-IPP implies $w$-SFP. This is clear because IPP is a strengthened version of SFP. Namely, $\emptyset \ne \bigcap_{i \,:\, x \in \mathrm{desc}(C_i)} C_i \subseteq C_i \cap C_j$.

3. $w$-TA implies $w$-IPP. Suppose $C$ is a $w$-TA code. If $x \in \mathrm{desc}_w(C)$, then there is a subset $C_i \subseteq C$ with $|C_i| \le w$ such that $x \in \mathrm{desc}(C_i)$. Let $y \in C_i$ be such that $|I(x, y)| \ge |I(x, z)|$ for all $z \in C_i$. Hence $|I(x, y)| \ge |I(x, z)|$ for any $z \in C$ by the definition of a $w$-TA code. We show that, for any $C_j \subseteq C$ with $|C_j| \le w$, $x \in \mathrm{desc}(C_j)$ implies $y \in C_j$. In fact, if $y \notin C_j$, then there is $u \in C_j$ such that $|I(x, u)| > |I(x, y)|$ by the definition of a $w$-TA code. This contradicts the fact that $|I(x, y)| \ge |I(x, z)|$ for any $z \in C$.


Chapter 3 Hash Families and Codes

Before going into explicit constructions of such codes, some preliminaries are needed to reinforce the mathematical structures and serve as basic tools in the construction.

Recently, hash families and related structures have been used to construct codes for copyright protection. Subsequently, we will define them and discuss their inter-relationship with the codes defined in the previous chapter.

3.1 Hash Functions

Let $n \ge m$. An $(n, m)$-hash function is a function $h : A \to B$, where $|A| = n$ and $|B| = m$. An $(n, m)$-hash family is a finite set $\mathcal{H}$ of $(n, m)$-hash functions such that $h : A \to B$ for each $h \in \mathcal{H}$, where $|A| = n$ and $|B| = m$. We use the notation $HF(N; n, m)$ to denote an $(n, m)$-hash family with $|\mathcal{H}| = N$.

3.2 Perfect Hash Families

Let $n$, $m$ and $w$ be integers such that $n \ge m \ge w \ge 2$. An $(n, m, w)$-perfect hash family is an $(n, m)$-hash family $\mathcal{H}$ such that for any $X \subseteq A$ with $|X| = w$, there exists at least one $h \in \mathcal{H}$ such that $h|_X$ is injective. We use the notation $PHF(N; n, m, w)$ to denote an $(n, m, w)$-perfect hash family with $|\mathcal{H}| = N$.

3.3 Separating Hash Families

Let $n$, $m$, $w_1$ and $w_2$ be integers such that $n \ge m$. An $(n, m, w_1, w_2)$-separating hash family is an $(n, m)$-hash family $\mathcal{H}$ such that for any $X_1, X_2 \subseteq A$ with $|X_1| = w_1$, $|X_2| = w_2$ and $X_1 \cap X_2 = \emptyset$, there exists at least one $h \in \mathcal{H}$ such that $\{h(x) : x \in X_1\} \cap \{h(x) : x \in X_2\} = \emptyset$. We use the notation $SHF(N; n, m, w_1, w_2)$ to denote an $(n, m, w_1, w_2)$-separating hash family with $|\mathcal{H}| = N$.

[16] provides a survey on hash families. The following theorem is immediate from the definition of perfect hash families and separating hash families.

Theorem 3.1. Let $\mathcal{H}$ be an $HF(N; n, m)$.

1. If $\mathcal{H}$ is a $PHF(N; n, m, w)$, then it is a $PHF(N; n, m, w')$ for all $w' \le w$.

2. If $\mathcal{H}$ is an $SHF(N; n, m, w_1, w_2)$, then it is an $SHF(N; n, m, w_1', w_2')$ for all $w_1' \le w_1$ and $w_2' \le w_2$.

3. If $\mathcal{H}$ is a $PHF(N; n, m, w_1 + w_2)$, then it is an $SHF(N; n, m, w_1, w_2)$.

Next, we establish the relationship between hash families and codes. We depict an $(N, n, q)$-code $C$ as an $n \times N$ matrix $M(C)$ on $q$ symbols, where each row of the matrix corresponds to one of the codewords. Similarly, we can represent an $HF(N; n, m)$, $\mathcal{H}$, as an $N \times n$ matrix on $m$ symbols, where each row of the matrix corresponds to one of the functions in $\mathcal{H}$. These two matrices are transposes of each other.

Given an $(N, n, q)$-code $C$, we define $H(C)$ to be the $HF(N; n, q)$ whose matrix representation is $M(C)^{\top}$. Thus if $C = \{x^1, x^2, \ldots, x^n\}$ and $1 \le j \le N$, then the hash function $h_j \in H(C)$ is defined by the rule $h_j(i) = x^i_j$, $1 \le i \le n$.

Obviously, the matrix representations of PHF and SHF satisfy the following:

Lemma 3.1. A $PHF(N; n, q, w)$ can be depicted as an $N \times n$ matrix with entries from $\{1, 2, \ldots, q\}$ such that for any $w$ columns, there exists at least one row in which the $w$ entries are distinct.

Lemma 3.2. An $SHF(N; n, q, w_1, w_2)$ can be depicted as an $N \times n$ matrix with entries from $\{1, 2, \ldots, q\}$ such that for any two disjoint sets of columns $C_1$ and $C_2$ of sizes $w_1$ and $w_2$, respectively, there exists at least one row in which the entries in the columns of $C_1$ are distinct from the entries in the columns of $C_2$.

Hence the relationship between PHF and FP codes and between SHF and SFP codes follows immediately from the definitions.

Theorem 3.2. An $(N, n, q)$-code $C$ is a $w$-FP code if and only if $H(C)$ is an $SHF(N; n, q, w, 1)$.

Theorem 3.3. An $(N, n, q)$-code $C$ is a $w$-SFP code if $H(C)$ is a $PHF(N; n, q, 2w)$, where $n \ge 2w$.

Theorem 3.4. An $(N, n, q)$-code $C$ is a $w$-SFP code if and only if $H(C)$ is an $SHF(N; n, q, w, w)$, where $n \ge 2w$.

The proofs are trivial. Perfect hash families and separating hash families turn out to be just another language for FP and SFP codes.
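The translation between codes and hash families is easy to carry out in practice. The following sketch builds H(C) as the transpose of the codeword matrix and tests the perfect and separating properties by exhaustive search; all function names are illustrative, and the code of Example 2.4 is used as test data.

```python
from itertools import combinations

def hash_family(code):
    """H(C): one hash function per coordinate, h_j(i) = j-th symbol of codeword i."""
    return [tuple(word[j] for word in code) for j in range(len(code[0]))]

def is_phf(family, w):
    """Every w-subset of the domain is mapped injectively by some function."""
    n = len(family[0])
    return all(any(len({h[i] for i in subset}) == w for h in family)
               for subset in combinations(range(n), w))

def is_shf(family, w1, w2):
    """Every pair of disjoint subsets of sizes w1, w2 is separated by some function."""
    n = len(family[0])
    for x1 in combinations(range(n), w1):
        rest = [i for i in range(n) if i not in x1]
        for x2 in combinations(rest, w2):
            if not any({h[i] for i in x1}.isdisjoint({h[i] for i in x2})
                       for h in family):
                return False
    return True

code = ["1122334", "2112433", "1212343"]   # the code of Example 2.4
H = hash_family(code)
print(is_shf(H, 2, 1))  # True
print(is_phf(H, 3))     # False
```

For this code, H(C) separates every pair of codewords from the remaining one, which corresponds to the 2-FP property in Theorem 3.2, although no coordinate takes three distinct values on the three codewords. This illustrates that the implication in part 3 of Theorem 3.1 cannot be reversed in general.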

3.4 Difference Matrices

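Definition: An $(n, k; \lambda)$-difference matrix is a $k \times n\lambda$ matrix $D = (d_{i,j})$, with entries from $\mathbb{Z}_n$, in which the multiset

$$\{d_{h,j} - d_{i,j} \bmod n : 1 \le j \le n\lambda\}$$

contains every element of $\mathbb{Z}_n$ exactly $\lambda$ times for all $h \ne i$.

Example 3.1. If $\gcd((k-1)!, n) = 1$, then the $k \times n$ matrix $D$ defined by $d_{i,j} = ij \bmod n$ is an $(n, k; 1)$-difference matrix.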
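Example 3.1 can be verified numerically. The sketch below uses the standard requirement that, for every pair of distinct rows, each residue modulo n occurs exactly λ times among the column-wise differences. Indexing rows by 0, ..., k−1 and columns by 0, ..., n−1 is an assumption made here, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import factorial, gcd

def multiplication_matrix(n, k):
    """k x n matrix with d[i][j] = i*j mod n, rows i = 0..k-1, columns j = 0..n-1."""
    return [[(i * j) % n for j in range(n)] for i in range(k)]

def is_difference_matrix(d, n, lam):
    """Each pair of rows must have every residue mod n exactly lam times as a difference."""
    for row_h, row_i in combinations(d, 2):
        counts = Counter((a - b) % n for a, b in zip(row_h, row_i))
        if any(counts[r] != lam for r in range(n)):
            return False
    return True

n, k = 5, 4
assert gcd(factorial(k - 1), n) == 1
print(is_difference_matrix(multiplication_matrix(n, k), n, 1))  # True
```

When the gcd condition fails, for instance for n = 4 and k = 3, the same test returns False, because the difference of two rows is then a multiple of a zero divisor modulo n.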

The concept of a difference matrix will serve as a tool later in the recursive construction of perfect hash families in Theorem 5.20.

3.5 Set Systems

Definition: A set system is a pair $(X, \mathcal{B})$ where $X$ is a set of elements called points, and $\mathcal{B}$ is a set of subsets of $X$, the members of which are called blocks. A set system can be described by an incidence matrix. Let $(X, \mathcal{B})$ be a set system where $X = \{x_1, x_2, \ldots, x_n\}$ and $\mathcal{B} = \{B_1, B_2, \ldots, B_N\}$. The incidence matrix of $(X, \mathcal{B})$ is the $N \times n$ matrix $A = (a_{ij})$, where

$$a_{ij} = \begin{cases} 1 & \text{if } x_j \in B_i, \\ 0 & \text{if } x_j \notin B_i. \end{cases}$$

Conversely, given an incidence matrix, we can define an associated set system in the obvious way. Here, if $C$ is an $(N, n, 2)$-code, then the matrix $M(C)$ is a 0-1 matrix, which can therefore be thought of as the incidence matrix of a set system. For any codeword $w \in C$, we will use $B_w$ to denote the associated block in the corresponding set system.

3.6 Sandwich Free Families

A set system $(X, \mathcal{B})$ is a $(w_1, w_2)$-sandwich free family provided that, for any two disjoint subsets $C_1, C_2$ of $\mathcal{B}$, where $|C_1| \le w_1$ and $|C_2| \le w_2$, the following property holds:

$$\left( \bigcap_{B \in C_1} B \right) \cup \left( \bigcap_{B \in C_2} B \right) \not\subseteq \left( \bigcup_{B \in C_1} B \right) \cap \left( \bigcup_{B \in C_2} B \right).$$

A $(w_1, w_2)$-sandwich free family $(X, \mathcal{B})$ will be denoted as a $(w_1, w_2)$-SFF$(N, n)$ if $|X| = n$ and $|\mathcal{B}| = N$.

The connection between SFF and SFP codes is stated as follows.

Theorem 3.5. A $w$-SFP$(N, n)$ exists if and only if there exists a $(w, w)$-SFF$(N, n)$.

The proof is not as straightforward as the previous ones and will be given in the proof of Theorem 5.16, which focuses on explicit constructions of such codes.

3.7 Secure Codes

A code $C$ is $w$-secure if there exists a tracing algorithm $\mathcal{A}$ satisfying the following: if a coalition $C_0 \subseteq C$ of size at most $w$ generates a word $x$, then $\mathcal{A}(x) \in C_0$.

The tracing algorithm $\mathcal{A}$ on input $x$ must output a member of the coalition $C_0$ that generated the word. Hence, an illegal copy can be traced back to at least one member of the guilty coalition. Clearly there is no hope of recovering the entire coalition since some of its members might be passive; they are part of the coalition, but they contribute nothing to the construction of illegal copies.

Actually, the concept of $w$-secure codes is not new to us since we have the following result.

Proposition 3.1. $C$ is $w$-secure if and only if $C$ is a $w$-IPP code.

Proof. We first derive a necessary condition for a code to be $w$-secure. Consider the following scenario: let $C$ be some code, and let $C_1$ and $C_2$ be two coalitions of $w$ colluders such that $C_1 \cap C_2 = \emptyset$. Suppose an unregistered copy is caught which is marked by a word $x$ that belongs to both $\mathrm{desc}_w(C_1)$ and $\mathrm{desc}_w(C_2)$. As a consequence, both coalitions are suspects. Since their intersection is empty, it is not possible to determine with certainty who created the unregistered $x$. It follows that if $C$ is $w$-secure, then whenever the intersection of $C_1$ and $C_2$ is empty, the intersection of $\mathrm{desc}_w(C_1)$ and $\mathrm{desc}_w(C_2)$ must be empty as well. Of course, the same is true for $j$ subsets $C_1, \ldots, C_j$. This gives the necessary condition. The sufficient condition is self-explanatory by the definition of the identifiable parent property.

Hence, TA codes are secure codes as well. However, neither IPP nor TA codes exist in the presence of unreadable marks, as will be clarified in the next chapter. Therefore, IPP and TA codes are interesting only from a theoretical point of view and will not be treated subsequently. In the next chapter we will explain more about unreadable marks and introduce a probabilistic traitor tracing algorithm to construct “almost” secure codes.


Chapter 4 Unreadable Marks and PTT

Unreadable marks or undetectable bits are symbols in an uncertain state. For in-stance, when the police or distributer recovers an illegal copy of an object, she might find some symbols undefined in the codeword or could hardly determine which state an unreadable mark is in. The only thing she can do is to simply re-place them by “?”’s.

On the other hand, unreadable marks can be deliberately created by the coali-tions in order to make traitor tracing less feasible and make themselves safer from being prosecuted. As a matter of fact, IPP and TA codes do not exist under the presence of unreadable marks as will be indicated later. However, FP and SFP codes are resistent from the threats of unreadable marks because if innocent users are safe from being framed by colluded codewords, they are even safer from being framed by those codewords with unreadable marks.

Without unreadable marks, IPP and TA codes can exist and have been investi-gated by several researchers in [21,35, 10,6, 2,36, 37,19, 29,20]. However, in the context of fingerprinting, the assumption that marks cannot become unread-able is unrealistic.

Based on the above reasoning and the fact that SFP is an intensified version of FP codes, SFP finds more practical applications in industry. Therefore, the explicit construction of SFP codes will be our main focus in the next chapter.


The remainder of this chapter explains how unreadable marks rule out the existence of IPP and TA codes. In order to overcome the problem, a probabilistic approach will be proposed.

4.1 Unreadable Marks

Recall that the idea of secure codes was introduced in Section 3.7. We rephrase Proposition 3.1 as the following lemma.

Lemma 4.1. If $C$ is a $w$-secure code then

$$C_1 \cap \cdots \cap C_r = \emptyset \;\Rightarrow\; \mathrm{desc}_w(C_1) \cap \cdots \cap \mathrm{desc}_w(C_r) = \emptyset$$

for all coalitions $C_1, \ldots, C_r$ of at most $w$ colluders each.

It seems that secure codes provide a good solution to the problem of collusion. Unfortunately, when $w > 1$, $w$-secure codes do not exist.

Theorem 4.1. For $w \ge 2$ there are no $w$-secure codes.

Proof. Obviously, it is sufficient to show that there are no 2-secure codes. Let $c^{(1)}, c^{(2)}, c^{(3)}$ be three distinct legal codewords assigned to users $u_1, u_2, u_3$, respectively. Define the majority word $M = \mathrm{MAJ}(c^{(1)}, c^{(2)}, c^{(3)})$ by

$$M_i = \begin{cases} c^{(1)}_i, & \text{if } c^{(1)}_i = c^{(2)}_i \text{ or } c^{(1)}_i = c^{(3)}_i, \\ c^{(2)}_i, & \text{if } c^{(2)}_i = c^{(3)}_i, \\ ?, & \text{otherwise.} \end{cases}$$

One can readily verify that the majority word $M$ belongs to $\mathrm{desc}_2\{u_1, u_2\}$, $\mathrm{desc}_2\{u_1, u_3\}$, and $\mathrm{desc}_2\{u_2, u_3\}$ simultaneously. However, the intersection of the coalitions is empty. Hence, by Lemma 4.1, a 2-secure code cannot exist.
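The majority strategy used in the proof can be sketched in a few lines of Python. The function names and the three sample codewords are illustrative assumptions, and the membership test assumes that a coalition may output “?” only at positions where its members disagree.

```python
from itertools import combinations


def majority_word(c1, c2, c3):
    """Majority word of three equal-length codewords.

    '?' (an unreadable mark) is used where all three symbols differ.
    """
    word = []
    for a, b, c in zip(c1, c2, c3):
        if a == b or a == c:
            word.append(a)
        elif b == c:
            word.append(b)
        else:
            word.append("?")
    return "".join(word)


def in_descendants(x, coalition):
    """Assumed descendant test: x[i] is a symbol held by some member,
    or '?' at a position where the members disagree."""
    for i, symbol in enumerate(x):
        seen = {word[i] for word in coalition}
        if symbol == "?":
            if len(seen) < 2:
                return False
        elif symbol not in seen:
            return False
    return True


# Hypothetical ternary codewords of users u1, u2, u3.
codewords = ["01200", "02111", "12212"]
m = majority_word(*codewords)
pairs_ok = all(in_descendants(m, pair) for pair in combinations(codewords, 2))
```

Every pair of users can produce the same word `m`, yet the three pairs have no user in common, which is exactly the obstruction stated in Theorem 4.1.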

The proof of the theorem shows that if a coalition employs the “majority” strategy it is guaranteed to defeat all fingerprinting codes. Based on the above argument and Proposition 3.1, the existence of IPP and TA codes is ruled out. This forces us to weaken our requirements for fingerprinting schemes. In the following section, we intend to allow the distributor to make some random choices when embedding the codewords in the products. The point is that the random choices will be kept hidden from the users. This enables us to construct codes which will capture a member of the guilty coalition with sufficiently high probability.

4.2 Probabilistic Traitor Tracing

Probabilistic traitor tracing (PTT) is much more efficient in most cases. In this scheme, we need not identify colluders who have committed the crime with absolute certainty.‖ Instead, we treat a group of possible colluders as suspects, and compute the probability that they are colluders. This may not tell us deterministically who is guilty the first time. However, after several rounds of identification, some pirates will become more and more suspicious as their probabilities of being guilty accumulate. Such a strategy works particularly well for applications such as pay-per-view movies that call for iterative retrievals of data.

Suppose a coalition $C$ of $w$ users creates an illegal copy of an object. Fingerprinting schemes that enable the capture of a member of the coalition $C$ with probability at least $1 - \epsilon$ are called $w$-secure codes with $\epsilon$-error. Namely, $\Pr[A(x) \in C] > 1 - \epsilon$. In other words, the traitor tracing algorithm $A$ on input $x$ outputs a member of the coalition $C$ that generated the codeword $x$ with high probability. To do so, we intend to allow the distributor to make some random choices when embedding the codewords in the objects. Our point is that the random choices will be kept hidden from the users.

We begin by considering an $(N, n)$-code which is $n$-secure with $\epsilon$-error for any $\epsilon > 0$. Let $c_m$ be a column of height $n$ in which the first $m$ bits are 1 and the rest are 0. The code $C(N = d(n-1), n)$ consists of all columns $c_1, \ldots, c_{n-1}$, each duplicated $d$ times. The amount of duplication determines the error probability $\epsilon$.

‖ More generally speaking, we say that they committed the crime with probability 1.

Example 4.1. The code $C(16, 5)$ for five users A, B, C, D, E is

      B1    B2    B3    B4
A :  1111  1111  1111  1111
B :  0000  1111  1111  1111
C :  0000  0000  1111  1111
D :  0000  0000  0000  1111
E :  0000  0000  0000  0000
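The construction of $C(N, n)$ and the hidden permutation used at embedding time can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, codewords are plain strings without block separators, and Python's `random` module stands in for whatever source of randomness the distributor actually uses.

```python
import random


def boneh_shaw_code(n, d):
    """Codewords of C(N, n) with N = d * (n - 1).

    User i (1-based) sees 1s exactly in the blocks B_m with m >= i,
    because column c_m has 1s in its first m entries.
    """
    words = []
    for user in range(1, n + 1):
        blocks = ["1" * d if m >= user else "0" * d for m in range(1, n)]
        words.append("".join(blocks))
    return words


def hidden_permutation(length, seed=None):
    """A random permutation of the positions; the seed is only for demos."""
    perm = list(range(length))
    random.Random(seed).shuffle(perm)
    return perm


def permute(word, perm):
    """Apply the same position permutation to a codeword."""
    return "".join(word[p] for p in perm)


words = boneh_shaw_code(5, 4)          # the code C(16, 5) of Example 4.1
perm = hidden_permutation(16, seed=1)  # kept secret from the users
embedded = [permute(w, perm) for w in words]
```

The list `words` reproduces the rows A to E of Example 4.1, and `embedded` is what the users would actually receive.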

An intuitive traitor tracing strategy is the following: if any of the first four positions (block $B_1$) of a pirated codeword is 1, then we know that A must belong to the coalition. Looking from the other direction, if any of the last four positions (block $B_4$) of a pirated codeword is 0, then we know that E must belong to the coalition. If A and B collude, then C, D, and E are safe from being framed. However, if A and E collude, the descendants of A and E could jeopardize the legal users B, C, and D. Nevertheless, this is very unlikely because A and E differ in 16 places, and the probability for A and E to frame B, C, or D is barely $1/2^{16} \approx 10^{-5}$. This gives a heuristic for probabilistic traitor tracing.

Consider the case where B is innocent. Then what A, C, D, E detect in the first eight positions is completely uniform, namely, either 11111111 or 00000000. If some of A, C, D, or E collude, then the 0s and 1s should be evenly distributed between $B_1$ and $B_2$. If the 1s tend to appear more often in $B_2$ than in $B_1$, then we deduce that B is highly suspicious.

Let $w^{(1)}, \ldots, w^{(n)}$ denote the codewords of $C(N, n)$. Before the distributor embeds the codewords of $C(N, n)$ in an object, he picks a permutation $\pi$ of the $N$ positions uniformly at random. User $u_i$'s copy of the object will be fingerprinted using the word $\pi w^{(i)}$. Note that the same permutation $\pi$ is used for all users. The point is that $\pi$ will be kept hidden from the users. Keeping the permutation hidden from the users is equivalent to hiding the information of which mark in the object encodes which bit in the code. This simple technique will be shown to be effective in overcoming the barrier of unreadable marks.

Before going to the construction, we introduce some notation:

1. Let $B_m$ be the set of all bit positions in which the users see columns of type $c_m$. That is, $B_m$ is the set of all bit positions in which the first $m$ users see a 1 and the remaining users see a 0.

2. For $2 \le s \le n-1$ define $R_s = B_{s-1} \cup B_s$.

3. For a binary string $x$, let $\mathrm{weight}(x)$ denote the number of 1s in $x$; this is the binary case of the Hamming weight defined in Section 2.1.2.

Theorem 4.2 (Boneh and Shaw [8]). For $n \ge 3$ and $\epsilon > 0$ let $d = 2n^2 \log\frac{2n}{\epsilon}$. The fingerprinting scheme $C(N, n)$ is $n$-secure with $\epsilon$-error.

The argument has been treated informally above, but we formalize the language here. The length of this code is $d(n-1) = O\left(n^3 \log\frac{n}{\epsilon}\right)$. Intuitively, suppose user $s$ is NOT a member of the coalition $C_0$ which produced the word $x$. The hidden permutation $\pi$ prevents the coalition from knowing which marks represent which bits in the code $C(N, n)$. The only information the coalition has is the value of the marks it can detect. Observe that without user $s$ a coalition sees exactly the same values for all bit positions $i \in R_s$. Hence, for a bit position $i \in R_s$, the coalition $C_0$ cannot tell if $i$ lies in $B_s$ or in $B_{s-1}$. This means that whichever strategy they use to set the bits of $x|_{R_s}$, the 1s in $x|_{R_s}$ will be roughly evenly distributed between $x|_{B_s}$ and $x|_{B_{s-1}}$ with high probability. As a result, if the 1s in $x|_{R_s}$ are not evenly distributed then, with high probability, user $s$ is a member of the coalition that generated $x$.

The algorithm for probabilistic traitor tracing is stated accordingly. The input codeword $x$ found in the illegal copy may contain some unreadable marks, denoted by “?”. As a convention, these bits are set to “0” before the word $x$ is fed into the algorithm.

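INPUT: $x \in \{0, 1\}^N$.

AIM: Find a subset of the coalition that produced $x$.

Algorithm:

1. If $\mathrm{weight}(x|_{B_1}) > 0$ then output “User 1 is guilty.”

2. If $\mathrm{weight}(x|_{B_{n-1}}) < d$ then output “User n is guilty.”

3. For $s$ from 2 to $n-1$ do:

Let $k = \mathrm{weight}(x|_{R_s})$. If $\mathrm{weight}(x|_{B_{s-1}}) < \frac{k}{2} - \sqrt{\frac{k}{2}\log\frac{2n}{\epsilon}}$ then output “User s is guilty.”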
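The three steps of the tracing algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `block_size` and `trace` are hypothetical, the distributor is assumed to have already undone the hidden permutation so that block $B_m$ occupies positions $(m-1)d, \ldots, md-1$, and $\log$ is taken to be the natural logarithm, as the proof below suggests.

```python
import math


def block_size(n, eps):
    """d = 2 n^2 log(2n / eps), rounded up to an integer."""
    return math.ceil(2 * n * n * math.log(2 * n / eps))


def trace(x, n, d, eps):
    """Return the set of users declared guilty on input word x.

    x is a string over {'0', '1', '?'} of length d * (n - 1), given in
    the distributor's original (un-permuted) position order.
    """
    bits = [0 if ch == "?" else int(ch) for ch in x]  # '?' is read as 0

    def weight(m):
        """Number of 1s of x inside block B_m."""
        return sum(bits[(m - 1) * d : m * d])

    guilty = set()
    if weight(1) > 0:
        guilty.add(1)
    if weight(n - 1) < d:
        guilty.add(n)
    for s in range(2, n):
        k = weight(s - 1) + weight(s)  # weight of x restricted to R_s
        threshold = k / 2 - math.sqrt(k / 2 * math.log(2 * n / eps))
        if weight(s - 1) < threshold:
            guilty.add(s)
    return guilty


n, eps = 5, 0.1
d = block_size(n, eps)
word_of_user_3 = "0" * (2 * d) + "1" * (2 * d)
accused = trace(word_of_user_3, n, d, eps)
```

For a single pirate who redistributes an unmodified copy, the algorithm accuses exactly that user; a word consisting only of unreadable marks is attributed to user $n$.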

The correctness of the algorithm relies on the following theorem.

Theorem 4.3. Consider the code $C(N = d(n-1), n)$ where $d = 2n^2 \log\frac{2n}{\epsilon}$. Let $S$ be the set of users which are declared guilty on input $x$. Then with probability at least $1 - \epsilon$, the set $S$ is a subset of the coalition $C_0$ that produced $x$.

Before the proof of the theorem we introduce two preliminary lemmas.

Lemma 4.2 (Chernoff Bound). Let $X$ be a binomial random variable over $k$ experiments with success probability $1/2$. Then, for any $a > 0$,

$$\Pr\left[X - \frac{k}{2} < -a\right] \le e^{-2a^2/k}.$$

The proof can be found in standard textbooks on probability theory.

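Lemma 4.3. Let $Y$ follow a hypergeometric distribution:

$$\Pr[Y = r] = \frac{\binom{d}{r}\binom{d}{k-r}}{\binom{2d}{k}}.$$

Let $X$ follow a binomial distribution with success probability $1/2$:

$$\Pr[X = r] = \binom{k}{r}\left(\frac{1}{2}\right)^k.$$

Then, $\Pr[Y = r] \le 2\Pr[X = r]$.

Proof. For the sake of brevity assume $k$ is even. (The case for $k$ odd is similar.)

$$
\begin{aligned}
\Pr[Y = r] &= \frac{\binom{d}{r}\binom{d}{k-r}}{\binom{2d}{k}}
= \binom{k}{r}\,\frac{d(d-1)\cdots(d-r+1)\; d(d-1)\cdots(d-k+r+1)}{2d(2d-1)\cdots(2d-k+1)} \\
&\le \binom{k}{r}\, 2^{-k}\, \frac{d^2 (d-1)^2 \cdots \left(d - \frac{k-2}{2}\right)^2}{d\left(d - \frac{1}{2}\right)(d-1)\cdots\left(d - \frac{k-1}{2}\right)} \\
&= \binom{k}{r}\, 2^{-k}\, \frac{d(d-1)\cdots\left(d - \frac{k-2}{2}\right)}{\left(d - \frac{1}{2}\right)\left(d - \frac{3}{2}\right)\cdots\left(d - \frac{k-1}{2}\right)} \\
&= \binom{k}{r}\, 2^{-k}\, \frac{d(d-1)\cdots\left(d - \frac{k-2}{2}\right)}{\left(d - 1 + \frac{1}{2}\right)\left(d - 2 + \frac{1}{2}\right)\cdots\left(d - \frac{k-2}{2} + \frac{1}{2}\right)\left(d - \frac{k-1}{2}\right)} \\
&\le \binom{k}{r}\, 2^{-k}\, \frac{d}{d - \frac{k-1}{2}}
\le \binom{k}{r}\, 2^{-k} \cdot 2 = 2\Pr[X = r].
\end{aligned}
$$

Note that the last inequality follows since $k \le d$.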
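As a sanity check, the inequality of Lemma 4.3 can be verified exactly for small parameters. The sketch below uses exact rational arithmetic; the function names and the range of tested values are arbitrary choices, and the check is no substitute for the proof.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(d, k, r):
    """Pr[Y = r]: r of the k chosen positions fall in the first block of size d."""
    return Fraction(comb(d, r) * comb(d, k - r), comb(2 * d, k))


def binom_half_pmf(k, r):
    """Pr[X = r] for a binomial variable with k trials and success probability 1/2."""
    return Fraction(comb(k, r), 2 ** k)


def lemma_holds(max_d):
    """Check Pr[Y = r] <= 2 Pr[X = r] for all 1 <= d <= max_d, k <= d, 0 <= r <= k."""
    return all(
        hypergeom_pmf(d, k, r) <= 2 * binom_half_pmf(k, r)
        for d in range(1, max_d + 1)
        for k in range(0, d + 1)
        for r in range(0, k + 1)
    )


checked = lemma_holds(12)
```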

The proof of Theorem 4.3 is now as follows:

Proof. Suppose user 1 was declared guilty, i.e., $1 \in S$. Then $\mathrm{weight}(x|_{B_1}) > 0$. This tells us that user 1 must be a member of $C_0$ (otherwise, the bits in $B_1$ would appear indistinguishable to $C_0$). Similarly, if $n \in S$ then $n \in C_0$.

Suppose the algorithm declared a user $s$ with $1 < s < n$ as guilty. We show that the probability that $s$ is not guilty is at most $\frac{\epsilon}{n}$. This will show that the probability that there exists a user in $S$ which is not guilty is at most $\epsilon$.

Let $s$ be an innocent user, i.e., $s \notin C_0$. As was discussed above, this means that the coalition $C_0$ cannot distinguish between the bit positions in $R_s$. Because the permutation $\pi$ was chosen uniformly at random from the set of all permutations, the 1s in $x|_{R_s}$ may be regarded as being randomly placed in $x|_{R_s}$. Let $Y$ be the random variable which counts the number of 1s in $x|_{B_{s-1}}$ given that $x|_{R_s}$ contains $k$ 1s. For any integer $r$, $0 \le r \le k$:

$$\Pr[Y = r] = \Pr\left[\mathrm{weight}(x|_{B_{s-1}}) = r \;\middle|\; \mathrm{weight}(x|_{R_s}) = k\right] = \frac{\binom{d}{r}\binom{d}{k-r}}{\binom{2d}{k}},$$

so $Y$ follows a hypergeometric distribution, where $d = 2n^2\log\frac{2n}{\epsilon}$ is the size of the block. The expectation of $Y$ is $\frac{k}{2}$. To bound the probability that $s$ was pronounced guilty we need to bound

$$\Pr\left[Y < \frac{k}{2} - \sqrt{\frac{k}{2}\log\frac{2n}{\epsilon}}\right]$$

from above. This can be done by comparing $Y$ to an appropriate binomial random variable.

Let $X$ be a binomial random variable over $k$ experiments with success probability $\frac{1}{2}$. Lemma 4.3 tells us that for any $r$ we have $\Pr[Y = r] \le 2\Pr[X = r]$. This means that for any $a > 0$

$$\Pr\left[Y - \frac{k}{2} < -a\right] \le 2\Pr\left[X - \frac{k}{2} < -a\right] \le 2e^{-2a^2/k},$$

where the last inequality follows from the standard Chernoff bound of Lemma 4.2. Plugging in $a = \sqrt{\frac{k}{2}\log\frac{2n}{\epsilon}}$ leads to

$$\Pr\left[Y < \frac{k}{2} - \sqrt{\frac{k}{2}\log\frac{2n}{\epsilon}}\right] \le 2e^{-\log\frac{2n}{\epsilon}} = \frac{\epsilon}{n}.$$

Hence, if user $s$ is innocent then the probability of her being declared guilty is at most $\frac{\epsilon}{n}$. This also means that the probability that some innocent user will be declared guilty is at most $\epsilon$, as desired.

Note that the code size is always smaller than the code length by roughly a factor of $d$ here, meaning a poor code size. This problem can be overcome with the concatenation method discussed in [8], which increases the code size and hence accommodates more users. We sketch the idea here. Recall the definition of code composition in Section 2.1.5. Let $C'(N', n')$ be an outer code over an alphabet of size $n$, with code size $n'$ and code length $N'$, where the codewords are chosen independently and uniformly at random. The idea is to compose our code $C(N, n)$ with $C'$. The resulting code will contain $n'$ codewords and has length $N'N = N'd(n-1)$. It is made up of $N'$ copies of $C(N, n)$. The point is that the codewords of the code $C'$ will be kept secret from the users. This is in addition to keeping hidden the $N'$ permutations used when embedding the $N'$ copies of $C(N, n)$ in the products. A traitor tracing algorithm is also provided for this scheme which is similar to the original one. Moreover, N and n can be chosen in such a way that n is exponential in N. For more details we refer the reader to their paper [8]. In the next chapter we will concentrate on the construction of secure frameproof codes.

Chapter 5 Constructions of SFP Codes

This chapter discusses various constructions that meet the requirement of the secure-frameproof property. The constructions can be classified into two classes. One of them is called direct construction, which will be studied in the first half of this chapter. In such a scheme, we construct codes directly without any help from previous existential results. The other is recursive construction, which will be investigated in the second half of this chapter. Given a code‡‡ satisfying certain properties, the recursive construction augments it to longer codewords and larger code size while preserving the original properties.

Part I: Direct Construction

5.1 Hadamard Matrices and Jacobsthal Matrices

Definition: A Hadamard matrix is an $n \times n$ real matrix $H$ which satisfies $HH^T = nI$. The name derives from a theorem of Hadamard.

Theorem 5.1. Let $X = (x_{ij})$ be an $n \times n$ real matrix whose entries satisfy $|x_{ij}| \le 1$ for all $i$ and $j$. Then $|\det(X)| \le n^{n/2}$. Equality holds if and only if $X$ is a Hadamard matrix.

‡‡ We call it the initial seed.

Let $x_1, x_2, \ldots, x_n$ be the rows of $X$. By Euclidean geometry, $|\det(X)|$ is the volume of the parallelepiped with sides $x_1, x_2, \ldots, x_n$; hence,

$$|\det(X)| \le |x_1| \cdot |x_2| \cdots |x_n|,$$

where $|x_i|$ is the Euclidean length of $x_i$; equality holds if and only if $x_1, x_2, \ldots, x_n$ are mutually perpendicular. By assumption,

$$|x_i| = \left(x_{i1}^2 + x_{i2}^2 + \cdots + x_{in}^2\right)^{1/2} \le n^{1/2},$$

with equality if and only if $|x_{ij}| = 1$ for all $j$.

Subsequently, we focus on Hadamard matrices with all entries $\pm 1$.

For which orders $n$ do Hadamard matrices exist? There is a well-known necessary condition:

Theorem 5.2. If a Hadamard matrix of order $n$ exists, then $n = 1$ or $2$ or $n \equiv 0 \pmod 4$.

To see this, we observe first that changing the sign of every entry in a column of a Hadamard matrix gives another Hadamard matrix. So, changing the signs of all columns for which the entry in the first row is $-$, we may assume that all entries in the first row are $+$. (We abbreviate $+1$ and $-1$ to $+$ and $-$ respectively.)

$$
\begin{array}{cccc}
\overbrace{+\cdots+}^{a} & \overbrace{+\cdots+}^{a} & \overbrace{+\cdots+}^{a} & \overbrace{+\cdots+}^{a} \\
+\cdots+ & +\cdots+ & -\cdots- & -\cdots- \\
+\cdots+ & -\cdots- & +\cdots+ & -\cdots-
\end{array}
$$

Because every other row is orthogonal to the first, we see that each further row has $m$ entries $+$ and $m$ entries $-$, where $n = 2m$. Moreover, if $n > 2$, the first three rows are displayed in the above figure with $n = 4a$. The most important open question in the theory of Hadamard matrices is that of existence (in other words, it is not known whether the above necessary condition is also sufficient).

Conjecture 5.1. A Hadamard matrix of order $4n$ exists for every positive integer $n$.

Theorem 5.3. Let $H$ be a Hadamard matrix of order $n$. Then the partitioned matrix

$$\begin{pmatrix} H & H \\ H & -H \end{pmatrix}$$

is a Hadamard matrix of order $2n$.

This observation can be applied recursively and leads to the following series of matrices.

$$
H_1 = \begin{pmatrix} 1 \end{pmatrix}, \qquad
H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
H_4 = \begin{pmatrix} H_2 & H_2 \\ H_2 & -H_2 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}, \qquad \cdots
$$

In this manner, Sylvester constructed Hadamard matrices of order $2^n$ for every non-negative integer $n$. Sylvester's matrices have a number of special properties. They are symmetric. The elements in the first column and the first row are all positive. The elements in all the other rows and columns are evenly divided between positive and negative.
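The doubling step of Theorem 5.3 translates directly into code. The following sketch (function names are illustrative) builds Sylvester's matrices as lists of lists and checks the defining identity $HH^T = nI$.

```python
def sylvester(m):
    """Hadamard matrix of order 2**m obtained by m doubling steps."""
    h = [[1]]
    for _ in range(m):
        top = [row + row for row in h]                      # (H  H)
        bottom = [row + [-x for x in row] for row in h]     # (H -H)
        h = top + bottom
    return h


def is_hadamard(h):
    """Check that distinct rows are orthogonal and each row has squared norm n."""
    n = len(h)
    return all(
        sum(a * b for a, b in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


h4 = sylvester(2)
h16_ok = is_hadamard(sylvester(4))
```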

Raymond Paley later showed how to construct a Hadamard matrix of order $q + 1$ where $q$ is any prime power which is congruent to 3 modulo 4. He also constructed matrices of order $2(q + 1)$ for prime powers $q$ which are congruent to 1 modulo 4. His method uses finite fields.

Let $q$ be a prime power congruent to 3 modulo 4. Recall that in the field GF($q$), half of the nonzero elements are quadratic residues, and half are quadratic non-residues. The quadratic character of GF($q$) is defined as:

$$\chi(x) = \begin{cases} 0 & \text{if } x = 0; \\ +1 & \text{if } x \text{ is a quadratic residue}; \\ -1 & \text{if } x \text{ is a quadratic non-residue.} \end{cases}$$

Definition: Let $A$ be a matrix whose rows and columns are indexed by the elements of GF($q$), with entries $a_{xy} = \chi(y - x)$. Then $A$ is skew-symmetric, with zeros on the diagonal and $\pm 1$ elsewhere. This matrix is called the Jacobsthal matrix.

Theorem 5.4. If we replace the diagonal zeros by $-1$s in the Jacobsthal matrix and augment it by a new row and a new column with all entries 1, we obtain a Hadamard matrix of order $q + 1$, called a Hadamard matrix of Paley type:

$$H = \begin{pmatrix} 1 & \mathbf{1} \\ \mathbf{1}^T & A - I \end{pmatrix}.$$

Example 5.1. For $p = 7$, we obtain the following matrix:

$$
A = \begin{pmatrix}
0 & 1 & 1 & -1 & 1 & -1 & -1 \\
-1 & 0 & 1 & 1 & -1 & 1 & -1 \\
-1 & -1 & 0 & 1 & 1 & -1 & 1 \\
1 & -1 & -1 & 0 & 1 & 1 & -1 \\
-1 & 1 & -1 & -1 & 0 & 1 & 1 \\
1 & -1 & 1 & -1 & -1 & 0 & 1 \\
1 & 1 & -1 & 1 & -1 & -1 & 0
\end{pmatrix}
$$

A normalized Hadamard matrix $H$ of order $q + 1$ of Paley type is now given as follows:

$$H = \begin{pmatrix} 1 & \mathbf{1} \\ \mathbf{1}^T & A - I \end{pmatrix}.$$
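The Jacobsthal matrix and the bordered matrix of Theorem 5.4 can be sketched as follows. For simplicity the code assumes that $q = p$ is a prime congruent to 3 modulo 4, so that arithmetic in GF($p$) is ordinary arithmetic modulo $p$; general prime powers would need a finite field implementation. The function names are illustrative.

```python
def quadratic_character(x, p):
    """chi(x) in GF(p) for an odd prime p, via Euler's criterion."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1


def jacobsthal(p):
    """Matrix A with entries a_xy = chi(y - x), rows and columns indexed by 0..p-1."""
    return [[quadratic_character(y - x, p) for y in range(p)] for x in range(p)]


def paley_hadamard(p):
    """Border A - I with a first row and first column of 1s (p prime, p = 3 mod 4)."""
    a = jacobsthal(p)
    h = [[1] * (p + 1)]
    for x in range(p):
        h.append([1] + [a[x][y] - (1 if x == y else 0) for y in range(p)])
    return h


def is_hadamard(h):
    """Check H H^T = n I entry by entry."""
    n = len(h)
    return all(
        sum(u * v for u, v in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


a7 = jacobsthal(7)
h8 = paley_hadamard(7)
```

For $p = 7$ the matrix `a7` reproduces Example 5.1, and `h8` is a Hadamard matrix of order 8.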

Example 5.2. For $p = 7$, we obtain the following matrix over GF(3) by replacing $-1$ by 2 in $A$:

$$
A' = \begin{pmatrix}
0 & 1 & 1 & 2 & 1 & 2 & 2 \\
2 & 0 & 1 & 1 & 2 & 1 & 2 \\
2 & 2 & 0 & 1 & 1 & 2 & 1 \\
1 & 2 & 2 & 0 & 1 & 1 & 2 \\
2 & 1 & 2 & 2 & 0 & 1 & 1 \\
1 & 2 & 1 & 2 & 2 & 0 & 1 \\
1 & 1 & 2 & 1 & 2 & 2 & 0
\end{pmatrix}
$$

Let $H_{4k}$ be any Hadamard matrix of order $4k$ in which $+1$s are replaced by 0s and $-1$s by 1s.

Theorem 5.5 (Encheva & Cohen [17]). $H_{4k}$ is a binary 2-SFP$(N, n)$ code where $N = n = 4k$.

Proof. We show that there is a column like $(0011)^\top$ or $(1100)^\top$ for any $c_1, c_2, c_3, c_4 \in H_{4k}$. We consider a normalized Hadamard matrix where the first row is the all 1s row, and first assume that none of $c_1, c_2, c_3, c_4$ is the all 1s codeword. Suppose, to the contrary, that no such column exists, and arrange the columns so that $c_1, c_2, c_3$ take the following form:

$$
\begin{array}{ll}
c_1: & \overbrace{1 \cdots 1}^{2k}\;\overbrace{0 \cdots 0}^{2k} \\
c_2: & \overbrace{1 \cdots 1}^{k}\;\overbrace{0 \cdots 0}^{k}\;\overbrace{1 \cdots 1}^{k}\;\overbrace{0 \cdots 0}^{k} \\
c_3: & \overbrace{1 \cdots 1}^{a}\;\overbrace{0 \cdots 0}^{k-a}\;\overbrace{1 \cdots 1}^{k-b}\;\overbrace{0 \cdots 0}^{b}\;\overbrace{1 \cdots 1}^{k-c}\;\overbrace{0 \cdots 0}^{c}\;\overbrace{1 \cdots 1}^{d}\;\overbrace{0 \cdots 0}^{k-d}
\end{array}
$$

Because none of $c_1, c_2, c_3, c_4$ is the all 1s codeword, each of them contains as many 0s as 1s, and every two rows coincide in half of the positions and differ in the other half. Therefore, $c_3$ should contain $2k$ 1s, yielding:

$$a + (k - b) + (k - c) + d = 2k.$$

Since $c_3$ should coincide with $c_2$ in exactly $2k$ positions, we have that:

$$a + b + (k - c) + (k - d) = 2k.$$

Again, since $c_3$ should coincide with $c_1$ in exactly $2k$ positions, we have that:

$$a + (k - b) + c + (k - d) = 2k.$$

A routine calculation leads to $a = b = c = d$.

Accordingly, writing $x$ for this common value and $y$ for the corresponding value of $c_4$, the codewords $c_1, c_2, c_3, c_4$ are given by

$$
\begin{array}{ll}
c_1: & \overbrace{1 \cdots 1}^{2k}\;\overbrace{0 \cdots 0}^{2k} \\
c_2: & \overbrace{1 \cdots 1}^{k}\;\overbrace{0 \cdots 0}^{k}\;\overbrace{1 \cdots 1}^{k}\;\overbrace{0 \cdots 0}^{k} \\
c_3: & \overbrace{1 \cdots 1}^{x}\;\overbrace{0 \cdots 0}^{k-x}\;\overbrace{1 \cdots 1}^{k-x}\;\overbrace{0 \cdots 0}^{x}\;\overbrace{1 \cdots 1}^{k-x}\;\overbrace{0 \cdots 0}^{x}\;\overbrace{1 \cdots 1}^{x}\;\overbrace{0 \cdots 0}^{k-x} \\
c_4: & \overbrace{0 \cdots 0}^{k-y}\;\overbrace{1 \cdots 1}^{y}\;\overbrace{0 \cdots 0}^{y}\;\overbrace{1 \cdots 1}^{k-y}\;\overbrace{0 \cdots 0}^{y}\;\overbrace{1 \cdots 1}^{k-y}\;\overbrace{0 \cdots 0}^{k-y}\;\overbrace{1 \cdots 1}^{y}
\end{array}
$$

In the first block, where $c_1$ and $c_2$ are both 1, the 0s of $c_3$ and $c_4$ must not overlap, which gives $y \ge k - x$. In the last block, where $c_1$ and $c_2$ are both 0, the 1s of $c_3$ and $c_4$ must not overlap, which gives $k - y \ge x$. We deduce that $x + y = k$, so that

$$
\begin{array}{ll}
c_3: & \overbrace{1 \cdots 1}^{x}\;\overbrace{0 \cdots 0}^{k-x}\;\overbrace{1 \cdots 1}^{k-x}\;\overbrace{0 \cdots 0}^{x}\;\overbrace{1 \cdots 1}^{k-x}\;\overbrace{0 \cdots 0}^{x}\;\overbrace{1 \cdots 1}^{x}\;\overbrace{0 \cdots 0}^{k-x} \\
c_4: & \overbrace{0 \cdots 0}^{x}\;\overbrace{1 \cdots 1}^{k-x}\;\overbrace{0 \cdots 0}^{k-x}\;\overbrace{1 \cdots 1}^{x}\;\overbrace{0 \cdots 0}^{k-x}\;\overbrace{1 \cdots 1}^{x}\;\overbrace{0 \cdots 0}^{x}\;\overbrace{1 \cdots 1}^{k-x}
\end{array}
$$

However, this is impossible because $c_3$ and $c_4$ should coincide in $2k$ positions. Moreover, if one of $c_1, c_2, c_3, c_4$ is the all 1s codeword then it is even easier for them to exhibit the 2-SFP property. Hence, $H_{4k}$ is 2-SFP.
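For small orders the statement of Theorem 5.5 can also be checked by brute force. The sketch below assumes the usual characterization of the 2-SFP property: for any two disjoint coalitions of at most two codewords each, there is a position where the sets of symbols seen by the two coalitions are disjoint. The helper names are illustrative, and Sylvester's matrix of order 8 serves as the test case.

```python
from itertools import combinations


def sylvester(m):
    """Hadamard matrix of order 2**m (Sylvester's doubling construction)."""
    h = [[1]]
    for _ in range(m):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def to_binary_code(h):
    """Replace +1 by 0 and -1 by 1, as in the definition of H_4k."""
    return [[0 if x == 1 else 1 for x in row] for row in h]


def is_2sfp(code):
    """Brute-force test of the (assumed) separation form of the 2-SFP property."""
    users = range(len(code))
    coalitions = [c for size in (1, 2) for c in combinations(users, size)]
    for c1, c2 in combinations(coalitions, 2):
        if set(c1) & set(c2):
            continue  # only disjoint coalitions need to be separated
        separated = any(
            {code[u][i] for u in c1}.isdisjoint({code[u][i] for u in c2})
            for i in range(len(code[0]))
        )
        if not separated:
            return False
    return True


hadamard_ok = is_2sfp(to_binary_code(sylvester(3)))
```

By contrast, the full binary code of length 2 fails the test, since the coalitions {00, 11} and {01, 10} see both symbols in every position.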

Theorem 5.6. Jacobsthal matrices generate 2-SFP codes over GF(3).

5.2 The Subsets Method

Another direct construction, which employs the properties of subsets, was proposed by Tonien and Safavi-Naini [34].

Let $(k)$ be the set $\{1, 2, \ldots, k\}$. By $(k)_t$ we denote the set of all subsets of $(k)$ which contain exactly $t$ elements.

With parameters $k, t, r$, consider the matrix $M_{t,r}(k)$ whose rows are labeled by the elements of $(k)_t$ and whose columns are labeled by the elements of $(k)_r$. For $U \in (k)_t$ and $V \in (k)_r$, the entry at row $U$ and column $V$ of the matrix $M_{t,r}(k)$ is $|U \cap V|$. The code $C_{t,r}(k)$ consists of the rows of the matrix $M_{t,r}(k)$. Without ambiguity, we identify a codeword of $C_{t,r}(k)$ with a set $U \in (k)_t$ and a position with a set $V \in (k)_r$. By definition, the symbol of the codeword $U$ at the position $V$ is $U_V = |U \cap V|$.

The code $C_{t,\le r}(k)$ can be defined in a similar way. It is described by the matrix $M_{t,\le r}(k)$ whose rows and columns are labeled by the elements of the sets $(k)_t$ and $(k)_{\le r}$ respectively, where $(k)_{\le r}$ denotes the set of all nonempty subsets of $(k)$ with at most $r$ elements. For $U \in (k)_t$ and $V \in (k)_{\le r}$, the symbol of the codeword $U$ at the position $V$ is $U_V = |U \cap V|$.

The codes $C^*_{t,r}(k)$ and $C^*_{t,\le r}(k)$ are binary codes. They are constructed in the same way as the codes $C_{t,r}(k)$ and $C_{t,\le r}(k)$ except that the symbol of the codeword $U$ at the position $V$ is $U_V = |U \cap V| \pmod 2$.

Example 5.3. The codes $C_{3,2}(5)$, $C^*_{3,2}(5)$, $C_{3,\le 2}(4)$, and $C^*_{3,\le 2}(4)$ are shown below:

C_{3,2}(5)   {1,2} {1,3} {1,4} {1,5} {2,3} {2,4} {2,5} {3,4} {3,5} {4,5}
{1,2,3}        2     2     1     1     2     1     1     1     1     0
{1,2,4}        2     1     2     1     1     2     1     1     0     1
{1,2,5}        2     1     1     2     1     1     2     0     1     1
{1,3,4}        1     2     2     1     1     1     0     2     1     1
{1,3,5}        1     2     1     2     1     0     1     1     2     1
{1,4,5}        1     1     2     2     0     1     1     1     1     2
{2,3,4}        1     1     1     0     2     2     1     2     1     1
{2,3,5}        1     1     0     1     2     1     2     1     2     1
{2,4,5}        1     0     1     1     1     2     2     1     1     2
{3,4,5}        0     1     1     1     1     1     1     2     2     2

C*_{3,2}(5)  {1,2} {1,3} {1,4} {1,5} {2,3} {2,4} {2,5} {3,4} {3,5} {4,5}
{1,2,3}        0     0     1     1     0     1     1     1     1     0
{1,2,4}        0     1     0     1     1     0     1     1     0     1
{1,2,5}        0     1     1     0     1     1     0     0     1     1
{1,3,4}        1     0     0     1     1     1     0     0     1     1
{1,3,5}        1     0     1     0     1     0     1     1     0     1
{1,4,5}        1     1     0     0     0     1     1     1     1     0
{2,3,4}        1     1     1     0     0     0     1     0     1     1
{2,3,5}        1     1     0     1     0     1     0     1     0     1
{2,4,5}        1     0     1     1     1     0     0     1     1     0
{3,4,5}        0     1     1     1     1     1     1     0     0     0

C_{3,≤2}(4)   {1}   {2}   {3}   {4}  {1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
{1,2,3}        1     1     1     0     2     2     1     2     1     1
{1,2,4}        1     1     0     1     2     1     2     1     2     1
{1,3,4}        1     0     1     1     1     2     2     1     1     2
{2,3,4}        0     1     1     1     1     1     1     2     2     2

C*_{3,≤2}(4)  {1}   {2}   {3}   {4}  {1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
{1,2,3}        1     1     1     0     0     0     1     0     1     1
{1,2,4}        1     1     0     1     0     1     0     1     0     1
{1,3,4}        1     0     1     1     1     0     0     1     1     0
{2,3,4}        0     1     1     1     1     1     1     0     0     0
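The four codes of Example 5.3 can be generated mechanically from the definition. In the sketch below, the function name `subsets_code` and its keyword arguments `at_most` and `binary` are illustrative assumptions; rows and columns are listed in lexicographic order, which is the order used in the tables above.

```python
from itertools import combinations


def subsets_code(k, t, r, at_most=False, binary=False):
    """Return (code, columns) for C_{t,r}(k) or one of its variants.

    at_most=True uses all nonempty subsets of size <= r as positions;
    binary=True reduces every symbol modulo 2 (the starred codes).
    """
    ground = range(1, k + 1)
    sizes = range(1, r + 1) if at_most else (r,)
    columns = [v for size in sizes for v in combinations(ground, size)]
    code = {}
    for u in combinations(ground, t):
        word = [len(set(u) & set(v)) for v in columns]
        code[u] = [x % 2 for x in word] if binary else word
    return code, columns


c32, cols32 = subsets_code(5, 3, 2)
c32_star, _ = subsets_code(5, 3, 2, binary=True)
c3le2, cols3le2 = subsets_code(4, 3, 2, at_most=True)
```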

The following two results are used to establish the secure-frameproof property of $C_{t,r}(k)$ and $C_{t,\le r}(k)$.

Theorem 5.7. If $S_1, S_2, S_3$, and $S_4$ are arbitrary subsets of $(k)$ such that

$$S_i \not\subset S_j \text{ and } S_j \not\subset S_i \text{ for all } i \in \{1, 2\} \text{ and } j \in \{3, 4\}$$

then there exists an element $V \in (k)_{\le 3}$ such that the following two sets

$$\{|V \cap S_1| \bmod 2,\ |V \cap S_2| \bmod 2\} \text{ and } \{|V \cap S_3| \bmod 2,\ |V \cap S_4| \bmod 2\}$$

are disjoint.

This further implies the following.

Corollary 5.1. If $S_1, S_2, S_3$, and $S_4$ are arbitrary subsets of $(k)$ such that

$$S_i \not\subset S_j \text{ and } S_j \not\subset S_i \text{ for all } i \in \{1, 2\} \text{ and } j \in \{3, 4\}$$

then there exists an element $V \in (k)_{\le 3}$ such that the following two sets

$$\{|V \cap S_1|,\ |V \cap S_2|\} \text{ and } \{|V \cap S_3|,\ |V \cap S_4|\}$$

are disjoint.

The proofs of Theorem 5.7 and Corollary 5.1 can be found in [34], which exhaustively investigates all possible distributions of 0s and 1s. Based on the above fact, we have the following explicit constructions.

Theorem 5.8. For any $k > 4$, $C_{2,2}(k)$ is a ternary 2-SFP code with code size $n = \binom{k}{2}$ and code length $N = \binom{k}{2}$.

Proof. We indicate a proof which is easier than the original one found in the paper of Tonien and Safavi-Naini. First, it is sufficient to show that if $C_{2,2}(8)$ is 2-SFP then the same is true for $C_{2,2}(k)$ for all $k > 8$. Indeed, any four codewords of $C_{2,2}(k)$ are 2-subsets of $(k)$ and together involve at most 8 elements, so they lie in a submatrix of $M_{2,2}(k)$ which is a copy of $M_{2,2}(8)$. Therefore, whenever $k > 8$, the submatrix of dimension $\binom{8}{2} \times \binom{8}{2}$ out of the matrix of dimension $\binom{k}{2} \times \binom{k}{2}$ will always have the 2-SFP property. Thus the conclusion follows. Now, in order to finish the proof, we still have to verify the 2-SFP property for $k = 5, 6, 7, 8$, but this can be done either by hand or by computer.
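The computer verification mentioned in the proof can be sketched as a brute-force search. As before, the code assumes the separation form of the 2-SFP property (two disjoint coalitions of at most two codewords must see disjoint symbol sets in some position); the function names are illustrative, and only small values of $k$ are exercised here.

```python
from itertools import combinations


def c22(k):
    """Rows of M_{2,2}(k): entry |U & V| for 2-subsets U, V of {1, ..., k}."""
    pairs = list(combinations(range(1, k + 1), 2))
    return [[len(set(u) & set(v)) for v in pairs] for u in pairs]


def is_2sfp(code):
    """Brute-force test of the (assumed) separation form of the 2-SFP property."""
    users = range(len(code))
    coalitions = [c for size in (1, 2) for c in combinations(users, size)]
    for c1, c2 in combinations(coalitions, 2):
        if set(c1) & set(c2):
            continue  # coalitions must be disjoint
        if not any(
            {code[u][i] for u in c1}.isdisjoint({code[u][i] for u in c2})
            for i in range(len(code[0]))
        ):
            return False
    return True


k5_ok = is_2sfp(c22(5))
k4_ok = is_2sfp(c22(4))
```

The search succeeds for $k = 5$ and fails for $k = 4$: with only four elements, the coalitions {{1,2}, {2,3}} and {{1,3}, {1,4}} cannot be separated by any 2-subset, which is consistent with the condition $k > 4$ in the theorem.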

Theorem 5.9. For any $k > t$, $C_{t,\le 3}(k)$ is a quaternary 2-SFP code with code size $n = \binom{k}{t}$ and code length $N = \binom{k}{1} + \binom{k}{2} + \binom{k}{3} = \frac{1}{6}k(k^2 + 5)$.

Theorem 5.10. For any $k \ge 4t + r - 1$ and $r \ge 3$, $C_{t,r}(k)$ is a $(\min\{t, r\} + 1)$-ary 2-SFP code with code size $n = \binom{k}{t}$ and code length $N = \binom{k}{r}$.

Proof. For any four distinct elements $S_1, S_2, S_3, S_4$ of $(k)_t$, by Corollary 5.1, there exists $V \in (k)_{\le 3}$ such that the two sets $\{|V \cap S_1|, |V \cap S_2|\}$ and $\{|V \cap S_3|, |V \cap S_4|\}$ are disjoint. Since $k \ge 4t + r - 1 = |S_1| + |S_2| + |S_3| + |S_4| + r - 1$, we can add more elements from the set $(k) \setminus (S_1 \cup S_2 \cup S_3 \cup S_4)$ to $V$ and obtain a set $V' \in (k)_r$. We have $V \cap S_i = V' \cap S_i$, and thus the two sets $\{|V' \cap S_1|, |V' \cap S_2|\}$ and $\{|V' \cap S_3|, |V' \cap S_4|\}$ are disjoint. This proves that the code $C_{t,r}(k)$ is a 2-SFP code.

Combining the results of Theorems 5.7, 5.9, and 5.10, we have the following binary codes.

Theorem 5.11. For any $k > t$, $C^*_{t,\le 3}(k)$ is a binary 2-SFP code with code size $n = \binom{k}{t}$ and code length $N = \binom{k}{1} + \binom{k}{2} + \binom{k}{3} = \frac{1}{6}k(k^2 + 5)$.

Theorem 5.12. For any $k \ge 4t + r - 1$ and $r \ge 3$, $C^*_{t,r}(k)$ is a binary 2-SFP code with code size $n = \binom{k}{t}$ and code length $N = \binom{k}{r}$.

Note that the “Subsets Method”, which is capable of generating 2-SFP codes of exponential code size, is much better than the “Hadamard Method”, which gives 2-SFP codes whose code size is only the same as the code length. In order to demonstrate this, we take advantage of Stirling's formula

$$k! \sim \sqrt{2\pi k}\left(\frac{k}{e}\right)^k.$$

Consider the binary code derived in Theorem 5.11. The maximum code size is attained for $t = \lfloor k/2 \rfloor$:

$$n = \binom{k}{k/2} = \frac{k!}{\left(\frac{k}{2}\right)!\left(\frac{k}{2}\right)!} \sim \frac{\sqrt{2\pi k}\left(\frac{k}{e}\right)^k}{\sqrt{2\pi\frac{k}{2}}\left(\frac{k/2}{e}\right)^{k/2}\sqrt{2\pi\frac{k}{2}}\left(\frac{k/2}{e}\right)^{k/2}} = 2^k\sqrt{\frac{2}{\pi k}},$$

which grows exponentially in $k$, and hence in $N^{1/3}$, where $N$ is the code length. Moreover, the code length, attained with subsets of size at most $r = 3$, is $N = \frac{1}{6}k(k^2 + 5)$. Therefore, the maximum code rate that can be achieved is

$$R = N^{-1}\log_q n \sim \left(\frac{1}{6}k(k^2 + 5)\right)^{-1}\log_2\left(2^k\sqrt{\frac{2}{\pi k}}\right) = \frac{6}{k(k^2 + 5)}\left(k + \log_2\sqrt{\frac{2}{\pi k}}\right) \le \frac{6k}{k(k^2 + 5)},$$

which tends to zero as $k$ goes to infinity. Nevertheless, we have $N = O\left((\log n)^3\right)$. Later on, we will introduce better codes with positive code rates where again the code size is exponential with respect to the code length. Also, up to now we have only investigated 2-SFP codes; we will show $w$-SFP codes for $w > 2$ in the remainder of this chapter.
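The asymptotic comparison can be made concrete with a few numbers. The sketch below computes the code size, code length, and rate of the binary code $C^*_{t,\le 3}(k)$ with $t = \lfloor k/2 \rfloor$ (the parameter choice discussed above); the function name is an illustrative assumption.

```python
from math import comb, log2


def binary_subsets_parameters(k):
    """Code size n, code length N and rate R = log2(n) / N for t = floor(k/2)."""
    n = comb(k, k // 2)
    length = comb(k, 1) + comb(k, 2) + comb(k, 3)  # equals k (k^2 + 5) / 6
    return n, length, log2(n) / length


table = {k: binary_subsets_parameters(k) for k in (10, 20, 40)}
```

For instance, $k = 10$ gives 252 codewords of length 175. Since $\log_2 \binom{k}{k/2} \le k$, the rate is at most $6/(k^2 + 5)$, so it decreases roughly like $1/k^2$ while the code size grows exponentially in $k$.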

Part II: Recursive Construction

5.3 Concatenation Method

Recall that the concatenation of two codes was defined in Section 2.1.5. The construction of this section employs the concatenation of two codes: normally a longer outer code $B$ and a shorter inner code $A$. Concatenation is usually used to increase the code length and the separating weights $\lambda_w$ defined in Section 2.4.1.

Theorem 5.13. Let $u \ge v$, let $C_1$ be a $u$-SFP code with separating weight $\lambda_u$ and $C_2$ a $v$-SFP code with separating weight $\lambda_v$. Then the concatenated code $C_1 \star C_2$ is a $v$-SFP code with a new separating weight $\lambda' \ge \lambda_u \lambda_v$.

Proof. $C_2$ is a $v$-SFP outer code, and the symbols of $C_2$ are replaced via a one-to-one mapping by codewords of $C_1$. So, if any two coalitions of $C_2$ of size at most $v$ are separable, then any two coalitions of $C_1 \star C_2$ of size at most $v$ are separable as well. The separating weight of the outer code is $\lambda_v$, and for each separated position of $C_2$ the inner code $C_1$ itself separates in at least $\lambda_u$ positions by definition. Hence, the new separating weight $\lambda'$ is at least $\lambda_u \lambda_v$.

Example 5.4. Let $A$ and $B$ be the following codes:

$$
A = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 2 & 2 \\
2 & 2 & 2 & 2 & 1 \\
0 & 1 & 2 & 1 & 0
\end{pmatrix}_{4 \times 5}
\qquad
B = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 \\
2 & 2 & 2 & 2 & 0 \\
3 & 3 & 3 & 3 & 0 \\
3 & 2 & 1 & 0 & 1 \\
2 & 3 & 0 & 1 & 1 \\
1 & 0 & 3 & 2 & 1 \\
0 & 1 & 2 & 3 & 1 \\
1 & 3 & 2 & 0 & 2 \\
0 & 2 & 3 & 1 & 2 \\
3 & 1 & 0 & 2 & 2 \\
2 & 0 & 1 & 3 & 2 \\
2 & 1 & 3 & 0 & 3 \\
3 & 0 & 2 & 1 & 3 \\
0 & 3 & 1 & 2 & 3 \\
1 & 2 & 0 & 3 & 3
\end{pmatrix}_{16 \times 5}
$$

It is easy to check that $A$ is a 3-SFP code and $B$ a 2-SFP code. Define the following mapping of the alphabet symbols of $B$ to the rows of $A$:

$$
\theta : \begin{cases}
0 \mapsto (00001) \\
1 \mapsto (11122) \\
2 \mapsto (22221) \\
3 \mapsto (01210)
\end{cases}
$$

Applying this mapping to $B$ we obtain a code $A \star B$ with code length $N = 25$, code size $n = 16$, and alphabet size $q = 3$:

$$
A \star B = \begin{pmatrix}
00001 & 00001 & 00001 & 00001 & 00001 \\
11122 & 11122 & 11122 & 11122 & 00001 \\
22221 & 22221 & 22221 & 22221 & 00001 \\
01210 & 01210 & 01210 & 01210 & 00001 \\
01210 & 22221 & 11122 & 00001 & 11122 \\
22221 & 01210 & 00001 & 11122 & 11122 \\
11122 & 00001 & 01210 & 22221 & 11122 \\
00001 & 11122 & 22221 & 01210 & 11122 \\
11122 & 01210 & 22221 & 00001 & 22221 \\
00001 & 22221 & 01210 & 11122 & 22221 \\
01210 & 11122 & 00001 & 22221 & 22221 \\
22221 & 00001 & 11122 & 01210 & 22221 \\
22221 & 11122 & 01210 & 00001 & 01210 \\
01210 & 00001 & 22221 & 11122 & 01210 \\
00001 & 01210 & 11122 & 22221 & 01210 \\
11122 & 22221 & 00001 & 01210 & 01210
\end{pmatrix}_{16 \times 25}
$$
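The concatenation of Example 5.4 amounts to a symbol-by-symbol substitution. The following sketch reproduces it, with codewords stored as strings; the names `THETA`, `OUTER_B`, and `concatenate` are illustrative.

```python
# Inner code A, used as the image of the outer alphabet {0, 1, 2, 3}.
THETA = {"0": "00001", "1": "11122", "2": "22221", "3": "01210"}

# Outer code B of Example 5.4, one codeword per string.
OUTER_B = [
    "00000", "11110", "22220", "33330",
    "32101", "23011", "10321", "01231",
    "13202", "02312", "31022", "20132",
    "21303", "30213", "03123", "12033",
]


def concatenate(outer, theta):
    """Replace every outer symbol by the inner codeword assigned to it."""
    return ["".join(theta[symbol] for symbol in word) for word in outer]


a_star_b = concatenate(OUTER_B, THETA)
```

The result has 16 codewords of length 25 over the alphabet {0, 1, 2}, in agreement with the matrix $A \star B$ above.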

A more practical application of the concatenation method will be indicated later in Section 5.5.

5.4 Conversion from Hash Families

Constructions for hash families have been extensively investigated by many researchers. Here, we assume the existence of certain hash families and use them to construct secure frameproof codes. We first construct small codes and use them as the initial seed to construct bigger ones.

We will use sandwich free families, perfect hash families, and separating hash families to construct SFP codes. Note that unreadable marks need not be discussed in these constructions. Before doing so, we present a direct construction and a recursive construction of SFP codes, which explain the idea of recursive construction.

Theorem 5.14. For any integer $w \ge 2$, there is a $w$-SFP$\left(\binom{2w-1}{w-1}, 2w\right)$ code.

Proof. Recall the incidence matrix representation of set systems defined in Section 3.5. Let the code $C$ be built from an incidence matrix whose first row contains all 1s and whose remaining rows form the incidence matrix of $\mathcal{B}$, the set of subsets $B_1, \ldots, B_N$, where the $B_i$ range over all possible choices of $(w - 1)$ elements out of $(2w - 1)$ elements, yielding $N = \binom{2w-1}{w-1}$. We will show that $C = \{u^{(1)}, \ldots, u^{(2w)}\}$ is a $w$-SFP$(N, n)$ code where $N = \binom{2w-1}{w-1}$ is the code length and $n = 2w$ is the code size. It suffices to consider coalitions $C_1, C_2 \subseteq C$ with $|C_1| = |C_2| = w$ and $C_1 \cap C_2 = \emptyset$. Since $n = 2w$, it follows that $C_2 = C \setminus C_1$. Because the code length is $N = \binom{2w-1}{w-1}$, there is a unique bit position $i$ such that $u^{(j)}_i = 1$ for all $u^{(j)} \in C_1$ and $u^{(j)}_i = 0$ for all $u^{(j)} \in C_2$, or vice versa. It follows that $x_i = 1$ if $x \in \mathrm{desc}_w(C_1)$ and $x_i = 0$ if $x \in \mathrm{desc}_w(C_2)$, or vice versa. Hence, $\mathrm{desc}_w(C_1) \cap \mathrm{desc}_w(C_2) = \emptyset$.

Example 5.5. Using the above method, a 3-SFP(10, 6) code can be constructed and interpreted as an incidence matrix as follows:

$$
M(C) = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1
\end{pmatrix}_{6 \times 10}
$$
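The construction in the proof of Theorem 5.14 can be sketched as follows. The function names are illustrative, the columns are listed in lexicographic order of the $(w-1)$-subsets, and the check only covers what the proof uses: every split of the $2w$ codewords into two coalitions of size $w$ has a position that is constant on each coalition and differs between them.

```python
from itertools import combinations


def sfp_incidence_code(w):
    """All-ones row followed by the incidence matrix of all (w-1)-subsets
    of a (2w-1)-element set; columns correspond to the subsets."""
    elements = range(2 * w - 1)
    blocks = list(combinations(elements, w - 1))
    code = [[1] * len(blocks)]
    code += [[1 if e in block else 0 for block in blocks] for e in elements]
    return code


def separating_positions(code, coalition):
    """Positions that are constant on the coalition and on its complement,
    with different values on the two sides."""
    others = [u for u in range(len(code)) if u not in coalition]
    positions = []
    for i in range(len(code[0])):
        s1 = {code[u][i] for u in coalition}
        s2 = {code[u][i] for u in others}
        if s1.isdisjoint(s2):
            positions.append(i)
    return positions


code3 = sfp_incidence_code(3)
all_split = all(
    len(separating_positions(code3, c1)) == 1
    for c1 in combinations(range(6), 3)
)
```

For $w = 3$ the matrix `code3` coincides with $M(C)$ of Example 5.5, and each split into two coalitions of size 3 has exactly one separating position.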

A recursive construction can be provided in a similar way.
