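National Chiao Tung University
Department of Applied Mathematics
Master's Thesis
Codes for Intellectual Property Protection
On Codes for Copyright Protection
Student: Li, Guan-Cheng
Advisor: Professor Michael Fuchs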
On Codes for Copyright Protection
Student: Li, Guan-Cheng
Advisor: Michael Fuchs
National Chiao Tung University
Department of Applied Mathematics
Master's Thesis
A Thesis Submitted to the Department of Applied Mathematics, College of Science,
National Chiao Tung University, in Partial Fulfillment of the Requirements
for the Degree of Master
in
Applied Mathematics
June 2006
Hsinchu, Taiwan, Republic of China
Li, Guan-Cheng
On Codes for Copyright Protection
Master Thesis
In Partial Fulfillment of Requirement
For the Degree of Master
Advisor:
Professor Michael Fuchs
Submitted to
Institute of Applied Mathematics
College of Science
National Chiao Tung University
Hsinchu, Taiwan, Republic of China
Contents
Preface
Acknowledgement
1 Introduction
2 Definitions and Basics
2.1 Some Coding Theory
2.1.1 Hamming Distance
2.1.2 Hamming Weight
2.1.3 Minimum Distance
2.1.4 Error Correcting Code
2.1.5 Code Composition
2.2 Descendence
2.3 Frameproof code
2.4 Secure Frameproof code
2.4.1 Separating Weights
2.5 Identifiable parent property code
2.6 Traceability code
2.7 Relations
3 Hash Families and Codes
3.1 Hash Functions
3.2 Perfect Hash Families
3.3 Separating Hash Families
3.4 Difference Matrices
3.5 Set Systems
3.6 Sandwich Free Families
3.7 Secure Codes
4 Unreadable Marks and PTT
4.1 Unreadable Marks
4.2 Probabilistic Traitor Tracing
5 Constructions of SFP Codes
5.1 Hadamard Matrices and Jacobsthal Matrices
5.3 Concatenation Method
5.4 Conversion from Hash Families
5.5 Linear Codes
5.6 Comparisons
6 Summary
A Acronyms
Preface
Unauthorized duplication is a major problem in many areas. For digital media, duplication is especially easy because copying is immediate and no information is degraded in the process. In addition, the growth of the Internet makes it possible to distribute material on a much larger scale than before. Because of both technical and legal issues, it is often difficult to find and prosecute the pirates. Hence, protecting digital copies is a complicated task. Recently, electronic fingerprinting was devised as a method to discourage people from illegally redistributing their legally purchased copies.
Electronic fingerprinting deals with the problem of object identification through the use of electronic marks that are unique to each object. We consider fingerprinting for two purposes: protecting innocent users from being framed, and tracing illegitimately copied and distributed data, so-called pirate copies.
We examine the possibilities of designing fingerprinting codes that are resistant to tampering. We show that, under certain assumptions, we are often able to protect blameless users and even trace back the criminals.
Also, with the model we describe, the result of tracing should be reliable. That is, our tracing may fail in the sense that no pirates are identified, but we should not mistakenly accuse an innocent user. In this thesis, we mainly focus on a number of code constructions, and discuss their mathematical properties against piracy.
Acknowledgement
First of all, I am especially grateful to my supervisor, Professor Michael Fuchs, for his unfailing patience in giving me professional guidance on mathematical research. He is an enthusiastic mentor who taught me not only the essence of the theory but also how to articulate it clearly and convincingly. This is particularly valuable for me since my earlier, traditional education trained me to be a problem solver rather than a seller of my own theory.
I want to thank Professor Kar-Kin Zao of the Institute of Computer Science for the inspiration provided by his course “Internet Security”, which gave me related ideas on electronic security.
I would also like to thank Professor Ta-Yuan Huang for his endless encouragement, in particular regarding the undergraduate training in algebra in his class, which later introduced me to wonderful fields of mathematics and computer science.
Since this is my first time typesetting in LaTeX, I owe many thanks to my senior classmates for their advice on typesetting a thesis. Finally, I would like to express my appreciation for the support of my beloved girlfriend, Yu-Han, and my family, and I ascribe the completion of my degree to my family members.
Li, Guan-Cheng
psd23.am93g@nctu.edu.tw
National Chiao Tung University
May 16, 2006
Chapter 1
Introduction
Conventional mechanisms for copyright protection are incapable of handling digital data owing to the essential difference between digital and conventional documents. This has led to interest in developing other means of deterring pirates from illegally redistributing products. Digital fingerprinting, for example, can serve this purpose. A fingerprint is a sequence of numbers added to digital data that can be detected or extracted later to make an assertion about the data. Fingerprints can be applied in several areas, including:
• Ownership assertion
• Authentication and integrity verification
• Content labeling
• Digital watermarking
• Access control protocols
• Content protection
• Detection of copyright violations
• Secure on-line multimedia distribution
• Resource usage control
• Trust and trust management
With digital fingerprinting, a publisher embeds a unique fingerprint into each distributed copy of a document, keeping a database of sold copies and their corresponding buyers. If an illegally distributed copy is discovered, the publisher would certainly want to trace back to the unauthorized user by comparing its fingerprint to the database. Because of the uniqueness of the fingerprint, the pirates would introduce some kind of distortion of the marks in the documents. In order to redistribute illegal copies anonymously, a pirate may try different types of attacks to disclose the fingerprint. Assuming that the pirate has access to a single copy of the document that has been marked for him, he may try to restore the original document by identifying and removing the fingerprint. However, such an attack is unlikely to succeed if the fingerprint is hidden carefully and scattered all over the document. A stronger attack results if several pirates collude and compare their independently marked copies. They can identify the hidden fingerprint by locating the differences among their copies, replace the marks found there with other feasible marks, combine their copies into several new ones whose fingerprints differ from those of all the pirates, and resell their pirated products with different fingerprints without ever worrying about being caught. The copies obtained by replacing marks with feasible ones are called descendants, as will be made precise in Chapter 2.
Frameproof codes were introduced by Boneh and Shaw [8] as a method of digital fingerprinting which prevents a coalition of a specified size $w$ from framing a user not in the coalition. Several constructions of $w$-frameproof codes were introduced later by Stinson, Wei, Encheva, and Cohen [12,14,30].
Besides the design of frameproof codes against piracy, an efficient traitor tracing algorithm is needed in order to identify the offenders. The traitor tracing problem was introduced by Chor, Fiat and Naor for broadcast encryption systems, where the data should be accessible only to authorized users. When an illegal copy produced by a group of authorized users of the copyrighted material is detected, traitor tracing schemes allow the distributor to trace back at least one producer of it. In particular, these schemes are suitable for pay-per-view TV applications. We consider, as an example, a pay-per-view movie scenario introduced by Fiat and Tassa. In this scenario, the content is divided into $n$ segments. Each of these segments is marked with one of $q$ different symbols. Each user receives a differently marked copy of the content. The ordered set of marks for each copy can be given as a $q$-ary vector of length $n$. A coalition of colluding users can make an illegal copy by combining different segments of their data and broadcast it. After an illegal copy is detected, traitor tracing schemes attempt to reveal at least one traitor. Practical applications need to accommodate as many users as possible when there is a restriction on the number of symbols which can be used for marking the data. On the other hand, some digits of the codewords, whether registered or pirated, might be erased or become undetectable, either accidentally or deliberately. Therefore, codewords may need to differ in more than one position in order to be fault-tolerant.
Several codes providing some form of traceability have been designed for use in these schemes, and they have been extensively studied in recent years. The weak forms are frameproof (FP) codes and secure frameproof (SFP) codes. Stronger forms include identifiable-parent-property (IPP) codes, introduced by Hollmann, van Lint, Linnartz and Tolhuizen [21], and traceability (TA) codes, introduced by Chor, Fiat and Naor [10]. Such codes allow the tracing of at least one parent of any illegal copy when the size of the coalition of colluders does not exceed some given number $w$. Their combinatorial properties and related structures have been studied by Hollmann et al., Staddon, Stinson and Wei, Barg et al., and Sarkar [28,30,31,21].
As a matter of fact, TA codes turn out to be a subclass of IPP codes, IPP codes are a subclass of SFP codes, and SFP codes are a subclass of FP codes. These codes will be mathematically formulated in Chapter 2. Their relationship with hash families will be treated in Chapter 3.
The aim of this thesis is to study the above codes in the presence of unreadable marks. In such a situation, Boneh and Shaw [8] pointed out that codes with traitor tracing properties do not exist. This will be made precise in Chapter 4. They provided an alternative, slightly weaker form of traceability codes by using randomness and probabilistic traitor tracing. Their work is important from an application point of view because they trade off some accuracy for a fast traitor-tracing algorithm under the condition that undetectable marks exist. Hence, IPP and TA codes are only interesting from a theoretical point of view and are less applicable owing to their intolerance of undetectable marks. The probabilistic traitor tracing (PTT) algorithm due to Boneh and Shaw will be presented in the second half of Chapter 4.
However, it should be pointed out that if there are too many unreadable marks then even the probabilistic approach fails. An extreme case would be a codeword filled with unreadable marks, which is impossible for the distributor to recognize, let alone trace back. However, pirated products with many unreadable marks will soon be detected by the distributor, and in practical situations the pirates will scatter only a few unreadable marks over the products in order to falsely convince customers that the pirated products are copyrighted ones.
On the other hand, FP and SFP codes are immune to undetectable marks. Since SFP is stronger than FP, SFP codes find more practical applications, such as the distribution of multiple licenses. In such a scenario, a distributor sells his products to an institution instead of an individual. The distributor then gives the institution a few codewords as a base from which to generate more codewords for the use of its employees. The distributor certainly hopes that the base codewords exhibit the secure frameproof property, so that the codewords authorized to each institution can be treated independently.
We conclude the introduction with a sketch of the thesis. In Chapter 2, we provide the basic definitions, which are more general than the original definitions given by Stinson in [30]. Chapter 3 is then dedicated to the relationship between hash families and codes. In Chapter 4, we study unreadable marks and the probabilistic approach, and prove that IPP and TA codes do not exist in the presence of unreadable marks. Finally, in Chapter 5, we investigate explicit constructions of SFP codes. Most of the results in Chapters 4 and 5 are taken from the literature. We have, however, tried to increase clarity by adding more details and giving simplified proofs of many results. Moreover, we have tried to give a complete picture by incorporating all results presently known concerning codes for copyright protection in the presence of unreadable marks.
Chapter 2
Definitions and Basics
2.1 Some Coding Theory
Throughout the thesis, we denote by $N$ the code length, by $n$ the code size, and by $q$ the alphabet size of a code $C$.
2.1.1 Hamming Distance
Definition: The Hamming distance $d_H$ between two codewords is the number of positions whose entries are different.
Example 2.1. $d_H(11001, 01101) = 2$.
2.1.2 Hamming Weight
Definition: The Hamming weight denotes the number of nonzero entries in a codeword.
Example 2.2. The Hamming weight of $(1, 0, 1, 1, 0)$ is usually denoted as $\mathrm{weight}(1, 0, 1, 1, 0) = 3$.
2.1.3 Minimum Distance
Definition: The minimum distance of a code $C \subseteq P^N$ is the least Hamming distance $d_H(x, y)$ between any pair of different codewords $x, y \in C$.
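The three definitions above translate directly into code. The following is a minimal sketch, assuming codewords are represented as equal-length strings and that the zero symbol is the character "0"; the function names and the three-word sample code are illustrative and not part of the thesis.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))

def hamming_weight(x, zero="0"):
    """Number of entries that differ from the zero symbol."""
    return sum(a != zero for a in x)

def minimum_distance(code):
    """Least Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

print(hamming_distance("11001", "01101"))              # Example 2.1
print(hamming_weight("10110"))                         # Example 2.2
print(minimum_distance(["00000", "11100", "00111"]))   # illustrative code
```

The first two calls reproduce Examples 2.1 and 2.2; the last one uses a made-up code purely to exercise the minimum-distance definition.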
2.1.4 Error Correcting Code
Definition: An $(N, k, d)_q$ error correcting code (ECC) is a $q$-ary linear code with dimension $k$, code length $N$, and minimum Hamming distance $d$ between any two codewords. It follows that the code rate $R$ is $k/N$ and the code size is $q^k$. In some situations we also need to specify by $D$ the maximum Hamming distance between any two codewords. Normally we omit the subscript in the binary case.
In the nonlinear case, an $(N, n, q)$ code is a $q$-ary code of length $N$ with code size $n$. Its rate is computed as $N^{-1} \log_q n$. The following two nonlinear codes are useful in practical applications. One is the constant-weight code, a binary code whose codewords have a fixed number of 1's. The other is the equidistant code, a code in which any two codewords are at a fixed Hamming distance. We further introduce some more terminology for linear ECC as follows:
Theorem 2.1 (Singleton Bound). For a code $C : P^k \to P^N$ with minimum distance $d$, $N \ge k + d - 1$.
Codes attaining equality in the Singleton bound are called Maximum Distance Separable (MDS) codes.
A code $C$ with odd $d$ is said to be a Perfect Code if for every word $w$ of length $N$ not in $C$, there is a unique codeword $w_0$ in $C$ such that $d_H(w, w_0) \le (d-1)/2$.
2.1.5 Code Composition
Definition: Let $A$ be an $(N_2, n_2, q_2)$ code over an alphabet $Q_2$ with $|Q_2| = q_2$, and let $B$ be an $(N_1, q_2, q_1)$ code. Write $Q_2 = \{a_1, \dots, a_{q_2}\}$ and $B = \{b_1, \dots, b_{q_2}\}$. Let $\theta : Q_2 \to B$ be the one-to-one mapping defined by $\theta(a_i) = b_i$ for $1 \le i \le q_2$. For any codeword $a = (a_1, \dots, a_{N_2}) \in A$ we denote by $\tilde{a} = (\theta(a_1), \dots, \theta(a_{N_2})) = (b_1, \dots, b_{N_2})$ the $q_1$-ary sequence of length $N_1 N_2$ obtained from $a$ by using $\theta$. The set
$$A \star B = \{\tilde{a} = (b_1, \dots, b_{N_2}) \mid (a_1, \dots, a_{N_2}) \in A\}$$
is called the $(N_1 N_2, n_2, q_1)$ concatenation code of $A$ and $B$, with inner code $A$ and outer code $B$.
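The composition can be illustrated on a toy instance. The sketch below assumes string codewords; the codes `A` and `B` and the mapping `theta` are hypothetical examples chosen only to show how the parameters $(N_1 N_2, n_2, q_1)$ arise, and the helper name `concatenate` is not from the thesis.

```python
def concatenate(words_of_a, theta):
    """Replace every symbol of every codeword of A by its image under theta."""
    return ["".join(theta[symbol] for symbol in word) for word in words_of_a]

# Hypothetical (N2, n2, q2) = (3, 3, 3) code A over Q2 = {0, 1, 2}.
A = ["012", "120", "201"]
# Hypothetical (N1, q2, q1) = (2, 3, 2) code B = {00, 01, 11}; theta maps Q2 onto B.
theta = {"0": "00", "1": "01", "2": "11"}

AB = concatenate(A, theta)
print(AB)  # three binary words of length N1 * N2 = 6
```

Each symbol of a codeword of $A$ is expanded into a codeword of $B$, so the length multiplies while the number of codewords stays $n_2$.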
2.2 Descendence
Certain properties of the codes discussed above can be formulated using mathematical notation. Subsequently, let $C$ be a code of length $N$ on an alphabet $Q$ with $|Q| = q$.
We denote by “?” the unreadable mark deliberately or accidentally inserted into the pirated codewords. For any subset of codewords $C_0 \subseteq C$, we define the set of descendants of $C_0$, denoted $\mathrm{desc}(C_0)$, by
$$\mathrm{desc}(C_0) := \left\{ x \in (Q \cup \{?\})^N : x_i \in \begin{cases} \{a_i : a \in C_0\}, & \text{if } |\{a_i : a \in C_0\}| = 1; \\ \{a_i : a \in C_0\} \cup \{?\}, & \text{otherwise} \end{cases} \text{ for } 1 \le i \le N \right\}.$$
Namely, the set $\mathrm{desc}(C_0)$ consists of the $N$-tuples, possibly containing some unreadable marks, that could be produced by a coalition holding the codewords in the set $C_0$. If in a certain entry there is only one choice for the coalition, then only that element can be used in that entry. Otherwise, the coalition can choose any of the elements it sees in that entry, or a question mark.
Let $w \in \mathbb{N}$ be the number of codewords a coalition could have. We define the $w$-descendant code of $C$, denoted $\mathrm{desc}_w(C)$, as follows:
$$\mathrm{desc}_w(C) := \bigcup_{C_0 \subseteq C,\ |C_0| \le w} \mathrm{desc}(C_0).$$
In other words, the set $\mathrm{desc}_w(C)$ consists of the $N$-tuples that could be produced by some coalition of size at most $w$ by comparing the codewords its members jointly hold.
Example 2.3. Let $C = \{(1, 2, 0, 1, 1), (2, 2, 0, 1, 0)\}$. Then
$$\mathrm{desc}_2(C) = \left( \{1, 2, ?\},\ 2,\ 0,\ 1,\ \{0, 1, ?\} \right),$$
where a set in a position lists the admissible entries, and $|\mathrm{desc}_2(C)| = 9$.
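The definition of the descendant set can be checked by brute-force enumeration on small codes. The following is a sketch under the assumption that codewords are strings and that the character "?" is not part of the alphabet; the function names are illustrative.

```python
from itertools import combinations, product

UNREADABLE = "?"

def desc(coalition):
    """All words a coalition can produce, including unreadable marks."""
    choices = []
    for column in zip(*coalition):
        symbols = set(column)
        if len(symbols) > 1:
            # The coalition sees a difference here, so '?' is also possible.
            symbols = symbols | {UNREADABLE}
        choices.append(sorted(symbols))
    return {"".join(word) for word in product(*choices)}

def desc_w(code, w):
    """Union of desc(C0) over all nonempty coalitions C0 of size at most w."""
    result = set()
    for size in range(1, w + 1):
        for coalition in combinations(code, size):
            result |= desc(coalition)
    return result

C = ["12011", "22010"]          # the code of Example 2.3
print(len(desc_w(C, 2)))        # size of the 2-descendant code
```

Enumeration grows exponentially with the number of detectable positions, so this is only meant for checking small examples such as Example 2.3.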
Remark 2.1. Two pirated codewords (1,0,0,?,?) and (1,0,1,?,?) are obviously different because of the third entry. However, when given two pirated codewords that both read (1,0,1,?,?), we still treat them as different, although they might be the same codeword.
Next, we give the definitions concerning the mathematical properties required by FP, SFP, IPP, and TA codes.
2.3 Frameproof code
Definition: $C$ is a $w$-frameproof (FP) code provided that for all $x \in \mathrm{desc}_w(C)$ and all $C_i \subseteq C$ with $|C_i| \le w$, $x \in \mathrm{desc}(C_i) \cap C$ implies $x \in C_i$.
Roughly speaking, a code is $w$-frameproof if no coalition of size at most $w$ can frame another user not in the coalition by producing the codeword held by that user.
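For small codes the frameproof property can be tested exhaustively. The sketch below assumes string codewords; since a registered codeword contains no “?”, membership in $\mathrm{desc}(C_i)$ reduces to checking that every entry occurs in the coalition at that position. The function names and the sample codes are illustrative assumptions.

```python
from itertools import combinations

def can_produce(coalition, word):
    """True if every entry of word is seen by the coalition at that position."""
    return all(symbol in set(column)
               for symbol, column in zip(word, zip(*coalition)))

def is_frameproof(code, w):
    """Exhaustive check of the w-FP definition (exponential in w)."""
    for size in range(1, w + 1):
        for coalition in combinations(code, size):
            for x in code:
                if x not in coalition and can_produce(coalition, x):
                    return False   # the coalition can frame user x
    return True

print(is_frameproof(["100", "010", "001"], 2))
print(is_frameproof(["00", "11", "01"], 2))
```

In the second sample code the coalition holding 00 and 11 can assemble 01, which is why the check fails there.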
2.4 Secure Frameproof code
Definition: $C$ is a $w$-secure frameproof (SFP) code provided that for all $x \in \mathrm{desc}_w(C) \cap Q^N$, $x \in \mathrm{desc}(C_i) \cap \mathrm{desc}(C_j)$ implies that $C_i \cap C_j \neq \emptyset$, where $i \neq j$.
In other words, a code is $w$-secure frameproof if no coalition of size at most $w$ can frame a disjoint coalition of size at most $w$ by producing an $N$-tuple that could also have been produced by the second coalition. That is, given two disjoint coalitions $C_1$ and $C_2$ of size at most $w$, we know that they cannot produce the same false fingerprint, i.e., $\mathrm{desc}(C_1) \cap \mathrm{desc}(C_2) \cap Q^N = \emptyset$.
Remark 2.2. Note that FP and SFP codes are resistant to the threat of unreadable marks because if innocent users are safe from being framed by colluded codewords, they are even safer from being framed by codewords with unreadable marks, under the assumption mentioned earlier in Remark 2.1.
2.4.1 Separating Weights
Here, we do not look at the unreadable marks.
Definition: The separating weight $\lambda_w$ of two coalitions is the least number of positions where their descendences are separated. The normalized separating weight is $\tau_w := \lambda_w / N$, where $N$ is the code length.
Obviously, a code is $w$-SFP if and only if $\lambda_w > 0$.
Sometimes $\lambda_w$ is increased by various means, such as the concatenation method, in order to cope with undetectable marks. Namely, if an unreadable mark occurs in a supposedly separating position, other positions can serve as a backup in order to separate the codewords correctly.
Example 2.4. The code $\{1122334, 2112433, 1212343\}$ is a 2-SFP code with $\lambda_2 = 2$. Denote user 1 as $u^{(1)}$, user 2 as $u^{(2)}$, and user 3 as $u^{(3)}$.
Coalition (user 1 and 2): $\mathrm{desc}_2(u^{(1)}, u^{(2)}) = (\{1,2\},\ 1,\ \{1,2\},\ 2,\ \{3,4\},\ 3,\ \{3,4\})$
Coalition (user 2 and 3): $\mathrm{desc}_2(u^{(2)}, u^{(3)}) = (\{1,2\},\ \{1,2\},\ 1,\ 2,\ \{3,4\},\ \{3,4\},\ 3)$
Coalition (user 1 and 3): $\mathrm{desc}_2(u^{(1)}, u^{(3)}) = (1,\ \{1,2\},\ \{1,2\},\ 2,\ 3,\ \{3,4\},\ \{3,4\})$
Note that the coalition of users 1 and 2 cannot frame user 3 because of the second and sixth entries, the coalition of users 2 and 3 cannot frame user 1 because of the third and seventh entries, and the coalition of users 1 and 3 cannot frame user 2 because of the first and fifth entries.
Note that the separating weight of this code is $\lambda_2 = 2$ because any two disjoint coalitions are differentiated in at least two positions. The normalized separating weight is therefore $\tau_2 = 2/7$.
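Example 2.4 can be verified mechanically. The sketch below assumes one reading of the definition, namely that two coalitions are separated at a position when the sets of symbols they hold there are disjoint, and that $\lambda_w$ is the minimum count over all pairs of disjoint coalitions of size at most $w$. The function names are illustrative.

```python
from itertools import combinations

def separated_positions(c1, c2):
    """0-based positions where the symbols held by c1 and by c2 are disjoint."""
    columns = zip(zip(*c1), zip(*c2))
    return [i for i, (col1, col2) in enumerate(columns)
            if not set(col1) & set(col2)]

def separating_weight(code, w):
    """Minimum number of separating positions over disjoint coalitions of size <= w."""
    coalitions = [c for size in range(1, w + 1)
                  for c in combinations(code, size)]
    best = None
    for c1, c2 in combinations(coalitions, 2):
        if set(c1) & set(c2):
            continue                      # only disjoint coalitions count
        count = len(separated_positions(c1, c2))
        best = count if best is None else min(best, count)
    return best

code = ["1122334", "2112433", "1212343"]  # the code of Example 2.4
lam = separating_weight(code, 2)
print(lam, lam / len(code[0]))            # lambda_2 and tau_2
```

With this reading, the code is $w$-SFP exactly when the returned value is positive, which matches the remark preceding the example.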
2.5 Identifiable parent property code
Definition: $C$ is a $w$-identifiable parent property (IPP) code provided that for all $x \in \mathrm{desc}_w(C)$, it holds that
$$\bigcap_{i : x \in \mathrm{desc}(C_i)} C_i \neq \emptyset.$$
A code enjoys the $w$-identifiable parent property if no coalition of size at most $w$ can produce an $N$-tuple that cannot be traced back to at least one member of the coalition. In such a code, whenever a codeword belongs to the descendence of a coalition of size at most $w$, at least one of the parents in the coalition can be identified.
2.6 Traceability code
Definition: For $x, y \in Q^N$, define $I(x, y) = \{i : x_i = y_i\}$. $C$ is a $w$-traceability (TA) code provided that, for all $x \in \mathrm{desc}_w(C)$, $x \in \mathrm{desc}(C_i)$ implies that there is at least one codeword $y \in C_i$ such that $|I(x, y)| > |I(x, z)|$ for any $z \in C \setminus C_i$.
In fact, $|I(x, y)|$ measures the closeness of two codewords and can also be expressed as $N - d_H(x, y)$, where $N$ denotes the length of the codewords and $d_H(x, y)$ is the Hamming distance between them.
A code enjoying the $w$-traceability property allows an efficient (i.e., linear-time) algorithm to determine an identifiable parent. More precisely, if we compare an illegal codeword to each codeword in $C$, then the codeword closest to the illegal one will be one of the parents in the coalition. Note that the TA property is much stronger than the IPP property alone, which necessitates comparisons with $\binom{n}{w}$ sets, resulting in a nonlinear running time.
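The linear-time tracing rule described above amounts to a nearest-codeword search. The sketch below is an illustration only: the sample code (values of $a + bt \bmod 5$ at $t = 0, \dots, 4$) is an assumption chosen because any two of its codewords agree in at most one position, it is not taken from the thesis, and unreadable marks are ignored.

```python
from itertools import combinations, product

def agreement(x, y):
    """|I(x, y)|: number of positions where x and y coincide."""
    return sum(a == b for a, b in zip(x, y))

def trace_nearest(code, pirate_word):
    """Return a codeword of maximal agreement with the pirate word."""
    return max(code, key=lambda y: agreement(pirate_word, y))

# Illustrative code (an assumption): 25 words of length 5 over Z_5.
code = [tuple((a + b * t) % 5 for t in range(5))
        for a in range(5) for b in range(5)]

def always_traces_parent(code, w=2):
    """Check that nearest-codeword tracing hits a parent for every coalition of size w."""
    for coalition in combinations(code, w):
        for x in product(*[set(column) for column in zip(*coalition)]):
            if trace_nearest(code, x) not in coalition:
                return False
    return True

print(always_traces_parent(code))
```

For this sample code a pirate word built by two parents agrees with one of them in at least three positions, while any other codeword agrees with it in at most two, so a single pass over the code suffices to find a parent.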
Remark 2.3. It has to be made clear that IPP and TA codes are vulnerable in the presence of unreadable marks because by definition we can say nothing if there are “?” entries, let alone identify or trace the parents. This will be justified at the beginning of Chapter 4, where we show that such codes in fact do not exist.
Remark 2.4. If there are no unreadable marks in the pirated codewords, then IPP and TA codes can exist. However, constructions of IPP and TA codes will not be treated because they are only of theoretical interest owing to their intolerance of unreadable marks.
In the sequel, we point out the relationships among these codes.
2.7 Relations
1. $w$-SFP implies $w$-FP. This is self-explanatory if we treat an individual as an independent coalition. Let one coalition $A$ be of size at most $w$ and let the other coalition $B$ be simply one individual not in $A$. The $w$-SFP property assures that two disjoint coalitions of size at most $w$ cannot produce the same codeword. The coalition $B$ is a trivial coalition since the descendence of $B$ is $B$ itself, which therefore cannot be framed by coalition $A$ by the definition of SFP.
2. $w$-IPP implies $w$-SFP. This is clear because IPP is a strengthened version of SFP. Namely, $C_i \cap C_j \supseteq \bigcap_{k : x \in \mathrm{desc}(C_k)} C_k \neq \emptyset$.
3. $w$-TA implies $w$-IPP. Suppose $C$ is a $w$-TA code. If $x \in \mathrm{desc}_w(C)$, then there is a subset $C_i \subseteq C$, where $|C_i| \le w$, such that $x \in \mathrm{desc}(C_i)$. Let $y \in C_i$ be such that $|I(x, y)| \ge |I(x, z)|$ for all $z \in C_i$. Hence $|I(x, y)| \ge |I(x, z)|$ for any $z \in C$ by the definition of a $w$-TA code. We show that, for any $C_j \subseteq C$ with $|C_j| \le w$, $x \in \mathrm{desc}(C_j)$ implies $y \in C_j$. In fact, if $y \notin C_j$, then there is $y' \in C_j$ such that $|I(x, y')| > |I(x, y)|$ by the definition of a $w$-TA code. This contradicts the fact that $|I(x, y)| \ge |I(x, z)|$ for any $z \in C$.
Chapter 3
Hash Families and Codes
Before going into explicit constructions of such codes, some preliminaries are needed to reinforce the mathematical structures and serve as basic tools in the constructions.
Recently, hash families and related structures have been used to construct codes for copyright protection. Subsequently, we will define them and discuss their interrelationship with the codes defined in the previous chapter.
3.1 Hash Functions
Let $n \ge m$. An $(n, m)$-hash function is a function $h : A \to B$, where $|A| = n$ and $|B| = m$. An $(n, m)$-hash family is a finite set $H$ of $(n, m)$-hash functions such that $h : A \to B$ for each $h \in H$, where $|A| = n$ and $|B| = m$. We use the notation $HF(N; n, m)$ to denote an $(n, m)$-hash family with $|H| = N$.
3.2 Perfect Hash Families
Let $n$, $m$ and $w$ be integers such that $n \ge m \ge w \ge 2$. An $(n, m, w)$-perfect hash family is an $(n, m)$-hash family $H$ such that for any $X \subseteq A$ with $|X| = w$, there exists at least one $h \in H$ such that $h|_X$ is injective. We use the notation $PHF(N; n, m, w)$ to denote an $(n, m, w)$-perfect hash family with $|H| = N$.
3.3 Separating Hash Families
Let $n$, $m$, $w_1$ and $w_2$ be integers such that $n \ge m$. An $(n, m, w_1, w_2)$-separating hash family is an $(n, m)$-hash family $H$ such that for any $X_1, X_2 \subseteq A$ with $|X_1| = w_1$, $|X_2| = w_2$ and $X_1 \cap X_2 = \emptyset$, there exists at least one $h \in H$ such that $\{h(x) : x \in X_1\} \cap \{h(x) : x \in X_2\} = \emptyset$. We use the notation $SHF(N; n, m, w_1, w_2)$ to denote an $(n, m, w_1, w_2)$-separating hash family with $|H| = N$.
A survey on hash families is given in [16]. The following theorem is immediate from the definitions of perfect hash families and separating hash families.
Theorem 3.1. Let $H$ be an $(N; n, m)$ hash family.
1. If $H$ is a $PHF(N; n, m, w)$, then it is a $PHF(N; n, m, w')$ for all $w' \le w$.
2. If $H$ is an $SHF(N; n, m, w_1, w_2)$, then it is an $SHF(N; n, m, w_1', w_2')$ for all $w_1' \le w_1$ and $w_2' \le w_2$.
3. If $H$ is a $PHF(N; n, m, w_1 + w_2)$, then it is an $SHF(N; n, m, w_1, w_2)$.
Next, we establish the relationship between hash families and codes. We depict an $(N, n, q)$-code $C$ as an $n \times N$ matrix $M(C)$ on $q$ symbols, where each row of the matrix corresponds to one of the codewords. Similarly, we can represent an $HF(N; n, m)$, $H$, as an $N \times n$ matrix on $m$ symbols, where each row of the matrix corresponds to one of the functions in $H$. These two matrices are transposes of each other.
Given an $(N, n, q)$-code $C$, we define $H(C)$ to be the $HF(N; n, q)$ whose matrix representation is $M(C)^\top$. Thus if $C = \{x^1, x^2, \dots, x^n\}$ and $1 \le j \le N$, then the hash function $h_j \in H(C)$ is defined by the rule $h_j(i) = x^i_j$, $1 \le i \le n$, where $x^i_j$ denotes the $j$-th entry of the codeword $x^i$.
Obviously, the matrix representations of PHF and SHF satisfy the following:
Lemma 3.1. A $PHF(N; n, q, w)$ can be depicted as an $N \times n$ matrix with entries from $\{1, 2, \dots, q\}$ such that for any $w$ columns, there exists at least one row in which the $w$ entries are distinct.
Lemma 3.2. An $SHF(N; n, q, w_1, w_2)$ can be depicted as an $N \times n$ matrix with entries from $\{1, 2, \dots, q\}$ such that for any two disjoint sets of columns $C_1$ and $C_2$ of sizes $w_1$ and $w_2$ respectively, there exists at least one row in which the entries in the columns of $C_1$ are distinct from the entries in the columns of $C_2$.
Hence the relationship between hash families and FP and SFP codes follows immediately from the definitions.
Theorem 3.2. An $(N, n, q)$-code $C$ is a $w$-FP code if and only if $H(C)$ is an $SHF(N; n, q, w, 1)$.
Theorem 3.3. An $(N, n, q)$-code $C$ is a $w$-SFP code if $H(C)$ is a $PHF(N; n, q, 2w)$, where $n \ge 2w$.
Theorem 3.4. An $(N, n, q)$-code $C$ is a $w$-SFP code if and only if $H(C)$ is an $SHF(N; n, q, w, w)$, where $n \ge 2w$.
The proofs are trivial. Perfect hash families and separating hash families turn out to be just another language for FP and SFP codes.
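The matrix conditions of Lemmas 3.1 and 3.2 can be tested directly on the codeword list, since the codewords are the columns of the hash family matrix. The following sketch assumes string codewords and uses the three codewords of Example 2.4 as sample input; the function names are illustrative.

```python
from itertools import combinations

def is_perfect(code, w):
    """Lemma 3.1: any w codewords take w distinct values in some position."""
    for chosen in combinations(code, w):
        if not any(len(set(column)) == w for column in zip(*chosen)):
            return False
    return True

def is_separating(code, w1, w2):
    """Lemma 3.2: disjoint groups of w1 and w2 codewords are separated somewhere."""
    indices = range(len(code))
    for x1 in combinations(indices, w1):
        rest = [i for i in indices if i not in x1]
        for x2 in combinations(rest, w2):
            separated = any(
                {code[i][j] for i in x1}.isdisjoint({code[i][j] for i in x2})
                for j in range(len(code[0]))
            )
            if not separated:
                return False
    return True

code = ["1122334", "2112433", "1212343"]   # codewords of Example 2.4
print(is_separating(code, 2, 1))           # SHF(7; 3, 4, 2, 1), cf. Theorem 3.2
print(is_perfect(code, 3))                 # PHF(7; 3, 4, 3)
```

By Theorem 3.2, the first check is another way of testing the 2-FP property; the second check shows that the same code is not a perfect hash family for $w = 3$, since no position carries three distinct symbols.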
3.4 Difference Matrices
Definition: An $(n, k; \lambda)$-difference matrix is a $k \times n\lambda$ matrix $D = (d_{i,j})$, with entries from $\mathbb{Z}_n$, in which the multiset
$$\{d_{h,j} - d_{i,j} \bmod n : 1 \le j \le n\lambda\}$$
contains every element of $\mathbb{Z}_n$ exactly $\lambda$ times for all $h \neq i$.
Example 3.1. If $\gcd((k-1)!, n) = 1$, then the $k \times n$ matrix $D$ defined by $d_{i,j} = ij \bmod n$ is an $(n, k; 1)$-difference matrix.
The concept of a difference matrix will serve as a tool later in the recursive construction of perfect hash families in Theorem 5.20.
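Example 3.1 can be checked numerically. The sketch below assumes the standard reading of the definition, in which every element of $\mathbb{Z}_n$ must occur exactly $\lambda$ times among the differences of any two distinct rows, and indexes rows and columns from 0; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def multiplication_matrix(k, n):
    """The k x n matrix with entries d[i][j] = i * j mod n (Example 3.1)."""
    return [[(i * j) % n for j in range(n)] for i in range(k)]

def is_difference_matrix(D, n, lam=1):
    """Check the difference property for every pair of distinct rows."""
    if any(len(row) != n * lam for row in D):
        return False
    for row_h, row_i in combinations(D, 2):
        counts = Counter((a - b) % n for a, b in zip(row_h, row_i))
        if any(counts[g] != lam for g in range(n)):
            return False
    return True

print(is_difference_matrix(multiplication_matrix(3, 5), 5))  # gcd(2!, 5) = 1
print(is_difference_matrix(multiplication_matrix(3, 4), 4))  # gcd(2!, 4) = 2
```

The second call illustrates why the gcd condition matters: for $n = 4$ the rows with indices 0 and 2 differ by $2j \bmod 4$, which only takes the values 0 and 2.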
3.5 Set Systems
Definition: A set system is a pair $(X, \mathcal{B})$ where $X$ is a set of elements called points, and $\mathcal{B}$ is a set of subsets of $X$, the members of which are called blocks. A set system can be described by an incidence matrix. Let $(X, \mathcal{B})$ be a set system where $X = \{x_1, x_2, \dots, x_n\}$ and $\mathcal{B} = \{B_1, B_2, \dots, B_N\}$. The incidence matrix of $(X, \mathcal{B})$ is the $N \times n$ matrix $A = (a_{ij})$, where
$$a_{ij} = \begin{cases} 1 & \text{if } x_j \in B_i \\ 0 & \text{if } x_j \notin B_i. \end{cases}$$
Conversely, given an incidence matrix, we can define an associated set system in an obvious way. Here, if $C$ is an $(N, n, 2)$-code, then the matrix $M(C)$ is a 0-1 matrix, which can therefore be thought of as the incidence matrix of a set system. For any codeword $w \in C$, we will use $B_w$ to denote the associated block in the corresponding set system.
3.6 Sandwich Free Families
A set system $(X, \mathcal{B})$ is a $(w_1, w_2)$-sandwich free family provided that, for any two disjoint subsets $C_1, C_2$ of $\mathcal{B}$, where $|C_1| \le w_1$ and $|C_2| \le w_2$, the following property holds:
$$\Big(\bigcap_{B \in C_1} B\Big) \cup \Big(\bigcap_{B \in C_2} B\Big) \not\subseteq \Big(\bigcup_{B \in C_1} B\Big) \cap \Big(\bigcup_{B \in C_2} B\Big).$$
A $(w_1, w_2)$-sandwich free family $(X, \mathcal{B})$ will be denoted as a $(w_1, w_2)$-SFF$(N, n)$ if $|X| = n$ and $|\mathcal{B}| = N$.
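The sandwich-free condition can be tested exhaustively for small set systems. The sketch below assumes that only nonempty subfamilies are compared and represents blocks as Python sets; the function names and the two sample families are hypothetical illustrations.

```python
from itertools import combinations

def subfamilies(count, max_size):
    """Index tuples of all nonempty subfamilies with at most max_size blocks."""
    return [c for size in range(1, max_size + 1)
            for c in combinations(range(count), size)]

def is_sandwich_free(blocks, w1, w2):
    """Exhaustive check of the (w1, w2)-sandwich free property."""
    blocks = [frozenset(b) for b in blocks]
    for c1 in subfamilies(len(blocks), w1):
        for c2 in subfamilies(len(blocks), w2):
            if set(c1) & set(c2):
                continue  # only disjoint subfamilies are compared
            f1 = [blocks[i] for i in c1]
            f2 = [blocks[i] for i in c2]
            inner = frozenset.intersection(*f1) | frozenset.intersection(*f2)
            outer = frozenset.union(*f1) & frozenset.union(*f2)
            if inner.issubset(outer):
                return False  # the property is violated for this pair
    return True

print(is_sandwich_free([{1, 2}, {2, 3}, {1, 3}], 2, 1))
print(is_sandwich_free([{1, 2}, {2, 3}, {2}], 2, 1))
```

In the second family the block $\{2\}$ equals the intersection of the other two blocks and lies inside their union, so the required non-inclusion fails.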
The connection between SFF and SFP codes is stated as follows.
Theorem 3.5. A $w$-SFP$(N, n)$ exists if and only if there exists a $(w, w)$-SFF$(N, n)$.
The proof is not as straightforward as the previous ones, and will be given in the proof of Theorem 5.16, which focuses on explicit constructions of such codes.
3.7 Secure Codes
A code $C$ is $w$-secure if there exists a tracing algorithm $\mathcal{A}$ satisfying the following: if a coalition $C_0 \subseteq C$ of size at most $w$ generates a word $x$, then $\mathcal{A}(x) \in C_0$.
The tracing algorithm $\mathcal{A}$ on input $x$ must output a member of the coalition $C_0$ that generated the word. Hence, an illegal copy can be traced back to at least one member of the guilty coalition. Clearly there is no hope of recovering the entire coalition since some of its members might be passive; they are part of the coalition, but they contribute nothing to the construction of illegal copies.
Actually, the concept of $w$-secure codes is not new to us since we have the following result.
Proposition 3.1. $C$ is $w$-secure if and only if $C$ is a $w$-IPP code.
Proof. We first derive a necessary condition for a code to be $w$-secure. Consider the following scenario: let $C$ be some code. Let $C_1$ and $C_2$ be two coalitions of $w$ colluders such that $C_1 \cap C_2 = \emptyset$. Suppose an unregistered copy is caught which is marked by a word $x$ that belongs to both $\mathrm{desc}_w(C_1)$ and $\mathrm{desc}_w(C_2)$. As a consequence, both coalitions are suspicious. Since their intersection is empty, it is not possible to determine with certainty who created the unregistered $x$. It follows that if $C$ is $w$-secure then when the intersection of $C_1$ and $C_2$ is empty, the intersection of $\mathrm{desc}_w(C_1)$ and $\mathrm{desc}_w(C_2)$ must be empty as well. Of course, the same is true for $j$ subsets $C_1, \dots, C_j$. This gives the necessary condition. The sufficient condition is self-explanatory by the definition of the identifiable parent property of IPP codes.
Hence, TA codes are secure codes as well. However, both IPP and TA codes do not exist in the presence of unreadable marks, as will be clarified in the next chapter. Therefore, IPP and TA codes are only interesting from a theoretical point of view, and will not be treated subsequently. In the next chapter we will explain more about unreadable marks and introduce a probabilistic traitor tracing algorithm to construct “almost” secure codes.
Chapter 4
Unreadable Marks and PTT
Unreadable marks or undetectable bits are symbols in an uncertain state. For instance, when the police or the distributor recovers an illegal copy of an object, she might find some symbols undefined in the codeword, or could hardly determine which state an unreadable mark is in. The only thing she can do is simply to replace them by “?”s.
On the other hand, unreadable marks can be deliberately created by the coalitions in order to make traitor tracing less feasible and to make themselves safer from prosecution. As a matter of fact, IPP and TA codes do not exist in the presence of unreadable marks, as will be shown later. However, FP and SFP codes are resistant to the threat of unreadable marks because if innocent users are safe from being framed by colluded codewords, they are even safer from being framed by codewords with unreadable marks.
Without unreadable marks, IPP and TA codes can exist and have been investigated by several researchers in [21,35,10,6,2,36,37,19,29,20]. However, in the context of fingerprinting, the assumption that marks cannot become unreadable is unrealistic.
Based on the above reasoning and the fact that SFP codes are a strengthened version of FP codes, SFP codes find more practical applications in industry. Therefore, the explicit construction of SFP codes will be our main focus in the next chapter.
The remainder of this chapter explains how unreadable marks rule out the existence of IPP and TA codes. In order to overcome the problem, a probabilistic approach will be presented.
4.1 Unreadable Marks
Recall that the idea of secure codes was introduced in Section 3.7. We rephrase Proposition 3.1 as the following lemma.
Lemma 4.1. If $C$ is a $w$-secure code then
$$C_1 \cap \cdots \cap C_r = \emptyset \ \Rightarrow\ \mathrm{desc}_w(C_1) \cap \cdots \cap \mathrm{desc}_w(C_r) = \emptyset$$
for all coalitions $C_1, \dots, C_r$ of at most $w$ colluders each.
It seems that secure codes provide a good solution to the problem of collusion. Unfortunately, when $w > 1$, $w$-secure codes do not exist.
Theorem 4.1. For $w \ge 2$ there are no $w$-secure codes.
Proof. Clearly, it is sufficient to show that there are no 2-secure codes. Let $c^{(1)}, c^{(2)}, c^{(3)}$ be three distinct legal codewords assigned to users $u_1, u_2, u_3$, respectively. Define the majority word $M = \mathrm{MAJ}(c^{(1)}, c^{(2)}, c^{(3)})$ by
\[ M_i = \begin{cases} c^{(1)}_i, & \text{if } c^{(1)}_i = c^{(2)}_i \text{ or } c^{(1)}_i = c^{(3)}_i, \\ c^{(2)}_i, & \text{if } c^{(2)}_i = c^{(3)}_i, \\ ?, & \text{otherwise.} \end{cases} \]
One can readily verify that the majority word $M$ belongs to $\mathrm{desc}_2\{u_1, u_2\}$, $\mathrm{desc}_2\{u_1, u_3\}$, and $\mathrm{desc}_2\{u_2, u_3\}$ simultaneously. However, the intersection of these coalitions is empty. Hence, by Lemma 4.1, a 2-secure code cannot exist.
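The majority strategy from the proof can be tried out on a small example. The following Python sketch is only an illustration: the three ternary codewords are hypothetical, and `in_descendant_set` encodes one common reading of the descendant set with unreadable marks (where the coalition agrees, the symbol is fixed; where it disagrees, any member's symbol or "?" is allowed), which is an assumption rather than the exact definition of Section 2.2.

```python
def majority_word(c1, c2, c3):
    """Position-wise majority of three codewords; '?' where all three differ."""
    word = []
    for a, b, c in zip(c1, c2, c3):
        if a == b or a == c:
            word.append(a)
        elif b == c:
            word.append(b)
        else:
            word.append("?")
    return "".join(word)


def in_descendant_set(x, coalition):
    """Assumed marking rule: agreeing positions are fixed, others are free."""
    for i, symbol in enumerate(x):
        seen = {c[i] for c in coalition}
        if len(seen) == 1:
            if symbol not in seen:
                return False
        elif symbol not in seen and symbol != "?":
            return False
    return True


# Three hypothetical ternary codewords.
c1, c2, c3 = "0120", "0211", "1222"
m = majority_word(c1, c2, c3)
pairs = [(c1, c2), (c1, c3), (c2, c3)]
print(m, all(in_descendant_set(m, pair) for pair in pairs))
```

The majority word lies in the descendant set of every pair, although no user belongs to all three pairs, which is exactly the obstruction used in the proof.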
The proof of the theorem shows that a coalition which employs the “majority” strategy is guaranteed to defeat every fingerprinting code. By the above argument and Proposition 3.1, IPP and TA codes cannot exist either. This forces us to weaken our requirements for fingerprinting schemes. In the following section, we allow the distributor to make some random choices when embedding the codewords in the products. The point is that these random choices are kept hidden from the users. This enables us to construct codes which capture a member of the guilty coalition with sufficiently high probability.
4.2
Probabilistic Traitor Tracing
Probabilistic traitor tracing (PTT) is much more efficient in most cases. In this scheme, we need not identify colluders who have committed the crime with absolute certainty.‖ Instead, we treat a group of possible colluders as suspects and compute the probability that each of them is a colluder. This may not tell us deterministically who is guilty the first time. However, after several rounds of identification, some pirates become more and more suspicious as their probabilities of being guilty accumulate. Such a strategy works particularly well for applications such as pay-per-view movies, which call for repeated retrieval of data.
Suppose a coalition $C$ of $w$ users creates an illegal copy of an object. Fingerprinting schemes that enable the capture of a member of the coalition $C$ with probability at least $1 - \epsilon$ are called $w$-secure codes with $\epsilon$-error. Namely, $\Pr[A(x) \in C] > 1 - \epsilon$. In other words, the traitor tracing algorithm $A$, on input $x$, outputs with high probability a member of the coalition $C$ that generated the word $x$. To achieve this, we allow the distributor to make some random choices when embedding the codewords in the objects. The point is that the random choices are kept hidden from the users.
We begin by considering an $(N, n)$-code which is $n$-secure with $\epsilon$-error for any $\epsilon > 0$. Let $c_m$ be a column of height $n$ in which the first $m$ bits are 1 and the rest are 0. The code $C(N, n)$ with $N = d(n-1)$ consists of all columns $c_1, \ldots, c_{n-1}$, each duplicated $d$ times. The amount of duplication $d$ determines the error probability $\epsilon$.
‖ More generally speaking, we say that they committed the crime with probability 1.
Example 4.1. The code $C(16, 5)$ for five users $A, B, C, D, E$ (here $d = 4$) is
\[
\begin{array}{c|cccc}
 & B_1 & B_2 & B_3 & B_4 \\ \hline
A & 1111 & 1111 & 1111 & 1111 \\
B & 0000 & 1111 & 1111 & 1111 \\
C & 0000 & 0000 & 1111 & 1111 \\
D & 0000 & 0000 & 0000 & 1111 \\
E & 0000 & 0000 & 0000 & 0000
\end{array}
\]
An intuitive traitor tracing strategy is as follows: if any of the first four positions (block $B_1$) of a pirated codeword is 1, then we know that $A$ must belong to the coalition. Symmetrically, if any of the last four positions (block $B_4$) of a pirated codeword is 0, then we know that $E$ must belong to the coalition. If $A$ and $B$ collude, then $C$, $D$, and $E$ are safe from being framed. However, if $A$ and $E$ collude, the descendants of $A$ and $E$ could jeopardize the legal users $B$, $C$, and $D$. Nevertheless, this is very unlikely, because $A$ and $E$ differ in 16 places and the probability for $A$ and $E$ to frame $B$, $C$, or $D$ is merely $\frac{1}{2^{16}} \approx 10^{-5}$. This gives a heuristic for probabilistic traitor tracing.
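The block structure of $C(N, n)$ is easy to generate mechanically. The short Python sketch below rebuilds Example 4.1; the function name `boneh_shaw_code` is a label chosen here for illustration and does not appear in the source.

```python
def boneh_shaw_code(n, d):
    """Return the n codewords of C(N, n), N = d*(n-1), as bit strings.

    Block B_m (m = 1..n-1) consists of d copies of column c_m, so user i
    (1-indexed) sees 1 in block B_m exactly when i <= m.
    """
    return [
        "".join(("1" if i <= m else "0") * d for m in range(1, n))
        for i in range(1, n + 1)
    ]


code = boneh_shaw_code(5, 4)
for user, word in zip("ABCDE", code):
    print(user, word)
```

The first and last codewords differ in all 16 positions, which is the situation discussed for the coalition of $A$ and $E$.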
Consider the case that $B$ is innocent. Then what $A, C, D, E$ can detect in the first eight positions (blocks $B_1$ and $B_2$) is indistinguishable: each of them sees either 11111111 or 00000000. Hence, if some of $A$, $C$, $D$, or $E$ collude, the 1's they produce should be evenly distributed between $B_1$ and $B_2$. If markedly more 1's appear in $B_2$ than in $B_1$, then we deduce that $B$ is highly suspicious.
Let $w^{(1)}, \ldots, w^{(n)}$ denote the codewords of $C(N, n)$. Before the distributor embeds the codewords of $C(N, n)$ in an object, he picks a permutation $\pi$ of the bit positions uniformly at random. User $u_i$'s copy of the object is fingerprinted using the word $\pi w^{(i)}$. Note that the same permutation $\pi$ is used for all users. The point is that $\pi$ is kept hidden from the users. Keeping the permutation hidden from the users is equivalent to hiding the information of which mark in the object encodes which bit in the code. This simple technique will be shown to be effective in overcoming the barrier of unreadable marks.
Before going to the construction, we introduce some notation:
1. Let $B_m$ be the set of all bit positions in which the users see columns of type $c_m$. That is, $B_m$ is the set of all bit positions in which the first $m$ users see a 1 and the remaining users see a 0.
2. For $2 \le s \le n-1$ define $R_s = B_{s-1} \cup B_s$.
3. For a binary string $x$, let $\mathrm{weight}(x)$ denote the number of 1's in $x$, that is, the binary case of the Hamming weight defined in Section 2.1.2.
Theorem 4.2 (Boneh and Shaw [8]). For $n \ge 3$ and $\epsilon > 0$ let $d = 2n^2 \log\frac{2n}{\epsilon}$. Then the fingerprinting scheme $C(N, n)$ is $n$-secure with $\epsilon$-error.
The argument has been sketched informally above; we now make the language precise. The length of this code is $d(n-1) = O\left(n^3 \log\frac{n}{\epsilon}\right)$. Intuitively, suppose user $s$ is not a member of the coalition $C_0$ which produced the word $x$. The hidden permutation $\pi$ prevents the coalition from knowing which marks represent which bits in the code $C(N, n)$. The only information the coalition has is the value of the marks it can detect. Observe that without user $s$, a coalition sees exactly the same values in all bit positions $i \in R_s$. Hence, for a bit position $i \in R_s$, the coalition $C_0$ cannot tell whether $i$ lies in $B_s$ or in $B_{s-1}$. This means that whichever strategy they use to set the bits of $x|_{R_s}$, the 1's in $x|_{R_s}$ will be roughly evenly distributed between $x|_{B_s}$ and $x|_{B_{s-1}}$ with high probability. As a result, if the 1's in $x|_{R_s}$ are not evenly distributed, then, with high probability, user $s$ is a member of the coalition that generated $x$.
The algorithm for probabilistic traitor tracing is stated next. The word $x$ found in the illegal copy may contain some unreadable marks, denoted by “?”. By convention, these bits are set to “0” before the word $x$ is fed into the algorithm.
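INPUT: $x \in \{0, 1\}^N$.
AIM: Find a subset of the coalition that produced $x$.
Algorithm:
1. If $\mathrm{weight}(x|_{B_1}) > 0$ then output “User 1 is guilty.”
2. If $\mathrm{weight}(x|_{B_{n-1}}) < d$ then output “User $n$ is guilty.”
3. For $s$ from 2 to $n-1$ do:
Let $k = \mathrm{weight}(x|_{R_s})$. If $\mathrm{weight}(x|_{B_{s-1}}) < \frac{k}{2} - \sqrt{\frac{k}{2}\log\frac{2n}{\epsilon}}$ then output “User $s$ is guilty.”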
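The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the logarithm is taken to be natural, $d$ is rounded up to an integer, the hidden permutation is simulated with a seeded shuffle, and the helper names (`block_size`, `make_scheme`, `trace`) as well as the two coalition strategies are hypothetical.

```python
import math
import random


def block_size(n, eps):
    """d = 2 n^2 log(2n/eps), rounded up (natural logarithm assumed)."""
    return math.ceil(2 * n * n * math.log(2 * n / eps))


def make_scheme(n, eps, seed=0):
    """Return (d, blocks, words); blocks[m] lists the hidden positions of B_m."""
    d = block_size(n, eps)
    positions = list(range(d * (n - 1)))
    random.Random(seed).shuffle(positions)  # plays the role of the hidden pi
    blocks = {m: positions[(m - 1) * d : m * d] for m in range(1, n)}
    words = []
    for user in range(1, n + 1):
        word = [0] * (d * (n - 1))
        for m in range(user, n):  # user sees 1 in B_m exactly when user <= m
            for p in blocks[m]:
                word[p] = 1
        words.append(word)
    return d, blocks, words


def trace(x, n, eps, d, blocks):
    """Return the set of users (1-indexed) declared guilty on input x."""
    x = [0 if bit == "?" else bit for bit in x]  # unreadable marks become 0

    def weight(m):
        return sum(x[p] for p in blocks[m])

    guilty = set()
    if weight(1) > 0:
        guilty.add(1)
    if weight(n - 1) < d:
        guilty.add(n)
    threshold = math.log(2 * n / eps)
    for s in range(2, n):
        k = weight(s - 1) + weight(s)  # weight of x restricted to R_s
        if weight(s - 1) < k / 2 - math.sqrt(k / 2 * threshold):
            guilty.add(s)
    return guilty


n, eps = 5, 0.01
d, blocks, words = make_scheme(n, eps)
# Hypothetical coalition {2, 4}: write 1 wherever the two copies differ.
pirate_or = [a | b for a, b in zip(words[1], words[3])]
# Same coalition, opposite strategy: write 0 wherever the copies differ.
pirate_and = [a & b for a, b in zip(words[1], words[3])]
print(trace(pirate_or, n, eps, d, blocks), trace(pirate_and, n, eps, d, blocks))
```

Both strategies are deterministic, so the tracer's verdict does not depend on the seed here; for randomized strategies the guarantee is only probabilistic, as quantified in Theorem 4.3.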
The correctness of the algorithm relies on the following theorem.
Theorem 4.3. Consider the code $C(N, n)$ with $N = d(n-1)$, where $d = 2n^2 \log\frac{2n}{\epsilon}$. Let $S$ be the set of users declared guilty on input $x$. Then, with probability at least $1 - \epsilon$, the set $S$ is a subset of the coalition $C_0$ that produced $x$.
Before the proof of the theorem we introduce two preliminary lemmas.
Lemma 4.2 (Chernoff Bound). Let $X$ be a binomial random variable over $k$ experiments with success probability $1/2$. Then, for any $a > 0$,
\[ \Pr\left[X - \frac{k}{2} < -a\right] \le e^{-2a^2/k}. \]
The proof can be found in standard textbooks on probability theory.
Lemma 4.3. Let $Y$ follow a hypergeometric distribution:
\[ \Pr[Y = r] = \frac{\binom{d}{r}\binom{d}{k-r}}{\binom{2d}{k}}. \]
Let $X$ follow a binomial distribution with success probability $1/2$:
\[ \Pr[X = r] = \binom{k}{r}\left(\frac{1}{2}\right)^k. \]
Then $\Pr[Y = r] \le 2\Pr[X = r]$.
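Proof. For the sake of brevity assume that $k$ is even. (The case of odd $k$ is similar.)
\[
\begin{aligned}
\Pr[Y = r] &= \frac{\binom{d}{r}\binom{d}{k-r}}{\binom{2d}{k}} = \binom{k}{r}\frac{d(d-1)\cdots(d-r+1)\cdot d(d-1)\cdots(d-k+r+1)}{2d(2d-1)\cdots(2d-k+1)} \\
&\le \binom{k}{r} 2^{-k}\,\frac{d^2(d-1)^2\cdots\left(d-\frac{k-2}{2}\right)^2}{d\left(d-\frac{1}{2}\right)(d-1)\cdots\left(d-\frac{k-1}{2}\right)} \\
&= \binom{k}{r} 2^{-k}\,\frac{d(d-1)\cdots\left(d-\frac{k-2}{2}\right)}{\left(d-\frac{1}{2}\right)\left(d-\frac{3}{2}\right)\cdots\left(d-\frac{k-1}{2}\right)} \\
&= \binom{k}{r} 2^{-k}\,\frac{d(d-1)\cdots\left(d-\frac{k-2}{2}\right)}{\left(d-1+\frac{1}{2}\right)\left(d-2+\frac{1}{2}\right)\cdots\left(d-\frac{k-2}{2}+\frac{1}{2}\right)\left(d-\frac{k-1}{2}\right)} \\
&\le \binom{k}{r} 2^{-k}\,\frac{d}{d-\frac{k-1}{2}} \le \binom{k}{r} 2^{-k}\cdot 2 = 2\Pr[X = r].
\end{aligned}
\]
Note that the last inequality follows since $k \le d$.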
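The inequality of Lemma 4.3 can also be checked numerically for small parameters. The following Python sketch uses exact rational arithmetic and tests all $1 \le k \le d \le 12$; this range is an arbitrary choice made for illustration, and the check is a sanity test rather than a replacement for the proof.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(d, k, r):
    """Pr[Y = r]: r of the k ones fall into the first of two blocks of size d."""
    return Fraction(comb(d, r) * comb(d, k - r), comb(2 * d, k))


def binom_pmf(k, r):
    """Pr[X = r] for k fair coin flips."""
    return Fraction(comb(k, r), 2 ** k)


def lemma_holds(d):
    """Check Pr[Y = r] <= 2 Pr[X = r] for all 1 <= k <= d and 0 <= r <= k."""
    return all(
        hypergeom_pmf(d, k, r) <= 2 * binom_pmf(k, r)
        for k in range(1, d + 1)
        for r in range(k + 1)
    )


print(all(lemma_holds(d) for d in range(1, 13)))
```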
The proof of Theorem 4.3 is now as follows:
Proof. Suppose user 1 was declared guilty, i.e., $1 \in S$. Then $\mathrm{weight}(x|_{B_1}) > 0$. This tells us that user 1 must be a member of $C_0$ (otherwise, the bits in $B_1$ would be indistinguishable for $C_0$ and could not have been set to 1). Similarly, if $n \in S$ then $n \in C_0$.
Suppose the algorithm declared a user $s$ with $1 < s < n$ guilty. We show that the probability that $s$ is not guilty is at most $\frac{\epsilon}{n}$. This will show that the probability that there exists a user in $S$ who is not guilty is at most $\epsilon$.
Let $s$ be an innocent user, i.e., $s \notin C_0$. As discussed above, this means that the coalition $C_0$ cannot distinguish between the bit positions in $R_s$. Because the permutation $\pi$ was chosen uniformly at random from the set of all permutations, the 1's in $x|_{R_s}$ may be regarded as being randomly placed in $x|_{R_s}$. Let $Y$ denote the number of 1's in $x|_{B_{s-1}}$ given that $x|_{R_s}$ contains $k$ 1's. For any integer $r$ with $0 \le r \le k$,
\[ \Pr[Y = r] = \Pr\left[\mathrm{weight}(x|_{B_{s-1}}) = r \mid \mathrm{weight}(x|_{R_s}) = k\right] = \frac{\binom{d}{r}\binom{d}{k-r}}{\binom{2d}{k}}, \]
that is, $Y$ follows a hypergeometric distribution, where $d = 2n^2\log\frac{2n}{\epsilon}$ is the size of a block. The expectation of $Y$ is $\frac{k}{2}$. To bound the probability that $s$ is pronounced guilty, we need to bound
\[ \Pr\left[Y < \frac{k}{2} - \sqrt{\frac{k}{2}\log\frac{2n}{\epsilon}}\right] \]
from above. This can be done by comparing $Y$ to an appropriate binomial random variable.
Let $X$ be a binomial random variable over $k$ experiments with success probability $\frac{1}{2}$. Lemma 4.3 tells us that $\Pr[Y = r] \le 2\Pr[X = r]$ for any $r$. This means that for any $a > 0$
\[ \Pr\left[Y - \frac{k}{2} < -a\right] \le 2\Pr\left[X - \frac{k}{2} < -a\right] \le 2e^{-2a^2/k}, \]
where the last inequality follows from the standard Chernoff bound of Lemma 4.2. Plugging in $a = \sqrt{\frac{k}{2}\log\frac{2n}{\epsilon}}$ leads to
\[ \Pr\left[Y < \frac{k}{2} - \sqrt{\frac{k}{2}\log\frac{2n}{\epsilon}}\right] \le 2e^{-\log\frac{2n}{\epsilon}} = \frac{\epsilon}{n}. \]
Hence, if user $s$ is innocent, then the probability of her being declared guilty is at most $\frac{\epsilon}{n}$. Summing over the at most $n$ users, the probability that some innocent user is declared guilty is at most $\epsilon$, as desired.
Note that the code size $n$ is smaller than the code length $N = d(n-1)$ by a factor of about $d$, so the code accommodates few users relative to its length. This problem can be overcome with the concatenation method discussed in [8], which increases the code size and hence accommodates more users. We sketch the idea here. Recall the definition of code composition in Section 2.1.5. Let $C'(N', n')$ be an outer code over an alphabet of size $n$, with code size $n'$ and code length $N'$, where the codewords are chosen independently and uniformly at random. The idea is to compose our code $C(N, n)$ with $C'$. The resulting code contains $n'$ codewords and has length $N'N = N'd(n-1)$. It is made up of $N'$ copies of $C(N, n)$. The point is that the codewords of the code $C'$ are kept secret from the users. This is in addition to keeping hidden the $N'$ permutations used when embedding the $N'$ copies of $C(N, n)$ in the products. A traitor tracing algorithm similar to the original one is also provided for this scheme. Moreover, the parameters can be chosen in such a way that the code size is exponential in the code length. For more details we refer the reader to [8]. In the next chapter we will concentrate on the construction of secure frameproof codes.
Chapter 5
Constructions of SFP Codes
This chapter discusses various constructions of codes with the secure frameproof property. The constructions fall into two classes. The first class consists of direct constructions, which are studied in the first half of this chapter; here we construct codes directly, without the help of previous existence results. The second class consists of recursive constructions, which are investigated in the second half of this chapter. Given a code‡‡ satisfying certain properties, a recursive construction extends it to a code with longer codewords and larger code size that satisfies the original properties as well.
Part I: Direct Construction
5.1
Hadamard Matrices and Jacobsthal Matrices
Definition: A Hadamard matrix is an $n \times n$ real matrix $H$ which satisfies $HH^{\top} = nI$. The name derives from a theorem of Hadamard.
Theorem 5.1. Let $X = (x_{ij})$ be an $n \times n$ real matrix whose entries satisfy $|x_{ij}| \le 1$ for all $i$ and $j$. Then $|\det(X)| \le n^{n/2}$. Equality holds if and only if $X$ is a Hadamard matrix.
‡‡We call it the initial seed.
Let $x_1, x_2, \ldots, x_n$ be the rows of $X$. By Euclidean geometry, $|\det(X)|$ is the volume of the parallelepiped with sides $x_1, x_2, \ldots, x_n$; hence
\[ |\det(X)| \le |x_1| \cdot |x_2| \cdots |x_n|, \]
where $|x_i|$ is the Euclidean length of $x_i$; equality holds if and only if $x_1, x_2, \ldots, x_n$ are mutually perpendicular. By assumption,
\[ |x_i| = \left(x_{i1}^2 + x_{i2}^2 + \cdots + x_{in}^2\right)^{1/2} \le n^{1/2}, \]
with equality if and only if $|x_{ij}| = 1$ for all $j$.
Subsequently, we focus on Hadamard matrices with all entries $\pm 1$.
For which orders $n$ do Hadamard matrices exist? There is a well-known necessary condition:
Theorem 5.2. If a Hadamard matrix of order $n$ exists, then $n = 1$ or $n = 2$ or $n \equiv 0 \pmod 4$.
To see this, we observe first that changing the sign of every entry in a column of a Hadamard matrix gives another Hadamard matrix. So, changing the signs of all columns for which the entry in the first row is $-$, we may assume that all entries in the first row are $+$. (We abbreviate $+1$ and $-1$ to $+$ and $-$, respectively.)
\[
\begin{array}{cccc}
\overbrace{+\cdots+}^{a} & \overbrace{+\cdots+}^{a} & \overbrace{+\cdots+}^{a} & \overbrace{+\cdots+}^{a} \\
+\cdots+ & +\cdots+ & -\cdots- & -\cdots- \\
+\cdots+ & -\cdots- & +\cdots+ & -\cdots-
\end{array}
\]
Because every other row is orthogonal to the first, each further row has $m$ entries $+$ and $m$ entries $-$, where $n = 2m$. Moreover, if $n > 2$, the first three rows can be arranged as displayed above, with $n = 4a$. The most important open question in the theory of Hadamard matrices is that of existence; in other words, it is not known whether the above necessary condition is also sufficient.
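Conjecture 5.1. A Hadamard matrix of order $4n$ exists for every positive integer $n$.
Theorem 5.3. Let $H$ be a Hadamard matrix of order $n$. Then the partitioned matrix
\[ \begin{pmatrix} H & H \\ H & -H \end{pmatrix} \]
is a Hadamard matrix of order $2n$.
This observation can be applied recursively and leads to the following series of matrices:
\[
H_1 = \begin{pmatrix} 1 \end{pmatrix}, \quad
H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
H_4 = \begin{pmatrix} H_2 & H_2 \\ H_2 & -H_2 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}, \quad \ldots
\]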
In this manner, Sylvester constructed Hadamard matrices of order $2^n$ for every non-negative integer $n$. Sylvester's matrices have a number of special properties. They are symmetric. The elements in the first column and the first row are all positive. The elements in all the other rows and columns are evenly divided between positive and negative.
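Sylvester's doubling step translates directly into a few lines of code. The Python sketch below is an illustration only; the function names `sylvester` and `is_hadamard` are chosen here, and matrices are represented as plain lists of rows.

```python
def sylvester(m):
    """Return the Sylvester Hadamard matrix of order 2**m as a list of rows."""
    h = [[1]]
    for _ in range(m):
        # Doubling step of Theorem 5.3: [[H, H], [H, -H]].
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def is_hadamard(h):
    """Check H H^T = n I for a square matrix with entries +1 or -1."""
    n = len(h)
    return all(
        sum(a * b for a, b in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


for row in sylvester(2):
    print(row)
print(is_hadamard(sylvester(4)))
```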
Raymond Paley later showed how to construct a Hadamard matrix of order $q + 1$, where $q$ is any prime power which is congruent to 3 modulo 4. He also constructed matrices of order $2(q + 1)$ for prime powers $q$ which are congruent to 1 modulo 4. His method uses finite fields.
Let $q$ be a prime power congruent to 3 modulo 4. Recall that in the field GF($q$), half of the nonzero elements are quadratic residues and half are quadratic non-residues. The quadratic character of GF($q$) is defined as
\[ \chi(x) = \begin{cases} 0, & \text{if } x = 0; \\ +1, & \text{if } x \text{ is a quadratic residue}; \\ -1, & \text{if } x \text{ is a quadratic non-residue.} \end{cases} \]
Definition: Let $A = (a_{xy})$ be the matrix whose rows and columns are indexed by the elements of GF($q$) and whose entries are $a_{xy} = \chi(y - x)$. Then $A$ is skew-symmetric, with zeros on the diagonal and $\pm 1$ elsewhere; $A$ is called the Jacobsthal matrix.
Theorem 5.4. If we replace the diagonal zeros of the Jacobsthal matrix by $-1$s and augment it by a new row and a new column with all entries 1, we obtain a Hadamard matrix of order $q + 1$, called a Hadamard matrix of Paley type:
\[ H = \begin{pmatrix} 1 & \mathbf{1} \\ \mathbf{1}^{\top} & A - I \end{pmatrix}. \]
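Example 5.1. For $q = 7$, we obtain the following matrix:
\[
A = \begin{pmatrix}
0 & 1 & 1 & -1 & 1 & -1 & -1 \\
-1 & 0 & 1 & 1 & -1 & 1 & -1 \\
-1 & -1 & 0 & 1 & 1 & -1 & 1 \\
1 & -1 & -1 & 0 & 1 & 1 & -1 \\
-1 & 1 & -1 & -1 & 0 & 1 & 1 \\
1 & -1 & 1 & -1 & -1 & 0 & 1 \\
1 & 1 & -1 & 1 & -1 & -1 & 0
\end{pmatrix}
\]
A normalized Hadamard matrix $H$ of order $q + 1$ of Paley type is now given as follows:
\[ H = \begin{pmatrix} 1 & \mathbf{1} \\ \mathbf{1}^{\top} & A - I \end{pmatrix}. \]
Example 5.2. For $q = 7$, we obtain the following matrix over GF(3) by replacing $-1$ by 2 in $A$:
\[
A' = \begin{pmatrix}
0 & 1 & 1 & 2 & 1 & 2 & 2 \\
2 & 0 & 1 & 1 & 2 & 1 & 2 \\
2 & 2 & 0 & 1 & 1 & 2 & 1 \\
1 & 2 & 2 & 0 & 1 & 1 & 2 \\
2 & 1 & 2 & 2 & 0 & 1 & 1 \\
1 & 2 & 1 & 2 & 2 & 0 & 1 \\
1 & 1 & 2 & 1 & 2 & 2 & 0
\end{pmatrix}
\]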
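The matrices of Examples 5.1 and 5.2 can be reproduced with a short computation. The Python sketch below is restricted, as a simplifying assumption, to a prime $p \equiv 3 \pmod 4$ (general prime powers would require finite field arithmetic); the function names are chosen here for illustration.

```python
def quadratic_character(x, p):
    """chi(x) in GF(p), p an odd prime: 0, +1 (residue) or -1 (non-residue)."""
    x %= p
    if x == 0:
        return 0
    # Euler's criterion: x is a residue iff x^((p-1)/2) = 1 mod p.
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1


def jacobsthal(p):
    """Matrix A with a_xy = chi(y - x), rows and columns indexed by 0..p-1."""
    return [[quadratic_character(y - x, p) for y in range(p)] for x in range(p)]


def paley_hadamard(p):
    """Border A - I with a row and a column of 1s (p prime, p = 3 mod 4)."""
    a = jacobsthal(p)
    core = [[a[x][y] - (1 if x == y else 0) for y in range(p)] for x in range(p)]
    return [[1] * (p + 1)] + [[1] + row for row in core]


A = jacobsthal(7)
A_gf3 = [[v % 3 for v in row] for row in A]  # replaces -1 by 2
H = paley_hadamard(7)
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in H] for r1 in H]
print(A[0], A_gf3[0], gram[0])
```

The Gram matrix `gram` equals $8I$, which confirms that the bordered matrix for $q = 7$ is a Hadamard matrix of order 8.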
Let $H_{4k}$ be any Hadamard matrix of order $4k$ in which $+1$s are replaced by 0s and $-1$s by 1s.
Theorem 5.5 (Encheva & Cohen [17]). $H_{4k}$ is a binary $2$-SFP$(N, n)$ code where $N = n = 4k$.
Proof. We show that for any $c_1, c_2, c_3, c_4 \in H_{4k}$ there is a column of the form $(0011)^{\top}$ or $(1100)^{\top}$. We consider a normalized Hadamard matrix whose first row is the all-1s word and first assume that none of $c_1, c_2, c_3, c_4$ is the all-1s codeword. Suppose the columns are arranged so that $c_1, c_2, c_3$ take the following form, where each of the four blocks has width $k$ and $1^m$ (respectively $0^m$) denotes a run of $m$ consecutive 1s (respectively 0s):
\[
\begin{array}{c|c|c|c|c}
c_1 & 1^k & 1^k & 0^k & 0^k \\
c_2 & 1^k & 0^k & 1^k & 0^k \\
c_3 & 1^{a}\,0^{k-a} & 1^{k-b}\,0^{b} & 1^{k-c}\,0^{c} & 1^{d}\,0^{k-d}
\end{array}
\]
Because none of $c_1, c_2, c_3, c_4$ is the all-1s codeword, each of them contains as many 0s as 1s, and every two rows coincide in half of the positions and differ in the other half. Therefore, $c_3$ contains $2k$ 1s, yielding
\[ a + (k - b) + (k - c) + d = 2k. \]
Since $c_3$ coincides with $c_2$ in exactly $2k$ positions, we have
\[ a + b + (k - c) + (k - d) = 2k. \]
Again, since $c_3$ coincides with $c_1$ in exactly $2k$ positions, we have
\[ a + (k - b) + c + (k - d) = 2k. \]
A routine calculation leads to $a = b = c = d$. Writing $x$ for this common value, and $y$ for the corresponding value of $c_4$, the codewords $c_1, c_2, c_3, c_4$ are given by
\[
\begin{array}{c|c|c|c|c}
c_1 & 1^k & 1^k & 0^k & 0^k \\
c_2 & 1^k & 0^k & 1^k & 0^k \\
c_3 & 1^{x}\,0^{k-x} & 1^{k-x}\,0^{x} & 1^{k-x}\,0^{x} & 1^{x}\,0^{k-x} \\
c_4 & 0^{k-y}\,1^{y} & 0^{y}\,1^{k-y} & 0^{y}\,1^{k-y} & 0^{k-y}\,1^{y}
\end{array}
\]
If there is no column of the form $(1100)^{\top}$, then $c_4$ must be 1 wherever $c_3$ is 0 in the first block, so $y \ge k - x$. If there is no column of the form $(0011)^{\top}$, then $c_4$ must be 0 wherever $c_3$ is 1 in the last block, so $k - y \ge x$. We deduce that $x + y = k$, and hence
\[
\begin{array}{c|c|c|c|c}
c_3 & 1^{x}\,0^{k-x} & 1^{k-x}\,0^{x} & 1^{k-x}\,0^{x} & 1^{x}\,0^{k-x} \\
c_4 & 0^{x}\,1^{k-x} & 0^{k-x}\,1^{x} & 0^{k-x}\,1^{x} & 0^{x}\,1^{k-x}
\end{array}
\]
However, this is impossible because $c_3$ and $c_4$ should coincide in $2k$ positions. Moreover, if one of $c_1, c_2, c_3, c_4$ is the all-1s codeword, then it is even easier for them to exhibit the $2$-SFP property. Hence, $H_{4k}$ is $2$-SFP.
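For small orders, the statement of Theorem 5.5 can be confirmed by exhaustive search. The Python sketch below checks the Sylvester matrix of order 8; it assumes the usual characterization of the $w$-SFP property without unreadable marks, namely that two disjoint coalitions of size at most $w$ are separated if in some position their symbol sets are disjoint. The function names are chosen here for illustration.

```python
from itertools import combinations


def sylvester_binary(m):
    """Rows of the Sylvester matrix of order 2**m with +1 -> 0 and -1 -> 1."""
    h = [[0]]
    for _ in range(m):
        h = [row + row for row in h] + [row + [1 - v for v in row] for row in h]
    return h


def is_sfp(code, w):
    """Brute force: every two disjoint coalitions of size <= w are separated."""
    users = range(len(code))
    coalitions = [c for s in range(1, w + 1) for c in combinations(users, s)]
    for c1, c2 in combinations(coalitions, 2):
        if set(c1) & set(c2):
            continue  # only disjoint coalitions need to be separated
        if not any(
            {code[u][i] for u in c1}.isdisjoint({code[v][i] for v in c2})
            for i in range(len(code[0]))
        ):
            return False
    return True


H8 = sylvester_binary(3)
print(is_sfp(H8, 2))
# A hypothetical counterexample: all four binary words of length 2.
print(is_sfp([[0, 0], [0, 1], [1, 0], [1, 1]], 2))
```

The search is exponential in $w$ and quadratic in the number of coalitions, so it is only meant for sanity checks on small codes.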
Theorem 5.6. Jacobsthal matrices generate $2$-SFP codes over GF(3).
5.2
The Subsets Method
Another direct construction which employs the properties of subsets was proposed by Tonien and Safavi-Naini [34].
Let $(k)$ be the set $\{1, 2, \ldots, k\}$. By $(k)_t$ we denote the set of all subsets of $(k)$ which contain exactly $t$ elements.
With parameters $k, t, r$, consider the matrix $M_{t,r}(k)$ whose rows are labeled by the elements of $(k)_t$ and whose columns are labeled by the elements of $(k)_r$. For $U \in (k)_t$ and $V \in (k)_r$, the entry in row $U$ and column $V$ of the matrix $M_{t,r}(k)$ is $|U \cap V|$. The code $C_{t,r}(k)$ consists of the rows of the matrix $M_{t,r}(k)$. Without ambiguity, we identify a codeword of $C_{t,r}(k)$ with a set $U \in (k)_t$ and a position with a set $V \in (k)_r$. By definition, the symbol of the codeword $U$ at the position $V$ is $U_V = |U \cap V|$.
The code $C_{t,\le r}(k)$ is defined in a similar way. It is given by the matrix $M_{t,\le r}(k)$ whose rows and columns are labeled by the elements of the sets $(k)_t$ and $(k)_{\le r}$, respectively, where $(k)_{\le r}$ denotes the set of all nonempty subsets of $(k)$ with at most $r$ elements. For $U \in (k)_t$ and $V \in (k)_{\le r}$, the symbol of the codeword $U$ at the position $V$ is $U_V = |U \cap V|$.
The codes $C^*_{t,r}(k)$ and $C^*_{t,\le r}(k)$ are binary codes. They are constructed in the same way as $C_{t,r}(k)$ and $C_{t,\le r}(k)$, except that the symbol of the codeword $U$ at the position $V$ is $U_V = |U \cap V| \pmod 2$.
Example 5.3. Codes ofC3,2(5), C3,2∗ (5), C3,62(4), and C3,62∗ (4) are shown below:
C3,2(5) {1, 2} {1, 3} {1, 4} {1, 5} {2, 3} {2, 4} {2, 5} {3, 4} {3, 5} {4, 5} {1, 2, 3} 2 2 1 1 2 1 1 1 1 0 {1, 2, 4} 2 1 2 1 1 2 1 1 0 1 {1, 2, 5} 2 1 1 2 1 1 2 0 1 1 {1, 3, 4} 1 2 2 1 1 1 0 2 1 1 {1, 3, 5} 1 2 1 2 1 0 1 1 2 1 {1, 4, 5} 1 1 2 2 0 1 1 1 1 2 {2, 3, 4} 1 1 1 0 2 2 1 2 1 1 {2, 3, 5} 1 1 0 1 2 1 2 1 2 1 {2, 4, 5} 1 0 1 1 1 2 2 1 1 2 {3, 4, 5} 0 1 1 1 1 1 1 2 2 2
CHAPTER 5. CONSTRUCTIONS OF SFP CODES 35 C3∗,2(5) {1, 2} {1, 3} {1, 4} {1, 5} {2, 3} {2, 4} {2, 5} {3, 4} {3, 5} {4, 5} {1, 2, 3} 0 0 1 1 0 1 1 1 1 0 {1, 2, 4} 0 1 0 1 1 0 1 1 0 1 {1, 2, 5} 0 1 1 0 1 1 0 0 1 1 {1, 3, 4} 1 0 0 1 1 1 0 0 1 1 {1, 3, 5} 1 0 1 0 1 0 1 1 0 1 {1, 4, 5} 1 1 0 0 0 1 1 1 1 0 {2, 3, 4} 1 1 1 0 0 0 1 0 1 1 {2, 3, 5} 1 1 0 1 0 1 0 1 0 1 {2, 4, 5} 1 0 1 1 1 0 0 1 1 0 {3, 4, 5} 0 1 1 1 1 1 1 0 0 0 C3,62(4) {1} {2} {3} {4} {1, 2} {1, 3} {1, 4} {2, 3} {2, 4} {3, 4} {1, 2, 3} 1 1 1 0 2 2 1 2 1 1 {1, 2, 4} 1 1 0 1 2 1 2 1 2 1 {1, 3, 4} 1 0 1 1 1 2 2 1 1 2 {2, 3, 4} 0 1 1 1 1 1 1 2 2 2 C3∗,62(4) {1} {2} {3} {4} {1, 2} {1, 3} {1, 4} {2, 3} {2, 4} {3, 4} {1, 2, 3} 1 1 1 0 0 0 1 0 1 1 {1, 2, 4} 1 1 0 1 0 1 0 1 0 1 {1, 3, 4} 1 0 1 1 1 0 0 1 1 0 {2, 3, 4} 0 1 1 1 1 1 1 0 0 0
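The four tables of Example 5.3 follow mechanically from the definition $U_V = |U \cap V|$. The Python sketch below regenerates them; the function name `subsets_code` and its keyword arguments are hypothetical, and rows and columns are enumerated in lexicographic order by size, which matches the order used in the example.

```python
from itertools import combinations


def subsets_code(k, t, r, up_to=False, binary=False):
    """Rows of C_{t,r}(k); up_to=True gives C_{t,<=r}(k), binary=True reduces mod 2."""
    ground = range(1, k + 1)
    sizes = range(1, r + 1) if up_to else [r]
    columns = [set(v) for size in sizes for v in combinations(ground, size)]
    rows = []
    for u in combinations(ground, t):
        symbols = [len(set(u) & v) for v in columns]
        rows.append([s % 2 for s in symbols] if binary else symbols)
    return rows


for row in subsets_code(5, 3, 2):
    print(row)
```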
The following two results are used to establish the secure frameproof property of $C_{t,r}(k)$ and $C_{t,\le r}(k)$.
Theorem 5.7. If $S_1, S_2, S_3$, and $S_4$ are arbitrary subsets of $(k)$ such that
\[ S_i \not\subset S_j \text{ and } S_j \not\subset S_i \text{ for all } i \in \{1, 2\} \text{ and } j \in \{3, 4\}, \]
then there exists an element $V \in (k)_{\le 3}$ such that the following two sets
\[ \{|V \cap S_1| \bmod 2,\ |V \cap S_2| \bmod 2\} \text{ and } \{|V \cap S_3| \bmod 2,\ |V \cap S_4| \bmod 2\} \]
are disjoint.
This further implies the following.
Corollary 5.1. If $S_1, S_2, S_3$, and $S_4$ are arbitrary subsets of $(k)$ such that
\[ S_i \not\subset S_j \text{ and } S_j \not\subset S_i \text{ for all } i \in \{1, 2\} \text{ and } j \in \{3, 4\}, \]
then there exists an element $V \in (k)_{\le 3}$ such that the following two sets
\[ \{|V \cap S_1|,\ |V \cap S_2|\} \text{ and } \{|V \cap S_3|,\ |V \cap S_4|\} \]
are disjoint.
The proofs of Theorem 5.7 and Corollary 5.1 can be found in [34], which exhaustively investigates all possible distributions of 0s and 1s. Based on the above facts, we have the following explicit constructions.
Theorem 5.8. For any $k > 4$, $C_{2,2}(k)$ is a ternary $2$-SFP code with code size $n = \binom{k}{2}$ and code length $N = \binom{k}{2}$.
Proof. We indicate a proof which is easier than the original one found in the paper of Tonien and Safavi-Naini. First, it is sufficient to show that if $C_{2,2}(8)$ is $2$-SFP, then the same is true for $C_{2,2}(k)$ for all $k > 8$. Indeed, any four codewords of $C_{2,2}(k)$ are 2-subsets of $(k)$ and together involve at most 8 elements. Therefore, whenever $k > 8$, they lie in a submatrix of dimension $\binom{8}{2} \times \binom{8}{2}$ of the matrix of dimension $\binom{k}{2} \times \binom{k}{2}$, and this submatrix always has the $2$-SFP property. Thus the conclusion follows. In order to finish the proof, we still have to verify the $2$-SFP property for $k = 5, 6, 7, 8$, but this can be done either by hand or by computer.
Theorem 5.9. For any $k \ge t$, $C_{t,\le 3}(k)$ is a quaternary $2$-SFP code with code size $n = \binom{k}{t}$ and code length $N = \binom{k}{1} + \binom{k}{2} + \binom{k}{3} = \frac{1}{6}k(k^2 + 5)$.
Theorem 5.10. For any $k \ge 4t + r - 1$ and $r \ge 3$, $C_{t,r}(k)$ is a $(\min\{t, r\} + 1)$-ary $2$-SFP code with code size $n = \binom{k}{t}$ and code length $N = \binom{k}{r}$.
Proof. For any four distinct elements $S_1, S_2, S_3, S_4$ of $(k)_t$, by Corollary 5.1, there exists $V \in (k)_{\le 3}$ such that the two sets $\{|V \cap S_1|, |V \cap S_2|\}$ and $\{|V \cap S_3|, |V \cap S_4|\}$ are disjoint. Since $k \ge 4t + r - 1 = |S_1| + |S_2| + |S_3| + |S_4| + r - 1$, we can add more elements from the set $(k) \setminus (S_1 \cup S_2 \cup S_3 \cup S_4)$ to $V$ and obtain a set $V' \in (k)_r$. We have $V \cap S_i = V' \cap S_i$, and thus the two sets $\{|V' \cap S_1|, |V' \cap S_2|\}$ and $\{|V' \cap S_3|, |V' \cap S_4|\}$ are disjoint. This proves that the code $C_{t,r}(k)$ is $2$-SFP.
Combining these results with Theorems 5.7, 5.9, and 5.10, we obtain the following binary codes.
Theorem 5.11. For any $k \ge t$, $C^*_{t,\le 3}(k)$ is a binary $2$-SFP code with code size $n = \binom{k}{t}$ and code length $N = \binom{k}{1} + \binom{k}{2} + \binom{k}{3} = \frac{1}{6}k(k^2 + 5)$.
Theorem 5.12. For any $k \ge 4t + r - 1$ and $r \ge 3$, $C^*_{t,r}(k)$ is a binary $2$-SFP code with code size $n = \binom{k}{t}$ and code length $N = \binom{k}{r}$.
Note that the “Subsets Method”, which is capable of generating $2$-SFP codes of exponential code size, is much better than the “Hadamard Method”, which gives $2$-SFP codes whose code size merely equals the code length. In order to demonstrate this, we take advantage of Stirling's formula
\[ k! \sim \sqrt{2\pi k}\left(\frac{k}{e}\right)^k. \]
Consider the binary codes derived in Theorems 5.11 and 5.12. The maximum code size is attained for $t = \lfloor \frac{k}{2} \rfloor$:
\[ n = \binom{k}{k/2} = \frac{k!}{\left(\frac{k}{2}\right)!\left(\frac{k}{2}\right)!} \sim \frac{\sqrt{2\pi k}\left(\frac{k}{e}\right)^k}{\sqrt{2\pi\frac{k}{2}}\left(\frac{k/2}{e}\right)^{k/2}\sqrt{2\pi\frac{k}{2}}\left(\frac{k/2}{e}\right)^{k/2}} = 2^k\sqrt{\frac{2}{\pi k}}, \]
which is exponential in $k$. Moreover, the minimum code length is attained for $r = 3$, where $N = \frac{1}{6}k(k^2 + 5)$. Therefore, the maximum code rate that can be achieved is
\[ R = N^{-1}\log_q n \sim \left(\frac{1}{6}k(k^2 + 5)\right)^{-1}\log_2\left(2^k\sqrt{\frac{2}{\pi k}}\right) = \frac{6}{k(k^2 + 5)}\left(k + \log_2\sqrt{\frac{2}{\pi k}}\right) \le \frac{6k}{k(k^2 + 5)}, \]
which tends to zero as $k$ goes to infinity. Nevertheless, we have $N = O\left((\log n)^3\right)$. Later on, we will introduce better codes with positive code rates, where again the code size is exponential with respect to the code length. Also, up to now we have only investigated $2$-SFP codes; we will show $w$-SFP codes for $w > 2$ in the sequel.
Part II: Recursive Construction
5.3
Concatenation Method
Recall that the concatenation of two codes was defined in Section 2.1.5. The construction of this section employs the concatenation of two codes: normally a longer outer code $B$ and a shorter inner code $A$. Concatenation is usually used to increase the code length and the separating weight $\lambda_w$ defined in Section 2.4.1.
Theorem 5.13. Let $u \ge v$, let $C_1$ be a $u$-SFP code with separating weight $\lambda_u$, and let $C_2$ be a $v$-SFP code with separating weight $\lambda_v$. Then the concatenated code $C_1 \star C_2$ is a $v$-SFP code with a new separating weight $\lambda' \ge \lambda_u\lambda_v$.
Proof. $C_2$ is a $v$-SFP outer code, and the symbols of $C_2$ are replaced, via a one-to-one mapping, by codewords of $C_1$. Hence, if any two coalitions of $C_2$ of size at most $v$ are separable, then any two coalitions of $C_1 \star C_2$ of size at most $v$ are separable as well. The separating weight of the outer code is $\lambda_v$, and for each separated position of $C_2$ the inner code $C_1$ itself separates in at least $\lambda_u$ positions by definition. Hence, the new separating weight $\lambda'$ is at least $\lambda_u\lambda_v$.
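Example 5.4. Let $A$ and $B$ be the following codes:
\[
A = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 2 & 2 \\
2 & 2 & 2 & 2 & 1 \\
0 & 1 & 2 & 1 & 0
\end{pmatrix}_{4 \times 5}
\qquad
B = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 \\
2 & 2 & 2 & 2 & 0 \\
3 & 3 & 3 & 3 & 0 \\
3 & 2 & 1 & 0 & 1 \\
2 & 3 & 0 & 1 & 1 \\
1 & 0 & 3 & 2 & 1 \\
0 & 1 & 2 & 3 & 1 \\
1 & 3 & 2 & 0 & 2 \\
0 & 2 & 3 & 1 & 2 \\
3 & 1 & 0 & 2 & 2 \\
2 & 0 & 1 & 3 & 2 \\
2 & 1 & 3 & 0 & 3 \\
3 & 0 & 2 & 1 & 3 \\
0 & 3 & 1 & 2 & 3 \\
1 & 2 & 0 & 3 & 3
\end{pmatrix}_{16 \times 5}
\]
It is easy to check that $A$ is a $3$-SFP code and $B$ is a $2$-SFP code. Define the following mapping of the alphabet symbols of $B$ to the rows of $A$:
\[ \theta : \quad 0 \mapsto (00001), \quad 1 \mapsto (11122), \quad 2 \mapsto (22221), \quad 3 \mapsto (01210). \]
Applying this mapping to $B$, we obtain a code $A \star B$ with code length $N = 25$, code size $n = 16$, and alphabet size $q = 3$:
\[
A \star B = \begin{pmatrix}
00001 & 00001 & 00001 & 00001 & 00001 \\
11122 & 11122 & 11122 & 11122 & 00001 \\
22221 & 22221 & 22221 & 22221 & 00001 \\
01210 & 01210 & 01210 & 01210 & 00001 \\
01210 & 22221 & 11122 & 00001 & 11122 \\
22221 & 01210 & 00001 & 11122 & 11122 \\
11122 & 00001 & 01210 & 22221 & 11122 \\
00001 & 11122 & 22221 & 01210 & 11122 \\
11122 & 01210 & 22221 & 00001 & 22221 \\
00001 & 22221 & 01210 & 11122 & 22221 \\
01210 & 11122 & 00001 & 22221 & 22221 \\
22221 & 00001 & 11122 & 01210 & 22221 \\
22221 & 11122 & 01210 & 00001 & 01210 \\
01210 & 00001 & 22221 & 11122 & 01210 \\
00001 & 01210 & 11122 & 22221 & 01210 \\
11122 & 22221 & 00001 & 01210 & 01210
\end{pmatrix}_{16 \times 25}
\]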
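The substitution step of Example 5.4 amounts to replacing every outer symbol by the corresponding inner codeword. A minimal Python sketch, with codewords written as strings and a function name (`concatenate`) chosen here for illustration:

```python
# Inner code A (4 codewords of length 5) and outer code B (16 codewords
# of length 5 over the alphabet {0, 1, 2, 3}) from Example 5.4.
A = ["00001", "11122", "22221", "01210"]
B = ["00000", "11110", "22220", "33330", "32101", "23011", "10321", "01231",
     "13202", "02312", "31022", "20132", "21303", "30213", "03123", "12033"]


def concatenate(inner, outer):
    """Replace each symbol s of every outer codeword by the inner codeword inner[s]."""
    return ["".join(inner[int(s)] for s in word) for word in outer]


AB = concatenate(A, B)
for word in AB:
    print(word)
```

The result has 16 codewords of length 25 over the ternary alphabet of $A$, in agreement with the matrix displayed above.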
A more practical application of the concatenation method will be indicated later in Section5.5.
5.4
Conversion from Hash Families
Constructions for hash families have been extensively investigated by many researchers. Here, we assume the existence of certain hash families and use them to construct secure frameproof codes. We first construct small codes and then use them as the initial seed to construct bigger ones.
We will use sandwich free families, perfect hash families, and separating hash families to construct SFP codes. Note that unreadable marks need not be considered in these constructions. Before doing so, we present a direct construction and a recursive construction of SFP codes which explain the idea of recursive construction.
Theorem 5.14. For any integer $w \ge 2$, there is a $w$-SFP$\left(\binom{2w-1}{w-1}, 2w\right)$ code.
Proof. Recall the incidence matrix representation of set systems defined in Section 3.5. Let the code $C$ be built from a matrix whose first row contains all 1s and whose remaining $2w - 1$ rows form the incidence matrix of the set system $\mathcal{B} = \{B_1, \ldots, B_N\}$, where the $B_i$ run over all possible choices of $w - 1$ out of $2w - 1$ elements, yielding $N = \binom{2w-1}{w-1}$. We will show that $C = \{u^{(1)}, \ldots, u^{(2w)}\}$ is a $w$-SFP$(N, n)$ code, where $N = \binom{2w-1}{w-1}$ is the code length and $n = 2w$ is the code size. It suffices to verify that $\mathrm{desc}_w(C_1) \cap \mathrm{desc}_w(C_2) = \emptyset$ for all $C_1, C_2 \subseteq C$ with $|C_1| = |C_2| = w$ and $C_1 \cap C_2 = \emptyset$. Since $n = 2w$, it follows that $C_2 = C \setminus C_1$. Because the columns run over all $(w-1)$-subsets, there is a unique bit position $i$ such that $u^{(j)}_i = 1$ for all $u^{(j)} \in C_1$ and $u^{(j)}_i = 0$ for all $u^{(j)} \in C_2$, or vice versa. It follows that $x_i = 1$ if $x \in \mathrm{desc}_w(C_1)$ and $x_i = 0$ if $x \in \mathrm{desc}_w(C_2)$, or vice versa. Hence, $\mathrm{desc}_w(C_1) \cap \mathrm{desc}_w(C_2) = \emptyset$.
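Example 5.5. Using the above method, a $3$-SFP$(10, 6)$ code can be constructed and interpreted as an incidence matrix as follows:
\[
M(C) = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1
\end{pmatrix}_{6 \times 10}
\]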
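The construction of Theorem 5.14 can be generated and checked by brute force for small $w$. The Python sketch below rebuilds the matrix of Example 5.5; the function names are chosen here for illustration, and the separation test assumes the usual characterization of the SFP property without unreadable marks (some position in which the symbol sets of the two coalitions are disjoint).

```python
from itertools import combinations


def small_sfp_code(w):
    """Code of Theorem 5.14: 2w codewords of length C(2w-1, w-1).

    Row 0 is all 1s; row j (1 <= j <= 2w-1) has a 1 in the column of block B
    exactly when element j belongs to B, where B runs over all (w-1)-subsets.
    """
    blocks = list(combinations(range(1, 2 * w), w - 1))
    first = [1] * len(blocks)
    rest = [[1 if j in b else 0 for b in blocks] for j in range(1, 2 * w)]
    return [first] + rest


def is_sfp(code, w):
    """Brute force: every two disjoint coalitions of size <= w are separated."""
    users = range(len(code))
    coalitions = [c for s in range(1, w + 1) for c in combinations(users, s)]
    for c1, c2 in combinations(coalitions, 2):
        if set(c1) & set(c2):
            continue  # only disjoint coalitions need to be separated
        if not any(
            {code[u][i] for u in c1}.isdisjoint({code[v][i] for v in c2})
            for i in range(len(code[0]))
        ):
            return False
    return True


M = small_sfp_code(3)
for row in M:
    print(row)
print(is_sfp(M, 3))
```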
A recursive construction can be provided in a similar way.