A New Pattern Representation Scheme
Using Data Compression
Presented by Chen-hsiu Huang
Media Data Processing
• How to deal with the tremendous variety of media:
– Categorization resolves the latent global information structure
contained in a set of unknown data
– Recognition provides the means of correctly identifying
• We believe that a new media analysis scheme should provide the following:
1. Generality, i.e., applicability to media data of any type
2. Facility for both categorization and recognition
3. The ability to cope with indefinitely varying media data (data that is difficult to represent by a finite set of well-defined models)
4. Easy implementation and low processing cost
VQ: Vector Quantization
• VQ is applicable to a wide range of media data and is very easy to implement.
• Categorization and recognition are performed by partitioning a given feature space into several classes and assigning an unknown vector to an appropriate class
• Due to the lack of general mapping schemes, VQ has been limited to rather low-level analysis tasks for which intrinsic feature vectors are available
• Frequency-domain methods using features such as Fourier, DCT, or wavelet coefficients have wide applicability but require real-valued inputs. This requirement makes such methods hard to apply to symbol data such as text
• NN and related algorithms provide a completely different general scheme. Such systems also cope well with indefinitely varying sources, yet they are applicable only to recognition and require much training even for small tasks, with a corresponding high
The PRDC System
• In PRDC, input data is converted into text and compressed using a set of encoding dictionaries; this generates a
compression ratio vector (CV) as a feature of the original input
• The CV is then used as a feature vector in traditional VQ.
Although some of the original information is lost in the text
generation, we can still exploit the attractive properties of VQ by delimiting its scope
• The realization of PRDC depends on the ability to convert
various media data into text and construct a CV feature space
• Methods for evaluating the complexity or randomness of finite
sequences have been studied extensively, providing the foundation for a number of data compression techniques
• However, the use of a CV as a general pattern feature, the core of PRDC, is a new concept
[Figure: Media data → Encoder1, Encoder2, Encoder3 (media-specific encoders) → Text → Text compressors → Compression feature]
• Let A = {ai|0 ≤ i ≤ n − 1} be an alphabet composed of n
characters. A text t is a finite sequence over A. Let l(t) be its length
• For example, if A = {a, b}, then
t1 = aaaaaa, t2 = aabaab, t3 = ababab, t4 = abbabb and t5 = bbbbbb are texts, and l(ti) = 6 for all i
• Call the substrings of t words, and define a dictionary dm,t as the set
of words w in t with l(w) ≤ m. For example, d3,t2 = {a, b, aa, ab, ba, aab, aba, baa} and d3,t4 = {a, b, ab, ba, bb, abb, bab, bba}
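The dictionary definition can be sketched in a few lines of Python. The helper name `build_dictionary` is hypothetical (it does not appear in the slides); it simply enumerates every substring of t whose length is between 1 and m.

```python
def build_dictionary(t: str, m: int) -> set[str]:
    """Return d_{m,t}: every non-empty substring (word) of t with length <= m."""
    return {
        t[i:i + k]
        for k in range(1, m + 1)          # candidate word length
        for i in range(len(t) - k + 1)    # start position of the word in t
    }


if __name__ == "__main__":
    by_length = lambda w: (len(w), w)
    print(sorted(build_dictionary("aabaab", 3), key=by_length))  # d_{3,t2}
    print(sorted(build_dictionary("abbabb", 3), key=by_length))  # d_{3,t4}
```

Applied to t2 and t4 with m = 3, this reproduces the two example dictionaries listed above.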
• A parsing of u by dm,t, which is denoted by p(u, dm,t) is a
successive partitioning of u into words of dm,t
• If dm,t contains a significant amount of information about u, the parsed word count l(p(u, dm,t)) is small. Note that p(u, dm,t) is not unique
• In order to ensure the uniqueness of parsing, we introduce the concept of greedy parsing gp(u, dm,t)
• This is defined as the recursive parsing of a text u by taking the longest prefix lpf(u, dm,t) of u in dm,t followed by the greedy
parsing of the remaining part rest(u), as defined by the following function (φ denotes the null text)
$$gp(u, d_{m,t}) = \begin{cases} \varphi & \text{if } u = \varphi \\ lpf(u, d_{m,t}).gp(rest(u), d_{m,t}) & \text{otherwise} \end{cases}$$
• For example, gp(t1, d3,t4) = a.a.a.a.a.a, l(gp(t1, d3,t4)) = 6 and
gp(t3, d3,t4) = ab.ab.ab, l(gp(t3, d3,t4)) = 3. The uniqueness of gp(u, dm,t) is proven in Theorem 1.
• Now, we can define the compression ratio ρ(u, dm,t) of u by dm,t
$$\rho(u, d_{m,t}) = \frac{l(gp(u, d_{m,t}))}{l(u)}$$
• Using the above example, we get ρ(t1, d3,t4) = (6/6) = 1.0 and
ρ(t3, d3,t4) = (3/6) = 0.5
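Greedy parsing and the compression ratio can be sketched as follows. This is an iterative version of the recursive definition; the function names are hypothetical, and raising an error when no word matches is an assumption, since the slides do not say how that case is handled. The text u is assumed to be non-empty.

```python
def greedy_parse(u: str, d: set[str]) -> list[str]:
    """gp(u, d): repeatedly strip the longest prefix of u that is a word in d."""
    max_len = max(len(w) for w in d)
    words, i = [], 0
    while i < len(u):
        # Try the longest candidate prefix first, then shorter ones.
        for k in range(min(max_len, len(u) - i), 0, -1):
            if u[i:i + k] in d:
                words.append(u[i:i + k])
                i += k
                break
        else:
            # Assumption: unparseable input is treated as an error.
            raise ValueError(f"no dictionary word matches at position {i}")
    return words


def compression_ratio(u: str, d: set[str]) -> float:
    """rho(u, d) = (number of greedily parsed words) / (length of u)."""
    return len(greedy_parse(u, d)) / len(u)


D3_T4 = {"a", "b", "ab", "ba", "bb", "abb", "bab", "bba"}  # d_{3,t4} from the slides

if __name__ == "__main__":
    for text in ("aaaaaa", "ababab"):
        print(".".join(greedy_parse(text, D3_T4)), compression_ratio(text, D3_T4))
```

For t1 and t3 this gives the parsings a.a.a.a.a.a and ab.ab.ab, with ratios 1.0 and 0.5, matching the example.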
• To enhance the featuring power, let us use a tuple of dictionaries Dm,T = (dm,t1, dm,t2, ..., dm,tn) constructed from a text set T = {t1, t2, ..., tn}. Then we can define an n-dimensional CV of u:
ρ(u, Dm,T) = (ρ(u, dm,t1), ..., ρ(u, dm,tn))
• If we choose T = {t2, t4} and m = 3, we obtain the CVs shown in
Table 1.
• The distances between these vectors represent the similarities between the original texts.
CVs for Example Texts
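The CVs of the five example texts can be recomputed directly from the definitions with T = {t2, t4} and m = 3. The sketch below is an illustration under the definitions given earlier, with hypothetical function names; its output is what those definitions yield and is not copied from Table 1.

```python
def build_dictionary(t, m):
    """d_{m,t}: all substrings of t with length between 1 and m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


def compression_ratio(u, d):
    """Greedy (longest-prefix-first) parse of u by d, divided by the length of u."""
    max_len = max(len(w) for w in d)
    count, i = 0, 0
    while i < len(u):
        for k in range(min(max_len, len(u) - i), 0, -1):
            if u[i:i + k] in d:
                count, i = count + 1, i + k
                break
        else:
            raise ValueError("text cannot be parsed by this dictionary")
    return count / len(u)


def compression_vector(u, texts, m):
    """rho(u, D_{m,T}): one compression ratio per dictionary text in T."""
    return tuple(compression_ratio(u, build_dictionary(t, m)) for t in texts)


EXAMPLES = {"t1": "aaaaaa", "t2": "aabaab", "t3": "ababab",
            "t4": "abbabb", "t5": "bbbbbb"}
T = ["aabaab", "abbabb"]  # T = {t2, t4}

if __name__ == "__main__":
    for name, u in EXAMPLES.items():
        print(name, tuple(round(x, 3) for x in compression_vector(u, T, 3)))
    # t1 (0.5, 1.0)
    # t2 (0.333, 0.667)
    # t3 (0.5, 0.5)
    # t4 (0.667, 0.333)
    # t5 (1.0, 0.5)
```

Each text is compressed best by the dictionary built from the text it most resembles, so t2 and t4 sit at opposite corners of the feature space, with t3 midway between them.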
Mathematical Discussion of CVs
• For CVs to be valid feature vectors of texts, the following minimum requirements must be met:
1. The CV of any text t must be uniquely determined
2. Similarities between texts should be adequately reflected in their CVs
• For (1), we show in Theorem 1 that the use of greedy parsing allows us to map a text to a unique CV in a multidimensional unit cube spanned by Dm,T
• As for (2), we point out in Theorem 2 that the mapping from a text to its CV may be degenerate, i.e., different texts can be mapped to an identical CV
• However, in Theorem 3, we show that we can remedy this
situation by extending Dm,T. We show in Theorems 4 and 5 that similar texts are mapped to similar CVs.
Theorem 1. For any text u and any tuple of dictionaries
Dm,T = (dm,t1, dm,t2, ..., dm,tn),
the compression ratio vector ρ(u, Dm,T) is determined uniquely.
Moreover, ρ(u, Dm,T) ∈ [0, 1]^|T|, where |T| is the cardinality of T and [0, 1]^|T| is a |T|-dimensional unit cube
Proof. The uniqueness of ρ(u, dm,tk) = (1/l(u)) l(gp(u, dm,tk)) follows from the uniqueness of l(u) and l(gp(u, dm,tk)). As the former is
obvious, we show the latter by showing its minimality. Suppose, to the contrary, that some parsing satisfies l(p(u, dm,tk)) < l(gp(u, dm,tk)), then for
dm,tk, contradicting the greediness of wgpi. Therefore, l(gp(u, dm,tk)) should be minimal. The latter part of the theorem follows from the obvious fact 1 ≤ l(gp(u, dm,tk)) ≤ l(u) and the definition of
Theorem 2. u = v ⇒ ρ(u, Dm,T) = ρ(v, Dm,T), but the converse is not always true.
Proof. The first part of the theorem is obvious. The last part is shown by counterexample. Let A = {a, b, 0, 1}, u = ababbb,
v = 010111, T = {ababbb010111, 010111ababbb}, and D2,T =
({a, b, ab, ba, bb, b0, 0, 1, 01, 10, 11}, {0, 1, 01, 10, 11, 1a, a, b, ab, ba, bb}). We then have ρ(u, D2,T) = ρ(v, D2,T) = (0.5, 0.5), but u ≠ v.
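The counterexample can be checked mechanically. The following sketch rebuilds the two dictionaries from T with m = 2 and computes both CVs; the function names are hypothetical helpers, not part of the slides.

```python
def build_dictionary(t, m):
    """d_{m,t}: all substrings of t with length between 1 and m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


def compression_ratio(u, d):
    """Number of words in the greedy parse of u by d, divided by l(u)."""
    max_len = max(len(w) for w in d)
    count, i = 0, 0
    while i < len(u):
        for k in range(min(max_len, len(u) - i), 0, -1):
            if u[i:i + k] in d:
                count, i = count + 1, i + k
                break
        else:
            raise ValueError("text cannot be parsed by this dictionary")
    return count / len(u)


def compression_vector(u, texts, m):
    return tuple(compression_ratio(u, build_dictionary(t, m)) for t in texts)


T = ["ababbb010111", "010111ababbb"]
u, v = "ababbb", "010111"

if __name__ == "__main__":
    print(compression_vector(u, T, 2))  # (0.5, 0.5)
    print(compression_vector(v, T, 2))  # (0.5, 0.5)
    print(u == v)                       # False
```

Both texts parse into three two-character words under either dictionary, so the two distinct texts collapse onto the same point of the feature space.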
Theorem 3. If (ρ(u, Dm,T) = ρ(v, Dm,T)) ∧ u ≠ v, then
$$\exists \hat{m}.\ [\rho(u, D_{\hat{m},\,T \cup \{u,v\}}) \neq \rho(v, D_{\hat{m},\,T \cup \{u,v\}})]$$
Proof. Assume, to the contrary, that the conclusion is false, giving
$$\forall \hat{m}.\ [\rho(u, D_{\hat{m},\,T \cup \{u,v\}}) = \rho(v, D_{\hat{m},\,T \cup \{u,v\}})]$$
Separating the compression operations of T and {u, v}, we get
$$\forall \hat{m}.\ [\rho(u, D_{\hat{m},T}) = \rho(v, D_{\hat{m},T}) \wedge \rho(u, D_{\hat{m},\{u,v\}}) = \rho(v, D_{\hat{m},\{u,v\}})]$$
Using the second term in the case of l(u) > l(v), if we choose
$$\hat{m} = \max(l(u), l(v)) = l(u)$$
we get
$$\rho(u, d_{\hat{m},u}) = \frac{1}{l(u)} < \frac{1}{l(v)} \le \rho(v, d_{\hat{m},u})$$
This contradicts the above formula.
In the case of l(u) = l(v), if we choose m̂ = l(u) = l(v) and use the
above formula, we obtain
Finally, we show that similar texts are mapped to similar CVs.
We first show in Theorem 4 that the CV of a concatenated text uv can be approximated by a weighted sum of CVs of u and v.
Then, using this result, we show that a minor variant of u is mapped to a minor variant of the CV of u.
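Theorem 4.
$$\left\| \rho_{uv} - \left( \frac{l(u)}{l(uv)}\rho_u + \frac{l(v)}{l(uv)}\rho_v \right) \right\| \le \frac{r(T)}{l(uv)} \qquad (1)$$
where ρuv abbreviates ρ(uv, Dm,T), etc., and r(T) is the radius of a unit sphere in |T|-dimensional space, such as r(T) = √|T| (Euclidean distance) or r(T) = |T| (city-block distance).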
Moreover, if l(uvw) is large and l(v) ≪ l(uvw), that is, if v is much shorter than uvw, then we get
l(gp(uv, dm,t)) > l(gp(u, dm,t)) + l(gp(v, dm,t)) + 1
> l(gp(u, dm,t)) + l(gp(v, dm,t))
This means it is possible to obtain a trivial nongreedy parsing
p(uv, dm,t) = gp(u, dm,t).gp(v, dm,t) satisfying
l(p(uv, dm,t)) < l(gp(uv, dm,t)). This contradicts Theorem 1. Second, we prove
$$l(gp(u, d_{m,t})) + l(gp(v, d_{m,t})) - 1 \le l(gp(uv, d_{m,t}))$$
Assume that $gp(uv, d_{m,t}) = w^{gp}_1.w^{gp}_2 \ldots w^{gp}_l$. Then there exists a word $w^{gp}_k$ such that $w^{gp}_k = w^{gp-}_k w^{gp+}_k$ and
$gp(u, d_{m,t}) = w^{gp}_1.w^{gp}_2 \ldots w^{gp-}_k$. We are given a nongreedy parsing
$$p(v, d_{m,t}) = w^{gp+}_k.w^{gp}_{k+1} \ldots w^{gp}_l$$
As l(gp(v, dm,t)) ≤ l(p(v, dm,t)) by Theorem 1, we get
$$l(gp(v, d_{m,t})) \le l(p(v, d_{m,t})) \le l(gp(uv, d_{m,t})) - l(gp(u, d_{m,t})) + 1$$
This implies
$$l(gp(u, d_{m,t})) + l(gp(v, d_{m,t})) - 1 \le l(gp(uv, d_{m,t}))$$
Dividing these two inequalities by l(uv) and using $\frac{1}{l(uv)} = \frac{l(u)}{l(uv)}\frac{1}{l(u)}$ and $\frac{1}{l(uv)} = \frac{l(v)}{l(uv)}\frac{1}{l(v)}$, we obtain (1).
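To prove the latter part, let u′ = uv and use (1) twice. We then get
$$\rho_{uvw} = \rho_{u'w} \approx \frac{l(u')}{l(u'w)}\rho_{u'} + \frac{l(w)}{l(u'w)}\rho_w \approx \frac{l(u')}{l(u'w)}\left(\frac{l(u)}{l(uv)}\rho_u + \frac{l(v)}{l(uv)}\rho_v\right) + \frac{l(w)}{l(u'w)}\rho_w = \frac{l(u)}{l(uvw)}\rho_u + \frac{l(v)}{l(uvw)}\rho_v + \frac{l(w)}{l(uvw)}\rho_w$$
Therefore, when l(v) ≪ l(uvw), we get
$$\rho_{uvw} \approx \frac{l(u)}{l(uvw)}\rho_u + 0 + \frac{l(w)}{l(uvw)}\rho_w$$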
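The concatenation bound of Theorem 4 can be spot-checked numerically for a single dictionary. The sketch below is an illustration rather than a proof: the dictionary text, the random test texts, and the helper names are all assumptions chosen for the example. For each component of the CV, the proof implies that the deviation lies between −1/l(uv) and 0.

```python
import random


def build_dictionary(t, m):
    """d_{m,t}: all substrings of t with length between 1 and m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


def compression_ratio(u, d):
    """Number of words in the greedy parse of u by d, divided by l(u)."""
    max_len = max(len(w) for w in d)
    count, i = 0, 0
    while i < len(u):
        for k in range(min(max_len, len(u) - i), 0, -1):
            if u[i:i + k] in d:
                count, i = count + 1, i + k
                break
        else:
            raise ValueError("text cannot be parsed by this dictionary")
    return count / len(u)


def deviation(u, v, d):
    """rho_uv minus the length-weighted sum of rho_u and rho_v (one CV component)."""
    uv = u + v
    weighted = (len(u) * compression_ratio(u, d) + len(v) * compression_ratio(v, d)) / len(uv)
    return compression_ratio(uv, d) - weighted


def check_bound(trials=1000, seed=0):
    """Return True if -1/l(uv) <= deviation <= 0 holds for every random pair."""
    rng = random.Random(seed)
    d = build_dictionary("aabaababbabb", 3)  # hypothetical dictionary text over {a, b}
    eps = 1e-12                              # tolerance for floating-point rounding
    for _ in range(trials):
        u = "".join(rng.choice("ab") for _ in range(rng.randint(1, 20)))
        v = "".join(rng.choice("ab") for _ in range(rng.randint(1, 20)))
        dev = deviation(u, v, d)
        if not (-1 / len(u + v) - eps <= dev <= eps):
            return False
    return True


if __name__ == "__main__":
    print(check_bound())
```

The check relies on the dictionary containing every single character of the alphabet, so that greedy parsing never fails on the random texts.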
Encoding Media Data into Text
Sequential Pattern. Given a nontext sequence s = s1s2...sn, first segment s to obtain
SEG(s) = v1v2...vl
Replace each segment by a letter to give a text
t = VQ(SEG(s)) = VQ(v1)VQ(v2)...VQ(vl)
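A minimal sketch of the sequential encoding, under several assumptions not stated in the slides: segments are fixed-length, non-overlapping frames; each frame is summarized by its mean value; and VQ assigns the nearest entry of a given one-dimensional codebook. The function name and parameters are hypothetical.

```python
import string


def encode_sequence(s, frame_len, codebook):
    """Turn a numeric sequence into text: SEG (framing), then VQ (nearest codeword)."""
    letters = string.ascii_lowercase
    if len(codebook) > len(letters):
        raise ValueError("this sketch supports at most 26 codewords")
    text = []
    # SEG: cut s into consecutive frames; a trailing partial frame is dropped.
    for start in range(0, len(s) - frame_len + 1, frame_len):
        frame = s[start:start + frame_len]
        feature = sum(frame) / frame_len  # assumed frame feature: the mean
        # VQ: index of the codeword closest to the frame feature.
        index = min(range(len(codebook)), key=lambda k: abs(codebook[k] - feature))
        text.append(letters[index])
    return "".join(text)


if __name__ == "__main__":
    signal = [0, 1, 0, 1, 9, 10, 9, 10, 0, 0, 1, 1]
    print(encode_sequence(signal, frame_len=4, codebook=[0.5, 9.5]))  # aba
```

The resulting text can then be fed to the dictionary-based compression steps like any other text.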
Spatial Pattern. Let P = {pi,j | (i, j) ∈ Ir × Ic} be a color image composed of pixels in Ir rows and Ic columns, where pi,j denotes the RGB vector of pixel (i, j). First, compile P into an undirected
weighted graph G(P) composed of nodes ni,j, edges ei,j,k,l, and edge weights wi,j,k,l.
Here, we define the edge weight as the color difference between the two terminal nodes using an appropriate distance function d(x, y), i.e.,
$$w_{i,j,k,l} = d(p_{i,j}, p_{k,l})$$
Next, obtain the minimum spanning tree MST(G(P)) of G(P). Starting from a node, at the north-west corner, for
example, traverse MST(G(P)) in a light-weight-edge-first manner, outputting a sequence
TRAV(MST(G(P))) = (pi1,j1, dir1,0)(pi2,j2, dir2,1)...(pil,jl, dirl,l−1)
Here, pik,jk and dirk,k−1 denote the color of the current node and the traverse direction from the previous node nik−1,jk−1 to the
current node nik,jk. For example
Finally, we encode TRAV(MST(G(P))) into a text
t = VQ(TRAV(MST(G(P))))
= VQ((pi1,j1, dir1,0))...VQ((pil,jl, dirl,l−1))
Note that this scheme is an extension of Freeman’s chain code [28] in that both color contour shape (part of spatial) information and color (spectral) information on P are encoded simultaneously.
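The spatial encoding can be sketched for a small gray-scale image as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: pixels are 4-connected, the edge weight is the absolute gray-level difference, Prim's algorithm builds the MST, the direction is taken relative to the MST parent, and VQ is a uniform gray-level quantizer combined with the direction into one letter. All names are hypothetical.

```python
import heapq

# Assumption: 4-connected neighbours, keyed by (row offset, column offset).
DIRS = {(-1, 0): "N", (0, 1): "E", (1, 0): "S", (0, -1): "W"}
DIR_INDEX = {None: 0, "N": 1, "E": 2, "S": 3, "W": 4}  # None marks the start node


def mst_children(img):
    """Prim's algorithm from the north-west corner; returns MST child lists."""
    rows, cols = len(img), len(img[0])
    children = {(r, c): [] for r in range(rows) for c in range(cols)}
    visited = {(0, 0)}
    heap = []

    def push_edges(r, c):
        for dr, dc in DIRS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                weight = abs(img[r][c] - img[nr][nc])  # gray-level difference
                heapq.heappush(heap, (weight, (r, c), (nr, nc)))

    push_edges(0, 0)
    while heap:
        weight, parent, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        children[parent].append((weight, node))
        push_edges(*node)
    return children


def traverse(img, levels=2, max_val=255):
    """Depth-first walk of the MST, lightest edge first; yields (gray, direction)."""
    children = mst_children(img)
    out, stack = [], [((0, 0), None)]
    while stack:
        (r, c), direction = stack.pop()
        gray = min(img[r][c] * levels // (max_val + 1), levels - 1)
        out.append((gray, direction))
        # Push the heaviest edge first so the lightest one is popped next.
        for _, (nr, nc) in sorted(children[(r, c)], reverse=True):
            stack.append(((nr, nc), DIRS[(nr - r, nc - c)]))
    return out


def encode_image(img, levels=2, max_val=255):
    """Map each (gray, direction) pair to one letter; needs levels * 5 <= 26."""
    return "".join(
        chr(ord("A") + gray * len(DIR_INDEX) + DIR_INDEX[direction])
        for gray, direction in traverse(img, levels, max_val)
    )


if __name__ == "__main__":
    print(encode_image([[0, 0], [255, 255]]))  # ACIH
```

In the 2 × 2 example the walk first follows the zero-weight edge east along the dark top row, then crosses the heavy edge south and follows the bright bottom row east, so regions of similar color end up as runs of similar letters.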
• Encoding into Text. Input media data is encoded into text using the method described above.
• Text Compression. The buffer-type dictionary approach is
adopted in some of the LZ-type text compression algorithms [26], [27].
• Selection of a Dictionary Set. First, prepare a small text set Ts to get Dm,Ts. Perform cluster analysis on the output vectors {ρ(t, Dm,Ts) | t ∈ Tl}. Set the value |T| equal to the number of
Dm,T, we choose a set of training texts Tc and prepare a set of teaching data {(v, atr(v)) | v ∈ Tc}, where atr(v) is the manually prepared attribute of v. We use Dm,T to compress the texts v ∈ Tc to obtain a case database CDB = {(ρ(v, Dm,T), atr(v)) | v ∈ Tc}. In categorizing a set of texts Td, we calculate the respective
CVs = {ρ(v, Dm,T) | v ∈ Td}, on which we perform a cluster analysis [10].
• Recognition. In recognition tasks, the element (ρ(v∗, Dm,T), atr(v∗)) ∈ CDB nearest to the incoming ρ(u, Dm,T) is selected and the corresponding atr(v∗) is output as the
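The recognition step amounts to a nearest-neighbour lookup in the case database. A minimal sketch, assuming Euclidean distance between CVs and a CDB stored as a list of (CV, attribute) pairs; the attribute labels and the function name are hypothetical.

```python
import math


def recognize(cv, cdb):
    """Return the attribute of the CDB case whose CV is nearest to cv."""
    _, attribute = min(cdb, key=lambda case: math.dist(cv, case[0]))
    return attribute


# Hypothetical case database: CVs of t2 and t4 from the toy example
# (T = {t2, t4}, m = 3), each with a made-up attribute label.
CDB = [
    ((2 / 6, 4 / 6), "class-A"),
    ((4 / 6, 2 / 6), "class-B"),
]

if __name__ == "__main__":
    print(recognize((0.5, 1.0), CDB))  # CV of t1 = aaaaaa -> class-A
    print(recognize((1.0, 0.5), CDB))  # CV of t5 = bbbbbb -> class-B
```

Any other metric could be substituted for `math.dist`; the slides mention both Euclidean and city-block distance when discussing the feature space.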
Feature Representability of CV
Computer Programs. The header and source parts of C programs
are gathered into two files, H.txt and C.txt, both of which contain approximately 2,300 characters. These files are concatenated to form HH.txt, HC.txt, CH.txt, and CC.txt
The resolution is quite sharp and six CVs can be clearly categorized into three groups: {H.txt, HH.txt}, {HC.txt, CH.txt}, and {C.txt, CC.txt}. The dictionaries {H.txt, HH.txt} compress {H.txt, HH.txt} and {HC.txt, CH.txt} well, but, as expected, this is not the case for {C.txt, CC.txt}.
Human Voice. Two 30-second self-introduction speeches are
recorded for five students. Each file is divided into frames, and each frame is encoded into one of 26 codes. Several frame lengths were tested and a frame length of 25-ms was found to provide the clearest features
The resolution is not as sharp, except for {K1.wav, K2.wav}. However, it is possible to categorize the data into three groups:
{K1.wav, K2.wav}, {S1.wav, S2.wav, P1.wav, P2.wav}, and {T1.wav,
Gray-scale Image. Eight areas of 50 × 50 pixels are selected from Fig. 2d to generate files in bmp format. These subimages are
encoded into one of 64 codes, (8 MST directions) × (8 grayscales), using the proposed spatial pattern coding method.
The resolution is sharp, with four distinct groups: {3C.bmp,
4C.bmp}, {4A.bmp, 2B.bmp, 5B.bmp}, {4B.bmp, 3B.bmp}, and
{1B.bmp}. This is in good agreement with our visual impression for
the original image. Based on this categorization, 3B.bmp as an
unknown input will be recognizable as being similar to 4B.bmp, in accordance with our intuition.
Applications
Conclusion
• We have proposed a new pattern representation scheme called PRDC, by which input data is converted into a text and then compressed using a set of dictionaries. PRDC can realize
attractive properties for media analysis: generality, facility for both categorization (class formation) and recognition
(classification), ability to cope with indefinitely varying media data, and easy implementability.
• We have presented a mathematical proof of the realizability of a feature space of CVs and demonstrated the usefulness of PRDC
Future Work
• Future investigations include the application of PRDC to more specific and sophisticated media analysis tasks and a
performance comparison with other methods.
• We anticipate that combinations of PRDC with high-level
methods will be effective. In addition, we intend to examine a variant of PRDC that uses media-specific compressors rather than universal text compressors. We anticipate that this variant