A new pattern representation scheme using data compression

Academic year: 2021


(1)

A New Pattern Representation Scheme Using Data Compression

Presented by Chen-hsiu Huang

(2)

Media Data Processing

• How to deal with tremendous variety of media:

– Categorization resolves the latent global information structure contained in a set of unknown data

– Recognition provides the means of correctly identifying

(3)

• We believe that a new media analysis scheme should provide the following:

1. Generality, i.e., applicability to media data of any type
2. Facility for both categorization and recognition
3. The ability to cope with indefinitely varying media data (data that is difficult to represent by a finite set of well-defined models)
4. Easy implementation and low processing cost

(4)

VQ: Vector Quantization

• VQ is applicable to a wide range of media data and is very easy to implement.

• Categorization and recognition are performed by partitioning a given feature space into several classes and assigning an unknown vector to an appropriate class

• Due to the lack of general mapping schemes, VQ has been limited to rather low-level analysis tasks for which intrinsic feature vectors are available

(5)

• Frequency domain methods using features such as Fourier, DCT, or wavelet coefficients have wide applicability but require real-valued inputs. This requirement limits how well such methods apply to symbolic data such as text

• NN and related algorithms provide a completely different general scheme. Such systems also cope well with indefinitely varying sources, yet are only applicable to recognition and require much training even for small tasks with a corresponding high

(6)

The PRDC System

• In PRDC, input data is converted into text and compressed using a set of encoding dictionaries; this generates a compression ratio vector (CV) as a feature of the original input

• The CV is then used as a feature vector in traditional VQ. Although some of the original information is lost in the text generation, we can still exploit the attractive properties of VQ by delimiting its scope

(7)

• The realization of PRDC depends on the ability to convert various media data into text and construct a CV feature space

• Methods for evaluating the complexity or randomness of finite sequences have been studied extensively, providing the foundation for a number of data compression techniques

• However, the use of a CV as a general pattern feature, the core of PRDC, is a new concept

(8)

Media Data

[Figure: media data is converted into text by media-specific encoders (Encoder1, Encoder2, Encoder3); text compressors then produce the compression feature.]

(9)

• Let $A = \{a_i \mid 0 \le i \le n-1\}$ be an alphabet composed of $n$ characters. A text $t$ is a finite sequence over $A$. Let $l(t)$ be its length

• For example, if $A = \{a, b\}$, then $t_1 = aaaaaa$, $t_2 = aabaab$, $t_3 = ababab$, $t_4 = abbabb$, and $t_5 = bbbbbb$ are texts, and $l(t_i) = 6$ for all $i$

(10)

• Call the substrings of $t$ words, and define a dictionary $d_{m,t}$ as the set of words $w$ in $t$ with $l(w) \le m$. For example, $d_{3,t_2} = \{a, b, aa, ab, ba, aab, aba, baa\}$ and $d_{3,t_4} = \{a, b, ab, ba, bb, abb, bab, bba\}$

• A parsing of $u$ by $d_{m,t}$, denoted by $p(u, d_{m,t})$, is a successive partitioning of $u$ into words of $d_{m,t}$

If $d_{m,t}$ contains a significant amount of information about $u$, the parsed word count $l(p(u, d_{m,t}))$ is small. Note that $p(u, d_{m,t})$ is not unique
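The dictionary definition can be made concrete with a minimal Python sketch. The function name `build_dictionary` is hypothetical and chosen only for illustration; it enumerates every substring of a text whose length is at most `m`.

```python
def build_dictionary(t, m):
    """Return d_{m,t}: the set of all substrings (words) of t with length <= m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


d3_t2 = build_dictionary("aabaab", 3)
d3_t4 = build_dictionary("abbabb", 3)

# Print shortest words first, then alphabetically.
print(sorted(d3_t2, key=lambda w: (len(w), w)))
# ['a', 'b', 'aa', 'ab', 'ba', 'aab', 'aba', 'baa']
```

Both sets reproduce the example dictionaries listed on the slide.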

(11)

• In order to ensure the uniqueness of parsing, we introduce the concept of greedy parsing $gp(u, d_{m,t})$

• This is defined as the recursive parsing of a text $u$ by taking the longest prefix $lpf(u, d_{m,t})$ of $u$ in $d_{m,t}$ followed by the greedy parsing of the remaining part $rest(u)$, as defined by the following function ($\phi$ denotes null text)

$$gp(u, d_{m,t}) = \begin{cases} \phi & \text{if } u = \phi \\ lpf(u, d_{m,t}).gp(rest(u), d_{m,t}) & \text{otherwise} \end{cases}$$

(12)

• For example, $gp(t_1, d_{3,t_4}) = a.a.a.a.a.a$, $l(gp(t_1, d_{3,t_4})) = 6$ and $gp(t_3, d_{3,t_4}) = ab.ab.ab$, $l(gp(t_3, d_{3,t_4})) = 3$. The uniqueness of $gp(u, d_{m,t})$ is proven in Theorem 1.

• Now, we can define the compression ratio $\rho(u, d_{m,t})$ of $u$ by $d_{m,t}$

$$\rho(u, d_{m,t}) = \frac{l(gp(u, d_{m,t}))}{l(u)}$$

• Using the above example, we get $\rho(t_1, d_{3,t_4}) = 6/6 = 1.0$ and $\rho(t_3, d_{3,t_4}) = 3/6 = 0.5$
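Greedy parsing and the compression ratio can be sketched in a few lines of Python. The names `greedy_parse` and `ratio` are hypothetical. One point is an assumption not covered by the slides: a character that does not occur in the dictionary at all is emitted as a one-letter word so that parsing always terminates.

```python
def build_dictionary(t, m):
    """d_{m,t}: all substrings of t with length at most m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


def greedy_parse(u, d, m):
    """gp(u, d): repeatedly strip the longest prefix of u that is a word of d."""
    words = []
    while u:
        for k in range(min(m, len(u)), 0, -1):
            if u[:k] in d:
                break
        else:
            k = 1  # assumption: an unknown character becomes a one-letter word
        words.append(u[:k])
        u = u[k:]
    return words


def ratio(u, d, m):
    """Compression ratio rho(u, d) = l(gp(u, d)) / l(u) for a non-empty text u."""
    return len(greedy_parse(u, d, m)) / len(u)


d3_t4 = build_dictionary("abbabb", 3)
print(".".join(greedy_parse("aaaaaa", d3_t4, 3)), ratio("aaaaaa", d3_t4, 3))  # a.a.a.a.a.a 1.0
print(".".join(greedy_parse("ababab", d3_t4, 3)), ratio("ababab", d3_t4, 3))  # ab.ab.ab 0.5
```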

(13)

• In order to enhance the featuring power, let us use a tuple of dictionaries $D_{m,T} = (d_{m,t_1}, d_{m,t_2}, \ldots, d_{m,t_n})$ constructed from a text set $T = \{t_1, t_2, \ldots, t_n\}$. Then we can define an $n$-dimensional CV of $u$.

$$\rho(u, D_{m,T}) = (\rho(u, d_{m,t_1}), \ldots, \rho(u, d_{m,t_n}))$$

• If we choose $T = \{t_2, t_4\}$ and $m = 3$, we obtain the CVs shown in Table 1.

The distances between these vectors represent similarities between the original texts.
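As an illustration, the following Python sketch computes CVs for the five example texts with $T = \{t_2, t_4\}$ and $m = 3$. The helper names are hypothetical, and treating a character missing from a dictionary as a one-letter word is an assumption of the sketch. The printed values are simply what this code produces; they are not copied from Table 1.

```python
def build_dictionary(t, m):
    """d_{m,t}: all substrings of t with length at most m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


def ratio(u, d, m):
    """rho(u, d): number of greedily parsed words divided by the length of u."""
    count, pos = 0, 0
    while pos < len(u):
        step = 1  # assumption: an unknown character counts as a one-letter word
        for k in range(min(m, len(u) - pos), 0, -1):
            if u[pos:pos + k] in d:
                step = k
                break
        pos += step
        count += 1
    return count / len(u)


def compression_vector(u, texts, m):
    """rho(u, D_{m,T}): one compression ratio per dictionary text in T."""
    return tuple(ratio(u, build_dictionary(t, m), m) for t in texts)


examples = {"t1": "aaaaaa", "t2": "aabaab", "t3": "ababab",
            "t4": "abbabb", "t5": "bbbbbb"}
T = [examples["t2"], examples["t4"]]
for name, text in examples.items():
    print(name, [round(x, 3) for x in compression_vector(text, T, 3)])
```

Texts built mostly from `a` end up close to each other in the first coordinate, which is the behaviour the slide describes as distance reflecting similarity.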

(14)

CVs for Example Texts

(15)

Mathematical Discussion of CVs

• For CVs to be valid feature vectors of texts, the following minimum requirements must be met:

1. The CV of any text t must be uniquely determinable
2. Similarities between texts should be adequately reflected in

(16)

• For (1), we show in Theorem 1 that the use of greedy parsing allows us to map a text to a unique CV in a multidimensional unit cube spanned by $D_{m,T}$

• As for (2), we point out in Theorem 2 that the mapping from a text to its CV may be degenerate, i.e., different texts can be mapped to an identical CV

• However, in Theorem 3, we show that we can remedy this situation by extending $D_{m,T}$. We show in Theorems 4 and 5 that similar texts are mapped to similar CVs.

(17)

$D_{m,T} = (d_{m,t_1}, d_{m,t_2}, \ldots, d_{m,t_n})$, the compression ratio vector $\rho(u, D_{m,T})$ is determined uniquely.

Moreover, $\rho(u, D_{m,T}) \in [0, 1]^{|T|}$, where $|T|$ is the cardinality of $T$ and $[0, 1]^{|T|}$ is a $|T|$-dimensional unit cube

Proof. The uniqueness of $\rho(u, d_{m,t_k}) = (1/l(u))\, l(gp(u, d_{m,t_k}))$ follows from the uniqueness of $l(u)$ and $l(gp(u, d_{m,t_k}))$. As the former is obvious, we show the latter by showing its minimality. Suppose to the contrary that some parsing satisfies $l(p(u, d_{m,t_k})) < l(gp(u, d_{m,t_k}))$, then for

(18)

$d_{m,t_k}$, contradicting the greediness of $w^{gp}_i$. Therefore, $l(gp(u, d_{m,t_k}))$ should be minimal. The latter part of the theorem follows from the obvious fact $1 \le l(gp(u, d_{m,t_k})) \le l(u)$ and the definition of

(19)

$u = v \Rightarrow \rho(u, D_{m,T}) = \rho(v, D_{m,T})$, but the converse is not always true.

Proof. The first part of the theorem is obvious. The last part is shown by counterexample. Let $A = \{a, b, 0, 1\}$, $u = ababbb$, $v = 010111$, $T = \{ababbb010111, 010111ababbb\}$, and

$D_{2,T} = (\{a, b, ab, ba, bb, b0, 0, 1, 01, 10, 11\}, \{0, 1, 01, 10, 11, 1a, a, b, ab, ba, bb\})$

We then have $\rho(u, D_{2,T}) = \rho(v, D_{2,T}) = (0.5, 0.5)$, but $u \ne v$.
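The counterexample can be replayed with a short Python sketch (helper names are hypothetical). It rebuilds both dictionaries of $D_{2,T}$ from the two texts in $T$ and confirms that the two distinct texts $u$ and $v$ receive the same CV.

```python
def build_dictionary(t, m):
    """d_{m,t}: all substrings of t with length at most m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


def ratio(u, d, m):
    """rho(u, d): number of greedily parsed words divided by the length of u."""
    count, pos = 0, 0
    while pos < len(u):
        step = 1  # assumption: an unknown character counts as a one-letter word
        for k in range(min(m, len(u) - pos), 0, -1):
            if u[pos:pos + k] in d:
                step = k
                break
        pos += step
        count += 1
    return count / len(u)


def compression_vector(u, texts, m):
    """rho(u, D_{m,T}) as a tuple, one entry per text in T."""
    return tuple(ratio(u, build_dictionary(t, m), m) for t in texts)


u, v = "ababbb", "010111"
T = ["ababbb010111", "010111ababbb"]
print(compression_vector(u, T, 2))  # (0.5, 0.5)
print(compression_vector(v, T, 2))  # (0.5, 0.5)
```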

(20)

$(\rho(u, D_{m,T}) = \rho(v, D_{m,T})) \wedge u \ne v$ then

$\exists \hat{m}.[\rho(u, D_{\hat{m},T \cup \{u,v\}}) \ne \rho(v, D_{\hat{m},T \cup \{u,v\}})]$

Proof. Suppose to the contrary that the conclusion is false, so that

$\forall \hat{m}.[\rho(u, D_{\hat{m},T \cup \{u,v\}}) = \rho(v, D_{\hat{m},T \cup \{u,v\}})]$

Separating the compression operations of $T$ and $\{u, v\}$, we get

$\forall \hat{m}.[\rho(u, D_{\hat{m},T}) = \rho(v, D_{\hat{m},T}) \wedge \rho(u, D_{\hat{m},\{u,v\}}) = \rho(v, D_{\hat{m},\{u,v\}})]$

Using the second term, we get

(21)

$\hat{m} = \max(l(u), l(v)) = l(u)$

we get

$\rho(u, d_{\hat{m},u}) = 1/l(u) < 1/l(v) \le \rho(v, d_{\hat{m},u})$

This contradicts the above formula.

In the case of $l(u) = l(v)$, if we choose $\hat{m} = l(u) = l(v)$ and use the above formula, we obtain

(22)

Finally, we show that similar texts are mapped to similar CVs.

We first show in Theorem 4 that the CV of a concatenated text uv can be approximated by a weighted sum of CVs of u and v.

Then, using this result, we show that a minor variant of u is mapped to a minor variant of the CV of u.

(23)

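$$\left\| \rho_{uv} - \left( \frac{l(u)}{l(uv)}\rho_u + \frac{l(v)}{l(uv)}\rho_v \right) \right\| \le \frac{r(T)}{l(uv)}$$

where $\rho_{uv}$ abbreviates $\rho(uv, D_{m,T})$, etc., and $r(T)$ is the radius of a unit sphere in $|T|$-dimensional space such as $r(T) = \sqrt{|T|}$ (Euclidean distance) or $r(T) = |T|$ (City distance).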
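The component-wise fact behind this bound is that, for a single dictionary, $l(gp(u)) + l(gp(v)) - 1 \le l(gp(uv)) \le l(gp(u)) + l(gp(v))$, so each CV component of $uv$ differs from the weighted sum by at most $1/l(uv)$. The Python sketch below checks these inequalities exhaustively for short texts over $\{a, b\}$. The function names, the choice of dictionary, and the length limit are all assumptions made for illustration; the check is a sanity test, not a proof.

```python
from itertools import product


def build_dictionary(t, m):
    """d_{m,t}: all substrings of t with length at most m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


def parse_len(u, d, m):
    """l(gp(u, d)): number of words produced by greedy parsing."""
    count, pos = 0, 0
    while pos < len(u):
        step = 1  # assumption: an unknown character counts as a one-letter word
        for k in range(min(m, len(u) - pos), 0, -1):
            if u[pos:pos + k] in d:
                step = k
                break
        pos += step
        count += 1
    return count


def check_concatenation_bound(d, m, alphabet="ab", max_len=4):
    """True if the word-count inequalities hold for every pair of short texts."""
    texts = ["".join(p) for n in range(1, max_len + 1)
             for p in product(alphabet, repeat=n)]
    for u in texts:
        for v in texts:
            lu, lv = parse_len(u, d, m), parse_len(v, d, m)
            luv = parse_len(u + v, d, m)
            if not (lu + lv - 1 <= luv <= lu + lv):
                return False
    return True


print(check_concatenation_bound(build_dictionary("aabaab", 3), 3))
```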

Moreover, if $l(uvw)$ is large and $l(v) \ll l(uvw)$, that is, if $v$ is much shorter than $uvw$, then we get



(24)

$l(gp(uv, d_{m,t})) > l(gp(u, d_{m,t})) + l(gp(v, d_{m,t})) + 1 > l(gp(u, d_{m,t})) + l(gp(v, d_{m,t}))$

This means it is possible to obtain a trivial nongreedy parsing $p(uv, d_{m,t}) = gp(u, d_{m,t}).gp(v, d_{m,t})$ satisfying $l(p(uv, d_{m,t})) < l(gp(uv, d_{m,t}))$. This contradicts Theorem 1. Second, we prove

$l(gp(u, d_{m,t})) + l(gp(v, d_{m,t})) - 1 \le l(gp(uv, d_{m,t}))$

Assuming that $gp(uv, d_{m,t}) = w^{gp}_1.w^{gp}_2 \ldots w^{gp}_l$, then there exists a word $w^{gp}_k$ such that $w^{gp}_k = w^{gp-}_k w^{gp+}_k$ and $gp(u, d_{m,t}) = w^{gp}_1.w^{gp}_2 \ldots w^{gp-}_k$. We are given a nongreedy parsing

(25)

As $l(gp(v, d_{m,t})) \le l(p(v, d_{m,t}))$ by Theorem 1, we get

$l(gp(v, d_{m,t})) \le l(p(v, d_{m,t})) = l(gp(uv, d_{m,t})) - l(gp(u, d_{m,t})) + 1$

This implies

$l(gp(u, d_{m,t})) + l(gp(v, d_{m,t})) - 1 \le l(gp(uv, d_{m,t}))$

Dividing these two inequalities by $l(uv)$ and using $\frac{1}{l(uv)} = \frac{l(u)}{l(uv)} \cdot \frac{1}{l(u)}$ and $\frac{1}{l(uv)} = \frac{l(v)}{l(uv)} \cdot \frac{1}{l(v)}$, we obtain

(26)

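To prove the latter part, let $u' = uv$ and use (1) twice. We then get

$$\rho_{uvw} = \rho_{u'w} \approx \frac{l(u')}{l(u'w)}\rho_{u'} + \frac{l(w)}{l(u'w)}\rho_w \approx \frac{l(u')}{l(u'w)}\left(\frac{l(u)}{l(uv)}\rho_u + \frac{l(v)}{l(uv)}\rho_v\right) + \frac{l(w)}{l(u'w)}\rho_w = \frac{l(u)}{l(uvw)}\rho_u + \frac{l(v)}{l(uvw)}\rho_v + \frac{l(w)}{l(uvw)}\rho_w$$

Therefore, when $l(v) \ll l(uvw)$, we get

$$\rho_{uvw} \approx \frac{l(u)}{l(uvw)}\rho_u + 0 + \frac{l(w)}{l(uvw)}\rho_w$$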

(27)

Encoding Media Data into Text

Sequential Pattern. Given a nontext sequence $s = s_1 s_2 \ldots s_n$, first segment $s$ to obtain

$SEG(s) = v_1 v_2 \ldots v_l$

Replace each segment by a letter to give a text

$t = VQ(SEG(s)) = VQ(v_1)VQ(v_2) \ldots VQ(v_l)$
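A minimal Python sketch of this sequence-to-text step follows. Everything concrete in it is an assumption made for illustration: fixed-length frames as the segmentation, the frame mean as the quantized feature, a scalar codebook of at most 26 entries, and the hypothetical function names. The slides do not specify how $SEG$ or $VQ$ are realized.

```python
import string


def segment(s, frame_len):
    """SEG(s): consecutive fixed-length frames; an incomplete tail is dropped."""
    return [s[i:i + frame_len] for i in range(0, len(s) - frame_len + 1, frame_len)]


def vq(frame, codebook):
    """VQ(v): letter of the codebook entry nearest to the frame mean."""
    mean = sum(frame) / len(frame)
    idx = min(range(len(codebook)), key=lambda k: abs(codebook[k] - mean))
    return string.ascii_lowercase[idx]


def encode_sequence(s, frame_len, codebook):
    """t = VQ(SEG(s)): one letter per segment."""
    return "".join(vq(v, codebook) for v in segment(s, frame_len))


samples = [0.1, 0.0, 0.9, 1.0, 0.5, 0.4, 0.1]
print(encode_sequence(samples, frame_len=2, codebook=[0.0, 0.5, 1.0]))  # acb
```

The resulting string can be fed directly to a text compressor, which is all the later stages require.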

(28)

Spatial Pattern. Let $P = \{p_{i,j} \mid (i, j) \in I_r \times I_c\}$ be a color image composed of pixels of $I_r$ rows and $I_c$ columns, where $p_{i,j}$ denotes the RGB-vector of a pixel $(i, j)$. First, compile $P$ into a nondirected weighted graph $G(P)$ composed of nodes $n_{i,j}$, edges $e_{i,j,k,l}$, and edge weights $w_{i,j,k,l}$.

Here, we define the edge weight as the color difference between two terminal nodes using an appropriate distance function $d(x, y)$, i.e.,

(29)

of $G(P)$. Starting from a node, at the north-west corner, for example, traverse $MST(G(P))$ in a light-weight-edge-first manner, outputting a sequence

$TRAV(MST(G(P))) = (p_{i_1,j_1}, dir_{1,0})(p_{i_2,j_2}, dir_{2,1}) \ldots (p_{i_l,j_l}, dir_{l,l-1})$

Here, $p_{i_k,j_k}$ and $dir_{k,k-1}$ denote the color of the current node and the traverse direction from the previous node $n_{i_{k-1},j_{k-1}}$ to the current node $n_{i_k,j_k}$. For example

(30)

Finally, we encode $TRAV(MST(G(P)))$ into a text

$t = VQ(TRAV(MST(G(P)))) = VQ((p_{i_1,j_1}, dir_{1,0})) \ldots VQ((p_{i_l,j_l}, dir_{l,l-1}))$

Note that this scheme is an extension of Freeman's chain code [28] in that both color contour shape (part of spatial) information and color (spectral) information on $P$ are encoded simultaneously.
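The spatial encoding can be sketched in Python for a small gray-scale image. Several choices here are assumptions rather than details from the slides: an 8-neighbour pixel graph, absolute gray difference as the edge weight, Prim's algorithm for the spanning tree, a depth-first traversal that follows the lightest child edge first, direction code 0 for the root, and a code layout of (gray level) × 8 + (direction). All function names are hypothetical.

```python
import heapq

# 8-neighbourhood offsets; the index of an offset is the direction code.
DIRS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def mst_children(img):
    """Prim's algorithm from the north-west pixel; returns node -> child edges."""
    rows, cols = len(img), len(img[0])
    children = {(0, 0): []}  # each entry: list of (weight, direction, child)
    heap = []

    def push_edges(node):
        r, c = node
        for d, (dr, dc) in enumerate(DIRS):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in children:
                weight = abs(img[r][c] - img[nr][nc])
                heapq.heappush(heap, (weight, d, node, (nr, nc)))

    push_edges((0, 0))
    while heap:
        weight, d, parent, child = heapq.heappop(heap)
        if child in children:
            continue  # already connected through a lighter edge
        children[child] = []
        children[parent].append((weight, d, child))
        push_edges(child)
    return children


def encode_image(img, levels=8, max_gray=256):
    """Traverse the MST lightest-edge-first and emit one code per pixel."""
    tree = mst_children(img)
    codes, stack = [], [(0, (0, 0))]  # root direction 0 is a convention
    while stack:
        d, (r, c) = stack.pop()
        gray = img[r][c] * levels // max_gray
        codes.append(gray * len(DIRS) + d)
        # Push heaviest edges first so the lightest edge is popped next.
        for _, child_dir, child in sorted(tree[(r, c)], reverse=True):
            stack.append((child_dir, child))
    return codes


image = [[0, 0], [255, 255]]
codes = encode_image(image)
text = "".join(chr(48 + code) for code in codes)  # 64 codes mapped to '0'..'o'
print(codes, text)  # [0, 2, 59, 62] 02kn
```

With 8 directions and 8 gray levels the sketch yields 64 distinct codes, the same alphabet size used in the gray-scale experiment reported later in the slides.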

(31)
(32)

• Encoding into Text. Input media data is encoded into text using the method described above.

• Text Compression. The buffer-type dictionary approach is adopted in some of the LZ-type text compression algorithms [26], [27].

• Selection of a Dictionary Set. First, prepare a small text set $T_s$ to get $D_{m,T_s}$. Perform cluster analysis on the output vectors $\{\rho(t, D_{m,T_s}) \mid t \in T_l\}$. Set the value $|T|$ equal to the number of

(33)

$D_{m,T}$, we choose a set of training texts $T_c$ and prepare a set of teaching data $\{(v, atr(v)) \mid v \in T_c\}$, where $atr(v)$ is the manually prepared attribute of $v$. We use $D_{m,T}$ to compress texts $v \in T_c$ to obtain a case database $CDB = \{(\rho(v, D_{m,T}), atr(v)) \mid v \in T_c\}$. In categorizing a set of texts $T_d$, we calculate the respective $CVs = \{\rho(v, D_{m,T}) \mid v \in T_d\}$, on which we perform a cluster analysis [10].

• Recognition. In recognition tasks, the nearest element $(\rho(v^*, D_{m,T}), atr(v^*)) \in CDB$ of the incoming $\rho(u, D_{m,T})$ is selected and the corresponding $atr(v^*)$ is output as the
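The recognition step amounts to a nearest-neighbour lookup in the case database. The Python sketch below illustrates it on toy texts. The labels, the training texts, the Euclidean distance, and all function names are assumptions for illustration only; the slides do not fix a particular distance or data set.

```python
import math


def build_dictionary(t, m):
    """d_{m,t}: all substrings of t with length at most m."""
    return {t[i:i + k] for k in range(1, m + 1) for i in range(len(t) - k + 1)}


def ratio(u, d, m):
    """rho(u, d): number of greedily parsed words divided by the length of u."""
    count, pos = 0, 0
    while pos < len(u):
        step = 1  # assumption: an unknown character counts as a one-letter word
        for k in range(min(m, len(u) - pos), 0, -1):
            if u[pos:pos + k] in d:
                step = k
                break
        pos += step
        count += 1
    return count / len(u)


def compression_vector(u, dictionaries, m):
    """rho(u, D_{m,T}) for a prepared tuple of dictionaries."""
    return tuple(ratio(u, d, m) for d in dictionaries)


def build_cdb(teaching_data, dictionaries, m):
    """CDB: list of (CV of training text, attribute) pairs."""
    return [(compression_vector(v, dictionaries, m), atr) for v, atr in teaching_data]


def recognize(u, cdb, dictionaries, m):
    """Return the attribute of the case whose CV is nearest to the CV of u."""
    cv = compression_vector(u, dictionaries, m)
    return min(cdb, key=lambda case: math.dist(case[0], cv))[1]


M = 3
D = [build_dictionary(t, M) for t in ("aabaab", "abbabb")]
CDB = build_cdb([("aaaaaa", "a-rich"), ("bbbbbb", "b-rich")], D, M)
print(recognize("aaabaa", CDB, D, M))  # a-rich
```

Categorization would reuse `compression_vector` unchanged and pass the resulting CVs to a clustering routine instead of a nearest-neighbour lookup.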

(34)

Feature Representability of CV

Computer Programs. The header and source parts of C programs are gathered into two files, H.txt and C.txt, both of which contain approximately 2,300 characters. These files are concatenated to form HH.txt, HC.txt, CH.txt, and CC.txt.

The resolution is quite sharp and six CVs can be clearly categorized into three groups: {H.txt, HH.txt}, {HC.txt, CH.txt}, and {C.txt, CC.txt}. The dictionaries {H.txt, HH.txt} compress {H.txt, HH.txt} and {HC.txt, CH.txt} well, but, as expected, this is not the case for {C.txt, CC.txt}.

(35)
(36)

Human Voice. Two 30-second self-introduction speeches are recorded for five students. Each file is divided into frames, and each frame is encoded into one of 26 codes. Several frame lengths were tested and a frame length of 25 ms was found to provide the clearest features.

The resolution is not as sharp, except for {K1.wav, K2.wav}. However, it is possible to categorize the data into three groups:

{K1.wav, K2.wav}, {S1.wav, S2.wav, P1.wav, P2.wav}, and {T1.wav,

(37)
(38)

Gray-scale Image. Eight areas of 50 × 50 pixels are selected from Fig. 2d to generate files in bmp format. These subimages are encoded into one of 64 codes, (8 MST directions) × (8 grayscales), using the proposed spatial pattern coding method.

The resolution is sharp, with four distinct groups: {3C.bmp, 4C.bmp}, {4A.bmp, 2B.bmp, 5B.bmp}, {4B.bmp, 3B.bmp}, and {1B.bmp}. This is in good agreement with our visual impression of the original image. Based on this categorization, 3B.bmp as an unknown input will be recognizable as being similar to 4B.bmp, in accordance with our intuition.

(39)
(40)

Applications

(41)
(42)
(43)
(44)

Conclusion

• We have proposed a new pattern representation scheme called PRDC, by which input data is converted into a text and then compressed using a set of dictionaries. PRDC can realize attractive properties for media analysis: generality, facility for both categorization (class formation) and recognition (classification), ability to cope with indefinitely varying media data, and easy implementability.

• We have presented a mathematical proof of the realizability of a feature space of CVs and demonstrated the usefulness of PRDC

(45)

Future Work

• Future investigations include the application of PRDC to more specific and sophisticated media analysis tasks and a performance comparison with other methods.

• We anticipate that combinations of PRDC with high-level methods will be effective. In addition, we intend to examine a variant of PRDC that uses media-specific compressors rather than universal text compressors. We anticipate that this variant
