
Academic year: 2021


Chapter 3 Bit Sequence Representation

In our approach, the information in the data sequence is stored in the form of bit sequences. With this representation, the frequencies of candidate patterns can be checked more efficiently during the mining process.

In this chapter, Section 3.1 introduces the appearing bit sequence and the bit index table. Sections 3.2 and 3.3 describe how to use the appearing bit sequences of patterns to quickly compute the "frequency" of candidate patterns with fault tolerance.

3.1 Appearing Bit Sequences

For each kind of data item N in the data sequence, N has a corresponding appearing bit sequence (denoted as Appear_N). The length of each appearing bit sequence equals the length of the data sequence. The leftmost bit is numbered as bit 1, and the numbering increases toward the rightmost bit. If data item N appears on the ith position of the data sequence, bit i in the appearing bit sequence of N is set to 1; otherwise, it is set to 0. A bit index table is used to store the appearing bit sequences of all the data items in the data sequence. Take DSeq = ABCDABCACDEEABCCDEAC as an example. Its corresponding bit index table is shown in Table 3.1.


Table 3.1: The bit index table of "ABCDABCACDEEABCCDEAC"

| Data Item | Appearing Bit Sequence (Appear_N) |
|---|---|
| A | 10001001000010000010 |
| B | 01000100000001000000 |
| C | 00100010100000110001 |
| D | 00010000010000001000 |
| E | 00000000001100000100 |

From the bit index table, the number of bits with value 1 in Appear_N for a data item N equals freq(N) in DSeq. Therefore, the frequency of a data item can be obtained without scanning the data sequence repeatedly. The idea also applies to longer patterns of a data sequence, as the following example shows.
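The following is a minimal sketch of Table 3.1, not the author's implementation.

- Each appearing bit sequence is stored as a Python int of |DSeq| bits.
- Bit 1 (the leftmost bit) is the most significant bit.
- This encoding and the helper names `bit_index_table`, `as_string`, and `freq` are assumptions made for illustration.

Counting the 1 bits gives freq(N) without rescanning DSeq.

```python
def bit_index_table(dseq: str) -> dict:
    """Map each data item N to its appearing bit sequence Appear_N.

    Each sequence is an int of len(dseq) bits; bit 1 (the leftmost bit,
    i.e. position 1 of DSeq) is the most significant bit.
    """
    n = len(dseq)
    table = {}
    for pos, item in enumerate(dseq):  # pos is 0-based, so this sets bit pos + 1
        table[item] = table.get(item, 0) | (1 << (n - 1 - pos))
    return table


def as_string(b: int, n: int) -> str:
    """Render an n-bit appearing bit sequence with bit 1 on the left."""
    return format(b, f"0{n}b")


def freq(b: int) -> int:
    """Number of bits with value 1 in an appearing bit sequence."""
    return bin(b).count("1")


dseq = "ABCDABCACDEEABCCDEAC"
table = bit_index_table(dseq)
for item in sorted(table):
    # One row of Table 3.1 followed by freq(N), e.g. "A 10001001000010000010 5".
    print(item, as_string(table[item], len(dseq)), freq(table[item]))
```

With this encoding, bit i corresponds to `1 << (len(dseq) - i)`. The textual left shift (bit i + 1 moves to bit i) therefore becomes a Python `<<` followed by masking to |DSeq| bits.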

Example 3.1 The bit index table of "ABCDABCACDEEABCCDEAC" is given in Table 3.1.

(1) Suppose we would like to get Appear_AB. If bit i in Appear_AB has value 1, it denotes a position where "AB" appears. Therefore, "A" must appear on position i, and the next position i+1 must contain "B". The information about the positions where "A" and "B" appear is stored in Appear_A and Appear_B, respectively. So we can use the following steps to get Appear_AB.


Step1. Get the appearing bit sequence of the data item "B" from Table 3.1: Appear_B = 01000100000001000000.

Step2. Perform the left shift operation by one (= |AB| − 1) bit on Appear_B. This shifts bit (i+1) to bit i, where 1 ≤ i ≤ 19, and sets bit 20 to 0: l_shift(Appear_B, 1) = 10001000000010000000.

Step3. Perform an and operation on the result of Step2 and Appear_A to get Appear_AB. The resultant bit sequence is Appear_AB = Appear_A ∧ l_shift(Appear_B, 1) = 10001000000010000000.

(2) After getting Appear_AB, suppose we would like to get Appear_ABC. When "AB" appears on position i and "C" appears on position i+2, pattern "ABC" appears on position i. Therefore, we can get Appear_ABC by performing the following steps.

Step1. Obtain the appearing bit sequence of the last data item "C" from Table 3.1: Appear_C = 00100010100000110001.

Step2. Perform the left shift operation by two (= |ABC| − 1) bits on Appear_C: l_shift(Appear_C, 2) = 10001010000011000100.

Step3. Perform an and operation on the result of Step2 and Appear_AB to get Appear_ABC. The resultant bit sequence is Appear_ABC = Appear_AB ∧ l_shift(Appear_C, 2) = 10001000000010000000.

The bits with value 1 in Appear_AB indicate the positions where "AB" begins to appear. Therefore, the number of these bits equals the frequency of "AB" in DSeq, that is, 3. Similarly, the frequency of "ABC" in DSeq, also 3, can be obtained efficiently from Appear_ABC.
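The three steps of Example 3.1 can be replayed with Python integers.

- Each 20-bit appearing bit sequence is assumed to be an int whose most significant bit is bit 1.
- The helper `l_shift` is hypothetical. It mirrors the left shift used in Step2: bits shifted past bit 1 are dropped, and the vacated rightmost bits are filled with a constant c (0 by default).

```python
N = 20  # |DSeq| for DSeq = "ABCDABCACDEEABCCDEAC"


def l_shift(b: int, k: int, n: int = N, c: int = 0) -> int:
    """Left-shift the n-bit sequence b by k bits (bit i + k moves to bit i).

    Bits shifted past bit 1 are dropped; the k vacated rightmost bits are
    filled with the constant c (0 or 1).
    """
    mask = (1 << n) - 1
    out = (b << k) & mask
    if c:
        out |= ((1 << k) - 1) & mask
    return out


def as_string(b: int, n: int = N) -> str:
    return format(b, f"0{n}b")


# Appearing bit sequences copied from Table 3.1.
appear_a = int("10001001000010000010", 2)
appear_b = int("01000100000001000000", 2)
appear_c = int("00100010100000110001", 2)

# (1) Appear_AB = Appear_A AND l_shift(Appear_B, |AB| - 1)
appear_ab = appear_a & l_shift(appear_b, 1)
# (2) Appear_ABC = Appear_AB AND l_shift(Appear_C, |ABC| - 1)
appear_abc = appear_ab & l_shift(appear_c, 2)

print(as_string(l_shift(appear_b, 1)))  # 10001000000010000000
print(as_string(appear_ab))             # 10001000000010000000
print(as_string(l_shift(appear_c, 2)))  # 10001010000011000100
print(as_string(appear_abc))            # 10001000000010000000
print(bin(appear_abc).count("1"))       # 3
```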

Observing Example 3.1 carefully, we can derive the following rule for getting Appear_P for a pattern P. Suppose a pattern P = P1P2…Pm (m ≥ 2) is given, where Pi (i = 1, …, m) is a data item. Let P′ = P1P2…P(m−1), and let Pm be denoted by X. Then Appear_P can be deduced from Appear_P′ and Appear_X according to the following recursive formula.

Here l_shift(b, n, c) means to perform a left shift of n bits on b, with the vacated rightmost bits filled with the constant c (c = 0 or 1). If the parameter c is omitted from the function, its default value is 0.

$$
\mathit{Appear}_{P} =
\begin{cases}
\mathit{Appear}_{P} \ \text{(taken from the bit index table)}, & \text{if } |P| = 1,\\
\mathit{Appear}_{P'} \wedge \mathit{l\_shift}(\mathit{Appear}_{X},\, |P| - 1), & \text{otherwise.}
\end{cases}
\tag{3.1}
$$
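Formula (3.1) can be sketched as a short recursive Python function.

- The int encoding (bit 1 as the most significant of |DSeq| bits) is an assumption for illustration.
- The names `appear`, `positions`, and `naive_positions` are also assumptions.
- The naive substring scan is included only to cross-check the bitwise result.

```python
def bit_index_table(dseq: str) -> dict:
    n = len(dseq)
    table = {}
    for pos, item in enumerate(dseq):
        table[item] = table.get(item, 0) | (1 << (n - 1 - pos))  # sets bit pos + 1
    return table


def l_shift(b: int, k: int, n: int) -> int:
    """Left shift of an n-bit sequence by k bits, filling with 0 (c = 0)."""
    return (b << k) & ((1 << n) - 1)


def appear(p: str, table: dict, n: int) -> int:
    """Formula (3.1): Appear_P = Appear_P' AND l_shift(Appear_X, |P| - 1)."""
    if len(p) == 1:
        return table.get(p, 0)  # read directly from the bit index table
    prefix, x = p[:-1], p[-1]
    return appear(prefix, table, n) & l_shift(table.get(x, 0), len(p) - 1, n)


def positions(b: int, n: int) -> list:
    """1-based positions of the bits with value 1."""
    return [i + 1 for i in range(n) if (b >> (n - 1 - i)) & 1]


def naive_positions(p: str, dseq: str) -> list:
    """Reference answer: 1-based start positions of p found by direct scanning."""
    return [i + 1 for i in range(len(dseq) - len(p) + 1) if dseq[i:i + len(p)] == p]


dseq = "ABCDABCACDEEABCCDEAC"
n = len(dseq)
table = bit_index_table(dseq)
for p in ["AB", "ABC", "ABCD", "CD", "EA"]:
    b = appear(p, table, n)
    assert positions(b, n) == naive_positions(p, dseq)
    print(p, positions(b, n))
```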

3.2 Appearing Bit Sequences of Insertion Fault Tolerance

By extending the representation of appearing bit sequences, fault-tolerant appearing bit sequences are designed to represent the positions where a pattern appears with fault tolerance. Given a fault tolerance δ (δ_I or δ_D), the fault-tolerant appearing bit sequence of a pattern P in a data sequence is denoted as FT-Appear_P+(δ) or FT-Appear_P−(δ). It represents the positions where the data sequence IFT-contains or DFT-contains P, respectively, under fault tolerance δ. The methods for systematically getting FT-Appear_P+(δ_I) and FT-Appear_P−(δ_D) are described in this section and the next section, respectively.

Considering the insertion fault tolerance, we define the appearing bit sequence of a pattern P with E insertion errors, denoted as Appear_P+(E). The bits with value 1 in Appear_P+(E) represent those positions where the data sequence FT-contains P with E insertion errors.

According to [Def. 2.4], there are (δ_I + 1) situations in which a pattern P IFT-appears in DSeq under fault tolerance δ_I: DSeq FT-contains P with 0, 1, 2, …, or δ_I insertion errors. In other words, FT-Appear_P+(δ_I) can be obtained by performing δ_I or operations on the (δ_I + 1) appearing bit sequences Appear_P+(0), Appear_P+(1), Appear_P+(2), …, and Appear_P+(δ_I):

$$
\mathit{FT}\text{-}\mathit{Appear}^{+}_{P}(\delta_I) = \bigvee_{E=0}^{\delta_I} \mathit{Appear}^{+}_{P}(E)
$$

Figure 3.1 illustrates the relationship between FT-Appear_P+(δ_I) and Appear_P+(E) when δ_I = 2 for the example patterns "A", "AB", "ABC", "ABCD", and "ABCDE".


Figure 3.1 Example of the relationship between FT-Appear_P+(δ_I) and Appear_P+(E) [figure content: for each P in {"A", "AB", "ABC", "ABCD", "ABCDE"}, FT-Appear_P+(2) = Appear_P+(0) ∨ Appear_P+(1) ∨ Appear_P+(2)]

In Figure 3.1, note that FT-Appear_A+(2) equals Appear_A+(0) directly. This is because it is not possible to insert 1 or 2 data items in the "middle" of pattern "A" to get a sub-pattern of the data sequence. Therefore, we can conclude the following rule.

[Rule 3.1] Whenever a given pattern P satisfies |P| = 1 and the insertion fault tolerance is δ_I, then

$$
\forall\, 1 \le E \le \delta_I:\quad \mathit{Appear}^{+}_{P}(E) = 0 \ \ (\text{represented by } |DSeq| \text{ bits of } 0\text{s})
\tag{3.2}
$$

The remaining problem is how to get each Appear_P+(E), where |P| > 1 and 0 ≤ E ≤ δ_I. Appear_P+(0) represents the positions where DSeq FT-contains P with zero insertion errors, so it carries the same information as Appear_P. Therefore, Appear_P+(0) is obtained in the same way as Appear_P (described in Section 3.1).

When 1 ≤ E ≤ δ_I, Appear_P+(E) can also be obtained by performing or, and, and left shift operations on the appearing bit sequences of sub-patterns of P, according to the following lemma.

[Lemma 3.1]

Given a pattern P = P1P2…Pm, where Pi (i = 1, …, m) is a data item, let P′ denote the sub-pattern P1P2…P(m−1) of P and X denote Pm. DSeq FT-contains pattern P with E insertion errors on position i iff both of the following hold:

- DSeq FT-contains pattern P′ with k insertion errors on position i (0 ≤ k ≤ E);
- X appears on position i + (|P| + E) − 1.

Proof. P′ appears in DSeq from position i to position (i + |P′| − 1) + k (with k insertion errors). So there exist E − k insertion errors between P′ and X. This implies that X must appear on position i + (|P| + E) − 1, because (i + |P′| − 1 + k) + (E − k) + 1 = i + (|P′| + 1) + E − 1 = i + (|P| + E) − 1 (∵ |P′| + 1 = |P|). #

In other words, X must appear on the (|P| + E − 1)th position to the right of position i. Therefore, getting Appear_P+(E) for 0 < E ≤ δ_I can be expressed as the following recursive formula.



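$$
\mathit{Appear}^{+}_{P}(E) =
\begin{cases}
0 \ \ (\text{represented by } |DSeq| \text{ bits of } 0\text{s}), & \text{if } |P| = 1,\\
\Bigl(\bigvee_{k=0}^{E} \mathit{Appear}^{+}_{P'}(k)\Bigr) \wedge \mathit{l\_shift}(\mathit{Appear}_{X},\, |P| + E - 1), & \text{otherwise.}
\end{cases}
\tag{3.3}
$$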
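Formula (3.3) maps onto a few bitwise operations.

- The sketch assumes the same kind of int encoding as Table 3.1 rows: bit 1 is the most significant of 20 bits.
- The helper `appear_plus_step` is hypothetical. It receives the already computed Appear_P′+(0), …, Appear_P′+(E) as a list.

It reproduces the values that Example 3.2 derives for Appear_AB+(1) and Appear_ABC+(1).

```python
N = 20  # |DSeq| for DSeq = "ABCDABCACDEEABCCDEAC"


def l_shift(b: int, k: int, n: int = N) -> int:
    """Left shift of an n-bit sequence by k bits, filling with 0."""
    return (b << k) & ((1 << n) - 1)


def appear_plus_step(prefix_plus: list, appear_x: int, len_p: int, e: int) -> int:
    """Formula (3.3) for |P| > 1.

    prefix_plus[k] must hold Appear_P'+(k) for k = 0..e; appear_x is Appear_X.
    """
    s = 0
    for k in range(e + 1):
        s |= prefix_plus[k]                # OR of Appear_P'+(0), ..., Appear_P'+(E)
    t = l_shift(appear_x, len_p + e - 1)   # X must sit |P| + E - 1 positions to the right
    return s & t


appear_a = int("10001001000010000010", 2)
appear_b = int("01000100000001000000", 2)
appear_c = int("00100010100000110001", 2)

a_plus = [appear_a, 0]                       # Appear_A+(0) = Appear_A; Appear_A+(1) = 0 by (3.2)
ab_plus = [appear_a & l_shift(appear_b, 1)]  # Appear_AB+(0) via formula (3.1)
ab_plus.append(appear_plus_step(a_plus, appear_b, 2, 1))
abc_plus1 = appear_plus_step(ab_plus, appear_c, 3, 1)

print(format(ab_plus[1], "020b"))  # 00000000000000000000
print(format(abc_plus1, "020b"))   # 00000000000010000000
```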

Combining Formulas (3.1) and (3.3), we define a recursive function for getting Appear_P+(E), where 0 ≤ E ≤ δ_I, as follows.

[Def. 3.1] (Recursive function of getting Appear_P+(E)) Suppose a pattern P = P1P2…Pm is given. Let P′ denote the sub-pattern P1P2…P(m−1) of P and X denote the last data item Pm. When the insertion fault tolerance δ_I is given, Appear_P+(E), where 0 ≤ E ≤ δ_I, is obtained from the following recursive function.

If |P| = 1, then
  Appear_P+(0) = Appear_P;
  ∀ 1 ≤ E ≤ δ_I, Appear_P+(E) = 0 (represented by |DSeq| bits of 0s);
Else
  Appear_P+(E) = (Appear_P′+(0) ∨ Appear_P′+(1) ∨ … ∨ Appear_P′+(E)) ∧ l_shift(Appear_X, |P| + E − 1).
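A direct transcription of [Def. 3.1] can be sketched with memoization, so that each Appear_P′+(k) is computed only once.

To check it, the sketch compares the result with a brute-force scan. The scan uses the reading implied by Lemma 3.1:

- P1 is on position i;
- Pm is on position i + |P| + E − 1;
- P2, …, P(m−1) appear in order strictly between them.

That reading of [Def. 2.4], the int encoding (bit 1 as the most significant bit), and all function names are assumptions for illustration.

```python
from functools import lru_cache

DSEQ = "ABCDABCACDEEABCCDEAC"
N = len(DSEQ)
TABLE = {}
for pos, item in enumerate(DSEQ):  # bit index table, bit 1 = most significant bit
    TABLE[item] = TABLE.get(item, 0) | (1 << (N - 1 - pos))


def l_shift(b: int, k: int) -> int:
    return (b << k) & ((1 << N) - 1)


@lru_cache(maxsize=None)
def appear_plus(p: str, e: int) -> int:
    """[Def. 3.1]: Appear_P+(E) over DSEQ."""
    if len(p) == 1:
        return TABLE.get(p, 0) if e == 0 else 0  # Rule 3.1 / formula (3.2)
    prefix, x = p[:-1], p[-1]
    s = 0
    for k in range(e + 1):
        s |= appear_plus(prefix, k)
    return s & l_shift(TABLE.get(x, 0), len(p) + e - 1)


def is_subsequence(sub: str, seq: str) -> bool:
    it = iter(seq)
    return all(ch in it for ch in sub)  # `in` consumes the iterator, so order is kept


def brute_force(p: str, e: int) -> int:
    """Assumed reading: P1 at i, Pm at i + |P| + E - 1, the rest in order between."""
    out = 0
    for i in range(N):
        j = i + len(p) + e - 1  # 0-based index where Pm must appear
        if j >= N:
            break
        if len(p) == 1:
            ok = e == 0 and DSEQ[i] == p
        else:
            ok = (DSEQ[i] == p[0] and DSEQ[j] == p[-1]
                  and is_subsequence(p[1:-1], DSEQ[i + 1:j]))
        if ok:
            out |= 1 << (N - 1 - i)
    return out


for e in range(3):  # delta_I = 2, as in Figure 3.2
    b = appear_plus("ABCDE", e)
    assert b == brute_force("ABCDE", e)
    print(e, format(b, f"0{N}b"))
```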

Example 3.2 Consider the bit index table of "ABCDABCACDEEABCCDEAC" shown in Table 3.1, and assume δ_I = 1. The process of getting Appear_AB+(1) and Appear_ABC+(1) is as follows.

(1) Appear_AB+(1)

Step1. Get Appear_B = 01000100000001000000 from the bit index table.

Step2. Perform an or operation on Appear_A+(0) and Appear_A+(1). According to formula (3.2), Appear_A+(1) = 0, and Appear_A+(0) = Appear_A. The result of the or operation is assigned to the temporary variable s:
s = Appear_A+(0) ∨ Appear_A+(1) = 10001001000010000010.

Step3. Perform the left shift operation by two (= |AB| + 1 − 1) bits on Appear_B:
t = l_shift(Appear_B, 2) = 00010000000100000000.

Step4. Perform an and operation on s and t to get Appear_AB+(1). The resultant bit sequence is s ∧ t = 00000000000000000000.

(2) Appear_ABC+(1)

Step1. Get Appear_C = 00100010100000110001 from the bit index table.

Step2. Perform an or operation on Appear_AB+(0) and Appear_AB+(1), and assign the result to s. Appear_AB+(0) is obtained from formula (3.1), and Appear_AB+(1) is known from the previous result of this example. The resultant appearing bit sequence is
s = Appear_AB+(0) ∨ Appear_AB+(1) = 10001000000010000000.

Step3. Perform the left shift operation by three (= |ABC| + 1 − 1) bits on Appear_C:
t = l_shift(Appear_C, 3) = 00010100000110001000.

Step4. Perform an and operation on s and t to get Appear_ABC+(1). The resultant bit sequence is s ∧ t = 00000000000010000000.

Figure 3.2 illustrates the process of getting Appear_P+(E) recursively for a sample pattern "ABCDE", where δ_I = 2.

Figure 3.2 Example of the way of getting Appear_ABCDE+(E) recursively [figure content:

- For P = "A": Appear_A+(0) = Appear_A and Appear_A+(1) = Appear_A+(2) = 00…0.
- For P = "AB", "ABC", "ABCD", "ABCDE" (with P′ the prefix and X the last item) and E = 0, 1, 2, each entry has the form Appear_P+(E) = (Appear_P′+(0) ∨ … ∨ Appear_P′+(E)) ∧ l_shift(Appear_X, |P| + E − 1).
- The shift amounts are 1, 2, 3 bits for "AB"; 2, 3, 4 for "ABC"; 3, 4, 5 for "ABCD"; and 4, 5, 6 for "ABCDE".]

Finally, FT-Appear_P+(δ_I) can be obtained as

$$
\mathit{FT}\text{-}\mathit{Appear}^{+}_{P}(\delta_I) = \bigvee_{i=0}^{\delta_I} \mathit{Appear}^{+}_{P}(i).
$$

FT-freq_DSeq(P) equals the number of bits with value 1 in FT-Appear_P+(δ_I). Therefore, the insertion fault-tolerant frequency of a pattern P can be counted efficiently to evaluate whether P is an FT-RP.

Example 3.3 Following the results shown in Example 3.1 and Example 3.2,
FT-Appear_ABC+(1) = Appear_ABC+(0) ∨ Appear_ABC+(1) = 10001000000010000000.
Thus, under insertion fault tolerance 1, FT-freq_DSeq("ABC") = 3.
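Counting FT-freq_DSeq(P) then reduces to an or over E followed by a population count.

- This Python sketch reimplements [Def. 3.1] compactly so that it runs on its own.
- The int encoding (bit 1 as the most significant bit) and the function names are assumptions.

It reproduces FT-freq_DSeq("ABC") = 3 under δ_I = 1 from Example 3.3.

```python
from functools import lru_cache

DSEQ = "ABCDABCACDEEABCCDEAC"
N = len(DSEQ)
TABLE = {}
for pos, item in enumerate(DSEQ):  # bit index table, bit 1 = most significant bit
    TABLE[item] = TABLE.get(item, 0) | (1 << (N - 1 - pos))


def l_shift(b: int, k: int) -> int:
    return (b << k) & ((1 << N) - 1)


@lru_cache(maxsize=None)
def appear_plus(p: str, e: int) -> int:
    """Appear_P+(E) following [Def. 3.1]."""
    if len(p) == 1:
        return TABLE.get(p, 0) if e == 0 else 0
    s = 0
    for k in range(e + 1):
        s |= appear_plus(p[:-1], k)
    return s & l_shift(TABLE.get(p[-1], 0), len(p) + e - 1)


def ft_appear_plus(p: str, delta_i: int) -> int:
    """FT-Appear_P+(delta_I) = Appear_P+(0) OR ... OR Appear_P+(delta_I)."""
    out = 0
    for e in range(delta_i + 1):
        out |= appear_plus(p, e)
    return out


def ft_freq_plus(p: str, delta_i: int) -> int:
    """FT-freq_DSeq(P): number of bits with value 1 in FT-Appear_P+(delta_I)."""
    return bin(ft_appear_plus(p, delta_i)).count("1")


print(format(ft_appear_plus("ABC", 1), f"0{N}b"))  # 10001000000010000000
print(ft_freq_plus("ABC", 1))                      # 3
```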

To avoid duplicated or and left shift operations when getting Appear_P+(E) for various E, we modify the recursive function of getting Appear_P+(E). The modified function expresses the recurrence relations between the temporary variables used for computing Appear_P+(E) and Appear_P+(E − 1).

[Def. 3.2] (Modified recursive function of getting Appear_P+(E)) Given a pattern P = P1P2…Pm, let P′ denote the sub-pattern P1P2…P(m−1) of P and X denote the last data item Pm. When the insertion fault tolerance δ_I is given, Appear_P+(E), where 0 ≤ E ≤ δ_I, is obtained from the following recursive function.

If |P| = 1, then
  Appear_P+(0) = Appear_P;
  ∀ 1 ≤ E ≤ δ_I, Appear_P+(E) = 0 (represented by |DSeq| bits of 0s);
Else
  If E = 0, then
    temp1(0) = Appear_P′+(0);
    temp2(0) = l_shift(Appear_X, |P| − 1);
  Else
    temp1(E) = temp1(E − 1) ∨ Appear_P′+(E);
    temp2(E) = l_shift(temp2(E − 1), 1);
  Appear_P+(E) = temp1(E) ∧ temp2(E).
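[Def. 3.2] can be sketched iteratively. The two running values temp1 and temp2 are updated once per E, so each Appear_P′+(E) is ORed in only once, and Appear_X is shifted by |P| − 1 bits only once.

- The encoding (bit 1 as the most significant of |DSeq| bits) is an assumption.
- The name `appear_plus_all` is hypothetical.

The sketch returns the whole list Appear_P+(0), …, Appear_P+(δ_I) for one pattern.

```python
DSEQ = "ABCDABCACDEEABCCDEAC"
N = len(DSEQ)
TABLE = {}
for pos, item in enumerate(DSEQ):  # bit index table, bit 1 = most significant bit
    TABLE[item] = TABLE.get(item, 0) | (1 << (N - 1 - pos))


def l_shift(b: int, k: int) -> int:
    return (b << k) & ((1 << N) - 1)


def appear_plus_all(p: str, delta_i: int) -> list:
    """[Def. 3.2]: [Appear_P+(0), ..., Appear_P+(delta_i)] via temp1/temp2."""
    if len(p) == 1:
        return [TABLE.get(p, 0)] + [0] * delta_i
    prefix_plus = appear_plus_all(p[:-1], delta_i)  # Appear_P'+(E) for E = 0..delta_i
    result = []
    temp1 = temp2 = 0
    for e in range(delta_i + 1):
        if e == 0:
            temp1 = prefix_plus[0]
            temp2 = l_shift(TABLE.get(p[-1], 0), len(p) - 1)
        else:
            temp1 |= prefix_plus[e]    # temp1(E) = temp1(E - 1) OR Appear_P'+(E)
            temp2 = l_shift(temp2, 1)  # temp2(E) = l_shift(temp2(E - 1), 1)
        result.append(temp1 & temp2)   # Appear_P+(E) = temp1(E) AND temp2(E)
    return result


for e, b in enumerate(appear_plus_all("ABC", 1)):
    print(e, format(b, f"0{N}b"))
```

Because l_shift drops bits shifted past bit 1 and fills with 0, shifting temp2 by one bit per step yields the same value as shifting Appear_X by |P| + E − 1 bits at once. This is why the result agrees with [Def. 3.1].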

Figure 3.3 shows the results after modifying Figure 3.2 based on [Def. 3.2].

Figure 3.3 Modification of Figure 3.2 [figure content:

- For P = "A": Appear_A+(0) = Appear_A and Appear_A+(1) = Appear_A+(2) = 00…0.
- For P = "AB", "ABC", "ABCD", "ABCDE" (with P′ the prefix and X the last item), the initial values are temp1(0) = Appear_P′+(0) and temp2(0) = l_shift(Appear_X, |P| − 1).
- For E = 1, 2: temp1(E) = temp1(E − 1) ∨ Appear_P′+(E) and temp2(E) = l_shift(temp2(E − 1), 1).
- For E = 0, 1, 2: Appear_P+(E) = temp1(E) ∧ temp2(E).
- The initial shift |P| − 1 is 1, 2, 3, and 4 bits for "AB", "ABC", "ABCD", and "ABCDE", respectively.]

3.3 Appearing Bit Sequences of Deletion Fault Tolerance

Similar to the insertion fault tolerance, we define the appearing bit sequence of a pattern P with E deletion errors, denoted as Appear_P−(E). The bits with value 1 in Appear_P−(E) represent those positions where the data sequence FT-contains P with E deletion errors.

Suppose a pattern P = P1P2…Pm is given. Let Y denote the first data item P1 and P′′ denote the sub-pattern P2P3…Pm of P. Unlike FT-Appear_P+(δ_I), FT-Appear_P−(δ_D) represents the positions where Y appears and DSeq FT-contains P′′, starting at the next position, with at most δ_D deletion errors.

Therefore, suppose we find a position j where DSeq FT-contains P′′ with 0, 1, 2, …, or δ_D deletion errors. If position (j − 1) contains Y, then (j − 1) is a position where DSeq DFT-contains P under fault tolerance δ_D. In other words, FT-Appear_P−(δ_D) can be obtained in three steps:

1. Perform δ_D or operations on the (δ_D + 1) appearing bit sequences Appear_P′′−(0), Appear_P′′−(1), …, Appear_P′′−(δ_D − 1), and Appear_P′′−(δ_D).
2. Perform a one-bit left shift operation on the result.
3. Perform an and operation with Appear_Y.

Figure 3.4 illustrates the relationship between FT-Appear_P−(δ_D) and Appear_P′′−(E) when δ_D = 2 for the example patterns "A", "AB", "ABC", "ABCD", and "ABCDE".

Note that if |P| ≤ δ_D + 1, the rightmost bit is filled with 1 when performing the left shift operation, because this bit is considered a "don't care" bit in the subsequent and operation. Otherwise, the rightmost bit is filled with 0.
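The combining step for FT-Appear_P−(δ_D) can be sketched as follows. This part of the text does not describe how Appear_P′′−(E) itself is computed, so the sketch takes those sequences as input.

- The int encoding (bit 1 as the most significant bit), the function names, and the 5-bit toy inputs are all hypothetical.
- The fill bit follows the "don't care" note in the text, read here as: fill with 1 when |P| ≤ δ_D + 1, otherwise with 0. That reading of the condition is an assumption.

```python
def l_shift(b: int, k: int, n: int, c: int = 0) -> int:
    """Left shift of an n-bit sequence by k bits; vacated rightmost bits get c."""
    mask = (1 << n) - 1
    out = (b << k) & mask
    if c:
        out |= ((1 << k) - 1) & mask
    return out


def ft_appear_minus(pp_minus: list, appear_y: int, len_p: int, delta_d: int, n: int) -> int:
    """Combining step for FT-Appear_P-(delta_D) as described in the text.

    pp_minus[E] is assumed to already hold Appear_P''-(E) for E = 0..delta_D.
    """
    s = 0
    for e in range(delta_d + 1):
        s |= pp_minus[e]                  # delta_D or operations
    c = 1 if len_p <= delta_d + 1 else 0  # assumed reading of the "don't care" fill rule
    return l_shift(s, 1, n, c) & appear_y  # shift by one bit, then AND with Appear_Y


# Toy 5-bit inputs, made up for illustration (not taken from DSeq):
print(format(ft_appear_minus([0b00100, 0b00010], 0b01000, 3, 1, 5), "05b"))  # 01000
print(format(ft_appear_minus([0b00000, 0b00000], 0b00001, 2, 1, 5), "05b"))  # 00001
```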
