Periodicity - Mining the Usage Patterns of Mobile Apps to Predict Their Usage Behavior

To discover the periodicity feature of the Apps usage pattern, $f_p = \langle p, st_1, st_2, \ldots, st_n \rangle$, we first detect the usage period of the App and then discover the specific times within that period.

Figure 4.1: An example of periodicity detection

There are two steps to discover the period and the specific times: period detection and specific time discovery.

4.3.1 Period Detection

Given the App-history of an App, the goal of this step is to detect the period of the App; Apps without periodicity are eliminated. We adopt the Discrete Fourier Transform (DFT) to find all possible periods, use the dynamic cut approach [23] to select the most significant period, and then apply autocorrelation to prevent the problem that the DFT cannot find the significant period in the low

Figure 4.2: An example of behavior clustering

of this App. If none of the periods exceeds the maximum power, the sequence is regarded as non-periodic. The maximum power is determined as follows. Let $rAH(app)$ be the sequence obtained by a random permutation of the elements of $AH(app)$. Because of the scrambling, $rAH(app)$ does not exhibit any pattern or period, so we record as $p_{max}$ the maximum power that the FTgram [23] of $rAH(app)$ exhibits at any period. Only a period in $AH(app)$ whose power is higher than $p_{max}$ may be considered a real period. In Figure 4.1(b), the red dashed line is the maximum power of the shuffled sequence. Since the maximum power, P, is higher than the red dashed line, this App is recognized as periodic, with frequency P. Finally, in Figure 4.1(c), the frequency is mapped to a period on the autocorrelation curve, and the mapped period corresponds to 24 hours, as shown at P2.
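A minimal sketch of this detection step, assuming hourly usage counts and NumPy, is given below. It computes the periodogram of the mean-removed App-history, estimates $p_{max}$ from random permutations, keeps only periods whose power exceeds $p_{max}$, and then searches the autocorrelation curve near the strongest hint. It ranks hints simply by power rather than applying the dynamic cut approach of [23]; the function names, the 100 shuffles, and the lag window searched around each hint are illustrative assumptions.

```python
import numpy as np


def periodogram(x):
    """Power |X_k|^2 / n of the mean-removed series for DFT bins k = 1 .. n // 2."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x - x.mean())
    power = np.abs(spectrum) ** 2 / len(x)
    return np.arange(1, len(power)), power[1:]  # drop the DC bin k = 0


def shuffled_max_power(x, trials=100, seed=0):
    """p_max: the largest power that any random permutation of x reaches."""
    rng = np.random.default_rng(seed)
    return max(periodogram(rng.permutation(x))[1].max() for _ in range(trials))


def candidate_periods(x, trials=100, seed=0):
    """(k, n / k, power) for every bin whose power exceeds p_max, strongest first."""
    n = len(x)
    p_max = shuffled_max_power(x, trials, seed)
    bins, power = periodogram(x)
    hints = [(int(k), n / int(k), float(p)) for k, p in zip(bins, power) if p > p_max]
    return sorted(hints, key=lambda h: h[2], reverse=True), p_max


def autocorrelation(x):
    """Normalised, non-circular autocorrelation via FFT (x must not be constant)."""
    y = np.asarray(x, dtype=float) - np.mean(x)
    n = len(y)
    f = np.fft.rfft(y, 2 * n)  # zero-pad so that lags do not wrap around
    acf = np.fft.irfft(f * np.conj(f))[:n]
    return acf / acf[0]


def refine_period(x, k):
    """Lag with the highest autocorrelation among the lags DFT bin k may stand for."""
    n = len(x)
    lo = max(1, n // (k + 1))
    hi = n - 1 if k <= 1 else min(n - 1, -(-n // (k - 1)))  # ceil(n / (k - 1))
    acf = autocorrelation(x)
    return lo + int(np.argmax(acf[lo:hi + 1]))


# Hypothetical App-history: 14 days of hourly counts, used twice per hour 08:00-11:00.
usage = [2 if 8 <= h % 24 < 11 else 0 for h in range(24 * 14)]
hints, p_max = candidate_periods(usage)
k, period_hint, _ = hints[0]
print(k, period_hint, refine_period(usage, k))  # 14 24.0 24
```

The autocorrelation pass only moves the estimate within the lag range that a single DFT bin cannot resolve, which is where the coarse frequency grid of the DFT is least precise for long periods.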

Time Complexity: The DFT can be computed in O(n log n) time using the Fast Fourier Transform (FFT) algorithm. Since autocorrelation is a convolution, which can also be computed by FFT, its complexity is also O(n log n). Thus, the overall time complexity of period detection is O(n log n).

4.3.2 Specific Time Discovery

After the period detection step, for an App with period D, this step first identifies multiple usage behaviors under the period D and then discovers the specific times. Since different usage behaviors may share the same period, we have to separate them before discovering the specific times. For example, a user may use an App at different times on weekdays and on weekends, yet both behaviors have a period of one day.

Given the period D and the App-history of an App, the App-history is first decomposed into several subsequences, called App-pieces, and the App-pieces are then clustered so that App-pieces in the same cluster represent the same usage behavior. App-pieces are formally defined as follows:

Definition 4. App-pieces: The i-th App-piece is defined as

$$APS(i) = \{\, x_j \mid x_j = N_t, \text{ where } i = \lfloor t / D \rfloor \text{ and } j = \lfloor (t \bmod D) / \tau \rfloor \,\},$$

where $\tau$ is the length of a time slot and $N_t$ denotes the number of usage logs of the App $app$ at time $t$.
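Definition 4 can be sketched as follows, assuming the App-history is already stored as one usage count per time slot, so that t is a slot index rather than a timestamp. The function name is hypothetical, and pieces are numbered from 1 to match the worked example.

```python
def app_pieces(history, period, slot):
    """Split an App-history (one usage count N_t per time slot) into App-pieces.

    period: the detected period D; slot: the time-slot length, in the same unit as D.
    Returns {piece index i: [x_0, x_1, ...]} with pieces numbered from 1.
    """
    if period % slot:
        raise ValueError("the period D must be a whole number of time slots")
    slots_per_piece = period // slot
    pieces = {}
    for t, count in enumerate(history):
        i = t // slots_per_piece + 1  # which period slot t falls into
        pieces.setdefault(i, []).append(count)  # appended at position j = t % slots_per_piece
    return pieces


print(app_pieces([0, 3, 0, 0, 1, 1, 0, 0], period=24, slot=6))
# {1: [0, 3, 0, 0], 2: [1, 1, 0, 0]}
```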

For example, suppose the time slot length is 6 hours and the App-history of an App $app$ from 2010-01-01 to 2010-01-02 is $AH(app) = \langle 0, 3, 0, 0, 1, 1, 0, 0 \rangle$. Given D = 24 hours, the App-history is decomposed into two App-pieces: $APS(1) = \{0, 3, 0, 0\}$ and $APS(2) = \{1, 1, 0, 0\}$. To identify multiple usage behaviors, we perform hierarchical clustering on the App-pieces. Figure 4.2 shows an example of behavior identification: the App-history is first decomposed into App-pieces, and EDR [24] is then used to calculate the distance between two pieces.
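EDR [24] counts the edits needed to turn one piece into another, treating two values as a match when they differ by at most a tolerance eps. The sketch below pairs it with a naive average-linkage agglomerative clustering that stops once the closest clusters are more than max_dist apart. The linkage, eps, max_dist, the stopping rule, and the function names are assumptions, since they are not specified here.

```python
def edr(a, b, eps):
    """Edit Distance on Real sequences between two App-pieces."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subcost = 0 if abs(a[i - 1] - b[j - 1]) <= eps else 1
            dp[i][j] = min(dp[i - 1][j - 1] + subcost,  # match or substitute
                           dp[i - 1][j] + 1,            # delete from a
                           dp[i][j - 1] + 1)            # insert into a
    return dp[m][n]


def cluster_pieces(pieces, eps=0.5, max_dist=1.0):
    """Naive average-linkage agglomerative clustering; returns lists of piece indices."""
    dist = [[edr(p, q, eps) for q in pieces] for p in pieces]
    clusters = [[i] for i in range(len(pieces))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        if d > max_dist:
            break  # the remaining clusters are treated as distinct usage behaviors
        clusters[a].extend(clusters.pop(b))  # b > a, so popping b leaves index a valid
    return clusters


# Hypothetical 6-hour-slot pieces: five weekday-like days and two weekend-like days.
weekday, weekend = [0, 3, 0, 0], [0, 0, 2, 1]
print(cluster_pieces([weekday] * 5 + [weekend] * 2))  # [[0, 1, 2, 3, 4], [5, 6]]
```

This loop recomputes linkages on every merge, so it is slower than the O(N²) bound quoted in the time-complexity paragraph; it is meant only to show the clustering idea.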

For each group, we discover the specific times under the period D, using the concept in [25] to identify them. Figure 4.3(a) shows the cumulative usage over 24 hours, which is derived by the period detection step. We first partition the temporal domain into several candidate-intervals such that the variance of usage within each interval is minimized. Given a cumulative App-piece, the candidate-intervals $S = \{s_1, s_2, \ldots\}$ are obtained by

$$\arg\min_S \sum_{i=1}^{|S|-1} Var(s_i, s_{i+1}) \quad \text{subject to} \quad Var(s_i, s_{i+2}) > var \ \text{for each } i,$$

where $var$ is a threshold on the variance. The objective can be formulated as a recurrence: let n be the number of time slots in the cumulative App-piece, and let $MV(i, j)$ be the minimum variance sum from $s_j$ to $s_{n-1}$, on the premise that the sequence from $s_i$ to $s_j$ has been partitioned. Then, $MV(i, j)$ can be

Figure 4.3: An example of specific time interval discovery
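One reading of the $MV(i, j)$ recurrence, stated as an assumption rather than the authors' exact formula, is: with the last interval fixed as $[s_i, s_j)$, choose the next boundary $s_k$ that minimizes $Var(s_j, s_k) + MV(j, k)$, allowing only those $k$ for which merging the two intervals, $Var(s_i, s_k)$, stays above the threshold var. Memoised, this has O(n²) states and O(n) choices per state, matching the O(n³) cost given in the time-complexity paragraph. Intervals are half-open over slot indices, variances are population variances, and the names are hypothetical.

```python
import math
from functools import lru_cache


def candidate_intervals(usage, var_threshold):
    """Boundaries (0, s_2, ..., n) of candidate-intervals [s_i, s_{i+1}) over the slots.

    Minimises the sum of within-interval variances, subject to every pair of
    adjacent intervals having a merged variance above var_threshold.
    """
    n = len(usage)
    pre, pre_sq = [0.0], [0.0]
    for u in usage:
        pre.append(pre[-1] + u)
        pre_sq.append(pre_sq[-1] + u * u)

    def var(a, b):
        """Population variance of usage[a:b] in O(1) via prefix sums."""
        m = b - a
        mean = (pre[b] - pre[a]) / m
        return max(0.0, (pre_sq[b] - pre_sq[a]) / m - mean * mean)

    @lru_cache(maxsize=None)
    def mv(i, j):
        """(min variance sum over slots j..n-1, boundaries after j), last interval [i, j)."""
        if j == n:
            return 0.0, ()
        best = (math.inf, ())
        for k in range(j + 1, n + 1):
            if i is not None and var(i, k) <= var_threshold:
                continue  # [i, j) and [j, k) are too similar to keep apart
            cost, rest = mv(j, k)
            cost += var(j, k)
            if cost < best[0]:
                best = (cost, (k,) + rest)
        return best

    _, bounds = mv(None, 0)  # i = None: the first interval has no predecessor
    return (0,) + bounds


# Hypothetical cumulative App-piece with 8 slots.
print(candidate_intervals([0, 0, 0, 5, 5, 5, 0, 0], var_threshold=1.0))  # (0, 3, 6, 8)
```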

In Figure 4.3(b), the candidate-intervals are [0, 8], [8, 11], [11, 21], and [21, 24]. We then calculate the usage of each candidate-interval and take its local maxima as the usage times. As shown in Figure 4.3(c), [8, 11] and [21, 24] are local maxima. Finally, the specific times are the means of usage within the local maxima, as depicted in Figure 4.3(d).
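The last two steps can be sketched as below. Two readings here are assumptions: the "usage" of a candidate-interval is taken as its mean usage per slot, and the "mean of usage" of a local maximum is taken as the usage-weighted mean slot index inside it. The period is also treated as a line, so the first and last intervals have only one neighbor. The function name is hypothetical.

```python
def specific_times(usage, bounds):
    """Usage-weighted mean slot of every candidate-interval that is a local maximum."""
    intervals = list(zip(bounds, bounds[1:]))
    level = [sum(usage[a:b]) / (b - a) for a, b in intervals]  # mean usage per slot
    times = []
    for idx, (a, b) in enumerate(intervals):
        left = level[idx - 1] if idx > 0 else float("-inf")
        right = level[idx + 1] if idx + 1 < len(level) else float("-inf")
        if level[idx] > 0 and level[idx] > left and level[idx] > right:
            weight = sum(usage[a:b])
            times.append(sum(t * usage[t] for t in range(a, b)) / weight)
    return times


# Hypothetical cumulative App-piece and its candidate-interval boundaries.
print(specific_times([0, 0, 0, 5, 5, 5, 0, 0], (0, 3, 6, 8)))  # [4.0]
```

With hourly slots, a result such as 4.0 would correspond to 04:00 within the period; fractional values map to minutes in the same way.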

Time Complexity: Hierarchical clustering takes O(N²) time, where N denotes the total number of App-pieces [26]. To discover the specific times, generating the candidate-intervals takes O(n³) time, since the recurrence is solved by dynamic programming, where n denotes the number of time slots in the cumulative App-piece.

Chapter 5 Apps Usage Pattern Based Prediction

In this chapter, we use the Apps usage patterns for Apps usage prediction. We propose an Apps usage pattern based prediction method, abbreviated as AUP, which predicts the top K Apps that are likely to be executed at the query time based on the Apps usage patterns. In AUP, we first present an Apps usage probability model that formulates the features of Apps usage patterns as probabilities of App execution, and then we propose two algorithms to derive a list of the top K Apps.

5.1 Apps Usage Probability Model

The Apps usage pattern of an App includes three features that represent its usage. To formulate these features as probabilities of App execution, we define three corresponding measurements.

5.1.1 Global-frequency Probability Measurement

the usage frequency of the App $app_k$, and the probability of global frequency of $app_k$ can be formulated as $u_g(app_k) / \sum_{j} u_g(app_j)$, where the sum is taken over all Apps usage patterns. For example, suppose that in the feature of global frequency, $f_g$, of the Gmail App's usage pattern, $u_g$ is 100, and the sum of $u_g$ over all Apps usage patterns is 1000. Then the usage probability of global frequency of the Gmail App is 100/1000 = 0.1.
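The global-frequency ratio in the example is simple enough to state directly in code. In this sketch the function name and the extra Apps (Maps, Camera) are hypothetical; only the Gmail count of 100 and the total of 1000 come from the example.

```python
def global_frequency_probability(u_g, app):
    """u_g(app) divided by the sum of u_g over all Apps usage patterns."""
    total = sum(u_g.values())
    return u_g[app] / total if total else 0.0


# Hypothetical global frequencies whose sum is 1000, as in the example.
u_g = {"Gmail": 100, "Maps": 400, "Camera": 500}
print(global_frequency_probability(u_g, "Gmail"))  # 0.1
```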

5.1.2 Temporal-frequency Probability Measurement

The usage probability of temporal frequency of an App represents the App's usage frequency in a specific temporal-bucket relative to the usage frequency of all Apps in that bucket. Before formulating the probability of temporal frequency, the query time $t_q$ needs to be mapped to a temporal-bucket by the following computation:

$$Mapping(t_q) = (t_q - t_0) \bmod (bks + 1) \qquad (2)$$

where $t_0$ is the start time of the first temporal-bucket and $bks$ is the size of the temporal-bucket defined in the previous section.

For the feature $f_t = \langle u_{b_1}, u_{b_2}, u_{b_3}, \ldots, u_{b_n} \rangle$ of the Apps usage pattern, $u_{b_i}$ is the cumulative usage frequency in the i-th temporal-bucket, as determined in the previous section. After the query time is mapped to a temporal-bucket, the probability of temporal frequency of the App $app_k$ in the i-th temporal-bucket can be formulated as follows:

$$TFP(app_k) = \frac{u_{b_i}(app_k)}{\sum_{j=1}^{n} u_{b_i}(app_j)} \qquad (3)$$

where $u_{b_i}(app_k)$ is the temporal frequency of the Apps usage pattern of $app_k$ in the i-th temporal-bucket.

For example, given the Apps usage pattern of the Gmail App and a query time, if the query time is mapped to the 10th temporal-bucket, the temporal frequencies in the 10th bucket of the Gmail App and of all other Apps are considered. If the temporal frequency of the Gmail App in the 10th bucket is 50 and the sum over all Apps is 100, then the probability of temporal frequency of the Gmail App at the query time is 50/100 = 0.5.
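A minimal sketch of Equation (3) follows. The bucket mapping in it is an assumption, $\lfloor (t_q - t_0) / bks \rfloor$ taken modulo the number of buckets, and may differ from Equation (2); the function names and the Apps other than Gmail are hypothetical. The data reproduce the example: Gmail has 50 uses in the 10th bucket out of 100 in total.

```python
def bucket_index(t_q, t_0, bks, n_buckets):
    """0-based temporal-bucket index of query time t_q (an assumed mapping)."""
    return int((t_q - t_0) // bks) % n_buckets


def temporal_frequency_probability(f_t, app, i):
    """Equation (3): u_{b_i}(app) over the sum of u_{b_i} across all Apps."""
    total = sum(buckets[i] for buckets in f_t.values())
    return f_t[app][i] / total if total else 0.0


def one_bucket(count, i=9, n=24):
    """Hypothetical feature f_t with usage only in bucket i."""
    return [count if b == i else 0 for b in range(n)]


# Hypothetical data: 24 one-hour buckets starting at 00:00; query at 09:30 (9.5 h).
f_t = {"Gmail": one_bucket(50), "Line": one_bucket(30), "Maps": one_bucket(20)}
i = bucket_index(9.5, 0, 1, 24)  # 9, i.e. the 10th bucket
print(temporal_frequency_probability(f_t, "Gmail", i))  # 0.5
```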

5.1.3 Periodicity Probability Measurement

For the periodicity of the Apps usage pattern, the period and the specific times have been discovered.

If the Apps usage pattern of an App has the feature of periodicity, its period has been discovered. However, the usage period alone cannot be used to formulate the probability of periodicity, because a period may contain more than one specific time, and periodic usage does not occur at every time within the period.

To formulate the probability of periodicity, we present time distance matching between the query time and the specific times: the closest specific time is used to compute the probability of periodicity. A specific time st of the App $app_k$ is denoted $app_k.st$. Given a query time qt, the inverse of the difference between qt and the specific time st is computed by the time distance function $TD(app_k.st, qt)$, defined as follows:

$$TD(app_k.st, qt) = \frac{1}{|qt - app_k.st|}$$

For example, suppose the Gmail App has two specific times under its period, 08:23 and 22:15.

Given the query time 09:30, the query time is matched to the specific time 08:23, because its time distance, 1/|09:30 - 08:23| = 1/67 (minutes), is larger, that is, closer, than that of the second specific time, 1/|09:30 - 22:15| = 1/765.
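Time distance matching can be sketched as follows, using minutes since midnight as in the example. The function names are hypothetical; the sketch does not wrap distances around midnight, and returning infinity when the query time equals a specific time is an assumption, since that case is not covered here.

```python
def minutes(hhmm):
    """'09:30' -> 570 minutes after midnight."""
    h, m = map(int, hhmm.split(":"))
    return 60 * h + m


def time_distance(specific_times, qt):
    """(matched specific time, TD), where TD = 1 / |qt - st| in minutes for the closest st."""
    q = minutes(qt)
    st = min(specific_times, key=lambda s: abs(q - minutes(s)))  # closest specific time
    gap = abs(q - minutes(st))
    return st, (float("inf") if gap == 0 else 1 / gap)


st, td = time_distance(["08:23", "22:15"], "09:30")
print(st, td == 1 / 67)  # 08:23 True
```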

The probability of periodicity represents the time distance between an App's specific times and the query time, relative to that of all Apps. For the periodicity of the Apps usage pattern, $f_p = \langle p, st_1, st_2, \ldots, st_n \rangle$, the probability of periodicity of the App $app_k$ can be defined as follows:

Figure 5.1: An example of round-robin prediction for top K Apps

00:15, and Calendar has one specific time, 10:50. Given the query time 09:30, the time distance of Gmail is 1/58, that of Alarm is 1/120, and that of Calendar is 1/80. Thus, the usage probability of periodicity of the Gmail App is (1/58) / (1/58 + 1/120 + 1/80) = 24/53 ≈ 0.45.
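Following this example, the probability of periodicity normalizes an App's TD by the sum of TD over all Apps. The sketch below uses exact fractions so the arithmetic can be checked by hand; the function name is hypothetical, and the TD values are those of the example.

```python
from fractions import Fraction


def periodicity_probability(td, app):
    """TD of `app` divided by the sum of TD over all Apps."""
    return td[app] / sum(td.values())


td = {"Gmail": Fraction(1, 58), "Alarm": Fraction(1, 120), "Calendar": Fraction(1, 80)}
print(periodicity_probability(td, "Gmail"))  # 24/53
```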

From the document: Mining the Usage Patterns of Mobile Apps to Predict Their Usage Behavior (pp. 19-27)