
3.2 A Regression-based Approach for Mining User Movement Patterns

3.2.4 Algorithm MF: Deriving Movement Functions

Given the aggregation movement sequence AMS produced by algorithm LS and its clustered time projection sequence $CTP_{AMS}$ generated by algorithm TC, algorithm MF derives a sequence of movement functions that estimates the frequent movement behavior of a mobile user. First, a confidence movement function is derived for each cluster. Then, linkage movement functions are determined to connect the confidence movement functions of consecutive clusters. Finally, the movement function $F(t)$ is represented as $\langle U_0(t), E_1(t), U_1(t), E_2(t), \ldots, E_k(t), U_k(t) \rangle$, where $E_i(t)$ is the confidence movement function of cluster $CL_i$ of $CTP_{AMS}$ and $U_i(t)$ is the linkage movement function from $E_i(t)$ to $E_{i+1}(t)$; $U_0(t)$ and $U_k(t)$ cover the time slots before the first cluster and after the last cluster, respectively.

Deriving Confidence Movement Functions

For each cluster $CL_i$ of $CTP_{AMS}$, the confidence movement function of a mobile user, expressed as $E_i(t) = (\hat{x}_i(t), \hat{y}_i(t), TI_i)$, is derived. Here, $\hat{x}_i(t)$ (respectively, $\hat{y}_i(t)$) is a movement function along the x-coordinate axis (respectively, the y-coordinate axis), and the confidence movement function is valid for the time interval $TI_i$.
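As a concrete picture of the triple $E_i(t) = (\hat{x}_i(t), \hat{y}_i(t), TI_i)$, the following Python sketch stores both coordinate functions as polynomial coefficient lists (the polynomial form used in the regression below) and refuses to evaluate outside $TI_i$. The class name `ConfidenceMovementFunction` and its fields are hypothetical, and $TI_i$ is assumed to be a closed interval; the coefficients are those of $E_1(t)$ in the linkage example later in this section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ConfidenceMovementFunction:
    """E_i(t) = (x_hat_i(t), y_hat_i(t), TI_i) with polynomial components."""
    x_coeffs: List[float]          # a_0, ..., a_m of x_hat_i(t)
    y_coeffs: List[float]          # coefficients of y_hat_i(t)
    interval: Tuple[float, float]  # TI_i, assumed closed: [start, end]

    @staticmethod
    def _horner(coeffs: List[float], t: float) -> float:
        # Evaluates a_0 + a_1 t + ... + a_m t^m as a_0 + t(a_1 + t(...)).
        result = 0.0
        for c in reversed(coeffs):
            result = result * t + c
        return result

    def covers(self, t: float) -> bool:
        start, end = self.interval
        return start <= t <= end

    def __call__(self, t: float) -> Tuple[float, float]:
        if not self.covers(t):
            raise ValueError(f"t={t} lies outside TI={self.interval}")
        return self._horner(self.x_coeffs, t), self._horner(self.y_coeffs, t)


# E_1(t) from the linkage example of this section
e1 = ConfidenceMovementFunction([2.333, -2.133, 0.867, -0.066],
                                [2.529, -2.386, 1.021, -0.105], (1, 5))
```

Evaluating `e1(4)` gives approximately (3.449, 2.601), while `e1(6)` raises an error because time slot 6 is outside $[1, 5]$.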

Without loss of generality, let $CL_i$ be $\langle t_1, t_2, \ldots, t_n \rangle$, where $t_j$ denotes one of the time slots in $CL_i$ for $j = 1, 2, \ldots, n$. $AMR^i$ contains the frequent base stations, with their corresponding counts, in the $i$-th time slot of AMS. To derive movement functions, the locations of base stations must be converted from the symbolic model into the geometric model through a map table, provided by the telecom operator, that gives the coordinates of each base station. Hence, given AMS and $CTP_{AMS}$, the geometric coordinates of the frequent base stations in each cluster of $CTP_{AMS}$ can be derived along with their counts and represented as $(t_1, x_1, y_1, w_1), (t_2, x_2, y_2, w_2), \ldots, (t_n, x_n, y_n, w_n)$, where $t_i$ is the corresponding time slot, $x_i$ (respectively, $y_i$) is the x-coordinate (respectively, y-coordinate) of the base station, and $w_i$ is the number of phone calls the mobile user has made at this base station. A time slot with several frequent base stations contributes one data point per base station, so the $t_i$ need not be distinct. Weighted regression analysis is then applied to the data points of each cluster of $CTP_{AMS}$ to derive the corresponding confidence movement function.
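The conversion from symbolic base-station IDs to weighted data points can be sketched in Python as follows. This is an illustration under assumptions: AMS is encoded as a list with one dictionary per time slot (an empty dictionary standing for φ), the map table is a dictionary from base-station ID to (x, y), and the function name `weighted_data_points` is hypothetical. The data are those of the worked example in this section.

```python
from typing import Dict, List, Sequence, Tuple

# One dict per time slot of AMS, mapping base-station ID -> appearance count.
AggregationSequence = List[Dict[str, int]]


def weighted_data_points(ams: AggregationSequence, cluster: Sequence[int],
                         map_table: Dict[str, Tuple[float, float]]
                         ) -> List[Tuple[int, float, float, int]]:
    """Return one (t, x, y, w) tuple per frequent base station in the cluster's slots."""
    points = []
    for t in cluster:                          # time slots are 1-based, as in the text
        for station, count in ams[t - 1].items():
            x, y = map_table[station]          # symbolic ID -> geometric coordinates
            points.append((t, x, y, count))
    return points


ams = [{"A": 16, "B": 1}, {"A": 1}, {}, {"D": 2, "F": 3}, {"H": 2}]
coords = {"A": (1, 1), "B": (1, 2), "D": (4, 2), "F": (3, 3), "H": (5, 3)}
points = weighted_data_points(ams, [1, 2, 4, 5], coords)
```

Time slot 1 yields two data points (A and B), and slot 3 is skipped because it is not in the cluster.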

Given a set of data points, the goal of regression analysis is to derive the curve that minimizes the sum of squared errors [26]. The aggregation movement sequence generated in Step 1 records the appearance counts of base stations. Using these counts as weights, regression derives curves that lie closer to base stations with larger appearance counts. This is desirable because the more calls a user makes at a base station, the more confident we are that the user frequently appears in the coverage area of that base station.

Weighted regression analysis has a further advantage. In real mobile computing systems, the base station serving a user is not always the nearest one, because neighboring base stations take over calls when the nearest base station is overloaded. Since overloading occurs only occasionally, these neighboring base stations accumulate lower appearance counts than the nearest base station, so weighted regression keeps the derived curve close to the base stations with higher appearance counts.

Given the data points within a cluster, the derivation of $\hat{x}(t)$ is described below; $\hat{y}(t)$ is derived in the same way. An m-degree polynomial function $\hat{x}(t) = a_0 + a_1t + \cdots + a_mt^m$ is derived to approximate the movement behavior along the x-coordinate axis. Given the data points $(t_1, x_1, y_1, w_1), (t_2, x_2, y_2, w_2), \ldots, (t_n, x_n, y_n, w_n)$, the regression coefficients $\{a_0, a_1, \ldots, a_m\}$ are selected to minimize the weighted residual sum of squares

$$\epsilon_x^2 = \sum_{i=1}^{n} w_i e_i^2, \quad \text{where } e_i = x_i - (a_0 + a_1 t_i + a_2 t_i^2 + \cdots + a_m t_i^m).$$

The value of m is application dependent and must be smaller than the number of data points; for the coefficients to be unique, the data points must also span at least m + 1 distinct time slots.
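The objective $\epsilon_x^2$ can be evaluated directly, which is handy for checking a fitted curve. The sketch below assumes data points given as (t, x, y, w) tuples; `weighted_rss` is a hypothetical helper name. The coefficients 7/3, −32/15, 13/15 and −1/15 are the unrounded values behind the cubic $\hat{x}(t)$ of the worked example in this section, which the text reports as 2.333, −2.133, 0.867 and −0.066.

```python
from typing import Sequence, Tuple


def weighted_rss(coeffs: Sequence[float],
                 points: Sequence[Tuple[float, float, float, float]]) -> float:
    """epsilon_x^2 = sum_i w_i * e_i^2 with e_i = x_i - (a_0 + a_1 t_i + ... + a_m t_i^m)."""
    total = 0.0
    for t, x, _y, w in points:
        e = x - sum(a * t ** k for k, a in enumerate(coeffs))
        total += w * e * e
    return total


points = [(1, 1, 1, 16), (1, 1, 2, 1), (2, 1, 1, 1),
          (4, 4, 2, 2), (4, 3, 3, 3), (5, 5, 3, 2)]
coeffs_x = [7 / 3, -32 / 15, 13 / 15, -1 / 15]  # unrounded cubic x_hat(t) of the example
# Only D and F at t = 4 leave residuals: 2 * 0.6^2 + 3 * 0.4^2, about 1.2
rss = weighted_rss(coeffs_x, points)
```

Because these coefficients minimize $\epsilon_x^2$, any perturbation of them (for example, shifting $a_0$ by 0.01) yields a larger value.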

A larger m lets the fitting curve follow the data points more precisely. Since $\hat{x}(t)$ is obtained by matrix operations, the matrix size is the dominant factor in regression performance. In practice, however, this cost is not significant, because the maximal value of m is usually small and the regression then runs in acceptable time. Therefore, the value of m should be set as large as the number of available data points allows.

Table 3.5: Data points with their corresponding weights

t_i   ID   x_i   y_i   w_i
1     A    1     1     16
1     B    1     2     1
2     A    1     1     1
4     D    4     2     2
4     F    3     3     3
5     H    5     3     2

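For ease of presentation, the following terms are defined:

$$H = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^m \\ 1 & t_2 & t_2^2 & \cdots & t_2^m \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^m \end{pmatrix}, \quad W = \mathrm{diag}(w_1, w_2, \ldots, w_n), \quad X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad a = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{pmatrix}.$$

With these terms, $\epsilon_x^2 = (\sqrt{W}X - \sqrt{W}Ha)^T(\sqrt{W}X - \sqrt{W}Ha)$. The coefficient vector $a^*$ that minimizes $\epsilon_x^2$ is obtained by solving the normal equation

$$(\sqrt{W}H)^T(\sqrt{W}H)\,a^* = (\sqrt{W}H)^T\sqrt{W}X,$$

that is, $H^T W H\,a^* = H^T W X$.³ The coefficients of $\hat{y}(t)$ are obtained in the same way with $Y = (y_1, y_2, \ldots, y_n)^T$ in place of $X$. As a result, for each cluster of $CTP_{AMS}$, the confidence movement function $E_i(t) = (\hat{x}(t), \hat{y}(t), [t_1, t_n])$ of a mobile user can be derived.

³ For the proof, see Appendix A.

For example, let AMS = <{A : 16, B : 1}, {A : 1}, φ, {D : 2, F : 3}, {H : 2}> and let the coordinates of A, B, D, F and H be (1, 1), (1, 2), (4, 2), (3, 3) and (5, 3), respectively. Given AMS and $CTP_{AMS} = \langle 1, 2, 4, 5 \rangle$, the data points and weights shown in Table 3.5 are obtained. Setting m to 3, the cubic polynomial $\hat{x}(t) = a_0 + a_1t + a_2t^2 + a_3t^3$ is derived, whose coefficients $a^* = (a_0\ a_1\ a_2\ a_3)^T$ are determined by minimizing the weighted residual sum of squares. Since the six data points have time slots 1, 1, 2, 4, 4 and 5,

$$H = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 4 & 16 & 64 \\ 1 & 4 & 16 & 64 \\ 1 & 5 & 25 & 125 \end{pmatrix}.$$

The weights of the data points are 16, 1, 1, 2, 3 and 2, respectively, so $\sqrt{W}$ is a diagonal matrix with diagonal entries $\sqrt{16}, \sqrt{1}, \sqrt{1}, \sqrt{2}, \sqrt{3}$ and $\sqrt{2}$. Solving the normal equation yields $\hat{x}(t) = 2.333 - 2.133t + 0.867t^2 - 0.066t^3$ and, in the same way, $\hat{y}(t) = 2.529 - 2.386t + 1.021t^2 - 0.105t^3$. Figure 3.3 shows the resulting confidence movement function: each circle marks the location of a base station together with its weight, and the solid line is the curve derived by algorithm MF. The curve is pulled toward heavily weighted base stations. At time slot 1, $\hat{y}(1) \approx 1.06$ lies much closer to A (y = 1, weight 16) than to B (y = 2, weight 1), and at time slot 4, $\hat{x}(4) \approx 3.4$ lies closer to F (x = 3, weight 3) than to D (x = 4, weight 2). This illustrates how weighted regression captures the frequent movement behavior of a mobile user.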
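The normal equation can be solved in a few lines of Python. This sketch is an assumption, not the thesis's implementation: it forms $H^T W H$ and $H^T W X$ directly (since $(\sqrt{W}H)^T(\sqrt{W}H) = H^T W H$ and $(\sqrt{W}H)^T\sqrt{W}X = H^T W X$) and solves the system by Gaussian elimination with partial pivoting; the function names are hypothetical. Run on the data of Table 3.5 with m = 3, it reproduces the coefficients of $\hat{x}(t)$ and $\hat{y}(t)$ in the example.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (modifies a, b)."""
    n = len(b)
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / a[r][r]
    return x


def weighted_polyfit(ts: Sequence[float], vs: Sequence[float],
                     ws: Sequence[float], m: int) -> List[float]:
    """Return [a_0, ..., a_m] solving H^T W H a = H^T W v."""
    h = [[float(t) ** k for k in range(m + 1)] for t in ts]  # rows [1, t, ..., t^m]
    hwh = [[sum(w * row[i] * row[j] for row, w in zip(h, ws))
            for j in range(m + 1)] for i in range(m + 1)]
    hwv = [sum(w * row[i] * v for row, w, v in zip(h, ws, vs))
           for i in range(m + 1)]
    return solve_linear(hwh, hwv)


# Data points (t, x, y, w) of Table 3.5
points = [(1, 1, 1, 16), (1, 1, 2, 1), (2, 1, 1, 1),
          (4, 4, 2, 2), (4, 3, 3, 3), (5, 5, 3, 2)]
ts = [p[0] for p in points]
ws = [p[3] for p in points]
coef_x = weighted_polyfit(ts, [p[1] for p in points], ws, m=3)
coef_y = weighted_polyfit(ts, [p[2] for p in points], ws, m=3)
print([round(c, 3) for c in coef_x])
print([round(c, 3) for c in coef_y])
```

The printed coefficients are [2.333, -2.133, 0.867, -0.067] and [2.529, -2.386, 1.021, -0.105]; $E_1(t)$ later in this section lists the last x coefficient as −0.066, which is −1/15 truncated rather than rounded. For larger m or widely spread time slots, forming the normal equation squares the condition number of $\sqrt{W}H$, so a QR-based least-squares solver would be the more robust choice.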

Algorithm 5: Algorithm MF

Input: AMS and clustered time projection sequence CTP_AMS = <CL_1, CL_2, ..., CL_k>
Output: A list of movement functions F(t) = <U_0(t), E_1(t), U_1(t), E_2(t), ..., E_k(t), U_k(t)>

1: begin
2:   F(t) = <>;
3:   for i = 1 to k − 1 do
4:   begin
5:     do regression on CL_i to generate E_i(t);
6:     do regression on CL_{i+1} to generate E_{i+1}(t);
7:     t1 = the last time slot in CL_i;
8:     t2 = the first time slot in CL_{i+1};
9:     use inner interpolation to generate U_i(t) = (x̂_i(t), ŷ_i(t), (t1, t2));
10:    append E_i(t) (only when i = 1), U_i(t) and E_{i+1}(t) to F(t);
11:  end
12:  if 1 ∉ CL_1 then
13:    generate U_0(t) and insert U_0(t) at the head of F(t);
14:  if ε ∉ CL_k then   // ε: the last time slot of AMS
15:    generate U_k(t) and insert U_k(t) at the tail of F(t);
16: end

Figure 3.3: An illustrative example of deriving confidence movement functions.

Deriving Linkage Movement Functions

Given AMS and the clustered time projection sequence $CTP_{AMS} = \langle CL_1, CL_2, \ldots, CL_k \rangle$, algorithm MF generates the complete movement function $F(t) = \langle U_0(t), E_1(t), U_1(t), E_2(t), \ldots, E_k(t), U_k(t) \rangle$, where $E_i(t)$ is the confidence movement function of cluster $CL_i$ and $U_i(t)$ is the linkage movement function from $E_i(t)$ to $E_{i+1}(t)$. In lines 5 and 6 of algorithm MF, the confidence movement function of each cluster is derived with the weighted regression method described above. However, the first time slot may not be included in $CL_1$. If $t_0$ is the first time slot of $CL_1$ and $t_0 \neq 1$, the boundary function $U_0(t) = \{E_1(t_0), [1, t_0)\}$ is generated, which keeps the user at the location $E_1(t_0)$ during $[1, t_0)$. Otherwise, $U_0(t)$ is not included in $F(t)$. The same applies to $U_k(t)$ at the end of the sequence. Each linkage movement function is calculated by interpolation (line 9 of algorithm MF).
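The order in which algorithm MF assembles $F(t)$, including the boundary functions $U_0(t)$ and $U_k(t)$, can be sketched in Python without the regression and interpolation themselves. The function below is hypothetical and only lists the name and validity interval of each piece. It assumes that ε, passed as `last_slot`, is the last time slot of AMS, and that $U_k(t)$ is valid on $(t_e, ε]$, where $t_e$ is the last time slot of $CL_k$; both are assumptions, not definitions stated in this section.

```python
from typing import List, Sequence, Tuple


def movement_function_layout(clusters: Sequence[Sequence[int]],
                             last_slot: int) -> List[Tuple[str, str]]:
    """List (name, validity interval) for the pieces of F(t), in order."""
    k = len(clusters)
    layout: List[Tuple[str, str]] = []
    for i, cl in enumerate(clusters, start=1):
        # E_i(t) is valid from the first to the last time slot of CL_i.
        layout.append((f"E{i}", f"[{cl[0]}, {cl[-1]}]"))
        if i < k:
            # U_i(t) links the end of CL_i to the start of CL_{i+1} (open interval).
            layout.append((f"U{i}", f"({cl[-1]}, {clusters[i][0]})"))
    if 1 not in clusters[0]:
        # Boundary function before the first cluster: U_0(t) on [1, t_0).
        layout.insert(0, ("U0", f"[1, {clusters[0][0]})"))
    if last_slot not in clusters[-1]:
        # Boundary function after the last cluster (interval form assumed).
        layout.append((f"U{k}", f"({clusters[-1][-1]}, {last_slot}]"))
    return layout
```

With the clusters ⟨1, 2, 4, 5⟩ and ⟨7, 9, 10⟩ and a hypothetical last slot of 12, the layout is E1, U1, E2, U2, matching the shape of $F(t)$ in the example that follows; the single-cluster case k = 1 is also covered.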

For example, assume that $CTP_{AMS} = \langle \langle 1, 2, 4, 5 \rangle, \langle 7, 9, 10 \rangle \rangle$, $E_1(t) = (2.333 - 2.133t + 0.867t^2 - 0.066t^3,\ 2.529 - 2.386t + 1.021t^2 - 0.105t^3,\ [1, 5])$ and $E_2(t) = (6 + 1.17t - 0.16t^2,\ 3 + 0t + 0t^2,\ [7, 10])$. The first time slot of cluster ⟨1, 2, 4, 5⟩ is 1, so $U_0(t)$ is not needed. The last time slot of ⟨1, 2, 4, 5⟩ is 5 and the first time slot of cluster ⟨7, 9, 10⟩ is 7, so a linkage movement function is generated by inner interpolation. From $E_1(5)$, we obtain the data point (x = 5.09, y = 3), and from $E_2(7)$, the data point (x = 6.35, y = 3). By inner interpolation,

$$U_1(t) = \left(1.94 + \frac{6.35 - 5.09}{7 - 5}\,t,\ 3 + \frac{3 - 3}{7 - 5}\,t,\ (5, 7)\right) = (1.94 + 0.63t,\ 3,\ (5, 7)),$$

where the intercept $1.94 = 5.09 - 5 \cdot 0.63$ makes $U_1(5)$ agree with $E_1(5)$. Similarly, $U_2(t)$ can be determined. After the confidence and linkage movement functions are obtained, $F(t) = \langle E_1(t), U_1(t), E_2(t), U_2(t) \rangle$ is derived. Figure 3.4 shows a snapshot of $F(t)$. When $F(t)$ is used to predict the location of a mobile user, only the function of $F(t)$ whose time interval includes the given time $t$ is used.
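The inner interpolation in the example is a straight line between $E_1(5)$ and $E_2(7)$. The sketch below assumes linear interpolation between the two endpoints, as in the example, and uses the hypothetical helper names `poly` and `linear_linkage`; it recomputes $U_1(t)$ in intercept/slope form from the unrounded endpoint values.

```python
from typing import Callable, Sequence, Tuple


def poly(coeffs: Sequence[float]) -> Callable[[float], float]:
    """Polynomial a_0 + a_1 t + ... as a function of t."""
    return lambda t: sum(a * t ** k for k, a in enumerate(coeffs))


def linear_linkage(start: Tuple[float, float], end: Tuple[float, float],
                   t1: float, t2: float) -> Tuple[Tuple[float, float], ...]:
    """(intercept, slope) per coordinate of the line from `start` at t1 to `end` at t2."""
    pieces = []
    for a, b in zip(start, end):
        slope = (b - a) / (t2 - t1)
        pieces.append((a - slope * t1, slope))  # the line passes through (t1, a)
    return tuple(pieces)


e1_x = poly([2.333, -2.133, 0.867, -0.066])
e1_y = poly([2.529, -2.386, 1.021, -0.105])
e2_x = poly([6, 1.17, -0.16])
e2_y = poly([3, 0, 0])

t1, t2 = 5, 7  # last slot of CL_1 and first slot of CL_2
ux, uy = linear_linkage((e1_x(t1), e1_y(t1)), (e2_x(t2), e2_y(t2)), t1, t2)
```

This yields $U_1(t) \approx (1.95 + 0.63t,\ 3.00 + 0.0005t)$; the intercept 1.94 in the example comes from rounding $E_1(5)$ to 5.09 before interpolating.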

For $F(t) = \langle E_1(t), U_1(t), E_2(t), U_2(t) \rangle$, when the time is 4, only $E_1(t)$ is used to predict the location, since time 4 lies within the time interval $[1, 5]$ of $E_1(t)$.
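Prediction with $F(t)$ then reduces to finding the one piece whose interval contains $t$. In this hypothetical sketch each piece carries its interval bounds with open/closed flags (closed for $E_i$, open for $U_i$, as in the example), and $U_2(t)$ is left out because its coefficients are not given.

```python
from typing import Callable, List, Optional, Tuple

# (name, lower bound, upper bound, lower closed?, upper closed?, x(t), y(t))
Piece = Tuple[str, float, float, bool, bool,
              Callable[[float], float], Callable[[float], float]]


def poly(coeffs: List[float]) -> Callable[[float], float]:
    """Polynomial a_0 + a_1 t + ... as a function of t."""
    return lambda t: sum(a * t ** k for k, a in enumerate(coeffs))


def locate(f: List[Piece], t: float) -> Optional[Tuple[str, Tuple[float, float]]]:
    """Predict (x, y) at time t with the piece of F(t) whose interval contains t."""
    for name, lo, hi, lo_closed, hi_closed, fx, fy in f:
        after_lo = lo <= t if lo_closed else lo < t
        before_hi = t <= hi if hi_closed else t < hi
        if after_lo and before_hi:
            return name, (fx(t), fy(t))
    return None  # t is not covered by any piece of F(t)


F: List[Piece] = [
    ("E1", 1, 5, True, True, poly([2.333, -2.133, 0.867, -0.066]),
     poly([2.529, -2.386, 1.021, -0.105])),
    ("U1", 5, 7, False, False, poly([1.94, 0.63]), poly([3.0])),
    ("E2", 7, 10, True, True, poly([6, 1.17, -0.16]), poly([3, 0, 0])),
]
```

`locate(F, 4)` selects E1 and returns approximately (3.449, 2.601), while `locate(F, 6)` falls in the gap between the clusters and uses U1.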

Time Complexity Analysis: Algorithm MF has polynomial time complexity. When the maximal row/column size of the matrices is n, solving the normal equation with Strassen's algorithm takes $\Theta(n^{\lg 7})$ time [23]. Interpolation by Lagrange's formula requires $\Theta(q^2)$ time, where q is the number of points involved in the interpolation [23]. Since n is usually larger than q, the $\Theta(n^{\lg 7})$ term dominates the complexity of algorithm MF.
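Lagrange's formula, cited above for the interpolation cost, can be written as a double loop over the interpolation points, which makes the quadratic cost visible. This is a plain sketch with a hypothetical function name, where q denotes the number of interpolation points; with the two endpoints of the linkage example it reproduces $U_1(6) = 1.94 + 0.63 \cdot 6 = 5.72$.

```python
from typing import Sequence, Tuple


def lagrange_eval(points: Sequence[Tuple[float, float]], t: float) -> float:
    """Value at t of the polynomial through the q given (t_j, v_j) points."""
    total = 0.0
    for j, (tj, vj) in enumerate(points):          # q basis terms ...
        basis = 1.0
        for k, (tk, _vk) in enumerate(points):     # ... each a product of q - 1 factors
            if k != j:
                basis *= (t - tk) / (tj - tk)
        total += vj * basis
    return total


# Endpoints of the linkage example: E_1(5) rounded to x = 5.09, E_2(7) gives x = 6.35
x_at_6 = lagrange_eval([(5, 5.09), (7, 6.35)], 6)
```

With two points the formula reduces to the linear inner interpolation of the example; more points give a higher-degree interpolating polynomial at Θ(q²) cost per evaluation.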

3.2.5 Estimating A User’s Location Based on a Movement

From the document: A Study on Trajectory Pattern Mining and Its Applications (pp. 65-71)