
Multi-Granularity Moving Pattern Analysis

3.5 Warehouse Operations and Queries

After two stages of processing, the extracted moving patterns are stored in table moving pattern of the trajectory data warehouse. The granularity of the grids used to generate the patterns is recorded in table grid unit, and the time period information in table time grid. These related attributes are retrieved together in information queries and moving pattern operations.

Moving patterns have spatiotemporal properties, and focusing pattern analysis on the spatial or temporal domain helps in presenting information. We therefore first discuss the spatial and temporal operations designed for retrieving related moving patterns in our trajectory data warehouse. Next, queries often request summarizations on one or more dimensions over value ranges larger than the basic units, and maintenance works by further condensing aged moving patterns into larger granularity grids. Aggregation of patterns is designed in response to both needs. Afterward, we propose a process to deal with the distinct trajectory estimation problem caused by the dividing stage of moving pattern extraction. The distinct estimation is done by recording some additional attributes during the pattern extraction processes: in table flow trace, records keep track of the number of trajectories counted in duplicate between patterns. Finally, we discuss several types of moving pattern queries, which work with both the ordinary and the specially designed operations that our trajectory data warehouse provides.

3.5.1 Spatial and Temporal Operations

In a query, we usually limit the value ranges of some attributes to get the wanted information. In moving pattern query and analysis, a geographical area or a time period is often the target: we are interested in finding patterns related to a region or a time period. We thus provide several pattern query operations on the spatial and temporal domains.

Here we define the operations in the spatial domain, focusing on where patterns lie and which areas they pass. First, we deal with the operations related to the start and end points of moving patterns.

Operation 3.1 (Start from, End in). A moving pattern PTN starts from an area A if its start point is inside A. Similarly, PTN ends in A if its end point is inside A.

The query target area A is determined by four points {(x_s, y_s), (x_e, y_s), (x_s, y_e), (x_e, y_e)}, where x_s < x_e and y_s < y_e, as area A in Fig. 3.4. Assume PTN has start point (x_a, y_a) and end point (x_b, y_b). Then PTN starts from A if

x_s ≤ x_a < x_e and y_s ≤ y_a < y_e,

and PTN ends in A if

x_s ≤ x_b < x_e and y_s ≤ y_b < y_e.

With the start from and end in operations, we can find the patterns that begin and finish in a region, as well as the changes and connections among patterns in an area.

Next, a PTN might lie entirely inside an area.

Operation 3.2 (Lay on). A moving pattern PTN lays on an area A if it both starts from and ends in A. Meanwhile, a point P lays on A if it is inside A. Thus, we can redefine PTN starts from A as its start point lays on A, and PTN ends in A as its end point lays on A. Analyzing the patterns that lay on a larger area helps emphasize moving behaviors inside a region and distinguish them from those that cross region boundaries.
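Operations 3.1 and 3.2 reduce to a point-in-rectangle test on the end points of a pattern. The following is a minimal sketch under that reading; the class and function names (`Area`, `Pattern`, `starts_from`, `ends_in`, `lays_on`) are illustrative assumptions and do not come from the warehouse schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Axis-aligned query area with x_s < x_e and y_s < y_e (hypothetical type)."""
    x_s: float
    y_s: float
    x_e: float
    y_e: float

    def contains(self, x: float, y: float) -> bool:
        # Half-open test from Operation 3.1: x_s <= x < x_e and y_s <= y < y_e.
        return self.x_s <= x < self.x_e and self.y_s <= y < self.y_e


@dataclass(frozen=True)
class Pattern:
    """Moving pattern reduced to start point (x_a, y_a) and end point (x_b, y_b)."""
    x_a: float
    y_a: float
    x_b: float
    y_b: float


def starts_from(p: Pattern, a: Area) -> bool:
    return a.contains(p.x_a, p.y_a)


def ends_in(p: Pattern, a: Area) -> bool:
    return a.contains(p.x_b, p.y_b)


def lays_on(p: Pattern, a: Area) -> bool:
    # Operation 3.2: the pattern both starts from and ends in the area.
    return starts_from(p, a) and ends_in(p, a)
```

Because the test is half-open, a point on the lower or left boundary counts as inside while a point on the upper or right boundary does not, so adjacent grids never both claim the same point.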

Moreover, we are interested in moving patterns related to an area even when their attributes show no direct connection to it. A moving pattern might neither start from nor end in an area, but pass through it. Pattern analysis of an area would not be complete without finding the patterns that pass through, and we reveal them via an operation.

Operation 3.3 (Pass through). A moving pattern PTN passes through an area A if PTN intersects exactly two boundaries of A.

The pass through definition excludes the start from and end in cases, which intersect at most one boundary. While the idea that a PTN intersects exactly two boundaries of A is simple, checking the intersection with each area boundary in turn is computationally expensive. We analyzed the properties and found that it is more efficient to work with the diagonal lines of the area and their slopes. Thus, we redefine PTN passes through A as PTN intersecting one of the two diagonal lines of A.

When a pattern neither starts from nor ends in A, we check for its intersection with A. First we define the diagonal lines L₊ and L₋ of A as

L₊ : ((x_s, y_s), (x_e, y_e)), with positive slope, and
L₋ : ((x_e, y_s), (x_s, y_e)), with negative slope.

Next, we calculate the slope of the PTN line L_{PTN} as

Slope_{PTN} = Δy / Δx = (y_b − y_a) / (x_b − x_a).

When Slope_{PTN} > 0, we look for the intersecting point of L_{PTN} and L₋; otherwise we look for the intersection of L_{PTN} and L₊. Then PTN passes through A if the intersecting point (x_c, y_c) satisfies

x_s < x_c < x_e and y_s < y_c < y_e and x_a < x_c < x_b and y_a < y_c < y_b.
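The diagonal test can be sketched as follows. This is an illustrative version, not the implementation used in the warehouse: the function name `passes_through` and the tuple layout are assumptions. It also replaces the explicit slope and the coordinate comparisons with a parametric segment intersection, which is an assumption made here so that vertical patterns and patterns running in the negative x or y direction are handled by the same check.

```python
def passes_through(area, start, end):
    """area = (x_s, y_s, x_e, y_e); start = (x_a, y_a); end = (x_b, y_b)."""
    x_s, y_s, x_e, y_e = area
    (x_a, y_a), (x_b, y_b) = start, end

    def inside(x, y):
        return x_s <= x < x_e and y_s <= y < y_e

    # Start from and end in cases are excluded by the definition.
    if inside(x_a, y_a) or inside(x_b, y_b):
        return False

    dx, dy = x_b - x_a, y_b - y_a
    # Positive slope: test against L- ; otherwise test against L+.
    if dx * dy > 0:
        (px, py), (qx, qy) = (x_e, y_s), (x_s, y_e)  # L-
    else:
        (px, py), (qx, qy) = (x_s, y_s), (x_e, y_e)  # L+
    ex, ey = qx - px, qy - py

    denom = dx * ey - dy * ex
    if denom == 0:  # zero-length pattern, no direction to test
        return False

    # Solve start + t * (dx, dy) == p + u * (ex, ey).
    t = ((px - x_a) * ey - (py - y_a) * ex) / denom
    u = ((px - x_a) * dy - (py - y_a) * dx) / denom
    # 0 < u < 1: the crossing lies strictly inside the area.
    # 0 < t < 1: the crossing lies strictly between the pattern end points.
    return 0 < t < 1 and 0 < u < 1
```

Only one diagonal is tested per pattern, so the cost is a single line intersection instead of up to four boundary checks.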

The spatial operations on moving patterns described above are all illustrated in Fig. 3.4, where the moving pattern starts from area B, ends in area D, lays on the larger area E and passes through area C.

Besides the spatial domain, we are also interested in querying for patterns in a time period. In the temporal domain, since we store moving patterns period by period, patterns are mapped to the time grids they belong to through their recorded T_IDs.

Figure 3.4: Illustration of spatial domain operations.

Operation 3.4 (Happen during). A moving pattern PTN happens during a time period T if its corresponding time grid T_ID starts and ends in T. We define the target period T as starting at t_i and ending at t_j, and the valid period of PTN, looked up with T_ID, as (t_s, t_e). PTN happens during T when

t_i ≤ t_s < t_e < t_j.

The period T during which patterns happen may also be a periodic range of time, say 7-9 AM every day or every Sunday. The time period selection operation supports such pattern queries.
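The temporal check can be sketched with the standard `datetime` module. The function names are illustrative assumptions, and the periodic variant covers only one kind of periodic range (a daily time-of-day window); it assumes a time grid does not span midnight.

```python
from datetime import datetime, time


def happens_during(t_s, t_e, t_i, t_j):
    """Operation 3.4 as stated: t_i <= t_s < t_e < t_j."""
    return t_i <= t_s < t_e < t_j


def happens_daily_between(t_s, t_e, day_start, day_end):
    """Periodic variant: the time grid lies within a daily window such as 7-9 AM."""
    same_day = t_s.date() == t_e.date()
    return same_day and happens_during(t_s.time(), t_e.time(), day_start, day_end)
```

For example, a time grid covering 7:00 to 8:00 on any date satisfies `happens_daily_between(t_s, t_e, time(7, 0), time(9, 0))`.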

3.5.2 Aggregation and Warehouse Maintenance

The moving patterns stored in the trajectory data warehouse are extracted with regular-sized grids. However, users often query for attributes over ranges larger than the basic granularity in one or more dimensions. Such query results are produced by the roll-up operation, which selects the related patterns from multiple unit grids according to the value requested on each dimension. The slice and dice operations select patterns while limiting only one dimension and two or more dimensions, respectively. These operations mainly group the patterns from multiple grids that meet the required ranges in all dimensions. Since a pattern is extracted based on a unit grid, gd_u is also the minimum granularity of these operations. When users require not only a list of matching patterns but also summarized information about them, an aggregation over the patterns needs to be computed.

Operation 3.5 (Aggregation). An aggregation of moving patterns finds the new representation line for the selected patterns. The start and end positions, time span, speed, direction and group amount are renewed as the original groups G are combined and a new group G′ is generated.

The new pattern is mainly decided by weighting with the old group amount |G| of each pattern. The new moving pattern L_{G′} generated in aggregation is computed by using each original group amount as the weighting parameter to find the new group representation, following (3.6) and (3.7). The new group amount is |G′| = Σ |G|.

c_{L_G′} = the number of lines in an original group G.

p_{L_G′ s} = c_{L_G′} − l_L
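As an illustration of weighting by group amount, the sketch below combines patterns with an amount-weighted mean of their end points and speeds. This is an assumption made for illustration only: it is not a transcription of (3.6) and (3.7), and the function name `aggregate` and the dictionary keys are hypothetical. The only part taken directly from the text is that the new group amount is the sum of the original group amounts.

```python
def aggregate(patterns):
    """Combine patterns given as dicts with keys start, end, speed, amount."""
    total = sum(p["amount"] for p in patterns)
    if total <= 0:
        raise ValueError("at least one pattern with a positive amount is required")

    def weighted(values):
        # Mean of the values, each weighted by its pattern's group amount |G|.
        return sum(v * p["amount"] for v, p in zip(values, patterns)) / total

    return {
        "start": (weighted([p["start"][0] for p in patterns]),
                  weighted([p["start"][1] for p in patterns])),
        "end": (weighted([p["end"][0] for p in patterns]),
                weighted([p["end"][1] for p in patterns])),
        "speed": weighted([p["speed"] for p in patterns]),
        "amount": total,  # new group amount: sum of the original |G|
    }
```

With this weighting, a pattern backed by three lines pulls the aggregated representation three times as strongly as a pattern backed by one line.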

For the trajectory data warehouse to run over the long term, we need a historical pattern maintenance method. The maintenance should efficiently reduce the required storage space while still keeping the main information of aged patterns.

First, the design of hierarchical granularity levels helps warehouse maintenance. Several hierarchies of granularity are defined for patterns of different ages, usually focusing on spatial area and time range, and sometimes on distinguishing direction, length or speed. For example, vehicle moving patterns may first be analyzed in grids of 0.5 × 0.5 km² per hour. After a month, the stored patterns may be aggregated into grids of 1 × 1 km² per three hours. A coarser granularity should always be defined as an integer multiple of the finest level, so that the aggregation operation can be processed. When applying the spatial and temporal operations defined earlier in this section, the unit spatial area of a coarser granularity should always be contiguous, while the granularity of the time period might become a periodic time set, such as 9-11 AM of every day in a month. The grid granularities used for maintaining data of different ages are recorded in table grid unit. The maintenance is then periodically triggered to aggregate the aged patterns according to the new gd_u and compute new summarized representation lines. After the reanalysis is done, the old patterns along with their corresponding time periods are removed, and the newly generated patterns with a combined time period of the higher level are stored, in both table moving pattern and table time grid.
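The integer-multiple requirement makes the mapping from fine to coarse grids a simple integer division. The sketch below assumes, for illustration, that a grid is indexed by (column, row, hour) counted from a common origin; the function names and the index layout are hypothetical, and the periodic time sets mentioned above are not covered.

```python
from collections import defaultdict


def coarse_cell(cell, factor):
    """Map a fine grid index (col, row, hour) to the coarser grid containing it.

    factor holds the integer multiple per dimension, e.g. (2, 2, 3) when
    0.5 km x 0.5 km x 1 h grids are rolled up to 1 km x 1 km x 3 h grids.
    """
    if any(not isinstance(f, int) or f < 1 for f in factor):
        raise ValueError("coarser granularity must be an integer multiple")
    return tuple(c // f for c, f in zip(cell, factor))


def group_for_maintenance(patterns, factor):
    """Group aged (cell, pattern) pairs by coarse grid, ready for aggregation."""
    groups = defaultdict(list)
    for cell, pattern in patterns:
        groups[coarse_cell(cell, factor)].append(pattern)
    return dict(groups)
```

Each resulting group would then be passed to the aggregation operation to produce the summarized representation line stored for the coarser grid.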

When the data warehouse is maintained with the aggregation operation, the minimum group size th_amt may be increased as the gd_u used for generating patterns is enlarged. Thus, the information loss caused by pattern generation increases in these re-aggregated aged patterns. This is a trade-off between the amount of data to be stored and the level of detail of the information kept in the long run.

3.5.3 Distinct Trajectory Estimation

Moving patterns kept in the data warehouse originate from trajectory data sets. At times, we would like to know the number of distinct trajectories passing an area. However, different moving patterns in an area may include replacement lines that were originally consecutive parts of one trajectory. Thus, counting the distinct trajectories related to an area requires an additional process beyond simply summing the member amounts of all patterns in the area. To deal with the distinct trajectory estimation problem without keeping an identity list of the members of each pattern, we introduce table flow trace to record flow information between groups.

As discussed in the earlier section, when dividing trajectories, only the latest checked data points of each trajectory and the extracted replacement lines of the current period are kept in the temporary tables. Based on this limited information, an additional attribute, the previous line ID L_pID, is introduced to table divide check. This attribute registers the ID of the replacement line L_R generated from a trajectory. When the record TR_i is initialized in Algorithm 3.1, L_pID is set to NULL. When an L_R is decided to be output, the existing L_pID is also transferred into table replacement line. The ID of the just generated L_R is then recorded back in L_pID of table divide check as the attributes in record TR_i are reset.

In table replacement line, two additional attributes, the belonging group ID, G_ID, and the next-line group ID, G_nID, are added and initialized as NULL. When an L_R is included in a group G and the corresponding pattern is generated, the G_ID is updated in record L_R. According to the L_pID registered with L_R, the G_nID of its previous line is updated as well. After pattern extraction through Algorithm 3.2 is finished, the G_ID and G_nID pairs in table replacement line are summarized with Algorithm 3.3. The pairs without NULL values are processed and the flow amount of each distinct pair is calculated. The distinct G_ID and G_nID pairs are recorded into table flow trace as from and to group IDs, along with their amounts, |F|, for later use in distinct estimation.

Algorithm 3.3 Trajectory flow amount trace

Input: a set of attribute pairs {G_ID, G_nID} from table replacement line
Output: a set of flow amount records for table flow trace

for each record in table replacement line do
    if record G_ID = NULL or G_nID = NULL then
        continue to check next record
    else if record {G_ID, G_nID} pair does not exist then
        Add record {G_ID, G_nID} pair
        Initialize flow amount |F| = 1
    else
        Update record {G_ID, G_nID} pair, |F| = |F| + 1
    end if
end for
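Algorithm 3.3 amounts to counting the occurrences of each complete (G_ID, G_nID) pair. A minimal sketch, assuming the pairs are read from table replacement line as Python tuples with `None` standing for NULL; the function name `trace_flow` is hypothetical.

```python
from collections import Counter


def trace_flow(pairs):
    """Count the flow amount |F| for each (G_ID, G_nID) pair, skipping NULLs."""
    flow = Counter()
    for g_id, g_nid in pairs:
        if g_id is None or g_nid is None:
            continue  # line not grouped, or no grouped next line
        # Counter starts missing keys at 0, which covers both the
        # "add record" and the "update record" branches of the algorithm.
        flow[(g_id, g_nid)] += 1
    return flow
```

The loop visits each replacement line once, so the pass is linear in the number of lines.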

Operation 3.6 (Distinct flow amount). The distinct flow amount AMT in an area A indicates the number of distinct trajectories that start from, end in, or pass through A. It can be seen as the total group amount, |G|, minus the number of items whose next lines also move in the area, which cause the same trajectories to be counted in multiple groups.

The AMT in area A can be estimated as the summation of the group amounts, |G|, from table moving pattern, minus the summation of the flow amounts, |F|, from table flow trace for which both the from and to groups are included in the query results. This can be described with the following equation:

AMT = Σ_{G ∈ A} |G| − Σ_{G_from ∈ A and G_to ∈ A} |F|. (3.8)
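Equation (3.8) can be evaluated directly from the two tables. The sketch below assumes, for illustration, that the group amounts are available as a dictionary keyed by group ID and the flow trace as a dictionary keyed by (from, to) group IDs; the function and parameter names are hypothetical.

```python
def distinct_flow_amount(group_amounts, flow_trace, groups_in_area):
    """Estimate AMT: the sum of |G| in the area minus the sum of |F| inside it."""
    groups = set(groups_in_area)
    total = sum(group_amounts[g] for g in groups)
    duplicated = sum(
        f for (g_from, g_to), f in flow_trace.items()
        if g_from in groups and g_to in groups
    )
    return total - duplicated
```

For example, with group amounts 5 and 4 in the area and a flow of 3 between those two groups, the estimate is 5 + 4 − 3 = 6 distinct trajectories; flows leading to groups outside the area are not subtracted.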

Using Algorithm 3.3 and (3.8), the distinct trajectory amount in an area can be efficiently estimated. The additional attributes for distinct estimation can be parsed in O(m), where m is the number of L_Rs. The results from flow trace are only approximate estimates, though, and the errors come from several sources. For one, some replacement lines are not included in any pattern because they do not reach the th_amt required for a pattern. This might cause an underestimate of the original trajectory amount, but only in proportion to the information eliminated in pattern generation. Meanwhile, an object might start from an area, move out, and travel back into the area later. This causes some overestimate of the distinct flow amount.

Moreover, when only limited space is available for the group flow trace, we might have to keep only the larger flow records between groups and eliminate relatively minor ones. Because the eliminated flow amounts are no longer subtracted in (3.8), this information loss causes some overestimate of the distinct trajectory amount.

3.5.4 Moving Pattern Queries

With ordinary condition assignments plus the operations described earlier in this section, moving pattern queries of different complexities can be answered. Basic queries specify some properties and get the matching moving patterns from the trajectory data warehouse as direct answers. Advanced queries include extra criteria and processes for extracting representations of patterns or their summarized properties. Application-level queries try to give overall pictures of the moving patterns in the trajectory data warehouse, and might further bring in other spatial, temporal or related domain information for analytic processes.

A query on specific pattern properties may focus on the value range of one or more attributes, such as finding patterns that move in a given direction at around a certain speed. Queries may also work on an area with spatial operations, like finding patterns that start from a spatial grid or pass through an area, or may apply limits on the temporal domain to look for patterns that happen during some hours of a day. These property queries work like filters and are useful for finding moving patterns that match the assigned conditions.

Query 3.1. Search for moving patterns related to (including start from, end in, lay on and pass through) crossroad area A, with moving speed larger than 30 km/hr, that happen during 9-10 AM, 2010-Nov-24.

Next, by adding the aggregation and distinct flow amount operations, patterns are processed to reveal further information. Examples include identifying the average moving speeds in different directions of an area by aggregating the patterns in each direction; analyzing the distribution of length spans in a certain direction on a road to evaluate the effectiveness of a traffic light chain setting; summarizing the end in amounts of a large area grid by grid to find possible locations where special events are gathering people; or processing the patterns of a spatial grid along each time unit to find the major directions or the change of average speeds during a day. Queries with such summarizing processes can be used to analyze the characteristics of attributes under different conditions.

Query 3.2. For moving patterns related to crossroad area A that happen during 8-10 AM, 2010-Nov-24, summarize the moving condition in each of their four forward direction units by finding the average moving speed and counting the flow amount.
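The summarizing step of Query 3.2 can be sketched as a group-by over direction units, applied after the spatial and temporal filters have selected the patterns. The input layout, the function name and the choice to weight the average speed by group amount are assumptions made for illustration; the flow amount here is the plain sum of group amounts, without the distinct estimation correction of (3.8).

```python
from collections import defaultdict


def summarize_by_direction(patterns):
    """patterns: iterable of (direction_unit, speed_km_per_hr, group_amount)."""
    weighted_speed = defaultdict(float)
    amount = defaultdict(int)
    for direction, speed, group_amount in patterns:
        weighted_speed[direction] += speed * group_amount
        amount[direction] += group_amount
    # Per direction: (amount-weighted average speed, total group amount).
    return {
        d: (weighted_speed[d] / amount[d], amount[d])
        for d in amount
        if amount[d] > 0
    }
```

Restricting the input to one direction and grouping by hour instead of by direction would give the per-hour averages asked for in Query 3.3.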

Query 3.3. Find the behavior changes over a day by calculating the average speeds per hour, in the north-to-south direction, among moving patterns related to crossroad area A that happen during 2010-Nov-24.

Furthermore, with moving patterns collected in the trajectory data warehouse, general moving conditions can be built as pattern templates for applications. A template of the workday morning traffic condition in a downtown area can be extracted by finding common patterns in a long-term collection. The pattern template can then be used for abnormal behavior detection or regional condition analysis. Alternatively, by tracing the condition of grids over the long term, repeated pattern changes in speeds, amounts or directions can be revealed and logged as regular trends, which can then be used for event detection or change prediction. These processes add value to the moving patterns stored in the trajectory data warehouse and enable various useful applications.

Query 3.4. Find common moving patterns related to crossroad area A that happen during 8-10 AM on workdays of Nov, 2010, as the aggregation results of moving patterns that occurred on the same unit grids during more than 70% of the assigned periods. Then build them into a pattern template of workday 8-10 AM.

Query 3.5. Search for the series trend of speeds among moving patterns related to crossroad area A by analyzing the average speeds in each direction per hour on patterns that happen during Nov, 2010. Then find the repeated patterns and save them as regular trends.
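The selection step of Query 3.4 can be sketched as a frequency count over periods. The sketch assumes, for illustration, that each assigned period (one workday, 8-10 AM) is given as the collection of unit grid IDs on which patterns occurred; the function name and the `min_ratio` parameter are hypothetical. The patterns on the selected grids would then be aggregated to form the template.

```python
from collections import Counter


def common_grids(period_grids, min_ratio=0.7):
    """Return the unit grids occupied in more than min_ratio of the periods."""
    n = len(period_grids)
    if n == 0:
        return set()
    # set() ensures a grid is counted at most once per period.
    counts = Counter(g for grids in period_grids for g in set(grids))
    return {g for g, c in counts.items() if c / n > min_ratio}
```

A grid that carries patterns on 8 of 10 workdays is kept, while one that is occupied on only 5 of 10 is dropped from the template.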

In the document: Object Moving Behavior Analysis with Trajectory Data Profiling and Warehousing (pages 78-87)