Chapter 3. Methodology
3.4 STAMP
Now that the pattern is clearly defined, we propose a complete solution, “STAMP”, which stands for “Sequential Terminal mobility pAttern Mining and Predicting”. The STAMP algorithm, as illustrated in Figure 3.2, is composed of four phases with seven modules that address three problems along three paths. Overall, the objective of STAMP is to find the moving portfolios of those people who are at high risk, that is, people who move, or move while calling, more frequently than others. With these portfolios, we can solve each of the three problems online by prediction. The initial input of STAMP is the set of all Moving Histories at the Home Registrar. Each Moving History is an individual list of location-update registrations, maintained by the Home Registrar, that grows as the mobile user moves into different domains. A Moving History is composed of (location, time) tuples, each recording “at which moment” the user moved into a different “domain”, denoted as <(D1, T1), (D2, T2), …, (Di, Ti)>.
[Figure 3.2 labels: “STAMP: 4 Phases with 7 Modules (for 3 problems by 3 paths)”; “All Users’ MH at Home Registrar (Adaptive High-Risk Sifting)”; “Personal MH”; “Rule Reducer”. Legend: MH = Moving History; MS = Moving Sequence.]
Figure 3.2 : Methodology Framework: Sequential Terminal Mobility Pattern Mining and Predicting.
Before the three main phases, there is a prior phase for preprocessing. Why do we need preprocessing? The reason is that we should take the resources available at the server into account. Thus, an adjustable parameter controls how many people we can rescue without imposing overhead on the server side. During preprocessing, we select, from all users’ Moving Histories, only those of people who belong to the FM set or the FCM set.
The First Phase transforms raw data into usable information for further mining. In other words, we first define “what is a meaningful moving behavior”. Based on this definition, the “Partitioner” module partitions a long, meaningless personal Moving History into many valid Moving Sequences. Each Moving Sequence represents one moving behavior. For people belonging to the FCM set, the “Filter” module further filters out Calling Sequences by means of the Call Detail Record. The Call Detail Record keeps all information about each session, including caller, callee, session time, and even the location (i.e., the SIP Server acting on behalf of its domain). Thus, each Calling Sequence stands for one moving behavior while calling.
The Second Phase discovers moving portfolios from Moving Sequences or Calling Sequences. The “Pattern Generator” finds all the frequent moving or calling patterns in an efficient manner. The “Rule Generator” linearly transforms the patterns into rules for easier readability and further processing. We also propose a new module, the “Rule Reducer”, to avoid overlapped rules; however, this module remains future work.
The Third Phase solves the problems online. Because the three problems differ in nature, we design two predictors that perform rule prediction and ranking to find the best result for each problem. Since Mid-Call Mobility and Pre-Call Mobility are both urgent problems, they share the same predictor, while the Home Registrar Failure Restoration problem has its own exclusive predictor.
Finally, the STAMP algorithms are primarily separated into two sections for offline mining and online predicting, as described in Algorithm 1 and Algorithm 2.
More details about each phase will be fully explained in Chapter 4.
Algorithm 1 : STAMP_Offline (Preprocessing, Phase I and Phase II)
Abbreviation:
MH: Moving Histories; MS: Moving Sequences; CS: Calling Sequences;
MP: Moving Patterns; CP: Calling Patterns; MR: Moving Rules; CR: Calling Rules;
CDR: Call Detail Records;
α: Threshold of moving frequency; β: Threshold of moving while calling frequency;
FM set: the mobile host set whose moving frequency>α;
FCM set: the mobile host set whose moving while calling frequency>β;
PF (MH): Phase I with input of some Moving History;
BR (MS): Phase II with input of some Moving Sequence;
BR (CS): Phase II with input of some Calling Sequence;
FM_R: denotes the set of mobile host ∈ FM set with their corresponding MR;
FCM_R: denotes the set of mobile host ∈ FCM set with their corresponding CR
Input:
All Moving Histories in Home Registrar, Call Detail Records, α and β
Output:
Algorithm 2 : STAMP_Online (Phase III)
Abbreviation:
MCM: Mid-Call Mobility Problem; R_MCM: prediction result of MCM;
HRFR: Home Registrar Failure Restoration Problem; R_HRFR: prediction result of HRFR;
PCM: Pre-Call Mobility Problem; R_PCM: prediction result of PCM;
Predictor1 (CR): Predictor 1 at Phase III with input of a set of Calling Rules;
Predictor1 (MR): Predictor 1 at Phase III with input of a set of Moving Rules;
Predictor2 (MR): Predictor 2 at Phase III with input of a set of Moving Rules;
Output: Output of prediction result set;
Input:
  The set of mobile host ∈ FM set with their corresponding MR;
  The set of mobile host ∈ FCM set with their corresponding CR
Output:
  Adapted prediction results for MCM, HRFR and PCM Problems
Pseudo Code:
STAMP_Online (FM_R, FCM_R)
Begin
  While every time a mobile host moves into a different domain, Do
    Set Output=φ ;
  While every time backup after Home Registrar crash, Do
    Set Output=φ ;
    IF MH ∈ FM set, Do
      R_HRFR ← Predictor2 (MR);
      Output ← R_HRFR;
    Return Output;
End
Chapter 4.
Sequential Terminal Mobility Pattern Mining and Predicting (STAMP)
The name “STAMP” suggests that once this algorithm is in use, transient packets and call requests alike will be forwarded to where the user is most likely located at present. The core of “STAMP” is to find, in advance, the moving portfolios of those people who are at high risk, so as to solve any of the three problems by adaptive online prediction. Overall, the algorithm contributes to both fast handoff and smooth handoff for seamless mobility in SIP over WLAN.
4.1 Preprocessing — Adaptive High-Risk Sifting
Before getting into the principal part of “STAMP”, there is a prior phase for preprocessing. Why do we need preprocessing? The reason is that we should solve the problems efficiently. Hence, the number of users passed on to further phases depends on how many resources (power, load, and so forth) are currently available at the server. According to the performance status of the server, we can dynamically adjust the number of users without imposing overhead on the server side. (However, the parameter adjustment remains future work.)
Since the Adaptive High-Risk Sifting (AHRS) phase aims at rescuing high-risk people, we define two mobile host sets. One is the Frequent Moving (FM) set, which contains the frequently moving mobile hosts. The other is the Frequent Calling while Moving (FCM) set, which contains the mobile hosts that frequently move while calling.
Users belonging to the FM set have a higher moving frequency than others. That is, users in the FM set are at high risk of the HRFR Problem and the PCM Problem, which result in call failures. As a result, for the FM set, we focus on finding users’ Moving Behavior. On the other hand, users belonging to the FCM set have a higher calling-while-moving frequency than others. In other words, users in the FCM set are at high risk of the MCM Problem, which results in packet loss. Consequently, for the FCM set, our concern is to find users’ Calling Behavior. The main concept is illustrated in Figure 4.1.
Figure 4.1 : Two sets of mobile users: FM set vs. FCM set
• Moving Frequency
Parameter: The Moving Frequency is defined as the “Average Moving Occurrence Volume per time period unit”.
ex: Within the last year, Jenny moved 5000 times but Anny moved 1500 times in total; in other words, Jenny moves more frequently than Anny.
• Moving while Calling Frequency
Parameter: The Moving while Calling Frequency is defined as the “Average Moving while Calling Occurrence Volume per call”, i.e., Velocity = Moving Occurrence Volume in CDR / Call Volume in CDR.
ex: Within the last year, Jenny moved 2000 times in 1000 calls (Velocity = 2) while Anny moved 1000 times in 125 calls (Velocity = 8); in other words, Anny moves while calling at a much higher frequency.
• Moving Distribution
We suppose the moving distribution curve is as depicted in Figure 4.2.
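The two frequencies defined above, and the sifting into the FM and FCM sets, can be sketched in Python as follows. This is only a minimal illustration, assuming that per-user counts have already been aggregated; the names `UserStats`, `moving_frequency`, `velocity`, and `sift` are hypothetical, and the threshold values passed as `alpha` and `beta` (the α and β of Algorithm 1) are chosen purely for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class UserStats:
    """Aggregated counts for one mobile user (hypothetical structure)."""
    moves: int          # inter-domain moves within the observation window
    periods: int        # number of time-period units in that window
    moves_in_cdr: int   # moves that occurred during calls (from the CDR)
    calls_in_cdr: int   # number of calls recorded in the CDR


def moving_frequency(u: UserStats) -> float:
    """Average moving occurrence volume per time-period unit."""
    return u.moves / u.periods


def velocity(u: UserStats) -> float:
    """Average moving-while-calling occurrence volume per call."""
    return u.moves_in_cdr / u.calls_in_cdr if u.calls_in_cdr else 0.0


def sift(users: Dict[str, UserStats], alpha: float,
         beta: float) -> Tuple[Set[str], Set[str]]:
    """Return (FM set, FCM set): users strictly above alpha / beta."""
    fm = {name for name, u in users.items() if moving_frequency(u) > alpha}
    fcm = {name for name, u in users.items() if velocity(u) > beta}
    return fm, fcm


# The Jenny / Anny figures from the examples above, with one year as the unit.
users = {
    "Jenny": UserStats(moves=5000, periods=1, moves_in_cdr=2000, calls_in_cdr=1000),
    "Anny": UserStats(moves=1500, periods=1, moves_in_cdr=1000, calls_in_cdr=125),
}
fm_set, fcm_set = sift(users, alpha=2000, beta=4)  # illustrative thresholds
```

With these illustrative thresholds, Jenny falls into the FM set and Anny into the FCM set; raising or lowering `alpha` and `beta` is the adjustable knob that the preprocessing phase relies on.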
Figure 4.2 : Moving Frequency of FM or FCM set
4.2 Phase I — Partitioning & Filtering
We assume that the input of “STAMP” is the Moving History, that is, the individual registration list kept by the Home Registrar. Each Moving History is composed of a sequence of (location, time) tuples, each recording “at which moment” the user moved into a different “domain”. Because this input is one long record, it does not convey any explicit information about “one meaningful moving behavior”, which is the information needed for further mining. Therefore, the objective of the Partitioning & Filtering (P&F) phase is to transform these raw data into usable information.
First, the “Partitioner” module partitions the long, meaningless personal Moving History of each user in the FM or FCM set into many valid Moving Sequences. Each Moving Sequence represents one moving behavior. The “Partitioner” segments meaningful Moving Sequences out of the Moving History based on two criteria: (1) if two consecutive domains on the path (D_{i-1} and D_i) are not adjacent, a first partition is made between them; (2) if the residence time in domain D_{i-1} (i.e., ΔT = T_i - T_{i-1}) is longer than a predefined or statistically derived maximal window size, a second partition is made at that point. The former criterion enforces strict consistency with the Adjacent Property in the definition of Moving Patterns and Calling Patterns in Section 3.3. The latter criterion reflects that a meaningful Moving Sequence should be confined to a time span, such as the interval from registration to deregistration, or one day.
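The two partitioning criteria can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the thesis implementation: the Moving History is represented as a list of (domain, entry time) tuples, the function names `partition` and `is_adjacent`, the toy topology `EDGES`, and the `max_window` value are all hypothetical, and single-domain sequences are kept rather than discarded.

```python
from typing import Callable, List, Tuple

MovingHistory = List[Tuple[str, float]]  # [(domain, entry_time), ...]

# Toy domain topology (assumed for illustration): undirected neighbor pairs.
EDGES = {frozenset(p) for p in [("D1", "D2"), ("D2", "D3"),
                                ("D1", "D4"), ("D4", "D2")]}


def is_adjacent(a: str, b: str) -> bool:
    """True if domains a and b are neighbors in the toy topology."""
    return frozenset((a, b)) in EDGES


def partition(history: MovingHistory,
              adjacent: Callable[[str, str], bool],
              max_window: float) -> List[MovingHistory]:
    """Split a Moving History into Moving Sequences.

    A new sequence starts when (1) two consecutive domains are not
    adjacent, or (2) the residence time in the previous domain
    (T_i - T_{i-1}) exceeds max_window.
    """
    sequences: List[MovingHistory] = []
    current: MovingHistory = []
    for domain, entered in history:
        if current:
            prev_domain, prev_entered = current[-1]
            if (not adjacent(prev_domain, domain)
                    or entered - prev_entered > max_window):
                sequences.append(current)
                current = []
        current.append((domain, entered))
    if current:
        sequences.append(current)
    return sequences


history = [("D1", 0), ("D2", 10), ("D3", 20),
           ("D1", 25), ("D4", 30), ("D2", 500)]
moving_sequences = partition(history, is_adjacent, max_window=100)
domains_only = [[d for d, _ in seq] for seq in moving_sequences]
```

In this example the history is cut once because D3 and D1 are not adjacent, and once more because the user stayed in D4 longer than the window, giving the three sequences <D1, D2, D3>, <D1, D4>, and <D2>.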
Lemma 1: Every two consecutive items in a Moving Sequence β=<b1, b2, …, bn>, i.e., bi & bi+1, are adjacent domains.
Proof: This follows from the definition of the Adjacent Property of Moving Patterns and Calling Patterns in Section 3.3, together with the fact that a mobile user can never move directly into a non-neighboring domain. □
Moreover, for people belonging to the FCM set, the “Filter” module further filters out Calling Sequences by means of the Call Detail Record. Each Calling Sequence stands for one moving behavior while calling.
Lemma 2: For every Calling Sequence α=<a1, a2, …, am>, there exists a Moving Sequence β=<b1, b2, …, bn> such that the Calling Sequence α is a Consecutive Subsequence of the Moving Sequence β, denoted by α ⊆ β.
Proof: Each Calling Sequence records the moving behavior of a mobile user within a single call session, and is filtered from some Moving Sequence by means of the Call Detail Record, so the lemma holds. □
Theorem 1: Given a mobile user p, there exist a Moving History γp of the mobile user, a set of Moving Sequences Mp={β1, β2, …, βm}, and a set of Calling Sequences Cp={α1, α2, …, αn}, such that αi ⊆ βj ⊆ γp for some j, where 1≦i≦n, 1≦j≦m, and i, j are integers.
Proof: It follows from the Adjacent Property, Lemma 1, and Lemma 2, which together constitute Phase I (Partitioning and Filtering). □
The whole process of both the Partitioner and Filter modules is illustrated in the example of Figure 4.3. For the phase operation, refer to Algorithm 3.
Figure 4.3 : Example of Phase I - Partitioning & Filtering
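The Filter step and the Consecutive Subsequence relation of Lemma 2 can be sketched in Python as follows. This is an illustrative reading, not the thesis implementation: it assumes that a Call Detail Record entry reduces to a (start, end) session interval, and that a Calling Sequence consists of the domain occupied when the call began plus every domain entered before the call ended. The function names are hypothetical.

```python
from typing import List, Tuple

MovingSequence = List[Tuple[str, float]]  # [(domain, entry_time), ...]


def calling_sequence(seq: MovingSequence, start: float, end: float) -> List[str]:
    """Domains occupied during the call session [start, end].

    Assumes entry times in seq are increasing, so the result is a
    contiguous slice of the Moving Sequence.
    """
    first = 0  # if the call began before the sequence, start at its head
    for i, (_, entered) in enumerate(seq):
        if entered <= start:
            first = i  # last domain entered no later than the call start
    return [domain for domain, entered in seq[first:] if entered <= end]


def is_consecutive_subsequence(alpha: List[str], beta: List[str]) -> bool:
    """True if alpha occurs in beta as a contiguous run (alpha ⊆ beta)."""
    m, n = len(alpha), len(beta)
    return any(beta[i:i + m] == alpha for i in range(n - m + 1))


moving_seq = [("D1", 0), ("D4", 10), ("D2", 20), ("D3", 30)]
call = calling_sequence(moving_seq, start=12, end=25)
```

Here the call starts while the user is in D4 and ends after the move into D2, so the Calling Sequence is <D4, D2>, which is a Consecutive Subsequence of <D1, D4, D2, D3> as Lemma 2 requires.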
Algorithm 3 : PF (Phase I: Partitioning & Filtering)
Abbreviation:
H: a Moving History; S: a set of Sequences; P: a set of Patterns; R: a set of Rules;
Input:
A Moving History & Call Detail Records
Output:
A set of Sequences
(If MH∈ FM set then output a set of Moving Sequences; otherwise, Calling Sequences)
Pseudo Code:
4.3 Phase II — Behavior Mining
Since the earlier phase splits each individual Moving History into many Moving Sequences, the input of Phase II is well prepared. The goal of the Behavior Mining (BM) phase is to find the moving portfolios of every user in the FM or FCM set.
First, the “Pattern Generator” module discovers frequent moving or calling patterns from Moving Sequences or Calling Sequences. Second, for easier readability and further processing, the “Rule Generator” module transforms patterns into rules in a linear manner. E.g., pattern <D1, D2, D3> will be transformed into the rules <D1→D2, D3> and <D1, D2→D3>. Finally, the “Rule Reducer” module gets rid of overlapped rules and keeps the complete ones. E.g., <D2→D3> and <D1, D2→D3> are overlapped; thus the rule with the shorter prefix or postfix length, namely <D2→D3>, will be removed and only <D1, D2→D3> will be kept. Since the last module remains future work, the Behavior Mining phase is illustrated only in simplified form in the following example, Figure 4.4. For the phase operation, refer to Algorithm 4.
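The Rule Generator and Rule Reducer examples above can be sketched in Python as follows. This is a hedged illustration: the source does not define “overlapped” formally, so the sketch assumes that a rule is overlapped when its prefix is a suffix of another rule's prefix and its postfix is a prefix of that rule's postfix. The names `generate_rules`, `is_overlapped_by`, and `reduce_rules` are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (prefix, postfix)


def generate_rules(pattern: Tuple[str, ...]) -> List[Rule]:
    """Split a pattern at every internal position into prefix -> postfix."""
    return [(pattern[:i], pattern[i:]) for i in range(1, len(pattern))]


def is_overlapped_by(short: Rule, long: Rule) -> bool:
    """Assumed overlap test: `short` is covered by the distinct rule `long`."""
    if short == long:
        return False
    s_pre, s_post = short
    l_pre, l_post = long
    return (len(s_pre) <= len(l_pre)
            and l_pre[len(l_pre) - len(s_pre):] == s_pre   # prefix is a suffix
            and l_post[:len(s_post)] == s_post)            # postfix is a prefix


def reduce_rules(rules: List[Rule]) -> List[Rule]:
    """Keep only the rules that no other rule overlaps."""
    return [r for r in rules
            if not any(is_overlapped_by(r, other) for other in rules)]


rules_from_pattern = generate_rules(("D1", "D2", "D3"))
reduced = reduce_rules([(("D2",), ("D3",)), (("D1", "D2"), ("D3",))])
```

Under this assumed definition, the pattern <D1, D2, D3> yields exactly the two rules quoted in the text, and <D2→D3> is dropped in favor of <D1, D2→D3>.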
Figure 4.4 : Example of Phase II – Behavior Mining
Algorithm 4 : BR (Phase II: Behavior Mining)
Abbreviation:
S: a set of Sequences; P: a set of Patterns; R: a set of Rules; R’: a set of reduced Rules;
Input:
A set of Sequences
Output:
A set of Reduced Rules
(If MH∈ FM set then output a set of Moving Rules; otherwise, Calling Rules)
Pseudo Code:
BR (S) // if MH∈ FM set then BR (MS); otherwise, BR (CS).
Begin
  P ← PatternGenerator (S); // based on Lemma 3 described in Section 4.3
  R ← RuleGenerator (P);
  R’ ← RuleReducer (R);
  Return R’;
End
Particularly, the following lemma accelerates the candidate generation in the Pattern Generator module.
Lemma 3 (Pseudo Apriori): Under the “Consecutive” Sub-sequence Relation, a moving sequence X=<x1, x2, …, xm> of length m has m-k+1 subsequences of length k, 1≦k≦m.
Proof: To generate a length-m candidate sequence, we only have to check whether its (m-k+1) length-k subsequences are frequent, instead of all C^m_k = m! / (k! (m-k)!) of them. The key idea is that “Consecutive”, the Sub-sequence Relation, has a great impact on “Pseudo Apriori” because it restricts which sequences count as sub-sequences of a given super-sequence. Thus, Pseudo Apriori is significant during candidate generation, because Pure Apriori fails to generate and prune candidates efficiently in our moving sequential pattern mining. Based on the “Consecutive” property, “Pseudo Apriori” is more appropriate as the guideline for moving behavior mining in the SIP domain. ex: if <a, b, c> is a candidate length-3 sequence, all of its length-2 subsequences must be frequent. Since <a, c> is no longer a length-2 subsequence of <a, b, c> in Pseudo Apriori, the length-2 subsequence set of concern is only {<a, b>, <b, c>}. □
Besides, the Candidate Generation process within the Pattern Generator module takes advantage of the network topology to enhance both efficiency and effectiveness during mining. We utilize both the network topology, as a Graph, and Large (N-1), resulting in a Hybrid manner of Behavior Mining. Since time is what matters most, we represent the Graph by an adjacency matrix, assuming that there is enough memory and storage at the server.
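Returning to Lemma 3, the counting argument and the resulting candidate check can be sketched in Python as follows. This is a minimal sketch; the function names `consecutive_subsequences` and `passes_pseudo_apriori` are hypothetical, and the set of frequent sequences is supplied by hand rather than mined.

```python
from math import comb
from typing import List, Set, Tuple

Seq = Tuple[str, ...]


def consecutive_subsequences(seq: Seq, k: int) -> List[Seq]:
    """All length-k contiguous runs of seq; there are len(seq) - k + 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def passes_pseudo_apriori(candidate: Seq, frequent_k: Set[Seq], k: int) -> bool:
    """A candidate survives only if every length-k consecutive
    subsequence is already known to be frequent."""
    return all(s in frequent_k for s in consecutive_subsequences(candidate, k))


candidate = ("a", "b", "c")
subs = consecutive_subsequences(candidate, 2)

# Pseudo Apriori checks m - k + 1 subsequences; Pure Apriori would check C(m, k).
pseudo_count = len(subs)
pure_count = comb(len(candidate), 2)
```

For the example in the proof, only <a, b> and <b, c> are checked, whereas Pure Apriori would also require <a, c>. The saving grows quickly with length: a length-10 candidate has 6 consecutive length-5 subsequences but C(10, 5) = 252 unrestricted ones.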
Here we compare our Hybrid manner with the other candidate generation manners, as illustrated in Figure 4.5, Figure 4.6, and Figure 4.7.
• Pure Graph (PG) vs. Pure Large (N-1) (PL) vs. Hybrid (H)
How efficient is the candidate generation process?
  • How many candidates are generated in all? → fewest is best.
  • ex: PG (55) vs. PL (38) vs. H (32)
How effective are the candidates?
  • What percentage of the candidates turn out to be large? → largest is best.
  • ex: PG (29%) vs. PL (42.1%) vs. H (50%)
Figure 4.5 : Example of Candidate Generation – Hybrid
Figure 4.6 : Example of Candidate Generation – Pure Graph
Figure 4.7 : Example of Candidate Generation – Pure Large (N-1)
According to the above, “Hybrid” outperforms the other two manners in both efficiency and effectiveness. That is why we use “Hybrid” instead of “Pure Graph” or “Pure Large (N-1)”.
4.4 Phase III — Problem-Oriented Predicting
Now that Phase I and Phase II have turned each individual Moving History into Moving or Calling Sequences and finally into Moving or Calling Rules, these rules become the input of Phase III. Why do we need Phase III? The reason is that, with the rules from Phase II, we can solve each of the three problems online and adaptively by Problem-Oriented Predicting.
“Problem-Oriented Predicting” means that we take the characteristics of each problem into account. There is a key distinction between the problems: the 1st (MCM) and 3rd (PCM) Problems are both critically urgent, whereas the 2nd (HRFR) Problem is less pressing. As a result, the three problems are classified into two groups according to their demands. In accordance with these two groups, we devise two modules called predictors: Predictor 1 targets the MCM and PCM Problems, and Predictor 2 targets the HRFR Problem.
Before going into the details of how each predictor functions, the reader may wish to review the Problems in SIP described in Section 1.3.
Predictor 1: for the 1st (MCM) or 3rd (PCM) problem
Scenario: Jenny has followed the path <D1, D4, D2> and is currently at D2.
Question: Where will she move next?
Solution by Predictor 1:
• Rule Firing: Prefix.consecutive==true && Postfix.length==1
Fired rules can only have the form <D2→?>, <D4, D2→?>, or <D1, D4, D2→?>; a rule can never be <D1, D2→?>, because D1 and D2 are not consecutive in this scenario.
• Rule Ranking: longest Prefix.length first, then highest Rule.support
If the rules fired are <D2→D3>, <D2→D5>, and <D4, D2→D4>, then <D4, D2→D4> is ranked first, so the output is D4.
• Solution: The Home Registrar tells D2 that Jenny will probably head for “D4”, so that D2 can forward packets or call requests there at that moment.
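The firing and ranking steps of Predictor 1 can be sketched in Python as follows. This is only an illustrative sketch: rules are represented as hypothetical (prefix, postfix, support) tuples, “Prefix.consecutive” is read as “the prefix is a suffix of the path travelled so far”, and the support values are invented for the example because the scenario does not give them.

```python
from typing import List, Optional, Tuple

# (prefix, postfix, support); the support values below are illustrative only.
Rule = Tuple[Tuple[str, ...], Tuple[str, ...], float]


def predictor1(path: Tuple[str, ...], rules: List[Rule]) -> Optional[str]:
    """Predict the next domain for the MCM / PCM problems."""
    # Rule firing: the prefix must match the tail of the path and the
    # postfix must contain exactly one domain.
    fired = [
        (pre, post, sup) for pre, post, sup in rules
        if len(post) == 1
        and 0 < len(pre) <= len(path)
        and path[len(path) - len(pre):] == pre
    ]
    if not fired:
        return None
    # Rule ranking: longest prefix first, ties broken by highest support.
    best = max(fired, key=lambda r: (len(r[0]), r[2]))
    return best[1][0]


rules = [
    (("D2",), ("D3",), 0.5),
    (("D2",), ("D5",), 0.3),
    (("D4", "D2"), ("D4",), 0.2),
    (("D1", "D2"), ("D3",), 0.9),   # never fires: D1, D2 is not the path's tail
]
next_domain = predictor1(("D1", "D4", "D2"), rules)
```

Even though <D4, D2→D4> has the lowest assumed support here, it wins because prefix length takes priority over support, which matches the ranking order stated above.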
Predictor 2: for the 2nd (HRFR) problem
Scenario: According to the backup restored after a crash, Jenny has followed the path <D1, D4, D2>.
Question: Is Jenny really still in D2 now? (i.e., has Jenny not moved since the last backup?)
• Rule Firing: Prefix.consecutive==true
Fired rules can only have the form <D2→?, ?...>, <D4, D2→?, ?...>, or <D1, D4, D2→?, ?...>.
• Rule Ranking: Postfix[i] << Postfix[i+1], i++, where 0 < i < Max(Postfix.length+1); that is, domains at earlier postfix positions are ranked before domains at later positions.
If the rules fired are <D2→D3>, <D2→D5>, <D4, D2→D4, D2>, <D4, D2→D5>, and <D1, D4, D2→D4, D2>, the output will be <step1: (D3, D4, D5), step2: (D2)>.
• Solution: The Home Registrar first queries D3, D4, and D5 simultaneously and then queries D2, proceeding incrementally until the HR restores the current location.
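The step-wise ranking of Predictor 2 can be sketched in Python as follows. This is a minimal sketch under the same assumptions as before: rules are hypothetical (prefix, postfix) tuples, a rule fires when its prefix is a suffix of the backed-up path, and the domains at each postfix position are grouped into one query step. The function name `predictor2` and the alphabetical ordering within a step are choices made for the illustration.

```python
from typing import List, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (prefix, postfix)


def predictor2(path: Tuple[str, ...], rules: List[Rule]) -> List[List[str]]:
    """Return the domains to query, grouped by postfix position."""
    # Rule firing: the prefix must match the tail of the backed-up path.
    fired = [
        post for pre, post in rules
        if 0 < len(pre) <= len(path) and path[len(path) - len(pre):] == pre
    ]
    depth = max((len(post) for post in fired), default=0)
    # Rule ranking: position 0 of every postfix forms step 1, position 1
    # forms step 2, and so on.
    return [
        sorted({post[i] for post in fired if len(post) > i})
        for i in range(depth)
    ]


rules = [
    (("D2",), ("D3",)),
    (("D2",), ("D5",)),
    (("D4", "D2"), ("D4", "D2")),
    (("D4", "D2"), ("D5",)),
    (("D1", "D4", "D2"), ("D4", "D2")),
]
steps = predictor2(("D1", "D4", "D2"), rules)
```

The result reproduces the output of the scenario: D3, D4, and D5 are queried together in step 1, and D2 follows in step 2.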
Each of the two predictors in Phase III is algorithmically presented in Algorithm 5 and Algorithm 6. These two algorithms along with the Algorithm 2 in Section 3.4 contribute to the coherence of STAMP online predicting.
Algorithm 5 : Predictor1 (Phase III - 1st module)
Output:
  Adapted prediction results of MCM and PCM Problems
Pseudo Code:
Predictor1 (R) // If PCM then Predictor1 (MR); if MCM then Predictor1 (CR)
Begin
  For each Rule, Do
    IF the prefix of the rule is consecutive & the postfix length of the rule equals 1, Do
      The rule R is fired;
  Rank all fired rules according to first priority (rules with the longest prefix length rank first);
  For the rules with the same prefix length, Do
    Rank rules by second priority (rules with the highest support rank first);
  Return PR;
End
Algorithm 6 : Predictor2 (Phase III - 2nd module)
Output:
  Adapted prediction result for HRFR Problem
Pseudo Code:
  Rank all fired rules according to postfix positions from 0 to maximal postfix length;
  Return PR;
End
4.5 Mechanism Activation
Below, Figure 4.8 describes the summary of relationships among problems, activation strategies, patterns of different roles, and even the mobile host sets.
For the MCM Problem, whenever a mobile host of the FCM set makes an Inter-domain Mobility and registers at the home registrar, the prediction results, in terms of sequential calling patterns, are sent from the home registrar to the current domain SIP server along with the authentication information. Therefore, once the user makes another Inter-domain Mobility, the calling patterns help the previous domain intercept and forward media packets to the predicted domain(s), guarding against packet loss before the delayed re-Invite reaches the corresponding host. With these calling portfolios, the reduction in packet loss grows as the mobile host moves farther away from the corresponding host.
For the HRFR Problem, as soon as the home registrar crashes, the SIP server incrementally queries the predicted domains of those mobile users of the FM set that are recovered from backup, in terms of sequential moving patterns. The mission is to restore current and complete location information, so that no upcoming call request is served with obsolete location information before the mobile user makes another Inter-domain Mobility and location update.