Chapter 2 Related Work
2.3 SGCI Algorithm
The SGCI algorithm is a community evolution detection method proposed by B. Gliwa, S.
Saganowski, A. Zygmunt, P. Bródka, P. Kazienko, and J. Kozlak. SGCI stands for
Stable Group Change Identification. The algorithm consists of the following steps:
Step 1. Identification of fugitive groups in the separate time periods.
Step 2. Identification of group continuation – assigning transitions between groups
in adjacent time steps.
Step 3. Separation of the stable groups (lasting for at least three subsequent time steps).
Step 4. Identification of types of group changes: assigning events, from a list
describing changes in the state of a group, to the transitions.
In Step 2, SGCI uses a measure called the modified Jaccard index to decide whether there
is a transition between two groups. The modified Jaccard index is defined as
$MJ(A, B) = \max\left(\frac{|A \cap B|}{|A|}, \frac{|A \cap B|}{|B|}\right)$, where $A$ and $B$
are two groups. Because this measure is simple yet effective, the computation time of the
SGCI algorithm is very short compared to other algorithms. In Step 4, SGCI defines
eight events: “Split”, “Deletion”, “Merge”, “Addition”, “Split_Merge”, “Decay”,
“Constancy”, and “Change Size”.
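As an illustration only, the modified Jaccard measure from Step 2 can be computed on two groups represented as Python sets of member IDs. The `is_transition` helper and its threshold of 0.5 are hypothetical additions for this example; the threshold SGCI actually uses is not given in this section.

```python
def modified_jaccard(a: set, b: set) -> float:
    """MJ(A, B) = max(|A ∩ B| / |A|, |A ∩ B| / |B|) for two groups of members."""
    if not a or not b:
        return 0.0  # an empty group shares no members with anything
    shared = len(a & b)
    return max(shared / len(a), shared / len(b))


def is_transition(a: set, b: set, threshold: float = 0.5) -> bool:
    """Hypothetical helper: treat A -> B as a transition when MJ reaches `threshold`.

    The default 0.5 is an illustrative assumption, not SGCI's published setting.
    """
    return modified_jaccard(a, b) >= threshold
```

Because the measure takes the larger of the two ratios, a small group fully contained in a larger one scores 1.0 (for example, `modified_jaccard({1, 2}, {1, 2, 3, 4})`). On our reading, this containment property is what lets a cheap set comparison link groups across merges and splits.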
Chapter 3 Our Method
In this chapter, we describe how we detect and predict community evolution, including
our definition of evolution types. We also introduce how we transform evolution chains
into evolution types and how we predict community evolution types. Before applying our
evolution detection algorithm, we examine it on our own synthetic dataset. Therefore, this
chapter also introduces how we generate the synthetic data and how we measure the
correctness of the Long-term Evolution Method.
3.1 Synthetic Data Generator
We use our own synthetic data to test Weiux’s algorithm, so in this section we describe
how we generate the synthetic data. To verify that Weiux’s algorithm is robust and
detects evolution correctly, we need datasets with evolution ground truth against which to
judge its performance. To the best of our knowledge, no real-world dataset contains
ground truth for community evolution, so we generate a synthetic dataset ourselves and
assign the community evolution manually. If the Long-term Evolution Method precisely
detects the community evolution in the synthetic dataset, this indicates that the algorithm
is useful and correct.
Our synthetic data generator is based on the Stochastic Block Model [4], a widely used
model for generating synthetic datasets with community ground truth. In a simple
Stochastic Block Model, each node is assigned to one of $K$ communities. Assume node
$i$ is in community $C_i$ and node $j$ is in community $C_j$. An edge is placed between
node $i$ and node $j$ with probability $\psi_{C_i C_j}$, so a $K \times K$ matrix $\psi$
defines how all the edges are constructed [5]. Here we define $\psi_{C_i C_j} = p_{in}$ if
$C_i = C_j$, and $\psi_{C_i C_j} = p_{out}$ otherwise.
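For reference, the matrix $\psi$ under this choice, and the expected edge counts it implies, can be written out as follows. Here $n_k$, the number of nodes in community $k$, is our own notation and does not appear in the original text.

```latex
\psi =
\begin{pmatrix}
p_{in}  & p_{out} & \cdots & p_{out} \\
p_{out} & p_{in}  & \cdots & p_{out} \\
\vdots  & \vdots  & \ddots & \vdots  \\
p_{out} & p_{out} & \cdots & p_{in}
\end{pmatrix} \in [0,1]^{K \times K},
\qquad
\mathbb{E}[\text{edges inside } k] = \binom{n_k}{2}\, p_{in},
\qquad
\mathbb{E}[\text{edges between } k \text{ and } l] = n_k\, n_l\, p_{out}.
```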
To build simple but representative synthetic data, we consider only two kinds of
community evolution in our synthetic data: “merge” and “split”. We first generate a
graph with communities using the Stochastic Block Model. After generating one graph,
we assign which communities should merge and which should split. Finally, we generate
a new graph that follows our evolution assignment. The details of the synthetic data
generator are as follows.
Step 1 Generate one graph with the Stochastic Block Model. In this step, we generate a
graph with the Stochastic Block Model using two probabilities, $p_{in}$ and $p_{out}$. If
two nodes are in the same community, they are connected by an edge with probability
$p_{in}$; if they are in different communities, they are connected by an edge with
probability $p_{out}$. To generate graphs with community structure, $p_{in}$ must be
larger than $p_{out}$, because by definition nodes in the same community are more
densely connected than nodes in different communities. In this step, we also set the
number of communities and the number of nodes in each community. Given $p_{in}$ and
$p_{out}$, we can then generate a graph with community ground truth based on the
Stochastic Block Model.
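A minimal sketch of Step 1 in plain Python follows, assuming nodes are numbered consecutively and community sizes are given as a list. The function name `generate_sbm` and its return format are our own choices, not part of the original generator.

```python
import random
from itertools import combinations


def generate_sbm(sizes, p_in, p_out, seed=None):
    """Sample one graph from a two-probability Stochastic Block Model.

    sizes[k] is the number of nodes in community k; nodes are numbered 0..N-1.
    Returns (membership, edges): membership maps node -> community index and
    edges is a set of (i, j) pairs with i < j.
    """
    if not p_in > p_out:
        raise ValueError("p_in must be larger than p_out to obtain community structure")
    rng = random.Random(seed)
    membership = {}
    node = 0
    for community, size in enumerate(sizes):
        for _ in range(size):
            membership[node] = community
            node += 1
    edges = set()
    for i, j in combinations(range(node), 2):
        # psi[C_i][C_j] is p_in on the diagonal and p_out everywhere else.
        p = p_in if membership[i] == membership[j] else p_out
        if rng.random() < p:
            edges.add((i, j))
    return membership, edges
```

With `p_in=1.0` and `p_out=0.0`, for example `generate_sbm([5, 5], 1.0, 0.0)`, every community becomes a clique and no edge crosses communities, which is a quick sanity check. Looping over all node pairs costs O(N²) time, which is acceptable for small synthetic graphs.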
Step 2 Assign community evolution. In this step, we decide which communities should
merge or split in the next timeslot. We first separate the communities into two groups:
communities in the first group will merge with each other in the next timeslot, and each
community in the second group will split into two communities. For example, we can
assign community 1.1 and community 1.2 to merge into community 2.1, and community
1.3 to split into community 2.2 and community 2.3, where community a.b denotes the
b-th community in timeslot a. In this example, community 1.1, community 1.2, and
community 2.1 are in the same evolution chain. To simplify computation and analysis,
merges and splits always involve exactly two communities: two communities become
one, or one community becomes two. Three or more communities never merge into one,
and no community splits into three or more communities. In this step, we also decide the
evolution for the timeslot after the next one: if two communities merge into one
community in the next timeslot, the merged community will split into two communities
in the timeslot after that. For example, if community 1.1 and community 1.2 merge into
community 2.1, then, by our design, community 2.1 will split into community 3.1 and
community 3.2. We require a merged community to split in the following timeslot in
order to keep the synthetic data simple and to reduce the time needed to generate
community evolution.
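The assignment in Step 2 could be sketched as follows. How the free communities are divided into the merging group and the splitting group is not specified above, so the `merge_fraction` parameter and the random shuffling are hypothetical; the pairwise merging, the two-way splitting, the a.b labels, and the rule that merged communities split in the next step follow the description.

```python
import random
from itertools import count


def assign_evolution(communities, must_split, next_t, merge_fraction=0.5, seed=None):
    """Plan the next timeslot's merges and splits (Step 2), as a sketch.

    communities: labels such as "1.1" in the present timeslot.
    must_split: labels produced by a merge in the previous step; by design they split now.
    next_t: index a of the next timeslot, used to build new labels "a.b".
    merge_fraction: hypothetical knob for the share of free communities that merge.
    Returns (plan, merged): plan maps each new label to its source communities
    (two sources = merge; one source under two labels = split), and merged holds
    the new labels created by merges, which must split in the following step.
    """
    rng = random.Random(seed)
    free = [c for c in communities if c not in must_split]
    rng.shuffle(free)
    # Merges are strictly pairwise, so an even number of communities merge.
    n_merge = min(len(free), int(len(free) * merge_fraction)) // 2 * 2
    merging = free[:n_merge]
    splitting = free[n_merge:] + [c for c in communities if c in must_split]

    labels = (f"{next_t}.{k}" for k in count(1))
    plan, merged = {}, set()
    for first, second in zip(merging[0::2], merging[1::2]):
        label = next(labels)
        plan[label] = [first, second]
        merged.add(label)
    for c in splitting:
        plan[next(labels)] = [c]  # each split yields exactly two new communities
        plan[next(labels)] = [c]
    return plan, merged
```

Feeding the returned `merged` set back in as `must_split` for the next call reproduces the merge-then-split pattern described above.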
Step 3 Move nodes to new communities and reconstruct the graph. After assigning the community evolution, we move nodes one by one from the communities in the present
timeslot to the communities assigned to them in the next timeslot, and reconstruct a new
graph. Every node object has a variable recording which community the node is in;
changing the value of this variable moves the node to a new community. When a node
is moved to a new community, some of its edges are reconstructed with the same
probabilities used to construct the Stochastic Block Model in Step 1; which edges are
reconstructed is described below. There are two possible evolutions of communities:
merge and split. For two communities that will merge into a new community, nodes
break their edges within their own community and with the community they are going
to merge with, and the edges within the new community are generated with probability
$p_{in}$. For example, if a node is originally in community 1.1 and community 1.1 is
assigned to merge with community 1.2 into community 2.1, the node breaks all its edges
to nodes in community 1.1 and community 1.2 but keeps its edges to other communities.
The edges in community 2.1 are then regenerated with probability $p_{in}$. For a
community that will split into two communities, the nodes in the community break their
edges with each other; the edges within each of the two new communities are then
generated with probability $p_{in}$, and the edges between the two new communities
with probability $p_{out}$. For example, if a node is originally in community 1.3 and
community 1.3 is going to split into community 2.2 and community 2.3, the node breaks
all its edges to nodes in community 1.3, and the edges between nodes in community 2.2
and nodes in community 2.3 are generated with probability $p_{out}$. Note that edges
connecting two nodes from different communities that do not merge together remain
unchanged. For example, if community 1.1 merges with community 1.2, the edges
between nodes in community 1.1 and nodes in communities other than community 1.2
remain unchanged. After all nodes are moved to their new communities and all affected
edges are reconstructed with the probabilities set in Step 1, we obtain a new graph with
new communities, new edges, and community evolution ground truth. With this
procedure, the probability of an edge between two nodes in the same community and
the probability of an edge between two nodes in different communities remain equal to
those used in Step 1. Moreover, the connections between communities that do not
evolve together remain the same, which is more reasonable and more realistic for
community evolution.
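The following sketch applies the Step 3 rewiring to a graph stored as a node-to-community dictionary and an edge set. It assumes that every present community appears in the plan produced by Step 2, and it divides a splitting community's nodes into two random halves; that halving rule is our assumption, since the text does not say how nodes are distributed between the two new communities.

```python
import random
from itertools import combinations


def evolve_graph(membership, edges, plan, p_in, p_out, seed=None):
    """Move nodes to their new communities and rewire affected edges (Step 3 sketch).

    membership: node -> community in the present timeslot.
    edges: set of (i, j) pairs with i < j.
    plan: new label -> source communities, e.g. {"2.1": ["1.1", "1.2"],
        "2.2": ["1.3"], "2.3": ["1.3"]}; every present community must appear.
    """
    rng = random.Random(seed)
    # Group new labels by the event (one merge or one split) that creates them.
    event_of = {}  # old community -> event key
    targets = {}   # event key -> new labels produced by that event
    for label, sources in plan.items():
        key = tuple(sorted(sources))
        targets.setdefault(key, []).append(label)
        for src in sources:
            event_of[src] = key

    # Move nodes: a merge sends every node to one label; a split sends a
    # random half to each new label (assumed rule).
    nodes_by_event = {}
    for node, old in membership.items():
        nodes_by_event.setdefault(event_of[old], []).append(node)
    new_membership = {}
    for key, nodes in nodes_by_event.items():
        labels = targets[key]
        rng.shuffle(nodes)
        half = (len(nodes) + 1) // 2 if len(labels) > 1 else len(nodes)
        for idx, node in enumerate(nodes):
            new_membership[node] = labels[0] if idx < half else labels[1]

    # Rewire: pairs whose old communities take part in the same event are
    # regenerated; edges between unrelated communities are kept unchanged.
    new_edges = set()
    for i, j in combinations(sorted(membership), 2):
        if event_of[membership[i]] != event_of[membership[j]]:
            if (i, j) in edges:
                new_edges.add((i, j))
            continue
        p = p_in if new_membership[i] == new_membership[j] else p_out
        if rng.random() < p:
            new_edges.add((i, j))
    return new_membership, new_edges
```

Keying each event by its sorted source communities lets one comparison decide whether a node pair is rewired, which mirrors the rule that only edges among communities evolving together change.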
Step 4 Add core nodes and noise. In this step, we modify the synthetic data by setting some nodes as core nodes of their communities and by adding noise when moving
nodes to new communities. To make the synthetic data more similar to real-world cases,
we add noise: nodes may randomly move to communities other than the ones we
assigned. We use a probability called the “correct migration probability” to control the
noise. The larger the correct migration probability, the less noise there is and the more
stable the evolution is. Core nodes simulate the core structure of communities. There are
two key points about core nodes in the new synthetic data: (1) core nodes connect to all
the nodes in the same community, and (2) core nodes always move to the correct
communities assigned in the evolution. Because core nodes always move to their new
communities correctly, the noise added to the synthetic data does not affect them.
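The two mechanisms of Step 4 are sketched below as separate helpers. We assume `perturb_assignment` would be applied to the intended node-to-community assignment before the Step 3 rewiring and `connect_core_nodes` afterwards; this ordering, the uniform choice of the wrong community, and both function names are our assumptions.

```python
import random


def perturb_assignment(intended, core_nodes, q, seed=None):
    """Apply migration noise to an intended node -> community assignment.

    q is the correct migration probability: each non-core node keeps its
    assigned community with probability q and otherwise moves to a different
    community chosen uniformly at random (uniform choice is an assumption).
    Core nodes always follow the assignment.
    """
    rng = random.Random(seed)
    communities = sorted(set(intended.values()))
    noisy = {}
    for node, target in intended.items():
        others = [c for c in communities if c != target]
        if node in core_nodes or not others or rng.random() < q:
            noisy[node] = target
        else:
            noisy[node] = rng.choice(others)
    return noisy


def connect_core_nodes(membership, edges, core_nodes):
    """Return a copy of `edges` in which every core node links to all members of its community."""
    result = set(edges)
    for core in core_nodes:
        for node, community in membership.items():
            if node != core and community == membership[core]:
                result.add((min(core, node), max(core, node)))
    return result
```

Setting `q=1.0` reproduces the noise-free generator, while lowering `q` moves more non-core nodes to unassigned communities.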
Step 5 Repeat Step 2 to Step 4 to get the required number of graphs. After Step 4,
we obtain a new graph for the first evolution, and the second evolution has already been
assigned in Step 2. Repeating Step 2 to Step 4 gives us one more graph and the pattern of
the third evolution.
3.2 Measure Correctness of Evolution Detection
To judge how well Weiux’s algorithm performs, we use normalized mutual information
to evaluate the community evolution detection result. Because the algorithm we use
(LM1) generates community evolution chains rather than community evolution types,
we cannot simply compare the evolution types of the detection result with those of the
ground truth. We therefore need a method to measure the precision of the detected
evolution chains. Algorithms LM1, LM2, and LM3 share the same last step: they apply
community detection to a bipartite or multipartite graph to obtain community evolution
chains, so the evolution chains we detect are in fact communities in these graphs.
Finding communities in graphs is a well-known problem in online social network
analysis, and the many works that propose community detection algorithms need a
method to evaluate them. Normalized mutual information [12] is one of the most
commonly used measures for comparing a community detection result with the
community ground truth. Since the evolution chains are the communities detected in the
bipartite or multipartite graphs, we can use normalized mutual information to measure
our result.
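For hard (non-overlapping) partitions, normalized mutual information can be computed as below, treating each detected community in the bipartite or multipartite graph as an item and its evolution chain ID as the label. This sketch uses the normalization $2I(X;Y)/(H(X)+H(Y))$; whether [12] uses this normalization or an overlapping variant is not stated in this section, so treat that choice as an assumption.

```python
import math
from collections import Counter


def normalized_mutual_information(truth, detected):
    """NMI = 2 I(X;Y) / (H(X) + H(Y)) for two hard partitions of the same items.

    truth[k] and detected[k] are the ground-truth and detected labels of item k,
    e.g. the evolution chain IDs assigned to community k.
    """
    if len(truth) != len(detected):
        raise ValueError("both labelings must cover the same items")
    n = len(truth)
    count_t = Counter(truth)
    count_d = Counter(detected)
    joint = Counter(zip(truth, detected))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        c / n * math.log(c * n / (count_t[t] * count_d[d]))
        for (t, d), c in joint.items()
    )
    h_t, h_d = entropy(count_t), entropy(count_d)
    if h_t + h_d == 0:
        return 1.0  # both labelings put every item in a single group: identical
    return 2 * mutual_info / (h_t + h_d)
```

The score depends only on how items are grouped, not on the label values, so detected chain IDs do not need to match the ground-truth IDs.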
3.3 Definition of Evolution Types
We define our own evolution types in this work. Most community evolution detection
algorithms try to detect evolution types, which is very different from Weiux’s algorithm,
whose output is evolution chains. One way to compare its results with those of other
algorithms is therefore to generate evolution types rather than evolution chains.
Evolution types are also much easier to read and analyze than evolution chains. Since the
output of the Long-term Evolution Method is evolution chains, we propose a method to
transform evolution chains into evolution types. First, we define seven evolution types:
“Birth”, “Merge”, “Split”, “Growth”, “Shrink”, “Continue”, and “Death”. The concept of
each evolution type is as follows.
Birth “Birth” means that the community has just appeared in the present timeslot; no
community is in its history. When we focus on what happened to a community before,
this evolution type is one of the possible cases. However, if we focus on what will
happen to a community in the next timeslot, this evolution type can never be the answer,
because that would contradict its definition: no community is in its history.
Merge This evolution type indicates that a community was merged from other
communities or that it will merge with other communities, depending on which timeslot
we are talking about. If we are talking about what happened to a community before,
“Merge” means this community was merged from communities in the previous timeslot.
If we are talking about what will happen to a community, “Merge” means this
community in the present timeslot will merge with other communities and become new
communities in the next timeslot.
Split This evolution type means that a community was split from communities in the
previous timeslot, or that a community will split into multiple communities in the next
timeslot. Again, it depends on which timeslot we focus on.
Growth The type “Growth” is quite different from “Merge” and “Split”. “Merge”
and “Split” involve a change in the number of communities from the previous timeslot
to the present timeslot, or from the present timeslot to the next timeslot. “Growth” means
that the number of nodes in the communities increased when the communities evolved
into the present timeslot, or will increase when the communities evolve into the next
timeslot.
Shrink This evolution type is the opposite of “Growth”. The number of communities
does not change, but the number of nodes in the communities decreased when the
communities evolved into the present timeslot, or will decrease when the communities
evolve into the next timeslot.
Continue This evolution type means that, as with “Growth” and “Shrink”, the number of communities does not change, but the number of nodes in the communities remains the
same or shows no significant change when the communities evolved into the present
timeslot or when they evolve into the next timeslot. We explain how to distinguish
significant from insignificant change in Section 3.4.
Death This evolution type is the opposite of “Birth”. A community is given the
evolution type “Death” if it will disappear in the next timeslot, so that its evolution ends
in the present timeslot: no community in the next timeslot evolves from it. Unlike
“Birth”, this evolution type can only apply when we talk about a community’s future. If
we focus on what happened to a community before, “Death” can never be its type,
because no community evolves from a “Death” community.
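The seven types, together with the observation that “Birth” only describes a community’s past while “Death” only describes its future, can be encoded as a small enumeration. The names `EvolutionType`, `PAST_TYPES`, and `FUTURE_TYPES` are illustrative.

```python
from enum import Enum


class EvolutionType(Enum):
    BIRTH = "Birth"
    MERGE = "Merge"
    SPLIT = "Split"
    GROWTH = "Growth"
    SHRINK = "Shrink"
    CONTINUE = "Continue"
    DEATH = "Death"


# Looking backward (how a community came to be), "Death" is impossible;
# looking forward (what will happen to it next), "Birth" is impossible.
PAST_TYPES = frozenset(EvolutionType) - {EvolutionType.DEATH}
FUTURE_TYPES = frozenset(EvolutionType) - {EvolutionType.BIRTH}
```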
3.4 Method to Transfer Evolution Chain to Evolution Types
In this section, we give a method to transform the detected evolution chains into the
evolution types defined in Section 3.3. Because most works focus on what happens to a
community next, we also focus on the future of communities; as a result, “Birth” does
not appear in this method. We first decide whether a community is “Death”. If it is not,
we check whether it is “Merge” or “Split”. If it is neither “Merge” nor “Split”, we
compare the total number of nodes in communities to decide whether it belongs to
“Growth”, “Shrink”, or “Continue”. The details of the procedure are as follows.
Step 1 Decide whether the evolution type is “Death”. An evolution chain contains
many communities, written as entries c:t, meaning community c in time index t. If a
community is in the last timeslot of the evolution chain, it is given the type “Death”,
since no community of the chain is in the next timeslot. For example, in the evolution
chain “12:3, 22:4, 35:4”, the evolution type of community 22 and community 35 is
“Death”, because no community in this chain has time index 5. In contrast, the evolution
type of community 12 is not “Death”, because this chain contains two communities
(community 22 and community 35) in the following timeslot (time index 4). If a
community is not “Death”, we go to the next step.
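Step 1 can be sketched as follows, assuming chains are written as comma-separated `community:time` entries as in the example above. Following the example’s reasoning, a community is marked “Death” when no entry of the chain has the next time index. The function names are our own.

```python
def parse_chain(chain):
    """Parse an evolution chain such as "12:3, 22:4, 35:4" into (community, time) pairs."""
    pairs = []
    for item in chain.split(","):
        community, time = item.strip().split(":")
        pairs.append((int(community), int(time)))
    return pairs


def death_communities(chain):
    """Step 1: a community is "Death" if no community of the chain is in the next timeslot."""
    pairs = parse_chain(chain)
    times = {t for _, t in pairs}
    return {c for c, t in pairs if t + 1 not in times}
```

For the chain “12:3, 22:4, 35:4”, `death_communities` returns `{22, 35}`, matching the example.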
Step 2 Decide whether the evolution type is “Merge” or “Split”. In this step, we count
the communities in each timeslot of an evolution chain and compare the counts between
consecutive timeslots. If the number of communities in a given timeslot is less than the
number of communities in the next timeslot, the evolution type of the communities in
the given timeslot is “Split”; if it is greater, the evolution type is “Merge”. For example,
for the evolution chain “12:3, 22:4, 35:4”, there is one community with time index 3 and
there are two communities with time index 4. Because the number of communities in
time index 3 is less than that in time index 4, the community in time index 3 is given the
type “Split”. However, “Merge” and “Split” sometimes have different definitions. Some
may consider that only multiple communities merging into one community is a “Merge”,
whereas others may think that a large number of communities merging into a smaller
number of communities can also be called “Merge”. To make our transformation method
more flexible, we use a factor $\alpha$ to control the rule used to decide between
“Merge” and “Split”; $\alpha$ must be at least 1 so that the two rules cannot both hold.
Let $x$ be the number of communities in the present timeslot and $y$ the number of
communities in the next timeslot. The possible evolution types of the communities in
the present timeslot are given in Table 3-1.
Table 3-1 Rule of deciding “Merge” or “Split”
Compare $x$ and $y$           Evolution Type
$x > \alpha y$                Merge
$x < \frac{1}{\alpha} y$      Split
otherwise                     Growth, Shrink, or Continue
If $\frac{1}{\alpha} y \le x \le \alpha y$, we go to the next step to decide whether the
evolution type is “Growth”, “Shrink”, or “Continue”.
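Step 2 and Table 3-1 could be sketched as follows. `alpha` defaults to 1, under which the rule reduces to a plain comparison of community counts. The function names and the convention of returning `None` to defer to Step 3 are our own.

```python
from collections import Counter


def communities_per_timeslot(chain):
    """Count communities per time index in a chain such as "12:3, 22:4, 35:4"."""
    return Counter(int(item.split(":")[1]) for item in chain.split(","))


def merge_or_split(x, y, alpha=1.0):
    """Table 3-1: x communities in the present timeslot, y in the next one.

    Returns "Merge", "Split", or None when Step 3 must decide among
    "Growth", "Shrink", and "Continue". alpha >= 1 is required so that the
    Merge and Split conditions cannot both hold.
    """
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    if x > alpha * y:
        return "Merge"
    if x < y / alpha:
        return "Split"
    return None
```

For the chain “12:3, 22:4, 35:4”, the counts are one community at time index 3 and two at time index 4, so `merge_or_split(1, 2)` returns `"Split"`, as in the example. Raising `alpha` widens the middle band: with `x=3` and `y=2`, `alpha=1` gives `"Merge"`, whereas `alpha=1.5` defers to Step 3.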
Step 3 Decide whether the evolution type is “Growth”, “Shrink”, or “Continue”. In
this step, we compare the number of nodes in communities. We calculate the total
number of nodes in the communities of each timeslot and compare the totals between
consecutive timeslots. If the total number of nodes in a timeslot is larger than the total
number of nodes in the next timeslot, the communities in this timeslot are given the type
“Shrink”. In contrast, if the total number of nodes in a timeslot is smaller than that in the
next timeslot, the communities in this timeslot are given the type “Growth”. All other
cases are given the type “Continue”. For example, consider the evolution chain “35:3,
36:3, 50:4, 51:4”. Since there are two communities in time index 3 and also two in time
index 4, the evolution type of the communities in time index 3 is neither “Merge” nor
“Split”, so we must decide among the remaining types. Assume that communities 35, 36,
50, and 51 contain 20, 20, 10, and 10 nodes, respectively. The total number of nodes in
time index 3 is 20 + 20 = 40, and the total in time index 4 is 10 + 10 = 20. Because the
total in time index 3 is larger than that in time index 4 (40 > 20), the evolution type of
community 35 and community 36 should be “Shrink”. There is a problem that one may think the