Privacy and Quality Preserving Multimedia Data Aggregation for Participatory Sensing Systems


Fudong Qiu, Student Member, IEEE, Fan Wu, Member, IEEE, and Guihai Chen, Senior Member, IEEE

Abstract—With the popularity of mobile wireless devices equipped with various sensing capabilities, a new service paradigm named participatory sensing has emerged to provide users with brand-new life experiences. However, the wide application of participatory sensing faces its own challenges, among which preserving privacy and preserving multimedia data quality are two critical problems.

Unfortunately, none of the existing work has fully solved the problem of privacy and quality preserving participatory sensing with multimedia data. In this paper, we propose SLICER, the first k-anonymous privacy preserving scheme for participatory sensing with multimedia data. SLICER integrates a data coding technique and message transfer strategies to achieve strong protection of participants’ privacy while maintaining high data quality. Specifically, we study two kinds of data transfer strategies, namely transfer on meet up (TMU) and minimal cost transfer (MCT). For MCT, we propose two different but complementary algorithms, an approximation algorithm and a heuristic algorithm, for requirements of different strengths. Furthermore, we have implemented SLICER and evaluated its performance using publicly released taxi traces. Our evaluation results show that SLICER achieves high data quality with low computation and communication overhead.

Index Terms—Participatory Sensing, Privacy Preservation, K-Anonymity, Erasure Coding.


1 INTRODUCTION

The wide adoption of mobile communication equipment and the rapid advance of sensing technologies have made privately held mobile wireless devices widely available. These devices are low-cost, have strong processing capability and large storage, and are equipped with a number of embedded sensors (e.g., microphone, camera, accelerometer, gyroscope, and GPS). On the one hand, modern wireless communication technologies (e.g., 2G/3G/4G, Wi-Fi, and Bluetooth) make communication between mobile devices and infrastructure, as well as among mobile devices themselves, convenient and fast. On the other hand, mobile devices, especially smartphones, are no longer merely communication tools but multifunctional “computers”.

Participatory sensing [1] emerged as a new service paradigm using human-carried mobile devices, such as smart phones, for distributed data collection, exchange, analysis, and sharing.

With an estimated 6.8 billion mobile-cellular subscriptions worldwide [2], participatory sensing may provide unprecedented spatial coverage with very low or even no deployment cost. Compared with traditional decentralized data collection methods (e.g., wireless sensor networks), participatory sensing offers several outstanding advantages, including larger coverage, lower cost, mobility, a more sufficient energy supply, and more flexible interaction. Attracted by its practical and commercial value, developers have built many participatory sensing applications. For instance, GreenGPS [3] provides the most fuel-efficient routes to drivers; PEIR [4] presents a personal environmental impact report for every individual; PEPSI [5], [6] introduces a privacy-enhanced infrastructure for participatory sensing systems; ARTSense [7] proposes an anonymous reputation and trust mechanism for participatory sensing; Ikarus [8] uses sensor data collected during cross-country flights via participatory sensing applications to study thermal effects in the atmosphere; and PoolView [9] provides a privacy preserving architecture for stream data collection. In addition, participatory sensing has been widely used in many practical situations [1], such as environment measurement, health care, traffic monitoring, community service, and crowdsourcing.

• F. Qiu, F. Wu, and G. Chen are with the Department of Computer Science and Engineering, Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University, Shanghai 200240, P.R. China.

Emails: fdqiu@sjtu.edu.cn, {fwu, gchen}@cs.sjtu.edu.cn.

• F. Wu is the corresponding author.

• This work was supported in part by the State Key Development Program for Basic Research of China (973 project 2014CB340303 and 2012CB316201), in part by China NSF grants 61422208, 61472252, 61272443, and 61133006, in part by Shanghai Science and Technology funds 12PJ1404900 and 12ZR1414900, and in part by the Program for Changjiang Scholars and Innovative Research Team in University (IRT1158, PCSIRT), China. The opinions, findings, conclusions, and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies or the government.

However, participatory sensing faces a number of challenges. One major challenge is privacy preservation [10]–[17]. A sensing record sent to the service provider is usually tagged with spatio-temporal information indicating where and when the data were collected. From these records, a corrupt service provider may infer private information about the participants, such as their identities, home and office addresses, traveling paths, habits, and lifestyles. As a result, many users are reluctant to contribute any sensing records unless a proper privacy preservation scheme is applied. Without a sufficient number of participants, participatory sensing applications cannot maintain their quality of service at the expected level. Therefore, designing privacy preserving schemes for participatory sensing is highly important. Another major challenge is the variety of sensing data. Most existing participatory sensing applications collect only small pieces of sensing data (e.g., temperature, velocity, and geographic location).

However, more and more emerging applications collect information about the surrounding environment in multimedia formats (e.g., digital images and video) [18], which results in a much higher volume of sensing data. Simply applying existing privacy preserving schemes to participatory sensing with multimedia data is unsatisfactory. For multimedia sensing, these schemes either incur an unacceptable communication cost or badly degrade the utility/quality of the data.

In this paper, we present SLICER, a coding-based k-anonymous privacy preserving scheme for participatory sensing with multimedia data that works at the application layer. Intuitively, k-anonymity means that the service provider cannot identify the contributor of a sensing record from among a group of at least k participants. SLICER integrates a data coding technique and message exchange strategies. Together, these achieve strong protection of participants’ privacy while maintaining high data quality and incurring low communication and computation overhead.

The contributions of this work are listed as follows:

• We propose SLICER for participatory sensing with multimedia data, to achieve both k-anonymous privacy preservation and high data quality, with low communication and computation overhead.

• We design an erasure-coding-based scheme that encodes each sensing record into a number of data slices. Each slice can be delivered to the service provider either by another participant or by the record’s generator herself. When a proper slice exchange strategy is applied, the contributor of each sensing record is hidden in a group of at least k participants.

• We propose two kinds of strategies for slice transfer. The first, straightforward strategy is Transfer on Meet Up (TMU). Under TMU, a participant transfers a slice to another participant upon meeting her, and the receiver later delivers the slice to the service provider. The second kind, Minimal Cost Transfer (MCT), consists of two complementary sub-optimal strategies. These transfer the slices to a set of participants that are likely to be met within a required period of time. They minimize the total cost while guaranteeing with high probability that the sensing record is delivered to the service provider. Cost differences may arise from wireless communication fees, available bandwidth, battery power, and so on.

• We have implemented SLICER and evaluated its performance using publicly released real traces of taxis [19]. Evaluation results show that SLICER achieves high data quality, with low computation and communication overhead.

The rest of this paper is organized as follows. In Section 2, we briefly introduce some technical preliminaries, including the system model, privacy model, and design objectives. In Section 3, we describe our coding-based privacy preserving scheme (SLICER). There, we explain its basic rationale and detailed design, present the slice transfer algorithms, and give the necessary analysis and proof of privacy preservation. In Section 4, we present the evaluation results. In Section 5, we review related work and compare it with ours. Finally, we conclude the paper and point out potential directions for future work in Section 6.

[Figure: a service provider connected through base stations and wireless links to mobile nodes that sense electromagnetic signals, temperature, GPS location, digital images, video, and others.]
Fig. 1. The Architecture of Cloud-Based Participatory Sensing.

2 TECHNICAL PRELIMINARIES

In this section, we present the system model, the privacy model, and the objectives of our design.

2.1 System Model

We consider a cloud-based participatory sensing and service framework as shown in Fig. 1, in which there is a service provider and a number of mobile nodes/participants equipped with different kinds of sensors.

The service provider aggregates, classifies, analyzes, and stores sensing records reported by the participants, and provides query services based on these records. A mobile node/participant is a user carrying a portable, wireless-enabled device (e.g., smartphone, tablet, or laptop). In this paper, we use mobile node and participant interchangeably.

Participants can use their sensing devices to collect various kinds of environmental information, such as geographical location, temperature, electromagnetic signals, digital images, video, and so on. Most existing work focuses on short sensor readings. In contrast, we consider a participatory sensing system that accommodates multimedia information, such as digital images, audio, and video. We assume that participants can report sensing records directly through pre-existing communication infrastructure, including GSM, 3G/4G, and Wi-Fi, or indirectly with the help of other participants.

In this paper, we consider one service provider and a set $N = \{a_1, a_2, \ldots, a_n\}$ of participants. Each participant $a_i \in N$ would like to contribute her sensing records $R_i = \{\langle t_1, l_1, d_1 \rangle, \langle t_2, l_2, d_2 \rangle, \ldots\}$ to the service provider, but only when her privacy is properly protected. The triple $\langle t, l, d \rangle$ denotes a sensing record consisting of a timestamp, location information, and data. For ease of reading, the notations used in this paper are summarized in Table 1.


2.2 Privacy Model

Although participatory sensing provides a new service paradigm, its functionality relies on the contributions of participants.

Existing work [1], [11]–[13], [16], [17], [20]–[22] shows that contributed information may be misused to reveal the participants’ privacy [23]. Most users are unwilling to join participatory sensing applications unless their sensitive information is well protected from both the service provider and neighboring participants [12], [24], [25].

In this paper, we consider the problem of privacy preservation in a semi-honest model. In this model, the adversary correctly follows the protocol specification but attempts to learn additional information by analyzing the transcript of messages received during the execution [20], [26]–[31]. We classify attacks in the semi-honest model into two categories: external attacks and internal attacks. An external attack aims to obtain private information about participants by overhearing the messages passing through the wireless communication network. Such attacks can be prevented by end-to-end cryptographic schemes. Designing a scheme to prevent internal attacks is much more challenging. An internal attack may come from two kinds of entities: the service provider and the participants.

• Service provider’s attack: The service provider has full access to the sensing records reported by the participants. If no proper privacy preserving scheme is provided, it might infer a considerable amount of sensitive information about the participants (e.g., home address, frequently visited places, traveling path, and even lifestyle). For instance, the sensor readings collected by a user who drives from home to work might reveal her traveling path as well as her home address.

In this work, we focus on protecting users’ location/path privacy against the service provider. We assume that the service provider has no other background or correlated information about the participants. Protecting the privacy of the multimedia content itself is also important, but it is out of the scope of this work. Interested readers may refer to prior literature [32]–[34] for privacy processing techniques.

• Participants’ attack: Participants may receive some sensing records when they serve as relays for other participants (e.g., in [35]). Semi-honest participants might position themselves at critical locations and pretend to be relays in order to collect sensitive information. In this work, we assume that participants collude neither with the service provider nor with each other.

2.3 Design Objectives

The design of a privacy preserving scheme should prevent both external and internal attacks. Specifically, first, the design needs to prevent external eavesdroppers from obtaining any meaningful information. Second, the design needs to prevent the service provider from recognizing the identity of the participant who contributes a particular sensing record. It also needs to prevent participants from learning the content of the sensing records they relay. In particular, we require the privacy protection scheme to be k-anonymous [36] against the service provider. Here, k-anonymity is reached when the service provider can identify the participant who contributed a sensing record with probability no more than 1/k.

TABLE 1
Notations

Symbol | Description
$N = \{a_1, a_2, \ldots, a_n\}$ | Set of participants
$\langle t, l, d \rangle$ | An original sensing record
$R_i = \{\langle t_1, l_1, d_1 \rangle, \ldots\}$ | Set of sensing records of participant $a_i$
$m$ | Number of encoded slices per record
$k$ | Minimum number of slices needed to reconstruct a record
$EC(\cdot)$ | Erasure coding algorithm
$H(\cdot, \cdot)$ | Cryptographic hash function
$r_{ij}$ | Encoded slice
$r'_{ij}$ | Encrypted slice
$\mathrm{ENCRYPT}(\cdot, \cdot)$ | Asymmetric encryption function
$p(a_j)$ | Meeting probability
$c(a_j)$ | Cost of $a_j$ for delivering a slice
$P$ | Threshold probability
$x_j$ | Boolean decision variable
$\mathrm{DECRYPT}(\cdot, \cdot)$ | Asymmetric decryption function
$EC^{-1}(\cdot)$ | Decoding function

Definition 1 (K-Anonymous Participatory Sensing): A privacy preserving participatory sensing scheme satisfies k-anonymity against the service provider if, for any sensing record reported to the service provider, the service provider cannot distinguish the generator of the record from among a group of at least k participants.

Besides the objective on privacy preservation, the design should also satisfy the following requirements:

• The design should maintain high quality of the sensor readings.

• The design should be tolerant of packet/message loss.

• The design should induce only low computation and communication overhead.

3 CODING-BASED PRIVACY PRESERVING SCHEME

In this section, we present the design of our coding-based k-anonymous privacy preserving scheme, SLICER. We first outline the general idea of SLICER and then explain each component in detail. Finally, we analyze the privacy preservation properties of SLICER.

3.1 Design Rationale

The main idea of SLICER is to hide the generator of each sensing record among a group of at least k participants, through whom all parts of the sensing record are reported to the service provider. Thus, the service provider cannot identify the generator of the original sensing record among these (at least k) participants. We illustrate the design challenges and our ideas below.

[Figure: sensing (sensor reading, user ID, timestamp, location) → erasure coding → encoded slice + tag (cryptographic hash function) → encryption → exchange of encrypted slices with other users → reporting to the service provider by the generator and others (some slices may be lost) → decryption and reconstruction.]
Fig. 2. Work Flow of SLICER

(1) Sensing Record Coding

If we simply transferred the (encrypted) sensing record to k participants, the communication overhead would be k times the size of the sensing record. This is unacceptable, especially when the record contains multimedia data. Therefore, we use erasure coding to encode each sensing record into a number of small slices. Each slice can then be transferred to a participant, who reports it to the service provider. Once the service provider receives enough slices (not necessarily all of them), it can decode the original sensing record. Erasure coding has two advantages. First, it greatly reduces the communication overhead of transferring the sensing record (here, its slices) to other participants. Second, it increases the reliability of the system, since slices may be lost for various reasons.
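The following back-of-the-envelope comparison makes the overhead argument concrete. It uses hypothetical numbers: a record of size $s = 2$ MB (the symbol $s$ is introduced here only for illustration), $k = 4$, and $m = 6$. It also relies on the approximation, noted for Tornado codes in Section 3.2, that any k slices together are about the size of the original record, so each slice is roughly $s/k$:

$$\underbrace{k \cdot s}_{\text{replication}} = 4 \times 2\ \text{MB} = 8\ \text{MB}, \qquad \underbrace{m \cdot \tfrac{s}{k}}_{\text{erasure coding}} = 6 \times 0.5\ \text{MB} = 3\ \text{MB}.$$

This count includes the slice that the generator reports herself. Even so, the coded scheme moves about $m/k = 1.5$ record-sizes of data instead of $k = 4$. The exact figures depend on the code actually used.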

(2) Transfer Strategy

Since the slices need to be transferred to a set of participants, the choice of these participants affects the performance of the scheme. The straightforward strategy is to transfer a slice whenever another participant is met. However, when participants have different capabilities, this may not be the best strategy. In this paper, we consider the case in which participants incur different costs to deliver the same slice. These cost differences may arise from wireless communication fees, available bandwidth, battery power, and so on. Based on our analysis, we also propose two sub-optimal slice transfer strategies that minimize the total cost of delivering the slices (Section 3.3.2).

Fig. 2 shows the general work flow of SLICER. A sensing record contains the sensor reading and spatio-temporal information. SLICER encodes the sensing record using an erasure coding technique (e.g., Tornado [37]). It then attaches a unique tag to each encoded slice and encrypts the result to generate encrypted slices. Next, SLICER selectively transfers the encrypted slices to target participants, following one of its transfer strategies. The slices are delivered to the service provider through different participants. Finally, once enough slices are received, the service provider decrypts them and reconstructs the original sensing record.

In the following subsections, we present the design details of SLICER’s major components, including Coding, Transferring, and Reconstructing.

3.2 Coding

Algorithm 1 Sensing Record Coding Algorithm
Input: A sensing record ⟨t, l, d⟩ from participant a_i ∈ N, and coding rate k/m.
Output: Encrypted slices {r'_ij | 1 ≤ j ≤ m}.
    {r_ij | 1 ≤ j ≤ m} ← EC(⟨t, l, d⟩);
    nonce ← random();
    tag ← H(i, nonce);
    for j = 1 to m do
        r'_ij ← ENCRYPT(r_ij || tag, KEY_pub);
    end for
    return {r'_ij | 1 ≤ j ≤ m};

Algorithm 1 shows the pseudo-code of our sensing record coding algorithm. Given a sensing record < t, l, d > from participant ai∈ N, we encode it into a number of slices, each of which will be delivered to the service provider through different participants. We encode the record < t, l, d > using erasure coding (e.g., Reed-Solomon [38] and Tornado [37]).

Basically, erasure coding breaks a sensing record into fragments and encodes them, with added redundancy, into m slices. The original record can be reconstructed from any k of the m encoded slices, where m > k. The ratio k/m is the coding rate. For Tornado Codes [37], the combined size of any k slices is approximately equal to the size of the original record. Intuitively, if the service provider decodes the record from k slices reported by k different participants, the real generator of the record is hidden in a group of k participants. This provides a privacy guarantee of k-anonymity.
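A small Reed-Solomon-style code can illustrate the any-k-of-m property. This is a minimal sketch under explicit assumptions, not SLICER's implementation. It works over the prime field GF(257) and treats every k bytes of the record as the coefficients of a polynomial. Slice j stores that polynomial's value at x = j + 1. Any k slices then determine the polynomial by solving a Vandermonde system. The names `rs_encode` and `rs_decode` are hypothetical. A deployed system would use an optimized Tornado or Reed-Solomon library instead.

```python
PRIME = 257  # field size; every byte value 0..255 is a field element


def _poly_eval(coeffs: list[int], x: int) -> int:
    """Evaluate sum(coeffs[e] * x**e) mod PRIME using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc


def _invert_mod(matrix: list[list[int]]) -> list[list[int]]:
    """Invert a square matrix over GF(PRIME) by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [int(r == col) for col in range(n)] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % PRIME)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], PRIME - 2, PRIME)  # Fermat inverse in a prime field
        aug[col] = [v * inv % PRIME for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % PRIME for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rs_encode(record: bytes, k: int, m: int) -> list[list[int]]:
    """Encode a record into m slices; any k of them suffice to decode."""
    if not 0 < k <= m < PRIME:
        raise ValueError("need 0 < k <= m < PRIME")
    padded = record + bytes(-len(record) % k)  # zero-pad to a multiple of k
    slices: list[list[int]] = [[] for _ in range(m)]
    for off in range(0, len(padded), k):
        coeffs = list(padded[off:off + k])
        for j in range(m):
            slices[j].append(_poly_eval(coeffs, j + 1))
    return slices


def rs_decode(received: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the record from {slice index: slice} holding at least k slices."""
    if len(received) < k:
        raise ValueError("fewer than k slices received")
    idx = sorted(received)[:k]
    vandermonde = [[pow(j + 1, e, PRIME) for e in range(k)] for j in idx]
    inverse = _invert_mod(vandermonde)
    out = bytearray()
    for pos in range(len(received[idx[0]])):
        ys = [received[j][pos] for j in idx]
        out.extend(sum(a * y for a, y in zip(row, ys)) % PRIME for row in inverse)
    return bytes(out[:length])
```

With k = 3 and m = 5, each slice holds ceil(len/3) symbols, and any three of the five slices rebuild the record. Because a symbol can equal 256, a real wire format would need slightly more than one byte per symbol. This toy version ignores serialization.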

Furthermore, SLICER inherits the loss tolerance of erasure coding to achieve a high record reconstruction ratio with relatively low communication overhead. We denote the encoded slices by $\{r_{ij} \mid 1 \le j \le m\}$:

$$\{r_{ij} \mid 1 \le j \le m\} = EC(\langle t, l, d \rangle),$$

where $EC(\cdot)$ is one of the erasure coding algorithms.

[Figure: upper part, A's path from Home to the Office passing the homes of B, C, and D, whom she meets at T1, T2, and T3; lower part, the slices each participant holds over time.]
Fig. 3. An Example of Transfer on Meet Up

The service provider may receive a large number of encoded slices originating from various participants’ sensing records. We therefore tag the slices to indicate which slices belong to the same record. Directly tagging a slice with its generator’s ID and a sequence number would reveal the generator’s identity to the service provider. Instead, we adopt a cryptographic hash function (e.g., SHA-1 [39]) to create the tag:

$$tag = H(i, nonce),$$

where $H(\cdot, \cdot)$ is a cryptographic hash function and $nonce$ is an arbitrary number. In practice, pseudo-random number generators are usually seeded with the (discrete) current time. If multiple participants happen to initialize their generators at the same time, they generate the same nonce sequence. Encoded slices from different sensing records then receive the same tag, which would cause sensing record reconstruction to fail. Therefore, we combine the participant’s ID with the randomly generated nonce to eliminate the harm of nonce collisions. Because using IDs in plaintext would reveal the participants’ privacy, we hash the combination of the generator’s ID and the nonce.

To prevent the content of encoded slices from being revealed to external attackers and neighboring participants, we encrypt each encoded slice together with the tag. We use the public key $KEY_{pub}$ of the service provider and obtain the encrypted slices:

$$r'_{ij} = \mathrm{ENCRYPT}(r_{ij} \,\|\, tag, KEY_{pub}), \quad 1 \le j \le m,$$

where $\mathrm{ENCRYPT}(\cdot, \cdot)$ is an asymmetric encryption function and $\|$ is the string concatenation operator.

3.3 Transferring

To prevent the service provider from recognizing participants’ identities from the collected sensing records, the generator cannot send all slices of a sensing record directly to the service provider. To guarantee k-anonymity, at least k − 1 slices need to be delivered by participants other than the generator.

We note that although all the slices can be transferred to and delivered by participants other than the generator, SLICER requires the generator to report (at least) one slice to the service provider by herself, in order to guarantee the integrity of the sensing record.

In this paper, we consider two kinds of slice transferring strategies: transfer on meet up (TMU) and minimal cost transfer (MCT).

3.3.1 Transfer on Meet Up (TMU)

This is the straightforward way to spread the encrypted slices. Whenever the generator meets another participant, she transfers one slice of each sensing record to that participant. Later, all participants, including the generator, report their own slices and the slices they have received to the service provider.

Fig. 3 shows a toy example of applying the strategy of TMU.

Suppose participant A is going from her home to the office and meets participants B, C, and D in sequence on her way. The upper part of Fig. 3 shows A's path, and the lower part shows the slices each participant holds over time. Assume that A, B, C, and D initially hold 3, 0, 2, and 3 slices of their own, respectively. Meetings occur at T1, T2, and T3, and at each meeting a participant transfers one slice to the participant she meets.

For example, at T1, A transfers one slice to B, after which A has 2 slices left and B holds 1 slice from A. After all three meetings, A has 1 own slice plus 2 slices from C and D, and B has 1 slice from A. C has 1 own slice and 1 from A, and D has 2 own slices. Note that A keeps her last own slice at T3, since the generator must report at least one slice herself.

3.3.2 Minimal Cost Transfer (MCT)

In this section, we consider the case in which different participants incur different costs to deliver a slice. Cost differences may arise from wireless communication fees, available bandwidth, battery power, and so on. Intuitively, high cost reduces people’s enthusiasm for participating in sensing activities. Here, we present our algorithms for the Minimal Cost Transfer (MCT) problem.

Each sensing record has an expiration time, before which the record must be delivered to the service provider. We assume that each participant $a_i \in N$ knows a set $N(a_i) \subset N$ of participants that she might meet before the sensing record expires. For each participant $a_j \in N(a_i)$, let $p(a_j)$ be the probability of meeting $a_j$ before the expiration time, and let $c(a_j)$ be the cost for $a_j$ to deliver a slice.

We assume that a mobility prediction module [40]–[42] predicts $N(a_i)$, $(p(a_j))_{a_j \in N(a_i)}$, and $(c(a_j))_{a_j \in N(a_i)}$ from historical event logs.

The objective of MCT is to pick a subset of participants F ⊆ N(ai) as forwarders of the slices to minimize the cost for delivering the slices, satisfying one of the following requirements.

• Requirement 1: The expected number of participants met from the forwarder set F is at least m − 1; we call this the MCT-EXP problem.

• Requirement 2: The probability of meeting at least m − 1 participants from F is at least P (0 ≤ P ≤ 1); we call this the MCT-PRO problem.

Next, we will present our approaches to solve the above two problems, MCT-EXP problem and MCT-PRO problem.

Solution to MCT-EXP Problem

We first consider the MCT-EXP problem (i.e., the MCT problem with Requirement 1). It can be formulated as a binary program that minimizes the expected delivery cost of the slices, as follows:

Objective:

$$\text{Minimize} \sum_{a_j \in N(a_i)} c(a_j)\, p(a_j)\, x_j$$

Subject to:

$$\sum_{a_j \in N(a_i)} p(a_j)\, x_j \ge m - 1, \tag{1}$$

$$x_j \in \{0, 1\}, \quad \forall a_j \in N(a_i). \tag{2}$$

Here, constraint (1) guarantees that participant $a_i$ is expected to meet at least $m - 1$ other participants in the selected forwarder set $F = \{a_j \in N(a_i) \mid x_j = 1\}$. Constraint (2) gives the possible values of $x_j$: if $a_j$ is selected as a candidate for delivering a slice, then $x_j = 1$; otherwise, $x_j = 0$.

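We note that the above formulation of the MCT-EXP problem can be reduced to the 0-1 Knapsack Problem [43]. The reduced problem maximizes the expected cost of the complement of the forwarder set, i.e., of the participants in $N(a_i)$ that are not selected. The total $\sum_{a_j \in N(a_i)} c(a_j)p(a_j)$ is a constant. Hence, minimizing $\sum c(a_j)p(a_j)x_j$ is equivalent to maximizing $\sum c(a_j)p(a_j)(1 - x_j)$, and constraint (1) can be rewritten in the same complementary form. The reformulated program is:

Objective:

$$\text{Maximize} \sum_{a_j \in N(a_i)} c(a_j)\, p(a_j)\, (1 - x_j)$$

Subject to:

$$\sum_{a_j \in N(a_i)} p(a_j)\, (1 - x_j) \le \sum_{a_j \in N(a_i)} p(a_j) - (m - 1), \tag{3}$$

$$x_j \in \{0, 1\}, \quad \forall a_j \in N(a_i). \tag{4}$$

In the reduced 0-1 Knapsack Problem, $p(a_j)$ and $c(a_j)p(a_j)$ are the weight and value of the $j$th item, respectively. The capacity of the knapsack is $\sum_{a_j \in N(a_i)} p(a_j) - (m - 1)$.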

Here, constraint (3) guarantees that the total weight does not exceed the knapsack’s capacity, and constraint (4) is exactly the same as constraint (2). Consequently, we can solve the MCT-EXP problem with a Fully Polynomial Time Approximation Scheme (FPTAS) [43]. The FPTAS runs in polynomial time and returns a solution within a factor of $1 - \epsilon$ of the optimum.

Due to space limitations, we refer the reader to [43] for the detailed solution.

Solution to MCT-PRO Problem

Although the FPTAS solves the MCT-EXP problem, it is still not satisfactory, because it cannot guarantee a high probability of meeting m − 1 participants. Therefore, we further consider the MCT-PRO problem. It strictly requires that the probability of meeting at least m − 1 participants from the forwarder set F be at least a preset level P. Again, we formulate the MCT-PRO problem as a binary program that minimizes the average delivery cost of the m − 1 slices, as follows:

Objective:

$$\text{Minimize} \quad \frac{\displaystyle \sum_{\vec{y}:\, \sum_{a_g \in N(a_i)} x_g y_g = m-1} \Big( \sum_{a_j \in N(a_i)} c(a_j)\, x_j y_j \Big) \prod_{a_j \in N(a_i)} p(a_j)^{y_j}}{\displaystyle \sum_{\vec{y}:\, \sum_{a_g \in N(a_i)} x_g y_g = m-1} \ \prod_{a_j \in N(a_i)} p(a_j)^{y_j}}$$

Subject to:

$$\sum_{t=m-1}^{\sum_{a_g \in N(a_i)} x_g} \ \sum_{\vec{y}:\, \sum_{a_g \in N(a_i)} x_g y_g = t} \ \prod_{a_j \in N(a_i)} \Big( p(a_j)^{y_j} \cdot \big(1 - p(a_j)\big)^{1 - y_j} \Big) \ge P, \tag{5}$$

$$x_j \in \{0, 1\}, \quad \forall a_j \in N(a_i). \tag{6}$$

Here, the numerator of the objective computes the total “weighted” cost of all possible combinations of $m - 1$ participants from the selected forwarder set $F = \{a_j \in N(a_i) \mid x_j = 1\}$. The denominator is the total “weight” of these combinations. The “weight” of a combination of $m - 1$ participants is the probability that $a_i$ meets all of them. Consequently, the objective minimizes the weighted-average cost of delivering the slices. Constraint (5) guarantees that $a_i$ meets at least $m - 1$ participants in $F$ with probability at least $P$. Constraint (6) is exactly the same as constraint (2). In the binary program, $\vec{y}$ is a binary vector with $|N(a_i)|$ bits. Since this binary program cannot be solved efficiently in polynomial time, we propose a polynomial-time greedy algorithm that achieves good performance in most cases.
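The MCT-PRO formulas can be checked on small instances by brute force. This sketch evaluates both the objective and the left-hand side of constraint (5) for a given forwarder set $F$. It restricts the enumeration to members of $F$, which is equivalent to the formulas as written. For participants outside $F$, the free $y_j$ multiply the numerator and denominator of the objective by the same factor $1 + p(a_j)$, which cancels. In constraint (5), the same free $y_j$ contribute $p(a_j) + (1 - p(a_j)) = 1$. The enumeration is exponential in $|F|$, so this is only a checking aid with hypothetical function names, not a solver.

```python
from itertools import combinations
from math import prod


def weighted_average_cost(F: list[int], p: list[float], c: list[float], m: int) -> float:
    """MCT-PRO objective for forwarder set F, by enumerating its (m-1)-subsets."""
    if len(F) < m - 1:
        raise ValueError("F has fewer than m - 1 participants")
    num = den = 0.0
    for S in combinations(F, m - 1):
        weight = prod(p[j] for j in S)  # probability of meeting every member of S
        num += weight * sum(c[j] for j in S)
        den += weight
    return num / den  # assumes some subset has positive weight


def prob_meet_at_least(F: list[int], p: list[float], need: int) -> float:
    """Left-hand side of constraint (5): P(at least `need` members of F are met)."""
    total = 0.0
    for t in range(need, len(F) + 1):
        for S in combinations(F, t):
            met = set(S)
            total += prod(p[j] if j in met else 1 - p[j] for j in F)
    return total
```

For $F = \{a_1, a_2, a_3\}$ with $p = (0.9, 0.8, 0.5)$ and $m = 3$, the probability of meeting at least two of them is $0.36 + 0.09 + 0.04 + 0.36 = 0.85$.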

We first sort the participants in $N(a_i)$ by $p(a_j)/c(a_j)$ in non-increasing order, obtaining the list

$$\beta: a'_1, a'_2, \ldots, a'_{|N(a_i)|}, \quad \text{such that} \quad \frac{p(a'_j)}{c(a'_j)} \ge \frac{p(a'_g)}{c(a'_g)}, \quad \forall\, 1 \le j < g \le |N(a_i)|.$$

Then, we find the smallest number $\alpha$ of participants at the front of the ordered list $\beta$ such that the probability of meeting at least $m - 1$ of them is at least $P$ (i.e., constraint (5) is satisfied).

We call the last participant selected in this process the critical participant and $\alpha$ the critical number. The pseudo-code for finding the critical participant is shown in Algorithm 2.

In Algorithm 2, we first check whether there are enough participants (Lines 1-3). If not, there is no feasible solution; otherwise, we use a dynamic-programming-based method to find the critical participant $a'_\alpha$ (Lines 4-14). We first initialize a one-dimensional array $\rho$ that stores intermediate results (Line 4). Each element $\rho[j]$ ($0 \le j \le |N(a_i)|$) is the probability of meeting exactly $j$ of the first $\alpha$ participants in the list $\beta$. We then add the participants in $\beta$ one by one (Lines 5-9) and update the array elements up to $\rho[\alpha]$ (Lines 10-13), until the critical participant $a'_\alpha$ is identified. The loop in Lines 10-12 runs from $g = \alpha$ down to 1, so that $\rho[g-1]$ still holds its value from the previous iteration when $\rho[g]$ is updated. If no critical participant is found, the algorithm reports that there is no feasible solution (Lines 6-8). The runtime of Algorithm 2 is $O(n^2)$, where $n = |N(a_i)|$.

Algorithm 2 Finding Critical Participant

Input: Set of participants $N(a_i)$, profile of meeting probabilities $(p(a_j))_{a_j \in N(a_i)}$, profile of delivery costs $(c(a_j))_{a_j \in N(a_i)}$, ordered list $\beta$, and the minimal probability $P$.

Output: Critical participant $a'_\alpha$.

1: if $|N(a_i)| < m - 1$ then
2:   return “No feasible solution.”;
3: end if
4: $\rho \leftarrow 0^{|N(a_i)|+1}$; $\rho[0] \leftarrow 1 - p(a'_1)$; $\rho[1] \leftarrow p(a'_1)$; $\alpha \leftarrow 1$;
5: while $\sum_{j=m-1}^{\alpha} \rho[j] < P$ do
6:   if $\alpha = |N(a_i)|$ then
7:     return “No feasible solution.”;
8:   end if
9:   $\alpha \leftarrow \alpha + 1$;
10:  for $g = \alpha$ downto 1 do
11:    $\rho[g] \leftarrow \rho[g-1]\,p(a'_\alpha) + \rho[g]\,(1 - p(a'_\alpha))$;
12:  end for
13:  $\rho[0] \leftarrow \rho[0]\,(1 - p(a'_\alpha))$;
14: end while
15: return $a'_\alpha$;
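The sorting step and Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under our own assumptions: meeting probabilities and costs are plain lists indexed by participant, positions in $\beta$ are 0-based in the code but $\alpha$ is returned 1-based, and the names `order_by_ratio` and `find_critical` are hypothetical. The array `rho` holds the distribution of the number of participants met among the first $\alpha$ entries of $\beta$, updated in place from high to low index as in Lines 10-13.

```python
from typing import List, Optional, Sequence


def order_by_ratio(p: Sequence[float], c: Sequence[float]) -> List[int]:
    """Indices of N(a_i) in the order beta: non-increasing p(a_j) / c(a_j)."""
    return sorted(range(len(p)), key=lambda j: p[j] / c[j], reverse=True)


def find_critical(p_beta: Sequence[float], m: int, P: float) -> Optional[int]:
    """Algorithm 2: return the critical number alpha (1-based position in beta),
    or None if no prefix of beta meets at least m - 1 participants w.p. >= P.
    p_beta holds the meeting probabilities already ordered by beta."""
    n = len(p_beta)
    if n == 0 or n < m - 1:                       # Lines 1-3
        return None
    rho = [0.0] * (n + 1)                         # rho[j] = Pr[meet exactly j of first alpha]
    rho[0], rho[1] = 1.0 - p_beta[0], p_beta[0]   # Line 4
    alpha = 1
    while sum(rho[m - 1:alpha + 1]) < P:          # Line 5: Pr[meet >= m - 1] < P
        if alpha == n:                            # Lines 6-8: list exhausted
            return None
        alpha += 1                                # Line 9
        q = p_beta[alpha - 1]                     # p(a'_alpha)
        for g in range(alpha, 0, -1):             # Lines 10-12, g = alpha downto 1
            rho[g] = rho[g - 1] * q + rho[g] * (1.0 - q)
        rho[0] *= 1.0 - q                         # Line 13
    return alpha
```

For example, with meeting probabilities `[0.3, 0.9, 0.5, 0.8]`, unit costs, $m = 3$, and $P = 0.8$, `order_by_ratio` yields `[1, 3, 2, 0]` and `find_critical` returns $\alpha = 3$; raising $P$ to 0.9 makes the instance infeasible.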

Note that if more than $\alpha$ participants at the front of the ordered list $\beta$ are selected, constraint (5) remains satisfied. Consequently, after locating the critical participant $a'_\alpha$, if any, each set consisting of the first $\gamma \in \{\alpha, \alpha+1, \ldots, |N(a_i)|\}$ participants of $\beta$ is a feasible solution of the MCT-PRO problem. Our next task is therefore to find the $\gamma \in \{\alpha, \alpha+1, \ldots, |N(a_i)|\}$ that minimizes the objective function of the MCT-PRO formulation. Algorithm 3 shows the pseudo-code for selecting the forwarder set $F$, given the critical participant $a'_\alpha$ found by Algorithm 2.

Algorithm 3 maintains a two-dimensional matrix $\rho$ to store intermediate results. Each element $\rho[j][g]$ ($0 \le j, g \le |N(a_i)|$) represents the probability of meeting $g$ participants, under the condition that participant $a'_j$ is met, given the first $\gamma$ participants in the list $\beta$ (during the process, the positions of $a'_1$ and $a'_j$ are swapped when computing the row $\rho[j]$).

After initialization (Line 1), we iterate over each possible value of $\gamma$ from 1 to $|N(a_i)|$ (Lines 2-19). For the iterations with $\gamma$ from 1 to $\alpha - 1$, we only update the matrix $\rho$ (Lines 3-12) without checking the average delivery cost, because the required number of participants has not yet been reached. From the iteration with $\gamma = \alpha$ onward, after updating $\rho$ (Lines 3-12), we compute the average delivery cost with $m-1$ participants (Line 14). If a lower average delivery cost is found (i.e., $cost' < cost$), we update the current smallest average delivery cost and the corresponding forwarder set (Lines 14-17). Finally, Algorithm 3 returns the forwarder set $F$. The running time of Algorithm 3 is $O(n^3)$, where $n = |N(a_i)|$.

Algorithm 3 returns a feasible result if the generator has sufficient opportunities to meet other participants.

However, a sensing record generator may not meet enough participants to transfer each of the encoded slices of a record to a different participant. In this case, we use the history-based prediction model to estimate the number of encounters beforehand. Participants who do not have sufficient slice transfer opportunities are allowed to transfer more than one slice during each meeting. If $h$ slices are transferred in each meeting, the record generator is hidden among $\lceil k/h \rceil$ participants.

Algorithm 3 Forwarder Set Selection

Input: Set of participants $N(a_i)$, profile of meeting probabilities $(p(a_j))_{a_j \in N(a_i)}$, profile of delivery costs $(c(a_j))_{a_j \in N(a_i)}$, ordered list $\beta$, and critical participant $a'_\alpha$.

Output: Set of forwarders $F$.

1: $\rho \leftarrow 0^{(|N(a_i)|+1) \times (|N(a_i)|+1)}$; $cost \leftarrow$ MAX_REAL;
2: for $\gamma = 1$ to $|N(a_i)|$ do
3:   for $j = 1$ to $|N(a_i)|$ do
4:     for $g = \gamma$ downto 2 do
5:       if $j = \gamma$ then
6:         $\rho[j][g] \leftarrow \rho[j][g-1]\,p(a'_1) + \rho[j][g]$;
7:       else
8:         $\rho[j][g] \leftarrow \rho[j][g-1]\,p(a'_\gamma) + \rho[j][g]$;
9:       end if
10:    end for
11:    $\rho[j][1] \leftarrow p(a'_j)$; $\rho[j][0] \leftarrow 1$;
12:  end for
13:  if $\gamma \ge \alpha$ then
14:    $cost' \leftarrow \sum_{j=1}^{\gamma} c(a'_j)\,\rho[j][m-1]$;
15:    if $cost' < cost$ then
16:      $F \leftarrow$ first $\gamma$ participants in $\beta$; $cost \leftarrow cost'$;
17:    end if
18:  end if
19: end for
20: return $F$;
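Algorithm 3 can be mirrored in Python as a sketch on lists already ordered by $\beta$. We read the probability in Line 8 as $p(a'_\gamma)$, the meeting probability of the participant that joins the prefix at step $\gamma$ (in row $j$, $a'_1$ joins at step $j$ because of the swap). Under that reading, after step $\gamma$ the entry $\rho[j][m-1]$ is the sum, over all $(m-1)$-subsets of the first $\gamma$ participants that contain $a'_j$, of the product of their meeting probabilities. This interpretation, the 0-based list indexing, and the name `select_forwarders` are our assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def select_forwarders(p_beta: Sequence[float], c_beta: Sequence[float],
                      alpha: int, m: int) -> Tuple[Optional[int], float]:
    """Sketch of Algorithm 3 on lists ordered by beta (index 0 is a'_1).
    Returns (gamma, cost): the forwarder set F is the first gamma entries of beta."""
    n = len(p_beta)
    # rho[j][g] for 1 <= j <= n: row j accumulates g-subsets of the current prefix
    # that contain a'_j (a'_j is swapped into position 1, a'_1 enters at step j).
    rho = [[0.0] * (n + 1) for _ in range(n + 1)]
    best_gamma: Optional[int] = None
    best_cost = math.inf                                 # Line 1: MAX_REAL
    for gamma in range(1, n + 1):                        # Line 2
        for j in range(1, n + 1):                        # Line 3
            # probability of the participant that joins row j's prefix at this step
            q = p_beta[0] if j == gamma else p_beta[gamma - 1]
            for g in range(gamma, 1, -1):                # Lines 4-10, g = gamma downto 2
                rho[j][g] = rho[j][g - 1] * q + rho[j][g]
            rho[j][1] = p_beta[j - 1]                    # Line 11
            rho[j][0] = 1.0
        if gamma >= alpha:                               # Lines 13-18
            cost = sum(c_beta[j - 1] * rho[j][m - 1] for j in range(1, gamma + 1))
            if cost < best_cost:
                best_gamma, best_cost = gamma, cost
    return best_gamma, best_cost
```

For instance, with probabilities `[0.9, 0.8, 0.5]`, costs `[1.0, 2.0, 1.0]`, and $m = 3$, the prefix of length 2 has cost $0.72 \cdot 1 + 0.72 \cdot 2 = 2.16$, which a hand enumeration of the 2-subsets confirms.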

3.4 Reconstructing

After receiving at least k slices encoded from the same sensing record, the service provider can reconstruct the original sensing record. Besides maintaining a database storing the sensing records, the service provider also keeps a table T caching slices that have not been decoded.

Algorithm 4 shows the pseudo-code of our sensing record reconstruction algorithm. Upon receiving a reported slice $s$, the service provider decrypts it using her private key $KEY_{priv}$ to obtain the encoded slice $s'$ and a tag that uniquely identifies the record it was encoded from:

$$(s', tag) = \mathrm{DECRYPT}(s, KEY_{priv}),$$

where $\mathrm{DECRYPT}(\cdot, \cdot)$ is an asymmetric decryption function.

The service provider adds the encoded slice $s'$ into the caching table $T$ under index $tag$, and then checks whether there are $k$ encoded slices with the same tag. If so, the service provider checks the integrity of these $k$ slices. If they pass the integrity check, the service provider extracts the $k$ encoded slices with the same tag and decodes the original sensing record:

$$\langle t, l, d \rangle = EC^{-1}\left(\{\bar{s} \mid \langle \bar{s}, \bar{t} \rangle \in T \wedge \bar{t} = tag\}\right),$$

where $EC^{-1}(\cdot)$ is the decoding function corresponding to $EC(\cdot)$. Otherwise, the collected slices marked with $tag$ are removed from the caching table.

Algorithm 4 Sensing Record Reconstructing Algorithm

Input: Caching table $T$.

Output: Each original sensing record $\langle t, l, d \rangle$.

1: while TRUE do
2:   Receive slice $s$;
3:   $(s', tag) \leftarrow \mathrm{DECRYPT}(s, KEY_{priv})$;
4:   Add $(s', tag)$ into $T$;
5:   if $|\{\bar{s} \mid \langle \bar{s}, \bar{t} \rangle \in T \wedge \bar{t} = tag\}| \ge k$ then
6:     if IntegrityCheck($\{\bar{s} \mid \langle \bar{s}, \bar{t} \rangle \in T \wedge \bar{t} = tag\}$) = true then
7:       $\langle t, l, d \rangle \leftarrow EC^{-1}(\{\bar{s} \mid \langle \bar{s}, \bar{t} \rangle \in T \wedge \bar{t} = tag\})$;
8:       Remove $\{\bar{s} \mid \langle \bar{s}, \bar{t} \rangle \in T \wedge \bar{t} = tag\}$ from $T$;
9:       Store sensing record $\langle t, l, d \rangle$;
10:    else
11:      Remove $\{\bar{s} \mid \langle \bar{s}, \bar{t} \rangle \in T \wedge \bar{t} = tag\}$ from $T$;
12:    end if
13:  end if
14: end while
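The caching-table logic of Algorithm 4 can be sketched in Python. The sketch does not reproduce SLICER's coding function $EC(\cdot)$, its IntegrityCheck, or the decryption step: it swaps in a toy systematic Reed-Solomon-style code over the prime field GF($2^{31}-1$) (polynomial evaluation and Lagrange interpolation, so any $k$ distinct slices suffice), takes the integrity check as a caller-supplied function that accepts everything by default, and assumes slices arrive already decrypted as (index, value) pairs with a string tag. All names (`ec_encode`, `ec_decode`, `Reconstructor`, `PRIME`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

PRIME = 2**31 - 1  # field modulus of the toy code (illustrative choice)


def _interpolate(points: List[Tuple[int, int]], x: int) -> int:
    """Evaluate at x the polynomial of degree < len(points) through points, over GF(PRIME)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # den is invertible because the x-coordinates are distinct
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total


def ec_encode(data: List[int], m: int) -> List[Tuple[int, int]]:
    """Toy systematic (k, m) code: the k symbols are the polynomial's values at
    x = 1..k; slice x is its value at x, for x = 1..m."""
    points = list(zip(range(1, len(data) + 1), data))
    return [(x, _interpolate(points, x)) for x in range(1, m + 1)]


def ec_decode(slices: List[Tuple[int, int]], k: int) -> List[int]:
    """EC^{-1}: recover the k data symbols from any k slices with distinct x."""
    points = slices[:k]
    return [_interpolate(points, x) for x in range(1, k + 1)]


class Reconstructor:
    """Caching table T and the loop body of Algorithm 4 (decryption omitted)."""

    def __init__(self, k: int,
                 integrity_check: Callable[[List[Tuple[int, int]]], bool] = lambda s: True):
        self.k = k
        self.table: Dict[str, Dict[int, int]] = {}  # tag -> {slice index: value}
        self.records: List[List[int]] = []
        self.integrity_check = integrity_check

    def receive(self, tag: str, slice_: Tuple[int, int]) -> Optional[List[int]]:
        x, y = slice_
        bucket = self.table.setdefault(tag, {})
        bucket[x] = y                    # a duplicated slice is counted once
        if len(bucket) < self.k:         # Line 5: fewer than k slices with this tag
            return None
        slices = sorted(bucket.items())
        del self.table[tag]              # Lines 8 and 11: both branches remove them
        if not self.integrity_check(slices):
            return None
        record = ec_decode(slices, self.k)   # Line 7
        self.records.append(record)          # Line 9
        return record
```

With $k = 4$ and $m = 8$, feeding any four distinct slices of `ec_encode([5, 17, 42, 9], 8)` under one tag returns `[5, 17, 42, 9]` on the fourth call and clears that tag from the table.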

3.5 Analysis

In this subsection, we show that SLICER provides strong privacy protection against external and internal attacks.

3.5.1 Protection Against External Attacks

The external attacker eavesdrops on messages passed in the participatory sensing system in order to collect sensitive information about particular participants. SLICER employs an end-to-end encryption scheme, so the external attacker can decrypt neither the slices transferred among participants nor those reported to the service provider.

Although the external attacker may extract some information from the eavesdropped packets that uniquely identifies the participant, she cannot obtain the content of the sensing record, because the eavesdropped content is protected by end-to-end encryption and cannot be decrypted unless the eavesdropper colludes with the service provider. Therefore, SLICER provides privacy protection against external attacks.

3.5.2 Protection Against Internal Attacks

Internal attacks may come from both the participants and the service provider. We distinguish two cases:

Protection against participants’ attack

A participant may receive slices when she is selected as a forwarder by another participant she meets. Like the external attacker, she cannot decrypt the slices she forwards.

Protection against service provider’s attack

Since the service provider has full access to the sensing records contributed by the participants, she can easily infer private information about them if no proper privacy-preserving scheme is in place. However, SLICER achieves k-anonymity and protects participants' private information against the service provider, as stated in the following theorem.

Theorem 1: SLICER achieves k-anonymity, when there are k participants who deliver slices to the service provider.

Proof: In SLICER, we separate the participants' identities from the sensing records by encoding each sensing record into $m$ slices and having at least $k$ different slices delivered to the service provider through different participants. To achieve this, we designed three algorithms (TMU, MCT-EXP, and MCT-PRO) in Section 3.3 for different situations, each selecting at least $m$ participants (including the generator itself) as forwarders to transfer the $m$ slices to the service provider. The service provider can decode the original sensing record if and only if it receives at least $k$ different slices.

Therefore, the identity of the record generator is hidden among a group of at least k participants.

We note that SLICER's privacy guarantee degrades to $\lceil k/h \rceil$-anonymity when a sensing record generator cannot meet enough participants to transfer slices and thus has to transfer $h$ slices during each meeting. Furthermore, if the sensing record generator is completely isolated and cannot meet any other participant (i.e., $h = k$), SLICER cannot protect the linkage between identity and location. In this case, an alternative privacy-preserving scheme (e.g., [11], [21], [44]) can be applied.
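As a worked illustration of the degraded guarantee, take $k = 10$, the value used in the evaluation; the anonymity level $\lceil k/h \rceil$ then shrinks as each meeting carries more slices, reaching 1 (no anonymity) when the generator must hand all $k$ slices over at once:

$$h = 1:\ \lceil 10/1 \rceil = 10, \qquad h = 3:\ \lceil 10/3 \rceil = 4, \qquad h = k = 10:\ \lceil 10/10 \rceil = 1.$$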

4 EVALUATION

We have implemented SLICER and evaluated its performance on real-world taxi traces. In this section, we describe the evaluation setup and metrics, and present the evaluation results.

4.1 Setup and Metrics

Our evaluation is based on realistic GPS mobility traces of 500 taxi cabs over 30 days in San Francisco, USA, which were collected by the Cabspotting Project [19] and are available from the CRAWDAD [45] website. In this real-world deployment, each cab is outfitted with a GPS tracking device that dispatchers use to reach customers efficiently. Each cab sends a location-update triplet (timestamp, identifier, geo-coordinates) to a central server every 30 to 60 seconds, which forms the mobility traces used in this paper. We extend this scenario to participatory sensing by assuming that the cabs are participants equipped with mobile devices.

We consider a mobile infrastructure with all 500 participants. Each participant generates one record per day, and each record remains valid for 24 hours.

The slice loss probability varies from 0.2 to 0.4.

We evaluate the performance of SLICER using the following four metrics.

• Reconstruction Ratio: The percentage of sensing records successfully reconstructed by the service provider. This reflects the loss tolerance of SLICER.

• Communication Overhead: The total amount of data transmitted to achieve the required reconstruction ratio.


• Computation Overhead: The time consumed to process a sensing record.

• Total Transfer Cost: The total cost of delivering a sensing record (i.e., m − 1 slices) to the service provider.

4.2 Evaluation Results on Reconstruction Ratio

We compare the performance of SLICER implemented with the three transfer strategies proposed in Section 3 (i.e., TMU, MCT-EXP, and MCT-PRO) with an existing privacy-preserving scheme for participatory sensing, namely Simple Exchanging [35], in which sensing records are transferred among participants as a whole without coding. We did not compare with [11], [12], [21], because their setups differ significantly from ours.

[Plot: reconstruction ratio (0-1) vs. number of participants (0-500) for SLICER with TMU, SLICER with MCT-EXP, SLICER with MCT-PRO, and Simple Exchanging]

Fig. 4. Impact of Participant Number on Reconstruction Ratio

Fig. 4 shows the reconstruction ratios achieved by the four schemes as the number of participants, selected from the public taxi trace dataset, grows. We set the coding rate to 10/20 and the slice loss probability to 0.2 in this simulation. For fairness, we give the four evaluated schemes the same communication overhead and then compare their reconstruction ratios. Specifically, since the coding rate of our three SLICER strategies is 10/20, the total size of the encoded slices is twice that of the original sensing record, so we let the Simple Exchanging scheme transfer each sensing record twice. Fig. 4 shows that SLICER with TMU and SLICER with MCT-PRO outperform Simple Exchanging when there is a sufficient number of participants (i.e., more than 200).

This is because SLICER inherits high loss tolerance from erasure coding. Specifically, the reconstruction ratios of SLICER with TMU and SLICER with MCT-PRO reach 0.97 when there are 400 or more participants. In contrast, Simple Exchanging has a relatively stable reconstruction ratio (about 0.86). However, SLICER with MCT-EXP does not perform well, because the MCT-EXP strategy may not guarantee a high probability of meeting m − 1 participants. In addition, when there are fewer than 200 participants, Simple Exchanging performs best, because it needs only one other participant to deliver a sensing record, while SLICER needs m − 1 participants. However, Simple Exchanging cannot improve its reconstruction ratio as the number of participants increases, and it loses its advantage once the number of participants exceeds 200. Furthermore, Simple Exchanging cannot provide the strong guarantee of k-anonymity. The results of this simulation therefore confirm that SLICER with TMU or MCT-PRO is preferable when the participatory sensing system has a sufficient number of participants.

[Plot panels (a) TMU, (b) MCT-EXP, (c) MCT-PRO: reconstruction ratio (0-1) vs. number of participants (0-500) for m = 15, 20, 25, 30]

Fig. 5. Impact of Coding Rate k/m on Reconstruction Ratio (We fix k = 10, and vary m in this evaluation.)

Next, we evaluate the impact of the coding rate (k/m) on the reconstruction ratio of our transfer strategies TMU, MCT-EXP, and MCT-PRO. The results are shown in Fig. 5. Here, we fix k = 10 and vary m from 15 to 30 in steps of 5. The slice loss probability is again set to 0.2. Fig. 5 shows that the reconstruction ratios achieved by the three transfer strategies increase as the coding rate decreases (i.e., as m increases) and as the number of participants increases. With coding rates of 10/25 and 10/30, each of the three transfer strategies produces similar reconstruction ratios, which are clearly higher than those with 10/15 and 10/20. This indicates that coding each sensing record into at least 25 slices achieves a relatively good reconstruction ratio on the dataset used in our evaluation. We note that the coding rate still needs to be set carefully for each application scenario to obtain high reconstruction ratios at an appropriate cost.
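As an idealized point of comparison for the trend in Fig. 5, one can compute how often at least $k$ of the $m$ slices survive if every slice is handed to some forwarder and each is lost independently with slice loss probability $q$. This is a simplifying assumption of ours, not part of the evaluation: it ignores missed meetings, which is exactly where TMU, MCT-EXP, and MCT-PRO differ. Under it, the reconstruction ratio is the binomial tail $\sum_{i=k}^{m}\binom{m}{i}(1-q)^i q^{m-i}$; the function name below is hypothetical.

```python
from math import comb


def ideal_reconstruction_ratio(k: int, m: int, loss: float) -> float:
    """Pr[at least k of m slices reach the service provider], assuming each slice
    is lost independently with probability `loss` and every slice gets a forwarder."""
    keep = 1.0 - loss
    return sum(comb(m, i) * keep ** i * loss ** (m - i) for i in range(k, m + 1))


if __name__ == "__main__":
    # k = 10 and slice loss probability 0.2, as in the Fig. 5 setup
    for m in (15, 20, 25, 30):
        print(m, ideal_reconstruction_ratio(10, m, 0.2))
```

Because adding a slice can only raise the chance that $k$ of them survive, this idealized ratio grows with $m$; the measured ratios stay below it because slices also fail to find forwarders.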

[Plot panels (a) MCT-EXP, (b) MCT-PRO: reconstruction ratio (0-1) vs. number of participants (0-500) with ground-truth prediction and with ±5%, ±10%, and ±20% noise]

Fig. 6. Impact of Inaccurate Mobility Prediction Module on the Reconstruction Ratios of Our Designs

Furthermore, we evaluate the impact of an inaccurate mobility prediction module on the performance of our designs. In this set of evaluations, we directly add noise to the meeting probabilities generated by the mobility prediction module so that they deviate from the ground-truth prediction. Fig. 6 shows the results. By adding ±5% (±10% and ±20%) noise, we mean that the meeting probabilities are randomly increased or decreased by up to 5% (10% and 20%, respectively) from their ground-truth values. In this evaluation, the coding rate is set to 10/20 and the slice loss probability to 0.2. Fig. 6(a) shows the results for MCT-EXP. The reconstruction ratios achieved by MCT-EXP with ±5% and ±10% noise are very close to those with ground-truth prediction. Specifically, with ±10% noise the reconstruction ratio decreases by only 4.92% from the ground-truth result, given 500 participants. Only when the noise is as large as ±20% does the reconstruction ratio decrease by 15.28% for 500 participants. The results for MCT-PRO in Fig. 6(b) are quite similar to those for MCT-EXP. The reconstruction ratios of MCT-PRO with ±5% and ±10% noise closely approximate that of MCT-PRO with ground-truth prediction, while MCT-PRO with ±20% noise suffers a 16.1% decrease in reconstruction ratio for 500 participants. These results show that our approaches can tolerate a small amount of prediction inaccuracy.
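The noise injection can be sketched as below. The text specifies only "randomly increased or decreased by up to 5% (10% and 20%)", so the sketch assumes a relative change drawn uniformly from $[-\text{level}, +\text{level}]$, followed by clipping to $[0, 1]$; an absolute shift of up to 0.05 would be an equally plausible reading. The name `perturb` is hypothetical.

```python
import random
from typing import List, Sequence


def perturb(probs: Sequence[float], level: float, rng: random.Random) -> List[float]:
    """Hypothetical noise model: change each meeting probability by a random
    relative amount in [-level, +level] (e.g. 0.05, 0.10, 0.20), then clip to [0, 1]."""
    return [min(1.0, max(0.0, p * (1.0 + rng.uniform(-level, level)))) for p in probs]


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed for a reproducible run
    ground_truth = [0.2, 0.6, 0.9]
    print(perturb(ground_truth, 0.10, rng))
```

The perturbed values would then drive forwarder selection, while slice delivery is still simulated with the ground-truth probabilities.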

4.3 Evaluation Results on Overhead

[Plot: communication overhead (MB) vs. slice loss probability (0.2, 0.3, 0.4) for SLICER with TMU, SLICER with MCT-EXP, SLICER with MCT-PRO, and Simple Exchanging]

Fig. 7. Communication Overhead to Achieve Reconstruction Ratio of 0.99

We evaluate the communication overhead that the four schemes (TMU, MCT-EXP, MCT-PRO, and Simple Exchanging) need to achieve a target reconstruction ratio of 0.99 under different slice loss probabilities. We set the sensing record size to 1 MB and evaluate three loss probabilities. To achieve a reconstruction ratio of 0.99, the coding rate of SLICER needs to be 10/18, 10/21, and 10/26 when the loss probability is 0.2, 0.3, and 0.4, respectively. Similarly, we set appropriate transmission redundancies for Simple Exchanging under each loss probability. As shown in Fig. 7, the communication overhead of SLICER is always lower than that of Simple Exchanging under different loss probabilities, showing that SLICER has better loss tolerance. Although the communication overheads of all four schemes increase with the loss probability, SLICER's overhead grows much more slowly. In addition, the performance of SLICER differs only slightly across transfer strategies, because the participants selected by SLICER with MCT-EXP and MCT-PRO are met only with some probability. This result confirms that SLICER achieves low communication overhead.
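A rough worked check of the coding redundancy alone: assuming each encoded slice carries $1/k$ of a record of size $S$ (consistent with a 10/20 rate doubling the total size), the $m$ slices of one record total $(m/k)\,S$. With $S = 1$ MB and the coding rates above, for loss probabilities 0.2, 0.3, and 0.4 this gives

$$\frac{18}{10}\cdot 1\,\text{MB} = 1.8\,\text{MB}, \qquad \frac{21}{10}\cdot 1\,\text{MB} = 2.1\,\text{MB}, \qquad \frac{26}{10}\cdot 1\,\text{MB} = 2.6\,\text{MB}.$$

This counts each encoded slice once and ignores forwarding hops, so it need not equal the overheads plotted in Fig. 7.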

[Plot (logarithmic axes): computation overhead (ms) vs. size of sensing record (KB) for SLICER with MCT-PRO, SLICER with MCT-EXP, SLICER with TMU, and Simple Exchanging]

Fig. 8. Computation Overhead

We also evaluate the computation overhead of SLICER with different transfer strategies (as shown in Fig. 8, which is a