Pay as How Well You Do: A Quality Based Incentive Mechanism for Crowdsensing

Dan Peng, Fan Wu, and Guihai Chen

Shanghai Key Laboratory of Scalable Computing and Systems,

Department of Computer Science and Engineering, Shanghai Jiao Tong University, China

pd347@sjtu.edu.cn, {fwu, gchen}@cs.sjtu.edu.cn

ABSTRACT

In crowdsensing, appropriate rewards are always expected to compensate the participants for their consumption of physical resources and investment of manual effort. While continuous low quality sensing data can harm the availability and preciseness of crowdsensing based services, few existing incentive mechanisms have addressed the issue of sensing data quality. The design of a quality based incentive mechanism is motivated by its potential to avoid inefficient sensing and unnecessary rewards. In this paper, we incorporate the consideration of data quality into the design of an incentive mechanism for crowdsensing, and propose to pay the participants as how well they do, to motivate the rational participants to perform data sensing efficiently.

This mechanism estimates the quality of sensing data, and offers each participant a reward based on her effective contribution. We also implement the mechanism and evaluate the improvements in terms of quality of service and profit of the service provider. The evaluation results show that our mechanism achieves superior performance when compared to the uniform pricing scheme.

Categories and Subject Descriptors

C.2.1 [Computer-Communication Networks]: Network Architecture and Design

F. Wu is the corresponding author.

∗This work was supported in part by the State Key Development Program for Basic Research of China (973 project 2014CB340303), in part by China NSF grants 61422208, 61472252, 61272443, and 61133006, in part by the CCF-Intel Young Faculty Researcher Program and CCF-Tencent Open Fund, in part by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, and in part by Jiangsu Future Network Research Project No. BY2013095-1-10.

The opinions, findings, conclusions, and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies or the government.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

MobiHoc'15, June 22–25, 2015, Hangzhou, China.

Copyright © 2015 ACM 978-1-4503-3489-1/15/06 ...$15.00.

http://dx.doi.org/10.1145/2746285.2746306.

Keywords

Crowdsensing; Incentive Mechanism; Quality Estimation; Maximum Likelihood Estimation; Information Theory

1. INTRODUCTION

Crowdsensing is a new paradigm of applications that enables ubiquitous mobile devices with enhanced sensing capabilities to collect and share local information towards a common goal [5, 12]. In recent years, a wide variety of applications have been developed to realize the potential of crowdsensing throughout everyday life, such as environmental quality monitoring [2, 3], noise pollution assessment [16, 24], road and traffic condition monitoring [19, 31], road-side parking statistics [18, 21], and indoor localization [23, 33]. However, the success of crowdsensing based services critically depends on sufficient and reliable data contributions from individual participants.

Sensing, processing, and transmitting data in crowdsensing applications require manual efforts and physical resources. Therefore, appropriate rewards are always expected to compensate the owners of task-taking mobile devices. These owners, called participants in the crowdsensing literature, are commonly assumed to be rational, and would not make contributions unless there are sufficient incentives. Although researchers have proposed a number of incentive mechanisms for participation in crowdsensing [9, 11, 13, 15, 32, 36, 39, 40], they have not fully exploited the connection between the quality of sensing data and the rewards for contributions.

Sensing data of high quality, based on which the crowdsensing service provider aggregates and extracts information for accurate decision making and attentive service providing, is fundamentally important. In crowdsensing, the quality of sensing data can be affected by the characteristics of mobile sensors, the clarity of task instructions, as well as the expertise and willingness of individual participants [14, 20, 25]. Particularly, participants with different spatial-temporal contexts and personal effort levels are likely to submit sensing data of diverse quality. Furthermore, rational participants tend to strategically minimize their efforts while doing the sensing tasks, and thus may degrade the quality of sensing data. For example, careless or indifferent submissions are often found in crowdsensing based noise monitoring applications. When asked for the environmental sound heard in her neighborhood, a participant may perform the sensing tasks with a mobile device placed inside her pocket, rather than carefully taking out the device to sense accurately. Such a low quality submission would invalidate the estimation of noise pollution.


Continuous low quality sensing data undoubtedly harms the availability and preciseness of crowdsensing based services. However, to the best of our knowledge, few existing works have taken the observation of data quality into consideration when designing incentive mechanisms for crowdsensing. Designing quality based incentive mechanisms for crowdsensing is challenging for several reasons.

Most of all, it is technically difficult to estimate the quality of sensing data without any prior knowledge of the sensing behavior of individual participants or the ground truth of the targeted contexts. Subsequent quality verification would require significant investments in deploying particular infrastructures to do on-site sensing and ground truth collection, like the Model 831-NMS permanent noise monitoring system [4]. Lacking flexibility and scalability, the deployment of traditional static sensing infrastructures, in turn, negates the necessity and benefits of crowdsensing.

Second, it is challenging to design incentive mechanisms that achieve both individual rationality and profit maximization. Here, individual rationality means that a participant should be rewarded no less than her sensing cost, and the profit of the service provider is the difference between the value of crowdsensing based services and the total rewards to participants. Deliberate incentive mechanisms are required to motivate effective data contributions from rational participants, and to maintain a robust, profitable market for the crowdsensing service provider.

Third, it is nontrivial to bridge the gap between the quality of sensing data and the rewards for contributions. Participants of crowdsensing, who perform the sensing tasks with heterogeneous physical resources and manual efforts, and therefore submit sensing data of diverse quality, may require appropriate rewards according to their contributions. While the traditional uniform pricing scheme is unfair, the pay-as-bid pricing method used in most of the auction based incentive mechanisms is troublesome for participants and indulgent of careless behavior. Both of these existing solutions ignore the quality issue, and thus are unlikely to encourage long-term, effective contributions.

In this paper, we incorporate the consideration of data quality into the design of an incentive mechanism for crowdsensing, and propose to pay the rational participants as how well they do, to motivate efficient crowdsensing.

Our main contributions are listed as follows.

• To the best of our knowledge, we are the first to design a quality based incentive mechanism for crowdsensing that directly motivates individual participants to contribute high quality sensing data.

• Second, we extend the well-known Expectation Maximization algorithm, which combines maximum likelihood estimation and Bayesian inference, to estimate the quality of sensing data, and further apply classical Information Theory to measure the effective contribution of sensing data. Based on the estimated quality and contribution, we can determine fair and proper rewards for the participants. The proposed incentive mechanism achieves individual rationality and profit maximization.

• Finally, we implement the incentive mechanism and extensively evaluate its performance. Our evaluation results show that it achieves superior performance in terms of quality assurance and profit management, when compared to the uniform pricing scheme.

The rest of the paper is organized as follows. We briefly review related work in Section 2 and present technical preliminaries in Section 3. The detailed design of our quality based incentive mechanism is presented in Section 4. In Section 5, we evaluate our incentive mechanism and show the results. Finally, we conclude the paper in Section 6.

2. RELATED WORK

The problem of data quality has been widely studied in organizational databases and information systems [7, 29], mainly focusing on quality categories, attributes, and contextual patterns from the perspective of the data consumer. Sachidananda et al. [27] surveyed building blocks and existing approaches related to quality of information in wireless sensor networks.

For the paradigm of crowdsensing, Reddy et al. [26] developed a recruitment framework to identify and select well-suited participants to achieve high utility within a budget. Three stages are introduced: qualifier, assessment, and progress review. Though a performance-based reputation system is built, this work focuses on the record of participation likelihood (whether the participant would take a sensing task when given a chance), and shows little consideration of incentives. On the other hand, although various empirical experiments [17, 20, 25, 35] demonstrate that financial and social incentives do have an impact on the performance of participants, such as engagement, compliance, and quality, they fail to generalize an incentive model to adaptively guide the participants' behavior.

There is extensive research targeting incentive mechanism design for crowdsensing. Lee and Hoh [13] proposed a reverse auction based dynamic pricing scheme to motivate participants to sell their sensing data with claimed bids. Yang et al. [32] considered a platform-centric incentive model, where the reward is proportionally shared by participants in a Stackelberg game, and a user-centric incentive model, where participants in an auction bid for tasks and get paid no less than their submitted bids. Koutsopoulos [11] designed an incentive mechanism to determine the participation level and payment allocation that minimize the platform's compensation cost with guaranteed service quality. Zhang et al. [37] and Zhao et al. [40] suggested online incentive mechanisms to flexibly recruit participants who appear opportunistically in the phenomena of interest. Luo et al. [15] studied an incentive mechanism based on all-pay auctions with realistic constraints such as information asymmetry, population uncertainty, and risk aversion. Kawajiri et al. [10] deployed a crowdsensing based wireless indoor localization system, and steered participants to cover sufficient locations to improve the quality of service; there is no skill variance or device variance in their system. In general, these existing incentive mechanisms either have not considered the quality of sensing data, or have addressed the incentive concerns and quality issues separately. Moreover, few of them have investigated methods to estimate the quality of sensing data.

In contrast, we systematically consider the participants' willingness to invest a sufficient amount of effort in crowdsensing, and bridge the gap between the quality of sensing data and the rewards for contributions, by providing a quality based incentive mechanism. The quality estimation method applied in this paper was originally introduced by Dawid and Skene [6], where the expectation maximization (EM) algorithm is used to obtain maximum likelihood estimates of observers' error rates and to infer the true responses of patients.

Figure 1: A general crowdsensing model. (Service subscribers issue queries (2-1) and receive services (2-2) from the service provider, who manages quality, profit, budget, and coverage; the provider releases sensing tasks (1-1) to participants with embedded sensors, who report reserve prices (1-2), receive rewards (1-3), and submit sensing data (1-4).)

Wang and Ipeirotis [30] applied the EM algorithm to estimate the quality of crowdsourced labeling workers. Zhang et al. [38] proposed to combine spectral methods and the EM algorithm to address the problem of crowdsourced multi-class labeling, with an optimal convergence rate up to a logarithmic factor. Beyond quality estimation, we also quantify the contribution of sensing data via information theory, and determine fair and proper rewards for participants.

3. PRELIMINARIES

In this section, we present the model of quality based crowdsensing, and key techniques for quality estimation.

3.1 Crowdsensing Model

As illustrated in Figure 1, there are three major components in the crowdsensing system, i.e., service subscribers who request services, a service provider who conducts the crowdsensing campaign and provides services, and a crowd of participants who submit sensing data to support the services. The crowdsensing process (the right part of Figure 1) can be described as follows. First, the service provider releases a set $T$ of sensing tasks (e.g., noise sensing on campus at 10:00 am) with an incentive announcement and a quality requirement (e.g., an error threshold). In the phenomena of interest, there is a set $A = \{a_1, a_2, \dots, a_n\}$ of participants, with sensors embedded in their mobile devices. Each participant $a_k \in A$ bears a private reserve price/sensing cost $c_k$ (i.e., a monetary value for her consumption of physical resources and investment of manual effort), and thus expects a reward for her contribution. Without sufficient rewards, the participants may not undertake the sensing tasks. The service provider estimates the quality $q_k$ of the sensing data from each participant $a_k$. Taking the profile of the participants' data quality and sensing costs into consideration, she selects a subset $W \subseteq A$ of participants to perform each sensing task, and rewards each $a_k \in W$ a certain amount $r_k$ according to her effective contribution. After collecting the sensing data for some tasks, the service provider updates the quality estimate $q_k$ for each $a_k \in W$ to guide the next round of recruitment (the right part), and extracts information to provide services (the left part).

We consider a general class of crowdsensing applications in which the availability and preciseness of services significantly depend on the quality of sensing data, e.g., urban noise pollution monitoring, which measures ambient noise pollution based on sensing data collected from mobile devices. For each piece of sensing data with an error below the specified threshold, the service provider gains a value $V$ (e.g., the subscription fee from service subscribers). For simplicity, we assume that $V$ is fixed. The objective of the service provider is to maximize her own profit, by providing services with guaranteed quality, and recruiting participants with proper rewards. The profit is defined as

$$\text{Profit} \triangleq \sum_{a_k \in W} (V - r_k).$$

In this paper, we focus on the data quality that is specifically affected by participants' effort levels for sensing, and aim at designing incentive mechanisms for the service provider to stimulate high quality sensing and long-term, effective contributions.

3.2 Quality Estimation via EM

For crowdsensing, e.g., urban noise sensing, it is reasonable to calibrate the sensing data, to tolerate the inherent uncertainty of mobile devices. Here, we divide the readings of sensing data into discrete intervals, and suggest that the service provider deliver a certain interval to the service subscribers, rather than an accurate reading, to mitigate the impact of device variance and device error. The discrete intervals are denoted as a set $D = \{d_1, d_2, \dots, d_m\}$, where each interval spans a range of decibels, and the granularity of the interval division can be determined by the tradeoff between accuracy and complexity.
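To make the discretization concrete, the following is a minimal sketch of the interval division (ours, not the authors' code); the 5 dB interval width mirrors the granularity used later in the evaluation, and the boundary values are assumptions:

```python
import numpy as np

# Bucket raw decibel readings into the discrete set D = {d_1, ..., d_m}.
edges = np.arange(45, 90, 5)              # assumed interval boundaries in dB
readings = np.array([43.2, 57.8, 88.1])   # raw sensor readings
intervals = np.digitize(readings, edges)  # 0-based indices into D
print(intervals)                          # -> [0 3 9]
```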

Regarding the quality of sensing data as a result of the effort levels, we estimate an "effort matrix" $e_k$ for each participant $a_k$, and map this effort matrix to a scalar quality value through the function $q_k = g(e_k)$. Here, the effort matrix $e_k$ is an $m \times m$ matrix, with element $e_{kij} \in [0, 1]$, $i = 1, \dots, m$, $j = 1, \dots, m$, indicating the probability that participant $a_k$ submits a piece of sensing data in interval $d_j$ while the true reading is in interval $d_i$. Particularly, $\{e_{kii} \mid i = 1, \dots, m\}$ contains the probabilities that participant $a_k$ obediently performs outside-pocket sensing in each of the $m$ possible cases. Furthermore, the conditional probabilities satisfy $\sum_j e_{kij} = 1$.


We note that the effort matrix could be measured directly if we had ground truth for all spatial-temporal contexts. However, for crowdsensing, the true reading, or even the true interval, cannot be ascertained in most cases, making the direct verification of data quality and the discernment of the effort matrix challenging. In this paper, we resort to the well-known expectation maximization (EM) algorithm [6] to estimate each participant's effort matrix.

The EM algorithm [6] is an iterative method for finding the maximum likelihood estimate (MLE) of parameters (e.g., the effort matrix of each participant, and the true noise interval of each task) when there is missing data (e.g., the indicators telling right from wrong for the sensing data) that precludes straightforward estimation of the parameters. Here, the MLE is the parameter estimate that maximizes the (log-)likelihood of the observations (e.g., the submitted sensing data), and it converges in probability to the true value of the unknown parameters when the number of observations is sufficiently large.

Given a set $S$ of observed sensing data, a set $P$ of missing true interval indicators, a set $E$ of unknown effort matrices, and the density function $f$, the likelihood of the unknown $E$ is

$$L(E; P, S) = f(P, S \mid E).$$

To find the MLE of $E$, the EM algorithm iteratively runs the following two steps until convergence (suppose $\hat{E}^t$ is the current estimate of $E$ after $t$ iterations).

E-step: calculate the expected value of the likelihood function, with respect to the conditional distribution of $P$ given the observations $S$ under the current estimate of $E$:

$$Q(E \mid \hat{E}^t) = \mathbb{E}_{P \mid S, \hat{E}^t}\left[L(E; P, S)\right].$$

M-step: seek the estimate $\hat{E}$ that maximizes the expectation function:

$$\hat{E}^{t+1} = \arg\max_{E} Q(E \mid \hat{E}^t).$$

Inspired by [6], we extend the algorithm to estimate the true interval indicators and the participants' effort matrices, by iterating the following two steps until convergence: 1) estimate the effort matrices and the noise interval distribution via maximum likelihood estimation, based on the estimated true interval indicators; and 2) calculate new estimates of the true interval indicators, according to the estimated effort matrices and noise interval distribution.

The converged estimation of participant’s effort matrix indicates the quality of sensing data, while the noise interval distribution is suggestive of the noise pollution level.

4. QUALITY BASED INCENTIVE

In this section, we detail the design of our quality based incentive mechanism for crowdsensing. To pay each individual participant $a_k$ as how well she does in sensing, we estimate her effort matrix $e_k$, calculate the quality $q_k$ of her sensing data, quantify her effective contribution $c_m(q_k)$, and offer a proper reward $r_k$. Taking the quality of sensing data into consideration, our incentive mechanism can encourage long-term, effective contributions for crowdsensing based services.

4.1 A Simple Case

We first regard all of the submitted sensing data as qualified, and present a simple pricing scheme.

Table 1: Key notations

  Notation          Definition
  $T$               Set of sensing tasks
  $A$               Set of participants
  $D$               Set of discrete noise intervals
  $A_t$             Set of participants who complete task $t \in T$
  $T_k$             Set of tasks that $a_k \in A$ performs
  $S$               Set of observed sensing data
  $P$               Set of missing true noise interval indicators
  $E$               Set of unknown effort matrices
  $L(E; P, S)$      Likelihood function of $E$
  $e_k$             Effort matrix of $a_k$
  $e_{kij}$         Probability that $a_k$ submits data in interval $d_j$ while the true interval is $d_i$
  $\Pi$             Noise interval distribution
  $p_t$             True noise interval indicator for task $t$
  $p_{ti}$          Probability of task $t$ having true noise interval $d_i$
  $d_{kt}$          Noise interval that $a_k$'s sensing data for task $t$ falls into
  $I(d_{kt}=d_j)$   Indicator function for the event $d_{kt} = d_j$
  $q_k$             Quality of $a_k$'s sensing data
  $c_m(q_k)$        Effective contribution of sensing data of estimated quality $q_k$
  $c_k$             Reserve price/sensing cost of $a_k$
  $r_k$             Reward to $a_k$ for her contribution
  $V$               Value gained from qualified sensing data
  $r^*$             Optimal quality based reward
  $r_u$             Optimal uniform reward

We assume that the sensing costs of all participants follow a probability distribution, with probability density function $f(c_k)$ and cumulative distribution function $F(c_k)$.

A rational participant $a_k$ will not do a given sensing task unless she gets a reward $r \ge c_k$. Therefore, the profit of the service provider, defined as the difference between the value $V$ gained from the sensing data and the reward $r$ to participant $a_k$, where $V \ge r$, is formulated as

$$\text{Profit}(c_k, r) = \begin{cases} 0, & r < c_k, \\ V - r, & r \ge c_k. \end{cases}$$

Since the distribution of $c_k$ is independent of the value $V$ and the reward $r$, the expected profit can be calculated as

$$\text{Profit}(r) = \int_{0}^{\infty} \text{Profit}(c_k, r) f(c_k)\, dc_k = \int_{0}^{r} (V - r) f(c_k)\, dc_k = F(r)(V - r).$$

Therefore, the service provider can maximize her profit by taking the first derivative of $\text{Profit}(r)$, setting it to zero, and obtaining the optimal reward from

$$r^{*} = V - \frac{F(r^{*})}{f(r^{*})}.$$
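As a worked example (ours, not from the paper), suppose the sensing costs are uniform on $[0, b]$ with $b \ge V$; then $F(r)/f(r) = r$, the condition becomes $r^* = V - r^*$, and the optimal reward is $r^* = V/2$. The sketch below solves the fixed point numerically by bisection, so other cost distributions can be plugged in:

```python
# Solve r = V - F(r)/f(r) by bisection (our sketch; any cost
# distribution with density f and CDF F can be substituted).
def uniform_F(r, b=4.0):
    return min(max(r / b, 0.0), 1.0)

def uniform_f(r, b=4.0):
    return 1.0 / b

def optimal_reward(V, F, f, lo=0.0, hi=10.0, iters=100):
    # g(r) = r - (V - F(r)/f(r)) is increasing, so bisection applies.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid - (V - F(mid) / f(mid)) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(optimal_reward(3.0, uniform_F, uniform_f))  # -> 1.5, i.e., V/2
```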

4.2 Quality Estimation

In practice, due to their various effort levels, different participants may submit sensing data of diverse quality. In this subsection, we extend the Expectation Maximization algorithm to estimate the effort matrix $e_k$ of each participant $a_k$, and then estimate the quality of her sensing data as $q_k = g(e_k)$.

Specifically, we denote the set of participants that submit sensing data for task $t$ as $A_t \subseteq A$, and the set of tasks that participant $a_k$ performs as $T_k \subseteq T$. For task $t \in T_k$, the true noise interval is denoted as $d_{0t}$, while the interval into which participant $a_k$'s sensing data falls is denoted as $d_{kt}$. An indicator function $I(d_{kt} = d_j)$ (i.e., $I(d_{kt} = d_j) = 1$ when the event $d_{kt} = d_j$ is true; otherwise, $I(d_{kt} = d_j) = 0$) is applied to describe the submission of sensing data.

We assume that the effort levels of participants are independent, and do not change for a period of time, so that we can periodically learn the effort matrix $e_k$ of each participant $a_k$, and put this knowledge into practice. Since the true interval indicators, i.e., $p_t = \{p_{ti} \mid i = 1, \dots, m\}$ for each task $t$ ($p_{ti} = 1$ if $d_{0t} = d_i$ for sure), are unavailable, we resort to the EM algorithm that combines maximum likelihood estimation and Bayesian inference to iteratively estimate the unknown effort matrices $e_k$ and the noise interval distribution $\Pi = \{\pi_i \mid i = 1, \dots, m\}$.

The pseudo-code of this expectation maximization algorithm is shown in Algorithm 1, which runs as follows.

(1) Initialization: For each task $t$, the probability distribution of the true noise interval indicator $p_t$ is initialized as

$$p_{ti} = p(d_{0t} = d_i) = \frac{\sum_{a_k \in A_t} I(d_{kt} = d_i)}{|A_t|}.$$

(2) Estimation of effort matrix and noise interval distribution: Given the likelihood function

$$L(E; P, S) = f(P, S \mid E), \quad\text{and}\quad L(E; S) = f(S \mid E) = \sum_{P} f(P, S \mid E),$$

where $E = \{e_k \mid a_k \in A\}$, $P = \{p_t \mid t \in T\}$, and $S = \{d_{kt} \mid t \in T, a_k \in A_t\}$, the maximum likelihood estimate of $E$ makes the observation $S$ most likely to happen.

We note that the effort matrix $e_k$ of each participant $a_k$ follows the multinomial distribution. When participant $a_k$ performs $n_{ki}$ independent tasks with true interval $d_i$, her sensing data for these tasks falls into interval $d_j$ with probability $e_{kij}$, where $e_{kij} \ge 0$ and $\sum_j e_{kij} = 1$, $j = 1, \dots, m$. Let $n_{ki1}, \dots, n_{kim}$ be the numbers of submissions corresponding to intervals $d_1, \dots, d_m$, respectively. Then we have $\sum_j n_{kij} = n_{ki}$, and the likelihood function of $e_{ki}$ is

$$f(n_{ki1}, \dots, n_{kim} \mid e_{ki1}, \dots, e_{kim}) = \frac{n_{ki}!}{\prod_j n_{kij}!} \prod_j (e_{kij})^{n_{kij}}.$$

Taking the log-likelihood, Lagrange multipliers, and derivatives, we get the natural estimates

$$\hat{e}_{kij} = \frac{n_{kij}}{n_{ki}} = \frac{\sum_{t \in T_k} p_{ti} I(d_{kt} = d_j)}{\sum_{t \in T_k} p_{ti}}, \quad j = 1, \dots, m.$$

The noise interval distribution is estimated as

$$\hat{\pi}_i = \frac{\sum_{t \in T} p_{ti}}{|T|}, \quad i = 1, \dots, m.$$

(3) Estimation of true noise interval indicator: Given the sensing data $S$, the effort matrices $E$, and the noise interval distribution $\Pi$, we apply Bayesian inference to estimate the true noise interval indicators $P$.

Algorithm 1: Effort Matrix Estimation

Input: A set $S = \{d_{kt} \mid t \in T, a_k \in A_t\}$ of observations.
Output: Estimation of effort matrices $E$, marginal distribution of noise intervals $\Pi$, and posterior estimation of true noise interval indicators $P$.

    // Initialization of true noise interval indicators
 1  foreach $t \in T$ do
 2      $cnt \leftarrow 0$; $cnt_i \leftarrow 0$ for each $d_i \in D$;
 3      foreach $a_k \in A_t$ do
 4          $i \leftarrow d_{kt}$; $cnt_i \leftarrow cnt_i + 1$; $cnt \leftarrow cnt + 1$;
 5      foreach $d_i \in D$ do
 6          $p_{ti} \leftarrow cnt_i / cnt$;
 7  while not converged do
        // Estimation of effort matrices
 8      foreach $a_k \in A$ do
 9          $cnt \leftarrow 0$; $e_k \leftarrow 0$;
10          foreach $t \in T_k$ do
11              $j \leftarrow d_{kt}$;
12              foreach $d_i \in D$ do
13                  $e_{kij} \leftarrow e_{kij} + p_{ti}$;
14                  $cnt_i \leftarrow cnt_i + p_{ti}$;
15          foreach $d_i \in D$ do
16              foreach $d_j \in D$ do
17                  $e_{kij} \leftarrow e_{kij} / cnt_i$;
        // Estimation of noise interval distribution
18      foreach $d_i \in D$ do
19          $\pi_i \leftarrow 0$;
20          foreach $t \in T$ do
21              $\pi_i \leftarrow \pi_i + p_{ti}$;
22          $\pi_i \leftarrow \pi_i / |T|$;
        // Estimation of true noise interval indicators
23      foreach $t \in T$ do
24          $smp \leftarrow 0$; $p_t \leftarrow 1$;
25          foreach $d_i \in D$ do
26              foreach $a_k \in A_t$ do
27                  $j \leftarrow d_{kt}$;
28                  $p_{ti} \leftarrow p_{ti} \cdot e_{kij}$;
29              $smp \leftarrow smp + \pi_i p_{ti}$;
30          foreach $d_i \in D$ do
31              $p_{ti} \leftarrow \pi_i p_{ti} / smp$;
32  return $E = \{e_k \mid a_k \in A\}$, $\Pi = \{\pi_i \mid i = 1, \dots, m\}$, $P = \{p_t \mid t \in T\}$;

Considering the $n$ independent observations $\{S_1, \dots, S_n\}$ of sensing data from individual participants, where $S_i = \{d_{it} \mid t \in T\}$, $i = 1, \dots, n$, we have

$$p(P \mid S) = \frac{p(P)\, p(S \mid P)}{p(S)} = \frac{p(P)\, p(S_1 \mid P) \cdots p(S_n \mid P)}{p(S)}.$$

When all terms not involving the true noise interval indica- tor are absorbed into the proportionality sign, we calculate the distribution of true noise interval indicator according to

pti= πiQ

ak∈At

Q

j(ekij)I(dkt=dj) P

qπqQ

ak∈At

Q

j(ekqj)I(dkt=dj), i = 1, . . . , m.

(4) Convergence: We iterate steps 2 and 3 until the two estimates converge, i.e., $|\hat{E}^{t+1} - \hat{E}^{t}| < \varepsilon$ and $|\hat{P}^{t+1} - \hat{P}^{t}| < \eta$, for some $\varepsilon > 0$, $\eta > 0$.


Each iteration (of the while loop) has polynomial computation complexity $O(|A|\,|T|\,|D|)$.

We claim that the EM algorithm increases the likelihood function in each iteration, and finally converges to a stable estimate. To circumvent the problem of getting trapped in a local optimum, we try different initializations over several executions of the algorithm. Although it is hard to provide theoretical guarantees for its performance, the EM algorithm has been widely used, and a provably optimal convergence rate up to a logarithmic factor has been shown in [38].
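For concreteness, the following NumPy sketch is our reconstruction of Algorithm 1 (not the authors' implementation). It assumes the submissions are given as an integer matrix with $-1$ marking tasks a participant skipped, and that every task receives at least one submission:

```python
import numpy as np

def effort_matrix_em(S, m, iters=100, tol=1e-6):
    """Estimate effort matrices E, interval distribution Pi, and
    posterior true-interval indicators P from submissions S[k, t]."""
    n, T = S.shape
    # Initialization: empirical interval frequencies per task.
    P = np.zeros((T, m))
    for t in range(T):
        for k in range(n):
            if S[k, t] >= 0:
                P[t, S[k, t]] += 1.0
        P[t] /= P[t].sum()
    for _ in range(iters):
        P_old = P.copy()
        # Estimation of effort matrices (rows normalized to sum to 1).
        E = np.zeros((n, m, m))
        for k in range(n):
            for t in range(T):
                if S[k, t] >= 0:
                    E[k, :, S[k, t]] += P[t]
            E[k] /= np.maximum(E[k].sum(axis=1, keepdims=True), 1e-12)
        # Estimation of the noise interval distribution.
        Pi = P.mean(axis=0)
        # Bayesian update of the true noise interval indicators.
        for t in range(T):
            post = Pi.copy()
            for k in range(n):
                if S[k, t] >= 0:
                    post *= E[k, :, S[k, t]]
            P[t] = post / post.sum()
        if np.abs(P - P_old).max() < tol:
            break
    return E, Pi, P
```

Each sweep of the loop touches every (participant, task, interval) combination once, matching the $O(|A|\,|T|\,|D|)$ per-iteration bound stated above.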

With the estimated effort matrix $e_k$, we can get the quality of $a_k$'s sensing data through the mapping function. For simplicity, we focus on pure obedience, and set $q_k = g(e_k) = \sum_i e_{kii} / m$. With the estimated distribution of the true noise interval indicator $p_t = \{p_{t1}, p_{t2}, \dots, p_{tm}\}$ for task $t$, the interval $d_i$ to be delivered is the one with maximum probability, i.e., $d_i = \arg\max_i p_{ti}$.

4.3 Contribution Quantification

Various analyses and experiments have confirmed that expert work can be accomplished by the local crowd, even if they lack expert knowledge. However, the contribution of each individual participant remains unknown. Here, inspired by ideas from Information Theory and Shannon's Channel Coding Theorem [28, 34], we quantify the participants' contributions through information uncertainty reduction.

We regard the right part of the crowdsensing system (Figure 1) as a signal transmission system (Figure 2), where the input signal $X$ is the sensing data provided by the participants, and correspondingly, the output $Y$ is the information that the service provider extracts from the sensing data. Transmitted through the channel, an input signal may be distorted in a random way depending on the channel condition, and thus the output signal may differ from the input signal. Here, the quality $q_k \in [0, 1]$ of the sensing data is expressed by a noise variable $Z$ (independent of $X$) on the transmission channel, where $\Pr(Z = 0) = q_k$ indicates that the output signal equals the input signal with probability $q_k$, and $\Pr(Z = 1) = 1 - q_k$ indicates that an error occurs with probability $1 - q_k$.

Figure 2: A discrete channel $(\alpha, Z)$, where $Y = \alpha(X, Z)$: perfect sensing data $X$ passes through a quality restriction $Z$ to become the received information $Y$.

We assume that, in the signal transmission system, the input signal is perfect but interfered by the noisy channel with probability $1 - q_k$. Thus, the output signal is equivalent to information extracted from sensing data of quality $q_k$. Similar to the capacity of a noisy channel [34], the contribution of the sensing data can be expressed as the mutual information

$$I(X; Y) = H(Y) - H(Y \mid X) = H(Y) - \sum_x p(x) H(Y \mid X = x) = H(Y) - \sum_x p(x)\, h_b(q_k) = H(Y) - h_b(q_k),$$

where $H(Y)$ is the entropy of $Y$, $H(Y \mid X)$ is the conditional entropy of $Y$ given $X$, and $h_b(q_k)$ is the binary entropy of the binary random noise $Z$ with distribution $\{q_k, 1 - q_k\}$.

Intuitively, when no sensing data is submitted, all $m$ optional intervals are equally likely to be observed with probability $1/m$, making the uncertainty maximal at

$$H(Y) = -\sum_{m} (1/m) \log(1/m) = \log(m).$$

Given the sensing data, the information uncertainty is reduced to

$$h_b(q_k) = -q_k \log(q_k) - (1 - q_k) \log(1 - q_k).$$

Generally, if $Z$ is not a binary random variable, but distributed with probability $q_k$ on the correct interval and equal probability $(1 - q_k)/(m - 1)$ on each of the remaining intervals, then the information uncertainty is calculated as

$$h_m(q_k) = -q_k \log(q_k) - \sum_{m-1} \frac{1 - q_k}{m - 1} \log\frac{1 - q_k}{m - 1}.$$

Therefore, the effective contribution of sensing data of quality $q_k$ can be formulated as

$$c_m(q_k) = \log(m) + q_k \log(q_k) + (1 - q_k) \log\frac{1 - q_k}{m - 1}.$$

With the convention $0 \log 0 = 0$, sensing data of quality $q_k = 1$ results in minimal uncertainty, $h_m(1) = 0$, and maximal contribution, $c_m(1) = \log(m)$. Though a binary channel that never makes errors and one that always makes errors are equally good for communication, we only consider and reward sensing data of quality within the range $[0.5, 1]$.
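The contribution measure is straightforward to compute; the function below is our direct transcription of the formula above, using base-2 logarithms and the $0 \log 0 = 0$ convention:

```python
import math

def xlogx(x):
    # Convention: 0 log 0 = 0.
    return 0.0 if x <= 0.0 else x * math.log2(x)

def contribution(q, m):
    """Effective contribution c_m(q) of sensing data with quality q."""
    spread = 0.0 if m == 1 else (1.0 - q) * math.log2(m - 1)
    return math.log2(m) + xlogx(q) + xlogx(1.0 - q) - spread

print(contribution(1.0, 8))  # -> 3.0 = log2(8), maximal contribution
print(contribution(0.5, 8))  # -> ~0.60, mediocre data contributes little
```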

Practically, for the same volume, sensing data of high quality carries a larger amount of constructive information than that of low quality. Specifically, high quality data is intrinsically efficient, while low quality data needs extra information, functioning like an error-correcting code (ECC), to detect and/or correct errors without resubmission. In crowdsensing, such error correction is more often conducted in the form of verification, such as recruiting another group of participants or sensing another kind of data (e.g., the light signal, to determine whether the device is out of the pocket). Here, we elide the specific ECC and focus on its cost (i.e., accounting for a part of the data volume), and quantify the effective contribution of sensing data as the information uncertainty reduction.

4.4 Reward Distribution

In this subsection, we take a step further and reward the participants proportionally to their quantified contributions, i.e., $r(q_k) = r \cdot c_m(q_k)$, where $r$ is a benchmark reward. Noting that we can learn the distribution $f(c_k, e_k)$ asymptotically, we assume that the distribution is common knowledge.

Figure 3: Accuracy of noise pollution monitoring with different effort levels of participants (noise reading in dB vs. time slot). (a) General noise reading differences between outside pocket sensing and inside pocket sensing; (b)-(f) Noise readings of the ground truth (Node 1) and from the 10 participants (Nodes 2-11).

Figure 4: Comparison of the monitoring accuracy of different models (noise reading in dB vs. time slot): (a) Quality Measured Model; (b) Majority Voting Model; (c) All and Average Model.

We adjust the parameters of the simple case. For participant $a_k$ with an effort matrix $e_k$, the profit that the service provider gains from the sensing data is

$$\text{Profit}(c_k, e_k, r) = \begin{cases} 0, & r\, c_m(g(e_k)) < c_k, \\ V - r\, c_m(g(e_k)), & r\, c_m(g(e_k)) \ge c_k. \end{cases}$$

Then, the quality based optimal reward is determined by

$$r^{*} = \arg\max_{r} \text{Profit}(r) = \arg\max_{r} \int_{e_k} \int_{0}^{\infty} \text{Profit}(c_k, e_k, r)\, f(c_k, e_k)\, dc_k\, de_k.$$

For a simple joint distribution $f(c_k, e_k)$ of sensing cost and effort matrix, the optimal reward $r^{*}$ can be calculated by solving the integral and taking the derivative with respect to $r$. However, for complex cases, greedy algorithms can find a proper reward with approximate profit more efficiently.
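As an illustration of the complex case, the sketch below (ours; a plain grid search standing in for the greedy algorithms mentioned above) estimates the total profit for each candidate benchmark reward from samples of $(c_k, e_k)$ and keeps the best; the contributions can come from the contribution() function of Section 4.3:

```python
import numpy as np

def best_reward(costs, contribs, V, grid=None):
    """Grid search for the benchmark reward r maximizing total profit.
    costs[k] is c_k and contribs[k] is c_m(g(e_k))."""
    if grid is None:
        grid = np.linspace(0.0, 5.0, 501)
    best_r, best_profit = 0.0, -np.inf
    for r in grid:
        accepted = r * contribs >= costs   # participants the reward covers
        profit = np.sum(accepted * (V - r * contribs))
        if profit > best_profit:
            best_r, best_profit = r, profit
    return best_r, best_profit
```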

5. EVALUATION RESULTS

In this section, we conduct simulations to evaluate the performance of our quality based incentive mechanism. We first analyze the improvement in quality assurance. Then, we compare our quality based reward mechanism to the uniform pricing scheme, and illustrate its superior performance in profit management.

5.1 Quality Assurance

We install the NoiseTube mobile app [1] on Google Nexus 7 tablets, and use the embedded acoustic sensor to measure noise in a meeting room. We recruit 10 participants to take part in the experiment, each of whom carries a Nexus and randomly puts it into his/her pocket or on the table. The participants are clearly told that accurate monitoring occurs when they take out the Nexuses and keep them undisturbed.

The basic experiment tests whether the participants' effort levels affect the noise readings. As Figure 3(a) shows, the noise reading from a muffled microphone inside a pocket is at least 5 dB lower than that of outside pocket sensing. The sensing data submitted by the participants, as shown in Figures 3(b)-3(f), also presents such reading differences, based on which we can roughly tell the effort levels of the participants: node 10 senses with the highest effort level and submits almost perfect readings; nodes 7, 8, 9, and 11 are 85% accurate, with high effort levels most of the time; nodes 2 and 4 are helpful, with 70% accuracy; node 6 is careless, with high accuracy at first that gradually slacks off; node 5 is indifferent, intermittently alternating between accurate and deviating readings; and node 3 senses with the lowest effort level, with all readings lower than the ground truth.


Figure 5: Joint distribution of sensing cost and effort matrix (reserve price vs. effort level $e_{ii}$): (a) no correlation, $\rho = 0.0$; (b) positive correlation, $\rho = 0.8$.

Given the reading differences, we compare the quality assurance, i.e., the overall monitoring accuracy of the collective work from the crowd, in our quality measured model (QM), the traditional majority voting model (MV), and the all and average model (AA). The difference is as follows: QM excludes sensing data of low quality (i.e., with accuracy less than 50%) and assigns quality-estimated data different weights; MV first selects the most frequent noise interval, and then averages the noise readings within it; and AA takes in all submissions and computes the average reading.
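The following is our sketch of the three aggregation rules for a single task (the 5 dB interval width follows the experimental setup; it assumes at least one submission has quality at least 0.5):

```python
import numpy as np

def aggregate(readings, qualities, width=5.0):
    """Return the (QM, MV, AA) estimates for one task's submissions."""
    r = np.asarray(readings, dtype=float)
    q = np.asarray(qualities, dtype=float)
    # AA: plain average over all submissions.
    aa = r.mean()
    # MV: most frequent interval first, then average inside it.
    bins = np.floor(r / width).astype(int)
    top = np.bincount(bins).argmax()
    mv = r[bins == top].mean()
    # QM: drop low quality data (q < 0.5), weight the rest by quality.
    keep = q >= 0.5
    qm = np.average(r[keep], weights=q[keep])
    return qm, mv, aa
```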

Results, as shown in Figure 4, indicate that QM outperforms the other two models, monitoring the noise pollution more accurately (i.e., the readings stay close to the ground truth) and more robustly against fluctuations in the participants' efforts, especially when careless and indifferent participants make up more than half of the whole population. Furthermore, the MV model may drive the monitoring into fierce fluctuation when the noise intervals are highly precise, which is 5 dB per interval in our setting. Despite a similar trend to QM, the AA model is more vulnerable to a large amount of low quality submissions.

5.2 Profit Management

To test the performance of our quality based incentive mechanism in terms of profit management, we first generate the sensing costs and effort matrices for participants, and then compare the profit of our mechanism to that of the uniform pricing scheme.

We draw $v_c$ and $v_e$ from a bivariate normal distribution, $(c, e) \sim \mathcal{N}(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, where $\mu_1 = 2.0$, $\mu_2 = 0.75$, $\sigma_1 = 1.0$, $\sigma_2 = 0.125$, and $\rho = 0.0$ indicates no correlation between sensing cost and effort matrix (Figure 5(a)), or $\rho = 0.8$ a strong positive correlation (Figure 5(b)). According to the 68-95-99.7 rule (the $3\sigma$ rule) [22], the 95.45% confidence interval is $\mu \pm 2\sigma$, which empirically states that about 95.45% of the data drawn from the normal distribution lies within $[0.0, 4.0] \times [0.5, 1.0]$ in our setting. Then, we transform $v_c$ and $v_e$ to $c_k$ and $e_k$ correspondingly by setting $c_k = \max(-0.5, \min(v_c, 4.5))$ and $e_{kii} = \max(0.45, \min(v_e, 1.05))$, $i = 1, \dots, m$. Therefore, the extreme data is excluded and the remaining majority approximately follows the same normal distribution. Notably, other forms of distribution are also experimentally possible, and the exact joint distribution needs to be carefully estimated in practical crowdsensing markets [8].
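The sampling step can be reproduced as follows (our sketch of the stated setup):

```python
import numpy as np

def sample_cost_effort(n, rho, mu=(2.0, 0.75), sigma=(1.0, 0.125), seed=0):
    """Draw (v_c, v_e) from the bivariate normal and clip extremes."""
    rng = np.random.default_rng(seed)
    cov = [[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
           [rho * sigma[0] * sigma[1], sigma[1] ** 2]]
    v = rng.multivariate_normal(mu, cov, size=n)
    c = np.clip(v[:, 0], -0.5, 4.5)   # sensing costs c_k
    e = np.clip(v[:, 1], 0.45, 1.05)  # diagonal effort levels e_kii
    return c, e

c, e = sample_cost_effort(1000, rho=0.8)  # or rho=0.0 for no correlation
```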

After getting the joint distribution, we compare the profit of our quality based incentive mechanism with that of the uniform pricing scheme, where the profit is defined as the difference between the value gained from the sensing data and the rewards for contributions. We note that the quality based incentive mechanism gains the full value $V$ from sensing data by providing an error-bounded service, and offers each participant a proper reward based on her effective contribution. In the uniform pricing scheme, the sensing data is regarded equally, with the same quality, and the participants are offered the same reward,

$$r_u = \max_{a_k \in A_t} c_k.$$

However, the gained value is restricted by the actual quality $q_k$ of the sensing data, i.e., $v_k = y(V, q_k)$, which increases monotonically with the quality $q_k$.

For simplicity, we consider $|D| = 2$ noise intervals, and omit the subscripts of $e$. Then, the effective contribution is calculated as

$$c_m(g(e)) = c_2(e) = 1 + e \log e + (1 - e) \log(1 - e).$$

The value function is set to $v_k = V \sin(c_m(g(e)) \cdot \pi/2)$, which is concave over the feasible range $c_m \in [0, 1]$.

We select participants from a sufficient crowd, in increasing order of their cost/contribution ratios, and calculate the optimal reward for the top proportion of them, ranging from 10% to 100%. The optimal reward in our quality based incentive mechanism is determined by

$$r^{*} = \arg\min_{r} \left\{\, r \mid r\, c_m(g(e_k)) - c_k \ge 0,\ \forall a_k \in A_t \,\right\}.$$

Each participant $a_k$ then gets a proper reward $r_k = r^{*} c_m(g(e_k))$.
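Putting the pieces together, the following is our end-to-end sketch of the comparison (not the authors' simulation code): it sorts participants by cost/contribution ratio, recruits a given top fraction, and prices either by the quality based rule above or uniformly, using the $|D| = 2$ contribution and the value function just defined:

```python
import numpy as np

def binary_contribution(e):
    # c_2(e) = 1 + e log e + (1 - e) log(1 - e), clipped to guard the logs.
    e = np.clip(e, 1e-9, 1 - 1e-9)
    return 1 + e * np.log2(e) + (1 - e) * np.log2(1 - e)

def compare_profits(c, e, V=3.0, fraction=0.8):
    """Total profit under quality based pricing vs. uniform pricing."""
    c2 = binary_contribution(e)
    order = np.argsort(c / c2)                 # cost/contribution ratio
    sel = order[: int(fraction * len(c))]
    # Quality based: r_k = r* c_2(e_k), full value V per data point.
    r_star = np.max(c[sel] / c2[sel])
    profit_qm = np.sum(V - r_star * c2[sel])
    # Uniform: everyone gets r_u = max c_k; value degrades with quality.
    r_u = np.max(c[sel])
    profit_u = np.sum(V * np.sin(c2[sel] * np.pi / 2) - r_u)
    return profit_qm, profit_u
```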

Results, as shown in Figure 6, indicate that our quality based incentive mechanism overwhelmingly outperforms the uniform pricing scheme under both distributions. The quality based incentive mechanism complies with the cost/contribution ratio to set the optimal reward at every stage, and thus can fully leverage the power of participants to complete the sensing tasks at a low cost, compared to the uniform pricing scheme. Moreover, with the guaranteed value of service, the quality based incentive mechanism, with higher accuracy and less fluctuation in noise monitoring, is more appealing to the service provider.

The results also suggest the proper fraction of participants that the service provider should try to recruit: 80% for both schemes when sensing cost and effort matrix have no correlation, and 80% and 70% for our quality based incentive mechanism and the uniform pricing scheme, respectively, when the two factors are positively correlated. It is reasonable to see such a turning point in profit, from a smooth rise to a fall, since there are always some participants with sensing costs higher than what they deserve to be rewarded according to their effective contributions, i.e., with unreasonably high sensing costs for subpar contributions. The x-axis ends at 970 (Figure 6(a)) and 940 (Figure 6(b)), respectively, since we have excluded extreme data from the distribution.

Figure 6: Comparison on profit of quality-based pricing and uniform pricing (profit vs. number of participants): (a) no correlation, $\rho = 0.0$; (b) positive correlation, $\rho = 0.8$.

6. CONCLUSION

In this paper, we have incorporated the consideration of data quality into the design of an incentive mechanism for crowdsensing. By applying the expectation maximization algorithm and information theory, we have bridged the gap between the quality of sensing data and proper rewards for contributions, and proposed the quality based incentive mechanism, which achieves both individual rationality and profit maximization. Our incentive mechanism estimates the effort matrix of each participant, calculates the quality of sensing data, and offers a reward in accordance with each effective contribution, aiming to motivate individual participants with different sensing costs to invest sufficient manual effort and submit high quality sensing data in crowdsensing. We have also implemented part of the mechanism with extensive experiments and simulations. Compared to the existing uniform pricing scheme, our mechanism achieves superior performance in profit management.

7. ACKNOWLEDGEMENT

We appreciate the anonymous reviewers, whose comments led to an improvement of this paper. Our shepherd, Vishal Misra, gave us highly valuable comments to improve it.

8. REFERENCES

[1] NoiseTube. http://www.noisetube.net/, 2008. [Online].

[2] Creek Watch. http://creekwatch.researchlabs.ibm.com/, 2010. [Online].

[3] OpenSense. http://www.opensense.ethz.ch/trac/, 2010. [Online].

[4] Noise monitoring. http://www.larsondavis.com/, 2015. [Online].

[5] J. A. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy, and M. B. Srivastava. Participatory sensing. Center for Embedded Network Sensing, 2006.

[6] A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pages 20–28, 1979.

[7] D. Hand. Principles of data mining. Drug Safety, 30(7):621–622, 2007.

[8] J. J. Horton and L. B. Chilton. The labor economics of paid crowdsourcing. In Proceedings of the 11th ACM Conference on Electronic Commerce (EC), 2010.

[9] L. G. Jaimes, I. Vergara-Laurens, and M. A. Labrador. A location-based incentive mechanism for participatory sensing systems with budget constraints. In Proceedings of the 2012 IEEE International Conference on Pervasive Computing and Communications (PerCom), 2012.

[10] R. Kawajiri, M. Shimosaka, and H. Kashima. Steered crowdsensing: Incentive design towards quality-oriented place-centric crowdsensing. In Proceedings of the 16th ACM International Conference on Ubiquitous Computing (UbiComp), 2014.

[11] I. Koutsopoulos. Optimal incentive-driven design of participatory sensing systems. In Proceedings of the 32nd Annual IEEE International Conference on Computer Communications (INFOCOM), 2013.

[12] N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell. A survey of mobile phone sensing. IEEE Communications Magazine, 48(9):140–150, 2010.

[13] J.-S. Lee and B. Hoh. Sell your experiences: A market mechanism based incentive for participatory sensing. In Proceedings of the 2010 IEEE International Conference on Pervasive Computing and Communications (PerCom), 2010.

[14] H. Lu, W. Pan, N. D. Lane, T. Choudhury, and A. T. Campbell. SoundSense: Scalable sound sensing for people-centric applications on mobile phones. In Proceedings of the 7th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys), 2009.

[15] T. Luo, H.-P. Tan, and L. Xia. Profit-maximizing incentive for participatory sensing. In Proceedings of the 33rd Annual IEEE International Conference on Computer Communications (INFOCOM), 2014.

[16] N. Maisonneuve, M. Stevens, M. Niessen, and L. Steels. NoiseTube: Measuring and mapping noise pollution with mobile phones. In Information Technologies in Environmental Engineering, 2009.

[17] W. Mason and D. J. Watts. Financial incentives and the performance of crowds. SIGKDD Explorations Newsletter, 11(2):100–108, 2010.

[18] S. Mathur, T. Jin, N. Kasturirangan, J. Chandrasekaran, W. Xue, M. Gruteser, and W. Trappe. ParkNet: Drive-by sensing of road-side parking statistics. In Proceedings of the 8th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys), 2010.

[19] P. Mohan, V. N. Padmanabhan, and R. Ramjee. Nericell: Rich monitoring of road and traffic conditions using mobile smartphones. In Proceedings of the 6th ACM Conference on Embedded Networked Sensor Systems (SenSys), 2008.

[20] M. Musthag, A. Raij, D. Ganesan, S. Kumar, and S. Shiffman. Exploring micro-incentive strategies for participant compensation in high-burden studies. In Proceedings of the 13th ACM International Conference on Ubiquitous Computing (UbiComp), 2011.

[21] S. Nawaz, C. Efstratiou, and C. Mascolo. ParkSense: A smartphone based sensing system for on-street parking. In Proceedings of the 19th Annual International Conference on Mobile Computing and Networking (MobiCom), 2013.

[22] F. Pukelsheim. The three sigma rule. American Statistician, 44:88–91, 1994.

[23] A. Rai, K. K. Chintalapudi, V. N. Padmanabhan, and R. Sen. Zee: Zero-effort crowdsourcing for indoor localization. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking (MobiCom), 2012.

[24] R. K. Rana, C. T. Chou, S. S. Kanhere, N. Bulusu, and W. Hu. Ear-Phone: An end-to-end participatory urban noise mapping system. In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), 2010.

[25] S. Reddy, D. Estrin, M. Hansen, and M. Srivastava. Examining micro-payments for participatory sensing data collections. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing (UbiComp), 2010.

[26] S. Reddy, D. Estrin, and M. Srivastava. Recruitment framework for participatory sensing data collections. In Proceedings of the 2010 IEEE International Conference on Pervasive Computing and Communications (PerCom), 2010.

[27] V. Sachidananda, A. Khelil, and N. Suri. Quality of information in wireless sensor networks: A survey. In Proceedings of the 12th Annual International Conference on Information Quality (ICIQ), 2010.

[28] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 623–656, 1948.

[29] D. M. Strong, Y. W. Lee, and R. Y. Wang. Data quality in context. Communications of the ACM, 40(5):103–110, 1997.

[30] J. Wang and P. G. Ipeirotis. Quality-based pricing for crowdsourced workers. Technical report, New York University, 2013.

[31] Y. Wang, X. Liu, H. Wei, G. Forman, C. Chen, and Y. Zhu. CrowdAtlas: Self-updating maps for cloud and personal use. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys), 2013.

[32] D. Yang, G. Xue, X. Fang, and J. Tang. Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking (MobiCom), 2012.

[33] Z. Yang, C. Wu, and Y. Liu. Locating in fingerprint space: Wireless indoor localization with little human intervention. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking (MobiCom), 2012.

[34] R. W. Yeung. Information Theory and Network Coding. Springer, 2008.

[35] L. Yu, P. André, A. Kittur, and R. Kraut. A comparison of social, learning, and financial strategies on crowd engagement and output quality. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW), 2014.

[36] Q. Zhang, Y. Wen, X. Tian, X. Gan, and X. Wang. Incentivize crowd labeling under budget constraint. In Proceedings of the 34th IEEE International Conference on Computer Communications (INFOCOM), 2015.

[37] X. Zhang, Z. Yang, Z. Zhou, H. Cai, L. Chen, and X. Li. Free market of crowdsourcing: Incentive mechanism design for mobile sensing. IEEE Transactions on Parallel and Distributed Systems, 25(12):3190–3200, 2014.

[38] Y. Zhang, X. Chen, D. Zhou, and M. I. Jordan. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014.

[39] Y. Zhang and M. van der Schaar. Reputation-based incentive protocols in crowdsourcing applications. In Proceedings of the 31st Annual IEEE International Conference on Computer Communications (INFOCOM), 2012.

[40] D. Zhao, X.-Y. Li, and H. Ma. How to crowdsource tasks truthfully without sacrificing utility: Online incentive mechanisms with budget constraint. In Proceedings of the 33rd Annual IEEE International Conference on Computer Communications (INFOCOM), 2014.
