On Efficient Meta-Data Collection for Crowdsensing

Dickens, L. and Lupu, E.  On Efficient Meta-Data Collection for Crowdsensing. In Crowdsensing Workshop at PerCom, 2014. (To appear.)

Building trustworthy systems that themselves rely on, or integrate, semi-trusted information sources is a challenging aim, but doing so allows us to make good use of floods of information continuously contributed by individuals and small organisations. This paper addresses the problem of quickly and efficiently acquiring high quality meta-data from human contributors, in order to support crowdsensing applications.

Crowdsensing (or participatory sensing) applications have been used to sense, measure and map a variety of phenomena, including: individuals’ health, mobility & social status; fuel & grocery prices; air quality & pollution levels; biodiversity; transport infrastructure; and route-planning for drivers & cyclists. Crowdsensing applications have an on-going requirement to turn raw data into useful knowledge, and to achieve this, many rely on prompt human generated meta-data to support and/or validate the primary data payload. These human contributions are inherently error prone and subject to bias and inaccuracies, so multiple overlapping labels are needed to cross-validate one another. While probabilistic inference can be used to reduce the required label overlap, there is a particular need in crowdsensing to minimise the overhead and improve the accuracy of timely label collection. This paper presents three general algorithms for efficient human meta-data collection, which support different constraints on how the central authority collects contributions, and three methods to intelligently pair annotators with tasks based on formal information theoretic principles. We test our methods’ performance on challenging synthetic data-sets, based on r eal data, and show that our algorithms can significantly lower the cost and improve the accuracy of human meta-data labelling, with a corresponding increase in the average novel information content from new labels.