Recent statistics show that in 2015 more than 140 millions new malware samples have been found. Among these, a large portion is due to ransomware, the class of malware whose specific goal is to render the victim’s system unusable, in particular by encrypting important files, and then ask the user to pay a ransom to revert the damage. Several ransomware include sophisticated packing techniques, and are hence difficult to statically analyse. We present EldeRan, a machine learning approach for dynamically analysing and classifying ransomware. EldeRan monitors a set of actions performed by applications in their first phases of installation checking for characteristics signs of ransomware. Our tests over a dataset of 582 ransomware belonging to 11 families, and with 942 goodware applications, show that EldeRan achieves an area under the ROC curve of 0.995. Furthermore, EldeRan works without requiring that an entire ransomware family is available beforehand. These results suggest that dynamic analysis can support ransomware detection, since ransomware samples exhibit a set of characteristic features at run-time that are common across families, and that helps the early detection of new variants. We also outline some limitations of dynamic analysis for ransomware and propose possible solutions.
Daniele Sgandurra, Luis Muñoz-González, Rabih Mohsen, Emil C. Lupu. In ArXiv e-prints, arXiv:1609.03020, September 2016.
Building trustworthy systems that themselves rely on, or integrate, semi-trusted information sources is a challenging aim, but doing so allows us to make good use of floods of information continuously contributed by individuals and small organisations. This paper addresses the problem of quickly and efficiently acquiring high quality meta-data from human contributors, in order to support crowdsensing applications.
Crowdsensing (or participatory sensing) applications have been used to sense, measure and map a variety of phenomena, including: individuals’ health, mobility & social status; fuel & grocery prices; air quality & pollution levels; biodiversity; transport infrastructure; and route-planning for drivers & cyclists. Crowdsensing applications have an on-going requirement to turn raw data into useful knowledge, and to achieve this, many rely on prompt human generated meta-data to support and/or validate the primary data payload. These human contributions are inherently error prone and subject to bias and inaccuracies, so multiple overlapping labels are needed to cross-validate one another. While probabilistic inference can be used to reduce the required label overlap, there is a particular need in crowdsensing to minimise the overhead and improve the accuracy of timely label collection. This paper presents three general algorithms for efficient human meta-data collection, which support different constraints on how the central authority collects contributions, and three methods to intelligently pair annotators with tasks based on formal information theoretic principles. We test our methods’ performance on challenging synthetic data-sets, based on r eal data, and show that our algorithms can significantly lower the cost and improve the accuracy of human meta-data labelling, with a corresponding increase in the average novel information content from new labels.