- Problem addressed by the paper
Protecting the privacy of cloud-based data from untrusted MapReduce applications.
- Solution proposed in the paper. Why is it better than previous work?
Airavat modifies the JVM, the MapReduce framework, and the SELinux policy to add a differential privacy guarantee to MapReduce applications. Unlike previous work, it does not rely on anonymization, which transforms the original data. Instead, Airavat achieves privacy while running computations directly on the original data. The implementation is scalable and, unlike PINQ, does not require rewriting existing applications.
- The major results.
Airavat achieves differential privacy with high accuracy (around 80% at a privacy bound of 0.2) and a reasonable overhead of around 32%.
B. Basic idea and approach. How does the solution work?
To prevent information leaks through system resources, Airavat runs under an SELinux policy and adds SELinux-like mandatory access control to the MapReduce distributed file system. To prevent leaks through the output of the computation, Airavat enforces differential privacy using modifications to the Java Virtual Machine and the MapReduce framework. The SELinux policy was extended with separate domains for trusted and untrusted programs, with restrictions applied to each domain. The MapReduce framework was modified to support mandatory access control and a set of trusted reducers. The JVM was modified to enforce Mapper independence.
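The output-side protection described above can be illustrated with a minimal sketch of a trusted sum reducer. This is not Airavat's code: it assumes the standard Laplace mechanism, assumes each input record contributes exactly one mapper output, and uses hypothetical names (`clamp`, `laplace_noise`, `noisy_sum`) and a hypothetical declared range `[lo, hi]`.

```python
import random


def clamp(value, lo, hi):
    """Force an untrusted mapper output into the declared range [lo, hi]."""
    return max(lo, min(hi, value))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_sum(mapper_outputs, lo, hi, epsilon, rng=None):
    """Trusted reducer sketch: clamp each value, sum, then add Laplace noise.

    Assumption: one mapper output per input record, so adding or removing
    a record changes the sum by at most max(|lo|, |hi|).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clamped = [clamp(v, lo, hi) for v in mapper_outputs]
    sensitivity = max(abs(lo), abs(hi))
    return sum(clamped) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # The out-of-range value 100 is clamped to 10 before summing.
    print(noisy_sum([1, 5, 100], lo=0, hi=10, epsilon=0.2))
```

Clamping is what keeps an untrusted mapper from inflating the sensitivity: whatever it emits, a single record can shift the true sum by no more than the declared range allows, so the noise scale can be fixed in advance. A smaller epsilon gives a larger noise scale and therefore lower accuracy.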
- A novel implementation that can be deployed at large scale without rewriting existing MapReduce applications.
- Airavat cannot confine every computation performed by untrusted code. For example, a MapReduce computation may output key/value pairs. Keys are text strings that provide a storage channel for malicious mappers. In general, Airavat cannot guarantee privacy for computations which output keys produced by untrusted mappers.
- Airavat does not block timing channels caused by infinite loops (non-termination). It mitigates some side-channel attacks but does not eliminate them. The authors plan to address this in future work.
- MapReduce applications are mostly written in Java. However, MapReduce also supports other programming languages such as Python. This approach will not work for MapReduce applications written in a non-Java language. It also will not work for cloud applications that do not use the MapReduce framework.
- It currently supports only a limited set of Reducer functions: sum, count, average, and threshold. More general computations still need a trusted Mapper.
- The authors mention that the differential privacy threshold is not set automatically and differs from case to case. A differential privacy expert must choose the bound and then add it to Airavat’s system manually.
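The last point, a manually chosen privacy bound, can be sketched as a simple ledger. This is a hypothetical illustration, not Airavat's interface: it assumes the expert-chosen bound is treated as a total epsilon that queries spend under sequential composition (the epsilons of successive queries on the same data add up), and the `PrivacyBudget` class and its methods are invented for this example.

```python
class PrivacyBudget:
    """Tracks how much of a manually configured epsilon bound has been spent."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = float(total_epsilon)
        self.spent = 0.0

    def remaining(self):
        """Epsilon still available for future queries."""
        return self.total - self.spent

    def charge(self, epsilon):
        """Try to spend epsilon on one query; refuse if it would exceed the bound."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject an exact fit.
        if self.spent + epsilon > self.total + 1e-12:
            return False
        self.spent += epsilon
        return True


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)  # value chosen by an expert
    for query_epsilon in (0.4, 0.4, 0.4):
        allowed = budget.charge(query_epsilon)
        print(query_epsilon, allowed, round(budget.remaining(), 3))
```

The sketch only shows the bookkeeping. Choosing `total_epsilon` is the part that, according to the authors, still needs an expert and differs from one dataset to the next.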