A Hadoop-like Distributed Computing Platform

This is a parallel cloud computing framework similar to MapReduce/Hadoop. The platform includes core services such as an underlying distributed file system and a reliable membership protocol. It was extended from a course project (CS 425 at UIUC) and was awarded the best implementation in Java. Since it is open-sourced on GitHub, you may use it only as a reference if you are also implementing MapleJuice for CS 425.

The MapReduce-like interface consists of two phases of computation. Each phase is divided into tasks that run in parallel on servers in the cluster. The two phases are separated by a barrier, meaning that Reduce cannot start while Map is still running. The Map function here processes 10 input lines (from a file) at a time, whereas the traditional Map function processes only one input line at a time. Beyond this distinction, the paradigm is very similar to the MapReduce paradigm.
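
The two phases and the barrier between them can be sketched in a few lines of Python. This is only a single-machine illustration under stated assumptions, not the project's actual Java code: a thread pool stands in for the servers in the cluster, and the names `batches`, `map_word_count`, `reduce_sum`, and `run_job` are hypothetical, chosen for this example. Word counting is used as the sample job.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 10  # lines handed to each Map call, as described above


def batches(lines, size=BATCH_SIZE):
    """Yield consecutive lists of at most `size` lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def map_word_count(batch):
    """Hypothetical Map function: emit (word, 1) for every word in the batch."""
    return [(word, 1) for line in batch for word in line.split()]


def reduce_sum(key, values):
    """Hypothetical Reduce function: sum the values collected for one key."""
    return key, sum(values)


def run_job(lines, map_fn, reduce_fn, workers=4):
    """Run Map, wait at the barrier, then run Reduce (threads stand in for servers)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: Map tasks run in parallel, one task per batch of lines.
        # Wrapping in list() blocks until every Map task has finished,
        # which acts as the barrier between the two phases.
        map_outputs = list(pool.map(map_fn, batches(lines)))

        # Group the intermediate (key, value) pairs by key.
        grouped = defaultdict(list)
        for pairs in map_outputs:
            for key, value in pairs:
                grouped[key].append(value)

        # Phase 2: Reduce tasks run in parallel, one task per key.
        results = pool.map(lambda kv: reduce_fn(*kv), grouped.items())
        return dict(results)
```

For example, `run_job(["a b", "b c"], map_word_count, reduce_sum)` returns `{"a": 1, "b": 2, "c": 1}`. In a real cluster the intermediate pairs would be written to the distributed file system rather than held in memory, and the barrier would be enforced by a coordinator rather than by a local `list()` call.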

Haoran Qiu
Ph.D. Student in Computer Science

My research interests include distributed systems, machine learning and cloud computing.