Another smart solution for big data challenges

A big data software tool developed in the Department of Computer Science and Engineering is gaining recognition and usage in the computing research community.

The Computing Community Consortium, a National Science Foundation and Computing Research Association joint venture, recently highlighted YSmart as its weekly notable research.

YSmart is designed to improve the productivity of big data processing by automatically translating SQL queries to MapReduce programs. A programming model for processing large data sets, MapReduce allows for massive scalability across thousands of servers in a cluster on the Hadoop platform.
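To make the idea concrete, here is a minimal sketch of how a simple SQL aggregation lines up with the map and reduce phases. It is an illustration only, not YSmart's generated code: the `visits(page, seconds)` table, the sample rows, and every function name below are assumptions made for the example, and the "cluster" is simulated in a single Python process. The query being mimicked is `SELECT page, SUM(seconds) FROM visits GROUP BY page`.

```python
from collections import defaultdict

# Hypothetical rows of an assumed table visits(page, seconds).
ROWS = [
    ("home", 12),
    ("search", 30),
    ("home", 8),
    ("search", 10),
    ("about", 5),
]


def map_phase(row):
    """Emit (key, value) pairs; the GROUP BY column becomes the key."""
    page, seconds = row
    yield page, seconds


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Apply the aggregate (SUM) to all values that share a key."""
    return key, sum(values)


def run_query(rows):
    """Simulate one MapReduce job for the GROUP BY query in one process."""
    pairs = (pair for row in rows for pair in map_phase(row))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_query(ROWS))
```

On a real Hadoop cluster the map and reduce functions would run in parallel on many servers and the shuffle would move data across the network; the division of labor between the phases is the same.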

Effective and automatic translating SQL queries to MapReduce programs is a critical task for high productivity of various big data analytics applications. With this automation, users can directly use their familiar SQL queries to interact with Hadoop systems without writing MapReduce programs. —Computing Community Consortium

YSmart’s elegance and advantage lie in its ability to automatically detect and utilize intra-query correlations when translating a complex SQL query to a series of MapReduce programs. An open-source project, YSmart has been adopted by Apache Hive, a data warehousing solution used by the likes of Facebook, LinkedIn, Microsoft, and Netflix.
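One way to picture an intra-query correlation is a query whose parts read the same table and group on the same key. The sketch below is a simplified assumption about what such a correlation can look like, not YSmart's actual detection algorithm: it compares a naive plan that runs one simulated job per aggregate with a merged plan that computes both aggregates in a single job. The table, rows, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of an assumed table visits(page, seconds).
ROWS = [("home", 12), ("search", 30), ("home", 8), ("search", 10)]


def run_job(rows, aggregate):
    """One simulated MapReduce job: scan rows, group by page, aggregate."""
    groups = defaultdict(list)
    for page, seconds in rows:  # map and shuffle: key each row by page
        groups[page].append(seconds)
    # reduce: apply the aggregate to each group's values
    return {page: aggregate(values) for page, values in groups.items()}


def naive_plan(rows):
    """Two jobs, so the table is scanned and shuffled twice."""
    totals = run_job(rows, sum)
    counts = run_job(rows, len)
    result = {page: (totals[page], counts[page]) for page in totals}
    return result, 2  # (answer, number of jobs)


def merged_plan(rows):
    """One job computes both aggregates from a single scan and shuffle."""
    result = run_job(rows, lambda values: (sum(values), len(values)))
    return result, 1  # (answer, number of jobs)


if __name__ == "__main__":
    print(naive_plan(ROWS))
    print(merged_plan(ROWS))
```

Both plans return the same answer; the merged plan simply needs fewer jobs, which on a real cluster would mean fewer passes over the data.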

Dr. Lee and Dr. Zhang

“I am glad to see that industry has quickly adopted more of our research for big data processing software ecosystems,” said Principal Investigator Xiaodong Zhang, Robert M. Critchfield Professor in Engineering and chair of the CSE department.

In collaboration with Facebook software engineers in 2011, Zhang and his team also developed RCFile, a data placement structure for big data storage. Widely adopted in the open-source community, it has become the default data placement structure in Facebook's data warehouse system.

“This is an exciting time to do big data research,” Zhang added, “which makes a direct impact on technology advancement to benefit society.”

The YSmart team includes Buckeye engineers Yin Huai, Rubao Lee, Tian Luo, Meisam Fathi Salmi, and Yuan Yuan, as well as outside collaborators Yongqiang He of Facebook and Fusheng Wang of Emory University.
