You are here

Professor's Work Plays Role in Faster Google, Facebook

The research of Xiaodong Zhang, professor and chair, computer science and engineering, and his former doctoral student, Song Jiang, has been adopted by Sun Microsystems’ MySQL, the most widely used database software providing data processing service in various enterprise, government and education applications. There are more than 11 millions installations of the MySQL database in the world, including in the busiest Web sites of the human society, such as Google, Yahoo, Wikipedia, YouTube and Facebook. Many of the world's largest and fastest-growing organizations use MySQL to save time and money powering their high-volume Web sites, critical business systems and packaged software and to manage their huge data volumes.

Zhang co-authored an influential algorithm, or a sequence of complex instructions that tells computers how to complete specific tasks, to improve the speed and performance of data accesses operated by operating and database systems.

In a data management system, such as a database, frequently used data blocks are cached (temporally stored) in a small region of fast DRAM memory, called “buffer pool” (or buffer cache), to avoid the long delays that would occur if slower hard disks had to be used. Such caching mechanisms have been widely used in almost all memory-capable digital systems, from large computer and database systems to small devices such as cell phones.

In a search engine, such as Google, all kinds of Web pages from the Internet are timely retrieved and partially stored in a huge MySQL database. The majority of Web page data items are archived in massive hard disk storage, and only the most frequently accessed pages are expected to store in buffer pools in DRAM memory. For each user’s search in Google, if the query hits in the buffer pool, the user would quickly get the search results; otherwise, the results would have to be fetched from slow disk storages, causing long delays.

Since the buffer pool space is very limited, data blocks are frequently replaced by newcomers. A critical task of cache management is to keep the most frequently used data in the buffer pool (or to hold the frequently accessed Web pages in a search engine). A data block replacement algorithm decides which block will be evicted when a new data block needs to be allocated in the buffer pool. The database access speed is mainly determined by the accuracy and efficiency of the replacement algorithm.

The commonly used LRU, or Least Recent Used, algorithm (such as in operating systems, databases and many other digital devices) evicts the least recently used data block. But since a data block that was least recently used is not always the same as the most infrequently used data block, two serious problems arise: (1) one-time accessed data blocks that were recently used can easily kick out frequently used data in the cache; and (2) a high rate of misses occurs for loop-like data accesses due to limited cache sizes. Zhang and Jiang’s development, the LIRS (Low Inter-Reference Recency Set) caching algorithm, can effectively detect the one-time, or most infrequently, used data blocks and evict them as early as possible and can adapt the loop access patterns in replacement decisions. Another advantage of LIRS is its simple implementation in any system. Zhang and Jiang’s work serves as a critical component in the MySQL buffer pool management now. In fact, in MySQL's most recent version 5.1 (released in November 2008), LIRS is called the "Jiang-Zhang LIRS Caching Algorithm."

Relative to this work, Zhang has been elected to a Fellow of the Institute of Electrical and Electronics Engineers for his contributions to computer memory systems. The original research and paper were supported by National Science Foundation grants to Zhang. The student, Jiang, is now an assistant professor in the Department of Electrical and Computer Engineering at Wayne State University. NSF continues to support his research with grants including a CAREER award. Since the LIRS algorithm was published in the 2002 ACM SIGMETRICS Conference, it had been widely tested and evaluated by both academia researchers and industry developers. Before this database application, LIRS was adopted first in computer systems, such as Linux and NetBSD, the two widely used open source operating systems for laptops, servers, supercomputers and network systems.

Read more about Xiaodong Zhang.