OLAPS: Online Load-Balancing in Range-Partitioned Main Memory Database with Approximate Partition Statistics

Djahida Belayadi, Khaled-Walid Hidouci, Ladjel Bellatreche

Modern database systems can achieve high throughput main-memory query execution by being aware of the dynamics of highly parallel hardware. In such systems, data is partitioned into smaller pieces to reach a better parallelism. Unfortunately, data skew is one of the main problems faced during parallel processing in a parallel main memory database. In some data-intensive applications, parallel range queries over a dynamic range partitioned system are important. Continuous insertions/deletions can lead to a very high degree of data skew and consequently a poor performance of parallel range queries. In this paper, we propose an approach for maintaining balanced loads over a set of nodes as in a system of communicating vessels, by migrating tuples between neighboring nodes. These frequent (or even continuous) data transfers inevitably involve dynamic changes in the partition statistics. To avoid the performance degradation typically associated with this dynamism, we provide a solution based on an approximate Partition Statistics Table. The basic idea behind this table is that both clients and nodes may have an imperfect knowledge about the effective load distribution. They can nevertheless locate any data with almost the same efficiency as using exact partition statistics. Furthermore, maintaining load distribution statistics do not require exchanging additional messages as opposed to the cost of efficient solutions from the state-of-art (which requires at least O(logn) messages). We show through intensive experiments that our proposal supports efficient range queries, while simultaneously guaranteeing storage balance even in the presence of numerous concurrent insertions/deletions generating a heavy skewed data distribution.