In current large-scale distributed key-value stores, a single end-user request may lead to key-value accesses across tens or hundreds of servers. The tail latency of these key-value accesses is crucial to user experience and greatly impacts revenue. To cut the tail latency, it is crucial for clients to choose the best possible replica server to serve each key-value access operation. To address the challenges of time-varying server performance and herd behavior, an adaptive replica selection scheme, C3, has recently been proposed. In C3, feedback from individual servers is incorporated into replica ranking to reflect the time-varying performance of servers, and distributed rate control and backpressure mechanisms are introduced. Despite C3's good performance, we reveal a timeliness issue in C3 that strongly affects both replica ranking and rate control. To address this issue, we propose the TAP (timeliness-aware prediction-based) replica selection algorithm, which predicts the queue size of replica servers under poor timeliness conditions, instead of using the exponentially weighted moving average of historical piggybacked queue sizes as C3 does. Consequently, compared with C3, TAP obtains more accurate queue-size estimates to guide replica selection at clients. Simulation results also confirm the advantage of TAP over C3 in cutting the tail latency.

KEYWORDS
key-value stores, prediction, tail latency, timeliness
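To make the contrast concrete, the following Python sketch illustrates the EWMA style of per-server queue-size estimation that the abstract attributes to C3; it is not the authors' implementation, and the class, method, and parameter names (EwmaQueueEstimator, alpha) are illustrative assumptions.

```python
# Illustrative sketch (assumed names, not C3's actual code): smooth the queue
# sizes piggybacked on server responses with an exponentially weighted moving
# average (EWMA) and rank candidate replicas by the smoothed estimate.

class EwmaQueueEstimator:
    """Keeps a smoothed per-server queue-size estimate from piggybacked samples."""

    def __init__(self, alpha: float = 0.9):
        self.alpha = alpha      # weight given to the newest sample (assumed value)
        self.estimates = {}     # server_id -> smoothed queue size

    def update(self, server_id: str, piggybacked_queue_size: float) -> None:
        """Fold a queue-size sample piggybacked on a response into the EWMA."""
        old = self.estimates.get(server_id, piggybacked_queue_size)
        self.estimates[server_id] = (
            self.alpha * piggybacked_queue_size + (1 - self.alpha) * old
        )

    def rank(self, candidates: list[str]) -> list[str]:
        """Rank candidate replicas by smoothed queue size (smaller is better)."""
        return sorted(candidates, key=lambda s: self.estimates.get(s, float("inf")))
```

When the piggybacked samples arrive with poor timeliness, such a moving average lags behind the server's true queue size; this is the staleness that motivates TAP's prediction-based estimate described in the abstract.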
INTRODUCTION

In current large-scale distributed key-value store systems, data are partitioned into small pieces, replicated, and distributed across servers for parallel access and scalability. Consequently, a single end-user request may need key-value accesses from tens or hundreds of servers. 1-3 The tail latency of these key-value accesses determines the response time of the end-user request, which is directly associated with user experience and revenue. 4,5 Nevertheless, because the performance of servers is time varying, 6,7 the tail latency is hard to guarantee and may become unexpectedly long under certain conditions. A recent study shows that the 99th percentile latency can be one order of magnitude larger than the median latency, 6 indicating that there is large room to cut the tail latency of key-value accesses. To cut the tail latency, a replica selection scheme, in which each client chooses the best possible replica server for each key-value access, is crucial. 8 Other methods for reducing the tail latency, such as duplicating or reissuing requests, 2,6,9,10 can also benefit from a good replica selection scheme.

However, for efficiency, the replica selection schemes of current classic key-value stores are very simple. For example, OpenStack Swift simply reads from an arbitrary server and retries in case of failures. 11 HBase relies on HDFS, which chooses the physically closest replica server. 12 Riak uses an external load balancer such as Nginx, 13 which employs the least-outstanding-requests (LOR) strategy. Accor...