D-RDMA: Bringing Zero-Copy RDMA to Database Systems
Abstract
The DMA part of RDMA stands for Direct Memory Access. It refers to the ability of a network card (among other devices) to read and write data from a host’s memory without CPU assistance. RDMA’s performance depends on efficient DMAs in the initiating and target hosts. In turn, a DMA’s cost is almost always proportional to the length of the data transfer. The exception is small DMAs, which suffer from high overheads. In this paper, we show that database systems often generate small DMA operations when using RDMA canonically. The reason is that the data they transmit is seldom contiguous by the time transmissions occur. Modern databases avoid this problem by copying data into large transmission buffers and issuing RDMAs over these buffers instead. However, doing this requires a substantial amount of CPU cycles and memory bandwidth, forfeiting RDMA’s benefits: its zero-copy feature. To solve this issue, we introduce DRDMA, a declarative extension to RDMA. D-RDMA is declarative in that it specifies what data to transmit but not the DMA schedule to do so. The approach leverages a smart NIC to group data fragments into larger DMAs and produce the same packet stream as regular RDMA. Our experiments show that the network throughput can increase from 18 Gbps per CPU core to up to 98 Gbps (on a 100 Gbps card) with virtually zero CPU usage when replacing RDMA with D-RDMA in a typical data shuffle scenario. We believe that D-RDMA can enable a new generation of high-performance systems to take full advantage of fast networking without incurring the usual CPU penalties.