Over the last decade, Apache Spark has reshaped big data processing and analytics. Spark is a high-performance analytics engine for big data processing with a vibrant ecosystem, and it is the most active Apache open-source project. The key factors driving Spark enterprise adoption are unmatched performance, simple programming, and general-purpose analytics over massive amounts of data.
Spark performance benchmarks indicate that Spark can run big data workloads up to 100x faster than Hadoop MapReduce, for both batch and streaming data. This gain is primarily attributed to Spark’s in-memory approach to data processing and analysis, which is fast and efficient enough to enable large-scale machine learning and data analytics. Spark also uses a data model called resilient distributed datasets (RDDs): a data structure that is held in memory while it is being computed, which eliminates expensive intermediate disk writes.
To handle data processing and analysis at scale, Spark relies on a recurring operation known as the shuffle, a mechanism for redistributing data so that it is grouped differently across partitions. Because it copies data across executors and machines, the shuffle is a complex and costly operation that involves disk I/O, data serialization, and network I/O. Data scientists and software professionals therefore use various techniques in application design to avoid shuffling as much as possible. Still, shuffle operations are a necessity for most workloads, and they limit performance.
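To make the cost concrete, here is a minimal, Spark-free sketch of what a shuffle does: every map task's output is split by key, and each record is sent to the partition that owns its key. The function names (`partition_for`, `shuffle`, `reduce_by_key`) and the sample records are illustrative assumptions, not Spark APIs, and the byte-sum partitioner is a stand-in chosen only to keep the output reproducible.

```python
from collections import defaultdict


def partition_for(key, num_partitions):
    """Pick the target partition for a key.

    Stand-in partitioner (assumption): a byte sum keeps this sketch
    deterministic. It is not the hash function Spark itself uses.
    """
    return sum(key.encode("utf-8")) % num_partitions


def shuffle(map_outputs, num_partitions):
    """Redistribute (key, value) records so equal keys share a partition.

    In a real cluster, each append below can mean serializing the record,
    writing it to local disk, and sending it over the network.
    """
    reduce_inputs = [[] for _ in range(num_partitions)]
    for records in map_outputs:
        for key, value in records:
            reduce_inputs[partition_for(key, num_partitions)].append((key, value))
    return reduce_inputs


def reduce_by_key(partition):
    """Sum the values per key inside one post-shuffle partition."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


# Hypothetical output of three map tasks; keys are scattered across tasks.
map_outputs = [
    [("spark", 1), ("rdma", 1), ("spark", 1)],
    [("rdma", 1), ("roce", 1)],
    [("spark", 1), ("roce", 1)],
]

shuffled = shuffle(map_outputs, num_partitions=2)
counts = [reduce_by_key(partition) for partition in shuffled]
print(counts)
```

Every record crosses from a map-side list to a reduce-side list. On a cluster, that crossing is the block transfer between executors, which is the step an RDMA transport targets.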
Remote Direct Memory Access (RDMA) is a network technology that allows one computer to access the memory of another directly, without involving either one’s operating system or CPU. RDMA is especially useful in massively parallel computer clusters, as it permits high-throughput, low-latency networking. Once an application issues an RDMA Read or Write request, the system delivers the application data directly to the network (zero-copy, fully offloaded by the network adapter), reducing latency and enabling fast message transfer. RDMA over Converged Ethernet (RoCE) is a network protocol that allows RDMA to run over an Ethernet network.
Mellanox Technologies has been a leading pioneer of the popular RDMA and RDMA over Converged Ethernet (RoCE) networking technologies, starting in the high-performance computing (HPC) industry. In fact, Mellanox has just released its 8th generation of RDMA/RoCE-capable products, including the intelligent ConnectX adapter cards and BlueField SmartNICs, which both have built-in RDMA and RoCE capabilities and deliver best-in-class performance and usability.
RDMA today is integrated into the mainstream code of popular machine learning (ML) and artificial intelligence (AI) frameworks, namely TensorFlow, MXNet and Caffe2. Recently, Mellanox announced the v3 release of its Spark-compliant open-source SparkRDMA software plugin, which leverages RDMA communication technology to accelerate Spark’s shuffle operations. The plugin neither changes the mainstream Spark code nor impacts its functionality, making it a perfect fit for existing deployments.
Figure 4 below illustrates how SparkRDMA reuses the Unsafe and Sort Shuffle Writer implementations of mainstream Spark (shown in light green). While shuffle data is written and stored identically to the original implementation, the all-new ShuffleReader and ShuffleBlockResolver provide an optimized RDMA transport when blocks are read over the network (shown in light blue).
The following diagrams describe the shuffle read protocol in the original implementation, and when using RDMA (lower diagram). As indicated, using RDMA for Spark’s shuffle operations both greatly shortens and speeds up the process.
Spark over RDMA has shown substantial improvements in block transfer times (both latency and total transfer time), memory consumption, and CPU utilization compared to standard Spark’s implementation, which runs over TCP. Moreover, the SparkRDMA plugin is designed with ease of use in mind and supports per-job operation, allowing for incremental deployments and limited use for shuffle-intensive jobs.
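Per-job operation can be sketched as a small helper that adds the plugin settings only to the jobs that should use RDMA. This is a hedged illustration, not the plugin's official tooling: the helper `spark_submit_args`, the jar path, and the job file names are hypothetical. The shuffle manager class name and the class path properties are written as recalled from the SparkRDMA README and should be verified against the release you deploy.

```python
import shlex

# Assumption: class name as documented by the SparkRDMA project; verify
# it against the README of the plugin version you actually install.
RDMA_SHUFFLE_MANAGER = "org.apache.spark.shuffle.rdma.RdmaShuffleManager"


def spark_submit_args(app, rdma_jar=None):
    """Build a spark-submit command line, enabling RDMA shuffle per job.

    `rdma_jar` is a hypothetical path to the plugin jar; when it is None,
    the job runs with Spark's standard TCP-based shuffle.
    """
    args = ["spark-submit"]
    if rdma_jar is not None:
        confs = {
            "spark.driver.extraClassPath": rdma_jar,
            "spark.executor.extraClassPath": rdma_jar,
            "spark.shuffle.manager": RDMA_SHUFFLE_MANAGER,
        }
        for key, value in confs.items():
            args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Hypothetical jobs: only the shuffle-intensive one opts in to RDMA.
tcp_job = spark_submit_args("etl_job.py")
rdma_job = spark_submit_args(
    "shuffle_heavy_job.py", rdma_jar="/opt/spark-rdma/spark-rdma.jar"
)

print(" ".join(shlex.quote(arg) for arg in tcp_job))
print(" ".join(shlex.quote(arg) for arg in rdma_job))
```

Because the settings are passed per submission, existing jobs keep the default shuffle until they are explicitly switched, which matches the incremental deployment model described above.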
Finally, the performance benefits of running Spark over RDMA are tremendous. Here are a few data points showing SparkRDMA in action:
Apache Spark is today’s fastest growing Big Data analysis platform. The Mellanox team is excited to partner with large-scale enterprises, Cloud and AI solution providers to unlock scalable, faster and highly efficient big-data analytics and machine learning for a wide range of commercial and research use-cases.
Learn more about RDMA and RoCE.
To learn more about Mellanox’s fully-featured end-to-end InfiniBand and Ethernet product lines visit our website.