All posts by Ramnath Sagar

About Ramnath Sagar

Ramnath Sai Sagar is a Marketing Manager at Mellanox Technologies, heading market development for Big Data, Enterprise AI and Web2.0. He has an extensive background in both R&D and marketing. Prior to joining Mellanox, he worked as a Performance & Solutions Architect at Emulex Corporation and on some of the premier research projects at European labs, including the Brain Mind Institute (BMI) at EPFL, Switzerland, and the Barcelona Supercomputing Center (BSC), Spain. He has published in a number of leading conferences and journals in scientific computing and holds a Bachelor of Science in Computer Engineering from Anna University, India.

What Tesla’s Autopilot Teaches Us about DevOps for High Performance AI-Powered Applications

This post was co-written by Mellanox’s Ramnath Sagar and Nisha Talagala of Parallel Machines.

Tesla’s semi-autonomous Autopilot system has drawn a lot of attention in the automotive industry. Tesla’s ability to push a smarter Autopilot with each Over-The-Air (OTA) update helps it maintain a competitive edge in the autonomous vehicle era. But using AI, especially High Performance Deep Learning (DL), to gain a competitive edge is not just relevant for Tesla; it matters for any enterprise looking to build intelligent software.

In today’s world, where rapid innovation is no longer optional but mandatory, DevOps becomes critical, bringing software developers and operations staff together to work closely. However, in a DL-powered software world, deploying a DevOps-style application to production remains a huge challenge. This is due to configuration complexity, the need for efficient hardware to scale training and inference performance, and the difficulty of continuously managing and supporting deep learning in production.

Mellanox and ParallelM have teamed up to solve this challenge using MLOps (DevOps for Machine Learning) and have defined a reference architecture for a production-scale, high-performance deep learning solution. We demonstrate how our technologies (Mellanox for high-performance deep learning and ParallelM for production DL management), coupled with state-of-the-art technologies from the open source community, can enable AI-first enterprises to maintain their competitive edge.

For our reference design, we chose TensorFlow, one of the most popular ML/DL frameworks, but the solution can easily be extended to other frameworks such as SparkML, Caffe, Torch and others.

This reference design accomplishes two key objectives:

  • Fastest Time-to-Train with support for leading tools and frameworks out-of-the-box
  • Fastest Time-to-Inference, with the ability to rapidly train and retrain models, move them to production, and manage and optimize them there while ensuring prediction quality in a dynamic environment

For more details, refer to our reference design:


Nisha Talagala is CTO and vice president of engineering at Parallel Machines, where she focuses on production machine learning and deep learning solutions from the edge to the cloud. Nisha has more than 15 years of expertise in software development, distributed systems, I/O solutions, persistent memory, and flash. Previously, Nisha was a fellow at SanDisk; a fellow and lead architect at Fusion-io, where she drove innovation in nonvolatile memory, including the industry’s first persistent memory solution; technology lead for server flash at Intel, where she led server platform nonvolatile memory technology development, storage-memory convergence, and technical partner engagements; and CTO of Gear6, where she designed and built clustered computing caches for high-performance I/O environments. Nisha holds 48 patents in distributed systems, networking, storage, performance, and nonvolatile memory. She has authored many technical and research publications and serves on multiple academic and industry conference program committees. Nisha holds a PhD from UC Berkeley, where her research focused on software clustering and distributed storage.


Artificial Intelligence: A Journey to Deep Space

Since the dawn of the space age, unmanned spacecraft have flown blind, with little to no ability to make autonomous decisions based on their environment. That, however, changed in the early 2000s, when NASA started leveraging Artificial Intelligence (AI), laying the foundation that would help astronauts and astronomers work more efficiently in space. In fact, just last month, NASA’s Jet Propulsion Laboratory described how AI will govern the behavior of space probes.

Recent advancements in Artificial Intelligence, especially Deep Learning (a subfield of AI), are set to make a deeper impact in astronomy and astrophysics: from navigating the unknown terrain of Mars, to analyzing petabytes of data generated by the Square Kilometer Array, to finding Earth-like planets in our messy galaxy. Here on Earth, AI is already revolutionizing our lives by building smarter and more autonomous cars, helping us find solutions to climate change, transforming healthcare and much more. Mellanox is proud to be working closely with leading companies and research organizations to drive advancements in Artificial Intelligence and astronomy.

AI: The Next Industrial Revolution

The term “artificial intelligence” was coined in 1956 by Dartmouth Assistant Professor John McCarthy. AI thus existed before the “Race to Space,” but could only deliver rudimentary displays of intelligence in specific contexts. Progress was limited by the complexity of the algorithms needed to tackle real-world problems, many of which were beyond the ability of a mere human to execute. This, however, changed in the past decade, mainly for two reasons:

  1. Storing Unstructured Data More Efficiently: Around 90 percent of the data generated today is unstructured, including free-form documents, images, and audio and video recordings. Traditionally, computers could not efficiently store and process such data. However, advancements in Hadoop and NoSQL databases, in concert with the underlying storage technologies (Software-Defined Storage, Object Storage, etc.), have enabled storing and processing petabytes of unstructured data far more cost-effectively.
  2. Processing Data Faster: It takes a massive amount of computing resources to train a sophisticated AI model, and training can take weeks to months. Advancements in the underlying hardware, including faster compute (GPUs, FPGAs, etc.), faster storage (SSDs/NVMe, NVMe-over-Fabrics, etc.) and faster networks at speeds of up to 100Gb/s, have helped reduce training time to just a week. Further, Remote Direct Memory Access (RDMA), an industry networking standard that Mellanox has pioneered, helps cut training time from days to mere hours. (Popular AI frameworks such as TensorFlow, Caffe, Torch and Microsoft CNTK all support RDMA.)

As a result, AI now presents one of the most exciting and potentially transformative opportunities for mankind. In fact, in some quarters it is being heralded as the next industrial revolution:

“The last 10 years have been about building a world that is mobile-first. In the next 10 years, we will shift to a world that is AI-first.” — Sundar Pichai, CEO of Google, October 2016

AI for the Messy Galaxy

While humanity has made great strides in exploring the observable universe, we need to rely on intelligent robots to explore where humans cannot go. This is because our galaxy, the Milky Way, is one messy place, filled with cosmic dust from stars, comets and more, concealing the very things scientists want to study. That said, there are three major challenges in leveraging AI for future space exploration. First, probes will have to learn about and adapt to unknown environments, including thick layers of gas in a planet’s atmosphere, extreme temperatures or unplanned fluctuations in gravity. Second, when a probe falls outside communication range, it will have to figure out when and how to return the data collected while the signal was lost. Finally, given the vast distances in space, it could take several generations for a probe to reach its destination, so it will need to be flexible enough to adapt to any new discoveries and innovations we make here on Earth. Solving these problems will require training AI models on petabytes of data using supercomputers.

The benefits of using AI to control space-exploring robots are already being realized by missions currently underway. For example, Opportunity, the Mars Exploration Rover launched back in 2003, has an AI driving system called Autonav that allows it to explore the surface of Mars. In addition, NASA’s Mars rover Curiosity has used Autonomous Exploration for Gathering Increased Science (AEGIS) since May to select which features on Mars are particularly interesting and then photograph them.

Figure 1: Image Captured by AEGIS Enabled Curiosity’s ChemCam.

But Mars is by no means the final destination, and exploring more challenging destinations will require even more advanced AI. For example, exploring the subsurface ocean of the Jovian moon Europa in the hope of finding alien life will require getting through a thick (~10km) ice crust. Without advanced autonomy, our ability to control such an exploration would be severely limited.

Artificial Intelligence Needs Intelligent Network

Since Mellanox’s early days, we have been working closely with NASA and many research labs to help solve the challenges of scientific computing, whether it’s the aerodynamic simulation of jet propulsion engines or monitoring the universe in unprecedented detail. In addition, over the last few years, Mellanox has enabled pioneers in the field of AI, including Baidu for its advancements in autonomous cars and Yahoo for image recognition. The applications of autonomous driving and object recognition go far beyond the limits of Earth, and Mellanox is proud to be working closely with several research organizations and companies, helping them achieve technological breakthroughs in astronomy and astrophysics.

Exactly 48 years ago, Neil Armstrong said “That’s one small step for man, one giant leap for mankind”, when he became the first human to set foot on the surface of the moon. The next giant leap for mankind will come from the small step of a robot, powered by AI and Mellanox.

Supporting Resources:

Why Do Facebook, Yahoo, Baidu and Others Use Mellanox for Their Machine Learning Applications? Follow Our Social Media to Find That Out

Machine Learning is a buzzword in the tech world right now, and for a very good reason: it represents a major step forward in how digital applications interact with humanity and help solve our problems. In just the past five years, the innovation in this field has been staggering. Consider the following:

Just a month ago, NASA announced the discovery of seven “earthlike” planets less than 40 light years away. How do you think they were able to make this amazing discovery, considering that there are about 1 billion trillion stars in the observable universe?

Or how about one of the hottest applications: self-driving cars, also called smart cars? Smart cars have racked up millions of miles driving in fully autonomous mode with minimal human intervention and a low accident rate. How do you think this was achieved?

What is Machine Learning?

Machine Learning is a fundamental paradigm used for autonomous computing, self-managing systems and decision-making under uncertainty. Simply put, unlike traditional software, which relies on hard-coded logic, machine learning enables software to effectively write software simply by learning from a treasure trove of big data. Interestingly, a report from IDC last year predicted that about 50 percent of all business analytics software will incorporate some form of Machine Learning.

With abundant computing power and data, machine learning has helped transform several business use cases and application verticals. Some of the most popular ones are smart cars, e-commerce, image recognition, Natural Language Processing (NLP), healthcare, financial trading and fraud detection.

Over the next five days, check out our social media channels and this blog to learn more about how the Mellanox Intelligent Network, both Ethernet and InfiniBand, is enabling our customers to solve their problems and maintain an edge over their competitors.

For the past few years, Mellanox has been working closely with several leading technology innovators, including Facebook, Microsoft, Baidu, Tencent and many more, to help accelerate their machine learning platforms. Core technologies, including RDMA, GPUDirect and predictable switch fabrics, have empowered these innovators to rapidly train on the huge volumes of data they accumulate each day and to deploy updated models for real-world problems sooner.






Follow Mellanox on: Twitter, Facebook, LinkedIn and YouTube

Mellanox Networking is Everywhere at OCP Summit, Showcases Cloud Infrastructure Efficiency with Microsoft SONiC and Many More!

Every IT organization is on a continuous search for ways to become more efficient and lower costs. For more than a decade, large “hyperscale” and Web2.0 companies (think Google, Microsoft, Facebook and Amazon) have set the gold standard for deploying the most efficient datacenters. Until very recently, the key to their success was kept tightly under wraps. However, this all changed in 2011, when Facebook redefined the way we think about hyperscale and IT infrastructure by introducing an initiative called the Open Compute Project (OCP). OCP provides specifications and blueprints for building energy-efficient and scalable datacenters. Since OCP’s inception, Mellanox has been an active contributor, addressing the demands of next-gen data centers by delivering an end-to-end network fabric with unmatched features for open architecture and proven, reliable performance at 10/25 and 40/50/100Gbps.

This year at OCP Summit, Mellanox is everywhere: from powering the first PCIe Gen 4.0 OCP server and delivering an OCP adapter capable of saturating dual 100GbE links, to connecting all major OCP server architectures, to unlocking the performance of leading public and managed clouds, to enabling open networking platforms with SONiC, Cumulus, Metaswitch and many other network operating systems (NOS).

Mellanox Connects Industry’s First PCIe Gen 4.0 OCP Server

Mellanox ConnectX®-5 is the industry’s first PCIe Gen 4.0-based 100Gb/s OCP Ethernet adapter. The exponential growth of data, coupled with applications rapidly moving to the cloud, has created a pervasive need for faster networks. With PCIe Gen 4.0, ConnectX-5 delivers a full 200Gb/s of data throughput to servers and storage platforms. In addition, ConnectX-5 supports Multi-Host technology, delivering flexibility and major cost savings for the next generation of Cloud, Web2.0, Big Data and cognitive computing platforms. Multi-Host technology disaggregates the network and enables building new scale-out heterogeneous compute and storage racks with direct connectivity from multiple processors to a shared network controller.
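To see why the Gen 4.0 interface matters for the 200Gb/s figure above, a rough back-of-the-envelope calculation helps. The sketch below assumes an x16 slot (an assumption, not a specification quoted in this post) and uses the PCIe per-lane signalling rates of 8 GT/s for Gen 3 and 16 GT/s for Gen 4, both with 128b/130b encoding. It ignores packet and protocol overhead, so real usable throughput is somewhat lower. The function name pcie_gbps is hypothetical, for illustration only.

```python
# Rough per-direction PCIe bandwidth, assuming an x16 slot (an assumption, not a
# ConnectX-5 specification quoted in this post). Packet/protocol overhead is ignored.
GT_PER_LANE = {3: 8.0, 4: 16.0}   # giga-transfers per second per lane
ENCODING = 128 / 130              # 128b/130b line encoding used by Gen 3 and Gen 4


def pcie_gbps(gen: int, lanes: int = 16) -> float:
    """Approximate usable Gb/s per direction for a PCIe link of the given generation."""
    return GT_PER_LANE[gen] * lanes * ENCODING


DUAL_100GBE = 2 * 100  # Gb/s needed to saturate two 100GbE ports

for gen in (3, 4):
    bw = pcie_gbps(gen)
    verdict = "enough" if bw >= DUAL_100GBE else "not enough"
    print(f"PCIe Gen {gen} x16: ~{bw:.0f} Gb/s per direction, {verdict} for 2 x 100GbE")
```

Under these assumptions a Gen 3 x16 slot tops out near 126Gb/s per direction, below the 200Gb/s of two 100GbE ports, while Gen 4 roughly doubles that headroom to about 252Gb/s.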

Mellanox Connects Clouds and Major Server Architectures

Mellanox network solutions power the industry’s broadest range of OCP server and storage platforms. Last year at OCP Summit, we enabled OCP platforms such as the Facebook Yosemite, Facebook Leopard and Rackspace Barreleye servers. This year, we continue to innovate and deliver networking for next-generation server platforms, namely the Microsoft Olympus server (x86-based), Rackspace Barreleye G2 with the Zaius motherboard (OpenPOWER Power9-based) and Qualcomm Centriq™ (ARM-based).

Both Microsoft’s Project Olympus and the OpenPOWER-based Zaius, the open server platform from Google and Rackspace, are designed to usher in a new era of open source hardware development for public cloud infrastructure. Project Olympus leverages intelligent networking with Mellanox ConnectX-4 Lx and a tightly coupled programmable FPGA to accelerate key functions, including crypto, Quality of Service (QoS) and storage protocols. OpenPOWER/OCP Barreleye G2 with the Zaius motherboard is the world’s first platform that combines IBM’s Power9 processor, PCIe Gen4 and Mellanox ConnectX-5 network adapters.

Mellanox Connects Open Networking Platforms

Network disaggregation was first envisioned and adopted by hyperscale web and cloud service providers, but its influence has rapidly expanded to enterprises in many industries, allowing them to take control of and easily customize their own network infrastructure. Through close collaboration with the community on open networking initiatives such as Cumulus, the Switch Abstraction Interface (SAI) and Microsoft SONiC, Mellanox has helped make the vision of Open Ethernet a reality.

In addition, Mellanox has worked with Microsoft to demonstrate the key role that high-performance networking plays in improving total cloud infrastructure efficiency. In a demo led by Microsoft, with participation from Mellanox, faster and more efficient data movement using RoCE is shown to significantly enhance storage access and applications such as virtual machine Live Migration. The high throughput, low and consistent latency, and innovative congestion management of the Mellanox Spectrum Ethernet switch, together with the intelligent network on Olympus servers, make this combination the best choice for highly efficient RoCE-based deployments.

Visit Mellanox Technologies at the OCP Summit

Visit Mellanox at OCP Summit 2017 at the Santa Clara Convention Center, March 8-9, booth C-23, to learn more about the OCP family of ConnectX adapters.

Mellanox Beijing hosts first Machine Learning & Big Data Workshop, Fosters Collaboration

Buckle your seatbelt. We are in the middle of a historic moment. A decade ago, seemingly in a galaxy far, far away, there used to be a universe where we needed to program a computer so that it knew how to do things. We are, however, now in a rogue reality where machines can learn from experience. In fact, just last week, I was reading an article in MIT Technology Review about a Machine Learning algorithm that simply listens to Bach, then writes its own music in the same style. R2D2 would be proud.

In just the past three years, there have been many advancements like this in Machine Learning. One of the main reasons is that companies like Facebook, Baidu, Microsoft and others are open sourcing their machine learning software to foster collaboration and to unlock vast possibilities for humans to innovate more effectively with machines. [Off topic: Did you know Mellanox was the 6th most active contributor (by lines changed) to the Linux 4.8 kernel?]

To that end, last week Mellanox hosted a workshop on Machine Learning to nurture a vibrant ML and big data community and to foster collaboration that speeds innovation. Attendees came from more than 15 companies, including Tencent, Alibaba, Baidu, JD, Didi, Meituan, Sensetime, Face++, Horizon, Hisense, Xiaomi, Meizu, PerfXlab, Novumind and Momenta.

The full-day workshop had an exciting agenda with speakers from Baidu, JD, ICT, NVidia, PerfXLab and Mellanox, all aimed at helping attendees learn how to tap into the power of Big Data and Machine Learning, with topics covering RDMA, GPUDirect technology, rCUDA, Machine Learning use cases and optimization.


The workshop kicked off with a presentation from Eyal Waldman, CEO of Mellanox, who highlighted the new possibilities of Machine Learning with an Intelligent Network.


This was followed by a presentation from Yunquan Zhang of ICT, leader of the China HPC and Big Data Community. He mapped out the landscape of HPC and Big Data/Machine Learning trends in China. One interesting trend was that 60 percent of the Top100 machines in China belong to Web2.0 companies running HPC applications, a segment growing faster than traditional scientific computing labs and government organizations.


Chuan Lu from NVidia presented how NVidia’s GPU technology accelerates deep learning and how GPUDirect RDMA, developed in partnership with Mellanox, helps achieve it.

Later, Jie Zhou from Baidu explained how RoCE (RDMA over Converged Ethernet) accelerates Baidu’s deep learning framework, Paddle. The key takeaway from his presentation was that migrating from standard TCP/IP to RDMA helped Baidu improve the ratio of computation time to communication time from 1:3 to 1:1. This was by far the greatest contributor to Paddle’s speedup.
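To make the 1:3 to 1:1 figure concrete, here is a simple worked calculation. It assumes, as a simplification not stated in the talk, that per-iteration compute time stays fixed and that communication does not overlap with computation. Under those assumptions the ratio change alone implies a training iteration about twice as fast. The function iteration_time is hypothetical, for illustration only.

```python
def iteration_time(compute: float, comm_per_compute: float) -> float:
    """Per-iteration time when communication is NOT overlapped with compute.

    comm_per_compute is communication time expressed as a multiple of compute time,
    e.g. 3.0 for a compute:communication ratio of 1:3.
    """
    return compute * (1.0 + comm_per_compute)


COMPUTE = 1.0  # normalized compute time per iteration (assumed unchanged by RDMA)

tcp_time = iteration_time(COMPUTE, 3.0)   # 1:3 with TCP/IP
rdma_time = iteration_time(COMPUTE, 1.0)  # 1:1 with RDMA

print(f"TCP/IP: {tcp_time:.1f}, RDMA: {rdma_time:.1f}, speedup: {tcp_time / rdma_time:.1f}x")
# prints: TCP/IP: 4.0, RDMA: 2.0, speedup: 2.0x
```

If a framework overlaps communication with computation, the real gain would differ, so treat the 2x as an illustration of the ratio rather than Baidu’s measured result.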








Yu Chen from JD explained how Machine Learning and Mellanox InfiniBand help JD drive its new business model for intelligent customer service. Its entire ML system was based on a cluster of 20 GPU servers interconnected with an InfiniBand fabric.


Xianyi Zhang from PerfXLab presented how PerfXLab’s software optimizes deep learning performance.


Finally, Qingchun Song from Mellanox rounded out the day with a talk on how an intelligent network helps accelerate Big Data and Machine Learning applications, including RDMA acceleration for Spark and TensorFlow.

Following the success of our first workshop, we look forward to hosting many more to foster a collaborative effort to advance Machine Learning and Artificial Intelligence. Today, we live in a semantic economy where everything is interconnected and businesses maintain their edge by creating new informational value from machine learning. Power no longer resides at the top of the heap, but rather at the center of intelligent networks. May the force of Machine Learning and AI be with you!

Tencent Galvanizes OpenPower with Mellanox 100GbE Network, Breaking World Records!

I was fascinated last week while reading an article on Bloomberg that showed how Big Data has transformed Guizhou, one of the poorest provinces in China, into the country’s third-fastest-growing province, now boasting a growth rate of 10.5 percent. Historically, China has been the economic powerhouse of manufacturing and labor, but the next economic wave depends on China’s ability to capitalize on Big Data and advanced analytics. In fact, just last month, Gartner released the top 10 technology trends driving the Chinese economy, with Big Data listed as the crucial backbone for 6 of the 10 trends. With these technological investments, business intelligence shows promise not only as a way to improve operations, but also as a means to derive value from the rapidly growing amount of consumer data.

BAT – Romance of the Three Kingdoms

“Romance of the Three Kingdoms,” a classic novel of Chinese literature, describes the wars among three powerful kingdoms fighting to replace the dwindling Han Dynasty around 1,800 years ago. Similarly, with more than 650 million tech-savvy users, China’s internet world today is dominated by three kingdoms: Baidu, Alibaba and Tencent (collectively referred to as BAT). Baidu, sometimes called the Google of China, holds a commanding 71 percent market share in search. Alibaba holds a similarly powerful share in e-commerce, with a record $14.3 billion in total sales volume last year on Singles’ Day. Tencent is the dominant player in social media, with over 600M active users. In fact, just in June this year, Tencent zoomed past Alibaba to become China’s most valuable tech company.

These three kingdoms have made huge progress toward becoming technological powerhouses in industries such as online services, smartphone technology and telecommunications. Their widely successful internet services give them a treasure trove of data to analyze and plenty of customers to experiment on. In early 2015, Baidu made a huge impact in Artificial Intelligence by announcing the Minwa Supercomputer, powered by a Mellanox network and NVidia GPUs. Alibaba, for its part, held the records for Daytona GraySort and MinuteSort with 15.9TB/min and 7.7TB respectively, and for Indy GraySort and MinuteSort with 18.2TB/min and 11TB respectively. Yesterday, Tencent, along with Mellanox and IBM, announced that it has been named the 2016 winner of Sort Benchmark’s annual global computing competition. Tencent broke records in the GraySort and MinuteSort categories, improving on Alibaba’s overall results from last year by up to five times and achieving more than one terabyte per second of sort performance. In addition, per-node results improved by up to 33 times.

Terasorting on Tencent Cloud’s OpenPower-based Cluster

Each year, leading global companies and academic institutions participate in the Sort contest to evaluate the capability of their software and hardware system architectures, as well as their research results. The TeraSort benchmark is touted as the gold standard of sort benchmarks. The TencentCloud Intelligent Distributed Computing Platform participated in two of the four competition categories, GraySort and MinuteSort, for both Daytona (general-purpose sort) and Indy (special-purpose sort).

Table 1: 2016 Sort Benchmark Contest Results

Using 512 OpenPower-based servers with NVMe-based storage and Mellanox ConnectX®-4 100Gbps Ethernet adapters, TencentCloud took less than 99 seconds to sort a massive 100 terabytes of data, using 85 percent fewer servers than the 3,377 servers used by last year’s winner. To achieve this, Tencent developed its own sort application and tuned it specifically for the benchmark. Combining the sort application with NVMe storage and high-performance CPUs pushes the analytics boundary, so the latency and bandwidth of the network play a crucial part in achieving maximum performance. With advanced hardware-based stateless offloads and a flow steering engine, Mellanox’s ConnectX-4 adapter reduces the CPU overhead of packet processing and provides the lowest latency and highest bandwidth.
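The headline numbers can be sanity-checked with simple arithmetic. The sketch below uses only the figures quoted above (100TB, 99 seconds, 512 servers versus 3,377) and decimal units. Because 99 seconds is an upper bound on elapsed time, the computed throughput is a lower bound.

```python
DATA_TB = 100          # terabytes sorted (decimal units)
MAX_SECONDS = 99       # reported upper bound on elapsed time
TENCENT_NODES = 512
PREVIOUS_NODES = 3377  # servers used by the previous year's winner

aggregate_tb_per_s = DATA_TB / MAX_SECONDS                  # lower bound on sort throughput
per_node_gb_per_s = aggregate_tb_per_s * 1000 / TENCENT_NODES
fewer_servers = 1 - TENCENT_NODES / PREVIOUS_NODES          # fractional reduction in servers

print(f"aggregate >= {aggregate_tb_per_s:.2f} TB/s")   # 1.01 TB/s
print(f"per node  >= {per_node_gb_per_s:.2f} GB/s")    # 1.97 GB/s
print(f"servers: {fewer_servers:.1%} fewer")           # 84.8% fewer
```

The results line up with the “more than one Terabyte/second” and “85 percent fewer servers” figures in the post.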

“Mellanox ConnectX-4 100GbE NIC optimizations include enabling Large Send Offload (LSO), Large Receive Offload (LRO), and 64KB socket buffers to leverage LSO and LRO, using large packets (MTU 9000), and managing interrupt NUMA affinity. When the shuffle stage is run in isolation, per-node sustained throughput is close to 10GB/s.”
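Of the tuning steps quoted above, only the 64KB socket buffers are set from application code; LSO, LRO, the 9000-byte MTU and interrupt NUMA affinity are configured at the NIC and operating-system level and are not shown here. The following Python sketch, an illustration rather than Tencent’s actual sort code, requests 64KB send and receive buffers on a TCP socket using the standard SO_SNDBUF and SO_RCVBUF socket options. The helper make_tuned_socket is hypothetical.

```python
import socket

BUF_BYTES = 64 * 1024  # 64KB socket buffers, per the tuning notes quoted above


def make_tuned_socket() -> socket.socket:
    """Create a TCP socket and request 64KB send/receive buffers."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_BYTES)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_BYTES)
    return s


with make_tuned_socket() as sock:
    # The kernel may report a different value than requested; Linux, for example,
    # doubles it to leave room for bookkeeping overhead.
    print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

Requesting buffers from the application keeps the setting local to the sockets that need it, rather than changing system-wide defaults.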



Figure 1: OpenPower Rack with 100GbE-based ConnectX-4 adapters, Spectrum™ switches and LinkX™ optical cables.

The cluster consists of 32 racks, each equipped with 16 servers, interconnected with Mellanox Spectrum-based 32-port 100GbE SN2700 switches (32 leaf switches and 16 spine switches) and Mellanox 100GbE LinkX optical cables. Sixteen ports on each leaf switch connect to the servers in its rack, and the other 16 ports connect to each of the spine switches. The SN2700 switch provides the highest-performance fabric solution in a 1U form factor, delivering non-blocking throughput for big data workloads with predictable low latency and Zero Packet Loss. Because network traffic in big data workloads is bursty, non-blocking switches play a crucial role in delivering predictable real-time analytics. In addition, Mellanox DAC and optical cables offer reliable, high-quality connections, featuring error rates up to 100 times lower than the industry standard. To learn more, check out the Tencent whitepaper.
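The leaf-spine numbers above fit together in a simple way. The sketch below assumes one leaf switch per rack (32 leaves for 32 racks) and that every link runs at 100Gb/s. With those assumptions it checks the server count, the one-uplink-per-spine wiring and the 1:1 oversubscription ratio that makes the fabric non-blocking.

```python
RACKS = 32
SERVERS_PER_RACK = 16
SWITCH_PORTS = 32   # SN2700: 32 x 100GbE ports, used for both leaf and spine
SPINES = 16
PORT_GBPS = 100

servers = RACKS * SERVERS_PER_RACK       # one leaf switch per rack (assumption)
leaf_down = SERVERS_PER_RACK             # leaf ports facing servers
leaf_up = SWITCH_PORTS - leaf_down       # leaf ports facing spines
oversubscription = (leaf_down * PORT_GBPS) / (leaf_up * PORT_GBPS)
spine_ports_used = RACKS                 # each spine gets one link from every leaf

print(servers)                           # 512 servers, matching the sort run
print(leaf_up == SPINES)                 # True: one uplink to each spine
print(oversubscription)                  # 1.0: downlink equals uplink bandwidth
print(spine_ports_used <= SWITCH_PORTS)  # True: 32 leaves fit on a 32-port spine
```

A 1:1 ratio means that, in principle, every server can send at full line rate toward the spine layer at the same time without the leaf uplinks becoming a bottleneck.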

The key takeaway from this accomplishment is that 10GbE-based networks can no longer sustain the demand for real-time and advanced analytics as the industry rapidly migrates to faster CPUs, faster flash-based SSDs and NVMe storage. While fewer enterprise customers will jump to 40GbE networks, many will migrate to more efficient and cost-effective 25/50/100GbE networks. In fact, moving to 25GbE today makes perfect sense, allowing businesses to future-proof their data center fabrics. On the other hand, hyperscale companies such as Baidu, Alibaba and Tencent, the vanguards of technological innovation, will drive demand for 100GbE-based networks as a way to solve their challenging analytics problems.

Mellanox Supercharges Spark, Delivers Industry’s First 100TB Spark SQL Benchmark

Growing up in the 90s, I spent considerable time watching exciting Formula 1 racing. For other kids, it was cartoons, but for me, the excitement of racing became my lifelong passion. It was not just the adrenaline rush from seeing stars such as Michael Schumacher and Rubens Barrichello race to the checkered flag, but the hours clocked in by the engineers with the passion to go a few seconds faster and their absolute drive for perfection. Wanna see something cool? Check out the moves on this pit stop – if this video doesn’t blow your spark plugs, then you probably need a dealer service visit!

It was frequently mere seconds that turned the tide in a race, and for as long as Formula 1 has been around, teams have pursued advancements in aerodynamics, materials science and data analysis to win. Just like in racing, the inherent desire to best the competition is also what keeps our technology industry thriving, bringing a revolutionary product to our door every few months.

Speed and efficiency are important not just in motorsports but also in many applications that can change our way of life, such as security, finance and healthcare. It is the responsibility of hardware and software vendors to cater to these ever-growing needs, where finishing a second faster could mean saving millions of dollars, or even lives. To that end, last week at World of Watson and Spark Summit, IBM, Mellanox, Lenovo and Intel together showcased a solution that addresses the need for a faster analytics system with the highest efficiency in today’s data-driven world.

The result was the industry’s first 100 Terabyte Spark SQL solution, which showed up to five times faster query response with three times the efficiency compared with the current-gen solution. It is powered by IBM Open Platform with Apache Hadoop, with Spark SQL optimizations from the IBM Spark Technology Center, and built from compute, network and storage building blocks that epitomize speed and efficiency. With this Hadoop/Spark-based solution, enterprises can now accelerate their existing SQL applications, gaining faster insight from their business data.


Figure 1: Industry’s First 100TB Spark SQL powered by a Mellanox 100GbE network.

The solution consists of 30 Lenovo servers: 28 data nodes (x3650 M5) and two management nodes (x3640 M5). Each data node is powered by dual-socket Intel E5-2697 v4 processors (36 cores total) and loaded with 1.5TB of memory and 16TB of Intel NVMe SSDs. With an increasing need to use faster memory and storage for IOPS-intensive big data workloads such as Spark SQL, a faster network is of paramount importance. It takes about 10 HDDs per data node to exceed what a 10Gbps link can carry, but only a single NVMe SSD to drive a 25GbE network, and just four of them can fill a 100GbE pipe.
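The drive-count claims above follow from dividing link bandwidth by per-device throughput. The per-device figures in this sketch (about 125MB/s sequential for an HDD and about 3.2GB/s for an NVMe SSD) are illustrative assumptions, not measurements from this solution, and the helper devices_to_fill is hypothetical.

```python
import math

# Assumed sequential throughput per device in MB/s (illustrative, not measured here).
HDD_MBPS = 125
NVME_MBPS = 3200


def devices_to_fill(link_gbps: float, device_mbps: float) -> int:
    """Smallest number of devices whose combined throughput reaches the link rate."""
    link_mbps = link_gbps * 1000 / 8  # Gb/s -> MB/s (decimal units)
    return math.ceil(link_mbps / device_mbps)


print(devices_to_fill(10, HDD_MBPS))    # 10 HDDs for a 10Gb/s link
print(devices_to_fill(25, NVME_MBPS))   # 1 NVMe SSD for a 25GbE link
print(devices_to_fill(100, NVME_MBPS))  # 4 NVMe SSDs for a 100GbE link
```

With different per-device assumptions the counts shift, but the order-of-magnitude gap between HDDs and NVMe SSDs is what makes the faster network necessary.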

Network latency also becomes crucial for such transaction-centric workloads to consistently deliver low-latency queries. The solution uses the industry’s lowest-latency and highest-throughput adapter, the Mellanox ConnectX®-4 100GbE NIC, connected through a 100GbE non-blocking, zero-packet-loss Spectrum SN2700 switch with LinkX® DAC (Direct Attached Copper) cables.

Mellanox ConnectX-4 Ethernet adapters provide the most flexible interconnect solution for Big Data, Cloud and HPC applications at speeds of 10/25 and 40/50/100Gbps. Big Data applications utilizing TCP or UDP over IP transport can achieve the highest efficiency and application density with the hardware-based stateless offloads and flow steering engines. These advanced offloads reduce CPU overhead for packet processing and lower query latency.

Figure 2: Mellanox ConnectX-4 100GbE adapter

Mellanox Spectrum-based SN2700 (32x 100GbE ports) switches provide the highest-performance fabric solution in a 1U form factor while delivering non-blocking throughput for big data workloads. They also feature predictable low latency, zero packet loss (ZPL) and microburst absorption that is 9x to 15x better than competing switches (to learn more, read the blog “Can You Afford an Unpredictable Network?”). Because network traffic in big data workloads is bursty, non-blocking switches are crucial in delivering predictable SQL query completion times. In addition, LinkX DAC cables offer reliable connections at speeds from 10 to 100Gb/s with the highest quality, featuring error rates up to 100x lower than industry standards.


Figure 3: Mellanox Spectrum SN2700 switch (left); Figure 4: Mellanox 100GbE DAC cable (right)

To demonstrate the performance, this cluster ran a Hadoop-DS benchmark (a derivative of TPC-DS) on IBM Spark, which supports many of the SQL:2003 features the benchmark requires. Our results showed up to five times faster queries compared with the current-generation solution, along with three times better energy efficiency in just one-fifth of the floor space.

To learn more about the solution, please refer to this brief, and check out the demo to see the blazingly fast results for yourself!


Why Does the Next-Gen Video Studio Need Ethernet Infrastructure? Catching Pokémon in Containers with Mellanox Switches

What do Pokémon Go and containers have in common? Let’s find out, but first, a little introduction.

Pokémon Go is a gaming app that uses geolocation and augmented reality (AR). For the very few who don’t know the game, the objective is to find Pokémon and capture them in a PokéBall. When Pokémon Go launched in July, it became the top-grossing app on the U.S. App Store in just 24 hours. In fact, the app now has three times more Android downloads than Tinder and nearly as many daily active users as Twitter, and users spend more time in it than they do in Snapchat.

Both containers and Pokémon have been around for a while. Containerization is, in effect, OS-level virtualization (as opposed to VMs, which run on hypervisors, each with its own OS). Containers are easily packaged, lightweight and designed to run anywhere. Most intriguing is the striking similarity between the PokéBall and containers: just as a lightweight PokéBall is designed to virtually capture different Pokémon characters, a lightweight container is designed to provide virtualization to microservices.

Another interesting aspect of the game is the use of AR to capture these Pokémon. But it’s not just video games like Pokémon Go that take us away to alternate and extended realities. Today, providers of news, sports, music, film, social media and virtually every other form of visual entertainment are dabbling with the technology. Just as enabling millions of users to access rich multimedia content does, creating advanced virtual reality media content requires the most advanced networking technologies. For example, the BBC’s coverage of the 2016 Olympics used augmented reality graphics that enhanced the story and broke down the data for viewers.

The move to support these next-gen AR/VR applications and other use cases at 4K and 8K resolution is exposing cracks in current technologies. The legacy Serial Digital Interface (SDI) technology cannot keep up with the requirement to send uncompressed video signals across the broadcast center. Furthermore, SDI is simply not flexible enough going forward. For example:

SDI can waste over 35 percent of the data bits communicated, depending upon the video resolution and frame rate.

These data bits are largely redundant! Moreover, as a dedicated, hardware-defined interface, SDI does not offer the flexibility and extensibility of software-based solutions built on industry-standard networking. The desire for something better is causing a major technology upheaval in the industry: the migration from SDI to IP-based studio infrastructure. The Joint Task Force on Networked Media (JT-NM) announced a Reference Architecture at IBC 2015 that describes a conceptual model for interoperability, allowing end users and manufacturers to benefit from the flexibility, scalability and cost savings of this IP-based approach. [To learn more, read the blog post on the Video Studio of the Future.]
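To see where a figure like “over 35 percent” can come from, the sketch below compares the bit rate of the active picture with the nominal SDI line rate for a few formats. Our assumptions: 10-bit 4:2:2 sampling (on average 20 bits per pixel), nominal SMPTE line rates of 1.485 Gb/s (HD-SDI) and 2.97 Gb/s (3G-SDI), and no credit for audio or ancillary data carried in the blanking intervals, so “unused” is an upper bound on truly wasted capacity.

```python
# Fraction of an SDI link's line rate NOT used by active picture samples.
# Assumptions (ours): 10-bit 4:2:2 sampling, i.e. each pixel carries one
# 10-bit luma sample plus one 10-bit chroma sample (Cb and Cr alternate),
# giving 20 bits per pixel; nominal SMPTE line rates; blanking ignored.
HD_SDI_BPS = 1.485e9   # HD-SDI nominal line rate
SDI_3G_BPS = 2.97e9    # 3G-SDI nominal line rate
BITS_PER_PIXEL = 20


def active_video_bps(width: int, height: int, fps: int) -> int:
    """Bit rate of the active picture only."""
    return width * height * BITS_PER_PIXEL * fps


def unused_fraction(width: int, height: int, fps: int, link_bps: float) -> float:
    """Share of the link's line rate not occupied by active picture samples."""
    return 1 - active_video_bps(width, height, fps) / link_bps


formats = [
    ("720p50 on HD-SDI", 1280, 720, 50, HD_SDI_BPS),
    ("1080p50 on 3G-SDI", 1920, 1080, 50, SDI_3G_BPS),
    ("1080p60 on 3G-SDI", 1920, 1080, 60, SDI_3G_BPS),
]

if __name__ == "__main__":
    for name, w, h, fps, link in formats:
        print(f"{name}: {unused_fraction(w, h, fps, link):.0%} of line rate unused")
```

Under these assumptions the unused share ranges from about 16% (1080p60) to about 38% (720p50), which illustrates why the overhead depends so strongly on resolution and frame rate.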


Figure 1: Traditional SDI based Media Studio

Figure 2: Next-Gen IP based Media Studio

While the JT-NM laid out the reference architecture, the task of creating interoperable solutions was left to initiatives such as the Video Services Forum (VSF) TR03 and TR04 recommendations, the Advanced Media Workflow Association (AMWA) Networked Media Incubator projects, and the resulting Networked Media Open Specifications (NMOS). Mellanox, a member of JT-NM, VSF and AMWA, has worked with the community to develop an innovative solution in which the three major components of the reference architecture are implemented as microservices on the fabric. These three components are:

  • Registration & Discovery Services: Ensure that all devices on a broadcast IP network infrastructure, specifically senders (video cameras) and receivers (Multiviewer, Mix Out), can find each other and obtain appropriate information about each other’s capabilities and functionality
  • Connection Manager: Lets the user “take” and/or “park” networked video streams to any of the receivers (e.g., multiviewers). On demand, the Connection Manager sends the required metadata to the receiver, which in turn connects to the requested multicast video stream. Finally, the receiver updates the registry to notify the network that it is displaying the requested video
  • Web App: Provides a 360-degree view of the broadcast IP network. It visualizes the current state of the network, displaying the nodes, devices, senders and receivers along with each device’s capabilities and functionality. It also shows the current relationship between receivers and senders, so the user can see exactly which video stream is being displayed
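The interplay between the registry and the Connection Manager can be illustrated with a toy model. This is a hypothetical sketch, not the Mellanox implementation and not the NMOS IS-04 (discovery and registration) or IS-05 (connection management) APIs; all class and method names are our own, and we model “park” simply as disconnecting a receiver, since the post does not define its exact semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Sender:
    sender_id: str
    multicast_group: str  # where the sender publishes its stream
    port: int


@dataclass
class Registry:
    """Registration & discovery: senders and receivers announce themselves here."""
    senders: Dict[str, Sender] = field(default_factory=dict)
    # receiver_id -> sender_id currently displayed (None when parked)
    receivers: Dict[str, Optional[str]] = field(default_factory=dict)

    def register_sender(self, sender: Sender) -> None:
        self.senders[sender.sender_id] = sender

    def register_receiver(self, receiver_id: str) -> None:
        self.receivers[receiver_id] = None

    def update_receiver(self, receiver_id: str, sender_id: Optional[str]) -> None:
        self.receivers[receiver_id] = sender_id


class ReceiverNode:
    """Receiver side (e.g. a multiviewer input); hypothetical."""

    def __init__(self, receiver_id: str, registry: Registry):
        self.receiver_id = receiver_id
        self.registry = registry
        self.transport: Optional[str] = None
        registry.register_receiver(receiver_id)

    def connect(self, sender_id: Optional[str], transport: Optional[str]) -> None:
        # A real device would join or leave the multicast group here (e.g. via IGMP).
        self.transport = transport
        # Finally, tell the registry what this receiver is now displaying.
        self.registry.update_receiver(self.receiver_id, sender_id)


class ConnectionManager:
    def __init__(self, registry: Registry):
        self.registry = registry

    def take(self, sender_id: str, receiver: ReceiverNode) -> None:
        """Send the receiver the metadata it needs to join sender_id's stream."""
        sender = self.registry.senders[sender_id]  # discovery lookup
        receiver.connect(sender_id, f"{sender.multicast_group}:{sender.port}")

    def park(self, receiver: ReceiverNode) -> None:
        """Modeled here as disconnecting the receiver (an assumption)."""
        receiver.connect(None, None)


if __name__ == "__main__":
    registry = Registry()
    registry.register_sender(Sender("camera-1", "239.1.1.1", 5004))
    multiviewer = ReceiverNode("multiviewer-1", registry)
    manager = ConnectionManager(registry)

    manager.take("camera-1", multiviewer)
    print(multiviewer.transport, registry.receivers)
    # 239.1.1.1:5004 {'multiviewer-1': 'camera-1'}
    manager.park(multiviewer)
    print(multiviewer.transport, registry.receivers)
    # None {'multiviewer-1': None}
```

In this model, the Web App’s view of which receiver shows which stream amounts to reading `registry.receivers`, which is why the receiver’s final registry update matters.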

Figure 3: JT-NM Microservices on Mellanox Spectrum Switch

In addition to providing the industry’s highest application fairness and zero packet loss for video applications, Mellanox Spectrum™ switches fully support containers such as LXC and Docker, allowing applications and microservices to be hosted on the switches themselves. Running these microservices on a Mellanox Spectrum switch provides three major benefits:

  1. Ease of scaling and management, by replicating these microservices across several switches, especially in environments with hundreds of cameras;
  2. Efficient and simple implementation, since the reference architecture calls for these microservices to sit close to the fabric; and
  3. Elimination of unnecessary servers for processing in environments that do not require them.

At IBC 2016, taking place in Amsterdam Sept. 9-13, Mellanox will be working with the Joint Task Force on an interoperability demo showcasing this capability in the IBCTV IP Studio area (Hall 8, Stand 8.D10) along with 45 other technology partners. Mellanox, Arista and Cisco are the three network vendors in this demo, and Mellanox is the only one to support both the transport service and the registration service on the fabric.

Mellanox will also contribute to the “end-to-end IP” demo on the EBU stand (Hall 10, Stand 10.F20): a live capture system built with AMWA Incubator partners that sends live and not-so-live content from storage to downstream IP distribution and personalization systems from other EBU partners.

In addition, a number of key Mellanox partners will be showing demos in their booths, namely DDN, SuitcaseTV, TAGvs, Embrionix, ATTO and Grass Valley. This is why, when it’s time to choose the next networking technology provider for your media data center, you will choose Mellanox. But don’t just take my word for it; check out the demos and see for yourself!



OpenCloud Speeds Ahead with Mellanox at OpenStack Summit

Mellanox is in Vancouver this week, and the frequency of #OpenStack tweets has quadrupled. If you are wondering how the two are related, it’s because every six months the industry showcases the next coolest thing in cloud at the OpenStack Summit.


For those who are unaware, OpenStack is an open source cloud operating system, which initially began as a joint project between Rackspace and NASA and was quickly embraced by the entire industry, from hot startups to big enterprises. Year after year, this honey pot has attracted more bees than ever imagined.


This year marks a major milestone for the OpenStack community, partly because several organizations moved OpenStack from a ‘test bed’ to a ‘production-ready’ cloud [Read the Walmart and Fujitsu stories].


Cutting Edge Innovation in Hyperscale Architecture at Open Compute Project Summit #OCPSummit15

It is that time of the year at Mellanox when we proudly present some of the coolest things our team has worked on! This time it is the Open Compute Project (OCP) Summit, held in the heart of Silicon Valley at the San Jose Convention Center on March 11-12, 2015. It is impressive to see how hyperscale architecture has been revolutionized in just four years.


What started as a small project in the basement of Facebook’s office in Palo Alto has come alive as cutting-edge innovation in racks, servers, networking and storage. Several Mellanox innovations that will accelerate the advancement of data center components, mainly servers and networking, will take center stage at the OCP Summit. Key highlights at the OCP event are:


ConnectX-4 and Multi-Host: Back in November, Mellanox announced the industry’s first 100Gb/s interconnect adapter, pushing networking innovation for HPC, cloud, Web 2.0, storage and enterprise applications. With a throughput of 100 Gb/s, bidirectional throughput of 195 Gb/s, application latency of 610 nanoseconds and a message rate of 149.5 million messages per second, ConnectX-4 InfiniBand adapters provide the means to increase data center return on investment while reducing IT costs.
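The quoted message rate and bandwidth can be related with a little arithmetic: dividing the link rate by the message rate gives the message size at which the two limits meet. The sketch below uses only the figures quoted above and ignores protocol headers (our simplifying assumption); it works out to roughly 84 bytes, meaning that smaller messages are bounded by message rate rather than by bandwidth, with an average budget of under 7 ns per message.

```python
# Figures quoted above for ConnectX-4; protocol headers are ignored (assumption).
LINK_GBPS = 100
MSG_RATE_PER_S = 149.5e6


def crossover_message_bytes(link_gbps: float, msgs_per_s: float) -> float:
    """Payload size at which the message rate alone would fill the link.

    Smaller messages are limited by message rate; larger ones by bandwidth.
    """
    return link_gbps * 1e9 / msgs_per_s / 8


def ns_per_message(msgs_per_s: float) -> float:
    """Average time budget per message at the given rate, in nanoseconds."""
    return 1e9 / msgs_per_s


if __name__ == "__main__":
    print(f"crossover: {crossover_message_bytes(LINK_GBPS, MSG_RATE_PER_S):.1f} bytes")
    print(f"time per message: {ns_per_message(MSG_RATE_PER_S):.2f} ns")
```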


Today Mellanox took a step further by announcing Multi-Host technology, a ground-breaking server disaggregation technology. Mellanox’s Multi-Host technology enables direct connectivity of multiple heterogeneous hosts (x86, Power, ARM, GPU, etc.) to a single network controller, keeping the hosts completely independent of each other while saving on switch ports, cables, real estate and power.
