Nvidia GPUs for data science, analytics, and distributed machine learning using Python with Dask
Nvidia wants to extend the success of the GPU beyond graphics and deep learning to the full data science experience. The open source Python library Dask is key to this.
Nvidia has been more than a hardware company for a long time. As its GPUs are broadly used to run machine learning workloads, machine learning has become a key priority for Nvidia. At its GTC event this week, Nvidia made a number of related announcements, aiming to build on its machine learning success and extend into data science and analytics.
Nvidia wants to “couple software and hardware to deliver the advances in computing power needed to transform data into insights and intelligence.” Jensen Huang, Nvidia CEO, emphasized the collaborative aspect between chip architecture, systems, algorithms and applications.
Therefore, Nvidia is focused on building a GPU developer ecosystem. According to Huang, the GPU developer ecosystem is growing fast: the number of developers has grown to more than 1.2 million from 800,000 last year. And what can you build a developer ecosystem on? Open source software.
Nvidia announced the CUDA-X AI SDK for GPU-Accelerated Data Science at GTC. Nvidia touts CUDA-X AI as an end-to-end platform for the acceleration of data science, covering many parts of the data science workflow. The goal is to use GPUs to parallelize those tasks to the degree possible, thus speeding them up.
A key part of CUDA-X AI is RAPIDS. RAPIDS is a suite of open-source software libraries for executing end-to-end data science and analytics pipelines entirely on GPUs. And a key part of RAPIDS is Dask. Dask is an open source framework whose goal is to natively scale Python.
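To make the "natively scale Python" idea concrete: Dask represents a computation as a graph of small tasks and then executes independent tasks in parallel. The following is a toy sketch of that task-graph idea using only the standard library, not Dask's actual API; the graph layout (a dict mapping keys to `(function, *inputs)` tuples) loosely mirrors how Dask encodes task graphs internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy illustration of the task-graph idea behind Dask (not Dask's real API):
# a computation is a dict mapping keys to either a literal value or a
# (function, *dependency_keys) tuple. A scheduler walks the graph and
# resolves independent dependencies in parallel.

def get(graph, key, executor):
    """Recursively resolve `key` in the task graph."""
    task = graph[key]
    if not isinstance(task, tuple):
        return task  # literal value
    func, *deps = task
    # Resolve dependencies concurrently, mimicking a parallel scheduler.
    args = list(executor.map(lambda d: get(graph, d, executor), deps))
    return func(*args)

# Hypothetical graph for: total = sum of squares of [1, 2, 3]
graph = {
    "x1": 1, "x2": 2, "x3": 3,
    "sq1": (lambda v: v * v, "x1"),
    "sq2": (lambda v: v * v, "x2"),
    "sq3": (lambda v: v * v, "x3"),
    "total": (lambda a, b, c: a + b + c, "sq1", "sq2", "sq3"),
}

# max_workers is set high enough that nested tasks cannot starve the pool.
with ThreadPoolExecutor(max_workers=8) as pool:
    print(get(graph, "total", pool))  # → 14
```

The real Dask applies the same principle at scale: collections such as `dask.array` and `dask.dataframe` build these graphs automatically from familiar NumPy- and pandas-style code, and schedulers execute them across cores or a cluster, which is what RAPIDS leverages to spread GPU work.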
As Python is the language of choice for most data science work, you can see why Nvidia chose to make this a key part of its strategy. ZDNet had a Q&A with Dask creator Matthew Rocklin, who recently started working for Nvidia.