A Beginner’s Guide to the Awesome Dask Library!

Freyam Mehta
2 min read · Jul 30, 2021
Time to read some Documentation!

What does Dask mean to me? What does Dask even do? What even is Dask in the first place?

Simply put, Dask is a parallel computing framework that scales the existing Python ecosystem. It’s a Python library just like NumPy, Matplotlib, SciPy, OpenCV, or cryptography. Once it is installed, one just needs to type import dask and voila! You are all set to unleash the magic ✨.

It was developed in coordination with other open-source projects like NumPy, pandas, and scikit-learn. What sets Dask apart is that it offers an experience very close to those libraries. By reusing existing Python APIs and data structures, Dask makes it simple to switch between NumPy, pandas, and scikit-learn and their Dask-powered counterparts: Dask Arrays ← NumPy arrays, Dask DataFrames ← pandas DataFrames, and Dask-ML ← scikit-learn.
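To make the "counterparts" idea concrete, here is a minimal sketch that computes the same mean with NumPy and with a Dask Array. The array size and the chunk size are arbitrary values I picked for illustration; the point is only that the method names line up.

```python
import numpy as np
import dask.array as da

# Plain NumPy: the whole array lives in memory and mean() runs immediately.
np_data = np.arange(1_000_000, dtype="float64")
np_mean = np_data.mean()

# Dask: wrap the same data, split into 10 chunks of 100,000 elements each.
# (The chunk size is an illustrative choice, not a recommendation.)
dk_data = da.from_array(np_data, chunks=100_000)

# Same method name as NumPy, but this only builds a lazy task graph...
dk_mean_lazy = dk_data.mean()

# ...and compute() actually runs it, chunk by chunk.
dk_mean = dk_mean_lazy.compute()

print(np_mean, dk_mean)
```

The only Dask-specific steps are choosing the chunks and calling compute(); everything in between reads like ordinary NumPy code.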

Dask’s schedulers scale to thousand-node clusters, and its algorithms have been tested on some of the world’s largest supercomputers. But you don’t need a large cluster to get started. Dask also ships with schedulers designed for use on personal machines. Many people today use Dask to scale computations on their laptops, using multiple cores for processing and their disk for extra storage.
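As a rough sketch of what "schedulers for personal machines" looks like in practice, the same computation can be run on a local thread pool or in a single thread just by changing the scheduler argument of compute(). The array shape and chunk sizes below are made up for the example.

```python
import dask.array as da

# A 2000 x 2000 array of ones, split into sixteen 500 x 500 chunks.
x = da.ones((2_000, 2_000), chunks=(500, 500))

# Still lazy: nothing has been computed yet.
total = (x + x.T).sum()

# Run on a local thread pool, so several cores can work on different chunks.
threaded = total.compute(scheduler="threads")

# Run everything in the current thread, which is handy for debugging.
single = total.compute(scheduler="synchronous")

print(threaded, single)
```

Both calls return the same number; only where and how the chunks are processed changes.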

Dask in a nutshell!

My Area of Work

My GSoC project primarily deals with improving the graph visualizations and the HTML representations of Dask computations. I am trying to make them more illustrative, attractive, and informative.
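For readers who have never seen one of these graphs, here is a small sketch of how a Dask computation becomes a task graph in the first place. The functions inc and add are toy helpers I made up for this example; they are not part of Dask or of my project.

```python
import dask

# Toy functions, invented for illustration only.
@dask.delayed
def inc(x):
    return x + 1

@dask.delayed
def add(a, b):
    return a + b

# Nothing runs here: Dask just records which task depends on which.
total = add(inc(1), inc(2))

# The recorded task graph: one entry per task (two inc calls and one add).
graph = dict(total.__dask_graph__())
print(len(graph))

# total.visualize() would draw this graph as an image,
# but it needs the optional graphviz dependency, so it is not called here.

result = total.compute()
print(result)
```

Graphs like this one, along with the rich output that Dask objects show in a notebook, are the kind of representations my project is concerned with.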

More information about my work in the final blog ⭐️

Dask Users 😎
