Why you need Modin: Pandas alternative for fast big data proce | Big Data Science

Why you need Modin: Pandas alternative for fast big data processing
Processing large DataFrames with Pandas is slow because this Python library does not support working with data that does not fit in available memory. As a result, Pandas workflows that work well for prototyping on a few MB of data do not scale to tens or hundreds of GB. Because Pandas executes operations in a single thread and keeps everything in RAM, it is poorly suited to processing really large datasets. There is an alternative: Modin, a Python library with a Pandas-like API that scales across all processor cores using the Dask or Ray engine.
Modin supports working with data that does not fit in memory, so you can comfortably work with hundreds of GB without worrying about massive slowdowns or memory errors. With cluster and out-of-core support, Modin offers a DataFrame with strong performance on a single node and high scalability in a cluster.
In local mode (no cluster), Modin will create and manage a local (Dask or Ray) cluster for execution. There is no need to specify how to distribute the data, or even to know how many cores the system has. In fact, you can keep using your existing Pandas code by simply changing the import statement from pandas to modin.pandas, and get a significant speedup even on a single machine. Modin reports speedups of up to 4x on a laptop with 4 physical cores.
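The import swap described above can be sketched as follows. This is a minimal illustration, not a benchmark: the toy data, the column names, and the city_totals helper are hypothetical, and the fallback to plain Pandas is an addition made only so the sketch still runs where Modin is not installed.

```python
# Before: import pandas as pd
# After: only the import line changes; the rest of the script stays the same.
try:
    # Needs Modin plus an engine, e.g. pip install "modin[ray]" or "modin[dask]".
    # Modin reads the MODIN_ENGINE environment variable ("ray" or "dask") to pick one.
    import modin.pandas as pd
except ImportError:
    # Fallback added for this sketch so it runs without Modin.
    import pandas as pd


def city_totals():
    """Sum sales per city using ordinary Pandas-style calls (hypothetical toy data)."""
    df = pd.DataFrame(
        {
            "city": ["Oslo", "Rome", "Oslo", "Rome", "Oslo"],
            "sales": [10, 20, 30, 40, 50],
        }
    )
    # Under Modin this groupby is distributed across the available cores.
    return df.groupby("city")["sales"].sum()


if __name__ == "__main__":
    # The guard keeps worker processes started by the engine from re-running the script body.
    print(city_totals())
```

On data this small, Modin will likely be slower than Pandas because of engine startup cost; any speedup should only be expected on large frames.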
Docs: https://modin.readthedocs.io/en/latest/index.html
Github: https://github.com/modin-project/modin

Big Data Science