FireDucks: Get performance beyond pandas with zero learning cost!

pandas is one of the most popular libraries. While looking for an easy way to speed it up, I discovered FireDucks and became interested in it.

Comparison with pandas: Why FireDucks?

A pandas program can run into serious performance problems depending on how it is written. As a data scientist, however, I want to spend my time analyzing data rather than tuning code. It would be great if a tool could automatically reorder operations to speed up the program. For example, if Process A => Process B is slow, it would be replaced with Process B => Process A (with the result guaranteed to be the same, of course). It is said that data scientists spend about 45% of their time preparing data, and while I was looking for a way to speed up that work, I came across a module called FireDucks.
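The reordering idea can be illustrated in plain pandas. The sketch below is only an illustration with made-up column names (`group`, `value`) and data; it does not show how FireDucks works internally. Sorting and then filtering gives the same rows as filtering and then sorting, but the second order sorts fewer rows.

```python
import pandas as pd

# Hypothetical data: column names and values are made up for illustration.
df = pd.DataFrame({
    "group": ["a", "b", "a", "c", "b", "a"],
    "value": [5, 3, 9, 1, 7, 2],
})

# Process A => Process B: sort every row, then keep only group "a".
all_sorted = df.sort_values("value")
sort_then_filter = all_sorted[all_sorted["group"] == "a"]

# Process B => Process A: keep only group "a", then sort the smaller frame.
filter_then_sort = df[df["group"] == "a"].sort_values("value")

# Both orders produce the same rows in the same order.
print(sort_then_filter.equals(filter_then_sort))
```

On a six-row frame the difference is invisible, but on a large dataset the second order sorts only the rows that survive the filter.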

According to the FireDucks documentation, it appears to support only Linux. Since my main machine runs Windows, I tried it from WSL2 (Windows Subsystem for Linux), an environment that can run Linux on Windows.

The environment I used is as follows.

  • OS: Microsoft Windows 11 Pro
  • Version: 10.0.22631 Build 22631
  • System Model: Z690 Pro RS
  • System Type: x64-based PC
  • Processor: 12th Gen Intel(R) Core(TM) i3-12100, 3300 MHz, 4 Cores, 8 Logical Processors
  • Baseboard Product: Z690 Pro RS
  • Platform Role: Desktop
  • Installed Physical Memory (RAM): 64.0 GB

Installing and Configuring FireDucks

Install WSL

WSL was installed by following the Microsoft documentation; the Linux distribution is Ubuntu 22.04.1 LTS.

Install FireDucks

Next, install FireDucks itself. Installation is very easy:
pip install fireducks

Installation takes a few minutes, since pyarrow, pandas, and other libraries are installed along with FireDucks.

I ran the code below, and loading was remarkably fast: pandas took 4 sec, while FireDucks took only 74.5 ns.

# 1. analysis based on time period and creative duration
# convert timestamp to date/time object
df['timestamp_converted'] = pd.to_datetime(df['timestamp'], unit='s')

# define time period 
def get_part_of_day(hour): 
  if 5 
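
A helper like `get_part_of_day` typically buckets the hour of the day into a few named periods. Here is a minimal sketch of that idea; the cut-off hours (5, 12, 17, 21) and the labels are assumptions for illustration, not the values used in the original analysis.

```python
def get_part_of_day(hour: int) -> str:
    """Map an hour (0-23) to a coarse time-of-day label.

    The cut-off hours and labels are hypothetical.
    """
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


# Quick check with a few sample hours.
for h in (6, 13, 19, 23):
    print(h, get_part_of_day(h))
```

With a datetime column such as `timestamp_converted`, a function like this could be applied via `df['timestamp_converted'].dt.hour.apply(get_part_of_day)`.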



<p>All of this preprocessing and analysis took around 8 seconds in pandas, whereas FireDucks completed it within 4 seconds, a speedup of almost 2x.</p>

<h4>Improved performance</h4>

<p>One of the most stressful things about using pandas is waiting for large datasets to load, and then waiting again for complex operations such as groupby. FireDucks, on the other hand, uses lazy evaluation: loading itself takes almost no time, and processing happens only where it is needed. I found that this greatly reduced my total waiting time.</p>
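
To make the idea of lazy evaluation concrete, here is a toy sketch in plain Python. The `LazyTable` class and its methods are hypothetical and are not FireDucks APIs; they only show how recording operations and running them on demand moves the cost from the load step to the point where a result is actually needed.

```python
import time


class LazyTable:
    """Toy deferred pipeline; hypothetical, not how FireDucks is implemented."""

    def __init__(self, loader):
        self._loader = loader  # function that produces the rows
        self._steps = []       # recorded operations, not yet executed

    def filter(self, predicate):
        # Record the operation instead of running it.
        self._steps.append(lambda rows: [r for r in rows if predicate(r)])
        return self

    def collect(self):
        # Only now is the data loaded and every recorded step applied.
        rows = self._loader()
        for step in self._steps:
            rows = step(rows)
        return rows


def slow_load():
    time.sleep(0.2)  # stand-in for reading a large file
    return list(range(10))


start = time.perf_counter()
table = LazyTable(slow_load).filter(lambda x: x % 2 == 0)
build_time = time.perf_counter() - start  # tiny: nothing has run yet

start = time.perf_counter()
result = table.collect()
run_time = time.perf_counter() - start    # includes the simulated load

print(result)
print(f"build: {build_time:.6f}s, collect: {run_time:.6f}s")
```

This is also why a very small "load time" in a lazy library should be read with care: the measured step may only be recording work, with the real cost paid later when the result is used.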

<p>As for performance in general, the developers have officially announced speedups of up to 16 times compared to pandas. (I will compare its performance with various competing libraries next time.)</p>

<p><img src="/static/imghwm/default1.png" data-src="https://img.php.cn/upload/article/000/000/000/172790778456832.jpg?x-oss-process=image/resize,p_40" class="lazy" alt="FireDucks: Get performance beyond pandas with zero learning cost!"></p>

<h2>Zero learning cost</h2>

<p>Being able to use exactly the same pandas notation without having to think about anything is a huge advantage. There are other DataFrame acceleration libraries besides FireDucks, but they take too much effort to learn and are too easy to forget.</p>

<p>For example, if you want to add a column with Polars, you have to write something like this.</p>

<pre class="brush:php;toolbar:false">
# pandas
df["new_col"] = df["A"] + 1
# polars
df = df.with_columns((pl.col("A") + 1).alias("new_col"))
</pre>

Almost no changes to existing code

I have several ETL jobs and other projects that use pandas, and it would be nice to get a performance improvement just by installing FireDucks and replacing the import statement.
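
The import swap could look like the sketch below. `import fireducks.pandas as pd` is the import described in the FireDucks documentation; the fallback to plain pandas and the sample data are additions for illustration, so that the script still runs where FireDucks is not installed.

```python
try:
    import fireducks.pandas as pd  # drop-in replacement, if installed
    backend = "fireducks"
except ImportError:
    import pandas as pd            # fall back to plain pandas
    backend = "pandas"

# Hypothetical sample data; everything below is ordinary pandas code.
df = pd.DataFrame({"A": [1, 2, 3]})
df["new_col"] = df["A"] + 1

print(backend, df["new_col"].tolist())
```

Only the import line differs between the two backends; the rest of the script is unchanged.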

If you have anything to add, feel free to comment below.
