Goodbye, Python loops! Vectorization is amazing

Almost every programming language teaches us loops, so whenever an operation is repetitive we reach for one by default. But with a very large number of iterations (millions or billions of rows), loops become a real pain: a job can run for hours before you find out it doesn't work. This is where vectorization in Python becomes critical.

What is vectorization?

Vectorization is a technique for expressing operations on a data set as (NumPy) array operations. Behind the scenes, the operation is applied to all elements of the array or series in optimized, compiled code, instead of a Python 'for' loop handling one row at a time.
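As a minimal sketch of the idea (the array contents and the names `values`, `doubled_loop` and `doubled_vec` are made up for illustration), the same element-wise operation can be written both ways:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Loop version: Python handles one element per iteration
doubled_loop = np.empty_like(values)
for i in range(len(values)):
    doubled_loop[i] = values[i] * 2

# Vectorized version: a single expression applies to the whole array
doubled_vec = values * 2

print(doubled_loop)
print(doubled_vec)
```

Both versions produce the same array; the vectorized one simply moves the iteration out of Python and into NumPy.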

In this blog, we will look at some use cases where we can easily replace Python loops with vectorization. This will help you save time and become more proficient at coding.

Use Case 1: Finding the Sum of Numbers

First, let’s look at a basic example of finding the sum of numbers in Python using loops and vectors.

Using loops

import time
start = time.time()

# Sum by iterating
total = 0
# Iterate over 1.5 million numbers
for item in range(0, 1500000):
    total = total + item

print('sum is:' + str(total))
end = time.time()

print(end - start)

# 1124999250000
# 0.14 seconds

Using vectorization

import time
import numpy as np

start = time.time()

# Vectorized sum using NumPy
# np.arange creates the sequence of numbers from 0 to 1499999
print(np.sum(np.arange(1500000)))

end = time.time()
print(end - start)

# 1124999250000
# 0.008 seconds

The vectorized version runs about 18 times faster than iterating with the range function. The difference becomes even more apparent when working with a Pandas DataFrame.
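Single-shot timings like these vary from run to run and machine to machine. A hedged sketch of a more repeatable comparison using the standard timeit module is shown below; the helper names `loop_sum` and `vector_sum` are hypothetical, and the explicit `dtype=np.int64` is an assumption added to avoid overflow on platforms where NumPy's default integer is 32-bit.

```python
import timeit
import numpy as np

N = 1_500_000

def loop_sum(n):
    # Plain Python loop over range(n)
    total = 0
    for item in range(n):
        total += item
    return total

def vector_sum(n):
    # Vectorized sum; int64 keeps the result from overflowing
    return int(np.arange(n, dtype=np.int64).sum())

# Both must agree with the closed form n * (n - 1) / 2
assert loop_sum(N) == vector_sum(N) == N * (N - 1) // 2

# Take the best of three runs for each version
loop_time = min(timeit.repeat(lambda: loop_sum(N), number=1, repeat=3))
vector_time = min(timeit.repeat(lambda: vector_sum(N), number=1, repeat=3))
print(f"loop: {loop_time:.4f} s, vectorized: {vector_time:.4f} s")
```

The exact ratio you see will differ from the figures quoted above, but the vectorized version should remain clearly faster.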

Use Case 2: DataFrame Mathematical Operations

In data science, when using Pandas DataFrame, developers use loops to create new derived columns for mathematical operations.

In the example below, we can see that in such use cases, loops can easily be replaced by vectorization.

Create DataFrame

DataFrame is tabular data in the form of rows and columns.

We create a pandas DataFrame with 5 million rows and 4 columns filled with random integers from 0 to 49 (the upper bound of np.random.randint is exclusive).

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 50, size=(5000000, 4)),
                  columns=('a', 'b', 'c', 'd'))
df.shape
# (5000000, 4)
df.head()


We will create a new column 'ratio' to find the ratio of columns 'd' and 'c'.

Using loops

import time
start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    # creating a new column
    df.at[idx, 'ratio'] = 100 * (row["d"] / row["c"])
end = time.time()
print(end - start)
### 109 seconds

Using vectorization

start = time.time()
df["ratio"] = 100 * (df["d"] / df["c"])

end = time.time()
print(end - start)
### 0.12 seconds

The improvement on a DataFrame is significant: compared to the Python loop, vectorization is almost 1000 times faster.
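One detail worth keeping in mind: column 'c' can contain 0, so some ratios come out as inf or NaN. The sketch below uses a tiny hand-made DataFrame (`small`, `safe_c` and `ratio_safe` are illustrative names, and treating a zero denominator as missing is an assumed policy, not something the example above prescribes) to show how this can be handled while staying vectorized:

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({"c": [2, 0, 5, 0], "d": [1, 3, 10, 0]})

# Plain vectorized division: x / 0 gives inf and 0 / 0 gives NaN
small["ratio"] = 100 * (small["d"] / small["c"])

# Assumed policy: treat a zero denominator as missing (NaN)
safe_c = small["c"].where(small["c"] != 0)
small["ratio_safe"] = 100 * (small["d"] / safe_c)

print(small)
```

Series.where keeps the values where the condition holds and replaces the rest with NaN, so no Python-level loop is needed.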

Use case 3: If-else statement on DataFrame

Many operations require "if-else" type logic. This logic, too, can easily be replaced with vectorized operations in Python.

Have a look at the example below to understand it better (we will use the DataFrame created in use case 2).

Suppose we want to create a new column 'e' based on some conditions on the existing column 'a'.

Using loops

import time
start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    if row.a == 0:
        df.at[idx, 'e'] = row.d
    elif 0 < row.a <= 25:
        df.at[idx, 'e'] = row.b - row.c
    else:
        df.at[idx, 'e'] = row.b + row.c

end = time.time()

print(end - start)
### Time taken: 177 seconds

Using vectorization

start = time.time()
# Start with the 'else' case, then overwrite the rows that match the more specific conditions
df['e'] = df['b'] + df['c']
df.loc[df['a'] <= 25, 'e'] = df['b'] - df['c']
df.loc[df['a'] == 0, 'e'] = df['d']
end = time.time()
print(end - start)
## 0.28007707595825195 sec

Compared to the Python loop with if-else statements, the vectorized operations are about 600 times faster.
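The same if-else logic can also be expressed with np.select, which picks, for each row, the choice belonging to the first condition that is true. This is an alternative to the df.loc approach, not the method timed above; the sketch uses a small random DataFrame (`small`, seeded for reproducibility) and checks the result against a plain loop.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
small = pd.DataFrame(rng.integers(0, 50, size=(1000, 4)), columns=list("abcd"))

# Conditions are checked in order; the first match wins
conditions = [small["a"] == 0, small["a"] <= 25]
choices = [small["d"], small["b"] - small["c"]]
default = (small["b"] + small["c"]).to_numpy()  # the 'else' branch

small["e"] = np.select(conditions, choices, default=default)

# Reference result computed with an ordinary loop
expected = []
for row in small.itertuples(index=False):
    if row.a == 0:
        expected.append(row.d)
    elif row.a <= 25:
        expected.append(row.b - row.c)
    else:
        expected.append(row.b + row.c)

print(small["e"].tolist() == expected)
```

Because the conditions are evaluated in order, the `a == 0` case must come before `a <= 25`, mirroring the if/elif order of the loop.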

Use Case 4: Solving Machine Learning/Deep Learning Networks

Deep learning requires us to solve many complex equations over millions or billions of rows. Running Python loops to solve these equations is very slow, and vectorization is the best solution.

For example, suppose you want to calculate the y values for millions of rows in the following multiple linear regression equation.

y = m1*x1 + m2*x2 + m3*x3 + ...

We can use vectorization instead of looping.

The values of m1, m2, m3, ... are determined by solving the above equation using millions of values of x1, x2, x3, ... (for simplicity, we look at only a single multiplication step here).

Create data

>>> import numpy as np
>>> # Set the initial values of m
>>> m = np.random.rand(1,5)
>>> m
array([[0.49976103, 0.33991827, 0.60596021, 0.78518515, 0.5540753]])
>>> # Input values for 5 million rows
>>> x = np.random.rand(5000000,5)


Using loops

import time
import numpy as np

m = np.random.rand(1, 5)
x = np.random.rand(5000000, 5)

# Preallocate the output array, one value per row of x
zer = np.zeros(5000000)

tic = time.process_time()

for i in range(0, 5000000):
    total = 0
    for j in range(0, 5):
        total = total + x[i][j] * m[0][j]

    zer[i] = total

toc = time.process_time()
print("Computation time = " + str(toc - tic) + " seconds")

#### Computation time = 28.228 seconds

Using vectorization

np.dot performs the matrix multiplication with vectorized code behind the scenes.


tic = time.process_time()

#dot product 
np.dot(x,m.T) 

toc = time.process_time()
print ("Computation time = " + str((toc - tic)) + "seconds")

####Computation time = 0.107 seconds

Based on the timings above, np.dot is roughly 260 times faster than the Python loops.
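Before trusting a speed-up, it helps to confirm that both versions compute the same thing. The sketch below does this on a smaller input (1000 rows instead of 5 million, a size chosen here only so the loop finishes quickly) and also shows the `@` operator, which is equivalent to np.dot for 2-D arrays:

```python
import numpy as np

rng = np.random.default_rng(42)
m = rng.random((1, 5))
x = rng.random((1000, 5))

# Loop version: one multiply-add at a time
loop_result = np.zeros(len(x))
for i in range(len(x)):
    total = 0.0
    for j in range(x.shape[1]):
        total += x[i, j] * m[0, j]
    loop_result[i] = total

# Vectorized versions: a single matrix product
dot_result = np.dot(x, m.T)   # shape (1000, 1)
matmul_result = x @ m.T       # same product with the @ operator

print(dot_result.shape)
print(np.allclose(loop_result, dot_result.ravel()))
print(np.allclose(dot_result, matmul_result))
```

np.allclose is used instead of exact equality because floating-point sums may differ in the last few bits depending on the order of operations.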

Conclusion

Vectorization in Python is very fast. When dealing with very large data sets, prefer vectorization over loops. Over time, you will become accustomed to thinking about your code in vectorized terms.


Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.