NumPy is a core library for scientific computing in Python. It provides an efficient array object, along with tools for working with these arrays. A NumPy array holds many values, all of the same type.
Python's built-in types include the list. Lists are among the most commonly used Python data types; they can be resized and can hold elements of different types, which is very convenient.
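A minimal sketch of the contrast described above: a list can hold objects of any type, while a NumPy array stores a single element type. The variable names are illustrative, and the exact integer width NumPy picks depends on the platform.

```python
import numpy as np

# A list can mix element types and grow in place.
mixed = [1, "two", 3.0]
mixed.append([4, 5])

# A NumPy array has one dtype shared by every element.
ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3.5])   # the integers are upcast to float

print(len(mixed))        # 4
print(ints.dtype.kind)   # 'i' (signed integer; width is platform dependent)
print(floats.dtype)      # float64
```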
So what is the difference between a list and a NumPy array, and why use NumPy arrays when processing big data? The answer is performance.
Numpy data structures perform better in the following aspects:
1. Memory size - NumPy data structures take up less memory.
2. Performance - NumPy's core is implemented in C, so array operations run faster than the equivalent list operations.
3. Operation methods - NumPy has built-in, optimized algebraic operations and other methods.
The following explains the advantages of Numpy arrays over Lists in big data processing.
1. Smaller memory usage
If you use NumPy arrays in place of lists where appropriate, you can cut memory usage several times over, and by as much as 20 times in favorable cases such as small integer dtypes.
In a native Python list, every element costs 8 bytes for the reference stored in the list, and the referenced object itself takes 28 bytes (taking integers as an example). The size of a list can therefore be estimated with the following formula:
64 + 8 * len(lst) + len(lst) * 28 bytes
A NumPy array takes far less space. For example, a NumPy integer array a of length n requires:
96 + len(a) * 8 bytes
The larger the array, the more memory you save. If your array has 1 billion elements, these estimates give about 36 GB for the list versus about 8 GB for the NumPy array, so the difference is on the order of tens of gigabytes.
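The estimates above can be checked empirically. The following is a rough sketch, assuming 64-bit CPython and an explicit int64 dtype; the exact byte counts vary by Python version and platform, and the variable names are illustrative.

```python
import sys
import numpy as np

n = 100_000

# Use values outside CPython's small-integer cache so that each
# element is its own int object.
lst = list(range(1000, 1000 + n))
arr = np.arange(1000, 1000 + n, dtype=np.int64)

# getsizeof(lst) covers only the list header and its table of references,
# so the int objects it points to are added separately.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

# For an array that owns its data, getsizeof covers header plus data buffer;
# nbytes is the data buffer alone.
array_bytes = sys.getsizeof(arr)

print(list_bytes, array_bytes, arr.nbytes)
print(f"list / array: {list_bytes / array_bytes:.1f}x")
```

With 8-byte elements the ratio should land near the 36-to-8 figure implied by the two formulas; choosing a smaller dtype such as int16 or int8 shrinks the array further.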
2. Faster, with built-in operations
Run the following script, which creates two sequences of the same length and adds them together, once in pure Python and once with NumPy arrays, so that you can see the performance gap.
import time
import numpy as np

size_of_vec = 1000

def pure_python_version():
    # Time an element-by-element addition in pure Python.
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X))]
    return time.time() - t1

def numpy_version():
    # Time the same addition as a single vectorized NumPy operation.
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1

t1 = pure_python_version()
t2 = numpy_version()
print(t1, t2)
print("Numpy is in this example " + str(t1/t2) + " faster!")
The results are as follows:
0.00048732757568359375 0.0002491474151611328
Numpy is in this example 1.955980861244019 faster!
As you can see, NumPy is about 1.96 times faster than the pure Python version in this run.
If you look closely, you will also notice that NumPy arrays can be added directly with the + operator, element by element. Native lists cannot do this: on lists, + concatenates instead. This is the advantage of NumPy's operation methods.
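A small sketch of that difference in operator behavior, using made-up three-element inputs:

```python
import numpy as np

a_list = [1, 2, 3]
b_list = [10, 20, 30]
a_arr = np.array(a_list)
b_arr = np.array(b_list)

# On lists, + concatenates and * repeats.
print(a_list + b_list)   # [1, 2, 3, 10, 20, 30]
print(a_list * 2)        # [1, 2, 3, 1, 2, 3]

# On arrays, the same operators work element by element.
print(a_arr + b_arr)     # [11 22 33]
print(a_arr * 2)         # [2 4 6]

# Element-wise addition on lists needs an explicit loop.
print([x + y for x, y in zip(a_list, b_list)])   # [11, 22, 33]

# Built-in algebraic operations, such as the dot product.
print(a_arr.dot(b_arr))  # 140
```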
Next, we repeat the experiment several times to show that this performance advantage is consistent.
import numpy as np
from timeit import Timer

size_of_vec = 1000

# Build the inputs once so that only the addition is timed.
X_list = range(size_of_vec)
Y_list = range(size_of_vec)
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

def pure_python_version():
    Z = [X_list[i] + Y_list[i] for i in range(len(X_list))]

def numpy_version():
    Z = X + Y

timer_obj1 = Timer("pure_python_version()", "from __main__ import pure_python_version")
timer_obj2 = Timer("numpy_version()", "from __main__ import numpy_version")

# Total time for 10 calls of each version.
print(timer_obj1.timeit(10))
print(timer_obj2.timeit(10))

# Repeat the 10-call measurement 3 times to check that the gap is stable.
print(timer_obj1.repeat(repeat=3, number=10))
print(timer_obj2.repeat(repeat=3, number=10))
The results are as follows:
0.0029753120616078377
0.00014940369874238968
[0.002683573868125677, 0.002754641231149435, 0.002803879790008068]
[6.536301225423813e-05, 2.9387418180704117e-05, 2.9171351343393326e-05]
The NumPy timings (the second line of each pair) are consistently much smaller, which shows that the performance advantage is persistent.
So if you are doing big data work, for example with financial or stock data, using NumPy can save a lot of memory and give much better performance.
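To see how the gap changes with input size, the measurement can be wrapped in a helper. This is only a sketch: `speedup` is a hypothetical function written for illustration, the sizes are arbitrary, and the ratios it prints depend heavily on the machine and the NumPy version.

```python
import timeit
import numpy as np

def speedup(size, number=20, repeat=3):
    """Return list time / array time for adding two sequences of `size` ints."""
    x_list, y_list = list(range(size)), list(range(size))
    x_arr, y_arr = np.arange(size), np.arange(size)

    # Take the minimum over the repeats to reduce noise from other processes.
    t_list = min(timeit.repeat(
        lambda: [a + b for a, b in zip(x_list, y_list)],
        number=number, repeat=repeat))
    t_arr = min(timeit.repeat(
        lambda: x_arr + y_arr,
        number=number, repeat=repeat))
    return t_list / t_arr

for size in (1_000, 10_000, 100_000):
    print(size, f"{speedup(size):.1f}x")
```

Unlike the scripts above, this version builds real lists with `list(range(size))`, so the pure Python side is timed on list objects rather than on `range` objects.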
References: https://www.php.cn/link/5cce25ff8c3ce169488fe6c6f1ad3c97