99% of people don’t know! Python, C, C extensions, Cython differences comparison!
Let’s take the simple Fibonacci sequence as an example to test the difference in their execution efficiency.
Python code:
```python
def fib(n):
    a, b = 0.0, 1.0
    for i in range(n):
        a, b = a + b, a
    return a
```
C code:
```c
double cfib(int n)
{
    int i;
    double a = 0.0, b = 1.0, tmp;
    for (i = 0; i < n; ++i) {
        tmp = a;
        a = a + b;
        b = tmp;
    }
    return a;
}
```
The above is the Fibonacci sequence implemented in C. Some readers may wonder why we use a floating-point type instead of an integer. The answer is that C's integer types have a limited range, so we use double; conveniently, Python's float corresponds to PyFloatObject at the C level, which also stores its value internally as a double.
C extension:
Next, the C extension. Note that C extensions are not our focus; writing a C extension and writing Cython are essentially the same thing, namely writing an extension module for Python, but writing Cython is far simpler than writing a C extension.
```c
#include "Python.h"

double cfib(int n)
{
    int i;
    double a = 0.0, b = 1.0, tmp;
    for (i = 0; i < n; ++i) {
        tmp = a;
        a = a + b;
        b = tmp;
    }
    return a;
}

static PyObject *fib(PyObject *self, PyObject *n)
{
    /* fib only accepts a Python int */
    if (!PyLong_CheckExact(n)) {
        PyErr_SetString(PyExc_ValueError, "function fib expects an integer");
        return NULL;
    }
    double result = cfib(PyLong_AsLong(n));
    return PyFloat_FromDouble(result);
}

static PyMethodDef methods[] = {
    {"fib", (PyCFunction) fib, METH_O, "the fib function"},
    {NULL, NULL, 0, NULL}
};

static PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "c_extension",                /* module name */
    "the module c_extension",     /* module docstring */
    -1,
    methods,
    NULL, NULL, NULL, NULL
};

PyMODINIT_FUNC PyInit_c_extension(void)
{
    return PyModule_Create(&module);
}
```
As you can see, with a C extension even the simple Fibonacci becomes quite involved.
Cython code:
Finally, let's look at how to write Fibonacci in Cython. What do you think the Cython code should look like?
```cython
def fib(int n):
    cdef int i
    cdef double a = 0.0, b = 1.0
    for i in range(n):
        a, b = a + b, a
    return a
```
See how similar the Cython code is to the Python code? Although we have not formally covered Cython's syntax yet, you can probably guess what the code above means: we defined C-level variables with the cdef keyword and declared their types.
Cython code must be compiled into an extension module before the interpreter can use it: it is first translated into C code, and that C code is then compiled into an extension module. So again, there is essentially no difference between writing C extensions and writing Cython; Cython code also ends up as C code.
But writing Cython is obviously much simpler than writing C extensions. If the Cython code is of high quality, the translated C code will be too, and the translation process applies aggressive optimizations automatically. With a handwritten C extension, every optimization must be handled by the developer, not to mention that once the logic gets complex, writing the C extension itself becomes a headache.
Comparing the Cython code with the pure-Python Fibonacci, the only difference seems to be that the types of the variables i, a, and b are declared in advance. The key question is why this speeds anything up (we haven't benchmarked yet, but the speed will certainly improve, otherwise there would be no reason to learn Cython).
The reason is that every variable in Python is a generic pointer, PyObject *. PyObject (a C struct) contains two header members: ob_refcnt, which holds the object's reference count, and ob_type, a pointer to the object's type.
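We can observe ob_refcnt from Python itself. A small sketch (the exact count depends on how many references exist at the moment of the call):

```python
import sys

x = []
# getrefcount reports ob_refcnt; it is at least 2 here, because both the
# name x and getrefcount's own argument refer to the list.
print(sys.getrefcount(x))
```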
Whether an object is an integer, float, string, tuple, dict, or anything else, every variable referring to it is a PyObject *. To operate on it, the interpreter must first follow ob_type to find the concrete type, and then cast accordingly.
Take a and b in the Python code: we know that on every iteration they point to floats, but the interpreter never makes that inference. On every addition it must check what the types are and convert; then, to perform the addition, it calls the type's __add__ method to add the two objects and create a new object; finally it casts the pointer to the new object back to PyObject * and returns it.
Moreover, Python objects live on the heap, and floats are immutable, so every iteration creates a new object and throws the previous one away.
All of this keeps the execution efficiency of Python code low. Python does provide a memory pool and corresponding caching mechanisms, but they cannot compensate for this overhead.
As for why Cython can accelerate, we will talk about it later.
So what is the efficiency difference between them? Let’s use a table to compare:
The improvement multiple refers to how many times the efficiency is improved compared to pure Python.
The second column is fib(0), which never enters the loop, so fib(0) measures the cost of the function call itself. The penultimate column, "loop body time", is the time spent executing the loop body of fib(90), excluding the overhead of the call.
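The pure-Python side of such a table can be reproduced with timeit. A rough sketch (the absolute numbers depend entirely on the machine):

```python
from timeit import timeit

def fib(n):
    a, b = 0.0, 1.0
    for i in range(n):
        a, b = a + b, a
    return a

N = 100_000
# fib(0) measures pure call overhead; subtracting it from fib(90)
# leaves roughly the loop-body cost.
call_ns = timeit("fib(0)", globals=globals(), number=N) / N * 1e9
total_ns = timeit("fib(90)", globals=globals(), number=N) / N * 1e9
print(f"call overhead: {call_ns:.0f} ns, loop body: {total_ns - call_ns:.0f} ns")
```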
Overall, Fibonacci written in pure C language is undoubtedly the fastest, but there are many things worth thinking about. Let’s analyze it.
Pure Python
As expected, it is the worst performer across the board. Judging from fib(0), a function call takes 590 nanoseconds, far slower than C. The reason is that Python must create a stack frame for every call, and that frame is allocated on the heap; when the call ends, the frame must be destroyed as well. As for fib(90), no further analysis is needed.
Pure C
Obviously there is no interaction with the Python runtime here, so the overhead is minimal. fib(0) shows that calling a C function takes only about 2 nanoseconds; fib(90) shows that in the loop, C is nearly 80 times faster than Python.
C extension
What a C extension does was covered above: it uses C to write an extension module for Python. Looking at the loop-body time, the C extension is almost identical to pure C; the difference is the extra time spent on the function call. The reason is that calling a function from an extension module means first converting Python data to C data, then computing the Fibonacci number with the C function, and finally converting the C result back into Python data.
So a C extension is essentially C, but written against the API specification provided by CPython, so that the C code can be compiled into an extension module (a pyd file on Windows) and called directly from Python. From the result's point of view it is the same thing as Cython. But again, writing an extension in C is still writing C, and you also have to know the underlying Python/C API, which is relatively difficult.
Cython
Looking at loop-body time alone, pure C, the C extension, and Cython are all about the same, but Cython is clearly the most convenient to write. As we said, Cython does essentially the same job as a C extension: both produce extension modules for Python. The difference is that one is C code written by hand, while the other is Cython code automatically translated into C. So for Cython, too, the round trip of converting Python data to C data, computing, and converting the result back to Python data is unavoidable.
But we see that Cython spends much less time on the function call than the C extension, mainly because the C code Cython generates is highly optimized. Honestly, though, we need not worry too much about function-call time; what matters is the time spent executing the code inside. (We will also discuss how to reduce call overhead later.)
We can see from the loop body time consumption that Python’s for loop is really notoriously slow, so what is the reason? Let’s analyze it.
1. Python’s for loop mechanism
When Python traverses an iterable, it first calls the iterable's __iter__ method to get the corresponding iterator; then it repeatedly calls the iterator's __next__ method to fetch values one by one, until the iterator raises a StopIteration exception, which the for loop catches to terminate the iteration.
And iterators are stateful: the interpreter must track the iterator's state throughout the iteration.
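The protocol described above can be sketched in plain Python; this is roughly what `for i in range(3)` expands to:

```python
iterable = range(3)
it = iter(iterable)          # calls __iter__ to get the iterator
result = []
while True:
    try:
        i = next(it)         # calls __next__ to fetch the next value
    except StopIteration:    # the for loop catches this and stops
        break
    result.append(i)
print(result)  # [0, 1, 2]
```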
2. Arithmetic operations in Python
We have actually mentioned this above. Due to its own dynamic characteristics, Python cannot do any type-based optimization.
For example, a + b in the loop body: a and b could point to integers, floats, strings, tuples, lists, or even instances of a class that implements the magic method __add__, and so on.
Although we know they are floats, Python never makes that assumption. Every time a + b executes, the interpreter must ask: what are the types? Does the type implement __add__? If so, call it with a and b as operands, adding the objects they point to; after the result is computed, cast its pointer to PyObject * and return it.
For C and Cython, the variables are declared as double up front and can be nothing else, so the compiled a + b is just a simple machine instruction. How could Python not be slow by comparison?
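The dis module makes the dynamic dispatch visible: the interpreter emits a generic binary-add opcode, and the operand types are only discovered at run time.

```python
import dis

def add(a, b):
    return a + b

# Prints the bytecode; a + b compiles to a generic BINARY_* opcode
# (BINARY_OP on 3.11+, BINARY_ADD on earlier versions), not a typed add.
dis.dis(add)
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)
```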
3. Memory allocation of Python objects
Python objects are allocated on the heap: creating one essentially means calling C's malloc to request a block of heap memory for the struct. Allocating and freeing heap memory is expensive; the stack is far cheaper, maintained by the system, and reclaimed automatically. Allocating and releasing memory on the stack costs little more than adjusting a single register.
The heap enjoys no such treatment, and Python objects all live there. Python does introduce a memory-pool mechanism, which to some extent avoids frequent interaction with the operating system, along with the small-integer object pool, string interning, caching pools, and so on.
Even so, creating and destroying objects (any objects, including scalars) still incurs the overhead of dynamic memory allocation and Python's memory subsystem. And float objects are immutable, so one is created and one destroyed on every iteration of the loop; efficiency remains low.
Variables that Cython declares with a C type are no longer pointers (every Python variable is a pointer). For our a and b, each is a double-precision float allocated on the stack. Stack allocation is far cheaper than heap allocation, which makes it ideal for a for loop, so efficiency is far higher than Python's. And it is not just allocation: addressing on the stack is also faster than on the heap.
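The size difference alone is telling: a C double is 8 bytes on the stack, while a Python float is a full heap-allocated PyFloatObject carrying a refcount and a type pointer alongside the value.

```python
import sys

# On a typical 64-bit CPython a float object occupies 24 bytes:
# 8 (ob_refcnt) + 8 (ob_type) + 8 (the double itself).
print(sys.getsizeof(1.5))
```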
So it’s not surprising that C and Cython are orders of magnitude faster than pure Python when it comes to for loops, since Python does a lot of work with each iteration.
We see that in Cython code, just adding a few cdefs can achieve such a big performance improvement, which is obviously very exciting. However, not all Python code will experience huge performance improvements when written in Cython.
Our Fibonacci example here is deliberately chosen: it is CPU-bound, and the running time is spent on variables held in CPU registers, with no data movement required. If the function instead spent its time on other kinds of work, the differences between Python, C, and Cython might shrink significantly (for memory-intensive operations) or even vanish entirely (for I/O-intensive or network-intensive operations).
When improving the performance of a Python program is our goal, the Pareto principle helps a lot: 80% of a program's running time is caused by 20% of its code. But without careful profiling, that 20% is hard to find. So before reaching for Cython, analyzing the overall business logic is step one.
If analysis determines that the bottleneck is network I/O, we cannot expect Cython to bring significant performance improvements. So before using Cython, it is necessary to determine what is actually causing the bottleneck: Cython is a powerful tool, but it must be used in the right place.
In addition, Cython brings the C type system into Python, so the limitations of C data types are something we must watch for. Python's integers are unbounded, but C's are not, which means C integers cannot correctly represent arbitrary-precision values.
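A pure-Python sketch of what happens when a value exceeds the range of a C int32 (the helper `to_int32` is ours, written just to simulate C's two's-complement wraparound):

```python
def to_int32(x):
    # Keep the low 32 bits, then reinterpret as a signed value,
    # mimicking C's two's-complement int32 behavior.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

print(2 ** 31)            # Python int: 2147483648, no limit
print(to_int32(2 ** 31))  # C int32: wraps around to -2147483648
```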
Fortunately, some Cython features can help us catch these overflows. In short: C data types are faster than Python data types, but at the cost of flexibility and generality. Seen this way, between speed on one side and flexibility and generality on the other, Python chose the latter.
Also consider another Cython feature: interfacing with external code. Suppose our starting point is not Python but C or C++, and we want to use Python to glue multiple C or C++ modules together. Cython understands C and C++ declarations and can generate highly optimized code, so it works well as a bridge.
Since my background is Python, where C and C++ are involved I will cover how to bring C and C++ into Cython and call existing C libraries directly, not how to embed Cython into C/C++ as a bridge between multiple C/C++ modules. I hope you understand: I do not write services in C or C++; I only use them to help Python run faster.
So far we have only introduced Cython, mainly its positioning and how it relates to, and differs from, Python and C. How to use Cython to accelerate Python, how to write Cython code, and its detailed syntax will be introduced later.
In short, Cython is a mature language that serves Python. Cython code cannot be executed directly because it does not comply with Python's syntax rules.
The way to use Cython is: first translate the Cython code into C, then compile the C code into an extension module (a pyd file), then import that module in Python code and call the functions inside it. That is the correct way, and indeed the only way, to use Cython.
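A minimal build sketch using the standard Cython toolchain (assuming the Cython code above is saved as fib.pyx; the filename is our choice):

```python
# setup.py — builds fib.pyx into an extension module (requires Cython)
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("fib.pyx"))
```

Running `python setup.py build_ext --inplace` produces the extension module, after which `from fib import fib` works in ordinary Python code.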
For example, the Fibonacci we wrote in Cython above will raise an error if executed directly, because cdef is not valid Python syntax. The Cython code must be compiled into an extension module and then imported from an ordinary py file. The whole point of this is speed, so the code you move into Cython should be CPU-intensive; otherwise the gains will be modest.
So before using Cython, analyze the business logic carefully, or start by writing everything in plain Python. Once it works, profile the program, find the hotspots that cost the most time and can be optimized with static typing, rewrite those parts in Cython, compile them into extension modules, and call their functions from Python.