Speeding up Python

高洛峰
高洛峰Original
2016-10-19 13:25:121190browse

Originally, I never knew how to better optimize the performance of web pages. Then recently, when I was comparing the rendering speed of similar web pages in Python and PHP, I accidentally discovered a very simple and idiotic method that I have never discovered (I have to BS me Self): Just like some PHP applications such as Discuz forum, print out "how many seconds this page was generated" in the generated web page, and then when you keep visiting the web page to test, you can intuitively find out what operations will happen. Causes bottleneck, how to solve the bottleneck.

So I found that when SimpleCD generated the homepage, it unexpectedly took about 0.2 seconds, which was really unbearable: compared to the average generation of the Discuz forum’s homepage, it only takes 0.02 seconds, and the Discuz forum’s homepage is undoubtedly much more complicated than the SimpleCD’s homepage; This makes me feel so embarrassed, because this gap is definitely not caused by the Python language. It can only be said that I have not done any optimization and the Discuz program is well optimized.


In fact, you can know without analysis that it is definitely the database that is dragging you down. When SimpleCD generates the homepage, it needs to perform more than 42 queries in three SQLite databases. It is an extremely inefficient design caused by historical reasons; but this Among the more than 40 queries, most of them are actually very fast queries. After careful analysis, there are two that are big performers, and the others are not slow.

The first big one is: Get the number of data

SELECT count(*) FROM verycd

This operation takes a lot of time every time, because every time the database needs to be locked and the primary key statistics are traversed Because of the number of data, the larger the amount of data, the more time-consuming it is. The time-consuming is O(N), and N is the size of the database. In fact, it is very easy to solve this problem. Just store the current number of data anywhere. Just make changes when adding or deleting data, so the time is O(1)

The second big one is: Get the latest updated 20 data lists

SELECT verycdid,title,brief,updtime FROM verycd

ORDER BY updtime DESC LIMIT 20;

Because the index is done on updtime, the real query time is actually the time of searching the index. But why is this operation slow? Because my data is inserted according to publish time, if it is displayed according to update time, I/O will definitely need to be done in at least 20 different places, which will be slow. The solution is to let it do I/O in one place. That is, unless new data is added to the database/original data is changed, the return result of this statement will be cached. This is 20 times faster:)

The next are 20 small cases: Get publisher and click number information

SELECT owner FROM LOCK WHERE id=XXXX;

SELECT hits FROM stat WHERE id=XXXX;

Why isn’t the SQL join statement used here to save some trouble? Due to architectural reasons, these data are placed in different databases. stat is a database such as click rate. Because it requires frequent insertion, it is stored in mysql. Lock and verycd are databases that require a large number of select operations because of the tragic index usage of mysql. It is stored in the sqlite3 database due to paging efficiency, so it cannot be joined -.-

In short, this is not a problem. Just like the solution just now, everything is cached

So looking at my example, optimizing web page performance can be summed up in a nutshell In short, cache the database query and that's it. I believe most web applications are like this:)


Finally it’s memcached’s turn. Since we plan to cache, if we use files for caching, there will still be disk I/O. It is better to cache directly into the memory. Memory I/O can It's much faster. So memcached, as the name suggests, is just that.

memcached is a very powerful tool because it can support distributed shared memory cache. Big sites use it. For small sites, as long as they can afford the memory, it is also a good thing; the memory buffer required for the homepage The size is estimated not to exceed 10K, not to mention that I am also a memory tycoon now, do I still care about this?

Configuration and operation: Because it is a stand-alone machine and there is nothing to configure, just change the memory and port.

vi /etc/memcached.conf

/etc/init.d/memcached restart

In the python web application Use

import memcache

mc = memcache.Client(['127.0.0.1:11211'], debug=0)

memcache is actually a map structure, and the most commonly used are two functions:

The first one is set(key, value, timeout), which is very simple to map key to value. Timeout refers to when the mapping becomes invalid. The second one is get(key) function, which returns the value pointed by key.

So you can do this for a normal sql query

sql = 'select count(*) from verycd'

c = sqlite3.connect('verycd.db').cursor()



# Original processing method

c.execute(sql)

count = c.fetchone()[0]



# Current processing method

from hashlib import md5

key=md5(sql)

count = mc.get(key)

if not count:

c.execute(sql)

count = c.fetchone()[0]

mc.set(key,count,60*5) #Save for 5 minutes

where md5 is to make the key distribution more even, others The code is very intuitive so I won’t explain it.


After optimizing statement 1 and statement 2, the average generation time of the home page has been reduced to 0.02 seconds, which is the same order of magnitude as discuz; after optimizing statement 3, the final result is that the home page generation time has been reduced to about 0.006 seconds. , after optimizing just a few lines of memcached code, the performance increased by 3300%. Finally I can straighten my back and watch Discuz)


Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn