Home >Backend Development >Python Tutorial >Detailed explanation of performance testing of Python ORM based on TPC-C benchmark

Detailed explanation of performance testing of Python ORM based on TPC-C benchmark

coldplay.xixi
coldplay.xixiforward
2020-07-28 16:57:283133browse

Detailed explanation of performance testing of Python ORM based on TPC-C benchmark

## When the application that needs to be used together with the database, the object relationship maper (ORM) is usually used in Python programming. Examples of Python ORMs are SQLAlchemy, Peewee, Pony-ORM and Django. Choosing ORM performance plays a vital role. But how do these toolsets compare? ORM performance benchmarks provide a clear measure, but there is still a lot of room for improvement. I researched and extended qualitative ORM benchmarks to help developers in need. The qualitative Python ORM benchmark Tortoise ORM (link to repository) analyzes the speed of six ORMs for 11 types of SQL queries.

Related learning recommendations: python video tutorial

Generally, the Tortoise benchmark can evaluate the query execution speed of various ORMs. However, there is a flaw in this testing method: most ORMs are chosen for web applications. In this case, multiple users often send all manner of queries to the database. Because there are no evaluated benchmarking tools capable of evaluating the performance of Python ORM in this context, I decided to write my own PonyORM and SQLAlchemy to compare. As a basis, I used the TPC-C benchmark.

Since 1988, TPC has been developing and testing in the field of data processing. They have long since become an industry standard, with almost all device vendors using them on various hardware and software samples. The main feature of these tests is that they focus on testing under huge loads as close as possible to real conditions.

 TPC-C simulates warehouse network. It consists of a combination of five transactions of various types and complexity executed simultaneously. The purpose of this test is to evaluate the speed of transaction processing when multiple virtual users access the database simultaneously.

I decided to test two Python ORMs (SQLALchemy and PonyORM) using the TPC-C testing method suitable for this task. The purpose of this test is to evaluate the speed of transaction processing when multiple virtual users access the database simultaneously.

Test Instructions

The first step is to create and populate the database of the warehouse network.

The database contains eight relationships:

1. Warehouse

2. District

3. Order

4. Order line

 5. Stock

 6. Project

 7. Customer

 8. History

The databases of Pony and SQLAlchemy are the same . Only primary and foreign keys are indexed. Little A will automatically create these indexes. In SQLAlchemy I created it manually.

During the test, several virtual users sent different types of transactions to the database. Each transaction contains several requests. There are a total of five types of transactions submitted for processing with different probability of occurrence:

Transaction:

1. New order-45%

2. Payment-43%

 3. order_status-4%

 4. Delivery-4%

 5. Stock level-4%

  Likelihood of transaction occurring vs. original TPC- Same as in C test.

However, please keep in mind that due to technical limitations and because I wanted to test the performance of the following processors, the original was performed on a server with 64+ GB of RAM (requiring a lot of processors and huge disk space) TPC-C test. The ORM rather than the hardware's ability to withstand huge loads, so this test is somewhat simplified.

The main differences from the TPC-C test are as follows:

The main differences:

1. This test runs fewer virtual users than the original test

2 . My test has fewer table entries. For example: The number of entries in the "Inventory" relationship in the original test was calculated using the formula 100,000 * W, where W is the number of warehouses. In this test it is 100*W.

 3. In TPC-C, some transactions have multiple options for querying data from the database. For example, in a payment transaction, there is one possibility to request the customer from the database by ID, and another by last and first name. Currently my tests only make calls by ID.

 4. My test database has one less table than TPC-C. In TPC-C testing, after an order is created, it is added to the Order table and NewOrder table. Once the order is delivered, it is deleted from the NewOrder table. This can speed things up when applying a large number of transactions per minute; but since I have fewer users accessing the database, this is unnecessary. Instead, in the Order table, I added the bool attribute "is_o_delivered" which will be False until the order is delivered.

Next, I will briefly describe the role of each transaction.

Number of Transactions

New Command

1. Pass two parameters to the transaction: warehouse ID and customer ID

2. Use the passed ID from Select the warehouse and customer from the database

 3. Randomly select a warehouse area from the database

 4. Generate a random number indicating the number of order lines.

 5. Create an Order object

 6. Loop to create OrderLine objects. On each iteration of the loop, select a random item from the item table

 7. Change the inventory of each item in the order

Payment

1. Change the two Parameters passed to the transaction: warehouse ID and customer ID

 2. Select the warehouse and customer from the database by the passed ID

 3. Randomly select a warehouse area from the database

4. Generate a random number indicating the payment amount

5. Increase the balance of the warehouse and region by the payment amount

6. The customer balance decreases the payment amount

7. Increment Customer Payment Counter

 8. Increase the sum of customer payment amounts

 9. Create History Object

Order Status

 1. Pass Customer ID as Transaction Parameters

 2. Select the customer by ID and the customer’s last order

 3. Get the order status and order line from the order.

Delivery

1. Pass the warehouse ID as transaction parameter

2. Select the warehouse and all its areas from the database

3. For each Region selects the oldest undelivered order.

4. For each order that changes delivery status to True

5. For each customer with increasing order quantity

Inventory level

1. Pass the warehouse ID as the transaction parameter

 2. Select the warehouse from the database by ID

 3. Select the last 20 orders of the warehouse

 4. For the order For each project, evaluate the inventory level of the project

Test results

There are two ORMs participating in the test:

1. SQLAlchemy (blue line on the graph)

 2. PonyORM (orange line on the graph)

The following are the results of running the test for 10 minutes by accessing the database through 2 parallel processes. Start the process using the Multiprocessing module.

 

First, I tested all five transactions as expected in the TPC-C test. The result of this test is that the speed of Little A is about twice as fast as before.

Average speed:

· Small A-2543 transactions/minute

· SQLAlchemy-1353.4 transactions/minute

After that, I decided to evaluate the five transactions separately Transactional ORM performance. Below are the results for each trade.

New command

Average speed:

· Small A-3349.2 transactions/minute

· SQLAlchemy-1415.3 transactions/minute

Payment

Average speed:

· Small A-7175.3 transactions/minute

· SQLAlchemy-4110.6 transactions/minute

Order status

Average speed:

· Small A-16645.6 transactions/minute

· SQLAlchemy-4820.8 transactions/minute

Delivery

Average speed:

· SQLAlchemy-716.9 transactions/minute

· Small A-323.5 transactions/minute

Inventory level

Average speed:

· Small A-677.3 transactions/minute

· SQLAlchemy-167.9 transactions/minute

Test result analysis

After receiving the results, I analyzed why this happened and came to the following Conclusion:

In 4 out of 5 transactions, PonyORM is faster because when generating SQL code, PonyORM remembers the results of converting Python expressions into SQL. Therefore, Pony does not convert the expression again when the query is repeated, while SQLAlchemy is forced to generate SQL code every time it needs to execute the query.

Example of such a query in Pony:

stocks = select(stock for stock in Stock
  if stock.warehouse == whouse
  and stock.item in items).order_by(Stock.id).for_update()

Generated SQL:

 SELECT “stock”.”id”, “stock”.”warehouse”, “stock”.”item”,
  “stock”.”quantity”, “stock”.”ytd”, “stock”.”order_cnt”,
  “stock”.”remote_cnt”, “stock”.”data”FROM “stock” “stock”WHERE “stock”.”warehouse” = %(p1)s
  AND “stock”.”item” IN (%(p2)s, %(p3)s)ORDER BY “stock”.”id”FOR UPDATE
  {‘p1’:7, ‘p2’:7, ‘p3’:37}
  SQLAlchemy:
  stocks = session.query(Stock).filter(
  Stock.warehouse == whouse, Stock.item.in_(
  items)).order_by(text(“id”)).with_for_update()

Generated SQL:

SELECT stock.id AS stock_id, stock.warehouse_id AS stock_warehouse_id,
  stock.item_id AS stock_item_id, stock.quantity AS stock_quantity,
  stock.ytd AS stock_ytd, stock.order_cnt AS stock_order_cnt,
  stock.remote_cnt AS stock_remote_cnt, stock.data AS stock_dataFROM stockWHERE stock.warehouse_id = %(warehouse_id_1)s AND stock.item_id IN
  (%(item_id_1)s, %(item_id_2)s) ORDER BY id FOR UPDATE
  {‘warehouse_id_1’: 7, ‘item_id_1’: 53, ‘item_id_2’: 54}

However, obviously, SQLAlchemy can be more Fast execution of delivery type transactions because it can combine multiple UPDATE operations applied to different objects into a single command.

Example:

INFO:www.zpedu.com/sqlalchemy.engine.base.Engine:UPDATE order_line SET delivery_d=%
  (delivery_d)s WHERE order_line.id = %(order_line_id)s
  INFO:sqlalchemy.engine.base.Engine:(
  {‘delivery_d’: datetime.datetime(2020, 4, 6, 14, 33, 6, 922281),
  ‘order_line_id’: 316},
  {‘delivery_d’: datetime.datetime(2020, 4, 6, 14, 33, 6, 922272),
  ‘order_line_id’: 317},
  {‘delivery_d’: datetime.datetime(2020, 4, 6, 14, 33, 6, 922261))

In this case, Little A will send a separate query for each update.

Conclusion

Based on the test results, I can say that Pony selects from the database faster. On the other hand, in some cases, SQLAlchemy can generate Update type queries at a higher speed.

The above is the detailed content of Detailed explanation of performance testing of Python ORM based on TPC-C benchmark. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:learnku.com. If there is any infringement, please contact admin@php.cn delete