Memory Performance Boosts with Generators and Nikic/Iter
PHP iterators and generators: powerful tools for processing large data sets efficiently
Arrays and iteration are a staple of any application, and as we get new tools, the way we use arrays should improve too.
Generators, for instance, are a new tool. At first we only had arrays. Then we gained the ability to define our own array-like structures (called iterators). Since PHP 5.5, we can quickly create iterator-like structures called generators.
Generators look like functions, but we can use them as iterators. They give us a simple syntax for writing what are essentially interruptible, repeatable functions. They're wonderful!
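To make "interruptible, repeatable" concrete, here is a minimal sketch of a generator function (countTo() is a made-up example, not part of the article's sample code). It shows that calling the function only creates a Generator object, and that each yield pauses execution until the caller asks for the next value:

```php
<?php
// A generator function: calling it does not run its body yet. It returns a
// Generator object (an Iterator) that produces values on demand.
function countTo(int $max): Generator
{
    for ($i = 1; $i <= $max; $i++) {
        // Execution pauses here and hands $i to whoever is iterating.
        yield $i;
    }
}

$gen = countTo(3);            // no code inside countTo() has run yet
echo $gen->current(), "\n";   // runs up to the first yield: 1
$gen->next();                 // resumes after that yield, pauses at the next one
echo $gen->current(), "\n";   // 2

// A fresh generator can also be used directly in foreach.
foreach (countTo(3) as $n) {
    echo $n, "\n";            // 1, then 2, then 3
}
```

Because a Generator implements Iterator, the same object works anywhere PHP accepts a Traversable, such as foreach or iterator_to_array().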
We'll look at a few areas where generators can be used, and at some problems to watch out for when using them. Finally, we'll look at a brilliant library created by the talented Nikita Popov.
The sample code can be found at https://github.com/sitepoint-editors/generators-and-iter.
Key Points
Generators do not work with array_filter and array_map, so other tools such as Nikic/Iter are needed to handle that kind of data.
The Problem
Suppose you have a lot of relational data and want to do some eager loading. Perhaps the data is comma-separated, and you need to load each data type and knit the records together.
You can start with the following simple code:
<code class="language-php">function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

// array_filter() removes falsy entries, such as a trailing false from fgetcsv() at end of file
$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);</code>
You would then probably try to connect related elements through iteration or higher-order functions:
<code class="language-php">function filterByColumn($array, $column, $value) {
    return array_filter(
        $array, function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    // column 1 of a post holds the author id
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // make other changes to $author

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    // column 2 of a post holds the category id
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // make other changes to $category

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category[0] == $post[2]) {
            $post["category"] = $category;
            break;
        }
    }

    // make other changes to $post

    return $post;
}, $posts);</code>
Seems fine, right? But what happens when we have huge CSV files to parse? Let's profile the memory usage a little:
<code class="language-php">function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());</code>
(The example code includes generate.php, which you can use to create these CSV files.)
If you have large CSV files, this code should show just how much memory it takes to link these arrays together. It is at least the size of the files you read, because PHP has to hold all of the data in memory at once.
Generators to the rescue!
One way to improve on this is to use generators. If you aren't familiar with them, now is a good time to learn more.
Generators let you load only a small part of the total data at a time, and there isn't much you need to change to use them:
<code class="language-php">function readCSVGenerator($file) {
    $handle = fopen($file, "r");

    // stop when fgetcsv() returns false (end of file) instead of yielding it
    while (($row = fgetcsv($handle)) !== false) {
        yield $row;
    }

    fclose($handle);
}</code>
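Unlike the array version, a generator cannot rely on array_filter() to clean up afterwards, and a consumer may stop iterating before the file is fully read. A slightly more defensive variant is sketched below (readCSVRows() is a hypothetical name). It stops on fgetcsv()'s false return value and wraps the loop in try/finally, whose block PHP runs even when a suspended generator is destroyed early:

```php
<?php
// Sketch: a defensive CSV generator. readCSVRows() is a hypothetical name.
function readCSVRows(string $file): Generator
{
    $handle = fopen($file, "r");
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }

    try {
        // fgetcsv() returns false at end of file. The explicit arguments
        // (no length limit, comma, double quote, backslash escape) match the defaults.
        while (($row = fgetcsv($handle, 0, ",", "\"", "\\")) !== false) {
            yield $row;
        }
    } finally {
        // Runs after the last row, or when a suspended generator is destroyed.
        fclose($handle);
    }
}

// Usage with a small temporary file standing in for posts.csv.
$file = tempnam(sys_get_temp_dir(), "csv");
file_put_contents($file, "1,10,100\n2,20,100\n3,10,200\n");

foreach (readCSVRows($file) as $row) {
    echo implode(" | ", $row), "\n";
    break; // stop early; the finally block runs once the generator is destroyed
}

$rows = iterator_to_array(readCSVRows($file), false);
unlink($file);
```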
If you loop through the CSV data, you'll notice an immediate drop in the amount of memory needed at any one time:
<code class="language-php">foreach (readCSVGenerator("posts.csv") as $post) {
    // do something with $post
}

print "memory:" . formatBytes(memory_get_peak_usage());</code>
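The memory difference can be checked without any CSV files. This sketch builds the same made-up rows twice, once into an array and once through a generator, and compares the growth reported by memory_get_usage(). Exact numbers depend on the PHP version, so treat them as illustrative:

```php
<?php
// Sketch: memory held by an eagerly built array versus a generator that
// yields the same (made-up) rows one at a time.
function rowsAsArray(int $count): array
{
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = [$i, "author-$i", "category-" . ($i % 10)];
    }
    return $rows;
}

function rowsAsGenerator(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield [$i, "author-$i", "category-" . ($i % 10)];
    }
}

$count = 50000;

$before = memory_get_usage();
$rows = rowsAsArray($count);
$arrayBytes = memory_get_usage() - $before; // all rows are alive at once
unset($rows);

$before = memory_get_usage();
$generatorBytes = 0;
foreach (rowsAsGenerator($count) as $row) {
    // Only the current row and the generator's own state are alive here.
    $generatorBytes = max($generatorBytes, memory_get_usage() - $before);
}

printf("array: %d bytes, generator: at most %d bytes\n", $arrayBytes, $generatorBytes);
```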
If you were seeing megabytes of memory usage before, you'll now see kilobytes. That's a huge improvement, but it comes with problems of its own.
For a start, array_filter and array_map don't work with generators, so you'll have to find other tools to process this kind of data. Here's one you can try!
<code class="language-bash">composer require nikic/iter</code>
This library introduces functions that work with iterators and generators. So how can you still get all this relational data without keeping any of it in memory? Everything builds on the generator-based reader:
<code class="language-php">function readCSVGenerator($file) {
    $handle = fopen($file, "r");

    // stop when fgetcsv() returns false (end of file) instead of yielding it
    while (($row = fgetcsv($handle)) !== false) {
        yield $row;
    }

    fclose($handle);
}</code>
Each data source is then consumed one row at a time, so only the current row needs to be in memory:
<code class="language-php">foreach (readCSVGenerator("posts.csv") as $post) {
    // do something with $post
}

print "memory:" . formatBytes(memory_get_peak_usage());</code>
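One way to link the records lazily, sketched here without the library, is to compose small filter and map steps. lazyFilter() and lazyMap() are hypothetical stand-ins that follow the callable-then-iterable argument order of iter\filter() and iter\map(). postRows() replaces readCSVGenerator("posts.csv") so the sketch runs on its own:

```php
<?php
// Hypothetical stand-ins for iter\filter() and iter\map(): both return
// generators, so no work happens until the result is iterated.
function lazyFilter(callable $predicate, iterable $items): Generator
{
    foreach ($items as $key => $item) {
        if ($predicate($item)) {
            yield $key => $item;
        }
    }
}

function lazyMap(callable $fn, iterable $items): Generator
{
    foreach ($items as $key => $item) {
        yield $key => $fn($item);
    }
}

// Hypothetical post source. Columns: [id, author_id, category_id, title].
function postRows(): Generator
{
    yield [1, 10, 100, "First post"];
    yield [2, 20, 100, "Second post"];
    yield [3, 10, 200, "Third post"];
}

// Lazy sequence of this author's post titles; each call re-reads the source.
function getPostsForAuthor(array $author): Generator
{
    $matching = lazyFilter(function (array $post) use ($author) {
        return $post[1] == $author[0];
    }, postRows());

    return lazyMap(function (array $post) {
        return $post[3];
    }, $matching);
}

$author = [10, "Alice"];
$author["posts"] = getPostsForAuthor($author); // nothing has been read yet

foreach ($author["posts"] as $title) {
    echo $title, "\n";
}
```

With nikic/iter installed, the two helpers could be replaced by iter\filter() and iter\map(). Note that every call to getPostsForAuthor() starts reading the post source again.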
(Re-reading each data source every time is inefficient. Consider keeping smaller related data, such as authors and categories, in memory.)
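A minimal sketch of that suggestion, assuming the author and category tables are small enough to fit in memory: index them by id once, then stream the large posts source and attach related records with array lookups. indexById() and attachRelations() are hypothetical helpers, and the literal rows stand in for the CSV files:

```php
<?php
// Index a small table by its id column (column 0) for O(1) lookups.
function indexById(iterable $rows): array
{
    $index = [];
    foreach ($rows as $row) {
        $index[$row[0]] = $row;
    }
    return $index;
}

// Stream posts and attach author (column 1) and category (column 2) records.
function attachRelations(iterable $posts, array $authorsById, array $categoriesById): Generator
{
    foreach ($posts as $post) {
        // Array lookups instead of re-reading authors.csv / categories.csv per post.
        $post["author"] = $authorsById[$post[1]] ?? null;
        $post["category"] = $categoriesById[$post[2]] ?? null;
        yield $post;
    }
}

$authorsById = indexById([[10, "Alice"], [20, "Bob"]]);
$categoriesById = indexById([[100, "PHP"], [200, "Tools"]]);
$posts = [[1, 10, 100], [2, 20, 200], [3, 30, 100]]; // stand-in for the post stream

foreach (attachRelations($posts, $authorsById, $categoriesById) as $post) {
    $name = $post["author"] === null ? "unknown" : $post["author"][1];
    echo $post[0], ": ", $name, " / ", $post["category"][1], "\n";
}
```

In a real script, $posts would be a generator such as readCSVGenerator("posts.csv"), so only the two small indexes and the current post stay in memory.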
Other interesting things
That's just the tip of the iceberg when it comes to Nikic's library! Ever wanted to flatten an array (or an iterator or generator)? First make sure the library is installed:
<code class="language-bash">composer require nikic/iter</code>
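To show what flattening lazily involves, here is a dependency-free sketch (flattenSketch() is a hypothetical name, roughly the behavior of iter\flatten() but not the library's implementation) that uses yield from to walk nested arrays and iterators:

```php
<?php
// Recursively yield the leaves of nested arrays and iterators.
function flattenSketch(iterable $items): Generator
{
    foreach ($items as $item) {
        if (is_iterable($item)) {
            // Delegate to the nested level; yield from keeps this lazy.
            yield from flattenSketch($item);
        } else {
            yield $item;
        }
    }
}

$flat = iterator_to_array(flattenSketch([1, 2, [3, 4, 5], 6, 7]), false);
echo implode(", ", $flat), "\n"; // 1, 2, 3, 4, 5, 6, 7
```

Passing false to iterator_to_array() discards keys; without it, the keys that yield from passes through from each nested level would overwrite one another.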
You can use functions such as slice and take to return slices of iterables:
<code class="language-php">// ... (example omitted here for brevity) ...</code>
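The library's slice and take work on any iterable. As a built-in alternative that needs no Composer package, SPL's LimitIterator covers the same ground for any Iterator, generators included. It skips an offset and then stops after a limit; the numbers() generator below is made up for the example:

```php
<?php
// A made-up generator to slice.
function numbers(): Generator
{
    foreach ([-3, -2, -1, 0, 1, 2, 3] as $n) {
        yield $n;
    }
}

// Like a slice starting at index 2 with length 4.
$slice = iterator_to_array(new LimitIterator(numbers(), 2, 4), false);
echo implode(", ", $slice), "\n"; // -1, 0, 1, 2

// Like taking the first 3 items: offset 0, limit 3.
$first = iterator_to_array(new LimitIterator(numbers(), 0, 3), false);
echo implode(", ", $first), "\n"; // -3, -2, -1
```

Because LimitIterator advances the generator with next(), the skipped items are still produced and discarded; only the requested window is kept.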
As you work more with generators, you may find that you can't always reuse them. Consider the following example:
<code class="language-php">// ... (example omitted here for brevity) ...</code>
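The problem is not specific to the library: any generator object can be traversed only once. A minimal sketch with a plain generator (doubled() is a hypothetical stand-in for a mapped sequence):

```php
<?php
// A plain generator standing in for a mapped sequence.
function doubled(array $items): Generator
{
    foreach ($items as $item) {
        yield $item * 2;
    }
}

$mapper = doubled([1, 2, 3]);
echo implode(", ", iterator_to_array($mapper)), "\n"; // 2, 4, 6

$message = null;
try {
    // The generator has already finished, so traversing it again throws.
    iterator_to_array($mapper);
} catch (Exception $e) {
    $message = $e->getMessage();
    echo get_class($e), ": ", $message, "\n";
}

// One workaround: call the generator function again to get a fresh generator.
echo implode(", ", iterator_to_array(doubled([1, 2, 3]))), "\n"; // 2, 4, 6
```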
If you try to run this code, you'll see an exception saying "Cannot traverse an already closed generator". Each iterator function in this library has a rewindable counterpart:
<code class="language-php">// ... (example omitted here for brevity) ...</code>
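The idea behind such a counterpart can be sketched in plain PHP: keep the inputs instead of a generator, and start a fresh generator each time iteration begins. RewindableMap below is a hypothetical class illustrating the idea, not the library's code:

```php
<?php
// Hypothetical rewindable map: stores the callable and the source, and
// builds a new generator on every traversal.
final class RewindableMap implements IteratorAggregate
{
    private $fn;
    private $items;

    public function __construct(callable $fn, iterable $items)
    {
        $this->fn = $fn;
        $this->items = $items;
    }

    // foreach and iterator_to_array() call getIterator() each time.
    public function getIterator(): Generator
    {
        foreach ($this->items as $key => $item) {
            yield $key => ($this->fn)($item);
        }
    }
}

$mapper = new RewindableMap(function ($item) { return $item * 2; }, [1, 2, 3]);
echo implode(", ", iterator_to_array($mapper)), "\n"; // 2, 4, 6
echo implode(", ", iterator_to_array($mapper)), "\n"; // 2, 4, 6 again
```

This only helps if the source itself can be traversed again; passing a one-shot generator as $items would fail on the second pass for the same reason as before.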
You can use this mapping function multiple times. You can even make your own generator rewindable:
<code class="language-php">// ... (example omitted here for brevity) ...</code>
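Making your own generator functions rewindable follows the same pattern. The hypothetical makeRewindableSketch() below wraps a generator function and returns a callable; each object it produces calls the generator function again whenever it is traversed. The Fibonacci-style generator is just an example:

```php
<?php
// Hypothetical helper in the spirit of a makeRewindable function.
function makeRewindableSketch(callable $generatorFn): callable
{
    return function (...$args) use ($generatorFn) {
        return new class($generatorFn, $args) implements IteratorAggregate {
            private $fn;
            private $args;

            public function __construct(callable $fn, array $args)
            {
                $this->fn = $fn;
                $this->args = $args;
            }

            public function getIterator(): Generator
            {
                // A brand-new generator for every traversal.
                return ($this->fn)(...$this->args);
            }
        };
    };
}

// Example generator: Fibonacci-style numbers up to a maximum.
$fibonacci = makeRewindableSketch(function (int $max) {
    for ($a = 1, $b = 1; $a <= $max; [$a, $b] = [$b, $a + $b]) {
        yield $a;
    }
});

$numbers = $fibonacci(13);
echo implode(", ", iterator_to_array($numbers, false)), "\n"; // 1, 1, 2, 3, 5, 8, 13
echo implode(", ", iterator_to_array($numbers, false)), "\n"; // same again
```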
What you get from it is a reusable generator!
Conclusion
For every looping task you need to think about, generators may be an option. They can be useful for other things, too. And where the language's own features fall short, Nikic's library provides an abundance of higher-order functions.
Are you using generators yet? Would you like to see more examples of how to use them in your own applications for performance gains? Let us know!