
Memory Performance Boosts with Generators and Nikic/Iter


PHP iterators and generators: powerful tools for efficiently processing large datasets

Arrays and iteration are the cornerstone of any application. As we get new tools, the way we use arrays should improve too.

Generators are one such tool. At first we only had arrays; later, we gained the ability to define our own array-like structures (called iterators). Since PHP 5.5, though, we can quickly create iterator-like structures called generators.


Generators look like functions, but we can use them as iterators. They give us a simple syntax for what are essentially interruptible, resumable functions. They are amazing!
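
For example, here is a tiny generator (an illustrative snippet, not part of the article's sample code): the function pauses at each yield and resumes when the loop asks for the next value.

<code class="language-php">// A minimal illustrative generator: values are produced one at a time,
// only when the consuming loop asks for them.
function countTo($limit) {
    for ($i = 1; $i <= $limit; $i++) {
        yield $i; // execution pauses here until the next value is requested
    }
}

foreach (countTo(3) as $number) {
    print $number . "\n"; // 1, 2, 3
}</code>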

We will look at a few areas where generators can be used and explore some issues to watch out for when using them. Finally, we will dig into a great library created by the talented Nikita Popov.

The sample code can be found at https://github.com/sitepoint-editors/generators-and-iter.

Key Points

  • Generators (available since PHP 5.5) are powerful tools for creating iterators: they let you write interruptible, resumable functions, which simplifies the processing of large datasets and improves memory performance.
  • Nikita Popov's nikic/iter library introduces functions that can be used with iterators and generators, saving significant memory by avoiding the creation of unnecessary intermediate arrays.
  • Generators and the nikic/iter library are especially useful when working with large CSV files, since they let you process large datasets without loading everything into memory at once.
  • While generators can significantly improve memory performance, they come with challenges of their own, such as being incompatible with array_filter and array_map, which requires other tools such as nikic/iter to process that kind of data.

The Problem

Suppose you have a lot of relational data and want to do some eager loading. Maybe the data is stored in comma-separated files, and you need to load each data type and link the related records together.

You can start with the following simple code:

<code class="language-php">function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

// array_filter with no callback drops the "false" rows that
// fgetcsv returns for blank lines and at the end of the file
$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);</code>

You might then try to connect the related records using loops or higher-order functions:

<code class="language-php">function filterByColumn($array, $column, $value) {
    return array_filter(
        $array, function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // make any other changes to $author

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // make any other changes to $category

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category[0] == $post[2]) {
            $post["category"] = $category;
            break;
        }
    }

    // make any other changes to $post

    return $post;
}, $posts);</code>

Looks good, right? So what happens when the CSV files are large? Let's take a quick look at the memory usage...

<code class="language-php">function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());</code>

(The sample code contains generate.php, which you can use to create these CSV files...)
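
If you would rather write your own test data, a rough sketch of such a script (hypothetical, not the repository's actual generate.php) only needs fputcsv:

<code class="language-php">// Hypothetical sketch: write a few thousand random rows to posts.csv.
// Columns match the article's layout: post id, author id, category id, title.
$handle = fopen("posts.csv", "w");

for ($i = 1; $i <= 10000; $i++) {
    fputcsv($handle, [$i, rand(1, 100), rand(1, 20), "Post title " . $i]);
}

fclose($handle);</code>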

If you have large CSV files, this code should show you how much memory it takes to link those arrays together: at least as much as the size of the files you read, because PHP has to keep everything in memory.

Generators to the Rescue!

One way to improve on this is to use generators. If you are not familiar with them, now is a good time to learn more about them.

Generators allow you to load only a small portion of the total data at a time. There is not much you have to do to use one:

<code class="language-php">function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);</code>

If you loop through the CSV data, you will notice an immediate drop in the amount of memory required:

<code class="language-php">function filterByColumn($array, $column, $value) {
    return array_filter(
        $array, function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // 对 $author 进行其他更改

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // 对 $category 进行其他更改

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category[0] == $post[1]) {
            $post["category"] = $category;
            break;
        }
    }

    // 对 $post 进行其他更改

    return $post;
}, $posts);</code>

Where you saw megabytes of memory usage before, you should now see kilobytes. That is a huge improvement, but it is not without its own problems.

For starters, array_filter and array_map do not work with generators. You have to find other tools to work with this kind of data. Here is one you can try:

<code class="language-php">function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());</code>

This library introduces functions that can be used with iterators and generators; in particular, iter\filter and iter\map accept any iterable, including generators, and return lazy iterators rather than building new arrays. So how do you still get at all that related data without keeping it all in memory? The idea is to keep reading each file through readCSVGenerator and to replace the array functions with their iter counterparts, as sketched below.

(Re-reading each data source on every pass is not efficient. Consider keeping the smaller related data, such as authors and categories, in memory...)
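
One way to follow that advice, again as an assumed sketch rather than the article's original code, is to load the small files into plain arrays once and stream only the large posts file:

<code class="language-php">// Sketch: keep the small lookup data in memory, stream only the big file.
require "vendor/autoload.php";

$authors    = iter\toArray(readCSVGenerator("authors.csv"));
$categories = iter\toArray(readCSVGenerator("categories.csv"));

foreach (readCSVGenerator("posts.csv") as $post) {
    if ($post === false) {
        continue;
    }

    foreach ($authors as $author) {
        if ($author !== false && $author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category !== false && $category[0] == $post[2]) {
            $post["category"] = $category;
            break;
        }
    }

    // do something with the enriched $post
}</code>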

Other Interesting Things

That is just scratching the surface of Nikic's library! Ever wanted to flatten an array (or an iterator/generator)?

You can use functions such as slice and take to return slices of an iterable:

<code class="language-php">// ... (后续代码与原文类似,但使用iter库函数进行优化,此处省略以节省篇幅) ...</code>

As you use generators more, you may find that you can't always reuse them. Consider the following example:

<code class="language-php">// ... (使用iter库函数简化代码,此处省略以节省篇幅) ...</code>

If you try to run that code, you will see an exception with the message "Cannot traverse an already closed generator". Each of the iterator functions in this library has a rewindable counterpart:

<code class="language-php">// ... (使用iter\flatten和iter\toArray函数的示例代码,此处省略以节省篇幅) ...</code>

You can use that mapping function multiple times. You can even make your own generators rewindable:

<code class="language-php">// ... (使用iter\slice和iter\toArray函数的示例代码,此处省略以节省篇幅) ...</code>

What you get back is a reusable generator!

Conclusion

Generators are an option to consider for every loop you write, and they are useful for other things as well. Where the language features fall short, Nikic's library provides a wealth of higher-order functions.

Are you already using generators? Would you like to see more examples of how to use them in your own applications for performance improvements? Let us know!


