search
HomeBackend DevelopmentPHP TutorialMemory Performance Boosts with Generators and Nikic/Iter

PHP iterator and generator: a powerful tool for efficient processing of large data sets

Arrays and iterations are the cornerstone of any application. As we get new tools, the way we use arrays should also improve.

For example, a generator is a new tool. At first we only have arrays, and then we gain the ability to define our own class array structure (called iterators). But since PHP 5.5, we can quickly create class iterator structures called generators.

Memory Performance Boosts with Generators and Nikic/Iter

The generator looks like functions, but we can use them as iterators. They provide us with a simple syntax for creating essentially interruptible, repeatable functions. They are amazing!

We will look at several areas where generators can be used and explore some issues that need to be paid attention to when using generators. Finally, we will learn a great library created by the talented Nikita Popov.

The sample code can be found at https://github.com/sitepoint-editors/generators-and-iter.

Key Points

  • Generators (available since PHP 5.5) are powerful tools for creating iterators that allow the creation of interruptible, repeatable functions, simplifying processing of large datasets and improving memory performance.
  • Nikita Popov creates Nikic/Iter library that introduces functions that can be used with iterators and generators, saving significantly memory by avoiding creating unnecessary intermediate arrays.
  • The generator and Nikic/Iter libraries are especially useful when working with large CSV files, which can handle large data sets without loading them all into memory at once.
  • While generators can significantly improve memory performance, they also present some of their own challenges, such as incompatible with array_filter and array_map, requiring other tools such as Nikic/Iter to handle such data.

Question

Suppose you have a lot of relational data and want to do some preloading. Maybe the data is comma-separated, you need to load each data type and group them together.

You can start with the following simple code:

function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);

You may then try to concatenate related elements by iterating or higher order functions:

function filterByColumn($array, $column, $value) {
    return array_filter(
        $array, function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // 对 $author 进行其他更改

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // 对 $category 进行其他更改

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category[0] == $post[1]) {
            $post["category"] = $category;
            break;
        }
    }

    // 对 $post 进行其他更改

    return $post;
}, $posts);

Looks good, right? So, what happens when we have a large number of CSV files to parse? Let's analyze the memory usage a little...

function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());

(The sample code contains generate.php, which you can use to create these CSV files...)

If you have large CSV files, this code should show how much memory it takes to link these arrays together. At least the same size as the file you have to read, because PHP has to keep everything in memory.

Generator comes to rescue!

One way to improve this problem is to use a generator. If you are not familiar with them, now is a good time to learn more.

The generator allows you to load a small amount of total data at once. You don't have to do much with the generator:

function readCSV($file) {
    $rows = [];

    $handle = fopen($file, "r");

    while (!feof($handle)) {
        $rows[] = fgetcsv($handle);
    }

    fclose($handle);

    return $rows;
}

$authors = array_filter(
    readCSV("authors.csv")
);

$categories = array_filter(
    readCSV("categories.csv")
);

$posts = array_filter(
    readCSV("posts.csv")
);

If you iterate through CSV data, you will notice that the amount of memory required will be reduced immediately:

function filterByColumn($array, $column, $value) {
    return array_filter(
        $array, function($item) use ($column, $value) {
            return $item[$column] == $value;
        }
    );
}

$authors = array_map(function($author) use ($posts) {
    $author["posts"] = filterByColumn(
        $posts, 1, $author[0]
    );

    // 对 $author 进行其他更改

    return $author;
}, $authors);

$categories = array_map(function($category) use ($posts) {
    $category["posts"] = filterByColumn(
        $posts, 2, $category[0]
    );

    // 对 $category 进行其他更改

    return $category;
}, $categories);

$posts = array_map(function($post) use ($authors, $categories) {
    foreach ($authors as $author) {
        if ($author[0] == $post[1]) {
            $post["author"] = $author;
            break;
        }
    }

    foreach ($categories as $category) {
        if ($category[0] == $post[1]) {
            $post["category"] = $category;
            break;
        }
    }

    // 对 $post 进行其他更改

    return $post;
}, $posts);

If you've seen megabytes of memory usage before, you'll now see kilobytes. This is a huge improvement, but it is not without its problems.

First of all, array_filter and array_map do not work with generators. You must find other tools to process this type of data. Here is a tool you can try!

function formatBytes($bytes, $precision = 2) {
    $kilobyte = 1024;
    $megabyte = 1024 * 1024;

    if ($bytes >= 0 && $bytes < $kilobyte) {
        return $bytes . " b";
    }

    if ($bytes >= $kilobyte && $bytes < $megabyte) {
        return round($bytes / $kilobyte, $precision) . " kb";
    }

    return round($bytes / $megabyte, $precision) . " mb";
}

print "memory:" . formatBytes(memory_get_peak_usage());

This library introduces some functions that can be used with iterators and generators. So how do you still get all this relevant data without saving any data in memory?

function readCSVGenerator($file) {
    $handle = fopen($file, "r");

    while (!feof($handle)) {
        yield fgetcsv($handle);
    }

    fclose($handle);
}

This can be simpler:

foreach (readCSVGenerator("posts.csv") as $post) {
    // 使用 $post 执行某些操作
}

print "memory:" . formatBytes(memory_get_peak_usage());

(Rereading each data source is inefficient every time. Consider saving smaller related data (such as authors and categories) in memory...)

Other interesting things

For Nikic's library, this is just the tip of the iceberg! Ever wanted to flatten an array (or iterator/generator)?

composer require nikic/iter

You can use functions such as slice and take to return slices of iterable variables:

// ... (后续代码与原文类似,但使用iter库函数进行优化,此处省略以节省篇幅) ...

When you use generators more, you may find that you don't always have to reuse them. Consider the following example:

// ... (使用iter库函数简化代码,此处省略以节省篇幅) ...

If you try to run the code, you will see an exception prompting: "Cannot traverse closed generator". Each iterator function in this library has a swappable corresponding function:

// ... (使用iter\flatten和iter\toArray函数的示例代码,此处省略以节省篇幅) ...

You can use this mapping function multiple times. You can even make your own generator rewindable:

// ... (使用iter\slice和iter\toArray函数的示例代码,此处省略以节省篇幅) ...

What you get from it is a reusable generator!

Conclusion

For every loop operation you need to consider, the generator may be an option. They are even useful for other things. Where language features are insufficient, Nikic's library provides a large number of higher-order functions.

Are you already using the generator? Do you want to see more examples on how to implement them in your own application for some performance improvements? Please tell us!

(The FAQs part is similar to the original text, and is omitted here to save space. The FAQs part can be optionally retained or reorganized as needed.)

The above is the detailed content of Memory Performance Boosts with Generators and Nikic/Iter. For more information, please follow other related articles on the PHP Chinese website!

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
How can you protect against Cross-Site Scripting (XSS) attacks related to sessions?How can you protect against Cross-Site Scripting (XSS) attacks related to sessions?Apr 23, 2025 am 12:16 AM

To protect the application from session-related XSS attacks, the following measures are required: 1. Set the HttpOnly and Secure flags to protect the session cookies. 2. Export codes for all user inputs. 3. Implement content security policy (CSP) to limit script sources. Through these policies, session-related XSS attacks can be effectively protected and user data can be ensured.

How can you optimize PHP session performance?How can you optimize PHP session performance?Apr 23, 2025 am 12:13 AM

Methods to optimize PHP session performance include: 1. Delay session start, 2. Use database to store sessions, 3. Compress session data, 4. Manage session life cycle, and 5. Implement session sharing. These strategies can significantly improve the efficiency of applications in high concurrency environments.

What is the session.gc_maxlifetime configuration setting?What is the session.gc_maxlifetime configuration setting?Apr 23, 2025 am 12:10 AM

Thesession.gc_maxlifetimesettinginPHPdeterminesthelifespanofsessiondata,setinseconds.1)It'sconfiguredinphp.iniorviaini_set().2)Abalanceisneededtoavoidperformanceissuesandunexpectedlogouts.3)PHP'sgarbagecollectionisprobabilistic,influencedbygc_probabi

How do you configure the session name in PHP?How do you configure the session name in PHP?Apr 23, 2025 am 12:08 AM

In PHP, you can use the session_name() function to configure the session name. The specific steps are as follows: 1. Use the session_name() function to set the session name, such as session_name("my_session"). 2. After setting the session name, call session_start() to start the session. Configuring session names can avoid session data conflicts between multiple applications and enhance security, but pay attention to the uniqueness, security, length and setting timing of session names.

How often should you regenerate session IDs?How often should you regenerate session IDs?Apr 23, 2025 am 12:03 AM

The session ID should be regenerated regularly at login, before sensitive operations, and every 30 minutes. 1. Regenerate the session ID when logging in to prevent session fixed attacks. 2. Regenerate before sensitive operations to improve safety. 3. Regular regeneration reduces long-term utilization risks, but the user experience needs to be weighed.

How do you set the session cookie parameters in PHP?How do you set the session cookie parameters in PHP?Apr 22, 2025 pm 05:33 PM

Setting session cookie parameters in PHP can be achieved through the session_set_cookie_params() function. 1) Use this function to set parameters, such as expiration time, path, domain name, security flag, etc.; 2) Call session_start() to make the parameters take effect; 3) Dynamically adjust parameters according to needs, such as user login status; 4) Pay attention to setting secure and httponly flags to improve security.

What is the main purpose of using sessions in PHP?What is the main purpose of using sessions in PHP?Apr 22, 2025 pm 05:25 PM

The main purpose of using sessions in PHP is to maintain the status of the user between different pages. 1) The session is started through the session_start() function, creating a unique session ID and storing it in the user cookie. 2) Session data is saved on the server, allowing data to be passed between different requests, such as login status and shopping cart content.

How can you share sessions across subdomains?How can you share sessions across subdomains?Apr 22, 2025 pm 05:21 PM

How to share a session between subdomains? Implemented by setting session cookies for common domain names. 1. Set the domain of the session cookie to .example.com on the server side. 2. Choose the appropriate session storage method, such as memory, database or distributed cache. 3. Pass the session ID through cookies, and the server retrieves and updates the session data based on the ID.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Atom editor mac version download

Atom editor mac version download

The most popular open source editor

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!