Efficient API consumption for huge data in JavaScript

Susan Sarandon

Oct 20, 2024 08:42 PM

When working with APIs that handle large datasets, it's crucial to manage data flow efficiently and address challenges such as pagination, rate limits, and memory usage. In this article, we'll walk through how to consume APIs using JavaScript's native fetch function. We'll cover the following topics:

Handling huge amounts of data: retrieving large datasets incrementally to avoid overwhelming your system.
Pagination: most APIs, including Storyblok Content Delivery API, return data in pages. We'll explore how to manage pagination for efficient data retrieval.
Rate Limits: APIs often impose rate limits to prevent abuse. We'll see how to detect and handle these limits.
Retry-After Mechanism: if the API responds with a 429 status code (Too Many Requests), we'll honor the Retry-After header, which indicates how long to wait before retrying, so that data fetching continues smoothly.
Concurrent Requests: fetching multiple pages in parallel can speed up the process. We’ll use JavaScript’s Promise.all() to send concurrent requests and boost performance.
Avoiding Memory Leaks: handling large datasets requires careful memory management. We'll process data in chunks and keep memory usage low by using async generators.

We will explore these techniques using the Storyblok Content Delivery API and explain how to handle all these factors in JavaScript using fetch. Let’s dive into the code.

Things to keep in mind when using the Storyblok Content Delivery API

Before diving into the code, here are a few key features of the Storyblok API to consider:

CV parameter: the cv (Content Version) parameter retrieves cached content. The cv value is returned in the first response and should be passed in subsequent requests to ensure the same cached version of the content is fetched.
Pagination with page and per_page: use the page and per_page parameters to control the number of items returned in each request and to iterate through the result pages.
Total Header: The first response's total header indicates the total number of items available. This is essential for calculating how many data pages need to be fetched.
Handling 429 (Rate Limit): Storyblok enforces rate limits; when you hit them, the API returns a 429 status. Use the Retry-After header (or a default value) to know how long to wait before retrying the request.
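
The parameters above can be combined into one request URL per page. The following is a minimal sketch, assuming a hypothetical helper named buildPageUrl (it is not part of any Storyblok library) and a base URL that already carries the token and version parameters:

```javascript
// Hypothetical helper: builds the URL for one page of stories.
// `baseUrl` is assumed to already contain the token and version parameters.
function buildPageUrl(baseUrl, page, perPage, cv) {
  const url = new URL(baseUrl);
  url.searchParams.set("page", String(page));
  url.searchParams.set("per_page", String(perPage));
  // cv is unknown before the first response, so add it only when available
  if (cv !== undefined && cv !== null) {
    url.searchParams.set("cv", String(cv));
  }
  return url.toString();
}
```

Using the built-in URL class instead of string concatenation takes care of encoding and avoids duplicated parameters when the same key is set twice.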

JavaScript example code using fetch() for handling large datasets

Here’s how I implemented these concepts using the native fetch function in JavaScript.
Consider that:

This snippet creates a new file named stories.json as an example. If the file already exists, it will be overwritten. So, if you have a file with that name already in the working directory, change the name in the code snippet.
Although the requests in a batch are executed in parallel, Promise.all() returns the results in the order of the requests, so the generator delivers the stories in page order even if the response for the third page arrives before the response for the second page. A page that still fails after all retries is skipped, so gaps in the output are possible.
I tested the snippet with Bun :)

import { writeFile, appendFile } from "fs/promises";

// Read access token from Environment
const STORYBLOK_ACCESS_TOKEN = process.env.STORYBLOK_ACCESS_TOKEN;
// Read the content version (for example draft or published) from Environment
const STORYBLOK_VERSION = process.env.STORYBLOK_VERSION;

/**
 * Fetch a single page of data from the API,
 * with retry logic for rate limits (HTTP 429).
 */
async function fetchPage(url, page, perPage, cv) {
  let retryCount = 0;
  // Max retry attempts
  const maxRetries = 5;
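  while (retryCount < maxRetries) {
    try {
      const response = await fetch(
        `${url}&page=${page}&per_page=${perPage}&cv=${cv}`,
      );
      // Handle 429 Too Many Requests (rate limit)
      if (response.status === 429) {
        // Retry-After says how many seconds to wait; default to 1 if absent
        const retryAfter = Number(response.headers.get("Retry-After")) || 1;
        console.log(
          `Rate limited on page ${page}. Retrying after ${retryAfter} seconds...`,
        );
        retryCount++;
        // Wait a little longer on each attempt (setTimeout takes milliseconds)
        await new Promise((resolve) =>
          setTimeout(resolve, retryAfter * 1000 * retryCount),
        );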
        continue;
      }

      if (!response.ok) {
        throw new Error(
          `Failed to fetch page ${page}: HTTP ${response.status}`,
        );
      }
      const data = await response.json();
      // Return the stories data of the current page
      return data.stories || [];
    } catch (error) {
      console.error(`Error fetching page ${page}: ${error.message}`);
      return []; // Return an empty array if the request fails to not break the flow
    }
  }
  console.error(`Failed to fetch page ${page} after ${maxRetries} attempts`);
  return []; // If we hit the max retry limit, return an empty array
}

/**
 * Fetch all data in parallel, processing pages in batches,
 * as an async generator (the reason why we use `async function*`)
 */
async function* fetchAllDataInParallel(
  url,
  perPage = 25,
  numOfParallelRequests = 5,
) {

  let currentPage = 1;
  let totalPages = null;

  // Fetch the first page to get:
  // - the total entries (the `total` HTTP header)
  // - the CV for caching (the `cv` attribute in the JSON response payload)
  const firstResponse = await fetch(
    `${url}&page=${currentPage}&per_page=${perPage}`,
  );
  if (!firstResponse.ok) {
    console.log(`${url}&page=${currentPage}&per_page=${perPage}`);
    console.log(firstResponse);
    throw new Error(`Failed to fetch data: HTTP ${firstResponse.status}`);
  }
  console.timeLog("API", "After first response");

  const firstData = await firstResponse.json();
  const total = parseInt(firstResponse.headers.get("total"), 10) || 0;
  totalPages = Math.ceil(total / perPage);

  // Yield the stories from the first page
  for (const story of firstData.stories) {
    yield story;
  }

  const cv = firstData.cv;

  console.log(`Total pages: ${totalPages}`);
  console.log(`CV parameter for caching: ${cv}`);

  currentPage++; // Start from the second page now

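  while (currentPage <= totalPages) {
    // Collect the page numbers for the current batch
    const pagesToFetch = [];
    for (
      let i = 0;
      i < numOfParallelRequests && currentPage <= totalPages;
      i++
    ) {
      pagesToFetch.push(currentPage);
      currentPage++;
    }

    // Fetch the pages of the batch in parallel
    const batchRequests = pagesToFetch.map((page) =>
      fetchPage(url, page, perPage, cv),
    );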

    // Wait for all requests in the batch to complete
    const batchResults = await Promise.all(batchRequests);
    console.timeLog("API", `Got ${batchResults.length} response`);
    // Yield the stories from each batch of requests
    for (let result of batchResults) {
      for (const story of result) {
        yield story;
      }
    }
    console.log(`Fetched pages: ${pagesToFetch.join(", ")}`);
  }
}

console.time("API");
const apiUrl = `https://api.storyblok.com/v2/cdn/stories?token=${STORYBLOK_ACCESS_TOKEN}&version=${STORYBLOK_VERSION}`;
//const apiUrl = `http://localhost:3000?token=${STORYBLOK_ACCESS_TOKEN}&version=${STORYBLOK_VERSION}`;

const stories = fetchAllDataInParallel(apiUrl, 25, 7);

// Create an empty file (or overwrite if it exists) before appending
await writeFile('stories.json', '[', 'utf8'); // Start the JSON array
let i = 0;
for await (const story of stories) {
  i++;
  console.log(story.name);
  // If it's not the first story, add a comma to separate JSON objects
  if (i > 1) {
    await appendFile('stories.json', ',', 'utf8');
  }
  // Append the current story to the file
  await appendFile('stories.json', JSON.stringify(story, null, 2), 'utf8');
}
// Close the JSON array in the file
await appendFile('stories.json', ']', 'utf8'); // End the JSON array
console.log(`Total Stories: ${i}`);

Key Steps Explained

Here’s a breakdown of the crucial steps in the code that ensure efficient and reliable API consumption using the Storyblok Content Delivery API:

1) Fetching pages with a retry mechanism (fetchPage)

This function handles fetching a single page of data from the API. It includes logic for retrying when the API responds with a 429 (Too Many Requests) status, which signals that the rate limit has been exceeded.
The retryAfter value specifies how many seconds to wait before retrying. I use setTimeout to pause before making the subsequent request, multiplying the delay by the attempt number so that retries slow down progressively. Retries are limited to a maximum of 5 attempts.
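
The retry logic can also be isolated from the request itself. The sketch below is one possible shape, not the article's fetchPage: the name fetchWithRetry and the injected doFetch and sleep parameters are assumptions made so that the logic can be tried without a network, and it reads Retry-After as a plain number of seconds.

```javascript
// Hypothetical helper: retries a request while the server answers 429.
// `doFetch` performs the request; `sleep` waits for a number of milliseconds.
async function fetchWithRetry(
  doFetch,
  { maxRetries = 5, sleep = defaultSleep } = {},
) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const response = await doFetch();
    if (response.status !== 429) {
      return response;
    }
    // Retry-After is read as a number of seconds; fall back to 1 second
    const retryAfter = Number(response.headers.get("Retry-After")) || 1;
    // Wait longer on each attempt: 1x, 2x, 3x ... the suggested delay
    await sleep(retryAfter * 1000 * attempt);
  }
  throw new Error(`Still rate limited after ${maxRetries} attempts`);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

A call could look like fetchWithRetry(() => fetch(someUrl)). Unlike fetchPage, this sketch throws when the retries are exhausted, which leaves the decision to skip or abort to the caller.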

2) Initial page request and the CV parameter

The first API request is crucial because it retrieves the total header (which indicates the total number of stories) and the cv parameter (used for caching).
You can use the total header to calculate the total number of pages required, and the cv parameter ensures the cached content is used.
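
As a small illustration of this step, the following sketch derives the paging information from the first response. The helper name readPagingInfo is hypothetical; it assumes the headers object exposes a get(name) method, as the headers of a fetch response do, and that the parsed JSON body carries the cv attribute.

```javascript
// Hypothetical helper: derives paging info from the first response.
// `headers` is anything with a get(name) method (such as response.headers);
// `body` is the parsed JSON payload of the same response.
function readPagingInfo(headers, body, perPage) {
  // A missing or malformed header yields NaN, which falls back to 0
  const total = parseInt(headers.get("total"), 10) || 0;
  return {
    total,
    totalPages: Math.ceil(total / perPage),
    cv: body.cv,
  };
}
```

For example, a total of 103 stories with 25 stories per page gives 5 pages: four full pages and one page with the remaining 3 stories.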

3) Handling pagination

Pagination is managed using the page and per_page query string parameters. The code requests 25 stories per page (you can adjust this), and the total header helps calculate how many pages need to be fetched.
The code fetches stories in batches of up to 7 (you can adjust this) parallel requests at a time to improve performance without overwhelming the API.
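
To make the batching concrete, here is a minimal sketch that splits a range of page numbers into batches. The function name pageBatches is hypothetical and the function only computes page numbers; it performs no requests.

```javascript
// Hypothetical helper: splits pages firstPage..lastPage into batches
// of at most `batchSize` page numbers each.
function pageBatches(firstPage, lastPage, batchSize) {
  const batches = [];
  for (let start = firstPage; start <= lastPage; start += batchSize) {
    const batch = [];
    for (let page = start; page < start + batchSize && page <= lastPage; page++) {
      batch.push(page);
    }
    batches.push(batch);
  }
  return batches;
}

console.log(JSON.stringify(pageBatches(2, 10, 4))); // [[2,3,4,5],[6,7,8,9],[10]]
```

The range starts at page 2 in this example because the first page has already been fetched to read the total header and the cv value.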

4) Concurrent requests with Promise.all()

To speed up the process, multiple pages are fetched in parallel using JavaScript's Promise.all(). This method sends several requests simultaneously and waits for all of them to complete.
After each batch of parallel requests is completed, the results are processed to yield the stories. This avoids loading all the data into memory at once, reducing memory consumption.
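
One detail of Promise.all() worth seeing in isolation is that the results keep the order of the input array, regardless of which promise settles first. The sketch below uses a hypothetical fakePage function with artificial delays in place of real requests.

```javascript
// Simulated page request: resolves after `delayMs` with a label for the page.
function fakePage(page, delayMs) {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`page ${page}`), delayMs),
  );
}

// Page 3 finishes first, but the results follow the order of the input array.
async function demoOrder() {
  return Promise.all([fakePage(2, 30), fakePage(3, 10), fakePage(4, 20)]);
}

demoOrder().then((results) => console.log(results.join(", ")));
// page 2, page 3, page 4
```

Promise.all() rejects as soon as any of its promises rejects. The fetchPage function in this article avoids that by catching errors and returning an empty array, so one failing page does not discard the rest of the batch.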

5) Memory management with asynchronous iteration (for await...of)

Instead of collecting all data into an array, we use a JavaScript async generator (async function*, consumed with for await...of) to process each story as it is fetched. This keeps memory usage limited to roughly one batch of pages when handling large datasets.
By yielding the stories one by one, the code never needs to hold the full dataset in memory.
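
Stripped of the API details, the pattern looks like the sketch below. The loadPage function is a hypothetical stand-in for a real request and returns two made-up items per page.

```javascript
// Hypothetical page loader standing in for a real API call.
async function loadPage(page) {
  return [`item ${page}a`, `item ${page}b`];
}

// Async generator: yields items one at a time and loads the next page
// only when the consumer asks for more.
async function* allItems(totalPages) {
  for (let page = 1; page <= totalPages; page++) {
    const items = await loadPage(page);
    for (const item of items) {
      yield item;
    }
  }
}

// Consumer: processes each item as it arrives instead of building an array.
async function countItems(totalPages) {
  let count = 0;
  for await (const item of allItems(totalPages)) {
    count++;
  }
  return count;
}
```

Because the generator is lazy, the loop in countItems controls the pace: a slow consumer, such as one writing each item to a file, automatically delays the loading of the next page.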

6) Rate limit handling

If the API responds with a 429 status code (rate-limited), the script uses the retryAfter value. It then pauses for the specified time before retrying the request. This ensures compliance with API rate limits and avoids sending too many requests too quickly.
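
The article's code treats Retry-After as a number of seconds. In general HTTP allows the header to carry either a number of seconds or an HTTP date, so a more defensive parser could look like the sketch below. The helper name retryAfterToMs and its one second fallback are assumptions, and whether a given API ever sends the date form should be checked in its documentation.

```javascript
// Hypothetical helper: converts a Retry-After header value to milliseconds.
// The value may be a number of seconds or an HTTP date;
// anything else falls back to `fallbackMs`.
function retryAfterToMs(value, nowMs = Date.now(), fallbackMs = 1000) {
  if (value === null || value === undefined || value.trim() === "") {
    return fallbackMs;
  }
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) {
    return fallbackMs;
  }
  // Never return a negative delay if the date is already in the past
  return Math.max(0, dateMs - nowMs);
}
```

The nowMs parameter exists only to make the date branch predictable when trying the function out; in normal use it can be omitted.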

Conclusion

In this article, we covered the key considerations when consuming APIs in JavaScript using the native fetch function. The example handles:

Large datasets: fetching large datasets using pagination.
Pagination: managing pagination with page and per_page parameters.
Rate limits and retry mechanism: handling rate limits and retrying requests after the appropriate delay.
Concurrent requests: fetching pages in parallel using JavaScript’s Promise.all() to speed up data retrieval.
Memory management: using JavaScript async generators (async function* and for await...of) to process data without consuming excessive memory.

By applying these techniques, you can handle API consumption in a scalable, efficient, and memory-safe way.

Feel free to drop your comments/feedback.

References

JavaScript Generators
Bun the JavaScript runtime
The Storyblok Content Delivery API
