JavaScript can feel very removed from the hardware it runs on, but thinking low-level can still be useful in limited cases.
A recent post by Kafeel Ahmad on loop optimization detailed a number of techniques for improving loop performance. That article got me thinking about the topic.
Premature Optimization
Just to get this out of the way: this is a technique very few will ever need to consider in web development. Focusing on optimization too early can also make code harder to write and much harder to maintain. Still, taking a peek at low-level techniques can give us insight into our tools and the work in general, even if we can't apply that knowledge directly.
What is Loop Unrolling?
Loop unrolling basically duplicates the logic inside a loop so you perform multiple operations during each, well, loop. In specific cases, making the code in the loop longer can make it faster.
By intentionally performing some operations in groups rather than one-by-one, the computer may be able to operate more efficiently.
Unrolling Example
Let's take a very simple example: summing values in an array.
  // 1-to-1 looping
  const simpleSum = (data) => {
    let sum = 0;
    for(let i=0; i < data.length; i += 1) {
      sum += data[i];
    }
    return sum;
  };

  // Unrolled: two values per loop (assumes an even number of items)
  const parallelSum = (data) => {
    let sum1 = 0;
    let sum2 = 0;
    for(let i=0; i < data.length; i += 2) {
      sum1 += data[i];
      sum2 += data[i + 1];
    }
    return sum1 + sum2;
  };
This may look very strange at first. We're managing more variables and performing additional operations that don't happen in the simple example. How can this be faster?!
Measuring the Difference
I ran comparisons over a variety of data sizes and multiple runs, with both sequential and interleaved test ordering. The parallelSum performance varied, but it was almost always better, except for some odd results at very small data sizes. I tested this using RunJS, which is built on Chrome's V8 engine.
Different data sizes gave very roughly these results:
- Small (
- Medium (10k-100k): Typically ~20-80% faster
- Large (> 1M): Consistently twice as fast
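A comparison like this can be approximated with a small timing harness. What follows is a minimal sketch, not the benchmark behind the numbers above: `timeIt`, the run count, and the test data are all assumptions made for illustration, and the two sum functions are written out again so the block runs on its own. It relies on the global `performance.now()` available in browsers and recent versions of Node.js.

```javascript
// 1-to-1 looping
const simpleSum = (data) => {
  let sum = 0;
  for (let i = 0; i < data.length; i += 1) {
    sum += data[i];
  }
  return sum;
};

// Two accumulators per loop; assumes data.length is even.
const parallelSum = (data) => {
  let sum1 = 0;
  let sum2 = 0;
  for (let i = 0; i < data.length; i += 2) {
    sum1 += data[i];
    sum2 += data[i + 1];
  }
  return sum1 + sum2;
};

// Hypothetical helper: calls fn(data) `runs` times and reports the median
// wall-clock time in milliseconds. The checksum keeps every result "used"
// so the engine has less reason to skip the work.
const timeIt = (fn, data, runs = 21) => {
  const times = [];
  let checksum = 0;
  for (let r = 0; r < runs; r += 1) {
    const start = performance.now();
    checksum += fn(data);
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return { medianMs: times[Math.floor(times.length / 2)], checksum };
};

// Small integers keep every partial sum exact, so both functions must agree.
const data = Array.from({ length: 1_000_000 }, (_, i) => i % 100);

const simpleResult = timeIt(simpleSum, data);
const parallelResult = timeIt(parallelSum, data);
console.log(`simpleSum:   ${simpleResult.medianMs.toFixed(3)} ms`);
console.log(`parallelSum: ${parallelResult.medianMs.toFixed(3)} ms`);
```

Taking the median rather than the mean makes a single slow run (a garbage collection pause, for example) matter less. Timings from a harness like this will still vary by engine, hardware, and whatever else the machine is doing, so treat them as rough indications only.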
Then I created a JSPerf with 1 million records to try across different browsers. Try it yourself!
Chrome ran parallelSum twice as fast as simpleSum, as expected from the RunJS testing.
Safari was almost identical to Chrome, in both the relative speedup and operations per second.
Firefox on the same system performed almost the same for simpleSum but parallelSum was only about 15% faster, not twice as fast.
This variation sent me looking for more information. While it's nothing definitive, I found a Stack Overflow comment from 2016 discussing some of the JS engine issues with loop unrolling. It's an interesting look at how engines and optimizations can affect code in ways we don't expect.
Variations
I tried a third version as well, which added two values in a single operation to see if there was a noticeable difference between one variable and two.
  // Two values added in a single operation (assumes an even number of items)
  const parallelSum = (data) => {
    let sum = 0
    for(let i=0; i < data.length; i += 2) {
      sum += data[i] + data[i + 1]
    }
    return sum
  }
Short answer: No. The two "parallel" versions were within the reported margin of error of each other.
So, How Does it Work?
While JavaScript is single-threaded, the interpreters, compilers, and hardware underneath can perform optimizations for us when certain conditions are met.
In the simple example, the operation needs the value i to know what data to fetch, and it needs the latest value of sum to update. Because both of these change in each loop, the computer has to wait for one pass of the loop to complete before it can fetch more data. While it may seem obvious to us what i += 1 will do, the computer mostly understands "the value will change, check back later", so it has difficulty optimizing.
Our parallel versions load multiple data entries for each value of i. We still depend on sum for each loop, but we can load and process twice as much data per cycle. But that doesn't mean it runs twice as fast.
Deeper Dive
To understand why loop unrolling works, we look to the low-level operation of a computer. Processors with superscalar architectures can have multiple pipelines to perform simultaneous operations. They can support out-of-order execution, so operations that don't depend on each other can happen as soon as possible. For some operations, SIMD can perform one action on multiple pieces of data at once. Beyond that we start getting into caching, data fetching, and branch prediction...
But this is a JavaScript article! We're not going that deep. If you want to know more about processor architectures, AnandTech has some excellent Deep Dives.
Limits and Drawbacks
Loop unrolling is not magic. There are limits and diminishing returns that appear because of program or data size, operation complexity, computer architecture, and more. But we've only tested one or two operations, and modern computers often support four or more threads.
To try some larger increments, I made another JSPerf that processes 1, 2, 4, and 10 records per loop, and ran it on an Apple M1 Max MacBook Pro running macOS 14.5 Sonoma and on an AMD Ryzen 9 3950X PC running Windows 11.
On the Mac, ten records at a time was 2.5-3.5x faster than the base loop, but only 12-15% faster than processing four records. On the PC we still saw the 2x improvement going from one record to two, but ten records was just 2% faster than four records, which I would not have predicted for a 16-core processor.
Platforms and Updates
These different results remind us to be careful with optimization. Optimizing for your computer could create a worse experience on less-capable or just different hardware. Performance or functionality issues on older or entry-level hardware are a common problem when developers work on fast, powerful machines, and fixing them is something I've been tasked with multiple times in my career.
For a sense of scale, a currently-available entry-level Chromebook from HP has an Intel Celeron N4120 processor. This is roughly equivalent to my 2013 Core i5-4250U MacBook Air. It has just one ninth the performance of the M1 Max in a synthetic benchmark. On that 2013 MacBook Air, running the latest version of Chrome, the 4-record function was faster than the 10-record function, but still only 60% faster than the single-record function!
Browsers and standards are constantly changing, too. A routine browser update or a different processor architecture could make optimized code slower than a regular loop. When you find yourself deeply optimizing, you may need to ensure your optimization is relevant to your consumers, and that it stays relevant.
It reminds me of the book High Performance JavaScript by Nicholas Zakas, which I read back in 2012. It was a great book and contained a lot of insight. However, by 2014 a number of the significant performance issues identified in the book had been resolved or substantially reduced by browser engine updates, and we were able to focus more effort on writing maintainable code.
If you are trying to stay on the edge of performance optimization, be prepared for change and regular validation.
Lessons from the Past
While researching this topic I came across a Linux Kernel Mailing List thread from the year 2000 about removing some loop unrolling optimizations, a change which ultimately improved the application's performance. It included this still-relevant point:
"The bottom line is that our intuitive assumptions of what's fast and what isn't can often be wrong, especially given how much CPU's have changed over the past couple of years."
– Theodore Ts'o
Conclusion
There are times you may need to squeeze performance out of a loop, and if you are processing enough items, this could be one of the ways you do that. It's good to know about these kinds of optimizations, but for most work, You Aren't Gonna Need It™.
Still, I hope you've enjoyed my rambling, and that maybe in the future your memory will be jogged about performance optimization considerations.
Thanks for reading!
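A footnote to the Limits and Drawbacks section: the examples in this article step through the array two items at a time, which quietly assumes an even number of items. Wider unrolling makes that assumption more fragile. Below is a minimal sketch of a four-record version with a remainder loop, so any array length works. The name `unrolledSum4` and the exact shape of the loop are my own illustration here, not the code from the JSPerf tests.

```javascript
// Hypothetical four-way unrolled sum; `unrolledSum4` is an illustrative name.
const unrolledSum4 = (data) => {
  let sum1 = 0;
  let sum2 = 0;
  let sum3 = 0;
  let sum4 = 0;

  // Largest multiple of 4 that fits inside the array.
  const limit = data.length - (data.length % 4);

  let i = 0;
  for (; i < limit; i += 4) {
    // Four independent accumulators: no addition here waits on another.
    sum1 += data[i];
    sum2 += data[i + 1];
    sum3 += data[i + 2];
    sum4 += data[i + 3];
  }

  // Remainder loop: picks up the last 0-3 items the main loop skipped.
  let tail = 0;
  for (; i < data.length; i += 1) {
    tail += data[i];
  }

  return sum1 + sum2 + sum3 + sum4 + tail;
};

console.log(unrolledSum4([1, 2, 3, 4, 5, 6, 7])); // 28
console.log(unrolledSum4([])); // 0
```

One design note: splitting the total across several accumulators changes the order in which numbers are added. For integers that makes no difference, but with floating-point data the result can differ very slightly from a plain one-at-a-time sum.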
The above is the detailed content of Loop Unrolling in JavaScript?. For more information, please follow other related articles on the PHP Chinese website!
