Will Google crawl JavaScript that contains body content?

WBOY
2023-08-25 14:33:12

Historically, search engine crawlers such as Googlebot could only read static HTML source code and could not scan or index content written dynamically with JavaScript. This has changed with the rise of JavaScript-heavy websites and frameworks such as Angular, React, and Vue.js, as well as single-page applications (SPAs) and progressive web applications (PWAs). Google deprecated its earlier AJAX crawling scheme and now renders web pages before indexing them. While Google can generally crawl and index most JavaScript content, it recommends against relying on client-side solutions because JavaScript "is difficult to process, and not all search engine crawlers can process it correctly or quickly."

What is a Google crawler?

Search engines such as Google use software called crawlers (also known as search bots or spiders) to scan the web. A crawler "crawls" the Internet from page to page and from site to site, looking for new or updated content that is not yet in the search engine's database.

Each search engine has its own set of crawlers. Google has more than 15 different types, with Googlebot being the main one. Since Googlebot handles both crawling and indexing, we'll examine how it works in more detail.

How does the Google crawler work?

No search engine, Google included, maintains a central registry of URLs that is updated every time a new page is created. This means Google is not automatically "alerted" to new pages and has to search the web for them. Googlebot constantly roams the Internet, looking for new web pages to add to Google's inventory of known pages.
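The discovery process described above can be sketched as a breadth-first walk over links. This is only an illustration, not how Googlebot is actually implemented: the `FAKE_WEB` dictionary, the `discover` function, and the example.com URLs are hypothetical stand-ins so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the web: URL -> HTML source.
FAKE_WEB = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog/">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog/": '<a href="post-1">Post 1</a> <a href="/about">About</a>',
    "https://example.com/blog/post-1": "<p>No links here.</p>",
}


def discover(seed, fetch):
    """Breadth-first discovery of every URL reachable from `seed` via links."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        html = fetch(url)
        if html is None:  # page could not be fetched
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


discovered = discover("https://example.com/", FAKE_WEB.get)
print(discovered)
```

The `seen` set is what keeps the walk from looping forever when pages link back to each other. A page that no known page links to is never reached by this sketch.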

Once a new page is found, Googlebot renders (or "visualizes") it in a browser by loading all of its HTML, third-party code, JavaScript, and CSS. The result is stored in the search engine's database and used to index and rank the page. If the page is indexed, it is added to the Google index, which is another very large Google database.

JavaScript and HTML Rendering

Lengthy code can be difficult for Googlebot to process and render. If the code is not clean, the crawler may fail to render your page correctly, in which case the page will be treated as empty.
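To see why a page that fails to render can look empty, it helps to compare what a non-rendering reader gets from a server-rendered page and from a client-rendered one. The sketch below uses only Python's standard `html.parser`. The two HTML snippets and the `static_text` helper are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Returns the text present in the raw HTML, without executing any scripts."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical pages: the same sentence delivered two different ways.
SERVER_RENDERED = "<html><body><h1>Pricing</h1><p>Plans start at $5.</p></body></html>"
CLIENT_RENDERED = (
    '<html><body><div id="root"></div>'
    "<script>document.getElementById('root').textContent = 'Plans start at $5.';</script>"
    "</body></html>"
)

print(repr(static_text(SERVER_RENDERED)))
print(repr(static_text(CLIENT_RENDERED)))
```

The client-rendered page yields an empty string here because its body text only exists after the script runs. That is the situation a crawler is in when rendering fails.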

Regarding JavaScript rendering, keep in mind that the language is evolving rapidly and Googlebot may not always support the latest features. Make sure your JavaScript is compatible with Googlebot so that your site is not rendered incorrectly. Also make sure your JavaScript loads quickly: Googlebot will not render and index script-generated content if it takes longer than five seconds to load.

When to use JavaScript crawling?

Although Google typically renders every page, we still recommend using JavaScript crawling selectively: when first analyzing a site to identify its use of JavaScript, when auditing a site with known client-side dependencies, and during deployments on large sites.

To render each web page and build the DOM in a headless browser behind the scenes, all resources (including JavaScript, CSS, and images) must be fetched. This makes JavaScript crawling slower and more resource intensive.

While this isn't a problem for smaller sites, it can have a significant impact on larger sites with many thousands or even millions of pages. If your website doesn't rely heavily on JavaScript to dynamically change its pages, there's no need to spend the time or resources on JavaScript crawling.
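A quick back-of-the-envelope calculation shows how the difference scales with site size. Every figure below (seconds per page, number of workers) is an assumption chosen for illustration, not a measurement of any real crawler.

```python
def crawl_hours(pages, seconds_per_page, parallel_workers=1):
    """Wall-clock hours needed to crawl `pages` at a fixed per-page cost."""
    return pages * seconds_per_page / parallel_workers / 3600


# All figures below are illustrative assumptions, not measurements.
HTML_ONLY_SECONDS = 0.5  # fetch and parse the raw HTML only
RENDERED_SECONDS = 5.0   # fetch every resource and render in a headless browser
WORKERS = 10

for pages in (1_000, 1_000_000):
    plain = crawl_hours(pages, HTML_ONLY_SECONDS, WORKERS)
    rendered = crawl_hours(pages, RENDERED_SECONDS, WORKERS)
    print(f"{pages:>9,} pages: {plain:8.2f} h HTML-only, {rendered:8.2f} h rendered")
```

Under these assumptions, a thousand-page site takes minutes either way, while a million-page site goes from roughly 14 hours to roughly 139 hours once every page has to be rendered.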

To process JavaScript and web pages with dynamic content, the crawler must read and evaluate the Document Object Model (DOM). It must also produce a fully rendered version of the page after all the code has been loaded and processed. A browser is the easiest tool for viewing a rendered web page, which is why JavaScript crawling is sometimes described as using a "headless browser."

Conclusion

JavaScript is here to stay, and there will be more of it in the next few years. JavaScript can coexist peacefully with SEO and crawlers as long as you involve SEO early on when designing your website architecture. SEO crawlers still only imitate the behavior of actual search engine bots. In addition to a JavaScript crawler, we strongly recommend using log file analysis, Google's URL Inspection tool, or mobile-friendly testing tools to understand what Google can crawl, render, and index.
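As a starting point for log file analysis, the sketch below counts requests whose user agent mentions Googlebot in a server access log. It assumes the common "combined" log format. The sample lines, IP addresses, and paths are hypothetical, and `googlebot_hits` is a name made up for this example.

```python
import re
from collections import Counter

# Combined Log Format:
# ip - - [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical sample lines; the IPs come from documentation-only ranges.
SAMPLE_LOG = """\
192.0.2.10 - - [25/Aug/2023:14:00:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
192.0.2.10 - - [25/Aug/2023:14:00:02 +0000] "GET /static/app.js HTTP/1.1" 200 80211 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
198.51.100.7 - - [25/Aug/2023:14:00:03 +0000] "GET /pricing HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.10 - - [25/Aug/2023:14:00:04 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"""


def googlebot_hits(log_text):
    """Counts (path, status) pairs for requests whose user agent mentions Googlebot."""
    hits = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match["agent"]:
            hits[(match["path"], int(match["status"]))] += 1
    return hits


hits = googlebot_hits(SAMPLE_LOG)
for (path, status), count in sorted(hits.items()):
    print(status, path, count)
```

Checking whether the bot also requested your script files, as with `/static/app.js` above, gives a hint about whether it is fetching what it needs to render the page. Note that a user-agent string can be spoofed, and this sketch does not verify that a request really came from Google.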

Statement:
This article is reproduced from tutorialspoint.com. If there is any infringement, please contact admin@php.cn to have it removed.