We know that browsers and search engines use crawlers to collect web page information, so how does the Google crawler handle JavaScript? Today, let's dig into the question in depth.
We tested how Google’s crawler crawls JavaScript, and here’s what we learned.
Think Google can't handle JavaScript? Think again. Adam Audette shared the results of a series of tests in which he and his colleagues examined which types of JavaScript functionality would be crawled and indexed by Google.
Long story short
1. We conducted a series of tests and confirmed that Google can execute and index JavaScript implemented in a variety of ways. We also confirmed that Google can render the entire page and read the DOM, allowing it to index dynamically generated content.
2. SEO signals in the DOM (page title, meta description, canonical tag, meta robots tag, etc.) are all respected. Content dynamically inserted into the DOM can also be crawled and indexed. In some cases, the DOM may even take precedence over conflicting statements in the HTML source code. This needs more investigation, but it was the case in several of our tests.
Introduction: Google executes JavaScript & reads DOM
As early as 2008, Google was successfully crawling JavaScript, but probably only in limited ways.
What is clear today is that Google has not only evolved in the types of JavaScript it crawls and indexes, but has made significant progress (especially in the last 12 to 18 months) in rendering entire web pages.
At Merkle, our technical SEO team wanted to better understand which types of JavaScript events the Google crawler can crawl and index. Our research produced some eye-opening results and confirmed that Google can not only execute various JavaScript events, but also index dynamically generated content. How? Google can read the DOM.
What is the DOM?
Many SEOs don't understand what the Document Object Model (DOM) is, so it helps to look at what happens when a browser requests a page and how the DOM is involved.
When used in a web browser, the DOM is essentially an application programming interface (API) for markup and structured data such as HTML and XML. It is the interface that lets web browsers assemble that markup into a document.
The DOM also defines how that structure can be accessed and manipulated. Although the DOM is a language-neutral API (not tied to a specific programming language or library), it is most commonly used with JavaScript and dynamic content in web applications.
The DOM is the interface, or "bridge," that connects a web page to a programming language: the result of parsing the HTML and executing JavaScript is the DOM. The content of a web page is not (only) the source code, it is the DOM. This makes it very important.
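As a minimal, hypothetical illustration (the element id and text are made up, not taken from the original tests), the HTML source can contain an empty element while the rendered DOM ends up holding text added by JavaScript:

<!-- In the HTML source, this element is empty -->
<div id="intro"></div>

<script>
  // After this script runs, the text exists only in the DOM, not in the source code
  document.getElementById('intro').textContent = 'This sentence exists only in the DOM.';
</script>

A crawler that only reads the raw source would miss that sentence; a crawler that renders the page and reads the DOM would see it.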
Image: How JavaScript works through the DOM interface.
We were excited to discover that Google can read the DOM and parse dynamically inserted signals and content such as title tags, page text, head tags, and meta tags (e.g. rel=canonical). The full details are below.
This series of tests and results
Because we wanted to know which JavaScript features would be crawled and indexed, we created a series of separate tests for the Google crawler, with controls to ensure that activity on each URL could be understood in isolation. Below, we break down some of the more interesting test results in detail. They fall into five categories:
JavaScript redirection
JavaScript links
Dynamic insertion of content
Dynamic insertion of Meta data and page elements
An important example with rel="nofollow"
Example: A page used to test the Google crawler’s ability to understand JavaScript.
1. JavaScript redirection
We first tested common JavaScript redirects, varying how the target URL was expressed. We chose the window.location object for two tests: Test A called window.location with an absolute URL, while Test B used a relative path.
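A minimal sketch of the two variants (the URLs here are hypothetical placeholders, not the actual test pages):

// Test A: redirect to an absolute URL
window.location = 'https://example.com/new-page/';

// Test B: redirect to a relative path
window.location = '/new-page/';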
Result: The redirects were quickly followed by Google. From an indexing perspective, they were interpreted as 301s: the destination URL replaced the redirecting URL in Google's index.
In a follow-up test, we took an authoritative page that ranked on the first page of Google for popular queries and used a JavaScript redirect to send it to a new page on the same site with exactly the same content.
Result: Sure enough, the redirect was followed by Google, and the original page was no longer indexed. The new URL was indexed and immediately ranked in the same position for the same queries. This surprised us, and it seems to indicate that JavaScript redirects can (sometimes) behave much like permanent 301 redirects from a ranking perspective.
The next time your client wants to use JavaScript redirects for a site move, your answer may no longer need to be "please don't," because ranking signals appear to be transferred. This conclusion is supported by a quote from Google's guidelines:
Using JavaScript to redirect users can be a legitimate practice. For example, if you redirect logged-in users to an internal page, you can use JavaScript to do so. When examining JavaScript or other redirect methods, make sure your site adheres to our guidelines and consider the intent. Keep in mind that 301 redirects are best when moving your site, but you can use a JavaScript redirect for this purpose if you don't have access to your website's server.
2. JavaScript links
We tested several different types of JavaScript links, coded in various ways.
First we tested drop-down menu links. Historically, search engines have not been able to follow this type of link reliably. We wanted to determine whether the onchange event handler would be followed. Importantly, this is a specific kind of execution point: it relies on an interaction to change something, as opposed to the forced execution of the JavaScript redirects above.
Image: The language-selection drop-down menu on a Google for Work page.
Result: The link was fully crawled and followed.
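A minimal sketch of the kind of drop-down tested (the paths and languages are hypothetical, not taken from the actual page):

<!-- Navigation happens only when the onchange handler fires -->
<select onchange="window.location = this.value;">
  <option value="/intl/en/">English</option>
  <option value="/intl/de/">Deutsch</option>
  <option value="/intl/fr/">Français</option>
</select>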
We also tested common JavaScript links. These are the types of JavaScript links that traditional SEO advice recommends replacing with plain text. The tests covered JavaScript link code (sketched after the result below) with:
a function outside the href attribute-value pair (AVP) but within the a tag ("onClick")
a function within the href AVP ("javascript:window.location")
a function outside the a tag but called within the href AVP ("javascript:openlink()")
And so on
Result: The links were fully crawled and followed.
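The three patterns listed above could look roughly like this (hypothetical markup and a hypothetical openlink() helper, not the exact test code):

<!-- Function outside the href AVP but within the a tag ("onClick") -->
<a href="/fallback/" onclick="window.location = '/target-page-1/'; return false;">Link 1</a>

<!-- Function within the href AVP ("javascript:window.location") -->
<a href="javascript:window.location = '/target-page-2/'">Link 2</a>

<!-- Function outside the a tag but called within the href AVP ("javascript:openlink()") -->
<script>
  function openlink() { window.location = '/target-page-3/'; }
</script>
<a href="javascript:openlink()">Link 3</a>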
Our next test examined further event handlers, like the onchange test above. Specifically, we leveraged mouse-event handlers and hid the URL in a variable that is only used when the event handler functions (onmousedown and onmouseout in this case) fire.
Result: The link was fully crawled and followed.
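A rough sketch of the idea, with a hypothetical destination URL held in a variable:

<script>
  // The destination is hidden in a variable; no plain href points to it
  var hiddenUrl = '/hidden-target-page/';
  function go() { window.location = hiddenUrl; }
</script>

<!-- Navigation only happens when one of the mouse event handlers fires -->
<a href="#" onmousedown="go(); return false;" onmouseout="go();">Link</a>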
Constructing links: we knew Google could execute JavaScript, but we wanted to confirm that it could read variables inside the code. So in this test we concatenated a string of characters to construct the URL.
Result: The link was fully crawled and followed.
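A minimal sketch of what the concatenation looked like conceptually (all values hypothetical):

<a id="built-link">Built link</a>
<script>
  // The full URL never appears as a single literal; it is assembled from pieces
  var builtUrl = '/concat' + 'enated-' + 'page/';
  document.getElementById('built-link').setAttribute('href', builtUrl);
</script>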
3. Dynamically inserted content
Obviously, this is a key area: dynamically inserted text, images, links, and navigation. High-quality text content is crucial for search engines to understand the topic and content of a page. In this era of dynamic websites, its importance is unquestionable.
These tests were designed to check the result of dynamically inserting text in two different scenarios (sketched after the result below):
1. Test whether search engines can index dynamically inserted text when the text comes from within the page's HTML source code.
2. Test whether search engines can index dynamically inserted text when the text comes from outside the page's HTML source (in an external JavaScript file).
Results: In both cases, the text was crawled and indexed, and the page ranked for that content. Cool!
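Conceptually, the two scenarios look something like this (the element id, file path, and text are hypothetical):

<div id="content"></div>

<!-- Scenario 1: the text lives in an inline script within the HTML source -->
<script>
  document.getElementById('content').textContent = 'Text inserted by a script inside the HTML source.';
</script>

<!-- Scenario 2: the text is inserted by an external JavaScript file -->
<script src="/js/insert-content.js"></script>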
To learn more, we tested a client-side global navigation written in JavaScript, with the navigation links inserted via document.writeln, and confirmed that they were fully crawled and followed. It is also worth noting that Google can interpret a site built with the AngularJS framework and the HTML5 History API (pushState), rendering, indexing, and ranking it just like a traditional static page. This is why it is important not to block the Google crawler from fetching external files and JavaScript resources, and it is probably also why Google is moving away from its Ajax crawling guidelines. Who needs HTML snapshots when you can simply render the entire page?
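A simplified sketch of a navigation written with document.writeln (the link targets are hypothetical, not the tested site):

<script>
  // These navigation links exist only after the script runs
  document.writeln('<nav>');
  document.writeln('<a href="/products/">Products</a>');
  document.writeln('<a href="/about/">About</a>');
  document.writeln('</nav>');
</script>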
In our testing, the result was the same regardless of the type of content. For example, images were crawled and indexed after being loaded into the DOM. We even ran a test in which we dynamically generated the data-vocabulary.org structured data for a breadcrumb and insertedted it into the DOM. The result? The dynamically inserted breadcrumb appeared in the search engine results page.
It is worth noting that Google now recommends JSON-LD markup for structured data. I am sure more tests along these lines will follow.
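For reference, a minimal JSON-LD breadcrumb block (with illustrative names and URLs) looks like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://example.com/" },
    { "@type": "ListItem", "position": 2, "name": "Blog", "item": "https://example.com/blog/" }
  ]
}
</script>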
4. Dynamically inserted meta data & page elements
We dynamically inserted various tags that are critical to SEO into the DOM (a sketch of how this can be done follows the result below):
Title element
Meta description
Meta robots
Canonical tags
Results: In all cases, tags were crawled and behaved just like elements in the HTML source code.
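A rough sketch of how such tags might be injected into the DOM (all values here are hypothetical):

<script>
  // Title element
  document.title = 'Dynamically set page title';

  // Meta description
  var desc = document.createElement('meta');
  desc.name = 'description';
  desc.content = 'Description inserted via JavaScript.';
  document.head.appendChild(desc);

  // Meta robots
  var robots = document.createElement('meta');
  robots.name = 'robots';
  robots.content = 'noindex, follow';
  document.head.appendChild(robots);

  // Canonical tag
  var canonical = document.createElement('link');
  canonical.rel = 'canonical';
  canonical.href = 'https://example.com/canonical-page/';
  document.head.appendChild(canonical);
</script>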
An interesting complementary experiment will help us understand the order of precedence. When there are conflicting signals, which one wins? What happens if the source code contains noindex, nofollow tags while the DOM contains noindex, follow tags? How does the HTTP X-Robots-Tag response header behave as another variable? This will be part of future comprehensive testing. However, our tests so far show that when a conflict occurs, Google ignores the tags in the source code in favor of the DOM.
5. An important example with rel="nofollow"
We wanted to test how Google reacts to the nofollow attribute appearing at the link level in the source code versus in the DOM, so we also created a control with no nofollow applied at all.
For nofollow, we tested the source-code version and the DOM-generated version separately.
The nofollow in the source code worked as expected (the link was not followed). However, the nofollow in the DOM did not work (the link was followed, and the page was indexed). Why? Because the modification of the a element in the DOM happened too late: Google had already crawled the link and queued the URL before it executed the JavaScript function that adds rel="nofollow". However, if the entire a element with rel="nofollow" is inserted into the DOM at once, the nofollow is seen at the same time as the link and is respected.
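To make the distinction concrete, here is a hypothetical sketch of the two cases (element ids and URLs invented for illustration):

<!-- Case 1: the link exists in the source; JavaScript adds nofollow later.
     The URL may already be queued for crawling before this script runs. -->
<a id="late-nofollow" href="/test-page-a/">Link A</a>
<script>
  document.getElementById('late-nofollow').setAttribute('rel', 'nofollow');
</script>

<!-- Case 2: the whole link, including rel="nofollow", is inserted in one step,
     so the nofollow is seen at the same time as the link. -->
<div id="nav"></div>
<script>
  document.getElementById('nav').innerHTML = '<a href="/test-page-b/" rel="nofollow">Link B</a>';
</script>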
Results
Historically, SEO recommendations have been to stick to "plain text" content wherever possible, because dynamically generated content, AJAX, and JavaScript links could hurt SEO with the major search engines. Apparently, that is no longer a problem for Google. JavaScript links behave like normal HTML links (this is only what we can observe from the outside; we don't know what is going on behind the scenes).
JavaScript redirects are treated similarly to 301 redirects.
Dynamically inserted content, and even meta signals such as rel=canonical annotations, are treated the same way whether they appear in the HTML source code or are generated in the DOM after the initial HTML is parsed and the JavaScript executed.
Google relies on being able to fully render the page and understand the DOM, not just the source code. It’s really incredible! (Remember to allow Google crawlers to fetch those external files and JavaScript.)
Google has been innovating at an astonishing pace, leaving the other search engines behind. We hope to see the same kind of innovation from them: if they are to remain competitive and make real progress in the new era of the web, that means better support for HTML5, JavaScript, and dynamic websites.
SEOs who have not yet grasped the basic concepts and Google capabilities described above should study them to catch up with the current state of the technology. If you don't take the DOM into account, you may be missing half the picture.
