How does Google crawler crawl JavaScript?

php中世界最好的语言 (Original)
2017-11-17 17:29:18

Search engines, including the domestic ones, rely on crawlers to collect web page information. So how does Google's crawler handle JavaScript? This article takes an in-depth look at that question.

We tested how Google’s crawler crawls JavaScript, and here’s what we learned.

Think Google can’t handle JavaScript? Think again. Adam Audette shares the results of a series of tests in which he and his colleagues examined which types of JavaScript functionality Google will crawl and index.

Long story short

1. We ran a series of tests that confirmed Google can execute and index JavaScript implemented in a variety of ways. We also confirmed that Google can render the entire page and read the DOM, which allows it to index dynamically generated content.

2. SEO signals in the DOM (page title, meta description, canonical tag, meta robots tag, etc.) are all respected. Content dynamically inserted into the DOM can also be crawled and indexed. Furthermore, in some cases the DOM signals may even take precedence over contradictory statements in the HTML source code. This needs more work, but it was the case in several of our tests.

Introduction: Google executes JavaScript & reads DOM

As early as 2008, Google was successfully crawling JavaScript, though probably only in a limited fashion.

What is clear today is that Google has not only evolved the types of JavaScript it crawls and indexes, but has also made significant progress in rendering complete web pages (especially in the last 12 to 18 months).

At Merkle, our technical SEO team wanted to better understand which types of JavaScript events Google's crawler can crawl and index. Our research produced eye-opening results and confirmed that Google can not only execute various JavaScript events, but also index dynamically generated content. How? Google can read the DOM.

What is DOM?

Many SEO practitioners are not familiar with the Document Object Model (DOM).


What happens when the browser requests a page, and how does the DOM get involved.

As used in web browsers, the DOM is essentially an application programming interface (API) for markup and structured data such as HTML and XML. It is the interface through which a web browser assembles that data into a structured document.

The DOM also defines how that structure is accessed and manipulated. Although the DOM is a language-neutral API (not tied to a specific programming language or library), it is most commonly used with JavaScript and dynamic content in web applications.

The DOM is the interface, or "bridge," that connects a web page to a programming language. The result of parsing the HTML and executing the JavaScript is the DOM. The content of a web page is not (just) the source code; it is the DOM. That is what makes it so important.

How JavaScript works through the DOM interface.

We were excited to discover that Google can read the DOM and parse both signals and dynamically inserted content, such as title tags, page text, heading tags, and meta annotations (e.g. rel=canonical). Read on for the full details.

The series of tests and results

Because we wanted to know which kinds of JavaScript functionality would be crawled and indexed, we created a series of tests aimed at Google's crawler. Each test had a control so that activity on its URL could be understood in isolation. Below, we break down some of the more interesting results in detail. They fall into 5 categories:

1. JavaScript redirection

2. JavaScript links

3. Dynamically inserted content

4. Dynamically inserted metadata and page elements

5. An important example with rel="nofollow"

Example: A page used to test the Google crawler’s ability to understand JavaScript.

1. JavaScript redirection

We first tested common JavaScript redirects. How would URLs expressed in different ways be handled? We chose the window.location object for two tests: Test A assigned an absolute URL to window.location, while Test B used a relative path.

Result: The redirects were quickly followed by Google. From an indexing perspective, they were interpreted as 301s: the final URL replaced the redirecting URL in Google's index.
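
The two redirect variants can be sketched roughly as follows. This is only an illustration, not the code of the original test pages: the function names and the example.com URLs are assumptions, and the location object is passed in as a parameter so the functions can also be exercised outside a browser.

```javascript
// Test A: assign an absolute URL to the location object.
// The URL is a hypothetical placeholder.
function redirectAbsolute(loc) {
  loc.href = "https://www.example.com/new-page.html";
}

// Test B: assign a relative path; a browser resolves it
// against the URL of the current page.
function redirectRelative(loc) {
  loc.href = "/new-page.html";
}

// In a page, either function would be called with the real object:
//   redirectAbsolute(window.location);
```

Passing the location object in keeps the sketch free of browser globals; on a real page the call at the bottom is all that is needed.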

In a subsequent test, we took an authoritative page and set up a JavaScript redirect to a new page on the same site with exactly the same content. The original URL ranked on the first page of Google for popular queries.

Result: Sure enough, the redirect was followed by Google and the original page was dropped from the index. The new URL was indexed and immediately ranked in the same position for the same queries. This surprised us, and it seems to indicate that JavaScript redirects can (at times) behave a lot like permanent 301 redirects from a ranking perspective.

The next time a client wants to use JavaScript redirects for a site move, your answer may not need to be "please don't," because this appears to transfer ranking signals. Supporting this conclusion is a quote from Google's guidelines:

Using JavaScript to redirect users can be a legitimate practice. For example, if you redirect users to an internal page once they are logged in, you can use JavaScript to do so. When examining JavaScript or other redirect methods to make sure your site follows our guidelines, consider the intent. Keep in mind that 301 redirects are best when moving your site, but you could use a JavaScript redirect for this purpose if you don’t have access to your website’s server.

2. JavaScript links

We tested different types of JavaScript links, coded in a variety of ways.

We tested drop-down menu links. Historically, search engines have not been able to follow this type of link. We wanted to see whether the onchange event handler would be followed. Importantly, this is a specific kind of execution point: an interaction is required to change something, unlike the forced action of the JavaScript redirects above.

Example: Language selection drop-down menu on the Google Work page.

Result: The links were fully crawled and followed.
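
A drop-down link of this kind might look roughly like the sketch below. The handler name goToSelection, the markup in the comment, and the option values are all assumptions made for illustration; the source does not show the actual test code.

```javascript
// Markup the handler is meant for (hypothetical option values):
//   <select onchange="goToSelection(this, window.location)">
//     <option value="">Choose a language</option>
//     <option value="/en/">English</option>
//     <option value="/fr/">Français</option>
//   </select>

// Navigate to the URL stored in the selected option's value.
// Nothing happens until the user changes the selection.
function goToSelection(selectEl, loc) {
  const url = selectEl.value;
  if (url) {
    loc.href = url;
  }
}
```

The point of the sketch is that no ordinary link to the target URLs exists on the page; they only become navigation targets when the onchange handler runs.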

We also tested standard JavaScript links. These are the most common types of JavaScript links, the kind that SEO has traditionally recommended replacing with plain-text links. The tests covered JavaScript links coded with:

Functions outside the href attribute-value pair (AVP) but within the a tag ("onClick")

Functions inside the href AVP ("javascript:window.location")

Functions outside the a tag but called within the href AVP ("javascript:openlink()")

And so on

Result: The links were fully crawled and followed.
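
For readers who have not met these patterns, the three link variants listed above can be sketched as markup strings. The helper buildTestLinks, the link texts, and the target URL are hypothetical; only the onClick, javascript:window.location and javascript:openlink() patterns come from the article.

```javascript
// Build the three link variants for one target URL (illustrative markup only).
function buildTestLinks(url) {
  return {
    // 1. Function outside the href attribute, but inside the a tag (onclick).
    onClick: '<a href="#" onclick="window.location=\'' + url + '\'; return false;">Link 1</a>',
    // 2. Function inside the href attribute itself.
    inHref: '<a href="javascript:window.location=\'' + url + '\'">Link 2</a>',
    // 3. The href calls a function that is defined elsewhere on the page.
    namedFunction: '<a href="javascript:openlink()">Link 3</a>'
  };
}

// The function that variant 3 expects to find on the page.
// It only makes sense in a browser, so it is defined here but not called.
function openlink() {
  window.location.href = "/target-page.html";
}
```

In none of the three variants does the href attribute hold a plain URL, which is why this kind of link was traditionally considered invisible to crawlers.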

Our next test examined event handlers further, like the onchange test above. Specifically, we wanted to use mouse event handlers and hide the URL in variables that are only evaluated when the event handler (in this case onmousedown and onmouseout) fires.

Result: The links were fully crawled and followed.

Constructing links: We know that Google can execute JavaScript, but we wanted to confirm that it can also read the variables in the code. So in this test, we concatenated a string of characters to build the URL.

Result: The links were fully crawled and followed.
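
A concatenated link of this sort could be built as in the following sketch. The fragments, the resulting URL, and the function name buildUrl are assumptions for illustration only.

```javascript
// Assemble the target URL from fragments so that the complete string
// never appears literally in the page source. All parts are hypothetical.
function buildUrl() {
  const scheme = "https://";
  const host = "www." + "example" + ".com";
  const path = "/" + "concatenated" + "-" + "link" + ".html";
  return scheme + host + path;
}

// In a page, the link would only resolve when the script runs:
//   <a href="#" onclick="window.location = buildUrl(); return false;">Link</a>
```

A crawler that merely scans the source for URL-like strings would not find the target; it has to execute the code to obtain it.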

3. Dynamically insert content

Clearly, this is an important area: dynamically inserted text, images, links and navigation. High-quality text content is crucial for a search engine to understand the topic and content of a web page. In this age of dynamic websites, its importance is beyond question.

These tests were designed to check the results of dynamically inserting text in two different scenarios.

1. Test whether the search engine can index dynamically inserted text when the text comes from within the HTML source of the page.

2. Test whether the search engine can index dynamically inserted text when the text comes from outside the HTML source of the page (in an external JavaScript file).

Results: In both cases, the text was crawled and indexed, and the page ranked for that content. Cool!
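
Both scenarios boil down to the same DOM operation; the only difference is where the script lives. A minimal sketch, assuming a hypothetical helper insertText and a hypothetical element id:

```javascript
// Scenario 1: this function sits in an inline script in the page's HTML.
// Scenario 2: the same function sits in an external .js file instead.
// Either way, the text reaches the page only through the DOM.
function insertText(doc, targetId, text) {
  const target = doc.getElementById(targetId);
  if (!target) {
    return false; // no such element, nothing inserted
  }
  target.textContent = text;
  return true;
}

// In a page:
//   insertText(document, "dynamic-copy", "Text added by JavaScript.");
```

The document object is passed in as a parameter so the sketch does not depend on browser globals.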

To learn more, we tested a global navigation written entirely in JavaScript, with the navigation links inserted through the document.writeln function, and confirmed that they were fully crawled and followed. It should also be noted that Google can interpret a website built with the AngularJS framework and the HTML5 History API (pushState), render and index it, and rank it just like a traditional static web page. This is why it is important not to block Google's crawler from fetching external files and JavaScript, and it is probably why Google is deprecating its AJAX crawling guidelines. Who needs HTML snapshots when you can simply render the entire page?
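
A navigation written out with document.writeln might look something like this sketch. The helper navHtml and the link entries are hypothetical; the source does not show the navigation code that was tested.

```javascript
// Turn a list of { href, label } entries into navigation markup.
function navHtml(items) {
  let html = "<ul>";
  for (const item of items) {
    html += '<li><a href="' + item.href + '">' + item.label + "</a></li>";
  }
  return html + "</ul>";
}

// In a page, while the document is still being parsed:
//   document.writeln(navHtml([
//     { href: "/", label: "Home" },
//     { href: "/about/", label: "About" }
//   ]));
```

None of these links exist in the HTML source; they appear only once the script has run and written them into the document.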

In our tests, the results were the same regardless of the type of content. For example, images were crawled and indexed once they had been loaded into the DOM. We even ran a test in which we dynamically generated data-vocabulary.org structured data and inserted it into the DOM to produce a breadcrumb. The result? The inserted breadcrumb showed up successfully on the search engine results page.

It is worth noting that Google now recommends JSON-LD markup for structured data. I'm sure we will see more built on this in the future.
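
As an illustration of the JSON-LD approach, a breadcrumb could be generated and inserted into the DOM roughly as follows. This is a sketch using the schema.org BreadcrumbList vocabulary rather than the data-vocabulary.org markup used in the test; the function names and the crumb data are assumptions.

```javascript
// Build a schema.org BreadcrumbList from [{ name, url }] entries.
function breadcrumbJsonLd(crumbs) {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map(function (crumb, index) {
      return {
        "@type": "ListItem",
        position: index + 1, // positions are 1-based
        name: crumb.name,
        item: crumb.url
      };
    })
  };
}

// Insert the structured data into the document head as a JSON-LD script.
function insertBreadcrumbJsonLd(doc, crumbs) {
  const script = doc.createElement("script");
  script.type = "application/ld+json";
  script.textContent = JSON.stringify(breadcrumbJsonLd(crumbs));
  doc.head.appendChild(script);
  return script;
}

// In a page:
//   insertBreadcrumbJsonLd(document, [
//     { name: "Home", url: "https://www.example.com/" },
//     { name: "Shoes", url: "https://www.example.com/shoes/" }
//   ]);
```

Keeping the data-building step separate from the DOM insertion makes the generated object easy to inspect before it is attached to the page.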

4. Dynamically insert Meta data & page elements

We dynamically inserted into the DOM various tags that are critical to SEO:

Title element

Meta description

Meta robots

Canonical tags

Results: In all cases, the tags were crawled and respected, behaving just like elements in the HTML source code.
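
Inserting these tags from JavaScript can be sketched as below. The helper names, the tag values, and the canonical URL are hypothetical; the sketch only uses standard DOM calls (createElement, setAttribute, appendChild) and takes the document as a parameter.

```javascript
// Create <meta name="..." content="..."> and append it to <head>.
function addMeta(doc, name, content) {
  const meta = doc.createElement("meta");
  meta.setAttribute("name", name);
  meta.setAttribute("content", content);
  doc.head.appendChild(meta);
  return meta;
}

// Create <link rel="canonical" href="..."> and append it to <head>.
function addCanonical(doc, href) {
  const link = doc.createElement("link");
  link.setAttribute("rel", "canonical");
  link.setAttribute("href", href);
  doc.head.appendChild(link);
  return link;
}

// Insert all four SEO elements covered by the test (placeholder values).
function insertSeoTags(doc) {
  doc.title = "Title set by JavaScript";
  addMeta(doc, "description", "Description inserted via the DOM.");
  addMeta(doc, "robots", "noindex, follow");
  addCanonical(doc, "https://www.example.com/canonical-page.html");
}

// In a page: insertSeoTags(document);
```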

An interesting complementary experiment would help us understand order of precedence. When there are conflicting signals, which one wins? What happens if the source code contains a noindex,nofollow tag and the DOM contains a noindex,follow tag? How does the HTTP X-Robots-Tag response header behave as another variable in this arrangement? These questions will be part of future comprehensive testing. However, our tests so far show that when a conflict occurs, Google disregards the tag in the source code in favor of the DOM.

5. An important example with rel="nofollow"

We wanted to test how Google responds to nofollow attributes that appear at the link level in the source code and in the DOM. We also created a control with no nofollow applied.

For nofollow, we separately test the annotations generated by source code vs DOM.

The nofollow in the source code worked as expected (the link was not followed). However, the nofollow added in the DOM did not work (the link was followed and the page was indexed). Why? Because the modification of the a element in the DOM happened too late: Google had already picked up the link and queued its URL for crawling before it executed the JavaScript function that adds rel="nofollow". However, if the entire a element with rel="nofollow" is inserted into the DOM, the nofollow is seen at the same time as the link and its URL, and is therefore respected.
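
The difference between the two DOM cases can be sketched like this. The function names and parameters are assumptions; the sketch only illustrates the order of operations described above, not the actual test code.

```javascript
// Case 1: the link already exists in the HTML source, and JavaScript
// adds rel="nofollow" afterwards. In the test, this came too late:
// the URL had already been queued for crawling.
function addNofollowLate(anchor) {
  anchor.setAttribute("rel", "nofollow");
}

// Case 2: the whole a element is created with rel="nofollow" already set
// and only then inserted, so link and annotation arrive together.
function insertNofollowLink(doc, parent, href, text) {
  const anchor = doc.createElement("a");
  anchor.setAttribute("href", href);
  anchor.setAttribute("rel", "nofollow");
  anchor.textContent = text;
  parent.appendChild(anchor);
  return anchor;
}

// In a page (hypothetical ids and URL):
//   addNofollowLate(document.getElementById("existing-link"));
//   insertNofollowLink(document, document.body, "/target.html", "Target");
```

In the second function the rel attribute is set before appendChild is called, so there is never a moment at which the link is in the DOM without its nofollow.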

RESULTS

Historically, SEO recommendations of all kinds have centered on having 'plain text' content whenever possible, while dynamically generated content, AJAX and JavaScript links were considered harmful to SEO on the major search engines. Clearly, that is no longer a problem for Google. JavaScript links behave like plain HTML links (at least on the surface; we do not know what is going on behind the scenes).

JavaScript redirects are treated much like 301 redirects.

Dynamically inserted content, and even meta signals such as rel canonical annotations, are treated the same way whether they appear in the HTML source code or are generated in the DOM by JavaScript after the initial HTML has been parsed.

Google relies on being able to fully render the page and understand the DOM, not just the source code. It’s really incredible! (Remember to allow Google crawlers to fetch those external files and JavaScript.)

Google has been innovating at an astonishing rate, leaving other search engines behind. We hope to see the same kind of innovation from other search engines. If they are to remain competitive and make substantial progress in the new era of the web, they will need better support for HTML5, JavaScript, and dynamic websites.

For SEO practitioners, anyone who has not yet grasped the basic concepts above and Google's technology should study them and catch up with the current state of the art. If you don't take the DOM into account, you may be missing half the picture.
