Headless WebKit and PhantomJS

Joseph Gordon-Levitt
2025-02-24 10:24:13

Key points

  • PhantomJS is a WebKit-based headless browser that makes programmatic automation and testing of web pages faster by doing away with the graphical user interface.
  • PhantomJS can interact with pages through JavaScript, which makes it easy to automate tasks such as clicking buttons and submitting forms, and to manipulate loaded pages with the standard DOM API or libraries such as jQuery.
  • PhantomJS provides an extensive file system API that lets applications save page source to disk, and it can also take screenshots of web pages and include external script files in pages.
  • Although PhantomJS is powerful, it is not integrated with Node.js. Existing bridge projects usually spawn PhantomJS instances with the child_process module and communicate with them over WebSockets.

If you are reading this article, you most likely know what a browser is. Now remove the GUI, and you have what is called a headless browser. A headless browser can do all the same things as a normal browser, but it is typically faster because it does not have to draw anything on screen. That makes headless browsers ideal for programmatically automating and testing web pages. There are a number of headless browsers available, and PhantomJS is arguably the best of them. Built on WebKit, the engine behind Safari (and behind Chrome until Chrome moved to its WebKit-derived Blink engine in 2013), PhantomJS gives you powerful browser features without a bulky GUI.

Getting started with PhantomJS is easy: just download the executable. Next, create a file named hello.js and add the following lines of code:

<code class="language-javascript">console.log("Hello World!");
phantom.exit();</code>

To execute the script, run the following command. The phantomjs executable must be in the current directory (on Unix-like systems, invoke it as ./phantomjs in that case) or in a directory on your PATH. If everything is configured correctly, PhantomJS prints "Hello World!" to the console and terminates when phantom.exit() is called.

<code class="language-bash">phantomjs hello.js</code>

Using the webpage Module

After PhantomJS is running, you can start automating the web. The following example loads the Google homepage and saves a screenshot to a file. The first line creates a new web page object using the webpage module, and page.open() loads google.com. After the page has loaded, the onLoadFinished() callback is executed. The callback receives a single parameter, status ("success" or "fail"), which indicates whether the page loaded successfully. The URL of the loaded page is available in page.url; this property is especially useful when the page redirects and you want to know exactly where you ended up. Finally, the page's render() method takes the screenshot. render() can create PNG, GIF, JPEG, and PDF files.

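<code class="language-javascript">var page = require("webpage").create();
var homePage = "http://www.google.com/";

page.open(homePage);
// Called once the page has finished loading
page.onLoadFinished = function(status) {
  var url = page.url; // final URL, after any redirects

  console.log("Status:  " + status);
  console.log("Loaded:  " + url);
  page.render("google.png"); // output format follows the file extension
  phantom.exit();
};</code>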

Page settings

Many settings of the page object can be customized to suit your application. For example, if you are only interested in downloading source code, you can speed things up by not loading images and disabling JavaScript. The following example reflects these changes through page.settings.javascriptEnabled and page.settings.loadImages. Note that any setting changes must be made before calling open(). If you look at the screenshot produced by this example, you will notice that the Google logo image is missing, but the rest of the page remains the same.

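<code class="language-javascript">var page = require("webpage").create();
var homePage = "http://www.google.com/";

// Settings must be changed before open() is called
page.settings.javascriptEnabled = false;
page.settings.loadImages = false;
page.open(homePage);
page.onLoadFinished = function(status) {
  var url = page.url;

  console.log("Status:  " + status);
  console.log("Loaded:  " + url);
  page.render("google.png");
  phantom.exit();
};</code>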

Accessing the file system

So far, our examples have loaded a page and saved a screenshot as an image file. While this is undoubtedly cool, many applications would rather store the page's source code in the file system. PhantomJS makes this possible through an extensive file system API. The following example uses the fs module to write the source of google.com to a file. First, the module is imported with require("fs"). Inside the onLoadFinished() callback, fs.open() opens the output file for writing, and the file's write() method writes the data to it. The source itself is available through the page's content property. Finally, the file is closed and PhantomJS exits.

<code class="language-javascript">var page = require("webpage").create();
var fs = require("fs"); // PhantomJS's own file system module, not Node's
var homePage = "http://www.google.com/";

page.open(homePage);
page.onLoadFinished = function(status) {
  var file = fs.open("output.htm", "w"); // open the file for writing

  file.write(page.content); // page.content holds the page's HTML source
  file.close();
  phantom.exit();
};</code>

Execute JavaScript

One of the most powerful features of PhantomJS is the ability to interact with pages through JavaScript. This makes it extremely easy to automate tasks such as clicking buttons and submitting forms. Our next example performs a web search by loading the Google homepage, typing a query, and submitting the search form. The beginning of the example should look familiar. The new part checks page.url to determine which page has been loaded. If it is the home page, the page's evaluate() method is called. evaluate() executes the code you provide in the context of the page, which effectively gives you the same access to the page as its original developer. How cool is that?

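<code class="language-javascript">var page = require("webpage").create();
var homePage = "http://www.google.com/";

page.open(homePage);
page.onLoadFinished = function(status) {
  var url = page.url;

  console.log("Status:  " + status);
  console.log("Loaded:  " + url);

  if (url === homePage) {
    // First load: fill in and submit the search form from inside the page.
    // The selector is an assumption about Google's markup and may need updating.
    page.evaluate(function() {
      var searchBox = document.querySelector("input[name='q']");
      var searchForm = searchBox.form;

      searchBox.value = "JSPro";
      searchForm.submit();
    });
  } else {
    // Second load: the search results page
    page.render("results.png");
    phantom.exit();
  }
};</code>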

Inside evaluate(), we find the search box and its form, set the value of the search box to "JSPro", and submit the form. Submitting loads a new page, so the onLoadFinished() callback fires again. This time, however, the loaded page is not the home page, so a screenshot of the search results is taken and PhantomJS exits.

PhantomJS also provides two methods, includeJs() and injectJs(), which allow you to add external script files to a page. includeJs() includes any script file that the page itself can access by URL. For example, you could use the following code to include jQuery in the previous example. Note the call to includeJs(), and the jQuery syntax inside evaluate().

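<code class="language-javascript">var page = require("webpage").create();
var homePage = "http://www.google.com/";

page.open(homePage);
page.onLoadFinished = function(status) {
  var url = page.url;

  console.log("Status:  " + status);
  console.log("Loaded:  " + url);

  if (url === homePage) {
    // Example jQuery URL; includeJs() calls back once the script has loaded
    page.includeJs("http://code.jquery.com/jquery-1.8.2.min.js", function() {
      page.evaluate(function() {
        var searchBox = $("input[name='q']"); // selector is an assumption
        searchBox.val("JSPro");
        searchBox.closest("form").submit();
      });
    });
  } else {
    page.render("results.png");
    phantom.exit();
  }
};</code>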
The injectJs() method is similar to includeJs(). The difference is that the injected script file does not need to be accessible to the page, because PhantomJS reads the file itself. This allows you, for example, to inject scripts from your local file system.
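One detail matters for every evaluate() call: according to the PhantomJS documentation, the function you pass runs inside the page, so it cannot see variables from your PhantomJS script, and any arguments or return values must be JSON-serializable. The sketch below mimics that boundary in plain Node.js with the built-in vm module, purely as an illustration. evaluateInContext and the toy pageContext are hypothetical names, and this is not how PhantomJS implements evaluate() internally.

```javascript
const vm = require("vm");

// Hypothetical stand-in for page.evaluate(): `fn` is turned into source text
// and executed inside a separate vm context, so it cannot see variables from
// the calling script. Arguments and the return value cross the boundary as
// JSON, mirroring the restriction PhantomJS documents for evaluate().
function evaluateInContext(context, fn, ...args) {
  const source = "(" + fn.toString() + ").apply(null, " + JSON.stringify(args) + ");";
  const result = vm.runInContext(source, context);
  const json = JSON.stringify(result);
  return json === undefined ? undefined : JSON.parse(json);
}

// A toy "page" context with a single global, standing in for a real document.
const pageContext = vm.createContext({ pageTitle: "Google" });
```

For example, evaluateInContext(pageContext, function () { return pageTitle; }) returns "Google", and evaluateInContext(pageContext, function (q) { return q.toUpperCase(); }, "JSPro") returns "JSPRO". A function that refers to a variable declared in the calling script, however, sees it as undefined inside the context, which is the same reason values must be passed to page.evaluate() as arguments rather than captured in a closure.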

PhantomJS and Node.js

Unfortunately, PhantomJS is not integrated with Node.js. A few projects have been created to control PhantomJS from Node.js, but they are all a bit clumsy. Existing projects use the child_process module to spawn PhantomJS instances. PhantomJS then loads a special web page that communicates with Node.js over WebSockets. It may not be ideal, but it works. Two of the more popular PhantomJS Node modules are node-phantom and phantomjs-node. I recently started developing my own PhantomJS Node module, called ghostbuster. Ghostbuster is similar to node-phantom, but attempts to reduce callback nesting by providing more powerful commands: the fewer calls made to PhantomJS, the less time is spent communicating over WebSockets. Another option is zombie.js, a lightweight headless browser built on jsdom. Zombie is not as powerful as PhantomJS, but it is a true Node.js module.
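The bridge modules described above keep a long-lived PhantomJS process and talk to it over WebSockets. For one-off jobs, a simpler pattern is to run a script to completion and read its console output. The sketch below does that with Node's child_process.spawnSync(); it assumes the phantomjs executable is on your PATH, and runPhantom is a hypothetical helper name, not part of PhantomJS or of the modules mentioned above.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical one-shot bridge: run a PhantomJS script as a child process
// and collect its console output. Assumes `phantomjs` is on PATH unless a
// different executable is passed as `bin`.
function runPhantom(args, bin = "phantomjs") {
  const result = spawnSync(bin, args, { encoding: "utf8" });
  if (result.error) {
    // e.g. ENOENT when the executable cannot be found
    throw result.error;
  }
  return {
    exitCode: result.status, // null if the process was killed by a signal
    stdout: result.stdout,
    stderr: result.stderr,
  };
}
```

For example, with PhantomJS installed, runPhantom(["hello.js"]).stdout should contain "Hello World!" when run next to the hello.js script from the start of this article. Because spawnSync() blocks Node's event loop until the child exits, this suits build scripts and tests; a server is better served by a persistent, asynchronous bridge.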

Conclusion

After reading this article, you should have a basic understanding of PhantomJS. One of its best features is its ease of use: if you are already familiar with JavaScript, the learning curve is small. PhantomJS also supports many features not covered in this article, so, as always, I encourage you to read the documentation. The project also includes a number of examples that demonstrate more of what PhantomJS can do.

FAQs about Headless WebKit and PhantomJS

What is the main difference between headless WebKit and PhantomJS?

Both are used to automate web browsers, but they are different kinds of things. "Headless WebKit" refers to running the WebKit browser engine without a graphical user interface so that it can be controlled programmatically for automation, testing, and server-side rendering. PhantomJS is a specific scriptable headless browser built on WebKit; it exposes a JavaScript API that supports automated navigation, screenshots, simulated user behavior, and assertions.

Is PhantomJS still maintained?

No. PhantomJS development was suspended in March 2018, and the project is no longer actively maintained. The main reason was the arrival of modern headless browsers, such as headless Chrome and headless Firefox, which offer more features and better support.

What are some alternatives to PhantomJS?

Since PhantomJS is no longer maintained, several alternatives have become popular. These include Puppeteer, a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol, and Selenium WebDriver, an open-source collection of APIs for automated testing of web applications.

How does PhantomJS work?

PhantomJS is a headless WebKit browser that is scripted through a JavaScript API, which supports automated navigation, screenshots, simulated user behavior, and assertions. It has fast, native support for a variety of web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.

Can I use PhantomJS for web crawling?

Yes, PhantomJS can be used for web crawling. It lets you load pages, run their JavaScript, and then inspect or manipulate the result using the standard DOM API or libraries such as jQuery.
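In a crawler, the href values gathered inside evaluate() (for example from document.querySelectorAll("a")) still need to be resolved, filtered, and de-duplicated before they are queued. A minimal sketch of that step follows, assuming it runs in a controlling Node.js script where the WHATWG URL class is available; normalizeLinks and its same-host policy are illustrative choices, not part of PhantomJS.

```javascript
// Hypothetical crawler helper: turn raw href values into absolute,
// de-duplicated http(s) URLs on the same host as the page they came from.
function normalizeLinks(hrefs, pageUrl) {
  const base = new URL(pageUrl);
  const seen = new Set();
  const result = [];
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative links against the page
    } catch (e) {
      continue; // skip values that are not valid URLs
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // drop mailto:, javascript:, etc.
    if (url.host !== base.host) continue; // stay on the same site
    url.hash = ""; // "#section" fragments point at the same document
    if (!seen.has(url.href)) {
      seen.add(url.href);
      result.push(url.href);
    }
  }
  return result;
}
```

Keeping this logic in the controlling script rather than inside evaluate() means only plain strings cross the page boundary, and the crawl policy (here, same host only) can change without touching the in-page code.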

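How do I install PhantomJS?

The simplest option is to download the prebuilt executable for your platform. PhantomJS can also be installed through npm (Node package manager) with a wrapper package that downloads the binary, by running "npm install phantomjs" in a terminal or command prompt (the package was later renamed "phantomjs-prebuilt").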

What is the role of headless WebKit in server-side rendering?

A headless browser can support server-side rendering by letting the server load a JavaScript-rendered page, run its scripts, and send the resulting HTML to the client. Because clients and search engine crawlers then receive fully rendered markup, this can improve the perceived performance and the SEO of a web application.

Can I use headless WebKit for automated testing?

Yes, headless WebKit is an excellent tool for automated testing. It allows you to run tests in a real browser environment without the need for a visible UI.

How do I install headless WebKit?

The installation process depends on the specific tool you use. PhantomJS, for example, ships as a standalone executable. Puppeteer, which drives headless Chrome or Chromium rather than WebKit, can be installed through npm with the command "npm install puppeteer".

What are the advantages of using headless WebKit compared to traditional automated browsers?

Headless browsers have several advantages over traditional, GUI-based browsers for automation. They are typically faster and lighter because they do not need to display anything on screen. They also make browsing fully scriptable, which is very useful for testing and web crawling.
