
PHP crawler practice: crawling data on Twitter

WBOY
2023-06-13 13:17:39

Social media has become part of everyday life, and Twitter is one of the largest platforms, with hundreds of millions of users sharing information on it. Research, analysis, and promotion work often requires data from Twitter. This article shows how to write a simple Twitter crawler in PHP that fetches tweets matching a keyword and stores them in a database.

1. Twitter API

Twitter provides an official API (Application Programming Interface) that developers can use to obtain data. To use it, you first need to create an application (App) and obtain its credentials: Consumer Key, Consumer Secret, Access Token, and Access Token Secret. The application process is not covered here.
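Since the four credentials are secrets, one common approach is to keep them out of the source file and read them from environment variables. The following is a minimal sketch of that idea; the environment variable names and the function name loadTwitterCredentials are hypothetical and not part of the original article.

```php
<?php
// Hypothetical environment variable names; rename them to fit your deployment.
const TWITTER_ENV_KEYS = [
    'consumerKey'       => 'TWITTER_CONSUMER_KEY',
    'consumerSecret'    => 'TWITTER_CONSUMER_SECRET',
    'accessToken'       => 'TWITTER_ACCESS_TOKEN',
    'accessTokenSecret' => 'TWITTER_ACCESS_TOKEN_SECRET',
];

/**
 * Read the four credentials from the environment.
 * Throws if any of them is missing, so a misconfiguration fails early.
 */
function loadTwitterCredentials(): array
{
    $credentials = [];
    foreach (TWITTER_ENV_KEYS as $name => $envKey) {
        $value = getenv($envKey);
        if ($value === false || $value === '') {
            throw new RuntimeException("Missing environment variable: $envKey");
        }
        $credentials[$name] = $value;
    }
    return $credentials;
}
```

The returned array can then supply the four values wherever the placeholder strings such as "your_consumer_key" appear in this article.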

2. Install Twitter API Library

The TwitterOAuth library (abraham/twitteroauth) is a widely used third-party PHP library that simplifies working with the Twitter API. This article uses it to obtain Twitter data. The library can be installed in several ways; here we use composer to manage dependencies. The steps are as follows:

1. Install composer

composer is a dependency management tool for PHP. Download and run the installer for your operating system.

2. Use composer to install the Twitter API library

Enter the following command in the command line window to install the Twitter API library in the project directory:

composer require abraham/twitteroauth

3. Obtain Twitter data

Crawling data through the Twitter API takes two steps: authentication and query. Once authenticated, you can query for the tweets you want, as shown below:

require_once 'vendor/autoload.php'; // composer's autoloader, created by the install step above
use Abraham\TwitterOAuth\TwitterOAuth;

$consumerKey = "your_consumer_key";
$consumerSecret = "your_consumer_secret";
$accessToken = "your_access_token";
$accessTokenSecret = "your_access_token_secret";
$connection = new TwitterOAuth($consumerKey, $consumerSecret, $accessToken, $accessTokenSecret);

$tweets = $connection->get("search/tweets", array("q" => "php", "count" => 100));

The above code requests up to 100 tweets matching "php" and stores the result in the $tweets variable.
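A failed request (for example, bad credentials or an exhausted rate limit) also returns a decoded object, but with an errors list instead of statuses. The sketch below shows one way to check for that before using the result. The function name extractStatuses is hypothetical, and the error shape assumed here is the v1.1 form {"errors":[{"code":...,"message":...}]}.

```php
<?php
/**
 * Return the list of tweets from a decoded search response,
 * or throw if the API reported errors.
 *
 * $response is the decoded JSON object, like the $tweets variable above.
 */
function extractStatuses(object $response): array
{
    if (isset($response->errors)) {
        // Report only the first error; the v1.1 API returns a list of them.
        $first = $response->errors[0] ?? null;
        $message = $first->message ?? 'unknown error';
        $code = $first->code ?? 0;
        throw new RuntimeException("Twitter API error $code: $message");
    }
    // A successful search response carries the tweets under "statuses".
    return $response->statuses ?? [];
}
```

With the connection above, this would be called as extractStatuses($tweets). TwitterOAuth also exposes the HTTP status of the last request through $connection->getLastHttpCode(), which can be checked in addition.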

4. Parse and save data

After obtaining the Twitter data, you need to parse and save the data. This example uses a MySQL database, and you can use PHP's PDO extension and SQL statements to store data. The specific code is as follows:

try {
    $dbh = new PDO('mysql:host=localhost;dbname=your_database_name', 'your_username', 'your_password');
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Convert the decoded response into nested arrays; fall back to an empty list if "statuses" is missing
    $tweetsArray = json_decode(json_encode($tweets), true)['statuses'] ?? [];

    // Prepare the statement once and reuse it for every tweet
    $statement = $dbh->prepare("INSERT INTO tweets (id, text, created_at, user) VALUES (:id, :text, :created_at, :user)");

    foreach ($tweetsArray as $tweet) {
        $id = $tweet['id_str'];
        $text = $tweet['text'];
        $created_at = date("Y-m-d H:i:s", strtotime($tweet['created_at']));
        $user = $tweet['user']['screen_name'];

        // Save the tweet to the database
        $statement->bindParam(':id', $id);
        $statement->bindParam(':text', $text);
        $statement->bindParam(':created_at', $created_at);
        $statement->bindParam(':user', $user);
        $statement->execute();
    }

    echo "Data saved successfully!";
} catch (PDOException $e) {
    echo "Error: " . $e->getMessage();
}

The above code will parse the contents of the $tweets array and store the specified data in the database table tweets.
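The article does not show the definition of the tweets table. The sketch below gives one possible definition: the column names come from the INSERT statement above, while the column types, lengths, and character set are assumptions to adjust for your data. The names CREATE_TWEETS_TABLE_SQL and createTweetsTable are hypothetical.

```php
<?php
// Assumed schema: only the column names are taken from the INSERT statement;
// types and lengths are guesses. utf8mb4 is chosen so that emoji can be stored.
const CREATE_TWEETS_TABLE_SQL = <<<'SQL'
CREATE TABLE IF NOT EXISTS tweets (
    id         VARCHAR(32) NOT NULL PRIMARY KEY,
    text       TEXT        NOT NULL,
    created_at DATETIME    NOT NULL,
    user       VARCHAR(64) NOT NULL
) DEFAULT CHARSET = utf8mb4
SQL;

/**
 * Create the table if it does not exist yet.
 * $dbh is an open PDO connection to the MySQL database.
 */
function createTweetsTable(PDO $dbh): void
{
    $dbh->exec(CREATE_TWEETS_TABLE_SQL);
}
```

Because id is the primary key in this sketch, inserting a tweet that is already stored raises a PDOException, so a crawler that runs repeatedly needs to skip or update existing rows.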

5. Complete code

require_once 'vendor/autoload.php'; // composer's autoloader, created by the install step above
use Abraham\TwitterOAuth\TwitterOAuth;

$consumerKey = "your_consumer_key";
$consumerSecret = "your_consumer_secret";
$accessToken = "your_access_token";
$accessTokenSecret = "your_access_token_secret";
$connection = new TwitterOAuth($consumerKey, $consumerSecret, $accessToken, $accessTokenSecret);

$tweets = $connection->get("search/tweets", array("q" => "php", "count" => 100));

try {
    $dbh = new PDO('mysql:host=localhost;dbname=your_database_name', 'your_username', 'your_password');
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Convert the decoded response into nested arrays; fall back to an empty list if "statuses" is missing
    $tweetsArray = json_decode(json_encode($tweets), true)['statuses'] ?? [];

    // Prepare the statement once and reuse it for every tweet
    $statement = $dbh->prepare("INSERT INTO tweets (id, text, created_at, user) VALUES (:id, :text, :created_at, :user)");

    foreach ($tweetsArray as $tweet) {
        $id = $tweet['id_str'];
        $text = $tweet['text'];
        $created_at = date("Y-m-d H:i:s", strtotime($tweet['created_at']));
        $user = $tweet['user']['screen_name'];

        // Save the tweet to the database
        $statement->bindParam(':id', $id);
        $statement->bindParam(':text', $text);
        $statement->bindParam(':created_at', $created_at);
        $statement->bindParam(':user', $user);
        $statement->execute();
    }

    echo "Data saved successfully!";
} catch (PDOException $e) {
    echo "Error: " . $e->getMessage();
}
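The parsing step inside the loop can also be factored into a small function, which makes it testable without a database. This is a sketch only: the name tweetToRow is hypothetical, the date format is assumed to be the v1.1 form such as "Wed Oct 10 20:19:24 +0000 2018", and unlike the code above (which uses date() in the server's default timezone) it always stores the timestamp in UTC.

```php
<?php
/**
 * Map one decoded tweet (as an associative array) to the named
 * parameters of the INSERT statement used in this article.
 */
function tweetToRow(array $tweet): array
{
    // Assumed v1.1 date format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
    $createdAt = DateTimeImmutable::createFromFormat('D M d H:i:s O Y', $tweet['created_at']);
    if ($createdAt === false) {
        throw new InvalidArgumentException('Unrecognised created_at: ' . $tweet['created_at']);
    }

    return [
        ':id'         => $tweet['id_str'],
        ':text'       => $tweet['text'],
        // Normalise to UTC before formatting for a DATETIME column.
        ':created_at' => $createdAt->setTimezone(new DateTimeZone('UTC'))->format('Y-m-d H:i:s'),
        ':user'       => $tweet['user']['screen_name'],
    ];
}
```

With this helper, the body of the loop reduces to $statement->execute(tweetToRow($tweet)), since PDOStatement::execute accepts an array of named parameters.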

6. Notes

  1. The Twitter API is rate limited. Each application can make only a certain number of requests in every 15-minute window, and requests beyond that limit are rejected until the window resets.
  2. The Twitter API returns data in JSON format. The TwitterOAuth library decodes it into PHP objects, which the code above converts to arrays with json_encode and json_decode.
  3. It is recommended to store Twitter data in the database for subsequent analysis and processing.
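For the rate limit in note 1, the API reports the state of the current window in response headers (x-rate-limit-remaining and x-rate-limit-reset, the latter as a Unix timestamp). The sketch below computes how long to wait from those two values. The function name secondsUntilReset is hypothetical, and the underscore-style array keys are an assumption about how your HTTP client exposes the headers.

```php
<?php
/**
 * Seconds to wait before sending the next request.
 *
 * $headers: rate-limit headers of the last response. The key names used
 *           here are an assumption; adjust them to what your client returns.
 * $now:     current Unix timestamp, passed in so the function is testable.
 */
function secondsUntilReset(array $headers, int $now): int
{
    // If the headers are absent, assume requests are still allowed.
    $remaining = (int) ($headers['x_rate_limit_remaining'] ?? 1);
    $reset = (int) ($headers['x_rate_limit_reset'] ?? 0);

    if ($remaining > 0) {
        return 0; // quota left in the current window
    }
    // Quota exhausted: wait until the window resets, never a negative time.
    return max(0, $reset - $now);
}
```

A crawler loop could call sleep(secondsUntilReset($headers, time())) before each request. With TwitterOAuth, the headers of the last response should be available through $connection->getLastXHeaders(), though it is worth checking the key format in the version you install.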

7. Summary

This article introduces how to use PHP to write a simple Twitter crawler and store the data in the database. Although using the Twitter API can greatly simplify the process of data acquisition, you still need to pay attention to the limitations of the API and the data parsing and storage process in actual development. Learning and mastering these basic skills can provide a good foundation for future data analysis and processing.
