
Mobike crawler analysis - find the API

PHPz (Original), 2017-04-04 10:37:00

Warning: This article is for learning and research only. Please do not use it for illegal purposes.

In the previous article, "Mobike Unofficial Big Data Analysis", I described my analysis of Mobike data during the Spring Festival. I will elaborate on it in the following series of articles, starting with the question: how does my crawler collect this data efficiently?

Why crawl Mobike's data

Mobike was the first shared bicycle service to enter Chengdu. Every day when I come out of the subway station, I can see many bicycles in the app, but when I walk over, the bike is often not there. Some bikes are hidden away somewhere; some may be behind high-rise buildings and cannot be found because of GPS errors; some are parked inside residential compounds, behind a wall, so that riders cannot get to them.

So is there a way to obtain the data for these bicycles and analyze whether they have become "zombie" bikes? Did someone deliberately park them inside a compound so that no one else can reach them?

With these questions in mind, I began to study how to obtain the data.

Where to get the data

If you can see the data, there is always a way to obtain it automatically. The method only determines how efficiently you can obtain it. For analyzing Mobike data, the crawler must be able to collect a large amount of data in a short time (usually about 10 minutes) for the analysis to be useful. So where does the data come from?

The most direct source is the Mobike app. Modern software design separates the front end from the back end, and the same server serves the app, web pages, and other clients. Given this, we only need to figure out the HTTP requests the software makes. Generally speaking, the following tools can help:

Direct packet capture:

Use a proxy to capture and debug HTTP requests:

  • Fiddler 4

  • Charles

  • Packet Capture (Android)

My phone is not rooted, and capturing packets on the router picks up too much noise and does not cope well with HTTPS, so I could only try Fiddler or Charles first. I set Fiddler as the phone's proxy and kept moving the location in the app to see whether any new requests appeared. Unfortunately, all the requests seemed to be fetching Amap map data, and none were related to Mobike.

What was going on? I tried capturing on the phone itself. After switching to Packet Capture, there was indeed traffic, and among the requests I found the one I cared about most:

[Image: screenshot of the captured request, 4372317-de272f8395d2106f.png]

This API request is self-explanatory. I tried it in Postman and it returned the information correctly. It looked like this was the one!

Celebrating too early

After crawling data for several days in a row, I analyzed it and found that the GPS positions of the bikes seemed to jump around constantly. Sometimes a single jump covered several kilometers, which is obviously not a normal value.
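One way to make "jumps of several kilometers" measurable is to compute the great-circle distance between consecutive positions of the same bike and flag steps above a threshold. The following is a minimal sketch, not the code used for the original analysis; the 1 km threshold, the function names, and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def find_jumps(track, max_km=1.0):
    """Return indices i where the step from track[i-1] to track[i] exceeds max_km.

    track is a list of (lat, lon) tuples ordered by capture time.
    The default threshold of 1 km is an assumption, not a value from the article.
    """
    return [i for i in range(1, len(track))
            if haversine_km(*track[i - 1], *track[i]) > max_km]

# Hypothetical positions for one parked bike (made-up sample data).
track = [
    (30.6600, 104.0600),
    (30.6601, 104.0601),
    (30.6900, 104.0600),  # jumps roughly 3 km north
    (30.6600, 104.0600),  # and back again
]
print(find_jumps(track))
```

A bike that is really being ridden also moves, so in practice the threshold would need to take the time between two samples into account.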

Could it be that their interface had been tampered with to return false data? I observed that even in the app, the bicycle positions jumped. From early one morning until the next morning, I refreshed the bikes near my home at intervals to see whether this was really the case.

I can no longer find the pictures, but after observing I came to the conclusion that there is indeed a problem with the locations returned in the app. One bike was parked in a very remote spot. It disappeared for a while, came back later, and this matched the data I had captured. Moreover, the jumping has nothing to do with the phone, the phone number, or even the mobile operator, which shows that it is a problem with Mobike's interface. It also explains, from another angle, why we sometimes see bikes in the app when there are actually none there. Below is a screenshot of a video I posted on Moments earlier. You can see a sharp spike near the entrance of the camp. The bike is actually parked there, but the GPS track shows that within a short time it moves around nearby, even far away, and then returns to that position.

[Image: screenshot of the GPS track]

Such data is simply useless for data analysis, and I almost gave up.

The turnaround

As WeChat mini programs became popular, Mobike quickly launched one as well. I laughed when I saw it: it gave me another data source to try. After capturing the traffic once with Packet Capture, it was easy to identify the API. I will not explain the specific process here. I then crawled two or three days of data and found that things had turned around: the data was consistent with normal bicycle trajectories.
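Once a request has been captured, it can be rebuilt and replayed from a script instead of Postman. The sketch below shows the general shape using only the Python standard library. The endpoint URL, the `latitude` and `longitude` field names, and the header value are placeholders, not the real Mobike API, which the article does not list; substitute whatever the capture actually shows.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and field names; replace with values from the capture.
API_URL = "http://api.example.invalid/nearby_bikes"

def build_request(lat, lon, headers=None):
    """Build a POST request that mirrors a captured form-encoded call."""
    body = urllib.parse.urlencode({"latitude": lat, "longitude": lon})
    req = urllib.request.Request(API_URL, data=body.encode("ascii"), method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    # Copy any headers seen in the capture (User-Agent, Referer, and so on).
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req

def fetch(req, timeout=10):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Building the request does not touch the network; fetch(req) would send it.
req = build_request(30.66, 104.06, {"User-Agent": "captured-client/1.0"})
print(req.get_method(), req.full_url, req.data)
```

Keeping request construction separate from sending makes it easy to compare the rebuilt request with the captured one before any traffic is generated.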

The only thing left is to improve the efficiency of the crawler.
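To get a feel for what "efficient" means here, assume the API answers queries around a single location, so that covering a city means querying a grid of points. That is an assumption: the article only says that moving the location in the app triggers new requests. The sketch below generates such a grid and estimates how many parallel workers are needed to finish inside a time budget. The grid spacing, the request latency, and the bounding box are all made-up illustration values.

```python
import math

KM_PER_DEGREE_LAT = 111.0  # rough average, good enough for planning

def grid_points(south, west, north, east, step_km=0.5):
    """Yield (lat, lon) cell centres of a grid covering a bounding box."""
    lat_step = step_km / KM_PER_DEGREE_LAT
    mid_lat = math.radians((south + north) / 2)
    # A degree of longitude shrinks with the cosine of the latitude.
    lon_step = step_km / (KM_PER_DEGREE_LAT * math.cos(mid_lat))
    rows = math.ceil((north - south) / lat_step)
    cols = math.ceil((east - west) / lon_step)
    for r in range(rows):
        for c in range(cols):
            yield (south + (r + 0.5) * lat_step, west + (c + 0.5) * lon_step)

def workers_needed(n_requests, seconds_per_request, budget_seconds):
    """Parallel workers required to finish n_requests within the budget."""
    return math.ceil(n_requests * seconds_per_request / budget_seconds)

# Hypothetical bounding box of about 11 km by 10 km, with 1 km cells.
points = list(grid_points(30.0, 104.0, 30.1, 104.1, step_km=1.0))
print(len(points), "query points")
# Example budget: 6000 requests at 0.5 s each within 10 minutes.
print(workers_needed(6000, 0.5, 600), "workers")
```

The spacing should match the radius the API actually returns results for; a spacing that is too fine wastes requests, and one that is too coarse misses bikes.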

Other attempts

Sometimes it is very convenient to analyze the app's source code directly to find the API entry points. I decompiled the Mobike Android app, but found that apart from some resource files, which are usable, everything else is packed with Qihoo 360's packer. There are articles online that explain how to unpack it, but I did not have much time to study them, so I gave up.

A word on API design

The reason why Mobike’s API is easy to crawl and analyze is largely because the API design is too simple:

  • It uses only plain HTTP requests, which makes packet capture analysis easy.

  • The requests in these APIs are not encrypted, so the service is easy for others to use.

  • In addition, WeChat mini programs are an important source of leaked APIs. In an app, requests can be encrypted in native code before being sent, but mini programs do not seem to offer such a capability.
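As an illustration of what a less trivially replayable request could look like, the sketch below signs the request parameters with an HMAC and a timestamp, and shows the matching server-side check. This is a generic technique, not Mobike's or any other vendor's actual scheme, and the field names `ts` and `sig` are made up. It only helps when the secret is hard to extract from the client, which is exactly the difficulty with mini programs noted above.

```python
import hashlib
import hmac
import time
import urllib.parse

def _canonical(params):
    """Serialize parameters in a fixed key order so both sides sign the same bytes."""
    return urllib.parse.urlencode(sorted(params.items())).encode("utf-8")

def sign_params(params, secret, timestamp=None):
    """Return a copy of params with hypothetical 'ts' and 'sig' fields added."""
    signed = dict(params)
    signed["ts"] = int(time.time()) if timestamp is None else int(timestamp)
    signed["sig"] = hmac.new(secret, _canonical(signed), hashlib.sha256).hexdigest()
    return signed

def verify_params(signed, secret, max_age=300, now=None):
    """Server-side check: the signature matches and the timestamp is recent."""
    received = dict(signed)
    sig = received.pop("sig", "")
    expected = hmac.new(secret, _canonical(received), hashlib.sha256).hexdigest()
    now = time.time() if now is None else now
    fresh = abs(now - int(received.get("ts", 0))) <= max_age
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sig, expected) and fresh

secret = b"demo-secret"  # placeholder key for the example only
signed = sign_params({"latitude": 30.66, "longitude": 104.06}, secret, timestamp=1000)
print(verify_params(signed, secret, now=1100))
```

With a check like this, a captured request stops working once its timestamp expires, and changing the coordinates invalidates the signature.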

If you are interested, take a look at the requests made by the Xiaolan Bicycle app. They use HTTPS and encrypt the request data, which makes capturing their data considerably more difficult.

Of course, if Mobike itself does not care about this data, such an API design is fine.

