
How to use Python to display the distribution of colleges and universities across the country

WBOY
2023-05-01 10:16:12

Data acquisition

To display the distribution of colleges and universities, you must first obtain location data for colleges and universities across the country. The data for this article comes from the Palm College Entrance Examination Network. When this article was written in June 2022, information on 2,822 colleges and universities was obtained. A check of the data showed that, apart from a few null values, the data is complete enough to use. The data has 44 fields in total. This article uses only a few of them; they need no preprocessing and can be read on demand.
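The null-value check mentioned above can be sketched with pandas as follows. This is only an illustration on a made-up three-row table, not the check the author actually ran; the helper name `null_report` and the sample values are assumptions, while `address` and `city_name` are field names that appear later in the article.

```python
import pandas as pd


def null_report(df):
    """Return the number of missing values per column, for columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Made-up sample standing in for the real 44-field school table
sample = pd.DataFrame({
    "name": ["School A", "School B", "School C"],
    "address": ["1 Example Road", None, "3 Example Road"],
    "city_name": ["City X", "City Y", "City Z"],
})

report = null_report(sample)
print(report)
```

On the real data, the same helper could be called on the DataFrame read from the Excel file to see which fields contain the few null values.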


Introduction to data acquisition methods (basic crawler knowledge):

1. Register and log in to the Palm College Entrance Examination Network. Select all schools on the page.

2. Press F12, click Network > Fetch/XHR, then click the <previous page> and <next page> buttons on the school page a few times. The API requests and related information will appear in the XHR panel.

3. Copy the API request from each page turn and compare them. Two parameters change when the page is turned: page and signsafe. page is the page number currently being accessed, and signsafe is an MD5 value that cannot be reversed; however, values seen earlier can be saved and then picked at random for later requests. With this information, all school data can be obtained by repeatedly changing the page number and the signsafe value.

The numFound value in the Response is the total number of schools. Dividing it by the number of schools shown per page gives the total number of pages. You can also click <last page> on the page to see the total number of pages directly. This determines how many requests are needed.

4. Because the website requires login, the request headers must also be captured, such as the Request Method (POST in this case), User-Agent, and so on.

5. With the above information, generate the URLs of all pages in a loop, use requests to send the requests and collect the data of all colleges and universities, then use pandas to write the data to Excel.

Friendly reminder: when collecting data, comply with the website's terms. Set a reasonable time interval in the crawler code, and do not run it during peak access periods.

Getting latitude and longitude

The Palm College Entrance Examination Network is a service site for filling in college application choices after the entrance examination. Although the data obtained has 44 fields, it does not include the latitude and longitude of each school. To display the locations of colleges and universities on a map, the longitude and latitude must be obtained from each school's address.

This article uses the Baidu Map open platform: https://lbsyun.baidu.com/apiconsole/center#/home. Its open interface can return the longitude and latitude of a geographical location.

The steps are:

1. Register and log in to a Baidu account. One account works across the entire Baidu ecosystem (for example, the same account is used for the network disk, the library, and so on).

2. Log in to the Baidu Map open platform, click <control panel>, then click <my application> under <application management>, and then click <create application>. Choose an application name, fill in the other information as prompted, and complete real-name authentication to become an individual developer.

3. After creating the application, you will get the application's AK (shown as <access application>). You can use this AK value to call Baidu's API. The reference code is as follows.

import requests


def baidu_api(addr):
    """Return {'lng': ..., 'lat': ...} for an address via the Baidu geocoding API."""
    url = "http://api.map.baidu.com/geocoding/v3/"
    params = {
        "address": addr,
        "output": "json",
        "ak": "paste the AK of the application you created here"
    }
    req = requests.get(url, params=params)
    res = req.json()
    # A failed lookup may carry no "result" key, so use .get() instead of indexing
    result = res.get("result")
    if result and "location" in result:
        return result["location"]
    else:
        print("Failed to get the longitude and latitude of {}".format(addr))
        return {'lng': '', 'lat': ''}
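To make the parsing step in baidu_api easier to test without a network call or an AK, the response handling can be pulled into a small helper. The sketch below assumes the geocoding response is a JSON object with a numeric `status` field (0 on success) and a `result.location` object holding `lng` and `lat`; the helper name `extract_location` and both sample payloads are invented for illustration and should be checked against the platform's documentation.

```python
def extract_location(res):
    """Return {'lng': ..., 'lat': ...} from a geocoding response dict, or None.

    Assumed response shape (not confirmed by the article):
    {"status": 0, "result": {"location": {"lng": ..., "lat": ...}}}
    """
    if not isinstance(res, dict) or res.get("status") != 0:
        return None
    location = (res.get("result") or {}).get("location")
    if not location or "lng" not in location or "lat" not in location:
        return None
    return {"lng": location["lng"], "lat": location["lat"]}


# Invented payloads: one successful lookup and one failed lookup
ok = {"status": 0, "result": {"location": {"lng": 116.31, "lat": 39.99}}}
failed = {"status": 1, "msg": "no result"}

print(extract_location(ok))
print(extract_location(failed))
```

Returning None on failure lets the caller decide how to record a miss, for example with the empty lng/lat placeholder that baidu_api uses.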

4. After the Baidu Map API call succeeds, read the addresses of all colleges and universities, call the function above for each one to obtain its longitude and latitude, and write the results back to Excel.

import pandas as pd


def get_lng_lat():
    """Geocode every school in school.xlsx and save the result to school_lng_lat.xlsx."""
    df = pd.read_excel('school.xlsx')
    lng_lat = []
    for row_index, row_data in df.iterrows():
        addr = row_data['address']
        # pd.isna() detects missing cells reliably; an identity check against np.nan does not
        if pd.isna(addr):
            addr = row_data['city_name'] + row_data['county_name']
        # print(addr)
        # Keep only the part of the address before the first comma
        loc = baidu_api(addr.split(',')[0])
        lng_lat.append(loc)
    # Column names: 经纬度 = coordinates, 经度 = longitude, 纬度 = latitude
    df['经纬度'] = lng_lat
    df['经度'] = df['经纬度'].apply(lambda x: x['lng'])
    df['纬度'] = df['经纬度'].apply(lambda x: x['lat'])
    df.to_excel('school_lng_lat.xlsx')

The final data adds the 经纬度 (coordinates), 经度 (longitude), and 纬度 (latitude) columns to the original fields.

Individual developers should note that the Baidu Map open platform has a daily quota. When debugging the code, do not run it on all the data at first; get a small demo working first. Otherwise you will have to wait a day or buy additional quota.
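One way to follow this advice is to geocode only a handful of addresses while debugging and to pause between calls. The sketch below is an assumption about how that could be arranged, not the author's code: `geocode_sample`, its `limit` and `delay` parameters, and the `fake_geocode` stand-in are all hypothetical names. In real use, the article's baidu_api function would be passed in place of `fake_geocode`.

```python
import time


def geocode_sample(addresses, geocode, limit=20, delay=0.2):
    """Geocode only the first `limit` addresses, sleeping `delay` seconds after each call."""
    results = []
    for addr in addresses[:limit]:
        results.append((addr, geocode(addr)))
        time.sleep(delay)
    return results


def fake_geocode(addr):
    # Stand-in geocoder so the sketch runs without network access or quota
    return {"lng": 0.0, "lat": 0.0}


addresses = ["address %d" % i for i in range(100)]
sample = geocode_sample(addresses, fake_geocode, limit=5, delay=0)
print(len(sample))
```

Once the small run looks right, the limit can be raised to cover the full list.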


College location display

With the data ready, let's display it on the map.

This article uses ECharts, the open-source data visualization tool that originated at Baidu. ECharts can be used from Python through the pyecharts library, which is very convenient.

Installation command:

pip install pyecharts

1. Mark the location of colleges and universities

from pyecharts.charts import Geo
from pyecharts import options as opts
from pyecharts.globals import GeoType
import pandas as pd


def multi_location_mark():
    """Mark every school on the map as a point."""
    geo = Geo(init_opts=opts.InitOpts(bg_color='black', width='1600px', height='900px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Register each school's coordinates (经度 = longitude, 纬度 = latitude) under its name
    for row_index, row_data in df.iterrows():
        geo.add_coordinate(row_data['name'], row_data['经度'], row_data['纬度'])
    # Every point gets the same placeholder value; only the position matters here
    data_pair = [(name, 2) for name in df['name']]
    geo.add_schema(
        maptype='china', is_roam=True, itemstyle_opts=opts.ItemStyleOpts(color='#323c48', border_color='#408080')
    ).add(
        '', data_pair=data_pair, type_=GeoType.SCATTER, symbol='pin', symbol_size=16, color='#CC3300'
    ).set_series_opts(
        label_opts=opts.LabelOpts(is_show=False)
    ).set_global_opts(
        title_opts=opts.TitleOpts(title='National college location marker map', pos_left='650', pos_top='20',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16))
    ).render('high_school_mark.html')


The marker map suggests that colleges and universities are concentrated along the coast and in the central and eastern regions, with relatively few in the west, especially in high-altitude areas.

2. Draw a heat map of the distribution of colleges and universities

from pyecharts.charts import Geo
from pyecharts import options as opts
from pyecharts.globals import ChartType
import pandas as pd


def draw_location_heatmap():
    """Draw the heat map of school locations."""
    geo = Geo(init_opts=opts.InitOpts(bg_color='black', width='1600px', height='900px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Register each school's coordinates (经度 = longitude, 纬度 = latitude) under its name
    for row_index, row_data in df.iterrows():
        geo.add_coordinate(row_data['name'], row_data['经度'], row_data['纬度'])
    # Every school contributes the same weight to the heat map
    data_pair = [(name, 2) for name in df['name']]
    geo.add_schema(
        maptype='china', is_roam=True, itemstyle_opts=opts.ItemStyleOpts(color='#323c48', border_color='#408080')
    ).add(
        '', data_pair=data_pair, type_=ChartType.HEATMAP
    ).set_series_opts(
        label_opts=opts.LabelOpts(is_show=False)
    ).set_global_opts(
        title_opts=opts.TitleOpts(title='National college distribution heat map', pos_left='650', pos_top='20',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16)),
        visualmap_opts=opts.VisualMapOpts()
    ).render('high_school_heatmap.html')

From the heat map, colleges and universities are most concentrated in the coastal areas, in Beijing, Shanghai, and Guangzhou, and along the Yangtze and Yellow River basins; further west, only Sichuan and Chongqing stand out.

3. Draw a distribution density map by province

from pyecharts.charts import Map
from pyecharts import options as opts
import pandas as pd


def draw_location_density_map():
    """Draw the density map of colleges and universities by province."""
    # Named province_map to avoid shadowing the built-in map()
    province_map = Map(init_opts=opts.InitOpts(bg_color='black', width='1200px', height='700px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Count the number of schools in each province
    s = df['province_name'].value_counts()
    data_pair = [[province, int(s[province])] for province in s.index]
    province_map.add(
        '', data_pair=data_pair, maptype="china"
    ).set_global_opts(
        title_opts=opts.TitleOpts(title='National college distribution density by province', pos_left='500', pos_top='70',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16)),
        visualmap_opts=opts.VisualMapOpts(max_=200, is_piecewise=True, pos_left='100', pos_bottom='100',
                                          textstyle_opts=opts.TextStyleOpts(color='white', font_size=16))
    ).render("high_school_density.html")


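The provincial density map shows that the provinces with the most colleges and universities are concentrated in the central and eastern regions, especially the provinces near Beijing and Shanghai.

4. Distribution of 211 and 985 universities

Filter out the 211 and 985 university data and draw the maps again. (The code is not pasted again; only one line of filtering code needs to be added.)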
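The one-line filter mentioned above might look like the sketch below. The article does not name the fields that flag 211 and 985 membership, so the column names `f211` and `f985`, the value 1 meaning "yes", and the helper name `filter_key_universities` are all placeholders; check the actual field names and values in the downloaded data before using it.

```python
import pandas as pd


def filter_key_universities(df, col_211="f211", col_985="f985", yes_value=1):
    """Keep rows flagged as 211 or 985 (column names and flag value are hypothetical)."""
    mask = (df[col_211] == yes_value) | (df[col_985] == yes_value)
    return df[mask]


# Made-up rows standing in for the real school table
demo = pd.DataFrame({
    "name": ["University 1", "University 2", "University 3"],
    "f211": [1, 2, 1],
    "f985": [1, 2, 2],
})

selected = filter_key_universities(demo)
print(selected["name"].tolist())
```

In the drawing functions, the filtered DataFrame would replace df right after the Excel file is read, and the rest of the code stays the same.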



Statement:
This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it removed.