How to use Python to capture administrative division codes
Foreword
The National Bureau of Statistics website publishes a fairly complete list of administrative division codes. Many websites rely on this as basic reference data, so I wrote a Python program to scrape it.
Note: the scraped output still needs some light manual cleanup.
Sample code:
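# -*- coding: utf-8 -*-
'''
Fetch the administrative division codes published by the National Bureau of Statistics.
'''
import re

import requests

base_url = 'http://www.stats.gov.cn/tjsj/tjbz/xzqhdm/201504/t20150415_712722.html'


def get_xzqh():
    # Download the page and decode it so the regex runs on text rather than bytes
    html_data = requests.get(base_url).content.decode('utf-8')
    # Each entry is a paragraph holding a numeric code followed by an indented name
    pattern = re.compile(r'<p class="MsoNormal" style=".*?"><span lang="EN-US" style=".*?">(\d+)<span>.*?</span></span><span style=".*?">(.*?)</span></p>')
    areas = re.findall(pattern, html_data)
    print("code,name,level")
    for code, name in areas:
        # The number of indent characters in the name is used as the level
        print(code, name.replace(' ', ''), name.count(' '), sep=',')


if __name__ == '__main__':
    get_xzqh()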
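The parsing step can be tried offline, without requesting the page. The following is a minimal sketch, assuming Python 3. The function names `parse_divisions` and `to_csv`, the sample HTML fragment, and the use of a full-width space (U+3000) as the indent character are assumptions made for illustration and are not taken from the live page. It returns rows instead of printing them and writes them with the `csv` module, which may reduce the manual cleanup needed afterwards.

```python
import csv
import io
import re

# Same pattern as the scraper above, written as a raw string.
PATTERN = re.compile(
    r'<p class="MsoNormal" style=".*?"><span lang="EN-US" style=".*?">(\d+)'
    r'<span>.*?</span></span><span style=".*?">(.*?)</span></p>'
)

# Assumption: names are indented with full-width spaces (U+3000),
# one per level of nesting. Change this if the page uses another character.
INDENT = '\u3000'


def parse_divisions(html):
    """Return a list of (code, name, level) tuples found in html."""
    rows = []
    for code, raw_name in PATTERN.findall(html):
        level = raw_name.count(INDENT)
        name = raw_name.replace(INDENT, '').strip()
        rows.append((code, name, level))
    return rows


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['code', 'name', 'level'])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical fragment shaped to match the pattern; not copied from the site.
SAMPLE_HTML = (
    '<p class="MsoNormal" style="a"><span lang="EN-US" style="b">110000'
    '<span>&nbsp;</span></span><span style="c">\u3000北京市</span></p>'
    '<p class="MsoNormal" style="a"><span lang="EN-US" style="b">110100'
    '<span>&nbsp;</span></span><span style="c">\u3000\u3000市辖区</span></p>'
)

if __name__ == '__main__':
    print(to_csv(parse_divisions(SAMPLE_HTML)), end='')
```

To use it on the real page, pass the decoded response text to `parse_divisions` and check a few rows by hand, since the page markup may differ from the pattern.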
Notes:
There is also another source for country and region data: the table that ships with the QQ client, in a file named LocList.xml. It is usually stored at C:\Program Files\Tencent\QQ\I18N\2052.
To get the Chinese version of the file, install the Chinese edition of QQ; for the English version, install the English edition. In the international edition, the file is in the 1033 directory. (2052 and 1033 are the Windows locale IDs for Simplified Chinese and US English.)
The codes all follow the ISO 3166 standard, so they are easy to import into a database.
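As a rough illustration of the import step, the sketch below loads this kind of XML into SQLite using only the standard library. The layout of LocList.xml is not described in this article, so the element and attribute names used here (`CountryRegion`, `State`, `City`, `Name`, `Code`), the sample data, and the table name `locations` are all assumptions. Inspect the real file and adjust the names before relying on it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample; the real LocList.xml may use different names and codes.
SAMPLE_XML = """<Location>
  <CountryRegion Name="China" Code="CHN">
    <State Name="Beijing" Code="11">
      <City Name="Dongcheng" Code="1"/>
      <City Name="Xicheng" Code="2"/>
    </State>
  </CountryRegion>
  <CountryRegion Name="Albania" Code="ALB"/>
</Location>"""


def flatten(root):
    """Yield (country_code, country, state, city); missing levels are None."""
    for country in root.iter('CountryRegion'):
        code, name = country.get('Code'), country.get('Name')
        states = country.findall('State')
        if not states:
            yield (code, name, None, None)
        for state in states:
            cities = state.findall('City')
            if not cities:
                yield (code, name, state.get('Name'), None)
            for city in cities:
                yield (code, name, state.get('Name'), city.get('Name'))


def load_into_sqlite(xml_text, conn):
    """Parse xml_text and insert one row per location into `locations`."""
    root = ET.fromstring(xml_text)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS locations '
        '(country_code TEXT, country TEXT, state TEXT, city TEXT)'
    )
    conn.executemany('INSERT INTO locations VALUES (?, ?, ?, ?)', flatten(root))
    conn.commit()


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    load_into_sqlite(SAMPLE_XML, conn)
    for row in conn.execute('SELECT * FROM locations'):
        print(row)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`, and a file-backed database can replace the in-memory one.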
Summary
That covers how to obtain administrative division codes with Python. I hope this article is helpful to anyone learning or using Python. If you have any questions, feel free to leave a comment.