Python crawler implementation code example for taking names-Python Tutorial-php.cn

Python crawler implementation code example for taking names

Y2J

May 10, 2017 am 11:42 AM


Some things in life go unnoticed until they arrive, and then they turn out to be extremely important and demand a major decision in a short time. Naming your newborn baby is one of them. This article shows how to use a Python crawler to help choose a good name for your child. Readers who need it can use it as a reference.

Preface

I believe every parent has been through this: a child has to be named within two weeks of birth, because the name is needed to apply for the birth certificate. I suspect many people are like me. I was at a loss at first. With so many Chinese characters, I assumed I could pick almost any of them to make a name, but I later realized it was not something to do casually. Every name I came up with seemed unsuitable, so I searched dictionaries and the Internet, and read Tang and Song poetry, the Book of Songs, and even martial arts novels. The names I had thought about for a long time often met objections from family members, for example that they were hard to pronounce or sounded the same as a relative's name. So I fell into a cycle of searching and rejecting, and grew more and more confused.

So I went back to the Internet and searched again, and found many articles such as "A complete list of good baby boy names". These articles list hundreds or thousands of names at once, far too many to take in. There are also many websites and apps that test names: you enter a name and get an eight-character (BaZi) score or a five-grid (WuGe) score. This feature is quite good and can serve as a reference. However, these sites and apps either require entering names one by one, or offer very few names, or cannot meet needs such as a required character, or charge a fee. In the end I could not find one that was really usable.

So I wanted to write a program like this:

The main function is to provide name suggestions in batches. The names are scored against the baby's date and time of birth (the eight characters, BaZi);
The name library can be extended. For example, if you find a batch of good names from the Book of Songs on the Internet and want to see how they score, you can add them and use them;
The characters used in the name can be restricted. For example, some family trees impose a generation character. If the current generation is "国", the name must contain the character "国";
Each name in the list gets a score, so that after sorting in descending order you can read the names from high scores to low scores;

In this way you get a list of names that matches your child's date of birth, your family tree restrictions, and your preferences, with scores for reference. From there you can go through the names one by one to find one you like. Of course, if you have new ideas, you can add new names to the dictionaries at any time and recalculate.

Code structure of the program

Code introduction:

/chinese-name-score Code root directory
/chinese-name-score/main Code directory
/chinese-name-score/main/dicts Dictionary file directory
/chinese-name-score/main/dicts/names_boys_double.txt Dictionary file, two-character names for boys
/chinese-name-score/main/dicts/names_boys_single.txt Dictionary file, single characters for boys' names
/chinese-name-score/main/dicts/names_girls_single.txt Dictionary file, single characters for girls' names
/chinese-name-score/main/dicts/names_grils_double.txt Dictionary file, two-character names for girls
/chinese-name-score/main/outputs Output data directory
/chinese-name-score/main/outputs/names_girls_source_wxy.txt Sample output file
/chinese-name-score/main/scripts Scripts for preprocessing the dictionary files
/chinese-name-score/main/scripts/unique_file_lines.py Removes duplicate names and blank lines from a dictionary file
/chinese-name-score/main/sys_config.py System configuration of the program, including the target URL to crawl and the dictionary file paths
/chinese-name-score/main/user_config.py User configuration of the program, including the baby's birth year, month, day, time, gender and other settings
/chinese-name-score/main/get_name_score.py Entry point of the program
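The article does not show unique_file_lines.py itself, so the following is only a minimal sketch of what such a preprocessing step might look like. The function names and the UTF-8 default encoding are assumptions for illustration, not the author's code.

```python
def unique_lines(lines):
    """Return the names in first-seen order, without blank lines or duplicates."""
    seen = set()
    result = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


def clean_dictionary_file(path, encoding="utf-8"):
    """Rewrite a dictionary file in place (encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        names = unique_lines(f)
    with open(path, "w", encoding=encoding) as f:
        f.write("\n".join(names) + "\n")
    return names
```

Keeping the first-seen order means that names you appended most recently stay at the end of the file, which makes it easier to see what was added.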

How to use the code:

If there is no required character, open the dictionary files names_boys_double.txt and names_grils_double.txt and append any name lists you have found, one name per line;
If there is a required character, open the dictionary files names_boys_single.txt and names_girls_single.txt and append the single characters you like, one per line;
Open user_config.py and configure it. See the next section for the configuration items;
Run the script get_name_score.py;
Check your output file in the outputs directory. You can copy it into Excel for sorting and other operations;

Program configuration

The configuration of the program is as follows:

# coding:GB18030

"""
Write your configuration here
"""

setting = {}

# Required character. If set, the single-character dictionary is used; otherwise the two-character dictionary is used
setting["limit_world"] = "国"
# Family name
setting["name_prefix"] = "李"
# Gender, either 男 (male) or 女 (female)
setting["sex"] = "男"
# Province
setting["area_province"] = "北京"
# City
setting["area_region"] = "海淀"
# Year of birth (Gregorian calendar)
setting['year'] = "2017"
# Month of birth (Gregorian calendar)
setting['month'] = "1"
# Day of birth (Gregorian calendar)
setting['day'] = "11"
# Hour of birth
setting['hour'] = "11"
# Minute of birth
setting['minute'] = "11"
# Name of the output file
setting['output_fname'] = "names_girls_source_xxx.txt"

Based on the configuration item setting["limit_world"], the program automatically decides whether to use the single-character dictionary or the two-character dictionary:

If this item is set, for example to "国", the program combines it with every character in the single-character dictionary, in both positions, and scores the resulting names. For example, both 国浩 (Guohao) and 浩国 (Haoguo) are scored;
If this item is left as an empty string, the program only reads the two-character dictionary, *_double.txt
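The selection rule above can be sketched as follows. This is an illustrative reconstruction, not the code from get_name_score.py. The function name build_candidates and its arguments are hypothetical.

```python
def build_candidates(limit_word, single_chars, double_names):
    """Return the given names to score.

    limit_word   -- the required character, or "" if there is none
    single_chars -- characters read from a *_single.txt dictionary
    double_names -- names read from a *_double.txt dictionary
    """
    if not limit_word:
        # No required character: use the two-character names as they are.
        return list(double_names)

    seen = set()
    candidates = []
    for ch in single_chars:
        # Try the required character in both positions.
        for name in (limit_word + ch, ch + limit_word):
            if name not in seen:
                seen.add(name)
                candidates.append(name)
    return candidates
```

With a required character, every entry of the single-character dictionary yields up to two candidates, so the number of requests sent to the scoring site roughly doubles.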

Principle of the program

This is a simple crawler. Open life.httpcn.com/xingming.asp to see the page: it is a POST form. Fill in the required parameters and click submit, and a results page opens. The bottom of the results page contains the eight-character (BaZi) score and the five-grid (WuGe) score.

To get the scores, the crawler has to do two things: submit the form automatically and fetch the results page, and then extract the scores from that page.

The first is very simple and can be done with urllib2 (the code is in /chinese-name-score/main/get_name_score.py):

 # Python 2: urllib encodes the form fields, urllib2 sends the request
 import urllib
 import urllib2

 post_data = urllib.urlencode(params)
 req = urllib2.urlopen(sys_config.REQUEST_URL, post_data)
 content = req.read()

Here params is a dict of form parameters. Passing the encoded data to urlopen sends a POST request, and the body of the results page is read into content.
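The snippet above uses Python 2 modules. On Python 3 the same step could be sketched roughly as below. The function names are hypothetical, and the GB18030 form encoding is an assumption inferred from the "# coding:GB18030" header and the from_encoding argument used elsewhere in the article. The site may have changed since the article was written.

```python
import urllib.parse
import urllib.request


def build_request(url, params, encoding="gb18030"):
    """Build a POST request carrying the form fields (encoding is an assumption)."""
    # urlencode percent-encodes non-ASCII values using the given encoding.
    post_data = urllib.parse.urlencode(params, encoding=encoding).encode("ascii")
    # Supplying data makes urllib send the request as a POST.
    return urllib.request.Request(url, data=post_data)


def fetch_result_page(url, params, timeout=10):
    """Submit the form and return the raw bytes of the results page."""
    request = build_request(url, params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Splitting the request construction from the network call makes it possible to inspect the encoded form data without contacting the site.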

The parameters of params are set as follows:

 params = {}

 # Date type: 0 means Gregorian calendar, 1 means lunar calendar
 params['data_type'] = "0"
 params['year'] = "%s" % str(user_config.setting["year"])
 params['month'] = "%s" % str(user_config.setting["month"])
 params['day'] = "%s" % str(user_config.setting["day"])
 params['hour'] = "%s" % str(user_config.setting["hour"])
 params['minute'] = "%s" % str(user_config.setting["minute"])
 params['pid'] = "%s" % str(user_config.setting["area_province"])
 params['cid'] = "%s" % str(user_config.setting["area_region"])
 # Favorable five elements: 0 means automatic analysis, 1 means a manually chosen favorable element
 params['wxxy'] = "0"
 params['xing'] = "%s" % (user_config.setting["name_prefix"])
 params['ming'] = name_postfix
 # Gender: 0 means female, 1 means male
 if user_config.setting["sex"] == "男":
     params['sex'] = "1"
 else:
     params['sex'] = "0"

 params['act'] = "submit"
 params['isbz'] = "1"

The second thing is to extract the required scores from the web page. BeautifulSoup4 can do this, and its syntax is also very simple:

 import re
 from bs4 import BeautifulSoup

 soup = BeautifulSoup(content, 'html.parser', from_encoding="GB18030")
 full_name = get_full_name(name_postfix)

 # print soup.find(string=re.compile(u"姓名五格评分"))
 # The scores sit in <p class="chaxun_b"> blocks; the number is in the <b>
 # tag inside the element that follows the label text.
 for node in soup.find_all("p", class_="chaxun_b"):
     node_cont = node.get_text()
     if u'姓名五格评分' in node_cont:
         name_wuge = node.find(string=re.compile(u"姓名五格评分"))
         result_data['wuge_score'] = name_wuge.next_sibling.b.get_text()

     if u'姓名八字评分' in node_cont:
         name_bazi = node.find(string=re.compile(u"姓名八字评分"))
         result_data['bazi_score'] = name_bazi.next_sibling.b.get_text()

In this way the HTML is parsed and the eight-character (BaZi) and five-grid (WuGe) scores are extracted.

Example of running results

1/1287 李国锦 姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1
2/1287 李国铁 姓名八字评分=61 姓名五格评分=89.7 总分=150.7
3/1287 李国晶 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
4/1287 李鸣国 姓名八字评分=21 姓名五格评分=90.3 总分=111.3
5/1287 李柔国 姓名八字评分=64 姓名五格评分=78.3 总分=142.3
6/1287 李国经 姓名八字评分=21 姓名五格评分=89.8 总分=110.8
7/1287 李国蒂 姓名八字评分=22 姓名五格评分=87.2 总分=109.2
8/1287 李国登 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
9/1287 李略国 姓名八字评分=21 姓名五格评分=83.7 总分=104.7
10/1287 李国添 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
11/1287 李国天 姓名八字评分=22 姓名五格评分=83.7 总分=105.7
12/1287 李国田 姓名八字评分=22 姓名五格评分=93.7 总分=115.7

With these scores we can sort the names, which makes for a very practical reference.
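Instead of sorting in Excel, the ranking can also be done in a few lines of Python. The sketch below assumes that each line looks like the sample output above. The regular expression and function names are illustrative, and the total is simply recomputed as the sum of the two scores, which matches the sample.

```python
import re

# Captures the full name and the two scores from one result line.
LINE_RE = re.compile(r"(\S+)\s+姓名八字评分=([\d.]+)\s+姓名五格评分=([\d.]+)")


def parse_line(line):
    """Return (name, bazi, wuge, total), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    name = match.group(1)
    bazi = float(match.group(2))
    wuge = float(match.group(3))
    return name, bazi, wuge, round(bazi + wuge, 1)


def rank_names(lines):
    """Sort the parsed lines by total score, highest first."""
    rows = [row for row in map(parse_line, lines) if row is not None]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Lines that do not match the expected pattern are skipped rather than raising an error, so stray blank lines in the output file do not break the ranking.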

Friendly reminder

The score depends on many factors, such as the time of birth, the required character, and the number of strokes in the required character. These conditions mean that some names cannot score high, so do not be troubled by this. Just look for the names with relatively high scores;
Currently the program can only crawl one website, at http://life.httpcn.com/xingming.asp;
This list is for reference only. I have read that many celebrities and great figures in history had names with very low scores and still made great achievements. A name does have some influence, but sometimes a name that simply sounds good is the best;
After choosing a name from this list, search for it on Baidu, Renren, and similar sites, in case a notorious person has the same name or too many people share it;
The eight-character (BaZi) score is handed down from Chinese tradition, while the five-grid (WuGe) score was invented by the Japanese in modern times. You can also try naming by the Western zodiac. Oddly, the BaZi and WuGe scores differ greatly between websites, which further shows that they are for reference only;

The code for this article has been uploaded to GitHub
