Use a Python crawler to give your child a good name

高洛峰

Feb 20, 2017 am 10:13 AM

Preface

I believe every parent has been through this: a name has to be chosen within two weeks of the child's birth (it is needed to apply for the birth certificate), and I suspect many people were as lost as I was. At first I thought that with so many Chinese characters available, I could pick almost any of them to make a name. I soon realized it was not something to do casually: whatever I came up with felt unsuitable. So I went through dictionaries, searched online, and read Tang poetry, Song lyrics, the Book of Songs, and even martial arts novels. Yet names I had spent a long time on often met with objections from family members: hard to pronounce, the same as a relative's name, sounding like a relative's name, and so on. I ended up in a cycle of searching and rejecting, and grew more and more confused.

So we went back to the Internet and found many articles such as "A complete list of good baby boy names". These articles list hundreds or even thousands of names at once, far too many to take in. There are also many websites and apps that score names: you enter a name and get a BaZi (八字, "eight characters") score or a Wuge (五格, "five grids") score. This is a useful reference. However, either the names have to be entered one at a time, or the site or app has very few names, or it cannot handle requirements such as a mandatory character, or it starts charging. In the end we could not find anything usable.

So I wanted to write a program with these features:

The main function is to produce names in bulk for reference, scored against the baby's date and time of birth;
You can expand the name library. For example, if you find a batch of good names from the Book of Songs online and want to see how they score, you can add them and use them;
You can require a particular character in the name. For example, some family trees impose one: if the character for the current generation is "国", the name must contain "国";
Every name in the list gets a score, so after sorting in descending order you can read the names from the highest score to the lowest.

The result is a list of names that fits your child's date and time of birth, your family tree restrictions, and your own preferences, with scores for reference. From there you can go through the names one by one and pick the one you like. And if you have new ideas, you can add new names to the word list at any time and recalculate.
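The required-character feature can be illustrated with a small sketch. It is not taken from the article's repository: the function name `candidate_given_names` and the sample characters are hypothetical, and the code is written for Python 3. It pairs each single character with the required character in both orders, in line with the article's later remark that both 国浩 and 浩国 are scored.

```python
def candidate_given_names(single_chars, required_char=""):
    """Build two-character given names that contain required_char.

    Each character in single_chars is paired with required_char in both
    orders. Without a required character an empty list is returned,
    because ready-made two-character names would be used instead.
    """
    if not required_char:
        return []
    names = []
    for ch in single_chars:
        if ch == required_char:
            continue  # assumption: do not double the required character
        names.append(required_char + ch)
        names.append(ch + required_char)
    return names


# Hypothetical sample characters, standing in for a single-character word list.
print(candidate_given_names(["浩", "锦"], "国"))
# → ['国浩', '浩国', '国锦', '锦国']
```

The family name is left out here on purpose; it would be prepended when a full name is submitted for scoring.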

Code structure of the program

Code introduction:

/chinese-name-score Code root directory
/chinese-name-score/main Code directory
/chinese-name-score/main/dicts Dictionary file directory
/chinese-name-score/main/dicts/names_boys_double.txt Dictionary file, two-character names for boys
/chinese-name-score/main/dicts/names_boys_single.txt Dictionary file, single-character names for boys
/chinese-name-score/main/dicts/names_girls_single.txt Dictionary file, single-character names for girls

How to use the code:

Configuration entry point of the program

The configuration of the program is as follows:

# coding:GB18030

"""
Write your configuration here
"""

setting = {}

# Required character: if set, the single-character dictionary is used; otherwise the multi-character dictionary is used
setting["limit_world"] = "国"
# Family name
setting["name_prefix"] = "李"
# Sex: either 男 (male) or 女 (female)
setting["sex"] = "男"
# Province
setting["area_province"] = "北京"
# City
setting["area_region"] = "海淀"
# Year of birth (Gregorian calendar)
setting['year'] = "2017"
# Month of birth (Gregorian calendar)
setting['month'] = "1"
# Day of birth (Gregorian calendar)
setting['day'] = "11"
# Hour of birth
setting['hour'] = "11"
# Minute of birth
setting['minute'] = "11"
# Name of the output file
setting['output_fname'] = "names_girls_source_xxx.txt"
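Since every value in this configuration is a string, a typo such as month "13" would only surface once the form is submitted. The following is a minimal sketch of a sanity check that could be run first. It is not part of the article's code: `check_setting` is a hypothetical helper, the dictionary below copies only a few of the settings, and the sketch is written for Python 3 while the article's code targets Python 2.

```python
from datetime import datetime

# Hypothetical copy of a few values from the configuration above.
setting = {
    "sex": "男",
    "year": "2017",
    "month": "1",
    "day": "11",
    "hour": "11",
    "minute": "11",
}


def check_setting(setting):
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    # The configuration comments allow only these two values for sex.
    if setting.get("sex") not in ("男", "女"):
        problems.append("sex must be 男 or 女")
    try:
        # datetime() rejects impossible values such as month 13 or 30 February.
        datetime(*(int(setting[key]) for key in ("year", "month", "day", "hour", "minute")))
    except (KeyError, ValueError) as exc:
        problems.append("invalid birth date/time: %s" % exc)
    return problems


print(check_setting(setting))
# → []
```

Returning a list of messages rather than raising on the first error lets all problems be reported in one pass.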

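Depending on the configuration item setting["limit_world"], the program automatically decides whether to use the single-character dictionary or the multi-character dictionary:

1. If the item is set, for example to "国", the program combines it with every single character to form names for scoring. For example, both 国浩 and 浩国 are scored.
2. If the item is not set and stays an empty string, the program reads only the two-character dictionaries (*_double.txt).

How the program works

This is a simple crawler. You can open the website (http://life.httpcn.com/xingming.asp, the address given in the tips below) to take a look: it is a POST form. Fill in the required parameters and click submit, and a result page opens. The BaZi score and the Wuge score are at the bottom of that page.

Getting the scores therefore takes two steps: first, the crawler submits the form automatically and fetches the result page; second, the scores are extracted from the result page.

The first step is simple and can be done with urllib2 (the code is in /chinese-name-score/main/get_name_score.py). Note that this is Python 2 code, and that sys_config, user_config, name_postfix, result_data and get_full_name in the snippets below come from elsewhere in the project:

import urllib
import urllib2

# Encode the parameters and submit them; passing data makes this a POST request
post_data = urllib.urlencode(params)
req = urllib2.urlopen(sys_config.REQUEST_URL, post_data)
content = req.read()

Here params is a dict of parameters. Passing it this way submits the data as a POST request, and content then holds the result page.

The params dict is filled in as follows:

params = {}

# Calendar type: 0 means Gregorian calendar, 1 means lunar calendar
params['data_type'] = "0"
params['year'] = "%s" % str(user_config.setting["year"])
params['month'] = "%s" % str(user_config.setting["month"])
params['day'] = "%s" % str(user_config.setting["day"])
params['hour'] = "%s" % str(user_config.setting["hour"])
params['minute'] = "%s" % str(user_config.setting["minute"])
params['pid'] = "%s" % str(user_config.setting["area_province"])
params['cid'] = "%s" % str(user_config.setting["area_region"])
# Favourable five elements: 0 means analyse automatically, 1 means choose them yourself
params['wxxy'] = "0"
params['xing'] = "%s" % (user_config.setting["name_prefix"])
params['ming'] = name_postfix
# 0 means female, 1 means male
if user_config.setting["sex"] == "男":
    params['sex'] = "1"
else:
    params['sex'] = "0"
params['act'] = "submit"
params['isbz'] = "1"

The second step is to extract the scores from the page. We can use BeautifulSoup4 for this, and its syntax is also simple:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(content, 'html.parser', from_encoding="GB18030")
full_name = get_full_name(name_postfix)

# print soup.find(string=re.compile(u"姓名五格评分"))
for node in soup.find_all("p", class_="chaxun_b"):
    node_cont = node.get_text()
    # Wuge (five-grid) score: the value sits in the <b> tag after the label
    if u'姓名五格评分' in node_cont:
        name_wuge = node.find(string=re.compile(u"姓名五格评分"))
        result_data['wuge_score'] = name_wuge.next_sibling.b.get_text()
    # BaZi score, extracted the same way
    if u'姓名八字评分' in node_cont:
        name_bazi = node.find(string=re.compile(u"姓名八字评分"))
        result_data['bazi_score'] = name_bazi.next_sibling.b.get_text()

With this approach the HTML is parsed and the BaZi and Wuge scores are extracted.

Sample run results

In the output below, 姓名八字评分 is the BaZi score, 姓名五格评分 is the Wuge score, and 总分 is the total:

1/1287 李国锦姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1
2/1287 李国铁姓名八字评分=61 姓名五格评分=89.7 总分=150.7
3/1287 李国晶姓名八字评分=21 姓名五格评分=81.6 总分=102.6
4/1287 李鸣国姓名八字评分=21 姓名五格评分=90.3 总分=111.3
5/1287 李柔国姓名八字评分=64 姓名五格评分=78.3 总分=142.3
6/1287 李国经姓名八字评分=21 姓名五格评分=89.8 总分=110.8
7/1287 李国蒂姓名八字评分=22 姓名五格评分=87.2 总分=109.2
8/1287 李国登姓名八字评分=21 姓名五格评分=81.6 总分=102.6
9/1287 李略国姓名八字评分=21 姓名五格评分=83.7 总分=104.7
10/1287 李国添姓名八字评分=21 姓名五格评分=81.6 总分=102.6
11/1287 李国天姓名八字评分=22 姓名五格评分=83.7 总分=105.7
12/1287 李国田姓名八字评分=22 姓名五格评分=93.7 总分=115.7

With these scores we can sort the names, which makes for a very practical reference.

Tips

1. The scores depend on many factors, such as the time of birth, the required character, and the stroke count of that character. These conditions mean that some names simply cannot score high. Do not let that bother you; just look for the names with relatively high scores.
2. At present the program can only scrape one website, at http://life.httpcn.com/xingming.asp
3. This list is for reference only. I have read articles pointing out that many famous and great people in history had very low BaZi name scores and still achieved great things. A name does have some influence, but sometimes a name that rolls off the tongue is the best choice.
4. After picking a name from the list, look it up on Baidu, Renren, and similar sites, in case a notorious person has the same name or the name is already far too common.
5. BaZi scoring is a Chinese tradition, while Wuge scoring was invented by the Japanese in modern times. You can also try the Western zodiac-based naming method. Oddly, BaZi and Wuge scores differ greatly from one website to another, which shows all the more that they are for reference only.

The code for this article has been uploaded to GitHub.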
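The article says the scores can be sorted but does not show how. Below is a minimal sketch for Python 3, assuming the result lines look exactly like the sample output above; the names `LINE_RE`, `parse_line` and `sort_by_total` are hypothetical and do not come from the article's repository.

```python
import re

# Pattern for one result line such as
# "1/1287 李国锦姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1".
LINE_RE = re.compile(
    r"^\d+/\d+\s+(?P<name>.+?)\s*姓名八字评分=(?P<bazi>[\d.]+)"
    r"\s+姓名五格评分=(?P<wuge>[\d.]+)\s+总分=(?P<total>[\d.]+)\s*$"
)


def parse_line(line):
    """Return (name, bazi, wuge, total), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("name"),
        float(match.group("bazi")),
        float(match.group("wuge")),
        float(match.group("total")),
    )


def sort_by_total(lines):
    """Parse result lines and order them from highest to lowest total score."""
    rows = [row for row in map(parse_line, lines) if row is not None]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Three lines copied from the sample output.
sample = [
    "1/1287 李国锦姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1",
    "2/1287 李国铁姓名八字评分=61 姓名五格评分=89.7 总分=150.7",
    "3/1287 李国晶姓名八字评分=21 姓名五格评分=81.6 总分=102.6",
]
for name, bazi, wuge, total in sort_by_total(sample):
    print(name, bazi, wuge, total)
```

Lines that do not match the pattern are skipped silently, which keeps the sketch tolerant of blank lines or headers in an output file. Sorting on a different column only requires changing the index in the `key` function.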
