Suppose I have several records like the following:
佛山市威尔康乳胶制品有限公司
爱奥乐医疗器械(深圳)有限公司
...
How can I determine which of these records are valid registered company names? I'd appreciate some ideas.
PHP中文网2017-05-18 10:57:32
The most reliable approach is to look each name up on the official industry and commerce (business registration) website and check whether a matching registration record exists. However, that website imposes query limits, CAPTCHAs, and similar restrictions, so keep those in mind if you want to automate the process.
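If you do automate the lookups, one way to stay within the query limits is to deduplicate the names and pause between requests. The following is only a minimal sketch: `check_names` and its `lookup` argument are hypothetical names, and `lookup` stands in for whatever function you write to query the registry for one name. It does not handle CAPTCHAs, and the 2-second default interval is an assumption, not a documented limit.

```python
import time
from typing import Callable, Dict, Iterable


def check_names(
    names: Iterable[str],
    lookup: Callable[[str], bool],      # hypothetical: returns True if the registry has a record
    min_interval: float = 2.0,          # assumed pause between queries, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Look up each distinct name once, waiting between consecutive queries."""
    results: Dict[str, bool] = {}
    first = True
    for raw in names:
        name = raw.strip()
        # Skip blank lines and names that were already checked.
        if not name or name in results:
            continue
        # Wait before every query except the first one.
        if not first:
            sleep(min_interval)
        first = False
        results[name] = lookup(name)
    return results
```

Passing `sleep` as a parameter makes the throttling easy to replace in tests or to swap for a smarter back-off later.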
PHP中文网2017-05-18 10:57:32
This is a Named Entity Recognition (NER) problem. If you just want to apply it, you can simply import jieba.
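As a rough sketch of what using jieba here might look like: its `jieba.posseg` module tags each segmented word with a part-of-speech flag, and in its tag set `nt` marks organization names and `ns` marks place names. Note that jieba is primarily a word segmenter, so a long company name will probably be split into several tokens rather than tagged as a single organization. The helper names `tag_with_jieba` and `org_like_tokens` are illustrative and not part of jieba.

```python
from typing import Iterable, List, Tuple


def tag_with_jieba(text: str) -> List[Tuple[str, str]]:
    """Segment text with jieba and return (word, POS flag) pairs."""
    # Third-party dependency (pip install jieba), imported lazily so the
    # filtering helper below can be used and tested without it.
    import jieba.posseg as pseg
    return [(pair.word, pair.flag) for pair in pseg.cut(text)]


def org_like_tokens(
    pairs: Iterable[Tuple[str, str]],
    org_flags: Tuple[str, ...] = ("nt",),   # "nt" = organization name in jieba's tag set
) -> List[str]:
    """Keep only the words whose POS flag marks them as organization names."""
    return [word for word, flag in pairs if flag in org_flags]


if __name__ == "__main__":
    for name in ["佛山市威尔康乳胶制品有限公司", "爱奥乐医疗器械(深圳)有限公司"]:
        pairs = tag_with_jieba(name)
        print(name, pairs, org_like_tokens(pairs))
```

Inspecting the printed pairs for your own data is a quick way to see whether the tags are useful as a signal, or whether you need a dedicated NER model instead.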
If you want to learn the underlying principles in detail, I recommend the paper "Natural Language Processing (Almost) from Scratch" (Collobert et al., 2011).