How to retrieve 20 million lines of text data

WBOY, 2016-08-04 09:19:11

A txt file contains 20 million rows of data. Each row pairs an idiom with an English word, separated by an underscore, in the following format:
The Walking Dead_Mother
The Golden Cicada Escapes_Smile
Farewell My Concubine_Love
Unpunished_Eternity
....
Eight Immortals Crossing the Sea_Destiny

How can I quickly search this file by idiom or by English word? Please suggest an algorithm. Thanks in advance.

Reply content:

Is your goal to check whether a given idiom or English word exists in the file, or to count how many times it appears?
Whichever method you use, the whole file most likely has to be read at least once. If you search frequently, the fastest approach is to load all 20 million rows into memory and build an index over them. If the search only runs once, the best you can do is the time it takes to read the entire file (for example, to count occurrences).
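The two cases above can be sketched roughly as follows. This is only an illustration, not a tuned implementation: it assumes each row is `idiom_word` with the last underscore as the separator, the function names `build_index` and `count_occurrences` are made up for this example, and plain Python dictionaries holding 20 million rows may need several gigabytes of memory.

```python
from collections import defaultdict


def build_index(lines):
    """Read every row once and build two in-memory lookup tables.

    Assumes rows look like "idiom_word"; rows without an underscore
    (such as the "...." placeholder) are skipped.
    """
    by_idiom = defaultdict(list)  # idiom -> English words paired with it
    by_word = defaultdict(list)   # English word -> idioms paired with it
    for line in lines:
        idiom, sep, word = line.rstrip("\r\n").rpartition("_")
        if not sep:
            continue
        by_idiom[idiom].append(word)
        by_word[word].append(idiom)
    return by_idiom, by_word


def count_occurrences(lines, term):
    """One-off full scan: count rows whose idiom or word equals term."""
    count = 0
    for line in lines:
        idiom, sep, word = line.rstrip("\r\n").rpartition("_")
        if sep and term in (idiom, word):
            count += 1
    return count


# Small in-memory sample; for the real file you would pass an open file
# object instead, e.g. open("data.txt", encoding="utf-8") (hypothetical name).
sample = ["The Walking Dead_Mother\n", "Farewell My Concubine_Love\n", "....\n"]
by_idiom, by_word = build_index(sample)
print(by_idiom.get("Farewell My Concubine"))
print(by_word.get("Mother"))
print(count_occurrences(sample, "Love"))
```

After the index is built, each lookup is a single dictionary access, so the cost of reading the file is paid only once. The one-off scan keeps nothing in memory but rereads the whole file on every query.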

Set up Solr and build an index over the data; that will greatly improve search efficiency.
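As a rough sketch of what querying such an index could look like, the snippet below builds a request for Solr's standard `/select` handler. The host, the core name `idioms`, and the field names `idiom` and `word` are assumptions made for this example; the rows would have to be indexed into Solr under those fields first, and that step is not shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr core; adjust host, port, and core name to your setup.
SOLR_BASE = "http://localhost:8983/solr/idioms"


def build_select_url(field, term, rows=10):
    """Build a /select URL that searches one field for a quoted phrase."""
    # Escape backslashes and double quotes so the term stays one phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"', "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def search(field, term, rows=10):
    """Send the query to Solr and return the matching documents."""
    with urlopen(build_select_url(field, term, rows)) as resp:
        return json.load(resp)["response"]["docs"]


# Building the URL needs no server; search() needs a running Solr instance.
print(build_select_url("word", "Mother"))
```

Compared with an in-process dictionary, Solr keeps the index on disk and answers queries over HTTP, so the data does not have to be reloaded each time the application starts.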
