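This is a PHP implementation that corrects a misunderstanding in the original sentence-splitting approach. There is no need to special-case decimal points or domain names when splitting sentences, because in standard writing the period that ends a sentence is followed by a space, while the period inside "3.14" or "example.com" is not. The only case that needs special handling is a title or abbreviation such as "Mr." in "Mr. Li", where the period is followed by a space but does not end the sentence. The first step is to split the text into paragraphs at line breaks, because some lines, such as those introducing a quote, end with a colon rather than a period.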
<?php
/* TWWY'S ART */

// Split the text into paragraphs at line breaks, dropping empty lines.
function break_passage($text) {
    return preg_split('/\r\n|\r|\n/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Split a paragraph into sentences.
// An English period must be followed by whitespace to end a sentence.
function break_sentence($text) {
    $re = '/# Split sentences on whitespace between them.
    (?<=                # Begin positive lookbehind.
      [.!?]             # Either an end of sentence punct,
    | [.!?][\'"]        # or end of sentence punct and quote.
    )                   # End positive lookbehind.
    (?<!                # Begin negative lookbehind.
      Mr\.              # Skip either "Mr."
    | Mrs\.             # or "Mrs.",
    | Ms\.              # or "Ms.",
    | Jr\.              # or "Jr.",
    | Dr\.              # or "Dr.",
    | Prof\.            # or "Prof.",
    | Sr\.              # or "Sr.",
                        # or... (you get the idea).
    )                   # End negative lookbehind.
    \s+                 # Split on whitespace between sentences.
    /ix';
    $sentences = preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);
    return $sentences;
}

// First split into paragraphs, then split each paragraph into sentences [recommended].
function get_sentence($text) {
    $passage = break_passage($text);
    $return = array();
    foreach ($passage as $value) {
        $return = array_merge($return, break_sentence($value));
    }
    return $return;
}

?>
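To check the claim about decimals, domain names, and titles, the following is a minimal standalone sketch of the same two-step idea. It is not the listing above: the abbreviation list is shortened, the pattern is written on one line, `\R` is swapped in as a shorthand for any line break, and the sample text is made up for illustration.

```php
<?php
// Compact form of the sentence-splitting pattern (shortened abbreviation list, assumption).
$re = '/(?<=[.!?]|[.!?][\'"])(?<!Mr\.|Mrs\.|Dr\.)\s+/i';

// Hypothetical sample: a line ending in a colon, a title, a decimal, a domain name, a quote.
$text = "He wrote:\nMr. Li paid 3.14 dollars at example.com. \"Was it cheap?\" Yes.";

$sentences = array();
// Step 1: split into paragraphs at line breaks. Step 2: split each paragraph into sentences.
foreach (preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY) as $line) {
    $sentences = array_merge($sentences, preg_split($re, $line, -1, PREG_SPLIT_NO_EMPTY));
}

print_r($sentences);
// Four items are expected:
//   He wrote:
//   Mr. Li paid 3.14 dollars at example.com.
//   "Was it cheap?"
//   Yes.
```

The periods in "3.14" and "example.com" are not followed by whitespace, so they never become split points, and the negative lookbehind keeps "Mr. Li" together. Any abbreviation missing from the list, for example "Ms." in this shortened version, would still cause a split.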