How to Split Comma-Delimited Strings with Embedded Quotes?

By DDD, 2024-12-07 08:11:11

Splitting Delimited Text with Embedded Quotes

When parsing comma-delimited text, a comma that appears inside a quoted field must not be treated as a delimiter. This article shows how to split a string on commas while keeping any text enclosed in double quotes intact.

Consider the following text:

123,test,444,"don't split, this",more test,1

Calling str.split(",") splits on every comma, including the one inside the quoted field, and yields the following result:

123
test
444
"don't split
 this"
more test
1
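
The naive behavior can be reproduced with a minimal sketch. The class and method names here (NaiveSplitDemo, naiveSplit) are hypothetical and chosen only for illustration:

```java
class NaiveSplitDemo {
    // Splits on every comma, with no awareness of quoted fields.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // Prints one field per line; the quoted field is broken in two.
        for (String field : naiveSplit(line)) {
            System.out.println(field);
        }
    }
}
```

The quoted field comes back as two separate array elements, and the second one keeps its leading space.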

However, the goal is to retain the quoted text as a single entity:

123
test
444
"don't split, this"
more test
1

To achieve this, we employ a regular expression-based solution:

String[] arr = str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

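This expression splits only on commas that are followed by an even number of double quotes between the comma and the end of the string. A comma inside a quoted field is followed by an odd number of double quotes (the closing quote of its own field plus any later pairs), so it is not treated as a delimiter.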
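
As a rough, runnable sketch of the expression in use: the class name QuoteAwareSplitDemo and the helper splitOutsideQuotes are hypothetical, and the limit argument of -1 is an assumption added here rather than part of the snippet above.

```java
import java.util.regex.Pattern;

class QuoteAwareSplitDemo {
    // A comma followed by an even number of double quotes up to the end of the input.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper name, chosen for this example.
    static String[] splitOutsideQuotes(String line) {
        // Assumption: the limit of -1 is not in the article's snippet. It keeps
        // trailing empty fields, which split() without a limit would discard.
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // Prints one field per line; the quoted field stays in one piece.
        for (String field : splitOutsideQuotes(line)) {
            System.out.println(field);
        }
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression on every call, which may matter when many lines are processed.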

Understanding the Regular Expression:

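  • ,(?= ... ): Matches a comma only if the look-ahead assertion that follows succeeds. The look-ahead checks the rest of the string without consuming any of it.
  • (?:[^"]*"[^"]*")*: Matches zero or more repetitions of a quote pair: zero or more non-quote characters, a double quote, zero or more non-quote characters, and another double quote. Each repetition consumes exactly two double quotes, so the total is always even.
  • [^"]*: Matches any remaining non-quote characters after the last quote pair.
  • $: Asserts that the end of the string has been reached. Without it, every comma would satisfy the look-ahead.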
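
The parity rule that the look-ahead encodes can also be written out by hand. The following is only an illustrative sketch (the names QuoteParityDemo and delimiterCommas are hypothetical): it reports the index of every comma that is followed by an even number of double quotes, which are the commas the regular expression would split on.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteParityDemo {
    // Returns the indexes of commas followed by an even number of double quotes.
    static List<Integer> delimiterCommas(String line) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) != ',') {
                continue;
            }
            // Count the double quotes between this comma and the end of the line.
            int quotesAfter = 0;
            for (int j = i + 1; j < line.length(); j++) {
                if (line.charAt(j) == '"') {
                    quotesAfter++;
                }
            }
            // An even count means the comma sits outside any quoted field.
            if (quotesAfter % 2 == 0) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        System.out.println(delimiterCommas(line));
    }
}
```

Like the look-ahead, this rescans the remainder of the line for every comma, so the work grows quadratically with line length in the worst case.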

Alternative Syntax:

For readability, you can spread the regular expression over multiple lines with the (?x) flag, which tells the regex engine to ignore whitespace in the pattern:

String[] arr = str.split("(?x)   " + 
                     ",          " +   // Split on comma
                     "(?=        " +   // Followed by
                     "  (?:      " +   // Start a non-capture group
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "  )*       " +   // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
                     "  [^\"]*   " +   // Finally 0 or more non-quotes
                     "  $        " +   // Till the end  (This is necessary, else every comma will satisfy the condition)
                     ")          "     // End look-ahead
                         );
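
A possible variation, sketched here under the assumption that you prefer to keep the explanations inside the pattern itself: Pattern.COMMENTS is the flag equivalent of (?x), and in that mode everything from # to the end of a line is ignored. The class name CommentedPatternDemo and its fields are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CommentedPatternDemo {
    // The compact form of the expression, for comparison.
    static final Pattern COMPACT =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // The same expression with whitespace and # comments, enabled by Pattern.COMMENTS.
    static final Pattern COMMENTED = Pattern.compile(
            ",                            # split on a comma\n"
          + "(?=                          # that is followed by\n"
          + "  (?: [^\"]* \" [^\"]* \" )* # zero or more pairs of quotes\n"
          + "  [^\"]* $                   # then only non-quote characters up to the end\n"
          + ")",
            Pattern.COMMENTS);

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // Both patterns are expected to produce the same fields.
        System.out.println(Arrays.toString(COMPACT.split(line)));
        System.out.println(Arrays.toString(COMMENTED.split(line)));
    }
}
```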

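This approach splits the text correctly as long as the double quotes in the input are balanced; an unmatched quote changes the parity and shifts where the string is split. The surrounding quotes remain part of the resulting tokens.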
