How can I parse a CSV string with embedded commas in quoted fields using regular expressions in JavaScript?

Mary-Kate Olsen
2024-12-04 16:45:12

Regex-based CSV String Parsing

Problem Statement:

Parse a CSV string whose quoted values may contain commas, splitting only on the commas that fall outside quotes.

Solution Overview:

To properly parse a CSV string that may contain quoted values with escaped characters, a simple split on commas is not enough; the string has to be walked one value at a time. Two regular expressions are employed: one to validate the whole string, and one to extract each value.

CSV Validation Regex:

^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$

This regex ensures that the input string follows the defined CSV format, where:

  • Values can be single-quoted, double-quoted, or unquoted.
  • Quoted values may contain backslash-escaped characters (for example, \' or \").
  • Commas are used as separators.
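
As a quick illustration of these rules, the sketch below runs the validation regex against a handful of inputs. The sample strings and the names reValid, samples and results are made up for this example; the regex is written with the backslash doubled inside each character class, which is how it has to appear in a JavaScript regex literal.

```javascript
// Validation regex: the whole string must be a comma-separated list of
// single-quoted, double-quoted, or unquoted values.
const reValid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;

// Illustrative inputs (not from the article).
const samples = [
  "'a, b', 2, c",       // quoted value with embedded commas
  '"say \\"hi\\"", x',  // escaped double quotes inside a double-quoted value
  "one two, three",     // unquoted values may contain inner spaces
  "'unterminated, 2",   // missing closing quote
  'a "b" c, d',         // quote character inside an unquoted value
];

const results = samples.map((s) => reValid.test(s));
console.log(results); // [true, true, true, false, false]
```

The last two inputs are rejected because a quote may only open or close a value; it cannot appear unescaped in the middle of one.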

Value Parsing Regex:

(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)

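This regex extracts one value at a time from the CSV string, following the same rules as the validation regex. Each kind of value lands in its own capture group: group 1 for single-quoted, group 2 for double-quoted, and group 3 for unquoted values. The regex only captures the raw text between the quotes; removing the backslash from escaped quotes is done afterwards in code.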
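
To see what the capture groups hold, here is a minimal sketch that walks a sample line with String.prototype.matchAll. The sample line and the names reValue, line and fields are assumptions for illustration; the article's own implementation uses replace with a callback instead.

```javascript
// Value regex with the global flag, as required by matchAll.
const reValue = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;

// Illustrative input: 'a, b', "c \"d\"", e f
const line = `'a, b', "c \\"d\\"", e f`;

// Exactly one of the three groups is defined for each match.
const fields = [...line.matchAll(reValue)].map(([, single, double, bare]) => {
  if (single !== undefined) return { kind: "single", raw: single };
  if (double !== undefined) return { kind: "double", raw: double };
  return { kind: "unquoted", raw: bare };
});

for (const f of fields) console.log(f.kind, "->", f.raw);
// single -> a, b
// double -> c \"d\"
// unquoted -> e f
```

Note that the double-quoted value still contains its backslashes at this stage, which is why a separate unescaping step is needed.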

JavaScript Implementation:

function CSVtoArray(text) {
    // Backslashes are doubled inside the character classes so that they match a literal backslash.
    const re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
    const re_value = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;
    // Return null if the input is not a well-formed CSV string.
    if (!re_valid.test(text)) return null;
    const a = [];                     // Initialize array to receive values.
    text.replace(re_value, // "Walk" the string using replace with callback.
        function(m0, m1, m2, m3) {
            // Remove backslash from \' in single quoted values.
            if      (m1 !== undefined) a.push(m1.replace(/\\'/g, "'"));
            // Remove backslash from \" in double quoted values.
            else if (m2 !== undefined) a.push(m2.replace(/\\"/g, '"'));
            else if (m3 !== undefined) a.push(m3);
            return ''; // Return empty string.
        });
    // Handle special case of empty last value.
    if (/,\s*$/.test(text)) a.push('');
    return a;
}

Example Usage:

const csvString = "'string, duppi, du', 23, lala";
const result = CSVtoArray(csvString);
console.log(result); // ["string, duppi, du", "23", "lala"]
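
A few more cases are worth checking: escaped quotes, empty values, and malformed input. The sketch below uses a compact variant named parseCsvLine (a hypothetical name, not from the article) that applies the same two regexes but iterates with matchAll rather than replace; it is meant as an illustration under those assumptions, not a replacement for a full CSV library.

```javascript
const reValid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
const reValue = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;

// Hypothetical variant of CSVtoArray: same rules, matchAll instead of replace.
function parseCsvLine(text) {
  if (!reValid.test(text)) return null;   // reject malformed input
  const values = [];
  for (const [, single, double, bare] of text.matchAll(reValue)) {
    if (single !== undefined) values.push(single.replace(/\\'/g, "'"));
    else if (double !== undefined) values.push(double.replace(/\\"/g, '"'));
    else values.push(bare);               // unquoted value, possibly empty
  }
  if (/,\s*$/.test(text)) values.push(""); // trailing comma means an empty last value
  return values;
}

// Input: "He said \"hi\", twice", 42
console.log(parseCsvLine(`"He said \\"hi\\", twice", 42`)); // ["He said \"hi\", twice", "42"]
console.log(parseCsvLine("a,,b,"));                          // ["a", "", "b", ""]
console.log(parseCsvLine("'oops, 2"));                       // null
```

All values come back as strings, so numeric fields such as 42 need an explicit conversion by the caller.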
