search

Home  >  Q&A  >  body text

插入排序 - MongoDB如何正确插入带双引号的文本记录??

使用自带的mongoimport.exe导入test.csv文件(测试内容如下):

name,pass
test1,ztj"ile0
test2,"audreyhepburn"
test3,Xiaoya”””oge521
test4,""520xiangbin

问题:
导入后使用find({name:/^test/})查询,发现pass字段全部显示错误(和csv中原值完全不同,显示为空值或者只有一半文本等)——请问MongoDB如何正确插入带双引号的文本记录??

无论是逐条insert还是批量import都无法插入带双引号的记录,即使使用“\”转义也不行,求大神!

迷茫迷茫2796 days ago921

reply all(1)I'll reply

  • PHPz

    PHPz2017-04-24 09:15:20

    According to CSV standards:

       file = [header CRLF] record *(CRLF record) [CRLF]
       header = name *(COMMA name)
       record = field *(COMMA field)
       name = field
       field = (escaped / non-escaped)
       escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
       non-escaped = *TEXTDATA
       COMMA = %x2C
       CR = %x0D
       DQUOTE =  %x22
       LF = %x0A
       CRLF = CR LF
       TEXTDATA =  %x20-21 / %x23-2B / %x2D-7E
    

    In the examples you gave, test1test4 are all illegal. Although I have not confirmed that MongoDB parses CSV in strict accordance with RFC 4180 standards, there must be a big problem with your file format.

    Therefore, it is recommended to use a tool to standardize your CSV file before importing it into the database. I don’t know how large your data volume is, but this is just simple text processing, and the time consumption should be acceptable.

    The following is a solution. Although it is not perfect, it should be suitable for most situations:

    # 除了首行之外,对于每一行:
    for line in file[1 ...]
      # 将第一个逗号前的部分作为 name, 逗号后的作为 pass
      [1:name, 2:pass] = line.match /^([^,])+,(.*)/
    
      # 如果存在 name 和 pass
      if name and pass
        # 如果在忽略首尾空格的情况下 pass 不是以双引号开头和结尾的,或者 pass 中间存在单个双引号就进行重新转义
        unless pass.trim().match(/^".*"$/) and !pass.match(/[^"]"[^"]/)
          # 重复双引号
          pass = pass.replace /"/, '""'
          # 在前后加上双引号
          pass = '"' + pass + '"'
    
        console.log [name, pass].join ','
    

    https://tools.ietf.org/html/rfc4180

    reply
    0
  • Cancelreply