一尘不染

Java: split a comma-separated string, ignoring commas inside quotes

java

My program reads a line from a file. The line contains comma-separated text, for example:

123,test,444,"don't split, this",more test,1

I want the split to produce:

123
test
444
"don't split, this"
more test
1

If I use String.split(","), I get this instead:

123
test
444
"don't split
 this"
more test
1

In other words, the comma inside the substring "don't split, this" is not a separator. How should I handle this?


2020-03-06

1 answer

一尘不染

You can try the following regular expression:

str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

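This splits the string on every comma that is followed by an even number of double quotes. In other words, it splits on commas that lie outside double-quoted sections. It works as long as the quotes in the string are balanced.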
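As a minimal sketch, the regex can be applied to the sample line from the question like this. The class name QuotedCommaSplit and the helper splitOutsideQuotes are illustrative names, not part of the original answer, and passing -1 as the limit (to keep trailing empty fields) is an assumption about what a caller would want.

```java
final class QuotedCommaSplit {
    // A comma followed by an even number of double quotes up to the end of the input.
    static final String SPLIT_REGEX = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits on commas that are outside double-quoted sections.
    static String[] splitOutsideQuotes(String line) {
        // A negative limit keeps trailing empty fields, e.g. "a,b," gives three fields.
        return line.split(SPLIT_REGEX, -1);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        for (String field : splitOutsideQuotes(line)) {
            System.out.println(field);
        }
    }
}
```

For the sample line this prints six fields, with "don't split, this" kept as a single field and its surrounding quotes left in place.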

Explanation:

,           // Split on comma
(?=         // Followed by
   (?:      // Start a non-capture group
     [^"]*  // 0 or more non-quote characters
     "      // 1 quote
     [^"]*  // 0 or more non-quote characters
     "      // 1 quote
   )*       // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
   [^"]*    // Finally 0 or more non-quotes
   $        // Till the end  (This is necessary, else every comma will satisfy the condition)
)
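The role of the trailing $ can be checked with a small experiment. The sketch below compares the regex with and without the end anchor; the class name EndAnchorDemo and the helper fieldCount are made up for illustration.

```java
final class EndAnchorDemo {
    static final String WITH_ANCHOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
    // Same pattern without $: the look-ahead may match the empty string,
    // so it succeeds after every comma.
    static final String WITHOUT_ANCHOR = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*)";

    // Hypothetical helper: number of fields produced by splitting with the given regex.
    static int fieldCount(String line, String regex) {
        return line.split(regex, -1).length;
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        System.out.println(fieldCount(line, WITH_ANCHOR));    // 6
        System.out.println(fieldCount(line, WITHOUT_ANCHOR)); // 7, same as split(",")
    }
}
```

Without the anchor the look-ahead no longer has to account for every remaining quote, so the comma inside the quoted field is treated as a separator again.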

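You can even write the regex in this commented form in your code by using the (?x) modifier. The modifier makes the regex engine ignore whitespace in the pattern, so a regex split across multiple lines becomes easier to read: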

String[] arr = str.split("(?x)   " + 
                     ",          " +   // Split on comma
                     "(?=        " +   // Followed by
                     "  (?:      " +   // Start a non-capture group
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "    [^\"]* " +   // 0 or more non-quote characters
                     "    \"     " +   // 1 quote
                     "  )*       " +   // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
                     "  [^\"]*   " +   // Finally 0 or more non-quotes
                     "  $        " +   // Till the end  (This is necessary, else every comma will satisfy the condition)
                     ")          "     // End look-ahead
                         );
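If the same split runs on many lines, one possible variation is to compile the pattern once with the Pattern.COMMENTS flag, which is the flag form of the embedded (?x) modifier and also allows # comments inside the pattern itself. This is only a sketch: the class name QuotedCommaSplitter and its split method are illustrative, not part of the original answer.

```java
import java.util.regex.Pattern;

final class QuotedCommaSplitter {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // '#' starts a comment that runs to the end of the line.
    private static final Pattern OUTSIDE_QUOTES = Pattern.compile(
            ",               # split on a comma\n" +
            "(?=             # that is followed by\n" +
            "  (?:           #   a non-capturing group of\n" +
            "    [^\"]* \"   #     non-quote characters, then one quote\n" +
            "    [^\"]* \"   #     non-quote characters, then one quote\n" +
            "  )*            #   repeated zero or more times (even number of quotes)\n" +
            "  [^\"]* $      #   then only non-quote characters until the end\n" +
            ")               # end of look-ahead\n",
            Pattern.COMMENTS);

    // Hypothetical helper: reuses the precompiled pattern for each line.
    static String[] split(String line) {
        return OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        for (String field : split("123,test,444,\"don't split, this\",more test,1")) {
            System.out.println(field);
        }
    }
}
```

Compiling once avoids re-parsing the regex on every call, which String.split would otherwise do for a pattern like this.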
2020-03-06