Java: splitting on commas outside quotes
My program reads a line from a file. The line contains comma-separated text, for example:
123,test,444,"don't split, this",more test,1
I would like the split to produce this:
123
test
444
"don't split, this"
more test
1
If I use String.split(","), I get this instead:
123
test
444
"don't split
this"
more test
1
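In other words: the comma inside the substring "don't split, this" is not a separator. How should I handle this?
Answer:
You can try the following regular expression: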
str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
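This splits the string on every comma that is followed by an even number of double quotes. In other words, it splits on commas that are outside double quotes. It works provided the double quotes in your string are balanced.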
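To see the expression at work on the sample line from the question, here is a minimal sketch. The class name QuoteAwareSplit and the helper splitOutsideQuotes are made up for illustration, and the limit argument -1 is an addition of this sketch (it keeps trailing empty fields, which a plain split(regex) would drop); the regex itself is the one given above.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the original answer.
class QuoteAwareSplit {
    // A comma followed by an even number of double quotes up to the end of the input.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] splitOutsideQuotes(String line) {
        // Assumption of this sketch: limit -1 keeps trailing empty fields such as "a,b,".
        return line.split(COMMA_OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // Print one field per line; the quoted field stays in one piece.
        Arrays.stream(splitOutsideQuotes(line)).forEach(System.out::println);
    }
}
```

The quotes stay part of the quoted field; strip them afterwards if you only need the content.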
Explanation:
, // Split on comma
(?= // Followed by
(?: // Start a non-capture group
[^"]* // 0 or more non-quote characters
" // 1 quote
[^"]* // 0 or more non-quote characters
" // 1 quote
)* // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
[^"]* // Finally 0 or more non-quotes
$ // Till the end (This is necessary, else every comma will satisfy the condition)
)
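Because the look-ahead rescans the rest of the line for every comma, long lines can get slow. As an alternative that is not part of the answer above, the same rule (a comma separates fields only when an even number of quotes precedes it) can be sketched as a single pass over the characters. The class and method names are hypothetical, and like the regex it assumes balanced double quotes and no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical alternative to the regex: one pass with a "inside quotes" flag.
class QuoteAwareScanner {
    static List<String> splitOutsideQuotes(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Every double quote toggles the state; the quote itself is kept.
                insideQuotes = !insideQuotes;
                current.append(c);
            } else if (c == ',' && !insideQuotes) {
                // A comma outside quotes ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing comma, so add it explicitly.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        splitOutsideQuotes(line).forEach(System.out::println);
    }
}
```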
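You can even write the regex this way in your code by using the (?x) modifier. The modifier makes the regex engine ignore whitespace in the pattern, so a regex broken across several lines becomes easier to read, like this:
String[] arr = str.split("(?x) " +
", " + // Split on comma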
"(?= " + // Followed by
" (?: " + // Start a non-capture group
" [^\"]* " + // 0 or more non-quote characters
" \" " + // 1 quote
" [^\"]* " + // 0 or more non-quote characters
" \" " + // 1 quote
" )* " + // 0 or more repetition of non-capture group (multiple of 2 quotes will be even)
" [^\"]* " + // Finally 0 or more non-quotes
" $ " + // Till the end (This is necessary, else every comma will satisfy the condition)
") " // End look-ahead
);
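If the same split runs on many lines, one possible variation is to compile the pattern once with the Pattern.COMMENTS flag, which is the flag form of (?x). In that mode a # starts a comment that runs to the end of the line, so the explanation can live inside the regex string itself. This is only a sketch; the class name CommentedSplitPattern and the method split are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the same regex, compiled once in comments mode.
class CommentedSplitPattern {
    static final Pattern COMMA_OUTSIDE_QUOTES = Pattern.compile(
        ",                # split on comma\n" +
        "(?=              # followed by\n" +
        "  (?:            # start of a non-capturing group\n" +
        "    [^\"]*\"     # non-quote characters, then one quote\n" +
        "    [^\"]*\"     # non-quote characters, then one quote\n" +
        "  )*             # group repeated: an even number of quotes\n" +
        "  [^\"]*         # trailing non-quote characters\n" +
        "  $              # up to the end of the input\n" +
        ")                # end of look-ahead\n",
        Pattern.COMMENTS);

    static String[] split(String line) {
        // Pattern.split reuses the compiled pattern instead of recompiling per call.
        return COMMA_OUTSIDE_QUOTES.split(line);
    }

    public static void main(String[] args) {
        for (String field : split("123,test,444,\"don't split, this\",more test,1")) {
            System.out.println(field);
        }
    }
}
```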