Extracting text multiple times

Z时代
2024-01-10
Category: Q&A

I have sample text data as follows:

1; ABC; 111; 10-NOV-2017 2; abc; 222; 11-NOV-2017 3; ABC; 333; 12-NOV-2017

Given the two inputs ABC and 11-NOV-2017, I want to extract the string that lies between the two.

How can I get this result using a regex? Is there any other way to achieve the same thing?

The actual data looks like this:

113434;Axis Gold ETF; 2651.2868; 2651.2868; 2651.2868; 20-NOV-2017 113434;Axis Gold ETF; 2627.6778; 2627.6778; 2627.6778; 21-NOV-2017 113434;Axis Gold ETF; 2624.1880; 2624.1880; 2624.1880; 22-NOV-2017

Any help is highly appreciated. Thanks!

Answer:

Here are two ways to extract the desired substring (if it exists). We are given the following.

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017" 
before_str = "abc;" 
date_str = ";11-nov-2017"

I assume that the value of date_str occurs in str at most once.

#1 Use a regular expression

r =/
    .*   # match any number of characters greedily 
    #{before_str} # match the content of the variable 'before_str' 
    (.*)   # match any number of characters greedily, in capture group 1 
    #{date_str} # match the content of the variable 'date_str' 
    /x   # free-spacing regex definition mode 
    #=> /.*abc;(.*);11-nov-2017/x 
str[r,1] 
    #=> "222"

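The key here is the .* at the beginning of the regular expression. Because it is a greedy match, it causes the next match to be the last instance of "abc;" (the value of before_str) that precedes ";11-nov-2017" (the value of date_str).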
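To make the effect of the leading .* concrete, here is a small sketch that runs the pattern with and without it on the same data. The variable names `b`, `d`, `without_prefix` and `with_prefix` are illustrative, and the use of `Regexp.escape` is an added precaution that is not part of the answer above.

```ruby
str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
before_str = "abc;"
date_str = ";11-nov-2017"

# Escape the inputs so that characters such as "." or "(" are matched literally.
b = Regexp.escape(before_str)
d = Regexp.escape(date_str)

# Without the leading .*, the match starts at the FIRST "abc;" in the string.
without_prefix = str[/#{b}(.*)#{d}/, 1]

# With the leading .*, backtracking settles on the LAST "abc;" before date_str.
with_prefix = str[/.*#{b}(.*)#{d}/, 1]

puts without_prefix  # 111;10-nov-2017 2;abc;222
puts with_prefix     # 222
```

Because date_str is assumed to occur at most once, making the capture group lazy, as in (.*?), would not help here: the match would still begin at the first "abc;".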

#2 Determine the start and end indices of the desired substring

idx_date = str.index(date_str) 
    #=> str.index(";11-nov-2017") => 31 
idx_before = str.rindex(before_str, idx_date-before_str.size) 
    #=> str.rindex("abc;", 27) => 24 
str[idx_before + before_str.size..idx_date-1] 
    #=> str[24+4..31-1] => str[28..30] => "222"

If either idx_date or idx_before is nil, nil should be returned and the last expression should not be evaluated.
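One way to add those nil checks is to wrap the three steps in a method. This is only a sketch: the method name `extract_between` and the local `start_limit` are hypothetical and do not come from the answer. It also guards against a negative second argument to rindex, which Ruby would otherwise interpret as an offset from the end of the string.

```ruby
# Hypothetical helper: returns the text between the last `before_str`
# that precedes `date_str` and `date_str` itself, or nil if either is missing.
def extract_between(str, before_str, date_str)
  idx_date = str.index(date_str)
  return nil if idx_date.nil?

  # Latest index at which before_str could start and still end before date_str.
  start_limit = idx_date - before_str.size
  return nil if start_limit.negative?

  idx_before = str.rindex(before_str, start_limit)
  return nil if idx_before.nil?

  str[idx_before + before_str.size..idx_date - 1]
end

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
p extract_between(str, "abc;", ";11-nov-2017")  # "222"
p extract_between(str, "abc;", ";13-nov-2017")  # nil
```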

See String#rindex, and in particular what its optional second argument does.
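As a quick illustration of that second argument, the sketch below calls rindex on the same string with and without a limit. The variable names are illustrative only.

```ruby
str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"

last_overall  = str.rindex("abc;")      # no limit: last occurrence in the whole string
last_up_to_27 = str.rindex("abc;", 27)  # last occurrence starting at index 27 or earlier
last_up_to_23 = str.rindex("abc;", 23)  # last occurrence starting at index 23 or earlier

p [last_overall, last_up_to_27, last_up_to_23]  # [46, 24, 2]
```

The second argument limits where a match may begin, which is why the answer passes idx_date - before_str.size: any match starting at or before that index ends before date_str begins.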

(One could write str[idx_before + before_str.size...idx_date], but I find that using three dots in a range is a potential source of errors, so I always use two dots.)

Answer:

You can check the result with: /abc(.*?)10-nov-2017/g.exec("1;abc;111;10-nov-2017 2; abc; 222; 11-nov-2017 3; abc; 333; 12-nov-2017")[1]

Source link: utcz.com/qa/258325.html
