Regular Expressions (R & Python)
Regular expressions are available in both R and Python.
1. R: I strongly recommend this blog.
The example table names in table_info are as follows:
du_mtime_cinema_showtime20190606
du_amap_shoppingmall_indoor_201903_d4
du_amap_shopping_mall_info_2017
du_amap_ship_201909
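I want the tables whose names start with "du_amap_" and end with a year and month, so of the tables above I only want the fourth one. In the R code below, the regex escape character (the backslash \) must be written as \\.
"^" means the start of the string and "$" means the end.
Here ^ could be dropped, but $ cannot: grep() matches the pattern anywhere in the string, so without $ the second table (du_amap_shoppingmall_indoor_201903_d4) would also match, because 201903 appears in the middle of its name.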
keyword_all <- '^du_amap_.+2\\d{5}$'  # "du_amap_" prefix, then a six-digit year+month at the very end
keyword_table <- grep(keyword_all, table_info$Tables_in_risingdata, value = TRUE)
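For comparison, here is a minimal Python sketch of the same filter. It assumes the four example table names above sit in a plain list (the R code reads them from table_info$Tables_in_risingdata instead), and the grep() helper is hypothetical, written only to mimic R's grep(value = TRUE). Like grep, re.search looks for the pattern anywhere in the string, so the sketch also shows why dropping ^ is harmless here while dropping $ is not.

```python
import re

# The four example table names from above (a plain list stands in for table_info).
tables = [
    "du_mtime_cinema_showtime20190606",
    "du_amap_shoppingmall_indoor_201903_d4",
    "du_amap_shopping_mall_info_2017",
    "du_amap_ship_201909",
]

def grep(pattern, names):
    """Hypothetical helper: keep the names in which `pattern` matches somewhere."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]

# Inside a raw string r'...' Python needs only one backslash.
full = grep(r"^du_amap_.+2\d{5}$", tables)      # anchored at both ends
no_caret = grep(r"du_amap_.+2\d{5}$", tables)   # ^ dropped
no_dollar = grep(r"^du_amap_.+2\d{5}", tables)  # $ dropped

print(full)
print(no_caret)
print(no_dollar)
```

Here full and no_caret both contain only du_amap_ship_201909, while no_dollar also picks up du_amap_shoppingmall_indoor_201903_d4.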
str_extract_all() (from the stringr package) returns only the parts of a string that match the pattern.
The code below extracts the year and month (everything from the first digit onward):
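library(stringr)  # provides str_extract_all() and re-exports the %>% pipe
table_name_body <- 'amap_cvs_citycount'
month <- str_extract_all(string = keyword_table[p], pattern = '\\d.+') %>% as.character()  # p: index into keyword_table, from a loop not shown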
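The same extraction in Python, as a sketch: re.findall plays the role of str_extract_all, and extract() is a hypothetical wrapper. Because \d.+ means "the first digit and everything after it", it returns just the year and month only when nothing follows the date; the stricter \d{6}$ shown second is an alternative of my own, not the author's pattern, and takes exactly the trailing six digits.

```python
import re

def extract(pattern, name):
    """Hypothetical wrapper: every substring of `name` matching `pattern`."""
    return re.findall(pattern, name)

# The author's pattern: first digit, then everything to the end of the string.
print(extract(r"\d.+", "du_amap_ship_201909"))
print(extract(r"\d.+", "du_amap_shoppingmall_indoor_201903_d4"))  # the _d4 suffix comes along

# Alternative: exactly six digits at the very end of the name.
print(extract(r"\d{6}$", "du_amap_ship_201909"))
print(extract(r"\d{6}$", "du_amap_shoppingmall_indoor_201903_d4"))  # no trailing date, no match
```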
* means the preceding item may appear zero or more times (+ means one or more times), | means "or", and . matches any single character.
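A quick check of these metacharacters with Python's re.fullmatch, which succeeds only if the pattern covers the whole string. The matches() helper and the sample strings are illustrative; the newline case reflects Python's default behavior and may differ in other regex engines.

```python
import re

def matches(pattern, text):
    """True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None

examples = [
    (r"ab*", "a"),        # * : zero b's is allowed, so "a" alone matches
    (r"ab*", "abbb"),
    (r"ab+", "a"),        # + : at least one b is required, so no match
    (r"ab+", "ab"),
    (r"cat|dog", "dog"),  # | : either alternative
    (r"a.c", "abc"),      # . : any single character...
    (r"a.c", "a\nc"),     # ...except a newline, by default in Python
]

for pattern, text in examples:
    print(f"{pattern!r:12} {text!r:8} {matches(pattern, text)}")
```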
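The code below deletes everything up to and including "amap_" (the "du_amap_" prefix) and everything from "_201" onward (the date suffix). The second line removes parenthesized text, parentheses included, from shoppingmall_amap$name.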
keyword <- gsub('.*amap_|_201.*', '', table_name_body)
shoppingmall_amap$name <- gsub('(\\(.*\\))', "", shoppingmall_amap$name)
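In Python, re.sub does what gsub() does here. This sketch reuses the two patterns from the R code (the capture group around the parentheses is dropped because it does not affect the substitution); the string "A(x)B(y)" is a made-up example to show that .* is greedy, so a single match can swallow several parenthesized groups. The non-greedy .*? variant is an alternative, not part of the original code.

```python
import re

# Same alternation as the R gsub(): drop everything up to "amap_" and everything from "_201" on.
prefix_suffix = re.compile(r".*amap_|_201.*")
print(prefix_suffix.sub("", "amap_cvs_citycount"))
print(prefix_suffix.sub("", "du_amap_shoppingmall_indoor_201903_d4"))

# Greedy .* runs from the first "(" to the LAST ")", removing "(x)B(y)" in one go.
print(re.sub(r"\(.*\)", "", "A(x)B(y)"))
# Non-greedy .*? stops at the first ")", so each group is removed separately.
print(re.sub(r"\(.*?\)", "", "A(x)B(y)"))
```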
Latitude and longitude (two digits, a literal dot, then one or more digits):
\\d{2}[.]\\d+
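A sketch of the same pattern in Python, on a hypothetical "latitude,longitude" string. Since \d{2} allows exactly two digits before the dot and findall matches substrings, a three-digit longitude loses its first digit; \d{2,3} is an adjusted variant, not the original pattern.

```python
import re

text = "31.2304,121.4737"  # hypothetical "latitude,longitude" string

# The pattern above: two digits, a literal dot ([.]), then one or more digits.
print(re.findall(r"\d{2}[.]\d+", text))    # the longitude comes out as 21.4737
# Allowing two or three leading digits keeps the full longitude.
print(re.findall(r"\d{2,3}[.]\d+", text))
```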
Find Chinese characters
[\u4E00-\u9FA5\\s]+  # one or more Chinese characters, spaces included
[\u4E00-\u9FA5]+  # one or more Chinese characters, spaces excluded
[\u4E00-\u9FA5]  # a single Chinese character
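The three character classes behave as follows in Python, where re understands \uXXXX escapes even inside a raw string (the sample text is illustrative). In Python the \s needs only one backslash, unlike the \\s in R.

```python
import re

text = "abc 正则 表达式 123"

print(re.findall(r"[\u4e00-\u9fa5\s]+", text))  # one run, spaces included
print(re.findall(r"[\u4e00-\u9fa5]+", text))    # runs split at the spaces
print(re.findall(r"[\u4e00-\u9fa5]", text))     # one character per match
```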
2. Python
Import the modules:
import re
import pandas as pd  # used for the DataFrames below
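Find digits. Note that Python needs only one backslash \ here, because the r'' (raw string) prefix stops Python from interpreting it, whereas R needs two: \\.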
pattern1 = re.compile(r'\d+')
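A small check of this pattern and of the backslash rule; the input string is made up. The last line shows that the raw string r"\d+" and the ordinary string "\\d+" are the same two-character-plus sequence, which is why R, lacking raw strings in this code, writes the doubled form.

```python
import re

pattern1 = re.compile(r"\d+")
print(pattern1.findall("room 101, floor 7"))  # each run of digits is one match

# With the r prefix one backslash is enough; without it the backslash must be doubled, as in R.
print(r"\d+" == "\\d+")
```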
Here we find the numbers in each row of the table that start with (080):
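pattern1 = re.compile(r'\(080\)\d+')  # \( and \) match literal parentheses
fixed_line_all = set()  # start with an empty set of unique numbers
for i in range(len(calls_pd[0])):
    fixed_line = pattern1.findall(calls_pd[0][i])
    fixed_line_all = fixed_line_all.union(fixed_line)  # keep unique numbers only
fixed_line_all = pd.DataFrame(sorted(fixed_line_all))  # sort the set so row order is reproducible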
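The same idea without pandas, as a sketch: it assumes each row is a plain string (in the original the rows come from column 0 of calls_pd), and the calls list, the FIXED_LINE name and unique_fixed_lines() are all hypothetical. set.update adds the matches in place instead of building a new set on every iteration.

```python
import re

FIXED_LINE = re.compile(r"\(080\)\d+")  # "(080)" followed by one or more digits

# Hypothetical call records standing in for calls_pd[0].
calls = [
    "(080)12345678 called (044)22334455",
    "(080)12345678 called (080)87654321",
    "98440 12345 called 140123456",
]

def unique_fixed_lines(rows):
    """Collect every distinct (080)... number found anywhere in the rows."""
    found = set()
    for row in rows:
        found.update(FIXED_LINE.findall(row))
    return sorted(found)  # a set has no order; sort for a reproducible result

print(unique_fixed_lines(calls))
```

If a DataFrame is needed afterwards, pd.DataFrame(unique_fixed_lines(calls)) turns the sorted list into one column.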
Here we extract the first four digits of the numbers that start with 7, 8, or 9:
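pattern2 = re.compile(r'^(7\d{3}|8\d{3}|9\d{3})')  # findall() returns the captured four digits
mobile_bang = set()  # must exist before the first union() below
for i in range(len(fixed_line_bind[1])):  # fixed_line_bind is built earlier (not shown)
    mobile_line = pattern2.findall(fixed_line_bind[1][i])
    mobile_bang = mobile_bang.union(mobile_line)
mobile_bang = pd.DataFrame(sorted(mobile_bang))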
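The alternation 7\d{3}|8\d{3}|9\d{3} can also be written with a character class, ^([789]\d{3}). This sketch compares the two on hypothetical sample strings; because each pattern has one capture group, findall returns only the captured first four digits.

```python
import re

pattern2 = re.compile(r"^(7\d{3}|8\d{3}|9\d{3})")  # the author's alternation
pattern2_short = re.compile(r"^([789]\d{3})")       # equivalent character class

samples = ["98440 65896", "78130 00821", "(080)12345678", "140123456"]  # hypothetical
for s in samples:
    print(s, pattern2.findall(s), pattern2_short.findall(s))
```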
Source: utcz.com/z/388136.html