Regular Expressions (R & Python)


Regular expressions are supported in both R and Python.

1. R (I strongly recommend this blog)

Some example table names from table_info:

du_mtime_cinema_showtime20190606

du_amap_shoppingmall_indoor_201903_d4

du_amap_shopping_mall_info_2017

du_amap_ship_201909

I want only the tables whose names start with "du_amap_" and end with a year and month (six digits, YYYYMM), so of the tables above only the fourth one qualifies. In R strings the regex escape character, the backslash \, must itself be escaped, so write \\ instead of \.

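"^" marks the start of the string and "$" marks the end.

For these example names "^" could be dropped, but "$" cannot: grep() matches the pattern anywhere in a string, so without "$" du_amap_shoppingmall_indoor_201903_d4 would also match, because it contains 201903 before "_d4".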

keyword_all <- '^du_amap_.+2\\d{5}$'

keyword_table <- grep(keyword_all, table_info$Tables_in_risingdata, value = TRUE)  # value = TRUE returns the matching names rather than their indices
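For comparison, here is a minimal Python sketch of the same filter applied to the four example names above. The list `tables` is a hypothetical stand-in for table_info$Tables_in_risingdata. Python's re.search, like R's grep(), looks for a match anywhere in the string, so the sketch also shows what happens when "$" is removed.

```python
import re

# Hypothetical stand-in for table_info$Tables_in_risingdata
tables = [
    "du_mtime_cinema_showtime20190606",
    "du_amap_shoppingmall_indoor_201903_d4",
    "du_amap_shopping_mall_info_2017",
    "du_amap_ship_201909",
]

# Same pattern as keyword_all; in a raw string one backslash is enough
with_end = [t for t in tables if re.search(r"^du_amap_.+2\d{5}$", t)]

# Without "$", a name that merely contains 201903 in the middle also passes
without_end = [t for t in tables if re.search(r"^du_amap_.+2\d{5}", t)]

print(with_end)      # ['du_amap_ship_201909']
print(without_end)   # ['du_amap_shoppingmall_indoor_201903_d4', 'du_amap_ship_201909']
```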

 

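str_extract_all() from the stringr package returns only the parts of a string that match the pattern.

The code below extracts the year and month: '\\d.+' matches from the first digit to the end of the name, which is the trailing six digits as long as no earlier digit appears in the name.

library(stringr)   # str_extract_all()
library(magrittr)  # %>%
table_name_body <- 'amap_cvs_citycount'
month <- str_extract_all(string = keyword_table[p], pattern = '\\d.+') %>% as.character()  # p presumably indexes keyword_table; the surrounding loop is not shown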
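A rough Python counterpart of the str_extract_all() call uses re.findall. The name du_amap_ship_201909 comes from the example list; du_amap_top5_201909 is a hypothetical name with an earlier digit, included to show where '\d.+' over-matches. The stricter '\d{6}$' variant is only a suggestion, not the author's pattern.

```python
import re

name = "du_amap_ship_201909"    # from the example list
other = "du_amap_top5_201909"   # hypothetical name with an earlier digit

# Counterpart of str_extract_all(name, '\\d.+'): first digit through end of string
loose = re.findall(r"\d.+", name)
other_loose = re.findall(r"\d.+", other)

# Stricter alternative (a suggestion): exactly six digits at the very end
strict = re.findall(r"\d{6}$", name)
other_strict = re.findall(r"\d{6}$", other)

print(loose, strict)              # ['201909'] ['201909']
print(other_loose, other_strict)  # ['5_201909'] ['201909']
```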

 

"*" means the preceding item appears zero or more times ("+" means one or more times), "|" means "or", and "." matches any single character.
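A quick Python check of these metacharacters; for simple patterns like these they should behave the same way in R. The test strings are arbitrary illustrations.

```python
import re

# * : zero or more of the preceding item, so "a" matches ab*
star_zero = bool(re.fullmatch(r"ab*", "a"))
# + : one or more of the preceding item, so "a" does not match ab+
plus_zero = bool(re.fullmatch(r"ab+", "a"))
# . : any single character (except a newline, by default)
dot_any = bool(re.fullmatch(r"a.c", "a-c"))
# | : alternation, match either side
alternation = re.findall(r"cat|dog", "cat, dog, cow")

print(star_zero, plus_zero, dot_any)  # True False True
print(alternation)                    # ['cat', 'dog']
```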

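The first line below deletes everything up to and including "amap_" and everything from "_201" onward (for example, du_amap_ship_201909 becomes ship). The second line removes the text in parentheses from the name column.

keyword <- gsub('.*amap_|_201.*', '', table_name_body)
shoppingmall_amap$name <- gsub('(\\(.*\\))', '', shoppingmall_amap$name)  # .* is greedy: removes from the first "(" to the last ")"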
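In Python, re.sub plays the role of gsub(). This sketch applies both patterns to sample strings; the mall name 万达广场(朝阳店) is a hypothetical value, not taken from the author's data. It also contrasts the greedy ".*" with Python's lazy ".*?", which removes each parenthesised group separately.

```python
import re

# Counterpart of gsub('.*amap_|_201.*', '', x)
prefix_suffix = re.sub(r".*amap_|_201.*", "", "du_amap_ship_201909")
body = re.sub(r".*amap_|_201.*", "", "amap_cvs_citycount")

# Counterpart of gsub('(\\(.*\\))', '', x) on a hypothetical mall name
mall = re.sub(r"\(.*\)", "", "万达广场(朝阳店)")

# .* is greedy: it runs from the first "(" to the LAST ")"
greedy = re.sub(r"\(.*\)", "", "A(x)B(y)")
# .*? is lazy: each (...) is removed on its own
lazy = re.sub(r"\(.*?\)", "", "A(x)B(y)")

print(prefix_suffix, body, mall)  # ship cvs_citycount 万达广场
print(greedy, lazy)               # A AB
```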

 

Latitude and longitude (two digits, a literal dot, then one or more digits):

\\d{2}[.]\\d+
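Here is the same pattern in Python, applied to a hypothetical "longitude,latitude" string. Because \d{2} allows exactly two digits before the dot, a three-digit longitude such as 116.x is cut to its last two digits; \d{2,3} is a possible tweak, suggested here rather than taken from the original.

```python
import re

coords = "116.397428,39.90923"  # hypothetical "longitude,latitude" string

# Same pattern as the R version: two digits, a literal dot, then digits
two = re.findall(r"\d{2}[.]\d+", coords)
# Allowing two or three digits keeps the full longitude
two_or_three = re.findall(r"\d{2,3}[.]\d+", coords)

print(two)           # ['16.397428', '39.90923']
print(two_or_three)  # ['116.397428', '39.90923']
```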

 

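Finding Chinese characters (\u4E00-\u9FA5 is the basic CJK Unified Ideographs range):

[\u4E00-\u9FA5\\s]+  # one or more Chinese characters, whitespace allowed

[\u4E00-\u9FA5]+  # one or more Chinese characters, no whitespace

[\u4E00-\u9FA5]  # exactly one Chinese character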
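The three character classes translate directly to Python, whose re module understands \uXXXX escapes in the pattern itself. The sample text is a made-up illustration.

```python
import re

text = "abc北京 上海def"

with_space = re.findall(r"[\u4e00-\u9fa5\s]+", text)  # runs of Chinese, whitespace allowed
no_space = re.findall(r"[\u4e00-\u9fa5]+", text)      # runs of Chinese only
single = re.findall(r"[\u4e00-\u9fa5]", text)         # one character at a time

print(with_space)  # ['北京 上海']
print(no_space)    # ['北京', '上海']
print(single)      # ['北', '京', '上', '海']
```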

 

2. Python

Import the modules (pandas is used further down):

import re
import pandas as pd

Find numbers. Note that in a Python raw string (r'...') one backslash is enough for an escape, whereas an R string needs two: \\

pattern1 = re.compile(r'\d+')
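A small Python illustration of the escaping note: a raw string keeps the backslash literally, so r"\d+" is the same text as the doubled "\\d+" you would write in R. The sample input is one of the example table names.

```python
import re

# A raw string keeps the backslash, so r"\d+" is the same text as "\\d+"
same_text = r"\d+" == "\\d+"

pattern1 = re.compile(r"\d+")
numbers = pattern1.findall("du_amap_shoppingmall_indoor_201903_d4")

print(same_text)  # True
print(numbers)    # ['201903', '4']
```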

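Here we find, in each row of the table, the numbers that begin with (080):

pattern1 = re.compile(r'\(080\)\d+')
fixed_line_all = set()  # collect unique matches
for i in range(len(calls_pd[0])):  # calls_pd: DataFrame defined elsewhere (not shown)
    fixed_line = pattern1.findall(calls_pd[0][i])
    fixed_line_all = fixed_line_all.union(fixed_line)
fixed_line_all = pd.DataFrame(sorted(fixed_line_all))  # a set is unordered, so sort it first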
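The loop above can be sketched without pandas by using a set comprehension. The list `rows` is hypothetical sample data standing in for calls_pd[0]; only matches that start with "(080)" are kept, and duplicates collapse into one entry.

```python
import re

# Hypothetical stand-in for the rows of calls_pd[0]
rows = [
    "(080)12345 called (022)55501",
    "(080)12345 called (080)99887",
    "14077 called 90123",
]

pattern1 = re.compile(r"\(080\)\d+")

# Collect unique matches across all rows
fixed_lines = {m for row in rows for m in pattern1.findall(row)}

print(sorted(fixed_lines))  # ['(080)12345', '(080)99887']
```

With pandas, pd.DataFrame(sorted(fixed_lines)) would turn the result into a one-column frame as in the original code.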

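Here we extract the first four digits of the numbers that begin with 7, 8 or 9:

pattern2 = re.compile(r'^(7\d{3}|8\d{3}|9\d{3})')
mobile_bang = set()  # collect unique prefixes
for i in range(len(fixed_line_bind[1])):  # fixed_line_bind: DataFrame defined elsewhere (not shown)
    mobile_line = pattern2.findall(fixed_line_bind[1][i])
    mobile_bang = mobile_bang.union(mobile_line)
mobile_bang = pd.DataFrame(sorted(mobile_bang))  # a set is unordered, so sort it first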
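A minimal sketch of the prefix extraction on hypothetical numbers (stand-ins for fixed_line_bind[1]). It also shows that the character class [789] is a shorter way to write the three alternatives; that rewrite is a suggestion, not the author's pattern. Because the pattern has one group, findall returns the group's text.

```python
import re

# Hypothetical numbers standing in for fixed_line_bind[1]
numbers = ["98765 43210", "7123 456789", "140 1234", "81234 56789"]

original = re.compile(r"^(7\d{3}|8\d{3}|9\d{3})")
compact = re.compile(r"^([789]\d{3})")  # same matches, shorter pattern

prefixes = {p for n in numbers for p in original.findall(n)}
prefixes_compact = {p for n in numbers for p in compact.findall(n)}

print(sorted(prefixes))              # ['7123', '8123', '9876']
print(prefixes == prefixes_compact)  # True
```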

 

 

 

Source: utcz.com/z/388136.html
