Python Study Notes 3: File Operations and Handling JSON

Z时代
2024-01-10
Category: General

python

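I. File Operations

Basics:

1. open opens an existing file or creates a new one (pass the access mode after the file name)

2. close closes the file that was just opened or created

3. write writes data to the file

4. read(num) reads data from the file. num is the amount to read: characters in text mode, bytes in binary mode. If num is omitted, the whole file is read

5. readlines reads the entire file at once, line by line, and returns a list in which each line is one element

6. tell returns the current position in the file during reading or writing

7. seek() moves to another position in the file during reading or writing

seek(offset, whence)

offset: how far to move

whence: the reference point the offset is measured from (some tutorials call it "from")

0: the beginning of the file

1: the current position

2: the end of the file

Note: in text mode only seeks relative to the beginning of the file are allowed (apart from seek(0, 2) to jump to the end); open the file in binary mode to use whence values 1 and 2 with a nonzero offset.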
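The positions and reference points above can be tried out on a small scratch file. This is a minimal sketch, assuming a temporary file created with tempfile (the name demo_path is only for illustration) and binary mode, so that all three whence values are permitted:

```python
import os
import tempfile

# Create a throwaway file so the example does not touch real data.
fd, demo_path = tempfile.mkstemp()
os.close(fd)

with open(demo_path, 'wb') as f:
    f.write(b'0123456789')

with open(demo_path, 'rb') as f:
    first = f.read(3)            # read 3 bytes from the start
    pos_after_read = f.tell()    # the pointer has advanced by 3
    f.seek(-2, 2)                # 2 bytes before the end of the file
    tail = f.read()              # read to the end
    f.seek(1, 0)                 # 1 byte after the beginning
    f.seek(2, 1)                 # 2 more bytes from the current position
    middle = f.read(2)

os.remove(demo_path)
print(first, pos_after_read, tail, middle)
```

Binary mode is used here on purpose: in text mode the same seek(-2, 2) call would raise an error.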

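In addition:

File operations in the os module

1. rename renames a file

2. remove deletes a file

3. mkdir creates a directory

4. getcwd returns the current working directory

5. chdir changes the current working directory

6. listdir lists the entries in a directory

7. rmdir deletes an empty directory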
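The seven os functions listed above can be exercised together in a scratch directory. This is a rough sketch; the directory and file names (demo_dir, old_name.txt, new_name.txt) are made up for the example:

```python
import os
import tempfile

start_dir = os.getcwd()              # remember where we started
base = tempfile.mkdtemp()            # scratch directory for the demo
os.chdir(base)                       # make it the working directory

os.mkdir('demo_dir')                 # create a sub-directory
with open('old_name.txt', 'w', encoding='utf-8') as f:
    f.write('hello')
os.rename('old_name.txt', 'new_name.txt')

entries = sorted(os.listdir('.'))    # what the directory contains now

os.remove('new_name.txt')            # delete the file
os.rmdir('demo_dir')                 # rmdir only works on empty directories
remaining = os.listdir('.')

os.chdir(start_dir)                  # go back before deleting the scratch dir
os.rmdir(base)
print(entries, remaining)
```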

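Working with a file takes three steps:

1. Open the file to get a file handle; think of the handle as the file itself

2. Operate on the file through the handle

3. Close the file.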

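Basic operations:

f = open('file.txt', 'r')  # open in read-only mode and get the file handle; 'r' is the default, so it can be omitted
# Python 2 also had a file() built-in for opening files; it no longer exists in Python 3, which only has open()
res = f.read()  # read the entire file
print(res)  # print the entire file
f.close()  # close the file

f = open('file.txt', 'r')
first_line = f.readline()  # read the first line; this returns a str, not a list
print(first_line)  # print the first line
f.close()  # close the file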

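To open a file you specify its path and the mode to open it in. open returns a file handle, and every later operation on the file goes through that handle.

The available modes are:

r: read-only (the default). Raises an error if the file does not exist

w: write-only. [Not readable; creates the file if it does not exist; empties it if it does]

a: append. [Not readable; creates the file if it does not exist; otherwise writes are added to the end]

"+" means the file can be both read and written

r+: read/write. [Readable and writable; writes start at the beginning and overwrite existing content in place; raises an error if the file does not exist]

w+: write/read. [Existing content is emptied on open; what you have written can be read back after seeking]

a+: append/read. [Readable; creates the file if it does not exist; writes are always added to the end]

"U" converted \r, \n and \r\n to \n when reading (used together with r or r+). This is a legacy flag: Python 3 text mode already does this conversion by default, and "U" was removed in Python 3.11

rU

r+U

"b" means binary mode (for example, when uploading an ISO image over FTP). In Python 3, binary mode reads and writes bytes instead of str on every platform, so specify it whenever the data is not text

rb

wb

ab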
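The difference between w, a and r+ is easiest to see by applying them one after another to the same file. This is a small sketch using a temporary file (mode_path is an illustrative name):

```python
import os
import tempfile

fd, mode_path = tempfile.mkstemp()
os.close(fd)

with open(mode_path, 'w', encoding='utf-8') as f:    # w: start from an empty file
    f.write('abcdef')
with open(mode_path, 'a', encoding='utf-8') as f:    # a: add to the end
    f.write('XYZ')
with open(mode_path, encoding='utf-8') as f:
    after_append = f.read()

with open(mode_path, 'r+', encoding='utf-8') as f:   # r+: overwrite from the start
    f.write('12')
with open(mode_path, encoding='utf-8') as f:
    after_rplus = f.read()

with open(mode_path, 'w', encoding='utf-8') as f:    # w again: old content is gone
    f.write('new')
with open(mode_path, encoding='utf-8') as f:
    after_w = f.read()

os.remove(mode_path)

# r on a file that does not exist raises FileNotFoundError.
try:
    open(mode_path, 'r')
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

print(after_append, after_rplus, after_w, missing_raises)
```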

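File methods:

f = open('file.txt', 'r+', encoding='utf-8')  # the encoding parameter sets the file's encoding
f.readline()  # read one line
f.readable()  # whether the file is readable
f.writable()  # whether the file is writable
f.encoding  # the file's encoding
f.read()  # read everything; avoid on large files, because the whole content is loaded into memory
f.readlines()  # read everything into a list with one element per line; avoid on large files for the same reason
f.tell()  # current position of the file pointer
f.seek(0)  # move the file pointer to the given position
f.write('爱情证书')  # write content
f.flush()  # push buffered data from memory to disk immediately
f.truncate()  # cut the file off at the current position (this empties it only when the pointer is at 0)
f.writelines(['爱情证书', '孙燕姿'])  # write every string in a list to the file; no newlines are added
f.close()  # close the file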
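Two details in the method list are easy to get wrong: writelines does not insert line breaks, and truncate() only empties the file when the pointer is at position 0. The sketch below shows both on a temporary file opened in w+ mode (the name methods_path is an assumption for the example):

```python
import os
import tempfile

fd, methods_path = tempfile.mkstemp()
os.close(fd)

with open(methods_path, 'w+', encoding='utf-8') as f:
    f.writelines(['alpha', 'beta'])      # written back to back: 'alphabeta'
    f.writelines(['\n', 'gamma\n'])      # line breaks must be supplied by hand
    f.flush()                            # push the buffered text to disk
    f.seek(0)                            # back to the start before reading
    all_lines = f.readlines()
    can_read, can_write = f.readable(), f.writable()
    f.seek(0)
    f.truncate()                         # pointer is at 0, so the file becomes empty
    f.seek(0)
    after_truncate = f.read()

os.remove(methods_path)
print(all_lines, can_read, can_write, repr(after_truncate))
```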

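For a small file, you can read it line by line with readline:

f = open('users.txt', encoding='utf-8')  # f is the file object, also called the file handle

while True:
    line = f.readline()
    if line != '':  # readline returns an empty string only at the end of the file
        print('line:', line)
    else:
        print('Reached the end of the file')
        break
f.close()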

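The read() and readlines() methods load the entire file into memory first, which gets slow once there is a lot of data. The efficient approach is to process one line at a time, so that lines already handled can be released from memory.

An efficient way to read a large file:

f = open('users.txt', encoding='utf-8')
for line in f:
    print(line)
f.close()

Here line holds one line of the file on each iteration, and only a small buffer is kept in memory instead of the whole file.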
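The same line-by-line pattern works for any per-line task, not just printing. As an illustration, here is a hypothetical helper, count_matching_lines (not part of the original notes), that counts lines containing a keyword without ever holding the whole file in memory:

```python
import os
import tempfile

def count_matching_lines(path, keyword, encoding='utf-8'):
    """Count the lines that contain keyword, reading one line at a time."""
    total = 0
    with open(path, encoding=encoding) as f:
        for line in f:          # the file object yields one line per iteration
            if keyword in line:
                total += 1
    return total

# Build a 1000-line sample file in which every 100th row is marked ERROR.
fd, big_path = tempfile.mkstemp()
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    for i in range(1000):
        f.write('row %d %s\n' % (i, 'ERROR' if i % 100 == 0 else 'ok'))

error_count = count_matching_lines(big_path, 'ERROR')
os.remove(big_path)
print(error_count)
```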

Using with:

It is easy to forget to close a file. With the with statement, the file is closed automatically once the block is finished with the handle. Usage:

with open('file.txt', 'r') as f:  # open a file and bind its handle to f
    for line in f:
        print(line)

with open('file.txt') as fr, open('file_bak', 'w') as fw:  # several files in one with: fr reads file.txt, fw creates file_bak
    for line in fr:  # loop over every line in file.txt
        fw.write(line)  # write it to file_bak
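To confirm that with really closes the file, including when the block raises an exception, one can inspect the closed attribute of the file object. This is a short sketch on a temporary file (with_path is an illustrative name, and the ValueError is raised deliberately):

```python
import os
import tempfile

fd, with_path = tempfile.mkstemp()
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write('some text\n')

with open(with_path, encoding='utf-8') as f:
    closed_inside = f.closed        # still open inside the block
closed_outside = f.closed           # closed as soon as the block ends

try:
    with open(with_path, encoding='utf-8') as g:
        raise ValueError('simulated failure')
except ValueError:
    closed_after_error = g.closed   # closed even though the block failed

os.remove(with_path)
print(closed_inside, closed_outside, closed_after_error)
```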

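Modifying a file:

There are two ways to modify a file.

The first is to read the whole file into memory, empty the original file, and write the new content back;

The second is to write the modified content to a new file.

Here is a sample file.txt: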

寂寞当然有一点

你不在我身边

总是特别想念你的脸

距离是一份考卷

Method 1, variant a:

# 1. Simple and direct
f = open('file.txt', encoding='utf-8')
res = f.read().replace('一点', '二点')
f.close()
f = open('file.txt', mode='w', encoding='utf-8')
f.write(res)
f.flush()  # write the buffered content to disk immediately
f.close()

file.txt after the replacement:

寂寞当然有二点

你不在我身边

总是特别想念你的脸

距离是一份考卷

Or, variant b:

with open('file.txt', 'r+', encoding='utf-8') as fr:
    res1 = fr.read()
    fr.seek(0)
    new_res = res1.replace('你', 'you')
    fr.write(new_res)
    fr.truncate()  # drop leftover content in case the new text is shorter than the old

Or:

f = open('file.txt', 'a+', encoding='utf-8')
f.seek(0)  # a+ opens with the pointer at the end, so move to the start before reading
res = f.read().replace('你', 'you')
f.seek(0)
f.truncate()  # empty the file
f.write(res)
f.close()

file.txt after the change:

寂寞当然有二点

you不在我身边

总是特别想念you的脸

距离是一份考卷

Method 2:

Variant a:
import os
f = open('file.txt', encoding='utf-8')
f2 = open('file.txt.bak', 'w', encoding='utf-8')
for line in f:
    new_line = line.replace('一点', '二点')
    f2.write(new_line)
f.close()
f2.close()
os.remove('file.txt')
os.rename('file.txt.bak', 'file.txt')

Variant b:
import os
with open('file.txt', encoding='utf-8') as f, open('file.txt.bak', 'w', encoding='utf-8') as f2:  # several files in one with: f reads file.txt, f2 creates file.txt.bak
    for line in f:  # loop over every line in file.txt
        new_line = line.replace('一点', '二点')
        f2.write(new_line)  # write it to file.txt.bak
os.remove('file.txt')
os.rename('file.txt.bak', 'file.txt')

file.txt after the replacement:

寂寞当然有二点

你不在我身边

总是特别想念你的脸

距离是一份考卷
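The second method can also be wrapped in a function. The sketch below is one possible variation, not the approach from the notes above: it swaps the remove-then-rename pair for os.replace, which overwrites the destination in a single step, and writes to a temporary file in the same directory. The function name replace_in_file and the demo paths are assumptions made for this example.

```python
import os
import tempfile

def replace_in_file(path, old, new, encoding='utf-8'):
    """Rewrite path line by line, replacing old with new, via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)   # same directory as the target
    try:
        with os.fdopen(fd, 'w', encoding=encoding) as dst, \
                open(path, encoding=encoding) as src:
            for line in src:
                dst.write(line.replace(old, new))
        os.replace(tmp_path, path)                   # swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):                 # clean up on failure
            os.remove(tmp_path)
        raise

# Demo on a scratch copy of the first two lines of the sample file.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, 'file.txt')
with open(demo_file, 'w', encoding='utf-8') as f:
    f.write('寂寞当然有一点\n你不在我身边\n')

replace_in_file(demo_file, '一点', '二点')

with open(demo_file, encoding='utf-8') as f:
    result = f.read()
leftovers = os.listdir(demo_dir)    # only file.txt should remain

os.remove(demo_file)
os.rmdir(demo_dir)
print(result)
```

Because the original is only replaced after the new file has been written completely, a failure halfway through leaves file.txt untouched.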

Extended exercise: monitoring a log

Log file:

access.log

178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /wp-includes/logo_img.php HTTP/1.0" 302 161 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /blog HTTP/1.0" 301 233 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
178.210.90.90 - - [04/Jun/2017:03:44:15 +0800] "GET /blog/ HTTP/1.0" 200 38278 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
66.249.75.29 - - [04/Jun/2017:03:45:55 +0800] "GET /bbs/forum.php?mod=forumdisplay&fid=574&filter=hot HTTP/1.1" 200 17482 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
37.9.169.20 - - [04/Jun/2017:03:47:59 +0800] "GET /wp-admin/security.php HTTP/1.1" 302 161 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:01 +0800] "GET /blog HTTP/1.1" 301 233 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:02 +0800] "GET /blog/ HTTP/1.1" 200 38330 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:21 +0800] "GET /wp-admin/security.php HTTP/1.1" 302 161 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:21 +0800] "GET /blog HTTP/1.1" 301 233 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:23 +0800] "GET /blog/ HTTP/1.1" 200 38330 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
42.236.49.31 - - [04/Jun/2017:03:49:04 +0800] "GET /questions HTTP/1.1" 200 41977 "http://bbs.besttest.cn/questions" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36; 360Spider" "-"
66.249.75.28 - - [04/Jun/2017:03:49:42 +0800] "GET /bbs/forum.php?mod=forumdisplay&fid=473&filter=digest&digest=1 HTTP/1.1" 200 17242 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
123.125.71.60 - - [04/Jun/2017:03:52:50 +0800] "GET /robots.txt HTTP/1.1" 302 161 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
123.125.71.117 - - [04/Jun/2017:03:52:50 +0800] "GET /blog HTTP/1.1" 301 233 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
123.125.71.80 - - [04/Jun/2017:03:52:51 +0800] "GET /blog/ HTTP/1.1" 200 38330 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
66.249.75.28 - - [04/Jun/2017:03:53:29 +0800] "GET /bbs/forum.php?mod=forumdisplay&fid=516&filter=heat&orderby=heats HTTP/1.1" 200 17019 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
40.77.167.135 - - [04/Jun/2017:03:55:07 +0800] "GET /static/css/bootstrap/fonts/glyphicon

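# Goal:
# 1. Find the IPs in the log that made 200 or more requests within one minute
# 2. Run the check once every minute

# Steps:
# 1. Read the file and extract the IP addresses
# 2. Keep a count for each IP address in a dict {}
# 3. Check whether an IP's request count has reached 200
# 4. Add it to the blacklist (print)

# ['118.24.4.30','118.24.4.30','118.24.4.30','118.1xx.x.xx','118.1xx.x.xx']
# {
#     '118.23.3.40':2,
#     '118.23.3.41':5
# }
import time
point = 0  # starting position in the file
while True:
    ips = {}  # request count per IP
    f = open('access.log', encoding='utf-8')
    f.seek(point)
    for line in f:  # loop over each line in the file
        if not line.strip():  # skip blank lines, which would break split()[0]
            continue
        ip = line.split()[0]  # split on whitespace; the first field is the IP
        if ip in ips:  # has this IP been seen already?
            ips[ip] += 1  # yes: increase its count by 1
        else:
            ips[ip] = 1  # no: its count starts at 1
    point = f.tell()  # remember the pointer position; the next run, 60s later, starts from here
    f.close()
    for ip, count in ips.items():  # loop over the dict and pick the counts of 200 or more
        if count >= 200:
            print('%s added to blacklist' % ip)
    time.sleep(60)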
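The counting step can be separated from the endless loop so that it is easy to try on a few lines. This is a sketch using collections.Counter from the standard library instead of a hand-maintained dict; count_ips and find_offenders are hypothetical helper names, and the sample lines are shortened versions of the log entries above with a threshold of 2 chosen only for the demo.

```python
from collections import Counter

def count_ips(lines):
    """Count how many times each IP (the first field) appears in the lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:                       # ignore blank lines
            counts[parts[0]] += 1
    return counts

def find_offenders(counts, threshold=200):
    """Return the IPs whose count is at least threshold, in sorted order."""
    return sorted(ip for ip, n in counts.items() if n >= threshold)

sample_lines = [
    '178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /blog HTTP/1.0" 301 233',
    '178.210.90.90 - - [04/Jun/2017:03:44:15 +0800] "GET /blog/ HTTP/1.0" 200 38278',
    '',
    '66.249.75.29 - - [04/Jun/2017:03:45:55 +0800] "GET /bbs/forum.php HTTP/1.1" 200 17482',
    '37.9.169.20 - - [04/Jun/2017:03:47:59 +0800] "GET /blog HTTP/1.1" 301 233',
    '37.9.169.20 - - [04/Jun/2017:03:48:01 +0800] "GET /blog/ HTTP/1.1" 200 38330',
    '37.9.169.20 - - [04/Jun/2017:03:48:02 +0800] "GET /blog/ HTTP/1.1" 200 38330',
]

counts = count_ips(sample_lines)
offenders = find_offenders(counts, threshold=2)
print(counts['37.9.169.20'], offenders)
```

In the monitoring loop, the open file object could be passed straight to count_ips, since iterating over a file yields its lines.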

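II. Handling JSON

# JSON is a general-purpose data format that every language can parse
# key-value pairs { }
# a JSON string is an ordinary string

JSON string format: wrap the JSON in triple single quotes. PS: JSON keys and string values must use double quotes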

s = '''
{
    "error_code": 0,
    "stu_info": [
        {
            "id": 309,
            "name": "小白",
            "sex": "男",
            "age": 28,
            "addr": "河南省济源市北海大道32号",
            "grade": "天蝎座",
            "phone": "18512572946",
            "gold": 100
        },
        {
            "id": 310,
            "name": "小白",
            "sex": "男",
            "age": 28,
            "addr": "河南省济源市北海大道32号",
            "grade": "天蝎座",
            "phone": "18516572946",
            "gold": 100
        }
    ]
}
'''
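Once such a string is parsed with json.loads, it behaves like ordinary nested dicts and lists. The sketch below uses a shortened version of the sample (only id, name and gold are kept) and also shows why the double-quote rule matters: a string that uses single quotes is rejected.

```python
import json

s = '''
{
    "error_code": 0,
    "stu_info": [
        {"id": 309, "name": "小白", "gold": 100},
        {"id": 310, "name": "小白", "gold": 100}
    ]
}
'''

data = json.loads(s)                                   # JSON string -> dict
ids = [stu["id"] for stu in data["stu_info"]]          # walk the nested list
total_gold = sum(stu["gold"] for stu in data["stu_info"])

# Single-quoted keys are valid in a Python dict literal but not in JSON.
bad = "{'error_code': 0}"
try:
    json.loads(bad)
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

print(data["error_code"], ids, total_gold, single_quotes_ok)
```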

JSON is a key-value data format shared by all languages, and it looks a lot like a Python dict. Python handles it with the json module, whose commonly used functions are:

json.dumps()

json.dump()

json.loads()

json.load()

import json
dic = {"name": "niuniu", "age": 18}
print(json.dumps(dic))  # convert the dict to a JSON string
# Output (on Python 3.7+ the key order follows the dict's insertion order):
# {"name": "niuniu", "age": 18}

fj = open('a.json', 'w')  # a.json does not exist yet
json.dump(dic, fj)  # convert the dict to JSON and write it to the file; dump itself returns None
fj.close()
# Result: a new a.json file appears in the current directory, containing:
# {"name": "niuniu", "age": 18}

s_json = '{"name":"niuniu","age":20,"status":true}'
print(json.loads(s_json))  # convert the JSON string to a dict
# Output:
# {'name': 'niuniu', 'age': 20, 'status': True}

fr = open('a.json', 'r')  # a.json contains: {"name": "niuniu", "age": 18}
print(json.load(fr))  # read JSON from the file and convert it to a dict
fr.close()
# Output:
# {'name': 'niuniu', 'age': 18}
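Two options of json.dumps and json.dump are worth knowing when the data contains Chinese text: by default non-ASCII characters are written as \u escapes, and ensure_ascii=False keeps them readable, while indent pretty-prints the result. The sketch below does a full round trip through a temporary file using with blocks; the record contents and the path name json_path are made up for the example.

```python
import json
import os
import tempfile

record = {"name": "小白", "age": 28, "tags": ["a", "b"], "active": True, "note": None}

escaped = json.dumps(record)                                 # non-ASCII becomes \u escapes
readable = json.dumps(record, ensure_ascii=False, indent=4)  # keeps the characters, pretty-printed

json_dir = tempfile.mkdtemp()
json_path = os.path.join(json_dir, 'record.json')

with open(json_path, 'w', encoding='utf-8') as fw:
    json.dump(record, fw, ensure_ascii=False)    # dict -> JSON text in the file

with open(json_path, encoding='utf-8') as fr:
    loaded = json.load(fr)                       # JSON text in the file -> dict

os.remove(json_path)
os.rmdir(json_dir)
print(readable)
print(loaded == record)
```

True and None in the dict are written as true and null in the JSON text and come back unchanged after loading.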

That is all of Python Study Notes 3: File Operations and Handling JSON. Source link: utcz.com/z/389036.html
