Comparing two TXT files line by line in Python to find lines that share numbers: why is the output so messy?

Z时代
2024-02-08
Category: IT

Contents of the two Notepad files:

test1

1,2,3,4,5,6

5,6,7,8,9,10

11,12,13,14,15

test2

2,3,4,5,6,7

7,8,9,10,11,12

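I want to compare the files line by line: the first line of test1 against every line of test2, then the second line of test1, and so on. For each pair of lines I check whether they have 4 numbers in common and, if they do, return the corresponding line from test2.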

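Below is my code. This is as far as I got: I ran into a lot of problems I don't understand and can't get any further at my level. Any help would be appreciated!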

import os, linecache
file1 = open('test1.txt','r',encoding= 'gb18030');
arr1 = file1.readlines()
file2 = open('test2.txt','r',encoding= 'gb18030');
arr2 = file2.readlines()
for fields1 in arr1:
    for fields2 in arr2:
        c = set(fields1).intersection(set(fields2))
        d = len(c)
        if d == 4:
            print(list(c), ",", len(c))

The output looks messy:

['2', ',', '3', ',', '4', ',', '5', ',', '6', ',', '7', '\n'] ['7', '5', '6', '\n', ','] , 5
['2', ',', '3', ',', '4', ',', '5', ',', '6', ',', '7', '\n'] ['4', '2', '3', '5', ','] , 5

Answer:


arr1 = ["1,2,3,4,5,6", "5,6,7,8,9,10", "11,12,13,14,15"]
arr2 = ["2,3,4,5,6,7", "7,8,9,10,11,12"]
for fields1 in arr1:
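    nums1 = set(fields1.split(','))  # split the string on commas into individual numbers and deduplicate
    for fields2 in arr2:
        nums2 = set(fields2.split(','))  # same as above
        c = nums1.intersection(nums2)  # intersection of the two sets
        d = len(c)
        if d >= 4:  # on a match, print the test2 line and stop searching for this test1 line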
            print(fields2)
            break
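
The answer above compares the numbers as strings, so values such as "7" and " 7" would count as different. A minimal sketch, assuming every field is a plain integer, that converts the fields to `int` before comparing. `parse_line` is a hypothetical helper name, not part of the original answer, and this version does not `break`, so it reports every matching line of test2:

```python
def parse_line(line):
    """Turn one comma-separated line into a set of ints, skipping empty fields."""
    return {int(field) for field in line.split(",") if field.strip()}


arr1 = ["1,2,3,4,5,6", "5,6,7,8,9,10", "11,12,13,14,15"]
arr2 = ["2,3,4,5,6,7", "7,8,9,10,11,12"]

for fields1 in arr1:
    nums1 = parse_line(fields1)
    for fields2 in arr2:
        common = nums1 & parse_line(fields2)
        if len(common) >= 4:
            # Print the matching test2 line and the shared numbers
            print(fields2, sorted(common))
```

`int()` ignores surrounding whitespace, so a trailing newline or a space after a comma does not affect the comparison.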

Answer:

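Your code iterates over each line character by character, so the commas and newline characters are included in the comparison, and that is what scrambles the matches. There are several ways to fix this. For example, you could use two passes: one that reads everything in, and a second that filters out the unwanted characters (commas and newlines). Personally, I think the simplest approach is to use split() to break each line into a list on the commas and then compare the lists.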

with open('test1.txt', 'r', encoding='gb18030') as file1:
    arr1 = file1.readlines()
with open('test2.txt', 'r', encoding='gb18030') as file2:
    arr2 = file2.readlines()
for line1 in arr1:
    line1 = line1.strip()  # remove the trailing newline
    list1 = line1.split(',')  # split the string on commas into a list
    for line2 in arr2:
        line2 = line2.strip()  # remove the trailing newline
        list2 = line2.split(',')  # split the string on commas into a list
        common_elements = set(list1) & set(list2)  # elements present in both lists
        if len(common_elements) == 4:
            print(list(common_elements), ",", len(common_elements))
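
To make the cause concrete, here is a small sketch using the first lines of the two sample files. It contrasts building a set from the raw string with building it from the split fields. The variable names are illustrative only:

```python
line1 = "1,2,3,4,5,6\n"
line2 = "2,3,4,5,6,7\n"

# Iterating over a str yields single characters, so "," and "\n" become set members.
char_common = set(line1) & set(line2)
print(sorted(char_common))

# Stripping the newline and splitting on "," yields whole number tokens instead.
token_common = set(line1.strip().split(",")) & set(line2.strip().split(","))
print(sorted(token_common))

# Multi-digit numbers are also broken apart when a string is turned into a set directly.
print(sorted(set("10,11")))
```

The first set contains the comma and the newline alongside the digits, while the second contains only the shared numbers. The last line shows that "10,11" collapses to the characters ",", "0" and "1", which is why character-level comparison cannot work for numbers above 9.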

Answer:

The code is as follows:

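def compare_two_files(t1, t2):
    # Open file 1
    with open(t1) as f1:
        # Loop until file 1 is exhausted
        while True:
            f1_line = f1.readline()
            # readline() returns an empty string only at end of file, so stop there
            # (checking before stripping keeps a blank line from ending the loop early)
            if not f1_line:
                break
            # Remove the trailing newline
            f1_line = f1_line.rstrip("\n")
            # Reopen file 2 so it is read from the start for each line of file 1
            with open(t2) as f2:
                # Loop until file 2 is exhausted
                while True:
                    f2_line = f2.readline()
                    # End of file 2: leave the inner loop
                    if not f2_line:
                        break
                    # Remove the trailing newline
                    f2_line = f2_line.rstrip("\n")
                    # Split each line on "," with str.split and
                    # convert the resulting lists to sets
                    f1_set = set(f1_line.split(","))
                    f2_set = set(f2_line.split(","))
                    # Intersection of the two sets
                    intersection_set = f1_set.intersection(f2_set)
                    # If at least 4 numbers are shared, print the file 2 line
                    if len(intersection_set) >= 4:
                        print(f2_line)


if __name__ == "__main__":
    compare_two_files("test1.txt", "test2.txt")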
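
The function above reopens test2 for every line of test1 and prints its results. As an alternative sketch (not the answerer's code), the second input can be parsed once and the matches returned as a list. `find_matches` and its `threshold` parameter are hypothetical names introduced here, and the sample data is inlined so the example runs without files:

```python
def find_matches(lines1, lines2, threshold=4):
    """Return (line1, line2) pairs that share at least `threshold` numbers."""
    # Parse every line of the second input once, skipping blank lines.
    parsed2 = []
    for raw in lines2:
        text = raw.strip()
        if text:
            parsed2.append((text, set(text.split(","))))

    matches = []
    for raw in lines1:
        line1 = raw.strip()
        if not line1:
            continue  # ignore blank lines instead of stopping
        set1 = set(line1.split(","))
        for line2, set2 in parsed2:
            if len(set1 & set2) >= threshold:
                matches.append((line1, line2))
    return matches


if __name__ == "__main__":
    test1 = ["1,2,3,4,5,6\n", "\n", "5,6,7,8,9,10\n", "11,12,13,14,15\n"]
    test2 = ["2,3,4,5,6,7\n", "\n", "7,8,9,10,11,12\n"]
    for line1, line2 in find_matches(test1, test2):
        print(line1, "->", line2)
```

Because iterating over an open file yields its lines, the same function should also accept two file objects opened with `open("test1.txt")` and `open("test2.txt")`. Returning the pairs rather than printing them leaves it to the caller to decide how to display or store the results.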

Source link: utcz.com/p/938941.html
