Is there a bug in the Apriori code in Chapter 4 of 《python数据挖掘入门与实践》?
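This is part of the Apriori implementation. We want to start from the frequent itemsets containing 1 item and derive the frequent itemsets containing 2 items. The code is as follows: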
from collections import defaultdict

def find_frequent_itemsets(favorable_reviews_by_users, k_1_itemsets, min_support):
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    counts[current_superset] += 1
    return dict([(itemset, frequency) for itemset, frequency in counts.items() if frequency >= min_support])
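I think the frequent itemsets are being counted more than once here. For example, for user 1 the sets {A,B} and {B,A} are the same set, but according to this part of the code: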
for itemset in k_1_itemsets:
    if itemset.issubset(reviews):
        for other_reviewed_movie in reviews - itemset:
            current_superset = itemset | frozenset((other_reviewed_movie,))
            counts[current_superset] += 1
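When itemset == A, we count {A,B} once,
and when itemset == B, we count {B,A} once more, even though it is the same frozenset.
So is this double counting? If it is, how should the program be modified?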
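To make the question concrete, here is a minimal sketch that reproduces the behaviour on toy data and shows one possible way to avoid it. The toy data (a single user who liked "A" and "B") and the function name find_frequent_itemsets_dedup are my own assumptions for illustration; they are not from the book. The idea is to gather each user's candidate supersets in a set first, so that a superset reached from several (k-1)-itemsets is only counted once for that user.

```python
from collections import defaultdict


def find_frequent_itemsets(favorable_reviews_by_users, k_1_itemsets, min_support):
    """The function from the question, reproduced here for comparison."""
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    counts[current_superset] += 1
    return {itemset: freq for itemset, freq in counts.items() if freq >= min_support}


def find_frequent_itemsets_dedup(favorable_reviews_by_users, k_1_itemsets, min_support):
    """Hypothetical variant: each superset is counted at most once per user."""
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        # Collect this user's candidate supersets in a set first, so that
        # {A, B} reached from {A} and from {B} collapses into a single entry.
        supersets_for_user = set()
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    supersets_for_user.add(itemset | frozenset((other_reviewed_movie,)))
        for superset in supersets_for_user:
            counts[superset] += 1
    return {itemset: freq for itemset, freq in counts.items() if freq >= min_support}


# Toy data (made up): a single user who liked movies "A" and "B".
favorable_reviews_by_users = {1: frozenset({"A", "B"})}
k_1_itemsets = [frozenset({"A"}), frozenset({"B"})]

original = find_frequent_itemsets(favorable_reviews_by_users, k_1_itemsets, min_support=1)
deduped = find_frequent_itemsets_dedup(favorable_reviews_by_users, k_1_itemsets, min_support=1)

print(original[frozenset({"A", "B"})])  # 2: one user, but {A, B} is counted twice
print(deduped[frozenset({"A", "B"})])   # 1: counted once per user
```

With this variant the counts can only stay the same or go down compared with the original function, so for the same min_support fewer itemsets may pass the threshold and the results may no longer match the numbers printed in the book.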