ValueError: n_splits=10 cannot be greater than the number of members in each class

Z时代
2024-01-10
Category: Q&A

I am trying to run the following code:

from sklearn.model_selection import StratifiedKFold 
X = ["hey", "join now", "hello", "join today", "join us now", "not today", "join this trial", " hey hey", " no", "hola", "bye", "join today", "no","join join"]
y = ["n", "r", "n", "r", "r", "n", "n", "n", "n", "r", "n", "n", "n", "r"]
skf = StratifiedKFold(n_splits=10)
for train, test in skf.split(X,y):  
    print("%s %s" % (train,test))

But I get the following error:

ValueError: n_splits=10 cannot be greater than the number of members in each class.

I looked at this scikit-learn error: "The least populated class in y has only 1 member", but I'm still not sure what is wrong with my code.

Both of my lists have length 14: print(len(X)) and print(len(y)) both print 14.

Part of my confusion is that I'm not sure what "members" and "class" are defined as, or what they mean, in this context.

How do I fix the error? What is a member? What is a class (in this context)?

Answer:

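Stratification means keeping the ratio of each class the same in every fold. In this context a class is one distinct label in y (such as 'n' or 'r'), and the members of a class are the samples that carry that label. So if your original dataset has 3 classes in proportions of 60%, 20% and 20%, stratification tries to preserve that ratio in each fold.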
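To make "keeping the ratio in each fold" concrete, here is a simplified sketch that deals each class's members round-robin across the folds. The helper stratified_fold_ids is hypothetical and is not how scikit-learn allocates folds internally; it only illustrates the idea on a made-up 10-sample dataset with the 60% / 20% / 20% ratio from the example.

```python
from collections import defaultdict

def stratified_fold_ids(y, n_splits):
    """Assign each sample index to a fold, dealing each class round-robin.

    Hypothetical, simplified illustration; scikit-learn's StratifiedKFold
    uses its own allocation logic.
    """
    # Group sample indices by class label
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    fold_of = [None] * len(y)
    for indices in by_class.values():
        # Deal this class's members across the folds like cards
        for k, i in enumerate(indices):
            fold_of[i] = k % n_splits
    return fold_of

# 10 made-up samples with a 60% / 20% / 20% class ratio
y_demo = ["a"] * 6 + ["b"] * 2 + ["c"] * 2
folds = stratified_fold_ids(y_demo, n_splits=2)
for f in range(2):
    labels = [y_demo[i] for i, fold in enumerate(folds) if fold == f]
    print(f, sorted(labels))  # each fold: 3 'a', 1 'b', 1 'c'
```

Note that a class with fewer members than n_splits would leave some folds with no member of that class at all, which is exactly the situation the ValueError in the question is about.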

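In your case, taking your data with the last two "r" labels (positions 9 and 13) changed to a third class "y" for illustration:

X = ["hey", "join now", "hello", "join today", "join us now", "not today", "join this trial", " hey hey", " no", "hola", "bye", "join today", "no", "join join"]
y = ["n", "r", "n", "r", "r", "n", "n", "n", "n", "y", "n", "n", "n", "y"]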

there are 14 samples (members) in total, distributed as:

class    number of members    percentage
 'n'        9                    64.3%
 'r'        3                    21.4%
 'y'        2                    14.3%
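The counts and percentages in the table can be reproduced with collections.Counter. In the sketch below, class_distribution is a hypothetical helper name; it is also run on the y from the question, which has only two classes, 'n' with 9 members and 'r' with 5. Both are still fewer than 10.

```python
from collections import Counter

# y from the question, and the three-class variant used in this answer
y_question = ["n", "r", "n", "r", "r", "n", "n", "n", "n", "r", "n", "n", "n", "r"]
y_answer = ["n", "r", "n", "r", "r", "n", "n", "n", "n", "y", "n", "n", "n", "y"]

def class_distribution(y):
    """Return {class: (number of members, percentage rounded to 0.1)}."""
    counts = Counter(y)
    total = len(y)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}

print(class_distribution(y_question))  # {'n': (9, 64.3), 'r': (5, 35.7)}
print(class_distribution(y_answer))    # {'n': (9, 64.3), 'r': (3, 21.4), 'y': (2, 14.3)}
```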

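So StratifiedKFold tries to keep this ratio in every fold. You have specified 10 folds (n_splits=10), so to keep the ratio for class 'y', each fold would need 2/10 = 0.2 members. A fold cannot contain less than one member (sample), and here every class ('n' with 9, 'r' with 3, 'y' with 2) has fewer members than the 10 folds requested, which is why the error is thrown.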
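The per-fold arithmetic and the check behind the message can be sketched in plain Python. check_stratified_splits is a hypothetical helper, a simplified re-implementation assuming the behaviour of recent scikit-learn versions: StratifiedKFold raises this ValueError when n_splits exceeds the member count of every class, and only emits a warning ("The least populated class in y has only ... members") when it exceeds the count of the smallest class alone.

```python
from collections import Counter
import warnings

def check_stratified_splits(y, n_splits):
    """Hypothetical, simplified re-implementation of StratifiedKFold's
    class-count check; not a scikit-learn function."""
    counts = Counter(y)
    for label, n in counts.items():
        # Average number of this class's members per test fold
        print(f"class {label!r}: {n}/{n_splits} = {n / n_splits:.1f} per fold")
    # Error only if n_splits exceeds the size of every class
    if all(n_splits > n for n in counts.values()):
        raise ValueError(
            f"n_splits={n_splits} cannot be greater than the "
            "number of members in each class."
        )
    # Otherwise, warn if the smallest class is still too small
    smallest = min(counts.values())
    if n_splits > smallest:
        warnings.warn(
            f"The least populated class in y has only {smallest} members, "
            f"which is less than n_splits={n_splits}."
        )

y = ["n", "r", "n", "r", "r", "n", "n", "n", "n", "y", "n", "n", "n", "y"]
try:
    check_stratified_splits(y, n_splits=10)
except ValueError as e:
    print("ValueError:", e)
```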

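If you set n_splits=2 instead of n_splits=10, it works, because 'y' then gets 2/2 = 1 member per fold. For n_splits=10 to work cleanly, every class needs at least 10 samples: scikit-learn raises this error when all classes have fewer than n_splits members, and warns when only the smallest class does.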
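A minimal fix, assuming scikit-learn is installed: choose n_splits no larger than the smallest class count. Taking min(Counter(y).values()) is one simple way to pick it, not the only one. StratifiedKFold itself requires n_splits >= 2, so a class with a single member still needs more data.

```python
from collections import Counter
from sklearn.model_selection import StratifiedKFold

X = ["hey", "join now", "hello", "join today", "join us now", "not today",
     "join this trial", " hey hey", " no", "hola", "bye", "join today", "no", "join join"]
y = ["n", "r", "n", "r", "r", "n", "n", "n", "n", "y", "n", "n", "n", "y"]

# Use as many folds as the smallest class allows (here 2, because of class 'y')
n_splits = min(Counter(y).values())
skf = StratifiedKFold(n_splits=n_splits)
for train, test in skf.split(X, y):
    print(train, test)
```

With this three-class y, the sketch picks n_splits=2, and each test fold contains exactly one of the two 'y' samples.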

Source: utcz.com/qa/411334.html