python - In recommender systems and machine learning, how do I split a complete dataset into a training set and a test set?

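As the title says: is there a reasonably fast way to do this? And if I want to do multi-fold cross-validation, how should I partition the dataset?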

天蓬老师, 2742 days ago

Replies (3)

  • 黄舟

    黄舟 2017-04-18 09:05:54

    Split the data evenly into 10 parts and loop 10 times. On each pass, hold out 1 part as the test set and use the other 9 parts as the training set.
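
A minimal sketch of that procedure, using only the Python standard library. The helper name `k_fold_indices`, the `seed` parameter, and the sample count of 100 are hypothetical and chosen for illustration; shuffling before splitting is an added assumption, not something this reply specifies.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffle once up front
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Training set is everything outside the held-out fold.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

# Hypothetical usage: 100 samples, 10 folds.
splits = list(k_fold_indices(100, k=10))
```

Each sample lands in the test set exactly once across the 10 passes, so averaging the 10 test scores uses every row for evaluation.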

  • PHP中文网

    PHP中文网 2017-04-18 09:05:54

    Generally, for cross-validation people set k to 5 or 10. That is, the data is (randomly) divided into k parts, with k-1 parts used for training and 1 part for testing. That said, since cross-validation trains the model k times, it is not going to be fast.

  • PHP中文网

    PHP中文网 2017-04-18 09:05:54

    You can use scikit-learn. See section 3.1 of its user guide, "Cross-validation: evaluating estimator performance":

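    >>> from sklearn import datasets, svm
    >>> from sklearn.model_selection import cross_val_score
    >>> iris = datasets.load_iris()  # example dataset used by the user guide
    >>> clf = svm.SVC(kernel='linear', C=1)
    >>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)  # 5-fold CV
    >>> scores
    array([ 0.96...,  1.  ...,  0.96...,  0.96...,  1.        ])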
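
`cross_val_score` does the splitting internally. If the splits themselves are needed, as the question asks, a rough sketch with scikit-learn's `train_test_split` and `KFold` might look like the following. The 80/20 ratio, `random_state=0`, and the `fold_sizes` variable are illustrative choices and not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, train_test_split

X, y = load_iris(return_X_y=True)

# Single hold-out split: 80% train, 20% test (the ratio is an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Explicit 5-fold splits; shuffle=True randomizes which rows land in each fold.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
for train_idx, test_idx in kf.split(X):
    # train_idx / test_idx are integer index arrays into X and y.
    fold_sizes.append((len(train_idx), len(test_idx)))
```

For classification data with imbalanced classes, `StratifiedKFold` from the same module keeps class proportions roughly equal across folds and may be a better fit.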
