Machine Learning: Using sklearn's train_test_split

浊酒南街 · Published 2023-05-11 11:33:38

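What train_test_split does

In machine learning, you call this function to randomly split a set of samples into a training set and a test set. It returns the resulting training and test data (features and, if supplied, labels).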

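Parameters

Syntax:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)

Here train_test_split is imported from sklearn.model_selection (the older sklearn.cross_validation module has been removed from scikit-learn).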

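Name	Meaning
X	Feature set of the samples to be split
y	Labels of the samples to be split
X_train	Features of the resulting training set
y_train	Labels of the resulting training set
X_test	Features of the resulting test set
y_test	Labels of the resulting test set
test_size	If a float between 0 and 1, the fraction of the original samples placed in the test set; if an integer, the absolute number of test samples
random_state	Random seed. Different seeds give different splits; fixing the seed makes the split reproducible
stratify	Preserves the class distribution from before the split. For example, if classes A and B occur in a 4:1 ratio (80:20) in the full dataset, both the training set and the test set keep that 4:1 ratio. Pass the label array (e.g. stratify=y). Typically used when the classes are imbalanced.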
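To make the test_size and random_state rows concrete, here is a minimal sketch. The toy arrays and variable names (frac_sizes, int_sizes, same_split) are made up for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features each, binary labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Float test_size: fraction of the samples that go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
frac_sizes = (len(X_tr), len(X_te))
print(frac_sizes)  # (8, 2)

# Integer test_size: absolute number of test samples
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=3, random_state=0)
int_sizes = (len(X_tr2), len(X_te2))
print(int_sizes)  # (7, 3)

# Same random_state: the two calls return identical splits
first = train_test_split(X, y, test_size=0.2, random_state=42)
second = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)  # True
```

Leaving random_state unset gives a different shuffle on each run, which is usually not what you want when comparing models.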

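Example

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # splits a dataset into training and test sets

iris = load_iris()
x = iris.data  # features, shape (150, 4)
y = iris.target.reshape(-1, 1)  # labels as a column vector, shape (150, 1); a 1-D array also works
# Hold out 30% of the samples for testing, keeping the class proportions of y in both sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=35, stratify=y)
print(x_train.shape, y_train.shape)  # (105, 4) (105, 1)
print(x_test.shape, y_test.shape)  # (45, 4) (45, 1)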
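
The iris classes are balanced, so the effect of stratify is hard to see there. As a rough sketch on imbalanced data, the snippet below uses made-up labels in a 4:1 ratio (80 samples of class 0, 20 of class 1) and counts the classes on each side of the split; the data and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1 (ratio 4:1)
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)  # placeholder features, one column

# stratify=y asks for the same class proportions in both parts of the split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

train_counts = np.bincount(y_train)  # samples per class in the training set
test_counts = np.bincount(y_test)    # samples per class in the test set
print(train_counts)  # [60 15]
print(test_counts)   # [20  5]
```

Both parts keep the 4:1 ratio. Without stratify the counts depend on the shuffle, and a small test set can end up with very few samples of the minority class.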