Datasets in sklearn, Part 2 (Covertype, a forest cover type dataset)
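Introduction to Covertype
The forest cover type dataset covers four wilderness areas in Roosevelt National Forest in northern Colorado, USA. It contains 581012 samples in total; on Kaggle, they are split into a training set of 15120 samples and a test set of 565892 samples.
Each sample is taken from a 30 m x 30 m patch. Each sample has 54 features and belongs to one of 7 cover types: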
1 - Spruce/Fir
2 - Lodgepole Pine
3 - Ponderosa Pine
4 - Cottonwood/Willow
5 - Aspen
6 - Douglas-fir
7 - Krummholz
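The class codes above can be turned into a small lookup table for reporting. The following is a minimal sketch, assuming the labels use the codes 1 to 7 exactly as listed; `COVER_TYPE_NAMES` and `count_by_cover_type` are hypothetical names introduced here for illustration, and the label list at the end is hand-made, not real data.

```python
from collections import Counter

# Class codes as listed above (1-based).
COVER_TYPE_NAMES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def count_by_cover_type(labels):
    """Return {class name: sample count} for an iterable of labels in 1..7."""
    # int() normalizes NumPy integer scalars to plain Python ints.
    counts = Counter(int(label) for label in labels)
    return {name: counts.get(code, 0) for code, name in COVER_TYPE_NAMES.items()}


# Hypothetical usage with a tiny hand-made label list (not real data).
example_counts = count_by_cover_type([1, 2, 2, 7, 5, 2])
print(example_counts)
```

The same function should also accept the label array returned by `fetch_covtype`, since it only iterates over the labels.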
The 54 features and their 1-based column numbers:
Elevation 1
Aspect 2
Slope 3
Horizontal_Distance_To_Hydrology 4
Vertical_Distance_To_Hydrology 5
Horizontal_Distance_To_Roadways 6
Hillshade_9am 7
Hillshade_Noon 8
Hillshade_3pm 9
Horizontal_Distance_To_Fire_Points 10
# The remaining columns are binary indicators (4 wilderness areas, then 40 soil types): each value is 0 or 1
Wilderness_Area1 11 Rawah Wilderness Area
Wilderness_Area2 12 Neota Wilderness Area
Wilderness_Area3 13 Comanche Peak Wilderness Area
Wilderness_Area4 14 Cache la Poudre Wilderness Area
Soil_Type1 15 Cathedral family - Rock outcrop complex, extremely stony.
Soil_Type2 16 Vanet - Ratake families complex, very stony.
Soil_Type3 17 Haploborolis - Rock outcrop complex, rubbly.
Soil_Type4 18 Ratake family - Rock outcrop complex, rubbly.
Soil_Type5 19 Vanet family - Rock outcrop complex, rubbly.
Soil_Type6 20 Vanet - Wetmore families - Rock outcrop complex, stony.
Soil_Type7 21 Gothic family.
Soil_Type8 22 Supervisor - Limber families complex.
Soil_Type9 23 Troutville family, very stony.
Soil_Type10 24 Bullwark - Catamount families - Rock outcrop complex, rubbly.
Soil_Type11 25 Bullwark - Catamount families - Rock land complex, rubbly.
Soil_Type12 26 Legault family - Rock land complex, stony.
Soil_Type13 27 Catamount family - Rock land - Bullwark family complex, rubbly.
Soil_Type14 28 Pachic Argiborolis - Aquolis complex.
Soil_Type15 29 unspecified in the USFS Soil and ELU Survey.
Soil_Type16 30 Cryaquolis - Cryoborolis complex.
Soil_Type17 31 Gateview family - Cryaquolis complex.
Soil_Type18 32 Rogert family, very stony.
Soil_Type19 33 Typic Cryaquolis - Borohemists complex.
Soil_Type20 34 Typic Cryaquepts - Typic Cryaquolls complex.
Soil_Type21 35 Typic Cryaquolls - Leighcan family, till substratum complex.
Soil_Type22 36 Leighcan family, till substratum, extremely bouldery.
Soil_Type23 37 Leighcan family, till substratum - Typic Cryaquolls complex.
Soil_Type24 38 Leighcan family, extremely stony.
Soil_Type25 39 Leighcan family, warm, extremely stony.
Soil_Type26 40 Granile - Catamount families complex, very stony.
Soil_Type27 41 Leighcan family, warm - Rock outcrop complex, extremely stony.
Soil_Type28 42 Leighcan family - Rock outcrop complex, extremely stony.
Soil_Type29 43 Como - Legault families complex, extremely stony.
Soil_Type30 44 Como family - Rock land - Legault family complex, extremely stony.
Soil_Type31 45 Leighcan - Catamount families complex, extremely stony.
Soil_Type32 46 Catamount family - Rock outcrop - Leighcan family complex, extremely stony.
Soil_Type33 47 Leighcan - Catamount families - Rock outcrop complex, extremely stony.
Soil_Type34 48 Cryorthents - Rock land complex, extremely stony.
Soil_Type35 49 Cryumbrepts - Rock outcrop - Cryaquepts complex.
Soil_Type36 50 Bross family - Rock land - Cryumbrepts complex, extremely stony.
Soil_Type37 51 Rock outcrop - Cryumbrepts - Cryorthents complex, extremely stony.
Soil_Type38 52 Leighcan - Moran families - Cryaquolls complex, extremely stony.
Soil_Type39 53 Moran family - Cryorthents - Leighcan family complex, extremely stony.
Soil_Type40 54 Moran family - Cryorthents - Rock land complex, extremely stony.
from sklearn.datasets import fetch_covtype

# The first call downloads the data (a CSV file of about 90 MB) and caches it locally
X, y = fetch_covtype(return_X_y=True)
print(X.shape)  # (581012, 54)
print(y.shape)  # (581012,)
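The feature list above uses 1-based column numbers, while the array columns are 0-based. The sketch below rebuilds the 54 column names and shows how the wilderness-area and soil-type indicator columns could be decoded back to a single number. It assumes the columns of `X` follow the order listed above; `FEATURE_NAMES`, `WILDERNESS_COLS`, `SOIL_COLS` and `decode_indicator` are hypothetical names, and the example row is hand-made rather than taken from the data.

```python
# The ten quantitative features, in the order listed above.
QUANTITATIVE = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]
WILDERNESS = [f"Wilderness_Area{i}" for i in range(1, 5)]
SOIL = [f"Soil_Type{i}" for i in range(1, 41)]
FEATURE_NAMES = QUANTITATIVE + WILDERNESS + SOIL

# 0-based column ranges (the list above counts from 1).
WILDERNESS_COLS = slice(10, 14)  # columns 11..14 in the list
SOIL_COLS = slice(14, 54)        # columns 15..54 in the list


def decode_indicator(row, cols):
    """Return the 1-based position of the first flag equal to 1 in row[cols],
    or None if no flag is set."""
    for position, value in enumerate(row[cols], start=1):
        if value == 1:
            return position
    return None


# Hypothetical hand-made row: Wilderness_Area1 and Soil_Type29 are set.
example_row = [0] * 54
example_row[10] = 1
example_row[14 + 28] = 1
wilderness_id = decode_indicator(example_row, WILDERNESS_COLS)
soil_id = decode_indicator(example_row, SOIL_COLS)
print(wilderness_id, soil_id)
```

For the full array, a vectorized form such as `X[:, SOIL_COLS].argmax(axis=1) + 1` should give the same result, provided every row has exactly one soil flag set.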