Covertype Overview

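Covertype (forest cover type) is a dataset of samples from four wilderness areas in the Roosevelt National Forest of northern Colorado, USA. It contains 581,012 samples in total. On Kaggle the samples are split into a training set of 15,120 and a test set of 565,892.
Each sample describes one 30 m × 30 m cell. Each sample has 54 features and belongs to one of 7 cover types: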

1 - Spruce/Fir
2 - Lodgepole Pine
3 - Ponderosa Pine
4 - Cottonwood/Willow
5 - Aspen
6 - Douglas-fir
7 - Krummholz
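The label encoding above can be kept as a small lookup table. A minimal sketch; `COVER_TYPES` and `label_name` are hypothetical helper names, not part of any library:

```python
# Hypothetical lookup table: Covertype integer label -> cover type name
COVER_TYPES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def label_name(label: int) -> str:
    """Return the cover type name for an integer label in 1..7."""
    if label not in COVER_TYPES:
        raise ValueError(f"unknown cover type label: {label}")
    return COVER_TYPES[label]


print(label_name(2))  # Lodgepole Pine
```

Note that the labels start at 1; a classifier that expects classes 0..K-1 would need `y - 1`.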

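The first 10 features are quantitative (integers in the raw data; scikit-learn loads them as floats), and all remaining features are binary one-hot indicators. The 54 features (name, 1-based column number) are: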
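Elevation 1 (meters)
Aspect 2 (azimuth, degrees)
Slope 3 (degrees)
Horizontal_Distance_To_Hydrology 4 (meters, to the nearest surface water)
Vertical_Distance_To_Hydrology 5 (meters, to the nearest surface water)
Horizontal_Distance_To_Roadways 6 (meters, to the nearest roadway)
Hillshade_9am 7 (index 0 to 255, summer solstice)
Hillshade_Noon 8 (index 0 to 255, summer solstice)
Hillshade_3pm 9 (index 0 to 255, summer solstice)
Horizontal_Distance_To_Fire_Points 10 (meters, to the nearest wildfire ignition point)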
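Because only these first 10 columns are continuous, a common preprocessing step is to standardize them and leave the 0/1 indicator columns untouched. A minimal NumPy sketch, assuming the column order listed here (0-based columns 0 to 9 are quantitative); `scale_quantitative` is a hypothetical helper name:

```python
import numpy as np

N_QUANTITATIVE = 10  # 0-based columns 0..9 are the quantitative features


def scale_quantitative(X_train, X_test):
    """Standardize the first 10 columns using training-set statistics only.

    The 44 binary indicator columns are copied through unchanged.
    """
    X_train = np.asarray(X_train, dtype=float).copy()
    X_test = np.asarray(X_test, dtype=float).copy()
    mean = X_train[:, :N_QUANTITATIVE].mean(axis=0)
    std = X_train[:, :N_QUANTITATIVE].std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    X_train[:, :N_QUANTITATIVE] = (X_train[:, :N_QUANTITATIVE] - mean) / std
    X_test[:, :N_QUANTITATIVE] = (X_test[:, :N_QUANTITATIVE] - mean) / std
    return X_train, X_test
```

In scikit-learn, `ColumnTransformer([("num", StandardScaler(), list(range(10)))], remainder="passthrough")` should produce the same result, since the scaled columns come first in its output and the remaining columns follow in their original order.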

# The remaining 44 features are binary (0 or 1): 4 wilderness areas and 40 soil types
Wilderness_Area1 11 Rawah Wilderness Area
Wilderness_Area2 12 Neota Wilderness Area
Wilderness_Area3 13 Comanche Peak Wilderness Area
Wilderness_Area4 14 Cache la Poudre Wilderness Area
Soil_Type1 15 Cathedral family - Rock outcrop complex, extremely stony.
Soil_Type2 16 Vanet - Ratake families complex, very stony.
Soil_Type3 17 Haploborolis - Rock outcrop complex, rubbly.
Soil_Type4 18 Ratake family - Rock outcrop complex, rubbly.
Soil_Type5 19 Vanet family - Rock outcrop complex, rubbly.
Soil_Type6 20 Vanet - Wetmore families - Rock outcrop complex, stony.
Soil_Type7 21 Gothic family.
Soil_Type8 22 Supervisor - Limber families complex.
Soil_Type9 23 Troutville family, very stony.
Soil_Type10 24 Bullwark - Catamount families - Rock outcrop complex, rubbly.
Soil_Type11 25 Bullwark - Catamount families - Rock land complex, rubbly.
Soil_Type12 26 Legault family - Rock land complex, stony.
Soil_Type13 27 Catamount family - Rock land - Bullwark family complex, rubbly.
Soil_Type14 28 Pachic Argiborolis - Aquolis complex.
Soil_Type15 29 unspecified in the USFS Soil and ELU Survey.
Soil_Type16 30 Cryaquolis - Cryoborolis complex.
Soil_Type17 31 Gateview family - Cryaquolis complex.
Soil_Type18 32 Rogert family, very stony.
Soil_Type19 33 Typic Cryaquolis - Borohemists complex.
Soil_Type20 34 Typic Cryaquepts - Typic Cryaquolls complex.
Soil_Type21 35 Typic Cryaquolls - Leighcan family, till substratum complex.
Soil_Type22 36 Leighcan family, till substratum, extremely bouldery.
Soil_Type23 37 Leighcan family, till substratum - Typic Cryaquolls complex.
Soil_Type24 38 Leighcan family, extremely stony.
Soil_Type25 39 Leighcan family, warm, extremely stony.
Soil_Type26 40 Granile - Catamount families complex, very stony.
Soil_Type27 41 Leighcan family, warm - Rock outcrop complex, extremely stony.
Soil_Type28 42 Leighcan family - Rock outcrop complex, extremely stony.
Soil_Type29 43 Como - Legault families complex, extremely stony.
Soil_Type30 44 Como family - Rock land - Legault family complex, extremely stony.
Soil_Type31 45 Leighcan - Catamount families complex, extremely stony.
Soil_Type32 46 Catamount family - Rock outcrop - Leighcan family complex, extremely stony.
Soil_Type33 47 Leighcan - Catamount families - Rock outcrop complex, extremely stony.
Soil_Type34 48 Cryorthents - Rock land complex, extremely stony.
Soil_Type35 49 Cryumbrepts - Rock outcrop - Cryaquepts complex.
Soil_Type36 50 Bross family - Rock land - Cryumbrepts complex, extremely stony.
Soil_Type37 51 Rock outcrop - Cryumbrepts - Cryorthents complex, extremely stony.
Soil_Type38 52 Leighcan - Moran families - Cryaquolls complex, extremely stony.
Soil_Type39 53 Moran family - Cryorthents - Leighcan family complex, extremely stony.
Soil_Type40 54 Moran family - Cryorthents - Rock land complex, extremely stony.
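The column layout above (10 quantitative columns, then 4 wilderness indicators, then 40 soil-type indicators) can be generated programmatically, and each one-hot group can be collapsed back into a single categorical code. A NumPy sketch; `FEATURE_NAMES` and `decode_one_hot` are hypothetical names, and the function assumes each row has exactly one active indicator per group, raising an error otherwise:

```python
import numpy as np

QUANTITATIVE = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]
WILDERNESS = [f"Wilderness_Area{i}" for i in range(1, 5)]
SOIL = [f"Soil_Type{i}" for i in range(1, 41)]
FEATURE_NAMES = QUANTITATIVE + WILDERNESS + SOIL  # 54 names in column order

# 0-based column slices for each indicator block
WILDERNESS_COLS = slice(10, 14)
SOIL_COLS = slice(14, 54)


def decode_one_hot(X):
    """Collapse the indicator blocks into 1-based categorical codes.

    Returns (wilderness, soil) with wilderness in 1..4 and soil in 1..40.
    """
    X = np.asarray(X)
    codes = []
    for cols in (WILDERNESS_COLS, SOIL_COLS):
        block = X[:, cols]
        # Assumption: exactly one indicator is set per row in each block
        if not np.all(block.sum(axis=1) == 1):
            raise ValueError("expected exactly one active indicator per row")
        codes.append(block.argmax(axis=1) + 1)
    return codes[0], codes[1]
```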

Test

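from sklearn.datasets import fetch_covtype

# The first call downloads the data (about 90 MB) and caches it locally,
# by default under ~/scikit_learn_data; later calls reuse the cache.
X, y = fetch_covtype(return_X_y=True)
print(X.shape)  # (581012, 54)
print(y.shape)  # (581012,)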
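The Kaggle training set has 15,120 rows, which is 7 × 2,160. To experiment with a training set of that size from the full data loaded above, one option is to draw the same number of rows from every class and keep the rest as a test set. A sketch under that assumption; it is not necessarily how Kaggle built its split, and `balanced_split` is a hypothetical helper:

```python
import numpy as np


def balanced_split(y, per_class, seed=0):
    """Return (train_idx, test_idx) with `per_class` training rows per label.

    Training rows are drawn without replacement; every other row goes to test.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train_parts = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        if len(idx) < per_class:
            raise ValueError(f"class {label} has only {len(idx)} samples")
        train_parts.append(rng.choice(idx, size=per_class, replace=False))
    train_idx = np.sort(np.concatenate(train_parts))
    test_mask = np.ones(len(y), dtype=bool)
    test_mask[train_idx] = False
    return train_idx, np.flatnonzero(test_mask)


# Usage with X, y from fetch_covtype:
# train_idx, test_idx = balanced_split(y, per_class=2160)
# X_train, y_train = X[train_idx], y[train_idx]  # 15120 rows
```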

References

1. https://archive.ics.uci.edu/ml/datasets/covertype
2. https://datahub.io/machine-learning/covertype
3. https://www.kaggle.com/c/forest-cover-type-prediction/data
4. https://scikit-learn.org/stable/auto_examples/kernel_approximation/plot_scalable_poly_kernels.html#sphx-glr-auto-examples-kernel-approximation-plot-scalable-poly-kernels-py
