04-空间连接

源代码 请看此处

本章目标:

  • 根据 countries cities 数据框架,确定每个城市所在的国家
  • 为了解决这个问题,使用'spatial join' 空间连接操作,根据地理空间数据集的空间关系将其信息结合起来。
%matplotlib inline

import pandas as pd
import geopandas
countries = geopandas.read_file("zip://data/ne_110m_admin_0_countries.zip")
cities = geopandas.read_file("zip://data/ne_110m_populated_places.zip")
rivers = geopandas.read_file("zip://data/ne_50m_rivers_lake_centerlines.zip")

4.1 dataframes的连接

Pandas提供了以不同方式连接或合并数据框的功能

详见 https://chrisalbon.com/python/data_wrangling/pandas_join_merge_dataframe/

以及 https://pandas.pydata.org/pandas-docs/stable/merging.html 了解完整的文档。

为了说明用pandas连接两个数据框信息的概念,用 cities 以及 countries 数据集做示例

cities2=cities[cities['name'].isin(['Bern', 'Brussels', 'London', 'Paris'])].copy()
cities2

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name</th> <th>geometry</th> </tr> </thead> <tbody> <tr> <th>26</th> <td>Bern</td> <td>POINT (7.46698 46.91668)</td> </tr> <tr> <th>170</th> <td>Brussels</td> <td>POINT (4.33137 50.83526)</td> </tr> <tr> <th>219</th> <td>London</td> <td>POINT (-0.11867 51.50194)</td> </tr> <tr> <th>235</th> <td>Paris</td> <td>POINT (2.33139 48.86864)</td> </tr> </tbody> </table> </div>

cities2['iso_a3'] = ['CHE', 'BEL', 'GBR', 'FRA']
cities2

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name</th> <th>geometry</th> <th>iso_a3</th> </tr> </thead> <tbody> <tr> <th>26</th> <td>Bern</td> <td>POINT (7.46698 46.91668)</td> <td>CHE</td> </tr> <tr> <th>170</th> <td>Brussels</td> <td>POINT (4.33137 50.83526)</td> <td>BEL</td> </tr> <tr> <th>219</th> <td>London</td> <td>POINT (-0.11867 51.50194)</td> <td>GBR</td> </tr> <tr> <th>235</th> <td>Paris</td> <td>POINT (2.33139 48.86864)</td> <td>FRA</td> </tr> </tbody> </table> </div>

countries2 = countries[['iso_a3', 'name', 'continent']]
countries2.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>iso_a3</th> <th>name</th> <th>continent</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>AFG</td> <td>Afghanistan</td> <td>Asia</td> </tr> <tr> <th>1</th> <td>AGO</td> <td>Angola</td> <td>Africa</td> </tr> <tr> <th>2</th> <td>ALB</td> <td>Albania</td> <td>Europe</td> </tr> <tr> <th>3</th> <td>ARE</td> <td>United Arab Emirates</td> <td>Asia</td> </tr> <tr> <th>4</th> <td>ARG</td> <td>Argentina</td> <td>South America</td> </tr> </tbody> </table> </div>

上面的操作中,我们在 cities 数据集中增加了一个 iso_a3 列,表示该城市的国家代码。这个国家代码也存在于 countries 数据集中,这使我们能够根据共同的列合并这两个数据框

根据一个共同的键,将 cities 数据框架与 countries 合并,将有关国家的额外信息(全称、大洲)合并到 cities 数据框架中。

cities2.merge(countries2,on='iso_a3')

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name_x</th> <th>geometry</th> <th>iso_a3</th> <th>name_y</th> <th>continent</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Bern</td> <td>POINT (7.46698 46.91668)</td> <td>CHE</td> <td>Switzerland</td> <td>Europe</td> </tr> <tr> <th>1</th> <td>Brussels</td> <td>POINT (4.33137 50.83526)</td> <td>BEL</td> <td>Belgium</td> <td>Europe</td> </tr> <tr> <th>2</th> <td>London</td> <td>POINT (-0.11867 51.50194)</td> <td>GBR</td> <td>United Kingdom</td> <td>Europe</td> </tr> <tr> <th>3</th> <td>Paris</td> <td>POINT (2.33139 48.86864)</td> <td>FRA</td> <td>France</td> <td>Europe</td> </tr> </tbody> </table> </div>

但是 ,在这个说明性的例子中,我们手动添加了公共列,它在原始数据集中并不存在。但是,我们仍然可以根据这两个数据集的空间坐标将其连接起来。

4.2 空间对象之间的空间关系

在02-空间关系中,我们看到了几何对象之间空间关系的概念:包含、相交等等

在这种情况下,我们知道每个城市都 位于 其中一个国家内,或者反过来说,每个国家可以 包含 多个城市。

我们可以用前面练习中看到的方法来检验这种关系

france = countries.loc[countries['name']=='France','geometry'].squeeze()
france

svg

cities.within(france)
0      False
1      False
2      False
3      False
4      False
       ...  
238    False
239    False
240    False
241    False
242    False
Length: 243, dtype: bool

以上输出的布尔数列,表示 cities 数据框中的每个点(城市)是否位于法国境内

因为这是一个布尔系列的结果,所以我们可以用它来筛选原始数据框,只显示那些真正位于法国境内的城市

cities[cities.within(france)]

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name</th> <th>geometry</th> </tr> </thead> <tbody> <tr> <th>10</th> <td>Monaco</td> <td>POINT (7.40691 43.73965)</td> </tr> <tr> <th>13</th> <td>Andorra</td> <td>POINT (1.51649 42.50000)</td> </tr> <tr> <th>186</th> <td>Geneva</td> <td>POINT (6.14003 46.21001)</td> </tr> <tr> <th>235</th> <td>Paris</td> <td>POINT (2.33139 48.86864)</td> </tr> </tbody> </table> </div>

我们现在可以对每个国家重复上述分析,并在 cities 数据框架中增加一列,表示这个国家。然而,这将是繁琐的手动操作,也正是空间连接操作为我们提供的。

(注:上述结果有可能是不正确的,但这只是因为国家数据集的粗糙性造成的,与方法无关)

4.3 空间连接操作

空间连接 = 根据空间关系将一个图层的属性连接到另一图层

空间连接操作步骤:

  • 我们要添加信息的GeoDataFrame
  • 包含我们要添加的信息的GeoDataFrame
  • 我们要用来匹配两个数据集的空间关系("相交"、"包含"、"在内")
  • 连接的类型:左连接或内连接。

在这种情况下,我们要根据两个数据集之间的空间关系,将 cities 数据框架与 countries 数据框架的信息连接起来

使用 geopandas.sjoin 函数

# !easy_install Rtree 
# !sudo apt-get update && apt-get install -y libspatialindex-dev
joined = geopandas.sjoin(cities, countries, op='within', how='inner')
joined.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name_left</th> <th>geometry</th> <th>index_right</th> <th>iso_a3</th> <th>name_right</th> <th>continent</th> <th>pop_est</th> <th>gdp_md_est</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Vatican City</td> <td>POINT (12.45339 41.90328)</td> <td>79</td> <td>ITA</td> <td>Italy</td> <td>Europe</td> <td>62137802.0</td> <td>2221000.0</td> </tr> <tr> <th>1</th> <td>San Marino</td> <td>POINT (12.44177 43.93610)</td> <td>79</td> <td>ITA</td> <td>Italy</td> <td>Europe</td> <td>62137802.0</td> <td>2221000.0</td> </tr> <tr> <th>226</th> <td>Rome</td> <td>POINT (12.48131 41.89790)</td> <td>79</td> <td>ITA</td> <td>Italy</td> <td>Europe</td> <td>62137802.0</td> <td>2221000.0</td> </tr> <tr> <th>2</th> <td>Vaduz</td> <td>POINT (9.51667 47.13372)</td> <td>9</td> <td>AUT</td> <td>Austria</td> <td>Europe</td> <td>8754413.0</td> <td>416600.0</td> </tr> <tr> <th>212</th> <td>Vienna</td> <td>POINT (16.36469 48.20196)</td> <td>9</td> <td>AUT</td> <td>Austria</td> <td>Europe</td> <td>8754413.0</td> <td>416600.0</td> </tr> </tbody> </table> </div>

joined = geopandas.sjoin(cities, countries, op='within', how='left')
joined.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name_left</th> <th>geometry</th> <th>index_right</th> <th>iso_a3</th> <th>name_right</th> <th>continent</th> <th>pop_est</th> <th>gdp_md_est</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Vatican City</td> <td>POINT (12.45339 41.90328)</td> <td>79.0</td> <td>ITA</td> <td>Italy</td> <td>Europe</td> <td>62137802.0</td> <td>2221000.0</td> </tr> <tr> <th>1</th> <td>San Marino</td> <td>POINT (12.44177 43.93610)</td> <td>79.0</td> <td>ITA</td> <td>Italy</td> <td>Europe</td> <td>62137802.0</td> <td>2221000.0</td> </tr> <tr> <th>2</th> <td>Vaduz</td> <td>POINT (9.51667 47.13372)</td> <td>9.0</td> <td>AUT</td> <td>Austria</td> <td>Europe</td> <td>8754413.0</td> <td>416600.0</td> </tr> <tr> <th>3</th> <td>Lobamba</td> <td>POINT (31.20000 -26.46667)</td> <td>152.0</td> <td>SWZ</td> <td>Swaziland</td> <td>Africa</td> <td>1467152.0</td> <td>11060.0</td> </tr> <tr> <th>4</th> <td>Luxembourg</td> <td>POINT (6.13000 49.61166)</td> <td>97.0</td> <td>LUX</td> <td>Luxembourg</td> <td>Europe</td> <td>594130.0</td> <td>58740.0</td> </tr> </tbody> </table> </div>

joined['continent'].value_counts() # 重复元素出现的次数
Asia             59
Africa           57
Europe           46
North America    26
South America    14
Oceania           8
Name: continent, dtype: int64

4.4 练习

我们将再次使用巴黎的数据集来做一些练习,再次导入它们

districts = geopandas.read_file("data/paris_districts.geojson").to_crs(epsg=2154)
stations = geopandas.read_file("data/paris_bike_stations.geojson").to_crs(epsg=2154)

4.4.1 练习一

  • 使用空间连接方法,确定每个自行车站位于哪个区,将结果命名为 joined
joined=geopandas.sjoin(stations,districts[['district_name','geometry']],op='within')
joined.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>name</th> <th>bike_stands</th> <th>available_bikes</th> <th>geometry</th> <th>index_right</th> <th>district_name</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>14002 - RASPAIL QUINET</td> <td>44</td> <td>4</td> <td>POINT (650791.111 6860114.328)</td> <td>52</td> <td>Montparnasse</td> </tr> <tr> <th>143</th> <td>14112 - FAUBOURG SAINT JACQUES CASSINI</td> <td>16</td> <td>0</td> <td>POINT (651406.382 6859738.689)</td> <td>52</td> <td>Montparnasse</td> </tr> <tr> <th>293</th> <td>14033 - DAGUERRE GASSENDI</td> <td>38</td> <td>1</td> <td>POINT (650694.991 6859723.873)</td> <td>52</td> <td>Montparnasse</td> </tr> <tr> <th>346</th> <td>14006 - SAINT JACQUES TOMBE ISSOIRE</td> <td>22</td> <td>0</td> <td>POINT (651327.035 6859441.637)</td> <td>52</td> <td>Montparnasse</td> </tr> <tr> <th>429</th> <td>14111 - DENFERT-ROCHEREAU CASSINI</td> <td>24</td> <td>8</td> <td>POINT (651261.351 6859926.893)</td> <td>52</td> <td>Montparnasse</td> </tr> </tbody> </table> </div>

joined['district_name'].value_counts() 
Gare               33
Charonne           30
Picpus             26
Saint-Lambert      26
Javel 15Art        22
                   ..
Arts-et-Metiers     4
Archives            4
Sainte-Avoie        4
Notre-Dame          3
Enfants-Rouges      3
Name: district_name, Length: 80, dtype: int64
import contextily as ctx
ax=joined.plot(column="district_name",cmap='tab20')
ctx.add_basemap(ax)

png

4.4.2 练习二:分区域的树木密度计算(Ⅰ)

利用巴黎公共空间所有树木的数据集,目标是制作一张各区树木密度图。为此,我们首先需要找出每个地区包含多少棵树,我们将在本练习中进行。在接下来的练习中,我们将利用这个结果计算密度并创建地图。

为了获得按地区划分的树木数量,我们首先需要知道每棵树位于哪个地区,我们可以通过空间连接来实现。然后,利用空间连接的结果,我们将使用pandas的"group-by"功能计算出每个区的树木数量。

  • 导入树木数据集 "paris_trees.gpkg" ,将结果命名为 trees 。同时读取我们之前看到的地区数据集( "paris_districts.geojson" ),并调用这个 districts 。将地区数据集转换为与树木数据集相同的CRS
  • 使用空间连接将一个带有 'district_name' 的列添加到树木数据集中。将结果称为 joined
trees=geopandas.read_file("data/paris_trees.gpkg")
districts=geopandas.read_file("data/paris_districts.geojson").to_crs(trees.crs)
trees.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fa231f53898>

png

districts.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fa2288f15f8>

png

joined=geopandas.sjoin(trees,districts[['district_name','geometry']],op='within')
joined.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>species</th> <th>location_type</th> <th>geometry</th> <th>index_right</th> <th>district_name</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Marronnier</td> <td>Alignement</td> <td>POINT (455834.122 5410780.606)</td> <td>43</td> <td>Sainte-Marguerite</td> </tr> <tr> <th>130</th> <td>Micocoulier</td> <td>Alignement</td> <td>POINT (455458.848 5411310.443)</td> <td>43</td> <td>Sainte-Marguerite</td> </tr> <tr> <th>142</th> <td>Platane</td> <td>Alignement</td> <td>POINT (455704.681 5410991.067)</td> <td>43</td> <td>Sainte-Marguerite</td> </tr> <tr> <th>402</th> <td>Cedrele</td> <td>Alignement</td> <td>POINT (455538.223 5411112.314)</td> <td>43</td> <td>Sainte-Marguerite</td> </tr> <tr> <th>428</th> <td>Micocoulier</td> <td>Alignement</td> <td>POINT (455487.563 5411285.863)</td> <td>43</td> <td>Sainte-Marguerite</td> </tr> </tbody> </table> </div>

至此,可以确定每棵树所处的区

4.4.3 练习三:分区域的树木密度计算(Ⅱ)

  • 计算每个地区的树木数量:通过 'district_name' 列对 joined DataFrame进行分组,并计算每组的数量。我们将得到的系列 "trees_by_district"转换为DataFrame,用于下一步的练习
trees_by_district=joined.groupby("district_name").size()
trees_by_district=trees_by_district.to_frame(name='n_trees')
trees_by_district.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>n_trees</th> </tr> <tr> <th>district_name</th> <th></th> </tr> </thead> <tbody> <tr> <th>Amérique</th> <td>183</td> </tr> <tr> <th>Archives</th> <td>8</td> </tr> <tr> <th>Arsenal</th> <td>60</td> </tr> <tr> <th>Arts-et-Metiers</th> <td>20</td> </tr> <tr> <th>Auteuil</th> <td>392</td> </tr> </tbody> </table> </div>

4.4.4 练习四:分区域的树木密度计算(Ⅲ)

现在已经获得了各区的树木数量,我们可以根据树木密度来制作各区的地图。

为此,我们首先需要将上一步计算出的每个地区的树木数量( trees_by_district )合并回地区数据集。我们将使用 pd.merge() 函数,基于一个共同的列将两个数据框合并

由于不是所有的地区都有相同的面积,所以比较公平的做法是将树木密度可视化,即单位面积的树木数量

districts_trees = pd.merge(districts, trees_by_district, on='district_name')
districts_trees.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>id</th> <th>district_name</th> <th>population</th> <th>geometry</th> <th>n_trees</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>1</td> <td>St-Germain-l'Auxerrois</td> <td>1672</td> <td>POLYGON ((451922.133 5411438.484, 451922.080 5...</td> <td>40</td> </tr> <tr> <th>1</th> <td>2</td> <td>Halles</td> <td>8984</td> <td>POLYGON ((452278.419 5412160.893, 452192.407 5...</td> <td>40</td> </tr> <tr> <th>2</th> <td>3</td> <td>Palais-Royal</td> <td>3195</td> <td>POLYGON ((451553.806 5412340.522, 451528.058 5...</td> <td>4</td> </tr> <tr> <th>3</th> <td>4</td> <td>Place-Vendôme</td> <td>3044</td> <td>POLYGON ((451004.908 5412654.095, 450960.640 5...</td> <td>7</td> </tr> <tr> <th>4</th> <td>5</td> <td>Gaillon</td> <td>1345</td> <td>POLYGON ((451328.752 5412991.278, 451294.721 5...</td> <td>7</td> </tr> </tbody> </table> </div>

districts_trees['n_trees_per_area']=districts_trees['n_trees']/districts_trees.geometry.area
ax=districts_trees.plot(column='n_trees_per_area',figsize=(12,6),cmap='BuGn',legend=True)
ax
<matplotlib.axes._subplots.AxesSubplot at 0x7fa22882f4e0>

png

4.5 空间叠加操作

在上面的空间连接操作中,我们并没有改变矢量对象本身。我们不是在连接矢量对象本身,而是基于矢量对象之间的空间关系来连接属性。这也意味着矢量对象至少需要部分重叠

如果想基于将不同数据框的矢量对象连接(组合)成一个新的对象(例如通过取几何体的交点)来创建新的矢量对象,需要进行空间 叠加 操作

africa=countries[countries['continent']=='Africa']
africa.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fa2286ed7f0>

png

cities['geometry']=cities.buffer(2)
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: UserWarning: Geometry is in a geographic CRS. Results from 'buffer' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  """Entry point for launching an IPython kernel.
geopandas.overlay(africa,cities,how="difference").plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fa22661c160>

png

<div class="alert alert-info" style="font-size:120%"> <b>注意</b> <br>

  • 空间连接,Spatial join :根据空间关系将属性从一个数据框连接(复制)到另一个数据框
  • 空间叠加,Spatial overlay :根据两个数据框之间的空间操作(并结合两个数据框的属性)构建新的矢量对象

</div>

4.5.1 练习一:探索土地利用数据集

在下面的练习中,我们首先介绍一个新的数据集:关于巴黎土地使用的数据集(基于开放的欧洲 城市地图集 的简化版本)。土地使用表明了某一区域被用于何种活动,如住宅区或休闲娱乐。它是一个多边形数据集,用一个标签代表巴黎不同地区的土地使用类别。

在这个练习中,我们将探索数据,直观地探究数据,并计算出巴黎地区不同土地利用的面积

  • 读取 'paris_land_use.shp' 文件,并将结果命名为变量 land_use
  • 绘制土地利用图,使用 class 列给多边形上色,并添加图例。注意:由于有很多多边形,可能需要几秒钟的时间来生成图形
  • 添加一列 area 来表示每个多边形的面积
  • 使用 groupby() 方法计算每种土地利用类型的总面积,以km²为单位,并打印结果
land_use=geopandas.read_file("zip://data/paris_land_use.zip")
land_use.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>class</th> <th>geometry</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Water bodies</td> <td>POLYGON ((3751386.281 2890064.323, 3751395.345...</td> </tr> <tr> <th>1</th> <td>Roads and associated land</td> <td>POLYGON ((3751390.345 2886000.000, 3751390.345...</td> </tr> <tr> <th>2</th> <td>Roads and associated land</td> <td>POLYGON ((3751390.345 2886898.192, 3751390.370...</td> </tr> <tr> <th>3</th> <td>Roads and associated land</td> <td>POLYGON ((3751390.345 2887500.000, 3751390.345...</td> </tr> <tr> <th>4</th> <td>Roads and associated land</td> <td>POLYGON ((3751390.345 2888647.357, 3751390.370...</td> </tr> </tbody> </table> </div>

land_use.plot(column='class', legend=True, figsize=(15, 10))
<matplotlib.axes._subplots.AxesSubplot at 0x7fa22871a9b0>

png

land_use['area'] = land_use.geometry.area
total_area=land_use.groupby('class')['area'].sum()/1000**2
total_area
class
Continuous Urban Fabric             45.943090
Discontinuous Dense Urban Fabric     3.657343
Green urban areas                    9.858438
Industrial, commercial, public      13.295042
Railways and associated land         1.935793
Roads and associated land            7.401574
Sports and leisure facilities        3.578509
Water bodies                         3.189706
Name: area, dtype: float64

4.5.2 练习二:取两个多边形的交集

在这项工作中,我们将使用两个单独的多边形:从 districts 数据集中提取的Muette区,以及从 land_use 数据集中提取的布洛涅绿色城区(巴黎西部的一个大型公共公园)。这两个多边形已经被分配给 muette park_boulogne 变量

我们首先将这两个多边形可视化。你会看到它们是重叠的,但公园并不完全位于穆埃特区。让我们来确定重叠的部分。

  • 将两个多边形绘制在一张地图上,直观地检查重叠的程度
  • 计算 park_boulogne muette 两个多边形的交集
  • 输出公园面积占该区总面积的比例
land_use = geopandas.read_file("zip://data/paris_land_use.zip")
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)
# 提取多边形
land_use['area']=land_use.geometry.area
park_boulogne=land_use[land_use['class']=="Green urban areas"].sort_values('area').geometry.iloc[-1]
muette = districts[districts.district_name == 'Muette'].geometry.squeeze()
geopandas.GeoSeries([park_boulogne,muette]).plot(alpha=0.5,color=['green','red'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fa2261de0f0>

png

intersection = park_boulogne.intersection(muette)
geopandas.GeoSeries([intersection]).plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fa2265b6b00>

png

print(intersection.area / muette.area)
0.4352082235641065

4.5.3 练习三:GeoDataFrame与多边形相交

结合土地使用数据集和地区数据集,我们现在可以调查某个地区的土地使用情况

为此,我们首先需要确定土地利用数据集与给定区的交集,以 Muette 区为例

  • 计算 land_use 多边形与 muette 多边形的交集,结果命名为 land_use_muette
  • 绘制地图,并传递 edgecolor='black' 以更清楚地看到不同多边形的边界
  • 打印 land_use_muette 的前五行
land_use = geopandas.read_file("zip://data/paris_land_use.zip")
districts = geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)
muette = districts[districts.district_name == 'Muette'].geometry.squeeze()
land_use_muette = land_use.geometry.intersection(muette)
# land_use_muette.reset_index().plot(edgecolor='black')
# TypeError: 'Polygon' object does not support indexing :.reset_index()
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-52-822195a428d0> in <module>
----> 1 land_use_muette.reset_index().plot(edgecolor='black')
      2 # TypeError: 'Polygon' object does not support indexing :.reset_index()


/usr/local/lib/python3.6/dist-packages/geopandas/geodataframe.py in plot(self, *args, **kwargs)
    919         from there.
    920         """
--> 921         return plot_dataframe(self, *args, **kwargs)
    922 
    923     plot.__doc__ = plot_dataframe.__doc__


/usr/local/lib/python3.6/dist-packages/geopandas/plotting.py in plot_dataframe(df, column, cmap, color, ax, cax, categorical, legend, scheme, k, vmin, vmax, markersize, figsize, legend_kwds, categories, classification_kwds, missing_kwds, aspect, **style_kwds)
    614     if column is None:
    615         return plot_series(
--> 616             df.geometry,
    617             cmap=cmap,
    618             color=color,


/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5128             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5129                 return self[name]
-> 5130             return object.__getattribute__(self, name)
   5131 
   5132     def __setattr__(self, name: str, value) -> None:


/usr/local/lib/python3.6/dist-packages/geopandas/geodataframe.py in _get_geometry(self)
    171             raise AttributeError(
    172                 "No geometry data set yet (expected in"
--> 173                 " column '%s'.)" % self._geometry_column_name
    174             )
    175         return self[self._geometry_column_name]


AttributeError: No geometry data set yet (expected in column 'geometry'.)

png

land_use_muette.head()
/usr/local/lib/python3.6/dist-packages/geopandas/array.py:689: RuntimeWarning: All-NaN slice encountered
  np.nanmin(b[:, 0]),  # minx
/usr/local/lib/python3.6/dist-packages/geopandas/array.py:690: RuntimeWarning: All-NaN slice encountered
  np.nanmin(b[:, 1]),  # miny
/usr/local/lib/python3.6/dist-packages/geopandas/array.py:691: RuntimeWarning: All-NaN slice encountered
  np.nanmax(b[:, 2]),  # maxx
/usr/local/lib/python3.6/dist-packages/geopandas/array.py:692: RuntimeWarning: All-NaN slice encountered
  np.nanmax(b[:, 3]),  # maxy





0    POLYGON EMPTY
1    POLYGON EMPTY
2    POLYGON EMPTY
3    POLYGON EMPTY
4    POLYGON EMPTY
dtype: geometry

从图中可以看出,我们现在只有完整的土地使用数据集的一个子集。虽然 land_use_muette 仍然有和原来的 land_use 一样多的行数。但正如你在打印第一行时看到的那样,许多行现在是由空多边形组成的,即与Muette区没有交集

land_use_muette = land_use.copy()
land_use_muette['geometry'] = land_use.geometry.intersection(muette)
land_use_muette = land_use_muette[~land_use_muette.is_empty]
land_use_muette.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>class</th> <th>geometry</th> </tr> </thead> <tbody> <tr> <th>135</th> <td>Green urban areas</td> <td>POLYGON ((3752020.694 2891519.323, 3752025.310...</td> </tr> <tr> <th>229</th> <td>Sports and leisure facilities</td> <td>POLYGON ((3752443.986 2891171.823, 3752446.430...</td> </tr> <tr> <th>239</th> <td>Water bodies</td> <td>POLYGON ((3752110.034 2891774.197, 3752112.444...</td> </tr> <tr> <th>278</th> <td>Roads and associated land</td> <td>POLYGON ((3752000.000 2891530.298, 3752000.000...</td> </tr> <tr> <th>279</th> <td>Roads and associated land</td> <td>POLYGON ((3751954.462 2891571.778, 3752000.000...</td> </tr> </tbody> </table> </div>

land_use_muette.dissolve(by='class') # 根据class进行矢量融合

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>geometry</th> </tr> <tr> <th>class</th> <th></th> </tr> </thead> <tbody> <tr> <th>Continuous Urban Fabric</th> <td>MULTIPOLYGON (((3755334.091 2889457.833, 37553...</td> </tr> <tr> <th>Discontinuous Dense Urban Fabric</th> <td>MULTIPOLYGON (((3755585.963 2889793.822, 37556...</td> </tr> <tr> <th>Green urban areas</th> <td>MULTIPOLYGON (((3755772.518 2889995.038, 37558...</td> </tr> <tr> <th>Industrial, commercial, public</th> <td>MULTIPOLYGON (((3755275.182 2889527.443, 37552...</td> </tr> <tr> <th>Railways and associated land</th> <td>POLYGON ((3755654.921 2889540.054, 3755560.618...</td> </tr> <tr> <th>Roads and associated land</th> <td>MULTIPOLYGON (((3754820.067 2889843.877, 37548...</td> </tr> <tr> <th>Sports and leisure facilities</th> <td>MULTIPOLYGON (((3753932.354 2891267.190, 37539...</td> </tr> <tr> <th>Water bodies</th> <td>MULTIPOLYGON (((3755507.459 2889412.262, 37555...</td> </tr> </tbody> </table> </div>

land_use_muette.dissolve(by='class').reset_index().plot(column='class',legend=True,figsize=(15,6))
<matplotlib.axes._subplots.AxesSubplot at 0x7fa22858f438>

png

4.5.4 练习四:空间数据集的叠加操作

现在,通过叠加操作结合两个数据集,创建一个新的GeoDataFrame,由每个地区的土地利用多边形的交集组成,但要确保从两个源层中引入属性数据

一旦我们创建了土地利用和地区数据集的叠加,我们就可以更容易地检查不同地区的土地利用情况。让我们回到Muette区的例子,检查该区的土地利用情况

  • 根据 land_use districts 的交集创建一个新的GeoDataFrame,将结果分配给一个变量 combined
  • 打印生成的GeoDataFrame( combined )的第一行
  • 添加一个新的列 area ,即 combined 中每个多边形的面积
  • 创建一个名为 land_use_muette 的子集,其中 district_name 等于 Muette
  • 绘制 land_use_muette 的图,使用 class 列对多边形进行着色。
  • 使用 groupby() 方法计算 land_use_muette 的每个 class 的总面积,并打印结果。
land_use=geopandas.read_file("zip://data/paris_land_use.zip")
districts=geopandas.read_file("data/paris_districts.geojson").to_crs(land_use.crs)
combined=geopandas.overlay(land_use,districts,how='intersection')
combined.head()

<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>class</th> <th>id</th> <th>district_name</th> <th>population</th> <th>geometry</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>Water bodies</td> <td>61</td> <td>Auteuil</td> <td>67967</td> <td>POLYGON ((3751395.345 2890118.001, 3751395.345...</td> </tr> <tr> <th>1</th> <td>Continuous Urban Fabric</td> <td>61</td> <td>Auteuil</td> <td>67967</td> <td>MULTIPOLYGON (((3753253.104 2888254.529, 37532...</td> </tr> <tr> <th>2</th> <td>Roads and associated land</td> <td>61</td> <td>Auteuil</td> <td>67967</td> <td>POLYGON ((3751519.830 2890061.509, 3751522.057...</td> </tr> <tr> <th>3</th> <td>Green urban areas</td> <td>61</td> <td>Auteuil</td> <td>67967</td> <td>MULTIPOLYGON (((3754314.717 2890283.121, 37543...</td> </tr> <tr> <th>4</th> <td>Roads and associated land</td> <td>61</td> <td>Auteuil</td> <td>67967</td> <td>POLYGON ((3751619.113 2890500.000, 3751626.627...</td> </tr> </tbody> </table> </div>

combined.plot(column='district_name')
<matplotlib.axes._subplots.AxesSubplot at 0x7fa225f181d0>

png

combined['area'] = combined.geometry.area
land_use_muette=combined[combined['district_name']=='Muette']
land_use_muette.plot(column='class')
<matplotlib.axes._subplots.AxesSubplot at 0x7fa228665f60>

png

print(land_use_muette.groupby('class')['area'].sum()/1000**2)
class
Continuous Urban Fabric             1.275297
Discontinuous Dense Urban Fabric    0.088289
Green urban areas                   2.624229
Industrial, commercial, public      0.362990
Railways and associated land        0.005424
Roads and associated land           0.226271
Sports and leisure facilities       0.603989
Water bodies                        0.292194
Name: area, dtype: float64
Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐