Tutorial 1.2 - Spatial analysis with Python#

In this tutorial, we will take a quick tour to Python’s (spatial) data science ecosystem and see how we can use some of the fundamental open source Python packages, such as:

  • pandas / geopandas

  • shapely

  • pysal

  • pyproj

  • osmnx / pyrosm

  • matplotlib (visualization)

As you can see, we won’t use any GIS software for doing the programming (such as ArcGIS/arcpy or QGIS), but focus on learning the open source packages that are independent from any specific software. These libraries form nowadays not only the core for modern spatial data science, but they are also fundamental parts of commercial applications used and developed by many companies around the world.

Note

If you have experience working with the Python’s spatial data science stack, this tutorial probably does not bring much new to you, but to get everyone on the same page, we will all go through this introductory tutorial.

Contents:

  • Reading / writing spatial data

  • Retrieving OpenStreetMap data

  • Reprojections

  • Spatial join

  • Plotting data with matplotlib

Fundamental library: Geopandas#

In this course, the most often used Python package that you will learn is geopandas. Geopandas makes it possible to work with geospatial data in Python in a relatively easy way. Geopandas combines the capabilities of the data analysis library pandas with other packages like shapely and fiona for managing spatial data. The main data structures in geopandas are GeoSeries and GeoDataFrame which extend the capabilities of Series and DataFrames from pandas. In case you wish to have additional help getting started with pandas, we recommend you to take a look lessons 5 and 6 from the openly available Geo-Python -course. The main difference between GeoDataFrames and pandas DataFrames is that a GeoDataFrame should contain (at least) one column for geometries. By default, the name of this column is 'geometry'. The geometry column is a GeoSeries which contains the geometries (points, lines, polygons, multipolygons etc.) as shapely objects.

geodataframe.png

Reading and writing spatial data#

Next we will learn some of the basic functionalities of geopandas. We have a couple of GeoJSON files stored in the data folder that we will use.

We can read the data easily with read_file() -function:

import geopandas as gpd

# Filepath
fp = "data/buildings.geojson"

# Read the file
data = gpd.read_file(fp)

# How does it look?
data.head()
addr:city addr:country addr:housenumber addr:housename addr:postcode addr:street email name opening_hours operator ... start_date wikipedia id timestamp version tags osm_type internet_access changeset geometry
0 Helsinki None 29 None 00170 Unioninkatu None None None None ... None None 4253124 1542041335 4 None way None NaN POLYGON ((24.95121 60.16999, 24.95122 60.16988...
1 Helsinki None 2 None 00100 Kaivokatu ainfo@ateneum.fi Ateneum Tu, Fr 10:00-18:00; We-Th 10:00-20:00; Sa-Su 1... None ... 1887 fi:Ateneumin taidemuseo 8033120 1544822447 27 {'architect': 'Theodor Höijer', 'contact:websi... way None NaN POLYGON ((24.94477 60.16982, 24.94450 60.16981...
2 Helsinki FI 22-24 None None Mannerheimintie None Lasipalatsi None None ... 1936 fi:Lasipalatsi 8035238 1533831167 23 {'name:fi': 'Lasipalatsi', 'name:sv': 'Glaspal... way None NaN POLYGON ((24.93561 60.17045, 24.93555 60.17054...
3 Helsinki None 2 None 00100 Mannerheiminaukio None Kiasma Tu 10:00-17:00; We-Fr 10:00-20:30; Sa 10:00-18... None ... 1998 fi:Kiasma (rakennus) 8042215 1553963033 30 {'name:en': 'Museum of Modern Art Kiasma', 'na... way None NaN POLYGON ((24.93682 60.17152, 24.93662 60.17150...
4 None FI None None None None None None None None ... None None 15243643 1546289715 7 None way None NaN POLYGON ((24.93675 60.16779, 24.93660 60.16789...

5 rows × 34 columns

As we can see, the GeoDataFrame contains various attributes in separate columns. The geometry column contains the spatial information. We can take a look of some of the basic information about our GeoDataFrame with command:

data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 486 entries, 0 to 485
Data columns (total 34 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   addr:city           86 non-null     object  
 1   addr:country        57 non-null     object  
 2   addr:housenumber    88 non-null     object  
 3   addr:housename      4 non-null      object  
 4   addr:postcode       54 non-null     object  
 5   addr:street         89 non-null     object  
 6   email               2 non-null      object  
 7   name                81 non-null     object  
 8   opening_hours       8 non-null      object  
 9   operator            7 non-null      object  
 10  phone               8 non-null      object  
 11  ref                 1 non-null      object  
 12  url                 8 non-null      object  
 13  website             20 non-null     object  
 14  building            486 non-null    object  
 15  amenity             26 non-null     object  
 16  building:levels     162 non-null    object  
 17  building:material   2 non-null      object  
 18  building:min_level  4 non-null      object  
 19  height              17 non-null     object  
 20  landuse             2 non-null      object  
 21  office              5 non-null      object  
 22  shop                5 non-null      object  
 23  source              3 non-null      object  
 24  start_date          87 non-null     object  
 25  wikipedia           47 non-null     object  
 26  id                  486 non-null    int64   
 27  timestamp           486 non-null    int64   
 28  version             486 non-null    int64   
 29  tags                181 non-null    object  
 30  osm_type            486 non-null    object  
 31  internet_access     1 non-null      object  
 32  changeset           66 non-null     float64 
 33  geometry            486 non-null    geometry
dtypes: float64(1), geometry(1), int64(3), object(29)
memory usage: 129.2+ KB

From here, we can see that our data is indeed a GeoDataFrame object with 486 entries and 34 columns. You can also get descriptive statistics of your data by calling:

data.describe()
id timestamp version changeset
count 4.860000e+02 4.860000e+02 486.000000 66.0
mean 1.400780e+08 1.455829e+09 4.849794 0.0
std 1.633527e+08 9.247528e+07 4.561162 0.0
min 8.253000e+03 1.197929e+09 1.000000 0.0
25% 2.294267e+07 1.374229e+09 2.000000 0.0
50% 1.228699e+08 1.493288e+09 3.000000 0.0
75% 1.359805e+08 1.530222e+09 7.000000 0.0
max 1.042029e+09 1.555840e+09 31.000000 0.0

In this case, we didn’t have many columns with numerical data, but typically you have numeric values in your dataset and this is a good way to get a quick view how the data look like.

Naturally, as the data is spatial, we want to visualize it to understand the nature of the data better. We can do this easily with plot() method:

data.plot()
<Axes: >
../../_images/e2a7d07940b2d2dbc29fbd1dcb105663781324671420b8e548228b3fbee376c5.png

Now we can see that the data indeed represents buildings (in central Helsinki). Naturally we can also write this data to disk. Geopandas supports writing data to various data formats as well as to PostGIS which is the most widely used open source database for GIS. Let’s write the data as a Shapefile to the same data directory from where we read the data. When writing data to local disk you can use to_file() method that exports the data in Shapefile format by default:

# Output filepath
outfp = "data/buildings_copy.shp"
data.to_file(outfp)
/tmp/ipykernel_18713/403506898.py:3: UserWarning: Column names longer than 10 characters will be truncated when saved to ESRI Shapefile.
  data.to_file(outfp)

Retrieving data from OpenStreetMap#

Now we have seen how to read spatial data from disk. OpenStreetMap (OSM) is probably the most well known and widely used spatial dataset/database in the world. Also in this course, we will use OSM data frequently. Hence, let’s see how we can retrieve data from OSM using a library called pyrosm. With pyrosm you can easily download and extract data from anywhere in the world based on OSM.PBF files that are distributed e.g. by Geofabrik. The tool aims to be an efficient way to parse OSM data covering large geographical areas (such as countries and cities), but as a downside, it is a bit limited in a sense how you can define your area of interest. With pyrosm you can extract OSM data from 654 regions in the world (covering all countries plus many city regions, see docs for further info).

Note

In case you want to extract OSM data from smaller areas, e.g. using a buffer of 2 km around a specific location, we recommend using OSMnx library, which is more flexible in terms of specifying the area of interest.

Let’s see how we can download and extract OSM data for Helsinki central area using pyrosm:

from pyrosm import OSM, get_data

# Download data for Helsinki
fp = get_data("helsinki_pbf")

# Initialize the reader object for Helsinki
osm = OSM(fp)

As a first step, we downloaded the data for “Helsinki” using the get_data function, which is a helper function that automates the data downloading process and stores the data locally in a temporary folder (inside /tmp/ in this case). The next step that we did, was to initialize a reader object called OSM. The OSM takes the filepath to a given osm.pbf file as an input. Notice that at this point we didn’t yet read any data into GeoDataFrame.

OSM is a “database of the world”, hence it contains a lot of information about different things. With pyrosm you can easily extract information about:

  • street networks –> osm.get_network()

  • buildings –> osm.get_buildings()

  • Points of Interest (POI) –> osm.get_pois()

  • landuse –> osm.get_landuse()

  • natural elements –> osm.get_natural()

  • boundaries –> osm.get_boundaries()

Let’s see how we can read all the buildings from Helsinki Region:

buildings = osm.get_buildings()
/home/hentenka/.conda/envs/mamba/envs/sm-test2/lib/python3.10/site-packages/geopandas/array.py:1406: UserWarning: CRS not set for some of the concatenation inputs. Setting output's CRS as WGS 84 (the single non-null crs provided).
  warnings.warn(
buildings.head()
addr:city addr:country addr:housenumber addr:housename addr:postcode addr:street email name opening_hours operator ... start_date wikipedia id timestamp version tags osm_type geometry internet_access changeset
0 Helsinki None 29 None 00170 Unioninkatu None None None None ... None None 4253124 1542041335 4 None way POLYGON ((24.95121 60.16999, 24.95122 60.16988... NaN NaN
1 Helsinki None 2 None 00100 Kaivokatu ainfo@ateneum.fi Ateneum Tu, Fr 10:00-18:00; We-Th 10:00-20:00; Sa-Su 1... None ... 1887 fi:Ateneumin taidemuseo 8033120 1544822447 27 {"architect":"Theodor H\u00F6ijer","contact:we... way POLYGON ((24.94477 60.16982, 24.94450 60.16981... NaN NaN
2 Helsinki FI 22-24 None None Mannerheimintie None Lasipalatsi None None ... 1936 fi:Lasipalatsi 8035238 1533831167 23 {"name:fi":"Lasipalatsi","name:sv":"Glaspalats... way POLYGON ((24.93561 60.17045, 24.93555 60.17054... NaN NaN
3 Helsinki None 2 None 00100 Mannerheiminaukio None Kiasma Tu 10:00-17:00; We-Fr 10:00-20:30; Sa 10:00-18... None ... 1998 fi:Kiasma (rakennus) 8042215 1553963033 30 {"name:en":"Museum of Modern Art Kiasma","name... way POLYGON ((24.93682 60.17152, 24.93662 60.17150... NaN NaN
4 None FI None None None None None None None None ... None None 15243643 1546289715 7 None way POLYGON ((24.93675 60.16779, 24.93660 60.16789... NaN NaN

5 rows × 34 columns

Let’s check how many buildings did we get:

len(buildings)
490

Okay, so in this sample there are almost 500 buildings in the Helsinki city center area. Naturally, we would like to see them on a map. Let’s plot our data using plot() (might take some time as there are many objects to plot):

buildings.plot()
<Axes: >
../../_images/cc406d20e56af2bc3f150557bb7615d60414d7da8dc6515b7d2c64d15a45dfd9.png

Great! As a result we got a map that seems to look correct.

Reprojecting data#

As we can see from the previous maps that we have produced, the coordinates on the x and y axis hint that our geometries are represented in decimal degrees (WGS84). In many cases, you want to reproject your data to another CRS. Luckily, doing that is easy with geopandas. Let’s first take a look what the Coordinate Reference System (CRS) of our GeoDataFrame is. We can access the CRS information of the GeoDataFrame by accessing an attribute called crs:

buildings.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

As a result, we get information about the CRS, and we can see that the data is indeed in WGS84. We can also see that the EPSG code for the CRS is 4326. We can easily reproject our data by using a method to_crs(). The easiest way to use the method is to specify the destination CRS as an EPSG code. Let’s reproject our data into EPSG 3067 which is the most widely used projected coordinate reference system used in Finland, EUREF-FIN:

projected = buildings.to_crs(epsg=3067)
projected.crs
<Projected CRS: EPSG:3067>
Name: ETRS89 / TM35FIN(E,N)
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Finland - onshore and offshore.
- bounds: (19.08, 58.84, 31.59, 70.09)
Coordinate Operation:
- name: TM35FIN
- method: Transverse Mercator
Datum: European Terrestrial Reference System 1989 ensemble
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

As we can see, now we have an Projected CRS as a result. To confirm the difference, let’s take a look at the geometry of the first row in our original buildings GeoDataFrame and the projected GeoDataFrame. To select a specific row in data, we can use the loc indexing:

orig_geom = buildings.loc[0, "geometry"]
projected_geom = projected.loc[0, "geometry"]

print("Orig:\n", orig_geom, "\n")
print("Proj:\n", projected_geom)
Orig:
 POLYGON ((24.9512088 60.1699932, 24.951221 60.1698793, 24.9512218 60.1698721, 24.9510476 60.1698675, 24.9510329 60.1700042, 24.9512071 60.1700088, 24.9512077 60.1700029, 24.9512088 60.1699932)) 

Proj:
 POLYGON ((386322.1865094752 6672106.636750729, 386322.4697492139 6672093.934731436, 386322.489251401 6672093.131744741, 386312.8098611589 6672092.919509148, 386312.46675447107 6672108.164263601, 386322.1461046373 6672108.376497884, 386322.15900343354 6672107.718590526, 386322.1865094752 6672106.636750729))

As we can see the coordinates that form our Polygon has changed from decimal degrees to meters. Let’s see what happens if we just call the geometries:

orig_geom
../../_images/4c06f437111d24c14623cf6174ab2a1cc06a1e0b0a94e649a78b4b8f7c7503ec.svg
projected_geom
../../_images/b43fb34a367ad510dc4e25b273b1226c167e7af67b9f039b266fc83310ac6efa.svg

As you can see, we can draw the geometry directly in the screen, and we can easily see the difference in the shape of the two geometries. The orig_geom and projected_geom variables contain a Shapely geometry which is Polygon in this case. We can confirm this by checking the type:

type(orig_geom)
shapely.geometry.polygon.Polygon

These shapely geometries are used as the underlying data structure in most GIS packages in Python to present geometrical information. Shapely is fundamentally a Python wrapper for GEOS which is widely used library (written in C++) under the hood of many GIS softwares such as QGIS, GDAL, GRASS, PostGIS, Google Earth etc. Currently, there is ongoing work to vectorize all the GEOS functionalities for Python and bring those eventually into Shapely which will greatly boost the performance of all geometry related operations in Python ecosystem (approaching the same efficiency as PostGIS). Some of these improvements can already be found under the hood of latest version of geopandas.

Calculating area#

One thing that is quite often interesting to know when working with spatial data, is the area of the geometries. In geopandas, we can easily calculate e.g. the area for each of our buildings by:

projected["building_area"] = projected.area
projected["building_area"].describe()
count     490.000000
mean     1063.572690
std      1164.875233
min         0.000000
25%       206.837818
50%       791.543677
75%      1425.282218
max      8244.176220
Name: building_area, dtype: float64

We calculated the area by calling area which is the attribute containing information about areas of the buildings measured based on the map units of the data. Hence, in this case because our data is projected in Euref-FIN the units that we stored in "building_area" column are square meters. It’s important to always keep in mind the CRS when calculating areas, distances etc. with geometries.

Spatial join#

A commonly needed GIS functionality, is to be able to merge information between two layers using location as the key. Hence, it is somewhat similar approach as table join but because the operation is based on geometries, it is called spatial join. Next, we will see how we can conduct a spatial join and merge information between two layers. We will read all restaurants from the OSM for Helsinki Region, and combine information from restaurants to the underlying building (restaurants typically are within buildings). We will again use pyrosm for reading the data, but this time we will use get_pois() function:

# Read Points of Interest using the same OSM reader object that was initialized earlier
restaurants = osm.get_pois(custom_filter={"amenity": ["restaurant"]})
restaurants.plot()
/home/hentenka/.conda/envs/mamba/envs/sm-test2/lib/python3.10/site-packages/pyrosm/pyrosm.py:576: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
  gdf = get_poi_data(
<Axes: >
../../_images/b9247956d81b76dd516703d1d28487a75ea3021f88be96f88497109d9aeda6de.png
restaurants.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   changeset         214 non-null    int8    
 1   version           214 non-null    int8    
 2   lon               214 non-null    float32 
 3   lat               214 non-null    float32 
 4   tags              181 non-null    object  
 5   timestamp         214 non-null    int64   
 6   id                214 non-null    int64   
 7   addr:city         172 non-null    object  
 8   addr:country      121 non-null    object  
 9   addr:housenumber  145 non-null    object  
 10  addr:housename    24 non-null     object  
 11  addr:postcode     144 non-null    object  
 12  addr:street       161 non-null    object  
 13  email             27 non-null     object  
 14  name              213 non-null    object  
 15  opening_hours     102 non-null    object  
 16  operator          10 non-null     object  
 17  phone             54 non-null     object  
 18  url               7 non-null      object  
 19  website           154 non-null    object  
 20  amenity           214 non-null    object  
 21  internet_access   1 non-null      object  
 22  pub               2 non-null      object  
 23  source            1 non-null      object  
 24  geometry          214 non-null    geometry
 25  osm_type          214 non-null    object  
dtypes: float32(2), geometry(1), int64(2), int8(2), object(19)
memory usage: 39.0+ KB

As we can see, the OSM for Helsinki Region contains 1388 restaurants altogether. As you can probably guess, the OSM data is far from “perfect” in terms of the quality of the restaurant listings. This is due to the voluntary nature of adding information to the OpenStreetMap, and the fact restaurants (as well as other POI features) are highly dynamic by nature, i.e. new amenities open and close all the time, and it is challenging to keep up to date with those changes (this is a challenge even for commercial companies).

Joining data from buildings to the restaurants can be done easily using sjoin() function from geopandas:

# Join information from buildings to restaurants
join = gpd.sjoin(restaurants, buildings)

# Print column names
print(join.columns)

# Show rows
join
Index(['changeset_left', 'version_left', 'lon', 'lat', 'tags_left',
       'timestamp_left', 'id_left', 'addr:city_left', 'addr:country_left',
       'addr:housenumber_left', 'addr:housename_left', 'addr:postcode_left',
       'addr:street_left', 'email_left', 'name_left', 'opening_hours_left',
       'operator_left', 'phone_left', 'url_left', 'website_left',
       'amenity_left', 'internet_access_left', 'pub', 'source_left',
       'geometry', 'osm_type_left', 'index_right', 'addr:city_right',
       'addr:country_right', 'addr:housenumber_right', 'addr:housename_right',
       'addr:postcode_right', 'addr:street_right', 'email_right', 'name_right',
       'opening_hours_right', 'operator_right', 'phone_right', 'ref',
       'url_right', 'website_right', 'building', 'amenity_right',
       'building:levels', 'building:material', 'building:min_level', 'height',
       'landuse', 'office', 'shop', 'source_right', 'start_date', 'wikipedia',
       'id_right', 'timestamp_right', 'version_right', 'tags_right',
       'osm_type_right', 'internet_access_right', 'changeset_right'],
      dtype='object')
changeset_left version_left lon lat tags_left timestamp_left id_left addr:city_left addr:country_left addr:housenumber_left ... source_right start_date wikipedia id_right timestamp_right version_right tags_right osm_type_right internet_access_right changeset_right
0 0 11 24.952852 60.178001 {"smoking":"outside","seasonal:summer":"yes","... 1429855475 56418307 Helsinki None 3 ... None None None 17028575 1287484999 5 None way NaN NaN
1 0 10 24.944996 60.172112 {"diet:vegan":"yes","diet:vegetarian":"yes","w... 1553963252 59622323 Helsinki None 18 ... NaN None None 36737 1493288031 4 {"type":"multipolygon"} relation None 0.0
72 0 3 24.944563 60.172104 {"cuisine":"nepalese","wheelchair":"limited"} 1548278645 1369465630 Helsinki None 8 ... NaN None None 36737 1493288031 4 {"type":"multipolygon"} relation None 0.0
2 0 5 24.941545 60.176704 {"lunch":"yes","smoking":"outside"} 1429827254 59631978 Helsinki None 5 ... None before 1943 None 25891166 1522084241 7 None way NaN NaN
3 0 4 24.937647 60.171337 {"created_by":"Potlatch 0.4a","wheelchair":"yes"} 1493503123 62967659 Helsinki None 1 B ... None 1938-10 fi:Postitalo (Helsinki) 122595218 1544822461 7 {"name:sv":"Posthuset","wikidata":"Q3399855"} way NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
195 0 1 24.942354 60.168209 {"cuisine":"sushi","level":"0"} 1541783848 6049453046 Helsinki None 52 ... None 1930 fi:Stockmannin Helsingin keskustan tavaratalo 122595241 1540393287 21 {"branch":"Kluuvi","loc_name":"Stokka","name:f... way NaN NaN
196 0 3 24.942410 60.167786 {"cuisine":"salad","diet:vegan":"yes","diet:ve... 1552250378 6049453047 Helsinki None 41 ... None 1930 fi:Stockmannin Helsingin keskustan tavaratalo 122595241 1540393287 21 {"branch":"Kluuvi","loc_name":"Stokka","name:f... way NaN NaN
197 0 3 24.945948 60.166965 {"cuisine":"japanese;okinawan","diet:vegan":"y... 1551713507 6054365876 Helsinki None 47 ... None None None 22463165 1312140630 5 None way NaN NaN
200 0 2 24.937914 60.173843 {"level":"0"} 1547974382 6138893749 None None None ... None None fi:Helsingin keskustakirjasto Oodi 596937289 1552126135 8 {"name:en":"Helsinki Central Library Oodi","na... way NaN NaN
202 0 2 24.938280 60.168777 {"cuisine":"salad","diet:vegan":"yes","diet:ve... 1550688597 6139262264 Helsinki FI 14-20 ... None None None 289767507 1547388397 2 None way NaN NaN

210 rows × 60 columns

# Visualize the data as well
join.plot()
<Axes: >
../../_images/aa8c10f0b85640ae57b34412b7cd0454fd145682081fd7519ac39330d7e22e8d.png

As we can see from the above, now we have merged information from the buildings to restaurants. The geometries of the left GeoDataFrame, i.e. restaurants were kept by default as the geometries.

Selecting data using sjoin#

One handy trick and efficient trick for spatial join is to use it for selecting data. We can e.g. select all buildings that intersect with restaurants by conducting the spatial join other way around, i.e. using the buildings as the left GeoDataFrame and the restaurants as the right GeoDataFrame:

# Merge information from restaurants to buildings (conducts selection at the same time)
join2 = gpd.sjoin(buildings, restaurants, how="inner", op="intersects")
join2.plot()
/home/hentenka/.conda/envs/mamba/envs/sm-test2/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3448: FutureWarning: The `op` parameter is deprecated and will be removed in a future release. Please use the `predicate` parameter instead.
  if await self.run_code(code, result, async_=asy):
<Axes: >
../../_images/f3d15229c7cf5939a94ca45ce70f3cbaa93eda226c9b8022e3abcb6ce826a91f.png

As we can see (although the small building geometries are a bit poorly visible), the end result is a layer of buildings which intersected with the restaurants. This is a straightforward way to conduct simple spatial queries. You can specify with op parameter whether the binary predicate between the layers (i.e. the spatial relation between geometries) should be:

  • intersects

  • contains

  • within

Plotting data with matplotlib#

Thus far, we haven’t really made any effort to make our maps visually appealing. Let’s next see how we can adjust the appearance of our map, and how we can visualize many layers on top of each other. Let’s start by visualizing the buildings that we selected earlier and adjust a bit of the colors and figuresize. We can adjust the color of polygons with facecolor parameter and the figure size with figsize parameter that accepts a tuple of width and height as an argument:

ax = join2.plot(facecolor="red", figsize=(12,12))
../../_images/c5b6003fd5fd7217f0ab26afdc0d951b0878632ff4afc5c12a54f777fb8f726a.png
join2.columns
Index(['addr:city_left', 'addr:country_left', 'addr:housenumber_left',
       'addr:housename_left', 'addr:postcode_left', 'addr:street_left',
       'email_left', 'name_left', 'opening_hours_left', 'operator_left',
       'phone_left', 'ref', 'url_left', 'website_left', 'building',
       'amenity_left', 'building:levels', 'building:material',
       'building:min_level', 'height', 'landuse', 'office', 'shop',
       'source_left', 'start_date', 'wikipedia', 'id_left', 'timestamp_left',
       'version_left', 'tags_left', 'osm_type_left', 'geometry',
       'internet_access_left', 'changeset_left', 'index_right',
       'changeset_right', 'version_right', 'lon', 'lat', 'tags_right',
       'timestamp_right', 'id_right', 'addr:city_right', 'addr:country_right',
       'addr:housenumber_right', 'addr:housename_right', 'addr:postcode_right',
       'addr:street_right', 'email_right', 'name_right', 'opening_hours_right',
       'operator_right', 'phone_right', 'url_right', 'website_right',
       'amenity_right', 'internet_access_right', 'pub', 'source_right',
       'osm_type_right'],
      dtype='object')

Now with the bigger figure size, it is already a bit easier to see the selected buildings that have a restaurant inside them (according OSM). Let’s color our buildings based on the building type. Hence, each building type category will receive a different color:

ax = join2.plot(column="building", cmap="RdYlBu", figsize=(12,12), legend=True)
../../_images/b247e1cc729155bfdbc9acd285d73844616301ca7f79ac6a0091866bae08723b.png

Now we used the parameter column to specify the attribute that is used to specify the color for each building (can be categorical or continuous). We used cmap to specify the colormap for the categories and we added the legend by specifying legend=True.

To get a bit more context to our visualizaton. Let’s also add roads with our buildings. To do that we first need to extract the roads from OSM:

# Get roads (retrieves walkable roads by default)
roads = osm.get_network()

Now we can continue and add the roads as a layer to our visualization with gray line color:

# Plot the map again
ax = join2.plot(column="building", cmap="RdYlBu", figsize=(12,12), legend=True)

# Plot the roads into the same axis
ax = roads.plot(ax=ax, edgecolor="gray", linewidth=0.75)
../../_images/bcefcd9c3b46671537dbeeefa8fa6db9548b608ed6e20bb21db5ac6c472734ea.png

Perfect! Now it is much easier to understand our map because the roads brought much more context (assuming you know Helsinki). We ware able to add the roads to the same map by specifying the ax parameter to point to the axis that we received when first plotting the join2 (i.e. selected buildings). In a similar manner, you can add as many layers in your map as you wish. Let’s still do a small visual trick and specify that the background color in our map is black instead of white. This can be done easily by changing the style of matplotlib visualization renderer:

# Import matplotlib pyplot and use a dark_background theme
import matplotlib.pyplot as plt
plt.style.use("dark_background")

# Plot the map again
ax = join2.plot(column="building", cmap="RdYlBu", figsize=(12,12), legend=True)

# Plot the roads into the same axis
ax = roads.plot(ax=ax, edgecolor="gray", linewidth=0.75)
../../_images/56a669f031e4839ac930fd8f7329c36d076fc8a2cadc7696ad6a565fe458400a.png

Awesome! Now we have a nice dark theme with our map. With this information you should be able to get going with Exercise 1.

Further information#

For further information, we recommend checking the materials from Automating GIS Processes -course (GIS things) and Geo-Python -course (intro to Python and data analysis with pandas). In addition, we always recommend to check the latest documentation from the websites of the libraries: