Consisting of open data from OpenStreetMap, Meta, Esri, Microsoft, Google, and more, Overture Maps data is conflated and converted to a consistent schema before being distributed as geoparquet files in the cloud. This workshop will explore the advantages of GeoParquet and cloud-native geospatial technologies for researchers working with the data both locally and in the cloud.
| Name | Description |
|---|---|
| Overture Explore Page | Easiest place to get an overview of Overture data in an X-Ray map view |
| Overture Documentation | Schema definition and examples of how to access and work with Overture data |
| Fused.io | A new cloud-based analytics platform with User Defined Functions |
| DuckDB | An fast in-process database system for analytics and data manipulation |
The Overture Maps Foundation is an open data project within the Linux Foundation that aims to “Power current and next-generation map products by creating reliable, easy-to-use, and interoperable open map data.”
Primarily, “Overture is for developers who build map services or use geospatial data.” Additionally, Overture is a fantastic resource for researchers looking to work with one of the most complete and computationally efficient open geospatial datasets.
The explore page lets you inspect the properties of each feature and links out to the overture schema: docs.overturemaps.org/schema where you can learn more about the attributes available for each theme.
The explore page helps us get an overview of what’s in Overture by rendering pre-processed PMTiles archives on a web map. Next, we’ll look at the different ways we can interact with Overture data in the raw, Geoparquet format.
Fused is a new analytical platform with powerful capabilities to read and visualize geoparquet right in your browser. The Fused workbench allows you to run any number of public User Defined Functions, or UDFs.
Zoom all the way out to see the spatial partitioning:
Now that we’ve seen what’s in Overture data, can we combine (or fuse) our Overture data with another dataset?
join with NSI parameter in this UDF. Toggle this parameter and have a look around the map at a few different places. For example, here are buildings in Fargo, North Dakota:After getting buildings from Overture, this UDF queries the National Structures Inventory for information about the various buildings. The NSI returns point geometries, which are joined to our Overture buildings with a spatial join in GeoPandas:
join = gdf_overture.sjoin(gdf, how='left')
Next, if Overture does not have height information for a given building, we calculate a height based on the number of stories from the NSI.
join["metric"] = join.apply(lambda row: row.height if pd.notnull(row.height) else row.num_story*3, axis=1)
Next, we’ll turn to our local machines and look at ways to interact with Overture data from our local environment.
Since the data is hosted in the cloud as GeoParquet files, we can access it via DuckDB, which can take advantage of this cloud-native format.
First, Install DuckDB version >= 1.1.1
Next, you’ll need a A GIS environment of your choice. Both QGIS and Esri ArcMap or similar should work. To load geoparquet directly into QGIS, you will need a version of QGIS with the latest GDAL. Alternatively, most of this workshop can be visualized with kepler.gl
If you do not want to install DuckDB locally, you can sign up for MotherDuck, a cloud-based DuckDB, however, you will not be able to use the COPY TO commands nor the h3 extension.
Tip: When launching DuckDB, specify a persistent DB, such as duckdb my_db.duckdb. This way if you create tables, you can access them later.
Obtain a bounding box of interest (https://boundingbox.klokantech.com) is a great tool for creating a bounding box. Specifically, it lets you copy the coordinates in the following format (DublinCore) which is very human-readable. Here is a bounding box for Belém:
westlimit=-48.510183;
southlimit=-1.481205;
eastlimit=-48.387434;
northlimit=-1.385577
(I recommend a smaller bounding box, like just a small city or neighborhood for now so you’re not working with a lot of data in the example).
A basic places query looks like this:
SELECT
id,
names.primary as name,
confidence,
geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-10-23.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
WHERE
bbox.xmin BETWEEN X_WEST AND X_EAST
AND bbox.ymin BETWEEN Y_SOUTH AND Y_NORTH
LIMIT 10;
Update the query with the proper values in the WHERE clause for X and Y from your bounding box. Remember, east/west = longitude = X and north/south = latitude = Y.
Paste your query into DuckDB and run it.
You should see something similar to this:
┌──────────────────────────────────┬───────────────────────────────────────────┬─────────────────────┬────────────────────────────────┐
│ id │ name │ confidence │ geometry │
│ varchar │ varchar │ double │ geometry │
├──────────────────────────────────┼───────────────────────────────────────────┼─────────────────────┼────────────────────────────────┤
│ 08f2b81b7170b90803a1b4376438169b │ Hôtel de ville de Senneville │ 0.9544565521095278 │ POINT (-73.9600016 45.4136658) │
│ 08f2b81b7171c0db037e3818c87b6c02 │ charles rivers │ 0.30856423173803527 │ POINT (-73.96118 45.41467) │
│ 08f2b81b714f2a2603f85a7ea76652ac │ Club De Voile Senneville │ 0.5592783505154639 │ POINT (-73.9685221 45.4187574) │
│ 08f2b81b714ad4e5037c634d2494f50e │ Vignoble Souffle de Vie │ 0.77 │ POINT (-73.9676663 45.4201718) │
│ 08f2b81b71583049039f215b4d3b9a7b │ Souffle de Vie Vineyard │ 0.9567577686259828 │ POINT (-73.9678019 45.4228501) │
│ 08f2b81b7158c219034200ed79cfbc13 │ Tenaquip Limited │ 0.9826508620689655 │ POINT (-73.9657541 45.4229399) │
│ 08f2b81b7152e4140398b266e3eab10b │ Cimetière et Complexe Funéraire Belvédère │ 0.9797443181818182 │ POINT (-73.9618394 45.4233901) │
│ 08f2b81b703aa099031e13858e363b78 │ Les Écuries de Senneville | Senneville QC │ 0.9537408699085346 │ POINT (-73.9693036 45.4291791) │
│ 08f2b81b700551a2039ae5eda64366c7 │ Ferme GUSH Farm │ 0.9213349225268176 │ POINT (-73.96907 45.43491) │
│ 08f2b81b70b90148038bf5f719a95574 │ Braeside Golf Club │ 0.9588815789473685 │ POINT (-73.9626141 45.4377414) │
├──────────────────────────────────┴───────────────────────────────────────────┴─────────────────────┴────────────────────────────────┤
│ 10 rows 4 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Notice the type of the geometry column is geometry. DuckDB recognizes the geo metadata in the source parquet files and automatically converts the column to a geometry type.
Consult the places schema to learn more about which columns can be accessed and their data types.
spatial extension to convert to common spatial data formatsspatial extension is installed: install spatial;.Load the spatial extension with load spatial;.
Now we can use our same query again, but this time we add the COPY TO command to write GeoJSON. We can also remove the LIMIT argument. The complete query looks like this:
INSTALL spatial;
LOAD spatial;
COPY(
SELECT
id,
names.primary as name,
confidence,
geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-10-23.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
WHERE
bbox.xmin BETWEEN -73.974157 AND -73.474295
AND bbox.ymin BETWEEN 45.410076 AND 45.70479
) TO 'montreal.geojson' WITH (FORMAT GDAL, DRIVER GeoJSON);
Now open that GeoJSON file in your preferred GIS environment to inspect the attributes (I recommend dragging-and-dropping the result directly into kepler.gl for fast visualization.)
categories.primary as category, to the query to get the category for each place.Overture contains more than 2B building footprints. See the buildings data guide for more information on how Overture constructs the buildings theme. Attempting to download them all to our local machine will be difficult. However, we can extract only a small subset of the buildings with a query:
Overture data is available both on Amazon S3 and Microsoft Azure Blob Storage. In this example, we’ll use the data from Azure:
INSTALL azure;
LOAD azure;
SET azure_storage_connection_string = 'DefaultEndpointsProtocol=https;AccountName=overturemapswestus2;AccountKey=;EndpointSuffix=core.windows.net';
LOAD spatial;
COPY(
SELECT
id,
names.primary as primary_name,
height,
sources[1].dataset AS primary_source,
sources[1].record_id AS source_id,
geometry
FROM read_parquet('azure://release/2024-10-23.0/theme=buildings/type=building/*', filename=true, hive_partitioning=1)
WHERE bbox.xmin BETWEEN -122.352055 AND -122.316697
AND bbox.ymin BETWEEN 47.593064 AND 47.619655
) TO 'seattle_buildings.geojson' WITH (FORMAT GDAL, DRIVER GeoJSON);
Update that bounding box for anywhere else in the world, and you instantly have a global building database at your finger tips.
How about some spatial statistics? If we don’t want to first download all 566,806 buildings in our Montréal bounding box, we can bring our statistics directly into our query. First, we’ll install h3 extension for spatial aggregation by h3 hexagon.
INSTALL h3 FROM community;
LOAD h3;
COPY(
SELECT
h3_latlng_to_cell_string(ST_Y(ST_CENTROID(geometry)), ST_X(ST_CENTROID(geometry)), 10) as h3,
count(1) as _count
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-10-23.0/theme=buildings/type=building/*', filename=true, hive_partitioning=1)
WHERE bbox.xmin BETWEEN -73.974157 AND -73.474295
AND bbox.ymin BETWEEN 45.410076 AND 45.70479
GROUP BY h3_latlng_to_cell_string(ST_Y(ST_CENTROID(geometry)), ST_X(ST_CENTROID(geometry)), 10)
) TO 'montreal_buildings_h3.csv';
Load that image into Kepler.gl, and we have building densities by h3 resolution 10 cell across Montreal.
The transportation theme has 2 types of data, connectors and segments.
Let’s start by looking at the segments in Paris:
COPY(
SELECT
id,
names.primary as name,
class,
geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-10-23.0/theme=transportation/type=segment/*', filename=true, hive_partitioning=1)
WHERE bbox.xmin > 2.276
AND bbox.ymin > 48.865
AND bbox.xmax < 2.314
AND bbox.ymax < 48.882
) TO 'paris_roads.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');
Looks very similar to what we were seeing on the Explore map, but this time we’re working with the raw data:
Connectors are a decent proxy of road network complexity and density. First we’ll download a bunch of connectors to a local parquet file:
LOAD h3;
COPY(
SELECT
h3_latlng_to_cell_string(ST_Y(geometry), ST_X(geometry), 8) as h3,
id,
geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-10-23.0/theme=transportation/type=connector/*', filename=true, hive_partitioning=1)
WHERE bbox.xmin > 8.82
AND bbox.ymin > 48.5
AND bbox.xmax < 13.36
AND bbox.ymax < 50.39
) TO connectors.parquet;
Now aggregate by h3 cell:
COPY(
SELECT
h3,
count(id)
FROM connectors.parquet
GROUP BY
h3
) TO 'connectors_h3.csv';
Bring that CSV into Kepler.gl again, and we have a nice visualization of road network density in Germany:
What is the base theme? Mostly landuse, infrastructure, and water features from OpenStreetMap that have been converted into a rigic schema depending on their initial tag values. For example:
Query Overture for all of the mountain peaks in North America:
SET s3_region='us-west-2';
COPY(
SELECT
id,
names.primary as name,
elevation,
geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-10-23.0/theme=base/type=land/*', filename=true, hive_partitioning=1)
WHERE subtype = 'physical' AND class IN ('peak','volcano') AND elevation IS NOT NULL
AND bbox.xmin BETWEEN -175 AND -48
AND bbox.ymin BETWEEN 10 AND 85
) TO 'north_american_peaks.parquet';
This query downloads about 1.1M peaks and the distribution looks like this:
We can build an h3-gridded DEM for North American high points from this file:
COPY(
SELECT
h3_latlng_to_cell_string(ST_Y(ST_CENTROID(geometry)), ST_X(ST_CENTROID(geometry)), 6) as h3,
max(elevation) as _max,
min(elevation) as _min,
avg(elevation) as _avg
FROM north_american_peaks.parquet
GROUP BY h3_latlng_to_cell_string(ST_Y(ST_CENTROID(geometry)), ST_X(ST_CENTROID(geometry)), 6)
) TO 'na_peaks_h3.csv';
What other types of features from OSM are you interested in exploring? The logic for how features convert from OSM to Overture is here: https://docs.overturemaps.org/schema/concepts/by-theme/base/
Now that we’ve worked with the data locally, let’s go back to the cloud. Since our data lives there, let’s bring our analysis to the data, not the other way around.