Pathfinder Phase Workshop: Finding Data

Description & purpose: This Notebook is designed to showcase the functionality of the Earth Observation Data Hub (EODH) as the project approaches the end of the Pathfinder Phase. It provides a snapshot of the Hub, the pyeodh API client and the various datasets as of February 2025.

Author(s): Alastair Graham, Dusan Figala

Date created: 2025-02-18

Date last modified: 2025-02-20

Licence: This notebook is licensed under Creative Commons Attribution-ShareAlike 4.0 International. The code is released using the BSD-2-Clause license.

Copyright (c) , All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Visual data discovery

STAC Browser

The first thing to do is find some data. Initially we will use the current Catalogue User Interface, which is an implementation of the STAC Browser (note: a replacement user interface is in development, as demonstrated in the Workshop). When visiting the Catalogue you should see a page similar to the following screenshot.

Click on the Search button in the top left of the page (next to Browse) and start to enter some details under search for items. We will look for data between 12 Nov and 19 Nov 2023, in the Thetford area, in the sentinel2_ard collection. The details are shown in the next screenshot.

Once you click Submit, the search should return a series of image items to the right of the page, as shown below.

We are looking for the following item (S2A_20231117_latn527lone0008_T30UYD_ORB137_20231117131218_utm30n_osgb) and the reference will be as shown here:

Click on the relevant item to find the assets within it. There are a number of assets (data layers, metadata, thumbnail etc.) within the item. Take some time to investigate what exists. The two we are interested in here are thumbnail and cog (the cog holds the image data). The image below shows how to copy the URL to the COG data: either using the button on the left or copying the path in the text box on the right.

Check that you have found the dataset we are interested in:

  • Thumbnail: https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0008_T30UYD_ORB137_20231117131218_utm30n_osgb_vmsk_sharp_rad_srefdem_stdsref_thumbnail.jpg (you can open this in a web browser and it should look like the image below)
  • Dataset: https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0008_T30UYD_ORB137_20231117131218_utm30n_osgb_vmsk_sharp_rad_srefdem_stdsref.tif

Take some time to click around the listed datasets to see what is included and accessible.

Note that not all collections contain accessible items.

Coded data discovery

The EODH exposes a number of API endpoints. Oxidian have developed a Python API client, pyeodh, that makes the Hub's API endpoints available to Python users. pyeodh is available on PyPI (https://pypi.org/project/pyeodh/) and can be installed using pip. Documentation for the API client is available at: https://pyeodh.readthedocs.io/en/latest/api.html

We will use pyeodh throughout this workshop.

Presentation set up

The following cell only needs to be run on the EODH AppHub. If you have a local Python environment running, please install the required packages as you would normally e.g. using mamba, poetry etc.

# If needed you can install a package in the current AppHub Jupyter environment using pip
# For instance, we will need at least the following libraries

import sys

!{sys.executable} -m pip install --upgrade pyeodh geopandas shapely matplotlib numpy pillow folium
# Imports
import pyeodh

import shapely as sh
import geopandas as gpd
import folium

import urllib.request
from requests.exceptions import HTTPError

from PIL import Image
from io import BytesIO

With the necessary libraries imported, the next task is to define the area of interest (AOI) as a point. Once the AOI point has been created, we connect to the Resource Catalogue so that we can start to find some data.

# Areas of Interest
thet_pnt = sh.Point(0.6715892933273722, 52.414471075812315)  # a site near Thetford
# Optional cell
# If you want to see these points on a map run this cell
# You may need to run the notebook through a service such as nbviewer: https://nbviewer.org/

# Create a map (m) centered on the point
center_lat = thet_pnt.y
center_lon = thet_pnt.x

m = folium.Map(location=[center_lat, center_lon], zoom_start=10)

# Add markers for the point
folium.Marker(
    [thet_pnt.y, thet_pnt.x], popup="Thetford Site", icon=folium.Icon(color="green")
).add_to(m)

# Display the map
m
# Connect to the Hub
# base_url can be changed to optionally specify a different server, such as test.eodatahub

client = pyeodh.Client(base_url="https://eodatahub.org.uk").get_catalog_service()
# Print a list of the collections held in the Resource Catalogue (their id and description).
# As the Resource Catalogue fills and development continues, the number of collections and the richness of their descriptions will increase

for index, collect in enumerate(client.get_collections(), start=1):
    print(f"{index} -- {collect.id}: {collect.description}")
1 -- ukcp: Regional climate model projections produced as part of the UK Climate Projection 2018 (UKCP18) project. The data produced by the Met Office Hadley Centre provides information on changes in climate for the UK until 2080, downscaled to a high resolution (12km), helping to inform adaptation to a changing climate. The projections cover Europe and a 100 year period, 1981-2080, for a high emissions scenario, RCP8.5. Each projection provides an example of climate variability in a changing climate, which is consistent across climate variables at different times and spatial locations. This dataset contains 12km data for the United Kingdom, the Isle of Man and the Channel Islands provided on the Ordnance Survey's British National Grid.
2 -- sentinel2_ard: These data have been created by the Department for Environment, Food and Rural Affairs (Defra) and Joint Nature Conservation Committee (JNCC) in order to cost-effectively provide high quality, Analysis Ready Data (ARD) for a wide range of applications. The dataset contains modified Copernicus Sentinel-2 (Level 1C data processed into a surface reflectance product using ARCSI software (Level 2)).
3 -- sentinel1: This dataset contains level 1 Interferometric Wide swath (IW) Single Look Complex (SLC) C-band Synthetic Aperture Radar (SAR) data from the European Space Agency (ESA) Sentinel 1 series satellites. Sentinel 1 satellites provide continuous all-weather, day and night imaging radar data. The IW mode is the main operational mode. The IW mode supports single (HH or VV) and dual (HH+HV or VV+VH) polarisation.
4 -- land_cover: As part of the ESA Land Cover Climate Change Initiative (CCI) project a new set of Global Land Cover Maps have been produced. These maps are available at 300m spatial resolution for each year between 1992 and 2015. Each pixel value corresponds to the classification of a land cover class defined based on the UN Land Cover Classification System (LCCS). The reliability of the classifications made are documented by the four quality flags (decribed further in the Product User Guide) that accompany these maps. Data are provided in both NetCDF and GeoTiff format.
5 -- eocis-sst-cdrv3-climatology: ESA SST CCI Climatology v3.0
6 -- eocis-sst-cdrv3: This dataset provides daily estimates of global sea surface temperature (SST) based on observations from multiple satellite sensors. Resolution: 5km.  Available from 1980 onwards.
7 -- eocis-lst-s3b-night: This collection contains datasets of level L3C global land surface temperature from the SLSTR sensor on board Sentinel 3B observed daily during nighttime.   The collection is available from 2018-11-17.
8 -- eocis-lst-s3b-day: This collection contains datasets of level L3C global land surface temperature from the SLSTR sensor on board Sentinel 3B observed daily during daytime.   The collection is available from 2018-11-17.
9 -- eocis-lst-s3a-night: This collection contains datasets of level L3C global land surface temperature from the SLSTR sensor on board Sentinel 3A observed daily during nighttime.   The collection is available from 2016-05-01.
10 -- eocis-lst-s3a-day: This collection contains datasets of level L3C global land surface temperature from the SLSTR sensor on board Sentinel 3A observed daily during daytime.   The collection is available from 2016-05-01.

The dataset that we are interested in for the purposes of this workshop is sentinel2_ard. The output of the previous cell gives its description as follows:

These data have been created by the Department for Environment, Food and Rural Affairs (Defra) and Joint Nature Conservation Committee (JNCC) in order to cost-effectively provide high quality, Analysis Ready Data (ARD) for a wide range of applications. The dataset contains modified Copernicus Sentinel-2 (Level 1C data processed into a surface reflectance product using ARCSI software (Level 2)).

# The next thing to do is find some open data
# For this workshop we want to find Sentinel-2 analysis ready (ARD) imagery near Thetford

# First we just want to understand the timespan of the dataset, which is reported in the STAC collection record
sentinel2_ard = client.get_catalog(
    "public/catalogs/ceda-stac-catalogue"
).get_collection("sentinel2_ard")

print(
    "DATASET TEMPORAL EXTENT: ",
    [str(d) for d in sentinel2_ard.extent.temporal.intervals[0]],
)
DATASET TEMPORAL EXTENT:  ['2022-01-02 11:35:01+00:00', '2025-01-21 11:33:19+00:00']
# Now we want to access the first few items and see what they are called, when the image was collected and how much cloud there is

lim = 10

for i, item in enumerate(sentinel2_ard.get_items()):
    if i >= lim:
        break
    print(item.id, item.properties["datetime"], item.properties["eo:cloud_cover"])
neodc.sentinel_ard.data.sentinel_2.2023.11.21.S2B_20231121_latn536lonw0052_T30UUE_ORB123_20231121122846_utm30n_TM65 2023-11-21T11:43:49Z 67.568010963291
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn563lonw0037_T30VVH_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 17.320411981252
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn546lonw0037_T30UVF_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 37.725362031379
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn536lonw0007_T30UXE_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 20.866700948979
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn528lonw0022_T30UWD_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 44.517572934396
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn527lonw0007_T30UXD_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 5.880352134231
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn519lonw0037_T30UVC_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 61.069157036639
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn519lonw0022_T30UWC_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 19.243094747529
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn518lonw0008_T30UXC_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 21.15911075158
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn510lonw0036_T30UVB_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51Z 31.848935471349

The previous cell shows that we can access Sentinel-2 ARD data and read properties of each item, such as its acquisition time and cloud cover. If you are interested in seeing what other information is accessible, have a look at the

  • collection endpoint: https://staging.eodatahub.org.uk/api/catalogue/stac/catalogs/supported-datasets/catalogs/ceda-stac-catalogue/collections/sentinel2_ard
  • items endpoint: https://staging.eodatahub.org.uk/api/catalogue/stac/catalogs/supported-datasets/catalogs/ceda-stac-catalogue/collections/sentinel2_ard/items
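
The eo:cloud_cover property printed above can also be used to shortlist items. The following is a minimal sketch, not part of pyeodh: the helper name least_cloudy and its max_cloud and limit parameters are hypothetical, and it assumes only that each item exposes an id and a properties dictionary, as the items in the previous cell do. Stand-in objects are used here so that the cell runs without a connection to the Hub.

```python
from types import SimpleNamespace


def least_cloudy(items, max_cloud=25.0, limit=10):
    """Return up to `limit` items with cloud cover <= max_cloud, clearest first.

    Hypothetical helper: `items` is any iterable of objects that have a
    `properties` dict containing "eo:cloud_cover".
    """
    kept = []
    for item in items:
        cloud = item.properties.get("eo:cloud_cover")
        # Skip items that do not report cloud cover at all
        if cloud is not None and cloud <= max_cloud:
            kept.append(item)
    kept.sort(key=lambda it: it.properties["eo:cloud_cover"])
    return kept[:limit]


# Stand-in items built from three of the values printed above
# (IDs shortened to the tile name for readability)
demo = [
    SimpleNamespace(id="T30UUE", properties={"eo:cloud_cover": 67.568010963291}),
    SimpleNamespace(id="T30UXD", properties={"eo:cloud_cover": 5.880352134231}),
    SimpleNamespace(id="T30VVH", properties={"eo:cloud_cover": 17.320411981252}),
]

for it in least_cloudy(demo):
    print(it.id, it.properties["eo:cloud_cover"])
```

When applying this to real results, cap the number of items you iterate over first (as the cell above does with lim), because the collection holds several years of imagery.
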
# To find specific imagery for the Thetford site we need to add the intersects parameter. We set this to be our AOI point.

items = client.search(
    collections=["sentinel2_ard"],
    catalog_paths=["supported-datasets/catalogs/ceda-stac-catalogue"],
    intersects=thet_pnt,
    query=[
        "start_datetime>=2023-11-01",
        "end_datetime<=2023-11-30",
    ],
)

# We can then count the number of items returned by the search
# print('Number of items found: ', items.total_count)

total_items = sum(1 for _ in items)
print(f"Total items: {total_items}")
Total items: 3
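
The two query strings in the search above cover a single calendar month. As a small sketch, the same strings can be generated for any month using only the standard library. The helper name month_query is hypothetical; it only reproduces the string format used in the search cell and makes no further assumption about what the query parameter accepts.

```python
import calendar
from datetime import date


def month_query(year, month):
    """Build the start/end query strings for one calendar month.

    Hypothetical helper that mirrors the two strings passed to
    client.search(query=[...]) in the cell above.
    """
    first = date(year, month, 1)
    # monthrange returns (weekday of first day, number of days in month)
    last = date(year, month, calendar.monthrange(year, month)[1])
    return [
        f"start_datetime>={first.isoformat()}",
        f"end_datetime<={last.isoformat()}",
    ]


print(month_query(2023, 11))
```

The returned list could then be passed as the query argument in place of the hand-written strings.
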
# We can print out the item names so that we understand which images we are looking at

for item in items:
    print(f"Item ID: {item.id}")
Item ID: neodc.sentinel_ard.data.sentinel_2.2023.11.17.S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb
Item ID: neodc.sentinel_ard.data.sentinel_2.2023.11.17.S2A_20231117_latn527lone0008_T30UYD_ORB137_20231117131218_utm30n_osgb
Item ID: neodc.sentinel_ard.data.sentinel_2.2023.11.07.S2A_20231107_latn527lone0009_T31UCU_ORB137_20231107131225_utm31n_osgb
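
The item IDs are long, but they appear to follow a regular pattern. The sketch below splits an ID into its underscore-separated parts. The field names (platform, date, tile, orbit and so on) are our own reading of the IDs printed above and are an assumption, not a documented naming scheme, and parse_item_id is a hypothetical helper.

```python
from datetime import datetime


def parse_item_id(item_id):
    """Split a sentinel2_ard item ID into named parts.

    Assumption: the final dot-separated segment has eight
    underscore-separated fields, as in the IDs printed above.
    """
    name = item_id.rsplit(".", 1)[-1]  # drop the dotted archive-path prefix
    parts = name.split("_")
    if len(parts) != 8:
        raise ValueError(f"Unexpected item ID layout: {item_id}")
    platform, acq_date, centre, tile, orbit, stamp, projection, grid = parts
    return {
        "platform": platform,
        "date": datetime.strptime(acq_date, "%Y%m%d").date(),
        "centre": centre,
        "tile": tile,
        "orbit": orbit,
        "timestamp": stamp,
        "projection": projection,
        "grid": grid,
    }


example_id = (
    "neodc.sentinel_ard.data.sentinel_2.2023.11.17."
    "S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb"
)
info = parse_item_id(example_id)
print(info["tile"], info["date"])
```

Parsing the ID in this way makes it possible to select an item by tile name rather than by its position in the search results.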

Once we have found the intersecting images and know their names, we can choose the image item that we are interested in. For the purposes of this exercise this is the T31UCU tile from 17 November 2023, which is the first item returned. Now we need to know what assets are held for that item. The following code prints out all the STAC asset information linked to that item.

for item in items[:1]:  # Process only the first item
    print(f"Item ID: {item.id}")
    print("Assets:")

    if not item.assets:
        print("  No assets available.")
    else:
        for asset_key, asset in item.assets.items():
            print(
                f"  - {asset_key}: {asset.to_dict()}"
            )  # Convert asset to dict for readable output
            print("-" * 40)  # Separator for better readability
Item ID: neodc.sentinel_ard.data.sentinel_2.2023.11.17.S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb
Assets:
  - cloud: {'href': 'https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_clouds.tif', 'type': 'image/tiff; application=geotiff', 'size': 3565297, 'location': 'on_disk', 'roles': ['data']}
----------------------------------------
  - cloud_probability: {'href': 'https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_clouds_prob.tif', 'type': 'image/tiff; application=geotiff', 'size': 85969313, 'location': 'on_disk', 'roles': ['data']}
----------------------------------------
  - metadata: {'href': 'https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_vmsk_sharp_rad_srefdem_stdsref_meta.xml', 'type': 'application/xml', 'size': 18461, 'location': 'on_disk', 'roles': ['metadata']}
----------------------------------------
  - thumbnail: {'href': 'https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_vmsk_sharp_rad_srefdem_stdsref_thumbnail.jpg', 'type': 'image/jpeg', 'size': 107743, 'location': 'on_disk', 'roles': ['thumbnail']}
----------------------------------------
  - topographic_shadow: {'href': 'https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_toposhad.tif', 'type': 'image/tiff; application=geotiff', 'size': 256526, 'location': 'on_disk', 'roles': ['data']}
----------------------------------------
  - cog: {'href': 'https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_vmsk_sharp_rad_srefdem_stdsref.tif', 'type': 'image/tiff; application=geotiff; profile=cloud-optimized', 'size': 1906497347, 'eo:bands': [{'eo: full_width_half_max': 0.07, 'name': 'B02', 'eo:central_wavelength': 496.6, 'description': 'Blue', 'eo:common_name': 'blue'}, {'eo: full_width_half_max': 0.04, 'name': 'B03', 'eo:central_wavelength': 560, 'description': 'Green', 'eo:common_name': 'green'}, {'eo: full_width_half_max': 0.03, 'name': 'B04', 'eo:central_wavelength': 664.5, 'description': 'Red', 'eo:common_name': 'red'}, {'eo: full_width_half_max': 0.02, 'name': 'B05', 'eo:central_wavelength': 703.9, 'description': 'Visible and Near Infrared', 'eo:common_name': 'rededge'}, {'eo: full_width_half_max': 0.02, 'name': 'B06', 'eo:central_wavelength': 740.2, 'description': 'Visible and Near Infrared', 'eo:common_name': 'rededge'}, {'eo: full_width_half_max': 0.02, 'name': 'B07', 'eo:central_wavelength': 782.5, 'description': 'Visible and Near Infrared', 'eo:common_name': 'rededge'}, {'eo: full_width_half_max': 0.11, 'name': 'B08', 'eo:central_wavelength': 835.1, 'description': 'Visible and Near Infrared', 'eo:common_name': 'nir'}, {'eo: full_width_half_max': 0.02, 'name': 'B08a', 'eo:central_wavelength': 864.8, 'description': 'Visible and Near Infrared', 'eo:common_name': 'nir08'}, {'eo: full_width_half_max': 0.09, 'name': 'B11', 'eo:central_wavelength': 1613.7, 'description': 'Short Wave Infrared', 'eo:common_name': 'swir16'}, {'eo: full_width_half_max': 0.18, 'name': 'B12', 'eo:central_wavelength': 2202.4, 'description': 'Short Wave Infrared', 'eo:common_name': 'swir22'}], 'location': 'on_disk', 'roles': ['data']}
----------------------------------------
  - valid_pixels: {'href': 'https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_valid.tif', 'type': 'image/tiff; application=geotiff', 'size': 345896, 'location': 'on_disk', 'roles': ['data']}
----------------------------------------
  - saturated_pixels: {'href': 'https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_sat.tif', 'type': 'image/tiff; application=geotiff', 'size': 1957075, 'location': 'on_disk', 'roles': ['data']}
----------------------------------------

From this we can see that there is a thumbnail. Before we do anything with the full image data or associated assets it would be good to understand the image. We can extract the URL to the thumbnail and view the attached image file.

tn_url = None

for item in items[:1]:  # Process only the first item
    # print(f"Item ID: {item.id}")
    # print("Assets:")

    if not item.assets:
        print("  No assets available.")
    else:
        for asset_key, asset in item.assets.items():
            # print(f"  - {asset_key}: {asset.to_dict()}")  # Convert asset to dict for readable output
            if asset_key == "thumbnail":
                tn_url = asset.href  # Directly access the href attribute

    # print("-" * 40)  # Separator for better readability
print(tn_url)
https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb_vmsk_sharp_rad_srefdem_stdsref_thumbnail.jpg
# We can use this information to view the image thumbnail

# Here we open the remote URL, read the data and display the thumbnail
with urllib.request.urlopen(tn_url) as url:
    img = Image.open(BytesIO(url.read()))

display(img)

This shows that we can relatively easily interrogate the Resource Catalogue and filter the results so that we can find the data we require in the EODH. With a bit of tweaking of the code the user could also generate a list of assets and accompanying URLs to the datasets (for this and other datasets).
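
One possible version of that tweak is sketched below. The helper list_asset_urls is hypothetical; it assumes only what the cells above already rely on, namely that each item has an id and an assets mapping whose values expose href. A stand-in item is used so that the cell runs offline.

```python
from types import SimpleNamespace


def list_asset_urls(items):
    """Return (item_id, asset_key, href) tuples for every asset of every item.

    Hypothetical helper: works on any iterable of objects with `id` and
    an `assets` dict whose values have an `href` attribute.
    """
    rows = []
    for item in items:
        # Treat a missing or empty assets mapping as "no assets"
        for key, asset in (item.assets or {}).items():
            rows.append((item.id, key, asset.href))
    return rows


# Stand-in item reusing the thumbnail URL printed earlier
base = (
    "https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/"
    "S2A_20231117_latn527lone0009_T31UCU_ORB137_20231117131218_utm31n_osgb"
)
demo_item = SimpleNamespace(
    id="T31UCU_20231117",
    assets={
        "thumbnail": SimpleNamespace(
            href=base + "_vmsk_sharp_rad_srefdem_stdsref_thumbnail.jpg"
        ),
        "cloud": SimpleNamespace(href=base + "_clouds.tif"),
    },
)

for item_id, key, href in list_asset_urls([demo_item]):
    print(item_id, key, href)
```

With real search results, the same function could be called on the items object from the search cell, and the rows written to a CSV file or loaded into a table.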

Now we want to see what commercial data exists.

# Find some Airbus commercial data: SPOT

count = 0
lim = 15

for i in client.search(
    catalog_paths=["supported-datasets/catalogs/airbus"],
    collections=["airbus_spot_data"],
):
    if count >= lim:
        break
    print(i.id)
    count += 1
DS_SPOT6_202502191331447_FR1_FR1_SV1_SV1_W067S56_01140
DS_SPOT6_202502191331277_FR1_FR1_SV1_SV1_W068S56_02033
DS_SPOT6_202502191330576_FR1_FR1_SV1_SV1_W067S55_04550
DS_SPOT6_202502191330431_FR1_FR1_SV1_SV1_W065S55_01140
DS_SPOT6_202502191330174_FR1_FR1_SV1_SV1_W066S55_03414
DS_SPOT6_202502191330009_FR1_FR1_SV1_SV1_W064S55_01546
DS_SPOT6_202502191321372_FR1_FR1_SV1_SV1_W052S21_01709
DS_SPOT6_202502191321230_FR1_FR1_SV1_SV1_W052S19_01627
DS_SPOT6_202502191320590_FR1_FR1_SV1_SV1_W053S19_03251
DS_SPOT6_202502191320440_FR1_FR1_SV1_SV1_W053S18_01871
DS_SPOT6_202502191320167_FR1_FR1_SV1_SV1_W052S18_03332
DS_SPOT6_202502191319530_FR1_FR1_SV1_SV1_W052S18_03170
DS_SPOT6_202502191319384_FR1_FR1_SV1_SV1_W052S17_01709
DS_SPOT6_202502191319247_FR1_FR1_SV1_SV1_W052S16_01709
DS_SPOT6_202502191319100_FR1_FR1_SV1_SV1_W052S16_01627
# Find some Planet commercial data: Planetscope Scene
count = 0
try:
    cat = client.get_catalog("supported-datasets/catalogs/planet")
    for i in cat.search(collections=["PSScene"]):
        if count >= lim:
            break
        print(i.id)
        count += 1
except HTTPError as e:
    if "422 Client Error: Unprocessable Entity" in str(e):
        print("Skipping E422")
    else:
        raise  # Re-raise other errors
20250303_141931_05_251a
20250303_113803_67_251e
20250303_084737_77_2532
20250303_083846_10_2532
20250303_084642_25_2532
20250303_084115_56_2532
20250303_084230_29_2532
20250303_084707_87_2532
20250303_083958_69_2532
20250303_084525_38_2532
20250303_084707_87_2532
20250303_084115_56_2532
20250303_084737_77_2532
20250303_083939_48_2532
20250303_084022_18_2532

The final step would be to use the ordering service integrated into the EODH Resource Catalogue to purchase the required commercial imagery. This would be stored in a user's workspace and could then be used in specific workflows or for data analytics (depending on licence restrictions).

For the purposes of this workshop we looked at the different commercial datasets offline in QGIS.