Demonstration for DEFRA

Description & purpose: This Notebook is designed to showcase the initial functionality of the Earth Observation Data Hub. It provides a snapshot of the Hub, the pyeodh API client and the various datasets as of September 2024. The Notebook “user” would like to understand more about the satellite data available for their test areas. This user is also interested in obtaining a series of smaller images and ultimately creating a data cube. The Notebook series (of 3) is designed in such a way that it can be run on the EODH AppHub (Notebook Service) or from a local environment.

Author(s): Alastair Graham, Dusan Figala, Phil Kershaw

Date created: 2024-09-05

Date last modified: 2024-09-18

Licence: This notebook is licensed under Creative Commons Attribution-ShareAlike 4.0 International. The code is released using the BSD-2-Clause license.

Copyright (c) , All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Links: * Oxidian: https://www.oxidian.com/ * CEDA: https://www.ceda.ac.uk/ * EO Data Hub: https://eodatahub.org.uk/

What is the EODH?

The Earth Observation Data Hub is:

“A UK Pathfinder project delivering access to Earth Observation (EO) data for effective decisionmaking across government, business and academia. The Earth Observation DataHub (EODH) brings together an expert project delivery team and industrial partners in an ambitious project… Users of the Hub will be able to explore areas of interest in the UK and across the globe… It will also enable selected users to support their own analyses, services and tools using the Hub’s workflow and compute environments.”

More details can be found online at https://eodatahub.org.uk/

Components of the Hub include:

  • A Resource Catalogue - a STAC compliant catalogue of open and commercial satellite imagery, climate data, model results, workflows and more
  • A Workflow Runner - a dedicated piece of cloud infrastructure to horizontally scale workflow requirements
  • A Web Presence - an intuitive user interface to allow account management, data discovery and mapping
  • An App Hub - a science portal providing access to a Jupyter lab environment

Presentation set up

The following cell only needs to be run on the EODH AppHub. If you have a local Python environment running, please install the required packages as you would normally.

# If needed you can install a package in the current AppHub Jupyter environment using pip
# For instance, we will need at least the following libraries
import sys
!{sys.executable} -m pip install --upgrade pyeodh geopandas matplotlib numpy pillow folium

EODH: it’s data discovery

There are a number of API endpoints that are exposed by the EODH. Oxidian have developed a Python API Client, pyeodh, that makes the Hub’s API endpoints available to Python users. pyeodh is available on PyPi (https://pypi.org/project/pyeodh/) and can be installed using pip. Documentation for the API Client is available at: https://pyeodh.readthedocs.io/en/latest/api.html

We will use pyeodh throughout this presentation.

# Imports
import pyeodh

import os

import shapely 
import geopandas as gpd
import folium

import urllib.request
from io import BytesIO 
from PIL import Image

Having imported the necessary libraries the next task is to set up the locations of the areas of interest. Having created the AOI points the user needs to connect to the Resource Catalogue so that they can start to find some data.

# Areas of Interest
rut_pnt = shapely.Point(-0.683261054299237, 52.672193937442586) # a site near Rutland
thet_pnt = shapely.Point(0.6715892933273722, 52.414471075812315) # a site near Thetford
# Optional cell
# If you want to see these points on a map run this cell

# Create a map (m) centered between the two points
center_lat = (rut_pnt.y + thet_pnt.y) / 2
center_lon = (rut_pnt.x + thet_pnt.x) / 2

m = folium.Map(location=[center_lat, center_lon], zoom_start=8)

# Add markers for each point
folium.Marker([rut_pnt.y, rut_pnt.x], popup="Rutland Site", icon=folium.Icon(color="blue")).add_to(m)
folium.Marker([thet_pnt.y, thet_pnt.x], popup="Thetford Site", icon=folium.Icon(color="green")).add_to(m)

# Step 4: Display the map
m
# Connect to the Hub
client = pyeodh.Client().get_catalog_service()

# Print a list of the collections held in the Resource Catalogue (their id and description).
# As the Resource Catalogue fills and development continues, the number of collections and the richness of their descriptions will increase
for collect in client.get_collections():
    print(f"{collect.id}: {collect.description}")
cmip6: CMIP6
cordex: CORDEX
ukcp: UKCP
defra-airbus: A collection of Airbus data for the DEFRA use case.
defra-planet: A collection of Planet data for the DEFRA use case.
airbus_sar_data: The German TerraSAR-X / TanDEM-X satellite formation and the Spanish PAZ satellite (managed by Hisdesat Servicios Estratégicos S.A.) are being operated in the same orbit tube and feature identical ground swaths and imaging modes - allowing Airbus and Hisdesat to establish a unique commercial Radar Constellation. The satellites carry a high frequency X-band Synthetic Aperture Radar (SAR) sensor in order to acquire datasets ranging from very high-resolution imagery to wide area coverage.
airbus_data_example: Airbus data
sentinel2_ard: sentinel 2 ARD
sentinel1: Sentinel 1
naip: The [National Agriculture Imagery Program](https://www.fsa.usda.gov/programs-and-services/aerial-photography/imagery-programs/naip-imagery/) (NAIP) provides U.S.-wide, high-resolution aerial imagery, with four spectral bands (R, G, B, IR).  NAIP is administered by the [Aerial Field Photography Office](https://www.fsa.usda.gov/programs-and-services/aerial-photography/) (AFPO) within the [US Department of Agriculture](https://www.usda.gov/) (USDA).  Data are captured at least once every three years for each state.  This dataset represents NAIP data from 2010-present, in [cloud-optimized GeoTIFF](https://www.cogeo.org/) format.
# The next thing to do is find some open data
# For this presentation we want to find Sentinel-2 analysis ready (ARD) imagery near Rutland

# First we just want to understand some of the parameters linked to the data collection
# We will just print the first 5 records and the dataset temporal extent   
sentinel2_ard = client.get_catalog("supported-datasets/ceda-stac-fastapi").get_collection('sentinel2_ard')
sentinel2_ard.get_items()

lim = 5
i = 0

for item in sentinel2_ard.get_items():
    if i < lim:
        print(item.id)
        i += 1

print('DATASET TEMPORAL EXTENT: ', [str(d) for d in sentinel2_ard.extent.temporal.intervals[0]])
neodc.sentinel_ard.data.sentinel_2.2023.11.21.S2B_20231121_latn536lonw0052_T30UUE_ORB123_20231121122846_utm30n_TM65
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn563lonw0037_T30VVH_ORB037_20231120132420_utm30n_osgb
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn546lonw0037_T30UVF_ORB037_20231120132420_utm30n_osgb
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn536lonw0007_T30UXE_ORB037_20231120132420_utm30n_osgb
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn528lonw0022_T30UWD_ORB037_20231120132420_utm30n_osgb
DATASET TEMPORAL EXTENT:  ['2023-01-01 11:14:51+00:00', '2023-11-01 11:43:49+00:00']
# To find out information about all the imagery in the collection then use this cell
# It undertakes a search for specific date ranges (November 2023) and limits the pagination return to 10
item_search = client.search(
    collections=['sentinel2_ard'],
    catalog_paths=["supported-datasets/ceda-stac-fastapi"],
    query=[
        'start_datetime>=2023-11-01',
        'end_datetime<=2023-11-30', 
    ],
    limit=10,
)

# The item id and start time of image capture can be printed
# If end time is also required, add the following code to the print statement: item.properties["end_datetime"]  
for item in item_search:
    print(item.id, item.properties["start_datetime"])
neodc.sentinel_ard.data.sentinel_2.2023.11.21.S2B_20231121_latn536lonw0052_T30UUE_ORB123_20231121122846_utm30n_TM65 2023-11-21T11:43:49+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn563lonw0037_T30VVH_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn546lonw0037_T30UVF_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn536lonw0007_T30UXE_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn528lonw0022_T30UWD_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn527lonw0007_T30UXD_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn519lonw0037_T30UVC_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn519lonw0022_T30UWC_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn518lonw0008_T30UXC_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.20.S2A_20231120_latn510lonw0036_T30UVB_ORB037_20231120132420_utm30n_osgb 2023-11-20T11:23:51+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.18.S2B_20231118_latn554lonw0053_T30UUG_ORB080_20231118122250_utm30n_osgb 2023-11-18T11:33:19+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.17.S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb 2023-11-17T11:13:31+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.17.S2A_20231117_latn519lone0023_T31UDT_ORB137_20231117131218_utm31n_osgb 2023-11-17T11:13:31+00:00
neodc.sentinel_ard.data.sentinel_2.2023.11.17.S2A_20231117_latn509lone0009_T31UCS_ORB137_20231117131218_utm31n_osgb 2023-11-17T11:13:31+00:00
# To find specific imagery for the Rutland site we need to add the intersects parameter. We set this to be our AOI point.
# We can also filter the search by cloud cover, in this case limiting our search to images with less than 50% cloud in them

items = client.search(
    collections=['sentinel2_ard'],
    catalog_paths=["supported-datasets/ceda-stac-fastapi"],
    intersects=rut_pnt,
    query=[
        'start_datetime>=2023-11-01',
        'end_datetime<=2023-11-30', 
        'Cloud Coverage Assessment<=50.0'
    ],
    limit=10,
)

# We can then count the number of items returned by the search 
print('Number of items found: ', items.total_count)
Number of items found:  2
# For the purposes of this presentation we will look at the second record ([1]) in more detail
# First we need to understand what information we can access

a = items[1].to_dict()
print(a.keys())
dict_keys(['type', 'stac_version', 'id', 'properties', 'geometry', 'links', 'assets', 'bbox', 'stac_extensions', 'collection'])
# The data we want to access is stored under the 'assets' key. But what information is held in that? 
for key in (a['assets']):
    print(key)
cloud
cloud_probability
cog
metadata
saturated_pixels
thumbnail
topographic_shadow
valid_pixels
# Now we can get the link to each of the different assets
for key, value in items[1].assets.items():
    print(key, value.href)
cloud https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb_clouds.tif
cloud_probability https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb_clouds_prob.tif
cog https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb_vmsk_sharp_rad_srefdem_stdsref.tif
metadata https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb_vmsk_sharp_rad_srefdem_stdsref_meta.xml
saturated_pixels https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb_sat.tif
thumbnail https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb_vmsk_sharp_rad_srefdem_stdsref_thumbnail.jpg
topographic_shadow https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb_toposhad.tif
valid_pixels https://dap.ceda.ac.uk/neodc/sentinel_ard/data/sentinel_2/2023/11/17/S2A_20231117_latn527lonw0007_T30UXD_ORB137_20231117131218_utm30n_osgb_valid.tif
# We can use this information to view the image thumbnail

asset_dict = items[1].assets

# Get the url as a string
thumbnail_assets = [v for k, v in asset_dict.items() if 'thumbnail' in k]
thumbnail_url = thumbnail_assets[0].href

# Here we open the remote URL, read the data and dislay the thumbnail 
with urllib.request.urlopen(thumbnail_url) as url:
    img = Image.open(BytesIO(url.read()))

display(img)

This shows that we can relatively easily interrogate the Resource Catalogue and filter the results so that we can find the data we require in the EODH. With a bit of tweaking of the code the user could also generate a list of assets and accompanying URLs to the datasets (for this and other datasets).

Now our user wants to see what commercial data exists for the Thetford site.

# Find some commercial data

for collect in client.get_collections():
    if 'defra' in collect.id: 
        print(f"{collect.id}: {collect.description}")
defra-airbus: A collection of Airbus data for the DEFRA use case.
defra-planet: A collection of Planet data for the DEFRA use case.
# Let's search for information on the Planet holdings  
planet = client.get_catalog("supported-datasets/defra").get_collection('defra-planet')
planet.get_items()

lim = 5
i = 0

for item in planet.get_items():
    if i < lim:
        print(item.id)
        i += 1

print('PLANET DATASET TEMPORAL EXTENT: ', [str(d) for d in planet.extent.temporal.intervals[0]])
2024-08-23_strip_7527622_composite
2024-08-23_strip_7527462_composite
PLANET DATASET TEMPORAL EXTENT:  ['2024-08-23 11:09:19.358417+00:00', '2024-08-23 11:24:40.991786+00:00']
# To find specific imagery for the Thetford site we need to add the intersects parameter. We set this to be our AOI point.
items1 = client.search(
    collections=['defra-planet'],
    catalog_paths=["supported-datasets/defra"],
    intersects=thet_pnt,
    limit=10,
)

items2 = client.search(
    collections=['defra-airbus'],
    catalog_paths=["supported-datasets/defra"],
    intersects=thet_pnt,
    limit=10,
)

# We can then count the number of items returned by the search 
print('Number of Planet items found: ', items1.total_count)
print('Number of Airbus items found: ', items2.total_count)
Number of Planet items found:  1
Number of Airbus items found:  2
for key, value in items1[0].assets.items():
    print('Planet: ', key, value)
Planet:  data <Asset href=2024-08-23_strip_7527462_composite_file_format.tif>
Planet:  udm2 <Asset href=2024-08-23_strip_7527462_composite_udm2_file_format.tif>

The final step would be to use the ordering service integrated into the EODH resource catalogue to purchase the required commercial imagery. This would be stored in a users workspace and could then be used in specific workflows or for data analytics (depending on licence restrictions).

For the purposes of this presentation we looked at the different commercial datasets offline in QGIS.