Sentinel-1 RTC#

Product description#

The Sentinel-1 Radiometric Terrain Corrected (RTC) data product available in the SALDi Data Cube (SDC) is a mirror of Digital Earth Africa’s product of the same name.

Detailed information can be found here.

The product abbreviation used in this package is s1_rtc.

Loading data#

import numpy as np
from matplotlib import pyplot as plt

from sdc.load import load_product

ds = load_product(product="s1_rtc",
                  vec="site06", 
                  time_range=("2021-01-01", "2022-01-01"))
ds
[WARNING] Loading data for an entire SALDi site will likely result in performance issues as it will load data from multiple tiles. Only do so if you know what you are doing and have optimized your workflow! It is recommended to start with a small subset to test your workflow before scaling up.
<xarray.Dataset> Size: 98GB
Dimensions:      (latitude: 5500, longitude: 6500, time: 211)
Coordinates:
  * latitude     (latitude) float64 44kB -24.9 -24.9 -24.9 ... -26.0 -26.0 -26.0
  * longitude    (longitude) float64 52kB 30.75 30.75 30.75 ... 32.05 32.05
    spatial_ref  int32 4B 4326
  * time         (time) datetime64[ns] 2kB 2021-01-01T03:17:55.709725 ... 202...
Data variables:
    vv           (time, latitude, longitude) float32 30GB dask.array<chunksize=(211, 563, 563), meta=np.ndarray>
    vh           (time, latitude, longitude) float32 30GB dask.array<chunksize=(211, 563, 563), meta=np.ndarray>
    area         (time, latitude, longitude) float32 30GB dask.array<chunksize=(211, 563, 563), meta=np.ndarray>
    angle        (time, latitude, longitude) uint8 8GB dask.array<chunksize=(211, 1127, 1127), meta=np.ndarray>

Exactly as in the Sentinel-2 L2A notebook example, we have now lazily loaded Sentinel-1 RTC data. If you are not familiar with the concept of lazy loading, please refer to Xarray, Dask and lazy loading and the Sentinel-2 L2A notebook. The latter also covers some Xarray basics (look for the “Xarray Shorts” sections), so be sure to check it out if you haven’t already!
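
If you want to actually load values into memory, it is best to follow the advice from the warning above and start small. A minimal sketch (the subset bounds below are arbitrary examples, not part of the original workflow):

subset = ds.vh.isel(time=0, latitude=slice(0, 500), longitude=slice(0, 500))
subset = subset.compute()  # executes the Dask graph for this small piece only

compute() returns a NumPy-backed copy of the subset, so only the selected chunks are actually loaded.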

Let’s have a closer look at the vh-band (vertical transmit, horizontal receive):

ds.vh
<xarray.DataArray 'vh' (time: 211, latitude: 5500, longitude: 6500)> Size: 30GB
dask.array<vh, shape=(211, 5500, 6500), dtype=float32, chunksize=(211, 563, 563), chunktype=numpy.ndarray>
Coordinates:
  * latitude     (latitude) float64 44kB -24.9 -24.9 -24.9 ... -26.0 -26.0 -26.0
  * longitude    (longitude) float64 52kB 30.75 30.75 30.75 ... 32.05 32.05
    spatial_ref  int32 4B 4326
  * time         (time) datetime64[ns] 2kB 2021-01-01T03:17:55.709725 ... 202...
Attributes:
    nodata:   nan

Converting to decibel scale#

With Synthetic Aperture Radar (SAR) data, it is often useful to convert the data to decibel (dB) scale for visualization purposes. The Sentinel-1 RTC data here is stored in power (linear) scale by default and can be converted to decibel scale by applying numpy’s log10-method and multiplying the result by 10:

vh_db = 10 * np.log10(ds.vh)
vh_db
<xarray.DataArray 'vh' (time: 211, latitude: 5500, longitude: 6500)> Size: 30GB
dask.array<mul, shape=(211, 5500, 6500), dtype=float32, chunksize=(211, 563, 563), chunktype=numpy.ndarray>
Coordinates:
  * latitude     (latitude) float64 44kB -24.9 -24.9 -24.9 ... -26.0 -26.0 -26.0
  * longitude    (longitude) float64 52kB 30.75 30.75 30.75 ... 32.05 32.05
    spatial_ref  int32 4B 4326
  * time         (time) datetime64[ns] 2kB 2021-01-01T03:17:55.709725 ... 202...
Attributes:
    nodata:   nan

Warning

Due to the logarithmic nature of the decibel scale, it is not appropriate for some mathematical operations. For example, the mean of decibel data is not the same as the mean of the data in power (linear) scale. It is recommended to keep the data in power (linear) scale for your analysis and only convert to decibel scale for visualization purposes in the end.
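
To see why, here is a toy example with two arbitrary linear backscatter values (purely illustrative, not taken from the dataset):

linear = np.array([0.01, 0.1])
mean_then_db = 10 * np.log10(linear.mean())    # average in linear scale, then convert: ~ -12.6 dB
db_then_mean = (10 * np.log10(linear)).mean()  # convert first, then average: -15.0 dB

The two results differ by more than 2 dB, so the order of operations clearly matters.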

Tip

The SAR Handbook is a great resource on topics related to SAR data processing and interpretation. Regarding power vs decibel scale, have a closer look at Chapter 3 Tutorial and the sections “Comparing histograms of the amplitude, power, and dB scaled data” (p. 92) and “Why is the scale important?” (p. 93) in particular.

Notice how the data structure above is still the same as for the vh-band we started with and that the data is still lazily loaded. The only difference is the increase in the number of graph layers in the row called “Dask graph”. Dask is constructing a graph behind the scenes, which is a representation of the computation that will be executed once the data is loaded into memory.
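
If you are curious, one way to inspect this programmatically is via the underlying Dask array (a quick sketch using Dask's graph attributes):

print(len(ds.vh.data.dask.layers))   # number of graph layers for the original vh data
print(len(vh_db.data.dask.layers))   # typically two more: one for log10, one for the multiplication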

To see the difference between the power and the decibel scaled data, let’s again plot the time series from Xarray Shorts: Indexing and basic plotting:

px_water = (31.571, -24.981)
px_veg1 = (31.5384, -25.0226)
px_veg2 = (31.551, -25.034)

vh_px_water = ds.vh.sel(longitude=px_water[0], latitude=px_water[1], method="nearest")
vh_px_veg1 = ds.vh.sel(longitude=px_veg1[0], latitude=px_veg1[1], method="nearest")
vh_px_veg2 = ds.vh.sel(longitude=px_veg2[0], latitude=px_veg2[1], method="nearest")

vh_db_px_water = vh_db.sel(longitude=px_water[0], latitude=px_water[1], method="nearest")
vh_db_px_veg1 = vh_db.sel(longitude=px_veg1[0], latitude=px_veg1[1], method="nearest")
vh_db_px_veg2 = vh_db.sel(longitude=px_veg2[0], latitude=px_veg2[1], method="nearest")
vh_px_water.plot.scatter(x="time", label="Water")
vh_px_veg1.plot.scatter(x="time", label="Vegetation 1")
vh_px_veg2.plot.scatter(x="time", label="Vegetation 2")
plt.title("Comparison of Sentinel-1 RTC backscatter time series")
plt.ylabel("VH-polarized backscatter [linear]")
plt.legend()
<matplotlib.legend.Legend at 0x7f10230ddaf0>
[Figure: Comparison of Sentinel-1 RTC backscatter time series, VH-polarized backscatter in linear scale]
vh_db_px_water.plot.scatter(x="time", label="Water")
vh_db_px_veg1.plot.scatter(x="time", label="Vegetation 1")
vh_db_px_veg2.plot.scatter(x="time", label="Vegetation 2")
plt.title("Comparison of Sentinel-1 RTC backscatter time series")
plt.ylabel("VH-polarized backscatter [dB]")
plt.legend()
<matplotlib.legend.Legend at 0x7f10230bfe00>
[Figure: Comparison of Sentinel-1 RTC backscatter time series, VH-polarized backscatter in dB scale]

Xarray Shorts: Groupby-operations#

Combining groupby-operations with aggregation functions can be very useful for calculating statistics for different time periods. In the following example, we calculate the median for each month of the year.

Let’s first just apply the groupby-operation. The result is an xarray.core.groupby.DataArrayGroupBy object, which is a lazy representation of the grouped data.

vh_monthly = ds.vh.groupby("time.month")
vh_monthly
<DataArrayGroupBy, grouped over 1 grouper(s), 12 groups in total:
    'month': 12 groups with labels 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12>

Tip

Handling Xarray’s datetime components as in the example above can be very useful. You can find an overview on this topic here.
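
For example, the individual datetime components of our time coordinate can be accessed via the .dt accessor (standard Xarray functionality):

ds.time.dt.month    # month of each acquisition (1-12)
ds.time.dt.season   # meteorological season: "DJF", "MAM", "JJA" or "SON"
ds.time.dt.year     # calendar year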

We can now apply an aggregation function to the grouped data. In this case, we calculate the median for each month of the year. The result is an xarray.DataArray object with a new month-dimension.

Notice that the size of the array is now much smaller than the original vh data we started with because we have reduced the number of time steps from 211 (acquisitions) to 12 (months). The size is an estimate of the potential memory footprint of the data.

vh_monthly_median = vh_monthly.median(dim="time")
vh_monthly_median
<xarray.DataArray 'vh' (month: 12, latitude: 5500, longitude: 6500)> Size: 2GB
dask.array<stack, shape=(12, 5500, 6500), dtype=float32, chunksize=(1, 563, 563), chunktype=numpy.ndarray>
Coordinates:
  * latitude     (latitude) float64 44kB -24.9 -24.9 -24.9 ... -26.0 -26.0 -26.0
  * longitude    (longitude) float64 52kB 30.75 30.75 30.75 ... 32.05 32.05
    spatial_ref  int32 4B 4326
  * month        (month) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
Attributes:
    nodata:   nan

We could have also chained both operations together, which would have resulted in the same output:

vh_monthly_median = ds.vh.groupby("time.month").median(dim="time")

You can find an overview of the methods available for DataArrayGroupBy objects here.
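
median is just one of them; other reductions follow exactly the same pattern, for example:

vh_monthly_mean = vh_monthly.mean(dim="time")
vh_monthly_std = vh_monthly.std(dim="time")
vh_monthly_count = vh_monthly.count(dim="time")  # number of valid (non-NaN) acquisitions per month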

As you can probably imagine, these operations can be very useful for handling and analysing large time series data. Instead of going further at this point, let’s just convert the result to decibel scale, select the month of June and plot it:

vh_monthly_median_db = 10 * np.log10(vh_monthly_median)
vh_monthly_median_db.sel(month=6).plot(robust=True, cmap="gray", cbar_kwargs={"label": "VH backscatter [dB]"})
plt.title("Median VH-polarized backscatter for June 2021; SALDi site 06")
Text(0.5, 1.0, 'Median VH-polarized backscatter for June 2021; SALDi site 06')
[Figure: Median VH-polarized backscatter for June 2021; SALDi site 06]