System setup

To use the eWaterCycle package you need to set up the system with software and data.

This chapter is for system administrators or Research Software Engineers who need to set up a system for the eWaterCycle platform.

These instructions cover installing an eWaterCycle system from scratch on an “empty” Linux machine. We have also created a codified version of these instructions using Ansible, specifically targeted at the SURF Research Cloud, in a separate Infra repo.

This setup should work on any Linux machine with sufficient memory (8 GB, though this mostly depends on the models you run), CPU (more is better, one core will do if needed), and storage (at least 200 GB).
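
As a quick pre-flight check, the figures above can be compared with what the machine actually offers. The following is a minimal sketch using only the Python standard library. The function name check_resources is hypothetical, the default thresholds are simply the numbers quoted above, and the memory query assumes the os.sysconf names that Linux provides.

```python
import os
import shutil


def check_resources(path="/", min_memory_gb=8, min_storage_gb=200):
    """Compare the machine's resources with the minimum figures of this chapter."""
    gib = 1024 ** 3
    # Total physical memory; these sysconf names are assumed to exist (Linux).
    memory_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / gib
    # Free space on the file system that holds `path`.
    storage_gb = shutil.disk_usage(path).free / gib
    return {
        "cpu_count": os.cpu_count() or 1,
        "memory_gb": memory_gb,
        "storage_gb": storage_gb,
        "memory_ok": memory_gb >= min_memory_gb,
        "storage_ok": storage_gb >= min_storage_gb,
    }
```

Calling check_resources("/mnt/data") would report on the file system holding that directory; the path is only an example and should point at wherever the climate data and images will be stored.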

The setup steps:

Conda environment
Install ewatercycle packages
Configure ESMValTool
Download climate data
Install container engine
Configure ewatercycle
Model container images
Download example parameter sets
Prepare other parameter sets
Download example forcing
Download observation data

Conda environment

The eWaterCycle Python package uses a lot of geospatial dependencies, which can be installed using the Conda package management system.

Install Conda by using the miniforge installer.

After Conda is installed, you can install the software dependencies from a conda-lock file:

curl -o conda-lock.yml https://raw.githubusercontent.com/eWaterCycle/ewatercycle/main/conda-lock.yml
conda install mamba conda-lock -n base -c conda-forge -y
conda-lock install --no-dev -n ewatercycle
conda activate ewatercycle

Do not forget that any terminal or Jupyter kernel should activate the conda environment before the eWaterCycle Python package can be used.
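
A script or notebook can guard against a forgotten activation. The sketch below relies on the CONDA_DEFAULT_ENV environment variable that conda activate sets; the helper name in_conda_env is hypothetical and not part of the eWaterCycle package.

```python
import os


def in_conda_env(expected="ewatercycle", environ=None):
    """Return True if the environment variables point at the `expected` env."""
    environ = os.environ if environ is None else environ
    # `conda activate <name>` stores the active environment name here.
    return environ.get("CONDA_DEFAULT_ENV") == expected
```

A notebook could call in_conda_env() in its first cell and stop with a clear message when it returns False.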

Install eWaterCycle packages

The Python package and the plugins can be installed using pip:

pip install ewatercycle ewatercycle-hype ewatercycle-lisflood ewatercycle-marrmot ewatercycle-pcrglobwb ewatercycle-wflow ewatercycle-leakybucket
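
To confirm which of these distributions ended up in the active environment, their versions can be looked up with importlib.metadata from the standard library. This is a minimal sketch; the helper name installed_versions is hypothetical, and the package list is copied from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = [
    "ewatercycle",
    "ewatercycle-hype",
    "ewatercycle-lisflood",
    "ewatercycle-marrmot",
    "ewatercycle-pcrglobwb",
    "ewatercycle-wflow",
    "ewatercycle-leakybucket",
]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Any entry that maps to None was not installed in the environment the interpreter is running in.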

Configure ESMValTool

ESMValTool is used to generate forcing (temperature, precipitation, etc.) files from climate data for hydrological models. The ESMValTool has been installed as a dependency of the package.

See https://docs.esmvaltool.org/en/latest/quickstart/configuration.html for how to configure ESMValTool.

Download climate data

The ERA5 and ERA-Interim data can be used to generate forcings.

ERA5

To download ERA5 data files you can use the era5cli tool.

pip install era5cli

Follow the era5cli instructions to get access to the data.

As an example, the hourly ERA5 data for the years 1990 and 1991 and for the variables pr, psl, tas, tasmin, tasmax, tdps, uas, vas, rsds, rsdt and fx orog are downloaded as:

cd <ESMValTool ERA5 raw directory for example /projects/0/wtrcycle/comparison/rawobs/Tier3/ERA5/1>
era5cli hourly --startyear 1990 --endyear 1991 --variables total_precipitation
era5cli hourly --startyear 1990 --endyear 1991 --variables mean_sea_level_pressure
era5cli hourly --startyear 1990 --endyear 1991 --variables 2m_temperature
era5cli hourly --startyear 1990 --endyear 1991 --variables minimum_2m_temperature_since_previous_post_processing
era5cli hourly --startyear 1990 --endyear 1991 --variables maximum_2m_temperature_since_previous_post_processing
era5cli hourly --startyear 1990 --endyear 1991 --variables 2m_dewpoint_temperature
era5cli hourly --startyear 1990 --endyear 1991 --variables 10m_u_component_of_wind
era5cli hourly --startyear 1990 --endyear 1991 --variables 10m_v_component_of_wind
era5cli hourly --startyear 1990 --endyear 1991 --variables surface_solar_radiation_downwards
era5cli hourly --startyear 1990 --endyear 1991 --variables toa_incident_solar_radiation
era5cli hourly --startyear 1990 --endyear 1991 --variables orography
cd -
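
Because the commands above differ only in the variable name, they can also be generated rather than typed out. The following sketch builds the same argument lists in Python; the function name era5cli_commands is hypothetical, and only the era5cli options already shown above are used.

```python
ERA5_VARIABLES = [
    "total_precipitation",
    "mean_sea_level_pressure",
    "2m_temperature",
    "minimum_2m_temperature_since_previous_post_processing",
    "maximum_2m_temperature_since_previous_post_processing",
    "2m_dewpoint_temperature",
    "10m_u_component_of_wind",
    "10m_v_component_of_wind",
    "surface_solar_radiation_downwards",
    "toa_incident_solar_radiation",
    "orography",
]


def era5cli_commands(start_year, end_year, variables=ERA5_VARIABLES):
    """Build one `era5cli hourly` argument list per variable."""
    return [
        [
            "era5cli", "hourly",
            "--startyear", str(start_year),
            "--endyear", str(end_year),
            "--variables", variable,
        ]
        for variable in variables
    ]
```

Each returned list can be passed to subprocess.run with cwd set to the ESMValTool ERA5 raw directory, which mirrors the cd at the start of the shell snippet.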

The hourly data needs to be converted to daily data using an ESMValTool recipe:

esmvaltool run cmorizers/recipe_era5.yml

ERA-Interim

ERA-Interim has been superseded by ERA5, but can still be useful for reproduction studies and because of its smaller size. The ERA-Interim data files can be downloaded at https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era-interim

Alternatively, you can use the download_era_interim.py script to download ERA-Interim data files. See the first lines of the script for documentation. The files should be downloaded to the ESMValTool ERA-Interim raw directory, for example /projects/0/wtrcycle/comparison/rawobs/Tier3/ERA-Interim.

The ERA-Interim raw data files need to be cmorized using the script:

cmorize_obs -o ERA-Interim

Install container engine

In the eWaterCycle package, the hydrological models are run in containers with engines like Apptainer or Docker. At least one of Apptainer or Docker should be installed.

Note

Apptainer is the open source fork of Singularity. In the eWaterCycle project we prefer to use Apptainer over Singularity. Apptainer uses the same image format as Singularity.

Installing a container engine requires root permission on the machine.

Apptainer

Install Apptainer by following its installation instructions.

Docker

Install Docker by following its installation instructions. Docker should be configured so it can be called without sudo.
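
After installation, a quick way to see which engine is reachable is to look for its executable on the PATH. This is a minimal sketch with the standard library; the helper name available_engines is hypothetical. Note that finding docker on the PATH does not prove it can be called without sudo.

```python
import shutil


def available_engines(candidates=("apptainer", "docker")):
    """Return the container engine executables that are found on the PATH."""
    return [name for name in candidates if shutil.which(name)]
```

An empty list means that neither engine is visible to the current user, so the models cannot be started yet.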

Configure eWaterCycle

The eWaterCycle package simplifies the API by reading some of the directories and settings from a configuration file.

The configuration can be set in Python with

import logging
logging.basicConfig(level=logging.INFO)
import ewatercycle
# Which container engine is used to run the hydrological models
ewatercycle.CFG.container_engine = 'apptainer'   # or 'docker'
# If container_engine==apptainer then where can the Apptainer image files (*.sif) be found.
ewatercycle.CFG.apptainer_dir = './apptainer-images'
# Directory in which output of model runs is stored. Each model run will generate a sub directory inside output_dir
ewatercycle.CFG.output_dir = './'
# Where can GRDC observation files (<station identifier>_Q_Day.Cmd.txt) be found.
ewatercycle.CFG.grdc_location = './grdc-observations'
# Where can parameter sets prepared by the system administrator be found
ewatercycle.CFG.parameterset_dir = './parameter-sets'

and then written to disk with

ewatercycle.CFG.save_to_file('./ewatercycle.yaml')

Later it can be loaded by using:

ewatercycle.CFG.load_from_file('./ewatercycle.yaml')

To make the ewatercycle configuration load by default for the current user, it should be copied to ~/.config/ewatercycle/ewatercycle.yaml.

To make the ewatercycle configuration available to all users on the system, it should be copied to /etc/ewatercycle.yaml.
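
Copying the file to the per-user location can be scripted as well. The sketch below uses only the standard library and the user path named above; the helper name install_user_config is hypothetical and not part of the ewatercycle API.

```python
import shutil
from pathlib import Path


def install_user_config(source, home=None):
    """Copy a configuration file to <home>/.config/ewatercycle/ewatercycle.yaml."""
    home = Path.home() if home is None else Path(home)
    target = home / ".config" / "ewatercycle" / "ewatercycle.yaml"
    # Create ~/.config/ewatercycle if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target
```

For example, install_user_config("./ewatercycle.yaml") installs the file written by save_to_file for the user running the script. The system-wide location /etc/ewatercycle.yaml normally requires root permission and is left out of this sketch.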

See the CFG API documentation for more information.

Configuration file for Snellius system

Users who are part of the eWaterCycle project can use the following configuration on the Snellius system of SURF:

container_engine: apptainer
apptainer_dir: /projects/0/wtrcycle/apptainer-images
output_dir: /scratch-shared/ewatercycle
grdc_location: /projects/0/wtrcycle/GRDC/GRDC_GCOSGTN-H_27_03_2019
parameterset_dir: /projects/0/wtrcycle/parameter-sets

The /scratch-shared/ewatercycle output directory will be automatically removed if its content is older than 14 days. If the output directory is missing, it can be recreated with:

mkdir /scratch-shared/ewatercycle
chgrp wtrcycle /scratch-shared/ewatercycle
chmod 2770 /scratch-shared/ewatercycle

Configuration file for eWaterCycle Jupyter machine

Users can use the following configuration on systems constructed with the eWaterCycle application on SURF Research Cloud:

container_engine: apptainer
apptainer_dir: /mnt/data/apptainer-images
output_dir: /scratch
grdc_location: /mnt/data/GRDC
parameterset_dir: /mnt/data/parameter-sets
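
Before handing a configuration like the ones above to users, it can help to confirm that the directories it names exist. The following sketch takes the settings as a plain dictionary, so it does not depend on how the YAML file is read; the helper name missing_directories is hypothetical.

```python
from pathlib import Path

# Configuration keys from this chapter whose values are directories.
DIRECTORY_KEYS = ("apptainer_dir", "output_dir", "grdc_location", "parameterset_dir")


def missing_directories(config):
    """Return the directory keys whose configured path does not exist.

    Keys that are absent or empty in `config` are skipped.
    """
    return [
        key
        for key in DIRECTORY_KEYS
        if config.get(key) and not Path(config[key]).is_dir()
    ]
```

With the Research Cloud values above, missing_directories({"output_dir": "/scratch", "grdc_location": "/mnt/data/GRDC"}) lists whichever of the two directories is not present on the machine.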

Model container images

As hydrological models run in containers, their container images should be made available on the system.

The name of a model's image can be found in its ewatercycle.models.<model class>().bmi_image attribute. For example, for the LeakyBucket model:

>>> from ewatercycle.models import LeakyBucket
>>> LeakyBucket().bmi_image
'ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1'
>>> LeakyBucket().bmi_image.apptainer_filename
'ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif'
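
The relation between an image name and its Apptainer filename can be illustrated with a few string operations. The rule below is inferred only from the image names and .sif filenames shown in this chapter and is an assumption, not the implementation used by the ewatercycle package; the function name apptainer_filename is likewise hypothetical.

```python
def apptainer_filename(docker_image):
    """Derive a .sif filename from a Docker image name (inferred rule)."""
    parts = docker_image.split("/")
    # Drop a leading registry host such as "ghcr.io", recognised by its dot.
    if len(parts) > 1 and "." in parts[0]:
        parts = parts[1:]
    # Join the remaining parts with "-" and separate the tag with "_".
    return "-".join(parts).replace(":", "_") + ".sif"
```

For the image above, apptainer_filename("ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1") gives ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif, the same name that bmi_image.apptainer_filename reports.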

Docker

Docker images can be downloaded with docker pull:

docker pull ewatercycle/lisflood-grpc4bmi:20.10
docker pull ewatercycle/marrmot-grpc4bmi:2020.11
docker pull ewatercycle/pcrg-grpc4bmi:setters
docker pull ewatercycle/wflow-grpc4bmi:2020.1.1
docker pull ewatercycle/wflow-grpc4bmi:2020.1.2
docker pull ewatercycle/wflow-grpc4bmi:2020.1.3
docker pull ewatercycle/hype-grpc4bmi:feb2021
docker pull ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1
docker pull ghcr.io/ewatercycle/sfincs-bmiserver:sfincs-v2.0.2-blockhaus-release-q2-2023

Apptainer

Apptainer images should be stored in the configured directory (ewatercycle.CFG.apptainer_dir) and can be built from Docker images with:

cd {ewatercycle.CFG.apptainer_dir}
apptainer build ewatercycle-lisflood-grpc4bmi_20.10.sif docker://ewatercycle/lisflood-grpc4bmi:20.10
apptainer build ewatercycle-marrmot-grpc4bmi_2020.11.sif docker://ewatercycle/marrmot-grpc4bmi:2020.11
apptainer build ewatercycle-pcrg-grpc4bmi_setters.sif docker://ewatercycle/pcrg-grpc4bmi:setters
apptainer build ewatercycle-wflow-grpc4bmi_2020.1.1.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.1
apptainer build ewatercycle-wflow-grpc4bmi_2020.1.2.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.2
apptainer build ewatercycle-wflow-grpc4bmi_2020.1.3.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.3
apptainer build ewatercycle-hype-grpc4bmi_feb2021.sif docker://ewatercycle/hype-grpc4bmi:feb2021
apptainer build ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif docker://ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1
apptainer build ewatercycle-sfincs-bmiserver_sfincs-v2.0.2-blockhaus-release-q2-2023.sif docker://ghcr.io/ewatercycle/sfincs-bmiserver:sfincs-v2.0.2-blockhaus-release-q2-2023
cd -
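
Once the builds have finished, the image directory can be compared against the expected filenames. This is a minimal sketch; the list is copied from the build commands above and the helper name missing_images is hypothetical.

```python
from pathlib import Path

EXPECTED_IMAGES = [
    "ewatercycle-lisflood-grpc4bmi_20.10.sif",
    "ewatercycle-marrmot-grpc4bmi_2020.11.sif",
    "ewatercycle-pcrg-grpc4bmi_setters.sif",
    "ewatercycle-wflow-grpc4bmi_2020.1.1.sif",
    "ewatercycle-wflow-grpc4bmi_2020.1.2.sif",
    "ewatercycle-wflow-grpc4bmi_2020.1.3.sif",
    "ewatercycle-hype-grpc4bmi_feb2021.sif",
    "ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif",
    "ewatercycle-sfincs-bmiserver_sfincs-v2.0.2-blockhaus-release-q2-2023.sif",
]


def missing_images(apptainer_dir, expected=EXPECTED_IMAGES):
    """Return the expected image files that are not present in `apptainer_dir`."""
    directory = Path(apptainer_dir)
    return [name for name in expected if not (directory / name).is_file()]
```

Passing the value of ewatercycle.CFG.apptainer_dir as the first argument lists the images that still need to be built.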

Download example parameter sets

To quickly run the models, it is advised to set up an example parameter set for each model.

ewatercycle.parameter_sets.download_example_parameter_sets()

INFO:ewatercycle.parameter_sets._example:Downloading example parameter set wflow_rhine_sbm_nc to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/wflow_rhine_sbm_nc...
INFO:ewatercycle.parameter_sets._example:Download complete.
INFO:ewatercycle.parameter_sets._example:Adding parameterset wflow_rhine_sbm_nc to ewatercycle.CFG...
INFO:ewatercycle.parameter_sets._example:Downloading example parameter set pcrglobwb_rhinemeuse_30min to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min...
INFO:ewatercycle.parameter_sets._example:Download complete.
INFO:ewatercycle.parameter_sets._example:Adding parameterset pcrglobwb_rhinemeuse_30min to ewatercycle.CFG...
INFO:ewatercycle.parameter_sets._example:Downloading example parameter set lisflood_fraser to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/lisflood_fraser...
INFO:ewatercycle.parameter_sets._example:Download complete.
INFO:ewatercycle.parameter_sets._example:Adding parameterset lisflood_fraser to ewatercycle.CFG...
INFO:ewatercycle.parameter_sets:3 example parameter sets were downloaded
INFO:ewatercycle.config._config_object:Config written to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/ewatercycle.yaml
INFO:ewatercycle.parameter_sets:Saved parameter sets to configuration file /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/ewatercycle.yaml

Example parameter sets have been downloaded and added to the configuration file.

cat ./ewatercycle.yaml

container_engine: null
grdc_location: None
output_dir: None
parameter_sets:
  lisflood_fraser:
    config: lisflood_fraser/settings_lat_lon-Run.xml
    directory: lisflood_fraser
    doi: N/A
    supported_model_versions: !!set {'20.10': null}
    target_model: lisflood
  pcrglobwb_rhinemeuse_30min:
    config: pcrglobwb_rhinemeuse_30min/setup_natural_test.ini
    directory: pcrglobwb_rhinemeuse_30min
    doi: N/A
    supported_model_versions: !!set {setters: null}
    target_model: pcrglobwb
  wflow_rhine_sbm_nc:
    config: wflow_rhine_sbm_nc/wflow_sbm_NC.ini
    directory: wflow_rhine_sbm_nc
    doi: N/A
    supported_model_versions: !!set {2020.1.1: null}
    target_model: wflow
parameterset_dir: /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets
apptainer_dir: None

ewatercycle.parameter_sets.available_parameter_sets()

('lisflood_fraser', 'pcrglobwb_rhinemeuse_30min', 'wflow_rhine_sbm_nc')

parameter_set = ewatercycle.parameter_sets.get_parameter_set('pcrglobwb_rhinemeuse_30min')
print(parameter_set)

Parameter set
-------------
name=pcrglobwb_rhinemeuse_30min
directory=/home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min
config=/home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min/setup_natural_test.ini
doi=N/A
target_model=pcrglobwb
supported_model_versions={'setters'}

The parameter_set variable can be passed to a model class constructor.

Prepare other parameter sets

The example parameter sets downloaded in the previous section are useful for demonstrating the platform features, but they are rather small. To perform more advanced experiments, additional parameter sets are needed. Users can construct parameter sets themselves with ewatercycle.base.parameter_set.ParameterSet. Alternatively, parameter sets can be made available via ewatercycle.parameter_sets.available_parameter_sets() and ewatercycle.base.parameter_set.ParameterSet.download() by extending the configuration file (ewatercycle.yaml).

A new parameter set should be added as a key/value pair in the parameter_sets map of the configuration file. The key should be a unique string on the current system. The value is a dictionary with the following items:

directory: Location on disk where the files of the parameter set are stored. A relative path is resolved against ewatercycle.config.Configuration.parameterset_dir.
config: Model configuration file which uses files from directory. A relative path is resolved against ewatercycle.config.Configuration.parameterset_dir.
doi: Persistent identifier of the parameter set, for example a DOI for a Zenodo record.
target_model: Name of the model that the parameter set can work with.
supported_model_versions: Set of model versions that are supported by this parameter set. If not set, the parameter set is assumed to be supported by all versions of the model.

For example, the parameter set for PCR-GLOBWB from https://doi.org/10.5281/zenodo.1045339, after downloading and unpacking to /data/pcrglobwb2_input/, could be added with the following config:

pcrglobwb_rhinemeuse_30min:
    directory: /data/pcrglobwb2_input/global_30min/
    config: /data/pcrglobwb2_input/global_30min/iniFileExample/setup_30min_non-natural.ini
    doi: https://doi.org/10.5281/zenodo.1045339
    target_model: pcrglobwb
    supported_model_versions: !!set {setters: null}
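
The rule for relative paths in directory and config can be made concrete with a small sketch. It only illustrates the resolution described above using pathlib and is not the code used by the ewatercycle package; the function name resolve_parameter_set is hypothetical.

```python
from pathlib import PurePosixPath


def resolve_parameter_set(entry, parameterset_dir):
    """Resolve `directory` and `config` of one entry against parameterset_dir."""
    base = PurePosixPath(parameterset_dir)
    resolved = dict(entry)
    for key in ("directory", "config"):
        # Joining with an absolute path yields that absolute path unchanged,
        # so only relative paths end up under parameterset_dir.
        resolved[key] = base / entry[key]
    return resolved
```

With parameterset_dir set to /mnt/data/parameter-sets, a relative config such as lisflood_fraser/settings_lat_lon-Run.xml resolves to a file under that directory, while the absolute /data/pcrglobwb2_input/ paths in the example above stay where they are.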

Download example forcing

To be able to run the MARRMoT example notebooks you need a forcing file. You can use ewatercycle.forcing.generate() to make it, or use an already prepared forcing file.

cd docs/examples
wget https://github.com/wknoben/MARRMoT/raw/dev-docker-BMI/BMI/Config/BMI_testcase_m01_BuffaloRiver_TN_USA.mat
cd -

Download observation data

Observation data is needed to calculate metrics of the model performance or to plot a hydrograph. The ewatercycle package can use Global Runoff Data Centre (GRDC) or U.S. Geological Survey Water Services (USGS) data.

The GRDC daily data files can be ordered at https://grdc.bafg.de/GRDC/EN/02_srvcs/21_tmsrs/riverdischarge_node.html.

The GRDC files should be stored in the ewatercycle.CFG.grdc_location directory.
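
Whether the files for a set of stations are in place can be checked from their names, using the <station identifier>_Q_Day.Cmd.txt pattern mentioned in the configuration section. This is a minimal sketch; the helper names grdc_file and missing_stations are hypothetical, and any station identifier passed in is only an example.

```python
from pathlib import Path


def grdc_file(grdc_location, station_id):
    """Path of the daily discharge file expected for one GRDC station."""
    return Path(grdc_location) / f"{station_id}_Q_Day.Cmd.txt"


def missing_stations(grdc_location, station_ids):
    """Return the station identifiers that have no file in `grdc_location`."""
    return [
        station_id
        for station_id in station_ids
        if not grdc_file(grdc_location, station_id).is_file()
    ]
```

Passing the value of ewatercycle.CFG.grdc_location together with the identifiers of the stations used in an experiment shows which files still need to be ordered.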