System setup
To use the eWaterCycle package you need to set up the system with software and data.
This chapter is for system administrators or Research Software Engineers who need to set up a system for the eWaterCycle platform.
These instructions cover installing an eWaterCycle system from scratch on an “empty” Linux machine. We have also created a codified version of these instructions using Ansible, specifically targeted at the SURF Research Cloud, in a separate Infra repo.
This setup should work on any Linux machine with sufficient memory (8 GB; the actual need mostly depends on the models you run), CPU (more is better, but one core will do if needed), and storage (at least 200 GB).
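As a rough pre-flight check, the requirements above can be compared with what the machine reports. The following is a minimal sketch using only the Python standard library; the function names and the choice to measure free space on `/` are assumptions for illustration, not part of the eWaterCycle package.

```python
import os
import shutil

# Thresholds taken from the text above; treat them as rough guidance only.
MIN_MEMORY_GB = 8
MIN_STORAGE_GB = 200


def system_resources(path="/"):
    """Return CPU count, physical memory and free disk space for this machine."""
    gib = 1024 ** 3
    # os.sysconf with these names is available on Linux, the platform targeted here.
    memory_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "cpu_count": os.cpu_count() or 1,
        "memory_gb": memory_bytes / gib,
        "free_storage_gb": shutil.disk_usage(path).free / gib,
    }


def meets_requirements(resources):
    """Compare reported resources with the minimums named in this chapter."""
    return (
        resources["cpu_count"] >= 1
        and resources["memory_gb"] >= MIN_MEMORY_GB
        and resources["free_storage_gb"] >= MIN_STORAGE_GB
    )


if __name__ == "__main__":
    found = system_resources()
    print(found, "OK" if meets_requirements(found) else "below the suggested minimum")
```

Note that a machine sold as "8 GB" usually reports slightly less usable memory, so a failed check is a prompt to look closer rather than a hard stop.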
The setup steps:
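Conda environment
Install eWaterCycle package
Configure ESMValTool
Download climate data
Install container engine
Configure eWaterCycle
Model container images
Download example parameter sets
Prepare other parameter sets
Download example forcing
Download observation data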
Conda environment
The eWaterCycle Python package uses a lot of geospatial dependencies, which can be installed using the Conda package management system.
Install Conda by using the Miniconda installer.
After conda is installed you can install the software dependencies with a conda environment file.
wget https://raw.githubusercontent.com/eWaterCycle/ewatercycle/main/environment.yml
conda install mamba -n base -c conda-forge -y
mamba env create --file environment.yml
conda activate ewatercycle
Do not forget that any terminal or Jupyter kernel should activate the conda environment before the eWaterCycle Python package can be used.
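A quick way to verify this from inside Python is to inspect the `CONDA_DEFAULT_ENV` environment variable, which `conda activate` sets to the name of the active environment. The sketch below assumes the environment is called `ewatercycle`, as in the commands above; the helper names are hypothetical. A Jupyter kernel started from a kernel spec may not have this variable set, so treat a negative result as a hint rather than proof.

```python
import os

EXPECTED_ENV = "ewatercycle"  # name used in the `conda activate` command above


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unknown."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def is_ewatercycle_env_active(environ=None):
    """True when the active conda environment has the expected name."""
    return active_conda_env(environ) == EXPECTED_ENV


if __name__ == "__main__":
    if not is_ewatercycle_env_active():
        print(f"Active conda env is {active_conda_env()!r}, expected {EXPECTED_ENV!r}")
```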
Install eWaterCycle package
The Python package can be installed using pip:
pip install ewatercycle
Configure ESMValTool
ESMValTool is used to generate forcing (temperature, precipitation, etc.) files from climate data for hydrological models. The ESMValTool has been installed as a dependency of the package.
See https://docs.esmvaltool.org/en/latest/quickstart/configuration.html for how to configure ESMValTool.
Download climate data
The ERA5 and ERA-Interim data can be used to generate forcings.
ERA5
To download ERA5 data files you can use the era5cli tool.
pip install era5cli
Follow the era5cli instructions to get access to the data.
As an example, the hourly ERA5 data for the years 1990 and 1991 for the variables pr, psl, tas, tasmin, tasmax, tdps, uas, vas, rsds, rsdt and fx orog can be downloaded with:
cd <ESMValTool ERA5 raw directory for example /projects/0/wtrcycle/comparison/rawobs/Tier3/ERA5/1>
era5cli hourly --startyear 1990 --endyear 1991 --variables total_precipitation
era5cli hourly --startyear 1990 --endyear 1991 --variables mean_sea_level_pressure
era5cli hourly --startyear 1990 --endyear 1991 --variables 2m_temperature
era5cli hourly --startyear 1990 --endyear 1991 --variables minimum_2m_temperature_since_previous_post_processing
era5cli hourly --startyear 1990 --endyear 1991 --variables maximum_2m_temperature_since_previous_post_processing
era5cli hourly --startyear 1990 --endyear 1991 --variables 2m_dewpoint_temperature
era5cli hourly --startyear 1990 --endyear 1991 --variables 10m_u_component_of_wind
era5cli hourly --startyear 1990 --endyear 1991 --variables 10m_v_component_of_wind
era5cli hourly --startyear 1990 --endyear 1991 --variables surface_solar_radiation_downwards
era5cli hourly --startyear 1990 --endyear 1991 --variables toa_incident_solar_radiation
era5cli hourly --startyear 1990 --endyear 1991 --variables orography
cd -
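The repeated commands above differ only in the variable name, so they can be generated from a table. The sketch below builds the same command lines in Python; the pairing of short names (pr, psl, ...) with era5cli variable names is inferred from the order in which they appear in this section, and the function names are hypothetical.

```python
import shlex

# Short variable name -> era5cli variable name, in the order used above.
ERA5_VARIABLES = {
    "pr": "total_precipitation",
    "psl": "mean_sea_level_pressure",
    "tas": "2m_temperature",
    "tasmin": "minimum_2m_temperature_since_previous_post_processing",
    "tasmax": "maximum_2m_temperature_since_previous_post_processing",
    "tdps": "2m_dewpoint_temperature",
    "uas": "10m_u_component_of_wind",
    "vas": "10m_v_component_of_wind",
    "rsds": "surface_solar_radiation_downwards",
    "rsdt": "toa_incident_solar_radiation",
    "orog": "orography",
}


def era5cli_commands(start_year, end_year, variables=ERA5_VARIABLES):
    """Build one `era5cli hourly` argument list per variable."""
    return [
        ["era5cli", "hourly",
         "--startyear", str(start_year),
         "--endyear", str(end_year),
         "--variables", long_name]
        for long_name in variables.values()
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in era5cli_commands(1990, 1991):
        print(shlex.join(command))
```

To actually run the downloads, each argument list could be passed to `subprocess.run(command, check=True, cwd=raw_dir)`, where `raw_dir` is the ESMValTool ERA5 raw directory.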
The hourly data needs to be converted to daily data using an ESMValTool recipe:
esmvaltool run cmorizers/recipe_era5.yml
ERA-Interim
ERA-Interim has been superseded by ERA5, but can still be useful for reproduction studies and because of its smaller size. The ERA-Interim data files can be downloaded at https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era-interim
Alternatively, you can use the download_era_interim.py script to download ERA-Interim data files. See the first lines of the script for documentation.
The files should be downloaded to the ESMValTool ERA-Interim raw directory, for example /projects/0/wtrcycle/comparison/rawobs/Tier3/ERA-Interim.
The ERA-Interim raw data files need to be cmorized using the script:
cmorize_obs -o ERA-Interim
Install container engine
In the eWaterCycle package, the hydrological models are run in containers with engines like Apptainer or Docker. At least one of Apptainer or Docker should be installed.
Note
Apptainer is the open source fork of Singularity. In the eWaterCycle project we prefer to use Apptainer over Singularity. Apptainer uses the same image format as Singularity.
Installing a container engine requires root permission on the machine.
Apptainer
Install Apptainer using instructions.
Docker
Install Docker using instructions. Docker should be configured so it can be called without sudo.
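After installation, it can be useful to confirm that at least one engine is on the PATH of the user who will run the models. The sketch below uses `shutil.which` from the standard library; the function name is hypothetical, and finding the executable does not prove that Docker can be called without sudo.

```python
import shutil


def available_container_engines(candidates=("apptainer", "docker")):
    """Return the candidate engines whose executable is found on PATH."""
    return [name for name in candidates if shutil.which(name) is not None]


if __name__ == "__main__":
    engines = available_container_engines()
    if engines:
        print("Found container engine(s):", ", ".join(engines))
    else:
        print("Neither apptainer nor docker was found on PATH")
```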
Configure eWaterCycle
The eWaterCycle package simplifies the API by reading some of the directories and settings from a configuration file.
The configuration can be set in Python with
import logging
logging.basicConfig(level=logging.INFO)
import ewatercycle
# Which container engine is used to run the hydrological models
ewatercycle.CFG.container_engine = 'apptainer' # or 'docker'
# If container_engine==apptainer then where can the Apptainer image files (*.sif) be found.
ewatercycle.CFG.apptainer_dir = './apptainer-images'
# Directory in which output of model runs is stored. Each model run will generate a sub directory inside output_dir
ewatercycle.CFG.output_dir = './'
# Where can GRDC observation files (<station identifier>_Q_Day.Cmd.txt) be found.
ewatercycle.CFG.grdc_location = './grdc-observations'
# Where can parameter sets prepared by the system administrator be found
ewatercycle.CFG.parameterset_dir = './parameter-sets'
and then written to disk with
ewatercycle.CFG.save_to_file('./ewatercycle.yaml')
Later it can be loaded by using:
ewatercycle.CFG.load_from_file('./ewatercycle.yaml')
To make the ewatercycle configuration load by default for the current user, it should be copied to ~/.config/ewatercycle/ewatercycle.yaml.
To make the ewatercycle configuration available to all users on the system, it should be copied to /etc/ewatercycle.yaml.
See the CFG API documentation for more information.
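The two locations above suggest a simple lookup: take the first configuration file that exists. The sketch below illustrates that idea only; it assumes the per-user file takes precedence over the system-wide one, and the function names are hypothetical rather than the package's own loader. Consult the CFG API documentation for the actual search order.

```python
from pathlib import Path


def default_config_candidates(home=None):
    """Candidate config locations, per-user first (assumed precedence)."""
    home = Path.home() if home is None else Path(home)
    return [
        home / ".config" / "ewatercycle" / "ewatercycle.yaml",
        Path("/etc/ewatercycle.yaml"),
    ]


def find_config(candidates):
    """Return the first candidate that exists as a file, or None."""
    for candidate in candidates:
        candidate = Path(candidate)
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_config(default_config_candidates()))
```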
Configuration file for Snellius system
Users who are part of the eWaterCycle project can use the following configuration on the Snellius system of SURF:
container_engine: apptainer
apptainer_dir: /projects/0/wtrcycle/apptainer-images
output_dir: /scratch-shared/ewatercycle
grdc_location: /projects/0/wtrcycle/GRDC/GRDC_GCOSGTN-H_27_03_2019
parameterset_dir: /projects/0/wtrcycle/parameter-sets
The /scratch-shared/ewatercycle output directory will be automatically removed if its content is older than 14 days. If the output directory is missing, it can be recreated with:
mkdir /scratch-shared/ewatercycle
chgrp wtrcycle /scratch-shared/ewatercycle
chmod 2770 /scratch-shared/ewatercycle
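The mode 2770 used above combines the setgid bit (so new files inherit the directory's group, here wtrcycle) with full access for owner and group and no access for others. The short sketch below decodes the mode with the standard `stat` module; the helper names are hypothetical.

```python
import stat

OUTPUT_DIR_MODE = 0o2770  # the mode passed to chmod above


def has_setgid(mode):
    """True when the setgid bit is part of the mode."""
    return bool(mode & stat.S_ISGID)


def describe_directory_mode(mode):
    """Render the mode the way `ls -l` would show it for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    # 2770 = setgid + rwx for owner + rwx for group
    assert OUTPUT_DIR_MODE == stat.S_ISGID | stat.S_IRWXU | stat.S_IRWXG
    print(describe_directory_mode(OUTPUT_DIR_MODE))  # drwxrws---
```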
Configuration file for eWaterCycle Jupyter machine
Users can use the following configuration on systems constructed with the eWaterCycle application on SURF Research Cloud:
container_engine: apptainer
apptainer_dir: /mnt/data/apptainer-images
output_dir: /scratch
grdc_location: /mnt/data/GRDC
parameterset_dir: /mnt/data/parameter-sets
Model container images
As hydrological models run in containers, their container images should be made available on the system.
The names of the images can be found in the ewatercycle.models.* classes.
Docker
Docker images will be downloaded with docker pull:
docker pull ewatercycle/lisflood-grpc4bmi:20.10
docker pull ewatercycle/marrmot-grpc4bmi:2020.11
docker pull ewatercycle/pcrg-grpc4bmi:setters
docker pull ewatercycle/wflow-grpc4bmi:2020.1.1
docker pull ewatercycle/wflow-grpc4bmi:2020.1.2
docker pull ewatercycle/wflow-grpc4bmi:2020.1.3
docker pull ewatercycle/hype-grpc4bmi:feb2021
Apptainer
Apptainer images should be stored in the configured directory (ewatercycle.CFG.apptainer_dir) and can be built from Docker images with:
cd {ewatercycle.CFG.apptainer_dir}
apptainer build ewatercycle-lisflood-grpc4bmi_20.10.sif docker://ewatercycle/lisflood-grpc4bmi:20.10
apptainer build ewatercycle-marrmot-grpc4bmi_2020.11.sif docker://ewatercycle/marrmot-grpc4bmi:2020.11
apptainer build ewatercycle-pcrg-grpc4bmi_setters.sif docker://ewatercycle/pcrg-grpc4bmi:setters
apptainer build ewatercycle-wflow-grpc4bmi_2020.1.1.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.1
apptainer build ewatercycle-wflow-grpc4bmi_2020.1.2.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.2
apptainer build ewatercycle-wflow-grpc4bmi_2020.1.3.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.3
apptainer build ewatercycle-hype-grpc4bmi_feb2021.sif docker://ewatercycle/hype-grpc4bmi:feb2021
cd -
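The image file names in the commands above follow a visible pattern: the Docker image name with `/` replaced by `-` and `:` replaced by `_`, plus the `.sif` extension. The sketch below reproduces that pattern; it is inferred from the commands in this section, and the function names are hypothetical.

```python
def sif_filename(docker_image):
    """Derive the Apptainer image file name used above from a Docker image name."""
    return docker_image.replace("/", "-").replace(":", "_") + ".sif"


def apptainer_build_command(docker_image):
    """Build the argument list for `apptainer build <file>.sif docker://<image>`."""
    return ["apptainer", "build", sif_filename(docker_image), f"docker://{docker_image}"]


if __name__ == "__main__":
    for image in ("ewatercycle/lisflood-grpc4bmi:20.10",
                  "ewatercycle/wflow-grpc4bmi:2020.1.1"):
        print(" ".join(apptainer_build_command(image)))
```

Each argument list could be run with `subprocess.run(command, check=True, cwd=apptainer_dir)`, where `apptainer_dir` is the directory configured in ewatercycle.CFG.apptainer_dir.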
Download example parameter sets
To quickly run the models, it is advised to set up an example parameter set for each model.
ewatercycle.parameter_sets.download_example_parameter_sets()
INFO:ewatercycle.parameter_sets._example:Downloading example parameter set wflow_rhine_sbm_nc to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/wflow_rhine_sbm_nc...
INFO:ewatercycle.parameter_sets._example:Download complete.
INFO:ewatercycle.parameter_sets._example:Adding parameterset wflow_rhine_sbm_nc to ewatercycle.CFG...
INFO:ewatercycle.parameter_sets._example:Downloading example parameter set pcrglobwb_rhinemeuse_30min to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min...
INFO:ewatercycle.parameter_sets._example:Download complete.
INFO:ewatercycle.parameter_sets._example:Adding parameterset pcrglobwb_rhinemeuse_30min to ewatercycle.CFG...
INFO:ewatercycle.parameter_sets._example:Downloading example parameter set lisflood_fraser to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/lisflood_fraser...
INFO:ewatercycle.parameter_sets._example:Download complete.
INFO:ewatercycle.parameter_sets._example:Adding parameterset lisflood_fraser to ewatercycle.CFG...
INFO:ewatercycle.parameter_sets:3 example parameter sets were downloaded
INFO:ewatercycle.config._config_object:Config written to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/ewatercycle.yaml
INFO:ewatercycle.parameter_sets:Saved parameter sets to configuration file /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/ewatercycle.yaml
Example parameter sets have been downloaded and added to the configuration file.
cat ./ewatercycle.yaml
container_engine: null
grdc_location: None
output_dir: None
parameter_sets:
  lisflood_fraser:
    config: lisflood_fraser/settings_lat_lon-Run.xml
    directory: lisflood_fraser
    doi: N/A
    supported_model_versions: !!set {'20.10': null}
    target_model: lisflood
  pcrglobwb_rhinemeuse_30min:
    config: pcrglobwb_rhinemeuse_30min/setup_natural_test.ini
    directory: pcrglobwb_rhinemeuse_30min
    doi: N/A
    supported_model_versions: !!set {setters: null}
    target_model: pcrglobwb
  wflow_rhine_sbm_nc:
    config: wflow_rhine_sbm_nc/wflow_sbm_NC.ini
    directory: wflow_rhine_sbm_nc
    doi: N/A
    supported_model_versions: !!set {2020.1.1: null}
    target_model: wflow
parameterset_dir: /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets
apptainer_dir: None
ewatercycle.parameter_sets.available_parameter_sets()
('lisflood_fraser', 'pcrglobwb_rhinemeuse_30min', 'wflow_rhine_sbm_nc')
parameter_set = ewatercycle.parameter_sets.get_parameter_set('pcrglobwb_rhinemeuse_30min')
print(parameter_set)
Parameter set
-------------
name=pcrglobwb_rhinemeuse_30min
directory=/home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min
config=/home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min/setup_natural_test.ini
doi=N/A
target_model=pcrglobwb
supported_model_versions={'setters'}
The parameter_set variable can be passed to a model class constructor.
Prepare other parameter sets
The example parameter sets downloaded in the previous section are nice to show off the platform features but are a bit small.
To perform more advanced experiments, additional parameter sets are needed.
Users could use ewatercycle.parameter_sets.ParameterSet to construct parameter sets themselves.
Alternatively, parameter sets can be made available via ewatercycle.parameter_sets.available_parameter_sets() and ewatercycle.parameter_sets.get_parameter_set() by extending the configuration file (ewatercycle.yaml).
A new parameter set should be added as a key/value pair in the parameter_sets map of the configuration file.
The key should be a unique string on the current system.
The value is a dictionary with the following items:
directory: Location on disk where the files of the parameter set are stored. If the path is relative, it is resolved relative to ewatercycle.CFG.parameterset_dir.
config: Model configuration file which uses files from directory. If the path is relative, it is resolved relative to ewatercycle.CFG.parameterset_dir.
doi: Persistent identifier of the parameter set, for example a DOI for a Zenodo record.
target_model: Name of the model that the parameter set can work with.
supported_model_versions: Set of model versions that are supported by this parameter set. If not set, the parameter set is assumed to be supported by all versions of the model.
For example, the parameter set for PCR-GLOBWB from https://doi.org/10.5281/zenodo.1045339, after downloading and unpacking to /data/pcrglobwb2_input/, could be added with the following config:
pcrglobwb_rhinemeuse_30min:
  directory: /data/pcrglobwb2_input/global_30min/
  config: /data/pcrglobwb2_input/global_30min/iniFileExample/setup_30min_non-natural.ini
  doi: https://doi.org/10.5281/zenodo.1045339
  target_model: pcrglobwb
  supported_model_versions: !!set {setters: null}
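The rule for relative paths in directory and config can be illustrated with a small sketch: absolute paths are kept, relative ones are joined onto the parameter set directory. This is an illustration of the rule as described in this section, not the package's own code; the function names and the key check are assumptions.

```python
from pathlib import PurePosixPath

# Keys documented in this section for a parameter set entry.
KNOWN_KEYS = {"directory", "config", "doi", "target_model", "supported_model_versions"}


def resolve(parameterset_dir, value):
    """Keep absolute paths; join relative paths onto parameterset_dir."""
    path = PurePosixPath(value)
    return path if path.is_absolute() else PurePosixPath(parameterset_dir) / path


def resolve_entry(parameterset_dir, entry):
    """Return a copy of the entry with directory and config made absolute."""
    unknown = set(entry) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"Unknown parameter set keys: {sorted(unknown)}")
    resolved = dict(entry)
    for key in ("directory", "config"):
        resolved[key] = resolve(parameterset_dir, entry[key])
    return resolved


if __name__ == "__main__":
    entry = {
        "directory": "lisflood_fraser",
        "config": "lisflood_fraser/settings_lat_lon-Run.xml",
        "target_model": "lisflood",
    }
    print(resolve_entry("/mnt/data/parameter-sets", entry))
```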
Download example forcing
To be able to run the MARRMoT example notebooks you need a forcing file.
You can use ewatercycle.forcing.generate() to make it or use an already prepared forcing file.
cd docs/examples
wget https://github.com/wknoben/MARRMoT/raw/dev-docker-BMI/BMI/Config/BMI_testcase_m01_BuffaloRiver_TN_USA.mat
cd -
Download observation data
Observation data is needed to calculate metrics of the model performance or to plot a hydrograph. The ewatercycle package can use Global Runoff Data Centre (GRDC) or U.S. Geological Survey Water Services (USGS) data.
The GRDC daily data files can be ordered at https://www.bafg.de/GRDC/EN/02_srvcs/21_tmsrs/riverdischarge_node.html.
The GRDC files should be stored in the ewatercycle.CFG.grdc_location directory.
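The configuration section earlier in this chapter names GRDC files as `<station identifier>_Q_Day.Cmd.txt`. Based on that pattern, the sketch below checks which station files are still missing from the GRDC directory. The function names are hypothetical and the station identifier in the usage example is only an illustration.

```python
from pathlib import Path


def grdc_filename(station_id):
    """File name for a station, following <station identifier>_Q_Day.Cmd.txt."""
    return f"{station_id}_Q_Day.Cmd.txt"


def missing_grdc_files(grdc_location, station_ids):
    """Return the station ids that have no daily file in grdc_location."""
    location = Path(grdc_location)
    return [
        station_id
        for station_id in station_ids
        if not (location / grdc_filename(station_id)).is_file()
    ]


if __name__ == "__main__":
    # '6335020' is an illustrative station identifier.
    print(missing_grdc_files("./grdc-observations", ["6335020"]))
```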