System setup
============

To use the eWaterCycle package you need to set up the system with software and data.

This chapter is for system administrators or Research Software Engineers who need to set up a system for the eWaterCycle platform.

These instructions cover installing an eWaterCycle system from scratch on an "empty" Linux machine.
We have also created a codified version of these instructions using `Ansible `__ specifically targeted at the `SURF Research Cloud `__ in a `separate Infra repo `__.

This setup should work on any Linux machine with sufficient memory (8 GB, depending mostly on the models you run), CPU (more is better, but one core will do if needed), and storage (at least 200 GB) available.

The setup steps:

1.  `Conda environment <#conda-environment>`__
2.  `Install ewatercycle packages <#install-ewatercycle-packages>`__
3.  `Configure ESMValTool <#configure-ESMValTool>`__
4.  `Download climate data <#download-climate-data>`__
5.  `Install container engine <#install-container-engine>`__
6.  `Configure ewatercycle <#configure-ewatercycle>`__
7.  `Model container images <#model-container-images>`__
8.  `Download example parameter sets <#download-example-parameter-sets>`__
9.  `Prepare other parameter sets <#prepare-other-parameter-sets>`__
10. `Download example forcing <#download-example-forcing>`__
11. `Download observation data <#download-observation-data>`__

Conda environment
-----------------

The eWaterCycle Python package uses a lot of geospatial dependencies which can be installed using the `Conda `__ package management system.

Install Conda by using the `miniforge installer `__.

After Conda is installed you can install the software dependencies with a `conda environment file `__.

.. code:: shell

    curl -o conda-lock.yml https://raw.githubusercontent.com/eWaterCycle/ewatercycle/main/conda-lock.yml
    conda install mamba conda-lock -n base -c conda-forge -y
    conda-lock install --no-dev -n ewatercycle
    conda activate ewatercycle

Do not forget that any terminal or Jupyter kernel should activate the conda environment before the eWaterCycle Python package can be used.

Install eWaterCycle packages
----------------------------

The Python package and the plugins can be installed using pip:

.. code:: shell

    pip install ewatercycle ewatercycle-hype ewatercycle-lisflood ewatercycle-marrmot ewatercycle-pcrglobwb ewatercycle-wflow ewatercycle-leakybucket

Configure ESMValTool
--------------------

ESMValTool is used to generate forcing (temperature, precipitation, etc.) files from climate data for hydrological models.
ESMValTool has been installed as a dependency of the package.

See https://docs.esmvaltool.org/en/latest/quickstart/configuration.html for how to configure ESMValTool.

Download climate data
---------------------

The ERA5 and ERA-Interim data can be used to generate forcings.

ERA5
~~~~

To download ERA5 data files you can use the `era5cli `__ tool.

.. code:: shell

    pip install era5cli

Follow `instructions `_ to get access to data.

As an example, the hourly ERA5 data for the years 1990 and 1991 and for variables pr, psl, tas, tasmin, tasmax, tdps, uas, vas, rsds, rsdt and fx orog are downloaded as:

.. code:: shell

    cd
    era5cli hourly --startyear 1990 --endyear 1991 --variables total_precipitation
    era5cli hourly --startyear 1990 --endyear 1991 --variables mean_sea_level_pressure
    era5cli hourly --startyear 1990 --endyear 1991 --variables 2m_temperature
    era5cli hourly --startyear 1990 --endyear 1991 --variables minimum_2m_temperature_since_previous_post_processing
    era5cli hourly --startyear 1990 --endyear 1991 --variables maximum_2m_temperature_since_previous_post_processing
    era5cli hourly --startyear 1990 --endyear 1991 --variables 2m_dewpoint_temperature
    era5cli hourly --startyear 1990 --endyear 1991 --variables 10m_u_component_of_wind
    era5cli hourly --startyear 1990 --endyear 1991 --variables 10m_v_component_of_wind
    era5cli hourly --startyear 1990 --endyear 1991 --variables surface_solar_radiation_downwards
    era5cli hourly --startyear 1990 --endyear 1991 --variables toa_incident_solar_radiation
    era5cli hourly --startyear 1990 --endyear 1991 --variables orography
    cd -

The hourly data needs to be converted to daily data using an `ESMValTool recipe `_:

.. code:: shell

    esmvaltool run cmorizers/recipe_era5.yml

ERA-Interim
~~~~~~~~~~~

ERA-Interim has been superseded by ERA5, but can still be useful for reproduction studies and because of its smaller size.

The ERA-Interim data files can be downloaded at https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era-interim

Or you can use the `download_era_interim.py `_ script to download ERA-Interim data files.
See the first lines of the script for documentation.
The files should be downloaded to the ESMValTool ERA-Interim raw directory, for example ``/projects/0/wtrcycle/comparison/rawobs/Tier3/ERA-Interim``.

The ERA-Interim raw data files need to be cmorized using `script `_:

.. code:: shell

    cmorize_obs -o ERA-Interim

Install container engine
------------------------

In the eWaterCycle package, the hydrological models are run in containers with engines like `Apptainer `__ or `Docker `__.
At least one of Apptainer or Docker should be installed.

.. note::

    Apptainer is the open source fork of `Singularity `__. In the eWaterCycle project we prefer to use Apptainer over Singularity. Apptainer uses the same image format as Singularity.

Installing a container engine requires root permission on the machine.

Apptainer
~~~~~~~~~

Install Apptainer using `instructions `__.

Docker
~~~~~~

Install Docker using `instructions `__.
Docker should be configured so it can be `called without sudo `__.

Configure eWaterCycle
---------------------

The eWaterCycle package simplifies the API by reading some of the directories and settings from a configuration file.

The configuration can be set in Python with

.. code:: ipython3

    import logging
    logging.basicConfig(level=logging.INFO)
    import ewatercycle
    # Which container engine is used to run the hydrological models
    ewatercycle.CFG.container_engine = 'apptainer'  # or 'docker'
    # If container_engine==apptainer then where can the Apptainer image files (*.sif) be found.
    ewatercycle.CFG.apptainer_dir = './apptainer-images'
    # Directory in which output of model runs is stored. Each model run will generate a sub directory inside output_dir
    ewatercycle.CFG.output_dir = './'
    # Where can GRDC observation files (_Q_Day.Cmd.txt) be found.
    ewatercycle.CFG.grdc_location = './grdc-observations'
    # Where can parameter sets prepared by the system administrator be found
    ewatercycle.CFG.parameterset_dir = './parameter-sets'

and then written to disk with

.. code:: ipython3

    ewatercycle.CFG.save_to_file('./ewatercycle.yaml')

Later it can be loaded by using:

.. code:: ipython3

    ewatercycle.CFG.load_from_file('./ewatercycle.yaml')

To make the ewatercycle configuration load by default for the current user, copy it to ``~/.config/ewatercycle/ewatercycle.yaml``.

To make the ewatercycle configuration available to all users on the system, copy it to ``/etc/ewatercycle.yaml``.

See `CFG API documentation `_ for more information.
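As a rough sketch of how the two locations above relate, the snippet below looks for a configuration file in the per-user location first and in the system-wide location second. It uses only the Python standard library. The helper name ``first_existing_config`` and the user-before-system order are assumptions made for illustration; this is not the package's own lookup code.

```python
from pathlib import Path
from typing import Iterable, Optional

# The two locations named in the text above.
# Checking the user file before the system file is an assumption.
DEFAULT_CANDIDATES = (
    Path.home() / ".config" / "ewatercycle" / "ewatercycle.yaml",
    Path("/etc/ewatercycle.yaml"),
)


def first_existing_config(
    candidates: Iterable[Path] = DEFAULT_CANDIDATES,
) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = first_existing_config()
    if found is None:
        print("No ewatercycle.yaml found in the default locations")
    else:
        print(f"Configuration file found at {found}")
```

Running this on a freshly installed machine is a quick way to see whether a configuration file has been put in place for the current user or for all users.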

Configuration file for Snellius system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users who are part of the eWaterCycle project can use the following configuration on the `Snellius system of SURF `_:

.. code:: yaml

    container_engine: apptainer
    apptainer_dir: /projects/0/wtrcycle/apptainer-images
    output_dir: /scratch-shared/ewatercycle
    grdc_location: /projects/0/wtrcycle/GRDC/GRDC_GCOSGTN-H_27_03_2019
    parameterset_dir: /projects/0/wtrcycle/parameter-sets

The ``/scratch-shared/ewatercycle`` output directory will be automatically removed if its content is older than 14 days.
If the output directory is missing it can be recreated with

.. code:: shell

    mkdir /scratch-shared/ewatercycle
    chgrp wtrcycle /scratch-shared/ewatercycle
    chmod 2770 /scratch-shared/ewatercycle

Configuration file for eWaterCycle Jupyter machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users can use the following configuration on systems constructed with the eWaterCycle application on SURF Research Cloud:

.. code:: yaml

    container_engine: apptainer
    apptainer_dir: /mnt/data/apptainer-images
    output_dir: /scratch
    grdc_location: /mnt/data/GRDC
    parameterset_dir: /mnt/data/parameter-sets

Model container images
----------------------

As hydrological models run in containers, their container images should be made available on the system.

The names of the images can be found in the ``ewatercycle.models.().bmi_image`` classes.
For example, for the LeakyBucket model:

.. code:: ipython3

    >>> from ewatercycle.models import LeakyBucket
    >>> LeakyBucket().bmi_image
    'ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1'
    >>> LeakyBucket().bmi_image.apptainer_filename
    'ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif'

Docker
~~~~~~

Docker images can be downloaded with ``docker pull``:

.. code:: shell

    docker pull ewatercycle/lisflood-grpc4bmi:20.10
    docker pull ewatercycle/marrmot-grpc4bmi:2020.11
    docker pull ewatercycle/pcrg-grpc4bmi:setters
    docker pull ewatercycle/wflow-grpc4bmi:2020.1.1
    docker pull ewatercycle/wflow-grpc4bmi:2020.1.2
    docker pull ewatercycle/wflow-grpc4bmi:2020.1.3
    docker pull ewatercycle/hype-grpc4bmi:feb2021
    docker pull ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1
    docker pull ghcr.io/ewatercycle/sfincs-bmiserver:sfincs-v2.0.2-blockhaus-release-q2-2023

Apptainer
~~~~~~~~~

Apptainer images should be stored in the configured directory (``ewatercycle.CFG.apptainer_dir``) and can be built from the Docker images with:

.. code:: shell

    cd {ewatercycle.CFG.apptainer_dir}
    apptainer build ewatercycle-lisflood-grpc4bmi_20.10.sif docker://ewatercycle/lisflood-grpc4bmi:20.10
    apptainer build ewatercycle-marrmot-grpc4bmi_2020.11.sif docker://ewatercycle/marrmot-grpc4bmi:2020.11
    apptainer build ewatercycle-pcrg-grpc4bmi_setters.sif docker://ewatercycle/pcrg-grpc4bmi:setters
    apptainer build ewatercycle-wflow-grpc4bmi_2020.1.1.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.1
    apptainer build ewatercycle-wflow-grpc4bmi_2020.1.2.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.2
    apptainer build ewatercycle-wflow-grpc4bmi_2020.1.3.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.3
    apptainer build ewatercycle-hype-grpc4bmi_feb2021.sif docker://ewatercycle/hype-grpc4bmi:feb2021
    apptainer build ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif docker://ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1
    apptainer build ewatercycle-sfincs-bmiserver_sfincs-v2.0.2-blockhaus-release-q2-2023.sif docker://ghcr.io/ewatercycle/sfincs-bmiserver:sfincs-v2.0.2-blockhaus-release-q2-2023
    cd -

Download example parameter sets
-------------------------------

To quickly run the models it is advised to set up an example parameter set for each model.

.. code:: ipython3

    ewatercycle.parameter_sets.download_example_parameter_sets()

.. parsed-literal::

    INFO:ewatercycle.parameter_sets._example:Downloading example parameter set wflow_rhine_sbm_nc to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/wflow_rhine_sbm_nc...
    INFO:ewatercycle.parameter_sets._example:Download complete.
    INFO:ewatercycle.parameter_sets._example:Adding parameterset wflow_rhine_sbm_nc to ewatercycle.CFG...
    INFO:ewatercycle.parameter_sets._example:Downloading example parameter set pcrglobwb_rhinemeuse_30min to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min...
    INFO:ewatercycle.parameter_sets._example:Download complete.
    INFO:ewatercycle.parameter_sets._example:Adding parameterset pcrglobwb_rhinemeuse_30min to ewatercycle.CFG...
    INFO:ewatercycle.parameter_sets._example:Downloading example parameter set lisflood_fraser to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/lisflood_fraser...
    INFO:ewatercycle.parameter_sets._example:Download complete.
    INFO:ewatercycle.parameter_sets._example:Adding parameterset lisflood_fraser to ewatercycle.CFG...
    INFO:ewatercycle.parameter_sets:3 example parameter sets were downloaded
    INFO:ewatercycle.config._config_object:Config written to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/ewatercycle.yaml
    INFO:ewatercycle.parameter_sets:Saved parameter sets to configuration file /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/ewatercycle.yaml

Example parameter sets have been downloaded and added to the configuration file.

.. code:: shell

    cat ./ewatercycle.yaml

.. parsed-literal::

    container_engine: null
    grdc_location: None
    output_dir: None
    parameter_sets:
      lisflood_fraser:
        config: lisflood_fraser/settings_lat_lon-Run.xml
        directory: lisflood_fraser
        doi: N/A
        supported_model_versions: !!set {'20.10': null}
        target_model: lisflood
      pcrglobwb_rhinemeuse_30min:
        config: pcrglobwb_rhinemeuse_30min/setup_natural_test.ini
        directory: pcrglobwb_rhinemeuse_30min
        doi: N/A
        supported_model_versions: !!set {setters: null}
        target_model: pcrglobwb
      wflow_rhine_sbm_nc:
        config: wflow_rhine_sbm_nc/wflow_sbm_NC.ini
        directory: wflow_rhine_sbm_nc
        doi: N/A
        supported_model_versions: !!set {2020.1.1: null}
        target_model: wflow
    parameterset_dir: /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets
    apptainer_dir: None

.. code:: ipython3

    ewatercycle.parameter_sets.available_parameter_sets()

.. parsed-literal::

    ('lisflood_fraser', 'pcrglobwb_rhinemeuse_30min', 'wflow_rhine_sbm_nc')

.. code:: ipython3

    parameter_set = ewatercycle.parameter_sets.get_parameter_set('pcrglobwb_rhinemeuse_30min')
    print(parameter_set)

.. parsed-literal::

    Parameter set
    -------------
    name=pcrglobwb_rhinemeuse_30min
    directory=/home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min
    config=/home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min/setup_natural_test.ini
    doi=N/A
    target_model=pcrglobwb
    supported_model_versions={'setters'}

The ``parameter_set`` variable can be passed to a model class constructor.

Prepare other parameter sets
----------------------------

The example parameter sets downloaded in the previous section are useful for demonstrating the platform features but are rather small.
To perform more advanced experiments, additional parameter sets are needed.
Users could use :py:class:`ewatercycle.base.parameter_set.ParameterSet` to construct parameter sets themselves.
Or they can be made available via :py:func:`ewatercycle.parameter_sets.available_parameter_sets` and :py:func:`ewatercycle.base.parameter_set.ParameterSet.download` by extending the configuration file (ewatercycle.yaml).

A new parameter set should be added as a key/value pair in the ``parameter_sets`` map of the configuration file.
The key should be a unique string on the current system.
The value is a dictionary with the following items:

* directory: Location on disk where files of the parameter set are stored. If the path is relative, it is resolved relative to :py:const:`ewatercycle.config.Configuration.parameterset_dir`.
* config: Model configuration file which uses files from directory. If the path is relative, it is resolved relative to :py:const:`ewatercycle.config.Configuration.parameterset_dir`.
* doi: Persistent identifier of the parameter set. For example a DOI for a Zenodo record.
* target_model: Name of the model that the parameter set works with.
* supported_model_versions: Set of model versions that are supported by this parameter set. If not set, the parameter set is treated as supported by all versions of the model.

For example, the parameter set for PCR-GLOBWB from https://doi.org/10.5281/zenodo.1045339, after downloading and unpacking to ``/data/pcrglobwb2_input/``, could be added with the following config:

.. code:: yaml

    pcrglobwb_rhinemeuse_30min:
        directory: /data/pcrglobwb2_input/global_30min/
        config: /data/pcrglobwb2_input/global_30min/iniFileExample/setup_30min_non-natural.ini
        doi: https://doi.org/10.5281/zenodo.1045339
        target_model: pcrglobwb
        supported_model_versions: !!set {setters: null}

Download example forcing
------------------------

To be able to run the MARRMoT example notebooks you need a forcing file.
You can use ``ewatercycle.forcing.generate()`` to make it or use an already prepared `forcing file `__.

.. code:: shell

    cd docs/examples
    wget https://github.com/wknoben/MARRMoT/raw/dev-docker-BMI/BMI/Config/BMI_testcase_m01_BuffaloRiver_TN_USA.mat
    cd -

Download observation data
-------------------------

Observation data is needed to calculate metrics of the model performance or to plot a hydrograph.
The ewatercycle package can use `Global Runoff Data Centre (GRDC) `__ or `U.S. Geological Survey Water Services (USGS) `__ data.

The GRDC daily data files can be ordered at https://grdc.bafg.de/GRDC/EN/02_srvcs/21_tmsrs/riverdischarge_node.html.

The GRDC files should be stored in the ``ewatercycle.CFG.grdc_location`` directory.
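
Once the ordered files are in place, it can help to check that the GRDC directory actually contains daily discharge files. The sketch below lists files whose names end in ``_Q_Day.Cmd.txt`` (the pattern mentioned in the configuration comments earlier) using only the Python standard library. The function name ``list_grdc_files`` is hypothetical, and the sketch assumes the files sit directly in the directory rather than in subdirectories.

```python
from pathlib import Path
from typing import List

# File name suffix mentioned in the configuration comments of this chapter.
GRDC_SUFFIX = "_Q_Day.Cmd.txt"


def list_grdc_files(grdc_location: str) -> List[Path]:
    """Return the GRDC daily files found directly inside grdc_location, sorted by name."""
    directory = Path(grdc_location)
    if not directory.is_dir():
        raise FileNotFoundError(f"GRDC directory does not exist: {directory}")
    return sorted(directory.glob(f"*{GRDC_SUFFIX}"))


if __name__ == "__main__":
    # './grdc-observations' is the example value used for grdc_location earlier.
    try:
        files = list_grdc_files("./grdc-observations")
        print(f"Found {len(files)} GRDC file(s)")
    except FileNotFoundError as error:
        print(error)
```

An empty result usually means the files were unpacked somewhere else or that ``grdc_location`` points at the wrong directory.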