System setup
============

To use the eWaterCycle package, you need to set up the system with software
and data.

This chapter is for system administrators or Research Software Engineers who need to set up a system for the eWaterCycle platform.

These instructions cover installing an eWaterCycle system from scratch on an "empty" Linux machine. We have also created a codified version of these instructions using `Ansible <https://docs.ansible.com/ansible/latest/index.html>`__, specifically targeted at the `SURF Research Cloud <https://servicedesk.surfsara.nl/wiki/display/WIKI/Research+Cloud+Documentation>`__, in a `separate Infra repo <https://github.com/eWaterCycle/infra>`__.

This setup should work on any Linux machine with sufficient memory (8 GB, though this mostly depends on the models you run), CPU (more is better, but one core will do if needed), and storage (at least 200 GB).
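
The memory and storage thresholds above can be checked from Python before starting the installation. The following is a minimal sketch using only the standard library; the function names ``system_resources`` and ``meets_minimum`` are made up for this illustration and are not part of the eWaterCycle package. A machine sold with 8 GB of memory usually reports slightly less, so treat the result as a rough indication.

.. code:: python

    import os
    import shutil

    GIB = 1024 ** 3


    def system_resources(path="/"):
        """Return CPU count, total memory and free disk space (both in GiB)."""
        try:
            # Works on Linux; other platforms may not offer these sysconf names.
            mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        except (ValueError, OSError, AttributeError):
            mem_bytes = None
        return {
            "cpus": os.cpu_count(),
            "memory_gib": None if mem_bytes is None else mem_bytes / GIB,
            "free_disk_gib": shutil.disk_usage(path).free / GIB,
        }


    def meets_minimum(resources, min_memory_gib=8, min_disk_gib=200):
        """Compare resources with the thresholds; None means 'unknown'."""
        memory = resources["memory_gib"]
        return {
            "memory": None if memory is None else memory >= min_memory_gib,
            "disk": resources["free_disk_gib"] >= min_disk_gib,
        }


    if __name__ == "__main__":
        print(meets_minimum(system_resources("/")))

Pass the mount point that will hold the climate data and parameter sets as ``path``, since that is where most of the storage is needed.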

The setup steps:

1.  `Conda environment <#conda-environment>`__
2.  `Install eWaterCycle packages <#install-ewatercycle-packages>`__
3.  `Configure ESMValTool <#configure-esmvaltool>`__
4.  `Download climate data <#download-climate-data>`__
5.  `Install container engine <#install-container-engine>`__
6.  `Configure eWaterCycle <#configure-ewatercycle>`__
7.  `Model container images <#model-container-images>`__
8.  `Download example parameter sets <#download-example-parameter-sets>`__
9.  `Prepare other parameter sets <#prepare-other-parameter-sets>`__
10. `Download example forcing <#download-example-forcing>`__
11. `Download observation data <#download-observation-data>`__

Conda environment
-----------------

The eWaterCycle Python package uses a lot of geospatial dependencies,
which can be installed using the `Conda <https://conda.io/>`__ package
management system.

Install Conda by using the `miniforge
installer <https://github.com/conda-forge/miniforge>`__.

After Conda is installed, you can install the software dependencies from
a `conda-lock
file <https://github.com/eWaterCycle/ewatercycle/blob/main/conda-lock.yml>`__.

.. code:: shell

    curl -o conda-lock.yml https://raw.githubusercontent.com/eWaterCycle/ewatercycle/main/conda-lock.yml
    conda install mamba conda-lock -n base -c conda-forge -y
    conda-lock install --no-dev -n ewatercycle
    conda activate ewatercycle

Remember that every terminal or Jupyter kernel must activate the Conda environment before the eWaterCycle Python package can be used.
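
A quick way to check this from Python is to look at the ``CONDA_DEFAULT_ENV`` environment variable, which ``conda activate`` sets to the name of the active environment. The sketch below assumes the environment is called ``ewatercycle`` as in the commands above; the helper names are hypothetical. In a Jupyter kernel started through a kernel specification the variable may be absent even though the right interpreter is in use, so a ``False`` result there is a hint to investigate, not proof of a problem.

.. code:: python

    import os


    def active_conda_env(environ=None):
        """Return the name of the active Conda environment, or None."""
        environ = os.environ if environ is None else environ
        return environ.get("CONDA_DEFAULT_ENV")


    def is_ewatercycle_env_active(environ=None, expected="ewatercycle"):
        """True if the active Conda environment has the expected name."""
        return active_conda_env(environ) == expected


    if __name__ == "__main__":
        print("Active Conda environment:", active_conda_env())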

Install eWaterCycle packages
----------------------------

The Python package and the plugins can be installed using pip:

.. code:: shell

    pip install ewatercycle ewatercycle-hype ewatercycle-lisflood ewatercycle-marrmot ewatercycle-pcrglobwb ewatercycle-wflow ewatercycle-leakybucket
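
To confirm which of these distributions ended up in the active environment, the standard library module ``importlib.metadata`` can be queried. This is a small sketch; the function name ``installed_versions`` is made up for this illustration, and the list simply repeats the package names from the pip command above.

.. code:: python

    from importlib.metadata import PackageNotFoundError, version

    PACKAGES = [
        "ewatercycle",
        "ewatercycle-hype",
        "ewatercycle-lisflood",
        "ewatercycle-marrmot",
        "ewatercycle-pcrglobwb",
        "ewatercycle-wflow",
        "ewatercycle-leakybucket",
    ]


    def installed_versions(names=PACKAGES):
        """Map each distribution name to its version, or None if missing."""
        found = {}
        for name in names:
            try:
                found[name] = version(name)
            except PackageNotFoundError:
                found[name] = None
        return found


    if __name__ == "__main__":
        for name, installed in installed_versions().items():
            print(f"{name}: {installed or 'NOT INSTALLED'}")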


Configure ESMValTool
--------------------

ESMValTool is used to generate forcing (temperature, precipitation,
etc.) files from climate data for hydrological models.
ESMValTool is installed as a dependency of the package.

See https://docs.esmvaltool.org/en/latest/quickstart/configuration.html
for how to configure ESMValTool.

Download climate data
---------------------

The ERA5 and ERA-Interim data can be used to generate
forcings.

ERA5
~~~~

To download ERA5 data files you can use the
`era5cli <https://era5cli.readthedocs.io/>`__ tool.

.. code:: shell

    pip install era5cli

Follow the era5cli `instructions <https://era5cli.readthedocs.io/en/stable/getting_started/>`_ to get access to the data.

As an example, the hourly ERA5 data for the years 1990
and 1991 and for the variables pr, psl, tas, tasmin, tasmax, tdps, uas,
vas, rsds, rsdt and fx orog can be downloaded as follows:

.. code:: shell

    cd <ESMValTool ERA5 raw directory for example /projects/0/wtrcycle/comparison/rawobs/Tier3/ERA5/1>
    era5cli hourly --startyear 1990 --endyear 1991 --variables total_precipitation
    era5cli hourly --startyear 1990 --endyear 1991 --variables mean_sea_level_pressure
    era5cli hourly --startyear 1990 --endyear 1991 --variables 2m_temperature
    era5cli hourly --startyear 1990 --endyear 1991 --variables minimum_2m_temperature_since_previous_post_processing
    era5cli hourly --startyear 1990 --endyear 1991 --variables maximum_2m_temperature_since_previous_post_processing
    era5cli hourly --startyear 1990 --endyear 1991 --variables 2m_dewpoint_temperature
    era5cli hourly --startyear 1990 --endyear 1991 --variables 10m_u_component_of_wind
    era5cli hourly --startyear 1990 --endyear 1991 --variables 10m_v_component_of_wind
    era5cli hourly --startyear 1990 --endyear 1991 --variables surface_solar_radiation_downwards
    era5cli hourly --startyear 1990 --endyear 1991 --variables toa_incident_solar_radiation
    era5cli hourly --startyear 1990 --endyear 1991 --variables orography
    cd -
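
The commands above differ only in the variable name, so they can also be generated. The sketch below pairs the short names from the text with the era5cli variable names in the same order as the commands; the pairing is read off this chapter, and the names ``ERA5_VARIABLES`` and ``era5cli_commands`` are made up for this illustration. It only builds the command strings and does not run era5cli.

.. code:: python

    # Short variable name -> era5cli variable name, in the order used above.
    ERA5_VARIABLES = {
        "pr": "total_precipitation",
        "psl": "mean_sea_level_pressure",
        "tas": "2m_temperature",
        "tasmin": "minimum_2m_temperature_since_previous_post_processing",
        "tasmax": "maximum_2m_temperature_since_previous_post_processing",
        "tdps": "2m_dewpoint_temperature",
        "uas": "10m_u_component_of_wind",
        "vas": "10m_v_component_of_wind",
        "rsds": "surface_solar_radiation_downwards",
        "rsdt": "toa_incident_solar_radiation",
        "orog": "orography",
    }


    def era5cli_commands(start_year, end_year, variables=ERA5_VARIABLES):
        """Build one 'era5cli hourly' command line per variable."""
        return [
            f"era5cli hourly --startyear {start_year} --endyear {end_year}"
            f" --variables {era5_name}"
            for era5_name in variables.values()
        ]


    if __name__ == "__main__":
        print("\n".join(era5cli_commands(1990, 1991)))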

The hourly data needs to be converted to daily data using an `ESMValTool recipe <https://docs.esmvaltool.org/en/latest/input.html#cmorization-as-a-fix>`_:

.. code:: shell

    esmvaltool run cmorizers/recipe_era5.yml

ERA-Interim
~~~~~~~~~~~

ERA-Interim has been superseded by ERA5, but can still be useful for
reproduction studies and because of its smaller size. The ERA-Interim data files
can be downloaded at
https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era-interim

Alternatively, you can use the `download_era_interim.py <https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/obs/download_scripts/download_era_interim.py>`_
script to download the ERA-Interim data files. See the first lines of the script for documentation.
The files should be downloaded to the ESMValTool ERA-Interim raw directory, for example ``/projects/0/wtrcycle/comparison/rawobs/Tier3/ERA-Interim``.

The ERA-Interim raw data files need to be cmorized using a `cmorizer script <https://docs.esmvaltool.org/en/latest/input.html#using-a-cmorizer-script>`_:

.. code:: shell

    cmorize_obs -o ERA-Interim

Install container engine
------------------------

In the eWaterCycle package, the hydrological models are run in containers
with engines like `Apptainer <https://apptainer.org/>`__ or
`Docker <https://www.docker.com/>`__. At least one of Apptainer or Docker
should be installed.

.. note::

    Apptainer is the open source fork of `Singularity <https://sylabs.io/singularity/>`__.
    In the eWaterCycle project we prefer to use Apptainer over Singularity.
    Apptainer uses the same image format as Singularity.

Installing a container engine requires root permission on the machine.

Apptainer
~~~~~~~~~

Install Apptainer using
`instructions <https://apptainer.org/docs/user/main/quick_start.html>`__.

Docker
~~~~~~

Install Docker using the
`instructions <https://docs.docker.com/engine/install/>`__. Docker
should be configured so it can be `called without
sudo <https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user>`__.
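
After installation you can check which engine is reachable on the ``PATH`` of the current user. The following is a minimal sketch using ``shutil.which`` from the standard library; the function name ``available_container_engines`` is made up for this illustration. It only checks that the executable can be found, not that the current user is allowed to run containers.

.. code:: python

    import shutil


    def available_container_engines(candidates=("apptainer", "docker")):
        """Return the candidate engines whose executable is on the PATH."""
        return [name for name in candidates if shutil.which(name) is not None]


    if __name__ == "__main__":
        engines = available_container_engines()
        if engines:
            print("Found container engine(s):", ", ".join(engines))
        else:
            print("No container engine found; install Apptainer or Docker.")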

Configure eWaterCycle
---------------------

The eWaterCycle package simplifies the API by reading some of the
directories and settings from a configuration file.

The configuration can be set in Python with:

.. code:: ipython3

    import logging
    logging.basicConfig(level=logging.INFO)
    import ewatercycle
    # Which container engine is used to run the hydrological models
    ewatercycle.CFG.container_engine = 'apptainer'   # or 'docker'
    # If container_engine is 'apptainer': directory where the Apptainer image files (*.sif) can be found
    ewatercycle.CFG.apptainer_dir = './apptainer-images'
    # Directory in which output of model runs is stored. Each model run will generate a sub directory inside output_dir
    ewatercycle.CFG.output_dir = './'
    # Where can GRDC observation files (<station identifier>_Q_Day.Cmd.txt) be found.
    ewatercycle.CFG.grdc_location = './grdc-observations'
    # Where can parameter sets prepared by the system administrator be found
    ewatercycle.CFG.parameterset_dir = './parameter-sets'

and then written to disk with:

.. code:: ipython3

    ewatercycle.CFG.save_to_file('./ewatercycle.yaml')

Later it can be loaded by using:

.. code:: ipython3

    ewatercycle.CFG.load_from_file('./ewatercycle.yaml')

To make the eWaterCycle configuration load by default for the current user,
it should be copied to ``~/.config/ewatercycle/ewatercycle.yaml``.

To make the eWaterCycle configuration available to all users on the
system, it should be copied to ``/etc/ewatercycle.yaml``.
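
To see which of these two files is present on a machine, a short check with ``pathlib`` is enough. This sketch only reports which files exist; it does not reproduce how the package chooses between them, and the function names are made up for this illustration.

.. code:: python

    from pathlib import Path


    def candidate_config_files(home=None):
        """Return the per-user and system-wide configuration file locations."""
        home = Path.home() if home is None else Path(home)
        return [
            home / ".config" / "ewatercycle" / "ewatercycle.yaml",
            Path("/etc/ewatercycle.yaml"),
        ]


    def existing_config_files(home=None):
        """Return the candidate configuration files that exist on disk."""
        return [path for path in candidate_config_files(home) if path.is_file()]


    if __name__ == "__main__":
        for path in candidate_config_files():
            print(path, "(found)" if path.is_file() else "(missing)")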

See `CFG API documentation <apidocs/ewatercycle.config.rst>`_ for more information.

Configuration file for Snellius system
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users who are part of the eWaterCycle project can use the following configuration on the `Snellius system of
SURF <https://servicedesk.surfsara.nl/wiki/display/WIKI/Snellius>`_:

.. code:: yaml

   container_engine: apptainer
   apptainer_dir: /projects/0/wtrcycle/apptainer-images
   output_dir: /scratch-shared/ewatercycle
   grdc_location:  /projects/0/wtrcycle/GRDC/GRDC_GCOSGTN-H_27_03_2019
   parameterset_dir: /projects/0/wtrcycle/parameter-sets

The ``/scratch-shared/ewatercycle`` output directory will be automatically removed if its content is older than 14 days.
If the output directory is missing, it can be recreated with:

.. code:: shell

    mkdir /scratch-shared/ewatercycle
    chgrp wtrcycle /scratch-shared/ewatercycle
    chmod 2770 /scratch-shared/ewatercycle

Configuration file for eWaterCycle Jupyter machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users can use the following configuration on systems constructed with the eWaterCycle application on SURF Research
Cloud:

.. code:: yaml

   container_engine: apptainer
   apptainer_dir: /mnt/data/apptainer-images
   output_dir: /scratch
   grdc_location: /mnt/data/GRDC
   parameterset_dir: /mnt/data/parameter-sets

Model container images
----------------------

As hydrological models run in containers, their container images should be
made available on the system.

The name of the image for each model can be found in the ``bmi_image`` attribute of an instance of its
model class (``ewatercycle.models.<model class>().bmi_image``). For example, for the LeakyBucket model:

.. code:: ipython3

    >>> from ewatercycle.models import LeakyBucket
    >>> LeakyBucket().bmi_image
    'ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1'
    >>> LeakyBucket().bmi_image.apptainer_filename
    'ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif'

Docker
~~~~~~

Docker images can be downloaded with ``docker pull``:

.. code:: shell

    docker pull ewatercycle/lisflood-grpc4bmi:20.10
    docker pull ewatercycle/marrmot-grpc4bmi:2020.11
    docker pull ewatercycle/pcrg-grpc4bmi:setters
    docker pull ewatercycle/wflow-grpc4bmi:2020.1.1
    docker pull ewatercycle/wflow-grpc4bmi:2020.1.2
    docker pull ewatercycle/wflow-grpc4bmi:2020.1.3
    docker pull ewatercycle/hype-grpc4bmi:feb2021
    docker pull ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1
    docker pull ghcr.io/ewatercycle/sfincs-bmiserver:sfincs-v2.0.2-blockhaus-release-q2-2023

Apptainer
~~~~~~~~~

Apptainer images should be stored in the configured directory
(``ewatercycle.CFG.apptainer_dir``) and can be built from Docker images with:

.. code:: shell

    cd {ewatercycle.CFG.apptainer_dir}
    apptainer build ewatercycle-lisflood-grpc4bmi_20.10.sif docker://ewatercycle/lisflood-grpc4bmi:20.10
    apptainer build ewatercycle-marrmot-grpc4bmi_2020.11.sif docker://ewatercycle/marrmot-grpc4bmi:2020.11
    apptainer build ewatercycle-pcrg-grpc4bmi_setters.sif docker://ewatercycle/pcrg-grpc4bmi:setters
    apptainer build ewatercycle-wflow-grpc4bmi_2020.1.1.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.1
    apptainer build ewatercycle-wflow-grpc4bmi_2020.1.2.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.2
    apptainer build ewatercycle-wflow-grpc4bmi_2020.1.3.sif docker://ewatercycle/wflow-grpc4bmi:2020.1.3
    apptainer build ewatercycle-hype-grpc4bmi_feb2021.sif docker://ewatercycle/hype-grpc4bmi:feb2021
    apptainer build ewatercycle-leakybucket-grpc4bmi_v0.0.1.sif docker://ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1
    apptainer build ewatercycle-sfincs-bmiserver_sfincs-v2.0.2-blockhaus-release-q2-2023.sif docker://ghcr.io/ewatercycle/sfincs-bmiserver:sfincs-v2.0.2-blockhaus-release-q2-2023
    cd -
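
The ``.sif`` file names above follow a visible pattern: a registry host such as ``ghcr.io`` is dropped, ``/`` becomes ``-``, the ``:`` before the tag becomes ``_``, and ``.sif`` is appended. The sketch below encodes that pattern as inferred from the examples in this chapter; it is not the package's own implementation, and the function name ``sif_filename`` is made up for this illustration. When in doubt, use the value of ``bmi_image.apptainer_filename`` shown earlier.

.. code:: python

    def sif_filename(docker_image):
        """Derive the Apptainer file name for a tagged Docker image name."""
        name, _, tag = docker_image.partition(":")
        if not tag:
            raise ValueError(f"Image name {docker_image!r} has no tag")
        parts = name.split("/")
        # Assumption: a first part containing a dot is a registry host.
        if len(parts) > 1 and "." in parts[0]:
            parts = parts[1:]
        return "-".join(parts) + "_" + tag + ".sif"


    if __name__ == "__main__":
        image = "ghcr.io/ewatercycle/leakybucket-grpc4bmi:v0.0.1"
        print(f"apptainer build {sif_filename(image)} docker://{image}")

Registry addresses with a port number (for example a local registry) are not handled by this sketch.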

Download example parameter sets
-------------------------------

To quickly run the models, it is advised to set up an example parameter
set for each model.

.. code:: ipython3

    ewatercycle.parameter_sets.download_example_parameter_sets()


.. parsed-literal::

    INFO:ewatercycle.parameter_sets._example:Downloading example parameter set wflow_rhine_sbm_nc to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/wflow_rhine_sbm_nc...
    INFO:ewatercycle.parameter_sets._example:Download complete.
    INFO:ewatercycle.parameter_sets._example:Adding parameterset wflow_rhine_sbm_nc to ewatercycle.CFG...
    INFO:ewatercycle.parameter_sets._example:Downloading example parameter set pcrglobwb_rhinemeuse_30min to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min...
    INFO:ewatercycle.parameter_sets._example:Download complete.
    INFO:ewatercycle.parameter_sets._example:Adding parameterset pcrglobwb_rhinemeuse_30min to ewatercycle.CFG...
    INFO:ewatercycle.parameter_sets._example:Downloading example parameter set lisflood_fraser to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/lisflood_fraser...
    INFO:ewatercycle.parameter_sets._example:Download complete.
    INFO:ewatercycle.parameter_sets._example:Adding parameterset lisflood_fraser to ewatercycle.CFG...
    INFO:ewatercycle.parameter_sets:3 example parameter sets were downloaded
    INFO:ewatercycle.config._config_object:Config written to /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/ewatercycle.yaml
    INFO:ewatercycle.parameter_sets:Saved parameter sets to configuration file /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/ewatercycle.yaml


Example parameter sets have been downloaded and added to the
configuration file.

.. code:: shell

    cat ./ewatercycle.yaml


.. parsed-literal::

    container_engine: null
    grdc_location: None
    output_dir: None
    parameter_sets:
      lisflood_fraser:
        config: lisflood_fraser/settings_lat_lon-Run.xml
        directory: lisflood_fraser
        doi: N/A
        supported_model_versions: !!set {'20.10': null}
        target_model: lisflood
      pcrglobwb_rhinemeuse_30min:
        config: pcrglobwb_rhinemeuse_30min/setup_natural_test.ini
        directory: pcrglobwb_rhinemeuse_30min
        doi: N/A
        supported_model_versions: !!set {setters: null}
        target_model: pcrglobwb
      wflow_rhine_sbm_nc:
        config: wflow_rhine_sbm_nc/wflow_sbm_NC.ini
        directory: wflow_rhine_sbm_nc
        doi: N/A
        supported_model_versions: !!set {2020.1.1: null}
        target_model: wflow
    parameterset_dir: /home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets
    apptainer_dir: None


.. code:: ipython3

    ewatercycle.parameter_sets.available_parameter_sets()


.. parsed-literal::

    ('lisflood_fraser', 'pcrglobwb_rhinemeuse_30min', 'wflow_rhine_sbm_nc')


.. code:: ipython3

    parameter_set = ewatercycle.parameter_sets.get_parameter_set('pcrglobwb_rhinemeuse_30min')
    print(parameter_set)


.. parsed-literal::

    Parameter set
    -------------
    name=pcrglobwb_rhinemeuse_30min
    directory=/home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min
    config=/home/verhoes/git/eWaterCycle/ewatercycle/docs/examples/parameter-sets/pcrglobwb_rhinemeuse_30min/setup_natural_test.ini
    doi=N/A
    target_model=pcrglobwb
    supported_model_versions={'setters'}

The ``parameter_set`` variable can be passed to a model class
constructor.

Prepare other parameter sets
----------------------------

The example parameter sets downloaded in the previous section are nice to show off the platform features but are a bit small.
To perform more advanced experiments, additional parameter sets are needed.
Users could use :py:class:`ewatercycle.base.parameter_set.ParameterSet` to construct parameter sets themselves.
Or they can be made available via :py:func:`ewatercycle.parameter_sets.available_parameter_sets` and :py:func:`ewatercycle.base.parameter_set.ParameterSet.download` by extending the configuration file (ewatercycle.yaml).

A new parameter set should be added as a key/value pair in the ``parameter_sets`` map of the configuration file.
The key should be a unique string on the current system.
The value is a dictionary with the following items:

* directory: Location on disk where the files of the parameter set are stored. If the path is relative, it is resolved relative to :py:const:`ewatercycle.config.Configuration.parameterset_dir`.
* config: Model configuration file which uses files from the directory. If the path is relative, it is resolved relative to :py:const:`ewatercycle.config.Configuration.parameterset_dir`.
* doi: Persistent identifier of the parameter set, for example a DOI for a Zenodo record.
* target_model: Name of the model that the parameter set can work with.
* supported_model_versions: Set of model versions that are supported by this parameter set. If not set, the parameter set is treated as supported by all versions of the model.
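
The path rule for ``directory`` and ``config`` can be illustrated with ``pathlib``, where joining a base directory with an absolute path keeps the absolute path unchanged. This is a sketch of the rule as described above, not the package's own code. The function name ``resolve_entry`` is made up, and treating the first four items as required is an assumption of this illustration.

.. code:: python

    from pathlib import Path

    # Assumption of this sketch: these four items must always be present.
    REQUIRED_KEYS = ("directory", "config", "doi", "target_model")


    def resolve_entry(entry, parameterset_dir):
        """Check a parameter set entry and resolve its relative paths."""
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise KeyError(f"Missing items: {', '.join(missing)}")
        base = Path(parameterset_dir)
        resolved = dict(entry)
        # 'base / path' returns 'path' itself when 'path' is absolute.
        resolved["directory"] = base / entry["directory"]
        resolved["config"] = base / entry["config"]
        return resolved


    if __name__ == "__main__":
        entry = {
            "directory": "lisflood_fraser",
            "config": "lisflood_fraser/settings_lat_lon-Run.xml",
            "doi": "N/A",
            "target_model": "lisflood",
        }
        print(resolve_entry(entry, "/mnt/data/parameter-sets"))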

For example, the parameter set for PCR-GLOBWB from https://doi.org/10.5281/zenodo.1045339, after downloading and unpacking to ``/data/pcrglobwb2_input/``, could be added with the following configuration:

.. code:: yaml

    pcrglobwb_rhinemeuse_30min:
        directory: /data/pcrglobwb2_input/global_30min/
        config: /data/pcrglobwb2_input/global_30min/iniFileExample/setup_30min_non-natural.ini
        doi: https://doi.org/10.5281/zenodo.1045339
        target_model: pcrglobwb
        supported_model_versions: !!set {setters: null}


Download example forcing
------------------------

To be able to run the MARRMoT example notebooks, you need a forcing file.
You can use ``ewatercycle.forcing.generate()`` to generate one or use an
already prepared `forcing
file <https://github.com/wknoben/MARRMoT/blob/dev-docker-BMI/BMI/Config/BMI_testcase_m01_BuffaloRiver_TN_USA.mat>`__.

.. code:: shell

    cd docs/examples
    wget https://github.com/wknoben/MARRMoT/raw/dev-docker-BMI/BMI/Config/BMI_testcase_m01_BuffaloRiver_TN_USA.mat
    cd -

Download observation data
-------------------------

Observation data is needed to calculate metrics of the model performance or to plot a hydrograph. The
eWaterCycle package can use `Global Runoff Data Centre
(GRDC) <https://grdc.bafg.de/>`__ or `U.S. Geological Survey Water
Services (USGS) <https://waterservices.usgs.gov/>`__ data.

The GRDC daily data files can be ordered at
https://grdc.bafg.de/GRDC/EN/02_srvcs/21_tmsrs/riverdischarge_node.html.

The GRDC files should be stored in the ``ewatercycle.CFG.grdc_location``
directory.
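
To check that the files for the stations you need are in place, you can build the expected file names from the ``<station identifier>_Q_Day.Cmd.txt`` pattern mentioned in the configuration section. The sketch below does only that; the function names are made up for this illustration, and the station identifier and directory in the usage example are placeholders.

.. code:: python

    from pathlib import Path


    def grdc_file(grdc_location, station_id):
        """Return the expected path of the daily GRDC file for a station."""
        return Path(grdc_location) / f"{station_id}_Q_Day.Cmd.txt"


    def missing_grdc_files(grdc_location, station_ids):
        """Return the station identifiers without a file in grdc_location."""
        return [
            station_id
            for station_id in station_ids
            if not grdc_file(grdc_location, station_id).is_file()
        ]


    if __name__ == "__main__":
        # Placeholder directory and station identifier; replace with your own.
        print(missing_grdc_files("./grdc-observations", [6335020]))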