GMTB Single Column Model

Author: Grant J. Firl
Date: 2017

This version of the GMTB SCM runs the GFS operational physics and other physics schemes adapted to work with NOAA EMC's physics driver (v3). It is intended primarily for use in GMTB's Physics Testbed for testing alternate physics schemes within the GFS physics suite. As of this version, the code is not connected to GMTB's CCPP driver, which will accommodate all physics suites within the CCPP. This version uses the GFS vertical grid and either the forward Euler or filtered leapfrog time integration schemes. It is set up to run GCSS-style cases with the option of specified or calculated surface fluxes.
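To illustrate the two time integration options, here is a minimal sketch (not the SCM's actual code) comparing forward Euler with a Robert-Asselin-filtered leapfrog, the standard form of the filtered leapfrog scheme, on the toy problem dx/dt = -x. The filter coefficient value of 0.05 is illustrative only:

```python
import math

def forward_euler(f, x0, dt, nsteps):
    """Advance dx/dt = f(x) with the forward Euler scheme."""
    x = x0
    for _ in range(nsteps):
        x = x + dt * f(x)
    return x

def filtered_leapfrog(f, x0, dt, nsteps, nu=0.05):
    """Advance dx/dt = f(x) with leapfrog plus a Robert-Asselin filter
    (coefficient nu) that damps the spurious leapfrog computational mode."""
    x_prev = x0
    x_curr = x0 + dt * f(x0)          # bootstrap with one Euler step
    for _ in range(nsteps - 1):
        x_next = x_prev + 2.0 * dt * f(x_curr)
        # Robert-Asselin filter applied to the central time level
        x_prev = x_curr + nu * (x_prev - 2.0 * x_curr + x_next)
        x_curr = x_next
    return x_curr

decay = lambda x: -x                  # dx/dt = -x, exact solution exp(-t)
x_fe = forward_euler(decay, 1.0, 0.01, 100)
x_lf = filtered_leapfrog(decay, 1.0, 0.01, 100)
# both results should lie close to exp(-1) after integrating to t = 1
```

The Asselin filter trades a small amount of damping of the physical solution for stability of the leapfrog scheme's computational mode; the coefficient actually used by the SCM is set in its source code.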

Directory Structure

/bin

  • directory where the makefile places object files and the executable

/case_config

  • directory containing experiment configuration namelist files
  • each file contains:
    • host model name (currently only GFS supported)
    • number of columns (usually 1; columns are independent)
    • case name (identifier for the initialization/forcing dataset to use)
    • time step in seconds (dynamics and physics time step are the same for the SCM right now)
    • time stepping scheme (1 = forward Euler, 2 = filtered leapfrog)
    • experiment runtime in seconds
    • output frequency in seconds
    • shortwave radiation call frequency in seconds
    • longwave radiation call frequency in seconds
    • number of vertical levels (28, 42, 60, 64, 91 supported)
    • directory where to write SCM output
    • name of the output file
    • type of thermodynamic forcing to use (1= total advective tendencies, 2= horizontal advective tendencies with prescribed vertical motion, 3= relaxation to observed profiles with vertical motion prescribed)
    • type of momentum forcing to use (options are same as thermodynamic forcing)
    • timescale in seconds for the relaxation forcing (if used)
    • flag for using specified surface fluxes (if true, they must be supplied in the case input file)
    • reference profile choice for use above the specified profiles (1= "McClatchey" profile, 2= mid-latitude summer standard atmosphere)
    • year, month, day, and hour of the initial time
  • physics configuration (will eventually be used for configuring the CCPP)
    • name of the physics suite
    • directory containing the physics suite XML files
    • number of fields in the data type used by the physics suite
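As a concrete sketch, a case_config namelist covering the options above might look like the following. The group name, variable names, and values here are illustrative placeholders only, not the SCM's actual namelist keys; consult the supplied files in the case_config directory for the real names:

```fortran
&case_config
  model_name          = 'GFS'     ! host model (only GFS supported)
  n_columns           = 1         ! columns are independent
  case_name           = 'twpice'  ! which initialization/forcing dataset to use
  dt                  = 600.0     ! time step (s); dynamics = physics step
  time_scheme         = 2         ! 1 = forward Euler, 2 = filtered leapfrog
  runtime             = 86400.0   ! experiment runtime (s)
  output_frequency    = 600.0     ! (s)
  n_levels            = 64        ! 28, 42, 60, 64, or 91
  thermo_forcing_type = 2         ! horizontal advection + prescribed vertical motion
  sfc_flux_spec       = .true.    ! fluxes supplied in the case input file
/
```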

/doc

  • directory containing documentation in the form of Doxygen output (html, LaTeX)

/model_config

  • directory containing namelist file to configure the GFS physics (filename corresponds to experiment configuration file)
  • also contains files needed to set up GFS vertical coordinate

/comparison_data

  • directory containing netCDF files against which to compare model output

/processed_case_input

  • directory containing netCDF files with case initialization and forcing data (all case netCDF data must conform to defined format)

/raw_case_input

  • directory containing case input data in non-standardized format; this data must be processed into the correct netCDF format

/scripts

  • directory containing python scripts for producing case input files and for analyzing SCM output

/src

  • directory containing all source code
    • files in the top level src directory are GMTB-generated source code (SCM infrastructure)
    • other source code is organized into subdirectories:
      • /w3nco_v2.0.6 (NCEP library required by GFS physics)
        • directory contains the NOAA w3 library (NCO v.2.0.6)
      • /sp_v2.0.2 (NCEP library required by GFS physics)
        • directory contains the NOAA sp library (NCO v2.0.2)
      • /bacio_v2.1.0 (NCEP library required by GFS physics)
        • directory contains the NOAA bacio library (NCO v2.1.0)
      • /GFS_branch_GMTB_gf_da_test
        • directory contains the GMTB/gf_da_test branch of GFS (close to 2017 GFS operational code) (checked out SVN repo)
      • /cmake
        • directory containing cmake modules needed for compilation

/standalone_data

  • directory containing data originally used by the standalone NUOPC driver:
    • GFS output for 8 columns
    • aerosol dataset
    • CO2 dataset
    • solar constant dataset

Other files

  • gmtb_scm_dox – doxygen configuration file for generating documentation
  • readme – link to this documentation
  • gmtb_scm_run.py – python run script for submitting a SCM job to NOAA's Theia batch system (can also be modified to run an ensemble or to run the plotting routines)
  • gmtb_scm_ens.py – python script for running a SCM forcing ensemble
  • Theia_setup – file used to set up the computing environment on NOAA's Theia machine for the SCM
  • Yellowstone_setup – file used to set up the computing environment on NCAR's Yellowstone machine for the SCM
  • Cheyenne_setup – file used to set up the computing environment on NCAR's Cheyenne machine for the SCM (SCM runs, but plotting does not)

How to Obtain the Code

Version 1.3 of the code is housed in a NOAA VLab git repository under the project name gmtb-scm. Although the code base is not available to the public due to ongoing development, you may contact the author (grantf@ucar.edu) for access to the repository or a compressed file containing the code as a "friendly user".

If you have access to the NOAA VLab system, follow these steps to gain access to the code:

  • If you're a first-time VLab user, you need to log on to all three components of the VLab system before being added to a project.
  • Send a request to grantf@ucar.edu to be added to the gmtb-scm project on VLab.
  • Once added, log in to the VLab Gerrit interface and configure some settings by following the instructions at the following webpage:
    • Gerrit configuration
    • This involves setting up SSH keys and configs to allow SSH access to the repos.
    • If you create a custom ssh-key for this project, you will need to add it to your local ssh agent with:
      • ssh-add ~/.ssh/id_rsa.vlab
    • If you want to interact with the repo using HTTPS, this step may not be necessary, although you would need to use your HTTP password every time an interaction with the remote repo takes place.
  • Clone the gmtb-scm repo using one of two methods:
    1. SSH
      • Issue the following command from the directory where you want the code to be located on your local machine:
      • git clone ssh://First.Last@vlab.ncep.noaa.gov:29418/gmtb-scm
        where First.Last is your NOAA logon username
    2. HTTPS
      • Once logged in to the VLab Gerrit interface, click on your user name in the upper right hand corner and click "Settings."
      • Click on the "HTTP Password" menu item on the left. Click on generate password if one is not present.
      • Issue the following command from the directory where you want the code to be located on your local machine:
      • git clone https://First.Last@vlab.ncep.noaa.gov/code-review/gmtb-scm
        where First.Last is your NOAA logon username
      • When prompted, enter your HTTP password from Gerrit.

If you want to checkout a stable, tagged version of the code (recommended) rather than the latest development on the master branch, use the following command within the repo directory after it has been cloned:

git checkout -b new_branch_name v#.#

where v#.# is the desired tag. This will create and check out a new branch called "new_branch_name" that loads the code from the specified tagged version.

How to Set Up, Compile, Run, and Plot

Case Setup

  • For using initialization and forcing data for a case that has already been set up:
    1. Copy and edit the default experiment configuration file in the case_config directory to suit your needs.
    2. Copy and edit the default namelist file in the model_config directory to suit your needs. The model_config namelist must have the same name as the case_config file.
  • For a new case:
    1. Process the new case data such that a netCDF file with the same format as that supplied is produced. A python script is supplied as an example of how to do so.
    2. Perform the two steps above.

Compile

  • Building and compiling are accomplished using the CMake utility. To build (out-of-source), perform the following:
    1. cd to the bin directory (or make another build directory and cd into it)
    2. For a standard build, use:
      cmake ../src
    3. For a debugging build using an IDE (like Eclipse), use:
      cmake -G"Eclipse CDT4 - Unix Makefiles" ../src
    4. For a special build using a different GFS directory (e.g., for the Grell-Freitas test), use:
      cmake -G"Eclipse CDT4 - Unix Makefiles" -DNEMS_SRC:STRING=subdirectory_with_GFS_source ../src
  • A working netCDF installation is required. CMake will attempt to find the netCDF installation so that it will be linked during compilation.
  • To compile, simply invoke
    make
  • The code has been built and compiled on several machines:
    • Mac OS X 10.11.6 Darwin 15.6 (CMake 3.6.2, Fortran GNU 6.3.0, C AppleClang 7.0.0.7000072, netCDF 4.4.1)
    • NOAA R&D machine Theia (CMake 2.8.12.2, Fortran and C Intel 14.0.2, netCDF 4.3.0)
    • NCAR's Yellowstone (CMake 3.0.2, Fortran and C Intel 14.0.2, netCDF 4.3.0)
    • NCAR's Cheyenne (CMake 2.8.12.1, Fortran and C Intel 16.0.3, netCDF 4.4.1)
  • To use one of the HPC machines, configure the environment BEFORE compiling. Do the following before performing the steps above:
    • From the top level gmtb-scm directory,
      source [machine]_setup
      (where [machine] is one of the supported machines). This loads the default Intel module and the default netCDF module compiled with Intel, sets the FC environment variable to 'ifort', prepends the path to the Anaconda python distribution for using the supplied python scripts, and installs the python package 'f90nml' in the user's local space (if necessary).
    • Run cmake and make as above depending on desired build.

Run

  • After successful compilation, issue the command:
    • ./gmtb_scm experiment_name (where experiment_name is the filename of the experiment configuration file in the case_config directory without the extension.)
  • For Theia or another HPC machine, a python run script is provided (gmtb_scm_run.py). Copy it from the top-level directory to the bin directory and edit as necessary. The script contains options for:
    • the account to charge for the processing time
    • the job name, estimated wall clock time (typically less than 1 minute, depending on the case), email address and options
    • the actual run command: (./gmtb_scm experiment_name)

Plot

  • The gmtb_scm_analysis.py script is included within the scripts directory to plot SCM output from one run, to compare more than one run, and to compare to observations or other data. It uses a configuration file to control which plots are made and how they look. To run, use
    ./gmtb_scm_analysis.py name_of_config_file.ini

Configuration File

  • An example configuration file (example.ini) is included in the scripts directory. Copy and modify this configuration file to generate plots. The following information is contained within a configuration file:
    • The following variables control which SCM output files are read, what they're called in the plots, where to put the plots, where to find comparison data, and other plotting options.
      • gmtb_scm_datasets: a list of output files to read. Specify relative paths from the scripts directory. Enter one or more files, separated by commas.
      • gmtb_scm_datasets_labels: a list of labels corresponding to the output files specified above. These labels are used to differentiate plotted data in legends, etc. Enter exactly one label for each output file to be read, separated by commas.
      • plot_dir: a string specifying where to put the plots (relative path from the scripts dir)
      • obs_file: a string specifying the path (relative to the scripts dir) to the file containing data with which to compare the model output. The script reads the case_config namelist corresponding to the first listed output file to find the case name. Based on the case name, the script tries to read the specified file using a routine found in gmtb_scm_read_obs.py. For a new case type, one must write a new routine in gmtb_scm_read_obs.py to read the desired file and fill in the observation dictionary passed back to gmtb_scm_analysis.py.
      • obs_compare: a boolean value controlling whether observations from obs_file are plotted alongside the model output. The gmtb_scm_analysis.py script attempts to find observation data corresponding to each requested plot's variable. If there are observations corresponding to the requested variable, they are plotted. Otherwise, observations are ignored for that particular plot.
      • plot_ind_datasets: a boolean value controlling whether plots are generated for individual output datasets. If true, directories are created to contain plots for each individual dataset (named using gmtb_scm_datasets_labels). If more than one dataset is specified, a 'comp' directory is created to contain plots comparing the datasets.
      • time_series_resample: a boolean value controlling whether resampling is performed on time series data. This is useful if the observational dataset frequency is different than the model output frequency. If true, the data or observations are resampled to the lowest frequency among them. If false, the model data and observations are plotted as is, with their respective frequencies.
    • The next sections control the following types of plots: profiles, time series, and contour plots (time-vertical cross-sections).
      • [time_slices] section: this section controls which times in the output files are used to generate the plots. If one is interested in the entire SCM run, enter information about the start and end times of the run. If one is interested in a particular time period in the SCM run, these can be specified by their start/end times too. Enter at least one time slice. A separate directory is created for each time slice.
        • each time slice is given a name within two brackets [[time_slice_name]] on its own line
        • each time slice must have two lists of 4 integers in the format:
          • start = year, month, day, hour
            end = year, month, day, hour
      • [plots] section: this section controls each type of plot in its own subsection
        • [[profiles_mean]]: this section controls the plotting of mean profiles for all time slices. Mean profiles representing the mean over each time slice are calculated.
          • vars: list of strings corresponding to variable names in the SCM output netCDF files. The strings must match the variable names in that file to be plotted.
          • vars_labels: list of strings corresponding to the variables listed above; the string for each variable will appear as the abscissa's label, so it should include units as appropriate.
          • vert_axis: string containing the name of the SCM output netCDF variable to use as the ordinate axis
          • vert_axis_label: string corresponding to the vertical axis; will appear as ordinate axis label, so should include units as appropriate
          • y_inverted: boolean to control whether the ordinate axis should be inverted (top-to-bottom)
          • y_log: boolean to control whether the ordinate axis should be logarithmic
          • y_min_option: choice of min, max, val (if val is specified, include y_min = floating point value in this subsection)
          • y_max_option: choice of min, max, val (if val is specified, include y_max = floating point value in this subsection)
            (Figure mean_profile_ex.png: example of a plot generated from the profiles_mean section)
        • [[profiles_mean_multi]]: this section controls the plotting of mean profiles for all time slices for multiple variables on the same plot. Mean profiles representing the mean over each time slice are calculated for each variable and plotted together.
          • each multi-variable profile plot is given its own subsection, named within triple brackets: [[[multi_variable_profile_plot_name]]]
          • vars: list of strings corresponding to variable names in the SCM output netCDF files that should be plotted together. The strings must match the variable names in that file to be plotted.
          • vars_labels: list of strings corresponding to the variables listed above; the labels will appear in a legend (no units necessary)
          • x_label: string used to label the ordinate axis of the plot (should contain units)
            (Figure mean_profile_multi_ex.png: example of a plot generated from the profiles_mean_multi section; different variables are denoted by colors, output datasets by line style)
        • [[time_series]]: this section controls the plotting of time series for all time slices.
          • vars: list of strings corresponding to variable names in the SCM output netCDF files. These variables must be time-dependent only (for now).
          • vars_labels: list of strings corresponding to the variables listed above; the string for each variable will appear as the ordinate's label, so it should include units as appropriate.
            (Figure time_series_ex.png: example of a plot generated from the time_series section)
        • [[contours]]: this section controls contour plots for all time slices; these will plot a time-vertical cross-section.
          • vars: list of strings corresponding to variable names in the SCM output netCDF files. The strings must match the variable names in that file to be plotted.
          • vars_labels: list of strings corresponding to the variables listed above; the string for each variable will appear as a title, so it should include units as appropriate.
          • vert_axis: string containing the name of the SCM output netCDF variable to use as the ordinate axis
          • vert_axis_label: string corresponding to the vertical axis; will appear as ordinate axis label, so should include units as appropriate
          • y_inverted: boolean to control whether the ordinate axis should be inverted (top-to-bottom)
          • y_log: boolean to control whether the ordinate axis should be logarithmic
          • y_min_option: choice of min, max, val (if val is specified, include y_min = floating point value in this subsection)
          • y_max_option: choice of min, max, val (if val is specified, include y_max = floating point value in this subsection)
          • x_ticks_num: integer of the number of labeled ticks on the abscissa axis
          • y_ticks_num: integer of the number of labeled ticks on the ordinate axis
            (Figure contour_ex.png: example of a plot generated from the contours section)
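Putting the pieces together, a minimal configuration file following the structure described above might look like the sketch below. All paths, dates, labels, and variable names here are placeholders; start from the supplied example.ini in the scripts directory rather than from this sketch:

```ini
gmtb_scm_datasets = ../bin/output_case1/output.nc, ../bin/output_case1_test/output.nc
gmtb_scm_datasets_labels = control, test
plot_dir = plots/case1/
obs_file = ../comparison_data/case1_obs.nc
obs_compare = True
plot_ind_datasets = True
time_series_resample = False

[time_slices]
  [[entire_run]]
    start = 2006, 1, 19, 3
    end   = 2006, 1, 20, 3

[plots]
  [[profiles_mean]]
    vars = T, qv
    vars_labels = 'temperature (K)', 'specific humidity (kg kg-1)'
    vert_axis = pres
    vert_axis_label = 'pressure (Pa)'
    y_inverted = True
    y_log = False
    y_min_option = min
    y_max_option = max
```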