.. _gsi_tool:

A. GSI Community Tools
======================

BUFR Format and BUFR Tools
--------------------------

Under *./util/bufr_tools*, there are many Fortran examples that illustrate basic
BUFR/PrepBUFR file processing skills such as encoding, decoding, and appending.
For details of these examples and the BUFR format, please see the BUFR/PrepBUFR
User's Guide, which is freely available on-line:

::

  http://www.dtcenter.org/com-GSI/BUFR/docs/index.php

The observation BUFR files generated by NCEP (for example, PrepBUFR and BUFR
files from the NCEP ftp server, or *gdas1.t12z.prepbufr.nr* in the tutorial
case) are in Big Endian binary format. For release versions older than 3.2,
these files had to be converted from Big Endian to Little Endian before being
used by GSI on a Linux platform; please refer to the User's Guide for those
older versions for conversion instructions. Since release 3.2, BUFRLIB can
automatically identify and handle either byte order, so for Intel and PGI
compilers on Linux the Big Endian BUFR/PrepBUFR files can be used by GSI
without a byte swap.

Read GSI Diagnostic Files
-------------------------

A lot of useful information about how each observation was used in the
analysis, such as the innovation, observation value, observation error,
adjusted observation error, and quality control information, is saved in the
diagnostic files. To generate these diagnostic files, the namelist variable
*write_diag* in namelist section *&SETUP* needs to be true. The *write_diag*
variable was introduced in Part 4 of Section 3.4.

The following is an example of using the *write_diag* variable to control the
diagnostic files. When we set the number of outer loops to 2 and set the
*write_diag* namelist variable to the following:

::

  write_diag(1)=.true.,write_diag(2)=.false.,write_diag(3)=.true.,

GSI will write out diagnostic files before the start of the 1st outer loop
(O-B) and after the completion of the 2nd outer loop (O-A).
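In the *&SETUP* namelist this combination looks like the fragment below. This
is only a sketch: *miter* is the namelist variable that sets the number of
outer loops, and all other *&SETUP* entries are omitted here:

::

  &SETUP
    miter=2,
    write_diag(1)=.true.,write_diag(2)=.false.,write_diag(3)=.true.,
  /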
We don't want GSI to write out diagnostic files after the 1st outer loop
because we set *write_diag(2)=.false.*

This is what we set in our example case described in section 5.2. From this
case, we can see the following diagnostic files generated from the GSI
analysis:

::

  diag_amsua_metop-a_anl.2014061700   diag_amsua_n18_ges.2014061700
  diag_amsua_metop-a_ges.2014061700   diag_conv_anl.2014061700
  diag_amsua_n15_anl.2014061700       diag_conv_ges.2014061700
  diag_amsua_n15_ges.2014061700       diag_hirs4_metop-a_anl.2014061700
  diag_amsua_n18_anl.2014061700       diag_hirs4_metop-a_ges.2014061700

All files are identified by a filename containing three elements. The first
element, "diag", indicates these are combined diagnostic files. The second
element identifies the observation type (here, "conv" means conventional
observations from PrepBUFR and "amsua_n15" corresponds to AMSU-A radiance
observations from NOAA-15). The last element identifies which step of the
outer loop the files were generated for: "anl" means the contents were written
after the last outer loop (from *write_diag(3)=.true.*) and "ges" means the
contents were written before the first outer loop (from
*write_diag(1)=.true.*).

To help users read the information from these diagnostic files, we have
provided two Fortran programs in the *./util/Analysis_Utilities/read_diag/*
directory:

*read_diag_conv.f90*
  Reads the diagnostic files for conventional observations, for example
  *diag_conv_anl.2014061700* and *diag_conv_ges.2014061700*

*read_diag_rad.f90*
  Reads the diagnostic files for radiance observations, for example:

  ::

    diag_amsua_n15_ges.2014061700    diag_hirs4_metop-a_anl.2014061700
    diag_amsua_n18_anl.2014061700    diag_hirs4_metop-a_ges.2014061700

To compile the programs, use the makefile provided:

::

  ./make

Note: since information in the GSI *include* directory is required, the GSI
must have been compiled first.
To run *read_diag_conv.exe*, a namelist file *namelist.conv* needs to be in
the directory along with the executable. The *namelist.conv* file only has
two parameters:

::

  &iosetup
    infilename='./diag_conv_anl',     : The path and name of the GSI diagnostic file
    outfilename='./results_conv_anl', : The path and name of a text file used to
                                        save the content of the diagnostic file
  /

The user can set the test case directory and file *diag_conv_anl.2014061700*
from section 5.2 as the entry for *infilename* in the namelist, then run the
executable:

::

  ./read_diag_conv.exe

The results are placed in the file specified by the *outfilename* entry in the
namelist. In this case that would be a file *results_conv_anl* located in the
directory where the executable was run.

Similarly, to run *read_diag_rad.exe*, the namelist file *namelist.rad* is
needed. It contains the same parameters as *namelist.conv* but points to
radiance diag files. After setting it to use the same case from section 5.2,
such as:

::

  &iosetup
    infilename='(test directory)/diag_amsua_n18_ges.2014061700',
    outfilename='./results_amsua_n18_ges',
  /

running the executable creates the text file *results_amsua_n18_ges* specified
by the namelist, in the directory where *read_diag_rad.exe* runs.

For the conventional observations, the data is stored in two arrays:
*rdiagbuf* and *cdiagbuf*.
Their contents are listed as follows, using temperature (check
*src/main/setupt.f90*) as an example:

::

  cdiagbuf     = station id
  rdiagbuf(1)  = observation type
  rdiagbuf(2)  = observation subtype
  rdiagbuf(3)  = observation latitude (degrees)
  rdiagbuf(4)  = observation longitude (degrees)
  rdiagbuf(5)  = station elevation (meters)
  rdiagbuf(6)  = observation pressure (hPa)
  rdiagbuf(7)  = observation height (meters)
  rdiagbuf(8)  = observation time (hours relative to analysis time)
  rdiagbuf(9)  = input prepbufr qc or event mark
  rdiagbuf(10) = setup qc or event mark (currently qtflg only)
  rdiagbuf(11) = read_prepbufr data usage flag
  rdiagbuf(12) = analysis usage flag (1=use, -1=not used)
  rdiagbuf(13) = nonlinear qc relative weight
  rdiagbuf(14) = prepbufr inverse obs error (K**-1)
  rdiagbuf(15) = read_prepbufr inverse obs error (K**-1)
  rdiagbuf(16) = final inverse observation error (K**-1)
  rdiagbuf(17) = observation (K)
  rdiagbuf(18) = obs-ges used in analysis (K)
  rdiagbuf(19) = obs-ges without bias correction (K)

For wind observations, the content after index 16 (check
*src/main/setupw.f90*) is:

::

  rdiagbuf(17) = earth relative u wind component observation (m/s)
  rdiagbuf(18) = earth relative u obs-ges used in analysis (m/s)
  rdiagbuf(19) = earth relative u obs-ges w/o bias correction (m/s)
  rdiagbuf(20) = earth relative v wind component observation (m/s)
  rdiagbuf(21) = earth relative v obs-ges used in analysis (m/s)
  rdiagbuf(22) = earth relative v obs-ges w/o bias correction (m/s)

The *read_diag_conv.exe* program reads these arrays and outputs the important
information in the text file *results_conv_anl* specified by the user in the
*&iosetup* namelist.
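Because the results file is plain text, standard shell tools can post-process
it. The sketch below pulls out only the temperature ("t") records and prints
the station id, observed value, and O-B; the file written here is a
hypothetical stand-in for *results_conv_anl*, and the awk field numbers assume
the column layout of the sample output shown below:

```shell
# Write a small stand-in results file (hypothetical content for this sketch).
cat > results_conv_anl << 'EOF'
ps @ 46047 : 180 -0.17 32.40 240.50 1014.30 1 1014.30 -0.09
t @ 72293 : 120 -0.98 32.85 242.88 996.00 1 297.25 1.26
uv @ 72293 : 220 -0.98 32.85 242.88 996.00 1 3.90 0.38 3.30 2.66
EOF
# Keep temperature records; print station id ($3), obs value ($11), O-B ($12).
awk '$1 == "t" { print $3, $11, $12 }' results_conv_anl
# → 72293 297.25 1.26
```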
For example:

::

  station   obs   obs   obs       obs        obs       usage  obs      O-B
  ID        type  time  latitude  longitude  pressure         value
  ps @ 46047 :  180  -0.17  32.40  240.50  1014.30  1  1014.30  -0.09
  t  @ 72293 :  120  -0.98  32.85  242.88   996.00  1   297.25   1.26
  uv @ 72293 :  220  -0.98  32.85  242.88   996.00  1     3.90   0.38  3.30  2.66

For wind, the last 4 columns are the wind components in the order: U
observation, O-B for U, V observation, O-B for V.

For radiance observations, the data is stored in two arrays: *diagbuf* and
*diagbufchan*. Their contents are listed as follows (please refer to
*src/main/setuprad.f90* for more details):

::

  diagbuf(1)  = observation latitude (degrees)
  diagbuf(2)  = observation longitude (degrees)
  diagbuf(3)  = model (guess) elevation at observation location
  diagbuf(4)  = observation time (hours relative to analysis time)
  diagbuf(5)  = sensor scan position
  diagbuf(6)  = satellite zenith angle (degrees)
  diagbuf(7)  = satellite azimuth angle (degrees)
  diagbuf(8)  = solar zenith angle (degrees)
  diagbuf(9)  = solar azimuth angle (degrees)
  diagbuf(10) = sun glint angle (degrees) (sgagl)
  diagbuf(11) = surface fractional coverage by water
  diagbuf(12) = surface fractional coverage by land
  diagbuf(13) = surface fractional coverage by ice
  diagbuf(14) = surface fractional coverage by snow
  if(.not. retrieval)then
     diagbuf(15) = surface temperature over water (K)
     diagbuf(16) = surface temperature over land (K)
     diagbuf(17) = surface temperature over ice (K)
     diagbuf(18) = surface temperature over snow (K)
     diagbuf(19) = soil temperature (K)
     diagbuf(20) = soil moisture
     diagbuf(21) = surface land type
  else
     diagbuf(15) = SST first guess used for SST retrieval
     diagbuf(16) = NCEP SST analysis at t
     diagbuf(17) = Physical SST retrieval
     diagbuf(18) = Navy SST retrieval
     diagbuf(19) = d(ta) corresponding to sstph
     diagbuf(20) = d(qa) corresponding to sstph
     diagbuf(21) = data type
  endif
  diagbuf(22) = vegetation fraction
  diagbuf(23) = snow depth
  diagbuf(24) = surface wind speed (m/s)
  !  Note: the following quantities are not computed for all sensors
  if (.not.microwave) then
     diagbuf(25) = cloud fraction (%)
     diagbuf(26) = cloud top pressure (hPa)
  else
     diagbuf(25) = cloud liquid water (kg/m**2)
     diagbuf(26) = total column precip. water (kg/m**2)
  endif
  diagbuf(27) = foundation temperature: Tr
  diagbuf(28) = diurnal warming: d(Tw) at depth zob
  diagbuf(29) = sub-layer cooling: d(Tc) at depth zob
  diagbuf(30) = d(Tz)/d(Tr)

The *diagbufchan* array includes a loop over channel *i* from 1 to *nchanl*:

::

  diagbufchan(1,i) = observed brightness temperature (K)
  diagbufchan(2,i) = observed - simulated Tb with bias correction (K)
  diagbufchan(3,i) = observed - simulated Tb with no bias correction (K)
  diagbufchan(4,i) = inverse observation error
  diagbufchan(5,i) = quality control mark or event indicator
  diagbufchan(6,i) = surface emissivity
  diagbufchan(7,i) = stability index
  diagbufchan(8,i) = d(Tb)/d(Ts)
  do j=1,npred+1
     diagbufchan(7+j,i) = Tb bias correction terms (K)
  end do

In the sample output file *results_amsua_n18_ges*, only the observation
location and time are written. Users can write out other information based on
the list above.

Read and Plot Convergence Information from fort.220
---------------------------------------------------

In section 4.6, we introduced how to check the convergence information in the
*fort.220* file. Further details on the *fort.220* convergence information can
be found in the Advanced User's Guide. Here, we provide tools to filter this
file and to plot the values of the cost function and the norm of the gradient
at each iteration. These tools, one ksh script and one ncl script, are in the
*./util/Analysis_Utilities/plot_cost_grad* directory.

The ksh script, *filter_fort220.ksh*, only has one line:

::

  grep 'cost,grad,step,b' fort.220 | sed -e 's/cost,grad,step,b,step? = //g' | sed -e 's/good//g' > cost_gradient.txt

To run *filter_fort220.ksh*, the *fort.220* file needs to be in the same
directory.
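Before pointing the script at a real run, the grep/sed pipeline can be
exercised on a sample line. The *fort.220* entry written below is
hypothetical (the exact numeric layout may differ between GSI versions); the
point is that the filter strips the leading label and the trailing "good",
leaving only the numbers:

```shell
# Hypothetical fort.220 line for this sketch.
echo 'cost,grad,step,b,step? =   1   0  1.23456E+05  6.78901E+02  4.5E-01  0.0 good' > fort.220
# Same pipeline as filter_fort220.ksh.
grep 'cost,grad,step,b' fort.220 \
  | sed -e 's/cost,grad,step,b,step? = //g' \
  | sed -e 's/good//g' > cost_gradient.txt
cat cost_gradient.txt
```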
The script will filter the values of the cost function and the norm of the
gradient at each iteration out of *fort.220* into a text file called
*cost_gradient.txt*.

Once the file *cost_gradient.txt* is ready, run the ncl script
*GSI_cost_gradient.ncl* to generate the plot:

::

  ncl GSI_cost_gradient.ncl

The file *GSI_cost_gradient.pdf* is created. It contains plots of the
convergence of the GSI analysis like those in section 4.6.

Plot Single Observation Test Result and Analysis Increment
----------------------------------------------------------

In Section 4.2, we introduced how to do a single observation test with GSI.
Here we provide users with ncl scripts to plot the single observation test
results.

There are 5 ncl scripts in the *util/Analysis_Utilities/plots_ncl* directory:

.. table:: List of ncl plotting tools

  ==========================  ================================================================
  GSI_singleobs_arw.ncl       Plot single observation test with ARW NetCDF background
  GSI_singleobs_nmm.ncl       Plot single observation test with NMM NetCDF background
  Analysis_increment.ncl      Plot analysis increment from the case with ARW NetCDF background
  Analysis_increment_nmm.ncl  Plot analysis increment from the case with NMM NetCDF background
  fill_nmm_grid2.ncl          E grid to A grid converter
  ==========================  ================================================================

The main difference between the ARW and NMM cores as used in GSI is that ARW
is on a C grid, while NMM is on an E grid. *GSI_singleobs_nmm.ncl* calls
*fill_nmm_grid2.ncl* to convert the E grid to an A grid for plotting, while
*GSI_singleobs_arw.ncl* itself includes a C grid to A grid converter.
Before running the ncl scripts, users need to set up two links:

- *cdf_analysis* : link to the analysis result in NetCDF format (*wrf_inout*)
- *cdf_bk* : link to the background file in NetCDF format

These scripts read in the analysis and background fields of temperature (T),
the U and V components of wind, and moisture (Q), and calculate the difference
of the analysis field minus the background field. Then XY sections (left
column) and XZ sections (right column) are plotted for T, U, V, and Q through
the point that has the maximum analysis increment from the single observation.
The default single observation test here is T; if the user conducts other
single observation tests, the corresponding changes should be made to the
scripts.

The scripts *Analysis_increment.ncl* and *Analysis_increment_nmm.ncl* are very
similar to the ones for the single observation test, but only the XY section
for a certain analysis level is plotted.

For more information on how to use ncl, please check the NCL website:

::

  http://www.ncl.ucar.edu/

Generate Initial Regional Ensembles
-----------------------------------

Under the *./util/EnKF/arw/src* directory, there are two sub-directories:
*enspreproc_regional.fd/* and *initialens_regional.fd/*. The first is used to
extract ensemble perturbations from the 80-member GDAS ensemble, and the
second adds the extracted perturbations to a regional WRF background field
(treated as the mean field) to generate the initial regional ensembles.

Before using these two utilities, you should have already successfully
compiled the GSI and obtained the *gsi.exe* file. After that, under the build
directory, type "cmake -DBUILD_UTIL_COM path_to_comGSI" to compile the
utilities. A successful compilation should yield *enspreproc.exe* and
*initialens.exe* in the *bin/* directory.

Now, the next step is to get the GDAS spectrally smoothed atmospheric ensemble
forecasts.
These files should be in sigma format, which is currently the only format
supported by *enspreproc.exe*. You will need to contact NCEP or another
appropriate source to download these ensembles. The ensemble files follow the
naming convention "sfg_$CDATE_fhr$FEs_mem$MEM", where $CDATE is the cycle date
(for example, 2017011518 means 18z of Jan. 15th, 2017), $FE is the forecast
hour (for example, 06 means a 6-hour forecast), and $MEM is the member number.
Here is an example of a GDAS ensemble file name:
*sfg_2017011518_fhr06s_mem001*.

After you download the required GDAS ensembles, follow these steps:

1. To run *enspreproc.exe*, enter the *enspreproc_regional.fd/* directory:

   (1) Generate the file "filelist01". This file lists the ensemble files to
       be used in the calculation of ensemble perturbations. For example, if
       20 members are used to generate ensemble perturbations, the file
       "filelist01" will be as follows:

       ::

         sfg_2017011518_fhr06s_mem001
         sfg_2017011518_fhr06s_mem002
         sfg_2017011518_fhr06s_mem003
         ...
         sfg_2017011518_fhr06s_mem018
         sfg_2017011518_fhr06s_mem019
         sfg_2017011518_fhr06s_mem020

   (2) Modify the file "namelist.input", changing "n_ens" to the total number
       of ensemble members to be used.

   (3) Copy the "anavinfo" file used by GSI into the current directory.

   (4) Copy the background WRF file in and name it "wrf_inout".

   (5) Create a job description file and submit the job to run in parallel.

   After a successful run of *enspreproc.exe*, you will get ensemble
   perturbations as follows:

   ::

     en_perts4arw.mem0001
     en_perts4arw.mem0002
     en_perts4arw.mem0003
     ...
     en_perts4arw.mem0018
     en_perts4arw.mem0019
     en_perts4arw.mem0020

2. To run *initialens.exe*, enter the *initialens_regional.fd/* directory:

   (1) Modify the file "namelist.input", changing "n_ens" to the total number
       of ensemble members to be used.

   (2) Copy wrf_inout into the current directory.
   (3) Copy wrf_inout to the wrfinput_d01.mem$MEM files as follows:

       ::

         cp wrf_inout wrfinput_d01.mem0001
         cp wrf_inout wrfinput_d01.mem0002
         cp wrf_inout wrfinput_d01.mem0003
         ...
         cp wrf_inout wrfinput_d01.mem0018
         cp wrf_inout wrfinput_d01.mem0019
         cp wrf_inout wrfinput_d01.mem0020

       Be sure that each member has a corresponding wrfinput_d01 file. These
       files will be updated by *initialens.exe* later.

   (4) Link the ensemble perturbations generated by *enspreproc.exe* into the
       current directory, for example:
       *ln -s ../enspreproc_regional.fd/en_perts4arw.mem\**

   (5) Create a job description file and submit the job. Please note that only
       one processor is required to run *initialens.exe*, but it must still be
       submitted to run on a compute node.

After a successful run of *initialens.exe*, all the *wrfinput_d01.mem$MEM*
files are updated with the ensemble perturbations added to the background (or
"mean") state of the original wrf_inout. The initial regional ensembles have
now been successfully generated.
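The repetitive per-member file preparation above (writing "filelist01" in step
1(1) and making the wrfinput_d01 copies in step 2(3)) can be scripted rather
than typed by hand. This is a sketch assuming 20 members and the 2017011518
cycle used in the examples:

```shell
n_ens=20
cdate=2017011518

# Step 1(1): write filelist01 listing the GDAS ensemble members to use.
: > filelist01
m=1
while [ "$m" -le "$n_ens" ]; do
  printf 'sfg_%s_fhr06s_mem%03d\n' "$cdate" "$m" >> filelist01
  m=$((m + 1))
done

# Placeholder so this sketch runs standalone; in practice wrf_inout is
# your real WRF background file.
[ -e wrf_inout ] || : > wrf_inout

# Step 2(3): copy the background to one wrfinput_d01 file per member.
m=1
while [ "$m" -le "$n_ens" ]; do
  cp wrf_inout "$(printf 'wrfinput_d01.mem%04d' "$m")"
  m=$((m + 1))
done
```

Note the member suffix widths differ: three digits in the sfg file names
(mem001) but four digits in the wrfinput_d01 file names (mem0001), matching
the listings above.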