A. GSI Community Tools

BUFR Format and BUFR Tools

Under ./util/bufr_tools, there are many Fortran examples to illustrate basic BUFR/PrepBUFR file process skills such as encoding, decoding, and appending. For details of these examples and the BUFR format, please see the BUFR/PrepBUFR Users Guide, which is freely available on-line

http://www.dtcenter.org/com-GSI/BUFR/docs/index.php

The observation BUFR files generated by NCEP (for example, PrepBUFR and BUFR files from NCEP ftp server or gdas1.t12z.prepbufr.nr in tutorial case) are in Big Endian binary format. For release version older than 3.2, the BUFR files have to be converted from Big Endian BUFR file to Little Endian file when used by the GSI on Linux platform. Please refer to older version Users guide on how to convert.

Since release 3.2, BUFRLIB can automatically identify and handle either byte orders. For Intel and PGI compilers on Linux, the Big Endian BUFR/PrepBUFR files can be used by GSI without byte swap.

Read GSI Diagnostic Files

Lots of useful information about how one observation was used in the analysis such as innovation, observation values, observation error and adjusted observation error, and quality control information, has been saved in diagnostic files. To generate these diagnosis files, namelist variable write_diag in namelist section &SETUP needs to be true. The write_diag variable has been introduced in Part 4 of Section 3.4. The following is an example of using the write_diag variable to control diagnostic files. When we set the number of outer loops to 2, and set the write_diag namelist variable to the following:

write_diag(1)=.true.,write_diag(2)=.false.,write_diag(3)=.true.,
GSI will write out diagnostic files before the start of the 1st outer loop start (O-B) and after the completion of the 2nd outer loop finish (O-A). We dont want GSI to write out diagnosis files after the 1st outer loop because we set write_diag(2)=.false.
This is what we set in our example case described in section 5.2. From this case, we can see the following diagnostic files generated from the GSI analysis:
diag_amsua_metop-a_anl.2014061700  diag_amsua_n18_ges.2014061700
diag_amsua_metop-a_ges.2014061700  diag_conv_anl.2014061700
diag_amsua_n15_anl.2014061700      diag_conv_ges.2014061700
diag_amsua_n15_ges.2014061700      diag_hirs4_metop-a_anl.2014061700
diag_amsua_n18_anl.2014061700      diag_hirs4_metop-a_ges.2014061700
All files are identified with a filename containing three elements. The first element “diag” indicates these are combined diagnostic files. The second element identifies the observation type (here, “conv” means conventional observation from prepbufr and “amsua_n15” corresponds to radiance observation AMSU-A from NOAA 15). The last element identifies which step of outer loop the files were generated for. Here, “anl” means the contents were written after the last outer loop (from write_diag(3)=.true.) and “ges” means the contents were written before the first output loop (from write_diag(1)= .true.).
To help users to read the information from these diagnostic files, we have provided two Fortran programs in the ./util/Analysis_Utilities/read_diag/ directory:

.4in1 read_diag_conv.f90 : Reads the diagnostic files for conventional observations. For example: diag_conv_anl.2014061700 and diag_conv_ges.2014061700

read_diag_rad.f90 : Reads the diagnostic files for radiance observation. For example:

diag_amsua_n15_ges.2014061700 diag_hirs4_metop-a_anl.2014061700
diag_amsua_n18_anl.2014061700 diag_hirs4_metop-a_ges.2014061700

To compile the programs, use the makefile provided:

./make

Note: since information in the GSI include directory is required, the GSI must have been compiled first.

To run read_diag_conv.exe, a namelist file namelist.conv needs to be in the directory along with the executable. The namelist.conv only has two parameters:

&iosetup
  infilename='./diag_conv_anl',     : The path and name of GSI diagnosis file
  outfilename='./results_conv_anl', : The path and name of a text file used to
                  save the content of the diagnostic file
 /

The user can set the test case directory and file diag_conv_anl.2014061700 from section 5.2 as the entry for infilename in the namelist, then run the executable

./read_diag_conv.exe

The results are placed in the file specified by the outfilename entry in the namelist. In this case that would be a file results_conv_anl located in the directory where the executable was run.

Similarly, to run read_diag_rad.exe, the namelist file namelist.rad is needed. It contains the same parameters as namelist.conv but it links to radiance diag files. After setting it to use the same case from section 5.2, such as:

&iosetup
  infilename='(test directory)/diag_amsua_n18_ges.2014061700',
  outfilename='./results_amsua_n18_ges',
 /
Running the executable creates the text file results_amsua_n18_ges specified by the namelist in the directory read_diag_rad.exe runs.
For the conventional observations, the data is stored in two arrays: rdiagbuf and cdiagbuf. Their contents are listed as follows, for temperature (check src/main/setupt.f90) as an example:
cdiagbuf     = station id
rdiagbuf(1)  = observation type
rdiagbuf(2)  = observation subtype
rdiagbuf(3)  = observation latitude (degrees)
rdiagbuf(4)  = observation longitude (degrees)
rdiagbuf(5)  = station elevation (meters)
rdiagbuf(6)  = observation pressure (hPa)
rdiagbuf(7)  = observation height (meters)
rdiagbuf(8)  = observation time (hours relative to analysis time)
rdiagbuf(9)  = input prepbufr qc or event mark
rdiagbuf(10) = setup qc or event mark (currently qtflg only)
rdiagbuf(11) = read_prepbufr data usage flag
rdiagbuf(12) = analysis usage flag (1=use, -1=not used)
rdiagbuf(13) = nonlinear qc relative weight
rdiagbuf(14) = prepbufr inverse obs error (K**-1)
rdiagbuf(15) = read_prepbufr inverse obs error (K**-1)
rdiagbuf(16) = final inverse observation error (K**-1)
rdiagbuf(17) = observation (K)
rdiagbuf(18) = obs-ges used in analysis (K)
rdiagbuf(19) = obs-ges without bias correction (K)

For wind observations, the content after index 16 (check src/main/setupw.f90) is:

rdiagbuf(17) = earth relative u wind component observation (m/s)
rdiagbuf(18) = earth relative u obs-ges used in analysis (m/s)
rdiagbuf(19) = earth relative u obs-ges w/o bias correction (m/s)
rdiagbuf(20) = earth relative v wind component observation (m/s)
rdiagbuf(21) = earth relative v obs-ges used in analysis (m/s)
rdiagbuf(22) = earth relative v obs-ges w/o bias correction (m/s)

The read_diag_conv.exe reads these arrays and outputs important information in the text file results_conv_anlspecified by the user in the &iosetup namelist. For example:

   station      obs       obs      obs        obs         obs       usag     obs       O-B
       ID       type     time   latitude   longitude    pressure            value
ps @ 46047    : 180     -0.17     32.40     240.50     1014.30       1     1014.30   -0.09
 t @ 72293    : 120     -0.98     32.85     242.88      996.00       1      297.25    1.26
uv @ 72293    : 220     -0.98     32.85     242.88      996.00       1        3.90    0.38   3.30    2.66

For wind, the last 4 columns are the wind components in the order of: U observation, O-B for U, V observation, O-B for V.

For radiance observations, the data is stored in two arrays: diagbuf and diagbufchan. Their contents are listed as follows (please refer to src/main/setuprad.f90 for more details):

  diagbuf(1)  = observation latitude (degrees)
  diagbuf(2)  = observation longitude (degrees)
  diagbuf(3)  = model (guess) elevation at observation location
  diagbuf(4)  = observation time (hours relative to analysis time)

  diagbuf(5)  = sensor scan position
  diagbuf(6)  = satellite zenith angle (degrees)
  diagbuf(7)  = satellite azimuth angle (degrees)
  diagbuf(8)  = solar zenith angle (degrees)
  diagbuf(9)  = solar azimuth angle (degrees)
  diagbuf(10) = sun glint angle (degrees) (sgagl)

  diagbuf(11) = surface fractional coverage by water
  diagbuf(12) = surface fractional coverage by land
  diagbuf(13) = surface fractional coverage by ice
  diagbuf(14) = surface fractional coverage by snow
if(.not. retrieval)then
  diagbuf(15) = surface temperature over water (K)
  diagbuf(16) = surface temperature over land (K)
  diagbuf(17) = surface temperature over ice (K)
  diagbuf(18) = surface temperature over snow (K)
  diagbuf(19) = soil temperature (K)
  diagbuf(20) = soil moisture
  diagbuf(21) = surface land type
else
  diagbuf(15) = SST first guess used for SST retrieval
  diagbuf(16) = NCEP SST analysis at t
  diagbuf(17) = Physical SST retrieval
  diagbuf(18) = Navy SST retrieval
  diagbuf(19) = d(ta) corresponding to sstph
  diagbuf(20) = d(qa) corresponding to sstph
  diagbuf(21) = data type
endif
  diagbuf(22) = vegetation fraction
  diagbuf(23) = snow depth
  diagbuf(24) = surface wind speed (m/s)
!   Note:  The following quantities are not computed for all sensors
  if (.not.microwave) then
     diagbuf(25)  = cloud fraction (%)
     diagbuf(26)  = cloud top pressure (hPa)
  else
     diagbuf(25)  = cloud liquid water (kg/m**2)
     diagbuf(26)  = total column precip. water (km/m**2)
  endif

  diagbuf(27) = foundation temperature: Tr
  diagbuf(28) = diurnal warming: d(Tw) at depth zob
  diagbuf(29) = sub-layer cooling: d(Tc) at depth zob
  diagbuf(30) = d(Tz)/d(Tr)

diagbufchan include loop through channel i from 1 to nchanl:

diagbufchan(1,i)= observed brightness temperature (K)
diagbufchan(2,i)= observed - simulated Tb with bias corrrection (K)
diagbufchan(3,i)= observed - simulated Tb with no bias correction (K)
diagbufchan(4,i)= inverse observation error
diagbufchan(5,i)= quality control mark or event indicator
diagbufchan(6,i)= surface emissivity
diagbufchan(7,i)= stability index
diagbufchan(8,i)= d(Tb)/d(Ts)
do j=1,npred+1
   diagbufchan(7+j,i)= Tb bias correction terms (K)
end do

In the sample output file results_amsua_n18_ges, only the observation location and time are written in the file. Users can write out other information based on the list.

Read and Plot Convergence Information from fort.220

In section 4.6, we introduced how to check the convergence information in the fort.220 file. Further detail on the fort.220 convergence information can be found in the Advanced Users Guide. Here, we provide tools to filter this file and to plot the values of the cost function and the norm of gradient during each iteration.

These tools - one ksh script and one ncl script - are in: ./util/Analysis_Utilities/plot_cost_grad directory:

The ksh script, filter_fort220.ksh, only has one line:

grep 'cost,grad,step,b' fort.220 | sed -e 's/cost,grad,step,b,step? =   //g' | sed -e 's/good//g' > cost_gradient.txt
To run filter_fort220.ksh, the fort.220 needs to be in the same directory. The script will filter out the values of the cost function and the norm of gradient at each iteration from fort.220 into a text file called cost_gradient.txt.
Once the file cost_gradient.txt is ready, run ncl script GSI_cost_gradient.ncl to generate the plot:
ncl GSI_cost_gradient.ncl

The pdf file GSI_cost_gradient.pdf is created. The pdf file contains plots of the convergence of the GSI analysis like those in section 4.6.

Plot Single Observation Test Result and Analysis Increment

In Section 4.2, we introduced how to do a single observation test for GSI. Here we provide users with the ncl scripts to plot the single observation test results.
There are 5 ncl scripts in the util/Analysis_Utilities/plots_ncl directory:

[taba1]

The main difference between the ARW and NMM core used in GSI is that ARW is on a C grid, while NMM is on an E grid. GSI_singleobs_nmm.ncl calls fill_nmm_grid2.ncl to convert the E grid to an A grid for plotting, while GSI_singleobs_arw.ncl itself includes a C grid to A grid convertor.

Before running ncl scripts, users need to set up two links:

  • cdf_analysis - Link to analysis result in NetCDF format (wrf_inout)
  • cdf_bk - Link to background file in netCDF format
These scripts read in the analysis and background fields of temperature (T), U component of wind (U), V component of wind (V), and moisture (Q) and calculate the difference of the analysis field minus the background field. Then XY sections (left column) and XZ sections (right column) are plotted for T, U, V, and Q through the point that has maximum analysis increment of single observation. Here the default single observation test is T. If the user conducts other single observation tests, the corresponding changes should be made based on the current scripts.
The scripts Analysis_increment.ncl and Analysis_increment_nmm.ncl are very similar the one for the single observation but only the XY section for a certain analysis level is plotted.
For more information on how to use ncl, please check the NCL website at:
http://www.ncl.ucar.edu/

Generate initial regional ensembles

Under the ./util/EnKF/arw/src directory, there are two sub-directories: enspreproc_regional.fd/ and initialens_regional.fd/. The first one is to extract ensemble pertubations from GDAS 80 member ensembles and the second one is to add the extracted ensembles to a regional WRF background field (considered as the mean filed) to generate initial regional ensembles.

Before using these two unitilies, you should have already sucessfully compiled the GSI and gotten the “gsi.exe” file. After that, under the build directory, type “cmake -DBUILD_UTIL_COM path_to_comGSI ” to compile the utilities. A sucessful compilation should yield “enspreproc.exe” and “initialens.exe” in the bin/ directory.

Now, the next step is to get GDAS spectrally smoothed atmospheric ensemble forecasts. These files should be in the sigma format, which is currrenlty the only format supported by “enspreproc.exe”. You need to contact NCEP or other appropriate contacts to download these kind of ensembles. These ensemble files follow the name convection of “sfg_$CDATE_fhr$FEs_mem$MEM”. $CDATE is the cycle date, such as 2017011518 which means 18z of Jan. 15th, 2017. $FE is the forecast hour, for example, 06 means 6 hour of forecasts. $MEM is the member number. Here is an example of GDAS ensmbles: sfg_2017011518_fhr06s_mem001.

After you download the required GDAS ensembles, follow the following steps:

1. Running “enspreproc.exe”, enter the enspreproc_regional.fd/ directory:

(1). generate the file “fileslist01”. This file lists the ensemble files to be used in the calculation of ensemble pertubations. For example, if it is determined to use 20 members to generate ensemble perturbations, the file “filelist01” will be as follows:

sfg_2017011518_fhr06s_mem001
sfg_2017011518_fhr06s_mem002
sfg_2017011518_fhr06s_mem003
...
sfg_2017011518_fhr06s_mem018
sfg_2017011518_fhr06s_mem019
sfg_2017011518_fhr06s_mem020

(2). Modify the file “namelist.input”, change “n_ens” to the total number of ensembles to be used.

(3). Copy the “anavinfo” file used by GSI into current directory.

(4). Copy the background WRF file, name it as “wrf_inout”.

(5). Create a job description file, submit the job to get it run in parallel.

After the successful running of “enspreproc.exe”, you will get ensemble perturbations as follows:

en_perts4ars.mem0001
en_perts4ars.mem0002
en_perts4ars.mem0003
...
en_perts4ars.mem0018
en_perts4ars.mem0019
en_perts4ars.mem0020

2. Runnning “initialens.exe”, enter the initialens_regional.fd/ directory:

(1). Modify the file “namelist.input”, change “n_ens” to the total number of ensembles to be used.

(2). Copy wrf_inout to current directry

(3). Copy wrf_inout to wrfinput_d01.mem$MEM files as follows:

cp wrf_inout wrfinput_d01.mem0001
cp wrf_inout wrfinput_d01.mem0002
cp wrf_inout wrfinput_d01.mem0003
...
cp wrf_inout wrfinput_d01.mem0018
cp wrf_inout wrfinput_d01.mem0019
cp wrf_inout wrfinput_d01.mem0020

Be sure that each member has a correspoding wrfinput_d01 file. These files will be updated by “initialens.exe” later.

(4). Link the ensemble perturbations generated by “enspreproc.exe” to current directory. Something like this ln -s ../enspreproc_regional.fd/en_perts4arw.mem*.

(5). Create a job description file, submit the job to get it run in parallel. Please note that only 1 processor is required to run “initialens.exe” but submitting it to run on computing node is a must.

After the sucessful running of “initialens.exe”, all the wrfinput_d01.mem$MEM files are updated with ensemble perturbation added to the background or “mean” state of the original wrf_inout.

Now the initial regionl ensembles have been sucessfully generated.