NASA Tropospheric Chemistry Integrated Data Center

 


ICARTT Data Management Implementation Plan

Eric Williams; AL/NOAA
Jim Crawford; LaRC/NASA
Ali Aknan; LaRC/NASA
Hans Schlager; DLR

(1 December 2004)



Previous Version (21 May 2004) -- the document has been amended to clarify:
1) Definition for Data Interval and Description of independent variable
2) Time Format Reporting for Satellite Data


 


Part I. Overview of ICARTT Data Management - 2004

1. Introduction

The ICARTT study will involve a large number of measurement platforms that will provide a tremendous volume of data. Because there are a number of laboratories involved in the study, each with their own way of handling data, common methods of dealing with data must be identified and agreed upon prior to the study. This will facilitate data transfer both during the study and, more importantly, after the campaign is completed. Toward this end a data transfer and archiving standard has been agreed to by all of the principals in the study. This standard, modelled after the NASA Ames format, has been chosen because it is quasi-universal (i.e., most institutions are familiar with it), it satisfies most of the data handling issues that are expected to arise during this study, and it is easily handled by most computer-based data manipulation programs especially those used for merging and archiving data for public use. For reference, the full specification for the Ames file exchange format can be found at: http://cloud1.arc.nasa.gov/solve/archiv/archive.tutorial.html.

The Ames data format describes a generalized procedure by which data are placed into structured files with accompanying header information and appropriate file names. Since the ICARTT data format will vary somewhat from the Ames format, the purpose of this document is to specify in detail the exact structure for ICARTT data files, the exact information and structure of the header, and the exact file names to be used. The reasoning behind these specifications will become apparent as the details emerge below.

We first discuss the overall strategy and considerations for data management followed by a detailed description of the format requirements for data files during and after the ICARTT study.

2. Data exchange protocol

A. Data Managers

Certainly many issues and problems will arise with a study of this size and will need to be addressed. Toward this end, we recommend that each group designate a Data Manager (DM) who will be responsible for ensuring that all the measurements from that group be placed on an accessible server in the required format and within the timeframe specified below. These people should be identified well before the study so that all Data Managers can collectively address the needs and requirements for data exchange and identify and implement solutions prior to the start of the field campaign. It is the team of Data Managers, with some input from the Data Management Working Group, that will make the decisions regarding all aspects of ICARTT data exchange.

B. Data access

There will not be a central data collection and distribution server, but rather individual data servers set up and maintained by each group and accessed either via the web or ftp. Collectively, these sites can be thought of as a data repository (see Figure 1).

We should ensure that all groups involved in the study have a server that is accessible. If not, arrangements should be made for the data from that group (or PI) to be placed on the server of one of the other groups in the study. All participants will have access to all the data products available during and after the study.

[Figure 1. Data repository: individual group data servers accessed via the web or ftp]

C. Data transfer during the field study

During the field study every attempt should be made to have data posted to the data repository no more than 24 hours after the measurements have been taken. For some data this will be an absolute requirement due to the needs of flight and ship track planning. These data should be identified well before the commencement of the field campaign.

During and immediately after the campaign, “field” data files will be available. Data exchanged during the field study are considered a special case since these data are typically “first look” and, due to time constraints, are not likely to have undergone the full scrutiny of the PI. In order to reflect this fact the file names will be modified slightly with respect to the convention stipulated below in that the data revision code will be a capital letter (e.g., A, B) instead of a numeric code. This will be the flag to indicate to the user that these are Field data to be used only during the field study. These files should be deleted as soon as possible after the study and replaced with preliminary data files which will have some QA/QC performed.

D. Post-mission data exchange deadlines

We introduce here the data catalog (see Figure 2). This is a fully accessible (to the public) web site that is set up and maintained by the Data Manager team. The data catalog does not contain data files, but rather contains information about what data are available, what the status is (field, preliminary, or final), and where the data can be accessed. This is the central site where an investigator can go to find out what data were taken during the study, what stage the data QA/QC process is at, and who to contact to get the data. The data catalog provides openness without compromising security. The DM team may decide to implement this data catalog prior to the study as a means of providing an overview “at a glance” of the measurements being conducted during the ICARTT study.

Preliminary data files should be submitted to the respective servers no later than 01 January 2005. These files will be required to give PIs adequate time for analysis prior to the anticipated data workshop in April, 2005. Final data should be submitted no later than 01 June 2005.

E. Data archiving

The NASA LaRC group has agreed to accept all ICARTT data for final archiving to a public database and to merge the data into an easily usable format.

[Figure 2. Data catalog]

F. Responsibilities of data access

A major goal of this data management plan is to facilitate the free exchange of data between and among various teams of researchers. The intention of this data sharing is to broaden the interpretation of observations and to exploit complementary data collected by different research teams. While this level of access is desirable, there are clear responsibilities that come with this access. It is appropriate and expected that researchers may browse all data unfettered; however, once earnest research is pursued, it is essential that the relevant Principal Investigators be made aware that their data are being used. It is also expected that they will be offered co-authorship and the opportunity to comment on the content of manuscripts prior to submission for publication. It is imperative that Principal Investigators be consulted when suspicious data are encountered or when interpretation of data becomes dependent upon understanding the underlying technique.

It is customary for research teams to seek publication of results in an agreed upon journal as a “special issue” or “special section”. It is also possible that more than one research team may decide to pursue a common publication schedule. It is expected that researchers will honor the publication schedules agreed upon by their respective teams. Some researchers may decide to request an exception to the publication schedule. Such exceptions can include results of extremely high interest that would have immediate impact or interesting results that are peripheral to the primary goals of ICARTT. Requests for an exception to the publication schedule should be arbitrated by the primary author’s science team.

 

Part II. Data File Formats


In a study this large there are many different types of data collected. Many data sets are simply straight time series with one or a number of parameters being measured sequentially (and simultaneously) in time. However, there are some data sets that are truly multi-dimensional in that a sample will be taken by an instrument at a single point in time and a number of parameters will be measured on that sample simultaneously. An example is wind profiler data in which a 30-minute averaged sample taken at some time period will be binned into height information and at each height will be wind speed, wind direction, and temperature. Another, more extreme, example is output from 3-dimensional models. Data such as these clearly cannot be represented as a single time series. Sections 1-2 below outline the ICARTT format for all types of data, with an emphasis on standard time-series data. Sections 3-4 are specific to standard time-series data. Section 5 offers guidance for non-standard time-series data.

Though adapted from the NASA Ames data format, the ICARTT data format will have no restriction on the number of characters per line or on the number of characters per record. The file name will be limited to 127 characters in length. These features (and others; see below) will require modification to some existing file reading and checking software (such as Ames). The NASA LaRC group has agreed to create web-based checking software, accessible to the ICARTT science team, to accommodate these changes.

1. Requirements for data files

A. Time information

The philosophy here is that the data in the files must possess at least the minimum amount of accompanying information to uniquely identify each data point - this generally means time and location information. Moreover, the format must be able to handle all forms of timing configurations, including data that are irregularly spaced in time. For example, there are instruments that integrate a measurement over time until a certain signal-to-noise threshold has been reached. The integration period varies according to atmospheric conditions so that the resulting data have both variable integration times and are irregularly spaced in time. There is absolutely no way to represent these data with a single time point. The most efficient way of representing these data is with two time points: starting time and stopping time. This is the first requirement for the data file structure.

In those cases when many data sets are used or merged, a convenient single time reference point is the mid-point of the sampling period(s). Generally, this is the average of the start time and the stop time, but this is not always the case. As an example, there are measurements that integrate over a certain time but because of sample airflow changes (e.g., changing altitude during aircraft sampling) the sampling volume mid-point does not correspond to the sampling time mid-point. In this case, the actual time mid-point must be specified by the investigator. Thus in order to encompass all of the possible diversity in sampling, three times need to be specified for each data point: start time, mid-point time, and stop time.

There are different views on what format should be used to represent time. In current measurement practice it is typical to find 1 second sampling intervals regardless of the platform (e.g., aircraft). Measurements at 1 Hz generally capture most of the important variability in air quality data, and, while longer intervals are commonly reported, shorter intervals are not. The Ames format shows time as seconds from the start of the day defined in the file header and in the file name (see below). The ICARTT file format will adopt this structure. However, recognizing the need in some cases for >1 Hz sampling, the ICARTT format will allow data in fractional seconds though the default will be integer seconds. This does not mean that data MUST be shown in 1 second increments; whether it be 1 minute or some other increment, this decision is left to the principal investigator. In all cases, though, all times are explicitly accounted for in the period (day) specified by the header and file name. If no data are available for any time period, then that is represented by the missing data identifier. The one exception to this is when no sampling takes place from the start of a day to some point during the day. This might occur because of, for example, aircraft take-off. All times are in UTC.
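
As a rough illustration of this time convention, the sketch below converts a UTC timestamp to seconds from the start of the file's UTC day. The function name `seconds_since_utc_midnight` is hypothetical and is not part of the ICARTT specification; the dates and times are taken from the examples later in this document.

```python
from datetime import date, datetime, timezone


def seconds_since_utc_midnight(moment: datetime, file_date: date) -> float:
    """Seconds elapsed between 0000 UTC on file_date and moment."""
    if moment.tzinfo is None:
        # Naive datetimes are ambiguous; the format requires UTC.
        raise ValueError("moment must be timezone-aware")
    midnight = datetime(file_date.year, file_date.month, file_date.day,
                        tzinfo=timezone.utc)
    return (moment - midnight).total_seconds()


file_day = date(2004, 8, 30)
noon = datetime(2004, 8, 30, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_since_utc_midnight(noon, file_day))  # 43200.0
```

Fractional seconds fall out naturally because the result is a float; a provider reporting integer seconds could wrap the result in `int()`.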

B. Location information

The specification of this information is straightforward. All data points in the files need to have latitude (lat), longitude (lon), and either altitude (for aircraft, lidars, sondes) or elevation (for surface data). The lat/lon system used here will be strictly numeric: decimal degrees (to five decimal places) with south latitudes and west longitudes represented as negative numbers (i.e., no N, E, W, S identifiers). Elevations will be in integral meters. Altitudes must be explicitly defined since many types of altitude measurements are in use (pressure alt; GPS alt; geopotential alt; etc.).

Because this information is required to uniquely identify any given data point, ideally it is included in the file with those data. However, it is sometimes advantageous to have location information consolidated and uniquely identified in a separate file (e.g., an aircraft parameter file). If this is done, then information about that parameter file must be included in the data file header information. This will be specified below.
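
The location conventions above (signed decimal degrees to five places, integral meters) could be applied along the following lines. This is only a sketch: `signed_degrees` and `format_location` are hypothetical helper names, and rounding the elevation to the nearest meter is an assumption, since the text only says elevations are integral.

```python
def signed_degrees(value: float, hemisphere: str) -> float:
    """Convert a magnitude plus N/S/E/W letter to a signed decimal degree."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    magnitude = abs(value)
    # South latitudes and west longitudes are negative.
    return -magnitude if hemisphere in ("S", "W") else magnitude


def format_location(lat_deg: float, lon_deg: float, elevation_m: float) -> str:
    """Format lat, lon (five decimals) and elevation (integral meters)."""
    # Assumption: elevation is rounded to the nearest whole meter.
    return f"{lat_deg:.5f} {lon_deg:.5f} {round(elevation_m):d}"


lon = signed_degrees(71.01234, "W")
print(format_location(41.01234, lon, 15.2))  # 41.01234 -71.01234 15
```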

C. Measurements

In general, each file contains data for one parameter or species, with the values in each record separated by a space. Multiple variables per file are allowed only if all were measured on exactly the same time base, as, for example, by the same instrument (e.g., GC/MS; PILS/IC). The numeric representation of a variable will be defined by the units in which it was measured. The ICARTT format contains the NASA Ames provision for a data scaling factor. However, we recommend that all scale factors be 1 unless it is grossly inconvenient to do so. If very large or very small numbers are required, then they can be represented with exponential notation, as in 1.01e9 or 5.23e-6.

i. Uncertainties

Every data point should have a corresponding total uncertainty (or error) which has the same units as the measurement. This uncertainty in the measurement is indicated as a TOTAL uncertainty to include all systematic and random effects. Ideally, these uncertainties are tabulated as the next (and separate) column after the data column in the file. However, this requirement can be relaxed if the uncertainty data can be reproduced by information in the header of the file. For example, if all uncertainties can be calculated by a function that has any given data point as input, then the formula can be included as header information.
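
When the uncertainty is given as a formula in the header rather than as a column, a user would evaluate it per data point. The sketch below assumes a formula of the form "relative fraction of the value plus a constant", which is how the UNCERTAINTY line in Example 2 of this document (NO: +/-(5%+0.005 ppbv)) appears to read; the function name `total_uncertainty` is hypothetical.

```python
def total_uncertainty(value: float, relative: float, absolute: float) -> float:
    """Uncertainty in the same units as value: relative part plus constant.

    Assumes the header formula means (relative * |value|) + absolute.
    """
    return abs(value) * relative + absolute


# NO at 0.555 ppbv with +/-(5% + 0.005 ppbv)
print(round(total_uncertainty(0.555, 0.05, 0.005), 3))  # 0.033
```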

ii. Missing data

Missing data are just that - missing. It makes no difference what the reason, whether it be a calibration period, a system crash, instrument maintenance, etc. Missing data are represented by negative numbers large enough to never be construed as actual data. For the ICARTT file format the value is -9999. Note that this is different from the Ames data exchange format in that Ames requires missing data flags to be numbers larger than any “good” data value. This somewhat arbitrary standard breaks down for measurements in urban areas where “good” data values can exceed reasonable expectation. For example, it is not uncommon in these areas for NO, NO2, or CO data to be in the parts per million range which are very large numbers for the standard units of measure (ppbv) for these species. On the other hand, there is no conceivable situation in which large negative numbers (e.g., -9999) can be construed as “good” data. Therefore, we specify for the ICARTT format that the primary missing data flag be -9999.

On the other hand, data below (or above) the limit of detection (LOD) are not actually “missing” but do convey some information. While some investigators choose to tabulate all of their quantifiable data, including negative values, others choose not to show these data points, but rather indicate the value is less than (or greater than) some quantifiable limit. These conditions will be indicated by two additional missing data flags that are substituted for the missing data values. The flag for data values GREATER THAN some UPPER LOD (ULOD) will be -7777, and the flag for data values LESS THAN some LOWER LOD (LLOD) will be -8888. These flags (if used) and the values of the upper and lower LOD are documented at specific locations in the file header (see below).
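
A reader of ICARTT files has to separate these three flags from real values before doing any arithmetic. The following sketch shows one way to do that; the constant and function names are hypothetical, and only the flag values themselves come from this document.

```python
MISSING_FLAG = -9999  # no measurement for any reason
ULOD_FLAG = -7777     # value greater than the upper limit of detection
LLOD_FLAG = -8888     # value less than the lower limit of detection


def classify_value(value: float) -> str:
    """Return 'missing', 'above_ulod', 'below_llod', or 'valid'."""
    if value == MISSING_FLAG:
        return "missing"
    if value == ULOD_FLAG:
        return "above_ulod"
    if value == LLOD_FLAG:
        return "below_llod"
    # Small negative numbers are legitimate data and stay valid.
    return "valid"


column = [0.555, -9999, -8888, -0.003, 10.333]
valid = [v for v in column if classify_value(v) == "valid"]
print(valid)  # [0.555, -0.003, 10.333]
```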

2. Filenames

Features of different file naming conventions (including Ames) have been adapted here. File names for the ICARTT data format, limited to 127 characters or fewer, are defined as follows:

dataID_locationID_YYYYMMDD[hh[mm[ss]]]_R#[_L#][_V#][_comments].extension,

where the only allowed characters are: a-zA-Z0-9_.- (that is, upper case and lower case alphanumeric, underscore, period, and hyphen). All fields not in square brackets are required and are described as follows:

dataID: short description of measured parameter/species, instrument, or model (e.g., O3; RH; VOC; PTRMS; MM5)
locationID: short description of site; station; platform; laboratory or institute
YYYY: four-digit year
MM: two-digit month
DD: two-digit day
hh: optional two-digit hour
mm: optional two-digit minute
ss: optional two-digit second
R: revision number of data
L: optional launch number
V: optional volume number
comments: optional additional information
extension: file type descriptor

The underscore is used ONLY to separate the different fields of the file name; it has special significance for file-checking software. To separate characters within a field for readability, use lower and upper case letters. The use of the hyphen, though allowed, is discouraged since this character in file names may cause problems with some older operating systems and network software. The square brackets “[ ]” enclose optional parameters but are not shown in the file name. Dates and times in file names are always UTC. The date and time in the file name give the date/time at which the data within the file begin (data files), or date/time at which the image applies (image files). For aircraft and sonde data files, the date always refers to the UTC date of launch.

The dataID is a short string of characters used to identify the parameters in the file. For files that contain one or two variables those variable names can be used in the file name. For files in which many variables are represented, it may be best to indicate in the file name a class of compounds (e.g., VOC; PhotolysisRates) or an abbreviation of the instrument used to make the measurements (e.g., PTRMS).

The locationID is used to identify the measurement platform, site, station, or source (laboratory or institute) of the information within a data file. Some examples could be: DC8, BAE146, RHBrown, GOME (satellite), IoS (Appledore Island site), ChebPt (Chebogue Point site), and others. It may be useful to have a standardized set of abbreviations used for the ICARTT study. These should be decided upon by the DM team.

The R parameter will not be optional in the ICARTT study. We must specify a data revision code that will track changes in data and document why those changes occurred. For this we specify a revision number counter “_R#” where the underscore is a required element to separate the fields (this is needed for certain file checking software). The revision number "#" must match the revision number specified in the Normal Comments section of the file header (see below).

The optional parameters “_L#” and “_V#” may be needed in some special cases. If the contents of the file pertain to a second or third aircraft launch on the indicated date, then a launch counter "_L#" (i.e. L2, L3, etc.) must appear after the "R" identifier but before a volume counter, if present (see below). Launch number one is implied when "_L#" is omitted from the file name. If a data file is one volume of a multi-volume dataset, then a volume counter "_V#" (i.e. V1, V2, V3, etc.), must appear after the "R" parameter (and the “L” parameter, if present) separated by an underscore from the rest of the identifier. The volume number (the "#" in "V#") must match the volume number in the file header. When "_V#" is missing from the file name a one-volume dataset is implied.

The optional comments parameter is for additional information required by the PI (or Data Manager) to identify the file contents but that does not fit into the other fields of the file name. This should be used sparingly.

The file extension is a 2-4 character parameter that identifies the file type. The principal file type for the ICARTT study will be “.ict” and describes the time series data in a file formatted to ICARTT standards. Other file types may include:

“.txt” text file; not ICARTT formatted
“.jpg” image file
“.cdf” NetCDF file

These allowable file extensions will need to be defined by the team of Data Managers.
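
The naming rules in this section lend themselves to a regular expression. The sketch below is one possible reading, not the official checking software: the names `FILENAME_RE` and `parse_icartt_filename` are hypothetical, periods are assumed not to occur inside fields, and the revision is taken to be either digits or a single capital letter (the field-data case described in Part I).

```python
import re

FILENAME_RE = re.compile(
    r"^(?P<data_id>[A-Za-z0-9-]+)"
    r"_(?P<location_id>[A-Za-z0-9-]+)"
    r"_(?P<date>\d{8})(?P<time>(?:\d{2}){0,3})"  # YYYYMMDD[hh[mm[ss]]]
    r"_R(?P<revision>\d+|[A-Z])"                 # number, or letter for field data
    r"(?:_L(?P<launch>\d+))?"
    r"(?:_V(?P<volume>\d+))?"
    r"(?:_(?P<comments>[A-Za-z0-9-]+))?"
    r"\.(?P<extension>[A-Za-z0-9]{2,4})$"
)


def parse_icartt_filename(name: str) -> dict:
    """Split an ICARTT-style file name into its fields."""
    if len(name) > 127:
        raise ValueError("file name exceeds 127 characters")
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid ICARTT-style file name: {name!r}")
    fields = match.groupdict()
    # Launch one and a one-volume dataset are implied when omitted.
    fields["launch"] = int(fields["launch"] or 1)
    fields["volume"] = int(fields["volume"] or 1)
    return fields


info = parse_icartt_filename("NOX_RHBrown_20040830_R0.ict")
print(info["data_id"], info["location_id"], info["revision"])  # NOX RHBrown 0
```

Because the underscore only ever separates fields, none of the field patterns accept it, which keeps the match unambiguous.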

3. Recommended File Format Specification for ICARTT Time-series Data Files

 

A. Structure

We recommend that, whenever possible, ICARTT time series data files conform to the following Ames file format:

FFI = 1001; one real, unbounded independent variable; primary variables are real; no auxiliary variables; independent and primary variables are recorded in the same record.

What this means in English is that there is one time (independent) variable and that all other data depend on that variable. Any number of other variables can be defined, but they all depend on the one. In the typical case the fundamental variable is the start time of the measurement and others can be defined as in the following example, where the variable names refer to columns in the data file:

start time
stop time
mid-point time
latitude
longitude
altitude/elevation
data variable1
variable1 uncertainty
data variable2
variable2 uncertainty
<etc.>

This format accounts for most time series data measured anytime, over any arbitrary integration period, and at any place on or above the planet (within reason for air quality data). Obviously, the format can be condensed. For example, if measurements are reported as 1 second intervals, then stop time and mid-point time need not be included as data columns provided all time intervals in the measurement period are accounted for by inclusion of the missing data flag(s). Similarly, if the measurements are made at a fixed location then latitude, longitude, and elevation are fixed and these data would be included in the header information (see below). As pointed out above, if the location data (latitude, etc.) are included in a separate file, then these columns can be excluded provided the location data file name is included in the header information for the data file. Similarly, if uncertainty is defined as some function that is the same for all data points then that function can be included in the header information and the user can then calculate uncertainties. Variations in the way the format is used, based on the needs of the data provider, are accounted for in the file header information. As an example, some PIs may wish to report the END time of the measurement period as the independent variable. The ICARTT format allows this provided that the time variable is clearly labeled as such (e.g., End_UTC) and that additional information describing this (non-standard) situation be provided in the Normal Comments section of the file header. If the data periods are not of a constant duration, then the start time and mid-point time of each period must be included as an additional column and the Data Interval value set to 0 (see below). The header specifications are described below.

B. File header information

The basic structure of the ICARTT file header is similar to the Ames exchange format. For the ICARTT study we recommend some additional information that will be included in the comments sections. The most general header is shown below as an example; more specialized headers will be described as modifications to the general form. Different items of information in the same record (same line) are shown below as separated by a semicolon – in the actual file they are separated by a single space.

  • Number of lines in header; file format index: most files use 1001
  • PI name: last name, first name/initial
  • Organization/affiliation of PI
  • Data source description: e.g., instrument name; platform name; model name, etc.
  • Mission name: this will be ICARTT_ followed by your project; e.g., NEAQS, INTEX, etc.
  • File volume number; number of file volumes: These integer values are used when the data require more than one file per day. For data that require only one file these values are 1 1.
  • UTC date when data begin; UTC date of data reduction or revision.
  • Data Interval: This value describes the time spacing (in seconds) between consecutive data records. It is the (constant) interval between values of the independent variable. For 1 Hz data the data interval value is 1; for 1 minute data the value is 60; for 2 Hz data the value is 0.5. If the data records include start AND stop times, then the data interval value is 0.
  • Description or name of independent variable: This will be the name chosen for the start time or in some cases the mid-point time or end time of the data stream. It always refers to the number of seconds from the UTC start of the day.
  • Number of variables: Integer value showing the number of dependent variables (the total number of columns of data will be this value plus one).
  • Scale factors: This will be 1 for all variables, except where grossly inconvenient.
  • Missing data indicator: This will be -9999 for any missing data condition, except for the main time variable which is never missing.
  • Variable names: Name or description of data in that column. When possible, we recommend that the variable name include units.
  • Number of SPECIAL comment lines: Integer value indicating the number of lines of special comments, NOT including this line.
  • Special comments: Notes of problems or special circumstances unique to this file. An example would be comments/problems associated with a particular flight.
  • Number of Normal comments (i.e., number of additional lines of SUPPORTING information): Integer value indicating the number of lines of additional information, NOT including this line.
  • Normal comments (SUPPORTING information): This is the place for investigators to more completely describe the data and measurement parameters. The supporting information structure is described below as a list of key word: value pairs. Specifically include here information on the platform used, the geo-location of data, measurement technique, and data revision comments. Note the non-optional information regarding uncertainty, the upper limit of detection (ULOD) and the lower limit of detection (LLOD) for each measured variable. The ULOD and LLOD are the values, in the same units as the measurements, that correspond to the flags -7777 and -8888 within the data, respectively. The last line of this section should contain all the variable names on one line. The key words in this section are written in BOLD for clarity below. The actual file will not have special formatting codes. The key word must be typed followed by a colon then followed by your text (information). When more than one value (or information) is to be written on the same line, separate the values using a semicolon. For lines where information is not needed or applicable, simply enter N/A. The scanning program will look for these key words (case insensitive) when the file is submitted.

    PI_CONTACT_INFO: Phone number, mailing address, email address and/or fax number.
    PLATFORM: Platform or site information.
    LOCATION: including lat/lon/elev if applicable.
    ASSOCIATED_DATA: File names with associated data: location data, aircraft parameters, ship data, etc.
    INSTRUMENT_INFO: Instrument description, sampling technique and peculiarities, literature references, etc.
    DATA_INFO: Units and other information regarding data manipulation.
    UNCERTAINTY: Uncertainty information, whether a constant value or function, if the uncertainty is not given as separate variables.
    ULOD_FLAG: -7777 (Upper LOD flag, always -7's).
    ULOD_VALUE: Upper LOD value (or function) corresponding to the -7777's flag in the data records.
    LLOD_FLAG: -8888 (Lower LOD flag, always -8's).
    LLOD_VALUE: Lower LOD value (or function) corresponding to the -8888's flag in the data records.
    DM_CONTACT_INFO: Name, affiliation, phone number, mailing address, email address and/or fax number.
    PROJECT_INFO: Study start & stop dates, web links, etc.
    STIPULATIONS_ON_USE: (self explanatory)
    OTHER_COMMENTS: Any other relevant information.
    REVISION: R# (see filenames discussion above);
    R#: comments specific to this data revision. The revision numbers and the associated comments are cumulative in the data file. This is required in order to track the changes that have occurred to the data over time. Prepend the information to this section so that the latest revision number and comments always start this part of the header information. The latest revision date should correspond to the revision date on Line 7 of the main file header. Note that FIELD data files have revision LETTERS, not numbers.
    Indep_Var VarName_1 VarName_2 VarName_3 … VarName_n

For accounting purposes, the following formulas may be useful in identifying the number of lines of the header. If all key-value pairs in the normal comments section such as PI_CONTACT_INFO are on one line each, the number of normal comments is:
17 + (number of lines in the R# section)

The formula for the total number of lines in the header is:
12 + ( # dependent variables, given in line 10) + (1 + # special comments) + (1 + # normal comments)
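
The two formulas above are easy to check mechanically. The sketch below simply encodes them; the function names are hypothetical. Applied to the two examples in Section C, they reproduce the header line counts stated on the first line of each example header (41 and 36).

```python
def normal_comment_count(revision_lines: int) -> int:
    """Normal comment lines when each key word occupies exactly one line."""
    return 17 + revision_lines


def header_line_count(n_dependent: int, n_special: int, n_normal: int) -> int:
    """Total header lines from the counts stored in the header itself."""
    return 12 + n_dependent + (1 + n_special) + (1 + n_normal)


# Example 1: 9 dependent variables, 0 special comments, 1 revision line
print(header_line_count(9, 0, normal_comment_count(1)))  # 41
# Example 2: 2 dependent variables, 1 special comment, 2 revision lines
print(header_line_count(2, 1, normal_comment_count(2)))  # 36
```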

C. Examples

Below are three examples of (similar) time series data using different forms of header information. Be aware that the automatic word-wrap feature in word processing programs gives the appearance that there are more lines of text than are really there. In these examples any continuation of lines from directly above has been indented for clarity.


EXAMPLE 1. All required data columns are shown explicitly.
File name: NOX_RHBrown_20040830_R0.ict

41 1001
Williams, Eric
Aeronomy Laboratory/NOAA
Nitric oxide and nitrogen dioxide mixing ratios from R/V Ronald H. Brown
ICARTT_NEAQS
1 1
2004 08 30 2004 12 25
0
Start_UTC (number of seconds from 0000 UTC)
9
1 1 1 1 1 1 1 1 1
-9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999
Stop_UTC
Mid_UTC
DLat
DLon
Elev
NO_ppbv
NO_1sig
NO2_ppbv
NO2_1sig
0
18
PI_CONTACT_INFO: Address: 325 Broadway, Boulder, CO 80305; email: eric@al.noaa.gov; 303-497-3226
PLATFORM: NOAA research vessel Ronald H. Brown
LOCATION: Latitude, longitude and elevation data is included in the data records
ASSOCIATED_DATA: N/A
INSTRUMENT_INFO: NO: chemiluminescence; NO2: narrow-band photolysis/chemiluminescence
DATA_INFO: All data with the exception of the location data is in ppbv. All one-minute averages contain at least 35 seconds of data, otherwise missing.
UNCERTAINTY: included in the data records as variables with a _1sig suffix
ULOD_FLAG: -7777
ULOD_VALUE: N/A
LLOD_FLAG: -8888
LLOD_VALUE: N/A; N/A; N/A; N/A; N/A; 0.005; N/A; 0.025; N/A
DM_CONTACT_INFO: N/A
PROJECT_INFO: ICARTT study; 1 July-15 August 2004; Gulf of Maine and North Atlantic Ocean
STIPULATIONS_ON_USE: Use of these data requires PRIOR OK from the PI
OTHER_COMMENTS: N/A
REVISION: R0
R0: No comments for this revision.
Start_UTC Stop_UTC Mid_UTC DLat DLon Elev NO_ppbv NO_1sig NO2_ppbv NO2_1sig
43200 43259 43229 41.00000 -71.00000 15 0.555 0.033 2.220 0.291
43260 43319 43289 41.01234 -71.01234 15 10.333 0.522 31.000 0.375
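A file laid out like Example 1 might be read along the following lines. This is only a sketch: it assumes values are separated by whitespace as printed here, that the first number on line 1 is the header line count, and that the last header line holds the column names. The function name read_ffi1001 and the abbreviated demo header are hypothetical and do not form a valid ICARTT header.

```python
def read_ffi1001(text):
    """Split an FFI 1001 style file into column names and numeric rows."""
    lines = text.splitlines()
    first = lines[0].split()
    n_header, ffi = int(first[0]), int(first[1])
    if ffi != 1001:
        raise ValueError(f"expected FFI 1001, got {ffi}")
    # The last header line carries the variable short names.
    names = lines[n_header - 1].split()
    # Everything after the header is data; skip blank lines.
    rows = [[float(tok) for tok in ln.split()]
            for ln in lines[n_header:] if ln.strip()]
    return names, rows


# Toy input with a shortened 4-line header (illustration only).
demo = "\n".join([
    "4 1001",
    "placeholder header line",
    "placeholder header line",
    "Start_UTC NO_ppbv NO2_ppbv",
    "43200 0.555 2.220",
    "43260 10.333 31.000",
])
names, rows = read_ffi1001(demo)
print(names)
print(rows[1])
```

Using the header count from line 1, rather than searching for the column-name line, keeps the reader independent of the comment contents.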


EXAMPLE 2. This example is similar to Example 1. The differences are the elimination of the stop time, mid time, lat, lon, elev, and uncertainty variables; the inclusion of a special comment; the inclusion of DM info; and a second revision comment.
File name: NOX_RHBrown_20040830_R1.ict


36 1001
Williams, Eric
Aeronomy Laboratory/NOAA
Nitric oxide and nitrogen dioxide mixing ratios from R/V Ronald H. Brown
ICARTT_NEAQS
1 1
2004 08 30 2004 12 25
60
Start_UTC
2
1 1
-9999 -9999
NO_ppbv
NO2_ppbv
1
Lightning struck the ship at ~ 14:00:23 UTC, or at 50423 seconds after midnight UTC. The 43 minute section of missing data from 14:00 to 14:43 (50400 through 52980 of Start_UTC) reflects the period when the instrument was checked out and the computer rebooted.
19
PI_CONTACT_INFO: Address: 325 Broadway, Boulder, CO 80305; email: eric@al.noaa.gov; 303-497-3226
PLATFORM: NOAA research vessel Ronald H. Brown; sampling through high-flow manifold (res. time ~ 1 s) at 15 m above waterline
LOCATION: Ship location data in file ShipData_RHBrown_20040830_R0.ict
ASSOCIATED_DATA: ShipData_RHBrown_20040830_R0.ict
INSTRUMENT_INFO: NO: chemiluminescence; NO2: narrow-band photolysis/chemiluminescence, See Williams et al., BigScience, 42, p. 50-51, 2001
DATA_INFO: Units are ppbv. All one-minute averages contain at least 35 seconds of data, otherwise missing. Midpoint time is 29 seconds after the minute. One second data are available, contact the PI.
UNCERTAINTY: NO: +/-(5%+0.005 ppbv); NO2: +/-(12%+0.025 ppbv)
ULOD_FLAG: -7777
ULOD_VALUE: N/A
LLOD_FLAG: -8888
LLOD_VALUE: 0.005; 0.025
DM_CONTACT_INFO: Donna Sueper; NOAA/AL; dsueper@al.noaa.gov. Data manager for data within ShipData_RHBrown_20040830_R0.ict is Jim Johnson with PMEL, James.Q.Johnson@noaa.gov
PROJECT_INFO: ICARTT study; 1 July-15 August 2004; Gulf of Maine and North Atlantic Ocean
STIPULATIONS_ON_USE: Use of these data requires PRIOR OK from the PI
OTHER_COMMENTS: N/A
REVISION: R1; R0
R1: NO2 data have been increased by 13% based on calibration standard recheck.
R0: No comments for this revision.
Start_UTC NO_ppbv NO2_ppbv
43200 0.555 2.509
43260 10.333 35.030
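The time values in these examples are seconds after midnight UTC. The conversion to and from clock time can be sketched as follows; the helper names are illustrative only.

```python
def utc_to_seconds(hours, minutes, seconds=0):
    """Clock time (UTC) to seconds after 0000 UTC."""
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_utc(total_seconds):
    """Seconds after 0000 UTC to an HH:MM:SS string."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# The lightning strike noted in the special comment of Example 2.
print(utc_to_seconds(14, 0, 23))   # 50423
# The first data record of Example 2.
print(seconds_to_utc(43200))       # 12:00:00
```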


EXAMPLE 3. This example is similar to Examples 1 and 2. Here the platform is a ground site with a locationID of ChebPt.
File name: NOX_ChebPt_20040830_R2.ict


36 1001
Williams, Eric
Aeronomy Laboratory/NOAA
Nitric oxide and nitrogen dioxide mixing ratios from Chebogue Point, Nova Scotia
ICARTT_NEAQS
1 1
2004 08 30 2004 12 25
60
Start_UTC
2
1 1
-9999 -9999
NO_ppbv
NO2_ppbv
0
20
PI_CONTACT_INFO: Address: 325 Broadway, Boulder, CO 80305; email: eric@al.noaa.gov; 303-497-3226
PLATFORM: 10 m tower at the Chebogue Point ICARTT research site.
LOCATION: Chebogue Point, Nova Scotia, Canada; lat: 43.45678; lon: -66.00000; elev: 30 m.
ASSOCIATED_DATA: Met_ChebPt_20040830_R2.ict
INSTRUMENT_INFO: NO: chemiluminescence; NO2: narrow-band photolysis/chemiluminescence.
DATA_INFO: All data is in units of ppbv.
UNCERTAINTY: NO: +/-(5%+0.005 ppbv); NO2: +/-(12%+0.025 ppbv)
ULOD_FLAG: -7777
ULOD_VALUE: N/A
LLOD_FLAG: -8888
LLOD_VALUE: 0.005; 0.025
DM_CONTACT_INFO: Donna Sueper; NOAA/AL; dsueper@al.noaa.gov
PROJECT_INFO: ICARTT study; 1 July-15 August 2004
STIPULATIONS_ON_USE: Use of these data requires PRIOR OK from the PI
OTHER_COMMENTS: N/A
REVISION: R2; R1; R0
R2: NO data have been decreased by 13% based on operator ineptitude.
R1: NO2 data have been increased by 13% based on calibration standard recheck.
R0: No comments for this revision.
Start_UTC NO_ppbv NO2_ppbv
43200 0.483 2.509
43260 8.990 35.030
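When reading files such as these, the missing-data value and the limit-of-detection flags need to be separated from real measurements before any arithmetic is done. The sketch below uses the flag values declared in the example headers (-9999, -7777, -8888); the function names and the choice to map flagged values to NaN are assumptions made for illustration.

```python
import math

MISSING = -9999.0     # missing data indicator from header line 12
ULOD_FLAG = -7777.0   # value above the upper limit of detection
LLOD_FLAG = -8888.0   # value below the lower limit of detection


def classify(value):
    """Label a data value as valid, missing, or outside detection limits."""
    if value == MISSING:
        return "missing"
    if value == ULOD_FLAG:
        return "above_ulod"
    if value == LLOD_FLAG:
        return "below_llod"
    return "valid"


def to_nan(value):
    """Replace any flagged value with NaN so it cannot enter an average."""
    return value if classify(value) == "valid" else math.nan


print(classify(-8888.0))   # below_llod
print(to_nan(0.483))       # 0.483
```

Depending on the analysis, values flagged as below the LLOD might instead be replaced using the LLOD_VALUE entries from the header rather than dropped.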


4. Recommended File Format Specification for ICARTT Multi-dimensional Data Files

See also the "Amended FFI 2310" document for more details on this file type.

A. Structure

We recommend the standard Ames file format, FFI=2310, for exchange of most multidimensional data files associated with the ICARTT study. The FFI descriptor is:

FFI 2310; two real independent variables, one unbounded and one bounded with its number of constant increment values, base value, and increment defined in the auxiliary variable list; primary variables are real; auxiliary variables are real;

For a more complete description, please see the Ames file format document at the web site listed in Part I. The following is based on an example in that document. The text in braces { } (italicized in the original document) indicates comments that are not in the file but were added here for clarity. The normal comments section mimics that in the Ames 1001 format described above.


EXAMPLE
File name: LidarO3_WP3_20040830_R0.ict

46 2310
Williams, Eric
NOAA Aeronomy Laboratory
Ozone number density profile from WP3 aircraft lidar
ICARTT_ITCT
1 1
2004 08 30 2009 09 04
60.0
Geometric altitude of observation (m)
Elapsed time in UT seconds from 0 hours on day given by date
1 {Number of PRIMARY variables}
1.0e9
-9999
O3 number density
9 {Number of AUXILIARY variables}
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
-9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999
number of altitudes at current time mark
geometric altitude (m) at which data begin
altitude increment (m)
geometric altitude of aircraft (m)
UT hour
UT minutes
UT seconds
aircraft longitude (deg)
aircraft latitude (deg)
0
18
PI_CONTACT_INFO: Address: 325 Broadway, Boulder, CO 80305; email: eric@al.noaa.gov; 303-497-3226
PLATFORM: NOAA WP3
LOCATION: Lat, Lon, and Alt included in the data records
ASSOCIATED_DATA: N/A
INSTRUMENT_INFO: Differential absorption lidar. See Williams et al., BigScience, 42, p. 50-51, 2001
DATA_INFO: The units are number density (#/cc). The vertical averaging interval is 975 m at 1-7 km above the aircraft and 2025 m > 7 km above the aircraft. Horizontal averaging interval: 60 km.
UNCERTAINTY: N/A
ULOD_FLAG: -7777
ULOD_VALUE: N/A
LLOD_FLAG: -8888
LLOD_VALUE: N/A
DM_CONTACT_INFO: N/A
PROJECT_INFO: ICARTT study; 1 July-15 August 2004
STIPULATIONS_ON_USE: Use of these data requires PRIOR OK from the PI
OTHER_COMMENTS: N/A
REVISION: R0
R0: No comments for this revision.
UT_TIME Num_altitudes geo_alt_begin alt_increment geo_alt_aircraft UT_hour UT_min UT_sec Lon_aircraft Lat_aircraft Array_O3_NumDensity[]
30300 26 12819 75 10389 8 25 35 -133.24 -9.45
1340 1519 1660 1779 1868 1939 1973 1992 1989 1955 1934 1897 1817 1721 1619 1514 1434 1343 1258 1203 1140 1088 1037 956 892 878
30360 22 12819 75 10383 8 26 0 -133.22 -9.93
1351 1523 1658 1774 1860 1930 1962 1974 1966 1932 1909 1877 1803 1706 1600 1493 1407 1310 -9999 -9999 1094 1045


Note that this file uses a scale factor (1e9) for the number density data since it would be very cumbersome to add the exponential notation to every value. Also, this example was adapted from the NASA document and did not have uncertainty or flag values associated with the data.
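The record layout and scale factor of the example above can be illustrated with a short reader. This is a sketch under stated assumptions: each record is taken to be one line holding the time and the nine auxiliary values, followed by one line holding all primary values (real files may wrap long lines), and the function name parse_2310_records is hypothetical. The demo record is shortened from the example to three altitudes.

```python
import math


def parse_2310_records(data_lines, scale=1.0e9, missing=-9999.0):
    """Return (time, altitudes, scaled values) for each two-line record."""
    lines = [ln for ln in data_lines if ln.strip()]
    records = []
    for i in range(0, len(lines), 2):
        head = [float(tok) for tok in lines[i].split()]
        time_s, aux = head[0], head[1:]
        # First three auxiliary variables: count, base altitude, increment.
        n_alt, alt_begin, alt_step = int(aux[0]), aux[1], aux[2]
        raw = [float(tok) for tok in lines[i + 1].split()]
        if len(raw) != n_alt:
            raise ValueError(f"expected {n_alt} values, found {len(raw)}")
        altitudes = [alt_begin + k * alt_step for k in range(n_alt)]
        # Apply the scale factor to real values; keep missing data as NaN.
        values = [math.nan if v == missing else v * scale for v in raw]
        records.append((time_s, altitudes, values))
    return records


demo = [
    "30360 3 12819 75 10383 8 26 0 -133.22 -9.93",
    "1351 -9999 1658",
]
time_s, altitudes, values = parse_2310_records(demo)[0]
print(altitudes)   # [12819.0, 12894.0, 12969.0]
print(values[0])   # 1351000000000.0
```

The missing-data check is made before scaling, since the missing value itself is not scaled.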


5. File Formats for Other Data

Data collected during the ICARTT study for which a standard time-series format does not apply can be formatted according to standards common to the user community and agreed to by the Data Management Working Group. Many modeling data sets are stored in netCDF format, which is a de facto standard for that community and will serve for the ICARTT study as well. However, the multi-dimensional data format defined above can accommodate these data sets and we leave this as an optional format. For some instruments (e.g., lidars), data are available as image files, usually in standard formats such as GIF or JPEG. Not all software for reading and writing these formats allows additional text information (e.g., as a header), so the file names for these files must be defined to include as much information as possible. If necessary, the Data Management team will work with these PIs to achieve a mutually acceptable solution.

Data acquired by sensors on satellites are not conveniently incorporated into the ICARTT format. The data protocol allows each data record to be identified with a single timestamp only if data are reported continuously with a constant time interval (e.g., 1 second). Otherwise, start and stop times must be reported, and a data interval of 0 is entered on line 8 of the file header. Satellite data are unique in that, while they are recorded on a constant data interval, significant gaps in the data may exist. These gaps may be due to cloud interference, changes in viewing mode (e.g., nadir versus limb), or other considerations. Given the sheer volume of data and the file sizes associated with satellite observations, it is not sensible to populate these data gaps with missing data values. It is also unreasonable to report start and stop times, since data are typically collected on sub-second timescales and integration time is therefore not an issue. Instead, satellite data files will report a data interval of -1 on line 8 of the file header. This signifies that each data record is identified by a single timestamp, but the actual timeline is discontinuous.
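The three cases for the data interval on line 8 of the header can be summarized in code. This is a sketch only; the function name and the returned descriptions are illustrative and not part of the specification.

```python
def describe_data_interval(interval):
    """Interpret the data interval value from line 8 of the file header."""
    if interval > 0:
        # Continuous reporting at a constant interval (e.g., 60 seconds).
        return "constant interval, single timestamp per record"
    if interval == 0:
        # Irregular sampling: each record carries start and stop times.
        return "variable interval, start and stop times required"
    if interval == -1:
        # Satellite convention: constant sampling, but with gaps.
        return "single timestamp per record, discontinuous timeline"
    raise ValueError(f"unrecognized data interval: {interval}")


for value in (60, 0, -1):
    print(value, "->", describe_data_interval(value))
```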

In general, if problems or difficulties arise, the Data Management Working Group will deal with them on a case-by-case basis. We want to ensure that all data collected during the ICARTT study are made available to all participants as quickly and as seamlessly as possible. We welcome any comments or suggestions.

 

Curator: Ali Aknan
NASA Official: Dr. Gao Chen
