Over the past century, more than 16 million ocean temperature profiles have been collected by various instruments (Figure 1). There is an increasing demand for high-quality, globally distributed in-situ data to support scientific research, governmental and non-governmental organizations, industry, fisheries, individuals, and policy makers.
Figure 1. Number of temperature casts collected by different instruments in the World Ocean Database (up to December 2021). (Image by IAP)
However, each instrument type provides data that differ in accuracy, quality, and completeness of metadata. Before these raw data can be used in scientific research, quality control (QC) is required to ensure that the data are accurate and usable. In the early years, QC was usually performed manually by experts. However, manual QC of large datasets (for example, the 16 million profiles in global ocean datasets) is not feasible because of the manpower and time it requires. How to identify outliers efficiently, quickly, and automatically in an AutoQC system therefore remains a research priority.
A recent study published in Deep-Sea Research Part I in January 2023 presents a new climatological range-based automatic quality control system for in-situ ocean temperature profiles, named CODC-QC (CAS Ocean Data Center - Quality Control system). CODC-QC includes 14 distinct quality checks to identify outliers. "We developed this new QC system to provide a quality-homogenous database, with reduced human-workload and time cost on manual QC," said TAN Zhetao, the first author from the Institute of Atmospheric Physics (IAP) at the Chinese Academy of Sciences (CAS).
Unlike many existing QC procedures, the new approach does not assume a Gaussian distribution, because the statistical distributions of oceanic variables (e.g., temperature and salinity) are typically skewed. Instead, CODC-QC uses the 0.5% and 99.5% quantiles as thresholds to define the local climatological ranges. In addition, these thresholds are time-varying, which aims to avoid erroneously excluding real data during "extreme events". These strategies are used in the local climatological range checks for both temperature and vertical temperature gradient, which account for the anisotropic nature of water properties and include an adjustment for topographic barriers between water masses.
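The quantile-based range check described above can be illustrated with a minimal sketch. This is not the CODC-QC implementation or its API. The function names, the synthetic gamma-distributed "historical" sample, and the test values are all assumptions made for illustration, and only NumPy is used. The sketch derives a climatological range from the 0.5% and 99.5% quantiles of a skewed sample and, for comparison, computes symmetric mean ± 3 standard deviation bounds of the kind a Gaussian assumption would give.

```python
import numpy as np

def climatological_range(samples, lower_q=0.5, upper_q=99.5):
    """Return (low, high) bounds taken from percentiles of historical values."""
    low, high = np.percentile(samples, [lower_q, upper_q])
    return float(low), float(high)

def range_check(values, low, high):
    """Return a boolean array that is True where a value lies outside [low, high]."""
    values = np.asarray(values, dtype=float)
    return (values < low) | (values > high)

# Synthetic, right-skewed "historical" temperatures for one hypothetical
# location and depth (illustrative only, not real observations).
rng = np.random.default_rng(0)
history = 2.0 + rng.gamma(shape=2.0, scale=1.5, size=10_000)

# Quantile-based bounds follow the shape of the skewed sample.
low, high = climatological_range(history)

# Symmetric Gaussian-style bounds, for comparison only.
g_low = history.mean() - 3 * history.std()
g_high = history.mean() + 3 * history.std()

# Flag three hypothetical measurements against the quantile-based range.
flags = range_check([-5.0, 3.0, 40.0], low, high)
print(f"quantile range: [{low:.2f}, {high:.2f}]")
print(f"mean +/- 3 std: [{g_low:.2f}, {g_high:.2f}]")
print("flags:", flags.tolist())
```

In this synthetic case the symmetric lower bound falls below every value in the sample, so it could never flag a measurement that is too cold, whereas the quantile-based bounds stay within the observed distribution. A time-varying version would recompute the bounds per time window; the source does not say how CODC-QC does this.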
In addition, the performance of the CODC-QC system was evaluated using two benchmark datasets that had undergone expert/manual QC. This evaluation demonstrated the effectiveness of the proposed scheme in removing spurious data while minimizing the percentage of good data that are mistakenly flagged (Figure 2).
Figure 2. Application of three AutoQC procedures to 3000 temperature profiles arbitrarily selected from the QuOTA dataset: (a) raw profiles; (b) profiles after removing data flagged by manual/expert QC (benchmark); (c) same as (b) but for the ICDC-QC (Integrated Climate Data Center, University of Hamburg); (d) same as (b) but for the CODC-QC. (Image by IAP)
Finally, CODC-QC was also applied to the global World Ocean Database (WOD18), which includes 16,804,361 temperature profiles from 1940 to 2021. Based on the statistics of temperature outliers, 7.97% of measurements were rejected. XBT data had the highest rejection rate (15.44%), whereas Argo profiling floats had the lowest (2.39%). "We suggest a dependency of the quality of temperature observations on the instrumentation type," said Viktor GOURETSKI, a co-author of the study and a researcher with IAP/CAS.
The paper also applies the CODC-QC system to monitoring global ocean warming (Figure 3). "We found that the application of the CODC-QC system leads to a 15% difference for linear trend of the global 0–2000m ocean heat content changes within 1991–2021 compared to the application of WOD-QC (NOAA/NCEI), implying a non-negligible source of error in ocean heat content estimate," said CHENG Lijing, the corresponding author of the study and a professor at IAP/CAS.
Figure 3. Global upper 2000 meters ocean heat content (OHC) anomaly (J/m2) based on WOD-QC and CODC-QC. Dashed lines denote the linear trends. (Image by IAP)
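As a rough illustration of how a relative difference between two linear trends can be computed, the sketch below fits a least-squares line to each of two annual time series and expresses the difference in slope as a percentage. It is not the paper's method or data. The function names and the two synthetic series are assumptions, and their slopes were chosen arbitrarily rather than to reproduce the reported result.

```python
import numpy as np

def linear_trend(years, values):
    """Least-squares slope of values against years (units of values per year)."""
    slope, _intercept = np.polyfit(years, values, deg=1)
    return float(slope)

def relative_trend_difference(years, series_a, series_b):
    """Percent difference of the trend of series_a relative to that of series_b."""
    trend_a = linear_trend(years, series_a)
    trend_b = linear_trend(years, series_b)
    return 100.0 * (trend_a - trend_b) / trend_b

years = np.arange(1991, 2022)

# Two made-up, exactly linear series in arbitrary units (not OHC estimates).
series_ref = 1.0 * (years - 1991)
series_alt = 1.1 * (years - 1991) + 0.5

diff = relative_trend_difference(years, series_alt, series_ref)
print(f"relative trend difference: {diff:.1f}%")
```

A constant offset between the two series, such as the 0.5 added here, does not change the result, because only the slopes enter the comparison.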
The quality-controlled (by CODC-QC) and bias-corrected ocean in-situ profile data of the CAS-Ocean Data Center, Global Ocean Science Database (CODC-GOSD) are now freely accessible at http://www.ocean.iap.ac.cn/ and https://www.casodc.com/data/. In addition, CODC-QC is freely available on GitHub (https://github.com/zqtzt/CODCQC) as an open-source Python package under the Apache-2.0 License.
This study was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant no. XDB42040402), the open fund of the State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, MNR, China (Grant no. QNHX2133), the National Natural Science Foundation of China (Grant nos. 42122046 and 42076202), and the Program of Oceanographic Data Center, Chinese Academy of Sciences (CASWX2022SDC-XK11). The study also acknowledges IQuOD (International Quality-Controlled Ocean Database) for sharing some QC codes, which were used for the QC system inter-comparisons.
Reference:
Tan Z, Cheng L*, Gouretski V, Zhang B, Wang Y, Li F, Liu Z, and Zhu J., 2023: A new automatic quality control system for ocean profile observations and impact on ocean warming estimate. Deep Sea Research Part I: Oceanographic Research Papers, 194, 103961, https://doi.org/10.1016/j.dsr.2022.103961.
Media contact: Ms. LIN Zheng, jennylin@mail.iap.ac.cn