Identification and inter-comparison of appropriate long-term precipitation datasets using decision tree model and statistical matrix over China

Muhammad Abrar Faiz, Yongqiang Zhang, Faisal Baig, Dariusz Wrzesiński, Farah Naz

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)


Reliable long-term precipitation estimates are vital for climatology and hydrometeorology applications. China's varied climatic zones and dense rain gauge network (more than 800 gauges) make it a suitable region for evaluating the performance of long-term precipitation datasets. In this study, seven long-term precipitation datasets are tested against in situ observations at 813 grid points and at different time scales over 1981–2016. Well-known statistical indicators and a Fast-frugal tree (FFt) decision model are employed to identify the best long-term datasets. Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks Climate Data Record (PERSIANN-CDR) is the only dataset that did not perform well in the study area. Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE), Global Precipitation Climatology Centre (GPCC), and Climate Prediction Center (CPC-Global) estimates are comparable with in situ observations. Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) and National Centers for Environmental Prediction (NCEP2) estimates overestimate the precipitation extremes in the region. Annual precipitation differs by 100–250 mm among the datasets. All seven long-term datasets underestimate the rxdays indices (the maximum precipitation amount over a defined number of days, here 1 day or 5 days) across China. The minimum range of rxdays captured by the datasets is comparable, except for PERSIANN-CDR. The maximum range calculated with PERSIANN-CDR is 55.01 (rx1day) and 129.67 (rx5day), much less than the in situ rxdays. This analysis shows that the datasets also failed to capture maximum precipitation intensity in the region. The FFt decision model results show that, based on calculated consecutive dry days, APHRODITE ranked first among the seven datasets in most climatic zones.
Overall, results indicate that data assimilation, the spatial coverage of ground stations, and interpolation techniques used to develop the datasets may limit the reliability of precipitation datasets in the study area.
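
The extreme-precipitation indices named in the abstract (rx1day, rx5day, and consecutive dry days) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `rx_days` and `consecutive_dry_days` are hypothetical, the sample values are made up, and the 1 mm dry-day threshold is an assumption taken from the common ETCCDI convention, since the abstract does not state which threshold was used.

```python
from typing import Sequence


def rx_days(daily_mm: Sequence[float], window: int) -> float:
    """Largest precipitation total over any `window` consecutive days."""
    if window < 1 or window > len(daily_mm):
        raise ValueError("window must be between 1 and the series length")
    # Sliding-window sum: add the newest day, drop the oldest one.
    running = sum(daily_mm[:window])
    best = running
    for i in range(window, len(daily_mm)):
        running += daily_mm[i] - daily_mm[i - window]
        best = max(best, running)
    return best


def consecutive_dry_days(daily_mm: Sequence[float],
                         wet_threshold_mm: float = 1.0) -> int:
    """Longest run of days below the wet-day threshold.

    The 1 mm default is an assumption (ETCCDI convention), not a value
    given in the abstract.
    """
    longest = current = 0
    for value in daily_mm:
        if value < wet_threshold_mm:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Made-up daily totals in mm, for illustration only (not data from the study).
sample = [0.0, 0.2, 12.5, 30.0, 4.1, 0.0, 0.0, 0.0, 0.5, 18.0]
rx1day = rx_days(sample, 1)
rx5day = rx_days(sample, 5)
cdd = consecutive_dry_days(sample)
```

Applying the same functions to a gridded dataset and to the gauge series at each point, then comparing the resulting index values, gives the kind of under- or overestimation comparison the abstract describes.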

Original language: English
Pages (from-to): 5003-5021
Number of pages: 19
Journal: International Journal of Climatology
Issue number: 10
Publication status: Published - Aug 2021
Externally published: Yes


  • CMA
  • extremes
  • NCEP2

