Repository of Research and Investigative Information

Kurdistan University of Medical Sciences

A framework for exploration and cleaning of environmental data - Tehran air quality data experience

(2014) A framework for exploration and cleaning of environmental data - Tehran air quality data experience. Archives of Iranian Medicine.

Full text not available from this repository.

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Background: Managing and cleaning large environmental monitoring data sets is a specific challenge. In this article, we present a novel framework for exploring and cleaning large datasets. As a case study, we applied the method to air quality data from Tehran, Iran, from 1996 to 2013.

Methods: The framework consists of data acquisition (here, data on particulate matter with aerodynamic diameter ≤10 µm, PM10), development of databases, initial descriptive analyses, removal of inconsistent data using a plausibility range (PR), and detection of missing-data patterns. Additionally, we developed a novel tool, the spatiotemporal screening tool (SST), which considers both the spatial and the temporal nature of the data during outlier detection. We also evaluated the effect of dust storms in the outlier detection phase.

Results: The raw mean concentration of PM10 before implementation of the algorithms was 88.96 µg/m³ for 1996-2013 in Tehran. After implementing the algorithms, 5.7% of data points in total were recognized as unacceptable outliers, of which 69% were detected by the SST and 1% by the dust storm algorithm. In addition, 29% of the unacceptable outlier values were not in the PR. The mean concentration of PM10 after implementation of the algorithms was 88.41 µg/m³. However, the standard deviation decreased significantly, from 90.86 µg/m³ to 61.64 µg/m³. There was no distinguishable significant pattern in the missing data according to hour, day, month, or year.

Conclusion: We developed a novel framework for cleaning large environmental monitoring data, which can identify hidden patterns. We also presented a complete picture of PM10 from 1996 to 2013 in Tehran. Finally, we propose implementing our framework on large spatiotemporal databases, especially in developing countries.

© 2014, Academy of Medical Sciences of I.R. Iran. All rights reserved.
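As an illustration only, the plausibility-range step mentioned in the Methods could be sketched as below. This is not the authors' implementation: the abstract does not give the PR bounds, so `PR_LOW` and `PR_HIGH`, the function name `apply_plausibility_range`, and the sample readings are all hypothetical. The sketch shows how dropping out-of-range values may change the standard deviation far more than one might expect from the number of points removed.

```python
from statistics import mean, stdev

# Hypothetical plausibility range (PR) for hourly PM10 in µg/m³.
# The abstract does not state the bounds the authors used.
PR_LOW, PR_HIGH = 0.0, 1000.0


def apply_plausibility_range(values, low=PR_LOW, high=PR_HIGH):
    """Split readings into those inside [low, high] and those outside it.

    Missing readings (None) are skipped here; missing-data patterns are
    a separate step in the framework.
    """
    kept, rejected = [], []
    for v in values:
        if v is None:
            continue
        if low <= v <= high:
            kept.append(v)
        else:
            rejected.append(v)
    return kept, rejected


# Made-up sample data, not taken from the Tehran data set.
readings = [85.0, 92.5, None, -4.0, 78.0, 5000.0, 101.0]
raw = [v for v in readings if v is not None]
kept, rejected = apply_plausibility_range(readings)

print(f"raw:     n={len(raw)}, mean={mean(raw):.2f}, sd={stdev(raw):.2f}")
print(f"cleaned: n={len(kept)}, mean={mean(kept):.2f}, sd={stdev(kept):.2f}")
print(f"outside PR: {rejected}")
```

A spatiotemporal screen such as the SST would need neighbouring-station and neighbouring-hour values as well; the abstract gives too little detail to sketch that step.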

Item Type: Article
Keywords: air monitoring; air pollutant; air quality; Article; beta radiation; cleaning; data analysis; dust control; environmental management; geographic distribution; information processing; Iran; outcome assessment; spatiotemporal analysis; air pollution; algorithm; analysis; environmental monitoring; evaluation study; factual database; methodology; particulate matter; procedures; statistical analysis; statistical model; statistics and numerical data; time; Air Pollutants; Algorithms; Data Interpretation, Statistical; Databases, Factual; Models, Statistical; Research Design; Spatio-Temporal Analysis; Time Factors
Page Range: pp. 821-829
Journal or Publication Title: Archives of Iranian Medicine
Volume: 17
Number: 12
Publisher: Academy of Medical Sciences of I.R. Iran
ISSN: 1029-2977
Depositing User: مهندس جمال محمودپور
URI: http://eprints.muk.ac.ir/id/eprint/1086
