Ground-level ozone has been 'widely recognized as a critical air pollutant that has the potential to induce various adverse environmental and health effects' (Du & Yu, 2022). It forms when precursor pollutants such as nitrogen dioxide are exposed to sunlight. This project trains a linear regression model of the relationship between ground-level ozone concentration and both precursor pollutants and environmental factors (such as UV index and wind speed). Modelling these factors and predicting ground-level ozone concentrations could form the basis for proactive and remedial measures.
The Breathe London Network is a ‘community sensing network’ run by the Environmental Research Group at Imperial College London. The sensor network consists of over 400 air quality sensors in the Greater London area.
This project compiles a dataset from three REST APIs:
- the Breathe London API
- the Open-Meteo Air Quality API
- the Open-Meteo Historical Weather API
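One way to combine the three sources is to align them on a shared hourly timestamp. The sketch below is illustrative only: it assumes each API response has already been loaded into a pandas DataFrame with a `time` column, and the column names (`no2_sensor`, `ozone`, `uv_index`, `wind_speed`) and values are hypothetical stand-ins, not the project's actual schema or data.

```python
import pandas as pd


def merge_hourly(*frames: pd.DataFrame) -> pd.DataFrame:
    """Inner-join any number of tables on a shared 'time' column."""
    indexed = []
    for df in frames:
        df = df.copy()
        # Normalise timestamps to UTC so rows from different APIs line up.
        df["time"] = pd.to_datetime(df["time"], utc=True)
        indexed.append(df.set_index("time"))
    # Keep only the hours present in every source, then drop incomplete rows.
    merged = pd.concat(indexed, axis=1, join="inner")
    return merged.dropna().sort_index()


# Synthetic placeholder data (hypothetical column names and values).
hours = [
    "2023-06-01T00:00:00Z",
    "2023-06-01T01:00:00Z",
    "2023-06-01T02:00:00Z",
    "2023-06-01T03:00:00Z",
]
sensor = pd.DataFrame({"time": hours[:3], "no2_sensor": [21.0, 19.5, 18.2]})
air_quality = pd.DataFrame({"time": hours[1:], "ozone": [40.1, 42.3, 45.0]})
weather = pd.DataFrame(
    {"time": hours, "uv_index": [0.0, 0.0, 0.1, 0.4], "wind_speed": [3.2, 2.9, 3.5, 4.1]}
)

merged = merge_hourly(sensor, air_quality, weather)
print(merged)
```

An inner join is used here so that every remaining row has a value from all three sources; whether the project handled gaps this way is not stated.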
The assembled dataset was checked, and modified as needed, to comply with the assumptions of linear regression:
- Multicollinearity was checked using the Variance Inflation Factor (VIF).
- Skewness and kurtosis values were calculated to check that the data did not violate the normality assumption. A Box-Cox transformation was applied in response to the kurtosis test.
- Significant outliers were checked for using Z-scores.
- Correlation matrices were used to confirm linear correlations.
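The checks listed above can be sketched with NumPy and SciPy as follows. This is a minimal illustration on synthetic data, not the project's code: the variables are random placeholders, and the cut-offs mentioned in the comments (VIF above roughly 10, |z| above 3) are common rules of thumb assumed here, since the thresholds actually used are not stated.

```python
import numpy as np
from scipy import stats


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)  # deliberately almost collinear with a
X = np.column_stack([a, b, c])

# 1. Multicollinearity: large values (rule of thumb: above ~10) flag a problem.
vif_values = vif(X)
print("VIF:", vif_values)

# 2. Normality: skewness and excess kurtosis of a right-skewed variable.
skewed = rng.lognormal(size=500)
print("skew:", stats.skew(skewed), "excess kurtosis:", stats.kurtosis(skewed))

# Box-Cox needs strictly positive input; lambda is fitted by maximum likelihood.
transformed, lam = stats.boxcox(skewed)
print("skew after Box-Cox:", stats.skew(transformed))

# 3. Outliers: flag points with |z| > 3 (assumed threshold).
outlier_mask = np.abs(stats.zscore(transformed)) > 3
print("outliers flagged:", int(outlier_mask.sum()))

# 4. Linear association: Pearson correlation matrix of the columns.
corr = np.corrcoef(X, rowvar=False)
print(corr)
```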
Recursive Feature Elimination was used to identify the optimal number of features.
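A minimal sketch of choosing the number of features by recursive elimination is shown below. It uses scikit-learn's `RFECV`, the cross-validated variant of RFE, which is an assumption: the text does not say how the optimum was selected. The data are synthetic, with only the first two of six columns carrying signal.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic data: six candidate features, two of which drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=300)

# Drop one feature per step (the one with the smallest absolute coefficient)
# and score each candidate subset size by cross-validated R^2.
selector = RFECV(
    estimator=LinearRegression(),
    step=1,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
    min_features_to_select=1,
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("selection mask:", selector.support_)
```

Because elimination ranks features by coefficient magnitude, the features should be on comparable scales before this step.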
The final linear regression model achieved an R^2 score of 0.87 ± 0.12 and an RMSE score of 0.13 ± 0.08.
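Scores of the form "mean ± spread" are commonly obtained from k-fold cross-validation. The sketch below assumes that reading, with five folds and the standard deviation across folds as the spread; neither detail is confirmed by the text, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic regression problem standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + 0.1 * rng.normal(size=300)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    LinearRegression(),
    X,
    y,
    cv=cv,
    scoring=("r2", "neg_root_mean_squared_error"),
)

r2 = scores["test_r2"]
# scikit-learn reports RMSE negated so that higher is always better; flip the sign.
rmse = -scores["test_neg_root_mean_squared_error"]

print(f"R^2  = {r2.mean():.2f} ± {r2.std():.2f}")
print(f"RMSE = {rmse.mean():.2f} ± {rmse.std():.2f}")
```

RMSE is expressed in the units of the target, so it is only comparable between models fitted to the same (possibly transformed) target.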