Abstract: Using observations from the national weather stations of the Beijing-Tianjin-Hebei region for 2017 to 2021 and ERA5 reanalysis data, forecasting models for light fog and fog were developed with several machine learning algorithms. The study also examined how reanalysis and topographic factors affect model performance, and optimized the models by combining multi-model integration with statistical voting. The main findings are as follows: (1) ensemble learning methods such as XGBoost (eXtreme Gradient Boosting), LightGBM (Light Gradient Boosting Machine), and random forest outperform the decision tree method in forecasting low-visibility weather; (2) the performance of the XGBoost and LightGBM models improves significantly when ERA5 reanalysis and topographic factors are introduced. Specifically, their fog-forecast TS (Threat Score) values are 30% and 32% higher than those of models built on surface elements only, reaching 0.52 and 0.49, with POD (Probability of Detection) values of 0.62 and 0.87, respectively. In addition, stacking the two models raises the TS of light fog and fog forecasts to 0.51 and 0.54, respectively; (3) during a regional fog event in the autumn of 2022, the models accurately predicted fog 72 h in advance. The LightGBM model performed best, reaching a TS of 0.3 for both 0–36 h fog forecasts and 0–72 h light fog forecasts, which exceeds ECMWF (European Centre for Medium-Range Weather Forecasts) in both accuracy and timeliness.
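The TS and POD figures quoted above are verification scores computed from a yes/no contingency table of forecast versus observed events. The sketch below uses the standard definitions, TS = hits / (hits + misses + false alarms) and POD = hits / (hits + misses). It assumes each station-time has already been labeled as a fog (or light fog) event or non-event. The visibility thresholds behind those labels are not given in this abstract, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """2x2 contingency counts for a yes/no event (e.g. fog occurs)."""
    hits: int          # event forecast and observed
    misses: int        # event observed but not forecast
    false_alarms: int  # event forecast but not observed


def count_contingency(observed, forecast):
    """Tally hits, misses, and false alarms from paired boolean labels.

    Hypothetical helper: assumes events are already labeled per station-time.
    """
    hits = sum(1 for o, f in zip(observed, forecast) if o and f)
    misses = sum(1 for o, f in zip(observed, forecast) if o and not f)
    false_alarms = sum(1 for o, f in zip(observed, forecast) if f and not o)
    return Contingency(hits, misses, false_alarms)


def threat_score(c):
    """TS = hits / (hits + misses + false_alarms); NaN if there are no events."""
    denom = c.hits + c.misses + c.false_alarms
    return c.hits / denom if denom else float("nan")


def probability_of_detection(c):
    """POD = hits / (hits + misses); NaN if no events were observed."""
    denom = c.hits + c.misses
    return c.hits / denom if denom else float("nan")


# Toy example with made-up labels (1 = fog, 0 = no fog), not data from the study.
observed = [1, 1, 0, 1, 0, 0, 1]
forecast = [1, 0, 1, 1, 0, 0, 1]
table = count_contingency(observed, forecast)
print(table)                             # 3 hits, 1 miss, 1 false alarm
print(threat_score(table))               # 3 / 5
print(probability_of_detection(table))   # 3 / 4
```

TS penalizes false alarms and POD does not, so POD is always at least as large as TS for the same forecasts. A wide gap between the two, as for the LightGBM fog model (POD 0.87, TS 0.49), therefore suggests a comparatively high number of false alarms.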