Forecasting of saline intrusion in Ham Luong River, Ben Tre Province (Southern Vietnam) using Box-Jenkins ARIMA models

ABSTRACT
Introduction: Ham Luong River is a branch of the Mekong River located in Ben Tre Province, and it has played a crucial role in supporting the livelihoods of local residents and the province's economic development. However, saline intrusion has been expanding in Ham Luong River, which seriously affects agricultural and aquacultural production and causes tremendous difficulties for local people's lives. Thus, research on forecasting the saline intrusion in Ham Luong River is crucial. Our aim was to develop mathematical models to forecast the saline intrusion in Ham Luong River, Ben Tre Province. Methods: The autoregressive integrated moving average (ARIMA) model was built to forecast the weekly saline intrusion in Ham Luong River, using data obtained from Ben Tre Province's Hydro-Meteorological Forecasting Center over eight years (from 2012 to 2019). Results: The saline concentration increased from January to March and then decreased from April to June. The highest salinity occurred in February and March, while the lowest salinity was observed in early June. Moreover, the ARIMA technique provided an adequate predictive model for forecasting the saline intrusion at the An Thuan, Son Doc, and An Hiep stations. However, the ARIMA models for My Hoa and Vam Mon might be improved upon by other forecasting methods. Conclusion: Our study suggested that the nonseasonal/seasonal ARIMA is an easy-to-use modeling tool for a quick forecast of the saline intrusion.


INTRODUCTION
Ham Luong River (HLR) (in Vietnamese: Sông Hàm Luông) is a branch of the Mekong River in the Mekong Delta region that flows entirely within Ben Tre Province (BTP). HLR has played a crucial role in supporting the livelihoods of local residents, providing a productive environment for agriculture, aquaculture, capture fisheries, non-fish aquatic goods, and tourism revenue [1]. However, saline intrusion (SI) has been expanding in the Mekong Delta in recent years, especially in BTP, which seriously affects agricultural and aquacultural production and causes tremendous difficulties for local people's lives [2]. In the dry season, saline water from the East Sea intrudes into HLR and then continues into the complicated canal networks of BTP. SI is a complex phenomenon that depends on a variety of variables, including freshwater discharge from upstream, the capacity and morphology of the rivers and canals, the configuration of the drainage network, tidal conditions, and the presence of artificial control structures such as dams and sluice gates [3,4]. Moreover, the impacts of climate change and sea-level rise exacerbate the damage caused by SI [5]. However, SI might be predicted by using statistical models. Therefore, research on forecasting SI in HLR is crucial for providing information that can be used in water resource management and saltwater monitoring. The capability to predict SI has been a principal interest of many studies, and various models have been developed to predict SI in main rivers. Artificial intelligence models, such as the Artificial Neural Network (ANN) model [6], simulate SI using a trained neural network. Remote sensing techniques, such as applications of available satellite images, have also been used to detect SI [5]. However, these methods mostly rely on complex statistics, artificial intelligence techniques, and large amounts of meteorological and topographic data [7].
This creates a need for a model that is reliable, accurate, and suitable while requiring only small amounts of hydrodynamic data. The autoregressive integrated moving average (ARIMA) model is regarded as a smoothing method, and it is applicable when the data series is reasonably long and the correlation between past observations is stable [8]. The ARIMA model [9], also known as the Box-Jenkins model or methodology, is commonly used in forecasting and analysis. ARIMA forecasting has some significant advantages. First, it only needs endogenous variables and does not need exogenous variables. Second, the ARIMA technique only requires the prior data of a time series to generalize the forecast. Hence, it can increase the forecast accuracy while keeping the number of parameters to a minimum [10]. For this reason, the ARIMA model has been applied to analyze hydrological time series, especially at the monthly scale [11]. Several studies in the literature have used the ARIMA model for saline intrusion prediction. Sun and Koch (2001) used ARIMA to analyze and forecast salinity in Apalachicola Bay, Florida. Their results showed that ARIMA made it possible to statistically define the interaction of the different parameters that affect salinity change in Apalachicola Bay, which helps in understanding the hydrodynamic circulation of the water body through data analysis [12]. Felisa et al. (2015) applied the ARIMA model to forecast groundwater salinization in Ravenna (Italy). The resulting predictive models were validated by comparison with data and demonstrated that data-driven approaches may provide useful information in situations where physics-based models have only limited success in characterizing the phenomenon of interest [13]. In addition, the ARIMA model is a major technique in hydrology and has been used extensively, mainly for the prediction of natural phenomena such as precipitation, streamflow events, and solar radiation [11,14,15].
Here, our primary objective was to develop the ARIMA model to forecast the weekly SI of HLR, BTP, in consideration of the accuracy, suitability, adequacy, and timeliness of the collected data, which were obtained from Ben Tre Province's Hydro-Meteorological Forecasting Center (BTHMFC) over eight years (from 2012 to 2019). The reliability, accuracy, suitability, and performance of the model were assessed using established diagnostic checks, such as standardized residuals.

Study area and dataset collection
HLR separates from the Tien River in Tan Phu Commune, Chau Thanh District, BTP, creating a natural border between the Bao and Minh islets. It is 72 km long, 12 to 15 m deep, and 1,200 to 1,500 m wide (over 3,000 m at the estuary). During the rainy season, the average river flow is approximately 3,300-3,400 m³/s, while it is around 800-850 m³/s in the dry season [16]. There are six saltwater monitoring stations (from estuary to upstream) situated in An Thuan (AT; Tiem Tom harbor, Ba Tri District), Son Doc (SD; Hung Le Commune, Giong Trom District), Phu Khanh (PK; Phu Khanh Commune, Thanh Phu District), My Hoa (MH; Ben Tre City), An Hiep (AH; An Hiep Commune, Chau Thanh District), and Vam Mon (VM; Phu Son Commune, Cho Lach District) (Figure 1). At each station, the saltwater monitoring data were collected once per week for a period of 23 weeks (from January to June, which is the dry season in the Mekong Delta). The river saltwater monitoring data from 2012 to 2019 were provided by BTHMFC (available at http://www.bentre.gov.vn/Lists/ThongTinCanBiet/TongQuat.aspx). The present study forecasts the SI in HLR from Jan 1st-Jan 8th (week 1) to Jun 4th-Jun 11th (week 23) of 2020 based on saltwater monitoring data from 2012 to 2019 (Appendix 1).

ARIMA models description and application
The ARIMA model was first formulated by Box and Jenkins in 1976 [9]. The general equation of successive differences at the dth difference of $X_t$ is briefly expressed as follows:
$$\Delta^d X_t = (1 - B)^d X_t$$
where $d$ is the order of differencing and $B$ is the backshift operator. The successive difference at a one-time lag equals:
$$\Delta X_t = (1 - B) X_t = X_t - X_{t-1}$$
In this situation, the general nonseasonal ARIMA (p, d, q) model is as follows:
$$\phi_p(B) W_t = \theta_q(B) e_t$$
where $\phi_p(B)$ is an autoregressive operator of order p, $\theta_q(B)$ is a moving average operator of order q, $W_t = \Delta^d X_t$, and $e_t$ is the random error term.
A general nonseasonal/seasonal ARIMA (p, d, q)x(P, D, Q)s model has nonseasonal parameters p, d, q, seasonal parameters P, D, Q, and seasonality s. It consists of several terms: a nonseasonal autoregressive term of order p, a nonseasonal differencing of order d, a nonseasonal moving average term of order q, a seasonal autoregressive term of order P, a seasonal differencing of order D, and a seasonal moving average term of order Q. A common nonseasonal/seasonal model is ARIMA(0,1,1)x(0,1,1)s, which has seasonal and nonseasonal MA terms of order 1. For a more detailed description of the terminology, see Box and Jenkins (1976) [9], Bowerman and O'Connell (1987) [17], and Pankratz (1991) [18].
ARIMA modeling was carried out using Statgraphics Centurion ver. 18 software. Model performance was evaluated using the root mean squared error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), the mean error (ME), and the mean percentage error (MPE) [19].
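The five error measures named above can be illustrated with a minimal Python sketch. It is not the Statgraphics implementation used in the study: the function name `forecast_errors`, the sign convention (error = observed minus forecast), the skipping of zero observations in the percentage measures, and the example values are all assumptions made for illustration.

```python
import math

def forecast_errors(actual, forecast):
    """Return ME, MAE, RMSE, MPE and MAPE for paired observed/forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed convention: error = observed value minus forecast value.
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    # Percentage errors are undefined for zero observations, so those
    # pairs are skipped here (an assumption of this sketch).
    pct = [100.0 * e / a for e, a in zip(errors, actual) if a != 0]
    nan = float("nan")
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MPE": sum(pct) / len(pct) if pct else nan,
        "MAPE": sum(abs(p) for p in pct) / len(pct) if pct else nan,
    }

# Made-up salinity values (per mille), not data from the study.
observed = [10.0, 20.0, 25.0, 20.0]
predicted = [12.0, 18.0, 25.0, 21.0]
metrics = forecast_errors(observed, predicted)
```

ME and MPE keep the sign of the errors and so indicate bias, whereas MAE, RMSE, and MAPE measure the size of the errors regardless of direction.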

Map visualizations
An Inverse Distance Weighting (IDW) method in ArcGIS 10.3 was used to interpolate forecast point data to create continuous surface maps [20]:
$$\lambda_i = \frac{\sum_{j=1}^{G} \lambda_j / D_{ij}^{\alpha}}{\sum_{j=1}^{G} 1 / D_{ij}^{\alpha}}$$
where $\lambda_i$ was the property at location i, $\lambda_j$ was the property at location j, $D_{ij}$ was the distance from i to j, $G$ was the number of sampled locations, and $\alpha$ was the inverse-distance weighting power.
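The idea behind IDW can be sketched in a few lines of Python: the estimate at an unsampled point is a weighted mean of the sampled values, with weights equal to the inverse distance raised to a power. This is only an illustration and not the ArcGIS routine. The function name `idw_interpolate`, the planar coordinates, the power of 2, and the example values are assumptions, since the source does not state the power that was used.

```python
import math

def idw_interpolate(target, samples, power=2.0):
    """Estimate the value at `target` (x, y) from (x, y, value) samples."""
    if not samples:
        raise ValueError("at least one sampled location is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            # The target coincides with a sampled location: use its value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two hypothetical stations 10 km apart with made-up salinities (per mille).
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 2.0)]
midpoint_estimate = idw_interpolate((5.0, 0.0), stations)
near_first_estimate = idw_interpolate((2.0, 0.0), stations)
```

A larger power makes the nearest stations dominate the estimate, while a smaller power produces a smoother surface.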

Long-term saline intrusion data in Ham Luong River from 2012 to 2019
The saline concentration data for HLR over eight years were obtained from the BTHMFC, and Figure 2 presents the basic trends of the collected data. Overall, the saltwater concentration in HLR increased from February to April. The maximum saltwater concentration occurred at the end of March or the beginning of April, which were the driest months of the year. Subsequently, the saltwater concentration decreased slightly in late May and fell rapidly in early June because of the seasonal change, with rainfall beginning in May. Early June is the beginning of the rainy season, with much more rainfall than in May; therefore, the saline concentration decreased rapidly along the whole river. Notably, the highest saltwater concentration in HLR was observed in 2016, when BTP experienced serious SI because of a severe El Niño. The maximum saltwater concentration was 31.

The ARIMA model for the forecast of saline intrusion in Ham Luong River
At the AT station, the highest saline concentration of 25.34‰ is observed in week 6, followed by 21.25‰ (week 10) and 21.16‰ (week 9). Furthermore, peak saltwater concentrations of 13.24‰ (week 12), 8.95‰ (week 5), 4.67‰ (week 12), 1.68‰ (week 4), and 0.72‰ (week 11) were also reported. By contrast, the lowest saltwater concentration of 12.46‰ is observed in week 23. The saltwater concentration measured …
Table 1 shows an overview of the monthly average of the forecasted saltwater concentration for all stations in HLR from January to June 2020. Generally, the saltwater concentration increased from January to March and then decreased from April to June. The maximum saltwater concentration occurred in February and March, while the lowest was observed in early June. Figure 3 shows the historical data, the forecasts, and the forecast limits (95% P.I.).
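To show how point forecasts and 95% forecast limits of this kind can arise, the following Python sketch works through a nonseasonal ARIMA(0,1,1) model, chosen only because it is simple to write out. It is not the model fitted to any station in the study. The function name `arima011_forecast`, the supplied (not estimated) MA coefficient `theta`, the zero starting error, the normal-error multiplier 1.96, and the example series are all assumptions.

```python
import math

def arima011_forecast(series, theta, horizon, z=1.96):
    """Forecasts and approximate 95% limits for X_t = X_{t-1} + e_t - theta*e_{t-1}."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    prev_error = 0.0  # assumption: the unobserved initial error is zero
    squared_errors = []
    for t in range(1, len(series)):
        # One-step-ahead forecast made at time t-1, then its error at time t.
        one_step = series[t - 1] - theta * prev_error
        prev_error = series[t] - one_step
        squared_errors.append(prev_error ** 2)
    sigma2 = sum(squared_errors) / len(squared_errors)  # error variance estimate
    # For ARIMA(0,1,1) the point forecast is the same at every horizon.
    point = series[-1] - theta * prev_error
    rows = []
    for h in range(1, horizon + 1):
        # Forecast variance grows with the horizon h.
        se = math.sqrt(sigma2 * (1.0 + (h - 1) * (1.0 - theta) ** 2))
        rows.append((h, point - z * se, point, point + z * se))
    return rows

# Made-up weekly values, not data from the study.
example_rows = arima011_forecast([0.0, 2.0, 2.0], theta=0.5, horizon=2)
```

The limits widen as the horizon grows because the uncertainty of each additional step accumulates. The lower limit can fall below zero, so it would need to be truncated for a non-negative quantity such as salinity.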

Testing forecast models
A normal probability plot of the residuals is displayed in Figure 4. If the residuals come from a normal distribution, they should fall close to the line. In fact, the residual plots for AT, SD, PK, and AH showed some curvature away from the line, while those for MH and VM did not. Five tests were run to determine whether or not the residuals form a random sequence of numbers. If the p-value for each test is greater than or equal to 0.05, we cannot reject the hypothesis that the series is random at the 95.0% or higher confidence level. The ARIMA forecasting models for AT, SD, PK, and AH passed the five tests, while those for MH and VM did not (Table 2).
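The source does not name the five randomness tests. As one representative example, the sketch below implements a runs test above and below the median with a normal approximation. This choice of test, the function name `runs_test_about_median`, and the example residuals are assumptions for illustration, and the approximation is rough for short series.

```python
import math
from statistics import median

def runs_test_about_median(residuals):
    """Return (number of runs, z statistic, two-sided p-value)."""
    med = median(residuals)
    # Values equal to the median carry no sign and are dropped.
    signs = [r > med for r in residuals if r != med]
    n_above = sum(signs)
    n_below = len(signs) - n_above
    if n_above == 0 or n_below == 0:
        raise ValueError("need residuals on both sides of the median")
    # A new run starts whenever the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_above + n_below
    product = 2.0 * n_above * n_below
    expected = product / n + 1.0
    variance = product * (product - n) / (n * n * (n - 1.0))
    if variance <= 0.0:
        raise ValueError("too few residuals for the normal approximation")
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return runs, z, p_value

# Made-up residual sequences, not data from the study.
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
irregular = [1, 1, -1, 1, -1, -1, 1, -1, -1, 1]
alt_runs, alt_z, alt_p = runs_test_about_median(alternating)
irr_runs, irr_z, irr_p = runs_test_about_median(irregular)
```

With the 0.05 threshold described above, the strictly alternating sequence is flagged as non-random because it has too many runs, while the irregular sequence is not.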

Perspective view of the saline intrusion in Ham Luong River in 2020 predicted by the ARIMA model
At the beginning of the dry season (January), saltwater levels of 10‰ will occur between Mo Cay Nam and Thanh Phu Districts, over 50 km from the Ham Luong estuary. Saltwater levels of 5-10‰ will cover almost all of Giong Trom District and half of Mo Cay Nam District. The upstream districts, such as Chau Thanh and Cho Lach, will be covered by levels under 2‰ (Figure 5A). Subsequently, in the driest months (February and March), saltwater will intrude into an area within 60-70 km of the mouth of HLR; therefore, all of Giong Trom and Mo Cay Nam Districts will be affected by a saltwater level of 10‰. Ben Tre City and a small part of Chau Thanh District will be covered by levels under 5‰ (Figure 5B, C). Finally, at the beginning of the rainy season (early June), saltwater will be pushed back from the inland. Saltwater levels of 10‰ will be observed in Ba Tri District, approximately 10 km from the estuary (Figure 5F). Based on the forecasting results of the ARIMA model, saltwater of 5‰ will penetrate up to 60-70 km inland, which means that Ben Tre City (the area with the highest population) and Chau Thanh District (an area with large-scale fruit production) are likely to be affected by SI. The outcomes of this study are useful for reducing the damage caused by saline intrusion in the Mekong Delta, and in BTP in particular, in the 2020 saline season.

The ARIMA model: advantages and disadvantages
Forecasting is the activity of calculating or predicting future events or situations, usually as a result of rational study or analysis of suitable data [21]. Accurate saline forecasts will become more and more difficult to produce because of climate change and extreme weather [22]. Several quantitative forecasting techniques are available, such as ARIMA models, random walk models, trend models, and exponential smoothing. Generally, ARIMA models are grounded in statistical theory and are mathematically complex, whereas the others are regarded as simple prediction techniques. Accordingly, the ARIMA model has been regarded as the most efficient prediction technique in hydrology [12]. Empirical research has found many advantages of the ARIMA model and supports it as a suitable approach, especially for short-term time series forecasting [23]. The ARIMA model requires fewer prior data inputs to generalize the forecast, and it only needs endogenous variables and does not need exogenous variables. This model is relatively more robust and efficient than more complex structural models for short-run predictions [24]. However, the main limitation of ARIMA is the lack of a deterministic cause [25]. In addition, many traditional techniques for time series forecasting, such as ARIMA, assume that the series is generated from linear processes and, as a result, might be inappropriate for most real-world problems, which are nonlinear [26,27]. This problem can be partly circumvented by using a large number of past data inputs, by accounting for stochastic events, and by improving the accuracy of the past data inputs.