A Modified Semi-parametric Regression Model For Flood Forecasting

(2015) ABSTRACT: In recent years, inundation, one of the natural calamities, has occurred frequently and fiercely. We sustain severe losses from floods every year. Therefore, the development of methods to identify, analyze, model and predict floods is indispensable and urgent. In this paper, we propose a modified semiparametric regression model for forecasting flood water levels. The new model has three components. The first is the parametric part of the model, comprising water level, precipitation, evaporation, air humidity, ground moisture and similar values. There is a complex relationship among these parameters, and several modified regression models have been proposed and tested for it. The second is the non-parametric part of the model. We handle it with the effective dimension-reduction subspace algorithm of Arnak S. Dalalyan et al. and some modified neural network algorithms, namely an altered back-propagation method and an ameliorated cascade correlation algorithm. In addition, we propose a new idea for modifying the conjugate gradient algorithm. These steps help us smooth the model's non-parametric component easily and quickly. The last component is the model's error. All of these elements are essential inputs to operational flood management. This work is usually very complex owing to the uncertain and unpredictable nature of the underlying phenomena. Flood water levels were forecast, with a lead time of one or more days, using a selected sequence of past water-level values observed at a specific location. Time-series analysis is also used to build the model. The results indicate that, with the new semiparametric regression model and the effective dimension-reduction subspace algorithm, together with some improved neural network algorithms, the estimation power of the modern statistical model is reliable and promising, especially for


INTRODUCTION
Vietnam is a tropical and temperate country. It is characterized by a strong monsoon influence, a considerable number of sunny days, and high rainfall and humidity. It is frequently affected by climate change. Floods occur with increasing frequency and devastation. Our main goal is to help people live with floods and to reduce human and material losses to a minimum. Flood modeling and forecasting is one remedy for this problem. There are several techniques for modeling flood water levels. One of the most important prerequisites in operational flood management is the prediction of flood values in near real time.
Historically, different methods have been used for flood forecasting. A large number of rainfall-runoff models have been developed. These include conceptual models that try to conceptualize the physical processes influencing the runoff, empirical models, and complex models that couple meteorologic and hydrologic models for flow forecasting [1]. Using a neural network for flood water-level modeling and forecasting has several advantages, among others [2]:
Neural networks are useful when the underlying problem is either poorly defined or not clearly understood.
Their application does not require prior knowledge of the studied process.
Owing to these properties, neural networks are designed to recognize a hidden pattern in the data in a way similar to that of the human brain. Details of their functions and applications are given in various documents (e.g., refs. [1], [4], and [6]). The neural network suitable for this particular application is of the feedforward type, as illustrated in Figure 1, which has the capacity to approximate any continuous function.
The following sections describe an effort to modify and develop the back-propagation, cascade correlation and conjugate gradient neural network algorithms for modeling and forecasting flood water levels at a particular location.
The last component is the model's error. It represents measurement errors, such as counting errors and errors in surveying the figures.
All elements of the model are essential inputs to operational flood management. This work is usually very complex owing to the uncertain and unpredictable nature of the underlying phenomena. The technique of multivariate semiparametric regression modeling, combined with neural networks, was therefore applied to model it. Flood water levels were forecast, with a lead time of one or more days, using a selected sequence of past water-level values observed at a specific location.

CASE STUDY

Study area
Measured flood water-level data were available at the Chau Doc and Tan Chau gauging stations in An Giang province, Vietnam. Tan Chau station, coded 019803, is located on the upstream reach of the Tien River, at longitude 105°13' and latitude 10°45'. Chau Doc station, coded 039801, is located on the upstream reach of the Hau River. Both lie in the Long Xuyen quadrangle basin, one of the areas of the Mekong Delta that sustains heavy flood losses every year. The area is shown in Figure 2.

Figure 2. Catchment area plan
The catchment has a natural area of approximately 489,000 hectares. The topography is low-lying and flat, with an altitude of roughly 0.4 m to 2.0 m above sea level. The flood season occurs yearly from July to December. The studied basin is often inundated to a depth of 0.5 to 2.5 m. Irregular changes in the upstream headwaters of the Mekong River, especially from the border between Cambodia and Vietnam, can lead to these fluctuations.

The data
Daily 24-hour flood water-level values for twelve years, from 1 January 2000 to 31 December 2011, were extracted from the weekly report records of the Regional Flood Management and Mitigation Centre, a division of the Mekong River Commission. Within each year, every seven successive days were gathered to form a group. In each group, the first five daily values were the input values and the remaining two were the output values. The first, third, fifth, ... groups were used for training, and the others were used for testing. Thus, in all, 52704 input-output data records were used for training and 52704 data records were used for testing.
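The grouping described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample water levels are hypothetical, and the handling of incomplete groups at the end of a series (dropped here) is an assumption.

```python
def make_weekly_patterns(levels, n_in=5, n_out=2):
    """Cut a daily series into consecutive 7-day groups: 5 inputs, 2 targets."""
    size = n_in + n_out
    # Assumption: a trailing group with fewer than 7 days is dropped.
    groups = [levels[i:i + size] for i in range(0, len(levels) - size + 1, size)]
    return [(g[:n_in], g[n_in:]) for g in groups]


def split_alternating(patterns):
    """1st, 3rd, 5th, ... groups go to training; the others go to testing."""
    return patterns[0::2], patterns[1::2]


# Hypothetical water levels in cm for 28 days (not real gauge data).
levels = [300 + 5 * day for day in range(28)]
patterns = make_weekly_patterns(levels)
train, test = split_alternating(patterns)

for inputs, targets in train:
    print("train:", inputs, "->", targets)
for inputs, targets in test:
    print("test: ", inputs, "->", targets)
```

Each pattern pairs five consecutive daily values with the two values that follow, which matches the one-day and two-day lead times used in the paper.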
The objective was to model and forecast daily flood water-level values with lead times of 1 and 2 days. Since the main purpose of this paper is to furnish citizens with short-term or medium-term forecasts, we do not run the algorithm for lead times of 3 days, 4 days and beyond. The final results obtained from the modified semiparametric regression model, via the dimension-reduction subspace algorithm and these artificial neural networks, could provide helpful basic information for adjusting, extending and upgrading the model. In other words, even though a larger lead time would be more useful for issuing flood warnings well in advance, a smaller lead time can help in making emergency reservoir operations and in cautioning the population at longer distances downstream or at specific sites where a nearby river gauging station is not available.
As shown in Figure 1, a sequence of five preceding daily values was given as input to the network, so as to enable the network to learn the pattern of flood water levels in the preceding days and predict the future event accordingly. This future event corresponded to lead times of one and two days, namely the flood values of the sixth and seventh days. If the lead time changes, the weights of the neural network are updated; the input part of the training pattern remains the same, but the output value changes. The choice of this sequence length was made on a trial basis. No significant improvement in the prediction was noted when the sequence length was varied around 5 days (for example, to 6 or 7 days).

THE TRAINING ALGORITHM
The proposed semi-parametric regression model is given by the following formula. Suppose the data consist of n subjects. For ... [11].
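For orientation only, a generic partially linear form that matches the three components named in the abstract (parametric part, non-parametric part, error) can be written as below. The symbols are assumptions introduced for illustration and are not taken from the paper: $Y_i$ is the observed water level for subject $i$, $X_i$ the vector of parametric covariates, $\beta$ the coefficient vector, $g$ an unknown smooth function of the covariates $T_i$, and $\varepsilon_i$ the error term.

```latex
Y_i = X_i^{\top}\beta + g(T_i) + \varepsilon_i, \qquad i = 1, \dots, n
```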
In our model, the flood forecasting problem is far from simple because it depends on water level, precipitation, evaporation, air humidity and ground moisture. In this paper, many linear regression models ... see ref.
We attempt to reduce this global error by adjusting the weights and biases.

Adjusted Back-Propagation Algorithm
This involves minimization of the global error using a steepest-descent or gradient-descent approach. The network weights and biases are adjusted by moving a small step in the direction of the negative gradient of the error function during each iteration. The iterations are repeated until a specified convergence criterion or number of iterations is reached.
The gradient descent update takes the standard form $W_{k+1} = W_k - \eta \nabla f(W_k)$, where $f$ is the error function, $W_k$ is the vector of weights and biases at iteration $k$, and $\eta$ is the step size. This error-gradient approach is simple to use. Nonetheless, it converges slowly and may exhibit oscillatory behaviour because of the fixed step size. We can therefore change some parameters of $f(W_k)$ and adapt the iteration step flexibly, relying on the typical characteristics of each data set. We can also alter the threshold used for normalizing the input values if these values exceed the given threshold. These actions reduce the error for each training pattern separately, so that the global error can be reduced to a minimum, in other words, brought close to zero. The changes mentioned above are stopped when a specified convergence is achieved.
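The effect of a fixed versus a flexible step can be illustrated with a small sketch. The error surface, the starting point and the step-halving rule below are assumptions chosen for illustration; the paper does not specify how the authors adapt the step.

```python
def grad_descent(f, grad, w, eta=0.1, max_iter=1000, tol=1e-12, adapt=False):
    """Steepest descent. With adapt=True the step is halved whenever the error rises."""
    err = f(w)
    for k in range(max_iter):
        g = grad(w)
        w_new = [wi - eta * gi for wi, gi in zip(w, g)]
        err_new = f(w_new)
        if adapt and err_new > err:
            eta *= 0.5          # step was too long: shorten it and try again
            continue
        w, err = w_new, err_new
        if err < tol:
            return w, err, k + 1
    return w, err, max_iter


# Hypothetical ill-conditioned quadratic error surface (not from the paper).
def f(w):
    return 0.5 * (w[0] ** 2 + 25.0 * w[1] ** 2)


def grad(w):
    return [w[0], 25.0 * w[1]]


start = [1.0, 1.0]
_, err_fixed, _ = grad_descent(f, grad, start, eta=0.1, max_iter=20)
_, err_adapt, n_adapt = grad_descent(f, grad, start, eta=0.1, adapt=True)
print("fixed step, 20 iterations: error =", err_fixed)
print("adaptive step: error =", err_adapt, "after", n_adapt, "iterations")
```

On this surface the fixed step of 0.1 overshoots along the steep coordinate and the error grows while the sign of that coordinate flips each iteration, whereas the adaptive variant shortens the step once and then converges.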
There are some notes for this algorithm.
Firstly, the input layer has five nodes, the hidden layer has three nodes, and the output layer has two nodes.
Secondly, the standard threshold of our network is 420 cm for the Tan Chau gauging station and 350 cm for the Chau Doc station. These values are chosen because floods or inundation occur when water levels equal or exceed them.
However, if one of the five normalized input values for a specific operation is greater than or equal to 1, the threshold (or milestone) of our global network is increased by 50 centimeters. We repeat this action (adding 50 cm to the current global threshold) as long as one of the five normalized input values is still greater than or equal to 1. The 50-centimeter gap is chosen because it is the gap between the three flood danger-alarm levels at these gauging stations.
This alteration gives our neural network a novel and flexible behaviour. The output values of the neural network improve when a suitable threshold is used. Besides, our model is not unduly influenced by any single input value.
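The threshold rule can be sketched as below. The station thresholds and the 50 cm increment come from the text; the normalization itself (dividing each water level by the current threshold) is an assumption, since the paper does not state the formula, and the function name and sample values are hypothetical.

```python
BASE_THRESHOLD_CM = {"Tan Chau": 420.0, "Chau Doc": 350.0}
ALARM_GAP_CM = 50.0  # gap between flood danger-alarm levels


def normalize_inputs(levels_cm, station):
    """Normalize by the station threshold, raising it by 50 cm until all values are < 1.

    Assumption: normalized value = water level / current threshold.
    """
    threshold = BASE_THRESHOLD_CM[station]
    while any(level / threshold >= 1.0 for level in levels_cm):
        threshold += ALARM_GAP_CM
    return [level / threshold for level in levels_cm], threshold


# Hypothetical five-day input sequence in cm (not real gauge data).
normalized, used = normalize_inputs([380, 400, 425, 450, 470], "Tan Chau")
print(used)        # threshold after the 50 cm increments
print(normalized)  # all values strictly below 1
```

In this example the threshold is raised twice, because the largest value first exceeds 420 cm and then equals 470 cm, which still gives a normalized value of 1.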
Thirdly, the transfer function we use is a sigmoidal function relating $O_{Oq}$ to $I_{Oq}$ and $\theta_{Oq}$, where $O_{Oq}$ is the output of the $q$th output neuron, $I_{Oq}$ is the input of the $q$th output neuron, and $\theta_{Oq}$ is the threshold of the $q$th neuron.
The sign is chosen according to whether the normalized output value is greater than zero. This is also a novel point of our neural network. The choice of sign helps to reduce the errors for the training patterns. Note that we choose the minus sign only if the normalized output value is not greater than zero.
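A forward pass through a 5-3-2 feedforward network with a sigmoidal transfer function can be sketched as follows. This uses the ordinary logistic function with the threshold added to the weighted sum, which is an assumption; it does not reproduce the authors' sign rule, whose formula is not available in the text. The weights are random placeholders and all names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic transfer function mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, thresholds):
    """One fully connected layer: weighted sum plus threshold, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + t)
            for row, t in zip(weights, thresholds)]


def forward(inputs, net):
    """Propagate five normalized inputs through the hidden and output layers."""
    hidden = layer(inputs, net["w_hidden"], net["t_hidden"])
    return layer(hidden, net["w_out"], net["t_out"])


def random_net(n_in=5, n_hidden=3, n_out=2, seed=0):
    """Untrained network with small random weights and zero thresholds."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"w_hidden": rand(n_hidden, n_in), "t_hidden": [0.0] * n_hidden,
            "w_out": rand(n_out, n_hidden), "t_out": [0.0] * n_out}


net = random_net()
outputs = forward([0.71, 0.73, 0.76, 0.80, 0.83], net)
print(outputs)  # two normalized values: day 6 and day 7
```

The two outputs correspond to the one-day and two-day lead times; training would adjust the weights and thresholds so that they match the normalized targets.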
If the resulting network is too small, it learns the problem inadequately. On the other hand, a lack of generalization and convergence difficulties may arise if the network is too large. The modified cascade correlation training algorithm is aimed at eliminating these inconveniences.

Modified Cascade Correlation Algorithm
This algorithm begins with a minimal network, i.e. one without any hidden node, then automatically trains and adds new hidden units one by one in a cascading manner. That is, if the variance between the realized output and the target output is not low, it adds one hidden node [7], [8]. This candidate node is connected to all input nodes and previously added hidden units, i.e. to all other nodes except the output nodes. The weights associated with hidden units are optimized by a gradient-descent method in which the correlation between the hidden unit's output and the residual error of the network is maximized. Let $C_S$ denote the overall sum of such correlations; see ref. [13] for more details.
Strictly speaking, $C_S$ is a covariance rather than a true correlation, because the formula leaves out some of the normalization terms.
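As an illustration, the score used in the standard cascade-correlation formulation of Fahlman and Lebiere sums, over the output units, the magnitude of the covariance between the candidate unit's output and the residual error. The sketch below assumes that this is the quantity the text refers to; the function name and sample data are hypothetical.

```python
def candidate_score(v, residuals):
    """Sum over outputs o of |sum over patterns p of (V_p - mean V)(E_po - mean E_o)|.

    v:         candidate unit output for each training pattern
    residuals: residual error per pattern, one value per output unit
    """
    n = len(v)
    v_mean = sum(v) / n
    score = 0.0
    for o in range(len(residuals[0])):
        e_o = [row[o] for row in residuals]
        e_mean = sum(e_o) / n
        score += abs(sum((vp - v_mean) * (ep - e_mean) for vp, ep in zip(v, e_o)))
    return score


# Hypothetical candidate outputs and residual errors for four patterns, two outputs.
v = [0.1, 0.4, 0.7, 0.9]
residuals = [[0.2, -0.1], [0.1, 0.0], [-0.1, 0.1], [-0.2, 0.2]]
print(candidate_score(v, residuals))
```

A candidate whose output is constant across patterns scores zero, so it explains none of the residual error. No division by the standard deviations appears, which is why the quantity is a covariance rather than a true correlation.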
There are several new points in this algorithm. Firstly, we standardize the input and output values as mentioned above. Secondly, we propose some sigmoidal functions for the hidden units [13].
The results obtained with these different sigmoidal functions show that they are reliable for constructing neural network models, especially for forecasting flood water levels.

Ameliorated Conjugate Gradient Algorithm
This technique differs from the previously mentioned error back-propagation in its gradient calculations and subsequent corrections to the weights and biases.
Here, a search direction $d_k$ is computed at each training iteration $k$, and the error function is minimized along it using a line search.
The search does not move down the error gradient as in the preceding back-propagation method but along a direction that is conjugate to the preceding step. The change in gradient is taken as orthogonal to the preceding step, with the advantage that the function minimization carried out in each step is fully preserved owing to the lack of any interference from subsequent steps.
For each iteration $k$, we determine the constant $\alpha_k$ that minimizes the error function by a line search, where $d_k$ is the search direction at iteration $k$. Then we choose a new direction vector, where $n$ is the number of iteration steps.
This is an altered conjugate gradient equation.
The modified conjugate gradient algorithm based on this equation possesses the property of quadratic termination. This is shown by the fact that, for a given quadratic function $f(x)$ and a perfect line search, the direction generated by the new method is identical to the one obtained by the Fletcher-Reeves conjugate gradient and the DFP methods.
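Quadratic termination can be illustrated with the standard Fletcher-Reeves method, which the text uses as the point of comparison. The sketch below is not the authors' altered equation, which is not available in the text; the matrix, vector and starting point are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fletcher_reeves(A, b, x, tol=1e-12):
    """Minimize 0.5*x'Ax - b'x with exact line searches; returns (x, iterations)."""
    g = [gi - bi for gi, bi in zip(matvec(A, x), b)]    # gradient A x - b
    d = [-gi for gi in g]                                # first direction: steepest descent
    for k in range(len(b)):
        if dot(g, g) < tol:
            return x, k
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                  # exact minimizer along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / dot(g, g)             # Fletcher-Reeves coefficient
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, len(b)


# Hypothetical symmetric positive-definite quadratic in two variables.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_min, steps = fletcher_reeves(A, b, [2.0, 1.0])
print(x_min, steps)
```

For a quadratic in two variables the minimizer, here the solution of $Ax = b$, is reached after at most two line searches up to rounding error, which is the quadratic termination property.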

RESULTS AND DISCUSSION
The modified model was trained on 52704 input-output data records using the modified methods mentioned above. In this work, various parameters of the model, some ameliorated neural network algorithms, the number of iterations, the initial normalized values for the input layer, etc., were tested. The configuration of the model, the number of iterations needed to achieve an overall mean square error of 10^-31, and the CPU time required on a laptop with an Intel Core i5 processor are given in Table 1 for warning times of 1 and 2 days. In addition, the maximum error (ME1), the minimum error (ME2), the average error (AE), the normalized maximum and minimum values (ME3 and ME4), and the maximum and minimum values of η (ME5, ME6) and α (ME7, ME8) are given in Table 2. The values of R were approximately 1. All global error values were less than 10^-31, so convergence of the global error is satisfied.

CONCLUSIONS
The major aim of this work is to study, test, explore and demonstrate the potential of the semiparametric regression model, together with artificial neural networks, for modeling and forecasting flood water levels. The adjustment of the synaptic weights was quicker in the smaller network, with the mean square error dropping sharply until it reached the maximum acceptable value defined by the user. It is interesting to observe that, as occurred in this case, performance is sometimes not improved when the number of neurons is increased. For this reason, it is worth testing the network several times if a solution is not found in the first training exercise. When suitable sigmoidal functions are used for the hidden units, the speed of computation increases markedly. The neural networks usually fit the experimental data with high accuracy and plausibility.
Furthermore, simulation is a widely accepted tool in systems design and analysis. Because its basic concepts are easily understood, it has become a powerful decision-making instrument. The results show that a semiparametric regression model, along with artificial neural network models, is capable of modeling and forecasting flood water levels, especially for short warning times. The precision of the estimates depends on the quality of the information used to train the model. It is possible to create flexible, non-linear models that adhere to experimental data better than traditional models. Moreover, it is possible to acquire and store knowledge in a dynamic configuration, creating models that can be constantly updated for different situations. In short, the simulations carried out, using real data from various tests, demonstrated that a semi-parametric regression model, together with artificial neural networks, can be a very useful tool for modeling and forecasting spatio-temporal flood water levels. The new semi-parametric regression model will continue to be developed and applied in practice, with emphasis on the studied area.