In this study, a hybrid support vector machine–artificial flora algorithm method was developed and its results were compared with those of the support vector machine–wavelet model. The Karkheh catchment area was considered as a case study to estimate river flow rates using daily discharge records from hydrometric stations located upstream of the dam over the period 2008 to 2018. The coefficient of determination, root mean square error (RMSE), mean absolute error (MAE), and Nash–Sutcliffe coefficient were used to evaluate and compare the models. The results showed that both hybrid structures provided acceptable results in river flow modeling. A comparison of the models based on the evaluation criteria and Taylor’s diagram further demonstrated that the proposed hybrid method, with a coefficient of determination of R2 = 0.924 to 0.974, RMSE = 0.022 to 0.066 m3/s, MAE = 0.011 to 0.034 m3/s, and Nash–Sutcliffe (NS) coefficient = 0.947 to 0.986, outperformed the other method in estimating the daily flow rates of rivers.
The most important concern in managing floods and preventing the ensuing economic damage and loss of life is the accurate estimation of river flow. Accordingly, the application of reliable methods for predicting river flow, so that water resources can be used in a timely manner, is gaining growing significance. 1 In other words, accurate river flow forecasting can play a vital role in water resources planning and management. However, various factors affect this phenomenon, making its analysis difficult. Hence, it is necessary to incorporate the influential factors in a model in order to estimate river flow at an acceptable level. 2,3 Today, intelligent systems are widely used for estimating nonlinear phenomena. One of the methods that has received attention in hydrology is the support vector machine (SVM) model. This model performs well, and in recent years optimization algorithms have been applied to it to increase its accuracy and reduce its error rate. In metaheuristic algorithms, because velocities with random values are added to the problem variables, the variables may inadvertently be moved out of their defined ranges. In contrast, in algorithms based on discrete variable values, the answers obtained in all iterations remain within the domain of the problem. As a result, finding a global optimum solution in some particular cases takes longer, and the search may become trapped in local optima. 4 Therefore, the artificial flora (AF) algorithm, which combines continuous and discrete optimization, has been developed for large-scale problems to shorten the time required to reach a global optimal solution and to avoid becoming trapped in local optima. This algorithm has an acceptable ability to solve nonlinear, high-dimensional problems at an appropriate convergence speed. With these points in mind, this study combines the AF algorithm with SVM.
In recent years, a number of studies have attempted to present smart hybrid models for forecasting river flow rate. In the following, some cases are presented.
Huang et al 5 predicted monthly flow at the Huaxian Station in China using an SVM, and their results confirmed the high accuracy of the proposed model. Sedighi et al 6 modeled the rainfall-runoff process in the Rudak catchment area in northeastern Tehran with artificial neural networks (ANNs) and SVM, using 92 MODIS sensor images from the period 2003-2005. They demonstrated the acceptable performance of the SVM model in estimating runoff. In another study, Ghorbani et al 7 used support vector machines and ANNs to predict the daily flow of the Cypress River in Texas. They employed the correlation coefficient and root mean square error (RMSE) to evaluate the models and showed that the SVM performed well in predicting river flow and was more accurate than ANNs. Samadianfard et al 8 proposed a hybrid model comprising SVM regression and the fly algorithm and compared its performance with that of a decision tree model in estimating the flow of the Dubai River and Venar, located in Iran. The proposed hybrid model proved superior in that research. Having employed support vector machine and decision tree models to predict the monthly flow of the Swat River in Pakistan, Adnan et al 9 showed the effectiveness of the SVM model. Rajaee et al 10 combined the wavelet transform with SVM, neuro-fuzzy, ANN, and genetic programming models to predict the daily flow of the Danube River in Serbia. The results showed that the hybrid wavelet-SVM model had lower errors than the other hybrid models. Alizadeh et al 11 examined a hybrid wavelet-SVM model for predicting the daily flow of the Souris River in the northern United States and observed the efficiency and accuracy of the proposed model. Hussain and Ahmed Khan 12 studied the prediction of the flow of the Hunza River in Pakistan using SVM, ANN, and random forest models. The results showed that the random forest model performed best.
The Karkheh catchment area is generally considered one of the most important watersheds in Iran. Its rivers constitute the major source of water supply for agriculture and drinking in the adjacent areas. However, the drastic reduction in their flow makes it more necessary than ever to simulate river flow in this basin and to propose water management measures. Therefore, the aim of this study was to predict the daily flow of the Karkheh catchment rivers using a hybrid model comprising SVM and AF and to compare the results with those of the hybrid SVM-wavelet model.
Materials and Methods
The studied region
The Karkheh basin, with an area of 51 640 square kilometers, is located in southwestern Iran between 30° and 35° N and 46° and 49° E. The Karkheh catchment is part of the Persian Gulf catchment area and is bounded in the north by the Sirvan, Sefidrood, and Qarachai river basins; in the west by the Iran-Iraq border area; in the south by a part of the western borders of the country; and in the east by the Dez River. The Karkheh river is 900 km long and is the third largest river in the country in terms of average annual discharge (8.5 billion cubic meters). Figure 1 shows the selected stations of the Karkheh catchment area, whose records contained no missing or non-homogeneous data. The data were obtained from the Lorestan Regional Water Company and the Khuzestan Water and Electricity Organization (Table 1).
Support vector machine
SVM was developed in the early 1990s by Vapnik 13 and Misra et al. 14 The support vector machine embodies the structural risk minimization (SRM) principle, which minimizes the expected error of a learning model, reduces the overfitting problem, and enables better generalization. 13 It is an efficient learning system based on constrained optimization theory that applies the principle of structural error minimization and leads to an overall optimum solution. In the regression setting, SVM estimates a function that relates the dependent variable Y to the independent variable X. As in other regression problems, the relation between the independent and dependent variables is assumed to be determined by a function of the form given below 15
Here, Wᵀ is the coefficient vector, b is the constant term of the regression function, and ∅ is the kernel function, whose form is given below. These parameters are then refined by training the support vector model on the collected data. 16 To calculate W and b, the error function of the ε-SVM must be minimized (equations (3) and (4)) 13
where C is a positive real constant that determines the penalty applied to deviations in the model training error, ∅ is the kernel function, N is the number of samples, and ξi and ξi* are slack variables. The support vector machine function can be rewritten as follows
where the coefficients of the expansion are the Lagrange multipliers and ∅(x) is evaluated in the feature space. 13 To solve the problem, a common approach in the support vector model is to use a kernel function
Different kernel functions have been used to construct ε-SVM models. They include the polynomial kernel, radial basis functions (RBFs), and the linear kernel, which, owing to their popularity and widespread use, 17,18 were employed in this study. Of note, the SVM calculations were coded in MATLAB and the model parameters were optimized.
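As a point of reference, and assuming that the model follows the standard ε-insensitive support vector regression formulation, the relations described in this subsection can be sketched as follows. The symbols $\xi_i$, $\xi_i^{*}$, $\alpha_i$, $\alpha_i^{*}$, $K$, and $\gamma$ are conventional notation adopted here as an assumption, and $\phi$ stands for the mapping written as ∅ in the text.

```latex
\begin{aligned}
f(x) &= W^{T}\phi(x) + b, \\
\min_{W,\,b,\,\xi,\,\xi^{*}}\quad & \frac{1}{2} W^{T} W + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^{*}\right), \\
\text{subject to}\quad & y_i - W^{T}\phi(x_i) - b \le \varepsilon + \xi_i, \\
& W^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0,\quad \xi_i^{*} \ge 0, \quad i = 1, \dots, N, \\
f(x) &= \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b,
\qquad K(x_i, x) = \phi(x_i)^{T}\phi(x), \\
K_{\mathrm{RBF}}(x_i, x) &= \exp\!\left(-\gamma \,\lVert x_i - x \rVert^{2}\right).
\end{aligned}
```

In this form, C trades off model flatness against training deviations larger than ε, and γ controls the width of the RBF kernel, so ε, C, and γ are the 3 quantities that a parameter search has to tune.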
Algorithm of AF
Plants disperse their seeds in different ways, which are divided into autochory and allochory. Autochory is the self-dispersal of seeds, while allochory is the dispersal of seeds by external forces. Autochory allows flora to migrate independently to a suitable environment, whereas allochory allows migration to distant regions. This variety of dispersal methods reduces the probability of plant extinction. Harsh natural conditions and competition may reduce the distribution of a flora, and after migrating to a new environment, flora species continue to evolve. 19 Migration can therefore change the distribution region of a flora and lead to its development, extinction, and re-emergence. Plants cannot move and have no intelligence, yet a flora can still find the best place to live. During migration and reproduction, plants scatter seeds randomly, and a seed can remain viable for some time. Plants that survive scatter seeds in their surroundings, develop, and adapt to the environment even under harsh conditions. Before a flora becomes extinct in one region, its seeds may take root in a new region and multiply there over successive generations. In this way, a flora finds an optimum region through growth, development, extinction, and regrowth. 19
AF algorithm theory
The AF algorithm is composed of 4 main elements: main flora, child flora, flora position, and distribution distance. Child flora are the seeds of the main flora and cannot themselves distribute seeds. Distribution distance is the distance over which seeds are dispersed. There are 3 behavior patterns: development behavior, distribution behavior, and selection behavior. 20-22 Development behavior means that flora develop so as to adapt to their environment. 23-25 Distribution behavior refers to the movement of seeds, which can occur through allochory or autochory. Selection behavior determines the survival or extinction of flora for environmental reasons. Figure 2 shows a flowchart of the AF algorithm.
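The interplay of the 3 behaviors can be illustrated with a minimal Python sketch. It is a simplified reading of the description above, not the published AF update rules and not the MATLAB code used in this study: the function name `artificial_flora`, the learning coefficients `c1` and `c2`, the initial distance of 10% of each range, and the roulette-style survival rule are all assumptions made for illustration.

```python
import math
import random


def artificial_flora(fitness, bounds, n_plants=5, n_seeds=10, iters=60,
                     c1=0.75, c2=1.25, seed=0):
    """Minimize `fitness` inside the box `bounds` with a simplified AF-style loop."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value, low, high):
        return max(low, min(high, value))

    # Main flora: random positions plus "grandparent" and "parent" distances.
    flora = []
    for _ in range(n_plants):
        pos = [rng.uniform(lo, hi) for lo, hi in bounds]
        dist = [0.1 * (hi - lo) for lo, hi in bounds]  # assumed starting distance
        flora.append({"pos": pos, "fit": fitness(pos),
                      "d_grand": dist, "d_parent": dist})

    best = min(flora, key=lambda p: p["fit"])
    history = [best["fit"]]

    for _ in range(iters):
        children = []
        for plant in flora:
            # Development: the new distribution distance mixes both older distances.
            d_new = [plant["d_grand"][j] * rng.random() * c1
                     + plant["d_parent"][j] * rng.random() * c2
                     for j in range(dim)]
            for _ in range(n_seeds):
                # Distribution: scatter a seed around the main flora.
                pos = [clip(plant["pos"][j] + rng.gauss(0.0, d_new[j]), *bounds[j])
                       for j in range(dim)]
                step = [abs(pos[j] - plant["pos"][j]) for j in range(dim)]
                children.append({"pos": pos, "fit": fitness(pos),
                                 "d_grand": plant["d_parent"], "d_parent": step})

        # Selection: fitter seeds are more likely to survive (simplified rule).
        worst = max(c["fit"] for c in children)
        best_child = min(children, key=lambda c: c["fit"])
        span = worst - best_child["fit"]
        survivors = [c for c in children
                     if span == 0
                     or rng.random() < math.sqrt((worst - c["fit"]) / span)]

        # Keep the best solution found so far so that it is never lost.
        if best_child["fit"] < best["fit"]:
            best = best_child
        rng.shuffle(survivors)
        flora = [best] + survivors[:n_plants - 1]
        history.append(best["fit"])

    return best["pos"], best["fit"], history


# Usage on a toy problem: minimize the 2-D sphere function.
sphere = lambda p: sum(x * x for x in p)
best_pos, best_val, history = artificial_flora(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_pos, best_val)
```

To tune an SVM in the same way, `fitness` would take a candidate parameter vector such as (C, ε, γ) and return the validation error of the model trained with those values; that wrapper is not shown here.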
The wavelet transform was introduced as an alternative to the Fourier transform, and its purpose is to overcome the loss of frequency resolution that occurs over short time intervals. In the wavelet transform, as in the short-time Fourier transform, the signal is divided into windows. 26 The most important difference between these 2 methods is that the wavelet transform works with scale rather than frequency. In the wavelet transform, the signal is expanded at high scales and thus its details can be analyzed. 27 A wavelet means a small wave; it is a small part of the main signal whose energy is concentrated in time. The main signal can be decomposed into wavelets at different scales. Wavelets are the translated and dilated versions of an oscillating mother function. Based on these properties of wavelets, time series can be analyzed with the continuous wavelet transform (CWT). 28 The wavelet transform is defined in both continuous and discrete forms.
Continuous wavelet transform is defined based on equations (11) and (12) as follows 27
Equation (12) is a function of the 2 variables s and τ, where s is the scale parameter and τ is the translation parameter. In addition, * denotes the complex conjugate, Ψ is the window function known as the mother wavelet, and its translated and scaled version is the daughter wavelet. 28 The term “mother” is used because all the translated and dilated (daughter wavelet) versions are obtained from this function. The mother wavelet thus serves as a prototype for the other windows, and the transform represents the inner product of 2 functions in the signal space.
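A direct discretization of the CWT can make the roles of s and τ concrete. The sketch below assumes the common definition in which the signal is multiplied by the conjugated, translated, and scaled mother wavelet, summed over time, and divided by the square root of s. It uses the Mexican hat wavelet, which is real valued, so the conjugate has no effect. The function names `mexican_hat` and `cwt` and the unit sampling step `dt` are illustrative assumptions.

```python
import math


def mexican_hat(t):
    """Mexican hat mother wavelet, normalized to unit energy."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt(signal, scales, dt=1.0):
    """Discretized CWT: W(s, tau) = (1 / sqrt(s)) * sum_t x(t) * psi((t - tau) / s) * dt.

    Returns one row of coefficients per scale, with tau running over the
    sample positions of the signal.
    """
    coeffs = []
    for s in scales:
        row = []
        for k in range(len(signal)):  # translation tau = k * dt
            total = 0.0
            for i, x in enumerate(signal):
                total += x * mexican_hat((i - k) * dt / s)
            row.append(total * dt / math.sqrt(s))
        coeffs.append(row)
    return coeffs


# Usage: decompose a short synthetic series (hypothetical values) at 3 scales.
series = [math.sin(0.3 * i) + 0.5 * math.sin(1.2 * i) for i in range(64)]
sub_signals = cwt(series, scales=[1.0, 2.0, 4.0])
print(len(sub_signals), len(sub_signals[0]))
```

Each row of `sub_signals` is a sub-signal at one scale. Because the wavelet has zero mean, a constant signal produces coefficients close to zero away from the edges, which is why the transform isolates fluctuations rather than the overall level.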
In this study, to evaluate the accuracy and efficiency of the models, the coefficient of determination (R2), RMSE, mean absolute error (MAE), and Nash-Sutcliffe (NS) coefficient were used according to the given relationships. 29 The best values for these 4 criteria are one, zero, zero, and one, respectively
In the above relations, xi and yi are the observational and computational values in the ith temporal step, respectively; N is the number of temporal steps; and x̄ and ȳ are the means of the observational and computational values, respectively.
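The 4 criteria can be computed with a few lines of Python. The sketch below assumes the usual definitions: R2 as the squared Pearson correlation between observed and computed values, RMSE and MAE as the root mean square and mean absolute differences, and NS as one minus the ratio of the squared errors to the variance of the observations about their mean. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(obs, sim):
    """Return R2, RMSE, MAE, and NS for observed (x_i) and computed (y_i) series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return {
        "R2": cov ** 2 / (var_o * var_s),   # squared correlation coefficient
        "RMSE": math.sqrt(sq_err / n),
        "MAE": sum(abs(o - s) for o, s in zip(obs, sim)) / n,
        "NS": 1.0 - sq_err / var_o,         # Nash-Sutcliffe efficiency
    }


# Usage with hypothetical flow values (m3/s).
observed = [1.0, 2.0, 3.0, 4.0]
computed = [1.1, 1.9, 3.2, 3.8]
print(evaluate(observed, computed))
```

A perfect model gives R2 = 1, RMSE = 0, MAE = 0, and NS = 1, which matches the best values listed above. Both series must vary, otherwise the denominators of R2 and NS are zero.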
Results and Discussion
Selecting the combination of input variables is an important step in modeling. Hence, the cross-correlation between input and output variables was calculated, and input parameters were selected to obtain an optimum model for predicting the flow rate of the rivers of the Karkheh catchment. The results are shown in Table 2. In Table 3, the Q(t − 1), Q(t − 2), Q(t − 3), and Q(t − 4) columns show river flow at times t − 1, t − 2, t − 3, and t − 4, and Q(t) shows river flow at time t. Adding lagged inputs helps capture the nature of the mechanism, but it also increases pattern complexity and memory, while model precision may decrease. To model the river flow, most of the available data were used as training data. This study investigated the effect of lagged flows on streamflow. The cross-correlation between input and output data was higher than 0.750, and different combinations of input parameters were used to identify the optimum model for the Karkheh catchment. The data were obtained from the hydrometric stations of Chamanjir, Madianrod, Afrineh, Kashkan, Polzal, and Jologir over the years 2008-2018. A total of 2920 records were selected for training and the remaining 730 records for assessing accuracy; that is, 80% of the data were randomly selected for training and 20% for testing. 30,31 The cross-correlation between input and output variables is shown in Table 3.
Selected combinations of input parameters.
Cross-correlation between input and output variables.
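The construction of the lagged inputs, their correlation with Q(t), and the random 80/20 split described above can be sketched as follows. The helper names `make_lagged`, `lag_correlation`, and `random_split`, the fixed random seed, and the synthetic flow series are assumptions for illustration only; they are not the station data or the code used in this study.

```python
import math
import random


def make_lagged(flow, n_lags=4):
    """Return rows of ([Q(t-1), ..., Q(t-n_lags)], Q(t)) built from a flow series."""
    rows = []
    for t in range(n_lags, len(flow)):
        inputs = [flow[t - k] for k in range(1, n_lags + 1)]
        rows.append((inputs, flow[t]))
    return rows


def lag_correlation(flow, lag):
    """Pearson correlation between Q(t) and Q(t - lag), for lag >= 1."""
    x = flow[lag:]
    y = flow[:-lag]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def random_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Usage with a synthetic daily series (hypothetical values, not station data).
flow = [10.0 + 3.0 * math.sin(0.05 * day) for day in range(400)]
rows = make_lagged(flow, n_lags=4)
train, test = random_split(rows)
print(len(train), len(test), round(lag_correlation(flow, 1), 3))
```

With 3650 records, the same split yields 2920 training and 730 testing records, which matches the counts reported above.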
The results for support vector model-AF algorithm
In this study, a hybrid method comprising the SVM and the AF algorithm is proposed. The optimal values of the SVM parameters, including ε and C, were determined. Different kernels were also examined and, based on their performance, the RBF kernel was adopted owing to its higher accuracy in estimating the daily flow rate of rivers. 32,33 This kernel requires the parameter γ to be determined. Therefore, to predict the daily flow rate of rivers with the SVM model, it is necessary to calculate the optimal values of 3 parameters, namely ε, C, and γ, whose best values are determined by the AF algorithm. Among the developed models, the one with the least error was identified and its parameters were selected as the optimal values of ε, C, and γ. The AF algorithm was inspired by the migration and reproduction behavior of flora and comprises 3 main behaviors: evolution, distribution, and selection. This algorithm is able to avoid becoming trapped in a local optimal solution. It incorporates both self-dispersal (autochory) and external dispersal (allochory) behaviors. While the former searches the neighborhood of the current position for the optimum solution, the latter explores a broader space, which improves the capability of the algorithm to find the optimum solution and increases the speed of convergence to it. The results of the hybrid SVM-AF algorithm are given in Table 4. According to the table, the proposed hybrid model is more accurate and has lower error for the basin station of the catchment area, owing to the absence of base flow interference along the river. At the validation step of the model, a coefficient of determination of R2 = 0.924 to 0.974, RMSE = 0.022 to 0.066 m3/s, MAE = 0.011 to 0.034 m3/s, and Nash-Sutcliffe (NS) coefficient = 0.947 to 0.986 were achieved.
Figure 3 shows the scatter plot of the proposed hybrid model at the validation step, together with the best fit line y = x for the computed values. In this figure, the estimated and observed values, except for a few points, lie on the bisector line (y = x), indicating their equality. Also, as can be seen in the figure, the hybrid model performs acceptably in predicting the maximum and minimum values, which are close to the actual values.
Analysis of AF-support vector machine for selected stations.
The results of SVM-wavelet
To evaluate the results of the hybrid model, the input parameters were first decomposed into sub-signals using the wavelet transform, and these sub-signals were then fed to the support vector machine as inputs, constituting the combined model. One of the most important and fundamental points in this study was the examination of different wavelet functions, and it was observed that the Mexican hat wavelet performed better than the other functions. Table 5 shows the results of the hybrid model for the selected stations of the Karkheh catchment area. The table indicates that the proposed hybrid model for Chamanjir station had higher accuracy and lower error, with a coefficient of determination of R2 = 0.915 to 0.964, RMSE = 0.031 to 0.084 m3/s, MAE = 0.015 to 0.068 m3/s, and NS coefficient = 0.930 to 0.978. Figure 4 shows the scatter plot of the computed values of the wavelet-SVM model in the validation stage, together with the best fit line (y = x). In this figure, the estimated and observed values, except for a few points, lie on the bisector line (y = x), indicating their equality. Also, as observed in the figure, the hybrid model performs acceptably in predicting intermediate values, which are close to the actual values.
Analysis of the wavelet-support vector machine for selected stations.
Comparison of the performances of the models
By considering the optimal results of each hybrid artificial intelligence model and comparing the findings, the capability of both models to simulate the flow in the Karkheh catchment area was proved (Figure 5). Figure 5 illustrates the diagrams of the observed and calculated values for the studied models with respect to time in all of the studied stations. As observed earlier, the SVM-AF optimization algorithm model has shown an acceptable ability to estimate the minimum and maximum values. Moreover, the SVM-wavelet model exhibits appropriate performance in estimating the intermediate values such that they will be close to the observed values. Figure 6 displays the diagrams of relative error of the studied models with respect to the observed values. In this figure, the SVM-AF optimization algorithm has lower error than the SVM-wavelet such that the relative error values of the latter model are higher for all of the studied stations. 34,35
Taylor diagrams were used to analyze and evaluate the models used in the study, as shown in Figure 7. A clear advantage of Taylor’s diagram is that it combines 2 common statistics: the correlation coefficient and the standard deviation. 36 The closer the predicted values are to the observed values in terms of correlation coefficient and standard deviation, the better the prediction. Taylor’s diagram shows that the AF-SVM model has the highest efficiency and performance, because its predicted standard deviation is closest to the standard deviation of the observational data and its correlation coefficient is the highest. According to all the evaluation criteria, the AF-SVM model has the highest predictive power and the WSVM model has the lowest.
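The statistics that a Taylor diagram places on one plot can be computed directly. The sketch below assumes population standard deviations and the centered root mean square difference, which is linked to the other 2 quantities by the law of cosines that underlies the diagram's geometry. The function name `taylor_statistics` and the sample values are hypothetical.

```python
import math


def taylor_statistics(obs, sim):
    """Return (std_obs, std_sim, correlation, centered RMS difference)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    corr = sum((o - mean_o) * (s - mean_s)
               for o, s in zip(obs, sim)) / (n * std_o * std_s)
    # Centered RMS difference: the means are removed before comparing.
    crmsd = math.sqrt(sum(((s - mean_s) - (o - mean_o)) ** 2
                          for o, s in zip(obs, sim)) / n)
    return std_o, std_s, corr, crmsd


# Usage with hypothetical observed and predicted flows.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.8, 3.5, 3.9, 5.4]
std_o, std_s, corr, crmsd = taylor_statistics(observed, predicted)

# Law of cosines: crmsd^2 = std_o^2 + std_s^2 - 2 * std_o * std_s * corr
print(crmsd ** 2, std_o ** 2 + std_s ** 2 - 2.0 * std_o * std_s * corr)
```

A model point lies close to the observation point on the diagram when its standard deviation is close to that of the observations and its correlation is high, which is the criterion applied above to rank the models.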
In this study, an attempt was made to evaluate the performance of the models in simulating the daily flow of rivers in the Karkheh catchment area using data from hydrometric stations. The employed models were the SVM-AF hybrid model and the SVM-wavelet model. The observed flow values were compared with the predicted values using the evaluation criteria. The research results can be summarized as follows: both models, namely the SVM-AF hybrid model and the SVM-wavelet model, achieved better results with structures consisting of 1 to 4 time lags than with other structures. Also, according to the evaluation criteria, it was concluded that both models could predict the daily flow rates of the rivers with relatively high accuracy. Meanwhile, the proposed SVM-AF hybrid model showed higher accuracy and lower error. Taylor’s diagrams also showed that this hybrid model was more accurate. In general, the high accuracy of the hybrid model can be attributed to the optimization of the parameters of the support vector machine model by the AF algorithm, which in turn could be due to the capability of the algorithm to find the optimal location and its increased convergence speed. Overall, this study supported the effectiveness of the combined SVM-AF model in predicting the daily flow of rivers. Given that decisions on exploiting water resources and implementing management strategies for many uses (especially agriculture and industry) depend on the accurate estimation of river flow, the proposed hybrid model can be an appropriate tool for managerial decision making. It is recommended that hybrid models of SVM with new optimization algorithms such as innovative gunner, ski, chicken swarm, and cat swarm be developed and their results compared. Moreover, the model proposed in this study can be applied to other hydrological phenomena.