Feature subset selection in power system dynamic stability assessment using artificial neural networks

This paper presents a method of feature subset selection for power system dynamic stability assessment (DSA) using artificial neural networks (ANN). When ANNs are applied to power system DSA, feature subset selection aims to reduce the number of training features and, with them, the computational cost and memory requirements. The major challenge is to reduce the number of features while keeping the classification rate high. This paper proposes applying the Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Sequential Forward Floating Selection (SFFS) and Feature Ranking (FR) algorithms to feature subset selection. The effectiveness of the algorithms was tested on the GSO-37bus power system. With the same number of features, the calculation results show that the SFS algorithm yielded a higher classification rate than the FR and SBS algorithms, and the same classification rate as the SFFS algorithm.


INTRODUCTION
Modern power systems are forced to operate under highly stressed conditions, closer to their stability limits. Operating power systems has become increasingly challenging because investment in generation and transmission has not kept pace with load demand. In operation, the power system constantly faces unusual circumstances such as a generator outage, loss of a line, sudden dropping of a large load, switching of a station or substation, or a sudden three-phase short circuit. Power system stability is the ability to regain an equilibrium state after being subjected to a physical disturbance and to maintain the continuous supply of electricity to customers. Power system stability is classified into rotor angle stability, frequency stability and voltage stability [1]. Rotor angle stability is divided into two categories: short-term and long-term. Short-term angle stability, considered transient (dynamic) stability, is an important contributor to power system stability. Long-term angle stability includes small-signal stability and frequency stability.
Due to the complexity of the power system, traditional methods of power system analysis take so much time that they cause delays in decision making. Moreover, the relationship between the pre-fault parameters of the power system state and the post-fault stability modes is highly nonlinear and extremely difficult to describe mathematically. To overcome these difficulties, an intelligent system, namely the ANN, has been proposed for DSA thanks to its special abilities in pattern classification [2], [6], [7]. Operating conditions of power systems cover a wide range, so it is difficult to perform online calculations. The ANN needs initial off-line data for training: extensive off-line simulation is performed to acquire a training set large enough to represent the different operating conditions of typical power systems. As a pattern classifier, once trained, a neural network not only produces extremely fast solutions but also has the ability to incorporate new patterns or new operating conditions by generalizing from the training data, improving recognition accuracy [7].

Mathematical Model of Multimachine Power System
The dynamic behavior of the generators of a power system can be described by differential equations (1) and (2) [1]; substituting (2) into (1) yields (3). The state of the power system is stable when the rotor angle deviation of any two generators does not exceed 180°, and unstable when the rotor angle deviation of any two generators exceeds 180°. The status of the power system was determined according to the rules proposed in [1], [4], [5], denoted (4).
Feature selection maps the original feature vector onto a subset z_i with a new mapping y_new,i = f_new(z_i). Thus, feature selection is the process of discarding unnecessary features and selecting a candidate subset of features that carries rich information and identifies the model with high accuracy. This process includes the following steps:
Step 1. Data generation, initial feature set selection.
Step 2. Candidate feature subset selection.
Step 3. Training and testing classification rate.

Data generation, initial feature set selection.
A large number of samples are generated through off-line simulation, and the stability status is evaluated for each fault under study. Data for each bus or line fault occurring in the test systems are recorded, and the samples are kept in a database. The input is the vector of system state parameters that characterize the current system state, usually called features; they can be classified into pre-fault, fault-on and post-fault features.
Fault-on features [6]: variables that characterize the state of the power system while the fault is on, such as changes in nodal powers and in transmission line power flows, and voltage drops at the nodes at the instant of the fault (Pflow, Qflow, Pload, Qload, Vbus, ...).
Post-fault features [4]: variables that describe system dynamic behavior after the disturbance occurs, such as relative rotor angle, rotor angular velocity, rotor acceleration, rotor kinetic energy and the dynamic voltage trajectory. The problem of transient stability is usually divided into two main categories: assessment and prediction. Transient stability assessment usually focuses on the critical clearing time (CCT). In transient stability prediction, the CCT is not of interest [11]; instead, the progress of the power system transient caused by the disturbance is monitored. The key question in transient stability prediction is whether the transient swings are finally 'Stable' or 'Unstable' [3], [10]-[12]. The output variables represent the stability condition of the power system. Because fast DSA must indicate whether the post-fault power system is stable or unstable, the output is assigned a binary label y in {[1 0], [0 1]}: class 1, [1 0], is the stable class and class 2, [0 1], is the unstable class.
Using post-fault variables can take too long for operators to apply timely remedial actions to stop the extremely fast development of transient instability.
It was found that pre-fault input features provide too weak a signal for learning from the sampled dataset, while post-fault input features delay the warning of power system instability. Fault-on input features were proposed in [6] to overcome these drawbacks, since the changes in the values of the input variables provide a clear signal for dataset learning. Therefore, this paper uses fault-on input features (Vbus, Pload, Qload, Pflow, Qflow) as the database for training the neural networks.
The output variables represent the dynamic behavior of the power system during the fault. By observation of the off-line simulations, these binary output variables indicate the status of the power system according to rule (4).
The quantitative variables have different units of measurement, and variables whose values lie in different ranges will affect the recognition results. Data are commonly normalized according to the formula x'_i = (x_i - m_i)/σ_i, where m_i is the mean value of the data and σ_i is the standard deviation of the data.
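The normalization above can be sketched as follows (a minimal numpy illustration; the function name and sample array are hypothetical):

```python
import numpy as np

def normalize(X):
    """Z-score normalization: subtract the per-feature mean m_i and
    divide by the per-feature standard deviation sigma_i."""
    m = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - m) / s

# toy sample matrix: rows are samples, columns are features
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Xn = normalize(X)  # each column now has mean 0 and standard deviation 1
```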

Candidate feature subset selection.
This step is the process of searching for candidate feature subsets. Search strategies are divided into global search and local search. A global search strategy has the great advantage of yielding an optimal result, but at an expensive computation time; therefore, it is not appropriate when the number of input variables is large. With a large input feature set, a local search strategy spends less time searching because the search process does not traverse the entire search space.

Local optimization search strategies
-Sequential Forward Selection-SFS [8]:
The SFS method begins with an empty set (k=0) and adds one feature at a time, so that the new subset with (k+1) features maximizes the cost function J(k+1). It stops when the selected subset reaches the desired number of features d.
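As a sketch, the SFS loop can be written as below; the cost function J here is a toy additive score for illustration, not the scatter-matrix criterion described later in the paper:

```python
def sfs(features, J, d):
    """Sequential Forward Selection: start from the empty set and greedily
    add the single feature that maximizes the cost function J, until the
    selected subset reaches the desired size d."""
    selected = []
    remaining = list(features)
    while len(selected) < d:
        best = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# toy cost function: the subset score is just the sum of feature indices
J = lambda subset: sum(subset)
print(sfs(range(5), J, 2))  # greedily picks 4, then 3 -> [4, 3]
```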

-Sequential Backward Selection-SBS [8]:
The SBS method begins with the full set of D input features (k=D) and removes one feature at a time, so that the resulting subset with (k-1) features maximizes the cost function J(k-1). The algorithm stops when the resulting feature set reaches the desired number of features d.

-Sequential Forward Floating Selection-SFFS [8]:
The SFFS method extends SFS: after each forward step, it tries to backtrack by removing one feature at a time, as in SBS, to find a better subset. The algorithm terminates when the size of the current feature set exceeds the desired number of features d.
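A compact sketch of the floating strategy follows; as above, the cost function is assumed to be a simple additive score for illustration only:

```python
def sffs(features, J, d):
    """Sequential Forward Floating Selection (sketch): after every forward
    step, conditionally remove (backtrack) features as long as removal
    improves the cost function J, then continue adding."""
    selected = []
    remaining = list(features)
    while len(selected) < d:
        # forward step: add the single feature that maximizes J
        best = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        # backward (floating) step: drop a feature only if that improves J
        while len(selected) > 2:
            worst = max(selected, key=lambda f: J([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if J(reduced) > J(selected):
                selected = reduced
                remaining.append(worst)
            else:
                break
    return selected
```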

-Feature Ranking-FR [2], [4]:
This is a simple method that uses little computing time. The cost function is evaluated for each single feature; the features are then ranked from best to worst, and the best ones are selected.
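A minimal sketch of Fisher-distance ranking for a two-class problem (the function name and the sample data are illustrative):

```python
import numpy as np

def fisher_rank(X1, X2):
    """Rank single features by the Fisher distance
    F(k) = (m1_k - m2_k)^2 / (s1_k^2 + s2_k^2),
    where m_ik and s_ik^2 are the per-class mean and variance of feature k."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v1, v2 = X1.var(axis=0), X2.var(axis=0)
    F = (m1 - m2) ** 2 / (v1 + v2)
    return np.argsort(F)[::-1], F  # feature indices, best first

# toy data: feature 0 separates the classes well, feature 1 does not
X1 = np.array([[0.0, 0.0], [0.2, 1.0]])   # class 1 samples
X2 = np.array([[5.0, 0.1], [5.2, 0.9]])   # class 2 samples
order, F = fisher_rank(X1, X2)
```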

Cost function [8, 9]
Let the n data samples be x_1, ..., x_n, with global mean m, and let S_i and m_i denote the covariance matrix and mean of class i with prior probability P_i. The within-class scatter matrix is S_w = sum_i P_i S_i.
The between-class scatter matrix, which describes the scatter of the class means about the total mean, is S_b = sum_i P_i (m_i - m)(m_i - m)^T. S_m = S_w + S_b is the covariance matrix of the feature vector with respect to the global mean; its trace is the sum of the variances of the features around the global mean. The goal is to find a feature subset for which the within-class spread is small and the between-class spread is large. The cost function is J = trace(S_m)/trace(S_w) (13). Formula (14), written for the k-th single feature with two classes, is the Fisher distance function: F(k) = (m_1k - m_2k)^2 / (s_1k^2 + s_2k^2), where m_ik and s_ik^2 are the mean and variance of feature k in class i. A larger value of J means the feature is more important.
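The scatter-matrix criterion can be sketched in numpy as follows (samples within each class are weighted equally; `cost_J` is a hypothetical name):

```python
import numpy as np

def cost_J(X, y):
    """Scatter-matrix cost J = trace(Sm) / trace(Sw) for a candidate
    feature subset, where Sw is the within-class scatter (prior-weighted
    class covariances) and Sm = Sw + Sb is the mixture scatter."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    m = X.mean(axis=0)  # global mean
    # within-class scatter: prior-weighted sum of class covariance matrices
    Sw = sum(p * np.cov(X[y == c].T, bias=True)
             for c, p in zip(classes, priors))
    # between-class scatter: spread of the class means about the global mean
    Sb = sum(p * np.outer(X[y == c].mean(axis=0) - m,
                          X[y == c].mean(axis=0) - m)
             for c, p in zip(classes, priors))
    Sm = Sw + Sb
    return np.trace(Sm) / np.trace(Sw)

# toy two-class data: well-separated clusters give a large J
X = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 9.0], [9.0, 8.0]])
y = np.array([0, 0, 1, 1])
j = cost_J(X, y)
```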

Training and testing classification rate
To test the studied methods without loss of generality, the database is randomly partitioned into k subsets D_1, D_2, ..., D_k of equal size. The model is trained on all subsets except one, which is held out to measure validation accuracy. Training and testing are performed k times. The validation accuracy, or classification rate, is computed for each of the k validation sets and averaged to obtain the final cross-validation accuracy. The classification rate for training or testing is determined by formula (15): rate = n_r/N x 100%, where n_r is the number of training or testing samples classified correctly and N is the total number of training or testing samples.
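The k-fold procedure can be sketched as below; `train_and_test` stands for any classifier training routine (a hypothetical callable, not the paper's MLFN):

```python
import numpy as np

def cross_val_rate(X, y, train_and_test, k=5, seed=0):
    """k-fold cross-validation: split the data into k equal parts, train on
    k-1 parts, test on the held-out part, and average the classification
    rate (n_right / N * 100%) over the k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    rates = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        y_pred = train_and_test(X[train], y[train], X[test])
        rates.append(100.0 * np.mean(y_pred == y[test]))
    return np.mean(rates)
```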
The expected value (EV) of classification rate of the model was proposed in [6] by the formula (16):

Training and testing classification rate and feature subset evaluation
The feature subset selection algorithms described above were applied to select feature subsets. Each feature subset was trained and tested, and the classification rates were calculated by formula (15).
A feature subset is selected on the conditions that it has a smaller number of features, satisfies formula (16) and achieves a higher classification rate.

Feature set, samples for training
Off-line simulation was implemented to collect data for training. In this study, the GSO-37bus system, a standard system in the PowerWorld 17 simulation software [5], was used as the case study.

Results of feature subset selection
In this paper, four search algorithms, SFS, SBS, SFFS and FR, were applied to feature subset selection. The SFS, SBS and SFFS algorithms had been applied in [2]; the objective function (13) was used for these three algorithms in this study. The FR algorithm had been applied in [2], [4] with the Fisher distance function (14). Figure 1 shows the distance measure values obtained by the SFFS, SFS and SBS algorithms. Figure 2 shows the Fisher distance measure values of each single feature, ranked from largest to smallest.

Results of training
The MLFN had three layers: one input layer, one hidden layer and one output layer. The hidden layer had 10 neurons with the tansig activation function; the purelin activation function was used for the output layer. Levenberg-Marquardt optimization was selected as the weight and bias updating algorithm. These functions are supported in the neural network toolbox of Matlab R2011b. The programs were run on a laptop with an Intel Core i3-380M CPU, 2GB DDR3 memory and a 500GB HDD.
Figure 3 shows the classification rates when testing the feature subsets obtained by the algorithms with the MLFN, and Figure 4 shows the corresponding classification rates with the LC. The SFS and SFFS algorithms computed the same distance measure values, with only very small differences at the subsets with 13 and 20 features; as Figure 3 shows, the classification rates of the SFS and SFFS algorithms are the same. The SFFS and SFS algorithms give better results than the SBS and FR algorithms: the classification rates of SFS and SFFS are 1.3% to 2.9% higher than those of SBS, and 4.6% to 8.3% higher than those of FR. In Table 3, the SFS subset with 12 features achieved a 95% classification rate with the MLFN. Compared with the full set of 199 features, the SFS algorithm reduced the number of features 16.5 times and the training time 3.74 times; the classification rate of the full 199-feature set is 95.8%. Comparing the calculated results shows that the SFS algorithm gives the same results as the SFFS algorithm. This can be explained by the fact that in its backward search step the SFFS algorithm removes only one feature per execution, so it could not search deep enough to find