Local descriptors based random forests for human detection

This paper presents a framework based on random forests that uses local feature descriptors to detect humans in images from a dynamic camera. The contribution addresses two issues in human detection against a variety of backgrounds. First, it presents local feature descriptors based on multi-scale Histograms of Oriented Gradients (HOG) to improve the accuracy of the system. Using multi-scale HOG as local feature descriptors yields an extensive feature space from which highly discriminative features can be obtained. Second, a detection system based on a cascade of Random Forests (RF) is used for training and prediction. In this case, the parameters of the binary decisions in the decision forest are optimized using the linear support vector machine (SVM) technique. Finally, the detection system based on cascade classification is presented to reduce the computational cost.


INTRODUCTION
In recent years, human detection using vision sensors has become a key task in a variety of applications, with potential influence on modern intelligent systems, knowledge integration, and management in autonomous systems [1,2]. However, the detection procedure faces many challenges, such as varied articulated poses, appearances, illumination conditions, complex backgrounds in outdoor scenes, and occlusion in crowded scenes. To date, several successful methods for object detection have been proposed. The state of the art in human detection was presented by Dollar et al. in [3]. The standard approach investigated Haar-like features with an SVM classifier for object detection [4]. However, the performance of Haar-like features is limited in human detection applications [5,6] because they are sensitive to the high variety of human appearances, complex backgrounds, and changing illumination in outdoor environments.
Other authors proposed the Histograms of Oriented Gradients (HOG) descriptor [7][8][9] to deal with that problem. In another approach, Schewartz et al. [10] proposed a method that integrates whole-body detection with face detection to reduce the false positive rate. However, the person does not always face the camera, so the face is not always visible. In terms of learning algorithms for object detection, SVM and boosting methods are the most popular and have been successfully applied to classification problems.
Recently, some groups have focused on combining classification algorithms. A hybrid algorithm combining SVM with boosting techniques was proposed in order to create a better classifier that benefits from the desirable properties of both methods [11]. To improve the capability of the system, a heuristic process was added that enforces the selection of a proper subset of the training set, which avoids duplicate examples and emphasizes the probabilities of examples that are hard to learn. However, that paper did not explore the data structure relations that would allow the features fed to each SVM learner to be combined effectively. In another investigation, a system based on AdaBoost and SVM was presented for pedestrian detection [12]. The authors used an SVM in place of a cascade AdaBoost classifier layer when the number of weak classifiers in the current layer exceeded a preset threshold. That means the SVM is used only when the number of weak classifiers is larger than the threshold value, and the strengths of the SVM are lost when the number of weak classifiers is below the preset value. By contrast, a system using AdaBoost and SVM as two stages was proposed for pedestrian detection [13]. AdaBoost is first used for coarse classification, and its output is then fed to the SVM. That means the SVM is used to confirm all positive examples that pass the first stage. This method can help reduce the false alarm rate, but it also reduces the detection rate, because examples missed at the first stage cannot be recovered at the later stage. In addition, the system consumes considerable computation time because it has to solve the problem in two stages.
In contrast, this paper focuses on enhancing the accuracy and improving the speed of a pedestrian detection system by using variable-scale block-based HOG features along with a hybrid of Random Forests and SVM techniques. The Random Forests technique is used as the global system, while the SVM is used as the classifier inside the Random Forests. The input vector for each SVM is a block of the HOG feature vector. This data representation avoids duplicating common data and guarantees the independence of the SVMs in the global system.

PRELIMINARY RANDOM FOREST
Random forest (RF) is an ensemble model in machine learning that is used for classification and regression. The basic idea is to construct multiple decision trees at the training step. The prediction output is a combination of the outputs of all individual trees in the forest. In the training step, the subset of sample features for each tree is selected randomly.
Trees that are grown very deep tend to learn highly irregular patterns, which can cause the model to overfit the training data. The RF averages multiple deep decision trees, trained on different parts of the same training data, with the objective of reducing the variance.
The training algorithm for random forest applies the general technique of bootstrap aggregating to tree learners, which is summarized as follows.
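The goodness of a partition is evaluated using a purity measure called the information gain, which in its standard form for a node set $S_j$ split into left and right subsets $S_j^L$ and $S_j^R$ is

$$I(S_j, \theta) = H(S_j) - \sum_{i \in \{L, R\}} \frac{|S_j^i|}{|S_j|} H(S_j^i),$$

where the entropy $H(\cdot)$ is

$$H(S) = -\sum_{c \in Y} p(c) \log p(c).$$

The objective is to find the parameters $\theta$ of each node $j$ that maximize the information gain:

$$\theta_j^* = \arg\max_{\theta} I(S_j, \theta).$$

The ensemble prediction of the RF is presented as follows, as the average of the tree predictions:

$$p(c \mid x) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid x).$$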
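The information gain criterion and the averaging of tree outputs described in this section can be illustrated with a minimal sketch. It assumes Shannon entropy in bits and a simple average of per-tree class probabilities; the function names `entropy`, `information_gain`, and `forest_predict` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent set minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in (left, right))
    return entropy(parent) - weighted

def forest_predict(tree_probs):
    """Average per-tree class probabilities into one ensemble estimate (assumed rule)."""
    n_trees = len(tree_probs)
    classes = tree_probs[0].keys()
    return {c: sum(p[c] for p in tree_probs) / n_trees for c in classes}

labels = [0, 0, 1, 1]
print(information_gain(labels, [0, 0], [1, 1]))  # split that separates the classes
print(information_gain(labels, [0, 1], [0, 1]))  # split that leaves the classes mixed
print(forest_predict([{0: 0.2, 1: 0.8}, {0: 0.6, 1: 0.4}]))
```

A split that separates the classes cleanly receives a higher gain than one that leaves each child as mixed as the parent, which is the property the node optimization relies on.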

LOCAL DESCRIPTORS
In this contribution, a feature descriptor based on HOG features is applied [7]. The general flowchart of feature extraction is presented in the accompanying figure. The extended descriptor improves on the original HOG [7] by using multi-scale block-based HOG features. The block size is not limited to a fixed set of scales when constructing HOG features, which provides an extensive feature descriptor space and helps in obtaining highly discriminative features for high-accuracy detection. Because multiple scale levels are used, histograms of gradients are computed repeatedly around the sample region. Therefore, to speed up the system, a cumulative sum of histogram gradients method is used to compute the feature descriptor rapidly. With this method, the histogram of each oriented gradient within an arbitrary region is computed with four accesses to the cumulative sum gradient table (CS). In accordance with the characteristics of the cumulative sum table, gradients are separated into groups based on orientation, with each group organized into one table for computing cumulative sums. Each CS table is used to compute the histogram of gradients for one orientation interval, e.g., 20 degrees per group, which is known as one layer, illustrated in Fig. 5. Finally, the histogram of gradients within any block requires only four operations multiplied by the number of oriented gradient layers, e.g., 4 operations/layer × 9 layers = 36 operations for 9 groups of gradient orientations.
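The four-access lookup can be sketched as follows. This is an illustrative reading of the cumulative sum table, assuming each orientation layer is a 2D grid of gradient magnitudes and that the table carries one extra row and column of zeros; the names `cumulative_sum_table`, `region_sum`, and `block_histogram` are hypothetical.

```python
def cumulative_sum_table(layer):
    """Build an (H+1) x (W+1) table where cs[y][x] is the sum of layer rows < y, columns < x."""
    h, w = len(layer), len(layer[0])
    cs = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += layer[y][x]
            cs[y + 1][x + 1] = cs[y][x + 1] + row_sum
    return cs

def region_sum(cs, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four table accesses."""
    return cs[bottom][right] - cs[top][right] - cs[bottom][left] + cs[top][left]

def block_histogram(cs_tables, top, left, bottom, right):
    """One histogram bin per orientation layer, four accesses per layer."""
    return [region_sum(cs, top, left, bottom, right) for cs in cs_tables]

# Toy data: 2 orientation layers over a 3x3 region (the paper uses 9 layers).
layers = [
    [[1, 0, 2], [0, 3, 0], [4, 0, 1]],
    [[0, 1, 0], [2, 0, 2], [0, 1, 0]],
]
tables = [cumulative_sum_table(layer) for layer in layers]
print(block_histogram(tables, 0, 0, 2, 2))  # histogram of the top-left 2x2 block
```

Because the tables are built once per image, the cost of a block histogram does not depend on the block size, which is what makes evaluating many block scales affordable.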
In support of this argument, the HOG feature descriptor and its fast computation based on the cumulative sum of histogram gradients method are briefly presented [9]. The gradient values at each pixel in the sample image are computed by discrete derivatives. The filter kernels [-1 0 1] and [-1 0 1]^T are used to compute the discrete derivatives along the horizontal and vertical axes, respectively. $G_x$ and $G_y$ are the directional gradients along the x and y axes, respectively. The gradient magnitude and gradient orientation are computed as follows:

$$|G| = \sqrt{G_x^2 + G_y^2}, \qquad \alpha = \arctan\left(\frac{G_y}{G_x}\right)$$

The gradient magnitudes are separated into 9 tables based on their orientation angles. The unsigned orientation of the gradients (spanning 1 degree through 180 degrees, divided into 9 bins of 20 degrees each) is used to construct the histogram of oriented gradients, as depicted in Fig. 3. Each table of gradients is used to compute the cumulative sum of gradients. Finally, the 9 CS tables are used to compute the HOG blocks (HOGBs) and construct the feature vector, which is fed into training and classification. Fig. 4 presents a visualization of HOG using different sizes of basic cells. As a consequence of using multiple cell sizes, some HOGBs are highly discriminative between positive (person) and negative (non-person) regions, while many others have low discriminative power. To select the highly discriminative blocks used in the classification stage, the SVM technique is applied to each individual HOGB for training and evaluation. Only blocks for which the SVM achieves high accuracy are selected for the detection system. This preprocessing step is applied to both full-body and component detection.
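A minimal sketch of the gradient step is given below. It assumes a grayscale image stored as a list of rows, replicated border pixels, and orientation bins covering [0, 180) degrees; these choices and the name `gradient_layers` are assumptions for illustration, not details confirmed by the paper.

```python
import math

def gradient_layers(image, n_bins=9):
    """Split gradient magnitudes into n_bins orientation layers (unsigned orientation)."""
    h, w = len(image), len(image[0])
    bin_width = 180.0 / n_bins
    layers = [[[0.0] * w for _ in range(h)] for _ in range(n_bins)]
    for y in range(h):
        for x in range(w):
            # [-1 0 1] kernel along x and its transpose along y; borders are replicated (assumption).
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold into unsigned range
            b = min(int(angle // bin_width), n_bins - 1)
            layers[b][y][x] = magnitude
    return layers

# A vertical edge produces a horizontal gradient, which falls into the first bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(3)]
layers = gradient_layers(vertical_edge)
print(layers[0][1][1])
```

Each resulting layer can then be turned into its own cumulative sum table, giving one table per orientation group.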

EVALUATION
In this section, the effect of several criteria on the running time and accuracy of the RF for object detection is analyzed and tested. The training data consists of 1,500 positive samples and 1,500 negative samples. In the classification stage, the evaluation data includes 15,000 positive samples and 15,000 negative samples. Fig. 5 shows the results of 15 test runs on the same data, together with their mean values. The results show a tradeoff in the RF: a large number of trees yields high accuracy but also high computation time, and vice versa. Therefore, the number of trees in the forest is chosen according to the objective of the system, balancing the accuracy and processing time targets. Fig. 6 presents the comparison of the SVM and RF classification methods. The results indicate that the SVM achieves a higher detection rate than the RF at low false detection rates, whereas the RF achieves a higher detection rate at high false detection rates. In terms of other comparison criteria, the SVM is usually faster in the training stage and slower in the classification stage than the RF. Fig. 7 presents the comparison of our feature descriptor with the original HOG and with LBP feature descriptors using the SVM classification method. Fig. 8 presents some results of people detection.

CONCLUSION
The classification approach based on local feature descriptors and the RF framework is presented for human detection. The approach takes advantage of the fast processing of a forest of decision trees and the robustness of the SVM for estimating the optimal parameters of the split function. The classification method is based on an RF ensemble using multiple local feature descriptors. The proposed method utilizes the rich block-based descriptor. The computation of the feature descriptor over a variety of block sizes is sped up using a heuristic stored data structure.

Given a training data set (X, Y), X = {x1, …, xn} and Y = {y1, …, yn} are the samples and labels, respectively. The labels take values in a set of classes ({0,1} for binary classification). Bagging repeatedly selects a random sample, with replacement, of the training set and fits trees to these samples.
For t = 1, …, T:
(a) Randomly sample a small subset of features, called s.
(b) For each j $\in$ s:
(b-1) Split the set at node j into two subsets by the split function $h(x, \theta_j)$, where $\theta$ is the set of defined parameters of the split function, with the feature selector $\phi$.

In the ensemble prediction, $p_t$ is the decision prediction of each tree in the forest. Training a decision tree uses all training data {x} and the feature selector $\phi: R^d \to R^{d'}$ with $d' \ll d$. The trees of the forest can be processed in parallel. Because $d' \ll d$, the RF can cope with the high computation time of very high-dimensional data.
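The sampling steps of the bagging procedure can be sketched as below. Only the bootstrap sample and the random feature subset are shown; tree fitting is omitted. The helper names (`bootstrap_sample`, `sample_feature_subset`, `select_features`) and the toy sizes are hypothetical.

```python
import random

def bootstrap_sample(X, Y, rng):
    """Draw len(X) training pairs with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [Y[i] for i in idx]

def sample_feature_subset(d, d_sub, rng):
    """Pick d_sub distinct feature indices out of d, with d_sub much smaller than d."""
    return sorted(rng.sample(range(d), d_sub))

def select_features(x, subset):
    """Feature selector: keep only the chosen coordinates of x."""
    return [x[i] for i in subset]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
X = [[float(i + j) for j in range(8)] for i in range(6)]  # 6 toy samples, d = 8
Y = [0, 0, 0, 1, 1, 1]

n_trees = 3
for t in range(n_trees):
    Xb, Yb = bootstrap_sample(X, Y, rng)
    subset = sample_feature_subset(d=8, d_sub=2, rng=rng)
    reduced = [select_features(x, subset) for x in Xb]
    print(t, subset, len(reduced), len(reduced[0]))
```

Since each tree only sees its own sample and feature subset, the loop body has no dependence between iterations and could be run in parallel.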

In contrast to other approaches, the split function of each weak classifier is based on optimizing the maximum-margin hyperplane of the feature descriptor in a local patch. The ensemble of local descriptors is handled by an appropriate feature selector $\phi(x)$. Fig. 2 demonstrates the idea of the ensemble approach based on local descriptors. In this work, a set of local feature blocks is used at each node for the split function. The optimal parameter $\theta$ is found by the linear SVM learning method.
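One possible reading of an SVM-based split node is sketched below. It assumes labels in {-1, +1}, a block given as a list of feature indices, and a plain full-batch subgradient descent on the regularized hinge loss standing in for whatever linear SVM solver the authors used; `train_linear_svm`, `split`, and all hyperparameter values are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by full-batch subgradient descent on the L2-regularized hinge loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample lies inside the margin or is misclassified
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def split(x, block, w, b):
    """Split function: send x right (1) if its block lies on the positive side of the hyperplane."""
    feats = [x[i] for i in block]
    return 1 if sum(wj * fj for wj, fj in zip(w, feats)) + b > 0 else 0

# Toy data: only coordinates 2 and 3 (the local block) separate the classes.
block = [2, 3]
positives = [[0, 0, 2, 2], [9, 9, 3, 2], [1, 5, 2, 3]]
negatives = [[0, 0, -2, -2], [9, 9, -3, -2], [1, 5, -2, -3]]
samples = positives + negatives
targets = [1, 1, 1, -1, -1, -1]

block_feats = [[x[i] for i in block] for x in samples]
w, b = train_linear_svm(block_feats, targets)
print([split(x, block, w, b) for x in samples])
```

Restricting each node to one local block keeps the SVM problem low-dimensional, so many candidate blocks can be trained and compared cheaply.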

Figure 3. Gradient process based on orientation for the cumulative sum method.

Figure 4. Visualization of histograms of oriented gradients using HOG with different cell sizes.

Figure 6. Comparison of accuracy results using the SVM and RF methods.

Figure 7. Comparison of our method with the standard HOG+SVM approach.