Application of self organizing map in construction, geology and petroleum industry

Pham Son Tung, Truong Minh Huy, Pham Ba Tuan

Abstract-In recent years, Artificial Intelligence (AI) has emerged as a major subject and has been recognized as the flagship of the Fourth Industrial Revolution. AI is growing steadily and becoming vital in daily life. In particular, the Self-Organizing Map (SOM), one of the major branches of AI, is a useful tool for clustering data and has been applied successfully and widely in many areas such as psychology, economics, medicine, and technical fields like mechanical engineering, construction and geology. The primary purpose of this paper is to introduce the SOM algorithm and its practical applications in geology and construction. The results are a classification of rock facies versus depth in geology and a clustering of two data sets, construction price indices and building material cost indices.



Index Terms-Self Organizing Map, Hierarchical Clustering, Geology, Well logging, Construction Economics.

INTRODUCTION
SOM (Self-Organizing Map) is the result of an unsupervised learning algorithm: the algorithm relies on the structure of the input data to reduce the number of data dimensions or to cluster the data into different sections, without a predefined result at the output [1,2]. The result of SOM is a clustering map made of a number of nodes (cells), in which nodes with similar characteristics fall into the same field or section [2,3]. The biggest advantage of SOM is that it displays multi-dimensional (multi-characteristic) input data visually on a two-dimensional map while still retaining the essence of the original data. SOM outperforms traditional algorithms at clustering [4]. Specifically, SOM groups together data with similar tendencies or symptoms, whereas traditional algorithms only determine average values, variance, standard deviation and data frequency. Thanks to SOM, readers can therefore assess the data more visually and draw more suitable conclusions.
The primary purpose of the paper is to introduce the SOM algorithm and its practical applications in geology and construction. In geology, geophysical surveys are conducted to determine rock formation characteristics versus depth, such as density, sonic travel time, and gamma ray response. After well logging, a very large data sheet containing these rock properties at every depth is obtained. From this logging data sheet, the rocks at each depth are categorized into different facies. The conventional interpretation process takes a large amount of time and effort. SOM offers a solution to this problem because it removes much of the time-consuming manual interpretation and assigns rock intervals to particular facies in a single pass.
In construction, keeping up with and being able to forecast changes in construction costs and in the costs of major materials are not only considerable advantages but also key factors in the decisions of contractors and investors. The SOM algorithm gives a visual view of how one factor tends to change when other factors in the same section change.

METHODOLOGY
2.1 Self-Organizing Maps (SOM) - Kohonen network
The Self-Organizing Map, or SOM, introduced by Professor T. Kohonen in the early 1980s, is a useful clustering tool. Since it was introduced, SOM has been applied widely in various fields such as psychology, economics, medical care, engineering and many other professions [1,3].
The SOM algorithm helps simplify data by reducing the dimensions (properties) of the input data, so SOM results in a map with fewer dimensions than the input, usually a 2-D map. SOM is built on unsupervised learning from the input data. An example of a simple Kohonen network of size 4x4 (16 nodes) is shown in figure 1. Each node of the map represents a vector with as many dimensions as the input vectors (data); i.e., if the input vector has n dimensions, $\vec{V}(v_1; v_2; \dots; v_n)$, then the weight vector of a node also contains n dimensions, $\vec{W}(w_1; w_2; \dots; w_n)$ [2]. At the beginning of the SOM algorithm, the weight vectors in the Kohonen network hold random values between 0 and 1 for each property. In each iteration, these values are adjusted toward an input vector chosen at random from the normalized input data [3]. The number of iterations is usually 500 times the number of network nodes [1]. The following section describes how the SOM algorithm works in more detail.
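The starting state described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `init_kohonen_grid` and `suggested_iterations` are hypothetical, the network is stored as nested lists, and the factor of 500 is the rule of thumb quoted in the text.

```python
import random


def init_kohonen_grid(rows, cols, dim, seed=None):
    """Build a rows x cols network; each node holds `dim` random weights in [0, 1)."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]


def suggested_iterations(rows, cols, factor=500):
    """Rule of thumb from the text: iterations = factor x number of nodes."""
    return factor * rows * cols


# A 4x4 network for input vectors with three properties (dim=3 is an arbitrary choice).
grid = init_kohonen_grid(4, 4, dim=3, seed=42)
print(len(grid) * len(grid[0]))    # 16 nodes
print(suggested_iterations(4, 4))  # 8000 iterations
```

For the 4x4 example the rule gives 16 x 500 = 8,000 iterations.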

2.2 Algorithm sequence
Step 1: Build a Kohonen network with the chosen configuration (rows x columns of nodes), with random values ranging from 0 to 1 for each property.
Step 2: Normalize the input data in order to determine the relative effect level between properties; in other words, eliminate the unit of each property by applying a normalization formula.
Step 3: Choose one input vector at random from the normalized input vector series. Determine the distance from the chosen input vector to each node of the Kohonen network according to the Euclidean distance formula:

$$d = \sqrt{\sum_{i=1}^{n} (v_i - w_i)^2} \qquad (2)$$

Step 4: Take the node with the smallest distance to the chosen input vector and name this node the Best Matching Unit (BMU).
Step 5: From the BMU, determine the neighborhood radius with formula (3).
Step 6: Calibrate all the nodes within the neighborhood radius according to the update formula.
Step 7: After calibration, repeat from Step 3 with t increased by 1. This step is repeated until the desired number of iterations N is reached. As t increases, the parameters that depend on the iteration number change concurrently.
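The seven steps can be sketched as the following pure-Python routine. It is not the authors' implementation. Several formulas are not legible in this text, so the sketch substitutes commonly used forms and labels them as assumptions: min-max normalization, an initial radius of half the larger grid side, exponential decay of both the radius and the learning rate, and a Gaussian neighborhood influence. All function names are hypothetical.

```python
import math
import random


def normalize(rows):
    """Step 2 (assumed form): min-max scale every property to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)]
            for row in rows]


def distance(v, w):
    """Euclidean distance between an input vector and a node weight vector."""
    return math.sqrt(sum((vi - wi) ** 2 for vi, wi in zip(v, w)))


def find_bmu(grid, v):
    """Step 4: (row, col) of the node whose weights are closest to v."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return min(cells, key=lambda rc: distance(v, grid[rc[0]][rc[1]]))


def train_som(data, rows, cols, n_iter, lr0=0.5, seed=0):
    """Train on already normalized data and return the weight grid."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: random weights in [0, 1) for every node.
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    radius0 = max(max(rows, cols) / 2.0, 1.5)  # assumed initial radius
    time_const = n_iter / math.log(radius0)    # assumed decay constant
    for t in range(n_iter):                    # Step 7: loop over t
        v = rng.choice(data)                   # Step 3: random input vector
        br, bc = find_bmu(grid, v)             # Step 4: best matching unit
        radius = radius0 * math.exp(-t / time_const)  # Step 5 (assumed decay)
        lr = lr0 * math.exp(-t / time_const)          # assumed decay
        for r in range(rows):                  # Step 6: calibrate neighbors
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                if d2 <= radius ** 2:
                    influence = math.exp(-d2 / (2 * radius ** 2))
                    w = grid[r][c]
                    for i in range(dim):
                        w[i] += influence * lr * (v[i] - w[i])
    return grid
```

Because every update moves a weight part of the way toward a normalized input value, the trained weights stay within the 0 to 1 range of the normalized data.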

2.3 SOM algorithm's results
Starting from a network whose nodes contain random values for the different parameters, and after a training process with a set number of iterations, the SOM algorithm produces a network with the same number of nodes, but with the nodes modified so that the closer two nodes are, the more similar their characteristics [2]. Using the Euclidean distance from any input vector to the nodes of the Kohonen network, the node that best represents that vector can be identified. We therefore know which input vectors from the initial data each node in the resulting map represents. Thus, the SOM algorithm not only condenses a huge input data series into a self-organizing map with far fewer nodes but also visually highlights similar behaviors or characteristics in the input data. In addition, SOM results help the Hierarchical clustering method by reducing its calculation time and number of iterations. In the next section, the authors present the Hierarchical clustering method and its application via SOM.
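How each node comes to represent a set of input vectors can be illustrated with a short sketch. It assumes the trained nodes are given as a flat list of weight vectors; the names `bmu_index` and `assign_to_nodes` and the sample numbers are hypothetical.

```python
import math
from collections import defaultdict


def bmu_index(nodes, v):
    """Index of the node whose weight vector is closest to v (Euclidean)."""
    return min(range(len(nodes)), key=lambda k: math.dist(v, nodes[k]))


def assign_to_nodes(nodes, data):
    """Map each node index to the list of input-vector indices it represents."""
    hits = defaultdict(list)
    for idx, v in enumerate(data):
        hits[bmu_index(nodes, v)].append(idx)
    return dict(hits)


# Hypothetical trained nodes and normalized input vectors.
nodes = [[0.1, 0.1], [0.9, 0.9], [0.5, 0.5]]
data = [[0.0, 0.2], [1.0, 0.8], [0.15, 0.05], [0.45, 0.6]]
print(assign_to_nodes(nodes, data))  # {0: [0, 2], 1: [1], 2: [3]}
```

Nodes that attract no input vector simply do not appear in the resulting dictionary.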

2.4 Hierarchical clustering algorithm
Hierarchical clustering is an algorithm in which normalized input data is clustered according to the Euclidean distance formula [5]. For instance, if the initial input data has 5 elements, the number of pairs formed from these 5 elements is 5(5 − 1)/2 = 10, so the distance has to be calculated 10 times before the first merge. The result of the Hierarchical algorithm is a chart as shown in figure 2. More branches represent more detail in the clustering, that is, more groups. Specifically, in figure 2, if an interpreter just wants to divide the data into two groups, he only needs to look at position 1, with group (A, B, C) and group (D, E, F, G). On the other hand, if the interpreter decides to divide the data into five groups instead of two, the result is the groups (A), (B, C), (D, E), (F) and (G). The advantages of the Hierarchical algorithm are therefore a simple theoretical basis, easy computation and direct visual results. However, if the input data is relatively large, for example 100,000 elements, the number of Euclidean distances to calculate increases substantially (about 5 × 10^9 pairwise distances before the first merge), leading to impractical computation time.
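A minimal sketch of this kind of clustering is given below. It assumes single linkage (the distance between two groups is the distance between their closest members), which the text does not specify, and it uses made-up one-dimensional coordinates for elements A to G, chosen only so that the groups match those described for figure 2. The function name `agglomerate` is hypothetical.

```python
import math


def agglomerate(points, k):
    """Merge the two closest clusters until k clusters remain (single linkage)."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


labels = "ABCDEFG"
# Hypothetical coordinates, not taken from the paper's data.
points = [(0.0,), (1.5,), (2.0,), (10.0,), (10.4,), (12.0,), (14.0,)]


def named(groups):
    return [tuple(labels[i] for i in g) for g in groups]


print(named(agglomerate(points, 2)))  # [('A', 'B', 'C'), ('D', 'E', 'F', 'G')]
print(named(agglomerate(points, 5)))  # [('A',), ('B', 'C'), ('D', 'E'), ('F',), ('G',)]
```

Cutting the same merge sequence at a different number of groups corresponds to reading the chart at a different level.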
To solve the above problem of the Hierarchical algorithm, the authors have decided to use the SOM algorithm. First, the SOM algorithm simplifies the input data into a map (network) with substantially fewer nodes, and then the Hierarchical algorithm is applied to this map to cluster these representative nodes. In this paper, we apply both algorithms in two fields: geology and construction.

Application of SOM algorithm in geology
In geology, the SOM algorithm is applied to determine rock facies. The initial input data contain different properties acquired from well logging, such as neutron log, sonic log, density log and gamma ray. Using SOM, groups of depths with similar log characteristics are placed in the same node of a Kohonen network. Combined with the Hierarchical algorithm, the final result is a self-organizing map whose nodes are divided into separate sections representing different rock facies (the number of rock facies is chosen based on the experience of the interpreter). Figure 3 shows well geophysical data for four main properties, from a start depth of 1843.278 ft to an end depth of 4248.302 ft in increments of 0.154 ft. The values in columns DT, NPHI, RHOZ and GR are the results of the sonic log, neutron log, density log and gamma ray log, respectively.

3.3 Result
From the initial input data, the SOM algorithm has clustered the samples into nodes of a Kohonen network. Here, the authors have chosen a 30x30 (900 nodes) network to represent the whole input data series (Figure 4). The basic parameters that need to be set at the start are the iteration number N (60,000) and the initial learning rate (0.5 by default). After the Kohonen network has been built, the Hierarchical algorithm is applied on top of this 30x30 map, so distances are calculated on only 900 elements instead of over 15,000 elements.
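The figure of over 15,000 elements can be checked against the depth range and increment quoted for the logging data. The arithmetic below is a rough check that assumes uniform sampling with both end depths included; the symbols $N_{\text{samples}}$ and $N_{\text{nodes}}$ are introduced here only for illustration.

```latex
\[
N_{\text{samples}} \approx \frac{4248.302 - 1843.278}{0.154} + 1
  = \frac{2405.024}{0.154} + 1 \approx 15{,}618,
\qquad
\frac{N_{\text{samples}}}{N_{\text{nodes}}} \approx \frac{15{,}618}{900} \approx 17
\]
```

Under these assumptions, each node of the 30x30 map stands in for roughly 17 depth samples on average.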
Each node of the network contains four properties, namely DT, GR, NPHI and RHOZ, which are the characteristics of the input data series. At the beginning (t = 0), the Kohonen network contains 900 nodes with random values. After many iterations, the node values have been calibrated and the nodes have been arranged in a more orderly way. The SOM algorithm has reduced not only the number of dimensions but also the amount of input data, which reduces the amount of work the Hierarchical algorithm has to deal with. The number of rock facies is decided according to the purposes and experience of the geological interpreter. In this case, the authors choose to divide the map into six facies (Figure 5).
From the result in Figure 4, for a specific depth with its corresponding properties, we can determine which node of the Kohonen network that depth belongs to. From the clustering result in Figure 5, we can take any node and know which facies it belongs to. Thus, by combining the two algorithms, we know which facies is present in a specific depth interval (Table 1). The result of this application facilitates geological mapping, which has a critical effect on construction (determining the compaction of a particular section), the petroleum industry (determining depths with potential oil reserves), the environment (determining groundwater zones in order to avoid contaminating activities) and other fields.
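The combination of the two results, depth to node and node to facies, can be sketched as two lookups followed by a merge of consecutive depths that share a facies. The function name `facies_intervals`, the node coordinates and the facies numbers below are hypothetical and serve only to show the mechanics.

```python
from itertools import groupby


def facies_intervals(depths, node_of_depth, facies_of_node):
    """Collapse consecutive depths with the same facies into (top, bottom, facies)."""
    labelled = [(d, facies_of_node[node_of_depth[d]]) for d in depths]
    intervals = []
    for facies, run in groupby(labelled, key=lambda item: item[1]):
        run = list(run)
        intervals.append((run[0][0], run[-1][0], facies))
    return intervals


# Hypothetical sample: four depths, the node each one maps to, and node facies.
depths = [1843.278, 1843.432, 1843.586, 1843.740]
node_of_depth = {1843.278: (0, 0), 1843.432: (0, 1),
                 1843.586: (5, 7), 1843.740: (5, 7)}
facies_of_node = {(0, 0): 1, (0, 1): 1, (5, 7): 3}

print(facies_intervals(depths, node_of_depth, facies_of_node))
# [(1843.278, 1843.432, 1), (1843.586, 1843.74, 3)]
```

The output has the shape of a depth-interval table: one row per run of consecutive depths assigned to the same facies.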

SOM application in the construction profession
Another application shown here is economic evaluation in construction. SOM helps cluster the level of change in the cost of construction and building materials, helping contractors as well as investors choose the best way to invest their money. Here, the SOM algorithm is used to analyze data on construction price indices and core material price indices. Table 2 and Table 3 list the data for construction price indices and core material price indices, respectively. A price index is a parameter describing how much the price in a specific period has changed compared to a chosen base period [7]. The base period chosen here is the year 2015.
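In its usual form, which is assumed here since the exact definition in [7] is not reproduced in this text, a price index is the ratio of the price in the period of interest to the price in the base period, expressed as a percentage. The symbols $I_t$, $P_t$ and $P_{2015}$ and the numbers in the example are hypothetical.

```latex
\[
I_t = \frac{P_t}{P_{2015}} \times 100\%,
\qquad
\text{e.g. } P_{2015} = 200,\; P_t = 208 \;\Rightarrow\; I_t = 104\%
\]
```

An index above 100% therefore indicates a price increase relative to 2015, and an index below 100% a decrease.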

Result
For the table of construction price indices, because the input data set is relatively small, the authors adjust the parameters of the SOM algorithm to fit it. Specifically, the chosen Kohonen network has a size of 4x4 (16 nodes), an iteration number N of 8,000 and an initial learning rate of 0.5. After running SOM, we obtain a map of 16 nodes divided into 4 sections representing 17 objects.
From Table 4, there are four major groups, and objects in the same group have similar price indices. For instance, bridge and road construction, irrigation construction and cultural construction show a similar tendency in price indices in the first three quarters of 2016 compared to 2015. SOM is a useful tool for contractors because, based on the map provided, they can choose projects with similar cost changes. Furthermore, assuming this trend continues in quarter IV of 2016, the clustering will help contractors choose kinds of construction with price indices lower than 100%, helping them optimize their economic decisions.
In the upcoming table, the authors cluster the data for quarter IV of 2016 to check how accurate the forecast is. The result is presented in Table 5. Table 5 shows the differences between the forecast and the real data for quarter IV of 2016: two objects changed position among the four sections, namely water resources infrastructure and apartment buildings from 9 to 15 floors. The accuracy of the quarter IV price index prediction is 89.47%. With this result, the price index forecast has proved reliable and could be used for further calculations. In a similar vein, applying the two algorithms to the core material price indices yields a table of core materials grouped together. Table 6 is the result obtained from the input data series.
The results presented in Table 6 give a visual view of the changes in core material prices, which helps in determining the variation in the total construction price. Specifically, we forecast quarter IV based on the first three quarters. The result is shown in Table 7.
In Table 7, which results from the analysis of core material price indices for quarter IV, four objects changed position, namely enameled tile, electrical materials, cement and gasoline. The accuracy of the quarter IV price index prediction is 75%, lower than for the construction price indices but still reliable enough to use.
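The two accuracy figures are consistent with a simple share of objects that kept their group. The sketch below assumes totals of 19 construction objects and 16 core materials; these totals are not stated explicitly here and are inferred only because they reproduce the reported percentages. The function name `placement_accuracy` is hypothetical.

```python
def placement_accuracy(total, moved):
    """Percentage of objects that stayed in their forecast group."""
    return 100.0 * (total - moved) / total


# Assumed totals (19 and 16) inferred from the reported percentages.
print(f"{placement_accuracy(19, 2):.2f}")  # 89.47
print(f"{placement_accuracy(16, 4):.2f}")  # 75.00
```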

CONCLUSION
Since its inception, SOM has found applications in many fields, helping artificial intelligence become a key factor in the Fourth Industrial Revolution. To take advantage of SOM, this study focuses on applying this technique to construction and geological engineering.
In geology, SOM aids in clustering different facies based on logging data, allowing geological maps to be built for construction (determining the compaction of areas), petroleum (determining which layers contain oil and gas), the environment (determining which layers contain water in order to avoid contaminating them) and many other professions. The level of detail of the result depends on the number of cells (nodes) in the Kohonen network as well as the iteration number N of the algorithm. The higher these numbers, the more exact the SOM result, but the longer the calculation takes. Thus, choosing the optimal number of nodes and iterations is a problem that deserves careful consideration.