

The Prediction Approach with Growing Hierarchical Self-Organizing Map

I. THE PREDICTION AND THE GHSOM

Artificial Neural Network (ANN) is one of the data mining techniques that plays an important role in accomplishing the task of financial fraud detection (FFD), which involves distinguishing fraudulent financial data from authentic data, disclosing fraudulent behavior or activities, and enabling decision makers to develop appropriate strategies to decrease the impact of fraud [20]. Amongst the ANN applications to FFD, the Self-Organizing Map (SOM) [17] has been widely adopted in diagnosing bankruptcy [6]. The major advantage of SOM is its capability to visualize the topological relationships amongst high-dimensional inputs in a low-dimensional view. To address the issue of the fixed network architecture of SOM, the Growing Hierarchical Self-Organizing Map (GHSOM) [8][9][23] was developed with a multilayer hierarchical network structure. The flexible and hierarchical features of GHSOM generate more delicate clustering results than SOM and make GHSOM a powerful and versatile analysis tool. GHSOM has been used in many fields

such as image recognition, web mining, text mining, and data mining [8][9][23][25][26][29], serving as a useful clustering tool for further feature extraction. Few studies, however, have applied GHSOM to prediction tasks [12][14][20]. This motivates this study to develop a prediction approach in which GHSOM is used to help identify the fraud counterpart of each non-fraud subgroup and vice versa.

Specifically, this study assumes that there is a certain spatial relationship amongst fraud and non-fraud samples. For instance, if a group of fraud samples and their non-fraud counterparts are identical, each cluster of fraud samples tends to be located separately from the non-fraud samples. In other words, the spatial distributions of most fraud samples and their non-fraud counterparts are the same. If such a spatial relationship amongst fraud and non-fraud samples does exist, the GHSOM can be expected to help identify the fraud counterpart of each subgroup of non-fraud samples and vice versa. Such an idea of combining supervised and unsupervised learning approaches can inspire the model design and improve the classification accuracy. For example, Carlos (1996) applied SOM to financial diagnosis by developing a DSS for the analysis of accounting statements, which includes Linear Discriminant Analysis (LDA) and a Multilayer Perceptron (MLP) to delimit the solvency regions within the SOM. That approach is based on the idea that unsupervised neural models must be complemented with a statistical study of the available information.

To practically explore such a spatial correspondence assumption, this study derives a prediction approach based upon the GHSOM. To justify this prediction approach, we set up a fraudulent financial reporting (FFR) experiment.

The remainder of this paper is organized as follows. A review of relevant literature is presented in Section II. Section III presents the proposed approach. Section IV reports the experimental design of FFR. The last section concludes with a summary of findings, implications, and future work.

II. LITERATURE REVIEW

A. Growing Hierarchical Self-Organizing Maps

The training process of GHSOM consists of the following four phases [8]:

• Initialize layer 0: Layer 0 includes a single node whose weight vector is initialized as the expected value of all input data. Then, the mean quantization error of layer 0 (MQE0) is calculated. The MQE of a node denotes its mean quantization error, i.e., the average deviation between the weight vector of the node and the input data mapped to the node.

• Train each individual SOM: Within the training process of an individual SOM, the input data are presented one by one. The distances between the presented input datum and the weight vectors of all nodes are calculated, and the node with the shortest distance is selected as the winner. Under the competitive learning principle, only the winner and its neighboring nodes are qualified to adjust their weight vectors. The competition and the training are repeated until the learning rate decreases to a certain value.

• Grow each individual SOM horizontally: Each individual SOM grows until the mean value of the MQEs of all nodes on the SOM (MQEm) is smaller than the MQE of its parent node (MQEp) multiplied by τ1, as stated in (1). If the stopping criterion is not satisfied, find the error node that owns the largest MQE and then, as shown in Fig. 1, insert one row or one column of new nodes between the error node and its most dissimilar neighbor.

MQEm < τ1 × MQEp (1)

Figure 1. Horizontal growth of an individual SOM. The notation x indicates the error node and y the most dissimilar neighbor.

• Expand or terminate the hierarchical structure: A node with an MQEi greater than τ2 × MQE0 develops a SOM on the next layer. In this way, the hierarchy grows until all of the leaf nodes satisfy the stopping criterion stated in (2).

MQEi < τ2 × MQE0 (2)

GHSOM has been used in fields such as image recognition, web mining, text mining, and data mining. For example, [25] showed the feasibility of using GHSOM and LabelSOM techniques in legal research through tests with text corpora in European case law. GHSOM was used to present a content-based and easy-to-use map hierarchy for Chinese legal documents in the securities and futures markets [26]. Reference [1] used GHSOM to analyze a citizen web portal and provided a new visualization of the patterns in the hierarchical structure.
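To make the growth criteria in (1) and (2) concrete, the following Python sketch illustrates how the MQE of a node and the two stopping checks could be computed. The function and variable names are illustrative assumptions of this sketch rather than the notation of an existing GHSOM library.

```python
import numpy as np

def mqe(weight_vector, mapped_inputs):
    """Mean quantization error of a node: the average Euclidean deviation
    between the node's weight vector and the input data mapped to it."""
    x = np.asarray(mapped_inputs, dtype=float)
    if x.size == 0:
        return 0.0
    return float(np.mean(np.linalg.norm(x - weight_vector, axis=1)))

def keep_growing_horizontally(node_mqes, parent_mqe, tau1):
    """Criterion (1): keep inserting rows/columns of nodes while the mean MQE
    of the map (MQE_m) is still at least tau1 * MQE_p of the parent node."""
    return np.mean(node_mqes) >= tau1 * parent_mqe

def expand_to_next_layer(node_mqe, mqe0, tau2):
    """Criterion (2): a node whose MQE_i is at least tau2 * MQE_0 develops a
    new SOM on the next layer of the hierarchy."""
    return node_mqe >= tau2 * mqe0
```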

Until recent years, not many studies had applied GHSOM for forecasting purposes. For instance, a two-stage architecture combining GHSOM and SVM was employed by [14] to better predict financial indices. GHSOM was applied together with a support vector regression model to product demand forecasting [20]. In [12], GHSOM was integrated into a case-based reasoning system in the design domain.

B. Fraudulent financial reporting

FFR is a kind of financial fraud that involves the intentional misstatement or omission of material information in an organization’s financial reports [4]. FFR can lead not only to significant risks for stockholders and creditors, but also to financial crises for the capital market [3]. Prior FFR-related studies showed that the main data mining techniques used for FFD are logistic models, neural networks, Bayesian belief networks, and decision trees. These data mining techniques also contribute to FFR detection. For example, [11] applied the back-propagation neural network to FFR detection. The model used five ratios and three accounts as input. The results showed that the back-propagation neural network had significant capabilities when used as a fraud detection tool. Reference [10]

proposed a generalized adaptive neural network algorithm named AutoNet to detect FFR and compared it against linear and quadratic discriminant analysis and logistic regression. They concluded that AutoNet is more effective at detecting fraud than standard statistical methods.

For the broader financial fraud detection domain, [21] provide a classification framework and an academic review of the literature that applies data mining techniques to FFD.

III. THE PROPOSED PREDICTION APPROACH

The proposed prediction approach consists of the following three phases: training, modeling, and predicting. Table I shows the training phase, in which the task of data preprocessing is done via step 1 and step 2. Step 2 can apply any variable selection tool, such as discriminant analysis, a logistic model, and so forth.

TABLE I. THE TRAINING PHASE.

In the training phase, we want to use GHSOM to classify fraud and non-fraud samples respectively in such a way that the spatial relationship amongst fraud and non-fraud samples can be explicitly identified later. Thus, before processing step 3, the training samples are grouped into the fraud ones and the non-fraud ones. In step 3, the fraud samples are used to generate an acceptable GHSOM named FT (fraud tree). After identifying the FT, the values of the (GHSOM) parameters breadth (τ1) and depth (τ2) are determined and stored in step 4. Then, in step 5, the determined values of τ1 and τ2 and the non-fraud samples are used for setting up another GHSOM named NFT (non-fraud tree). With the spatial correspondence assumption and the same setting of training parameters (τ1 and τ2) of GHSOM, each leaf node of NFT may have one or more counterpart leaf nodes in FT and vice versa, even though FT and NFT are established based on fraud and non-fraud samples, respectively.

step 1: Sample selection and variable measurement.

step 2: Identify the significant variables that will be used as the input variables.

step 3: Use the fraud samples to generate an acceptable GHSOM named FT.

step 4: Based upon the accepted FT, determine the (GHSOM training) parameters breadth (τ1) and depth (τ2).

step 5: Use the non-fraud samples and the determined parameters τ1 and τ2 to generate another GHSOM named NFT.
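As a minimal sketch of steps 3 to 5, assuming a hypothetical trainer `train_ghsom(samples, tau1, tau2)` that stands in for whatever GHSOM implementation is adopted:

```python
def build_trees(fraud_samples, nonfraud_samples, tau1, tau2, train_ghsom):
    """Steps 3-5 of the training phase: grow the fraud tree (FT) first, then
    reuse the accepted breadth/depth parameters to grow the non-fraud tree
    (NFT). train_ghsom is a placeholder, not a specific library call."""
    ft = train_ghsom(fraud_samples, tau1, tau2)       # step 3: fraud tree
    nft = train_ghsom(nonfraud_samples, tau1, tau2)   # step 5: non-fraud tree
    return ft, nft
```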

Table II presents the modeling phase, in which the prediction rule is set up. In the modeling phase, by inputting all of the (fraud and non-fraud) training samples to FT and NFT, we can match each leaf node of NFT to its counterpart leaf node in FT and vice versa. That is, from the NFT perspective, if the leaf node #x in FT hosts the majority of all of the (fraud and non-fraud) training samples classified into the leaf node *y in NFT, then we match the leaf node #x in FT to the leaf node *y in NFT and claim that the leaf node #x in FT is the counterpart of the leaf node *y in NFT. Hereafter, we use #x to denote the xth leaf node of FT and *y the yth leaf node of NFT. The leaf-node matching of #x to *y states the spatial relationship amongst the fraud and non-fraud samples classified into the leaf nodes #x and *y. That is, if a sample is classified into the leaf node *y when using NFT, it is more likely to be classified into the leaf node #x when using FT.
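The leaf-node matching described above can be sketched as a simple majority mapping. In the illustrative Python fragment below, nft_leaf and ft_leaf are assumed to list, for every training sample, the index of its winning leaf node in NFT and in FT, respectively.

```python
from collections import Counter

def match_counterparts(nft_leaf, ft_leaf):
    """For each NFT leaf *y, return the FT leaf #x that hosts the majority of
    the (fraud and non-fraud) training samples classified into *y."""
    counterpart = {}
    for y in set(nft_leaf):
        ft_hits = [x for x, yy in zip(ft_leaf, nft_leaf) if yy == y]
        counterpart[y] = Counter(ft_hits).most_common(1)[0][0]
    return counterpart
```

Calling the same function with the arguments swapped yields, from the FT perspective, the NFT counterpart of each FT leaf.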

TABLE II. THE MODELING PHASE

Similarly, from the FT perspective, if the leaf node *y in NFT hosts the majority of all of the (fraud and non-fraud) training samples classified into the leaf node #x in FT, then we match the leaf node *y in NFT to the leaf node #x in FT and claim that the leaf node *y in NFT is the counterpart of the leaf node #x in FT.

In step 1, the Avg value (i.e., the average of the Euclidean distances between the weight vector and the grouped fraud training samples) and the Std value (i.e., the standard deviation of the Euclidean distances between the weight vector and the grouped fraud training samples) of each leaf node of FT are calculated and stored. Similarly, in step 2, the Avg and Std values of each leaf node of NFT are calculated and stored with respect to the grouped non-fraud training samples. In step 3, we collect and store the following information for each training sample: the winning leaf node of FT, the winning leaf node of NFT, the corresponding Avg and Std values of the winning leaf node of FT, the corresponding Avg and Std values of the winning leaf node of NFT, Dft (i.e., the Euclidean distance between the training sample and the weight vector of the winning leaf node of FT), and Dnft (i.e., the Euclidean distance between the training sample and the weight vector of the winning leaf node of NFT). The GHSOM classification rule is used to identify the winning leaf nodes of FT and NFT, respectively.
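Steps 1 to 3 reduce to a few vector operations. The sketch below assumes each leaf node is represented by its weight vector together with the training samples grouped into it; the helper names are ours, not the paper's.

```python
import numpy as np

def leaf_statistics(weight_vector, grouped_samples):
    """Steps 1 and 2: the Avg and Std of the Euclidean distances between a
    leaf node's weight vector and the training samples grouped into it."""
    d = np.linalg.norm(np.asarray(grouped_samples, dtype=float) - weight_vector, axis=1)
    return float(d.mean()), float(d.std())

def sample_distance(sample, winning_weight_vector):
    """Step 3: D_ft (or D_nft), the Euclidean distance between a training
    sample and the weight vector of its winning leaf node in FT (or NFT)."""
    return float(np.linalg.norm(np.asarray(sample, dtype=float) - winning_weight_vector))
```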

Rule 1 is defined in (3), in which Avgx is the Avg value of the leaf node #x of FT, Stdy is the Std value of the counterpart leaf node *y of NFT, and β1 is a parameter reflecting the extent to which some non-fraud samples cluster around a subset of fraud samples. That is, for a (fraud or non-fraud) sample t that is classified into the leaf node #x of FT, if Dft is smaller than the value of Avgx + β1 × Stdy, the sample t will be classified as a fraud one; otherwise, a non-fraud one. Because the discrimination boundary is data-dependent, the parameter β1 of Rule 1 needs to be tuned to find the optimal discrimination boundary (i.e., Avgx + β1 × Stdy) for the matched pair of the leaf node #x of FT and the leaf node *y of NFT. Therefore, in step 4, we use all training samples to determine the parameter β1 associated with Rule 1 through the minimization of the sum of the (type I and type II) classification errors.

Rule 1: If (Dft < Avgx + β1 × Stdy), the sample is classified as a fraud one; otherwise, a non-fraud one. (3)

Rule 2 is defined in (4), in which Avgy is the Avg value of the leaf node *y of NFT, Stdx is the Std value of the counterpart leaf node #x of FT, and β2 is a parameter reflecting the extent to which some fraud samples cluster around a subset of non-fraud samples. That is, for a (fraud or non-fraud) sample t that is classified into the leaf node *y of NFT, if Dnft is smaller than the value of Avgy + β2 × Stdx, the sample t will be classified as a non-fraud one; otherwise, a fraud one. Because the discrimination boundary is data-dependent, the parameter β2 of Rule 2 also needs to be tuned to find the optimal discrimination boundary (i.e., Avgy + β2 × Stdx) for the matched pair of the leaf node *y of NFT and the leaf node #x of FT. Therefore, in step 5, we use all training samples to determine the parameter β2 associated with Rule 2 through the minimization of the sum of the (type I and type II) classification errors.

Rule 2: If (Dnft < Avgy + β2 × Stdx), the sample is classified as a non-fraud one; otherwise, a fraud one. (4)

In step 6 of Table II, the picked prediction rule is Rule 1 if the sum of the classification errors resulting from step 4 is smaller than the one resulting from step 5; otherwise, Rule 2.
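A minimal sketch of steps 4 to 6 follows, assuming the per-sample quantities from step 3 are available as NumPy arrays; the candidate grid for β is an illustrative choice, since the paper only states that β1 and β2 are tuned to minimize the sum of the type I and type II errors.

```python
import numpy as np

def tune_beta(d, avg, std, is_fraud, rule_predicts_fraud,
              betas=np.arange(-3.0, 3.05, 0.05)):
    """Find the beta that minimizes the sum of type I and type II errors.
    d, avg, std: per-sample distance (D_ft or D_nft) and the Avg/Std values of
    the matched leaf-node pair; is_fraud: boolean array of true labels;
    rule_predicts_fraud: True for Rule 1, False for Rule 2."""
    best_beta, best_errors = None, np.inf
    for beta in betas:
        inside = d < avg + beta * std
        pred_fraud = inside if rule_predicts_fraud else ~inside
        errors = np.sum(pred_fraud != is_fraud)   # type I + type II errors
        if errors < best_errors:
            best_beta, best_errors = float(beta), int(errors)
    return best_beta, best_errors

# step 6 (illustrative): pick the rule with the smaller total error
# beta1, err1 = tune_beta(D_ft, Avg_x, Std_y, y_true, rule_predicts_fraud=True)
# beta2, err2 = tune_beta(D_nft, Avg_y, Std_x, y_true, rule_predicts_fraud=False)
# chosen_rule = 1 if err1 < err2 else 2
```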

step 1: For each leaf node of FT,

i. calculate and store its Avg value that is the average of Euclidean distances between the weight vector and the grouped fraud training samples;

ii. calculate and store its Std value that is the standard deviation of Euclidean distances between the weight vector and the grouped fraud training samples.

step 2: For each leaf node of NFT,

i. calculate and store its Avg value that is the average of Euclidean distances between the weight vector and the grouped non-fraud training samples;

ii. calculate and store its Std value that is the standard deviation of Euclidean distances between the weight vector and the grouped non-fraud training samples.

step 3: For each training sample,

i. identify and store the winning leaf node of FT and the winning leaf node of NFT, respectively;

ii. store the Avg values of the winning leaf nodes of FT and NFT, respectively;

iii. store the Std values of the winning leaf nodes of FT and NFT, respectively;

iv. calculate and store its Dft, the Euclidean distance between the training sample and the weight vector of the winning leaf node of FT; and

v. calculate and store its Dnft, the Euclidean distance between the training sample and the weight vector of the winning leaf node of NFT.

step 4: Use the Rule 1 defined in (3) and all training samples to determine the parameter β1 that minimizes the corresponding sum of (type I and type II) classification errors.

step 5: Use the Rule 2 defined in (4) and all training samples to determine the parameter β2 that minimizes the corresponding sum of (type I and type II) classification errors.

step 6: Pick the optimal prediction rule by comparing the classification errors obtained in step 4 and step 5.

The predicting phase is shown in Table III. For each investigated sample s, we first follow the GHSOM classification rule to find the winning leaf nodes of FT and NFT, respectively. Assume the winning leaf node of FT is #x and the winning leaf node of NFT is *y. Then, we use the prediction rule decided in the modeling phase to make the prediction. That is, if Rule 1 performs better in the modeling phase, step 2 is processed via (3) to predict the investigated sample s; if Rule 2 performs better, step 3 is processed via (4) to predict the investigated sample s.
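The predicting phase thus combines the GHSOM classification rule with the decision rule chosen in the modeling phase. A hedged sketch follows, in which find_winning_leaf is an assumed accessor returning the winning leaf's weight vector and its stored Avg and Std values, not a published API.

```python
import numpy as np

def predict(sample, ft, nft, chosen_rule, beta1, beta2):
    """Predicting phase (Table III) for one investigated sample s; the
    find_winning_leaf interface is hypothetical."""
    w_x, avg_x, std_x = ft.find_winning_leaf(sample)    # winning leaf #x of FT
    w_y, avg_y, std_y = nft.find_winning_leaf(sample)   # winning leaf *y of NFT
    if chosen_rule == 1:                                 # Rule 1, via (3)
        d_ft = float(np.linalg.norm(np.asarray(sample) - w_x))
        return "fraud" if d_ft < avg_x + beta1 * std_y else "non-fraud"
    d_nft = float(np.linalg.norm(np.asarray(sample) - w_y))  # Rule 2, via (4)
    return "non-fraud" if d_nft < avg_y + beta2 * std_x else "fraud"
```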

TABLE III. THE PREDICTING PHASE

IV. THE FFR EXPERIMENT AND ITS RESULTS

This study follows the FFR experiment of [15], which explores FFR via GHSOM to help capital providers evaluate the integrity of financial statements. The details and the experimental results are summarized as follows.

A. Sample and data

The following sources were used to identify the fraud samples from 1992 to 2006: indictments and sentences for major securities crimes issued by the Securities and Futures Bureau of the Financial Supervisory Commission, class action litigation cases initiated by the Securities and Futures Investors Protection Center, and the law and regulations retrieving system of the Judicial Yuan in Taiwan. If a company’s financial statement for a specific year is confirmed to be fraudulent by the indictments and sentences for major securities crimes issued by the Department of Justice, it is classified into our fraud observations. Financial statements that are free from fraud allegations are classified into our non-fraud observations.

The matched-firm design is used to form the sample set. That is, for each fraud firm, we match a non-fraud firm based on industry, total assets, and year. Thus, our sample consists of 116 publicly traded companies, including 58 fraud and 58 non-fraud ones, over the period from 1992 to 2006. For each fraud company, we first identify the earliest year in which the financial statement fraud was committed. The sample period covers two years before and two years after the year of the event; thus, this study uses five consecutive annual financial statements. The final data set consists of 580 firm-year observations (i.e., annual financial statements).

B. Variable measurement and variable selection

Based upon the FFR literature [2][7][10][11][13][16][18][19][21][22][27][28][31], 24 explanatory variables are selected and incorporated into the discriminant analysis to identify the significant variables that will be used as the input variables.

These are measurement proxies for attributes of profitability, liquidity, operating ability, financial structure, cash flow ability, financial difficulty, and corporate governance of a firm. These explanatory variables are collected from the Taiwan Economic Journal (TEJ) database.

The discriminant analysis result indicates that eight variables – return on assets (ROA), current ratio (CR), quick ratio (QR), debt ratio (DR), cash flow ratio (CFR), cash flow adequacy ratio (CFAR), Z-Score, and stock pledge ratio (SPR) – have statistically significant effects. The corresponding Wilks' Λ value equals 0.766 and χ2 equals 151.095 (both significant at p-value < 0.01), which indicates that the discriminant model employed has adequate explanatory power. These eight variables proxy a company’s attributes from the aspects of profitability (ROA), liquidity (CR and QR), financial structure (DR), cash flow ability (CFR and CFAR), financial difficulty (Z-Score), and corporate governance (SPR).
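As an illustration only (the study's exact stepwise discriminant procedure is not reproduced here), Wilks' Λ and its chi-square approximation for a candidate set of variables can be computed from the within-group and total scatter matrices:

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' Lambda = det(W) / det(T) for explanatory variables X (n x p)
    and group labels y (e.g., fraud vs. non-fraud), with Bartlett's
    chi-square approximation; degrees of freedom = p * (k - 1)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    groups = [X[y == g] for g in np.unique(y)]
    W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
    D = X - X.mean(axis=0)
    T = D.T @ D
    lam = np.linalg.det(W) / np.linalg.det(T)
    n, p, k = X.shape[0], X.shape[1], len(groups)
    chi2 = -(n - 1 - (p + k) / 2.0) * np.log(lam)
    return lam, chi2
```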

C. Training GHSOM

As stated in [8], the development of the GHSOM is primarily governed by the parameters of breadth (τ1) and depth (τ2). In order to obtain a multi-layer hierarchy and to prevent the fraud samples from being overly clustered, we predefined the following selection criteria to derive an acceptable FT:

1) There is more than one layer of SOM in the GHSOM.

2) The samples of each node should not be overly clustered.

3) Each leaf node should contain at least one sample.

Based on the aforementioned criteria, trials of GHSOM parameter settings are conducted. The parameter τ1 is adjusted from 0.4 to 0.8 in increments of 0.05, and the parameter τ2 is adjusted from 0.01 to 0.07 in increments of 0.05. When τ1 = 0.6 and τ2 = 0.07, each leaf node has at least one fraud sample. In the condition of

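The parameter trials can be organized as a simple grid search over τ1 and τ2 against the three predefined criteria. In this sketch, train_ghsom and the attributes of the returned tree (num_layers, leaves, samples) are hypothetical placeholders, and the leaf-size cap is an assumed proxy for "not overly clustered".

```python
import numpy as np

def search_parameters(fraud_samples, train_ghsom, max_leaf_size,
                      tau1_grid=np.arange(0.40, 0.81, 0.05),
                      tau2_grid=np.arange(0.01, 0.071, 0.05)):
    """Try (tau1, tau2) pairs and keep those whose fraud tree (FT) satisfies
    the criteria: more than one layer, no overly clustered leaf node, and at
    least one sample in every leaf node."""
    accepted = []
    for tau1 in tau1_grid:
        for tau2 in tau2_grid:
            ft = train_ghsom(fraud_samples, tau1, tau2)      # hypothetical trainer
            leaf_sizes = [len(leaf.samples) for leaf in ft.leaves]
            if (ft.num_layers > 1
                    and max(leaf_sizes) <= max_leaf_size     # assumed proxy for criterion 2
                    and min(leaf_sizes) >= 1):
                accepted.append((float(tau1), float(tau2), ft))
    return accepted
```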
