財務報表舞弊探索與類神經網路(I)

(1)

行政院國家科學委員會專題研究計畫成果報告

財務報表舞弊探索與類神經網路(第 2 年)

研究成果報告(完整版)

計畫類別：個別型

計畫編號： NSC 98-2410-H-004-049-MY2

執行期間： 99 年 08 月 01 日至 100 年 07 月 31 日

執行單位：國立政治大學資訊管理學系

計畫主持人：蔡瑞煌

共同主持人：林宛瑩

計畫參與人員：博士班研究生-兼任助理人員：黃馨瑩

報告附件：出席國際會議研究心得報告及發表論文

處理方式：本計畫可公開查詢

中華民國 100 年 07 月 29 日

(2)

行政院國家科學委員會補助專題研究計畫

■成果報告

□期中進度報告

財務報表舞弊探索與類神經網路

計畫類別：■個別型計畫 □整合型計畫

計畫編號：NSC 98－2410－H－004－049

－MY2

執行期間： 99 年 8 月 1 日至 100 年 7 月 31 日

執行機構及系所：國立政治大學資訊管理學系

計畫主持人：蔡瑞煌

共同主持人：林宛瑩

計畫參與人員：黃馨瑩

成果報告類型(依經費核定清單規定繳交)：□精簡報告 ■完整報告

本計畫除繳交成果報告外，另須繳交以下出國心得報告：

□赴國外出差或研習心得報告

□赴大陸地區出差或研習心得報告

■出席國際學術會議心得報告

□國際合作研究計畫國外研究報告

處理方式：

除列管計畫及下列情形者外，得立即公開查詢

□涉及專利或其他智慧財產權，□一年□二年後可公開查詢

中華民國 100 年 7 月 29 日

(3)

摘要

財務報表舞弊不僅對股東造成顯著的投資危機，也掀起資本市場的財務風暴。雖然財務報表的舞弊已

經引起許多關注，但大部分相關研究者著重在預測財務危機和破產，而鮮少聚焦在對財報舞弊本身知

識的探討。本研究旨在透過以下四個階段而對財報舞弊有更深的了解。

(1) 從文獻中整理出財務和公司治理方面和財報舞弊相關的所有指標，然後用統計分析方法採擷、獲得

和財報舞弊顯著相關的指標;

(2) 利用 Growing Hierarchical Self-Organizing Map (GHSOM)之人工智慧分群方法來對正常及舞弊的財

報資料分群;

(3) 剖析分群的財報資料以及利用專家之研判，以擷取財報舞弊的相關知識;

(4) 再利用專家來研判所採擷的財報舞弊的相關知識之可信度。

因為人工智慧分群方法可以從龐大的資料中找尋隱藏的階層關聯；所以學理上，這項研究是可行的。

在第一年，這項研究計畫著重於財務和公司治理方面和財報舞弊相關的所有指標之文獻整理，然後利

用統計分析方法採擷、獲得和財報舞弊相關的顯著指標；並且利用

GHSOM 分群方法來對正常及舞弊

的財報資料分群。在第二年，研究計畫套用所得之

GHSOM 來對舞弊的財報之起訴書和判決書做分群，

再利用專家之研判，以對每一群起訴書和判決書擷取財報舞弊的相關知識。

關鍵字：財務報表舞弊，GHSOM 人工智慧分群方法，知識萃取

(4)

ABSTRACT

Fraudulent financial reporting (FFR) has drawn much public as well as academic attention. However, the

extant literature review reveals that prior FFR-related research focused on the nature or the prediction of FFR

and that there is no study that extracts FFR features – the delicate but hid-den truths regarding FFR – from

very large quantities of FFR data through tools of artificial intelligence. On the other hand, artificial

intelligence techniques play an important role in accomplishing the task of financial fraud detection. Via

conducting a Growing Hierarchical Self-Organizing Map (GHSOM) application to FFR samples of Taiwan,

this study justifies the theoretical benefits of feature-extracting through GHSOM. Specifically, for each leaf

node of GHSOM, this study uncovers common fraudulent techniques from corresponding FFR indictments

and sentences of clustered samples without referring to the attributes of input variables. The acknowledge that

different leaf nodes have distinctive common fraudulent techniques can confirm the ability of GHSOM in

extracting features in terms of exogenous variables that are more abundant and more informative than

endogenous variables.

(5)

THE EXOGENOUS ISSUE OF FEATURE EXTRACTION

Introduction

Fraudulent Financial Reporting (FFR), or financial statement fraud, involves the intentional misstatement

or omission of material information from an organization’s financial reports (Beasley et al. 1999). Although

with the lowest frequency, FFR casts a severe financial impact with median losses of $2 million per scheme

(ACFE 2008). These are cases that often have severe consequences in terms of not only significant risks for

stockholders and creditors but also financial crises for the capital market.

However, the following extant literature review reveals that prior FFR-related research focused on the

nature or the prediction of FFR and that there is no in-depth study that explores FFR features – the delicate

but hidden truths regarding FFR – extracted from very large quantities of FFR data through tools of artificial

intelligence (AI). The nature-related FFR research often uses the case study approach and provides a

descriptive analysis of the characteristics of FFR and techniques commonly used. For instance, the Committee

of Sponsoring Organizations (COSO) and the Association of Certified Fraud Examiners (ACFE) regularly

publish their own analysis on fraudulent financial reporting of U.S. companies. Based on the FFR samples,

COSO examines and summarizes certain key company and management characteristics. ACFE analyzes the

nature of occupational fraud schemes and provides suggestions to create adequate internal control

mechanisms. Table 1 summarizes the research methodology and findings in nature-related FFR studies.

Other FFR researches often apply the empirical approach to archival data and identify significant variables

that help predict the occurrence of fraudulent reporting. Such research emphasizes the predictability of the

(6)

model used. For example, logistic regression and neural networks techniques are used in this line of research

(Bell and Carcello 2000; Fanning and Cogger 1998; Kirkos et al. 2007; Persons 1995; Virdhagriswaran and

Dakin 2006). Table 2 summarizes the research methodology and findings of prediction-related FFR studies.

On the other hand, AI techniques play an important role in accomplishing the task of financial fraud

detection (FFD) that involves distinguishing fraudulent financial data from authentic data, disclosing

fraudulent behavior or activities, and enabling decision makers to develop appropriate strategies to decrease

the impact of fraud (Ngai et al. 2010). One of popular AI techniques is Self-Organizing Map (SOM) proposed

by Kohonen (1982), a Neural Networks tool that conducts an unsupervised learning to produce a

low-dimensional view of high-dimensional data. SOM has been applied to FFD scenarios such as credit card,

automobile insurance and corporate fraud (Severin 2010; Ngai et al. 2010). There are several weaknesses of

SOM, however. For instance, its size (and thus topology) needs to be predefined and fixed and it is unable to

provide hierarchical relations amongst samples.

Growing Hierarchical Self-Organizing Map (GHSOM) proposed in (Dittenbach et al. 2000; Rauber et al.

2002) addresses the issue of fixed network architecture of SOM through developing the multilayer

hierarchical network structure, in which, as shown in Figure 1, each layer contains a number of SOMs. The

training process of GHSOM can be summarized in the following four phases (Dittenbach et al. 2000):

(1) Initialize the layer 0 and the layer 1: SOM of layer 0 consists of only a single node (group) whose weight

vector is initialized as the expected value of all imported samples. Then the mean quantization error

(MQE) of layer 0 (MQE

0

) is calculated. Hereafter, MQE of a node denotes the mean quantization error

(7)

the node. SOM of layer 1 initially has four nodes each of whose weight vectors is initialized randomly.

Then apply the following three phases to the SOM of layer 1 and SOMs of its subsequent layers.

(2) Train every individual SOM: Within the training process of an individual SOM, the sample is imported

one by one. The distances between the imported sample and the weight vectors of all nodes are

calculated. The node with the shortest distance is selected as the winner. Under the competitive learning

principle, only the winner and its neighboring nodes are qualified to adjust their weight vectors. Repeat

the competition and the training until the learning rate decreases to a certain value.

(3) Grow horizontally each individual SOM: Each individual SOM will grow until the mean value of MQEs

of all nodes on the SOM (MQE

m

) is smaller than the MQE of the parent node (MQE

p

) multiplied by τ

1

.

That is, the criterion for the stoppage of growth is MQE

m

< τ

1

× MQE

p

. If the stopping criterion is not

satisfied, identify the error node that owns the largest MQE and then, as shown in Figure 2, insert one

row or one column of new nodes between the error node and its dissimilar neighbor.

(4) Expand or terminate the hierarchical structure: After the horizontal growth phase of individual SOM,

MQE of each node (MQE

i

) is compared with the value of MQE

0

multiplied by τ

2

. The node with an

MQE

i

less than τ

2

× MQE

0

will become a leaf node that does not own a subsequent layer of SOM. The

node with an MQE

i

greater than τ

2

× MQE

0

will develop a subsequent layer of SOM that initially has

four nodes each of whose weight vectors is initialized randomly. In this way, the hierarchy grows until all

leaf nodes satisfy the stopping criterion MQE

i

< τ

2

× MQE

0

.

(8)

2001) (Shih et al. 2008) (Soriano-Asensi et al. 2008). These applications show that GHSOM is a useful tool to

extract relevant features from the vast amount of information or data.

Theoretically, there are several benefits when applying GHSOM to extracting features. First, with the

unsupervised learning nature, there are no predefined categories into which samples are to be classified; rather,

the GHSOM system will develop its own feature representation of the sample with a competitive learning

algorithm. Second, GHSOM classifies the sample into tons of small-sized leaf nodes (subgroups) with

hierarchical relationship such that further and more delicate analyses are feasible. Third, due to a competitive

learning nature GHSOM works as a regularity detector that is supposed to discover statistically salient

features of the sample population (Rumelhart and Zipser 1985). That is, extracted features in different leaf

nodes are distinctive.

Via conducting a GHSOM application to FFR samples of Taiwan, the study wants to justify the theoretical

benefits of feature-extracting mentioned above. Specifically, there are three objectives of the study: First,

based upon certain significant input variables derived from the FFR literature and some statistical tool, FFR

samples are classified into several small-sized leaf nodes of GHSOM. Second, unlike the traditional

approaches that interpret the outcome of a model via its input variables, this study uncovers the (common)

FFR features from the fraud samples clustered in each leaf node without referring to the attributes of input

variables. Suppose the FFR feature that we are interested in is the common fraudulent techniques used by the

fraud samples. Therefore, for each leaf node, the regularity of fraudulent techniques is uncovered from

indictments and sentences issued by the Department of Justice without referring to the attributes of input

variables. Such way of uncovering common fraudulent techniques can avoid the predicament of

(9)

examine whether different leaf nodes have distinctive common fraudulent techniques of FFR. Since common

fraudulent techniques are extracted from corresponding FFR indictments and sentences that are concluded

from more information than the values of derived variables, it is arguable whether GHSOM can obtain the

corresponding relation between common fraud techniques and derived variables. With acknowledging that

different leaf nodes have distinctive common fraudulent techniques, the study confirms the corresponding

relation between common fraud techniques and input variables that GHSOM has obtained implicitly.

The study may also contribute to the FFR literature at least as follows. Any FFR feature uncovered in

certain leaf node is applicable to all samples clustered in that leaf node. For each leaf node, this principle and

any pre-warning signals provided by features can result in some FFR audit guideline. Furthermore, with

distinctive FFR features extracted from different leaf nodes and a ton of leaf nodes, a further analysis of

associations between FFR features and corresponding clustered samples can provide insights of FFR.

The remainder of this paper is organized as follows. Section two reports the data preprocessing with the

discriminate analysis. Section three presents the sample and the outcomes of GHSOM. Section four provides

the extracted FFR feature of some subgroups. The last section concludes with a summary of findings,

implications, and suggestions for future works.

Data preprocessing

Sample for data preprocessing

The following sources were used to identify the fraud sample: indictments and sentences for major

securities crimes issued by the Securities and Futures Bureau of the Financial Supervisory Commission, class

action litigation cases initiated by Securities and Futures Investors Protection Center, and the law and

(10)

year is confirmed to be fraudulent by indictments and sentences for major securities crimes issued by the

Department of Justice, it is classified into our fraud observations, as to that company’s financial statements

free from fraud allegations they are classified into our non-fraud observations.

The matched-sample design is used to form a sample composite of 116 publicly traded companies,

including 58 fraud and 58 non-fraud ones between the years of 1992 to 2006. For each fraud firm, we match a

non-fraud firm based on industry, total assets, and year. For each fraud company, we first identified the

earliest year in which financial statement fraud was committed. The sample periods cover two years before

and two years after the year of the event. That is, five consecutive annual financial statements were used in

our study. The final observations used in the study consisted of 580 firm-year observations, i.e., 580 annual

financial statements were examined in the research.

For the 58 fraud firms, 113 annual financial statements were confirmed to have committed financial report

fraud (henceforth fraud samples) and 177 annual financial statements were free of allegations of such fraud

(henceforth non-fraud samples). As to the 58 non-fraud firms, 290 non-fraud samples were included. In brief,

our final research samples were comprised of 113 fraud samples and 467 non-fraud samples. The composite

ratio of fraud samples to non-fraud samples was 113:467. On average, approximately two fraudulent financial

statements (1.95 = 113/58) were included for each fraud firm. It is worth noting that of the 113 fraud samples,

there are 78 fraudulent financial statements and 35 restated financial statements. The firms that provided the

35 restated statements were the ones that survived financial scandals and whose restated statements were in

compliance with government regulations. The restated financial statements can be perceived as reflecting the

firms’ true financial positions that lead to the occurrences of the fraudulent financial reporting behavior. Such

mixture of data mimics the environment of information in the real world which prevails with both true and

(11)

false data.

Variable measurement and discriminant analysis

Based upon FFR literature, 25 explanatory variables are selected and incorporated into the discriminant

analysis. Table 3 summarizes the definition and measurement of these variables. These are measurement

proxies for attributes of profitability, liquidity, operating ability, financial structure, cash flow ability, financial

difficulty, and corporate governance of a firm. These explanatory variables are collected from the Taiwan

Economic Journal (TEJ) database.

We first test the multi-collinearity issue between explanatory variables. The unreported results indicate that

GIS should be excluded. As a result, 24 independent variables are incorporated in the Canonical Discriminant

Analysis as shown in model (1).

DBCBSCFR DBVRCFR SMLSR SPR Score -Z CFRR CFAR CFR LFTFA DR ITTA ARTTA GARS GI GAR TAT ART QR CR GNI GS ROA OPR GPM RAUD F 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2                                                                        1

(1)

Table 4 shows the descriptive statistics of the variables, including the mean, median, 25 percentiles and 75

percentiles. Column Z means one result of non-parametric test. Except GS, GIS, DBVRCFR, DBCBSCFR,

other variables do have different statistical features between the fraud and non-fraud samples.

Table 5 shows the empirical results of the discriminant analysis and shows that the Wilks' Λ value equals

0.766 and x

2

_{equals 151.095 (both significant at p-value < 0.01), which indicates that the discriminant model}

employed has adequate explanatory power. Table 5 indicates that eight variables, ROA, CR, QR, DR, CFR,

(12)

a company’s attributes from the aspects of profitability (ROA), liquidity (CR and QR), financial structure (DR),

cash flow ability (CFR and CFAR), financial difficulty (Z-Score), and corporate governance (SPR).

Sample and Growing Hierarchical Self Organizing Map

These eight variables chosen from discriminant analysis were collected for our 113 fraud samples and used

as the training data for GHSOM. To have the prevention of overly clustering fraud samples, we set up the

following predefined selection criteria to pick a suitable GHSOM:

(1) There is more than one layer of SOM in the GHSOM.

(2) Samples of each mapping should not be overly clustered into any one of child nodes.

Figure 3 shows the sample distribution of the obtained GHSOM (with τ

1

being 0.8 and τ

2

0.07), in which

leaf nodes are marked in taint. In each node, there is a name given according to its layer number and its node

order in the same SOM as well as its parent’s name. For instance, the node #12-21is node number 1 in layer 2

developed from the node number 2 of layer 1. In each node, the numbers within the parenthesis indicate the

number of fraudulent financial statements and the number of (fraud) firms.

Common Fraudulent Techniques

To refer to fraudulent techniques that are generally accepted, here the ten fraudulent techniques from

(Beasley et al. 1999) are used. That is, there are three basic types of fraudulent techniques: Improper Revenue

Recognition, Overstatement of Assets, and Others. Improper Revenue Recognition includes recording

fictitious revenues (FT1), recording revenues prematurely (FT2), and no description/overstated revenues

(FT3). Overstatement of Assets includes overstating existing assets (FT4), recording fictitious assets or assets

(13)

not owned (FT5), and capitalizing items that should be expensed (FT6). Others includes understatement of

expenses/liabilities (FT7), misappropriation of assets (FT8), inappropriate disclosure (FT9), and other

miscellaneous techniques (FT10).

For demonstration purposes, we take merely the three leaf nodes, #11, #14-21, and #14-24 to illustrate the

parts of uncovering the regularity of fraudulent techniques from the corresponding indictments and sentences

for major securities crimes issued by the Department of Justice. Table 6 summarizes the fraudulent techniques

commonly adopted by companies clustered in these three leaf nodes. The code and year in the first two

column of Table 6 lists the company code and the year of each clustered financial statement.

As shown in Table 6, common fraudulent techniques found in leaf node #11 are FT1, FT6 and FT8; in leaf

node #14-24 are FT1, FT4 and FT8; and in leaf node #14-21 are FT4 and FT8. In sum, Table 6 shows that the

observed common fraudulent techniques in different leaf nodes are distinctive even though samples are

clustered based upon corporate financial situations proxied by input variables (i.e., the eight variables

identified from discriminant analysis).

Compared to the traditional fraudulent technique classification scheme, such a contrast demonstrates the

advantage of our approach since our classification outcomes appear to be more delicate. For instance, some

fraud samples in leaf node #11 were found using FT1 via creating fictitious transactions and defrauding export

drawbacks from the Internal Revenue Service by reporting fictitious export sales. Moreover, some fraud

samples used FT8 by processing the receipt and payment in advance. In contrast, some fraud samples in leaf

node #14-24 were found to have been using FT4 through purchasing intangible asset/long-term investment

with high premiums. Some fraud samples used FT8 through related party transactions and merger and

(14)

acquisition activities to misappropriate cash.

Conclusion

In the data preprocessing stage, a sample set comprised of 113 fraud samples and 467 non-fraud samples is

used to identify eight significant variables regarding FFR via the discriminant analysis. Based upon the

(identified) variables as inputs, GHSOM clusters 113 fraud samples into 13 (small-sized) leaf nodes.

Distinguishing this study from others of feature extraction is that, for each leaf node, common fraud

techniques are disclosed with the assistance of expert knowledge in examining corresponding FFR

indictments and sentences (exogenous information) of clustered samples without referring to the attributes of

input variables. With acknowledging that different leaf nodes have distinctive common fraudulent techniques,

the study confirms the corresponding relation between common fraud techniques (an exogenous variable) and

input variables that GHSOM has obtained implicitly as well as the abilities of GHSOM in (1) extracting

features from exogenous information that are more abundant and more informative than input variables and (2)

classifying exogenous variables in terms of input variables. To go further to uncover the corresponding

relation between common fraud techniques and input variables is one of future works.

The systematic and integrated approach extended from the study is capable of constructing cause and

effect evidence of FFR. In addition, accumulating FFR features can help investigate as well as detect the

nature and possibility of FFR in future reporting. For instance, based upon the observed regularity of common

fraudulent techniques in each leaf node, we could identify the relevant financial indicators as the signal which

reveals the potential fraudulent activities for any samples clustered into this leaf node by GHSOM. When a

new sample is imported into the obtained GHSOM, if the distance deviation between the input vector and the

weight vector of winner node is less than a predefined threshold



, then the new sample is assigned as the

(15)

(fraud) member of that subgroup; otherwise, as the non-fraud. Based upon the assignment, we can develop

another systematic and integrated approach that helps capital providers (including investors and creditors)

make their investment or credit decisions as well as can help auditors perform prudent audit planning and

audit judgment. Such extended approach can also assist individuals such as corporate board members whose

responsibility is to monitor the performance of top management and who may need to play a more proactive

risk reduction role by designing and performing extended procedures as part of the fraud deterrence

engagements.

Other future works are suggested as follows: (1) to refine the GHSOM to get a better classification

mechanism or to identify better ways in extracting FFR features from the outcomes of GHSOM; (2) to

investigate the generality of our approach using data from other countries; and (3) to examine the prediction

ability of each result extended from the study.

References

ACFE. 2008. 2008 Report to the nation on occupational fraud & abuse, Association of Certified Fraud

Examiners, Austin, TX.

Altman, E.I. 1968. "Financial ratios, discriminant analysis and the prediction of corporate bankruptcy,"

Journal of Finance (23:4), pp 589-609.

Beasley, M.S., Carcello, J.V., and Hermanson, D.R. 1999. Fraudulent financial reporting: 1987-1997 COSO,

New York.

Bell , T.B., and Carcello, J.V. 2000. "A decision aid for assessing the likelihood of fraudulent financial

reporting," Auditing: A Journal of Practice & Theory (19), pp 169-184.

Dechow, P.M., Ge, W., Larson, C.R., Sloan, R.G., and Investors, B.G. 2007. "Predicting material accounting

manipulations," AAA 2007 Financial Accounting and Reporting Section (FARS) Paper [Electronic Version]

(1001), December 13, 2007 p48109. Retrieved December 13, 2007 from http://ssrn.com/abstract=997483.

Dechow, P.M., Sloan, R.G., and Sweeney, A.P. 1996. "Causes and Consequences of Earnings Manipulation:

An Analysis of Firms Subject to Enforcement Actions by the SEC," Contemporary accounting research

(13:1), pp 1-36.

Dittenbach, M., Merkl, D., and Rauber, A. 2000. "The growing hierarchical self-organizing map," Proceedings

of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, IJCNN, pp. 15-19.

(16)

financial data," Intelligent Systems in Accounting, Finance & Management (7:1), pp 21-41.

Green, B.P., and Choi, J.H. 1997. "Assessing the risk of management fraud through neural network

technology," Auditing (16), pp 14-28.

Hoogs, B., Kiehl, T., Lacomb, C., and Senturk, D. 2007. "A genetic algorithm approach to detecting temporal

patterns indicative of financial statement fraud," Intelligent Systems in Accounting, Finance &

Management (15:1-2), pp 41-56.

Kirkos, E., Spathis, C., and Manolopoulos, Y. 2007. "Data Mining techniques for the detection of fraudulent

financial statements," Expert Systems with Applications (32:4), pp 995-1003.

Kohonen, T. 1982. "Self-organized formation of topologically correct feature maps," Biological cybernetics

(43:1), pp 59-69.

La Porta, R., Lopez-de-Silanes, F., Shleifer, A., and Vishny, R. 1999. "Corporate ownership around the

world," Journal of Finance (54:2), pp 471-517.

Lee, T.S., and Yeh, Y.H. 2004. "Corporate governance and financial distress: evidence from Taiwan,"

Corporate Governance: An International Review (12:3), pp 378-388.

Ngai, E.W.T., Hu, Y. , Wong , Y.H., Chen, Y., and Sun, X. 2010. "The application of data mining techniques in

financial fraud detection: A classification framework and an academic review of literature", Decision

Support Systems, In Press, Accepted Manuscript, Available online 19 August 2010.

Persons, O.S. 1995. "Using financial statement data to identify factors associated with fraudulent financial

reporting," Journal of Applied Business Research (11), pp 38-46.

Rauber, A., Merkl, D., and Dittenbach, M. 2002. "The growing hierarchical self-organizing map: exploratory

analysis of high-dimensional data," IEEE Transactions on Neural Networks (13:6), pp 1331-1341.

Rumelhart, D.E., and Zipser, D. 1985. "Feature discovery by competitive learning," Cognitive Science (9:1),*

pp 75-112.

Schweighofer, E., Rauber, A., and Dittenbach, M. 2001. "Automatic text representation, classification and

labeling in European law," ACM, p. 87.

Séverin, E. 2010. "Self organizing maps in corporate finance: Quantitative and qualitative analysis of debt and

leasing," Neurocomputing (73:10-12), pp. 2061-2067

Shih, J., Chang, Y., and Chen, W. 2008. "Using GHSOM to construct legal maps for Taiwan's securities and

futures markets," Expert Systems with Applications (34:2), pp 850-858.

Soriano-Asensi, A., Martin-Guerrero, J., Soria-Olivas, E., Palomares, A., Magdalena-Benedito, R., and

Serrano-Lopez, A. 2008. "Web mining based on Growing Hierarchical Self-Organizing Maps: Analysis of

a real citizen web portal," Expert Systems with Applications (34:4), pp 2988-2994.

Stice, J.D. 1991. "Using financial and market information to identify pre-engagement factors associated with

lawsuits against auditors," Accounting Review (66:3), pp 516-533.

Summers , S.L., and Sweeney, J.T. 1998. "Fraudulently misstated financial statements and insider trading: An

empirical analysis," Accounting Review (73:1), pp 131-146.

Virdhagriswaran, S., and Dakin, G. 2006. "Camouflaged fraud detection in domains with complex

relationships," ACM, pp. 941-947.

Yeh, Y., Lee, T., and Woidtke, T. 2001. "Family control and corporate governance: Evidence from Taiwan,"

(17)

Table 1: Research methodology and findings in nature-related FFR studies.

Research Methodology

Findings

(Beasley et

al. 1999)

• Case study

• Descriptive

statistics

• Nature of companies involved

–

Companies committing financial statement

fraud were relatively small.

–

Companies committing the fraud were inclined

to experience net losses or close to break-even

positions in periods before the fraud.

• Nature of the control environment

–

Top senior executives were frequently involved.

–

Most audit committees only met about once a

year or the company had no audit committee.

• Nature of the frauds

–

Cumulative amounts of fraud were relatively

large in light of the relatively small sizes of the

companies involved.

–

Most frauds were not isolated to a single fiscal

period.

–

Typical financial statement fraud techniques

involved the overstatement of revenues and

assets.

• Consequences for the company and individuals

involved

–

Severe consequences awaited companies

committing fraud.

–

Consequences associated with financial

statement fraud were severe for individuals

allegedly involved.

(ACFE

2008)

• Case study

• Descriptive

statistics

• Occupational fraud schemes tend to be extremely

costly. The median loss was $175,000. More than

one-quarter of the frauds involved losses of at

least $1 million.

• Occupational fraud schemes frequently continue

for years, two years in typical, before they are

detected.

• There are 11 distinct categories of occupational

fraud. Financial statement fraud was the most

costly category with a median loss of $2 million for

the cases examined.

• The industries most commonly victimized by fraud

in our study were banking and financial services

(15% of cases), government (12%) and

healthcare (8%).

• Fraud perpetrators often display behavioral traits

that serve as indicators of possible illegal

behavior. In financial statement fraud cases,

which tend to be the most costly, excessive

organizational pressure to perform was a

particularly strong warning sign.

(18)

Table 2: Research methodology and findings in FFR empirical studies.

Author Methodology Variable Sample Findings

(Dechow et al. 1996)

Logistic regression • 21 variables – Financial ratios – Other indicators: corporate governance, motivationn etc. Matched-pairs design 92 firms subject to enforcement actions by the SEC

• To attract external financing at low cost was found an

important motivation for earnings manipulation • Firms manipulating earnings

are more likely to have: - insiders dominated boards, - Chief Executive Officer

simultaneously serves as Chairman of the Board. (Persons 1995) Stepwise logistic model • 9 financial ratios • Z-score Matched- pairs design

The study found four significant indicators: financial leverage, capital turnover, asset composition and firm size (Fanning and Cogger 1998) Self-organizing artificial neural network • 62 variables • Financial ratios • Other indicators: corporate governance, capital structure etc. Matched- pairs design: 102 fraud samples and 102 non-fraud samples

• Neural network is more effective

• Financial ratios such as debt to equity, ratios of accounts receivable to sales, trend variables etc are significant indicators.

(Bell and Carcello 2000)

Logistic regression 46 fraud risk factors

77 fraud samples and 305 non-fraud samples

Logistic regression model outperformed auditors for fraud samples, but were equally performed for non-fraud samples. (Kirkos et al. 2007) • Decision tree • Back-propagatio n neural network • Bayesian belief network • 27 financial ratios • Z-score Matched- pairs design: 38 fraud samples and 38 non-fraud samples

• Training dataset: neural network is the most accurate • Validation dataset: Bayesian

belief network is the most accurate

(Hoogs et al. 2007)

Genetic Algorithm • 38 financial ratios • 9 qualitative indicators 51 fraud samples vs. 51 non-fraud samples

Integrated pattern had a wider coverage for suspected fraud companies while it remained lower false classification rate for non-fraud ones

(19)

Table 3: Variable definition and measurement

Variable Definition Literature Measurement Dependent variable:

FRAUD (Persons 1995)

If a company’s financial statements for specific years are confirmed to be fraudulent by the indictments and sentences for major securities crimes issued by the Department of Justice, the firm-year data are classified into fraud observations, and the variable FRAUD will be set to 1, 0 otherwise.

Independent variable Profitability

Gross profit margin

(GPM) (Dechow et al. 2007) Sales

costs Operating -Sales Operating profit ratio (OPR)

(Green and Choi 1997)

les expenses Operating costs-Operating les-Sa Sa Return on assets (ROA) (Hoogs et al. 2007;

Persons 1995) Average totalassets

rate) Tax -(1 expenses Interest income Net   Growth in sales (GS) (Dechow et al. 2007; Stice 1991; Summers and Sweeney 1998) 1 ) (  year fiscal prior in Sales Sales Growth in net income (GNI) (Dechow et al. 2007; Stice 1991; Summers and Sweeney 1998) 1 ) (  year fiscal prior in income Net income Net Liquidity Current ratio

(CR) (Kirkos et al. 2007) Currentliabilities

assets Current

Quick ratio

(QR) (Kirkos et al. 2007) Currentliabilities

expenses Prepaid -s Inventorie -assets Current Operating ability Accounts receivable turnover (ART)

(Green and Choi 1997)

receivable accounts Average Sales Total asset turnover (TAT) (Kirkos et al. 2007;

Persons 1995) Totalassets

(20)

Growth in accounts receivable (GAR) (Dechow et al. 2007) )-1 year fiscal prior in receivable Accounts receivable Accounts ( Growth in inventory

(GI) (Dechow et al. 2007) Inventory in prior fiscalyear)-1

Inventory ( Growth in accounts receivable to sales (GARS) (Summers and Sweeney 1998) t-1 1 -t t t Sales receivable Accounts -Sales receivable Accounts Growth in inventory to sales (GIS) (Summers and Sweeney 1998) t-1 1 -t t t Sales Inventory -Sales Inventory Accounts receivable to total assets (ARTTA)

(Green and Choi 1997; Persons 1995; Stice 1991) Totalassets receivable Accounts Inventory to total assets (ITTA) (Persons 1995; Stice 1991) Totalassets Inventory Financial structure Debt ratio (DR) (Kirkos et al. 2007;

Persons 1995) Totalassets

s liabilitie Total Long-term funds to fixed assets (LFTFA) (Kirkos et al. 2007) assets Fixed s liabilitie Longterm + Equity

Cash flow ability

Cash flow ratio

(CFR) (Dechow et al. 2007) Current liabilities

activities operating from flows Cash Cash flow adequacy ratio (CFAR) (Dechow et al. 2007) dividends) cash and additions inventory es, expenditur capital of sum year (Five activities operating from flows cash of sum year Five Cash flow reinvestment ratio (CFRR) (Dechow et al. 2007) capital) Working + assets Other + s investment term Long + assets fixed (Gross dividends Cash -activities operating from flows Cash Financial difficulty

(21)

Z-score

(Altman 1968; Fanning and Cogger 1998; Stice 1991; Summers and Sweeney 1998) ) 1.0 TAT debt total of Book value equity of ue Market val ( 6 . 0 ) assets Total taxes and interest before Earnings ( 3 . 3 ) assets Total earnings Retained ( 4 . 1 ) assets Total capital Working ( 2 . 1          Corporate Governance

Stock Pledge ratio

(SPR)# (Lee and Yeh 2004) _large_shareholde_rs'_shareholdi_ngs pledge in ngs shareholdi rs' shareholde large Sum of percentage of major shareholders’ shareholdings (SMLSR)

(Beasley et al. 1999) Σ (Percentage of shareholdings >10%)

Deviation

between VR and CFR

(DBVRCFR)

(La Porta et al. 1999; Lee

and Yeh 2004) Voting rights (VR) - Cash flow rights (CFR) Deviation

between CBS and CFR

(DBCBSCFR)

(Lee and Yeh 2004; Yeh et al. 2001)

Percentage of board seats controlled (CBS) - Cash flow rights (CFR)

#

_{: According to the rule issued from the Securities and Futures Commission (SFC) of Taiwan,}

directors, supervisors, managers and large shareholders (that own 10 per cent or more of a

company’s outstanding shares) in public companies are obliged to report to the SFC the percentage

of their shareholdings that are pledged for loans and credits. These data matter, since pledging for

loans effectively reduces the personal funds required for shareholding. In other words, the degree of

personal leverage expands and the over-investments in the stock market by the largest shareholder

also create risk for the companies to a certain degree. (Lee and Yeh 2004)

(22)

Table 4: Descriptive Statistics of variables

Fraud Sample (N=113) Non-fraud Sample (N=467)

Variable Mean Median

25 Percentiles 75 Percentiles Mean Median 25 Percentiles 75 Percentiles Z GPM 11.85 10.65 4.99 19.41 15.51 14.47 8.12 22.77 -3.19 OPR -5.39 0.32 -7.26 6.92 -34.49 3.81 -0.24 8.60 -3.98 ROA -13.45 -2.76 -23.48 5.29 3.40 4.19 0.39 7.97 -6.53 GS 8.30 7.84 -15.47 24.99 38.73 5.23 -7.77 19.89 -0.08 GNI 47.23 -71.97 -636.91 24.49 -41.32 14.30 -44.89 80.07 -6.74 CR 109.83 104.68 60.98 141.48 190.94 150.01 110.02 210.00 -7.00 QR 57.79 45.54 21.84 77.09 110.36 75.73 38.09 124.66 -5.16 ART 7.10 4.62 3.16 7.34 8.91 5.36 3.75 8.94 -2.51 TAT 0.61 0.48 0.31 0.74 0.75 0.64 0.41 0.93 -3.69 GAR 39.67 -5.73 -37.06 34.73 68.97 6.03 -15.15 33.86 -2.42 GI 13.85 -1.02 -28.82 23.66 27.03 2.18 -14.80 31.14 -1.67 GARS -0.17 -1.04 -7.95 3.30 2.13 0.22 -2.75 3.11 -2.46 GIS 24.91 -0.34 -5.40 3.43 23.96 0.00 -3.37 4.80 -1.11 ARTTA 12.02 10.11 4.79 18.37 13.70 10.84 5.05 20.33 -1.33 ITTA 16.72 11.36 5.96 19.49 19.94 13.57 5.82 24.67 -1.74 DR 64.02 60.23 48.10 71.40 48.17 45.03 33.67 56.75 -7.59 LFTFA 452.26 165.79 95.29 399.96 482.48 225.20 146.73 427.05 -3.48 CFR -14.91 -6.88 -21.21 6.54 13.41 8.12 -5.96 29.70 -6.26 CFAR -18.56 -6.54 -27.97 8.65 9.36 14.52 -17.16 54.56 -5.53 CFRR -46.73 -2.69 -14.70 3.74 0.37 2.03 -4.17 7.56 -4.59 SPR 37.44 33.44 1.83 63.26 19.32 3.58 0.00 32.49 -5.67 SMLSR 13.97 11.98 3.72 20.38 10.83 7.89 0.09 16.96 -3.16 DBVRCFR 3.47 0.47 0.00 2.76 3.62 0.56 0.00 4.09 -0.66 DBCBSCFR 46.00 45.58 22.87 67.41 44.26 43.68 26.99 63.69 -0.59 Z-Score 31.45 79.60 -91.69 166.17 198.67 194.70 120.89 270.95 -8.68

(23)

Table 5: Empirical results of discriminant analysis.

Variable Coefficient F-value

Significance

GPM

0.14

3.51

0.061 OPR

-0.03

0.16

0.688 ROA

0.77 105.82

0.000***

GS

0.06

0.63

0.427 GNI

-0.02

0.05

0.822 CR

0.34 20.59

0.000***

QR

0.28 13.42

0.000***

ART

0.09

1.58

0.210 TAT

0.19

6.38

0.012 GAR

0.03

0.12

0.731 GI

0.07

0.90

0.344 GARS

0.00

0.997 ARTTA

0.11

2.25

0.134 ITTA

0.12

2.37

0.125 DR

-0.42 30.46

0.000***

LFTFA

0.02

0.09

0.764 CFR

0.33 19.21

0.000***

CFAR

0.24 9.89

0.002***

CFRR

0.19

6.41

0.012 SPR

-0.47 38.85

0.000***

SMLSR

-0.19

6.18

0.013 DBVRCFR

0.02

0.04

0.835 DBCBSCFR

-0.05

0.41

0.524 Z-score

0.64 72.74

0.000***

Wilks' Λ value

0.77 p-value

0.000

(24)

Table 6: Common fraudulent techniques adopted by fraud firms of leaf nodes of #11, #14-21, and #14-24. The

code and year in the first two columns lists the company code and the year of each clustered financial statement. In each node, the numbers within the parenthesis indicate the number of fraudulent financial statements and the number of (fraud) firms.

Code year FT1 FT2 FT3 FT4 FT5 FT6 FT7 FT8 FT9 FT10 leaf node #11 (12/9) 2505 1998 ● 2529 1998 ● ● 8716 1999 ● ● 2334 1999 ● ● 3039 2004 ● 1601 1998 ● 1221 2002 ● ● ● 1221 2003 ● ● ● 2014 2003 ● ● 5901 1997 ● ● 5901 1998 ● ● 5901 1999 ● ● leaf node #14-24 (12/9) 2206 1999 ● 2350 1998 ● 2407 2002 ● ● ● ● ● ● 2407 2003 ● ● ● ● ● ● 2407 2004 ● ● ● ● ● ● 2490 2000 ● ● 2490 2002 ● ● 8295 1998 ● ● 1221 2001 ● ● 8723 1998 ● ● ● 2017 1997 ● ● 5007 1999 ● ● leaf node #14-21 (7/7/) 5504 1999 ● 2328 1998 ● ● 2334 1998 ● ● 1505 1997 ● 5007 1998 ● ● 2614 1999 ● ● ● ● 1466 1998 ● ●

FT1: recording fictitious revenues; FT2: recording revenues prematurely; FT3: no description/overstated about revenues; FT4: overstating existing assets;

(25)

FT5: recording fictitious assets or assets not owned; FT6: capitalizing items that should be expensed; FT7: understatement of expenses/liabilities; FT8: misappropriation of assets;

(26)

(27)

Figure 2: Horizontal growth of individual SOM. The notation e indicates the error node and d the dissimilar

(28)

Figure 3: The sample distribution in the obtained GHSOM, in which leaf nodes are marked in taint. In each

node, the numbers within the parenthesis indicate the number of fraudulent financial statements and the number of (fraud) firms.

#11 (12/9) #12 (28/18) #13 (25/18) #14 (48/15) Layer 1 Layer 2 #12-21 (3/3) #12-23 (13/9) #12-22 (11/7) #12-24 (1/1) #13-21 (6/5) #13-23 (4/4) #13-22 (9/7) #13-24 (6/3) #14-21 (7/7) #14-23 (21/12) #14-22 (8/8) #14-24 (12/9) #01 (113/58)

(29)

國科會補助專題研究計畫成果報告自評表

請就研究內容與原計畫相符程度、達成預期目標情況、研究成果之學術或應用價

值（簡要敘述成果所代表之意義、價值、影響或進一步發展之可能性）

、是否適

合在學術期刊發表或申請專利、主要發現或其他有關價值等，作一綜合評估。

1. 請就研究內容與原計畫相符程度、達成預期目標情況作一綜合評估

■

達成目標

□ 未達成目標（請說明，以

100 字為限）

□ 實驗失敗

□ 因故實驗中斷

□ 其他原因

說明：

2. 研究成果在學術期刊發表或申請專利等情形：

論文：□已發表 □未發表之文稿 ■撰寫中 □無

專利：□已獲得 □申請中 □無

技轉：□已技轉 □洽談中 □無

其他：（以

100 字為限）

已將研究成果陸續在

American Accounting Association Annual Conference

2010 以及 American Accounting Association Annual Conference 2011 發表，並

已陸續撰寫成數篇論文投稿到國際期刊，應該會有至少兩篇的

SCI/SSCI 國際期

(30)

3. 請依學術成就、技術創新、社會影響等方面，評估研究成果之學術或應用價

值（簡要敘述成果所代表之意義、價值、影響或進一步發展之可能性）（以

500 字為限）

1. 研究所得之結果及研究過程，可供國內學術及研究單位易參考之用。

2. 參與本研究之人員，可獲取下述之研習效果﹕

(1) 參與相關蒐集、理論探討、程式撰寫及報告完成，有助於其日後獨立從

事研究工作。

(2) 藉由參與計畫，引發研究人員對於 GHSOM 和財報舞弊有更深的瞭解，有

助於未來之研究發展。

例如，

計畫參與人員黃馨瑩乃為博士班學生

，

其博士論文乃基於此研究

案之成果

，

而其研究成果

已陸續撰寫成數篇論文投稿到國際期刊，應該會有

至少兩篇的

SCI/SSCI 國際期刊論文發表。

(31)

出席國際學術會議心得報告

計畫編號 NSC98-2410-H-004-049-MY2 計畫名稱財務報表舞弊探索與類神經網路出國人員姓名服務機關及職稱林宛瑩國立政治大學會計學系副教授會議時間地點 2010/7/31-2010/8/4

IN SAN FRANCISCO, CA, USA

會議名稱 2010 AMERICAN ACCOUNTING ASSOCIATION ANNUAL MEETING

AND CONFERENCE ON TEACHING AND LEARNING IN ACCOUNTING

發表論文題目 A Neural Networks Tool to Enhance the Understanding of Fraudulent Financial _Reporting

一、參加會議經過

我於01/08/2010 晚上到達 San Francisco，於 02/08/2010-04/08/2010 會議期間至會

場聆聽Opening Plenary Session 及每日之 Plenary Session 與當日多篇論文之發表，包

括Determinants of disclosure quality、Information technology impacts on operational

performance、New approaches to accounting research、COSO Panel、Internal auditing、 Internal control、Conservatism、Fraud detection、Fraud case studies 等場次之討論與多

篇論文發表；並於03/08/2010 下午 4:00 的 Concurrent Sessions 發表論文。於 05/08/2010 凌晨離開 San Francisco 回國。本人所發表之論文參見後附之附件。二、與會心得 Plenary Session 邀請了不少會計學各領域之知名學者參與座談及演講，各場次論文發表亦使本人受益良多。本人所發表之論文，蒙與會學者提供多項建議，修正後論文目前已在IMDS 期刊進行審查。

(32)

2

A Neural Networks Tool to Enhance the

Understanding of Fraudulent Financial

Reporting

Abstract

Fraudulent financial reporting (FFR) has drawn much public as well as academic attention. However, most literature focuses on predicting the likelihood of financial fraud, financial distress or bankruptcy. Less emphasis has been placed on exploring FFR itself, and FFR techniques and knowledge. The purpose of this research is to explore FFR via Growing Hierarchical Self-Organizing Map (GHSOM), an unsupervised Neural Network tool, to enhance the understanding of FFR. This study addresses the challenge through the following two-stage approach: a classification stage that well trains the GHSOM to cluster the sample into subgroups with hierarchical relationship and a pattern-disclosure stage that uncovers patterns of the common financial reporting fraud techniques and relevant risk indicators to enhance the understanding of FFR. An application is conducted and its results show that the proposed two-stage approach is helpful in enhancing the understanding of FFR.

Key words: Fraudulent Financial Report; Growing Hierarchical Self-Organizing Map; Knowledge Extraction

(33)

1. Introduction

This study focuses on exploring financial reporting fraud via Growing Hierarchical Self-Organizing Map (GHSOM), an unsupervised Neural Network tool (Dittenbach et al. 2000; Dittenbach et al. 2002; Rauber et al. 2002), to enhance the understanding of Fraudulent Financial Reporting (FFR). FFR, or financial statement fraud, involves the intentional misstatement or omission of material information from an organization’s financial reports (Beasley et al. 1999). These are cases known as “cooking the books” that often have severe economic consequences and make front page headlines. FFR can lead not only to significant risks for stockholders and creditors, but also financial crises for the capital market. FFR, although with the lowest frequency, casts a severe financial impact (Association of Certified Fraud Examiners, ACFE 2008). According to the ACFE (2008), financial misstatements are the most costly form of occupational fraud, with median losses of $2 million per scheme.

Most prior FFR-related research focused on the nature or the prediction of FFR. The nature-related FFR research often uses the case study approach and provides a descriptive analysis of the characteristics of FFR and techniques commonly used. For example, the Committee of Sponsoring Organizations (COSO) and the Association of Certified Fraud Examiners (ACFE) regularly publish their own analysis on fraudulent financial reporting of U.S. companies. Based on the FFR samples, COSO examines and summarizes certain key company and management characteristics. ACFE analyzes the nature of occupational fraud schemes and provides suggestions to create adequate internal control mechanisms. Table 1 summarizes the research methodology and findings in nature-related FFR studies.

Table 1: Research methodology and findings in nature-related FFR studies.

Research Methodology Findings

Beasley et al. (1999)

•Case study

•Descriptive statistics

•Nature of companies involved

– Companies committing financial statement fraud were

relatively small.

– Companies committing the fraud were inclined to experience

net losses or close to break-even positions in periods before the fraud.

•Nature of the control environment

– Top senior executives were frequently involved.

– Most audit committees only met about once a year or the

company had no audit committee. •Nature of the frauds

– Cumulative amounts of fraud were relatively large in light of the relatively small sizes of the companies involved.

– Most frauds were not isolated to a single fiscal period.

– Typical financial statement fraud techniques involved the

(34)

4

•Consequences for the company and individuals involved

– Severe consequences awaited companies committing fraud.

– Consequences associated with financial statement fraud were

severe for individuals allegedly involved. ACFE

(2008)

•Case study

•Descriptive statistics

• Occupational fraud schemes tend to be extremely costly. The

median loss was $175,000. More than one-quarter of the frauds involved losses of at least $1 million.

• Occupational fraud schemes frequently continue for years, two

years in typical, before they are detected.

• There are 11 distinct categories of occupational fraud. Financial statement fraud was the most costly category with a median loss of $2 million for the cases examined.

• The industries most commonly victimized by fraud in our study

were banking and financial services (15% of cases), government (12%) and healthcare (8%).

• Fraud perpetrators often display behavioral traits that serve as indicators of possible illegal behavior. In financial statement fraud cases, which tend to be the most costly, excessive organizational pressure to perform was a particularly strong warning sign.

Another type of FFR research often uses the empirical approach to archival data and identifies significant variables that help predict the occurrence of fraudulent reporting. This line of research also inputs these significant variables into the fraud prediction model. Such research emphasizes the predictability of the model used. For example, logistic regression and neural network techniques are used in this line of research (e.g., Persons 1995; Fanning and Cogger 1998; Bell et al. 2000; Virdhagriswaran 2006; Kirkos et al. 2007). The matched-sample design is typical for traditional FFR empirical studies. That is, a set of samples with fraudulent financial statements confirmed by the Department of Justice is matched with a set of samples without any allegations of fraudulent reporting.

Table 2 summarizes the research methodology and findings of FFR empirical studies most relevant to our study. The research methodology has shown a trend with an emphasis on the classification mechanization which is used as the decision support information for future risk identification (Basens et al. 2003). However, engagements relating to criminal matters typically arise in the aftermath of FFR and an assessment to criminal engagements requires the accumulation of FFR knowledge.

Table 2: Research methodology and findings in FFR empirical studies.

Author Methodology Variable Sample Findings

Dechow et al. (1996)

Logistic regression • 21 variables

– Financial

ratios

Matched-pairs design

92 firms subject

• To attract external financing at low cost was found an important motivation for earnings

(35)

– Other indicators: corporate governance, motivationn etc. to enforcement actions by the SEC manipulation

• Firms manipulating earnings are

more likely to have: - insiders dominated boards, - Chief Executive Officer

simultaneously serves as Chairman of the Board. Persons (1995) Stepwise logistic model • 9 financial ratios • Z-score Matched- pairs design

The study found four significant indicators: financial leverage, capital turnover, asset composition and firm size Fanning and Cogger (1998) Self-organizing artificial neural network • 62 variables • Financial ratios • Other indicators: corporate governance, capital structure etc. Matched- pairs design: 102 fraud samples and 102 non-fraud samples

• Neural network is more effective

• Financial ratios such as debt to equity, ratios of accounts

receivable to sales, trend variables etc are significant indicators. Bell and

Carcello (2000)

Logistic regression 46 fraud risk factors

77 fraud samples and 305 non-fraud samples

Logistic regression model outperformed auditors for fraud samples, but were equally performed for non-fraud samples. Kirkos et al. (2007) • Decision tree • Back-propagation neural network • Bayesian belief network • 27 financial ratios • Z-score Matched- pairs design: 38 fraud samples and 38 non-fraud samples

• Training dataset: neural network is the most accurate

• Validation dataset: Bayesian belief network is the most accurate

Hoogs et al. (2007)

Genetic Algorithm • 38 financial ratios • 9 qualitative indicators 51 fraud samples vs. 51 non-fraud samples

Integrated pattern had a wider coverage for suspected fraud companies while it remained lower false classification rate for non-fraud ones

This study explores the financial reporting fraud techniques via applying GHSOM to a research sample of 580 observations between the years of 1992 to 2006. A brief review of GHSOM literature is as follows. GHSOM addresses the issue of fixed network architecture of Self-Organizing Map (SOM) (Kohonen 1982) through developing the multilayer hierarchical network structure, in which, as shown in Figure 1, each layer contains a number of SOMs. The training process of GHSOM consists of the following four phases (Dittenbach et al. 2000):

(36)

6

Figure 1: GHSOM structure. (Dittenbach et al. 2000)

(1) Initialize the layer 0: The layer 0 includes single node (mapping) whose weight vector is initialized as the expected value of all input data. Then the mean quantization error of layer 0 (MQE0) is

calculated. Hereafter, MQE of a mapping denotes the mean quantization error that sums up the deviation between the weight vector of the node and every input data mapped to the node.

(2) Train every individual SOM: Within the training process of an individual SOM, the input data is imported one by one. The distances between the imported input data and the weight vector and all mapping are calculated. The mapping with the shortest distance is selected as the winner. Under the competitive learning principle, only the winner and its neighboring mappings are qualified to adjust their weight vectors. Repeat the competition and the training until the learning rate decreases to a certain value.

(3) Grow horizontally each individual SOM: Each individual SOM will grow until the mean value of the MQEs for all of the mappings on the SOM (MQEm) is smaller than the MQE of the parent

mapping (MQEp) multiplied by τ1. That is, the criterion for the stoppage of growth is stated in (1). If

the stopping criterion is not satisfied, find the error mapping that owns the largest MQE and then, as shown in Figure 2, insert one row or one column of new nodes between the error mapping and its dissimilar neighbor.

MQEm < τ1 × MQEp (1)

Figure 2: Horizontal growth of individual SOM. The notation e indicates the error mapping and d

(37)

(4) Expand or terminate the hierarchical structure: After the horizontal growth phase of individual SOM, MQE of every mapping (MQEi) is compared with the value of MQE0 multiplied by τ2. The mapping

with an MQEi greater than τ2 × MQE0 will develop a next layer of SOM. In this way, the hierarchy

grows until all of the leaf mappings satisfy the stopping criterion stated in (2). The leaf mapping means the mapping does not own a next layer of SOM.

MQEi < τ2 × MQE0 (2)

For this study a two-stage approach was developed as depicted in Figure 3. In the classification stage, the discriminant analysis is first applied to the research sample to identify the significant variables that help predict the occurrence of FFR. These significant variables are then used in training the GHSOM to obtain leaf mappings that may consist of data from fraud and non-fraud samples. The main purpose is to build up a well-trained GHSOM. With the unsupervised learning nature, the GHSOM treats samples equally without specifying the occurrence of sentence or manipulation, a scenario close to the real world. Due to a competitive learning nature the GHSOM works as a regularity detector that is supposed to discover statistically salient features of the sample population (Rumelhart and Zipser 1985). That is, there are no predefined categories into which samples are to be classified; rather, the GHSOM system must develop its own feature representation of the sample which captures the most salient features of the population of sample. Furthermore, through a set of small-sized mappings, the GHSOM classifies the sample into more subgroups using hierarchical relationships instead of a dichotomous result and therefore further and more delicate analyses are feasible.

Figure 3: The two-stage approach for exploring the financial reporting fraud techniques via GHSOM.

The FFR tendency, also denoted as the degree of risk, can be recognized from each mapping. There are three risk categories for leaf mappings based on the identified degree of risk — high-risk, mixed, and healthy. High-risk mapping contains many fraudulent financial statements and healthy mapping more non-fraudulent financial statements.

In the pattern-disclosure stage, unlike the traditional approaches that interpret the outcome of a model via its input variables, this study uncovers the (common) FFR patterns from the fraud samples clustered in

(38)

8

each leaf mapping without referring to the attributes of input variables. The associated FFR knowledge can be applied to all samples clustered in the same leaf mapping.

The two-stage approach used by this study is to further understand corporate behavior through integrating the outcome of the GHSOM model with the output features associated with FFR indictments and/or domain expertise. Our proposed approach focuses on uncovering the common patterns from each mapping without referring to the attributes of input variables resulting from the first stage analysis.

For instance, the patterns of each high-risk mapping that we are interested in are the common fraudulent techniques used by the fraud samples. Therefore, for a high-risk leaf mapping, we extract the regularity of fraudulent techniques from the corresponding indictments and sentences issued by the Department of Justice to the fraud sample. To refer to fraudulent techniques that are generally accepted, here the ten fraudulent techniques from (Beasley et al. 1999) are used. That is, there are three basic types of fraudulent techniques: Improper Revenue Recognition, Overstatement of Assets, and Others. Improper Revenue Recognition includes recording fictitious revenues (FT1), recording revenues prematurely (FT2), and no description/overstated revenues (FT3). Overstatement of Assets includes overstating existing assets (FT4), recording fictitious assets or assets not owned (FT5), and capitalizing items that should be expensed (FT6). Others includes understatement of expenses/liabilities (FT7), misappropriation of assets (FT8), inappropriate disclosure (FT9), and other miscellaneous techniques (FT10).

Based upon the observed regularity of corporate behavior in each leaf mapping, we can further identify the relevant indicators of such regularity for future reporting. For instance, built on the common fraudulent techniques observed in a high-risk mapping, we could identify the relevant financial indicators as the signal which reveals the potential fraudulent activities for any samples clustered into this high-risk leaf mapping by GHSOM. The relevant indicators help accumulate the FFR knowledge, and also help in exploring the financial reporting fraud techniques for future samples.

Our primary research findings include the followings. The results of the classification stage led to a GHSOM with three layers and 41 leaf mappings, each of which preserved its own salient features. The outcomes of the pattern-disclosure stage resulted in certain FFR techniques used by firms in each of the high FFR tendency subgroups as well as the relevant risk indicators that can be used as the FFR audit guideline. As a result, the systematic and integrated approach of this study is capable of constructing evidence to better understand the FFR.

The findings of this research help identify the link between the usages of FFR techniques and FFR drivers (including financial conditions). The FFR knowledge derived from our research methodology can help investors make their investment decisions and can help auditors do their audit planning and make audit judgments. Such knowledge can help someone who wants to play a more proactive risk reduction role by designing and performing extended procedures as a part of the statutory audit, acting as advisers to audit committees, fraud deterrence engagements, and assisting in investment analyst research. In addition, with the unsupervised learning method feature of GHSOM, our methodology is applicable to

財務報表舞弊探索與類神經網路(I)

行政院國家科學委員會專題研究計畫 成果報告

財務報表舞弊探索與類神經網路(第 2 年)

研究成果報告(完整版)

計 畫 類 別 ： 個別型

計 畫 編 號 ： NSC 98-2410-H-004-049-MY2

執 行 期 間 ： 99 年 08 月 01 日至 100 年 07 月 31 日

執 行 單 位 ： 國立政治大學資訊管理學系

計 畫 主 持 人 ： 蔡瑞煌

共 同 主 持 人 ： 林宛瑩

計畫參與人員： 博士班研究生-兼任助理人員：黃馨瑩

報 告 附 件 ： 出席國際會議研究心得報告及發表論文

處 理 方 式 ： 本計畫可公開查詢

中 華 民 國 100 年 07 月 29 日

行政院國家科學委員會補助專題研究計畫

■成果報告

□期中進度報告

財務報表舞弊探索與類神經網路

計畫類別：■個別型計畫 □整合型計畫

計畫編號：NSC 98－2410－H－004－049

－MY2

執行期間： 99 年 8 月 1 日至 100 年 7 月 31 日

執行機構及系所：國立政治大學資訊管理學系

計畫主持人：蔡瑞煌

共同主持人：林宛瑩

計畫參與人員：黃馨瑩

成果報告類型(依經費核定清單規定繳交)：□精簡報告 ■完整報告

本計畫除繳交成果報告外，另須繳交以下出國心得報告：

□赴國外出差或研習心得報告

□赴大陸地區出差或研習心得報告

■出席國際學術會議心得報告

□國際合作研究計畫國外研究報告

處理方式：

除列管計畫及下列情形者外，得立即公開查詢

□涉及專利或其他智慧財產權，□一年□二年後可公開查詢

中 華 民 國 100 年 7 月 29 日

摘要

財務報表舞弊不僅對股東造成顯著的投資危機，也掀起資本市場的財務風暴。雖然財務報表的舞弊已

經引起許多關注，但大部分相關研究者著重在預測財務危機和破產，而鮮少聚焦在對財報舞弊本身知

識的探討。本研究旨在透過以下四個階段而對財報舞弊有更深的了解。

(1) 從文獻中整理出財務和公司治理方面和財報舞弊相關的所有指標，然後用統計分析方法採擷、獲得

和財報舞弊顯著相關的指標;

(2) 利用 Growing Hierarchical Self-Organizing Map (GHSOM)之人工智慧分群方法來對正常及舞弊的財

報資料分群;

(3) 剖析分群的財報資料以及利用專家之研判，以擷取財報舞弊的相關知識;

(4) 再利用專家來研判所採擷的財報舞弊的相關知識之可信度。

因為人工智慧分群方法可以從龐大的資料中找尋隱藏的階層關聯；所以學理上，這項研究是可行的。

在第一年，這項研究計畫著重於財務和公司治理方面和財報舞弊相關的所有指標之文獻整理，然後利

用統計分析方法採擷、獲得和財報舞弊相關的顯著指標；並且利用

GHSOM 分群方法來對正常及舞弊

的財報資料分群。在第二年，研究計畫套用所得之

GHSOM 來對舞弊的財報之起訴書和判決書做分群，

再利用專家之研判，以對每一群起訴書和判決書擷取財報舞弊的相關知識。

關鍵字：財務報表舞弊，GHSOM 人工智慧分群方法，知識萃取

ABSTRACT

Fraudulent financial reporting (FFR) has drawn much public as well as academic attention. However, the

extant literature review reveals that prior FFR-related research focused on the nature or the prediction of FFR

and that there is no study that extracts FFR features – the delicate but hid-den truths regarding FFR – from

very large quantities of FFR data through tools of artificial intelligence. On the other hand, artificial

intelligence techniques play an important role in accomplishing the task of financial fraud detection. Via

conducting a Growing Hierarchical Self-Organizing Map (GHSOM) application to FFR samples of Taiwan,

this study justifies the theoretical benefits of feature-extracting through GHSOM. Specifically, for each leaf

node of GHSOM, this study uncovers common fraudulent techniques from corresponding FFR indictments

and sentences of clustered samples without referring to the attributes of input variables. The acknowledge that

different leaf nodes have distinctive common fraudulent techniques can confirm the ability of GHSOM in

extracting features in terms of exogenous variables that are more abundant and more informative than

endogenous variables.

THE EXOGENOUS ISSUE OF FEATURE EXTRACTION

Introduction

Fraudulent Financial Reporting (FFR), or financial statement fraud, involves the intentional misstatement

or omission of material information from an organization’s financial reports (Beasley et al. 1999). Although

with the lowest frequency, FFR casts a severe financial impact with median losses of $2 million per scheme

(ACFE 2008). These are cases that often have severe consequences in terms of not only significant risks for

stockholders and creditors but also financial crises for the capital market.

However, the following extant literature review reveals that prior FFR-related research focused on the

nature or the prediction of FFR and that there is no in-depth study that explores FFR features – the delicate

but hidden truths regarding FFR – extracted from very large quantities of FFR data through tools of artificial

intelligence (AI). The nature-related FFR research often uses the case study approach and provides a

descriptive analysis of the characteristics of FFR and techniques commonly used. For instance, the Committee

of Sponsoring Organizations (COSO) and the Association of Certified Fraud Examiners (ACFE) regularly

行政院國家科學委員會專題研究計畫成果報告

計畫類別：個別型

計畫編號： NSC 98-2410-H-004-049-MY2

執行期間： 99 年 08 月 01 日至 100 年 07 月 31 日

執行單位：國立政治大學資訊管理學系

計畫主持人：蔡瑞煌

共同主持人：林宛瑩

計畫參與人員：博士班研究生-兼任助理人員：黃馨瑩

報告附件：出席國際會議研究心得報告及發表論文

處理方式：本計畫可公開查詢

中華民國 100 年 07 月 29 日

中華民國 100 年 7 月 29 日