
CHAPTER 3 Proposed Approach

3.2 Components of the Proposed Approach

3.2.2 Initial Population

The strategy for generating the initial population is important because it affects the final optimization result. You et al. pointed out that a portfolio with cash dividend yield is better than a portfolio with types in the Taiwan stock market [39]. Therefore, the proposed approach uses the cash dividend yields of stocks to form the initial population. Table 2 gives an example that shows the benefit of using cash dividend yield to generate the initial population. It lists the cash dividends, stock prices, and cash dividend yields of two companies, Hon Hai Precision Industry Co., Ltd. (2317) and Formosa Petrochemical Corporation (6505).

Table 2. Cash dividend yields of two companies.

                                      2011     2012     2013     2014     2015
Cash dividends of 2317 (per share)    1.5      1.5      1.8      3.8      4
Stock price of 2317                   82.90    88.90    80.10    87.90    80.80
Cash dividend yield of 2317           1.81%    1.69%    2.25%    4.32%    4.95%
Cash dividends of 6505 (per share)    2        0.26     2.5      0.85     4
Stock price of 6505                   93.80    86.00    81.80    68.70    78.80
Cash dividend yield of 6505           2.13%    0.30%    3.06%    1.24%    5.08%

In Table 2, the cash dividends of 2317 are 1.5, 1.5, 1.8, 3.8 and 4, and its stock prices are 82.90, 88.90, 80.10, 87.90 and 80.80. The cash dividend yield is defined as the cash dividend divided by the stock price, so the cash dividend yields of 2317 are 1.81%, 1.69%, 2.25%, 4.32% and 4.95%. In the same way, the cash dividend yields of 6505 are 2.13%, 0.30%, 3.06%, 1.24% and 5.08%. To compare the stability of 2317 and 6505, the standard deviation of each company's cash dividend yields is calculated. The standard deviation of 2317 is 1.35938, and that of 6505 is 1.63947. As a result, 2317 is better than 6505 because its cash dividend yield is more stable, which means that buying 2317 can earn a stable income at lower risk. The cash dividend yields (yi) of n companies are then arranged as shown in Table 3.
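The yields and standard deviations quoted for Table 2 can be checked with a short script. This is only a sketch: the function names are hypothetical, and it assumes that the reported values are population standard deviations computed on the yields rounded to two decimals, which is the reading that matches the figures in the text.

```python
from statistics import pstdev

# Cash dividends and stock prices for 2011-2015, copied from Table 2.
dividends = {
    "2317": [1.5, 1.5, 1.8, 3.8, 4.0],
    "6505": [2.0, 0.26, 2.5, 0.85, 4.0],
}
prices = {
    "2317": [82.90, 88.90, 80.10, 87.90, 80.80],
    "6505": [93.80, 86.00, 81.80, 68.70, 78.80],
}


def dividend_yields(code):
    """Cash dividend yield in percent, rounded to two decimals as in Table 2."""
    return [round(100 * d / p, 2) for d, p in zip(dividends[code], prices[code])]


def yield_stability(code):
    """Population standard deviation of the yearly yields (lower is more stable)."""
    return pstdev(dividend_yields(code))


for code in dividends:
    print(code, dividend_yields(code), round(yield_stability(code), 5))
```

Under these assumptions the script gives a smaller standard deviation for 2317 than for 6505, in line with the comparison made in the text.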

Table 3. The cash dividend yield (yi) of companies.

Stock                  s1    s2    …    si    …    sn-1    sn
Cash dividend yield    y1    y2    …    yi    …    yn-1    yn

With the cash dividend yields (yi) of the n companies, existing techniques such as kNN and k-means clustering are used to divide the n stocks into K groups. The average cash dividend avgCDi of each group is then calculated, followed by the proportion of each group's average cash dividend to the sum over all groups. The result is shown in Table 4.

Table 4. Proportion of average cash dividend of each group.

Group    Proportion of average cash dividend
G1       avgCD_1 / ∑_{a=1}^{K} avgCD_a
G2       avgCD_2 / ∑_{a=1}^{K} avgCD_a
…        …
Gi       avgCD_i / ∑_{a=1}^{K} avgCD_a
…        …
GK-1     avgCD_{K-1} / ∑_{a=1}^{K} avgCD_a
GK       avgCD_K / ∑_{a=1}^{K} avgCD_a

Table 4 shows the proportion of the average cash dividend of each group, which is used as the probability that the group is selected into the stock portfolio. For instance, if avgCD1 is larger than avgCD2, G1 is more likely than G2 to be selected into the stock portfolio. In other words, the larger the average cash dividend of a group, the larger the probability that a stock is chosen from that group to form the stock portfolio. In this way, the quality of the initial population can be improved.
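The proportional selection described above can be sketched as follows. The function names and the avgCD values are hypothetical, and drawing with replacement through `random.choices` is an assumption of this sketch rather than a detail stated in the text.

```python
import random


def group_proportions(avg_cd):
    """Share of each group's average cash dividend in the total over all groups."""
    total = sum(avg_cd)
    return [value / total for value in avg_cd]


def pick_groups(avg_cd, num_picks, rng=random):
    """Draw group indices with probability proportional to avgCD (roulette wheel).

    Groups are drawn with replacement, so the same group may appear twice.
    """
    weights = group_proportions(avg_cd)
    return rng.choices(range(len(avg_cd)), weights=weights, k=num_picks)


avg_cd = [4.0, 2.0, 1.0, 1.0]  # hypothetical avgCD values for K = 4 groups
print(group_proportions(avg_cd))
print(pick_groups(avg_cd, num_picks=3, rng=random.Random(0)))
```

With these illustrative values the first group holds half of the total, so it is expected to be drawn about half of the time.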

3.2.3 Fitness Evaluation

In this section, we introduce the fitness functions used in this study. The fitness functions are used to determine the quality of a chromosome. In terms of the laws of nature, they correspond to the conditions of natural selection. It is very important to define an appropriate fitness function because it affects the results of the entire algorithm. Since the goal of this study is to mine the diverse group stock portfolio, the fitness function is used to score each chromosome and decide whether to retain it. In this paper, the fitness functions are designed based on the risk of investor sentiment, portfolio satisfaction, group balance, price balance, unit balance and diversity factor. The following describes these six fitness functions.

The risk of investor sentiment (RIS) is used to assess the risk of trading a stock with the investor sentiment strategy. Applying the investor sentiment index yields trading signals that decide the trading timing, and in general these signals produce many transactions. Although the trading strategy in this study uses only one transaction per stock to determine the trading timing, all the differences of trading price (DTP) in the transaction records produced by the trading signals are needed to calculate RIS. The calculation has three main steps. The first step is to find the minimal difference of trading price (minDTPi) of each stock, where i indexes the stocks. The second step is to find the maximum minDTPi, named MAXminDTP, and the minimum minDTPi, named MINminDTP, over all stocks. The third step is to normalize minDTPi to the range [0, 1], which gives nomalDTP(si) for each stock by formula (6):

$$\mathrm{nomalDTP}(s_i) = \frac{\mathrm{minDTP}_i - \mathrm{MINminDTP}}{\mathrm{MAXminDTP} - \mathrm{MINminDTP}} \qquad (6)$$

Then, subRIS(SPp) of each stock portfolio SPp is calculated by formula (7). Finally, RIS(Cq) is calculated by formula (8), where NC is the number of stock portfolios generated from chromosome Cq.

Take an example to explain this method. Assume there are five companies {company A, company B, company C, company D, company E}, and the transaction date is from 2011/1/3 to 2011/1/28. The trading information of company A is shown in Table 5.

Table 5. The price and trading signals.

Trading Date    Price    Trading Signal
2011/1/3        272.5
2011/1/4        274      Buy
2011/1/5        261
2011/1/6        258
2011/1/7        267.5    Sell
2011/1/10       267
2011/1/11       265
2011/1/12       262      Buy
2011/1/13       260
2011/1/14       267
2011/1/17       260
2011/1/18       262.5
2011/1/19       264      Sell
2011/1/20       263
2011/1/21       258
2011/1/24       258
2011/1/25       262
2011/1/26       263      Buy
2011/1/27       259
2011/1/28       262      Sell

In Table 5, the price is the closing price on each trading date, and the trading signals are produced by the trading strategy described in section 4.1. There are three transactions from 2011/1/3 to 2011/1/28. The first DTP is -6.5 (267.5 - 274): investors buy this stock on 2011/1/4, sell it on 2011/1/7 and lose 6.5 in this transaction. In the same way, the DTPs of the three transactions are -6.5, 2 and -1, so minDTPA is -6.5. The minDTP of each of the other four companies is obtained in the same way. The minDTP and nomalDTP of the five companies are shown in Table 6.

Table 6. The minDTP and the nomalDTP on five companies.

Stock        minDTP    nomalDTP
Company A    -6.5      0.72
Company B    -1.8      1
Company C    -10.3     0.49
Company D    -18.6     0
Company E    -9        0.57

In Table 6, MAXminDTP is -1.8 and MINminDTP is -18.6. Finally, the subRIS is calculated by formula (7). Company B has the largest nomalDTP of all companies, which means that its minimal difference of trading price (-1.8) is the largest among all companies. In other words, Company B has the highest return under the investor sentiment strategy. Under the same conditions, a chromosome that selects Company B into the stock portfolio therefore has higher quality than a chromosome that does not.
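The worked example can be reproduced with the sketch below, which pairs each buying signal with the next selling signal in Table 5 and then applies min-max normalization to the minDTP values of Table 6. The function names are hypothetical, and the minDTP values of companies B to E are taken directly from Table 6 because their price series are not given.

```python
# (closing price, trading signal) rows for company A, copied from Table 5.
table5 = [
    (272.5, ""), (274, "Buy"), (261, ""), (258, ""), (267.5, "Sell"),
    (267, ""), (265, ""), (262, "Buy"), (260, ""), (267, ""),
    (260, ""), (262.5, ""), (264, "Sell"), (263, ""), (258, ""),
    (258, ""), (262, ""), (263, "Buy"), (259, ""), (262, "Sell"),
]


def trading_price_differences(rows):
    """Pair each Buy with the next Sell and return sell price minus buy price."""
    dtps, buy_price = [], None
    for price, signal in rows:
        if signal == "Buy" and buy_price is None:
            buy_price = price
        elif signal == "Sell" and buy_price is not None:
            dtps.append(price - buy_price)
            buy_price = None
    return dtps


def normalize_min_dtp(min_dtp):
    """Min-max normalize minDTP values to [0, 1]; assumes they are not all equal."""
    low, high = min(min_dtp.values()), max(min_dtp.values())
    return {name: (value - low) / (high - low) for name, value in min_dtp.items()}


dtps_a = trading_price_differences(table5)
min_dtp = {"A": min(dtps_a), "B": -1.8, "C": -10.3, "D": -18.6, "E": -9}
for name, value in normalize_min_dtp(min_dtp).items():
    print(name, min_dtp[name], round(value, 2))
```

Rounded to two decimals, the normalized values agree with the nomalDTP column of Table 6.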

The portfolio satisfaction is used to assess the profit of a chromosome and how well it meets the requests given by users. When a chromosome has high portfolio satisfaction, the stock portfolios generated from it can earn high returns and satisfy the objective and subjective criteria given by users. Formally, the portfolio satisfaction of a chromosome Cq is defined by formula (9), where NC is the number of stock portfolios generated from chromosome Cq, and subPS(SPp) is the portfolio satisfaction of the p-th stock portfolio, which is given by formula (10). In formula (10), ROI(SPp) is the profit of the stock portfolio SPp, which is calculated by formula (11).

When investors buy a stock, they have to pay the buying handling fee to the securities company. When investors sell a stock, they have to pay the selling handling fee and the securities transaction tax. These costs are represented by the term Tax⁽ⁱ⁾.

The suitability(SPp) is used to evaluate whether the stock portfolio meets the subjective criteria given by investors. It includes the investment capital penalty (ICP) and the portfolio penalty (PP). The ICP(SP) measures the satisfaction degree of the investment in a portfolio relative to the predefined maximum investment. The PP(SP) measures the satisfaction degree of the number of purchased stocks in a portfolio relative to the predefined maximum number of purchased stocks. The suitability(SP) is then defined from ICP(SP) and PP(SP) by formula (15).

The group balance is used to make the numbers of stocks in the groups as similar as possible. For this purpose, the concept of entropy is used in this fitness function. The group balance of a chromosome Cq is defined by formula (16), where |Gi| represents the number of stocks in the i-th group. Hence, a chromosome with a large group balance value has groups with similar numbers of stocks.
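To illustrate why an entropy-based measure rewards similar group sizes, the sketch below computes the Shannon entropy of the group-size distribution. This is an assumption about the form of the group balance: the text only states that the concept of entropy is used, so the exact expression, the natural logarithm and the function name are illustrative choices.

```python
import math


def group_balance(group_sizes):
    """Shannon entropy of the group-size distribution.

    Assumes every group is non-empty, as required of a chromosome.
    """
    n = sum(group_sizes)
    return -sum((size / n) * math.log(size / n) for size in group_sizes)


print(group_balance([5, 5, 5]))   # evenly sized groups
print(group_balance([13, 1, 1]))  # one dominant group
```

Under this assumed form, the value is largest when all K groups have the same size, where it equals log K, and it shrinks as one group dominates.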

The unit balance is designed to avoid generating a purchased unit ui that is too large or too small for a group, and to make sure that the purchased units of the groups stay within the predefined range [minPurchasedUnit, maxPurchasedUnit]. If UB(Cq) is 1, the purchased units of all groups are in the predefined range. If UB(Cq) is, for example, 1.15, some purchased units are not in the predefined range.

The price balance is designed to make the prices of the stocks in the same group as similar as possible. Its definition uses the following quantities: Secj is a stock price section defined by the user, |Secj| is the number of stocks in the j-th section, and |Gi| is the number of stocks in group Gi.

The diversity factor is used to increase the diversity of stocks in the same group. It is defined by formula (19). The Diq in that formula is calculated from dissMatrix(sh, st), where sh and st are two stocks in the same group Gi and dissMatrix(sh, st) is the difference between them. In dissMatrix(sh, st), dl assesses the difference of one attribute between the two stocks, and m is the number of attributes used to calculate the diversity of the GSP. In the proposed approach, two attributes are used: industry category ah1 and company capital ah2. The distance d1 of two stocks is one when ah1 is not equal to at1, which means that sh and st are in different categories. Otherwise, d1 is zero.

The distance d2 depends on the capital difference of stocks sh and st. If the capital difference is between 0 and 10 billion, the score is zero. If it is between 10 and 20 billion, the score is 1/3. If it is between 20 and 40 billion, the score is 2/3. Otherwise, the score is 1.
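The two attribute distances described above can be sketched as follows. The function names are hypothetical, capital is expressed in billions, and the handling of the interval boundaries (10, 20 and 40 belong to the higher bracket) is an assumption, since the text does not say which side is inclusive.

```python
def industry_distance(category_h, category_t):
    """d1: 1 when the two stocks are in different industry categories, else 0."""
    return 0 if category_h == category_t else 1


def capital_distance(capital_h, capital_t):
    """d2: score based on the capital difference of two stocks, in billions.

    Boundary handling is an assumption: each bracket includes its lower bound.
    """
    difference = abs(capital_h - capital_t)
    if difference < 10:
        return 0.0
    if difference < 20:
        return 1 / 3
    if difference < 40:
        return 2 / 3
    return 1.0


print(industry_distance("Semiconductor", "Plastics"))  # different categories
print(capital_distance(5, 20))                         # difference of 15 billion
```

How d1 and d2 are combined into dissMatrix(sh, st) is left out of the sketch because that formula is not available here.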

With the above definitions, the final fitness function is defined by formula (23), where α and β are parameters used to reflect the influence of these factors.

3.2.4 Genetic Operation

In this section, the genetic operations are described, including the crossover, mutation and inversion operations. The crossover operations are designed for the grouping part and the stock portfolio part, the mutation operations are designed for the stock part and the stock portfolio part, and the inversion operation is designed for the grouping part.

3.2.4.1 Crossover

The crossover operation has two phases: the first phase is on the grouping part and the second phase is on the stock portfolio part. In the grouping part, a chromosome CA is randomly selected as the base chromosome, and some groups from another chromosome CB are inserted into it. Duplicate stocks are then deleted from the new chromosome C^new. Formally, with CA as the base chromosome and CB as the inserted chromosome, p is the point of insertion, s is the starting group in the inserted segment, and m is the number of inserted segments. The parameters p, s, and m are generated randomly, and they determine the new chromosome C^new.

After the elimination of duplicate stocks, there are three possible situations: Case (1): |C^new| < K, Case (2): |C^new| = K, and Case (3): |C^new| > K, where |C^new| is the number of non-empty groups in the new chromosome. Case (2) satisfies the definition of a chromosome, and the other two need to be adjusted. The adjustments for Cases (1) and (3) are stated in the following.

Case (1): |C^new| < K

In this case, new groups should be added to satisfy the constraint. Therefore, a group Gi is randomly selected and split into two subgroups. This process is repeated until the grouping part has K non-empty groups.

Case (3): |C^new| > K

In this case, some groups should be removed. The roulette wheel selection strategy is used to select the groups to remove: the fewer stocks a group has, the larger its probability of being removed. When a group is picked for removal, the stocks inside it are reassigned to other groups randomly.

In the stock portfolio part, a one-point crossover is used in the proposed approach. The one-point crossover operator generates two new offspring from their parents with a random crossover location d. Of course, more sophisticated crossover operations could be utilized.
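A one-point crossover on the stock portfolio part can be sketched as below. The function name is hypothetical, and the flat list encoding, in which a selection value b and a purchased unit u alternate for each group, is inferred from the later description of odd and even genes rather than stated here.

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length gene lists at a random location d."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length of at least 2")
    d = rng.randrange(1, len(parent_a))  # crossover location, 1 <= d < length
    child_a = parent_a[:d] + parent_b[d:]
    child_b = parent_b[:d] + parent_a[d:]
    return child_a, child_b


# Hypothetical stock portfolio parts for K = 2 groups: [b1, u1, b2, u2].
parent_a = [0.9, 3, 0.2, 1]
parent_b = [0.1, 5, 0.7, 2]
print(one_point_crossover(parent_a, parent_b, rng=random.Random(1)))
```

Each offspring keeps the head of one parent and the tail of the other, so every gene position still holds a value that came from one of the two parents.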

3.2.4.2 Mutation and Inversion

A two-phase mutation operation is designed in this section. The first phase of the mutation operation is for the stock part and the second phase is for the stock portfolio part. In the first phase, two groups are selected randomly, where the number of stocks in each group is larger than 1. Then, the mutation operator reassigns a stock to another group. In the second phase, a one-point mutation is used: one gene is selected to mutate, and there are two cases. In the first case, an odd gene in the stock portfolio part is selected, and its value is changed from [0.5, 1] to [0, 0.5] or from [0, 0.5] to [0.5, 1]. In the second case, an even gene in the stock portfolio part is selected, and a value randomly generated from the range [1, maxUnit] replaces the original one.
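The second mutation phase can be sketched as follows. The function name and the 1-based gene position are illustrative choices, and treating purchased units as integers is an assumption of this sketch.

```python
import random


def mutate_portfolio_gene(genes, position, max_unit, rng=random):
    """Mutate one gene of the stock portfolio part and return a new list.

    position is 1-based: odd genes hold the selection value b in [0, 1],
    even genes hold the number of purchased units.
    """
    mutated = list(genes)
    index = position - 1
    if position % 2 == 1:
        # Odd gene: move the value to the other side of 0.5.
        if mutated[index] >= 0.5:
            mutated[index] = rng.uniform(0.0, 0.5)
        else:
            mutated[index] = rng.uniform(0.5, 1.0)
    else:
        # Even gene: draw a new number of purchased units from [1, max_unit].
        mutated[index] = rng.randint(1, max_unit)
    return mutated


genes = [0.9, 3, 0.2, 1]  # hypothetical [b1, u1, b2, u2]
print(mutate_portfolio_gene(genes, position=1, max_unit=10, rng=random.Random(0)))
print(mutate_portfolio_gene(genes, position=2, max_unit=10, rng=random.Random(0)))
```

Mutating an odd gene therefore toggles whether the corresponding group is selected, while mutating an even gene only changes how many units are bought.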

The inversion operator enables the crossover operator to generate various combinations of groups to exchange between two parents. In the proposed approach, the rearrangement of groups is done randomly for this purpose.

3.3 The Proposed Algorithm

Input Data:

A set of stocks S = {si | 1 ≤ i ≤ n}, which includes the following information for every trading day t: the stock prices SPt, the balance of margin loan BMLt, the balance of stock loan BSLt, the trading volume Vt, the buying volume of institutional investors BIt, the selling volume of institutional investors SIt, and the proportion of day-trades DTt, together with the cash dividends of stocks CD, the industry category of stocks C and the variations of cash dividends varCD.

Parameter Setting:

A predefined maximum number of purchased stocks in a portfolio numCom, a predefined maximum investment capital maxInves, a predefined maximum number of purchased units of a stock maxUnit, a number of groups K, a population size pSize, a crossover rate pc, a mutation rate pm, an inversion rate pI, a number of generations numGene and a number of minimum holding stock days H.

Output:

A diverse group stock portfolio DGSP = {Gi | 1 ≤ i ≤ K}.

STEP 1: Generate the trading signal on every stock using the following sub-steps:

Sub-step 1.1: Calculate BSIt on every trading day using formulas (1) to (4).

Sub-step 1.2: Calculate the high BSI threshold and the low BSI threshold using the following sub-steps:

Sub-step 1.2.1: Rank all BSIt, and get the PR value of every BSIt.

Sub-step 1.2.2: Take the PR90 of BSIt as the high BSI threshold and the PR10 of BSIt as the low BSI threshold.

Sub-step 1.3: Calculate the difference between DTt-1 and DTt on every trading day.

Sub-step 1.4: Generate the original trading signal on every trading day. The trading flag F = {F | 0, 1, -1} is used to control three possible cases. Note that the initial value of F is 0. The three cases are stated as follows.

Case F=0 and F=1: Inspect whether the following conditions are met on the trading day:

BSIt is less than the low BSI threshold, the change of DTt > 0, and H > 3.

If they are met, generate a buying signal on that trading day, set F to -1 and reset H to 0. Otherwise, add one to H.

Case F=-1: Inspect whether the following conditions are met on the trading day:

BSIt is more than the high BSI threshold, the change of DTt < 0, and H > 3.

If they are met, generate a selling signal on that trading day, set F to 1 and reset H to 0. Otherwise, add one to H.

STEP 2: Record prices of the trading day with first buying signal and trading day with last selling signal on every stock.

STEP 3: Generate initial population with pSize using the following sub-steps:

Sub-step 3.1: Randomly generate the grouping part with K groups. Note that the constraints for the grouping part, G1 ∪ G2 ∪ … ∪ GK = S, Gi ≠ ∅, and Gi ∩ Gj = ∅ for all i ≠ j, should be satisfied.

Sub-step 3.2: Calculate the average cash dividend of each group Gi according to the cash dividend yh of each stock using the following formula:

$$\mathrm{avgCD}(G_i) = \frac{1}{|G_i|} \sum_{s_h \in G_i} y_h$$

where avgCD(Gi) is the average cash dividend of group Gi and |Gi| is the number of stocks in Gi.

Sub-step 3.3: Calculate the proportion proportionAvgCD(Gi) of the average cash dividend of each group to the sum over all groups, as in Table 4.

Sub-step 3.4: Randomly generate numCom values from the range [0, 1] and collect them in a set R = {ri | 1 ≤ i ≤ numCom}.

Sub-step 3.5: For each element in R, if the random value ri is between proportionAvgCD(Gi-1) and proportionAvgCD(Gi), then group Gi is put into the candidate portfolio.

Sub-step 3.6: Generate the stock portfolio according to the candidate portfolio. For each selected group Gi, set its bi in the chromosome to a value larger than 0.5. Otherwise, set it to a value less than 0.5. Randomly generate the corresponding number of purchased units of each group from the range [0, maxUnit].

Sub-step 3.7: If pSize chromosomes have been generated, go to the next step. Otherwise, go to Sub-step 3.1.

STEP 4: Calculate fitness value of each chromosome using the following sub-steps:

Sub-step 4.1: Calculate the risk of investor sentiment (RIS) of each chromosome Cq using the following sub-steps:

Sub-step 4.1.1: Generate the possible stock portfolios using the grouping part represented in each chromosome Cq. All of them are collected in a set S = {SPi | 1 ≤ i ≤ |G1|*|G2|*...*|GK|}.

Sub-step 4.1.2: Get the minimal difference of trading price minDTP(si) of stock si from all transactions decided by the trading signals.

Sub-step 4.1.3: Rank all minDTP(si) and take the maximum as MAXminDTP and the minimum as MINminDTP.

Sub-step 4.1.4: Calculate the nomalDTP(si) of each stock using formula (6).

Sub-step 4.1.5: Calculate the subRIS of the stock portfolio SPi using formula (7).

Sub-step 4.1.6: Repeat Sub-steps 4.1.1 to 4.1.5 to calculate the subRIS of all stock portfolios.

Sub-step 4.1.7: Set the RIS of each chromosome Cq using formula (8).

Sub-step 4.2: Calculate portfolio satisfaction of each chromosome Cq using following sub-steps:

Sub-step 4.2.1: Calculate ROI of each stock portfolio SPi using the formula (11).

Sub-step 4.2.2: Calculate suitability of each stock portfolio SPi using the formula (15).

Sub-step 4.2.3: Set the portfolio satisfaction of the stock portfolio SPi using the formula (10).

Sub-step 4.2.4: Repeat Sub-steps 4.2.1 to 4.2.3 to calculate the portfolio satisfaction of all stock portfolios.

Sub-step 4.2.5: Set portfolio satisfaction of each chromosome Cq using formula (9).

Sub-step 4.3: Calculate the group balance of each chromosome using formula (16).

Sub-step 4.4: Calculate the diversity factor of each chromosome using formula (19).

Sub-step 4.5: Set fitness value of each chromosome Cq using formula (23).

STEP 5: Execute the selection operation on the population to form the next population. Here, the elitist or roulette wheel selection strategies can be used. In this paper, the elitist selection strategy is utilized for generating the next population.

STEP 6: Execute the crossover operation on the population.

STEP 7: Execute the mutation operation on the population.

STEP 8: Execute the inversion operation on the population.

STEP 9: If the stop criterion is satisfied, go to the next step. Otherwise, go to STEP 4.

STEP 10: Output the best diverse GSP with Investor Sentiment Index.
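The flag-controlled signal generation in STEP 1 can be sketched as a small state machine. The function and parameter names are hypothetical. The sketch assumes that the thresholds and the day-trade changes are already computed, that the counter H starts at 0, and that the fixed value 3 from the conditions is exposed as `min_gap`.

```python
def generate_signals(bsi, dt_change, low_threshold, high_threshold, min_gap=3):
    """Return one of "Buy", "Sell" or "" for each trading day.

    bsi[t] is the BSI value and dt_change[t] the change of the day-trade
    proportion on day t. flag mirrors F (0 or 1: look for a buying signal,
    -1: look for a selling signal) and h mirrors the counter H.
    """
    flag, h = 0, 0
    signals = []
    for bsi_t, change_t in zip(bsi, dt_change):
        signal = ""
        if flag in (0, 1):
            if bsi_t < low_threshold and change_t > 0 and h > min_gap:
                signal, flag, h = "Buy", -1, 0
            else:
                h += 1
        else:
            if bsi_t > high_threshold and change_t < 0 and h > min_gap:
                signal, flag, h = "Sell", 1, 0
            else:
                h += 1
        signals.append(signal)
    return signals


# Hypothetical series: five low-sentiment days followed by five high-sentiment days.
bsi = [5] * 5 + [95] * 5
dt_change = [1] * 5 + [-1] * 5
print(generate_signals(bsi, dt_change, low_threshold=10, high_threshold=90))
```

In this illustrative run the counter has to exceed 3 before a signal can fire, so the buying signal appears on the fifth day and the selling signal on the tenth.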

3.4 An Example

In this section, we use an example to illustrate the proposed algorithm to mine a diverse group stock portfolio with investor sentiment index. Assume there are fifteen companies. The trading data of one of the fifteen companies s1 is shown in Table 2. For the convenience of the example, we define the Low BSI threshold as PR20(BSIt) and the High BSI threshold as
