2. DATA
where $P_{i,t}$ is the close price of index $i$ on day $t$.
In addition, we use weekly search volume when daily search volume is not available for an index. We therefore also define the weekly realized volatility, $RV^{w}_{i,t}$, and the weekly return, $r^{w}_{i,t}$:
$$RV^{w}_{i,t} = \sum_{k=0}^{4} RV_{i,t-k}, \tag{3}$$
$$r^{w}_{i,t} = \log\frac{P_{i,t}}{P_{i,t-5}}, \tag{4}$$
where $t$ is Friday, $t-4$ is Monday and $t-5$ is the previous Friday.
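The weekly aggregation can be sketched as follows. This is only an illustration: it reads Eq. (3) as the sum of the five daily realized volatility values from Monday to Friday, assumes every week has exactly five trading days, and uses hypothetical function names and sample numbers.

```python
import math

def weekly_realized_volatility(daily_rv, t):
    """Sum the daily realized volatility from Monday (t-4) to Friday (t)."""
    if t < 4:
        raise ValueError("need five trading days ending at index t")
    return sum(daily_rv[t - 4 : t + 1])

def weekly_log_return(close, t):
    """Log return from the previous Friday's close (t-5) to this Friday's close (t)."""
    if t < 5:
        raise ValueError("need the previous Friday's close at index t-5")
    return math.log(close[t] / close[t - 5])

# Hypothetical data: index 0 is the previous Friday, indices 1..5 are Monday..Friday.
close = [100.0, 101.0, 100.5, 102.0, 101.5, 103.0]
daily_rv = [0.0, 0.0004, 0.0001, 0.0009, 0.0001, 0.0004]

t = 5  # this Friday
rv_w = weekly_realized_volatility(daily_rv, t)
r_w = weekly_log_return(close, t)
```

Weeks shortened by holidays would need a calendar-based window instead of the fixed offsets used here.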
2.2 Internet search volume
For internet search volume, we use the search engine with the highest market share according to StatCounter Global Stats, which reports search engine market shares both globally and by country. Globally, Google held about 90.7% of the search engine market in 2011.
Table 2 presents the top two search engines and their market shares in each country in 2011.
Panels A, B and C cover developed, emerging and frontier markets respectively. The table shows that Google has the highest market share in every country except China, where Baidu is the largest search engine, and South Korea, where Naver is the most popular. Google's market share exceeds 70% in all other countries except Hong Kong, Russia and Taiwan, where it lies between 53.37% and 59.60%. Although Google is not the leader in China and South Korea, it still ranks second there, with market shares of 30.73% and 34.16% respectively.
In the end, we use the same search engine, Google, to measure the local attention of individual investors in all countries. There are two main reasons. The first is the problem of language,
Table 2
Top two search engines in 2011
This table presents the top two search engines and their market shares in each country in 2011. The data source is StatCounter Global Stats (http://gs.statcounter.com/). Panels A, B and C cover developed, emerging and frontier markets respectively. Countries whose top search engine has a market share below 70% or is not Google are indicated by bold numbers.
Panel A: Developed Markets
Country          Top 1 engine   Market share   Top 2 engine   Market share
Australia        Google         94.11%         bing           3.92%
Austria          Google         96.91%         bing           1.98%
Belgium          Google         98.08%         bing           0.83%
Canada           Google         91.83%         bing           4.79%
Denmark          Google         96.57%         bing           2.62%
Finland          Google         97.90%         bing           1.68%
France           Google         94.89%         bing           2.99%
Germany          Google         95.73%         bing           1.99%
Greece           Google         97.63%         bing           1.56%
Hong Kong        Google         59.60%         Yahoo!         39.35%
Ireland          Google         94.60%         bing           2.71%
Israel           Google         97.17%         bing           1.87%
Italy            Google         96.76%         Yahoo!         1.07%
Japan            Google         70.85%         Yahoo!         26.65%
Netherlands      Google         94.61%         StartPagina    2.54%
Norway           Google         93.77%         bing           3.00%
Portugal         Google         96.98%         bing           1.95%
Singapore        Google         85.91%         Yahoo!         11.12%
Spain            Google         96.48%         bing           2.28%
Sweden           Google         96.80%         bing           2.41%
Switzerland      Google         96.35%         bing           2.28%
United Kingdom   Google         91.78%         bing           4.40%
USA              Google         79.71%         Yahoo!         9.57%
Panel B: Emerging Markets
Country          Top 1 engine   Market share   Top 2 engine   Market share
Chile            Google         97.38%         bing           1.96%
China            Baidu          65.51%         Google         30.73%
Colombia         Google         96.63%         bing           2.61%
Hungary          Google         98.49%         bing           0.90%
India            Google         97.53%         Yahoo!         1.18%
Indonesia        Google         95.91%         Yahoo!         2.18%
South Korea      Naver          55.72%         Google         34.16%
Malaysia         Google         86.21%         Yahoo!         9.74%
Mexico           Google         92.75%         bing           5.20%
Peru             Google         97.80%         bing           1.54%
Philippines      Google         85.92%         Yahoo!         11.73%
Russia           Google         54.99%         YANDEX RU      43.02%
South Africa     Google         94.28%         bing           3.69%
Taiwan           Google         53.37%         Yahoo!         45.43%
Thailand         Google         99.21%         bing           0.58%
Turkey           Google         98.79%         bing           1.03%
Panel C: Frontier Markets
Country          Top 1 engine   Market share   Top 2 engine   Market share
Argentina        Google         95.47%         bing           2.80%
Bulgaria         Google         98.56%         bing           0.79%
Croatia          Google         98.46%         bing           0.89%
Kazakhstan       Google         80.12%         YANDEX RU      17.77%
Lebanon          Google         94.58%         bing           2.81%
Nigeria          Google         88.96%         Yahoo!         4.94%
Pakistan         Google         94.67%         Yahoo!         2.46%
Romania          Google         97.62%         Yahoo!         1.13%
since a local engine such as Naver, the top search engine in South Korea, is available only in Korean, with no English version. The second is that some search engines, such as Yahoo!, do not provide detailed search volume data for download.
Google makes a search volume index for a search term, rather than the actual total number of searches, publicly available through Google Trends.3 The index is based on a portion of Google web searches and measures how many searches have been done for the terms entered, relative to the total number of searches done on Google over time. On this website, we can view a graph of the search volume index and download the search volume data globally or for a specific region, country or city, and for different periods. We can also compare the search volumes of up to five search terms by entering them separated by commas.
After signing in to a Google Account, we can download the data in two differently scaled modes, relative and fixed. In relative mode, the data are scaled to the average search traffic for the search term (represented as 1.0) during the selected time period, whereas in fixed mode the data are scaled to the average traffic during a fixed point in time (usually January 2004). Since the scaling basis does not change over time in fixed mode, values from different time periods can be compared. Therefore, we download all search volume data with fixed scaling.
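The difference between the two scaling modes can be illustrated with a small sketch. It assumes that scaling simply divides each value by the average of a reference period. The raw numbers, the function name `scale` and the use of the first two periods as the fixed base are hypothetical and only stand in for Google's actual procedure.

```python
def scale(series, reference):
    """Divide every value by the mean of the reference values."""
    base = sum(reference) / len(reference)
    return [x / base for x in series]

# Hypothetical raw search traffic for six consecutive periods.
raw = [10.0, 10.0, 20.0, 20.0, 40.0, 40.0]
fixed_reference = raw[:2]  # stands in for the fixed base period

# Two overlapping download windows (periods 0-3 and periods 2-5).
window_a = raw[0:4]
window_b = raw[2:6]

# Relative mode: each window is scaled by its own average.
rel_a = scale(window_a, window_a)
rel_b = scale(window_b, window_b)

# Fixed mode: both windows are scaled by the same base period.
fix_a = scale(window_a, fixed_reference)
fix_b = scale(window_b, fixed_reference)
```

Periods 2 and 3 appear in both windows. Under fixed scaling they receive the same values in `fix_a` and `fix_b`, whereas under relative scaling their values differ between `rel_a` and `rel_b`, which is why only the fixed mode allows downloads from different periods to be combined.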
Search volume data go back to January 2004. We would like to download search volume at the highest available frequency, which is daily. However, daily search volume may have many missing values, or the volume may even be too low for a graph to be shown and the data to be downloaded. For indices with these problems, we use weekly search volume instead. We consider only the trading days of the stock markets in order to match search volumes to the respective time series of volatility.
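A minimal sketch of this matching step is given below. It assumes the search volume is held in a dictionary keyed by calendar date and that the trading days are known. The 10% cut-off for switching to weekly data is purely illustrative and is not taken from the text, as are all names and numbers.

```python
from datetime import date

def align_to_trading_days(search_volume, trading_days):
    """Keep search volume only on trading days; None marks a missing value."""
    return [(d, search_volume.get(d)) for d in trading_days]

def missing_share(aligned):
    """Fraction of trading days without any search volume."""
    missing = sum(1 for _, value in aligned if value is None)
    return missing / len(aligned)

# Hypothetical daily search volume with one gap and one weekend value.
search_volume = {
    date(2011, 1, 3): 1.2,  # Monday
    date(2011, 1, 4): 0.9,
    date(2011, 1, 6): 1.1,  # Wednesday, 5 January, is missing
    date(2011, 1, 7): 1.4,
    date(2011, 1, 8): 0.3,  # Saturday, not a trading day
}
trading_days = [date(2011, 1, day) for day in (3, 4, 5, 6, 7)]

aligned = align_to_trading_days(search_volume, trading_days)

MAX_MISSING = 0.10  # assumed cut-off, not stated in the text
use_weekly = missing_share(aligned) > MAX_MISSING
```

Here one of five trading days is missing, so this hypothetical index would fall back to weekly search volume.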
3 Source: http://www.google.com.tw/trends/.
A search engine user may search for a specific index using its name, its ticker symbols or even the short name of its stock exchange. Since stock indices often have many names and ticker symbols, choosing an appropriate search term for a stock index is a problem. We need to find the most widely used search term for each stock index. In general, individuals prefer the short name of the index. Take the leading index in Australia, the S&P/ASX 200, as an example. With its full name as the search term, there is not enough search volume to show a graph. The search volume of “ASX 200” has many missing values before 2011. Finally, we choose “ASX”, which has a correlation of about 0.84 with “ASX 200” and is searched far more often, as the keyword for downloading daily search volume data.4
For the USA, we use all three leading indices: the S&P 500, NASDAQ and DJIA. The question of which search term individuals use when looking for information about these indices is easy to answer, especially for NASDAQ, for which “NASDAQ” is used as the keyword. For the S&P 500, the term “S&P” is searched about 2.1 times as often as the term “S&P 500”, and the correlation between the two search terms is 0.84. For the DJIA, the search volumes of “DJIA” and “Dow Jones” amount to 15% and 46% respectively of that of “Dow”, and the pairwise correlations between these search volumes are remarkably high, all above 0.96. Therefore, in each case we choose the search term that is most preferred by retail investors.
However, for some countries we cannot find any search volume for the index using its names or ticker symbols. In these cases, we use the name or short name of the stock exchange where the index is traded. For example, “Bolsa de Madrid” is the Spanish name of the stock exchange of the leading index in Spain, the IGBM. We drop the countries for which we cannot find any search volume for the main index.
We then adjust the sample period for each country. We also remove the countries with fewer than 100 observations, such as Portugal, whose search volume data have too many missing values before 2011.
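The sample selection can be sketched as below. The sketch assumes that adjusting the sample period means starting each country's sample at its first available search volume value, which is an interpretation rather than something stated in the text. Only the threshold of 100 observations comes from the text. The function names and the two example series are hypothetical.

```python
MIN_OBS = 100  # minimum number of observations, as stated in the text

def trim_to_first_observation(sv):
    """Return (start_index, trimmed_series), dropping leading missing values."""
    for i, value in enumerate(sv):
        if value is not None:
            return i, sv[i:]
    return len(sv), []

def count_observations(sv):
    """Number of periods with a search volume value."""
    return sum(1 for value in sv if value is not None)

def keep_country(sv, min_obs=MIN_OBS):
    """Keep a country only if it has at least min_obs observations."""
    _, trimmed = trim_to_first_observation(sv)
    return count_observations(trimmed) >= min_obs

# Hypothetical countries: None marks periods without search volume.
short_sample = [None] * 60 + [1.0] * 90   # 90 observations, dropped
long_sample = [None] * 10 + [1.0] * 140   # 140 observations, kept

start_short, _ = trim_to_first_observation(short_sample)
start_long, _ = trim_to_first_observation(long_sample)
kept = {
    "short_sample": keep_country(short_sample),
    "long_sample": keep_country(long_sample),
}
```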
Table 3 lists the countries in our sample together with the name of the leading
4 Source: Google Correlate (http://www.google.com/trends/correlate/)
Table 3
List of countries in the sample with search term, start date and number of observations
This table lists the countries in our sample with the name of the leading index, the search term used to measure local attention, the start date of the sample period and the number of observations of realized volatility (search volume) in the sample. Country names in italic type indicate that the data are at weekly frequency, e.g. Austria. Search terms that are not the short name of the corresponding index are indicated in bold type. Panels A, B and C list developed, emerging and frontier markets respectively.
Panel A: Developed Markets
Country          Index                Search term          Start date   Obs.
Australia        S&P/ASX 200          ASX                  2005/7/7     1684
Austria          ATX                  ATX                  2008/9/19    180
Belgium          BEL 20               BEL                  2004/1/9     425
Canada           S&P/TSX COMPOSITE    TSX                  2005/7/19    1661
France           CAC 40               CAC                  2007/1/2     1325
Germany          DAX 30               DAX                  2006/1/2     1571
Hong Kong        HANG SENG            HANG                 2009/2/9     760
Italy            FTSE MIB             MIB                  2007/8/31    224
Japan            NIKKEI 225           NIKKEI               2005/11/1    1552
Netherlands      AEX                  AEX                  2007/1/2     1323
Singapore        STRAITS TIMES        STRAITS              2009/2/2     775
Spain            IGBM                 Bolsa de Madrid      2006/10/2    1380
Sweden           OMXS30               OMX                  2009/8/14    133
Switzerland      SMI                  SMI                  2007/11/9    225
United Kingdom   FTSE 100             FTSE                 2006/1/3     1558
USA              S&P 500              S&P                  2006/1/3     1551
USA              NASDAQ               NASDAQ               2005/1/3     1803
USA              DJIA                 DOW                  2005/1/3     1803
Panel B: Emerging Markets
Country          Index                Search term          Start date   Obs.
China            SSE A SHARE          A股                  2006/9/1     279
India            SENSEX               SENSEX               2007/10/1    1089
Malaysia         KLCI                 KLSE                 2010/8/2     385
Mexico           BOLSA                BOLSA                2005/1/3     1805
Peru             IGBL                 BVL                  2007/2/9     264
South Africa     FTSE/JSE ALL SHARE   JSE                  2007/1/12    268
Thailand         SET                  SET                  2007/10/2    1077
Turkey           ISE 100              IMKB                 2007/3/1     1239
Panel C: Frontier Markets
Country          Index                Search term          Start date   Obs.
Croatia          CROBEX               zagrebačka burza     2007/11/30   222
Pakistan         KSE 100              KSE                  2011/1/3     289
Romania          BET                  BVB                  2011/1/10    290
index, the search term used to search for information about the index and thus to measure local attention, the start date of the sample period and the number of observations of realized volatility (search volume) in the sample. Country names in italic type indicate that the data are at weekly frequency, e.g. Austria and China. In general, individual investors prefer to use the short name of the index when searching for information. Only in Malaysia do people prefer the ticker symbol: “KLSE” is one of the ticker symbols of the leading index, the KLCI, for whose short name we cannot find enough search volume. In six countries, Spain, Sweden, Peru, Turkey, Croatia and Romania, the (short) name of the stock exchange is used as the search term.
Panel A of Table 3 shows that 18 indices in our sample belong to developed markets. Weekly data are used for five of these countries: Austria, Belgium, Italy, Sweden and Switzerland. Belgium has the longest sample period, but only 425 observations because its data are at weekly frequency. Panel B of Table 3 shows that eight emerging countries are included in the sample, with weekly data used for China, Peru and South Africa. Panel C of Table 3 shows that the frontier markets in the sample comprise only Croatia, with weekly data, and two countries with daily data.