
Applications of Conditional Modeling and Multi-layer Representations in Image Analysis


Final Report

Lo-Bin Chang

November 6, 2013

During the two years of this project, we published two finance papers, one probability paper, and one machine learning paper, and we finished a draft on vision research. The following are the details of these papers:

(1) An Invariance for the Large-sample Empirical Distribution of Waiting Time between Successive Extremes, Lo-Bin Chang, Alok Goswami, Fushing Hsieh, Chii-Ruey Hwang, Bulletin of the Institute of Mathematics Academia Sinica (2013), Vol. 8, No. 1, 31-48.

The investigation was motivated by an empirical invariance found in our study of raw data consisting of the actual trade prices from the intraday transactions data (trades and quotes) of companies in the S&P 500 list from 1998 to 2007 and part of 2008 (see [3]).

The temporal evolution of the price of a stock is given by a stochastic process {S_t}. For example, the Black-Scholes model assumes that {S_t} follows a geometric Brownian motion with constant volatility and satisfies the stochastic differential equation

dS_t = r S_t dt + σ S_t dW_t,

where r and σ are positive constants and {W_t} is a standard Brownian motion. An immediate consequence of this model is that the 'returns' over disjoint time intervals of equal duration are independent and identically distributed normal random variables. More specifically, if S_1, S_2, . . . , S_{n+1} are the prices of a stock at a set of n + 1 equally spaced time points, then the random variables (S_2 − S_1)/S_1, (S_3 − S_2)/S_2, . . . , (S_{n+1} − S_n)/S_n are i.i.d. normal random variables.
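As a quick numerical illustration of this consequence, the following minimal numpy sketch simulates a geometric Brownian motion at equally spaced times and computes the successive returns (the drift, volatility, and time step are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
r, sigma, dt, n = 0.05, 0.2, 1.0 / 252, 10_000  # illustrative parameters

# Exact simulation of geometric Brownian motion at equally spaced times:
# S_{k+1} = S_k * exp((r - sigma^2/2) dt + sigma sqrt(dt) Z_k), Z_k i.i.d. N(0, 1)
z = rng.standard_normal(n)
log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
S = 100.0 * np.exp(np.concatenate(([0.0], np.cumsum(log_increments))))

# the returns (S_{k+1} - S_k) / S_k over disjoint intervals are i.i.d.
returns = np.diff(S) / S[:-1]
print(returns.mean(), returns.std())
```

The i.i.d. property across disjoint intervals is exact; for small dt the returns are approximately normal with standard deviation close to σ√dt.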

To allow for possible jumps in stock prices, a more general model is considered, in which the rate of return is assumed to follow a general Lévy process, resulting in the 'geometric Lévy process' model for the evolution of stock prices (see [8] for example). However, the returns over disjoint time intervals of equal duration still remain i.i.d. random variables (though not necessarily normal).

Suppose now that we have available a large set of data on the prices of a stock at equal intervals of time and calculate the returns. From this set of return values, say of size n, suppose we generate a sequence of 0s and 1s that identifies those return values which constitute the upper (or lower) ten percent of the set. More specifically, a return value is coded as 0 if it falls in the upper (or lower) ten percent of the set of return values and as 1 otherwise. This leads to an n-long sequence of 0s and 1s with (roughly) 0.1n 0s and the remaining (roughly) 0.9n 1s. If we now look at the lengths of the successive runs of 1s in this sequence, including the (possibly empty) runs before the first and after the last 0, how are they expected to behave?

We prove that if, for each n, the successive returns have an exchangeable joint distribution, as under the Black-Scholes model or the geometric Lévy process model, then, in the limit, these run lengths follow a common geometric distribution. Moreover, the run lengths are asymptotically independent: for any fixed k, the lengths of the first k runs of 1s are asymptotically distributed as k i.i.d. geometric random variables (Theorem 2.1). Next, we consider the probability histogram generated by the lengths of all the runs of 1s. This is a histogram of a probability on the non-negative integers. We prove a strong law of large numbers (Theorem 2.2) which implies that, under the same hypothesis as in the earlier theorem, this (sample) histogram converges, with probability one, to that of a geometric distribution. The convergence holds, with probability one, also in total variation norm, in Kolmogorov norm, and uniformly (Corollary 2.1). Using results from Chen ([4]), we also obtain an associated central limit theorem (Theorem 2.3) for the (sample) probability histogram.
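The coding scheme and the run-length histogram are easy to simulate; here is a minimal numpy sketch (sample size and percentile width are illustrative) that codes i.i.d. returns and compares the resulting run-length histogram with a geometric distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100_000, 0.1                       # sample size; width of the upper percentile

# i.i.d. standard normal "returns" (in particular, exchangeable)
returns = rng.standard_normal(n)
threshold = np.quantile(returns, 1 - p)
bits = (returns < threshold).astype(int)  # 0 = extreme return, 1 = ordinary return

# lengths of the successive runs of 1s, including the (possibly empty) boundary runs
run_lengths = np.array([len(r) for r in "".join(map(str, bits)).split("0")])

# empirical histogram vs. the geometric distribution P(L = j) = (1 - p)^j p
empirical = np.bincount(run_lengths, minlength=30)[:30] / len(run_lengths)
geometric = (1 - p) ** np.arange(30) * p
print(np.abs(empirical - geometric).max())
```

With roughly 0.1n 0s, the number of 1-runs is (number of 0s) + 1, and the empirical histogram tracks the geometric histogram closely, as Theorem 2.2 predicts.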
It may be worthwhile to point out in this connection that, by applying a result of Chen ([4], Theorem 2), one may obtain a limit theorem for our probability histogram. However, such a limit theorem is substantially weaker than our Theorem 2.2, in the sense that Chen's result gives convergence only in probability (which is easy in our setup) and assumes additional conditions.

Another point needs mention here. Under our hypothesis that the returns (over successive equal lengths of time) are exchangeable, it is obvious that the lengths of successive runs of 1s, as described above, will also have an exchangeable joint distribution. However, they are not independent, even if the successive returns are assumed to be so. Our Theorem 2.2 shows that the finite-sample empirical distribution converges almost surely to an appropriate geometric distribution, as if it were the empirical distribution based on a finite i.i.d. sample from the geometric distribution. While our Theorem 2.1 shows that the successive run lengths are indeed asymptotically i.i.d. geometric, the convergence of the empirical distribution to the geometric distribution is not automatic by any means. Further, the de Finetti decomposition of exchangeable probabilities on infinite product spaces as mixtures of i.i.d. probabilities does not apply here, because, for each n, we only have a finite-dimensional exchangeable distribution. A natural option would be to try to apply the ideas of Diaconis-Freedman for finite exchangeable probabilities (see [7]), but our attempts along this line did not prove fruitful for the strong law.

Although we do not present simulation results in this paper, some remarks can be made based on our simulations.

We prove a strong law of large numbers with the limit being an invariance, a geometric distribution. The associated central limit theorem (Theorem 2.3) easily leads to a Donsker-type theorem (Theorem 2.4) for the empirical distribution function on D[0,∞). This, of course, characterizes the limiting distribution of the corresponding Kolmogorov-Smirnov statistic. As expected, this limiting distribution is not the usual Kolmogorov distribution, as was already evidenced by our simulations.

As mentioned in the first paragraph, our raw data consisted of the actual trade prices from the intraday transactions data (trades and quotes) of companies in the S&P 500 list from 1998 to 2007 and part of 2008. The return process was analyzed at five-minute, one-minute, and 30-second intervals for a whole year (see [3]). We repeated the procedure described in the previous paragraphs and considered the empirical waiting-time distributions of hitting a certain percentile. Closeness of distributions was measured by ROC area and Kolmogorov distance. The invariance of these empirical distributions was found across stocks, time units, and years. This empirical invariance turned out to be different from the geometric distribution (see [3], [10]).

On the other hand, simulated data points generated from i.i.d. normal and i.i.d. uniform distributions, regarded as returns, were converted into a sequence of 0s and 1s as above. The histograms generated from the lengths of 1-runs were matched against the histogram of a geometric distribution with the parameter corresponding to the width of the percentile. In all cases, the match appeared perfect. Of course, this is only to be expected now, in view of our Theorem 2.2. We would like to point out, however, that it was originally this observed invariance in our simulation results that inspired us to formulate and prove the main theoretical results described in this work.

Our theoretical results, when contrasted with our above-mentioned analysis based on real data on the prices of various stocks, raise questions about the validity of a large class of widely accepted models of stock prices. In fact, any model that merely implies exchangeability of successive returns would be put to question. In recent times, models that incorporate long-range dependence have been proposed; these models use fractional Brownian motion instead of Brownian motion (see [11]). Since exchangeability of successive returns no longer holds for these models (except in two special cases), our invariance results do not apply. However, simulations indicate that a law of large numbers holds, but with a different limit.

One other point that we would like to note here is that, in the above, we have only described the first layer of coding in what is known as hierarchical segmentation of time-series data (see [10]). The data were coded into a binary 0-1 sequence identifying the times of occurrences of "rare events" (namely, occurrences of extreme values). One can now think of the lengths of successive 1-runs (waiting times between occurrences of rare events) as the data, which are only exchangeable but not i.i.d., and code these further into a "second layer" 0-1 sequence in exactly the same way as before. An occurrence of a 0 at this level corresponds to a very long wait between rare events at the earlier level. A long 1-run at this level corresponds to a long wait between two long waits for extreme events at the earlier level. By assuming only exchangeability (instead of i.i.d.) of the underlying random variables, our results imply that, at every level of coding, the same limiting result for the histograms generated by the lengths of 1-runs is valid.
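The second layer of coding can be sketched by reusing the first-layer procedure on the run lengths themselves (sizes are illustrative and the helper names are ours, not the paper's; on integer-valued data, ties make the percentile cut only approximate):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.1

def code_extremes(values, p):
    """Code (roughly) the upper 100p% of `values` as 0 and the rest as 1."""
    threshold = np.quantile(values, 1 - p)
    return (values < threshold).astype(int)

def run_lengths_of_ones(bits):
    """Lengths of successive 1-runs, including the (possibly empty) boundary runs."""
    return np.array([len(r) for r in "".join(map(str, bits)).split("0")])

# first layer: waiting times between extreme returns
returns = rng.standard_normal(200_000)
layer1 = run_lengths_of_ones(code_extremes(returns, p))

# second layer: the waiting times themselves are coded in exactly the same way;
# a 0 here marks a very long wait between rare events at the first level
layer2 = run_lengths_of_ones(code_extremes(layer1, p))

print(len(layer1), layer1.mean(), len(layer2))
```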

For possible applications to finance, one may refer to [3], [9], [10]. For example, volatile periods may be defined hierarchically up to the third level using the real data: if the length of a run of 1s falls in, say, the upper ten percent, denote that period 0*, otherwise 1*; repeat the same procedure for the lengths of runs of 1*s and denote a period in the upper ten percent 0@. One may regard the 0@s as the volatile periods. Using this segmentation, the dynamics and trading strategies are studied in [10].

References

[1] Andrews, D. W. K. (1988). Laws of large numbers for dependent non-identically distributed random variables. Econometric Theory 4, 458-467.

[3] Chang, Lo-Bin, Shu-Chun Chen, Fushing Hsieh, Chii-Ruey Hwang, Max Palmer (2009). Empirical invariance in stock market and related problems, draft.

[4] Chen, Wen-Chen (1981). Limit theorems for general size distributions. Journal of Applied Probability 18, 139-147.

[5] Davidson, J. (1994). An L1-convergence theorem for heterogeneous mixingale arrays with trending moments. Statistics and Probability Letters 16, 301-306.

[6] De Jong, R. M. (1996). A strong law of large numbers for triangular mixingale arrays. Statistics and Probability Letters 27, 1-9.

[7] Diaconis, P. and D. Freedman (1980). Finite exchangeable sequences. The Annals of Probability 8, 745-764.

[8] Eberlein, E. (2001). Application of generalized hyperbolic Lévy motions to finance. In Lévy Processes: Theory and Applications [Ed. Barndorff-Nielsen, O. E., Mikosch, T. and Resnick, S. I.], Birkhäuser, 319-336.

[9] Geman, Stuart and Lo-Bin Chang (2009). Stock prices and the peculiar statistics of rare events, preprint.

[10] Hsieh, Fushing, Shu-Chun Chen, Chii-Ruey Hwang (2009). Discovering stock dynamics through multidimensional volatility-phase, accepted by Quantitative Finance.

[11] Hu, Y., B. Oksendal, A. Sulem (2000). Optimal portfolio in a fractional Black & Scholes market. In Mathematical Physics and Stochastic Analysis: Essays in Honor of Ludwig Streit, World Scientific, 267-279.

[12] Pollard, David (1984). Convergence of Stochastic Processes. Springer Series in Statistics, Springer-Verlag.

[13] Teicher, H. (1985). Almost certain convergence in double arrays. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 69, 331-345.

(2) Asymptotic Error Bounds for Kernel-based Nyström Low-Rank Approximation Matrices, Lo-Bin Chang, Zhidong Bai, Su-Yun Huang, Chii-Ruey Hwang, Journal of Multivariate Analysis (2013), Vol. 120, 102-119.

Due to the fast advancement of information technology, kernel-based learning algorithms have become popular and play an important role in machine learning, with ample applications in statistics, biostatistics, medical science, image analysis, pattern recognition, engineering, etc. (see, e.g., Cortes and Vapnik, 1995; Cristianini and Shawe-Taylor, 2000; Schölkopf and Smola, 2002; Alpaydın, 2004; Hastie, Tibshirani and Friedman, 2009). Kernel functions are flexible building blocks for modeling complex and nonlinear data structures. The value of a kernel function K(x, y) represents a dot product in a kernel-induced Hilbert space, often high-dimensional or even infinite-dimensional, and can be interpreted as a similarity measure between the two points x and y.


Many kernel-based learning algorithms have a computational load that scales with the sample size n of the data collection {X_1, . . . , X_n}. This article considers the Nyström low-rank approximation to the kernel-based Gram matrix and provides a theoretical justification for it. Asymptotic error bounds on eigenvalues and eigenvectors are derived for the Nyström low-rank approximation matrix. The approximation of eigenvectors is much less intuitive, and the numerical results on the asymptotic error are rather surprising. To the best of our knowledge, no other existing article mathematically explores the convergence or error bounds of the eigenvectors of the Nyström approximation matrix. The Nyström method is an easy and yet efficient approach to low-rank approximation, which dramatically cuts down the computational load and memory usage. See, for instance, Lee and Mangasarian (2001), Williams and Seeger (2001), Drineas and Mahoney (2005), and Lee and Huang (2007) for studies of Nyström low-rank approximation for kernel matrices.

The underlying kernel function K(x, y) in this article is assumed continuous, symmetric, non-negative definite, and defined on X × X. Let X be a random variable having continuous distribution F on X ⊂ ℝ^p. Let X_n be the (random) data matrix consisting of i.i.d. copies of X, i.e., X_n := (X_1, . . . , X_n)^T, which is of size n × p; and let K_n := K(X_n, X_n) = [K(X_i, X_j)], i, j = 1, . . . , n, be the full kernel matrix. The key idea of the Nyström approximation is to employ a reduced kernel. It randomly selects a portion of the data set to generate a thin rectangular kernel matrix, called the reduced kernel and denoted by K̂_n := K(X_n, X̂_n), where X̂_n is a data subset matrix formed by a subset of {X_1, . . . , X_n}. Then, it uses this much smaller rectangular kernel matrix to replace, or to generate an approximation to, the full kernel matrix. See Figure 1 for a graphical illustration.

Figure 1: Reduced kernel and Nyström approximation. Full kernel: K_n := K(X_n, X_n); reduced kernel: K̂_n := K(X_n, X̂_n); Nyström approximation matrix: K_n ≈ K̂_n ( K(X̂_n, X̂_n) )^{−1} K̂_n^T.

The technique of using a reduced kernel matrix has been successfully applied to other kernel-based learning algorithms, such as least squares support vector machines (Suykens and Vandewalle, 1999a, 1999b), proximal support vector machines (Fung and Mangasarian, 2001), Lagrangian support vector machines (Mangasarian and Musicant, 2001), active set support vector regression (Musicant and Feinberg, 2004), smooth ϵ-support vector regression (Lee, Hsieh and Huang, 2005), kernel sliced inverse regression (Yeh, Huang and Lee, 2009), and robust kernel PCA (Huang, Yeh and Eguchi, 2009), among others.

The random subsample {K(·, X_{i_k})}, k = 1, . . . , m, is used as a basis subset to replace the full-sample basis set {K(·, X_i)}, i = 1, . . . , n. In the training phase of a kernel algorithm, the thin reduced kernel matrix K̂_n = K(X_n, X̂_n) is used as the data input, where X̂_n consists of {X_{i_k}}, k = 1, . . . , m. Notice that the number of observations (the length of each column of K̂_n, namely n) is not reduced; it is the number of basis functions (the length of each row, namely m) that has been cut down. This uniform random subset for kernel basis selection has a link to the popular uniform design, which is a space-filling design. Space-filling designs are known to be robust against the worst possible scenario (Fang et al., 2000). Of course, there is always a random-luck issue in every random sampling scheme. To improve the quality of the random subsample used as a partial kernel basis, a stratified random subset is suggested: for classification problems, the random sampling should be stratified over classes; for regression problems, over the regression responses. Furthermore, the low-rank approximation matrix actually adopts a model with less model complexity, so a larger penalty is suggested to enforce better data fidelity. See Lee and Huang (2007) for a more detailed discussion and suggestions for practical implementation.

The idea of using random subsets can also be found in a series of works on CUR matrix decompositions (Drineas, Kannan and Mahoney, 2006a, 2006b, 2006c; Mahoney and Drineas, 2009, and references therein). See Figure 2 for an illustration of a CUR decomposition.

Figure 2: CUR matrix approximation. A_{p×q} ≈ CUR, where R_{r×q} consists of random rows from A, C_{p×c} consists of random columns from A, and the core matrix is U := (C^T C)^{−1} C^T A R^T (R R^T)^{−1}.
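The decomposition in Figure 2 can be sketched in a few lines of numpy (the sizes and the rank of the test matrix are illustrative; least squares and a pseudoinverse stand in for the explicit inverses, which need not exist when C or R is rank-deficient):

```python
import numpy as np

rng = np.random.default_rng(3)
p, q, r, c = 200, 150, 40, 40

# an exactly rank-10 test matrix, so CUR can reconstruct it (numerically) exactly
A = rng.standard_normal((p, 10)) @ rng.standard_normal((10, q))

# random rows R and random columns C of A
rows = rng.choice(p, size=r, replace=False)
cols = rng.choice(q, size=c, replace=False)
R, C = A[rows, :], A[:, cols]

# core matrix U := (C^T C)^{-1} C^T A R^T (R R^T)^{-1}, computed stably
U = np.linalg.lstsq(C, A, rcond=None)[0] @ np.linalg.pinv(R)
A_cur = C @ U @ R

rel_err = np.linalg.norm(A - A_cur) / np.linalg.norm(A)
print(rel_err)
```

Because the random rows and columns here span the row and column spaces of the rank-10 matrix, the reconstruction error is at machine-precision level; for general matrices CUR is only an approximation.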

In addition to being continuous, symmetric, and non-negative definite, the kernel function K is assumed square-integrable,

∫∫_{X×X} K²(x, y) dF(x) dF(y) = c < ∞,  (1)

to have the spectral decomposition

K(x, y) = Σ_{k=1}^∞ λ_k η_k(x) η_k(y), where ∫_X η_k(x) η_j(x) dF(x) = δ_{kj},  (2)

and to be of trace type, i.e.,

Σ_{k=1}^∞ λ_k < ∞.  (3)

For simplicity, we assume the eigenvalues of K are strictly positive, distinct, and arranged in descending order:

λ_1 > λ_2 > λ_3 > · · · > λ_k > · · · > 0.  (4)

(Notice that the method is designed to work for symmetric non-negative definite kernels, not for others (see the second example in Section 4). The number of positive eigenvalues of the kernel needs to be greater than n for the n × n data kernel matrices defined below. If the eigenvalues are not distinct, the eigenspaces corresponding to repeated eigenvalues have dimension greater than one; the convergence result for eigenvalues still holds, but the convergence for eigenvectors needs to be restated in terms of eigenspaces.) Consider the n × n kernel data matrix (scaled by n^{−1})

M_n := n^{−1} K(X_n, X_n) = n^{−1} [K(X_i, X_j)]_{i,j=1}^n = n^{−1} K_n  (5)

with eigenvalues λ_{n1} ≥ λ_{n2} ≥ · · · ≥ λ_{nn} ≥ 0 and corresponding unit eigenvectors u_{nk}, k = 1, 2, . . . , n. The matrices K_n and M_n are both called full kernel matrices. The eigenvalue decomposition of a full kernel matrix M_n is computationally costly. An alternative is to resort to a reduced kernel via a random subset. Since the data are i.i.d. copies of X, without loss of generality we may assume that the random subset, denoted by X̂_n^{(m)}, is formed by {X_1, . . . , X_m} for some m < n. Consider the partition of M_n as

M_n = [ M_11  M_12 ; M_21  M_22 ],  (6)

where M_11 is m × m and M_22 is (n − m) × (n − m). The following rank-m approximation matrix is called the Nyström approximation to M_n:

M̂_n^{(m)} := [ M_11 ; M_21 ] M_11^{−1} [ M_11  M_12 ] = [ M_11  M_12 ; M_21  M_21 M_11^{−1} M_12 ].  (7)
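A minimal numpy sketch of the partition (6) and the Nyström approximation (7) follows (the Gaussian kernel, sample sizes, and bandwidth are illustrative assumptions; as in the text, the first m points serve as the random subset):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p = 400, 100, 2

# i.i.d. data and a Gaussian (RBF) kernel: continuous, symmetric, non-negative definite
X = rng.standard_normal((n, p))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
M = np.exp(-0.5 * sq) / n                    # M_n = n^{-1} K(X_n, X_n), as in (5)

# partition M as in (6) and form the rank-m Nystrom approximation (7)
M11 = M[:m, :m]
M1x = M[:m, :]                               # [M_11  M_12]
M_hat = M1x.T @ np.linalg.solve(M11, M1x)    # [M_11; M_21] M_11^{-1} [M_11  M_12]

# compare leading eigenvalues of the full and Nystrom matrices
ev_full = np.linalg.eigvalsh(M)[::-1]
ev_nys = np.linalg.eigvalsh(M_hat)[::-1]
print(np.abs(ev_full[:5] - ev_nys[:5]).max())
```

Since M_n − M̂_n^{(m)} is positive semi-definite (its only nonzero block is the Schur complement of M_11), each Nyström eigenvalue underestimates the corresponding full-kernel eigenvalue.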

This approximation is based on the reduced kernel n^{−1} K(X_n, X̂_n^{(m)}) = [M_11^T, M_21^T]^T, which is of size n × m. Denote the kth eigenvalue of M̂_n^{(m)} and its associated eigenvector by λ̂_{nk}^{(m)} and û_{nk}^{(m)}. The aims of this article are

(i) to study the asymptotic orders of magnitude for the full-kernel pairs {λ_{nk}, u_{nk}} and the reduced-kernel pairs {λ̂_{nk}^{(m)}, û_{nk}^{(m)}}, as compared with the ideal ones {λ_k, n^{−1/2} η_k(X_n)}, where η_k(X_n) is the n-vector η_k(X_n) := (η_k(X_1), . . . , η_k(X_n))^T; and

(ii) to find asymptotic bounds for M_n − M̂_n^{(m)}, the difference between the full kernel matrix and the reduced-rank approximation matrix, in terms of their eigenvalues and eigenvectors.

Figure 3: Connection diagram. The correspondences among the variables, lemmas, and theorems can be viewed in this diagram (Lemma 3 is proved using Lemma 1, and Theorem 2 is proved using Theorem 1 and Lemma 4). The major goal is to connect {λ_{nk}, u_{nk}} (from the full kernel matrix) to {λ̂_{nk}^{(m)}, û_{nk}^{(m)}} (from the reduced-rank approximation matrix).

The results established in this article are summarized in Figure 3. The quantities λ̃_{nk}^{(s)}, ũ_{nk}^{(s)}, and M̃_n^{(s)} in the diagram are intermediate variables, defined later in the article, used to prove our main results. Notice that two opposite directions can be chosen for an eigenvector; for convenience, the directions of the eigenvectors u_{nk}, û_{nk}^{(m)}, ũ_{nk}^{(s)} are chosen toward the direction of η_k(X_n) for each k. The rest of the article is organized as follows. The main results on eigenvalue and eigenvector error bounds for the Nyström approximation matrix are given in Section 2. Technical lemmas and proofs are placed in Section 3. Two numerical examples are displayed in Section 4. A list of notation usage is appended at the end of the article.


References

Alpaydın, E. (2004). Introduction to Machine Learning. MIT Press, Cambridge, MA.

Anderson, T.W. (2003). An Introduction to Multivariate Statistical Analysis. Wiley.

Bai, Z.D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica, 9: 611-677.

Cortes, C. and Vapnik, V.N. (1995). Support-vector networks. Machine Learning, 20: 273-297.

Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge University Press, Cambridge.

Drineas, P., Kannan, R. and Mahoney, M.W. (2006a). Fast Monte Carlo algorithms for matrices I: approximating matrix multiplication. SIAM J. Computing, 36: 132-157.

Drineas, P., Kannan, R. and Mahoney, M.W. (2006b). Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. SIAM J. Computing, 36: 158-183.

Drineas, P., Kannan, R. and Mahoney, M.W. (2006c). Fast Monte Carlo algorithms for matrices III: computing a compressed approximate matrix decomposition. SIAM J. Computing, 36: 184-206.

Drineas, P. and Mahoney, M.W. (2005). On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Machine Learning Research, 6: 2153-2175.

Fang, K.T., Lin, D.K.J., Winker, P. and Zhang, Y. (2000). Uniform design: theory and application. Technometrics, 42: 237-248.

Fung, G. and Mangasarian, O.L. (2001). Proximal support vector machine classifiers. In Proceedings KDD-2001: Knowledge Discovery and Data Mining, F. Provost and R. Srikant, Eds., San Francisco, CA, August 26-29, 2001, pp. 77-86.

Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.

Huang, S.Y., Yeh, Y.R. and Eguchi, S. (2009). Robust kernel principal component analysis. Neural Computation, 21: 3179-3213.

Lee, Y.J., Hsieh, W.F. and Huang, C.M. (2005). ϵ-SSVR: A smooth support vector machine for ϵ-insensitive regression. IEEE Transactions on Knowledge and Data Engineering, 17: 678-685.

Lee, Y.J. and Huang, S.Y. (2007). Reduced support vector machines: a statistical theory. IEEE Trans. Neural Networks, 18: 1-13.

Lee, Y.J. and Mangasarian, O.L. (2001). RSVM: Reduced support vector machines. In Proceedings of the First SIAM International Conference on Data Mining. Philadelphia: SIAM.

Mahoney, M.W. and Drineas, P. (2009). CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, USA, 106(3): 697-702.

Mangasarian, O.L. and Musicant, D.R. (2001). Lagrangian support vector machines. J. Machine Learning Research, 1: 161-177.

Musicant, D.R. and Feinberg, A. (2004). Active set support vector regression. IEEE Trans. Neural Networks, 15: 268-275.

Schölkopf, B. and Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA.

Suykens, J.A.K. and Vandewalle, J. (1999a). Least squares support vector machine classifiers. Neural Processing Letters, 9: 293-300.

Suykens, J.A.K. and Vandewalle, J. (1999b). Multiclass least squares support vector machines. In Proc. IJCNN, Washington, DC, 1999, 900-903.

Williams, C. and Seeger, M. (2001). Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, 13: 682-688, T.K. Leen, T.G. Dietterich, and V. Tresp, Eds. MIT Press, Cambridge, MA.

Yeh, Y.R., Huang, S.Y. and Lee, Y.J. (2009). Nonlinear dimension reduction with kernel sliced inverse regression. IEEE Transactions on Knowledge and Data Engineering, 21: 1590-1603.

Zhang, F. (1999). Matrix Theory: Basic Results and Techniques. Springer, New York.

(3) Empirical Scaling Laws and the Aggregation of Non-stationary Data, Lo-Bin Chang, Stuart Geman, Physica A (2013), Vol. 392, Issue 20, 5046-5052.

The 1900 dissertation of Louis Bachelier, The Theory of Speculation (Bachelier, 1900), proposed a random-walk model for security prices. The basic model, elaborated to accommodate heavy-tailed distributions and stochastic volatilities, still provides a compelling and nearly universally accepted foundation for a theory of price movements. At the same time, a salient and much-discussed feature of the data is the remarkably precise self-similarity of the returns, relative to the return interval, of many of these securities. This was first observed by Mandelbrot (1963) and has since been found in multiple data sets involving a range of securities and time periods (cf. Evertsz, 1995; Mantegna and Stanley, 1995; Guillaume et al., 1997; Gopikrishnan et al., 1999; Podobnik et al., 2000; Ivanov et al., 2001; Wang and Hui, 2001; Gencay et al., 2001; Xu and Gencay, 2003; Ivanov et al., 2004; Matteo et al., 2005; Matteo, 2007; Glattfelder et al., 2011; and Podobnik et al., 2011, to name a few). With a straightforward calculation we will conclude that state-of-the-art models of price movements do not generate self-similar processes and are therefore at odds with the empirical scaling of returns. We will then show that the scaling of empirical distributions is likely to be a statistical consequence of the aggregation of large amounts of non-stationary data.

Bachelier's remarkable thesis included a first construction of Brownian motion and proposed a suitably scaled version as a model for the price dynamics of securities: S(t) = S(0) + σw(t), in which w is a "standard" Brownian motion and σ is the standard deviation of the change in price after one unit of time. The model has evolved, incrementally, to better accommodate theoretical and empirical constraints. For example, the realization that the scale of an ensuing price increment is typically and logically proportional to the current price, rather than independent of it, leads to the geometric (instead of linear) Brownian motion:

R(t) := ln S(t) − ln S(0) = σw(t),  (8)

after correcting for a possible drift associated with risk-free investment.

Additionally, the common observation that returns are too peaked and heavy-tailed to be consistent with the normal distribution led Mandelbrot (1963) to seek a replacement for the Brownian motion, while preserving the compelling argument that increments of prices arise from large numbers of small influences. As stable processes are the only possible limits of rescaled sums of independent random variables (the "generalized central limit theorem," Lévy, 1925), and as the resulting theoretical return distributions are a better, and often excellent, fit to empirical returns, Mandelbrot proposed models of the same form as (8) but with w(t) interpreted more generally as an α-stable Lévy process, α ∈ (0, 2]. The special case α = 2 recovers ordinary Brownian motion.

Further refinements are dictated by the fact that volatilities, modeled by the scaling factor σ (which is a standard deviation only in the case α = 2), are almost never constant (σ = σ(t), "stochastic volatility"; cf. Shephard, 2005 and Shapira, 2011). In fact, the evidence is for very rapid fluctuations in σ(t) (e.g., the left-hand panel in Figure ?? is typical). A parsimonious extension of (8), whether or not α = 2, is through the stochastic integral

R(t) = ln S(t) − ln S(0) = ∫₀ᵗ σ(s) dw(s),  (9)

which falls out of the same thought experiment that took us from discrete and small price movements to the stable process w(t), except that a step at time t has scale proportional to σ(t) rather than σ.

Many lines of thought lead to more or less the same thing. For example, the function σ(t) can be thought of as itself a stochastic process, dependent on or independent of w, or as a given deterministic (perhaps historical) volatility trajectory. Many authors prefer to think of σ(t) as a proxy for, or measure of, market activity or "market time," and in fact, under very general conditions, the result of a random time change can also be expressed by (9); cf. Clark (1973), Geman et al. (2001), and Veraart and Winkel (2010). We will assume that either σ(t) is deterministic or, if stochastic, it is independent of w, in which case we will condition on a sample path of σ(t), so that the two situations amount to the same thing. In any case, it would be a mistake to think of σ(t) as statistically stationary, given the prototypical intraday volatility profile (including high values in the opening and closing thirty minutes) and the overall rise in volatility with the rise in volume through the years and decades over which return profiles are studied.

The question we wish to examine is an apparent incompatibility between the class of models embodied by equation (9) and the widely cited evidence for scaling of returns on stock prices and other financial processes.


References

[1] A. Amarasingham, M. Harrison, N. Hatsopoulos, and S. Geman. Conditional modeling and the jitter method of spike resampling. Journal of Neurophysiology, 107:517-531, 2012.

[2] L. Bachelier. Théorie de la spéculation. Annales Scientifiques de l'École Normale Supérieure, 3(17):21-86, 1900.

[3] L.-B. Chang, S. Geman, F. Hsieh, and C.-R. Hwang. Invariance in the recurrence of large returns and the validation of models of price dynamics. Submitted for publication, 2013.

[4] P.K. Clark. A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41(1):135-155, 1973.

[5] C. Evertsz. Fractal geometry of financial time series. Fractals, 3(3):609-616, 1995.

[6] H. Geman, D.B. Madan, and M. Yor. Time changes for Lévy processes. Mathematical Finance, 11(1):79-96, 2001.

[7] R. Gencay, F. Selcuk, and B. Whitcher. Scaling properties of foreign exchange volatility. Physica A, 289:249-266, 2001.

[8] J.B. Glattfelder, A. Dupuis, and R.B. Olsen. Patterns in high-frequency FX data: discovery of 12 empirical scaling laws. Quantitative Finance, 11:599-614, 2011.

[9] P. Gopikrishnan, V. Plerou, L. Amaral, M. Meyer, and H. Stanley. Scaling of the distribution of fluctuations of financial market indices. Physical Review E, 60:5305-5316, 1999.

[10] D. Guillaume, M. Dacorogna, R. Davé, U.A. Müller, R. Olsen, and O. Pictet. From the bird's eye to the microscope: a survey of new stylized facts of the intra-daily foreign exchange markets. Finance and Stochastics, 1:95-129, 1997.

[11] F. Hsieh, S.-C. Chen, and C.-R. Hwang. Discovering stock dynamics through multidimensional volatility phases. Quantitative Finance, 12:213-230, 2012.

[12] P.C. Ivanov, B. Podobnik, Y. Lee, and H.E. Stanley. Truncated Lévy process with scale-invariant behavior. Physica A, 299:154-160, 2001.

[13] P.C. Ivanov, A. Yuen, B. Podobnik, and Y. Lee. Common scaling patterns in intertrade times of U.S. stocks. Physical Review E, 69:056107 (7 pages), 2004.

[14] P. Lévy. Calcul des probabilités. Gauthier-Villars, Paris, 1925.

[15] B. Mandelbrot. The variation of certain speculative prices. Journal of Business, 36:394-419, 1963.

[16] R.N. Mantegna and H.E. Stanley. Scaling behaviour in the dynamics of an economic index. Nature, 376:46-49, 1995.

[17] T.D. Matteo. Multi-scaling in finance. Quantitative Finance, 7:21-36, 2007.

[18] T.D. Matteo, T. Aste, and M.M. Dacorogna. Long-term memories of developed and emerging markets: using the scaling analysis to characterize their stage of development. Journal of Banking and Finance, 29:827-851, 2005.

[19] B. Podobnik, P.C. Ivanov, and H.E. Stanley. Scale-invariant truncated Lévy process. Europhysics Letters, 52(5):491-497, 2000.

[20] B. Podobnik, A. Valentincic, D. Horvatic, and H.E. Stanley. Asymmetric Lévy flight in financial ratios. Proceedings of the National Academy of Sciences, 108:17883, 2011.

[21] T. Preis, J.J. Schneider, and H.E. Stanley. Switching processes in financial markets. Proceedings of the National Academy of Sciences, 108(19):7674, 2011.

[22] Y. Shapira, D.Y. Kenett, O. Raviv, and E. Ben-Jacob. Hidden temporal order unveiled in stock market volatility variance. AIP Advances, 1(2):022127, 2011.

[23] N. Shephard. Stochastic Volatility. Oxford University Press, Oxford, 2005.

[24] A.E. Veraart and M. Winkel. Time change. In R. Cont, editor, Encyclopedia of Quantitative Finance. Wiley Online Library, 2010.

[25] B.H. Wang and P.M. Hui. The distribution and scaling of fluctuations for the Hang Seng index in the Hong Kong stock market. The European Physical Journal B, 20:573-579, 2001.

[26] Z. Xu and R. Gencay. Scaling, self-similarity and multifractality in FX markets. Physica A, 323:578-590, 2003.

(4) Invariance in the Recurrence of Large Returns and the Validation of Models of Price Dynamics, Lo-Bin Chang, Stuart Geman, Fushing Hsieh, Chii-Ruey Hwang, Physical Review E (2013), Vol. 88, Issue 2, 022116.

Given a sequence of stock prices s_0, s_1, . . . recorded at fixed intervals, say every five minutes, let r_n = log(s_n/s_{n-1}), n = 1, 2, . . ., be the corresponding sequence of returns. Fix N and define an excursion to be a return that is large, in absolute value, relative to the set {r_1, r_2, . . . , r_N}. Specifically, following Hsieh et al. (2012), define the excursion process z_1, z_2, . . . , z_N by

z_n = 1 if r_n ≤ l or r_n ≥ u, and z_n = 0 if r_n ∈ (l, u),

where l and u are, respectively, the 10th and 90th percentiles of {r_1, . . . , r_N}. We call the event z_n = 1 an excursion, since it represents a large movement of the stock relative to the chosen set of returns. We will study the distribution of waiting times between large stock returns by studying the distribution of the number of zeros between successive ones of the excursion process. Our motivation includes:

1. The empirical observation (cf. Chang et al., 2013) that this waiting-time distribution is nearly invariant to time scale (e.g. thirty-second, one-minute, or five-minute returns), to stock (e.g. IBM or Citigroup), and to year (e.g. 2001 or 2007).

2. The waiting time to large returns is of obvious interest to investors, and is much easier to study if, and to the extent that, it is invariant across time scale, stock, and year.

3. The particular waiting-time distribution found in the data, and its invariance to time scale, have implications for models of price and volatility movement. For instance, Lévy processes, "market-time" models based on volume or trades, and GARCH models are each, in one way or another, inconsistent with the empirical data.

4. Overwhelmingly, the evidence for self-similarity comes from studies of the univariate (marginal) return distributions (e.g. evidence for a stable-law distribution), but marginal distributions leave data models underspecified. Waiting-time distributions provide additional, explicitly temporal, constraints, and these appear to be nearly universal.
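The construction above is straightforward to compute. The following is a minimal NumPy sketch (ours, not the authors' code; the function names are our own) that forms the excursion process from a price series and extracts the waiting times, i.e. the runs of zeros between successive ones:

```python
import numpy as np

def excursion_process(prices, lower_pct=10.0, upper_pct=90.0):
    """Log-returns and the 0/1 excursion process.

    z_n = 1 when the return is at or below the lower percentile,
    or at or above the upper percentile, of the whole return sample.
    """
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(np.log(prices))        # r_n = log(s_n / s_{n-1})
    l, u = np.percentile(returns, [lower_pct, upper_pct])
    z = ((returns <= l) | (returns >= u)).astype(int)
    return returns, z

def waiting_times(z):
    """Number of zeros between successive ones of the excursion process."""
    ones = np.flatnonzero(z)
    return np.diff(ones) - 1                 # gaps between consecutive excursions
```

For example, `waiting_times(np.array([1, 0, 0, 1, 0, 1]))` gives `[2, 1]`. With the 10th and 90th percentiles, about 20% of the z_n equal one by construction.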


Larger returns can be studied by using more extreme percentiles. Although we have not experimented extensively, the empirical results we will report appear to be qualitatively robust to the chosen percentiles, and hence to the definition of "large return." In general, the upper and lower percentiles index a family of waiting-time distributions that might prove useful for systematically constraining the dynamics of price and volatility models. In §2, we study the invariance of the empirical waiting-time distribution. Starting with the Lévy-type models, we first make a connection between the model-based distribution and the geometric distribution. To be concrete, let S(t) follow the "Black-Scholes model" (geometric Brownian motion) as an example: d log S(t) = µ dt + σ dw(t), where w(t) is a standard Brownian motion. Because of the independent-increments property of Brownian motion, the return sequence under this model is exchangeable (i.e. the distribution of any permutation remains the same). Therefore, the empirical waiting-time distribution under this model is provably invariant to time scale and to time period. More specifically, the probability of getting a "large" return, with l = 10th percentile and u = 90th percentile, is exactly 0.2 at each return interval, and the empirical waiting-time distribution is therefore nearly a geometric distribution with parameter 0.2 (see §2.1 for more detail). We emphasize that these considerations apply, without modification, not just to geometric Brownian motion but to all of its popular generalizations as geometric Lévy processes.
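The geometric(0.2) prediction for exchangeable returns is easy to check by simulation. A small illustrative NumPy sketch (ours, not from the paper) using i.i.d. normal returns, as produced by geometric Brownian motion observed at equally spaced times:

```python
import numpy as np

rng = np.random.default_rng(1)

# i.i.d. normal returns; any i.i.d. (hence exchangeable) law would do
r = rng.normal(0.0, 1.0, 100_000)
l, u = np.percentile(r, [10, 90])
z = (r <= l) | (r >= u)            # excursion process: about 20% ones

# waiting times: number of zeros between successive ones
ones = np.flatnonzero(z)
w = np.diff(ones) - 1

# compare with Geometric(0.2): P(W = k) = 0.2 * 0.8**k, k = 0, 1, 2, ...
for k in range(5):
    print(k, round(float(np.mean(w == k)), 3), round(0.2 * 0.8**k, 3))
```

The empirical frequencies track 0.2 · 0.8^k closely; as discussed next, real return data do not.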

Not surprisingly (cf. "stochastic volatility"), the actual (i.e. empirical) waiting-time distribution differs from the geometric. But what is surprising is the invariance of this distribution across time scale, stock, and year. In §2.2 we make an exhaustive comparison of empirical waiting-time distributions, using trading prices of approximately 300 stocks from the S&P 500 observed over the eight years from 2001 through 2008. Invariance to time scale is strong in all eight years; invariance to stock is strong in years 2001–2007 and less strong in 2008; and invariance across years is stronger for pairs of years that do not include 2008. (We have not studied the years since 2008.) In §2.3, we connect waiting-time invariance to self-similarity, being careful to distinguish a self-similar process from a process having self-similar increments (i.e. distinguish dynamics from marginal distributions).

Which of the state-of-the-art models of price dynamics are consistent with the empirical distribution of the excursion process? The existence of a nearly invariant waiting-time distribution between excursions provides a new tool for evaluating these models, through which questions of consistency with the data can be addressed using statistical measures of fit and hypothesis tests. In general, we will advocate for permutation and other combinatorial statistical approaches that robustly and efficiently exploit symmetries shared by large classes of models, supporting exact hypothesis tests as well as exploratory data analysis. In §3 we introduce some combinatorial tools for hypothesis testing and explore the implications of waiting-time distributions for the time scale of volatility clustering. We continue with this approach, in §4, with a discussion of stochastic volatility modeling, as well as "market-time" and other stochastic time-change models. We conclude, in §5, with a summary and some proposals for price and volatility modeling.

References

[1] T.G. Andersen and T. Bollerslev. Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance, 4:115–158, 1997.

[2] T.G. Andersen. Return volatility and trading volume: an information flow interpretation of stochastic volatility. Journal of Finance, 51:169–204, 1996.

[3] T. Ané and H. Geman. Order flow, transaction clock, and normality of asset returns. Journal of Finance, 55(5):2259–2284, 2000.

[4] T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327, 1986.

[5] J.-P. Bouchaud. Power laws in economics and finance: some ideas from physics. Quantitative Finance, 1:105–112, 2001.

[6] L. Calvet and A. Fisher. Multifractality in asset returns: theory and evidence. The Review of Economics and Statistics, 84:381–406, 2002.

[7] L.-B. Chang and S. Geman. Empirical scaling laws and the aggregation of non-stationary data. Physica A, to appear, 2013.

[8] L.-B. Chang, A. Goswami, F. Hsieh, and C.-R. Hwang. An invariance for the large-sample empirical distribution of waiting time between successive extremes. Bulletin of the Institute of Mathematics Academia Sinica, 8(1):31–48, 2013.

[9] P.K. Clark. A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41(1):135–155, 1973.

[10] P. Diaconis and D. Freedman. Finite exchangeable sequences. Annals of Probability, 8(4):745–764, 1980.

[11] D. Easley, M.M. López de Prado, and M. O'Hara. The volume clock: insights into the high frequency paradigm. Journal of Portfolio Management, 39:19–29, 2012.

[12] P. Embrechts and M. Maejima. Selfsimilar Processes. Princeton University Press, Princeton, NJ, 2002.

[13] R.F. Engle. Autoregressive conditional heteroscedasticity with estimates of variance of United Kingdom inflation. Econometrica, 50:987–1008, 1982.

[14] X. Gabaix, P. Gopikrishnan, V. Plerou, and H.E. Stanley. A theory of power-law distributions in financial market fluctuations. Nature, 423:267–270, 2003.

[15] H. Geman. From measure changes to time changes in asset pricing. Journal of Banking & Finance, 29:2701–2722, 2005.

[16] H. Geman, D.B. Madan, and M. Yor. Time changes for Lévy processes. Mathematical Finance, 11(1):79–96, 2001.

[17] R. Gencay, F. Selcuk, and B. Whitcher. Scaling properties of foreign exchange volatility. Physica A, 289:249–266, 2001.

[18] J.B. Glattfelder, A. Dupuis, and R.B. Olsen. Patterns in high-frequency FX data: discovery of 12 empirical scaling laws. Quantitative Finance, 11:599–614, 2011.

[19] S.L. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies, 6(2):327–343, 1993.

[20] C.C. Heyde. A risky asset model with strong dependence through fractal activity time. Journal of Applied Probability, 36:1234–1239, 1999.

[21] F. Hsieh, S.-C. Chen, and C.-R. Hwang. Discovering stock dynamics through multi-dimensional volatility phases. Quantitative Finance, 12:213–230, 2012.

[22] J. Hull and A. White. The pricing of options on assets with stochastic volatilities. Journal of Finance, 42:281–300, 1987.

[23] B. Mandelbrot. The variation of certain speculative prices. Journal of Business, 36:394–419, 1963.

[24] B. Mandelbrot and H. Taylor. On the distribution of stock price differences. Operations Research, 15:1057–1062, 1967.

[25] R.N. Mantegna and H.E. Stanley. Scaling behaviour in the dynamics of an economic index. Nature, 376:46–49, 1995.

[26] R.N. Mantegna and H.E. Stanley. An Introduction to Econophysics. Cambridge University Press, 2000.

[27] T.D. Matteo. Multi-scaling in finance. Quantitative Finance, 7:21–36, 2007.

[28] U.A. Müller, M.M. Dacorogna, R.B. Olsen, O.V. Pictet, M. Schwarz, and C. Morgenegg. Statistical study of foreign exchange rates, empirical evidence of a price change scaling law, and intraday analysis. Journal of Banking and Finance, 14:1189–1208, 1990.

[29] N. Shephard. Stochastic Volatility. Oxford University Press, Oxford, 2005.

[30] A.E. Veraart and M. Winkel. Time change. In R. Cont, editor, Encyclopedia of Quantitative Finance. Wiley Online Library, 2010.

[31] B.H. Wang and P.M. Hui. The distribution and scaling of fluctuations for Hang Seng index in Hong Kong stock market. The European Physical Journal B, 20:573–579, 2001.

[32] F. Wang, K. Yamasaki, S. Havlin, and H.E. Stanley. Scaling and memory of intraday volatility return intervals in stock markets. Physical Review E, 73:026117, 2006.

[33] Z. Xu and R. Gencay. Scaling, self-similarity and multifractality in FX markets. Physica A, 323:578–590, 2003.


