Recapitulating Predictive Behaviors - Predictive Behavior and Coding in the Retina

The term “prediction” has been used throughout this study whenever a response relates to a future stimulus. However, the contexts of these predictions can be quite different. For instance, even though the latency of the OSR is clearly predictive, quantifying the OSR against the corresponding instantaneous input would yield very little information, since there is actually no input signal at that moment. On the other hand, when we measure non-zero predictive information, it is not clear whether the system actively makes predictions or whether this performance arises solely from the correlated structure of the input.

With these considerations in mind, this section recapitulates the concepts and discusses some of the methods.

4.3.1 How to define predictive behavior

Prediction is a proper forecast of an uncertain event based on statistical inference.

Time-series prediction has long been an important task in fields such as economics and climate-change research [50]. Methods such as nonlinear regression and artificial neural networks generate the expected value of an upcoming event by de-trending accumulated data points [63]. The parameters of these predictive models are usually learned from a set of training data. In our experiments, by contrast, there is little prior knowledge about which neural computations occur in the retina. We therefore scan over different input statistics and compare the results with numerical simulations to understand the working range and the possible computations of the retina.

Prediction should be viewed as a functional consideration for neural computation in this study. Without knowing the underlying mechanism, we identify predictive behavior simply when the response is informative about the future input. One criterion is comparison with control models that should not make predictions. For instance, the linear-nonlinear (LN) model fails to describe both the latency of the OSR that occurs spontaneously after periodic drive and the coding performance under stochastic stimuli. On the other hand, if the retina processed the input by simply “reflecting” the exact signal without any computation, the predictive information could still be non-zero because of the correlated structure of the stimuli. The normalized quantity Pp therefore serves as an index to separate this effect, since Pp would stay approximately constant if the retina merely reflected the input. In short, it is a specific type of feature selection that distinguishes predictive behavior from the trivial results of passive processing.
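The trivial contribution described above can be made concrete with a minimal sketch, assuming the stimulus is a stationary Ornstein-Uhlenbeck (OU) process with correlation time tau and the “response” is an exact copy of it. For jointly Gaussian variables the mutual information is I = -0.5 log2(1 - rho^2), and the OU autocorrelation is rho(delta) = exp(-|delta| / tau). The function name and the value of tau are illustrative assumptions, not quantities from this study.

```python
import numpy as np

def passive_copy_predictive_info(delta, tau):
    """Bits that a perfect copy r(t) = s(t) of a stationary OU stimulus with
    correlation time tau carries about s(t + delta).

    For jointly Gaussian variables I = -0.5 * log2(1 - rho**2), and the OU
    autocorrelation is rho(delta) = exp(-|delta| / tau)."""
    rho = np.exp(-np.abs(np.asarray(delta, dtype=float)) / tau)
    # At delta = 0 the information diverges; clip to keep the value finite.
    rho = np.minimum(rho, 1.0 - 1e-12)
    return -0.5 * np.log2(1.0 - rho**2)

tau = 0.5  # hypothetical stimulus correlation time in seconds
for delta in (0.1, 0.5, 1.0):
    info = passive_copy_predictive_info(delta, tau)
    print(f"shift {delta:.1f} s -> {info:.3f} bits")
```

The information decays with the shift but stays positive for every finite shift, although the “system” performs no computation at all; this is the passive baseline that an index such as Pp is meant to separate from genuine prediction.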

Another noteworthy aspect of predictive behavior is the identification of hidden variables in the time series. The difference between the OU and HMM signals used in the experiments is the hidden variable in the HMM, which allows one to make better predictions about the subsequent input. In other words, the physical limit on prediction (derived further in the appendix) differs between OU and HMM inputs once the hidden variable is taken into account. To make proper predictions for HMM inputs, the retina must be able to detect the second-order property, namely the difference between two consecutive intervals, and represent it in its firing rate. In fact, the experiments on transient responses show that the retina detects differences between intervals with high sensitivity. It is remarkable how differently the Im curves are shaped under the two stimuli in the experimental results, showing that the retina is able to extract features related to the hidden variable. Moreover, the shift of tp toward positive t is critical evidence that the retina

experiments.
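The role of the hidden variable can be sketched numerically, assuming the OU input is a first-order (Markov) process and the HMM input is a noise-driven damped oscillator whose velocity is hidden, both discretized with a simple Euler scheme. Under a linear-Gaussian approximation, the information a least-squares predictor carries about the next sample is 0.5 log2(var(y) / var(residual)). The sketch compares predicting x[t+1] from x[t] alone with predicting it from x[t] plus the second-order term x[t] - x[t-1]. All parameter values and function names are hypothetical, and the series stand in for the stimulus without modeling the pulse-interval protocol used in the experiments.

```python
import numpy as np

def simulate_ou(n, dt, tau, sigma, rng):
    """Euler-Maruyama OU process: dx = -x/tau dt + sigma dW."""
    x = np.zeros(n)
    noise = rng.standard_normal(n) * sigma * np.sqrt(dt)
    for t in range(1, n):
        x[t] = x[t - 1] - x[t - 1] / tau * dt + noise[t]
    return x

def simulate_hmm(n, dt, gamma, omega, sigma, rng):
    """Noise-driven damped oscillator (semi-implicit Euler):
    dv = (-gamma v - omega^2 x) dt + sigma dW,  dx = v dt.
    Only x is returned; the velocity v is the hidden variable."""
    x = np.zeros(n)
    v = np.zeros(n)
    noise = rng.standard_normal(n) * sigma * np.sqrt(dt)
    for t in range(1, n):
        v[t] = v[t - 1] + (-gamma * v[t - 1] - omega**2 * x[t - 1]) * dt + noise[t]
        x[t] = x[t - 1] + v[t] * dt
    return x

def gaussian_pred_info(x, use_difference):
    """Linear-Gaussian estimate (bits) of I(x[t+1]; predictors), where the
    predictors are x[t] alone or x[t] plus the difference x[t] - x[t-1]."""
    y = x[2:]
    cols = [x[1:-1]]
    if use_difference:
        cols.append(x[1:-1] - x[:-2])
    design = np.column_stack(cols + [np.ones_like(y)])  # last column = intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 0.5 * np.log2(np.var(y) / np.var(resid))

rng = np.random.default_rng(0)
n, dt = 100_000, 0.01
signals = {
    "OU": simulate_ou(n, dt, tau=0.25, sigma=1.0, rng=rng),
    "HMM": simulate_hmm(n, dt, gamma=4.0, omega=2 * np.pi, sigma=1.0, rng=rng),
}
gains = {}
for name, x in signals.items():
    gains[name] = gaussian_pred_info(x, True) - gaussian_pred_info(x, False)
    print(f"{name}: gain from the difference term = {gains[name]:.3f} bits")
```

For the OU series the difference term adds essentially nothing, since x[t] already summarizes the past; for the HMM series it adds a substantial amount, because the difference of consecutive samples acts as a proxy for the hidden velocity.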

To understand temporal prediction quantitatively, different numerical models were used to reproduce the predictive behaviors observed in experiments. We confirmed that a passive process such as the LN model fails to capture the shape of the Im curve and the position of tp. Because these properties contribute to the change in Pp under different stimuli, the dependence of Pp on the stimuli calculated from the LN model is, as expected, incorrect. On the other hand, while the AFHN model produces the OSR, it still fails to capture this dependence under stochastic stimuli. It is possible that the limited degrees of freedom of this simplified model constrain its ability to predict continuous stimuli, or simply that we have not yet tuned its parameters optimally for this task. Note that the dependence is closer to the experimental results when the calculation is performed on the parameter a of the AFHN model. This suggests that the slow parameter provides better predictive power, possibly because it filters out fluctuations and captures the changing trend of the stochastic input. Theoretical work has also related “slowness” extracted from time series to prediction [25]. In contrast, the conceptual model based on a PID circuit, which includes feedback adjustment, captures predictive behavior such as the shift in tp. This guides our ongoing work to add the inhibitory feedback proposed in predictive coding to similar simulations.
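The contrast between passive filtering and feedback-like processing can be sketched as follows. The assumptions are all hypothetical and are not the models used in this study: the stimulus is a noise-driven damped oscillator; the LN-style response is a causal exponential filter followed by an exponential nonlinearity; and the PID circuit is reduced to its proportional and derivative terms (Ki = 0) with arbitrary gains. Im is estimated as a plug-in mutual information with 8 equal-count bins, a positive lag means the response is compared with the future stimulus, and tp is taken as the lag that maximizes the curve.

```python
import numpy as np

def simulate_hmm(n, dt, gamma, omega, sigma, rng):
    """Noise-driven damped oscillator (semi-implicit Euler); returns x only."""
    x = np.zeros(n)
    v = np.zeros(n)
    noise = rng.standard_normal(n) * sigma * np.sqrt(dt)
    for t in range(1, n):
        v[t] = v[t - 1] + (-gamma * v[t - 1] - omega**2 * x[t - 1]) * dt + noise[t]
        x[t] = x[t - 1] + v[t] * dt
    return x

def quantile_bins(x, n_bins):
    """Equal-count binning via ranks (invariant to monotone nonlinearities)."""
    ranks = np.argsort(np.argsort(x))
    return ranks * n_bins // len(x)

def mutual_info_bits(a, b, n_bins):
    """Plug-in mutual information (bits) between two integer-binned arrays."""
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (a, b), 1)
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / outer[nz])))

def im_curve(response, stimulus, lags, n_bins=8):
    """I(r(t); s(t + lag)) for each lag in samples; positive lag = future stimulus."""
    r = quantile_bins(response, n_bins)
    s = quantile_bins(stimulus, n_bins)
    n = len(r)
    out = []
    for k in lags:
        if k >= 0:
            out.append(mutual_info_bits(r[: n - k], s[k:], n_bins))
        else:
            out.append(mutual_info_bits(r[-k:], s[: n + k], n_bins))
    return np.array(out)

rng = np.random.default_rng(1)
dt = 0.01
s = simulate_hmm(50_000, dt, gamma=4.0, omega=2 * np.pi, sigma=1.0, rng=rng)

# Passive LN-style response: causal exponential filter, then exp() nonlinearity.
tau_k = 0.1
g = np.zeros_like(s)
for t in range(1, len(s)):
    g[t] = g[t - 1] + dt / tau_k * (s[t] - g[t - 1])
r_ln = np.exp(g / g.std())

# PD stand-in for the PID feedback circuit (Ki = 0): proportional + derivative.
k_p, k_d = 1.0, 0.2
r_pd = k_p * s + k_d * np.gradient(s, dt)

lags = np.arange(-50, 51, 2)  # -0.5 s to +0.5 s in 0.02 s steps
tp = {}
for name, r in (("LN", r_ln), ("PD", r_pd)):
    im = im_curve(r, s, lags)
    tp[name] = lags[np.argmax(im)] * dt
    print(f"{name}: tp = {tp[name]:+.2f} s")
```

Because equal-count binning depends only on ranks, the exponential nonlinearity gives exactly the same curve as the filtered signal g itself; its peak sits at a negative lag set by the filter delay, whereas the derivative term pushes the peak of the PD curve to a positive lag.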

Finally, we constructed the Gedanken retina to show that this shift in tp is not unphysical but can be achieved once the iterative structure of the input is learned. This can also be viewed as the physical limit of what can be predicted from the original HMM time series. This is not a direct proof that the retina actually learns to solve for the parameters of this noise-driven damped oscillator. Rather, as in the AFHN model that produces the OSR, the retina may respond to the input adaptively or actively (through a mechanism still unknown) to make proper temporal predictions.
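A minimal version of learning the iteration structure can be sketched, assuming the HMM series is generated by a semi-implicit Euler discretization of the noise-driven damped oscillator with hypothetical damping gamma and natural frequency omega. Eliminating the hidden velocity turns the scheme into a second-order autoregression, so an ordinary least-squares fit recovers gamma and omega and gives an approximately optimal one-step linear forecast. This illustrates the idea only; it is not the construction used for the Gedanken retina in this study.

```python
import numpy as np

def simulate_hmm(n, dt, gamma, omega, sigma, rng):
    """Noise-driven damped oscillator, semi-implicit Euler:
    v[t] = v[t-1] + (-gamma*v[t-1] - omega**2*x[t-1])*dt + noise,
    x[t] = x[t-1] + v[t]*dt.  Only x is returned."""
    x = np.zeros(n)
    v = np.zeros(n)
    noise = rng.standard_normal(n) * sigma * np.sqrt(dt)
    for t in range(1, n):
        v[t] = v[t - 1] + (-gamma * v[t - 1] - omega**2 * x[t - 1]) * dt + noise[t]
        x[t] = x[t - 1] + v[t] * dt
    return x

def fit_iteration_structure(x, dt):
    """Least-squares fit of x[t+1] = a1*x[t] + a2*x[t-1] + residual.
    Eliminating v from the scheme above gives
        a1 = 2 - gamma*dt,   a2 = -(1 - gamma*dt + omega**2 * dt**2),
    which is inverted to recover gamma and omega."""
    design = np.column_stack([x[1:-1], x[:-2]])
    (a1, a2), *_ = np.linalg.lstsq(design, x[2:], rcond=None)
    gamma_hat = (2.0 - a1) / dt
    omega_hat = np.sqrt((1.0 - a1 - a2) / dt**2)
    return a1, a2, gamma_hat, omega_hat

rng = np.random.default_rng(2)
dt, gamma, omega = 0.01, 4.0, 2 * np.pi  # hypothetical stimulus parameters
x = simulate_hmm(100_000, dt, gamma, omega, sigma=1.0, rng=rng)
a1, a2, gamma_hat, omega_hat = fit_iteration_structure(x, dt)
print(f"gamma: true {gamma:.2f}, learned {gamma_hat:.2f}")
print(f"omega: true {omega:.2f}, learned {omega_hat:.2f}")

# One-step forecast from the learned recursion, compared with simply
# "reflecting" the current value x[t] as the forecast of x[t+1].
forecast = a1 * x[1:-1] + a2 * x[:-2]
gain = 0.5 * np.log2(np.var(x[2:] - x[1:-1]) / np.var(x[2:] - forecast))
print(f"information gain over copying x[t]: {gain:.2f} bits")
```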

4.3.2 Other methods to measure prediction

An alternative method was used in a previous study to verify the optimality of the predictive information measured from retinal responses [65]. There, the retina responded to a moving bar whose position followed the same stochastic damped oscillator in space. By recording the spiking patterns of a population of retinal ganglion cells, the predictive information between the current firing pattern and the future position of the moving bar could be quantified, yielding Im curves similar to those measured in this study.
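The quantity described above, the information between the current population firing pattern and a future stimulus, can be sketched with a plug-in estimator, assuming spikes are already binned into a binary (cells × time bins) array and the stimulus is discretized into labels. Each pattern of N cells is encoded as an integer word. The toy data, in which two of five hypothetical cells encode the stimulus three bins ahead, are synthetic; real recordings typically need bias correction, which this sketch omits.

```python
import numpy as np

def words_from_spikes(spikes):
    """Map an (n_cells, n_bins) binary spike array to one integer word per
    time bin, e.g. the pattern [1, 0, 1] becomes 1 + 4 = 5."""
    spikes = np.asarray(spikes, dtype=int)
    weights = 2 ** np.arange(spikes.shape[0])
    return weights @ spikes

def discrete_mi_bits(a, b):
    """Plug-in mutual information (bits) between two discrete label arrays."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / outer[nz])))

def predictive_info(words, stim_labels, lag):
    """I(word(t); stimulus(t + lag)) for a lag >= 0 in time bins."""
    n = len(words)
    return discrete_mi_bits(words[: n - lag], stim_labels[lag:])

rng = np.random.default_rng(3)
n_bins = 20_000
stim = rng.integers(0, 4, size=n_bins)         # discretized "bar position" labels
spikes = rng.integers(0, 2, size=(5, n_bins))  # background random spiking
future = np.roll(stim, -3)                     # stim[t + 3] (wraps only at the end)
spikes[0] = future & 1                         # two toy cells that encode the
spikes[1] = (future >> 1) & 1                  # stimulus three bins ahead
words = words_from_spikes(spikes)
mi = {lag: predictive_info(words, stim, lag) for lag in (0, 3)}
print(mi)
```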

The authors then asked how “good” this predictive performance is. The information-bottleneck (IB) method was applied to calculate the physical bound on prediction set by the statistical structure of the stimulus [22]. Given the constraints of its own capacity and the predictability of the moving bar, the retina can make at most a certain amount of prediction. Surprisingly, the experimentally measured values closely match this IB bound, indicating an optimal predictive representation in the retinal firing pattern (Fig. 4.3). Moreover, these representations were shown to allow downstream neurons that receive the population activity to perform predictive computations as well.
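The idea of an IB bound can be illustrated in the simplest case where it has a closed form, assuming a scalar past X and future Y that are jointly Gaussian with correlation rho. The optimal compression is then X plus Gaussian noise, and the bound on predictive information for a given Ipast follows from the signal-to-noise ratio (derivation in the docstring). This is a simplified stand-in, not the calculation performed in [65]; the function names are hypothetical.

```python
import numpy as np

def gaussian_ib_bound(i_past, rho):
    """Maximum I(T; Y) in bits for a representation T of a scalar Gaussian
    past X that keeps i_past bits about X, when X and the future Y are
    jointly Gaussian with correlation rho.

    With T = X + noise at signal-to-noise ratio s:
        I(T; X) = 0.5*log2(1 + s)  =>  s / (1 + s) = 1 - 2**(-2*i_past)
        corr(T, Y)**2 = rho**2 * s / (1 + s)
        I(T; Y) = -0.5*log2(1 - corr(T, Y)**2)"""
    i_past = np.asarray(i_past, dtype=float)
    rho_ty_sq = rho**2 * (1.0 - 2.0 ** (-2.0 * i_past))
    return -0.5 * np.log2(1.0 - rho_ty_sq)

def optimality(i_past, i_future, rho):
    """Ratio of a measured I_future to the bound at the same I_past."""
    return i_future / gaussian_ib_bound(i_past, rho)

rho = 0.8  # hypothetical past-future correlation
for i_past in (0.5, 1.0, 2.0, 8.0):
    print(f"I_past = {i_past:.1f} bits -> bound {float(gaussian_ib_bound(i_past, rho)):.3f} bits")
```

The bound starts at zero, rises with Ipast, and saturates at -0.5 log2(1 - rho^2), the total information the past carries about the future. Dividing a measured Ifuture by this curve gives the kind of optimality ratio shown in Fig. 4.3(b).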

More recently, the IB method was also applied in experiments on mismatch negativity (MMN) recorded in the auditory cortex [71]. Rather than constructing correlated stochastic stimuli, that study presented auditory stimuli consisting of just two tones and controlled the probability of switching between them. By scanning over the constraints in the IB method and measuring the neural representation of the prediction error, the authors reached a similar conclusion: the oddball responses are optimal predictors. Furthermore, a conclusion similar to our finding is that while there is a

Figure 4.3: Optimality of predictive information in the retina. (a) Information about the future stimulus (at t = 0) carried by the firing patterns of 5 retinal ganglion cells a time lag t earlier. The solid line shows the statistical limit on predicting this stimulus, estimated by the IB method. (b) Optimality of groups with different numbers of cells N, plotting the information captured about the past, Ipast, against the predictive information, Ifuture, at a fixed t. The solid line, calculated from the IB method, shows the bound on predictive information (I*future) given an encoded past. Note that Ifuture/I*future is close to 1 in the results, indicating optimal performance for prediction. Figure reprinted from [65].

sensory systems. In this study, by contrast, measurements under stimuli with different statistics clearly show that predictive behavior has a limited working range. Similar to the dynamic range of the OSR phenomenon, the retina's optimality in temporal prediction may be maximal when the stimulus parameters lie within a certain range. Interestingly, one may hypothesize, and investigate further, whether these parameters match specific statistics of the environment in which the animal evolved.
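How the switching probability controls predictability in a two-tone paradigm can be sketched, assuming a symmetric Markov chain in which the tone switches with probability p_switch at every step (a simplification; the protocol in [71] may differ). The information the current tone carries about the next one is then 1 - H2(p_switch), which vanishes at p_switch = 0.5 and grows toward either extreme, giving one concrete stimulus parameter along which a working range could be scanned. A simulated sequence checks the closed form; all names are hypothetical.

```python
import numpy as np

def binary_entropy_bits(p):
    """H2(p) in bits, clipped so that p = 0 or 1 gives ~0 instead of NaN."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def next_tone_info(p_switch):
    """I(tone_t; tone_{t+1}) for a symmetric two-tone switching chain.
    The stationary distribution is uniform, so H(tone) = 1 bit and
    I = 1 - H2(p_switch)."""
    return 1.0 - binary_entropy_bits(p_switch)

def simulate_tones(n, p_switch, rng):
    """0/1 tone sequence that switches tone with probability p_switch."""
    switches = rng.random(n - 1) < p_switch
    return np.concatenate([[0], np.cumsum(switches) % 2])

def empirical_next_tone_info(tones):
    """Plug-in estimate of I(tone_t; tone_{t+1}) from a 0/1 sequence."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (tones[:-1], tones[1:]), 1)
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / outer[nz])))

rng = np.random.default_rng(4)
tones = simulate_tones(100_000, 0.1, rng)
empirical = empirical_next_tone_info(tones)
for p in (0.05, 0.1, 0.3, 0.5):
    print(f"p_switch = {p:.2f}: {float(next_tone_info(p)):.3f} bits")
print(f"simulated at p_switch = 0.10: {empirical:.3f} bits")
```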

Recently, an alternative measure called “local active information storage” (LAIS) has been proposed that may also serve as a candidate for characterizing predictive behaviors in nervous systems and in complex systems such as cellular automata [96]. This quantity is calculated from the measurements of each unit (one channel in an MEA or one region in calcium imaging) and evaluated locally in space-time. LAIS values can even be negative when observing the past states decreases the conditional probability of the current state, that is, when the past is misinformative. This measure is therefore closely related to predictive coding in a neural population. It would be ideal to apply this calculation to retinal activity driven by stochastic pulse intervals and compare it with the current results based on predictive information, or even to analyze the information transfer between spatially separated units recorded with the MEA.
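The definition of LAIS can be sketched with a plug-in estimator for one discretized unit, a(t) = log2[p(x_t | x_{t-k}, ..., x_{t-1}) / p(x_t)], assuming binary spike counts per time bin and a hypothetical history length k. In an MEA analysis this would be computed per channel; the toy spike trains are synthetic and the function name is illustrative.

```python
from collections import Counter

import numpy as np

def local_active_info_storage(x, k):
    """Plug-in local active information storage (bits) of a discrete series:
        a(t) = log2[ p(x_t | x_{t-k}, ..., x_{t-1}) / p(x_t) ],
    returned for t = k, ..., len(x) - 1. Negative values mean the observed
    past made the present state less likely (misinformative)."""
    x = list(x)
    samples = [(tuple(x[t - k:t]), x[t]) for t in range(k, len(x))]
    joint = Counter(samples)                    # counts of (past, present)
    past = Counter(h for h, _ in samples)       # counts of past states
    present = Counter(v for _, v in samples)    # counts of present states
    m = len(samples)
    return np.array([
        np.log2((joint[(h, v)] / past[h]) / (present[v] / m))
        for h, v in samples
    ])

pattern = [0, 1] * 50          # strictly alternating toy spike train
lais_regular = local_active_info_storage(pattern, k=1)
glitched = list(pattern)
glitched[50] = 1               # one unexpected spike breaks the rhythm
lais_glitch = local_active_info_storage(glitched, k=1)
print(f"regular mean: {lais_regular.mean():.3f} bits, glitch min: {lais_glitch.min():.3f} bits")
```

In the strictly alternating train every value is close to +1 bit; the single unexpected spike produces strongly negative values at the time bins where the past makes the observed state unlikely.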

In the document: Predictive Behavior and Coding in the Retina (pp. 110-115)