Chapter 1: Introduction
The parameter β, called the inverse temperature, governs the balance between exploration and exploitation, that is, the degree to which action selection is guided by reward values. When β is zero, the agent chooses among actions at random. The Q-learning hypothesis thus explains behavior in terms of not only predictive values but also actions. Learning was postulated to optimize the consequences of actions with respect to a long-term measure of total reward obtained (and/or punishment avoided). This hypothesis resembles the one proposed by instrumental conditioning. The study of instrumental conditioning with a TD learning model that considers both value and action could therefore offer an approach to the fundamental form of rational decision-making.
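The role of β can be illustrated with a minimal sketch. It assumes the softmax (Boltzmann) choice rule that is commonly paired with Q-learning, in which the probability of choosing an action is proportional to exp(β·Q). The function name and the Q-values below are hypothetical and chosen only for illustration.

```python
import math

def softmax_policy(q_values, beta):
    """Return choice probabilities proportional to exp(beta * Q) for each action."""
    # Subtracting the maximum Q-value leaves the probabilities unchanged
    # but avoids overflow when beta * Q is large.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical action values for two actions (illustrative only).
q = [1.0, 0.5]

# beta = 0: every weight is exp(0) = 1, so choice is uniformly random.
print(softmax_policy(q, beta=0.0))

# Larger beta: choice is increasingly dominated by the higher-valued action.
print(softmax_policy(q, beta=5.0))
```

With β = 0 the two actions are chosen with equal probability regardless of their values, whereas a large β makes the choice nearly deterministic in favor of the action with the higher value.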
3. An overview of striatum: anatomy and neural circuits
3.1. Anatomy of striatum. The striatum is the principal input structure of the
basal ganglia that influences motor control and reward-based learning (Chang, Chen, Luo, Shi, & Woodward, 2002; Lauwereyns, Watanabe, & Coe, 2002; Tanaka et al.,
2006). The principal neurons in the striatum are medium spiny neurons (MSNs), which make up over 95% of all striatal neurons. These GABAergic neurons receive two major glutamatergic inputs, one from the cortex and one from the thalamus (Kreitzer & Malenka, 2008;
Lovinger, 2010; Surmeier, Ding, Day, Wang, & Shen, 2007). MSNs also receive dopaminergic inputs from the SN-VTA complex, and the regulation of MSNs by dopamine is important for reward learning (Lee, Seo, & Jung, 2012; Oyama et al., 2010; Schultz, 2006).
MSNs can be further divided into two categories: striatonigral MSNs and striatopallidal MSNs. Striatonigral MSNs express D1-like receptors, group I mGluRs (mGluR1/5), and M1 and M4 muscarinic receptors, whereas striatopallidal MSNs express D2-like receptors, M1 muscarinic receptors, adenosine A2A receptors, and group I mGluRs (mGluR1/5) (Kreitzer & Malenka, 2008). The two subgroups are morphologically indistinguishable and distributed in a mosaic pattern (Gerfen & Young, 1988; Gerfen, 1992; Giménez-Amaya &
Graybiel, 1990). However, recent studies using bacterial artificial chromosome (BAC)-mediated transgenesis in mice have shown differences in basal electrophysiological properties and synaptic plasticity between striatonigral and striatopallidal MSNs (Kreitzer & Malenka, 2007; Shen, Flajolet, Greengard, &
Surmeier, 2008).
In addition, MSNs receive GABAergic synaptic input from local interneurons as well as from other MSNs (Kawaguchi, Wilson, Augood, & Emson, 1995; Kreitzer, 2009).
Striatal interneurons are grouped into four types on the basis of their cytochemical, physiological and morphological properties. The first type, the giant cholinergic interneuron, has a large soma and is the source of acetylcholine (ACh) in the striatum; its axonal field is extensive compared with those of other interneurons. Cholinergic interneurons display a tonic, irregular firing pattern and are characterized by a long-duration afterhyperpolarization, and hence are also called long-duration afterhyperpolarization cells.
The second type is the parvalbumin-containing interneuron, which makes up 3-5% of all striatal neurons and is characterized by a fast-spiking firing pattern in vitro.
The third type is the somatostatin (neuropeptide Y, NOS)-containing interneuron, which represents 1-2% of all striatal neurons and whose dendrites remain relatively unbranched over long distances. Somatostatin-containing interneurons are characterized by Ca2+-dependent low-threshold spikes in vitro. The fourth type is the calretinin-containing interneuron, whose phenotype and physiology have not been well established (Kawaguchi et al., 1995; Kreitzer, 2009;
Lovinger, 2010).
MSN projections form two pathways: the direct pathway and the indirect pathway (Albin, Young, & Penney, 1989; Alexander & Crutcher, 1990; DeLong, 1990). The direct-pathway circuit originates from striatonigral MSNs, which project to GABAergic neurons in the internal globus pallidus (GPi in primates, GPm in rodents) and substantia nigra pars reticulata (SNr); the GPi and SNr in turn send axons to the motor nuclei of the thalamus. The net effect of direct-pathway activity is a disinhibition of excitatory thalamocortical projections, leading to activation of cortical premotor circuits and the facilitation of movement.
The indirect-pathway circuit originates from striatopallidal MSNs, which inhibit neurons in the globus pallidus (GP), which in turn project to glutamatergic neurons in the subthalamic nucleus (STN). Subthalamic neurons send axons to basal ganglia output nuclei (GPi and SNr), where they form excitatory synapses on the inhibitory output neurons. The net effect of indirect-pathway activity is an inhibition of
thalamocortical projection neurons, which would reduce cortical premotor drive and inhibit movement.
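The opposite net effects of the two pathways follow from multiplying the signs of the connections along each route. The sketch below is a simplified illustration only: each connection is reduced to a sign (-1 for inhibitory, +1 for excitatory), and the variable and function names are hypothetical. It also assumes that the projection from the GP to the STN is inhibitory (GABAergic), which the text above does not state explicitly.

```python
from math import prod

# Each connection is reduced to a sign:
# -1 = inhibitory (GABAergic), +1 = excitatory (glutamatergic).
DIRECT_PATHWAY = [
    ("striatonigral MSN -> GPi/SNr", -1),
    ("GPi/SNr -> thalamus", -1),
    ("thalamus -> cortex", +1),
]

INDIRECT_PATHWAY = [
    ("striatopallidal MSN -> GP", -1),
    ("GP -> STN", -1),  # assumption: GP output is GABAergic
    ("STN -> GPi/SNr", +1),
    ("GPi/SNr -> thalamus", -1),
    ("thalamus -> cortex", +1),
]

def net_effect(pathway):
    """Multiply the connection signs to get the net effect of MSN activity on cortex."""
    return prod(sign for _, sign in pathway)

for name, pathway in [("direct", DIRECT_PATHWAY), ("indirect", INDIRECT_PATHWAY)]:
    effect = "facilitates" if net_effect(pathway) > 0 else "inhibits"
    print(f"{name} pathway: net sign {net_effect(pathway):+d}, {effect} movement")
```

Two inhibitory steps in series yield disinhibition in the direct pathway, whereas the indirect pathway contains three inhibitory steps, so its net sign is negative.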
3.2. Cortico-striatal circuits involved in decision making. Traditionally, the
striatum has been divided into dorsal and ventral subregions. The dorsal subregion
contains the dorsolateral striatum (DLS) and dorsomedial striatum (DMS). The ventral subregion contains the nucleus accumbens (NA), which itself consists of core and shell subregions (Alexander, DeLong, & Strick, 1986; Groenewegen, Berendse, Wolters, & Lohman, 1991; Zahm, 2000). The cortical inputs to striatum are
topographically organized, with limbic and ventral prefrontal regions projecting to the ventral striatum, sensorimotor cortical regions projecting to the DLS and association areas of the prefrontal cortex projecting to the DMS (Alexander et al., 1986;
Groenewegen et al., 1991). This pattern of cortico-striatal connectivity has led to the idea that cortico-basal ganglia loops correspond to functional circuits that mediate distinct components of behavior. Studies focusing on the different striatal subregions have, to some extent, supported this view.
1. DMS: Studies using local blockade of NMDA receptors and lesions have shown that the DMS is crucial for the acquisition and expression of goal-directed actions
(Gremel & Costa, 2013; Yin et al., 2005; Yin & Knowlton, 2004, 2006; Yin et al., 2005). However, some researchers found that the DMS may support the flexibility of spatially guided behavior rather than effort- and reward-related decision making (Braun & Hauber, 2011; Ragozzino, Jih, & Tzavos, 2002; Ragozzino, Ragozzino, Mizumori, & Kesner, 2002; Ragozzino, 2007).
2. DLS: Almost all studies have confirmed that the DLS is crucial for habit formation (Gremel
& Costa, 2013; Yin & Knowlton, 2004, 2006).
3. NA: Previous studies demonstrated that the NA plays an important role in the acquisition and reversal of instrumental contingencies (Annett, McGregor, &
Robbins, 1989; Balleine & Killcross, 1994; Taghzouti, Louilot, Herman, Le Moal, & Simon, 1985), whereas others found that NA lesions did not disrupt reversal performance in a go/no-go odor discrimination paradigm (Schoenbaum
& Setlow, 2003) or in a delayed matching task (Burk & Mair, 2001). In sum, findings on the contribution of the NA to reversal learning are
inconsistent. On the other hand, there is evidence that the NA, and in particular its core subregion, participates in behavioral flexibility involving changes in strategies or rules (Floresco, Ghods-Sharifi, Vexelman, & Magyar, 2006;
Haluk & Floresco, 2009). The NA has also been described as having a role in the expression of conditioned emotional responses to cues and contexts associated with appetitive (or aversive) events (Belin, Jonkman, Dickinson, Robbins, &
Everitt, 2009; Day & Carelli, 2007).
Despite these inconsistencies, Shiflett and Balleine summarized the previous findings in rodents and proposed a set of cortico-striatal circuits involved in the decision-making process
(Shiflett & Balleine, 2011). According to the previously defined subregions, there are three pathways:
1. The dorsomedial striatum, also known as the “associative striatum” in primates, receives inputs from association areas of the prefrontal cortex and is implicated in goal-directed behavior (i.e., reward-related actions) in rodents.
2. The dorsolateral striatum, a part of the sensorimotor striatum in primates, is related to habit learning (i.e. stimulus-response bound actions) in rodents.
3. The nucleus accumbens (NA) is implicated in representing predicted future reward, and the representations can be used to guide both goal-directed and habitual actions.
Furthermore, the basal ganglia contain intrinsic feedforward and feedback circuits that may be crucial for striatal function. In particular, bidirectional
connections between the striatum and the midbrain through the SN-VTA complex have been found to link neighboring striatal regions. This spiraling architecture links the NA to the DMS, and the DMS to the DLS (Haber, Fudge, & McFarland, 2000). As
previously mentioned, striatal interneurons may also help connect neighboring striatal subregions. These connections may enable striatal subregions to
work cooperatively to support the transition from goal-directed to habitual behavior, and may allow information about predicted reward (from the NA) to influence action control mediated by the dorsal striatum (Ito & Doya, 2011; Yin, Ostlund, & Balleine, 2008).