Prediction of neural activity in connectome-constrained recurrent networks - Nature Neuroscience

We found that training a connectome-constrained student network to generate the task-related readout of the teacher does not always produce consistent dynamics in the teacher and student. Multiple combinations of single-neuron parameters, each producing different activity patterns, can equivalently solve the same task. However, when connectivity constraints are combined with recordings of the activity of a subset of neurons, this degeneracy is broken. The minimum number of recordings depends on the dimensionality of the network dynamics, not the total number of neurons. This contrasts with student networks whose connectivity is unconstrained, which always display degenerate solutions. Interestingly, even when neural activity is well reconstructed, single-neuron parameters are often not recovered accurately, suggesting that some combinations of parameters are 'stiff', with strong effects on neural dynamics, whereas others are 'sloppy', with weak effects. Our qualitative predictions hold across a variety of simulated networks and networks constrained by true connectomes from invertebrates and vertebrates. Our theory can also rank neurons that should be recorded with higher priority to maximally reduce uncertainty in network activity, suggesting approaches that iteratively refine network models using neural recordings.

To explore how a connectome constrains the solutions of neural network models, we studied a teacher-student paradigm: a recurrent neural network (RNN) that we call the teacher is constructed, and the parameters of a student RNN are adjusted to mimic this teacher. The teacher is used as a proxy for a neural system whose connectome has been mapped and whose output or neural activity can be recorded. To develop our theory, we will begin by examining synthetic teacher networks whose activity and function we specify. Later, we will consider teacher networks derived from empirical connectome data.

Both teacher and student are composed of N firing rate neurons, in which the activity of neuron i is described by a continuous variable $r_i(t)$ (see Methods for details). The activity is a nonlinear function, which we call the activation function, of the input current $x_i(t)$ received by the neuron and depends on a set of single-neuron parameters. For instance, if we describe this function using parameters $g_i$ and $b_i$ for neuron i's gain and bias, its activity is given by

$$r_i(t) = \phi\big(g_i x_i(t) + b_i\big), \tag{1}$$

where $\phi$ is a nonlinear function. The network dynamics follow:

$$\frac{dx_i(t)}{dt} = -x_i(t) + \sum_{j=1}^{N} J_{ij}\, r_j(t) + I_i(t), \tag{2}$$

where $J_{ij}$ is the synaptic weight from neuron j to neuron i, and $I_i(t)$ is the time-varying external input received by neuron i. For connectome-constrained networks, we begin by assuming that both the presence or absence of a connection between neurons as well as the strengths of these connections are known, and, thus, $J_{ij}$ is the same for both teacher and student. Additionally, we assume that the external inputs and initial state $x_i(t = 0)$ are the same for teacher and student (Discussion).
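To make the model concrete, the following minimal sketch integrates equations (1) and (2) with the Euler method. The function name, the choice of a tanh activation and the integration step are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def simulate_rnn(J, g, b, inputs, x0, dt=0.01):
    """Euler integration of equations (1) and (2):
    dx_i/dt = -x_i + sum_j J_ij r_j + I_i(t), with r_i = phi(g_i x_i + b_i).
    phi = tanh is an illustrative stand-in for the activation used in the paper."""
    phi = np.tanh
    T = inputs.shape[0]                          # number of time steps, one input row per step
    x = x0.copy()
    rates = np.zeros((T, J.shape[0]))
    for t in range(T):
        r = phi(g * x + b)                       # single-neuron gains and biases
        x = x + dt * (-x + J @ r + inputs[t])    # recurrent input plus external drive
        rates[t] = r
    return rates
```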

Note that the number of unconstrained parameters in the student network scales differently depending on whether single-neuron parameters or connectivity parameters are fixed. There are N² free synaptic weight parameters if the connectivity is unspecified, as in previous studies of teacher-student paradigms. On the other hand, for connectome-constrained networks, the number of unconstrained parameters is proportional to N. For example, when we parameterize the activation functions of neurons with gains and biases, as in equation (1), there are 2N unknowns.

We first asked whether teacher and student networks that share the same synaptic weight matrix exhibit consistent solutions when the student is trained to reproduce a task performed by the teacher (Fig. 1). Because we are interested in whether connectivity constraints yield mechanistic models of the teacher, we measure the consistency of solutions using the similarity of the activity of neurons in the teacher and those same neurons in the student. Such a direct comparison is possible because the connectome uniquely identifies each individual neuron. We also measure the similarity of teacher and student single-neuron parameters. We refer to the dissimilarity between teacher and student activities or parameters as the 'error' associated with each respective quantity. We note that our notion of similarity between teacher and student is more precise than requiring similarity of collective dynamics as measured through dimensionality reduction methods, such as principal component analysis. Indeed, matching such dynamics can be accomplished by recording a small number of neurons without access to a connectome.

We built a teacher network that performs a flexible sensorimotor task. Specifically, the network implements a variant of the cycling task, which requires the production of oscillatory responses of different durations in response to transient sensory cues (Fig. 1a and Methods). In the network, firing rates are a non-negative smooth function of the input currents, and the unknown single-neuron parameters are the gains and biases (Fig. 1b, left). The synaptic weight matrix is sparse, and neurons are either excitatory or inhibitory (Fig. 1b, right).

We trained multiple students to generate the same readout as the teacher. Each student is initialized with different gains and biases before being trained via gradient descent. Trained networks successfully reproduce the teacher's readout (Fig. 1c,f). However, the error in the neural activity of the student, compared to the teacher, increases over training epochs (Fig. 1d). As a baseline, we computed the error of a student whose neurons match the activities of all neurons in the teacher but with shuffled identities (gray line in Fig. 1d). In this baseline, the manifold of neural activity is the same in teacher and student but not the activity of single neurons. In all networks, the error in activity after training remains above this baseline, indicating that training does not produce a correspondence between the function of individual teacher and student neurons. Examining the activities of individual neurons shows that neuronal dynamics across different student networks are highly variable, and all students differ from the teacher (Fig. 1g).
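A minimal sketch of this training setup is shown below. It is a hypothetical illustration, not the paper's code: the initialization scale, the rectified-linear stand-in for the smooth non-negative activation, and names such as `w_out` and `target_readout` are assumptions.

```python
import torch

def train_student(J, w_out, target_readout, inputs, x0, epochs=1000, lr=1e-2, dt=0.1):
    """Hypothetical sketch: train per-neuron gains and biases of a connectome-constrained
    student (fixed weights J) by gradient descent so that its readout matches the
    teacher's readout."""
    N = J.shape[0]
    g = (1.0 + 0.1 * torch.randn(N)).requires_grad_()   # each student: different random init
    b = (0.1 * torch.randn(N)).requires_grad_()
    opt = torch.optim.SGD([g, b], lr=lr)
    for _ in range(epochs):
        x, loss = x0.clone(), 0.0
        for t in range(inputs.shape[0]):
            r = torch.relu(g * x + b)                    # stand-in for a smooth non-negative activation
            x = x + dt * (-x + J @ r + inputs[t])
            loss = loss + ((w_out @ r - target_readout[t]) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return g.detach(), b.detach()
```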

Finally, we examined the error in single-neuron parameters between teacher and student (Fig. 1e). The error in gains varies little over training and is similar to a randomly shuffled baseline. The error in biases grows slightly but remains within the same order of magnitude as the baseline.

We conclude that knowledge of synaptic weights and task output is not always enough to predict the activity of single neurons in recurrent networks. For the task we considered, there is a degenerate space of solutions with different combinations of single-neuron gains and biases that solve the same task. There may be scenarios for which this degeneracy is reduced, such as small networks optimized for highly specific functions or networks trained on complex or high-dimensional task spaces (Discussion). Nonetheless, our results show that, even with N² connectivity constraints, task-optimized neural dynamics are, in general, highly heterogeneous.

We next asked whether these conclusions change if, instead of recording only task-related readout activity, we record the activity of a subset of neurons in the teacher network. We use M ≤ N to denote the number of recorded neurons. Students are trained to reproduce this recorded activity, which provides additional constraints on the solution space (Fig. 2a,b). The recording of subsampled activity in the teacher is analogous to neural recordings in imaging or electrophysiology studies, where only a subset of neurons is registered. We trained two types of student networks: students that have access to the teacher connectome and students that are not constrained in connectivity. For connectome-constrained students (Fig. 2c,e), single-neuron parameters of both recorded and unrecorded neurons are unknown and, therefore, trained. For students with unconstrained connectivity, synaptic weights are trained instead. In this case, the single-neuron parameters of the student are set equal to those of the teacher so that the networks differ only in synaptic weights (Fig. 2d,f). Additionally, because there is no direct map between unrecorded neurons in the teacher and the student when the connectome is not known, after training we searched for the mapping between student and teacher neurons that minimizes the mismatch in unrecorded activity at each training epoch (Methods).
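Concretely, the training objective in this setting can be written in the following generic form (the exact normalization and loss used in the paper's Methods may differ):

$$\mathcal{L} = \frac{1}{MT}\sum_{t=1}^{T}\sum_{i \in \mathcal{R}} \big( r_i^{\text{student}}(t) - r_i^{\text{teacher}}(t) \big)^2,$$

where $\mathcal{R}$ is the set of M recorded neurons and T is the number of recorded time points.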

We found that both connectome-constrained and unconstrained students are able to mimic the activity of the M recorded teacher neurons with small errors (Fig. 2b; the teacher has N = 300 neurons). We then asked whether this holds for the unrecorded neurons. When the connectivity is provided (Fig. 2c), the error for unrecorded neurons is reduced to values similar to the error for recorded neurons when more than M* = 30 neurons are recorded (example task outputs are shown in Extended Data Fig. 1). In comparison, when training the synaptic weights (Fig. 2d), unrecorded neuron activities are not recovered substantially better than baseline even when most neurons are recorded. Thus, connectome-constrained, but not unconstrained, networks produce consistent solutions when M is large enough.

We then assessed whether the students' parameters converge to those of the teacher. For connectome-unconstrained students, the error in synaptic weights remains high, for connections between both recorded and unrecorded neurons (Fig. 2f). We may expect this to occur given that the activity of unknown neurons in these networks is not well predicted (Fig. 2d). More surprisingly, errors in the single-neuron parameters of connectome-constrained networks also remain high, even when the activity of unrecorded neurons is well predicted (Fig. 2e). We did not find qualitative differences in the behavior of single-neuron parameters for recorded and unrecorded neurons.

Thus far, we focused on a teacher whose neural activity is primarily generated through recurrent interactions, triggered by brief external pulses. We further explored whether similar results hold in networks driven by a time-varying external input (Extended Data Fig. 1). Additionally, we systematically varied the distributions of gains and connection sparsity (Extended Data Fig. 1). In all these networks, the qualitative dependence of the error on M was unchanged. Nevertheless, the error in unrecorded neural activity prior to training is different in networks with strong inputs or weak recurrent connections. Unlike in Fig. 2, where the error prior to training is similar to a baseline with randomly shuffled neuron identities, the error for strongly input-driven networks lies below this baseline even before training. Thus, although certain features of neural activity may be predictable even with random parameters when the input is known, improving upon this initial baseline through training requires sufficiently many recorded neurons.

In summary, connectome-constrained networks are able to predict the activity of unrecorded neurons when further constrained by the activity of enough recorded neurons. By contrast, networks without a connectome constraint do not predict unrecorded activity. Nevertheless, in all cases, the unknown parameters are not precisely recovered, suggesting that multiple sets of biophysical parameters lead to the same neural activity.

What features of a connectome-constrained RNN determine how many recorded neurons are required to predict unrecorded activity? We considered two alternatives: the required number is a fixed fraction of the total number of neurons in the network or the number is determined by properties of the network dynamics. The former alternative would pose a challenge for large connectome datasets.

To disambiguate these two possibilities, we examined a class of teacher networks whose population dynamics are largely independent of their size N. We generated networks with specific rank-two connectivity that autonomously generate a stable limit cycle (Fig. 3a and Methods). In these networks, the currents received by each neuron oscillate within a two-dimensional linear subspace, independent of N (Fig. 3b).
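One way to construct such a rank-two connectivity is sketched below. The parameters `rho` and `omega` and the construction itself are illustrative assumptions rather than the paper's recipe; with a saturating activation and `rho` above one, structure of this kind can support a stable limit cycle.

```python
import numpy as np

def rank_two_connectivity(N, rho=1.2, omega=0.5, seed=0):
    """Build a rank-two weight matrix J = (1/N) m n^T whose two-dimensional latent
    dynamics have a rotational (oscillatory) component."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal((N, 2))                      # output (loading) vectors
    R = rho * np.array([[np.cos(omega), -np.sin(omega)],
                        [np.sin(omega),  np.cos(omega)]])
    n = m @ R.T                                          # makes (1/N) n^T m approximately equal to R
    return (m @ n.T) / N
```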

Plotting the error in unrecorded activity against the number of recorded neurons M revealed no difference across networks of different sizes (Fig. 3c), suggesting that accurate predictions can be made when recording from few neurons, even in large networks. Examining more closely the dependence of the error on M, we observed that, when M = 1, the student produces oscillatory activity with the same frequency as the teacher, but the activity of unrecorded neurons exhibits consistent errors at particular phases of the oscillation (Extended Data Fig. 2). By contrast, when M = 7, errors in recorded and unrecorded neurons are similarly small.

This led us to hypothesize that the number of recorded neurons required to accurately predict neural activity scales with the dimensionality of the neural dynamics, not the network size. This would explain why networks with widely varying sizes but similar two-dimensional dynamics exhibit similar performance (Fig. 3c). To further test this hypothesis, we studied a setting in which we trained students to mimic another class of teacher networks: strongly coupled random networks (Fig. 3d). In such networks, activity is chaotic, and, unlike in low-rank networks (Fig. 3b), the linear dimensionality of the dynamics grows in proportion to N, a dependence that we verified for the time windows we considered (Fig. 3e). In this case, the required number of recorded neurons also grows proportionally with N (Fig. 3f). Together, these results suggest that recording from a subset of neurons, on the order of the dimensionality of network activity, is sufficient to predict unrecorded neural activity. Later, we will show that this numerical result is consistent with the predictions of an analytical theory.
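For reference, one standard way to quantify linear dimensionality from simulated activity is the participation ratio of the activity covariance; whether this matches the exact measure used in Fig. 3e is an assumption.

```python
import numpy as np

def participation_ratio(rates):
    """Common linear-dimensionality measure: PR = (sum_i lam_i)^2 / sum_i lam_i^2, where
    lam_i are the eigenvalues of the activity covariance over the analysis window."""
    cov = np.cov(rates, rowvar=False)                # rates: time x neurons
    lam = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()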

Thus far, we have considered teacher and student networks that belong to the same model class of firing rate networks with parameterized activation functions and connectivity. However, models based on experimental data will possess some degree of 'model mismatch' due to unaccounted or incorrectly parameterized biophysical processes. Moreover, errors in synaptic reconstruction and inter-individual variability in connectomes imply that synaptic weight estimates may also be imprecise. In this section, we examine whether our qualitative results hold when teacher and student exhibit model mismatch.

We used the same teacher as in Figs. 1 and 2. To study the case of mismatch in activation function (Fig. 4a), we parameterized the activation function with β, which controls the smoothness of the rectification, and used different values of β for student and teacher (Fig. 4b). Larger mismatch increases the error in both recorded and unrecorded activity (Fig. 4c). The effect is strongest in an extreme case of very small student β, for which very little rectification occurs. This makes it difficult for the student to match even the recorded activity of the teacher (Fig. 4c and Extended Data Fig. 3). Nevertheless, up to a considerable mismatch, there is a steep decrease in the error in unrecorded activity as more neurons are recorded.
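One common smooth-rectification family consistent with this description is a softplus whose parameter β sets the sharpness of rectification; this particular form is an assumption, not necessarily the paper's exact parameterization.

```python
import numpy as np

def soft_rectification(x, beta=5.0):
    """Softplus-style activation: large beta approaches a hard rectification (ReLU),
    whereas small beta produces very little rectification."""
    return np.logaddexp(0.0, beta * x) / beta   # numerically stable log(1 + exp(beta*x)) / beta
```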

Can model mismatch arising from single-neuron properties be compensated by allowing the synaptic weights to be trained, which introduces additional free parameters? We examined a student with activation function mismatch and a synaptic weight matrix that was initialized equal to that of the teacher but then trained (Extended Data Fig. 3). This performed worse than training single-neuron parameters, arguing against the feasibility of this approach. An alternative approach is to increase the number of single-neuron parameters. For instance, when β is trained together with gains and biases, the error in unrecorded activity is similar to the case without mismatch (Extended Data Fig. 3). We conclude that parameterizing uncertainty in activation function is important for dealing with this form of model mismatch.

We next considered mismatch between teacher and student connectomes. To simulate such errors, we added Gaussian noise to the strengths of existing connections and added spurious connections with probability σ (Fig. 4d and Methods). The resulting corrupted synaptic weight matrix was used by the student. Noise in the synaptic weight matrix shifts its eigenvalues (Fig. 4e) and modifies the corresponding eigenvectors. Trained students exhibit smooth increases of the error in recorded and unrecorded activity as this noise is increased (Fig. 4f). However, we again found a steep decrease of the error in unrecorded neural activity with M, suggesting that this qualitative behavior is not overly sensitive to connectome reconstruction errors.
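The corruption procedure can be sketched as follows; the parameter names and the scaling of the added noise are illustrative assumptions rather than the exact scheme in Methods.

```python
import numpy as np

def corrupt_connectome(J, noise_std=0.1, spurious_prob=0.05, seed=0):
    """Add Gaussian noise to existing connection strengths and insert spurious
    connections with a given probability, mimicking reconstruction errors."""
    rng = np.random.default_rng(seed)
    J_noisy = J.copy()
    existing = J != 0
    J_noisy[existing] += noise_std * rng.standard_normal(existing.sum())
    # spurious connections appear where none existed, with probability spurious_prob
    spurious = (~existing) & (rng.random(J.shape) < spurious_prob)
    J_noisy[spurious] = noise_std * rng.standard_normal(spurious.sum())
    return J_noisy
```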

Thus far, we have examined synthetic teachers, whose connectivity statistics and functional properties may differ from those of biological networks. We next study teachers whose synaptic weights are directly determined by empirical connectome datasets. We modeled three neural circuits for which a ground truth connectome is available and whose function has been characterized: the premotor-motor system in the ventral nerve cord of larval Drosophila, the heading direction system in the central complex of adult Drosophila and the oculomotor neural integrator in the hindbrain of larval zebrafish.

When larval Drosophila are engaged in forward or backward locomotion, recurrently connected premotor neurons in the ventral nerve cord drive motor neurons to produce appropriately timed muscle activity (Fig. 5a, left). Motor neurons in each body segment are segregated into functional groups whose sequences of activation differ across the two behaviors (Fig. 5a, right). A previous study showed that a connectome-constrained RNN recapitulates features of motor and premotor neuron activity when trained to produce such sequences in the A1 and A2 body segments. We used such a model as a connectome-constrained teacher, whose 178 premotor neurons produce appropriately timed activity in 52 motor neurons (Methods). Student networks comprising the premotor circuitry were then trained to approximate recorded teacher activity. We found that the error in unrecorded activity is reduced when approximately 10 neurons are recorded (Fig. 5b). When few neurons are recorded, the error in activity is similar to a network with randomly chosen single-neuron parameters (two recorded neurons; Fig. 5c, left). Recording from more neurons dramatically improves the prediction (Fig. 5c, right), which is qualitatively similar to the results of the synthetic teacher network (Fig. 2c).

Next, we studied the heading direction system in the central complex of adult Drosophila. This system has been the subject of numerous recent theoretical analyses, most of which examined models with idealized connectivity rather than directly incorporating connectome data. We modeled a circuit reconstructed in the hemibrain dataset comprising 153 neurons grouped into four cell types: the putatively excitatory EPG, PEN and PEG neurons and the putatively inhibitory Δ7 neurons (Fig. 5d). The 46 EPG neurons encode heading orientation and are arranged along a ring in the ellipsoid body based on their angular tuning. Recurrent connections among EPG neurons and other cell types form a stable 'bump' of neural activity representing heading angle, consistent with 'ring attractor' dynamical models. We, therefore, constructed a teacher network in which EPG neurons maintained a bump representing a heading encoded by a brief stimulus (Fig. 5d and Methods). Student networks without access to recordings generated neural activity different from the teacher (Fig. 5e, black dots). In particular, these students did not behave as ring attractors, demonstrating that the central complex connectivity alone does not guarantee stable attractor dynamics (Fig. 5f). However, recording from a handful of neurons was enough to place the system in the correct dynamical regime and accurately predict the activity of unrecorded neurons (Fig. 5f).

Finally, we studied the oculomotor integrator in the hindbrain of larval zebrafish. This system persistently tracks eye position by integrating eye motor commands. The integration is supported by strong recurrent connections that produce a 'line attractor' in neural activity space. Such dynamics were previously modeled with a connectome-constrained linear RNN (Fig. 5g and Methods). We used this network as the teacher and then trained the gains of student networks with the same synaptic weights. Although a random initialization of gain parameters did not produce the slow timescale necessary for accurate integration, recording from a few neurons substantially reduced the error in activity (Fig. 5h). This is consistent with the results of Vishwanathan et al., who adjusted a global gain parameter to produce a slow timescale.

The weight matrices of empirical connectomes and synthetic teacher-student networks (Figs. 2 and 3) may exhibit statistical differences due to the level of sparsity, heterogeneity in the number and strength of synaptic connections and other higher-order structure. However, in each of these examples, the qualitative phenomena present in synthetic teacher-student networks are recapitulated. Recording from a number of neurons determined by the dimensionality of the teacher activity (Extended Data Fig. 4) -- a handful for the one-dimensional line attractor or two-dimensional ring attractor dynamics and approximately 10 for more complex sequential activity -- produces consistent dynamics between teacher and student.

We developed an analytic theory of our connectome-constrained teacher-student paradigm. The theory aims to explain, first, how the teacher and student produce the same activity despite different single-neuron parameters and, second, the conditions under which the student's activity converges to that of the teacher.

We begin with a simplified linear model and later relax our assumptions: the teacher and student RNNs have linear single-neuron activation functions; the only unknown single-neuron parameters are the biases $b_i$; and the synaptic weight matrix J has rank D (Fig. 6a). This rank constraint implies that recurrent neural activity is confined to a D-dimensional subspace of the N-dimensional neural activity space. We focus on the network's steady-state activity at equilibrium, which depends linearly on the biases: setting $dx_i/dt = 0$ in equation (2), with linear activation $r_i = x_i + b_i$, yields

$$\mathbf{x}^{*} = A\,\mathbf{b},$$

where we have defined $A = (\mathbb{1} - J)^{-1} J$.

Although we focus here on equilibrium activity, time-dependent trajectories also yield a linear relation between activity and single-neuron parameters (see Methods for the time-dependent derivation). For the same reason, we also assume no external input to each neuron ($I_i(t) = 0$). This linear relation between single-neuron parameters and activity, which underpins the mathematical tractability of the simplified model, is a consequence of the linear network dynamics and the additive influence of the bias parameters. Choosing multiplicative gains as the unknown single-neuron parameters, for instance, would produce a nonlinear relation.
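The steady-state relation can be checked numerically by integrating the linear dynamics for a random rank-D weight matrix. The bias parameterization r = x + b and the scalings below are illustrative assumptions consistent with the equations above.

```python
import numpy as np

def steady_state_check(N=200, D=5, seed=0):
    """Verify that the equilibrium of dx/dt = -x + J(x + b) satisfies x* = A b,
    with A = (I - J)^(-1) J, for a random rank-D connectivity."""
    rng = np.random.default_rng(seed)
    U, V = rng.standard_normal((N, D)), rng.standard_normal((N, D))
    J = U @ V.T / N
    J *= 0.5 / np.max(np.abs(np.linalg.eigvals(J)))   # keep the linear dynamics stable
    b = rng.standard_normal(N)
    x, dt = np.zeros(N), 0.05
    for _ in range(20000):                            # integrate to equilibrium
        x = x + dt * (-x + J @ (x + b))
    A = np.linalg.solve(np.eye(N) - J, J)             # A = (I - J)^(-1) J
    return np.allclose(x, A @ b, atol=1e-6)
```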

The student is trained using gradient descent updates to the single-neuron parameters. In the limit of small learning rate η, the learning trajectory in parameter space can be expressed in continuous time (with time t proportional to training epoch) as:

$$\frac{d\hat{\mathbf{b}}}{dt} = -A_M^{\top} A_M \big(\hat{\mathbf{b}} - \mathbf{b}\big),$$

where $\hat{\mathbf{b}}$ denotes the student biases, $\mathbf{b}$ the teacher biases, $A_M$ the submatrix of A containing the rows corresponding to recorded neurons, and M is the number of recorded neurons. Using these learning dynamics, we can analytically calculate the expected error in recorded and unrecorded activity and in single-neuron parameters (Fig. 6c and Methods). This reveals a transition to zero error in the activity of unrecorded neurons when M = D, the rank of the synaptic weight matrix (Fig. 6c, gray line). There are, however, large errors in single-neuron parameters (Fig. 6c, red line) even when the activity of the full network is accurately recovered.

To understand these results, we analyzed the properties of the loss function, which describes how the difference in activity between teacher and student depends on single-neuron parameters. We differentiate the loss function for the full network, which is determined by errors in both recorded and unrecorded neural activity, from the loss function for the recorded neurons, which is the function optimized during training. These loss functions are convex, as illustrated in Fig. 6d. The minima are surrounded by a valley-shaped region of low loss (Fig. 6d, right). We refer to directions for which the loss changes quickly or slowly as 'stiff' or 'sloppy' parameter modes, respectively. Stiff modes both have the greatest effect on the loss and are learned most quickly. Each mode's degree of stiffness is determined by the corresponding singular value of the matrix A (Methods). A mode is infinitely sloppy when its associated singular value is zero, implying that parameter differences between teacher and student along that mode produce no differences in neural dynamics.
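This picture follows from a standard linear least-squares argument under the gradient-flow dynamics above (stated here in the notation introduced above, for whichever matrix defines the loss, A for the full network or $A_M$ for the recorded neurons): writing the singular value decomposition $A_M = \sum_k \sigma_k \mathbf{u}_k \mathbf{v}_k^{\top}$, the parameter error along the k-th right singular vector evolves as

$$\mathbf{v}_k^{\top}\big(\hat{\mathbf{b}}(t) - \mathbf{b}\big) = e^{-\sigma_k^{2} t}\, \mathbf{v}_k^{\top}\big(\hat{\mathbf{b}}(0) - \mathbf{b}\big),$$

so stiff modes (large $\sigma_k$) are learned quickly, while modes with $\sigma_k = 0$ are never learned.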

The parameter modes that affect the recorded activities and, thus, the loss function for the M recorded neurons are determined by $A_M$ (the submatrix of A containing the rows corresponding to these neurons), whose stiff and sloppy modes are generally different from those of the fully sampled matrix A (Fig. 6e versus Fig. 6f). Recording from a subset of neurons introduces additional modes with zero singular value when M < D, because $A_M$ has, at most, M non-zero singular values. The stiff modes of the loss function of the recorded activity will also, typically, not be fully aligned with those of the fully sampled system (Fig. 6f, inset), leading to errors in prediction.

To illustrate these results, we plotted the error in single-neuron activity and biases for M below and above the critical number D (Fig. 6g,h). When recording from few neurons, the error in these parameters for unrecorded neurons remains high (Fig. 6g, left). The error in biases quickly converges to a small value along the stiffest mode, whereas it barely changes for sloppy modes (Fig. 6g, right). The stiffest mode of the subsampled network is not completely aligned with the stiffest mode of the fully sampled network, explaining why it converges to a small but non-zero value. Only when more neurons are recorded does the error in unrecorded activity, and along the stiffest parameter mode, converge to zero (Fig. 6h, left).

The simplified model demonstrates that specific patterns of single-neuron parameters determine the error between teacher and student. Stiff parameter modes are learned, whereas sloppy modes are not. The number of stiff parameter modes is bounded by the rank of J and, thus, the dimensionality of neural activity. Recording from increasingly many neurons provides increasingly many constraints on this activity. When enough neurons are sampled, the stiff modes for the M recorded neurons align with the stiff modes for the full network, leading to correct prediction of unrecorded activity. These conclusions do not rely on the choice of gradient descent as a learning algorithm, as an analysis using linear system identification methods yields the same conclusions (Supplementary Note). We also note that we have assumed here that the loss function is determined by the difference in recorded neural activity. However, similar conclusions would be reached if it were determined by other linear projections of activity, such as projections onto task-related dimensions (Discussion).

We next generalized our theory to nonlinear networks. To facilitate analysis, we studied a class of low-rank RNNs whose activity can be understood analytically. We focused on a teacher network with N = 1,000 neurons and a nonlinear, bounded activation function. Each neuron is parameterized only by its gain. We designed the network's synaptic weight matrix to be rank-two, with two different subpopulations. For this network, there are only two stiff parameter modes: the average single-neuron gain for each subpopulation (Fig. 7a). We set the weights of the teacher to generate two different pairs of non-trivial fixed points, and we recorded activity as the neural dynamics approached one of these fixed points (Fig. 7b).

Because the parameter space is two dimensional, we can visualize the loss function for the full network across a grid of parameters (Fig. 7c). The function has a single minimum, similar to the linear model. However, due to the nonlinearity, the function is non-convex (contour lines are not convex in Fig. 7c), and the curvature for parameter values away from the global minimum is different than at the minimum. Despite this non-convexity, gradient descent on this fully sampled loss function will still approach the single minimum.

We next visualized the loss function for the activity of one recorded neuron (Fig. 7d). We repeated this for two different choices of the single recorded neuron, each of which exhibited distinct dynamics (black lines in Fig. 7b). For these loss functions, there is an additional sloppy mode that is not present in the fully sampled loss (black valleys in Fig. 7d). These results are similar to those of the linear case, although due to the nonlinearity, the sloppy modes correspond to curved regions in parameter space.

The sloppy mode is different for each of the two recorded neurons. When running gradient descent on these subsampled loss functions, randomly initialized parameter values will evolve toward the dark regions of Fig. 7d -- for example, toward the blue dot when neuron 1 is sampled or toward the red dot when neuron 2 is sampled. However, both of these solutions produce high error in unrecorded activity (Fig. 7e,f). This mismatch in unrecorded activity occurs because recording from a single neuron constrains activity along only one dimension of the two-dimensional activity space defined by the rank-two synaptic weight matrix (Fig. 6e).

To test whether the same insights also apply to nonlinear networks with high-dimensional parameter spaces, we computed the stiff and sloppy modes of the fully sampled loss function in the network of Figs. 1 and 2. We approximated the loss function in parameter space to second order at the optimum. We then projected the average error in parameter space, before and after training, along the estimated stiff and sloppy modes (Fig. 7g,h). When few neurons are recorded, the average changes in parameter space before and after training are not aligned with the stiff modes of the loss function. However, when recording from many neurons, there is a large decrease in error along the estimated stiff modes while the error along sloppy modes barely changes, as predicted by our theory. Thus, a second-order approximation of the non-convex loss function qualitatively describes the behavior of gradient descent. Some other effects of non-convexity, however, cannot be explained by the linear theory -- for instance, the growth in errors in the bias parameters over the course of training (Figs. 1e and 2e).
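A minimal sketch of this projection analysis is given below. It assumes access to the Jacobian of the full network's activity with respect to the single-neuron parameters at the optimum; all names are illustrative, and the Gauss-Newton approximation stands in for the second-order expansion described above.

```python
import numpy as np

def error_along_modes(jac, theta_before, theta_after, theta_teacher):
    """Project parameter errors onto estimated stiff and sloppy modes. The right singular
    vectors of the activity-parameter Jacobian, ordered by singular value, approximate the
    stiff-to-sloppy eigendirections of the second-order (Gauss-Newton) loss expansion."""
    _, s, Vt = np.linalg.svd(jac, full_matrices=False)
    before = Vt @ (theta_before - theta_teacher)   # parameter error per mode, pre-training
    after = Vt @ (theta_after - theta_teacher)     # parameter error per mode, post-training
    return s, np.abs(before), np.abs(after)
```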

We conclude that the qualitative behavior of the linear model holds for the nonlinear networks studied in previous sections. Specifically, when the loss function is determined by recordings of a small number of neurons, the parameter modes become sloppier on average, and new sloppy parameter modes are added that do not align with those of the fully sampled loss function.

Thus far, recorded neurons have been selected randomly from the teacher network. As we have seen, different sets of recorded neurons define different loss functions and gradient descent dynamics, suggesting the possibility of selecting recorded neurons to minimize the expected error in unrecorded activity (Fig. 8a). Specifically, we aim to select recorded neurons to maximize the alignment of stiff modes of the subsampled loss and those of the fully sampled loss function.

In the simplified linear model, subsampling neurons corresponds to selecting rows of the matrix A that relates single-neuron parameters to activity (Fig. 8b). In this case, it is possible to exactly determine which neurons are most informative to record. The most informative neuron i is the one whose corresponding row of A overlaps most with the weighted left singular vectors of A (Methods). The second most informative neuron is the one whose row overlaps most with the weighted left singular vectors of A projected onto the space orthogonal to the previously selected neuron's row, and so on. It is also possible to define the least informative sequence of recorded neurons by minimizing rather than maximizing these overlaps. We compared the error in unrecorded activity for the most and least informative sequences of selected neurons, as well as for random selection, finding that the optimal strategy indeed improves the efficiency of training (Fig. 8d).
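A greedy implementation of this idea might look as follows. The exact overlap and weighting used in Methods are assumptions here, so this sketch illustrates the structure of the procedure (score candidates, orthogonalize against already selected rows, repeat) rather than reproducing it.

```python
import numpy as np

def rank_neurons_greedy(A, num_to_select):
    """Greedily rank neurons by how well their rows of A cover the singular-value-weighted
    directions of A that are not yet spanned by previously selected rows."""
    N, P = A.shape
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    W = s[:, None] * Vt                      # weighted directions in parameter space
    selected = []
    Q = np.zeros((P, 0))                     # orthonormal basis of already selected rows
    for _ in range(num_to_select):
        best, best_score = None, -np.inf
        for i in range(N):
            if i in selected:
                continue
            a = A[i] - Q @ (Q.T @ A[i])      # component not yet covered
            norm = np.linalg.norm(a)
            if norm < 1e-12:
                continue
            score = np.linalg.norm(W @ (a / norm))
            if score > best_score:
                best, best_score = i, score
        if best is None:                     # all remaining rows already covered
            break
        selected.append(best)
        q = A[best] - Q @ (Q.T @ A[best])
        Q = np.column_stack([Q, q / np.linalg.norm(q)])
    return selected
```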

For nonlinear networks, the mapping between parameters and network activity is also nonlinear and depends on the unknown parameters of the teacher (Methods). As a result, the globally optimal sequence cannot be determined a priori. Nevertheless, the mapping between parameters and activity can be linearized based on an initial guess of the single-neuron parameters and then iteratively refined. In practice, we found that linearization works well for nonlinear networks, with the optimal selection strategy dramatically reducing the error compared to random selection, especially when there are few recorded neurons. For the network studied in Fig. 2, the error when recording the 10 neurons predicted to be most informative is 60% smaller than with random selection (Fig. 8g).

The singular vectors used to determine which neurons are most informative depend on the global connectivity structure and cannot be exactly reduced to any single-neuron property. Such properties, including in-degree, out-degree, average synaptic strength or neuron firing rate, may be correlated with the singular value decomposition score developed here but are not guaranteed to be good proxies for informativeness. This argues for the use of models like those studied here to guide the selection of recorded neurons.
