Excellent Substack writeup by Patrick Mineault on how cell types may specify innate behaviors and why mapping regions of the brain specialized to steer innate behaviors (via lots of distinct cell types) could lead us to more aligned AI systems. Highly convincing and elegant arguments made here! [https://substack.com/home/post/p-189321289](https://substack.com/home/post/p-189321289)
Dwarkesh seemed very confused by this, asking a few different times: “Why would each reward function need a different cell type?” I empathize with Dwarkesh here! It is mysterious that a cell type could represent something as abstract as a reward. As a computational neuroscientist who mostly worked at the representation level during my PhD, I’ve historically leaned towards thinking of cell types as a mere “implementation detail”. But over conversations with Adam, Steve Byrnes, Paul Cisek, Tony Zador, and a few others, I’ve become convinced that cell types are a really useful lens for thinking about innate behaviors and rewards.
In this essay, I’ll unpack the conversation and answer the question: what do cell types have to do with reward functions? To answer it, we’ll need to understand what kind of information can be encoded in the genome, and how that information ultimately relates to connectomes and to cell types. I’ll connect the answer to Adam’s central claim: that these connections matter for AI, and for AI safety in particular.
Andrew Barto and colleagues make the point that all primary rewards are internal, and must be genetically encoded. In reinforcement learning, which Barto co-developed along with Rich Sutton, an agent learns by receiving reward signals that indicate what is good and bad. The critical insight is that for biological organisms, all of these reward signals are internal: they are generated by the organism’s own nervous system. It is not a chunk of steak that gives reward: it is circuitry inside the brain that assigns positive valence to fat, salt, umami, heat, and texture. Secondary rewards, like money, must be bootstrapped off of the pre-existing primary rewards.
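
To make the primary versus secondary distinction concrete, here is a minimal toy sketch in Python. It is not a model from Barto or from this essay; it just uses textbook TD(0) value learning under made-up assumptions. The feature weights in `INNATE_WEIGHTS`, the two-state "money then steak" world, and all function names are hypothetical and chosen only for illustration. The point it shows: the reward is computed inside the agent from sensory features, money carries zero primary reward, and money still acquires value because it reliably predicts steak.

```python
# Hypothetical innate weights: the valence a "genome" assigns to sensory features.
INNATE_WEIGHTS = {"fat": 1.0, "salt": 0.5, "umami": 0.8}


def primary_reward(features):
    """Internal reward: computed by the agent from its own sensory features."""
    return sum(INNATE_WEIGHTS.get(name, 0.0) * level
               for name, level in features.items())


# Toy world (an assumption): holding money leads to eating steak, then the episode ends.
OBSERVATIONS = {
    "money": {"paper": 1.0},  # no innately valued features
    "steak": {"fat": 1.0, "salt": 0.5, "umami": 1.0},
}
NEXT_STATE = {"money": "steak", "steak": None}


def learn_values(episodes=200, alpha=0.1, gamma=0.9):
    """TD(0): learn how much future primary reward each state predicts."""
    values = {state: 0.0 for state in OBSERVATIONS}
    for _ in range(episodes):
        state = "money"
        while state is not None:
            nxt = NEXT_STATE[state]
            reward = primary_reward(OBSERVATIONS[state])
            next_value = values[nxt] if nxt is not None else 0.0
            # Move the estimate toward reward plus discounted value of the next state.
            td_error = reward + gamma * next_value - values[state]
            values[state] += alpha * td_error
            state = nxt
    return values


if __name__ == "__main__":
    print("primary reward of money:", primary_reward(OBSERVATIONS["money"]))
    print("learned values:", learn_values())
```

In this sketch the learned value of money ends up positive even though its primary reward is zero, which is one simple sense in which a secondary reward can be bootstrapped off a primary one.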










