Transcript
00:00:00 The following is a conversation with Manolis Kellis, his third time on the podcast.
00:00:05 He is a professor at MIT and head of the MIT Computational Biology Group.
00:00:11 This time we went deep on the science, biology, and genetics.
00:00:17 So this is a bit of an experiment.
00:00:19 Manolis went back and forth between the basics of biology to the latest state of the art
00:00:25 in the research.
00:00:26 He’s a master at this, so I just sat back and enjoyed the ride.
00:00:31 This conversation happened at 7am, so it’s yet another podcast episode after an all nighter
00:00:37 for me.
00:00:38 And once again, since the universe has a sense of humor, this one was a tough one for my
00:00:44 brain to keep up, but I did my best and I never shy away from a good challenge.
00:00:50 Quick mention of each sponsor, followed by some thoughts related to the episode.
00:00:55 First is SEMrush, the most advanced SEO optimization tool I’ve ever come across.
00:01:02 I don’t like looking at numbers, but someone probably should, it helps you make good decisions.
00:01:08 Second is Pessimist Archive, they’re back, one of my favorite history podcasts on why
00:01:13 people resist new things from recorded music to umbrellas to cars, chess, coffee, and the
00:01:20 elevator.
00:01:22 Third is 8sleep, a mattress that cools itself, measures heart rate variability, has an app,
00:01:28 and has given me yet another reason to look forward to sleep, including the all important
00:01:33 power nap.
00:01:34 And finally, BetterHelp, online therapy when you want to face your demons with a licensed
00:01:40 professional, not just by doing the David Goggins like physical challenges like I seem
00:01:45 to do on occasion.
00:01:47 Please check out these sponsors in the description to get a discount and to support this podcast.
00:01:54 As a side note, let me say that biology in the brain and in the various systems of the
00:01:59 body fill me with awe every time I think about how such a chaotic mess coming from its humble
00:02:05 origins in the ocean was able to achieve such incredibly complex and robust mechanisms of
00:02:11 life that survived despite all the forces of nature that want to destroy it.
00:02:17 It is so unlike the computing systems we humans have engineered that it makes me feel that
00:02:22 in order to create artificial general intelligence and artificial consciousness, we may have
00:02:28 to completely rethink how we engineer computational systems.
00:02:33 If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcast, follow
00:02:38 on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
00:02:44 And now, here’s my conversation with Manolis Kellis.
00:02:49 So your group at MIT is trying to understand the molecular basis of human disease.
00:02:54 What are some of the biggest challenges in your view?
00:02:57 Don’t get me started.
00:02:58 I mean, understanding human disease is the most complex challenge in modern science.
00:03:06 So because human disease is as complex as the human genome, it is as complex as the
00:03:13 human brain, and it is in many ways, even more complex because the more we understand
00:03:20 disease complexity, the more we start understanding genome complexity and epigenome complexity
00:03:27 and brain circuitry complexity and immune system complexity and cancer complexity and
00:03:31 so on and so forth.
00:03:32 So traditionally, human disease was following basic biology.
00:03:39 You would basically understand basic biology in model organisms like, you know, mouse and
00:03:44 fly and yeast.
00:03:46 You would understand sort of mammalian biology and animal biology and eukaryotic biology
00:03:53 in sort of progressive layers of complexity, getting closer to human phylogenetically.
00:03:59 And you would do perturbation experiments in those species to see if I knock out a gene,
00:04:06 what happens?
00:04:07 And based on the knocking out of these genes, you would basically then have a way to drive
00:04:12 human biology because you would sort of understand the functions of these genes.
00:04:16 And then if you find that a human gene locus, something that you’ve mapped from human genetics
00:04:23 to that gene is related to a particular human disease, you’d say, aha, now I know the function
00:04:28 of the gene from the model organisms.
00:04:31 I can now go and understand the function of that gene in human.
00:04:37 But this is all changing.
00:04:38 This has dramatically changed.
00:04:39 So that was the old way of doing basic biology.
00:04:41 You would start with the animal models, the eukaryotic models, the mammalian models, and
00:04:46 then you would go to human.
00:04:48 Human genetics has been so transformed in the last decade or two that human genetics
00:04:55 is now actually driving the basic biology.
00:04:58 There is more genetic mutation information in the human genome than there will ever be
00:05:04 in any other species.
00:05:06 What do you mean by mutation information?
00:05:08 So perturbations is how you understand systems.
00:05:11 So an engineer builds systems and then they know how they work from the inside out.
00:05:16 A scientist studies systems through perturbations.
00:05:20 You basically say, if I poke that balloon, what’s going to happen?
00:05:23 And I’m going to film it in super high resolution, understand, I don’t know, aerodynamics or
00:05:26 fluid dynamics if it’s filled with water, et cetera.
00:05:28 So you can then make experimentation by perturbation and then the scientific process is sort of
00:05:33 building models that best fit the data, designing new experiments that best test your models
00:05:41 and challenge your models and so on and so forth.
00:05:43 This is the same thing with science.
00:05:44 Basically if you’re trying to understand biological science, you basically want to do perturbations
00:05:49 that then drive the models.
00:05:54 So how do these perturbations allow you to understand disease?
00:05:58 So if you know that a gene is related to disease, you don’t want to just know that it’s related
00:06:04 to the disease.
00:06:05 You want to know what is the disease mechanism because you want to go and intervene.
00:06:09 So the way that I like to describe it is that traditionally epidemiology, which is basically
00:06:17 the study of disease, you know, sort of the observational study of disease has been about
00:06:23 correlating one thing with another thing.
00:06:25 So if you have a lot of people with liver disease who are also alcoholics, you might
00:06:29 say, well, maybe the alcoholism is driving the liver disease or maybe those who have
00:06:34 liver disease self medicate with alcohol.
00:06:36 So the connection could be either way.
00:06:40 With genetic epidemiology, it’s about correlating changes in genome with phenotypic differences
00:06:47 and then you know the direction of causality.
00:06:50 So if you know that a particular gene is related to the disease, you can basically say, okay,
00:06:58 perturbing that gene in mouse causes the mice to have X phenotype.
00:07:03 So perturbing that gene in human causes the humans to have the disease.
00:07:08 So I can now figure out what are the detailed molecular phenotypes in the human that are
00:07:14 related to that organismal phenotype in the disease.
00:07:18 So it’s all about understanding disease mechanism, understanding what are the pathways, what
00:07:22 are the tissues, what are the processes that are associated with the disease so that we
00:07:27 know how to intervene.
00:07:29 You can then prescribe particular medications that also alter these processes.
00:07:33 You can prescribe lifestyle changes that also affect these processes and so on and so forth.
00:07:37 That’s such a beautiful puzzle to try to solve.
00:07:41 Like what kind of perturbations eventually have this ripple effect that leads to disease
00:07:45 across the population.
00:07:46 And then you study that for animals or mice first and then see how that might possibly
00:07:51 connect to humans.
00:07:54 How hard is that puzzle of trying to figure out how little perturbations might lead to,
00:08:01 in a stable way, to a disease?
00:08:04 In animals, we make the puzzle simpler because we perturb one gene at a time.
00:08:11 That’s the beauty of this, the power of animal models.
00:08:13 You can basically decouple the perturbations.
00:08:15 You only do one perturbation and you only do strong perturbations at a time.
00:08:21 In human, the puzzle is incredibly complex because obviously you don’t do human experimentation.
00:08:28 You wait for natural selection and natural genetic variation to basically do its own
00:08:34 experiments, which it has been doing for hundreds and thousands of years in the human population
00:08:40 and for hundreds of thousands of years across the history leading to the human population.
00:08:49 So you basically take this natural genetic variation that we all carry within us.
00:08:54 Every one of us carries 6 million perturbations.
00:08:58 So I’ve done 6 million experiments on you, 6 million experiments on me, 6 million experiments
00:09:02 on every one of 7 billion people on the planet.
00:09:06 What’s the 6 million correspond to?
00:09:08 6 million unique genetic variants that are segregating in the human population.
00:09:14 Every one of us carries millions of polymorphic sites, poly, many, morph, forms.
00:09:22 Polymorphic means many forms, variants.
00:09:25 That basically means that every one of us has single nucleotide alterations that we
00:09:29 have inherited from mom and from dad that basically can be thought of as tiny little
00:09:34 perturbations.
00:09:36 Most of them don’t do anything, but some of them lead to all of the phenotypic differences
00:09:42 that we see between us.
00:09:43 The reason why two twins are identical is because these variants completely determine
00:09:48 the way that I’m going to look at exactly 93 years of age.
00:09:52 How happy are you with this kind of data set?
00:09:54 Is it large enough of the human population of Earth?
00:09:59 Is that too big, too small?
00:10:01 Yeah, so is it large enough is a power analysis question.
00:10:07 In every one of our grants, we do a power analysis based on what is the effect size
00:10:11 that I would like to detect and what is the natural variation in the two forms.
00:10:19 Every time you do a perturbation, you’re asking, I’m changing form A into form B. Form A has
00:10:25 some natural phenotypic variation around it and form B has some natural phenotypic variation
00:10:30 around it.
00:10:31 If those variances are large and the differences between the mean of A and the mean of B are
00:10:36 small, then you have very little power.
00:10:38 The further the means go apart, that’s the effect size, the more power you have, and
00:10:44 the smaller the standard deviation, the more power you have.
00:10:48 So basically when you’re asking, is that sufficiently large, certainly not for everything, but we
00:10:54 already have enough power for many of the stronger effects in the more tight distributions.
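The power argument above can be made concrete with a minimal sketch. It assumes a two-sided, two-sample z-test with equal group sizes and a shared standard deviation, which is a textbook simplification and not necessarily the analysis used in the group's grants. The function name `two_sample_power` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    Assumes both forms (A and B) share the same standard deviation `sd`
    and that each group has `n_per_group` individuals.
    """
    std_normal = NormalDist()
    # Critical value for rejecting "no difference" at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two group means.
    standard_error = sd * sqrt(2.0 / n_per_group)
    # How far the true difference moves the test statistic, in standard errors.
    shift = abs(mean_b - mean_a) / standard_error
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    scenarios = {
        "small effect, wide spread": (0.0, 0.2, 1.0, 100),
        "larger effect, wide spread": (0.0, 0.5, 1.0, 100),
        "small effect, tight spread": (0.0, 0.2, 0.5, 100),
        "small effect, more people": (0.0, 0.2, 1.0, 1000),
    }
    for label, (mean_a, mean_b, sd, n) in scenarios.items():
        print(f"{label}: power = {two_sample_power(mean_a, mean_b, sd, n):.2f}")
```

Under these assumptions, power rises when the means move apart, when the spread shrinks, or when more individuals are sampled, which matches the three levers described in the conversation.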
00:11:01 So that’s the hopeful message that there exists parts of the genome that have a strong effect
00:11:09 that has a small variance.
00:11:13 That’s exactly right.
00:11:14 Unfortunately, those perturbations are the basis of disease in many cases.
00:11:18 So it’s not a hopeful message.
00:11:20 Sometimes it’s a terrible message.
00:11:22 It’s basically, well, some people are sick, but if we can figure out what are these contributors
00:11:27 to sickness, we can then help make them better and help many other people better who don’t
00:11:32 carry that exact mutation, but who carry mutations on the same pathways.
00:11:38 And that’s what we like to call the allelic series of a gene.
00:11:42 You basically have many perturbations of the same gene in different people, each with a
00:11:49 different frequency in the human population and each with a different effect on the individual
00:11:55 that carries them.
00:11:56 So you said in the past there would be these small experiments on perturbations and animal
00:12:03 models.
00:12:04 What does this puzzle solving process look like today?
00:12:08 So we basically have something like 7 billion people in the planet and every one of them
00:12:13 carries something like 6 million mutations.
00:12:16 You basically have an enormous matrix of genotype by phenotype by systematically measuring the
00:12:25 phenotype of these individuals.
00:12:27 And the traditional way of measuring this phenotype has been to look at one trait at
00:12:32 a time.
00:12:33 You would gather families and you would sort of paint the pedigrees of a strong effect,
00:12:40 what we like to call Mendelian mutation, so a mutation that gets transmitted in a dominant
00:12:47 or a recessive, but strong effect form where basically one locus plays a very big role
00:12:53 in that disease.
00:12:54 And you could then look at carriers versus non carriers in one family, carriers versus
00:12:59 non carriers in another family and do that for hundreds, sometimes thousands of families
00:13:04 and then trace these inheritance patterns and then figure out what is the gene that
00:13:08 plays that role.
00:13:09 Is this the matrix that you’re showing in talks or lectures?
00:13:14 So that matrix is the input to the stuff that I show in talks.
00:13:21 So basically that matrix has traditionally been strong effect genes.
00:13:24 What the matrix looks like now is instead of pedigrees, instead of families, you basically
00:13:29 have thousands and sometimes hundreds of thousands of unrelated individuals, each with all of
00:13:36 their genetic variants and each with their phenotype, for example, height or lipids or,
00:13:43 you know, whether they’re sick or not for a particular trait.
00:13:48 That has been the modern view instead of going to families, going to unrelated individuals
00:13:53 with one phenotype at a time.
00:13:55 And what we’re doing now as we’re maturing in all of these sciences is that we’re doing
00:14:00 this in the context of large medical systems or enormous cohorts that are very well phenotyped
00:14:07 across hundreds of phenotypes, sometimes with our complete electronic health record.
00:14:13 So you can now start relating not just one gene segregating one family, not just thousands
00:14:19 of variants segregating with one phenotype, but now you can do millions of variants versus
00:14:25 hundreds of phenotypes.
00:14:27 And as a computer scientist, I mean, deconvolving that matrix, partitioning it into the layers
00:14:33 of biology that are associated with every one of these elements is a dream come true.
00:14:40 It’s like the world’s greatest puzzle.
00:14:42 And you can now solve that puzzle by throwing in more and more knowledge about the function
00:14:50 of different genomic regions and how these functions are changed across tissues and in
00:14:56 the context of disease.
00:14:58 And that’s what my group and many other groups are doing.
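The genotype-by-phenotype matrix described above can be illustrated with a toy scan. This is a sketch on simulated data, not the group's actual method: it assumes unrelated individuals, genotypes coded as 0, 1, or 2 copies of a variant, and a plain Pearson correlation as the association score. The names `simulate_cohort` and `association_scan`, the allele frequency, and the effect size are all hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def simulate_cohort(n_people, n_variants, causal_variant, effect, seed=0):
    """Simulate genotypes (people x variants) and two phenotypes per person."""
    rng = random.Random(seed)
    allele_freq = 0.3  # hypothetical frequency shared by every variant
    # Each genotype is the number of copies (0, 1, or 2) inherited from two parents.
    genotypes = [
        [(rng.random() < allele_freq) + (rng.random() < allele_freq)
         for _ in range(n_variants)]
        for _ in range(n_people)
    ]
    phenotypes = {
        # One trait is shifted by a single causal variant, plus noise.
        "trait_with_signal": [effect * row[causal_variant] + rng.gauss(0, 1)
                              for row in genotypes],
        # The other trait has no genetic contribution at all.
        "trait_noise_only": [rng.gauss(0, 1) for _ in genotypes],
    }
    return genotypes, phenotypes


def association_scan(genotypes, phenotypes):
    """Correlate every variant with every phenotype (a variants x traits table)."""
    n_variants = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_variants)]
    return {trait: [pearson(col, values) for col in columns]
            for trait, values in phenotypes.items()}


if __name__ == "__main__":
    genotypes, phenotypes = simulate_cohort(2000, 20, causal_variant=7, effect=0.5)
    for trait, scores in association_scan(genotypes, phenotypes).items():
        best = max(range(len(scores)), key=lambda j: abs(scores[j]))
        print(f"{trait}: strongest variant {best}, r = {scores[best]:+.3f}")
```

Real cohorts involve millions of variants, hundreds of phenotypes, and corrections for ancestry and multiple testing, none of which this toy example attempts.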
00:15:00 We’re trying to systematically relate this genetic variation with molecular variation
00:15:05 at the expression level of the genes, at the epigenomic level of the gene regulatory circuitry,
00:15:12 and at the cellular level of what are the functions that are happening in those cells,
00:15:17 at the single cell level using single cell profiling, and then relate all that vast amount
00:15:22 of knowledge computationally with the thousands of traits that each of these thousands
00:15:29 of variants are perturbing.
00:15:30 I mean, this is something we talked about, I think last time.
00:15:34 So there’s these effects at different levels that happen.
00:15:36 You said at a single cell level, you’re trying to see things that happen due to certain perturbations.
00:15:42 And then it’s not just like a puzzle of perturbation and disease.
00:15:49 It’s perturbation then effect at a cellular level, then at an organ level, a body, like,
00:15:57 how do you disassemble this into like what your group is working on?
00:16:02 You’re basically taking a bunch of the hard problems in the space.
00:16:06 How do you break apart a difficult disease and break it apart into problems that you,
00:16:13 into puzzles that you can now start solving?
00:16:15 So there’s a struggle here.
00:16:17 Computer scientists love hard puzzles and they’re like, oh, I want to build a method that just
00:16:22 deconvolves the whole thing computationally.
00:16:24 And that’s very tempting and it’s very appealing, but biologists just like to decouple that
00:16:31 complexity experimentally, to just like peel off layers of complexity experimentally.
00:16:36 And that’s what many of these modern tools that my group and others have both developed
00:16:40 and used.
00:16:41 The fact that we can now figure out tricks for peeling off these layers of complexity
00:16:46 by testing one cell type at a time or by testing one cell at a time.
00:16:53 And you could basically say, what is the effect of these genetic variants associated with
00:16:56 Alzheimer’s on human brain?
00:16:59 Human brain sounds like, oh, it’s an organ, of course, just go one organ at a time.
00:17:04 But human brain has of course, dozens of different brain regions and within each of these brain
00:17:09 regions, dozens of different cell types and every single type of neuron, every single
00:17:15 type of glial cell between astrocytes, oligodendrocytes, microglia, between all of the neural cells
00:17:24 and the vascular cells and the immune cells that are co inhabiting the brain between the
00:17:29 different types of excitatory and inhibitory neurons that are sort of interacting with
00:17:34 each other between different layers of neurons in the cortical layers.
00:17:39 Every single one of these has a different type of function to play in cognition, in
00:17:47 interaction with the environment, in maintenance of the brain, in energetic needs, in feeding
00:17:55 the brain with blood, with oxygen, in clearing out the debris that are resulting from the
00:18:01 super high energy production of cognition in humans.
00:18:06 So all of these things are basically potentially deconvolvable computationally, but experimentally,
00:18:17 you can just do single cell profiling of dozens of regions of the brain across hundreds of
00:18:21 individuals across millions of cells.
00:18:24 And then now you have pieces of the puzzle that you can then put back together to understand
00:18:31 that complexity.
00:18:32 I mean, first of all, the cells in the human brain are the most, maybe I’m romanticizing
00:18:39 it, but cognition seems to be very complicated.
00:18:42 So separating into the function, breaking Alzheimer’s down to the cellular level seems
00:18:53 very challenging.
00:18:56 Is that basically you’re trying to find a way that some perturbation in the genome results
00:19:05 in some obvious major dysfunction in the cell.
00:19:11 You’re trying to find something like that.
00:19:14 Exactly.
00:19:15 So what does human genetics do?
00:19:17 Human genetics basically looks at the whole path from genetic variation all the way to
00:19:21 disease.
00:19:22 So human genetics has basically taken thousands of Alzheimer’s cases and thousands of controls
00:19:31 matched for age, for sex, for environmental backgrounds and so on and so forth.
00:19:38 And then looked at that map where you’re asking, what are the individual genetic perturbations
00:19:44 and how are they related to all the way to Alzheimer’s disease?
00:19:48 And that has actually been quite successful.
00:19:51 So we now have more than 27 different loci, these are genomic regions that are associated
00:19:57 with Alzheimer’s at this end-to-end level.
00:20:02 But the moment you sort of break up that very long path into smaller levels, you can basically
00:20:07 say from genetics, what are the epigenomic alterations at the level of gene regulatory
00:20:13 elements where that genetic variant perturbs the control region nearby.
00:20:19 That effect is much larger.
00:20:21 You mean much larger in terms of this down the line impact or?
00:20:25 It’s much larger in terms of the measurable effect, this A versus B variance is actually
00:20:31 so much more cleanly defined when you go to the shorter branches.
00:20:35 Because for one genetic variant to affect Alzheimer’s, that’s a very long path.
00:20:40 That basically means that in the context of millions of these 6 million variants that
00:20:43 every one of us carries, that one single nucleotide has a detectable effect all the way to the
00:20:51 end.
00:20:52 I mean, it’s just mind boggling that that’s even possible, but indeed there are such effects.
00:20:57 So the hope is, or the most scientifically speaking, the most effective place where to
00:21:03 detect the alteration that results in disease is earlier on in the pipeline, as early as
00:21:10 possible.
00:21:11 It’s a trade off.
00:21:12 If you go very early on in the pipeline, now each of these epigenomic alterations, for
00:21:17 example, this enhancer control region is active maybe 50% less, which is a dramatic effect.
00:21:25 Now you can ask, well, how much does changing one regulatory region in the genome in one
00:21:29 cell type change disease?
00:21:31 Well, that path is now long.
00:21:33 So if you instead look at expression, the path between genetic variation and the expression
00:21:39 of one gene goes through many enhancer regions, and therefore it’s a subtler effect at the
00:21:44 gene level.
00:21:45 But then now you’re closer because one gene is acting in the context of only 20,000 other
00:21:51 genes as opposed to one enhancer acting in the context of 2 million other enhancers.
00:21:57 So you basically now have genetic, epigenomic, the circuitry, transcriptomic, the gene expression
00:22:04 control, and then cellular, where you can basically say, I can measure various properties
00:22:09 of those cells.
00:22:11 What is the calcium influx rate when I have this genetic variation?
00:22:17 What is the synaptic density?
00:22:19 What is the electric impulse conductivity and so on and so forth?
00:22:24 So you can measure things along this path to disease, and you can also measure endophenotypes.
00:22:32 You can basically measure your brain activity.
00:22:37 You can do imaging in the brain.
00:22:39 You can basically measure, I don’t know, the heart rate, the pulse, the lipids, the amount
00:22:44 of blood secreted and so on and so forth.
00:22:46 And then through all of that, you can basically get at the path to causality, the path to
00:22:52 disease.
00:22:55 And is there something beyond cellular?
00:22:57 So you mentioned lifestyle interventions or changes as a way to, or like be able to prescribe
00:23:05 changes in lifestyle.
00:23:07 Like what about organs?
00:23:09 What about like the function of the body as a whole?
00:23:13 Yeah, absolutely.
00:23:14 So basically when you go to your doctor, they always measure, you know, your pulse.
00:23:18 They always measure your height.
00:23:19 They always measure your weight, you know, your BMI.
00:23:21 So basically these are just very basic variables.
00:23:24 But with digital devices nowadays, you can start measuring hundreds of variables for
00:23:27 every individual.
00:23:29 You can basically also phenotype cognitively through tests, Alzheimer’s patients.
00:23:37 There are cognitive tests that you can measure, that you typically do for cognitive decline,
00:23:43 these mini mental observations that you have specific questions to.
00:23:48 You can think of sort of enlarging the set of cognitive tests.
00:23:51 So in the mouse, for example, you do experiments for how do they get out of mazes?
00:23:55 How do they find food?
00:23:57 Whether they recall a fear, whether they shake in a new environment and so on and so forth.
00:24:02 In the human, you can have much, much richer phenotypes where you can basically say not
00:24:06 just imaging at the organ level and all kinds of other activities at the organ level, but
00:24:13 you can also do at the organism level, you can do behavioral tests.
00:24:19 And how did they do on empathy?
00:24:21 How did they do on memory?
00:24:22 How did they do on longterm memory versus short term memory?
00:24:26 And so on and so forth.
00:24:27 I love how you’re calling that phenotype.
00:24:28 I guess it is.
00:24:29 It is.
00:24:31 But like your behavior patterns that might change over a period of a life, your ability
00:24:37 to remember things, your ability to be empathetic or emotionally, your intelligence perhaps
00:24:44 even.
00:24:45 Yeah, but intelligence has hundreds of variables.
00:24:47 You can be your math intelligence, your literary intelligence, your puzzle solving intelligence,
00:24:50 your logic.
00:24:51 It could be like hundreds of things.
00:24:52 And all of that, we’re able to measure that better and better and all that could be connected
00:24:57 to the entire pipeline somehow.
00:24:58 We used to think of each of these as a single variable like intelligence.
00:25:01 I mean, that’s ridiculous.
00:25:03 It’s basically dozens of different genes that are controlling every single variable.
00:25:10 You can basically think of, imagine us in a video game where every one of us has measures
00:25:16 of strength, stamina, energy left and so on and so forth.
00:25:20 But you could click on each of those five bars that are just the main bars and each
00:25:24 of those will just give you then hundreds of bars and can basically say, okay, great
00:25:28 for my machine learning task, I want someone who, a human who has these particular forms
00:25:36 of intelligence.
00:25:37 I require now these 20 different things.
00:25:40 And then you can combine those things and then relate them to of course performance
00:25:45 in a particular task, but you can also relate them to genetic variation that might be affecting
00:25:50 different parts of the brain.
00:25:52 For example, your frontal cortex versus your temporal cortex versus your visual cortex
00:25:56 and so on and so forth.
00:25:58 So genetic variation that affects expression of genes in different parts of your brain
00:26:02 can basically affect your music ability, your auditory ability, your smell, just dozens
00:26:08 of different phenotypes can be broken down into hundreds of cognitive variables and then
00:26:15 relate each of those to thousands of genes that are associated with them.
00:26:20 So somebody who loves RPGs or playing games, there’s too few variables that we can control.
00:26:28 So I’m excited if we’re in fact living in a simulation and this is a video game, I’m
00:26:32 excited by the quality of the video game.
00:26:37 The game designer did a hell of a good job.
00:26:39 So we’re impressed.
00:26:40 Oh, I don’t know.
00:26:41 The sunset last night was a little unrealistic.
00:26:43 Yeah.
00:26:44 Yeah.
00:26:45 The graphics.
00:26:46 Exactly.
00:26:47 Come on, NVIDIA.
00:26:48 To zoom back out, we’ve been talking about the genetic origins of diseases, but I think
00:26:54 it’s fascinating to talk about what are the most important diseases to understand and
00:27:01 especially as it connects to the things that you’re working on.
00:27:05 So it’s very difficult to think about important diseases to understand.
00:27:08 There’s many metrics of importance.
00:27:10 One is lifestyle impact.
00:27:12 I mean, if you look at COVID, the impact on lifestyle has been enormous.
00:27:16 So understanding COVID is important because it has impacted the wellbeing in terms of
00:27:23 ability to have a job, ability to have an apartment, ability to go to work, ability
00:27:27 to have a mental circle of support and all of that for millions of Americans, like huge,
00:27:34 huge impact.
00:27:35 So that’s one aspect of importance.
00:27:37 So basically mental disorders, Alzheimer’s has a huge importance in the wellbeing of
00:27:42 Americans.
00:27:44 Whether or not it kills someone for many, many years, it has a huge impact.
00:27:48 So the first measure of importance is just wellbeing.
00:27:52 Impact on the quality of life.
00:27:53 Impact on the quality of life, absolutely.
00:27:55 The second metric, which is much easier to quantify is deaths.
00:28:00 What is the number one killer?
00:28:01 The number one killer is actually heart disease.
00:28:04 It is actually killing 650,000 Americans per year.
00:28:10 Number two is cancer with 600,000 Americans.
00:28:14 Number three, far, far down the list is accidents, every single accident combined.
00:28:19 So basically you read the news, accidents, like there was a huge car crash all over the
00:28:24 news.
00:28:25 But the number of deaths, number three by far, 167,000.
00:28:31 Chronic respiratory disease.
00:28:32 So that’s asthma, not being able to breathe and so on and so forth: 160,000. Alzheimer’s,
00:28:39 number five, with 120,000. And then stroke, brain aneurysms and so on and so forth, that’s
00:28:45 147,000. Diabetes and metabolic disorders, et cetera,
00:28:49 that’s 85,000.
00:28:51 The flu is 60,000, suicide, 50,000 and then overdose, et cetera, you know, goes further
00:28:58 down the list.
00:29:00 So of course COVID has crept up to be the number three killer this year with, you know,
00:29:06 more than 100,000 Americans and counting.
00:29:11 And you know, but if you think about sort of what do we use, what are the most important
00:29:16 diseases, you have to understand both the quality of life and the sheer number of deaths
00:29:22 and just numbers of years lost if you wish.
00:29:25 And each of these diseases you can think of as, and also including terrorist attacks and
00:29:30 school shootings, for example, things which lead to fatalities, you can look at as problems
00:29:39 that could be solved.
00:29:41 And some problems are harder to solve than others.
00:29:44 I mean, that’s part of the equation.
00:29:46 So maybe if you look at these diseases, if you look at heart disease or cancer or Alzheimer’s
00:29:52 or just like schizophrenia and obesity, diabetes, like not necessarily things that kill you,
00:29:59 but affect the quality of life, which problems are solvable, which aren’t, which are harder
00:30:05 to solve, which aren’t.
00:30:07 I love your question because it puts it in the context of a global effort rather than
00:30:13 just the local effort.
00:30:15 So basically if you look at the global aspect, exercise and nutrition are two interventions
00:30:22 that we can as a society do a much better job at.
00:30:27 So if you think about sort of the availability of cheap food, it’s extremely high in calories.
00:30:33 It’s extremely detrimental for you, like a lot of processed food, et cetera.
00:30:36 So if we change that equation and as a society, we made availability of healthy food much,
00:30:43 much easier and charged a burger at McDonald’s, the price that it costs on the health system,
00:30:52 then people would actually start buying more healthy foods.
00:30:56 So basically that’s sort of a societal intervention, if you wish.
00:30:59 In the same way, increasing empathy, increasing education, increasing the social framework
00:31:06 and support would basically lead to fewer suicides.
00:31:10 It would lead to fewer murders.
00:31:11 It would lead to fewer deaths overall.
00:31:15 So that’s something that we as a society can do.
00:31:19 You can also think about external factors versus internal factors.
00:31:21 So the external factors are basically communicable diseases like COVID, like the flu, et cetera.
00:31:27 And the internal factors are basically things like cancer and Alzheimer’s where basically
00:31:33 your genetics will eventually drive you there.
00:31:38 And then of course, with all of these factors, every single disease has both the genetic
00:31:43 component and environmental component.
00:31:46 So heart disease, huge genetic contribution, Alzheimer’s, it’s like 60% plus genetic.
00:31:55 So I think it’s like 79% heritability.
00:31:59 So that basically means that genetics alone explains 79% of Alzheimer’s incidence.
00:32:06 And yes, there’s a 21% environmental component where you could basically enrich your cognitive
00:32:14 environment, enrich your social interactions, read more books, learn a foreign language,
00:32:21 go running, you know, sort of have a more fulfilling life.
00:32:24 All of that will actually decrease Alzheimer’s, but there’s a limit to how much that can impact
00:32:29 because of the huge genetic footprint.
00:32:31 So this is fascinating.
00:32:32 So each one of these problems has a genetic component and an environmental component.
00:32:38 And so like when there’s a genetic component, what can we do about some of these diseases?
00:32:43 And from what you’ve worked on, what can you say in terms of problems that are solvable here
00:32:48 or understandable?
00:32:50 So my group works on the genetic component, but I would argue that understanding the genetic
00:32:55 component can have a huge impact even on the environmental component.
00:32:59 Why is that?
00:33:00 Because genetics gives us access to mechanism.
00:33:03 And if we can alter the mechanism, if we can impact the mechanism, we can perhaps counteract
00:33:09 some of the environmental components.
00:33:12 So understanding the biological mechanisms leading to disease is extremely important
00:33:18 in being able to intervene.
00:33:20 And, you know, the analogy that I like to give is, for example,
00:33:26 for obesity, you know, think of it as a giant bathtub of fat.
00:33:29 There’s basically fat coming in from your diet and there’s fat coming out from your
00:33:35 exercise.
00:33:36 Okay.
00:33:37 So that’s an in out equation and that’s the equation that everybody’s focusing on.
00:33:42 But your metabolism impacts that, you know, bathtub.
00:33:47 Basically your metabolism controls the rate at which you’re burning energy.
00:33:53 It controls the rate at which you’re storing energy.
00:33:56 And it also teaches you about the various valves that control the input and the output
00:34:02 equation.
00:34:04 So if we can learn from the genetics, the valves, we can then manipulate those valves.
00:34:11 And even if the environment is feeding you a lot of fat and getting only a little fat out,
00:34:16 you can just poke another hole in the bathtub and just get a lot of the fat out.
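The bathtub analogy above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the function name, the units, and every number are hypothetical, and the drain is assumed to scale with the current level, which is a modeling choice and not something stated in the conversation.

```python
def simulate_fat_store(days, intake, burn_rate, extra_outflow=0.0, start=100.0):
    """Toy bathtub model: each day the level rises by inflow and falls by outflow.

    intake        -- fat coming in from diet per day (arbitrary units, assumed)
    burn_rate     -- fraction of the current level burned per day (assumed)
    extra_outflow -- an additional "hole in the bathtub", e.g. a manipulated valve
    """
    level = start
    for _ in range(days):
        outflow = burn_rate * level + extra_outflow  # drain scales with the level
        level = max(0.0, level + intake - outflow)   # the tub cannot go negative
    return level


# Same diet and same exercise, with and without the extra valve.
baseline = simulate_fat_store(days=365, intake=2.0, burn_rate=0.01)
with_new_valve = simulate_fat_store(days=365, intake=2.0, burn_rate=0.01,
                                    extra_outflow=0.5)
print(round(baseline, 1), round(with_new_valve, 1))
```

With the inflow and the exercise term held fixed, the extra outflow alone moves the level the tub settles at, which is the point of the analogy.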
00:34:19 Yeah, that’s fascinating.
00:34:21 Yeah.
00:34:22 So we’re not just passive observers of our genetics.
00:34:25 The more we understand, the more we can come up with actual treatments.
00:34:29 And I think that’s an important aspect to realize when people are thinking about strong
00:34:35 effect versus weak effect variants.
00:34:38 So some variants have strong effects.
00:34:39 We talked about these Mendelian disorders where a single gene has a sufficiently large
00:34:43 effect, penetrance, expressivity, and so on and so forth, that basically you can trace
00:34:49 it in families with cases and not cases, cases, not cases, and so on and so forth.
00:34:55 But so these are the genes that everybody says, oh, that’s the genes we should go after
00:35:02 because that’s a strong effect gene.
00:35:04 I like to think about it slightly differently.
00:35:06 These are the genes where genetic impacts that have a strong effect were tolerated because
00:35:15 every single time we have a genetic association with disease, it depends on two things.
00:35:20 Number one, the obvious one, whether the gene has an impact on the disease.
00:35:24 Number two, the more subtle one is whether there is genetic variation standing and circulating
00:35:32 and segregating in the human population that impacts that gene.
00:35:37 Some genes are so darn important that if you mess with them, even a tiny little amount,
00:35:44 that person’s dead.
00:35:46 So those genes don’t have variation.
00:35:49 You’re not going to find a genetic association if you don’t have variation.
00:35:53 That doesn’t mean that the gene has no role.
00:35:55 It simply means that the gene tolerates no mutations.
00:35:59 So that’s actually a strong signal when there’s no variation.
00:36:01 That’s so fascinating.
00:36:02 Exactly.
00:36:03 Genes that have very little variation are hugely important.
00:36:06 You can actually rank the importance of genes based on how little variation they have.
00:36:10 And those genes that have very little variation but no association with disease, that’s a
00:36:16 very good metric to say, oh, that’s probably a developmental gene because we’re not good
00:36:20 at measuring those phenotypes.
00:36:22 So it’s genes that you can tell evolution has excluded mutations from, but yet we can’t
00:36:29 see them associated with anything that we can measure nowadays.
00:36:32 It’s probably early embryonic lethal.
00:36:34 What are all the words you just said?
00:36:36 Early embryonic what?
00:36:37 Lethal.
00:36:38 Meaning?
00:36:39 Meaning that that embryo will die.
00:36:40 Okay.
00:36:41 There’s a bunch of stuff that is required for a stable functional organism across the
00:36:49 board for an entire species, I guess.
00:36:53 If you look at sperm, it expresses thousands of proteins.
00:36:58 Does sperm actually need thousands of proteins?
00:37:01 No, but it’s probably just testing them.
00:37:05 So my speculation is that misfolding of these proteins is an early test for failure.
00:37:11 So that out of the millions of sperm that are possible, you select the subset that are
00:37:18 just not grossly misfolding thousands of proteins.
00:37:21 So it’s kind of an assert that this is folded correctly.
00:37:25 Correct.
00:37:26 Yeah.
00:37:27 Just because if this little thing about the folding of a protein isn’t correct, that
00:37:32 probably means somewhere down the line, there’s a bigger issue.
00:37:35 That’s exactly right.
00:37:36 So fail fast.
00:37:37 So basically if you look at the mammalian investment in a newborn, that investment is
00:37:45 enormous in terms of resources.
00:37:47 So mammals have basically evolved mechanisms for fail fast.
00:37:52 Where basically in those early months of development, I mean it’s horrendous of course at the personal
00:37:58 level when you lose your future child, but in some ways there’s so little hope for that
00:38:08 child to develop and sort of make it through the remaining months that sort of fail fast
00:38:12 is probably a good evolutionary principle for mammals.
00:38:19 And of course humans have a lot of medical resources that you can sort of give those
00:38:24 children a chance and we have so much more success in sort of giving folks who have these
00:38:33 strong carrier mutations a chance, but if they’re not even making it through the first
00:38:37 three months, we’re not going to see them.
00:38:39 So that’s why when we say what are the most important genes to focus on, the ones that
00:38:45 have a strong effect mutation or the ones that have a weak effect mutation, well the
00:38:50 jury might be out because the ones that have a strong effect mutation are basically not
00:38:57 mattering as much.
00:38:58 The ones that only have weak effect mutations by understanding through genetics that they
00:39:04 have a weak effect mutation and understanding that they have a causal role on the disease,
00:39:10 we can then say, okay, great, evolution has only tolerated a 2% change in that gene.
00:39:15 Pharmaceutically I can go in and induce a 70% change in that gene and maybe I will poke
00:39:22 another hole in the bathtub that was not easy to control in many of the other sort of strong
00:39:33 effect genetic variants.
00:39:35 So there’s this beautiful map of across the population of things that you’re saying strong
00:39:41 and weak effects, so stuff with a lot of mutations and stuff with little mutations with no mutations
00:39:48 and you have this map and it lays out the puzzle.
00:39:51 Yeah.
00:39:52 So when I say strong effect, I mean at the level of individual mutations.
00:39:56 So basically genes where, so you have to think of first the effect of the gene on the disease.
00:40:03 Remember how I was sort of painting that map earlier from genetics all the way to phenotype.
00:40:10 That gene can have a strong effect on the disease, but the genetic variant might have
00:40:15 a weak effect on the gene.
00:40:18 So basically when you ask what is the effect of that genetic variant on the disease, it
00:40:24 could be that that genetic variant impacts the gene by a lot and then the gene impacts
00:40:29 the disease by a little, or it could be that the genetic variant impacts the gene by a
00:40:33 little and then the gene impacts the disease by a lot.
00:40:35 So what we care about is genes that impact the disease a lot, but genetics gives us the
00:40:41 full equation and what I would argue is if we couple the genetics with expression variation
00:40:51 to basically ask what genes change by a lot and which genes correlate with disease by
00:41:00 a lot, even if the genetic variants change them by a little, then those are the best
00:41:06 places to intervene.
00:41:07 Those are the best places where pharmaceutically, if I have even a modest effect, I will have
00:41:13 a strong effect on the disease, whereas those genetic variants that have a huge effect on
00:41:17 the disease, I might not be able to change that gene by this much without affecting all
00:41:21 kinds of other things.
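The two-step reasoning above (variant affects gene, gene affects disease) can be sketched with a small example. This assumes the two effects simply multiply, which is a simplification; the gene names and all numbers are hypothetical, with the 2% and 70% figures borrowed from the conversation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    variant_effect_on_gene: float  # fractional expression change caused by the variant
    gene_effect_on_disease: float  # risk change per unit expression change (assumed)

    def variant_effect_on_disease(self) -> float:
        # Assumption: the variant's effect on disease is the product of the two steps.
        return self.variant_effect_on_gene * self.gene_effect_on_disease


candidates = [
    Candidate("gene_A", variant_effect_on_gene=0.70, gene_effect_on_disease=0.1),
    Candidate("gene_B", variant_effect_on_gene=0.02, gene_effect_on_disease=3.5),
]

# Both variants look nearly identical in an association study...
for c in candidates:
    print(c.name, round(c.variant_effect_on_disease(), 3))

# ...but the better place to intervene is the gene whose own effect on disease is largest.
best = max(candidates, key=lambda c: c.gene_effect_on_disease)

# A hypothetical drug that shifts the gene by 70% instead of the tolerated 2%.
drug_shift = 0.70
print(best.name, round(drug_shift * best.gene_effect_on_disease, 2))
```

The association signal alone cannot separate the two candidates, which is why the speaker argues for coupling genetics with expression data.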
00:41:22 Interesting.
00:41:23 So that’s what we’re looking at.
00:41:26 What have we been able to find in terms of which disease could be helped?
00:41:31 Again, don’t get me started.
00:41:37 We have found so much.
00:41:38 Our understanding of disease has changed so dramatically with genetics.
00:41:46 I mean places that we had no idea would be involved.
00:41:49 So one of the worst things about my genome is that I have a genetic predisposition to
00:41:53 age related macular degeneration, AMD.
00:41:56 So it’s a form of blindness that causes you to lose the central part of your vision progressively
00:42:02 as you grow older.
00:42:04 My increased risk is fairly small.
00:42:06 I have an 8% chance.
00:42:07 You only have a 6% chance.
00:42:10 I’m average.
00:42:11 By the way, when you say my, you mean literally yours.
00:42:14 You know this about you.
00:42:15 I know this about me.
00:42:18 Which is kind of, I mean philosophically speaking is a pretty powerful thing to live with.
00:42:26 Maybe that’s, so we agreed to talk again, by the way, for the listeners, where we’re going
00:42:31 to try to focus on science today and a little bit of philosophy next time.
00:42:36 But it’s interesting to think about the more you’re able to know about yourself from the
00:42:42 genetic information in terms of the diseases, how that changes your own view of life.
00:42:49 So there’s a lot of impact there and there’s something called genetics exceptionalism,
00:42:56 which basically thinks of genetics as something very, very different than everything else
00:43:01 as a type of determinism.
00:43:04 And you know, let’s talk about that next time.
00:43:07 So basically.
00:43:08 That’s a good preview.
00:43:09 Yeah.
00:43:10 So let’s go back to AMD.
00:43:11 So basically with AMD, we had no idea what causes AMD.
00:43:16 You know, it was, it was a mystery until the genetics were worked out.
00:43:23 And now the fact that I know that I have a predisposition allows me to sort of make some
00:43:28 life choices, number one, but number two, the genes that lead to that predisposition
00:43:34 give us insights as to how does it actually work.
00:43:38 And that’s a place where genetics gave us something totally unexpected.
00:43:42 So there’s a complement pathway, which is an immune function pathway that was in, you
00:43:52 know, most of the loci associated with AMD.
00:43:55 And that basically told us that, wow, there’s an immune basis to this eye disorder that
00:44:02 people had just not expected before.
00:44:05 If you look at complement, it was recently also implicated in schizophrenia.
00:44:11 And there’s a type of microglia that is involved in synaptic pruning.
00:44:17 So synapses are the connections between neurons.
00:44:20 And in this whole use it or lose it view of mental cognition and other capabilities, you
00:44:27 basically have microglia, which are immune cells that are sort of constantly traversing
00:44:32 your brain and then pruning neuronal connections, pruning synaptic connections that are not
00:44:38 utilized.
00:44:40 So in schizophrenia, there’s thought to be a change in the pruning that basically if
00:44:47 you don’t prune your synapses the right way, you will actually have an increased risk of
00:44:53 schizophrenia.
00:44:54 This is something that was completely unexpected for schizophrenia.
00:44:57 Of course, we knew it has to do with neurons, but the role of the complement complex, which
00:45:01 is also implicated in AMD, which is now also implicated in schizophrenia, was a huge surprise.
00:45:06 What’s the complement complex?
00:45:08 So it’s basically a set of genes, the complement genes that are basically having various immune
00:45:13 roles.
00:45:15 And as I was saying earlier, our immune system has been coopted for many different roles
00:45:19 across the body.
00:45:21 So they actually play many diverse roles.
00:45:23 And somehow the immune system is connected to the synaptic pruning process, the process.
00:45:29 Exactly.
00:45:30 So the immune cells were coopted to prune synapses.
00:45:33 How did you figure this out?
00:45:35 How does one go about figuring this intricate connection, like pipeline of connections out?
00:45:41 Yeah.
00:45:42 Let me give you another example.
00:45:44 So Alzheimer’s disease, the first place that you would expect it to act is obviously the
00:45:48 brain.
00:45:49 So we had basically this Roadmap Epigenomics Consortium view of the human epigenome, the
00:45:57 largest map of the human epigenome that has ever been built across 127 different tissues
00:46:04 and samples with dozens of epigenomic marks measured in hundreds of donors.
00:46:10 So what we’ve basically learned through that is that you basically can map what are the
00:46:16 active gene regulatory elements for every one of the tissues in the body.
00:46:20 And then we connected these gene regulatory active maps of basically what regions of the
00:46:27 human genome are turning on in every one of different tissues.
00:46:32 We then can go back and say, where are all of the genetic loci that are associated with
00:46:38 disease?
00:46:39 This is something that my group, I think was the first to do back in 2010 in this Ernst
00:46:46 Nature Biotech paper, but basically we were for the first time able to show that specific
00:46:52 chromatin states, specific epigenomic states, in that case enhancers, were in fact enriched
00:46:58 in disease associated variants.
00:47:00 We pushed that further in the Ernst Nature paper a year later.
00:47:05 And then in this roadmap epigenomics paper a few years after that, but basically that
00:47:12 matrix that you mentioned earlier was in fact the first time that we could see what genetic
00:47:18 traits have genetic variants that are enriched in what tissues in the body.
00:47:26 And a lot of that map made complete sense.
00:47:28 If you looked at a diversity of immune traits like allergies and type one diabetes and so
00:47:33 on and so forth, you basically could see that they were enriching, that the genetic variants
00:47:38 associated with those traits were enriched in enhancers in these gene regulatory elements
00:47:44 active in T cells and B cells and hematopoietic stem cells and so on and so forth.
00:47:49 So that basically gave us a confirmation in many ways that those immune traits were indeed
00:47:56 enriched in immune cells.
00:48:00 If you looked at type two diabetes, you basically saw an enrichment in only one type of sample
00:48:06 and it was pancreatic islets.
00:48:08 And we know that type two diabetes sort of stems from the dysregulation of insulin in
00:48:14 the beta cells of pancreatic islets.
00:48:17 And that sort of was spot on, super precise.
00:48:21 If you looked at blood pressure, where would you expect blood pressure to occur?
00:48:25 You know, I don’t know, maybe in your metabolism and ways that you process coffee or something
00:48:29 like that.
00:48:30 Maybe in your brain, the way that you stress out and increases your blood pressure, et
00:48:33 cetera.
00:48:34 So the blood pressure localized specifically in the left ventricle of the heart.
00:48:40 So the enhancers of the left ventricle in the heart contained a lot of genetic variants
00:48:44 associated with blood pressure.
00:48:46 If you look at height, we found an enrichment specifically in embryonic stem cell enhancers.
00:48:53 So the genetic variants predisposing you to be taller or shorter are in fact acting in
00:48:57 developmental stem cells, makes complete sense.
00:49:01 If you looked at inflammatory bowel disease, you basically found inflammatory, which is
00:49:05 immune, and also bowel disease, which is digestive.
00:49:09 And indeed we saw a double enrichment both in the immune cells and in the digestive cells.
00:49:15 So that basically told us that this is acting in both components.
00:49:19 There’s an immune component to inflammatory bowel disease and there’s a digestive component.
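The idea of asking whether disease-associated variants are enriched in the enhancers of a given tissue can be illustrated with a generic over-representation test. This is a minimal sketch using a hypergeometric tail probability; it is not claimed to be the method used in the papers mentioned, and the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_regions, enhancer_regions, disease_hits, hits_in_enhancers):
    """P(X >= hits_in_enhancers) if disease_hits regions were drawn at random.

    total_regions     -- all candidate regions considered (assumed background)
    enhancer_regions  -- regions marked as active enhancers in one tissue
    disease_hits      -- regions carrying a disease-associated variant
    hits_in_enhancers -- disease regions that overlap that tissue's enhancers
    """
    denom = comb(total_regions, disease_hits)
    upper = min(enhancer_regions, disease_hits)
    tail = sum(
        comb(enhancer_regions, k) * comb(total_regions - enhancer_regions, disease_hits - k)
        for k in range(hits_in_enhancers, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 1000 regions, 100 are enhancers in each tissue, 20 disease hits.
# By chance about 2 hits would land in any one tissue's enhancers.
p_enriched_tissue = hypergeom_enrichment_p(1000, 100, 20, 8)  # 8 observed
p_other_tissue = hypergeom_enrichment_p(1000, 100, 20, 2)     # 2 observed
print(p_enriched_tissue, p_other_tissue)
```

Running the same test for every trait against every tissue gives a trait-by-tissue matrix of the kind described in the conversation; a real analysis would also need to correct for multiple testing and for linkage between nearby variants.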
00:49:23 And the big surprise was for Alzheimer’s.
00:49:25 We had seven different brain samples.
00:49:29 We found zero enrichment in the brain samples for genetic variants associated with Alzheimer’s.
00:49:36 And this is mind boggling.
00:49:38 Our brains were literally hurting.
00:49:40 What is going on?
00:49:42 And what is going on is that the brain samples are primarily neurons, oligodendrocytes, and
00:49:49 astrocytes in terms of the cell types that make them up.
00:49:54 So that basically indicated that genetic variants associated with Alzheimer’s were probably
00:49:59 not acting in oligodendrocytes, astrocytes, or neurons.
00:50:04 So what could they be acting in?
00:50:05 Well, the fourth major cell type is actually microglia.
00:50:10 Microglia are resident immune cells in your brain.
00:50:13 Oh, nice.
00:50:15 They’re immune.
00:50:16 Oh, wow.
00:50:17 They are CD14 plus, which is a sort of cell surface marker of those cells.
00:50:24 So they’re CD14 plus cells, just like macrophages that are circulating in your blood.
00:50:30 The microglia are resident monocytes that are basically sitting in your brain.
00:50:35 They’re tissue specific monocytes.
00:50:38 And every one of your tissues, like your fat, for example, has a lot of macrophages that
00:50:42 are resident.
00:50:43 And the M1 versus M2 macrophage ratio has a huge role to play in obesity.
00:50:49 And so basically, again, these immune cells are everywhere, but basically what we found
00:50:53 through this completely unbiased view of what are the tissues that likely underlie different
00:50:59 disorders, we found that Alzheimer’s was humongously enriched in microglia, but not at all in the
00:51:08 other cell types.
00:51:09 So what are we supposed to make of that if you look at the tissues involved, is that simply
00:51:15 useful for indication of propensity for disease, or does it give us somehow a pathway of treatment?
00:51:24 It’s very much the second.
00:51:26 If you look at the way to therapeutics, you have to start somewhere.
00:51:33 What are you going to do?
00:51:34 You’re going to basically make assays that manipulate those genes and those pathways
00:51:42 in those cell types.
00:51:43 So before we know the tissue of action, we don’t even know where to start.
00:51:49 We basically are at a loss.
00:51:51 But if you know the tissue of action, and even better, if you know the pathway of action,
00:51:54 then you can basically screen your small molecules, not for the gene, you can screen them directly
00:52:00 for the pathway in that cell type.
00:52:02 So you can basically develop a high throughput multiplexed robotic system for testing the
00:52:10 impact of your favorite molecules that you know are safe, efficacious, and sort of hit
00:52:16 that particular gene and so on and so forth.
00:52:18 You can basically screen those molecules against either a set of genes that act in that pathway
00:52:25 or on the pathway directly by having a cellular assay.
00:52:29 And then you can basically go into mice and do experiments and basically sort of figure
00:52:33 out ways to manipulate these processes that allow you to then go back to humans and do
00:52:38 a clinical trial that basically says, okay, I was able indeed to reverse these processes
00:52:43 in mice.
00:52:44 Can I do the same thing in humans?
00:52:46 So the knowledge of the tissues gives you the pathway to treatment, but that’s not the
00:52:51 only part.
00:52:52 There are many additional steps to figuring out the mechanism of disease.
00:52:57 So that’s really promising.
00:52:59 Maybe to take a small step back, you’ve mentioned all these puzzles that were figured out with
00:53:04 the Nature paper for, I mean, you’ve mentioned a ton of diseases from obesity to Alzheimer’s,
00:53:13 even schizophrenia, I think you mentioned.
00:53:17 What is the actual methodology of figuring this out?
00:53:20 So indeed, I mentioned a lot of diseases and my lab works on a lot of different disorders.
00:53:26 And the reason for that is that if you look at biology, it used to be zoology departments
00:53:39 and botany departments and virology departments and so on and so forth.
00:53:43 And MIT was one of the first schools to basically create a biology department, like, oh, we’re
00:53:47 going to study all of life suddenly.
00:53:49 Why was that even the case?
00:53:51 Because the advent of DNA and the genome and the central dogma of DNA makes RNA makes protein,
00:53:58 in many ways unified biology.
00:54:01 You could suddenly study the process of transcription in viruses or in bacteria and have a huge
00:54:07 impact on yeast and fly and maybe even mammals because of this realization of these common
00:54:15 underlying processes.
00:54:17 And in the same way that DNA unified biology, genetics is unifying disease studies.
00:54:27 So you used to have, I don’t know, cardiovascular disease department and neurological disease
00:54:39 department and neurodegeneration department and basically immune and cancer and so on
00:54:47 and so forth.
00:54:48 And all of these were studied in different labs because it made sense, because basically
00:54:53 the first step was understanding how the tissue functions and we kind of knew the tissues
00:54:57 involved in cardiovascular disease and so on and so forth.
00:55:00 But what’s happening with human genetics is that all of these walls and edifices that
00:55:05 we had built are crumbling.
00:55:08 And the reason for that is that genetics is in many ways revealing unexpected connections.
00:55:16 So suddenly we now have to bring the immunologists to work on Alzheimer’s.
00:55:21 They were never in the room.
00:55:22 They were in another building altogether.
00:55:25 The same way for schizophrenia, we now have to sort of worry about all these interconnected
00:55:31 aspects.
00:55:33 For metabolic disorders, we’re finding contributions from brain.
00:55:37 So suddenly we have to call the neurologist from the other building and so on and so forth.
00:55:41 So in my view, it makes no sense anymore to basically say, oh, I’m a geneticist studying
00:55:49 immune disorders.
00:55:50 I mean, that’s ridiculous because, I mean, of course in many ways you still need to sort
00:55:55 of focus.
00:55:56 But what we’re doing is that we’re basically saying we’ll go wherever the genetics takes
00:56:01 us.
00:56:02 And by building these massive resources, by working on our latest map, which is now 833 tissues,
00:56:10 sort of the next generation of the epigenomics roadmap, which we’re now calling EpiMap, is
00:56:15 833 different tissues.
00:56:18 And using those, we’ve basically found enrichments in 540 different disorders.
00:56:24 Those enrichments are not like, oh great, you guys work on that and we’ll work on this.
00:56:29 They’re intertwined amazingly.
00:56:31 So of course there’s a lot of modularity, but there’s these enhancers that are sort
00:56:36 of broadly active and these disorders that are broadly active.
00:56:39 So basically some enhancers are active in all tissues and some disorders are enriching
00:56:43 in all tissues.
00:56:44 So basically there’s these multifactorial and this other class, which I like to call
00:56:49 polyfactorial diseases, which are basically lighting up everywhere.
00:56:54 And in many ways it’s, you know, sort of cutting across these walls that were previously built
00:57:00 across these departments.
00:57:01 And the polyfactorial ones, the previous departmental structure probably wasn’t equipped
00:57:07 to deal with those.
00:57:08 I mean, again, maybe it’s a romanticized question, but you know, there’s in physics, there’s
00:57:14 a theory of everything.
00:57:16 Do you think it’s possible to move towards an almost theory of everything of disease
00:57:22 from a genetic perspective?
00:57:24 So if this unification continues, is it possible that, like, do you think in those terms, like
00:57:29 trying to arrive at a fundamental understanding of how disease emerges, period?
00:57:35 That unification is not just foreseeable, it’s inevitable.
00:57:41 I see it as inevitable.
00:57:43 We have to go there.
00:57:45 You cannot be a specialist anymore.
00:57:48 If you’re a genomicist, you have to be a specialist in every single disorder.
00:57:53 And the reason for that is that the fundamental understanding of the circuitry of the human
00:57:59 genome that you need to solve schizophrenia, that fundamental circuitry is hugely important
00:58:07 to solve Alzheimer’s.
00:58:09 And that same circuitry is hugely important to solve metabolic disorders.
00:58:13 And that same exact circuitry is hugely important for solving immune disorders and cancer and,
00:58:20 you know, every single disease.
00:58:22 So all of them have the same sub task.
00:58:26 And I teach dynamic programming in my class.
00:58:29 Dynamic programming is all about sort of not redoing the work.
00:58:34 It’s reusing the work that you do once.
00:58:37 So basically for us to say, oh, great, you know, you guys in the immune building go solve
00:58:42 the fundamental circuitry of everything.
00:58:44 And then you guys in the schizophrenia building go solve the fundamental circuitry of everything
00:58:47 separately, is crazy.
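The dynamic programming principle mentioned above, solving each shared subproblem once and reusing the result, can be shown with a standard textbook example. The longest-common-subsequence problem below is chosen only as an illustration and is not taken from the conversation; the function name and test strings are arbitrary.

```python
from functools import lru_cache


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""

    @lru_cache(maxsize=None)  # each (i, j) subproblem is solved once, then reused
    def solve(i: int, j: int) -> int:
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        # Both branches share many subproblems; the cache avoids redoing them.
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)


print(lcs_length("GATTACA", "GCATGCU"))
```

Without the cache the same suffix pairs would be recomputed many times over, which is the analogue of each disease building solving the same circuitry separately.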
00:58:50 So what we need to do is come together and sort of have a circuitry group, the circuitry
00:58:56 building that sort of tries to solve the circuitry of everything.
00:58:59 And then the immune folks will apply this knowledge to all of the disorders that are
00:59:05 associated with immune dysfunction and the schizophrenia folks will basically be interacting
00:59:12 with both the immune folks and with the neuronal folks.
00:59:15 And all of them will be interacting with the circuitry folks and so on and so forth.
00:59:19 So that’s sort of the current structure of my group, if you wish.
00:59:22 So basically what we’re doing is focusing on the fundamental circuitry.
00:59:27 But at the same time, we’re the users of our own tools by collaborating with many other
00:59:34 labs in every one of these disorders that we mentioned.
00:59:37 We basically have a heart focus on cardiovascular disease, coronary artery disease, heart failure
00:59:42 and so on and so forth.
00:59:44 We have an immune focus on several immune disorders.
00:59:48 We have a cancer focus on metastatic melanoma and immunotherapy response.
00:59:55 We have a psychiatric disease focus on schizophrenia, autism, PTSD, and other psychiatric disorders.
01:00:04 We have an Alzheimer’s and neurodegeneration focus on Huntington’s disease, ALS and, you
01:00:10 know, AD related disorders like frontotemporal dementia and Lewy body dementia.
01:00:14 And of course, a huge focus on Alzheimer’s.
01:00:16 We have a metabolic focus on the role of exercise and diets and sort of how they’re impacting
01:00:23 metabolic organs across the body and across many different tissues.
01:00:29 And all of them are interfacing with the circuitry.
01:00:34 And the reason for that is another computer science principle of eat your own dog food.
01:00:42 If everybody ate their own dog food, dog food would taste a lot better.
01:00:47 The reason why Microsoft Excel and Word and PowerPoint were so important and so successful
01:00:55 is because the employees that were working on them were using them for their day to
01:01:00 day tasks.
01:01:01 You can’t just simply build a circuitry and say, here it is guys, take the circuitry,
01:01:06 we’re done without being the users of that circuitry because you then go back.
01:01:11 And because we span the whole spectrum from profiling the epigenome, using comparative
01:01:16 genomics, finding the important nucleotides in the genome, building the basic functional
01:01:21 map of what are the genes in the human genome, what are the gene regulatory elements of the
01:01:26 human genome.
01:01:27 I mean, over the years we’ve written a series of papers on how do you find human genes in
01:01:31 the first place using comparative genomics?
01:01:34 How do you find the motifs that are the building blocks of gene regulation using comparative
01:01:38 genomics?
01:01:39 And how do you then find how these motifs come together and act in specific tissues
01:01:44 using epigenomics?
01:01:46 How do you link regulators to enhancers and enhancers to their target genes using epigenomics
01:01:53 and regulatory genomics?
01:01:55 So through the years we’ve basically built all this infrastructure for understanding
01:02:00 what I like to say, every single nucleotide of the human genome and how it acts in every
01:02:06 one of the major cell types and tissues of the human body.
01:02:10 I mean, this is no small task.
01:02:12 This is an enormous task that takes the entire field.
01:02:15 And that’s something that my group has taken on along with many other groups.
01:02:20 And we have also, and that’s sort of the thing that perhaps sets my group apart, we have also
01:02:25 worked with specialists in every one of these disorders to basically further our understanding
01:02:30 all the way down to disease and in some cases collaborating with pharma to go all the way
01:02:35 down to therapeutics because of our deep, deep understanding of that basic circuitry
01:02:42 and how it allows us to now improve the circuitry.
01:02:47 Not just treat it as a black box, but basically go and say, okay, we need a better cell type
01:02:51 specific wiring than we now have at the tissue specific level.
01:02:56 So we’re focusing on that because we’re understanding the needs from the disease front.
01:03:01 So you have a sense of the entire pipeline, I mean, one, maybe you can indulge me.
01:03:08 One nice question to ask would be, how do you, from the scientific perspective, go from
01:03:14 knowing nothing about the disease to going, you said, to go into the entire pipeline and
01:03:22 actually have a drug or a treatment that cures that disease?
01:03:26 So that’s an enormously long path and an enormously great challenge.
01:03:32 And what I’m trying to argue is that it progresses in stages of understanding rather than one
01:03:39 gene at a time.
01:03:40 The traditional view of biology was you have one postdoc working on this gene and another
01:03:45 postdoc working on that gene, and they’ll just figure out everything about that gene
01:03:50 and that’s their job.
01:03:52 But we’ve realized how polygenic the diseases are, so we can’t have one postdoc per gene
01:03:57 anymore.
01:03:58 We now have to have these cross cutting needs.
01:04:04 And I’m going to describe the path to circuitry along those needs.
01:04:10 And every single one of these paths, we are now doing in parallel across thousands of
01:04:15 genes.
01:04:17 So the first step is you have a genetic association, and we talked a little bit about sort of the
01:04:23 Mendelian path and the polygenic path to that association.
01:04:27 So the Mendelian path was looking through families to basically find gene regions and
01:04:33 ultimately genes that are underlying particular disorders.
01:04:36 The polygenic path is basically looking at unrelated individuals in this giant matrix
01:04:43 of genotype by phenotype, and then finding hits where a particular variant impacts disease
01:04:49 all the way to the end.
01:04:51 And then we now have a connection, not between a gene and a disease, but between a genetic
01:04:57 region and a disease.
01:05:00 And that distinction is not understood by most people.
01:05:03 So I’m going to explain it a little bit more.
01:05:06 Why do we not have a connection between a gene and a disease, but we have a connection
01:05:11 between a genetic region and a disease?
01:05:13 The reason for that is that 93% of genetic variants that are associated with disease
01:05:21 don’t impact the protein at all.
01:05:27 So if you look at the human genome, there’s 20,000 genes, there’s 3.2 billion nucleotides.
01:05:33 Only 1.5% of the genome codes for proteins.
01:05:40 The other 98.5% does not code for proteins.
01:05:46 If you now look at where are the disease variants located, 93% of them fall in that outside
01:05:54 the genes portion.
01:05:55 Of course, genes are enriched, but they’re only enriched by a factor of three.
01:06:00 That means that still 93% of genetic variants fall outside the proteins.
01:06:06 Why is that difficult?
01:06:08 Why is that a problem?
01:06:09 The problem is that when a variant falls outside the gene, you don’t know what gene is impacted
01:06:15 by that variant.
01:06:16 You can’t just say, oh, it’s near this gene, let’s just connect that variant to the gene.
01:06:21 And the reason for that is that the genome circuitry is very often long range.
01:06:27 So you basically have that genetic variant that could sit in the intron of one gene.
01:06:34 An intron is sort of the place between the exons that code for proteins.
01:06:38 So proteins are split up into exons and introns and every exon codes for a particular subset
01:06:43 of amino acids and together they’re spliced together and then make the final protein.
01:06:49 So that genetic variant might be sitting in an intron of a gene.
01:06:51 It’s transcribed with the gene, it’s processed and then excised, but it might not impact
01:06:56 this gene at all.
01:06:57 It might actually impact another gene that’s a million nucleotides away.
01:07:01 So it’s just riding along even though it has nothing to do with this nearby neighborhood.
01:07:05 That’s exactly right.
01:07:06 Let me give you an example.
01:07:09 The strongest genetic association with obesity was discovered in this FTO gene, fat and obesity
01:07:16 associated gene.
01:07:18 So this FTO gene was studied ad nauseam.
01:07:23 People did tons of experiments on it.
01:07:26 They figured out that FTO is in fact RNA methylation transferase.
01:07:33 It basically impacts something that we call the epitranscriptome.
01:07:38 Just like the genome can be modified, the transcriptome, the transcript of the genes
01:07:43 can be modified.
01:07:44 And we basically said, oh great, that means that epitranscriptomics is hugely involved
01:07:49 in obesity because that gene FTO is clearly where the genetic locus is at.
01:07:56 My group studied FTO in collaboration with a wonderful team led by Melina Claussnitzer.
01:08:04 And what we found is that this FTO locus, even though it is associated with obesity,
01:08:11 does not implicate the FTO gene.
01:08:16 The genetic variant, it’s in the first intron of the FTO gene, but it controls two genes,
01:08:22 IRX3 and IRX5 that are sitting 1.2 million nucleotides away, several genes away.
01:08:32 Oh boy.
01:08:33 What am I supposed to feel about that because isn’t that like super complicated then?
01:08:38 So the way that I was introduced at a conference a few years ago was, and here’s Manolis Kellis
01:08:43 who wrote the most depressing paper of 2015.
01:08:48 And the reason for that is that the entire pharmaceutical industry was so comfortable
01:08:52 that there was a single gene in that locus.
01:08:56 Because in some loci, you basically have three dozen genes that are all sitting in the same
01:08:59 region of association and you’re like, oh gosh, which ones of those is it?
01:09:04 But even that question of which ones of those is it is making the assumption that it is
01:09:08 one of those as opposed to some random gene just far, far away, which is what our paper
01:09:13 showed.
01:09:14 So basically what our paper showed is that you can’t ignore the circuitry.
01:09:19 You have to first figure out the circuitry, all of those long range interactions, how
01:09:23 every genetic variant impacts the expression of every gene in every tissue imaginable across
01:09:28 hundreds of individuals.
01:09:30 And then you now have one of the building blocks, not even all of the building blocks
01:09:35 for then going and understanding disease.
01:09:41 So embrace the wholeness of the circuitry.
01:09:44 Correct.
01:09:45 So back to the question of starting knowing nothing to the disease and going to the treatment.
01:09:51 So what are the next steps?
01:09:53 So you basically have to first figure out the tissue and then describe how you figure
01:09:57 out the tissue.
01:09:58 You figure out the tissue by taking all of these non coding variants that are sitting
01:10:01 outside proteins and then figuring out what are the epigenomic enrichments.
01:10:06 And the reason for that, you know, thankfully is that there is convergence, that the same
01:10:13 processes are impacted in different ways by different loci.
01:10:19 And that’s a saving grace for our field.
01:10:23 The fact that if I look at hundreds of genetic variants associated with Alzheimer’s, they
01:10:27 localize in a small number of processes.
01:10:31 Can you clarify why that’s hopeful?
01:10:34 So like they show up in the same exact way in the, in the specific set of processes.
01:10:40 Yeah.
01:10:41 So basically there’s a small number of biological processes that underlie, or at least that
01:10:45 play the biggest role in every disorder.
01:10:48 So in Alzheimer’s you basically have, you know, maybe 10 different types of processes.
01:10:54 One of them is lipid metabolism.
01:10:56 One of them is immune cell function.
01:10:58 One of them is neuronal energetics.
01:11:02 So these are just a small number of processes, but you have multiple lesions, multiple genetic
01:11:07 perturbations that are associated with those processes.
01:11:10 So if you look at schizophrenia, it’s excitatory neuron function, it’s inhibitory neuron function,
01:11:15 it’s synaptic pruning, it’s calcium signaling and so on and so forth.
01:11:18 So when you look at disease genetics, you have one hit here and one hit there and one
01:11:24 hit there and one hit there, completely different parts of the genome.
01:11:28 But it turns out all of those hits are calcium signaling proteins.
01:11:31 Oh, cool.
01:11:32 You’re like, aha.
01:11:34 That means that calcium signaling is important.
01:11:37 So those people who are focusing on one locus at a time cannot possibly see that picture.
01:11:42 You have to become a genomicist.
01:11:44 You have to look at the omics, the ome, the holistic picture to understand these enrichments.
01:11:51 But you mentioned the convergence thing.
01:11:54 The whatever the thing associated with the disease shows up.
01:11:58 So let me explain convergence.
01:12:00 Convergence is such a beautiful concept.
01:12:03 So you basically have these four genes that are converging on calcium signaling.
01:12:12 So that basically means that they are acting each in their own way, but together in the
01:12:18 same process.
01:12:19 But now in every one of these loci, you have many enhancers controlling each of those genes.
01:12:27 That’s another type of convergence where dysregulation of seven different enhancers might all converge
01:12:33 on dysregulation of that one gene, which then converges on calcium signaling.
01:12:39 And in each one of those enhancers, you might have multiple genetic variants distributed
01:12:44 across many different people.
01:12:46 Everyone has their own different mutation.
01:12:49 But all of these mutations are impacting that enhancer.
01:12:52 And all of these enhancers are impacting that gene.
01:12:55 And all of these genes are impacting this pathway.
01:12:57 And all these pathways are acting in the same tissue.
01:13:00 And all of these tissues are converging together on the same biological process of schizophrenia.
01:13:05 And you’re saying the saving grace is that that convergence seems to happen for a lot
01:13:09 of these diseases.
01:13:11 For all of them.
01:13:12 Basically that for every single disease that we’ve looked at, we have found an epigenomic
01:13:17 enrichment.
01:13:18 How do you do that?
01:13:19 You basically have all of the genetic variants associated with the disorder.
01:13:24 And then you’re asking for all of the enhancers active in a particular tissue.
01:13:28 For 540 disorders, we’ve basically found that indeed there is an enrichment.
01:13:33 That basically means that there is commonality.
01:13:37 And from the commonality, we can just get insights.
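The enrichment question described here can be sketched in a few lines of Python. This is a minimal illustration, not the group's actual method: it assumes a simple hypergeometric test, and the function names and all counts are hypothetical. It asks whether disease-associated variants fall inside the enhancers of one tissue more often than chance would predict.

```python
from math import comb

def fold_enrichment(total_variants, in_enhancers, disease_variants, disease_in_enhancers):
    """Observed fraction of disease variants in enhancers over the background fraction."""
    observed = disease_in_enhancers / disease_variants
    expected = in_enhancers / total_variants
    return observed / expected

def enrichment_pvalue(total_variants, in_enhancers, disease_variants, disease_in_enhancers):
    """Hypergeometric tail: chance of seeing at least this many disease
    variants in enhancers if variants were placed at random."""
    denom = comb(total_variants, disease_variants)
    upper = min(in_enhancers, disease_variants)
    tail = sum(
        comb(in_enhancers, k) * comb(total_variants - in_enhancers, disease_variants - k)
        for k in range(disease_in_enhancers, upper + 1)
    )
    return tail / denom

# Hypothetical counts: 10,000 variants overall, 500 inside the enhancers of
# one tissue, 100 disease-associated variants, 20 of which hit those enhancers.
fold = fold_enrichment(10_000, 500, 100, 20)
pval = enrichment_pvalue(10_000, 500, 100, 20)
print(f"fold enrichment = {fold:.1f}, p = {pval:.2e}")
```

Repeating such a test for every tissue's enhancer map and keeping the strongest signal is one simple way to nominate a tissue of action.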
01:13:40 So to explain in mathematical terms, we’re basically building an empirical prior.
01:13:47 We’re using a Bayesian approach to basically say, great, all of these variants are equally
01:13:52 likely in a particular locus to be important.
01:13:57 So in a genetic locus, you basically have a dozen variants that are coinherited.
01:14:02 Because the way that inheritance works in the human genome is through all of these recombination
01:14:07 events during meiosis, you basically have, you know, you inherit maybe three, chromosome
01:14:16 three, for example, in your body is inherited from four different parts.
01:14:20 One part comes from your dad, another part comes from your mom, another part comes from
01:14:23 your dad, another part comes from your mom.
01:14:25 So basically, the way that it, sorry, from your mom’s mom.
01:14:30 So you basically have one copy that comes from your dad and one copy that comes from
01:14:33 your mom.
01:14:34 But that copy that you got from your mom is a mixture of her maternal and her paternal
01:14:39 chromosome.
01:14:41 And the copy that you got from your dad is a mixture of his maternal and his paternal
01:14:44 chromosome.
01:14:45 So these breakpoints that happen when chromosomes are lining up are basically ensuring through
01:14:53 these crossover events, they’re ensuring that every child cell during the process of meiosis,
01:15:02 where you basically have, you know, one spermatozoid that basically couples with one ovule to basically
01:15:08 create one egg to basically create the zygote.
01:15:12 You basically have half of your genome that comes from dad and half your genome that comes
01:15:16 from mom.
01:15:17 But in order to line them up, you basically have these crossover events.
01:15:21 These crossover events are basically leading to coinheritance of that entire block coming
01:15:27 from your maternal grandmother and that entire block coming from your maternal grandfather.
01:15:33 Over many generations, these crossover events don’t happen randomly.
01:15:38 There’s a protein called PRDM9 that basically guides the double stranded breaks and then
01:15:45 leads to these crossovers.
01:15:48 And that protein has a particular preference to only a small number of hotspots of recombination,
01:15:54 which then lead to a small number of breaks between these coinheritance patterns.
01:15:59 So even though there are 6 million variants, there are 6 million loci, this variation is
01:16:06 inherited in blocks and every one of these blocks has like two dozen genetic variants
01:16:12 that are all associated.
01:16:13 So in the case of FTO, it wasn’t just one variant, it was 89 common variants that were
01:16:19 all humongously associated with obesity.
01:16:24 Which one of those is the important one?
01:16:26 Well, if you look at only one locus, you have no idea.
01:16:29 But if you look at many loci, you basically say, aha, all of them are enriching in the
01:16:36 same epigenomic map.
01:16:40 In that particular case, it was mesenchymal stem cells.
01:16:44 So these are the progenitor cells that give rise to your brown fat and your white fat.
01:16:50 Progenitor is like the early on developmental stem cells?
01:16:54 So you start from one zygote and that’s a totipotent cell type.
01:16:58 It can do anything.
01:17:00 You then, you know, that cell divides, divides, divides, and then every cell division is leading
01:17:08 to specialization where you now have a mesodermal lineage and ectodermal lineage and endodermal
01:17:14 lineage that basically leads to different parts of your body.
01:17:19 The ectoderm will basically give rise to your skin, ecto means outside, derm is skin.
01:17:25 So ectoderm, but it also gives rise to your neurons and your whole brain.
01:17:29 So that’s a lot of ectoderm.
01:17:31 Mesoderm gives rise to your internal organs, including the vasculature and you know, your
01:17:36 muscle and stuff like that.
01:17:38 So you basically have this progressive differentiation and then if you look further, further down
01:17:45 that lineage, you basically have one lineage that will give rise to both your muscle and
01:17:49 your bone, but also your fat.
01:17:52 And if you go further down the lineage of your fat, you basically have your white fat
01:17:57 cells.
01:17:59 These are the cells that store energy.
01:18:01 So when you eat a lot, but you don’t exercise too much, there’s an excess set of calories,
01:18:06 excess energy.
01:18:07 What do you do with those?
01:18:08 You basically create, you spend a lot of that energy to create these high energy molecules,
01:18:13 lipids, which you can then burn when you need them on a rainy day.
01:18:19 So that leads to obesity if you don’t exercise and if you overeat because your body’s like,
01:18:26 oh great, I have all these calories.
01:18:27 I’m going to store them.
01:18:28 Ooh, more calories.
01:18:29 I’m going to store them too.
01:18:30 Ooh, more calories.
01:18:31 So basically the 42% of European chromosomes have a predisposition to storing fat, which
01:18:40 was selected probably in the food scarcity periods, like basically as we were exiting
01:18:48 Africa before and during the ice ages, there was probably a selection to those individuals
01:18:54 who made it North to basically be able to store energy, a lot more energy.
01:19:00 So you basically now have this lineage that is deciding whether you want to store energy
01:19:07 in your white fat or burn energy in your beige fat.
01:19:11 It turns out that your fat is, you know, like we have such a bad view of fat.
01:19:18 Fat is your best friend.
01:19:20 Fat can both store all these excess lipids that would be otherwise circulating through
01:19:24 your body and causing damage, but it can also burn calories directly.
01:19:29 If you have too much energy, you can just choose to just burn some of that as heat.
01:19:35 So basically when you’re cold, you’re burning energy to basically warm your body up and
01:19:41 you’re burning all these lipids and you’re burning all these calories.
01:19:44 So what we basically found is that across the board, genetic variants associated with
01:19:50 obesity across many of these regions were all enriched repeatedly in mesenchymal stem
01:19:56 cell enhancers.
01:19:58 So that gave us a hint as to which of these genetic variants was likely driving this whole
01:20:05 association.
01:20:06 And we ended up with this one genetic variant called rs1421085.
01:20:14 And that genetic variant out of the 89 was the one that we predicted to be causal for
01:20:20 the disease.
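The idea of using the tissue enrichment as an empirical prior can be illustrated with a toy calculation in Python. It assumes, purely for illustration, that all coinherited variants in a block have identical association evidence, so the only thing separating them is whether they overlap an enhancer in the enriched tissue. The variant names, the fold value, and the function are hypothetical, not taken from the study.

```python
def posterior_from_prior(variants, enrichment_fold):
    """variants maps a variant name to True if it overlaps an enhancer
    in the enriched tissue. With equal association evidence for every
    variant, the posterior is just the normalized prior weight."""
    weights = {
        name: (enrichment_fold if in_enhancer else 1.0)
        for name, in_enhancer in variants.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical block of 89 coinherited variants, one of which sits in an
# enhancer of the enriched tissue; the 10-fold prior weight is an assumption.
block = {f"other_{i}": False for i in range(88)}
block["candidate"] = True

posterior = posterior_from_prior(block, enrichment_fold=10.0)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

With a uniform prior each of the 89 variants would get about 1.1% of the probability; the enhancer overlap alone lifts the candidate to roughly 10%, which is why pooling evidence across many loci helps prioritize within one locus.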
01:20:21 Wow.
01:20:22 So going back to those steps, first step is figure out the relevant tissue based on the
01:20:26 global enrichment.
01:20:27 Second step is figure out the causal variant among many variants in this linkage disequilibrium
01:20:34 in this coinherited block between these recombination hotspots, these boundaries of these inherited
01:20:41 blocks.
01:20:42 That’s the second step.
01:20:43 The third step is once you know that causal variant, try to figure out what is the motif
01:20:49 that is disrupted by that causal variant.
01:20:52 Basically how does it act?
01:20:54 Variants don’t just disrupt elements, they disrupt the binding of specific regulators.
01:20:59 So basically the third step there was how do you find the motif that is responsible
01:21:04 like the gene regulatory word, the building block of gene regulation that is responsible
01:21:10 for that dysregulatory event.
01:21:12 And the fourth step is finding out what regulator normally binds that motif and is now no longer
01:21:18 able to bind.
01:21:19 And then once you have the regulator, can you then try to figure out how to, what after
01:21:24 it developed, how to fix it?
01:21:27 That’s exactly right.
01:21:28 You now know how to intervene.
01:21:30 You have basically a regulator, you have a gene that you can then perturb and you say,
01:21:34 well, maybe that regulator has a global role in obesity.
01:21:38 I can perturb the regulator.
01:21:40 Just to clarify, when we say perturb, like on the scale of a human life, can a human
01:21:46 being be helped?
01:21:49 Of course.
01:21:50 Yeah.
01:21:51 I guess understanding is the first step.
01:21:52 No, no, but perturbed basically means you now develop therapeutics, pharmaceutical therapeutics
01:21:57 against that.
01:21:59 Or you develop other types of intervention that affect the expression of that gene.
01:22:03 What do pharmaceutical therapeutics look like when your understanding is on a genetic level?
01:22:11 Yeah.
01:22:12 Sorry if it’s a dumb question.
01:22:13 No, no, no.
01:22:14 It’s a brilliant question, but I want to save it for a little bit later when we start talking
01:22:16 about therapeutics.
01:22:17 Perfect.
01:22:18 So let’s talk about the first four steps.
01:22:20 There’s two more.
01:22:21 So basically the first step is figure out, I mean, the zero step, the starting point
01:22:25 is the genetics.
01:22:26 The first step after that is figure out the tissue of action.
01:22:31 The second step is figuring out the nucleotide that is responsible or set of nucleotides.
01:22:36 The third step is figuring out the motif and the upstream regulator, number four.
01:22:40 Number five and six is what are the targets?
01:22:44 So number five is great.
01:22:45 Now I know the regulator.
01:22:47 I know the motif.
01:22:48 I know the tissue and I know the variant.
01:22:51 What does it actually do?
01:22:53 So you have to now trace it to the biological process and the genes that mediate that biological
01:22:59 process.
01:23:00 So knowing all of this can now allow you to find the target genes.
01:23:05 How?
01:23:06 By basically doing perturbation experiments or by looking at the folding of the epigenome
01:23:13 or by looking at the genetic impact of that genetic variant on the expression of genes.
01:23:19 And we use all three.
01:23:21 So let me go through them.
01:23:22 Basically one of them is physical links.
01:23:26 This is the folding of the genome onto itself.
01:23:29 How do you even figure out the folding?
01:23:32 It’s a little bit of a tangent, but it’s a super awesome technology.
01:23:36 Think of the genome as again, this massive packaging that we talked about of taking two
01:23:41 meters worth of DNA and putting it in something that’s a million times smaller than two meters
01:23:48 worth of DNA.
01:23:49 That’s a single cell.
01:23:51 You basically have this massive packaging and this packaging basically leads to the
01:23:56 chromosome being wrapped around in sort of tight, tight ways in ways, however, that are
01:24:02 functionally capable of being reopened and reclosed.
01:24:07 So I can then go in and figure out that folding by sort of chopping up the spaghetti soup,
01:24:15 putting glue and ligating the segments that were chopped up but nearby each other, and
01:24:21 then sequencing through these ligation events to figure out that this region of this chromosome,
01:24:26 that region of the chromosome were near each other.
01:24:28 That means they were interacting even though they were far away on the genome itself.
01:24:33 So that chopping up, sequencing and reglueing is basically giving you folds of the genome
01:24:42 that we call.
01:24:43 Sorry, can you backtrack?
01:24:44 Of course.
01:24:45 How does cutting it help you figure out which ones were close in the original folding?
01:24:50 So you have a bowl of noodles.
01:24:53 Go on.
01:24:54 And in that bowl of noodles, some noodles are near each other.
01:24:59 Yes.
01:25:00 So you throw in a bunch of glue, you basically freeze the noodles in place, throw in a cutter
01:25:06 that chops up the noodles into little pieces.
01:25:10 Now throw in some ligation enzyme that lets those pieces that were free religate near
01:25:18 each other.
01:25:19 In some cases, they religate what you had just cut, but that’s very rare.
01:25:24 Most of the time they will religate in whatever was proximal.
01:25:30 You now have glued the red noodle that was crossing the blue noodle to each other.
01:25:36 You then reverse the glue, the glue goes away and you just sequence the heck out of it.
01:25:43 Most of the time you’ll find red segment with, you know, red segment, but you can specifically
01:25:48 select for ligation events that have happened that were not from the same segment by sort
01:25:52 of marking them in a particular way and then selecting those and then you sequence and
01:25:57 you look for red with blue matches of sort of things that were glued that were not immediate
01:26:03 proximal to each other.
01:26:05 And that reveals the linking of the blue noodle and the red noodle.
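The last step of the noodle experiment, turning sequenced ligation events into a map of which regions were near each other, can be sketched as a counting exercise in Python. The input format, the bin size, and the coordinates below are assumptions made for illustration; real pipelines involve read alignment and normalization that are omitted here.

```python
from collections import Counter

def contact_counts(read_pairs, bin_size):
    """Count ligation events between genomic bins.

    Each read pair is ((chrom, position), (chrom, position)): the two ends
    of one sequenced ligation product. Pairs that fall in the same bin are
    skipped, since they mostly reflect a segment religating with itself.
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in read_pairs:
        bin_a = (chrom_a, pos_a // bin_size)
        bin_b = (chrom_b, pos_b // bin_size)
        if bin_a == bin_b:
            continue  # "red with red": not informative about folding
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts

# Hypothetical read pairs (coordinates are made up).
pairs = [
    (("chr16", 1_000_500), ("chr16", 2_200_300)),  # distant regions, ligated
    (("chr16", 2_210_000), ("chr16", 1_050_000)),  # same two bins, reversed order
    (("chr16", 1_000_500), ("chr16", 1_001_000)),  # same bin, ignored
]
for bins, n in contact_counts(pairs, bin_size=100_000).items():
    print(bins, n)
```

Bin pairs with high counts despite a large distance along the chromosome are the "red with blue" matches: regions that were folded next to each other in the cell.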
01:26:08 You’re with me so far?
01:26:09 Yeah.
01:26:10 Good.
01:26:11 So we’ve done these experiments.
01:26:12 That’s the physical.
01:26:13 That’s the physical.
01:26:14 That’s step one of the physical.
01:26:15 And what the physical revealed is topologically associated domains, basically big blocks of
01:26:20 the genome that are topologically connected together.
01:26:25 That’s the physical.
01:26:26 The second one is the genetic links.
01:26:30 It basically says across individuals that have different genetic variants, how are their
01:26:37 genes expressed differently?
01:26:39 Remember before I was saying that the path between genetics and disease is enormous,
01:26:43 but we can break it up to look at the path between genetics and gene expression.
01:26:47 So instead of using Alzheimer’s as a phenotype, I can now use expression of IRX3 as the phenotype,
01:26:54 expression of gene A. And I can look at all of the humans who contain a G at that location
01:27:01 and all the humans that contain a T at that location and basically say, wow, it turns
01:27:05 out that the expression of that gene is higher for the T humans than for the G humans at
01:27:09 that location.
01:27:10 So that basically gives me a genetic link between a genetic variant, a locus, a region,
01:27:16 and the expression of nearby genes.
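The genetic link described here amounts to treating gene expression as the phenotype and comparing it between people who carry different letters at one position. A minimal Python sketch, with made-up individuals and expression values and the simplifying assumption that each person is labeled with a single allele:

```python
from statistics import mean

def expression_by_genotype(genotypes, expression):
    """Average the expression of one gene separately for each allele group."""
    groups = {}
    for person, allele in genotypes.items():
        groups.setdefault(allele, []).append(expression[person])
    return {allele: mean(values) for allele, values in groups.items()}

# Hypothetical data: allele at one variant, and expression of one gene.
genotypes = {"p1": "T", "p2": "T", "p3": "G", "p4": "G"}
expression = {"p1": 8.0, "p2": 10.0, "p3": 4.0, "p4": 6.0}

print(expression_by_genotype(genotypes, expression))
# {'T': 9.0, 'G': 5.0}
```

In practice this comparison would be run with a proper statistical test across hundreds of individuals and for many genes near the variant; the sketch only shows the shape of the question.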
01:27:19 Good on the genetic link?
01:27:20 I think so.
01:27:21 Awesome.
01:27:22 The third link is the activity link.
01:27:25 What’s an activity link?
01:27:26 It basically says if I look across 833 different epigenomes, whenever this enhancer is active,
01:27:34 this gene is active.
01:27:36 That gives me an activity link between this region of the DNA and that gene.
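An activity link of this kind can be approximated as a correlation across samples: if the enhancer is active in the same epigenomes where the gene is expressed, the two are linked. The Python sketch below uses a plain Pearson correlation on six made-up samples; the choice of statistic and all values are assumptions for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values across six epigenomes:
# 1 = enhancer active, 0 = inactive; gene expression in arbitrary units.
enhancer_active = [0, 1, 1, 0, 1, 0]
gene_expression = [1.0, 5.0, 6.0, 0.5, 5.5, 1.5]

r = pearson(enhancer_active, gene_expression)
print(f"activity link correlation: {r:.2f}")
```

A high correlation across many epigenomes suggests a link, but as the conversation goes on to note, it is correlational rather than causal.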
01:27:42 And then the fourth one is perturbations where I can go in and blow up that region and see
01:27:47 what are the genes that change in expression, or I can go in and over activate that region
01:27:51 and see what genes change in expression.
01:27:55 So I guess that’s similar to activity?
01:27:57 Yeah.
01:27:58 Yeah.
01:27:59 So that’s basically similar to activity.
01:28:00 I agree, but it’s causal rather than correlational.
01:28:02 Again, I’m a little weird.
01:28:04 No, no, you’re 100% on.
01:28:07 It’s exactly the same as the perturbation where I go in and intervene.
01:28:11 I basically take a bunch of cells.
01:28:13 So you know CRISPR, right?
01:28:16 CRISPR is this genome guidance and cutting mechanism.
01:28:21 That’s what George Church likes to call genome vandalism.
01:28:24 So you basically are able to, you can basically take a guide RNA that you put into the CRISPR
01:28:32 system, and the CRISPR system will basically use this guide RNA, scan the genome, find
01:28:38 wherever there’s a match, and then cut the genome.
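The "scan the genome, find a match" step can be illustrated with a small string search in Python. This is a deliberate simplification: it looks only for exact matches of the guide, written as DNA letters, on both strands. Real CRISPR targeting also requires a short PAM sequence next to the match and tolerates some mismatches, neither of which is modeled here. The sequences and function names are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_guide_matches(genome, guide):
    """Return (position, strand) for every exact match of the guide."""
    hits = []
    for strand, target in (("+", guide), ("-", reverse_complement(guide))):
        start = genome.find(target)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(target, start + 1)
    return sorted(hits)

# Toy genome with one match on each strand.
genome = "AAGACCTATTTTAGGTCAA"
guide = "GACCTA"
print(find_guide_matches(genome, guide))
# [(2, '+'), (11, '-')]
```

The point of the sketch is how little has to change between targets: only the guide string, which matches the later remark that you "just type it in your computer."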
01:28:42 So I digress, but it’s a bacterial immune defense system.
01:28:48 So basically bacteria are constantly attacked by viruses, but sometimes they win against
01:28:54 the viruses and they chop up these viruses.
01:28:56 And remember as a trophy inside their genome, they have these loci, these CRISPR loci that
01:29:02 basically stands for clustered repeats, interspersed, et cetera.
01:29:06 So basically it’s an interspersed repeats structure where basically you have a set of
01:29:11 repetitive regions and then interspersed were these variable segments that were basically
01:29:17 matching viruses.
01:29:19 So when this was first discovered, it was basically hypothesized that this is probably
01:29:24 a bacterial immune system that remembers the trophies of the viruses that it managed to kill.
01:29:30 And then the bacteria pass on, you know, they sort of do lateral transfer of DNA and they
01:29:34 pass on these memories so that the next bacterium says, Ooh, you killed that guy.
01:29:39 When that guy shows up again, I will recognize him.
01:29:41 And the CRISPR system was basically evolved as a bacterial adaptive immune response to
01:29:47 sense foreigners that should not belong and to just go and cut their genome.
01:29:52 So it’s an RNA guided RNA cutting enzyme or an RNA guided DNA cutting enzyme.
01:30:00 So there’s different systems.
01:30:02 Some of them cut DNA, some of them cut RNA, but all of them remember this sort of viral
01:30:08 attack.
01:30:10 So what we have done now as a field is, you know, through the work of, you know, Jennifer
01:30:15 Doudna, Emmanuelle Charpentier, Feng Zhang and many others is coopted that system of bacterial
01:30:23 immune defense as a way to cut genomes.
01:30:26 You basically have this guiding system that allows you to use an RNA guide to bring enzymes
01:30:35 to cut DNA at a particular locus.
01:30:37 That’s so fascinating.
01:30:39 So this is like already a natural mechanism, a natural tool for cutting that was useful in a
01:30:45 particular context.
01:30:46 And we’re like, well, we can use that thing to actually, it’s a nice tool that’s already
01:30:51 in the body.
01:30:52 Yeah.
01:30:53 Yeah.
01:30:54 It’s not in our body.
01:30:55 It’s in the bacterial body.
01:30:56 It was discovered by the yogurt industry.
01:30:59 They were trying to make better yogurts and they were trying to make their bacteria in
01:31:03 their yogurt cultures more resilient to viruses.
01:31:08 And they were studying bacteria and they found that, wow, this CRISPR system is awesome.
01:31:12 It allows you to defend against that.
01:31:14 And then it was coopted in mammalian systems that don’t use anything like that as a targeting
01:31:20 way to basically bring these DNA cutting enzymes to any locus in the genome.
01:31:25 Why would you want to cut DNA to do anything?
01:31:29 The reason is that our DNA has a DNA repair mechanism where if a region of the genome
01:31:35 gets randomly cut, you will basically scan the genome for anything that matches and sort
01:31:40 of use it by homology.
01:31:43 So the reason why we’re diploid is because we now have a spare copy.
01:31:47 As soon as my mom’s copy is deactivated, I can use my dad’s copy.
01:31:50 And somewhere else, if my dad’s copy is deactivated, I can use my mom’s copy to repair it.
01:31:55 So this is called homology based repair.
01:31:59 So all you have to do is the cutting and you don’t have to do the fixing.
01:32:04 That’s exactly right.
01:32:05 You don’t have to do the fixing.
01:32:06 Because it’s already built in.
01:32:07 That’s exactly right.
01:32:08 But the fixing can be coopted by throwing in a bunch of homologous segments that instead
01:32:14 of having your dad’s version, have whatever other version you’d like to use.
01:32:19 So you then control the fixing by throwing in a bunch of other stuff.
01:32:24 That’s exactly right.
01:32:25 And that’s how you do genome editing.
01:32:26 So that’s what CRISPR is.
01:32:27 That’s what CRISPR is.
01:32:28 In popular culture, people use the term.
01:32:30 I’ve never, wow, that’s brilliant.
01:32:32 So CRISPR is genome vandalism followed by a bunch of band aids that have the sequence
01:32:39 that you’d like.
01:32:40 And you could control the choices of band aids.
01:32:43 Correct.
01:32:44 And of course there’s new generations of CRISPR.
01:32:46 There’s something that’s called prime editing that was sort of very, very much in the press
01:32:50 recently that basically instead of sort of making a double stranded break, which again
01:32:55 is genome vandalism, you basically make a single stranded break.
01:33:00 You basically just nick one of the two strands, enabling you to sort of peel off without sort
01:33:06 of completely breaking it up and then repair it locally using a guide that is coupled to
01:33:13 your initial RNA that took you to that location.
01:33:18 Dumb question, but is CRISPR as awesome and cool as it sounds?
01:33:24 I mean, technically speaking, in terms of like as a tool for manipulating our genetics
01:33:31 in the positive meaning of the word manipulating, or is there downsides, drawbacks in this whole
01:33:39 context of therapeutics that we’re talking about or understanding and so on?
01:33:42 So when I teach my students about CRISPR, I show them articles with the headline, genome
01:33:50 editing tool revolutionizes biology.
01:33:53 And then I show them the date of these articles and they’re 2004, like five years before CRISPR
01:33:58 was invented.
01:33:59 And the reason is that they’re not talking about CRISPR.
01:34:02 They’re talking about zinc finger enzymes that are another way to bring these cutters
01:34:07 to the genome.
01:34:09 It’s a very difficult way of sort of designing the right set of zinc finger proteins, the
01:34:13 right set of amino acids that will now target a particular long stretch of DNA because for
01:34:20 every location that you want to target, you need to design a particular regulator, a particular
01:34:25 protein that will match that region well.
01:34:28 There’s another technology called TALENs, which are basically just a different way of
01:34:35 using proteins to sort of guide these cutters to a particular location of the genome.
01:34:41 These require a massive team of engineers, of biological engineers to basically design
01:34:46 a set of amino acids that will target a particular sequence of your genome.
01:34:51 The reason why CRISPR is amazingly, awesomely revolutionary is because instead of having
01:34:57 this team of engineers design a new set of proteins for every locus that you want to
01:35:02 target, you just type it in your computer and you just synthesize an RNA guide.
01:35:07 The beauty of CRISPR is not the cutting, it’s not the fixing.
01:35:11 All of that was there before.
01:35:12 It’s the guiding, and the only thing that changes is that it makes the guiding easier
01:35:17 by sort of just typing in the RNA sequence, which then allows the system to sort of scan
01:35:23 the DNA to find that.
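As a rough illustration of the "type in the RNA sequence and let the system scan the DNA" idea, here is a minimal Python sketch. It is a toy, not a guide-design tool: the `find_guide_sites` helper, the example guide, and the example DNA are all made up, and the requirement of an NGG motif right after the matched site is an assumption based on the commonly used SpCas9 enzyme, not something stated in the conversation.

```python
# Toy sketch: locate where a 20-nt guide would direct Cas9 in a DNA string.
# Assumption (not from the conversation): the cutter needs an "NGG" PAM
# immediately 3' of the matched site, as with the widely used SpCas9.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_guide_sites(dna: str, guide: str, max_mismatches: int = 0):
    """Scan both strands for sites matching `guide` and followed by NGG.

    Returns a list of (strand, start, mismatches). For the "-" strand,
    `start` is measured on the reverse-complemented sequence.
    """
    guide = guide.upper().replace("U", "T")  # accept RNA or DNA spelling
    n = len(guide)
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for i in range(len(seq) - n - 2):  # leave room for the 3-nt PAM
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue
            mismatches = sum(a != b for a, b in zip(seq[i:i + n], guide))
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches))
    return hits


# Hypothetical guide and target, made up for illustration only.
guide = "GAUUACAGAUUACAGAUUAC"
dna = "TTT" + "GATTACAGATTACAGATTAC" + "TGG" + "AAA"
hits = find_guide_sites(dna, guide)
```

The point of the sketch is that retargeting only means changing the `guide` string; nothing about the cutter itself has to be redesigned. Allowing `max_mismatches > 0` hints at how near-matches elsewhere in the genome could be flagged.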
01:35:25 So the coding, the engineering of the cutter is easier in terms of SP.
01:35:32 That’s kind of similar to the story of deep learning versus old school machine learning.
01:35:37 Some of the challenging parts are automated.
01:35:41 But CRISPR is just one cutting technology, and then that’s part of the challenges and
01:35:47 exciting opportunities of the field is to design different cutting technologies.
01:35:53 So now this was a big parenthesis on CRISPR, but now when we were talking about perturbations,
01:36:00 you basically now have the ability to not just look at correlation between enhancers
01:36:04 and genes, but actually go and either destroy that enhancer and see if the gene changes
01:36:10 in expression, or you can use the CRISPR targeting system to bring in not vandalism and cutting,
01:36:20 but you can couple the CRISPR system with, and the CRISPR system is called usually CRISPR
01:36:26 Cas9 because Cas9 is the protein that will then come and cut.
01:36:30 But there’s a version of that protein called dead Cas9 where the cutting part is deactivated.
01:36:36 So you basically use the dead Cas9 to bring in an activator or to bring in a repressor.
01:36:45 So you can now ask, is this enhancer changing that gene by taking this modified CRISPR,
01:36:51 which is already modified from the bacteria to be used in humans, that you can now modify
01:36:55 the Cas9 to be dead Cas9, and you can now further modify to bring in a regulator, and
01:37:01 you can basically turn on or turn off that enhancer and then see what is the impact on
01:37:05 that gene.
01:37:06 So these are the four ways of linking the locus to the target gene, and that’s step
01:37:11 number five.
01:37:14 Step number five is find the target gene, and step number six is what the heck does
01:37:17 that gene do?
01:37:19 You basically now go and manipulate that gene to basically see what are the processes that
01:37:25 change, and you can basically ask, well, in this particular case, in the FTO locus, we
01:37:32 found mesenchymal stem cells that are the progenitors of white fat and brown fat or
01:37:38 beige fat.
01:38:39 We found the rs1421085 nucleotide variant as the causal variant.
01:37:44 We found this large enhancer, this master regulator.
01:37:49 I like to call it OB1 for obesity one, like the strongest enhancer associated with it,
01:37:55 and OB1 was kind of chubby as the actor.
01:37:57 I don’t know if you remember him.
01:38:01 So you basically are using this Jedi mind trick to basically find out the location of
01:38:07 the genome that is responsible, the enhancer that harbors it, the motif, the upstream regulator,
01:38:14 which is ARID5B for AT rich interacting domain 5B.
01:38:18 That’s a protein that sort of comes and binds normally.
01:38:21 That protein is normally a repressor.
01:38:23 It represses this super enhancer, this massive 12,000 nucleotide master regulatory control
01:38:28 gene, and it turns off IRX3, which is a gene that’s 600,000 nucleotides away, and IRX5,
01:38:36 which is 1.2 million nucleotides away.
01:38:38 So those things.
01:38:39 And what’s the effect of turning them off?
01:38:40 That’s exactly the next question.
01:38:42 So step six is what do these genes actually do?
01:38:45 So we then ask, what do IRX3 and IRX5 do?
01:38:48 The first thing we did is look across individuals for individuals that had higher expression
01:38:52 of IRX3 or lower expression of IRX3.
01:38:55 And then we looked at the expression of all of the other genes in the genome.
01:38:58 And we looked for simply correlation.
01:39:01 And we found that IRX3 and IRX5 were both correlated positively with lipid metabolism and negatively
01:39:09 with mitochondrial biogenesis.
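The correlation step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expression values are invented, `lipid_gene` and `mito_gene` are placeholder names rather than real genes, and Pearson correlation is one reasonable choice of measure, not necessarily the one used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(expr, target):
    """Rank every other gene by its correlation with `target`.

    `expr` maps gene name -> expression values, one value per individual,
    with individuals in the same order for every gene.
    """
    scores = {
        gene: pearson(expr[target], values)
        for gene, values in expr.items()
        if gene != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up expression values across five individuals (illustration only).
expr = {
    "IRX3": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lipid_gene": [2.1, 3.9, 6.2, 8.0, 9.9],  # placeholder, rises with IRX3
    "mito_gene": [9.0, 7.2, 5.1, 2.9, 1.0],   # placeholder, falls with IRX3
}
ranked = rank_coexpressed(expr, "IRX3")
```

Genes at the top of `ranked` move with the target and genes at the bottom move against it; grouping each end by pathway is what would surface something like "lipid metabolism up, mitochondrial biogenesis down".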
01:39:11 You’re like, what the heck does that mean?
01:39:16 Does this sound related to obesity?
01:39:18 Not at all superficially, but lipid metabolism should, because lipids are these
01:39:25 high-energy molecules that basically store fat.
01:39:28 So IRX3 and IRX5 are positively correlated with lipid metabolism.
01:39:33 So that basically means that when they turn on, they turn
01:39:39 on lipid metabolism.
01:39:41 And they’re negatively correlated with mitochondrial biogenesis.
01:39:45 What do mitochondria do in this whole process?
01:39:49 Again, small parenthesis, what are mitochondria?
01:39:53 Mitochondria are little organelles.
01:39:56 They arose, they only are found in eukaryotes.
01:40:01 Eu means good, karyote means nucleus.
01:40:04 So truly like a true nucleus.
01:40:05 So eukaryotes have a nucleus.
01:40:07 Prokaryotes are before the nucleus.
01:40:09 They don’t have a nucleus.
01:40:11 So eukaryotes have a nucleus, compartmentalization.
01:40:16 Eukaryotes have also organelles.
01:40:19 Some eukaryotes have chloroplasts.
01:40:22 These are the plants, they photosynthesize.
01:40:26 Some other eukaryotes like us have another type of organelle called mitochondria.
01:40:33 These arose from an ancient species that we engulfed.
01:40:40 This is an endosymbiosis event.
01:40:44 Symbiosis: bio means life, sym means together.
01:40:47 So symbiotes are things that live together.
01:40:50 Endo means inside, so endosymbiosis means you live together holding the other
01:40:54 one inside you.
01:40:56 So the pre eukaryotes engulfed an organism that was very good at energy production and
01:41:07 that organism eventually shed most of its genome to now have only 13 genes in the mitochondrial
01:41:14 genome and those 13 genes are all involved in energy production, the electron transport
01:41:22 chain.
01:41:23 So basically electrons are these massive super energy rich molecules.
01:41:28 We basically have these organelles that produce energy and when your muscle exercises, you
01:41:35 basically multiply your mitochondria.
01:41:37 You basically sort of, you know, use more and more mitochondria and that’s how you get
01:41:42 beefed up.
01:41:43 So basically the muscle sort of learns how to generate more energy.
01:41:47 So basically every single time your muscles will, you know, overnight regenerate and sort
01:41:51 of become stronger and amplify their mitochondria and so forth.
01:41:55 So what does mitochondria do?
01:41:56 The mitochondria use energy to sort of do any kind of task.
01:42:02 When you’re thinking, you’re using energy.
01:42:05 This energy comes from mitochondria.
01:42:06 Your neurons have mitochondria all over the place.
01:42:10 Basically this mitochondria can multiply as organelles and they can be spread along the
01:42:13 body of your muscle.
01:42:15 Some of your muscle cells have actually multiple nuclei, they’re polynucleated, but they also
01:42:18 have multiple mitochondria to basically deal with the fact that your muscle is enormous.
01:42:24 You can sort of span these super, super long length and you need energy throughout the
01:42:28 length of your muscle.
01:42:29 So that’s why you have mitochondria throughout the length and you also need transcription
01:42:32 through the length so you have multiple nuclei as well.
01:42:35 So these two processes, lipids store energy, what do mitochondria do?
01:42:42 So there’s a process known as thermogenesis.
01:42:46 Thermal heat, genesis generation.
01:42:48 Thermogenesis is the generation of heat.
01:42:50 Remember that bathtub with the in and out?
01:42:55 That’s the equation that everybody’s focused on.
01:42:57 So how much energy do you consume?
01:42:58 How much energy do you burn?
01:43:01 But in every thermodynamic system, there’s three parts to the equation.
01:43:06 There’s energy in, energy out, and energy lost.
01:43:10 Any machine has loss of energy.
01:43:14 How do you lose energy?
01:43:15 You emanate heat.
01:43:17 So heat is energy loss.
01:43:20 So there’s…
01:43:24 Which is where the thermogenesis comes in.
01:43:26 Thermogenesis is actually a regulatory process that modulates the third component of the
01:43:32 thermodynamic equation.
01:43:34 You can basically control thermogenesis explicitly.
01:43:37 You can turn on and turn off thermogenesis.
01:43:39 And that’s where the mitochondria comes into play.
01:43:41 Exactly.
01:43:42 So IRX3 and IRX5 turn out to be the master regulators of a process of thermogenesis versus
01:43:49 lipogenesis, the generation of fat.
01:43:52 So IRX3 and IRX5 in most people burn heat, burn calories as heat.
01:43:58 So when you eat too much, just burn it off in your fat cells.
01:44:02 So that bathtub has basically a sort of dissipation knob that most people are able to turn on.
01:44:11 I am unable to turn that on because I am a homozygous carrier for the mutation that changes
01:44:17 a T into a C in the rs1421085 allele and locus, a SNP.
01:44:24 I have the risk allele twice from my mom and from my dad.
01:44:28 So I’m unable to thermogenize.
01:44:31 I’m unable to turn on thermogenesis through IRX3 and IRX5 because the regulator that normally
01:44:37 binds here, ARID5B, can no longer bind because it’s an AT rich interacting domain.
01:44:42 And as soon as I change the T into a C, it can no longer bind because it’s no longer
01:44:46 AT rich.
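The "no longer AT rich" argument can be made concrete with a small Python sketch. The eight-letter window below is invented for illustration and is not the real sequence around rs1421085; the `apply_snp` helper and the choice of AT fraction as a stand-in for binding preference are likewise assumptions.

```python
def at_fraction(seq: str) -> float:
    """Fraction of positions in `seq` that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def apply_snp(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return `seq` with the base at `pos` changed from `ref` to `alt`."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


# Hypothetical binding window, NOT the actual genomic sequence.
non_risk_window = "ATATTAAT"
risk_window = apply_snp(non_risk_window, pos=3, ref="T", alt="C")

non_risk_at = at_fraction(non_risk_window)  # every position is A or T
risk_at = at_fraction(risk_window)          # one position is now C
```

A real analysis would score the site against a position weight matrix for the binding protein rather than a raw AT fraction, but the direction of the effect is the same: a single T to C change makes the window a worse match for an AT-preferring binder.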
01:44:47 But doesn’t that mean that you’re able to use the energy more efficiently?
01:44:52 You’re not generating heat or is that?
01:44:54 That means I can eat less and get around just fine.
01:44:56 Yes.
01:44:57 Yeah.
01:44:58 So that’s a feature actually.
01:44:59 It’s a feature in a food scarce environment.
01:45:02 Yeah.
01:45:03 But if we’re all starving, I’m doing great.
01:45:05 If we all have access to massive amounts of food, I’m obese basically.
01:45:09 That’s taken us to the entire process of then understanding that why mitochondria and then
01:45:14 the lipids are both, even though distant, are somehow involved.
01:45:18 Different sides of the same coin.
01:45:20 And you basically choose to store energy or you can choose to burn energy.
01:45:24 And then all of that is involved in the puzzle of obesity.
01:45:27 And that’s what’s fascinating, right?
01:45:29 Here we are in 2007, discovering the strongest genetic association with obesity and knowing
01:45:35 nothing about how it works for almost 10 years.
01:45:39 For 10 years, everybody focused on this FTO gene and they were like, oh, it must have
01:45:43 to do something with RNA modification.
01:45:46 And it’s like, no, it has nothing to do with the function of FTO.
01:45:50 It has everything to do with all of these other processes.
01:45:53 And suddenly the moment you solve that puzzle, which is a multiyear effort by the way, a
01:45:58 tremendous effort by Melina and many, many others.
01:46:01 So this tremendous effort basically led us to recognize this circuitry.
01:46:07 You went from having some 89 common variants associated in that region of the DNA sitting
01:46:12 on top of this gene to knowing the whole circuitry.
01:46:17 When you know the circuitry, you can now go crazy.
01:46:21 You can now start intervening at every level.
01:46:24 You can start intervening at the ARID5B level.
01:46:27 You can start intervening with CRISPR Cas9 at the single SNP level.
01:46:31 You can start intervening at IRX3 and IRX5 directly there.
01:46:34 You can start intervening at the thermogenesis level because you know the pathway.
01:46:38 You can start intervening at the differentiation level where the decision to make either white
01:46:45 fat or beige fat, the energy burning beige fat is made developmentally in the first three
01:46:51 days of differentiation of your adipocytes.
01:46:54 So as they’re differentiating, you basically can choose to make fat burning machines or
01:46:57 fat storing machines.
01:46:59 And sort of that’s how you populate your fat.
01:47:02 You basically can now go in pharmaceutical and do all of that.
01:47:05 And in our paper, we actually did all of that.
01:47:09 We went in and manipulated every single aspect.
01:47:12 At the nucleotide level, we use CRISPR Cas9 genome editing to basically take primary adipocytes
01:47:18 from risk and non risk individuals and show that by editing that one nucleotide out of
01:47:24 3.2 billion nucleotides in the human genome, you could then flip between an obese phenotype
01:47:29 and a lean phenotype like a switch.
01:47:31 You can basically take my cells that are non thermogenizing and just flip into thermogenizing
01:47:36 cells by changing one nucleotide.
01:47:38 It’s mind boggling.
01:47:40 It’s so inspiring that this puzzle could be solved in this way and it feels within reach
01:47:44 to then be able to crack the problem of some of these diseases.
01:47:50 What are the technologies, the tools that came along that made this possible?
01:48:00 What are you excited about?
01:48:01 Maybe if we just look at the buffet of things that you’ve kind of mentioned, what’s involved?
01:48:08 What should we be excited about?
01:48:09 What are you excited about?
01:48:11 I love that question because there’s so much ahead of us.
01:48:14 There’s so, so much.
01:48:18 So basically solving that one locus required massive amounts of knowledge that we have
01:48:24 been building across the years through the epigenome, through the comparative genomics
01:48:28 to find out the causal variant and the controller regulatory motif through the conserved circuitry.
01:48:35 It required knowing this regulatory genomic wiring.
01:48:38 It required Hi-C of these sort of topologically associated domains to basically find these
01:48:42 long range interactions.
01:48:44 It required eQTLs of these sort of genetic perturbation of these intermediate gene phenotypes.
01:48:51 It required all of the arsenal of tools that I’ve been describing was put together for
01:48:55 one locus.
01:48:57 And this was a massive team effort, huge investment in time, energy, money, effort, intellectual,
01:49:05 everything.
01:49:06 You’re referring to, I’m sorry, just for the obesity one.
01:49:09 Yeah, this one paper.
01:49:10 This one single paper.
01:49:11 This one single locus.
01:49:12 I would like to say that this is a paper about one nucleotide in the human genome, about
01:49:16 one bit of information, C versus T in the human genome.
01:49:20 That’s one bit of information and we have 3.2 billion nucleotides to go through.
01:49:25 So how do you do that systematically?
01:49:29 I am so excited about the next phase of research because the technologies that my group and
01:49:35 many other groups have developed allows us to now do this systematically, not just one
01:49:40 locus at a time, but thousands of loci at a time.
01:49:45 So let me describe some of these technologies.
01:49:48 The first one is automation and robotics.
01:49:52 So basically, you know, we talked about how you can take all of these molecules and see
01:49:58 which of these molecules are targeting each of these genes and what do they do?
01:50:02 So you can basically now screen through millions of molecules through thousands and thousands
01:50:07 and thousands of plates, each of which has thousands and thousands and thousands of molecules,
01:50:12 every single time testing, you know, all of these genes and asking which of these molecules
01:50:20 perturb these genes.
01:50:22 So that’s technology number one, automation and robotics.
01:50:25 Technology number two is parallel readouts.
01:50:29 So instead of perturbing one locus and then asking if I use CRISPR Cas9 on this enhancer
01:50:35 to basically use dCas9 to turn on or turn off the enhancer, or if I use CRISPR Cas9
01:50:41 on the SNP to basically change that one SNP at a time, then what happens?
01:50:46 But we have 120,000 disease associated SNPs that we want to test.
01:50:52 We don’t want to spend 120,000 years doing it.
01:50:57 So what do we do?
01:50:58 We’ve basically developed this technology for massively parallel reporter assays, MPRA.
01:51:07 So in collaboration with Tarjei Mikkelsen, Eric Lander, I mean, Jay Shendure’s group has
01:51:11 done a lot of that.
01:51:12 So there’s a lot of groups that basically have developed technologies for testing 10,000
01:51:19 genetic variants at a time.
01:51:21 How do you do that?
01:51:23 You know, we talked about microarray technology, the ability to synthesize these huge microarrays
01:51:28 that allow you to do all kinds of things like measure gene expression by hybridization,
01:51:33 by measuring the genotype of a person, by looking at hybridization with one version
01:51:38 with a T versus the other version with a C, and then sort of figuring out that I am a
01:51:43 risk carrier for obesity based on these differential hybridization in my genome that says, oh,
01:51:49 you seem to only have this allele or you seem to have that allele.
01:51:53 These can also be used to systematically synthesize small fragments of DNA.
01:51:59 So you can basically synthesize these 150 nucleotide long fragments across 450,000 spots
01:52:07 at a time.
01:52:10 You can now take the result of that synthesis, which basically works through all of these
01:52:15 sort of layers of adding one nucleotide at a time.
01:52:18 You can basically just type it into your computer and order it, and you can basically order
01:52:24 10,000 or 100,000 of these small DNA segments at a time.
01:52:30 And that’s where awesome molecular biology comes in.
01:52:33 You can basically take all these segments, have a common start and end barcode or sort
01:52:38 of like an adapter, just like pieces of a puzzle.
01:52:42 You can make the same end piece and the same start piece for all of them.
01:52:48 And you can now use plasmids, which are these extra chromosomal small DNA circular segments
01:52:57 that are basically inhabiting all our, all our genomes.
01:53:00 We basically have, you know, plasmids floating around and bacteria use plasmids
01:53:05 for transferring DNA.
01:53:07 And that’s where they put a lot of antibiotic resistance genes.
01:53:10 So they can easily transfer them from one bacterium to the other.
01:53:14 After one bacterium evolves a gene to be resistant to a particular antibiotic, it basically says
01:53:20 to all its friends, Hey, here’s that sort of DNA piece.
01:53:24 We can now coopt these plasmids into human cells.
01:53:28 You can basically make a human cell culture and add plasmids to that human cell culture
01:53:34 that contain the things that you want to test.
01:53:38 You now have this library of 450,000 elements.
01:53:41 You can insert them each into the common plasmid and then test them in millions of cells in
01:53:47 parallel.
01:53:48 And the common plasmid is all the same before you add it.
01:53:51 Exactly.
01:53:52 The rest of the plasmid is the same.
01:53:53 So it’s, it’s called an episomal reporter assay.
01:53:57 Episome means not inside the genome.
01:53:59 It’s sort of outside the chromosomes.
01:54:01 So it’s an episomal assay that allows you to have a variable region where you basically
01:54:06 test 10,000 different enhancers and you have a common region which basically has the same
01:54:11 reporter gene.
01:54:13 You now can do some very cool molecular biology.
01:54:16 You can basically take the 450,000 elements that you’ve generated and you have a piece
01:54:21 of the puzzle here, piece of the puzzle here, which is identical.
01:54:24 So they’re compatible with that plasmid.
01:54:27 You can chop them up in the middle to separate a barcode reporter from the enhancer and in
01:54:32 the middle put the same gene again using the same piece of the puzzle.
01:54:36 You now can have a barcode readout of what is the impact of 10,000 different versions
01:54:42 of an enhancer on gene expression.
01:54:46 So we’re not doing one experiment, we’re doing 10,000 experiments.
01:54:50 And those 10,000 can be 5,000 of different loci and each of them in two versions, risk
01:54:58 or non risk.
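The barcode readout described above can be sketched as a counting exercise. The following Python is a minimal illustration with invented barcodes and counts; summarizing activity as a log2 ratio of RNA to plasmid DNA barcode counts with a pseudocount is an assumed, commonly used choice, and a real pipeline would also normalize for sequencing depth and model replicate noise.

```python
from collections import defaultdict
from math import log2


def activity_per_element(barcode_to_element, dna_counts, rna_counts,
                         pseudocount=1.0):
    """Summarize reporter activity as log2(RNA / DNA) per tested element.

    `barcode_to_element` maps each barcode to the element it tags, here a
    (locus, allele) pair. Counts for all barcodes of an element are pooled.
    """
    dna = defaultdict(float)
    rna = defaultdict(float)
    for barcode, element in barcode_to_element.items():
        dna[element] += dna_counts.get(barcode, 0)
        rna[element] += rna_counts.get(barcode, 0)
    return {
        element: log2((rna[element] + pseudocount) / (dna[element] + pseudocount))
        for element in dna
    }


# Invented example: one locus, two alleles, two barcodes per allele.
barcode_to_element = {
    "AAAC": ("locus1", "risk"), "AAAG": ("locus1", "risk"),
    "CCCA": ("locus1", "non_risk"), "CCCT": ("locus1", "non_risk"),
}
dna_counts = {"AAAC": 50, "AAAG": 49, "CCCA": 49, "CCCT": 50}   # plasmid input
rna_counts = {"AAAC": 200, "AAAG": 199, "CCCA": 50, "CCCT": 49}  # transcripts

activity = activity_per_element(barcode_to_element, dna_counts, rna_counts)
allelic_effect = activity[("locus1", "risk")] - activity[("locus1", "non_risk")]
```

Because every element carries its own barcodes, the same few lines scale from one locus to thousands: the dictionaries simply get longer, while the experiment stays a single pooled assay.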
01:55:00 I can now test tens of thousands.
01:55:01 Just a little hypothesis.
01:55:02 Exactly.
01:55:03 And then you can do 10,000 and we can test 10,000 hypotheses at once.
01:55:08 How hard is it to generate those 10,000?
01:55:11 Trivial.
01:55:12 Trivial.
01:55:13 But it’s biology.
01:55:14 No, no.
01:55:15 Generating the 10,000 is trivial because you basically add, it’s biotechnology.
01:55:20 You basically have these arrays that add one nucleotide at a time at every spot.
01:55:26 So it’s printing and so you’re able to, you’re able to control.
01:55:30 Yeah.
01:55:31 Is it super costly?
01:55:32 Is it?
01:55:33 10,000 bucks.
01:55:34 So this isn’t millions.
01:55:35 10,000 bucks for 10,000 experiments sounds like the right, you know.
01:55:39 I mean, so that’s super, that’s exciting because you don’t have to do one thing at a time.
01:55:44 You can now use that technology, these massively parallel reporter assays to test 10,000 locations
01:55:49 at a time.
01:55:51 We’ve made multiple modifications to that technology.
01:55:55 One was Sharpr-MPRA, which stands for, you know, basically getting a higher resolution
01:56:04 view by tiling these, these elements so you can see where along the region of control
01:56:14 are they acting.
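Tiling can be pictured as sliding a fixed-length window across a region so that every nucleotide is covered by several overlapping test fragments. Here is a small Python sketch; the 150-nucleotide tile length comes from the synthesis limit mentioned earlier, while the step size, the coordinates, and the function names are assumptions chosen for illustration.

```python
def tile_region(start: int, end: int, tile_len: int = 150, step: int = 5):
    """Return overlapping (tile_start, tile_end) half-open intervals.

    Tiles slide across [start, end) in increments of `step`; only tiles
    that fit entirely inside the region are kept.
    """
    tiles = []
    pos = start
    while pos + tile_len <= end:
        tiles.append((pos, pos + tile_len))
        pos += step
    return tiles


def tiles_covering(tiles, position: int):
    """Return the tiles that contain a given nucleotide position."""
    return [tile for tile in tiles if tile[0] <= position < tile[1]]


# Hypothetical 300-nt region tiled with a coarse 50-nt step for readability.
tiles = tile_region(0, 300, tile_len=150, step=50)
covering_120 = tiles_covering(tiles, 120)
```

Since neighboring tiles differ only at their edges, comparing the measured activity of tiles that do and do not include a given position is what allows the signal to be narrowed down to a region much shorter than a single tile.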
01:56:16 And we made another modification called HiDRA for high, you know, definition regulatory
01:56:23 annotation or something like that, which basically allows you to test 7 million of these at a
01:56:30 time by sort of cutting them directly from the DNA.
01:56:32 So instead of synthesizing, which basically has the limit of 450,000 that you can synthesize
01:56:37 at a time, we basically said, Hey, if we want to test all accessible regions of the genome,
01:56:42 let’s just do an experiment that cuts accessible regions.
01:56:45 Let’s take those accessible regions, put them all with the same end joints of the puzzles,
01:56:51 and then now use those to create a much, much larger array of things that you can test.
01:56:59 And then tiling all of these regions, you can then pinpoint what are the driver nucleotides,
01:57:04 what are the elements, how are they acting across 7 million experiments at a time.
01:57:07 So basically this is all the same family of technology where you’re basically using these
01:57:12 parallel readouts of the barcodes.
01:57:15 And then to do this, we used a technology called STARR-seq for self transcribing reporter
01:57:23 assays, a technology developed by Alex Stark, my former postdoc, who’s now a PI over in Vienna.
01:57:30 So we basically coupled the STARR-seq, the self transcribing reporters where the enhancer
01:57:37 can be part of the gene itself.
01:57:39 So instead of having a separate barcode, that enhancer basically acts to turn on the gene
01:57:43 and it’s transcribed as part of the gene.
01:57:46 So you don’t have to have the two separate parts.
01:57:47 Exactly.
01:57:48 So you can just read them directly.
01:57:49 So there are constant improvements in this whole process.
01:57:52 By the way, generating all these options, is it basically brute force?
01:57:57 How much human intuition is?
01:57:58 Oh gosh, of course it’s human intuition and human creativity and incorporating all of
01:58:04 the input data sets.
01:58:06 Because again, the genome is enormous.
01:58:08 3.2 billion, you don’t want to test that.
01:58:11 You basically use all of these tools that I’ve talked about already.
01:58:14 You generate your top favorite 10,000 hypotheses, and then you go and test all 10,000.
01:58:19 And then from what comes out, you can then go to the next step.
01:58:24 So that’s technology number two.
01:58:25 So technology number one is robotics, automation, where you have thousands of wells and you
01:58:30 constantly test them.
01:58:32 The second technology is instead of having wells, you have these massively parallel readouts
01:58:37 in sort of these pooled assays.
01:58:40 The third technology is coupling CRISPR perturbations with these single cell RNA readouts.
01:58:51 So let me make another parenthesis here to describe now single cell RNA sequencing.
01:58:57 So what does single cell RNA sequencing mean?
01:58:59 So RNA sequencing is what has been traditionally used, well, traditionally the last 20 years,
01:59:07 ever since the advent of next generation sequencing.
01:59:10 So basically before RNA expression profiling was based on these microarrays.
01:59:14 The next technology after that was based on sequencing.
01:59:17 So you chop up your RNA and you just sequence small molecules, just like you would sequence
01:59:22 a genome, basically reverse transcribe the small RNAs into DNA, and you sequence that
01:59:28 DNA in order to get the number of sequencing reads corresponding to the expression level
01:59:35 of every gene in the genome.
01:59:37 You now have RNA sequencing.
01:59:39 How do you go to single cell RNA sequencing?
01:59:42 That technology also went through stages of evolution.
01:59:45 The first was microfluidics.
01:59:48 You basically had these, or even chambers, you basically had these ways of isolating
01:59:52 individual cells, putting them into a well for every one of these cells.
01:59:57 So you have 384 well plates and you now do 384 parallel reactions to measure the expression
02:00:03 of 384 cells.
02:00:05 That sounds amazing and it was amazing, but we want to do a million cells.
02:00:11 How do you go from these wells to a million cells?
02:00:14 You can’t.
02:00:15 So what the next technology was after that is instead of using a well for every reaction,
02:00:21 you now use a lipid droplet for every reaction.
02:00:26 So you use micro droplets as reaction chambers to basically amplify RNA.
02:00:33 So here’s the idea.
02:00:34 You basically have microfluidics where you basically have every single cell coming down
02:00:39 one tube in your microfluidics and you have little bubbles getting created in the other
02:00:44 way with specific primers that mark every cell with its own barcode.
02:00:49 You basically couple the two and you end up with little bubbles that have a cell and tons
02:00:55 of markers for that cell.
02:00:57 You now mark up all of the RNA for that one cell with the same exact barcode and you then
02:01:03 lyse all of the droplets and you sequence the heck out of that and you have for every
02:01:09 RNA molecule, a unique identifier that tells you what cell was it on.
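Once every read carries the barcode of the droplet it came from, recovering per-cell expression is a grouping problem. The Python sketch below shows the idea with invented barcodes and reads; it assumes each read has already been assigned to a gene by an upstream alignment step and ignores amplification duplicates, both of which a real pipeline has to handle.

```python
from collections import Counter, defaultdict


def count_matrix(reads):
    """Group pooled reads back into per-cell gene counts.

    `reads` is an iterable of (cell_barcode, gene) pairs, as they would
    come off the sequencer after all droplets were lysed and mixed.
    Returns a dict mapping cell_barcode -> Counter of gene -> read count.
    """
    counts = defaultdict(Counter)
    for cell_barcode, gene in reads:
        counts[cell_barcode][gene] += 1
    return counts


# Invented pooled reads from two cells (barcodes are placeholders).
reads = [
    ("AAAC", "IRX3"), ("GGGT", "IRX5"), ("AAAC", "IRX3"),
    ("AAAC", "IRX5"), ("GGGT", "IRX5"), ("GGGT", "IRX5"),
]
counts = count_matrix(reads)
```

The order of the reads does not matter, which is the whole point: the physical separation of cells only has to last long enough to attach the barcode, and everything afterwards can be done in bulk.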
02:01:12 That is such good engineering, microfluidics and using some kind of primer to put a label
02:01:20 on the thing.
02:01:21 I mean, you’re making it sound easy.
02:01:24 I assume it’s beautiful, but it’s gorgeous.
02:01:27 So there’s the next generation.
02:01:29 So that’s the second generation.
02:01:31 Next generation is forget the microfluidics altogether.
02:01:34 Just use big bottles.
02:01:35 How can you possibly do that with big bottles?
02:01:37 So here’s the idea.
02:01:39 You dissociate all of your cells or all of your nuclei from complex cells like brain
02:01:43 cells that are very long and sticky so you can’t do that.
02:01:48 If you have blood cells or if you have neuronal nuclei or brain nuclei, you can basically
02:01:52 dissociate let’s say a million cells.
02:01:56 You now want to add a unique barcode, a unique barcode in each one of a million cells using
02:02:01 only big bottles.
02:02:02 How can you possibly do that?
02:02:04 Sounds crazy, but here’s the idea.
02:02:07 You use a hundred of these bottles, you randomly shuffle all your million cells and you throw
02:02:13 them into those hundred bottles randomly, completely randomly.
02:02:17 You add one barcode out of a hundred to every one of the cells.
02:02:21 You now take them all out.
02:02:23 You shuffle them again and you throw them again into the same hundred bottles.
02:02:28 But now in a different randomization and you add a second barcode.
02:02:33 So every cell now has two barcodes.
02:02:36 You take them out again, you shuffle them and you throw them back in.
02:02:40 Another, third barcode is added randomly from the same hundred barcodes.
02:02:47 You’ve now labeled every cell probabilistically based on the unique path that he took of which
02:02:53 of a hundred bottles did he go for the first time, which of a hundred bottles the second
02:02:56 time and which of a hundred bottles the third time.
02:03:00 A hundred times a hundred times a hundred is a million unique barcodes in every single
02:03:05 one of these cells without ever using microfluidics.
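The bottle scheme is easy to simulate. The Python sketch below uses the numbers from the conversation (a hundred bottles, three rounds) and shows both the count of possible labels and how often two cells end up with the same label by chance. The function names, the random seed, and the 10,000-cell example are assumptions made for the illustration.

```python
import random
from collections import Counter


def split_pool_barcodes(n_cells, n_bottles=100, n_rounds=3, seed=0):
    """Give each cell one random bottle index per round.

    The tuple of bottle indices a cell visited is its combined barcode.
    """
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(n_bottles) for _ in range(n_rounds))
        for _ in range(n_cells)
    ]


def fraction_unique(labels):
    """Fraction of cells whose combined barcode is shared with no other cell."""
    counts = Counter(labels)
    return sum(1 for label in labels if counts[label] == 1) / len(labels)


def expected_fraction_unique(n_cells, n_bottles=100, n_rounds=3):
    """Probability that a given cell's barcode is not hit by any other cell."""
    n_labels = n_bottles ** n_rounds
    return (1 - 1 / n_labels) ** (n_cells - 1)


n_possible = 100 ** 3                      # a hundred x a hundred x a hundred
labels = split_pool_barcodes(10_000)       # simulate 10,000 cells
observed = fraction_unique(labels)
at_capacity = expected_fraction_unique(1_000_000)
```

Because the labeling is probabilistic, loading as many cells as there are possible labels would leave only about a third of them uniquely labeled (the `at_capacity` value), so in practice one would want many more label combinations than cells, for example by adding a round or more bottles.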
02:03:09 Very clever.
02:03:10 It’s beautiful, right?
02:03:11 From a computer science perspective, that’s very clever.
02:03:12 Yeah.
02:03:13 So you now have the single cell sequence technology.
02:03:16 You can use the wells, you can use the bubbles or you can use the bottles and you have way
02:03:22 The bubbles still sound pretty damn cool.
02:03:23 The bubbles are awesome.
02:03:24 And that’s basically the main technology that we’re using.
02:03:26 So the bubbles is the main technology.
02:03:29 So there are kits now that companies just sell to basically carry out single cell RNA
02:03:34 sequencing that you can basically for $2,000, you can basically get 10,000 cells from one
02:03:40 sample.
02:03:42 And for every one of those cells, you basically have the transcription of thousands of genes.
02:03:49 And you know, of course the data for any one cell is noisy, but being computer scientists,
02:03:54 we can aggregate the data from all of the cells together across thousands of individuals
02:03:58 together to basically make very robust inferences.
02:04:02 Okay.
02:04:03 So the third technology is basically single cell RNA sequencing that allows you to now
02:04:07 start asking not just what is the brain expression level difference of that genetic variant,
02:04:14 but what is the expression difference of that one genetic variant across every single subtype
02:04:20 of brain cell?
02:04:21 How is the variance changing?
02:04:24 You can’t just, you know, with a brain sample, you can just ask about the mean, what is the
02:04:29 average expression?
02:04:30 If I instead have 3000 cells that are neurons, I can ask not just what is the neuronal expression.
02:04:38 I can say for layer five excitatory neurons of which I have, I don’t know, 300 cells,
02:04:44 what is the variance that this genetic variant has?
02:04:48 So suddenly it’s amazingly more powerful.
02:04:51 I can basically start asking about this middle layer of gene expression at unprecedented
02:04:55 levels.
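To see why per-cell-type readouts are more powerful than a bulk average, consider this small Python sketch. All numbers are invented, the genotype labels `TT` and `CC` and the two cell types are placeholders, and the example is deliberately constructed so that opposite effects in two cell types cancel in the bulk mean.

```python
from collections import defaultdict
from statistics import mean, pvariance


def summarize(cells, gene):
    """Mean and variance of `gene` for each (cell_type, genotype) group.

    `cells` is a list of dicts with keys "cell_type", "genotype" and
    "expr" (a dict mapping gene name -> expression value).
    """
    groups = defaultdict(list)
    for cell in cells:
        groups[(cell["cell_type"], cell["genotype"])].append(cell["expr"][gene])
    return {key: (mean(values), pvariance(values)) for key, values in groups.items()}


def bulk_mean(cells, gene, genotype):
    """Average expression over all cells of one genotype, ignoring cell type."""
    return mean(c["expr"][gene] for c in cells if c["genotype"] == genotype)


def make_cells(cell_type, genotype, values, gene="geneX"):
    """Helper that builds toy cell records for the example below."""
    return [{"cell_type": cell_type, "genotype": genotype, "expr": {gene: v}}
            for v in values]


# Invented data: the variant raises expression in neurons, lowers it in astrocytes.
cells = (
    make_cells("excitatory_neuron", "TT", [1.0, 3.0])
    + make_cells("excitatory_neuron", "CC", [5.0, 7.0])
    + make_cells("astrocyte", "TT", [5.0, 7.0])
    + make_cells("astrocyte", "CC", [1.0, 3.0])
)
per_type = summarize(cells, "geneX")
bulk_tt = bulk_mean(cells, "geneX", "TT")
bulk_cc = bulk_mean(cells, "geneX", "CC")
```

In this toy data the bulk means for the two genotypes are identical, while the per-cell-type summary shows a clear shift in each cell type, and the variance within each group is available as well.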
02:04:56 So when you look at the average, it washes out some potentially important signal that
02:05:01 corresponds to ultimately the disease.
02:05:04 Completely.
02:05:05 Yeah.
02:05:06 So that, I can do that at the RNA level, but I can also do that at the DNA level for the
02:05:10 epigenome.
02:05:11 So remember how before I was telling you about all this technology that we’re using to probe
02:05:14 the epigenome, one of them is DNA accessibility.
02:05:18 So what we’re doing in my lab is that from the same dissociation of say a brain sample
02:05:23 where you now have all these tens of thousands of cells floating around, you basically take
02:05:27 half of them to do RNA profiling and the other half to do epigenome profiling, both at the
02:05:32 single cell level.
02:05:34 So that allows you to now figure out what are the millions of DNA enhancers that are
02:05:40 accessible in every one of tens of thousands of cells.
02:05:45 And computationally, we can now take the RNA and the DNA readouts and group them together
02:05:50 to basically figure out how is every enhancer related to every gene.
02:05:57 And remember these sort of enhancer gene linking that we were doing across 833 samples?
02:06:01 833 is awesome, don’t get me wrong, but 10 million is way more awesome.
02:06:08 So we can now look at correlated activity across 2.3 million enhancers and 20,000 genes
02:06:14 in each of millions of cells to basically start piecing together the regulatory circuitry
02:06:19 of every single type of neuron, every single type of astrocyte, oligodendrocyte, microglial
02:06:25 cell inside the brains of 1,500 individuals that we sample across multiple different brain
02:06:32 regions across both DNA and RNA.
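One way to picture linking enhancers to genes through correlated activity is the following Python sketch. It assumes, purely for illustration, that accessibility and expression have already been summarized over the same matched groups of cells (for example one cell type in one individual per position). The names `accessibility`, `expression`, and `link_enhancers_to_genes`, and the 0.8 threshold, are hypothetical and not the lab's actual method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one value per matched group of cells.
accessibility = {
    "enhancer_A": [0, 1, 2, 3, 4],
    "enhancer_B": [4, 1, 3, 0, 2],
}
expression = {
    "gene_X": [1, 3, 5, 7, 9],
    "gene_Y": [5, 5, 4, 5, 5],
}


def link_enhancers_to_genes(acc, expr, threshold=0.8):
    """Return (enhancer, gene, r) for every pair with |r| at or above threshold."""
    links = []
    for enhancer, acc_values in acc.items():
        for gene, expr_values in expr.items():
            r = pearson(acc_values, expr_values)
            if abs(r) >= threshold:
                links.append((enhancer, gene, round(r, 3)))
    return links


links = link_enhancers_to_genes(accessibility, expression)
print(links)
```

At the scale described here, with millions of enhancers and tens of thousands of genes, an all-pairs loop like this would be far too slow, so real analyses would restrict candidate pairs (for example by genomic distance) and use matrix operations.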
02:06:36 So that’s the data set that my team generated last year alone.
02:06:39 So in one year, we basically generated 10 million cells from human brain across a dozen
02:06:46 different disorders, across schizophrenia, Alzheimer’s, frontotemporal dementia, Lewy
02:06:51 body dementia, ALS, Huntington’s disease, post traumatic stress disorder, autism, bipolar
02:07:01 disorder, healthy aging, et cetera.
02:07:04 So it’s possible that even just within that data set lie a lot of keys to understanding
02:07:13 these diseases, which could then lead directly to treatment.
02:07:18 Correct.
02:07:19 Correct.
02:07:20 So basically we are now motivating.
02:07:21 Yeah.
02:07:22 So our computational team is in heaven right now and we’re looking for people.
02:07:25 I mean, if you have super smart.
02:07:29 So this is a very interesting kind of side question.
02:07:33 How much of this is biology?
02:07:34 How much of this is computation?
02:07:36 So you’re the head of the computational biology group, but how much of, should you be comfortable
02:07:44 with biology to be able to solve some of these problems?
02:07:48 If you just, of the several hats you wear, fundamentally, are you thinking
02:07:54 like a computer scientist here?
02:07:56 You have to.
02:07:57 This is the only way.
02:07:59 As I said, we are the descendants of the first digital computer.
02:08:02 We’re trying to understand the digital computer.
02:08:05 We’re trying to understand the circuitry, the logic of this digital core computer and
02:08:11 all of these analog layers surrounding it.
02:08:14 So the case that I’ve been making is that you cannot think one gene at a time.
02:08:19 The traditional biology is dead.
02:08:22 There’s no way you can solve disease with traditional biology.
02:08:24 You need it as a component.
02:08:27 Once you figured out IRX3 and IRX5, you can then say, Hey, have you guys worked on
02:08:31 those genes with your single gene approach?
02:08:33 We’d love to know everything you know.
02:08:35 And if you haven’t, we now know how important these genes are.
02:08:38 Let’s now launch a single gene program to dissect them and understand them.
02:08:43 But you cannot use that as a way to dissect disease.
02:08:46 You have to think genomically.
02:08:48 You have to think from the global perspective and you have to build these circuits systematically.
02:08:53 So we need numbers of computer scientists who are interested and willing to dive into
02:08:59 these data fully, fully in and extract meaning.
02:09:04 We need computer science people who can understand machine learning and inference and decouple
02:09:11 these matrices, come up with super smart ways of dissecting them.
02:09:16 But we also need computer scientists who understand biology, who are able to design the next generation
02:09:22 of experiments.
02:09:24 Because many of these experiments, no one in their right mind would design them without
02:09:28 thinking of the analytical approach that you would use to deconvolve the data afterwards.
02:09:33 Because it’s massive amounts of ridiculously noisy data.
02:09:36 And if you don’t have the computational pipeline in your head before you even design the experiment,
02:09:42 you would never design the experiment that way.
02:09:44 That’s brilliant.
02:09:45 So in designing the experiment, you have to see the entirety of the computational pipeline.
02:09:50 That drives the design.
02:09:52 That even drives the necessity for that design.
02:09:55 Basically, you know, if you didn’t have a computer scientist way of thinking, you would
02:10:00 never design these hugely combinatorial, massively parallel experiments.
02:10:07 So that’s why you need interdisciplinary teams, you need teams.
02:10:10 And I want to sort of clarify that what do we mean by computational biology group?
02:10:15 The focus is not on computational, the focus is on the biology.
02:10:18 So we are a biology group.
02:10:20 What type of biology?
02:10:22 Computational biology.
02:10:23 That’s the type of biology that uses the whole genome.
02:10:27 That’s the type of biology that designs experiments, genomic experiments, that can only be interpreted
02:10:33 in the context of the whole genome.
02:10:34 Right.
02:10:35 So it’s philosophically looking at biology as a computer.
02:10:39 Correct.
02:10:40 Correct.
02:10:41 So which is in the context of the history of biology is a big transformation.
02:10:46 Yeah.
02:10:47 Yeah.
02:10:48 You can think of the name as what do we do?
02:10:50 Only computation.
02:10:51 That’s not true.
02:10:52 How do we study it?
02:10:53 Only computationally.
02:10:54 That is true.
02:10:56 So all of this single cell sequencing can now be coupled with the technology that we
02:11:00 talked about earlier for perturbation.
02:11:02 So here’s the crazy thing.
02:11:04 Instead of using these wells and these robotic systems for doing one drug at a time or for
02:11:10 perturbing one gene at a time in thousands of wells, you can now do this using a pool
02:11:16 of cells and single cell RNA sequencing.
02:11:20 How?
02:11:21 You basically can take these perturbations using CRISPR and instead of using a single
02:11:27 guide RNA, you can use a library of guide RNAs generated exactly the same way using
02:11:32 this array technology.
02:11:34 So you synthesize a thousand different guide RNAs.
02:11:38 You now take each of these guide RNAs and you insert them in a pool of cells where every
02:11:45 cell gets one perturbation.
02:11:48 And you use CRISPR, so either CRISPR Cas9 to edit the genome with these
02:11:56 thousand perturbations, or CRISPR activation, or CRISPR repression.
02:12:01 And you now can have a single cell readout where every single cell has received one of
02:12:07 these modifications.
02:12:09 And you can now in massively parallel ways, couple the perturbation and the readout in
02:12:17 a single experiment.
02:12:18 How are you tracking which perturbations each cell received?
02:12:21 So there’s ways of doing that, but basically one way is to make that perturbation an expressible
02:12:27 vector so that part of your RNA reading is actually that perturbation itself.
02:12:33 So you can basically put it in an expressible part so you can self drive it.
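As a rough Python sketch of how perturbation and readout get coupled in one pooled experiment: assume each cell's RNA readout tells you both which guide it received and the expression of a gene of interest. The guide names, the numbers, and the `effect_per_guide` helper are hypothetical and only illustrate the bookkeeping, not a real analysis pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (guide detected in the cell's RNA, expression of one
# readout gene in that same cell). Values are invented for illustration.
cells = [
    ("non_targeting", 10.0),
    ("non_targeting", 12.0),
    ("guide_gene1", 4.0),
    ("guide_gene1", 6.0),
    ("guide_gene2", 11.0),
    ("guide_gene2", 11.0),
]


def effect_per_guide(records, control="non_targeting"):
    """Mean readout per guide minus the mean readout of control cells."""
    by_guide = defaultdict(list)
    for guide, expr in records:
        by_guide[guide].append(expr)
    baseline = mean(by_guide[control])
    return {
        guide: mean(values) - baseline
        for guide, values in by_guide.items()
        if guide != control
    }


effects = effect_per_guide(cells)
print(effects)
```

Because the guide identity is read from the same cell as the expression, a thousand perturbations can share one pool of cells instead of a thousand separate wells.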
02:12:37 So the point that I want to get across is that the sky’s the limit.
02:12:42 You basically have these tools, these building blocks of molecular biology.
02:12:46 We have these massive data sets of computational biology.
02:12:50 We have this huge ability to sort of use machine learning and statistical methods and, you
02:12:56 know, linear algebra to sort of reduce the dimensionality of all these massive data sets.
02:13:01 And then you end up with a series of actionable targets that you can then couple with pharma
02:13:10 and just go after systematically.
02:13:13 So the ability to sort of bring genetics to the epigenomics, to the transcriptomics, to
02:13:19 the cellular readouts using these sort of high throughput perturbation technologies
02:13:24 that I’m talking about and ultimately to the organismal through the electronic health record
02:13:30 endophenotypes and ultimately the disease battery of assays at the cognitive level,
02:13:36 at the physiological level and, you know, every other level.
02:13:42 There is no better or more exciting field, in my view, to be a computer scientist in,
02:13:46 or to be a scientist in, period.
02:13:48 Basically this confluence of technologies, of computation, of data, of insight and of
02:13:54 tools for manipulation is unprecedented in human history.
02:13:58 And I think this is what’s shaping the next century to really be a transformative century
02:14:04 for our species and for our planet.
02:14:09 Do you think the 21st century will be remembered for the big leaps in understanding and alleviation
02:14:17 of biology?
02:14:18 If you look at the path between discovery and therapeutics, it’s been on the order of
02:14:23 50 years, it’s been shortened to 40, 30, 20, and now it’s on the order of 10 years.
02:14:29 But the huge number of technologies that are going on right now for discovery will result
02:14:36 undoubtedly in the most dramatic manipulation of human biology that we’ve ever seen in the
02:14:42 history of humanity in the next few years.
02:14:45 Do you think we might be able to cure some of the diseases we started this conversation
02:14:48 with?
02:14:49 Absolutely.
02:14:50 Absolutely.
02:14:51 It’s only a matter of time.
02:14:54 Basically the complexity is enormous and I don’t want to underestimate the complexity
02:14:58 but the number of insights is unprecedented and the ability to manipulate is unprecedented
02:15:03 and the ability to deliver these small molecules and other non traditional medicine perturbations,
02:15:11 there’s a new generation of perturbations that you can use at the DNA level, at the
02:15:17 RNA level, at the micro RNA level, at the epigenomic level, there’s a battery of new
02:15:24 generations of perturbations.
02:15:26 If you couple that with cell type identifiers that can basically sense when you are in the
02:15:32 right cell based on the specific combination and then turn on that intervention for that
02:15:36 cell, you can now think of combinatorial interventions where you can basically sort of feed a synthetic
02:15:42 biology construct to someone that will basically do different things in different cells.
02:15:47 So basically for cancer, this is one of the therapeutics that our collaborator Ron Weiss
02:15:51 is using to basically start sort of engineering the circuits that will use micro RNA sensors
02:15:56 of the environment to sort of know if you’re in a tumor cell or if you’re in an immune
02:15:59 cell or if you’re in a stromal cell and so forth and basically turn on particular interventions
02:16:04 there.
02:16:05 You can sort of create constructs that are tuned to only the liver cells or only the
02:16:11 heart cells or only the brain cells and then have these new generations of therapeutics
02:16:18 coupled with this immense amount of knowledge on the sort of which targets to choose and
02:16:24 what biological processes to measure and how to intervene.
02:16:27 My view is that disease is going to be fundamentally altered and alleviated as we go forward.
02:16:36 Next time we talk, we’ll talk about the philosophical implications of that and the effect of life,
02:16:40 but let’s stick to biology for just a little longer.
02:16:44 We did pretty good today.
02:16:45 We stuck to the science.
02:16:49 What are you excited about in terms of the future of this field, the technologies in your own
02:16:56 group, in your own mind, you’re leading the world at MIT in the science and the engineering
02:17:02 of this work.
02:17:04 So what are you excited about here?
02:17:06 I could not be more excited.
02:17:08 We are one of many, many teams who are working on this.
02:17:12 In my team, the most exciting parts are, you know, manifold.
02:17:17 So basically we’ve now assembled this battery of technologies.
02:17:20 We’ve assembled these massive, massive data sets and now we’re really sort of in the stage
02:17:24 of our team’s path of generating disease insights.
02:17:30 So we are simultaneously working on a paper on schizophrenia right now that is basically
02:17:36 using the single cell profiling technologies, using these editing and manipulation technologies
02:17:40 to basically show how the master regulators underlying changes in the brain that are sort
02:17:47 of found in schizophrenia are in fact affecting excitatory neurons and inhibitory neurons
02:17:53 in pathways that are active both in synaptic pruning, but also in early development.
02:17:59 We’ve basically found this set of four regulators that are connecting these two processes that
02:18:03 were previously separate in schizophrenia in sort of having a sort of more unified view
02:18:10 across those two sides.
02:18:12 The second one is in the area of metabolism.
02:18:15 We basically now have a beautiful collaboration with the Goodyear lab that’s basically looking
02:18:19 at multi tissue perturbations in six or seven different tissues across the body in the context
02:18:29 of exercise and in the context of nutritional interventions using both mouse and human,
02:18:35 where we can basically see what are the cell to cell communications that are changing across
02:18:41 them.
02:18:42 And what we’re finding is this immense role of both immune cells as well as adipocyte
02:18:47 stem cells in sort of reshaping that circuitry of all of these different tissues and that’s
02:18:53 sort of pointing to a new path for therapeutic intervention there.
02:18:56 In Alzheimer’s, it’s this huge focus on microglia and now we’re discovering different classes
02:19:02 of microglial cells that are basically either synaptic or immune.
02:19:10 And these are playing vastly different roles in Alzheimer’s versus in schizophrenia.
02:19:16 And what we’re finding is this immense complexity as you go further and further down of how
02:19:22 in fact there’s 10 different types of microglia, each with their own sort of expression programs.
02:19:28 We used to think of them as, oh yeah, they’re microglia, but in fact now we’re realizing
02:19:32 just even in that sort of least abundant of cell types, there’s this incredible diversity
02:19:37 there.
02:19:39 The differences between brain regions are another sort of major, major insight.
02:19:44 Often one would think that, oh, astrocytes are astrocytes no matter where they are.
02:19:48 But no, there’s incredible region specific differences in the expression patterns of
02:19:54 all of the major brain cell types across different brain regions.
02:19:57 So basically there’s the neocortical regions that are sort of the recent innovation that
02:20:01 makes us so different from all other species.
02:20:03 There’s the sort of reptilian brain sort of regions that are sort of much more very extremely
02:20:10 distinct.
02:20:11 There’s the cerebellum.
02:20:12 Each of those basically is associated in a different way with disease.
02:20:17 And what we’re doing now is looking into pseudo temporal models for how disease progresses
02:20:23 across different regions of the brain.
02:20:25 If you look at Alzheimer’s, it basically starts in this small region called the entorhinal
02:20:30 cortex and then it spreads through the brain and through the hippocampus and ultimately
02:20:38 affecting the neocortex.
02:20:39 And with every brain region that it hits, it basically has a different impact on the
02:20:46 cognitive and memory aspects, orientation, short term memory, long term memory, et cetera,
02:20:52 which is dramatically affecting the cognitive path that the individuals go through.
02:20:58 So what we’re doing now is creating these computational models for ordering the cells
02:21:04 and the regions and the individuals according to their ability to predict Alzheimer’s disease.
02:21:10 So we can have a cell level predictor of pathology that allows us to now create a temporal time
02:21:17 course that tells us when every gene turns on along this pathology progression and then
02:21:22 trace that across regions and pathological measures that are region specific, but also
02:21:28 cognitive measures and so on and so forth.
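The idea of ordering cells along a pathology axis and asking when each gene turns on can be sketched in Python as follows. This is a toy illustration under stated assumptions: every cell already has a predicted pathology score between 0 and 1, and a gene counts as "on" once its expression exceeds a fixed threshold. The field names, scores, and threshold are invented.

```python
# Hypothetical toy data: each cell has a predicted pathology score and the
# expression of two genes. All values are invented for illustration.
cells = [
    {"id": "c1", "pathology": 0.9, "expr": {"geneA": 5.0, "geneB": 4.0}},
    {"id": "c2", "pathology": 0.1, "expr": {"geneA": 0.0, "geneB": 0.0}},
    {"id": "c3", "pathology": 0.5, "expr": {"geneA": 3.0, "geneB": 0.0}},
    {"id": "c4", "pathology": 0.3, "expr": {"geneA": 0.0, "geneB": 0.0}},
]


def pseudotime_order(records):
    """Order cells from least to most predicted pathology."""
    return sorted(records, key=lambda cell: cell["pathology"])


def onset(ordered_cells, gene, threshold=1.0):
    """Pathology score of the first ordered cell where the gene exceeds threshold."""
    for cell in ordered_cells:
        if cell["expr"][gene] > threshold:
            return cell["pathology"]
    return None  # gene never turns on along this ordering


ordered = pseudotime_order(cells)
order_ids = [cell["id"] for cell in ordered]
onsets = {gene: onset(ordered, gene) for gene in ("geneA", "geneB")}
print(order_ids, onsets)
```

Here geneA switches on at an earlier point along the ordering than geneB, which is the kind of ordering that could in principle point to early markers.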
02:21:30 So that allows us to now sort of for the first time, look at can we actually do early intervention
02:21:35 for Alzheimer’s where we know that the disease starts manifesting 10 years before you
02:21:40 actually have your first cognitive loss.
02:21:44 Can we start seeing that path to build new diagnostics, new prognostics, new biomarkers
02:21:50 for this sort of early intervention in Alzheimer’s?
02:21:54 The other aspect that we’re looking at is mosaicism.
02:21:57 We talked about the common variants and the rare variants, but in addition to those rare
02:22:01 variants as your initial cell that forms the zygote divides and divides and divides, with
02:22:08 every cell division there are additional mutations that are happening.
02:22:12 So what you end up with is your brain being a mosaic of multiple different types of genetic
02:22:18 underpinnings.
02:22:19 Some cells contain a mutation that other cells don’t have.
02:22:23 So every human has the common variants that all of us carry to some degree, the rare variants
02:22:31 that your immediate tree of the human species carries, and then there’s the somatic variant,
02:22:37 which is the tree that happened after the zygote that sort of forms your own body.
02:22:44 So these somatic alterations are something that has been previously inaccessible to study
02:22:50 in human postmortem samples.
02:22:53 But right now with the advent of single cell RNA sequencing, in this particular case, we’re
02:22:58 using the well based sequencing, which is much more expensive, but gives you a lot richer
02:23:01 information about each of those transcripts.
02:23:04 So we’re using now that richer information to infer mutations that have happened in each
02:23:10 of the thousands of genes that sort of are active in these cells, and then understand
02:23:16 how the genome relates to the function, this genotype phenotype relationship that we usually
02:23:25 build in GWAS, in genome wide association studies, between genetic variation and disease.
02:23:31 We’re now building that at the cell level, where for every cell, we can relate the unique
02:23:36 specific genome of that cell with the expression patterns of that cell, and the predicted function
02:23:42 using these predictive models that I mentioned before on this regulation for cognition for
02:23:47 pathology in Alzheimer’s at the cell level.
02:23:51 And what we’re finding is that the genes that are altered and the genetic regions that are
02:23:54 altered in common variants versus rare variants versus somatic variants are actually very
02:23:59 different from each other.
02:24:01 The somatic variants are pointing to neuronal energetics and oligodendrocyte functions that
02:24:08 are not visible in the genetic lesions that you find for the common variants, probably
02:24:13 because they have too strong of an effect that evolution is just not tolerating them
02:24:17 on the common side of the allele frequency spectrum.
02:24:20 So the somatic one, that’s the variation that happens after the zygote, after you’re an individual.
02:24:26 I mean, this is a dumb question, but there’s mutation and variation, I guess that happens
02:24:31 there.
02:24:32 And you’re saying that through this, if we focus in on individual cells, we’re
02:24:37 able to detect the story that’s interesting there, and that might be a very unique kind
02:24:42 of important variability that arises for, you said neuronal or something that would
02:24:49 sound…
02:24:50 Energetics.
02:24:51 Energetics, sounds like a cool term.
02:24:52 So, I mean, the metabolism of humans is dramatically altered from that of nearby species.
02:24:59 We talked about that last time that basically we are able to consume meat that is incredibly
02:25:04 energy rich, and that allows us to sort of have functions that are meeting this humongous
02:25:13 brain that we have.
02:25:14 So basically on one hand, every one of our brain cells is much more energy efficient
02:25:18 than our neighbors, than our relatives.
02:25:20 Number two, we have way more of these cells.
02:25:23 And number three, we have this new diet that allows us to now feed all these needs.
02:25:30 That basically creates a massive amount of damage, oxidative damage from this huge super
02:25:36 powered factory of ideas and thoughts that we carry in our skull.
02:25:42 And that factory has energetic needs, and there’s a lot of sort of biological processes
02:25:47 underlying that, that we are finding are altered in the context of Alzheimer’s disease.
02:25:52 That’s fascinating.
02:25:53 So you have to consider all of these systems if you want to understand even something like
02:25:59 diseases that you would maybe traditionally associate with just the particular cells of
02:26:04 the brain.
02:26:07 The immune system, the metabolic system.
02:26:11 And these are all the things that make us uniquely human.
02:26:13 So our immune system is dramatically different from that of our neighbors.
02:26:17 Our societies are so much more clustered.
02:26:19 The history of infections that have plagued the human population is dramatically different
02:26:24 from every other species.
02:26:27 The way that our society and our population has sort of exploded has basically put unique
02:26:31 pressures on our immune system.
02:26:33 And our immune system has both coped with that density and also been shaped by, as I
02:26:37 mentioned, the vast amount of death that has happened in the Black Plague and other sort
02:26:42 of selective events in human history, famines, ice ages, and so forth.
02:26:47 So that’s number one on the sort of immune side.
02:26:49 On the metabolic side, again, we are able to sort of run marathons.
02:26:55 I don’t know if you remember the sort of human versus horse experiment where the horse actually
02:26:59 tires out faster than the human and the human actually wins.
02:27:03 So on the metabolic side, we’re dramatically different.
02:27:05 On the immune side, we’re dramatically different.
02:27:07 On the brain side, again, you know, no need to sort of, you know, it’s a no brainer of
02:27:12 how our brain is like just enormously more capable.
02:27:16 And then, you know, on the side of cancer, so basically the cancers that humans are having,
02:27:21 the exposures, the environmental exposures are, again, dramatically different.
02:27:25 And the lifespan, the expansion of human lifespan is unseen in any other species in, you know,
02:27:32 recent evolutionary history.
02:27:35 And that now leads to a lot of new disorders that are starting to, you know, manifest late
02:27:42 in life.
02:27:43 So you know, Alzheimer’s is one example where basically, you know, these vast energetic
02:27:48 needs over a lifetime of thinking can basically lead to all of these debris and eventually
02:27:54 saturate the system and lead to, you know, Alzheimer’s in the late life.
02:28:00 But there’s, you know, there’s just such a dramatic set of frontiers when it comes to
02:28:07 aging research that, you know, so what I often like to say is that if you want to engineer
02:28:14 a car to go from 70 miles an hour to 120 miles an hour, that’s fine.
02:28:18 You can basically, you know, fix a few components.
02:28:20 If you wanted to now go at 400 miles an hour, you have to completely redesign the entire
02:28:24 car because the system has just not evolved to go that far.
02:28:31 Basically our human body has only evolved to live to, I don’t know, 120, maybe we can
02:28:36 get to 150 with minor changes.
02:28:39 But if, you know, as we start pushing these frontiers for not just living, but well living,
02:28:45 the ef zin that we talked about last time.
02:28:48 So to basically push ef zin into the 80s and 90s and hundreds and, you know, much further
02:28:53 than that, we will face new challenges that have, you know, never been faced before in
02:29:00 terms of cancer, the number of divisions, in terms of Alzheimer’s and brain related
02:29:04 disorders, in terms of metabolic disorders, in terms of regeneration, there’s just so
02:29:08 many different frontiers ahead of us.
02:29:10 So I am thrilled about where we’re heading.
02:29:14 So basically I see this confluence in my lab and many other labs of AI, of, you know, sort
02:29:20 of, you know, the next frontier of AI for drug design.
02:29:22 So basically these sort of graph neural networks on specific chemical designs that allow you
02:29:30 to create new generations of therapeutics.
02:29:34 These molecular biology tricks for intervening in the system at every level, these personalized
02:29:42 medicine prediction, diagnosis, and prognosis using the electronic health records and using
02:29:49 these polygenic risk scores weighted by the burden, the number of mutations that are accumulating
02:29:56 across common, rare, and somatic variants, the burden converging across all of these different
02:30:03 molecular pathways, the delivery of specific drugs and specific interventions into specific
02:30:10 cell types.
02:30:11 And again, you’ve talked with Bob Langer about this, there’s, you know, many giants in that
02:30:14 field.
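For the polygenic risk scores mentioned above, the usual textbook form is a weighted sum: each variant's effect size times the number of risk alleles a person carries. The Python sketch below uses that standard form with invented variant names and weights. It does not attempt the burden weighting across common, rare, and somatic variants described here, which would need details the conversation does not give.

```python
def polygenic_risk_score(dosages, weights):
    """Sum of effect size times allele count over variants that have a weight.

    dosages: variant -> number of risk alleles carried (0, 1, or 2)
    weights: variant -> per-allele effect size (hypothetical values here)
    """
    return sum(weights[v] * count for v, count in dosages.items() if v in weights)


# Hypothetical effect sizes and one hypothetical individual's genotype.
weights = {"variant_1": 0.25, "variant_2": -0.5, "variant_3": 1.0}
person = {"variant_1": 2, "variant_2": 0, "variant_3": 1}

score = polygenic_risk_score(person, weights)
print(score)
```

In practice the effect sizes would come from association studies, and the score would be interpreted relative to a reference population rather than as an absolute number.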
02:30:15 And then the last concept is not intervening at the single gene level.
02:30:20 I want you to sort of conceptualize the concept of an on target side effect.
02:30:27 What is an on target side effect?
02:30:29 An off target side effect is when you design a molecule to target one gene and instead
02:30:33 it targets another gene and you have side effects because of that.
02:30:36 An on target side effect is when your molecule does exactly what you were expecting, but
02:30:41 that gene is pleiotropic.
02:30:43 Pleio means many, tropos means ways, many ways, it acts in many ways.
02:30:48 It’s a multifunctional gene.
02:30:50 So you find that this gene plays a role in this, but as we talked about the wiring of
02:30:55 genes to phenotypes is extremely dense and extremely complex.
02:30:59 So the next stage of intervention will be intervening not at the gene level, but at
02:31:04 the network level.
02:31:06 Intervening at the set of pathways and the set of genes with multi input perturbations
02:31:11 to the system, multi input modulations, pharmaceutical or other interventions, that basically
02:31:18 allow you to now work at the sort of full level of understanding, not just in your brain,
02:31:24 but across your body, not just in one gene, but across the set of pathways and so on and
02:31:29 so forth for every one of these disorders.
02:31:31 So I think that we’re finally at the level of systems medicine of basically instead of
02:31:37 sort of medicine being at the single gene level, medicine being at the systems level
02:31:42 where it can be personalized based on the specific set of genetic markers and genetic
02:31:46 perturbations that you are either born with or that you have developed during your lifetime.
02:31:53 Your unique set of exposures, your unique set of biomarkers, and your unique
02:31:59 current set of conditions through your EHR and other ways.
02:32:06 And the precision component of intervening extremely precisely in the specific pathways
02:32:12 and the specific combinations of genes that should be modulated to sort of bring you from
02:32:16 the disease state to the physiologically normal state or even to physiologically improved
02:32:23 state through this combination of interventions.
02:32:25 So that’s in my view, the field where basically computer science comes together with artificial
02:32:30 intelligence, statistics, all of these other tools, molecular biology technologies and
02:32:34 biotechnology and pharmaceutical technologies that are sort of revolutionary in the way
02:32:37 of intervention.
02:32:38 And of course, this massive amount of molecular biology and data gathering and generation
02:32:43 and perturbation in massively parallel ways.
02:32:46 So there’s no better way.
02:32:47 There’s no better time.
02:32:49 There’s no better place to be sort of looking at this whole confluence of ideas.
02:32:56 And I’m just so thrilled to be a small part of this amazing, enormous ecosystem.
02:33:01 It’s exciting to imagine what humans of 100, 200 years from now, what their life experience
02:33:07 is like, because these ideas seem to have potential to transform the quality of life
02:33:13 that, when they look back at us, they probably wonder how we put up with all the suffering
02:33:22 in the world.
02:33:23 Manolis, it’s a huge honor.
02:33:25 Thank you for spending this early Sunday morning with me.
02:33:29 I deeply appreciate it.
02:33:30 See you next time.
02:33:31 Sounds like a plan.
02:33:32 Thank you, Lex.
02:33:33 Thanks for listening to this conversation with Manolis Kellis.
02:33:36 And thank you to our sponsors, SEMrush, which is an SEO optimization tool.
02:33:43 Pessimist Archive, which is one of my favorite history podcasts.
02:33:47 8Sleep, which is a self cooling mattress with smart sensors and an app.
02:33:52 And finally, BetterHelp, which is an online therapy service.
02:33:57 Please check out these sponsors in the description to get a discount and to support this podcast.
02:34:02 If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts,
02:34:08 follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
02:34:13 And now, let me leave you with some words from Haruki Murakami.
02:34:19 Human beings are ultimately nothing but carriers, passageways for genes.
02:34:24 They ride us into the ground like racehorses from generation to generation.
02:34:30 Genes don’t think about what constitutes good or evil.
02:34:34 They don’t care whether we’re happy or unhappy.
02:34:37 We’re just means to an end for them.
02:34:40 The only thing they think about is what is most efficient for them.
02:34:45 Thank you for listening, and hope to see you next time.