Manolis Kellis: Biology of Disease #133

Transcript

00:00:00 The following is a conversation with Manolis Kellis, his third time on the podcast.

00:00:05 He is a professor at MIT and head of the MIT Computational Biology Group.

00:00:11 This time we went deep on the science, biology, and genetics.

00:00:17 So this is a bit of an experiment.

00:00:19 Manolis went back and forth between the basics of biology and the latest state of the art

00:00:25 in the research.

00:00:26 He’s a master at this, so I just sat back and enjoyed the ride.

00:00:31 This conversation happened at 7am, so it’s yet another podcast episode after an all nighter

00:00:37 for me.

00:00:38 And once again, since the universe has a sense of humor, this one was a tough one for my

00:00:44 brain to keep up, but I did my best and I never shy away from a good challenge.

00:00:50 Quick mention of each sponsor, followed by some thoughts related to the episode.

00:00:55 First is SEMrush, the most advanced SEO optimization tool I’ve ever come across.

00:01:02 I don’t like looking at numbers, but someone probably should, it helps you make good decisions.

00:01:08 Second is Pessimist Archive, they’re back, one of my favorite history podcasts on why

00:01:13 people resist new things from recorded music to umbrellas to cars, chess, coffee, and the

00:01:20 elevator.

00:01:22 Third is Eight Sleep, a mattress that cools itself, measures heart rate variability, has an app,

00:01:28 and has given me yet another reason to look forward to sleep, including the all important

00:01:33 power nap.

00:01:34 And finally, BetterHelp, online therapy when you want to face your demons with a licensed

00:01:40 professional, not just by doing the David Goggins like physical challenges like I seem

00:01:45 to do on occasion.

00:01:47 Please check out these sponsors in the description to get a discount and to support this podcast.

00:01:54 As a side note, let me say that biology in the brain and in the various systems of the

00:01:59 body fills me with awe every time I think about how such a chaotic mess coming from its humble

00:02:05 origins in the ocean was able to achieve such incredibly complex and robust mechanisms of

00:02:11 life that survived despite all the forces of nature that want to destroy it.

00:02:17 It is so unlike the computing systems we humans have engineered that it makes me feel that

00:02:22 in order to create artificial general intelligence and artificial consciousness, we may have

00:02:28 to completely rethink how we engineer computational systems.

00:02:33 If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts, follow

00:02:38 on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.

00:02:44 And now, here’s my conversation with Manolis Kellis.

00:02:49 So your group at MIT is trying to understand the molecular basis of human disease.

00:02:54 What are some of the biggest challenges in your view?

00:02:57 Don’t get me started.

00:02:58 I mean, understanding human disease is the most complex challenge in modern science.

00:03:06 So because human disease is as complex as the human genome, it is as complex as the

00:03:13 human brain, and it is in many ways, even more complex because the more we understand

00:03:20 disease complexity, the more we start understanding genome complexity and epigenome complexity

00:03:27 and brain circuitry complexity and immune system complexity and cancer complexity and

00:03:31 so on and so forth.

00:03:32 So traditionally, human disease was following basic biology.

00:03:39 You would basically understand basic biology in model organisms like, you know, mouse and

00:03:44 fly and yeast.

00:03:46 You would understand sort of mammalian biology and animal biology and eukaryotic biology

00:03:53 in sort of progressive layers of complexity, getting closer to human phylogenetically.

00:03:59 And you would do perturbation experiments in those species to see if I knock out a gene,

00:04:06 what happens?

00:04:07 And based on the knocking out of these genes, you would basically then have a way to drive

00:04:12 human biology because you would sort of understand the functions of these genes.

00:04:16 And then if you find that a human gene locus, something that you’ve mapped from human genetics

00:04:23 to that gene is related to a particular human disease, you’d say, aha, now I know the function

00:04:28 of the gene from the model organisms.

00:04:31 I can now go and understand the function of that gene in human.

00:04:37 But this is all changing.

00:04:38 This has dramatically changed.

00:04:39 So that was the old way of doing basic biology.

00:04:41 You would start with the animal models, the eukaryotic models, the mammalian models, and

00:04:46 then you would go to human.

00:04:48 Human genetics has been so transformed in the last decade or two that human genetics

00:04:55 is now actually driving the basic biology.

00:04:58 There is more genetic mutation information in the human genome than there will ever be

00:05:04 in any other species.

00:05:06 What do you mean by mutation information?

00:05:08 So perturbations is how you understand systems.

00:05:11 So an engineer builds systems and then they know how they work from the inside out.

00:05:16 A scientist studies systems through perturbations.

00:05:20 You basically say, if I poke that balloon, what’s going to happen?

00:05:23 And I’m going to film it in super high resolution, understand, I don’t know, aerodynamics or

00:05:26 fluid dynamics if it’s filled with water, et cetera.

00:05:28 So you can then make experimentation by perturbation and then the scientific process is sort of

00:05:33 building models that best fit the data, designing new experiments that best test your models

00:05:41 and challenge your models and so on and so forth.

00:05:43 This is the same thing with science.

00:05:44 Basically if you’re trying to understand biological science, you basically want to do perturbations

00:05:49 that then drive the models.

00:05:54 So how do these perturbations allow you to understand disease?

00:05:58 So if you know that a gene is related to disease, you don’t want to just know that it’s related

00:06:04 to the disease.

00:06:05 You want to know what is the disease mechanism because you want to go and intervene.

00:06:09 So the way that I like to describe it is that traditionally epidemiology, which is basically

00:06:17 the study of disease, you know, sort of the observational study of disease has been about

00:06:23 correlating one thing with another thing.

00:06:25 So if you have a lot of people with liver disease who are also alcoholics, you might

00:06:29 say, well, maybe the alcoholism is driving the liver disease or maybe those who have

00:06:34 liver disease self medicate with alcohol.

00:06:36 So the connection could be either way.

00:06:40 With genetic epidemiology, it’s about correlating changes in genome with phenotypic differences

00:06:47 and then you know the direction of causality.

00:06:50 So if you know that a particular gene is related to the disease, you can basically say, okay,

00:06:58 perturbing that gene in mouse causes the mice to have X phenotype.

00:07:03 So perturbing that gene in human causes the humans to have the disease.

00:07:08 So I can now figure out what are the detailed molecular phenotypes in the human that are

00:07:14 related to that organismal phenotype in the disease.

00:07:18 So it’s all about understanding disease mechanism, understanding what are the pathways, what

00:07:22 are the tissues, what are the processes that are associated with the disease so that we

00:07:27 know how to intervene.

00:07:29 You can then prescribe particular medications that also alter these processes.

00:07:33 You can prescribe lifestyle changes that also affect these processes and so on and so forth.

00:07:37 That’s such a beautiful puzzle to try to solve.

00:07:41 Like what kind of perturbations eventually have this ripple effect that leads to disease

00:07:45 across the population.

00:07:46 And then you study that for animals or mice first and then see how that might possibly

00:07:51 connect to humans.

00:07:54 How hard is that puzzle of trying to figure out how little perturbations might lead to,

00:08:01 in a stable way, to a disease?

00:08:04 In animals, we make the puzzle simpler because we perturb one gene at a time.

00:08:11 That’s the beauty of this, the power of animal models.

00:08:13 You can basically decouple the perturbations.

00:08:15 You only do one perturbation and you only do strong perturbations at a time.

00:08:21 In human, the puzzle is incredibly complex because obviously you don’t do human experimentation.

00:08:28 You wait for natural selection and natural genetic variation to basically do its own

00:08:34 experiments, which it has been doing for hundreds and thousands of years in the human population

00:08:40 and for hundreds of thousands of years across the history leading to the human population.

00:08:49 So you basically take this natural genetic variation that we all carry within us.

00:08:54 Every one of us carries 6 million perturbations.

00:08:58 So I’ve done 6 million experiments on you, 6 million experiments on me, 6 million experiments

00:09:02 on every one of 7 billion people on the planet.

00:09:06 What does the 6 million correspond to?

00:09:08 6 million unique genetic variants that are segregating in the human population.

00:09:14 Every one of us carries millions of polymorphic sites, poly, many, morph, forms.

00:09:22 Polymorphic means many forms, variants.

00:09:25 That basically means that every one of us has single nucleotide alterations that we

00:09:29 have inherited from mom and from dad that basically can be thought of as tiny little

00:09:34 perturbations.

00:09:36 Most of them don’t do anything, but some of them lead to all of the phenotypic differences

00:09:42 that we see between us.

00:09:43 The reason why two twins are identical is because these variants completely determine

00:09:48 the way that I’m going to look at exactly 93 years of age.

00:09:52 How happy are you with this kind of data set?

00:09:54 Is it large enough of the human population of Earth?

00:09:59 Is that too big, too small?

00:10:01 Yeah, so is it large enough is a power analysis question.

00:10:07 In every one of our grants, we do a power analysis based on what is the effect size

00:10:11 that I would like to detect and what is the natural variation in the two forms.

00:10:19 Every time you do a perturbation, you’re asking, I’m changing form A into form B. Form A has

00:10:25 some natural phenotypic variation around it and form B has some natural phenotypic variation

00:10:30 around it.

00:10:31 If those variances are large and the differences between the mean of A and the mean of B are

00:10:36 small, then you have very little power.

00:10:38 The further the means go apart, that’s the effect size, the more power you have, and

00:10:44 the smaller the standard deviation, the more power you have.

00:10:48 So basically when you’re asking, is that sufficiently large, certainly not for everything, but we

00:10:54 already have enough power for many of the stronger effects in the more tight distributions.
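
A minimal sketch of the power analysis described here, assuming a two-sided two-sample z-test with equal group sizes and a shared standard deviation (a simplification; the actual grant calculations are not given in the conversation). The function name and the example numbers are hypothetical. It shows the two levers mentioned: power rises as the means of form A and form B move apart, and as the standard deviation shrinks.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Approximate power to detect a difference between form A and form B.

    Assumes a two-sided z-test, equal group sizes, and one shared
    standard deviation for both forms (illustrative assumptions).
    """
    std_normal = NormalDist()
    # Effect size: distance between the means in units of standard deviation.
    effect_size = abs(mean_b - mean_a) / sd
    # Critical value for the chosen significance level.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative hypothesis.
    expected_z = effect_size * sqrt(n_per_group / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return std_normal.cdf(expected_z - z_crit)


# Hypothetical numbers: same sample size, varying effect size and spread.
baseline = two_group_power(mean_a=0.0, mean_b=0.5, sd=1.0, n_per_group=64)
bigger_effect = two_group_power(mean_a=0.0, mean_b=1.0, sd=1.0, n_per_group=64)
tighter_spread = two_group_power(mean_a=0.0, mean_b=0.5, sd=0.5, n_per_group=64)
print(baseline, bigger_effect, tighter_spread)
```

Halving the standard deviation has the same effect on power as doubling the difference in means, because only their ratio enters the calculation.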

00:11:01 So that’s the hopeful message that there exists parts of the genome that have a strong effect

00:11:09 that has a small variance.

00:11:13 That’s exactly right.

00:11:14 Unfortunately, those perturbations are the basis of disease in many cases.

00:11:18 So it’s not a hopeful message.

00:11:20 Sometimes it’s a terrible message.

00:11:22 It’s basically, well, some people are sick, but if we can figure out what are these contributors

00:11:27 to sickness, we can then help make them better and help many other people better who don’t

00:11:32 carry that exact mutation, but who carry mutations on the same pathways.

00:11:38 And that’s what we like to call the allelic series of a gene.

00:11:42 You basically have many perturbations of the same gene in different people, each with a

00:11:49 different frequency in the human population and each with a different effect on the individual

00:11:55 that carries them.

00:11:56 So you said in the past there would be these small experiments on perturbations and animal

00:12:03 models.

00:12:04 What does this puzzle solving process look like today?

00:12:08 So we basically have something like 7 billion people on the planet and every one of them

00:12:13 carries something like 6 million mutations.

00:12:16 You basically have an enormous matrix of genotype by phenotype by systematically measuring the

00:12:25 phenotype of these individuals.

00:12:27 And the traditional way of measuring this phenotype has been to look at one trait at

00:12:32 a time.

00:12:33 You would gather families and you would sort of paint the pedigrees of a strong effect,

00:12:40 what we like to call Mendelian mutation, so a mutation that gets transmitted in a dominant

00:12:47 or a recessive, but strong effect form where basically one locus plays a very big role

00:12:53 in that disease.

00:12:54 And you could then look at carriers versus non carriers in one family, carriers versus

00:12:59 non carriers in another family and do that for hundreds, sometimes thousands of families

00:13:04 and then trace these inheritance patterns and then figure out what is the gene that

00:13:08 plays that role.

00:13:09 Is this the matrix that you’re showing in talks or lectures?

00:13:14 So that matrix is the input to the stuff that I show in talks.

00:13:21 So basically that matrix has traditionally been strong effect genes.

00:13:24 What the matrix looks like now is instead of pedigrees, instead of families, you basically

00:13:29 have thousands and sometimes hundreds of thousands of unrelated individuals, each with all of

00:13:36 their genetic variants and each with their phenotype, for example, height or lipids or,

00:13:43 you know, whether they’re sick or not for a particular trait.

00:13:48 That has been the modern view instead of going to families, going to unrelated individuals

00:13:53 with one phenotype at a time.

00:13:55 And what we’re doing now as we’re maturing in all of these sciences is that we’re doing

00:14:00 this in the context of large medical systems or enormous cohorts that are very well phenotyped

00:14:07 across hundreds of phenotypes, sometimes with their complete electronic health record.

00:14:13 So you can now start relating not just one gene segregating in one family, not just thousands

00:14:19 of variants segregating with one phenotype, but now you can do millions of variants versus

00:14:25 hundreds of phenotypes.

00:14:27 And as a computer scientist, I mean, deconvolving that matrix, partitioning it into the layers

00:14:33 of biology that are associated with every one of these elements is a dream come true.

00:14:40 It’s like the world’s greatest puzzle.

00:14:42 And you can now solve that puzzle by throwing in more and more knowledge about the function

00:14:50 of different genomic regions and how these functions are changed across tissues and in

00:14:56 the context of disease.

00:14:58 And that’s what my group and many other groups are doing.

00:15:00 We’re trying to systematically relate this genetic variation with molecular variation

00:15:05 at the expression level of the genes, at the epigenomic level of the gene regulatory circuitry,

00:15:12 and at the cellular level of what are the functions that are happening in those cells,

00:15:17 at the single cell level using single cell profiling, and then relate all that vast amount

00:15:22 of knowledge computationally with the thousands of traits that each of these thousands

00:15:29 of variants are perturbing.
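
The genotype-by-phenotype matrix described above can be sketched in a few lines. This is only an illustration under stated assumptions: the variant names, trait names, and numbers are invented toy values, and a plain Pearson correlation stands in for the regression models that real association studies use. Each variant is a row of allele counts (0, 1, or 2) across unrelated individuals, each phenotype is a row of measurements across the same individuals, and the scan scores every variant against every phenotype.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant row carries no association signal
    return cov / sqrt(var_x * var_y)


def association_scan(genotypes, phenotypes):
    """Score every (variant, phenotype) pair across the same individuals."""
    return {
        (variant, trait): pearson(allele_counts, values)
        for variant, allele_counts in genotypes.items()
        for trait, values in phenotypes.items()
    }


# Toy data for six unrelated individuals (made-up values, not real measurements).
genotypes = {
    "variant_1": [0, 0, 1, 1, 2, 2],
    "variant_2": [2, 0, 1, 0, 1, 2],
}
phenotypes = {
    "height_cm": [160, 162, 170, 171, 180, 181],
    "lipid_level": [100, 130, 110, 125, 105, 120],
}

scan = association_scan(genotypes, phenotypes)
for pair, r in sorted(scan.items(), key=lambda item: -abs(item[1])):
    print(pair, round(r, 3))
```

With millions of variants and hundreds of phenotypes the same loop becomes a very large matrix of scores, which is the object the conversation describes deconvolving.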

00:15:30 I mean, this is something we talked about, I think last time.

00:15:34 So there’s these effects at different levels that happen.

00:15:36 You said at a single cell level, you’re trying to see things that happen due to certain perturbations.

00:15:42 And then it’s not just like a puzzle of perturbation and disease.

00:15:49 It’s perturbation then effect at a cellular level, then at an organ level, a body, like,

00:15:57 how do you disassemble this into like what your group is working on?

00:16:02 You’re basically taking a bunch of the hard problems in the space.

00:16:06 How do you break apart a difficult disease and break it apart into problems that you,

00:16:13 into puzzles that you can now start solving?

00:16:15 So there’s a struggle here.

00:16:17 Computer scientists love hard puzzles and they’re like, oh, I want to build a method that just

00:16:22 deconvolves the whole thing computationally.

00:16:24 And that’s very tempting and it’s very appealing, but biologists just like to decouple that

00:16:31 complexity experimentally, to just like peel off layers of complexity experimentally.

00:16:36 And that’s what many of these modern tools that my group and others have both developed

00:16:40 and used.

00:16:41 The fact that we can now figure out tricks for peeling off these layers of complexity

00:16:46 by testing one cell type at a time or by testing one cell at a time.

00:16:53 And you could basically say, what is the effect of these genetic variants associated with

00:16:56 Alzheimer’s on human brain?

00:16:59 Human brain sounds like, oh, it’s an organ, of course, just go one organ at a time.

00:17:04 But human brain has of course, dozens of different brain regions and within each of these brain

00:17:09 regions, dozens of different cell types and every single type of neuron, every single

00:17:15 type of glial cell between astrocytes, oligodendrocytes, microglia, between all of the neural cells

00:17:24 and the vascular cells and the immune cells that are co inhabiting the brain between the

00:17:29 different types of excitatory and inhibitory neurons that are sort of interacting with

00:17:34 each other between different layers of neurons in the cortical layers.

00:17:39 Every single one of these has a different type of function to play in cognition, in

00:17:47 interaction with the environment, in maintenance of the brain, in energetic needs, in feeding

00:17:55 the brain with blood, with oxygen, in clearing out the debris that are resulting from the

00:18:01 super high energy production of cognition in humans.

00:18:06 So all of these things are basically potentially deconvolvable computationally, but experimentally,

00:18:17 you can just do single cell profiling of dozens of regions of the brain across hundreds of

00:18:21 individuals across millions of cells.

00:18:24 And then now you have pieces of the puzzle that you can then put back together to understand

00:18:31 that complexity.

00:18:32 I mean, first of all, the cells in the human brain are the most, maybe I’m romanticizing

00:18:39 it, but cognition seems to be very complicated.

00:18:42 So separating into the function, breaking Alzheimer’s down to the cellular level seems

00:18:53 very challenging.

00:18:56 Is that basically you’re trying to find a way that some perturbation in the genome results

00:19:05 in some obvious major dysfunction in the cell.

00:19:11 You’re trying to find something like that.

00:19:14 Exactly.

00:19:15 So what does human genetics do?

00:19:17 Human genetics basically looks at the whole path from genetic variation all the way to

00:19:21 disease.

00:19:22 So human genetics has basically taken thousands of Alzheimer’s cases and thousands of controls

00:19:31 matched for age, for sex, for environmental backgrounds and so on and so forth.

00:19:38 And then looked at that map where you’re asking, what are the individual genetic perturbations

00:19:44 and how are they related to all the way to Alzheimer’s disease?

00:19:48 And that has actually been quite successful.

00:19:51 So we now have more than 27 different loci, these are genomic regions that are associated

00:19:57 with Alzheimer’s at this end-to-end level.

00:20:02 But the moment you sort of break up that very long path into smaller levels, you can basically

00:20:07 say from genetics, what are the epigenomic alterations at the level of gene regulatory

00:20:13 elements where that genetic variant perturbs the control region nearby.

00:20:19 That effect is much larger.

00:20:21 You mean much larger in terms of this down the line impact or?

00:20:25 It’s much larger in terms of the measurable effect, this A versus B variance is actually

00:20:31 so much more cleanly defined when you go to the shorter branches.

00:20:35 Because for one genetic variant to affect Alzheimer’s, that’s a very long path.

00:20:40 That basically means that in the context of millions of these 6 million variants that

00:20:43 every one of us carries, that one single nucleotide has a detectable effect all the way to the

00:20:51 end.

00:20:52 I mean, it’s just mind boggling that that’s even possible, but indeed there are such effects.

00:20:57 So the hope is, or the most scientifically speaking, the most effective place where to

00:21:03 detect the alteration that results in disease is earlier on in the pipeline, as early as

00:21:10 possible.

00:21:11 It’s a trade off.

00:21:12 If you go very early on in the pipeline, now each of these epigenomic alterations, for

00:21:17 example, this enhancer control region is active maybe 50% less, which is a dramatic effect.

00:21:25 Now you can ask, well, how much does changing one regulatory region in the genome in one

00:21:29 cell type change disease?

00:21:31 Well, that path is now long.

00:21:33 So if you instead look at expression, the path between genetic variation and the expression

00:21:39 of one gene goes through many enhancer regions, and therefore it’s a subtler effect at the

00:21:44 gene level.

00:21:45 But then now you’re closer because one gene is acting in the context of only 20,000 other

00:21:51 genes as opposed to one enhancer acting in the context of 2 million other enhancers.

00:21:57 So you basically now have genetic, epigenomic, the circuitry, transcriptomic, the gene expression

00:22:04 control, and then cellular, where you can basically say, I can measure various properties

00:22:09 of those cells.

00:22:11 What is the calcium influx rate when I have this genetic variation?

00:22:17 What is the synaptic density?

00:22:19 What is the electric impulse conductivity and so on and so forth?

00:22:24 So you can measure things along this path to disease, and you can also measure endophenotypes.

00:22:32 You can basically measure your brain activity.

00:22:37 You can do imaging in the brain.

00:22:39 You can basically measure, I don’t know, the heart rate, the pulse, the lipids, the amount

00:22:44 of blood secreted and so on and so forth.

00:22:46 And then through all of that, you can basically get at the path to causality, the path to

00:22:52 disease.

00:22:55 And is there something beyond cellular?

00:22:57 So you mentioned lifestyle interventions or changes as a way to, or like be able to prescribe

00:23:05 changes in lifestyle.

00:23:07 Like what about organs?

00:23:09 What about like the function of the body as a whole?

00:23:13 Yeah, absolutely.

00:23:14 So basically when you go to your doctor, they always measure, you know, your pulse.

00:23:18 They always measure your height.

00:23:19 They always measure your weight, you know, your BMI.

00:23:21 So basically these are just very basic variables.

00:23:24 But with digital devices nowadays, you can start measuring hundreds of variables for

00:23:27 every individual.

00:23:29 You can basically also phenotype cognitively through tests, Alzheimer’s patients.

00:23:37 There are cognitive tests that you can measure, that you typically do for cognitive decline,

00:23:43 these mini mental observations that you have specific questions to.

00:23:48 You can think of sort of enlarging the set of cognitive tests.

00:23:51 So in the mouse, for example, you do experiments for how do they get out of mazes?

00:23:55 How do they find food?

00:23:57 Whether they recall a fear, whether they shake in a new environment and so on and so forth.

00:24:02 In the human, you can have much, much richer phenotypes where you can basically say not

00:24:06 just imaging at the organ level and all kinds of other activities at the organ level, but

00:24:13 you can also do at the organism level, you can do behavioral tests.

00:24:19 And how did they do on empathy?

00:24:21 How did they do on memory?

00:24:22 How did they do on long-term memory versus short-term memory?

00:24:26 And so on and so forth.

00:24:27 I love how you’re calling that phenotype.

00:24:28 I guess it is.

00:24:29 It is.

00:24:31 But like your behavior patterns that might change over a period of a life, your ability

00:24:37 to remember things, your ability to be empathetic or emotionally, your intelligence perhaps

00:24:44 even.

00:24:45 Yeah, but intelligence has hundreds of variables.

00:24:47 It can be your math intelligence, your literary intelligence, your puzzle solving intelligence,

00:24:50 your logic.

00:24:51 It could be like hundreds of things.

00:24:52 And all of that, we’re able to measure that better and better and all that could be connected

00:24:57 to the entire pipeline somehow.

00:24:58 We used to think of each of these as a single variable like intelligence.

00:25:01 I mean, that’s ridiculous.

00:25:03 It’s basically dozens of different genes that are controlling every single variable.

00:25:10 You can basically think of, imagine us in a video game where every one of us has measures

00:25:16 of strength, stamina, energy left and so on and so forth.

00:25:20 But you could click on each of those five bars that are just the main bars and each

00:25:24 of those will just give you then hundreds of bars and you can basically say, okay, great

00:25:28 for my machine learning task, I want someone who, a human who has these particular forms

00:25:36 of intelligence.

00:25:37 I require now these 20 different things.

00:25:40 And then you can combine those things and then relate them to of course performance

00:25:45 in a particular task, but you can also relate them to genetic variation that might be affecting

00:25:50 different parts of the brain.

00:25:52 For example, your frontal cortex versus your temporal cortex versus your visual cortex

00:25:56 and so on and so forth.

00:25:58 So genetic variation that affects expression of genes in different parts of your brain

00:26:02 can basically affect your music ability, your auditory ability, your smell, just dozens

00:26:08 of different phenotypes can be broken down into hundreds of cognitive variables and then

00:26:15 relate each of those to thousands of genes that are associated with them.

00:26:20 So somebody who loves RPGs or playing games, there’s too few variables that we can control.

00:26:28 So I’m excited if we’re in fact living in a simulation and this is a video game, I’m

00:26:32 excited by the quality of the video game.

00:26:37 The game designer did a hell of a good job.

00:26:39 So we’re impressed.

00:26:40 Oh, I don’t know.

00:26:41 The sunset last night was a little unrealistic.

00:26:43 Yeah.

00:26:44 Yeah.

00:26:45 The graphics.

00:26:46 Exactly.

00:26:47 Come on, NVIDIA.

00:26:48 To zoom back out, we’ve been talking about the genetic origins of diseases, but I think

00:26:54 it’s fascinating to talk about what are the most important diseases to understand and

00:27:01 especially as it connects to the things that you’re working on.

00:27:05 So it’s very difficult to think about important diseases to understand.

00:27:08 There’s many metrics of importance.

00:27:10 One is lifestyle impact.

00:27:12 I mean, if you look at COVID, the impact on lifestyle has been enormous.

00:27:16 So understanding COVID is important because it has impacted the wellbeing in terms of

00:27:23 ability to have a job, ability to have an apartment, ability to go to work, ability

00:27:27 to have a mental circle of support and all of that for millions of Americans, like huge,

00:27:34 huge impact.

00:27:35 So that’s one aspect of importance.

00:27:37 So basically mental disorders, Alzheimer’s has a huge importance in the wellbeing of

00:27:42 Americans.

00:27:44 Whether or not it kills someone for many, many years, it has a huge impact.

00:27:48 So the first measure of importance is just wellbeing.

00:27:52 Impact on the quality of life.

00:27:53 Impact on the quality of life, absolutely.

00:27:55 The second metric, which is much easier to quantify is deaths.

00:28:00 What is the number one killer?

00:28:01 The number one killer is actually heart disease.

00:28:04 It is actually killing 650,000 Americans per year.

00:28:10 Number two is cancer with 600,000 Americans.

00:28:14 Number three, far, far down the list is accidents, every single accident combined.

00:28:19 So basically you read the news, accidents, like there was a huge car crash all over the

00:28:24 news.

00:28:25 But the number of deaths, number three by far, 167,000.

00:28:31 Chronic respiratory disease.

00:28:32 So that’s asthma, not being able to breathe and so on and so forth, 160,000. Alzheimer’s,

00:28:39 number five, with 120,000. And then stroke, brain aneurysms and so on and so forth, that’s

00:28:45 147,000. Diabetes and metabolic disorders, et cetera,

00:28:49 that’s 85,000.

00:28:51 The flu is 60,000, suicide, 50,000 and then overdose, et cetera, you know, goes further

00:28:58 down the list.

00:29:00 So of course COVID has creeped up to be the number three killer this year with, you know,

00:29:06 more than 100,000 Americans and counting.

00:29:11 And you know, but if you think about sort of what do we use, what are the most important

00:29:16 diseases, you have to understand both the quality of life and the sheer number of deaths

00:29:22 and just numbers of years lost if you wish.

00:29:25 And each of these diseases you can think of as, and also including terrorist attacks and

00:29:30 school shootings, for example, things which lead to fatalities, you can look at as problems

00:29:39 that could be solved.

00:29:41 And some problems are harder to solve than others.

00:29:44 I mean, that’s part of the equation.

00:29:46 So maybe if you look at these diseases, if you look at heart disease or cancer or Alzheimer’s

00:29:52 or just like schizophrenia and obesity, diabetes, like not necessarily things that kill you,

00:29:59 but affect the quality of life, which problems are solvable, which aren’t, which are harder

00:30:05 to solve, which aren’t.

00:30:07 I love your question because it puts it in the context of a global effort rather than

00:30:13 just the local effort.

00:30:15 So basically if you look at the global aspect, exercise and nutrition are two interventions

00:30:22 that we can as a society do a much better job at.

00:30:27 So if you think about sort of the availability of cheap food, it’s extremely high in calories.

00:30:33 It’s extremely detrimental for you, like a lot of processed food, et cetera.

00:30:36 So if we change that equation and as a society, we made availability of healthy food much,

00:30:43 much easier and charged for a burger at McDonald’s the price that it costs the health system,

00:30:52 then people would actually start buying more healthy foods.

00:30:56 So basically that’s sort of a societal intervention, if you wish.

00:30:59 In the same way, increasing empathy, increasing education, increasing the social framework

00:31:06 and support would basically lead to fewer suicides.

00:31:10 It would lead to fewer murders.

00:31:11 It would lead to fewer deaths overall.

00:31:15 So that’s something that we as a society can do.

00:31:19 You can also think about external factors versus internal factors.

00:31:21 So the external factors are basically communicable diseases like COVID, like the flu, et cetera.

00:31:27 And the internal factors are basically things like cancer and Alzheimer’s where basically

00:31:33 your genetics will eventually drive you there.

00:31:38 And then of course, with all of these factors, every single disease has both the genetic

00:31:43 component and environmental component.

00:31:46 So heart disease, huge genetic contribution, Alzheimer’s, it’s like 60% plus genetic.

00:31:55 So I think it’s like 79% heritability.

00:31:59 So that basically means that genetics alone explains 79% of Alzheimer’s incidence.

00:32:06 And yes, there’s a 21% environmental component where you could basically enrich your cognitive

00:32:14 environment, enrich your social interactions, read more books, learn a foreign language,

00:32:21 go running, you know, sort of have a more fulfilling life.

00:32:24 All of that will actually decrease Alzheimer’s, but there’s a limit to how much that can impact

00:32:29 because of the huge genetic footprint.

00:32:31 So this is fascinating.

00:32:32 So each one of these problems have a genetic component and an environment component.

00:32:38 And so like when there’s a genetic component, what can we do about some of these diseases?

00:32:43 And have you worked on what can you say that’s in terms of problems that are solvable here

00:32:48 or understandable?

00:32:50 So my group works on the genetic component, but I would argue that understanding the genetic

00:32:55 component can have a huge impact even on the environmental component.

00:32:59 Why is that?

00:33:00 Because genetics gives us access to mechanism.

00:33:03 And if we can alter the mechanism, if we can impact the mechanism, we can perhaps counteract

00:33:09 some of the environmental components.

00:33:12 So understanding the biological mechanisms leading to disease is extremely important

00:33:18 in being able to intervene.

00:33:20 But when you can intervene and what, you know, the analogy that I like to give is for example,

00:33:26 for obesity, you know, think of it as a giant bathtub of fat.

00:33:29 There’s basically fat coming in from your diet and there’s fat coming out from your

00:33:35 exercise.

00:33:36 Okay.

00:33:37 So that’s an in out equation and that’s the equation that everybody’s focusing on.

00:33:42 But your metabolism impacts that, you know, bathtub.

00:33:47 Basically your metabolism controls the rate at which you’re burning energy.

00:33:53 It controls the rate at which you’re storing energy.

00:33:56 And it also teaches you about the various valves that control the input and the output

00:34:02 equation.

00:34:04 So if we can learn from the genetics, the valves, we can then manipulate those valves.

00:34:11 And even if the environment is feeding you a lot of fat and getting a little fat out,

00:34:16 you can just poke another hole at the bathtub and just get a lot of the fat out.
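
As a rough illustration of the bathtub analogy, here is a minimal Python sketch. The function name, the units, and every number are assumptions made for illustration, not values from the conversation. It only shows how an extra outflow "valve" can change the direction of the stored level while diet and exercise stay the same.

```python
def simulate_bathtub(days, intake, burn, extra_drain=0.0, start=100.0):
    """Track a stored-fat level over time (arbitrary units).

    intake      -- units entering per day (diet)
    burn        -- units leaving per day (exercise and baseline metabolism)
    extra_drain -- hypothetical additional outflow from a manipulated valve
    """
    level = start
    history = [level]
    for _ in range(days):
        # The level cannot drop below an empty tub.
        level = max(0.0, level + intake - burn - extra_drain)
        history.append(level)
    return history


# Same diet and exercise in both runs; only the extra valve differs.
baseline = simulate_bathtub(days=30, intake=5.0, burn=4.0)
with_valve = simulate_bathtub(days=30, intake=5.0, burn=4.0, extra_drain=1.5)

print("final level without extra valve:", baseline[-1])
print("final level with extra valve:   ", with_valve[-1])
```

With these made-up numbers the baseline level keeps rising, while the run with the extra valve falls, even though intake and exercise are identical.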

00:34:19 Yeah, that’s fascinating.

00:34:21 Yeah.

00:34:22 So we’re not just passive observers of our genetics.

00:34:25 The more we understand, the more we can come up with actual treatments.

00:34:29 And I think that’s an important aspect to realize when people are thinking about strong

00:34:35 effect versus weak effect variants.

00:34:38 So some variants have strong effects.

00:34:39 We talked about these Mendelian disorders where a single gene has a sufficiently large

00:34:43 effect, penetrance, expressivity, and so on and so forth, that basically you can trace

00:34:49 it in families with cases and not cases, cases, not cases, and so on and so forth.

00:34:55 But so these are the genes that everybody says, oh, that’s the genes we should go after

00:35:02 because that’s a strong effect gene.

00:35:04 I like to think about it slightly differently.

00:35:06 These are the genes where genetic impacts that have a strong effect were tolerated because

00:35:15 every single time we have a genetic association with disease, it depends on two things.

00:35:20 Number one, the obvious one, whether the gene has an impact on the disease.

00:35:24 Number two, the more subtle one is whether there is genetic variation standing and circulating

00:35:32 and segregating in the human population that impacts that gene.

00:35:37 Some genes are so darn important that if you mess with them, even a tiny little amount,

00:35:44 that person’s dead.

00:35:46 So those genes don’t have variation.

00:35:49 You’re not going to find a genetic association if you don’t have variation.

00:35:53 That doesn’t mean that the gene has no role.

00:35:55 It simply means that the gene tolerates no mutations.

00:35:59 So that’s actually a strong signal when there’s no variation.

00:36:01 That’s so fascinating.

00:36:02 Exactly.

00:36:03 Genes that have very little variation are hugely important.

00:36:06 You can actually rank the importance of genes based on how little variation they have.

00:36:10 And those genes that have very little variation but no association with disease, that’s a

00:36:16 very good metric to say, oh, that’s probably a developmental gene because we’re not good

00:36:20 at measuring those phenotypes.

00:36:22 So it’s genes that you can tell evolution has excluded mutations from, but yet we can’t

00:36:29 see them associated with anything that we can measure nowadays.

00:36:32 It’s probably early embryonic lethal.
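
The idea of ranking genes by how little variation they carry can be sketched in a few lines of Python. This is an assumption-laden toy: the gene names and counts are hypothetical, and the observed-to-expected ratio is one common way to express "how little variation", not necessarily the metric the speaker has in mind.

```python
# Hypothetical data: (observed damaging variants, expected count under a
# neutral model) for each gene. All names and numbers are made up.
genes = {
    "GENE_A": (2, 40.0),
    "GENE_B": (35, 38.0),
    "GENE_C": (0, 25.0),
    "GENE_D": (12, 30.0),
}


def constraint_ranking(counts):
    """Return (gene, observed/expected) pairs, most constrained first."""
    ratios = {gene: obs / exp for gene, (obs, exp) in counts.items()}
    # A lower ratio means fewer variants than expected, i.e. less tolerance.
    return sorted(ratios.items(), key=lambda item: item[1])


ranking = constraint_ranking(genes)
for gene, ratio in ranking:
    print(f"{gene}: observed/expected = {ratio:.2f}")
```

In this toy data the gene with zero observed variants ranks first. By the argument above, a gene near the top of such a list with no known disease association would be a candidate for an early developmental role.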

00:36:34 What are all the words you just said?

00:36:36 Early embryonic what?

00:36:37 Lethal.

00:36:38 Meaning?

00:36:39 Meaning that that embryo will die.

00:36:40 Okay.

00:36:41 There’s a bunch of stuff that is required for a stable functional organism across the

00:36:49 board for an entire species, I guess.

00:36:53 If you look at sperm, it expresses thousands of proteins.

00:36:58 Does sperm actually need thousands of proteins?

00:37:01 No, but it’s probably just testing them.

00:37:05 So my speculation is that misfolding of these proteins is an early test for failure.

00:37:11 So that out of the millions of sperm that are possible, you select the subset that are

00:37:18 just not grossly misfolding thousands of proteins.

00:37:21 So it’s kind of an assert that this is folded correctly.

00:37:25 Correct.

00:37:26 Yeah.

00:37:27 This just because if this little thing about the folding of a protein isn’t correct, that

00:37:32 probably means somewhere down the line, there’s a bigger issue.

00:37:35 That’s exactly right.

00:37:36 So fail fast.

00:37:37 So basically if you look at the mammalian investment in a newborn, that investment is

00:37:45 enormous in terms of resources.

00:37:47 So mammals have basically evolved mechanisms for fail fast.

00:37:52 Where basically in those early months of development, I mean it’s horrendous of course at the personal

00:37:58 level when you lose your future child, but in some ways there’s so little hope for that

00:38:08 child to develop and sort of make it through the remaining months that sort of fail fast

00:38:12 is probably a good evolutionary principle for mammals.

00:38:19 And of course humans have a lot of medical resources that you can sort of give those

00:38:24 children a chance and we have so much more success in sort of giving folks who have these

00:38:33 strong carrier mutations a chance, but if they’re not even making it through the first

00:38:37 three months, we’re not going to see them.

00:38:39 So that’s why when we say what are the most important genes to focus on, the ones that

00:38:45 have a strong effect mutation or the ones that have a weak effect mutation, well the

00:38:50 jury might be out because the ones that have a strong effect mutation are basically not

00:38:57 mattering as much.

00:38:58 The ones that only have weak effect mutations by understanding through genetics that they

00:39:04 have a weak effect mutation and understanding that they have a causal role on the disease,

00:39:10 we can then say, okay, great, evolution has only tolerated a 2% change in that gene.

00:39:15 Pharmaceutically I can go in and induce a 70% change in that gene and maybe I will poke

00:39:22 another hole at the bathtub that was not easy to control in many of the other sort of strong

00:39:33 effect genetic variants.

00:39:35 So there’s this beautiful map of across the population of things that you’re saying strong

00:39:41 and weak effects, so stuff with a lot of mutations and stuff with little mutations with no mutations

00:39:48 and you have this map and it lays out the puzzle.

00:39:51 Yeah.

00:39:52 So when I say strong effect, I mean at the level of individual mutations.

00:39:56 So basically genes where, so you have to think of first the effect of the gene on the disease.

00:40:03 Remember how I was sort of painting that map earlier from genetics all the way to phenotype.

00:40:10 That gene can have a strong effect on the disease, but the genetic variant might have

00:40:15 a weak effect on the gene.

00:40:18 So basically when you ask what is the effect of that genetic variant on the disease, it

00:40:24 could be that that genetic variant impacts the gene by a lot and then the gene impacts

00:40:29 the disease by a little, or it could be that the genetic variant impacts the gene by a

00:40:33 little and then the gene impacts the disease by a lot.

00:40:35 So what we care about is genes that impact the disease a lot, but genetics gives us the

00:40:41 full equation and what I would argue is if we couple the genetics with expression variation

00:40:51 to basically ask what genes change by a lot and which genes correlate with disease by

00:41:00 a lot, even if the genetic variants change them by a little, then those are the best

00:41:06 places to intervene.

00:41:07 Those are the best places where pharmaceutically, if I have even a modest effect, I will have

00:41:13 a strong effect on the disease, whereas those genetic variants that have a huge effect on

00:41:17 the disease, I might not be able to change that gene by this much without affecting all

00:41:21 kinds of other things.
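
The two-step picture (variant affects gene, gene affects disease) can be made concrete with a small Python sketch. It assumes, purely for illustration, that the two effects simply multiply; the function name and the 0.50 and 0.04 figures are hypothetical, while the 2% and 70% figures echo the numbers used earlier in the conversation.

```python
def effect_on_disease(change_in_gene, gene_to_disease):
    """Toy linear model: disease effect = gene change x gene's disease impact."""
    return change_in_gene * gene_to_disease


# Case 1: the variant changes the gene a lot, but the gene matters little.
case1 = effect_on_disease(change_in_gene=0.50, gene_to_disease=0.04)

# Case 2: the variant changes the gene by only 2%, but the gene matters a lot.
case2 = effect_on_disease(change_in_gene=0.02, gene_to_disease=1.00)

# A hypothetical drug that shifts the gene by 70% in each case.
drug_case1 = effect_on_disease(change_in_gene=0.70, gene_to_disease=0.04)
drug_case2 = effect_on_disease(change_in_gene=0.70, gene_to_disease=1.00)

print("genetic association, case 1:", case1)
print("genetic association, case 2:", case2)
print("drug effect, case 1:", drug_case1)
print("drug effect, case 2:", drug_case2)
```

Under this toy model both variants show the same weak association with disease, yet the same drug-sized change has a far larger effect in the second case, which is the argument for looking at weak effect variants in high-impact genes.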

00:41:22 Interesting.

00:41:23 So that’s what we’re looking at.

00:41:26 What have we been able to find in terms of which disease could be helped?

00:41:31 Again, don’t get me started.

00:41:37 We have found so much.

00:41:38 Our understanding of disease has changed so dramatically with genetics.

00:41:46 I mean places that we had no idea would be involved.

00:41:49 So one of the worst things about my genome is that I have a genetic predisposition to

00:41:53 age related macular degeneration, AMD.

00:41:56 So it’s a form of blindness that causes you to lose the central part of your vision progressively

00:42:02 as you grow older.

00:42:04 My increased risk is fairly small.

00:42:06 I have an 8% chance.

00:42:07 You only have a 6% chance.

00:42:10 I’m an average.

00:42:11 By the way, when you say my, you mean literally yours.

00:42:14 You know this about you.

00:42:15 I know this about me.

00:42:18 Which is kind of, I mean philosophically speaking is a pretty powerful thing to live with.

00:42:26 Maybe that’s, so we agreed to talk again by the way for the listeners to where we’re going

00:42:31 to try to focus on science today and a little bit of philosophy next time.

00:42:36 But it’s interesting to think about the more you’re able to know about yourself from the

00:42:42 genetic information in terms of the diseases, how that changes your own view of life.

00:42:49 So there’s a lot of impact there and there’s something called genetic exceptionalism,

00:42:56 which basically thinks of genetics as something very, very different than everything else

00:43:01 as a type of determinism.

00:43:04 And you know, let’s talk about that next time.

00:43:07 So basically.

00:43:08 That’s a good preview.

00:43:09 Yeah.

00:43:10 So let’s go back to AMD.

00:43:11 So basically with AMD, we had no idea what causes AMD.

00:43:16 You know, it was, it was a mystery until the genetics were worked out.

00:43:23 And now the fact that I know that I have a predisposition allows me to sort of make some

00:43:28 life choices, number one, but number two, the genes that lead to that predisposition

00:43:34 give us insights as to how does it actually work.

00:43:38 And that’s a place where genetics gave us something totally unexpected.

00:43:42 So there’s a complement pathway, which is an immune function pathway that was in, you

00:43:52 know, most of the loci associated with AMD.

00:43:55 And that basically told us that, wow, there’s an immune basis to this eye disorder that

00:44:02 people had just not expected before.

00:44:05 If you look at complement, it was recently also implicated in schizophrenia.

00:44:11 And there’s a type of microglia that is involved in synaptic pruning.

00:44:17 So synapses are the connections between neurons.

00:44:20 And in this whole use it or lose it view of mental cognition and other capabilities, you

00:44:27 basically have microglia, which are immune cells that are sort of constantly traversing

00:44:32 your brain and then pruning neuronal connections, pruning synaptic connections that are not

00:44:38 utilized.

00:44:40 So in schizophrenia, there’s thought to be a change in the pruning that basically if

00:44:47 you don’t prune your synapses the right way, you will actually have an increased risk of

00:44:53 schizophrenia.

00:44:54 This is something that was completely unexpected for schizophrenia.

00:44:57 Of course, we knew it has to do with neurons, but the role of the complement complex, which

00:45:01 is also implicated in AMD, which is now also implicated in schizophrenia, was a huge surprise.

00:45:06 What’s the complement complex?

00:45:08 So it’s basically a set of genes, the complement genes that are basically having various immune

00:45:13 roles.

00:45:15 And as I was saying earlier, our immune system has been coopted for many different roles

00:45:19 across the body.

00:45:21 So they actually play many diverse roles.

00:45:23 And somehow the immune system is connected to the synaptic pruning process.

00:45:29 Exactly.

00:45:30 So the immune cells were coopted to prune synapses.

00:45:33 How did you figure this out?

00:45:35 How does one go about figuring this intricate connection, like pipeline of connections out?

00:45:41 Yeah.

00:45:42 Let me give you another example.

00:45:44 So Alzheimer’s disease, the first place that you would expect it to act is obviously the

00:45:48 brain.

00:45:49 So we had basically this Roadmap Epigenomics Consortium view of the human epigenome, the

00:45:57 largest map of the human epigenome that has ever been built across 127 different tissues

00:46:04 and samples with dozens of epigenomic marks measured in hundreds of donors.

00:46:10 So what we’ve basically learned through that is that you basically can map what are the

00:46:16 active gene regulatory elements for every one of the tissues in the body.

00:46:20 And then we connected these gene regulatory active maps of basically what regions of the

00:46:27 human genome are turning on in every one of different tissues.

00:46:32 We then can go back and say, where are all of the genetic loci that are associated with

00:46:38 disease?

00:46:39 This is something that my group, I think was the first to do back in 2010 in this Ernst

00:46:46 Nature Biotech paper, but basically we were for the first time able to show that specific

00:46:52 chromatin states, specific epigenomic states, in that case enhancers, were in fact enriched

00:46:58 in disease associated variants.

00:47:00 We pushed that further in the Ernst Nature paper a year later.

00:47:05 And then in this Roadmap Epigenomics paper a few years after that, but basically that

00:47:12 matrix that you mentioned earlier was in fact the first time that we could see what genetic

00:47:18 traits have genetic variants that are enriched in what tissues in the body.

00:47:26 And a lot of that map made complete sense.

00:47:28 If you looked at a diversity of immune traits like allergies and type one diabetes and so

00:47:33 on and so forth, you basically could see that they were enriching, that the genetic variants

00:47:38 associated with those traits were enriched in enhancers in these gene regulatory elements

00:47:44 active in T cells and B cells and hematopoietic stem cells and so on and so forth.

00:47:49 So that basically gave us a confirmation in many ways that those immune traits were indeed

00:47:56 enriched in immune cells.
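
To give a feel for what "enriched" means here, the following Python sketch computes a fold enrichment and a one-sided hypergeometric p-value for the overlap between trait-associated regions and enhancers active in one tissue. This is a simplified stand-in under stated assumptions, not the method used in the papers mentioned: the function name and all counts are hypothetical, and real analyses must also handle issues such as correlated variants and region sizes.

```python
from math import comb


def hypergeom_enrichment_p(total_regions, tissue_enhancers, trait_regions, overlap):
    """P(X >= overlap) if trait_regions were drawn at random from total_regions,
    of which tissue_enhancers are enhancers active in the tissue of interest."""
    max_k = min(tissue_enhancers, trait_regions)
    denom = comb(total_regions, trait_regions)
    tail = sum(
        comb(tissue_enhancers, k)
        * comb(total_regions - tissue_enhancers, trait_regions - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / denom


# Hypothetical counts for one trait and one tissue.
total_regions = 1000     # all candidate regions considered
tissue_enhancers = 50    # regions that are active enhancers in this tissue
trait_regions = 20       # regions carrying trait-associated variants
overlap = 6              # trait regions that fall in tissue enhancers

# Observed overlap divided by the overlap expected by chance.
fold = (overlap * total_regions) / (trait_regions * tissue_enhancers)
p_value = hypergeom_enrichment_p(total_regions, tissue_enhancers, trait_regions, overlap)

print("fold enrichment:", fold)
print("one-sided p-value:", p_value)
```

Repeating this for every trait and every tissue gives a trait-by-tissue matrix of enrichments of the kind described above.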

00:48:00 If you looked at type two diabetes, you basically saw an enrichment in only one type of sample

00:48:06 and it was pancreatic islets.

00:48:08 And we know that type two diabetes sort of stems from the dysregulation of insulin in

00:48:14 the beta cells of pancreatic islets.

00:48:17 And that sort of was spot on, super precise.

00:48:21 If you looked at blood pressure, where would you expect blood pressure to occur?

00:48:25 You know, I don’t know, maybe in your metabolism and ways that you process coffee or something

00:48:29 like that.

00:48:30 Maybe in your brain, the way that you stress out and increases your blood pressure, et

00:48:33 cetera.

00:48:34 So the blood pressure localized specifically in the left ventricle of the heart.

00:48:40 So the enhancers of the left ventricle in the heart contained a lot of genetic variants

00:48:44 associated with blood pressure.

00:48:46 If you look at height, we found an enrichment specifically in embryonic stem cell enhancers.

00:48:53 So the genetic variants predisposing you to be taller or shorter are in fact acting in

00:48:57 developmental stem cells, makes complete sense.

00:49:01 If you looked at inflammatory bowel disease, you basically found inflammatory, which is

00:49:05 immune, and also bowel disease, which is digestive.

00:49:09 And indeed we saw a double enrichment both in the immune cells and in the digestive cells.

00:49:15 So that basically told us that this is acting in both components.

00:49:19 There’s an immune component to inflammatory bowel disease and there’s a digestive component.

00:49:23 And the big surprise was for Alzheimer’s.

00:49:25 We had seven different brain samples.

00:49:29 We found zero enrichment in the brain samples for genetic variants associated with Alzheimer’s.

00:49:36 And this is mind boggling.

00:49:38 Our brains were literally hurting.

00:49:40 What is going on?

00:49:42 And what is going on is that the brain samples are primarily neurons, oligodendrocytes, and

00:49:49 astrocytes in terms of the cell types that make them up.

00:49:54 So that basically indicated that genetic variants associated with Alzheimer’s were probably

00:49:59 not acting in oligodendrocytes, astrocytes, or neurons.

00:50:04 So what could they be acting in?

00:50:05 Well, the fourth major cell type is actually microglia.

00:50:10 Microglia are resident immune cells in your brain.

00:50:13 Oh, nice.

00:50:15 They’re immune.

00:50:16 Oh, wow.

00:50:17 They are CD14 plus, which is this sort of cell surface markers of those cells.

00:50:24 So they’re CD14 plus cells, just like macrophages that are circulating in your blood.

00:50:30 The microglia are resident monocytes that are basically sitting in your brain.

00:50:35 They’re tissue specific monocytes.

00:50:38 And every one of your tissues, like your fat, for example, has a lot of macrophages that

00:50:42 are resident.

00:50:43 And the M1 versus M2 macrophage ratio has a huge role to play in obesity.

00:50:49 And so basically, again, these immune cells are everywhere, but basically what we found

00:50:53 through this completely unbiased view of what are the tissues that likely underlie different

00:50:59 disorders, we found that Alzheimer’s was humongously enriched in microglia, but not at all in the

00:51:08 other cell types.

00:51:09 So what are we supposed to make of that if you look at the tissues involved, is that simply

00:51:15 useful for indication of propensity for disease, or does it give us somehow a pathway of treatment?

00:51:24 It’s very much the second.

00:51:26 If you look at the way to therapeutics, you have to start somewhere.

00:51:33 What are you going to do?

00:51:34 You’re going to basically make assays that manipulate those genes and those pathways

00:51:42 in those cell types.

00:51:43 So before we know the tissue of action, we don’t even know where to start.

00:51:49 We basically are at a loss.

00:51:51 But if you know the tissue of action, and even better, if you know the pathway of action,

00:51:54 then you can basically screen your small molecules, not for the gene, you can screen them directly

00:52:00 for the pathway in that cell type.

00:52:02 So you can basically develop a high throughput multiplexed robotic system for testing the

00:52:10 impact of your favorite molecules that you know are safe, efficacious, and sort of hit

00:52:16 that particular gene and so on and so forth.

00:52:18 You can basically screen those molecules against either a set of genes that act in that pathway

00:52:25 or on the pathway directly by having a cellular assay.

00:52:29 And then you can basically go into mice and do experiments and basically sort of figure

00:52:33 out ways to manipulate these processes that allow you to then go back to humans and do

00:52:38 a clinical trial that basically says, okay, I was able indeed to reverse these processes

00:52:43 in mice.

00:52:44 Can I do the same thing in humans?

00:52:46 So the knowledge of the tissues gives you the pathway to treatment, but that’s not the

00:52:51 only part.

00:52:52 There are many additional steps to figuring out the mechanism of disease.

00:52:57 So that’s really promising.

00:52:59 Maybe to take a small step back, you’ve mentioned all these puzzles that were figured out with

00:53:04 the Nature paper for, I mean, you’ve mentioned a ton of diseases from obesity to Alzheimer’s,

00:53:13 even schizophrenia, I think you mentioned.

00:53:17 What is the actual methodology of figuring this out?

00:53:20 So indeed, I mentioned a lot of diseases and my lab works on a lot of different disorders.

00:53:26 And the reason for that is that if you look at biology, it used to be zoology departments

00:53:39 and botanology departments and virology departments and so on and so forth.

00:53:43 And MIT was one of the first schools to basically create a biology department, like, oh, we’re

00:53:47 going to study all of life suddenly.

00:53:49 Why was that even the case?

00:53:51 Because the advent of DNA and the genome and the central dogma of DNA makes RNA makes protein

00:53:58 in many ways, unified biology.

00:54:01 You could suddenly study the process of transcription in viruses or in bacteria and have a huge

00:54:07 impact on yeast and fly and maybe even mammals because of this realization of these common

00:54:15 underlying processes.

00:54:17 And in the same way that DNA unified biology, genetics is unifying disease studies.

00:54:27 So you used to have, I don’t know, cardiovascular disease department and neurological disease

00:54:39 department and neurodegeneration department and basically immune and cancer and so on

00:54:47 and so forth.

00:54:48 And all of these were studied in different labs because it made sense, because basically

00:54:53 the first step was understanding how the tissue functions and we kind of knew the tissues

00:54:57 involved in cardiovascular disease and so on and so forth.

00:55:00 But what’s happening with human genetics is that all of these walls and edifices that

00:55:05 we had built are crumbling.

00:55:08 And the reason for that is that genetics is in many ways revealing unexpected connections.

00:55:16 So suddenly we now have to bring the immunologists to work on Alzheimer’s.

00:55:21 They were never in the room.

00:55:22 They were in another building altogether.

00:55:25 The same way for schizophrenia, we now have to sort of worry about all these interconnected

00:55:31 aspects.

00:55:33 For metabolic disorders, we’re finding contributions from brain.

00:55:37 So suddenly we have to call the neurologist from the other building and so on and so forth.

00:55:41 So in my view, it makes no sense anymore to basically say, oh, I’m a geneticist studying

00:55:49 immune disorders.

00:55:50 I mean, that’s ridiculous because, I mean, of course in many ways you still need to sort

00:55:55 of focus.

00:55:56 But what we’re doing is that we’re basically saying we’ll go wherever the genetics takes

00:56:01 us.

00:56:02 And by building these massive resources, by working on our latest map is now 833 tissues,

00:56:10 sort of the next generation of the epigenomics roadmap, which we now call EpiMap, is

00:56:15 833 different tissues.

00:56:18 And using those, we’ve basically found enrichments in 540 different disorders.

00:56:24 Those enrichments are not like, oh great, you guys work on that and we’ll work on this.

00:56:29 They’re intertwined amazingly.

00:56:31 So of course there’s a lot of modularity, but there’s these enhancers that are sort

00:56:36 of broadly active and these disorders that are broadly active.

00:56:39 So basically some enhancers are active in all tissues and some disorders are enriching

00:56:43 in all tissues.

00:56:44 So basically there’s these multifactorial and this other class, which I like to call

00:56:49 polyfactorial diseases, which are basically lighting up everywhere.

00:56:54 And in many ways it’s, you know, sort of cutting across these walls that were previously built

00:57:00 across these departments.

00:57:01 And the polyfactorial ones, the previous structure of departments probably wasn’t equipped

00:57:07 to deal with those.

00:57:08 I mean, again, maybe it’s a romanticized question, but you know, there’s in physics, there’s

00:57:14 a theory of everything.

00:57:16 Do you think it’s possible to move towards an almost theory of everything of disease

00:57:22 from a genetic perspective?

00:57:24 So if this unification continues, is it possible that, like, do you think in those terms, like

00:57:29 trying to arrive at a fundamental understanding of how disease emerges, period?

00:57:35 That unification is not just foreseeable, it’s inevitable.

00:57:41 I see it as inevitable.

00:57:43 We have to go there.

00:57:45 You cannot be a specialist anymore.

00:57:48 If you’re a genomicist, you have to be a specialist in every single disorder.

00:57:53 And the reason for that is that the fundamental understanding of the circuitry of the human

00:57:59 genome that you need to solve schizophrenia, that fundamental circuitry is hugely important

00:58:07 to solve Alzheimer’s.

00:58:09 And that same circuitry is hugely important to solve metabolic disorders.

00:58:13 And that same exact circuitry is hugely important for solving immune disorders and cancer and,

00:58:20 you know, every single disease.

00:58:22 So all of them have the same sub task.

00:58:26 And I teach dynamic programming in my class.

00:58:29 Dynamic programming is all about sort of not redoing the work.

00:58:34 It’s reusing the work that you do once.
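
The dynamic programming principle mentioned here (do the shared work once and reuse it) can be shown with a standard memoization example in Python. The Fibonacci function is a textbook illustration chosen for this sketch and is an assumption of the example, not something from the speaker's class.

```python
from functools import lru_cache

calls = 0  # counts how many times the function body actually runs


@lru_cache(maxsize=None)
def fib(n):
    """Fibonacci with memoization: each subproblem is solved only once."""
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
print("fib(30) =", result)
print("function bodies executed:", calls)
```

With the cache, each of the 31 subproblems (n = 0 through 30) is computed once. Without it, the same recursion would execute over two million calls, most of them recomputing answers it already had.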

00:58:37 So basically for us to say, oh, great, you know, you guys in the immune building go solve

00:58:42 the fundamental circuitry of everything.

00:58:44 And then you guys in the schizophrenia building go solve the fundamental circuitry of everything

00:58:47 separately, is crazy.

00:58:50 So what we need to do is come together and sort of have a circuitry group, the circuitry

00:58:56 building that sort of tries to solve the circuitry of everything.

00:58:59 And then the immune folks who will apply this knowledge to all of the disorders that are

00:59:05 associated with immune dysfunction and the schizophrenia folks will basically be interacting

00:59:12 with both the immune folks and with the neuronal folks.

00:59:15 And all of them will be interacting with the circuitry folks and so on and so forth.

00:59:19 So that’s sort of the current structure of my group, if you wish.

00:59:22 So basically what we’re doing is focusing on the fundamental circuitry.

00:59:27 But at the same time, we’re the users of our own tools by collaborating with many other

00:59:34 labs in every one of these disorders that we mentioned.

00:59:37 We basically have a heart focus on cardiovascular disease, coronary artery disease, heart failure

00:59:42 and so on and so forth.

00:59:44 We have an immune focus on several immune disorders.

00:59:48 We have a cancer focus on metastatic melanoma and immunotherapy response.

00:59:55 We have a psychiatric disease focus on schizophrenia, autism, PTSD, and other psychiatric disorders.

01:00:04 We have an Alzheimer’s and neurodegeneration focus on Huntington’s disease, ALS and, you

01:00:10 know, AD related disorders like frontotemporal dementia and Lewy body dementia.

01:00:14 And of course, a huge focus on Alzheimer’s.

01:00:16 We have a metabolic focus on the role of exercise and diets and sort of how they’re impacting

01:00:23 metabolic organs across the body and across many different tissues.

01:00:29 And all of them are interfacing with the circuitry.

01:00:34 And the reason for that is another computer science principle of eat your own dog food.

01:00:42 If everybody ate their own dog food, dog food would taste a lot better.

01:00:47 The reason why Microsoft Excel and Word and PowerPoint was so important and so successful

01:00:55 is because the employees that were working on them, were using them for their day to

01:01:00 day tasks.

01:01:01 You can’t just simply build a circuitry and say, here it is guys, take the circuitry,

01:01:06 we’re done without being the users of that circuitry because you then go back.

01:01:11 And because we span the whole spectrum from profiling the epigenome, using comparative

01:01:16 genomics, finding the important nucleotides in the genome, building the basic functional

01:01:21 map of what are the genes in the human genome, what are the gene regulatory elements of the

01:01:26 human genome.

01:01:27 I mean, over the years we’ve written a series of papers on how do you find human genes in

01:01:31 the first place using comparative genomics?

01:01:34 How do you find the motifs that are the building blocks of gene regulation using comparative

01:01:38 genomics?

01:01:39 And how do you then find how these motifs come together and act in specific tissues

01:01:44 using epigenomics?

01:01:46 How do you link regulators to enhancers and enhancers to their target genes using epigenomics

01:01:53 and regulatory genomics?

01:01:55 So through the years we’ve basically built all this infrastructure for understanding

01:02:00 what I like to say, every single nucleotide of the human genome and how it acts in every

01:02:06 one of the major cell types and tissues of the human body.

01:02:10 I mean, this is no small task.

01:02:12 This is an enormous task that takes the entire field.

01:02:15 And that’s something that my group has taken on along with many other groups.

01:02:20 And we have also, and that’s sort of the thing that sets my group perhaps apart, we have also

01:02:25 worked with specialists in every one of these disorders to basically further our understanding

01:02:30 all the way down to disease and in some cases collaborating with pharma to go all the way

01:02:35 down to therapeutics because of our deep, deep understanding of that basic circuitry

01:02:42 and how it allows us to now improve the circuitry.

01:02:47 Not just treat it as a black box, but basically go and say, okay, we need a better cell type

01:02:51 specific wiring than we now have at the tissue specific level.

01:02:56 So we’re focusing on that because we’re understanding the needs from the disease front.

01:03:01 So you have a sense of the entire pipeline, I mean, one, maybe you can indulge me.

01:03:08 One nice question to ask would be, how do you, from the scientific perspective, go from

01:03:14 knowing nothing about the disease to going, you said, to go into the entire pipeline and

01:03:22 actually have a drug or a treatment that cures that disease?

01:03:26 So that’s an enormously long path and an enormously great challenge.

01:03:32 And what I’m trying to argue is that it progresses in stages of understanding rather than one

01:03:39 gene at a time.

01:03:40 The traditional view of biology was you have one postdoc working on this gene and another

01:03:45 postdoc working on that gene, and they’ll just figure out everything about that gene

01:03:50 and that’s their job.

01:03:52 But we’ve realized how polygenic the diseases are, so we can’t have one postdoc per gene

01:03:57 anymore.

01:03:58 We now have to have these cross cutting needs.

01:04:04 And I’m going to describe the path to circuitry along those needs.

01:04:10 And every single one of these paths, we are now doing in parallel across thousands of

01:04:15 genes.

01:04:17 So the first step is you have a genetic association, and we talked a little bit about sort of the

01:04:23 Mendelian path and the polygenic path to that association.

01:04:27 So the Mendelian path was looking through families to basically find gene regions and

01:04:33 ultimately genes that are underlying particular disorders.

01:04:36 The polygenic path is basically looking at unrelated individuals in this giant matrix

01:04:43 of genotype by phenotype, and then finding hits where a particular variant impacts disease

01:04:49 all the way to the end.

01:04:51 And then we now have a connection, not between a gene and a disease, but between a genetic

01:04:57 region and a disease.

01:05:00 And that distinction is not understood by most people.

01:05:03 So I’m going to explain it a little bit more.

01:05:06 Why do we not have a connection between a gene and a disease, but we have a connection

01:05:11 between a genetic region and a disease?

01:05:13 The reason for that is that 93% of genetic variants that are associated with disease

01:05:21 don’t impact the protein at all.

01:05:27 So if you look at the human genome, there’s 20,000 genes, there’s 3.2 billion nucleotides.

01:05:33 Only 1.5% of the genome codes for proteins.

01:05:40 The other 98.5% does not code for proteins.

01:05:46 If you now look at where are the disease variants located, 93% of them fall in that outside

01:05:54 the genes portion.

01:05:55 Of course, genes are enriched, but they’re only enriched by a factor of three.

01:06:00 That means that still 93% of genetic variants fall outside the proteins.

01:06:06 Why is that difficult?

01:06:08 Why is that a problem?

01:06:09 The problem is that when a variant falls outside the gene, you don’t know what gene is impacted

01:06:15 by that variant.

01:06:16 You can’t just say, oh, it’s near this gene, let’s just connect that variant to the gene.

01:06:21 And the reason for that is that the genome circuitry is very often long range.

01:06:27 So you basically have that genetic variant that could sit in the intron of one gene.

01:06:34 An intron is sort of the place between the exons that code for proteins.

01:06:38 So proteins are split up into exons and introns and every exon codes for a particular subset

01:06:43 of amino acids and together they’re spliced together and then make the final protein.

01:06:49 So that genetic variant might be sitting in an intron of a gene.

01:06:51 It’s transcribed with the gene, it’s processed and then excised, but it might not impact

01:06:56 this gene at all.

01:06:57 It might actually impact another gene that’s a million nucleotides away.

01:07:01 So it’s just riding along even though it has nothing to do with this nearby neighborhood.

01:07:05 That’s exactly right.

01:07:06 Let me give you an example.

01:07:09 The strongest genetic association with obesity was discovered in this FTO gene, fat and obesity

01:07:16 associated gene.

01:07:18 So this FTO gene was studied ad nauseam.

01:07:23 People did tons of experiments on it.

01:07:26 They figured out that FTO is in fact RNA methylation transferase.

01:07:33 It basically impacts something that we call the epitranscriptome.

01:07:38 Just like the genome can be modified, the transcriptome, the transcript of the genes

01:07:43 can be modified.

01:07:44 And we basically said, oh great, that means that epitranscriptomics is hugely involved

01:07:49 in obesity because that gene FTO is clearly where the genetic locus is at.

01:07:56 My group studied FTO in collaboration with a wonderful team led by Melina Claussnitzer.

01:08:04 And what we found is that this FTO locus, even though it is associated with obesity,

01:08:11 does not implicate the FTO gene.

01:08:16 The genetic variant, it’s in the first intron of the FTO gene, but it controls two genes

01:08:22 IRX3 and IRX5 that are sitting 1.2 million nucleotides away, several genes away.

01:08:32 Oh boy.

01:08:33 What am I supposed to feel about that because isn’t that like super complicated then?

01:08:38 So the way that I was introduced at a conference a few years ago was, and here’s Manolis Kellis

01:08:43 who wrote the most depressing paper of 2015.

01:08:48 And the reason for that is that the entire pharmaceutical industry was so comfortable

01:08:52 that there was a single gene in that locus.

01:08:56 Because in some loci, you basically have three dozen genes that are all sitting in the same

01:08:59 region of association and you’re like, oh gosh, which ones of those is it?

01:09:04 But even that question of which ones of those is it is making the assumption that it is

01:09:08 one of those as opposed to some random gene just far, far away, which is what our paper

01:09:13 showed.

01:09:14 So basically what our paper showed is that you can’t ignore the circuitry.

01:09:19 You have to first figure out the circuitry, all of those long range interactions, how

01:09:23 every genetic variant impacts the expression of every gene in every tissue imaginable across

01:09:28 hundreds of individuals.

01:09:30 And then you now have one of the building blocks, not even all of the building blocks

01:09:35 for then going and understanding disease.

01:09:41 So embrace the wholeness of the circuitry.

01:09:44 Correct.

01:09:45 So back to the question of starting knowing nothing to the disease and going to the treatment.

01:09:51 So what are the next steps?

01:09:53 So you basically have to first figure out the tissue and then describe how you figure

01:09:57 out the tissue.

01:09:58 You figure out the tissue by taking all of these non coding variants that are sitting

01:10:01 outside proteins and then figuring out what are the epigenomic enrichments.

01:10:06 And the reason for that, you know, thankfully is that there is convergence, that the same

01:10:13 processes are impacted in different ways by different loci.

01:10:19 And that’s a saving grace for our field.

01:10:23 The fact that if I look at hundreds of genetic variants associated with Alzheimer’s, they

01:10:27 localize in a small number of processes.

01:10:31 Can you clarify why that’s hopeful?

01:10:34 So like they show up in the same exact way in the, in the specific set of processes.

01:10:40 Yeah.

01:10:41 So basically there’s a small number of biological processes that underlie, or at least that

01:10:45 play the biggest role in every disorder.

01:10:48 So in Alzheimer’s you basically have, you know, maybe 10 different types of processes.

01:10:54 One of them is lipid metabolism.

01:10:56 One of them is immune cell function.

01:10:58 One of them is neuronal energetics.

01:11:02 So these are just a small number of processes, but you have multiple lesions, multiple genetic

01:11:07 perturbations that are associated with those processes.

01:11:10 So if you look at schizophrenia, it’s excitatory neuron function, it’s inhibitory neuron function,

01:11:15 it’s synaptic pruning, it’s calcium signaling and so on and so forth.

01:11:18 So when you look at disease genetics, you have one hit here and one hit there and one

01:11:24 hit there and one hit there, completely different parts of the genome.

01:11:28 But it turns out all of those hits are calcium signaling proteins.

01:11:31 Oh, cool.

01:11:32 You’re like, aha.

01:11:34 That means that calcium signaling is important.

01:11:37 So those people who are focusing on one locus at a time cannot possibly see that picture.

01:11:42 You have to become a genomicist.

01:11:44 You have to look at the omics, the ome, the holistic picture to understand these enrichments.

01:11:51 But you mentioned the convergence thing.

01:11:54 The whatever the thing associated with the disease shows up.

01:11:58 So let me explain convergence.

01:12:00 Convergence is such a beautiful concept.

01:12:03 So you basically have these four genes that are converging on calcium signaling.

01:12:12 So that basically means that they are acting each in their own way, but together in the

01:12:18 same process.

01:12:19 But now in every one of these loci, you have many enhancers controlling each of those genes.

01:12:27 That’s another type of convergence where dysregulation of seven different enhancers might all converge

01:12:33 on dysregulation of that one gene, which then converges on calcium signaling.

01:12:39 And in each one of those enhancers, you might have multiple genetic variants distributed

01:12:44 across many different people.

01:12:46 Everyone has their own different mutation.

01:12:49 But all of these mutations are impacting that enhancer.

01:12:52 And all of these enhancers are impacting that gene.

01:12:55 And all of these genes are impacting this pathway.

01:12:57 And all these pathways are acting in the same tissue.

01:13:00 And all of these tissues are converging together on the same biological process of schizophrenia.

01:13:05 And you’re saying the saving grace is that that convergence seems to happen for a lot

01:13:09 of these diseases.

01:13:11 For all of them.

01:13:12 Basically that for every single disease that we’ve looked at, we have found an epigenomic

01:13:17 enrichment.

01:13:18 How do you do that?

01:13:19 You basically have all of the genetic variants associated with the disorder.

01:13:24 And then you’re asking for all of the enhancers active in a particular tissue.

01:13:28 For 540 disorders, we’ve basically found that indeed there is an enrichment.

01:13:33 That basically means that there is commonality.

01:13:37 And from the commonality, we can just get insights.
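
A minimal sketch of the enrichment idea described here, assuming a toy single-chromosome genome, non-overlapping enhancer intervals, and made-up positions. The function names (`overlaps`, `fold_enrichment`) and the tissue labels are hypothetical and are not the group's actual method; the sketch only compares how often disease variants land in a tissue's enhancers against how much of the genome those enhancers cover.

```python
def overlaps(position, intervals):
    """Return True if a position falls inside any half-open (start, end) interval."""
    return any(start <= position < end for start, end in intervals)


def fold_enrichment(variant_positions, enhancers, genome_length):
    """Observed fraction of variants inside enhancers, divided by the
    fraction of the genome those enhancers cover (assumes no overlaps)."""
    hits = sum(overlaps(p, enhancers) for p in variant_positions)
    observed = hits / len(variant_positions)
    covered = sum(end - start for start, end in enhancers)
    expected = covered / genome_length
    return observed / expected


# Toy data (made up): a 1,000-position genome, five variants, two tissues.
variants = [105, 130, 410, 420, 777]
enhancers_by_tissue = {
    "tissue_A": [(100, 150), (400, 450)],  # covers 100 positions
    "tissue_B": [(600, 650), (900, 950)],  # covers 100 positions
}
scores = {
    tissue: fold_enrichment(variants, enhancers, 1000)
    for tissue, enhancers in enhancers_by_tissue.items()
}
for tissue, score in scores.items():
    print(tissue, round(score, 2))
```

In this toy case four of the five variants fall in tissue_A enhancers and none in tissue_B, so tissue_A stands out as the candidate tissue of action. A real analysis would also need a significance test and a correction for testing many tissues.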

01:13:40 So to explain in mathematical terms, we’re basically building an empirical prior.

01:13:47 We’re using a Bayesian approach to basically say, great, all of these variants are equally

01:13:52 likely in a particular locus to be important.

01:13:57 So in a genetic locus, you basically have a dozen variants that are coinherited.

01:14:02 Because the way that inheritance works in the human genome is through all of these recombination

01:14:07 events during meiosis, you basically have, you know, you inherit maybe three, chromosome

01:14:16 three, for example, in your body is inherited from four different parts.

01:14:20 One part comes from your dad, another part comes from your mom, another part comes from

01:14:23 your dad, another part comes from your mom.

01:14:25 So basically, the way that it, sorry, from your mom’s mom.

01:14:30 So you basically have one copy that comes from your dad and one copy that comes from

01:14:33 your mom.

01:14:34 But that copy that you got from your mom is a mixture of her maternal and her paternal

01:14:39 chromosome.

01:14:41 And the copy that you got from your dad is a mixture of his maternal and his paternal

01:14:44 chromosome.

01:14:45 So these breakpoints that happen when chromosomes are lining up are basically ensuring through

01:14:53 these crossover events, they’re ensuring that every child cell during the process of meiosis,

01:15:02 where you basically have, you know, one spermatozoid that basically couples with one ovule to basically

01:15:08 create one egg to basically create the zygote.

01:15:12 You basically have half of your genome that comes from dad and half your genome that comes

01:15:16 from mom.

01:15:17 But in order to line them up, you basically have these crossover events.

01:15:21 These crossover events are basically leading to coinheritance of that entire block coming

01:15:27 from your maternal grandmother and that entire block coming from your maternal grandfather.

01:15:33 Over many generations, these crossover events don’t happen randomly.

01:15:38 There’s a protein called PRDM9 that basically guides the double stranded breaks and then

01:15:45 leads to these crossovers.

01:15:48 And that protein has a particular preference to only a small number of hotspots of recombination,

01:15:54 which then lead to a small number of breaks between these coinheritance patterns.

01:15:59 So even though there are 6 million variants, there are 6 million loci, this variation is

01:16:06 inherited in blocks and every one of these blocks has like two dozen genetic variants

01:16:12 that are all associated.

01:16:13 So in the case of FTO, it wasn’t just one variant, it was 89 common variants that were

01:16:19 all humongously associated with obesity.

01:16:24 Which one of those is the important one?

01:16:26 Well, if you look at only one locus, you have no idea.

01:16:29 But if you look at many loci, you basically say, aha, all of them are enriching in the

01:16:36 same epigenomic map.

01:16:40 In that particular case, it was mesenchymal stem cells.

01:16:44 So these are the progenitor cells that give rise to your brown fat and your white fat.

01:16:50 Progenitor is like the early on developmental stem cells?

01:16:54 So you start from one zygote and that’s a totipotent cell type.

01:16:58 It can do anything.

01:17:00 You then, you know, that cell divides, divides, divides, and then every cell division is leading

01:17:08 to specialization where you now have a mesodermal lineage and ectodermal lineage and endodermal

01:17:14 lineage that basically leads to different parts of your body.

01:17:19 The ectoderm will basically give rise to your skin, ecto means outside, derm is skin.

01:17:25 So ectoderm, but it also gives rise to your neurons and your whole brain.

01:17:29 So that’s a lot of ectoderm.

01:17:31 Mesoderm gives rise to your internal organs, including the vasculature and you know, your

01:17:36 muscle and stuff like that.

01:17:38 So you basically have this progressive differentiation and then if you look further, further down

01:17:45 that lineage, you basically have one lineage that will give rise to both your muscle and

01:17:49 your bone, but also your fat.

01:17:52 And if you go further down the lineage of your fat, you basically have your white fat

01:17:57 cells.

01:17:59 These are the cells that store energy.

01:18:01 So when you eat a lot, but you don’t exercise too much, there’s an excess set of calories,

01:18:06 excess energy.

01:18:07 What do you do with those?

01:18:08 You basically create, you spend a lot of that energy to create these high energy molecules,

01:18:13 lipids, which you can then burn when you need them on a rainy day.

01:18:19 So that leads to obesity if you don’t exercise and if you overeat because your body’s like,

01:18:26 oh great, I have all these calories.

01:18:27 I’m going to store them.

01:18:28 Ooh, more calories.

01:18:29 I’m going to store them too.

01:18:30 Ooh, more calories.

01:18:31 So basically the 42% of European chromosomes have a predisposition to storing fat, which

01:18:40 was selected probably in the food scarcity periods, like basically as we were exiting

01:18:48 Africa before and during the ice ages, there was probably a selection to those individuals

01:18:54 who made it North to basically be able to store energy, a lot more energy.

01:19:00 So you basically now have this lineage that is deciding whether you want to store energy

01:19:07 in your white fat or burn energy in your beige fat.

01:19:11 It turns out that your fat is, you know, like we have such a bad view of fat.

01:19:18 Fat is your best friend.

01:19:20 Fat can both store all these excess lipids that would be otherwise circulating through

01:19:24 your body and causing damage, but it can also burn calories directly.

01:19:29 If you have too much energy, you can just choose to just burn some of that as heat.

01:19:35 So basically when you’re cold, you’re burning energy to basically warm your body up and

01:19:41 you’re burning all these lipids and you’re burning all these calories.

01:19:44 So what we basically found is that across the board, genetic variants associated with

01:19:50 obesity across many of these regions were all enriched repeatedly in mesenchymal stem

01:19:56 cell enhancers.

01:19:58 So that gave us a hint as to which of these genetic variants was likely driving this whole

01:20:05 association.

01:20:06 And we ended up with this one genetic variant called rs1421085.

01:20:14 And that genetic variant out of the 89 was the one that we predicted to be causal for

01:20:20 the disease.
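
One way to picture the "empirical prior" step, as a hedged sketch rather than the published procedure: start with every coinherited variant in the block equally likely, then upweight the variants that sit in an enhancer of the enriched tissue. The function name `reweight_variants`, the variant labels, and the enrichment value are all hypothetical placeholders.

```python
def reweight_variants(variants, in_enriched_enhancer, enrichment):
    """Start from equal weights for every variant in the block, multiply the
    weight of variants inside an enriched-tissue enhancer by the enrichment,
    then normalize so the weights sum to 1."""
    weights = {
        v: (enrichment if in_enriched_enhancer[v] else 1.0) for v in variants
    }
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Toy block of four coinherited variants (labels are placeholders).
variants = ["variant_1", "variant_2", "variant_3", "variant_4"]
in_enhancer = {
    "variant_1": False,
    "variant_2": True,   # only this one overlaps an enriched enhancer
    "variant_3": False,
    "variant_4": False,
}
probs = reweight_variants(variants, in_enhancer, enrichment=8.0)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

With equal starting weights, genetics alone cannot separate the four variants; the enhancer annotation is what moves most of the probability onto one of them.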

01:20:21 Wow.

01:20:22 So going back to those steps, first step is figure out the relevant tissue based on the

01:20:26 global enrichment.

01:20:27 Second step is figure out the causal variant among many variants in this linkage disequilibrium

01:20:34 in this coinherited block between these recombination hotspots, these boundaries of these inherited

01:20:41 blocks.

01:20:42 That’s the second step.

01:20:43 The third step is once you know that causal variant, try to figure out what is the motif

01:20:49 that is disrupted by that causal variant.

01:20:52 Basically how does it act?

01:20:54 Variants don’t just disrupt elements, they disrupt the binding of specific regulators.

01:20:59 So basically the third step there was how do you find the motif that is responsible

01:21:04 like the gene regulatory word, the building block of gene regulation that is responsible

01:21:10 for that dysregulatory event.

01:21:12 And the fourth step is finding out what regulator normally binds that motif and is now no longer

01:21:18 able to bind.

01:21:19 And then once you have the regulator, can you then try to figure out how to, what after

01:21:24 it developed, how to fix it?

01:21:27 That’s exactly right.

01:21:28 You now know how to intervene.

01:21:30 You have basically a regulator, you have a gene that you can then perturb and you say,

01:21:34 well, maybe that regulator has a global role in obesity.

01:21:38 I can perturb the regulator.

01:21:40 Just to clarify, when we say perturb, like on the scale of a human life, can a human

01:21:46 being be helped?

01:21:49 Of course.

01:21:50 Yeah.

01:21:51 I guess understanding is the first step.

01:21:52 No, no, but perturbed basically means you now develop therapeutics, pharmaceutical therapeutics

01:21:57 against that.

01:21:59 Or you develop other types of intervention that affect the expression of that gene.

01:22:03 What do pharmaceutical therapeutics look like when your understanding is on a genetic level?

01:22:11 Yeah.

01:22:12 Sorry if it’s a dumb question.

01:22:13 No, no, no.

01:22:14 It’s a brilliant question, but I want to save it for a little bit later when we start talking

01:22:16 about therapeutics.

01:22:17 Perfect.

01:22:18 So let’s talk about the first four steps.

01:22:20 There’s two more.

01:22:21 So basically the first step is figure out, I mean, the zero step, the starting point

01:22:25 is the genetics.

01:22:26 The first step after that is figure out the tissue of action.

01:22:31 The second step is figuring out the nucleotide that is responsible or set of nucleotides.

01:22:36 The third step is figuring out the motif and the upstream regulator, number four.

01:22:40 Number five and six is what are the targets?

01:22:44 So number five is great.

01:22:45 Now I know the regulator.

01:22:47 I know the motif.

01:22:48 I know the tissue and I know the variant.

01:22:51 What does it actually do?

01:22:53 So you have to now trace it to the biological process and the genes that mediate that biological

01:22:59 process.

01:23:00 So knowing all of this can now allow you to find the target genes.

01:23:05 How?

01:23:06 By basically doing perturbation experiments or by looking at the folding of the epigenome

01:23:13 or by looking at the genetic impact of that genetic variant on the expression of genes.

01:23:19 And we use all three.

01:23:21 So let me go through them.

01:23:22 Basically one of them is physical links.

01:23:26 This is the folding of the genome onto itself.

01:23:29 How do you even figure out the folding?

01:23:32 It’s a little bit of a tangent, but it’s a super awesome technology.

01:23:36 Think of the genome as again, this massive packaging that we talked about of taking two

01:23:41 meters worth of DNA and putting it in something that’s a million times smaller than two meters

01:23:48 worth of DNA.

01:23:49 That’s a single cell.

01:23:51 You basically have this massive packaging and this packaging basically leads to the

01:23:56 chromosome being wrapped around in sort of tight, tight ways in ways, however, that are

01:24:02 functionally capable of being reopened and reclosed.

01:24:07 So I can then go in and figure out that folding by sort of chopping up the spaghetti soup,

01:24:15 putting glue and ligating the segments that were chopped up but nearby each other, and

01:24:21 then sequencing through these ligation events to figure out that this region of this chromosome,

01:24:26 that region of the chromosome were near each other.

01:24:28 That means they were interacting even though they were far away on the genome itself.

01:24:33 So that chopping up, sequencing and reglueing is basically giving you folds of the genome

01:24:42 that we call.

01:24:43 Sorry, can you backtrack?

01:24:44 Of course.

01:24:45 How does cutting it help you figure out which ones were close in the original folding?

01:24:50 So you have a bowl of noodles.

01:24:53 Go on.

01:24:54 And in that bowl of noodles, some noodles are near each other.

01:24:59 Yes.

01:25:00 So you throw in a bunch of glue, you basically freeze the noodles in place, throw in a cutter

01:25:06 that chops up the noodles into little pieces.

01:25:10 Now throw in some ligation enzyme that lets those pieces that were free religate near

01:25:18 each other.

01:25:19 In some cases, they religate what you had just cut, but that’s very rare.

01:25:24 Most of the time they will religate in whatever was proximal.

01:25:30 You now have glued the red noodle that was crossing the blue noodle to each other.

01:25:36 You then reverse the glue, the glue goes away and you just sequence the heck out of it.

01:25:43 Most of the time you’ll find red segment with, you know, red segment, but you can specifically

01:25:48 select for ligation events that have happened that were not from the same segment by sort

01:25:52 of marking them in a particular way and then selecting those and then you sequence and

01:25:57 you look for red with blue matches of sort of things that were glued that were not immediately

01:26:03 proximal to each other.

01:26:05 And that reveals the linking of the blue noodle and the red noodle.
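
The noodle experiment can be sketched in code, assuming each sequenced ligation event has already been reduced to a pair of positions on one chromosome. The function name `contact_counts`, the bin size, and the positions below are made up for illustration; real pipelines for this kind of data do much more (mapping, filtering, normalization).

```python
from collections import Counter


def contact_counts(ligation_pairs, bin_size):
    """Count how often two genomic bins were glued together.
    Pairs that fall in the same bin are skipped, mirroring the
    'red with red' ligations that carry no folding information."""
    counts = Counter()
    for pos_a, pos_b in ligation_pairs:
        bin_a, bin_b = pos_a // bin_size, pos_b // bin_size
        if bin_a == bin_b:
            continue
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts


# Toy ligation events (made-up coordinates on a single chromosome).
pairs = [
    (1_200, 1_201_500),
    (1_800, 1_202_000),
    (5_000, 5_600),        # same bin: ignored
    (1_500, 1_203_000),
    (300_000, 310_000),
]
counts = contact_counts(pairs, bin_size=10_000)
for (bin_a, bin_b), n in counts.most_common():
    print(f"bins {bin_a} and {bin_b}: {n} ligations")
```

Here bins 0 and 120 are about 1.2 million positions apart on the linear genome yet are glued together three times, which is the kind of signal that suggests a long-range physical contact.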

01:26:08 You’re with me so far?

01:26:09 Yeah.

01:26:10 Good.

01:26:11 So we’ve done these experiments.

01:26:12 That’s the physical.

01:26:13 That’s the physical.

01:26:14 That’s step one of the physical.

01:26:15 And what the physical revealed is topologically associated domains, basically big blocks of

01:26:20 the genome that are topologically connected together.

01:26:25 That’s the physical.

01:26:26 The second one is the genetic links.

01:26:30 It basically says across individuals that have different genetic variants, how are their

01:26:37 genes expressed differently?

01:26:39 Remember before I was saying that the path between genetics and disease is enormous,

01:26:43 but we can break it up to look at the path between genetics and gene expression.

01:26:47 So instead of using Alzheimer’s as a phenotype, I can now use expression of IRX3 as the phenotype,

01:26:54 expression of gene A. And I can look at all of the humans who contain a G at that location

01:27:01 and all the humans that contain a T at that location and basically say, wow, it turns

01:27:05 out that the expression of each gene is higher for the T humans than for the G humans at

01:27:09 that location.

01:27:10 So that basically gives me a genetic link between a genetic variant, a locus, a region,

01:27:16 and the expression of nearby genes.
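
A simplified sketch of the genetic link, using gene expression as the phenotype. It assumes one allele label per person and invented expression values; a real analysis would count zero, one, or two copies of the allele per diploid individual and fit a regression with covariates. The function name `mean_expression_by_allele` is hypothetical.

```python
from statistics import mean


def mean_expression_by_allele(genotypes, expression):
    """Group people by the allele they carry at one location and
    return the mean expression of a gene within each group."""
    groups = {}
    for person, allele in genotypes.items():
        groups.setdefault(allele, []).append(expression[person])
    return {allele: mean(values) for allele, values in groups.items()}


# Toy data (made up): allele at one variant, expression of one nearby gene.
genotypes = {"p1": "T", "p2": "T", "p3": "G", "p4": "G"}
expression = {"p1": 9.0, "p2": 11.0, "p3": 4.0, "p4": 6.0}

by_allele = mean_expression_by_allele(genotypes, expression)
print(by_allele)
```

If the T group consistently shows higher expression than the G group across many people, the variant and the gene are genetically linked, whatever the distance between them.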

01:27:19 Good on the genetic link?

01:27:20 I think so.

01:27:21 Awesome.

01:27:22 The third genetic link is the activity link.

01:27:25 What’s an activity link?

01:27:26 It basically says if I look across 833 different epigenomes, whenever this enhancer is active,

01:27:34 this gene is active.

01:27:36 That gives me an activity link between this region of the DNA and that gene.
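
The activity link is a correlation across samples, which can be sketched as follows. The six "epigenomes", the activity calls, and the two genes are invented for illustration, and plain Pearson correlation is an assumption here, not necessarily the statistic used in the actual work.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy data (made up): one enhancer and two genes measured in six samples.
enhancer_active = [0, 1, 1, 0, 1, 0]          # 1 = enhancer active
gene_a = [1.0, 8.0, 9.0, 2.0, 7.0, 1.5]       # tracks the enhancer
gene_b = [5.0, 5.5, 4.5, 5.2, 4.8, 5.1]       # does not

link_a = pearson(enhancer_active, gene_a)
link_b = pearson(enhancer_active, gene_b)
print(round(link_a, 2), round(link_b, 2))
```

Gene A rises and falls with the enhancer while gene B does not, so only gene A gets an activity link. As the conversation notes shortly after, this kind of link is correlational, not causal.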

01:27:42 And then the fourth one is perturbations where I can go in and blow up that region and see

01:27:47 what are the genes that change in expression, or I can go in and over activate that region

01:27:51 and see what genes change in expression.

01:27:55 So I guess that’s similar to activity?

01:27:57 Yeah.

01:27:58 Yeah.

01:27:59 So that’s basically similar to activity.

01:28:00 I agree, but it’s causal rather than correlational.

01:28:02 Again, I’m a little weird.

01:28:04 No, no, you’re 100% on.

01:28:07 It’s exactly the same as the perturbation where I go in and intervene.

01:28:11 I basically take a bunch of cells.

01:28:13 So you know CRISPR, right?

01:28:16 CRISPR is this genome guidance and cutting mechanism.

01:28:21 That’s what George Church likes to call genome vandalism.

01:28:24 So you basically are able to, you can basically take a guide RNA that you put into the CRISPR

01:28:32 system, and the CRISPR system will basically use this guide RNA, scan the genome, find

01:28:38 wherever there’s a match, and then cut the genome.
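
The scan-match-cut behavior can be sketched as a string search. Several details below are assumptions not stated in the conversation: the guide is written in DNA letters, only the forward strand is searched, matches must be exact, and a match counts only if followed by an NGG motif (the PAM required by the commonly used Cas9 from Streptococcus pyogenes), with the cut placed three bases before that motif. The function name `find_cut_sites` and the sequences are made up.

```python
def find_cut_sites(genome, guide, pam_suffix="GG"):
    """Return indices where the genome would be cut: every exact guide match
    that is immediately followed by a 3-base motif ending in `pam_suffix`.
    The reported index is 3 bases before the start of that motif."""
    sites = []
    start = genome.find(guide)
    while start != -1:
        pam_start = start + len(guide)
        pam = genome[pam_start:pam_start + 3]
        if len(pam) == 3 and pam.endswith(pam_suffix):
            sites.append(pam_start - 3)
        start = genome.find(guide, start + 1)
    return sites


# Toy sequences (made up): the guide appears twice, but only the first
# occurrence is followed by an NGG motif.
guide = "ACGTTGCAATCGGATCCTAG"
genome = "CCCC" + guide + "TGG" + "AAAA" + guide + "TTT"

sites = find_cut_sites(genome, guide)
print(sites)
```

Changing the target only requires changing the `guide` string, which is the point made later in the conversation about why the guiding step is what makes CRISPR so convenient.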

01:28:42 So I digress, but it’s a bacterial immune defense system.

01:28:48 So basically bacteria are constantly attacked by viruses, but sometimes they win against

01:28:54 the viruses and they chop up these viruses.

01:28:56 And remember as a trophy inside their genome, they have these loci, these CRISPR loci that

01:29:02 basically stands for clustered repeats, interspersed, et cetera.

01:29:06 So basically it’s an interspersed repeats structure where basically you have a set of

01:29:11 repetitive regions and then interspersed were these variable segments that were basically

01:29:17 matching viruses.

01:29:19 So when this was first discovered, it was basically hypothesized that this is probably

01:29:24 a bacterial immune system that remembers the trophies of the viruses that it managed to kill.

01:29:30 And then the bacteria pass on, you know, they sort of do lateral transfer of DNA and they

01:29:34 pass on these memories so that the next bacterium says, Ooh, you killed that guy.

01:29:39 When that guy shows up again, I will recognize him.

01:29:41 And the CRISPR system was basically evolved as a bacterial adaptive immune response to

01:29:47 sense foreigners that should not belong and to just go and cut their genome.

01:29:52 So it’s an RNA guided RNA cutting enzyme or an RNA guided DNA cutting enzyme.

01:30:00 So there’s different systems.

01:30:02 Some of them cut DNA, some of them cut RNA, but all of them remember this sort of viral

01:30:08 attack.

01:30:10 So what we have done now as a field is, you know, through the work of, you know, Jennifer

01:30:15 Doudna, Emmanuelle Charpentier, Feng Zhang and many others is coopted that system of bacterial

01:30:23 immune defense as a way to cut genomes.

01:30:26 You basically have this guiding system that allows you to use an RNA guide to bring enzymes

01:30:35 to cut DNA at a particular locus.

01:30:37 That’s so fascinating.

01:30:39 So this is like already a natural mechanism, a natural tool for cutting that’s useful in a

01:30:45 particular context.

01:30:46 And we’re like, well, we can use that thing to actually, it’s a nice tool that’s already

01:30:51 in the body.

01:30:52 Yeah.

01:30:53 Yeah.

01:30:54 It’s not in our body.

01:30:55 It’s in the bacterial body.

01:30:56 It was discovered by the yogurt industry.

01:30:59 They were trying to make better yogurts and they were trying to make their bacteria in

01:31:03 their yogurt cultures more resilient to viruses.

01:31:08 And they were studying bacteria and they found that, wow, this CRISPR system is awesome.

01:31:12 It allows you to defend against that.

01:31:14 And then it was coopted in mammalian systems that don’t use anything like that as a targeting

01:31:20 way to basically bring these DNA cutting enzymes to any locus in the genome.

01:31:25 Why would you want to cut DNA to do anything?

01:31:29 The reason is that our DNA has a DNA repair mechanism where if a region of the genome

01:31:35 gets randomly cut, you will basically scan the genome for anything that matches and sort

01:31:40 of use it by homology.

01:31:43 So the reason why we’re diploid is because we now have a spare copy.

01:31:47 As soon as my mom’s copy is deactivated, I can use my dad’s copy.

01:31:50 And somewhere else, if my dad’s copy is deactivated, I can use my mom’s copy to repair it.

01:31:55 So this is called homology based repair.

01:31:59 So all you have to do is the cutting and you don’t have to do the fixing.

01:32:04 That’s exactly right.

01:32:05 You don’t have to do the fixing.

01:32:06 Because it’s already built in.

01:32:07 That’s exactly right.

01:32:08 But the fixing can be coopted by throwing in a bunch of homologous segments that instead

01:32:14 of having your dad’s version, have whatever other version you’d like to use.

01:32:19 So you then control the fixing by throwing in a bunch of other stuff.

01:32:24 That’s exactly right.

01:32:25 And that’s how you do genome editing.

01:32:26 So that’s what CRISPR is.

01:32:27 That’s what CRISPR is.

01:32:28 In popular culture, people use the term.

01:32:30 I’ve never, wow, that’s brilliant.

01:32:32 So CRISPR is genome vandalism followed by a bunch of band aids that have the sequence

01:32:39 that you’d like.

01:32:40 And you could control the choices of band aids.

01:32:43 Correct.

01:32:44 And of course there’s new generations of CRISPR.

01:32:46 There’s something that’s called prime editing that was sort of very, very much in the press

01:32:50 recently that basically instead of sort of making a double stranded break, which again

01:32:55 is genome vandalism, you basically make a single stranded break.

01:33:00 You basically just nick one of the two strands, enabling you to sort of peel off without sort

01:33:06 of completely breaking it up and then repair it locally using a guide that is coupled to

01:33:13 your initial RNA that took you to that location.

01:33:18 Dumb question, but is CRISPR as awesome and cool as it sounds?

01:33:24 I mean, technically speaking, in terms of like as a tool for manipulating our genetics

01:33:31 in the positive meaning of the word manipulating, or is there downsides, drawbacks in this whole

01:33:39 context of therapeutics that we’re talking about or understanding and so on?

01:33:42 So when I teach my students about CRISPR, I show them articles with the headline, genome

01:33:50 editing tool revolutionizes biology.

01:33:53 And then I show them the date of these articles and they’re 2004, like five years before CRISPR

01:33:58 was invented.

01:33:59 And the reason is that they’re not talking about CRISPR.

01:34:02 They’re talking about zinc finger enzymes that are another way to bring these cutters

01:34:07 to the genome.

01:34:09 It’s a very difficult way of sort of designing the right set of zinc finger proteins, the

01:34:13 right set of amino acids that will now target a particular long stretch of DNA because for

01:34:20 every location that you want to target, you need to design a particular regulator, a particular

01:34:25 protein that will match that region well.

01:34:28 There’s another technology called TALENs, which are basically just a different way of

01:34:35 using proteins to sort of guide these cutters to a particular location of the genome.

01:34:41 These require a massive team of engineers, of biological engineers to basically design

01:34:46 a set of amino acids that will target a particular sequence of your genome.

01:34:51 The reason why CRISPR is amazingly, awesomely revolutionary is because instead of having

01:34:57 this team of engineers design a new set of proteins for every locus that you want to

01:35:02 target, you just type it in your computer and you just synthesize an RNA guide.

01:35:07 The beauty of CRISPR is not the cutting, it’s not the fixing.

01:35:11 All of that was there before.

01:35:12 It’s the guiding, and the only thing that changes is that it makes the guiding easier

01:35:17 by sort of just typing in the RNA sequence, which then allows the system to sort of scan

01:35:23 the DNA to find that.
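
A minimal sketch of the "just type it in your computer" step, assuming the commonly used Cas9 requirement that a target be followed by an NGG motif (the PAM). That requirement and the 20-nucleotide guide length are not stated in the conversation. The function names are hypothetical and only one strand is scanned.

```python
def find_guide_sites(dna, guide_len=20):
    """Scan one DNA strand for candidate Cas9 target sites.

    A site is a guide_len-nucleotide stretch immediately followed by an
    NGG motif (the PAM assumed here). Returns (start, target, pam) tuples.
    """
    dna = dna.upper()
    sites = []
    for start in range(len(dna) - guide_len - 3 + 1):
        target = dna[start:start + guide_len]
        pam = dna[start + guide_len:start + guide_len + 3]
        if pam[0] in "ACGT" and pam[1:] == "GG":
            sites.append((start, target, pam))
    return sites


def to_guide_rna(target):
    """Spell the targeting part of the guide in RNA letters (T becomes U)."""
    return target.replace("T", "U")
```

Designing a guide for a new locus is then a string operation, which is the contrast drawn above with engineering a new protein for every target.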

01:35:25 So the coding, the engineering of the cutter is easier in terms of SP.

01:35:32 That’s kind of similar to the story of deep learning versus old school machine learning.

01:35:37 Some of the challenging parts are automated.

01:35:41 But CRISPR is just one cutting technology, and then that’s part of the challenges and

01:35:47 exciting opportunities of the field is to design different cutting technologies.

01:35:53 So now this was a big parenthesis on CRISPR, but now when we were talking about perturbations,

01:36:00 you basically now have the ability to not just look at correlation between enhancers

01:36:04 and genes, but actually go and either destroy that enhancer and see if the gene changes

01:36:10 in expression, or you can use the CRISPR targeting system to bring in not vandalism and cutting,

01:36:20 but you can couple the CRISPR system with, and the CRISPR system is called usually CRISPR

01:36:26 Cas9 because Cas9 is the protein that will then come and cut.

01:36:30 But there’s a version of that protein called dead Cas9 where the cutting part is deactivated.

01:36:36 So you basically use the dead Cas9 to bring in an activator or to bring in a repressor.

01:36:45 So you can now ask, is this enhancer changing that gene by taking this modified CRISPR,

01:36:51 which is already modified from the bacteria to be used in humans, that you can now modify

01:36:55 the Cas9 to be dead Cas9, and you can now further modify to bring in a regulator, and

01:37:01 you can basically turn on or turn off that enhancer and then see what is the impact on

01:37:05 that gene.

01:37:06 So these are the four ways of linking the locus to the target gene, and that’s step

01:37:11 number five.

01:37:14 Step number five is find the target gene, and step number six is what the heck does

01:37:17 that gene do?

01:37:19 You basically now go and manipulate that gene to basically see what are the processes that

01:37:25 change, and you can basically ask, well, in this particular case, in the FTO locus, we

01:37:32 found mesenchymal stem cells that are the progenitors of white fat and brown fat or

01:37:38 beige fat.

01:37:39 We found the rs1421085 nucleotide variant as the causal variant.

01:37:44 We found this large enhancer, this master regulator.

01:37:49 I like to call it OB1 for obesity one, like the strongest enhancer associated with it,

01:37:55 and OB1 was kind of chubby as the actor.

01:37:57 I don’t know if you remember him.

01:38:01 So you basically are using this Jedi mind trick to basically find out the location of

01:38:07 the genome that is responsible, the enhancer that harbors it, the motif, the upstream regulator,

01:38:14 which is ARID5B for AT rich interacting domain 5B.

01:38:18 That’s a protein that sort of comes and binds normally.

01:38:21 That protein is normally a repressor.

01:38:23 It represses this super enhancer, this massive 12,000 nucleotide master regulatory control

01:38:28 gene, and it turns off IRX3, which is a gene that’s 600,000 nucleotides away, and IRX5,

01:38:36 which is 1.2 million nucleotides away.

01:38:38 So those things.

01:38:39 And what’s the effect of turning them off?

01:38:40 That’s exactly the next question.

01:38:42 So step six is what do these genes actually do?

01:38:45 So we then ask, what do IRX3 and IRX5 do?

01:38:48 The first thing we did is look across individuals for individuals that had higher expression

01:38:52 of IRX3 or lower expression of IRX3.

01:38:55 And then we looked at the expression of all of the other genes in the genome.

01:38:58 And we looked for simply correlation.

01:39:01 And we found that IRX3 and IRX5 were both correlated positively with lipid metabolism and negatively

01:39:09 with mitochondrial biogenesis.
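
The correlation step described above can be sketched as follows, assuming expression is stored as one list of per-individual values per gene. The gene labels other than IRX3 and all numbers are made up for illustration. This is not the analysis code from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_correlated_genes(expression, target):
    """Correlate every other gene with `target` across individuals.

    expression: dict mapping gene name -> list of per-individual levels.
    Returns (gene, correlation) pairs, most positive first.
    """
    reference = expression[target]
    scores = {
        gene: pearson(reference, levels)
        for gene, levels in expression.items()
        if gene != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Genes at the top of the ranking rise together with the target and genes at the bottom fall as it rises. That is the positive and negative pattern reported here for lipid metabolism and mitochondrial biogenesis.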

01:39:11 You’re like, what the heck does that mean?

01:39:16 Does this sound related to obesity?

01:39:18 Not at all superficially, but lipid metabolism should, because lipids are these high-energy

01:39:25 molecules that basically store fat.

01:39:28 So IRX3 and IRX5 are positively correlated with lipid metabolism.

01:39:33 So that basically means that when they turn on, they turn

01:39:39 on lipid metabolism.

01:39:41 And they’re negatively correlated with mitochondrial biogenesis.

01:39:45 What do mitochondria do in this whole process?

01:39:49 Again, small parenthesis, what are mitochondria?

01:39:53 Mitochondria are little organelles.

01:39:56 They arose, they only are found in eukaryotes.

01:40:01 Eu means good, karyote means nucleus.

01:40:04 So truly like a true nucleus.

01:40:05 So eukaryotes have a nucleus.

01:40:07 Prokaryotes are before the nucleus.

01:40:09 They don’t have a nucleus.

01:40:11 So eukaryotes have a nucleus, compartmentalization.

01:40:16 Eukaryotes have also organelles.

01:40:19 Some eukaryotes have chloroplasts.

01:40:22 These are the plants, they photosynthesize.

01:40:26 Some other eukaryotes like us have another type of organelle called mitochondria.

01:40:33 These arose from an ancient species that we engulfed.

01:40:40 This is an endosymbiosis event.

01:40:44 Symbiosis: bio means life, sym means together.

01:40:47 So symbiotes are things that live together.

01:40:50 Endo means inside, so endosymbiosis means you live together holding the other

01:40:54 one inside you.

01:40:56 So the pre eukaryotes engulfed an organism that was very good at energy production and

01:41:07 that organism eventually shed most of its genome to now have only 13 genes in the mitochondrial

01:41:14 genome and those 13 genes are all involved in energy production, the electron transport

01:41:22 chain.

01:41:23 So basically electrons are these massive super energy rich molecules.

01:41:28 We basically have these organelles that produce energy and when your muscle exercises, you

01:41:35 basically multiply your mitochondria.

01:41:37 You basically sort of, you know, use more and more mitochondria and that’s how you get

01:41:42 beefed up.

01:41:43 So basically the muscle sort of learns how to generate more energy.

01:41:47 So basically every single time your muscles will, you know, overnight regenerate and sort

01:41:51 of become stronger and amplify their mitochondria and so forth.

01:41:55 So what does mitochondria do?

01:41:56 The mitochondria use energy to sort of do any kind of task.

01:42:02 When you’re thinking, you’re using energy.

01:42:05 This energy comes from mitochondria.

01:42:06 Your neurons have mitochondria all over the place.

01:42:10 Basically these mitochondria can multiply as organelles and they can be spread along the

01:42:13 body of your muscle.

01:42:15 Some of your muscle cells have actually multiple nuclei, they’re polynucleated, but they also

01:42:18 have multiple mitochondria to basically deal with the fact that your muscle is enormous.

01:42:24 You can sort of span these super, super long length and you need energy throughout the

01:42:28 length of your muscle.

01:42:29 So that’s why you have mitochondria throughout the length and you also need transcription

01:42:32 through the length so you have multiple nuclei as well.

01:42:35 So these two processes, lipids store energy, what do mitochondria do?

01:42:42 So there’s a process known as thermogenesis.

01:42:46 Thermo means heat, genesis means generation.

01:42:48 Thermogenesis is the generation of heat.

01:42:50 Remember that bathtub with the in and out?

01:42:55 That’s the equation that everybody’s focused on.

01:42:57 So how much energy do you consume?

01:42:58 How much energy do you burn?

01:43:01 But in every thermodynamic system, there’s three parts to the equation.

01:43:06 There’s energy in, energy out, and energy lost.

01:43:10 Any machine has loss of energy.

01:43:14 How do you lose energy?

01:43:15 You emanate heat.

01:43:17 So heat is energy loss.

01:43:20 So there’s…

01:43:24 Which is where the thermogenesis comes in.

01:43:26 Thermogenesis is actually a regulatory process that modulates the third component of the

01:43:32 thermodynamic equation.

01:43:34 You can basically control thermogenesis explicitly.

01:43:37 You can turn on and turn off thermogenesis.
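
The three-part balance described above (energy in, energy out, energy lost as heat) can be written as a toy model. The calorie figures are arbitrary placeholders and the on/off switch is a deliberate simplification. It is a sketch of the bookkeeping, not a physiological model.

```python
def stored_energy(intake, expenditure, heat_loss):
    """Energy left over for storage after work and heat are subtracted."""
    return intake - expenditure - heat_loss


def simulate(days, intake, expenditure, thermogenesis_on, heat_per_day=200):
    """Accumulate stored energy over several days.

    heat_per_day is a hypothetical amount dissipated only when
    thermogenesis is switched on.
    """
    stored = 0
    for _ in range(days):
        heat = heat_per_day if thermogenesis_on else 0
        stored += stored_energy(intake, expenditure, heat)
    return stored
```

With identical intake and expenditure, the only difference between the two runs is the third term, the one thermogenesis controls.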

01:43:39 And that’s where the mitochondria comes into play.

01:43:41 Exactly.

01:43:42 So IRX3 and IRX5 turn out to be the master regulators of a process of thermogenesis versus

01:43:49 lipogenesis, the generation of fat.

01:43:52 So IRX3 and IRX5 in most people burn heat, burn calories as heat.

01:43:58 So when you eat too much, just burn it off in your fat cells.

01:44:02 So that bathtub has basically a sort of dissipation knob that most people are able to turn on.

01:44:11 I am unable to turn that on because I am a homozygous carrier for the mutation that changes

01:44:17 a T into a C in the rs1421085 allele and locus, a SNP.

01:44:24 I have the risk allele twice from my mom and from my dad.

01:44:28 So I’m unable to thermogenize.

01:44:31 I’m unable to turn on thermogenesis through IRX3 and IRX5 because the regulator that normally

01:44:37 binds here, ARID5B, can no longer bind because it’s an AT rich interacting domain.

01:44:42 And as soon as I change the T into a C, it can no longer bind because it’s no longer

01:44:46 AT rich.
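
A toy illustration of why a single T to C change makes a site less AT rich. The eight-letter sequence below is invented and is not the real sequence around rs1421085. Real protein binding depends on much more than the fraction of A and T, so this only shows the arithmetic.

```python
def at_fraction(seq):
    """Fraction of positions in seq that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def substitute(seq, pos, new_base):
    """Return seq with the base at index pos replaced by new_base."""
    return seq[:pos] + new_base + seq[pos + 1:]


# Hypothetical binding site: entirely A and T before the change.
non_risk_site = "ATATTTAA"
risk_site = substitute(non_risk_site, 4, "C")  # the T at index 4 becomes C
```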

01:44:47 But doesn’t that mean that you’re able to use the energy more efficiently?

01:44:52 You’re not generating heat or is that?

01:44:54 That means I can eat less and get around just fine.

01:44:56 Yes.

01:44:57 Yeah.

01:44:58 So that’s a feature actually.

01:44:59 It’s a feature in a food scarce environment.

01:45:02 Yeah.

01:45:03 But if we’re all starving, I’m doing great.

01:45:05 If we all have access to massive amounts of food, I’m obese basically.

01:45:09 That’s taken us to the entire process of then understanding that why mitochondria and then

01:45:14 the lipids are both, even though distant, are somehow involved.

01:45:18 Different sides of the same coin.

01:45:20 And you basically choose to store energy or you can choose to burn energy.

01:45:24 And then all of that is involved in the puzzle of obesity.

01:45:27 And that’s what’s fascinating, right?

01:45:29 Here we are in 2007, discovering the strongest genetic association with obesity and knowing

01:45:35 nothing about how it works for almost 10 years.

01:45:39 For 10 years, everybody focused on this FTO gene and they were like, oh, it must have

01:45:43 to do something with RNA modification.

01:45:46 And it’s like, no, it has nothing to do with the function of FTO.

01:45:50 It has everything to do with all of these other processes.

01:45:53 And suddenly the moment you solve that puzzle, which is a multiyear effort by the way, a

01:45:58 tremendous effort by Melina and many, many others.

01:46:01 So this tremendous effort basically led us to recognize this circuitry.

01:46:07 You went from having some 89 common variants associated in that region of the DNA sitting

01:46:12 on top of this gene to knowing the whole circuitry.

01:46:17 When you know the circuitry, you can now go crazy.

01:46:21 You can now start intervening at every level.

01:46:24 You can start intervening at the ARID5B level.

01:46:27 You can start intervening with CRISPR Cas9 at the single SNP level.

01:46:31 You can start intervening at IRX3 and IRX5 directly there.

01:46:34 You can start intervening at the thermogenesis level because you know the pathway.

01:46:38 You can start intervening at the differentiation level where the decision to make either white

01:46:45 fat or beige fat, the energy burning beige fat is made developmentally in the first three

01:46:51 days of differentiation of your adipocytes.

01:46:54 So as they’re differentiating, you basically can choose to make fat burning machines or

01:46:57 fat storing machines.

01:46:59 And sort of that’s how you populate your fat.

01:47:02 You basically can now go in pharmaceutically and do all of that.

01:47:05 And in our paper, we actually did all of that.

01:47:09 We went in and manipulated every single aspect.

01:47:12 At the nucleotide level, we use CRISPR Cas9 genome editing to basically take primary adipocytes

01:47:18 from risk and non risk individuals and show that by editing that one nucleotide out of

01:47:24 3.2 billion nucleotides in the human genome, you could then flip between an obese phenotype

01:47:29 and a lean phenotype like a switch.

01:47:31 You can basically take my cells that are non thermogenizing and just flip into thermogenizing

01:47:36 cells by changing one nucleotide.

01:47:38 It’s mind boggling.

01:47:40 It’s so inspiring that this puzzle could be solved in this way and it feels within reach

01:47:44 to then be able to crack the problem of some of these diseases.

01:47:50 What are the technologies, the tools that came along that made this possible?

01:48:00 What are you excited about?

01:48:01 Maybe if we just look at the buffet of things that you’ve kind of mentioned, what’s involved?

01:48:08 What should we be excited about?

01:48:09 What are you excited about?

01:48:11 I love that question because there’s so much ahead of us.

01:48:14 There’s so, so much.

01:48:18 So basically solving that one locus required massive amounts of knowledge that we have

01:48:24 been building across the years through the epigenome, through the comparative genomics

01:48:28 to find out the causal variant and the controller regulatory motif through the conserved circuitry.

01:48:35 It required knowing this regulatory genomic wiring.

01:48:38 It required Hi-C of these sort of topologically associated domains to basically find these

01:48:42 long range interactions.

01:48:44 It required eQTLs of these sort of genetic perturbation of these intermediate gene phenotypes.

01:48:51 It required all of the arsenal of tools that I’ve been describing was put together for

01:48:55 one locus.

01:48:57 And this was a massive team effort, huge investment in time, energy, money, effort, intellectual,

01:49:05 everything.

01:49:06 You’re referring to, I’m sorry, just for the obesity one.

01:49:09 Yeah, this one paper.

01:49:10 This one single paper.

01:49:11 This one single locus.

01:49:12 I would like to say that this is a paper about one nucleotide in the human genome, about

01:49:16 one bit of information, C versus T in the human genome.

01:49:20 That’s one bit of information and we have 3.2 billion nucleotides to go through.

01:49:25 So how do you do that systematically?

01:49:29 I am so excited about the next phase of research because the technologies that my group and

01:49:35 many other groups have developed allows us to now do this systematically, not just one

01:49:40 locus at a time, but thousands of loci at a time.

01:49:45 So let me describe some of these technologies.

01:49:48 The first one is automation and robotics.

01:49:52 So basically, you know, we talked about how you can take all of these molecules and see

01:49:58 which of these molecules are targeting each of these genes and what do they do?

01:50:02 So you can basically now screen through millions of molecules through thousands and thousands

01:50:07 and thousands of plates, each of which has thousands and thousands and thousands of molecules,

01:50:12 every single time testing, you know, all of these genes and asking which of these molecules

01:50:20 perturb these genes.

01:50:22 So that’s technology number one, automation and robotics.

01:50:25 Technology number two is parallel readouts.

01:50:29 So instead of perturbing one locus and then asking if I use CRISPR Cas9 on this enhancer

01:50:35 to basically use dCas9 to turn on or turn off the enhancer, or if I use CRISPR Cas9

01:50:41 on the SNP to basically change that one SNP at a time, then what happens?

01:50:46 But we have 120,000 disease associated SNPs that we want to test.

01:50:52 We don’t want to spend 120,000 years doing it.

01:50:57 So what do we do?

01:50:58 We’ve basically developed this technology for massively parallel reporter assays, MPRA.

01:51:07 So in collaboration with Tarjei Mikkelsen, Eric Lander, I mean, Jay Shendure’s group has

01:51:11 done a lot of that.

01:51:12 So there’s a lot of groups that basically have developed technologies for testing 10,000

01:51:19 genetic variants at a time.

01:51:21 How do you do that?

01:51:23 You know, we talked about microarray technology, the ability to synthesize these huge microarrays

01:51:28 that allow you to do all kinds of things like measure gene expression by hybridization,

01:51:33 by measuring the genotype of a person, by looking at hybridization with one version

01:51:38 with a T versus the other version with a C, and then sort of figuring out that I am a

01:51:43 risk carrier for obesity based on these differential hybridization in my genome that says, oh,

01:51:49 you seem to only have this allele or you seem to have that allele.

01:51:53 These can also be used to systematically synthesize small fragments of DNA.

01:51:59 So you can basically synthesize these 150 nucleotide long fragments across 450,000 spots

01:52:07 at a time.

01:52:10 You can now take the result of that synthesis, which basically works through all of these

01:52:15 sort of layers of adding one nucleotide at a time.

01:52:18 You can basically just type it into your computer and order it, and you can basically order

01:52:24 10,000 or 100,000 of these small DNA segments at a time.

01:52:30 And that’s where awesome molecular biology comes in.

01:52:33 You can basically take all these segments, have a common start and end barcode or sort

01:52:38 of like adapter, just like pieces of a puzzle.

01:52:42 You can make the same end piece and the same start piece for all of them.
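
The idea of giving every synthesized segment the same start and end piece can be sketched like this. The two adapter sequences and the input layout are hypothetical. Each locus is emitted in two versions, one per allele, as the conversation describes a little later.

```python
# Hypothetical shared end pieces; real designs use vendor- or lab-specific ones.
LEFT_ADAPTER = "ACTGGCCGCTTCACTG"
RIGHT_ADAPTER = "AGATCGGAAGAGCGTCG"


def build_library(loci):
    """Build one construct per (locus, allele).

    loci: dict mapping locus name ->
          (left_flank, non_risk_base, risk_base, right_flank).
    Every construct shares the same adapters so it fits the same plasmid.
    """
    library = {}
    for name, (left, non_risk, risk, right) in loci.items():
        for label, base in (("non_risk", non_risk), ("risk", risk)):
            variable_region = left + base + right
            library[(name, label)] = LEFT_ADAPTER + variable_region + RIGHT_ADAPTER
    return library
```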

01:52:48 And you can now use plasmids, which are these extra chromosomal small DNA circular segments

01:52:57 that are basically inhabiting all our, all our genomes.

01:53:00 We basically have, you know, plasmids floating around and bacteria use plasmids

01:53:05 for transferring DNA.

01:53:07 And that’s where they put a lot of antibiotic resistance genes.

01:53:10 So they can easily transfer them from one bacterium to the other.

01:53:14 After one bacterium evolves a gene to be resistant to a particular antibiotic, it basically says

01:53:20 to all its friends, Hey, here’s that sort of DNA piece.

01:53:24 We can now coopt these plasmids into human cells.

01:53:28 You can basically make a human cell culture and add plasmids to that human cell culture

01:53:34 that contain the things that you want to test.

01:53:38 You now have this library of 450,000 elements.

01:53:41 You can insert them each into the common plasmid and then test them in millions of cells in

01:53:47 parallel.

01:53:48 And the common plasmid is all the same before you add it.

01:53:51 Exactly.

01:53:52 The rest of the plasmid is the same.

01:53:53 So it’s, it’s called an episomal reporter assay.

01:53:57 Episome means not inside the genome.

01:53:59 It’s sort of outside the chromosomes.

01:54:01 So it’s an episomal assay that allows you to have a variable region where you basically

01:54:06 test 10,000 different enhancers and you have a common region which basically has the same

01:54:11 reporter gene.

01:54:13 You now can do some very cool molecular biology.

01:54:16 You can basically take the 450,000 elements that you’ve generated and you have a piece

01:54:21 of the puzzle here, piece of the puzzle here, which is identical.

01:54:24 So they’re compatible with that plasmid.

01:54:27 You can chop them up in the middle to separate a barcode reporter from the enhancer and in

01:54:32 the middle put the same gene again using the same piece of the puzzle.

01:54:36 You now can have a barcode readout of what is the impact of 10,000 different versions

01:54:42 of an enhancer on gene expression.

01:54:46 So we’re not doing one experiment, we’re doing 10,000 experiments.

01:54:50 And those 10,000 can be 5,000 of different loci and each of them in two versions, risk

01:54:58 or non risk.
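
A sketch of how a barcode readout might be turned into a risk versus non-risk comparison. It assumes activity is scored as the log ratio of RNA barcode counts to plasmid DNA barcode counts. That normalization, the pseudocount, and all names are assumptions and are not described in the conversation.

```python
from math import log2


def barcode_activity(rna_counts, dna_counts, pseudocount=1):
    """log2(RNA / DNA) per barcode; barcodes absent from RNA count as zero."""
    return {
        barcode: log2(
            (rna_counts.get(barcode, 0) + pseudocount)
            / (dna_counts[barcode] + pseudocount)
        )
        for barcode in dna_counts
    }


def allelic_effect(activity, barcode_to_construct):
    """Risk minus non-risk activity for each locus.

    barcode_to_construct: dict mapping barcode -> (locus, allele),
    where allele is "risk" or "non_risk".
    """
    by_locus = {}
    for barcode, (locus, allele) in barcode_to_construct.items():
        by_locus.setdefault(locus, {})[allele] = activity[barcode]
    return {
        locus: alleles["risk"] - alleles["non_risk"]
        for locus, alleles in by_locus.items()
        if "risk" in alleles and "non_risk" in alleles
    }
```

Because every construct carries its own barcode, all loci are scored from one pooled sequencing run rather than one experiment per variant.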

01:55:00 I can now test tens of thousands.

01:55:01 Just a little hypothesis.

01:55:02 Exactly.

01:55:03 And then you can do 10,000 and we can test 10,000 hypotheses at once.

01:55:08 How hard is it to generate those 10,000?

01:55:11 Trivial.

01:55:12 Trivial.

01:55:13 But it’s biology.

01:55:14 No, no.

01:55:15 Generating the 10,000 is trivial because you basically add, it’s biotechnology.

01:55:20 You basically have these arrays that add one nucleotide at a time at every spot.

01:55:26 So it’s printing and so you’re able to, you’re able to control.

01:55:30 Yeah.

01:55:31 Is it super costly?

01:55:32 Is it?

01:55:33 10,000 bucks.

01:55:34 So this isn’t millions.

01:55:35 10,000 bucks for 10,000 experiments sounds like the right, you know.

01:55:39 I mean, so that’s super, that’s exciting because you don’t have to do one thing at a time.

01:55:44 You can now use that technology, these massively parallel reporter assays to test 10,000 locations

01:55:49 at a time.

01:55:51 We’ve made multiple modifications to that technology.

01:55:55 One was Sharpr-MPRA, which stands for, you know, basically getting a higher resolution

01:56:04 view by tiling these, these elements so you can see where along the region of control

01:56:14 are they acting.
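
Tiling can be sketched as cutting a region into overlapping windows, so that each nucleotide is tested in several shifted contexts. The 150-nucleotide length echoes the fragment size mentioned earlier. The step size and the function names are hypothetical.

```python
def tile(region, tile_len=150, step=10):
    """Return (offset, sequence) pairs of overlapping windows over region."""
    return [
        (offset, region[offset:offset + tile_len])
        for offset in range(0, len(region) - tile_len + 1, step)
    ]


def tiles_covering(tiles, position, tile_len=150):
    """Offsets of the tiles that contain the given position."""
    return [
        offset for offset, _ in tiles
        if offset <= position < offset + tile_len
    ]
```

Comparing the measured activity of tiles that do and do not contain a position is one way overlapping designs can narrow down where along the region the signal comes from.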

01:56:16 And we made another modification called HiDRA for high, you know, definition regulatory

01:56:23 annotation or something like that, which basically allows you to test 7 million of these at a

01:56:30 time by sort of cutting them directly from the DNA.

01:56:32 So instead of synthesizing, which basically has the limit of 450,000 that you can synthesize

01:56:37 at a time, we basically said, Hey, if we want to test all accessible regions of the genome,

01:56:42 let’s just do an experiment that cuts accessible regions.

01:56:45 Let’s take those accessible regions, put them all with the same end joints of the puzzles,

01:56:51 and then now use those to create a much, much larger array of things that you can test.

01:56:59 And then tiling all of these regions, you can then pinpoint what are the driver nucleotides,

01:57:04 what are the elements, how are they acting across 7 million experiments at a time.

01:57:07 So basically this is all the same family of technology where you’re basically using these

01:57:12 parallel readouts of the barcodes.

01:57:15 And then to do this, we used a technology called STARR-seq for self transcribing reporter

01:57:23 assays, a technology developed by Alex Stark, my former postdoc, who’s now a PI over in Vienna.

01:57:30 So we basically coupled the STARR-seq, the self transcribing reporters where the enhancer

01:57:37 can be part of the gene itself.

01:57:39 So instead of having a separate barcode, that enhancer basically acts to turn on the gene

01:57:43 and it’s transcribed as part of the gene.

01:57:46 So you don’t have to have the two separate parts.

01:57:47 Exactly.

01:57:48 So you can just read them directly.

01:57:49 So there are constant improvements in this whole process.

01:57:52 By the way, generating all these options, is it basically brute force?

01:57:57 How much human intuition is?

01:57:58 Oh gosh, of course it’s human intuition and human creativity and incorporating all of

01:58:04 the input data sets.

01:58:06 Because again, the genome is enormous.

01:58:08 3.2 billion, you don’t want to test that.

01:58:11 You basically use all of these tools that I’ve talked about already.

01:58:14 You generate your top favorite 10,000 hypotheses, and then you go and test all 10,000.

01:58:19 And then from what comes out, you can then go to the next step.

01:58:24 So that’s technology number two.

01:58:25 So technology number one is robotics, automation, where you have thousands of wells and you

01:58:30 constantly test them.

01:58:32 The second technology is instead of having wells, you have these massively parallel readouts

01:58:37 in sort of these pooled assays.

01:58:40 The third technology is coupling CRISPR perturbations with these single cell RNA readouts.

01:58:51 So let me make another parenthesis here to describe now single cell RNA sequencing.

01:58:57 So what does single cell RNA sequencing mean?

01:58:59 So RNA sequencing is what has been traditionally used, well, traditionally the last 20 years,

01:59:07 ever since the advent of next generation sequencing.

01:59:10 So basically before RNA expression profiling was based on these microarrays.

01:59:14 The next technology after that was based on sequencing.

01:59:17 So you chop up your RNA and you just sequence small molecules, just like you would sequence

01:59:22 a genome, basically reverse transcribe the small RNAs into DNA, and you sequence that

01:59:28 DNA in order to get the number of sequencing reads corresponding to the expression level

01:59:35 of every gene in the genome.
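
Turning sequencing reads into an expression level can be sketched as counting reads per gene and rescaling by the total. Counts per million is used here as a simple, assumed normalization. It presumes each read has already been assigned to a gene, and real pipelines apply further corrections.

```python
from collections import Counter


def count_reads(read_gene_assignments):
    """Count how many reads were assigned to each gene."""
    return Counter(read_gene_assignments)


def counts_per_million(read_counts):
    """Rescale raw counts so that each sample sums to one million."""
    total = sum(read_counts.values())
    return {gene: 1e6 * n / total for gene, n in read_counts.items()}
```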

01:59:37 You now have RNA sequencing.

01:59:39 How do you go to single cell RNA sequencing?

01:59:42 That technology also went through stages of evolution.

01:59:45 The first was microfluidics.

01:59:48 You basically had these, or even chambers, you basically had these ways of isolating

01:59:52 individual cells, putting them into a well for every one of these cells.

01:59:57 So you have 384 well plates and you now do 384 parallel reactions to measure the expression

02:00:03 of 384 cells.

02:00:05 That sounds amazing and it was amazing, but we want to do a million cells.

02:00:11 How do you go from these wells to a million cells?

02:00:14 You can’t.

02:00:15 So what the next technology was after that is instead of using a well for every reaction,

02:00:21 you now use a lipid droplet for every reaction.

02:00:26 So you use micro droplets as reaction chambers to basically amplify RNA.

02:00:33 So here’s the idea.

02:00:34 You basically have microfluidics where you basically have every single cell coming down

02:00:39 one tube in your microfluidics and you have little bubbles getting created in the other

02:00:44 way with specific primers that mark every cell with its own barcode.

02:00:49 You basically couple the two and you end up with little bubbles that have a cell and tons

02:00:55 of markers for that cell.

02:00:57 You now mark up all of the RNA for that one cell with the same exact barcode and you then

02:01:03 lyse all of the droplets and you sequence the heck out of that and you have for every

02:01:09 RNA molecule, a unique identifier that tells you what cell was it on.
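
Once every RNA molecule carries its cell's barcode, the pooled reads can be sorted back into cells. The sketch below assumes each read has already been reduced to a (cell barcode, gene) pair. Duplicate-molecule handling is omitted and the names are hypothetical.

```python
from collections import Counter, defaultdict


def demultiplex(reads):
    """Group pooled reads into per-cell gene counts.

    reads: iterable of (cell_barcode, gene) pairs.
    Returns a dict mapping cell_barcode -> Counter of gene -> read count.
    """
    cells = defaultdict(Counter)
    for barcode, gene in reads:
        cells[barcode][gene] += 1
    return cells
```

The droplets can be lysed and sequenced together precisely because this grouping can be redone afterwards from the barcodes alone.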

02:01:12 That is such good engineering, microfluidics and using some kind of primer to put a label

02:01:20 on the thing.

02:01:21 I mean, you’re making it sound easy.

02:01:24 I assume it’s beautiful, but it’s gorgeous.

02:01:27 So there’s the next generation.

02:01:29 So that’s the second generation.

02:01:31 Next generation is forget the microfluidics altogether.

02:01:34 Just use big bottles.

02:01:35 How can you possibly do that with big bottles?

02:01:37 So here’s the idea.

02:01:39 You dissociate all of your cells or all of your nuclei from complex cells like brain

02:01:43 cells that are very long and sticky so you can’t do that.

02:01:48 If you have blood cells or if you have neuronal nuclei or brain nuclei, you can basically

02:01:52 dissociate let’s say a million cells.

02:01:56 You now want to add a unique barcode, a unique barcode in each one of a million cells using

02:02:01 only big bottles.

02:02:02 How can you possibly do that?

02:02:04 Sounds crazy, but here’s the idea.

02:02:07 You use a hundred of these bottles, you randomly shuffle all your million cells and you throw

02:02:13 them into those hundred bottles randomly, completely randomly.

02:02:17 You add one barcode out of a hundred to every one of the cells.

02:02:21 You now take them all out.

02:02:23 You shuffle them again and you throw them again into the same hundred bottles.

02:02:28 But now in a different randomization and you add a second barcode.

02:02:33 So every cell now has two barcodes.

02:02:36 You take them out again, you shuffle them and you throw them back in.

02:02:40 A third barcode is added randomly from the same hundred barcodes.

02:02:47 You’ve now labeled every cell probabilistically based on the unique path that he took of which

02:02:53 of a hundred bottles did he go for the first time, which of a hundred bottles the second

02:02:56 time and which of a hundred bottles the third time.

02:03:00 A hundred times a hundred times a hundred is a million unique barcodes in every single

02:03:05 one of these cells without ever using microfluidics.
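
The bottle scheme can be simulated directly. The sketch assumes each cell lands in a uniformly random bottle in every round, and the function names are hypothetical. It also estimates how often a cell ends up with a path that no other cell shares, which is the sense in which the labeling is probabilistic.

```python
import random
from collections import Counter


def split_pool(n_cells, n_bottles=100, rounds=3, seed=0):
    """Assign each cell a barcode path: one bottle index per round."""
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(n_bottles) for _ in range(rounds))
        for _ in range(n_cells)
    ]


def unique_fraction(barcodes):
    """Fraction of cells whose barcode path is shared with no other cell."""
    counts = Counter(barcodes)
    return sum(1 for b in barcodes if counts[b] == 1) / len(barcodes)


def expected_unique_fraction(n_cells, n_bottles=100, rounds=3):
    """Chance that a given cell's path is missed by all other cells."""
    space = n_bottles ** rounds  # 100 * 100 * 100 = 1,000,000 paths
    return (1 - 1 / space) ** (n_cells - 1)
```

Under this simple model, three rounds of a hundred bottles give a million possible paths. The fraction of uniquely labeled cells falls as the number of cells approaches the number of paths, so loading fewer cells or adding a round keeps collisions rare.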

02:03:09 Very clever.

02:03:10 It’s beautiful, right?

02:03:11 From a computer science perspective, that’s very clever.

02:03:12 Yeah.

02:03:13 So you now have the single cell sequence technology.

02:03:16 You can use the wells, you can use the bubbles or you can use the bottles and you have way

02:03:22 The bubbles still sound pretty damn cool.

02:03:23 The bubbles are awesome.

02:03:24 And that’s basically the main technology that we’re using.

02:03:26 So the bubbles is the main technology.

02:03:29 So there are kits now that companies just sell to basically carry out single cell RNA

02:03:34 sequencing that you can basically for $2,000, you can basically get 10,000 cells from one

02:03:40 sample.

02:03:42 And for every one of those cells, you basically have the transcription of thousands of genes.

02:03:49 And you know, of course the data for any one cell is noisy, but being computer scientists,

02:03:54 we can aggregate the data from all of the cells together across thousands of individuals

02:03:58 together to basically make very robust inferences.

02:04:02 Okay.

02:04:03 So the third technology is basically single cell RNA sequencing that allows you to now

02:04:07 start asking not just what is the brain expression level difference of that genetic variant,

02:04:14 but what is the expression difference of that one genetic variant across every single subtype

02:04:20 of brain cell?

02:04:21 How is the variance changing?

02:04:24 You can’t just, you know, with a brain sample, you can just ask about the mean, what is the

02:04:29 average expression?

02:04:30 If I instead have 3000 cells that are neurons, I can ask not just what is the neuronal expression.

02:04:38 I can say for layer five excitatory neurons of which I have, I don’t know, 300 cells,

02:04:44 what is the variance that this genetic variant has?

02:04:48 So suddenly it’s amazingly more powerful.

02:04:51 I can basically start asking about this middle layer of gene expression at unprecedented

02:04:55 levels.
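
The contrast between a bulk average and a per cell type readout can be sketched as follows. The toy numbers are invented purely for illustration (they are not data from the lab), and `summarize` and `bulk_mean_by_genotype` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean, pvariance

# (cell type, genotype at the variant, expression of one gene); invented values
TOY_CELLS = [
    ("excitatory", 0, 2.0), ("excitatory", 0, 4.0),
    ("excitatory", 1, 6.0), ("excitatory", 1, 8.0),
    ("microglia", 0, 6.0), ("microglia", 0, 8.0),
    ("microglia", 1, 2.0), ("microglia", 1, 4.0),
]


def summarize(cells):
    """Mean and variance of expression for every (cell type, genotype) group."""
    groups = defaultdict(list)
    for cell_type, genotype, expression in cells:
        groups[(cell_type, genotype)].append(expression)
    return {key: (mean(vals), pvariance(vals)) for key, vals in groups.items()}


def bulk_mean_by_genotype(cells):
    """What a bulk sample sees: one average per genotype, cell types mixed."""
    groups = defaultdict(list)
    for _, genotype, expression in cells:
        groups[genotype].append(expression)
    return {genotype: mean(vals) for genotype, vals in groups.items()}


if __name__ == "__main__":
    print("bulk:", bulk_mean_by_genotype(TOY_CELLS))
    for key, (m, v) in sorted(summarize(TOY_CELLS).items()):
        print(key, "mean =", m, "variance =", v)
```

In this toy case the variant raises expression in one cell type and lowers it in another, so the bulk averages are identical and only the per cell type view shows the effect.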

02:04:56 So when you look at the average, it washes out some potentially important signal that

02:05:01 corresponds to ultimately the disease.

02:05:04 Completely.

02:05:05 Yeah.

02:05:06 So that, I can do that at the RNA level, but I can also do that at the DNA level for the

02:05:10 epigenome.

02:05:11 So remember how before I was telling you about all this technology that we’re using to probe

02:05:14 the epigenome, one of them is DNA accessibility.

02:05:18 So what we’re doing in my lab is that from the same dissociation of say a brain sample

02:05:23 where you now have all these tens of thousands of cells floating around, you basically take

02:05:27 half of them to do RNA profiling and the other half to do epigenome profiling, both at the

02:05:32 single cell level.

02:05:34 So that allows you to now figure out what are the millions of DNA enhancers that are

02:05:40 accessible in every one of tens of thousands of cells.

02:05:45 And computationally, we can now take the RNA and the DNA readouts and group them together

02:05:50 to basically figure out how is every enhancer related to every gene.

02:05:57 And remember this sort of enhancer gene linking that we were doing across 833 samples?

02:06:01 833 is awesome, don’t get me wrong, but 10 million is way more awesome.

02:06:08 So we can now look at correlated activity across 2.3 million enhancers and 20,000 genes

02:06:14 in each of millions of cells to basically start piecing together the regulatory circuitry

02:06:19 of every single type of neuron, every single type of astrocyte, oligodendrocyte, microglial

02:06:25 cell inside the brains of 1,500 individuals that we sample across multiple different brain

02:06:32 regions across both DNA and RNA.
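
One simple way to picture the enhancer gene linking described above is correlation of activity. The sketch below is not the lab's actual method. It assumes that accessibility and expression have already been summarized over the same ordered list of groups (for example cell type by individual), since the RNA and DNA readouts come from different halves of the cells. The names `pearson` and `link_enhancers` and the 0.8 threshold are illustrative choices.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def link_enhancers(accessibility, expression, threshold=0.8):
    """Return (enhancer, gene, r) for pairs whose |correlation| >= threshold.

    accessibility: {enhancer: [value per group]}
    expression:    {gene: [value per group]}, same group order (assumed)
    """
    links = []
    for enhancer, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if abs(r) >= threshold:
                links.append((enhancer, gene, round(r, 3)))
    return links


if __name__ == "__main__":
    accessibility = {"enhA": [1, 2, 3, 4], "enhB": [4, 1, 3, 2]}
    expression = {"gene1": [2, 4, 6, 8], "gene2": [1, 1, 1, 1]}
    print(link_enhancers(accessibility, expression))
```

A real analysis would also restrict pairs by genomic distance and correct for the enormous number of tests; this version compares every enhancer with every gene only to keep the idea visible.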

02:06:36 So that’s the data set that my team generated last year alone.

02:06:39 So in one year, we basically generated 10 million cells from human brain across a dozen

02:06:46 different disorders, across schizophrenia, Alzheimer’s, frontotemporal dementia, Lewy

02:06:51 body dementia, ALS, Huntington’s disease, post traumatic stress disorder, autism, bipolar

02:07:01 disorder, healthy aging, et cetera.

02:07:04 So it’s possible that even just within that data set lie a lot of keys to understanding

02:07:13 these diseases and then be able to like directly leads to then treatment.

02:07:18 Correct.

02:07:19 Correct.

02:07:20 So basically we are now motivating.

02:07:21 Yeah.

02:07:22 So our computational team is in heaven right now and we’re looking for people.

02:07:25 I mean, if you have super smart.

02:07:29 So this is a very interesting kind of side question.

02:07:33 How much of this is biology?

02:07:34 How much of this is computation?

02:07:36 So you’re the head of the computational biology group, but how much of, should you be comfortable

02:07:44 with biology to be able to solve some of these problems?

02:07:48 If you just find, if you put on several of the hats you wear, fundamentally, are you thinking

02:07:54 like a computer scientist here?

02:07:56 You have to.

02:07:57 This is the only way.

02:07:59 As I said, we are the descendants of the first digital computer.

02:08:02 We’re trying to understand the digital computer.

02:08:05 We’re trying to understand the circuitry, the logic of this digital core computer and

02:08:11 all of these analog layers surrounding it.

02:08:14 So the case that I’ve been making is that you cannot think one gene at a time.

02:08:19 The traditional biology is dead.

02:08:22 There’s no way, you cannot solve disease with traditional biology.

02:08:24 You need it as a component.

02:08:27 Once you figured out IRX3 and IRX5, you now can then say, Hey, have you guys worked on

02:08:31 those genes with your single gene approach?

02:08:33 We’d love to know everything you know.

02:08:35 And if you haven’t, we now know how important these genes are.

02:08:38 Let’s now launch a single gene program to dissect them and understand them.

02:08:43 But you cannot use that as a way to dissect disease.

02:08:46 You have to think genomically.

02:08:48 You have to think from the global perspective and you have to build these circuits systematically.

02:08:53 So we need numbers of computer scientists who are interested and willing to dive into

02:08:59 these data fully, fully in and extract meaning.

02:09:04 We need computer science people who can understand machine learning and inference and decouple

02:09:11 these matrices, come up with super smart ways of dissecting them.

02:09:16 But we also need computer scientists who understand biology, who are able to design the next generation

02:09:22 of experiments.

02:09:24 Because many of these experiments, no one in their right mind would design them without

02:09:28 thinking of the analytical approach that you would use to deconvolve the data afterwards.

02:09:33 Because it’s massive amounts of ridiculously noisy data.

02:09:36 And if you don’t have the computational pipeline in your head before you even design the experiment,

02:09:42 you would never design the experiment that way.

02:09:44 That’s brilliant.

02:09:45 So in designing the experiment, you have to see the entirety of the computational pipeline.

02:09:50 That drives the design.

02:09:52 That even drives the necessity for that design.

02:09:55 Basically, you know, if you didn’t have a computer scientist way of thinking, you would

02:10:00 never design these hugely combinatorial, massively parallel experiments.

02:10:07 So that’s why you need interdisciplinary teams, you need teams.

02:10:10 And I want to sort of clarify that what do we mean by computational biology group?

02:10:15 The focus is not on computational, the focus is on the biology.

02:10:18 So we are a biology group.

02:10:20 What type of biology?

02:10:22 Computational biology.

02:10:23 That’s the type of biology that uses the whole genome.

02:10:27 That’s the type of biology that designs experiments, genomic experiments, that can only be interpreted

02:10:33 in the context of the whole genome.

02:10:34 Right.

02:10:35 So it’s philosophically looking at biology as a computer.

02:10:39 Correct.

02:10:40 Correct.

02:10:41 So which is in the context of the history of biology is a big transformation.

02:10:46 Yeah.

02:10:47 Yeah.

02:10:48 You can think of the name as what do we do?

02:10:50 Only computation.

02:10:51 That’s not true.

02:10:52 How do we study it?

02:10:53 Only computationally.

02:10:54 That is true.

02:10:56 So all of this single cell sequencing can now be coupled with the technology that we

02:11:00 talked about earlier for perturbation.

02:11:02 So here’s the crazy thing.

02:11:04 Instead of using these wells and these robotic systems for doing one drug at a time or for

02:11:10 perturbing one gene at a time in thousands of wells, you can now do this using a pool

02:11:16 of cells and single cell RNA sequencing.

02:11:20 How?

02:11:21 You basically can take these perturbations using CRISPR and instead of using a single

02:11:27 guide RNA, you can use a library of guide RNAs generated exactly the same way using

02:11:32 this array technology.

02:11:34 So you synthesize a thousand different guide RNAs.

02:11:38 You now take each of these guide RNAs and you insert them in a pool of cells where every

02:11:45 cell gets one perturbation.

02:11:48 And you use CRISPR editing or CRISPR, so with either CRISPR Cas9 to edit a genome with these

02:11:56 thousand perturbations or with the activation or with the repression.

02:12:01 And you now can have a single cell readout where every single cell has received one of

02:12:07 these modifications.

02:12:09 And you can now in massively parallel ways, couple the perturbation and the readout in

02:12:17 a single experiment.

02:12:18 How are you tracking which perturbations each cell received?

02:12:21 So there’s ways of doing that, but basically one way is to make that perturbation an expressible

02:12:27 vector so that part of your RNA reading is actually that perturbation itself.

02:12:33 So you can basically put it in an expressible part so you can self drive it.
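
Because the guide is read out as part of each cell's RNA, the bookkeeping can be pictured like this. It is a sketch with made-up thresholds and hypothetical names (`assign_guide`, `cells_per_guide`), assuming each cell comes with a count of reads per guide. Cells with no clearly dominant guide are left unassigned.

```python
from collections import defaultdict


def assign_guide(guide_counts, min_reads=3, min_fraction=0.8):
    """Return the dominant guide for one cell, or None if the call is ambiguous.

    guide_counts: {guide name: number of reads seen in this cell}
    The thresholds are illustrative, not recommended values.
    """
    total = sum(guide_counts.values())
    if total == 0:
        return None
    guide, top = max(guide_counts.items(), key=lambda item: item[1])
    if top < min_reads or top / total < min_fraction:
        return None
    return guide


def cells_per_guide(cells):
    """Group cell ids by the guide (perturbation) each one was assigned."""
    groups = defaultdict(list)
    for cell_id, counts in cells.items():
        guide = assign_guide(counts)
        if guide is not None:
            groups[guide].append(cell_id)
    return dict(groups)


if __name__ == "__main__":
    cells = {
        "cell1": {"guide_A": 10, "guide_B": 1},   # clearly guide_A
        "cell2": {"guide_A": 5, "guide_B": 5},    # ambiguous, dropped
        "cell3": {"guide_B": 7},                  # clearly guide_B
    }
    print(cells_per_guide(cells))
```

Once every cell carries a perturbation label, the expression profiles of cells sharing a guide can be compared against unperturbed cells in the same pool.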

02:12:37 So the point that I want to get across is that the sky’s the limit.

02:12:42 You basically have these tools, these building blocks of molecular biology.

02:12:46 We have these massive data sets of computational biology.

02:12:50 We have this huge ability to sort of use machine learning and statistical methods and, you

02:12:56 know, linear algebra to sort of reduce the dimensionality of all these massive data sets.

02:13:01 And then you end up with a series of actionable targets that you can then couple with pharma

02:13:10 and just go after systematically.
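
As one concrete instance of the linear algebra mentioned above, here is a minimal sketch of principal component analysis, finding the first component by power iteration on the covariance matrix. It is written in plain Python for readability and is not meant as the group's pipeline; real data sets would use an optimized numerical library. All function names are illustrative.

```python
from math import sqrt


def center(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix (features x features)."""
    x = center(rows)
    n, d = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration.

    Caveat: the uniform starting vector must not be orthogonal to the answer.
    """
    cov = covariance(rows)
    d = len(cov)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(a * a for a in w))
        if norm == 0:
            break
        v = [a / norm for a in w]
    return v


def project(rows, v):
    """One score per sample: the centered data projected onto direction v."""
    return [sum(a * b for a, b in zip(r, v)) for r in center(rows)]


if __name__ == "__main__":
    # Three measured features, but the samples vary along a single direction.
    samples = [[1, 2, 0], [2, 4, 0], [3, 6, 0], [4, 8, 0]]
    direction = first_component(samples)
    print("direction:", direction)
    print("scores:", project(samples, direction))
```

Here three numbers per sample collapse to one score per sample with nothing lost, which is the sense in which dimensionality is reduced.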

02:13:13 So the ability to sort of bring genetics to the epigenomics, to the transcriptomics, to

02:13:19 the cellular readouts using these sort of high throughput perturbation technologies

02:13:24 that I’m talking about and ultimately to the organismal through the electronic health record

02:13:30 endophenotypes and ultimately the disease battery of assays at the cognitive level,

02:13:36 at the physiological level and, you know, every other level.

02:13:42 There is no better or more exciting field, in my view, to be a computer scientist in,

02:13:46 or to be a scientist in, period.

02:13:48 Basically this confluence of technologies, of computation, of data, of insight and of

02:13:54 tools for manipulation is unprecedented in human history.

02:13:58 And I think this is what’s shaping the next century to really be a transformative century

02:14:04 for our species and for our planet.

02:14:09 Do you think the 21st century will be remembered for the big leaps in understanding and alleviation

02:14:17 of biology?

02:14:18 If you look at the path between discovery and therapeutics, it’s been on the order of

02:14:23 50 years, it’s been shortened to 40, 30, 20, and now it’s on the order of 10 years.

02:14:29 But the huge number of technologies that are going on right now for discovery will result

02:14:36 undoubtedly in the most dramatic manipulation of human biology that we’ve ever seen in the

02:14:42 history of humanity in the next few years.

02:14:45 Do you think we might be able to cure some of the diseases we started this conversation

02:14:48 with?

02:14:49 Absolutely.

02:14:50 Absolutely.

02:14:51 It’s only a matter of time.

02:14:54 Basically the complexity is enormous and I don’t want to underestimate the complexity

02:14:58 but the number of insights is unprecedented and the ability to manipulate is unprecedented

02:15:03 and the ability to deliver these small molecules and other non traditional medicine perturbations,

02:15:11 there’s a new generation of perturbations that you can use at the DNA level, at the

02:15:17 RNA level, at the micro RNA level, at the epigenomic level, there’s a battery of new

02:15:24 generations of perturbations.

02:15:26 If you couple that with cell type identifiers that can basically sense when you are in the

02:15:32 right cell based on the specific combination and then turn on that intervention for that

02:15:36 cell, you can now think of combinatorial interventions where you can basically sort of feed a synthetic

02:15:42 biology construct to someone that will basically do different things in different cells.

02:15:47 So basically for cancer, this is one of the therapeutics that our collaborator Ron Weiss

02:15:51 is using to basically start sort of engineering the circuits that will use micro RNA sensors

02:15:56 of the environment to sort of know if you’re in a tumor cell or if you’re in an immune

02:15:59 cell or if you’re in a stromal cell and so forth and basically turn on particular interventions

02:16:04 there.

02:16:05 You can sort of create constructs that are tuned to only the liver cells or only the

02:16:11 heart cells or only the brain cells and then have these new generations of therapeutics

02:16:18 coupled with this immense amount of knowledge on the sort of which targets to choose and

02:16:24 what biological processes to measure and how to intervene.

02:16:27 My view is that disease is going to be fundamentally altered and alleviated as we go forward.

02:16:36 Next time we talk, we’ll talk about the philosophical implications of that and the effect of life,

02:16:40 but let’s stick to biology for just a little longer.

02:16:44 We did pretty good today.

02:16:45 We stuck to the science.

02:16:49 What are you excited in terms of the future of this field, the technologies in your own

02:16:56 group, in your own mind, you’re leading the world at MIT in the science and the engineering

02:17:02 of this work.

02:17:04 So what are you excited about here?

02:17:06 I could not be more excited.

02:17:08 We are one of many, many teams who are working on this.

02:17:12 In my team, the most exciting parts are, you know, manifold.

02:17:17 So basically we’ve now assembled this battery of technologies.

02:17:20 We’ve assembled these massive, massive data sets and now we’re really sort of in the stage

02:17:24 of our team’s path of generating disease insights.

02:17:30 So we are simultaneously working on a paper on schizophrenia right now that is basically

02:17:36 using the single cell profiling technologies, using these editing and manipulation technologies

02:17:40 to basically show how the master regulators underlying changes in the brain that are sort

02:17:47 of found in schizophrenia are in fact affecting excitatory neurons and inhibitory neurons

02:17:53 in pathways that are active both in synaptic pruning, but also in early development.

02:17:59 We’ve basically found this set of four regulators that are connecting these two processes that

02:18:03 were previously separate in schizophrenia in sort of having a sort of more unified view

02:18:10 across those two sides.

02:18:12 The second one is in the area of metabolism.

02:18:15 We basically now have a beautiful collaboration with the Goodyear lab that’s basically looking

02:18:19 at multi tissue perturbations in six or seven different tissues across the body in the context

02:18:29 of exercise and in the context of nutritional interventions using both mouse and human,

02:18:35 where we can basically see what are the cell to cell communications that are changing across

02:18:41 them.

02:18:42 And what we’re finding is this immense role of both immune cells as well as adipocyte

02:18:47 stem cells in sort of reshaping that circuitry of all of these different tissues and that’s

02:18:53 sort of pointing to a new path for therapeutic intervention there.

02:18:56 In Alzheimer’s, it’s this huge focus on microglia and now we’re discovering different classes

02:19:02 of microglial cells that are basically either synaptic or immune.

02:19:10 And these are playing vastly different roles in Alzheimer’s versus in schizophrenia.

02:19:16 And what we’re finding is this immense complexity as you go further and further down of how

02:19:22 in fact there’s 10 different types of microglia, each with their own sort of expression programs.

02:19:28 We used to think of them as, oh yeah, they’re microglia, but in fact now we’re realizing

02:19:32 just even in that sort of least abundant of cell types, there’s this incredible diversity

02:19:37 there.

02:19:39 The differences between brain regions is another sort of major, major insight.

02:19:44 Often one would think that, oh, astrocytes are astrocytes no matter where they are.

02:19:48 But no, there’s incredible region specific differences in the expression patterns of

02:19:54 all of the major brain cell types across different brain regions.

02:19:57 So basically there’s the neocortical regions that are sort of the recent innovation that

02:20:01 makes us so different from all other species.

02:20:03 There’s the sort of reptilian brain sort of regions that are sort of much more very extremely

02:20:10 distinct.

02:20:11 There’s the cerebellum.

02:20:12 Each of those basically is associated in a different way with disease.

02:20:17 And what we’re doing now is looking into pseudo temporal models for how disease progresses

02:20:23 across different regions of the brain.

02:20:25 If you look at Alzheimer’s, it basically starts in this small region called the entorhinal

02:20:30 cortex and then it spreads through the brain and through the hippocampus and ultimately

02:20:38 affecting the neocortex.

02:20:39 And with every brain region that it hits, it basically has a different impact on the

02:20:46 cognitive and memory aspects, orientation, short term memory, long term memory, et cetera,

02:20:52 which is dramatically affecting the cognitive path that the individuals go through.

02:20:58 So what we’re doing now is creating these computational models for ordering the cells

02:21:04 and the regions and the individuals according to their ability to predict Alzheimer’s disease.

02:21:10 So we can have a cell level predictor of pathology that allows us to now create a temporal time

02:21:17 course that tells us when every gene turns on along this pathology progression and then

02:21:22 trace that across regions and pathological measures that are region specific, but also

02:21:28 cognitive measures and so on and so forth.
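
The ordering idea can be sketched in a few lines. This assumes each cell already has a predicted pathology score from some upstream model, which is not shown, and it defines a gene as "turning on" when its windowed average expression first crosses a threshold. That definition, the window size, and the names `order_by_pathology` and `onset_index` are illustrative assumptions, not the lab's method.

```python
def order_by_pathology(cells):
    """Sort cells from lowest to highest predicted pathology score."""
    return sorted(cells, key=lambda cell: cell["score"])


def onset_index(ordered_cells, gene, threshold, window=3):
    """First position along the ordering where the windowed mean expression
    of `gene` reaches `threshold`; None if it never does."""
    values = [cell["expr"][gene] for cell in ordered_cells]
    for start in range(len(values) - window + 1):
        if sum(values[start:start + window]) / window >= threshold:
            return start
    return None


if __name__ == "__main__":
    # Invented toy cells: a pathology score plus expression of one gene.
    cells = [
        {"score": 0.9, "expr": {"geneX": 5}},
        {"score": 0.1, "expr": {"geneX": 0}},
        {"score": 0.5, "expr": {"geneX": 4}},
        {"score": 0.3, "expr": {"geneX": 0}},
        {"score": 0.7, "expr": {"geneX": 6}},
    ]
    ordered = order_by_pathology(cells)
    print("onset position:", onset_index(ordered, "geneX", threshold=3))
```

Repeating this per gene gives an order in which genes switch on along the inferred progression, even though every sample is a single postmortem snapshot.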

02:21:30 So that allows us to now sort of for the first time, look at can we actually do early intervention

02:21:35 for Alzheimer’s where we know that the disease starts manifesting for 10 years before you

02:21:40 actually have your first cognitive loss.

02:21:44 Can we start seeing that path to build new diagnostics, new prognostics, new biomarkers

02:21:50 for this sort of early intervention in Alzheimer’s?

02:21:54 The other aspect that we’re looking at is mosaicism.

02:21:57 We talked about the common variants and the rare variants, but in addition to those rare

02:22:01 variants as your initial cell that forms the zygote divides and divides and divides, with

02:22:08 every cell division there are additional mutations that are happening.

02:22:12 So what you end up with is your brain being a mosaic of multiple different types of genetic

02:22:18 underpinnings.

02:22:19 Some cells contain a mutation that other cells don’t have.

02:22:23 So every human has the common variants that all of us carry to some degree, the rare variants

02:22:31 that your immediate tree of the human species carries, and then there’s the somatic variant,

02:22:37 which is the tree that happened after the zygote that sort of forms your own body.

02:22:44 So these somatic alterations is something that has been previously inaccessible to study

02:22:50 in human postmortem samples.

02:22:53 But right now with the advent of single cell RNA sequencing, in this particular case, we’re

02:22:58 using the well based sequencing, which is much more expensive, but gives you a lot richer

02:23:01 information about each of those transcripts.

02:23:04 So we’re using now that richer information to infer mutations that have happened in each

02:23:10 of the thousands of genes that sort of are active in these cells, and then understand

02:23:16 how the genome relates to the function, this genotype phenotype relationship that we usually

02:23:25 build in GWAS, in genome wide association studies, between genetic variation and disease.

02:23:31 We’re now building that at the cell level, where for every cell, we can relate the unique

02:23:36 specific genome of that cell with the expression patterns of that cell, and the predicted function

02:23:42 using these predictive models that I mentioned before on this regulation for cognition for

02:23:47 pathology in Alzheimer’s at the cell level.

02:23:51 And what we’re finding is that the genes that are altered and the genetic regions that are

02:23:54 altered in common variants versus rare variants versus somatic variants are actually very

02:23:59 different from each other.

02:24:01 The somatic variants are pointing to neuronal energetics and oligodendrocyte functions that

02:24:08 are not visible in the genetic lesions that you find for the common variants, probably

02:24:13 because they have too strong of an effect that evolution is just not tolerating them

02:24:17 on the common side of the allele frequency spectrum.

02:24:20 So the somatic one, that’s the variation that happens after the zygote, after you individual.

02:24:26 I mean, this is a dumb question, but there’s mutation and variation, I guess that happens

02:24:31 there.

02:24:32 And you’re saying that through this, if we focus in on individual cells, we’re

02:24:37 able to detect the story that’s interesting there, and that might be a very unique kind

02:24:42 of important variability that arises for, you said neuronal or something that would

02:24:49 sound…

02:24:50 Energetics.

02:24:51 Energetics, sounds like a cool term.

02:24:52 So, I mean, the metabolism of humans is dramatically altered from that of nearby species.

02:24:59 We talked about that last time that basically we are able to consume meat that is incredibly

02:25:04 energy rich, and that allows us to sort of have functions that are meeting this humongous

02:25:13 brain that we have.

02:25:14 So basically on one hand, every one of our brain cells is much more energy efficient

02:25:18 than our neighbors, than our relatives.

02:25:20 Number two, we have way more of these cells.

02:25:23 And number three, we have this new diet that allows us to now feed all these needs.

02:25:30 That basically creates a massive amount of damage, oxidative damage from this huge super

02:25:36 powered factory of ideas and thoughts that we carry in our skull.

02:25:42 And that factory has energetic needs, and there’s a lot of sort of biological processes

02:25:47 underlying that, that we are finding are altered in the context of Alzheimer’s disease.

02:25:52 That’s fascinating.

02:25:53 So you have to consider all of these systems if you want to understand even something like

02:25:59 diseases that you would maybe traditionally associate with just the particular cells of

02:26:04 the brain.

02:26:07 The immune system, the metabolic system, the metabolic system.

02:26:11 And these are all the things that makes us uniquely human.

02:26:13 So our immune system is dramatically different from that of our neighbors.

02:26:17 Our societies are so much more clustered.

02:26:19 The history of infections that have plagued the human population is dramatically different

02:26:24 from every other species.

02:26:27 The way that our society and our population has sort of exploded has basically put unique

02:26:31 pressures on our immune system.

02:26:33 And our immune system has both coped with that density and also been shaped by, as I

02:26:37 mentioned, the vast amount of death that has happened in the Black Plague and other sort

02:26:42 of selective events in human history, famines, ice ages, and so forth.

02:26:47 So that’s number one on the sort of immune side.

02:26:49 On the metabolic side, again, we are able to sort of run marathons.

02:26:55 I don’t know if you remember the sort of human versus horse experiment where the horse actually

02:26:59 tires out faster than the human and the human actually wins.

02:27:03 So on the metabolic side, we’re dramatically different.

02:27:05 On the immune side, we’re dramatically different.

02:27:07 On the brain side, again, you know, no need to sort of, you know, it’s a no brainer of

02:27:12 how our brain is like just enormously more capable.

02:27:16 And then, you know, in the side of cancer, so basically the cancers that humans are having,

02:27:21 the exposures, the environmental exposures is again, dramatically different.

02:27:25 And the lifespan, the expansion of human lifespan is unseen in any other species in, you know,

02:27:32 recent evolutionary history.

02:27:35 And that now leads to a lot of new disorders that are starting to, you know, manifest late

02:27:42 in life.

02:27:43 So you know, Alzheimer’s is one example where basically, you know, these vast energetic

02:27:48 needs over a lifetime of thinking can basically lead to all of this debris and eventually

02:27:54 saturate the system and lead to, you know, Alzheimer’s in the late life.

02:28:00 But there’s, you know, there’s just such a dramatic set of frontiers when it comes to

02:28:07 aging research that, you know, so what I often like to say is that if you want to engineer

02:28:14 a car to go from 70 miles an hour to 120 miles an hour, that’s fine.

02:28:18 You can basically, you know, fix a few components.

02:28:20 If you wanted to now go at 400 miles an hour, you have to completely redesign the entire

02:28:24 car because the system has just not evolved to go that far.

02:28:31 Basically our human body has only evolved to live to, I don’t know, 120, maybe we can

02:28:36 get to 150 with minor changes.

02:28:39 But if, you know, as we start pushing these frontiers for not just living, but well living,

02:28:45 the eu zein that we talked about last time.

02:28:48 So to basically push eu zein into the 80s and 90s and hundreds and, you know, much further

02:28:53 than that, we will face new challenges that have, you know, never been faced before in

02:29:00 terms of cancer, the number of divisions, in terms of Alzheimer’s and brain related

02:29:04 disorders, in terms of metabolic disorders, in terms of regeneration, there’s just so

02:29:08 many different frontiers ahead of us.

02:29:10 So I am thrilled about where we’re heading.

02:29:14 So basically I see this confluence in my lab and many other labs of AI, of, you know, sort

02:29:20 of, you know, the next frontier of AI for drug design.

02:29:22 So basically these sort of graph neural networks on specific chemical designs that allow you

02:29:30 to create new generations of therapeutics.

02:29:34 These molecular biology tricks for intervening at the system at every level, these personalized

02:29:42 medicine prediction, diagnosis, and prognosis using the electronic health records and using

02:29:49 these polygenic risk scores weighted by the burden, the number of mutations that are accumulating

02:29:56 across common rare and somatic variants, the burden converging across all of these different

02:30:03 molecular pathways, the delivery of specific drugs and specific interventions into specific

02:30:10 cell types.
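
The polygenic risk scores mentioned above are, in their simplest textbook form, a weighted sum: each variant's effect size multiplied by how many copies of the risk allele a person carries. The sketch below shows only that basic form. The variant identifiers and weights are made up, and the burden weighting across common, rare and somatic variants described in the conversation is not modeled.

```python
def polygenic_risk_score(dosages, weights):
    """Sum of effect size * allele count over variants that have a weight.

    dosages: {variant id: copies of the risk allele (0, 1 or 2)}
    weights: {variant id: effect size, e.g. a log odds ratio from a GWAS}
    Variants without a known weight are ignored.
    """
    return sum(weights[v] * count for v, count in dosages.items() if v in weights)


if __name__ == "__main__":
    # Hypothetical variants and effect sizes, for illustration only.
    weights = {"variant_1": 0.2, "variant_2": -0.1, "variant_3": 0.5}
    person = {"variant_1": 2, "variant_2": 1, "variant_3": 0, "variant_9": 2}
    print("score:", polygenic_risk_score(person, weights))
```

A single score like this is only meaningful relative to the distribution of scores in a reference population.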

02:30:11 And again, you’ve talked with Bob Langer about this, there’s, you know, many giants in that

02:30:14 field.

02:30:15 And then the last concept is not intervening at the single gene level.

02:30:20 I want you to sort of conceptualize the concept of an on target side effect.

02:30:27 What is an on target side effect?

02:30:29 An off target side effect is when you design a molecule to target one gene and instead

02:30:33 it targets another gene and you have side effects because of that.

02:30:36 An on target side effect is when your molecule does exactly what you were expecting, but

02:30:41 that gene is pleiotropic.

02:30:43 Pleio means many, tropos means ways, many ways, it acts in many ways.

02:30:48 It’s a multifunctional gene.

02:30:50 So you find that this gene plays a role in this, but as we talked about the wiring of

02:30:55 genes to phenotypes is extremely dense and extremely complex.

02:30:59 So the next stage of intervention will be intervening not at the gene level, but at

02:31:04 the network level.

02:31:06 Intervening at the set of pathways and the set of genes with multi input perturbations

02:31:11 to the system, multi input modulations, pharmaceutical or other interventional, and that basically

02:31:18 allow you to now work at the sort of full level of understanding, not just in your brain,

02:31:24 but across your body, not just in one gene, but across the set of pathways and so on and

02:31:29 so forth for every one of these disorders.

02:31:31 So I think that we’re finally at the level of systems medicine of basically instead of

02:31:37 sort of medicine being at the single gene level, medicine being at the systems level

02:31:42 where it can be personalized based on the specific set of genetic markers and genetic

02:31:46 perturbations that you are either born with or that you have developed during your lifetime.

02:31:53 Your unique set of exposures, your unique set of biomarkers, and your unique set of

02:31:59 current set of conditions through your EHR and other ways.

02:32:06 And the precision component of intervening extremely precisely in the specific pathways

02:32:12 and the specific combinations of genes that should be modulated to sort of bring you from

02:32:16 the disease state to the physiologically normal state or even to physiologically improved

02:32:23 state through this combination of interventions.

02:32:25 So that’s in my view, the field where basically computer science comes together with artificial

02:32:30 intelligence, statistics, all of these other tools, molecular biology technologies and

02:32:34 biotechnology and pharmaceutical technologies that are sort of revolutionary in the way

02:32:37 of intervention.

02:32:38 And of course, this massive amount of molecular biology and data gathering and generation

02:32:43 and perturbation in massively parallel ways.

02:32:46 So there’s no better way.

02:32:47 There’s no better time.

02:32:49 There’s no better place to be sort of looking at this whole confluence of ideas.

02:32:56 And I’m just so thrilled to be a small part of this amazing, enormous ecosystem.

02:33:01 It’s exciting to imagine what humans of 100, 200 years from now, what their life experience

02:33:07 is like, because these ideas seem to have potential to transform the quality of life

02:33:13 that, when they look back at us, they probably wonder how we put up with all the suffering

02:33:22 in the world.

02:33:23 Manolis, it’s a huge honor.

02:33:25 Thank you for spending this early Sunday morning with me.

02:33:29 I deeply appreciate it.

02:33:30 See you next time.

02:33:31 Sounds like a plan.

02:33:32 Thank you, Lex.

02:33:33 Thanks for listening to this conversation with Manolis Kellis.

02:33:36 And thank you to our sponsors, SEMrush, which is an SEO optimization tool.

02:33:43 Pessimist Archive, which is one of my favorite history podcasts.

02:33:47 8Sleep, which is a self cooling mattress with smart sensors and an app.

02:33:52 And finally, BetterHelp, which is an online therapy service.

02:33:57 Please check out these sponsors in the description to get a discount and to support this podcast.

02:34:02 If you enjoy this thing, subscribe on YouTube, review it with 5 Stars on Apple Podcasts,

02:34:08 follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.

02:34:13 And now, let me leave you with some words from Haruki Murakami.

02:34:19 Human beings are ultimately nothing but carriers, passageways for genes.

02:34:24 They ride us into the ground like racehorses from generation to generation.

02:34:30 Genes don’t think about what constitutes good or evil.

02:34:34 They don’t care whether we’re happy or unhappy.

02:34:37 We’re just means to an end for them.

02:34:40 The only thing they think about is what is most efficient for them.

02:34:45 Thank you for listening, and hope to see you next time.