Transcript
00:00:00 The following is a conversation with Dmitry Korkin,
00:00:02 his second time on the podcast.
00:00:04 He’s a professor of bioinformatics
00:00:06 and computational biology at WPI,
00:00:09 where he specializes in bioinformatics of complex disease,
00:00:13 computational genomics, systems biology,
00:00:16 and biomedical data analytics.
00:00:18 He loves biology, he loves computing,
00:00:22 plus he is Russian and recites a poem in Russian
00:00:26 at the end of the podcast.
00:00:27 What else could you possibly ask for in this world?
00:00:31 Quick mention of our sponsors.
00:00:32 Brave Browser, NetSuite Business Management Software,
00:00:37 Magic Spoon Low Carb Cereal,
00:00:40 and 8sleep Self Cooling Mattress.
00:00:42 So the choice is browsing privacy, business success,
00:00:46 healthy diet, or comfortable sleep.
00:00:49 Choose wisely, my friends,
00:00:50 and if you wish, click the sponsor links below
00:00:53 to get a discount and to support this podcast.
00:00:56 As a side note, let me say that to me,
00:00:58 the scientists who did the best apolitical,
00:01:01 impactful, brilliant work of 2020
00:01:04 are the biologists who study viruses without an agenda,
00:01:09 without much sleep, to be honest,
00:01:11 just a pure passion for scientific discovery
00:01:14 and exploration of the mysteries within viruses.
00:01:18 Viruses are both terrifying and beautiful.
00:01:21 Terrifying because they can threaten
00:01:22 the fabric of human civilization,
00:01:25 both biological and psychological.
00:01:27 Beautiful because they give us insights
00:01:30 into the nature of life on Earth
00:01:32 and perhaps even extraterrestrial life
00:01:35 of the not so intelligent variety
00:01:37 that might meet us one day
00:01:39 as we explore the habitable planets
00:01:41 and moons in our universe.
00:01:43 If you enjoy this thing, subscribe on YouTube,
00:01:45 review it on Apple Podcasts, follow on Spotify,
00:01:49 support on Patreon, or connect with me on Twitter
00:01:51 at Lex Fridman.
00:01:53 And now here’s my conversation with Dmitry Korkin.
00:01:57 It’s often said that proteins
00:02:00 and the amino acid residues that make them up
00:02:04 are the building blocks of life.
00:02:06 Do you think of proteins in this way
00:02:08 as the basic building blocks of life?
00:02:11 Yes and no.
00:02:12 So the protein indeed is the basic unit,
00:02:16 the biological unit that carries out
00:02:20 important functions of the cell.
00:02:22 However, through studying the proteins
00:02:25 and comparing the proteins across different species,
00:02:29 across different kingdoms,
00:02:31 you realize that proteins are actually
00:02:34 much more complicated.
00:02:36 So they have so-called modular complexity.
00:02:42 And so what I mean by that is an average protein
00:02:47 consists of several structural units.
00:02:54 So we call them protein domains.
00:02:57 And so you can imagine a protein as a string of beads
00:03:02 where each bead is a protein domain.
00:03:05 And in the past 20 years,
00:03:10 scientists have been studying
00:03:13 the nature of the protein domains
00:03:15 because we realize that it’s the unit.
00:03:19 Because if you look at the functions, right?
00:03:22 So many proteins have more than one function
00:03:25 and those protein functions are often carried out
00:03:29 by those protein domains.
00:03:31 So we also see that in the evolution,
00:03:37 those protein domains get shuffled.
00:03:40 So they actually act as a unit.
00:03:43 Also from the structural perspective, right?
00:03:45 So some people think of a protein
00:03:50 as a sort of a globular molecule,
00:03:55 but as a matter of fact,
00:03:56 the globular part of this protein is a protein domain.
00:04:02 So we often have this, again,
00:04:06 these protein domains
00:04:09 aligned on a string like beads.
00:04:14 And the protein domains are made up of amino acid residues.
00:04:17 So we’re talking about.
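The "string of beads" picture can be sketched as a tiny data model. This is a toy illustration, not a description of any real protein: SH3, SH2, PH, and kinase are real domain-family names, but the two proteins, their residue ranges, and the `shuffle_domains` helper are hypothetical, chosen only to show that recombination moves whole domains rather than cutting through them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    family: str  # domain-family label
    start: int   # first residue (1-based, inclusive)
    end: int     # last residue (inclusive)


def architecture(domains: list[Domain]) -> tuple[str, ...]:
    """Domain families in N- to C-terminal order: the beads on the string."""
    return tuple(d.family for d in sorted(domains, key=lambda d: d.start))


def shuffle_domains(a: list[Domain], b: list[Domain], keep_a: int, from_b: int) -> tuple[str, ...]:
    """Toy domain shuffling (hypothetical helper): the first `keep_a` domains of
    protein a followed by the domains of protein b from index `from_b` on.
    Whole domains move as units; none is cut in two."""
    return architecture(a)[:keep_a] + architecture(b)[from_b:]


# Hypothetical proteins; residue ranges are invented for illustration.
kinase = [Domain("SH3", 10, 70), Domain("SH2", 80, 170), Domain("Kinase", 200, 450)]
adaptor = [Domain("PH", 5, 110), Domain("SH2", 120, 210)]

print(architecture(kinase))                    # ('SH3', 'SH2', 'Kinase')
print(shuffle_domains(adaptor, kinase, 1, 2))  # ('PH', 'Kinase')
```

Representing a protein by its ordered tuple of domain families (its "architecture") is what makes shared fragments across different proteins easy to spot.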
00:04:18 So this is the basic,
00:04:20 so you’re saying the protein domain
00:04:22 is the basic building block of the function
00:04:25 that we think about proteins doing.
00:04:28 So of course you can always talk
00:04:30 about different building blocks.
00:04:31 It’s turtles all the way down.
00:04:32 But there’s a point where there is,
00:04:36 at the point of the hierarchy
00:04:37 where it’s the most, the cleanest element block
00:04:43 based on which you can put them together
00:04:46 in different kinds of ways to form complex function.
00:04:49 And you’re saying protein domains,
00:04:50 why is that not talked about as often in popular culture?
00:04:55 Well, there are several perspectives on this.
00:04:59 And one of course is the historical perspective, right?
00:05:03 So historically scientists have been able
00:05:07 to structurally resolve,
00:05:10 to obtain the 3D coordinates of a protein
00:05:14 for smaller proteins.
00:05:17 And smaller proteins tend to be single-domain proteins.
00:05:21 So we have a protein equal to a protein domain.
00:05:24 And so because of that,
00:05:26 the initial suspicion was that the proteins are,
00:05:29 they have globular shapes
00:05:31 and the more small proteins you obtained structurally,
00:05:36 the more convinced you became that that’s the case.
00:05:41 And only later when we started having
00:05:47 alternative approaches.
00:05:49 So the traditional ones are X-ray crystallography
00:05:55 and NMR spectroscopy.
00:05:57 So these are sort of the two main techniques
00:06:02 that give us the 3D coordinates.
00:06:04 But nowadays there’s been a huge breakthrough
00:06:07 in cryo-electron microscopy.
00:06:10 So the more advanced methods that allow us
00:06:13 to get into the 3D shapes of much larger molecules,
00:06:21 molecular complexes,
00:06:23 just to give you one of the common examples
00:06:28 for this year, right?
00:06:29 So the first experimental structure
00:06:32 of a SARS-CoV-2 protein
00:06:35 was the cryo-EM structure of the S protein.
00:06:40 So the spike protein.
00:06:41 And so it was solved very quickly.
00:06:46 And the reason for that is the advancement
00:06:49 of this technology is pretty spectacular.
00:06:53 How many domains does the, is it more than one domain?
00:06:57 Oh yes.
00:06:58 Oh yes, I mean, so it’s a very complex structure.
00:07:01 And we, you know, on top of the complexity
00:07:06 of a single protein, right?
00:07:08 So this structure is actually a complex, a trimer.
00:07:13 So it needs to form a trimer in order to function properly.
00:07:17 What’s a complex?
00:07:18 So a complex is an agglomeration of multiple proteins.
00:07:22 And so we can have the same protein copied in multiple,
00:07:29 you know, made up in multiple copies
00:07:32 and forming something that we call a homo-oligomer.
00:07:36 Homo means the same, right?
00:07:38 So in this case, so the spike protein is the,
00:07:42 is an example of a homotetramer, homotrimer, sorry.
00:07:46 So you need three copies of it?
00:07:48 Three copies.
00:07:48 In order to.
00:07:50 Exactly.
00:07:50 We have these three chains,
00:07:52 the three molecular chains coupled together
00:07:56 and performing the function.
00:07:58 That’s what, when you look at this protein from the top,
00:08:02 you see a perfect triangle.
00:08:03 Yeah.
00:08:04 So, but other, you know,
00:08:07 so other complexes are made up of, you know,
00:08:10 different proteins.
00:08:12 Some of them are completely different.
00:08:15 Some of them are similar.
00:08:16 The hemoglobin molecule, right?
00:08:18 So it’s actually, it’s a protein complex.
00:08:21 It’s made of four basic subunits.
00:08:25 Two of them are identical to each other.
00:08:29 The other two are identical to each other,
00:08:30 but they are also similar to each other,
00:08:32 which sort of gives us some ideas about the evolution
00:08:36 of this, you know, of this molecule.
00:08:40 And perhaps, so one of the hypotheses is that, you know,
00:08:44 in the past, it was just a homotetramer, right?
00:08:48 So four identical copies,
00:08:50 and then it became, you know, sort of modified,
00:08:55 it became mutated over time
00:08:58 and became more specialized.
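The homo/hetero naming used here follows a simple rule: count the chains, then check whether they are all copies of one protein. A minimal Python sketch of that rule; `classify_complex` is a hypothetical helper, and the chain labels are assumptions ("alpha"/"beta" follow the common α2β2 description of adult human hemoglobin, matching the two pairs of identical subunits mentioned above, and "globin" stands in for the hypothesized ancestral homotetramer).

```python
from __future__ import annotations

# Names for small complexes, keyed by the total number of chains.
N_MER = {2: "dimer", 3: "trimer", 4: "tetramer", 5: "pentamer", 6: "hexamer"}


def classify_complex(chains: list[str]) -> str:
    """Name a complex from its chain labels; identical labels are treated
    as copies of the same protein."""
    n = len(chains)
    if n == 0:
        raise ValueError("a complex needs at least one chain")
    if n == 1:
        return "monomer"
    prefix = "homo" if len(set(chains)) == 1 else "hetero"
    if n in N_MER:
        return prefix + N_MER[n]
    return f"{prefix}-oligomer of {n} chains"


print(classify_complex(["S"] * 3))                           # homotrimer (spike)
print(classify_complex(["alpha", "alpha", "beta", "beta"]))  # heterotetramer (hemoglobin)
print(classify_complex(["globin"] * 4))                      # homotetramer (hypothesized ancestor)
```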
00:09:00 Can we linger on the spike protein for a little bit?
00:09:02 Is there something interesting
00:09:04 or like beautiful you find about it?
00:09:06 I mean, first of all,
00:09:07 it’s an incredibly challenging protein.
00:09:10 And so we, as a part of our sort of research
00:09:16 to understand the structural basis of this virus,
00:09:20 to sort of decode, structurally decode,
00:09:22 every single protein in its proteome,
00:09:27 which, you know, we’ve been working on this spike protein.
00:09:31 And one of the main challenges was that the cryo-EM data
00:09:36 allows us to reconstruct or to obtain the 3D coordinates
00:09:44 of roughly two thirds of the protein.
00:09:48 The remaining one-third of this protein
00:09:51 is a part that is buried in the membrane of the virus,
00:09:58 in the viral envelope.
00:10:01 And it also has a lot of unstable structures around it.
00:10:06 So it’s chemically interacting somehow
00:10:08 with whatever the heck it’s connecting to.
00:10:10 Yeah, so people are still trying to understand.
00:10:12 So the nature of, and the role of this one third,
00:10:18 because the top part, you know, the primary function
00:10:23 is to get attached to the ACE2 receptor, human receptor.
00:10:28 There is also beautiful mechanics
00:10:32 of how this thing happens, right?
00:10:34 So because there are three different copies of these chains,
00:10:39 you know, there are three different domains, right?
00:10:43 So we’re talking about domains.
00:10:44 So these are the receptor-binding domains, RBDs,
00:10:47 that get untangled and get ready to get attached
00:10:53 to the receptor.
00:10:55 And now they are not necessarily going in sync.
00:11:02 As a matter of fact.
00:11:04 It’s asynchronous.
00:11:05 So yes, and this is where another level of complexity
00:11:11 comes into play because right now what we see is,
00:11:16 we typically see just one of the arms going out
00:11:20 and getting ready to be attached to the ACE2 receptors.
00:11:27 However, there was a recent mutation
00:11:30 that people studied in that spike protein.
00:11:35 And very recently, a group from UMass Medical School
00:11:43 that we happen to collaborate with.
00:11:45 So this is the group of Jeremy Luban
00:11:47 and a number of other faculty.
00:11:51 They actually solved the mutated structure of the spike.
00:11:59 And they showed that actually, because of these mutations,
00:12:03 you have more than one arm opening up.
00:12:08 And so now, so the frequency of two arms going up
00:12:13 increased quite drastically.
00:12:17 Interesting.
00:12:18 Does that change the dynamics somehow?
00:12:20 It potentially can change the dynamics
00:12:22 because now you have two possible opportunities
00:12:27 to get attached to the ACE2 receptor.
00:12:30 It’s a very complex molecular process, mechanistic process.
00:12:34 But the first step of this process is the attachment
00:12:38 of this spike protein, of the spike trimer
00:12:42 to the human ACE2 receptor.
00:12:46 So this is a molecule that sits
00:12:48 on the surface of the human cell.
00:12:51 And that’s essentially what initiates,
00:12:54 what triggers the whole process of encapsulation.
00:12:58 If this was dating, this would be the first date.
00:13:01 So this is the…
00:13:03 In a way.
00:13:04 Yes.
00:13:05 So is it possible to have the spike protein
00:13:07 just like floating about on its own?
00:13:10 Or does it need that interactability with the membrane?
00:13:14 Yeah, so it needs to be attached,
00:13:16 at least as far as I know.
00:13:19 But when you get this thing attached on the surface,
00:13:23 there is also a lot of dynamics
00:13:25 on how it sits on the surface.
00:13:28 So for example, there was a recent work in,
00:13:32 again, where people used cryo-electron microscopy
00:13:35 to get the first glimpse of the overall structure.
00:13:38 It’s very low-res, but you still get
00:13:41 some interesting details about the surface,
00:13:45 about what is happening inside,
00:13:47 because we had literally no clue until recent work
00:13:50 about how the capsid is organized.
00:13:54 What’s a capsid?
00:13:55 So a capsid is essentially,
00:13:56 it’s the inner core of the viral particle
00:14:01 where there is the RNA of the virus,
00:14:05 and it’s protected by another protein, N protein,
00:14:10 that essentially acts as a shield.
00:14:13 But now we are learning more and more,
00:14:16 so it’s actually, it’s not just this shield,
00:14:18 it potentially is used for the stability
00:14:21 of the outer shell of the virus.
00:14:25 So it’s pretty complicated.
00:14:27 And I mean, understanding all of this is really useful
00:14:30 for trying to figure out like developing a vaccine
00:14:33 or some kind of drug to attack
00:14:34 any aspect of this, right?
00:14:36 So, I mean, there are many different implications to that.
00:14:39 First of all, it’s important to understand
00:14:43 the virus itself, right?
00:14:44 So in order to understand how it acts,
00:14:51 what is the overall mechanistic process
00:14:55 of this virus replication,
00:14:57 of this virus proliferation in the cell, right?
00:15:00 So that’s one aspect.
00:15:03 The other aspect is designing new treatments.
00:15:06 So one of the possible treatments
00:15:09 is designing nanoparticles.
00:15:12 And so some nanoparticles that will resemble the viral shape
00:15:17 that would have the spike integrated,
00:15:19 and essentially would act as a competitor to the real virus
00:15:23 by blocking the ACE2 receptors,
00:15:26 and thus preventing the real virus from entering the cell.
00:15:30 Now, there are also, you know,
00:15:32 there is a very interesting direction
00:15:35 in looking at the membrane,
00:15:38 at the envelope portion of the protein
00:15:40 and attacking its M protein.
00:15:44 So there are, you know, to give you a, you know,
00:15:48 sort of a brief overview,
00:15:50 there are four structural proteins.
00:15:52 These are the proteins that make up
00:15:54 the structure of the virus.
00:15:58 So spike, the S protein, acts as a trimer,
00:16:02 so it needs three copies.
00:16:06 E, the envelope protein, acts as a pentamer,
00:16:09 so it needs five copies to act properly.
00:16:13 M is a membrane protein, it forms dimers,
00:16:18 and actually it forms a beautiful lattice.
00:16:20 And this is something that we’ve been studying
00:16:22 and we are seeing it in simulations.
00:16:24 It actually forms a very nice grid
00:16:26 or, you know, threads, you know,
00:16:30 of different dimers attached next to each other.
00:16:33 Just a bunch of copies of each other,
00:16:34 and they naturally, when you have a bunch of copies
00:16:36 of each other, they form an interesting lattice.
00:16:38 Exactly.
00:16:39 And, you know, if you think about this, right?
00:16:42 So this complex, you know, the viral shape
00:16:48 needs to be organized somehow, self organized somehow, right?
00:16:52 So it, you know, if it was a completely random process,
00:16:56 you know, you probably wouldn’t have the envelope shell
00:17:02 of the ellipsoid shape, you know,
00:17:03 you would have something, you know,
00:17:05 pretty random, right, shape.
00:17:07 So there is some, you know, regularity
00:17:10 in how this, you know, how these M dimers
00:17:16 get to attach to each other
00:17:18 in a very specific directed way.
00:17:20 Is that understood at all?
00:17:23 It’s not understood.
00:17:24 We are now, we’ve been working in the past six months
00:17:28 since, you know, we met, actually,
00:17:30 this is where we started working on trying to understand
00:17:33 the overall structure of the envelope
00:17:36 and the key components that make up this, you know,
00:17:40 structure.
00:17:41 Wait, does the envelope also have the lattice structure
00:17:43 or no?
00:17:44 So the envelope is essentially the outer shell
00:17:47 of the viral particle.
00:17:48 The N, the nucleocapsid protein,
00:17:51 is something that is inside.
00:17:53 Got it.
00:17:54 But get that, the N is likely to interact with M.
00:17:59 Does it go M and E?
00:18:01 Like, where’s the E and the M?
00:18:02 So E, those different proteins,
00:18:05 they occur in different copy numbers on the viral particle.
00:18:10 So E, this pentamer complex,
00:18:13 we only have two or three, maybe, per particle, okay?
00:18:18 We have a thousand or so M dimers
00:18:24 that essentially make up
00:18:26 the entire, you know, outer shell.
00:18:30 So most of the outer shell is the M.
00:18:33 M dimer.
00:18:34 And the M protein.
00:18:35 When you say particle, that’s the virion,
00:18:38 the virus, the individual virus.
00:18:40 It’s a single, yes.
00:18:40 Single element of the virus, it’s a single virus.
00:18:43 Single virus, right.
00:18:45 And we have about, you know, roughly 50 to 90 spike trimers.
00:18:50 Right?
00:18:51 So when you, you know, when you show a…
00:18:54 Per virus particle.
00:18:55 Per virus particle.
00:18:56 Sorry, what did you say, 50 to 90?
00:18:58 50 to 90, right?
00:19:00 So this is how this thing is organized.
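The rough copy numbers quoted in this exchange (50 to 90 spike trimers, a thousand or so M dimers, two or three E pentamers) can be turned into counts of individual protein chains per virion by multiplying by each complex's number of chains. A back-of-the-envelope Python sketch using only those approximate spoken figures, not measured values; the N protein is left out because no count is given here, and `chains_per_virion` is a hypothetical helper.

```python
from __future__ import annotations

# Rough per-virion figures quoted in the conversation:
# protein -> (complexes per virion: low, high, chains per complex)
STRUCTURAL_PROTEINS = {
    "S (spike trimer)": (50, 90, 3),
    "M (membrane dimer)": (1000, 1000, 2),  # "a thousand or so"
    "E (envelope pentamer)": (2, 3, 5),     # "two or three, maybe"
}


def chains_per_virion(low: int, high: int, chains_per_complex: int) -> tuple[int, int]:
    """Convert a range of complex copy numbers into a range of single chains."""
    return low * chains_per_complex, high * chains_per_complex


for name, (low, high, k) in STRUCTURAL_PROTEINS.items():
    c_low, c_high = chains_per_virion(low, high, k)
    span = str(c_low) if c_low == c_high else f"{c_low}-{c_high}"
    print(f"{name}: roughly {span} protein chains per virion")
```

With these inputs the loop reports about 150-270 spike chains, about 2000 M chains, and 10-15 E chains, which is consistent with the point that most of the outer shell is M.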
00:19:04 And so now, typically, right,
00:19:06 so you see these, the antibodies that target,
00:19:11 you know, spike protein,
00:19:13 certain parts of the spike protein,
00:19:15 but there could be some, also some treatments, right?
00:19:17 So these are, you know, these are small molecules
00:19:22 that bind strategic parts of these proteins,
00:19:27 disrupting its function.
00:19:29 So one of the promising directions,
00:19:34 it’s one of the newest directions,
00:19:35 is actually targeting the M dimer of the protein.
00:19:40 Targeting the proteins that make up this outer shell.
00:19:44 Because if you’re able to destroy the outer shell,
00:19:47 you’re essentially destroying the viral particle itself.
00:19:52 So preventing it from, you know, functioning at all.
00:19:56 So that’s, you think is,
00:19:59 from a sort of cyber security perspective,
00:20:01 virus security perspective,
00:20:02 that’s the best attack vector?
00:20:05 Is, or like, that’s a promising attack vector?
00:20:08 I would say, yeah.
00:20:09 So, I mean, there’s still tons of research that needs to be,
00:20:12 you know, done.
00:20:14 But yes, I think, you know, so.
00:20:16 There’s more attack surface, I guess.
00:20:18 More attack surface.
00:20:19 But, you know, from our analysis,
00:20:22 from other evolutionary analysis,
00:20:24 this protein is evolutionarily more stable
00:20:28 compared to the, say, to the spike protein.
00:20:31 Oh, and stable means a more static target?
00:20:35 Well, yeah, so it doesn’t change.
00:20:38 It doesn’t evolve from the evolutionary perspective
00:20:42 so drastically as, for example, the spike protein.
00:20:46 There’s a bunch of stuff in the news
00:20:47 about mutations of the virus in the United Kingdom.
00:20:51 I also saw in South Africa something.
00:20:54 Maybe that was yesterday.
00:20:56 You just kind of mentioned about stability and so on.
00:21:00 Which aspects of this are mutatable
00:21:02 and which aspects, if mutated, become more dangerous?
00:21:07 And maybe even zooming out,
00:21:09 what are your thoughts and knowledge and ideas
00:21:12 about the way it’s mutated,
00:21:13 all the news that we’ve been hearing?
00:21:15 Are you worried about it from a biological perspective?
00:21:18 Are you worried about it from a human perspective?
00:21:21 So, I mean, you know, mutations are sort of a general way
00:21:26 for these viruses to evolve, right?
00:21:28 So, it’s, you know, it’s essentially,
00:21:32 this is the way they evolve.
00:21:34 This is the way they were able to jump
00:21:38 from one species to another.
00:21:42 We also see some recent jumps.
00:21:46 There were some incidents of this virus jumping
00:21:50 from humans to dogs.
00:21:51 So, you know, there is some danger in those jumps
00:21:55 because every time it jumps, it also mutates, right?
00:21:59 So, when it jumps to another species
00:22:04 and jumps back, right?
00:22:06 So, it acquires some mutations
00:22:08 that are sort of driven by the environment
00:22:14 of a new host, right?
00:22:16 And it’s different from the human environment.
00:22:19 And so, we don’t know whether the mutations
00:22:21 that are acquired in the new species
00:22:24 are neutral with respect to the human host
00:22:28 or maybe, you know, maybe damaging.
00:22:32 Yeah, change is always scary, but so are you worried about,
00:22:36 I mean, it seems like because the spread is,
00:22:38 during winter now, seems to be exceptionally high
00:22:43 and especially with a vaccine just around the corner
00:22:46 already being actually deployed,
00:22:49 is there some worry that this puts evolutionary pressure,
00:22:53 selective pressure on the virus for it to mutate?
00:22:59 Is that a source of worry?
00:23:00 Well, I mean, there is always this thought
00:23:03 in the scientist’s mind, you know, what will happen, right?
00:23:08 So, I know there’ve been discussions
00:23:12 about sort of the arms race between the ability
00:23:17 of humanity to get vaccinated faster
00:23:22 than the virus, you know, essentially, you know,
00:23:27 it becomes, you know, resistant to the vaccine.
00:23:34 I mean, I don’t worry that much simply because,
00:23:40 you know, there is not that much evidence to that.
00:23:44 To aggressive mutation around the vaccine.
00:23:47 Exactly, you know, obviously there are mutations
00:23:49 around the vaccine, that’s the reason we get vaccinated
00:23:56 every year against the seasonal mutations, right?
00:24:01 But, you know, I think it’s important to study it.
00:24:06 No doubt, right?
00:24:07 So, I think one of the, you know, to me,
00:24:10 and again, I might be biased because, you know,
00:24:14 we’ve been trying to do that as well,
00:24:17 so, but one of the critical directions
00:24:20 in understanding the virus is to understand its evolution
00:24:23 in order to sort of understand the mechanisms,
00:24:27 the key mechanisms that lead the virus to jump,
00:24:30 you know, zoonotic viruses to jump from species,
00:24:34 from one species to another, the mechanisms
00:24:37 that lead the virus to become resistant to vaccines,
00:24:42 also to treatments, right?
00:24:44 And hopefully that knowledge will enable us
00:24:48 to sort of forecast the evolutionary traces,
00:24:52 the future evolutionary traces of this virus.
00:24:55 I mean, what, from a biological perspective,
00:24:58 this might be a dumb question,
00:24:59 but are there parts of the virus that if souped up,
00:25:05 like through mutation, could make it more effective
00:25:09 at doing its job?
00:25:09 We’re talking about this specific coronavirus
00:25:12 because we were talking about the different, like,
00:25:14 the membrane, the M protein, the E protein,
00:25:18 the N and the S, the spike, is there some?
00:25:24 And there are 20 or so more in addition to that.
00:25:27 But is that a dumb way to look at it?
00:25:29 Like, which of these, if mutated,
00:25:34 could have the greatest impact, potentially damaging impact,
00:25:39 on the effectiveness of the virus?
00:25:41 So it’s actually, it’s a very good question
00:25:44 because, and the short answer is, we don’t know yet.
00:25:48 But of course there is capacity of this virus
00:25:51 to become more efficient.
00:25:53 The reason for that is, you know,
00:25:56 so if you look at the virus, I mean, it’s a machine, right?
00:25:59 So it’s a machine that does a lot of different functions,
00:26:03 and many of these functions are sort of nearly perfect,
00:26:06 but they’re not perfect.
00:26:07 And those mutations can have the greatest impact
00:26:11 and make those functions more perfect.
00:26:14 For example, the attachment to ACE2 receptor, right,
00:26:18 of the spike, right?
00:26:19 So, you know, has this virus reached the efficiency
00:26:28 in which the attachment is carried out?
00:26:31 Or are there some mutations still to be discovered,
00:26:36 right, that will make this attachment sort of stronger,
00:26:41 or, you know, something more, in a way more efficient
00:26:48 from the point of view of this virus functioning.
00:26:51 That’s sort of the obvious example.
00:26:54 But if you look at each of these proteins,
00:26:57 I mean, it’s there for a reason,
00:26:58 it performs certain function.
00:27:00 And it could be that certain mutations will, you know,
00:27:07 enhance this function.
00:27:08 It could be that some mutations will make this function
00:27:11 much less efficient, right?
00:27:13 So that’s also the case.
00:27:16 Let’s, since we’re talking about the evolutionary history
00:27:18 of a virus, let’s zoom back out
00:27:22 and look at the evolution of proteins.
00:27:25 I glanced at this 2010 Nature paper
00:27:29 on the, quote, ongoing expansion of the protein universe.
00:27:34 And then, you know, it kind of implies and talks about
00:27:39 that proteins started with a common ancestor,
00:27:42 which is, you know, kind of interesting.
00:27:44 It’s interesting to think about like,
00:27:45 even just like the first organic thing
00:27:49 that started life on Earth.
00:27:51 And from that, there’s now, you know, what is it?
00:27:56 3.5 billion years later, there are now millions of proteins.
00:27:59 And they’re still evolving.
00:28:01 And that’s, you know, in part,
00:28:02 one of the things that you’re researching.
00:28:05 Is there something interesting to you about the evolution
00:28:09 of proteins from this initial ancestor to today?
00:28:14 Is there something beautiful and insightful
00:28:16 about this long story?
00:28:18 So I think, you know, if I were to pick a single keyword
00:28:24 about protein evolution, I would pick modularity,
00:28:29 something that we talked about in the beginning.
00:28:32 And that’s the fact that the proteins are no longer
00:28:36 considered as, you know, as a sequence of letters.
00:28:41 There are hierarchical complexities
00:28:45 in the way these proteins are organized.
00:28:48 And these complexities are actually going
00:28:51 beyond the protein sequence.
00:28:53 It’s actually going all the way back to the gene,
00:28:57 to the nucleotide sequence.
00:29:00 And so, you know, again, these protein domains,
00:29:04 they are not only functional building blocks,
00:29:07 they are also evolutionary building blocks.
00:29:09 And so what we see in the sort of,
00:29:12 in the later stages of evolution,
00:29:15 I mean, once these structurally
00:29:18 and functionally stable building blocks were discovered,
00:29:22 they essentially, they stay, those domains stay as such.
00:29:28 So that’s why if you start comparing different proteins,
00:29:31 you will see that many of them will have similar fragments.
00:29:37 And those fragments will correspond to something
00:29:39 that we call protein domain families.
00:29:42 And so they are still different
00:29:44 because you still have mutations and, you know,
00:29:48 the, you know, different mutations contribute to,
00:29:53 to, you know, diversification of the function
00:29:56 of these, you know, protein domains.
00:29:58 However, you don’t, you very rarely see, you know,
00:30:03 the evolutionary events that would split
00:30:07 this domain into fragments because,
00:30:10 and it’s, you know, once you have the domain split,
00:30:17 you actually, you, you know,
00:30:20 you can completely cancel out its function
00:30:24 or at the very least you can reduce it.
00:30:26 And that’s not, you know, efficient from the point of view
00:30:29 of the, you know, of the cell functioning.
00:30:32 So the protein domain level
00:30:37 is a very important one.
00:30:39 Now, on top of that, right?
00:30:42 So if you look at the proteins, right,
00:30:44 so you have these structural units
00:30:46 and they carry out the function,
00:30:48 but then much less is known about things
00:30:51 that connect these protein domains,
00:30:54 something that we call linkers.
00:30:56 And those linkers are completely flexible, you know,
00:31:00 parts of the protein that nevertheless
00:31:03 carry out a lot of function.
00:31:06 So it’s like little tails, little heads.
00:31:08 So we do have tails.
00:31:09 They’re called termini, C and N termini.
00:31:12 So these are things right at one
00:31:17 and the other end of the protein sequence.
00:31:20 So they are also very important.
00:31:22 So they contribute to very specific interactions
00:31:26 between the proteins.
00:31:27 So.
00:31:28 But you’re referring to the links between domains.
00:31:30 That connect the domains.
00:31:32 And, you know, apart from the, just the,
00:31:36 the simple perspective, if you have, you know,
00:31:39 a very short domain, you have, sorry, a very short linker,
00:31:43 you have two domains next to each other.
00:31:45 They are forced to be next to each other.
00:31:47 If you have a very long one,
00:31:49 you have the domains that are extremely flexible
00:31:52 and they carry out a lot of sort of
00:31:54 spatial reorganization, right?
00:31:56 That’s awesome.
00:31:58 But on top of that, right, just this linker itself,
00:32:01 because it’s so flexible, it actually can adapt
00:32:05 to a lot of different shapes.
00:32:07 And therefore it’s a, it’s a very good interactor
00:32:11 when it comes to interaction between this protein
00:32:14 and other proteins, right?
00:32:15 So these things also evolve, you know,
00:32:18 and they in a way have different, sort of,
00:32:25 driving laws that underlie their evolution
00:32:30 because they no longer need to,
00:32:33 to preserve certain structure, right?
00:32:37 Unlike protein domains.
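To make the short-versus-long linker point concrete, here is a crude geometric upper bound, not a model of how real linkers behave. Consecutive C-alpha atoms in a polypeptide are about 3.8 angstroms apart, so a linker of n residues can hold the last C-alpha of one domain and the first C-alpha of the next at most roughly 3.8 × (n + 1) angstroms apart. Real linkers are rarely fully extended, so actual distances are usually much smaller; `max_domain_gap` is a hypothetical helper name.

```python
CA_CA_DISTANCE_A = 3.8  # approximate distance between consecutive C-alpha atoms, in angstroms


def max_domain_gap(linker_residues: int) -> float:
    """Upper bound (fully stretched chain) on the distance, in angstroms, between
    the last C-alpha of one domain and the first C-alpha of the next when they
    are joined by `linker_residues` residues."""
    if linker_residues < 0:
        raise ValueError("linker length cannot be negative")
    bonds = linker_residues + 1  # C-alpha to C-alpha steps spanned by the linker
    return CA_CA_DISTANCE_A * bonds


for n in (0, 2, 10, 30):
    print(f"{n:>2} linker residues -> at most {max_domain_gap(n):.1f} angstroms apart")
```

A zero-length linker pins the two domains a single residue step apart, while a 30-residue linker allows a reach of over 100 angstroms, which is the room for spatial reorganization described above.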
00:32:38 And so on top of that,
00:32:41 you have something that is even less studied.
00:32:45 And this is something that relates
00:32:49 to the concept of alternative splicing.
00:32:53 So alternative splicing.
00:32:54 So it’s a, it’s a very cool concept.
00:32:56 It’s something that we’ve been fascinated by for,
00:33:00 you know, over a decade in my lab
00:33:03 and trying to do research with that.
00:33:05 But so, you know, so typically, you know,
00:33:08 a simplistic perspective is that one gene
00:33:12 is equal to one protein product, right?
00:33:16 So you have a gene, you know,
00:33:18 you transcribe it and translate it
00:33:21 and it becomes a protein.
00:33:24 In reality, when we talk about eukaryotes,
00:33:28 especially sort of more recent eukaryotes
00:33:32 that are very complex,
00:33:33 the gene is no longer equal to one protein.
00:33:40 It actually can produce multiple functionally,
00:33:47 you know, active protein products.
00:33:50 And each of them is, you know,
00:33:52 is called an alternatively spliced product.
00:33:57 The reason it happens is that if you look at the gene,
00:34:00 it actually also has blocks.
00:34:05 And the blocks, some of which,
00:34:08 and it’s essentially, it goes like this.
00:34:10 So we have a block that will later be translated.
00:34:13 We call it exon.
00:34:15 Then we’ll have a block that is not translated, cut out.
00:34:19 We call it intron.
00:34:20 So we have exon, intron, exon, intron,
00:34:22 et cetera, et cetera, et cetera, right?
00:34:24 So sometimes you can have, you know,
00:34:26 dozens of these exons and introns.
00:34:29 So what happens is during the process
00:34:32 when the gene is converted to RNA,
00:34:37 we have things that are cut out,
00:34:41 the introns that are cut out,
00:34:43 and exons that now get assembled together.
00:34:47 And sometimes we will throw out some of the exons
00:34:52 and the remaining protein product will become
00:34:54 still be the same.
00:34:55 Different.
00:34:56 Oh, different.
00:34:57 So now you have fragments of the protein
00:34:59 that are no longer there.
00:35:01 They were cut out with the introns.
00:35:03 Sometimes you will essentially take one exon
00:35:07 and replace it with another one, right?
00:35:09 So there’s some flexibility in this process.
00:35:12 So that creates a whole new level of complexity.
00:35:17 Because now.
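To make the exon/intron picture concrete, here is a minimal Python sketch (not from the conversation) that models a gene as a list of hypothetical exon and intron segments, splices out the introns, and enumerates exon-skipping isoforms. It assumes any internal exon can be skipped independently; in a real cell, splice-site signals and splicing factors decide which combinations actually occur, and exon swapping (mutually exclusive exons) is not modeled here.

```python
from itertools import combinations

# A toy gene: alternating exon/intron segments (hypothetical RNA sequences).
GENE = [
    ("exon", "AUG"),
    ("intron", "GUAAG"),
    ("exon", "CCC"),
    ("intron", "GUUAG"),
    ("exon", "GGG"),
    ("intron", "GUCAG"),
    ("exon", "UAA"),
]


def exons_of(gene):
    """Return the exon sequences in order; introns are always removed."""
    return [seq for kind, seq in gene if kind == "exon"]


def splice(gene, skipped=frozenset()):
    """Join the exons, leaving out any exon whose index is in `skipped`."""
    return "".join(seq for i, seq in enumerate(exons_of(gene)) if i not in skipped)


def exon_skipping_isoforms(gene):
    """Enumerate transcripts that keep the first and last exon
    and skip every possible subset of the internal exons."""
    n = len(exons_of(gene))
    internal = list(range(1, n - 1))
    isoforms = []
    for k in range(len(internal) + 1):
        for skipped in combinations(internal, k):
            isoforms.append(splice(gene, frozenset(skipped)))
    return isoforms


for isoform in exon_skipping_isoforms(GENE):
    print(isoform)
```

With four exons, this one toy gene yields four candidate transcripts (full length, two single-exon skips, and a double skip), which is the "one gene, several protein products" point in miniature.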
00:35:18 Is this random though?
00:35:18 Is it random?
00:35:19 It’s not random.
00:35:20 We, and this is where I think now the appearance
00:35:24 of this modern single cell
00:35:27 and before that tissue level sequencing,
00:35:31 next generation sequencing techniques such as RNA-seq
00:35:34 allows us to see that these are the events
00:35:38 that often happen in response.
00:35:41 It’s a dynamic event that happens in response
00:35:44 to disease or in response
00:35:48 to certain developmental stage of a cell.
00:35:51 And this is an incredibly complex layer
00:35:56 that also undergoes, I mean,
00:35:59 because it’s at the gene level, right?
00:36:01 So it undergoes certain evolution, right?
00:36:05 And now we have this interplay
00:36:08 between what is happening in the protein world
00:36:12 and what is happening in the gene and RNA world.
00:36:17 And for example, it’s often that we see
00:36:22 that the boundaries of these exons coincide
00:36:28 with the boundaries of the protein domains, right?
00:36:32 So there is this close interplay to that.
00:36:36 It’s not always, I mean, otherwise it would be too simple,
00:36:39 right?
00:36:40 But we do see the connection
00:36:41 between those sort of machineries.
00:36:45 And obviously the evolution will pick up this complexity
00:36:49 and, you know.
00:36:51 Select for whatever is successful,
00:36:53 whatever has an interesting function.
00:36:55 We see that complexity in play
00:36:57 and makes this question more complex, but more exciting.
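The observation that exon boundaries often coincide with protein domain boundaries can be made measurable with a simple count. This is a hypothetical illustration: it assumes both sets of boundaries are already mapped to residue positions in the same protein, and the 5-residue tolerance is an arbitrary choice.

```python
def matching_boundaries(exon_boundaries, domain_boundaries, tolerance=5):
    """Return the exon boundaries that lie within `tolerance` residues
    of at least one domain boundary."""
    return [
        e for e in exon_boundaries
        if any(abs(e - d) <= tolerance for d in domain_boundaries)
    ]


def fraction_matching(exon_boundaries, domain_boundaries, tolerance=5):
    """Fraction of exon boundaries that coincide with some domain boundary."""
    if not exon_boundaries:
        return 0.0
    hits = matching_boundaries(exon_boundaries, domain_boundaries, tolerance)
    return len(hits) / len(exon_boundaries)


# Hypothetical residue positions for one protein.
exon_ends = [50, 120, 200]
domain_ends = [48, 180]
print(matching_boundaries(exon_ends, domain_ends))  # [50]
print(fraction_matching(exon_ends, domain_ends))
```

A real analysis would compare this fraction against randomly placed boundaries to judge whether the overlap is more than chance, which matches the "it's not always" caveat in the conversation.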
00:37:02 Small detour, I don’t know if you think about this
00:37:05 into the world of computer science.
00:37:07 There’s a Douglas Hofstadter, I think,
00:37:11 came up with the name quine,
00:37:14 which are, I don’t know if you’re familiar
00:37:16 with these things, but it’s computer programs
00:37:18 that have, I guess, exon and intron,
00:37:22 and they copy, the whole purpose of the program
00:37:24 is to copy itself.
00:37:26 So it prints copies of itself,
00:37:28 but can also carry information inside of it.
00:37:30 So it’s a very kind of crude, fun exercise of,
00:37:36 can we sort of replicate these ideas from cells?
00:37:40 Can we have a computer program that when you run it,
00:37:42 just prints itself, the entirety of itself,
00:37:47 and does it in different programming languages and so on.
00:37:50 I’ve been playing around and writing them.
00:37:51 It’s a kind of fun little exercise.
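For readers who have not seen one, here is a classic two-line Python quine, wrapped in a small checker. The string `s` is a template of the whole program: `%r` reinserts the string into itself with its quotes and escapes intact, and `%%` becomes a literal `%`. The `is_quine` helper is our own illustration, not something from the conversation; it runs a source string with `exec` and compares what it prints to the source.

```python
import contextlib
import io

# The quine, stored as text so it can be run and checked.
# As a standalone file it is exactly these two lines:
#   s = 's = %r\nprint(s %% s)'
#   print(s % s)
QUINE_SOURCE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def run_and_capture(source):
    """Execute `source` in a fresh namespace and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


def is_quine(source):
    """A program is a quine if its output is exactly its own source text."""
    return run_and_capture(source) == source


print(is_quine(QUINE_SOURCE))        # True
print(is_quine("print('hello')\n"))  # False
```

The "shortest program" game mentioned next is then a matter of shrinking `QUINE_SOURCE` while keeping `is_quine` true.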
00:37:53 You know, when I was a kid, so you know,
00:37:55 it was essentially one of the sort of main stages
00:38:02 in informatics Olympiads that you have to reach
00:38:08 in order to be any good,
00:38:10 is you should be able to write a program
00:38:14 that replicates itself.
00:38:16 And so the task then becomes even sort of more complicated.
00:38:20 So what is the shortest program?
00:38:24 And of course, it’s a function of a programming language,
00:38:27 but yeah, I remember a long, long, long time ago
00:38:30 when we tried to make it shorter and shorter
00:38:34 and find the shortcut.
00:38:36 There’s actually, on Stack Exchange, an entire site
00:38:41 called Code Golf, I think,
00:38:44 where the entirety is just the competition.
00:38:46 People just come up with whatever task, I don’t know,
00:38:50 like write code that reports the weather today.
00:38:54 And the competition is about whatever programming language,
00:38:58 what is the shortest program?
00:39:00 And it makes you actually, people should check it out
00:39:02 because it makes you realize
00:39:03 there’s some weird programming languages out there.
00:39:07 But just to dig on that a little deeper,
00:39:12 do you think, in computer science,
00:39:16 we don’t often think about programs,
00:39:19 just like the machine learning world now,
00:39:22 that’s still kind of basic programs.
00:39:26 And then there’s humans that replicate themselves, right?
00:39:29 And there’s these mutations and so on.
00:39:31 Do you think we’ll ever have a world
00:39:34 where there’s programs that kind of
00:39:37 have an evolutionary process?
00:39:40 So I’m not talking about evolutionary algorithms,
00:39:42 but I’m talking about programs that kind of
00:39:44 mate with each other and evolve
00:39:46 and like on their own replicate themselves.
00:39:49 So this is kind of the idea here is,
00:39:54 that’s how you can have a runaway thing.
00:39:57 So we think about machine learning as a system
00:39:59 that gets smarter and smarter and smarter and smarter.
00:40:01 At least the machine learning systems of today are like,
00:40:05 it’s a program that you can like turn off,
00:40:09 as opposed to throwing a bunch of little programs out there
00:40:12 and letting them like multiply and mate
00:40:15 and evolve and replicate.
00:40:17 Do you ever think about that kind of world,
00:40:20 when we jump from the biological systems
00:40:23 that you’re looking at to artificial ones?
00:40:27 I mean, it’s almost like you take the sort of the area
00:40:32 of intelligent agents, right?
00:40:34 Which are essentially the independent sort of codes
00:40:38 that run and interact and exchange the information, right?
00:40:42 So I don’t see why not.
00:40:45 I mean, it could be sort of a natural evolution
00:40:48 in this area of computer science.
00:40:52 I think it’s kind of an interesting possibility.
00:40:54 It’s terrifying too,
00:40:55 but I think it’s a really powerful tool.
00:40:58 Like to have like agents that, you know,
00:41:00 we have social networks with millions of people
00:41:02 and they interact.
00:41:03 I think it’s interesting to inject into that,
00:41:05 and bots have already been injected into that, right?
00:41:08 But those bots are pretty dumb.
00:41:11 You know, they’re probably pretty dumb algorithms.
00:41:15 You know, it’s interesting to think
00:41:17 that there might be bots that evolve together with humans.
00:41:20 And there’s the sea of humans and robots
00:41:23 that are operating first in the digital space.
00:41:26 And then you can also think, I love the idea.
00:41:29 Some people worked, I think at Harvard, at Penn,
00:41:32 there’s robotics labs that, you know,
00:41:37 take as a fundamental task to build a robot
00:41:40 that given extra resources can build another copy of itself,
00:41:44 like in the physical space,
00:41:46 which is super difficult to do, but super interesting.
00:41:50 I remember there’s like research on robots
00:41:54 that can build a bridge.
00:41:55 So they make a copy of themselves
00:41:56 and they connect themselves
00:41:57 and the sort of like self building bridge
00:42:00 based on building blocks.
00:42:02 You can imagine like a building that self assembles.
00:42:05 So it’s basically self assembling structures
00:42:07 from robotic parts.
00:42:10 But it’s interesting to, within that robot,
00:42:13 add the ability to mutate
00:42:15 and do all the interesting like little things
00:42:21 that you’re referring to in evolution
00:42:23 to go from a single origin protein building block
00:42:26 to like this weird complex.
00:42:28 And if you think about this, I mean, you know,
00:42:30 the bits and pieces are there, you know.
00:42:34 So you mentioned evolutionary algorithms, right?
00:42:37 You know, so this is sort of,
00:42:38 and maybe sort of the goal is in a way different, right?
00:42:43 So the goal is to, you know, to essentially,
00:42:46 to optimize your search, right?
00:42:50 So, but sort of the ideas are there.
00:42:53 So people recognize that, you know,
00:42:55 that the recombination events lead to global changes
00:43:01 in the search trajectories, while mutation events
00:43:04 are more refined, you know, steps in the search.
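The distinction between recombination (large jumps between regions of the search space) and mutation (small local steps) shows up clearly in a minimal genetic algorithm. This sketch uses the standard OneMax toy problem (maximize the number of 1 bits) as an assumed objective; the population size, mutation rate, and truncation selection with elitism are illustrative choices, not anything discussed in the conversation.

```python
import random


def one_max(bits):
    """Toy fitness: the number of 1s in the bit string."""
    return sum(bits)


def crossover(a, b, rng):
    """One-point recombination: join a prefix of `a` to a suffix of `b`.
    A child can land far from both parents in a single step."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bits, rate, rng):
    """Bit-flip mutation: each bit flips with a small probability,
    a fine-grained local step."""
    return [1 - x if rng.random() < rate else x for x in bits]


def evolve(n_bits=32, pop_size=30, generations=60, seed=0):
    """Run a simple GA and return the best fitness of each generation."""
    rng = random.Random(seed)
    rate = 1.0 / n_bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(one_max, pop))]
    for _ in range(generations):
        pop.sort(key=one_max, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection
        children = [pop[0]]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        pop = children
        history.append(max(map(one_max, pop)))
    return history


best = evolve()
print(best[0], "->", best[-1])
```

Because the best individual is always carried over, the best fitness can never decrease; crossover supplies the big moves and mutation the fine tuning.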
00:43:09 Then you have, you know, other sort of
00:43:14 nature inspired algorithms, right?
00:43:16 So one of the reasons that, you know,
00:43:19 I think it’s one of the most fun ones,
00:43:21 is the slime mold algorithm, right?
00:43:24 So it’s, I think the first was introduced
00:43:28 by the Japanese group,
00:43:30 where it was able to solve some pretty complex problems.
00:43:35 So that’s, and then I think there are still a lot of things
00:43:43 we’ve yet to, you know, borrow from the nature, right?
00:43:48 So there are a lot of sort of ideas
00:43:52 that nature, you know, gets to offer us that, you know,
00:43:56 it’s up to us to grab it and to, you know,
00:44:01 get the best use of it.
00:44:02 Including neural networks, you know, we have a very crude
00:44:06 inspiration from nature on neural networks.
00:44:08 Maybe there’s other inspirations to be discovered
00:44:10 in the brain or other aspects of the various systems,
00:44:16 even like the immune system, the way it interplays.
00:44:20 I recently started to understand that the,
00:44:22 like the immune system has something to do
00:44:24 with the way the brain operates.
00:44:26 Like there are multiple things going on in there,
00:44:28 none of which are modeled
00:44:30 in artificial neural networks.
00:44:32 And maybe if you throw a little bit of that biological spice
00:44:35 in there, you’ll come up with something, something cool.
00:44:39 I’m not sure if you’re familiar with the Drake equation
00:44:43 that estimate, I just did a video on it yesterday
00:44:46 because I wanted to give my own estimate of it.
00:44:49 It’s an equation that combines a bunch of factors
00:44:52 to estimate how many alien civilizations are in the galaxy.
00:44:56 I’ve heard about it, yes.
00:44:58 So one of the interesting parameters, you know,
00:45:01 it’s like how many stars are born every year,
00:45:05 how many planets are on average per star for this,
00:45:11 how many habitable planets are there.
00:45:14 And then the one that starts being really interesting
00:45:18 is the probability that life emerges on a habitable planet.
00:45:24 So like, I don’t know if you think about,
00:45:27 you certainly think a lot about evolution,
00:45:29 but do you think about the thing
00:45:31 which evolution doesn’t describe,
00:45:32 which is like the beginning of evolution, the origin of life.
00:45:36 I think I put the probability of life developing
00:45:39 in a habitable planet at 1%.
00:45:41 This is very scientifically rigorous.
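The Drake equation is simply a product of factors, N = R* · f_p · n_e · f_l · f_i · f_c · L. The sketch below encodes it directly. Lex's 1% guess corresponds to f_l = 0.01; every other value in the example call is a placeholder chosen for illustration, not an estimate from this conversation.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime_years):
    """Expected number of detectable civilizations in the galaxy:
    N = R* * f_p * n_e * f_l * f_i * f_c * L.

    r_star: new stars formed per year in the galaxy
    f_p: fraction of stars with planets
    n_e: habitable planets per star that has planets
    f_l: fraction of habitable planets where life emerges
    f_i: fraction of those where intelligence evolves
    f_c: fraction of those that become detectable
    lifetime_years: how long such a civilization stays detectable
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime_years


# Placeholder inputs; only f_l = 0.01 comes from the conversation.
n = drake(r_star=1.5, f_p=1.0, n_e=0.2, f_l=0.01, f_i=0.1, f_c=0.1,
          lifetime_years=10_000)
print(f"{n:.2f}")
```

Because N is a plain product, changing any single factor (for example, lowering f_l as Dr. Korkin suggests below) scales the answer by the same ratio.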
00:45:44 Okay, well, first at a high level for the Drake equation,
00:45:48 what would you put that percent at on Earth?
00:45:51 And in general, do you have something,
00:45:55 do you have thoughts about how life might’ve started,
00:45:58 you know, like the proteins being the first kind of,
00:46:01 one of the early jumping points?
00:46:02 Yeah, so I think back in 2018,
00:46:07 there was a very exciting paper published in Nature
00:46:10 where they found one of the simplest amino acids,
00:46:18 glycine, in comet dust.
00:46:23 So this is, and I apologize if I don’t pronounce,
00:46:29 it’s a Russian named comet,
00:46:31 it’s, I think, Churyumov-Gerasimenko.
00:46:34 This is the comet where, and there was this mission
00:46:40 to get close to this comet and get the stardust
00:46:46 from its tail.
00:46:48 And when scientists analyzed it,
00:46:50 they actually found traces of, you know, of glycine,
00:46:56 which, you know, makes up, you know,
00:46:59 it’s one of the basic, one of the 20 basic amino acids
00:47:04 that makes up proteins, right?
00:47:06 So that was kind of very exciting, right?
00:47:10 But, you know, the question is very interesting, right?
00:47:14 So what, you know, if there is some alien life,
00:47:18 is it gonna be made of proteins, right?
00:47:22 Or maybe RNAs, right?
00:47:24 So we see that, you know, the RNA viruses are certainly,
00:47:29 you know, very well established sort of, you know,
00:47:35 group of molecular machines, right?
00:47:37 So, yeah, it’s a very interesting question.
00:47:42 What probability would you put?
00:47:43 Like, how hard is this job?
00:47:45 Like, how unlikely just on Earth do you think
00:47:48 this whole thing is that we got going?
00:47:51 Like, are we really lucky or is it inevitable?
00:47:54 Like, what’s your sense when you sit back
00:47:56 and think about life on Earth?
00:47:58 Is it higher or lower than 1%?
00:48:00 Well, because 1% is pretty low, but it still is like,
00:48:03 damn, that’s a pretty good chance.
00:48:05 Yes, it’s a pretty good chance.
00:48:06 I mean, I would, personally, but again, you know,
00:48:10 I’m, you know, probably not the best person
00:48:14 to do such estimations, but I would, you know,
00:48:19 intuitively, I would probably put it lower.
00:48:23 But still, I mean, you know, given.
00:48:24 So we’re really lucky here on Earth.
00:48:27 I mean.
00:48:28 Or the conditions are really good.
00:48:30 It’s, you know, I think that there was,
00:48:32 everything was right in a way, right?
00:48:35 So we still, it’s not, the conditions were not like ideal
00:48:39 if you try to look at, you know, what was, you know,
00:48:44 several billion years ago when life emerged.
00:48:48 So there is something called the Rare Earth Hypothesis
00:48:52 that, you know, counter to the Drake Equation, says
00:48:55 that the, you know, the conditions of Earth,
00:49:00 if you actually were to describe Earth,
00:49:03 it’s quite a special place.
00:49:05 So special it might be unique in our galaxy
00:49:09 and potentially, you know, close to unique
00:49:11 in the entire universe.
00:49:12 Like it’s very difficult to reconstruct
00:49:14 those same conditions.
00:49:16 And what the Rare Earth Hypothesis argues
00:49:19 is all those different conditions are essential for life.
00:49:23 And so that’s sort of the counter, you know,
00:49:26 to all of us, you know,
00:49:29 thinking that Earth is pretty average.
00:49:31 I mean, I can’t really, I’m trying to remember
00:49:34 to go through all of them, but just the fact
00:49:36 that it is shielded from a lot of asteroids,
00:49:41 the, obviously the distance to the sun,
00:49:43 but also the fact that it’s like a perfect balance
00:49:48 between the amount of water and land
00:49:52 and all those kinds of things.
00:49:53 I don’t know, there’s a bunch of different factors
00:49:55 that I don’t remember, there’s a long list.
00:49:57 But it’s fascinating to think about if in order
00:50:01 for something like proteins and then DNA and RNA
00:50:05 to emerge, you need, and basic living organisms,
00:50:10 you need to be very close to an Earth like planet,
00:50:14 which will be sad or exciting, I don’t know which.
00:50:19 If you ask me, I, you know, in a way I put a parallel
00:50:23 between, you know, between our own research.
00:50:28 And I mean, from the intuitive perspective,
00:50:33 you know, you have those two extremes
00:50:36 and reality very rarely falls
00:50:40 into the extremes.
00:50:41 The optimum is almost always reached somewhere in between.
00:50:46 So, and that’s what I tend to think.
00:50:50 I think that, you know, we’re probably somewhere in between.
00:50:54 So we’re not unique, unique, but again,
00:50:58 the chances are, you know, reasonably small.
00:51:01 The problem is we don’t know the other extreme
00:51:04 is like, I tend to think that we don’t actually understand
00:51:08 the basic mechanisms of, like, what this all originated
00:51:11 from, like, it seems like we think of life
00:51:15 as this distinct thing, maybe intelligence
00:51:17 is a distinct thing, maybe the physics that,
00:51:20 from which planets and suns are born is a distinct thing.
00:51:24 But that could be a very, it’s like the Stephen Wolfram
00:51:27 thing, it’s like the, from simple rules emerges
00:51:29 greater and greater complexity.
00:51:31 So, you know, I tend to believe that just life finds a way.
00:51:36 Like, we don’t know the extreme of how common life is
00:51:39 because it could be life is like everywhere.
00:51:44 Like, so everywhere that it’s almost like laughable,
00:51:49 like that we’re such idiots to think who are you?
00:51:52 Like, it’s like ridiculous to even like think,
00:51:56 it’s like ants thinking that their little colony
00:51:59 is the unique thing and everything else doesn’t exist.
00:52:03 I mean, it’s also very possible that that’s the extreme
00:52:07 and we’re just not able to maybe comprehend
00:52:09 the nature of that life.
00:52:12 Just to stick on alien life for just a brief moment more,
00:52:16 there are some signs of life on Venus in gaseous form.
00:52:22 There’s hope for life on Mars, probably extinct.
00:52:27 We’re not talking about intelligent life.
00:52:29 Although that has been in the news recently.
00:52:32 We’re talking about basic like, you know, bacteria.
00:52:36 Yeah, and then also, I guess, there’s a couple moons.
00:52:40 Europa.
00:52:41 Yeah, Europa, which is Jupiter’s moon.
00:52:45 I think there’s another one.
00:52:46 Are you, is that exciting or is it terrifying to you
00:52:50 that we might find life?
00:52:52 Do you hope we find life?
00:52:53 I certainly do hope that we find life.
00:52:56 I mean, it was very exciting to hear about this news
00:53:05 about the possible life on Venus.
00:53:09 It’d be nice to have hard evidence of something with,
00:53:12 which is what the hope is for Mars and Europa.
00:53:17 But do you think those organisms
00:53:18 will be similar biologically
00:53:20 or would they even be sort of carbon based
00:53:23 if we do find them?
00:53:25 I would say they would be carbon based.
00:53:28 How similar, it’s a big question, right?
00:53:31 So it’s the moment we discover things outside Earth, right?
00:53:39 Even if it’s a tiny little single cell.
00:53:43 I mean, there is so much.
00:53:45 Just imagine that, that would be so.
00:53:47 I think that that would be another turning point
00:53:50 for the science, you know?
00:53:52 Especially if it’s different in some very new way.
00:53:56 That’s exciting.
00:53:57 Because that says, that’s a definitive statement,
00:53:59 not a definitive, but a pretty strong statement
00:54:01 that life is everywhere in the universe.
00:54:05 To me at least, that’s really exciting.
00:54:08 You brought up Joshua Lederberg in an offline conversation.
00:54:13 I think I’d love to talk to you about AlphaFold
00:54:15 and this might be an interesting way
00:54:17 to enter that conversation because,
00:54:19 so he won the 1958 Nobel Prize in Physiology or Medicine
00:54:24 for discovering that bacteria can mate and exchange genes.
00:54:29 But he also did a ton of other stuff,
00:54:32 like we mentioned, helping NASA find life on Mars
00:54:37 and the…
00:54:40 Dendral. Dendral.
00:54:42 The chemical expert system.
00:54:45 Expert systems, remember those?
00:54:46 What do you find interesting about this guy
00:54:51 and his ideas about artificial intelligence in general?
00:54:54 So I have a kind of personal story to share.
00:55:00 So I started my PhD in Canada back in 2000.
00:55:05 And so essentially my PhD was,
00:55:07 so we were developing sort of a new language
00:55:10 for symbolic machine learning.
00:55:12 So it’s different from the feature based machine learning.
00:55:15 And one of the sort of cleanest applications
00:55:19 of this approach, of this formalism
00:55:23 was to cheminformatics and computer aided drug design.
00:55:28 So essentially we were, as a part of my research,
00:55:33 I developed a system that essentially looked
00:55:37 at chemical compounds of say the same therapeutic category,
00:55:42 you know, male hormones, right?
00:55:45 And try to figure out the structural fragments
00:55:51 that are the structural building blocks
00:55:54 that are important that define this class
00:55:58 versus structural building blocks
00:55:59 that are there just because, you know,
00:56:02 to complete the structure.
00:56:04 But they are not essentially the ones
00:56:06 that make up the chemical, the key chemical properties
00:56:10 of this therapeutic category.
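Dr. Korkin's system used a symbolic learning formalism that is not described here, so the following is only a frequency-based stand-in for the same question: which structural fragments are shared by every compound in a therapeutic class yet rare outside it? It assumes each molecule has already been decomposed into a set of named fragments; the fragment names, the data, and the 20% background threshold are all hypothetical.

```python
def characteristic_fragments(class_members, background, max_background_freq=0.2):
    """Fragments present in every compound of the class but in at most
    `max_background_freq` of the (non-empty) background compounds."""
    if not background:
        raise ValueError("background set must be non-empty")
    shared = set.intersection(*class_members)
    result = []
    for fragment in sorted(shared):
        freq = sum(fragment in mol for mol in background) / len(background)
        if freq <= max_background_freq:
            result.append(fragment)
    return result


# Hypothetical fragment sets; each set describes one compound.
hormones = [
    {"fused_four_ring", "hydroxyl", "methyl"},
    {"fused_four_ring", "hydroxyl", "ethyl"},
    {"fused_four_ring", "hydroxyl", "ketone"},
]
others = [
    {"hydroxyl", "benzene"},
    {"hydroxyl", "methyl"},
    {"benzene"},
    {"ketone"},
]

print(characteristic_fragments(hormones, others))  # ['fused_four_ring']
```

Here the hydroxyl group is shared by every hormone but also appears in half of the background, so it is treated as a fragment that is there "just to complete the structure" rather than one that defines the class.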
00:56:12 And, you know, for me, it was something new.
00:56:16 I was trained as an applied mathematician, you know,
00:56:20 as with some machine learning background,
00:56:22 but, you know, computer aided drug design
00:56:25 was a completely new territory.
00:56:27 So because of that, I often found myself
00:56:31 asking lots of questions on one of these
00:56:34 sort of central forums.
00:56:36 Back then, there were no Facebooks or stuff like that.
00:56:40 There was a forum, you know, it’s a forum.
00:56:43 It’s essentially, it’s like a bulletin board.
00:56:45 Yeah.
00:56:46 On the internet.
00:56:47 Yeah, so you essentially, you have a bunch of people
00:56:50 and you post a question and you get, you know,
00:56:52 an answer from, you know, different people.
00:56:55 And back then, just like one of the most popular forums
00:56:59 was CCL, I think Computational Chemistry Library,
00:57:04 not library, but something like that,
00:57:07 but CCL, that was the forum.
00:57:09 And there, I, you know, I…
00:57:12 Asked a lot of dumb questions.
00:57:14 Yes, I asked questions.
00:57:15 Also shared some, you know, some information
00:57:19 about our formalism, how we do things,
00:57:21 and whether whatever we do makes sense.
00:57:25 And so, you know, and I remember that one of these posts,
00:57:29 I mean, I still remember, you know,
00:57:31 I would call it desperately looking
00:57:35 for a chemist advice, something like that, right?
00:57:40 And so I post my question, I explained, you know,
00:57:43 how the formalism works, what it does
00:57:49 and what kind of applications I’m planning to do.
00:57:53 And, you know, and it was, you know,
00:57:55 in the middle of the night and I went back to bed.
00:57:59 And the next morning, I had a phone call from my advisor
00:58:04 who also looked at this forum.
00:58:06 It’s like, you won’t believe who replied to you.
00:58:11 And it’s like, who?
00:58:13 And he said, well, you know, there is a message
00:58:16 to you from Joshua Lederberg.
00:58:19 And my reaction was like, who is Joshua Lederberg?
00:58:22 Your advisor hung up. So, and essentially, you know,
00:58:29 Joshua wrote me that we had conceptually similar ideas
00:58:34 in the Dendral project.
00:58:36 You may wanna look it up.
00:58:39 And we should also, sorry, and it’s a side comment,
00:58:42 say that even though he won the Nobel Prize
00:58:45 at a really young age, in 58, but so he was,
00:58:49 I think he was what, 33.
00:58:52 It’s just crazy.
00:58:53 So anyway, so that’s, so hence in the 90s,
00:58:57 responding to young whippersnappers on the CCL forum.
00:59:02 Okay.
00:59:02 And so back then he was already very senior.
00:59:05 I mean, he unfortunately passed away back in 2008,
00:59:09 but, you know, back in 2001, he was, I mean,
00:59:12 he was a professor emeritus at Rockefeller University.
00:59:15 And, you know, that was actually, believe it or not,
00:59:18 one of the reasons I decided to join, you know,
00:59:25 as a postdoc, the group of Andrej Šali,
00:59:28 who was at Rockefeller University,
00:59:30 with the hope that, you know, that I could actually,
00:59:33 you know, have a chance to meet Joshua in person.
00:59:38 And I met him very briefly, right?
00:59:42 Just because he was walking, you know,
00:59:45 there’s a little bridge that connects the,
00:59:47 sort of the research campus with the,
00:59:51 with the sort of skyscraper that Rockefeller owns,
00:59:55 where, you know, postdocs and faculty
00:59:58 and graduate students live.
01:00:00 And so I met him, you know,
01:00:02 and had a very short conversation, you know.
01:00:06 But so I started, you know, reading about Dendral
01:00:10 and I was amazed, you know, it’s,
01:00:12 we’re talking about 1960, right?
01:00:16 The ideas were so profound.
01:00:19 Well, what’s the fun about the ideas of it?
01:00:21 The reason to make this is even crazier.
01:00:25 So, Lederberg wanted to make a system
01:00:29 that would help him study the extraterrestrial molecules,
01:00:38 right?
01:00:39 So, the idea was that, you know,
01:00:40 the way you study the extraterrestrial molecules
01:00:43 is you do the mass spec analysis, right?
01:00:46 And so the mass spec gives you sort of bits,
01:00:49 numbers about essentially gives you the ideas
01:00:52 about the possible fragments or, you know,
01:00:55 atoms, you know, and maybe a little fragments,
01:00:59 pieces of this molecule that make up the molecule, right?
01:01:03 So now you need to sort of,
01:01:06 to decompose this information
01:01:09 and to figure out what was the whole
01:01:12 before it became fragments, bits and pieces, right?
01:01:17 So, in order to make this, you know,
01:01:20 to have this tool, the idea of Lederberg
01:01:25 was to connect chemistry, computer science,
01:01:32 and to design this so called expert system
01:01:36 that looks, that takes into account,
01:01:38 that takes as an input the mass spec data,
01:01:42 a database of possible molecules
01:01:47 and essentially tries to sort of induce the molecule
01:01:52 that would correspond to this spectrum
01:01:55 or, you know, essentially what this project ended up being
01:02:03 was that, you know, it would provide a list of candidates
01:02:07 that then a chemist would look at and make final decision.
01:02:11 So.
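Dendral's core idea, generating candidates and pruning them with chemical constraints before handing a short list to a chemist, can be illustrated at toy scale. This is not Dendral's actual algorithm: the sketch only enumerates molecular formulas over C, H, N, and O that match a given nominal mass (integer masses C=12, H=1, N=14, O=16, an assumption) and discards formulas whose degree of unsaturation, (2C + 2 + N - H)/2, is negative or not an integer.

```python
from itertools import product

# Nominal (integer) atomic masses; real mass spectrometry uses exact masses.
MASS = {"C": 12, "H": 1, "N": 14, "O": 16}


def formula_string(c, h, n, o):
    """Write a formula like CH4 or C2H4O, omitting zero counts and 1s."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count == 1:
            parts.append(symbol)
        elif count > 1:
            parts.append(f"{symbol}{count}")
    return "".join(parts)


def candidate_formulas(target_mass, max_heavy_atoms=10):
    """Generate-and-test: enumerate carbon-containing C/H/N/O formulas
    with the target nominal mass, keeping only chemically sensible ones."""
    candidates = []
    counts = range(max_heavy_atoms + 1)
    for c, n, o in product(range(1, max_heavy_atoms + 1), counts, counts):
        h = target_mass - c * MASS["C"] - n * MASS["N"] - o * MASS["O"]
        if h < 0:
            continue
        # Degree of unsaturation (rings + double bonds) = (2C + 2 + N - H) / 2;
        # it must be a non-negative integer for a neutral molecule.
        numerator = 2 * c + 2 + n - h
        if numerator < 0 or numerator % 2:
            continue
        candidates.append(formula_string(c, h, n, o))
    return candidates


print(candidate_formulas(28))  # ['CO', 'C2H4']
```

The output is a candidate list, mirroring how the project's results went to a chemist for the final decision; a fuller system would also enumerate structures for each formula and score them against the observed fragment peaks.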
01:02:12 But the original idea, I suppose,
01:02:13 is to solve the entirety of this problem automatically.
01:02:16 Yes, yes.
01:02:17 So he, you know, so he,
01:02:21 back then he approached it. In the 60s.
01:02:25 Yes, believe that, it’s amazing.
01:02:28 I mean, it still blows my mind, you know, that it’s,
01:02:32 that’s, and this was essentially the origin
01:02:37 of the modern bioinformatics, cheminformatics,
01:02:41 you know, back in the 60s.
01:02:42 So that’s, you know, every time you deal with projects
01:02:48 like this, with the, you know, research like this,
01:02:51 you just, you know, so the power of the, you know,
01:02:56 intelligence of these people is just, you know, overwhelming.
01:03:01 Do you think about expert systems, is there,
01:03:05 and why they kind of didn’t become successful,
01:03:10 especially in the space of bioinformatics,
01:03:12 where it does seem like there is a lot of expertise
01:03:15 in humans, and, you know, it’s possible to see
01:03:20 that a system like this could be made very useful.
01:03:23 Right.
01:03:24 And be built up.
01:03:25 So it’s actually, it’s a great question,
01:03:26 and this is something, so, you know, so, you know,
01:03:30 at my university, I teach artificial intelligence,
01:03:33 and, you know, we start, my first two lectures
01:03:37 are on the history of AI.
01:03:40 And there we, you know, we try to, you know,
01:03:45 go through the main stages of AI.
01:03:48 And so, you know, the question of why expert systems failed
01:03:54 or became obsolete, it’s actually a very interesting one.
01:03:58 And there are, you know, if you try to read the, you know,
01:04:01 the historical perspectives,
01:04:03 there are actually two lines of thoughts.
01:04:05 One is that they were essentially
01:04:11 not up to the expectations.
01:04:14 And so therefore they were replaced, you know,
01:04:18 by other things, right?
01:04:21 The other one was the completely opposite one,
01:04:25 that they were too good.
01:04:28 And as a result, they essentially became
01:04:31 sort of a household name,
01:04:33 and then essentially they got transformed.
01:04:37 I mean, in both cases, sort of the outcome was the same.
01:04:40 They evolved into something, right?
01:04:43 And that’s what I, you know, if I look at this, right?
01:04:47 So the modern machine learning, right?
01:04:50 So.
01:04:51 So there’s echoes in the modern machine learning.
01:04:53 I think so, I think so, because, you know,
01:04:55 if you think about this, you know, and how we design,
01:04:59 you know, the most successful algorithms,
01:05:02 including AlphaFold, right?
01:05:04 you build in the knowledge about the domain
01:05:08 that you study, right?
01:05:09 So you build in your expertise.
01:05:12 So speaking of AlphaFold,
01:05:14 so DeepMind’s AlphaFold 2 recently was announced
01:05:18 to have, quote unquote, solved protein folding.
01:05:21 But how exciting is this to you?
01:05:24 It seems to be one of the,
01:05:27 one of the exciting things that have happened in 2020.
01:05:29 It’s an incredible accomplishment from the looks of it.
01:05:32 What part of it is amazing to you?
01:05:33 What part would you say is over hype
01:05:36 or maybe misunderstood?
01:05:39 It’s definitely a very exciting achievement.
01:05:41 To give you a little bit of perspective, right?
01:05:43 So in bioinformatics, we have several competitions.
01:05:50 And so the way, you know, you often hear
01:05:53 how those competitions have been explained
01:05:56 to sort of to non bioinformaticians is that, you know,
01:05:59 they call it bioinformatics Olympic games.
01:06:01 And there are several disciplines, right?
01:06:03 So historically, one of the first ones
01:06:07 was the discipline in predicting the protein structure,
01:06:10 predicting the 3D coordinates of the protein.
01:06:12 But there are some others.
01:06:13 So the predicting protein functions,
01:06:16 predicting effects of mutations on protein functions,
01:06:21 then predicting protein, protein interactions.
01:06:24 So the original one was CASP,
01:06:28 or Critical Assessment of protein Structure Prediction.
01:06:32 And the, you know, typically what happens
01:06:40 during these competitions is, you know, scientists,
01:06:43 experimental scientists solve the structures,
01:06:48 but don’t put them into the protein data bank,
01:06:51 which is the centralized database
01:06:54 that contains all the 3D coordinates.
01:06:57 Instead, they hold them back and release the protein sequences.
01:07:02 And now the challenge of the community
01:07:05 is to predict the 3D structures of these proteins
01:07:10 and then use the experimentally solved structures
01:07:12 to assess which one is the closest one, right?
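CASP compares each prediction with the withheld experimental structure using measures such as RMSD and GDT_TS. Here is a minimal sketch, assuming the predicted and experimental C-alpha coordinates are already optimally superimposed and listed residue by residue (the real GDT calculation also searches over superpositions, which is omitted here). GDT_TS averages the percentage of residues within 1, 2, 4, and 8 Å of their experimental positions.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rmsd(predicted, experimental):
    """Root-mean-square deviation between matched coordinates."""
    squared = [distance(p, e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on already-superimposed structures: the mean, over
    the distance cutoffs (in angstroms), of the percentage of residues
    whose predicted position lies within that cutoff."""
    distances = [distance(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical 3-residue example: every predicted atom is off by 1.5 angstroms in x.
experimental = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
predicted = [(x + 1.5, y, z) for x, y, z in experimental]

print(rmsd(predicted, experimental))    # 1.5
print(gdt_ts(predicted, experimental))  # 75.0
```

A perfect prediction scores 100; in this example every residue misses the 1 Å cutoff but passes the 2, 4, and 8 Å cutoffs, giving 75.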
01:07:16 And this competition, by the way,
01:07:17 just a bunch of different tangents.
01:07:19 And maybe you can also say, what is protein folding?
01:07:22 Then this competition, CASP competition
01:07:25 has become the gold standard.
01:07:27 And that’s what was used to say
01:07:29 that protein folding was solved.
01:07:32 So just to add a little, just a bunch.
01:07:35 So if you could, whenever you say stuff,
01:07:37 maybe throw in some of the basics
01:07:39 for the folks that might be outside of the field.
01:07:41 Anyway, sorry.
01:07:42 So, yeah, so, you know, so the reason it’s, you know,
01:07:45 it’s relevant to our understanding of protein folding
01:07:50 is because, you know, we’ve yet to learn
01:07:54 how the folding mechanistically works, right?
01:07:58 So there are different hypotheses,
01:08:00 what happens to this fold?
01:08:02 For example, there is a hypothesis that the folding happens
01:08:07 by, you know, also in a modular fashion, right?
01:08:12 So that, you know, we have protein domains
01:08:16 that get folded independently
01:08:17 because their structure is stable.
01:08:19 And then the whole protein structure gets formed.
01:08:23 But, you know, within those domains,
01:08:25 we also have a so called secondary structure,
01:08:27 the small alpha helices, beta sheets.
01:08:29 So these are, you know, elements that are structurally stable.
01:08:34 And so, and the question is, you know,
01:08:37 when do they get formed?
01:08:40 Because some of the secondary structure elements,
01:08:42 you have to have, you know, a fragment in the beginning
01:08:46 and say the fragment in the middle, right?
01:08:49 So you cannot potentially start having the full fold
01:08:54 from the get go, right?
01:08:57 So it’s still, you know, it’s still a big enigma,
01:09:00 what happens.
01:09:01 We know that it’s an extremely efficient
01:09:04 and stable process, right?
01:09:05 So there’s this long sequence
01:09:07 and the fold happens really quickly.
01:09:09 Exactly.
01:09:10 So that’s really weird, right?
01:09:11 And it happens like the same way almost every time.
01:09:15 Exactly, exactly.
01:09:16 That’s really weird.
01:09:17 That’s freaking weird.
01:09:19 It’s, yeah, that’s why it’s such an amazing thing.
01:09:22 But most importantly, right?
01:09:24 So it’s, you know, so when you see the, you know,
01:09:27 the translation process, right?
01:09:29 So when you don’t have the whole protein translated,
01:09:36 right, it’s still being translated,
01:09:37 you know, getting out from the ribosome,
01:09:41 you already see some structural, you know, elements forming.
01:09:45 So folding starts happening
01:09:49 before the whole protein gets produced, right?
01:09:52 And so this is obviously, you know,
01:09:55 one of the biggest questions in, you know,
01:09:59 in modern molecular biology.
01:10:00 Not just, like, maybe what happens,
01:10:04 like, that’s even bigger than the question of folding.
01:10:07 That’s the question of like,
01:10:09 something like deeper fundamental idea of folding.
01:10:12 Yes. Behind folding.
01:10:13 Exactly, exactly.
01:10:14 So, you know, so obviously if we are able to predict
01:10:21 the end product of protein folding,
01:10:24 we are one step closer to understanding
01:10:27 sort of the mechanisms of the protein folding.
01:10:30 Because we can then potentially look and start probing
01:10:34 what are the critical parts of this process
01:10:38 and what are not so critical parts of this process.
01:10:41 So we can start decomposing this, you know,
01:10:44 so in a way this protein structure prediction algorithm
01:10:50 can be used as a tool, right?
01:10:53 So you change the, you know, you modify the protein,
01:10:59 you get back to this tool, it predicts,
01:11:02 okay, it’s completely unstable.
01:11:04 Yeah, which aspects of the input
01:11:07 will have a big impact on the output?
01:11:09 Exactly, exactly.
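The "use the predictor as a tool" idea can be sketched as an in silico mutation scan: substitute every position with every other amino acid, re-run the predictor, and rank positions by how much the output moves. `predict` is a hypothetical stand-in for any structure or stability predictor that returns a single number; the toy predictor in the usage example is invented purely for illustration.

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutation_scan(seq: str, predict: Callable[[str], float]) -> List[Tuple[int, float]]:
    """Rank positions by the largest change in `predict` over all single substitutions.

    `predict` is a hypothetical stand-in for any predictor that maps a
    sequence to one number (for example a predicted stability score).
    """
    baseline = predict(seq)
    effects = []
    for i, wild_type in enumerate(seq):
        largest = 0.0
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue
            mutant = seq[:i] + aa + seq[i + 1:]
            largest = max(largest, abs(predict(mutant) - baseline))
        effects.append((i, largest))
    # Most sensitive positions first; ties keep sequence order (sort is stable).
    return sorted(effects, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy predictor, invented for illustration: only position 2 matters.
    def toy_predict(s: str) -> float:
        return 10.0 if s[2] == "W" else 0.0

    print(mutation_scan("AAWAA", toy_predict)[:2])  # [(2, 10.0), (0, 0.0)]
```

With a real predictor each call may be expensive, so a full scan costs 19 predictions per position; in practice one might scan only regions of interest.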
01:11:11 So what happens is, you know,
01:11:13 we typically have some sort of incremental advancement,
01:11:18 you know, each stage of this CASP competition,
01:11:22 you have groups with incremental advancement
01:11:25 and, you know, historically the top performing groups
01:11:29 were, you know, they were not using machine learning.
01:11:34 They were using a very advanced biophysics
01:11:37 combined with bioinformatics,
01:11:39 combined with, you know, the data mining
01:11:43 and that was, you know, that would enable them
01:11:47 to obtain protein structures of those proteins
01:11:52 that don’t have any structurally solved relatives
01:11:57 because, you know, if we have another protein,
01:12:01 say the same protein, but coming from a different species,
01:12:07 we could potentially derive some ideas
01:12:10 and that’s so called homology or comparative modeling,
01:12:13 where we’ll derive some ideas
01:12:15 from the previously known structures
01:12:17 and that would help us tremendously
01:12:19 in, you know, in reconstructing the 3D structure overall.
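Homology (comparative) modeling starts by finding an already-solved relative to use as a template. A minimal sketch of one ingredient, ranking candidate templates by percent sequence identity over a precomputed alignment, is below. The alignment strings and template IDs are hypothetical, and real pipelines use dedicated alignment tools and additional scores.

```python
from typing import Dict, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def best_template(query: str, templates: Dict[str, str]) -> Tuple[str, float]:
    """Pick the template with the highest identity to the query.

    `templates` maps hypothetical template IDs to sequences already aligned to `query`.
    """
    scored = {name: percent_identity(query, seq) for name, seq in templates.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


if __name__ == "__main__":
    query = "ACDE-G"
    candidates = {"template_1": "ACDQRG", "template_2": "WWWWWW"}
    print(best_template(query, candidates))  # ('template_1', 80.0)
```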
01:12:25 But what happens when we don’t have these relatives?
01:12:27 This is when it becomes really, really hard, right?
01:12:31 So that’s so called de novo, you know,
01:12:35 de novo protein structure prediction.
01:12:37 And in this case, those methods were traditionally very good.
01:12:43 But what happened in the last year,
01:12:46 the original AlphaFold came in
01:12:50 and all of a sudden it’s much better than everyone else.
01:12:56 This is 2018.
01:12:57 Yeah.
01:12:58 Oh, and the competition is only every two years, I think.
01:13:02 And then, so, you know, it was sort of a shockwave
01:13:08 to the bioinformatics community that, you know,
01:13:10 we have like a state of the art machine learning system
01:13:15 that does, you know, structure prediction.
01:13:18 And essentially what it does, you know,
01:13:20 so if you look at this, it actually predicts the contacts.
01:13:26 So, you know, so the process of reconstructing
01:13:29 the 3D structure starts by predicting the contacts
01:13:34 between the different parts of the protein.
01:13:38 And the contacts essentially are the parts of the protein
01:13:40 that are in a close proximity to each other.
01:13:43 Right, so actually the machine learning part
01:13:45 seems to be estimating, you can correct me if I’m wrong here,
01:13:51 but it seems to be estimating the distance matrix,
01:13:53 which is like the distance between the different parts.
01:13:55 Yeah, so we call it the contact map.
01:13:58 Contact map.
01:13:58 So once you have the contact map,
01:14:00 the reconstruction is becoming more straightforward, right?
01:14:04 But so the contact map is the key.
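A contact map is a thresholded distance matrix. The sketch below assumes one representative atom per residue and uses a commonly quoted 8 Å cutoff while ignoring residues that are close in sequence (the `min_separation` value is an assumption). It also illustrates why a full distance matrix makes reconstruction "more straightforward": classical multidimensional scaling recovers 3D coordinates, up to rotation, translation and reflection, from exact pairwise distances. AlphaFold's actual reconstruction step works differently; this only shows the principle.

```python
import numpy as np


def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances for an (N, 3) array of residue positions."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def contact_map(coords: np.ndarray, cutoff: float = 8.0, min_separation: int = 6) -> np.ndarray:
    """Boolean N x N map: True where residues are closer than `cutoff` angstroms
    and at least `min_separation` apart in sequence (both values are assumptions)."""
    n = len(coords)
    idx = np.arange(n)
    seq_sep = np.abs(idx[:, None] - idx[None, :])
    return (distance_matrix(coords) < cutoff) & (seq_sep >= min_separation)


def coords_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical multidimensional scaling: coordinates from a full distance matrix,
    unique only up to rotation, translation and reflection."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by floating-point noise.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xyz = rng.normal(scale=10.0, size=(12, 3))
    d = distance_matrix(xyz)
    rebuilt = coords_from_distances(d)
    print(np.allclose(distance_matrix(rebuilt), d))  # True
```

A binary contact map carries much less information than exact distances, which is why predicting distance distributions rather than plain contacts was such a useful step.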
01:14:06 And so, you know, that’s what happened.
01:14:11 And now we started seeing in this current stage, right?
01:14:15 Well, in the most recent one,
01:14:18 we started seeing the emergence of these ideas
01:14:22 in other people’s work, right?
01:14:25 But yet here’s, you know, AlphaFold2
01:14:29 that again outperforms everyone else.
01:14:33 And also by introducing yet another wave
01:14:35 of the machine learning ideas.
01:14:38 Yeah, there does seem to also be an incorporation.
01:14:41 First of all, the paper is not out yet,
01:14:43 but there’s a bunch of ideas already out.
01:14:44 There does seem to be an incorporation of this other thing.
01:14:48 I don’t know if it’s something that you could speak to,
01:14:50 which is like the incorporation of like other structures,
01:14:58 like evolutionary similar structures
01:15:01 that are used to kind of give you hints.
01:15:03 Yes, so evolutionary similarity is something
01:15:08 that we can detect at different levels, right?
01:15:10 So we know, for example,
01:15:12 that the structure of proteins
01:15:17 is more conserved than the sequence.
01:15:20 The sequence could be very different,
01:15:22 but the structural shape is actually still very conserved.
01:15:26 So that’s sort of the intrinsic property that, you know,
01:15:28 in a way related to protein folds,
01:15:31 you know, to the evolution of the, you know,
01:15:34 of the proteins and protein domains, et cetera.
01:15:37 But we know that, I mean, there’ve been multiple studies.
01:15:41 And, you know, ideally, if you have structures,
01:15:45 you know, you should use that information.
01:15:48 However, sometimes we don’t have this information.
01:15:51 Instead, we have a bunch of sequences.
01:15:53 Sequences, we have a lot, right?
01:15:54 So we have, you know, hundreds, thousands
01:16:00 of, you know, different organisms sequenced, right?
01:16:04 And by taking the same protein,
01:16:07 but in different organisms and aligning it,
01:16:11 so making it, you know, making the corresponding positions
01:16:15 aligned, we can actually say a lot
01:16:20 about sort of what is conserved in this protein
01:16:24 and therefore, you know, structurally more stable,
01:16:26 what is diverse in this protein.
01:16:28 So on top of that, we could provide sort of the information
01:16:32 about the sort of the secondary structure
01:16:35 of this protein, et cetera, et cetera.
01:16:36 So this information is extremely useful
01:16:39 and it’s already there.
01:16:41 So while it’s tempting to, you know,
01:16:44 to do a complete ab initio,
01:16:46 so you just have a protein sequence and nothing else,
01:16:49 the reality is such that we are overwhelmed with this data.
01:16:54 So why not use it?
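The alignment idea can be sketched in a few lines: score each alignment column by Shannon entropy (low entropy means conserved), and score pairs of columns by mutual information, a simple covariation signal. Coevolution-based contact prediction refines this idea considerably; raw mutual information is only the starting point. The toy alignment below is invented for illustration.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps; 0.0 = fully conserved."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    h = -sum((k / total) * math.log2(k / total) for k in Counter(residues).values())
    return h if h > 0 else 0.0  # avoid returning -0.0 for a fully conserved column


def mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (bits) between two columns, using rows with no gap in either."""
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    n = len(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), k in Counter(pairs).items():
        p_ab = k / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def columns(msa: List[str]) -> List[str]:
    """Transpose an MSA (list of equal-length aligned sequences) into columns."""
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return ["".join(col) for col in zip(*msa)]


if __name__ == "__main__":
    # Toy alignment of one protein in four (hypothetical) species.
    msa = ["ACDKL", "ACEKL", "ACDRI", "AC-RI"]
    cols = columns(msa)
    print([round(column_entropy(c), 3) for c in cols])  # [0.0, 0.0, 0.918, 1.0, 1.0]
    print(round(mutual_information(cols[3], cols[4]), 3))  # 1.0
```

In the toy example, columns 3 and 4 change together (K pairs with L, R with I), which is the kind of correlated change that can hint at two positions being in contact.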
01:16:56 And so, yeah, so I’m looking forward
01:16:59 to reading this paper.
01:17:01 It does seem to, like they’ve,
01:17:03 in the previous version of AlphaFold,
01:17:05 they didn’t, for this evolutionary similarity thing,
01:17:09 they didn’t use machine learning for that.
01:17:12 Or rather, they used it as like the input
01:17:15 to the entirety of the neural net,
01:17:17 like the features derived from the similarity.
01:17:22 It seems like there’s some kind of quote, unquote,
01:17:24 iterative thing where it seems to be part of the learning
01:17:30 process is the incorporation of this evolutionary similarity.
01:17:34 Yeah, I don’t think there is a bioRxiv paper, right?
01:17:36 There’s nothing.
01:17:37 No, there’s nothing.
01:17:38 There’s a blog post that’s written
01:17:40 by a marketing team, essentially,
01:17:42 which, you know, it has some scientific similarity,
01:17:48 probably, to the actual methodology used,
01:17:51 but it could be, it’s like interpreting scripture.
01:17:55 It could be just poetic interpretations of the actual work
01:17:59 as opposed to direct connection to the work.
01:18:01 So now, speaking about protein folding, right?
01:18:04 So, you know, in order to answer the question
01:18:06 whether or not we have solved this, right?
01:18:09 So we need to go back to the beginning of our conversation
01:18:13 with the realization that an average protein has multiple domains,
01:18:16 and that typically what CASP has been focusing on,
01:18:22 what this competition has been focusing on,
01:18:25 is single, maybe two-domain proteins
01:18:29 that are still very compact.
01:18:31 And even those ones are extremely challenging to solve.
01:18:35 But now we talk about, you know,
01:18:37 an average protein that has two, three protein domains.
01:18:42 If you look at the proteins that are in charge
01:18:46 of the, you know, of the processes in the nervous system,
01:18:51 right, perhaps one of the most recently evolved
01:18:58 sort of systems in an organism, right?
01:19:03 All of them, well, the majority of them
01:19:06 are highly multi domain proteins.
01:19:09 So they are, you know, some of them have five, six, seven,
01:19:13 you know, and more domains, right?
01:19:16 And, you know, we are very far away
01:19:20 from understanding how these proteins are folded.
01:19:22 So the complexity of the protein matters here.
01:19:24 The complexity of the protein modules
01:19:27 or the protein domains.
01:19:30 So you’re saying solved, so the definition
01:19:35 of solved here is particularly the CASP competition
01:19:38 achieving human level, not human level,
01:19:41 achieving experimental level performance
01:19:45 on these particular sets of proteins
01:19:48 that have been used in these competitions.
01:19:50 Well, I mean, you know, I do think that, you know,
01:19:54 especially with regards to AlphaFold,
01:19:57 you know, it is able to, you know, to solve,
01:20:03 you know, at the near experimental level,
01:20:08 a pretty big majority of the more compact proteins
01:20:15 like, or protein domains.
01:20:16 Because again, in order to understand
01:20:18 how the overall protein, you know,
01:20:22 multi-domain proteins fold, we do need to understand
01:20:26 the structure of its individual domains.
01:20:28 I mean, unlike if you look at AlphaZero
01:20:31 or like even MuZero, if you look at that work,
01:20:36 you know, it’s nice, reinforcement learning
01:20:39 self-play mechanisms are nice
01:20:41 because it’s all in simulation.
01:20:42 So you can learn from just huge amounts.
01:20:45 Like you don’t need data.
01:20:47 Whereas the problem with proteins,
01:20:49 like the size, I forget how many 3D structures
01:20:54 have been mapped, but the training data is very small.
01:20:56 No matter what, it’s like millions,
01:20:59 maybe one or two million or something like that,
01:21:01 but it’s some very small number,
01:21:02 but like, it doesn’t seem like that’s scalable.
01:21:06 There has to be, I don’t know,
01:21:09 it feels like you want to somehow 10 X the data
01:21:13 or a hundred X the data somehow.
01:21:15 Yes, but we also can take advantage of homology models,
01:21:20 right, so the models that are of very good quality
01:21:26 because they are essentially obtained
01:21:30 based on the evolutionary information, right?
01:21:33 So you can, there is a potential to enhance this information
01:21:38 and, you know, use it again to empower the training set.
01:21:43 And it’s, I think, I am actually very optimistic.
01:21:49 I think it’s been one of these sort of, you know,
01:21:58 turning-point events where you have a system that is,
01:22:05 you know, a machine learning system
01:22:07 that is truly better than, you know,
01:22:12 better than the more conventional
01:22:15 biophysics based methods.
01:22:17 That’s a huge leap.
01:22:19 This is one of those fun questions,
01:22:21 but where would you put it in the ranking
01:22:26 of the greatest breakthroughs
01:22:28 in artificial intelligence history?
01:22:31 So like, okay, so let’s see who’s in the running.
01:22:34 Maybe you can correct me.
01:22:35 So you got like AlphaZero and AlphaGo
01:22:39 beating the world champion at the game of Go.
01:22:44 Thought to be impossible like 20 years ago.
01:22:48 Or at least the AI community was highly skeptical.
01:22:51 Then you also got, like, Deep Blue versus Kasparov.
01:22:55 You have deep learning itself,
01:22:56 like the maybe, what would you say,
01:22:58 the AlexNet, ImageNet moment.
01:23:00 So the first neural network
01:23:02 achieving human level performance.
01:23:04 Sorry, that’s not true.
01:23:07 Achieving like a big leap in performance
01:23:10 on the computer vision problem.
01:23:14 There is OpenAI, the whole, like, GPT-3,
01:23:18 that whole space of transformers and language models
01:23:23 just achieving this incredible performance
01:23:27 of application of neural networks to language models.
01:23:31 Boston Dynamics, pretty cool.
01:23:33 Like robotics.
01:23:35 People are like, there’s no AI.
01:23:38 No, no, there’s no machine learning currently.
01:23:41 But AI is much bigger than machine learning.
01:23:44 So just the engineering aspect,
01:23:48 I would say it’s one of the greatest accomplishments
01:23:50 in engineering side.
01:23:52 Engineering meaning like mechanical engineering
01:23:56 of robotics ever.
01:23:57 Then of course, autonomous vehicles.
01:23:59 You can argue for Waymo,
01:24:01 which is like the Google self driving car.
01:24:03 Or you can argue for Tesla,
01:24:05 which is like actually being used
01:24:07 by hundreds of thousands of people on the road today,
01:24:10 machine learning system.
01:24:13 And I don’t know if you can, what else is there?
01:24:17 But I think that’s it.
01:24:18 And then AlphaFold, many people are saying
01:24:20 is up there, potentially number one.
01:24:23 Would you put them at number one?
01:24:24 Well, in terms of the impact on the science
01:24:29 and on the society beyond, it’s definitely,
01:24:34 to me would be one of the…
01:24:37 Top three?
01:24:39 What you want?
01:24:39 Maybe, I mean, I’m probably not the best person
01:24:43 to answer that.
01:24:45 But I do have, I remember my,
01:24:51 back in, I think 1997, when Deep Blue
01:24:56 beat Kasparov, it was, I mean, it was a shock.
01:25:01 I mean, it was, and I think for the,
01:25:04 for a pretty substantial part of the world,
01:25:14 especially people who have some experience with chess,
01:25:21 and realizing how incredibly human this game is,
01:25:25 how much brain power you need
01:25:30 to reach the level of grandmasters, right.
01:25:36 And it’s probably one of the first times,
01:25:37 and how good Kasparov was.
01:25:39 And again, yeah, so Kasparov’s arguably
01:25:42 one of the best ever, right?
01:25:45 And you get a machine that beats him.
01:25:47 All right, so it’s…
01:25:48 First time a machine probably beat a human
01:25:50 at that scale of a thing, of anything.
01:25:53 Yes, yes.
01:25:54 So that was, to me, that was like, you know,
01:25:57 one of the groundbreaking events in the history of AI.
01:26:00 Yeah, that’s probably number one.
01:26:02 Probably, like we don’t, it’s hard to remember.
01:26:05 It’s like Muhammad Ali versus, I don’t know,
01:26:08 any of the Mike Tyson, something like that.
01:26:09 It’s like, nah, you gotta put Muhammad Ali at number one.
01:26:13 Same with Deep Blue,
01:26:15 even though it’s not machine learning based.
01:26:19 Still, it uses advanced search,
01:26:21 and search is the integral part of AI, right?
01:26:24 It’s not, people don’t think of it that way at this moment.
01:26:27 In vogue currently, search is not seen
01:26:30 as a fundamental aspect of intelligence,
01:26:34 but it very well, I mean, it very likely is.
01:26:37 In fact, I mean, that’s what neural networks are,
01:26:39 is they’re just performing search
01:26:41 on the space of parameters, and it’s all search.
01:26:45 All of intelligence is some form of search,
01:26:47 and you just have to become cleverer and cleverer
01:26:49 at that search problem.
01:26:50 And I also have another one that you didn’t mention
01:26:53 that’s one of my favorite ones is,
01:26:58 so you’ve probably heard of this,
01:26:59 it’s, I think it’s called Deep Rembrandt.
01:27:03 It’s the project where they trained,
01:27:06 I think there was a collaboration
01:27:08 between the sort of the experts
01:27:11 in Rembrandt painting in the Netherlands,
01:27:15 and a group, an artificial intelligence group,
01:27:18 where they trained an algorithm
01:27:20 to replicate the style of Rembrandt,
01:27:22 and they actually printed a portrait
01:27:26 that never existed before in the style of Rembrandt.
01:27:32 I think they printed it on a sort of,
01:27:36 on the canvas that, you know,
01:27:38 using pretty much the same types of paints and stuff.
01:27:42 To me, it was mind blowing.
01:27:44 Yeah, and the space of art, that’s interesting.
01:27:46 There hasn’t been, maybe that’s it,
01:27:50 but I think there hasn’t been an ImageNet moment yet
01:27:54 in the space of art.
01:27:56 You haven’t been able to achieve
01:27:58 superhuman level performance in the space of art,
01:28:01 even though there’s this big famous thing
01:28:04 where a piece of art was purchased,
01:28:07 I guess for a lot of money.
01:28:08 Yes.
01:28:09 Yeah, but it’s still, you know,
01:28:11 people are like in the space of music at least,
01:28:15 that’s, you know, it’s clear that human created pieces
01:28:19 are much more popular.
01:28:21 So there hasn’t been a moment where it’s like,
01:28:24 oh, this is, we’re now,
01:28:26 I would say in the space of music,
01:28:28 what makes a lot of money,
01:28:30 we’re talking about serious money,
01:28:32 it’s music and movies, or like shows and so on,
01:28:35 and entertainment.
01:28:36 There hasn’t been a moment where AI created,
01:28:41 AI was able to create a piece of music
01:28:44 or a piece of cinema, like Netflix show,
01:28:49 that is, you know, that’s sufficiently popular
01:28:53 to make a ton of money.
01:28:55 Yeah.
01:28:56 And that moment would be very, very powerful,
01:28:58 because that’s like, that’s an AI system
01:29:01 being used to make a lot of money.
01:29:03 And, of course, there are AI tools,
01:29:05 like even Premiere, audio editing,
01:29:07 all the editing, everything I do,
01:29:08 to edit this podcast, there’s a lot of AI involved.
01:29:11 Actually, this is a program,
01:29:13 I wanna talk to those folks, just cause I wanna nerd out,
01:29:15 it’s called iZotope, I don’t know if you’re familiar with it.
01:29:18 They have a bunch of tools of audio processing,
01:29:20 and they have, I think they’re Boston based,
01:29:23 just, it’s so exciting to me to use it,
01:29:26 like on the audio here,
01:29:28 cause it’s all machine learning.
01:29:30 It’s not, cause most audio production stuff
01:29:35 is like any kind of processing you do,
01:29:37 it’s very basic signal processing,
01:29:39 and you’re tuning knobs and so on.
01:29:41 They have all of that, of course,
01:29:43 but they also have all of this machine learning stuff,
01:29:46 like where you actually give it training data,
01:29:48 you select parts of the audio you train on,
01:29:51 you train on it, and it figures stuff out.
01:29:56 It’s great, it’s able to detect,
01:29:59 like the ability of it to be able
01:30:01 to separate voice and music, for example,
01:30:04 or voice and anything, is incredible.
01:30:07 Like it just, it’s clearly exceptionally good
01:30:11 at applying these different neural network models
01:30:14 to just separate the different kinds
01:30:17 of signals from the audio.
01:30:19 That, okay, so that’s really exciting.
01:30:22 Photoshop, Adobe people also use it,
01:30:24 but to generate a piece of music
01:30:28 that will sell millions, a piece of art, yeah.
01:30:31 No, I agree, and you know, it’s,
01:30:34 that’s, you know, as I mentioned,
01:30:39 I teach my AI class, and you know,
01:30:41 an integral part of this is the project, right?
01:30:44 So it’s my favorite, ultimate favorite part,
01:30:47 because it typically, we have these project presentations
01:30:51 the last two weeks of the classes,
01:30:53 right before, you know, the Christmas break,
01:30:56 and it’s sort of, it adds this cool excitement,
01:31:00 and every time, I mean, I’m amazed, you know,
01:31:02 with some projects that people, you know, come up with.
01:31:07 And so, and quite a few of them are actually, you know,
01:31:12 they have some link to arts.
01:31:17 I mean, you know, I think last year we had a group
01:31:21 who designed an AI producing haikus, Japanese poems.
01:31:27 Oh, wow.
01:31:29 So, and some of them, so, you know,
01:31:31 it got trained on English-based
01:31:34 haikus, right?
01:31:36 So, and some of them, you know,
01:31:40 they get to present, like, the top selection.
01:31:43 They were pretty good.
01:31:44 I mean, you know, I mean, of course, I’m not a specialist,
01:31:47 but you read them, and you see this is real.
01:31:49 It seems profound.
01:31:50 Yes, yeah, it seems real.
01:31:52 So it’s kind of cool.
01:31:55 We also had a couple of projects where people tried
01:31:57 to teach AI how to play, like, rock music, classical music.
01:32:02 I think, and popular music.
01:32:05 Yeah.
01:32:07 Interestingly enough, you know,
01:32:10 classical music was among the most difficult ones.
01:32:14 Oh, sure.
01:32:15 And, you know, of course, if you, if, you know,
01:32:21 you know, if you look at the, you know,
01:32:23 the, like, grandmasters of music, like Bach, right?
01:32:28 So there is a lot of, there is a lot of,
01:32:31 there is a lot of almost math.
01:32:34 Yeah, well, he’s very mathematical.
01:32:36 Yeah, exactly.
01:32:37 So this is, I would imagine that at least some style
01:32:41 of this music could be picked up,
01:32:43 but then you have this completely different spectrum
01:32:46 of classical composers.
01:32:49 And so, you know, it’s almost like, you know,
01:32:54 you don’t have to sort of look at the data.
01:32:56 You just listen to it and say, nah, that’s not it, not yet.
01:33:01 That’s not it, yeah.
01:33:02 That’s how I feel too.
01:33:03 OpenAI has, I think, MuseNet
01:33:05 or something like that, the system.
01:33:07 It’s cool, but it’s like, eh,
01:33:09 it’s not compelling for some reason.
01:33:12 It could be a psychological reason too.
01:33:14 Maybe we need to have a human being,
01:33:17 a tortured soul behind the music.
01:33:19 I don’t know.
01:33:20 Yeah, no, absolutely.
01:33:22 I completely agree.
01:33:23 But yeah, whether or not we’ll have,
01:33:26 one day we’ll have, you know,
01:33:29 a song written by an AI engine
01:33:33 to be like in top charts, musical charts,
01:33:37 I wouldn’t be surprised.
01:33:40 I wouldn’t be surprised.
01:33:43 I wonder if we already have one
01:33:44 and it just hasn’t been announced.
01:33:48 We wouldn’t know.
01:33:49 How hard is the multi protein folding problem?
01:33:53 Is that kind of something you’ve already mentioned
01:33:57 which is baked into this idea of greater
01:33:59 and greater complexity of proteins?
01:34:01 Like multi domain proteins,
01:34:03 does that basically become multi-protein complexes?
01:34:08 Yes, you got it right.
01:34:10 So it’s sort of, it has the components
01:34:15 of both of protein folding
01:34:18 and protein, protein interactions.
01:34:21 Because in order for these domains,
01:34:24 many of these proteins actually,
01:34:27 they never form a stable structure.
01:34:31 One of my favorite proteins,
01:34:33 and pretty much everyone who works in the,
01:34:37 I know, whom I know, who works with proteins,
01:34:41 they always have their favorite proteins.
01:34:44 Right, so one of my favorite proteins,
01:34:47 probably my favorite protein,
01:34:49 the one that I worked on when I was a postdoc,
01:34:51 is the so-called postsynaptic density protein 95, PSD-95.
01:34:56 So it’s one of the key actors
01:35:00 in the majority of neurological processes
01:35:03 at the molecular level.
01:35:04 So it’s a, and essentially it’s a key player
01:35:11 in the postsynaptic density.
01:35:13 So this is the crucial part of this synapse
01:35:17 where a lot of these chemical processes are happening.
01:35:22 So it has five domains, right?
01:35:26 So five protein domains.
01:35:27 So a pretty large protein, I think 600-something amino acids.
01:35:35 But the way it’s organized itself, it’s flexible, right?
01:35:41 So it acts as a scaffold.
01:35:43 So it is used to bring in other proteins.
01:35:49 So they start acting in the orchestrated manner, right?
01:35:54 So, and the type of the shape of this protein,
01:35:58 it’s in a way, there are some stable parts of this protein,
01:36:02 but there are some flexible.
01:36:04 And this flexibility is built into the protein
01:36:08 in order to become sort of this multifunctional machine.
01:36:13 So do you think that kind of thing is also learnable
01:36:16 through the AlphaFold2 kind of approach?
01:36:19 I mean, the time will tell.
01:36:22 Is it another level of complexity?
01:36:24 Is it like how big of a jump in complexity
01:36:27 is that whole thing?
01:36:28 To me, it’s yet another level of complexity
01:36:31 because when we talk about protein, protein interactions,
01:36:35 and there is actually a different challenge for this
01:36:38 called CAPRI, and this one is focused specifically
01:36:43 on macromolecular interactions: protein-protein, protein-DNA,
01:36:47 et cetera.
01:36:48 So, but it’s, there are different mechanisms
01:36:56 that govern molecular interactions
01:36:58 and that need to be picked up,
01:37:00 say by a machine learning algorithm.
01:37:03 Interestingly enough, we actually,
01:37:06 we participated for a few years in this competition.
01:37:11 We typically don’t participate in competitions,
01:37:14 I don’t know, don’t have enough time,
01:37:19 because it’s very intensive, it’s a very intensive process.
01:37:23 But we participated about 10 years ago or so.
01:37:30 And the way we entered this competition,
01:37:32 we designed a scoring function, right?
01:37:35 So the function that evaluates
01:37:37 whether or not your protein, protein interaction
01:37:40 looks like an experimentally solved one, right?
01:37:43 So the scoring function is a very critical part
01:37:45 of the model prediction.
01:37:49 So we designed it to be a machine learning one.
01:37:52 And so it was one of the first machine learning
01:37:56 based scoring functions used in CAPRI.
01:38:00 And we essentially learned what should contribute,
01:38:06 what are the critical components contributing
01:38:08 into the protein, protein interaction.
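A machine-learned scoring function in the spirit described here can be sketched as a binary classifier that separates near-native docked complexes from decoys using interface features. The sketch uses logistic regression trained by batch gradient descent in NumPy; the three features (imagined as buried area, hydrogen-bond count and shape complementarity) and the synthetic data are hypothetical, and this is not the scoring function the lab actually entered in CAPRI.

```python
import numpy as np


def train_scoring_function(features: np.ndarray, labels: np.ndarray,
                           lr: float = 0.1, epochs: int = 2000):
    """Fit logistic regression by batch gradient descent.

    features: (n_complexes, n_features) interface descriptors (hypothetical).
    labels:   1 for near-native complexes, 0 for decoys.
    Returns a function mapping features to a score in (0, 1).
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-12  # avoid division by zero
    z = (features - mean) / std
    w = np.zeros(z.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
        error = p - labels  # gradient of the log-loss with respect to the logits
        w -= lr * z.T @ error / len(labels)
        b -= lr * error.mean()

    def score(x) -> np.ndarray:
        zx = (np.atleast_2d(np.asarray(x, dtype=float)) - mean) / std
        return 1.0 / (1.0 + np.exp(-(zx @ w + b)))

    return score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for three interface features per docked complex.
    native = rng.normal(loc=1.0, scale=0.5, size=(100, 3))
    decoys = rng.normal(loc=-1.0, scale=0.5, size=(100, 3))
    x = np.vstack([native, decoys])
    y = np.concatenate([np.ones(100), np.zeros(100)])
    score = train_scoring_function(x, y)
    print("training accuracy:", ((score(x) > 0.5) == y).mean())
```

In a docking setting the learned score would be used to rank many candidate poses, so what matters most is that near-native poses rank above decoys, not the absolute probability.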
01:38:10 So this could be converted into a learning problem
01:38:13 and thereby it could be learned?
01:38:15 I believe so, yes.
01:38:17 Do you think AlphaFold2 or something similar to it
01:38:20 from DeepMind or somebody else will be,
01:38:24 will result in a Nobel Prize or multiple Nobel Prizes?
01:38:28 So like, you know, obviously, maybe not so obviously,
01:38:33 you can’t give a Nobel Prize to a computer program.
01:38:38 At least for now, give it to the designers of that program.
01:38:42 But do you see one or multiple Nobel Prizes
01:38:46 where AlphaFold2 is like a large percentage
01:38:51 of what that prize is given for?
01:38:54 Would it lead to discoveries at the level of Nobel Prizes?
01:39:00 I mean, I think we are definitely destined
01:39:05 to see the Nobel Prize becoming sort of,
01:39:08 evolving with the evolution of science,
01:39:12 and the evolution of science is such
01:39:14 that it now becomes really multifaceted, right?
01:39:17 So where you don’t really have like a unique discipline,
01:39:21 you have sort of the, a lot of cross disciplinary talks
01:39:25 in order to achieve sort of, you know,
01:39:28 really big advancements, you know.
01:39:32 So I think, you know, the computational methods
01:39:39 will be acknowledged in one way or another.
01:39:42 And as a matter of fact, you know,
01:39:46 they were first acknowledged back in 2013, right?
01:39:50 Where, you know, the first three people were, you know,
01:39:56 awarded the Nobel Prize for studying protein folding,
01:40:00 right, the principles.
01:40:01 And, you know, I think all three of them
01:40:03 are computational biophysicists, right?
01:40:06 So, you know, that I think is unavoidable.
01:40:13 You know, it will come with the time.
01:40:16 The fact that, you know, alpha fold and, you know,
01:40:23 similar approaches, because again, it’s a matter of time
01:40:26 that people will embrace this, you know, principle
01:40:31 and we’ll see more and more such, you know,
01:40:34 such tools coming into play.
01:40:36 But, you know, these methods will be critical
01:40:41 in a scientific discovery, no doubts about it.
01:40:47 On the engineering side, maybe a dark question,
01:40:51 but do you think it’s possible to use
01:40:53 these machine learning methods
01:40:55 to start to engineer proteins?
01:40:59 And the next question is something quite a few biologists
01:41:04 are against, some are for, for study purposes,
01:41:07 is to engineer viruses.
01:41:09 Do you think machine learning, like something like AlphaFold
01:41:12 could be used to engineer viruses?
01:41:14 So to answer the first question, you know,
01:41:16 it has been, you know, a part of the research
01:41:21 in the protein science, the protein design is, you know,
01:41:25 is a very prominent area of research.
01:41:29 Of course, you know, one of the pioneers is David Baker
01:41:32 and the Rosetta algorithm that, you know,
01:41:34 essentially was doing the de novo design and was used
01:41:39 to design new proteins, you know.
01:41:41 And design of proteins means design of function.
01:41:44 So like when you design a protein, you can control,
01:41:47 I mean, the whole point of a protein
01:41:49 with the protein structure comes a function,
01:41:52 like it’s doing something.
01:41:53 Correct.
01:41:54 So you can design different things.
01:41:56 So you can, yeah, so you can, well,
01:41:58 you can look at the proteins from the functional perspective.
01:42:00 You can also look at the proteins
01:42:02 from the structural perspective, right?
01:42:04 So the structural building blocks.
01:42:05 So if you want to have a building block
01:42:07 of a certain shape, you can try to achieve it
01:42:10 by, you know, introducing a new protein sequence
01:42:13 and predicting, you know, how it will fold.
01:42:17 So with that, I mean, it’s a natural,
01:42:22 one of the, you know, natural applications
01:42:25 of these algorithms.
01:42:28 Now, talking about engineering a virus.
01:42:34 With machine learning.
01:42:35 With machine learning, right?
01:42:36 So, well, you know, so luckily for us,
01:42:41 I mean, we don’t have that much data, right?
01:42:46 Yeah.
01:42:47 We actually, right now, one of the projects
01:42:50 that we are carrying on in the lab
01:42:53 is we’re trying to develop a machine learning algorithm
01:42:56 that determines the,
01:42:59 whether or not the current strain is pathogenic.
01:43:02 And the current strain of the coronavirus.
01:43:04 Of the virus.
01:43:06 I mean, so there are applications to coronaviruses
01:43:08 because we have strains of SARS-CoV-2,
01:43:11 also SARS-CoV and MERS, that are pathogenic,
01:43:14 but we also have strains of other coronaviruses
01:43:17 that are, you know, not pathogenic.
01:43:20 I mean, the common cold viruses and, you know,
01:43:24 some other ones, right?
01:43:25 So pathogenic meaning spreading?
01:43:28 Pathogenic means actually inflicting damage.
01:43:33 Correct.
01:43:35 There are also some, you know,
01:43:37 seasonal versus pandemic strains of influenza, right?
01:43:41 And determining what the molecular determinants are,
01:43:45 right?
01:43:46 The ones that are built into the protein sequence,
01:43:48 into the gene sequence, right?
01:43:50 And whether or not machine learning
01:43:52 can determine those components, right?
01:43:58 Oh, interesting.
01:43:59 So using machine learning,
01:44:00 that's really interesting: given
01:44:03 the input, like the entire
01:44:07 protein sequence, determine
01:44:09 if this thing is going to be able to do damage
01:44:12 to a biological system.
01:44:14 Yeah.
01:44:15 So, I mean,
01:44:16 for good machine learning,
01:44:17 you're saying we don't have enough data for that?
01:44:19 I mean, for this specific one, we do.
01:44:22 I might actually, you know,
01:44:24 have to back up on this because we're still in the process.
01:44:27 There was one work that appeared on bioRxiv
01:44:31 by Eugene Koonin, who is one of these, you know,
01:44:34 pioneers in evolutionary genomics.
01:44:39 And they tried to look at this, but, you know,
01:44:42 the methods were sort of standard, you know,
01:44:46 supervised learning methods.
01:44:48 And now the question is, you know,
01:44:51 can you advance it further by, by using, you know,
01:44:56 not so standard methods, you know?
01:44:58 So there's obviously a lot of hope
01:45:01 in transfer learning, where you can actually try to transfer
01:45:05 the information that machine learning learns about
01:45:08 protein sequences in general, right?
01:45:11 So there is some promise
01:45:16 in going in this direction, and if we have this,
01:45:18 it would be extremely useful because then
01:45:21 we could essentially forecast the potential mutations
01:45:24 that would make the current strain
01:45:26 more or less pathogenic.
01:45:27 We could anticipate them for vaccine development,
01:45:31 for treatment, for antiviral drug development.
01:45:34 That would be a very crucial task.
01:45:36 But you could also use that system to then say,
01:45:42 how would we potentially modify this virus
01:45:45 to make it more pathogenic?
01:45:47 That's true.
01:45:49 That’s true.
01:45:50 And then, you know, again,
01:45:55 the hope is, well, several things, right?
01:45:59 One is that, you know,
01:46:02 even if you design a sequence, right,
01:46:06 carrying out the actual experimental biology
01:46:12 to ensure that all the components are working
01:46:16 is a completely different matter.
01:46:19 Difficult process.
01:46:19 Yes.
01:46:20 Then, as we've seen in the past,
01:46:24 there could be some regulation the moment
01:46:27 the scientific community recognizes
01:46:30 that it's no longer just a fun puzzle
01:46:34 for machine learning.
01:46:36 Could be open.
01:46:37 Yeah, so then there might be some regulation.
01:46:40 So I think back in, what, 2015, there was, you know,
01:46:45 an issue with regulating the research
01:46:49 on influenza strains, right?
01:46:52 There were several groups that, you know,
01:46:55 used mutation analysis
01:46:58 to determine whether or not a strain would jump
01:47:01 from one species to another.
01:47:03 And I think there was like a half-year moratorium
01:47:06 on the research and on publishing the paper
01:47:09 until scientists analyzed it
01:47:13 and decided that it was actually safe.
01:47:16 I forgot what that’s called.
01:47:17 Something of function, test of function.
01:47:20 Gain of function.
01:47:20 Gain of function, yeah.
01:47:22 Gain of function, loss of function, that’s right.
01:47:24 Sorry.
01:47:26 It’s like, let’s watch this thing mutate for a while
01:47:29 to see what kind of things we can observe.
01:47:33 I guess I’m not so much worried
01:47:36 about that kind of research if there’s a lot of regulation
01:47:38 and if it’s done very well and with competence and seriously.
01:47:42 I am more worried about, you know,
01:47:46 the underlying aspect of this question,
01:47:49 which is more like 50 years from now.
01:47:52 Speaking to the Drake equation,
01:47:54 one of the parameters in the Drake equation
01:47:57 is how long civilizations last.
01:47:59 And that seems to be the most important value actually
01:48:03 for calculating if there’s other alien
01:48:06 intelligent civilizations out there.
01:48:08 That's where there's the most variability.
01:48:10 Assuming that the probability
01:48:15 that life can emerge is not zero,
01:48:19 like if we're not super unique,
01:48:21 then how long we last
01:48:23 is basically the most important thing.
01:48:26 So from a selfish perspective,
01:48:29 but also from a Drake equation perspective,
01:48:32 I’m worried about our civilization lasting.
01:48:35 And you kind of think about all the ways
01:48:37 in which machine learning can be used
01:48:39 to design greater weapons of destruction, right?
01:48:45 And I mean, one way to ask that
01:48:48 if you look sort of 50 years from now,
01:48:50 a hundred years from now,
01:48:52 would you be more worried about natural pandemics
01:48:55 or engineered pandemics?
01:48:59 Like who’s the better designer of viruses,
01:49:02 nature or humans if we look down the line?
01:49:05 In my view, I would still be worried
01:49:10 about natural pandemics, simply because of,
01:49:14 I mean, the capacity of nature to produce them.
01:49:20 It does a pretty good job, right?
01:49:22 Yes.
01:49:23 And the motivation for
01:49:25 engineering viruses as a weapon is a weird one
01:49:29 because, maybe you can correct me on this,
01:49:31 it seems very difficult to target a virus, right?
01:49:35 The whole point of a weapon, the way a rocket works,
01:49:38 is you have a starting point, you have an end point,
01:49:40 and you're trying to hit a target.
01:49:42 To hit a target with a virus is very difficult.
01:49:44 It’s basically just, right?
01:49:47 The target would be the human species.
01:49:51 Oh man.
01:49:52 Yeah, I have a hope in us.
01:49:54 I'm forever optimistic that we will not,
01:49:58 that there's insufficient evil in the world
01:50:01 to lead to that kind of destruction.
01:50:04 Well, I also hope that, I mean, that's what we see,
01:50:07 with the way we are getting connected,
01:50:11 the way the world is getting connected.
01:50:14 I think it helps the world become more transparent.
01:50:21 Yeah.
01:50:22 So the spread of information,
01:50:27 I think, is one of the key things for society
01:50:31 to become more balanced one way or another.
01:50:36 This is something that people disagree with me on,
01:50:38 but I do think that the kind of secrecy
01:50:41 that governments have.
01:50:43 So you’re kind of speaking more to the other aspects,
01:50:47 like the research community being more open,
01:50:49 companies being more open.
01:50:52 Government is still,
01:50:55 we're talking about, like, military secrets.
01:50:57 I think military secrets of the kind
01:51:01 that could destroy the world
01:51:03 will become also a thing of the 20th century.
01:51:07 It’ll become more and more open.
01:51:09 Yeah.
01:51:10 I think nations will lose power in the 21st century,
01:51:13 like lose sufficient power for secrecy.
01:51:15 Transparency is more beneficial than secrecy,
01:51:18 but of course it’s not obvious.
01:51:21 Let’s hope so.
01:51:22 Let's hope that governments
01:51:27 will become more transparent.
01:51:31 So we last talked, I think, in March or April.
01:51:35 What have you learned?
01:51:36 How has your philosophical, psychological,
01:51:40 biological worldview changed since then?
01:51:43 You've been studying it nonstop
01:51:46 from a computational biology perspective.
01:51:48 How has your understanding and thoughts about this virus
01:51:51 changed over those months from the beginning to today?
01:51:54 One thing that I was really amazed at is
01:51:58 how efficient the scientific community was.
01:52:03 Even just judging by this very narrow domain
01:52:10 of protein structure, and understanding
01:52:13 the structural characterization of this virus
01:52:17 from the components point of view,
01:52:19 the whole virus point of view.
01:52:21 If you look at SARS, something that happened a little less than
01:52:31 20 years ago,
01:52:34 and you see, when it happened,
01:52:38 what the response by the scientific community was,
01:52:42 you see that structural characterizations did occur,
01:52:47 but they took several years, right?
01:52:51 Now the things that took several years
01:52:54 are a matter of months, right?
01:52:56 So we see the research pop up.
01:53:01 We are at an unprecedented level
01:53:03 in terms of sequencing, right?
01:53:05 Never before have we had a single virus sequenced so many times,
01:53:10 which allows us to trace very precisely
01:53:16 the evolutionary nature of this virus,
01:53:21 what happens. And it's not just this virus independently
01:53:27 of everything; it's the sequence of this virus
01:53:32 linked, anchored to the specific geographic place,
01:53:36 to specific people, because our genotype also influences
01:53:38 the evolution of this virus. It's always a host-pathogen
01:53:52 coevolution that, you know, occurs.
01:53:55 It'd be cool if we also had a lot more data about
01:53:58 the spread of this virus. Well,
01:54:02 it'd be nice if we had it for, like, contact tracing
01:54:06 purposes for this virus, but it'd also be nice if we had it
01:54:09 for studying future viruses, to be able to respond
01:54:12 and so on. But it's already nice that we have geographical
01:54:15 data and basic data from individual humans, yeah.
01:54:18 Exactly, no, I think contact tracing is obviously
01:54:22 a key component in understanding
01:54:26 the spread of this virus.
01:54:29 There are also a number of challenges, right?
01:54:31 The XPRIZE is one of them. We
01:54:35 just recently took part in
01:54:39 this competition; it's the prediction of the
01:54:43 number of infections in different regions.
01:54:47 Oh, sure.
01:54:48 So, you know, obviously AI
01:54:52 is the main topic in those predictions.
01:54:55 Yeah, but still, the data, I mean, that's a competition,
01:54:59 but the training data is weak.
01:55:03 It's great,
01:55:07 it's probably much more than before, but it'd be nice if it was
01:55:11 really rich. I talked to Michael Mina from
01:55:15 Harvard, and he dreams that the community comes together with, like, a
01:55:19 weather map for viruses, right, like
01:55:23 really high-resolution sensors on how
01:55:27 viruses travel from person to person, all the different kinds of viruses, right?
01:55:31 Because there's a ton of them. And then you'd be able to tell
01:55:35 the story that you've spoken about
01:55:39 of the evolution of these viruses, like the day-to-day mutations that
01:55:43 are occurring. I mean, that'd be fascinating just from the perspective of
01:55:47 study and from the perspective of being able to respond to future pandemics.
01:55:51 That's ultimately what I'm worried about. People love
01:55:55 books. Are there some three
01:55:59 or whatever number of books, technical, fiction, philosophical, that
01:56:03 brought you joy in life, had an impact on your life,
01:56:07 and maybe some that you would recommend to others?
01:56:11 I’ll give you three very different books, and I also have a special runner up.
01:56:15 Honorable mention.
01:56:19 I mean, it's an audiobook, and there's
01:56:23 some specific reason behind it. So the first book is
01:56:27 something that sort of impacted my earlier
01:56:31 stage of life, and I’m probably not going to be very original here.
01:56:35 It’s Bulgakov’s Master and Margarita.
01:56:39 For a Russian, maybe it’s not super original,
01:56:43 but it’s a really powerful book, even in English.
01:56:47 It is incredibly powerful, and
01:56:51 I mean, the way it ends.
01:56:55 I still have goosebumps when I read
01:56:59 the very last part, it's called the epilogue, where
01:57:03 it's just so powerful. What impact did it have on you? What ideas?
01:57:07 What insights did you get from it? I was just taken by
01:57:11 the fact that
01:57:15 you have those parallel lives
01:57:19 many centuries apart, and
01:57:23 somehow they got intertwined into
01:57:27 one story, and that
01:57:31 to me was fascinating. And of course
01:57:35 the romantic part of this book is, like,
01:57:39 it's not just romance, it's the romance
01:57:43 empowered by magic, right?
01:57:47 And maybe on top of that, you have some irony,
01:57:51 which is unavoidable, right? Because it was the
01:57:55 Soviet time. But it's very deeply Russian: it's
01:57:59 the wit, the humor, the pain, the love.
01:58:03 It's one of the books that kind of captures
01:58:07 something about Russian culture that people outside of Russia
01:58:11 should probably read. I agree. What's the second one? The second one
01:58:15 is another one that I happened
01:58:19 to read later in my life. I think I read it
01:58:23 for the first time when I was a graduate student.
01:58:27 And that's Solzhenitsyn's Cancer Ward.
01:58:31 That is an amazingly powerful book.
01:58:35 What is it about? It's, I mean, essentially
01:58:39 based on his own experience. Solzhenitsyn was
01:58:43 diagnosed with cancer when he was reasonably young, and he
01:58:47 made a full recovery. So this is
01:58:51 about a person who was sentenced
01:58:55 for life in one of these camps.
01:58:59 And he had some cancer,
01:59:03 so he was transported back to one of these
01:59:07 Soviet republics, I think it was
01:59:11 Central Asian republics. And the
01:59:15 book is about
01:59:19 his experience being a
01:59:23 prisoner, being a patient in the
01:59:27 cancer clinic, in the cancer ward, surrounded
01:59:31 by people, many of whom die.
01:59:35 But as for the way
01:59:39 it reads, first of all, later on I
01:59:43 read the accounts of doctors
01:59:47 who describe the patient's experiences
01:59:51 in the book
01:59:55 as incredibly accurate.
01:59:59 I read that some doctor said that
02:00:03 every single doctor should read this book to understand
02:00:07 what the patient feels. But
02:00:11 again, like many of Solzhenitsyn's
02:00:15 books, it has multiple levels of complexity.
02:00:19 And obviously if you look beyond
02:00:23 the cancer and the patient, the
02:00:27 tumor that was growing and then disappeared
02:00:31 in his
02:00:35 body with some consequences is
02:00:39 allegorically the
02:00:43 Soviet system. And he actually,
02:00:47 when he was asked, said that this is what made him
02:00:51 think about how to combine these experiences:
02:00:55 him being a part of the Soviet regime,
02:00:59 also being
02:01:03 someone sent to a Gulag camp,
02:01:07 and also someone who experienced cancer
02:01:11 in his life. The Gulag Archipelago
02:01:15 and this book, these are the works that actually earned him
02:01:19 the Nobel Prize. To me, though,
02:01:23 and I've read
02:01:27 other books by Solzhenitsyn,
02:01:31 this one is the most powerful one.
02:01:35 And by the way, both this one and the previous one you read in Russian?
02:01:39 Yes. Now, the third book is an English book
02:01:43 and it's completely different. So we're switching gears
02:01:47 completely. This is a book which, it's not even
02:01:51 a book, it's an essay by
02:01:55 John von Neumann called The Computer and the Brain.
02:01:59 And that was the book he was writing
02:02:03 knowing that he was dying of cancer.
02:02:07 So the book was released back, it’s a very thin book.
02:02:11 But the power,
02:02:15 the intellectual power in this book, in this essay
02:02:19 is incredible. I mean you probably know that von Neumann
02:02:23 is considered to be one of the biggest
02:02:27 thinkers. So his intellectual power was incredible.
02:02:31 And you can actually feel this power
02:02:35 in this book, where the person is writing knowing that
02:02:39 he will die. The book actually got published only after his
02:02:43 death back in 1958. He died in 1957.
02:02:47 So he tried to put in as many
02:02:51 ideas as he could that he still
02:02:55 hadn't realized.
02:02:59 So this book is very difficult
02:03:03 to read because every single paragraph
02:03:07 is just compact,
02:03:11 filled with these ideas. And the ideas are incredible,
02:03:15 even nowadays. He tried
02:03:19 to draw parallels between the brain's
02:03:23 computing power, the neural system, and computers
02:03:27 as they were understood. Do you remember what year he was working on this?
02:03:31 57. 57. So that was right during his,
02:03:35 when he was diagnosed with cancer and he was essentially…
02:03:39 Yeah, he's one of those. There are a few folks people mention,
02:03:43 I think Ed Witten is another, where
02:03:47 everyone who meets them says he's just an intellectual powerhouse.
02:03:51 Yes. Okay, so who’s the honorable mention?
02:03:55 And this is, I mean, the reason I put it in a separate section
02:03:59 is because this is a book that I recently
02:04:03 listened to. So it's an audiobook.
02:04:07 It's a book called Lab Girl by Hope Jahren.
02:04:11 Hope Jahren is a
02:04:15 scientist, a geochemist who essentially
02:04:19 studies
02:04:23 fossil plants. She uses
02:04:27 chemical analysis of these fossil plants to understand
02:04:31 what the climate was like
02:04:35 thousands, hundreds of thousands of years ago.
02:04:39 Something that incredibly
02:04:43 touched me about this book is that it was narrated by the author.
02:04:47 Nice. And it's an incredibly
02:04:51 personal story, incredibly. In
02:04:55 certain parts of the book, you could actually hear the author crying.
02:04:59 And to me, I mean, I never experienced
02:05:03 anything like this reading a book, but it was like
02:05:07 a connection between you and the author.
02:05:11 And I think this is really
02:05:15 a must-read, but even better, a must-listen
02:05:19 audiobook for anyone who
02:05:23 wants to learn about
02:05:27 academia, science, research in general, because it's
02:05:31 a very personal account of her becoming
02:05:35 a scientist. So
02:05:39 we’re just before New Year’s.
02:05:43 We talked a lot about some difficult topics of viruses and so on.
02:05:47 Do you have some exciting things you’re looking forward
02:05:51 to in 2021? Some New Year’s resolutions,
02:05:55 maybe silly or fun, or
02:05:59 something very important and fundamental to
02:06:03 the world of science or something completely unimportant?
02:06:07 Well, I'm definitely looking forward
02:06:11 to things becoming normal.
02:06:15 So yes, I really miss traveling.
02:06:19 Every summer I go
02:06:23 to an international summer school. It’s called
02:06:27 the School for Molecular and Theoretical Biology. It’s held in Europe.
02:06:31 It’s organized by very good friends of mine. And this is
02:06:35 a school for gifted kids from all over the world, and
02:06:39 they're incredibly bright. Every time I go there, you know,
02:06:43 it's a highlight of the year. And
02:06:47 we couldn't make it this August, so we
02:06:51 did this school remotely, but it's different.
02:06:55 So I am definitely looking forward to going there
02:06:59 next August. As for
02:07:03 my personal resolutions, being
02:07:07 in the house and working from home,
02:07:11 I realized that
02:07:15 I had apparently missed a lot of
02:07:19 time with my family,
02:07:23 believe it or not. Typically, with all the
02:07:27 research and teaching and
02:07:31 everything related to academic life,
02:07:35 I mean, you get distracted. And so
02:07:39 you don't feel that
02:07:43 being away from your family affects you,
02:07:47 because you are naturally distracted by other things.
02:07:51 So this time I realized
02:07:55 that it's so important, right? Spending your time with
02:07:59 the family, with your kids. And so that
02:08:03 would be my New Year's resolution: actually trying to
02:08:07 spend as much time with them as possible, even when the world opens up.
02:08:11 Yeah, that’s a beautiful message. That’s a beautiful reminder.
02:08:15 I asked you if there’s a Russian poem
02:08:19 that you could read, that I could force you to read, and you said, okay, fine, sure.
02:08:23 Do you mind reading?
02:08:27 And you said that no paper was needed.
02:08:31 So this poem was written by my namesake,
02:08:35 another Dmitry, Dmitry Kemerefeld.
02:08:39 It's a recent poem and it's
02:08:43 called Sorceress, Vedma
02:08:47 in Russian, or actually
02:08:51 Koldunya, which is another connotation of
02:08:55 sorceress or witch. And I really like it,
02:08:59 and it's one of just a handful of poems I actually
02:09:03 can recall by heart. I also have a very strong
02:09:07 association, when I read this poem, with Master and
02:09:11 Margarita, with its main female character,
02:09:15 Margarita. And also it's
02:09:19 set at about the same time of year as we're talking
02:09:23 now, so around New Year,
02:09:27 around Christmas. Do you mind reading it in Russian?
02:09:31 I’ll give it a try.
02:10:01 So you narrowed your eyes,
02:10:05 that anyone who was blessed
02:10:09 was ready to give their soul to the devil
02:10:13 for this witch’s connection.
02:10:17 And I, without prejudice,
02:10:21 ran out to feel your
02:10:25 amazing breath on your lips,
02:10:29 to remember how you flew above the earth
02:10:33 in a white view,
02:10:37 in a white haze, in a white mist.
02:10:41 That’s beautiful. I love how it captures a moment of longing
02:10:45 and maybe love even.
02:10:49 Yes. To me it has a lot of meaning:
02:10:53 something that is happening,
02:10:57 something that is far away, but still very close to you.
02:11:01 And yes, it’s the winter.
02:11:05 There’s something magical about winter, isn’t there?
02:11:09 I don’t know how to translate it, but a kiss in winter
02:11:13 is interesting. Lips in winter and all that kind of stuff.
02:11:17 It's beautiful. Russian has a way. There's a reason Russian poetry
02:11:21 is just, I’m a fan of poetry in both languages, but English
02:11:25 doesn’t capture some of the magic that Russian seems to, so
02:11:29 thank you for doing that. That was awesome. Dmitry,
02:11:33 it’s great to talk to you again. It’s contagious
02:11:37 how much you love what you do, how much you love life, so I really appreciate
02:11:41 you taking the time to talk today. And thank you for having me.
02:11:45 Thanks for listening to this conversation with Dmitry Korkin, and thank you to our
02:11:49 sponsors. Brave Browser, NetSuite Business Management
02:11:53 Software, Magic Spoon Low Carb Cereal, and
02:11:57 8sleep Self Cooling Mattress. So the choice is
02:12:01 browsing privacy, business success, healthy diet, or comfortable
02:12:05 sleep. Choose wisely, my friends. And if you wish,
02:12:09 click the sponsor links below to get a discount and to support this podcast.
02:12:13 And now, let me leave you with some words from Jeffrey Eugenides.
02:12:17 Biology gives you a brain.
02:12:21 Life turns it into a mind. Thank you for listening,
02:12:25 and hope to see you next time.