Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI #153

Transcript

00:00:00 The following is a conversation with Dmitry Korkin,

00:00:02 his second time on the podcast.

00:00:04 He’s a professor of bioinformatics

00:00:06 and computational biology at WPI,

00:00:09 where he specializes in bioinformatics of complex disease,

00:00:13 computational genomics, systems biology,

00:00:16 and biomedical data analytics.

00:00:18 He loves biology, he loves computing,

00:00:22 plus he is Russian and recites a poem in Russian

00:00:26 at the end of the podcast.

00:00:27 What else could you possibly ask for in this world?

00:00:31 Quick mention of our sponsors.

00:00:32 Brave Browser, NetSuite Business Management Software,

00:00:37 Magic Spoon Low Carb Cereal,

00:00:40 and Eight Sleep Self-Cooling Mattress.

00:00:42 So the choice is browsing privacy, business success,

00:00:46 healthy diet, or comfortable sleep.

00:00:49 Choose wisely, my friends,

00:00:50 and if you wish, click the sponsor links below

00:00:53 to get a discount and to support this podcast.

00:00:56 As a side note, let me say that to me,

00:00:58 the scientists that did the best apolitical,

00:01:01 impactful, brilliant work of 2020

00:01:04 are the biologists who study viruses without an agenda,

00:01:09 without much sleep, to be honest,

00:01:11 just a pure passion for scientific discovery

00:01:14 and exploration of the mysteries within viruses.

00:01:18 Viruses are both terrifying and beautiful.

00:01:21 Terrifying because they can threaten

00:01:22 the fabric of human civilization,

00:01:25 both biological and psychological.

00:01:27 Beautiful because they give us insights

00:01:30 into the nature of life on Earth

00:01:32 and perhaps even extraterrestrial life

00:01:35 of the not so intelligent variety

00:01:37 that might meet us one day

00:01:39 as we explore the habitable planets

00:01:41 and moons in our universe.

00:01:43 If you enjoy this thing, subscribe on YouTube,

00:01:45 review it on Apple Podcast, follow on Spotify,

00:01:49 support on Patreon, or connect with me on Twitter

00:01:51 at Lex Fridman.

00:01:53 And now here’s my conversation with Dmitry Korkin.

00:01:57 It’s often said that proteins

00:02:00 and the amino acid residues that make them up

00:02:04 are the building blocks of life.

00:02:06 Do you think of proteins in this way

00:02:08 as the basic building blocks of life?

00:02:11 Yes and no.

00:02:12 So the protein indeed is the basic unit,

00:02:16 biological unit that carries out

00:02:20 important function of the cell.

00:02:22 However, through studying the proteins

00:02:25 and comparing the proteins across different species,

00:02:29 across different kingdoms,

00:02:31 you realize that proteins are actually

00:02:34 much more complicated.

00:02:36 So they have so called modular complexity.

00:02:42 And so what I mean by that is an average protein

00:02:47 consists of several structural units.

00:02:54 So we call them protein domains.

00:02:57 And so you can imagine a protein as a string of beads

00:03:02 where each bead is a protein domain.

00:03:05 And in the past 20 years,

00:03:10 scientists have been studying

00:03:13 the nature of the protein domains

00:03:15 because we realize that it’s the unit.

00:03:19 Because if you look at the functions, right?

00:03:22 So many proteins have more than one function

00:03:25 and those protein functions are often carried out

00:03:29 by those protein domains.

00:03:31 So we also see that in the evolution,

00:03:37 those protein domains get shuffled.

00:03:40 So they act actually as a unit.

00:03:43 Also from the structural perspective, right?

00:03:45 So some people think of a protein

00:03:50 as a sort of a globular molecule,

00:03:55 but as a matter of fact,

00:03:56 the globular part of this protein is a protein domain.

00:04:02 So we often have this, again,

00:04:06 the collection of these protein domains

00:04:09 aligned on a string as beads.

00:04:14 And the protein domains are made up of amino acid residues.

00:04:17 So we’re talking about.

00:04:18 So this is the basic,

00:04:20 so you’re saying the protein domain

00:04:22 is the basic building block of the function

00:04:25 that we think about proteins doing.

00:04:28 So of course you can always talk

00:04:30 about different building blocks.

00:04:31 It’s turtles all the way down.

00:04:32 But there’s a point where there is,

00:04:36 at the point of the hierarchy

00:04:37 where it’s the most, the cleanest element block

00:04:43 based on which you can put them together

00:04:46 in different kinds of ways to form complex function.

00:04:49 And you’re saying protein domains,

00:04:50 why is that not talked about as often in popular culture?

00:04:55 Well, there are several perspectives on this.

00:04:59 And one of course is the historical perspective, right?

00:05:03 So historically scientists have been able

00:05:07 to structurally resolve,

00:05:10 to obtain the 3D coordinates of a protein

00:05:14 for smaller proteins.

00:05:17 And smaller proteins tend to be a single domain protein.

00:05:21 So we have a protein equal to a protein domain.

00:05:24 And so because of that,

00:05:26 the initial suspicion was that the proteins are,

00:05:29 they have globular shapes

00:05:31 and the more of smaller proteins you obtain structurally,

00:05:36 the more you became convinced that that’s the case.

00:05:41 And only later when we started having

00:05:47 alternative approaches.

00:05:49 So the traditional ones are X-ray crystallography

00:05:55 and NMR spectroscopy.

00:05:57 So this is sort of the two main techniques

00:06:02 that give us the 3D coordinates.

00:06:04 But nowadays there’s a huge breakthrough

00:06:07 in cryo-electron microscopy.

00:06:10 So the more advanced methods that allow us

00:06:13 to get into the 3D shapes of much larger molecules,

00:06:21 molecular complexes,

00:06:23 just to give you one of the common examples

00:06:28 for this year, right?

00:06:29 So the first experimental structure

00:06:32 of a SARS-CoV-2 protein

00:06:35 was the cryo EM structure of the S protein.

00:06:40 So the spike protein.

00:06:41 And so it was solved very quickly.

00:06:46 And the reason for that is the advancement

00:06:49 of this technology is pretty spectacular.

00:06:53 How many domains does the, is it more than one domain?

00:06:57 Oh yes.

00:06:58 Oh yes, I mean, so it’s a very complex structure.

00:07:01 And we, you know, on top of the complexity

00:07:06 of a single protein, right?

00:07:08 So this structure is actually a complex, a trimer.

00:07:13 So it needs to form a trimer in order to function properly.

00:07:17 What’s a complex?

00:07:18 So a complex is an agglomeration of multiple proteins.

00:07:22 And so we can have the same protein copied in multiple,

00:07:29 you know, made up in multiple copies

00:07:32 and forming something that we call a homo-oligomer.

00:07:36 Homo means the same, right?

00:07:38 So in this case, so the spike protein is the,

00:07:42 is an example of a homo-tetramer, homo-trimer, sorry.

00:07:46 So you need three copies of it?

00:07:48 Three copies.

00:07:48 In order to.

00:07:50 Exactly.

00:07:50 We have these three chains,

00:07:52 the three molecular chains coupled together

00:07:56 and performing the function.

00:07:58 That’s what, when you look at this protein from the top,

00:08:02 you see a perfect triangle.

00:08:03 Yeah.

00:08:04 So, but other, you know,

00:08:07 so other complexes are made up of, you know,

00:08:10 different proteins.

00:08:12 Some of them are completely different.

00:08:15 Some of them are similar.

00:08:16 The hemoglobin molecule, right?

00:08:18 So it’s actually, it’s a protein complex.

00:08:21 It’s made of four basic subunits.

00:08:25 Two of them are identical to each other.

00:08:29 Two other identical to each other,

00:08:30 but they are also similar to each other,

00:08:32 which sort of gives us some ideas about the evolution

00:08:36 of this, you know, of this molecule.

00:08:40 And perhaps, so one of the hypotheses is that, you know,

00:08:44 in the past, it was just a homo tetramer, right?

00:08:48 So four identical copies,

00:08:50 and then it became, you know, sort of modified,

00:08:55 it became mutated over the time

00:08:58 and became more specialized.

00:09:00 Can we linger on the spike protein for a little bit?

00:09:02 Is there something interesting

00:09:04 or like beautiful you find about it?

00:09:06 I mean, first of all,

00:09:07 it’s an incredibly challenging protein.

00:09:10 And so we, as a part of our sort of research

00:09:16 to understand the structural basis of this virus,

00:09:20 to sort of decode, structurally decode,

00:09:22 every single protein in its proteome,

00:09:27 which, you know, we’ve been working on this spike protein.

00:09:31 And one of the main challenges was that the cryoEM data

00:09:36 allows us to reconstruct or to obtain the 3D coordinates

00:09:44 of roughly two thirds of the protein.

00:09:48 The rest of the one third of this protein,

00:09:51 it’s a part that is buried into the membrane of the virus

00:09:58 and of the viral envelope.

00:10:01 And it also has a lot of unstable structures around it.

00:10:06 So it’s chemically interacting somehow

00:10:08 with whatever the heck it’s connecting to.

00:10:10 Yeah, so people are still trying to understand.

00:10:12 So the nature of, and the role of this one third,

00:10:18 because the top part, you know, the primary function

00:10:23 is to get attached to the ACE2 receptor, human receptor.

00:10:28 There is also beautiful mechanics

00:10:32 of how this thing happens, right?

00:10:34 So because there are three different copies of these chains,

00:10:39 you know, there are three different domains, right?

00:10:43 So we’re talking about domains.

00:10:44 So these are the receptor-binding domains, RBDs,

00:10:47 that get untangled and get ready to get attached

00:10:53 to the receptor.

00:10:55 And now they are not necessarily going in a sync mode.

00:11:02 As a matter of fact.

00:11:04 It’s asynchronous.

00:11:05 So yes, and this is where another level of complexity

00:11:11 comes into play because right now what we see is,

00:11:16 we typically see just one of the arms going out

00:11:20 and getting ready to be attached to the ACE2 receptors.

00:11:27 However, there was a recent mutation

00:11:30 that people studied in that spike protein.

00:11:35 And very recently, a group from UMass Medical School

00:11:43 we happen to collaborate with the group.

00:11:45 So this is a group of Jeremy Luban

00:11:47 and a number of other faculty.

00:11:51 They actually solved the mutated structure of the spike.

00:11:59 And they showed that actually, because of these mutations,

00:12:03 you have more than one arm opening up.

00:12:08 And so now, so the frequency of two arms going up

00:12:13 increased quite drastically.

00:12:17 Interesting.

00:12:18 Does that change the dynamics somehow?

00:12:20 It potentially can change the dynamics

00:12:22 because now you have two possible opportunities

00:12:27 to get attached to the ACE2 receptor.

00:12:30 It’s a very complex molecular process, mechanistic process.

00:12:34 But the first step of this process is the attachment

00:12:38 of this spike protein, of the spike trimer

00:12:42 to the human ACE2 receptor.

00:12:46 So this is a molecule that sits

00:12:48 on the surface of the human cell.

00:12:51 And that’s essentially what initiates,

00:12:54 what triggers the whole process of encapsulation.

00:12:58 If this was dating, this would be the first date.

00:13:01 So this is the…

00:13:03 In a way.

00:13:04 Yes.

00:13:05 So is it possible to have the spike protein

00:13:07 just like floating about on its own?

00:13:10 Or does it need that interactability with the membrane?

00:13:14 Yeah, so it needs to be attached,

00:13:16 at least as far as I know.

00:13:19 But when you get this thing attached on the surface,

00:13:23 there is also a lot of dynamics

00:13:25 on how it sits on the surface.

00:13:28 So for example, there was a recent work in,

00:13:32 again, where people use cryo-electron microscopy

00:13:35 to get the first glimpse of the overall structure.

00:13:38 It’s a very low res, but you still get

00:13:41 some interesting details about the surface,

00:13:45 about what is happening inside,

00:13:47 because we have literally no clue until recent work

00:13:50 about how the capsid is organized.

00:13:54 What’s a capsid?

00:13:55 So a capsid is essentially,

00:13:56 it’s the inner core of the viral particle

00:14:01 where there is the RNA of the virus,

00:14:05 and it’s protected by another protein, N protein,

00:14:10 that essentially acts as a shield.

00:14:13 But now we are learning more and more,

00:14:16 so it’s actually, it’s not just this shield,

00:14:18 it potentially is used for the stability

00:14:21 of the outer shell of the virus.

00:14:25 So it’s pretty complicated.

00:14:27 And I mean, understanding all of this is really useful

00:14:30 for trying to figure out like developing a vaccine

00:14:33 or some kind of drug to attack,

00:14:34 any aspects of this, right?

00:14:36 So, I mean, there are many different implications to that.

00:14:39 First of all, it’s important to understand

00:14:43 the virus itself, right?

00:14:44 So in order to understand how it acts,

00:14:51 what is the overall mechanistic process

00:14:55 of this virus replication,

00:14:57 of this virus proliferation to the cell, right?

00:15:00 So that’s one aspect.

00:15:03 The other aspect is designing new treatments.

00:15:06 So one of the possible treatments

00:15:09 is designing nanoparticles.

00:15:12 And so some nanoparticles that will resemble the viral shape

00:15:17 that would have the spike integrated,

00:15:19 and essentially would act as a competitor to the real virus

00:15:23 by blocking the ACE2 receptors,

00:15:26 and thus preventing the real virus entering the cell.

00:15:30 Now, there are also, you know,

00:15:32 there is a very interesting direction

00:15:35 in looking at the membrane,

00:15:38 at the envelope portion of the protein

00:15:40 and attacking its M protein.

00:15:44 So there are, you know, to give you a, you know,

00:15:48 sort of a brief overview,

00:15:50 there are four structural proteins.

00:15:52 These are the proteins that make up

00:15:54 a structure of the virus.

00:15:58 So spike, S protein that acts as a trimer,

00:16:02 so it needs three copies.

00:16:06 E, envelope protein that acts as a pentamer,

00:16:09 so it needs five copies to act properly.

00:16:13 M is a membrane protein, it forms dimers,

00:16:18 and actually it forms beautiful lattice.

00:16:20 And this is something that we’ve been studying

00:16:22 and we are seeing it in simulations.

00:16:24 It actually forms a very nice grid

00:16:26 or, you know, threads, you know,

00:16:30 of different dimers attached next to each other.

00:16:33 Just a bunch of copies of each other,

00:16:34 and they naturally, when you have a bunch of copies

00:16:36 of each other, they form an interesting lattice.

00:16:38 Exactly.

00:16:39 And, you know, if you think about this, right?

00:16:42 So this complex, you know, the viral shape

00:16:48 needs to be organized somehow, self organized somehow, right?

00:16:52 So it, you know, if it was a completely random process,

00:16:56 you know, you probably wouldn’t have the envelope shell

00:17:02 of the ellipsoid shape, you know,

00:17:03 you would have something, you know,

00:17:05 pretty random, right, shape.

00:17:07 So there is some, you know, regularity

00:17:10 in how these, you know, how these M dimers

00:17:16 get to attach to each other

00:17:18 in a very specific directed way.

00:17:20 Is that understood at all?

00:17:23 It’s not understood.

00:17:24 We are now, we’ve been working in the past six months

00:17:28 since, you know, we met, actually,

00:17:30 this is where we started working on trying to understand

00:17:33 the overall structure of the envelope

00:17:36 and the key components that make up this, you know,

00:17:40 structure.

00:17:41 Wait, does the envelope also have the lattice structure

00:17:43 or no?

00:17:44 So the envelope is essentially the outer shell

00:17:47 of the viral particle.

00:17:48 The N, the nucleocapsid protein,

00:17:51 is something that is inside.

00:17:53 Got it.

00:17:54 But get that, the N is likely to interact with M.

00:17:59 Does it go M and E?

00:18:01 Like, where’s the E and the M?

00:18:02 So E, those different proteins,

00:18:05 they occur in different copies on the viral particle.

00:18:10 So E, this pentamer complex,

00:18:13 we only have two or three, maybe, per each particle, okay?

00:18:18 We have a thousand or so M dimers

00:18:24 that essentially make up

00:18:26 the entire, you know, outer shell.

00:18:30 So most of the outer shell is the M.

00:18:33 M dimer.

00:18:34 And the M protein.

00:18:35 When you say particle, that’s the virion,

00:18:38 the virus, the individual virus.

00:18:40 It’s a single, yes.

00:18:40 Single element of the virus, it’s a single virus.

00:18:43 Single virus, right.

00:18:45 And we have about, you know, roughly 50 to 90 spike trimers.

00:18:50 Right?

00:18:51 So when you, you know, when you show a…

00:18:54 Per virus particle.

00:18:55 Per virus particle.

00:18:56 Sorry, what did you say, 50 to 90?

00:18:58 50 to 90, right?

00:19:00 So this is how this thing is organized.
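
[Editor's sketch] The copy numbers quoted above can be turned into a rough count of protein chains per viral particle. The following is a minimal Python illustration, assuming only the figures stated in the conversation (50 to 90 spike trimers, two or three E pentamers, roughly a thousand M dimers). The dictionary layout and the function name `chain_counts` are made up for this example, and the numbers are rough spoken estimates, not measured values.

```python
# Complexes per virion and chains per complex, as quoted in the conversation.
# The (low, high) pairs are the speaker's rough estimates.
COMPLEXES = {
    "S trimer": {"chains": 3, "per_virion": (50, 90)},
    "E pentamer": {"chains": 5, "per_virion": (2, 3)},
    "M dimer": {"chains": 2, "per_virion": (1000, 1000)},
}


def chain_counts(complexes):
    """Return the (low, high) number of individual protein chains per virion."""
    counts = {}
    for name, info in complexes.items():
        low, high = info["per_virion"]
        counts[name] = (info["chains"] * low, info["chains"] * high)
    return counts


counts = chain_counts(COMPLEXES)
for name, (low, high) in counts.items():
    print(f"{name}: {low} to {high} chains per virion")
```

Under these assumptions the M protein dominates the outer shell by a wide margin, which matches the remark that most of the shell is the M dimer.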

00:19:04 And so now, typically, right,

00:19:06 so you see these, the antibodies that target,

00:19:11 you know, spike protein,

00:19:13 certain parts of the spike protein,

00:19:15 but there could be some, also some treatments, right?

00:19:17 So these are, you know, these are small molecules

00:19:22 that bind strategic parts of these proteins,

00:19:27 disrupting its function.

00:19:29 So one of the promising directions,

00:19:34 it’s one of the newest directions,

00:19:35 is actually targeting the M dimer of the protein.

00:19:40 Targeting the proteins that make up this outer shell.

00:19:44 Because if you’re able to destroy the outer shell,

00:19:47 you’re essentially destroying the viral particle itself.

00:19:52 So preventing it from, you know, functioning at all.

00:19:56 So that’s, you think is,

00:19:59 from a sort of cyber security perspective,

00:20:01 virus security perspective,

00:20:02 that’s the best attack vector?

00:20:05 Is, or like, that’s a promising attack vector?

00:20:08 I would say, yeah.

00:20:09 So, I mean, there’s still tons of research that needs to be,

00:20:12 you know, to be done.

00:20:14 But yes, I think, you know, so.

00:20:16 There’s more attack surface, I guess.

00:20:18 More attack surface.

00:20:19 But, you know, from our analysis,

00:20:22 from other evolutionary analysis,

00:20:24 this protein is evolutionarily more stable

00:20:28 compared to the, say, to the spike protein.

00:20:31 Oh, and stable means a more static target?

00:20:35 Well, yeah, so it doesn’t change.

00:20:38 It doesn’t evolve from the evolutionary perspective

00:20:42 so drastically as, for example, the spike protein.

00:20:46 There’s a bunch of stuff in the news

00:20:47 about mutations of the virus in the United Kingdom.

00:20:51 I also saw in South Africa something.

00:20:54 Maybe that was yesterday.

00:20:56 You just kind of mentioned about stability and so on.

00:21:00 Which aspects of this are mutatable

00:21:02 and which aspects, if mutated, become more dangerous?

00:21:07 And maybe even zooming out,

00:21:09 what are your thoughts and knowledge and ideas

00:21:12 about the way it’s mutated,

00:21:13 all the news that we’ve been hearing?

00:21:15 Are you worried about it from a biological perspective?

00:21:18 Are you worried about it from a human perspective?

00:21:21 So, I mean, you know, mutations are sort of a general way

00:21:26 for these viruses to evolve, right?

00:21:28 So, it’s, you know, it’s essentially,

00:21:32 this is the way they evolve.

00:21:34 This is the way they were able to jump

00:21:38 from one species to another.

00:21:42 We also see some recent jumps.

00:21:46 There were some incidents of this virus jumping

00:21:50 from human to dogs.

00:21:51 So, you know, there is some danger in those jumps

00:21:55 because every time it jumps, it also mutates, right?

00:21:59 So, when it jumps to the species

00:22:04 and jumps back, right?

00:22:06 So, it acquires some mutations

00:22:08 that are sort of driven by the environment

00:22:14 of a new host, right?

00:22:16 And it’s different from the human environment.

00:22:19 And so, we don’t know whether the mutations

00:22:21 that are acquired in the new species

00:22:24 are neutral with respect to the human host

00:22:28 or maybe, you know, maybe damaging.

00:22:32 Yeah, change is always scary, but so are you worried about,

00:22:36 I mean, it seems like because the spread is,

00:22:38 during winter now, seems to be exceptionally high

00:22:43 and especially with a vaccine just around the corner

00:22:46 already being actually deployed,

00:22:49 is there some worry that this puts evolutionary pressure,

00:22:53 selective pressure on the virus for it to mutate?

00:22:59 Is that a source of worry?

00:23:00 Well, I mean, there is always this thought

00:23:03 in the scientist’s mind, you know, what will happen, right?

00:23:08 So, I know there’ve been discussions

00:23:12 about sort of the arms race between the ability

00:23:17 of the humanity to get vaccinated faster

00:23:22 than the virus, you know, essentially, you know,

00:23:27 it becomes, you know, resistant to the vaccine.

00:23:34 I mean, I don’t worry that much simply because,

00:23:40 you know, there is not that much evidence to that.

00:23:44 To aggressive mutation around the vaccine.

00:23:47 Exactly, you know, obviously there are mutations

00:23:49 around the vaccine, so the reason we get vaccinated

00:23:56 every year against the seasonal mutations, right?

00:24:01 But, you know, I think it’s important to study it.

00:24:06 No doubts, right?

00:24:07 So, I think one of the, you know, to me,

00:24:10 and again, I might be biased because, you know,

00:24:14 we’ve been trying to do that as well,

00:24:17 so, but one of the critical directions

00:24:20 in understanding the virus is to understand its evolution

00:24:23 in order to sort of understand the mechanisms,

00:24:27 the key mechanisms that lead the virus to jump,

00:24:30 you know, the zoonotic viruses to jump

00:24:34 from one species to another, the mechanisms

00:24:37 that lead the virus to become resistant to vaccines,

00:24:42 also to treatments, right?

00:24:44 And hopefully that knowledge will enable us

00:24:48 to sort of forecast the evolutionary traces,

00:24:52 the future evolutionary traces of this virus.

00:24:55 I mean, what, from a biological perspective,

00:24:58 this might be a dumb question,

00:24:59 but are there parts of the virus that if souped up,

00:25:05 like through mutation, could make it more effective

00:25:09 at doing its job?

00:25:09 We’re talking about this specific coronavirus

00:25:12 because we were talking about the different, like,

00:25:14 the membrane, the M protein, the E protein,

00:25:18 the N and the S, the spike, is there some?

00:25:24 And there are 20 or so more in addition to that.

00:25:27 But is that a dumb way to look at it?

00:25:29 Like, which of these, if mutated,

00:25:34 could have the greatest impact, potentially damaging impact,

00:25:39 on the effectiveness of the virus?

00:25:41 So it’s actually, it’s a very good question

00:25:44 because, and the short answer is, we don’t know yet.

00:25:48 But of course there is capacity of this virus

00:25:51 to become more efficient.

00:25:53 The reason for that is, you know,

00:25:56 so if you look at the virus, I mean, it’s a machine, right?

00:25:59 So it’s a machine that does a lot of different functions,

00:26:03 and many of these functions are sort of nearly perfect,

00:26:06 but they’re not perfect.

00:26:07 And those mutations can have the greatest impact

00:26:11 and make those functions more perfect.

00:26:14 For example, the attachment to ACE2 receptor, right,

00:26:18 of the spike, right?

00:26:19 So, you know, has this virus reached the efficiency

00:26:28 in which the attachment is carried out?

00:26:31 Or are there some mutations still to be discovered,

00:26:36 right, that will make this attachment sort of stronger,

00:26:41 or, you know, something more, in a way more efficient

00:26:48 from the point of view of this virus functioning.

00:26:51 That’s sort of the obvious example.

00:26:54 But if you look at each of these proteins,

00:26:57 I mean, it’s there for a reason,

00:26:58 it performs certain function.

00:27:00 And it could be that certain mutations will, you know,

00:27:07 enhance this function.

00:27:08 It could be that some mutations will make this function

00:27:11 much less efficient, right?

00:27:13 So that’s also the case.

00:27:16 Let’s, since we’re talking about the evolutionary history

00:27:18 of a virus, let’s zoom back out

00:27:22 and look at the evolution of proteins.

00:27:25 I glanced at this 2010 Nature paper

00:27:29 on the quote, ongoing expansion of the protein universe.

00:27:34 And then, you know, it kind of implies and talks about

00:27:39 that proteins started with a common ancestor,

00:27:42 which is, you know, kind of interesting.

00:27:44 It’s interesting to think about like,

00:27:45 even just like the first organic thing

00:27:49 that started life on Earth.

00:27:51 And from that, there’s now, you know, what is it?

00:27:56 3.5 billion years later, there’s now millions of proteins.

00:27:59 And they’re still evolving.

00:28:01 And that’s, you know, in part,

00:28:02 one of the things that you’re researching.

00:28:05 Is there something interesting to you about the evolution

00:28:09 of proteins from this initial ancestor to today?

00:28:14 Is there something beautiful and insightful

00:28:16 about this long story?

00:28:18 So I think, you know, if I were to pick a single keyword

00:28:24 about protein evolution, I would pick modularity,

00:28:29 something that we talked about in the beginning.

00:28:32 And that’s the fact that the proteins are no longer

00:28:36 considered as, you know, as a sequence of letters.

00:28:41 There are hierarchical complexities

00:28:45 in the way these proteins are organized.

00:28:48 And these complexities are actually going

00:28:51 beyond the protein sequence.

00:28:53 It’s actually going all the way back to the gene,

00:28:57 to the nucleotide sequence.

00:29:00 And so, you know, again, these protein domains,

00:29:04 they are not only functional building blocks,

00:29:07 they are also evolutionary building blocks.

00:29:09 And so what we see in the sort of,

00:29:12 in the later stages of evolution,

00:29:15 I mean, once these structurally stable

00:29:18 and functional building blocks were discovered,

00:29:22 they essentially, they stay, those domains stay as such.

00:29:28 So that’s why if you start comparing different proteins,

00:29:31 you will see that many of them will have similar fragments.

00:29:37 And those fragments will correspond to something

00:29:39 that we call protein domain families.

00:29:42 And so they are still different

00:29:44 because you still have mutations and, you know,

00:29:48 the, you know, different mutations are attributed to,

00:29:53 to, you know, diversification of the function

00:29:56 of these, you know, protein domains.

00:29:58 However, you don’t, you very rarely see, you know,

00:30:03 the evolutionary events that would split

00:30:07 this domain into fragments because,

00:30:10 and it’s, you know, once you have the domain split,

00:30:17 you actually, you, you know,

00:30:20 you can completely cancel out its function

00:30:24 or at the very least you can reduce it.

00:30:26 And that’s not, you know, efficient from the point of view

00:30:29 of the, you know, of the cell functioning.

00:30:32 So, so the, the, the protein domain level

00:30:37 is a very important one.

00:30:39 Now, on top of that, right?

00:30:42 So if you look at the proteins, right,

00:30:44 so you have these structural units

00:30:46 and they carry out the function,

00:30:48 but then much less is known about things

00:30:51 that connect these protein domains,

00:30:54 something that we call linkers.

00:30:56 And those linkers are completely flexible, you know,

00:31:00 parts of the protein that nevertheless

00:31:03 carry out a lot of function.

00:31:06 So it’s like little tails, little heads.

00:31:08 So, so, so we do have tails.

00:31:09 So they’re called termini, C and N termini.

00:31:12 So these are things right on the, on, on, on one

00:31:17 and another ends of the protein sequence.

00:31:20 So they are also very important.

00:31:22 So they, they contribute to very specific interactions

00:31:26 between the proteins.

00:31:27 So.

00:31:28 But you’re referring to the links between domains.

00:31:30 That connect the domains.

00:31:32 And, you know, apart from the, just the,

00:31:36 the simple perspective, if you have, you know,

00:31:39 a very short domain, you have, sorry, a very short linker,

00:31:43 you have two domains next to each other.

00:31:45 They are forced to be next to each other.

00:31:47 If you have a very long one,

00:31:49 you have the domains that are extremely flexible

00:31:52 and they carry out a lot of sort of

00:31:54 spatial reorganization, right?

00:31:56 That’s awesome.
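
[Editor's sketch] The "beads on a string" picture described here (domains as beads, linkers between them, termini at the two ends) can be written down as a small data structure. This is a minimal Python sketch under stated assumptions: the `Domain` class, the `flexible_regions` function, and the toy coordinates are all hypothetical and do not describe any real protein or any tool mentioned in the conversation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Domain:
    name: str   # hypothetical label, for illustration only
    start: int  # first residue of the domain (1-based, inclusive)
    end: int    # last residue of the domain (inclusive)


def flexible_regions(length: int, domains: List[Domain]) -> List[Tuple[str, int, int]]:
    """Label every stretch not covered by a domain as a terminus or a linker."""
    regions = []
    pos = 1  # next residue not yet assigned to a domain
    for d in sorted(domains, key=lambda d: d.start):
        if d.start > pos:
            kind = "N-terminus" if pos == 1 else "linker"
            regions.append((kind, pos, d.start - 1))
        pos = max(pos, d.end + 1)
    if pos <= length:
        regions.append(("C-terminus", pos, length))
    return regions


# A made-up 300-residue protein with two domains ("beads").
toy_domains = [Domain("A", 10, 120), Domain("B", 150, 280)]
regions = flexible_regions(300, toy_domains)
for kind, start, end in regions:
    print(f"{kind}: residues {start}-{end} ({end - start + 1} long)")
```

In this toy case the linker is 29 residues long. A shorter linker would hold the two domains close together, while a longer one would leave them free to rearrange, as described above.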

00:31:58 But on top of that, right, just this linker itself,

00:32:01 because it’s so flexible, it actually can adapt

00:32:05 to a lot of different shapes.

00:32:07 And therefore it’s a, it’s a very good interactor

00:32:11 when it comes to interaction between this protein

00:32:14 and other protein, right?

00:32:15 So these things also evolve, you know,

00:32:18 and they in a way have different sort of laws of

00:32:25 the driving laws that underlie the evolution

00:32:30 because they no longer need to,

00:32:33 to preserve certain structure, right?

00:32:37 Unlike protein domains.

00:32:38 And so on top of that,

00:32:41 you have something that is even less studied.

00:32:45 And this is something that is attributed to,

00:32:49 to the concept of alternative splicing.

00:32:53 So alternative splicing.

00:32:54 So it’s a, it’s a very cool concept.

00:32:56 It’s something that we’ve been fascinated about for,

00:33:00 you know, over a decade in my lab

00:33:03 and trying to do research with that.

00:33:05 But so, you know, so typically, you know,

00:33:08 a simplistic perspective is that one gene

00:33:12 is equal to one protein product, right?

00:33:16 So you have a gene, you know,

00:33:18 you transcribe it and translate it

00:33:21 and it becomes a protein.

00:33:24 In reality, when we talk about eukaryotes,

00:33:28 especially sort of more recent eukaryotes

00:33:32 that are very complex,

00:33:33 the gene is no longer equal to one protein.

00:33:40 It actually can produce multiple functionally,

00:33:47 you know, active protein products.

00:33:50 And each of them is, you know,

00:33:52 is called an alternatively spliced product.

00:33:57 The reason it happens is that if you look at the gene,

00:34:00 it actually has, it has also blocks.

00:34:05 And the blocks, some of which,

00:34:08 and it’s essentially, it goes like this.

00:34:10 So we have a block that will later be translated.

00:34:13 We call it exon.

00:34:15 Then we’ll have a block that is not translated, cut out.

00:34:19 We call it intron.

00:34:20 So we have exon, intron, exon, intron,

00:34:22 et cetera, et cetera, et cetera, right?

00:34:24 So sometimes you can have, you know,

00:34:26 dozens of these exons and introns.

00:34:29 So what happens is during the process

00:34:32 when the gene is converted to RNA,

00:34:37 we have things that are cut out,

00:34:41 the introns that are cut out,

00:34:43 and exons that now get assembled together.

00:34:47 And sometimes we will throw out some of the exons

00:34:52 and the remaining protein product will become

00:34:54 still be the same.

00:34:55 Different.

00:34:56 Oh, different.

00:34:57 So now you have fragments of the protein

00:34:59 that are no longer there.

00:35:01 They were cut out with the introns.

00:35:03 Sometimes you will essentially take one exon

00:35:07 and replace it with another one, right?

00:35:09 So there’s some flexibility in this process.

00:35:12 So that creates a whole new level of complexity.
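
The exon skipping described above can be sketched in a few lines of Python. This is a toy illustration, not a model of real splicing: the four exon sequences and the helper names `splice` and `isoforms` are made up for the example, and it assumes the first and last exon are always kept.

```python
from itertools import combinations

def splice(exons, skip=()):
    """Join exons in order, leaving out any whose index is in `skip`."""
    return "".join(seq for i, seq in enumerate(exons) if i not in skip)

def isoforms(exons):
    """All transcripts that keep the first and last exon
    and skip any subset of the internal exons."""
    internal = range(1, len(exons) - 1)
    found = set()
    for k in range(len(internal) + 1):
        for skipped in combinations(internal, k):
            found.add(splice(exons, skipped))
    return sorted(found)

# Hypothetical gene with the introns already cut out.
exons = ["ATGGCC", "GATTAC", "TTTGGA", "CCCTAA"]

for transcript in isoforms(exons):
    print(transcript)
```

Even with only two internal exons, one gene yields several distinct transcripts, which is the extra layer of complexity being described.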

00:35:17 Cause now.

00:35:18 Is this random though?

00:35:18 Is it random?

00:35:19 It’s not random.

00:35:20 We, and this is where I think now the appearance

00:35:24 of this modern single cell

00:35:27 and before that tissue level sequencing,

00:35:31 next generation sequencing techniques such as RNA-seq

00:35:34 allows us to see that these are the events

00:35:38 that often happen in response.

00:35:41 It’s a dynamic event that happens in response

00:35:44 to disease or in response

00:35:48 to certain developmental stage of a cell.

00:35:51 And this is an incredibly complex layer

00:35:56 that also undergoes, I mean,

00:35:59 because it’s at the gene level, right?

00:36:01 So it undergoes certain evolution, right?

00:36:05 And now we have this interplay

00:36:08 between what is happening in the protein world

00:36:12 and what is happening in the gene and RNA world.

00:36:17 And for example, it’s often that we see

00:36:22 that the boundaries of these exons coincide

00:36:28 with the boundaries of the protein domains, right?

00:36:32 So there is this close interplay to that.

00:36:36 It’s not always, I mean, otherwise it would be too simple,

00:36:39 right?

00:36:40 But we do see the connection

00:36:41 between those sort of machineries.

00:36:45 And obviously the evolution will pick up this complexity

00:36:49 and, you know.

00:36:51 Select for whatever is successful,

00:36:53 whatever is interesting function.

00:36:55 We see that complexity in play

00:36:57 and makes this question more complex, but more exciting.

00:37:02 Small detour, I don’t know if you think about this

00:37:05 into the world of computer science.

00:37:07 There’s a Douglas Hofstadter, I think,

00:37:11 came up with the name of Quine,

00:37:14 which are, I don’t know if you’re familiar

00:37:16 with these things, but it’s computer programs

00:37:18 that have, I guess, exon and intron,

00:37:22 and they copy, the whole purpose of the program

00:37:24 is to copy itself.

00:37:26 So it prints copies of itself,

00:37:28 but can also carry information inside of it.

00:37:30 So it’s a very kind of crude, fun exercise of,

00:37:36 can we sort of replicate these ideas from cells?

00:37:40 Can we have a computer program that when you run it,

00:37:42 just print itself, the entirety of itself,

00:37:47 and does it in different programming languages and so on.

00:37:50 I’ve been playing around and writing them.

00:37:51 It’s a kind of fun little exercise.
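
A self-printing program of the kind mentioned here can be sketched as follows. This is one classic two-line Python quine, not necessarily the form either speaker wrote. It is held in a string (the name `QUINE` and the helper `run` are assumptions for the example) so that the claim "the output equals the source" can be checked rather than taken on faith.

```python
import contextlib
import io

# The quine itself is the two-line program stored in this string.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"

def run(program: str) -> str:
    """Execute `program` in a fresh namespace and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue()

# The program's output should be character-for-character its own source.
print(run(QUINE) == QUINE)
```

The trick is that the string `s` is a template of the whole program, and formatting it with its own `repr` fills the template back in.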

00:37:53 You know, when I was a kid, so you know,

00:37:55 it was essentially one of the sort of main stages

00:38:02 in informatics Olympiads that you have to reach

00:38:08 in order to be any good,

00:38:10 is you should be able to write a program

00:38:14 that replicates itself.

00:38:16 And so the task then becomes even sort of more complicated.

00:38:20 So what is the shortest program?

00:38:24 And of course, it’s a function of a programming language,

00:38:27 but yeah, I remember a long, long, long time ago

00:38:30 when we tried to make it shorter and shorter

00:38:34 and find the shortcut.

00:38:36 There’s actually, on Stack Exchange, an entire site

00:38:41 called Code Golf, I think,

00:38:44 where the entirety is just the competition.

00:38:46 People just come up with whatever task, I don’t know,

00:38:50 like write code that reports the weather today.

00:38:54 And the competition is about whatever programming language,

00:38:58 what is the shortest program?

00:39:00 And it makes you actually, people should check it out

00:39:02 because it makes you realize

00:39:03 there’s some weird programming languages out there.

00:39:07 But just to dig on that a little deeper,

00:39:12 do you think, in computer science,

00:39:16 we don’t often think about programs,

00:39:19 just like the machine learning world now,

00:39:22 that’s still kind of basic programs.

00:39:26 And then there’s humans that replicate themselves, right?

00:39:29 And there’s these mutations and so on.

00:39:31 Do you think we’ll ever have a world

00:39:34 where there’s programs that kind of

00:39:37 have an evolutionary process?

00:39:40 So I’m not talking about evolutionary algorithms,

00:39:42 but I’m talking about programs that kind of

00:39:44 mate with each other and evolve

00:39:46 and like on their own replicate themselves.

00:39:49 So this is kind of the idea here is,

00:39:54 that’s how you can have a runaway thing.

00:39:57 So we think about machine learning as a system

00:39:59 that gets smarter and smarter and smarter and smarter.

00:40:01 At least the machine learning systems of today are like,

00:40:05 it’s a program that you can like turn off,

00:40:09 as opposed to throwing a bunch of little programs out there

00:40:12 and letting them like multiply and mate

00:40:15 and evolve and replicate.

00:40:17 Do you ever think about that kind of world,

00:40:20 when we jump from the biological systems

00:40:23 that you’re looking at to artificial ones?

00:40:27 I mean, it’s almost like you take the sort of the area

00:40:32 of intelligent agents, right?

00:40:34 Which are essentially the independent sort of codes

00:40:38 that run and interact and exchange the information, right?

00:40:42 So I don’t see why not.

00:40:45 I mean, it could be sort of a natural evolution

00:40:48 in this area of computer science.

00:40:52 I think it’s kind of an interesting possibility.

00:40:54 It’s terrifying too,

00:40:55 but I think it’s a really powerful tool.

00:40:58 Like to have like agents that, you know,

00:41:00 we have social networks with millions of people

00:41:02 and they interact.

00:41:03 I think it’s interesting to inject into that,

00:41:05 was already injected into that bots, right?

00:41:08 But those bots are pretty dumb.

00:41:11 You know, they’re probably pretty dumb algorithms.

00:41:15 You know, it’s interesting to think

00:41:17 that there might be bots that evolve together with humans.

00:41:20 And there’s the sea of humans and robots

00:41:23 that are operating first in the digital space.

00:41:26 And then you can also think, I love the idea.

00:41:29 Some people worked, I think at Harvard, at Penn,

00:41:32 there’s robotics labs that, you know,

00:41:37 take as a fundamental task to build a robot

00:41:40 that given extra resources can build another copy of itself,

00:41:44 like in the physical space,

00:41:46 which is super difficult to do, but super interesting.

00:41:50 I remember there’s like research on robots

00:41:54 that can build a bridge.

00:41:55 So they make a copy of themselves

00:41:56 and they connect themselves

00:41:57 and the sort of like self building bridge

00:42:00 based on building blocks.

00:42:02 You can imagine like a building that self assembles.

00:42:05 So it’s basically self assembling structures

00:42:07 from robotic parts.

00:42:10 But it’s interesting to, within that robot,

00:42:13 add the ability to mutate

00:42:15 and do all the interesting like little things

00:42:21 that you’re referring to in evolution

00:42:23 to go from a single origin protein building block

00:42:26 to like this weird complex.

00:42:28 And if you think about this, I mean, you know,

00:42:30 the bits and pieces are there, you know.

00:42:34 So you mentioned the evolutionary algorithm, right?

00:42:37 You know, so this is sort of,

00:42:38 and maybe sort of the goal is in a way different, right?

00:42:43 So the goal is to, you know, to essentially,

00:42:46 to optimize your search, right?

00:42:50 So, but sort of the ideas are there.

00:42:53 So people recognize that, you know,

00:42:55 that the recombination events lead to global changes

00:43:01 in the search trajectories, the mutation event

00:43:04 is a more refined, you know, step in the search.
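
The contrast between recombination (large jumps) and mutation (small refinements) can be sketched with a minimal evolutionary algorithm. Everything here is an assumption chosen for illustration: the "count the ones" fitness function, the population size, the mutation rate, and the function names.

```python
import random

def fitness(bits):
    """Toy objective: number of 1s in the bit string."""
    return sum(bits)

def crossover(a, b, rng):
    """Recombination: splice two parents at a random cut (a large jump)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rng, rate=0.02):
    """Mutation: flip each bit with small probability (a local refinement)."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(n_bits=32, pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    start_best = max(map(fitness, pop))
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return start_best, max(map(fitness, pop))

start, end = evolve()
print(f"best fitness: {start} -> {end} (maximum possible is 32)")
```

Because the fitter half is carried over unchanged, the best fitness can never decrease from one generation to the next in this sketch.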

00:43:09 Then you have, you know, other sort of

00:43:14 nature inspired algorithms, right?

00:43:16 So one of the reasons that, you know,

00:43:19 I think it’s one of the funnest ones

00:43:21 is the slime mold based algorithm, right?

00:43:24 So it’s, I think the first was introduced

00:43:28 by the Japanese group,

00:43:30 where it was able to solve some pretty complex problems.

00:43:35 So that’s, and then I think there are still a lot of things

00:43:43 we’ve yet to, you know, borrow from the nature, right?

00:43:48 So there are a lot of sort of ideas

00:43:52 that nature, you know, gets to offer us that, you know,

00:43:56 it’s up to us to grab it and to, you know,

00:44:01 get the best use of it.

00:44:02 Including neural networks, you know, we have a very crude

00:44:06 inspiration from nature on neural networks.

00:44:08 Maybe there’s other inspirations to be discovered

00:44:10 in the brain or other aspects of the various systems,

00:44:16 even like the immune system, the way it interplays.

00:44:20 I recently started to understand that the,

00:44:22 like the immune system has something to do

00:44:24 with the way the brain operates.

00:44:26 Like there’s multiple things going on in there,

00:44:28 which all of which are not modeled

00:44:30 in artificial neural networks.

00:44:32 And maybe if you throw a little bit of that biological spice

00:44:35 in there, you’ll come up with something, something cool.

00:44:39 I’m not sure if you’re familiar with the Drake equation

00:44:43 that estimate, I just did a video on it yesterday

00:44:46 because I wanted to give my own estimate of it.

00:44:49 It’s an equation that combines a bunch of factors

00:44:52 to estimate how many alien civilizations are in the galaxy.

00:44:56 I’ve heard about it, yes.

00:44:58 So one of the interesting parameters, you know,

00:45:01 it’s like how many stars are born every year,

00:45:05 how many planets are on average per star for this,

00:45:11 how many habitable planets are there.

00:45:14 And then the one that starts being really interesting

00:45:18 is the probability that life emerges on a habitable planet.

00:45:24 So like, I don’t know if you think about,

00:45:27 you certainly think a lot about evolution,

00:45:29 but do you think about the thing

00:45:31 which evolution doesn’t describe,

00:45:32 which is like the beginning of evolution, the origin of life.

00:45:36 I think I put the probability of life developing

00:45:39 in a habitable planet at 1%.

00:45:41 This is very scientifically rigorous.
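
The Drake equation in its standard form multiplies seven factors, N = R* · f_p · n_e · f_l · f_i · f_c · L. The sketch below shows how the estimate scales with the one factor discussed here, the probability f_l that life emerges on a habitable planet. Only the 1% figure comes from the conversation; every other number is a placeholder chosen to keep the arithmetic easy to follow, not an estimate.

```python
def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Expected number of detectable civilizations in the galaxy.

    r_star   : stars formed per year
    f_p      : fraction of stars with planets
    n_e      : habitable planets per star that has planets
    f_l      : fraction of habitable planets where life emerges
    f_i      : fraction of those where intelligence evolves
    f_c      : fraction of those that become detectable
    lifetime : years a civilization remains detectable
    """
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime

# Placeholder values (assumptions), except f_l which is varied below.
placeholders = dict(r_star=1.0, f_p=1.0, n_e=1.0, f_i=1.0, f_c=1.0, lifetime=1000.0)

for f_l in (0.001, 0.01, 0.1):
    print(f"f_l = {f_l:<5} -> N = {drake(f_l=f_l, **placeholders):g}")
```

Since the equation is a plain product, a tenfold change in f_l changes the final estimate tenfold, which is why this single uncertain factor matters so much.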

00:45:44 Okay, well, first at a high level for the Drake equation,

00:45:48 what would you put that percent at on earth?

00:45:51 And in general, do you have something,

00:45:55 do you have thoughts about how life might’ve started,

00:45:58 you know, like the proteins being the first kind of,

00:46:01 one of the early jumping points?

00:46:02 Yeah, so I think back in 2018,

00:46:07 there was a very exciting paper published in Nature

00:46:10 where they found one of the simplest amino acids,

00:46:18 glycine, in a comet dust.

00:46:23 So this is, and I apologize if I don’t pronounce,

00:46:29 it’s a Russian named comet,

00:46:31 it’s I think Churyumov-Gerasimenko.

00:46:34 This is the comet where, and there was this mission

00:46:40 to get close to this comet and get the stardust

00:46:46 from its tail.

00:46:48 And when scientists analyzed it,

00:46:50 they actually found traces of, you know, of glycine,

00:46:56 which, you know, makes up, you know,

00:46:59 it’s one of the basic, one of the 20 basic amino acids

00:47:04 that makes up proteins, right?

00:47:06 So that was kind of very exciting, right?

00:47:10 But, you know, the question is very interesting, right?

00:47:14 So what, you know, if there is some alien life,

00:47:18 is it gonna be made of proteins, right?

00:47:22 Or maybe RNAs, right?

00:47:24 So we see that, you know, the RNA viruses are certainly,

00:47:29 you know, very well established sort of, you know,

00:47:35 group of molecular machines, right?

00:47:37 So, yeah, it’s a very interesting question.

00:47:42 What probability would you put?

00:47:43 Like, how hard is this job?

00:47:45 Like, how unlikely just on Earth do you think

00:47:48 this whole thing is that we got going?

00:47:51 Like, are we really lucky or is it inevitable?

00:47:54 Like, what’s your sense when you sit back

00:47:56 and think about life on Earth?

00:47:58 Is it higher or lower than 1%?

00:48:00 Well, because 1% is pretty low, but it still is like,

00:48:03 damn, that’s a pretty good chance.

00:48:05 Yes, it’s a pretty good chance.

00:48:06 I mean, I would, personally, but again, you know,

00:48:10 I’m, you know, probably not the best person

00:48:14 to do such estimations, but I would, you know,

00:48:19 intuitively, I would probably put it lower.

00:48:23 But still, I mean, you know, given.

00:48:24 So we’re really lucky here on Earth.

00:48:27 I mean.

00:48:28 Or the conditions are really good.

00:48:30 It’s, you know, I think that there was,

00:48:32 everything was right in a way, right?

00:48:35 So we still, it’s not, the conditions were not like ideal

00:48:39 if you try to look at, you know, what was, you know,

00:48:44 several billion years ago when the life emerged.

00:48:48 So there is something called the Rare Earth Hypothesis

00:48:52 that, you know, in counter to the Drake Equation says

00:48:55 that the, you know, the conditions of Earth,

00:49:00 if you actually were to describe Earth,

00:49:03 it’s quite a special place.

00:49:05 So special it might be unique in our galaxy

00:49:09 and potentially, you know, close to unique

00:49:11 in the entire universe.

00:49:12 Like it’s very difficult to reconstruct

00:49:14 those same conditions.

00:49:16 And what the Rare Earth Hypothesis argues

00:49:19 is all those different conditions are essential for life.

00:49:23 And so that’s sort of the counter, you know,

00:49:26 like all the things we, you know,

00:49:29 thinking that Earth is pretty average.

00:49:31 I mean, I can’t really, I’m trying to remember

00:49:34 to go through all of them, but just the fact

00:49:36 that it is shielded from a lot of asteroids,

00:49:41 the, obviously the distance to the sun,

00:49:43 but also the fact that it’s like a perfect balance

00:49:48 between the amount of water and land

00:49:52 and all those kinds of things.

00:49:53 I don’t know, there’s a bunch of different factors

00:49:55 that I don’t remember, there’s a long list.

00:49:57 But it’s fascinating to think about if in order

00:50:01 for something like proteins and then DNA and RNA

00:50:05 to emerge, you need, and basic living organisms,

00:50:10 you need to be very close to an Earth like planet,

00:50:14 which will be sad or exciting, I don’t know which.

00:50:19 If you ask me, I, you know, in a way I put a parallel

00:50:23 between, you know, between our own research.

00:50:28 And I mean, from the intuitive perspective,

00:50:33 you know, you have those two extremes

00:50:36 and the reality is never very rarely falls

00:50:40 into the extremes.

00:50:41 The optimum is always reached somewhere in between.

00:50:46 So, and that’s what I tend to think.

00:50:50 I think that, you know, we’re probably somewhere in between.

00:50:54 So we’re not unique, unique, but again,

00:50:58 the chances are, you know, reasonably small.

00:51:01 The problem is we don’t know the other extreme

00:51:04 is like, I tend to think that we don’t actually understand

00:51:08 the basic mechanisms of like what this is all originated

00:51:11 from, like, it seems like we think of life

00:51:15 as this distinct thing, maybe intelligence

00:51:17 is a distinct thing, maybe the physics that,

00:51:20 from which planets and suns are born is a distinct thing.

00:51:24 But that could be a very, it’s like the Stephen Wolfram

00:51:27 thing, it’s like the, from simple rules emerges

00:51:29 greater and greater complexity.

00:51:31 So, you know, I tend to believe that just life finds a way.

00:51:36 Like, we don’t know the extreme of how common life is

00:51:39 because it could be life is like everywhere.

00:51:44 Like, so everywhere that it’s almost like laughable,

00:51:49 like that we’re such idiots to think who are you?

00:51:52 Like, it’s like ridiculous to even like think,

00:51:56 it’s like ants thinking that their little colony

00:51:59 is the unique thing and everything else doesn’t exist.

00:52:03 I mean, it’s also very possible that that’s the extreme

00:52:07 and we’re just not able to maybe comprehend

00:52:09 the nature of that life.

00:52:12 Just to stick on alien life for just a brief moment more,

00:52:16 there are some signs of life on Venus in gaseous form.

00:52:22 There’s hope for life on Mars, probably extinct.

00:52:27 We’re not talking about intelligent life.

00:52:29 Although that has been in the news recently.

00:52:32 We’re talking about basic like, you know, bacteria.

00:52:36 Yeah, and then also, I guess, there’s a couple moons.

00:52:40 Europa.

00:52:41 Yeah, Europa, which is Jupiter’s moon.

00:52:45 I think there’s another one.

00:52:46 Are you, is that exciting or is it terrifying to you

00:52:50 that we might find life?

00:52:52 Do you hope we find life?

00:52:53 I certainly do hope that we find life.

00:52:56 I mean, it was very exciting to hear about this news

00:53:05 about the possible life on Venus.

00:53:09 It’d be nice to have hard evidence of something with,

00:53:12 which is what the hope is for Mars and Europa.

00:53:17 But do you think those organisms

00:53:18 will be similar biologically

00:53:20 or would they even be sort of carbon based

00:53:23 if we do find them?

00:53:25 I would say they would be carbon based.

00:53:28 How similar, it’s a big question, right?

00:53:31 So it’s the moment we discover things outside Earth, right?

00:53:39 Even if it’s a tiny little single cell.

00:53:43 I mean, there is so much.

00:53:45 Just imagine that, that would be so.

00:53:47 I think that that would be another turning point

00:53:50 for the science, you know?

00:53:52 Especially if it’s different in some very new way.

00:53:56 That’s exciting.

00:53:57 Because that says, that’s a definitive statement,

00:53:59 not a definitive, but a pretty strong statement

00:54:01 that life is everywhere in the universe.

00:54:05 To me at least, that’s really exciting.

00:54:08 You brought up Joshua Lederberg in an offline conversation.

00:54:13 I think I’d love to talk to you about AlphaFold

00:54:15 and this might be an interesting way

00:54:17 to enter that conversation because,

00:54:19 so he won the 1958 Nobel Prize in Physiology or Medicine

00:54:24 for discovering that bacteria can mate and exchange genes.

00:54:29 But he also did a ton of other stuff,

00:54:32 like we mentioned, helping NASA find life on Mars

00:54:37 and the…

00:54:40 Dendral. Dendral.

00:54:42 The chemical expert system.

00:54:45 Expert systems, remember those?

00:54:46 What do you find interesting about this guy

00:54:51 and his ideas about artificial intelligence in general?

00:54:54 So I have a kind of personal story to share.

00:55:00 So I started my PhD in Canada back in 2000.

00:55:05 And so essentially my PhD was,

00:55:07 so we were developing sort of a new language

00:55:10 for symbolic machine learning.

00:55:12 So it’s different from the feature based machine learning.

00:55:15 And one of the sort of cleanest applications

00:55:19 of this approach, of this formalism

00:55:23 was to cheminformatics and computer aided drug design.

00:55:28 So essentially we were, as a part of my research,

00:55:33 I developed a system that essentially looked

00:55:37 at chemical compounds of say the same therapeutic category,

00:55:42 you know, male hormones, right?

00:55:45 And try to figure out the structural fragments

00:55:51 that are the structural building blocks

00:55:54 that are important that define this class

00:55:58 versus structural building blocks

00:55:59 that are there just because, you know,

00:56:02 to complete the structure.

00:56:04 But they are not essentially the ones

00:56:06 that make up the chemical, the key chemical properties

00:56:10 of this therapeutic category.

00:56:12 And, you know, for me, it was something new.

00:56:16 I was trained as an applied mathematician, you know,

00:56:20 with some machine learning background,

00:56:22 but, you know, computer aided drug design

00:56:25 was a completely new territory.

00:56:27 So because of that, I often find myself

00:56:31 asking lots of questions on one of these

00:56:34 sort of central forums.

00:56:36 Back then, there were no Facebooks or stuff like that.

00:56:40 There was a forum, you know, it’s a forum.

00:56:43 It’s essentially, it’s like a bulletin board.

00:56:45 Yeah.

00:56:46 On the internet.

00:56:47 Yeah, so you essentially, you have a bunch of people

00:56:50 and you post a question and you get, you know,

00:56:52 an answer from, you know, different people.

00:56:55 And back then, just like one of the most popular forums

00:56:59 was CCL, I think Computational Chemistry Library,

00:57:04 not library, but something like that,

00:57:07 but CCL, that was the forum.

00:57:09 And there, I, you know, I…

00:57:12 Asked a lot of dumb questions.

00:57:14 Yes, I asked questions.

00:57:15 Also shared some, you know, some information

00:57:19 about our formalism and how we do it

00:57:21 and whether whatever we do makes sense.

00:57:25 And so, you know, and I remember that one of these posts,

00:57:29 I mean, I still remember, you know,

00:57:31 I would call it desperately looking

00:57:35 for a chemist’s advice, something like that, right?

00:57:40 And so I post my question, I explained, you know,

00:57:43 what our formalism is, what it does

00:57:49 and what kind of applications I’m planning to do.

00:57:53 And, you know, and it was, you know,

00:57:55 in the middle of the night and I went back to bed.

00:57:59 And next morning, have a phone call from my advisor

00:58:04 who also looked at this forum.

00:58:06 It’s like, you won’t believe who replied to you.

00:58:11 And it’s like, who?

00:58:13 And he said, well, you know, there is a message

00:58:16 to you from Joshua Lederberg.

00:58:19 And my reaction was like, who is Joshua Lederberg?

00:58:22 Your advisor hung up. So, and essentially, you know,

00:58:29 Joshua wrote me that we had conceptually similar ideas

00:58:34 in the Dendral project.

00:58:36 You may wanna look it up.

00:58:39 And we should also, sorry, and it’s a side comment,

00:58:42 say that even though he won the Nobel Prize

00:58:45 at a really young age, in 58, but so he was,

00:58:49 I think he was what, 33.

00:58:52 It’s just crazy.

00:58:53 So anyway, so that’s, so hence in the 90s,

00:58:57 responding to young whippersnappers on the CCL forum.

00:59:02 Okay.

00:59:02 And so back then he was already very senior.

00:59:05 I mean, he unfortunately passed away back in 2008,

00:59:09 but, you know, back in 2001, he was, I mean,

00:59:12 he was a professor emeritus at Rockefeller University.

00:59:15 And, you know, that was actually, believe it or not,

00:59:18 one of the reasons I decided to join, you know,

00:59:25 as a postdoc, the group of Andrej Sali,

00:59:28 who was at Rockefeller University,

00:59:30 with the hope that, you know, that I could actually,

00:59:33 you know, have a chance to meet Joshua in person.

00:59:38 And I met him very briefly, right?

00:59:42 Just because he was walking, you know,

00:59:45 there’s a little bridge that connects the,

00:59:47 sort of the research campus with the,

00:59:51 with the sort of skyscraper that Rockefeller owns,

00:59:55 the where, you know, postdocs and faculty

00:59:58 and graduate students live.

01:00:00 And so I met him, you know,

01:00:02 and had a very short conversation, you know.

01:00:06 But so I started, you know, reading about Dendral

01:00:10 and I was amazed, you know, it’s,

01:00:12 we’re talking about 1960, right?

01:00:16 The ideas were so profound.

01:00:19 Well, what’s the fun about the ideas of it?

01:00:21 The reason to make this is even crazier.

01:00:25 So, Lederberg wanted to make a system

01:00:29 that would help him study the extraterrestrial molecules,

01:00:38 right?

01:00:39 So, the idea was that, you know,

01:00:40 the way you study the extraterrestrial molecules

01:00:43 is you do the mass spec analysis, right?

01:00:46 And so the mass spec gives you sort of bits,

01:00:49 numbers about essentially gives you the ideas

01:00:52 about the possible fragments or, you know,

01:00:55 atoms, you know, and maybe a little fragments,

01:00:59 pieces of this molecule that make up the molecule, right?

01:01:03 So now you need to sort of,

01:01:06 to decompose this information

01:01:09 and to figure out what was the whole

01:01:12 before it became fragments, bits and pieces, right?

01:01:17 So, in order to make this, you know,

01:01:20 to have this tool, the idea of Lederberg

01:01:25 was to connect chemistry, computer science,

01:01:32 and to design this so called expert system

01:01:36 that looks, that takes into account,

01:01:38 that takes as an input the mass spec data,

01:01:42 the possible database of possible molecules

01:01:47 and essentially try to sort of induce the molecule

01:01:52 that would correspond to this spectra

01:01:55 or, you know, essentially what this project ended up being

01:02:03 was that, you know, it would provide a list of candidates

01:02:07 that then a chemist would look at and make the final decision.
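
The "list of candidates for a chemist to review" idea can be sketched as a generate-and-test loop. This is a deliberately crude toy and not Dendral's actual algorithm: it enumerates only C/H/N/O formulas, uses nominal integer masses, takes a single molecular mass rather than a full spectrum, and applies one rough saturation rule as the test. All names are assumptions for the example.

```python
# Nominal (integer) atomic masses; real mass spectrometry uses exact masses.
MASS = {"C": 12, "H": 1, "N": 14, "O": 16}

def candidate_formulas(target_mass):
    """Return (C, H, N, O) atom counts whose nominal mass equals target_mass."""
    hits = []
    for c in range(target_mass // MASS["C"] + 1):
        rest_c = target_mass - c * MASS["C"]
        for n in range(rest_c // MASS["N"] + 1):
            rest_n = rest_c - n * MASS["N"]
            for o in range(rest_n // MASS["O"] + 1):
                h = rest_n - o * MASS["O"]  # hydrogens make up the remainder
                # Test step: reject formulas with more hydrogens than a
                # saturated acyclic molecule could hold (2C + N + 2).
                if h <= 2 * c + n + 2:
                    hits.append((c, h, n, o))
    return hits

for c, h, n, o in candidate_formulas(46):
    print(f"C{c} H{h} N{n} O{o}")
```

The generator proposes everything consistent with the measurement, the rule prunes the chemically implausible, and a human expert would judge what remains. For a nominal mass of 46, the surviving candidates include C2H6O, the formula of ethanol.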

01:02:11 So.

01:02:12 But the original idea, I suppose,

01:02:13 is to solve the entirety of this problem automatically.

01:02:16 Yes, yes.

01:02:17 So he, you know, so he,

01:02:21 back then he approached. 60s.

01:02:25 Yes, believe that, it’s amazing.

01:02:28 I mean, it still blows my mind, you know, that it’s,

01:02:32 that’s, and this was essentially the origin

01:02:37 of the modern bioinformatics, cheminformatics,

01:02:41 you know, back in the 60s.

01:02:42 So that’s, you know, every time you deal with projects

01:02:48 like this, with the, you know, research like this,

01:02:51 you just, you know, so the power of the, you know,

01:02:56 intelligence of these people is just, you know, overwhelming.

01:03:01 Do you think about expert systems, is there,

01:03:05 and why they kind of didn’t become successful,

01:03:10 especially in the space of bioinformatics,

01:03:12 where it does seem like there is a lot of expertise

01:03:15 in humans, and, you know, it’s possible to see

01:03:20 that a system like this could be made very useful.

01:03:23 Right.

01:03:24 And be built up.

01:03:25 So it’s actually, it’s a great question,

01:03:26 and this is something, so, you know, so, you know,

01:03:30 at my university, I teach artificial intelligence,

01:03:33 and, you know, we start, my first two lectures

01:03:37 are on the history of AI.

01:03:40 And there we, you know, we try to, you know,

01:03:45 go through the main stages of AI.

01:03:48 And so, you know, the question of why expert systems failed

01:03:54 or became obsolete, it’s actually a very interesting one.

01:03:58 And there are, you know, if you try to read the, you know,

01:04:01 the historical perspectives,

01:04:03 there are actually two lines of thought.

01:04:05 One is that they were essentially

01:04:11 not up to the expectations.

01:04:14 And so therefore they were replaced, you know,

01:04:18 by other things, right?

01:04:21 The other one was that completely opposite one,

01:04:25 that they were too good.

01:04:28 And as a result, they essentially became

01:04:31 sort of a household name,

01:04:33 and then essentially they got transformed.

01:04:37 I mean, in both cases, sort of the outcome was the same.

01:04:40 They evolved into something, right?

01:04:43 And that’s what I, you know, if I look at this, right?

01:04:47 So the modern machine learning, right?

01:04:50 So.

01:04:51 So there’s echoes in the modern machine learning.

01:04:53 I think so, I think so, because, you know,

01:04:55 if you think about this, you know, and how we design,

01:04:59 you know, the most successful algorithms,

01:05:02 including AlphaFold, right?

01:05:04 You build in the knowledge about the domain

01:05:08 that you study, right?

01:05:09 So you build in your expertise.

01:05:12 So speaking of AlphaFold,

01:05:14 so DeepMind’s AlphaFold 2 recently was announced

01:05:18 to have, quote unquote, solved protein folding.

01:05:21 But how exciting is this to you?

01:05:24 It seems to be one of the,

01:05:27 one of the exciting things that have happened in 2020.

01:05:29 It’s an incredible accomplishment from the looks of it.

01:05:32 What part of it is amazing to you?

01:05:33 What part would you say is overhyped

01:05:36 or maybe misunderstood?

01:05:39 It’s definitely a very exciting achievement.

01:05:41 To give you a little bit of perspective, right?

01:05:43 So in bioinformatics, we have several competitions.

01:05:50 And so the way, you know, you often hear

01:05:53 how those competitions have been explained

01:05:56 to sort of to non bioinformaticians is that, you know,

01:05:59 they call it bioinformatics Olympic games.

01:06:01 And there are several disciplines, right?

01:06:03 So the historically one of the first one

01:06:07 was the discipline in predicting the protein structure,

01:06:10 predicting the 3D coordinates of the protein.

01:06:12 But there are some others.

01:06:13 So the predicting protein functions,

01:06:16 predicting effects of mutations on protein functions,

01:06:21 then predicting protein-protein interactions.

01:06:24 So the original one was CASP

01:06:28 or Critical Assessment of protein Structure Prediction.

01:06:32 And the, you know, typically what happens

01:06:40 during these competitions is, you know, scientists,

01:06:43 experimental scientists solve the structures,

01:06:48 but don’t put them into the Protein Data Bank,

01:06:51 which is the centralized database

01:06:54 that contains all the 3D coordinates.

01:06:57 Instead, they hold it and release protein sequences.

01:07:02 And now the challenge of the community

01:07:05 is to predict the 3D structures of these proteins

01:07:10 and then use the experimentally solved structures

01:07:12 to assess which one is the closest one, right?
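
The assessment step, comparing each predicted structure to the withheld experimental one and picking the closest, can be sketched with a root-mean-square deviation over atom coordinates. This is a simplification: CASP's headline scores (such as GDT_TS) and the structural superposition done before scoring are omitted, and the coordinates and model names below are invented for the example.

```python
import math

def rmsd(predicted, reference):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes both structures are already superimposed in the same frame.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(reference))

# Hypothetical three-atom "experimental" structure and two submitted models.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
models = {
    "A": [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)],
    "B": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],
}

closest = min(models, key=lambda name: rmsd(models[name], experimental))
print("closest model:", closest)
```

A lower deviation means the prediction sits closer to the experimental coordinates, so ranking submissions by this number gives a crude version of the leaderboard.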

01:07:16 And this competition, by the way,

01:07:17 just a bunch of different tangents.

01:07:19 And maybe you can also say, what is protein folding?

01:07:22 Then this competition, CASP competition

01:07:25 has become the gold standard.

01:07:27 And that’s what was used to say

01:07:29 that protein folding was solved.

01:07:32 So just to add a little, just a bunch.

01:07:35 So if you could, whenever you say stuff,

01:07:37 maybe throw in some of the basics

01:07:39 for the folks that might be outside of the field.

01:07:41 Anyway, sorry.

01:07:42 So, yeah, so, you know, so the reason it’s, you know,

01:07:45 it’s relevant to our understanding of protein folding

01:07:50 is because, you know, we’ve yet to learn

01:07:54 how the folding mechanistically works, right?

01:07:58 So there are different hypotheses,

01:08:00 what happens to this fold?

01:08:02 For example, there is a hypothesis that the folding happens

01:08:07 by, you know, also in the modular fashion, right?

01:08:12 So that, you know, we have protein domains

01:08:16 that get folded independently

01:08:17 because their structure is stable.

01:08:19 And then the whole protein structure gets formed.

01:08:23 But, you know, within those domains,

01:08:25 we also have a so called secondary structure,

01:08:27 the small alpha helices, beta sheets.

01:08:29 So these are, you know, elements that are structurally stable.

01:08:34 And so, and the question is, you know,

01:08:37 when do they get formed?

01:08:40 Because some of the secondary structure elements,

01:08:42 you have to have, you know, a fragment in the beginning

01:08:46 and say the fragment in the middle, right?

01:08:49 So you cannot potentially start having the full fold

01:08:54 from the get go, right?

01:08:57 So it’s still, you know, it’s still a big enigma,

01:09:00 what happens.

01:09:01 We know that it’s an extremely efficient

01:09:04 and stable process, right?

01:09:05 So there’s this long sequence

01:09:07 and the fold happens really quickly.

01:09:09 Exactly.

01:09:10 So that’s really weird, right?

01:09:11 And it happens like the same way almost every time.

01:09:15 Exactly, exactly.

01:09:16 That’s really weird.

01:09:17 That’s freaking weird.

01:09:19 It’s, yeah, that’s why it’s such an amazing thing.

01:09:22 But most importantly, right?

01:09:24 So it’s, you know, so when you see the, you know,

01:09:27 the translation process, right?

01:09:29 So when you don’t have the whole protein translated,

01:09:36 right, it’s still being translated,

01:09:37 you know, getting out from the ribosome,

01:09:41 you already see some structural, you know, formation.

01:09:45 So folding starts happening

01:09:49 before the whole protein gets produced, right?

01:09:52 And so this is obviously, you know,

01:09:55 one of the biggest questions in, you know,

01:09:59 in modern molecular biology.

01:10:00 Not like maybe what happens,

01:10:04 like that’s not, that’s bigger than the question of folding.

01:10:07 That’s the question of like,

01:10:09 something like deeper fundamental idea of folding.

01:10:12 Yes. Behind folding.

01:10:13 Exactly, exactly.

01:10:14 So, you know, so obviously if we are able to predict

01:10:21 the end product of protein folding,

01:10:24 we are one step closer to understanding

01:10:27 sort of the mechanisms of the protein folding.

01:10:30 Because we can then potentially look and start probing

01:10:34 what are the critical parts of this process

01:10:38 and what are not so critical parts of this process.

01:10:41 So we can start decomposing this, you know,

01:10:44 so in a way this protein structure prediction algorithm

01:10:50 can be used as a tool, right?

01:10:53 So you change the, you know, you modify the protein,

01:10:59 you get back to this tool, it predicts,

01:11:02 okay, it’s completely unstable.

01:11:04 Yeah, which aspects of the input

01:11:07 will have a big impact on the output?

01:11:09 Exactly, exactly.
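
The idea of using a predictor as a probing tool can be sketched as a single-point mutation scan. This is only an illustration: `toy_predictor` is a made-up stand-in that pretends positions 1 and 2 form a hydrophobic core, and the function names are hypothetical. A real study would plug in an actual structure or stability predictor in its place.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")


def single_point_mutants(sequence):
    """Yield (position, mutated_sequence) for every single substitution."""
    for pos, original in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue != original:
                yield pos, sequence[:pos] + residue + sequence[pos + 1:]


def sensitivity_by_position(sequence, predict):
    """Largest change in the predictor's output caused by mutating each position."""
    baseline = predict(sequence)
    impact = [0.0] * len(sequence)
    for pos, mutant in single_point_mutants(sequence):
        impact[pos] = max(impact[pos], abs(predict(mutant) - baseline))
    return impact


def toy_predictor(sequence):
    """Stand-in score: counts hydrophobic residues at assumed core positions 1 and 2."""
    return float(sum(1 for i in (1, 2) if sequence[i] in HYDROPHOBIC))


impact = sensitivity_by_position("KLVK", toy_predictor)
print(impact)
```

Positions whose mutation changes the output the most are the candidates for "critical parts" of the input; positions with zero impact are the ones the predictor does not depend on.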

01:11:11 So what happens is, you know,

01:11:13 we typically have some sort of incremental advancement,

01:11:18 you know, each stage of this CASP competition,

01:11:22 you have groups with incremental advancement

01:11:25 and, you know, historically the top performing groups

01:11:29 were, you know, they were not using machine learning.

01:11:34 They were using a very advanced biophysics

01:11:37 combined with bioinformatics,

01:11:39 combined with, you know, the data mining

01:11:43 and that was, you know, that would enable them

01:11:47 to obtain protein structures of those proteins

01:11:52 that don’t have any structurally solved relatives

01:11:57 because, you know, if we have another protein,

01:12:01 say the same protein, but coming from a different species,

01:12:07 we could potentially derive some ideas

01:12:10 and that’s so called homology or comparative modeling,

01:12:13 where we’ll derive some ideas

01:12:15 from the previously known structures

01:12:17 and that would help us tremendously

01:12:19 in, you know, in reconstructing the 3D structure overall.

01:12:25 But what happens when we don’t have these relatives?

01:12:27 This is when it becomes really, really hard, right?

01:12:31 So that’s so called de novo, you know,

01:12:35 de novo protein structure prediction.

01:12:37 And in this case, those methods were traditionally very good.

01:12:43 But what happened in the last year,

01:12:46 the original AlphaFold came in

01:12:50 and all of a sudden it’s much better than everyone else.

01:12:56 This is 2018.

01:12:57 Yeah.

01:12:58 Oh, and the competition is only every two years, I think.

01:13:02 And then, so, you know, it was sort of a shockwave

01:13:08 to the bioinformatics community that, you know,

01:13:10 we have like a state of the art machine learning system

01:13:15 that does, you know, structure prediction.

01:13:18 And essentially what it does, you know,

01:13:20 so if you look at this, it actually predicts the contacts.

01:13:26 So, you know, so the process of reconstructing

01:13:29 the 3D structure starts by predicting the contacts

01:13:34 between the different parts of the protein.

01:13:38 And the contacts essentially are the parts of the proteins

01:13:40 that are in a close proximity to each other.

01:13:43 Right, so actually the machine learning part

01:13:45 seems to be estimating, you can correct me if I’m wrong here,

01:13:51 but it seems to be estimating the distance matrix,

01:13:53 which is like the distance between the different parts.

01:13:55 Yeah, so we call it the contact map.

01:13:58 Contact map.

01:13:58 So once you have the contact map,

01:14:00 the reconstruction is becoming more straightforward, right?

01:14:04 But so the contact map is the key.
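
To make the relationship between the distance matrix and the contact map concrete, here is a minimal sketch that goes in the easy direction, from known coordinates to a contact map. The 8 Å cutoff is a commonly used convention rather than anything stated in the conversation, and the coordinates and the `min_separation` parameter are assumptions for illustration. The prediction problem discussed here runs the other way: estimate this map from sequence, then rebuild coordinates from it.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def contact_map(coords, threshold=8.0, min_separation=3):
    """1 where two residues are close in space but not adjacent in sequence."""
    dists = distance_matrix(coords)
    n = len(coords)
    return [[1 if abs(i - j) >= min_separation and dists[i][j] <= threshold
             else 0
             for j in range(n)]
            for i in range(n)]


# Hypothetical six-residue hairpin: two strands running in opposite directions.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

cmap = contact_map(coords)
for row in cmap:
    print(row)
```

Residues 0 and 5 are far apart in sequence but only 5 Å apart in space, so they show up as a contact; sequence neighbors are excluded because their proximity carries no information about the fold.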

01:14:06 And so, you know, so that’s what happened.

01:14:11 And now we started seeing in this current stage, right?

01:14:15 Well, in the most recent one,

01:14:18 we started seeing the emergence of these ideas

01:14:22 in other people’s works, right?

01:14:25 But yet here’s, you know, AlphaFold2

01:14:29 that again outperforms everyone else.

01:14:33 And also by introducing yet another wave

01:14:35 of the machine learning ideas.

01:14:38 Yeah, there does seem to be also an incorporation.

01:14:41 First of all, the paper is not out yet,

01:14:43 but there’s a bunch of ideas already out.

01:14:44 There does seem to be an incorporation of this other thing.

01:14:48 I don’t know if it’s something that you could speak to,

01:14:50 which is like the incorporation of like other structures,

01:14:58 like evolutionarily similar structures

01:15:01 that are used to kind of give you hints.

01:15:03 Yes, so evolutionary similarity is something

01:15:08 that we can detect at different levels, right?

01:15:10 So we know, for example,

01:15:12 that the structure of proteins

01:15:17 is more conserved than the sequence.

01:15:20 The sequence could be very different,

01:15:22 but the structural shape is actually still very conserved.

01:15:26 So that’s sort of the intrinsic property that, you know,

01:15:28 in a way related to protein folds,

01:15:31 you know, to the evolution of the, you know,

01:15:34 of the proteins and protein domains, et cetera.

01:15:37 But we know that, I mean, there’ve been multiple studies.

01:15:41 And, you know, ideally, if you have structures,

01:15:45 you know, you should use that information.

01:15:48 However, sometimes we don’t have this information.

01:15:51 Instead, we have a bunch of sequences.

01:15:53 Sequences, we have a lot, right?

01:15:54 So we have, you know, hundreds, thousands

01:16:00 of, you know, different organisms sequenced, right?

01:16:04 And by taking the same protein,

01:16:07 but in different organisms and aligning it,

01:16:11 so making it, you know, making the corresponding positions

01:16:15 aligned, we can actually say a lot

01:16:20 about sort of what is conserved in this protein

01:16:24 and therefore, you know, structurally more stable,

01:16:26 what is diverse in this protein.

01:16:28 So on top of that, we could provide sort of the information

01:16:32 about the sort of the secondary structure

01:16:35 of this protein, et cetera, et cetera.

01:16:36 So this information is extremely useful

01:16:39 and it’s already there.
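
As a rough illustration of what an alignment can reveal, the sketch below scores each aligned column by how often its most common residue appears. The four sequences are invented, and this majority-fraction score is just one simple choice among many conservation measures; it is not the method used by any particular predictor.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Fraction of sequences sharing the most common residue in each column.

    `alignment` is a list of equal-length aligned sequences; gaps count
    against conservation.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical alignment of the "same" protein from four organisms.
alignment = [
    "MKVLA",
    "MKILA",
    "MRVLG",
    "MK-LA",
]

scores = column_conservation(alignment)
print(scores)
```

Columns scoring 1.0 are fully conserved across the organisms, while lower scores mark the positions that have been free to vary.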

01:16:41 So while it’s tempting to, you know,

01:16:44 to do a complete ab initio,

01:16:46 so you just have a protein sequence and nothing else,

01:16:49 the reality is such that we are overwhelmed with this data.

01:16:54 So why not use it?

01:16:56 And so, yeah, so I’m looking forward

01:16:59 to reading this paper.

01:17:01 It does seem to, like they’ve,

01:17:03 in the previous version of AlphaFold,

01:17:05 they didn’t, for this evolutionary similarity thing,

01:17:09 they didn’t use machine learning for that.

01:17:12 Or rather, they used it as like the input

01:17:15 to the entirety of the neural net,

01:17:17 like the features derived from the similarity.

01:17:22 It seems like there’s some kind of quote, unquote,

01:17:24 iterative thing where it seems to be part of the learning

01:17:30 process is the incorporation of this evolutionary similarity.

01:17:34 Yeah, I don’t think there is a bioRxiv paper, right?

01:17:36 There’s nothing.

01:17:37 No, there’s nothing.

01:17:38 There’s a blog post that’s written

01:17:40 by a marketing team, essentially,

01:17:42 which, you know, it has some scientific similarity,

01:17:48 probably, to the actual methodology used,

01:17:51 but it could be, it’s like interpreting scripture.

01:17:55 It could be just poetic interpretations of the actual work

01:17:59 as opposed to direct connection to the work.

01:18:01 So now, speaking about protein folding, right?

01:18:04 So, you know, in order to answer the question

01:18:06 whether or not we have solved this, right?

01:18:09 So we need to go back to the beginning of our conversation

01:18:13 with the realization that an average protein

01:18:16 is that typically what the CASP has been focusing on

01:18:22 is this competition has been focusing

01:18:25 on the single, maybe two domain proteins

01:18:29 that are still very compact.

01:18:31 And even those ones are extremely challenging to solve.

01:18:35 But now we talk about, you know,

01:18:37 an average protein that has two, three protein domains.

01:18:42 If you look at the proteins that are in charge

01:18:46 of the, you know, of the processes within the neural system,

01:18:51 right, perhaps one of the most recently evolved

01:18:58 sort of systems in an organism, right?

01:19:03 All of them, well, the majority of them

01:19:06 are highly multi domain proteins.

01:19:09 So they are, you know, some of them have five, six, seven,

01:19:13 you know, and more domains, right?

01:19:16 And, you know, we are very far away

01:19:20 from understanding how these proteins are folded.

01:19:22 So the complexity of the protein matters here.

01:19:24 The complexity of the protein modules

01:19:27 or the protein domains.

01:19:30 So you’re saying solved, so the definition

01:19:35 of solved here is particularly the CASP competition

01:19:38 achieving human level, not human level,

01:19:41 achieving experimental level performance

01:19:45 on these particular sets of proteins

01:19:48 that have been used in these competitions.

01:19:50 Well, I mean, you know, I do think that, you know,

01:19:54 especially with regards to AlphaFold,

01:19:57 you know, it is able to, you know, to solve,

01:20:03 you know, at the near experimental level,

01:20:08 a pretty big majority of the more compact proteins

01:20:15 like, or protein domains.

01:20:16 Because again, in order to understand

01:20:18 how the overall protein, you know,

01:20:22 multi domain protein folds, we do need to understand

01:20:26 the structure of its individual domains.

01:20:28 I mean, unlike if you look at AlphaZero

01:20:31 or like even MuZero, if you look at that work,

01:20:36 you know, it’s nice reinforcement learning

01:20:39 self playing mechanisms are nice

01:20:41 cause it’s all in simulation.

01:20:42 So you can learn from just huge amounts.

01:20:45 Like you don’t need data.

01:20:47 It was like the problem with proteins,

01:20:49 like the size, I forget how many 3D structures

01:20:54 have been mapped, but the training data is very small.

01:20:56 No matter what, it’s like millions,

01:20:59 maybe a one or two million or something like that,

01:21:01 but it’s some very small number,

01:21:02 but like, it doesn’t seem like that’s scalable.

01:21:06 There has to be, I don’t know,

01:21:09 it feels like you want to somehow 10 X the data

01:21:13 or a hundred X the data somehow.

01:21:15 Yes, but we also can take advantage of homology models,

01:21:20 right, so the models that are of very good quality

01:21:26 because they are essentially obtained

01:21:30 based on the evolutionary information, right?

01:21:33 So you can, there is a potential to enhance this information

01:21:38 and, you know, use it again to empower the training set.

01:21:43 And it’s, I think, I am actually very optimistic.

01:21:49 I think it’s been one of this sort of, you know,

01:21:58 turning events where you have a system that is,

01:22:05 you know, a machine learning system

01:22:07 that is truly better than the machine learning system.

01:22:12 Better than the sort of the more conventional

01:22:15 biophysics based methods.

01:22:17 That’s a huge leap.

01:22:19 This is one of those fun questions,

01:22:21 but where would you put it in the ranking

01:22:26 of the greatest breakthroughs

01:22:28 in artificial intelligence history?

01:22:31 So like, okay, so let’s see who’s in the running.

01:22:34 Maybe you can correct me.

01:22:35 So you got like AlphaZero and AlphaGo

01:22:39 beating the world champion at the game of Go.

01:22:44 Thought to be impossible like 20 years ago.

01:22:48 Or at least the AI community was highly skeptical.

01:22:51 Then you got like also Deep Blue beating Kasparov.

01:22:55 You have deep learning itself,

01:22:56 like the maybe, what would you say,

01:22:58 the AlexNet, ImageNet moment.

01:23:00 So the first neural network

01:23:02 achieving human level performance.

01:23:04 Super, that’s not true.

01:23:07 Achieving like a big leap in performance

01:23:10 on the computer vision problem.

01:23:14 There is OpenAI, the whole like GPT-3,

01:23:18 that whole space of transformers and language models

01:23:23 just achieving this incredible performance

01:23:27 of application of neural networks to language models.

01:23:31 Boston Dynamics, pretty cool.

01:23:33 Like robotics.

01:23:35 People are like, there’s no AI.

01:23:38 No, no, there’s no machine learning currently.

01:23:41 But AI is much bigger than machine learning.

01:23:44 So that just the engineering aspect,

01:23:48 I would say it’s one of the greatest accomplishments

01:23:50 in engineering side.

01:23:52 Engineering meaning like mechanical engineering

01:23:56 of robotics ever.

01:23:57 Then of course, autonomous vehicles.

01:23:59 You can argue for Waymo,

01:24:01 which is like the Google self driving car.

01:24:03 Or you can argue for Tesla,

01:24:05 which is like actually being used

01:24:07 by hundreds of thousands of people on the road today,

01:24:10 machine learning system.

01:24:13 And I don’t know if you can, what else is there?

01:24:17 But I think that’s it.

01:24:18 And then AlphaFold, many people are saying

01:24:20 is up there, potentially number one.

01:24:23 Would you put them at number one?

01:24:24 Well, in terms of the impact on the science

01:24:29 and on the society beyond, it’s definitely,

01:24:34 to me would be one of the…

01:24:37 Top three?

01:24:39 What you want?

01:24:39 Maybe, I mean, I’m probably not the best person

01:24:43 to answer that.

01:24:45 But I do have, I remember my,

01:24:51 back in, I think 1997, when Deep Blue,

01:24:56 beat Kasparov, it was, I mean, it was a shock.

01:25:01 I mean, it was, and I think for the,

01:25:04 for a pretty substantial part of the world,

01:25:14 that especially people who have some experience with chess,

01:25:21 and realizing how incredibly human this game,

01:25:25 how much of a brain power you need

01:25:30 to reach those levels of grandmasters, right, level.

01:25:36 And it’s probably one of the first time,

01:25:37 and how good Kasparov was.

01:25:39 And again, yeah, so Kasparov’s arguably

01:25:42 one of the best ever, right?

01:25:45 And you get a machine that beats him.

01:25:47 All right, so it’s…

01:25:48 First time a machine probably beat a human

01:25:50 at that scale of a thing, of anything.

01:25:53 Yes, yes.

01:25:54 So that was, to me, that was like, you know,

01:25:57 one of the groundbreaking events in the history of AI.

01:26:00 Yeah, that’s probably number one.

01:26:02 Probably, like we don’t, it’s hard to remember.

01:26:05 It’s like Muhammad Ali versus, I don’t know,

01:26:08 any of the Mike Tyson, something like that.

01:26:09 It’s like, nah, you gotta put Muhammad Ali at number one.

01:26:13 Same with Deep Blue,

01:26:15 even though it’s not machine learning based.

01:26:19 Still, it uses advanced search,

01:26:21 and search is the integral part of AI, right?

01:26:24 It’s not, people don’t think of it that way at this moment.

01:26:27 In vogue currently, search is not seen

01:26:30 as a fundamental aspect of intelligence,

01:26:34 but it very well, I mean, it very likely is.

01:26:37 In fact, I mean, that’s what neural networks are,

01:26:39 is they’re just performing search

01:26:41 on the space of parameters, and it’s all search.

01:26:45 All of intelligence is some form of search,

01:26:47 and you just have to become cleverer and cleverer

01:26:49 at that search problem.

01:26:50 And I also have another one that you didn’t mention

01:26:53 that’s one of my favorite ones is,

01:26:58 so you’ve probably heard of this,

01:26:59 it’s, I think it’s called Deep Rembrandt.

01:27:03 It’s the project where they trained,

01:27:06 I think there was a collaboration

01:27:08 between the sort of the experts

01:27:11 in Rembrandt painting in Netherlands,

01:27:15 and a group, an artificial intelligence group,

01:27:18 where they train an algorithm

01:27:20 to replicate the style of the Rembrandt,

01:27:22 and they actually printed a portrait

01:27:26 that never existed before in the style of Rembrandt.

01:27:32 I think they printed it on a sort of,

01:27:36 on the canvas that, you know,

01:27:38 using pretty much same types of paints and stuff.

01:27:42 To me, it was mind blowing.

01:27:44 Yeah, and the space of art, that’s interesting.

01:27:46 There hasn’t been, maybe that’s it,

01:27:50 but I think there hasn’t been an ImageNet moment yet

01:27:54 in the space of art.

01:27:56 You haven’t been able to achieve

01:27:58 superhuman level performance in the space of art,

01:28:01 even though there’s this big famous thing

01:28:04 where a piece of art was purchased,

01:28:07 I guess for a lot of money.

01:28:08 Yes.

01:28:09 Yeah, but it’s still, you know,

01:28:11 people are like in the space of music at least,

01:28:15 that’s, you know, it’s clear that human created pieces

01:28:19 are much more popular.

01:28:21 So there hasn’t been a moment where it’s like,

01:28:24 oh, this is, we’re now,

01:28:26 I would say in the space of music,

01:28:28 what makes a lot of money,

01:28:30 we’re talking about serious money,

01:28:32 it’s music and movies, or like shows and so on,

01:28:35 and entertainment.

01:28:36 There hasn’t been a moment where AI created,

01:28:41 AI was able to create a piece of music

01:28:44 or a piece of cinema, like Netflix show,

01:28:49 that is, you know, that’s sufficiently popular

01:28:53 to make a ton of money.

01:28:55 Yeah.

01:28:56 And that moment would be very, very powerful,

01:28:58 because that’s like, that’s an AI system

01:29:01 being used to make a lot of money.

01:29:03 And like direct, of course, AI tools,

01:29:05 like even Premiere, audio editing,

01:29:07 all the editing, everything I do,

01:29:08 to edit this podcast, there’s a lot of AI involved.

01:29:11 Actually, this is a program,

01:29:13 I wanna talk to those folks, just cause I wanna nerd out,

01:29:15 it’s called iZotope, I don’t know if you’re familiar with it.

01:29:18 They have a bunch of tools of audio processing,

01:29:20 and they have, I think they’re Boston based,

01:29:23 just, it’s so exciting to me to use it,

01:29:26 like on the audio here,

01:29:28 cause it’s all machine learning.

01:29:30 It’s not, cause most audio production stuff

01:29:35 is like any kind of processing you do,

01:29:37 it’s very basic signal processing,

01:29:39 and you’re tuning knobs and so on.

01:29:41 They have all of that, of course,

01:29:43 but they also have all of this machine learning stuff,

01:29:46 like where you actually give it training data,

01:29:48 you select parts of the audio you train on,

01:29:51 you train on it, and it figures stuff out.

01:29:56 It’s great, it’s able to detect,

01:29:59 like the ability of it to be able

01:30:01 to separate voice and music, for example,

01:30:04 or voice and anything, is incredible.

01:30:07 Like it just, it’s clearly exceptionally good

01:30:11 at applying these different neural networks models

01:30:14 to just separate the different kinds

01:30:17 of signals from the audio.

01:30:19 That, okay, so that’s really exciting.

01:30:22 Photoshop, Adobe people also use it,

01:30:24 but to generate a piece of music

01:30:28 that will sell millions, a piece of art, yeah.

01:30:31 No, I agree, and you know, it’s,

01:30:34 that’s, you know, as I mentioned,

01:30:39 I offer my AI class, and you know,

01:30:41 an integral part of this is the project, right?

01:30:44 So it’s my favorite, ultimate favorite part,

01:30:47 because it typically, we have these project presentations

01:30:51 the last two weeks of the classes,

01:30:53 right before, you know, the Christmas break,

01:30:56 and it’s sort of, it adds this cool excitement,

01:31:00 and every time, I mean, I’m amazed, you know,

01:31:02 with some projects that people, you know, come up with.

01:31:07 And so, and quite a few of them are actually, you know,

01:31:12 they have some link to arts.

01:31:17 I mean, you know, I think last year we had a group

01:31:21 who designed an AI producing haikus, Japanese poems.

01:31:27 Oh, wow.

01:31:29 So, and some of them, so, you know,

01:31:31 it got trained on the English based,

01:31:34 haikus, haikus, right?

01:31:36 So, and some of them, you know,

01:31:40 they get to present, like, the top selection.

01:31:43 They were pretty good.

01:31:44 I mean, you know, I mean, of course, I’m not a specialist,

01:31:47 but you read them, and you see this is real.

01:31:49 It seems profound.

01:31:50 Yes, yeah, it seems real.

01:31:52 So it’s kind of cool.

01:31:55 We also had a couple of projects where people tried

01:31:57 to teach AI how to play, like, rock music, classical music.

01:32:02 I think, and popular music.

01:32:05 Yeah.

01:32:07 Interestingly enough, you know,

01:32:10 classical music was among the most difficult ones.

01:32:14 Oh, sure.

01:32:15 And, you know, of course, if you, if, you know,

01:32:21 you know, if you look at the, you know,

01:32:23 the, like, grandmasters of music, like Bach, right?

01:32:28 So there is a lot of, there is a lot of,

01:32:31 there is a lot of almost math.

01:32:34 Yeah, well, he’s very mathematical.

01:32:36 Yeah, exactly.

01:32:37 So this is, I would imagine that at least some style

01:32:41 of this music could be picked up,

01:32:43 but then you have this completely different spectrum

01:32:46 of classical composers.

01:32:49 And so, you know, it’s almost like, you know,

01:32:54 you don’t have to sort of look at the data.

01:32:56 You just listen to it and say, nah, that’s not it, not yet.

01:33:01 That’s not it, yeah.

01:33:02 That’s how I feel too.

01:33:03 There’s OpenAI has, I think, OpenMuse

01:33:05 or something like that, the system.

01:33:07 It’s cool, but it’s like, eh,

01:33:09 it’s not compelling for some reason.

01:33:12 It could be a psychological reason too.

01:33:14 Maybe we need to have a human being,

01:33:17 a tortured soul behind the music.

01:33:19 I don’t know.

01:33:20 Yeah, no, absolutely.

01:33:22 I completely agree.

01:33:23 But yeah, whether or not we’ll have,

01:33:26 one day we’ll have, you know,

01:33:29 a song written by an AI engine

01:33:33 to be like in top charts, musical charts,

01:33:37 I wouldn’t be surprised.

01:33:40 I wouldn’t be surprised.

01:33:43 I wonder if we already have one

01:33:44 and it just hasn’t been announced.

01:33:48 We wouldn’t know.

01:33:49 How hard is the multi protein folding problem?

01:33:53 Is that kind of something you’ve already mentioned

01:33:57 which is baked into this idea of greater

01:33:59 and greater complexity of proteins?

01:34:01 Like multi domain proteins,

01:34:03 is that basically become multi protein complexes?

01:34:08 Yes, you got it right.

01:34:10 So it’s sort of, it has the components

01:34:15 of both of protein folding

01:34:18 and protein-protein interactions.

01:34:21 Because in order for these domains,

01:34:24 many of these proteins actually,

01:34:27 they never form a stable structure.

01:34:31 One of my favorite proteins,

01:34:33 and pretty much everyone who works in the,

01:34:37 I know, whom I know, who works with proteins,

01:34:41 they always have their favorite proteins.

01:34:44 Right, so one of my favorite proteins,

01:34:47 probably my favorite protein,

01:34:49 the one that I worked when I was a postdoc

01:34:51 is the so called postsynaptic density 95, PSD-95 protein.

01:34:56 So it’s one of the key actors

01:35:00 in the majority of neurological processes

01:35:03 at the molecular level.

01:35:04 So it’s a, and essentially it’s a key player

01:35:11 in the postsynaptic density.

01:35:13 So this is the crucial part of this synapse

01:35:17 where a lot of these chemical processes are happening.

01:35:22 So it has five domains, right?

01:35:26 So five protein domains.

01:35:27 So a pretty large protein, I think 600 something amino acids.

01:35:35 But the way it’s organized itself, it’s flexible, right?

01:35:41 So it acts as a scaffold.

01:35:43 So it is used to bring in other proteins.

01:35:49 So they start acting in the orchestrated manner, right?

01:35:54 So, and the type of the shape of this protein,

01:35:58 it’s in a way, there are some stable parts of this protein,

01:36:02 but there are some flexible.

01:36:04 And this flexibility is built into the protein

01:36:08 in order to become sort of this multifunctional machine.

01:36:13 So do you think that kind of thing is also learnable

01:36:16 through the AlphaFold2 kind of approach?

01:36:19 I mean, the time will tell.

01:36:22 Is it another level of complexity?

01:36:24 Is it like how big of a jump in complexity

01:36:27 is that whole thing?

01:36:28 To me, it’s yet another level of complexity

01:36:31 because when we talk about protein-protein interactions,

01:36:35 and there is actually a different challenge for this

01:36:38 called CAPRI, and so this, that is focused specifically

01:36:43 on macromolecular interactions, protein-protein, protein-DNA,

01:36:47 et cetera.

01:36:48 So, but it’s, there are different mechanisms

01:36:56 that govern molecular interactions

01:36:58 and that need to be picked up,

01:37:00 say by a machine learning algorithm.

01:37:03 Interestingly enough, we actually,

01:37:06 we participated for a few years in this competition.

01:37:11 We typically don’t participate in competitions,

01:37:14 I don’t know, don’t have enough time,

01:37:19 because it’s very intensive, it’s a very intensive process.

01:37:23 But we participated back in about 10 years ago or so.

01:37:30 And the way we entered this competition,

01:37:32 so we design a scoring function, right?

01:37:35 So the function that evaluates

01:37:37 whether or not your protein-protein interaction

01:37:40 is supposed to look like experimentally solved, right?

01:37:43 So the scoring function is very critical part

01:37:45 of the model prediction.

01:37:49 So we designed it to be a machine learning one.

01:37:52 And so it was one of the first machine learning

01:37:56 based scoring functions used in CAPRI.

01:38:00 And we essentially learned what should contribute,

01:38:06 what are the critical components contributing

01:38:08 into the protein-protein interaction.

01:38:10 So this could be converted into a learning problem

01:38:13 and thereby it could be learned?

01:38:15 I believe so, yes.

01:38:17 Do you think AlphaFold2 or something similar to it

01:38:20 from DeepMind or somebody else will be,

01:38:24 will result in a Nobel Prize or multiple Nobel Prizes?

01:38:28 So like, you know, obviously, maybe not so obviously,

01:38:33 you can’t give a Nobel Prize to a computer program.

01:38:38 At least for now, give it to the designers of that program.

01:38:42 But do you see one or multiple Nobel Prizes

01:38:46 where AlphaFold2 is like a large percentage

01:38:51 of what that prize is given for?

01:38:54 Would it lead to discoveries at the level of Nobel Prizes?

01:39:00 I mean, I think we are definitely destined

01:39:05 to see the Nobel Prize becoming sort of,

01:39:08 to be evolving with the evolution of science

01:39:12 and the evolution of science is such

01:39:14 that it now becomes like really multifaceted, right?

01:39:17 So where you don’t really have like a unique discipline,

01:39:21 you have sort of the, a lot of cross disciplinary talks

01:39:25 in order to achieve sort of, you know,

01:39:28 really big advancements, you know.

01:39:32 So I think, you know, the computational methods

01:39:39 will be acknowledged in one way or another.

01:39:42 And as a matter of fact, you know,

01:39:46 they were first acknowledged back in 2013, right?

01:39:50 Where, you know, the first three people were, you know,

01:39:56 awarded the Nobel Prize for studying the protein folding,

01:40:00 right, the principle.

01:40:01 And, you know, I think all three of them

01:40:03 are computational biophysicists, right?

01:40:06 So, you know, that I think is unavoidable.

01:40:13 You know, it will come with the time.

01:40:16 The fact that, you know, AlphaFold and, you know,

01:40:23 similar approaches, because again, it’s a matter of time

01:40:26 that people will embrace this, you know, principle

01:40:31 and we’ll see more and more such, you know,

01:40:34 such tools coming into play.

01:40:36 But, you know, these methods will be critical

01:40:41 in a scientific discovery, no doubts about it.

01:40:47 On the engineering side, maybe a dark question,

01:40:51 but do you think it’s possible to use

01:40:53 these machine learning methods

01:40:55 to start to engineer proteins?

01:40:59 And the next question is something quite a few biologists

01:41:04 are against, some are for, for study purposes,

01:41:07 is to engineer viruses.

01:41:09 Do you think machine learning, like something like AlphaFold

01:41:12 could be used to engineer viruses?

01:41:14 So to answer the first question, you know,

01:41:16 it has been, you know, a part of the research

01:41:21 in the protein science, the protein design is, you know,

01:41:25 is a very prominent area of research.

01:41:29 Of course, you know, one of the pioneers is David Baker

01:41:32 and Rosetta algorithm that, you know,

01:41:34 essentially was doing the de novo design and was used

01:41:39 to design new proteins, you know.

01:41:41 And design of proteins means design of function.

01:41:44 So like when you design a protein, you can control,

01:41:47 I mean, the whole point of a protein

01:41:49 with the protein structure comes a function,

01:41:52 like it’s doing something.

01:41:53 Correct.

01:41:54 So you can design different things.

01:41:56 So you can, yeah, so you can, well,

01:41:58 you can look at the proteins from the functional perspective.

01:42:00 You can also look at the proteins

01:42:02 from the structural perspective, right?

01:42:04 So the structural building blocks.

01:42:05 So if you want to have a building block

01:42:07 of a certain shape, you can try to achieve it

01:42:10 by, you know, introducing a new protein sequence

01:42:13 and predicting, you know, how it will fold.

01:42:17 So with that, I mean, it’s a natural,

01:42:22 one of the, you know, natural applications

01:42:25 of these algorithms.

01:42:28 Now, talking about engineering a virus.

01:42:34 With machine learning.

01:42:35 With machine learning, right?

01:42:36 So, well, you know, so luckily for us,

01:42:41 I mean, we don’t have that much data, right?

01:42:46 Yeah.

01:42:47 We actually, right now, one of the projects

01:42:50 that we are carrying on in the lab

01:42:53 is we’re trying to develop a machine learning algorithm

01:42:56 that determines the,

01:42:59 whether or not the current strain is pathogenic.

01:43:02 And the current strain of the coronavirus.

01:43:04 Of the virus.

01:43:06 I mean, so there are applications to coronaviruses

01:43:08 because we have strains of SARS-CoV-2,

01:43:11 also SARS-CoV, MERS that are pathogenic,

01:43:14 but we also have strains of other coronaviruses

01:43:17 that are, you know, not pathogenic.

01:43:20 I mean, the common cold viruses and, you know,

01:43:24 some other ones, right?

01:43:25 So, so pathogenic meaning spreading.

01:43:28 Pathogenic means actually inflicting damage.

01:43:33 Correct.

01:43:35 There are also some, you know,

01:43:37 seasonal versus pandemic strains of influenza, right?

01:43:41 And determining the, what are the molecular determinant,

01:43:45 right?

01:43:46 So that are built in, into the protein sequence,

01:43:48 into the gene sequence, right?

01:43:50 So, and whether or not the machine learning

01:43:52 can determine those, those components, right?

01:43:58 Oh, interesting.

01:43:59 So like using machine learning to do,

01:44:00 that’s really interesting to, to, to given,

01:44:03 give the input is like what the entire,

01:44:07 the protein sequence and then determine

01:44:09 if this thing is going to be able to do damage

01:44:12 to a biological system.

01:44:14 Yeah.

01:44:15 So, so I mean,

01:44:16 It’s a good machine learning,

01:44:17 you’re saying we don’t have enough data for that?

01:44:19 We, I mean, for, for this specific one, we do.

01:44:22 We might actually, I have, you know,

01:44:24 have to back up on this because we’re still in the process.

01:44:27 There was one work that appeared in bioRxiv

01:44:31 by Eugene Koonin, who is one of these, you know,

01:44:34 pioneers in, in, in evolutionary genomics.

01:44:39 And they tried to look at this, but, you know,

01:44:42 the methods were sort of standard, you know,

01:44:46 supervised learning methods.

01:44:48 And now the question is, you know,

01:44:51 can you advance it further by, by using, you know,

01:44:56 not so standard methods, you know?

01:44:58 So there’s obviously a lot of hope in,

01:45:01 in transfer learning where you can actually try to transfer

01:45:05 the information that the machine learning learns about

01:45:08 the proper protein sequences, right?

01:45:11 And, you know, so, so there is some promise

01:45:16 in going this direction, but if we have this,

01:45:18 it would be extremely useful because then

01:45:21 we could essentially forecast the potential mutations

01:45:24 that would make the current strain

01:45:26 more or less pathogenic.

01:45:27 Anticipate, anticipate them from a vaccine development,

01:45:31 for the treatment, antiviral drug development.

01:45:34 That, that would be a very crucial task.

01:45:36 But you could also use that system to then say,

01:45:42 how would we potentially modify this virus

01:45:45 to make it more pathogenic?

01:45:47 This, that’s true.

01:45:49 That’s true.

01:45:50 And then, you know, the, again,

01:45:55 the hope is, well, several things, right?

01:45:59 So one is that, you know, it’s,

01:46:02 even if you design a, you know, a sequence, right?

01:46:06 So to carry out the actual experimental biology,

01:46:12 to ensure that all the components working, you know,

01:46:16 is a completely different matter.

01:46:19 Difficult process.

01:46:19 Yes.

01:46:20 Then the, you know, we’ve seen in the past,

01:46:24 there could be some regulation the moment

01:46:27 the scientific community recognizes

01:46:30 that it’s now becoming no longer a sort of a fun puzzle

01:46:34 to, you know, for machine learning.

01:46:36 Could be open.

01:46:37 Yeah, so then there might be some regulation.

01:46:40 So I think back in, what, 2015, there was, you know,

01:46:45 there was an issue on regulating the research

01:46:49 on influenza strains, right?

01:46:52 There were several groups, you know,

01:46:55 used sort of the mutation analysis

01:46:58 to determine whether or not this strain will jump

01:47:01 from one species to another.

01:47:03 And I think there was like a half a year moratorium

01:47:06 on the research on the paper published

01:47:09 until, you know, scientists, you know, analyzed it

01:47:13 and decided that it’s actually safe.

01:47:16 I forgot what that’s called.

01:47:17 Something of function, test of function.

01:47:20 Gain of function.

01:47:20 Gain of function, yeah.

01:47:22 Gain of function, loss of function, that’s right.

01:47:24 Sorry.

01:47:26 It’s like, let’s watch this thing mutate for a while

01:47:29 to see like, to see what kind of things we can observe.

01:47:33 I guess I’m not so much worried

01:47:36 about that kind of research if there’s a lot of regulation

01:47:38 and if it’s done very well and with competence and seriously.

01:47:42 I am more worried about kind of this, you know,

01:47:46 the underlying aspect of this question

01:47:49 is more like 50 years from now.

01:47:52 Speaking to the Drake equation,

01:47:54 one of the parameters in the Drake equation

01:47:57 is how long civilizations last.

01:47:59 And that seems to be the most important value actually

01:48:03 for calculating if there’s other alien

01:48:06 intelligent civilizations out there.

01:48:08 That’s where there’s most variability.

01:48:10 Assuming like if life, if that percentage

01:48:15 that life can emerge is like not zero,

01:48:19 like if we’re a super unique,

01:48:21 then it’s the how long we last

01:48:23 is basically the most important thing.

01:48:26 So from a selfish perspective,

01:48:29 but also from a Drake equation perspective,

01:48:32 I’m worried about our civilization lasting.

01:48:35 And you kind of think about all the ways

01:48:37 in which machine learning can be used

01:48:39 to design greater weapons of destruction, right?

01:48:45 And I mean, one way to ask that

01:48:48 if you look sort of 50 years from now,

01:48:50 a hundred years from now,

01:48:52 would you be more worried about natural pandemics

01:48:55 or engineered pandemics?

01:48:59 Like who’s the better designer of viruses,

01:49:02 nature or humans if we look down the line?

01:49:05 I think in my view, I would still be worried

01:49:10 about the natural pandemics simply because I mean,

01:49:14 the capacity of the nature producing this.

01:49:20 It does a pretty good job, right?

01:49:22 Yes.

01:49:23 And the motivation for using virus,

01:49:25 engineering viruses as a weapon is a weird one

01:49:29 because maybe you can correct me on this,

01:49:31 but it seems very difficult to target a virus, right?

01:49:35 The whole point of a weapon, the way a rocket works,

01:49:38 if a starting point, you have an end point

01:49:40 and you’re trying to hit a target,

01:49:42 to hit a target with a virus is very difficult.

01:49:44 It’s basically just, right?

01:49:47 The target would be the human species.

01:49:51 Oh man.

01:49:52 Yeah, I have a hope in us.

01:49:54 I’m forever optimistic that we will not,

01:49:58 there’s insufficient evil in the world

01:50:01 to lead to that kind of destruction.

01:50:04 Well, I also hope that, I mean, that’s what we see.

01:50:07 I mean, with the way we are getting connected,

01:50:11 the world is getting connected.

01:50:14 I think it helps for the world to become more transparent.

01:50:21 Yeah.

01:50:22 So the information spread is,

01:50:27 I think it’s one of the key things for the society

01:50:31 to become more balanced one way or another.

01:50:36 This is something that people disagree with me on,

01:50:38 but I do think that the kind of secrecy

01:50:41 that governments have.

01:50:43 So you’re kind of speaking more to the other aspects,

01:50:47 like a research community being more open,

01:50:49 companies are being more open.

01:50:52 Government is still like,

01:50:55 we’re talking about like military secrets.

01:50:57 I think military secrets of the kind

01:51:01 that could destroy the world

01:51:03 will become also a thing of the 20th century.

01:51:07 It’ll become more and more open.

01:51:09 Yeah.

01:51:10 I think nations will lose power in the 21st century,

01:51:13 like lose sufficient power towards secrecies.

01:51:15 Transparency is more beneficial than secrecy,

01:51:18 but of course it’s not obvious.

01:51:21 Let’s hope so.

01:51:22 Let’s hope so that the governments

01:51:27 will become more transparent.

01:51:31 What, so we last talked, I think in March or April,

01:51:35 what have you learned?

01:51:36 How has your philosophical, psychological,

01:51:40 biological worldview changed since then?

01:51:43 Or you’ve been studying it nonstop

01:51:46 from a computational biology perspective.

01:51:48 How has your understanding and thoughts about this virus

01:51:51 changed over those months from the beginning to today?

01:51:54 One thing that I was really amazed at is

01:51:58 how efficient the scientific community was.

01:52:03 I mean, and even just judging on this very narrow domain

01:52:10 of protein structure and understanding

01:52:13 the structural characterization of this virus

01:52:17 from the components point of view,

01:52:19 whole virus point of view.

01:52:21 If you look at SARS, something that happened less than 20,

01:52:31 but close enough, 20 years ago,

01:52:34 and you see what, when it happened,

01:52:38 what was sort of the response by the scientific community,

01:52:42 you see that the structure characterizations did occur,

01:52:47 but it took several years, right?

01:52:51 Now the things that took several years,

01:52:54 it’s a matter of months, right?

01:52:56 So we see that the research pop up.

01:53:01 We are at the unprecedented level

01:53:03 in terms of the sequencing, right?

01:53:05 Never before have we had a single virus sequenced so many times,

01:53:10 so which allows us to actually to trace very precisely

01:53:16 the sort of the evolutionary nature of this virus,

01:53:21 what happens, and it’s not just this virus independently

01:53:27 of everything, it’s the sequence of this virus

01:53:32 linked, anchored to the specific geographic place

01:53:36 to specific people, because our genotype also influences the evolution of this virus;

01:53:52 it’s always a host-pathogen coevolution that, you know, occurs.

01:53:55 It’d be cool if we also had a lot more data about,

01:53:58 so that the spread of this virus, not maybe,

01:54:02 well, it’d be nice if we had it for like contact tracing

01:54:06 purposes for this virus, but it’d be also nice if we had it

01:54:09 for the study for future viruses to be able to respond

01:54:12 and so on, but it’s already nice that we have geographical

01:54:15 data and the basic data from individual humans, yeah.

01:54:18 Exactly, no, I think contact tracing is obviously

01:54:22 a key component in understanding

01:54:26 the spread of this virus.

01:54:29 There is also, there is a number of challenges, right?

01:54:31 So XPRIZE is one of them, we

01:54:35 just recently took part in

01:54:39 this competition, it’s the prediction of the

01:54:43 number of infections in different regions.

01:54:47 Oh, sure.

01:54:48 So, you know, obviously the AI

01:54:52 is the main topic in those predictions.

01:54:55 Yeah, but it’s still, the data, I mean, that’s a competition,

01:54:59 but the data is weak

01:55:03 on the training. Like, it’s great,

01:55:07 it’s much more than probably before, but like, it’d be nice if it was like

01:55:11 really rich. I talked to Michael Mina from

01:55:15 Harvard, I mean, he dreams that the community comes together with like a

01:55:19 weather map for viruses, right, like

01:55:23 really high resolution sensors on like how

01:55:27 from person to person the viruses travel, all the different kinds of viruses, right?

01:55:31 Because there’s a ton of them, and then you’d be able to tell

01:55:35 the story that you’ve spoken about

01:55:39 of the evolution of these viruses, like day to day mutations that

01:55:43 are occurring. I mean, that’d be fascinating just from a perspective of

01:55:47 study and from the perspective of being able to respond to future pandemics.

01:55:51 That’s ultimately what I’m worried about. People love

01:55:55 books. Is there some three

01:55:59 or whatever number of books, technical, fiction, philosophical, that

01:56:03 brought you joy in life, had an impact on your life,

01:56:07 and maybe some that you would recommend others?

01:56:11 I’ll give you three very different books, and I also have a special runner up.

01:56:15 Honorable mention.

01:56:19 I mean, it’s an audiobook, and that’s

01:56:23 some specific reason behind it. So the first book is

01:56:27 something that sort of impacted my earlier

01:56:31 stage of life, and I’m probably not going to be very original here.

01:56:35 It’s Bulgakov’s Master and Margarita.

01:56:39 For a Russian, maybe it’s not super original,

01:56:43 but it’s a really powerful book, even in English.

01:56:47 It is incredibly powerful, and

01:56:51 I mean, the way it ends.

01:56:55 I still have goosebumps when I read

01:56:59 the very last sort of, it’s called prologue, where

01:57:03 it’s just so powerful. What impact did it have on you? What ideas?

01:57:07 What insights did you get from it? I was just taken by

01:57:11 the fact that

01:57:15 you have those parallel lives

01:57:19 apart from many centuries, and

01:57:23 somehow they got sort of intertwined into

01:57:27 one story, and that

01:57:31 to me was fascinating. And of course

01:57:35 the romantic part of this book is like

01:57:39 it’s not just romance, it’s like the romance

01:57:43 empowered by sort of magic, right?

01:57:47 And maybe on top of that, you have some irony,

01:57:51 which is unavoidable, right? Because it was that

01:57:55 Soviet time. But it’s very deeply Russian, so that’s

01:57:59 the wit, the humor, the pain, the love,

01:58:03 all of that is one of the books that kind of captures

01:58:07 something about Russian culture that people outside of Russia

01:58:11 should probably read. I agree. What’s the second one? So the second one

01:58:15 is again another one that it happened

01:58:19 I read it later in my life. I think I read it

01:58:23 first time when I was a graduate student.

01:58:27 And that’s Solzhenitsyn’s Cancer Ward.

01:58:31 That is an amazingly powerful book.

01:58:35 What is it about? It’s about, I mean, essentially

01:58:39 based on Solzhenitsyn was

01:58:43 diagnosed with cancer when he was reasonably young, and he

01:58:47 made a full recovery. So this is

01:58:51 about a person who was sentenced

01:58:55 for life in one of these camps.

01:58:59 And he had some cancer,

01:59:03 so he was transported back to one of these

01:59:07 Soviet republics, I think it was

01:59:11 South Asian republics. And the

01:59:15 book is about

01:59:19 his experience being a

01:59:23 prisoner, being a patient in the

01:59:27 cancer clinic, in the cancer ward, surrounded

01:59:31 by people, many of which die.

01:59:35 But in the way

01:59:39 it reads, first of all, later on I

01:59:43 read the accounts of the doctors

01:59:47 who describe the experiences

01:59:51 in the book by the

01:59:55 patient as incredibly accurate.

01:59:59 So I read that there was some doctor saying that

02:00:03 every single doctor should read this book to understand

02:00:07 what the patient feels. But

02:00:11 again, as many of the Solzhenitsyn’s

02:00:15 books, it has multiple levels of complexity.

02:00:19 And obviously if you look above

02:00:23 the cancer and the patient, the

02:00:27 tumor that was growing and then disappeared

02:00:31 in his

02:00:35 body with some consequences, this is

02:00:39 allegorically the

02:00:43 Soviet, and he actually

02:00:47 when he was asked, he said that this is what made him

02:00:51 think about this, how to combine these experiences.

02:00:55 Him being a part of the Soviet regime,

02:00:59 also being a part of the

02:01:03 someone sent to Gulag camp,

02:01:07 and also someone who experienced cancer

02:01:11 in his life. The Gulag Archipelago

02:01:15 and this book, these are the works that actually made him

02:01:19 receive a Nobel Prize. But to me

02:01:23 I’ve read

02:01:27 other books by Solzhenitsyn.

02:01:31 This one to me is the most powerful one.

02:01:35 And by the way, both this one and the previous one you read in Russian?

02:01:39 Yes. So now there is the third book is an English book

02:01:43 and it’s completely different. So we’re switching the gears

02:01:47 completely. So this is the book which, it’s not even

02:01:51 a book, it’s an essay by

01:59:55 John von Neumann called The Computer and the Brain.

02:01:59 And that was the book he was writing

02:02:03 knowing that he was dying of cancer.

02:02:07 So the book was released back, it’s a very thin book.

02:02:11 But the power,

02:02:15 the intellectual power in this book, in this essay

02:02:19 is incredible. I mean you probably know that von Neumann

02:02:23 is considered to be one of the biggest

02:02:27 thinkers. So his intellectual power was incredible.

02:02:31 And you can actually feel this power

02:02:35 in this book where the person is writing knowing that he will be,

02:02:39 he will die. The book actually got published only after his

02:02:43 death back in 1958. He died in 1957.

02:02:47 So he tried to put as many

02:02:51 ideas that he still

02:02:55 hadn’t realized.

02:02:59 So this book is very difficult

02:03:03 to read because every single paragraph

02:03:07 is just compact, is

02:03:11 filled with these ideas. And the ideas are incredible.

02:03:15 Even nowadays, so he tried

02:03:19 to put the parallels between the brain

02:03:23 computing power, the neural system, and the computers

02:03:27 as they were understood. Do you remember what year he was working on this?

02:03:31 57. 57. So that was right during his,

02:03:35 when he was diagnosed with cancer and he was essentially…

02:03:39 Yeah, he’s one of those, there’s a few folks people mention,

02:03:43 I think Ed Witten is another that like

02:03:47 everyone that meets them, they say he’s just an intellectual powerhouse.

02:03:51 Yes. Okay, so who’s the honorable mention?

02:03:55 And this is, I mean, the reason I put it sort of in a separate section

02:03:59 because this is a book that I recently

02:04:03 listened to. So it’s an audio book.

02:04:07 And this is a book called Lab Girl by Hope Jahren.

02:04:11 So Hope Jahren, she is a

02:04:15 scientist, she’s a geochemist that essentially

02:04:19 studies the

02:04:23 fossil plants. And so she uses

02:04:27 this fossil plant, the chemical analysis to understand

02:04:31 what was the climate back in

02:04:35 a thousand years, hundreds of thousands of years ago.

02:04:39 And so something that incredibly

02:04:43 touched me by this book, it was narrated by the author.

02:04:47 Nice. And it’s an incredibly

02:04:51 personal story, incredibly. So

02:04:55 certain parts of the book, you could actually hear the author crying.

02:04:59 And that to me, I mean, I never experienced

02:05:03 anything like this, reading the book, but it was like

02:05:07 the connection between you and the author.

02:05:11 And I think this is really

02:05:15 a must read, but even better, a must listen

02:05:19 to audio book for anyone who

02:05:23 wants to learn about sort of

02:05:27 academia, science, research in general, because it’s

02:05:31 a very personal account about her becoming

02:05:35 a scientist. So

02:05:39 we’re just before New Year’s.

02:05:43 We talked a lot about some difficult topics of viruses and so on.

02:05:47 Do you have some exciting things you’re looking forward

02:05:51 to in 2021? Some New Year’s resolutions,

02:05:55 maybe silly or fun, or

02:05:59 something very important and fundamental to

02:06:03 the world of science or something completely unimportant?

02:06:07 Well, I’m definitely looking forward to

02:06:11 things becoming normal.

02:06:15 So yes, I really miss traveling.

02:06:19 Every summer I go

02:06:23 to an international summer school. It’s called

02:06:27 the School for Molecular and Theoretical Biology. It’s held in Europe.

02:06:31 It’s organized by very good friends of mine. And this is

02:06:35 the school for gifted kids from all over the world, and

02:06:39 they’re incredibly bright. It’s like every time I go there, it’s like, you know,

02:06:43 it’s a highlight of the year. And

02:06:47 we couldn’t make it this August, so we

02:06:51 did this school remotely, but it’s different.

02:06:55 So I am definitely looking forward to next August

02:06:59 coming there. One of

02:07:03 my personal resolutions, I realized that

02:07:07 being in the house and working from home,

02:07:11 I realized that actually

02:07:15 I apparently missed a lot

02:07:19 spending time with my family,

02:07:23 believe it or not. So you typically, with all the

02:07:27 research and teaching and

02:07:31 everything related to the academic life,

02:07:35 I mean, you get distracted. And so

02:07:39 you don’t feel that

02:07:43 the fact that you are away from your family doesn’t affect you

02:07:47 because you are naturally distracted by other things.

02:07:51 So this time I realized that

02:07:55 that’s so important, right? Spending your time with

02:07:59 the family, with your kids. And so that

02:08:03 would be my new year resolution and actually trying to

02:08:07 spend as much time as possible. Even when the world opens up.

02:08:11 Yeah, that’s a beautiful message. That’s a beautiful reminder.

02:08:15 I asked you if there’s a Russian poem

02:08:19 that I could read, that I could force you to read, and you said, okay, fine, sure.

02:08:23 Do you mind reading?

02:08:27 And you said that no paper needed.

02:08:31 So this poem was written by my namesake,

02:08:35 another Dmitry, Dmitry Kemerefeld.

02:08:39 It’s a recent poem and it’s

02:08:43 called Sorceress, Vedma,

02:08:47 in Russian, or actually

02:08:51 Koldunya. So that’s sort of another sort of connotation of

02:08:55 sorceress or witch. And I really like it

02:08:59 and it’s one of just a handful of poems I actually

02:09:03 can recall by heart. I also have a very strong

02:09:07 association when I read this poem with Master and

02:09:11 Margarita, the main female character,

02:09:15 Margarita. And also it’s

02:09:19 about, it’s happening about the same time we’re talking

02:09:23 now, so around New Year,

02:09:27 around Christmas. Do you mind reading it in Russian?

02:09:31 I’ll give it a try.

02:10:01 So you narrowed your eyes,

02:10:05 that anyone who was blessed

02:10:09 was ready to give their soul to the devil

02:10:13 for this witch’s connection.

02:10:17 And I, without prejudice,

02:10:21 ran out to feel your

02:10:25 amazing breath on your lips,

02:10:29 to remember how you flew above the earth

02:10:33 in a white view,

02:10:37 in a white haze, in a white mist.

02:10:41 That’s beautiful. I love how it captures a moment of longing

02:10:45 and maybe love even.

02:10:49 Yes. To me it has a lot of meaning about

02:10:53 this something that is happening,

02:10:57 something that is far away, but still very close to you.

02:11:01 And yes, it’s the winter.

02:11:05 There’s something magical about winter, isn’t there?

02:11:09 I don’t know how to translate it, but a kiss in winter

02:11:13 is interesting. Lips in winter and all that kind of stuff.

02:11:17 It’s beautiful. Russian has a way. It has a reason, Russian poetry

02:11:21 is just, I’m a fan of poetry in both languages, but English

02:11:25 doesn’t capture some of the magic that Russian seems to, so

02:11:29 thank you for doing that. That was awesome. Dmitry,

02:11:33 it’s great to talk to you again. It’s contagious

02:11:37 how much you love what you do, how much you love life, so I really appreciate

02:11:41 you taking the time to talk today. And thank you for having me.

02:11:45 Thanks for listening to this conversation with Dmitry Korkin, and thank you to our

02:11:49 sponsors. Brave Browser, NetSuite Business Management

02:11:53 Software, Magic Spoon Low Carb Cereal, and

02:11:57 8sleep Self Cooling Mattress. So the choice is

02:12:01 browsing privacy, business success, healthy diet, or comfortable

02:12:05 sleep. Choose wisely, my friends. And if you wish,

02:12:09 click the sponsor links below to get a discount and to support this podcast.

02:12:13 And now, let me leave you with some words from Jeffrey Eugenides.

02:12:17 Biology gives you a brain.

02:12:21 Life turns it into a mind. Thank you for listening,

02:12:25 and hope to see you next time.