Rajat Monga: TensorFlow #22

Transcript

00:00:00 The following is a conversation with Rajat Monga.

00:00:03 He’s an engineer and director of Google,

00:00:04 leading the TensorFlow team.

00:00:06 TensorFlow is an open source library

00:00:09 at the center of much of the work going on in the world

00:00:11 in deep learning, both the cutting edge research

00:00:14 and the large scale application of learning based approaches.

00:00:17 But it’s quickly becoming much more than a software library.

00:00:20 It’s now an ecosystem of tools for the deployment of machine

00:00:24 learning in the cloud, on the phone, in the browser,

00:00:26 on both generic and specialized hardware.

00:00:29 TPU, GPU, and so on.

00:00:31 Plus, there’s a big emphasis on growing a passionate community

00:00:35 of developers.

00:00:36 Rajat, Jeff Dean, and a large team of engineers at Google

00:00:39 Brain are working to define the future of machine

00:00:42 learning with TensorFlow 2.0, which is now in alpha.

00:00:46 I think the decision to open source TensorFlow

00:00:49 is a definitive moment in the tech industry.

00:00:51 It showed that open innovation can be successful

00:00:54 and inspire many companies to open source their code,

00:00:56 to publish, and in general engage

00:00:58 in the open exchange of ideas.

00:01:01 This conversation is part of the Artificial Intelligence

00:01:03 podcast.

00:01:05 If you enjoy it, subscribe on YouTube, iTunes,

00:01:07 or simply connect with me on Twitter at Lex Fridman,

00:01:10 spelled F R I D.

00:01:12 And now, here’s my conversation with Rajat Monga.

00:01:17 You were involved with Google Brain since its start in 2011

00:01:22 with Jeff Dean.

00:01:24 It started with DistBelief, the proprietary machine learning

00:01:29 library, and turned into TensorFlow in 2014,

00:01:32 the open source library.

00:01:35 So what were the early days of Google Brain like?

00:01:39 What were the goals, the missions?

00:01:41 How do you even proceed forward once there’s

00:01:45 so much possibilities before you?

00:01:47 It was interesting back then when I started,

00:01:50 or when you were even just talking about it,

00:01:55 the idea of deep learning was interesting and intriguing

00:01:59 in some ways.

00:02:00 It hadn’t yet taken off, but it held some promise.

00:02:04 It had shown some very promising and early results.

00:02:08 I think the idea where Andrew and Jeff had started

00:02:11 was, what if we can take this work people are doing

00:02:15 in research and scale it to what Google has

00:02:18 in terms of the compute power, and also

00:02:23 put that kind of data together?

00:02:24 What does it mean?

00:02:25 And so far, the results had been, if you scale the compute,

00:02:28 scale the data, it does better.

00:02:30 And would that work?

00:02:31 And so that was the first year or two, can we prove that out?

00:02:35 And with this belief, when we started the first year,

00:02:37 we got some early wins, which is always great.

00:02:40 What were the wins like?

00:02:41 What were the wins where you thought,

00:02:44 there’s something to this, this is going to be good?

00:02:46 I think there are two early wins where one was speech,

00:02:49 that we collaborated very closely with the speech research

00:02:52 team, who was also getting interested in this.

00:02:54 And the other one was on images, where the cat paper,

00:02:58 as we call it, that was covered by a lot of folks.

00:03:03 And the birth of Google Brain was around neural networks.

00:03:07 So it was deep learning from the very beginning.

00:03:09 That was the whole mission.

00:03:10 So what would, in terms of scale,

00:03:15 what was the sort of dream of what this could become?

00:03:21 Were there echoes of this open source TensorFlow community

00:03:24 that might be brought in?

00:03:26 Was there a sense of TPUs?

00:03:28 Was there a sense of machine learning is now going to be

00:03:31 at the core of the entire company,

00:03:33 is going to grow into that direction?

00:03:36 Yeah, I think, so that was interesting.

00:03:38 And if I think back to 2012 or 2011,

00:03:41 and the first question was, can we scale it? Within a year or so,

00:03:45 we had started scaling it to hundreds and thousands

00:03:47 of machines.

00:03:48 In fact, we had some runs even going to 10,000 machines.

00:03:51 And all of those showed great promise.

00:03:53 In terms of machine learning at Google,

00:03:56 the good thing was Google’s been doing machine learning

00:03:58 for a long time.

00:04:00 Deep learning was new, but as we scaled this up,

00:04:03 we showed that, yes, that was possible.

00:04:05 And it was going to impact lots of things.

00:04:07 Like we started seeing real products wanting to use this.

00:04:11 Again, speech was the first, there were image things

00:04:13 that photos came out of and then many other products as well.

00:04:17 So that was exciting.

00:04:20 As we went into that a couple of years,

00:04:23 externally also academia started to,

00:04:25 there was lots of push on, okay,

00:04:27 deep learning is interesting,

00:04:28 we should be doing more and so on.

00:04:30 And so by 2014, we were looking at, okay,

00:04:34 this is a big thing, it’s going to grow.

00:04:36 And not just internally, externally as well.

00:04:39 Yes, maybe Google’s ahead of where everybody is,

00:04:42 but there’s a lot to do.

00:04:43 So a lot of this started to make sense and come together.

00:04:46 So the decision to open source,

00:04:49 I was just chatting with Chris Lattner about this.

00:04:52 The decision to go open source with TensorFlow,

00:04:54 I would say sort of for me personally,

00:04:57 seems to be one of the big seminal moments

00:04:59 in all of software engineering ever.

00:05:01 I think that’s when a large company like Google

00:05:04 decides to take a large project that many lawyers

00:05:07 might argue has a lot of IP,

00:05:10 just decide to go open source with it,

00:05:12 and in so doing lead the entire world

00:05:14 and saying, you know what, open innovation

00:05:16 is a pretty powerful thing, and it’s okay to do.

00:05:22 That was, I mean, that’s an incredible moment in time.

00:05:26 So do you remember those discussions happening?

00:05:29 Whether open source should be happening?

00:05:31 What was that like?

00:05:32 I would say, I think, so the initial idea came from Jeff,

00:05:36 who was a big proponent of this.

00:05:39 I think it came off of two big things.

00:05:42 One was research wise, we were a research group.

00:05:46 We were putting all our research out there.

00:05:49 If you wanted to, we were building on others research

00:05:51 and we wanted to push the state of the art forward.

00:05:55 And part of that was to share the research.

00:05:56 That’s how I think deep learning and machine learning

00:05:58 has really grown so fast.

00:06:01 So the next step was, okay, now,

00:06:03 would software help with that?

00:06:05 And it seemed like there were

00:06:08 a few existing libraries out there, Theano being one,

00:06:11 Torch being another, and a few others,

00:06:14 but they were all done by academia

00:06:15 and so the level was significantly different.

00:06:18 The other one was from a software perspective,

00:06:22 Google had done lots of software

00:06:23 or that we used internally, you know,

00:06:27 and we published papers.

00:06:29 Often there was an open source project

00:06:31 that came out of that that somebody else

00:06:33 picked up that paper and implemented

00:06:35 and they were very successful.

00:06:38 Back then it was like, okay, there’s Hadoop,

00:06:41 which has come off of tech that we’ve built.

00:06:44 We know the tech we’ve built is way better

00:06:46 for a number of different reasons.

00:06:47 We’ve invested a lot of effort in that.

00:06:51 And turns out we have Google Cloud

00:06:54 and we are now not really providing our tech,

00:06:57 but we are saying, okay, we have Bigtable,

00:07:00 which is the original thing.

00:07:02 We are going to now provide HBase APIs

00:07:03 on top of that, which isn’t as good,

00:07:06 but that’s what everybody’s used to.

00:07:07 So there’s like, can we make something

00:07:10 that is better and really just

00:07:12 helps the community in lots of ways,

00:07:14 but also helps push a good standard forward.

00:07:18 So how does Cloud fit into that?

00:07:19 There’s a TensorFlow open source library

00:07:22 and how does the fact that you can

00:07:25 use so many of the resources that Google provides

00:07:28 and the Cloud fit into that strategy?

00:07:31 So TensorFlow itself is open

00:07:33 and you can use it anywhere, right?

00:07:34 And we want to make sure that continues to be the case.

00:07:38 On Google Cloud, we do make sure

00:07:41 that there’s lots of integrations with everything else

00:07:43 and we want to make sure

00:07:44 that it works really, really well there.

00:07:47 You’re leading the TensorFlow effort.

00:07:50 Can you tell me the history

00:07:51 and the timeline of TensorFlow project

00:07:53 in terms of major design decisions,

00:07:55 so like the open source decision,

00:07:58 but really what to include and not?

00:08:01 There’s this incredible ecosystem

00:08:03 that I’d like to talk about.

00:08:04 There’s all these parts,

00:08:05 but what are just some sample moments

00:08:11 that defined what TensorFlow eventually became

00:08:15 through its, I don’t know if you’re allowed to say history

00:08:17 when it’s just, but in deep learning,

00:08:20 everything moves so fast

00:08:21 and just a few years is already history.

00:08:23 Yes, yes, so looking back, we were building TensorFlow.

00:08:29 I guess we open sourced it in 2015, November 2015.

00:08:34 We started on it in summer of 2014, I guess.

00:08:39 And somewhere like three to six months in, late 2014,

00:08:42 by then we had decided that, okay,

00:08:45 there’s a high likelihood we’ll open source it.

00:08:47 So we started thinking about that

00:08:48 and making sure we’re heading down that path.

00:08:53 At that point, by that point,

00:08:56 we had seen a few, lots of different use cases at Google.

00:08:59 So there were things like, okay,

00:09:01 yes, you wanna run it at large scale in the data center.

00:09:04 Yes, we need to support different kind of hardware.

00:09:07 We had GPUs at that point.

00:09:09 We had our first TPU at that point,

00:09:11 or it was about to come out roughly around that time.

00:09:15 So the design sort of included those.

00:09:18 We had started to push on mobile.

00:09:21 So we were running models on mobile.

00:09:24 At that point, people were customizing code.

00:09:28 So we wanted to make sure TensorFlow

00:09:29 could support that as well.

00:09:30 So that sort of became part of that overall design.

00:09:35 When you say mobile,

00:09:36 you mean like a pretty complicated algorithms

00:09:38 running on the phone?

00:09:40 That’s correct.

00:09:40 So when you have a model that you deploy on the phone

00:09:44 and run it there, right?

00:09:45 So already at that time,

00:09:46 there were ideas of running machine learning on the phone.

00:09:48 That’s correct.

00:09:49 We already had a couple of products

00:09:51 that were doing that by then.

00:09:53 And in those cases,

00:09:54 we had basically customized handcrafted code

00:09:57 or some internal libraries that we were using.

00:10:00 So I was actually at Google during this time

00:10:02 in a parallel, I guess, universe,

00:10:04 but we were using Theano and Caffe.

00:10:09 Was there some degree to which you were bouncing,

00:10:11 like trying to see what Caffe was offering people,

00:10:15 trying to see what Theano was offering

00:10:17 that you want to make sure you’re delivering

00:10:19 on whatever that is?

00:10:21 Perhaps the Python part of thing,

00:10:23 maybe did that influence any design decisions?

00:10:27 Totally.

00:10:28 So when we built DistBelief,

00:10:29 and some of that was in parallel

00:10:31 with some of these libraries coming up,

00:10:33 I mean, Theano itself is older,

00:10:36 but we were building DistBelief

00:10:39 focused on our internal thing

00:10:41 because our systems were very different.

00:10:42 By the time we got to this,

00:10:44 we looked at a number of libraries that were out there.

00:10:47 Theano, there were folks in the group

00:10:49 who had experience with Torch, with Lua.

00:10:52 There were folks here who had seen Caffe.

00:10:54 I mean, actually, Yangqing Jia was here as well.

00:10:58 There’s what other libraries?

00:11:02 I think we looked at a number of things.

00:11:04 Might even have looked at Chainer back then.

00:11:06 I’m trying to remember if it was there.

00:11:09 In fact, yeah, we did discuss ideas around,

00:11:12 okay, should we have a graph or not?

00:11:17 So putting all these together was definitely,

00:11:20 these were key decisions that we wanted to make.

00:11:22 We had seen limitations in our prior DistBelief system.

00:11:28 A few of them were just in terms of research

00:11:31 was moving so fast, we wanted the flexibility.

00:11:35 The hardware was changing fast.

00:11:36 We expected that to change,

00:11:37 so those were probably the two things.

00:11:39 And yeah, I think the flexibility

00:11:43 in terms of being able to express

00:11:44 all kinds of crazy things was definitely a big one then.

00:11:46 So what, the graph decisions though,

00:11:49 with moving towards TensorFlow 2.0,

00:11:52 there’s more, by default, there’ll be eager execution.

00:11:56 So sort of hiding the graph a little bit

00:11:59 because it’s less intuitive

00:12:00 in terms of the way people develop and so on.

00:12:03 What was that discussion like in terms of using graphs?

00:12:06 It seemed, it’s kind of the Theano way.

00:12:09 Did it seem the obvious choice?

00:12:11 So I think where it came from was, our DistBelief

00:12:15 had a graph like thing as well.

00:12:17 Much simpler, it wasn’t a general graph,

00:12:19 it was more like a straight line thing.

00:12:23 More like what you might think of Caffe,

00:12:25 I guess in that sense.

00:12:26 But the graph was,

00:12:28 and we always cared about the production stuff.

00:12:31 Like even with DistBelief,

00:12:32 we were deploying a whole bunch of stuff in production.

00:12:34 So graph did come from that when we thought of,

00:12:37 okay, should we do that in Python?

00:12:39 And we experimented with some ideas

00:12:40 where it looked a lot simpler to use,

00:12:44 but not having a graph meant,

00:12:46 okay, how do you deploy now?

00:12:47 So that was probably what tilted the balance for us

00:12:51 and eventually we ended up with a graph.

00:12:52 And I guess the question there is, did you,

00:12:55 I mean, so production seems to be

00:12:57 the really good thing to focus on,

00:12:59 but did you even anticipate the other side of it

00:13:02 where there could be, what is it?

00:13:04 What are the numbers?

00:13:05 It’s been crazy, 41 million downloads.

00:13:08 Yep.

00:13:12 I mean, was that even like a possibility in your mind

00:13:16 that it would be as popular as it became?

00:13:19 So I think we did see a need for this

00:13:24 a lot from the research perspective

00:13:27 and like early days of deep learning in some ways.

00:13:32 41 million, no, I don’t think I imagined this number.

00:13:35 Then it seemed like there’s a potential future

00:13:41 where lots more people would be doing this

00:13:43 and how do we enable that?

00:13:45 I would say this kind of growth,

00:13:49 I probably started seeing somewhat after the open sourcing

00:13:52 where it was like, okay,

00:13:55 deep learning is actually growing way faster

00:13:57 for a lot of different reasons.

00:13:59 And we are in just the right place to push on that

00:14:02 and leverage that and deliver on lots of things

00:14:06 that people want.

00:14:07 So what changed once you open sourced?

00:14:09 Like how this incredible amount of attention

00:14:13 from a global population of developers,

00:14:16 how did the project start changing?

00:14:18 I don’t even actually remember during those times.

00:14:22 I know looking now, there’s really good documentation,

00:14:24 there’s an ecosystem of tools,

00:14:26 there’s a community, there’s a blog,

00:14:27 there’s a YouTube channel now, right?

00:14:29 Yeah.

00:14:31 It’s very community driven.

00:14:33 Back then, I guess 0.1 version,

00:14:38 is that the version?

00:14:39 I think we called it 0.6 or 0.5,

00:14:42 something like that, I forget.

00:14:43 What changed leading into 1.0?

00:14:47 It’s interesting.

00:14:48 I think we’ve gone through a few things there.

00:14:51 When we started out, when we first came out,

00:14:53 people loved the documentation we have

00:14:56 because it was just a huge step up from everything else

00:14:58 because all of those were academic projects,

00:15:00 done by people who don’t think about documentation.

00:15:04 I think what that changed was,

00:15:06 instead of deep learning being a research thing,

00:15:10 some people who were just developers

00:15:12 could now suddenly take this out

00:15:14 and do some interesting things with it, right?

00:15:16 Who had no clue what machine learning was before then.

00:15:20 And that I think really changed

00:15:22 how things started to scale up in some ways

00:15:24 and pushed on it.

00:15:27 Over the next few months as we looked at

00:15:30 how do we stabilize things,

00:15:31 as we look at not just researchers,

00:15:33 now we want stability, people want to deploy things.

00:15:36 That’s how we started planning for 1.0

00:15:38 and there are certain needs for that perspective.

00:15:42 And so again, documentation comes up,

00:15:45 designs, more kinds of things to put that together.

00:15:49 And so that was exciting to get that to a stage

00:15:52 where more and more enterprises wanted to buy in

00:15:55 and really get behind that.

00:15:57 And I think post 1.0 and over the next few releases,

00:16:01 that enterprise adoption also started to take off.

00:16:04 I would say between the initial release and 1.0,

00:16:07 it was, okay, researchers of course,

00:16:10 then a lot of hobbyists and early interest,

00:16:12 people excited about this who started to get on board

00:16:15 and then over the 1.x thing, lots of enterprises.

00:16:18 I imagine anything that’s below 1.0

00:16:23 gives pressure to be,

00:16:25 the enterprise probably wants something that’s stable.

00:16:28 Exactly.

00:16:28 And do you have a sense now that TensorFlow is stable?

00:16:33 Like it feels like deep learning in general

00:16:35 is an extremely dynamic field, so much is changing.

00:16:40 And TensorFlow has been growing incredibly.

00:16:43 Do you have a sense of stability at the helm of it?

00:16:46 I mean, I know you’re in the midst of it, but.

00:16:48 Yeah, I think in the midst of it,

00:16:51 it’s often easy to forget what an enterprise wants

00:16:55 and what some of the people on that side want.

00:16:58 There are still people running models

00:17:00 that are three years old, four years old.

00:17:02 So Inception is still used by tons of people.

00:17:06 Even ResNet 50 is what, couple of years old now or more,

00:17:08 but there are tons of people who use that and they’re fine.

00:17:12 They don’t need the last couple of bits of performance

00:17:15 or quality, they want some stability

00:17:17 in things that just work.

00:17:19 And so there is value in providing that

00:17:22 with that kind of stability and making it really simpler

00:17:25 because that allows a lot more people to access it.

00:17:27 And then there’s the research crowd which wants,

00:17:31 okay, they wanna do these crazy things

00:17:33 exactly like you’re saying, right?

00:17:34 Not just deep learning in the straight up models

00:17:37 that used to be there, they want RNNs

00:17:40 and even RNNs are maybe old, they are transformers now.

00:17:43 And now it needs to combine with RL and GANs and so on.

00:17:48 So there’s definitely that area that like the boundary

00:17:52 that’s shifting and pushing the state of the art.

00:17:55 But I think there’s more and more of the past

00:17:57 that’s much more stable and even stuff

00:18:01 that was two, three years old is very, very usable

00:18:03 by lots of people.

00:18:04 So that part makes it a lot easier.

00:18:07 So I imagine, maybe you can correct me if I’m wrong,

00:18:09 one of the biggest use cases is essentially

00:18:12 taking something like ResNet 50

00:18:14 and doing some kind of transfer learning

00:18:17 on a very particular problem that you have.

00:18:19 It’s probably what the majority of the world does.

00:18:24 And you wanna make that as easy as possible.

00:18:27 So I would say for the hobbyist perspective,

00:18:30 that’s the most common case, right?

00:18:32 In fact, the apps and phones and stuff that you’ll see,

00:18:35 the early ones, that’s the most common case.

00:18:37 I would say there are a couple of reasons for that.

00:18:40 One is that everybody talks about that.

00:18:44 It looks great on slides.

00:18:46 That’s a presentation, yeah, exactly.

00:18:49 What enterprises want, that is part of it,

00:18:53 but that’s not the big thing.

00:18:54 Enterprises really have data

00:18:56 that they wanna make predictions on.

00:18:58 This is often what they used to do

00:19:00 with the people who were doing ML

00:19:01 was just regression models,

00:19:03 linear regression, logistic regression, linear models,

00:19:06 or maybe gradient boosted trees and so on.

00:19:09 Some of them still benefit from deep learning,

00:19:11 but that’s the bread and butter they want,

00:19:14 like the structured data and so on.

00:19:16 So depending on the audience you look at,

00:19:18 they’re a little bit different.

00:19:19 And they just have, I mean, the best of enterprise

00:19:23 probably just has a very large data set,

00:19:26 where deep learning can probably shine.

00:19:28 That’s correct, that’s right.

00:19:30 And then I think the other pieces that they wanted,

00:19:33 again, with 2.0, the developer summit we put together

00:19:36 is the whole TensorFlow Extended piece,

00:19:39 which is the entire pipeline.

00:19:40 They care about stability across doing their entire thing.

00:19:43 They want simplicity across the entire thing.

00:19:46 I don’t need to just train a model.

00:19:47 I need to do that every day again, over and over again.

00:19:51 I wonder to which degree you have a role in,

00:19:54 I don’t know, so I teach a course on deep learning.

00:19:56 I have people like lawyers come up to me and say,

00:20:01 when is machine learning gonna enter legal,

00:20:04 the legal realm?

00:20:05 The same thing in all kinds of disciplines,

00:20:09 immigration, insurance, often when I see

00:20:14 what it boils down to is these companies

00:20:17 are often a little bit old school

00:20:19 in the way they organize the data.

00:20:20 So the data is just not ready yet, it’s not digitized.

00:20:24 Do you also find yourself being in the role

00:20:26 of an evangelist for like, let’s get,

00:20:31 organize your data, folks, and then you’ll get

00:20:33 the big benefit of TensorFlow.

00:20:35 Do you get those, have those conversations?

00:20:38 Yeah, yeah, you know, I get all kinds of questions there

00:20:41 from, okay, what do I need to make this work, right?

00:20:49 Do we really need deep learning?

00:20:50 I mean, there are all these things,

00:20:52 I already use this linear model, why would this help?

00:20:55 I don’t have enough data, let’s say,

00:20:57 or I wanna use machine learning,

00:21:00 but I have no clue where to start.

00:21:01 So it varies, from that all the way to the experts

00:21:04 asking why we support very specific things, it’s interesting.

00:21:08 Is there a good answer?

00:21:09 It boils down to oftentimes digitizing data.

00:21:12 So whatever you want automated,

00:21:14 whatever data you want to make prediction based on,

00:21:17 you have to make sure that it’s in an organized form.

00:21:21 Like within the TensorFlow ecosystem,

00:21:24 there’s now, you’re providing more and more data sets

00:21:26 and more and more pretrained models.

00:21:28 Are you finding yourself also the organizer of data sets?

00:21:32 Yes, I think the TensorFlow data sets

00:21:34 that we just released, that’s definitely come up

00:21:37 where people want these data sets,

00:21:39 can we organize them and can we make that easier?

00:21:41 So that’s definitely one important thing.

00:21:45 The other related thing I would say is I often tell people,

00:21:47 you know what, don’t think of the most fanciest thing

00:21:51 that the newest model that you see,

00:21:53 make something very basic work and then you can improve it.

00:21:56 There’s just lots of things you can do with it.
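
A minimal sketch of that “make something basic work first” idea, using the tensorflow_datasets package mentioned above; the dataset choice and the tiny model here are illustrative assumptions, not anything specific from the conversation:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load a small, well-known dataset; "mnist" is just an illustrative choice.
ds = tfds.load("mnist", split="train", as_supervised=True)
ds = ds.map(lambda image, label: (tf.cast(image, tf.float32) / 255.0, label))
ds = ds.shuffle(1024).batch(32)

# Deliberately basic model: get something simple working, then improve it.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(ds, epochs=1)
```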

00:21:58 Yeah, start with the basics, true.

00:22:00 One of the big things that makes TensorFlow

00:22:03 even more accessible was the appearance

00:22:06 whenever that happened of Keras,

00:22:08 the Keras standard sort of outside of TensorFlow.

00:22:12 I think it was Keras on top of Theano at first only

00:22:18 and then Keras became on top of TensorFlow.

00:22:22 Do you know when Keras chose to also add TensorFlow

00:22:28 as a backend, who was the,

00:22:31 was it just the community that drove that initially?

00:22:34 Do you know if there was discussions, conversations?

00:22:37 Yeah, so Francois started the Keras project

00:22:41 before he was at Google, and the first backend was Theano.

00:22:44 I don’t remember if that was

00:22:46 after TensorFlow was created or way before.

00:22:49 And then at some point,

00:22:51 when TensorFlow started becoming popular,

00:22:53 there were enough similarities

00:22:54 that he decided to create this interface

00:22:56 and put TensorFlow as a backend.

00:22:58 I believe that might still have been

00:23:00 before he joined Google.

00:23:03 So we weren’t really talking about that.

00:23:06 He decided on his own and thought that was interesting

00:23:09 and relevant to the community.

00:23:12 In fact, I didn’t find out about him being at Google

00:23:17 until a few months after he was here.

00:23:19 He was working on some research ideas

00:23:21 and doing Keras on his nights and weekends project.

00:23:24 Oh, interesting.

00:23:25 He wasn’t like part of the TensorFlow team.

00:23:28 He didn’t join initially.

00:23:29 He joined research and he was doing some amazing research.

00:23:32 He has some papers on that and research,

00:23:34 so he’s a great researcher as well.

00:23:38 And at some point we realized,

00:23:40 oh, he’s doing this good stuff.

00:23:42 People seem to like the API and he’s right here.

00:23:45 So we talked to him and he said,

00:23:47 okay, why don’t I come over to your team

00:23:50 and work with you for a quarter

00:23:52 and let’s make that integration happen.

00:23:55 And we talked to his manager and he said,

00:23:56 sure, quarter’s fine.

00:23:59 And that quarter’s been something like two years now.

00:24:02 And so he’s fully on this.

00:24:05 So Keras got integrated into TensorFlow in a deep way.

00:24:12 And now with 2.0, TensorFlow 2.0,

00:24:15 sort of Keras is kind of the recommended way

00:24:18 for a beginner to interact with TensorFlow.

00:24:21 Which makes that initial sort of transfer learning

00:24:24 or the basic use cases, even for an enterprise,

00:24:28 super simple, right?

00:24:29 That’s correct, that’s right.
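
A rough illustration of that beginner transfer-learning use case with the Keras API in TensorFlow 2.x; the frozen ResNet 50 backbone matches the example discussed earlier, while the image size and the five-class head are placeholder assumptions:

```python
import tensorflow as tf

# Pretrained ResNet50 backbone with the ImageNet classification head removed.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone; only the new head is trained

# New classification head for a hypothetical 5-class problem.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds would be your own tf.data.Dataset of (image, label) batches.
# model.fit(train_ds, epochs=3)
```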

00:24:30 So what was that decision like?

00:24:32 That seems like it’s kind of a bold decision as well.

00:24:38 We did spend a lot of time thinking about that one.

00:24:41 We had a bunch of APIs, some built by us.

00:24:46 There was a parallel layers API that we were building.

00:24:48 And when we decided to do Keras in parallel,

00:24:51 so there were like, okay, two things that we are looking at.

00:24:54 And the first thing we were trying to do

00:24:55 is just have them look similar,

00:24:58 like be as integrated as possible,

00:25:00 share all of that stuff.

00:25:02 There were also like three other APIs

00:25:04 that others had built over time

00:25:05 because we didn’t have a standard one.

00:25:09 But one of the messages that we kept hearing

00:25:11 from the community was, okay, which one do we use?

00:25:13 And they kept seeing like, okay,

00:25:14 here’s a model in this one and here’s a model in this one,

00:25:16 which should I pick?

00:25:18 So that’s sort of like, okay,

00:25:20 we had to address that straight on with 2.0.

00:25:24 The whole idea was we need to simplify.

00:25:26 We had to pick one.

00:25:28 Based on where we were, we were like,

00:25:30 okay, let’s see what people like.

00:25:35 And Keras was clearly one that lots of people loved.

00:25:39 There were lots of great things about it.

00:25:41 So we settled on that.

00:25:43 Organically, that’s kind of the best way to do it.

00:25:46 It was great.

00:25:47 It was surprising, nevertheless,

00:25:48 to sort of bring in an outside project.

00:25:51 I mean, there was a feeling like Keras

00:25:52 might be almost like a competitor

00:25:55 in a certain kind of, to TensorFlow.

00:25:58 And in a sense, it became an empowering element

00:26:01 of TensorFlow.

00:26:02 That’s right.

00:26:03 Yeah, it’s interesting how you can put two things together,

00:26:06 which can align.

00:26:08 In this case, I think Francois, the team,

00:26:11 and a bunch of us have chatted,

00:26:14 and I think we all want to see the same kind of things.

00:26:17 We all care about making it easier

00:26:18 for the huge set of developers out there,

00:26:21 and that makes a difference.

00:26:23 So Python has Guido van Rossum,

00:26:26 who until recently held the position

00:26:28 of benevolent dictator for life.

00:26:31 All right, so does a huge successful open source project

00:26:36 like TensorFlow need one person who makes a final decision?

00:26:40 So you did a pretty successful TensorFlow Dev Summit

00:26:45 just now, last couple of days.

00:26:47 There’s clearly a lot of different new features

00:26:51 being incorporated, an amazing ecosystem, and so on.

00:26:54 Who’s, how are those design decisions made?

00:26:57 Is there a BDFL in TensorFlow,

00:27:02 or is it more distributed and organic?

00:27:05 I think it’s somewhat different, I would say.

00:27:08 I’ve always been involved in the key design directions,

00:27:14 but there are lots of things that are distributed

00:27:17 where there are a number of people, Martin Wicke being one,

00:27:20 who has really driven a lot of our open source stuff,

00:27:23 a lot of the APIs,

00:27:26 and there are a number of other people who’ve been,

00:27:29 you know, pushed and been responsible

00:27:31 for different parts of it.

00:27:34 We do have regular design reviews.

00:27:36 Over the last year,

00:27:38 we’ve really spent a lot of time opening up to the community

00:27:41 and adding transparency.

00:27:44 We’re setting more processes in place,

00:27:45 so RFCs, special interest groups,

00:27:49 to really grow that community and scale that.

00:27:53 I think at the kind of scale this ecosystem is at,

00:27:57 I don’t think we could scale with having me

00:27:59 as the lone point of decision maker.

00:28:02 I got it. So, yeah, the growth of that ecosystem,

00:28:05 maybe you can talk about it a little bit.

00:28:08 First of all, it started with Andrej Karpathy

00:28:10 when he first did ConvNetJS.

00:28:13 The fact that you could train a neural network

00:28:15 in the browser, in JavaScript, was incredible.

00:28:18 So now TensorFlow.js is really making that

00:28:22 a serious, like a legit thing,

00:28:26 a way to operate, whether it’s in the backend

00:28:28 or the front end.

00:28:29 Then there’s the TensorFlow Extended, like you mentioned.

00:28:32 There’s TensorFlow Lite for mobile.

00:28:35 And all of it, as far as I can tell,

00:28:37 it’s really converging towards being able to

00:28:41 save models in the same kind of way.

00:28:43 You can move around, you can train on the desktop

00:28:46 and then move it to mobile and so on.

00:28:48 That’s right.

00:28:49 So there’s that cohesiveness.
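
A hedged sketch of the “train on the desktop, then move it to mobile” flow referenced here, assuming a TensorFlow 2.x Keras model; the toy model is a stand-in for whatever you actually trained:

```python
import tensorflow as tf

# Toy model standing in for whatever was trained on the desktop.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Convert to the TensorFlow Lite flat-buffer format for deployment on a phone.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()

# Write the converted model; a mobile app would load this with the TFLite runtime.
with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```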

00:28:52 So can you maybe give me, whatever I missed,

00:28:56 a bigger overview of the mission of the ecosystem

00:28:58 that’s trying to be built and where is it moving forward?

00:29:02 Yeah. So in short, the way I like to think of this is

00:29:06 our goal is to enable machine learning.

00:29:09 And in a couple of ways, you know, one is

00:29:13 we have lots of exciting things going on in ML today.

00:29:16 We started with deep learning,

00:29:17 but we now support a bunch of other algorithms too.

00:29:21 So one is to, on the research side,

00:29:23 keep pushing on the state of the art.

00:29:25 Can we, you know, how do we enable researchers

00:29:27 to build the next amazing thing?

00:29:28 So BERT came out recently, you know,

00:29:31 it’s great that people are able to do new kinds of research.

00:29:33 And there are lots of amazing research

00:29:35 that happens across the world.

00:29:37 So that’s one direction.

00:29:38 The other is how do you take that across

00:29:42 all the people outside who want to take that research

00:29:45 and do some great things with it

00:29:46 and integrate it to build real products,

00:29:48 to have a real impact on people.

00:29:51 And so that’s the other axis in some ways,

00:29:56 you know, at a high level, one way I think about it is

00:29:59 there are a crazy number of compute devices

00:30:02 across the world.

00:30:04 And we often used to think of ML and training

00:30:07 and all of this as, okay, something you do

00:30:09 either in the workstation or the data center or cloud.

00:30:13 But we see things running on the phones.

00:30:15 We see things running on really tiny chips.

00:30:17 I mean, we had some demos at the developer summit.

00:30:20 And so the way I think about this ecosystem is

00:30:25 how do we help get machine learning on every device

00:30:29 that has a compute capability?

00:30:32 And that continues to grow and so in some ways

00:30:36 this ecosystem has looked at, you know,

00:30:38 various aspects of that and grown over time

00:30:41 to cover more of those.

00:30:42 And we continue to push the boundaries.

00:30:44 In some areas we’ve built more tooling

00:30:48 and things around that to help you.

00:30:50 I mean, the first tool we started was TensorBoard.

00:30:52 if you wanted to learn just the training piece,

00:30:56 then TFX, or TensorFlow Extended,

00:30:58 to really do your entire ML pipelines.

00:31:00 if you, you know, care about all that production stuff,

00:31:04 but then going to the edge,

00:31:06 going to different kinds of things.

00:31:09 And it’s not just us now.

00:31:11 We are a place where there are lots of libraries

00:31:14 being built on top.

00:31:15 So there are some for research,

00:31:17 maybe things like TensorFlow Agents

00:31:20 or TensorFlow Probability that started as research things

00:31:22 or for researchers focusing

00:31:24 on certain kinds of algorithms,

00:31:26 but they’re also being deployed

00:31:27 or used by, you know, production folks.

00:31:30 And some have come from within Google,

00:31:33 just teams across Google

00:31:34 who wanted to build these things.

00:31:37 Others have come from just the community

00:31:39 because there are different pieces

00:31:41 that different parts of the community care about.

00:31:44 And I see our goal as enabling even that, right?

00:31:49 It’s not, we cannot and won’t build every single thing.

00:31:53 That just doesn’t make sense.

00:31:54 But if we can enable others to build the things

00:31:57 that they care about, and there’s a broader community

00:32:00 that cares about that, and we can help encourage that,

00:32:02 and that’s great.

00:32:05 That really helps the entire ecosystem, not just those.

00:32:08 One of the big things about 2.0 that we’re pushing on is,

00:32:11 okay, we have so many different pieces, right?

00:32:14 How do we help make all of them work well together?

00:32:18 So there are a few key pieces there that we’re pushing on,

00:32:21 one being the core format in there

00:32:23 and how we share the models themselves

00:32:26 through SavedModel and TensorFlow Hub and so on.

00:32:30 And a few of the pieces that really put this together.
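
A minimal sketch of that shared-format idea, saving and reloading a model in the SavedModel format with TensorFlow 2.x; the toy model and the path are placeholders:

```python
import tensorflow as tf

# Toy model standing in for something you trained.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])

# Export in the SavedModel format, the common serialization that tools like
# TensorFlow Serving, the TensorFlow Lite converter, and TensorFlow Hub build on.
tf.saved_model.save(model, "/tmp/my_saved_model")

# Reload it later, possibly from a different component of the ecosystem.
restored = tf.saved_model.load("/tmp/my_saved_model")
print(restored.signatures)
```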

00:32:34 I was very skeptical of that,

00:32:35 you know, when TensorFlow.js came out,

00:32:37 or deeplearn.js as it was called earlier.

00:32:40 Yeah, that was the first.

00:32:41 It seemed like a technically very difficult project.

00:32:45 As a standalone, it’s not as difficult,

00:32:47 but as a thing that integrates into the ecosystem,

00:32:49 it seems very difficult.

00:32:51 So, I mean, there’s a lot of aspects of this

00:32:53 you’re making look easy, but,

00:32:54 on the technical side,

00:32:57 how many challenges have to be overcome here?

00:33:00 A lot.

00:33:01 And still have to be overcome.

00:33:03 That’s the question here too.

00:33:04 There are lots of steps to it, right?

00:33:06 And we’ve iterated over the last few years,

00:33:07 so there’s a lot we’ve learned.

00:33:10 I, yeah, and often when things come together well,

00:33:14 things look easy and that’s exactly the point.

00:33:16 It should be easy for the end user,

00:33:18 but there are lots of things that go behind that.

00:33:21 If I think about still challenges ahead,

00:33:25 there are,

00:33:29 you know, we have a lot more devices coming on board,

00:33:32 for example, from the hardware perspective.

00:33:35 How do we make it really easy for these vendors

00:33:37 to integrate with something like TensorFlow, right?

00:33:42 So there’s a lot of compiler stuff

00:33:43 that others are working on.

00:33:45 There are things we can do in terms of our APIs

00:33:48 and so on that we can do.

00:33:50 As we, you know,

00:33:52 TensorFlow started as a very monolithic system

00:33:55 and to some extent it still is.

00:33:57 There are lots of tools around it,

00:33:59 but the core is still pretty large and monolithic.

00:34:02 One of the key challenges for us to scale that out

00:34:05 is how do we break that apart with clearer interfaces?

00:34:10 It’s, you know, in some ways it’s software engineering 101,

00:34:14 but for a system that’s now four years old, I guess,

00:34:18 or more, and that’s still rapidly evolving

00:34:21 and that we’re not slowing down with,

00:34:23 it’s hard to change and modify and really break apart.

00:34:28 It’s sort of like, as people say, right,

00:34:29 it’s like changing the engine while the car is running

00:34:32 or trying to fix that.

00:34:33 That’s exactly what we’re trying to do.

00:34:35 So there’s a challenge here

00:34:37 because the downside of so many people

00:34:41 being excited about TensorFlow

00:34:43 and coming to rely on it in many of their applications

00:34:48 is that you’re kind of responsible,

00:34:52 like it’s the technical debt.

00:34:53 You’re responsible for previous versions

00:34:55 to some degree still working.

00:34:57 So when you’re trying to innovate,

00:34:59 I mean, it’s probably easier

00:35:02 to just start from scratch every few months.

00:35:04 Absolutely.

00:35:07 So do you feel the pain of that?

00:35:09 2.0 does break some back compatibility,

00:35:14 but not too much.

00:35:15 It seems like the conversion is pretty straightforward.

00:35:18 Do you think that’s still important

00:35:20 given how quickly deep learning is changing?

00:35:22 Can you just, the things that you’ve learned,

00:35:26 can you just start over or is there pressure to not?

00:35:29 It’s a tricky balance.

00:35:31 So if it was just a researcher writing a paper

00:35:36 who a year later will not look at that code again,

00:35:39 sure, it doesn’t matter.

00:35:41 There are a lot of production systems

00:35:43 that rely on TensorFlow,

00:35:44 both at Google and across the world.

00:35:47 And people worry about this.

00:35:49 I mean, these systems run for a long time.

00:35:53 So it is important to keep that compatibility and so on.

00:35:57 And yes, it does come with a huge cost.

00:35:59 We have to think about a lot of things

00:36:02 as we do new things and make new changes.

00:36:06 I think it’s a trade off, right?

00:36:09 You can, you might slow certain kinds of things down,

00:36:12 but the overall value you’re bringing

00:36:14 because of that is much bigger

00:36:16 because it’s not just about breaking the person yesterday.

00:36:20 It’s also about telling the person tomorrow

00:36:23 that, you know what, this is how we do things.

00:36:26 We’re not gonna break you when you come on board

00:36:28 because there are lots of new people

00:36:29 who are also gonna come on board.

00:36:31 And, you know, one way I like to think about this,

00:36:34 and I always push the team to think about it as well,

00:36:37 when you wanna do new things,

00:36:39 you wanna start with a clean slate.

00:36:42 Design with a clean slate in mind,

00:36:44 and then we’ll figure out

00:36:46 how to make sure all the other things work.

00:36:48 And yes, we do make compromises occasionally,

00:36:52 but unless you design with the clean slate

00:36:55 and not worry about that,

00:36:56 you’ll never get to a good place.

00:36:58 Oh, that’s brilliant, so even if you are responsible

00:37:02 when you’re in the idea stage,

00:37:04 when you’re thinking of new,

00:37:05 just put all that behind you.

00:37:07 Okay, that’s really, really well put.

00:37:09 So I have to ask this

00:37:11 because a lot of students, developers ask me

00:37:13 how I feel about PyTorch versus TensorFlow.

00:37:16 So I’ve recently completely switched

00:37:18 my research group to TensorFlow.

00:37:20 I wish everybody would just use the same thing,

00:37:23 and TensorFlow is as close to that, I believe, as we have.

00:37:26 But do you enjoy competition?

00:37:32 So TensorFlow is leading in many ways,

00:37:34 on many dimensions in terms of ecosystem,

00:37:36 in terms of number of users,

00:37:39 momentum, power, production levels, so on,

00:37:41 but a lot of researchers are now also using PyTorch.

00:37:46 Do you enjoy that kind of competition

00:37:47 or do you just ignore it

00:37:48 and focus on making TensorFlow the best that it can be?

00:37:52 So just like research or anything people are doing,

00:37:55 it’s great to get different kinds of ideas.

00:37:58 And when we started with TensorFlow,

00:38:01 like I was saying earlier,

00:38:03 one, it was very important

00:38:05 for us to also have production in mind.

00:38:07 We didn’t want just research, right?

00:38:09 And that’s why we chose certain things.

00:38:11 Now PyTorch came along and said,

00:38:12 you know what, I only care about research.

00:38:14 This is what I’m trying to do.

00:38:16 What’s the best thing I can do for this?

00:38:18 And it started iterating and said,

00:38:20 okay, I don’t need to worry about graphs.

00:38:22 Let me just run things.

00:38:24 And I don’t care if it’s not as fast as it can be,

00:38:27 but let me just make this part easy.

00:38:30 And there are things you can learn from that, right?

00:38:32 They, again, had the benefit of seeing what had come before,

00:38:36 but also exploring certain different kinds of spaces.

00:38:40 And they had some good things there,

00:38:43 building on, say, things like Chainer and so on before that.

00:38:46 So competition is definitely interesting.

00:38:49 It made us, you know,

00:38:50 this is an area that we had thought about,

00:38:51 like I said, way early on.

00:38:53 Over time we had revisited this a couple of times,

00:38:56 should we add this again?

00:38:59 At some point we said, you know what,

00:39:01 it seems like this can be done well,

00:39:02 so let’s try it again.

00:39:04 And that’s how we started pushing on eager execution.

00:39:07 How do we combine those two together?

00:39:09 Which has finally come very well together in 2.0,

00:39:13 but it took us a while to get all the things together

00:39:15 and so on.
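
A minimal sketch, under TensorFlow 2.x assumptions, of how the two modes come together: operations run eagerly by default, and wrapping the same code in tf.function traces it into a graph:

```python
import tensorflow as tf

# Eager by default: operations execute immediately, like ordinary Python.
x = tf.constant([[1.0, 2.0]])
w = tf.Variable([[3.0], [4.0]])
print(tf.matmul(x, w))  # concrete result right away, no session needed

# Wrapping the same computation in tf.function traces it into a graph,
# which is what recovers graph-mode performance and deployability.
@tf.function
def predict(inputs):
    return tf.matmul(inputs, w)

print(predict(x))  # same numbers, now executed from the traced graph
```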

00:39:16 So let me ask, put another way,

00:39:19 I think eager execution is a really powerful thing

00:39:21 that was added.

00:39:22 Do you think it wouldn’t have been,

00:39:25 you know, Muhammad Ali versus Frazier, right?

00:39:28 Do you think it wouldn’t have been added as quickly

00:39:31 if PyTorch wasn’t there?

00:39:33 It might have taken longer.

00:39:35 No longer?

00:39:36 Yeah, it was, I mean,

00:39:37 we had tried some variants of that before,

00:39:38 so I’m sure it would have happened,

00:39:40 but it might have taken longer.

00:39:42 I’m grateful that TensorFlow finally went

00:39:44 in the way they did.

00:39:44 It’s been doing some incredible work the last couple of years.

00:39:47 What other things that we didn’t talk about

00:39:49 are you looking forward in 2.0?

00:39:51 That comes to mind.

00:39:54 So we talked about some of the ecosystem stuff,

00:39:56 making it easily accessible with Keras,

00:40:00 eager execution.

00:40:01 Is there other things that we missed?

00:40:03 Yeah, so I would say one is just where 2.0 is,

00:40:07 and you know, with all the things that we’ve talked about,

00:40:10 I think as we think beyond that,

00:40:13 there are lots of other things that it enables us to do

00:40:16 and that we’re excited about.

00:40:18 So what it’s setting us up for,

00:40:20 okay, here are these really clean APIs.

00:40:22 We’ve cleaned up the surface for what the users want.

00:40:25 It also allows us to do a whole bunch of stuff

00:40:28 behind the scenes once we are ready with 2.0.

00:40:31 So for example, in TensorFlow with graphs

00:40:36 and all the things you could do,

00:40:37 you could always get a lot of good performance

00:40:40 if you spent the time to tune it, right?

00:40:43 And we’ve clearly shown that, lots of people do that.

00:40:47 With 2.0, with these APIs, where we are,

00:40:53 we can give you a lot of performance

00:40:55 just with whatever you do.

00:40:57 You know, because we see these, it’s much cleaner.

00:41:01 We know most people are gonna do things this way.

00:41:03 We can really optimize for that

00:41:05 and get a lot of those things out of the box.

00:41:09 And it really allows us, you know,

00:41:10 both for single machine and distributed and so on,

00:41:13 to really explore other spaces behind the scenes

00:41:17 after 2.0 in the future versions as well.

00:41:19 So right now the team’s really excited about that,

00:41:23 that over time I think we’ll see that.

00:41:25 The other piece that I was talking about

00:41:27 in terms of just restructuring the monolithic thing

00:41:31 into more pieces and making it more modular,

00:41:34 I think that’s gonna be really important

00:41:36 for a lot of the other people in the ecosystem,

00:41:41 other organizations and so on that wanted to build things.

00:41:44 Can you elaborate a little bit what you mean

00:41:46 by making TensorFlow ecosystem more modular?

00:41:50 So the way it’s organized today is there’s one,

00:41:55 there are lots of repositories

00:41:56 in the TensorFlow organization at GitHub.

00:41:58 The core one where we have TensorFlow,

00:42:01 it has the execution engine,

00:42:04 it has the key backends for CPUs and GPUs,

00:42:08 it has the work to do distributed stuff.

00:42:12 And all of these just work together

00:42:14 in a single library or binary.

00:42:17 There’s no way to split them apart easily.

00:42:18 I mean, there are some interfaces,

00:42:20 but they’re not very clean.

00:42:21 In a perfect world, you would have clean interfaces where,

00:42:24 okay, I wanna run it on my fancy cluster

00:42:27 with some custom networking,

00:42:29 just implement this and do that.

00:42:31 I mean, we kind of support that,

00:42:32 but it’s hard for people today.

00:42:35 I think as we are starting to see more interesting things

00:42:38 in some of these spaces,

00:42:39 having that clean separation will really start to help.

00:42:42 And again, going to the large size of the ecosystem

00:42:47 and the different groups involved there,

00:42:50 enabling people to evolve

00:42:52 and push on things more independently

00:42:54 just allows it to scale better.

00:42:56 And by people, you mean individual developers and?

00:42:59 And organizations.

00:42:59 And organizations.

00:43:00 That’s right.

00:43:01 So the hope is that everybody, sort of major,

00:43:04 I don’t know, Pepsi or something, uses it,

00:43:06 like major corporations go to TensorFlow for this kind of thing.

00:43:11 Yeah, if you look at enterprises like Pepsi or these,

00:43:13 I mean, a lot of them are already using TensorFlow.

00:43:15 They are not the ones that do the development

00:43:18 or changes in the core.

00:43:20 Some of them do, but a lot of them don’t.

00:43:21 I mean, they touch small pieces.

00:43:23 There are lots of these,

00:43:25 some of them being, let’s say, hardware vendors

00:43:27 who are building their custom hardware

00:43:28 and they want their own pieces.

00:43:30 Or some of them being bigger companies, say, IBM.

00:43:34 I mean, they’re involved in some of our

00:43:36 special interest groups,

00:43:38 and they see a lot of users

00:43:39 who want certain things and they want to optimize for that.

00:43:42 So folks like that often.

00:43:44 Autonomous vehicle companies, perhaps.

00:43:46 Exactly, yes.

00:43:48 So, yeah, like I mentioned,

00:43:50 TensorFlow has been downloaded 41 million times,

00:43:52 50,000 commits, almost 10,000 pull requests,

00:43:56 and 1,800 contributors.

00:43:58 So I’m not sure if you can explain it,

00:44:02 but what does it take to build a community like that?

00:44:06 In retrospect, what do you think,

00:44:09 what is the critical thing that allowed

00:44:11 for this growth to happen,

00:44:12 and how does that growth continue?

00:44:14 Yeah, yeah, that’s an interesting question.

00:44:17 I wish I had all the answers there, I guess,

00:44:20 so you could replicate it.

00:44:22 I think there are a number of things

00:44:25 that need to come together, right?

00:44:27 One, just like any new thing,

00:44:32 it is about a sweet spot of timing,

00:44:35 what’s needed, and does it grow with

00:44:38 what’s needed. So in this case, for example,

00:44:41 TensorFlow’s not just grown because it was a good tool,

00:44:43 it’s also grown with the growth of deep learning itself.

00:44:46 So those factors come into play.

00:44:49 Other than that, though,

00:44:52 I think just hearing, listening to the community,

00:44:55 what they do, what they need,

00:44:57 being open to, like in terms of external contributions,

00:45:01 we’ve spent a lot of time in making sure

00:45:04 we can accept those contributions well,

00:45:06 we can help the contributors in adding those,

00:45:09 putting the right process in place,

00:45:11 getting the right kind of community,

00:45:13 welcoming them and so on.

00:45:16 Like over the last year, we’ve really pushed on transparency,

00:45:19 that’s important for an open source project.

00:45:22 People wanna know where things are going,

00:45:23 and we’re like, okay, here’s a process

00:45:26 where you can do that, here are our RFCs and so on.

00:45:29 So thinking through, there are lots of community aspects

00:45:32 that come into that you can really work on.

00:45:35 As a small project, it’s maybe easy to do

00:45:38 because there’s like two developers and you can do those.

00:45:42 As you grow, putting more of these processes in place,

00:45:46 thinking about the documentation,

00:45:49 thinking about what developers care about,

00:45:51 what kind of tools would they want to use,

00:45:55 all of these come into play, I think.

00:45:56 So one of the big things I think

00:45:58 that feeds the TensorFlow fire

00:46:00 is people building something on TensorFlow,

00:46:03 and implement a particular architecture

00:46:07 that does something cool and useful,

00:46:09 and they put that on GitHub.

00:46:11 And so it just feeds this growth.

00:46:15 Do you have a sense that with 2.0 and 1.0

00:46:19 that there may be a little bit of a partitioning

00:46:21 like there is with Python 2 and 3,

00:46:24 that there’ll be a code base

00:46:26 and in the older versions of TensorFlow,

00:46:28 they will not be as compatible easily?

00:46:31 Or are you pretty confident that this kind of conversion

00:46:35 is pretty natural and easy to do?

00:46:37 So we’re definitely working hard

00:46:39 to make that very easy to do.

00:46:41 There’s lots of tooling that we talked about

00:46:43 at the developer summit this week,

00:46:45 and we’ll continue to invest in that tooling.

00:46:48 It’s, you know, when you think

00:46:50 of these significant version changes,

00:46:52 that’s always a risk,

00:46:53 and we are really pushing hard

00:46:55 to make that transition very, very smooth.

00:46:58 So I think, so at some level,

00:47:02 people wanna move and they see the value in the new thing.

00:47:05 They don’t wanna move just because it’s a new thing,

00:47:07 and some people do,

00:47:08 but most people want a really good thing.

00:47:11 And I think over the next few months,

00:47:13 as people start to see the value,

00:47:15 we’ll definitely see that shift happening.

00:47:17 So I’m pretty excited and confident

00:47:19 that we will see people moving.

00:47:22 As you said earlier, this field is also moving rapidly,

00:47:24 so that’ll help because we can do more things

00:47:26 and all the new things will clearly happen in 2.x,

00:47:29 so people will have lots of good reasons to move.

00:47:32 So what do you think TensorFlow 3.0 looks like?

00:47:36 Is there, are things happening so crazily

00:47:40 that even at the end of this year

00:47:42 seems impossible to plan for?

00:47:45 Or is it possible to plan for the next five years?

00:47:49 I think it’s tricky.

00:47:50 There are some things that we can expect

00:47:54 in terms of, okay, change, yes, change is gonna happen.

00:47:59 Are there some things gonna stick around

00:48:01 and some things not gonna stick around?

00:48:03 I would say the basics of deep learning,

00:48:08 the, you know, say convolutional models

00:48:10 or the basic kind of things,

00:48:12 they’ll probably be around in some form still in five years.

00:48:16 Will RL and GAN stay?

00:48:18 Very likely, based on where they are.

00:48:21 Will we have new things?

00:48:22 Probably, but those are hard to predict.

00:48:24 And some directionally, some things that we can see is,

00:48:30 you know, in things that we’re starting to do, right,

00:48:32 with some of our projects right now

00:48:35 is just 2.0 combining eager execution and graphs

00:48:39 where we’re starting to make it more like

00:48:41 just your natural programming language.

00:48:43 You’re not trying to program something else.

00:48:45 Similarly, with Swift for TensorFlow,

00:48:47 we’re taking that approach.

00:48:48 Can you do something ground up, right?

00:48:50 So some of those ideas seem like, okay,

00:48:52 that’s the right direction.

00:48:54 In five years, we expect to see more in that area.

00:48:58 Other things we don’t know is,

00:49:00 will hardware accelerators be the same?

00:49:03 Will we be able to train with four bits

00:49:06 instead of 32 bits?

00:49:09 And I think the TPU side of things is exploring that.

00:49:11 I mean, TPU is already on version three.

00:49:13 It seems that the evolution of TPU and TensorFlow

00:49:17 are sort of, they’re coevolving almost in terms of

00:49:23 both are learning from each other and from the community

00:49:25 and from the applications

00:49:27 where the biggest benefit is achieved.

00:49:29 That’s right.

00:49:30 You’ve been trying to sort of, with Eager, with Keras,

00:49:33 to make TensorFlow as accessible

00:49:34 and easy to use as possible.

00:49:36 What do you think, for beginners,

00:49:38 is the biggest thing they struggle with?

00:49:40 Have you encountered that?

00:49:42 Or is it basically what Keras and Eager are solving,

00:49:46 like we talked about?

00:49:47 Yeah, for some of them, like you said, right,

00:49:50 the beginners want to just be able to take

00:49:53 some image model,

00:49:54 they don’t care if it’s Inception or ResNet

00:49:57 or something else,

00:49:58 and do some training or transfer learning

00:50:00 on their kind of model.

00:50:02 Being able to make that easy is important.

00:50:04 So in some ways,

00:50:07 if you do that by providing them simple models

00:50:09 with, say, TensorFlow Hub or so on,

00:50:11 they don’t care about what’s inside that box,

00:50:13 but they want to be able to use it.

00:50:15 So we’re pushing on, I think, different levels.

00:50:17 If you look at just a component that you get,

00:50:20 which has the layers already smooshed in,

00:50:22 the beginners probably just want that.

00:50:25 Then the next step is, okay,

00:50:26 look at building layers with Keras.

00:50:29 If you go out to research,

00:50:30 then they are probably writing custom layers themselves

00:50:33 or doing their own loops.

00:50:34 So there’s a whole spectrum there.
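
At the research end of that spectrum, writing a custom layer and a hand-rolled training loop looks roughly like the sketch below; the layer, shapes, and toy data are assumptions for illustration, not anything specific from the conversation.

import tensorflow as tf

# A custom Keras layer: define your own weights and forward pass.
class ScaledDense(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        return 0.5 * (tf.matmul(inputs, self.w) + self.b)

# A custom training loop using GradientTape instead of model.fit().
model = tf.keras.Sequential([ScaledDense(16), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal([32, 8])   # toy batch of inputs
y = tf.random.normal([32, 1])   # toy targets

for step in range(5):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    print(step, float(loss))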

00:50:36 And then providing the pretrained models

00:50:38 seems to really decrease the time it takes to get started.

00:50:43 You could basically in a Colab notebook

00:50:46 achieve what you need.
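
For the beginner end, reusing a pretrained “box” from TensorFlow Hub for transfer learning can be about this short; the module handle, class count, and data below are example placeholders rather than anything prescribed in the conversation.

import tensorflow as tf
import tensorflow_hub as hub

# A pretrained image feature extractor from TF Hub (example handle);
# the "box" stays frozen and only the small classifier head is trained.
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
    input_shape=(224, 224, 3), trainable=False)

model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dense(5, activation="softmax")  # e.g. 5 target classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data: float32 images in [0, 1], integer class labels.
images = tf.random.uniform([8, 224, 224, 3])
labels = tf.random.uniform([8], maxval=5, dtype=tf.int32)
model.fit(images, labels, epochs=1)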

00:50:49 So I’m basically answering my own question

00:50:51 because I think what TensorFlow delivered on recently

00:50:54 makes things trivial for beginners.

00:50:56 So I was just wondering if there were other pain points

00:51:00 you’re trying to ease,

00:51:01 but I’m not sure there would be.

00:51:02 No, those are probably the big ones.

00:51:04 I see high schoolers doing a whole bunch of things now,

00:51:07 which is pretty amazing.

00:51:09 It’s both amazing and terrifying.

00:51:11 Yes.

00:51:12 In the sense that when they grow up,

00:51:15 some incredible ideas will be coming from them.

00:51:19 So there’s certainly a technical aspect to your work,

00:51:21 but you also have a management aspect to your role

00:51:25 with TensorFlow leading the project,

00:51:27 a large number of developers and people.

00:51:31 So what do you look for in a good team?

00:51:34 What do you think?

00:51:36 Google has been at the forefront of exploring

00:51:38 what it takes to build a good team

00:51:40 and TensorFlow is one of the most cutting edge technologies

00:51:45 in the world.

00:51:46 So in this context, what do you think makes for a good team?

00:51:50 It’s definitely something I think a lot about.

00:51:53 I think in terms of the team being able

00:51:59 to deliver something well,

00:52:01 one of the things that’s important is cohesion

00:52:04 across the team.

00:52:05 So being able to execute together in doing things,

00:52:10 because at this scale,

00:52:13 an individual engineer can only do so much.

00:52:15 There’s a lot more that they can do together,

00:52:18 even though we have some amazing superstars across Google

00:52:21 and in the team.

00:52:25 But often the way I see it is, the product

00:52:27 of what the team generates is way larger

00:52:29 than the sum of the individuals put together.

00:52:34 And so how do we have all of them work together,

00:52:37 the culture of the team itself,

00:52:40 hiring good people is important.

00:52:43 But part of that is it’s not just that,

00:52:45 okay, we hire a bunch of smart people

00:52:47 and throw them together and let them do things.

00:52:49 It’s also people have to care about what they’re building,

00:52:52 people have to be motivated for the right kind of things.

00:52:57 That’s often an important factor.

00:53:01 And, you know, finally, how do you put that together

00:53:04 with a somewhat unified vision of where we wanna go?

00:53:08 So are we all looking in the same direction

00:53:11 or each of us going all over?

00:53:13 And sometimes it’s a mix.

00:53:16 Google’s a very bottom up organization in some sense,

00:53:21 also research even more so, and that’s how we started.

00:53:26 But as we’ve become this larger product and ecosystem,

00:53:30 I think it’s also important to combine that well

00:53:33 with a mix of, okay, here’s the direction we wanna go in.

00:53:38 There is exploration we’ll do around that,

00:53:39 but let’s keep staying in that direction,

00:53:42 not just all over the place.

00:53:44 And is there a way you monitor the health of the team?

00:53:46 Sort of like, is there a way you know you did a good job?

00:53:51 The team is good?

00:53:53 Like, I mean, you’re sort of, you’re saying nice things,

00:53:56 but it’s sometimes difficult to determine how aligned everyone is.

00:54:00 Yes.

00:54:01 Because it’s not binary.

00:54:02 It’s not like that; there’s tensions and complexities and so on.

00:54:06 And the other element is the mention of superstars,

00:54:09 there’s so much, even at Google,

00:54:11 such a large percentage of work

00:54:13 is done by individual superstars too.

00:54:16 And sometimes those superstars

00:54:19 can work against the dynamic of a team, and there are those tensions.

00:54:25 I mean, I’m sure in TensorFlow it might be

00:54:26 a little bit easier because the mission of the project

00:54:28 is so sort of beautiful.

00:54:31 You’re at the cutting edge, so it’s exciting.

00:54:34 But have you had struggle with that?

00:54:36 Has there been challenges?

00:54:38 There are always people challenges

00:54:39 in different kinds of ways.

00:54:41 That said, I think we’ve been good

00:54:44 about getting people who care and, you know,

00:54:48 have the same kind of culture,

00:54:50 and that’s Google in general to a large extent.

00:54:53 But also, like you said, given that the project

00:54:56 has had so many exciting things to do,

00:54:58 there’s been room for lots of people

00:55:00 to do different kinds of things and grow,

00:55:02 which does make the problem a bit easier, I guess.

00:55:05 And it allows people, depending on what they’re doing,

00:55:09 if there’s room around them, then that’s fine.

00:55:13 But yes, we do care that, superstar or not,

00:55:19 they need to work well with the team and across Google.

00:55:22 That’s interesting to hear.

00:55:23 So it’s like superstar or not,

00:55:26 the productivity broadly is about the team.

00:55:30 Yeah, yeah.

00:55:31 I mean, they might add a lot of value,

00:55:32 but if they’re hurting the team, then that’s a problem.

00:55:35 So in hiring engineers, it’s so interesting, right,

00:55:39 the hiring process.

00:55:40 What do you look for?

00:55:41 How do you determine a good developer

00:55:44 or a good member of a team

00:55:46 from just a few minutes or hours together?

00:55:50 Again, no magic answers, I’m sure.

00:55:52 Yeah, I mean, Google has a hiring process

00:55:55 that we’ve refined over the last 20 years, I guess,

00:55:59 and that you’ve probably heard and seen a lot about.

00:56:02 So we do work with the same hiring process

00:56:04 and that’s really helped.

00:56:08 For me in particular, I would say,

00:56:10 in addition to the core technical skills,

00:56:14 what does matter is their motivation

00:56:17 in what they wanna do.

00:56:19 Because if that doesn’t align well

00:56:21 with where we wanna go,

00:56:22 that’s not gonna lead to long term success

00:56:25 for either them or the team.

00:56:27 And I think that becomes more important

00:56:30 the more senior the person is,

00:56:31 but it’s important at every level.

00:56:33 Like even the junior most engineer,

00:56:34 if they’re not motivated to do well

00:56:36 at what they’re trying to do,

00:56:37 however smart they are,

00:56:38 it’s gonna be hard for them to succeed.

00:56:40 Does the Google hiring process touch on that passion?

00:56:44 So like trying to determine,

00:56:46 because I think as far as I understand,

00:56:48 maybe you can speak to it,

00:56:49 that the Google hiring process sort of helps

00:56:53 in the initial stage, like it determines the skill set there,

00:56:56 is your puzzle solving ability,

00:56:57 problem solving ability good?

00:56:59 But like, I’m not sure

00:57:02 it determines

00:57:05 whether the person has, like, a fire inside them

00:57:07 that burns to do anything really,

00:57:09 it doesn’t really matter what,

00:57:09 it’s just some cool stuff,

00:57:11 I’m gonna do it.

00:57:15 Is that something that ultimately ends up

00:57:17 when they have a conversation with you

00:57:18 or once it gets closer to the team?

00:57:22 So one of the things we do have as part of the process

00:57:25 is just a culture fit,

00:57:27 like part of the interview process itself,

00:57:29 in addition to just the technical skills

00:57:31 and each engineer or whoever the interviewer is,

00:57:34 is supposed to rate the person on the culture

00:57:38 and the culture fit with Google and so on.

00:57:40 So that is definitely part of the process.

00:57:42 Now, there are various kinds of projects

00:57:45 and different kinds of things.

00:57:46 So there might be variants

00:57:48 of the kind of culture you want there and so on.

00:57:51 And yes, that does vary.

00:57:52 So for example,

00:57:54 TensorFlow has always been a fast moving project

00:57:56 and we want people who are comfortable with that.

00:58:00 But at the same time now, for example,

00:58:02 we are at a place where we are also a very full fledged product

00:58:05 and we wanna make sure things that work

00:58:07 really, really work, right?

00:58:09 You can’t cut corners all the time.

00:58:11 So balancing that out and finding the people

00:58:14 who are the right fit for those is important.

00:58:17 And I think those kinds of things do vary a bit

00:58:19 across projects and teams and product areas across Google.

00:58:23 And so you’ll see some differences there

00:58:25 in the final checklist.

00:58:27 But a lot of the core culture,

00:58:29 it comes along with just the engineering excellence

00:58:32 and so on.

00:58:34 What is the hardest part of your job?

00:58:39 Take your pick, I guess.

00:58:41 It’s fun, I would say, right?

00:58:44 Hard, yes.

00:58:45 I mean, lots of things at different times.

00:58:47 I think that does vary.

00:58:49 So let me clarify that difficult things are fun

00:58:52 when you solve them, right?

00:58:53 So it’s fun in that sense.

00:58:57 I think the key to a successful thing across the board

00:59:02 and in this case, it’s a large ecosystem now,

00:59:05 but even a small product,

00:59:07 is striking that fine balance

00:59:09 across different aspects of it.

00:59:12 Sometimes it’s how fast do you go

00:59:13 versus how perfect it is.

00:59:17 Sometimes it’s how do you involve this huge community?

00:59:21 Who do you involve or do you decide,

00:59:23 okay, now is not a good time to involve them

00:59:25 because it’s not the right fit.

00:59:30 Sometimes it’s saying no to certain kinds of things.

00:59:33 Those are often the hard decisions.

00:59:36 Some of them you make quickly

00:59:39 because you don’t have the time.

00:59:41 Some of them you get time to think about them,

00:59:43 but they’re always hard.

00:59:44 So in those decisions, both choices are often pretty good.

00:59:49 What about deadlines?

00:59:50 Is this, do you find TensorFlow

00:59:53 to be driven by deadlines

00:59:58 to the degree that a product might be?

01:00:00 Or is there still a balance where it’s less deadline driven?

01:00:04 You had the Dev Summit today

01:00:06 that came together incredibly.

01:00:08 Looked like there’s a lot of moving pieces and so on.

01:00:11 So did that deadline make people rise to the occasion

01:00:15 releasing TensorFlow 2.0 alpha?

01:00:18 I’m sure that was done last minute as well.

01:00:20 I mean, up to the last point.

01:00:25 Again, it’s one of those things

01:00:26 that you need to strike the good balance.

01:00:29 There’s some value that deadlines bring

01:00:32 that does bring a sense of urgency

01:00:33 to get the right things together.

01:00:35 Instead of getting the perfect thing out,

01:00:38 you need something that’s good and works well.

01:00:41 And the team definitely did a great job

01:00:43 in putting that together.

01:00:44 So I was very amazed and excited

01:00:45 by how everything came together.

01:00:48 That said, across the year,

01:00:49 we try not to put out official deadlines.

01:00:52 We focus on key things that are important,

01:00:57 figure out how much of it’s important.

01:01:00 And we are developing in the open,

01:01:03 both internally and externally,

01:01:05 everything’s available to everybody.

01:01:07 So you can pick and look at where things are.

01:01:11 We do releases at a regular cadence.

01:01:13 So fine, if something doesn’t necessarily end up

01:01:16 this month, it’ll end up in the next release

01:01:17 in a month or two.

01:01:18 And that’s okay, but we want to keep moving

01:01:22 as fast as we can in these different areas.

01:01:26 Because we can iterate and improve on things,

01:01:29 sometimes it’s okay to put things out

01:01:31 that aren’t fully ready.

01:01:32 We’ll make sure it’s clear that okay,

01:01:34 this is experimental, but it’s out there

01:01:36 if you want to try and give feedback.

01:01:37 That’s very, very useful.

01:01:39 I think that quick cycle and quick iteration is important.

01:01:43 That’s what we often focus on rather than,

01:01:46 here’s a deadline by which you get everything done.

01:01:49 Is 2.0, is there pressure to make that stable?

01:01:52 Or like, for example, WordPress 5.0 just came out

01:01:57 and there was no pressure,

01:02:00 it was, a lot of the build updates were delivered way too late,

01:02:03 and they said, okay, well,

01:02:05 we’re gonna release a lot of updates

01:02:07 really quickly to improve it.

01:02:09 Do you see TensorFlow 2.0 in that same kind of way

01:02:12 or is there this pressure to once it hits 2.0,

01:02:15 once you get to the release candidate

01:02:16 and then you get to the final,

01:02:18 that’s gonna be the stable thing?

01:02:22 So it’s gonna be stable in the sense that,

01:02:25 just like 1.x was, every API that’s there

01:02:28 is gonna remain and work.

01:02:32 It doesn’t mean we can’t change things under the covers.

01:02:34 It doesn’t mean we can’t add things.

01:02:36 So there’s still a lot more for us to do

01:02:39 and we’ll continue to have more releases.

01:02:41 So in that sense, there’s still,

01:02:42 I don’t think we’ll be done in like two months

01:02:44 when we release this.

01:02:46 I don’t know if you can say, but is there,

01:02:49 there are no external deadlines for TensorFlow 2.0,

01:02:53 but are there internal deadlines,

01:02:57 artificial or otherwise,

01:02:58 that you’re trying to set for yourself

01:03:00 or is it whenever it’s ready?

01:03:03 So we want it to be a great product, right?

01:03:05 And that’s a big important piece for us.

01:03:09 TensorFlow’s already out there.

01:03:11 We have 41 million downloads for 1.x.

01:03:13 So it’s not like we have to have this.

01:03:16 Yeah, exactly.

01:03:17 So it’s not like, a lot of the features

01:03:19 that we’re really polishing

01:03:21 and putting together are there.

01:03:23 We don’t have to rush that just because.

01:03:26 So in that sense, we wanna get it right

01:03:28 and really focus on that.

01:03:29 That said, we have said that we are looking

01:03:31 to get this out in the next few months,

01:03:33 in the next quarter.

01:03:34 And as far as possible,

01:03:37 we’ll definitely try to make that happen.

01:03:39 Yeah, my favorite line was, spring is a relative concept.

01:03:44 I love it.

01:03:45 Yes.

01:03:46 Spoken like a true developer.

01:03:47 So something I’m really interested in

01:03:50 and your previous line of work is,

01:03:52 before TensorFlow, you led a team at Google on search ads.

01:03:57 I think this is a very interesting topic

01:04:01 on every level, on a technical level,

01:04:04 because at their best, ads connect people

01:04:07 to the things they want and need.

01:04:09 So, and at their worst, they’re just these things

01:04:12 that annoy the heck out of you

01:04:14 to the point of ruining the entire user experience

01:04:17 of whatever you’re actually doing.

01:04:20 So they have a bad rep, I guess.

01:04:23 And on the other end, so that this connecting users

01:04:28 to the thing they need and want

01:04:29 is a beautiful opportunity for machine learning to shine.

01:04:34 Like huge amounts of data that’s personalized,

01:04:36 and you kind of map it to the thing

01:04:37 they actually want so they won’t get annoyed.

01:04:40 So what have you learned from this,

01:04:43 with Google leading the world in this aspect,

01:04:45 what have you learned from that experience

01:04:47 and what do you think is the future of ads?

01:04:51 Take you back to that.

01:04:52 Yeah, yes, it’s been a while,

01:04:55 but I totally agree with what you said.

01:04:59 I think the search ads, the way it was always looked at

01:05:03 and I believe it still is,

01:05:04 is it’s an extension of what search is trying to do.

01:05:08 And the goal is to make the information,

01:05:10 to make the world’s information accessible.

01:05:14 And it’s not just information,

01:05:17 but maybe products or other things that people care about.

01:05:20 And so it’s really important for them to align

01:05:23 with what the users need.

01:05:26 And in search ads, there’s a minimum quality level

01:05:30 before that ad would be shown.

01:05:32 If you don’t have an ad that hits that quality level,

01:05:34 it will not be shown even if we have one,

01:05:35 and okay, maybe we lose some money there, that’s fine.

01:05:39 That is really, really important.

01:05:41 And I think that that is something I really liked

01:05:43 about being there.

01:05:45 Advertising is a key part.

01:05:48 I mean, as a model, it’s been around for ages, right?

01:05:51 It’s not a new model, it’s been adapted to the web

01:05:54 and became a core part of search

01:05:57 and many other search engines across the world.

01:06:00 And I do hope, like you said,

01:06:04 there are aspects of ads that are annoying,

01:06:06 and if I go to a website and it just keeps popping

01:06:10 an ad in my face, not letting me read,

01:06:12 that’s clearly gonna be annoying.

01:06:13 So I hope we can strike that balance

01:06:18 between showing a good ad where it’s valuable to the user

01:06:23 and provides the monetization to the service.

01:06:29 And this might be search, this might be a website,

01:06:32 all of these, they do need the monetization

01:06:35 for them to provide that service.

01:06:38 But it has to be done with a good balance between

01:06:43 showing just some random stuff that’s distracting

01:06:46 versus showing something that’s actually valuable.

01:06:49 So do you see it, moving forward, continuing

01:06:54 to be a model that funds businesses like Google

01:07:00 as a significant revenue stream?

01:07:04 Because one of the most exciting

01:07:07 but also limiting things on the internet

01:07:09 is that nobody wants to pay for anything.

01:07:11 And advertisements, again, at their best,

01:07:14 are actually really useful and not annoying.

01:07:16 Do you see that continuing and growing and improving

01:07:21 or is there, do you see sort of more Netflix type models

01:07:26 where you have to start to pay for content?

01:07:28 I think it’s a mix.

01:07:29 I think it’s gonna take a long while for everything

01:07:32 to be paid on the internet, if at all, probably not.

01:07:35 I mean, I think there’s always gonna be things

01:07:37 that are sort of monetized with things like ads.

01:07:40 But over the last few years, I would say

01:07:42 we’ve definitely seen that transition towards

01:07:45 more paid services across the web

01:07:48 and people are willing to pay for them

01:07:50 because they do see the value.

01:07:51 I mean, Netflix is a great example.

01:07:53 I mean, we have YouTube doing things.

01:07:56 People pay for the apps they buy.

01:07:58 More people, I find, are willing to pay for newspaper content

01:08:03 from the good news websites across the web.

01:08:07 That wasn’t the case,

01:08:08 even a few years ago, I would say.

01:08:11 And I just see that change in myself as well

01:08:13 and just lots of people around me.

01:08:14 So definitely hopeful that we’ll transition

01:08:17 to that mixed model where maybe you get

01:08:20 to try something out for free, maybe with ads,

01:08:24 but then there’s a more clear revenue model

01:08:27 that sort of helps go beyond that.

01:08:30 So speaking of revenue, how is it that a person

01:08:35 can use the TPU in a Google Colab for free?

01:08:39 So what’s the, I guess the question is,

01:08:43 what’s the future of TensorFlow in terms of empowering,

01:08:48 say, a class of 300 students?

01:08:51 And I’m asked by MIT, what is going to be the future

01:08:56 of them being able to do their homework in TensorFlow?

01:09:00 Like, where are they going to train these networks, right?

01:09:02 What’s that future look like with TPUs,

01:09:06 with cloud services, and so on?

01:09:08 I think a number of things there.

01:09:10 I mean, with TensorFlow being open source,

01:09:12 you can run it wherever, you can run it on your desktop

01:09:15 and your desktops always keep getting more powerful,

01:09:17 so maybe you can do more.

01:09:19 My phone is like, I don’t know how many times

01:09:21 more powerful than my first desktop.

01:09:23 You’ll probably train it on your phone though,

01:09:25 yeah, that’s true.

01:09:26 Right, so in that sense, the power you have

01:09:28 in your hands is a lot more.

01:09:31 Clouds are actually very interesting from, say,

01:09:34 a student’s or a course’s perspective,

01:09:36 because they make it very easy to get started.

01:09:40 I mean, Colab, the great thing about it is,

01:09:42 go to a website and it just works.

01:09:45 No installation needed, nothing to set up,

01:09:47 you’re just there and things are working.

01:09:50 That’s really the power of cloud as well.

01:09:52 And so I do expect that to grow.

01:09:55 Again, Colab is a free service.

01:09:57 It’s great to get started, to play with things,

01:10:00 to explore things.

01:10:03 That said, with free, you can only get so much.

01:10:06 You’d be, yeah.

01:10:08 So just like we were talking about,

01:10:10 free versus paid, yeah, there are services

01:10:12 you can pay for and get a lot more.
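
As a rough outline of what using a free Colab TPU runtime involves in TensorFlow 2.x (the distribution-strategy API has shifted a bit between releases, so treat the exact calls as an approximation): connect to the TPU, then build and compile the model under the strategy scope.

import tensorflow as tf

# In a Colab notebook with the TPU runtime selected, connect to the TPU
# and build the model under a TPU distribution strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax")
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(...) then runs the training steps on the TPU cores.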

01:10:15 Great, so if I’m a complete beginner

01:10:17 interested in machine learning and TensorFlow,

01:10:19 what should I do?

01:10:21 Probably start with going to our website

01:10:23 and playing there.

01:10:24 So just go to TensorFlow.org and start clicking on things.

01:10:26 Yep, check out tutorials and guides.

01:10:28 There’s stuff you can just click there

01:10:29 and go to a Colab and do things.

01:10:31 No installation needed, you can get started right there.
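
A first tutorial-style model of the kind those guides walk through is on the order of this sketch; MNIST is just the usual stand-in dataset, not something prescribed here.

import tensorflow as tf

# Load a small standard dataset and normalize pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A tiny fully connected classifier built with the Keras Sequential API.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=1)
print(model.evaluate(x_test, y_test))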

01:10:34 Okay, awesome. Rajat, thank you so much for talking today.

01:10:36 Thank you, Lex, it was great.