Why we need Version Control for Data with Joe Doliner Co-founder and CTO of Pachyderm

The below is a full (unedited), machine-generated transcript of a YouTube session / podcast episode I recorded with Joe Doliner, the co-founder and CEO of Pachyderm, in Q2 2019. You can watch the video or listen to the podcast on YouTube, Apple Podcasts, Stitcher, or wherever you get your podcasts.


Erasmus Elsner 0:01 
Hi, and welcome to another episode of Sand Hill Road, the show where I talk to successful startup founders and their venture capitalists about the companies that they built and invested in. And the goal, like always, is to give you a sense of what it’s like to be in their shoes. And today, I’m super excited to be joined by Joe Doliner (JD), who’s the co-founder and CEO of Pachyderm. Put very simply, Pachyderm is version control for data. So what GitLab and GitHub do for code, Pachyderm does for data, in particular large datasets: think Hadoop, MapReduce, or Spark-scale datasets. This means Pachyderm helps you track the state of your data over time, back-test models on historical data, and share models and results across teams. As such, Pachyderm is, in my opinion, building a core piece of the infrastructure for the large-dataset, or Big Data, developer and application ecosystem. When they started out in 2014, companies that wanted to get serious about big data analysis had to hire an elite set of programmers with a specialisation in Hadoop and MapReduce jobs, or they could contract third parties like Cloudera. But neither of these options was an inexpensive undertaking. And this is exactly where Pachyderm, with their open core model, came in. Having worked as a data infrastructure engineer at Airbnb in 2014, JD realized the complexity, but also the potential, of solving this large-dataset version control problem. Today Pachyderm is a fast-growing tech company, financed by some of the best names in the Valley. They were part of the Y Combinator Winter 2015 batch. After Y Combinator, there was a $2 million seed round backed by the likes of Data Collective, Susa Ventures, Foundation Capital, and CrunchFund. It was really interesting to hear from JD in this episode that, from the seed round, it wasn’t really a linear pathway to the Series A. Last November, they closed a $10 million Series A led by Benchmark.
And in this episode, JD shares the really interesting journey of how the Series A was a long time in the making.


Erasmus Elsner 2:50 
So I’m super excited to dig into the founder, product, and funding journey of Pachyderm. So let’s dig right in. For the first question, there was actually a little bit of a problem with the recording. I asked JD how he found his co-founder and how the YC journey went. He told me that the first time he applied to YC, it was actually on his own, and it was only after he got rejected the first time that he and his co-founder joined forces. But let’s hear it from him.

Joe Doliner 3:34 
And so it was at that point that I went to Joey. Of course, I’d told him that I’d been applying to YC and stuff, so this wasn’t totally out of the blue for him. I told him, look, they didn’t let me in, but here’s what they said. It was all very, very positive, and I think I’m going to apply again in six months. One of the main things they said I should have at that point is a co-founder, and I think that you’re exactly the right guy to do that. And so then we started talking about why we thought we would be the right mix of co-founders to do a company like that. My basic premise was that for a company like Pachyderm, or like RethinkDB, which was a database company, the two most important aspects of the company are the technical engineering aspects and the sales and operational aspects, because you need to be able to sell into big enterprises and navigate their org charts and their procurement processes and everything like that. And so I felt like we needed to have a founder who was going to be really, really interested in those parts of the work and want to do them, not just be willing to do them, for whom that’s actually the thing they wake up in the morning thinking about. And so Joey and I have a very, very natural split in how we work on the company: I focus on all the technical stuff, and I wake up every morning thinking about how we can make this product better, how we can make the technology faster and meet our users’ needs better. And Joey wakes up thinking about how we can get this into more people’s hands, how we can get this into companies and get them to be successful with it, get them to pay us for it, and get our message out there at conferences and everything like that.
And so that was basically how the conversations proceeded: us just figuring out whether this was going to be a good partnership.

Erasmus Elsner 5:17 
And basically, you were already working on this before YC, at least intellectually, for a couple of years. And then you went through Y Combinator. A lot of people who go through Y Combinator, especially those who go directly from a large technology job into YC, often tell me that they don’t really feel like a real company at the beginning, and that only after they incorporate, get all the mentorship, and do the pitching and Demo Day do they really feel, at the end of the process, like they have a real company there.

Joe Doliner 5:53 
And that certainly matches my experience very well, I think. I mean, YC is some of the best early validation that you can get. And, you know, to anybody who is thinking of applying to YC, that alone is definitely not a good enough reason to do it. YC will not take you from zero to a company. But YC will help you put it together, realize, and have the self-confidence that this idea is actually something that people are going to be willing to pay for. And I think that a lot of where this comes from in YC has less to do with any of the individual partners or stuff that YC the organization does, and a lot more to do with the community and the fact that you’re just surrounded by people who are at a similar stage in their startup. They believe in their company, and they believe in your company, and you help each other out. I experienced a very similar thing when I went to college. In high school, I had some doubts, like: can I actually be a math major? Can I actually hack it at a top university and do this stuff? And then I got in there, I got in a classroom with other people, and I started realizing, oh, I know these concepts, I can do this. Just having those people around me made me believe that I was one of them.

Erasmus Elsner 7:15 
So you go through Y Combinator in January 2015, and you then launch the first version of your product, which gets covered on TechCrunch. At this point, Pachyderm is already built quite heavily into the whole container ecosystem, building on Docker. And at this point, I think in the fall, when you went through Y Combinator, Docker had just raised their $40 million Series C, if that sounds right, and was starting to hit some traction. But you were taking some platform risk. And probably when you went out to pitch it to investors, they might not have been that familiar with the whole Docker and Kubernetes ecosystem that would emerge later on. So how was the process of pitching this to investors? And also, how did you think about taking this platform risk at that point in time?

Joe Doliner 8:13 
Um, so I always felt that taking those risks was one of the best things that we could bet on, just because I had a lot of confidence that if I knew nothing else about how to found and run this business, I at least knew what the open source ecosystem looked like at the time and how to navigate it, because that’s just the world that I’ve been embedded in since I learned how to program. You know, I’m always reading about the latest infrastructure open source projects and things like that. So I had a lot of confidence that Docker was going to succeed. And we actually had a really, really good piece of fortune that sort of set us on the path to founding this company: really, really early on, back before Docker was Docker, back when they were dotCloud, Solomon, the guy who originally wrote Docker, went on a little tour of Silicon Valley to demo it to anybody who’d watch and get feedback on it. And RethinkDB happened to be one of the places that he came to demo it, because he was good friends with the founder there. And so we got to see this really hacky early version of Docker before anybody else did, I think before they’d even released it at PyCon. We looked at it, and I at least knew, and I think most of the people at Airbnb knew: oh, this is a really awesome way of deploying code. This is a really big problem that we have, and everyone we talk to has, and this is going to solve it. So I had a lot of confidence in Docker. In terms of this being kind of a headwind in the early days, it totally was. There was a long process with fundraising where there was a certain set of investors who were just like, we’re not sold on Docker yet, and if we’re not sold on Docker, then your company makes absolutely no sense, so we don’t want to invest in you.
And so that was a large percentage of investors, certainly not the majority, but a meaningful chunk that we just couldn’t get to talk to us in the early days. And it was actually even more of a headwind when we were talking to customers, because there we could go and talk to some customers, and they would say, look, we think that Docker is really cool too. We think that Docker is the future. But we have the reality of our company’s infrastructure right now that we’re working with, and the reality is, we don’t use Docker; we deploy everything on EC2. And so if we’re going to need to learn to use Docker to use your product, then we basically need to adopt two products just to get the value that you’re offering. And so the trade-off wasn’t there for a lot of places. And there were even places, and this doesn’t happen anymore, but in the early days it happened, where we’d go in and they just hadn’t really heard of Docker. They’d maybe heard the word before, but they didn’t know what it was. And so you can kind of imagine how those sales pitches go, because you spend the first 45 minutes or so explaining Docker, since Docker is not a super simple thing. And then by 45 minutes in, if you’ve done your job well, the people are like, okay, we understand what Docker is, this makes sense. What do you guys do again? Did you guys write Docker? And so then you go into the Pachyderm pitch, and that’s just never gonna work at all. The good news, though, is that things have definitely flipped since then, and all of the headwinds have become tailwinds. And so now, when we walk into a company and they ask us, okay, the benefits of your product seem really interesting to us, how do we deploy this? We just say, oh, you just deploy it directly on Kubernetes. And they’re like, oh, great.
So I just tell my Kubernetes administrator, hey, I need a namespace for Pachyderm, and you give me a YAML file that I throw at it, and we’re done? Right. Yep. That’s how it works now. And that’s really, really awesome, because there are just so many places now where that means we can go right into their infrastructure directly without having any sort of complicated deployment process. And that was really one of the original things that we wanted to fix. I mean, one of the things that my team did at Airbnb was build the Hadoop clusters. And that was often a many-months process, like six months, of just drawing up all of the Terraform scripts, and how we were going to have all of these EC2 nodes, and who’s talking to whom and everything like that, just to get a cluster up and running that people could use. And so going from that to "we’ve already got Kubernetes running, I just throw some text at it, and all of a sudden everything springs up, it’s all architected correctly, everything’s connected, and all the logs are captured" is one of the really big leaps forward we feel we provide compared to Hadoop.

Erasmus Elsner 12:43 
Yeah, I think that makes a lot of sense to me: probably by the Series A, which I think you closed last fall, people were already fully on board with what Docker is, and you could go right into what Pachyderm actually does. So let’s dig a little bit into the product. Let’s start with the basics and try to make this really, really simple for the audience to wrap their heads around. Yep. So let’s say on the one side you have some data, then you have some code that transforms this data, and I think you call this a data pipeline. And yep, at the end of this, you get an output, a result. And if you were to implement this in a cloud infrastructure system, I would say most often you’d see an AWS S3 object storage bucket where the data is stored. Then you basically spin up your Docker image for the data pipeline. You might also run it on more than one node through a Kubernetes container orchestration system. Then you do the analysis, you get the result back, and you store it back into the object storage. Is that how it works? Where am I off?

Joe Doliner 13:52 
Basically, that’s exactly how it works. Yeah, I mean, you can go into more detail, of course. Each of these steps has a lot of things you have to answer, in terms of: well, what happens when it fails? What happens if the node you’re on magically disappears? Because that happens sometimes on the cloud. But I don’t think those are particularly important details to have here. Yeah, that’s basically exactly how it works. Okay. You just described it. Okay, then.

Erasmus Elsner 14:21 
Maybe let me take it a step further, to understand a little bit how Pachyderm maps to Hadoop. The way I understand it, Pachyderm has built its own file system. The Pachyderm file system is a little bit like HDFS, the file system that we know from the Hadoop ecosystem. But the way I understand it, most modern Hadoop clusters basically bypass HDFS and store directly to S3. And the way I understand the Pachyderm file system, at least, is that you’re kind of putting data into the container and then taking it out, but the data lives somewhere else. So maybe shed some light on how this actually works.

Joe Doliner 15:09 
Yeah, absolutely. So we do the same thing that you just described, which is that everything is ultimately stored in S3. When we put data into a container, what we do is we take the user’s code. Let’s say that they’ve got a container that runs a TensorFlow job on some data. The first thing that we do is boot up that container, and then Pachyderm goes into the container, downloads the data from S3, and writes it into the container’s file system. That’s ephemeral; it’s just stored on the local node. If the container were to die, that data would go away and we’d have to redownload it. But assuming it doesn’t die, what happens next is we boot up the user’s process, whatever code they told us to run, and that code starts running. It finds in the file system all of the data that it needs to access, so it can just read that off of disk, like any program reads any file off of disk, and do whatever it wants with it. And then when it’s done, in this case when it’s trained the model, it writes that back out to disk in a special place. After their code exits, our code goes back in, slurps up all of that output data, and we just stream it right back into S3 and checkpoint it there. Now, the thing that you observed about how people who are running Hadoop have moved to storing a lot of stuff in S3: that’s a shift that was just starting to happen toward the tail end of when I was using Hadoop at Airbnb. And it flew in the face of one of the big performance benefits of HDFS, which is that HDFS always advertised that they bring the computation to the data. So when you fire off a computation, you’ve got all of these different nodes that are each storing some small subset of the data that needs to be processed.
And what HDFS would do, or really what MapReduce would do on top of HDFS, is send these computations to the nodes where the data is stored and process them there, so it doesn’t have to move the data around. Then it would aggregate the results into one place, which was a lot more efficient at a time when bandwidth within a data center was actually a limiting factor on these computations. But people came to realize that the bandwidth within a data center is so high, and the bandwidth of getting data out of S3 from within Amazon’s data centers is so high, that this optimization just wasn’t worth it anymore. Especially because storing stuff in S3 is just so much more operationally simple than running HDFS, with all these hard drives that can fail and nodes that can go down and stuff like that. So people switched to this much more operationally simple thing that had some slight performance penalties, but ultimately those performance penalties weren’t really that important. And we just leapfrogged straight to that, because we realized that storing stuff in S3 was going to be the simplest thing to do, particularly in a container environment where containers can come up and down a lot more frequently than VMs. And so we just use S3 to store all this stuff, and it’s very simple. You can basically store your entire Pachyderm cluster in an S3 bucket.
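
The staging flow JD describes (download inputs from object storage onto the container’s local disk, run the user’s code against ordinary files, then upload whatever it wrote to a designated output directory) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pachyderm’s implementation: a plain dict stands in for S3, and the function name `stage_run_upload`, the toy `count_lines` user code, and the `in/` and `out/` directory names are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stand-in for an object store such as S3: object key -> bytes.
ObjectStore = Dict[str, bytes]
UserCode = Callable[[Path, Path], None]


def stage_run_upload(store: ObjectStore, input_prefix: str,
                     output_prefix: str, user_code: UserCode) -> List[str]:
    """Download inputs to local scratch, run user code, upload its outputs.

    The scratch directory is ephemeral: if the process died midway, a retry
    would simply recreate it and download the inputs again.
    """
    with tempfile.TemporaryDirectory() as scratch:
        in_dir = Path(scratch) / "in"    # hypothetical layout, not Pachyderm's
        out_dir = Path(scratch) / "out"
        in_dir.mkdir()
        out_dir.mkdir()

        # 1. Copy every object under the input prefix onto local disk.
        for key, data in list(store.items()):
            if key.startswith(input_prefix):
                local = in_dir / key[len(input_prefix):]
                local.parent.mkdir(parents=True, exist_ok=True)
                local.write_bytes(data)

        # 2. Run the user's code; it only ever sees ordinary files.
        user_code(in_dir, out_dir)

        # 3. Upload whatever the code wrote to the output directory.
        uploaded = []
        for path in sorted(out_dir.rglob("*")):
            if path.is_file():
                key = output_prefix + path.relative_to(out_dir).as_posix()
                store[key] = path.read_bytes()
                uploaded.append(key)
        return uploaded


def count_lines(in_dir: Path, out_dir: Path) -> None:
    """Toy user code: write the line count of each input file."""
    for f in sorted(in_dir.rglob("*.txt")):
        n = len(f.read_text().splitlines())
        (out_dir / f.name).write_bytes(f"{n}\n".encode())


store = {"raw/a.txt": b"x\ny\n", "raw/b.txt": b"z\n"}
print(stage_run_upload(store, "raw/", "results/", count_lines))
# ['results/a.txt', 'results/b.txt']
```

The design point this illustrates is the one JD makes: the user’s code never talks to S3 directly, so any program that reads and writes local files can run unchanged, and losing a node only costs a re-download.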

Erasmus Elsner 18:22 
So maybe let’s move on a little bit to competitor products. You’re highly integrated with Kubernetes and S3, as we just talked about. One of the competitive products out there is DVC, which is highly integrated with Git. So what went into your thought process? We talked earlier about how you’re a strong believer in Docker and Kubernetes. But how do you think Pachyderm is perceived differently from DVC as a result? Or how do you position yourself in the marketplace? If I had to say, you’re probably more geared towards the really large tech companies, but I’m not sure whether that’s really the case.

Joe Doliner 19:03 
I think that is the case, and I think our user base certainly reflects that. More than anything, we’re geared toward large datasets, and large companies generally have larger datasets than small companies, although that’s obviously not a strict rule or anything. We’re also much more geared toward the operational use case, like: I have this pipeline, every single night I dump my databases into this repo, and then I want the pipeline to run, and on the other side I want there to be machine learning models and reports and all of these things. Versus: I want to perform experiments on this data; I don’t know exactly what I’m looking for, but I want to fire off a job that does this, and maybe I get back something that’s totally useless, or maybe I get back something that’s really interesting, and then I fire off another job based on that. We’re much more geared toward the former use case. I took a look over DVC’s website and marketing material before this podcast; I haven’t actually used it yet. I remember when DVC came out, and I thought it was pretty interesting. I think that DVC does a lot of the same things that Pachyderm does, targeted at slightly different use cases, slightly different scales, and things like that. I think DVC is also a lot more tuned directly toward machine learning; at least that’s what all the stuff on their site talks about. Pachyderm tries to be a little bit more of a generic data processing system. In practice, people use Pachyderm a lot for machine learning. But when I go into the cluster of a user who’s doing machine learning and look at what their actual pipeline DAG looks like, there are a bunch of nodes that are doing machine learning.
And then there are way, way more other parts that are just cleaning the data, reformatting the data, doing quality checks on it, doing quality checks on the model, inference, things like that. And in actuality, a lot of them are kind of boring; a lot of them are just things like pivots and stuff that you can do in Excel. It’s just that when you’re doing that on petabytes of data, when you need to make sure that it runs every single night, when you need to be able to trace it all back to its origin in case something looks funny, that becomes a much more interesting problem.
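
Being able to "trace it all back to its origin" is essentially provenance tracking over the pipeline DAG. A minimal sketch under stated assumptions: every commit gets a content-derived ID, each pipeline output records the IDs of the input commits it was computed from, and tracing walks that DAG back to the raw data. The class and method names (`ProvenanceGraph`, `put`, `derive`, `trace`) and the example repo names are hypothetical and do not reflect Pachyderm’s actual API.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Commit:
    repo: str
    commit_id: str
    provenance: Tuple[str, ...]  # IDs of the input commits this was derived from


class ProvenanceGraph:
    """Toy model: versioned data repos whose outputs remember their inputs."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}

    def _add(self, repo: str, content: bytes, provenance: Tuple[str, ...]) -> str:
        # Derive the ID from repo, content and inputs, so identical data
        # produced from identical inputs gets the same ID.
        h = hashlib.sha256(repo.encode() + b"\0" + content)
        for parent in provenance:
            h.update(b"\0" + parent.encode())
        commit_id = h.hexdigest()[:12]
        self.commits[commit_id] = Commit(repo, commit_id, provenance)
        return commit_id

    def put(self, repo: str, content: bytes) -> str:
        """Commit raw data, e.g. a nightly database dump, with no upstream inputs."""
        return self._add(repo, content, ())

    def derive(self, repo: str, content: bytes, inputs: List[str]) -> str:
        """Commit a pipeline's output together with the input commits it consumed."""
        missing = [i for i in inputs if i not in self.commits]
        if missing:
            raise KeyError(f"unknown input commits: {missing}")
        return self._add(repo, content, tuple(inputs))

    def trace(self, commit_id: str) -> List[str]:
        """Return every upstream commit, walking the DAG back to the raw data."""
        seen: Set[str] = set()
        order: List[str] = []
        stack = [commit_id]
        while stack:
            for parent in self.commits[stack.pop()].provenance:
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    stack.append(parent)
        return order


g = ProvenanceGraph()
dump = g.put("db_dump", b"nightly rows")
clean = g.derive("cleaned", b"cleaned rows", [dump])
model = g.derive("model", b"weights", [clean])
report = g.derive("report", b"metrics", [model, clean])
print([g.commits[c].repo for c in g.trace(report)])
# ['model', 'cleaned', 'db_dump']
```

If a report "looks funny", tracing its commit lists exactly which model, which cleaned data, and which nightly dump produced it; recording inputs at commit time is what makes that lookup cheap.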

Erasmus Elsner 21:27 
So let’s talk a little bit about the business model. Pachyderm is open source, but you’re operating under this open core model where you also have an enterprise version. I had Sid from GitLab on the show, actually, yesterday. I think he was in the same YC batch, and I think you used to beat him at Nintendo during YC. I think he recounted that once.

Joe Doliner 22:04 
Yeah, I’m a lot better at Nintendo than he is.

Erasmus Elsner 21:05 
Yeah. So basically, he had this whole process of going through the donation model, the service model. So did you land directly on this open core model, or what was the iteration process there?

Joe Doliner 22:14 
So there was some iteration process, but a lot less iteration for me than there was for Sid, for the simple reason that Sid and GitLab were probably three or four years ahead of us when they got into YC. They already had a product and a business and everything like that, and we were still working on the MVP of our product. And so I got to short-circuit a lot of that process by just having a few conversations with Sid, and him telling me: look, I’ve tried all of these things, I’ve systematically iterated on them, here’s what works, here’s what doesn’t work, things like that. And so our business model is modeled a lot off of the learnings of GitLab, and off of looking around at other companies that have been in similar situations. Our board member Chetan, who’s the guy at Benchmark who did the investment, is also on the board of MongoDB and Elastic. He’s not actually on the board of Docker right now, but someone else at Benchmark is. And so he’s got a lot of information on how every single one of these companies has accomplished this. And so we realized that an open core model was really what made the most sense to us because, as you said before, our offering is very naturally tuned toward big companies, and there’s a bunch of features that those big companies were asking us for that we hadn’t heard a thing about from a lot of our open source users. The best example of this is authorization and governance features: being able to say who owns each piece of data, who’s able to access this data, and what do I do to get access, things like that. And so those are the features that we wound up building for our closed source enterprise offering. The other aspect of our business model, which we’re just sort of building and rolling out now, is the hub component of it.
So just like GitLab has an enterprise product that you can pay them for and run on your own servers, and a lot of that is open source, I think most of that is open source, you can also just go to GitLab.com and use a hosted version of GitLab with all of the enterprise features, and you can pay for it like that. And that’s a really, really natural business model for a product like Git, because Git is all about collaboration, right? It just gets a lot nicer and easier if you can put this up on the public Internet rather than having it on your company’s internal servers. And the same exact thing applies to Pachyderm. Pachyderm is basically like Git for data science, and so it’s all about collaboration on data and things like that. And so we’re building a hosted hub solution for Pachyderm right now that we’re going to be launching in the second half of this year. That’s basically for people to get all of these enterprise features online in a hosted solution where they’re just paying us for the hosting costs.

Erasmus Elsner 25:13 
I heard you announce this product before, and I think it’s gonna be really exciting. In terms of the enterprise version, what’s the threshold where people upgrade from the open source self-hosted version to the enterprise version? Correct me if I’m wrong, but I think in the enterprise version you have a lot of visualization features. Maybe talk a little bit about the threshold at which companies start upgrading to the enterprise version.

Joe Doliner 25:40 
So normally, the threshold comes from some sort of specific need, and we’ve identified probably five or six of them that we’ve got in the enterprise product that we just don’t have in open source. The biggest one, I would say, is the authorization features: just being able to pin down who all of the different users are, integrate with LDAP, and stuff like that. That’s just something that you find at big companies; a guy who’s working on a weekend project is never going to ask you about it, because it’s his cluster and he owns all of it. But visualizations are a big part of it too. Right now, for you to set up Pachyderm open source and then do an interesting data science project on top of it, you’ve got to have a pretty broad skill set. You’ve basically got to be at least a competent DevOps engineer, at least a competent data engineer, and probably a really good data scientist to do something interesting on top of it, which some people are, a lot of people are, but not everybody is. And so companies often say: our data engineering team understands how to go directly into Kubernetes, see what’s wrong, find logs, and stuff like that, but our data scientists don’t like to do that so much. So if they can just have a graphical interface where they can click in and see, here’s some data, here’s a pipeline, I want to put this data into this pipeline, something went wrong, click the button, show me what happened with it, and things like that, that allows them to focus a lot more on what they’re good at, rather than having to muck around in Kubernetes container land, which is not that interesting to them. So that’s one of the big triggers. There’s also advanced pipeline profiling and performance, things like that.
So, like, the ability to see that this pipeline is running slowly and ask it to tell me more about why that is happening. Another pretty big trigger, I would say, is just the support that comes with the enterprise product. We have open source support channels, which means we have a big Slack channel with all of our users, and people ask questions and we answer them. But for enterprise users, we give them private support channels, and we’ll get on the phone with them. If there are any blocking bugs or anything, we’ll get those fixed really, really quickly. And if they need a new feature that we haven’t thought of before, we’ll work with them to design it and get it implemented, things like that.

Erasmus Elsner 28:07 
So talking a little bit about customers, maybe you can mention some of the flagship customers and use cases. We haven’t really touched on this. For me, the archetypical use case is really the Airbnb one, with large geolocation datasets, but maybe you can give us some flagship customers and use cases of people using Pachyderm today.

Joe Doliner 28:31 
Yeah, absolutely. Um, so one example is actually the Department of Defense. We get used by a couple of defense departments, but most recently, the United States Department of Defense used Pachyderm to host a competition for doing image recognition on drone footage. Basically, they had a whole repository of footage collected from drones flying around, and they wanted to host a public competition where people attempt to implement algorithms to detect what’s going on, to be able to say here’s a hospital, here’s a tree, things like that. And so they implemented all of that on top of Pachyderm. This is a pretty clear data pipeline problem, where you have all of this drone footage at the top, and then a user submits some code to run, and they create a pipeline that runs that code against the footage, gets a score from that, and then sends that score back to the user. We’re also getting used in a lot of geological studies; we recently onboarded Shell, the petroleum company, as a customer. We also get used a lot in biomedical sciences. Have you heard of the company AgBiome? Yes, yes. Yeah, they basically do microbe design for agricultural products. Pachyderm also sees a lot of interest in the financial industry. We’re working with the Royal Bank of Canada to build out basically their whole data platform that their data scientists use to consume data and process it into forecasting models, loan algorithms, and things like that.
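
The competition JD describes has a simple pipeline shape: fixed labeled footage at the top, user-submitted code in the middle, and a score at the bottom that goes back to the user. A hedged sketch, assuming each frame has a single ground-truth label and the score is plain accuracy (the source does not say how submissions were actually scored); frames are stand-in string IDs, and every name here (`score_submission`, `run_competition`, the users and labels) is hypothetical.

```python
from typing import Callable, Dict

Frame = str                        # hypothetical: a frame ID standing in for image data
Detector = Callable[[Frame], str]  # user-submitted code: frame -> predicted label


def score_submission(detector: Detector, labeled_frames: Dict[Frame, str]) -> float:
    """Fraction of held-out frames the submitted detector labels correctly."""
    if not labeled_frames:
        raise ValueError("no labeled frames to score against")
    correct = sum(1 for frame, label in labeled_frames.items()
                  if detector(frame) == label)
    return correct / len(labeled_frames)


def run_competition(submissions: Dict[str, Detector],
                    labeled_frames: Dict[Frame, str]) -> Dict[str, float]:
    """Run every submission against the same footage and collect the scores."""
    return {user: score_submission(fn, labeled_frames)
            for user, fn in submissions.items()}


labels = {"f1": "hospital", "f2": "tree", "f3": "tree", "f4": "road"}
bob_guesses = {"f1": "hospital", "f2": "tree", "f3": "road", "f4": "road"}
submissions = {
    "alice": lambda frame: "tree",            # always guesses "tree"
    "bob": lambda frame: bob_guesses[frame],  # a lookup table posing as a model
}
print(run_competition(submissions, labels))
# {'alice': 0.5, 'bob': 0.75}
```

Keeping the footage fixed and varying only the submitted code means every score is computed against the same input data, which is what makes the results comparable across users.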

Erasmus Elsner 30:20 
These make a lot of sense to me. Financial services is definitely one where you have really huge, granular datasets; image recognition too. So let’s talk a little bit about scaling Pachyderm. You raised this $10 million Series A, I think in November, which was led by Benchmark, and you’re now in scale-up mode. So talk to me a little bit about what keeps you awake and how you’re deploying this capital. I assume a lot of it goes into engineering work and probably sales, but maybe talk a little bit about how you think about scaling this company.

Joe Doliner 30:57 
Yeah, so this is definitely one of the newest aspects of the whole startup experience for me. The company is past the point that I experienced at Airbnb, and well before the point that I experienced at Airbnb, so I haven’t really seen a company go through this type of stage from the inside. The main thing that keeps me up at night is thinking about how to build out and structure the company in a way such that everybody is productive and satisfied with their job and not getting burned out, so that it doesn’t become the demoralizing state that a lot of big companies can reach. Right now, Pachyderm has a very, very flat structure. We don’t really have any managers outside of the founders. We’ve intentionally kept it that way for a while and are just now getting to the point where we’re thinking about starting to hire some actual professional managers. And so one of the big things that I’m thinking about is how to make this so that it doesn’t have the downsides that people typically associate with inserting layers of management and things like that. How do I let engineers still be engineers and kind of drive the company? One of the examples that I try to look to is Google, where, and I think this isn’t true anymore, for a very, very long time in Google’s life, the company was really an engineering-first organization. It was really run by the engineers, and the engineers had the leeway to explore the ideas that they found most interesting. And Google benefited a lot from that; they found a lot of interesting ideas.
And so I want to make sure that we build a similar engineering-first organization without making sales second-class citizens who just need to figure out how to fit in and coexist around engineering, so that they can actually do their jobs too. It’s a tricky problem. I don’t think any company actually does this perfectly; you just try to avoid things that suck as much as possible. That is, I think, the best you can do. And sometimes there just isn’t a way to solve it. Sometimes it’s like, look, we need to get everybody on the same page about this, people are confused, and so we’re going to need to have a meeting together. Nobody likes to be sitting in a room with, like, a million people all trying to talk about the same thing when nobody’s really quite sure what it is and people have a hard time getting their voices heard. But sometimes you just have to go with the best known solutions to these problems. I’ve intentionally avoided the really innovative management structures that some companies try, like the concept of no bosses at all, which Valve, the gaming company, uses. We’re not doing anything that far afield; we try to do fairly standard things. I just try to make sure that we’re not becoming a demoralizing company, basically: that we’re not turning into something where we treat our employees like human assets, and that people don’t feel like we’re not hearing their voices.

Erasmus Elsner 34:23 
I always think that there is nothing human in the term “human resources”. 

Joe Doliner 34:24 
Exactly. We don’t officially have any human resources at our company right now; we don’t have a human resources department. I know that can’t go on forever, but the term “human resources” just feels so demoralizing to me.

Erasmus Elsner 34:41 
Yeah, I hear you. So maybe a last question: from the outside, Pachyderm really looks like an all-around success. You’ve gone through YC, you raised a seed round right after, or almost right after, and now you’ve gone on to a Series A. But were there some points where you hit a roadblock and felt like this was going to fail, and you were thinking to yourself, what did I do, leaving Airbnb for this grueling startup journey?

Joe Doliner 35:16 
There have been many points like that. And I will certainly say that while Pachyderm looks like a success from the outside, we’re still fairly early in the startup journey; the most likely scenario for the company is still that we die, and we’re going to have to work really hard and do a lot of things right to make sure that’s not what happens. To point to a specific case: before we raised our Series A (we raised it at the end of last year, I think it was around September of 2018), we spent the entire year of 2017 trying to raise our Series A and failing. And we were failing, basically, because the company and the market just hadn’t matured enough. We remained pretty confident throughout that this was still a good idea with a big addressable market, and that we had the right solution for it. We just needed to give the market some time to breathe and congeal around us, and for our solution to grow into it. But for the entire year of 2017, every single investor that we talked to said no, almost every single customer that we talked to wasn’t quite there, and we had a ton of things fall through at the last minute. We were really getting worried that the company was just going to die. Things didn’t turn around until the very end of the year, I think basically between Christmas and New Year’s, the very end of December. We were getting pretty close to the end of our runway, and we hadn’t been able to get any investors to say yes. So we realized, okay, if we can’t raise money from investors, the only way this company survives is if we raise money from customers. So simple, right? Who needs investors when you’ve got customers?
And so at the very end of the year, like I said, it must have been right around Christmas, Joey managed to close BCG, the Boston Consulting Group, for a large six-figure contract; I think it was just shy of $500k. Getting that was great, first of all because that was the money. We looked at it like, oh shit, we just closed six more months of runway in one contract. But once we saw that, we also realized, oh, we can close contracts this big; companies are going to be willing to pay this amount of money for Pachyderm. Once we had that, things really started to change in 2018. We closed a couple more large six-figure deals, investors started to become a little more receptive to us, and that all culminated at the end of 2018, when we finally raised money from Benchmark. One more funny story to throw in there: one of the investors we talked to during 2017, when we couldn’t raise money, was Chetan, who ultimately wound up investing in us. That conversation with Chetan was probably one of the most demoralizing ones, because we had our first meeting with him and felt like it went great. We had never had an investor so engaged; he understood the problem, he understood the solution, he was just really into it. And then we didn’t even get a second conversation with him. That was when we really started questioning: are we just completely wrong? Were we sitting there in the room thinking he was loving it while secretly he was looking at his watch, like, when am I going to get these guys out of here? So we figured, I guess we were wrong, chalked it up, and kept on moving.
And it wasn’t until the next year that we went to talk to Chetan again; he had moved to Benchmark by then, and he was at NEA for the first conversation. He was super interested again, and we were like, well, what’s going on? You seemed super interested before, and then you said no. And he said, well, after the conversation we had, I went back to the partners at NEA and told them I’d talked to this awesome company, and they said, look, this just doesn’t quite fit into our portfolio thesis. NEA is, like, the biggest investment firm in the world, so they’re looking to deploy something like $300 million into a company that’s going to IPO within the next five years, rather than into a startup like us, where they’d deploy something like $10 million and we’d IPO in ten years or so. So their saying no to us had nothing to do with the quality of the company; it was just the situation of the investment firm at the time. And that’s why it’s really, really important, if you’re trying to raise money or doing anything with your startup, that you not let any one data point affect you that much, because there’s just so much going on in any one decision. It’s really hard not to, because you’re out there baring your soul, or at least you’re in a very vulnerable position when you’re trying to get people to invest in a company that you’ve spent years and years working on. And they can say no, not because you’ve done anything wrong, but just because they’re looking for investments on a certain timeframe where they can invest a certain amount of money, and the numbers here don’t match. That’s perfectly reasonable for investment firms to do.
They’ve got to have a thesis and a way of doing business. But it feels very personal; that one definitely hit both of us pretty hard. The most important thing for all of this is being with a co-founder where you really do believe in each other and believe in the idea, so that you can at least come back at the end of the day and say, look, this person didn’t invest in us. I don’t know why, and it sucks, and I think they were wrong, but we’re just going to keep going, because I still believe in you, you still believe in me, and we both still believe in this company.

Erasmus Elsner 41:51 
Yeah, that’s a great story. I recently talked to Joseph Jacks, who is running COSS Capital, and we talked about how, for a long time, it was really hard for venture investors to wrap their heads around the open-core model and the commercial open source software ecosystem in general. You have people like Peter Fenton at Benchmark who have invested in this space for a long time, and I think NEA has just moved very much into later-stage deals over time.

Joe Doliner 42:23 
Yeah, and when I talk to other entrepreneurs, particularly ones who are just starting out, I get a lot of questions along the lines of: I’m thinking of doing this, but I’ve heard that investors don’t like it when a company is like this, so do you think I should do this instead? And I always tell them, maybe, if it’s really, really important that you raise money from the investors who think that. But generally, you shouldn’t be making many decisions in your company based on what investors think about funding, unless those decisions make inherent sense to you as a founder. For example, investors like to see revenue, and they really like to see profits. That’s a really wise stance by investors, and you should totally make your decisions based on that to align with what investors are thinking. But with all the other trends that investors get into, you want to keep your finger on that stuff a little bit and understand it, but ultimately you should really be able to make up your own mind. And investors change their minds on this stuff all the time. We’ve already seen a few cycles of it: we’ve gone from receiving investor skepticism about the container ecosystem, with people asking us, okay, so how do you eventually expand beyond the container ecosystem, to all of a sudden never getting that question again. It’s like a switch flips. We heard about this from our investor Chetan, and I think part of the reason Benchmark were the ones who ultimately did our round was that Benchmark is a pretty independent-thinking VC firm.
They don’t really follow trends, at least nowhere near as much as other VC firms; they have their own beliefs that they stick to. We talked to a lot of investors during our Series A who said, we don’t quite understand what category of product, what category of company, you guys are building. And then as soon as Benchmark did the investment in us, he started coming back to us with slide decks from other companies that were raising money and saying, yeah, we’re a competitor to Pachyderm. All of a sudden this had become a category you could bet on, because other investors were like, oh yeah, that makes sense. Benchmark just did this round, Benchmark is really smart and forward-looking, they must know something, so we’ve got to invest in a company like this too. This can be infuriating to a lot of entrepreneurs, and it’s been infuriating to me on several occasions, but you have to have enough self-confidence to know when an idea is good, even if investors won’t validate it for you at that moment.

Erasmus Elsner 45:18 
I love that. Let’s end on this great note: as a founder, you shouldn’t always cater to external validation. Where can people find out a little bit more about you and about the firm?

Joe Doliner 45:32 
So they can find out more about Pachyderm at pachyderm.io; that’s our website. We have our blog and everything there, and we have a pretty active Slack users channel, which is slack.pachyderm.io, I believe. I have a personal blog and a Twitter that aren’t very frequently updated, so that’s where people can go to learn more about me.

Erasmus Elsner 45:55 
Perfect. So this is it for today. I hope you found it useful. I think Pachyderm is a super exciting company, and I’m really looking forward to following their journey. If you want to hear more about what I’m up to, you can always subscribe to my newsletter on sandhillroad.io or just subscribe to the channel and tune in next time. It’s up to you. Cheers, guys.