Craig Kerstiens sat down with us to share his portfolio of tools and techniques for managing product growth at Citus Data.

We cover:

  • why every product manager should learn SQL
  • what were some of the product tradeoffs in Craig’s tenure running Heroku’s hosted-Postgres offering
  • how Citus Data is positioned to take advantage of some extraordinary new PostgreSQL features

For the full text transcript see below the fold:

Craig Kerstiens, Citus Data




Max: Today we have the great pleasure of being joined by Craig Kerstiens of CitusDB.

Craig runs Product for Citus on the Citus Cloud-hosted database offering, and comes with a heck of a lot of experience at Heroku previous to that for many years, up through their acquisition by Salesforce, and before that in a number of software engineer individual contributor roles. Is that right?

Craig: Yeah. I’ve had an interesting background, a mix of a little bit of “IC” kind of engineering background, actually some sales engineering, some solutions architecture, and then most recently on the product management side.

So I’ve been an engineer. I don’t know if I was ever a really good one but got some early chops in engineering and have always been in developer-focused companies.

Max: You’re probably the most technical product manager I’ve met in that you use and leverage a number of tools to track analytics that the product managers I’m familiar with may not do as rigorously or use the tools as analytically and carefully as you do with your technical background.

For the members of our audience who are product managers and who are interested whether they should learn programming or not, share with us a few of the tools in your toolkit as a technical product manager.

Craig: For me, focusing on development or data products, it’s almost a prerequisite that you have some technical background.

But even beyond that, I found that being able to dig in to know the technical details, but also dig in and answer my own questions is super valuable. It’s no longer just intuition-based. It can be data-driven.

I think the #1 thing in my toolbox is SQL, and that’s one thing that I find even engineers that have been doing it for 20 or 30 years may not be proficient in SQL, and many actually just hate SQL!

For some good reasons. Like, it’s not the prettiest language, it’s not the most user-friendly, but it is very powerful. And no matter what product you are, if it’s a SaaS-based product and you’re the product manager there is data somewhere and it’s probably sitting in a SQL database.

If you have to go and ask someone to pull that report, it’s just gonna be a slower turnaround.

Max: For yourself in your current role, you have a track record of having an engineering background so getting stakeholders who might manage credentials to the databases to share credentials with you so you can run SQL queries isn’t so hard.

But for some of the younger product managers who may not have–I mean, you yourself don’t have a Computer Science degree–but for a lot of people it might be hard to get stakeholders to give them SQL access.

So for those folks who have a hard time organizationally getting access to writing SQL against the company databases, what do you recommend?

Craig: I think there’s a few things in there.

One, I think you’d be surprised at engineers–when you’re willing to self-service and try–they’re usually receptive to that. Engineers by nature typically are learners and want people to self-service.

But there’s a number of tools for getting access to the production dataset, the live production database.

I myself, who am fairly proficient in SQL, have run queries that have probably brought down Heroku before. It’s been a long, long time since that happened, but the worst thing is when a PM is running a query, thinking, “Hey, I’m just doing some analysis,” and then 15 minutes later you have engineers running over to your desk saying, “What are you doing?”

So there’s a number of things that can come in and help with that. So, like, read replicas of the production dataset.

Usually it’s pretty easy to set this up, and then you have less impact on production. It may be a data warehouse where things are archived on a daily basis. You don’t really need the live up-to-date production dataset normally.

One of my favorite tools, if you’re running on Heroku, which many people are, is Dataclips, which–it’s hard to describe, but it’s like a live-run query in a read-only mode against whatever database, and you get a unique URL that you can send around.

We have a number of reports at Citus that live-run queries against our Heroku Postgres database, return the results in CSV format into a Google sheet, and we chart against it. So it’s like a live dashboard against all our production data.

Max: So rather than having to write in SQL all day as a product manager, you can write SQL once, save that as a report, and then just check a Google Drive spreadsheet on a daily basis to track key metrics?

Craig: Yeah, exactly. You’re investing a few hours upfront and then it’s a live update.

I look at what our run rate is, when your customers have signed up, any attrition/changes/upgrades/downgrades of plans, all of this.

And it’s SQL backing it but I wrote it once and then it refreshes automatically, which is really nice and powerful.
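A minimal sketch of the write-once, refresh-automatically report described here. It assumes a hypothetical `subscriptions` table (the table, its columns, and the sample rows are invented for illustration) and uses Python's built-in `sqlite3` in place of a Postgres read replica so that it runs anywhere. It is not how Dataclips works internally; it only shows the shape of the idea: save the SQL once, rerun it on a schedule, and hand the result to a spreadsheet as CSV.

```python
import csv
import io
import sqlite3

# Hypothetical data standing in for a production read replica.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (customer TEXT, monthly_price INTEGER, cancelled_on TEXT);
    INSERT INTO subscriptions VALUES
        ('acme', 50, NULL),
        ('globex', 200, NULL),
        ('initech', 50, '2017-05-01');
""")

# The report is written once and saved; only the data underneath changes.
REPORT_SQL = """
    SELECT COUNT(*) AS active_customers,
           SUM(monthly_price) AS monthly_run_rate
    FROM subscriptions
    WHERE cancelled_on IS NULL
"""

def run_report(connection):
    """Run the saved query and return its result as CSV text."""
    cursor = connection.execute(REPORT_SQL)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor.fetchall())
    return out.getvalue()

report_csv = run_report(conn)
print(report_csv)
```

A scheduler could call `run_report` once a day and publish the CSV wherever the charts read from.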

Max: Oh, for sure. I know there’s a very active market for third-party services of this kind. There’s Chartio, there’s Periscope, and there’s an open source and pretty mature other option called Redash which we use ourselves.

All of those are options that product managers can use to get their hands dirty early on with SQL.

Craig: There’s a slew of them out there. I think Chartio is a good one, Periscope, as you mentioned. I think there is Mode Analytics.

There’s basically a complete suite of options there. Really you can take which one you like and use it. You don’t always have to go down to the SQL level. Some of those make it easier to drag and drop.

I do encourage product managers really to spend some time and learn SQL, and I think that gives you a better understanding of the data.

It’s not just about self-servicing to me. It’s that when you understand the underlying data you may be able to then ask a question you didn’t know you could ask before. And once you understand everything that you have, you’re better equipped than just asking the question to an engineer and getting an answer where you’re kind of limiting how far you can go.

Max: I think a lot of folks who may not yet know SQL may encounter barriers when they’re first learning due to the fact that there’s multiple flavors of SQL.

You have MySQL, you have Postgres, you have Oracle (God forbid), you have Citus. And the search results that you might find when you search for tutorials on SQL are each slightly different.

So for our audience, can you allay the fears of people who might be worried about some of the roadblocks they might encounter in first learning SQL?

Craig: Yeah. In general, SQL is a standard and databases usually follow it pretty strictly…usually.

There are some exceptions there, but you start to get more advanced before you encounter those.

Basic SQL–like querying a table, joining against it, filtering on it–it’s gonna be the same whether you’re on MySQL or Postgres or Oracle or even SQLite. All of the basic building blocks are the same.
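To make those building blocks concrete, here is a small sketch that queries a table, joins it against another, and filters the result. The `customers` and `plans` tables and their contents are hypothetical, and SQLite (via Python's built-in `sqlite3` module) is used only because it needs no setup; the `SELECT` itself sticks to basic SQL that should read the same on Postgres or MySQL.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plans (id INTEGER PRIMARY KEY, name TEXT, monthly_price INTEGER);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, plan_id INTEGER);
    INSERT INTO plans VALUES (1, 'hobby', 0), (2, 'standard', 50), (3, 'premium', 200);
    INSERT INTO customers VALUES (1, 'acme', 2), (2, 'globex', 3), (3, 'initech', 1);
""")

# Query one table, join against another, filter, and sort.
PAYING_CUSTOMERS = """
    SELECT c.name, p.name, p.monthly_price
    FROM customers AS c
    JOIN plans AS p ON p.id = c.plan_id
    WHERE p.monthly_price > 0
    ORDER BY p.monthly_price DESC
"""

rows = conn.execute(PAYING_CUSTOMERS).fetchall()
for row in rows:
    print(row)
```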

Then where you’re gonna get specific is more of: “what is the database that you’re running?”

Personally I’m a big fan of Postgres. It’s been growing very well. It’s open source so you can not pay as much money to Oracle which is nice.

And then Citus is just an extension that sits on top of Postgres, so we’re pure Postgres. Once you learn Postgres there’s a number of other tools that build on top of it, which make it a nice portable interface.

Max: For people who aren’t familiar with Citus, it would be really cool to hear from you–the manager of product at Citus–what the product is? What do you guys augment about Postgres?

Craig: Yeah. I guess it’d probably make sense to back up a little bit first on what I did at Heroku.

So I came over to Heroku–many, many people had known about Heroku from the Ruby world. I actually was not a Rubyist at all. I was more of a Python guy and heard about this thing, but it didn’t exist for me at the time. I came over there as one of their first product managers and I’m like, “Okay, this is kind of cool, but I don’t do Ruby but it’s a cool company!”

Along the way, Heroku kept hearing from customers that they needed a place to store data. Like, “I’m building this Ruby app, can I save my data somewhere?” And Heroku’s like, “No, no. Just run your app.” It turns out people want a database.

They need to store state somewhere. So, “How hard can that be? Let’s just start running Postgres.” It turns out there’s a lot of work running a database.

In hindsight it was still a good choice, but there was a lot that went into that. So over time I ended up running product for Heroku Postgres where we ran something like a million and a half Postgres databases for customers. So a one-in-a-million problem was daily.

But what I saw there was that at a certain level the easiest way to scale a database is to scale up. Your application you scale out pretty early: you go from one app server to two, throw in a load balancer. It’s a reasonably well-defined and solved problem.

Databases, it’s the opposite. Like, you go up, you go up, you go up, and then suddenly there’s a ceiling here.

Amazon, for all they do as an infrastructure-as-a-service provider, there’s only so big you can go on instances. It’s not like you can say, “Hey, here’s more money. Give me a bigger instance.” They just don’t have it.

And what I saw at Heroku was that people would scale up, scale up, and they’re like, “Well, crap. Now I’ve got to spend six months rearchitecting my app. I’ve got to manually shard.” And sharding is essentially where you take data and split it up.

So there’s a couple types of apps that lend themselves really, really well to sharding. Really commonly it’s the SaaS, B2B multi-tenant space. If you use Salesforce, your data doesn’t interact with anyone else’s, so you’re a “tenant” to them. You’re a single customer, but it’s all isolated. It’s never gonna interact with someone else’s.

I can’t come in and say, “Let me see your leads this month,” for a good reason, right?

Max: Probably, yeah :)

Craig: So some apps lend themselves very, very well to sharding where you’re splitting this up across multiple physical nodes.

So now suddenly, you’re starting to scale your database the same way you scale your app.
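A toy sketch of sharding by tenant, as described above: every row carries a tenant identifier, and a hash of that identifier decides which node holds the row, so one tenant's data always lands together. The node names, the `tenant_id` field, and the hash-modulo placement are assumptions made for illustration; this is not a description of how Citus places shards.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical physical nodes

def shard_for(tenant_id, nodes=NODES):
    """Pick a node for a tenant from a stable hash of its identifier."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

def route(rows, nodes=NODES):
    """Group rows by the node that owns each row's tenant."""
    placement = {node: [] for node in nodes}
    for row in rows:
        placement[shard_for(row["tenant_id"], nodes)].append(row)
    return placement

# Hypothetical rows from a multi-tenant CRM.
rows = [
    {"tenant_id": "acme", "lead": "Ann"},
    {"tenant_id": "globex", "lead": "Bob"},
    {"tenant_id": "acme", "lead": "Cho"},
    {"tenant_id": "initech", "lead": "Dee"},
]

placement = route(rows)
for node, node_rows in placement.items():
    print(node, [r["lead"] for r in node_rows])
```

Because tenants never need each other's rows, a query for a single tenant only has to touch one node. A plain modulo scheme like this one would move many tenants if the node count changed, which is one of the reasons manual sharding gets painful.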

So in a long-winded way I saw that problem and had known the Citus guys for a little while.

Citus essentially is an extension of Postgres. Postgres has all these hooks where you can go and change things on the fly, transparent to the user. You can add geospatial data types, you can add a key value store, you can do full-text search. All these things you don’t think of a relational database doing, it can do because of these extension hooks.

Citus hooks into your application in that it still looks like a single database. Under the covers, we split it up across multiple physical nodes. So it’s scaled out horizontally just like your application, but you don’t have to do all that hard work for it.

Max: This is the concept of “Postgres extensions”, right?

Craig: Yup, exactly.

Max: These are not so new, they’re in Version 9 of Postgres? Is that when they were introduced?

Craig: They may have been even earlier, but it was right around that time. They’ve been around for about seven years now.

Max: There’s a lot of extremely interesting new features coming to Postgres and it’s one of the reasons I think you guys are well-positioned.

You’ve got JSONB handling unstructured data and making MongoDB redundant.

You have logical replication which is a tremendously powerful feature which makes Postgres Firebase-esque in being able to stream new events, and updates and creation events, and deletion events, wherever willy-nilly you want to.

A lot of very interesting stuff coming down the pike for Postgres!

Craig: I think of Postgres as becoming less of a traditional relational database and more like a data platform.

You look at all these things being built on it. Now you don’t have to have this high bar for C code that’s reviewed by the core committee and released once a year. An extension can just be built and shipped.

There’s ones for hypothetical indexes like, “What if I added this index, would it improve performance or not?”

So a lot of cool things coming, based on all that extension framework and APIs without it having to come back into the core, in addition to all of the core Postgres still getting better and better.

Max: So Citus, the extension to Postgres, is actually an open source project. Is that correct?

Craig: Yep.

Max: And it’s viewable on GitHub. If you just google “Citus GitHub” you can find the open-source project. What’s the open source license on the code if you don’t mind me asking?

Craig: So it’s AGPL which people have opinions on for a lot of software. I think as a database–because we connect via the standard Postgres driver–it looks like Postgres, acts like Postgres. So it’s AGPL, take it, run it, use it as you see fit. If you edit it, then you’re supposed to contribute that back basically, is how the license works.

Max: In the very near future, we’re gonna be covering the topic of different open-source licenses like AGPL, and what it means to our audience.

One of the things I was curious about with Citus is that you guys focus on Postgres.

Is that merely because you found an attractive market segment or have you guys considered reproducing what you’ve done with Postgres for MySQL, or Oracle, or what-have-you?

Craig: Postgres was interesting from two perspectives. One, I think it’s a rising tide. Like, Hacker News loves Postgres and who can fault it if Hacker News loves it? But it’s actually gaining more and more momentum over time. It continues to get better.

For a while, it was just known as safe, stable for your database and not the cool thing. Then suddenly it got some of the cool stuff like JSONB and now it’s really rising in momentum.

The other thing is that no other database has that same set of extension APIs. So the fact that we’re an extension is very, very key.

Most databases, when they start to change things, they fork away. If you’re familiar with Redshift or Greenplum, they were once-upon-a-time Postgres, but they were Postgres as of eight years ago.

So all these new things like JSONB, like logical replication–they are still waiting. And when you fork off of something, it doesn’t mean anything on Day 1. It means something on Year 2, Year 5, Year 7.

Because of these extension APIs, I don’t know if it’s Day 1 or, you know, Week 1, but we will support Postgres 10 and the next version and the next version so as Postgres gets better, we keep getting better.

It was a fairly key strategic decision that we won’t fade away as Postgres continues to improve.

Max: I guess neither of us are best positioned to talk on this topic of, “What is MySQL up to” or “what is Oracle up to” or “what is MS SQL up to?”

But do any of them have extension APIs that would allow you guys to build or reproduce what you’ve done with PostgreSQL?

Craig: Not in the same way.

  • MySQL has some pluggable back-end stuff, but not the same kind of level of hooks.
  • Oracle I’m not actually sure at all, but I suspect probably not based on their typical license.
  • And SQL Server, not in the same way. SQL Server has probably less ability than MySQL to be modified.

So from what I know (and there may be something I’m missing on both, since it’s been a few years), not nearly to the same level, which I think sets Postgres in a unique category by itself.

Max: Got it. So one of the common problems in product management, dialing back from Postgres and SQL, is getting at data about product usage or product performance, however you measure it.

One of the things that I’ve seen product managers encounter is when that data is split across multiple different databases. So you might use Postgres for your application database but you may also use Salesforce for your customer relations management, or some other CRM.

For product managers who are in this predicament or pickle how do you handle that case where, okay, you’ve learned SQL, you know SQL, you can run queries against your app database and get usage information, but what about payments data like churn?

Craig: So the most popular product management statement ever, “It depends.” It entirely depends on your environment, your setup.

But generally, what I try to do is a couple of things. Within Postgres, there’s two options:

One, you could look at building a data warehouse where you’re ETL-ing all of this stuff and basically saying, “Hey, let’s do a nightly sync from Salesforce into my Postgres database, and from Stripe into my Postgres database,” and I have found generally that works okay. You can do this with some lightweight Python scripts which is my favorite thing to do. Just schedule it on a cron job or Heroku Scheduler to run once a day.

It’s brute force, it’s probably not the most elegant thing, but it just works, which, as a product manager, works really well. You can do a more complicated ETL pipeline if you need to. I usually find that works.
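A minimal sketch of that kind of brute-force nightly sync, under stated assumptions: `fetch_remote_contacts` is a hypothetical stand-in for whatever API call pulls records from the CRM or billing system, the `crm_contacts` table is invented for illustration, and SQLite stands in for the Postgres warehouse so the example runs without a server. The point is only that rerunning the script is safe, because each record is inserted or overwritten by its key.

```python
import sqlite3

def fetch_remote_contacts():
    """Hypothetical stand-in for an API call to a CRM or billing system."""
    return [
        {"id": "c-1", "email": "ann@example.com", "plan": "standard"},
        {"id": "c-2", "email": "bob@example.com", "plan": "premium"},
    ]

def sync_contacts(conn, records):
    """Upsert every record and return how many rows the table now holds."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crm_contacts "
        "(id TEXT PRIMARY KEY, email TEXT, plan TEXT)"
    )
    with conn:  # commit on success, roll back on error
        # INSERT OR REPLACE is SQLite syntax; on Postgres the equivalent
        # would be INSERT ... ON CONFLICT (id) DO UPDATE.
        conn.executemany(
            "INSERT OR REPLACE INTO crm_contacts (id, email, plan) "
            "VALUES (:id, :email, :plan)",
            records,
        )
    return conn.execute("SELECT COUNT(*) FROM crm_contacts").fetchone()[0]

conn = sqlite3.connect(":memory:")
sync_contacts(conn, fetch_remote_contacts())          # first nightly run
total = sync_contacts(conn, fetch_remote_contacts())  # second run adds no duplicates
print(total)
```

Scheduling it is then a matter of pointing cron or a similar scheduler at the script once a day.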

The other option is, Postgres actually has a really cool feature called Foreign Data Wrappers, where you can connect from inside Postgres to another data source. So I could connect from inside Postgres to Redis, and actually query and join these things in a very cool way.

Max: So even though Redis does not have a direct SQL interface, you can use SQL to query it?

Craig: Yep, even SQL to query MongoDB. There’s a Mongo Foreign Data Wrapper that you can query directly into Mongo using SQL from inside Postgres.

Max: So Foreign Data Wrappers are a relatively new addition to Postgres that allows you to use a Postgres instance to drive these queries that are run against these foreign data sources?

Craig: Yup.

Max: Is it realistic to depend on Foreign Data Wrappers to do things like we’re describing or is it immature relative to some other options that people might consider?

Craig: It depends, right? I wouldn’t put it into a production app, but if you’re doing offline product management work like data analysis, it’s something I’ve actually regularly used. I guess it is in production for me, from a reporting standpoint.

If you’re looking to put it live to users there is some extra latency there. There’s probably better ways to design and build things.

Max: Do you mind sharing what specific Foreign Data Wrappers you’ve used to do this type of stuff?

Craig: My favorite one actually is the Postgres Foreign Data Wrapper. So if I’ve got two completely different Postgres systems, it just makes sense. I think at Heroku we had something like 30 or 40 different Postgres databases in total, where some house stuff about the dashboard, some house stuff about billing, some about the core database. And this was actually the easiest way to say, “Here, set up this other Postgres table within this database just as a logical mapping and just push down that query.”

I’ve played with, at times, the Redis one. I’ve never used the Mongo one because I’ve never run Mongo in production.

Those are the big two. I’ve actually played with the Salesforce one before.

Max: Have you successfully set it up and been able to query, for example, the “leads” table or the “contacts” table from your Salesforce instance?

Craig: Yeah. I definitely have been able to do that from a proof-of-concept.

For me–I’ve written this three times now–now I just write a sync process that runs nightly and pulls over stuff via the bulk API. It’s a little faster and not hitting the REST API of Salesforce.

Max: Sounds a lot like logical replication!

Craig: It’s not quite–it’s a little more brute force but it’s the same idea basically just pulling things over.

And I find, on a lot of product management things, brute-forcing gets the job done; it doesn’t have to be the best engineering. I actually find the engineers kind of appreciate that you don’t have to over-engineer. If it gets the job done it’s perfect.

Max: There’s a couple of open-source projects for people who are down with “over-engineering” that I wanna plug real quick, that I have not personally contributed to yet. One is called Bottled Water.

Craig: It’s an interesting one. There’s some mixed opinions on it–about whether it’s production-ready.

Max: I don’t believe they advertise it as being production-ready nor is it actively developed at this point. I think last active development was 1+ years ago by the folks at Confluent who managed the Kafka project.

Craig: They actually took it over. The primary development was by Sam Stokes and Martin Kleppmann, so two guys. We actually funded some of it at Heroku for a little while and it’s very cool. I guess we should actually explain what it does!

Max: Yeah, absolutely. What is Bottled Water?

Craig: So Bottled Water is “streaming change data capture,” so everything that happens inside Postgres goes into Kafka.

Max: Inserting a row, updating a row, deleting rows?

Craig: All of this. So you’ve got an audit trail from everything you do in your Postgres database going into Kafka.

And then you could do whatever you want. You can set your topics, you can consume it, you can alert on it, you can filter, you can send it to other places. But the idea is that if Kafka is your streaming hub where you’re gonna send data out to different places, Postgres is your system of record.

Let’s get all that system of record, everything that happens, into Kafka and do what you want so it’s very cool. It parses the Postgres write-ahead log, figures out what to do.

Max: So the common variation that possibly all of our engineering audience might be familiar with is: in your web app you have some POST API endpoint where somebody creates a new resource, you perform an insert into a SQL database and then you do a subsequent second write to a different database, for example, a full-text search database like Elasticsearch. So you’re ending up in your app server doing two things. One is writing to SQL, one is writing to Elasticsearch, let’s say.

Craig: And then hope one doesn’t fail and you have to unwind…

Max: So the brilliance of this Kafka solution where you listen to the Postgres logs and write it to another log that can be read from, is that you can “fan out” your writes or any of your write operations to any listener that you want to set up in the future.

And not only that but that log can be replayed from the very beginning of your database. Not only can you be assured that you have all events as of the moment you switch on your new listening service but you can replay everything from the beginning of the state of your database, which has some tremendously beneficial properties.
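The fan-out and replay properties described above can be sketched with a toy in-memory log. Everything here is hypothetical: `ChangeLog` and `SearchIndex` are invented names, not the Kafka or Bottled Water APIs, and a real system would persist the log and track offsets durably. The sketch only shows why a listener switched on later can still rebuild the full state by reading from the beginning.

```python
class ChangeLog:
    """Append-only list of (operation, key, value) change events."""

    def __init__(self):
        self._events = []

    def append(self, op, key, value=None):
        self._events.append((op, key, value))

    def read_from(self, offset):
        """Return every event at or after the given position."""
        return self._events[offset:]


class SearchIndex:
    """A consumer that builds its own copy of the data from the log."""

    def __init__(self):
        self.docs = {}
        self.offset = 0  # position of the next event to apply

    def catch_up(self, log):
        for op, key, value in log.read_from(self.offset):
            if op == "delete":
                self.docs.pop(key, None)
            else:  # "insert" and "update" both set the latest value
                self.docs[key] = value
            self.offset += 1


log = ChangeLog()
log.append("insert", 1, "hello")
log.append("update", 1, "hello world")
log.append("insert", 2, "bye")

early = SearchIndex()
early.catch_up(log)      # this listener existed from the start

log.append("delete", 2)
early.catch_up(log)      # it applies only the new event

late = SearchIndex()     # a listener switched on much later
late.catch_up(log)       # replays the whole log from offset 0

print(early.docs, late.docs)
```

The application writes once, to the system of record, and each listener applies the same ordered events on its own schedule, so there is no second write to unwind when one destination fails.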

Craig: Yeah, we leverage that same kind of capability of the write-ahead log in Postgres for point-in-time recovery for Citus. Actually yesterday at 5:00 p.m. we had a customer that said, “Oh, crap. I just ran a migration, my test migration which drops all the tables, against production.”

Max: This is not uncommon!

Craig: It happens way more than you think. So if you’ve ever done it, one, feel a little bad.

But two, you’re not the only person. I don’t know if I hear about it once a week but I hear about it a reasonable amount.

I also hear it from a customer whose customer deleted data. But with the write-ahead log, you can say, “Give me the backup and replay the log up to this second or minute in time.” So we actually recovered their database, several terabytes, within 50 minutes, to one minute before that migration ran.

I was actually out to dinner, following along, checking as we were taking care of it, ran a couple of commands, and I checked in an hour and a half later when I was back at home from dinner: “Everything’s good, thanks!”
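The recovery described above (restore a backup, then replay the log up to a chosen moment) can be illustrated with a toy model. This is a sketch under loud assumptions: the "database" is a dictionary of table names to row counts, the log is a list of timestamped tuples, and `restore` is a hypothetical function. Real Postgres recovery replays binary write-ahead log records, not Python tuples.

```python
def restore(base_backup, wal, target_time):
    """Rebuild state from a base backup plus log records up to target_time."""
    state = dict(base_backup)  # start from a copy of the snapshot
    for timestamp, op, table, row_count in wal:
        if timestamp > target_time:
            break  # stop replaying just before the unwanted change
        if op == "drop":
            state.pop(table, None)
        else:  # "write": record the table's new row count
            state[table] = row_count
    return state


# Hypothetical data: table name -> row count; timestamps are minutes since the backup.
base_backup = {"orders": 3}
wal = [
    (10, "write", "orders", 4),
    (20, "write", "customers", 1),
    (30, "drop", "orders", None),     # the accidental test migration
    (31, "drop", "customers", None),
]

recovered = restore(base_backup, wal, target_time=29)
print(recovered)
```

Replaying the full log would faithfully reproduce the dropped tables, which is why the target time is set just before the migration ran.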

Max: I think this happens more often than you hear about it probably. You probably hear about one in every five times that this happens.

Craig: Usually customers, when they delete their data, they usually say, “Can you help me. I know you probably can’t.” So it’s nice to be like, “Actually we completely got you covered.”

But yeah, I could imagine that customers probably do it on their own, too, and just say, “Well, I’m gonna go do the backup that’s two days old.”

Max: So coming to Citus you had this wealth of experience with managing Postgres databases, over a million of them at Heroku. The backup system, is it any different? I know at Heroku you have these snapshots using WAL-E.

For members of our audience that might not know what WAL-E is, you mind sharing what it is?

Craig: Yeah. Postgres, under the covers, is just one giant append-only log. That’s known as the write-ahead log, and when you write a record it’s written to that log. When you update a record, it’s written to that. When you delete a record, it’s actually a write to that log.

We leverage WAL-E. We actually have the engineer that wrote it while at Heroku, and a week ago we released a new version of it, WAL-G, rewritten in Go. We’ve seen restores that are between 4X and 7X faster.

But what we do is take a base backup, which is the physical bytes on disk, and take the write-ahead log and archive it to S3. Basically that means if your entire database fails we can recover from it.

That’s one type of backup within Postgres, that’s what’s known as a “physical backup.”

More people probably know about when you run a pg_dump. When you download this archive locally to restore to staging or that sort of thing. That’s actually the raw SQL INSERTs so it’s more machine-portable but when you actually run pg_dump it introduces a lot of load to your database, so most high-volume databases don’t run pg_dump.

At Citus we have a very similar setup where, if you want to, you can run pg_dump. When I came over to Citus, engineers number 1 and 2 from Heroku Postgres also joined Citus. So it’s a very similar experience. It’s one of those jokes: “doing the exact same thing the second time around is a lot faster and easier and you don’t have to make the same mistakes.”

Max: I’m realizing that we’ve gotten pretty technical in talking about specific Postgres features and specific projects.

I forgot to mention the second open-source project that I really quickly wanted to plug which is called Maxwell, and it is a rendition of what Bottled Water does but for MySQL. It listens to the MySQL write-ahead log, which I’m not sure if it’s called the write-ahead log, and distributes that to Kafka.

It is production-ready and it is used by a number of companies that we all might have heard of, including Zendesk.

Craig: I would guess Uber is one, but I’m not sure.

Max: So Uber, really good story there that maybe our product management audience might not care much about, but our Postgres and database-aware folks might, is how Uber, a very high-profile user of Postgres, made the decision to switch from Postgres to MySQL.

I think you had some commentary about that situation, or you obviously paid attention to that conversation as it happened in the public. Do you think they made the right choice? Are the flaws that they encountered with Postgres things that they should have been able to surmount?

Craig: I think they definitely could have lived with Postgres. I think it’s actually really funny if you look back in time, they wrote the exact opposite article two years before: how they migrated from MySQL to Postgres.

The short of it, I think, is that they couldn’t find Postgres people to hire. They actually had some new management that had come in with a MySQL background and they knew how to run it. And really, if you read through it, what they did was build a whole other NoSQL engine on top of their relational database.

I wonder if they should be using MySQL or Postgres at all, versus something much lower-level based on what they’re doing.

Max: This brings up an extremely relevant point for members of our audience who might be earlier in their careers or still debating what skills to learn, is if there isn’t a market supply for people familiar with the technology, maybe it’s not the right technology to use. That manifests in different ways, but for our audience who we’ve highly recommended learning SQL I think we should restate that learning SQL is a pretty universal skill…

Craig: Yeah, it’s not going anywhere. I think it’s been around for 30 years. It’s the lingua franca of data. It’s outlived a whole bunch of things that were supposedly gonna replace relational databases. And even on all the NoSQL databases, now they have a SQL layer on top.

Mongo, Cassandra, Spark–all these things that are the cool new things, they also just have a SQL-like interface, so you’re not missing out on much.

Max: I think I was reading about how Mongo, the parent company 10Gen, is filing or has filed for IPO recently, when they announced that they had a replication system in place for ETL and analytics–turns out it was actually Postgres Foreign Data Wrapper!

Craig: Yes. SQL is a fairly safe investment–you don’t have to worry that you’ve learned something that you’re not even able to use.

Max: Dialing back from SQL, dialing back from these open-source projects–what might be specific about working on product management for developer-focused products, like Citus Cloud?

Craig: Basically developers are horrible customers, so I wouldn’t encourage anyone to go down that path (sarcasm).

Max: Well, why are they horrible customers?

Craig: Developers usually have really strong opinions. They’re often a little cheap.

Developers…it’s interesting. For developers trying to automate so many things, they won’t always do the cost analysis of, “If I automate this myself, it took me how many hours?” versus, “What do I bill per hour?”

I’ve seen more and more discussions about this, how developers are starting to do the economics and say, “It’s totally worth it for me to pay $500 for this service. Yes, I could automate it in 10 hours, but I can go and bill for 10 hours which pays off much better.”

I do find it very rewarding, it’s a challenge. You’ve got to actually add a lot of value. And developers by nature are already technical so they can do a lot of these things.

There’s a bit of a higher bar, I would say. You can’t go and do something very, very simple. You actually do have to go solve a hard problem, which, to me, is really rewarding, also very challenging. I think it’s a very interesting approach to product management and a bit of a different one from a lot of other kind of product managers out there in other industries.

Like, if you’re a product manager at Uber or Facebook it may be still good to be technical, but how people interact with pictures or how most people new to Uber onboard is a very, very different problem than selling to engineers.

Max: I think one of the other particulars about product management for Citus, which offers hosted database service offerings, is the concept of lock-in that a lot of customers have to grapple with and might be one other big dimension to how engineers, as customers think, or developers as customers think. Once they’re on a platform like an Oracle they’re on Oracle for a long while and there’s a high switching cost.

So have you ever worked on products that don’t have high switching costs and do not require selling to somebody who has this fear of lock-in?

Craig: I’m trying to think a little bit there. I mean, maybe Heroku? Like in terms of high switching costs, there is some lock-in there. If you’re using Heroku Postgres or you’re using Heroku, yes, your data lives in there, but you can take and run Postgres yourself. It’s open source.

Citus is actually somewhere in the middle there, in that if you use our database as a service, give us your data, it’s there, but we’re also open source. Take it and run it yourself. You don’t actually have to pay us anything and you could run and use the database. That keeps the bar high: we have to keep providing value, delivering a good service, because as soon as we don’t do that then why would you stay?

Max: Well I guess how do you, in trying to sell Citus as a hosted offering, allay fears of lock-in?

Craig: I think a big part of it is just that open-source piece. Like, “We are Postgres, it is open source. Take it and run it yourself.”

And a lot of people will show up and say, “Hey, I just wanna learn a little more in the open source. I don’t know if I trust you running my database yet,” and then having some conversations with them. They understand the experience that comes with a team, the additional value that comes with it. So you usually start on the side of, “I don’t wanna be locked in. I don’t wanna be locked in,” and then later say, “You know what? There’s enough value in this that it absolutely makes sense,” and then there’s that open-source piece. And you’ll find that with a lot of developer products, database products, there is an open-source angle that allays that fear.

Max: I think there’s a significant fraction of our audience that may not know exactly what “open source” refers to.

For our audience members who don’t know or have never contributed or even understand the concept of contributing to open source:

Are there any good first projects or groups that people can go to and meet up in-person with people who can coach them on, A) what is open source? B) walking them through their first contribution to open source, even if it’s just adding documentation?

Craig: Yeah, that’s a great question.

Open source is the idea that code is completely open, free, public, take it, use it.

There’s a number of places and a lot of conferences for learning like DjangoCon or PyCon or RailsConf or RubyConf. A lot of them will end up doing sprints afterwards. At a conference you go to a bunch of talks, learn something hopefully, maybe not, meet a bunch of other developers. But afterwards there’s usually a day or two where there’s pizza, coffee, beer, and people just hacking on these open-source projects. Usually there’s some big-name projects there like a Django or Rails, and, “Oh, we’re working on this project.” Ten of us will meet at this table. And you sit there and code, and I find that’s actually a great opportunity to come and…all open-source projects usually track bugs in public, and a lot of bugs are really easy ones, like, “This isn’t documented well,” which is a great place to start.

Usually it seems like a pretty scary thing to dig into and start at first. I do find that the Django, Rails, Ruby, and Python communities in particular are very welcoming. And I think there’s some blog posts, if you look on those main projects. “Here’s how to do your first contribution to open source, and here are, like, good things to tackle as a first-time contributor.”

Max: Oh, yeah. I mean, for people that are feeling overwhelmed, attending a conference, like Craig just described, is super-helpful. You’ll meet a lot of very friendly people who are very invested in their projects, and are happy to help and educate you on how to contribute effectively.

So all of the advice Craig is giving here is gold.

Craig: Yeah, and there’s probably a local group near you. There’s Rails Girls, Women Who Code, there’s some Django ones. So there’s usually a bunch of regional ones–and it doesn’t have to be one of these big national conferences. I’m trying to think of where all the Ruby ones–there’s PyTennessee, there’s PyOhio, a Python one in Orlando.

Max: And of course there’s plenty of technical meetups on whatever programming language you’re getting started with or are using in your work. And oftentimes those will have events that may not be focused on open-source sprints and contributing to code, live and in-person with other people, but they’re a great kicking-off point for meeting people who can help you through your first open-source contribution.

Craig: There’s probably someone there that’s done it before that can help.

Asking the question is probably the hardest part, and you may get more help than you wanted–you may have three people willing to help you right there, because generally people care about the community and what they are doing, and are happy to help grow it!

Max: For sure. Well, this has been freaking awesome. Thank you for joining us, Craig!

Craig: Yeah, thanks for having me.