He left Boulder for San Francisco to work in the data-intensive field of background checks for high-volume employers like Uber and Lyft.
Covered topics include:
- how 20% of Americans have a criminal record
- how Checkr handles the “false positive” problem
- Ben’s tips for optimally growing your engineering career
Max: Alright. Welcome, guys!
Today, we have the honor of being joined by Ben Jacobson of Checkr, where he’s an engineering manager. For people who don’t know you, do you mind sharing, relatively quickly, where you’ve been before joining Checkr about a year ago?
Ben: Yeah. I studied computer science at the University of Colorado in Boulder. And I graduated in 2010, and had been involved in the startup scene in Boulder for a few years during college. And then after college, kinda stuck around town. I joined an ad tech company. We were building some ads for content marketing and then I joined a competitive intelligence company. We were doing some data analysis. We were aggregating data for large corporations and then a lending company, and then I decided to move out here and joined Checkr which is a background check company.
Max: Got it. So, you’re not so much an “accidental engineer”. You’re probably the second guest we’ve had on that’s a computer science major.
Ben: Yeah. I would say that I’m a little bit of an accidental engineer. I’ve always liked computers and I liked web design. But it never clicked that that was a thing people did as a career. And then when I got into college, I originally applied as a journalism major. I went to go sign up for a computer science course because I just liked it and I thought it was interesting. They were like, “Oh, you can’t really take that unless you’re a computer science major.” So, I was like, “Well, I guess I’m switching to computer science.”
Ben: Yeah. So, it was kind of an accident, I suppose.
Max: So, for the people in the audience who haven’t heard of Checkr before like myself, do you mind sharing real quick what Checkr’s service offering is? You joined there a year ago, you’re an engineering manager there now.
Ben: Yup. Checkr at a high level is an API for background checks. So, background checks are these complicated kind of black box processes that no one really understands. We make an API that allows people to integrate background checks into their products, for example, a ridesharing company might need to onboard thousands of drivers a day and we do background checks for them. And they can do that within their app, within their flow, you know, everything. And we provide an API that lets them accomplish their checks the way they wanna do it.
Max: So, rideshare apps like an Uber or a Lyft might wanna know whether a prospective driver has a criminal record basically?
Ben: Yup, has a criminal record, has been driving 50 miles over the speed limit, you know, that kind of thing.
Max: Speeding tickets.
Ben: Speeding tickets, exactly, yeah.
Max: So, what kind of data coverage do you guys have and are you only in the United States at this point?
Ben: We’re mostly in the United States. We currently do some Canadian background checks, and we definitely intend to do international background checks at some point. But the majority of our customers are in the United States, running United States background checks. Yeah.
Max: What degree of technology is there behind this and what are the engineering problems that you guys are solving?
Ben: Yes. So, the engineering problems, a lot of… It’s easy to hear “API” and think fast. Like, you send us a POST request and we send you back a response about the person. And that’s not really how it works. Background checks are very slow and asynchronous. You send us, you know, a request for a background check for person A. We then kick off a lot of these processes where, okay, well, we need to go do a Social Security trace and we need to go get the history of addresses this person has lived at. And then once we have all the counties they’ve lived in, we need to go talk to all these different counties. Let’s say they’ve lived in 10 counties in the past 10 years; some might have digital records, some might have physical records.
For the ones that have physical records, we have to send people, literally, into the courthouses to look up records: stand in line, look up records, go into the back room, pull records, write down what they see, put it back into the system, and send it to us. And then we, you know, convert that into JSON and send it to our customers. That could take on the order of minutes, hours, days, sometimes weeks. We’re trying to make that faster, but weeks is possible.
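The asynchronous flow Ben describes can be pictured as a report that stays open until every county search has come back. The Python sketch below assumes exactly that and nothing more: `Report`, `create_report`, and `record_county_result` are hypothetical names, not Checkr’s API, and the county names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical model of a single background check request."""
    candidate_id: str
    pending: set[str] = field(default_factory=set)  # county searches still outstanding
    results: dict[str, list[str]] = field(default_factory=dict)  # county -> records found

    @property
    def status(self) -> str:
        # The report stays pending until every county search has returned.
        return "pending" if self.pending else "complete"


def create_report(candidate_id: str, counties: list[str]) -> Report:
    """Return immediately; the county searches are only queued, not performed."""
    return Report(candidate_id, pending=set(counties))


def record_county_result(report: Report, county: str, records: list[str]) -> None:
    """Called whenever one search finishes, which may be minutes or weeks later."""
    report.pending.discard(county)
    report.results[county] = records


report = create_report("candidate-123", ["County A", "County B"])
print(report.status)  # pending
record_county_result(report, "County B", [])
print(report.status)  # pending
record_county_result(report, "County A", [])
print(report.status)  # complete
```

Because the request returns before any search runs, the same model covers a county that answers in minutes and a courthouse visit that takes weeks.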
Max: Okay. So, I think a lot of our audience probably imagines at least entry level engineering jobs involve HTML, CSS, designing interfaces of software. But one of the interesting problems that I imagine you guys have is how do you uniquely identify a person. So, we send “Max Mautner” to the Checkr API. How do you guys figure out what counties to look at and that sort of thing?
Ben: Yup. So, we have some PII about you: your first name, last name, date of birth, Social Security number. We use a few services to figure out, okay, here is the address history of this person, to kinda compile the list. It’s kind of the same services that credit bureaus use. You know, if you’ve ever signed up for Comcast at a house, then you’ll start showing up on the list of addresses you’ve been associated with, and we can kind of assume that you’ve lived there. And so we gather these addresses and break them down by county. It’s complicated: some counties are harder to work with than others, and some are more digital than others.
Some are behind their own bureaucratic systems. Some counties go together, so they operate as a group of counties, and we have to say, “Well, if you’ve lived in multiple of these counties, we only wanna do one request, because we’re kinda working with a group system here.” Sending multiple requests wouldn’t be optimal. A lot of complicated logic goes into figuring out where to actually start gathering the data. And then the value that we provide is that, as we go, we look at this data. We get this record back. It could be you, but it might not be you as well. So let’s say we find a match on your name and, not your full date of birth, but your year of birth, like the year that you were born. The question then becomes, what is the probability that this is you or someone else? And you’d be surprised at how many other Maxes there are in the United States.
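The county-grouping rule amounts to deduplicating the address history by search target. Here is a minimal Python sketch, assuming a simple county-to-group lookup table; `COUNTY_GROUPS`, `plan_searches`, and the county names are hypothetical, and Ben notes the real logic is a lot more complicated.

```python
# Hypothetical mapping: counties that are searched through one shared system.
COUNTY_GROUPS = {
    "County A": "Group 1",
    "County B": "Group 1",
}


def plan_searches(counties: list[str], groups: dict[str, str]) -> list[str]:
    """Return one search target per county or county group, in first-seen order."""
    seen: set[str] = set()
    plan: list[str] = []
    for county in counties:
        # An ungrouped county is its own search target.
        target = groups.get(county, county)
        if target not in seen:
            seen.add(target)
            plan.append(target)
    return plan


# Counties derived from a candidate's address history (repeats are possible).
history = ["County A", "County C", "County B", "County A"]
print(plan_searches(history, COUNTY_GROUPS))  # ['Group 1', 'County C']
```

Living in County A and County B produces a single request to their shared group, and living at two addresses in the same county produces a single request too.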
Max: I’m sure there are many false positives…
Max: …where it looks like I have a criminal record even though I don’t.
Ben: Yeah, totally. And you’d be surprised at how many other Maxes were born on the exact date that you were born, or have the last four digits of your Social Security number. And so it becomes harder. Then it becomes a probability game of, you know, based on 350 million people in the United States, what is the probability that this record ends up being your record? And we have this model built where we get data from the record and data from the candidate, and ask how they all match and what the probability is that this is an exact, perfect match.
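Ben doesn’t spell out Checkr’s matching model, so the sketch below swaps in a standard record-linkage technique that answers the same question: Fellegi-Sunter match weights. For each compared field, m is the probability the field agrees when the record really is the candidate, and u is the probability it agrees by coincidence; an agreeing field adds log2(m/u) to the score and a disagreeing one adds log2((1 - m)/(1 - u)). The field list, every m/u value, and the prior are made-up assumptions for illustration.

```python
import math

# Illustrative (made-up) probabilities per field:
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.001),
    "birth_year": (0.98, 1 / 80),
    "birth_month_day": (0.97, 1 / 365),
    "ssn_last4": (0.99, 1 / 10_000),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum the Fellegi-Sunter log2 weights of the fields that were compared."""
    total = 0.0
    for field_name, agrees in agreements.items():
        m, u = FIELD_PROBS[field_name]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight: float, prior: float) -> float:
    """Turn a weight into P(same person), given a prior probability of a match."""
    odds = prior / (1 - prior) * 2 ** weight
    return odds / (1 + odds)


# A court record matching on name and birth year, but not on month and day.
partial = {"first_name": True, "last_name": True,
           "birth_year": True, "birth_month_day": False}
full = {**partial, "birth_month_day": True, "ssn_last4": True}
prior = 1e-6  # assumed base rate of a record belonging to this candidate
print(match_probability(match_weight(partial), prior))  # well below 0.5
print(match_probability(match_weight(full), prior))  # very close to 1.0
```

Fellegi-Sunter treats fields as independent given match status, which is why birth year and month/day are scored as separate fields instead of adding a full date of birth on top of the year.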
Max: And god forbid, somebody changes their name.
Ben: Yeah, totally, yes. So, yeah, and that’s a whole other issue altogether.
Max: And a common occurrence when people get married.
Ben: Absolutely or, you know, they go by aliases. Let’s say you go by your middle name and not your first name. Well, how can we figure that out and how can we be more strict on the matches then.
Max: So, the service Checkr provides, where you work, is relatively, or even radically, different from what the other companies you’ve worked for do. So, where were you right before Checkr, and what did you do there?
Ben: Yeah. I was at a consumer lending company, Confident Financial. It was similar in the sense that we had an applicant apply for credit and we had to run, not a background check, but a credit check and kinda understand the applicant. Are they who they say they are? You know, if we’re gonna wire them $5,000, we need to make sure that they are who they [say they] are. We need to look up their history, their credit history. Do they have any outstanding loans, stuff like that. So, it’s very similar, and it’s part of the reason I like Checkr: it’s in this highly regulated industry, so we can be innovative, but we have to do it in a way that’s compliant. How can we figure out ways to be optimal in this industry that is very old and where the laws haven’t caught up? It’s an interesting challenge beyond just the technology.
Max: I believe it.
I know even before Confident Financial, you had the good fortune of landing an internship while at the University of Colorado at Boulder, and this was a long time ago, probably 10 years. Do you mind sharing about that–how you landed that first internship experience as an undergraduate student, and what that was like, and how it encouraged you to step into the career you’re in now?
Ben: Yeah, totally. I was walking through the career fair at the University of Colorado Boulder, and a local company named Rally Software was presenting. This was in maybe 2007, 2008, and they were one of the early agile companies: they were building project management software for this thing called “agile”, which at the time was pretty new. It was the idea of sprints, where you don’t do this huge waterfall thing, you work in little chunks. So, it seemed like a very interesting company. I went there for a summer and interned, working on their large Java app, which was very complicated and had a lot of business logic. It was the first time I had been exposed to a large code base like that. I learned a lot, definitely.
Max: But you were telling me about how you encountered Ruby for the first time.
Ben: Yes. So, while I was at this company, I was tasked with a kind of side project inside the company to manage their internal resources. It was a ground-up application, so I was looking for frameworks to build it out with. And this framework, Ruby on Rails, had just come out. This was in 2007; I think it had been out for a year. People had been talking about it a little bit. I was like, “Okay, you know, I’ll play with this.” And I distinctly remember starting to play with it, like running a migration, which was a fairly new concept even though everyone had felt the pain of schema changes before, and using an ORM. And it was like, “Whoa, guys, we should look at this framework.” And of course this company is this giant Java company, and they’re like, “Uh, I don’t think… That’s cute, but, you know, we’re not gonna pay attention to that.”
Max: And this is an agile shop, too.
Ben: This is an agile shop, yeah. And I was like, “I don’t know. I think this is the future.” And so, you know, I spent basically the next 10 years of my career going out of my way to figure out how I could use Ruby and Rails, because it’s the most fun thing to work with.
Max: Do you still use it today at Checkr?
Ben: Yup. Checkr is a Ruby shop. Technically, we don’t use Rails. It’s a Ruby shop, though. We use Active Record, which is, I think, basically 80% of Rails.
Max: Oh, and it’s Ruby.
Ben: Yeah, and it’s Ruby. So, our backends are all in Ruby, and our frontends are in Angular, and we’re kinda making that slow transition to React.
Max: So, one thing I wanted to ask you about, since you’ve held a number of individual contributor roles as an engineer and now a manager role: when you’ve joined companies in the past, whether Checkr now, Confident Financial, or the companies before that, what was it like joining companies at different stages in their lives, whether in terms of the number of customers, revenue, or number of employees?
Ben: I think that it’s just a different set of challenges. When you join a company that’s really small and you’re just trying to figure out your customers, the challenges are “how can we build technology as fast as possible to figure out if this even makes sense, and this customer will like it.” And maybe that will get us another customer and another one. And it’s like all about iteration and making things work quickly.
The next phase you have some customers and they use you, and then you have more customers, and then that phase is like “how can we keep up with this?” “How can we make sure that this stuff that we duct taped together to get this customer is actually gonna end up working when we have a thousand of these customers?”
And it’s kind of like taking the duct tape and making it better. And then now it’s like, okay, we’ve made this really nice ball of duct tape, it’s working pretty well, and it’s serving this industry really well. How can we now go find a whole new industry and serve it really well? It might take a little bit of work on that ball of duct tape to change it for this new industry. How can we make sure that we don’t, you know, ruin the ball of duct tape that we have for this industry and still serve this other industry? So, it’s really changing the end goal and the thing that you’re trying to accomplish.
Max: So, you were telling me that when you first joined Checkr about a year ago, moving from Boulder to San Francisco, Checkr was at a point where its business was primarily on-demand API calls. And since you’ve joined, the business has transitioned more into enterprise clients that have, I would guess, a very large volume of background checks that need to be run. What has that change looked like? Has it entailed you, as an engineer, interacting with your internal sales team, for example, as they try to figure out how to service enterprise clients?
Ben: Yeah. It’s a lot of collaboration within the company. Initially, when we were serving on-demand companies, we had API docs, and these companies had engineering teams that could read the docs and, you know, figure out what to do. We’re all engineers, so we can be like, “Oh, we should add this endpoint, it makes sense. Let’s do it.” It’s very simple to build an API because we’re talking the same language. As we go to more enterprise companies, it’s less about our API and more about, “Well, do you integrate with our internal HR system?” And it’s like, “Well, not yet, but let’s figure out how we can do that,” so that, you know, if they’re moving massive numbers of candidates through their pipeline, we can make using Checkr as seamless as possible.
So, it’s gone from a lot of people integrating our API to how we can integrate other people’s APIs into ours, to make it as seamless as possible for people to use us and get started. We do that with integrations into, for example, Greenhouse. If you’re moving a candidate through the pipeline, now we can be part of that pipeline right inside Greenhouse.
Max: So, once you guys are integrated with pipelines like applicant tracking systems (ATS) such as Greenhouse, you probably handle a very large volume of requests. You shared with me a statistic, unrelated to applications or the number of API calls you handle, about what share of the U.S. population has a criminal record. Do you mind sharing that statistic again?
Ben: Yeah. So, it’s about 70 million people. So that’s a lot of people in the U.S. that have a criminal record. And I would say that while a lot of people have records, one of Checkr’s missions is to be fair and accurate. So, just because you have a criminal record doesn’t mean like you don’t deserve the job.
Max: Oh, sure. Just to put that in additional perspective, there are 350 million people in the U.S., so 70 divided by 350 means approximately 20%: 1 in 5 Americans has a criminal record of some sort or another.
Ben: Yeah. It’s a lot of people, right? And so what Checkr does, on top of just providing the highest-quality data that we can to our customers, is when we find, let’s say, a record on someone’s background, we run this analysis, point people to resources, and say, “Hey, you should reach out to this group. They might be able to work with you to get this record expunged and removed from your background.” Or, we just started working on features where, if you get adverse-actioned, which basically means being declined for a job because of something on your background, we can look at the thing that adverse-actioned you and say, “Okay, well, you didn’t qualify for this on-demand company, but based on our data, you could apply to these other ones, and it looks like you might have better luck there.” And so it’s helping people find jobs proactively instead of, you know, thinking “Something’s on my background, I can’t do anything,” or getting frustrated about it because it’s this black box system.
Max: Does Checkr dogfood its own product? Meaning, when you interviewed for the job, did they run a background check on you?
Ben: Yes, we do dogfood our own products, yeah. And I distinctly remember going through that process, going, “Man, I’ve got some work cut out for me.” Because, like I said, it was a lot of duct tape, and now I think we’ve made a lot of strides in making it a better process all around.
Max: For both the applicants, the clients…
Ben: Applicants, clients, partners, everyone. And internal customers like our internal users of the system to do investigations and disputes and stuff like that. Our products have gotten a lot better over the past year.
Max: Awesome. So, taking a turn away from Checkr, a large portion of our audience are people who are in the first five years of their career or may not have even landed their first full-time job. So, as a hiring manager, and as a person who’s probably been interviewing software engineering candidates for a number of years, what advice do you have for them? What are some common missteps, maybe?
Ben: That’s a good question. I would say always be learning. I’ve been doing software for 10 years, and I’m not even close to knowing a fraction of what I need to know about the industry or how things work. So, I would say always be learning. If you’re trying to find the first company to go to in your career, make sure it’s one that you can learn from, technically and business-wise. Question why decisions are being made by the business, and understand the impact the technology you’re building is going to have on the business. So, I would say position yourself to learn as much as possible throughout your career, always.
Max: That makes a lot of sense.
Max: I remember when the job title of data scientist became a thing, only maybe five years ago. It took off very rapidly, and one thing I’m curious about with regard to Checkr, and Confident Financial when you were there, both businesses that are rich with data, is whether you guys hire data scientists and how that career is distinct from engineering at Checkr.
Ben: Yeah. So, at Checkr, we don’t have a data scientist yet. We haven’t made a hire for that specific role, but we just started working on our data team. So, in the past year, we’ve made a lot of strides in liberating our data. So, for a long time… So let’s say you’re starting to build an application and it’s a Rails application, right? And you put all of your data in MySQL, and then you’re like, “Okay, I need to start dumping data into S3 for this other purpose,” and so you start dumping data into S3. And then let’s say some other reason comes along and you say, “Well, we need to store documents in Mongo,” so you introduce Mongo, and now you have, you know, MySQL, Mongo, S3, and then you’re using analytics services and those events are going to other places.
So, the first step is to really build the data infrastructure to bring it all in one place, and that’s something that we’ve just recently done within the last three months or so, is allowing ourselves to really create, like cross-join the data and understand it more. And now that we have that infrastructure in place, now we can start having the conversation of like what is a data scientist? What are the goals of that person when they come into the organization, and do they have the tools necessary to do the job? Because if you hire someone and they don’t have the tools to do the job, then it’s like they’re not gonna…yeah, it’s gonna be bad, right? So, you need to make sure that someone is set up for success in order to do that.
And setting up the data infrastructure is a very different job than doing analysis of the data. So, we are just now at a point where we have kind of our data in one place, in a Presto database. And we are currently in the process of exposing it to other parts of the organization. So that people that aren’t engineers–we’re doing these workshops where we teach people SQL, so people can start answering their own questions or scratch their own type of itch or curiosity or understand the data themselves.
That’s not a data scientist but it’s kind of the first step for us to get people excited about what questions they can ask.
Max: Yeah, for sure. So, for members of our audience who didn’t know about this before, one possible career track is data engineering, as opposed to data science. At a lot of companies, even ones as mature as Checkr, hiring a data scientist is potentially putting the cart before the horse, so to speak.
Ben: Yeah, exactly. I think if you’re a data scientist that can also roll up your sleeves and do the data engineering to get to the data, you’re very valuable. But oftentimes, companies don’t really know that they need to get the data in one place first, which is a mistake I think companies often make.
Max: I don’t have any other questions off the top of my head. I mean these anecdotes and anecdata are super helpful. I think we’re definitely gonna do this again.
Ben: Yes, totally.
Max: This has been extremely illuminating. One thing I’d like to emphasize for our audience is just how much data you guys have at a place like Checkr. You have customers that are very large and receive very high volumes of customers themselves, particularly large rideshare companies, and that is only one of your customers. So the opportunities available to Checkr as a business, whether monetizing in some form or providing feedback in some form from the data you’ve aggregated across all of your customers, are tremendously greater, an order of magnitude more, than what an individual customer of yours can do with the data you’ve given them.
Ben: Yup, exactly.
Max: With that, do you wanna plug any roles that you guys are hiring for?
Ben: Well, I’m an engineering manager so I would say engineering roles, you know, of all kinds including data science.
Max: Specifically, what roles are the engineering team hiring for?
Ben: Yes. So, we hire generalists basically. I don’t wanna say this term full stack because I think that’s…
Max: A loaded term.
Ben: A loaded term. But we hire backend and frontend engineers. Our frontend work has historically been light, but it’s become more and more of a priority as we move into the enterprise. And our backend work is… I mean, like I said earlier, we have this giant state machine that’s trying to figure out what counties to talk to, what, you know, services they need to parse, and stuff like that, so there’s always work to do there. So, any engineering role, yeah.
Max: You mentioned Presto, but do you mind sharing a little bit more about what technologies and frameworks and programming languages you guys use in your stack?
Ben: Yeah. So, our internal dashboard for customers and our internal teams is a legacy Angular application, which we are slowly decomposing into more modern React-type applications. And then our backend, while not Rails, uses Active Record and Ruby; technically, we use Sinatra for a lot of our lightweight endpoints. And then we are just starting to experiment with Go and Protocol Buffers for our internal services. They’ve mostly been REST, and now we’re trying to move them over to gRPC and stuff like that, which is more efficient and, you know, easier to scale in the long term. And then for our data, we do a lot of MySQL, Mongo, and direct dumps to S3 to archive data. So, I think it’s a pretty modern stack.
Ben: Yeah, definitely.
Max: And some pretty traditional robust database software.
Ben: Yup. And then we have a DevOps team, which is responsible for the deployment pipeline, and we are working on open source tools that sit on top of Kubernetes and allow engineers to write a Dockerfile, ship it to the service, and have it deployed for them. So, we’re kind of open sourcing that, and it’s been a really great way for our engineering team to increase velocity.
Max: I believe it.
Max: Well, thanks for joining us, Ben.
Ben: Yeah, cool.
Max: We’ll talk again pretty dang soon.
Ben: Cool, thank you.