Patrick Wong shares his story of becoming a Data Scientist at Glassdoor.

We cover:

  • Patrick’s path from economics undergrad at UC Santa Cruz to working with Glassdoor’s novel salary data and employer reviews
  • how Patrick learned programming in preparation for his first data science role
  • some juicy stats about the job market (circa Fall 2017)

For the full text transcript see below the fold:




Max: Welcome! We have the good fortune of being joined today by Patrick Wong. Patrick is a data scientist at Glassdoor.

For those of you guys who have not heard of them they are a jobs-listing site with ratings and reviews of employers as well as salary data.

Patrick, do you mind introducing real quick the types of things that you work on at Glassdoor?

Patrick: Sure. I sit within our data science team right now, and specifically, within our economics research group.

What we have is a lot of data about job seekers, about employers, and what we try to do is take all that data and understand what’s happening in the labor market.

Max: So, Patrick has an interesting story of how he came to be in this role as a data scientist, which is that you did your bachelor’s degree in economics at UC Santa Cruz and for a big portion of your early career you spent it in roles of being an analyst.

Do you mind sharing for our audience what kind of path you took from college graduation to joining Glassdoor as a data scientist?

Patrick: When I started out at my first job, I was analyzing survey data, sifting through it, trying to find some insights there.

It was similar to what I was doing in school, which was taking data, applying some statistics, and trying to figure out, “Okay, what’s going on here?” As I continued on, I found that what I was doing a lot of the time was manual reporting.

It’s a mundane task. Nobody wants to do it but somebody has to do it, right? While I love to do the analytics portion of that I spent a lot of time doing the reporting.

What I ended up realizing was, “Hey, actually a lot of this can be automated.” I started writing macros in Excel to start copying and pasting things and applying formulas here and there. It cut big jobs that took hours down to a minute, and that was really cool.

That showed me the power of programming. It was super relevant to me, right? That’s when I started trying to learn it on my own and that’s when I learned about data science.

Max: So you hadn’t done much in the way of VBA macros or programming as an undergrad in economics, right? Or did you have some base level where you were like, “Oh I realized everything I’m doing I can do much faster if I automated it?”

Patrick: Yeah, definitely. As an undergrad I had exposure–I knew what they were but I hadn’t really used them for anything. Then I got to the point where I thought, “Hey, actually I know of this tool, I should just learn how to use it.” So I just googled how to do it and it was super helpful.

Max: Your employer at the time, where you spent nearly four years before joining Glassdoor–did they provide any formal training for you as an employee, or was all of this kind of self-taught?

Patrick: It was more self-taught. I was spending so many hours doing that, that I thought I was gonna go mad if I just kept doing it the way that I was.

Max: So Excel and VBA and automation and spreadsheet tools were something you picked up while you were there.

What changed when you joined Glassdoor? You initially joined Glassdoor as an analyst before re-applying as a data scientist.

Patrick: I had taken a lot of courses on data science at my previous job, just during my nights and weekends. I wanted to do more advanced analytics in addition to automating more stuff.

That really caught my interest. I started spending more time learning Python, getting better at R, and just learning more techniques within the field like machine learning, regressions, and stuff like that.

Max: So the data science team that you work on at Glassdoor does economic type of data science research. For our audience, what types of research does that entail?

Patrick: One example of something that I did earlier this year was looking at what factors predict your overall satisfaction with your employer.

We collect ratings not just on overall satisfaction on where you work, but things like work-life balance, senior leadership, culture and values.

Something that I did was looked at all these different sub-factors and tried to understand which of these are the most important in predicting high satisfaction for an employer.

And what we found was that culture and values is by far the most important predictor for being happy at work.
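
To make the idea of ranking sub-factors concrete, here is a minimal sketch. It is not the method Patrick's team used, which the interview does not describe; it simply ranks each sub-rating by its Pearson correlation with the overall rating. The review rows and the field names (`culture_values`, `work_life_balance`, `senior_leadership`) are made up for illustration.

```python
from math import sqrt

# Hypothetical reviews on a 1-5 scale. These numbers are invented for the
# example and are NOT Glassdoor data.
reviews = [
    {"overall": 5, "culture_values": 5, "work_life_balance": 3, "senior_leadership": 4},
    {"overall": 4, "culture_values": 4, "work_life_balance": 4, "senior_leadership": 3},
    {"overall": 2, "culture_values": 2, "work_life_balance": 3, "senior_leadership": 3},
    {"overall": 1, "culture_values": 1, "work_life_balance": 2, "senior_leadership": 2},
    {"overall": 3, "culture_values": 3, "work_life_balance": 4, "senior_leadership": 2},
    {"overall": 4, "culture_values": 5, "work_life_balance": 2, "senior_leadership": 4},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_factors(rows, target="overall"):
    """Return (factor, correlation) pairs, strongest correlation first."""
    target_values = [row[target] for row in rows]
    factors = [name for name in rows[0] if name != target]
    scores = {
        name: pearson([row[name] for row in rows], target_values)
        for name in factors
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for factor, score in rank_factors(reviews):
        print(f"{factor}: {score:.2f}")
```

A real study would more likely fit a regression or another model on many thousands of reviews and control for other variables; correlation is used here only because it is the simplest way to show the shape of the question.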

Max: One of the things I’m curious about with regards to Glassdoor’s data set of ratings of employers is the selection bias problem. How do you on the economics team at Glassdoor account for selection bias?

Were the people most likely to write reviews of their employers and give you guys a dataset perhaps the most enthusiastic supporters of the employer or the least enthusiastic employees?

Patrick: So actually, when we look at the distribution of ratings we see that it’s pretty normal. It’s not just all 1’s and all 5’s. Those are the min and max for the ratings.

We actually have some research coming out that will go more into depth on this and will prove that we have reviews coming from all types of people, whether they’re super unhappy or super happy or in the middle.

Max: So does your guys’ dataset also include or take into account where employees are at in their employment with their employer?

For example, there’s a number of services and startups in the space of OKR reporting, where you track employees’ satisfaction on a weekly basis or their performance on a weekly basis. Do you guys have more granular data about where employees are at when they write their reviews?

Patrick: Not so much, but one thing that I can say that is a proxy about where people are in their career is in that study that I mentioned about what makes people happy at work–we also did that across different income levels.

We are able to track as you earn more money–probably later in your career–what starts to matter to you more. And one thing that we found was that as you earn more, you care more about senior leadership. You’re working probably higher up so you’re closer to the senior leadership. That matters more to you.

Max: So you’re pickier about who your senior leadership is as an employee as you get further into your career?

Patrick: I don’t know if you’re more picky but you’re gonna have a lot more exposure to senior leadership, right?

Max: Ah, so you care more.

Patrick: So that’s gonna matter a lot more to you.

Max: Got it, got it.

For people who are curious about what kinds of tools you guys use on the data science team that you had to learn to get your data scientist role, what tools do you use?

You mentioned learning Python and learning from data science courses on your nights and weekends, but do you mind getting specific?

Patrick: So the tools that I’m currently using, and that a lot of other data scientists at Glassdoor use, start with one programming language, either Python or R. It’s pretty split between the two right now.

We use Tableau for visualization and for reporting. And then, of course, SQL skills, knowing some skill or some language of how to pull data from a database.

Max: For a lot of people who don’t know the history or the background, what is kind of the breakdown of who chooses to use R versus who chooses to use Python?

How did that schism happen, or where does that divider arise from at a place like Glassdoor or Art?

Patrick: Yeah. I think it really just depends on an individual’s background, right? I came from an economics background and for us we learned R. We didn’t learn any Python.

I think the same goes for people who studied statistics in their undergrad. For Python, I think a lot of people who come from more traditional CS backgrounds are more familiar with it, but it’s really just dependent on that. I’ve met people from all sorts of backgrounds and it seems to change here and there.

Max: And the economics curriculum hasn’t changed much. I remember the software taught in those courses, like Stata and SAS, and both of them are pretty good intros if you transition to using R. Python is a bit more of a programming language than R, which is more of an integrated development environment, I guess.

But Python has mimicked R in that they’ve created Jupyter notebooks or IPython notebooks. Do you guys use those?

Patrick: Yeah, again, it depends on the individual. Some people don’t really like them. I myself really like the format.

For those who don’t know, it’s a format of Python code where you have different cells and you can put in Python code or markdown. You can make it look nice. For me it’s really useful for analysis–to be able to pull in my data and outline my process in this notebook.

Max: One of the crazy things about living in the San Francisco Bay area is that I was at the rock climbing gym and I saw Fernando Perez at the rock climbing gym. He’s the creator of IPython, now Jupyter.

Patrick: Of course, wacko.

Max: I was like, “What??” But speaking of locations and geographies, one huge dimension to the job market is geographic location.

Is that a topic that the data science group at Glassdoor studies or that you guys care about on the jobs listings front?

Patrick: Well, we definitely look at location in all sorts of areas.

When we do research on who’s making what, where, we do take that into consideration. One thing that we do that really highlights that is something called our monthly local pay reports.

It’s a local pay report for 10 specific metro areas in the U.S., including San Francisco, Seattle, and New York. It’s a listing of over 80 different jobs, what the median salary is, what the trend is for those wages, and also across different industries and employer sizes.
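
As a rough illustration of the core number in a report like this, the sketch below groups salary reports by metro area and job title and takes the median of each group. Patrick mentions later that the real reports come out of a model, so this plain grouped median is a simplification, and every record in `salary_reports` is invented for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (metro, job title, annual salary) reports. Invented values,
# not Glassdoor data.
salary_reports = [
    ("San Francisco", "Recruiter", 70000),
    ("San Francisco", "Recruiter", 80000),
    ("San Francisco", "Recruiter", 90000),
    ("Seattle", "Recruiter", 60000),
    ("Seattle", "Recruiter", 70000),
    ("Seattle", "Data Scientist", 120000),
]


def median_pay_by_metro_and_job(reports):
    """Map each (metro, job title) pair to the median of its reported salaries."""
    grouped = defaultdict(list)
    for metro, job_title, salary in reports:
        grouped[(metro, job_title)].append(salary)
    return {key: median(salaries) for key, salaries in grouped.items()}


if __name__ == "__main__":
    medians = median_pay_by_metro_and_job(salary_reports)
    for (metro, job_title), pay in sorted(medians.items()):
        print(f"{metro} / {job_title}: {pay:,.0f}")
```

The median is used rather than the mean because a handful of very high or very low reports would otherwise pull the figure around from month to month.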

Max: Is there much volatility in those numbers? Or can people rely on them on a pretty real-time basis?

For example, one of the tools that recruiters and employers use to help them determine how much to offer their candidates is to pay for reports like the type that you guys are making public.

What kinds of guarantees… I mean it’s a free public report but do you guys have a very large sample size? How recent is the data that you guys publish in these reports?

Patrick: Yeah, when we publish it at the end of every month that is all the data that we’ve received that month–all the salary reports and that’s thousands and thousands of reports.

When we feed that into our model that produces all these salaries we get a pretty consistent idea of what they should be and how they’ve been trending.

And in terms of volatility, of course, month-to-month, we shouldn’t expect very much and that’s what we see.

But year-over-year there are some interesting trends that arise. Something, for example, is we’ve seen pay for recruiters increasing a lot lately.

Max: Interesting.

Patrick: The labor market is pretty hot right now, unemployment is super low, and employers are having a hard time finding people, so it just puts more pressure on them to pay for good recruiters to find people who are not only actively looking but who are passive job seekers.

Max: I’m guessing that those budgets for hiring recruiters are also budgets towards paying for jobs listings.

Patrick: Yeah, that’s right.

Max: Is this a phenomenon that you guys can look back far in your data sets to periods where the economy and job market was not as hot and see how the trend line for recruiter compensation has changed through economic ups and downs?

Patrick: Yeah. Glassdoor has been around for about 10 years, so we have data going back that far. We have quite a bit of history there with our local pay reports. We look at just maybe a year out and that alone is enough insight to tell us a pretty good story.

Max: Are there different metro areas that people should be aware of as being great places to move to, if perhaps they’ve just graduated college, haven’t lined up a job, and are worried about where they can find employers?

Patrick: I think it definitely depends on what you do.

Max: Sure. If we were focusing on the audience that are people who might have unconventional backgrounds and are looking to get into engineering or data science, what metro area is it in that case?

Patrick: Definitely San Francisco, Seattle, what you would expect! And in those areas–as employers are hard pressed to find engineers–if you’re not someone who comes from a traditional computer science background, that’s a perfect place to try to find an entry level job.

Max: For our audience that maybe missed it earlier in the interview, you spent 4 years as an analyst in marketing, customer acquisition and product usage. You were describing using Excel, querying and grabbing data from different data sources to drive business decisions.

Then you left to join Glassdoor as an analyst as well, but most recently you reapplied for an internal role as a data scientist. So for our audience that might not have the bona fides or the undergrad CS degree, there’s absolutely paths–accidental paths–to finding your way to more engineering-focused roles.

One thing we have focused on and have talked about previously with other Accidental Engineer guests is the self-teaching process.

So, for people who are having problems with motivation or maybe even not knowing what to learn, do you have any advice you can share about maybe your own experience acquiring your VBA skills, even when you were on the job?

Patrick: Yeah, for me, it’s really about picking something that you want to do, and then acquiring the skills that you need to be able to do that.

For me it was about automating a really mundane task. And then later it became, “Okay, I wanna be able to try an algorithm to try to predict something like ‘who the best teams in the NBA are gonna be this season?’”

So it really just depends on what you’re interested in, right? If you’re not interested in what a random forest model can do for you, then, you know, look elsewhere! Data science is super broad. I mean engineering is super broad, right? So just pick something that you want to do.

Max: Before we started recording, you were talking about how–like you just said–data science is a super broad field. There are a lot of people in the field who have had or are coming from 6+ years in Ph.D. programs and that the diversity of things you might work on as a data scientist are huge at this point in time and the job title has not differentiated for things that people are working on.

So, for our audience that is curious about what you do at Glassdoor in contrast to what someone who spent six years in a Ph.D. program is doing? Can you speak to what those contrasts might be, maybe even what tools you guys use differently?

Patrick: Between like a Ph.D. and someone who comes from a traditional, like, analytics background?

Max: Correct, yeah.

Patrick: From an analyst’s standpoint, I would say the main contrast there is just that analysts work with the business. They help people within a business make decisions. So the typical tools are just pulling data, using Excel, formatting it.

For a Ph.D–someone who has a lot of research experience–they’re gonna be knowledgeable about probably R, maybe Python, and they’re gonna have a really good sense of how to dig into a broad and unstructured problem. Versus an analyst who knows what they’re looking for already, they’re trying to answer a question that’s basically given to them.

Max: Got it, got it.

Patrick: One thing that I was gonna mention for people who are trying to get into data science is that I usually tell people to just try to get their foot in the door as an analyst.

You don’t have to start out as a data scientist. Like I said, the “data scientist” role is really broad. A lot of people have described it as the combination of business acumen, statistics, and computer science, and it’s really hard to get all those things right out of school, right?

So for me, coming into the field as an analyst, I feel like I’m really strong on the business side and that really helped me. For someone who is trying to get into that role, it may be more difficult to present yourself as, “Hey, I have all these skills,” without having the background to show for it.

I would say just get your foot in the door and find somebody who’s willing to mentor you and teach you, giving you the opportunity to progress towards that career.

Max: Something else that I find really in common between both of our careers is that one of my early jobs was as an analyst and–like you described–employers don’t really offer official, formal training for skills that can help you on the job.

But by all means, you should be teaching yourself these skills, and knowing what skills to learn is easier once you have a job because your job and your employer can give you direction about what it is that they really care about.

Patrick: Absolutely.

Max: Once you get your foot in the door you have an advantage over candidates who have spent six-plus years in Ph.D. programs by knowing what it is that the business cares about from a skills perspective, from an output perspective.

Like Patrick says, great advice is to get that first job and figure out after-the-fact what skills you need to develop. This has been extremely educational. We should plug that Glassdoor is hiring and actively growing. Are there roles on your team?

Patrick: Yeah, we’re hiring on all teams. We definitely need more data scientists. We hire not only in Mill Valley (CA), but we also have an office in San Francisco. So for those who can’t make it up there, that’s a really great option.

Max: Okay. And for people who might not be quite yet prepared to apply: what are the skills that you guys are looking for in particular, maybe specific to analytics and data science?

Patrick: Yeah, definitely a programming language like R or Python, some language to be able to pull data, and then experience working with data and producing results with that.

Max: Sweet, cool. Well, this has been awesome. Thanks for joining us, Patrick. I’m hoping we’ll get a chance to do it again soon!

Patrick: All right, thank you so much!