How to Be a Data Scientist: An Interview with Dr. Pete Meyers of Moz
Here at WordStream, we occasionally do a “Make a Wish” exercise where everyone gets to make a few (work-related) wishes and we try our best to make them a reality. Recently, I wished to hire a data scientist for the content marketing team. And lo and behold, my wish has been granted! We’re planning to hire for a new data scientist role in early 2015.
Peter Meyers is a cognitive psychologist and resident Marketing Scientist at Moz. His latest obsession is hunting the algorithm to find out what makes Google tick.
Moz is all about TAGFEE, the T in which stands for “transparency.” What does transparency mean to you, in terms of content marketing and data sharing? (This word always makes me think of the old joke about the guy who goes to see a psychiatrist wearing only plastic wrap.)
How many psychiatrists does it take to change a light-bulb? One, but the light-bulb has to *want* to change.
I see data transparency much like it gets treated in science (done right, which it too often isn’t) – you try to be clear about your methodology, you use measurements that are as unbiased as possible, and you don’t present data (especially visually) in a way that skews it to your own agenda. When we put out data, I try to be clear about how we got it and to present it in a way that lets people draw their own conclusions. As a marketer, obviously there’s a storytelling component, but I think there’s a difference between bringing someone along on a journey and blindfolding them and throwing them in your trunk.
How did you come up with the idea for MozCast? How does it fit into your content marketing strategy? (For example: do people have it bookmarked and check it every day, or does it bring you net-new traffic through search?) How has MozCast evolved since you first conceived it?
MozCast really came out of the Algorithm History project. Google revealed that there were 500+ algorithm updates in 2010 (and that number keeps increasing), and yet we only named a handful. So, I became obsessed with the idea that maybe we could measure day-over-day change in the SERPs. In many ways, that turned out to be a lot harder than I thought, and I learned just how dynamic search results really are.
MozCast has powered maybe a dozen blog posts, dozens of Google+ posts, hundreds of tweets, a handful of presentations (one that topped 100K views on Slideshare), and the site just topped a quarter-million visits in 2014. It’s been far more successful than I’d have ever expected.
The project really took on a life of its own. It turns out that algo flux is a tough problem (too much noise, not enough signal), but studying SERPs got me interested in tracking features, discovering Google tests in progress, and generally trying to use SERP data to understand where Google is headed. MozCast is essentially four distinct research tools now, two of which are public.
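To make the "day-over-day change in the SERPs" idea concrete, here is a minimal sketch in Python of how one *could* score daily SERP flux. This is my own hypothetical illustration, not MozCast's actual formula – the function names, the top-10 depth, and the penalty for a URL dropping out entirely are all assumptions:

```python
def serp_delta(yesterday, today, depth=10):
    """Rough per-keyword change score between two ranked URL lists.

    A URL that shifts position adds the number of spots it moved;
    a URL that drops out of the top results entirely adds the full depth.
    """
    score = 0.0
    top_today = today[:depth]
    for rank, url in enumerate(yesterday[:depth]):
        if url in top_today:
            score += abs(rank - top_today.index(url))  # positions moved
        else:
            score += depth  # fell out of the top results entirely
    return score / depth


def daily_flux(serps_yesterday, serps_today):
    """Average the per-keyword change scores across the tracked keyword set."""
    deltas = [serp_delta(serps_yesterday[kw], serps_today[kw])
              for kw in serps_yesterday if kw in serps_today]
    return sum(deltas) / len(deltas)
```

A score of 0 means identical results; higher numbers mean a more turbulent day – the kind of aggregate a "weather report" could then rescale into a temperature.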
The weather report according to MozCast
What did it take, from an investment perspective, to get a project like MozCast up and running? How can smaller businesses with lower budgets use data to their advantage for marketing?
The first month of MozCast cost us $9.95 for hosting and $21.99 for proxy IPs, for a total of about $32. I built the prototype myself, starting with only 50 keywords, and used that proof-of-concept to sell Moz on the project. I think that’s the key – show, don’t tell. Had I explained the idea, a bunch of people would’ve said “Hey, that sounds cool,” and then we would’ve all moved on with our lives. By investing my own time and energy up front on the minimum viable version, the idea sold itself.
Of course, eventually our design team took over the front-end and the site moved to internal servers, but I still build most of the back-end. In terms of time spent, it obviously costs a lot more than $32/month now, but we’ve been able to scale that investment along with the returns.
Talk to me like I’m a third grader about data visualization: What are your favorite tools? Do you absolutely need a designer?
Pictures speak louder than numbers, and no one wants to stare at a giant pile of numbers. I’m a big believer in using the simplest visualization possible – I’d say 90% of what I do is just bar charts. I don’t tend to get fancy with the tools, because I think that sometimes that gets in your way – focus on the story, and “draw” that story. If you can do that with Crayolas and dinosaurs, awesome.
Data viz from a recent Moz post
What about tools for data analysis? Do you turn to specific tool(s) (Excel, Access, SAS, Stata, R?) to guide you when you’re digging into a data set or do you shoot from the hip?
I’m a “whatever works” kind of guy, so I do a lot in Excel or code it straight into PHP (when I need to use SQL data). Our data science team will use tools like SAS when we need to do more advanced statistical analysis, and they use R from time to time. They’re Python people, and mock me for using PHP. Most of my work isn’t technically “big” data, so I find simpler tools work fine. We have a Big Data team at Moz, but they work with things like the link graph, where you’re talking about billions of data points. Our Data Science team is more inclined to work on modeling and “messy” problems, like detecting spam or scoring content.
I loved your session at MozCon about generating ideas. Can you provide some tips for finding a content niche and using data to help tell your story? Do you come up with ideas first, then look for data to support it, or the other way around?
Finding your niche is really tough. I’m going to say something controversial – I really don’t believe in “passion” the way most people seem to mean it. I wasn’t that kid who knew what he wanted to be in first grade. I find that I’m passionate about work when I’m learning and discovering and digging into things that other people haven’t dug into. I didn’t know I was passionate about this niche until I was already up to my elbows. Sometimes, you just have to start digging and see what happens.
I really had no idea what MozCast would tell us, if anything. I certainly think there are times when I have a hypothesis, especially in content marketing, but sometimes it’s just a matter of collecting the data and learning from it. It wasn’t until I watched SERPs every day that I started to find out how and why they were interesting.
What steps do you take to make sure the data you release is accurate? Is there a formal review process?
We don’t have a formal process right now. We’re lucky to have an enthusiastic community that isn’t shy about criticism, so I try to put my methods out there and let people at them. We’ve certainly learned from that and adjusted along the way. I also try to make it clear how data should be interpreted. For example, a lot of MozCast is relative. If I say that X% of SERPs have image results, for example, that absolute percentage could very well be wrong – what’s important is the trend over time, which I think is accurate. We’re definitely dealing with messy problems, though, so we adjust all the time.
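The relative-versus-absolute point can be illustrated with a short Python sketch (my own hypothetical example, not Moz's code): the absolute feature rate may be skewed by the particular keyword set being tracked, but indexing each day's rate against a baseline lets readers follow the trend instead.

```python
def feature_rate(serps, feature="images"):
    """Share of tracked SERPs showing a given feature on one day.

    This absolute percentage depends heavily on which keywords are
    tracked, so it may be biased.
    """
    return sum(1 for s in serps if feature in s["features"]) / len(serps)


def relative_trend(daily_rates):
    """Index each day's rate against day one.

    The indexed series shows direction and magnitude of change even
    when the absolute level is questionable.
    """
    baseline = daily_rates[0]
    return [rate / baseline for rate in daily_rates]
```

If image results appeared on 20% of tracked SERPs on day one and 30% a month later, the indexed series ends at 1.5 – a 50% relative rise – regardless of whether the true absolute rate was 20% or 12%.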
What roadblocks have you run into when working with data? How did you overcome them?
Data is messy, especially in the real world. You just have to keep going, and try not to get lazy. I think I’m fortunate, in many ways, because marketing is a world where you can publish, learn, and evolve. I don’t have to spend two years on a study only to find out I got a null result and won’t ever get published. I get to think out loud a bit. I try not to hide from the inconvenient truths, I admit when I’m wrong, and I move forward. Our industry is moving fast, and the data is changing every day. If you need your problems clean and neat, don’t be a data scientist, IMO.
If you're a data scientist, your desk might look like this (via Pascal)
Are there times where you have access to juicy data that you want to share for content marketing reasons, but can’t due to conflicts of interest? For example, let’s say sharing the data might compromise a business partnership with a third party. What do you do in those situations?
Since Moz is out of the consulting business, we don’t generally have to worry about client data, but I often run into situations where someone shares something with me privately that I can’t pass along. It can be frustrating, but I try to add it to the story in my head. In other words, I might not be able to share that Big Brand X lost a ton of search traffic, but I can use that information to help prove or disprove a theory, and then I can publish that theory, more secure that the data supports it.
We try to balance empathy for our audience with empathy for sources and businesses, and that’s not always clear-cut. When someone says “Don’t share this”, that’s it – I don’t share it. When data might help our readers and yet could make Google unhappy from a business perspective, I’m probably going to share it. If that data would materially harm someone at Google, that would be different. I’ll call Google out for behavior that seems hypocritical, for example, but I try hard not to attack people like Matt Cutts personally. Matt’s a human being, and just one employee of a giant company.
How can marketers use data to address questions that are fuzzier and less measurable? For example, PPC marketers benefit from the fact that a lot of our work is easily measurable and quantifiable; we can quickly see and measure reach, CTR, conversion rate, Quality Score, etc. We’ve been showing how these quantifiable variables relate to each other for a long time. But what about questions like “What kinds of offers perform best?” or “What kinds of keywords should I be focusing on most?” or “How does my [social/offline media/email marketing] affect my [SEO/paid search]?” We don’t always have hard numbers available to measure and compare – particularly when you’re looking across industries. How do you begin to address big questions like these where measurable data is less abundant?
Data is messy, and yet we treat it like it’s a law of nature. Most of the time in marketing, we do the best we can. I think measuring something is often better than measuring nothing, as long as we’re honest and are willing to go where that data guides us. Even classic metrics that are clearly defined are limited and don’t give us the whole picture. Sometimes, I think diving into the mess actually makes us more honest and productive than just using the metrics everyone else says are ok.
When in doubt, make a hypothesis based on the data you have and test it. You may be wrong. As long as you’re not making billion-dollar bets, that’s ok. Frankly, even the people who make billion-dollar bets are working on incomplete data – they just have more money to bet than the rest of us.
I would highly, highly recommend Nate Silver’s “The Signal and the Noise” for anyone who wants to understand data in the real world and the limitations of our current approaches. Even our generally accepted approaches to measurement are relatively new and have flaws.
What are the key qualities you would look for in a potential data scientist hire? Should marketers poach data scientists from other industries? What absolutely needs to be in my job description for this data scientist role?
It’s a weird field right now, because the term “data scientist” is getting popular but means so many different things to so many different people. I think one big difference between analysts and data scientists is that data scientists can handle messy problems. People who love numbers often seem to love them for their orderliness – they like well-defined problems. If you’re a quant, that’s fine – and those skills are useful. If you’re going to dive into a pile of data and sort out the mess, though, that personality isn’t going to work. We think it’s all just math, but it takes a different mindset.
I’d also say that data scientists tend to be in roles that require cross-departmental communication and actively navigating organizations and big decisions. If you’re a data scientist hired to inform C-level decisions and you can’t make your case to the CEO, you’re not right for the job. So, communication skills, written and verbal, are an absolute necessity.
As for poaching, I think that’s more an ethical decision. Truthfully, there aren’t a lot of proven data scientists to poach right now, and that poaching is going to cost you well into six figures.
Most companies still view math as an academic discipline, and are going to want to see some formal education. I’m not saying that’s right or wrong, but it’s harder to prove your street smarts in this area. If someone could clearly show projects where they worked with messy problems and developed real insights, I personally wouldn’t get hung up on their formal background. Corporate America is probably going to want advanced degrees, though, especially at a certain pay scale.
Thank you, Dr. Pete!