How to Be a Data Scientist: An Interview with Dr. Pete Meyers of Moz


Here at WordStream, we occasionally do a “Make a Wish” exercise where everyone gets to make a few (work-related) wishes and we try our best to make them a reality. Recently, I wished to hire a data scientist for the content marketing team. And lo and behold, my wish has been granted! We’re planning to hire for a new data scientist role in early 2015.

I had a lot of questions about how data science works in conjunction with content marketing, so I turned to one of the best in the field: Peter Meyers at Moz, perhaps better known by his Twitter handle Dr. Pete. Pete was kind enough to answer all my questions about what a data scientist does, what data science tools he uses, the ethics of data science and more.

Peter Meyers is a cognitive psychologist and resident Marketing Scientist at Moz. His latest obsession is hunting the algorithm to find out what makes Google tick.

Moz is all about TAGFEE, the T in which stands for “transparency.” What does transparency mean to you, in terms of content marketing and data sharing? (This word always makes me think of the old joke about the guy who goes to see a psychiatrist wearing only plastic wrap.)

How many psychiatrists does it take to change a light-bulb? One, but the light-bulb has to *want* to change.

I see data transparency much like it gets treated in science (done right, which it too often isn’t) – you try to be clear about your methodology, you use measurements that are as unbiased as possible, and you don’t present data (especially visually) in a way that skews it to your own agenda. When we put out data, I try to be clear about how we got it and to present it in a way that lets people draw their own conclusions. As a marketer, obviously there’s a story-telling component, but I think there’s a difference between bringing someone along on a journey and blindfolding them and throwing them in your trunk.

How did you come up with the idea for MozCast? How does it fit into your content marketing strategy? (For example: do people have it bookmarked and check it every day, or does it bring you net-new traffic through search?) How has MozCast evolved since you first conceived it?

MozCast really came out of the Algorithm History project. Google revealed that there were 500+ algorithm updates in 2010 (and that number keeps increasing), and yet we only named a handful. So, I became obsessed with the idea that maybe we could measure day-over-day change in the SERPs. In many ways, that turned out to be a lot harder than I thought, and I learned just how dynamic search results really are.

MozCast has powered maybe a dozen blog posts, dozens of Google+ posts, hundreds of tweets, a handful of presentations (one that topped 100K views on Slideshare), and the site just topped a quarter-million visits in 2014. It’s been far more successful than I’d have ever expected.

The project really took on a life of its own. It turns out that algo flux is a tough problem (too much noise, not enough signal), but studying SERPs got me interested in tracking features, discovering Google tests in progress, and generally trying to use SERP data to understand where Google is headed. MozCast is essentially four distinct research tools now, two of which are public.


The weather report according to MozCast
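To make "day-over-day change in the SERPs" concrete, here is a minimal sketch of one way such a number could be computed. It is an illustration only, not MozCast's actual metric: the function names (`serp_change`, `daily_flux`), the top-10 depth, and the rule that a URL dropping out counts as moving to position 11 are all assumptions made for the example.

```python
from statistics import mean

def serp_change(yesterday, today, depth=10):
    """Average absolute rank shift for one keyword (hypothetical metric).

    `yesterday` and `today` are ranked lists of URLs. A URL that was in
    yesterday's top `depth` but is missing today is treated as having
    moved to position depth + 1 (an assumption, not a known MozCast rule).
    """
    prev = yesterday[:depth]
    today_rank = {url: pos for pos, url in enumerate(today[:depth], start=1)}
    shifts = [
        abs(today_rank.get(url, depth + 1) - pos)
        for pos, url in enumerate(prev, start=1)
    ]
    return mean(shifts) if shifts else 0.0

def daily_flux(snapshots_yesterday, snapshots_today, depth=10):
    """Average the per-keyword change over keywords tracked on both days."""
    keywords = sorted(snapshots_yesterday.keys() & snapshots_today.keys())
    if not keywords:
        return 0.0
    return mean(
        serp_change(snapshots_yesterday[k], snapshots_today[k], depth)
        for k in keywords
    )

# Example: two results swap places and a third drops out.
yesterday = {"data scientist": ["a.com", "b.com", "c.com"]}
today = {"data scientist": ["b.com", "a.com", "d.com"]}
flux = daily_flux(yesterday, today, depth=3)
```

Even a toy score like this shows the noise problem Pete describes: ordinary reshuffling produces a nonzero value every day, so the hard part is deciding how large a value has to be before it signals an algorithm update.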

What did it take, from an investment perspective, to get a project like MozCast up and running? How can smaller businesses with lower budgets use data to their advantage for marketing?

The first month of MozCast cost us $9.95 for hosting and $21.99 for proxy IPs, for a total of about $32. I built the prototype myself, starting with only 50 keywords, and used that proof-of-concept to sell Moz on the project. I think that’s the key – show, don’t tell. Had I explained the idea, a bunch of people would’ve said “Hey, that sounds cool,” and then we would’ve all moved on with our lives. By investing my own time and energy up front on the minimum viable version, the selling did itself.

Of course, eventually our design team took over the front-end and the site moved to internal servers, but I still build most of the back-end. In terms of time spent, it obviously costs a lot more than $32/month now, but we’ve been able to scale that investment along with the returns.

Talk to me like I’m a third grader about data visualization: What are your favorite tools? Do you absolutely need a designer?

Pictures speak louder than numbers, and no one wants to stare at a giant pile of numbers. I’m a big believer in using the simplest visualization possible – I’d say 90% of what I do is just bar charts. I don’t tend to get fancy with the tools, because I think that sometimes that gets in your way – focus on the story, and “draw” that story. If you can do that with Crayolas and dinosaurs, awesome.


Data viz from a recent Moz post

What about tools for data analysis? Do you turn to specific tool(s) (Excel, Access, SAS, Stata, R?) to guide you when you’re digging into a data set or do you shoot from the hip?

I’m a “whatever works” kind of guy, so I do a lot in Excel or code it straight into PHP (when I need to use SQL data). Our data science team will use tools like SAS when we need to do more advanced statistical analysis, and they use R from time to time. They’re Python people, and mock me for using PHP. Most of my work isn’t technically “big” data, so I find simpler tools work fine. We have a Big Data team at Moz, but they work with things like the link graph, where you’re talking about billions of data points. Our Data Science team is more inclined to work on modelling and “messy” problems, like detecting spam or scoring content.

I loved your session at MozCon about generating ideas. Can you provide some tips for finding a content niche and using data to help tell your story? Do you come up with ideas first, then look for data to support it, or the other way around?

Finding your niche is really tough. I’m going to say something controversial – I really don’t believe in “passion” the way most people seem to mean it. I wasn’t that kid who knew what he wanted to be in first grade. I find that I’m passionate about work when I’m learning and discovering and digging into things that other people haven’t dug into. I didn’t know I was passionate about this niche until I was already up to my elbows. Sometimes, you just have to start digging and see what happens.

I really had no idea what MozCast would tell us, if anything. I certainly think there are times when I have a hypothesis, especially in content marketing, but sometimes it’s just a matter of collecting the data and learning from it. It wasn’t until I watched SERPs every day that I started to find out how and why they were interesting.

What steps do you take to make sure the data you release is accurate? Is there a formal review process?

We don’t have a formal process right now. We’re lucky to have an enthusiastic community that isn’t shy about criticism, so I try to put my methods out there and let people at them. We’ve certainly learned from that and adjusted along the way. I also try to make it clear how data should be interpreted. For example, a lot of MozCast is relative. If I say that X% of SERPs have image results, that absolute percentage could very well be wrong – what’s important is the trend over time, which I think is accurate. We’re definitely dealing with messy problems, though, so we adjust all the time.
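As a rough illustration of reading a feature's share as a trend rather than an absolute number, here is a minimal sketch. The data layout (each SERP represented as a set of feature names) and the helper names `feature_share` and `share_trend` are assumptions made for this example, not a description of how MozCast stores or reports its data.

```python
def feature_share(serps, feature):
    """Fraction of SERPs in one day's sample that show `feature`.

    Each SERP is modeled as a set of feature names (a hypothetical layout).
    """
    if not serps:
        return 0.0
    return sum(feature in serp for serp in serps) / len(serps)

def share_trend(daily_serps, feature):
    """Day-over-day change in a feature's share, in percentage points."""
    shares = [feature_share(day, feature) for day in daily_serps]
    return [round((later - earlier) * 100, 2)
            for earlier, later in zip(shares, shares[1:])]

# Example: image results appear on 1 of 4 sampled SERPs, then 2 of 4.
day_one = [{"images"}, set(), {"news"}, set()]
day_two = [{"images"}, {"images"}, {"news"}, set()]
trend = share_trend([day_one, day_two], "images")
```

If the keyword sample over-represents some kinds of queries, the 25% and 50% shares here may both be off, but as long as the same sample is measured the same way each day, the direction and rough size of the change are still informative.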

What roadblocks have you run into when working with data? How did you overcome them?

Data is messy, especially in the real world. You just have to keep going, and try not to get lazy. I think I’m fortunate, in many ways, because marketing is a world where you can publish, learn, and evolve. I don’t have to spend two years on a study only to find out I got a null result and won’t ever get published. I get to think out loud a bit. I try not to hide from the inconvenient truths, I admit when I’m wrong, and I move forward. Our industry is moving fast, and the data is changing every day. If you need your problems clean and neat, don’t be a data scientist, IMO.


If you're a data scientist, your desk might look like this (via Pascal)

Are there times where you have access to juicy data that you want to share for content marketing reasons, but can’t due to conflicts of interest? For example, let’s say sharing the data might compromise a business partnership with a third party. What do you do in those situations?

Since Moz is out of the consulting business, we don’t generally have to worry about client data, but I often run into situations where someone shares something with me privately that I can’t pass along. It can be frustrating, but I try to add it to the story in my head. In other words, I might not be able to share that Big Brand X lost a ton of search traffic, but I can use that information to help prove or disprove a theory, and then I can publish that theory, more secure that the data supports it.

We try to balance empathy for our audience with empathy for sources and businesses, and that’s not always clear-cut. When someone says “Don’t share this”, that’s it – I don’t share it. When data might help our readers and yet could make Google unhappy from a business perspective, I’m probably going to share it. If that data would materially harm someone at Google, that would be different. I’ll call Google out for behavior that seems hypocritical, for example, but I try hard not to attack people like Matt Cutts personally. Matt’s a human being, and just one employee of a giant company.

How can marketers use data to address questions that are fuzzier and less measurable? For example, PPC marketers benefit from the fact that a lot of our work is easily measurable and quantifiable; we can quickly see and measure reach, CTR, conversion rate, Quality Score, etc. We’ve been showing how these quantifiable variables relate to each other for a long time. But what about questions like “What kinds of offers perform best?” or “What kinds of keywords should I be focusing on most?” or “How does my [social/offline media/email marketing] affect my [SEO/paid search]?” We don’t always have hard numbers available to measure and compare – particularly when you’re looking across industries. How do you begin to address big questions like these where measurable data is less abundant?

My advisor liked to say there were three levels of understanding statistics. Level 1 is when you know enough to be dangerous. Level 2 is when you know all the rules and point them out to everyone, but you apply them so rigidly that you can’t get anything done. Level 3 is when you know how to break the rules.

Data is messy, and yet we treat it like it’s a law of nature. Most of the time in marketing, we do the best we can. I think measuring something is often better than measuring nothing, as long as we’re honest and are willing to go where that data guides us. Even classic metrics that are clearly defined are limited and don’t give us the whole picture. Sometimes, I think diving into the mess actually makes us more honest and productive than just using the metrics everyone else says are ok.

When in doubt, make a hypothesis based on the data you have and test it. You may be wrong. As long as you’re not making billion-dollar bets, that’s ok. Frankly, even the people who make billion-dollar bets are working on incomplete data – they just have more money to bet than the rest of us.

I would highly, highly recommend Nate Silver’s “The Signal and the Noise” for anyone who wants to understand data in the real world and the limitations of our current approaches. Even our generally accepted approaches to measurement are relatively new and have flaws.

What are the key qualities you would look for in a potential data scientist hire? Should marketers poach data scientists from other industries? What absolutely needs to be in my job description for this data scientist role?

It’s a weird field right now, because the term “data scientist” is getting popular but means so many different things to so many different people. I think one big difference between analysts and data scientists is that data scientists can handle messy problems. People who love numbers often seem to love them for their orderliness – they like well-defined problems. If you’re a quant, that’s fine – and those skills are useful. If you’re going to dive into a pile of data and sort out the mess, though, that personality isn’t going to work. We think it’s all just math, but it takes a different mindset.

I’d also say that data scientists tend to be in roles that require cross-departmental communication and actively navigating organizations and big decisions. If you’re a data scientist hired to inform C-level decisions and you can’t make your case to the CEO, you’re not right for the job. So, communication skills, written and verbal, are an absolute necessity.

As for poaching, I think that’s more an ethical decision. Truthfully, there aren’t a lot of proven data scientists to poach right now, and that poaching is going to cost you well into six figures.

Most companies still view math as an academic discipline, and are going to want to see some formal education. I’m not saying that’s right or wrong, but it’s harder to prove your street smarts in this area. If someone could clearly show projects where they worked with messy problems and developed real insights, I personally wouldn’t get hung up on their formal background. Corporate America is probably going to want advanced degrees, though, especially at a certain pay-scale.

Thank you, Dr. Pete!

If you’re a data scientist with an interest in marketing – or you want to be one! – keep an eye on our jobs page, or get in touch with me via email or Twitter.


Comments

larsjaeger
Oct 15, 2014

I agree: it’s a weird field right now, because the term “data scientist” is getting popular but means so many different things to so many different people.

Elisa Gabbert
Oct 15, 2014

True! It's a very trendy term right now. I think it was Harvard Business Review that called it "the sexiest job in America" or something like that. (Ha!)

AdwordsM
Oct 15, 2014

Thanks for sharing your views. Can a PPC marketer become a data scientist?

Elisa Gabbert
Oct 15, 2014

Absolutely! Our own Mark Irvine manages PPC campaigns for clients and does a lot of data science for us on the side. You just have to LOVE data and know how to make sense of it.

Dr. Pete
Oct 16, 2014

I did contract PPC in a past-life, and I think PPC people are definitely versed in messy data problems. You have a ton of numbers to work with, but you're also operating in a massive and dynamic ecosystem where every change you or a competitor makes can upset the balance. These are difficult and, added all up, multi-billion dollar problems.

Elisa Gabbert
Oct 16, 2014

Multi-billion dollar problems are good problems to have! Or are they...

Arthur
Nov 05, 2014

Really interesting to learn about people's ways of dealing with problems. Thank you for this very good post...

sara corner
Oct 17, 2014

Thanks for sharing info. Can a marketer really be a data scientist?

Brian
Oct 15, 2014

Really great professional insight into the science of data (no joke, Dr. Pete is an algorithmic legend). It will be interesting to see how the state of data mining continues to evolve alongside the internet.

Elisa Gabbert
Oct 15, 2014

Thanks Brian! I agree -- I think especially in our industry (search marketing) it has so much potential to clear up empty predictions, speculation, hype and help us start acting on and reacting to data/evidence. But as Pete says above, it'll always be tricky since data is messy and spinnable.

pipo
Oct 15, 2014

It is always interesting to learn about people's ways of dealing with problems. Very good read, thank you!

Elisa Gabbert
Oct 15, 2014

Thanks for reading!

Andre Van Kets
Oct 16, 2014

Hi Elisa (and Dr Pete) - great interview and a sneaky smart format for spreading the word of your vacant resident geek marketing position :)

I also enjoyed Dr Pete's talk at Moz this year. He has inspired me to create a "Big Five Big Data" project (we're an online travel company that offers safaris to Africa). Will let you know when we get our MVP up and running. Would love to get feedback and insights from people who're thinking along the same lines.

Cheers - Andre

Elisa Gabbert
Oct 16, 2014

That's great Andre! Excited to see what you come up with.

Dr. Pete
Oct 16, 2014

That's really awesome to hear, Andre. One person trying something new and big because of that talk is more rewarding for me than 100 people saying "Good talk!" Let me know how it goes.

Richard
Oct 16, 2014

My son is very interested in becoming a data scientist when he graduates. Any advice other than sharing this article with him? Many thanks, great article.

Elisa Gabbert
Oct 16, 2014

Is he in college or high school? If high school, urge him to look for universities that offer a data science degree.

Dr. Pete
Oct 16, 2014

It's a little tough right now, because the field is evolving so quickly. Most of my colleagues who are more properly data scientists in the usual sense have advanced math or statistics degrees. Now, schools are starting to create data science programs, but I imagine the focus varies pretty wildly. I know data scientists with general science backgrounds (I'm an experimental psychologist by training) and even computer science backgrounds.

The beauty of the field right now is that the tools and techniques are pretty widely available, and the best thing your son can probably do is to dig in and learn all he can about where the field is at. There are some great resources on the web, including:

Data Science Tutorials

...and a lot of online classes popping up on places like Coursera:

Introduction to Data Science

Elisa Gabbert
Oct 16, 2014

Thanks for the tips!

Andreas Voniatis
May 27, 2015

Hi Elisa,

In addition to material on Coursera, I'd strongly recommend doing a lot of maths study particularly linear algebra, multivariable calculus and probability. Your son will also need to learn R and Python. A lot of data science is spent getting the data in. So whilst the theoretical background (on software engineering and advanced mathematics) is important, there is no substitute for getting your hands dirty by opening APIs, exploring the data, building models, automating via machine learning and testing them. The best data scientists have hands on experience which is the ability to come up with solution ideas as challenges pop up in the real world when solving data science problems. I don't think this advice is age limited either (even if it is more challenging for a school pupil).

One of the things is to ask your son - what area is he interested in or passionate about? Exploring the data requires some background knowledge on the subject area you want to make predictions in. The passion is also important as it will fuel his motivation to 'crack on' when the going gets tough during his studies.

Andreas
