Podcast

Alternative data: What is it and what is its role in marketing?

Third-party data or alternative data, when leveraged and combined with a company’s proprietary data, can reveal useful information and new insights about consumers. However, free data can be difficult to find and a hassle to sort through, and it requires validation to ensure the data is clean.

Zack Pike, head of data at Callahan, discusses where to find alternative or free data, what, exactly, an analyst may have to do to organize the data into a usable form and how it can be applied to your marketing strategy.

(Subscribe on iTunes, Stitcher, Google Play, Google Podcasts, Pocket Casts or your favorite podcast service. You can also ask Alexa or Siri to “play the Uncovering Aha! podcast.”)

Welcome to Callahan’s Uncovering Aha! podcast. We talk about a range of topics for marketing decision-makers, with a special focus on how to uncover insights in data to drive brand strategy and inspire creativity. Featuring Jan-Eric Anderson and Zack Pike.

Jan-Eric:
I’m Jan-Eric Anderson, chief strategy officer at Callahan.

Zack Pike:
And I’m Zack Pike, vice president of data at Callahan.

Jan-Eric:
Zack, it’s good to see you. Thanks for joining me on the podcast. I wanted to connect with you today to talk about something that we’ve talked about on some past podcasts, but that I want to dive a little bit deeper on and explore: this idea of free data. We’ve talked a lot about useful data, the right type of information to bring in, and data that can be leveraged.

Jan-Eric:
An obvious thing for marketers is to use their own data, their own sales data information that they have around their product sales or mix or distribution or whatever the data is about, but we’ve talked also about how a lot of times unlocking insight can come from combining proprietary data that you have about your own company with other information.

Jan-Eric:
What I want to talk a little bit more about is free data, just try to understand it, and then understand what I think is your point of view, that it’s generally a good thing to be able to acquire free data. Where does free data come from? What is free data?

Zack Pike:
Yeah. It’s actually, in the marketing space and even in the business intelligence circles, it is kind of a new topic, but it’s been happening for years, maybe decades in some form or fashion in the hedge fund industry. Hedge funds, obviously, when they’re investing a large amount of money, they have a lot of data on the companies that they’re investing in, all the sales data and performance information around the company itself to make sure that they’re putting their money in the best interest of their investors.

Zack Pike:
When they can augment that data with data that’s not part of the company, and I’m talking about things that are influencing factors on the performance of that business, so it could be something as simple as growth in population, in different areas of the country, aligned with growth of the company in those areas of the country. When you can start to understand how external factors are impacting the business, your decisions on investment get better, you start making more money on every dollar you put somewhere.

Zack Pike:
Well, the same idea is true in marketing, right? We are investing dollars in different marketing channels, different areas of the country against different products to try and drive sales. If we can understand the other influencing factors, and usually, there’s data around some of those factors, we want to use that.

Zack Pike:
That’s the whole idea behind this third-party free data. In the hedge fund industry, they call it “alternative data.” The thing that a lot of people don’t realize is that, to your point, a lot of it is actually free. It’s just sitting out there in a database with the government. A place like the NOAA offers weather data freely available. It’s just sitting out there for the taking.

Jan-Eric:
Gotcha. Can you give me some examples of alternative data?

Zack Pike:
Yeah.

Jan-Eric:
I’ll call it “free data” or “alternative data,” “third-party data.” What are some examples of that? What are we talking about here?

Zack Pike:
Yeah, so a couple that we use a lot and that I think people would probably be able to grasp really quickly is Census Bureau. The Census Bureau, of course, does the census, right, but they collect a lot of other data. There are surveys that happen every year through the Census Bureau figuring out what’s going on in different communities, who those people are, what their interests are, how the population is changing. They’re collecting all of that data and making it available on the Census Bureau website. If you start digging deeper and deeper in there, you could find these tables. That’s one example. That’s demographic info, population, stuff like that, changes in population, which tends to be pretty valuable.

Zack Pike:
The Bureau of Labor Statistics takes a lot of good data in. They have a Consumer Expenditure Survey that is loaded with how consumers are spending money year to year and how that’s changing over time. It gets very specific and it’s all geographically aligned, so you’re not just looking at this for the country.

Zack Pike:
If you do business in a certain area, or you’re trying to grow in a certain area of the country, or you’re having trouble in a certain area of the country, you can dig through this data just for that portion of the country. Even things like the consumer price index, which is our typical measure of inflation for the country: that’s not just one number. They collect data on hundreds of products every year so then you can start to look at your pricing in relation to other products and stuff like that. All this data is 100% free. It’s sitting out there for anybody to use.

Zack Pike:
Another one that we use a lot, we use a couple of different suppliers for this data, but the NOAA makes all their weather data available, so all of the NOAA weather stations around the country, and in fact, around the world, are recording data and they make this data available to you. If you have stores in certain areas of the country and you want to look and see how your foot traffic is impacted by weather, you can go mine their data, put it up against your sales or foot traffic data and start to draw insights.
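The weather example Zack describes comes down to lining up two daily series by date and checking how they move together. Below is a minimal sketch of that idea, assuming hypothetical foot-traffic counts and precipitation readings (all numbers are invented for illustration, and the names `foot_traffic`, `precipitation`, `join_on_date` and `pearson` are not from the podcast). A real analysis would pull the weather side from NOAA and the traffic side from a company's own systems.

```python
from math import sqrt

# Hypothetical daily foot-traffic counts for one store (illustrative numbers only).
foot_traffic = {
    "2023-07-01": 420,
    "2023-07-02": 390,
    "2023-07-03": 250,
    "2023-07-04": 230,
    "2023-07-05": 410,
}

# Hypothetical daily precipitation (inches) at the nearest weather station.
precipitation = {
    "2023-07-01": 0.0,
    "2023-07-02": 0.1,
    "2023-07-03": 0.9,
    "2023-07-04": 1.2,
    "2023-07-05": 0.0,
    "2023-07-06": 0.3,  # no matching traffic record, so the join drops it
}

def join_on_date(left, right):
    """Return (date, left_value, right_value) rows for dates present in both inputs."""
    shared = sorted(set(left) & set(right))
    return [(d, left[d], right[d]) for d in shared]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rows = join_on_date(foot_traffic, precipitation)
r = pearson([t for _, t, _ in rows], [p for _, _, p in rows])
print(f"{len(rows)} matched days, correlation = {r:.2f}")
```

A strongly negative value here would only suggest that rainy days and low traffic tend to coincide in this made-up sample; it says nothing about cause on its own.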

Jan-Eric:
Just in the examples that you’ve just given, basically, you can capture population counts, quantifying populations, all sorts of characteristics around that population, demographic information, household income, debt information, ethnicity, size of household. You’re talking about really being able to dimensionalize and customize it any way you want, as a nice alternative to maybe broader measurement pieces like MRI, for example, which don’t really allow you to get down into very specific geographies. You’re also talking about spending behaviors down to the geography and then things like weather.

Jan-Eric:
These pieces, it gives you a lot of variables, I guess, is what you’re saying. You’ve got a lot of variables that you’re working with, then you start to overlay that with your proprietary information or your own data about your own company. You can start to look for insight and understanding around what drives what or what correlates with what and things like that. Fair?

Zack Pike:
Yes. Of course, we use it mostly for opportunity identification, right? It’s: Where can my next marketing dollar be spent smartest? Even things as simple as population. This is a metric that’s really easy to get and it’s rarely used, but when I’m looking at sales data geographically, I am always normalizing for population, right? Because if we just throw anybody’s sales data, any national company’s sales data, on a map, the biggest states are going to be California, Texas, and New York. That’s where most of the sales are going to be. It doesn’t matter the company. It’s always going to be like that.

Zack Pike:
But if I normalize for population, you will see very wide differences, right? You might find that you’re more heavily penetrated in the South than you are on the East or West Coast. You may find that you’re better on the West Coast than you are on the East Coast. Then that starts to draw questions like, “Okay, well, do I have more room to grow there? Or am I never going to win in one area of the country?” That is something that literally would take an analyst an hour to do. For some reason, a lot of people just don’t think about it.
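The normalization Zack describes amounts to dividing sales by population before comparing regions. Here is a minimal sketch, assuming hypothetical state-level sales and rounded placeholder populations (the figures and the names `sales_by_state`, `population_by_state` and `sales_per_1000` are invented for illustration; real populations would come from an official Census Bureau table).

```python
# Hypothetical annual sales by state (illustrative numbers, not real company data).
sales_by_state = {"CA": 3_900_000, "TX": 2_600_000, "NY": 2_100_000, "MS": 440_000}

# Rounded placeholder populations; in practice these would come from an
# official Census Bureau table rather than being typed in by hand.
population_by_state = {"CA": 39_000_000, "TX": 29_000_000, "NY": 19_500_000, "MS": 2_950_000}

def sales_per_1000(sales, population):
    """Sales per 1,000 residents for every state present in both inputs."""
    return {
        state: sales[state] / population[state] * 1000
        for state in sales
        if state in population and population[state] > 0
    }

per_capita = sales_per_1000(sales_by_state, population_by_state)

# Compare the ranking on raw sales with the ranking on the normalized rate.
by_raw = sorted(sales_by_state, key=sales_by_state.get, reverse=True)
by_rate = sorted(per_capita, key=per_capita.get, reverse=True)
print("Ranked by raw sales:      ", by_raw)
print("Ranked by sales per 1,000:", by_rate)
```

In this made-up sample the smallest state by raw sales comes out on top once population is taken into account, which is the kind of shift that prompts the "do I have more room to grow there?" question.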

Jan-Eric:
Gotcha. Yeah. The Census Bureau data, where do you find that? Where do you go to get that information?

Zack Pike:
Well, it’s on the Census Bureau website. The problem-

Jan-Eric:
Just that simple? Just go to a website and you can click here to get the data?

Zack Pike:
… There’s some digging that has to happen. It’s sitting on, I can’t remember their website off the top of my head, but it’s the main Census Bureau website. You’ll find a section in there for all their data. Now, the problem, and it’s not specific to Census Bureau, it’s all of them, is the data is not clean. It is not nicely structured. It is not ready for analysis. It’s not like you’re downloading a nice, clean Excel document.

Jan-Eric:
Here’s the catch: It’s free, but it sounds like you got to do some work for it.

Zack Pike:
Right. It can be, depending on your skillset as an analyst and as a company, your resources, it can be expensive to get the data ready to be incorporated into your analysis.

Jan-Eric:
Expensive from hard cost standpoint or from a time investment standpoint?

Zack Pike:
Time. People time.

Jan-Eric:
Gotcha.

Zack Pike:
The challenge can be weird things happen. Some data sources, if you open an Excel document, you have the headers across the top that tell you what each column is, it’s like sales and quantity and stuff like that. Well, in a lot of the government’s data, those are actually codes. They’re like numbers and letters that make no sense. Well, then you have to go find the coding document and then you have to wire those two things together to understand what the heck you’re actually looking at.
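Wiring coded headers to their coding document can be sketched as a simple lookup. The example below assumes a hypothetical extract and a hypothetical codebook, both inlined as text so the sketch runs on its own. The codes `C001` and `C002`, the labels, and the `decode_headers` helper are invented for illustration and do not correspond to any real government table.

```python
import csv
import io

# Hypothetical extract with coded column headers, standing in for a downloaded file.
raw_extract = io.StringIO(
    "AREA,C001,C002\n"
    "Area A,52000,61000\n"
    "Area B,18000,47500\n"
)

# Hypothetical codebook; a real one would come from the publisher's documentation.
codebook_file = io.StringIO(
    "code,label\n"
    "AREA,area_name\n"
    "C001,total_population\n"
    "C002,median_household_income\n"
)

# Build a code -> readable label mapping from the codebook.
codebook = {row["code"]: row["label"] for row in csv.DictReader(codebook_file)}

def decode_headers(rows, codebook):
    """Rename coded columns to readable labels, failing loudly on unknown codes."""
    decoded = []
    for row in rows:
        unknown = [code for code in row if code not in codebook]
        if unknown:
            raise KeyError(f"No codebook entry for: {unknown}")
        decoded.append({codebook[code]: value for code, value in row.items()})
    return decoded

table = decode_headers(csv.DictReader(raw_extract), codebook)
print(table[0])
```

Raising an error on an unknown code, rather than silently keeping it, is a deliberate choice in this sketch: an undecoded column is usually a sign that the wrong codebook version was used.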

Zack Pike:
On top of that, a lot of times the definitions of the fields in the data you’re looking at don’t make sense. You then have to do a bunch of research to say, “Okay, when I’m looking at population normalized 2010, what does that actually mean? Is that the actual count of people? Is that normalized for something somehow? When was that from?” You have to really understand what those definitions are so you know what to pull into your analysis.

Zack Pike:
There’s that whole definition piece, but then there’s also, a lot of times, you’re mashing together multiple tables. It’s not like the government just makes this available in a nice big clean database. You’re doing several different pulls of data. If you’re really sophisticated, you can hook into APIs that the government offers to pull this data out, but even those aren’t foolproof and there’s a lot of work that goes into it.

Zack Pike:
I mean, we’re getting, I think, to the reason why no one does this is because it is hard. It’s not easy to do. It’s not something an analyst is going to be in the middle of an analysis and be like, “Oh, you know what’d be great? If I could pull demographic information by zip code,” and he just goes and does it. It doesn’t really work that way.

Jan-Eric:
Yeah, but the benefit, if you can get it figured out, is to be able to say, “Hey, let’s get it in here,” and now that we’ve got it in and we’ve got it cleaned up and we structure it right and we can integrate it right within our own database, then it’s there, right?

Zack Pike:
Yep.

Jan-Eric:
It doesn’t have to be redone. Now, let me ask you something: Let’s say you could find this data and maybe it’s scattered and unorganized all over the place, but you’re able to locate it and decode how the data is set up and understand what the data is itself and you’re able to then get it transferred into your own database. If you’ve come that far, now you’re ready to reap the benefits of that work. When updated data comes in, when there’s a refresh on the information, are you starting over?

Zack Pike:
It depends on the data source and how much changes from the last time it was refreshed. I don’t know that we’ve run into that a whole lot where when we got a refresh of the data it was like a crazy restructure or anything. I mean, our government doesn’t make that much change that quickly, especially in their data products. Now, I will say on these sources that are offered by the government, the government has over the past five years or so been making more of a push to make this data more available. They do want more people using it.

Zack Pike:
In fact, a lot of people don’t know this, but the government does seminars and stuff. They’ll send their head of the Census Bureau data out to talk at conferences and he’ll talk to people about why you should be using this data and stuff, so they have opened up some APIs recently to make this stuff available. I could see more of that happening, but I haven’t seen a lot of examples of where we’ve had to rework an entire structure for one data source when it was refreshed.

Jan-Eric:
Gotcha. Free data, alternative data, just kind of recapping what we’ve talked about, it’s certainly not a new concept. It’s something that’s been used in the financial sector for quite some time. It might be something that’s a little bit newer to business analytics, and certainly marketing analytics, but in your experience it has been very valuable as an overlay to your clients’ proprietary information to be able to get that context and understanding. It’s free, which is great, but the catch is that it comes with a little bit of a hassle:

Jan-Eric:
Sometimes, it’s difficult to find. When you can find it, then it can be scattered and unorganized. You’re going to have to decode it in a way to understand exactly what that information is and then you’re going to have to get it transferred and structured to match up with your database and how your database is structured.

Jan-Eric:
These are steps that can be done, but they require some rigor and some expertise. In your opinion, the juice is worth the squeeze here: if you can get this done, you can really reap the great benefits of being able to overlay that and really get more context.

Zack Pike:
Yeah, I mean, and I think we’ve talked about this on prior podcasts, we have countless examples of where we’ve used alternative data to influence insight that has led to decisions in marketing that we wouldn’t have made otherwise. Yeah, it’s critical.

Jan-Eric:
Yeah, that’s great. Well, anything else you want to add before we wrap up?

Zack Pike:
The only thing I’ll add is we do have to be careful. It is enticing to go out and just do a search for whatever data source you’re looking for on the Internet and try and find something. If we think about if you’re looking for something simple, like “I just want my population by city,” for example, because I’ve got a data set that has city in it and I want to normalize for population, there will be hundreds of files available to you to download that have population by city in them. For the analysts that are listening, there will be tons of them sitting on GitHub repositories. There will be just a lot available to download.

Zack Pike:
You have to really validate the data set that you’re looking at because what you don’t want to do is roll something into your analysis that just some guy out there built and didn’t make sure it was accurate. Let’s say he built it with a VLOOKUP in Excel and his VLOOKUP had an error. Now, you’ve got New York’s population sitting inside California. That would be a real problem and that happens a lot. There’s a lot of bad data out there.

Zack Pike:
Just like with everything on the Internet, we have to validate that what we’re looking at is accurate. Our approach typically is to try and get it from official sources whenever we can and also weigh it against the decision you’re trying to make, right? If it’s some quick really high-level analysis, maybe you don’t put so much due diligence into it, but if you’re making decisions on millions of dollars, spend your time to really dig through and make sure the data you’re rolling in is real and accurate.
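One way to act on this advice is to check a downloaded file against an official reference before using it. The sketch below assumes two hypothetical population-by-city tables with rounded placeholder figures. The 5% tolerance, the `validate` helper and the "Springfield, XX" row are all invented for illustration; the bad Los Angeles row mimics the lookup error Zack describes.

```python
# Hypothetical reference values, standing in for a table pulled from an official source.
official = {
    ("New York", "NY"): 8_800_000,
    ("Los Angeles", "CA"): 3_900_000,
    ("Houston", "TX"): 2_300_000,
}

# Hypothetical file found online; the Los Angeles row carries New York's figure,
# the kind of error a bad lookup could introduce.
downloaded = {
    ("New York", "NY"): 8_800_000,
    ("Los Angeles", "CA"): 8_800_000,
    ("Houston", "TX"): 2_310_000,
    ("Springfield", "XX"): 120_000,
}

def validate(candidate, reference, tolerance=0.05):
    """Check candidate values against reference values.

    Returns (mismatches, unverified):
      mismatches: (key, candidate_value, reference_value) where the relative
                  difference exceeds the tolerance
      unverified: keys with no reference value to check against
    """
    mismatches, unverified = [], []
    for key, value in candidate.items():
        if key not in reference:
            unverified.append(key)
            continue
        expected = reference[key]
        if abs(value - expected) / expected > tolerance:
            mismatches.append((key, value, expected))
    return mismatches, unverified

mismatches, unverified = validate(downloaded, official)
print("Mismatches:", mismatches)
print("Unverified:", unverified)
```

How tight the tolerance should be depends on the decision being made, in line with the point about weighing due diligence against the dollars at stake.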

Jan-Eric:
Sounds like a future podcast, how to do quality control on alternative data sources.

Zack Pike:
Yeah.

Jan-Eric:
This has been great. Hey, thanks. I know I learned a lot from this. Thanks for joining me and walking me through this.

You’ve been listening to the Uncovering Aha! podcast. Callahan provides data savvy strategy and inspired creativity for national consumer brands. Visit us at callahan.agency to learn more.