Data Quality and the Digital World
By Eric Peterson of Analytics Demystified
Excellent, thanks so much Brian and thanks everyone for participating in ObservePoint’s Virtual Analytics Summit and for taking the time to listen and chat with me today. As Brian mentioned, I am the CEO, founder, and senior partner at Analytics Demystified. I’ve been doing this for a really long time. As I look at this copyright and confidentiality notice, which I probably should have removed, it says 2018, and I got my start in the digital analytics industry 20 years ago. I started at WebTrends back when we grepped log files to try to find some nugget of information, and look how far we’ve come.
So today I want to talk to you about governance, and I want to talk to you about the need to really think carefully about data quality. The executive summary. If you’re not going to pay attention to anything that I say for the next twenty odd minutes, listen to this.
Tag based technology is a genie that’s been let out of a bottle, and we’re never going to put it back. There’s never going to be a day or a time where there are fewer tag based technologies going onto your websites than there were the day before, the week before, the month before. This is the reality that we live with, right? New technology is hot, it’s sexy. Everybody wants it, and so it just goes on your web pages. It just goes into your apps. It goes onto your sites in a way that’s not always thoughtful, right?
So I can look at pretty much any client website, any of your websites I’m sure, and see too many tags but not enough governance. Not enough clarity around: why is that tag there? Who put that there? What are we hoping to do with that? Where is that tag sending data? We’ve got all these tags out there, but a lot of companies, clients and non-clients alike, are never sure why that data is being collected and who made that decision.
What’s worse is that your internal resources largely lack training on the tagging process. Don’t take this the wrong way. I’m not criticizing your developers, I’m sure they’re fine and wonderful people. But they just don’t know tag-based data collection by and large. It’s just another block of code that somebody in marketing is telling them to deploy. So they don’t know whether the data is flowing properly or just that the tag is firing. They lack training on tagging and data.
Your agencies, right? If you’re an enterprise of any size in this modern age, and that’s largely the companies that we work with at Analytics Demystified, your agencies aren’t making this any better. They will come forward and say, “We’re going to put this tag and that tag and this other tag on these pages and it’s going to do all this wonderful stuff.” But they don’t have the training. They don’t have the knowledge about those tags to really be effective in collecting data accurately and collecting actual data that your organization can use.
As if that isn’t bad enough, your websites are leaking data. And like all leakage, if it’s from your car’s engine or something else, leakage is generally bad. So, we’ll come back to that towards the end.
The notion of data leakage is something that surprised me when I first wrote a white paper in 2015 with ObservePoint. It has only become a bigger and bigger problem. So, the executive summary. I laid it out. There are problems. So what do you do?
Well, I’m going to give you ten tips to improve your data quality. We’re big fans of the ten-tip and twenty-tip format. Something that we do at our Accelerate conference. We’ve got one coming up in January in Los Gatos, California if you’re interested. And there’s a white paper, Data Quality in the Digital World, from April 2015 that you can download at this sort of ugly bit.ly URL. But the reality is that these ten tips, they work. If you really take them to heart, if you think about, “How can I deploy these within my organization? How can I get people in my company to do seven out of ten of these things? Or eight out of ten of these things?” You will be able to trust the data that you have substantially more than you likely can today.
So, here are the requirements or excuse me, the tips.
So, number one. There has to be ownership for data quality. This shouldn’t be surprising, and I think as you listen to the next twenty odd minutes, you’re not going to be surprised but hopefully you’ll take this information and go back and say, “Gosh here is something that we’ve talked about, but this is something that we need to do. We really need to execute on these tips.” So, you’re not surprised when I say that somebody needs to own data quality. Nobody on this call should be.
When you’ve done that, work to define governance for data in general. Most of you, despite the fact that I and my peers and people like Avinash Kaushik at Google and Gary Angel at Ernst & Young and others have been talking for years and years and years about the need for process in analytics, most companies still sort of suck at that. They talk about the need for process. They talk about the need for governance, but they don’t execute. But data quality requires defined data governance. Right? Especially in this era of tag proliferation. Especially in this era where you’re actually scraping pages and scraping the document object model to pass information into tags to pass along to vendors.
You have to have governance. You have to have governance, so that any time you add a new collection point, you know how the data is going to be collected and what tags are going to go there. You need governance to verify the initial flow of data. You can’t just sort of wave your hands and say, “Well, the tag’s deployed, it must be working.” Because that doesn’t work. It just doesn’t work.
Most importantly, you need governance to confirm the ongoing flow of data. Because if there’s one thing that we hear day in and day out at Analytics Demystified, working with Fortune 50, Fortune 100, and Global 2000 clients, it’s, “The data was being collected and then somebody modified a page and all of the data is broken. Worse: we didn’t notice that the data was broken for thirty days, and now when we look back, we have to put an asterisk on that report because we know data collection was incomplete.” So governance is a way to start to stop having incomplete data. Governance. Not tag management.
I believe, ironically, I was one of the first people to coin the phrase “tag management systems” back when we called them the universal tag. I actually wrote a paper cautioning that tag management was not a replacement for actual process. Tag management is not a placebo. Do not be fooled into believing that if you have a tag manager, and if a couple of people are trained how to use GTM or Tealium or Adobe Launch or whatever it is, that you’ve got governance. That’s crazy. That’s absolutely crazy. Governance and process are what you need.
So, you need a tag auditing technology and no great surprise. I’m a huge fan of ObservePoint. We’ve been partners of ObservePoint, we love ObservePoint. We encourage all of our clients to use ObservePoint because good data governance requires automation. There is no way to look at your increasingly complex investment in the digital world be it a mobile app, be it a mobile site, be it responsive design, be it a regular website, and know that everything is working perfectly when it comes to data collection. It is just impossible.
In this era now with ObservePoint and other companies and technologies like ObservePoint, you can. You can automate this. You can automate it for complex web processes. You can automate it for your applications. You can give yourself the benefit of not having to go in and look at code day in and day out or, worse, wait for something to go wrong. You can get ahead of this. The alternative, auditing data collection by hand? It’s not an alternative anymore. It probably worked 20 years ago when I got my start at WebTrends, when we had largely crappy web pages that did literally nothing, but it doesn’t work today. It does not work today.
So, when you’ve got some technology to help you, you audit and then you audit and then you audit some more. You treat confirming high quality data collection as part of analytics as part of the analytics and optimization process. Now most of the issues that we see arise during new content development and deployment. But that doesn’t necessarily mean when the tags are being deployed, it just means when new stuff is going out. New marketing campaigns, new web pages, new processes online. That breaks stuff.
Assuming your analytics is integrated into development, which is a huge assumption I know, but it is the best practice. You’re supposed to do that. Assuming that it’s integrated into development, you need to test frequently and audit frequently during the development process. Are the tags there? Is the data flowing? Is the right data flowing? Is the right data flowing to the right report suites or the right containers or the right tags or whatever it might be? And then test weekly during the QA process. Don’t do it on the front end and then make hundreds of changes through your sprint cycles and hope that your analytics is correct. Keep testing, and then test aggressively following deployment.
So many times we’ve heard, “Yeah everything was fine in QA, everything looked great in development. And as soon as it actually hit the live internet, everything broke.” Fight that. Fight that. You’re not going to get everything. You’ll never get everything. It’ll never be perfect, but if you’re auditing and if you have a process around confirming data quality, at least you have a fighting chance.
So, set a rhythm for monitoring. Monitoring for data quality is not a one-time effort. It’s not something you do once a quarter or once a year. It’s something you do constantly. It’s like maintaining an automobile. I’m a huge car nut, big fan of sports cars. You can’t just get a 1969 Porsche 912 and run it every now and again and hope that it will fire up every time. You’ve got to keep taking care of something like that, just like you need to keep taking care of your website and your data and the data quality that you’re looking for.
So, monitoring weekly for obvious changes. And they may not be obvious changes to you. It may have been done by somebody in a different group. Somebody in a different part of the company. Somebody in a different part of the world, but because of the complexity of our digital world now, those changes can cascade back through and affect your data quality. So setting up ongoing or weekly monitoring for obvious changes. You can also, if you’re sophisticated in this, you can actually feed urls that have changed to a monitoring system. And say, “Okay, don’t look at everything, but look at these pages because these pages have been changed. These pages have been touched.”
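The idea of feeding changed URLs into a monitoring run can be sketched in a few lines. This is a hypothetical illustration, not a real vendor API: the tag signatures, the page contents, and the `fetch_html` callback are all stand-ins for whatever your auditing tool actually provides.

```python
# Sketch: scope a weekly audit to only the pages that changed.
# TAG_SIGNATURES and fetch_html are illustrative placeholders, not a real API.

TAG_SIGNATURES = {
    "google_analytics": "www.googletagmanager.com/gtag",
    "adobe_launch": "assets.adobedtm.com",
}

def audit_changed_pages(changed_urls, fetch_html):
    """Return {url: [missing tag names]} for changed pages missing any expected tag."""
    problems = {}
    for url in changed_urls:
        html = fetch_html(url)
        missing = [name for name, sig in TAG_SIGNATURES.items() if sig not in html]
        if missing:
            problems[url] = missing
    return problems

# Example with a stubbed fetcher instead of a real HTTP call:
pages = {
    "/pricing": "<script src='https://www.googletagmanager.com/gtag/js'></script>",
    "/blog": "<html>no tags here</html>",
}
report = audit_changed_pages(pages.keys(), lambda u: pages[u])
# report flags /pricing for a missing Adobe Launch tag and /blog for both tags
```

In practice the changed-URL list would come from your CMS or deployment pipeline, and the audit itself from a tool like ObservePoint rather than string matching.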
Then, monitor monthly for deeper issues. Maybe once a month, you look at everything. And you make sure that everything is still sending data in a way that is consistent with how your business people would like to see it.
One interesting thing too, I’ve talked about this a little bit in the past but it often doesn’t get deployed, and I’m never sure why, is a good key performance indicator. As mentioned earlier, I’m the author of ‘The Big Book of Key Performance Indicators.’ It’s freely available at the Analytics Demystified website. It’s a little out of date now, but a great key performance indicator for your data quality is percent of properly tagged pages. If you’ve got a hundred pages and you’ve got three tags, to give the simplest example, do all one hundred pages have all three of those tags, and are all three of those tags working properly? And if you’re not at 100% for that KPI, why? Why? Go back to your CDO and say, “Look, we’ve got a problem. How can we fix this?” Set that rhythm for monitoring. Set that rhythm and stick to it.
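The KPI described above is simple arithmetic over audit results. A minimal sketch, assuming a hypothetical audit-result shape (a list of pages, each with the set of tags detected on it):

```python
# Sketch: compute the "percent properly tagged pages" KPI.
# The audit-result structure and tag names below are hypothetical.

REQUIRED_TAGS = {"adobe_analytics", "google_analytics", "hotjar"}

def percent_properly_tagged(pages):
    """pages: list of dicts like {"url": ..., "tags_found": set of tag names}.
    A page counts as properly tagged only if ALL required tags are present."""
    if not pages:
        return 0.0
    properly_tagged = sum(1 for p in pages if REQUIRED_TAGS <= p["tags_found"])
    return 100.0 * properly_tagged / len(pages)

audit = [
    {"url": "/home", "tags_found": {"adobe_analytics", "google_analytics", "hotjar"}},
    {"url": "/checkout", "tags_found": {"adobe_analytics"}},  # two tags missing
]
print(percent_properly_tagged(audit))  # 50.0
```

Anything below 100% on this KPI is a concrete, reportable governance conversation: which pages, which tags, and why.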
Another thing that I heard as I asked my larger team this morning, I said, “Tell me a little bit about data quality.” I heard about one client, great client, huge name in technology: they had defined a CDO, they had defined a group of people who were responsible for data quality, they had defined a rhythm for monitoring, but then the organization changed, things changed, people changed, priorities changed, and that whole process had fallen by the wayside. You have to maintain the rhythm. You have to keep monitoring for data quality.
Another wonderful thing to do. Right? So you’re going to monitor on an ongoing basis. But if you’ve got technology to do this and ObservePoint does, others do as well, establish alerts. Set something up that will proactively tell you, that you will tell your chief data officer, that will tell the people responsible for the data that something broke. We don’t know why, but in the last deployment, somebody changed something that ended up changing something in the data layer that ended up changing something in how we were passing data to the tag management system which means that Adobe is getting nothing. Or Google is getting nothing. Have that alert there.
More importantly, have a triage in place to quickly correct for that. The worst case scenario is where you get an alert that says that half of your data collection for analytics is now zeroed out. It’s not working. But to be told that there’s no way to correct for that until the next sprint cycle, the next release, whenever that is. Have an analytics triage process to fix things when you know they’re broken.
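The proactive alert described above boils down to comparing today’s collection volume against a baseline. A minimal sketch, where the drop threshold and the hit counts are illustrative assumptions rather than recommended values:

```python
# Sketch: a proactive data-collection alert.
# The 50% drop threshold is an illustrative assumption; tune it to your traffic.

def check_collection_volume(today_hits, trailing_avg_hits, drop_threshold=0.5):
    """Return an alert message when today's hit volume falls too far below
    the trailing average, or None when volume looks healthy."""
    if trailing_avg_hits == 0:
        return None  # no baseline to compare against
    drop = 1.0 - (today_hits / trailing_avg_hits)
    if drop >= drop_threshold:
        return (f"ALERT: collection volume down {drop:.0%} vs. trailing average "
                f"({today_hits} vs. {trailing_avg_hits}) -- trigger triage")
    return None

alert = check_collection_volume(today_hits=4_000, trailing_avg_hits=10_000)
# a 60% drop fires the alert; the message is what you'd route to the
# chief data officer or the on-call analytics triage process
```

The important part is the wiring, not the math: the alert has to land with someone who owns the triage process, so the fix doesn’t wait for the next sprint.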
Again, I mentioned this earlier, the alternative is having missing data that constantly needs to be factored into analysis. And if you’re an analyst on this call listening, you know there’s nothing more annoying than looking at year-over-year data or month-over-month data or quarter-over-quarter data, whatever it might be, and saying, “Oh yeah, but remember that huge data outage in Q2 or a month ago?” And how do we factor that in? How do we report back? That all becomes worse if you have data consumers within your enterprise who are self-servicing and who maybe don’t know that there was a data breakage months ago or quarters ago. And they’ll just go willy-nilly do comparative analysis and not know.
Those proactive alerts will help you avoid these data quality issues.
If you are a large organization, a global enterprise, or maybe you’re a company who buys other companies and you’re trying to integrate your websites, I would encourage you to not just rely on that chief data officer that I mentioned earlier, but to create a group, a governance group or a data working group, who can scan locally and act globally. I’ve got this backwards a little bit here, but you want people to be able to look at their own data and say, “Hey, here’s a problem. Maybe it’s in my business unit. Maybe it’s in my geographical region,” but who can surface that back to other people in the organization to make sure that they have seen that and that they’re not having that same problem.
You’re working towards excellent data quality. You’re working towards having data that you can trust day in and day out for decision making, but no one person is going to be able to do that alone. It’s just impossible.
So, confirm complex integrations. This is a biggie. Increasingly, your websites are not just websites. Increasingly, you’re creating whole digital experiences, and maybe they transcend your website. Maybe they go on to other domains for registration or purchases, and the list goes on and on. So if you’re creating complexity in your digital experiences, you’ve got complexity in your data collection as well.
Just knowing that data is being collected isn’t going to be enough. If you’re moving from site to site, or maybe you’re moving from a mobile app and somebody’s got to go back to a web page. Again, there are so many examples, but just knowing the data is being collected, that’s just not enough. You need to monitor and validate the actual data. You need to say, “Is this the data we expect?”
One example that was given to me involves responsive websites. A client company wanted to collect and pass different data to Google Analytics depending on whether visitors were looking at a responsive view optimized for mobile versus one optimized for desktop. But if you just ask, “Is the data present?” The answer is going to be yeah. Yeah, the data is present. Is it right? That’s what you need to monitor for. You need to monitor for specific data flowing into collectors. You need to monitor for data flowing into specific accounts of systems. It gets really complicated. If you try to do this without automation, if you say, “We’re just going to look at the data and we’re going to do it by hand,” that may not be as successful as you would hope. Think about it. Think about how complex your digital world is. All the tags you have. All the things you’re trying to collect. And factor that in when you think about data quality.
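The distinction between “the tag fired” and “the right data went to the right place” can be made concrete with a small sketch. The beacon field names and account identifier here are hypothetical, standing in for whatever query parameters your analytics tag actually sends:

```python
# Sketch: validate not just that a tag fired, but that the right data
# went to the right account. All field names below are hypothetical.

def validate_beacon(beacon, expected_account, expected_fields):
    """beacon: dict of query params captured from a tag request.
    Returns a list of human-readable validation errors (empty if clean)."""
    errors = []
    if beacon.get("account") != expected_account:
        errors.append(f"wrong account: {beacon.get('account')}")
    for field, expected in expected_fields.items():
        if beacon.get(field) != expected:
            errors.append(f"{field}: expected {expected!r}, got {beacon.get(field)!r}")
    return errors

# The tag fired and hit the right account, but sent desktop data
# where the responsive mobile view should have sent mobile data:
beacon = {"account": "prod-suite", "viewport": "desktop"}
errors = validate_beacon(beacon, "prod-suite", {"viewport": "mobile"})
```

A presence check alone would have passed this beacon; only value-level validation catches that the wrong variant of the data is flowing.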
Another cool thing you can do if you get good at auditing your data: if you have a CDO, if you have a working group, if you have people who are focused on, and whose job requirements include, data collection and data quality, well then, you can use this to confirm adherence to your original solution design.
Now, probably some of you are thinking, “Original solution design. Gosh, how long has it been since I’ve seen our website’s original solution design?” Hopefully you have one. Hopefully you have a solution design, and you went through a business requirements gathering process and translated that into a solution design, an effective design that was implemented. But most people don’t go back and look at the solution design or consider changes to it when they get three months, six months, nine months down the road. But what happens is, then people say, “Why is that data being collected that way? What is actually in that eVar? What is in that prop? Why would we have that and how could we use it?” Well, if you get good at confirming your data quality, you can go back and match the data that you’re collecting to your business requirements and your solution design and confirm that those key requirements are still being collected and still being met. Right?
And this can be very, very powerful. If you’re like a lot of our client companies, you have people coming and going within the organization. How nice would it be if you could hand over a solution design and say, “This is the data that we have in whatever, Adobe or Google, and here’s how everything is set up and why it was set up. Go read this document and you’re going to know what we’re doing and what we’re trying to do and how you can be successful as an analyst.” Well, you can’t do a great job at that if your solution design is wholly out of date. And if, when you say, “Hey, here’s this information in eVar 16,” the analyst goes and looks at eVar 16 and it’s empty because you didn’t notice that it broke six months ago, etc. etc. etc. So, this was kind of a happy accident when I was writing the original white paper. But I realized that this is a really practical use of the data quality and data governance process. To make sure that you continue to be thoughtful about what you’re trying to do with your analytics.
So, the last big tip, and I’ll probably end a little bit early here, the last big tip is about preventing data leakage. Again, when I wrote the original white paper with ObservePoint in 2015, and as I was talking to clients of ours and clients of ObservePoint, it continued to come up that people were using ObservePoint to validate what tags were on their pages.
At first I said, “Well why would you need ObservePoint to validate that? You know what tags are on your pages.” And the senior people within these organizations that we were talking to were saying, “Yeah, you’d think that, right? You’d think that our agencies wouldn’t just put Google Analytics on the site because they like Google Analytics. You’d think that they would use our Adobe Analytics or our Coremetrics or our WebTrends or whatever it was. But they’re not. They’re actually deploying their own tags. You would think that IT wouldn’t deploy additional technologies.” Or IT would say, “You’d think that marketing wouldn’t deploy additional technologies.”
But the reality was that when I started asking companies this, they were saying, “Yeah, we’ve got tags on our website and we have no idea why they’re there.” And sometimes they were analytics tags. I’d ask the senior director of analytics, the VP over analytics, “Why is this tag here? Why are you using Hotjar?” And they’d say, “I don’t know.”
So, I’d imagine and I challenge you, do you know every data collector being deployed on your website today? Do you know everything that is in your tag management system? Do you know everything that is being used to pull data out, away from your visitors, into another system? Ostensibly your system. Do you know that? If you do, I’d go run an ObservePoint audit and confirm that. And see whether or not you’re surprised.
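As a first pass at that challenge, you can get a rough inventory of the third-party collectors a page references by parsing its script sources. This is only a sketch: static parsing like this misses tags injected at runtime by a tag manager, which is exactly why a real auditing tool that executes the page is needed. The page HTML and hostnames below are made up for illustration.

```python
# Sketch: inventory the third-party script hosts referenced on a page.
# Static parsing misses runtime-injected tags; a real audit tool executes
# the page. Hostnames here are illustrative.
import re
from urllib.parse import urlparse

def third_party_script_hosts(html, first_party_host):
    """Return sorted third-party hostnames found in <script src=...> tags."""
    srcs = re.findall(r'<script[^>]+src=["\']([^"\']+)["\']', html, re.IGNORECASE)
    hosts = {urlparse(src).hostname for src in srcs if urlparse(src).hostname}
    return sorted(h for h in hosts if not h.endswith(first_party_host))

page = """
<script src="https://www.example.com/app.js"></script>
<script src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXX"></script>
<script src="https://static.hotjar.com/c/hotjar.js"></script>
"""
hosts = third_party_script_hosts(page, "example.com")
# every hostname in this list is a place your visitors' data may be going;
# each one should have a known owner and a known reason for being there
```

The point of the exercise is the follow-up question: for every host on that list, who put it there, and why?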
The reality is staff come and go. It is shockingly common for people who no longer work in the organization to still have access to systems. They still have access to your Adobe, they still have access to your Google. And that’s not something you would discover through ObservePoint, but it’s something that you have to consider. That leaves room for data leakage.
Your agencies come and go. And again, agencies don’t always ask before putting tags on pages. Sometimes they do what they think is best for the agency, for the client, for the company. But in an era of GDPR, you have to know. You can’t just do that anymore.
Technologies come and go too. Another thing we realized, in terms of data leakage, is that there were technologies still deployed across the websites of these companies that I talked to that were sending data to nowhere. These were basically dead technologies. Or they were technologies that the company was no longer paying for. The data was still flowing.
So, it’s a pretty big problem. As I was preparing for this event, I read up on Magecart.
So, if you have a data auditing process, if you take my advice today to heart and you go out and do five of these things or six or seven of these things, there’s a very high likelihood that somebody at Newegg would have said, “Hey, what’s neweggstats? And why are we sending data to that at the point where we’re collecting credit card information from our buyers?” And somebody else would have said, “Huh, I have no idea.” And who knows what would have happened at that point.
It’s not just limited to Magecart. This is the kind of new digital attack that’s going to happen over and over and over again. And because we have all done the right thing in deploying tag management systems, and because we are all in a frantic hurry to have the newest, greatest data collection or tag based whatever technology, this is going to continue to happen. So you need to watch for this.
There are several quotes in the white paper. Let me modify this so I can read it, but this is one that really stood out. One of the folks I interviewed for the white paper, who wanted to remain anonymous, said, “During one of our site scans, we discovered that one of our third-party advertising providers, people we trusted, was dropping a commonly used competitive analysis solution’s tag on our site without our knowledge. This was very much a problem since the solution in question sells data to our competitors and, thanks to our advertising partner, they now had direct access to our data.”
And that doesn’t make anybody very happy. Now, certainly, the advertising partner wasn’t doing that on purpose. They weren’t trying to steal that data, but this is preventable. This is preventable just by having a governance process in place.
So, the ten requirements for data quality that we talked through today.
I thank you all very much for your time. If you’d like to learn more about this, here is my email address. It’s firstname.lastname@example.org. You could come and meet myself and my entire team at Accelerate 2019 in Los Gatos, California on January 25, and I’ll turn it back over to ObservePoint.