Matt Gellis - Data Integrity - Why Your Data May Be Wrong

November 22, 2016

Slide 1:

Pleasure to be here at the ObservePoint Summit. We’ve had some great presentations so far, and hopefully I can keep that momentum going. As was just introduced, my title is data integrity – why your data may be wrong. So obviously, a little bit of a slippery slope as we jump into some of the things that may be impacting data credibility, data quality, and data consistency.

Slide 2:

But before we jump into all that, I’ll give you guys a little bit of background about who I am and my experience so far in the industry. We’ll keep it pretty short.

I've been around about 15 years, maybe a little bit more, working in all kinds of capacities. Like many people, I backed into the role of digital measurement through a series of events: working with the advertising and marketing team, making sure we were starting to map and drive value based on specific actions and successes, taking over some of the social side. There are a number of ways we've all backed into these roles. I've worked on the vendor side, on the client side, and obviously on the consulting services side, so there's a lot of varied expertise that I bring to the table, and that Keystone does as well, as we work with these enterprise organizations on exactly what we're talking about here: data credibility and quality. How do we get to the point where trust in the data is no longer a barrier to adoption?

Slide 3:

With that, we'll move on. Data quality. There are about a hundred definitions, and plenty of generic ways to define it. There are a lot of different ways to try to get to the heart of what data quality is, but at the end of the day, the integrity of an organization's data is critical to driving actionable insights. It has to be something that's very valuable to you. When we start to talk about data, you can't truly be data driven when your data isn't trusted, when you haven't validated or verified it along the entire journey from inception to insight delivery. And as we start to expand the use of the data internally, if we don't keep our eye on data quality, adoption suffers, apathy sets in, the total cost of measurement can skyrocket, and your time-to-value, just getting to those insights, increases. But we keep doing it. The ultimate goal is to ensure that the right data is always available to the right people, driving actionable insights credibly and consistently. That's a process; it takes a long-term vision, dedication, and a desire to leave behind richer data than you were left with. It's not easy, but it can be done. Ultimately, data quality means data that is important and valuable to your organization, and that can be trusted consistently.

Slide 4:

We've all been here. There's the picture in our heads of what we think our data, or our organization, looks like, and then there's the reality of our situation. Sometimes it's just so chaotic that we can't even get to the picture of what that organized ideal would look like. But we'll never get anything done if we don't start. We have to get past the point of doing nothing because there's just so much to do; we have to grab a hamper, grab the laundry, start somewhere. We have to start prioritizing what we work on first. We have to be frugal with our time and look for ways to give that time back to you and your team. That's really what allows the organizational framework to evolve. At the end of the day, time is the biggest variable, the lever we just don't have, and that's why we back-burner so many projects, so many things that could be long-term valuable propositions for our measurement program. Getting there takes, again, a lot of time. And to get to that ideal picture, you have to have the right framework.

In this particular case, on the left, we've got the right tools, the right cabinetry, the right drawers. We have the framework and the other tools that allow us to start at least making sense of what's around us. So we want to make sure we have the right tools, the right vendors, partners, internal support, and certainly a clear mapping of objectives. When we're cleaning up our data, to what end? For what purpose? We all want to make sure we're evolving the maturity of our analytics, and we can work on some of those pieces as they become available. These aren't things that have to be done in parallel. That's the beauty of data quality: we can make an impact on the quality of our data at a lot of different points along the path.

Slide 5:

A lot of times when we're working with these large enterprise organizations at Keystone, we inevitably start to get questions like: What's the big deal if we don't do this? Or if we don't do that? Or what if we don't consistently curate, keep our eye on the data, and shepherd it through? And it's very easy to get into that mode, because when you look at something only one step out, it can seem like it doesn't have a very big impact. We like to talk about being data driven. We like to believe that data is our greatest asset. But in so many cases, our actions and the way we continue to do things the way we've always done them, that business-as-usual approach, kill the innovation we need to drive in order to change the industry as it relates to some of those long-term goals, some of those truly valuable outcomes, that destination, that objective we're trying to get to. And we have to make sure that at some point we stop being complacent with our data. We've allowed "good enough" to permeate the data table for so long that we consistently look only one or two steps in front of us, and we don't see the impact that will have on the long-term value of our data plan.

Consider this as we look at the slide on screen right now; it's called "the power of 1 degree." It's always been an interesting thing to me. Over the course of the last several years I've looked at some of these stories, whether it's people flying or people doing cross-country treks. If you're off course by just a single degree, after only a foot or two you've missed your target by about 0.2 inches. That's pretty trivial; it looks like it doesn't matter. But as you get farther out, that single degree of difference between what you set out to do and the path you actually took starts to become more significant. Although 0.2 inches isn't much when you're looking just one step in front of you, at 100 yards you're off by about 5 feet, still definitely not catastrophic, but after a mile we're off by almost 100 feet. If we're traveling, as this particular map shows, from Georgia to California, a degree or two of difference means we can end up in Texas. Or when we're driving up to Grandma's in New York City, if I'm off by a degree, I end up in New Jersey. At least in our house, if I say "We're going to Disneyland" and we end up in Texas, or we say we're going to Grandma's and we spend our time in New Jersey, things get out of hand pretty quickly.
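If you want to sanity-check those figures, here is a minimal sketch of the arithmetic in Python. Lateral drift is roughly distance times the tangent of the heading error; the small differences from the round numbers quoted above come down to rounding.

# A quick sanity check of the "power of 1 degree" figures: lateral drift is
# roughly distance * tan(heading error). Distances below match the talk.
import math

def lateral_drift_ft(distance_ft, error_degrees=1.0):
    """How far off course (in feet) after travelling distance_ft."""
    return distance_ft * math.tan(math.radians(error_degrees))

for label, distance_ft in [("1 foot", 1), ("100 yards", 300), ("1 mile", 5280)]:
    drift = lateral_drift_ft(distance_ft)
    print(f"{label:>9}: off by {drift:6.2f} ft ({drift * 12:7.1f} in)")

# Approximate output:
#    1 foot: off by   0.02 ft (    0.2 in)
# 100 yards: off by   5.24 ft (   62.8 in)
#    1 mile: off by  92.16 ft ( 1106.0 in)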

And that's only a one degree difference. From a data perspective it's a journey, and the error starts to compound the farther out you get until, at some point, you're so far down that path that the time, the effort, and the cost to go back and start over become significant. Sadly, a lot of people just pitch their tent right there, where they've landed, and try to make do with where they ended up. We're trying to figure out how we can move some of those obstacles out of the way. In this particular case, when we start to talk about that one degree of difference, we want to make sure we have the right expertise in highlighting those shifts in degrees. Are you getting off course, even just by a little bit? Are those shifts in degrees going to have big impacts or minor impacts? Because those small shifts can certainly have, as we've mentioned, larger long-term impacts, and they have to be managed so that we arrive at the destination we ultimately set out to reach.

One of the things we can do to prevent those shifts is to have the right framework. We have to have the right governance, the right process, and impact evaluations. Those are crucial tools in risk mitigation. We want to make sure that if we do have issues, errors, or data anomalies, which is always going to be the case, we can mitigate the impact they have on our overall data measurement, so that the destination, the end goal, is not affected.

And then keeping that holistic view. We've got to examine things at both the micro and macro levels, from a tactical perspective as well as the strategic long-term one, so that both the way we get there and the destination we arrive at end up being planned for, or that we've purposely changed direction instead of inadvertently arriving somewhere and just having to make do.

Slide 6:

At Keystone, people get tired of hearing me say this, but the consumer journey that we're talking about, and really trying to build credibility around, is completely indifferent to the structure, the teams, the alignment, and how things are run internally. To the consumer, and rightly so, all of that is fairly invisible: the magic beneath the covers, so to speak. And yet we somehow believe that that magic absolves us from data issues, data leakage, and data errors. So one of the simple things that we like to do at Keystone is consumer traceroutes.

What is the traceable path from the inception of the data to the point where we deliver the right insights to the right person? Along that path, once you really start to map it out, as we dig deep and pull in all the places where that data could be misconfigured, have human error introduced into it, or have the processing and collection skew the result, there are so many of them. And I'm sure none of you has this problem, but from the point of inception, the birthplace of the consumer experience, at least what we're tagging, at least where that data element is created, we're trying to make it through each of these stages with credible, quality data that everyone can trust. At inception, at least, the number of places the data can be touched is small. Now, where exactly that inception point sits is something we can debate, but most of the time we can at least kick the data off correctly.
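As a rough illustration, here is a minimal sketch of what such a traceroute can look like once it's written down: each stage of the data's journey, who can touch it there, and what check should guard the handoff. The stage names, owners, and checks are placeholder assumptions, not a prescribed Keystone template.

# A minimal sketch of a "consumer traceroute": each stage a data point passes
# through, who can alter it there, and the safeguard that should run before
# the handoff. Stage names, owners, and checks are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str        # where the data is in its journey
    owners: list     # teams or vendors that can touch the data here
    check: str       # safeguard that should pass before handing off

traceroute = [
    Stage("inception",  ["dev team"],         "tag fires on the target action"),
    Stage("collection", ["tag vendor"],       "required variables are populated"),
    Stage("transition", ["data engineering"], "row counts match the source"),
    Stage("qa",         ["analytics team"],   "values fall within expected ranges"),
    Stage("delivery",   ["reporting / BI"],   "report totals match the source"),
]

for stage in traceroute:
    print(f"{stage.name:<11} touched by: {', '.join(stage.owners):<17} guard: {stage.check}")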

I'm sure none of us has any problems with collection. Every time we collect data, it's collected exactly the right way, it's passed along perfectly, and there's never any misalignment in the configuration or in where that data is placed. But in other cases, some others might have some data leakage or errors in the collection stage of the journey. Then of course the transition, my favorite. Stages three and four are where I'd say 75 percent of the issues come from. And that's saying something, because collection is where we see a good chunk of them. But the moving of the data, transitioning it from where we collected it to where it's going to be stored long-term: who has to have it? Are we copying it? Does replicating the data carry along all of the configuration the data went through before replication? Does everybody have all that context? Or do we just blanket copy it to everybody so that nobody can say we didn't give them the data? Those things have, again, compounding issues from a data error, leakage, and standardization standpoint.

QA, again, is one of my favorite topics. It's one of the biggest impacts we can have on data quality and one of the things that's done most infrequently. And more often than not, even when we start to work with larger development teams, QA is done mostly from a pure IT perspective. There's not even an awareness, in some cases, that digital measurement requires a completely different level of QA, curation, and ownership of the data. So usually it devolves into, "Did you check that thing on the page? Yep, then I think we're good." You roll something out, and the next thing you know, it's a 150-thousand-dollar issue that didn't get caught for three weeks.
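To make the contrast concrete, here is a minimal sketch of what a digital-measurement QA check can look like beyond "did the page break": assert that a captured analytics call actually carries the variables the reporting depends on. The beacon URL, parameter names, and expected values below are hypothetical; in practice they would come from your own tagging specification.

# A minimal sketch of digital-measurement QA: validate that a captured
# analytics call carries the variables the reporting depends on.
# The beacon URL, parameter names, and expected values are hypothetical.
from urllib.parse import urlparse, parse_qs

def validate_beacon(beacon_url, expected):
    """Return a list of human-readable problems found in one analytics call."""
    params = {k: v[0] for k, v in parse_qs(urlparse(beacon_url).query).items()}
    problems = []
    for key, expected_value in expected.items():
        if key not in params:
            problems.append(f"missing variable: {key}")
        elif expected_value is not None and params[key] != expected_value:
            problems.append(f"{key} is '{params[key]}', expected '{expected_value}'")
    return problems

# A made-up beacon captured during pre-release testing; v1 is empty, so the
# check flags it instead of letting it slip into weeks of bad reports.
captured = "https://metrics.example.com/b/ss?pageName=checkout%3Astep1&events=scAdd&v1="
print(validate_beacon(captured, {"pageName": "checkout:step1", "events": "scAdd", "v1": None}))
# -> ['missing variable: v1']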

Then of course the insight delivery, the reporting. I'm sure there's never an error here, because everybody understands the entire process, and once you've delivered the insight, it's actionable, it's clear, it's consistent, it's credible. And did you want page views with that? Because that's all it comes with. There are a lot of these different places along the entire consumer journey that can be impacted by other people, and by the machines, tools, touchpoints, and software and hardware stacks that we all use, and that end up, ultimately, translating a completely different message.

Slide 7:

As in the game of telephone we all know and love, what goes in is very rarely what comes out. The problem is that we're making decisions on what comes out the other end. This idea, and again, I bang this drum a lot, of creating a traceroute out of your consumer journey is really a necessary step in planning for the health of your digital data. There are so many different touchpoints where data can be misdirected, deleted, or changed that the seemingly simple journey from inception to insight can certainly have its hurdles along the path. This slide is just a broader view of what we've been talking about, along with some other touchpoints within these stages that very frequently lead to data leakage, some sort of error, or some sort of misconfiguration. Because as you look at these handoffs and at how the journey progresses for the consumer, there's a lot of software, a lot of hardware, a number of people, a lot of teams, and very different levels of maturity in how everybody understands the reporting. All of those things play a part in what decision is made on what data by the time it finally hits that final stage.

As you can see here, these are just a few of the things we run into as well, as we move from collecting the data, to moving it, to QA, to making sure it's consistently monitored and that we're improving the quality of the data, and then delivering it. That whole funnel is designed specifically so that, at the end of the day, from an adoption perspective, we are delivering the right reports, at the right time, through the right delivery mechanism, in the right type of report, and it's trusted all the time. Now, that doesn't mean data errors don't happen, but we need to get to a point where our insights, our actions, the things we deliver as a result of this entire journey, are not accompanied by four pages of caveats about when you can or cannot use the data, and why it may or may not be directionally relevant.

I think sometimes we get a little overwhelmed thinking, "Oh my gosh. Data integrity, data quality, there's so much to do." Start somewhere. Just pick the area where you have the biggest influence from one of these particular pillars, if you will, and prioritize. What are some of the things that give you the biggest bang for your buck?

And I don't always mean dollars. For us, sometimes that means time. If there's something we can tackle, standardize, automate, or remove some error hurdles from, it gives us time back. And that time can then be spent more effectively, either on analysis or on tackling additional data quality considerations as we move through the consumer journey. Prioritize according to what's going to give you the most time back and what you have the most influence on right now, because that will free up time for the bigger efforts; as with all data quality work, there are always those bigger projects that you're going to have to spend a lot of time on. So tackle the little things that can buy you that time and that help tell the right story as you look at data quality and tease out how it can, and will, help your organization save, in a lot of companies' cases, hundreds of thousands of dollars.

Slide 8:

The simpler view of all of that is, like I said, the telephone game. This is really what we run into so many times from a consumer data journey perspective, that we look at and say, “Ok. What was the original request? Do we map that? What was the original value proposition? What was the hypothesis or theory that drove needing this particular data? What are we ultimately trying to decide? And does this move things downstream according to our strategy?”

Sadly, that may all be very well articulated from one person to the next. But as we continue down the line, if we don't have the railings, the safeguards, the governance, and the process to allow our data to flow through that consumer journey unmolested, so to speak, then when we get to that end state, we make decisions based on what comes out the other end of the telephone game. Frequently those decisions aren't even remotely related to what we started with. And worse, success can never be attributed back to the original requester. You've misaligned and remapped and reconfigured the data to the point where it doesn't even look like the same thing. Therefore, any success, or failure, or anything else we could learn from it is very difficult to map back to the original impetus for the request. We continue to make decisions in silos, and it stifles collaboration. The telephone game, while fun for passing time in elementary school, is certainly not the most effective way to run a data-driven organization.

Slide 9:

Here are a couple of other things, as we push through, where we can focus our time from a data quality and data integrity perspective. One of the things we see all the time in our role at Keystone is just the number of tools that people use, the number of software and hardware stacks in play when we start talking about digital measurement, when we start to talk about analytics in general. What's even more complicated and confusing is that most of those tools do have a valid place in the ecosystem of digital measurement. However, in many, many cases, we're using those tools for the wrong things, or we're trying to gain insights using the wrong tool. For example, in this particular case we're looking at a microscope, binoculars, and a telescope. All of them magnify objects; that's what they're supposed to do. However, experts in using a microscope are not necessarily going to understand everything that a telescope is showing, and those who need the telescopic data are not going to be the ones who are best at interpreting the much deeper data, or at keeping people on track. There are so many different roles and responsibilities and moving pieces that go into a data program that even starting as simply as "Are we using the right tools for the right part of the process?" can be the very beginning of that consumer traceable journey.

With deep-look tools, we want to be able to look deep into the data. We want to monitor it, we want to mine it, we want to be able to distribute it, we want analyst-level raw data, and we want to be able to really customize the insights we extract from this data. They're fantastic tools, but very, very poor when you try to use them for other purposes. For keeping on track, for example, we're talking about your data collection vendors, monitoring, visualization, planning, and insight delivery. Not so much the raw data; we're starting to abstract that up a layer, trying to get it visualized a bit more, easier to digest for consumers and stakeholders.

Then there are our long-range objectives. We need the right tools in there too, so that we're looking at trended data. As we gain more historical data, and it's credible data, we can do more of that compare and contrast, that trend comparison, deeper visualizations, our predictive analysis, our experience analysis: all of the things that then allow us to be proactive and innovative with our consumers. Those are, again, completely different tools. So as we start to prioritize, as we start to look and say, "Okay, where are we going to reclaim that time, buy ourselves some time?", make sure that your tools, which have a tremendous impact on data quality and consistency, are correct, that they're aligned to your end goal, that we're all using them the way they're meant to be used, and that we're not leaking time and effort because we're using the wrong tools to extract the information.

These are just part of the overall framework that has to be in place to care for data in a shifting environment where data is, and should be, our greatest asset. We need to make sure we're treating it accordingly. And not every tool is needed at each phase. We sometimes run out there and buy a hundred tools thinking that will solve the problem, but for some of those tools we're just not there yet, whether from a maturity perspective, resources, technology, or budget. Whatever it is, start with what has the biggest impact and what you can actually do something with, then expand from there. Tools are not the silver bullet that eliminates the need to work on an active and evolving digital measurement program.

Slide 10:

As we diagnose this, and as we start to try to figure out how to cure it, there are a few different categories we look at it through. First is diagnosing the problem. It's pointless to even try treating symptoms or anything downstream until we diagnose what the root causes are and, more importantly, what the impact is on our data and digital measurement program. We have to identify what those are. Then the analysis: identify the source of those issues. Who is impacting them? Is it people? Is it technology? What is exacerbating those root issues? Is there redundancy in tools that we need to eliminate? Is it a lack of governance, which is another big category? Is there no process, no procedure, no data safeguards and quality railings, if you will, to ensure that the things we're doing in an active and evolving digital measurement program are consistent and credible, and that they're underscoring the need for trusted data that we can use to further our measurement initiatives?

Education is one of the biggest areas where adoption fails, and we've seen that for a long time. When we start to talk about governance, which is the framework, that's fantastic. But even with the best governance and the best analysis, even when you've diagnosed everything, if you're not training your people and your stakeholders, if you're not actively helping them understand the impact the data has on them, and helping the roles you have internally, from data analysis to implementation to development, understand not only the impact they have on the data and the process but also how the data can help them, you can't collaboratively become a data-driven organization where adoption is encouraged and the implementation practice is supported. It also helps you get some allies on your team as you start to look at how to tackle these small, but ultimately larger, data quality issues. Then, like everything, it's a refinement process. It's realizing that data quality is a constantly changing process and we have to adapt to the organization's needs, but we have to do that in a way that is both consistent and maintains the credibility of our data.

The way I sometimes explain it at Keystone, and I'm dating myself a little bit, is that I used to play Connect Four, the original Connect Four game, back in the day. I started to look at it and say, "Well, in a digital measurement program, those checkers are the data you're trying to drop into this framework to get four in a row, or a diagonal, whatever way you can do it." It's important for us to have that framework. We have to have the correct transition and bridge points. You have to have data quality infused into that framework, both to facilitate dropping in any of those data nuggets and to mitigate the impact and the risk that bad data, or data that has issues, has on the rest of the organization. We work very hard in some of those cases to create the right framework so that errors can happen (they're going to), so that success is encouraged and identified and documented, so that the guardrails are there, and so that you can get people excited about the data. All of that is framework related: making sure the data quality and measurement framework you have is going to support all of those roles.

Slide 11:

As we wrap up, I want to cover: where do I start? At this point, like we've talked about, just to sum everything up, the first thing I always do is start mapping. There's no point in trying to solve things you can't even identify yet. Make a consumer traceroute. Spend some time, and don't get caught up in where the dotted organizational lines are and how data is compartmentalized internally. Elevate to the consumer level, and then try to overlay that traceroute, from when you start collecting the data to when you can deliver a valuable insight, and see where all of those touchpoints are. What technology is touching it? What people have the ability to touch it? What vendors can misconfigure the data if it's not administered correctly? And so on, and then you start to get an idea of where to prioritize. One of the other big areas we work on that has an immediate impact on data quality is to research and define your QA process. Who's touching it? When, why, and how are they doing it? And more importantly, is it standardized? Or is it more the "Did this break the page? If not, go ahead and push it" variety? That's not digital measurement QA. That's basic IT and dev QA, very critical, very needed, but not something that's as valuable to us as analysts and digital marketers.

Define strategic objectives. Again, the mapping exercise. As you go through this consumer experience, there is a point, or at least there should be, where we lead the consumer to participate in an objective, and there are a variety of them, and we want to make sure we can map that back to the starting points where we collect the right data, where we made sure we shepherded it through credibly, all the way to that insight delivery. Define the relevant stakeholders: where they're touching the data, and who can be allies as you start to prioritize data integrity and data credibility. Audit and develop governance. This is, again, that framework. How do we both facilitate and enhance our ability to do analytics and mitigate the impact or risk that bad data will have on the work we do within the organization? And then set up proactive monitoring. This is a no-brainer. Here we are at the ObservePoint summit, and that's a perfect way to start instant monitoring on some of the key pages, even if it's just simple things, so that instead of taking months to respond to something, even if there's a lot of work behind it, it can be done in hours, days, or weeks, saving tens or hundreds of thousands of dollars.
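For a sense of what "even if it's just simple things" can mean, here is a minimal sketch of a proactive check, assuming a short list of key pages and a string that should appear whenever the tag is deployed. A purpose-built tool such as ObservePoint goes much further (tag firing, variable values, full journeys), but even a scheduled script like this can turn months of undetected breakage into days.

# A minimal sketch of proactive monitoring: fetch a handful of key pages on a
# schedule and confirm that a reference to the analytics library is still in
# the page source. URLs and the snippet below are placeholders; a dedicated
# tool checks far more (tag firing, variable values, full consumer journeys).
import urllib.request

KEY_PAGES = [
    "https://www.example.com/",
    "https://www.example.com/checkout",
]
TAG_SNIPPET = "analytics.js"  # string expected when the tag is deployed

def check_pages(pages, snippet):
    for url in pages:
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
            status = "OK" if snippet in html else "MISSING TAG"
        except Exception as exc:  # unreachable pages are worth alerting on too
            status = f"ERROR ({exc})"
        print(f"{status:<12} {url}")

check_pages(KEY_PAGES, TAG_SNIPPET)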

Slide 12:

Again, this has been a bit like drinking from a firehose. We could spend an entire day on data quality and data integrity alone, and we frequently do. But at this point, we'll leave it at the summary for now and answer any questions. I'll turn it back over to Jack for a few minutes and then we'll start the Q&A portion. Obviously, as you can see, there are lots of good things that have already come from the previous sessions. Stick around for the other ones; I know they're going to be just as fantastic.
