Judah Phillips, SmartCurrent - Data Stewardship in Digital Analytics

November 22, 2016

Slide 1:

Thanks for having me here today. We’ve had some great presentations so far and I’m excited to be part of the Analytics Summit.

I want to talk to you all today about Data Stewardship and governance in digital analytics.

Slide 2:

What are we going to talk about today? We’re going to talk about data governance. What is data governance? How many of you do it? How many of you participate in it? And how many of you have heard of it before? Data governance is a term that refers to the overall management of the availability, usability, integrity, and security of the data used at your enterprise. What I’m trying to describe to you right now are some of the core elements of a data governance organization. If we build from the ground up, there’s this idea of data quality, and I think this is inherent in an enterprise-class platform like ObservePoint. It’s one of their goals to ensure the quality of your digital data. But that’s not all a data governance organization does and that’s not all that data governance is about. It’s about a lot of other things too.

Slide 3:

The current industry situation is that data is everywhere. By this point in time, there’s more digital data and bytes being collected than there are sand grains on Earth. There is a huge amount of data and if you read the first chapter in my blue book, you’ll learn that about 99.9 percent of that data actually isn’t being analyzed.

Slide 4:

We’ve got data everywhere and here’s the resulting challenge: we’ve got a ton of different data sources, and in order to bring it all together, it costs a lot of money. This is a big challenge, with a big cost associated with it, but like all things moving into the future, there’s opportunity to solve the data governance challenge in digital analytics.

Slide 5:

If we as an industry don’t address the challenge, the result is negative business impact. Where does that negative business impact come from? Poor implementations. There’s a lot of poor implementations out there. I’m sure you’ve all run into them before, maybe you’re dealing with them right now or you’re trying to course correct. A lot of those poor implementations result from the inability to deliver to business requirements, leading to unsatisfied stakeholders; people who don’t trust the data, don’t believe the data, think it’s wrong. This data doesn’t match this data. For whatever reason, not all the data matches conceptually and we in the industry know a lot of reasons why that is. But when you have unsatisfied stakeholders, the perception of the analytics team can be poor, and it results in missed opportunities. Missed opportunities from a business perspective, missed opportunities from a technology perspective, and missed opportunities cause bad results. It will reduce revenue and increase costs.

Slide 6:

As I mentioned, data governance is an important part of the solution, but let’s take a look at this slide. This talks about the data asset life cycle, the information management life cycle. That kind of starts in this dark gray box around data creation. How does your company create data? How does it store data? How does it move data around? Maybe you’re just using data for dashboarding and standard reporting. That data eventually goes stale. You don’t need it anymore. There’s an extinction to it. So everything here, from data ownership to data standards, focuses on policies that have been created and that data stewards marshal and work towards as part of data governance. Data governance is an important part of this solution to overcome the challenge of data being everywhere, and the operational pain and high costs associated with data.

Slide 7:

So what are we going to talk about today with that frame? I want to talk about data stewardship. What’s data stewardship? This is Wikipedia’s definition of data stewardship, but data stewardship is a concept. It’s an activity and it’s also a role, a role that’s filled by somebody called the data steward, and we’re going to talk more about what a data steward does a little later in the presentation. Essentially data stewardship and data stewards are part of the core of a data governance team. It’s an area where digital analysts and analytics can plug in fairly seamlessly if one understands what it is and knows how to plug in. With all the other stuff that goes on in analytics, for better or for worse, sometimes the quality of the data can degrade despite your best efforts. Data stewardship and data stewards exist to help prevent that.

Slide 8:

When we talk about data governance and data stewardship, this is an organizational capability and here’s a breakout of what a data stewardship organization can look like as you’re building out the capability. It starts with a governance executive board and we’ll talk more about who the data governor is, whether that’s the chief data officer, whether that’s a separate role. Data governance requires committees. It’s ruled by committees because the idea is that data governance crosses all the different business units. And you can have lots of different working groups that are simultaneously working across different projects to assure that policies are being adhered to, processes are being followed, data quality is being ensured, and you have lots of contributing members.

Slide 9:

But don’t forget, as I mentioned earlier, data governance is a technical capability. If you look at this, around the information and data management life cycle, there’s a lot of different things that go on.

Slide 10:

This is where a platform like ObservePoint fits in because it is a technology platform, but it also ties nicely into business operations, especially digital analytics operations around auditing and monitoring your data layer. There’s a lot of marketing technologies and ObservePoint is squarely focused on helping you govern data across all the different marketing technologies.

Slide 11:

But of course, you’ve heard this before (this is like the 101 thing): it’s not that clever to say people, process, and technology, right? No one in digital analytics invented it; it goes back a long, long way. But it is relevant and smart to talk about process. Data governance for digital analytics is a process, and this is what it kind of looks like. You need your data governed or else people are going to self-service themselves into absolute confusion. And governing it is a process.

Slide 12:

The process first starts with having a strategy, but getting your data definitions right is one of the early things, kind of like the second step there. What’s a data dictionary? This is from a book that was written yesterday, called “Understanding Computers Today and Tomorrow.” It was written a while ago, the fourteenth edition. It talks about the data dictionary as the data definitions in a database: the table structures, the relationships, the current information, metadata, all sorts of stuff. Let’s take a simpler view: a data definition in a dictionary has three components. First, it has a business component. Let’s say a unique visitor. What do most people think that is? They think it’s a person. Many of you are probably wincing when I say that because we all know it’s not a person. But the business definition of what that is, is one of the first steps to defining digital data, defining any data.

Then you have an operational definition. An operational definition of a unique visitor would be something like: a de-duplicated cookie. And maybe the processing rules on how you de-dup cookies. Now we’re starting to get somewhere, you guys hear me. Then when we go to the technical definition (business, operational, now technical), it’s these things: it’s tables in a database, it’s a field, it’s an SQL statement, it’s metadata, it’s all this other stuff.
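
To make the three layers concrete, here is a minimal Python sketch of one data dictionary entry for "unique visitor". The class name `DataDefinition`, its field names, the table name `web_hits`, and the sample hits are all hypothetical illustrations, not taken from any particular tool or from the slides. The sketch also shows why the operational definition matters: de-duplicated cookies cannot simply be added up across days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDefinition:
    """One data dictionary entry with business, operational, and technical layers."""
    name: str
    business: str      # what the business thinks the metric means
    operational: str   # how the metric is actually derived
    technical: str     # where it lives and how it is computed


unique_visitor = DataDefinition(
    name="unique visitor",
    business="A distinct visitor to the site (often misread as a person).",
    operational="A de-duplicated cookie ID within the reporting period.",
    # Hypothetical table and column names, for illustration only.
    technical="SELECT COUNT(DISTINCT cookie_id) FROM web_hits",
)


def count_unique_visitors(hits):
    """Apply the operational definition: count de-duplicated cookie IDs."""
    return len({cookie_id for cookie_id, _day in hits})


# Made-up sample hits as (cookie_id, day) pairs.
hits = [("c1", "mon"), ("c2", "mon"), ("c1", "tue"), ("c3", "tue")]

monday = count_unique_visitors([h for h in hits if h[1] == "mon"])
tuesday = count_unique_visitors([h for h in hits if h[1] == "tue"])
both_days = count_unique_visitors(hits)

# Cookie c1 appears on both days, so the daily counts do not add up
# to the two-day count.
print(monday, tuesday, both_days)
```

In this made-up data the two daily counts sum to more than the two-day count, because one cookie shows up on both days. That is the kind of processing rule the operational definition needs to spell out.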


Slide 13:

Getting a hold on your data and doing data governance requires you to define your data. Then you’ve got to create a business glossary. This is straight out of IBM. This is a tool out of IBM that allows you to create a business glossary and look at what’s in it. You can create terms, you can create categories, you can approve things, you can look at approved things, you can publish things, you can label things, you can assign things. For data stewards, you can look at all your data assets and you type in what it is: unique visitor, a short description, it’s not a person, it’s a de-duplicated cookie. You don’t sum them up like you sum up visits.

All the 101 stuff you guys know cold. Or you can have a long description with some more interesting information relevant to folks who need to get in and deconstruct that data. You can put it into a taxonomy with parent nodes or parent categories, you can reference it across multiple nodes, you can label it with different metadata, you can assign it to Mr. Harold Bishop the data steward, you can add status updates to it, you can search it. Data definitions that you create to begin governing your data can be populated in a business glossary. I have a lot of fun with our clients at SmartCurrent helping them build out these things quite frequently, at a very large scale.
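
As a rough illustration of what a glossary entry and its approval workflow involve, here is a small Python sketch. It is not the IBM tool's API; `GlossaryTerm`, `BusinessGlossary`, the status names, and the category are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A single business glossary term (hypothetical structure)."""
    name: str
    short_description: str
    category: str            # parent node in the taxonomy
    steward: str             # the data steward the term is assigned to
    labels: set = field(default_factory=set)
    status: str = "draft"    # assumed workflow: draft -> approved -> published


class BusinessGlossary:
    """An in-memory glossary supporting add, approve, publish, and search."""

    def __init__(self):
        self._terms = {}

    def add(self, term):
        self._terms[term.name.lower()] = term

    def approve(self, name):
        self._terms[name.lower()].status = "approved"

    def publish(self, name):
        term = self._terms[name.lower()]
        if term.status != "approved":
            raise ValueError(f"{term.name!r} must be approved before publishing")
        term.status = "published"

    def search(self, text):
        """Return terms whose name or short description contains the text."""
        needle = text.lower()
        return [
            t for t in self._terms.values()
            if needle in t.name.lower() or needle in t.short_description.lower()
        ]


glossary = BusinessGlossary()
glossary.add(GlossaryTerm(
    name="Unique visitor",
    short_description="Not a person; a de-duplicated cookie. Not summed like visits.",
    category="Web traffic",
    steward="Harold Bishop",
    labels={"digital"},
))
glossary.approve("Unique visitor")
glossary.publish("Unique visitor")
matches = glossary.search("cookie")
```

The point of the approve-before-publish check is that a definition only becomes the official one after a steward has signed off on it.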

Slide 14:

Then, data lineage—think about it, you’ve got your data defined, you know you documented it. What’s data lineage? I love Wikipedia because it gives us these great definitions that everybody can edit, and hopefully agree on. Here’s the definition from Wikipedia of data lineage. Again, part of the data life cycle, it’s a fancy term like data layer, and it tells you where the data comes from and where it’s moving over time. It describes what happens to it across all these processes that I just talked about, such that you have visibility into your pipeline so you can trace errors back to the source. Lineage is important, and tools like ObservePoint help you better understand your digital data lineage.
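
Tracing an error back to its source can be sketched as a walk over a lineage graph. The following Python example is only an illustration under assumed names: the dataset names in `UPSTREAM` are invented, and real lineage tools record far more (transformations, timestamps, owners).

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is built from.
UPSTREAM = {
    "exec_dashboard": ["visits_daily"],
    "visits_daily": ["clean_hits"],
    "clean_hits": ["raw_tag_hits", "bot_filter_rules"],
    "raw_tag_hits": [],
    "bot_filter_rules": [],
}


def trace_upstream(dataset, upstream=UPSTREAM):
    """Return every dataset that feeds `dataset`, nearest first (breadth-first)."""
    seen = []
    queue = deque(upstream.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


def root_sources(dataset, upstream=UPSTREAM):
    """Return only the original sources: upstream datasets with no parents."""
    return [d for d in trace_upstream(dataset, upstream) if not upstream.get(d)]


# If a number looks wrong on the dashboard, these are the places to check.
suspects = trace_upstream("exec_dashboard")
sources = root_sources("exec_dashboard")
```

With a record like this, a wrong number on a dashboard gives you an ordered list of places to inspect rather than a guess.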

Slide 15:

Another thing with data governance in the process (and this is a slide from IBM again, and I love this slide; I show it in a few of my classes at BU and also at Babson) is that data can be moved through zones. There are a lot of technologies out there that complement digital analytics. They tend to fall more in the realm of BI and arguably data science. But there’s this idea that there are zones you can move data through. Maybe you have a raw zone: your data layer, where data is somewhat un-curated and exists in its raw file formats. Then maybe you move into an operational data zone.

Maybe you move some of it into a real-time data processing and analytics zone. Maybe you move some of it, as it goes through stages of curation and business definition and associative lineage and clarity, into something more like your analytics sandbox. Or maybe you do something that’s pumping your data into particular data science tools to do deep analytics (not trying to name any vendors on the call, though they’re all at the top of my mind), whether it’s using Python or a whole slew of other abstracted capabilities that data science and data analytics tool vendors provide. Or maybe you do what Avinash hilariously calls “the data warehousing boondoggle.”

Centered in all of this, of course, is information integration and governance. And look at the results. You see them here: for 79 percent of companies (according to IBM, and I think this was a sample of a couple thousand business professionals), when they have high-quality data, their decision making is ranked higher on a scale of 1 to 10. I guess 7 out of 10 means you get a C when you’re governing your data. What does it take to get a B or A on this? If you’re not governing your data, what grade are you going to get?

Slide 16:

We talked a bit about business glossary—this is a data catalog. A data catalog would ingest the business glossary. This is another big company technology, we’ve all heard of Microsoft before, some of you may or may not use their laptops, none of us use their phones these days. But check this book. What does it have in the center of it? Tags. So a data catalog would even have your tags and descriptions and they could relate back to the actual tables, columns, fields, data types that your tags may end up populating as that digital data is moved out from the reporting API or your data warehouse, and brought into other systems and unified.
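
One way to picture how a catalog relates tags back to tables, columns, and data types is a simple lookup structure. This Python sketch is an assumption-laden illustration, not the Microsoft product's data model: `CatalogEntry`, the tag variable names, and every table and column name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Links a tag variable to a glossary term and a physical column."""
    tag_variable: str    # variable set by the tag on the page
    glossary_term: str   # business glossary term it corresponds to
    table: str
    column: str
    data_type: str


# Hypothetical catalog contents, for illustration only.
CATALOG = [
    CatalogEntry("visitor_id", "Unique visitor", "web_hits", "cookie_id", "STRING"),
    CatalogEntry("order_total", "Revenue", "orders", "total_amount", "DECIMAL"),
    CatalogEntry("order_total", "Revenue", "finance_daily", "web_revenue", "DECIMAL"),
]


def columns_fed_by(tag_variable, catalog=CATALOG):
    """List every table.column that a given tag variable ends up populating."""
    return [
        f"{entry.table}.{entry.column}"
        for entry in catalog
        if entry.tag_variable == tag_variable
    ]


# Which downstream columns are affected if the order_total tag breaks?
affected = columns_fed_by("order_total")
```

A lookup like this is what lets a steward answer "if this tag changes, which downstream fields change with it?" without reading every pipeline.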

Slide 17:

Part of data governance is this idea of a data catalog. I promised we’d talk a little bit about organizational capabilities. In order to do all this stuff, in order to operate with a data governance process, in order to create some of these artifacts and implement some of this software to enable your data governance operations, you have to think of it as an organizational capability. Squarely in the center of this capability is this idea of a data steward. In this case, you could have an office of data stewards depending on how large your company is. You could have business data stewards and functional data stewards, and they work closely with those committees I talked about, as well as IT, to do all those things that I just described. Continue to think about data governance as an organizational capability with the centroid being the steward, and at the apex, the data governor and the data governance steering committee.

Slide 18:

I just talked about the data governor. For those of you who like to watch AMC in the United States on Sunday nights, I’m not talking about that type of governor. I’m talking about the data governor. What is that person? Often times the data governor may be the chief data officer or the chief analytics officer in companies that have appointed C-level or EVP-level resources to managing data when data is a primary concern. Many of the industries that I just referenced earlier that are serious about this stuff—insurance, financial services, healthcare, biotech, and pharma—they have data governors. Usually they have one, I’ve seen organizations that have more than one, but that’s a little rarer. What the data governor does—it is a formal role and responsibility, if it’s not the chief data officer, it’s often somebody who reports into a very senior leader—they’re responsible for these things that you read here.

They’re responsible for improving data quality and data understanding. That’s a key part of their charter. They have the goal of making data access easier across the entire enterprise and, like I talked about, creating consistent, accurate, high-quality data so people can keep using it. They ensure that privacy is protected (I should add privacy to this slide) by ensuring and increasing data security. Data governors are concerned about operational issues that affect the integrity of data, the quality of data, the security of data. They’re really in charge of making sure, not just that the data is correct, but that the stakeholders who are using it are satisfied with the data they’re getting. When that all happens, the data governor is reducing the cost of the data management or the information management life cycle.

They’re also doing something that’s maybe a little bit less explicit, that’s a little bit more tacit: they’re creating transparency into something that most people don’t care about, which is data governance. We care about it as analysts, but when you know that you’ve got a data governance team, a data governor, and data stewards, and they’re all making sure that data is truthful and accurate and consistent and high-quality, people believe it. It makes the jobs of analysts easier. Hear me now and believe me later. The data governor helps to create truth.

A friend of mine, Matt Cutler (Matt actually hired Jim Sterne to write the white paper in 1999 called “eMetrics,” from which the conference is derived), was one of the first guys in the 90s to start indexing log files, and he created a web analytics company called NetGenesis, which I think was the first or second publicly traded web analytics company. Many of you, maybe Phil Kemelgor or Mr. Abraham or Eric Richards, have familiarity with NetGenesis. Anyone who’s been in the business as long as I have knows all about Matt Cutler and NetGenesis and his white paper “eMetrics” that Jim Sterne wrote. And what Matt says about all this stuff in analytics is: we’re analysts on a noble quest for the truth.

Slide 19:

And the truth is what the data governor is trying to get at by doing data governance. But, like I said, in the center of all of this, there’s a role called the “data steward.” I think that the digital analytics team is well positioned to be data stewards for digital data, certainly, and maybe, in the best case (I truly believe this), for lots of other data across an enterprise. It is hard to learn the stuff that we’ve all learned. It’s a complex activity and we need to do it better. And data governance allows us to play in the same space as people who are doing BI, advanced analytics, and data science. When you have all of this stuff in place, you have happy data analysts, and that’s what we all want to be: happy data analysts like this person here.

Slide 20:

So what does a data steward do? I promised you guys I would talk about data stewardship, but I needed to frame it in the construct and the concept of data governance and the organizational capabilities as well as the software and the artifacts that you use to do it. A data steward is the person on the ground, in the trenches, working hard in the fields, plowing them day to day, driving the tractor. They’re in charge of the oversight of how the data is used across the company. An oversight role means that they might not necessarily be using the data, but they’re looking over your shoulder as you’re typing in your spreadsheet or looking at your configuration or running your simulations.

They are people who are in charge of verifying new data requirements. So when you have a new data requirement in a governance organization, the data steward is going to check it out, and they’re going to say: “Great idea. Bad idea. We already have this. Let me help you.” They’re going to help ensure that the data requirements are aligned across the organization with existing data sets. And they likely will already have pre-established workflows to help, whether you’re in a waterfall environment or agile environment, whether you’re a Scrum master, you’re iterating on the points every two weeks. They help you with these workflows that support efficiency and effectiveness across the different software development methodologies, across all these different tools that I’ve shown a few screen captures of, including platforms like ObservePoint.

Data stewards often exist aligned with business functions, and if you have a lot of business functions (and we’ll talk about that a little later), there’s going to be a lot of data stewards and it’s helpful to have them coordinate. Centralized centers of excellence, hub and spoke: there’s a lot of different organizational models. Don’t believe people who tell you only one is appropriate. It’s like Stéphane Hamel says, “It depends.” The MBA answer. In research that I’ve conducted with Doctor Tom Davenport, one of my colleagues at Babson, we’ve studied a lot of organizations, and there’s no one right size. There are great ways to do it; it’s not just a center of excellence or hub and spoke. There are a lot of different ways: federated, centralized, decentralized, siloed. And these data stewards help to coordinate across those different varieties of organizational structure, and they share ways of working that help to be more efficient and protect the operational properties I was talking about, which reduces operational pain.

Slide 21:

So what impact do data stewards have on the business? They have the ability to work for the greater good. They control human self-interest. They do things like minimize risk because they’re governing IT, which results in better data, which gives you better outcomes for analytics. And, as I’ve talked about before, there’s a lot of operational efficiency that occurs through data governance, so that helps to increase revenue and lower costs. And it reduces the business risk of bad data.

Slide 22:

There are a lot of different types of data stewards. Traditionally, a data steward is a role like I’ve described that ensures data integrity, they look at traceability, lineage, they understand workflow. There’s also scientific stewards and technology stewards. We see these less in marketing organizations. Technology stewards are often called data custodians, but a similar concept. A scientific steward is something you see in more regulated scientific environments like healthcare and biotech or in heavily algorithmic data science organizations where applied analytical science and computer science is in vogue. Lots of different types of data stewards and they have different responsibilities. There’s even functional and business unit data stewards, as I’ve described, those who work across functions and those who work within specific business units.

Slide 23:

Data stewards are at the center. They work closely with data governors. You can’t build this enlightened triangle, this magic triangle, without data stewards close to the customer. What data stewards do is work with end users, analysts, business stakeholders, and business teams to help identify requirements and guide the policy decisions that the data governor or data governors make based on the executive leadership’s input.

Slide 24:

Data stewards execute across business units. You can have one that handles all data stewards, you can have a data steward in each functional unit, you can have ones that cross the lines and work across many. This is an example of data stewardship in... any guesses out there? Insurance.

Slide 25:

They also participate in these governance councils and committees, and oftentimes you’ll see many different roles and responsibilities from many different business units in these committees. Certainly the data governor’s going to be there, maybe your CIO, your CTO might have a role here, or they may delegate that to their subordinates at the next level down. We have folks who are in charge of compliance in regulated environments, in charge of security in most environments. You’ll have folks from operations, marketing, finance, product. And what happens is, when data stewards need to define new workflows, need to create new data, or need to ensure that there’s operational consistency across these different units, and there’s some confusion or concern, or escalation is necessary, or strategy needs to be dictated, it goes up to the governance council to decide, and the decision cascades down into the organization through the data stewards.

Slide 26:

So how do you get started with data stewardship? If you look at digital analytics, you get started pretty quickly. Free accounts with Google Analytics, buy some software from Adobe, maybe go to some of the other vendors that are out there. You’re establishing this idea of the culture of analytics, being data-driven; you’re giving access to data. Then you have to start doing stewardship. That gives your data the quality that ensures people use it. You continue to rinse and repeat as you acquire data, and that allows you to ladder up your maturity to where you’re moving into things like not just predictive, but prescriptive analytics. Not just predicting what will happen, but based on those predictions, recommending what people should do next, what actions they should take, what you prescribe them to do. This journey can take years.

Slide 27:

Thank you, I hope you enjoyed the presentation. Again, my name is Judah Phillips, I’m the founder of SmartCurrent. You can reach out to me at any of these email addresses. Feel free to follow me on Twitter, I usually do one update a day on some cool information I find out there in the wild. And I’ve got three books out there, you should all read my books, I think they’re pretty good. I look forward to seeing you out there in the industry. I want to thank everybody here at ObservePoint today and I’m ready to take some questions if you’re still online. Thanks.
