How to Ensure a Healthy Data Layer: Data Layer Governance

March 23, 2020

The value of a data layer is widely understood in the digital analytics industry. What isn't, however, is data layer governance.

Keeping your data layer free from errors can be quite a task, but it doesn't have to be. In this webinar "How to Ensure a Healthy Data Layer: Data Layer Governance" with Rob English, Implementation Specialist at Napkyn Analytics, and Jordan Hammond, Consultant at ObservePoint, you will learn how to:

  • Implement key governance processes
  • Automate data layer governance
  • Increase data quality

Hi everyone, thanks for being here today! My name is Aunica Vilorio, I’m the partner marketing specialist here at ObservePoint, and I’ll be hosting today’s webinar, “How to Ensure a Healthy Data Layer: Data Layer Governance” with Rob English, Senior Implementation Specialist & GTM Practice Lead at Napkyn Analytics, and Jordan Hammond, Data Governance Consultant at ObservePoint.

Rob English is an experienced Analytics Implementation Specialist, having worked in the digital industry for over a decade. Serving as Napkyn's GTM Practice Lead, Rob is at the leading edge of Google Analytics and data layer deployment. 

Jordan Hammond is a data governance consultant responsible for world-class enterprises like Comcast, Harley-Davidson, and Texas Instruments. With a background in communications, Jordan has nourished his passion for software by spending time working for tech companies like Mozenda, NUVI, and ObservePoint.

With that, we’ll turn it over to Rob English. Rob. 

Okay, so before I dive into the governance side of things, I'm going to go back to the very basics of what a data layer is. Many of us on this call are probably very familiar with the data layer, but this is for anybody who may just be learning about it or may want to know more. The data layer is basically just a JavaScript object that sits on your website, and it becomes a unified structure or repository for all of the data that you're going to be using for various tracking tools like Google Analytics, Facebook tagging, or AdWords tagging. So it unifies all of the data on your site that would normally be scattered through different variables or cookies or directly in the presentation layer itself, and it puts it in a nice organized structure that's free and clear of your markup.
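To make that description concrete, here is a minimal sketch of what such an object might look like. It follows the common convention of an array that the page pushes entries onto; the field names (pageType, userType, productId) and the helper latestValue are hypothetical and not taken from the webinar.

```typescript
// Hypothetical shape of one data layer entry; every field name is illustrative.
interface DataLayerEntry {
  event?: string;
  pageType?: string;
  userType?: "logged in" | "logged out";
  productId?: string;
}

// In a browser this would usually live on `window`; a local array keeps the sketch runnable anywhere.
const dataLayer: DataLayerEntry[] = [];

// The page publishes its data once, independent of the markup.
dataLayer.push({ pageType: "product detail", userType: "logged in", productId: "SKU-123" });

// Later interactions are appended as events.
dataLayer.push({ event: "addToCart", productId: "SKU-123" });

// Any tag can read the most recent value of a key instead of scraping the page.
function latestValue<K extends keyof DataLayerEntry>(
  layer: DataLayerEntry[],
  key: K
): DataLayerEntry[K] | undefined {
  for (const entry of [...layer].reverse()) {
    const value = entry[key];
    if (value !== undefined) return value;
  }
  return undefined;
}
```

Every tag that needs the page type would call something like latestValue(dataLayer, "pageType") rather than reading it from the page markup, which is the "single source" idea described above.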

So why is the data layer so important? First off, from a consistency and accuracy perspective, it unifies all of these data sources and allows you to put it all in one place, and that becomes the source of truth for everything that you plug into it. So rather than having to constantly scrape your page for values to send to tagging, you can now just reference that same variable from the data layer for all of your tagging.

And then from a governance perspective as well, just having to keep an eye on that one unified set of data rather than having to constantly keep up with what's going on in all these separate data sources is much, much easier in the long term. 

Scalability-wise, it's also very easy to scale up or back depending on what your needs are, just by adding a couple of key-value pairs or events, or anything like that, to support new tracking. Or if you find that you have old tracking you no longer need, you can always remove it.

And then finally, this sort of encompasses all of those previous points, but I can't stress it enough: because it is a single source of data removed from the presentation layer of the site, it's a much better solution than other more commonly used methods, such as page scraping. Some of the problems with page scraping are that you have to account for different sources of the same piece of data across different pages, you have to start pulling it in from separate variables, and it can very easily get broken by a couple of style changes on your website. So by passing it all to the data layer, you remove that presentation layer entirely and you have this separate repository that everything can rely on.

So why is data layer governance so important? 

First off, from a data quality perspective, because it is the single source for everything that's going to plug into it, if you get it wrong once, that trickles down into all of your separate tracking suites. So you'll start to see inconsistency in Google Analytics. You can start to see it in Facebook or Criteo. And the opposite is that if you get it right in your data layer and all of these tools reference that one point, you can be more confident that your data is as accurate as possible.

From a visibility perspective, it makes it much easier to understand what you're collecting, where it's being collected, how it's being collected, and why you're collecting it that way.

And then from a compliance standpoint, this is super important, especially with new legislation like GDPR or CCPA, particularly when you start getting into the documentation end of things, because you start to identify the potential areas you need to focus on in order to comply with this legislation. And as that legislation changes or new laws get added, you can easily go back to that documentation and understand any changes you need to make to maintain compliance with those new regulations.

So next I'm going to dive into the data layer governance process. Generally you're probably going to be in one of two spots in the governance process. You might be at the point where you are just defining what the architecture of the data layer is going to look like and you want to implement a governance process right off the bat. Or you may be at a point where you already have a data layer, but you really want to go back and revisit that governance process to make sure that you are maintaining data quality and accuracy.

So if you're just starting out, one of the first things I'd want to answer is who's going to be responsible for what in this entire process. This is something that's probably not going to be done in a siloed manner; it is going to encompass various departments within your company, and it could include vendors from outside of your organization. So for example, your IT department or an external development company might be responsible for building out your data layer architecture on your site. And it might be your advertising or marketing department that drives what it is you want to know, and so what you need to expose in your data layer.

And then answer a few questions surrounding what you hope to do with your data layer. So is there a particular set of events or reporting that you're looking to get an answer on?

And then also, and this one you might not be able to answer fully right from the start, what will you be using the data layer for? So what analytics solutions are you going to plug into it? Are there any marketing pixels that are going to be included? You'll probably have a small set to start with and grow out from there. But even if you can identify the basics of what you're doing with it now, this will help you to identify what you need to expose in your data layer long term.

And then if you already have a data layer in place, you've probably done those last three steps. If you haven't, I do recommend that you identify and answer those questions at some point. But if you have, then in order to make sure your data layer is up to date and as governed as possible, I recommend auditing the state of your data layer: really diving into what it's doing and where it's doing it on your site, and documenting that state. As you go through that process, you're probably going to identify areas where you need to optimize it. Make those optimizations and then re-document, so that you always have an up-to-date state of what's going on. And then long term, and Jordan will jump into this later on in this webinar, just monitor your data layer and tagging.

So from the audit perspective, the general approach that Napkyn likes to take when we work with clients on auditing their data layers is that we don't just focus on the data layer; we audit the tech stack as it applies to their analytics tracking as a whole. So we look at what's happening on a page-type-by-page-type basis and what values are being passed in the data layer, but also what tags plug into your data layer and at what point they're being triggered. Are they passing any custom variables? We're really identifying how your data layer is going to be used and what values your tagging is going to take from it. It's really important to know not only what's in the data layer, but also how your data layer powers your analytics.

And then when it comes to documentation surrounding the data layer, this is something I usually refer to as a data dictionary. The reason is that it basically becomes a variable-by-variable description of what your data layer is. So I'll identify every key-value pair that gets populated in your data layer at some point on your site and what it should be collecting. That can be as generic as something like a product ID, where it could be collecting thousands of values and you just provide a description of what you expect to see there. You may also run into data where you only expect one or two values. So you could get as specific in your documentation as, for example, a user type, where you say this should only collect logged in or logged out. And then identify on a page type basis how and where it should be seen. When I say page type basis, I mean breaking your site down into, like, this is a homepage, this is a product listing page, this is a product detail page, and then explaining what should be seen across those page types. Repeating this process at the event level is also valuable, as you can start to define: I expect to see this event triggered when I do this on my webpage, and these are the data values that I expect to see fired with it.
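A data dictionary like the one described above can also be kept in a machine-readable form, so the same document serves as a checklist. The following is only a sketch under assumptions: the keys, descriptions, page type names, and the checkPage helper are hypothetical examples, not Napkyn's actual format.

```typescript
// One data dictionary entry per data layer key. All names and values are hypothetical.
interface DictionaryEntry {
  description: string;
  // If present, the key may only take one of these values.
  allowedValues?: string[];
  // Page types on which the key is expected to be populated.
  expectedOn: string[];
}

const dataDictionary: Record<string, DictionaryEntry> = {
  productId: {
    description: "Identifier of the product being viewed; thousands of possible values.",
    expectedOn: ["product detail"],
  },
  userType: {
    description: "Whether the visitor is authenticated.",
    allowedValues: ["logged in", "logged out"],
    expectedOn: ["home", "product listing", "product detail"],
  },
};

// Compare one page's data layer snapshot against the dictionary and list the problems.
function checkPage(pageType: string, snapshot: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(dataDictionary)) {
    const entry = dataDictionary[key];
    const value = snapshot[key];
    if (value === undefined) {
      if (entry.expectedOn.indexOf(pageType) !== -1) {
        problems.push(`${key}: missing on ${pageType}`);
      }
      continue;
    }
    if (entry.allowedValues && entry.allowedValues.indexOf(value) === -1) {
      problems.push(`${key}: unexpected value "${value}"`);
    }
  }
  return problems;
}
```

For instance, checkPage("product detail", { userType: "guest" }) would report both a missing productId and an unexpected userType, while a home page carrying only userType: "logged in" would pass. The same structure could be repeated per event to cover the event-level documentation.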

So why do I want to do this? One of the main reasons is visibility across your organization, and another is changes to your team. People can get promotions, they can change teams, they can leave the organization, and you might have vendors involved in this process that change over time. You really don't want the knowledge about your data layer being siloed in a way that, if changes do happen, you don't have a running state of your data layer as it stands and have to revisit auditing it every time you want to know what's going on. So this will allow you to keep documentation that's there from the start through all those changes, that team members and different members across your organization can go back to and get a sense of what your data layer is being used for and what you should expect to see across the different page types or events.

So as Rob said, he kind of touched on the importance of the data layer, what it is, where you can find it, and some of the key things you can look for. The first thing I want to focus on is taking a step back for those who might not know where the data layer is. A lot of times clients that I work with will say, "I don't know where our data layer is. I can't find it." So in the first couple of slides, I'm just going to highlight how you can find it on your webpage. And it's very simple.

So for anyone familiar with Chrome or any other web browser, you can go into the dev tools and search for page or page name. You can also search for universal data object, UDO, or just look for data and type in the name of the data layer. And if you're on your URL, you'll be able to identify where your data layer is on the webpage, and where information is being collected and stored based on any type of events that are firing on the page.
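The manual search described above can be approximated with a small helper that checks a page's global object for names that data layers are commonly published under. This is a sketch under assumptions: the list of names (dataLayer, utag_data, digitalData) reflects common defaults rather than anything shown in the webinar, a site may rename its data layer, and findDataLayers is a hypothetical helper.

```typescript
// Global names that data layers are often published under. This list is an assumption
// for illustration; your site may use a different or renamed object.
const candidateNames = ["dataLayer", "utag_data", "digitalData"];

// Return the candidate names that exist as objects on the given global object (e.g. `window`).
function findDataLayers(globalObject: Record<string, unknown>): string[] {
  return candidateNames.filter((name) => {
    const value = globalObject[name];
    return typeof value === "object" && value !== null;
  });
}

// A stand-in for `window` so the sketch runs outside a browser.
const fakeWindow: Record<string, unknown> = {
  dataLayer: [{ pageType: "home" }],
  document: {},
};

const found = findDataLayers(fakeWindow);
```

In a real browser session you would pass window (cast to a plain record) instead of fakeWindow; an empty result only means none of the assumed names are in use, not that the page has no data layer.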

So there are a lot of different types of technologies that will use the data layer: Tealium, Adobe, Ensighten, Signal, and Google. There are a lot of different data layer names that might come from those. A data layer is not going to have a universal name, though. Each of them is going to have its own unique name, and it can be altered. So typically with Tealium you'll see its own name, and with Adobe you'll see _satellite.dataelements. And these will typically be used to find and identify what's being stored on your page.

Here are a few examples that I will typically share with clients so they understand that most websites have a data layer. One of the key takeaways I want to start with is that this is a very valuable piece of information. Because, kind of like Rob suggested, this is going to be the foundation of how you identify the different tags on your page that are collecting information that might be similar to your blueprint, which is the data layer.

So here is an example of something you might see when you're searching for the data layer. You'll see that there are different variables, like the site country, domain language, the page category, the site currency, the site region, and the page type. These different variables will be collected within the data layer, so if anything goes wrong with your Adobe Analytics or Google Universal Analytics and the data is wrong, you can always reference back to the data layer. So it's very important that we put this into practice, and I'm seeing with more of the clients I work with that this has become a much bigger topic with a lot of the things going on with GDPR and CCPA. You want to make sure your data is correct and you know what's being pulled in.

So within ObservePoint, we try to emphasize that this is a continuous process. You don't upload your data layer and then just leave it be. You have Adobe Analytics on your site, and change is going to happen. It's going to be frequent. It's going to be coming from different parties that are working on the website, so there are bound to be mistakes, and this is the whole mission of ObservePoint. We want to catch those mistakes so we can assure that data accuracy is 100%. We'll try to help catch all the different things that are happening on the website through the automation that we have within the tool, but the data layer is a key thing that we need to be tracking. So again, every time the website has an update, there's a potential break in the data analytics.

We'll want to use that data layer to diagnose those problems. So typically what I'll see, as an example, is someone tracking different variables in Adobe Analytics that would be representative of something in the data layer as well. Adobe Analytics has the different eVars and props that are collecting data on a page. So let's say one of those props or one of those eVars starts missing data, or it's inaccurate, it's not the right data. A lot of times people might not have a reference sheet to say, "Oh, what is it supposed to be?" Typically, clients will refer back to the data layer, because the data layer is like that blueprint for what you should expect on your website.

So I like the example of the car model. There's a car that's new, but there are a lot of different features that are added to it, and things will change across those different models. If there's a recall, they'll look back at the blueprint and say, "Okay, this is what went wrong with this model of this car. Let's go and fix that." And so they'll take all those different models that had been altered, just like Adobe Analytics has been altered from the data layer, and reference back to the blueprint, the data layer, and go and fix that. And ultimately this is the key: the key is consistency. There are different levels of validation, but you want to make sure it's consistent through all those different levels, from the data layer, to the tag manager, to the analytics itself, to those different vendors. That way there's consistency between all three or four different levels of reporting, and you can identify any problems and know exactly how they need to be fixed.

So how do we do that? There are different measurements that we take in ObservePoint that allow you to monitor the data layer like we would with any other vendor or tag. So right here at the very top, it's a small screenshot, but I set up a scan that would scan through a hundred different pages on a specific domain. So, for example, we scan a hundred pages on that domain and we're checking the data layer. It's going to look at all the different page load events that are firing on all 100 of those pages. It's going to pull in all the different network requests and see, "Okay, this is what's happening." And then we also have the ability to pull the data layer as well. So even outside of the different vendors, we can look at the data layer and see, "Okay, this is what's on each page." And then you can monitor different types of variables or key indicators from the data layer to validate, and create alerts based on the different parameters that you've set with your team, the marketing team, to understand, okay, this is what we need to be tracking, and if anything goes wrong here, we're alerted right away.

So we want to be able to automate this process. We don't want to have to go and spot-check something that's wrong, or only go in and check when we're starting to miss data. We want to be more strategic about our approach to validating our data layer. So typically we'll create those alerts or rules that back those expectations and say, "Hey, on this page, on this data layer, I'm looking for these 10 variables that should be consistent throughout all my different pages." Whether you're scanning in the audit, looking at page load events, or looking at click events, drop-down menus, form fills, and confirmation pages, you're going to want to keep consistent track of everything going on on your page. So typically clients will set up alerts around the data layer based on what's expected in their Adobe Analytics, their Google Universal Analytics, or any other vendor. They reference the two together to make it a lot more seamless.
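The idea of rules that raise alerts across many pages can be sketched generically. To be clear, this is not ObservePoint's API or rule format; the types, the runRules function, the variable names, and the sample pages are all hypothetical, and capturing the snapshots themselves is assumed to happen elsewhere.

```typescript
// A snapshot of the data layer captured on one page. How it is captured is out of scope here.
interface PageSnapshot {
  url: string;
  dataLayer: Record<string, string>;
}

// A rule: the named variable must be present and, optionally, match a pattern.
interface Rule {
  variable: string;
  pattern?: RegExp;
}

interface Alert {
  url: string;
  variable: string;
  reason: string;
}

// Apply every rule to every page and collect an alert for each failure.
function runRules(pages: PageSnapshot[], rules: Rule[]): Alert[] {
  const alerts: Alert[] = [];
  for (const page of pages) {
    for (const rule of rules) {
      const value = page.dataLayer[rule.variable];
      if (value === undefined || value === "") {
        alerts.push({ url: page.url, variable: rule.variable, reason: "missing" });
      } else if (rule.pattern && !rule.pattern.test(value)) {
        alerts.push({ url: page.url, variable: rule.variable, reason: `unexpected value "${value}"` });
      }
    }
  }
  return alerts;
}

// Hypothetical rules and pages for illustration.
const sampleRules: Rule[] = [
  { variable: "pageType" },
  { variable: "siteCurrency", pattern: /^[A-Z]{3}$/ },
];

const samplePages: PageSnapshot[] = [
  { url: "https://example.com/", dataLayer: { pageType: "home", siteCurrency: "USD" } },
  { url: "https://example.com/cart", dataLayer: { pageType: "cart", siteCurrency: "usd" } },
  { url: "https://example.com/contact", dataLayer: { siteCurrency: "USD" } },
];

const sampleAlerts = runRules(samplePages, sampleRules);
```

With these samples, the cart page is flagged for a lowercase currency code and the contact page for a missing pageType. Running the same rules on a schedule, rather than spot-checking, is what turns this into the automated governance described above.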

So I already kind of mentioned some of the benefits of the data layer, but I just wanted to dive into this more. It really boils down to this: you're going to have higher confidence in the data that is powering your decisions, based on any type of marketing or advertising that might be present on your site. So it's really key. It's important that we have this. It's going to set you up for success if you have that application layer, the tag manager, the data layer, and the website. We have all those different layers that need to be consistent throughout the entire process. So we can do different types of comparisons. Let's say, "Hey, I want to check the data layer against the application layer." So against Google Analytics or Adobe Analytics, I want to compare the two and make sure that both of them are representative of the data that should be collected, so they're consistent between the two.
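A comparison between the data layer and what an analytics tag actually sent could look roughly like the following sketch. The mapping from data layer keys to analytics parameters (prop1, eVar5) is invented for illustration; a real mapping would come from your own implementation documentation, and compareLayers is a hypothetical helper rather than part of any vendor tool.

```typescript
// Which analytics parameter should carry which data layer key. The mapping below is
// hypothetical; a real one comes from your own implementation documentation.
const mapping: Record<string, string> = {
  pageType: "prop1",
  siteCurrency: "eVar5",
};

interface Mismatch {
  key: string;
  param: string;
  expected: string | undefined;
  actual: string | undefined;
}

// Report every mapped key whose data layer value differs from what the tag sent.
function compareLayers(
  layerValues: Record<string, string>,
  tagParams: Record<string, string>
): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const key of Object.keys(mapping)) {
    const param = mapping[key];
    const expected = layerValues[key];
    const actual = tagParams[param];
    if (expected !== actual) mismatches.push({ key, param, expected, actual });
  }
  return mismatches;
}
```

If the data layer says siteCurrency is "USD" but the captured request carries eVar5 as "CAD", the function returns a single mismatch naming both sides, which points at where between the data layer, the tag manager, and the analytics tool the value changed.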

There's a lot more visibility long term across your different domains, organizations, and websites into how this can be used. We're able to pull all those requests, and we're able to pull the data layer on all the different events that might be firing on the page, to validate what's expected and what might go wrong. That way we can be alerted right away, and this is easily scalable. We can scan hundreds of pages at a time and create different types of critical flows on a website, whether that's a purchase process, signing up for a newsletter, or a "contact us" dropdown page. We can scale this across all the different types of events firing on a website, whether that's page load or any of the others I mentioned, any click events.

So one of the key takeaways from what Rob and I have spoken about is that data layer governance is vital. This is something I've seen personally with a lot of the different clients I work with, the ones mentioned at the beginning and others outside of those: this is becoming a much bigger topic. So if you're not doing anything now, go and speak to your representatives, your partners, your marketing team, and your analytics team to talk about, "Are we tracking our data layer? Are we validating it to make sure we're pulling the right information?" If you're working with ObservePoint, talk to someone at ObservePoint about this, and they'll be able to explain how you can be successful using ObservePoint to validate your data layer.

And then documentation: knowing what your data layer is collecting. This is a huge, huge thing. A lot of times you'll hear something like, "Yeah, we're just collecting a bunch of normal information when they visit the site, what's the day," but with a lot of the things going on with GDPR and CCPA, a lot of security type things, you want to know exactly what you're collecting. That way you can feel confident that you're being responsible with the different data and personal information that might be collected.

And then finally, automating your data layer QA will ensure that your data is accurate, and that will affect the reporting you get in the end. Being able to scale this and automate this process will not only save you time, but it will also give you higher confidence in the data you're collecting, making sure it's accurate and representative of your key performance indicators. It also lets you be assured that the data isn't only being collected correctly in your data layer, but that what's in your tag manager and in your analytics tools like Adobe is representative of your data layer, so that all the data is accurate. And then you can always reference back to that. Those are some of the key takeaways. There's a lot more we can discuss, but data layer governance is a vital piece of your data governance on your website as a whole.
