Team Demystified - Debugging Code, People, and Habits: Tips for Better Data Quality Management

November 22, 2016

Slide 1:


We’re all so glad to be here today. A virtual summit is a really innovative format, and we’ve heard a lot of great presentations today and are looking forward to more. Thank you, everyone, for attending our session. We are going to talk to you guys about data quality management in terms of debugging the people, the collection, and the technology.

My name is Elizabeth Eckels, but you can call me Smalls; that’s how I’m better known within the industry. I’ve been working with Team Demystified for a few years now, and I’m really quite happy to be under the Demystified umbrella.

Slide 2:

Let’s go ahead and move to the next slide. Today we’re going to address three elements of data quality: collection, people, and technology. All three are crucial to being successful in this realm. So what do we mean by collection?

Slide 3:

Collection is about your tracking code configuration and the processes you use to collect the actual data.

Slide 4:

On the next slide we talk about people. By people we mean not just yourself, and not only the people who work on the data, but everyone who touches the data collection processes or who interprets and manipulates the data and pulls the reports. It’s really everyone involved in or affected by the data that you track.

Slide 5:

The last part is technology. Don’t think of this as the systems and tools for data collection; it’s the systems and tools you use to evaluate and monitor the quality of the data itself. We’ll be talking to you guys about these three umbrellas of data quality management.

Slide 6:

To get started, we will talk about tracking for debugging. By tracking for debugging, I mean intentionally collecting data related to your QA processes that will help you debug your implementations.

Slide 7:

The seven data points listed here are the ones I recommend as a standard, and I’ll give you an example of each. You can definitely track more than these, and you don’t have to use all of them; I’m just going to share some scenarios where I find them useful. Also, as you track these, you don’t have to give each one its own variable. If you have smaller clients, possibly using the free version of GA, you are limited in the number of variables you can collect, so you can concatenate them. Just keep in mind that you don’t want the collected values to get too long: you won’t want to concatenate the raw URL with anything, but you can definitely concatenate shorter values without ending up with truncated values or rejected server calls.
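As a rough illustration of the concatenation tip, here is a minimal Python sketch that joins several short debugging values into one variable and refuses to produce a value that is too long. The 150-byte limit is an assumption based on Universal Analytics custom dimensions, and the pipe separator and example values are hypothetical; check the limit for the variable type you actually use.

```python
# Hypothetical limit: Universal Analytics custom dimension values were capped
# at 150 bytes; check the limit for the variable type you actually use.
MAX_BYTES = 150
SEPARATOR = "|"  # pick a character that never appears in the values themselves


def build_debug_value(fields, max_bytes=MAX_BYTES, sep=SEPARATOR):
    """Join short debugging values into one string, refusing values that are too long."""
    value = sep.join(str(field) for field in fields)
    size = len(value.encode("utf-8"))  # limits are in bytes, not characters
    if size > max_bytes:
        raise ValueError(f"debug value is {size} bytes; limit is {max_bytes}")
    return value


def split_debug_value(value, sep=SEPARATOR):
    """Reverse the concatenation when you analyze the exported data."""
    return value.split(sep)


# Example: hostname, container version, environment and login status in one slot.
print(build_debug_value(["www.example.com", "34", "production", "logged_in"]))
```

Raising an error, rather than silently cutting the string, is a deliberate choice here: a truncated debug value is exactly the kind of quiet data quality problem these variables are meant to catch.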

Slide 8:

Raw URL is our first example. I really like this one because collecting it helps me track the functionality of all of my filters and processing rules, and it lets me check that my admin settings are correct. In this example, a user came to me and said, “Wow, my blog is getting around 40 thousand page views for this period. What’s going on? Am I awesome?” Then we broke it down by raw URL and hostname, and we were able to see that the page name was not set correctly: it was being set identically for two separate URLs and two different user experiences. We were able to fix that.

Also, on line six down there, the third white line item, we figured out that the hostname was correct but the page name was unspecified, because a setting in our tag management configuration overwrote our page name whenever a GCLID was in the raw URL. So we were able to fix that as well. Raw URL is a nice way to see exactly which URL is causing a problem, and it helps you hypothesize about and solve for whatever could be contributing to the issues you have.
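Both problems in this example can be surfaced from an export of hits that includes the raw URL. The Python sketch below assumes each exported hit is a dict with hypothetical `page_name` and `raw_url` keys, and treats a few hypothetical placeholder strings as "unset"; it lists page names shared by several distinct URLs and URLs whose page name is missing.

```python
from collections import defaultdict

UNSET = {"", "(not set)", "unspecified"}  # hypothetical placeholder values


def page_names_with_many_urls(hits):
    """Return page names recorded on more than one distinct raw URL."""
    urls_by_name = defaultdict(set)
    for hit in hits:
        urls_by_name[hit["page_name"]].add(hit["raw_url"])
    return {name: sorted(urls) for name, urls in urls_by_name.items() if len(urls) > 1}


def urls_with_unset_page_name(hits):
    """Return raw URLs whose page name is missing, e.g. ones carrying a gclid."""
    return sorted({hit["raw_url"] for hit in hits if hit["page_name"] in UNSET})


hits = [
    {"page_name": "blog", "raw_url": "https://www.example.com/blog/"},
    {"page_name": "blog", "raw_url": "https://www.example.com/blog/post-1"},
    {"page_name": "about", "raw_url": "https://www.example.com/about"},
    {"page_name": "(not set)", "raw_url": "https://www.example.com/?gclid=abc123"},
]
print(page_names_with_many_urls(hits))
print(urls_with_unset_page_name(hits))
```

Some page names legitimately cover several URLs (query strings, for instance), so treat the first list as candidates to review rather than confirmed defects.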

Slide 9:

Along the same lines as raw URL, there’s the hostname. I showed it in my previous example, but there are additional ways to use the hostname as a debugging value by collecting it in its own variable. I know a lot of you are already thinking, “Well, the hostname is already there.” The idea is that it isn’t available by default in every report. When you put it into a custom variable, you can use it in far more reports than the ones GA or Adobe allow by default. That’s one of the biggest reasons I collect the hostname. You can also use it with alerts to make sure you are tracking only on the domains and hosts you want to. Sometimes you don’t want to track any subdomains, or you want to make sure the traffic you’re getting is actually coming from your own host. And, as in the previous example, you can use it to check that your default pages are being set correctly.
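An alert on unexpected hosts boils down to comparing collected hostnames against a list of hosts you expect. This Python sketch assumes you have the raw URLs of recent hits and a hypothetical allowlist; it counts hits from any host not on the list, which you could then feed into whatever alerting you already use.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical allowlist: the hosts you actually intend to track.
ALLOWED_HOSTS = {"www.example.com", "shop.example.com"}


def unexpected_hosts(raw_urls, allowed=ALLOWED_HOSTS):
    """Count hits per hostname, keeping only hosts that are not on the allowlist."""
    # urlsplit(...).hostname is lowercased, so "WWW.example.com" matches the allowlist.
    hosts = Counter(urlsplit(url).hostname for url in raw_urls)
    return {host: count for host, count in hosts.items() if host not in allowed}


urls = [
    "https://www.example.com/",
    "https://staging.example.com/checkout",
    "https://WWW.example.com/about",
    "https://translate.googleusercontent.com/translate_c?u=www.example.com",
]
print(unexpected_hosts(urls))
```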

Slide 10:

The third one is the tag management container version. I like this one because when I see something odd in my implementation, I can immediately break it down by version and know whether it’s a current problem or a historical one that has since been fixed. This is a big time saver: if someone notices that we started seeing some strange values last Monday, instead of spending all your time tracking what’s been going on since then, you can break it down by container version and say, “Oh, that was an old version of our settings, and I can see the problem isn’t happening in the newest versions.” I really like that.

Another reason I use this one is that it makes it easy to filter out container preview traffic. It’s also a useful debugging variable in another case: in my example, version 34 was published but only has 3 hits. I could probably ignore it, but I want to at least follow up and figure out why it was published. Talking with my client, I found out that they had accidentally launched it for a couple of seconds. So now I knew when someone was messing with our container publishing.
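Both uses come down to counting hits per container version. A minimal Python sketch, assuming you have the collected version value for each hit: the `min_hits` threshold is hypothetical, and the `QUICK_PREVIEW` label is an assumption about what a GTM container reports in preview mode, so confirm the value your own setup sends.

```python
from collections import Counter


def review_container_versions(versions, min_hits=10, preview_label="QUICK_PREVIEW"):
    """Summarize hits per container version, separating out preview traffic and
    flagging versions with suspiciously few hits (e.g. an accidental publish)."""
    counts = Counter(versions)
    preview_hits = counts.pop(preview_label, 0)  # exclude preview from the review
    low_volume = sorted(v for v, n in counts.items() if n < min_hits)
    return {"counts": dict(counts), "preview_hits": preview_hits, "low_volume": low_volume}


versions = ["33"] * 500 + ["34"] * 3 + ["35"] * 800 + ["QUICK_PREVIEW"] * 12
print(review_container_versions(versions))
```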

Slide 11:

Site environment is one that I find very valuable as well. You can track this by setting your own variables or filters to distinguish development, UAT, and production environments. However, I prefer that developers set it in the data layer, because they tend to have better ways of knowing whether it’s a development, staging, or production environment. I most commonly use it to include or exclude production data. I can also use it to see whether values are being set differently by environment. Sometimes your databases in dev, staging, and production don’t quite line up the way you think they should, which can affect the values set in the data layer, so this is a great way to find out whether something is a production issue or not. Last, in the example I’m showing, I can see staging traffic coming through my main production site when it shouldn’t be. I need to double-check my filters and figure out why this could be happening.
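The last check in this example, staging traffic showing up in production data, can be scripted once the environment value is collected. This Python sketch assumes exported hits carry hypothetical `site_environment` and `hostname` fields and that production is labeled `production`; hits with no environment value are reported as `unset`, since a missing value is itself worth investigating.

```python
from collections import Counter


def non_production_hits(hits, production_label="production"):
    """Count hits in a production property that were not tagged as production,
    grouped by (environment, hostname). Missing values show up as 'unset'."""
    return Counter(
        (hit.get("site_environment") or "unset", hit.get("hostname") or "unset")
        for hit in hits
        if hit.get("site_environment") != production_label
    )


hits = [
    {"site_environment": "production", "hostname": "www.example.com"},
    {"site_environment": "staging", "hostname": "staging.example.com"},
    {"site_environment": "staging", "hostname": "staging.example.com"},
    {"hostname": "www.example.com"},  # environment never set in the data layer
]
print(non_production_hits(hits))
```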

Slide 12:

The next one I use is server time. I don’t use this one often, but when I do, it is super valuable, mainly for understanding the order in which hits fire. I like to use it in conjunction with segments. Whenever I build a sequence like “A happened, then B, then C” and the filter or segment isn’t showing what I expect, I can go back to the data and check: is it firing in the order A, B, C, or is there possibly a timing issue where it fires A, C, B and I didn’t know it? I also like it because you can pinpoint the timing of changes in variables or of site launches. I’m sure we’ve all had a client launch something without telling us, or launch early when we weren’t aware. Server time helps there as well.
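The A, B, C check can be expressed as a subsequence test on hits sorted by server time. In this Python sketch, each hit is assumed to be a dict with hypothetical `step` and `server_time` keys (server time as a sortable number, such as Unix milliseconds); other hits may occur between the expected steps, as in a sequential segment.

```python
def order_by_server_time(hits):
    """Return the step labels sorted by server timestamp (ties keep input order)."""
    return [hit["step"] for hit in sorted(hits, key=lambda hit: hit["server_time"])]


def fired_in_order(hits, expected):
    """True if the steps in `expected` occur in that order by server time.
    Other hits may appear in between, as in a sequential segment."""
    position = 0
    for step in order_by_server_time(hits):
        if position < len(expected) and step == expected[position]:
            position += 1
    return position == len(expected)


# A timing issue: B actually reached the server after C.
hits = [
    {"step": "A", "server_time": 1000},
    {"step": "C", "server_time": 1001},
    {"step": "B", "server_time": 1002},
]
print(order_by_server_time(hits), fired_in_order(hits, ["A", "B", "C"]))
```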

Slide 13:

The next one is page template. I use page template to find out whether a specific template is causing issues. In this example, I concatenated page type and title into one custom dimension in GA; you can do the same thing in Adobe with an s.prop. When my client and I were looking into specific issues, we found that a certain page type was not configured as we intended, because it had the exact same page type name as a different template. These should have been landing one and landing two; instead, all seven of these pages were being categorized under the same landing page type even though they’re not the same. That really helped us identify the problem. Keep in mind that while this one and the next one are very useful debugging variables, they can also be very useful to your marketing analyst or site analyst teams, who can break down behavior by page template. Hopefully you see different behaviors by page template, and you could even test two different templates and see whether behavior changes.

Slide 14:

Last, but definitely not least, is user login status. I like to use this one to make sure my values are being set consistently whether or not the user is logged in. Again, there’s a useful business case here: when you segment by user type, you will almost always see that logged-in users behave very differently from users who are not logged in. So it’s useful to both technical and business analysts. The example here shows that I can verify that the values are the same whether or not the user is logged in, which is what we expect in this case.
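The consistency check in this example can be sketched in Python as follows, assuming each exported hit has a hypothetical `page` key, a `login_status` of `logged_in` or `logged_out`, and the variable you want to compare. It reports pages where logged-in and logged-out hits recorded different values for that variable.

```python
from collections import defaultdict


def inconsistent_by_login(hits, variable):
    """For each page, compare the values of `variable` recorded for logged-in and
    logged-out hits; return the pages where the two sets of values differ."""
    seen = defaultdict(lambda: defaultdict(set))
    for hit in hits:
        seen[hit["page"]][hit["login_status"]].add(hit.get(variable) or "(not set)")
    result = {}
    for page, by_status in seen.items():
        logged_in, logged_out = by_status.get("logged_in"), by_status.get("logged_out")
        if logged_in and logged_out and logged_in != logged_out:
            result[page] = {"logged_in": sorted(logged_in), "logged_out": sorted(logged_out)}
    return result


hits = [
    {"page": "home", "login_status": "logged_in", "page_type": "home"},
    {"page": "home", "login_status": "logged_out", "page_type": "home"},
    {"page": "product", "login_status": "logged_in", "page_type": "product"},
    {"page": "product", "login_status": "logged_out"},  # value missing when logged out
]
print(inconsistent_by_login(hits, "page_type"))
```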

Hopefully you found those seven data points valuable and consider using them for debugging to manage your data quality. With that, I’ll pass it along and we’ll talk more about auditing with tools.

Slide 15:


Thanks Smalls.

Slide 16:

My name is Nico Miceli. I’m a technical analytics consultant at Team Demystified. Today I’m going to talk to you about one specific tool that I use all the time. It can seem daunting, but it is very robust, it suits many kinds of debugging, and it’s already on your computer right now.

Slide 17:

It’s Chrome DevTools. Chrome DevTools is a complete development environment built into the Chrome browser. It lets you run lots of powerful tests, check how things are firing and working, inspect HTML elements, view the DOM, debug JavaScript or get notified of JavaScript errors, and view network performance and how things are loading. To open it, you can use any of the methods listed here. The thing I’m going to talk about specifically today is the Network tab.

Slide 18:

There are other features and other ways to use this tool, but I feel the Network tab is commonly overlooked. It shows you what’s happening on your site and what’s firing. All of these network requests are the various things that make your site go: images that get loaded, scripts that run, and all the other code. All mainstream analytics tracking is built in JavaScript too. Google Analytics, Adobe Analytics, and the rest use JavaScript to send requests to their servers, and that’s how they collect data. Before we get into that, I want to explain a couple of things you should know about. First, right here, every request is laid out in a list, and you can filter them right up here.

You can type in keywords if you know what’s going to be in the request; I’ll show you an example of that shortly. The next one is Preserve log. This keeps all the network requests as you click through different pages and as pages reload. Right here is something that has actually tripped me up a couple of times: these are filters for the different types of network requests. Most of the time, if you’re an analyst or doing technical analytics QA, you won’t need most of these; what you’re looking for is usually under Img or All. If you can’t find what you’re looking for, make sure the filter isn’t set to Doc or CSS, because your analytics requests won’t show up under those.

Slide 19:

When you click on one, you’ll see the data that’s actually being sent to the Google or Adobe servers. Right here is the full request URL; if you’re not in this view, just click Headers and it will show you everything. Right here is the status code, 200, which basically means the request fired correctly. If the status shows in yellow or red, you’ll see other codes. At the bottom, you’ll see all of the query parameters, broken down by the information being sent to Google Analytics. This example was taken on Demystified’s site, and if you look at what’s being sent, you can see the page URL, the language I’m viewing it in, the page title, the version of Universal Analytics, and the tid, which is the tracking ID.
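The query-parameter breakdown that DevTools shows can also be reproduced from a copied request URL, which is handy for comparing hits outside the browser. A minimal Python sketch using only the standard library: the friendly labels cover a handful of Universal Analytics Measurement Protocol parameters, and the sample URL is made up. Hits sent as POST requests carry the same parameters in the request body instead of the URL.

```python
from urllib.parse import urlsplit, parse_qsl

# A few Universal Analytics Measurement Protocol parameters, for readability;
# anything not listed here is passed through under its raw name.
GA_PARAM_NAMES = {
    "v": "protocol version",
    "tid": "tracking ID",
    "cid": "client ID",
    "t": "hit type",
    "dl": "page URL",
    "dt": "page title",
    "ul": "user language",
}


def decode_ga_hit(url):
    """Decode a GA collect request URL into {readable name: decoded value}."""
    query = urlsplit(url).query
    return {GA_PARAM_NAMES.get(key, key): value
            for key, value in parse_qsl(query, keep_blank_values=True)}


hit = ("https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-12345-1"
       "&dl=https%3A%2F%2Fwww.example.com%2F&dt=Home&ul=en-us")
for name, value in decode_ga_hit(hit).items():
    print(f"{name}: {value}")
```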

Slide 20:

Same goes for Adobe.

Slide 19:

For both of these, you can go to the URL listed here to see all the parameter names and what they mean, so you can actually know what every one of these parameters means.

Slide 20:

Like I said, it’s the same with Adobe. You can see the page name equals home, Demystified, along with all the other information. r is the referrer, so I can see that I came from Google to Demystified. You can see whether data is being sent properly at a very granular level without any extra tools, because this is built into Chrome. Tools are great, and Jonas is going to talk about them, but this will also help you keep digging and make sure nothing else is affecting your data.
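The same decoding works for Adobe image requests, where props and eVars travel under short names (`c1` for prop1, `v1` for eVar1) and the report suite ID sits after `/b/ss/` in the path. This Python sketch relies on that naming; the sample request, report suite, and values are hypothetical, and it does not expand dynamic-variable shortcuts such as `D=`.

```python
import re
from urllib.parse import urlsplit, parse_qsl

FRIENDLY = {"r": "referrer", "g": "page URL"}  # a few other short names, for readability


def decode_adobe_hit(url):
    """Decode an Adobe Analytics image request into readable variable names."""
    parts = urlsplit(url)
    # The report suite ID(s) follow /b/ss/ in the request path.
    match = re.search(r"/b/ss/([^/]+)/", parts.path)
    decoded = {"report_suite": match.group(1) if match else None}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if re.fullmatch(r"c\d+", key):      # c1 -> prop1
            key = "prop" + key[1:]
        elif re.fullmatch(r"v\d+", key):    # v1 -> eVar1
            key = "eVar" + key[1:]
        else:
            key = FRIENDLY.get(key, key)
        decoded[key] = value
    return decoded


hit = ("https://example.sc.omtrdc.net/b/ss/examplersid/1/JS-1.6.4/s12345"
       "?AQB=1&pageName=home&g=https%3A%2F%2Fwww.example.com%2F"
       "&r=https%3A%2F%2Fwww.google.com%2F&c1=blog&v1=logged_in&events=event1&AQE=1")
print(decode_adobe_hit(hit))
```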

Slide 21:

There are also a lot of other debuggers that work in the DevTools console, which is another pane on the DevTools screen. These are great to have; they basically print all the information in a more pleasing view. But if you have lots of hits and page views going on, that can get really daunting and confusing.

Slide 22:

That’s why I like to use the Network tab: I can view data requests sequentially and filter in or filter out what I want. For example, say I’m going through an eCommerce site, clicking on things, going through the checkout, and firing off tons of hits. That would be hard to follow in a console log or in other kinds of debuggers.

Slide 23:

So I’ll come in here, and since I know “collect” is in the Google Analytics URL, and I know the t parameter, which means hit type, equals event or pageview, I can type a regular expression starting with “collect.*” and turn on Regex. It will show me all the events being fired, and down here, all the pageviews. It really lets me filter in and out what I want and look at a sequential view of the specific tracking I’m interested in. Next up is Jonas.
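To make the filtering idea concrete outside the browser, here is a Python sketch of a similar regular expression applied to a list of request URLs, such as ones exported from the Network tab. The exact pattern is our assumption, not necessarily what was typed in the demo: it keeps GA `collect` requests whose `t` (hit type) is `event` or `pageview`, in the order they were made.

```python
import re

# collect requests whose hit type parameter t is event or pageview; [?&] ensures we
# match the t parameter itself and not names like tid or dt that merely end in "t".
GA_HIT = re.compile(r"collect.*[?&]t=(event|pageview)(&|$)")


def ga_hits_in_order(request_urls):
    """Keep only matching GA hits, in request order, as (hit type, url) pairs."""
    hits = []
    for url in request_urls:
        match = GA_HIT.search(url)
        if match:
            hits.append((match.group(1), url))
    return hits


requests = [
    "https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-1-1",
    "https://www.example.com/logo.png",
    "https://www.google-analytics.com/collect?v=1&tid=UA-1-1&t=event&ec=checkout",
    "https://www.google-analytics.com/collect?v=1&t=timing&tid=UA-1-1",
]
for hit_type, url in ga_hits_in_order(requests):
    print(hit_type, url)
```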

Slide 24:


Thanks, Nico. Happy to be here. I’m Jonas Newsome with Team Demystified, and I’m honored to be a virtual panelist. Nico has talked a lot about the Network tab in DevTools. I want to talk about some simpler, more user-friendly tools you can also use, which are browser extensions for Chrome or Firefox. We want to cover these because not everyone is as technical, and we want to make this easy and clean.

Slide 25:

On the next slide: we’re not going to cover every one of these, just a few that we’ve found very valuable. We’re going to look at ObservePoint’s extension, Omnibug, Ghostery, and WASP. We won’t cover the GA tagging system, Adobe’s debugger, or others, though we definitely find them valuable. We’re going to cover the ones I use most commonly in my tool belt.

Slide 26:

Those would be Omnibug and ObservePoint. I want to compare and contrast these two and explain what they bring to the table for basic manual auditing, as opposed to what ObservePoint’s full tool can do, which is a comprehensive sweep across many pages. These extensions are more for individual scenarios, for going in and figuring out what’s going on on the fly. Both are platform-agnostic, working across GA, Adobe, and even Kissmetrics or WebTrends. I find Omnibug really useful. It persists and preserves your log across many page views, but compared to ObservePoint it’s a little more buggy sometimes: it will occasionally drop calls, especially with really long URLs. It also mostly sticks to analytics tags, which is good if that’s your focus, but you miss a lot of other types of tags. I would recommend using it in Firefox, which is its native browser and the one its developer uses. There you have more options to highlight different variables and to decide which tools and platforms you want to enable. That’s really great, because if you look at specific variables on a consistent basis, highlighting makes them pop out and speeds you up.

I use ObservePoint almost as often. Its persistence, or log preservation, is a bit more configurable, which I like: you can decide whether it loads just on one page and resets on the next hit, or persists and collects all the hits as you surf. It’s a little less buggy than Omnibug, at least in my experience. It has a much cleaner Excel export, which Cathy will go over. And it has a more comprehensive tag list, covering analytics tags, tag management hits, advertising, data management, social, and optimization testing, so you can get most of the types of tags we’re interested in in this industry easily and cleanly through ObservePoint’s extension. One more note: I do a lot of QA in incognito mode, and both of these extensions work there; you just have to enable them so they’re not blocked in incognito mode.

Slide 27:

I also want to talk about Ghostery and WASP, which I’ve used to help block scripts. When you want to look at all the tags on your page, not in depth but at a high level, these two tools are great. I’ve found Ghostery especially useful when I’m doing optimization testing and there could be caching issues, or when I’m trying to get down to a pure environment. It lets you toggle and say, “I want to block, let’s say, Crazy Egg and Google Analytics right now, and refresh the page,” and it shows you what happens with your tags in that moment.

Same with WASP. It gives you a nice map of how tags are related to each other, which is really cool and interesting. It also lets you block a script and then reload the page to see what would happen if that script weren’t allowed to fire. It comes in handy, especially when you’re trying to figure out why tagging or analytics isn’t firing as you expect. With that, I’ll pass it on to Cathy.

Slide 28:


Thanks Jonas. I’m Cathy Morse, and I’m going to talk to you about analyzing hit data. You’ve seen some great examples of how you can collect this data in your browser.

Slide 29:

Maybe you’ve even seen some of that data copied and pasted into a spreadsheet. I want to give you a tip that makes it a lot easier to see that data in a more organized fashion. Often, when you go into these debuggers or DevTools and copy and paste hits from different pages, the information isn’t aligned, so you can’t easily see which variables are firing across all the pages. If you use the download feature in ObservePoint’s debugger, you’ll see in the example below that it organizes the data in a really nice tabular format, so you can see all the variables and events firing across all your pages in one organized view. You can see whether something is missing, double firing, or not implemented consistently across pages as you would expect. And once it’s in Excel, it’s easy to do all your sorting and filtering to find the information you’re looking for.
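The alignment problem described here is essentially a pivot: every hit becomes a row, and every variable seen on any hit becomes a column, with blanks where a hit did not send it. This Python sketch is our own illustration of that idea, not ObservePoint’s export format; it assumes hits are dicts with a hypothetical `page` key plus their variables, and writes tab-delimited text that Excel can import.

```python
import csv
import io


def hits_to_table(hits):
    """Return TSV text with one row per hit and one column per variable seen
    anywhere, leaving a blank cell where a hit did not send that variable."""
    columns = ["page"] + sorted({key for hit in hits for key in hit if key != "page"})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(hits)
    return out.getvalue()


hits = [
    {"page": "home", "pageName": "home", "events": "event1"},
    {"page": "product", "pageName": "product", "eVar1": "logged_in"},
]
print(hits_to_table(hits))
```

A blank cell in a column that is filled on every other page is usually the first thing worth checking.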

Slide 30:

To collect this data, open the debugger and make sure you hit record; make sure that dot is red. Then surf and navigate your pages as you normally would to collect the hits you want, and click the download button. The data downloads as a tab-delimited file.

Slide 31:

On the next slide, I want to mention that this file can be a little tricky to open depending on your OS, but I’ve found that these particular tricks help open it in both PC and Mac environments. I’m not going to walk through all the details here; they’re just for reference. Basically, you import the file into Excel rather than just double-clicking the downloaded file. In some cases, like on a Mac, you actually have to rename it with a .txt extension for it to open properly. Once you import it, you can use the Text Import Wizard to get the data formatted the way you need in the spreadsheet.

Slide 32:

With that, I’m going to talk a little about Adobe and the raw data you can get from ClientCare. Not everybody knows that this is available: if you just send them an email with the report suite and the data you’re looking for, they can even send it to their FTP site, so it’s pretty easy to access. The key is not to request too much data, so make sure you can roughly estimate how many hits you’re requesting.

If you need to, you can always ask for it at hourly granularity so you get smaller files. This is really helpful for seeing all of the data being sent to Adobe, as well as the processing they do on their back end to persist eVars or append other data that isn’t available to you when you’re debugging in your browser. That includes their server timestamps, visitor IDs, page numbers, and so on, and a lot of it can be really useful to scan through, especially if you just want to see at a glance what your typical sessions look like. You can sort your data into session order so you can glance through it and ask, “Does this make sense?” There are a lot of columns of data, but I find that culling through them is really helpful for finding issues that stand out and for getting more familiar with their data structure overall.
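Sorting raw data into session order might look like the Python sketch below. It assumes, as we understand the data feed layout, that the hit file has no header row (the column names arrive in a separate header file) and that columns named `post_visid_high`, `post_visid_low`, `visit_num`, and `visit_page_num` identify the visitor, the visit, and the hit’s position in the visit; check these names against Adobe’s data feed documentation. The sample data is made up and passed in as strings to keep the example self-contained.

```python
import csv
import io


def load_feed(header_tsv, hit_data_tsv):
    """Combine a one-line header file with header-less hit data into dicts."""
    headers = next(csv.reader(io.StringIO(header_tsv), delimiter="\t"))
    reader = csv.reader(io.StringIO(hit_data_tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(headers, row)) for row in reader]


def in_session_order(rows):
    """Sort hits by visitor, then visit number, then page number within the visit."""
    def key(row):
        return (row["post_visid_high"], row["post_visid_low"],
                int(row["visit_num"] or 0), int(row["visit_page_num"] or 0))
    return sorted(rows, key=key)


header = "post_visid_high\tpost_visid_low\tvisit_num\tvisit_page_num\tpage_url\n"
data = "1\t100\t2\t1\t/b\n1\t100\t1\t2\t/a2\n1\t100\t1\t1\t/a1\n"
for row in in_session_order(load_feed(header, data)):
    print(row["visit_num"], row["visit_page_num"], row["page_url"])
```

With real files, you would read the header and hit data from disk instead; the numeric conversion matters, because sorting visit numbers as text would put visit 10 before visit 2.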

Slide 33:

One great use case I’ve had for raw data is understanding the delay with which offline native app hits come in. If you’ve got apps with offline tracking enabled, you know those hits can arrive later. There’s the device timestamp, which is what’s actually used in the reporting interface, and there’s also the server timestamp of when the call was received. Based on the delay of those hits, you may want to wait a certain period before pulling your data, to give it enough time to come in given people’s usage patterns. If you look at the raw data, you can take these two timestamps, calculate the difference, and build a curve that shows when late hits come in and when the majority arrive, so you can make a better guess about how long to wait before pulling native app data.
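The delay curve is a simple calculation once both timestamps are available per hit. This Python sketch assumes both are Unix seconds; the default column names (`cust_hit_time_gmt` for the timestamp sent with the hit, `hit_time_gmt` for when Adobe received it) are our reading of the data feed reference and should be verified there. The sample rows are made up.

```python
def delay_hours(rows, device_col="cust_hit_time_gmt", server_col="hit_time_gmt"):
    """Hours between the device timestamp and the server timestamp for each hit
    that has both (Unix seconds), sorted ascending."""
    delays = []
    for row in rows:
        if row.get(device_col) and row.get(server_col):
            delays.append((int(row[server_col]) - int(row[device_col])) / 3600)
    return sorted(delays)


def arrival_curve(delays, hour_marks):
    """Share of hits that had arrived within each number of hours in hour_marks."""
    if not delays:
        return {hours: 0.0 for hours in hour_marks}
    return {hours: sum(d <= hours for d in delays) / len(delays) for hours in hour_marks}


base = 1479800000
rows = [
    {"cust_hit_time_gmt": str(base), "hit_time_gmt": str(base + 60)},      # 1 minute late
    {"cust_hit_time_gmt": str(base), "hit_time_gmt": str(base + 7200)},    # 2 hours late
    {"cust_hit_time_gmt": str(base), "hit_time_gmt": str(base + 86400)},   # 1 day late
    {"cust_hit_time_gmt": str(base), "hit_time_gmt": str(base + 259200)},  # 3 days late
    {"cust_hit_time_gmt": "", "hit_time_gmt": str(base)},                  # no device time
]
print(arrival_curve(delay_hours(rows), [1, 24, 72]))
```

Reading the curve: if, say, 95 percent of hits have arrived within some number of hours, that is a reasonable minimum wait before pulling app data for a given day.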

Slide 34:

Here’s a quick reference to the documentation. It’s really important to have on hand, because there are going to be a lot of new fields you’re not familiar with. They’ve got great documentation, and you’ll want to reference it as you start digging into the raw data. And with that, I’m going to hand it back over to Liz to talk about who to involve in the QA process.

Slide 35:


Thanks so much, Cathy, Jonas, and Nico. That was a lot of awesome information very quickly. Pausing for a moment: everyone can refer back to this deck if any questions come up later. We’ll wrap up today by addressing the people side of data quality management.

Slide 36:

Defect introduction is really something to think about. We don’t want to be doing QA when it’s most expensive. Defects are mostly introduced in the requirements and design phases, but because we don’t usually run QA processes at that point, we don’t necessarily know it. Keep that in mind as you go through requirements and design: think them through thoroughly and understand how they could lead to defects down the line.

Slide 37:

The next slide says, “Well, now we know when defects are introduced most of the time,” and shows that, ideally, most of them are detected in UAT. But what I don’t like seeing is that about 21 percent of defects are still detected in production. So it’s really important to do QA before production, but production QA is crucial too. Ongoing monitoring is very valuable, and ObservePoint is a great automated tool for that.

Slide 38:

On the next slide, we want to talk about cost: depending on when we catch defects, what do they cost us? This piece of research shows that a defect found in production costs about 100 times more than one found in the design phase. So the earlier you find them, the better.

Slide 39:

Last is the idea of teaching developers to QA analytics while they’re coding: in the same way they check for JS errors, we want them to check their analytics, and we want to empower them to use standardized test scripts or ObservePoint.
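A standardized test script can be as small as a function that checks a captured hit against a short spec, wrapped in a test the developer runs alongside their other checks. This Python sketch is hypothetical throughout: the required parameters stand in for your own tagging spec, and in practice the hit URL would come from a captured network request rather than a hard-coded string.

```python
import unittest
from urllib.parse import urlsplit, parse_qs

# Hypothetical spec for a GA page-view hit; replace with your own requirements.
REQUIRED_VALUES = {"v": "1", "t": "pageview"}
REQUIRED_PRESENT = ("tid", "dl", "dt")


def check_pageview_hit(url):
    """Return a list of problems with a captured hit (an empty list means it passes)."""
    params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    problems = [f"{key} should be {value!r}"
                for key, value in REQUIRED_VALUES.items() if params.get(key) != value]
    problems += [f"{key} is missing" for key in REQUIRED_PRESENT if not params.get(key)]
    return problems


class PageviewHitTest(unittest.TestCase):
    def test_home_page_hit(self):
        # In practice this URL would come from a captured network request.
        hit = ("https://www.google-analytics.com/collect?v=1&t=pageview"
               "&tid=UA-12345-1&dl=https%3A%2F%2Fwww.example.com%2F&dt=Home")
        self.assertEqual(check_pageview_hit(hit), [])
```

Saved to a file, this runs with `python -m unittest`, which lets analytics checks sit next to the rest of a developer’s tests.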
