Establishing trust in the key conversion data captured within any digital analytics tool is critical for organizations to actually use the insights.
In this webinar, Peter O'Neill, Director of Analytics at Ayima, discusses how to:
- Evaluate parity for transaction data between back-end and analytics systems
- Set a threshold for data quality for transaction analysis
Fill out the short form to view the webinar on-demand.
Good day. Thanks for coming to my session here today. I'm Peter O'Neill, and today I'm discussing data quality. Of course we are at the ObservePoint Virtual Analytics Summit, and the topic I'm focusing on is how to identify if you have an issue or discrepancy between your analytics tools and your backend data sources, how to identify the cause of this discrepancy, and most importantly how to resolve it.
First, about myself. My name is Peter O'Neill. You can tell from the accent pretty quickly that I am an Australian, although I've been living in London since 2004. I've been working in the field of digital analytics since 2006. I am the Director of Analytics at Ayima, a digital marketing agency. I joined there a year and a half ago after they acquired my company, LeapThree, a digital analytics specialist brought on board to complement their SEO and paid media services.
I'm probably better known generally, though, as the original founder of MeasureCamp. MeasureCamp, for those of you that don't know what it is yet, is a digital analytics unconference. We started this in London in 2012, it was very successful there, and it's now, amazingly, thanks to a lot of amazing local organizers and volunteers, spread around the world. It's gone all across Europe, across China, New Zealand. It's in the US as well. It was held in 24 different cities around the world in 2019, and I'm hoping for the same, hopefully a few more, this year. The first one in Austin, Texas happens at the end of January. So hopefully we'll keep on expanding and more of you can attend in the future. But finally, when it comes down to it, I am a geek and I'm okay with that. Just to quickly mention Ayima, the agency that I work for: we're a digital marketing agency, and we offer SEO services, paid media, and of course analytics.
We are a Google Marketing Platform Certified Partner; there are about 150 of us around the world. We're headquartered in London, with offices in New York, San Francisco, Vancouver, and Manila. But enough about that. Let's get into the data quality aspect of things.
Where I want to start, where I always like to start, is just to remind everyone of the whole purpose behind digital analytics. And this is about the intelligence, the insights, the information: not the data, not the numbers themselves, but the bullet points, the insights, the recommendations that come from the data. You provide these to the whole business, to the people who make decisions and take actions throughout a business: the marketing people, the merchandisers, the product managers, the developers, the management team, to inform the decisions and the actions that they are taking every day, every week, every month.
The next issue is the fact that pretty much every implementation, on any website, big or small, has some bugs. The code isn't on all pages. It's not tracking every interaction properly. Naming conventions aren't quite right. These bugs cause problems, and again, that means the data isn't accurate. Going beyond that, different tools track data in different ways: where the code's located, different aspects of the website, different locations on the page. They process the data in different ways, and they use different definitions. So in AdWords and Analytics, the number of conversions attributed back to a click on a paid search link will never match up, because they define attribution, and who should get credit for that conversion, in different ways. A click in AdWords is not a session in Analytics, and this applies to all tools.
And finally, while the website and apps are very important to us digital analysts, they're only part of the story for a business. Critical actions which happen off the website are typically not captured in the digital analytics tool. They're available to the business elsewhere, but not within the tool itself. Things like a lead converting to a sale, or a transaction being canceled: that information typically isn't available in your digital analytics tool. This all means digital analytics data is never accurate.
So, with that depressing news in place, what does this mean? What do you do? My simple advice is basically to get over it and move on. The focus here, in order to make smarter decisions, take smarter actions, achieve better results, and make more money for the business, is that the intelligence that comes from the data is more important than the data itself. What we're focusing on is what data is useful, not what's accurate.
As long as you can make the smart business decision because you know that people are converting better in the UK versus the US, or vice versa, even though the absolute numbers, the exact details, aren't correct, the insights you can get from the data are enough to make smarter decisions, and that's the part to focus on, not being accurate. There are some exceptions, though. There are some places where you can open the box, look at what's inside, and actually get some accurate numbers, and these are the metrics where the data is captured not just in your digital analytics tool, but also in your backend systems, whichever systems you use. You know exactly how many transactions are placed on your website, how many leads come through to you, how many CVs you receive for your job ads. And when you know these numbers, it's a chance to actually understand how accurate your digital analytics data is. Can you trust these numbers or not?
I mean, we'll never get to know exactly how many page views, sessions, or visitors we actually have on the website, but we can verify whether the data in your digital analytics tool for transactions is accurate enough to be useful, reliable, and insightful, or not. Because when it comes down to it, the stakeholders, the people making the decisions, need to trust their data. The common rule of thumb around this is that the data doesn't need to be 100% correct, but if the data we can see in the digital analytics tool, compared to your backend system, has a discrepancy of no more than 5%, we're okay with it. We can trust that. It's accurate enough to be useful. If that discrepancy between the backend systems and your digital analytics tool for those key conversion metrics is higher than 5%, there's a problem there.
You should be able to get it lower than this. Something is wrong, so you need to dig into it and correct it, and that's what I'm focused on today. I'm going to refer back a lot to transactions, just because it's a simple metric to use, and I'm going to be using Google Analytics if I'm referring to a tool at all. But everything here should be agnostic: the approaches that I use could be used with any analytics tool out there, not just Google Analytics. I'll mostly mention transactions, and my examples are around transactions, but the same logic applies to leads or job applications or anything else.
There are some caveats, things you must consider in advance. The very obvious first one: the actual numbers, the actual data, must be available from a backend system. Again, this could be whatever data warehouse or system you have, but you must have some numbers in there which can then be matched against your Google Analytics data. For every transaction, lead, or job application, you need to have a unique ID associated with it, an ID that can be matched between your analytics tool and your backend system. Even better if we've got more information, because if things aren't working out correctly, if the numbers are wrong and the discrepancy is too high, we need to dig deeper. And for that we're looking for patterns, so we need more details of each transaction: the payment method being used, the country the order is being sent to, what products are included in that transaction. With these we can dig deeper and look for a pattern to find the issue.
And finally, we're looking at numbers, so we must have a big enough sample size, a big enough dataset, to make decisions on. If you're looking at ones, twos, and threes, any difference is purely noise, not actually a pattern. For this whole approach to work, I'd say probably at least 50 conversions per day is a big enough dataset to identify where the issues are and what the accuracy is. With that in place, you can start off by getting your total numbers out: number of transactions from the backend data, 7,215; number of transactions from the Google Analytics data for the same date range, 6,468. And for this I'd recommend looking at complete weeks, between one and four weeks, but definitely complete weeks.
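This headline check can be sketched in a few lines of Python (a minimal sketch; the totals are the ones quoted in the talk, and the 5% figure is the rule of thumb discussed here):

```python
# Compare total conversion counts between the backend system and the
# analytics tool over the same complete-week date range.

def discrepancy_pct(backend_total: int, analytics_total: int) -> float:
    """Percentage of backend transactions missing from the analytics tool."""
    return (backend_total - analytics_total) / backend_total * 100

backend_total = 7215    # transactions from the backend data
analytics_total = 6468  # transactions from Google Analytics, same date range

pct = discrepancy_pct(backend_total, analytics_total)
print(f"Missing {backend_total - analytics_total} transactions ({pct:.1f}%)")
# With these totals the discrepancy is about 10.4%, above the 5% rule of
# thumb, so it's worth digging into the cause.
```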
Looking at these numbers straight away, well, it's not great. There's a big difference there. We're actually missing 747 transactions, which is 10.4% of total transactions. That's more than 5%, so something's gone wrong here. As a first step though, you need to make sure you're doing a true like-for-like comparison, that you're comparing apples with apples, not apples with oranges. There are reasons, and I've seen them myself, which can cause the comparison to not be totally accurate. First of all, you need to make sure that each transaction is recorded against the same date in both systems. GA records a transaction against the date on which the transaction was placed, according to the time zone set within Google Analytics itself. In the backend data it should be the same, but I've seen times when I've been given transactions against the date on which the order was processed, or even when the order was actually sent out to the customer.
Those dates don't match up, therefore the numbers won't match up, so make sure you get the backend data with transactions against the day on which the order was first placed. In a similar way, time zones need to match up as well. It's a simple little thing, but if you're looking at transactions by day and the time zone for one system is adjusted for daylight savings and the other is not, the numbers won't match up. They're off by an hour, and you'll probably add some and lose some on both sides of the day. But we want to get it exactly accurate, all the way through, to a true like-for-like comparison.
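A minimal sketch of this date-alignment step, assuming for illustration that the backend stores order timestamps in UTC and the GA property's reporting time zone is America/New_York (both assumptions, not from the talk):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Bucket each order by the calendar day on which it was *placed*, in the
# analytics tool's reporting time zone, so both systems count days the same way.
REPORT_TZ = ZoneInfo("America/New_York")  # time zone set in the GA property

def order_day(placed_at_utc: datetime) -> str:
    """Day the order was placed, expressed in the reporting time zone."""
    return placed_at_utc.astimezone(REPORT_TZ).date().isoformat()

# An order placed at 02:30 UTC on 6 Jan falls on 5 Jan in New York, so
# counting it against 6 Jan would create a spurious daily mismatch.
placed = datetime(2020, 1, 6, 2, 30, tzinfo=timezone.utc)
print(order_day(placed))  # 2020-01-05
```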
Then the backend data. That 7,215, we know it's accurate, but what does accurate mean? What are we defining as a transaction here? Because to do the comparison, we don't want to include some transactions. We don't want to include internal staff orders, call center orders, or store orders. They should not be appearing in your Google Analytics data, because they're not transactions you credit to your marketing, so these transactions shouldn't appear here. We just strip them out of the backend data for a proper comparison. In a similar way, it's possible that you may actually not have the correct transaction count because you have duplicate transaction IDs. This may sound silly, but I saw this just a couple of weeks ago: a set of transactions by day where the actual extract from the backend system had multiple rows for each transaction for a particular reason. Make sure that when you're looking at transactions per day, it's a true count of unique transaction IDs. The transaction count from your backend data might be too low as well, because it needs to include canceled orders. I've seen numerous scenarios where I get a list of transactions through from a client, from the backend systems, and it already has all canceled orders removed. And for reporting from the backend, that makes sense; it's not a real transaction, it's been canceled, it doesn't exist anymore. But for a comparison against Google Analytics, I need to have that number so I can compare transactions in both systems.
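To guard against the multiple-rows-per-transaction problem, count distinct transaction IDs rather than rows. A minimal sketch (the rows and field names are invented for illustration):

```python
# A backend extract can contain several rows per transaction (e.g. one row
# per line item), so a plain row count overstates the transaction count.
rows = [
    {"transaction_id": "T1001", "sku": "A"},
    {"transaction_id": "T1001", "sku": "B"},  # same order, second line item
    {"transaction_id": "T1002", "sku": "A"},
    {"transaction_id": "T1003", "sku": "C"},
]

unique_ids = {row["transaction_id"] for row in rows}
print(f"{len(rows)} rows, {len(unique_ids)} unique transactions")
# 4 rows, 3 unique transactions
```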
Similarly, in Google Analytics the numbers can be inflated as well. Duplicate transactions in GA is very much a known problem, which I'm not going to cover today. Look this up, Google it: there are people out there who have written blog posts on how to identify duplicate transactions and how to resolve the issue. Double check for yourself if it's a problem or not, and go through the fixes they recommend. But also your test orders and your internal orders should be removed from the Google Analytics data. We only want to see true, live, external customer orders within the data we're looking at to evaluate performance across your marketing sources. What, then, is reducing your Google Analytics data? Tracking issues. And that's what we're looking at here: why is the data being sent through to Google Analytics not correct, not complete? There are reasons for it, and I'm going to go through some of these potential reasons.
First things first though, you want to get from your backend systems an extract of transaction IDs, and at the same time, for the same date range, that one-to-four-week period, the same sort of extract of transaction IDs from your Google Analytics data. Go into Excel, or the tool of your choice, and compare the two datasets; I've got an example formula there using a lookup of one dataset against the other. And you want to go both ways, looking to see which transaction IDs are captured in both datasets and which ones are missing. We can see for the backend data that 4,904 transactions were captured and 474 are missing, and if that's the case, we've got a discrepancy there of 8.8%. Again, more than 5%, not good enough. We need to dig into what's causing this: why are we missing these transactions? Interestingly, Google Analytics is missing some transactions as well. There are 26 transactions appearing in the GA data which don't appear in the backend data. These ones we're missing, are they store orders? Internal customers? They should be excluded as well. It's possible, looking back at the previous slide, that you actually get two total numbers that are very similar, even level within 5%, with both numbers being totally wrong. The backend data is inflated because it includes internal orders, and it's missing the canceled orders. The GA data has extra orders from duplicate transactions, but it's also missing a heap of transactions. Just because both are adding and missing transactions, the numbers happen to balance out. In a sense that's good, because your total orders are actually close to reality, but it's also really bad because you can't trust it: the numbers are wrong, just for different reasons. To get numbers we can trust and rely upon, this process is still required even when the totals look fine at that first stage. If you are here, though, under 5%, to be honest, I'd stop there. There are better things to be doing with your time.
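The two-way ID comparison done in Excel can equally be done with sets. A minimal sketch (the IDs are made up for illustration):

```python
# Transaction IDs extracted for the same date range from each system.
backend_ids = {"T1", "T2", "T3", "T4", "T5"}
ga_ids      = {"T1", "T2", "T4", "T9"}  # T9 exists only in GA

missing_from_ga = backend_ids - ga_ids  # tracking gaps to investigate
extra_in_ga     = ga_ids - backend_ids  # e.g. internal orders or duplicates

pct_missing = len(missing_from_ga) / len(backend_ids) * 100
print(f"Missing from GA: {sorted(missing_from_ga)} ({pct_missing:.1f}%)")
print(f"In GA only:      {sorted(extra_in_ga)}")
```

Going both ways matters: the first set points at tracking gaps, while the second points at orders that shouldn't be in GA at all.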
Okay. A good first place to start is just comparing transaction numbers against a rough number of page views of your order confirmation page. As we can see here, 6,471 order confirmation page views against 6,468 transactions. It's not identical, but a difference of three I can live with. What we're looking for here is to make sure that transactions are triggered correctly on every view of the order confirmation page. If you found there were a lot more page views of the confirmation page than recorded transactions, the most likely cause is that you've got different measurements being fired off, one for the page view and one for the transaction, and for some reason the page views are sent correctly but the transactions are being sent at a later point and not always being recorded. That gives me a clue as to where to start looking to see why these transaction codes are being sent later, and how to fix that so they're sent at an earlier stage.
In the reverse, if the number of page views is less than the transactions, it may be that transactions are being fired on different pages. And again, I want to look into what this is, what those pages are, how those pages are being accessed, and whether there's a page which should have had a transaction fired but doesn't, causing the problem. That's potentially picked up in this quick comparison. The next thing to look at is the daily discrepancy. Essentially, it should be consistent. I mean, if overall it's that 8.8%, it should be around eight, nine, ten percent every day, with a bit of noise, but it shouldn't vary that much. If it is very erratic, going from 2% to 20% on different days, well, that can also give hints as to what the problem is. Why can it be so erratic?
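Breaking the overall discrepancy down by day makes erratic patterns obvious. A minimal sketch (the daily counts and the 12% flagging threshold are illustrative, not from the talk):

```python
# Daily (backend, analytics) transaction counts; figures are illustrative.
daily_counts = {
    "Mon": (1030, 940),
    "Tue": (1010, 925),
    "Wed": (980, 790),   # a noticeably worse day, worth investigating
    "Thu": (1005, 918),
    "Fri": (990, 905),
}

for day, (backend, ga) in daily_counts.items():
    pct = (backend - ga) / backend * 100
    flag = "  <-- investigate" if pct > 12 else ""
    print(f"{day}: {pct:.1f}% missing{flag}")
```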
Alternatively, you might not see a clean pattern like this, but instead a step change somewhere in between. It's great seeing step changes like this; every analyst loves seeing them, because it gives me a date when things changed, and I'm going to be able to understand what happened on that date. Did the website change? Did the tracking code change? Was a new payment method introduced to the website? Did we launch in a new country? This date could actually help me identify what caused it to go from a not great but not terrible 6.5-7% to a really bad 12%. Then we start looking at things with the transaction data from the backend systems and the additional information from those dimensions I mentioned earlier. I want to start with payment methods: I look at what the discrepancy is by different payment methods, and if you can see a pattern like this, you're in a very happy place.
You can see the discrepancy is usually 4 or 5%, within our margin for error. I'm okay with all of those, but there's one payment method which, for some reason, we never get transactions for. I've got a place to start digging. Possibly with this one payment method, the visitor isn't sent back to the confirmation page, or they go to a different confirmation page that doesn't contain the transaction code. Or when they're returned to the confirmation page, the information necessary to populate the transaction code isn't sent through at the same time. These are all things you can fix now that you've narrowed down where the issue is. And if your backend data is giving you nothing at all, you can check the patterns directly in GA or Adobe Analytics or any other tool. The trick here is to create a goal or a segment for the very, very last stage before the conversion, in this case the transaction, happens. So maybe an order summary page: look at that last stage versus the transactions themselves, at the completion rate between the stages. At that point the pattern should be very consistent across all dimensions, across all browsers, different devices, different countries. It shouldn't vary by much at that very, very final stage. If you find that the completion rate is usually 85-90%, but for one browser it's down at 0% or 50%, you know the problem. You know the tracking code is working fine in most cases, with one or two exceptions, and that's when you dig deeper, fix that, and fix the discrepancy.
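The same breakdown works for any dimension in the backend extract, but payment method is often the most revealing. A minimal sketch (the methods and capture flags are invented for illustration):

```python
from collections import Counter

# Each backend transaction carries its payment method and a flag saying
# whether its transaction ID was also found in the analytics data.
transactions = [
    ("card",   True), ("card",   True), ("card", True), ("card", False),
    ("paypal", True), ("paypal", True),
    ("bank_transfer", False), ("bank_transfer", False),  # never captured
]

totals = Counter(method for method, _ in transactions)
captured = Counter(method for method, in_ga in transactions if in_ga)

for method, total in totals.items():
    rate = captured[method] / total * 100
    print(f"{method}: {rate:.0f}% captured")
# A method sitting at 0% points straight at the redirect or
# confirmation-page flow for that payment method.
```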
So what are some typical causes and typical resolutions? As I said, a key one I've seen again and again is the payment method. The visitor has to end back up on the confirmation page for every payment method, and if information is needed to populate the transaction code, it has to be sent through as well. If this isn't happening, fix it. That's the easiest fix for a discrepancy. Duplicate transactions, as mentioned, are a very common cause for numbers to be wrong in GA. Basically, what you want to do is only fire the GA code when an actual transaction is processed, not just on every view of the order confirmation page. There are some solutions out there using tag management tools and cookies to try and improve this, but ideally, have your developers only trigger the transaction code when a true transaction is processed.
Okay. Internal conversions, as mentioned earlier, should be stripped out of all systems, both the backend system and Google Analytics, when you're doing the comparison, so that you're actually doing a true like-for-like comparison. You need to make sure that you're tracking all conversion actions. For transactions, all confirmation pages, all versions of the confirmation page, must be tracked in the same way. And for other conversion actions, such as leads, you could have three, four, eight different forms that could be completed to submit a lead. Make sure that every single form is being tracked in the right way, otherwise that's the source of the missing leads and conversions. Make sure you include your canceled orders in your backend data when you compare it with your analytics data, otherwise the numbers won't match up; it'll actually look better for you.
The numbers will look closer, but they're inaccurate and therefore not useful. And another big one, which we see on a common basis, is that the transaction code isn't being fired quickly enough on the confirmation page. I mean, on the confirmation page, someone's finished the process: placed an order, or submitted the lead, or submitted a CV. It's a natural exit page, and if the code required to trigger the conversion isn't fired for a few seconds, because the page has to load fully first and get its nice pretty pictures on it, well then the code won't always get fired. You'll miss out on some transactions. So this is a case where you really want to get your transaction code, your analytics code, as high on the page as possible, fired as early as possible, preventing people from leaving the page, if need be, until the code is fired, to make sure you're capturing all the information: every transaction, every lead, every job application submission.
Those are some common causes, and this is all about trying to get that discrepancy under 5%. But where does that 5% come from? It's almost become a rule of thumb. I've used it for a number of years and heard it from other people as well: if it's under 5%, we don't worry about it. But what if it's getting worse? If 5% is our discrepancy now, is that going to increase in the future, or has it already increased from a few months ago? Because we'll always have issues at play here. You have intelligent tracking prevention in place, ITP, with Safari and other browsers blocking code from being fired. We've got GDPR in place in Europe, and other regulations changing around the world, which is influencing companies to change how they track their customers and visitors, and to not track anyone unless consent has been given in advance, which means, again, we're just going to lose some people, lose some tracking.
Ad blockers can be there for good reasons, to stop you being tracked around the entire internet, but they're also picking up Google Analytics, Adobe Analytics, and other analytics tools and stopping them from tracking as well, so we know that the sample of people we don't track is going to get bigger and bigger. Overall, people are getting more and more likely to put ad blockers in place and to clear cookies on a regular basis, because the perception is that cookies are bad, marketing tracking is bad. All of this put together just makes it probable that we're going to get less data into our analytics tools. What this means is that the 5% which we're used to, which has been in place for years, may get bigger: 8%, 10%, 20%, if not more, of your analytics data, even with perfect tracking code in place, will be missing. That is possibly what it is, and in the end we're going to have to live with it and just readjust what we define as accurate and useful. And on that really positive note, thank you very much. My name again is Peter O'Neill from Ayima. Please do get in touch. I'm happy to answer questions now, or send them through via email. Enjoy the rest of the conference. Thank you.