How to Build a Testing Optimization Center of Excellence
As mentioned, I head up Slalom’s testing optimization and analytics strategy, and today I’m going to share how to start thinking about a best-in-class testing program. We’ll walk through the components and roles necessary for a successful testing process. If you like to tweet, my Twitter handle is @DRAnalytics, and be sure to use #AnalyticsSummit. Also, a few of us are here to answer questions, so please post them to the room as I present and we’ll respond in real time.
Testing is critical to an organization’s success, as evidenced by a recent Harvard Business Review article by Ron Kohavi, GM of the Analysis and Experimentation team at Microsoft. In organizations that have made it a focus area, testing has grown up fast. These mature testing orgs use the capability to measure potential business models, strategies, products, and services, in addition to typical marketing campaigns. But many orgs are not doing it right, and they aren’t getting as much value from their testing programs as they would like. A well-thought-out testing and optimization program is grounded in analytics.
A testing center of excellence allows you to understand users and solve their pain points by understanding their needs, which leads to highly engaging experiences that drive advocacy. So why is this hard to do? Why do teams find it hard to use their testing capability to develop more desirable and engaging experiences?
Surprisingly, there’s no single set of typical testing pain points across organizations. That said, some common themes emerge depending on which department owns the testing program. Marketers are thinking about which tests they should run and how tests tie back to channel campaigns and performance. As they mature, these teams often need people with deeper testing skills to help drive complex test design. Product leaders often complain that it’s hard to participate in the testing program. They might find it hard to get tests live at scale, and when they do, they’re afraid their tests will collide with marketing’s tests and render the results of both invalid.
When the testing program is housed in analytics, the biggest concern is getting the most value from the tool. Are users using the capability to its full potential? Is the data flowing back to our internal databases for immediate and future advanced analysis? Start-up and unicorn-type orgs, which often don’t have analytics support, might want to understand the tools available and how to implement and use them.
Also, organizations early in their testing maturity often haven’t built a rigorous process to support the number of tests they want to run, and that causes a lot of internal pain. These are a handful of general testing concerns, but what does a best-in-class testing program look like?
As a baseline, these principles should be incorporated into any testing program. Focus on quality, not quantity. Focus on learning, not just the immediate results. Favor the best test design, not fast iterative designs. And make the program easy to use for everyone in your organization. These are the hallmarks of a very mature testing optimization program. Let’s talk more about that.
We’ve developed a maturity model for testing optimization to help assess where an organization might improve its experimentation efforts. It includes five stages of maturity, from early adopters all the way up to a testing innovation center of excellence, across four components: technology, people, analyses, and process. Today’s presentation unpacks the process and structure component. Who are the teams? What does a best-in-class process look like at high maturity?
It starts with an understanding of the three primary teams involved: the business, analytics, and dev teams. Successful testing depends on this critical collaboration. While some steps are solely dedicated to a particular team, many steps are often shared responsibilities. That said, each team has a specific role to play in the testing process. Let’s jump in.
This testing process framework is a starting point for what a best-in-class, high-maturity process might look like. It’s just a starting point for conversations, of course. The testing process breaks into four distinct phases. Let’s review the first phase.
Great learning starts with great tests, and in particular, great test ideas. Identifying and prioritizing the right tests is the first phase of an optimal testing process. The most useful skills needed here are business acumen, influence strategy, and a partnership attitude to get the right ideas out the door. As we move through the optimal testing process, I’m going to share some tips along the way. Let’s jump into step one of our testing process framework: test ideation.
Who has the best idea to test? Sometimes, owners of a particular experience, or those closest to a page, miss amazing ideas to test. Maybe they’re too far in the weeds, or not thinking outside the box at all. I can’t count how many times I’ve stopped by a designer working on a page who rattled off five great ideas for improving the page or design without even being asked. This first step of the testing process demands that ideas from anywhere in the organization can make their way into testing consideration. The tip for this first step is to create a tool that allows users across the org, at any time and on any project they’re working on, to stop and quickly (within a minute or two) drop an idea into the test funnel, where it can work its way through the process.
Before applying hypothesis development or prioritization to test ideas, you need a back-of-the-envelope sizing analysis. This, along with an estimated amount of effort, provides a view of expected results. Ideally, every test would get a quick assessment, but in reality, only big swings or moderate changes must be checked for feasibility. The tip here is to build templates for capturing sizing estimates against different experiences.
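A back-of-the-envelope sizing can be as simple as multiplying traffic, baseline conversion, expected lift, and value per conversion. Here’s a minimal sketch; the function name and all of the numbers are illustrative assumptions, not figures from any particular program:

```python
def expected_annual_impact(monthly_visitors, baseline_cr,
                           expected_rel_lift, value_per_conversion):
    """Rough annual upside if the test wins and rolls out to 100% of traffic."""
    extra_conversions = monthly_visitors * baseline_cr * expected_rel_lift
    return extra_conversions * value_per_conversion * 12  # annualize

# Example: 100k visitors/month, 4% conversion, hoping for a 5% relative lift,
# each conversion worth $50
upside = expected_annual_impact(100_000, 0.04, 0.05, value_per_conversion=50)
```

Comparing a number like `upside` against the estimated build effort gives the quick feasibility read this step is after.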
A hypothesis is a necessary part of a rigorous testing program. A well-stated, documented hypothesis captures the reasoning behind the test. It forces us to think about why we believe the treatment will perform better than the control. Often the hypothesis doesn’t hold, but having it written down allows for examination and deep-dive analysis that uncovers learnings and provides direction for the next test design. The person who identified the test should have a sense of why it will work. So the tip here is to let that person describe their hypothesis in plain terms in the intake tool from step one; the analytics team, which has the strongest skills here, can then turn that description into a formal, testable hypothesis.
One of the hardest steps in the testing process is prioritization. Which tests will move forward? Down the road, expect to have anywhere from a handful to hundreds of tests per experience. You need strong governance: led by the business in less mature testing orgs, by the analytics team in more mature orgs, or, at the highest maturity, by a governance team made up of execs and members from all parts of the organization. This prioritization team should have transparent protocols and ensure tests align with departmental strategies.
The tip here is to do some education throughout the organization as you prioritize tests. What are we focusing on this quarter? Bottom-line wins, revenue, conversion? Are we trying to learn from a new product launch, or something of that sort? Sharing this with the organization will allow the best ideas to come forward.
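One common way to make prioritization protocols transparent (a widely used pattern, not something the talk prescribes) is a simple scoring model such as ICE: impact, confidence, ease. The backlog items and scores below are hypothetical:

```python
def ice_score(impact, confidence, ease):
    """Each input scored 1-10 by the governance team; higher is better."""
    return impact * confidence * ease

# Hypothetical test-funnel backlog
backlog = [
    {"name": "Hero CTA copy", "impact": 6, "confidence": 7, "ease": 9},
    {"name": "Checkout redesign", "impact": 9, "confidence": 5, "ease": 3},
]
ranked = sorted(
    backlog,
    key=lambda t: ice_score(t["impact"], t["confidence"], t["ease"]),
    reverse=True,
)
```

Publishing the scoring rubric alongside the quarterly focus areas is one way to do the education this step calls for.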
Once tests are decided, it’s time to plan the design. I’ve seen a lot of confusion around designing a test plan. Often tests are deemed invalid after running for a week because they just weren’t designed properly: no holdout group, order effects not taken into account, or simply turned off before statistical significance was reached. Even for a simple A/B test.
The most important asset here is years of experimental design experience. A trained researcher looks at test design in a different way. Let that experience drive the best test designs on the team. This is a process step that cannot be overlooked by any organization wanting to be near the mature end of the model.
While designing a simple A/B test for marketing is straightforward, any more advanced test needs to be reviewed or designed by experienced experimenters. Too often, after the results are in, we find the test needed more rigor. Don’t allow a lack of test planning and design to destroy the goodwill you’ve built with stakeholders. Be sure the analyst running the experiment performs a power analysis and announces the expected test length. This keeps everyone on the same page and helps with conversations down the road about when might be the best time to turn off the test.
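For a simple A/B conversion test, the power analysis and expected-length announcement can be sketched with the standard two-proportion sample-size formula. This uses only the Python standard library; the baseline rate, lift, and traffic figures are hypothetical:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = abs(p_treatment - p_control)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Detect a 10% relative lift on a 4% baseline at 80% power
n = sample_size_per_arm(0.040, 0.044)
# Announce expected length, assuming ~5,000 eligible visitors per day
days = math.ceil(2 * n / 5_000)
```

Running the numbers before launch is exactly what lets the analyst announce "this test needs roughly N users per arm, so about D days" up front.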
Our test tip here is to educate partners as you go. The more the marketing and product folks we work with understand complex test design, the more invested they’ll be in making sure tests are designed by analysts in a way that produces effective, meaningful results.
Now it’s time to get the test live. We need developers, sometimes internal to the business teams, as well as test developers who know how to configure the tool across the tool sets and databases. These steps are more operational, but they shouldn’t be rushed.
First you need to build the test. We do this with back-end programmers if we’re using a home-grown tool, or we work with the test developers on the testing team for configuration. Whatever the dev structure, an agile approach works best for receiving content and making changes, including new algorithms, and it keeps the dev team on track for tests to launch. Our tip for building the test is to use agile scrum. A lot of content will come in from marketing, including pages, images, copy, and some targeting information, and we have to put that together. The best way is to run it through an agile sprint, get the test live, and start working on the next test. Otherwise, the build-the-test stage will clog the funnel.
Once the test is configured, it’s time for QA. Bring everyone involved to the table and do this together, ideally in one room with the devs leading the exercise. Unless the test is very complicated, this step can be completed in a matter of hours, or even less. Be sure to follow typical UAT logic: review by browser, by customer segment, the usual suspects. Our tip here is to create a formal sign-off. Too often, when a test goes live with an issue, it’s because the QA step wasn’t organized or the test wasn’t configured the way we were hoping.
The business should step forward during the QA process and raise their hand if they see anything glaring or anything misunderstood before the test goes live. The analytics team needs to be on the hook to confirm the tracking and tagging will work when we go live. All three teams need a formal communication process to make sure that when we’re live, we’re live with tagging and we’re live with the right experience.
When live, monitoring consists of two separate tracks owned by the analytics team. First, right after going live, start looking at metrics within the first hour. Many tools allow this sort of review within their systems. Make sure you’re seeing typical breakdowns across a handful of metrics.
Second, the next day, start monitoring the data fed to your internal systems overnight via APIs. Run queries: for example, are you seeing users fall into both experiences? That would not be good. The tip here, again, is to have formal communication steps. This is critical. If I’m on the analytics team doing monitoring and I notice an issue, a segment not populating, or an experience recipe not populating at the level we set, we need to be able to raise our hand immediately, solve the problem, and restart the test, not lose days, not get to the end of a one-week or multi-week test cycle and realize we have a problem.
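Both checks mentioned here (users landing in more than one experience, and the split drifting from what was configured, often called a sample ratio mismatch) can be automated against the overnight feed. A stdlib-only sketch, with made-up record shapes and user IDs:

```python
from statistics import NormalDist

def cross_exposed_users(assignments):
    """Return user_ids seen in more than one arm; this set should be empty."""
    first_arm, flagged = {}, set()
    for user_id, arm in assignments:
        if first_arm.setdefault(user_id, arm) != arm:
            flagged.add(user_id)
    return flagged

def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Two-sided z-test: does the observed split match the configured split?"""
    n = n_control + n_treatment
    observed = n_treatment / n
    se = (expected_treatment_share * (1 - expected_treatment_share) / n) ** 0.5
    z = (observed - expected_treatment_share) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical overnight feed: u1 appears in both arms, so raise your hand now
events = [("u1", "control"), ("u2", "treatment"), ("u1", "treatment")]
flagged = cross_exposed_users(events)
```

A tiny `srm_p_value` (a very small value means the split is off) wired into the daily query run is one concrete way to catch a recipe not populating at the level you set.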
It’s time now for heavy-duty analytics. Let the team show you their chops. All the planning, execution, and waiting has been for this. Let’s take it to the finish line.
Deciding when to turn off a test has been a major pain point for organizations for a long time. Business owners want the test off soon and results delivered ASAP, which makes sense: they want to start realizing gains from the experiment. Those doing the analysis, though, often want more time to collect data, allowing for richer segmentation and deeper dives. A happy balance can be negotiated. But at the end of the day, we need to be sure we’ve gathered at least enough data to answer the primary research question with statistical significance.
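For the primary question in a simple A/B conversion test, "enough data" usually means a significant two-proportion z-test on the primary metric. A minimal stdlib version; the conversion counts below are illustrative:

```python
from statistics import NormalDist

def two_proportion_p_value(conversions_a, n_a, conversions_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: 4.0% control vs 4.8% treatment on 10k users per arm
p = two_proportion_p_value(400, 10_000, 480, 10_000)
```

If `p` is above the agreed alpha, that's the data-backed answer to "can we turn it off yet?" rather than a judgment call in a meeting.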
Allow partner teams to informally ask for the test to be finished, but use those requests as educational conversations. Teach them the power of having enough data to do amazing analysis, which sometimes just takes a few more days. My tip here is to keep stakeholders on board. Set up communications: a daily email, or a daily post on the intranet, especially through the back 75 percent of a test’s run. Send out interim results and explain why we’re not turning the test off yet, so we’re not badgered with phone calls and emails asking for results right away. Stay in front of it.
I like breaking analysis and insights into separate steps. It helps focus the conversation about what general analysis is: just a snapshot of the data with some initial findings that, quite frankly, can be automated. Think reporting on users per treatment group, conversion rates, even product mix shifts; if we’ve set up feeds, all of this is in our internal databases. That snapshot is less important than what comes after two days to a week of digging deep into the data. The tip here: segment your test results for additional learnings.
For example, if we see a treatment group performing 10 percent better than the control group, that’s amazing, that’s a big win; let’s take it. But have the analytics team spend extra time diving into different segments to find which ones aren’t performing at that level, and why. Do the deep dives, keep asking why, and then apply the right experience to each group. Instead of a flat 10 percent across the board from rolling out the winning B experience, you may get another 2 percent from a smaller segment on top of that, and maybe another lift in a different direction from another segment. If your tool can handle it, and you’re mature and sophisticated in how you target and use your CRM, you can realize all these gains from one analysis.
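The segment deep-dive can start from per-segment conversion rates pulled from the internal database. A sketch of that first pass; the segment names, arm labels, and synthetic rows are all hypothetical:

```python
from collections import defaultdict

def relative_lift_by_segment(rows):
    """rows: (segment, arm, converted) tuples -> {segment: relative lift}."""
    counts = defaultdict(lambda: {"control": [0, 0], "treatment": [0, 0]})
    for segment, arm, converted in rows:
        counts[segment][arm][0] += int(converted)  # conversions
        counts[segment][arm][1] += 1               # users
    lifts = {}
    for segment, arms in counts.items():
        p_control = arms["control"][0] / arms["control"][1]
        p_treatment = arms["treatment"][0] / arms["treatment"][1]
        lifts[segment] = (p_treatment - p_control) / p_control
    return lifts

# Synthetic data: mobile lifts 20% relative, desktop is flat
rows = (
    [("mobile", "control", i < 10) for i in range(100)]
    + [("mobile", "treatment", i < 12) for i in range(100)]
    + [("desktop", "control", i < 10) for i in range(100)]
    + [("desktop", "treatment", i < 10) for i in range(100)]
)
lifts = relative_lift_by_segment(rows)
```

A table like `lifts` is what surfaces the "which segments aren’t performing at that level, and why" question, before per-segment targeting decisions.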
Insights, on the other hand, are the rich learnings from deep-dive analyses that provide actionable recommendations. Having a clear process set up to identify them is important. Making clear recommendations that grow the business requires an analytics team that is close to your partners and might even know what the next projects are. Analysts might want to roll this test result up with a handful of others and do a quick analysis highlighting which content areas perform well overall. Maybe enough tests have concluded to form a view of what truly works on that page. In any case, the true value of any test is realized in the insights step. Insights should be shared widely, in formal scheduled settings open for all to attend, not just the immediate stakeholders of the current test.
A tip on delivering insights: the analyst who ran the test is typically in charge of it; at some point they’ve made the call, turned the test off, and have the results in front of them. I suggest they sit with other analysts on the team, share findings, and build a robust set of recommendations, including input from those less familiar with the program who have a 30,000-foot view of the test and its results. They might come up with ideas you haven’t thought of.
Tests have been run and insights delivered, but it’s not over. One crucial remaining step often gets overlooked, and because of this, organizations lose valuable continuous learning. Developing and updating a learning library is one of the best ways to keep relevant knowledge in-house. Most tools haven’t done a great job of providing this. I suggest starting with a PowerPoint deck: one test per page, treatment groups clearly shown, with results arranged by experience. Most people can flip through this and build understanding within minutes.
A tip here is to have all new employees run through this deck. After a year, you might have 300 tests run and a deck broken out by several different experiences, or maybe by business units. A person new to the organization can read through the deck in no time, start internalizing changes, and understand the arc of the story your testing has told about the user. It’s valuable not only for keeping knowledge in-house and onboarding new employees, but also for the analytics team to review and mine for further analysis opportunities.
Here’s a quick view of each step we reviewed, with roles identified. While organizations are all different, I’ve found this framework, which separates roles between analytics, business, and dev, to be a valuable first step toward productive conversations about what would work best in your organization. What we didn’t discuss today is the power of a great governance team: separating the program from the process, championing the effort, and securing the budget. That conversation is worthy of a complete presentation layered on top of this one.
So whether you’re testing today or just thinking about it: once organizations have implemented an amazing, successful, results-oriented online testing program at the high end of the maturity model, they apply those same principles to non-marketing and non-product departments, and even offline, to learn how to optimize further.
Imagine developing tests around retail floor layouts, optimal pricing using online channels as a research arm, or pop-up catalogue layouts. The ways you can use these principles and experimental design within your organization are endless, as evidenced by the most mature testing orgs.
My takeaway: take what you’ve learned today and apply it to those other facets of your organization. Think outside the box to grow returns exponentially, or to solve immediate pain points.
Thank you for your time today, and a big shout-out to ObservePoint for hosting. I hope you have a great time at the Analytics Summit. Please reach out if you’d like to discuss the testing framework we covered today, or to learn in detail about the other components of the maturity model, to help you grow your best-in-class testing optimization program. I look forward to connecting.