Bob Selfridge, TMMData - Improving Data Supply Chain

November 22, 2016

Slide 1:

My name is Bob, and I work here at TMMData Incorporated. I’m excited to show you some of the ways we hope we can help you improve your data supply chain, automation, and overall process for data preparation. Probably the most important piece to note is that our overall corporate mission is to meet data needs painlessly. I can’t stress that enough, because there are a lot of different ways to meet our data supply chain and data analytics needs. Unfortunately, many of them come with a lot of pain, agony, and relentless frustration. Our ultimate mission as a company is to do this in a very painless way.

Slide 2:

Just very quickly about us: we integrate, prepare, and manage data and software services. We deliver on a web-based tool; however, that can be delivered as SaaS, on-premise, or hybrid. As you can see, we work with a lot of organizations, specifically in finance and, in some cases, medical and higher-ed organizations. They really do not want that information outside their private cloud, so that’s one of the ways we work. We’re headquartered here in Pennsylvania, close to Penn State University, and we have offices in Seattle, Baltimore, Philadelphia, and the Philippines.

Slide 3:

Enough about us. On to the topic of the day: improving the data supply chain. My format today is this: I will walk through some high-level information, then flip back and forth between that and our product as we walk through the way we manage those same problems, not just in the supply chain, but also around data integration and ETL, automation, and finally data preparation and data wrangling as well.

Slide 4:

It all can be pretty simple. We used to walk through this process, and many of us in smaller organizations still do. We, the analysts, these happy guys in the middle, will send an email to our friend, the geek in IT (I consider myself a geek, so please don’t take that as a negative term), asking them to get us some data. Our geek friends will send us the information in Excel or CSV, we’ll do our magic work, and then we’ll send that information out to our supervisors. It’s a very linear chain, and it’s a fairly simple process, but it tends to involve multiple rounds of communication. What I want to do at this point is walk you through how we would solve that very simple problem, and then we’ll build from there.

Demo Screen 1:

The very first step: we’re going to walk through here to our tool—and it’s not liking me because I sat too long, so we’re going to log back in—and we’re going to create a new dataset. We’re going to go to our tables, as they’re called. Our tables then allow us to come in and build a new table in DEMO, let’s say. Within there, we’re going to create a table called “Initial DB Table of Campaign Data.” Always and forever, everything you see in our system has data description items as well. That can become helpful over time as you work with multiple taxonomies in data governance, but we’ll talk about that a little later.

Finally, we’ve got lots of different table types—very high-level, standard table types, no different than building an Excel spreadsheet or tables within Access, or something along those lines: from an import (drag and drop, no big deal) and from a connection. We’ll talk about the others later, but this one will be from a connection. Why? Because, as we noted on our chart earlier, we have a database somewhere that we want to connect to so we can get the data.

The ultimate goal here is self-service. Notice we have connectors set up here, so I’m going to add a new one, because we want to show you what it’s like from end point to end point. We’ll call this the DEMO MySQL Server Connection. The type is a connection, we know that. And what are we going to pick? I’ll walk through these in a little more depth later, but you can get an idea of the types of connectors and the folks we work with here. Specifically, we’re going to talk about MySQL. We have a product called Beacon—I would love to answer questions on that if anyone has them, but that’s not what we’re here for.

We’re going to go to our local server, and we just need to enter our user ID and password. We will talk about this again a little later with regard to data governance and who has access to this. But we do have accessibility into the datasets you’re going to manage in our tool from any SQL-compliant server. You will notice I’ve got several databases; we’re going to talk to the one that ties into our customer data. So now I have a connection set up to a database.

Now, normally this is the process your geek friend would’ve done: use a tool to connect to a database and start looking at datasets. What we want to do here is look at some campaign data from the other system. There are lots of tables in there, but we just want to pull this data in and send it off to somebody else, much like we would normally do. There are a couple of systematic fields that we’re not going to worry about. We can select which fields we want and which we don’t. Again, this would normally be a SQL statement that your geek friends in the other department are writing. I can filter this based on certain fields or information, but we’re not going to worry about that; we’re going to make this as simple as possible. Because I’m what’s known as a developer user—for all of you technical folks out there, or other geeks—I also have the option to modify and insert additional items into the select statement.

What this is going to do is generate our first table; you’ll see it charted up here. It’s going to let me know in a moment that I’ve had a connection to my database server and I’ve got some records—316 pulled in—and we’re just going to look at the data very quickly to give you an idea. Now, normally what would happen is you would ask your geek friend, “Hey, I need some data from this dataset. All I care about is campaign ID and vendor.” And we call vendor “source.” In those scenarios, we want to be able to do quick transformations of that information, because our friend—let’s call him “Jimmy,” our technical friend who takes care of us—would do that for us. We know we don’t want anything past vendor, so we’re going to get rid of those fields. And we want vendor to be called Source, so we’re going to do a rename on that, and we’re going to load that guy up.
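The select, trim, and rename that Jimmy would normally do by hand boils down to a small SQL statement. Here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the MySQL server (the table and column names are hypothetical, not the actual demo schema):

```python
import sqlite3

# In-memory database standing in for the demo MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_data (campaign_id TEXT, vendor TEXT, extra1 TEXT, extra2 TEXT)"
)
conn.executemany(
    "INSERT INTO campaign_data VALUES (?, ?, ?, ?)",
    [("C-001", "AdWords", "x", "y"), ("C-002", "Bing", "x", "y")],
)

# Keep only the fields we care about and rename vendor to Source,
# much like the table wizard does behind the scenes.
rows = conn.execute(
    "SELECT campaign_id AS CampaignID, vendor AS Source FROM campaign_data"
).fetchall()
print(rows)  # [('C-001', 'AdWords'), ('C-002', 'Bing')]
```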

Great, we have our information. Now we want to save this as a specific view of the data, because that’s what we want to send off to ourselves. We’ll call it “My View for Reduced Data.” We’re going to share this out to everybody in the system—again, we’ll talk governance a little later. Now I have a view of this information that I want to send on. The final piece of that, if you remember our chart: we asked our friend, he gets the data out of the database, he prepares it the way we asked, then he sends it off to us. So we’re going to create a new export, and we’ll call it “Export to ME for DEMO.” How do I want to deliver it? As email. Let’s say we want XLSX files, and we’ll create a file name of “FileOut,” and let’s go ahead and add the current date and maybe a timestamp on there as well. That’s always helpful for managing files.

Next, the last piece is: who do I want to send it to? I’m going to go ahead and send this to myself, and I’ll say “Here is my file from Jimmy,” my geek friend, and whatever you want for your message: “Here you go.” When I’m all done here, I can save this and run it, and you’ll see in a moment it’s going to say it’s scheduled for delivery. Now I’m just going to very briefly open my email here, and we can get an idea of how that works. Here’s my file from Jimmy, and yay, I have my file attachment. Now, that’s really about as complex as it’s going to get working with Jimmy, my technical friend, had I set him up earlier to do additional work for me manually. We’ll go ahead and jump out of there for a moment so we can get back to where we started. Now we’ve got our export all saved up for us, so let’s go back to our presentation real quickly.

Slide 5:

So now that we’ve gone through that process—we’re not automating anything yet, we just made our way through it—we add things like the cloud and the analytics. This is the next step in most of our organizations: we start adding some systematic items and tools that exist within the cloud. We certainly need to pay attention to those. Where would we be without Adobe Analytics and Google Analytics and Coremetrics and all these other great tools out there? At this point, what I’d like to do is just note really briefly that we still do this by hand, and we’re willing to put up with that at this point because we have those additional insights and capabilities.

Demo Screen 2:

Real quickly, we’re going to go in again and build a very simple flow of data to pull from one of those other systems. If I come in here to tables, I’m going to call this “DEMO Google Data”—and you tell me if this makes sense to you—and we’re going to pull it from a connection. I do have a Google Analytics connector; we’re going to go to that and use it here in a moment.

But I did want to talk through some of those other connectors real quick, so I’ll backtrack for a moment. We are heavily invested in the Adobe infrastructure, all the way from Audience Manager to Data Warehouse extracts, Adobe Target, and Adobe Analytics. There are a lot of web-based APIs out there, for social as well as advertising, as well as data storage, including a lot of other analytics packages, data-scraping tools, survey tools, and ObservePoint itself, to pull metric data out of as well. And then finally the different database types: we pretty much cover the enterprise databases, and finally FTP, SFTP, Drive, Google Sheets, and email connectors as well. So we’re not going to really talk about those.

I do want to show you how a connector ties to something in the cloud—the simplicity of pulling that information in. Right now, we’re going to talk about frequency: we’re only going to update this once. We’re going to use our dataset, then select which property. Now we should have an appropriate property, and here’s our new one. And we’re simply going to pull out some datasets. I want all website data, I want a visitor report, and the last piece is to pull the last two weeks, just to give us an idea. Again, this would normally be a process where folks are manually downloading, or there are other tools where you have to get IT involved to generate those extracts. Our goal is not to do that—it’s to simply grab this information, pull it out as quickly as possible, and let it pull in and work, and you’ll see that data in there.

Slide 6:

That kind of gives you an idea of purely the cloud-based tools and how we work with those. Then we get into the fun world of adding more stuff—I don’t know what else to call it, we’ve got stuff. We’ve got big datasets back behind the firewall, we’ve got enterprise systems, and we’ve got lots of cloud systems floating out there as well. We’re getting more data that we need to manage and integrate, so we really need some automation, ETL, and data integration products to do that. It becomes imperative now, because otherwise your sources are going to outgrow your people’s ability to get this work done.

Demo Screen 3:

Again, we’re just going to show a quick product item. We’re going to get rid of Google—you get the gist on that—and we’re going to talk about our campaigns. This is our initial campaign data. If you remember, we brought data in, we modified the data being shown, we changed some names, and then we shipped it off to our friend via email. Now we’re going to go a little further.

This is where our business processes come in. We’re just going to add a new business process. We’ll call this “DEMO Process for ETL and Automation,” and finally we’re going to pick our table here—if I do DEMO, there it is. Then we’re going to add some tasks. The very first task is very simple: Import from MySQL. Remember, we built a MySQL connector. There’s an item here that says “refresh table,” and all that does is re-pull that data in. That’s our first rule. Our second rule is going to be “Export the File,” and if you recall, we already generated an export; we just need to point to it to export that information. There’s the file, “Export to ME for DEMO.” We’re just going to go ahead and close this out—we’re going about as basic as we can go. Now we’re going to go back to our dataset over here.

I’m actually going to do something kind of odd: I’m going to delete all my records, because I want you to see the process of moving that information around. We’re going to do what we call a “quick launch” here. You’ll see your views, your exports, and specifically our business process. I’m going to run this guy. What you’re going to see is it’s going to churn along, and in a moment it’s going to tell me it’s all done. It’s probably already done, honestly; if I refresh the data, my records are all back, and if I check my email one more time, I get my messages.

We should see another message roll in, and you’ll see I have an additional file from Jimmy—the same thing we looked at earlier. That gives you an idea of how that process works. In addition, as a nice add-on, here’s where we can do a lot of great business transformation rules. This is where the T in ETL comes in. We’ll call it “fix stuff.” I just very quickly wanted to talk about this. We can base this on some criteria: if campaign contains “Christmas,” then we want to create an action, and we’re going to set the channel to “Santa.” We’ll be able to pull that information through. So that’s a rule.
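A rule like that is essentially a conditional update over the records. Here is a rough Python sketch of what the "if campaign contains Christmas, set channel to Santa" rule does (the record shape is a hypothetical illustration, not the tool's internal format):

```python
# Hypothetical records mimicking the demo's campaign table.
records = [
    {"campaign": "Christmas Sale 2016", "channel": ""},
    {"campaign": "Spring Promo", "channel": "Email"},
]

def apply_rule(record):
    # Criteria: campaign contains "Christmas" -> action: set channel to "Santa".
    if "Christmas" in record["campaign"]:
        record["channel"] = "Santa"
    return record

records = [apply_rule(r) for r in records]
print(records[0]["channel"])  # Santa
```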

Obviously, we don’t want to fix it after we export; we want to fix it before we export, so those are the kinds of things we can do. Then, without getting into a lot of detail, you can base anything here on criteria, from the number of records, to the day of the week, to the day of the month, etc. The goal is to provide very simple, user-based fixes—analyst-friendly, so to speak, if you’re not a SQL person. Though if you do know SQL, we can work with that as well. So we’re just going to talk a little about what it means when we say “run” and the different types of business processing that can happen. We already talked about actions. I do want to note that any time you see that little wizard’s wand, it allows you to build other items.

You see concatenation: we can concatenate several fields. Anything that is SQL-compliant can be utilized here, so I can do string comparisons, string splits, string concatenations; I can do numerical analysis—sine, cosine, tangent, standard deviations, etc. Anything that’s valid there, we can do. And we’ve gone as deep as: if a value is outside of two standard deviations for a certain dataset, then we send a message or an alert to somebody. That’s a very simple factor there.
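The two-standard-deviation alert described here can be sketched with Python's standard statistics module (the revenue figures below are made up for illustration):

```python
import statistics

# Hypothetical daily revenue figures, with one obvious outlier.
values = [100, 102, 98, 101, 99, 100, 250]

mean = statistics.mean(values)
stdev = statistics.stdev(values)

# Flag any value more than two standard deviations from the mean --
# the kind of criterion that would trigger an email or alert.
outliers = [v for v in values if abs(v - mean) > 2 * stdev]
print(outliers)  # [250]
```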

Call URL: if you do any RESTful API integrations across your different tools, this allows you to do a direct integration to call them as needed. You’ve got your straight URL, your method, and finally all the parameters you send in. If you’re non-technical, you can ignore that; the standard, non-technical users within our tools do not have these options, so we do not want to cause any consternation there. That’s the Call URL for REST. We do allow custom PHP, custom SQL, custom Perl, and custom Python to be run here as well. This helps if you’ve got someone on staff who says, “Please just let me type in MySQL.” We want to allow that. Deleting of data is pretty simplistic: if your name starts with a B, delete the record.
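A Call URL task boils down to a URL, a method, and the parameters you send in. Here is a rough stand-in using Python's urllib (the endpoint and parameters are hypothetical, not a real API):

```python
import urllib.parse
import urllib.request

# Hypothetical REST endpoint and parameters; in the tool these map to
# the URL, method, and parameter fields of the Call URL task.
base_url = "https://api.example.com/v1/campaigns"
params = {"status": "active", "limit": "50"}

url = base_url + "?" + urllib.parse.urlencode(params)
request = urllib.request.Request(url, method="GET")

print(request.full_url)
# Actually sending it would be: urllib.request.urlopen(request).read()
```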

Deduplication is the bane of many analysts, who unfortunately spend too much time deduplicating datasets. We’ve got a simple merge-and-remove-duplicates process: you select which fields to compare on, and then for the fields you’re not basing your comparison on, you choose what to do with them. Sending an email is pretty self-explanatory.
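That merge-and-remove-duplicates process amounts to picking comparison fields and a merge policy for the rest. A minimal sketch in plain Python (the records and the sum-the-revenue policy are illustrative assumptions):

```python
# Hypothetical contact records with a duplicate email.
rows = [
    {"email": "a@example.com", "revenue": 10},
    {"email": "b@example.com", "revenue": 5},
    {"email": "a@example.com", "revenue": 7},
]

# Compare on email; for fields not in the comparison, apply a merge
# policy -- here we sum revenue across the duplicates.
merged = {}
for row in rows:
    key = row["email"]
    if key in merged:
        merged[key]["revenue"] += row["revenue"]
    else:
        merged[key] = dict(row)

deduped = list(merged.values())
print(deduped)  # two records; a@example.com's revenue is merged to 17
```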

To Table and From Table allow you to move data from one dataset to another. Remember, we’re in our campaign data, so if I say From Table and pick my other table—let’s say Campaigns Internal—you’ll see it allows me to take all the data that’s in my other dataset and move it. Anything that matches, it’s automatically going to map; it’s pretty simple. To Table is the same deal: I’m pushing data from my table to somewhere else.

And finally, Transformation. What do I want to do with campaign? I want to convert it; I want to clean it of all these characters if I have crazy characters in there. That allows me to do some simple transformations on the data. I’ll leave my crazy transformation there and save it as “Test Transform,” and again, I want my transformation to go. So you can see how you can build these processes out, very ETL-esque in nature.

Slide 7:

As we walk through, the next piece of this is: the good news is we’ve got our marketing cloud. You made your selection—Adobe or Salesforce or Oracle or SAP or Google or HP or one of the many out there; those are the big guns—and you have one cloud, and it kind of rules them all. The reality is no: most organizations use multiple clouds, one-offs, and other items that need to be managed. What does that mean when we’ve got multiple private clouds within the organization? Will they use Adobe’s marketing cloud for analytics and segmentation? Will they use Salesforce’s marketing cloud for email and CRM? And maybe we’ve got some survey tools like ForeSee and other items floating out there that we utilize for marketing and social efforts, for example. Clearly you need a way to manage all this information. What that tends to mean is: more clouds mean more tools, which means more data, which means more users, which means more access. That’s why data governance—providing an organizational taxonomy so everybody is on the same page—is very, very important.

Demo Screen 4:

From that perspective, we’re going to talk just very briefly about the different levels and what they mean. We’ll start with our customer level. Within our tool, we have customers—now, this is a TMMData layer, but I do want to talk about it because it’s important to note. First and foremost, every customer within the tool—I told you that big long number would come into play later—has a customer-wide MySQL, FTP, and SFTP account, as well as a GPG public key they can use to encrypt data prior to sending it to us. Secondary to that, we also have additional controls for password validation, password expiration, etc. So you’ll want to set your history, your requirements, and your admin delivery email, for example. That’s at that level. The secondary level is user groups. Those guys are what allow us to do really fun stuff like protect our datasets. We can allow specific user groups to have access to certain tools and not others, and we can give them group access to specific items.

If I look at our demo, let’s say I want everybody to have view access, but I only want some to have update access on my order revenue—and as for delete, we can go ahead and delete our Google data. We also have forms in our system to drive data entry from a manual perspective. We’re not talking about those today, but it is a nice feature set.

The last piece we want to talk about here is users; we’ll hit those very quickly. The only thing to note around users, other than standard user stuff, is that you can select which groups your users belong to, obviously. Secondary to that is your system access type. System access types become very, very important in terms of what users are able to see. As I noted earlier, developers are the only ones who have access to code-level items: SQL, PHP, Python, and Perl code.

Your administrators are the only ones who can manage your users or user groups. Standard users can do whatever you say they can do, and finally, dashboard/form users are named users who access certain elements within the tool, like dashboards and forms. If you’ll notice here, just a note as well: we do have two-factor authentication if your organization requires it. Depending on your security needs, especially within healthcare or finance, those become areas that are problematic. And the final piece: every individual user, for any dataset they manage or create, also has direct access via their own SQL user ID and password, so that provides another layer. And the one I always have to note is user exceptions. Unfortunately, as we all know, no matter what we do, there’s always an exception to the rule or the permissions, so this is what allows us to take care of that and provide some governance around the process as a whole.

Slide 8:

Last couple of items here: we talked about data matching, preparation, and wrangling. Overall, the ultimate goal is to provide a much simpler way to manage these processes, and we’ll show you that. One of the things we try to do that differentiates us—not just from our competition, but also because the industry needs it—is being able to combine datasets together to work with them.

Demo Screen 5:

So as you see here, I’ve got a dataset called Order Revenue Data. It’s got unique identifiers, unique IDs, and some revenue. Obviously, this data in itself would be what you’d pull from, for example, Oracle Financials, or your orders from your analytics package, or orders online—it depends on the infrastructure you have. But this is similar to what you will see: unique identifiers, date and time information, if you’re really lucky some campaign-related information for measurement, and then finally some revenue. What we want to do here, though, is make this more useful.

We’re going to actually generate an additional table—and I promised we would talk about it here for a moment. We’re going to call it “CRM + Revenue.” We’re going to pull in some CRM data. We’re going to put it in our DEMO, and we’re going to make this a horizontal mashup. For you technical folks out there: horizontal is a join, making it wider; vertical is a union, making it longer; and custom lets you do whatever you want. So we’re going to say it’s a horizontal mashup, and from that we’re going to say All Rows That Match—that’s an inner join, just making a note for you there. We’re going to use our Order Revenue Data as our first table, and our CRM Data as our second table (and I know, because I know my data, that my ID is my ID). Now, if I had a complex join, I could certainly come here and do additional matches, but we’re not going to worry about that, because I know the data I want to look at and we’re just going to make a single join.
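In SQL terms, a horizontal mashup with "All Rows That Match" is an inner join. Here is a small sketch using Python's sqlite3, with made-up stand-ins for the Order Revenue and CRM tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for Order Revenue Data and CRM Data.
conn.execute("CREATE TABLE order_revenue (id TEXT, revenue REAL)")
conn.execute("CREATE TABLE crm (id TEXT, state TEXT)")
conn.executemany("INSERT INTO order_revenue VALUES (?, ?)",
                 [("U1", 100.0), ("U2", 50.0), ("U3", 25.0)])
conn.executemany("INSERT INTO crm VALUES (?, ?)",
                 [("U1", "PA"), ("U2", "WA")])

# Horizontal mashup with "All Rows That Match" = an inner join (wider).
# A vertical mashup would be a UNION instead (longer).
joined = conn.execute(
    "SELECT o.id, c.state, o.revenue "
    "FROM order_revenue o JOIN crm c ON o.id = c.id "
    "ORDER BY o.id"
).fetchall()
print(joined)  # U3 drops out: it has no CRM match
```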

What’s nice about this is I’ve got my match. I say, “Great,” I feel good about it, and maybe I want to change some names on here. So maybe we’ll call this OrderID, and we don’t need the second ID, which is our user, because we have it right here. And we can make changes, do filters—if I only want people from certain states or certain areas, I can do that as well—but I’m not going to worry about it. We’re going to save that guy, and what you’re going to see is that immediately I have a joined dataset that allows me to do some pretty fun stuff. Now I’ve got data where I can start building real aggregates and real datasets to use elsewhere.

The ultimate goal with this is to produce a report that makes sense. We’re going to pull state and, let’s say, revenue—except for the value, I want it set to the sum of revenue. Notice it knows what I’m talking about here. I’m going to load this, and I’ll have a nice aggregate of my revenue data. I’m going to save that, like we did earlier, as a view. We’ll call it “Chart View Data,” and we’ll share it with everybody in the system—part of the niceness of that data governance. If we want to generate charts off of this, we can certainly do that. We can add our states and our revenue, and you’ll see we get revenue dollars by state. Yay! So that’s one very nice way to do it.
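The aggregate view here is a group-by with a sum. A quick sketch in plain Python over hypothetical rows shaped like the joined CRM + revenue dataset:

```python
from collections import defaultdict

# Hypothetical joined rows: (order ID, state, revenue).
rows = [
    ("U1", "PA", 100.0),
    ("U2", "WA", 50.0),
    ("U4", "PA", 25.0),
]

# Aggregate: value = SUM(revenue), grouped by state --
# the same shape as the "revenue by state" view in the demo.
totals = defaultdict(float)
for _order_id, state, revenue in rows:
    totals[state] += revenue

print(dict(totals))  # {'PA': 125.0, 'WA': 50.0}
```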

A second one, just to note: we can also save any of these guys as an Excel table, which means direct linkage to Excel, and I’m going to open that up one more time. Now, in Excel, if we just go to Data, Get External Data, From Web, and we say “Go”—looks good—what you’ll see is that guy’s going to jump right in there. The last piece is the same exact concept, only now we can utilize the same thing with Tableau or any of the above.

Slide 9:

With that said, in closing, as we walk through these different products and pieces, what I wanted to note is what’s important about them: as we start building these aggregates and these systems, we want to stop allowing our data pools to be dead ends. We want those guys to be able to move. We want to augment our enterprise systems and clouds with information from others, and we want to reuse the intelligent work product that’s already been done by people in the organization. Our main point here: bridge the gaps between systems; don’t let those items be dead ends. For the most part—kind of overarching—I do want to note that right now, this is the way items are laid out: IT owned, analytics and marketing used, and business consumed. What we really want to do is shift this to IT friendly; we want the analysts and the marketers to manage these processes; and finally, we want the business to own it. Why? Because they’re the ones with the data.
