Help! My Data is Everywhere!

Welcome to an exciting journey through the multifaceted universe of data as we tackle our inaugural listener question:

“Help! My data originates from everywhere! How can I bring it together and trust it?”

Our listener, Todd, prompts a fascinating exploration into integrating disparate data sources and establishing trust in the collated information.

Join Data Dave, our resident expert, as he unveils crucial technical terms and enlightening insights, making this episode a must-listen for anyone keen on technology and data. Discover key strategies for efficient data consolidation, understand the importance of data reliability, and grasp how to build trust in your data sources.

Whether you’re a tech enthusiast hungry for knowledge or a data aficionado seeking advanced insights, this episode is packed with invaluable information and practical advice on managing and trusting diverse data origins. Don’t miss out on unraveling the complexities of data integration and trust-building in this intriguing episode.

Dive into the world of data with us!

Subscribe, listen, and learn as we unravel the mysteries of data integration and trust in this highly informative episode.

Don’t forget to send your questions to talktech@d3clarity.com and stay tuned for more enlightening discussions.

HAVE A QUESTION?
Ask Data Dave about all things data, cloud, or technology.
We'll be happy to answer your question on the podcast.

or send us an email to: techtalk@d3clarity.com

Published:

September 25, 2023

Duration:

00:18:27

Transcript

Alexis
Hi everyone. Welcome to Talk Tech with Data Dave. I’m Alexis, and I’m here to chat with my dear friend, Data Dave, about all things data, all things technology, all those things that I just don’t understand, and hopefully learn something along the way.

Data Dave
Hi Alexis, this is Dave Wilkinson. I’m the Chief Technology Officer of D3Clarity. I’ve been working in the data engineering space for about 30 years. So happy to try and help and happy to try and, to use your term, educate and have some fun.

Alexis
Our listener’s name is Todd, and Todd asked, “Oh, my data originates from everywhere. How can I bring it together and trust it? We have multiple systems with different formats, spreadsheets, handwritten contracts, and even tribal knowledge that is just in one person’s head. What do we do?”

Data Dave
This is a very, I’m loath to say it, but very common sort of question, a common issue that we run into when we start looking at business data. What we have to remember is that a lot of organizations have been running very successfully on a disparate set of systems and disparate data, with data all over the place. This is a historical fact. This new data world where we want all our data to be cleaned and polished and nice and shiny and so on is actually a fairly new phenomenon.

The bottom line is that data is always everywhere. Data exists in people’s minds or it exists as knowledge in people. It exists as infinite information in books and in contracts and everywhere all over the place. Data is really just evidence that people have been doing something.

As companies have got bigger, as companies have got more complex, as our world has become more complex, the data has become more numerous, more disparate, more separated. We have to accept that as a fundamental first and not expect our data to all be in one nice polished little place.

Alexis
Oh my gosh, that sounds just like a headache.

Data Dave
It is, but it’s also true, and it’s also a challenge. The more standard you are in the way that you operate with data, and the more understanding you have of that, the better you’ll behave, and the better your data will be. What we often find is that data is disparate and all over the place, and that it is the evidence of things happening.

Business is disparate by the nature of the business, right? So, a sale versus a shipment. I’ve sold something, and I shipped it somewhere. Are they the same thing? Are they describing the same thing? Fundamentally, they probably are. Somebody bought it, and I shipped it to that person. And it’s the same sale. What I’m trying to get at is looking at not only the data in raw form, but also at what the data is describing. So, Alexis, if you buy something from me and I ship it to you, then the person that is Alexis should be the same person in the sale, which is the financial transaction, and in the shipment, which is the logistical transaction.

Alexis
Okay. But you just said it should be the same person.

Data Dave
So, it should be the same; it should be the same data describing the same thing. Often these systems are disparate, are separated. One of the things about bringing all this data together is starting to look at what we call the critical data elements and the critical data entities that the information describes, and starting to be able to track these across the different systems. So, for a customer, the same thing: is it a person, is it an organization? How do you describe it? How do you know that it’s the same one? So now I know this sale to Alexis led to the shipment to Alexis, and that these two Alexises are actually the same person rather than different people.

Alexis
A couple of weeks ago, we dug into data governance a little bit and you talked to me about, you know, kind of making sure the same things were the same things in the same places. This sounds like one of those situations.

Data Dave
It’s exactly one of these situations, making sure the thing that says it’s the same is actually the same. And now, once you start to do that and you start to define the structures, then you can start to say, well, “This spreadsheet or this system over here is describing the same thing as the system over there, right?” So you can start to say, “Is it the same or is it different and is it trying to be the same?”

People often talk about data lakes and data warehouses and bringing all the data together, which is great, and that’s obviously a good place to start. But in doing that you also have to start to ask: does the data describe the same thing? Can I join it to each other? Can I get some semantic reasoning across it? We’ve got organizations that grew through acquisition, and so they’ve acquired different businesses, one in the US and one in Europe or in the UK, for example. Well, they’ve been run on different systems. So now, do they both describe a sale in the same way? Does customer describe itself in the same way? Can I join or, you know, bring these together in a way that makes semantic sense when I add them up?

Alexis
So, to roll that back just a little bit, you said that people like to talk about data lakes or data warehouses. I’m assuming those aren’t buildings and giant bodies of water.

Data Dave
No, but to a certain extent, they are, right?

Alexis
They’re giant bodies of water?!

Data Dave
Well, they’re giant bodies of data, right? So, the idea is that you’ve got this data lake. It is basically a place where you put a lot of data with little or no formal predefined structure to it.

So, your water in the lake has no structure to it. It’s not like you’re making it out of bricks, right? You’re not making it out of ice cubes. It’s flowing all over the place. So it’s got no structure to it until you take it out. It’s only when you take out a glass of water that it’s got structure. It’s the shape of the glass, right? And that’s the concept: place all your data in one place, and then you can look at it all. But it’s difficult to look at if you don’t understand it.

Alexis
Yeah, that was about to be my next question. So, okay, I like this water analogy. I can understand this. You know, if you take it out, you can put it in a glass. So what do I need to do to get it into structure?

Data Dave
You need to look at it and you need to understand it. So, you start talking about what the key elements in it and the key entities in it are. And an entity is a person, a place, or a thing, right? A product is an entity. A person is an entity. A company is an entity. So, you start to refine or distill your water, or your data, into these elements, these entities, these entity structures that give you meaning across your business, across your environment. So, you start to say a customer is a legal entity. Well, it’s a legal entity, therefore it’s got these characteristics. Therefore, it can only be a customer if it’s got these characteristics.

Realistically, if those characteristics are the same between two customers, then they must be the same entity. Therefore, this person has purchased from me twice.
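
To make Dave’s point concrete, here is a minimal sketch in Python of matching a sale to a shipment by comparing customer characteristics. It was not discussed on the show. The field names, the sample records, and the choice of name plus postal code as the matching characteristics are all assumptions made for illustration; real entity matching usually needs more characteristics and fuzzier comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerKey:
    """Normalized characteristics used to decide whether two records match.

    Assumption: name + postal code is enough to identify a customer here.
    """
    name: str
    postal_code: str


def normalize(name: str, postal_code: str) -> CustomerKey:
    # Lowercase and collapse whitespace so "  alexis   SMITH " equals "Alexis Smith".
    clean_name = " ".join(name.lower().split())
    clean_postal = postal_code.replace(" ", "").upper()
    return CustomerKey(clean_name, clean_postal)


def link_sales_to_shipments(sales, shipments):
    """Pair each sale with every shipment whose customer characteristics match."""
    shipments_by_key = {}
    for shipment in shipments:
        key = normalize(shipment["recipient"], shipment["postal_code"])
        shipments_by_key.setdefault(key, []).append(shipment)

    links = []
    for sale in sales:
        key = normalize(sale["customer_name"], sale["postal_code"])
        for shipment in shipments_by_key.get(key, []):
            links.append((sale["sale_id"], shipment["shipment_id"]))
    return links


# Hypothetical records from two separate systems.
sales = [
    {"sale_id": "S1", "customer_name": "Alexis Smith", "postal_code": "80202"},
    {"sale_id": "S2", "customer_name": "Alexis Jones", "postal_code": "10001"},
]
shipments = [
    {"shipment_id": "SH9", "recipient": "  alexis   SMITH ", "postal_code": "80202"},
]

links = link_sales_to_shipments(sales, shipments)
print(links)
```

Only sale `S1` links to shipment `SH9`; the other Alexis has different characteristics, so the sketch treats her as a different person.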

Alexis
What you just described to me, there has to be a technical word for that, right?

Data Dave
The technical word for this, in analytic terms, is a semantic layer or a data model. It starts to describe the model that your data conforms to, and then you can bring in all the data that now conforms.

Alexis
Oh, I like that word, okay.

Data Dave
Or conforms to a certain extent with the model that you’ve put in place. Now, when you do that in sort of semi-abstract terms, you can then start to say that the evidence they have either conforms or does not conform to this model, and you can start to drive reason and structure into your data lake, and then you can start asking interesting questions.
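
As a rough illustration of what “conforms or does not conform” can mean in practice, the following Python sketch checks records against a very small data model. This is an assumption-laden example, not the approach described on the show: the model `CUSTOMER_MODEL`, its fields, and the sample records are hypothetical, and a real semantic layer would describe far more than field names and types.

```python
# Hypothetical model: each required field maps to the type it must have.
CUSTOMER_MODEL = {"customer_id": str, "legal_name": str, "country": str}


def conformance_issues(record, model):
    """Return the reasons a record does not conform (empty list if it conforms)."""
    issues = []
    for field, expected_type in model.items():
        if field not in record:
            issues.append(f"missing {field}")
        elif not isinstance(record[field], expected_type):
            issues.append(f"{field} is not {expected_type.__name__}")
    return issues


def split_by_conformance(records, model):
    """Separate records that fit the model from those that do not."""
    conforming, non_conforming = [], []
    for record in records:
        issues = conformance_issues(record, model)
        if issues:
            non_conforming.append((record, issues))
        else:
            conforming.append(record)
    return conforming, non_conforming


# Hypothetical records pulled from two different source systems.
records = [
    {"customer_id": "C1", "legal_name": "Acme Ltd", "country": "UK"},
    {"customer_id": 42, "legal_name": "Acme Inc"},
]

conforming, non_conforming = split_by_conformance(records, CUSTOMER_MODEL)
for record, issues in non_conforming:
    print(record, issues)
```

Keeping the non-conforming records together with the reasons they failed, rather than discarding them, gives the cleansing and governance work something specific to act on.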

Alexis
Asking interesting questions about the data that’s in it, or about what the data in it means?

Data Dave
Both. So, you can ask interesting questions about the data that’s in it, because now that data conforms to a model which conforms to your business. All this data describes your business. So, now you can start asking those interesting questions: How many products did I sell in Brazil? How many products might I sell in Brazil, given that last year in the month of May I sold this many products in Brazil? What’s gonna happen this year in the month of May? You can go down all these roads now that you can join all this data together and start to look at it across the world and in larger constructs, in larger contexts.

Does that make sense?
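
Once the data conforms to a shared model, the Brazil question becomes a simple filter and sum. The Python sketch below is only an illustration under assumed field names and invented sample figures; it answers “how many units did I sell in Brazil in May of a given year” and does not attempt the forecasting part of the question.

```python
from datetime import date

# Hypothetical sales records that already conform to one shared model.
sales = [
    {"product": "widget", "country": "Brazil", "sold_on": date(2022, 5, 3), "units": 10},
    {"product": "widget", "country": "Brazil", "sold_on": date(2022, 5, 20), "units": 5},
    {"product": "gadget", "country": "Brazil", "sold_on": date(2022, 6, 1), "units": 7},
    {"product": "widget", "country": "UK", "sold_on": date(2022, 5, 9), "units": 4},
]


def units_sold(records, country, year, month):
    """Total units sold in one country during one calendar month."""
    return sum(
        record["units"]
        for record in records
        if record["country"] == country
        and record["sold_on"].year == year
        and record["sold_on"].month == month
    )


brazil_may_units = units_sold(sales, "Brazil", 2022, 5)
print(brazil_may_units)
```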

Alexis
Yeah. So let me give you an Alexis rundown of what I heard. We’ve got a bunch of data everywhere. That’s the essence of data. You can put it into a data lake where it can all be together and define a data model saying this is what it should look like. And you can start asking different questions of what you have to help you figure something out.

Data Dave
Exactly. That’s a very good way of putting it. The only addition I would make is that people do try that. The pitfall is that you’re talking about an awful lot of data in an awful lot of places. So, what we usually do, what I often talk to people about, is the idea of actually starting with the question that you want to ask and then going and collecting the data that contains the answer. You reverse engineer it from the question. So, I’ve got this interesting question that I want to answer. Now, what data do I need to put in my lake to answer that question?

Let’s put the model on just that area of the lake. So instead of saying, “I’ve got all this data, can I bring it all together?” I say, “What is the data with which I can answer this?”

Actually, let me collect that data into my data lake. Now, let me answer that question. What’s the next question you ask? Then you go do it again, and your data lake grows. And if you do it by putting this data model, this semantic layer, in place as it grows, you start to be able to ask ever more interesting questions, and more and more of your data starts to coalesce into this data lake underneath this data model.

If you do it with “let’s collect all the data first,” you can end up with what is colloquially called a data swamp, right? When I say lake, you think of this pretty landscape, pretty blue water. I can go on a boat and have a good time and everything’s great. I can even drink the water. It’s clear enough to drink. But if I’ve just collected all this stuff together and I really don’t know what’s in it, then you think of a data swamp, and it’s murky water. It’s dirty, it’s polluted, it’s corrupt, it’s all sorts of stuff. How many people do I really want to let in it? I don’t want people swimming in that. I don’t want to drink it. I don’t know what they’re gonna catch.

So, if you think about all the data in your organization, you start thinking about the cleansing function, the governance functions, and the way that we can make this swamp into this crystal clear, nice, beautiful place that we can actually get meaningful data out of, and we can trust it and we can understand it. And we’re not finding monsters in our data, right? We’re not finding Nessie in the corner and other things in the swamp.

Alexis
Let’s say a client comes to you and asks you the question that our listener just asked us. Hey, I’ve got all this stuff everywhere. What do I do? What would your answer be to the client or the potential client?

Data Dave
So, my first question is often: what is your highest priority? What are you trying to answer? What are you trying to get to? And you’ll usually get either “I want to know my customer base better” or “I want to know my products better.” There’s something that they’re feeling pain with, in terms of “I don’t understand my customer base, I don’t understand my products,” et cetera. There’s usually this point of pain, this burning question: “I want to know my products better. I want to understand my customers. I want better cross-sell and upsell across my customers. I want to know what my customers are doing,” or something like that. There’s usually one of these fundamental questions, and what we then look at is starting to say, okay, well, in what systems does the data describing your customers reside, or is stored, whatever word you want to use? Now, how do we pull that out? And what we also start to do is to define for them what one of these entities is.

Alexis
That’s a lot of stuff to answer your question. One time you told me that we had a client whose CEO wanted to know who to send a Christmas card to. I think you may have been joking about that, but when you were telling your background here, everything kind of clicked with that story you told me. And I was like, oh, maybe that’s what we do?

Data Dave
Well, it is, to a certain extent, right? Because that was exactly one of the questions we were asked by one of our clients: “Our CEO wants to know who should be on his Christmas card list.” It was a little bit of a euphemism, right? Because what he really wanted to know was, for this multi-divisional, multinational corporation, who are my most important clients? Now, that business is actually a very, very diverse business spread across the world. It’s manufacturing, it’s hospitality, it’s got all sorts of divisions to it. So, the construct of importance has a different meaning for each division of the business, and it’s not purely financial. The largest customer might not be the most important customer in this division. That was the concept that we had to go into.

So, what is the definition of a client? What is the definition of a customer for each business? What if the customer of this business is also a customer of that business? Have you sold from two businesses to the same customer? If so, how much is that customer now worth? You have to tease this apart from this multifaceted view of what important means and what customer means. So, they were taking months to answer that question. In fact, when we first engaged with them, the finance organization (and it’s usually the finance organization that gets asked these questions) could not answer that question. They simply had no way of answering it. So, we started the analysis and started down the path to at least help them get an idea of who their most important customers were and build that list.

Alexis
I mean, I guess that really speaks to what our listener was asking about how, you know, there’s so much stuff everywhere, and we have to figure out how to get it into an understandable place.

Data Dave
And it’s not just collecting the data. That’s the point that I’m trying to make. It’s not just collecting the data together. It’s also having the understanding within the data to start to say this data has meaning when it comes together. Actually, that’s what a lot of people do. They put all their data in one place, and they expect magic to happen: suddenly, I’m going to have these insights because I put all my data in one place. Well, that doesn’t happen until you actually start to drill into that data and try to understand it, and that’s where you get into advanced analytics. Data mining, which we talked about once before as well, advanced analytics, even AI and machine learning and that sort of thing, to try to get understanding from this data. When is the data describing the same thing, and when is it describing something interesting?

Alexis
This has been a great conversation Data Dave. I feel like I’ve learned so much just in this last like 15 minutes, so I’m very, very thankful.

Data Dave
You as well.

Alexis
Any final comments to our listeners or to our question submitter who sent in this question?

Data Dave
My final parting thought would be that there is a journey here. There is a way to get through it. Historically, we’ve been running very successful businesses on what we today would call very poor data. That’s true, right? That’s the absolute truth. The comment I would make is that now we expect cleaner data. We expect to make better decisions based on data clarity. To drive clarity in decisions, there is a path here. There is a road through here that is very definable and very navigable (navigable is probably the right word) to work through. That’s what we spend a lot of time on: helping these companies who can’t see the forest for the trees because they’re used to looking at what they know of as truth, what they know of as their data, and they’re confused and have difficulty understanding how to drive this level of clarity that we’re talking about. What we spend a lot of time on is working with them, trying to guide them through this, for lack of a better phrase, landscape of the data that they have.

Alexis
That’s awesome. So just thanks for chatting with me today Data Dave. My mind has been expanded. I feel like I’ve learned a ton today, like even a ton that’s just going to help me with my job not doing technical things with D3Clarity. So, I’m very, very happy about this. Well, thanks for chatting with me.

Data Dave
You’re welcome.

Alexis
Thank you everyone, have a great day.

Data Dave
Thank you. Have a good day.
