What is data science? / Data for bluffers #1

22 November 2021

Want to understand how data can help your growth, marketing and sales efforts but don’t have the time to dive deep?

Join Dr Ed Barter and Tom Ridges as they break down the world of data and AI into bite-size nuggets.

 

Tom Ridges

Hello, and welcome to the first episode of Data for Bluffers: short soundbites for growth teams who are either interested in data or want to pretend they are. I’m Tom, CEO of GDlabs, and each week I’ll be joined by Ed, our head of data science. Say hello, Ed.

Ed Barter

Hi, Tom.

Tom Ridges

So why another podcast? At GDlabs, we’re an analytics business for growth teams, and we see a lot of confusion around the buzzwords and concepts in the data and AI space. So we thought we’d create something specifically for growth, sales and marketing teams in an easily digestible format. We’re aiming for 20-minute pods released bi-weekly, so you can pop this in your ears when you grab a coffee and leave with an understanding of the various concepts. Over the next few weeks we’ll cover topics such as why your friends have more friends than you (statistically speaking), what an algorithm is, AI for growth teams and much more. But today we thought we’d start with one we hear a lot: what is data science?

Ed Barter

Well, I wish it was that simple. Data science, as I imagine most of our listeners will have seen, is a phrase that has really taken the world by storm in the last maybe 10 years, but in particular in the last five years. I think in many ways it’s formalizing something that people have been doing for a while, or even a group of things that people have been doing for a long time. At the same time, it’s capitalizing on advances in computing and in academic research into machine learning, AI, all of these buzzwords that float around along with data science, and using those techniques to develop business value. At the end of the day, it’s any way that you can use data, and the insights you can generate from that data, to drive business growth or business sustainability.

Tom Ridges

How does that differ from, you know, what I used to do? I used to spend a lot of time hacking around in Excel and arriving at various conclusions. How does it differ from that?

Ed Barter

So I think data science does two things differently from hacking around in Excel. The first thing data science does is make a science of that process. It not only introduces the methods and the means to do these things, but it makes that a scientific process where we are more rigorous in what we’re doing and what we’re learning. There’s this phrase that I’m sure everyone has heard, that you can make statistics show anything you want them to. Data science is really about understanding what the statistics are really telling you, and converting those data into insights, into things that we know to be true, or into processes that you can implement within your business to drive growth or make you more efficient.

The second thing that is different about data science, I would say, and that has been important in its growth over the last five years, is just the advancement of techniques: the development of a whole suite of techniques that you can’t execute in Excel, because you really need a much greater understanding of statistics, of mathematics and of computing to be able to do them efficiently and correctly, and to know that you’re doing the right thing. Related to that is the ability to do these things in what we call production, so live. You don’t have to have a human sat there who downloads some data, does some analysis and makes a PowerPoint with the results in it. You can produce automated systems that take the data in as it arrives in real time and produce the insights in real time as well. In the first instance that might be what people see in, say, live dashboards: places where you can go and see large amounts of information and get good visibility of what’s going on in the business, or in a particular part of the business. But it can also be systems that then make decisions based on that in real time. We have automated pricing on Amazon, for example: you can set up a machine learning or data science system which uses the data from Amazon to reprice your products on Amazon live.

Tom Ridges

Okay. There were a couple of words there I’m going to pick up on. One was the statistics word and one was the machine learning word. But before I jump into those two, I think I’ve got that: when I’m messing around in Excel, the scientific approach here allows us to know that the numbers we’re getting out are correct, as opposed to me hoping I’ve done it right without a statistical foundation. And it also allows us to build these automated systems.

Ed Barter

Yeah, exactly. So it’s making sure you’re getting the most value possible out of your data and, at the same time, that the value you’re getting out is something you can apply efficiently.

Tom Ridges

So you said statistics in your explanation. What’s the difference between a data scientist and a statistician?

Ed Barter

So that’s quite a contentious question. I would say that all statisticians would probably now be badged as data scientists. That’s the truth of the matter, and in many ways people are getting rebadged as data scientists. But what does a statistics-based data scientist bring compared to other forms of data scientist, or people with other data science experience? I’m not a statistician, so this is my crude and probably quite offensive breakdown for statisticians, but statistics to me is lots of ways to answer the question: what were the chances of that happening? That’s kind of what statistics is. So that tells you, okay, we’ve seen this effect. We might have seen that lots of people in Bristol, for example, where I am, have bought the product. Statistics answers the question: what were the chances of that happening just by random chance, and what were the other factors that might have caused that to happen?
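Ed’s question, "what were the chances of that happening just by random chance?", can be sketched with a simple binomial calculation. This is a minimal illustration only: the buyer counts and the Bristol share are hypothetical numbers, not figures from the episode, and a real analysis would also look at the "other factors" Ed mentions.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Here: the chance of seeing k or more Bristol buyers out of n buyers
    if each buyer independently had probability p of being from Bristol.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, for illustration only.
n_buyers = 500        # total buyers of the product
bristol_share = 0.02  # assumed share of potential customers living in Bristol
bristol_buyers = 20   # buyers actually observed in Bristol

p_value = prob_at_least(bristol_buyers, n_buyers, bristol_share)
print(f"Bristol buyers expected by chance: {n_buyers * bristol_share:.0f}")
print(f"Chance of {bristol_buyers} or more by chance alone: {p_value:.4f}")
```

A small probability here suggests that chance alone is an unlikely explanation for the Bristol sales, which is the cue to go looking for other causes.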

Tom Ridges

Okay. So, at the danger of being even more contentious, would it be fair to say that all statisticians are data scientists, but not all data scientists are statisticians?

Ed Barter

So I think of it this way: if you call yourself a statistician and you’re looking for a new job, you’ll almost certainly be looking at jobs which are titled data scientist.

Tom Ridges

Okay.

Ed Barter

Exactly. You don’t want to adore your data scientist.

Tom Ridges

The other word you mentioned, which again I think is probably a topic for another episode, is machine learning. You introduced the concept in one or two lines. How would you summarize it in the context of what we’re talking about here, in terms of what data science is? And then we can drill into that in another episode.

Ed Barter

So machine learning is, again, another poorly defined term, or widely used term, I think, is a better way of putting it. But broadly speaking, machine learning systems are algorithms that learn from data and use that to predict what’s going to happen in the future, or what’s going to happen right now in a particular instance.
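As a minimal sketch of "learn from data, then predict", here is one of the simplest possible cases: fitting a straight line by least squares and using it to forecast. The spend and lead figures are made up for illustration, and real machine learning systems use far richer models than this.

```python
def fit_line(xs, ys):
    """Learn a slope and intercept from past data using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned line to predict the outcome for a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: monthly ad spend (in thousands) and leads generated.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
leads = [12, 19, 31, 42, 48]

model = fit_line(spend, leads)  # the "learning" step
forecast = predict(model, 6.0)  # the "prediction" step
print(f"Predicted leads at a spend of 6.0: {forecast:.1f}")
```

The split into a fitting step and a prediction step is the part that carries over to more sophisticated systems.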

Tom Ridges

How do you apply this know-how once you understand what data science is? You mentioned one way, building automated systems, but how else can people apply this?

Ed Barter

Ultimately, a definition that you’ll end up falling back on a lot of the time is that data science is the thing that data scientists do. Basically, data scientists use data to answer questions. Now, those questions can be very specific, such as: tell me how many leads this particular marketing campaign has generated. That might be an easy question to answer, where the challenge for the data scientist is just to go and find the system that’s tracking that piece of data, or build a system that tracks that piece of data, and then report the number back. But that question of how many leads are actually generated by a campaign could be much harder to answer if there isn’t what people would call a single source of truth. There isn’t a particular place where we can find that number, and instead we have to use models to estimate it based on a much wider array of available data sources.

Ed Barter

So that’s the sort of very specific question that a data scientist might answer, but data scientists also answer much broader questions, which normally start with a dataset. Here is some data that we have on our customers or on our previous marketing campaigns, and the question is: where can I find new customers using this? Or how do I find new customers using this? The output of that form of data science, rather than a particular number, is a recommendation, I would say: a guide for future behavior that comes out of the analysis of the data. So there are two broad categories of questions that data scientists answer. The first is very specific questions, where you want to either go and get the data or, using the data available, give your best possible estimate of a number. The second kind is: can you recommend certain actions based on previous data?

Tom Ridges

And I think one of the problems we see a lot, when people have got their own data science team or they’re using contract data science teams, is not being clear on the questions the business wants to answer. A lot of people we’ve spoken with say, right, we’ve hired data scientists, we’ve got a data science resource, give us some answers. And we always say: answers to what? So in your experience, Ed, having been a data scientist for many years, how do you get the best out of a business when they don’t know what questions they want to ask?

Ed Barter

The best way is always to have a clear question that you want the answer to. But that question doesn’t necessarily come, and often isn’t best coming, straight from the business. It is better emerging from a dialogue between the data scientists and the business: producing an understanding for the data scientists of what the business needs are, and then converting that into questions. And then, importantly, clarifying what those questions are before the data science is done, so that everyone knows what’s coming out of the process.

Tom Ridges

Yeah. So, like a lot of things that get overcomplicated: have a conversation. The data scientists can’t predict what the business wants answered, but they do know the best way to formulate the question. So if they’ve got a good understanding of the objectives, you can arrive at better quality questions, which help you arrive at better quality answers.

Ed Barter

Yeah, exactly. And I think it’s also important to say that, from a business point of view, understanding at an early stage how those questions feed into actions for the business always greatly improves the business’s experience.

Tom Ridges

Yeah. So rather than just putting up pretty charts, it’s: how do we actually use the insight? How do we make it...

Ed Barter

Exactly. And thinking about that in advance, saying "if we get this answer, this is what we will do", helps phrase useful questions. At the end of the day, there’s often a temptation to ask questions that you think will have interesting answers without really knowing what you would do with the information.

Tom Ridges

Just changing gears slightly, one thing that became obvious to me when I first started working with and hiring data scientists is the different types. I naively thought it was one particular role, but I quickly categorized them into two sets. There were the data scientists who had a computer science background, who are experts at using pre-built algorithms, optimizing them and getting the best out of them. And then there were those who came from more of a physics or maths background, who maybe took a blue-sky approach and created their own algorithms. Both have strengths and weaknesses depending on your objective. But is that a fair classification? Is that an oversimplification, or an undersimplification, now that I’m actually speaking to a data scientist?

Ed Barter

I think it’s a fair sort of first-level simplification.

Tom Ridges

First, I’m trying not to offend everybody out there who’s a data scientist.

Ed Barter

Exactly. I mean, as these things go, people’s experience when they work means that they are learning skills across the set. And I think part of this comes back to the fact that data science is such a widely used term now that people doing very, very different jobs are called data scientists. To take an extreme example, it’s only relatively recently that people have started talking about data engineers in the data science space. Previously it would be quite common for a data scientist’s main role to be manipulating data to get it into the right format for a system that someone else is dealing with. That’s now, thankfully from my point of view, being separated off into its own role, which is often called a data engineer. And that simplifies the job of the data scientist to be more about doing the modeling and the statistics.

Ed Barter

But even within that, there are many different approaches, as you say, that people will take. And that’s really where I agree with your distinction between physics and maths backgrounds versus a computer science background. I wouldn’t say it’s about ease; I think it’s the natural inclination of those two groups of people in the way that they approach problems, and that makes them more or less suited to dealing with different types of problem. I’m going to be what may sound slightly disparaging here, but the computer science approach is being able to execute a large number of different algorithms very efficiently and very quickly, and then test which gives you the best results. That is a type of data science, or data science system, which works really well for a broad set of very specific tasks.

Ed Barter

A very good example of this would be something like image classification, a task where you have lots of data. It’s very easy to produce what we call training data, which is data, pictures in this case, that you can attach a result to: is this a cat or is this not a cat? You can tell the computer whether it’s a cat or not a cat. And then, in the simplest possible way, and I know this is massively oversimplifying things, you could try loads of different techniques and see which one finds the most cats that are cats and the most non-cats that are non-cats. That will give you the system which is then your system for identifying cats. And that’s a very specific question. If you ask a much broader question of that same system, such as where can I find new customers, for example,

Tom Ridges

Good question.

Ed Barter

Which is the question everyone wants to ask. That’s not the sort of question you can produce an answer to by just trying lots of algorithms and identifying the best one.
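The "try loads of different techniques and keep the best one" approach from the cat example can be sketched in a few lines. Everything here is hypothetical: the features, the labelled examples and the four candidate rules are invented for illustration, and a real system would score candidates on held-out data it had not seen during training rather than on one small labelled set.

```python
# Labelled examples: (features, is_cat). All values are made up for illustration.
labelled = [
    ({"whiskers": 1, "weight_kg": 4.0}, True),
    ({"whiskers": 1, "weight_kg": 5.5}, True),
    ({"whiskers": 0, "weight_kg": 30.0}, False),
    ({"whiskers": 1, "weight_kg": 25.0}, False),
    ({"whiskers": 0, "weight_kg": 0.3}, False),
    ({"whiskers": 1, "weight_kg": 3.5}, True),
]

# Candidate "techniques": each maps features to a cat / not-cat guess.
candidates = {
    "always_cat": lambda f: True,
    "has_whiskers": lambda f: f["whiskers"] == 1,
    "is_light": lambda f: f["weight_kg"] < 10,
    "whiskers_and_light": lambda f: f["whiskers"] == 1 and f["weight_kg"] < 10,
}


def accuracy(rule, examples):
    """Fraction of examples where the guess matches the label
    (cats called cats plus non-cats called non-cats)."""
    correct = sum(rule(features) == is_cat for features, is_cat in examples)
    return correct / len(examples)


# Score every candidate and keep the one with the highest accuracy.
scores = {name: accuracy(rule, labelled) for name, rule in candidates.items()}
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("Chosen system:", best_name)
```

This works because every guess can be checked against a known label. A question like "where can I find new customers?" has no such label to score against, which is Ed’s point.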

Tom Ridges

Okay. So it’s almost problems we know about versus problems we don’t know about.

Ed Barter

Yeah, definitely. There are very specific questions, very specific types of forecasting, where you really know if you’re getting it right or wrong, and you know instantly.

Tom Ridges

I think what’s really interesting is that, if you’re responsible for growth in a business, this conversation about data comes up a lot: what is data science? Really, what I’m hearing from you, Ed, is that there isn’t a simple definition of data science. There are some buzzwords you can use, but it really depends on what you’re trying to achieve, how the business is set up, and the outcomes you’re after. Does that seem like a fair statement if we’re trying to define what data science is?

Ed Barter

Yeah, I think that’s a fair statement for defining what data science is, and it’s probably even more important for defining what you would want a data scientist to do. If you have a data scientist working for you, or access to data science resources, and you’re telling them what to do, it’s important to realize that the questions you want answered will change the approaches they take, to the extent that different people could be better at answering those different questions.

Tom Ridges

Yeah. If we were to wrap it up there and you had to give one piece of advice for someone who’s just getting their head around this subject, who’s looking to work with data scientists or bring consultants in or whatever it might be, what would you say to them?

Ed Barter

I would say never assume that when you say data science or data scientist, the person you’re talking to is thinking the same thing. Always be more specific, both in what you want them to do and in the questions you want them to answer. There is an unfortunate tendency to be specific about the tools you want them to use, and I think it’s much better to talk about the questions you want them to answer. And more broadly, from a finding-business-value point of view, think about how the results are going to feed into the business before you ask the questions, and let that help shape your questions. Correct.

Tom Ridges

Ed, thank you for joining me for our podcast number one, and we will see you all for episode two.
