What is an algorithm? / Data for bluffers #2

6 December 2021

What is an algorithm? How do algorithms compare to the models we hear about daily on the news? Why do they go wrong?

All this and more in episode two, presented by Tom Ridges and Dr Ed Barter.

Speaker 1: 

Welcome to episode two of the Data for Bluffers podcast. I’m Tom Ridges, and this week me and Ed were talking about algorithms: what they are, what they’re not, and why fat fingers are often used in describing them. So what we’ll do is go into the conversation, and hopefully you’ll learn as much as I did. So this week I really wanted to get under the covers of algorithms and models. It’s something we hear about a lot. Especially over the last 18 months with the pandemic, I think “models” has probably been the most overused word in anything I’ve seen on the news. And I really just wanted to dive in: what is an algorithm, what’s a model? Let’s try and uncover a bit more about that, and hopefully I can leave with a bit more clarity. I’m going to start by asking: what’s an algorithm?

Speaker 2: 

An algorithm is quite a simple idea. An algorithm is a set of instructions, probably more commonly talked about as a set of instructions that’s carried out by a computer. Many people would say that an algorithm is like a recipe: a set of instructions that tell you how to do something, and at the end, hopefully, there’s the result you’re looking for, the meal you’re trying to cook. In terms of computer science, algorithms are more strict than that, in the sense that they are a list of instructions that we give to a computer and that fundamentally dictate what the computer does. So everything a computer does is some form of algorithm, some form of list of instructions, either given by you as the user or triggered by the programmer who programmed the computer to do certain things.

Speaker 1: 

If I map that back to the human world, if you like: if I get up and give myself a... no, that’s a really bad example. I was thinking on the fly about trying to build an algorithm there, but it didn’t work. Okay, so if I think about that in human terms: if we’re sending out marketing campaigns or email campaigns or whatever we’re doing, I might want to send emails to people based on a set of instructions. And it might be something that I do manually: if I’ve heard from someone within the last six weeks, and if I’ve seen that they’ve browsed our website, then I’m going to send them an email. That is, in effect, an algorithm. All we’re doing in computer science is programming those steps. We’re giving a set of rules, and rather than me doing it manually, we’re allowing a machine or a computer to automate it.

Speaker 2: 

Exactly that. And both of those are a sort of algorithm. There used to be an app, it may still be around, called If This Then That, and that really is the summary of the fundamentals: most algorithms are a set of chained-together “if this happens, do this” steps. Simple algorithms don’t even have to do that; they can just run the same thing all the time. As things get more complicated, you have to start adding in the ifs.
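
As a rough illustration of the “if this, then that” idea above, here is a minimal Python sketch of the email rule from the conversation. The Contact record, its field names and the way the two conditions are combined are assumptions made for the example, not a description of any real system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contact:
    # Hypothetical record; the field names are invented for this sketch.
    email: str
    last_heard_from: date
    browsed_website: bool


def should_email(contact: Contact, today: date) -> bool:
    """One 'if this, then that' rule, written as code."""
    heard_recently = today - contact.last_heard_from <= timedelta(weeks=6)
    # IF we heard from them in the last six weeks AND they browsed the site,
    # THEN send an email; otherwise do nothing.
    return heard_recently and contact.browsed_website


def run_campaign(contacts: list[Contact], today: date) -> list[str]:
    """Apply the same rule to every contact, as a computer would."""
    return [c.email for c in contacts if should_email(c, today)]
```

Doing this by hand and running it as code are the same algorithm; the code simply applies the rule to every contact without anyone having to check each one.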

Speaker 1: 

So algorithms are lists of instructions, and in computer science, in the data world, they’re lists of instructions that we’re getting computers to carry out for us. That I get. So where does the model fit into this conversation? Because you hear the two used interchangeably, and sometimes you just hear one and not the other. Where would I use models? How would I build models? What is a model?

Speaker 2: 

It’s probably a lot harder to define than an algorithm, because it’s a word that’s used across different fields, and it’s used to mean very different things. Lots of different examples of models exist, and in different fields people will use the word model to refer to very specific things. It’s a bit like when we spoke about data science last week: when one person says model, that doesn’t necessarily mean the same as when someone else says model. What we can do is ask what the general properties are of all the things that people talk about as models. Often, and in the data world in particular, they’re parts of algorithms. To take your earlier example of an algorithm that, based on a client’s or prospect’s activity, delivers some form of marketing, say an email to them: the model in that case is the bit that says, okay, because they’ve visited our website or they’ve interacted with us on Twitter or something like that, and adding those inputs up over some time, you’re going to send them an email. In some ways I think the best way of thinking about a model is as something that maps many inputs to one output. In the data world, often those inputs are numbers. So it might be that we have some demographic information about someone: their age, where they live, the distance they live from a city. And we can also line that up with temporal information: where are we in the year, what season is it, that sort of thing. We can put those numbers together and produce a propensity. So we’ve mapped lots of numbers to one number, this propensity, which is the chance of that person buying our product in the next month. And then, based on that propensity, we can decide how much marketing to push towards them.
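
To make “many inputs to one output” concrete, here is a small Python sketch of a propensity model. The choice of inputs follows the examples in the conversation, but the function name, the weights and the use of a logistic curve are all assumptions for illustration; nothing here is fitted to real data.

```python
import math


def propensity(age: float, km_from_city: float, month: int, site_visits: int) -> float:
    """Map several inputs to one number between 0 and 1.

    The weights below are invented for illustration, not fitted to any data.
    """
    # Crude seasonal signal: 1.0 in November and December, else 0.0.
    is_peak_season = 1.0 if month in (11, 12) else 0.0
    score = (
        -2.0
        + 0.01 * age
        - 0.02 * km_from_city
        + 0.8 * is_peak_season
        + 0.3 * site_visits
    )
    # The logistic function squashes any score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-score))
```

Calling propensity(40, 10, 12, 3) returns a single number that an algorithm can then act on, for example by comparing it with a cut-off.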

Speaker 1: 

So would it be fair then to say that all models are algorithms, but not all algorithms are models?

Speaker 2: 

I would say so, yes. All models do something, and they can certainly inform the instructions in an algorithm.

Speaker 1: 

Like a trigger.

Speaker 2: 

Exactly, yeah. So often they form part of the algorithm, and they’re almost an algorithm in and of themselves. This is why algorithms and models are quite hard to distinguish a lot of the time: all algorithms are, in a certain way, like a nested set of algorithms, if that makes sense. To go back to our marketing example, the final step in that is to send an email to someone. But if you’re going to implement that in a machine, actually sending an email to someone involves an algorithm, which has to go to a database, get that person’s email address, come back, go to probably another separate database and find out what the email should look like, bring that information back in, and then combine those two pieces of information and send that out. So even though we talked about one algorithm, it’s actually nested: each of those steps could be thought of as its own little algorithm.
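
The nesting described above can be sketched in a few lines of Python. The two dictionaries stand in for the two databases, and every name here (ADDRESSES, TEMPLATES, the customer and template ids) is hypothetical.

```python
# Two stand-in "databases"; in a real system these would be separate services.
ADDRESSES = {"cust-1": "ada@example.com"}
TEMPLATES = {"welcome": "Hello {name}, thanks for visiting."}


def lookup_address(customer_id: str) -> str:
    """Sub-algorithm 1: fetch the person's email address."""
    return ADDRESSES[customer_id]


def lookup_template(template_id: str) -> str:
    """Sub-algorithm 2: fetch what the email should look like."""
    return TEMPLATES[template_id]


def send_email(customer_id: str, template_id: str, name: str) -> dict:
    """The 'single step' from the outer algorithm, made of smaller steps."""
    address = lookup_address(customer_id)
    body = lookup_template(template_id).format(name=name)
    # A real implementation would hand this to a mail service; the sketch
    # just returns the combined message.
    return {"to": address, "body": body}
```

From the outside, send_email looks like one instruction; inside, it is its own short list of instructions.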

Speaker 1: 

Got you. Yeah, okay. So we’re basically saying we’ve got a propensity model that’s taking lots of different variables, what they’ve done on social, how they’ve interacted with us, all those types of things, to give us some form of propensity. That’s a model taking those multiple inputs down to one. We can then use that to trigger an algorithm to actually perform an action for us.

Speaker 2: 

There’s also an algorithm as a whole, which is that whole system, which says: okay, go and check what this person’s propensity is, and then, if that is high enough, send them the email.

Speaker 1: 

Yeah, okay. That’s as clear as mud. That’s good. I think what’s really interesting is that there’s not a clear “this is a model and this is an algorithm” definition. They kind of exist in both camps.

Speaker 2: 

It gets more complicated once you get into data science, because a lot of models require algorithms to do what we call fitting the model. In our propensity example, that would be: okay, how are we going to decide what level of propensity is enough to send an email? We would use a separate algorithm, which runs some experiments to test different levels of propensity, to decide on that value. The results from that are then imported into our algorithm that is running every day and deciding who to send emails to.
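
One way to picture the fitting step described above is a simple search over candidate cut-offs on past results. This is a sketch under stated assumptions: the profit calculation, the default figures and both function names are invented, and a real fitting procedure would usually be more careful than this.

```python
def fit_threshold(history, candidates, value_per_sale=20.0, cost_per_email=0.5):
    """Pick the propensity cut-off with the best return on past data.

    history: list of (propensity, bought) pairs from earlier experiments.
    All figures here are made up for the sketch.
    """
    def profit(threshold):
        emailed = [(p, bought) for p, bought in history if p >= threshold]
        sales = sum(1 for _, bought in emailed if bought)
        return sales * value_per_sale - len(emailed) * cost_per_email

    # Try each candidate level and keep the most profitable one.
    return max(candidates, key=profit)


def daily_run(propensities, threshold):
    """The everyday algorithm: it only consumes the fitted value."""
    return [person for person, p in propensities.items() if p >= threshold]
```

The two functions run at different times: fit_threshold occasionally, on historical data, and daily_run every day with whatever value the fitting last produced.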

Speaker 1: 

Yeah, okay. So hence your nesting, I guess. It’s easy to see why people can get confused about them and how they get used interchangeably. But for me, the takeaway from this conversation is that, rather than trying to understand what they each mean independently, it’s actually about how they interoperate and how they operate together. That’s good, that’s making me feel better. So now that I feel better about what they are: if we take any of the AI or data bad news stories, it’s often followed by some sort of commentary about this being a black box, we don’t know what happened, or algorithms are evil, or some variant of that narrative, but effectively pointing the finger at an algorithm. And what we’re saying here is, well, not all algorithms are necessarily hands-off, so it’s not an algorithm that we can necessarily point the finger at. But it’s also, why do these things go wrong? If we know what we’re doing, why do algorithms go wrong? Why are they not perfect?

Speaker 2:

I mean, it does seem like algorithms only make the news when it’s bad, right? They only seem to get bad press. It’s not news that the web has been running for well over three decades now and has never fallen over completely, and there are all sorts of algorithms that make sure that process carries on. But to go to your question of why or how they go wrong: as a basic rule, it’s never really the algorithm’s fault itself. Especially in the world of computer algorithms, the computer only does whatever people have told it to do. So it’s always the fault of someone at some point in the process of telling it what to do. I’d say the most clean-cut version of that is when whoever’s typing the code into the computer has not written the code in the way that they wanted, and that’s caused a problem. Now, in software development, that in itself is actually not really their fault, because there’s a whole set of tests that sit around that, which should have caught these things. So there’s a whole set of algorithms for updating the code base, and those algorithms are actually the ones that have gone wrong. And that leads us to why they really go wrong. Algorithms nearly always go wrong when they’re used for something that whoever wrote them didn’t anticipate, or when someone gives them information that’s incorrect, or information that’s far outside the scope of what was expected. So I think in this case we could summarize those two sorts of errors as what people call fat finger errors. It’s like someone’s pressed the wrong button on the keyboard accidentally, and that’s caused a problem, either when the algorithm is written or later on when someone is trying to use it.
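
A common defence against the second kind of fat finger error, information far outside what was expected, is to check inputs before using them. The sketch below is one hypothetical way to do that for an age field; the function name and the 0 to 120 range are assumptions for the example.

```python
def validate_age(raw: str) -> int:
    """Reject input the rest of the algorithm was never designed for."""
    try:
        age = int(raw)
    except ValueError:
        raise ValueError(f"age must be a whole number, got {raw!r}") from None
    # The bounds are an assumption for this sketch: outside this range the
    # value is far more likely to be a typing slip than a real age.
    if not 0 <= age <= 120:
        raise ValueError(f"age {age} is outside the expected range 0-120")
    return age
```

A slip such as typing 420 instead of 42 is stopped at the door instead of quietly flowing into whatever the algorithm does next.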

Speaker 1: 

Yeah, okay. And following on from that, does it also take into account data, sorry, assumptions or facts that are no longer valid? If I wrote an algorithm 15 years ago, the world’s a very different place right now, or things have moved on. Would that fall into the fat finger category, or is that a valid reason? Should we be anticipating that? How does that side work? Because these things stick around forever. As you said, the web’s been up for so long, and there are going to be algorithms and lines of code that were written decades ago that are still running. How should we be thinking about that?

Speaker 2: 

So I think that really comes into this idea of how, as humans, we utilize algorithms and get them to work for us, and how you can never just cut a system off, leave it and expect it to work forever. Whenever you’re writing software, it will always break one day, and it’ll break because something it’s trying to talk to has changed and the dependency has not been maintained. So it’s trying to talk to something, and eventually the way it gathers that information is not accessible anymore because something’s changed. If we think at a much higher level, only last year, during the coronavirus pandemic, we learned, and it shouldn’t be a surprise to people, that a lot of things people had been doing and a lot of systems people had been relying on didn’t work when everyone started working from home. Suddenly all your algorithms that were implementing, for example, marketing timing, which is a good example of this, completely went out the window, because people used to have routines. They would get on a train for a commute and go on LinkedIn. Now they’re not doing that anymore. The time when most people are on LinkedIn moved to later in the day, because people are getting bored around lunchtime and don’t have their colleagues to talk to. So they sit there on LinkedIn because they want a little bit of a break.

Speaker 1: 

Yeah. What’s coming down the road with cookies, and what’s happened with privacy and tracking with Apple, that in itself, from a technology point of view, from an algorithmic point of view, would be really tough, right? Because there’s a whole lot of systems that have been built, and the system underpinning them has changed. But what I hadn’t thought about until we were talking now is that there’s also the mass behavioral change that we’ve just witnessed on top of that. We’ve got changes coming from both ends, and systems that have been around for 10 or 15 years have now got to be updated, I guess, to operate with both sides of that.

Speaker 2: 

Yeah, definitely. It’s a major challenge on the systems engineering side of things. When you’re setting up any form of algorithm, understanding or trying to anticipate when it will break, how it will break and what it should do when it breaks are three of the most difficult questions you need to answer. And, unfortunately or hopefully, they’re three questions where you don’t find out that you didn’t answer them very well for a very long time, which kind of reduces the pressure in the first instance to really sort those problems out. To take the example you gave of third-party cookies: there’ll be algorithms sitting around, doing their thing, which are feeding off third-party cookies. And one day soon, I think it will be next year for Google, that stream of data is going to get cut off. At that point, those algorithms can really do one of two things. They could stop working, send a load of error messages and tell you that they’ve broken. Or there will be examples out there where they’ll just keep running, but on rubbish data, because they’re not getting any new data in. So they’ll just keep running on the data they have. And from a data scientist’s or machine learning engineer’s perspective, that’s the worst situation: you’re getting answers out, but they aren’t giving you any of the benefits you expected to get from running that process.
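
The difference between breaking loudly and running silently on old data can be shown with a small freshness check. This is a minimal sketch: StaleDataError, check_freshness and the one-day limit are all hypothetical choices, and a real pipeline would pick a limit that matches how often its feed normally updates.

```python
from datetime import datetime, timedelta


class StaleDataError(RuntimeError):
    """Raised when the input feed has stopped updating."""


def check_freshness(last_record_at: datetime, now: datetime,
                    max_age: timedelta = timedelta(days=1)) -> None:
    """Stop and complain rather than keep running on old data.

    The one-day limit is an arbitrary choice for this sketch.
    """
    age = now - last_record_at
    if age > max_age:
        raise StaleDataError(f"newest record is {age} old; feed may be cut off")
```

Running a check like this before each scoring run turns the silent failure into the noisy one, which is the easier of the two to notice and fix.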

Speaker 1: 

Right, yeah, okay. So that’s a nightmare for data scientists and machine learning engineers. But I guess also, if I’m a consumer of any service, it’s very hard for me to validate the results that I’m getting if I’m not sure what’s being fed into the process. The results are linked directly to what’s being poured into the machine.

Speaker 2: 

Exactly. People talk about garbage in, garbage out, and that’s basically it in summary: if you put rubbish into something, you get rubbish out of it. In the data world, that’s really, really true. You can easily end up in a situation where you’re desperately trying to put your data in, so you write your algorithm in a clever way so that it can deal with all of the problems with the data, but the way it deals with those problems is to erase any of the use or value in the analysis of the data. And that’s especially true when you’re moving a system from a research phase, where you have quite strong control over the data, to a production phase, where you have a product and lots of people will interact with it.
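
Here is a small Python sketch of how a “clever” fix for bad data can erase its value. Filling gaps with the average is one common technique, chosen here purely as an example; the numbers are made up.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Replace gaps with the average of the values that are present."""
    known = [v for v in values if v is not None]
    average = mean(known)
    return [average if v is None else v for v in values]


clean = [2, 9, 4, 7, 1, 8]                    # full research-phase data
messy = [2, None, None, 7, None, None]        # production data with gaps
filled = fill_missing(messy)

# The algorithm no longer crashes on gaps, but most customers now look
# identical, so there is little left to tell them apart.
spread_clean = pstdev(clean)
spread_filled = pstdev(filled)
```

After filling, four of the six values are the same number, so the spread of the data shrinks and any analysis built on the differences between customers has much less to work with.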

Speaker 1: 

If you’re going to trust an algorithm, and really people don’t trust algorithms, they trust the products and services they’re baked into. But for the sake of what we mean when we’re talking about trusting an algorithm: if I want to get the best out of an algorithm, it’s really about making sure that I’m feeding in the best data. I guess that means if I’m putting in a small amount of data, that’s maybe easier to review. If I’m putting in lots of data, I’ve got to make sure I’m confident that the data I’m putting in is of the right quality, as opposed to chucking the data at the system and hoping it gives me the best quality answer on the other side. Because ultimately, if I’m only hoping that the data is of the right quality, then I’m only hoping that the outputs are of reasonable quality.

Speaker 2: 

Your result is only as strong as the weakest link, and it’s never a good thing to have the weakest link be the data going in right at the beginning. Google’s got a machine learning handbook, and their first rule is effectively: do you need machine learning? Do you need a big data solution, or can a simple heuristic-based algorithm, so it’s still an algorithm, but one based on the knowledge that you already have, perform better? And often the reason it performs better is that it works on a much smaller set of the data: data that is much more reliable and that you have for everyone, so you have many more examples. For example, if you have first-party data from your transactions with people, that’s data you know is correct, because it’s your data and it has to be correct for billing purposes. So it naturally has a test that shows you it’s correct. That allows you to be confident in the output of your models and your algorithm, and therefore confident when you’re taking any actions that come out of that.
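
A heuristic baseline of the kind described above might look like the following sketch. The rule uses only first-party purchase counts; the cut-offs, scores and function names are invented for illustration, and the accuracy helper is just one simple way to compare a rule against what actually happened.

```python
def heuristic_score(purchases_last_year: int) -> float:
    """Rule of thumb using only first-party transaction data.

    The cut-offs are invented; the point is that the rule is easy to check.
    """
    if purchases_last_year >= 3:
        return 0.8   # regular customer
    if purchases_last_year >= 1:
        return 0.4   # occasional customer
    return 0.1       # no recent purchases


def accuracy(predict, examples, threshold=0.5):
    """Share of examples where 'score >= threshold' matches what happened."""
    hits = sum((predict(x) >= threshold) == bought for x, bought in examples)
    return hits / len(examples)
```

Measuring the simple rule first gives a number that any more complicated model would have to beat before it is worth the extra data and effort.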

Speaker 1: 

This is far from saying big data is not the thing. I think there are a lot of benefits to big data solutions. It’s certainly the more popular phrase. I should just say, for those people who might not know the difference between big data and small data: the industry, or the world, has recently talked a lot about big data. That’s huge amounts of data from various sources, from things like databases and spreadsheets, which can be structured, all the way through to social media data. It tends to be a lot of data, which really needs a lot of processing, and it can be quite complicated to use. That’s typically what people would refer to as big data. When we talk about small data, we’re talking about smaller, easy-to-access data sets that hold something very specific and can be used for a very specific task. But I guess what I’m now hearing, and now thinking, is that if you’re using small data in the relevant systems, it’s potentially easier to provide higher quality data in smaller quantities, and therefore get higher quality outputs out of systems. So it feels like there’s a juggling act. Sometimes you’re going to need to throw a ton of data at something that a system’s designed for, but to do that you have to be mindful that the outcomes might not be as accurate. If you’re trying to do something more accurate, in some instances smaller data sets might be better. So it shouldn’t be big data or nothing. There’s a strong argument for small data sets as well.

Speaker 2: 

Yeah, definitely. I think it’s all about how you get value out of what you have. Whenever you’re building a data-based solution, you’re making choices about what data is going into it. And even when you go for a big data solution, there’s still a whole bunch of data that you just don’t have, that you’re not including. So to think of it as “am I taking an active decision here to reduce the amount of data or not?” is not really the right way to look at it, or at least not the way that I think about it. I think about it as: okay, I’m going to choose what data goes into the system, so I’ve got to be careful about what data I put in.

Speaker 1: 

Yeah. You’re not starting off thinking about big data, small data or the buzzwords. You’re building something and thinking: right, what’s the best data to inform this algorithm that I’m writing? As opposed to: let’s just throw everything in and write something accordingly.

Speaker 2: 

I think you’ve probably seen this: there’s the idea of what we call spurious correlation. If you chuck enough data at something, you’ll find patterns there. Humans are really good at spotting patterns. We see patterns everywhere.

Speaker 1: 

What’s the funny website? I think it might be called Spurious Correlations dot...

Speaker 2: 

Yeah, Spurious Correlations. The one that’s often cited is the correlation between the number of films Nicolas Cage is in and swimming pool deaths in the US. It’s a really high correlation, but those things clearly aren’t really affecting each other. They’re kind of spurious examples. There are much more serious examples of this in finance, where big quant houses are, at the moment, putting time limits on their data. Once you play around with the data for long enough, you’re going to find something that tells you that you can make money out of it. So now they have limits on how many times they’re allowed to have a go at building a model or an algorithm that makes money out of that data. And if that fails, they’ll bin the data.
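
The effect described above, that enough attempts will always turn up a pattern, can be reproduced with nothing but random numbers. In this Python sketch every series is noise generated by the program; the series length, the number of candidates and the seed are arbitrary choices.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
target = [rng.gauss(0, 1) for _ in range(10)]           # ten "yearly" values
candidates = [[rng.gauss(0, 1) for _ in range(10)]      # 1,000 unrelated
              for _ in range(1000)]                     # random series

# Every series is pure noise, yet the best match still looks impressive.
best = max(abs(pearson(target, c)) for c in candidates)
print(f"strongest correlation found by chance: {best:.2f}")
```

Nothing here affects anything else, but picking the best of a thousand tries still produces a strong-looking correlation, which is the reasoning behind limiting how many attempts are allowed on one data set.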

Speaker 1: 

Wow, okay. I think ultimately what I’m hearing here is similar to the last conversation we had, in that a lot of these phrases don’t have a single definition. When we talked about data science, the first thing was: don’t assume that when someone asks “what is data science?” you’re both talking about the same thing. I think it’s the same with algorithms and models. They can be used within different contexts. But ultimately, when we’re talking about either of them, it really comes back to the importance of good quality data. And that doesn’t have to be a big organizational headache. We’ve been brainwashed with big data. It’s not necessarily that we’ve got to spend years trying to correlate and clean and tidy our corporate data. It’s about the right data for the activity we’re trying to do. So it’s not necessarily the big headache that it’s often made out to be. Any last words from you?

Speaker 2: 

If I have one thing to say about algorithms and models, it’s that algorithms are everywhere. Don’t be scared of them. And probably most importantly, don’t blame them; blame the people instead.

Speaker 1: 

Okay. So, go and hug it out with that weird algorithm. Great. Well, we hope you enjoyed the conversation that me and Ed had earlier in the week. Hopefully you learned something, and hopefully you found it interesting. If you did, please subscribe to the podcast and share it with people who you think might find it interesting. And we will see you in two weeks’ time.
