There's currently a buzz about how artificial intelligence (AI), specifically machine learning, will be the next great disruptor to industry. Some are going as far as to call it the new “electricity.” Similar to the rhetoric about the cloud years ago, many people are talking about AI but relatively few actually understand it. The truth is that AI concepts and technologies can be intimidating. They were intimidating to me, and I’ve been an emerging technology engineer for over twenty years. However, I wanted to have more meaningful conversations with our data scientists and customers about the topic, so I’ve dug in, shored up my understanding, and summarized my learning into a three-part primer, the first of which is below. It’s not short, but I’ve tried to make it meaningful, comprehensive, and easy to follow.
One thing I can assure you is that you don’t have to be intimidated by these technologies. The concepts of artificial intelligence and machine learning are not difficult to grasp, and the potential of these technologies is undeniable. I do believe they will fundamentally change how businesses operate. AI and machine learning have the potential to drive significant cost reductions while creating customer experiences that were never possible before. This primer is meant for a nonengineering audience, in the hope that everyone may benefit from understanding and embracing the opportunities these technologies can provide.
A business primer on artificial intelligence and machine learning
What is machine learning?
Machine learning is defined as the science of getting a computer to act without being explicitly programmed, but what exactly does that mean? The “machine” part of machine learning can be anything. It could be an app, a car, a robot, a server in a data center, or an API in the cloud. The “learning” piece refers to a program that gets smarter as more data is provided. In fact, machine learning works in a way that is loosely similar to how our brains learn. Let’s look at an example.
Much of the human brain works on the concept of pattern recognition. For example, our brains use pattern recognition to translate the sounds someone is speaking into words we can understand. For a moment, think about this process of language comprehension as a sound-to-word translation program. Someone says “hey,” our sound-to-word translation program runs, and we respond with “hey.” If the person we’re talking to is someone we’ve never met before, and we’ve never heard their exact sound for “hey,” we can still understand them. We are able to do this because our mind has built a massive collection of all the different sounds for “hey” that we have previously heard and has indexed them into a “hey” word group. So if we meet someone new and they say “hey,” we scan all of our word groups for that sound until we find the one group that provides the closest match. This is pattern recognition.
One interesting thing to note is that our sound-to-word translation program gets better over time. For example, when an American meets someone from Scotland for the first time, he or she may find the Scottish speaker hard to understand at first. However, once the American does start to understand, he or she gets better at understanding everyone with a Scottish accent. How does that work? If our brains hear a sound we haven’t heard before (i.e., we can’t find a matching word group), we say, “I’m sorry, I didn’t understand you. Can you say that again?” We then listen more carefully for the new sound and scan our existing word groups again. When we do find the right word group, we file that new sound into that group, which makes us even better at recognizing that word (or that accent) in the future. We’re now smarter at recognizing this new sound, but we haven’t reprogrammed our mind. We’ve added new data to our existing sound-to-word translation program.
At a high level, this is how machine learning works. We write the program once, but the program gets better over time as we add more data to it. The more data we give it, the smarter it gets. Amazon’s Alexa is a great example of machine learning. The sound-to-word translation program for most people is pretty good and operates at an accuracy of about 95 percent. Amazon’s Alexa uses machine learning to run a similar program, and her accuracy is almost 94 percent. However, Alexa is getting smarter the more people use her. Right now she is pretty good at understanding my wife and me, but she is not very good at understanding the accent of our Ukrainian nanny, nor can she understand my four-year-old, whose pronunciation is still developing. Alexa will get better over time as she collects more data from each of these groups. Alexa collects data in our house and in every household that uses her. Eventually, she may be better than any human on the planet at understanding people with various accents and dialects across many different languages. Who knows? She may even be able to understand dolphins.
This is why machine learning is often used interchangeably with artificial intelligence. With machine learning we can simulate aspects of how our own brain works, allowing machines to do things that historically only humans could do. Machine learning powers things like the natural language processing in Alexa and Siri, and it also powers self-driving cars. Self-driving cars use visual pattern recognition of highway lanes, curbs, other cars, and street signs (along with many other sensor data points) to understand their surroundings and decide when to hit the gas, when to hit the brake, and when to turn the wheel. Like a sixteen-year-old who just got her license, a self-driving car gets better at driving the more it drives, because it has more patterns and experiences (data) to draw from. However, self-driving cars have an advantage over the sixteen-year-old: every self-driving car from a manufacturer shares one massive data set of patterns and experiences. This is why many people believe that, over time, self-driving cars will be much better at driving than humans.
Machine learning can also be used to do things humans can’t. For example, machine learning is used to return Google search results instantaneously, sequence the human genome, predict traffic patterns, and spot fraudulent activity on your credit card. You are already interacting with machine learning dozens of times every day without knowing it. Even so, today’s applications are only a small fraction of the potential use cases, which is why many people expect machine learning to disrupt one industry after another as it is applied more widely.
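To make the word-group idea concrete, here is a toy Python sketch of “closest match” lookup and of learning by adding data. It assumes each sound can be boiled down to a couple of made-up numbers; real speech recognition uses far richer representations and different models, and the names `word_groups`, `recognize`, and `learn` are hypothetical.

```python
import math

# Hypothetical toy data: each "sound" is reduced to two made-up numbers
# (think pitch and duration). Real speech systems use far richer features.
word_groups = {
    "hey": [[1.0, 0.2], [1.1, 0.25], [0.9, 0.18]],
    "bye": [[3.0, 0.6], [3.2, 0.55]],
}


def recognize(sound, groups, threshold=0.5):
    """Return the word whose group holds the closest stored sound,
    or None if nothing is close enough ("Can you say that again?")."""
    best_word, best_distance = None, float("inf")
    for word, examples in groups.items():
        for example in examples:
            distance = math.dist(sound, example)  # Euclidean distance
            if distance < best_distance:
                best_word, best_distance = word, distance
    return best_word if best_distance <= threshold else None


def learn(sound, word, groups):
    """File a newly confirmed sound into its word group.
    The program itself is unchanged; only the data grows."""
    groups.setdefault(word, []).append(sound)


scottish_hey = [1.8, 0.3]
print(recognize(scottish_hey, word_groups))  # None: no close match yet
learn(scottish_hey, "hey", word_groups)      # after hearing it again, file it
print(recognize([1.7, 0.3], word_groups))    # hey
```

Notice that `recognize` never changes. The program only gets better because `learn` adds data to a word group, which is the point of the Scottish-accent example.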
How does a data scientist create a machine learning program?
Creating a machine learning program essentially involves three steps:
- Step one is defining the question we’re trying to answer. In machine learning, this is referred to as our machine learning task. A task could be “How much will that house sell for based on its location, square footage, and age?” Another task could be “Based on this image, is that tumor malignant or benign?”
- Step two is acquiring a training set of data. In the examples above, a training set would be a list of previously sold houses alongside their sales price, location, square footage, and age, or in the second example, a set of images of previously diagnosed malignant and benign tumors.
- Step three is using (or creating) an algorithm that finds a hypothesis function: a function that answers the task from step one when it is given new input with the same structure as the training data from step two.
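Here is a minimal Python sketch of the three steps, assuming the house-price task with square footage as the only input. The sold houses are made up, and the algorithm in step three is ordinary least squares for a single feature, chosen only because it is one of the simplest existing algorithms; `fit_line` and `predict_price` are hypothetical names.

```python
# Step 1: the task (a hypothetical framing of the article's house example):
# "How much will a house sell for, given its square footage?"

# Step 2: the training set. These previously sold houses are made up:
# (square_feet, sale_price_in_dollars)
training_set = [
    (1500, 310_000),
    (2000, 390_000),
    (2500, 510_000),
    (3000, 590_000),
]


# Step 3: an existing algorithm (ordinary least squares for a single
# feature) turns the training set into a hypothesis function.
def fit_line(data):
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Slope and intercept of the best-fitting straight line
    slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
        (x - mean_x) ** 2 for x, _ in data
    )
    intercept = mean_y - slope * mean_x

    def hypothesis(square_feet):
        return intercept + slope * square_feet

    return hypothesis


predict_price = fit_line(training_set)
print(predict_price(2200))  # 440400.0
```

The data scientist supplies steps one and two; step three reuses a well-known algorithm, and the output is an ordinary function that can be called on houses it has never seen.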
That’s basically it. Data scientists are typically responsible for performing these steps. They need domain expertise to propose the tasks that machine learning could accomplish for an organization (i.e., step one); however, the more you and others in your organization understand what’s possible with machine learning, the more additional tasks you can suggest for the data scientists to investigate. The data scientist is also responsible for collecting the data for step two and writing the algorithm mentioned in step three. On the surface, writing the algorithm would seem to be the most difficult part, but most machine learning applications today use a relatively small set of existing, well-established algorithms (I’ll touch on the main ones later). As a result, building a machine learning system is often more about picking an algorithm than writing a new one from scratch. The biggest challenge for most machine learning systems is actually getting the training data (step two) into a clean format that these algorithms can use.
This raises another question: what exactly is a data scientist? A good data scientist is one part computer scientist, one part statistician, and one part business analyst/domain expert. The last part is essential for understanding how the first two can be applied to benefit the business. Someone strong in all three disciplines is a bit of a purple squirrel (a rare, hard-to-find candidate), but you can put two or three people with these skill sets together to form one data science team.
What's a machine learning algorithm?
In machine learning, algorithms are used to find the hypothesis function that predicts the future based on what we already know (i.e., accomplishes the task). For example, if we want to predict how much a house is going to sell for based only on its square footage, that’s pretty easy. We could divide the average price of all houses sold over the past ninety days by their average square footage to get a price-per-square-foot multiple. We can then apply that multiple to the square footage of a new house and predict the price. However, in most cases, that’s not going to be a very accurate prediction. A brand-new 2,000 square foot house in an awesome school district is going to cost a lot more than a thirty-year-old 2,000 square foot house in a crappy school district. Our hypothesis function needs to become more complicated because we need a function that factors in not just the square footage but also the home’s age and its school district rating. Realistically, we should also consider the number of floors, the size of the yard, and the number of bedrooms. This is where algorithms are useful. In this case, we can use an algorithm called “multivariate linear regression” to come up with a hypothesis function that takes all of these features (square footage, age, school, floors, yard, and bedrooms) into account to predict the sale price of a home. The algorithm essentially does this by evaluating and adjusting thousands (or even millions) of candidate hypothesis functions against the training data until it finds the one that fits the data best. Once we’ve identified it, we can use the hypothesis function to predict the future sale price of a house more accurately. This is, in simplified form, the idea behind Zillow’s Zestimate; Zillow built a multibillion-dollar business by applying regression to home sales data.
As we get more training data (e.g., from more houses sold), we can add this new data and rerun the algorithm to get a more accurate hypothesis function. This is how the machine learns to get better at its task, which in this case is predicting housing prices. If we get access to other data points over time (e.g., economic indicators, average salaries, amount of traffic on the street), we can continue to refine our hypothesis function by introducing these new variables and running the algorithm again. This is how algorithms are fundamentally used in machine learning. Note that we didn’t actually write the algorithm; we just used the multivariate linear regression algorithm that already existed. The task (predicting the prices of houses) and the attributes we wanted to use to formulate our hypothesis function were specific to us. In your business, the barrier to entry for machine learning is not necessarily figuring out an algorithm; it’s figuring out the problem domain and the relevant data attributes, and then getting that data into a format where the algorithms can be applied.
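The contrast between the single multiple and a multivariate hypothesis function can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not how Zillow or any particular product works: the houses and the “true” pricing rule are invented, and the fitting method shown is batch gradient descent, one common way to run linear regression as thousands of small adjustments to a candidate function (many libraries instead solve it directly). `fit_linear_regression` and `predict` are hypothetical names.

```python
import numpy as np

# Made-up training set. Columns: square footage, age in years,
# school district rating (1-10).
X = np.array(
    [
        [2000, 1, 9],
        [2000, 30, 3],
        [1500, 10, 8],
        [2500, 5, 4],
        [1800, 20, 7],
        [3000, 15, 5],
    ],
    dtype=float,
)
# Made-up sale prices, generated from an assumed "true" rule so we can
# check the fit: 50,000 + 150 per sq ft - 1,000 per year of age
# + 20,000 per school-rating point.
y = 50_000 + X @ np.array([150.0, -1_000.0, 20_000.0])

# The naive approach: one price-per-square-foot multiple.
multiple = y.mean() / X[:, 0].mean()


def fit_linear_regression(X, y, learning_rate=0.1, iterations=5_000):
    """Multivariate linear regression fitted by batch gradient descent."""
    # Standardize each feature so one learning rate suits all of them.
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xb = np.column_stack([np.ones(len(X)), (X - mean) / std])
    w = np.zeros(Xb.shape[1])  # start from a flat (all-zero) hypothesis
    for _ in range(iterations):
        error = Xb @ w - y                # how far off each prediction is
        gradient = Xb.T @ error / len(y)  # direction that increases error
        w -= learning_rate * gradient     # nudge the weights the other way

    def hypothesis(features):
        f = (np.asarray(features, dtype=float) - mean) / std
        return float(w[0] + f @ w[1:])

    return hypothesis


predict = fit_linear_regression(X, y)
for house in X[:2]:
    print(f"{house}: naive {multiple * house[0]:,.0f}, fitted {predict(house):,.0f}")
```

Both 2,000 square foot houses get the same naive estimate, while the fitted function separates the new house in the good school district from the older one. Standardizing the features first is a design choice that lets a single learning rate work for inputs on very different scales (square feet versus a 1 to 10 rating).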
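Retraining can be sketched with NumPy’s least-squares solver, `numpy.linalg.lstsq`. The houses, the traffic counts, and the rule used to generate the prices are made up, and `fit` and `rmse` are hypothetical helpers; the point is that the same `fit` call is reused as rows (new sales) and columns (new attributes) are added.

```python
import numpy as np


def fit(features, prices):
    """Ordinary least-squares linear fit.

    Returns weights [intercept, w1, w2, ...]. The algorithm never changes;
    only the data we hand it does.
    """
    A = np.column_stack([np.ones(len(prices)), features])
    weights, *_ = np.linalg.lstsq(A, prices, rcond=None)
    return weights


def rmse(features, prices, weights):
    """Root-mean-square error of the fitted function on the given houses."""
    A = np.column_stack([np.ones(len(prices)), features])
    return float(np.sqrt(np.mean((A @ weights - prices) ** 2)))


# Made-up data for five sold houses. Prices follow an assumed rule:
# 60,000 + 180 per sq ft - 10 per car/day of street traffic.
square_feet = np.array([1500, 2000, 2500, 1800, 2200], dtype=float)
cars_per_day = np.array([200, 5000, 300, 4000, 100], dtype=float)
prices = np.array([328_000, 370_000, 507_000, 344_000, 455_000], dtype=float)

# 1. Original training set: the first three sales, square footage only.
w1 = fit(square_feet[:3], prices[:3])

# 2. More houses sell: rerun the same algorithm on all five.
w2 = fit(square_feet, prices)

# 3. A new attribute becomes available (street traffic): add a column, rerun.
both = np.column_stack([square_feet, cars_per_day])
w3 = fit(both, prices)

print(rmse(square_feet, prices, w2), rmse(both, prices, w3))
```

Because the made-up prices really do depend on traffic, adding that column drives the error on these five houses to essentially zero, while the square-footage-only fit cannot get there.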
What are the applications of machine learning in business?
There are two broad categories of machine learning: supervised learning and unsupervised learning. Each has different practical applications.
Supervised learning
Supervised learning is when we have a clear idea of the output or the type of answer we want based on the input. Every example we’ve given so far is an example of supervised learning. We put in a sound, we receive a word. We put in a set of sensor data, we get a decision to speed up or slow down the car. We put in square footage, bedrooms, and location, and we receive a price.
Supervised learning can be broken down further into regression and classification models. Regression machine learning models produce a continuous output, such as a number that can take any value in a range. For example, a price will probably change depending on the size of a house. More traffic on the 290 probably means my commute will be longer. Regression machine learning models often use “linear regression algorithms,” meaning the output changes in a straight-line (linear) way as each input changes. In business, we often think of these as “trend lines” (for you Excel users), but think of a trend line that isn’t just two-dimensional (e.g., price as a function of square footage). Think of one that could be three, four, or twenty dimensions (depending on how many features we want to include). That’s regression machine learning.
For potential applications of regression machine learning, think about the moments that matter in your business or department that you would like to accurately predict or, even better, proactively address. These may include inventory shortages, cash flow, customer wait time, returns, maintenance tickets, and the close date of a big sale. Chances are these outcomes are a result of past actions or external factors (like weather, traffic, or economic indicators) or a combination of the two. Regression machine learning could be used to predict the timing or magnitude of the moments that matter.
On the other hand, classification machine learning models are about putting things into categories. For example, if you give me a sound, I can tell you the word. If you give me an image of a tumor, I’ll tell you if it’s malignant or benign. If you give me an email, I can tell you whether or not it’s spam. Gmail goes further and uses classification to sort email into categories such as important, promotional, or social. Classification machine learning models often use algorithms such as “logistic regression,” whose output is one of a discrete set of outcomes (e.g., spam or not spam).
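A toy spam classifier shows the difference from regression: the model still computes a number (a probability), but the final answer is one of a fixed set of labels. This Python/NumPy sketch assumes two invented email features (number of links and how often “free” appears) and a handful of invented labeled emails. It is a minimal logistic regression trained by gradient descent, not how Gmail’s classifiers actually work, and `fit_logistic_regression` and `classify` are hypothetical names.

```python
import numpy as np

# Hypothetical features per email: [number of links, times "free" appears].
X = np.array(
    [[0, 0], [1, 0], [0, 1], [1, 1],   # labeled not spam
     [6, 3], [8, 4], [5, 5], [7, 2]],  # labeled spam
    dtype=float,
)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # 1 = spam


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, learning_rate=0.1, iterations=5_000):
    """Logistic regression fitted by batch gradient descent on log loss."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iterations):
        p = sigmoid(Xb @ w)                           # predicted P(spam)
        w -= learning_rate * Xb.T @ (p - y) / len(y)  # log-loss gradient step
    return w


def classify(features, w, threshold=0.5):
    """Turn the predicted probability into one of a discrete set of labels."""
    p = sigmoid(w[0] + np.asarray(features, dtype=float) @ w[1:])
    return "spam" if p >= threshold else "not spam"


weights = fit_logistic_regression(X, y)
print(classify([0, 0], weights), classify([9, 6], weights))
```

The threshold of 0.5 is an assumption; raising it would flag fewer emails as spam at the cost of letting more spam through.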
Opportunities to apply classification machine learning are often seen in the repetitive tasks performed by knowledge workers: lawyers scanning through contracts looking for specific language, technicians scanning images for certain properties or characteristics, contact center representatives going through a call script or decision tree with a customer, and process or data analysts reading an email or report and routing it to the next step in a process. These tasks can often be automated through the application of classification machine learning techniques.
Unsupervised learning
The other type of machine learning is unsupervised learning. This is where we don’t know what the output is, and we want the machine to give us some suggestions or insights based on correlations it can find in the data, which may not be obvious to us on the surface. Unsupervised learning has been used to determine what gene mutations might be responsible for certain medical conditions. Amazon and Netflix utilize unsupervised learning to power their recommendation engines. LinkedIn and Facebook use it to suggest connections for you to make. Similar to supervised learning, the more data these unsupervised learning systems have, the smarter they become.
Opportunities to apply unsupervised learning techniques exist where correlations in your business could be valuable, but you may not know where to look. If you want to ask a more open-ended question about your business, unsupervised learning can offer a different perspective. For example, what attributes do all of your top customers have in common? Maybe most of them have been at their company for more than four years. A disproportionate number of them might be female directors or active LinkedIn users. This information could influence your marketing strategy. What’s unique about your company’s top sales performers? Maybe a majority of them have liberal arts degrees, speak a second language, or are active on a charity board. This could influence your sales recruitment strategy. Unsupervised learning can provide insights that may not be visible on the surface, and this might lead to discovering moments that matter, which you didn’t realize existed.
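The article doesn’t name a specific unsupervised algorithm, so this sketch swaps in k-means clustering, one common technique, to show the idea: the algorithm is never told which customers are “top” customers; it only groups similar customers, and a person then interprets the groups. The customer attributes (years at their company, LinkedIn posts per month) and values are made up, and `kmeans` is a hypothetical minimal implementation written with NumPy.

```python
import numpy as np


def kmeans(points, k, iterations=100):
    """Minimal k-means clustering (no empty-cluster handling)."""
    points = np.asarray(points, dtype=float)
    # Deterministic farthest-first start: take the first point, then keep
    # adding the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(dists)])
    centroids = np.array(centroids)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Move each centroid to the average of the points assigned to it.
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Made-up customers: [years at their company, LinkedIn posts per month]
customers = [
    [5.0, 12], [6.0, 10], [4.5, 15], [7.0, 11],
    [0.5, 1], [1.0, 0], [0.8, 2], [1.5, 1],
]
labels, centroids = kmeans(customers, k=2)
for j, c in enumerate(centroids):
    print(f"group {j}: avg {c[0]:.1f} years at company, {c[1]:.1f} posts/month")
```

If most top customers landed in one group, its averages would suggest attributes worth a closer look. In practice, features on different scales would usually be standardized first, and the number of groups `k` is a choice the analyst has to make.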
Final thoughts
I’ll leave you with this final thought. It’s been estimated that only 15 percent of all data collected is publicly accessible, whereas 85 percent of the data in existence sits within individual corporate data centers. This gives many organizations a potentially massive competitive advantage, if they act on it by identifying the machine learning use cases for their businesses. Together, we can demystify and accelerate the adoption of this next great advancement in technology.