Lamers Can Use AI Too
As the king of lamers, I have always wondered how I can spread my empire.
Is it time for me to get an army of lame robots to spread my message?
(It is.)
But, I don’t have the energy to go and program every little lame-bot myself.
Hmmmmmm…
What to do?
Enter Machine Learning.
What is Machine Learning?
It’s a pretty neat concept where machines learn by themselves from examples we provide. In fact, if you have ever solved a captcha or reported an email as spam, you have probably helped train a machine learning (ML) model.
What does this mean for me?
I don’t have to program the lame-bots myself.
I just show them examples of being lame (data) and they will learn to be lame – on their own!
Time for some formal definitions
Machine learning is the science of getting computers to act without being explicitly programmed. (Arthur Samuel)
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell)
Simply put, you tell your computer to learn to do something (task T) by looking at data (experience E). Your computer also needs a metric (performance measure P) to judge itself by, so that it can adjust its learning accordingly.
For example, say you want to train a machine learning model to look at a picture and tell whether it’s a cat or a dog. Here, the task T is classification. The experience E will be hundreds of pictures of cats and dogs with the correct labels. The performance measure P could be the fraction of unseen pictures it classifies correctly.
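To make T, E, and P concrete, here is a minimal sketch in plain Python. Real cat-vs-dog models learn from pixels. This one instead assumes two made-up numeric features per animal (weight in kg and ear length in cm, both hypothetical) and uses a 1-nearest-neighbour classifier, one of the simplest learning methods. The labelled `train` list plays the role of E, `predict` performs T, and `accuracy` on held-out examples is P.

```python
import math

# Hypothetical features per animal: (weight_kg, ear_length_cm) -> label.
# Experience E: labelled training examples.
train = [
    ((4.0, 6.5), "cat"), ((3.5, 7.0), "cat"), ((5.0, 6.0), "cat"),
    ((25.0, 10.0), "dog"), ((30.0, 12.0), "dog"), ((18.0, 9.0), "dog"),
]

# Unseen labelled examples, used only to measure performance P.
test = [((4.5, 6.8), "cat"), ((22.0, 11.0), "dog"), ((6.0, 6.2), "cat")]


def predict(features):
    """Task T: give features the label of the closest training example (1-NN)."""
    _, label = min(train, key=lambda example: math.dist(example[0], features))
    return label


def accuracy(examples):
    """Performance measure P: fraction of examples classified correctly."""
    correct = sum(predict(x) == y for x, y in examples)
    return correct / len(examples)


print(accuracy(test))  # → 1.0
```

Adding more labelled examples to `train` (more experience E) is how a model like this would be expected to improve on harder test sets.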
Types of Machine Learning Problems
The two most common problems solved by machine learning are regression and classification.
In regression, you train a model to predict a continuous variable like rainfall, stock price, gas price, etc.
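As a minimal regression sketch, the snippet below fits a straight line by ordinary least squares to a handful of made-up (x, rainfall) pairs. The input feature and all numbers are hypothetical. Real rainfall or price models use many features and richer models, but the core idea is the same: predict a continuous number.

```python
# Hypothetical data: x is some input feature, y is rainfall in mm.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]


def fit_line(xs, ys):
    """Ordinary least squares fit of y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the shared 1/n factor cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)
# Predict a continuous value for an input the model has not seen.
print(f"predicted rainfall at x = 6: {slope * 6 + intercept:.2f} mm")
```

With these numbers, the fitted line is y = 1.96x + 0.14, so the prediction at x = 6 is about 11.90 mm.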
Classification problems are pretty self-explanatory. You teach your computer to classify stuff. For example, given a picture, classify whether it’s a cat or a dog.
Or, given an email, classify whether it’s spam. Or, given a review, classify whether it is positive or negative (sentiment classification).
The classification does not have to be binary – it can also involve more than two classes.
For instance, a very famous example of machine learning is handwritten digit recognition, where the model looks at a handwritten digit and classifies it as one of the digits 0 to 9.
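Some methods handle any number of classes naturally. Below is a sketch of a nearest-centroid classifier, chosen as a deliberately simple stand-in. Practical digit recognizers work on raw pixels, typically with neural networks. The two features per digit (number of closed loops and width-to-height ratio) are hypothetical hand-crafted substitutes for image data.

```python
from collections import defaultdict

# Hypothetical hand-crafted features per digit image:
# (number_of_closed_loops, width_to_height_ratio) -> digit label.
train = [
    ((1, 0.80), "0"), ((1, 0.75), "0"),
    ((0, 0.20), "1"), ((0, 0.25), "1"),
    ((2, 0.60), "8"), ((2, 0.65), "8"),
]


def fit_centroids(examples):
    """Average the feature vectors of each class; works for any number of classes."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in total) for label, total in sums.items()}


def predict(centroids, features):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((c - f) ** 2 for c, f in zip(centroids[label], features)),
    )


centroids = fit_centroids(train)
print(predict(centroids, (2, 0.62)))  # → 8
print(predict(centroids, (0, 0.30)))  # → 1
```

Adding a fourth digit only means adding its labelled examples. The code does not change.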
Types of Machine Learning Approaches
There are three main types of machine learning – supervised, unsupervised, and reinforcement learning.
Supervised machine learning is when you give the correct answers (called labels) along with the input data (called features). For example, to predict whether a picture is of a cat or a dog, you give your model a dataset of labelled images so that it can learn the difference between the two.
Unsupervised machine learning is when you give just the data, without the labels. The model finds patterns in the data by itself. For example, a popular use case is customer segmentation, where people with similar buying interests are grouped together so they can be sent relevant offers and deals.
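Customer segmentation is commonly done with clustering. Here is a minimal k-means sketch in plain Python. Note that it only ever receives feature values, never labels. The customer features (monthly visits, average spend) and their values are hypothetical. In practice you would scale the features first (here, spend dominates the distance) and would likely use a library implementation.

```python
import math
import random

# Hypothetical customers: (monthly_visits, average_spend). No labels are given.
customers = [(2, 20), (3, 25), (2, 22), (10, 200), (12, 210), (11, 190)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternately assign points to the nearest centre, then move centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters


centres, clusters = kmeans(customers, k=2)
for centre, members in zip(centres, clusters):
    print(centre, members)
```

The two groups that emerge (occasional low spenders and frequent high spenders) are the kind of segments you could target with different offers.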
Reinforcement learning is the most fun, in my opinion. In reinforcement learning, our agent (the model is called the agent here) learns by interacting with an environment. We all remember the snake game, where the snake grows by eating fruit and dies when it bites its own tail. Applying reinforcement learning here would mean training a snake (the agent) over thousands of games to keep growing without biting its tail. How?
In reinforcement learning, there’s a concept of rewards. Whenever our agent does something we want it to do, we give it a positive reward. Otherwise, we penalize it. Over many games, the agent learns to change its moves to minimize penalties and maximize its total reward.
Here’s a super cool demo of agents learning to play hide-and-seek.
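The reward idea above can be sketched with tabular Q-learning, one classic reinforcement learning algorithm. This is not necessarily what any particular demo uses. Instead of the snake game, the sketch assumes a hypothetical five-cell corridor: reaching the last cell earns +10, and every other move costs -1. The learning rate, discount factor, and exploration rate are illustrative choices.

```python
import random

# Hypothetical environment: a corridor of 5 cells. The agent starts in cell 0;
# entering cell 4 (the goal) ends the episode with reward +10, every other move costs -1.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 10 if next_state == GOAL else -1
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn the value of each (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually take the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = train()
# Greedy policy: the highest-valued action in each non-goal cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should move right (+1) in every cell. That path reaches the reward while collecting the fewest penalties.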
What You’ll Need to Learn Before Starting with Machine Learning
Before starting your own machine learning projects, you will have to spend some time learning the tools you will be using. This is essential both for doing hands-on projects by yourself and for understanding code written by others.
Why Python for Machine Learning?
- Python is easier to learn than many other programming languages.
- The code is very readable and looks almost like English.
- Moreover, because it is already so widely used in data science, it opens up more job opportunities.
Follow these resources to build the foundation you need to get started with practical machine learning:
- Kaggle’s Python Course – Python is the most widely used programming language in data science, so this is where you should start. This is a text-based course spanning 7 lessons, with an expected completion time of 7 hours. By the end, you will know all the basics required to understand the Python code used in machine learning.
- Kaggle’s Pandas Course – Pandas is a cute name for Python’s data analysis library (built on top of NumPy). Before you can begin training models on data, you need to be able to read it from different sources and reshape it to suit your needs. This course will teach you the essentials of data handling.
- Learn NumPy – NumPy is the core Python library for fast numerical computing on arrays and matrices. Whenever you read somebody else’s Python code, you are likely to encounter NumPy used alongside pandas. So, learn it now.
- Kaggle’s Data Visualization Course – Data is numbers, but numbers aren’t fun on their own. In order to explain to other people what they are seeing (or to understand the data better yourself), you need to make some pretty plots. Go through this course to become a pro at data visualization.
- Setting up Anaconda – The good thing about starting with Kaggle is that you don’t need to install anything. But you should also learn how to work on data science projects offline, using your own PC. Anaconda is a data science platform for that: it installs all the necessary tools and libraries in one go, so you don’t have to install everything separately.
- Programiz Python Course – For those who prefer learning through text rather than video tutorials. Programiz has every Python concept organized neatly, with tutorials, examples, and an online IDE to try out code in the browser itself.
- Sentdex Python Playlist – Sentdex is one of my favorite YouTubers. This playlist will teach you everything you need to know to get started with Python, all explained in a beginner-friendly manner. It begins with how to install Python and goes on to introduce the core programming concepts. Finally, you build your very own Tic-Tac-Toe game in Python.
- Automate the Boring Stuff with Python – This is an excellent (and FREE!) book that teaches you how to apply Python to automate tasks like sending emails, running programs, etc. One of its most useful sections is about web scraping. From a machine learning perspective, at some point you might have to collect your own data, say tweets from a particular account or with a particular hashtag, to analyze their sentiment. Web scraping (using Python to automatically “scrape” data from webpages) comes in handy at that time.
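Here is a small taste of web scraping. The book itself works with third-party libraries such as requests and Beautiful Soup. This sketch uses only Python’s built-in `html.parser` so it runs without installing anything. The HTML snippet and its `class="text"` markup are hypothetical, so a real page would need its own tag and attribute checks. In practice you would first download the page’s HTML (for example with `urllib.request.urlopen`).

```python
from html.parser import HTMLParser

# Hypothetical page snippet standing in for downloaded HTML.
PAGE = """
<html><body>
  <div class="post"><p class="text">Loving this new phone!</p></div>
  <div class="post"><p class="text">Worst update ever.</p></div>
  <footer><p>Copyright notice</p></footer>
</body></html>
"""


class PostTextParser(HTMLParser):
    """Collect the text of every <p class="text"> element."""

    def __init__(self):
        super().__init__()
        self.in_post_text = False
        self.posts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "p" and ("class", "text") in attrs:
            self.in_post_text = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_post_text = False

    def handle_data(self, data):
        if self.in_post_text and data.strip():
            self.posts.append(data.strip())


parser = PostTextParser()
parser.feed(PAGE)
print(parser.posts)  # → ['Loving this new phone!', 'Worst update ever.']
```

The scraped posts are exactly the kind of raw text you would later label and feed to a sentiment classifier.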
Learn Machine Learning Basics
After you have a basic grasp of Python, Pandas, NumPy, and Data Visualization, it’s time to start with the actual machine learning concepts.
Again, I suggest these Kaggle courses to get started quickly:
- Intro to Machine Learning – This course will teach you the basic terminology used in machine learning. You will also get to build your own project that predicts house prices from past data.
- Intermediate Machine Learning – This will build on everything learned so far. The model we make in the above course is decent, but its accuracy can be improved. We learn how in this course.
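A common way to improve a model is to try several settings and keep the one that scores best on validation data the model never trained on. The sketch below assumes a hypothetical dataset of house sizes and prices and a hand-written k-nearest-neighbours regressor, which stands in for the library models used in practice. It picks the number of neighbours k by mean absolute error (MAE).

```python
# Hypothetical houses: (size_m2, price_in_thousands).
train = [(50, 150), (60, 180), (70, 200), (80, 240),
         (90, 260), (100, 300), (110, 310), (120, 360)]
# Held-out validation houses the model never trains on.
valid = [(65, 195), (85, 245), (105, 300)]


def knn_predict(size, k):
    """Predict a price as the mean price of the k training houses closest in size."""
    nearest = sorted(train, key=lambda house: abs(house[0] - size))[:k]
    return sum(price for _, price in nearest) / k


def validation_mae(k):
    """Mean absolute error of the k-NN model on the validation houses."""
    errors = [abs(knn_predict(size, k) - price) for size, price in valid]
    return sum(errors) / len(errors)


# Try several settings and keep the one with the lowest validation error.
scores = {k: validation_mae(k) for k in (1, 2, 3, 5)}
best_k = min(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

With these made-up numbers, k = 2 gives the lowest validation MAE (5.0). Both k = 1 and larger values of k do worse.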
Deep Dive into Machine Learning Concepts
After you have got a handle on the basics, it is time to develop a better intuition. As the next step, we want to get better at recognizing which model is best suited to a particular problem, understanding how that model works, and getting a brief grasp of the underlying mathematics.
Don’t worry if you don’t have a background in mathematics. The resources I am listing here do a very good job at explaining the basic mathematical equations behind the models. They will help you develop a solid practical intuition when it comes to applying machine learning in real life.
- Machine Learning A-Z – The key features of this course are:
  - Practical projects – you learn by doing. The projects tackle real-world problems like customer churn, sentiment analysis of restaurant reviews, etc.
  - For each model, they first explain the intuition with diagrams and helpful analogies. (I have never seen complex mathematical concepts broken down in such simple terms in any other course.)
  - Coding tutorials are available in both Python and R (another popular data science language).
  - All the resources, datasets, code files, etc. are available online (for free) on their dedicated website.
- Stanford Machine Learning by Andrew Ng – This is the gold standard for machine learning courses. Taught by Andrew Ng (co-founder of Google Brain, co-founder of Coursera, former chief scientist at Baidu), it covers everything you need to develop a good intuition for using machine learning as a problem-solving technique. It has more maths than the above course, so you will get a stronger foundation. The coding exercises are in MATLAB/Octave, but the course is so popular that people have ported them to Python as well.
Hands-On Machine Learning
By now, we should have enough knowledge to explore new datasets on our own. Find a dataset that interests you on Kaggle. For each dataset, do your own research. Experiment with different models. Read other people’s solutions to see what worked for them and what didn’t. That’s how you will learn.
Here are some classic starter datasets:
- Iris Dataset – You will build a model to classify iris plants into 3 species on the basis of features like sepal width, petal length, etc. (Beginner level)
- Titanic Dataset – Here, you will build a model to predict the survival of passengers. Given a row of data with details about a passenger, your model will have to predict the likelihood of their survival. (Beginner level)
- Telco Customer Churn – Your task is to predict whether a customer will leave the company (churn) or not. The data has information like the services a customer has signed up for and their demographics (gender, senior-citizen status, etc.). A truly real-world problem. (Advanced level)
- Credit Card Fraud Detection – This is a favorite of mine. It is a real-world use case: you have to build a model to predict whether a transaction is fraudulent or not. The catch is that the dataset is imbalanced, with very few fraud examples. So how can you build a model that accurately distinguishes fraudulent from genuine transactions when the training data has so few instances of fraud to learn from? Attempting this challenge will teach you how to handle imbalanced datasets like this one. (Advanced level)
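Imbalance is tricky because accuracy alone can make a useless model look excellent. The sketch below uses made-up labels for 1,000 transactions, 10 of which are frauds. It compares a model that never flags fraud with a hypothetical model that catches most frauds, scoring both on accuracy, precision, and recall.

```python
# Hypothetical labels for 1,000 transactions: 10 frauds (1) and 990 genuine (0).
y_true = [1] * 10 + [0] * 990

# A useless "model" that never flags fraud.
always_genuine = [0] * 1000

# A hypothetical model that catches 8 of the 10 frauds but also flags 20 genuine ones.
flagging_model = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970


def report(y_true, y_pred):
    """Return (accuracy, precision, recall), treating fraud (1) as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # frauds caught
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false alarms
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # frauds missed
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


print(report(y_true, always_genuine))
print(report(y_true, flagging_model))
```

The do-nothing model scores 99% accuracy but 0% recall. The second model has slightly lower accuracy (97.8%) yet catches 80% of the frauds. That gap is why precision and recall, not accuracy, are the numbers to watch on this kind of dataset.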
And, a couple of fun datasets I found:
- The Complete Pokemon dataset – You could frame this as a classification problem, for example predicting a Pokémon’s type from its stats.
- Reddit WallStreetBets Posts – WallStreetBets has become famous for its role in propping up GameStop stock. This dataset has the unfiltered posts from that subreddit. You can frame this as a sentiment classification problem.
- World Happiness Report Data – Perform data analysis and data visualization to find out what makes people in a country happy.
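One simple starting point for that analysis is to measure how strongly each factor moves with the happiness score. The sketch below computes the Pearson correlation coefficient in plain Python. Its values for five imaginary countries are made up and do not come from the World Happiness Report.

```python
import math

# Made-up values for five imaginary countries (NOT from the World Happiness Report).
gdp = [1.2, 0.8, 1.5, 0.6, 1.0]
happiness = [6.9, 5.1, 7.3, 4.8, 6.0]
generosity = [0.10, 0.25, 0.15, 0.20, 0.30]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


for name, column in [("gdp", gdp), ("generosity", generosity)]:
    print(name, round(pearson(column, happiness), 2))
```

A coefficient near +1 means two columns rise together, and one near -1 means one falls as the other rises. Correlation alone does not show what actually causes happiness.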
These datasets barely scratch the surface. There are thousands of datasets freely available on Kaggle and, by extension, thousands of problems to be solved. So, put on your problem-solving hat and get to work!
That’s not all. Kaggle also hosts competitions where you can earn real money (thousands of dollars!) for solving problems using machine learning. You can help scientists preserve rainforests, recognize cancer from images, predict stock prices, and a lot more. If yours is the winning solution, you’ll be handsomely rewarded for your contribution.
If you think that as a beginner you can’t win these competitions, let me dispel that doubt. Beginners have not only won cash prizes but have also gone on to land awesome career opportunities thanks to these competitions. Getting good at them takes time, but it’s worth it: you can apply the skills you pick up directly in real life. Machine learning professionals are highly paid, so learning machine learning is a great investment in your career.
Wrapping Up
Phew…that was a long post. The list of resources may seem daunting, but remember, you don’t have to go through it all at once. Bookmark this page to come back to it later. If you don’t like a particular resource, switch to another one. For instance, some people prefer to learn from text rather than video. That’s why I gave multiple resources.
In addition to the great career opportunities machine learning offers, it’s also a very exciting field to work in. You can literally work to create the future we only see in sci-fi movies. (You can make it as lame as you want!)
Also, do come back and tell me about the awesome projects you make using your newly acquired ML knowledge. I am always interested in seeing creative ways people apply machine learning in their lives.