iManage – Unravelling the Labyrinth of AI Myths: AI does not learn by itself

AI Needs Training

Media portrayals of AI have encouraged a widespread myth: that AI simply learns by itself. A common misconception, for example, pictures AI as a digital brain that can be plugged into any scenario and left to solve challenges X, Y and Z on its own. Such representations are based on fiction, not fact.

AI Is Mathematics, Not Magic

AI is often described as a robotic brain that can learn, but it learns in a different way from a human brain. AI uses mathematics and pre-classified data to learn. Crucially, it needs a human to guide it through the learning process by pre-classifying data into categories, which it can then examine and use to categorize new data. Today’s AI does that through machine learning, its principal mechanism. Machine learning in turn has a subset, deep learning, which uses layered neural networks loosely modelled on the human brain to analyze data and find the patterns behind AI decisions. Overall, machine learning techniques are a cocktail of mathematics, statistics and algorithms, applied to data inputs to generate conclusions.

AI Learns with Supervised and Unsupervised Learning Techniques

At a high level, we can subdivide how AI systems learn into two techniques: supervised and unsupervised learning. Misunderstandings of these techniques contribute to the myth that AI learns by itself. The term “unsupervised learning” wrongly suggests that the system picks up seemingly random insights from data on its own, when in fact the structures and distributions it identifies are mathematical objects derived from the data that was fed into the system. The confusion grows when vendors claim that their AI tools work out-of-the-box, without requiring any additional training. In these cases, the reality is that either the system uses an unsupervised technique, or it comes with a supervised model pre-trained by the vendor to achieve a specific task for the user. In the second case, even though the vendor and not the end user trained the system, it still learned through supervised learning.

Let’s look at both these learning techniques more closely:

Supervised learning

Let’s say that the objective is to train a system to map the square footage of houses to their corresponding prices. To learn, the AI system would need to be provided with a labelled (i.e. named) dataset of input data (X = square footage) and the corresponding labelled dataset of output data (Y = house prices).

Both labelled datasets would run through an algorithm that iteratively generates the mathematical function that best maps square footage to the corresponding house price, to the desired degree of accuracy. Because this process is iterative, a human will likely need to be kept in the loop to add or remove data from the labelled datasets, or to correct the machine-generated outputs. This human interaction and feedback is necessary to improve the accuracy of the mathematical function.
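To make the iterative fitting concrete, here is a minimal sketch in Python. It assumes the simplest possible function (a straight line, price = w × square footage + b) and plain gradient descent as the algorithm; the figures in the dataset and the name fit_line are invented for illustration and do not describe any particular product.

```python
# Hypothetical labelled data: X = square footage, Y = house price.
square_feet = [800.0, 1000.0, 1200.0, 1500.0, 2000.0]
prices = [160000.0, 200000.0, 240000.0, 300000.0, 400000.0]


def fit_line(xs, ys, learning_rate=0.1, steps=5000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    # Rescale both columns to at most 1 so the updates stay stable
    # (assumes all values are positive).
    x_scale, y_scale = max(xs), max(ys)
    xn = [x / x_scale for x in xs]
    yn = [y / y_scale for y in ys]

    w, b = 0.0, 0.0  # start from an uninformed guess
    n = len(xn)
    for _ in range(steps):
        # How far off is the current function on every labelled example?
        errors = [w * x + b - y for x, y in zip(xn, yn)]
        # Nudge w and b in the direction that reduces the average squared error.
        w -= learning_rate * (2.0 / n) * sum(e * x for e, x in zip(errors, xn))
        b -= learning_rate * (2.0 / n) * sum(errors)

    # Undo the rescaling so the result is in dollars per square foot.
    return w * y_scale / x_scale, b * y_scale


slope, intercept = fit_line(square_feet, prices)
estimate = slope * 1800 + intercept  # predicted price for an unseen 1,800 sq ft house
```

Nothing here is learned without the labelled prices: remove the Y column and the loop has no error to reduce. Correcting or extending the labelled data and re-running the fit is the human feedback step described above.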

A real-life application of this kind of supervised learning is the photo-tagging of family and friends on Facebook. The algorithm learns from the tags that users apply to photos, approximating the relationship between the pixels in each photo and the tag, e.g. “Alistair” vs. “Not Alistair.”

Unsupervised learning

Unlike supervised learning, the dataset used for unsupervised learning contains the input data, but no corresponding output data. The AI system identifies mathematical relationships determined by the input data. With this kind of training, the objective is to model the underlying structure and distribution of data to learn more about the data itself.
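As an illustration, the sketch below groups unlabelled numbers with k-means, one common clustering algorithm. The data (monthly customer spend) and the function name k_means_1d are invented for this example, and the evenly spread starting centroids are a simplification chosen to keep the result repeatable.

```python
def k_means_1d(values, k=2, iterations=20):
    """Group numbers into k clusters around their nearest centroid (needs k >= 2)."""
    # Deterministic start: spread the initial centroids across the sorted values.
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical input data only: no labels, no "right answers".
monthly_spend = [12, 15, 14, 10, 95, 102, 98, 110]
centroids, clusters = k_means_1d(monthly_spend)
```

The two groups that come out are purely a property of the numbers that went in. The algorithm does not know what the groups mean; a person still has to interpret them.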

A real-life application of unsupervised learning is in the retail industry. Online retailers apply unsupervised clustering algorithms to discover the inherent groupings of customers by purchasing behavior, or association algorithms to describe relationships in the data, e.g. people who buy books of genre A often also buy books of genre Z.
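The association idea can be sketched with simple counting. The example below assumes a handful of invented orders and computes, for each pair of genres, the share of orders containing the first genre that also contain the second (usually called the confidence of the rule). The function name rule_confidence is hypothetical, and real retail systems use far larger data and more refined measures.

```python
from collections import Counter
from itertools import permutations

# Hypothetical orders: the set of book genres in each customer's basket.
orders = [
    {"crime", "thriller"},
    {"crime", "thriller", "cookery"},
    {"crime", "history"},
    {"thriller", "crime"},
    {"cookery", "history"},
]


def rule_confidence(baskets):
    """Return {(a, b): share of baskets containing a that also contain b}."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # Count every ordered pair (a, b) of distinct genres bought together.
        pair_counts.update(permutations(sorted(basket), 2))
    return {
        (a, b): pair_counts[(a, b)] / item_counts[a]
        for (a, b) in pair_counts
    }


confidence = rule_confidence(orders)
crime_then_thriller = confidence[("crime", "thriller")]  # 3 of 4 crime orders
```

Again there are no labels involved: the rule "crime buyers often buy thrillers" falls out of the arithmetic on the orders themselves.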

Screwdriver or Hammer?

An analogy for supervised vs. unsupervised learning is to think of one as a screwdriver and the other as a hammer. Both are tools, but are they substitutes for each other? The answer is no: they serve related but different purposes. In the same way, when people ask “Which is better, supervised or unsupervised learning?”, the answer is neither.

Both techniques have distinct functions and are often combined to achieve a result. Using the analogy above, a screwdriver and a hammer may be used in combination to make a set of shelves. Similarly, AI systems combine unsupervised learning with supervised learning. Typically, the first is used to preprocess data into logical groupings based on the distribution of the data. Subsequently, the second is used to predict outcomes or to label new data.
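A rough sketch of that combination follows, under stated assumptions: the unsupervised step is a two-group k-means, a person then names the groups ("occasional" and "frequent" are invented labels), and the supervised step is a simple nearest-neighbour classifier. The data and the function names are hypothetical.

```python
def two_means(values, iterations=10):
    """Unsupervised step: split numbers into a low and a high group (k-means, k = 2).

    Assumes the values contain at least two distinct numbers.
    """
    low, high = min(values), max(values)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Each value joins whichever centre it is closer to.
            groups[0 if abs(v - low) <= abs(v - high) else 1].append(v)
        low = sum(groups[0]) / len(groups[0])
        high = sum(groups[1]) / len(groups[1])
    return groups


def train_nearest_neighbour(labelled_examples):
    """Supervised step: label new values like the closest labelled example."""
    def predict(x):
        return min(labelled_examples, key=lambda pair: abs(pair[0] - x))[1]
    return predict


# Hypothetical monthly spend per customer, with no labels to begin with.
spend = [12, 15, 14, 10, 95, 102, 98, 110]
low_group, high_group = two_means(spend)

# A person inspects the two groups and decides what to call them.
labelled = [(v, "occasional") for v in low_group]
labelled += [(v, "frequent") for v in high_group]

predict = train_nearest_neighbour(labelled)
new_customer_label = predict(90)  # label for a customer not seen before
```

The human step in the middle is the point: clustering proposes the groupings, a person gives them meaning, and only then can a supervised model apply those labels to new customers.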

The final technique to consider is “reinforcement learning,” which is perhaps the closest to true AI “self-learning.” We’ll cover it in the next blog.
