Methods and Techniques of Machine Learning
In 2023, it is hard to imagine a news feed without the phrase "machine learning." The market for this technology is expected to reach 225 billion dollars by 2030; for comparison, the forecast for 2023 was 26 billion dollars.
In other words, machine learning is here to stay for the long term. The technology will continue to be integrated into various aspects of human life: predicting, classifying, generating objects. Therefore, it is essential not to fall behind and understand what machine learning is, how ML emerged, and what methods and techniques it employs. We will cover these topics in our article.
Essence of machine learning
Machine learning is a branch of artificial intelligence that simulates human thinking processes. Unlike traditional software that follows a rigid sequence of instructions, a machine learning system improves its behavior from experience, loosely mimicking the way the human brain learns.
Machine learning makes predictions based on vast amounts of data in which algorithms discover patterns. The concept is closely related to neural networks, a family of machine learning models on which deep learning is built.
Machine learning algorithms are used to create services that:
- Recommend products, services, and content, including based on user actions. For example, online cinemas suggest movies and series based on viewing history.
- Forecast the future: likely trends, sales volumes, or patients' illnesses based on their medical history. Machine learning models determine credit ratings in banks and predict which customers may struggle to repay a loan.
- Recognize objects in images and videos, understand speech, and analyze text. This speeds up routine processes, simplifies human work, and even enhances security. For instance, recognizing vehicle license plates through surveillance cameras adds an extra layer of control.
How machine learning works
- Defining selection criteria and collecting data. Large amounts of relevant information are gathered.
- Preparing the data. The data is labeled so that ML algorithms can recognize the desired elements. Labeling is still mostly done by human specialists and only occasionally automated, which makes this stage time-consuming.
- Verifying the data and identifying patterns. Errors in the data are found and corrected, and initial patterns are sought, making the next stage more precise.
- Selecting a model and starting training. The algorithm processes the data and produces results.
- Evaluating the algorithm's performance. At this stage, errors are corrected and the algorithm may be adjusted for further use.
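The stages above can be sketched as a minimal training loop. The following is an illustrative sketch in plain Python, not a real production pipeline: the "model" is a deliberately trivial nearest-class-mean classifier, and the labeled data is invented.

```python
# Minimal sketch of the workflow: collect labeled data, prepare it,
# train a model, then evaluate it on held-out data. The "model" is a
# trivial nearest-class-mean classifier over a single feature.

def train(samples):
    """Model training: compute the mean feature value per class."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Assign x to the class whose mean is closest."""
    return min(model, key=lambda label: abs(model[label] - x))

def evaluate(model, samples):
    """Evaluation: fraction of correct predictions on held-out data."""
    return sum(predict(model, x) == y for x, y in samples) / len(samples)

# Collected and labeled data: (feature value, class)
data = [(1.0, "small"), (1.2, "small"), (0.9, "small"),
        (5.1, "large"), (4.8, "large"), (5.3, "large")]
train_set, test_set = data[:4], data[4:]

model = train(train_set)
accuracy = evaluate(model, test_set)
```

Real systems replace each of these toy functions with far more capable components, but the shape of the loop stays the same.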
History of machine learning
The first instances of machine learning date back to the mid-20th century. In 1943, Walter Pitts and Warren McCulloch proposed the first mathematical model of a neuron. Another notable milestone was the once-secret U.S. Army project, revealed in 1946, that used early computers to calculate artillery firing tables and improve shooting accuracy.
The flourishing era of machine learning began in the 1950s, marked by the appearance of a checkers-playing program written by Arthur Samuel that improved with experience.
During the same period, the Mark I Perceptron, a neural network model simulating human brain function, was introduced by Frank Rosenblatt. Even earlier, in the early 1950s, American scientist Marvin Minsky had built SNARC, one of the first neural network learning machines.
The term "machine learning" itself was coined by Arthur Samuel in 1959; the broader field of artificial intelligence had been named a few years earlier, at the 1956 conference at Dartmouth College (USA).
In the 1960s, the prototype of the first virtual assistant was launched – ELIZA, a program by Joseph Weizenbaum that simulated a conversation with a psychotherapist. In the same decade, algorithms capable of recognizing and classifying data appeared, and Bernard Widrow and Marcian Hoff developed the least-mean-squares learning rule for their ADALINE network, an important precursor of the later backpropagation algorithm that dramatically improved neural network training.
In the 1980s, scientists once again applied the technology to games, this time chess. Young researchers from Carnegie Mellon University devised the ChipTest system. Based on this development, the supercomputer Deep Blue emerged, defeating world chess champion Garry Kasparov in 1997. This showdown between man and machine showcased the power of computers in strategic decision-making.
Since then, computing power has increased and data volumes have grown, propelling machine learning (ML) into a new stage of development. In the 2000s, the concept of "deep learning" emerged. The early 2010s marked an era of new projects related to neural networks. Notably, Google entered the race, and by 2012, a Google X Lab algorithm had learned to recognize cats in images and videos. Google also introduced the Google Prediction API, a service for analytics and machine learning.
Amazon, Microsoft, Facebook*, and other giants were not left behind, developing their own platforms that applied machine learning methods. Mark Zuckerberg's company, in particular, introduced DeepFace, a technology that achieved high-precision facial recognition.
In the 2020s, the role of machine learning continues to grow. Technologies are already operational in the fields of finance, healthcare, industry, and transportation, gradually integrating into people's daily lives.
Machine learning techniques
- Supervised learning. The model is trained on already prepared (labeled) data with ready answers. The algorithm must decide whether a hypothesis is right or wrong, and a human controls the result. This is a relatively simple kind of ML, suitable for data classification: learning with a teacher who helps distinguish between objects.
- Unsupervised learning. The model is trained on information without labels. The algorithm itself must discover the patterns and features that distinguish objects. Suitable for prediction, for automated cleaning of input data, and for recognizing and understanding human language.
- Semi-supervised learning. Both labeled and unlabeled data are used. The rest of the partitioning is performed by the algorithm itself according to the specified parameters. Such training is useful for processing long documents (e.g., longreads).
- Reinforcement learning. The method is based on a point system: the model receives points depending on its actions, similar to a reward system in a game, where the player gets a bonus for a correct action.
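The reward-based technique can be sketched with a toy agent. The action names and reward values below are hypothetical; the agent simply tries actions, collects points, and learns to prefer the action with the best average score.

```python
import random

# Toy reward-based (reinforcement) learning: the agent earns points
# for actions and learns to prefer the highest-scoring one.
# Action names and reward values are made up for illustration.
REWARDS = {"left": 1.0, "right": 5.0}

def learn(episodes=100, epsilon=0.2, seed=42):
    rng = random.Random(seed)
    total = {a: REWARDS[a] for a in REWARDS}  # try each action once
    count = {a: 1 for a in REWARDS}

    def best():
        return max(REWARDS, key=lambda a: total[a] / count[a])

    for _ in range(episodes):
        # mostly exploit the best-known action, sometimes explore
        action = rng.choice(list(REWARDS)) if rng.random() < epsilon else best()
        total[action] += REWARDS[action]
        count[action] += 1
    return best()
```

In real reinforcement learning the reward depends on a changing environment state, but the explore/exploit trade-off shown here is the same.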
Machine learning methods
Naive Bayes classifier
The method is named after Bayes' theorem. Its foundation lies in determining the class of an object based on its features. Considered one of the simplest methods, it is used for categorization, object recognition in images, and identifying spam emails.
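As a toy illustration of the idea (not a production filter), here is a tiny naive Bayes spam classifier in plain Python; the training messages are invented.

```python
import math
from collections import Counter

# Toy naive Bayes spam filter: the class of a message is chosen from
# word frequencies via Bayes' theorem, assuming words are independent.
TRAIN = [("win money now", "spam"),
         ("free money offer", "spam"),
         ("meeting at noon", "ham"),
         ("lunch at noon tomorrow", "ham")]

def fit(data):
    words = {c: Counter() for _, c in data}   # word counts per class
    docs = Counter(c for _, c in data)        # messages per class
    for text, c in data:
        words[c].update(text.split())
    return words, docs, len(data)

def predict(model, text):
    words, docs, n = model
    vocab = {w for counter in words.values() for w in counter}
    scores = {}
    for c in docs:
        # log P(class) + sum of log P(word | class), Laplace-smoothed
        score = math.log(docs[c] / n)
        total = sum(words[c].values())
        for w in text.split():
            score += math.log((words[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

model = fit(TRAIN)
```

Despite the "naive" independence assumption, this scheme works surprisingly well for text categorization.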
Decision trees
In this method, the model identifies relationships between events and objects and the consequences of their interactions. Visually, the scheme can be depicted as a "branching tree" representing different scenarios depending on the choices made. It is similar to algorithmic tests with "yes" or "no" answers leading to a result. The method's advantage lies in its transparent, systematic approach, but it is not applicable everywhere.
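Such a branching sequence of yes/no questions can be written directly as nested conditions. The loan-decision rules below are purely hypothetical:

```python
# A hypothetical decision tree for a loan decision: each level of
# nesting is one yes/no question, each return is a leaf with the outcome.
def approve_loan(income, has_debt):
    if income >= 50_000:        # question 1: is income high enough?
        if has_debt:            # question 2: any existing debt?
            return "review"
        return "approve"
    return "decline"
```

In practice, the questions and thresholds are not written by hand but learned from labeled data.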
Logistic regression
A statistical method that has found application in machine learning. The logistic function maps a combination of one or more variables into the range 0-1, and objects are assigned to one of two classes based on the resulting value.
Determining creditworthiness, predicting the success of sales or advertising campaigns, and other probabilistic outcomes are estimated through the logistic regression method.
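A minimal sketch of logistic regression scoring, assuming the weights have already been learned elsewhere (the feature, weight, and bias values used here are invented):

```python
import math

# Logistic regression scoring with assumed, already-trained weights.
def sigmoid(z):
    """Squash any real number into the 0-1 range."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one object."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(z)

def classify(features, weights, bias, threshold=0.5):
    """Split objects into two classes by the 0-1 score."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0
```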
Support vector machines
Support vector machines classify objects by finding a hyperplane that separates the classes; the "support vectors" are the data points lying closest to this boundary. The system's task is to position the separating hyperplane so that the margin between the classes is as wide as possible.
This method solves complex problems and can determine factors like whether there is a man or woman in a photo, what ads to show a user on a website, and so on.
Linear regression
The essence of this method is to relate one variable to one or more others. Conceptually, there are data points on a plane, and one needs to find the line that fits them best.
This algorithm easily predicts trends and fills in missing values in simple linear sequences. For example, what comes next in the sequence 10, 20, 30, 40, 50…
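The sequence example above can be completed with an ordinary least-squares line fit, sketched here in plain Python:

```python
# Ordinary least-squares fit of a line y = slope * x + intercept,
# applied to the sequence from the example above.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y) \
        / (sum(x * x for x in xs) - n * mean_x ** 2)
    return slope, mean_y - slope * mean_x

xs, ys = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
slope, intercept = fit_line(xs, ys)
next_term = slope * 6 + intercept  # predicted 6th term of the sequence
```

For this perfectly linear data the fit is exact, so the predicted next term is 60.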
Ensemble methods
An ensemble combines several models: various classifiers are generated, and their outputs are combined by averaging or voting. This approach reduces the probability of errors. Ensembles include:
- Bagging: base models are trained in parallel on different random subsets of the data, and their predictions are combined.
- Boosting: a strong model is built from weak ones. Each new classifier is trained on the errors of the previous ones in order to correct them.
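Voting can be sketched with a few deliberately simple, hypothetical classifiers whose answers are combined by majority:

```python
from collections import Counter

# Three deliberately simple, hypothetical classifiers; the ensemble
# answer is the majority vote, which dampens any single model's error.
def classifier_a(x):
    return "positive" if x > 0 else "negative"

def classifier_b(x):          # stricter threshold, sometimes wrong
    return "positive" if x > 2 else "negative"

def classifier_c(x):          # more lenient threshold
    return "positive" if x > -1 else "negative"

def vote(x, classifiers):
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

ensemble = [classifier_a, classifier_b, classifier_c]
```

For x = 1, classifier_b alone answers incorrectly, but the other two outvote it.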
K-means clustering
One of the clustering methods in machine learning. The algorithm starts with a chosen number of clusters, K, each with its own center point. The distance from each center to every data point is calculated, and each point joins the nearest cluster. Then the center of each cluster is recalculated, and the data points are partitioned into clusters again. This process continues until no further changes occur.
Clustering is useful in biological research, IT, and sociology.
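The K-means loop described above can be sketched in one dimension; the points and starting centers below are made up:

```python
# Minimal one-dimensional K-means: assign points to the nearest center,
# recompute each center as the mean of its cluster, repeat until stable.
def kmeans(points, centers, max_iter=100):
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # update step: move each center to the mean of its cluster
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:   # no change: converged
            break
        centers = new_centers
    return centers, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers, clusters = kmeans(points, centers=[0.0, 10.0])
```

With these points, the centers converge to roughly 1.0 and 8.0 after a couple of iterations.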
Generative adversarial learning
This method is based on two neural networks that, in a way, compete with each other (hence the name). The algorithm involves a generator network and a discriminator network: the generator creates objects, while the discriminator tries to distinguish the generated objects from real ones. As training progresses, the generator becomes better and better at fooling the discriminator. This algorithm is capable of generating images of non-existent people that resemble real ones.
Machine Learning is a form of artificial intelligence that simulates the activity of the human brain. The initial machine learning algorithms emerged in the 1940s and 1950s, but significant growth was achieved in the 21st century, and they are now utilized in various aspects of human life.
ML methods are classified based on human involvement. There is supervised, unsupervised, semi-supervised, and reinforcement machine learning.
Additionally, methods are categorized by how the algorithm operates. These include the naive Bayes classifier, decision trees, logistic regression, linear regression, ensemble methods (bagging, boosting), K-means clustering, and generative adversarial learning, among others.
*Product of the company Meta, prohibited on the territory of the Russian Federation.