It has recently become common to find the term **algorithm** in expressions such as “the power of the algorithms”.

The expression usually refers to the power of data analysis processes and methods that can automatically predict or infer features, e.g. people's preferences. A typical example is predicting the interests of people surfing the web in order to show them targeted advertisements.

I personally do not like this kind of metonymical usage of the term; here I briefly explain why.

# Definitions

There are a few key related terms:

**algorithm**

is a description of a computation as an ordered, finite sequence of elementary steps (operations or instructions) that produces a result in finite time.

**(mathematical) model**

is a description of a system, typically using mathematical concepts and language.

**analytics**

is the process and method for discovery, interpretation, and communication of meaningful patterns in data.

# How it works

When people (mostly journalists) talk about algorithms, what they actually mean is the analytics process and its methods, that is:

- You start with some data set
- You define/reuse an algorithm to build a (mathematical) model
- You get a model that describes your data; e.g. showing a correlation

## 1 – Data sets

Average per capita cheese consumption (in lb) in the years 2000 through 2009:

| 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 |
|------|------|------|------|------|------|------|------|------|------|
| 29.8 | 30.1 | 30.5 | 30.6 | 31.3 | 31.7 | 32.6 | 33.1 | 32.7 | 32.8 |

Number of people who died by becoming tangled in their bedsheets, in the same years:

| 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 |
|------|------|------|------|------|------|------|------|------|------|
| 327  | 456  | 509  | 497  | 596  | 573  | 661  | 741  | 809  | 717  |

The above weird pair of data series is taken from http://www.tylervigen.com/spurious-correlations.

## 2 – Algorithm

Suppose we want a linear model that links the two series. Such a model can be obtained by linear regression, using the least squares method.

The method can be applied using different algorithms, e.g. in R:

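    leastSquare <- function(x, y) {
      # Design matrix: a column of ones (intercept) and a column with x
      X <- matrix(c(rep(1, length(x)), x), ncol = 2)
      # Solve the normal equations: b = (X'X)^-1 X'y
      b <- solve(t(X) %*% X) %*% t(X) %*% y
      return(as.numeric(b))
    }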

Once the algorithm is executed on the two series (cheese consumption as `x`, deaths as `y`), it returns two values: -2977.3485 and 113.1329.
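To illustrate that the same method can be carried out by different algorithms, here is a minimal sketch that fits the line twice: once by solving the normal equations directly, and once with R's built-in `lm()`, which relies on a QR decomposition. The variable names (`cheese`, `deaths`, `b_normal`, `b_lm`) are my own choice for this illustration; the data are copied from the two series above.

```r
# Data copied from the two series above
cheese <- c(29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8)
deaths <- c(327, 456, 509, 497, 596, 573, 661, 741, 809, 717)

# Algorithm A: solve the normal equations (X'X) b = X'y directly
X <- cbind(1, cheese)
b_normal <- as.numeric(solve(t(X) %*% X, t(X) %*% deaths))

# Algorithm B: R's built-in lm(), which uses a QR decomposition
b_lm <- unname(coef(lm(deaths ~ cheese)))

# Intercept first, slope second, in both cases
print(round(b_normal, 4))
print(round(b_lm, 4))
```

Both vectors should agree, up to rounding, with the two values quoted above: different algorithms, same method, same model.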

## 3 – Model

The two numbers are the coefficients (intercept and slope) of the fitted regression line, and allow us to write the equation:

*DbBSE* = 113 × *ApCCC* − 2977

Where:

*DbBSE* : Death by BedSheet Entanglement

*ApCCC* : Average per Capita Cheese Consumption

The equation describes a model of how the two measures are linked to each other. Apparently, an increase of one pound (lb) in the average per capita cheese consumption is associated with about 113 more deaths by bedsheet entanglement.
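To make the reading of the slope concrete, here is a minimal sketch that evaluates the rounded equation. The function name `predictDeaths` and the sample consumption values (31 and 32 lb) are hypothetical, chosen only for illustration.

```r
# Rounded model from the equation above: DbBSE = 113 * ApCCC - 2977
predictDeaths <- function(cheese_lb) {
  113 * cheese_lb - 2977
}

at_31 <- predictDeaths(31)  # deaths the model associates with 31 lb
at_32 <- predictDeaths(32)  # deaths the model associates with 32 lb

print(at_31)
print(at_32)
print(at_32 - at_31)        # difference for one extra lb equals the slope
```

Whatever pair of values one pound apart is chosen, the difference is always the slope of the line.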

Of course, the link is almost certainly due to pure chance: in general, correlation does not imply causation.
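How strong the (spurious) correlation is can be checked with a short sketch using R's built-in `cor()`, which computes the Pearson correlation coefficient by default. The variable names are again my own; the data are copied from the two series above.

```r
# Data copied from the two series above
cheese <- c(29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8)
deaths <- c(327, 456, 509, 497, 596, 573, 661, 741, 809, 717)

# Pearson correlation coefficient between the two series
r <- cor(cheese, deaths)

print(round(r, 3))    # strength of the linear association
print(round(r^2, 3))  # share of variance "explained" by the linear model
```

The coefficient comes out at roughly 0.95, a very strong linear association between two quantities that have no plausible causal connection.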

# The role of algorithms

The whole process is what is generally called analytics; the model describes how different measures are linked to each other; and the algorithms are a means of identifying the parameters of the model, starting e.g. from historical data.

In fact, the “algorithms” are a small part of such a complex process, so the expression “the power of the algorithms” is a synecdoche.

In general, I believe that such an expression is meaningful and correct as long as it is clear that it is a synecdoche; otherwise it is a misleading simplification.