Referendum Polls

When electoral polls are presented, particularly in graphical form, there is a simplification that affects graphical integrity and potentially conveys an incorrect message to citizens.

Mainly:

  • the estimated percentages for YES/NO are not properly related to those of undecided voters and abstainers
  • the uncertainty due to sampling is not represented
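
As a rough illustration of both points, the sketch below uses made-up poll figures (the sample size and the four shares are assumptions, not data from any real survey). It computes an approximate 95% margin of error for each share using the normal approximation, and shows how the YES/NO split changes when it is computed on decided voters only.

```r
# Hypothetical poll figures, invented for illustration only
n <- 1000                      # number of respondents (assumption)
share <- c(yes = 0.38, no = 0.34, undecided = 0.18, abstain = 0.10)

# Approximate 95% margin of error for each share (normal approximation)
moe <- 1.96 * sqrt(share * (1 - share) / n)

# YES/NO split restricted to decided voters: undecided and abstainers are dropped
decided <- share[c("yes", "no")] / sum(share[c("yes", "no")])

# Shares with their margins of error, in percentage points
round(100 * cbind(share, moe), 1)

# The same YES/NO estimates once the other groups are hidden
round(100 * decided, 1)
```

With these invented numbers the margin of error on YES is about 3 percentage points, close to the 4-point gap between YES and NO, which is the kind of information a chart without error bars leaves out.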

Continue reading

Analytics, models, and algorithms

Recently it has become common to find the term algorithm in expressions such as “the power of the algorithms”.

That expression is used to mean the power of data analysis processes and methods that can automatically predict or infer features, e.g. people's preferences. A typical example is predicting the interests of people surfing the web in order to show them targeted advertisements.

I personally do not like this kind of metonymical usage of the term; here I briefly explain why.

Definitions

There are a few key related terms:

algorithm

is a description of a computation that consists of an ordered and finite sequence of elementary steps (operations or instructions) that produces a result in a finite time.

(mathematical) model

is a description of a system, typically using mathematical concepts and language.

analytics

is the process and method for discovery, interpretation, and communication of meaningful patterns in data.

How it works

When people (mostly journalists) talk about algorithms, what they actually mean is the analytics process and its methods, that is:

  1. You start with some data set
  2. You define/reuse an algorithm to build a (mathematical) model
  3. You get a model that describes your data; e.g. showing a correlation

1 – Data sets

Average per capita cheese consumption (in lb) in the years 2000 through 2009:

2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
29.8 30.1 30.5 30.6 31.3 31.7 32.6 33.1 32.7 32.8

Number of people who died by becoming tangled in their bedsheets, in the same years:

2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
327 456 509 497 596 573 661 741 809 717

This odd pair of data series is taken from http://www.tylervigen.com/spurious-correlations.

2 – Algorithm

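Suppose we want a linear model that links the two series: a linear regression can be fitted with the least squares method.

The method can be implemented with different algorithms; for example, by solving the normal equations in R:

# Ordinary least squares via the normal equations: b = (X'X)^-1 X'y
leastSquare <- function(x, y) {
    # Design matrix: a column of ones (intercept) and the predictor
    X <- cbind(1, x)
    # Solve (X'X) b = X'y for the coefficients
    b <- solve(t(X) %*% X, t(X) %*% y)
    # Return c(intercept, slope) as a plain numeric vector
    as.numeric(b)
}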

Executed with cheese consumption as x and bedsheet deaths as y, the algorithm returns two values: -2977.3485 (the intercept) and 113.1329 (the slope).
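
As a cross-check, the same fit can be obtained with R's built-in lm() function, which uses a different algorithm (a QR decomposition) for the same least squares method. The variable names cheese and deaths are chosen here for illustration; the values are the two series listed above.

```r
# The two series from the data sets above
cheese <- c(29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8)
deaths <- c(327, 456, 509, 497, 596, 573, 661, 741, 809, 717)

# Fit deaths as a linear function of cheese consumption
fit <- lm(deaths ~ cheese)

# Intercept first, then slope
coef(fit)
```

The two coefficients should agree with the values reported above, up to rounding: same method, same model, different algorithm.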

3 – Model

The two numbers are the coefficients of the fitted regression line (intercept and slope), and allow us to write the equation:

DbBSE = 113 * ApCCC - 2977

Where:

DbBSE : Death by BedSheet Entanglement

ApCCC : Average per Capita Cheese Consumption

The equation describes a model of how the two measures are linked to each other. Apparently, an increase of one lb in the average per capita cheese consumption is associated with about 113 more deaths by bedsheet entanglement.

Of course, the link is almost certainly due to pure chance: in general, correlation does not imply causation.
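
To see how strong the (spurious) correlation is, a minimal sketch can compute the Pearson correlation coefficient of the two series with R's cor() function. The variable names are again illustrative assumptions.

```r
# The two series from the data sets above
cheese <- c(29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8)
deaths <- c(327, 456, 509, 497, 596, 573, 661, 741, 809, 717)

# Pearson correlation coefficient between the two series
r <- cor(cheese, deaths)
r

# Share of the variance in deaths "explained" by the linear model
r^2
```

The coefficient comes out at roughly 0.95, a very strong correlation by any usual standard, and yet nobody would seriously claim that eating cheese causes deaths by bedsheet entanglement.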

The role of algorithms

The whole process is what is generally called analytics; the model is the description of how different measures are linked to each other; and the algorithms are a means of identifying the parameters of the model, starting, e.g., from historical data.

In fact, the “algorithms” are a small part of such a complex process, so the expression “the power of the algorithms” is a synecdoche.

In general, I believe that such an expression is meaningful and correct if it is clear that it is a synecdoche; otherwise it is a misleading simplification.

A few pointers on using metrics for decision making

Metrics programs are becoming very common in several areas, and more and more decision making is based on them. The idea of basing decisions on numbers or evidence is appealing, but it hides several risks.

Here I have collected a few quotes and pointers for further reading.

Tell me how you measure me and I will tell you how I behave.

Eliyahu Goldratt

Goodhart’s law:

When a measure becomes a target, it ceases to be a good measure.

Campbell’s law:

The more any quantitative social indicator (or even some qualitative indicator) is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. 

Also linked is the concept of Perverse incentive.

An interesting read in the context of software engineering is:

An Appropriate Use of Metrics: http://martinfowler.com/articles/useOfMetrics.html

Ninety-nine percent

We are the 99% is the slogan that started with the Occupy Wall Street movement, was then taken up by the indignados protests that spread around the world, and was also adopted by the Italian protest in the demonstrations of 15 October 2011 (for example, by the students).

The slogan originates from the fact that 1% of the population holds a large share of the wealth, that the same 1% (or perhaps less) is the only group to have consistently reaped large gains over the last 30-40 years, and that this 1% steers government policies in its own favor and to the detriment of the remaining 99%.

I think it is easy to agree with these symptoms, but if we move on to the diagnosis…

Continue reading