Using Reflection to (Try to) Overcome Type Erasure Limitations in Java

Introduction

This week, at the end of a lecture on Java generics I was describing type erasure and its consequences, mainly the fact that the raw type has no information about which type is substituted in the current instance. Consider, for example the following generic class:

class Instantiator<T> {
  public static T newInstance(){
    return new T();
  }
}

Since after type erasure, any information about T is lost, instantiation of type T is not possible and a Java compiler would issue a compilation error such as Cannot instantiate type T().

For this reason the interface Collection<E>, in the Java Collections framework, defines two methods to convert an the collection into an array:

public Object[] toArray();
public <T> T[] toArray(T[] ary);

The former returns and array of Object,the universal reference type in Java, that does not depend on the actual type T, the latter does not creates and array but accepts it as an argument and merely fills it.

So the question that emerged from the students was: is there a way to find out the actual type parameter used to instantiate a collection class and use it to instantiate an array of the appropriate type?

My first quick answer was: in theory it is possible by means of reflection, but it would be highly inefficient and this is the reason why it was not adopted in the Java collections framework.

The goal of this short article is to elaborate more on that answer and provide further details and evidence.

Quick overview of reflection

Here I would like to recap the main aspects of Java reflection, mainly for the purpose of instantiating an array.

In Java, given any (reference to an) object,at run-time, is is possible to get information about its class using the method Object.getClass(), which returns an object of type Class that provides, among others, the methods:

String getCanonicalName()
Class[] getInterfaces()
Class getSuperclass()

That return, respectively, the fully qualified name of the class, the list of implemented interfaces, and the super class.

The class object can also be retrieved using the predefined static class attribute, e.g.

Class stringClassDescriptor = String.class;

Given a Class object is is possible to instantiate an array of that type using the method newInstance() of class Array, that accepts the type of the elements and the size as arguments.

Proposed solution

Using the elements presented above it is possible to write a simple method that: infers the actual type parameter from the first element in a collection, instantiates the suitable array, and fill it in:

public static <T> T[] toArray(Collection<T> c) {
 T element = c.iterator().next();
 Class base = element.getClass();
 int lenght = c.size();
 T[] res = (T[]) Array.newInstance(base, lenght);
 c.toArray(res);
 return res;
}

While the above code perfectly illustrates the principle, it is not correct or suitable due to two main issues:

  1. it does not handle possible errors (e.g. zero-lenght array)
  2.  it assumes that inference is possible form the first element of the collection only

While problem 1) can be solved just by adding a few lines of code, problem 2) cannot be solved completely: due to polymorphism it is possible that the array host references to instances of several different classes, and it is clearly possible that the first element’s class does not correspond to the actual type parameter.

For instance, let us consider the following example:

LinkedList<Number> list = new LinkedList<Number>();
list.add(new Integer(1));
list.add(new Long(2));
list.add(new Float(3.0));

Inferring the actual type parameter (Number) from the first element would lead us to an error: inferring the type from the first element would lead us to instantiate and array of Integer, though when the toArray() method of LinkedList is invoked to fill it in an ArrayStoreException would be raised as soon as it tries to store a reference to a Long object in a Integer array.

Therefore the inference should be performed on the basis of the types of all the elements of the collection. We need to start the inference with the class of the first element and the update it with each other element by finding a common ancestor, for instance:

Class base = null;
for (T element : c) {
  Class cls = element.getClass();
  if (base == null) {
    base = cls;
  } else if (base != cls) {
    base = findCommonAncestor(cls, base);
  }
}

While this solution provide a viable solution to correctly generate an array representation of the collection, it still has  a serious limitations that is not solvable using reflection

  • it is possible that more that one common ancestor is present, therefore several possibilities for the array type are valid. This is possible in two distinct cases:
  • a search algorithm typically finds the most specific (less abstract) common class in the inheritance tree, though a valid common ancestor is also the super class of that one, and its super class etc.
  • due to multiple inheritance (from interfaces) is is possible that several equally specific types are common ancestors of two distinct classes.

Let us consider that case of a list containing both an Integer and a Double objets, if the look at the (partial) inheritance tree shown in the following figure, we can easily observe that three possible types are valid inferences for the actual type parameter: Number, Comparable, and Object.

  • Joint inheritance tree

As a consequence, while it is always possible to identify a valid inference for the actual type parameter, there is no guarantee that it is the real one. This approximation may potentially lead to errors in other portions of the program.

Performance assessment

In addition the the above considerations, there is also the problem of the performance penalty induced by the use of reflection.

For this purpose we compared three different solutions using:

  1. toArray(T[]) with the correct array type instantiated by the caller
  2. a method that infers the array type form the first element of the collection
  3. a method that infers the array type from the type off all elements in the collection

Since the latter’s performance is clearly dependent on the size of the array we analyzed several different array sizes.

In addition, in orded to avoid issues deriving from measurement error deriving from the noise introduced by other programs running on the computer, we repeated each measure 100 times.

For each solution we fitted a linear regression model fixing the intercept to zero.

The data, and the regression lines, together with the slope are presented in the following figure.

Regressions

With respect to the standard solution provided in the Collections framework (red line with slope 4.437), we observe a 12% increase in the slope for the solution using the first element only (green line with slope 4.948) and a 67% increase for the solution using all the elements (blue line with slope 7.418).

Update: I also fitted a linear model of the form

Time ~ N + N:toAAll + 0

the models has a R^2 of 99.47% and the two estimated coefficients are 0.420886 (for N) and 0.082698 (for N:toAAll), the ratio of the two coefficients, 19.6% represents the performance penalty introduced by using the solution that estimates the base type using all the elements.

Conclusions

Type erasure implies that any information about the actual type parameter for a collection is lost at run-time.

Using reflection is possible to infer the type with some approximation, therefore a general exact solution is not possible.

Even the approximate solution requires considering the type of all elements in the collections, which leads to a performance penalty of 19%.

Update: the full source code, R scripts, and data used can be found here: http://softeng.polito.it/courses/02JEY/ReflectionGenerics.zip

Annunci

Rispondi

Inserisci i tuoi dati qui sotto o clicca su un'icona per effettuare l'accesso:

Logo WordPress.com

Stai commentando usando il tuo account WordPress.com. Chiudi sessione / Modifica )

Foto Twitter

Stai commentando usando il tuo account Twitter. Chiudi sessione / Modifica )

Foto di Facebook

Stai commentando usando il tuo account Facebook. Chiudi sessione / Modifica )

Google+ photo

Stai commentando usando il tuo account Google+. Chiudi sessione / Modifica )

Connessione a %s...