Monday, October 11, 2010

Writing more elegant comparison logic with Guava's Ordering

One of the essential interfaces to define object ordering in Java is Comparator<T>. The contract is rather straightforward:
int compare(T o1, T o2): Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.
This simplicity can be deceiving: implementing a correct and readable comparator logic can be quite tricky. There are many common pitfalls that one should be aware of, and failure to do so may lead to subtle bugs. Often we also see complicated (and therefore error-prone) comparators that implement various additional rules (e.g. null handling and/or tie breaking) that obfuscates the big picture logic. Moreover, even the simplest and most straightforward comparator often duplicates code on the two objects being compared.

One of the better solutions to solve all of these problems is Ordering<T> from Guava (a strictly compatible superset of the old Google Collections library). This post shows how it enables elegant implementation of various comparison patterns. An appendix also explains why the commonly seen comparison by subtraction and reverse order comparison by negation "tricks" are broken.

This simple usage shows how we can sort a List<Integer>, allowing for null values.
List<Integer> nums = Arrays.asList(3, 5, 4, null, 1, 2);

Collections.sort(nums,
   Ordering.natural().nullsLast()
);
System.out.println(nums);
// [1, 2, 3, 4, 5, null]

Collections.sort(nums,
   Ordering.natural().nullsFirst()
);
System.out.println(nums);
// [null, 1, 2, 3, 4, 5]

Collections.sort(nums,
   Ordering.natural().reverse().nullsLast()
);
System.out.println(nums);
// [5, 4, 3, 2, 1, null]
Ordering<T> implements Comparator<T>, so it's compatible with existing API. Ordering.natural() is as defined by Comparable<T>. We can then simply reverse an ordering and/or add a nullsFirst/nullsLast rule.

In the next example, we show how we can sort strings by length, and how we can add a tie-breaking rule.
Ordering<String> byLength = new Ordering<String>() {
   @Override
   public int compare(String s1, String s2) {
      return Ints.compare(s1.length(), s2.length());
   }
};

List<String> strings = Arrays.asList("xxx", "Z", null, "22", "A", "33", "11");

Collections.sort(strings,
   byLength.nullsFirst()
);
System.out.println(strings);
// [null, Z, A, 22, 33, 11, xxx]

Collections.sort(strings,
   byLength.compound(Ordering.natural()).nullsFirst()
);
System.out.println(strings);
// [null, A, Z, 11, 22, 33, xxx]

Collections.sort(strings,
   byLength.compound(Ordering.natural().reverse()).nullsLast()
);
System.out.println(strings);
// [Z, A, 33, 22, 11, xxx, null]

Collections.sort(strings,
   byLength.reverse().compound(Ordering.natural()).nullsLast()
);
System.out.println(strings);
// [xxx, 11, 22, 33, A, Z, null]
Note that Ints.compare(int, int) is used for comparison. Note also that in byLength, we duplicate .length() on s1 and s2.

We typically use the same mechanism to get the comparison "key" to compare two objects, so we can encapsulate this logic into a Function<F,T>. This eliminates duplication, enables testing, and also makes it reusable for other purposes.

This next example shows how we can define a function to map objects to their simple class names, and how we can use it (among other things) to perform the comparison onResultOf applying it to the objects being compared.
Function<Object, String> toSimpleClassName = new Function<Object, String>() {
   @Override
   public String apply(Object obj) {
      return obj.getClass().getSimpleName();
   }
};
 
System.out.println(
   Lists.transform(
      ImmutableList.of("foo", 42, 'x'),
      toSimpleClassName
   )
); // [String, Integer, Character]
 
Ordering<Object> bySimpleClassName =
   Ordering.natural()
      .onResultOf(toSimpleClassName);
 
System.out.println(
   bySimpleClassName
      .sortedCopy(
         ImmutableList.of("foo", 42, 'x', "bar", 666)
      )
); // [x, 42, 666, foo, bar]

System.out.println(
   bySimpleClassName.reverse().compound(Ordering.natural())
      .sortedCopy(
         ImmutableList.of(3, 'y', true, "foo", 'z', 1, 'x', "bar", 2, false)
      )
); // [bar, foo, 1, 2, 3, x, y, z, false, true]
   
System.out.println(
   Multimaps.index(
      ImmutableList.of(3, 'y', true, "foo", 'z', 1, 'x', "bar", 2, false),
      toSimpleClassName
   )
); // {Integer=[3, 1, 2], Character=[y, z, x], Boolean=[true, false], String=[foo, bar]}
Note that in addition to its usage as a keying mechanism, the above example also reuses toSimpleClassName for Multimaps.index and Lists.transform

In summary, using Ordering<T> allows you to write clean and elegant comparison logic.
  • The basic comparison logic doesn't have to worry about nulls and/or how to break ties.
  • You can compose a sophisticated ordering from the basic building blocks in a natural, readable, elegant way.
  • You can use a Function to encapsulate the keying logic, eliminating duplication, facilitating test and reuse.

Appendix

For those who prefer to write their own comparators from scratch, here are some of the more common pitfalls:

Broken trick #1: Comparison by subtraction

The essence of this "trick" is the use of subtraction to determine if one number is larger than the other.
// BROKEN! DO NOT DO THIS!
return obj1.getNumber() - obj2.getNumber();
This "trick" works for some numbers (e.g. we could've used s1.length() - s2.length() in byLength), but it does NOT work in general due to arithmetic overflow (e.g. Integer.MIN_VALUE - Integer.MAX_VALUE is positive).

Broken trick #2: Reverse order comparison by negation

The essence of this "trick" is the use of negation to attempt to reverse the sign of the result of previous comparison. It's common to see this implemented by multiplication with -1.
// BROKEN! DO NOT DO THIS!
return -1 * obj1.compareTo(obj2);
The problem here is that negation for sign reversal works for all but one value. In two's complement representation, the most negative number has no positive counterpart. As a Java int, the Integer.MIN_VALUE, its negation -Integer.MIN_VALUE, and its absolute value Math.abs(Integer.MIN_VALUE), is the same (negative) number.

Contractually speaking, both Comparator.compare and Comparable.compareTo permit Integer.MIN_VALUE as a valid return value.

References


findbugs (RV_ABSOLUTE_VALUE_OF_HASHCODE): Bad attempt to compute absolute value of signed 32-bit hashcode
Java Puzzlers (the ascending/descending sort, "polygenelubricants")

1 comment:

  1. First I got a compilation error and couldn't pinpoint what the problem was. After several compare,
    I found an inaccuracy, corrected it and eventually all would be ok. But I needed explanations so, in the same way, I found a tutorial that explains the difference between comparable and comparator https://explainjava.com/comparable-comparator-java/ interfaces. These tricks are applicable to each method, so I was able to determine what suits me for which script. If you have questions about the difference between compilers, you can also ask on platforms for learning java like this.

    ReplyDelete