Saturday, February 27, 2010

Final field in Java: is it really?

import java.lang.reflect.Field;

public class ItsFinallyTrue {
   
   public final boolean dream1 = false;
   public final Boolean dream2 = false;
   
   @Override public String toString() {
      return dream1 + "-" + dream2;
   }

   public static void main(String[] args)
         throws IllegalAccessException, SecurityException {

      ItsFinallyTrue dreamer = new ItsFinallyTrue();
      System.out.println(dreamer);
      
      for (Field dream : dreamer.getClass().getDeclaredFields()) {
         dream.setAccessible(true);
         dream.set(dreamer, true);
      }      
      System.out.println(dreamer);
   }
}
What will happen if you run this code?

(A) It throws IllegalAccessException
(B) It throws SecurityException
(C) It prints false-false and false-false
(D) It prints false-false and false-true
(E) It prints false-false and true-true

Thursday, February 25, 2010

Concerning deep*$#!%

How do Arrays.deepToString, deepHashCode, deepEquals handle self-containing arrays? Inconsistently, apparently...
import java.util.Arrays;

public class LetsGetToTheBottomOfThis {
   
   public static Object[] selfContainingArray() {
      Object[] arr = new Object[1];
      arr[0] = arr;
      return arr;
   }
   
   public static void main(String[] args) {
      try {
         System.out.println(
            Arrays.deepToString(
               selfContainingArray()
            )
         );
      } catch (StackOverflowError e) {
         System.out.println(e);
      } // prints "[[...]]"

      try {
         System.out.println(
            Arrays.deepHashCode(
               selfContainingArray()
            )
         );
      } catch (StackOverflowError e) {
         System.out.println(e);
      } // stack overflows

      try {
         System.out.println(
            Arrays.deepEquals(
               selfContainingArray(),
               selfContainingArray()
            )
         );
      } catch (StackOverflowError e) {
         System.out.println(e);
      } // stack overflows
   }
   
}
This behavior is documented clearly, so I suppose it's not exactly a bug (which is why it's not in the database, or else I would've submitted it). It is curious, though, why deepToString is made cycle-aware and the rest aren't. I admit that deepEquals will be made quite complicated, but deepHashCode is essentially the same as deepToString.
Also, this suggests to me the need for some sort of visitor pattern as well.
Update 03/01: OK, it's a bit worse than I thought: this inconsistency is also there with List (where presumably you favor safety to efficiency).
Note: While it is permissible for lists to contain themselves as elements, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a list.

Tuesday, February 23, 2010

Hoare: "Null References: The Billion Dollar Mistake""

C. A. R. Hoare (most famous for his quicksort) is now convinced that the concept of a null reference was a mistake.
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.
I haven't done much type system analysis to be entirely convinced. I do use sentinels and/or Null Object whenever I can, but that is rather unrelated to the fundamental question of whether null itself is good or bad.
Watching the video right now. I can say already that I admire him for bravely admitting his mistakes (though I've yet to see this as one).
Eh, I'm still not convinced. Using the same argument shown in the final segment, while it's in many ways unsafe, I still think it's useful enough to have.

Wednesday, February 17, 2010

"Microcomputers: the unredeemed lollipop..."

So let us formalize the problem of finding a pair of strings (not necessarily distinct) from a dictionary of size N, such that the Java string hash code of their concatenation is Integer.MIN_VALUE or 0.

How quickly can this be done?

A quote from Joshua Bloch himself on how he found "polygenelubricants" (at 1:07:00 in the video):
I have a dictionary of 200,000 words, which means there are 40 billion word pairs, [...] and it took it 10 minutes to calculate all the hash values of every word pair
The quote suggests at least an O(N^2) solution, assuming that the hash code calculation is O(1) for each word pair (which can in fact be done).

I think you can do MUCH better than that, however. I'm still working on this as we speak, but here's a Maple snippet to shed some light:
> # "lubricants".hashCode() = -573982625
> # "lubricants".length() = 10
> msolve(x * 31^(10) + (-573982625) = 0, 2^31);
                                {x = 561786081}
> # 561786081 = "polygene".hashCode()
Thus, the problem is solvable in O(N) times the time it takes to do msolve.

My goal is to demonstrate the superiority of this technique by coming up with word triplets. If O(N^2) takes 10 minutes, then O(N^3) would take almost 4 years.

Tuesday, February 16, 2010

Why I could've bought stubbiespornographer.org and didn't

I bought polygenelubricants.com last night. Imagine that, my very first domain name.

Just in case you're wondering, the string "polygenelubricants" traces its origin to Java Puzzlers by Joshua Bloch and Neal Gafter; it is a string whose hash code in Java is Integer.MIN_VALUE. It was used to demonstrate a corner case involving the most negative number in a two-complement representation.

Going for the classic may not have been terribly original, but I think that's also precisely why it's the right choice. Lest I be accused of merely stealing other people's work, however, I will list more strings that fall into the same bucket. The first 4 of these are already known; the rest are my own discoveries.
polygenelubricants
DESIGNING WORKHOUSES
GydZG_
aluminiumsb├ątentabu
stubbiespornographer.org
squirreledconcoctive.net
insecticidaltightness.com
www.livableunsoundness.net
www.cosmetologiesbattleship.org
www.nonsectarianhomology.org
www.sadistpacemakers.com
WWW.MOLLYCODDLERPOTABLES.NET
Honestly, I think some of those domain names are VERY tempting.
While we're at it, I also found some examples of the equally (if not more) important strings that hash to 0.

Monday, February 15, 2010

java.lang.String.hashCode() doesn't cache 0

public class Main {
   static void test(String s) {
      long start = System.currentTimeMillis();
      for (int N = 1 << 20; N --> 0;) {
         s.hashCode();
      }
      System.out.format("Took %d ms.%n", System.currentTimeMillis() - start);
   }
   public static void main(String[] args) {
      String z = new String(new char[1000]);
      test(z);
      test(z + "!");
   }
}
Run it!
Quiz: what percentage of all Java strings hash to 0?

Saturday, February 13, 2010

Factory method pattern for Java exceptions

I'm not sure if this is a new idea. I did a brief survey and I didn't find anything like it out there. I'm going to get right to the point, and then incrementally explain the rationale of my approach.
void before(int index, int lo, int hiX) {
   try {
      throw new IndexOutOfBoundsException(
         String.format("%d out of [%d,%d)", index, lo, hiX)
      );
   } catch (IndexOutOfBoundsException e) {
      throw (NoSuchElementException)
         new NoSuchElementException().initCause(e);
   }
}

void after(int index, int lo, int hiX) {
   try {
      throw Exceptions.indexOutOfBounds(index, lo, hiX);
   } catch (IndexOutOfBoundsException e) {
      Exceptions.translate(e, Exceptions.noSuchElement());
   }
}

Thursday, February 11, 2010

Beware when reflecting on Java arrays

From JLS 10.7 (Array Members):
The members of an array type are all of the following:
  • The public final field length, which contains the number of components of the array (length may be positive or zero).
  • The public method clone, which overrides the method of the same name in class Object and throws no checked exceptions. The return type of the clone method of an array type T[] is T[].
  • All the members inherited from class Object; the only method of Object that is not inherited is its clone method.
Unfortunately, life isn't all peaches and cream...

RFE: verbatim string literals in Java

This is very timely following my previous entry:
Bug ID: 4472509 -- Add support for vertabim[sic] string literals

I feel like there are some very good arguments in that discussion. The need is definitely real, and I've had my share of disgruntlement with the multiple strings concatenation idiom.

A syntactic sugar is not in itself bad. Besides, Java has added less useful sugars recently anyway (e.g. static import, which even the official guide cautions to use "very sparingly!").

Verbatim string literals: other modern languages have it; it's time for Java to follow suit.

Wednesday, February 10, 2010

A Concurrent Modification Exception corner case

CME has always piqued my curiosity, and today I finally went ahead and examined the java.util.* source codes to figure out how modification is detected. As can be predicted, one best-effort implementation uses a counter to facilitate the test (see: AbstractList's modCount field). I've never been a fan of such technique, and the fact that the counter is only 32-bit makes the point all the more demonstratable:
import java.util.*;

public class ConcuMod {
   public static void main(String[] args) {
      List<String> list = new ArrayList<String>();
      list.add("HA!");
      list.add("HA!");
      for (String s : list) {
         System.out.println(s);
         for (long reps = (1L << 32); reps --> 0;) {
            list.clear();
         }
      }
   }
}
The strategy is obvious, of course: modify the list exactly 2^32 times. As expected, the iterator fails to detect the modification.
HA!
Exception in thread "main" java.util.NoSuchElementException
   at java.util.AbstractList$Itr.next(Unknown Source)
   at ConcuMod.main(ConcuMod.java:9)
The decision to throw NSEE instead of CME in this case is worth studying. It would at first seem reasonable to expect CME to be thrown even though the counter test yields a (false) negative, precisely because the test is not 100% foolproof to begin with. However, translating the underlying IndexOutOfBoundsException to NSEE turns out to be better because:
  • It's not presumptuous. IOOB naturally means NSE; CM is merely a possibility. Undetected modification scenarios such as shown here is admittedly so improbable that it's really not worth considering.
  • In the more common scenario, the IOOBE is probably due to an internal bug in the implementation of an AbstractList subclass (it is, after all, documented for inheritance). Throwing NSEE would quickly distinguish this case. CME would in effect make the wrong accusation against the wrong person.

That's a sweet, sweet language!

I've found a new (probably short-lived) hobby: compiling a list of computer science eggcorns. I only have one entry so far: "synthetic syntactic sugar".

From an interview with Anders Hejlsberg:
Q: "Some people in the C academic community don’t like the idea of operator overloading. Some people call it 'synthetic poison', not only 'synthetic sugar'"
From an M.S. thesis supervised by Bertrand Meyer:
"Enumerations are nothing else than synthetic sugar for classes extending java.lang.Enum"
And various, less notable sightings:
  • "With C# 3.0 you can use synthetic sugar to write less code"
  • "Trigraph characters are 'synthetic sugar' which is handled by the preprocessor"
  • "Simply put, synthetic sugar, something JavaScript has plenty of"
  • "In the end, aren't all programming languages basically just synthetic sugar, on top of machine code (from an engineering perspective), or on top of a Turing machine/lambda calculus equivalent (from a theoretical perspective)?"

Like all eggcorns, the logic behind this error is self-evident. Dietary synthetic sugars are manufactured as nonnutritive additives to sweeten food while adding little else (most hopefully not calories!). Syntactic sugars are analogous in that they're added to the language by designers to make it "more palatable" to programmers, while adding absolutely nothing to its overall expressive power.

A parallel may also be drawn between the artificial nature of synthetic sugars and the superficial nature of syntactic sugars, and all the negative connotations that follow from such labels. This angle, though more speculative in nature, is perhaps demonstrated by the programming epigram "syntactic sugar causes cancer of the semicolon".
Submitted for your consideration!

Monday, February 8, 2010

OpenSearch for JLS

My previous blog entry has a lot of references to sections of the Java Language Specification. Instead of being hyperlinks to the official online version of the document, however, they're merely plain non-clickable text (that was copied-and-pasted from elsewhere).

I could've modified the post and added the links manually, but then:
  • Finding the target URL for each section is tedious.
  • The blog entry source will be cluttered.
  • I'd have to repeat this for all my future posts.
  • More importantly, I'd still see unlinked references in forum posts, e-mail discussions, and even some official JLS documents.
After a quick research, I settled on a simple solution for my problem: an OpenSearch engine to perform the URL lookup. Once added to Firefox's list of search engines, one can:
  • Look up any section directly from the search bar
  • Look up a highlighted section number reference in a page from the search action in the context menu.
    • By default, Firefox uses the currently selected search engine.
    • Add-ons such as Context Search allow users to select from all previously-added search engines.

Sunday, February 7, 2010

Glossary of Name Reuse

This is a one-page summary that is somewhat hidden in Java Puzzlers. I feel that knowing the proper names for these concepts is important enough that it's worth quoting for future reference. I've taken the liberties to reorder these as I see fit:
Overloading

Methods in a class overload one another if they have the same name and different signatures. The overloaded method designated by an invocation is selected at compile time [JLS 8.4.9, 15.12.2].

Shadowing

A variable, method, or type shadows all variables, methods, or types, respectively, with the same name in a textually enclosing scope. If an entity is shadowed, you cannot refer to it by its simple name; depending on the entity, you cannot refer to it at all [JLS 6.3.1].

Although shadowing is generally discouraged, one common idiom does involve shadowing. Constructors often reuse a field name from their class as a parameter name to pass the value of the named field. This idiom is not without risk, but most Java programmers have decided that the stylistic benefits outweigh the risks.

Obscuring

A variable obscures a type with the same name if both are in scope: If the name is used where variables and types are permitted, it refers to the variable. Similarly, a variable or a type can obscure a package. Obscuring is the only kind of name reuse where the two names are in different namespaces: variables, packages, methods, or types. If a type or a package is obscured, you cannot refer to it by its simple name except in a context where the syntax allows only a name from its namespace. Adhering to the naming conventions largely eliminates obscuring [JLS 6.3.2, 6.5].
Overriding

An instance method overrides all accessible instance methods with the same signature in superclasses [JLS 8.4.8.1], enabling dynamic dispatch; in other words, the VM chooses which overriding to invoke based on an instance's run-time type [JLS 15.12.4.4]. Overriding is fundamental to object-oriented programming and is the only form of name reuse that is not generally discouraged.

Hiding

A field, static method, or member type hides all accessible fields, static methods, or member types, respectively, with the same name (or, for methods, signature) in supertypes. Hiding a member prevents it from being inherited [JLS 8.3, 8.4.8.2, 8.5].
In particular, shadowing (the forementioned constructor/setter idiom) and obscuring (method name reused for return variable) are two of my own practices that I can now vernaculate.