Tuesday, September 13, 2011

Interning String Literals In Java

A coworker and I were discussing strings and some of the things we had heard about how Java stores them in memory and retrieves them. We were particularly wondering whether constants, local variables, literals, and runtime-generated strings are handled differently. I wrote the following JUnit 4 test to show how each of these cases is handled (JDK 1.6.0_27):

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

/**
 * Tests the effects of string interning on literal and runtime strings.
 * Constants.DATE is an external constant, assumed to be declared elsewhere
 * as: public static final String DATE = "date";
 */
public final class StringInternTest {

    /**
     * String intern test.
     */
    @Test
    public void stringIntern() {
        /* Literals. */
        assertEquals("date", "date");
        assertTrue("date" == "date");
        /* Compare to an external constant. */
        assertEquals("date", Constants.DATE);
        assertTrue("date" == Constants.DATE);
        /* Compare locally defined strings. */
        final String myDateString = "date";
        assertEquals("date", myDateString);
        assertTrue("date" == myDateString);
        final String myDateString2 = "date";
        assertEquals(myDateString, myDateString2);
        assertTrue(myDateString == myDateString2);
        assertEquals(Constants.DATE, myDateString);
        assertEquals(Constants.DATE, myDateString2);
        assertTrue(Constants.DATE == myDateString);
        assertTrue(Constants.DATE == myDateString2);
        /* Create new strings at runtime. */
        final String runtime1 = new String("date");
        final String runtime2 = new String("date");
        assertEquals("date", runtime1);
        assertEquals("date", runtime2);
        assertEquals(Constants.DATE, runtime1);
        assertEquals(Constants.DATE, runtime2);
        assertEquals(runtime1, runtime2);
        /* Equal contents, but distinct objects. */
        assertFalse(runtime1 == runtime2); // !!!
        assertFalse("date" == runtime1); // !!!
        assertFalse("date" == runtime2); // !!!
        assertFalse(Constants.DATE == runtime1); // !!!
        assertFalse(Constants.DATE == runtime2); // !!!
        /* Intern the runtime strings. */
        final String interned1 = runtime1.intern();
        final String interned2 = runtime2.intern();
        assertTrue(interned1 == interned2);
        assertTrue("date" == interned1);
        assertTrue("date" == interned2);
        assertTrue(Constants.DATE == interned1);
        assertTrue(Constants.DATE == interned2);
    }
}

If you set a breakpoint in this test and examine the variables, you will find that Constants.DATE, myDateString, myDateString2, interned1, and interned2 all have the same internal object ID. I learned online that all string literals (and other compile-time constant strings) are interned: the compiler records them in the class file's constant pool, and at runtime the JVM makes every occurrence refer to a single shared String instance. That accounts for the variables having the same ID.
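A related detail, shown here as a minimal sketch rather than anything from the test above: a concatenation made up only of literals (or of final variables initialized with literals) is itself a compile-time constant, so it ends up as the same interned object, while a concatenation involving an ordinary non-final variable is built at runtime. The class and method names below are hypothetical and exist only for illustration.

```java
public final class ConstantFoldingDemo {

    /** "da" + "te" is a constant expression, so it is the interned "date". */
    public static boolean literalConcatIsSameObject() {
        final String folded = "da" + "te";
        return folded == "date";
    }

    /** A final local initialized with a literal also counts as a constant. */
    public static boolean finalLocalConcatIsSameObject() {
        final String prefix = "da";
        final String folded = prefix + "te";
        return folded == "date";
    }

    /** A non-final local is not a constant, so a new String is created at runtime. */
    public static boolean runtimeConcatIsSameObject() {
        String prefix = "da";
        String built = prefix + "te";
        return built == "date";
    }

    public static void main(final String[] args) {
        System.out.println(literalConcatIsSameObject());    // true
        System.out.println(finalLocalConcatIsSameObject()); // true
        System.out.println(runtimeConcatIsSameObject());    // false
    }
}
```

The only difference between the second and third methods is the final keyword, which is what decides whether the compiler can treat the concatenation as a constant.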

The strings stored in runtime1 and runtime2 are each a separate object created at runtime by new String(...), so each has its own object ID. As a result, == returns false when either one is compared with the other or with any of the interned strings.

Some have suggested that manually interning strings is better because == is much faster than equals() (a figure of 5x was claimed), since comparing object references is cheaper than comparing string lengths and then characters. However, others dismiss this as a minor gain at best, and I can certainly see that using == might lead to some hard-to-find bugs if a non-interned string is compared to either an interned or another non-interned string. I also learned that the main method's argument strings (args[]) are not interned.
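The kind of bug I have in mind can be sketched as follows. This is only an illustration under stated assumptions: the class, the DATE constant, and both helper methods are hypothetical, and the StringBuilder stands in for any string that arrives at runtime (such as an element of args[]).

```java
public final class EqualsVsIdentityDemo {

    /** Hypothetical constant; being a literal, it is interned. */
    public static final String DATE = "date";

    /** Fragile: true only if the caller happens to pass the interned instance. */
    public static boolean isDateByIdentity(final String s) {
        return s == DATE;
    }

    /** Robust: compares contents, regardless of how the string was created. */
    public static boolean isDateByEquals(final String s) {
        return DATE.equals(s);
    }

    public static void main(final String[] args) {
        // Built at runtime, so this is a new object that is not interned.
        final String input = new StringBuilder("da").append("te").toString();

        System.out.println(isDateByIdentity(input));          // false
        System.out.println(isDateByEquals(input));            // true
        System.out.println(isDateByIdentity(input.intern())); // true
    }
}
```

The identity check compiles and looks reasonable, but it only works if every caller remembers to intern first, which is why equals() seems like the safer default to me.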

In the Java 6 HotSpot JVM used here, interned strings are stored in PermGen (Permanent Generation) memory, and it is possible to fill up that space by interning too many strings if one is not careful.
