5 things you didn't know about ... the Java Collections API, Part 1
The Java Collections API came to many Java developers as a much needed replacement for the standard Java array and all of its shortcomings. Associating Collections primarily with
ArrayList
isn't a mistake, but there's much more to the Collections for those who go looking.Similarly, while the
Map
(and its oft-chosen implementation,HashMap
) are great for doing name-value or key-value pairs, there's no reason to limit yourself to these familiar tools. You can fix a lot of error prone code with the right API, or even the right Collection.This second article in the 5 things series is the first of several devoted to Collections, because they're so central to what we do in Java programming. I'll start at the beginning with a look at the quickest (but possibly not the most common) ways to do everyday things, like swapping out
Array
s for List
s. After that we'll delve into lesser known stuff, like writing a custom Collections class and extending the Java Collections API.Similarly, while the Map
(and its oft-chosen implementation,HashMap
) are great for doing name-value or key-value pairs, there's no reason to limit yourself to these familiar tools. You can fix a lot of error prone code with the right API, or even the right Collection.Developers new to Java technology may not know that arrays were originally included in the language to head-off performance criticism from C++ developers back in the early 1990s. Well, we've come a long way since then, and the array's performance advantages generally come up short when weighed against those of the Java Collections libraries.
Dumping array contents into a string, for example, requires iterating through the array and concatenating the contents together into a
String
; whereas, the Collections implementations all have a viable toString()
implementation.Except for rare cases, it's good practice to convert any array that comes your way to a collection as quickly as possible. Which then begs the question, what's the easiest way to make the switch? As it turns out, the Java Collections API makes it easy, as shown in Listing 1:
Listing 1. ArrayToList
import java.util.*; public class ArrayToList { public static void main(String[] args) { // This gives us nothing good System.out.println(args); // Convert args to a List of String List |
Note that the returned
List
is unmodifiable, so attempts to add new elements to it will throw anUnsupportedOperationException
.And, because
Arrays.asList()
uses a varargs parameter for elements to add into the List
, you can also use it to easily create List
s out of new
ed objects.It's not uncommon to want to move the contents of one collection (particularly one that was manufactured out of an array) over into another collection or to remove a small collection of objects from a larger one.
You might be tempted to simply iterate through the collection and add or remove each element as it's found, but don't.
Iterating, in this case, has major disadvantages:
- It would be inefficient to resize the collection with each add or remove.
- There's a potential concurrency nightmare in acquiring a lock, doing the operation, and releasing the lock each time.
- There's the race condition caused by other threads banging on your collection while the add or remove is taking place.
You can avoid all of these problems by using
addAll
or removeAll
to pass in the collection containing the elements you want to add or remove.The enhanced for loop, one of the great conveniences added to the Java language in Java 5, removed the last barrier to working with Java Collections.
Before, developers had to manually obtain an
Iterator
, use next()
to obtain the object pointed to from the Iterator
, and check to see if more objects were available via hasNext()
. Post Java 5, we're free to use a for-loop variant that handles all of the above silently.Actually, this enhancement works with any object that implements the
Iterable
interface, not just Collections
.Listing 2 shows one approach to making a list of children from a
Person
object available as an Iterator
. Rather than handing out a reference to the internal List
(which would enable callers outside the Person
to add kids to your family — something most parents would find uncool), the Person
type implements Iterable
. This approach also enables the enhanced for loop to walk through the children.Listing 2. Ehanced for loop: Show me your children
// Person.java import java.util.*; public class Person implements Iterable |
Using
Iterable
has some obvious drawbacks when domain modeling, because only one such collection of objects can be so "implicitly" supported via the iterator()
method. For cases where the child collection is obvious and apparent, however,Iterable
makes programming against the domain type much easier and more obvious.Have you ever wanted to walk a
Collection
, but in reverse? That's where a classic Java Collections algorithm comes in handy.The children of
Person
in Listing 2 above, are listed in the order that they were passed in; but, now you want to list them in the reverse order. While you could write another for loop to insert each object into a new ArrayList
in the opposite order, the coding would grow tedious after the third or fourth time.That's where the underused algorithm in Listing 3 comes in:
Listing 3. ReverseIterator
public class ReverseIterator { public static void main(String[] args) { Person ted = new Person("Ted", "Neward", 39, new Person("Michael", "Neward", 16), new Person("Matthew", "Neward", 10)); // Make a copy of the List List |
The
Collections
class has a number of these "algorithms," static methods that are implemented to take Collections
as parameters and provide implementation-independent behavior on the collection as a whole.What's more, the algorithms present on the
Collections
class certainly aren't the final word in great API design — I prefer methods that don't modify the contents (of the Collection passed in) directly, for example. So it's a good thing you can write custom algorithms of your own, like the one shown in Listing 4:Listing 4. ReverseIterator made simpler
class MyCollections { public static |
The customized algorithm above illustrates a final point about the Java Collections API: that it was always intended to be extended and morphed to suit developers' specific purposes.
So, for example, say you needed the list of children in the
Person
class to always be sorted by age. While you could write code to sort the children over and over again (using the Collections.sort
method, perhaps), it would be far better to have aCollection
class that sorted it for you.In fact, you might not even care about preserving the order in which the objects were inserted into the
Collection
(which is the principal rationale for a List
). You might just want to keep them in a sorted order.No
Collection
class within java.util
fulfills these requirements, but it's trivial to write one. All you need to do is create an interface that describes the abstract behavior the Collection
should provide. In the case of a SortedCollection
, the intent is entirely behavioral.Listing 5. SortedCollection
public interface SortedCollection |
It's almost anticlimactic to write an implementation of this new interface:
Listing 6. ArraySortedCollection
import java.util.*; public class ArraySortedCollection |
This quick-and-dirty implementation, written with no optimizations in mind, could obviously stand some refactoring. But the point is, the Java Collections API was never intended to be the final word in all things collection-related. It both needs and encourages extensions.
Certainly, some extensions will be of the "heavy-duty" variety, such as those introduced in
java.util.concurrent
. But others will be as simple as writing a custom algorithm or a simple extension to an existing Collection
class.Extending the Java Collections API might seem overwhelming, but once you start doing it, you'll find it's nowhere near as hard as you thought.
Like Java Serialization, the Java Collections API is full of unexplored nooks and crannies — which is why we're not done with this subject. The next article in the 5 things series will give you five more ways to do even more with the Java Collections API.
5 things you didn't know about ... the Java Collections API, Part 2
The Collections classes in
java.util
were designed to help, namely by replacing arrays and, thus, improving Java performance. As you learned in the previous article, they're also malleable, willing to be customized and extended in all kinds of ways, in service of good, clean code.Collections are also powerful, however, and mutable: use them with care and abuse them at your own risk.
Java developers frequently make the mistake of assuming that
ArrayList
is simply a replacement for the Java array. Collections are backed by arrays, which leads to good performance when looking up items randomly within a collection. And, like arrays, collections use integer-ordinals to obtain particular items. Still, a collection isn't a drop-in replacement for an array.The trick to differentiating collections from arrays is knowing the difference between order and position. For example,
List
is an interface that preserves the order in which items are placed into a collection, as Listing 1 shows:Listing 1. Mutable keys
import java.util.*; public class OrderAndPosition { public static |
When the third element is removed from the above
List
, the other items "behind" it slide up to fill the empty slots. Clearly, this collections behavior differs from that of an array. (In fact, removing an item from an array is itself not quite the same thing as removing it from a List
— "removing" an item from an array means overwriting its index slot with a new reference or null.)There's no doubt that Java developers love the Java Collections
Iterator
, but when was the last time you really looked at theIterator
interface? Most of the time, we just slap Iterator
inside a for()
loop or enhanced for()
loop and move on, so to speak.But, for those who go digging,
Iterator
has two surprises in store.First,
Iterator
supports the ability to remove an object from a source collection safely, by calling remove()
on the Iterator
itself. The point here is to avoid a ConcurrentModifiedException
, which signals precisely what its name implies: that a collection was modified while an Iterator
was open against it. Some collections will let you get away with removing or adding elements to a Collection
while iterating across it, but calling remove()
on the Iterator
is a safer practice.Second,
Iterator
supports a derived (and arguably more powerful) cousin. ListIterator
, only available from List
s, supports both adding and removing from a List
during iteration, as well as bidirectional scrolling through List
s.Bidirectional scrolling can be particularly powerful for scenarios such as the ubiquitous "sliding set of results," showing 10 of many results retrieved from a database or other collection. It can also be used to "walk backwards" through a collection or list, rather than trying to do everything from the front. Dropping in a
ListIterator
is much easier than using downward-counting integer parameters to List.get()
to "walk backwards" through a List
.Ruby and Groovy developers like to brag about how they can iterate across a text file and print its contents to the console with a single line of code. Most of the time, they say, doing the same thing in Java programming takes dozens of lines of code: open a
FileReader
, then a BufferedReader
, then create a while()
loop to call getLine()
until it comes back null. And, of course, you have to do all this in a try/catch/finally
block that will handle exceptions and close the file handle when finished.It may seem like a silly and pedantic argument, but it does have some merit.
What they (and quite a few Java developers) don't know is that not all
Iterable
s have to come from collections. Instead, anIterable
can create an Iterator
that knows how to manufacture the next element out of thin air, rather than blindly handing it back from a pre-existing Collection
:Listing 2. Iterating a file
// FileUtils.java import java.io.*; import java.util.*; public class FileUtils { public static Iterable |
This approach has the advantage of not holding the entire contents of a file in memory, but with the caveat that, as written, it doesn't
close()
the underlying file handle. (You could fix this by closing whenever readLine()
returns null, but that won't solve cases where Iterator
doesn't run to completion.)Map
is a wonderful collection, bringing us the niftiness of key/value pair collections often found in other languages like Perl. And the JDK gives us a great Map
implementation in the form of the HashMap
, which uses hashtables internally to support fast key lookups for corresponding values. But therein lies a subtle problem: Keys that support hash codes dependent on the contents of mutable fields are vulnerable to a bug that will drive even the most patient Java developer batty.Assuming the
Person
object in Listing 3 has a typical hashCode()
(which uses the firstName
, lastName
, and age
fields — all non-final — to calculate the hashCode()
), the get()
call to Map
will fail and return null
:Listing 3. Mutable
hashCode()
drives me buggy// Person.java import java.util.*; public class Person implements Iterable |
Clearly, this approach is a pain but the solution is easy: Never use a mutable object type as a key in a
HashMap
.When cruising through the Javadocs, Java developers frequently happen across the
SortedSet
type (and its lone implementation in the JDK, the TreeSet
). Because SortedSet
is the only Collection
in the java.util
package that offers any sorting behavior, developers often begin using it without questioning the details too closely. Listing 4 demonstrates:Listing 4.
SortedSet
, I'm so glad I found you!import java.util.*; public class UsingSortedSet { public static void main(String[] args) { List |
After working with this code for a while, you might discover one of the
Set
's core features: that it disallows duplicates. This feature is actually described in the Set
Javadoc. A Set
is a "collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element."But this doesn't actually seem to be the case — although none of the
Person
objects in Listing 4 are equal (according to theequals()
implementation on Person
), only three objects are present within the TreeSet
when printed.Contrary to the stated nature of the set, the
TreeSet
, which requires objects to either implement Comparable
directly or have aComparator
passed in at the time of construction, doesn't use equals()
to compare the objects; it uses the compare
orcompareTo
methods of Comparator/Comparable
.So, objects stored in a
Set
will have two potential means of determining equality: the expected equals()
method and theComparable/Comparator
method, depending on the context of who is asking.What's worse, it isn't sufficient to simply declare that the two should be identical, because comparison for the purpose of sorting isn't the same as comparison for the purpose of equality: It may be perfectly acceptable to consider two
Person
s equal when sorting by last name, but not equal in terms of their contents.Always ensure that the difference between
equals()
and the Comparable.compareTo()
-returning-0 is clear when implementing Set
. By extension, the difference should also be clear in your documentation.The Java Collections library is scattered with tidbits that can make your life much easier and more productive, if only you know about them. Unearthing tidbits often involves some complexity, however, like discovering that you can have your way with
HashMap
, just as long as you never use a mutable object type as its key.So far, we've dug beneath the surface of Collections, but we haven't yet hit the gold mine: Concurrent Collections, introduced in Java 5. The next five tips in this series will focus on
java.util.concurrent
.(
)
No comments:
Post a Comment