
Scala collections are clever things...

The internals of the collection library are one of the more advanced topics in the land of Scala. They involve higher-kinded types, inference, variance, implicits, and the CanBuildFrom mechanism, all to make the library incredibly generic, easy to use, and powerful from a user-facing perspective. Understanding it from the point of view of an API designer is not a task for a beginner to take on lightly.

On the other hand, it's incredibly rare that you'll ever actually need to work with collections at this depth.

So let us begin...

With the release of Scala 2.8, the collection library was completely rewritten to remove duplication: a great many methods were moved to just one place so that ongoing maintenance and the addition of new collection methods would be far easier. The downside is that the hierarchy is harder to understand.

Take List, for example. It inherits from (in turn):

  • LinearSeqOptimized
  • GenericTraversableTemplate
  • LinearSeq
  • Seq
  • SeqLike
  • Iterable
  • IterableLike
  • Traversable
  • TraversableLike
  • TraversableOnce

That's quite a handful! So why this deep hierarchy? Ignoring the XxxLike traits for a moment, each tier in that hierarchy adds a little bit of functionality or provides a more optimised version of inherited functionality (for example, fetching an element by index on a Traversable requires a combination of drop and head operations, which would be grossly inefficient on an indexed sequence). Wherever possible, functionality is pushed as far up the hierarchy as it can go, maximising the number of subclasses that can use it and removing duplication.
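To make the "push it up, override where it pays" idea concrete, here is a minimal sketch using hypothetical traits (MyTraversable, MyLinkedSeq and MyIndexedSeq are illustrative names, not the real library classes). The generic elementAt is written once using only foreach, so every subclass gets it for free; the indexed subclass overrides it with direct access while keeping exactly the same signature.

```scala
// Hypothetical mini-hierarchy, not the real scala.collection traits.
trait MyTraversable[A] {
  // The only abstract operation: visit every element in order.
  def foreach(f: A => Unit): Unit

  // Generic indexed access, inherited by every subclass. It is built only on
  // foreach, so it walks the elements one by one (a linear scan).
  def elementAt(i: Int): A = {
    var n = 0
    var found: Option[A] = None
    foreach { a =>
      if (n == i) found = Some(a)
      n += 1
    }
    found.getOrElse(throw new IndexOutOfBoundsException(i.toString))
  }
}

// A linked structure simply inherits the generic version.
final class MyLinkedSeq[A](elems: List[A]) extends MyTraversable[A] {
  def foreach(f: A => Unit): Unit = elems.foreach(f)
}

// An indexed structure keeps the same signature but overrides it with direct access.
final class MyIndexedSeq[A](elems: Vector[A]) extends MyTraversable[A] {
  def foreach(f: A => Unit): Unit = elems.foreach(f)
  override def elementAt(i: Int): A = elems(i)
}
```

Callers see one method, elementAt, regardless of which subclass they hold; only the cost differs.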

map is just one such example. The method is implemented in TraversableLike (though the XxxLike traits only really exist for library designers, so for most intents and purposes it's considered to be a method on Traversable; I'll come to that part shortly), and is widely inherited. It's possible to define an optimised version in some subclass, but it must still conform to the same signature. Consider the following uses of map (as also mentioned in the question):

import scala.collection.immutable.BitSet

"abcde" map {_.toUpper}           // returns a String
"abcde" map {_.toInt}             // returns an IndexedSeq[Int]
BitSet(1,2,3,4) map {2 * _}       // returns a BitSet
BitSet(1,2,3,4) map {_.toString}  // returns a Set[String]

In each case, the output is of the same type as the input wherever possible. When that isn't possible, superclasses of the input type are checked until one is found that does offer a valid return type. Getting this right took a lot of work, especially when you consider that String isn't even a collection; it's just implicitly convertible to one.

So how is it done?

One half of the puzzle is the XxxLike traits (I did say I'd get to them...), whose main function is to take a Repr type param (short for "Representation") so that they know the true subclass actually being operated on. So, e.g., TraversableLike is the same as Traversable, but abstracted over the Repr type param. This param is then used by the second half of the puzzle: the CanBuildFrom type class, which captures the source collection type, the target element type and the target collection type to be used by collection-transforming operations.
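A minimal sketch of the Repr idea, assuming a hypothetical MyTraversableLike trait whose fromList method stands in for the builders the real library uses. filter is written once in the trait, yet its static return type is the concrete subclass, because each subclass passes itself as Repr.

```scala
// Hypothetical, simplified "Like" trait: A is the element type,
// Repr is the concrete collection type that operations should return.
trait MyTraversableLike[A, Repr] {
  def elems: List[A]

  // Stand-in for a builder: each concrete type says how to rebuild itself.
  protected def fromList(xs: List[A]): Repr

  // Implemented once, but returns Repr rather than a generic supertype.
  def filter(p: A => Boolean): Repr = fromList(elems.filter(p))
}

final case class IntBag(elems: List[Int]) extends MyTraversableLike[Int, IntBag] {
  protected def fromList(xs: List[Int]): IntBag = IntBag(xs)
}

final case class Words(elems: List[String]) extends MyTraversableLike[String, Words] {
  protected def fromList(xs: List[String]): Words = Words(xs)
}

// The annotated types compile because filter returns the subclass itself.
val evens: IntBag = IntBag(List(1, 2, 3, 4)).filter(_ % 2 == 0)
val longWords: Words = Words(List("a", "abc", "de")).filter(_.length > 1)
```

Note that filter keeps the element type unchanged, so Repr alone is enough here; map changes the element type, which is why it also needs CanBuildFrom.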

It's easier to explain with an example!

BitSet's companion object defines an implicit instance of CanBuildFrom like this:

implicit def canBuildFrom: CanBuildFrom[BitSet, Int, BitSet] = bitsetCanBuildFrom

When compiling BitSet(1,2,3,4) map {2 * _}, the compiler will attempt an implicit lookup of CanBuildFrom[BitSet, Int, T].

This is the clever part... The most specific implicit in scope that matches the first two type parameters is the one in BitSet. The first parameter is Repr, as captured by the XxxLike trait, and the second is the element type produced by the function passed to map (Int here, since doubling an Int yields an Int). The map operation is also parameterised with a result type: in TraversableLike its signature is def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That, and this That (T above) is inferred from the third type parameter of the CanBuildFrom instance that was implicitly located, which is BitSet in this case.

So the first two type parameters to CanBuildFrom are inputs, to be used for implicit lookup, and the third parameter is an output, to be used for inference.

BitSet's CanBuildFrom therefore matches the two types BitSet and Int, so the lookup succeeds, and the inferred return type is also BitSet.
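The lookup described above can be reproduced with a small, self-contained sketch. CanBuild, MyLike and MyBitSet are hypothetical stand-ins for CanBuildFrom, TraversableLike and BitSet, and fromList replaces the real Builder machinery. The first two type arguments of CanBuild select the instance; the third fixes the inferred result type That.

```scala
// Hypothetical stand-in for CanBuildFrom[-From, -Elem, +To]:
// From and Elem drive the implicit search, To becomes map's result type.
trait CanBuild[-From, -Elem, +To] {
  def fromList(xs: List[Elem]): To
}

// Same shape as the 2.8 map signature: the caller never states That.
trait MyLike[A, Repr] {
  def elems: List[A]
  def map[B, That](f: A => B)(implicit cb: CanBuild[Repr, B, That]): That =
    cb.fromList(elems.map(f))
}

final class MyBitSet(xs: List[Int]) extends MyLike[Int, MyBitSet] {
  val elems: List[Int] = xs.distinct.sorted
}

object MyBitSet {
  // Mirrors BitSet's instance: (MyBitSet, Int) in, MyBitSet out.
  implicit val canBuild: CanBuild[MyBitSet, Int, MyBitSet] =
    new CanBuild[MyBitSet, Int, MyBitSet] {
      def fromList(xs: List[Int]): MyBitSet = new MyBitSet(xs)
    }
}

// The search for CanBuild[MyBitSet, Int, That] finds the companion's instance,
// so That is inferred as MyBitSet.
val doubled: MyBitSet = new MyBitSet(List(1, 2, 3, 4)).map(2 * _)
```

With only this instance available, mapping to a non-Int element type would simply fail to compile; the real library avoids that with the parent-type fallback described next.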

When compiling BitSet(1,2,3,4) map {_.toString}, the compiler will attempt an implicit lookup of CanBuildFrom[BitSet, String, T]. This fails for the implicit in BitSet, so the compiler next tries its superclass, Set, which contains the implicit:

implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Set[A]] = setCanBuildFrom[A]

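This matches even though its first type parameter is Coll rather than BitSet. Coll is a type alias that Set's companion object inherits from GenericCompanion, where it is defined as CC[_], so here it means Set[_]. CanBuildFrom is declared as CanBuildFrom[-From, -Elem, +To], i.e. it is contravariant in its first parameter, and since BitSet is a subtype of Set[_], this instance is accepted where a CanBuildFrom[BitSet, String, T] is needed. The A will match anything, as canBuildFrom is parameterised with the type A; in this case it's inferred to be String, thus yielding a return type of Set[String].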
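A similarly hedged sketch of the fallback case, again with hypothetical names: MySetBase plays the role of Set, MySet is the generic result type, and MyBitSet has no instance of its own for String elements (for simplicity it extends the shared parent rather than MySet itself). The instance in MySetBase's companion is declared for MySetBase[_, _]; because the stand-in CanBuild is contravariant in its first parameter, as CanBuildFrom is, it is also accepted for MyBitSet, with A inferred as String.

```scala
// Hypothetical stand-in for CanBuildFrom[-From, -Elem, +To].
trait CanBuild[-From, -Elem, +To] {
  def fromList(xs: List[Elem]): To
}

trait MyLike[A, Repr] {
  def elems: List[A]
  def map[B, That](f: A => B)(implicit cb: CanBuild[Repr, B, That]): That =
    cb.fromList(elems.map(f))
}

// Shared parent, in the role of Set.
abstract class MySetBase[A, Repr](val elems: List[A]) extends MyLike[A, Repr]

object MySetBase {
  // Generic fallback: From is the wildcard parent type, A matches any element type.
  implicit def canBuild[A]: CanBuild[MySetBase[_, _], A, MySet[A]] =
    new CanBuild[MySetBase[_, _], A, MySet[A]] {
      def fromList(xs: List[A]): MySet[A] = new MySet(xs.distinct)
    }
}

final class MySet[A](xs: List[A]) extends MySetBase[A, MySet[A]](xs)

// In the role of BitSet: a subtype of MySetBase[_, _] with no String instance.
final class MyBitSet(xs: List[Int]) extends MySetBase[Int, MyBitSet](xs.distinct.sorted)

val bits = new MyBitSet(List(3, 1, 2))
// Lookup of CanBuild[MyBitSet, String, That] falls back to the parent's instance
// (found in the implicit scope of MyBitSet's base class), so That = MySet[String].
val strings: MySet[String] = bits.map(_.toString)
```

The BitSet-specific Int instance is deliberately left out here so that the sketch exercises only the fallback path.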

So to correctly implement a collection type, you not only need to provide a correct implicit of type CanBuildFrom, but you also need to ensure that the concrete type of that collection is supplied as the Repr param to the correct parent traits (for example, MapLike in the case of subclassing Map).

String is a little more complicated, as it provides map via an implicit conversion. The implicit conversion is to StringOps, which subclasses StringLike[String], which ultimately derives from TraversableLike[Char, String], String being the Repr type param.

There's also a CanBuildFrom[String, Char, String] in scope, so the compiler knows that when the elements of a String are mapped to Chars, the return type should also be a String. From this point onwards, the same mechanism is used.
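The String case can be sketched the same way. MyStringOps, myMap and the StringBuilds objects are hypothetical (myMap is named so it cannot clash with the real StringOps.map), and the implicit conversion toMyStringOps plays the role of the conversion to StringOps. Placing the Char-to-String instance in an object that extends the trait holding the generic fallback, so that it wins whenever both apply, is an assumption of this sketch rather than a description of where the real instances live.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for CanBuildFrom[-From, -Elem, +To].
trait CanBuild[-From, -Elem, +To] {
  def fromList(xs: List[Elem]): To
}

// Hypothetical wrapper in the role of StringOps: Repr is String, elements are Char.
final class MyStringOps(s: String) {
  def myMap[B, That](f: Char => B)(implicit cb: CanBuild[String, B, That]): That =
    cb.fromList(s.toList.map(f))
}

trait LowPriorityStringBuilds {
  // Fallback for any element type: build a Vector (an indexed sequence).
  implicit def fallbackStringBuild[B]: CanBuild[String, B, Vector[B]] =
    new CanBuild[String, B, Vector[B]] {
      def fromList(xs: List[B]): Vector[B] = xs.toVector
    }
}

object StringBuilds extends LowPriorityStringBuilds {
  // Chars in, String out; preferred over the fallback because this object
  // derives from the trait that defines the fallback.
  implicit val stringBuild: CanBuild[String, Char, String] =
    new CanBuild[String, Char, String] {
      def fromList(xs: List[Char]): String = xs.mkString
    }

  // Makes myMap available on a plain String.
  implicit def toMyStringOps(s: String): MyStringOps = new MyStringOps(s)
}

import StringBuilds._

val upper: String = "abcde".myMap(_.toUpper)       // Char => Char, so a String
val codes: Vector[Int] = "abcde".myMap(_.toInt)    // Char => Int, so the fallback
```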


The Architecture of Scala Collections online pages have a detailed explanation geared towards the practical aspects of creating new collections based on the 2.8 collection design.


"What needs to be done if you want to integrate a new collection class, so that it can profit from all predefined operations at the right types? On the next few pages you'll be walked through two examples that do this."

It uses as examples a collection for encoding RNA sequences and one for a Patricia trie. Look for the "Dealing with map and friends" section for an explanation of what to do to return the appropriate collection type.
