Scala collections are clever things...
The internals of the collection library are one of the more advanced topics in the land of Scala. They involve higher-kinded types, inference, variance, implicits, and the `CanBuildFrom` mechanism, all working together to make the library incredibly generic, easy to use, and powerful from a user-facing perspective. Understanding it from the point of view of an API designer is not a light-hearted task to be taken on by a beginner.
On the other hand, it's incredibly rare that you'll ever actually need to work with collections at this depth.
So let us begin...
With the release of Scala 2.8, the collection library was completely rewritten to remove duplication: a great many methods were consolidated into a single place, so that ongoing maintenance and the addition of new collection methods would be far easier. The trade-off is that the hierarchy became harder to understand.
Take `List`, for example, which inherits from (in turn):

- `LinearSeqOptimized`
- `GenericTraversableTemplate`
- `LinearSeq`
- `Seq`
- `SeqLike`
- `Iterable`
- `IterableLike`
- `Traversable`
- `TraversableLike`
- `TraversableOnce`
That's quite a handful! So why this deep hierarchy? Ignoring the `XxxLike` traits briefly: each tier in the hierarchy adds a little bit of functionality, or provides a more optimised version of inherited functionality. For example, fetching an element by index on a `Traversable` requires a combination of `drop` and `head` operations, which is grossly inefficient on an indexed sequence. Where possible, functionality is pushed as far up the hierarchy as it can go, maximising the number of subclasses that can use it and removing duplication.
`map` is just one such example. The method is implemented in `TraversableLike` (though the `XxxLike` traits really only exist for library designers, so for most intents and purposes it's considered a method on `Traversable`; I'll come to that part shortly) and is widely inherited. A subclass may define an optimised version, but it must still conform to the same signature. Consider the following uses of `map` (as also mentioned in the question):
```scala
"abcde" map {_.toUpperCase}      // returns a String
"abcde" map {_.toInt}            // returns an IndexedSeq[Int]
BitSet(1,2,3,4) map {2*}         // returns a BitSet
BitSet(1,2,3,4) map {_.toString} // returns a Set[String]
```
In each case, the output is of the same type as the input wherever possible. When it's not possible, superclasses of the input type are checked until one is found that does offer a valid return type. Getting this right took a lot of work, especially when you consider that `String` isn't even a collection; it's just implicitly convertible to one.
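Type ascriptions make the compiler confirm those static result types. Here is a sketch of the same four calls (using `_ * 2` in place of `2*` purely for readability):

```scala
import scala.collection.immutable.BitSet

val a: String          = "abcde" map (_.toUpper)              // Char => Char rebuilds a String
val b: IndexedSeq[Int] = "abcde" map (_.toInt)                // Char => Int cannot, so it widens
val c: BitSet          = BitSet(1, 2, 3, 4) map (_ * 2)       // Int => Int keeps the BitSet
val d: Set[String]     = BitSet(1, 2, 3, 4) map (_.toString)  // widens to a Set[String]
```

If any of the ascribed types were wrong, these lines would simply fail to compile.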
So how is it done?
One half of the puzzle is the `XxxLike` traits (I did say I'd get to them...), whose main function is to take a `Repr` type param (short for "representation") so that they know the true subclass actually being operated on. So `TraversableLike`, for example, is the same as `Traversable` but abstracted over the `Repr` type param. This param is then used by the second half of the puzzle: the `CanBuildFrom` type class, which captures the source collection type, the target element type, and the target collection type, to be used by collection-transforming operations.
It's easier to explain with an example!
`BitSet` defines an implicit instance of `CanBuildFrom` like this:
```scala
implicit def canBuildFrom: CanBuildFrom[BitSet, Int, BitSet] = bitsetCanBuildFrom
```
When compiling `BitSet(1,2,3,4) map {2*}`, the compiler will attempt an implicit lookup of `CanBuildFrom[BitSet, Int, T]`. This is the clever part... there's only one implicit in scope that matches the first two type parameters. The first parameter is `Repr`, as captured by the `XxxLike` trait, and the second is the element type, as captured by the current collection trait (e.g. `Traversable`). The `map` operation is also parameterised with a type; this type `T` is inferred from the third type parameter of the `CanBuildFrom` instance that was implicitly located: `BitSet` in this case.
So the first two type parameters to `CanBuildFrom` are inputs, to be used for implicit lookup, and the third parameter is an output, to be used for inference. The `CanBuildFrom` in `BitSet` matches the two types `BitSet` and `Int`, so the lookup succeeds, and the inferred return type will also be `BitSet`.
When compiling `BitSet(1,2,3,4) map {_.toString}`, the compiler will attempt an implicit lookup of `CanBuildFrom[BitSet, String, T]`. This fails for the implicit in `BitSet`, so the compiler next tries its superclass, `Set`, which contains the implicit:
```scala
implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Set[A]] = setCanBuildFrom[A]
```
This matches, because `Coll` is a type alias that's initialised to `BitSet` when `BitSet` derives from `Set`. The `A` will match anything, as `canBuildFrom` is parameterised with the type `A`; in this case it's inferred to be `String`, thus yielding a return type of `Set[String]`.
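Both lookups can be reproduced with a stripped-down re-creation of the mechanism. Everything below (`CBF`, `mapLike`, the instances) is a hypothetical sketch, not the real library API, but the implicit search resolves the same way, including the fallback from the specific instance to the more general one:

```scala
import scala.collection.immutable.BitSet
import scala.collection.mutable.Builder

// Simplified stand-in for CanBuildFrom: From and Elem drive the
// implicit search; To is read back by inference as the result type.
trait CBF[-From, -Elem, +To] { def apply(): Builder[Elem, To] }

trait LowPriorityCBF {
  // Fallback: any Set can be rebuilt as a plain Set[A]
  implicit def setCBF[A]: CBF[Set[_], A, Set[A]] =
    new CBF[Set[_], A, Set[A]] { def apply() = Set.newBuilder[A] }
}
object CBF extends LowPriorityCBF {
  // More specific: a BitSet of Ints can be rebuilt as a BitSet
  implicit val bitsetCBF: CBF[BitSet, Int, BitSet] =
    new CBF[BitSet, Int, BitSet] { def apply() = BitSet.newBuilder }
}
import CBF._

// `repr` plays the role of the Repr captured by the XxxLike traits
def mapLike[A, B, Repr, That](xs: Iterable[A], repr: Repr)(f: A => B)(
    implicit cbf: CBF[Repr, B, That]): That = {
  val builder = cbf()
  xs.foreach(x => builder += f(x))
  builder.result()
}

val bs = BitSet(1, 2, 3, 4)
val doubled: BitSet      = mapLike(bs, bs)(_ * 2)      // bitsetCBF matches
val strings: Set[String] = mapLike(bs, bs)(_.toString) // falls back to setCBF
```

The low-priority trait mirrors how the real library keeps the specific `BitSet` instance from clashing with the general `Set` one: an implicit defined in a subclass is considered more specific than one inherited from its parent.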
So to correctly implement a collection type, you not only need to provide a correct implicit of type `CanBuildFrom`, you also need to ensure that the concrete type of that collection is supplied as the `Repr` param to the correct parent traits (for example, this would be `MapLike` in the case of subclassing `Map`).
`String` is a little more complicated, as it provides `map` via an implicit conversion. The conversion is to `StringOps`, which subclasses `StringLike[String]`, which ultimately derives from `TraversableLike[Char, String]`, with `String` being the `Repr` type param. There's also a `CanBuildFrom[String, Char, String]` in scope, so the compiler knows that when mapping the elements of a `String` to `Char`s, the return type should also be a `String`. From this point onwards, the same mechanism is used.
The Architecture of Scala Collections online pages have a detailed explanation geared towards the practical aspects of creating new collections based on the 2.8 collection design.
Quote:
"What needs to be done if you want to integrate a new collection class, so that it can profit from all predefined operations at the right types? On the next few pages you'll be walked through two examples that do this."
It uses as examples a collection for encoding RNA sequences and one for a Patricia trie. Look for the "Dealing with map and friends" section for an explanation of how to return the appropriate collection type.
Source: stackoverflow.com