Algebraic Datatypes
Overview
Haskell has a system called algebraic datatypes for defining new types. This sounds fancy, but is rather simple. Let’s dive in by looking at the standard library definitions of some familiar types:
data Bool = True | False
data Ordering = LT | EQ | GT
With this syntax you too can define types:
-- definition of a type with three values
data Color = Red | Green | Blue
-- a function that uses pattern matching on our new type
rgb :: Color -> [Double]
rgb Red = [1,0,0]
rgb Green = [0,1,0]
rgb Blue = [0,0,1]
Prelude> :t Red
Red :: Color
Prelude> :t [Red,Blue,Green]
[Red,Blue,Green] :: [Color]
Prelude> rgb Red
[1.0,0.0,0.0]
Fields
Types like Bool
, Ordering
and Color
that just list a bunch of
constants are called enumerations or enums in Haskell and other
languages. Enums are useful, but you need other types as well. Here we
define a type for reports containing an id number, a title, and a body:
data Report = ConstructReport Int String String
This is how you create a report:
Prelude> :t ConstructReport 1 "Title" "This is the body."
ConstructReport 1 "Title" "This is the body." :: Report
You can access the fields with pattern matching:
reportContents :: Report -> String
reportContents (ConstructReport _ _ contents) = contents
setReportContents :: String -> Report -> Report
setReportContents contents (ConstructReport id title _) =
ConstructReport id title contents
Constructors
The things on the right hand side of a data
declaration are called
constructors. True
, False
, Red
and ConstructReport
are all
examples of constructors. A type can have multiple constructors, and a
constructor can have zero or more fields.
Here is a datatype for a standard playing card. It has five
constructors, of which Joker
has zero fields and the others have one
field.
data Card = Joker
| Heart Int
| Club Int
| Spade Int
| Diamond Int
Constructors with fields have function type and can be used wherever functions can:
Prelude> :t Heart
Heart :: Int -> Card
Prelude> :t Club
Club :: Int -> Card
Prelude> map Heart [1,2,3]
[Heart 1,Heart 2,Heart 3]
Prelude> (Heart . (\x -> x+1)) 3
Heart 4
Sidenote: Deriving
By the way, there’s something missing from our Card
type. Look at how
it behaves compared to Ordering
and Bool
:
Prelude> EQ
EQ
Prelude> True
True
Prelude> Joker
<interactive>:1:0:
No instance for (Show Card)
arising from a use of `print' at <interactive>:1:0-4
Possible fix: add an instance declaration for (Show Card)
In a stmt of a 'do' expression: print it
The problem is that Haskell does not know how to print the types we
defined. As the error says, they are not part of the Show
class. The
easy solution is to just add a deriving Show
after the type
definition:
data Card = Joker
| Heart Int
| Club Int
| Spade Int
| Diamond Int
deriving Show
Prelude> Joker
Joker
The deriving
syntax is a way to automatically make your class a member
of certain basic type classes, most notably Read
, Show
and Eq
.
We’ll talk more about what this means later.
Algebraic?
So why are these datatypes called algebraic? This is because, theoretically speaking, each datatype can be a sum of constructors, and each constructor is a product of fields. It makes sense to think of these as sums and products for many reasons, one being that we can count the possible values of each type this way:
-- corresponds to 1+1. Has 2 possible values.
data Bool = True | False
-- corresponds to Bool*Bool, i.e. 2*2. Has 4 possible values
data TwoBools = TwoBools Bool Bool
-- corresponds to Bool*Bool+Bool+1 = 2*2+2+1 = 7
-- Has 7 possible values.
data Complex = Two Bool Bool | One Bool | None
There is a rich theory of algebraic datatypes. If you’re interested, you might find more info here or here.
Type Parameters
We introduced type parameters and parametric polymorphism when
introducing lists in a previous reading. Since then, we’ve seen other
parameterized types like Maybe
and Either
. Now we’ll learn how we
can define our own parameterized types.
Defining Parameterized Types
The definition for Maybe
is:
data Maybe a = Nothing | Just a
What’s a
? We define a parameterized type by mentioning a type
variable (a
in this case) on the left side of the =
sign. We can
then use the same type variable in fields for our constructors. This is
analogous to polymorphic functions. Instead of defining separate
functions
headInt :: [Int] -> Int
headBool :: [Bool] -> Bool
and so on, we define one function head :: [a] -> a
that works for all
types a
. Similarly, instead of defining multiple types
data MaybeInt = NothingInt | JustInt Int
data MaybeBool = NothingBool | JustBool Bool
we define one type Maybe a
that works for all types a
.
Here’s our first own parameterized type Described
. The values of type
Described a
contain a value of type a
and a String
description.
data Described a = Describe a String
getValue :: Described a -> a
getValue (Describe x _) = x
getDescription :: Described a -> String
getDescription (Describe _ desc) = desc
Prelude> :t Describe
Describe :: a -> String -> Described a
Prelude> :t Describe True "This is true"
Describe True "This is true" :: Described Bool
Prelude> getValue (Describe 3 "a number")
3
Prelude> getDescription (Describe 3 "a number")
"a number"
Syntactic Note
In the above definitions, we’ve used a
as a type variable. However any
word that starts with a lower case letter is fine. We could have defined
Maybe
like this:
data Maybe theType = Nothing | Just theType
The rules for Haskell identifiers are:
- Type variables and names for functions and values start lower case
(e.g.
a
,map
,xs
) - Type names and constructor names start with upper case (e.g.
Maybe
,Just
,Card
,Heart
)
Note that a type and its constructor can have the same name. This is very common in Haskell code for types that only have one constructor. In this material we try to avoid it to avoid confusion. Here are some examples:
data Pair a = Pair a a
data Report = Report Int String String
Prelude> :t Pair
Pair :: a -> a -> Pair a
Beware of mixing up types and constructors. Luckily types and constructors can never occur in the same context, so you get a nice error:
Prelude> Maybe
<interactive>:1:1: error:
• Data constructor not in scope: Maybe
Prelude> undefined :: Nothing
<interactive>:2:14: error:
Not in scope: type constructor or class ‘Nothing’
Sidenote: Multiple Type Parameters
Types can have multiple type parameters. The syntax is similar to
defining functions with many arguments. Here’s the definition of the
standard Either
type:
data Either a b = Left a | Right b
Recursive Types
So far, all of the types we’ve defined have been of constant size. We can represent one report or one colour, but how could we represent a collection of things? We could use lists of course, but could we define a list type ourselves?
Just like Haskell functions, Haskell data types can be recursive. This is no weirder than having an object in Java or Python that refers to another object of the same class. This is how you define a list of integers:
data IntList = Empty | Node Int IntList
deriving Show
ihead :: IntList -> Int
ihead (Node i _) = i
itail :: IntList -> IntList
itail (Node _ t) = t
ilength :: IntList -> Int
ilength Empty = 0
ilength (Node _ t) = 1 + ilength t
We can use the functions defined above to work with lists of integers:
Prelude> ihead (Node 3 (Node 5 (Node 4 Empty)))
3
Prelude> itail (Node 3 (Node 5 (Node 4 Empty)))
Node 5 (Node 4 Empty)
Prelude> ilength (Node 3 (Node 5 (Node 4 Empty)))
3
Note that we can’t put values other than Int
s inside our IntList
:
Prelude> Node False Empty
<interactive>:3:6: error:
• Couldn't match expected type ‘Int’ with actual
type ‘Bool’
• In the first argument of ‘Node’, namely ‘False’
In the expression: Node False Empty
In an equation for ‘it’: it = Node False Empty
To be able to put any type of element in our list, let’s do the same
thing with a type parameter. This is the same as the built in type
[a]
, but with slightly clunkier syntax:
data List a = Empty | Node a (List a)
deriving Show
Note how we need to pass the the type parameter a
onwards in the
recursion. We need to write Node a (List a)
instead of Node a List
.
The Node
constructor has two arguments. The first has type a
, and
the second has type List a
. Here are the reimplementations of some
standard list functions for our List
type:
lhead :: List a -> a
lhead (Node h _) = h
ltail :: List a -> List a
ltail (Node _ t) = t
lnull :: List a -> Bool
lnull Empty = True
lnull _ = False
llength :: List a -> Int
llength Empty = 0
llength (Node _ t) = 1 + llength t
Prelude> lhead (Node True Empty)
True
Prelude> ltail (Node True (Node False Empty))
Node False Empty
Prelude> lnull Empty
True
Note that just like with normal Haskell lists, we can’t have elements of different types in the same list:
Prelude> Node True (Node "foo" Empty)
<interactive>:5:12: error:
• Couldn't match type ‘[Char]’ with ‘Bool’
Expected type: List Bool
Actual type: List [Char]
• In the second argument of ‘Node’, namely
‘(Node "foo" Empty)’
In the expression: Node True (Node "foo" Empty)
In an equation for ‘it’:
it = Node True (Node "foo" Empty)
Example: Growing a Tree
Just like a list, we can also represent a binary tree:
data Tree a = Node a (Tree a) (Tree a) | Empty
Our tree contains nodes, which contain a value of type a
and two child
trees, and empty trees.
In case you’re not familiar with binary trees, they’re a data structure
that’s often used as the basis for other data structures (Data.Map
is
based on trees!). Binary trees are often drawn as (upside-down)
pictures, like this:
The highest node in the tree is called the root (0
in this case),
and the nodes with no children are called leaves
(2
, 3
and 4
in
this case). We can define this tree using our Tree
type like this:
example :: Tree Int
example = (Node 0 (Node 1 (Node 2 Empty Empty)
(Node 3 Empty Empty))
(Node 4 Empty Empty))
The height of a binary tree is length of the longest path from the root
to a leaf. In Haskell terms, it’s how many nested levels of Node
constructors you need to build the tree. The height of our example tree
is 3. Here’s a function that computes the height of a tree:
treeHeight :: Tree a -> Int
treeHeight Empty = 0
treeHeight (Node _ l r) =
1 + max (treeHeight l) (treeHeight r)
treeHeight Empty ==> 0
treeHeight (Node 2 Empty Empty)
==> 1 + max (treeHeight Empty)
(treeHeight Empty)
==> 1 + max 0 0
==> 1
treeHeight (Node 1 Empty (Node 2 Empty Empty))
==> 1 + max (treeHeight Empty)
(treeHeight (Node 2 Empty Empty))
==> 1 + max 0 1
==> 2
treeHeight (Node 0 (Node 1 Empty
(Node 2 Empty Empty))
Empty)
==> 1 + max (treeHeight (Node 1 Empty
(Node 2 Empty Empty)))
(treeHeight Empty)
==> 1 + max 2 0
==> 3
In case you’re familiar with binary search trees, here are the definitions of the lookup and insert opertions for a binary search tree. If you don’t know what I’m talking about, you don’t need to understand this.
lookup :: Int -> Tree Int -> Bool
lookup x Empty = False
lookup x (Node y l r)
| x < y = lookup x l
| x > y = lookup x r
| otherwise = True
insert :: Int -> Tree Int -> Tree Int
insert x Empty = Node x Empty Empty
insert x (Node y l r)
| x < y = Node y (insert x l) r
| x > y = Node y l (insert x r)
| otherwise = Node y l r
Record Syntax
If some fields need to be accessed often, it can be convenient to have
helper functions for reading those fields. For instance, the type
Person
might have multiple fields:
data Person = MkPerson String Int String String String
deriving Show
A list of persons might look like the following:
people :: [Person]
people = [ MkPerson "Jane Doe" 21 "Houston" "Texas" "Engineer"
, MkPerson "Maija Meikäläinen" 35 "Rovaniemi" "Finland" "Engineer"
, MkPerson "Mauno Mutikainen" 27 "Turku" "Finland" "Mathematician" ]
Suppose that we need to find all engineers from Finland:
query :: [Person] -> [Person]
query [] = []
query (MkPerson name age town state profession):xs
| state == "Finland" && profession == "Engineer" =
(MkPerson name age town state profession) : query xs
| otherwise = query xs
Thus,
query people
==> [MkPerson "Maija Meikäläinen" 35 "Rovaniemi" "Finland" "Engineer"]
Note that the types of the fields give little information on what is the
intended content in those fields. We need to remember in all places in
the code that town
goes before state
and not vice versa.
Haskell has a feature called record syntax that is helpful in these
kinds of cases. The datatype Person
can be defined as a record:
data Person =
MkPerson { name :: String
, age :: Int
, town :: String
, state :: String
, profession :: String }
deriving Show
We can still define values of Person
normally, but the Show
instance
prints the field names for us:
Prelude> MkPerson "Jane Doe" 21 "Houston" "Texas" "Engineer"
MkPerson {name = "Jane Doe", age = 21, town = "Houston", state = "Texas", profession = "Engineer"}
However, we can also define values using record syntax. Note how the fields don’t need to be in any specific order now that they have names.
Prelude> MkPerson {name = "Jane Doe", town = "Houston", profession = "Engineer", state = "Texas", age = 21}
MkPerson {name = "Jane Doe", age = 21, town = "Houston", state = "Texas", profession = "Engineer"}
Most importantly, We get accessor functions for the fields for free:
Prelude> :t profession
profession :: Person -> String
Prelude> profession (MkPerson "Jane Doe" 21 "Houston" "Texas" "Engineer")
"Engineer"
We can now rewrite the query function using these accessor functions:
query :: [Person] -> [Person]
query [] = []
query (x:xs)
| state x == "Finland" && profession x == "Engineer" =
x : query xs
| otherwise = query xs
You’ll probably agree that the code looks more pleasant now.
Algebraic Datatypes: Summary
- Types are defined like this
data TypeName = ConstructorName FieldType FieldType2
| AnotherConstructor FieldType3
| OneMoreCons
- … or like this if we’re using type variables
data TypeName variable = Cons1 variable Type1
| Cons2 Type2 variable
- You can have one or more constructors
- Each constructor can have zero or more fields
- Constructors start with upper case, type variables with lower case
- Values are handled with pattern matching:
foo (ConstructorName a b) = a+b
foo (AnotherConstructor _) = 0
foo OneMoreCons = 7
- Constructors are just functions:
ConstructorName :: FieldType -> FieldType2 -> TypeName
Cons1 :: a -> Type1 -> TypeName a
- You can also define datatypes using record syntax:
data TypeName = Constructor { field1 :: Field1Type
, field2 :: Field2Type }
This gives you accessor functions like
field1 :: TypeName -> Field1Type
for free.
Sidenote: Other Ways of Defining Types
In addition to the data
keyword, there are two additional ways of
defining types in Haskell.
The newtype
keyword works like data
, but you can only have a single
constructor with a single field. It’s sometimes wise to use newtype
for performance reasons.
The type
keyword introduces a type alias. Type aliases don’t affect
type checking, they just offer a shorthand for writing types. For
example the familiar String
type is an alias for [Char]
:
type String = [Char]
This means that whenever the compiler reads String
, it just
immediately replaces it with [Char]
. Type aliases seem useful, but
they can easily make reading type errors harder.
How Do Algebraic Datatypes Work?
Remember how lists were represented in memory as linked lists? Let’s look in more detail at what algebraic datatypes look like in memory.
Haskell data forms directed graphs in memory. Every constructor is a
node, every field is an edge. Names (of variables) are pointers into
this graph. Different names can share parts of the structure. Here’s
an example with lists. Note how the last two elements of x
are shared
with y
and z
.
let x = [1,2,3,4]
y = drop 2 x
z = 5:y
What happens when you make a new version of a datastructure is called path copying. Since Haskell data is immutable, the changed parts of the datastructure get copied, while the unchanged parts can be shared between the old and new versions.
Consider the definition of ++
:
[] ++ ys = ys
(x:xs) ++ ys = x:(xs ++ ys)
We are making a copy of the first argument while we walk it. For every
:
constructor in the first input list, we are creating a new :
constructor in the output list. The second argument can be shared. It is
not used at all in the recursion. Visually:
One more way to think about it is this: we want to change the tail
pointer of the list element (3:)
. That means we need to make a new
(3:)
. However the (2:)
points to the (3:)
so we need a new copy of
the (2:)
as well. Likewise for (1:)
.
The graphs that we get when working with lists are fairly simple. As a more involved example, here is what happens in memory when we run the binary tree insertion example from earlier in this lecture.
insert :: Int -> Tree Int -> Tree Int
insert x Empty = Node x Empty Empty
insert x (Node y l r)
| x < y = Node y (insert x l) r
| x > y = Node y l (insert x r)
| otherwise = Node y l r
Note how the old and the new tree share the subtree with 3 and 4 since it wasn’t changed, but the node 7 that was “changed” and all nodes above it get copied.
Self Checks
Check 1
Why can’t we map Nothing
?
- Because
Nothing
doesn’t take arguments - Because
Nothing
returns nothing - Because
Nothing
is a constructor.
Check 2
If we define data Boing = Frick String Boing (Int -> Bool)
, what is
the type of Frick
?
Boing
String -> Boing -> Int -> Bool -> Boing
String -> Boing -> (Int -> Bool) -> Boing
Check 3
If we define data ThreeLists a b c = ThreeLists [a] [b] [c]
, what is
the type of the constructor ThreeLists
?
[a] -> [b] -> [c] -> ThreeLists
a -> b -> c -> ThreeLists a b c
[a] -> [b] -> [c] -> ThreeLists a b c
[a] -> [b] -> [c] -> ThreeLists [a] [b] [c]
Check 4
If we define:
data TwoLists a b = TwoList { aList :: [a]
, bList :: [b] }
what is the type of the function aList
?
aList
is not a function, it is a fieldTwoLists a b -> [a]
[a] -> TwoLists a b
[a]
Acknowledgment:
This reading was originally written by Joel Kaasinen and John Lång in their open source textbook. The book is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, which allows users to copy, modify, and distribute the book.
This reading was modified by Titus H. Klinge in 2021 and presented above under the same license in order to better serve the students of this course.