Algebraic data types in Python

Tags: / python /

Creating an algebraic data type as in functional programming languages, but within Python.


Recently, I found a dataset of regular expressions used with python's re built-in package.

I kind of wished to use those regexes from ocaml. Randomly searched online and came across an answer in this stackoverflow post.

Was surprised to see that we could actually have a form of algebraic data type in python. So I had a closer look.

(Yeah, instead of my main goal, I went on a side-quest.. Not so proud to say that it's not my first time.. 😅)

I had not used python much after Python 3.9 or so. It's come a long way since then (latest stable is 3.12.5).

Some things that I liked:

In a way, python's type hints was what made me first realize the importance of types. To think of types as actually guiding the program rather than the nuisance that I used to think they were. It took a language without static type checking for me to realize value of type checks. But that was before I came in contact with Coq, Haskell, sml and ocaml.

Python does have a lot of stuff inspired from the world of functional style of programming, after all.

—

Anyhow, as an example, I will show how a 'ADT-like' type representing binary tree can be made in Python. I kind of follow the description here used during a programming course at our institute. That course was taught by my advisor (with me as one of the TAs) and used sml.

Tree type

In sml, a tree type can be made like this:

datatype 'a tree
  = Empty
  | Node of ('a tree) * 'a * ('a tree)

Let's try replicating this in Python.

from __future__ import annotations
import dataclasses

@dataclasses.dataclass
class Node[T]:
   left: Tree[T]
   value: T
   right: Tree[T]

type Tree[T] = Node[T] | None

Observe that this is a recursive type (Python 3.12 needs stuff from future for this).

Looks like None is kind of special in Python. It seems to be a value of type NoneType, but it looked as if NoneType is not meant to be used by users. So, it seemed like in this case, None is both a value and a type. Kind of like () in Haskell.

Anyway, let's see if this working.

# Type of binary trees
# sml analogue: 'a tree
>>> Tree
Tree

>>> # Type of binary trees with int values
# sml analogue: int tree
>>> Tree[int]
Tree[int]

>>> # Type of binary trees with boolean values
# sml analogue: bool tree
>>> Tree[int]
>>> Tree[bool]
Tree[bool]

Python interpreter seems happy.

Now let's make some values.

First, to make it easy to create values, let's make a helper function singleton to make a tree with a single value in it.

def singleton[T](val: T) -> Tree[T]:
    return Node(None, val, None)

Now on to actually making some trees:

>>> egbool: Tree[bool] = Node(singleton(true), false, singleton(false))
>>> egbool
Node(left=Node(left=None, value=True, right=None), value=False, right=Node(left=None, value=False, right=None))

Note that python by itself doesn't do any static type checking. So this would also run without error.

# Type is boolean tree, but value is int tree!
>>> egboolBad: Tree[bool] = Node(singleton(0), 1, singleton(2))
>>> egboolBad
Node(left=Node(left=None, value=0, right=None), value=1, right=Node(left=None, value=2, right=None))

Got to run a static typer checker to catch this error. There are quite a few. Among them mypy, pyre, pytype, pyright are open source.

I tried mypy:

$ mypy tree.py --enable-incomplete-feature=NewGenericSyntax

tree.py:16: error: Argument 1 to "singleton" has incompatible type "int"; expected "bool"  [arg-type]
tree.py:16: error: Argument 2 to "Node" has incompatible type "int"; expected "bool"  [arg-type]
Found 2 errors in 1 file (checked 1 source file)

Syntax that is used to define Tree uses a syntax that is new in Python 3.12 (PEP 695). Static type checkers like mypy has not yet finished incorporating this change apparently due to still supporting older codeĘł. Hence the extra option passed to the mypy command.

Without the --enable-incomplete-feature=NewGenericSyntax, I got this:

error: PEP 695 generics are not yet supported. Use --enable-incomplete-feature=NewGenericSyntax for experimental support  [valid-type]

—

Okay, let's create a function on Tree[T] type. Say, in-order traversal. A function that accepts a tree and returns a list of its values.

from typing import assert_never

def inorder[T](tree: Tree[T]) -> list[T]:
    rval = []
    match tree:
        case None: pass
        case Node(lft, val, rgt):
            rval += inorder(lft) + [val] + inorder(rgt)
        case _:
            assert_never(tree)
    return rval

This uses the match .. case (patmat) syntax for structural pattern matching. Like the kind that we are used to in functional languages like sml. Not as cool maybe, but still nice.. 🙂 A tutorial for this syntax of Python is available here.

Also, see the way the T type is put in here. Kind of like an implicit argument in a Coq function. Or of a typeclass in Haskell.

(Thanks to PEP 695, I guess. Earlier T would have had to be declared separately with TypeVar. And sometimes we would also have had to use Generic from typing package).

assert_never is meant to be used only for static type checkers as a way to assert that some part of code will never be reached. See this.

Trying an example with inorder:

>>> egInt1: Tree[int] = Node(singleton(0), 1, singleton(2))
>>> egInt2: Tree[int] = Node(singleton(4), 5, singleton(6))
>>> egInt: Tree[int] = Node(egInt1, 3, egInt2)
>>> print(inorder(egInt))

[0, 1, 2, 3, 4, 5, 6]

Similarly, pre-order and post-order traversal functions can be defined.

Next, let's try making a function to map values in a Tree[T].

from typing import Callable

def treemap[A, B](
        func: Callable[[A], B],
        tree: Tree[A]) -> Tree [B]:
    match tree:
        case None:
            return None
        case Node(lft, val, rgt):
            lftmod = treemap(func, lft)
            rgtmod = treemap(func, rgt)
            return Node(lftmod, func(val), rgtmod)
        case _:
            assert_never(tree)

Callable[[A], B] indicates type of a 'callable' value (like a function) which accepts a value of type A and returns a value of type B.

The above function in sml could be like:

fun treemap f Empty = Empty
  | treemap f (Node (lft, x, rgt)) =
      let val lftmod = treemap f lft in
        let val rgtmod = treemap f rgt in
          Node (lftmod, f x, rgtmod)
        end
      end

A whole program

A complete python file with the definitions seen so far:

from __future__ import annotations
import dataclasses

from typing import Callable, assert_never

@dataclasses.dataclass
class Node[T]:
   left: Tree[T]
   value: T
   right: Tree[T]

   def __iter__(self):
       return iter(inorder(self))

   def __str__(self) -> str:
       return ' '.join(map(str, self))

# Type of trees with values of type `T`
type Tree[T] = Node[T] | None

def singleton[T](val: T) -> Tree[T]:
    return Node(None, val, None)

def inorder[T](tree: Tree[T]) -> list[T]:
    rval = []
    match tree:
        case None: pass
        case Node(lft, val, rgt):
            rval += inorder(lft) + [val] + inorder(rgt)
        case _:
            assert_never(tree)
    return rval

def preorder[T](tree: Tree[T]) -> list[T]:
    rval = []
    match tree:
        case None: pass
        case Node(lft, val, rgt):
            rval += [val] + postorder(lft) + postorder(rgt)
        case _:
            assert_never(tree)
    return rval

def postorder[T](tree: Tree[T]) -> list[T]:
    rval = []
    match tree:
        case None: pass
        case Node(lft, val, rgt):
            rval += postorder(lft) + postorder(rgt) + [val]
        case _:
            assert_never(tree)
    return rval

def treemap[A, B](
        func: Callable[[A], B],
        tree: Tree[A]) -> Tree [B]:
    match tree:
        case None:
            return None
        case Node(lft, val, rgt):
            lftmod = treemap(func, lft)
            rgtmod = treemap(func, rgt)
            return Node(lftmod, func(val), rgtmod)
        case _:
            assert_never(tree)

def rotateClockwise[T](tree: Tree[T]) -> Tree[T]:
    match tree:
        case Node(Node(llft, lval, lrgt), val, rgt):
            return Node(llft, lval, Node(lrgt, val, rgt))
        case _:
            return tree

Type checking with mypy:

$ mypy tree.py --enable-incomplete-feature=NewGenericSyntax
Success: no issues found in 1 source file

Versions of software used