1.3. B Tree

1.3.1. General

A B-tree has interior nodes and leaf nodes. The leaf nodes have no children. The interior nodes have children. The root node is either an interior node (if it has children) or a leaf node (if it doesn’t have children).

Each node carries keys (or key value pairs). The keys are sorted. Each key separates two children in the interior nodes. The child to the left of a key has only keys strictly lower than the key and the child to the right of a key has only keys strictly greater than the key.

Each interior node has one child more than the number of keys. The maximum number m of children in a B tree is called its order. The minimal order of a B tree is 3.

            Example: B-tree of order 3

                +---+
                | 4 |                             root node / interior node
                +---+
             /         \
    +---+                +---+---+
    | 2 |                | 7 | 9 |                interior nodes
    +---+                +---+---+
    |   |               /    |    \
+---+   +---+   +---+---+  +---+   +---+
| 1 |   | 3 |   | 5 | 6 |  | 8 |   | 10|          leaf nodes
+---+   +---+   +---+---+  +---+   +---+

A B-tree of order m satisfies the following invariant:

  • Every node has at most m - 1 keys and m children in case of an interior node.

  • Every node (except the root node) has at least ceil (m / 2) - 1 keys and ceil (m / 2) children in case of an iterior node.

  • The tree is balanced in the sense that all leaves appear at the same level.

  • The tree is sorted in the sense that all keys of the child to the left of a key are strictly smaller than the key and all keys of the child to the right of a key are strictly greater than the key.

Note that m / 2 is not an integral value if m is an odd number. Therefore ceil (m / 2) rounds the value up to its nearest integral number. Example: If the order is 3, then 3 / 2 = 1.5 and ceil (3 / 2) = 2. I.e. a B tree of order 3 has at least 2 children and 1 key and at most 3 children and 2 keys.

order

keys

children

3

1 - 2

2 - 3

4

1 - 3

2 - 4

5

2 - 4

3 - 5

6

2 - 5

3 - 6

32

15 - 31

16 - 32

According to the invariant each node except the root node has at least the minimal number of keys. Any interior node except the root node has at least the miminal number of children.

The keys (or key value pairs) and the pointers to the children are usually stored in arrays. If we choose a power of two for the order which represents the caches size of a processor (e.g. 32), then the arrays can be stored within a cache line which speeds up insertion an deletion and results in better data locality.

B trees can be used to implement finite maps i.e. to store and retrieve key value pairs or finite sets i.e. to store keys. The keys have to be sortable. In this section we describe the the implementation of finite map via B trees.

In ocaml we use a module functor of the following form:

module Map (Key: SORTABLE) =
struct

    let order = ...                             (* usually 32 for cache efficiency *)

    let odd_order: bool =                       (* is the order odd? *)
        assert (3 <= order);                    (* minimal order 3 is a must *)
        order / 2 * 2 < order

    let max_keys: int = order - 1               (* the maximal number of keys *)

    let min_keys: int =                         (* the minimal number of keys *)
        if odd_order then
            (order - 1) / 2
        else
            order / 2 - 1

    type 'a pairs = (Key.t * 'a) array

    type 'a t =
        | Leaf of 'a pairs                 (* key value pairs *)
        | Node of 'a pairs * 'a t array    (* key value pairs + children *)

    ...
    (* Search, insert and delete see below ... *)
end

with the module type SORTABLE which is defined in the module Fmlib_std.Ìnterfaces.

module type SORTABLE = sig
    type t

    val compare: t -> t -> int
    (** [compare a b]

        compare a b < 0         if and only if a < b
        compare a b = 0         if and only if a = b
        compare a b > 0         if and only if a > b
    *)

A leaf consists of an array of keys (or key value pairs)

   0    1    2                              len
+----+----+------------------------------+
| k0 | k1 | ...                          |
+----+----+------------------------------+

where k0, k1, … k(len-1) are the keys of the leaf.

An interior node consists of an array of keys and an array of children where the array of children has one more element than the array of keys.

   0    1    2                              len
+----+----+------------------------------+
| k0 | k1 | ...                          |
+----+----+------------------------------+
|    |    |                              |
c0   c1   c2                             c(len)

c0, c1, c2, … c(len) are the children with the property that all keys in the child c1 are strictly less than the key k1 and all the keys in the child c2 are strictly greater than the key k1.

1.3.3. Insertion

All nodes have at most m - 1 keys and m children where m is the order of the B tree. A node which has exactly m - 1 keys (and m children in case of an interior node) is full.

Insertion always starts in a leaf node. If the leaf node is full, then the insertion causes an overflow. The overflow condition might pop up to the root. In that situation the height of the tree grows.

An insertion of a key value pair in a tree which has already a key value pair with the same key causes a nondestructive overwrite of the old value by the new value. The structure of the B tree remains the same, just the value of the corresponding key value pair will be updated.

A proper insertion of a new key value pair starts by searching the leaf node and the position in the leaf node where to insert the new pair. The result of the insertion is described by the data type

type 'a insert =
    | Normal_insert of 'a t
    | Split_insert of 'a t * (Key.t * 'a) * 'a t

I.e. we insert the new pair into a leaf node and either return a new leaf node if there is enough room in the node or a splitted leaf node with a new key value pair which has to be inserted into the corresponding parent or a splitted leaf node which consist of a left leaf node, a popup key value pair and a right leaf node.

The insertion into the parent can either end in a normal insert or in a split insert.

During insertion the invariant is maintained that the popup key separates all the key value pairs of the left tree from the key value pairs of the right tree.

The basic insertion function looks like

let add (key: Key.t) (value: 'a) (map: 'a t): 'a t =
    match add_aux key value map with
    | Normal_insert map ->
        map
    | Split_insert (left, popup_key, right) ->
        (* tree grows at the root *)
        Node ([| popup_key |], [| left; right |]

If the splitting reaches the root, then a new root is created with one key value pair (the popup pair) and two children.

The basic insertion function uses the auxiliary function add_aux which implements the recursive algorithm.

let rec add_aux (key: Key.t) (value: 'a) (map: 'a t): 'a insert =
    match map with
    | Leaf pairs ->
        add_in_leaf key value pairs

    | Node (pairs, children) ->
        let i, exact = bsearch key pairs in
        if exact then
            (* An exact match has been found. Therefore update the value. *)
            let pairs = Array.replace i (key,value) pairs in
            Normal_insert (Node (pairs, children))
        else
            (** Add the key value pair into the [i]th child. *)
            match add_aux key value children.(i) with
            | Normal_insert child ->
                let children = Array.replace i child children in
                Normal_insert (Node (pairs, children))
            | Split_insert (u, y, v) ->
                add_in_node i u y v pairs children

The function add_aux uses the two helper functions add_in_leaf and add_in_node to insert the key value pair either into a leaf node or an interior node.

1.3.3.1. Insertion into a leaf node

The insertion into a leaf node is easy, if the leaf is not full. In case of overflow we have to distinguish several cases.

In the following we assume that we want to insert a key value pair y at position i into a full leaf node. As a result we want to get a triple (left, popup_key, right) where left is the left tree after the split, right is the right key after the split and the key in the key value pair popup_key separates the keys in left from the keys in right.

Now we analyze the different cases.

Overflow, odd order:

m = 2 * k + 1, i.e. we have 2 * k keys and the array of the key value pairs consists of two subarrays of size k. At the right end of the left subarray there is a key s1 and at the left end of the right subarray there is a key s2.

We have to distinguish the cases i = k, i < k and i > k.

i = k:

In that case we have s1 < y < s2. We split the array into the two subarrays of size k and use y as the popup key.

before:
                        i
                        k                      2k
+-----------------+---+---+------------------+
|                 | s1| s2|                  |
+-----------------+---+---+------------------+


after:
                      +---+
                      | y |
                      +---+
                      |   |
+-----------------+---+   +---+------------------+
|                 | s1|   | s2|                  |
+-----------------+---+   +---+------------------+
i < k:

In that case we have y < s1 and we have to insert y into the left subarray and use s1 as the popup key.

before:
         i              k                      2k
+------+---+------+---+---+------------------+
|      | z |      | s1| s2|                  |
+------+---+------+---+---+------------------+


after:
                      +---+
                      | s1|
                      +---+
         i            |   |
+------+---+---+------+   +---+------------------+
|      | y | z |      |   | s2|                  |
+------+---+---+------+   +---+------------------+
i > k:

In that case we have s2 < y and we have to insert y into the right subarray and use s2 as the popup key.

before:
                        k        i             2k
+-----------------+---+---+----+---+---------+
|                 | s1| s2|    | z |         |
+-----------------+---+---+----+---+---------+


after:
                      +---+
                      | s2|
                      +---+
                      |   |
+-----------------+---+   +-----+---+---+--------+
|                 | s1|   |     | y | z |        |
+-----------------+---+   +-----+---+---+--------+
Overflow, even order:

m = 2 * k, i.e. we have 2 * k - 1 key value pairs and the array of key value pairs consists of two subarrays of size k - 1 with a single separator pair s which separates the two subarrays.

We have to distinguish the cases i < k and i >= k. In both cases s can be used as the popup key. The insertion of y happens either in the left or in the right subarray.

i < k:

before:
            i            k
+---------+---+----+---+------------------+
|         | z |    | s |                  |
+---------+---+----+---+------------------+


after:

                        +---+
                        | s |
                        +---+
            i           |   |
+---------+---+---+-----+   +------------------+
|         | y | z |     |   |                  |
+---------+---+---+-----+   +------------------+

i >= k:

before:
                         k    i
+------------------+---+----+---+---------+
|                  | s |    | z |         |
+------------------+---+----+---+---------+


after:

                   +---+
                   | s |
                   +---+
                   |   |
+------------------+   +-----+---+---+---------+
|                  |   |     | y | z |         |
+------------------+   +-----+---+---+---------+

The following function does the insertion into a leaf node.

let add_in_leaf (key: Key.t) (value: 'a) (pairs: 'a pairs): 'a insert =
    let len = Array.length pairs in
    let i, exact = bsearch key pairs in
    if exact then
        Normal_insert (Leaf (Array.replace i (key, value) pairs))

    else if len < max_keys then
        (* Leaf is not full. *)
        Normal_insert (Leaf (Array.insert i (key, value) pairs))

    else
        (* Leaf is full *)
        let insert_subarray = insert_subarray pairs i (key, value)
        and k = order / 2
        in
        if odd_order then
            if i = k then
                let left  = subarray pairs 0 k
                and right = subarray pairs k len
                in
                Split_insert (Leaf left, (key, value), Leaf right)
            else if i < k then
                let left  = insert_subarray 0 (k - 1)
                and right = subarray pairs k len
                in
                Split_insert (Leaf left, pairs.(k - 1), Leaf right)
            else
                let left  = subarray pairs 0 k
                and right = insert_subarray (k + 1) len
                in
                Split_insert (Leaf left, pairs.(k), Leaf right)
        else begin
            (* even order *)
            if i < k then
                let left  = insert_subarray 0 (k - 1)
                and right = subarray pairs k len
                in
                Split_insert (Leaf left, pairs.(k - 1), Leaf right)
            else
                let left  = subarray pairs 0 (k - 1)
                and right = insert_subarray k len
                in
                Split_insert (Leaf left, pairs.(k - 1), Leaf right)
        end

Using the two helper functions subarray and insert_subarray.

subarray (arr: 'a array) (start: int) (beyond: int): 'a t =
    (* The subarray of [arr] starting at [start] and ending one before [beyond]. *)
    assert (0 <= start);
    assert (start <= beyond);
    assert (beyond <= Array.length arr);
    Array.sub arr start (beyond - start)


let insert_subarray
        (arr: 'a array) (i: int) (x: 'a) (start: int) (beyond: int)
    : 'a array
    =
    (* The subarray of [arr] starting at [start] and ending one before [beyond]
       with [x] inserted at position [i]. *)
    assert (0 <= start);
    assert (start <= i);
    assert (i <= beyond);
    assert (beyond <= Array.length arr);
    let arr2 = Array.make (beyond - start + 1) x in
    Array.blit arr start arr2 0 (i - start);
    Array.blit arr i arr2 (i - start + 1) (beyond - i);
    arr2

1.3.3.2. Insertion into an interior node

In this subsection we treat the situation that a key value pair has been inserted into the i th child t of an interior node and the insertion into the child t caused an overflow and resulted in the triple (u, y, v) where u and v are two valid B trees separated by the popup key value pair y. Furthermore the interior node is full and must be splitted as well.

The case distinctions are the same as in the insertion into a full leaf node with the additional complexity that the child nodes have to be treated.

Overflow, odd order:

m = 2 * k + 1. The array of the key value pairs consists of two subarray of equal size k with the key value pair s1 ending the left subarray and s2 beginning the right subarray.

We have to distinguish the cases i = k, i < k and i > k.

i = k:

s1 is a strict lower bound of all keys in t and therefore for all keys in (u, y, v). s2 is a strict upper bound. We split the interior node into the two subarrays and use u as the last child in the left part and v as the first child in the right part. We use y as the popup key.

before:
                        i
                        k                      2k
+-----------------+---+---+------------------+
|                 | s1| s2|                  |
+-----------------+---+---+------------------+
                      |
                      t


after:
                      +---+
                      | y |
                      +---+
                      |   |
+-----------------+---+   +---+------------------+
|                 | s1|   | s2|                  |
+-----------------+---+   +---+------------------+
                      |   |
                      u   v
i < k:

In that case we have to insert the popup key y at position i of the key value pairs of the interior node, replace the i``th child ``t by u and use v as the additionally needed child.

before:
         i              k                      2k
+------+---+------+---+---+------------------+
|      | z |      | s1| s2|                  |
+------+---+------+---+---+------------------+
       |
       t


after:
                      +---+
                      | s1|
                      +---+
         i            |   |
+------+---+---+------+   +---+------------------+
|      | y | z |      |   | s2|                  |
+------+---+---+------+   +---+------------------+
       |   |
       u   v
i > k:

The same as before just with insertion into the right part of the interior node.

before:
                        k        i             2k
+-----------------+---+---+----+---+---------+
|                 | s1| s2|    | z |         |
+-----------------+---+---+----+---+---------+
                               |
                               t


after:
                      +---+
                      | s2|
                      +---+
                      |   |
+-----------------+---+   +-----+---+---+--------+
|                 | s1|   |     | y | z |        |
+-----------------+---+   +-----+---+---+--------+
                                |   |
                                u   v
Overflow, even order:

m = 2 * k.

We have to distinguish the cases i < k and i >= k.

i < k:

before:
            i            k
+---------+---+----+---+------------------+
|         | z |    | s |                  |
+---------+---+----+---+------------------+
          |
          t


after:

                        +---+
                        | s |
                        +---+
            i           |   |
+---------+---+---+-----+   +------------------+
|         | y | z |     |   |                  |
+---------+---+---+-----+   +------------------+
          |   |
          u   v

i >= k:

before:
                         k    i
+------------------+---+----+---+---------+
|                  | s |    | z |         |
+------------------+---+----+---+---------+
                            |
                            t


after:

                   +---+
                   | s |
                   +---+
                   |   |
+------------------+   +-----+---+---+---------+
|                  |   |     | y | z |         |
+------------------+   +-----+---+---+---------+
                             |   |
                             u   v

The insertion function which inserts into an interior node reads like

let add_in_node
        (i: int)
        (left: 'a t)
        (pair: Key.t * 'a)
        (right: 'a t)
        (pairs: 'a pairs)
        (children: 'a t array)
    : 'a insert
    =
    let len = Array.length pairs in
    if len < max_keys then
        let pairs = Array.insert i pair pairs
        and children = Array.insert i left children
        in
        children.(i + 1) <- right;
        Normal_insert (Node (pairs, children))
    else
        (* Node is full. *)
        let k = order / 2
        and insert_subarray = insert_subarray pairs i pair
        and split_subarray start beyond =
            split_subarray children i left right start beyond
        in
        if odd_order then
            if i = k then
                let left_pairs     = subarray pairs    0 k
                and left_children  = subarray children 0 (k + 1)
                and right_pairs    = subarray pairs    k len
                and right_children = subarray children k (len + 1)
                in
                left_children.(k)  <- left;
                right_children.(0) <- right;
                Split_insert (
                    Node (left_pairs, left_children),
                    pair,
                    Node (right_pairs, right_children))
            else if i < k then
                let left_pairs     = insert_subarray 0 (k - 1)
                and left_children  = split_subarray  0 k
                and right_pairs    = subarray pairs    k len
                and right_children = subarray children k (len + 1)
                in
                Split_insert (
                    Node (left_pairs, left_children),
                    pairs.(k - 1),
                    Node (right_pairs, right_children))
            else begin
                let left_pairs     = subarray pairs    0 k
                and left_children  = subarray children 0 (k + 1)
                and right_pairs    = insert_subarray (k + 1) len
                and right_children = split_subarray  (k + 1) (len + 1) in
                Split_insert (
                    Node (left_pairs, left_children),
                    pairs.(k),
                    Node (right_pairs, right_children))
            end
        else begin
            (* even order *)
            if i < k then
                let left_pairs     = insert_subarray 0 (k - 1)
                and left_children  = split_subarray  0 k
                and right_pairs    = subarray pairs    k len
                and right_children = subarray children k (len + 1)
                in
                Split_insert (
                    Node (left_pairs, left_children),
                    pairs.(k - 1),
                    Node (right_pairs, right_children))
            else
                let left_pairs     = subarray pairs    0 (k - 1)
                and left_children  = subarray children 0 k
                and right_pairs    = insert_subarray k len
                and right_children = split_subarray  k (len + 1)
                in
                Split_insert (
                    Node (left_pairs, left_children),
                    pairs.(k - 1),
                    Node (right_pairs, right_children))
        end

with the additional helper function

let split_subarray
        (arr: 'a array) (i: int) (x: 'a) (y: 'a) (start: int) (beyond: int)
    : 'a array
    =
    (* The subarray of [arr] starting at [start] and ending one before [beyond]
       with [x] inserted at position [i] and the original value at position
       [i] replaced by [y]. *)
    assert (i < beyond);
    let arr = insert_subarray arr i x start beyond in
    arr.(i - start + 1) <- y;
    arr

1.3.4. Deletion

1.3.4.1. Basic Deletion

As with insertion, an actual deletion can only be done on a leaf node.

If the to be deleted key value pair is located in an interior node, then it cannot be deleted directly. In order to delete a key value pair in an interior node, we have to find a direct neighbour (predecessor or successor with respect to the order), which is always in a leaf node and delete the direct neighbour. Since the deleted key value pair is a direct neighbour of the to be deleted key value pair we can substitute the deleted key value pair for the key value pair in the interior node without disturbing the order.

Furthermore we have to handle a possible underflow condition. The invariant of a B tree requires that each leaf node or interior node which is not the root of the tree must have a minimal number of keys (and a minimal number of children in case of an interior node).

If a node underflows because of the deletion of a key value pair, the missing key value pair can be pulled from the parent and the missing child from the sibling. This can require to merge the underflowing child with its sibling (details see below).

The reparation might cause the parent to underflow. The reparation of the underflow condition of the parent can be done with the help of the parent of the parent etc. The underflow condition might popup recursively to the root.

An underflow condition in the root node is no problem except the case that the root becomes empty (i.e. no key value pairs and just one child). In that case the height of the tree shrinks at the root and the child becomes the new root.

In order to handle deletion we use the datastructure:

type 'a delete = {
    tree:  'a t;        (* The tree with the deleted key value pair. *)
    pair:   Key.t * 'a; (* The deleted key value pair. *)
    underflow: bool;    (* one key less than the minimal number *)
}

The basic deletion looks like:

let remove (key: Key.t) (map: 'a t): 'a t =
    match remove_aux key map with
    | None ->
        map
    | Some d ->
        match d.tree with
        | Node (pairs, children) when Array.is_empty pairs ->
            (* tree shrinks at the root *)
            children.(0)
        | _ ->
            d.tree

It uses an auxiliary function remove_aux which returns an optional 'a delete structure. The auxiliary function returns None if the tree does not have a key value pair with the desired key. In case that the tree contains a key value pair with the desired key, a tree is returned with the key value pair deleted. The function remove has to check, if the root node is an interior node with no key value pairs. In that case the tree shrinks in height and the only child becomes the new root.

The function remove_aux implements the recursion.

let rec remove_aux (key: Key.t) (map: 'a t): 'a delete option =
    match map with
    | Leaf pairs ->
        let i, exact = bsearch key pairs in
        if exact then
            let pair =  pairs.(i)
            and pairs = Array.remove i pairs
            and underflow = Array.length pairs <= min_keys
            in
            Some {
                tree = Leaf pairs;
                pair;
                underflow
            }
        else
            None

    | Node (pairs, children) ->
        let i, exact = bsearch key pairs in
        if exact then
            let d = remove_last children.(i) in
            let pair  = pairs.(i)
            and pairs = Array.replace i d.pair pairs in
            Some (handle_delete i pair pairs children d)
        else
            Option.map
                (fun d -> handle_delete i d.pair d pairs children)
                (remove_aux key children.(i))

The function searches in a leaf node and in an interior node for the key value pair which has to be deleted.

In case of an exact match, the key is deleted. In a leaf node the deletion is straightforward. In an interior node the deletion cannot be done directly. The last key value pair in the child to the left of the to be deleted key value pair is deleted with the help of the function remove_last and the last key value pair is substituted for the to be deleted key value pair. Remember that the last key value pair in the child to the left of the to be deleted key value pair are direct neighbours.

The function handle_delete checks for an underflow condition and does the reparation if needed.

If the search does not find an exact match in a leaf node, then no key value pair with the to be deleted key exists and nothing has to be done.

If the search does not find an exact match in an interior node, then the deletion continues in the child to the left of the key.

The code of the function handle_delete is straightforward and uses the helper function handle_underflow to handle an underflow condition.

let handle_delete
        (i: int)                (* Index of the child where the deletion occurred. *)
        (pair: Key.t * 'a)      (* The deleted key value pair. *)
        (d: 'a delete)          (* The new tree with the key value pair deleted. *)
        (pairs: 'a pairs)       (* The key value pairs of the parent. *)
        (children: 'a t array)  (* The children of the parent. *)
    : 'a delete
    =
    if not d.underflow then
        {
            tree = Node (pairs, Array.replace i d.tree children);
            pair;
            underflow = false
        }
    else
        let len = Array.length pairs in
        if i < len then
            handle_underflow i true d.tree children.(i + 1) pair pairs children
        else
            let i = i - 1 in
            handle_underflow i false children.(i) d.tree pair pairs children

If the deletion in the child has not caused an underflow in the chid, the new child is substituted for the old child. This cannot case an underflow in the parent either.

If the deletion in the child has caused an underflow, then a sibling has to be used to repair the underflow.

As long as the child is not the last child of the parent, there is a right sibling. The new child and its right sibling are passed to the function handle_underflow with the indication that the underflow happened in the first tree of the two siblings.

If the child is the last child of the parent, then we use its left sibling and hand it over to the function handle_underflow.

Since a valid B tree interior node has at least one key value pair and two children (even the root!) there is always either a left or a right sibling.

1.3.4.2. Handling of Underflow

In this section we treat the case that the deletion in one of the children of a parent node caused an underflow and how this underflow in the child can be repaired with the help of the parent and a sibling. We assume that the underflow happend in the left child. The situation where the underflow happened in the right child is symmetrical.

                                       i
               +---------------------+---+--------------------+
parent node    |                     | y |                    |
               +---------------------+---+--------------------+
                                     |   |
                     +-----------+---+   +---+-----------------+
siblings             |           | x |   | z |                 |
                     +-----------+---+   +---+-----------------+
                                     |   |
                                     u   v

                        underflow           no underflow

In the picture it is assumed that the siblings are interior nodes and have children. If the children are leaf nodes, then the algorithm is the same. Just ignore the children of the siblings.

The left sibling has exactly one missing key value pair and exactly one missing child. Reason: Since both are not the root node they have at least the minimal number of keys and children before the deletion. Deletion happened in the left sibling i.e. the left sibling had the minimal number of keys and children before the deletion. Otherwise no underflow would have happened.

We have to distinguish two cases: The right sibling is not minimal of the right sibling is minimal.

Right sibling is not minimal:

In that case we can chop off the first child v and the first key value pair z from the right sibling without violating the B tree invariant. The key value pair z can be pushed up to the parent and key value pair y from the parent and the B tree v can be appended to the underflowing chid to repair the situation.

rotation before:

                                           i
                   +---------------------+---+--------------------+
    parent node    |                     | y |                    |
                   +---------------------+---+--------------------+
                                         |   |
                         +-----------+---+   +---+-----------------+
    siblings             |           | x |   | z |                 |
                         +-----------+---+   +---+-----------------+
                                         |   |
                                         u   v
                             underflow           no underflow

rotation after:

                                           i
                   +---------------------+---+--------------------+
    parent node    |                     | z |                    |
                   +---------------------+---+--------------------+
                                         |   |
                     +-----------+---+---+   +-----------------+
    siblings         |           | x | y |   |                 |
                     +-----------+---+---+   +-----------------+
                                     |   |
                                     u   v
Right sibling is minimal:

In that case we cannot chop off a key value pair and a child from the right sibling without violating the B tree invariant. The only possibility is to merge the two siblings.

In order to merge the two sibling successfully, we have to push down the separator key value pair from the parent. This might cause an underflow in the parent depending on the size of the parent. A possible underflow in the parent must be repaired at the level of the parent of the parent.

The parent looses one key value pair and one child with the merge. If the parent is the root node, the key value pair might be the last one leaving the parent with the merged child as the only child. In that case we can throw away the parent and use the merged child as the new root.

The merge can never create an overflow. The merged node has 2 * min_keys key value pairs which is either the maximal number of keys or one less than the maximal number of keys.

merge before:

                                           i
                   +---------------------+---+--------------------+
    parent node    |                     | y |                    |
                   +---------------------+---+--------------------+
                                         |   |
                         +-----------+---+   +---+-----------------+
    siblings             |           | x |   | z |                 |
                         +-----------+---+   +---+-----------------+
                                         |   |
                                         u   v
                             underflow           minimal


merge after:

                                           i
                   +---------------------+--------------------+
    parent node    |                     |                    |
                   +---------------------+--------------------+
                                         |
                         +-----------+---+---+---+-----------------+
    siblings             |           | x | y | z |                 |
                         +-----------+---+---+---+-----------------+
                                         |   |
                                         u   v

The following function handles the underflow condition properly in all cases.

let handle_underflow
        (i: int)                (* Index of the child where the deletion occurred. *)
        (underflow_left: bool)  (* Underflow happend in the left child? *)
        (left_child: 'a t)
        (right_child: 'a t)
        (pair: Key.t * 'a)      (* The deleted key value pair. *)
        (pairs: 'a pairs)       (* The key value pairs of the parent. *)
        (children: 'a t array)  (* The children of the parent. *)
    : 'a delete
    =
    let not_minimal pairs1 pairs2 =
        if underflow_left then
            not_minimal pairs2
        else
            not_minimal pairs1
    in
    match left_child, right_child with
    | Leaf pairs1, Leaf pairs2 when not_minimal pairs1 pairs2 ->
        (* Right sibling is not minimal, rotate *)
        let pairs1, pairs, pairs2 =
            rotate_keys underflow_left i pairs1 pairs pairs2
        in
        let children =
            replace2 i (Leaf pairs1) (Leaf pairs2) children
        in
        {tree = Node (pairs, children); pair; underflow = false}

    | Leaf pairs1, Leaf pairs2 ->
        (* Sibling is minimal, merge *)
        merge_leaves i pair pairs1 pairs2 pairs children

    | Node (pairs1, children1), Node (pairs2, children2)
        when not_minimal pairs1 pairs2
        ->
        (* Sibling is not minimal, rotate *)
        let pairs1, pairs, pairs2 =
            rotate_keys underflow_left i pairs1 pairs pairs2
        and children1, children2 =
            rotate_children underflow_left children1 children2
        in
        let children =
            replace2
                i
                (Node (pairs1, children1))
                (Node (pairs2, children2))
                children
        in
        {tree = Node (pairs, children); pair; underflow = false}

    | Node (pairs1, children1), Node (pairs2, children2) ->
        (* Sibling is minimal, merge *)
        merge_nodes
            i pair
            pairs1 children1
            pairs2 children2
            pairs children

    | _, _ ->
        assert false (* Cannot happen, tree is balanced. *)

The function uses the following helper functions:

let not_minimal (pairs: 'a pairs): bool =
    min_keys < Array.length pairs


let replace2
        (i: int) (left: 'a t) (right: 'a t) (children: 'a t array)
    : 'a t array
    =
    let children = Array.copy children in
    assert (Array.valid_index i children);
    assert (Array.valid_index (i + 1) children);
    children.(i)     <- left;
    children.(i + 1) <- right;
    children


let rotate_keys
        (to_left: bool)
        (i: int) (left: 'a pairs) (parent: 'a pairs) (right: 'a pairs)
    : 'a pairs * 'a pairs * 'a pairs
    =
    let open Array in
    assert (valid_index i parent);
    if to_left then
        push parent.(i) left,
        replace i (first right) parent,
        remove_first right
    else
        remove_last left,
        replace i (last left) parent,
        push_front parent.(i) right


let rotate_children
        (to_left: bool)
        (left: 'a t array) (right: 'a t array)
    : 'a t array * 'a t array
    =
    let open Array in
    if to_left then
        push (first right) left,
        remove_first right
    else
        remove_last left,
        push_front (last left) right



let merge_keys
        (i: int) (left: 'a pairs) (parent: 'a pairs) (right: 'a pairs)
    : 'a pairs * 'a pairs
    =
    assert (Array.valid_index i parent);
    let len_left  = Array.length left
    and len_right = Array.length right
    in
    let merged = Array.make (len_left + 1 + len_right) parent.(i)
    and parent = Array.remove i parent
    in
    Array.blit left  0 merged 0 len_left;
    Array.blit right 0 merged (len_left + 1) len_right;
    merged, parent


let merge_leaves
        (i: int)
        (pair: Key.t * 'a)
        (pairs1: 'a pairs) (pairs2: 'a pairs)
        (pairs: 'a pairs) (children: 'a t array)
    : 'a delete
    =
    assert (i + 1 < Array.length children);
    let merged, pairs = merge_keys i pairs1 pairs pairs2
    and children      = Array.remove i children
    and underflow     = Array.length pairs <= min_keys
    in
    children.(i) <- Leaf merged;
    {tree = Node (pairs, children); pair; underflow}



let merge_nodes
        (i: int)
        (pair: Key.t * 'a)
        (pairs1: 'a pairs) (children1: 'a t array)
        (pairs2: 'a pairs) (children2: 'a t array)
        (pairs: 'a pairs) (children: 'a t array)
    : 'a delete
    =
    assert (i + 1 < Array.length children);
    let pairs_new, pairs = merge_keys i pairs1 pairs pairs2
    and children      = Array.remove i children
    and underflow     = Array.length pairs <= min_keys
    and children_new  = Array.append children1 children2
    in
    children.(i) <- Node (pairs_new, children_new);
    {tree = Node (pairs, children); pair; underflow}