1.3. B Tree
1.3.1. General
A B-tree has interior nodes and leaf nodes. The leaf nodes have no children. The interior nodes have children. The root node is either an interior node (if it has children) or a leaf node (if it doesn’t have children).
Each node carries keys (or key value pairs). The keys are sorted. Each key separates two children in the interior nodes. The child to the left of a key has only keys strictly lower than the key and the child to the right of a key has only keys strictly greater than the key.
Each interior node has one child more than the number of keys. The maximum
number m
of children in a B tree is called its order. The minimal order of
a B tree is 3.
Example: B-tree of order 3
+---+
| 4 | root node / interior node
+---+
/ \
+---+ +---+---+
| 2 | | 7 | 9 | interior nodes
+---+ +---+---+
| | / | \
+---+ +---+ +---+---+ +---+ +---+
| 1 | | 3 | | 5 | 6 | | 8 | | 10| leaf nodes
+---+ +---+ +---+---+ +---+ +---+
A B-tree of order m
satisfies the following invariant:
Every node has at most
m - 1
keys andm
children in case of an interior node.Every node (except the root node) has at least
ceil (m / 2) - 1
keys andceil (m / 2)
children in case of an iterior node.The tree is balanced in the sense that all leaves appear at the same level.
The tree is sorted in the sense that all keys of the child to the left of a key are strictly smaller than the key and all keys of the child to the right of a key are strictly greater than the key.
Note that m / 2
is not an integral value if m
is an odd number.
Therefore ceil (m / 2)
rounds the value up to its nearest integral number.
Example: If the order is 3, then 3 / 2 = 1.5
and ceil (3 / 2) = 2
. I.e.
a B tree of order 3 has at least 2 children and 1 key and at most 3 children and
2 keys.
order |
keys |
children |
3 |
1 - 2 |
2 - 3 |
4 |
1 - 3 |
2 - 4 |
5 |
2 - 4 |
3 - 5 |
6 |
2 - 5 |
3 - 6 |
32 |
15 - 31 |
16 - 32 |
According to the invariant each node except the root node has at least the minimal number of keys. Any interior node except the root node has at least the miminal number of children.
The keys (or key value pairs) and the pointers to the children are usually stored in arrays. If we choose a power of two for the order which represents the caches size of a processor (e.g. 32), then the arrays can be stored within a cache line which speeds up insertion an deletion and results in better data locality.
B trees can be used to implement finite maps i.e. to store and retrieve key value pairs or finite sets i.e. to store keys. The keys have to be sortable. In this section we describe the the implementation of finite map via B trees.
In ocaml we use a module functor of the following form:
module Map (Key: SORTABLE) =
struct
let order = ... (* usually 32 for cache efficiency *)
let odd_order: bool = (* is the order odd? *)
assert (3 <= order); (* minimal order 3 is a must *)
order / 2 * 2 < order
let max_keys: int = order - 1 (* the maximal number of keys *)
let min_keys: int = (* the minimal number of keys *)
if odd_order then
(order - 1) / 2
else
order / 2 - 1
type 'a pairs = (Key.t * 'a) array
type 'a t =
| Leaf of 'a pairs (* key value pairs *)
| Node of 'a pairs * 'a t array (* key value pairs + children *)
...
(* Search, insert and delete see below ... *)
end
with the module type SORTABLE
which is defined in the module Fmlib_std.Ìnterfaces
.
module type SORTABLE = sig
type t
val compare: t -> t -> int
(** [compare a b]
compare a b < 0 if and only if a < b
compare a b = 0 if and only if a = b
compare a b > 0 if and only if a > b
*)
A leaf consists of an array of keys (or key value pairs)
0 1 2 len
+----+----+------------------------------+
| k0 | k1 | ... |
+----+----+------------------------------+
where k0
, k1
, … k(len-1)
are the keys of the leaf.
An interior node consists of an array of keys and an array of children where the array of children has one more element than the array of keys.
0 1 2 len
+----+----+------------------------------+
| k0 | k1 | ... |
+----+----+------------------------------+
| | | |
c0 c1 c2 c(len)
c0
, c1
, c2
, … c(len)
are the children with the property
that all keys in the child c1
are strictly less than the key k1
and all
the keys in the child c2
are strictly greater than the key k1
.
1.3.2. Search
Since the keys in a B tree are sorted we can search within a B tree using binary search. Let’s assume we have a binary search function of the form
let bsearch (key: Key.t) (arr: 'a pairs): int * bool =
...
We can search within the leaf and the interior nodes the corresponding array of
key value pairs for the position of the key key
. The function returns the
position and an exact flag. The exact flag indicates if an exact match has been
found. If the flag is not set, then the search key is strictly smaller than the
key at the position. If the length is returned as the position, then we know
that all keys in the array are strictly smaller than the search key.
If no exact match can be found in a leaf node, then the key is not in the leaf.
If no exact match can be found in an interior node, then the search key is not
in the interior node. However it can be in the child at the corresponding
position. Note that the array of the children has always one more element than
the array of key value pairs. Therefore there is a valid child at the position
length pairs
.
The search algorithm can be implemented by a straightforward recursive function
let rec find_opt (key: Key.t) (map: 'a t): 'a option =
match map with
| Leaf pairs ->
let i, exact = bsearch key pairs in
if exact then
Some (snd pairs.(i))
else
None
| Node (pairs, children) ->
let i, exact = bsearch key pairs in
if exact then
Some (snd pairs.(i))
else
find_opt key children.(i)
1.3.3. Insertion
All nodes have at most m - 1
keys and m
children where m
is the
order of the B tree. A node which has exactly m - 1
keys (and m
children
in case of an interior node) is full.
Insertion always starts in a leaf node. If the leaf node is full, then the insertion causes an overflow. The overflow condition might pop up to the root. In that situation the height of the tree grows.
An insertion of a key value pair in a tree which has already a key value pair with the same key causes a nondestructive overwrite of the old value by the new value. The structure of the B tree remains the same, just the value of the corresponding key value pair will be updated.
A proper insertion of a new key value pair starts by searching the leaf node and the position in the leaf node where to insert the new pair. The result of the insertion is described by the data type
type 'a insert =
| Normal_insert of 'a t
| Split_insert of 'a t * (Key.t * 'a) * 'a t
I.e. we insert the new pair into a leaf node and either return a new leaf node if there is enough room in the node or a splitted leaf node with a new key value pair which has to be inserted into the corresponding parent or a splitted leaf node which consist of a left leaf node, a popup key value pair and a right leaf node.
The insertion into the parent can either end in a normal insert or in a split insert.
During insertion the invariant is maintained that the popup key separates all the key value pairs of the left tree from the key value pairs of the right tree.
The basic insertion function looks like
let add (key: Key.t) (value: 'a) (map: 'a t): 'a t =
match add_aux key value map with
| Normal_insert map ->
map
| Split_insert (left, popup_key, right) ->
(* tree grows at the root *)
Node ([| popup_key |], [| left; right |]
If the splitting reaches the root, then a new root is created with one key value pair (the popup pair) and two children.
The basic insertion function uses the auxiliary function add_aux
which
implements the recursive algorithm.
let rec add_aux (key: Key.t) (value: 'a) (map: 'a t): 'a insert =
match map with
| Leaf pairs ->
add_in_leaf key value pairs
| Node (pairs, children) ->
let i, exact = bsearch key pairs in
if exact then
(* An exact match has been found. Therefore update the value. *)
let pairs = Array.replace i (key,value) pairs in
Normal_insert (Node (pairs, children))
else
(** Add the key value pair into the [i]th child. *)
match add_aux key value children.(i) with
| Normal_insert child ->
let children = Array.replace i child children in
Normal_insert (Node (pairs, children))
| Split_insert (u, y, v) ->
add_in_node i u y v pairs children
The function add_aux
uses the two helper functions add_in_leaf
and
add_in_node
to insert the key value pair either into a leaf node or an
interior node.
1.3.3.1. Insertion into a leaf node
The insertion into a leaf node is easy, if the leaf is not full. In case of overflow we have to distinguish several cases.
In the following we assume that we want to insert a key value pair y
at
position i
into a full leaf node. As a result we want to get a triple
(left, popup_key, right)
where left
is the left tree after the split,
right
is the right key after the split and the key in the key value pair
popup_key
separates the keys in left
from the keys in right
.
Now we analyze the different cases.
- Overflow, odd order:
m = 2 * k + 1
, i.e. we have2 * k
keys and the array of the key value pairs consists of two subarrays of sizek
. At the right end of the left subarray there is a keys1
and at the left end of the right subarray there is a keys2
.We have to distinguish the cases
i = k
,i < k
andi > k
.i = k
:In that case we have
s1 < y < s2
. We split the array into the two subarrays of sizek
and usey
as the popup key.before: i k 2k +-----------------+---+---+------------------+ | | s1| s2| | +-----------------+---+---+------------------+ after: +---+ | y | +---+ | | +-----------------+---+ +---+------------------+ | | s1| | s2| | +-----------------+---+ +---+------------------+
i < k
:In that case we have
y < s1
and we have to inserty
into the left subarray and uses1
as the popup key.before: i k 2k +------+---+------+---+---+------------------+ | | z | | s1| s2| | +------+---+------+---+---+------------------+ after: +---+ | s1| +---+ i | | +------+---+---+------+ +---+------------------+ | | y | z | | | s2| | +------+---+---+------+ +---+------------------+
i > k
:In that case we have
s2 < y
and we have to inserty
into the right subarray and uses2
as the popup key.before: k i 2k +-----------------+---+---+----+---+---------+ | | s1| s2| | z | | +-----------------+---+---+----+---+---------+ after: +---+ | s2| +---+ | | +-----------------+---+ +-----+---+---+--------+ | | s1| | | y | z | | +-----------------+---+ +-----+---+---+--------+
- Overflow, even order:
m = 2 * k
, i.e. we have2 * k - 1
key value pairs and the array of key value pairs consists of two subarrays of sizek - 1
with a single separator pairs
which separates the two subarrays.We have to distinguish the cases
i < k
andi >= k
. In both casess
can be used as the popup key. The insertion ofy
happens either in the left or in the right subarray.i < k
:before: i k +---------+---+----+---+------------------+ | | z | | s | | +---------+---+----+---+------------------+ after: +---+ | s | +---+ i | | +---------+---+---+-----+ +------------------+ | | y | z | | | | +---------+---+---+-----+ +------------------+
i >= k
:before: k i +------------------+---+----+---+---------+ | | s | | z | | +------------------+---+----+---+---------+ after: +---+ | s | +---+ | | +------------------+ +-----+---+---+---------+ | | | | y | z | | +------------------+ +-----+---+---+---------+
The following function does the insertion into a leaf node.
let add_in_leaf (key: Key.t) (value: 'a) (pairs: 'a pairs): 'a insert =
let len = Array.length pairs in
let i, exact = bsearch key pairs in
if exact then
Normal_insert (Leaf (Array.replace i (key, value) pairs))
else if len < max_keys then
(* Leaf is not full. *)
Normal_insert (Leaf (Array.insert i (key, value) pairs))
else
(* Leaf is full *)
let insert_subarray = insert_subarray pairs i (key, value)
and k = order / 2
in
if odd_order then
if i = k then
let left = subarray pairs 0 k
and right = subarray pairs k len
in
Split_insert (Leaf left, (key, value), Leaf right)
else if i < k then
let left = insert_subarray 0 (k - 1)
and right = subarray pairs k len
in
Split_insert (Leaf left, pairs.(k - 1), Leaf right)
else
let left = subarray pairs 0 k
and right = insert_subarray (k + 1) len
in
Split_insert (Leaf left, pairs.(k), Leaf right)
else begin
(* even order *)
if i < k then
let left = insert_subarray 0 (k - 1)
and right = subarray pairs k len
in
Split_insert (Leaf left, pairs.(k - 1), Leaf right)
else
let left = subarray pairs 0 (k - 1)
and right = insert_subarray k len
in
Split_insert (Leaf left, pairs.(k - 1), Leaf right)
end
Using the two helper functions subarray
and insert_subarray
.
subarray (arr: 'a array) (start: int) (beyond: int): 'a t =
(* The subarray of [arr] starting at [start] and ending one before [beyond]. *)
assert (0 <= start);
assert (start <= beyond);
assert (beyond <= Array.length arr);
Array.sub arr start (beyond - start)
let insert_subarray
(arr: 'a array) (i: int) (x: 'a) (start: int) (beyond: int)
: 'a array
=
(* The subarray of [arr] starting at [start] and ending one before [beyond]
with [x] inserted at position [i]. *)
assert (0 <= start);
assert (start <= i);
assert (i <= beyond);
assert (beyond <= Array.length arr);
let arr2 = Array.make (beyond - start + 1) x in
Array.blit arr start arr2 0 (i - start);
Array.blit arr i arr2 (i - start + 1) (beyond - i);
arr2
1.3.3.2. Insertion into an interior node
In this subsection we treat the situation that a key value pair has been
inserted into the i
th child t
of an interior node and the insertion into
the child t
caused an overflow and resulted in the triple (u, y, v)
where u
and v
are two valid B trees separated by the popup key value
pair y
. Furthermore the interior node is full and must be splitted as well.
The case distinctions are the same as in the insertion into a full leaf node with the additional complexity that the child nodes have to be treated.
- Overflow, odd order:
m = 2 * k + 1
. The array of the key value pairs consists of two subarray of equal sizek
with the key value pairs1
ending the left subarray ands2
beginning the right subarray.We have to distinguish the cases
i = k
,i < k
andi > k
.i = k
:s1
is a strict lower bound of all keys int
and therefore for all keys in(u, y, v)
.s2
is a strict upper bound. We split the interior node into the two subarrays and useu
as the last child in the left part andv
as the first child in the right part. We usey
as the popup key.before: i k 2k +-----------------+---+---+------------------+ | | s1| s2| | +-----------------+---+---+------------------+ | t after: +---+ | y | +---+ | | +-----------------+---+ +---+------------------+ | | s1| | s2| | +-----------------+---+ +---+------------------+ | | u v
i < k
:In that case we have to insert the popup key
y
at positioni
of the key value pairs of the interior node, replace thei``th child ``t
byu
and usev
as the additionally needed child.before: i k 2k +------+---+------+---+---+------------------+ | | z | | s1| s2| | +------+---+------+---+---+------------------+ | t after: +---+ | s1| +---+ i | | +------+---+---+------+ +---+------------------+ | | y | z | | | s2| | +------+---+---+------+ +---+------------------+ | | u v
i > k
:The same as before just with insertion into the right part of the interior node.
before: k i 2k +-----------------+---+---+----+---+---------+ | | s1| s2| | z | | +-----------------+---+---+----+---+---------+ | t after: +---+ | s2| +---+ | | +-----------------+---+ +-----+---+---+--------+ | | s1| | | y | z | | +-----------------+---+ +-----+---+---+--------+ | | u v
- Overflow, even order:
m = 2 * k
.We have to distinguish the cases
i < k
andi >= k
.i < k
:before: i k +---------+---+----+---+------------------+ | | z | | s | | +---------+---+----+---+------------------+ | t after: +---+ | s | +---+ i | | +---------+---+---+-----+ +------------------+ | | y | z | | | | +---------+---+---+-----+ +------------------+ | | u v
i >= k
:before: k i +------------------+---+----+---+---------+ | | s | | z | | +------------------+---+----+---+---------+ | t after: +---+ | s | +---+ | | +------------------+ +-----+---+---+---------+ | | | | y | z | | +------------------+ +-----+---+---+---------+ | | u v
The insertion function which inserts into an interior node reads like
let add_in_node
(i: int)
(left: 'a t)
(pair: Key.t * 'a)
(right: 'a t)
(pairs: 'a pairs)
(children: 'a t array)
: 'a insert
=
let len = Array.length pairs in
if len < max_keys then
let pairs = Array.insert i pair pairs
and children = Array.insert i left children
in
children.(i + 1) <- right;
Normal_insert (Node (pairs, children))
else
(* Node is full. *)
let k = order / 2
and insert_subarray = insert_subarray pairs i pair
and split_subarray start beyond =
split_subarray children i left right start beyond
in
if odd_order then
if i = k then
let left_pairs = subarray pairs 0 k
and left_children = subarray children 0 (k + 1)
and right_pairs = subarray pairs k len
and right_children = subarray children k (len + 1)
in
left_children.(k) <- left;
right_children.(0) <- right;
Split_insert (
Node (left_pairs, left_children),
pair,
Node (right_pairs, right_children))
else if i < k then
let left_pairs = insert_subarray 0 (k - 1)
and left_children = split_subarray 0 k
and right_pairs = subarray pairs k len
and right_children = subarray children k (len + 1)
in
Split_insert (
Node (left_pairs, left_children),
pairs.(k - 1),
Node (right_pairs, right_children))
else begin
let left_pairs = subarray pairs 0 k
and left_children = subarray children 0 (k + 1)
and right_pairs = insert_subarray (k + 1) len
and right_children = split_subarray (k + 1) (len + 1) in
Split_insert (
Node (left_pairs, left_children),
pairs.(k),
Node (right_pairs, right_children))
end
else begin
(* even order *)
if i < k then
let left_pairs = insert_subarray 0 (k - 1)
and left_children = split_subarray 0 k
and right_pairs = subarray pairs k len
and right_children = subarray children k (len + 1)
in
Split_insert (
Node (left_pairs, left_children),
pairs.(k - 1),
Node (right_pairs, right_children))
else
let left_pairs = subarray pairs 0 (k - 1)
and left_children = subarray children 0 k
and right_pairs = insert_subarray k len
and right_children = split_subarray k (len + 1)
in
Split_insert (
Node (left_pairs, left_children),
pairs.(k - 1),
Node (right_pairs, right_children))
end
with the additional helper function
let split_subarray
(arr: 'a array) (i: int) (x: 'a) (y: 'a) (start: int) (beyond: int)
: 'a array
=
(* The subarray of [arr] starting at [start] and ending one before [beyond]
with [x] inserted at position [i] and the original value at position
[i] replaced by [y]. *)
assert (i < beyond);
let arr = insert_subarray arr i x start beyond in
arr.(i - start + 1) <- y;
arr
1.3.4. Deletion
1.3.4.1. Basic Deletion
As with insertion, an actual deletion can only be done on a leaf node.
If the to be deleted key value pair is located in an interior node, then it cannot be deleted directly. In order to delete a key value pair in an interior node, we have to find a direct neighbour (predecessor or successor with respect to the order), which is always in a leaf node and delete the direct neighbour. Since the deleted key value pair is a direct neighbour of the to be deleted key value pair we can substitute the deleted key value pair for the key value pair in the interior node without disturbing the order.
Furthermore we have to handle a possible underflow condition. The invariant of a B tree requires that each leaf node or interior node which is not the root of the tree must have a minimal number of keys (and a minimal number of children in case of an interior node).
If a node underflows because of the deletion of a key value pair, the missing key value pair can be pulled from the parent and the missing child from the sibling. This can require to merge the underflowing child with its sibling (details see below).
The reparation might cause the parent to underflow. The reparation of the underflow condition of the parent can be done with the help of the parent of the parent etc. The underflow condition might popup recursively to the root.
An underflow condition in the root node is no problem except the case that the root becomes empty (i.e. no key value pairs and just one child). In that case the height of the tree shrinks at the root and the child becomes the new root.
In order to handle deletion we use the datastructure:
type 'a delete = {
tree: 'a t; (* The tree with the deleted key value pair. *)
pair: Key.t * 'a; (* The deleted key value pair. *)
underflow: bool; (* one key less than the minimal number *)
}
The basic deletion looks like:
let remove (key: Key.t) (map: 'a t): 'a t =
match remove_aux key map with
| None ->
map
| Some d ->
match d.tree with
| Node (pairs, children) when Array.is_empty pairs ->
(* tree shrinks at the root *)
children.(0)
| _ ->
d.tree
It uses an auxiliary function remove_aux
which returns an optional 'a
delete
structure. The auxiliary function returns None
if the tree does not
have a key value pair with the desired key. In case that the tree contains a key
value pair with the desired key, a tree is returned with the key value pair
deleted. The function remove
has to check, if the root node is an interior
node with no key value pairs. In that case the tree shrinks in height and the
only child becomes the new root.
The function remove_aux
implements the recursion.
let rec remove_aux (key: Key.t) (map: 'a t): 'a delete option =
match map with
| Leaf pairs ->
let i, exact = bsearch key pairs in
if exact then
let pair = pairs.(i)
and pairs = Array.remove i pairs
and underflow = Array.length pairs <= min_keys
in
Some {
tree = Leaf pairs;
pair;
underflow
}
else
None
| Node (pairs, children) ->
let i, exact = bsearch key pairs in
if exact then
let d = remove_last children.(i) in
let pair = pairs.(i)
and pairs = Array.replace i d.pair pairs in
Some (handle_delete i pair pairs children d)
else
Option.map
(fun d -> handle_delete i d.pair d pairs children)
(remove_aux key children.(i))
The function searches in a leaf node and in an interior node for the key value pair which has to be deleted.
In case of an exact match, the key is deleted. In a leaf node the deletion is
straightforward. In an interior node the deletion cannot be done directly. The
last key value pair in the child to the left of the to be deleted key value pair
is deleted with the help of the function remove_last
and the last key value
pair is substituted for the to be deleted key value pair. Remember that the last
key value pair in the child to the left of the to be deleted key value pair are
direct neighbours.
The function handle_delete
checks for an underflow condition and does the
reparation if needed.
If the search does not find an exact match in a leaf node, then no key value pair with the to be deleted key exists and nothing has to be done.
If the search does not find an exact match in an interior node, then the deletion continues in the child to the left of the key.
The code of the function handle_delete
is straightforward and uses the
helper function handle_underflow
to handle an underflow condition.
let handle_delete
(i: int) (* Index of the child where the deletion occurred. *)
(pair: Key.t * 'a) (* The deleted key value pair. *)
(d: 'a delete) (* The new tree with the key value pair deleted. *)
(pairs: 'a pairs) (* The key value pairs of the parent. *)
(children: 'a t array) (* The children of the parent. *)
: 'a delete
=
if not d.underflow then
{
tree = Node (pairs, Array.replace i d.tree children);
pair;
underflow = false
}
else
let len = Array.length pairs in
if i < len then
handle_underflow i true d.tree children.(i + 1) pair pairs children
else
let i = i - 1 in
handle_underflow i false children.(i) d.tree pair pairs children
If the deletion in the child has not caused an underflow in the chid, the new child is substituted for the old child. This cannot case an underflow in the parent either.
If the deletion in the child has caused an underflow, then a sibling has to be used to repair the underflow.
As long as the child is not the last child of the parent, there is a right
sibling. The new child and its right sibling are passed to the function
handle_underflow
with the indication that the underflow happened in the
first tree of the two siblings.
If the child is the last child of the parent, then we use its left sibling and
hand it over to the function handle_underflow
.
Since a valid B tree interior node has at least one key value pair and two children (even the root!) there is always either a left or a right sibling.
1.3.4.2. Handling of Underflow
In this section we treat the case that the deletion in one of the children of a parent node caused an underflow and how this underflow in the child can be repaired with the help of the parent and a sibling. We assume that the underflow happend in the left child. The situation where the underflow happened in the right child is symmetrical.
i
+---------------------+---+--------------------+
parent node | | y | |
+---------------------+---+--------------------+
| |
+-----------+---+ +---+-----------------+
siblings | | x | | z | |
+-----------+---+ +---+-----------------+
| |
u v
underflow no underflow
In the picture it is assumed that the siblings are interior nodes and have children. If the children are leaf nodes, then the algorithm is the same. Just ignore the children of the siblings.
The left sibling has exactly one missing key value pair and exactly one missing child. Reason: Since both are not the root node they have at least the minimal number of keys and children before the deletion. Deletion happened in the left sibling i.e. the left sibling had the minimal number of keys and children before the deletion. Otherwise no underflow would have happened.
We have to distinguish two cases: The right sibling is not minimal of the right sibling is minimal.
- Right sibling is not minimal:
In that case we can chop off the first child
v
and the first key value pairz
from the right sibling without violating the B tree invariant. The key value pairz
can be pushed up to the parent and key value pairy
from the parent and the B treev
can be appended to the underflowing chid to repair the situation.rotation before: i +---------------------+---+--------------------+ parent node | | y | | +---------------------+---+--------------------+ | | +-----------+---+ +---+-----------------+ siblings | | x | | z | | +-----------+---+ +---+-----------------+ | | u v underflow no underflow rotation after: i +---------------------+---+--------------------+ parent node | | z | | +---------------------+---+--------------------+ | | +-----------+---+---+ +-----------------+ siblings | | x | y | | | +-----------+---+---+ +-----------------+ | | u v
- Right sibling is minimal:
In that case we cannot chop off a key value pair and a child from the right sibling without violating the B tree invariant. The only possibility is to merge the two siblings.
In order to merge the two sibling successfully, we have to push down the separator key value pair from the parent. This might cause an underflow in the parent depending on the size of the parent. A possible underflow in the parent must be repaired at the level of the parent of the parent.
The parent looses one key value pair and one child with the merge. If the parent is the root node, the key value pair might be the last one leaving the parent with the merged child as the only child. In that case we can throw away the parent and use the merged child as the new root.
The merge can never create an overflow. The merged node has
2 * min_keys
key value pairs which is either the maximal number of keys or one less than the maximal number of keys.merge before: i +---------------------+---+--------------------+ parent node | | y | | +---------------------+---+--------------------+ | | +-----------+---+ +---+-----------------+ siblings | | x | | z | | +-----------+---+ +---+-----------------+ | | u v underflow minimal merge after: i +---------------------+--------------------+ parent node | | | +---------------------+--------------------+ | +-----------+---+---+---+-----------------+ siblings | | x | y | z | | +-----------+---+---+---+-----------------+ | | u v
The following function handles the underflow condition properly in all cases.
let handle_underflow
(i: int) (* Index of the child where the deletion occurred. *)
(underflow_left: bool) (* Underflow happend in the left child? *)
(left_child: 'a t)
(right_child: 'a t)
(pair: Key.t * 'a) (* The deleted key value pair. *)
(pairs: 'a pairs) (* The key value pairs of the parent. *)
(children: 'a t array) (* The children of the parent. *)
: 'a delete
=
let not_minimal pairs1 pairs2 =
if underflow_left then
not_minimal pairs2
else
not_minimal pairs1
in
match left_child, right_child with
| Leaf pairs1, Leaf pairs2 when not_minimal pairs1 pairs2 ->
(* Right sibling is not minimal, rotate *)
let pairs1, pairs, pairs2 =
rotate_keys underflow_left i pairs1 pairs pairs2
in
let children =
replace2 i (Leaf pairs1) (Leaf pairs2) children
in
{tree = Node (pairs, children); pair; underflow = false}
| Leaf pairs1, Leaf pairs2 ->
(* Sibling is minimal, merge *)
merge_leaves i pair pairs1 pairs2 pairs children
| Node (pairs1, children1), Node (pairs2, children2)
when not_minimal pairs1 pairs2
->
(* Sibling is not minimal, rotate *)
let pairs1, pairs, pairs2 =
rotate_keys underflow_left i pairs1 pairs pairs2
and children1, children2 =
rotate_children underflow_left children1 children2
in
let children =
replace2
i
(Node (pairs1, children1))
(Node (pairs2, children2))
children
in
{tree = Node (pairs, children); pair; underflow = false}
| Node (pairs1, children1), Node (pairs2, children2) ->
(* Sibling is minimal, merge *)
merge_nodes
i pair
pairs1 children1
pairs2 children2
pairs children
| _, _ ->
assert false (* Cannot happen, tree is balanced. *)
The function uses the following helper functions:
let not_minimal (pairs: 'a pairs): bool =
min_keys < Array.length pairs
let replace2
(i: int) (left: 'a t) (right: 'a t) (children: 'a t array)
: 'a t array
=
let children = Array.copy children in
assert (Array.valid_index i children);
assert (Array.valid_index (i + 1) children);
children.(i) <- left;
children.(i + 1) <- right;
children
let rotate_keys
(to_left: bool)
(i: int) (left: 'a pairs) (parent: 'a pairs) (right: 'a pairs)
: 'a pairs * 'a pairs * 'a pairs
=
let open Array in
assert (valid_index i parent);
if to_left then
push parent.(i) left,
replace i (first right) parent,
remove_first right
else
remove_last left,
replace i (last left) parent,
push_front parent.(i) right
let rotate_children
(to_left: bool)
(left: 'a t array) (right: 'a t array)
: 'a t array * 'a t array
=
let open Array in
if to_left then
push (first right) left,
remove_first right
else
remove_last left,
push_front (last left) right
let merge_keys
(i: int) (left: 'a pairs) (parent: 'a pairs) (right: 'a pairs)
: 'a pairs * 'a pairs
=
assert (Array.valid_index i parent);
let len_left = Array.length left
and len_right = Array.length right
in
let merged = Array.make (len_left + 1 + len_right) parent.(i)
and parent = Array.remove i parent
in
Array.blit left 0 merged 0 len_left;
Array.blit right 0 merged (len_left + 1) len_right;
merged, parent
let merge_leaves
(i: int)
(pair: Key.t * 'a)
(pairs1: 'a pairs) (pairs2: 'a pairs)
(pairs: 'a pairs) (children: 'a t array)
: 'a delete
=
assert (i + 1 < Array.length children);
let merged, pairs = merge_keys i pairs1 pairs pairs2
and children = Array.remove i children
and underflow = Array.length pairs <= min_keys
in
children.(i) <- Leaf merged;
{tree = Node (pairs, children); pair; underflow}
let merge_nodes
(i: int)
(pair: Key.t * 'a)
(pairs1: 'a pairs) (children1: 'a t array)
(pairs2: 'a pairs) (children2: 'a t array)
(pairs: 'a pairs) (children: 'a t array)
: 'a delete
=
assert (i + 1 < Array.length children);
let pairs_new, pairs = merge_keys i pairs1 pairs pairs2
and children = Array.remove i children
and underflow = Array.length pairs <= min_keys
and children_new = Array.append children1 children2
in
children.(i) <- Node (pairs_new, children_new);
{tree = Node (pairs, children); pair; underflow}