You're half-way there when the bits grow on trees...

Imagine you're a phone company and you want to implement an automated system on your website for
assigning phone numbers for your customers. There are two use cases. One, is that the customer can ask
if a specific number is available. The other, is that the customer can ask for any random number that's
available. What data structure would you use to efficiently store and search these phone numbers? Keep
in mind that you could potentially have billions of these phone numbers (10^{11} if we include
international numbers).

If you use an `Array`

, access is `O(1)`

so it seems like a good idea at first, but
that's only in cases where you know the index of the value you want to access. Finding that index means
searching the entire array, which is really `O(n)`

. If you use a `Hash Table`

, the
cost of search is `O(1)`

so it seems like a better idea, but the cost of deletion and
insertion can be `O(n)`

in the worst case scenario (*where you have 100% collision*).

The best answer is to use a *BST* or
Binary Search Tree, because they have average
`O(log n)`

for search, insertion, and deletion. That is assuming it's a self-balancing tree
implementation.

If you're not at all familiar with Binary Search Trees or Binary Trees in general, let me give you a simple analogy to try and grasp this particular concept of search in computer science.

Let's say a friend of yours is stuck on the road due to car trouble. Like a good friend you go to pick them up. You know they are stuck at a fork on this road and you arrive at a fork. At the fork there is a sign with a number on it. The number is 5 and your friend sees a sign with the number 16. You know that if you go left, you will only find numbers less than 5 and if you go right you will find numbers greater than 5. So you go right and arrive at another fork. This time you see a sign with the number 11. What do you do?

Knowing how to get to your friend, on this road, is no different than knowing how to search a (*BST*).
You must always know two things, and always be able to make one of two simple choices. Is this number more
than or less than the number I'm searching for? Do I go left, or do I go right at the fork?

## Learning Opportunity

A BST is a data structure made up of nodes (the forks in our road analogy) which have left and right branches that, may or may not, each have an attached node (i.e there could be a dead end). A node attached to the left branch is less than the current node. A node attached to the right branch is greater than the current node. This always holds true no matter which node you're currently looking at. The entirety of the structure makes up what looks like a tree, given all these nodes connected by branches. The fact that there can only be two branches on each node is what makes itbinary. In the context ofsearchthese distinct properties are what make a BST very efficient.

The idea here is that you can always discard one-half of the tree with every iteration. This makes it
possible to search for any value in the tree logarithmically, *i.e.* `O(log n)`

.
Consequently, the cost of insertion or deletion to and from the tree are the same.

So say we have 9 random integers in a BST and we want to find out if the integer 16 is one of them. You always start at the root node of the tree and compare the given value to the search value. If the node's value is greater than the search value we move to the node in the left branch. If the node's value is less than the search value we move to the node in the right branch. Repeat this step until you find the value or reach the end (a null branch).

In PHP, we know that arrays are really just ordered hashmaps, and the cost of search in a hashmap is
`O(1)`

, which makes it much faster than `O(log n)`

. However, BSTs have other
properties that make them more desirable than a hashmap in many use cases. For example, finding the min
or max value in a hashmap is `O(n)`

whereas in a BST it is `O(log n)`

. In a real
vector, or contiguous block, the values would need to be sorted before search can be `O(1)`

,
but Quicksort requires `O(n log n)`

making it very inefficient for insertion and deletion, because you have to resort each time you insert
or delete.

The best way to learn is to just roll up your sleeves and do it, right? So let's implement a BST in PHP and learn something!

We already know that a binary search tree is made up of nodes. The properties that best describe these
nodes is that they have a `value`

, a `left`

branch, and a `right`

branch.
Each *branch* can also contain a node. So let's write up the `node`

class first since
it's simply described by a set of properties and doesn't have any real behavior (i.e. *methods*).

```
class Node
{
public $value, $left, $right;
public function __construct($value)
{
$this->value = $value;
}
}
```

Simple enough. Now that we have a way to create a node, we'll need the actual binary search tree
implementation to be able to create the tree. So far we understand that a BST has a `root`

node, so that will be one of the properties that describe this object.

The BST also has a couple of behaviors, that will make up the methods of this class. We'll start by
implementing the simplest of these behaviors, which is `search`

.

```
class BST
{
public $root;
public function __construct($value = null)
{
if ($value !== null) {
$this->root = new Node($value);
}
}
public function search($value)
{
$node = $this->root;
while($node) {
if ($value > $node->value) {
$node = $node->right;
} elseif ($value < $node->value) {
$node = $node->left;
} else {
break;
}
}
return $node;
}
}
```

As you can see search is quite simple. We merely compare the search value to the current node (*starting
at the root node*). If it's greater than the value of the current node, we go to the node in the right
branch. If it's less than the value of the current node we go to the node in the left branch. If they're
equal we return the current node. This repeats in the loop until we reach a branch that is `null`

.
This tells us that the value does not exist in the tree.

Now let's implement the `insert`

functionality of the tree in order to make it really useful. This
is very similar to our search behavior, with one difference. We're looking for a null branch to insert the
value. So ideally, our insert function should look fairly similar to our search function with the minor
difference being that we will attach a new node to the null branch reached at the end of our loop.

```
public function insert($value)
{
$node = $this->root;
if (!$node) {
return $this->root = new Node($value);
}
while($node) {
if ($value > $node->value) {
if ($node->right) {
$node = $node->right;
} else {
$node = $node->right = new Node($value);
break;
}
} elseif ($value < $node->value) {
if ($node->left) {
$node = $node->left;
} else {
$node = $node->left = new Node($value);
break;
}
} else {
break;
}
}
return $node;
}
```

OK, so what's going on here? Every time we go right or left we check to see if the branch is `null`

before proceeding and based on that information we make a decision to either attach a new node to the
branch and break from the loop, or move to the next node. So insertion happens just before the point
where we would normally end up with `null`

and break from the loop, but since we can't attach
a node to `null`

, we attach just before the `$node`

is assigned null, and then break.

Before we can learn about deleting nodes from the tree we must first grasp an important concept of binary
search trees, which is how to find their *minimum* and *maximum* values. Remember that
every time you go left in a BST you are guaranteed to get a value smaller than its parent. So if we
start at the root node, and keep going left until we arrive at a node with a `null`

left
branch, we should have reached the smallest value in the tree.

So our implementation of `min`

in the BST class should look something like this.

```
public function min()
{
if (!$this->root) {
throw new Exception('Tree root is empty!');
}
$node = $this->root;
return $node->min();
}
```

Notice that `BST::min()`

will only call on `root->min()`

so we must implement the
actual functionality in our `Node`

class. This makes sense since we will likely find it useful
to find the minimum value from a specific branch in the tree and not just from the root.

So our implementation of `min`

in the Node class should look something like this.

```
public function min()
{
$node = $this;
while ($node->left) {
if (!$node->left) {
break;
}
$node = $node->left;
}
return $node;
}
```

Notice that if you start at the root and keep going left in our example tree illustrated earlier, you will ultimately reach the smallest value in the tree, but consequently, if you were to start at any other node in the tree and keep going left until you arrive at a null left branch, you will have the smallest value in that branch.

You can probably guess how the implementation for `max`

will work in our BST class...

```
public function max()
{
if (!$this->root) {
throw new Exception('Tree root is empty!');
}
$node = $this->root;
return $node->max();
}
```

In our Node class the same concept applies, but we're going *right* instead of left.

```
public function max()
{
$node = $this;
while ($node->right) {
if (!$node->right) {
break;
}
$node = $node->right;
}
return $node;
}
```

Great, now we have a way to insert nodes into the tree and a way to search the tree. Next, we need to have a way to delete nodes from the tree. Deletion is slightly trickier than search and insertion, but that's OK, because we're super ninja programmers that can code our way out of any problem, right?

*Crickets...*

There are three kinds of deletion in a binary search tree we should be aware of, because they're each handled just a little differently.

- Deleting a node that has no children
- Deleting a node that has one child node
- Deleting a node that has two child nodes

The easiest is the first one with no nodes, obviously. You simply just delete the node by assigning
`null`

to its parent branch, which means you must keep track of the parent branch. So in our
earlier BST, if we decided to delete the node with the value of `1`

, its parent node
`2`

, would then have a left branch of `null`

and the node would be removed from
the tree.

Now, let's say we want to delete the node with the value `2`

, which at this point only has
one branch (containing the node with the value `4`

). It doesn't actually matter if the child's
branch is right or left. As long as it only has one branch we simply take the node in that branch and
attach it to the parent branch of the node we're about to delete.

Finally, the last case is when we wish to delete a node that has both branches occupied. To do this we must first find the smallest node of its right branch. We can do this using the node's min method that we implemented earlier.

Now we assign that min value to value of the node we're trying to delete, which will give us duplicate nodes.

Next we apply a delete operation to the duplicate node.

Now, to make this concept a little easier to implement we can make one small modification to our Node class. By allowing each node to keep track of its parent node, we never need to pass along the parent by keeping track of it from the calling context. This will simplify the last and most complex case of deletion for us quite a bit. It will also make it easier to walk the tree later.

```
class Node
{
public $parent, $value, $left, $right;
public function __construct($value, Node $parent = null)
{
$this->value = $value;
$this->parent = $parent;
}
}
```

Now we just need to modify the `insert`

method of our BST class to pass the parent into the
Node's constructor when doing insertion.

What this gives us is a doubly-linked list
of nodes. Making it possible for a child node, in the tree, to know about its parent node. This is also
a recursive relationship, because the child stores the parent property as the parent node object itself.
So, be careful of (*recursion there*) recursion there.

```
public function insert($value)
{
$node = $this->root;
if (!$node) {
return $this->root = new Node($value);
}
while($node) {
if ($value > $node->value) {
if ($node->right) {
$node = $node->right;
} else {
$node = $node->right = new Node($value, $node);
break;
}
} elseif ($value < $node->value) {
if ($node->left) {
$node = $node->left;
} else {
$node = $node->left = new Node($value, $node);
break;
}
} else {
break;
}
}
return $node;
}
```

The `root`

will have a `null`

parent, and that's fine. So nothing else should
change in our implementation. We're now ready to implement delete in our BST class.

```
public function delete($value)
{
$node = $this->search($value);
if ($node) {
$node->delete();
}
}
```

The `delete`

method in our BST class is rather simple. It just searches for the node we're
trying to delete, and calls the `delete`

method on that Node. The Node should be able to take
care of the rest from here.

```
public function delete()
{
if ($this->left && $this->right) {
$min = $this->right->min();
$this->value = $min->value;
$min->delete();
} elseif ($this->right) {
if ($this->parent->left === $this) {
$this->parent->left = $this->right;
$this->right->parent = $this->parent->left;
} elseif ($this->parent->right === $this) {
$this->parent->right = $this->right;
$this->right->parent = $this->parent->right;
}
$this->parent = null;
$this->right = null;
} elseif ($this->left) {
if ($this->parent->left === $this) {
$this->parent->left = $this->left;
$this->left->parent = $this->parent->left;
} elseif ($this->parent->right === $this) {
$this->parent->right = $this->left;
$this->left->parent = $this->parent->right;
}
$this->parent = null;
$this->left = null;
} else {
if ($this->parent->right === $this) {
$this->parent->right = null;
} elseif ($this->parent->left === $this) {
$this->parent->left = null;
}
$this->parent = null;
}
}
```

Our `Node::delete()`

, on the other hand, is not so simple.

The first `if`

condition,
in this method, is the number 3 scenario on our list, mentioned earlier. When the node we're trying to
delete has both a left and right branch, we must first find the `min`

of its right branch
and set its value to the value of that node. Then we call delete on that node. Notice, once we do that
the `min`

node cannot meet the first condition again. It will only fall into one of the other
two scenarios—it either has one child or no children at all (*because it will never have a left
branch*).

The animation here, highlights the most complex of deletion cases, where we need to perform a delete on a node that has both left and right branches occupied.

Now, we'll look at how you can walk through the binary search tree and some of the different ways you can do it.

One thing you should note about tree structures like the binary search tree is that they are not linear
structures, like vectors, or linked lists. They can not be viewed sequentially. They are actually
traversed as graphs, which means there is more than one way to traverse them. However, there must be a
systematic way in which you can *visit* every node in the tree exactly once. Otherwise, the
implementation is considered inefficient and potentially defective. Imagine you wanted to print all of
the node values in the tree and your method allowed some nodes to be visited more than once. This
wouldn't be very helpful.

There are two conceptual ways in which you can traverse a binary tree. One way is **iterative**,
meaning it works sequentially (one after the other), but requires a little more memory. The other way is to
do it **recursively**, and since a binary tree is by definition is a *corecursive*
structure, that would make sense. Doing it iteratively requires storing or deferring some nodes to
memory before traversing them. This is usually accomplished by using a
stack or
queue data structure. I actually
find it easier to implement a `walk`

method on a tree recursively, but that's just me. Then
again, I'm the one writing this so, what do I care how you like to do it?!

Now whether the method of implementation is recursive or iterative, you still have to choose the order of
traversal on the tree, which can be either **Depth First** or **Breadth First**.
*Breadth First* order is also sometimes referred to as *level order*, because you visit
every node on a particular level of the tree before moving down to the next level.

In *Depth First* order there are three different algorithms commonly used for traversal.

- Pre-order
- In-order
- Post-order

We're not going to go through all of these approaches here. In this article, we're just going to use Depth First In-order traversal, since I find it to make the most sense for demonstration purposes.

First, let's plot out our chartered course so that I have some high-level view of where we want to go, before we try to get there.

The sequence looks kind of like a roller coaster ride, but this particular aproach will give us the nodes in order.

So the result is all values in ascending order as
`[1, 2, 4, 5, 7, 11, 16, 23, 34]`

, which is what you'd expect if this was a sorted array,
for example.

```
public function walk(Node $node = null)
{
if (!$node) {
$node = $this->root;
}
if (!$node) {
return false;
}
if ($node->left) {
yield from $this->walk($node->left);
}
yield $node;
if ($node->right) {
yield from $this->walk($node->right);
}
}
```

I've used a generator implementation, which will only work in PHP 7, but that's just because it makes
the implementation so much simpler. You can accomplish the same thing in PHP 5 by replacing the `yield`

calls with an array assignment, for example, and then using `array_reduce`

to return the
actual results back to the caller. You might need a wrapper function for this, but the concept is the
same. It's just so much less code using a generator!

If we ran this code...

```
// Instantiate a new tree with root node of 5
$bst = new BST(5);
// Insert left branch nodes
$bst->insert(2);
$bst->insert(1);
$bst->insert(4);
// Insert right branch nodes
$bst->insert(11);
$bst->insert(7);
$bst->insert(23);
$bst->insert(16);
$bst->insert(34);
// Walk the tree
$tree = $bst->walk();
foreach($tree as $node) {
echo "{$node->value}\n";
}
```

We should get output like this...

1 2 4 5 7 11 16 23 34

So, you're probably thinking, what on earth would I need a binary tree for, and especially in a language like PHP? And you're probably very right for thinking that, but don't confuse what you'll need it for with why you almost never need to implement a binary tree in practice. In PHP, particularly, the performance gains of binary search are dwarfed by the context-switching overhead of having to implement them as objects (it takes a lot longer to carry out a function call in PHP than it would to do simple pointer dereferencing in a language like C, for example).

It's very important to know about and understand this data structure in concert with other
data structures so that you're aware of the performance (time and memory) characteristics
of each in a given problem. In many cases, you'll be using tools that implement a btree
quite frequently, but will rarely be required to implement one yourself. For example,
your Database Management System (*DBMS*) implements a type of btree commonly referred
to as B+ tree. In 3D computer
graphics, many games will often implement a kind of btree usually called a
Binary Space Parition
(*BSP*), which allows for recursively subdividing a space into convex sets by
hyperplanes.

There are many areas where trees or graphs (*nodes and edges*) are generally useful in a wide
array of programming problems. So having a good grasp of BST, is not a bad start!

As mentioned in the beginning of this article, only a self-balancing implementation of a BST, such as
AVL-tree will accomplish average `O(log n)`

in search, insertion, and deletion, because as
you can see by our implementation if I were to simply insert the numbers `[1,2,3,4,5]`

or
`[5,4,3,2,1]`

you would have `O(n)`

since the tree would be entirely right or
left heavy.

So, as always, I like to leave *something* up to the imagination. So I'll leave it to you to play
around with the implementation and see if you can figure out how to implement a `balance`

method that will automatically balance the tree.