Derivation of average case

The average number of iterations performed by binary search depends on the probability of each element being searched. The average case is different for successful searches and unsuccessful searches. For successful searches, it is assumed that each element is equally likely to be the target. For unsuccessful searches, it is assumed that each of the intervals between and outside the elements is equally likely to contain the target. The average case for successful searches is the number of iterations required to search for every element exactly once, divided by $n$, the number of elements. The average case for unsuccessful searches is the number of iterations required to search for a target within every interval exactly once, divided by the $n+1$ intervals. (Knuth 1998, §6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search".)
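The averaging described above can be checked by brute force. The following is a minimal Python sketch, assuming a standard binary search that compares against the floor midpoint `(lo + hi) // 2` in each iteration; the function names `count_iterations` and `average_iterations` are hypothetical. It places the $n$ elements at the odd numbers $1, 3, \ldots, 2n-1$, so that the even numbers $0, 2, \ldots, 2n$ fall one into each of the $n+1$ intervals.

```python
from fractions import Fraction

def count_iterations(arr, target):
    """Number of loop iterations a floor-midpoint binary search on the
    sorted list `arr` performs before finding `target` or giving up."""
    lo, hi = 0, len(arr) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1              # one element compared per iteration
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return iterations
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return iterations

def average_iterations(n):
    """(successful average, unsuccessful average) under the uniform
    assumptions: each element, and each of the n + 1 intervals, searched once."""
    arr = [2 * i + 1 for i in range(n)]                       # 1, 3, ..., 2n - 1
    hits = [count_iterations(arr, x) for x in arr]
    misses = [count_iterations(arr, 2 * i) for i in range(n + 1)]  # 0, 2, ..., 2n
    return Fraction(sum(hits), n), Fraction(sum(misses), n + 1)
```

For example, `average_iterations(7)` evaluates to the pair $(17/7, 3)$: $2\frac{3}{7}$ iterations on average for a successful search and exactly 3 for an unsuccessful one. Exact fractions are used so that the results can be compared with the closed-form expressions without rounding error.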

Successful searches

In the binary tree representation, a successful search can be represented by a path from the root to the target node, called an internal path. The length of a path is the number of edges (connections between nodes) that the path passes through. The number of iterations performed by a search, given that the corresponding path has length $l$, is $l+1$, counting the initial iteration. The internal path length is the sum of the lengths of all unique internal paths. Since there is only one path from the root to any single node, each internal path represents a search for a specific element. If there are $n$ elements, where $n$ is a positive integer, and the internal path length is $I(n)$, then the average number of iterations for a successful search is $T(n) = 1 + \frac{I(n)}{n}$, with the one iteration added to count the initial iteration.
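The quantities $I(n)$ and $T(n) = 1 + \frac{I(n)}{n}$ can be computed directly from the tree that binary search implicitly traverses. This is a minimal Python sketch, assuming each node is chosen with the floor midpoint `(lo + hi) // 2`; `node_depths`, `internal_path_length`, and `average_successful` are hypothetical names. The depth recorded for a node is the length of its internal path.

```python
from fractions import Fraction

def node_depths(lo, hi, depth=0, out=None):
    """Map each array index in [lo, hi] to its depth in the tree that
    binary search builds by always choosing the floor midpoint."""
    if out is None:
        out = {}
    if lo > hi:                      # empty range: no internal node here
        return out
    mid = (lo + hi) // 2
    out[mid] = depth                 # internal path of length `depth`
    node_depths(lo, mid - 1, depth + 1, out)
    node_depths(mid + 1, hi, depth + 1, out)
    return out

def internal_path_length(n):
    """I(n): the sum of the lengths of all n internal paths."""
    return sum(node_depths(0, n - 1).values())

def average_successful(n):
    """T(n) = 1 + I(n) / n, the extra 1 counting the initial iteration."""
    return 1 + Fraction(internal_path_length(n), n)
```

For the 7-element array, `internal_path_length(7)` is 10 and `average_successful(7)` is $17/7$.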

Since binary search is the optimal algorithm for searching with comparisons, this problem reduces to calculating the minimum internal path length over all binary trees with $n$ nodes, which is equal to:

$$I(n) = \sum_{k=1}^{n} \lfloor \log_2 k \rfloor$$
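The formula $I(n) = \sum_{k=1}^{n} \lfloor \log_2 k \rfloor$ reflects a tree filled level by level: numbering the nodes $1, 2, 3, \ldots$ in level order, node $k$ sits at depth $\lfloor \log_2 k \rfloor$. The short Python sketch below (the name `min_internal_path_length` is hypothetical) uses the fact that, for a positive integer $k$, `k.bit_length() - 1` equals $\lfloor \log_2 k \rfloor$, which avoids floating-point rounding in `math.log2`.

```python
def min_internal_path_length(n):
    """Minimum internal path length over all binary trees with n nodes."""
    # For a positive integer k, k.bit_length() - 1 == floor(log2(k)),
    # computed exactly with integer arithmetic.
    return sum(k.bit_length() - 1 for k in range(1, n + 1))

# Level-order depths for n = 7: [0, 1, 1, 2, 2, 2, 2]
print([k.bit_length() - 1 for k in range(1, 8)])
print(min_internal_path_length(7))   # 10
```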

For example, in a 7-element array, the root requires one iteration, the two elements below the root require two iterations, and the four elements below those require three iterations. In this case, the internal path length is:

$$\sum_{k=1}^{7} \lfloor \log_2 k \rfloor = 0 + 2(1) + 4(2) = 2 + 8 = 10$$

The average number of iterations would be $1 + \frac{10}{7} = 2\frac{3}{7}$, based on the equation for the average case. The sum for $I(n)$ can be simplified to:

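$$I(n) = \sum_{k=1}^{n} \lfloor \log_2 k \rfloor = (n+1)\lfloor \log_2(n+1) \rfloor - 2^{\lfloor \log_2(n+1) \rfloor + 1} + 2$$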
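The closed form $I(n) = (n+1)\lfloor \log_2(n+1) \rfloor - 2^{\lfloor \log_2(n+1) \rfloor + 1} + 2$ can be checked against the direct sum with exact integer arithmetic. This is a minimal Python sketch; the function names are hypothetical and the range of $n$ tested is arbitrary.

```python
def floor_log2(x):
    """floor(log2(x)) for a positive integer x, computed exactly."""
    return x.bit_length() - 1

def I_sum(n):
    """I(n) as the sum of floor(log2(k)) for k = 1..n."""
    return sum(floor_log2(k) for k in range(1, n + 1))

def I_closed(n):
    """Closed form (n+1)*m - 2**(m+1) + 2, where m = floor(log2(n+1))."""
    m = floor_log2(n + 1)
    return (n + 1) * m - 2 ** (m + 1) + 2

# Values of n where the two disagree; the range includes n = 2**j - 1 and
# n = 2**j, where floor(log2(n + 1)) changes value.
mismatches = [n for n in range(1, 2000) if I_sum(n) != I_closed(n)]
print(mismatches)   # []
```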

Substituting the equation for $I(n)$ into the equation for $T(n)$:

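$$T(n) = 1 + \frac{(n+1)\lfloor \log_2(n+1) \rfloor - 2^{\lfloor \log_2(n+1) \rfloor + 1} + 2}{n} = \lfloor \log_2 n \rfloor + 1 - \frac{2^{\lfloor \log_2 n \rfloor + 1} - \lfloor \log_2 n \rfloor - 2}{n}$$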
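The two expressions for $T(n)$, the substituted form $1 + \frac{(n+1)\lfloor \log_2(n+1) \rfloor - 2^{\lfloor \log_2(n+1) \rfloor + 1} + 2}{n}$ and the simplified form $\lfloor \log_2 n \rfloor + 1 - \frac{2^{\lfloor \log_2 n \rfloor + 1} - \lfloor \log_2 n \rfloor - 2}{n}$, can be compared exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction`; the function names are hypothetical. The two agree even when $n+1$ is a power of two, where $\lfloor \log_2(n+1) \rfloor$ and $\lfloor \log_2 n \rfloor$ differ by one.

```python
from fractions import Fraction

def floor_log2(x):
    """floor(log2(x)) for a positive integer x, computed exactly."""
    return x.bit_length() - 1

def T_substituted(n):
    """1 + I(n)/n with the closed form of I(n) substituted."""
    m = floor_log2(n + 1)
    return 1 + Fraction((n + 1) * m - 2 ** (m + 1) + 2, n)

def T_simplified(n):
    """floor(log2 n) + 1 - (2**(floor(log2 n) + 1) - floor(log2 n) - 2) / n."""
    m = floor_log2(n)
    return m + 1 - Fraction(2 ** (m + 1) - m - 2, n)

assert all(T_substituted(n) == T_simplified(n) for n in range(1, 10000))
print(T_simplified(7))   # 17/7
```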

For integer $n$, this is equivalent to the equation for the average case on a successful search specified above.

 

Unsuccessful searches

Unsuccessful searches can be represented by augmenting the tree with external nodes, which forms an extended binary tree. If an internal node (a node present in the tree) has fewer than two children, then additional child nodes, called external nodes, are added so that each internal node has two children. By doing so, an unsuccessful search can be represented as a path to an external node, whose parent is the single element that remains during the last iteration. An external path is a path from the root to an external node. The external path length is the sum of the lengths of all unique external paths. If there are $n$ elements, where $n$ is a positive integer, and the external path length is $E(n)$, then the average number of iterations for an unsuccessful search is $T'(n) = \frac{E(n)}{n+1}$. Unlike the successful case, no iteration is added: the last edge of an external path leads from the final element compared to the external node, so an external path of length $l$ corresponds to exactly $l$ iterations. The external path length is divided by $n+1$ instead of $n$ because there are $n+1$ external paths, representing the intervals between and outside the elements of the array.
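The extended binary tree can be explored with the same recursion that binary search uses: an empty index range corresponds to an external node. This is a minimal Python sketch, assuming the floor midpoint `(lo + hi) // 2`; `external_path_length` and `average_unsuccessful` are hypothetical names. Each external path of length $l$ is counted as $l$ iterations, since its last edge leads from the final element compared to the external node.

```python
from fractions import Fraction

def external_path_length(lo, hi, depth=0):
    """Sum of external-node depths in the extended tree that binary search
    builds over the index range [lo, hi] with the floor midpoint."""
    if lo > hi:
        return depth                 # empty range: an external node at this depth
    mid = (lo + hi) // 2
    return (external_path_length(lo, mid - 1, depth + 1)
            + external_path_length(mid + 1, hi, depth + 1))

def average_unsuccessful(n):
    """T'(n) = E(n) / (n + 1): one external path per interval."""
    return Fraction(external_path_length(0, n - 1), n + 1)
```

For $n = 7$ every external node lies at depth 3, so `external_path_length(0, 6)` is 24 and `average_unsuccessful(7)` is 3.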

This problem can similarly be reduced to determining the minimum external path length over all binary trees with $n$ nodes. For all binary trees, the external path length is equal to the internal path length plus $2n$. Substituting the equation for $I(n)$:

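$$E(n) = I(n) + 2n = \left[(n+1)\lfloor \log_2(n+1) \rfloor - 2^{\lfloor \log_2(n+1) \rfloor + 1} + 2\right] + 2n = (n+1)\left(\lfloor \log_2 n \rfloor + 2\right) - 2^{\lfloor \log_2 n \rfloor + 1}$$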
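The identity $E(n) = I(n) + 2n$ holds because turning an external node at depth $d$ into an internal node adds $d$ to the internal path length, while the external path length loses $d$ and gains two external paths of length $d+1$, a net increase of $d+2$; starting from a lone external node, each of the $n$ internal nodes therefore adds exactly 2 to the difference. The minimal Python sketch below (hypothetical function names) checks the resulting closed form $E(n) = (n+1)(\lfloor \log_2 n \rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1}$ against $I(n) + 2n$.

```python
def floor_log2(x):
    """floor(log2(x)) for a positive integer x, computed exactly."""
    return x.bit_length() - 1

def I_closed(n):
    """Minimum internal path length, using m = floor(log2(n + 1))."""
    m = floor_log2(n + 1)
    return (n + 1) * m - 2 ** (m + 1) + 2

def E_closed(n):
    """Minimum external path length, using m = floor(log2(n))."""
    m = floor_log2(n)
    return (n + 1) * (m + 2) - 2 ** (m + 1)

assert all(E_closed(n) == I_closed(n) + 2 * n for n in range(1, 100000))
print(E_closed(7))   # 24
```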

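Substituting $E(n)$ into the equation for $T'(n)$, the average case for unsuccessful searches can be determined:

$$T'(n) = \frac{(n+1)\left(\lfloor \log_2 n \rfloor + 2\right) - 2^{\lfloor \log_2 n \rfloor + 1}}{n+1} = \lfloor \log_2 n \rfloor + 2 - \frac{2^{\lfloor \log_2 n \rfloor + 1}}{n+1}$$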
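The closed form $T'(n) = \lfloor \log_2 n \rfloor + 2 - \frac{2^{\lfloor \log_2 n \rfloor + 1}}{n+1}$ can be compared against a direct simulation of one unsuccessful search per interval. This is a minimal Python sketch, assuming the floor midpoint; the function names are hypothetical, and each target is described only by the index of the interval (gap) it falls in, so no actual array values are needed.

```python
from fractions import Fraction

def miss_iterations(n, gap):
    """Iterations of a floor-midpoint binary search over indices 0..n-1 for a
    target in interval `gap`: gap 0 lies below index 0, gap n lies above
    index n-1, and gap g otherwise lies between indices g-1 and g."""
    lo, hi, iterations = 0, n - 1, 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if mid < gap:            # element at mid is smaller than the target
            lo = mid + 1
        else:                    # element at mid is larger than the target
            hi = mid - 1
    return iterations

def T_prime_simulated(n):
    """Average iterations over the n + 1 intervals, one search each."""
    return Fraction(sum(miss_iterations(n, g) for g in range(n + 1)), n + 1)

def T_prime_closed(n):
    """floor(log2 n) + 2 - 2**(floor(log2 n) + 1) / (n + 1)."""
    m = n.bit_length() - 1
    return m + 2 - Fraction(2 ** (m + 1), n + 1)

assert all(T_prime_simulated(n) == T_prime_closed(n) for n in range(1, 500))
```

Both functions give 3 for $n = 7$ and $12/5$ for $n = 4$.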