In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

  • insert(X), which inserts X into set S;
  • delete(X), which removes X from set S;
  • label(X), which returns a label assigned to X subject to:
    • $\mathrm{label}(X) \in \{0, 1, \ldots, m-1\}$
    • for all $X, Y \in S$, $X < Y$ implies $\mathrm{label}(X) < \mathrm{label}(Y)$

The cost of a list labeling algorithm is the number of label (re-)assignments per insertion or deletion. List labeling algorithms have applications in many areas, including the order-maintenance problem, cache-oblivious data structures,[1] data structure persistence,[2] graph algorithms[3][4] and fault-tolerant data structures.[5]

Sometimes the list labeling problem is presented where S is not a set of values but rather a set of objects subject to a total order. In this setting, when an item is inserted into S, it is specified to be the successor of some other item already in S. For example, this is the way that list labeling is used in the order-maintenance problem. The solutions presented below apply to both formulations.

Upper bounds

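The cost of list labeling is related to $m$, the range of the labels assigned. Suppose that no more than $n$ items are stored in the list-labeling structure at any time. Four cases have been studied:

  • $m = 2^{\Omega(n)}$ (exponential labels)
  • $m = n^{\Omega(1)}$ (polynomial labels)
  • $m = O(n)$ (linear labels)
  • $m = (1+\varepsilon)n$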

Exponential Labels

In the exponential label case, each item that is inserted can be given a label that is the average of its neighboring labels. It takes $\Omega(n)$ insertions before two items are at adjacent labels and there are no labels available for items in between them. When this happens, all items are relabelled evenly from the space of all labels. This incurs $O(n)$ relabeling cost. Thus, the amortized relabeling cost in this case is $O(1)$.[6]
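The midpoint-and-relabel strategy can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name `ExponentialLabeler`, the `insert_after` interface, and the `relabels` counter are assumptions made for this example, and locating the predecessor with `list.index` is linear-time bookkeeping that is ignored because only label (re-)assignments count as cost.

```python
class ExponentialLabeler:
    """Sketch of list labeling with a large label space (hypothetical helper)."""

    def __init__(self, label_bits):
        self.m = 1 << label_bits   # labels are drawn from 0 .. m-1
        self.items = []            # items in list order
        self.label = {}            # item -> current label
        self.relabels = 0          # total number of label (re-)assignments

    def _relabel_evenly(self):
        # Spread all items evenly over the whole label space.
        gap = self.m // (len(self.items) + 1)
        if gap == 0:
            raise OverflowError("more items than available labels")
        for i, item in enumerate(self.items):
            self.label[item] = (i + 1) * gap
            self.relabels += 1

    def insert_after(self, pred, item):
        """Insert item right after pred; pred=None inserts at the front."""
        pos = 0 if pred is None else self.items.index(pred) + 1
        # Sentinels -1 and m stand in for missing neighbours.
        lo = -1 if pos == 0 else self.label[self.items[pos - 1]]
        hi = self.m if pos == len(self.items) else self.label[self.items[pos]]
        self.items.insert(pos, item)
        if hi - lo < 2:
            # No free label between the neighbours: relabel everything.
            self._relabel_evenly()
        else:
            # Take the midpoint of the neighbouring labels.
            self.label[item] = (lo + hi) // 2
            self.relabels += 1


labeler = ExponentialLabeler(label_bits=4)   # m = 16
labeler.insert_after(None, "a")
for name in "bcde":
    labeler.insert_after("a", name)          # always insert right after "a"
```

Repeatedly inserting at the same position halves the available gap each time, so with only 16 labels the fifth insertion already triggers a full relabeling; with a label space exponential in the number of items, that happens only after a linear number of insertions.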

Polynomial Labels

The other cases of list labeling can be solved via balanced binary search trees. Consider $T$, a binary search tree on $S$ of height $h$. We can label every node in the tree via a path label as follows: Let $\sigma(X)$ be the sequence of left and right edges on the root-to-$X$ path, encoded as bits. So if $X$ is in the left subtree of the root, the high-order bit of $\sigma(X)$ is $0$, and if it is in the right subtree of the root, the high-order bit of $\sigma(X)$ is $1$. Once we reach $X$, we complete $\sigma(X)$ to a length of $h+1$ as follows. If $X$ is a leaf, we append $0$s as the low order bits until $\sigma(X)$ has $h+1$ bits. If $X$ is an internal node, we append one $0$ and then $1$s as the low order bits until $\sigma(X)$ has $h+1$ bits.

The important properties of $\sigma()$ are that: these labels are in the range $\{0, 1, \ldots, 2^{h+1}-1\}$; and for two nodes with keys $X$ and $Y$ in $T$, if $X < Y$ then $\sigma(X) < \sigma(Y)$. To see this latter property, notice that the property is true if the least common ancestor of $X$ and $Y$ is neither $X$ nor $Y$, because $\sigma(X)$ and $\sigma(Y)$ will share bits until their least common ancestor. If $X < Y$, then because $T$ is a search tree, $X$ will be in the left subtree and will have a next bit of $0$, whereas $Y$ will be in the right subtree and will have a next bit of $1$.

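Suppose instead that, without loss of generality, the least common ancestor of $X$ and $Y$ is $X$, and that $X$ has depth $d$. If $Y$ is in the left subtree of $X$, then $\sigma(X)$ and $\sigma(Y)$ share the first $d+1$ bits. The remaining bits of $\sigma(X)$ are all 1s, whereas the remaining bits of $\sigma(Y)$ must have a $0$, so $\sigma(Y) < \sigma(X)$. If instead $Y$ is in the right subtree of $X$, then $\sigma(X)$ and $\sigma(Y)$ share the first $d$ bits and the $(d+1)$st bit of $\sigma(X)$ is $0$, whereas the $(d+1)$st bit of $\sigma(Y)$ is $1$. Hence $\sigma(X) < \sigma(Y)$.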

We conclude that the $\sigma()$ function fulfills the monotonicity property of the label() function. Thus if we can balance the binary tree to a depth of $(\log m) - 1$, we will have a solution to the list labeling problem for labels in the range $\{0, 1, \ldots, m-1\}$.
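As an illustration of the path-label construction, the following Python sketch computes a label for every node of a small binary search tree. It assumes the padding rule in which a leaf is padded with 0s and an internal node with one 0 followed by 1s, and a label width of one bit more than the tree height; the `Node` class and the function names are hypothetical helpers introduced only for this example.

```python
class Node:
    """Plain binary search tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def height(node):
    """Height in edges; an empty tree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def path_labels(root):
    """Return {key: path label}, each label padded to height + 1 bits."""
    h = height(root)
    labels = {}

    def visit(node, prefix, depth):
        if node is None:
            return
        pad = h + 1 - depth                  # bits still to fill, always >= 1
        if node.left is None and node.right is None:
            label = prefix << pad            # leaf: pad with 0s
        else:
            # internal node: one 0, then 1s
            label = (prefix << pad) | ((1 << (pad - 1)) - 1)
        labels[node.key] = label
        visit(node.left, prefix << 1, depth + 1)         # left edge = bit 0
        visit(node.right, (prefix << 1) | 1, depth + 1)  # right edge = bit 1

    visit(root, 0, 0)
    return labels


# Tree of height 2 on the keys 1..6, so labels have 3 bits.
tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5)))
labels = path_labels(tree)
```

For this tree the root 4 receives the bits 011, its left child 2 receives 001, and the leaf 5 receives 100, so the labels increase with the keys as the argument above predicts.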

Weight-balanced trees

In order to use a self-balancing binary search tree to solve the list labeling problem, we need to first define the cost function of a balancing operation on insertion or deletion to equal the number of labels that are changed, since every rebalancing operation of the tree would have to also update all path labels in the subtree rooted at the site of the rebalance. So, for example, rotating a node with a subtree of size $k$, which can be done in constant time under usual circumstances, requires $\Omega(k)$ path label updates. In particular, if the node being rotated is the root then the rotation would take time linear in the size of the whole tree. With that much time the entire tree could be rebuilt. We will see below that there are self-balancing binary search tree data structures that cause an appropriate number of label updates during rebalancing.

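A weight-balanced tree BB[$\alpha$] is defined as follows. For every $X$ in a root tree $T$, define $\mathrm{size}(X)$ to be the number of nodes in the subtree rooted at $X$. Let the left and right children of $X$ be $X.\mathrm{left}$ and $X.\mathrm{right}$, respectively. A tree $T$ is $\alpha$-weight balanced if for every internal node $X$ in $T$, $\mathrm{size}(X.\mathrm{left}) \le \lceil \alpha \cdot \mathrm{size}(X) \rceil$ and $\mathrm{size}(X.\mathrm{right}) \le \lceil \alpha \cdot \mathrm{size}(X) \rceil$.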
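A weight-balance test of this kind can be sketched in Python. The sketch assumes the balance condition takes the form "each child's subtree has at most ⌈α · size⌉ nodes", which is one common way to state it; the `Node` class and the helpers `size`, `is_weight_balanced`, and `chain` are hypothetical names for this example, and `size` is recomputed recursively for brevity rather than cached.

```python
import math


class Node:
    """Plain binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def size(node):
    """Number of nodes in the subtree rooted at node (0 if empty)."""
    if node is None:
        return 0
    return 1 + size(node.left) + size(node.right)


def is_weight_balanced(node, alpha):
    """Check the assumed alpha-weight-balance condition at every node."""
    if node is None:
        return True
    limit = math.ceil(alpha * size(node))
    return (size(node.left) <= limit
            and size(node.right) <= limit
            and is_weight_balanced(node.left, alpha)
            and is_weight_balanced(node.right, alpha))


def chain(n):
    """Right-leaning path on the keys 1..n, the most unbalanced shape."""
    root = None
    for key in range(n, 0, -1):
        root = Node(key, right=root)
    return root


balanced = Node(2, Node(1), Node(3))
```

A path of four nodes violates the condition for α = 1/2 but satisfies it for α = 3/4, which shows how a larger α tolerates more skew at the price of a taller tree.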

The height of a BB[$\alpha$] tree with $n$ nodes is at most $\log_{1/\alpha} n$. Therefore, in order to solve the list-labeling problem, we need $\alpha \le 2^{-\log n / ((\log m) - 1)}$ to achieve a depth of $(\log m) - 1$.

A scapegoat tree is a weight-balanced tree where whenever a node no longer satisfies the weight-balance condition the entire subtree rooted at that node is rebuilt. This rebalancing scheme is ideal for list labeling, since the cost of rebalancing now equals the cost of relabeling. The amortized cost of an insertion or deletion is $O\left((2\alpha - 1)^{-1} \log_{1/\alpha} n\right)$. For the list labeling problem, the cost becomes:

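  • $m = n^{\Omega(1)}$: $\alpha$ is a constant, and the cost of list labeling is amortized $O(\log n)$. (Folklore, modification of Itai, Konheim and Rodeh.[7])
  • $m = O(n)$: $\alpha = 1/2 + \Theta(1/\log n)$, and the cost of list labeling is amortized $O(\log^2 n)$. This bound was first achieved by Itai, Konheim, and Rodeh[7] and deamortized by Willard.[8]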
  • $m = (1+\varepsilon)n$: If $n$ is a power of two, then we can set $\alpha = 1/2$, and the cost of list labeling is $O(\log^3 n)$. A more careful algorithm can achieve this bound even in the case where $n$ is not a power of two.
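The rebuild-on-violation idea behind these bounds can be sketched in Python. This is a simplified illustration under stated assumptions rather than a full scapegoat tree: it handles insertions only, checks an assumed balance condition of the form "child size at most ⌈α · size⌉" at every node on the insertion path, and counts the sizes of rebuilt subtrees as an upper bound on the number of path labels that change when the label width is fixed. All names (`Node`, `build`, `insert`, `stats`) are hypothetical.

```python
import math


class Node:
    """Tree node with a cached subtree size (hypothetical helper)."""
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1


def size(node):
    return node.size if node else 0


def flatten(node, out):
    """Append the keys of the subtree to out in sorted order."""
    if node:
        flatten(node.left, out)
        out.append(node.key)
        flatten(node.right, out)


def build(keys, lo, hi):
    """Build a perfectly balanced subtree from the sorted slice keys[lo:hi]."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build(keys, lo, mid)
    node.right = build(keys, mid + 1, hi)
    node.size = hi - lo
    return node


def insert(node, key, alpha, stats):
    """Insert key and rebuild every subtree on the path that is unbalanced.

    stats["relabels"] accumulates 1 for the new node plus the size of each
    rebuilt subtree, an upper bound on the labels that change.
    """
    if node is None:
        stats["relabels"] += 1
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key, alpha, stats)
    else:
        node.right = insert(node.right, key, alpha, stats)
    node.size = 1 + size(node.left) + size(node.right)
    limit = math.ceil(alpha * node.size)
    if size(node.left) > limit or size(node.right) > limit:
        keys = []
        flatten(node, keys)
        stats["relabels"] += len(keys)   # every node here may get a new label
        return build(keys, 0, len(keys))
    return node


stats = {"relabels": 0}
root = None
for key in range(1, 65):                 # sorted input, the worst case for a BST
    root = insert(root, key, 0.7, stats)
in_order = []
flatten(root, in_order)
```

Because a rebuilt subtree costs as many relabelings as it has nodes, the counter in `stats` tracks both the rebalancing work and the relabeling cost, which is the property that makes this rebuilding scheme a natural fit for list labeling.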

Lower bounds and open problems

In the case where $m = n^{1+\Theta(1)}$, a lower bound of $\Omega(\log n)$[9] has been established for list labeling. This lower bound applies to randomized algorithms, and so the known bounds for this case are tight.

In the case where $m = (1+\Theta(1))n$, there is a lower bound of $\Omega(\log^2 n)$ list labeling cost for deterministic algorithms.[6] Furthermore, the same lower bound holds for smooth algorithms, which are those whose only relabeling operation assigns labels evenly in a range of items.[10] This lower bound is surprisingly strong in that it applies in the offline case, where all insertions and deletions are known ahead of time.

However, the best lower bound known for the linear case of algorithms that are allowed to be non-smooth and randomized is $\Omega(\log n)$. Indeed, it has been an open problem since 1981 to close the gap between the $O(\log^2 n)$ upper bound and the $\Omega(\log n)$ lower bound in the linear case.[7][11] Some progress on this problem has been made by Bender et al., who give a randomized upper bound of $O(\log^{3/2} n)$.[12]

Applications

The best known applications of list labeling are the order-maintenance problem and packed-memory arrays for cache-oblivious data structures. The order-maintenance problem is that of maintaining a data structure on a linked list to answer order queries: given two items in the linked list, which is closer to the front of the list? This problem can be solved directly by polynomial list labeling in $O(\log n)$ time per insertion and deletion and $O(1)$ time per query, by assigning labels that are monotone with the rank in the list. The time for insertions and deletions can be improved to constant time by combining polynomial list labeling with exponential list labeling on small lists.

The packed-memory array is an array of size $(1+\varepsilon)n$ to hold $n$ items so that any subarray of size $k$ holds $\Theta(k)$ items. This can be solved directly by the $m = (1+\varepsilon)n$ case of list labeling, by using the labels as addresses in the array, as long as the solution guarantees that the space between items is $O(1)$. Packed-memory arrays are used in cache-oblivious data structures to store data that must be indexed and scanned. The density bounds guarantee that a scan through the data is asymptotically optimal in the external-memory model for any block transfer size.

References

  1. Bender, Michael A.; Demaine, Erik D.; Farach-Colton, Martin (2005), "Cache-oblivious B-trees", SIAM Journal on Computing, 35 (2): 341–358, doi:10.1137/S0097539701389956, MR 2191447.
  2. Driscoll, James R.; Sarnak, Neil; Sleator, Daniel D.; Tarjan, Robert E. (1989), "Making data structures persistent", Journal of Computer and System Sciences, 38 (1): 86–124, doi:10.1016/0022-0000(89)90034-2, MR 0990051.
  3. Eppstein, David; Galil, Zvi; Italiano, Giuseppe F.; Nissenzweig, Amnon (1997), "Sparsification—a technique for speeding up dynamic graph algorithms", Journal of the ACM, 44 (5): 669–696, doi:10.1145/265910.265914, MR 1492341, S2CID 263324404.
  4. Katriel, Irit; Bodlaender, Hans L. (2006), "Online topological ordering", ACM Transactions on Algorithms, 2 (3): 364–379, CiteSeerX 10.1.1.78.7933, doi:10.1145/1159892.1159896, MR 2253786, S2CID 6552974.
  5. Aumann, Yonatan; Bender, Michael A. (1996), "Fault tolerant data structures", Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS 1996), pp. 580–589, doi:10.1109/SFCS.1996.548517, ISBN 978-0-8186-7594-2, S2CID 80348.
  6. Bulánek, Jan; Koucký, Michal; Saks, Michael E. (2015), "Tight Lower Bounds for the Online Labeling Problem", SIAM Journal on Computing, 44: 1765–1797.
  7. Itai, Alon; Konheim, Alan G.; Rodeh, Michael (1981), "A Sparse Table Implementation of Priority Queues", ICALP, pp. 417–431.
  8. Willard, Dan E. (1992), "A Density Control Algorithm for Doing Insertions and Deletions in a Sequentially Ordered File in Good Worst-Case Time", Information and Computation, 97: 150–204.
  9. Dietz, Paul F.; Seiferas, Joel I.; Zhang, Ju (1994), "A tight lower bound for on-line monotonic list labeling", Algorithm theory—SWAT '94 (Aarhus, 1994), Lecture Notes in Computer Science, vol. 824, Berlin: Springer, pp. 131–142, doi:10.1007/3-540-58218-5_12, ISBN 978-3-540-58218-2, MR 1315312.
  10. Dietz, Paul F.; Zhang, Ju (1990), "Lower bounds for monotonic list labeling", Algorithm theory—SWAT '90, pp. 173–180.
  11. Saks, Michael (2018), "Online Labeling: Algorithms, Lower Bounds and Open Questions", International Computer Science Symposium in Russia, pp. 23–28.
  12. Bender, Michael A.; Conway, Alex; Farach-Colton, Martin; Komlos, Hanna; Kuszmaul, William; Wein, Nicole (October 2022), "Online List Labeling: Breaking the $\log^2 n$ Barrier", 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), IEEE, pp. 980–990, arXiv:2203.02763, doi:10.1109/focs54457.2022.00096, ISBN 978-1-6654-5519-0, S2CID 247292594.