Talk:Rope (data structure)

Latest comment: 6 months ago by Jae in topic Language

Wrong O Notation? edit

String search in an array representation is O(1) ???

What about O(n)? — Preceding unsigned comment added by 91.213.91.28 (talk) 11:39, 19 October 2011 (UTC)Reply

If you read further up in the article, it defines search as an indexed access:
Definition: Search(i): return the character at position i
So the O(1) is perfectly fine.
-- Grumbel (talk) 00:23, 22 October 2011 (UTC)Reply

Okay!! Thank you! 91.213.91.28 (talk) 13:13, 25 November 2011 (UTC)Reply

Needs Improving! edit

Obviously,this article is too simple. Pictures/Examples are needed badly! Anyone who is familiar with rope please help! Visame (talk) 19:54, 3 May 2008 (UTC)Reply

Heck, I'd appreciate even a description of their space/time characteristics. --Gwern (contribs) 16:33 21 September 2008 (GMT)

yes, this article needs a major rewrite: the description is awol. — Preceding unsigned comment added by 131.111.1.66 (talk) 12:30, 20 July 2011 (UTC)Reply

I've been studying ropes extensively, as I've been trying to write an implementation. This page desperately needs help. Critical things, such as the time-complexity table comparing ropes to strings, are just down right wrong. For example, the time complexity for rope concatenation is listed as O(log n); in reality, it's O(n). SGI's implementation, which is cited by TFA, even notes this:

The worst-case cost of rebalancing is nonetheless linear in the string.

The other big citation of TFA, the Boehm et al. paper, states both constant, and linear (and maybe also logarithmic) complexity in different parts of the paper. Should a rope trigger rebalancing, the rebalancing algorithm is:

We traverse the rope from left to right, inserting each leaf into the appropriate sequence position, depending on its length.

To traverse the rope left-to-right, you MUST visit all nodes. Hence, linear. (The Boehm et al. paper's algorithm effectively rebuilds the tree, from scratch, weighting subtrees based on the lengths of their contained strings.) However, earlier in the paper:

In practice, we modify this in two ways. Concatenation is often a sufficiently important operation that it should run in unit, not logarithmic, time.

Contradicting the actual algorithm given for it. (Unless they mean amortized time; the paper is sloppy w.r.t. to the actual argument of any of its claims.) Lastly, they claim that AVL or B-Trees can be used for logarithmic time,

The first three of the above operations can be performed in a time logarithmic in the length of the argument, using, for example, B-trees or AVL trees

but does not state how; it is not as simple as "just use a tree algo, you have a tree!" as the other tree algos presume that internal (non-leaf) nodes in the tree hold data; this is not true for ropes.

Deathanatos (talk) 06:46, 7 March 2017 (UTC)Reply

Just LISP lists? edit

From this description, "ropes" look like http://en.wikipedia.org/wiki/Lisp_(programming_language)#Conses_and_lists

Could someone look up some feature of "ropes" that distinguish it from core LISP features from the 1950s? And if not, maybe this should be merged into the LISP page somehow. 78.105.8.188 (talk) 11:58, 21 September 2008 (UTC)Reply

Not quite seeing the problem here. The article clearly states that
A rope is essentially a binary tree whose leaves are arrays of characters.
which basically has nothing in common with Lisp-style lists, except in the trivial sense that binary trees can be implemented using conses. 87.194.117.80 (talk) 21:20, 5 June 2009 (UTC)Reply
78.105.8.18, your comment is extraordinarily stupid. Ropes can't even be implemented properly in LISP since the leaf nodes are supposed to be contiguous arrays near the size of a cache line. -- 98.108.211.71 (talk) 11:04, 24 July 2010 (UTC)Reply
98.108.211.71, Why do you say this? Do you believe that Lisp does not provide contiguous arrays of a given size? Common Lisp certainly does. Hammertime (talk) 11:31, 7 May 2011 (UTC)Reply
In fact, there is a Common Lisp implementation of ropes, as part of the FSet functional collections library (though it calls them "seqs"). (There are some minor differences: rebalancing is done (in log time) anytime a new seq is constructed, using the Weight-balanced tree algorithm, and the weight of each node is the total of its left and right subtrees, not just that of its left subtree.) ScottBurson (talk) 00:02, 7 July 2018 (UTC)Reply

Xanadu? edit

It sounds a lot like the Model T enfilade - even has a link back here from that article. Going to make a link at the bottom. —Preceding unsigned comment added by 97.75.165.74 (talk) 21:52, 6 October 2010 (UTC)Reply

Is this like ‘Piece-table’ with a tree as internal representation? edit

The reason I asks this is because I saw a draft outline for a (as of yet unfinished) book about "Purely Functional Data Structures" (What that now might mean), that kind of implies that Ropes and Piece-Tables are the same. I believe that they are not too closely related, but I think that it does not hurt to ask this here as it might be useful information for others (Maybe there could be a link to similar data structures on the page?).

I perfectly understands what Piece-tables is, and am quite sure I can implement them, but using linked lists.. But I am quite much interested in the speed of (Random?) access the trees can give, as for a linked list I must start searching from the beginning to reach the position 𝒙, while in a tree it is a fast binary search.

I really have tried to search for information regarding this topic, but I mostly get results about coffee tables from Google... It is quite hopeless to search for either "Piece Table" or "Ropes", so I kind of hopes that some one can give me pointers to some page discussing this or some publication regarding using Piece-Tables together with Trees.

Thanks in advance — Frank Eriksson 20:14, 7 September 2011 (UTC) — Preceding unsigned comment added by 81.228.157.247 (talk)

Ropes are very different from piece tables. The most obvious difference being the backend structure: a piece table is is represented by two arrays/buffers (file buffer, add buffer) while a Rope is represented as a binary tree. The rope is constructed in a piece-wise fashion from the input data, and individual strings within the rope are updated over the course of editing. There's no distinction between bits of the rope that were there at the start, and bits of the rope that have just been added. gringer (talk) 01:54, 3 February 2014 (UTC)Reply
using Piece-Tables together with Trees
I see that some people have suggested tweaking a piece table implementation to replace its linked list with a binary tree. (Such people include Joaquin Cuenca Abela; "Improving the AbiWord's Piece Table"). Is there a name for the resulting data structure?
differences between piece tables and ropes
What other data structures support support quickly inserting and deleting into the (logical) middle of a file or long string? (A common operation for text editors, word processors, etc.).
I agree with Frank Eriksson that this article should link to articles about those closely related data structures. (Such data structures are alluded to as "The core data structure in a text editor" in the string (computer science) and the text editor article).
The piece table and the rope data structure have many features in common, and also many differences.
Would it improve Wikipedia to have some text comparing them, pointing out those differences and common features?
Should that text be here in the rope data structure article, or in the piece table article?
Is there a more general article that would be more appropriate for that text comparing and contrasting these two data structures and closely related data structures?
(The above text by gringer is a good first draft; other differences and similarities are mentioned in the comments at https://www.reddit.com/r/programming/comments/22fpz0/whats_been_wrought_using_the_piece_table_how/ ).
--DavidCary (talk) 00:39, 25 June 2016 (UTC)Reply

Describe how "simple rope" is created? edit

The example rope given at the top of the article is not explained at all -- how would one create that rope from the original string "Hello_my_name_is_Simon"?

Order Statistic tree edit

Would it make sense to describe the tree used by rope as a variant on Order statistic tree? — Preceding unsigned comment added by Soronel~enwiki (talkcontribs) 13:06, 17 March 2016 (UTC)Reply

Not really... rope elements aren't ordered by any key. bungalo (talk) 20:54, 8 February 2020 (UTC)Reply

Searching for substrings is not mentioned edit

The article does not tell how string search (like in the C standard library function strstr) works and performs. It seems to me that a simple string search can get overly complicated and slow when using rope structures. -- Juergen 217.61.203.35 (talk) 16:33, 25 July 2020 (UTC)Reply

A string search would proceed just like in any sequence, in O(N+M) time if using KMP, for example. It may be slower in practice because the memory is fragmented, getting less out of SIMD implementations. On the other hand I wonder if there's an algorithm to exploit the shared nature of rope nodes to search each node only once -- that may improve the theoretical performance if there's a lot of duplicate substrings. bungalo (talk) 19:46, 25 July 2020 (UTC)Reply

Computational Complexity is Based on Bytes/ASCII edit

The computational complexity numbers are only accurate for tasks where the user perceived grapheme cluster/symbol/character has a trivial correlation to a single encoded value. O(n) analysis is always a bit misleading and needs to be taken with a grain of salt, but I think it would be helpful to add a comparison to the minimal data structure required to support ~Unicode NFC text editing. — Preceding unsigned comment added by Indolering (talkcontribs) 00:56, 4 December 2020 (UTC)Reply

The real comparison here is between ropes, arrays, gap buffers, piece tables, etc... The encoding of the data is orthogonal to the data-structure you use to hold the bits. Ropes aren't restricted to Bytes and ASCII just like regular strings and text files aren't. You can build a rope out of 32-bit integer if you wish. You should be aware that under no encoding a "user perceived grapheme cluster" would have "a single encoded value" of fixed width. Therefore, even Unicode-aware text editors would work with byte offsets into the UTF-8 encoded stream rather than grapheme indexes. bungalo (talk) 05:15, 4 December 2020 (UTC)Reply

Rebalancing edit

The only tricky part in the algorithm described by Boehm/Atkinson/Plass is tree rebalancing; the description in the paper is very sketchy, but it's clear that getting it right is crucial. This article hardly mentions the problem of rebalancing. It also doesn't discuss the optimizations needed to economize on space when a string is constructed by appending one character at a time. Mhkay (talk) 19:38, 18 December 2020 (UTC)Reply

Reverting ASCII/UTF-8/Unicode indexing efficiency edit

The following sentence has multiple issues:

"While ASCII-encoded strings have O(1) indexing, UTF-8 unicode strings do not, and indexing a UTF-8 string without caching is O(n) worst case. Ropes maintain O(log(n)) indexing even for unicode text."
  • First half assumes that indexing a UTF-8 string counts codepoints, when in reality it is more often done in bytes.
  • It says that UTF-8 is O(n) worst case but somehow ropes maintain O(log(n)) for unicode... UTF-8 is one of Unicode encodings so it's unclear how Ropes maintain that time complexity.
  • It talks about text encodings, which is an orthogonal topic to Ropes as data-structure.

bungalo (talk) 20:48, 7 April 2021 (UTC)Reply

Language edit

Are the examples Java? The article should probably say so. Peter Flass (talk) 20:06, 20 July 2023 (UTC)Reply

No. Please see WP:NOTREPOSITORY. --WikiLinuz {talk} 20:44, 20 July 2023 (UTC)Reply
It's not my code, but I think that, if it's there, it should say what language it is. Peter Flass (talk) 00:02, 21 July 2023 (UTC)Reply
Are you arguing against an addition of "these examples are written in the programming language Java" or against the examples themselves? I'm with Peter, if there's code in a concrete programming language, the language must be specified. jae (talk) 19:06, 23 October 2023 (UTC)Reply

Computational complexity edit

The rebalancing code in this article is sub-optimal. The rebalancing doesn't need to blow up the entire rope to the individual leaves, only to its already balanced subtrees. Given that the input ropes to each of the operations (split, concatenate) are already balanced, and all of their subtrees are also balanced, the rebalancing needs to work only with O(log n) subtrees along the path where the operation takes place. This is what gives split() its O(log n) guarantee. Concatenation and append are similarly O(log n) worst case. However when a bigger rope is built from smaller pieces, they are O(1) amortized by reasoning similar to how in-order construction of an rb-tree is O(n) rather than O(n log n). Insert is a combination of split and concat, giving it the split complexity.

Unfortunately the literature on the correct implementation of ropes is rather lacking, so this is mostly based on exploring existing rope implementation with some original research mixed in. But to be fair, so is most of this article. bungalo (talk) 14:28, 13 September 2023 (UTC)Reply