Draft article not currently submitted for review.
This is a draft Articles for creation (AfC) submission. It is not currently pending review. While there are no deadlines, abandoned drafts may be deleted after six months. To edit the draft click on the "Edit" tab at the top of the window. To be accepted, a draft should:
It is strongly discouraged to write about yourself, your business or employer. If you do so, you must declare it. Where to get help
How to improve a draft
You can also browse Wikipedia:Featured articles and Wikipedia:Good articles to find examples of Wikipedia's best writing on topics similar to your proposed article. Improving your odds of a speedy review To improve your odds of a faster review, tag your draft with relevant WikiProject tags using the button below. This will let reviewers know a new draft has been submitted in their area of interest. For instance, if you wrote about a female astronomer, you would want to add the Biography, Astronomy, and Women scientists tags. Editor resources
Last edited by 90.95.194.207 (talk | contribs) 41 days ago. (Update) |
Powersort
editClass | Sorting Algorithm |
---|---|
Data structure | Array |
Worst-case performance | O(nlogn) |
Worst-case space complexity | O(n) |
Powersort is an adaptive and efficient sorting algorithm designed to exploit the existing order within the input data to achieve near optimal sorting performance. Powersort belongs to the family of merge sort algorithms and addresses certain limitations observed in previous adaptive sorting algorithms like Timsort. More specifically, Powersort is a drop-in replacement for Timsort's suboptimal merge policy derived from first principles (see connection to nearly optimal binary search trees). Like Timsort, Powersort is particularly notable for its stability and ability to handle a variety of input patterns with minimal overhead.[1] Powersort was proposed by J. Ian Munro and Sebastian Wild.[2]The development of Powersort was motivated by the need for a sorting algorithm that could efficiently handle partially ordered data with minimal overhead and by shortcomings of the predominant stable sorting algorithm in practice, Timsort.
Practical hands-on experience with Powersort can be gained by playing Powersort's Pursuit and for those liking a more graphical approach Sebastian Wild's PyCon US 2023 talk is available here.
Algorithm Overview
editPowersort is a stable mergesort variant that adapts to the existing runs in the input data. The algorithm builds on the concept of nearly optimal binary search trees, leveraging the structure of the input to minimize the number of comparisons and data movements required for sorting.
The merge policy of Powersort is designed to optimize the merging process by leveraging the inherent structure in the input data. Powersort begins by identifying naturally occurring runs of already sorted elements within the input array. These runs are then used to guide the merging process. The algorithm employs a nearly optimal binary search tree approach to determine the optimal order in which runs should be merged. This approach minimizes the total number of comparisons and data movements required during the merge operations. Powersort uses a stack to manage the merging process, ensuring that the runs are merged in a cache-friendly manner, which helps to reduce overhead and improve performance. By focusing on merging runs in an optimal sequence, Powersort effectively reduces the overall complexity of the sort, making it highly efficient for a wide range of input patterns, including those with partial order.
Powersort is a stable sorting algorithm, meaning that it preserves the relative order of equal elements. This property is essential for many applications where the stability of the sort is required to maintain data integrity and consistency.[1]
Powersort Adoption
editThe implementation of Powersort in CPython began with Python 3.11, replacing the older Timsort algorithm. The change was motivated by Powersort's superior performance and stability. The core implementation can be found in the CPython source code within the `listobject.c` file, where the list sorting functions are defined. The detailed merge policies and algorithm are described in `listsort.txt`.[3][4] The transition to Powersort involved addressing issue #78742 in the CPython repository.[5] The PyPy project, known for its high-performance Just-In-Time (JIT) compiler for Python, also integrated Powersort. The relevant commit, identified as `ab2269f3c0ca0265b4ff462f1bda29442182de11`, details the inclusion of Powersort into PyPy's list sorting functions.[6] In AssemblyScript, Powersort was integrated to enhance the performance of WebAssembly applications. The relevant pull request, #1904, and the implementation details can be found in the `sort.ts` file within the AssemblyScript standard library.[7][8] These implementations across different platforms highlight the adaptability and efficiency of Powersort in various programming environments.
PowerSort(A[1..n]) 1 X := stack of runs (capacity [lg(n)] + 1) 2 P := stack of powers (capacity [lg(n)] + 1) 3 s1 := 1; e1 = ExtendRunRight(A[1], n) // A[s1..e1] is leftmost run 4 while e1 < n 5 s2 := e1 + 1; e2 := ExtendRunRight(A[s2], n) // A[s2..e2] next run 6 p := NodePower(s1, e1, s2, e2, n) // Pj for node j between A[s1..e1] and A[s2..e2] 7 while P.top() > p // previous merge deeper in tree than current 8 P.pop() // merge and replace run A[s1..e1] by result 9 (s1, e1) := Merge(X.pop(), A[s1..e1]) 10 end while 11 X.push(A[s1, e1]); P.push(p) 12 s1 := s2; e1 := e2 13 end while // Now A[s1..e1] is the rightmost run 14 while ¬X.empty() 15 (s1, e1) := Merge(X.pop(), A[s1..e1]) 16 end while NodePower(s1, e1, s2, e2, n) 1 n1 := e1 − s1 + 1; n2 := e2 − s2 + 1; l := 0 2 a := (s1 + n1/2 − 1)/n; b := (s2 + n2/2 − 1)/n 3 while [a · 2l] == [b · 2l] do l := l + 1 end while 4 return l
Comparison with Timsort
editTimsort, widely used in programming languages like Python, also aims to optimize sorting by leveraging existing order in the data. [1] However, Powersort addresses some of Timsort's limitations. Timsort uses heuristic rules to determine the merging order, which can lead to suboptimal performance in certain cases.[9] Powersort, on the other hand, uses a theoretically grounded approach to determine a nearly optimal merging order, which allows provable guarantees on its merge cost. Powersort provides provable guarantees on its performance, ensuring near-optimal sorting with minimal overhead. Timsort's performance can degrade significantly on specific input patterns. For instance, Powersort has been shown to outperform Timsort by up to 30% on certain types of input data that contain long runs of sorted elements.[2] Despite its advanced theoretical foundation, Powersort's implementation is relatively straightforward and avoids some of the intricate rules and corrections required in Timsort.
Multiway Powersort
editMultiway Powersort is an extension of Powersort that generalizes the binary merging process to k-way merging. This approach can further reduce the number of merge operations required by taking advantage of the existing runs more effectively. By merging multiple runs at once, Multiway Powersort can achieve better cache performance and reduce the overall computational cost.
This extension maintains the stability and adaptiveness of the original Powersort algorithm while improving its efficiency for specific types of input data.[10] Multiway Powersort was detailed by William Cawley Gelling, Markus E. Nebel, Benjamin Smith, and Sebastian Wild in 2023.[10]
KwayPowerSortk(A[0..n)) 1 S := empty stack // capacity (k − 1)⌈logk(n) + 1⌉ 2 b1 := 0; e1 = FirstRunOf(A[b1..n)) 3 while e1 < n 4 b2 := e1; e2 := FirstRunOf(A[b2..n)) 5 P := Powerk(n, b1, e1, b2, e2) 6 while S.top().power > P 7 P′:= S.top().power 8 L := empty list; L.append(S.pop()) 9 while S.top().power == P′ 10 L.append(S.pop()) 11 end while // merge runs in L with A[b1..e1) 12 (b1, e1) := Merge(L, A[b1..e1)) 13 end while 14 S.push(A[b1, e1), P) 15 b1 := b2; e1 := e2 16 end while // Now A[b1..e1) is the rightmost run 17 while ¬S.empty() // pop (up to) k − 1 runs, merge with A[b1..e1) 18 (b1, e1) := Merge(S.pop(k − 1), A[b1..e1)) 19 end while Powerk(n, b1, e1, b2, e2) 1 n1 := e1 − b1; n2 := e2 − b2; p := 0 2 a := (b1 + n1/2)/n; b := (b2 + n2/2)/n 3 while [a · kp]== [b · kp] do p := p + 1 end while 4 return p
References
edit- ^ a b c Peters, Tim (2002). "Timsort". Python Mailing List.
- ^ a b Munro, J. Ian; Wild, Sebastian (2018). "Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs". 26th Annual European Symposium on Algorithms (ESA).
- ^ "CPython Implementation Details". GitHub. Retrieved 2024-07-30.
- ^ "CPython List Sort". GitHub. Retrieved 2024-07-30.
- ^ "Issue #78742: Integrate Powersort". GitHub. Retrieved 2024-07-30.
- ^ "PyPy Implementation of Powersort". Heptapod. Retrieved 2024-07-30.
- ^ "AssemblyScript Pull Request #1904". GitHub. Retrieved 2024-07-30.
- ^ "AssemblyScript Sort Implementation". GitHub. Retrieved 2024-07-30.
- ^ Buss, Sam; Knop, Alexander (2018). "Strategies for stable merge sorting". arXiv.
- ^ a b Gelling, William Cawley; Nebel, Markus E.; Smith, Benjamin; Wild, Sebastian (2023). "Multiway Powersort". Symposium on Algorithm Engineering and Experiments (ALENEX 2023).