RFC: Linked list cursors #2570

4e554c4c · 2018-10-21T19:33:16Z

🖼️ Rendered

⏭ Tracking issue

This is my first RFC, so feel free to critique :)

📝 Summary

Many of the benefits of linked lists rely on the fact that most operations (insert, remove, split, splice, etc.) can be performed in constant time once one reaches the desired element. To take advantage of this, a Cursor interface can be created to efficiently edit linked lists. Furthermore, unstable extensions like the IterMut changes will be removed.
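To make the constant-time claim concrete, here is a minimal sketch of an arena-backed doubly linked list. `List` is a hypothetical type, not `std::collections::LinkedList`, and the "cursor" is modelled simply as a node index. Once the cursor holds an index, `insert_after` and `remove` only rewire the neighbouring links, so neither depends on the length of the list (pushing into the backing `Vec` is amortized O(1)).

```rust
// Hypothetical arena-backed doubly linked list, used only to illustrate why
// editing at a known position is O(1): a cursor here is just a node index.
struct Node<T> {
    value: T,
    prev: Option<usize>,
    next: Option<usize>,
}

struct List<T> {
    nodes: Vec<Option<Node<T>>>, // removed slots become None
    head: Option<usize>,
    tail: Option<usize>,
}

impl<T> List<T> {
    fn new() -> Self {
        List { nodes: Vec::new(), head: None, tail: None }
    }

    // Insert `value` after node `at` (or at the front if `at` is None).
    // Only the new node and its two neighbours are touched.
    fn insert_after(&mut self, at: Option<usize>, value: T) -> usize {
        let next = match at {
            Some(i) => self.nodes[i].as_ref().unwrap().next,
            None => self.head,
        };
        let idx = self.nodes.len();
        self.nodes.push(Some(Node { value, prev: at, next }));
        match at {
            Some(i) => self.nodes[i].as_mut().unwrap().next = Some(idx),
            None => self.head = Some(idx),
        }
        match next {
            Some(n) => self.nodes[n].as_mut().unwrap().prev = Some(idx),
            None => self.tail = Some(idx),
        }
        idx
    }

    // Unlink node `at`; returns its value and the index of its successor,
    // which is where a cursor would naturally move after the removal.
    fn remove(&mut self, at: usize) -> (T, Option<usize>) {
        let node = self.nodes[at].take().unwrap();
        match node.prev {
            Some(p) => self.nodes[p].as_mut().unwrap().next = node.next,
            None => self.head = node.next,
        }
        match node.next {
            Some(n) => self.nodes[n].as_mut().unwrap().prev = node.prev,
            None => self.tail = node.prev,
        }
        (node.value, node.next)
    }

    // Walk from the head, for inspection only (this part is O(n)).
    fn to_vec(&self) -> Vec<&T> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(i) = cur {
            let node = self.nodes[i].as_ref().unwrap();
            out.push(&node.value);
            cur = node.next;
        }
        out
    }
}
```

A real cursor would pair the index with a borrow of the list so that the borrow checker ties the cursor's lifetime to the list.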

The reference implementation is here; feel free to make implementation suggestions. :)

❌ TODO

Resolve where split and split_before split the list (this seems to be ambiguous in the specification)

Diggsey · 2018-10-21T19:52:09Z

This RFC seems quite vague on the specifics. For example:

Do cursors have a lifetime bound to the list? This is necessary for iterators, but isn't mentioned in the RFC, and no lifetimes are present in the struct definitions.
Do you expect to be able to have multiple cursors into the same data structure? This is something prevented by the use of mutable cursors and lifetimes, but also one of the ways in which cursors could be most useful.

This seems like the kind of thing that might be better implemented as a crate first - given the amount of new API surface being added, I don't see how it could ever be accepted without at least a sample implementation.

4e554c4c · 2018-10-21T20:07:33Z

@Diggsey good points
Yeah, these would have to have lifetimes bound to the list. It should be possible to have multiple immutable cursors, but having multiple mutable cursors seems much more difficult (this is why I have it so that a mutable cursor can be used to create an immutable one).

I'll work on a reference implementation so this can be accepted

4e554c4c · 2018-10-22T04:12:16Z

Currently working on the implementation. All is done except for splitting lists. See here: https://github.com/4e554c4c/list_cursors

mark-i-m · 2018-10-22T17:24:25Z

Perhaps this could be an eRFC to allow greater experimentation? I would really like to play around with your implementation on nightly before deciding if we want to keep it as is, modify it, or go a different route.

sfackler · 2018-10-22T17:30:19Z

@mark-i-m the API would land unstable regardless of the e-ness of the RFC.

mark-i-m · 2018-10-22T17:42:30Z

@sfackler Yes, but the e-ness allows us to proceed without as clear of a vision on what API we will eventually adopt. In contrast, IIUC, if we accept a normal RFC for this, we are saying that we are reasonably confident in this approach. I think the approach has merit and is worth trying out, but I can't personally support strongly accepting the RFC because I simply don't have any experience with such an API.

4e554c4c · 2018-10-22T18:23:09Z

I'm still learning the Rust RFC process. How would I submit an eRFC?

Centril · 2018-10-22T18:24:53Z

@4e554c4c put an "e" before "RFC" ^.^ There's not much difference other than a change in intent.

4e554c4c · 2018-10-22T19:56:34Z

I have edited the title to show that this is an eRFC. The API is not finalized and possibly incomplete and I am taking suggestions.

Amanieu · 2018-10-24T21:41:27Z

The cursor API that you propose seems to be almost identical to the one that I implemented a while ago for intrusive-collections: Cursor, CursorMut, and general docs. I have used this API in some of my personal projects and it has felt very ergonomic. I am not sure why this RFC has chosen to rename some of the methods.

One thing to note in particular as to how cursors work in my crate:

A cursor views an intrusive collection as a circular list, with a special null object between the last and first elements of the collection. A cursor will either point to a valid object in the collection or to this special null object.

Another difference is that I have front/front_mut/back/back_mut methods in LinkedList which return a cursor to the first/last element of the list (or the null element if the list is empty), while cursor and cursor_mut always return a cursor pointing to the null element.
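A minimal sketch of the wrapping behaviour described above, assuming a slice-backed model. `WrappingCursor` and its methods are hypothetical names, not the intrusive-collections API. The position is an `Option<usize>`, where `None` stands for the null object between the last and first elements, and a freshly created cursor starts on that null object.

```rust
// Hypothetical wrapping cursor over a slice. `index == None` is the null
// ("ghost") object that sits between the last and the first element.
struct WrappingCursor<'a, T> {
    items: &'a [T],
    index: Option<usize>,
}

impl<'a, T> WrappingCursor<'a, T> {
    // Like `cursor()` in the comment above: start on the null object.
    fn new(items: &'a [T]) -> Self {
        WrappingCursor { items, index: None }
    }

    fn move_next(&mut self) {
        self.index = match self.index {
            None if self.items.is_empty() => None, // empty: ghost loops to itself
            None => Some(0),                        // ghost -> first
            Some(i) if i + 1 == self.items.len() => None, // last -> ghost
            Some(i) => Some(i + 1),
        };
    }

    fn move_prev(&mut self) {
        self.index = match self.index {
            None if self.items.is_empty() => None,
            None => Some(self.items.len() - 1), // ghost -> last
            Some(0) => None,                     // first -> ghost
            Some(i) => Some(i - 1),
        };
    }

    // None means the cursor is on the null object.
    fn current(&self) -> Option<&'a T> {
        let items: &'a [T] = self.items;
        self.index.map(|i| &items[i])
    }
}
```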

Regarding IterMut vs CursorMut, I will repeat what I said in the other thread: these two types work in fundamentally different ways (a DoubleEndedIterator acts as a deque which can be popped from both ends) and serve completely different purposes (an iterator is for ... iterating, a cursor is for modifying the list).

4e554c4c · 2018-10-25T02:51:24Z

@Amanieu thank you for commenting!
Renaming methods to make things more consistent ("before" -> "prev") would probably be a good idea. I also like your implementation's remove; it may be better to go that route instead of pop (although doing both works too).
I decided against front/front_mut/etc. because these methods already exist on LinkedList and refer to elements. I haven't ruled out adding them; I'm just not sure what the best way to go about it would be.

xaberus · 2018-10-25T06:29:04Z

If I am not mistaken, the proposed API is not safe, i.e. current() can be used to create dangling references.

let mut c = list.cursor_mut();
c.move_next();
// here we create a mutable reference to the current element
let u = c.current();
c.move_prev();
// pop the next element (the one `u` points to) and drop it
drop(c.pop());
// `u` still holds a reference to the freed element
let mut dangle = u.unwrap();
// writing through the dangling reference to the no longer existing element
**dangle = &777;
// use after free confirmed by valgrind...

(I am still new to rust, so take the following with a grain of salt...)

The reason is that in the code above current() creates a reference that has a lifetime of the cursor ('a) but it should have a much narrower scope/lifetime. The point is, returning a reference to an element must either somehow freeze the cursor or taking a raw reference must be disallowed (like RefCell/RefMut).

Now we have basically created two mutable references to the same data: u and c.current() which is undefined behavior.

I think one way to prevent this is to separate the operations of editing the list and editing its elements, i.e. another Cursor* type... but this makes the interface quite useless, as you then cannot inspect the elements that you are editing.

If I remember correctly, this is the reason we have no similar interface in std...

N.B. This is a nice reference highlighting some interesting problems.

4e554c4c · 2018-10-25T16:53:30Z

Interesting! I thought that the reference would borrow the cursor (like how IndexMut borrows slices).
I believe this is fixable, we would just have to make it so that the output of current borrows the cursor.

xaberus · 2018-10-25T18:36:21Z

@4e554c4c having given it a little more thought I think the prototype for current() should be

pub fn current(&mut self) -> Option<&mut T> {...}
// that is elided to
pub fn current<'b>(&'b mut self) -> Option<&'b mut T> {...}

not

pub fn current(&mut self) -> Option<&'a mut T> {...}

The first gives the reference a lifetime that is shorter than 'a (like the one a slice gives to an element reference, 'a: 'b), while the second gives it the lifetime of the list, which is wrong, as the elements should not outlive the list. Hopefully I made no mistake and this makes sense. (It would be nice if somebody with more experience with lifetimes than me could comment on this...)
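A minimal sketch of the first signature, using a hypothetical slice-backed `CursorMut` rather than the RFC's list type. Because `current` takes `&mut self` and returns a reference with the same (elided) lifetime, the returned `&mut T` keeps the cursor mutably borrowed, so the compiler rejects any `move_next` or removal while that reference is alive. Writing the return type as `Option<&'a mut T>` would not compile here without `unsafe`, which hints that it promises more than the cursor can guarantee.

```rust
// Hypothetical cursor over a mutable slice (a stand-in for a list cursor).
struct CursorMut<'a, T> {
    items: &'a mut [T],
    index: usize,
}

impl<'a, T> CursorMut<'a, T> {
    fn move_next(&mut self) {
        self.index += 1;
    }

    // Elided form of `fn current<'b>(&'b mut self) -> Option<&'b mut T>`:
    // the result borrows the cursor, not just the underlying data, so the
    // cursor is frozen for as long as the returned reference is used.
    fn current(&mut self) -> Option<&mut T> {
        self.items.get_mut(self.index)
    }
}
```

With this signature, holding the result of `current()` across a call to `move_next()` is a compile-time borrow error rather than a potential use-after-free.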

4e554c4c · 2018-10-25T19:23:00Z

yep! That seems to fix the problem. The lifetime annotation I added seems to have made this worse. I'm going to look at the lifetimes more and update the RFC
Now, the question is: is this behavior valid for cursor?
I think not since this can occur

fn main() {
    let mut list = LinkedList::new();
    let mut c = list.cursor_mut();
    c.insert(3);
    c.insert(2);
    c.insert(1);
    
    let u = {
        let mut c = c.as_cursor();
        c.move_next();
        c.current();
    };
    drop(c.pop());
    // use after free!
    println!("element: {:?}", u);
}

4e554c4c · 2018-10-25T20:14:15Z

Ok, lifetimes should be added to the RFC and fixed in the reference implementation. Tell me what y'all think.

xaberus · 2018-10-25T20:15:19Z

@4e554c4c: I think the example is actually OK. (I think you meant current() not current();, otherwise u would be ())

with the modified signatures

impl<'a, T> Cursor<'a, T> {
    ...
    pub fn current(&self) -> Option<&T> {...}
    ...
}
...
impl<'a, T> CursorMut<'a, T> {
    ...
    pub fn current(&mut self) -> Option<&mut T> {...}
    ...
}

the borrow checker (on nightly) complains about c not living long enough in the let u = expression. That makes sense to me: Cursor "contains" the elements T, and by calling current() we get a reference &'b T with 'a: 'b (i.e. 'a lives at least as long as 'b, so 'b cannot outlive 'a). When we drop the Cursor, no references with lifetime 'b can remain.

xaberus · 2018-10-25T20:32:08Z

Quoting @4e554c4c: "Ok, lifetimes should be added to the RFC and fixed in the reference implementation. Tell me what y'all think."

If my reading of the nomicon is correct (get_mut example), the explicit lifetime is not necessary in

pub fn as_cursor<'cm>(&'cm self) -> Cursor<'cm, T>;

By elision rules this should be the same as

pub fn as_cursor(&self) -> Cursor<T>;

(clippy nags about it...)
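A small sketch of the elision point, with hypothetical slice-backed `CursorMut`/`Cursor` types standing in for the RFC's. Both methods below have the same signature; in current Rust the elided form is usually spelled `Cursor<'_, T>` so that the borrow stays visible in the return type.

```rust
// Hypothetical slice-backed stand-ins for the RFC's cursor types.
struct CursorMut<'a, T> {
    items: &'a mut [T],
    index: usize,
}

struct Cursor<'a, T> {
    items: &'a [T],
    index: usize,
}

impl<'a, T> CursorMut<'a, T> {
    // Explicit lifetime, as written in the RFC.
    fn as_cursor_explicit<'cm>(&'cm self) -> Cursor<'cm, T> {
        Cursor { items: &*self.items, index: self.index }
    }

    // Elided form: `'_` is the lifetime of `&self`, so this is the same
    // signature as the explicit one above.
    fn as_cursor(&self) -> Cursor<'_, T> {
        Cursor { items: &*self.items, index: self.index }
    }
}
```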

Amanieu · 2018-10-25T20:33:52Z

To be precise, the methods on Cursor can return &'list T, but the methods on CursorMut must return &'self mut T. This means that if you want to inspect the previous/next value before any modification then you need to call as_cursor first.

Amanieu · 2018-10-25T20:43:24Z

Actually, this would probably work:

impl<'a, T> Cursor<'a, T> {
    ...
    pub fn current(&self) -> Option<&'a T> {...}
    ...
}
...
impl<'a, T> CursorMut<'a, T> {
    ...
    pub fn current_mut(&mut self) -> Option<&mut T> {...}
    pub fn current(&self) -> Option<&T> {...}
    // Same with peek/peek_mut, etc
    ...
}

The advantage of having Cursor return &'a T instead of &'self T is that the references remain valid after moving the cursor. This is fine since Cursor can't delete or modify any elements in the list.
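A sketch of the read-only case, using a hypothetical slice-backed `Cursor` that holds a shared `&'a [T]`. Because `current` returns `&'a T` rather than a reference tied to `&self`, a reference obtained earlier stays usable after the cursor moves, and several such cursors can borrow the same data at once.

```rust
// Hypothetical read-only cursor over a slice.
struct Cursor<'a, T> {
    items: &'a [T],
    index: usize,
}

impl<'a, T> Cursor<'a, T> {
    fn new(items: &'a [T]) -> Self {
        Cursor { items, index: 0 }
    }

    fn move_next(&mut self) {
        self.index += 1;
    }

    // Returns a reference with the list's lifetime 'a, not the lifetime of
    // `&self`: copying the shared slice reference out of `self` is enough.
    fn current(&self) -> Option<&'a T> {
        let items: &'a [T] = self.items;
        items.get(self.index)
    }
}
```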

4e554c4c · 2018-10-25T21:44:34Z

Good point, I'll leave that one how it was.
Should I introduce methods like current/current_mut for CursorMut? I feel like this is somewhat unneeded since one could always call as_cursor to get immutable access to the element and it would have the same borrow semantics.

4e554c4c · 2018-10-28T16:37:26Z

The reference implementation should be pretty much complete thanks to @xaberus
I've also started a list of what currently needs to get resolved in the RFC.

xaberus · 2018-10-30T18:07:06Z

Having tinkered with the reference implementation for a while, here are my impressions and comments so far.

I much prefer the API where the cursors start at the first element, because I constantly forget the initial move_next()/move_prev(). As proposed by @Amanieu, I think head()/head_mut() and tail()/tail_mut() are a better choice.
While the wrapping of cursors gives an elegant implementation, the API can unfortunately give unexpected and/or ambiguous results. I started a non-wrapping branch of the reference implementation to test my assumptions. When/if I am happy with the results I will try to write down a pull request for the RFC as time permits.
So far, after rewriting parts of reference implementation a couple of times (see the branch above) I ended up with the following simple invariants that I would like to see in the API:
- inserting an item after the cursor position implies c.peek_after() == Some(e)
- inserting an item before the cursor position implies c.peek_before() == Some(e)
- inserting a list before the cursor position implies c.peek_before() == Some(inserted_tail)
- inserting a list after the cursor position implies c.peek_after() == Some(inserted_head)
- iterating past the head/tail with move_prev()/move_next() keeps the Cursor one position "before"/"after" the last element, allowing the iteration direction to be reversed at any time. Effectively, there are now two empty elements. (This turned out a lot more complicated to implement than I imagined, but is so much harder to abuse.)
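The last invariant can be modelled with an explicit position type. The sketch below is hypothetical (`Pos` and `NonWrappingCursor` are illustrative names, backed by a slice): moving past either end saturates at `BeforeFirst`/`AfterLast` instead of wrapping, so a single step in the opposite direction returns to the element at that end.

```rust
// Hypothetical positions for a non-wrapping cursor: two distinct "empty"
// positions sit before the first and after the last element.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pos {
    BeforeFirst,
    At(usize),
    AfterLast,
}

struct NonWrappingCursor<'a, T> {
    items: &'a [T],
    pos: Pos,
}

impl<'a, T> NonWrappingCursor<'a, T> {
    fn new(items: &'a [T]) -> Self {
        NonWrappingCursor { items, pos: Pos::BeforeFirst }
    }

    fn move_next(&mut self) {
        self.pos = match self.pos {
            Pos::BeforeFirst if self.items.is_empty() => Pos::AfterLast,
            Pos::BeforeFirst => Pos::At(0),
            Pos::At(i) if i + 1 < self.items.len() => Pos::At(i + 1),
            // Stepping past the tail parks the cursor after the last element.
            Pos::At(_) | Pos::AfterLast => Pos::AfterLast,
        };
    }

    fn move_prev(&mut self) {
        self.pos = match self.pos {
            Pos::AfterLast if self.items.is_empty() => Pos::BeforeFirst,
            Pos::AfterLast => Pos::At(self.items.len() - 1),
            Pos::At(i) if i > 0 => Pos::At(i - 1),
            // Stepping past the head parks the cursor before the first element.
            Pos::At(_) | Pos::BeforeFirst => Pos::BeforeFirst,
        };
    }

    fn current(&self) -> Option<&'a T> {
        let items: &'a [T] = self.items;
        match self.pos {
            Pos::At(i) => items.get(i),
            _ => None,
        }
    }
}
```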

I hope to continue this list when I have time to work on it. What are your comments so far?

4e554c4c · 2018-10-30T19:13:54Z

I think this can work. The main reason that Cursor is wrapping is for efficient movement between the front and back of the list, but head()/tail() could do this just as well.

If the cursor doesn't start at the empty element, could we just do away with the empty element altogether? We could have the invariant that the Cursor is always on an element, and remove the Option from the current method. Then head()/tail() could return an Option<Cursor>.

The only problem I see with this is that we would need to add some extra method to determine whether the cursor is at the beginning/end of a list, e.g. has_next() or similar

blankname · 2018-10-30T19:31:17Z

More prior art: https://contain-rs.github.io/linked-list/linked_list/struct.Cursor.html

xaberus · 2018-11-04T20:37:04Z

@Amanieu:

First, // Same as AfterLast if list is empty
Last, // Same as BeforeFirst if list is empty

This is what I wrote in one of my first attempts, but it is kind of surprising in the case of an empty list:

// cursor_mut(Last)
[] <>
// insert_before(1)
[1] <>
// pop_before() -> Some(1)
[] <>

vs.

// cursor_mut(Last)
<> []
// insert_before(1)
[1] <>
// pop_before() -> Some(1)
[] <>

If I request a cursor at the Last position and insert and pop an element before it I expect the cursor to be still at the same position. This does not break the invariant, i.e. insert_before(1) -> pop_before() == Some(1), but the cursor value changes from BeforeFirst to AfterLast, i.e. a roundtrip is not a no-op.

Amanieu · 2018-11-04T20:42:27Z

@xaberus Note that a wrapping cursor (where BeforeFirst is the same as AfterLast) avoids this problem. This is why I went with that approach in intrusive-collections.

xaberus · 2018-11-04T22:31:41Z

@Amanieu I see. In a way, this is not a real problem: the invariants (sans the last one) enforce a wrapping behavior for inserts, which effectively collapses the cursor states for the empty list to None. I found this issue in a test case only because I was inspecting the cursor state directly and not through the API. From the outside there is no visible difference.
I guess I may have found a complicated way to express the same behavior 😄, but for some reason wrapping cursors instill a feeling of unease in me. For example, if I want to find the nth item in the list in a wrapping implementation, I have to inspect each cursor position for None. If I forget about that I will wrap around and end up somewhere in the middle of the list.

// non-wrapping
let mut c = list.cursor(BeforeFirst);
for _ in 0..n {
    c.move_next();
}
c.current()

vs.

// wrapping
let mut c = list.cursor(BeforeFirst);
c.move_next();
for _ in 0..n {
    if c.current().is_none() {
        return None;
    } else {
        c.move_next();
    }
}
c.current()
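The two snippets above use a hypothetical `list.cursor(...)` API. Below is a runnable sketch of the same comparison over a slice, with positions modelled as plain indices (all names are hypothetical, and `n` is counted from 0). The non-wrapping walk needs no checks; the wrapping walk must test for the ghost on every step, and `nth_wrapping_unchecked` shows what happens when that check is forgotten: the walk silently wraps back into the list.

```rust
// Hypothetical step function for a wrapping cursor over `len` elements:
// None is the ghost; stepping past the last element returns to the ghost,
// and stepping from the ghost goes to the first element.
fn step_wrapping(pos: Option<usize>, len: usize) -> Option<usize> {
    match pos {
        None if len == 0 => None,
        None => Some(0),
        Some(i) if i + 1 == len => None,
        Some(i) => Some(i + 1),
    }
}

// Non-wrapping: 0 = BeforeFirst, 1..=len = elements, len + 1 = AfterLast.
// move_next saturates at AfterLast, so no check is needed inside the loop.
fn nth_non_wrapping<T>(items: &[T], n: usize) -> Option<&T> {
    let mut pos: usize = 0;
    for _ in 0..=n {
        pos = (pos + 1).min(items.len() + 1);
    }
    if pos >= 1 && pos <= items.len() {
        Some(&items[pos - 1])
    } else {
        None
    }
}

// Wrapping: every step must check for the ghost before moving on.
fn nth_wrapping<T>(items: &[T], n: usize) -> Option<&T> {
    let mut pos = step_wrapping(None, items.len()); // leave the ghost
    for _ in 0..n {
        pos?; // hit the ghost: the list has fewer than n + 1 elements
        pos = step_wrapping(pos, items.len());
    }
    pos.map(|i| &items[i])
}

// The same walk without the check: past the end it wraps back into the list.
fn nth_wrapping_unchecked<T>(items: &[T], n: usize) -> Option<&T> {
    let mut pos = step_wrapping(None, items.len());
    for _ in 0..n {
        pos = step_wrapping(pos, items.len());
    }
    pos.map(|i| &items[i])
}
```

For `['a', 'b', 'c']`, `nth_wrapping_unchecked(.., 4)` lands on `'a'` instead of reporting that there is no fifth element.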

rfcbot · 2018-11-06T13:08:14Z

🔔 This is now entering its final comment period, as per the review above. 🔔

gnzlbg · 2018-11-07T10:56:26Z

The insert_list and insert_list_before methods have O(1) time complexity, right?
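For context, the stable `std::collections::LinkedList` already relinks whole lists in constant time: `append` is documented as O(1), while `split_off` has to walk to the split point first. The sketch below is a hypothetical helper built only from those two methods that splices one list into the middle of another. The walk inside `split_off` is exactly the cost a cursor-based splice avoids, because the cursor is already at the right node.

```rust
use std::collections::LinkedList;

// Hypothetical helper: splice `other` into `list` just after index `at`.
// `split_off` walks to the split point (O(n)); each `append` is O(1).
// Panics if `at >= list.len()`, because `split_off(at + 1)` would be out of bounds.
fn splice_after_index<T>(list: &mut LinkedList<T>, at: usize, mut other: LinkedList<T>) {
    let mut tail = list.split_off(at + 1);
    list.append(&mut other); // relink `other` onto the end: O(1)
    list.append(&mut tail); // relink the saved tail back on: O(1)
}
```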

Amanieu · 2018-11-07T12:35:32Z

I would prefer if some changes were made to the method names. In particular, I think that every method name should explicitly specify the direction in which the operation is performed (forwards or backwards).

peek/peek_before => peek_next/peek_prev
insert/insert_before => insert_after/insert_before
pop/pop_before => remove
- pop is a bad name since it only applies at the head or tail of a list. Also, the most common operation you would want is to remove the current element, not the next/previous one. The cursor would then be advanced to the next element after the removal.
insert_list/insert_list_before => splice_after/splice_before
- Splicing is a standard term in linked list operations.
split/split_before => split_after/split_before

nugend · 2018-11-07T16:22:24Z

I'm just driving by, but @Amanieu has got it right. This is not a clunky interface at all, as long as the names of those API methods are clear and unambiguous about which elements are affected.

This RFC actually strongly reminds me of the more general Zipper concept that Haskell has explored at length (and linked lists are just degenerate trees). I suspect that's not really pertinent to the matter at hand, but I think it's interesting.

glaebhoerl · 2018-11-07T17:53:15Z

@Amanieu Aiming for symmetry with explicitly specified directions sounds good; however, doesn't this bit present a smidgen of discord in the plan?

Quoting @Amanieu: "Also the most common operation that you would want is to remove the current element, not the next/previous one. The cursor would then be advanced to the next element after the removal."

Why to the next element rather than the previous? Why aren't there then two versions of it, with explicitly specified directions, like the others? Of course as a matter of pragmatics, you more often want to move forwards rather than backwards, but if we're choosing not to privilege the forwards direction in the rest of the API...

xaberus · 2018-11-07T21:26:19Z

@glaebhoerl: I second that. Whether to remove an item can usually be decided by a predicate f(&T) -> bool. For this, the following

let mut c = list.cursor_mut(BeforeHead);
while c.peek_next().is_some() {
    if let Some(true) = c.peek_next().map(f) {
        c.pop_next();
    } else {
        c.move_next();
    }
}

sounds like a reasonable interface, and it is symmetric: replacing next with prev gives the backwards version. Moreover, this makes the action of removing an item explicit, so there is no need to look up in the reference that the cursor will move in some arbitrary direction.
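A runnable model of this loop, assuming a hypothetical `GapCursor` that sits between two elements, so `peek_next` is the element just after the cursor and position 0 corresponds to `BeforeHead`. It is backed by a `Vec` purely to check the semantics; `Vec::remove` is O(n), unlike unlinking a list node.

```rust
// Hypothetical "gap" cursor: `gap` is the position between elements, so the
// element after the cursor is items[gap] and gap == 0 is BeforeHead.
struct GapCursor<'a, T> {
    items: &'a mut Vec<T>,
    gap: usize,
}

impl<'a, T> GapCursor<'a, T> {
    fn peek_next(&self) -> Option<&T> {
        self.items.get(self.gap)
    }

    // Remove the element after the cursor; the cursor itself stays put.
    fn pop_next(&mut self) -> Option<T> {
        if self.gap < self.items.len() {
            Some(self.items.remove(self.gap))
        } else {
            None
        }
    }

    fn move_next(&mut self) {
        if self.gap < self.items.len() {
            self.gap += 1;
        }
    }
}

// The removal loop from the comment, with the predicate taking `&T`.
fn remove_matching<T>(items: &mut Vec<T>, f: impl Fn(&T) -> bool) {
    let mut c = GapCursor { items, gap: 0 };
    while let Some(next) = c.peek_next() {
        if f(next) {
            c.pop_next(); // removal is explicit; the cursor does not move
        } else {
            c.move_next();
        }
    }
}
```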

eaglgenes101 · 2018-11-09T14:08:47Z

I think it might be possible to allow multiple mutable cursors, safely, with a method that splits out two bounded cursors. Each bounded cursor remembers the node at which the cursor was split. If a link followed through the bounded cursor API points at that node, the cursor instead goes to its own sentinel, and looping back around from the far end goes to the node adjacent to the split node on that cursor's side, rather than to the other end of the list.

Splitting at the sentinel doesn't make sense, so if a split is attempted at the sentinel, then Option::None is returned instead, hence the Option in the return value.

pub fn split_at_mut<'cm>(&'cm mut self) -> Option<(BoundedCursorMut<'cm, T>, &'cm mut T, BoundedCursorMut<'cm, T>)>;

(The node in the middle where the list is split is a no man's land for both bounded cursors, so that if the resulting bounded cursors are used to manipulate the nodes bordering the split, they don't have to change the pointers on the other side's nodes to retain linked list consistency, instead only having to change the middle node's pointer pointing towards their side.)
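The shape of the proposed `split_at_mut` mirrors what slices already offer. As an analogy only (this is not a linked-list API), a mutable slice can be split into a left part, the middle element, and a right part using the real `split_at_mut` and `split_first_mut` methods. Like the proposal, the hypothetical helper below returns `None` when there is no element to split at.

```rust
// Slice analogue of the proposed cursor split: (left, &mut middle, right).
// The middle element belongs to neither side, just like the "no man's land"
// node in the proposal. Returns None when `at` is past the end.
fn split_around_mut<T>(items: &mut [T], at: usize) -> Option<(&mut [T], &mut T, &mut [T])> {
    if at >= items.len() {
        return None;
    }
    let (left, rest) = items.split_at_mut(at);
    let (middle, right) = rest.split_first_mut()?;
    Some((left, middle, right))
}
```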

rfcbot · 2018-11-16T13:13:50Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

clarfonthey · 2018-12-03T22:25:08Z

Anyone going to make a tracking issue?

Centril · 2018-12-03T22:27:43Z

@clarcharr I usually make them, but I missed this one for some reason...

@sfackler given the recent discussion, can someone from T-libs create the tracking issue and merge the RFC to make sure that y'all have taken in and are OK with the recent discussion?

nitnelave · 2018-12-09T11:06:46Z

Just my grain of salt: I have used a similar API (https://contain-rs.github.io/linked-list/linked_list/struct.Cursor.html) for a circular linked_list, and the ghost element made it hard to move around, since I wasn't interested in where the beginning/end of the list was. My solution was to implement seek_forward/seek_backward methods that ensured the next element was never the ghost, but it was clunky. Would it be possible to have a way to say "I don't want a ghost"? That is, the ghost would only exist for the empty list, and as soon as you insert an element, it becomes a loop to itself.

We could potentially have it as a generics parameter? (not sure about it)
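A minimal sketch of a ghost-free circular cursor, assuming a slice-backed model (`RingCursor` is a hypothetical name). It also follows the earlier suggestion of making construction return an `Option`: an empty list yields no cursor at all, so `current` can return `&T` directly and movement wraps around without ever stopping on a ghost.

```rust
// Hypothetical ghost-free circular cursor: it always points at an element.
struct RingCursor<'a, T> {
    items: &'a [T],
    index: usize,
}

impl<'a, T> RingCursor<'a, T> {
    // An empty list has nothing to point at, so no cursor is created.
    fn new(items: &'a [T]) -> Option<Self> {
        if items.is_empty() {
            None
        } else {
            Some(RingCursor { items, index: 0 })
        }
    }

    // Wraps from the last element straight to the first.
    fn move_next(&mut self) {
        self.index = (self.index + 1) % self.items.len();
    }

    // Wraps from the first element straight to the last.
    fn move_prev(&mut self) {
        let len = self.items.len();
        self.index = (self.index + len - 1) % len;
    }

    // No Option needed: the invariant guarantees a valid element.
    fn current(&self) -> &'a T {
        let items: &'a [T] = self.items;
        &items[self.index]
    }
}
```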

mark-i-m · 2019-02-13T20:22:01Z

Ping?

Amanieu · 2019-02-14T20:13:13Z

I am currently on holiday, I will merge this RFC when I get back next week.

Centril · 2019-02-17T10:12:33Z

Huzzah! This RFC is hereby merged!

Tracking issue: rust-lang/rust#58533

nickeb96 · 2023-02-08T04:58:49Z

The remove_replace example function is wrong and should be fixed. Its explanation says the following:

"For example, consider you had a linked list and wanted to remove all elements which satisfy a certain predicate, and replace them with another element."

Instead, the code in the example will visit the same element infinitely until the predicate function returns false. Also, it never checks the first element.

Here is a rework:

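#![feature(linked_list_cursors)] // LinkedList cursors are nightly-only
use std::collections::LinkedList;

fn remove_replace<T, P, F>(list: &mut LinkedList<T>, p: P, f: F)
    where P: Fn(&T) -> bool, F: Fn(T) -> T
{
    let mut cursor = list.cursor_front_mut();
    loop {
        // stop once the cursor reaches the "ghost" past the back of the list
        let should_replace = match cursor.current() {
            Some(element) => p(element),
            None => break,
        };
        if should_replace {
            // remove_current moves the cursor to the following element,
            // so insert_before puts the replacement in the old slot
            let old_element = cursor.remove_current().unwrap();
            cursor.insert_before(f(old_element));
        } else {
            cursor.move_next();
        }
    }
}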
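The cursor API is still nightly-only. To pin down the intended behaviour of the reworked function, here is a stable sketch of the same transformation that rebuilds the list instead of editing it in place (`remove_replace_stable` is a hypothetical name, and this is not a cursor-based implementation): every element satisfying `p`, including the first, is replaced by `f(element)` exactly once, and order is preserved.

```rust
use std::collections::LinkedList;

// Stable reference version (not cursor-based): replaces each element for
// which `p` holds with `f(element)`, visiting every element exactly once.
fn remove_replace_stable<T, P, F>(list: &mut LinkedList<T>, p: P, f: F)
where
    P: Fn(&T) -> bool,
    F: Fn(T) -> T,
{
    let old = std::mem::take(list); // leaves an empty list behind
    *list = old
        .into_iter()
        .map(|x| if p(&x) { f(x) } else { x })
        .collect();
}
```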

Add cursors rfc

d317332

Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 21, 2018

4e554c4c changed the title ~~RFC: Linked list cursors~~ eRFC: Linked list cursors Oct 22, 2018

Add lifetimes to RFC

67d797b

4e554c4c force-pushed the cursors branch from 524f9f8 to 67d797b October 25, 2018 21:56

rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Nov 6, 2018

rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. labels Nov 16, 2018

Centril added A-types-libstd Proposals & ideas introducing new types to the standard library. A-collections Proposals about collection APIs labels Nov 22, 2018

Amanieu mentioned this pull request Feb 17, 2019

Tracking issue for LinkedList cursors rust-lang/rust#58533

Open


RFC 2570

71ec5ab

Centril merged commit 2735924 into rust-lang:master Feb 17, 2019

LukasKalbertodt mentioned this pull request Jan 11, 2020

Fix linked list cursor names #2847

Merged

Amanieu mentioned this pull request Nov 28, 2022

BTreeMap cursors rust-lang/libs-team#141

Closed

dtolnay mentioned this pull request Aug 6, 2024

Tracking Issue for CharIndices::offset function rust-lang/rust#83871

Closed

