Add pagination extension spec #494

Merged
prologic merged 14 commits from movq/yarn:pagination into master 3 months ago
movq commented 3 months ago
Collaborator

Formalized the pagination extension that has been discussed a bit lately.

Poster
Collaborator

This will probably work as is, but I wonder: Should we include the date/time range in feed_prev and feed_next, so that clients know in advance which twts they are going to fetch?

movq reviewed 3 months ago
```
# url = https://example.com/twtxt.txt
# nick = cathy
# feed_prev = https://example.com/twtxt-2021-10-30.txt
```
movq commented 3 months ago
Poster
Collaborator

Hmm, should these URLs be relative, i.e. just /twtxt-2021-10-30.txt? That could further reduce the need to edit archived feeds in the future (e.g., when you move to a different domain).

movq commented 3 months ago
Poster
Collaborator

… in any case, they should not contain a protocol, because multi-protocol feeds exist (Gopher + HTTP + ...).

lyse commented 3 months ago
Poster
Collaborator

Yeah, good thinking regarding relative URLs and moves! Just a spontaneous idea: how about allowing multiple feed_prev fields with all its protocols? I have to think about this a bit more.

Poster
Owner

My $0.02 worth:

  • Use `prev` as the key.
  • Point to a relative URI, relative to the feed's URI.
lyse reviewed 3 months ago
lyse left a comment

Oh yeah, this is brilliant work! I really love it, well done mate. :-)

---
layout: page
title: "Feed pagination"
lyse commented 3 months ago
Poster
Collaborator

All the other extensions have the "Extension" suffix, so to be consistent I'd call it "Feed Pagination Extension" in title case. Not sure if that was a good idea in the first place, though. :-)

That pattern is also reflected in the ugly filenames themselves. Not sure whether sticking or breaking with the bad tradition is better. Just want to raise attention. ;-)

As we speak so much of archive (what it actually is) and much less of pagination, maybe we should rename it to "Archive Feeds Extension"? What do you guys think? I'm also totally happy with the proposed title.

movq commented 3 months ago
Poster
Collaborator

Yeah, "Archive Feeds Extension" is a much better name. 👌

Poster
Owner

I also think it should be called "Archived Feeds Extension". We're not really "paginating" per se but saying "hey, here are all my old twts if you want them".

movq commented 3 months ago
Poster
Collaborator

"Archived Feeds Extension" or "Archive Feeds Extension"? 🥴
Poster
Owner

"Archived Feeds Extension" or "Archive Feeds Extension"? 🥴

Maybe the "imperative mood" version of "Archive Feed Extension"

lyse marked this conversation as resolved
pagination can be used to move old twts to a different (partial) feed.
Clients can then choose to retrieve only some of those feeds.
## Main feed and archived feeds
lyse commented 3 months ago
Poster
Collaborator

Title case this heading, too?

lyse marked this conversation as resolved
## Main feed and archived feeds
There is exactly one main feed, which is the same as the traditional
twtxt.txt file. This feed keeps growing by adding new twts at the end.
lyse commented 3 months ago
Poster
Collaborator

Should we emphasize that this append-only mode now differs from the original Twtxt File Format Specification?

movq commented 3 months ago
Poster
Collaborator

I don't think it (= specifying the order of the feed) matters for this extension, but it'll probably make our life a bit easier when it comes to range requests. 🤔 I'll add it.

Poster
Owner

My $0.02 worth:

  • Do what stackeffect @twtxt.stackeffect.de suggested and declare (add to the Metadata spec) a field called (say) `order = append|prepend`.
  • State that if a feed doesn't declare the order of twts, it is up to the client to determine this programmatically.
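To illustrate what "determine this programmatically" could look like, here is a minimal sketch. It assumes the hypothetical `order` metadata field proposed above, and falls back to comparing the timestamps of the first and last twt. The function names are made up for this example and are not part of any spec or client.

```python
from datetime import datetime


def parse_timestamp(line):
    """Parse the timestamp in front of the first tab of a twt line."""
    stamp = line.split("\t", 1)[0]
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def detect_order(feed_text):
    """Return 'append', 'prepend' or 'unknown' for the given feed text."""
    declared = None
    twts = []
    for line in feed_text.splitlines():
        if line.startswith("#"):
            key, _, value = line.lstrip("# ").partition("=")
            if key.strip() == "order":  # hypothetical metadata field
                declared = value.strip()
        elif line.strip():
            twts.append(line)
    if declared in ("append", "prepend"):
        return declared
    if len(twts) < 2:
        return "unknown"
    first, last = parse_timestamp(twts[0]), parse_timestamp(twts[-1])
    if first < last:
        return "append"
    if first > last:
        return "prepend"
    return "unknown"
```

A declared value wins over the heuristic, so feeds with edited or out-of-order twts can still state their intent explicitly.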
lyse marked this conversation as resolved
twtxt.txt file. This feed keeps growing by adding new twts at the end.
Deletion or editing of twts anywhere in the feed is allowed.
Once the main feed is "full", some or all of its twts can be moved to a
lyse commented 3 months ago
Poster
Collaborator

I reckon we should add an explicit note somewhere that "full" is feed-specific and limits are not covered by this spec.

The following I'm not so sure on: Examples might be helpful that "full" can mean that N twts have been reached and thus a rotation is due or archives can be based on date ranges. What do you think?

movq commented 3 months ago
Poster
Collaborator

I added a section about that. 👌

Poster
Owner

I agree with @lyse: "full" here very much depends on the feed author or yarn pod operator twiddling with feed rotation policies.

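As a rough sketch of one possible rotation policy ("full" meaning that more than N twts have accumulated), the following splits a feed into an archived part and a new main part. The `rotate` function, its parameters and the plain `# prev = <relative_uri>` form are assumptions for illustration only; actual limits and file names are up to the feed author.

```python
def rotate(twts, max_twts, keep, archive_name, old_prev=None):
    """Split twts (oldest first) into (archive_lines, main_lines).

    Returns (None, main_lines) while the feed is not "full" yet.
    Assumes 0 <= keep <= max_twts.
    """
    def with_prev(prev, body):
        # Prepend the metadata line only if there is something to point to.
        header = [f"# prev = {prev}"] if prev else []
        return header + list(body)

    if len(twts) <= max_twts:
        return None, with_prev(old_prev, twts)

    split = len(twts) - keep
    # The archive inherits the old prev pointer, so the chain stays intact.
    archive = with_prev(old_prev, twts[:split])
    # The main feed now points at the freshly written archive.
    main = with_prev(archive_name, twts[split:])
    return archive, main
```

With `keep=0` the entire feed is archived; a larger `keep` gives the incremental rotation where only the oldest twts are moved and their relative order is retained.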
lyse marked this conversation as resolved
The file names of archived feeds are implementation specific and don't
carry special meaning.
In archived feeds, there will be a `feed_next` metadata field, which
lyse commented 3 months ago
Poster
Collaborator

At first I thought `feed_next` was a bit weird (sorry, I can't find the right term, this sounds way too harsh); I had in mind that it should contain `url`. But now that you've outlined above that it might be just an absolute path instead (or even relative to the main feed file?), it makes sense. Still, we might tweak the name. Don't have any alternatives to offer at the moment, though. :-S

Poster
Owner

Do we even need this at all? Can we just have a `prev = <relative_uri>` for archived versions of the feed and put the onus on the client to keep track of the list of `current -> archived_feed_1 -> archived_feed_2 -> archived_feed_n`?

movq commented 3 months ago
Poster
Collaborator

Do we even need this at all?

You mean drop next and just make this a singly linked list? Hmm, probably. 🤔 (At least my implementation in jenny wouldn’t care about next at all.)

lyse commented 3 months ago
Poster
Collaborator

Hmm, just offering prev but not next seems a bit odd to me. I'd go with a doubly linked list as movq suggested.

Poster
Owner

Hmm, just offering prev but not next seems a bit odd to me. I'd go with a doubly linked list as movq suggested.

Okay, can we use prev and next then for less typing/characters :D

lyse commented 3 months ago
Poster
Collaborator

Four characters should be enough for everybody! :-D

lyse commented 3 months ago
Poster
Collaborator

So, technically we usually just go back, so prev alone together with url to the main/current feed would suffice. Just in case you somehow come across an archive feed from the middle of an archive chain, then next would be more helpful as you could navigate it piece by piece forward and don't have to start from the main feed and walk your way back. The question of course is whether this is some use case we want to care about (probably occurs too rarely).

I need to think more about it. But we could certainly start off very simple with only prev and if the need arises later on, just add the next pointer.

Poster
Owner

Yeah, I just can't think of a use case myself, with my limited mind, for wanting to walk back 'n forth. I really do see this as an "archival" process, so only going back is useful here, but a pointer back to the main/current version of the feed is definitely important here, I think.

movq commented 3 months ago
Poster
Collaborator

Omitting next would certainly make the implementation a tiny bit easier.

Why do we need a pointer back to the main feed? 🤔

lyse commented 3 months ago
Poster
Collaborator

The main feed pointer is required at least for twt hashing as you've already stated in the spec. ;-)

Also if you somehow come across an archive and there is no reference to the main feed, it might be quite tricky to reconstruct it.

lyse marked this conversation as resolved
Collaborator

Should we include the date/time range in feed_prev and feed_next, so that clients know in advance which twts they are going to fetch?

Oh, very interesting idea! 👍 Let me sleep on this. I don't know of any system that is doing that, maybe there is a reason behind that: complexity and harder maintainability. :-?

Owner

This will probably work as is, but I wonder: Should we include the date/time range in feed_prev and feed_next, so that clients know in advance which twts they are going to fetch?

Hmm I'm not sure about this... Are we worried about the time between a feed being archived and a client potentially not seeing the last few twts in that archived feed?

Poster
Collaborator

Hmm I'm not sure about this... Are we worried about the time between a feed being archived and a client potentially not seeing the last few twts in that archived feed?

No, I was just thinking that it might make it easier for clients to decide whether they want to retrieve an archived feed or not. But I'm not so sure about it anymore myself. 🤔

Owner

Hmm I'm not sure about this... Are we worried about the time between a feed being archived and a client potentially not seeing the last few twts in that archived feed?

No, I was just thinking that it might make it easier for clients to decide whether they want to retrieve an archived feed or not. But I'm not so sure about it anymore myself. 🤔

I think a client should only fetch an archived feed for "posterity", honestly, i.e. for archival, backup or search/crawl/index purposes. I see no reason why a client would really care about revisiting an archived feed if they actively follow the feed in the first place.

Poster
Collaborator

(I'll squash all this once we're happy with it.)

Poster
Collaborator

I think a client should only fetch an archived feed for "posterity", honestly, i.e. for archival, backup or search/crawl/index purposes. I see no reason why a client would really care about revisiting an archived feed if they actively follow the feed in the first place.

Thought so, too, but actually, I'm not so sure anymore.

  • Some client fetches my feed.
  • I add 5 new twts, one by one. The third one triggers feed archival. For whatever reason, I archive my entire feed, so the new main feed will start completely empty, then the last two twts will be added to it.
  • The client from above comes back and fetches my feed again. If that client does not also have a look at my prev feed, it might miss three twts (the ones that were added before archival).

How do we deal with that? If there was some info in metadata that says "next has feeds from $date to $date", then clients would know that they have to fetch my archived feed (just once).

Or maybe this just means that archiving an entire feed is a bad idea.

I need to think about this again ... (Maybe I'm just confused right now, happens all the time. 🥴)

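The scenario above can be replayed in a few lines. This is only a toy simulation with made-up twt names (`old-1`, `new-1`, ...) standing in for real twts; it shows which twts a client never sees if it ignores `prev` after the entire feed was rotated.

```python
def missed_twts(follow_prev):
    """Replay the scenario and return the twts the client never saw."""
    main = ["old-1", "old-2"]
    archive = []
    seen = set(main)                 # the client's first fetch

    for i in range(1, 6):            # five new twts, added one by one
        main.append(f"new-{i}")
        if i == 3:                   # the third one triggers archival
            archive, main = main, []  # the entire feed gets archived

    fetched = set(main)              # the client's second fetch
    if follow_prev:
        fetched |= set(archive)      # also look at the prev feed
    seen |= fetched

    everything = set(archive) | set(main)
    return sorted(everything - seen)


print(missed_twts(follow_prev=False))
print(missed_twts(follow_prev=True))
```

Without following `prev`, the three twts added before archival are lost; with it, nothing is missed.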
Owner

I think a client should only fetch an archived feed for "posterity", honestly, i.e. for archival, backup or search/crawl/index purposes. I see no reason why a client would really care about revisiting an archived feed if they actively follow the feed in the first place.

Thought so, too, but actually, I'm not so sure anymore.

  • Some client fetches my feed.
  • I add 5 new twts, one by one. The third one triggers feed archival. For whatever reason, I archive my entire feed, so the new main feed will start completely empty, then the last two twts will be added to it.
  • The client from above comes back and fetches my feed again. If that client does not also have a look at my prev feed, it might miss three twts (the ones that were added before archival).

How do we deal with that? If there was some info in metadata that says "next has feeds from $date to $date", then clients would know that they have to fetch my archived feed (just once).

Or maybe this just means that archiving an entire feed is a bad idea.

I need to think about this again ... (Maybe I'm just confused right now, happens all the time. 🥴)

No, we definitely need feed rotation/archival. feeds.twtxt.net already does this for performance reasons.

Does it need to be a timestamp, though?
What if it was just the hash of the last entry?
Can't a client then use that to determine whether or not they've completely fetched an archived feed?

Owner

I think a client should only fetch an archived feed for "posterity", honestly, i.e. for archival, backup or search/crawl/index purposes. I see no reason why a client would really care about revisiting an archived feed if they actively follow the feed in the first place.

Thought so, too, but actually, I'm not so sure anymore.

  • Some client fetches my feed.
  • I add 5 new twts, one by one. The third one triggers feed archival. For whatever reason, I archive my entire feed, so the new main feed will start completely empty, then the last two twts will be added to it.
  • The client from above comes back and fetches my feed again. If that client does not also have a look at my prev feed, it might miss three twts (the ones that were added before archival).

How do we deal with that? If there was some info in metadata that says "next has feeds from $date to $date", then clients would know that they have to fetch my archived feed (just once).

Or maybe this just means that archiving an entire feed is a bad idea.

I need to think about this again ... (Maybe I'm just confused right now, happens all the time. 🥴)

No, we definitely need feed rotation/archival. feeds.twtxt.net already does this for performance reasons.

Does it need to be a timestamp, though?
What if it was just the hash of the last entry?
Can't a client then use that to determine whether or not they've completely fetched an archived feed?

It could be done like this:

```
# prev = <last_hash> <relative_uri>
```

And for archived/rotated feeds, just:

```
# next = <relative_uri>
```
lyse reviewed 3 months ago
won't receive further updates. Deletion or editing is still allowed, but
feed authors should not expect clients to retrieve archived feeds on a
regular basis (or at all). When moving twts to an archived feed, their
relative order should be retained.
lyse commented 3 months ago
Poster
Collaborator

Maybe add: "A twt should only appear in one feed, either the main feed or an archived feed, but not in both."

Poster
Owner

Twt hashes ensure this is basically impossible if clients use the Twt Hash extension, but otherwise I agree: it's weird to see duplicate content in an archived feed.

lyse marked this conversation as resolved
Poster
Collaborator

> It could be done like this: [...]

Hmmm. 🤔 Let's simulate it:

A client fetches my main feed:

```
2021-10-29T11:02:45Z	hello
2021-10-29T11:10:29Z	foo
2021-10-29T13:53:46Z	bar            <-- twt hash #o5cto2q
```

Let's assume that the `bar` line has that particular hash. The client stores this hash in its database as "most-recent hash".

I add new twts to my main feed:

```
2021-10-29T11:02:45Z	hello
2021-10-29T11:10:29Z	foo
2021-10-29T13:53:46Z	bar            <-- twt hash #o5cto2q
2021-10-29T17:20:11Z	this is new
2021-10-29T18:10:39Z	also new       <-- twt hash #kpw257a
```

And then I add some more, but this time they trigger rotation. So I move my entire main feed to `twtxt-archive-1234.txt` (which looks exactly the same as this last code snippet). My new main feed now looks like this:

```
# prev = kpw257a twtxt-archive-1234.txt
2021-10-29T18:15:42Z	just had coffe
2021-10-29T18:43:55Z	just had whisky
```

That client from before comes back to fetch my feed. It processes all twts in it and adds them to its database.

Then the client notices: "Whoops, last time I saw twt #o5cto2q in the feed, but now it's no longer there. But there's a `prev` field, so let's retrieve that and process all twts in it. Aha, there's twt #o5cto2q in it! So I'm all good now."

Right?

So this means that clients could traverse all the `prev` feeds backwards, until they find their last known "most-recent hash". (Or maybe until some limit is reached to avoid fetching *all* archived feeds. Maybe twt #o5cto2q simply got deleted.)

According to this, a simple `prev = <relative-uri>` would be enough. No need for a hash or timestamps after all, right?

🤔
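The backwards traversal described here could be sketched roughly as follows. This is not how jenny or any other client does it: `parse_feed`, `catch_up` and the `fetch` callable are hypothetical names, the raw twt line stands in for the twt hash to keep the example self-contained, and feeds are assumed to be append-only with a plain `# prev = <relative_uri>` field.

```python
from urllib.parse import urljoin


def parse_feed(text):
    """Return (prev, twts) for a feed; prev is None if the field is absent."""
    prev, twts = None, []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line.lstrip("# ").partition("=")
            if key.strip() == "prev":
                prev = value.strip()
        elif line.strip():
            twts.append(line)
    return prev, twts


def catch_up(fetch, main_url, last_seen, max_archives=3):
    """Collect twts newer than last_seen, following prev links if needed.

    fetch is any callable mapping a URL to the feed's text. The walk stops
    at the last known twt, at the end of the chain, or after max_archives
    archived feeds (the twt might simply have been deleted).
    """
    new_twts = []
    url = main_url
    for _ in range(max_archives + 1):
        prev, twts = parse_feed(fetch(url))
        if last_seen in twts:
            # Found the most recent known twt: keep only what follows it.
            return twts[twts.index(last_seen) + 1:] + new_twts
        new_twts = twts + new_twts
        if prev is None:
            break
        url = urljoin(url, prev)  # prev is relative to the feed's URL
    return new_twts
```

Replaying the example: with `bar` as the last seen twt, the client reads the two twts in the new main feed, follows `prev` once, and picks up "this is new" and "also new" from the archive.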
Collaborator

Very good example. The only time the twt hash in the prev could be useful is in a scenario like that:

The client had fetched the feed with all five twts (second code block including #kpw257a). Then the next two twts get appended and the feed is rotated. Now the client fetches again and gets the third code block with just the two new twts. It discovers that no twt in that feed had been seen so far. In order to make sure that no twt has been missed, it would fetch the archived feed twtxt-archive-1234.txt from the prev field. Now since the hash of the last twt in that archive feed is stated in the prev field, it detects that the archive feed ends with #kpw257a, which it already knows. So it can save the request to fetch the archive feed in this particular case.

But that is the only thing a twt hash would help with. I imagine this scenario does not happen a lot in the wild. So we can leave the hash out and go with a simpler approach.

In reality, feed rotation is probably more incremental. Not the complete feed gets rotated, but maybe just the n oldest twts. So chances are that a client gets maybe at least 10% overlap (number totally made up) between what it had seen before the rotation and what it still sees afterwards.

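For comparison, the one case where the hash in `prev` saves a request might look like this in a client. The sketch assumes the `# prev = <last_hash> <relative_uri>` form proposed earlier in this thread; `parse_prev` and `must_fetch_archive` are hypothetical helpers, and the sets of hashes are whatever the client keeps in its database.

```python
def parse_prev(value):
    """Split '<last_hash> <relative_uri>' into (hash, uri).

    If only a URI is given, the hash is None.
    """
    parts = value.split(None, 1)
    if not parts:
        raise ValueError("empty prev value")
    if len(parts) == 2:
        return parts[0], parts[1]
    return None, parts[0]


def must_fetch_archive(prev_value, main_hashes, known_hashes, last_seen):
    """Decide whether the archived feed named in prev has to be fetched."""
    if last_seen in main_hashes:
        # The most recent known twt is still in the main feed (overlap),
        # so nothing new can have been moved to the archive.
        return False
    last_hash, _uri = parse_prev(prev_value)
    if last_hash is not None and last_hash in known_hashes:
        # The archive ends with a twt we already have: skip the request.
        return False
    return True
```

Without the hash the function always answers "fetch" when there is no overlap, which is exactly the extra request the simpler `prev = <relative_uri>` form costs.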
Owner

There is another way to do this.

  • Client tracks the prev field.
  • On a detected change, it fetches the archived feed at prev from the last known location plus a buffer (just in case), or falls back to a full fetch.

Basically, if a client is tracking a feed, it can also track the last position using range requests, etc. If it now detects a new prev value, it can deduce that the feed was just rotated, so the "last position" might refer to the archived feed. Re-fetching the archived feed will hopefully "catch up" on the twts missed since the last fetch. Once done, carry on with the current feed. Rinse and repeat.

lyse reviewed 3 months ago
can be based on date ranges.
The main feed and all archived feeds form a linked list using the
[metadata](metadataextension.html) fields described below.
lyse commented 3 months ago
Poster
Collaborator

"field" should be singular, now that we wiped the next pointer.
lyse marked this conversation as resolved
lyse reviewed 3 months ago
`prev` is a name relative to the base directory of the feed's URL in
`url` (more specifically, in the URL that the client used to retrieve
the feed). In the example above, `prev` would evalute to the full URL
lyse commented 3 months ago
Poster
Collaborator

Typo: evalu_a_te

lyse marked this conversation as resolved
lyse reviewed 3 months ago
Archived feeds *can* contain another `prev` field to point to yet
another archived feed.
`prev` is a name relative to the base directory of the feed's URL in
lyse commented 3 months ago
Poster
Collaborator

Not so sure whether we want to enforce relative-only URLs. What about the situation where the main feed redirects to somewhere else?

E.g. requesting https://example.com/twtxt.txt (which is also the url field) redirects to https://example.com/foo/twtxt.txt and contains a prev = twtxt-1.txt. Would this then result in an archive feed https://example.com/twtxt-1.txt or https://example.com/foo/twtxt-1.txt? Probably the first one.
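
To make the two candidates concrete, here is a small Python sketch using the standard library's `urljoin`. Which base URL a client should use is exactly the open question, so this only shows how the two outcomes arise; the variable names are made up for the example.

```python
from urllib.parse import urljoin

prev = "twtxt-1.txt"  # relative name from the `prev` field

requested = "https://example.com/twtxt.txt"        # URL the client asked for (also `url`)
redirected = "https://example.com/foo/twtxt.txt"   # URL the server redirected to

# Resolving against the requested URL keeps the archive next to `url`.
resolved_from_requested = urljoin(requested, prev)
# Resolving against the redirect target puts the archive under /foo/.
resolved_from_redirected = urljoin(redirected, prev)

print(resolved_from_requested)   # https://example.com/twtxt-1.txt
print(resolved_from_redirected)  # https://example.com/foo/twtxt-1.txt
```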

Poster
Owner

I reckon for simplicity it should be assumed and required to be an absolute URI.

movq commented 3 months ago
Poster
Collaborator

Oh god, now you open the can of worm^W redirects. 🤔

It would be easier to implement if prev used absolute URLs. But then we’d lose some flexibility (when you move your main feed, you’d probably have to update all archived feeds) and we would have to introduce multiple prev fields to allow for the same feed to be available over multiple protocols. (#494)

I think it’s okay to require relative URLs. Clients could work like this:

  1. Request the main feed and thus deal with any redirects (clients have to do this anyway).
  2. While doing so, they probably update the feed’s URL in their database. Or they flag that feed as “has been redirected, user must confirm the change”.
  3. When clients request an archived feed, they either use the main feed’s url or they refuse to fetch the archived feed if the main feed is in “a state of flux” (i.e., we saw a redirect last time but the user has not yet confirmed that this redirect should actually be used).

So, basically … before fetching an archived feed, you have to fetch the main feed first. I suspect this happens anyway.
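
The three steps above could look roughly like this in a client. This is a minimal Python sketch under stated assumptions: `FeedState`, `record_main_fetch`, `confirm_redirect` and `archive_url` are hypothetical names, and the "state of flux" is modelled as a pending redirect that the user has not confirmed yet.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class FeedState:
    url: str                                # main feed URL the client trusts
    pending_redirect: Optional[str] = None  # redirect seen but not yet confirmed


def record_main_fetch(state: FeedState, final_url: str, auto_update: bool = False) -> None:
    """Steps 1 and 2: note where the main feed was actually served from."""
    if final_url == state.url:
        state.pending_redirect = None
    elif auto_update:
        state.url = final_url
        state.pending_redirect = None
    else:
        # Flag the feed: "has been redirected, user must confirm the change".
        state.pending_redirect = final_url


def confirm_redirect(state: FeedState) -> None:
    """The user accepted the redirect, so adopt the new URL."""
    if state.pending_redirect is not None:
        state.url = state.pending_redirect
        state.pending_redirect = None


def archive_url(state: FeedState, prev_name: str) -> Optional[str]:
    """Step 3: resolve `prev` against the main feed URL, or refuse while in flux."""
    if state.pending_redirect is not None:
        return None
    return urljoin(state.url, prev_name)
```

Returning `None` while a redirect is unconfirmed is one way to express "refuse to fetch"; a real client might raise an error or queue the request instead.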

lyse commented 3 months ago
Poster
Collaborator

Okay, if the url is misconfigured, we just give up. That's fine.

lyse marked this conversation as resolved
Poster
Collaborator

> There is another way to do this.
>
>   • Client tracks the prev field.
>   • On a detected change, it fetches the archived feed at prev from the last known location plus a buffer, just in case, or falls back to a full fetch.
>
> Basically, if a client is tracking a feed, it can also track the last position using range requests, etc. If it now detects a new prev value, it can deduce that the feed was just rotated, so the "last position" might refer to the archived feed. Re-fetching the archived feed will then hopefully "catch up" on the twts missed since the last fetch. Once done, carry on with the current feed. Rinse and repeat.

Alright, I think I understand. 🤔 (I probably won’t implement it like that in my client, but that doesn’t matter. 😁)

movq force-pushed pagination from 15b3e601b0 to 6a3cd4e5e7 3 months ago
Poster
Collaborator

I think all open comments have been addressed now. ✍

lyse approved these changes 3 months ago
lyse left a comment

> (I probably won’t implement it like that in my client, but that doesn’t matter. 😁)

Neither will I. :-D (I first need to pull out the fetch and download into my own client before I can implement anything remotely like what we have come up with here. That's going to be fun.)

The hash in the prev field seems a bit too overengineered to me, but I can live with it. Looking good now.

prologic changed title from WIP: Add pagination extension spec to Add pagination extension spec 3 months ago
prologic added 1 commit 3 months ago
2c7c77c4e8 Merge branch 'master' into pagination
prologic merged commit e460caa8f1 into master 3 months ago
prologic referenced this issue from a commit 3 months ago

continuous-integration/drone/pr Build is passing
The pull request has been merged as e460caa8f1.