Add support for feed refresh intervals + spec + refactor to support handling bad feeds #624

Merged
prologic merged 10 commits from refresh_metadata_field into master 4 weeks ago
prologic commented 4 weeks ago
Owner
There is no content yet.
prologic added 1 commit 4 weeks ago
prologic requested review from lyse 4 weeks ago
prologic requested review from xuu 4 weeks ago
prologic requested review from movq 4 weeks ago
prologic requested review from fastidious 4 weeks ago
prologic requested review from ullarah 4 weeks ago
prologic self-assigned this 4 weeks ago
fastidious reviewed 4 weeks ago
**Note:** This field is ignored by `yarnd` clients and instead uses an automatic exponential back-off based on a weight moving average of the feed's update frequency.
**Also note:** An empty, bad ur unaprsable valus is ignored.
Poster
Collaborator

Typos here. It should be "An empty, bad, or unparsable value is ignored."

Typos here. It should be "An empty, bad, or unparsable value is ignored."
prologic marked this conversation as resolved
fastidious reviewed 4 weeks ago
This field is used by clients as a "hint" as to how often they should fetch or update this feed. This is defined by feed authors and is primarily used so feed author's can give hints to clients to reduce unwanted traffic on their feed's web server(s).
The format of the value is a time duration as understood by the [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) function used by the `yarnd` implemtnation (_written in [Go](https://golang.org)_). For example: `5m` and `1h` are valid values.
Poster
Collaborator

Typo "implementation".

Typo "implementation".
prologic marked this conversation as resolved
prologic added 1 commit 4 weeks ago
275876d29b
Fix typos
prologic added 1 commit 4 weeks ago
8aa829e72b
Fix tests
prologic force-pushed refresh_metadata_field from 8aa829e72b to e3cc24301b 4 weeks ago
prologic force-pushed refresh_metadata_field from e3cc24301b to daba316068 4 weeks ago
movq reviewed 4 weeks ago
This field is used by clients as a "hint" as to how often they should fetch or update this feed. This is defined by feed authors and is primarily used so feed author's can give hints to clients to reduce unwanted traffic on their feed's web server(s).
The format of the value is a time duration as understood by the [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) function used by the [Yarn.social](https://yarn.social) multi-user client `yarnd` (_written in [Go](https://golang.org)_). For example: `5m` and `1h` are valid values.
movq commented 4 weeks ago
Poster
Collaborator

To make it easier for non-Go implementations to support this spec, maybe this should just be an integer (minutes)?

To make it easier for non-Go implementations to support this spec, maybe this should just be an integer (minutes)?
Poster
Owner

Ahh yes we can do that. Let me adjust this...

Ahh yes we can do that. Let me adjust this...
Poster
Owner

@movq Actually can we make it ever simpler and just states the value is an integer in seconds?

e.g:

# refresh = 1800
@movq Actually can we make it ever simpler and just states the value is an integer in seconds? e.g: ``` # refresh = 1800 ```
movq commented 4 weeks ago
Poster
Collaborator

I’m fine with seconds. 👌 I just assumed that we wouldn’t need that kind of granularity anyway. 😁

I’m fine with seconds. 👌 I just assumed that we wouldn’t need that kind of granularity anyway. 😁
prologic marked this conversation as resolved
prologic added 1 commit 4 weeks ago
642b29bbb3
Use seconds as the refresh value to simplfy things
prologic added 1 commit 4 weeks ago
87e40aa7c0
Update ChangeLog of the Metadata spec

why seconds, instead of minutes, or hours? would ANYONE want sub-minute refresh rate? converting either to seconds would be pretty easy anyway...

why seconds, instead of minutes, or hours? would ANYONE want sub-minute refresh rate? converting either to seconds would be pretty easy anyway...
Collaborator

Seconds provide more control without sacrificing anything. I am used to using seconds, they work out just fine.

Seconds provide more control without sacrificing anything. I am used to using seconds, they work out just fine.
Collaborator

I'm used to work with milliseconds!!

SCNR. :-D On more serious note, I want to raise awareness for the Syndication Specification for RSS and Atom feeds. It's probably overkill for the purpose here, but it has a quite nice approach of defining a period and frequency. Just take it as a source of inspiration for the brainstorming.

I'm used to work with milliseconds!! SCNR. :-D On more serious note, I want to raise awareness for the [Syndication Specification](https://web.resource.org/rss/1.0/modules/syndication/) for RSS and Atom feeds. It's probably overkill for the purpose here, but it has a quite nice approach of defining a period and frequency. Just take it as a source of inspiration for the brainstorming.
lyse reviewed 4 weeks ago
### `refresh`
This field is used by clients as a "hint" as to how often they should fetch or update this feed. This is defined by feed authors and is primarily used so feed author's can give hints to clients to reduce unwanted traffic on their feed's web server(s).
lyse commented 4 weeks ago
Poster
Collaborator

The plural "s" of "feed authors" should not be separated by an apostrophe.

The plural "s" of "feed authors" should not be separated by an apostrophe.
The value of this field is seconds represented by an integer.
**Note:** This field is ignored by `yarnd` clients and instead uses an automatic exponential back-off based on a weight moving average of the feed's update frequency.
lyse commented 4 weeks ago
Poster
Collaborator

Hmm, I don't really get the point of standardizing a field yarnd isn't even using. :-? However, it actually does at the moment, but it will be changed in the future.

Hmm, I don't really get the point of standardizing a field yarnd isn't even using. :-? However, it actually does at the moment, but it will be changed in the future.
Poster
Owner

Good point, discussion in #yarn.social on Libera chat:

[08:02:15]  <prologic> Lyse re your comment https://git.mills.io/yarnsocial/yarn/pulls/624#issuecomment-7154
[08:02:50]  <Lyse> So you would do that for a complete yarn tree, not just one "layer", right?
[08:02:59]  <prologic> I intend to build (I will probably try today/tomorrow) some kind of exponential back-off based on a weighted moving average of a feed's post frequency
[08:03:23]  <prologic> I had a brief look at the specs you linked, tbh never seen them used before :)
[08:03:39]  <Lyse> Alright.
[08:03:50]  <Lyse> I actually use this in my Atom feeds. :-)
[08:03:55]  <xuu> Lyse.. a root would describe a single yarn list i think
[08:04:04]  <prologic> xuu nice
[08:04:34]  <xuu> it gets a bit complicated to hash the sub yarns i think
[08:04:43]  <prologic> Lyse my point is, I _think_ I can automatically determine how often refresh/fetch a feed based on how often it is posted to :)
[08:04:51]  <prologic> I hope 🤞
[08:05:19]  <xuu> then the gossip can be like share some N of recent yarns updated
[08:05:35]  <xuu> maybe even a random set
[08:06:33]  <prologic> xuu you saw ./tools/compare_twt_chains.sh right?
[08:06:52]  <prologic> Where Quark and I have converged Yarns manually as a poc
[08:07:35]  <Lyse> Still the question remains, prologic, if yarnd is not gonna use the field but calculates it on its own, why bother and standardize it?
[08:09:18]  <prologic> It's a good point actually
[08:09:49]  <prologic> It's a work-around for gbmor where his feed is being hit by 3 pods resulting in around ~1k requests/day :D
[08:10:07]  <prologic> But your point is quite valid, maybe I just build the automatic back-off?
[08:10:19]  <prologic> But OTOH what about other clients/implementation?
[08:12:15]  <Lyse> They could do the same.
[08:12:58]  <Lyse> And probably most other clients are not periodically fetching feeds. The user does this manually.
[08:13:07]  <prologic> True
[08:14:13]  <Lyse> I just know of yarnd and jenny/cron that are pulling periodically. But my client knowledge is very limited. So there's that. :-)
[08:14:50]  <prologic> What's a typical Jenny cron schedule?
[08:17:04]  <Lyse> movq or Quark have to answer that, I don't know ;-)
[08:17:12]  <prologic> Hmm
[08:17:18]  <prologic> Ok :)
[08:17:48]  <prologic> Alrighty, if we drop the `refresh` field from the spec and just write code to do this automatically
[08:17:55]  <prologic> Are you awake enough to help me write that code? :)
[08:18:17]  <Lyse> Ah, I can just look at my logs. Movq uses a 15min interval.
[08:18:28]  <prologic> I _think_ ShouldRefreshFeed() also handle dead feeds as well, I.e: stop fetching dead feeds (404 for example)
[08:18:44]  <prologic> Yeah and I suspect Quark uses a smaller interval :)
[08:18:49]  <Lyse> Quark 5min.
[08:18:51]  <prologic> At least he does with his Yarn pod (1m)
[08:18:58]  <prologic> So there you go :)
[08:19:57]  <Lyse> I'm have to go to bed. Nearly fell asleep while watching a video.
[08:20:09]  <prologic> Part of me feels that having a `refresh` in a feed such as gbmor is a good "fallback" for feed authors that either can't cope with or don't want to with the potential volume of requests to their feed -- even if they're mostly just 304 Not Modified requests/resposnes
[08:20:28]  <prologic> So I'm not sure tbh, but I've often been swayed by your opinions in the past ;)
[08:21:23]  <Lyse> I will sleep on it. :-)
[08:21:26]  <prologic> Ok :)
[08:21:38]  <prologic> I _will_ try to write this magical™ algorithm :D
[08:21:41]  <Lyse> See you tomorrow™, mates1
[08:21:52]  <Lyse> Have fun ;-)
[08:22:00]  <prologic> Maybe I'll split this PR so I can land parts separately
Good point, discussion in #yarn.social on Libera chat: ``` [08:02:15] <prologic> Lyse re your comment https://git.mills.io/yarnsocial/yarn/pulls/624#issuecomment-7154 [08:02:50] <Lyse> So you would do that for a complete yarn tree, not just one "layer", right? [08:02:59] <prologic> I intend to build (I will probably try today/tomorrow) some kind of exponential back-off based on a weighted moving average of a feed's post frequency [08:03:23] <prologic> I had a brief look at the specs you linked, tbh never seen them used before :) [08:03:39] <Lyse> Alright. [08:03:50] <Lyse> I actually use this in my Atom feeds. :-) [08:03:55] <xuu> Lyse.. a root would describe a single yarn list i think [08:04:04] <prologic> xuu nice [08:04:34] <xuu> it gets a bit complicated to hash the sub yarns i think [08:04:43] <prologic> Lyse my point is, I _think_ I can automatically determine how often refresh/fetch a feed based on how often it is posted to :) [08:04:51] <prologic> I hope 🤞 [08:05:19] <xuu> then the gossip can be like share some N of recent yarns updated [08:05:35] <xuu> maybe even a random set [08:06:33] <prologic> xuu you saw ./tools/compare_twt_chains.sh right? [08:06:52] <prologic> Where Quark and I have converged Yarns manually as a poc [08:07:35] <Lyse> Still the question remains, prologic, if yarnd is not gonna use the field but calculates it on its own, why bother and standardize it? [08:09:18] <prologic> It's a good point actually [08:09:49] <prologic> It's a work-around for gbmor where his feed is being hit by 3 pods resulting in around ~1k requests/day :D [08:10:07] <prologic> But your point is quite valid, maybe I just build the automatic back-off? [08:10:19] <prologic> But OTOH what about other clients/implementation? [08:12:15] <Lyse> They could do the same. [08:12:58] <Lyse> And probably most other clients are not periodically fetching feeds. The user does this manually. [08:13:07] <prologic> True [08:14:13] <Lyse> I just know of yarnd and jenny/cron that are pulling periodically. But my client knowledge is very limited. So there's that. :-) [08:14:50] <prologic> What's a typical Jenny cron schedule? [08:17:04] <Lyse> movq or Quark have to answer that, I don't know ;-) [08:17:12] <prologic> Hmm [08:17:18] <prologic> Ok :) [08:17:48] <prologic> Alrighty, if we drop the `refresh` field from the spec and just write code to do this automatically [08:17:55] <prologic> Are you awake enough to help me write that code? :) [08:18:17] <Lyse> Ah, I can just look at my logs. Movq uses a 15min interval. [08:18:28] <prologic> I _think_ ShouldRefreshFeed() also handle dead feeds as well, I.e: stop fetching dead feeds (404 for example) [08:18:44] <prologic> Yeah and I suspect Quark uses a smaller interval :) [08:18:49] <Lyse> Quark 5min. [08:18:51] <prologic> At least he does with his Yarn pod (1m) [08:18:58] <prologic> So there you go :) [08:19:57] <Lyse> I'm have to go to bed. Nearly fell asleep while watching a video. [08:20:09] <prologic> Part of me feels that having a `refresh` in a feed such as gbmor is a good "fallback" for feed authors that either can't cope with or don't want to with the potential volume of requests to their feed -- even if they're mostly just 304 Not Modified requests/resposnes [08:20:28] <prologic> So I'm not sure tbh, but I've often been swayed by your opinions in the past ;) [08:21:23] <Lyse> I will sleep on it. :-) [08:21:26] <prologic> Ok :) [08:21:38] <prologic> I _will_ try to write this magical™ algorithm :D [08:21:41] <Lyse> See you tomorrow™, mates1 [08:21:52] <Lyse> Have fun ;-) [08:22:00] <prologic> Maybe I'll split this PR so I can land parts separately ```
Poster
Owner

Actually I think @jan6 makes a very good point in his comment; I believe I actually had this in the back of my mind.

Actually I _think_ @jan6 makes a very good point in his [comment](https://git.mills.io/yarnsocial/yarn/pulls/624#issuecomment-7160); I _believe_ I actually had this in the back of my mind.
Poster
Owner

I made a mistake here when writing this part, I did not mean to say yarnd would ignore the field. It would only be ignored if it's either a) missing or b) unparsable

I _made_ a mistake here when writing this part, I did not mean to say `yarnd` would ignore the field. It would only be ignored if it's either a) missing or b) unparsable
prologic marked this conversation as resolved
return true
}
// TODO: Implement exponentiatl back-off using weighted moving average of a feed's update frequency
lyse commented 4 weeks ago
Poster
Collaborator

Typo: exponentia_l

Typo: exponentia_l
prologic marked this conversation as resolved
types/twt.go Outdated
}
func (e ErrDeadFeed) Error() string {
return fmt.Sprintf("error dead feed: %s", e.Reason)
lyse commented 4 weeks ago
Poster
Collaborator

Hmm, these error messages are inconsistent:

  1. "not implemented" doesn't use any "error" prefix (personally I like that the best)
  2. "error: erroneous feed detected" has an "error" prefix with a colon
  3. "error dead feed" has an "error" prefix without a colon
Hmm, these error messages are inconsistent: 1. "not implemented" doesn't use any "error" prefix (personally I like that the best) 2. "error: erroneous feed detected" has an "error" prefix with a colon 3. "error dead feed" has an "error" prefix without a colon
Poster
Owner

Fixed

Fixed
prologic marked this conversation as resolved

why would yarnd ignore it? imho the best strategy is to first check if there's a refresh header, and if there's none, THEN use the backoff strategy... (or maybe "whichever is bigger")
why else would you have suggestions, if the main platform that's making the most requests, doesn't even respect it?
individual clients usually don't automatically check feeds in the background, only when you open them, so would have minimal effect regarding the header,
the only places (afaics) that it'd probably be considerably effective at, would be yarnd and registry instances... those are the only ones where there's an active server daemon that's continually (or at least, pretty often) fetching various feeds

why would yarnd ignore it? imho the best strategy is to first check if there's a refresh header, and if there's none, THEN use the backoff strategy... (or maybe "whichever is bigger") why else would you have suggestions, if the main platform that's making the most requests, doesn't even respect it? individual clients usually don't automatically check feeds in the background, only when you open them, so would have minimal effect regarding the header, the only places (afaics) that it'd probably be considerably effective at, would be yarnd and registry instances... those are the only ones where there's an active server daemon that's continually (or at least, pretty often) fetching various feeds
Poster
Owner

why seconds, instead of minutes, or hours? would ANYONE want sub-minute refresh rate? converting either to seconds would be pretty easy anyway...

It's just easier to deal with at the implemtnation level really and as @fastidious points out there is no downside really either way.

> why seconds, instead of minutes, or hours? would ANYONE want sub-minute refresh rate? converting either to seconds would be pretty easy anyway... It's just easier to deal with at the implemtnation level really and as @fastidious points out there is no downside really either way.
prologic added 1 commit 4 weeks ago
94caf3c743
Address @Lyse's feedback
prologic added 1 commit 4 weeks ago
8e4685792b
Simplify spec
prologic added 1 commit 4 weeks ago
82a1288a1f Merge branch 'master' into refresh_metadata_field
prologic added 1 commit 4 weeks ago
Poster
Owner
[09:54:01]  <prologic> I'm going to merge https://git.mills.io/yarnsocial/yarn/pulls/624 now
[09:54:12]  <prologic> I'm not going to wait for Lyse to sleep on it :)
[09:54:21]  <prologic> The reasons are valid and compelling to do this
[09:55:08]  <prologic> A magic™ refresh algorithm will also come later when I figure out how, likely behind a feature flag as I'm likely to "fuck it up" :D

``` [09:54:01] <prologic> I'm going to merge https://git.mills.io/yarnsocial/yarn/pulls/624 now [09:54:12] <prologic> I'm not going to wait for Lyse to sleep on it :) [09:54:21] <prologic> The reasons are valid and compelling to do this [09:55:08] <prologic> A magic™ refresh algorithm will also come later when I figure out how, likely behind a feature flag as I'm likely to "fuck it up" :D ```
prologic merged commit 3438075829 into master 4 weeks ago

Reviewers

lyse was requested for review 4 weeks ago
xuu was requested for review 4 weeks ago
movq was requested for review 4 weeks ago
fastidious was requested for review 4 weeks ago
ullarah was requested for review 4 weeks ago
continuous-integration/drone/pr Build is passing
The pull request has been merged as 3438075829.
Sign in to join this conversation.
Loading…
There is no content yet.