James Mills
19127bf374
Update maxScrapers to 2 (from 50) to slow down scraping and reduce CPU and Disk I/O
8 months ago
James Mills
933732f3b9
Add heuristic for feeds ending with a .txt extension other than twtxt that signifies the feed's canonical name (e.g. 8ball.txt)
9 months ago
James Mills
7f8eea05a3
Fix CI
9 months ago
James Mills
860579612e
Add Twter to URL model and add job to fix null DiscoveredAt fields when Twter is populated
9 months ago
James Mills
f044fc370d
Add support for application/json content negotiation to search requests/responses
9 months ago
James Mills
09d1ea673e
Remove more useless code borrowed from yarn's codebase
9 months ago
James Mills
4946ccb135
Add an extra field to stored URLs (DiscoveredAt) to prevent new URLs from being written to the feed twice
9 months ago
James Mills
22bc57c77e
Fix a whole bunch of bugs I can't be bothered explaining
9 months ago
James Mills
f5d2f0fc0f
Add missing stats_indexed_feeds gauge
9 months ago
James Mills
0ecfbce721
Fix paging and preserving sort
9 months ago
James Mills
050e4431fa
Add support for preserving the scroll position when navigating back 'n forth between search results and permalinks
9 months ago
James Mills
519ccfb278
Fix discovered feed URIs and overridden nicks and add a slugified nick if unknown
9 months ago
James Mills
1c7b29aa84
Add yarns_stats_ gauges
9 months ago
James Mills
3f7813dfec
Remove some more dead code
9 months ago
James Mills
85be59a178
Refactored stats handling
9 months ago
James Mills
b2ae6589e4
Just recrawl feeds every hour (for now)
9 months ago
lyse
6216846490
Fix typos and switch to HTTPS links ( #6 )
* Fix the "keyword" typo.
* Fix title casing of section headline.
* Make search button stand out a bit more.
* Emphasize double quotes.
* Switch HTTP to secure HTTPS URLs.
Co-authored-by: Lysander Trischler <twtxt@lyse.isobeef.org>
Reviewed-on: yarnsocial/yarns#6
Co-authored-by: lyse <lyse@noreply@mills.io>
Co-committed-by: lyse <lyse@noreply@mills.io>
9 months ago
James Mills
7d8b185593
Fix recrawl strategy
9 months ago
James Mills
0ed1657058
Fix initial NewAvg
9 months ago
James Mills
4704ccd1c4
Split our rescraping into two parts (active and broken)
9 months ago
James Mills
7392fe6a0d
Update how exponential moving averages are calculated
9 months ago
James Mills
115621f1c1
Add a very hacky recrawl algorithm
9 months ago
James Mills
7072ad6155
Fix CI
9 months ago
James Mills
5b19831ea4
Fix scraper status code handling
9 months ago
James Mills
e581bd171e
Update debug logging for lastScrapedAgo
9 months ago
James Mills
ff6a3b19e1
Fix error handling
9 months ago
James Mills
661e6fa45a
Fix task error handling
9 months ago
James Mills
196e5f7479
Add Dead Feeds to Stats page
9 months ago
James Mills
ca6679f51f
Fix /tasks endpoint
9 months ago
James Mills
cd71963f19
Fix logging
9 months ago
James Mills
b93479f485
Fix some concurrency bugs
9 months ago
James Mills
73a0dd0e07
Add logging to figure out how to rescrape every other URL
9 months ago
James Mills
2ea9031feb
Fix send on closed channel bug with newfeeds channel
9 months ago
James Mills
f866df1895
Avoid adding scrape/crawl tasks that already exist and are running
9 months ago
James Mills
3debd80e02
Add support for re-scraping feeds not scraped yet by the crawler
9 months ago
James Mills
87e78c6188
Fix typos
9 months ago
James Mills
b3a0ace274
Add dead feed detection
9 months ago
James Mills
14bd5e8404
Fix support for scraping gopher feeds
9 months ago
James Mills
84168aac1c
Remove session not found warning
9 months ago
James Mills
6ddc092952
Remove expensive (for now) Drone CI Docker Image steps
9 months ago
James Mills
cadeb49e24
Add new RecrawlFeedsJob
9 months ago
James Mills
f221607ca0
Refactor crawler and scraper and remove a bunch of unused code
9 months ago
James Mills
8e135a1391
Fix a bunch of lint errors
9 months ago
James Mills
abed98b736
Fix a bunch of data race bugs and deadlocks. Improve stats
9 months ago
James Mills
2420a1e7e2
Add LastError to URL model
9 months ago
James Mills
871d2c558e
Deal with feeds that change locations
9 months ago
James Mills
397f46c76e
Refactor
9 months ago
James Mills
2b854683c8
Add a /twtxt.txt feed of all new feeds discovered during crawls
9 months ago
James Mills
160919a26f
Fix CI
10 months ago
James Mills
3b9ebd3943
Fix display of date/times
10 months ago