101 Commits (master)
 

Author SHA1 Message Date
James Mills 19127bf374
Update maxScrapers to 2 (from 50) to slow down scraping and reduce CPU and Disk I/O 8 months ago
James Mills 933732f3b9
Add heuristic for feeds ending with a .txt extension other than twtxt that signifies the feed's canonical name (e.g. 8ball.txt) 9 months ago
James Mills 7f8eea05a3
Fix CI 9 months ago
James Mills 860579612e
Add Twter to URL model and add job to fix null DiscoveredAt fields when Twter is populated 9 months ago
James Mills f044fc370d
Add support for application/json content negotiation to search requests/responses 9 months ago
James Mills 09d1ea673e
Remove more useless code borrowed from yarn's codebase 9 months ago
James Mills 4946ccb135
Add an extra field to stored URLs (DiscoveredAt) to prevent new URLs from being written to the feed twice 9 months ago
James Mills 22bc57c77e
Fix a whole bunch of bugs I can't be bothered explaining 9 months ago
James Mills f5d2f0fc0f
Add missing stats_indexed_feeds gauge 9 months ago
James Mills 0ecfbce721
Fix paging and preserving sort 9 months ago
James Mills 050e4431fa
Add support for preserving the scroll position when navigating back 'n forth between search results and permalinks 9 months ago
James Mills 519ccfb278
Fix discovered feed URIs and overridden nicks and add a slugified nick if unknown 9 months ago
James Mills 1c7b29aa84
Add yarns_stats_ gauges 9 months ago
James Mills 3f7813dfec
Remove some more dead code 9 months ago
James Mills 85be59a178
Refactored stats handling 9 months ago
James Mills b2ae6589e4
Just recrawl feeds every hour (for now) 9 months ago
lyse 6216846490
Fix typos and switch to HTTPS links (#6) 9 months ago
James Mills 7d8b185593
Fix recrawl strategy 9 months ago
James Mills 0ed1657058
Fix initial NewAvg 9 months ago
James Mills 4704ccd1c4
Split our rescraping into two parts (active and broken) 9 months ago
James Mills 7392fe6a0d
Update how exponential moving averages are calculated 9 months ago
James Mills 115621f1c1
Add a very hacky recrawl algorithm 9 months ago
James Mills 7072ad6155
Fix CI 9 months ago
James Mills 5b19831ea4
Fix scraper status code handling 9 months ago
James Mills e581bd171e
Update debug logging for lastScrapedAgo 9 months ago
James Mills ff6a3b19e1
Fix error handling 9 months ago
James Mills 661e6fa45a
Fix task error handling 9 months ago
James Mills 196e5f7479
Add Dead Feeds to Stats page 9 months ago
James Mills ca6679f51f
Fix /tasks endpoint 9 months ago
James Mills cd71963f19
Fix logging 9 months ago
James Mills b93479f485
Fix some concurrency bugs 9 months ago
James Mills 73a0dd0e07
Add logging to figure out how to rescrape every other URL 9 months ago
James Mills 2ea9031feb
Fix send on closed channel bug with newfeeds channel 9 months ago
James Mills f866df1895
Avoid adding scrape/crawl tasks that already exist and are running 9 months ago
James Mills 3debd80e02
Add support for re-scraping feeds not scraped yet by the crawler 9 months ago
James Mills 87e78c6188
Fix typos 9 months ago
James Mills b3a0ace274
Add dead feed detection 9 months ago
James Mills 14bd5e8404
Fix support for scraping gopher feeds 9 months ago
James Mills 84168aac1c
Remove session not found warning 9 months ago
James Mills 6ddc092952
Remove expensive (for now) Drone CI Docker Image steps 9 months ago
James Mills cadeb49e24
Add new RecrawlFeedsJob 9 months ago
James Mills f221607ca0
Refactor crawler and scraper and remove a bunch of unused code 9 months ago
James Mills 8e135a1391
Fix a bunch of lint errors 9 months ago
James Mills abed98b736
Fix a bunch of data race bugs and deadlocks. Improve stats 9 months ago
James Mills 2420a1e7e2
Add LastError to URL model 9 months ago
James Mills 871d2c558e
Deal with feeds that change locations 9 months ago
James Mills 397f46c76e
Refactor 9 months ago
James Mills 2b854683c8
Add a /twtxt.txt feed of all new feeds discovered during crawls 9 months ago
James Mills 160919a26f
Fix CI 10 months ago
James Mills 3b9ebd3943
Fix display of date/times 10 months ago