Deadlocks #238

Open
opened 11 months ago by prologic · 25 comments
Owner

The current test suite in fact already runs into deadlocks:

```#!sh
(⎈ |local:default)
prologic@Jamess-iMac
Tue Sep 21 09:28:23
~/Projects/bitcask
 (scan_deadlock) 0
$ go test -v .
=== RUN   TestAll
=== RUN   TestAll/Open
=== RUN   TestAll/Put
=== RUN   TestAll/Get
=== RUN   TestAll/Len
=== RUN   TestAll/PutWithTTL
=== RUN   TestAll/GetExpiredKey
=== RUN   TestAll/Has
=== RUN   TestAll/HasWithExpired
=== RUN   TestAll/RunGC
=== RUN   TestAll/Keys
=== RUN   TestAll/Fold
POTENTIAL DEADLOCK: Recursive locking:
current goroutine 53 lock 0xc00017e000
bitcask.go:158 bitcask.(*Bitcask).Get { b.mu.RLock() } <<<<<
bitcask.go:157 bitcask.(*Bitcask).Get { func (b *Bitcask) Get(key []byte) ([]byte, error) { }
bitcask_test.go:137 bitcask.TestAll.func11.1 { value, err := db.Get(key) }
bitcask.go:536 bitcask.(*Bitcask).Fold.func1 { b.trie.ForEach(func(node art.Node) bool { }
~/go/pkg/mod/github.com/plar/go-adaptive-radix-tree@v1.0.4/tree_traversal.go:51 go-adaptive-radix-tree.traverseFilter.func1 { if options&TraverseLeaf == TraverseLeaf && node.Kind() == Leaf { }
~/go/pkg/mod/github.com/plar/go-adaptive-radix-tree@v1.0.4/tree_traversal.go:70 go-adaptive-radix-tree.(*tree).recursiveForEach {  }
~/go/pkg/mod/github.com/plar/go-adaptive-radix-tree@v1.0.4/tree_traversal.go:63 go-adaptive-radix-tree.(*tree).ForEach { t.recursiveForEach(t.root, traverseFilter(options, callback)) }
bitcask.go:542 bitcask.(*Bitcask).Fold {  }
bitcask_test.go:144 bitcask.TestAll.func11 { }) }

Previous place where the lock was grabbed (same goroutine)
bitcask.go:533 bitcask.(*Bitcask).Fold { b.mu.RLock() } <<<<<
bitcask.go:532 bitcask.(*Bitcask).Fold { func (b *Bitcask) Fold(f func(key []byte) error) (err error) { }
bitcask_test.go:144 bitcask.TestAll.func11 { }) }

FAIL	git.mills.io/prologic/bitcask	0.100s
FAIL
```

The branch [scan_deadlock](https://git.mills.io/prologic/bitcask/src/branch/scan_deadlock) demonstrates this by simply swapping all imports of sync for https://github.com/sasha-s/go-deadlock and replacing every sync.*Mutex with its deadlock.* equivalent.
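For reference, the swap on that branch amounts to something like this (a minimal sketch; only the mutex field is shown):

```go
package bitcask

import "github.com/sasha-s/go-deadlock"

// Only the mutex field is shown here; the real struct has many more fields.
// go-deadlock's types are drop-in replacements for sync.Mutex/sync.RWMutex
// and print a report like the one above when they detect recursive locking.
type Bitcask struct {
	mu deadlock.RWMutex // was: sync.RWMutex
}
```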

Poster
Owner

I tried to experiment with taking a deep copy of the trie, effectively trying to take a snapshot of the keyspace, but I wasn't able to make this work. The downside of doing this is that you also double the memory required every time you need to scan or iterate over all keys :/

cc @taigrr

Poster
Owner

FWIW we've known about the limitations here for a while. I solved my application's problem a different way: by collecting the keys I'm interested in into a slice and then doing the work I needed afterwards.
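A minimal sketch of that workaround (the prefix is illustrative): collect the keys inside Scan, then do the lookups after Scan has returned and released its read lock.

```go
var keys [][]byte
err := db.Scan([]byte("user:"), func(key []byte) error {
	// Copy the key defensively in case the slice handed to the callback is reused.
	keys = append(keys, append([]byte(nil), key...))
	return nil
})
if err != nil {
	return err
}
// Scan's read lock is no longer held here, so calling Get is safe.
for _, k := range keys {
	value, err := db.Get(k)
	if err != nil {
		return err
	}
	_ = value // do the real work with value here
}
```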

Collaborator

This is extremely unfortunate. I can see what's happening, and yes, the only way to prevent a recursive lock is a copy.

The recursive lock itself isn't the issue; it's a peek into what else may cause an issue. The deadlocks happen because the recursive locking makes this library not thread/goroutine safe. Call a deletion or insertion method from a different goroutine whilst in the middle of a Scan() with an embedded Get() and the issue becomes clear.
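For concreteness, this is the shape of call that trips it, matching the trace above (the callback body is illustrative):

```go
// Fold takes b.mu.RLock(), then the callback calls Get(), which takes
// b.mu.RLock() again on the same goroutine. With sync.RWMutex that recursive
// read lock deadlocks as soon as a writer (a Put/Delete from another
// goroutine) queues up between the two acquisitions.
err := db.Fold(func(key []byte) error {
	_, err := db.Get(key) // second RLock on the same mutex, same goroutine
	return err
})
```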

~~Ok, we're going to have to update the Range() and Scan() methods to copy the trie and then iterate. Bump from O(n) to 'O(2n)' for space and time.~~

Collaborator

Ok, even a deep copy here doesn't work. I think the only option is to warn against calling Get() inside of Scan()/Range()/Fold() etc. and expose a new method Try() which is just a call to get() (Get() without the RLock()).

Use Get() in your own code, or Try() inside a Scan()/Range()/Fold() lambda.
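From the caller's side the split would read roughly like this (Try() is only a proposal and does not exist in the library; key names are made up):

```go
// Outside any callback: Get() takes the read lock as usual.
value, err := db.Get([]byte("somekey"))

// Inside a Scan/Range/Fold callback the read lock is already held,
// so the proposed Try() would read without re-locking.
err = db.Scan([]byte("some"), func(key []byte) error {
	v, err := db.Try(key) // hypothetical: Get() minus b.mu.RLock()
	if err != nil {
		return err
	}
	_ = v
	return nil
})
```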

Our only other option is to break the entire API and start passing a context.Context object around so we can "inherit" the RWMutex lock when it's held by the calling function.

What are your thoughts? @prologic

Poster
Owner

Yeah I've tried to understand and fix this too and found what you also found.

If we break the API we'll have to release a v2 right?

Is this what other database libs do to solve this? Pass a context object around?

Collaborator

Pogreb sidesteps this by not exposing a scan/range API and instead handing out an iterator object: https://github.com/akrylysov/pogreb

BoltDB appears to require you to commit write transactions after running them: https://github.com/boltdb/bolt

LevelDB's golang rewrite appears abandoned, but it looks like they also intended to use an iterator: https://github.com/golang/leveldb/blob/master/batch.go
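To illustrate what "hands out an iterator" buys us: with a pull-style API no user code ever runs while the store holds its lock internally. This is purely hypothetical, bitcask exposes nothing like it today:

```go
// Hypothetical iterator shape (not pogreb's API, not bitcask's): the caller
// pulls keys one at a time, and the store only locks briefly inside Next(),
// so calling Get() from the loop body cannot recurse on the lock.
it := db.Iterator()
for it.Next() {
	value, err := db.Get(it.Key())
	if err != nil {
		break
	}
	_ = value
}
if err := it.Err(); err != nil {
	// handle iteration error
}
```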

Yes, a breaking API change would mean v2.

I would recommend maybe we do both? Fix v1 with Try() and then add a context-aware v2? Just so people can continue to use v1.

Poster
Owner

Sounds good. The whole Try() thing isn't too clear to me though. Can we talk about how this would work?

Collaborator

Let's ignore that suggestion (Re: Try()). I've been thinking, and I think I have a better solution that's API-compatible, but does lose a state guarantee.

We can call the lambda from a goroutine for version 1.0 and then use a context in 2.0 to remove the need for a goroutine.

something like this:

```
var wg sync.WaitGroup
b.trie.ForEach(func(node art.Node) bool {
	wg.Add(1)
	go func(key []byte) {
		defer wg.Done()
		userFunc(key) // the caller-supplied lambda
	}(node.Key()) // pass the key in so each goroutine gets its own copy
	return true
})
wg.Wait()
return
```

Forgive the pseudocode, but I think it would work.

Going back in time, the reason we are only seeing the deadlocks now is that there was a missing RLock() in the Fold function, which I submitted a PR to add back. That RLock was necessary since an insertion could happen at the same time as the Fold, which, due to the implementation of the underlying trie, would cause a key to be skipped.

So, if we hold the RLock at the time of the range operation but call each lambda from a goroutine, we guarantee that we call the function over each k/v pair that exists at call time, but we lose the order (read: not important for a KV store), and we have an edge case where a deletion/modification operation runs in parallel with the goroutines, which means a value could change between call time and return time. However, this risk was also present before my initial pull request.

Basically, we can remove the RLock in Fold to return to the initial state (which has an in-flight edit issue, plus a trie-structure range issue), or we can use goroutines and remove the trie-structure range issue. Then upgrade to 2.0, which passes context objects around, to remove the goroutines and fix the in-transit issue.

Poster
Owner

Sorry @taigrr, I briefly looked at what you proposed; it sounds good to me. I think you have the experience and bandwidth here over me 🤗 Thanks for doing this! 🤘

Collaborator

Cool, I'll try to get to it this week for the v1 fix, maybe next week for a v2.

Poster
Owner

No problems mate! Sounds good 👌

Poster
Owner

@taigrr Are you still keen to help solve this?


I am having the same issue and will probably need to implement some workarounds for this issue. Is there any progress with this issue or would I be able to help in any way?

Collaborator

@prologic @Force Sorry about my absence, I've been tied up for a very long time with work. I'm just now getting back into using bitcask again and can finally use that work as an excuse to contribute here again while on the clock. I will take a look at this Tuesday and see what I can do about the v1 fix and the v2 API.


Thanks for your reply :) I was able to fix it by not having any lookups in the scan function, so the issue is not super serious for me. If I can help in any way let me know

Poster
Owner

> Thanks for your reply :) I was able to fix it by not having any lookups in the scan function, so the issue is not super serious for me. If I can help in any way let me know

This is my current work-around as well. If you intend to do any lookups or modifications, collect the keys you want to work with into a slice first, then work on those separately.

It's a bit inconvenient but it works for now.

I'd have fixed this already, honestly, but I believe I'm too stupid to come up with the right API/pattern here to solve this, so I'm looking to @taigrr for advice 😅

Collaborator

I've begun development on v2, and I'm taking a look at v1. v2 work is taking place on my fork on the v2-dev branch. Once everything seems usable, I'll PR.

Poster
Owner

> I've begun development on v2, and I'm taking a look at v1. v2 work is taking place on my fork on the v2-dev branch. Once everything seems usable, I'll PR.

Oh sweet as!

Are we breaking the API for this btw?
(if so, hopefully in a minimal way)

If not, how are you generally designing the v2?
(I'll have a look at your fork a bit later...)

Collaborator

Yes, the reason for the major version change is we need to break the API.

The breakage will be adding a context.Context parameter to every function call.

Using some magic we can assign execution IDs to these context objects which will allow us to completely skip over mutexes within the same call stack, preventing a double write lock. Hopefully, this will make using the library easier for end users.

We will also get the benefit of cancelable db calls, i.e. Scan over a database for up to 3 seconds, then error out if not complete.
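Purely as an illustration of the shape being discussed (these are not the actual v2 signatures):

```go
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()

// Hypothetical v2 calls: every method takes a ctx, so a long Scan can be
// cancelled, and nested calls made with the same ctx could be recognised
// as the same call stack and skip re-acquiring the lock.
err := db.Scan(ctx, []byte("a"), func(key []byte) error {
	_, err := db.Get(ctx, key)
	return err
})
if errors.Is(err, context.DeadlineExceeded) {
	// the scan did not finish within 3 seconds
}
```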

Poster
Owner

> Using some magic we can assign execution IDs to these context objects which will allow us to completely skip over mutexes within the same call stack, preventing a double write lock. Hopefully, this will make using the library easier for end users.

This is really nice! I look forward to this! I'm not actually familiar at all with this type of pattern, so I'm very keen to learn from you here! 🤗

> We will also get the benefit of cancelable db calls, i.e. Scan over a database for up to 3 seconds, then error out if not complete.

Ahh that will be a nice feature 👌


Awesome, thanks a lot! :) :)

Poster
Owner

Hey @taigrr How are you going with your work on this? Do you need any help?

Poster
Owner

Keen to start testing the new API :D

Poster
Owner

@taigrr Can you remind me, why do we even hold a read lock on the trie at all in any of the Search/Scan/Sift/Fold API calls?

This patch seems to fix/solve all deadlocks:

```#!diff
diff --git a/bitcask.go b/bitcask.go
index 6dc206a..facc5da 100644
--- a/bitcask.go
+++ b/bitcask.go
@@ -249,7 +249,6 @@ func (b *Bitcask) delete(key []byte) error {
 func (b *Bitcask) Sift(f func(key []byte) (bool, error)) (err error) {
 	keysToDelete := art.New()
 
-	b.mu.RLock()
 	b.trie.ForEach(func(node art.Node) bool {
 		if b.isExpired(node.Key()) {
 			keysToDelete.Insert(node.Key(), true)
@@ -263,12 +262,9 @@ func (b *Bitcask) Sift(f func(key []byte) (bool, error)) (err error) {
 		}
 		return true
 	})
-	b.mu.RUnlock()
 	if err != nil {
 		return
 	}
-	b.mu.Lock()
-	defer b.mu.Unlock()
 	keysToDelete.ForEach(func(node art.Node) (cont bool) {
 		b.delete(node.Key())
 		return true
@@ -300,9 +296,6 @@ func (b *Bitcask) DeleteAll() (err error) {
 // the function `f` with the keys found. If the function returns an error
 // no further keys are processed and the first error is returned.
 func (b *Bitcask) Scan(prefix []byte, f func(key []byte) error) (err error) {
-	b.mu.RLock()
-	defer b.mu.RUnlock()
-
 	b.trie.ForEachPrefix(prefix, func(node art.Node) bool {
 		// Skip the root node
 		if len(node.Key()) == 0 {
@@ -325,7 +318,6 @@ func (b *Bitcask) Scan(prefix []byte, f func(key []byte) error) (err error) {
 func (b *Bitcask) SiftScan(prefix []byte, f func(key []byte) (bool, error)) (err error) {
 	keysToDelete := art.New()
 
-	b.mu.RLock()
 	b.trie.ForEachPrefix(prefix, func(node art.Node) bool {
 		// Skip the root node
 		if len(node.Key()) == 0 {
@@ -343,14 +335,10 @@ func (b *Bitcask) SiftScan(prefix []byte, f func(key []byte) (bool, error)) (err
 		}
 		return true
 	})
-	b.mu.RUnlock()
-
 	if err != nil {
 		return
 	}
 
-	b.mu.Lock()
-	defer b.mu.Unlock()
 	keysToDelete.ForEach(func(node art.Node) (cont bool) {
 		b.delete(node.Key())
 		return true
@@ -372,9 +360,6 @@ func (b *Bitcask) Range(start, end []byte, f func(key []byte) error) (err error)
 		return ErrInvalidRange
 	}
 
-	b.mu.RLock()
-	defer b.mu.RUnlock()
-
 	b.trie.ForEachPrefix(commonPrefix, func(node art.Node) bool {
 		if bytes.Compare(node.Key(), start) >= 0 && bytes.Compare(node.Key(), end) <= 0 {
 			if err = f(node.Key()); err != nil {
@@ -407,7 +392,6 @@ func (b *Bitcask) SiftRange(start, end []byte, f func(key []byte) (bool, error))
 
 	keysToDelete := art.New()
 
-	b.mu.RLock()
 	b.trie.ForEachPrefix(commonPrefix, func(node art.Node) bool {
 		if bytes.Compare(node.Key(), start) >= 0 && bytes.Compare(node.Key(), end) <= 0 {
 			if b.isExpired(node.Key()) {
@@ -426,15 +410,10 @@ func (b *Bitcask) SiftRange(start, end []byte, f func(key []byte) (bool, error))
 		}
 		return true
 	})
-	b.mu.RUnlock()
-
 	if err != nil {
 		return
 	}
 
-	b.mu.Lock()
-	defer b.mu.Unlock()
-
 	keysToDelete.ForEach(func(node art.Node) (cont bool) {
 		b.delete(node.Key())
 		return true
@@ -454,9 +433,6 @@ func (b *Bitcask) Len() int {
 func (b *Bitcask) Keys() chan []byte {
 	ch := make(chan []byte)
 	go func() {
-		b.mu.RLock()
-		defer b.mu.RUnlock()
-
 		for it := b.trie.Iterator(); it.HasNext(); {
 			node, _ := it.Next()
 			if b.isExpired(node.Key()) {
@@ -504,9 +480,6 @@ func (b *Bitcask) runGC() (err error) {
 // each key. If the function returns an error, no further keys are processed
 // and the error is returned.
 func (b *Bitcask) Fold(f func(key []byte) error) (err error) {
-	b.mu.RLock()
-	defer b.mu.RUnlock()
-
 	b.trie.ForEach(func(node art.Node) bool {
 		if err = f(node.Key()); err != nil {
 			return false
```
Poster
Owner

I noticed that the ART library we use upstream actually copies the root of the tree into the local scope of the search functions we use, so I'm not entirely sure anymore whether we need to hold a lock at all? 🤔
