fix(zstd decompression): limit concurrency to 1 to prevent deadlock in zstd library #61
Walkthrough

Adds a skipped stress test that repeatedly decompresses compressed random data while running a pprof server, and constrains zstd decoder concurrency to 1 in two V7 decompression sites. No exported APIs changed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant T as TestDecodeAllDeadlock
    participant P as pprof HTTP Server
    participant C as V8 DACodec (Compress)
    participant D as Decompressor (call sites)
    participant Z as zstd Decoder (concurrency=1)
    Note over T: generate random bytes (crypto/rand)
    T->>P: start goroutine: http.ListenAndServe (pprof)
    T->>C: Compress(random bytes)
    C-->>T: compressed blob
    loop repeat many times
        T->>D: Decompress(compressed blob)
        D->>Z: DecodeAll / Read (with concurrency=1)
        Z-->>D: decompressed bytes
        D-->>T: return bytes
        T->>T: assert equality with original
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
🧹 Nitpick comments (2)
encoding/codecv7_types.go (1)
485-489: Concurrency=1 hotfix is reasonable; add rationale and make it test-configurable

Forcing decoder concurrency to 1 is a pragmatic mitigation for the upstream deadlock. To aid future maintainers and enable targeted reproduction in tests, consider:
- Adding a short comment with a link/reference to the upstream issue explaining why we force 1.
- Making the concurrency value a package-level variable so tests can temporarily override it (e.g., to set 2 and reproduce).
Apply this diff to use a configurable variable at the call site:
```diff
- zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+ zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(v7DecoderConcurrency))
```

Add this near the top of the file (right after imports):
```go
// v7DecoderConcurrency defaults to 1 to avoid a rare deadlock observed in the
// zstd library when concurrency == 2. Tests can temporarily override this to
// reproduce/report upstream issues.
var v7DecoderConcurrency = 1
```

encoding/codecv7_test.go (1)
24-49: Test won’t reproduce deadlock now that runtime forces concurrency=1; optionally allow per-test override

Since decompressV7Bytes now forces concurrency=1, this test cannot reproduce the issue that happens at exactly 2. If you implement the suggested v7DecoderConcurrency variable, you can enable reproduction here:
Add inside the test (right after t.Skip is removed when you want to run it):
```go
old := v7DecoderConcurrency
v7DecoderConcurrency = 2
t.Cleanup(func() { v7DecoderConcurrency = old })
```

This makes the test meaningful for upstream reporting while keeping production safe.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- encoding/codecv7_test.go (2 hunks)
- encoding/codecv7_types.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-18T03:40:09.800Z
Learnt from: colinlyguo
PR: scroll-tech/da-codec#25
File: encoding/codecv1_types.go:105-116
Timestamp: 2024-10-18T03:40:09.800Z
Learning: The code in `encoding/codecv1_types.go`, specifically the `Encode` method in `daBatchV1`, has been updated. Previous comments regarding hardcoded byte offsets may be outdated.
Applied to files:
- encoding/codecv7_types.go
- encoding/codecv7_test.go
🧬 Code graph analysis (1)
encoding/codecv7_test.go (1)
encoding/codecv8.go (1)
NewDACodecV8(34-41)
🪛 GitHub Check: check
encoding/codecv7_test.go
[failure] 35-35:
SA1019: rand.Read has been deprecated since Go 1.20 because it shouldn't be used: For almost all use cases, crypto/rand.Read is more appropriate. (staticcheck)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
encoding/codecv7_test.go (2)
34-38: Good switch to crypto/rand.Read; resolves SA1019.

Using crypto/rand.Read for randomBytes fixes the staticcheck deprecation (SA1019) and is the right choice here. Error handling via require.NoError is also correct.
3-13: Remove duplicate import of net/http (keeps pprof side-effect import intact).

You’re importing net/http twice: once normally and once as a blank import. The blank import is redundant since the package is already imported normally. Keep the normal import, and keep the blank import for net/http/pprof only.
Apply this diff to the import block:
```diff
 import (
 	crand "crypto/rand"
 	"encoding/hex"
 	"encoding/json"
 	"fmt"
 	"log"
 	"math/big"
 	"math/rand"
 	"net/http"
-	_ "net/http"
 	_ "net/http/pprof"
 	"strings"
 	"testing"
 )
```
🧹 Nitpick comments (1)
encoding/codecv7_test.go (1)
27-33: Optionally gate the pprof server behind an env var to avoid accidental port conflicts when un-skipping.

If this stress test is ever enabled locally/CI, listening on a fixed port can clash. Consider starting the pprof server only when explicitly requested via an env var.
Apply this diff:
```diff
 func TestDecodeAllDeadlock(t *testing.T) {
 	t.Skip("Skip test that triggers deadlock in zstd library")

-	go func() {
-		log.Println(http.ListenAndServe("localhost:6060", nil))
-	}()
+	if os.Getenv("DA_CODEC_RUN_PPROF") == "1" {
+		go func() {
+			log.Println(http.ListenAndServe("localhost:6060", nil))
+		}()
+	}
```
Additionally, add the import:

```diff
 import (
+	"os"
 )
```
📒 Files selected for processing (1)
- encoding/codecv7_test.go (2 hunks)
🧬 Code graph analysis (1)
encoding/codecv7_test.go (2)
encoding/codecv8.go (1)
NewDACodecV8(34-41)
encoding/interfaces.go (1)
CompressScrollBatchBytes(148-151)
Actionable comments posted: 0
🧹 Nitpick comments (2)
encoding/da.go (2)
555-556: Document the rationale inline and avoid future drift across call sites.

Add a short note explaining why concurrency is pinned to 1 so future refactors don’t “optimize” it away, and consider centralizing zstd reader construction in a helper to keep behavior consistent across packages.
Apply this minimal inline comment to capture context:
```diff
-	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+	// NOTE: Restrict decoder concurrency to 1 to avoid an upstream deadlock when concurrency == 2 (see PR #61).
+	// TODO: Revisit once upstream is patched and consider restoring auto-concurrency.
+	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
```

Optionally, introduce a small helper to keep options consistent (applies outside the selected lines):
```go
// package-level default to make it easy to tweak/feature-flag in one place.
var zstdDecoderConcurrency = 1

func newZstdDecoder(r io.Reader) (*zstd.Decoder, error) {
	return zstd.NewReader(r, zstd.WithDecoderConcurrency(zstdDecoderConcurrency))
}
```

Then replace the inlined constructor here with:
```diff
-	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+	zr, err := newZstdDecoder(r)
```
561-569: Don’t silently swallow non-EOF decode errors; return them to avoid truncated data.

The loop currently breaks on any error and returns whatever was read so far. If the input is malformed or the decoder returns a real error, we’ll surface partial data without signaling failure. Recommend breaking only on EOF/UnexpectedEOF and returning other errors.
Apply this diff to handle errors precisely:
```diff
-	for {
-		i, err := zr.Read(batchOfBytes)
-		res = append(res, batchOfBytes[:i]...) // append already decoded bytes even if we meet error
-		// the error here is supposed to be EOF or similar that indicates that buffer has been read until the end
-		// we should return all data that read by this moment
-		if i < readBatchSize || err != nil {
-			break
-		}
-	}
+	for {
+		i, err := zr.Read(batchOfBytes)
+		if i > 0 {
+			res = append(res, batchOfBytes[:i]...)
+		}
+		if err != nil {
+			// Only treat stream termination as success; surface other errors.
+			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
+				break
+			}
+			return nil, fmt.Errorf("zstd decompress read: %w", err)
+		}
+		if i < readBatchSize {
+			break
+		}
+	}
```
Additionally, add the missing import (outside the selected lines):

```go
import "io"
```
📒 Files selected for processing (2)
- encoding/codecv7_test.go (2 hunks)
- encoding/da.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- encoding/codecv7_test.go
🔇 Additional comments (2)
encoding/da.go (2)
555-555: Solid fix: forcing zstd decoder concurrency to 1 mitigates the known deadlock at exactly 2.

Constraining the decoder with zstd.WithDecoderConcurrency(1) is a pragmatic, low-risk workaround. Close is deferred correctly.
548-574: All zstd.NewReader calls now pin decoder concurrency to 1. Verified across the repository; no instances remain without WithDecoderConcurrency(1). Great work!
Purpose or design rationale of this PR
This PR limits the decoder concurrency of zstd.NewReader to 1 (da-codec/encoding/codecv7_types.go, line 485 in 3521fcd). This is to prevent a deadlock somewhere downstream in the library that seems to occur only with a decoder concurrency of exactly 2: zstd.WithDecoderConcurrency(1).