fix(zstd decompression): limit concurrency to 1 to prevent deadlock in zstd library #61
Walkthrough

Adds a skipped stress test that repeatedly decompresses compressed random data while running a pprof server, and constrains zstd decoder concurrency to 1 in two V7 decompression sites. No exported APIs changed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant T as TestDecodeAllDeadlock
    participant P as pprof HTTP Server
    participant C as V8 DACodec (Compress)
    participant D as Decompressor (call sites)
    participant Z as zstd Decoder (concurrency=1)
    Note over T: generate random bytes (crypto/rand)
    T->>P: start goroutine: http.ListenAndServe (pprof)
    T->>C: Compress(random bytes)
    C-->>T: compressed blob
    loop repeat many times
        T->>D: Decompress(compressed blob)
        D->>Z: DecodeAll / Read (with concurrency=1)
        Z-->>D: decompressed bytes
        D-->>T: return bytes
        T->>T: assert equality with original
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
🧹 Nitpick comments (2)
encoding/codecv7_types.go (1)
485-489: Concurrency=1 hotfix is reasonable; add rationale and make it test-configurable

Forcing decoder concurrency to 1 is a pragmatic mitigation for the upstream deadlock. To aid future maintainers and enable targeted reproduction in tests, consider:
- Adding a short comment with a link/reference to the upstream issue explaining why we force 1.
- Making the concurrency value a package-level variable so tests can temporarily override it (e.g., to set 2 and reproduce).
Apply this diff to use a configurable variable at the call site:
```diff
- zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+ zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(v7DecoderConcurrency))
```

Add this near the top of the file (right after imports):
```go
// v7DecoderConcurrency defaults to 1 to avoid a rare deadlock observed in the
// zstd library when concurrency == 2. Tests can temporarily override this to
// reproduce/report upstream issues.
var v7DecoderConcurrency = 1
```

encoding/codecv7_test.go (1)
24-49: Test won’t reproduce deadlock now that runtime forces concurrency=1; optionally allow per-test override

Since decompressV7Bytes now forces concurrency=1, this test cannot reproduce the issue that happens at exactly 2. If you implement the suggested v7DecoderConcurrency variable, you can enable reproduction here:
Add inside the test (right after t.Skip is removed when you want to run it):
```go
old := v7DecoderConcurrency
v7DecoderConcurrency = 2
t.Cleanup(func() { v7DecoderConcurrency = old })
```

This makes the test meaningful for upstream reporting while keeping production safe.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- encoding/codecv7_test.go (2 hunks)
- encoding/codecv7_types.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-18T03:40:09.800Z
Learnt from: colinlyguo
PR: scroll-tech/da-codec#25
File: encoding/codecv1_types.go:105-116
Timestamp: 2024-10-18T03:40:09.800Z
Learning: The code in `encoding/codecv1_types.go`, specifically the `Encode` method in `daBatchV1`, has been updated. Previous comments regarding hardcoded byte offsets may be outdated.
Applied to files:
- encoding/codecv7_types.go
- encoding/codecv7_test.go
🧬 Code graph analysis (1)
encoding/codecv7_test.go (1)
encoding/codecv8.go (1)
NewDACodecV8(34-41)
🪛 GitHub Check: check
encoding/codecv7_test.go
[failure] 35-35:
SA1019: rand.Read has been deprecated since Go 1.20 because it shouldn't be used: For almost all use cases, crypto/rand.Read is more appropriate. (staticcheck)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
encoding/codecv7_test.go (2)
34-38: Good switch to crypto/rand.Read; resolves SA1019.

Using crypto/rand.Read for randomBytes fixes the staticcheck deprecation (SA1019) and is the right choice here. Error handling via require.NoError is also correct.
3-13: Remove duplicate import of net/http (keeps pprof side-effect import intact).

You’re importing net/http twice: once normally and once as a blank import. The blank import is redundant since the package is already imported normally. Keep the normal import, and keep the blank import for net/http/pprof only.
Apply this diff to the import block:
```diff
 import (
 	crand "crypto/rand"
 	"encoding/hex"
 	"encoding/json"
 	"fmt"
 	"log"
 	"math/big"
 	"math/rand"
 	"net/http"
-	_ "net/http"
 	_ "net/http/pprof"
 	"strings"
 	"testing"
 )
```
🧹 Nitpick comments (1)
encoding/codecv7_test.go (1)
27-33: Optionally gate the pprof server behind an env var to avoid accidental port conflicts when un-skipping.

If this stress test is ever enabled locally/CI, listening on a fixed port can clash. Consider starting the pprof server only when explicitly requested via an env var.
Apply this diff:
```diff
 func TestDecodeAllDeadlock(t *testing.T) {
 	t.Skip("Skip test that triggers deadlock in zstd library")

-	go func() {
-		log.Println(http.ListenAndServe("localhost:6060", nil))
-	}()
+	if os.Getenv("DA_CODEC_RUN_PPROF") == "1" {
+		go func() {
+			log.Println(http.ListenAndServe("localhost:6060", nil))
+		}()
+	}
```
Additionally, add the import:

```diff
 import (
+	"os"
 )
```
📒 Files selected for processing (1)
- encoding/codecv7_test.go (2 hunks)
🧬 Code graph analysis (1)
encoding/codecv7_test.go (2)
encoding/codecv8.go (1)
NewDACodecV8(34-41)
encoding/interfaces.go (1)
CompressScrollBatchBytes(148-151)
Actionable comments posted: 0
🧹 Nitpick comments (2)
encoding/da.go (2)
555-556: Document the rationale inline and avoid future drift across call sites.

Add a short note explaining why concurrency is pinned to 1 so future refactors don’t “optimize” it away, and consider centralizing zstd reader construction in a helper to keep behavior consistent across packages.
Apply this minimal inline comment to capture context:
```diff
-	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+	// NOTE: Restrict decoder concurrency to 1 to avoid an upstream deadlock when concurrency == 2 (see PR #61).
+	// TODO: Revisit once upstream is patched and consider restoring auto-concurrency.
+	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
```

Optionally, introduce a small helper to keep options consistent (applies outside the selected lines):
```go
// package-level default to make it easy to tweak/feature-flag in one place.
var zstdDecoderConcurrency = 1

func newZstdDecoder(r io.Reader) (*zstd.Decoder, error) {
	return zstd.NewReader(r, zstd.WithDecoderConcurrency(zstdDecoderConcurrency))
}
```

Then replace the inlined constructor here with:
```diff
-	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+	zr, err := newZstdDecoder(r)
```
561-569: Don’t silently swallow non-EOF decode errors; return them to avoid truncated data.

The loop currently breaks on any error and returns whatever was read so far. If the input is malformed or the decoder returns a real error, we’ll surface partial data without signaling failure. Recommend breaking only on EOF/UnexpectedEOF and returning other errors.
Apply this diff to handle errors precisely:
```diff
-	for {
-		i, err := zr.Read(batchOfBytes)
-		res = append(res, batchOfBytes[:i]...) // append already decoded bytes even if we meet error
-		// the error here is supposed to be EOF or similar that indicates that buffer has been read until the end
-		// we should return all data that read by this moment
-		if i < readBatchSize || err != nil {
-			break
-		}
-	}
+	for {
+		i, err := zr.Read(batchOfBytes)
+		if i > 0 {
+			res = append(res, batchOfBytes[:i]...)
+		}
+		if err != nil {
+			// Only treat stream termination as success; surface other errors.
+			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
+				break
+			}
+			return nil, fmt.Errorf("zstd decompress read: %w", err)
+		}
+		if i < readBatchSize {
+			break
+		}
+	}
```
Additionally, add the missing import (outside the selected lines):

```go
import "io"
```
📒 Files selected for processing (2)
- encoding/codecv7_test.go (2 hunks)
- encoding/da.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- encoding/codecv7_test.go
🔇 Additional comments (2)
encoding/da.go (2)
555-555: Solid fix: forcing zstd decoder concurrency to 1 mitigates the known deadlock at exactly 2.

Constraining the decoder with zstd.WithDecoderConcurrency(1) is a pragmatic, low-risk workaround. Close is deferred correctly.
548-574: All zstd.NewReader calls now pin decoder concurrency to 1. Verified across the repository; no instances remain without WithDecoderConcurrency(1). Great work!
Purpose or design rationale of this PR
This PR limits the decoder concurrency of zstd.NewReader to 1 (da-codec/encoding/codecv7_types.go, line 485 in 3521fcd). This is to prevent a deadlock somewhere downstream in the library that seems to occur only with a decoder concurrency of exactly 2: zstd.WithDecoderConcurrency(1).