Skip to content

Conversation

@jonastheis
Copy link
Contributor

@jonastheis jonastheis commented Aug 22, 2025

Purpose or design rationale of this PR

This PR limits the decoder concurrency of zstd.NewReader to 1.

zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))

This is to prevent a deadlock somewhere downstream in the library that seems to occur only with a decoder concurrency of exactly 2: zstd.WithDecoderConcurrency(1).

PR title

Your PR title must follow conventional commits (as we are doing squash merge for each PR), so it must start with one of the following types:

  • build: Changes that affect the build system or external dependencies (example scopes: yarn, eslint, typescript)
  • ci: Changes to our CI configuration files and scripts (example scopes: vercel, github, cypress)
  • docs: Documentation-only changes
  • feat: A new feature
  • fix: A bug fix
  • perf: A code change that improves performance
  • refactor: A code change that doesn't fix a bug, or add a feature, or improves performance
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
  • test: Adding missing tests or correcting existing tests

Breaking change label

Does this PR have the breaking-change label?

  • No, this PR is not a breaking change
  • Yes

Summary by CodeRabbit

  • Bug Fixes

    • Reduced decompression concurrency to improve reliability and stability during heavy decoding workloads.
  • Tests

    • Added a high-load decompression stress test (skipped by default) with profiling enabled to validate robustness and detect potential deadlocks.

@coderabbitai
Copy link

coderabbitai bot commented Aug 22, 2025

Walkthrough

Adds a skipped stress test that repeatedly decompresses compressed random data while running a pprof server; and constrains zstd decoder concurrency to 1 in two V7 decompression sites. No exported APIs changed.

Changes

Cohort / File(s) Summary
Test: deadlock stress & profiling
encoding/codecv7_test.go
Adds TestDecodeAllDeadlock which imports crand (crypto/rand), log, net/http, and _ "net/http/pprof", starts a pprof HTTP server, generates random bytes, compresses with the V8 DACodec, and repeatedly decompresses the blob to surface a zstd deadlock; test is skipped by default.
Decoder configuration update (V7 bytes)
encoding/codecv7_types.go
In decompressV7Bytes, constructs the zstd decoder with zstd.WithDecoderConcurrency(1) to limit decoder concurrency; other logic unchanged.
Decoder configuration update (scroll blobs)
encoding/da.go
In decompressScrollBlobToBatch, creates the zstd reader with zstd.WithDecoderConcurrency(1) instead of the previous default; rest of function unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant T as TestDecodeAllDeadlock
  participant P as pprof HTTP Server
  participant C as V8 DACodec (Compress)
  participant D as Decompressor (call sites)
  participant Z as zstd Decoder (concurrency=1)

  Note over T: generate random bytes (crypto/rand)
  T->>P: start goroutine: http.ListenAndServe (pprof)
  T->>C: Compress(random bytes)
  C-->>T: compressed blob

  loop repeat many times
    T->>D: Decompress(compressed blob)
    D->>Z: DecodeAll / Read (with concurrency=1)
    Z-->>D: decompressed bytes
    D-->>T: return bytes
    T->>T: assert equality with original
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I thump my paw at threads that balk,
I stir the bytes, I check the clock.
One decoder stride, no race to chase,
Pprof hums softly—steady pace.
I nibble bugs and hop with grace. 🥕

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/zstd-deadlock-issue

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
encoding/codecv7_types.go (1)

485-489: Concurrency=1 hotfix is reasonable; add rationale + make it test-configurable

Forcing decoder concurrency to 1 is a pragmatic mitigation for the upstream deadlock. To aid future maintainers and enable targeted reproduction in tests, consider:

  • Adding a short comment with a link/reference to the upstream issue explaining why we force 1.
  • Making the concurrency value a package-level variable so tests can temporarily override it (e.g., to set 2 and reproduce).

Apply this diff to use a configurable variable at the call site:

- zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+ zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(v7DecoderConcurrency))

Add this near the top of the file (right after imports):

// v7DecoderConcurrency defaults to 1 to avoid a rare deadlock observed in the zstd library when concurrency == 2.
// Tests can temporarily override this to reproduce/report upstream issues.
var v7DecoderConcurrency = 1
encoding/codecv7_test.go (1)

24-49: Test won’t reproduce deadlock now that runtime forces concurrency=1; optionally allow per-test override

Since decompressV7Bytes now forces concurrency=1, this test cannot reproduce the issue that happens at exactly 2. If you implement the suggested v7DecoderConcurrency variable, you can enable reproduction here:

Add inside the test (right after t.Skip is removed when you want to run it):

old := v7DecoderConcurrency
v7DecoderConcurrency = 2
t.Cleanup(func() { v7DecoderConcurrency = old })

This makes the test meaningful for upstream reporting while keeping production safe.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2cfec8c and 6a97fc7.

📒 Files selected for processing (2)
  • encoding/codecv7_test.go (2 hunks)
  • encoding/codecv7_types.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-18T03:40:09.800Z
Learnt from: colinlyguo
PR: scroll-tech/da-codec#25
File: encoding/codecv1_types.go:105-116
Timestamp: 2024-10-18T03:40:09.800Z
Learning: The code in `encoding/codecv1_types.go`, specifically the `Encode` method in `daBatchV1`, has been updated. Previous comments regarding hardcoded byte offsets may be outdated.

Applied to files:

  • encoding/codecv7_types.go
  • encoding/codecv7_test.go
🧬 Code graph analysis (1)
encoding/codecv7_test.go (1)
encoding/codecv8.go (1)
  • NewDACodecV8 (34-41)
🪛 GitHub Check: check
encoding/codecv7_test.go

[failure] 35-35:
SA1019: rand.Read has been deprecated since Go 1.20 because it shouldn't be used: For almost all use cases, crypto/rand.Read is more appropriate. (staticcheck)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests

yiweichi
yiweichi previously approved these changes Aug 22, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
encoding/codecv7_test.go (2)

34-38: Good switch to crypto/rand.Read; resolves SA1019.

Using crypto/rand.Read for randomBytes fixes the staticcheck deprecation (SA1019) and is the right choice here. Error handling via require.NoError is also correct.


3-13: Remove duplicate import of net/http (keeps pprof side-effect import intact).

You’re importing net/http twice: once normally and once as a blank import. Go disallows importing the same package twice; this will fail to compile. Keep the normal import and the blank import for net/http/pprof only.

Apply this diff to the import block:

 import (
 	crand "crypto/rand"
 	"encoding/hex"
 	"encoding/json"
 	"fmt"
 	"log"
 	"math/big"
 	"math/rand"
 	"net/http"
-	_ "net/http"
 	_ "net/http/pprof"
 	"strings"
 	"testing"
 )
🧹 Nitpick comments (1)
encoding/codecv7_test.go (1)

27-33: Optionally gate the pprof server behind an env var to avoid accidental port conflicts when un-skipping.

If this stress test is ever enabled locally/CI, listening on a fixed port can clash. Consider starting the pprof server only when explicitly requested via an env var.

Apply this diff:

 func TestDecodeAllDeadlock(t *testing.T) {
-	t.Skip("Skip test that triggers deadlock in zstd library")
+	t.Skip("Skip test that triggers deadlock in zstd library")
 
-	go func() {
-		log.Println(http.ListenAndServe("localhost:6060", nil))
-	}()
+	if os.Getenv("DA_CODEC_RUN_PPROF") == "1" {
+		go func() {
+			log.Println(http.ListenAndServe("localhost:6060", nil))
+		}()
+	}

Additionally, add the import:

 import (
+	"os"
 )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6a97fc7 and d801a9c.

📒 Files selected for processing (1)
  • encoding/codecv7_test.go (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-18T03:40:09.800Z
Learnt from: colinlyguo
PR: scroll-tech/da-codec#25
File: encoding/codecv1_types.go:105-116
Timestamp: 2024-10-18T03:40:09.800Z
Learning: The code in `encoding/codecv1_types.go`, specifically the `Encode` method in `daBatchV1`, has been updated. Previous comments regarding hardcoded byte offsets may be outdated.

Applied to files:

  • encoding/codecv7_test.go
🧬 Code graph analysis (1)
encoding/codecv7_test.go (2)
encoding/codecv8.go (1)
  • NewDACodecV8 (34-41)
encoding/interfaces.go (1)
  • CompressScrollBatchBytes (148-151)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests

colinlyguo
colinlyguo previously approved these changes Aug 22, 2025
@jonastheis jonastheis dismissed stale reviews from colinlyguo and yiweichi via fd86a7c August 22, 2025 09:54
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
encoding/da.go (2)

555-556: Document the rationale inline and avoid future drift across callsites.

Add a short note explaining why concurrency is pinned to 1 so future refactors don’t “optimize” it away, and consider centralizing zstd reader construction in a helper to keep behavior consistent across packages.

Apply this minimal inline comment to capture context:

-	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+	// NOTE: Restrict decoder concurrency to 1 to avoid an upstream deadlock when concurrency==2 (see PR #61).
+	// TODO: Revisit once upstream is patched and consider restoring auto-concurrency.
+	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))

Optionally, introduce a small helper to keep options consistent (applies outside the selected lines):

// package-level default to make it easy to tweak/feature-flag in one place.
var zstdDecoderConcurrency = 1

func newZstdDecoder(r io.Reader) (*zstd.Decoder, error) {
	return zstd.NewReader(r, zstd.WithDecoderConcurrency(zstdDecoderConcurrency))
}

Then replace the inlined constructor here with:

-	zr, err := zstd.NewReader(r, zstd.WithDecoderConcurrency(1))
+	zr, err := newZstdDecoder(r)

561-569: Don’t silently swallow non-EOF decode errors; return them to avoid truncated data.

The loop currently breaks on any error and returns whatever was read so far. If the input is malformed or the decoder returns a real error, we’ll surface partial data without signaling failure. Recommend breaking only on EOF/UnexpectedEOF and returning other errors.

Apply this diff to handle errors precisely:

-	for {
-		i, err := zr.Read(batchOfBytes)
-		res = append(res, batchOfBytes[:i]...) // append already decoded bytes even if we meet error
-		// the error here is supposed to be EOF or similar that indicates that buffer has been read until the end
-		// we should return all data that read by this moment
-		if i < readBatchSize || err != nil {
-			break
-		}
-	}
+	for {
+		i, err := zr.Read(batchOfBytes)
+		if i > 0 {
+			res = append(res, batchOfBytes[:i]...)
+		}
+		if err != nil {
+			// Only treat stream termination as success; surface other errors.
+			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
+				break
+			}
+			return nil, fmt.Errorf("zstd decompress read: %w", err)
+		}
+		if i < readBatchSize {
+			break
+		}
+	}

Additionally, add the missing import (outside the selected lines):

import "io"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d801a9c and fd86a7c.

📒 Files selected for processing (2)
  • encoding/codecv7_test.go (2 hunks)
  • encoding/da.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • encoding/codecv7_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-18T03:40:09.800Z
Learnt from: colinlyguo
PR: scroll-tech/da-codec#25
File: encoding/codecv1_types.go:105-116
Timestamp: 2024-10-18T03:40:09.800Z
Learning: The code in `encoding/codecv1_types.go`, specifically the `Encode` method in `daBatchV1`, has been updated. Previous comments regarding hardcoded byte offsets may be outdated.

Applied to files:

  • encoding/da.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests
🔇 Additional comments (2)
encoding/da.go (2)

555-555: Solid fix: forcing zstd decoder concurrency to 1 mitigates the known deadlock at exactly 2.

Constraining the decoder with zstd.WithDecoderConcurrency(1) is a pragmatic, low-risk workaround. Close is deferred correctly.


548-574: All zstd.NewReader calls now pin decoder concurrency to 1. Verified across the repository—no instances remain without WithDecoderConcurrency(1). Great work!

@yiweichi yiweichi merged commit 0f1ad36 into main Aug 22, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants