-
Notifications
You must be signed in to change notification settings - Fork 819
fix(traceloop-sdk): Improve JSON serialization by sanitizing unpicklable dataclass fields #3432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughA sanitization layer is introduced to the decorator base module to convert potentially unpicklable objects into JSON-friendly representations before serialization. The new helper function processes span inputs and outputs recursively, handling primitives, containers, dataclasses, and fallback string conversions. Exception handling in the output handler is expanded to catch both TypeError and ValueError. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed everything up to be36de0 in 1 minute and 39 seconds. Click for details.
- Reviewed
102lines of code in1files - Skipped
0files when reviewing. - Skipped posting
4draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/decorators/base.py:205
- Draft comment:
Using copy.copy/deepcopy on each dataclass field may incur performance overhead for large or complex objects. Consider optimizing or caching results if performance becomes an issue. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
2. packages/traceloop-sdk/traceloop/sdk/decorators/base.py:175
- Draft comment:
Consider adding type annotations (e.g. obj: Any -> Any) in _sanitize_for_serialization for improved clarity and static type checking. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
3. packages/traceloop-sdk/traceloop/sdk/decorators/base.py:243
- Draft comment:
Good improvement: sanitizing arguments before JSON serialization helps avoid errors from unpicklable objects. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. packages/traceloop-sdk/traceloop/sdk/decorators/base.py:269
- Draft comment:
Catching ValueError in addition to TypeError in _handle_span_output is a useful enhancement to capture more serialization errors. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_jcTtHgSoDJLwD5j8
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
| return span, ctx, ctx_token | ||
|
|
||
|
|
||
| def _sanitize_for_serialization(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider handling circular references in _sanitize_for_serialization to prevent infinite recursion when processing cyclic data structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/traceloop-sdk/traceloop/sdk/decorators/base.py (1)
175-236: Reconsider the approach: "picklability" ≠ JSON serializability.The function uses
copy.deepcopy()to test whether objects are "picklable," but the actual goal is JSON serialization. These are fundamentally different concerns:
- An object can be picklable but not JSON-serializable (e.g., custom classes)
- An object can be JSON-serializable but not picklable
- Testing with
deepcopy()is expensive and conceptually misalignedAdditionally:
- Circular references are not handled, which can cause infinite recursion (as noted in the past review).
- Redundancy with
JSONEncoder: The code already passescls=JSONEncodertojson.dumps(). This sanitization layer may be unnecessary, or the logic should be integrated into the custom encoder instead.- Inefficiency: Calling
copy.copy()(line 208) andcopy.deepcopy()(lines 215, 230) just to test copyability is computationally expensive.- Local imports:
dataclassesandcopy(lines 180-181) should be imported at the module level.- Incomplete type coverage: Common types like
set,frozenset,bytes, etc., are not handled.Consider one of these approaches:
Option 1 (Preferred): Extend the existing
JSONEncoderto handle unpicklable objects directly duringjson.dumps():# In utils/json_encoder.py or similar class JSONEncoder(json.JSONEncoder): def default(self, obj): if dataclasses.is_dataclass(obj): try: return dataclasses.asdict(obj) except (TypeError, ValueError): return f"<non-serializable: {type(obj).__name__}>" # Handle other types... try: return super().default(obj) except TypeError: return f"<non-serializable: {type(obj).__name__}>"Option 2: If pre-sanitization is required, test JSON serializability directly instead of using deepcopy:
def _sanitize_for_serialization(obj, _seen=None): """Recursively sanitize objects for JSON serialization.""" # Handle circular references if _seen is None: _seen = set() obj_id = id(obj) if obj_id in _seen: return f"<circular reference: {type(obj).__name__}>" # Primitives pass through if isinstance(obj, (type(None), str, int, float, bool)): return obj # Track this object _seen.add(obj_id) try: if isinstance(obj, (list, tuple)): result = type(obj)(_sanitize_for_serialization(item, _seen) for item in obj) elif isinstance(obj, dict): result = {key: _sanitize_for_serialization(value, _seen) for key, value in obj.items()} elif dataclasses.is_dataclass(obj): result = { field.name: _sanitize_for_serialization(getattr(obj, field.name), _seen) for field in dataclasses.fields(obj) } else: # Test actual JSON serializability json.dumps(obj) result = obj except (TypeError, ValueError): result = f"<non-serializable: {type(obj).__name__}>" finally: _seen.discard(obj_id) return resultMove
dataclassesimport to the top of the file.
🧹 Nitpick comments (1)
packages/traceloop-sdk/traceloop/sdk/decorators/base.py (1)
269-269: Verify and document the addition ofValueErrorto exception handling.The exception clause was expanded to catch
ValueErrorin addition toTypeError. While this may be necessary, it should be verified and documented:
- What specific scenarios raise
ValueErrorduring serialization or sanitization?- Is this a common case or an edge case?
Consider adding a comment explaining when
ValueErroris expected.If you can provide specific examples of when
ValueErroris raised, add a clarifying comment:- except (TypeError, ValueError) as e: + except (TypeError, ValueError) as e: # ValueError can occur during dataclass field access Telemetry().log_exception(e)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/traceloop-sdk/traceloop/sdk/decorators/base.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/traceloop-sdk/traceloop/sdk/decorators/base.py
🧬 Code graph analysis (1)
packages/traceloop-sdk/traceloop/sdk/decorators/base.py (2)
packages/opentelemetry-instrumentation-mcp/opentelemetry/instrumentation/mcp/fastmcp_instrumentation.py (2)
_should_send_prompts(156-160)_truncate_json_if_needed(166-176)packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-245)
🪛 Ruff (0.14.2)
packages/traceloop-sdk/traceloop/sdk/decorators/base.py
224-224: Consider moving this statement to an else block
(TRY300)
232-232: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (1)
packages/traceloop-sdk/traceloop/sdk/decorators/base.py (1)
242-247: ****The sanitization and encoder layers are not redundant—they serve complementary, necessary purposes:
_sanitize_for_serialization(pre-serialization): Tests picklability viacopy.deepcopy()and replaces unpicklable objects with string placeholders beforejson.dumps()is called. This prevents TypeError exceptions from truly unpicklable types (locks, sockets, etc.).
JSONEncoder(during serialization): Converts custom picklable types (dataclasses, objects withto_json(), etc.) to JSON-compatible forms. This runs duringjson.dumps()and cannot prevent errors from unpicklable objects.The encoder alone cannot handle unpicklable objects because it operates too late in the pipeline. The current two-layer approach is architecturally correct: sanitize first to eliminate problematic objects, then apply type conversion for the remainder. Moving sanitization logic into the encoder would not solve the fundamental issue—unpicklable objects would still cause
json.dumps()to fail.Likely an incorrect or invalid review comment.
feat(instrumentation): ...orfix(instrumentation): ....Important
Add
_sanitize_for_serialization()to handle JSON serialization of unpicklable dataclass fields inbase.py, updating input/output handling and exception catching._sanitize_for_serialization()tobase.pyto handle JSON serialization of unpicklable dataclass fields._sanitize_for_serialization()into_handle_span_input()and_handle_span_output()to sanitize inputs and outputs._handle_span_output()to catchValueErrorin addition toTypeError.This description was created by
for be36de0. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit