@SmartManoj
Contributor

Fixes: #1003

FileEditor now supports viewing image files (.png, .jpg, .jpeg, .gif, .webp, .bmp) by returning base64-encoded image data and displaying it as ImageContent. Updated FileEditorObservation and tool description to handle image data, and added tests to verify image handling and backward compatibility with text files.

Before:
image_2025-11-04_15-17-30

After:
image_2025-11-04_15-12-42

@simonrosenberg
Collaborator

Thank you @SmartManoj !

Perhaps we should wait for #929 to be merged before merging this, though. Will let you know very soon.

@blacksmith-sh blacksmith-sh bot requested a review from xingyaoww November 6, 2025 13:01
@blacksmith-sh
Contributor

blacksmith-sh bot commented Nov 6, 2025

[Automatic Post]: I have assigned @xingyaoww as a reviewer based on git blame information. Thanks in advance for the help!

Collaborator

@xingyaoww xingyaoww left a comment

Can we add a test case that actually loads an image file and asserts that the file editor returns ImageContent without error?

Introduces a test to verify that viewing a PNG image file with FileEditor returns an ImageContent object containing a base64-encoded image URL. This ensures image files are handled correctly by the view command.
Replaces single quotes with double quotes for byte string literals in the PNG image data and updates an attribute check to use double quotes for consistency in test_view_image_file_returns_image_content.
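
The check this test performs can be illustrated without the SDK. The sketch below is not the PR's test: `image_file_to_data_url` is a hypothetical helper, the fallback type is an assumption, and the bytes written are only a PNG signature plus filler rather than a complete image. It assumes the "base64-encoded image URL" is a `data:` URL of the usual form.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def image_file_to_data_url(path: Path) -> str:
    """Read a file and return it as a base64 data URL (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path.name)
    if not mime_type or not mime_type.startswith("image/"):
        mime_type = "image/png"  # assumed fallback for this sketch
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder content: a PNG signature followed by filler bytes.
with tempfile.TemporaryDirectory() as tmp:
    png_path = Path(tmp) / "tiny.png"
    png_path.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
    url = image_file_to_data_url(png_path)

print(url[:40])
```

A real test would call the FileEditor view command on the file and inspect the returned ImageContent instead of calling a helper directly.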
@enyst
Collaborator

enyst commented Nov 8, 2025

This is very nice, thank you for the work on it!

Comment on lines 344 to 354
mime_type = "image/png"  # default
if image_base64.startswith("/9j/"):
    mime_type = "image/jpeg"
elif image_base64.startswith("iVBORw0KGgo"):
    mime_type = "image/png"
elif image_base64.startswith("R0lGODlh"):
    mime_type = "image/gif"
elif image_base64.startswith("UklGR"):
    mime_type = "image/webp"
elif image_base64.startswith("Qk"):
    mime_type = "image/bmp"
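
For context on where these string prefixes come from: each one is the base64 encoding of a format's leading signature bytes. The sketch below derives them; `SIGNATURES` and `base64_prefix` are illustrative names, not part of the PR. Two limits follow from the derivation: "R0lGODlh" corresponds to GIF89a only (GIF87a encodes to a different prefix), and "UklGR" corresponds to the generic RIFF header, which WebP shares with other RIFF-based formats.

```python
import base64

# Leading signature ("magic") bytes of each format.
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/gif": b"GIF89a",
    "image/webp": b"RIFF",
    "image/bmp": b"BM",
}


def base64_prefix(signature: bytes) -> str:
    """Return the base64 characters that are fully determined by `signature`."""
    # Each base64 character carries 6 bits; keep only characters whose bits
    # all come from the signature itself, not from padding or later bytes.
    full_chars = len(signature) * 8 // 6
    return base64.b64encode(signature).decode("ascii")[:full_chars]


for mime, sig in SIGNATURES.items():
    print(mime, base64_prefix(sig))
```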
Collaborator

@simonrosenberg simonrosenberg Nov 8, 2025

Thank you for doing this!

I am no expert but all those if/elifs seem a bit hacky here. Would it make sense to use a library instead?

import mimetypes

mime_type, _ = mimetypes.guess_type(path)
mime_type = mime_type or "application/octet-stream"

or

import imghdr

image_type = imghdr.what(path)
mime_type = f"image/{image_type}" if image_type else "application/octet-stream"
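
One caveat on the second option: imghdr was deprecated in Python 3.11 and removed in Python 3.13 (PEP 594), so it is not available on current interpreters. Below is a minimal sketch of the first option. `guess_image_mime` and the file names are made up for illustration. `mimetypes.guess_type` looks only at the file name and never opens the file, and results for less common extensions such as .webp may vary with the Python version and the system MIME tables.

```python
import mimetypes


def guess_image_mime(path: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension only (hypothetical helper)."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type or default


for name in ("logo.png", "photo.jpg", "anim.gif", "data.unknown-ext"):
    print(name, "->", guess_image_mime(name))
```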

Contributor Author

Would it make sense to use a library instead?

Yes.


Similar snippet in codebase:

if self.screenshot_data:
    mime_type = "image/png"
    if self.screenshot_data.startswith("/9j/"):
        mime_type = "image/jpeg"
    elif self.screenshot_data.startswith("iVBORw0KGgo"):
        mime_type = "image/png"
    elif self.screenshot_data.startswith("R0lGODlh"):
        mime_type = "image/gif"
    elif self.screenshot_data.startswith("UklGR"):
        mime_type = "image/webp"

Collaborator

@simonrosenberg simonrosenberg Nov 8, 2025

@OpenHands /codereview-roasted read the comments above
The following block of code is 1. duplicated (copy-pasted from software-agent-sdk/openhands-tools/openhands/tools/browser_use/definition.py#L50-L59) and 2. a bit hacky.
What are your suggestions to improve this?

 if self.screenshot_data: 
     mime_type = "image/png" 
     if self.screenshot_data.startswith("/9j/"): 
         mime_type = "image/jpeg" 
     elif self.screenshot_data.startswith("iVBORw0KGgo"): 
         mime_type = "image/png" 
     elif self.screenshot_data.startswith("R0lGODlh"): 
         mime_type = "image/gif" 
     elif self.screenshot_data.startswith("UklGR"): 
         mime_type = "image/webp"

I'm on it! simonrosenberg can track my progress at all-hands.dev

Code Review Summary

I've analyzed the duplicated and hacky MIME type detection code in PR #1016. Here's my assessment:

Problems Identified:

  1. Code Duplication: The exact same MIME type detection logic (lines 344-354 in file_editor/editor.py) is copy-pasted from browser_use/definition.py (lines 50-59)

  2. Wrong Approach: The code detects MIME types by pattern-matching base64-encoded strings rather than checking actual file data. This is brittle because you're inspecting an encoding artifact, not the actual file format.

  3. Ignoring Available Data: In file_editor/editor.py, you have the file path but still read the entire file, encode it to base64, and pattern-match strings - when Python's mimetypes.guess_type(path) would do this correctly in one line.

Recommended Solutions:

For file_editor/editor.py (SIMPLEST FIX):
Replace lines 344-354 with:

import mimetypes

mime_type, _ = mimetypes.guess_type(path)
if not mime_type or not mime_type.startswith('image/'):
    mime_type = 'application/octet-stream'

This is simpler (3 lines vs 10), faster (no file inspection needed), more correct (uses standard library), and more maintainable (handles all formats automatically).

For long-term maintainability:
Create a shared utility function in openhands-tools/openhands/tools/utils/mime_type.py that:

  • Uses mimetypes.guess_type() when a file path is available
  • Falls back to checking actual magic numbers (not base64 prefixes) for raw data
  • Can be used by both file_editor and browser_use to eliminate duplication
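
A minimal sketch of what such a shared helper might look like follows. The function name `detect_image_mime` and its signature are assumptions for illustration, not code from the PR or the SDK. It follows the order in the bullets: try the path first, then fall back to signature bytes of the raw (decoded) data.

```python
import mimetypes
from typing import Optional


def detect_image_mime(
    data: Optional[bytes] = None, path: Optional[str] = None
) -> Optional[str]:
    """Return an image MIME type from the path if possible, else from raw bytes."""
    if path:
        guessed, _ = mimetypes.guess_type(path)
        if guessed and guessed.startswith("image/"):
            return guessed
    if data:
        if data.startswith(b"\xff\xd8\xff"):
            return "image/jpeg"
        if data.startswith(b"\x89PNG\r\n\x1a\n"):
            return "image/png"
        if data.startswith((b"GIF87a", b"GIF89a")):
            return "image/gif"
        # WebP is a RIFF container whose form type at bytes 8..11 is "WEBP".
        if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
            return "image/webp"
        if data.startswith(b"BM"):
            return "image/bmp"
    return None  # caller decides on a default


print(detect_image_mime(path="logo.png"))
print(detect_image_mime(data=b"\xff\xd8\xff\xe0"))
```

For base64 screenshot data, a caller would decode the string (or at least its first few bytes) before passing it in. Returning None rather than a default leaves the fallback choice to each tool.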

Verdict:

Needs rework - Violates DRY principle and uses incorrect approach for type detection

The current approach works but is unnecessarily complex and will become a maintenance burden. Using Python's standard library is the pragmatic, simple solution that eliminates special cases.

Collaborator

@simonrosenberg simonrosenberg Nov 8, 2025

Perhaps we can implement the recommended solution, @SmartManoj? I don't think we need the long-term maintainability solution yet.

Contributor Author

@SmartManoj SmartManoj Nov 8, 2025

mime_type = 'application/octet-stream'

@simonrosenberg, what does @OpenHands think about the quote style?

Collaborator

Indeed " is preferred!

SmartManoj and others added 3 commits November 8, 2025 16:55
Replaced base64 prefix checks with mimetypes.guess_type for determining image MIME type when reading image files. Falls back to 'image/png' if MIME type is not detected or not an image. This simplifies and improves accuracy of MIME type detection.
Replaces single quotes with double quotes in MIME type string comparison and assignment for consistency in the file editor.
@xingyaoww
Collaborator

xingyaoww commented Nov 8, 2025

@OpenHands review the changes introduced in the current PR, then help me add an integration test under tests/integration/tests that closely follows the existing format.

The test should work similarly to examples/01_standalone_sdk/17_image_input.py:

  1. It sets up the workspace by downloading the logo used in example 17.
  2. It asks the agent what color the logo is (blue, yellow, or green).
  3. In the check phase, it verifies that the agent answered that the logo is yellow.

@openhands-ai

openhands-ai bot commented Nov 8, 2025

I'm on it! xingyaoww can track my progress at all-hands.dev

This test verifies that the FileEditor can view and display image files
by having an agent analyze the OpenHands logo and identify its colors.

The test:
1. Downloads the OpenHands logo to the workspace
2. Asks the agent to view the logo.png file and identify its colors
3. Verifies that the agent correctly identifies yellow as one of the colors

This test requires a vision-capable LLM model.

Co-authored-by: openhands <[email protected]>
@openhands-ai

openhands-ai bot commented Nov 8, 2025

Summary

I've successfully completed your request! Here's what was done:

✅ Reviewed PR #1016 Changes

The PR adds image file viewing support to FileEditor:

  • FileEditor now supports viewing image files (.png, .jpg, .jpeg, .gif, .webp, .bmp)
  • Images are returned as base64-encoded data wrapped in ImageContent
  • File validation was updated to allow image files (previously rejected as binary)
  • Unit tests verify image handling and backward compatibility

✅ Created Integration Test

Added tests/integration/tests/t08_image_file_viewing.py that:

  1. Setup Phase: Downloads the OpenHands logo from the same URL used in example 17
  2. Instruction: Asks the agent to view logo.png and identify if it's blue, yellow, or green
  3. Verification Phase: Checks that the agent's response mentions "yellow" color
  4. Vision Check: Ensures the test runs only with vision-capable LLM models (like example 17)

The test follows the existing integration test pattern (extends BaseIntegrationTest) and will be automatically discovered by the test runner.
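
The verification phase described above amounts to a keyword check on the agent's answer. A minimal sketch under that assumption follows; `mentions_color` is a hypothetical helper, not the code in t08_image_file_viewing.py, and it does not use the BaseIntegrationTest API.

```python
import re


def mentions_color(answer: str, color: str = "yellow") -> bool:
    """Return True if `color` appears as a whole word in the agent's answer."""
    pattern = rf"\b{re.escape(color)}\b"
    return re.search(pattern, answer, flags=re.IGNORECASE) is not None


print(mentions_color("The logo is mostly Yellow."))
print(mentions_color("It looks blue and green."))
```

A check of this kind is deliberately loose: an answer such as "it is not yellow" would also pass, so it only confirms that the agent named the color.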

✅ Quality Checks

The integration test is now ready to be executed either through GitHub Actions (using the integration-test label) or locally via run_infer.py.

Development

Successfully merging this pull request may close these issues.

Handle image content in FileReadObservation

5 participants