
Param char_whitelist of Text::OCRTesseract::create() should be an empty string instead of null, which falls back to [0-9a-zA-Z] #3457

@n0099

Description

System information (version)
  • OpenCV => 4.7.0
  • Operating System / Platform => Windows 8.1 and Ubuntu 22.04
  • Compiler => ❔
Detailed description

@param char_whitelist specifies the list of characters used for recognition. NULL defaults to
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".

This behavior cost me hours of figuring out why recognizing CJK characters with Tesseract works in Emgu.CV but not in OpenCvSharp:
shimat/opencvsharp#1542
shimat/opencvsharp#873
shimat/opencvsharp#1364
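The documented fallback can be modeled in a few lines. This is a minimal sketch, not OpenCV's code: `effective_whitelist` and `is_allowed` are hypothetical helpers. It assumes, as the issue title suggests, that a null `char_whitelist` is silently replaced with the ASCII alphanumeric default, while an explicit empty string leaves Tesseract's whitelist empty, which Tesseract treats as "no restriction".

```python
from typing import Optional

# Default quoted in the char_whitelist documentation (used when NULL is passed).
DEFAULT_WHITELIST = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def effective_whitelist(char_whitelist: Optional[str]) -> str:
    """Hypothetical model of create(): None (NULL) is replaced by the ASCII default."""
    return DEFAULT_WHITELIST if char_whitelist is None else char_whitelist


def is_allowed(char: str, whitelist: str) -> bool:
    """Simplified model of Tesseract's filter: an empty whitelist allows everything."""
    return whitelist == "" or char in whitelist


# None mimics a binding passing null; "" mimics an explicit empty string.
for arg in (None, ""):
    print(repr(arg), "-> CJK '中' allowed:", is_allowed("中", effective_whitelist(arg)))
```

Under these assumptions the `None` case prints `False` and the `""` case prints `True`: a caller that passes null can never get a CJK character back, which is consistent with the OpenCvSharp reports linked above.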

Steps to reproduce
Issue submission checklist
  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues,
    forum.opencv.org, Stack Overflow, etc., and have not found any solution
  • I updated to the latest OpenCV version and the issue is still there
  • There is reproducer code and related data files: videos, images, onnx, etc
