UTF-8 and new Documents via Client API

I'm using the Java client API 1.0 in my project to post documents to DocumentDB. Many of the JSON documents include text in languages that use diacritical marks.

The POJO-to-JSON conversion is performed by FasterXML Jackson. This code has been in production for some time (18 months), with JSON being written to Mongo, ElasticSearch, and Postgres (JSON and now JSONB). The UTF-8 encoding of the JSON written to those data stores has not been an issue. However, the same JSON written to DocumentDB loses its non-ASCII characters; e.g.,

ou vítimas de maus tratos da região

becomes:

ou v�timas de maus tratos da regi�o

Updating the document from the Azure Portal with the correctly encoded text works, so the issue appears to be in the client code. I check the byte[] for valid UTF-8 before passing the JSON to the client API for document creation and find no problems. Again, the same JSON goes into the other data stores with no issues.
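One symptom worth noting: each accented letter is replaced by exactly one � (U+FFFD). That pattern is what appears when text is encoded in a single-byte charset such as ISO-8859-1 and then decoded as UTF-8. The sketch below reproduces this using only the JDK. It is an illustration of one possible cause, not a confirmed diagnosis of the client library; the class name EncodingCheck and the helper sendAndReceive are hypothetical, and nothing here calls the DocumentDB client.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // U+FFFD, the character a UTF-8 decoder substitutes for invalid bytes.
    static final char REPLACEMENT = '\uFFFD';

    // Hypothetical helper: encode with the sender's charset, then decode as
    // UTF-8, simulating a receiver that assumes the request body is UTF-8.
    static String sendAndReceive(String json, Charset senderCharset) {
        byte[] wire = json.getBytes(senderCharset);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same sample text as above, written with escapes so the source
        // file's own encoding cannot affect the result.
        String original = "ou v\u00EDtimas de maus tratos da regi\u00E3o";

        // Sender and receiver agree on UTF-8: the text survives.
        String ok = sendAndReceive(original, StandardCharsets.UTF_8);

        // Assumed failure mode: sender uses a single-byte charset
        // (for example a platform default), receiver decodes as UTF-8.
        String broken = sendAndReceive(original, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip intact: " + ok.equals(original));
        System.out.println("Mismatch yields U+FFFD:  "
                + (broken.indexOf(REPLACEMENT) >= 0));
    }
}
```

If this is what is happening, checking the JVM's default charset (Charset.defaultCharset()) on the affected machine, or starting the JVM with -Dfile.encoding=UTF-8, would be a cheap way to test the hypothesis before digging into the client source.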

Dependencies are managed via Ivy, as shown below. That is not relevant to the issue; I'm just not keen on managing the Java DocumentDB source directly and much prefer referencing the library.

<dependency org="com.fasterxml.jackson.core" name="jackson-core" rev="2.5.1"/>
<dependency org="com.microsoft.azure" name="azure-documentdb" rev="1.0.0"/>

I welcome ideas and suggestions. Thank you in advance.

Jack

