Description
I'm using the Java client API 1.0 in my project to post documents to DocumentDB. Many of the JSON documents include text in languages with diacritic marks.
The POJO to JSON conversion is performed via FasterXML Jackson - code has been in production for some time (18 months) with JSON being written to Mongo, ElasticSearch, and Postgres (JSON and now JSONB). The UTF-8 encoding of the JSON to those data stores has not been an issue. However, the same JSON written to DocumentDB will lose its encoding; e.g.,
ou vítimas de maus tratos da região
becomes:
ou v�timas de maus tratos da regi�o
Updating the document from the Azure Portal with the correctly encoded text works, so it appears to be an issue with the client code. I check the byte[] for valid UTF-8 before passing the JSON to the client API for document creation and find no issues - again, the same JSON goes into the other data stores without problems.
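To illustrate the symptom, here is a minimal sketch of one way this kind of corruption can arise. It assumes the text is encoded with a non-UTF-8 charset somewhere along the way (ISO-8859-1 is used here purely as an example) and is then decoded as UTF-8. This is a guess at the mechanism, not a confirmed diagnosis of the DocumentDB client; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Hypothetical helper: encode with one charset and decode with another,
    // as happens when sender and receiver disagree about a body's encoding.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the sample independent of the source file encoding.
        String original = "ou v\u00edtimas de maus tratos da regi\u00e3o";

        // UTF-8 in, UTF-8 out: the text survives unchanged.
        String intact = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip intact: " + intact.equals(original));

        // Assumed failure mode: bytes written as ISO-8859-1 but read as UTF-8.
        // Each accented character becomes U+FFFD, the replacement character.
        String mangled = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("Mismatch produces U+FFFD: " + (mangled.indexOf('\uFFFD') >= 0));

        // The platform default is what String.getBytes() with no argument uses.
        System.out.println("Platform default charset: " + Charset.defaultCharset());
    }
}
```

If the platform default charset printed above is not UTF-8, starting the JVM with `-Dfile.encoding=UTF-8` may be worth trying as a diagnostic step, although that only helps if the client relies on the default charset.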
As seen here, dependencies are managed via Ivy - not relevant to the issue - I'm just not keen on managing the Java DocumentDB source directly and much prefer referencing the lib.
I welcome ideas and suggestions - thank you in advance ...
Jack