Description
I am using version 1.5.3 to ingest XML files into Kafka.
This may be a question rather than an issue.
It is related to, but different from, issue #74.
The fix for that issue makes the connector read the attributes of the XML file, but the resulting inferred schema can differ depending on whether the element carrying the attribute occurs once or more than once.
My situation: I have 2 XML files, and the inferred schema (and hence the payload format) of the resulting messages differs between them.
I do not know whether I can force the attribute element to always be inferred as an array.
Below is the content of file 1:
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <field1>field1Value</field1>
  <field2>
    <value attributeWantToBefield="attribute1Value">27</value>
  </field2>
  <field4>2020-08-01T18:00:00</field4>
</data>
Below is the content of file 2:
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <field1>field1Value</field1>
  <field2>
    <value attributeWantToBefield="attribute1Value">25</value>
    <value attributeWantToBefield="attribute2Value">77</value>
  </field2>
  <field4>1919-08-02T18:02:00</field4>
</data>
From the attached files you can see the issue: the payload has a different format for each file, which makes parsing the resulting JSON difficult.
payload file 1 (not array)
"payload": {
  "data": {
    "field1": "field1Value",
    "field2": {
      "value": {
        "attributeWantToBefield": "attribute1Value",
        "value": "27"
      }
    },
    "field4": "2020-08-01T18:00:00"
  }
}
payload file 2 (array for the attribute part)
"payload": {
  "data": {
    "field1": "field1Value",
    "field2": {
      "value": [
        {
          "attributeWantToBefield": "attribute1Value",
          "value": "25"
        },
        {
          "attributeWantToBefield": "attribute2Value",
          "value": "77"
        }
      ]
    },
    "field4": "1919-08-02T18:02:00"
  }
}
Attached are the resulting messages (one file contains the unformatted messages for both XML files combined; the other two files contain the formatted messages):
result.formatted_fiel1.json.txt
result.formatted_file2.json.txt
result_to_reformat.json.txt
From the inferred schema you can see that 'value' is inferred as a struct type for file 1 and as an array for file 2.
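Until the schema can be forced on the connector side, one possible consumer-side workaround is to normalize the payload after reading it, so that 'value' is always a list. The following is only a minimal sketch in Python: it assumes the message has already been decoded into a dict with the payload layout shown above, and the helper names `as_list` and `normalize_payload` are made up for illustration. It does not change the schema the connector infers.

```python
import json
from typing import Any


def as_list(node: Any) -> list:
    """Wrap a lone struct in a list; leave lists as-is; map None to []."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


def normalize_payload(message: dict) -> dict:
    """Return a copy of the message where payload.data.field2.value is a list.

    Assumption: the message follows the layout shown in this issue
    (payload -> data -> field2 -> value).
    """
    # Round-trip through JSON as a simple deep copy for JSON-safe data.
    normalized = json.loads(json.dumps(message))
    field2 = normalized["payload"]["data"]["field2"]
    field2["value"] = as_list(field2.get("value"))
    return normalized


# Payload produced from file 1: 'value' is a single struct.
file1 = {
    "payload": {
        "data": {
            "field1": "field1Value",
            "field2": {
                "value": {
                    "attributeWantToBefield": "attribute1Value",
                    "value": "27",
                }
            },
            "field4": "2020-08-01T18:00:00",
        }
    }
}

# Payload produced from file 2: 'value' is already an array.
file2 = {
    "payload": {
        "data": {
            "field1": "field1Value",
            "field2": {
                "value": [
                    {"attributeWantToBefield": "attribute1Value", "value": "25"},
                    {"attributeWantToBefield": "attribute2Value", "value": "77"},
                ]
            },
            "field4": "1919-08-02T18:02:00",
        }
    }
}

for message in (file1, file2):
    values = normalize_payload(message)["payload"]["data"]["field2"]["value"]
    # After normalization both messages can be handled with the same loop.
    for item in values:
        print(item["attributeWantToBefield"], item["value"])
```

This only hides the difference downstream; the two messages would still carry different inferred schemas, so the original question about forcing an array in the connector remains open.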