This repository was archived by the owner on Mar 24, 2025. It is now read-only.

Description
Hello,

We are trying to extract the value of the XML tag `SubscriptionType`, which has only text content (no attributes or child elements):
```xml
<myFile>
  <SalesBySubscription>
    <!-- One SubscriptionType & Currency per XML-->
    <SubscriptionType>Free</SubscriptionType>
    <Currency>EUR</Currency>
    .....
    <!-- millions of rows-->
  </SalesBySubscription>
</myFile>
```
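However, the schema printed by the following code is always null:

```scala
// `sparksession` is the active SparkSession; `pathToFile` points at the file above.
// load() must be chained onto the reader: it returns the DataFrame,
// whereas the DataFrameReader itself has no cache() or printSchema().
val SubscriptionTypeDS = sparksession.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "SubscriptionType")
  .load(pathToFile)

SubscriptionTypeDS.cache()
SubscriptionTypeDS.printSchema()
```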
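For comparison, a single text-only value like this can also be read outside spark-xml with a streaming StAX pass over the file. This is a minimal sketch under our own assumptions, not spark-xml's behavior or a fix for it: it uses only the JDK's `javax.xml.stream` API, and the object `FirstTagText` and helper `firstTagText` are hypothetical names. Because it returns at the first matching element, it does not scan the millions of rows that follow.

```scala
import java.io.{Reader, StringReader}
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

// Hypothetical helper: not part of spark-xml.
object FirstTagText {
  // Returns the trimmed text of the first element named `tag`, or None if absent.
  // getElementText throws if that element has child elements, so this only
  // suits text-only tags such as SubscriptionType or Currency.
  // The caller remains responsible for closing `xml` itself.
  def firstTagText(xml: Reader, tag: String): Option[String] = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(xml)
    try {
      while (reader.hasNext) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT && reader.getLocalName == tag)
          return Some(reader.getElementText.trim)
      }
      None
    } finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    val sample =
      """<myFile>
        |<SalesBySubscription>
        |<!-- One SubscriptionType & Currency per XML-->
        |<SubscriptionType>Free</SubscriptionType>
        |<Currency>EUR</Currency>
        |</SalesBySubscription>
        |</myFile>""".stripMargin
    println(firstTagText(new StringReader(sample), "SubscriptionType")) // Some(Free)
    println(firstTagText(new StringReader(sample), "Currency"))         // Some(EUR)
  }
}
```

For a real file, a `java.nio.file.Files.newBufferedReader(path)` could be passed instead of the `StringReader`, and the resulting value attached to the row-level DataFrame with `withColumn(..., lit(...))` from `org.apache.spark.sql.functions`.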
Is this a known issue? Is there anything we can do to help?
Thank you!