This repository was archived by the owner on Mar 24, 2025. It is now read-only.

Self-closing tags are not supported as top-level rows #92

@upalkhouski

Description

Self-closing tags, which are valid under the XML standard, are not currently supported as row tags.

Reading XML in which every row element is a self-closing tag produces an empty schema.

Python code snippet:

df = sqlContext.load(source="com.databricks.spark.xml", rowTag = 'book', path = 'file:///home/cloudera/pp/1/books.xml')
df.printSchema()

XML:

<?xml version="1.0"?>
<catalog>
   <book id="bk101"/>
   <book id="bk102"/>
   <book id="bk103"/>
   <book id="bk104"/>
</catalog>

Schema output:
root

Spark Version: 1.3
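
For comparison, here is a minimal sketch showing that the same document is well-formed and that each self-closing book element carries data a reader could turn into a row. It uses only Python's standard xml.etree.ElementTree, not spark-xml, and the helper name book_rows is hypothetical; it illustrates what the input contains, not how the library parses it.

```python
import xml.etree.ElementTree as ET

# Same document as in the report above.
XML = """<?xml version="1.0"?>
<catalog>
   <book id="bk101"/>
   <book id="bk102"/>
   <book id="bk103"/>
   <book id="bk104"/>
</catalog>
"""


def book_rows(xml_text, row_tag="book"):
    """Return one dict of attributes per element named row_tag.

    Hypothetical helper for illustration only; spark-xml is not involved.
    """
    root = ET.fromstring(xml_text)
    # A self-closing element has no text or children, but its attributes
    # are still available through .attrib.
    return [dict(elem.attrib) for elem in root.iter(row_tag)]


rows = book_rows(XML)
for row in rows:
    print(row)
```

Each of the four elements yields a single id attribute, so a non-empty schema containing at least that attribute would seem to be the reasonable expectation here.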
