XML from absolutely nothing
XML elements
The basic unit of an XML document is an XML element.
A standard XML element consists of:
a start tag – e.g. <my-element>;
optional content – e.g. Some data;
an end tag – e.g. </my-element>.
<my-element>Some data</my-element>
If an element has no content, it can be abbreviated into an empty element tag:
<my-element />
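As a quick check of the description above, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` module. It parses a full element and an empty element tag, and shows that an element with no content is the same thing however it is spelled. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# An element with a start tag, text content, and an end tag.
full = ET.fromstring("<my-element>Some data</my-element>")
print(full.tag)   # my-element
print(full.text)  # Some data

# An empty element tag gives an element with no text and no children.
empty = ET.fromstring("<my-element />")
print(empty.text, len(empty))  # None 0

# The long spelling of an empty element serializes identically.
long_form = ET.fromstring("<my-element></my-element>")
same = ET.tostring(empty) == ET.tostring(long_form)
print(same)  # True
```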
To repeat then, there are three types of tag:
a start tag – e.g. <a-name>;
an end tag – e.g. </a-name>;
an empty element tag – e.g. <a-name />.
A tag always starts with < and ends with >.
A start tag and an empty element tag must start with an element name. This is a case-sensitive string starting with a letter or underscore, followed by any combination of letters, digits, hyphens, underscores, and periods.
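The naming rule above can be sketched as a regular expression. This is a simplified, ASCII-only check written for illustration; the full XML specification also allows a colon and many non-ASCII letters, so treat `looks_like_element_name` (a hypothetical helper, not a library function) as an approximation.

```python
import re

# Simplified, ASCII-only version of the naming rule described above:
# a letter or underscore, then letters, digits, hyphens, underscores, periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def looks_like_element_name(name):
    """Return True if `name` fits the simplified rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-element", "_private.thing", "2nd-element", "has space"]:
    print(candidate, looks_like_element_name(candidate))
```

The first two candidates pass; the last two fail because one starts with a digit and the other contains a space.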
A start tag and an empty element tag can have attributes. These are name-value pairs, with each value enclosed in quotes:
<a-name an-attribute="my value" another-attribute="3">Some text</a-name>
<a-name an-attribute="my value" another-attribute="3" />
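To see how attributes look to a program, here is a small sketch using Python's `xml.etree.ElementTree`. Attribute values always arrive as strings, so a value such as "3" has to be converted explicitly if you want a number. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring(
    '<a-name an-attribute="my value" another-attribute="3" />'
)

# Attributes are exposed as a dictionary of string names to string values.
print(element.attrib)

# Values are strings; convert them yourself where needed.
number = int(element.get("another-attribute"))
print(number + 1)

# .get() returns None for an attribute that is not present.
missing = element.get("no-such-attribute")
print(missing)
```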
The content of an element consists of zero or more items, where an item can be:
Text
An element
Elements contained in other elements are child elements, e.g.:
<a-parent>
    <a-child>with text content</a-child>
    <another-child>with more text</another-child>
</a-parent>
You can mix text items and element items in element content, like this:
<a-parent>
    Some text
    <a-child>with text content</a-child>
    More text
    <another-child>with more text</another-child>
    Text continues
</a-parent>
but it is more common to have element content that is either one or more element items, or one single text item.
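Mixed content is worth a short illustration, because libraries differ in how they expose it. The sketch below assumes Python's `xml.etree.ElementTree`, which stores text before the first child in the parent's `.text`, and text following each child in that child's `.tail`. Other XML libraries may model this differently.

```python
import xml.etree.ElementTree as ET

mixed = """<a-parent>
Some text
<a-child>with text content</a-child>
More text
<another-child>with more text</another-child>
Text continues
</a-parent>"""

parent = ET.fromstring(mixed)

# Text before the first child element belongs to the parent.
leading = parent.text.strip()
print(leading)

# Text after each child element is stored on that child as .tail.
pieces = [(child.tag, child.text, child.tail.strip()) for child in parent]
for tag, text, tail in pieces:
    print(tag, "|", text, "|", tail)
```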
XML documents
There is a single element at the root of a well-formed XML document. This is the root element.
some_example.xml
<a-root-element my-type="example">
    <at-second-level>
        <first-thing>Some text</first-thing>
        <second-thing>More text</second-thing>
    </at-second-level>
</a-root-element>
To take another example, this would be a well-formed XML document, because it has a single element at the root level:
<my-element>
    Some text
</my-element>
But this would not, because it has two elements at the root level:
<my-element>
    Some text
</my-element>
<another-element>
    More text
</another-element>
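A parser will reject the second document. As a minimal sketch, assuming Python's `xml.etree.ElementTree`, the hypothetical helper `has_single_root` below tries to parse a string and reports whether parsing succeeded. Note that a `ParseError` can have other causes too, such as a missing end tag.

```python
import xml.etree.ElementTree as ET


def has_single_root(text):
    """Return True if `text` parses as an XML document, else False."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


one_root = "<my-element>Some text</my-element>"
two_roots = (
    "<my-element>Some text</my-element>"
    "<another-element>More text</another-element>"
)

print(has_single_root(one_root))   # True
print(has_single_root(two_roots))  # False
```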
The XML document may start with a special construction called the XML declaration (often referred to as the XML prolog), of this form:
<?xml version="1.0"?>
The default XML encoding is UTF-8, but you can specify another encoding in the XML prolog:
<?xml version="1.0" encoding="UTF-16"?>
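To make the encoding point concrete, here is a hedged sketch using Python's `xml.etree.ElementTree`. It serializes a small element as UTF-16 bytes, which causes the library to write an encoding line at the top of the output, and then parses those bytes back. The exact quoting and capitalization of the generated line is up to the library.

```python
import xml.etree.ElementTree as ET

element = ET.Element("my-element")
element.text = "Some text"

# Serialize to bytes in UTF-16; the output starts with an encoding line.
data = ET.tostring(element, encoding="utf-16")
as_text = data.decode("utf-16")
print(as_text.splitlines()[0])

# Parsing the bytes back recovers the same content.
parsed = ET.fromstring(data)
print(parsed.tag, parsed.text)
```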
Reading XML
Most programming languages have libraries for parsing XML. For example, in Python:
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('some_example.xml')
>>> root = tree.getroot()
>>> print(root.tag)
a-root-element
>>> print(root.attrib)
{'my-type': 'example'}
>>> children = list(root)
>>> print(len(children))
1
>>> only_child = children[0]
>>> for child in only_child:
...     print(child.tag, child.text)
...
first-thing Some text
second-thing More text
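The same idea extends to documents of any depth. The sketch below is one possible approach, not the only one: a hypothetical recursive helper, `outline`, that walks the example document and builds an indented listing. It parses from a string instead of a file so that it is self-contained, and relies on the fact that iterating over an element yields its child elements.

```python
import xml.etree.ElementTree as ET

SOME_EXAMPLE = """<a-root-element my-type="example">
    <at-second-level>
        <first-thing>Some text</first-thing>
        <second-thing>More text</second-thing>
    </at-second-level>
</a-root-element>"""


def outline(element, depth=0):
    """Return indented lines describing `element` and its descendants."""
    # Whitespace-only text (such as indentation) is treated as no text.
    text = (element.text or "").strip()
    line = "  " * depth + element.tag
    if text:
        line += ": " + text
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SOME_EXAMPLE)
listing = outline(root)
print("\n".join(listing))
```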