The .NET Framework offers many ways to parse and process XML files, some more convenient than others, but I find that they can become problematic when parsing extremely large amounts of data.
The example below demonstrates how you can quickly use the XmlTextReader class to read the contents of an XML file while staying modest on memory utilization. XmlTextReader is a forward-only reader: it moves through the file node by node instead of loading the whole document at once.
Assume you have an XML file like this:
<albums>
<album releasedate="February 15, 1975">Fly by Night</album>
<album releasedate="September 1, 1977">A Farewell to Kings</album>
<album releasedate="June 12, 2012">Clockwork Angels</album>
</albums>
Suppose you want to extract all the album names from it. You can do it like this:
// requires: using System; using System.Collections.Generic; using System.Xml;

// This is just an outer method used to keep things clean.
// Note the finally block, which releases the XML reader's resources.
// It returns a list of all album names in the XML file above, in order of appearance.
private IList<String> GetAlbumNames(String albumXmlFileName)
{
    // creates an XmlTextReader to parse the file named by albumXmlFileName
    XmlTextReader xmlReader = new XmlTextReader(albumXmlFileName);
    try
    {
        return GetAlbumNames(xmlReader);
    }
    finally
    {
        xmlReader.Close();
    }
}

private IList<String> GetAlbumNames(XmlTextReader xmlReader)
{
    IList<String> albumNames = new List<String>();
    // keeps moving the "cursor" through the file until the end
    while (xmlReader.Read())
    {
        // a regular node has the type XmlNodeType.Element,
        // and its Name property holds the tag name
        if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "album")
        {
            // at this exact point, the cursor is pointing to the Element node;
            // we need another read to move the cursor to the content of the node
            xmlReader.Read();
            string albumName = xmlReader.Value;
            albumNames.Add(albumName);
        }
    }
    return albumNames;
}
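The unconditional second Read() above works for the sample file, but it picks up whatever node comes next when an album element is empty. Below is a minimal sketch of the same cursor technique with that case guarded. The class AlbumNameSketch and the method ReadAlbumNames are hypothetical names, and the XML is read from a string through XmlReader.Create and a StringReader so the snippet can be tried without a file on disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AlbumNameSketch
{
    // Hypothetical helper, not part of the original post:
    // returns the text of every non-empty <album> element, in document order.
    public static IList<string> ReadAlbumNames(string xml)
    {
        IList<string> albumNames = new List<string>();

        // XmlReader.Create accepts any TextReader, so a string can stand in for a file
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.Name != "album")
                {
                    continue;
                }

                // <album/> has no content node, so there is nothing to step into
                if (reader.IsEmptyElement)
                {
                    continue;
                }

                // step onto the node after the start tag and keep it only if it is text
                reader.Read();
                if (reader.NodeType == XmlNodeType.Text)
                {
                    albumNames.Add(reader.Value);
                }
            }
        }

        return albumNames;
    }

    public static void Main()
    {
        string xml =
            "<albums>" +
            "<album releasedate=\"February 15, 1975\">Fly by Night</album>" +
            "<album/>" +
            "<album releasedate=\"June 12, 2012\">Clockwork Angels</album>" +
            "</albums>";

        foreach (string name in ReadAlbumNames(xml))
        {
            Console.WriteLine(name);
        }
    }
}
```

The using block plays the same role as the try/finally in the original method: the reader is closed even if parsing throws.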
If you want to read an attribute present in one of the nodes, e.g. "releasedate" in the example XML above, you can just use the GetAttribute method. Attribute names in XML are case-sensitive, so the name you pass must match the file exactly.
Like this:
private IList<String> GetAlbumReleaseDates(XmlTextReader xmlReader)
{
    IList<String> albumReleaseDates = new List<String>();
    while (xmlReader.Read())
    {
        // the attribute releasedate belongs to the element album, so we need to make sure
        // the cursor is in the right place
        if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "album")
        {
            // the name is case-sensitive and must match the XML file: "releasedate"
            String releaseDate = xmlReader.GetAttribute("releasedate");
            // always check for nulls, even if you're enforcing a schema that requires
            // the attribute; it is a good practice
            if (releaseDate != null)
            {
                albumReleaseDates.Add(releaseDate);
            }
        }
    }
    return albumReleaseDates;
}
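Since the reader only moves forward, the name and the release date of each album can be collected in a single pass rather than by reading the file twice. The sketch below is one possible way to do that, not a definitive implementation. AlbumPairSketch and ReadAlbums are hypothetical names, and pairing the values in a KeyValuePair is an assumption made for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AlbumPairSketch
{
    // Hypothetical helper: returns (album name, release date) pairs in document order.
    // The release date is null when the attribute is missing.
    public static IList<KeyValuePair<string, string>> ReadAlbums(string xml)
    {
        var albums = new List<KeyValuePair<string, string>>();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.Name != "album")
                {
                    continue;
                }

                // attributes have to be read while the cursor is still on the start tag
                string releaseDate = reader.GetAttribute("releasedate");

                // an empty element such as <album/> has no text to read
                if (reader.IsEmptyElement)
                {
                    continue;
                }

                // move the cursor to the content and keep it only if it is text
                reader.Read();
                if (reader.NodeType == XmlNodeType.Text)
                {
                    albums.Add(new KeyValuePair<string, string>(reader.Value, releaseDate));
                }
            }
        }

        return albums;
    }

    public static void Main()
    {
        string xml =
            "<albums>" +
            "<album releasedate=\"September 1, 1977\">A Farewell to Kings</album>" +
            "<album releasedate=\"June 12, 2012\">Clockwork Angels</album>" +
            "</albums>";

        foreach (KeyValuePair<string, string> album in ReadAlbums(xml))
        {
            Console.WriteLine(album.Key + " (" + album.Value + ")");
        }
    }
}
```

The order of operations matters here: once Read() moves the cursor off the start tag, the attributes of that element are no longer reachable, so GetAttribute is called first.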
For more on XmlTextReader, see this. As I mentioned before, there are plenty of other alternatives for parsing XML in .NET, and I shall write about them in my next posts.