Wednesday, 12 November 2014

How to parse and read large (or small) XML files in .NET using C#

Frequently I hear people saying that XML is going to die soon and be fully replaced by JSON. That may even happen one day, but the truth is there's plenty of XML data around that needs to be processed.

The .NET Framework offers many ways to parse and process XML files, some more convenient than others, but I find that the ones that load the whole document into memory (XmlDocument, for example) can be problematic when parsing extremely large amounts of data.

The example below demonstrates how you can use the XmlTextReader class to read the contents of an XML file while keeping memory usage modest. It works as a forward-only cursor, so only the current node is held in memory at any time.

Assume you have an XML file like this:

<albums>
  <album releaseDate="February 15, 1975">Fly by Night</album>
  <album releaseDate="September 1, 1977">A Farewell to Kings</album>
  <album releaseDate="June 12, 2012">Clockwork Angels</album>
</albums>


To extract all the album names from it, you can do this:

using System;
using System.Collections.Generic;
using System.Xml;

// This is just an outer method used to keep things clean.
// Note the finally block, which closes the XML reader and releases the file.
// It returns a list of all album names in the XML file above, in order of appearance.
private IList<String> GetAlbumNames(String albumXmlFileName)
{
    // creates an XmlTextReader to parse the file named by albumXmlFileName
    XmlTextReader xmlReader = new XmlTextReader(albumXmlFileName);
    try
    {
        return GetAlbumNames(xmlReader);
    }
    finally
    {
        xmlReader.Close();
    }
}

private IList<String> GetAlbumNames(XmlTextReader xmlReader)
{
    IList<String> albumNames = new List<String>();

    // keeps moving the "cursor" through the file until the end
    while (xmlReader.Read())
    {
        // an opening tag shows up as a node of type XmlNodeType.Element,
        // and its Name property holds the tag name
        if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "album")
        {
            // at this exact point, the cursor is pointing to the Element node;
            // we need another read to move the cursor to the content of the node
            xmlReader.Read();

            // an empty element such as <album/> has no text child,
            // so only collect the value when the cursor landed on a text node
            if (xmlReader.NodeType == XmlNodeType.Text)
            {
                albumNames.Add(xmlReader.Value);
            }
        }
    }
    return albumNames;
}
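
The same loop can also be written against the XmlReader base class, which the .NET documentation recommends over constructing XmlTextReader directly. What follows is only a minimal sketch under a few assumptions: the names AlbumNameSketch and ReadAlbumNames are made up for illustration, and the method takes a TextReader so it can be tried on an in-memory string as well as on a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, named for illustration only
public static class AlbumNameSketch
{
    public static IList<string> ReadAlbumNames(TextReader input)
    {
        IList<string> albumNames = new List<string>();

        // the using statement disposes the reader even if parsing throws,
        // which takes the place of the explicit try/finally
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "album")
                {
                    // self-closing elements such as <album/> have no text node to read
                    if (reader.IsEmptyElement)
                    {
                        continue;
                    }

                    // move to the first child and collect it only if it is text
                    if (reader.Read() && reader.NodeType == XmlNodeType.Text)
                    {
                        albumNames.Add(reader.Value);
                    }
                }
            }
        }
        return albumNames;
    }
}
```

To try it without a file, pass a StringReader wrapping the XML; for a file on disk, File.OpenText(albumXmlFileName) gives a suitable TextReader.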



If you want to read an attribute present in one of the nodes, e.g. "releaseDate" in the example XML above, you can use the GetAttribute method, like this:
private IList<String> GetAlbumReleaseDates(XmlTextReader xmlReader)
{
    IList<String> albumReleaseDates = new List<String>();

    while (xmlReader.Read())
    {
        // the releaseDate attribute belongs to the album element, so we need to make sure
        // the cursor is on that element before reading it
        if (xmlReader.NodeType == XmlNodeType.Element && xmlReader.Name == "album")
        {
            String releaseDate = xmlReader.GetAttribute("releaseDate");
            // GetAttribute returns null when the attribute is missing, so always check,
            // even if a schema is supposed to require the attribute
            if (releaseDate != null)
            {
                albumReleaseDates.Add(releaseDate);
            }
        }
    }
    return albumReleaseDates;
}
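
Because the reader is forward-only, a reader that has already been run through to the end of the file cannot be handed to a second method: the cursor is at the end and there is nothing left to read. When both the name and the release date are needed, one option is to collect them in a single pass. The sketch below assumes a small container type, Album, and a helper called ReadAlbums; both are hypothetical names introduced here for illustration. It accepts any XmlReader, so an XmlTextReader works too.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical container type, not part of the original example
public sealed class Album
{
    public string Name { get; set; }
    public string ReleaseDate { get; set; } // null when the attribute is absent
}

// Hypothetical helper class, named for illustration only
public static class AlbumSketch
{
    public static IList<Album> ReadAlbums(XmlReader reader)
    {
        IList<Album> albums = new List<Album>();

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "album")
            {
                // attributes have to be read while the cursor is still on the element
                string releaseDate = reader.GetAttribute("releaseDate");

                // then step into the element and pick up its text, if there is any
                string name = null;
                if (!reader.IsEmptyElement && reader.Read() && reader.NodeType == XmlNodeType.Text)
                {
                    name = reader.Value;
                }

                albums.Add(new Album { Name = name, ReleaseDate = releaseDate });
            }
        }
        return albums;
    }
}
```

Reading the attribute before the extra Read() matters here: once the cursor has moved on to the text node, the element's attributes are no longer reachable through GetAttribute.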


For more on XmlTextReader, see this. As I mentioned before, there are plenty of other alternatives for parsing XML in .NET, and I shall write about them in my next posts.
