
How To Read Any Kind Of File In C#


If you're a data analyst, a data scientist, or someone who works with data in any capacity, then at some point you've had to deal with random text files full of data that needs to be extracted. And more than likely, you forget how to do it each and every time, in whichever language you happen to be using. So today I'm going to cover the most common file types that you may need to read into your programs, in the hopes that this can be a one-stop shop for your data extraction needs.


1. Text files

This is the easiest case, and one of the more common files you will encounter on your journey. The fastest way to get this done is to use the File class in the System.IO namespace, which comes with a handy method, ReadAllText, that does exactly what its name implies.

System.IO.File


    var fileContents = System.IO.File.ReadAllText(Server.MapPath(@"~/data/important.txt"));

This is the quickest way to read in the entire contents of a file; however, that's also its weakness. If you load massive files this way, you're going to have a bad time: holding an object in memory with that much data to parse and traverse is probably not going to go well. But if you're dealing with a file in the kilobyte range, this is a good approach.

Note that the ReadAllText method requires a physical file path in order to retrieve the file's contents. In an ASP.NET application we normally don't hard-code the physical path, as that can be a security concern, so we can use the built-in Server.MapPath method, which takes a relative (virtual) path and returns the physical location on that particular server.

You can also read the entire contents of a file into an array of strings, one element per line, which is more convenient for CSV files; the same resource limitations apply, however. To accomplish this, use the ReadAllLines method of the File class.


    string[] lines = System.IO.File.ReadAllLines(Server.MapPath(path));
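Once the lines are in an array, a quick (and admittedly naive) way to break a CSV file into fields is to split each line on commas. The sample data below is made up and stands in for the result of ReadAllLines so the sketch is self-contained; note that a plain Split won't handle quoted fields that themselves contain commas.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for File.ReadAllLines(...) so the sketch runs on its own
string[] lines =
{
    "Name,Price",
    "Apple,0.50",
    "Banana,0.25"
};

var rows = new List<string[]>();
for (int i = 1; i < lines.Length; i++) // skip the header row
{
    rows.Add(lines[i].Split(','));
}

Console.WriteLine(rows[0][0]); // prints "Apple"
```

For anything beyond trivially clean data, a dedicated CSV parser is the safer choice.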

If performance becomes an issue however, you can use the following StreamReader approach.

StreamReader

The StreamReader implementation requires a bit more work; however, it is more resource friendly, as it opens a stream to your file from which you can read the contents a chunk at a time, as opposed to all at once.

Note: the file stays locked for writing while it is being read, so you won't be able to edit or delete it until the stream is closed.

    try
    {
        using (StreamReader sr = new StreamReader(Server.MapPath("~/TestFile.txt"))) 
        {
            string line;
            while ((line = sr.ReadLine()) != null) 
            {
                 // do what you will
            }
        }
    }
    catch (Exception e) 
    {
        // log the error
    }
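As a middle ground between ReadAllLines and a hand-rolled StreamReader loop, the File class also offers ReadLines, which returns a lazy IEnumerable&lt;string&gt; and only pulls one line into memory at a time. The sketch below writes a small temp file first so it is self-contained; in a real app you'd point ReadLines at your data file.

```csharp
using System;
using System.IO;

// Create a small temp file so the sketch runs on its own
string path = Path.Combine(Path.GetTempPath(), "readlines-demo.txt");
File.WriteAllLines(path, new[] { "alpha", "beta", "gamma" });

int count = 0;
// File.ReadLines streams lines lazily instead of materializing
// the whole file into an array up front
foreach (string line in File.ReadLines(path))
{
    count++; // process each line as it arrives
}

Console.WriteLine(count); // prints 3
File.Delete(path);
```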

2. Read in XML

This is a bit more work, and there are a few ways to go about it in .NET. There are several built-in .NET classes that help with XML, but you always have the option of parsing the content yourself. There is no right or wrong here. Many times I need data in a hurry and don't have time to make sure that my XML is well formed.

Using XmlTextReader

This is the most straightforward way to read in an XML file; however, it requires the most hands-on work.


    XmlTextReader reader = new XmlTextReader(strXmlFile);

    while (reader.Read())
    {
        // inspect reader.NodeType, reader.Name and reader.Value here
    }

This is the most resource-friendly way to read in an XML file, because it streams the data. However, it is also the most hands-on and somewhat tedious approach. Each node, attribute, and element will have to be detected and parsed by your code, so knowing the schema beforehand becomes important.
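To make that loop concrete, here is a sketch of how the node types typically get handled. It reads a tiny made-up &lt;books&gt; document from memory via a StringReader so it can run standalone.

```csharp
using System;
using System.IO;
using System.Xml;

// A tiny invented document, read from memory so the sketch runs standalone
string xml = "<books><book id=\"b1\"><title>Dune</title></book></books>";

string bookId = null;
string title = null;

using (var reader = new XmlTextReader(new StringReader(xml)))
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                // reader.Name holds the tag name; attributes are read by name
                if (reader.Name == "book")
                    bookId = reader.GetAttribute("id");
                break;
            case XmlNodeType.Text:
                // text content of the current element
                title = reader.Value;
                break;
        }
    }
}

Console.WriteLine($"{bookId}: {title}"); // prints "b1: Dune"
```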

Alternatively, you can use the XmlDocument class, which gives you much more built-in support, as it loads the XML document into a strongly typed object.

Using XmlDocument

This is the easiest way to handle XML files in .NET; however, it is also more resource intensive. I've seen many examples of its use with 10, maybe 20, records as sample data. In real life, however, you're going to be loading large XML files, and that is noticeable on your hardware.


    XmlDocument doc = new XmlDocument();
    doc.Load(serverpath);

Once you load up the XML document you have some options to explore. You can get the entire contents of the file using the following:


    string contents = doc.OuterXml;

Or you can traverse the document node by node; again, there are many ways to go about this. For example, you can get elements by their tag names, as such:


    var ele = doc.GetElementsByTagName("book");
    // GetElementById only works if an attribute is declared as type ID
    // in the document's DTD or schema
    var book1 = doc.GetElementById("book1");

I find this to be the most useful approach, as I have much more control over how I manipulate the data. I can get all of the "book" elements and, for each one, use the same GetElementsByTagName method to extract data from that particular node. You will have to play around with the many built-in methods to see which is most beneficial to you.
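As a sketch of that pattern, here is a loop over every "book" element in a small in-memory document (the XML shape is invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<catalog>" +
            "<book><title>Dune</title></book>" +
            "<book><title>Hyperion</title></book>" +
            "</catalog>");

var titles = new List<string>();
foreach (XmlNode book in doc.GetElementsByTagName("book"))
{
    // drill into each node with the element indexer
    titles.Add(book["title"].InnerText);
}

Console.WriteLine(string.Join(", ", titles)); // prints "Dune, Hyperion"
```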

You also have the option of loading XML content directly from a string using the LoadXml method. But of course, since we're thinking more big data, this probably won't apply to you.

3. Read in JSON

This one is yet more work than the XML, but it definitely comes in handy for many things. For one, you can load data directly into a strongly typed object with very minimal work. Here is a sample JSON file that you can use for your testing pleasure.


{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

Using JSON.net

The simplest way to accomplish reading in JSON data is to use Json.NET by Newtonsoft. It accomplishes two major things for you.

1. Serialize data into JSON


    // a simple class to serialize
    public class Product
    {
        public string Name { get; set; }
        public DateTime Expiry { get; set; }
        public string[] Sizes { get; set; }
    }

    Product product = new Product();
    product.Name = "Apple";
    product.Expiry = new DateTime(2008, 12, 28);
    product.Sizes = new string[] { "Small" };

    string json = JsonConvert.SerializeObject(product);

This will get converted into the following string:

{
   "Name": "Apple",
   "Expiry": "2008-12-28T00:00:00",
   "Sizes": [
     "Small"
   ]
}

2. Deserialize data into a strongly-typed object


// a matching class to deserialize into
public class Movie
{
    public string Name { get; set; }
    public DateTime ReleaseDate { get; set; }
    public string[] Genres { get; set; }
}

string json = @"{
  'Name': 'Bad Boys',
  'ReleaseDate': '1995-4-7T00:00:00',
  'Genres': [
    'Action',
    'Comedy'
  ]
}";

Movie m = JsonConvert.DeserializeObject<Movie>(json);
string name = m.Name; // "Bad Boys"

And you are one DLL away from having its full functionality in your project. Just head on over to Newtonsoft's website, or grab the Newtonsoft.Json package from NuGet.

4. Read an Excel Sheet

Excel sheets are still around. And despite their antiquity, they are probably holding some very important data right now on someone's 10-year-old desktop. The cool thing about Excel sheets is that they can be queried through the OLE DB/ODBC data-access standards, and in .NET that means we can use SQL queries to parse and retrieve data.
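A minimal sketch of that approach, using the Microsoft ACE OLE DB provider: the file path and sheet name below are placeholders, and the provider itself has to be installed on the machine for the connection to open.

```csharp
using System.Data;
using System.Data.OleDb;

// Placeholder path to a hypothetical workbook
string path = Server.MapPath("~/data/inventory.xlsx");

// HDR=YES treats the first row of the sheet as column headers
string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;" +
                 "Data Source=" + path + ";" +
                 "Extended Properties=\"Excel 12.0 Xml;HDR=YES\"";

using (var conn = new OleDbConnection(connStr))
{
    conn.Open();

    // Sheets are addressed as [SheetName$]
    var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn);
    var table = new DataTable();
    adapter.Fill(table);

    foreach (DataRow row in table.Rows)
    {
        // with HDR=YES, cells are also accessible by header name
        var firstCell = row[0];
    }
}
```

From there, the rows behave like any other ADO.NET result set, so you can filter with a WHERE clause in the query itself rather than in your loop.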


string json = @"{
  'Name': 'Bad Boys',
  'ReleaseDate': '1995-4-7T00:00:00',
  'Genres': [
    'Action',
    'Comedy'
  ]
}";

Movie m = JsonConvert.DeserializeObject(json);

string name = m.Name;

Notes on performance

Data extraction is pretty fun for the most part. You get to turn raw strings into meaningful structures and play around with them as you wish. Having said that, there is almost always a performance problem, or a timing issue, when doing so. Depending on the amount of processing involved, you could end up with a process that takes 1 second per record, which sounds great, until you realize that you have 1 million records to process; at that rate, the job needs roughly 277 hours. So if you have 277 hours of processing time available, have at it. But if you don't, then a big part of data analysis will also be how well your application performs.
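One cheap way to catch this early is to time a small sample and extrapolate, for example with Stopwatch. The 1,000-record sample size below is arbitrary; at 1 second per record, the math works out to 1,000,000 seconds / 3,600, or about 277.8 hours for a million records.

```csharp
using System;
using System.Diagnostics;

var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    // stand-in for your real per-record work
}
sw.Stop();

// extrapolate from the sample to the full data set
double secondsPerRecord = sw.Elapsed.TotalSeconds / 1000;
double hoursForMillion = secondsPerRecord * 1_000_000 / 3600;

Console.WriteLine($"~{hoursForMillion:F1} hours for 1M records");
```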

Walter G. is a software engineer, startup co-founder, former CTO of several tech companies and currently teaches programming for a coding bootcamp. He has been blogging for the past 5 years and is an avid BMX rider, bio-hacker and performance enthusiast.
