Friday, October 17, 2003

DataSet Computed Columns

My client asked me about a slightly tricky problem the other day. I thought I’d share the answer here, since I thought it was fairly clever.

The problem stems from the fact that they have a database that has a table with columns that can hold values like ‘Y’, ‘N’, ‘0’, or ‘1’ to indicate true/false. Yes, they know this was probably not the best way to do it. No, they can’t change it, because there’s still a ton of existing code that both reads and writes this data. So the question was, “When writing new code, how do I get this to show up as a Boolean value in the DataSet?”

My first thought was to simply take care of the conversion in the SQL. If we knew which database we were using, we could fairly easily write our query to simply turn ‘Y’ and ‘1’ into true, and everything else into false. Unfortunately, the application has to work against both SQL Server and Oracle. While I actually have a way to deal with the differences in SQL between the two (more on that another time), I was still hoping to find some magic in the System.Data stuff that would take care of it for me.

Bob Beauchemin – database guru for DevelopMentor and all-around nice guy – turned me on to a nice way to handle it. He pointed out that the DataSet has the ability to create derived columns, where the value of the new column is computed using the value of existing columns and/or some simple operators. Perfect! A quick test showed that the code for creating my Boolean column looks something like this:

// Retrieves a DataSet from the database that has a Y/N/0/1 column called tfCol
DataSet ds = GetDataSet();

DataColumn dc2 = new DataColumn("derivedCol", typeof(bool), "(tfCol = 'Y') OR (tfCol = '1')");



That’s really all there is to it – every row will now have a Boolean column that will be true when the original column is ‘Y’ or ‘1’, and false otherwise.


The important part is the expression in the new DataColumn (which I’ve called derivedCol). I’ve bolded the expression. Notice that I have the ability to use comparison (=) and logical (OR) operators. There are a bunch more things you can do, too. They’re listed under the DataColumn.Expression property in the docs.


One other cool idea that Bob had was that I could suppress the original column (the one with the Y/N values) from the serialized representation of the DataSet by also setting the ColumnMapping property of the DataColumn to MappingType.Hidden, like so:


ds.Tables[0].Columns["tfCol"].ColumnMapping = MappingType.Hidden;


If I do this, when serializing the DataSet to XML – as happens when returning it from a WebMethod call – the tfCol column will not be included in the data at all…but my derived column will. Which means that this is a fairly reasonable and easy way to clean up the data before sending it off to the client.



  1. Did you have to handle updates from the DataSet ? I was wondering how to enable any new code working with this dataset to edit the derived column and persist the changes later in the original column. Doesn't seem possible on a first look.

  2. Good question. But of course it's not in general possible. Imagine I had two columns I was getting from the database, a and b, and a third column c that I calculated based on the formula c = a + b. If a is 2 and b is 3, and I change c from 5 to 7, what should I update a and b to be? 3 and 4? 1 and 6?

    Even in my case, if I set the derived column to true, should I choose 'Y' or '1'?

    Of course, the DataSet provides enough hooks for you to write code that defines the right thing and makes it happen, but you're not going to get it automatically.