Monday, May 26, 2014

Don't Be Afraid of SingleOrDefault

The following is based on an issue I've seen many times in a code base at work, and only recently caused the problem described below.

Consider the following code:

Assume that for good reasons we need to find JobSources by their Source and JobId values. The calling code doesn't have the Id. The "database" doesn't enforce uniqueness of these 2 values as a key, we just assume that every other system that feeds up data behaves this way, too. Well, hope is a better word, as we'll see.

The first time UpdateOrAdd is called, it will find the one and only matching JobSource and update it. Then someone adds a second JobSource for job 2. We again try to update the JobSource with JobId 2 and SourceType File. Success! No exceptions.

Except, we're using Any to see if there are 1 or more items and then updating the first one. The code tolerates the condition of there being more than one item that matches our "key". We have potentially updated the wrong data. Corruption!

Now let's change UpdateOrAdd to use SingleOrDefault:

Not only is the expected shape of the data clear, but it's shorter too. SingleOrDefault returns the 1 item that matches or the default value for the type. For any class type, the default is null. If more than 1 item matches, it throws an exception.

Now on the second call to UpdateOrAdd, we get the following exception:

System.InvalidOperationException: Sequence contains more than one element.

Oh no! A noisy exception, instead of silent data corruption!

Why would someone write code the earlier Any/First stuff? Because some coders are scared of triggering exceptions. More scared of that than corrupting data. Any time I see this pattern in code at work I change it to use SingleOrDefault, or just Single if the item is definitely expected to be present. I've had to convince people that it's better to report a problem than update the wrong item.

Yes, there is a chance that the exception won't happen during testing because the data in the lab might be very clean. Perhaps no one has yet created the conditions there that break your assumption about the shape of the data. It just might be found by a real customer. That customer, though, will report the issue and, more importantly, not lose data.

Monday, May 12, 2014

A Very Practical Use of C# Structs

A programming interview question folks will tell you to be prepared for is "when should you use structs" and the general response is "when you want pass by value behaviour". That is, when you want data to be passed as a copy rather than as a reference.

Ok, but when would you ever want that behaviour? An issue at work recently provided an example: when you can't trust that the thing your object was passed won't be changed. Or that the object you passed in turns out to be the same one passed out later. Here's an example:

There is a class, PersonUsingClasses, that takes items from two data sources and holds on to the references, rather than making copies of all of the properties of each. It then uses those references to construct a new property, DisplayName. I've written such code myself, unaware of the dangers. This class also has a way to convert it back to one of the data source types, ToServicePerson.

A couple of things to notice:
  1. The PersonClassFromService reference sent into the constructor is the same one passed out in ToServicePerson
  2. Someone forgot to remove his debugging code in the constructor.
This means that any changes made to the Service person object inside or outside of the PersonUsingClasses object will appear everywhere at once. Frank changed his salutation in Main to Dr. from outside the class, but the class reflected this change. Also, some code in Main tries to detect if the Age property changed by comparing the input and output objects, and what do you know, there was no change.

One option to fix this is to not hold on to the references passed into the constructor and make copies of every property that matters to PersonClassUsingClasses. That might be a good option, but we need to support recreating the PersonClassFromService object, which could have properties that don't matter. Or, there could be dozens and dozens of properties to track. (I worked at a bank - trust me when I say this happens.)

A nearly equivalent approach is to use return structs from the DB and service:

Now the system copies every property in every input parameter for us because they are structs. It copies every property again when we call ToServicePerson. This means that changing the input parameter does not get reflected in PersonUsingStructs and we can detect the accidental change in the constructor. This approach uses more memory because of all of this copying, but it's not much more than if we'd made copies of every property ourselves in PersonUsingClasses. We've protected the system from accidental changes.