Saturday, February 18, 2017

WebAuthenticationBroker and OAuth2 UserCancel Error

Windows Phone 8.1 programming can be a little... opaque. Yes, I was working on a Phone 8.1 project. My phone can't be upgraded to 10. I'm in the less than 1% of phone users.

I wanted to build an app that can do OAuth2 authentication, so I started with an example from IdentityServer3.Samples on GitHub. It uses WebAuthenticationBroker to present the OAuth server's UI to the user. A client with an ID of "implicitclient" was missing from Clients.cs, but I just copied one of the JavaScript implicit flow clients. The sample worked - I could get both an ID token and an access token at the same time.

I created a similar Client.cs in my existing project and pretty much copied the sample WinPhone example. I could get an ID token from my server and I could get an access token. But not both at the same time.

The WebAuthenticationResult.ResponseStatus value was WebAuthenticationStatus.UserCancel, a very generic error that has many sources. The ResponseErrorDetail property had a more specific error number, 2148270093. I couldn't find many references to this number on the web, but in its hex form, 0x800C000D, I found results. It's a "URL moniker code" produced by IE meaning INET_E_UNKNOWN_PROTOCOL. The description is "The protocol is not known and no pluggable protocols have been entered that match." Still not much to go on.

The callback URL for Phone 8.1 apps starts with ms-app://, which I thought maybe wasn't being recognized. I pointed my Phone app at the sample server and I could get both tokens at the same time.

I debugged the Phone app and grabbed the URL from both my auth server and the sample one. The sample server's callback URL was quite a bit shorter. It started to dawn on me that Phone 8.1 uses IE11 and that it might have a fairly conservative URL size limit. It turns out it's 2083, which is not long enough to hold both of my server's tokens. My signing certificate's key is twice the size of the sample servers, making the token signature twice as long.

So, how to shorten the URL?

I was needlessly including some claims, so I cut them out. I read that elliptical curve keys are shorter, which makes for shorter signatures. IdentityServer3 doesn't have support for EC certificates out of the box, so that would have been some work.

Then I finally stumbled across the idea of reference tokens. It turns out that they are the typical way to shorten an OAuth2 callback URL. Instead of the entire access token the URL contains a short identifier. Clients send the identifier to the server and the server looks it up from its database.

After 3 or 4 days of beating my head against the wall, problem solved. Now I can login.

Wednesday, December 14, 2016

Don't be Afraid of SingleOrDefault - Much Worse, Performance Problem Edition

I found another example at work of someone not taking advantage of SingleOrDefault when making Linq-to-Sql calls, but in a much worse way. I previously mentioned the use of Any() and First() in https://jamesmclachlan.blogspot.ca/2014/05/dont-be-afraid-of-singleordefault.html.

The new example constructs a query and calls Any() to test for the presence of at least 1 result. Any() is very efficient on its own because it uses SQL's EXISTS to just check for at least 1 record without reading anything about the record. This isn't a problem on its own.

Unfortunately, this was followed by a call to ToList() and then [0] to get the first item, instead of First(). The effect of ToList()[0] is to run the query and pull every one of the matching records into memory and then take the first item. First() at least tells SQL to only return the TOP 1 item.

Even worse is that because of faulty logic the code fails to add any query parameters, loading ALL of the records in the system of a particular type. Production has many tens of thousands of such records. Luckily, it only does this in a very specific case. If anyone has ever seen a problem they haven't reported it.

So it's not enough to hunt down uses of First() and FirstOrDefault(). We also need to look for ToList()[0], or perhaps all uses of All().


Friday, October 14, 2016

Handling User Data Securely - Really, Really Securely

Imagine you're building a web app that requires a password from a user for encryption. This password is not the same as their login one, it's used only for encrypting and decrypting their data in the browser. The user enters the password, the app uses it to generate an encryption key that the browser can use and then encrypts and uploads data. Later, the user types the password in again so that they can work with the encrypted data.

One of the things your app promises is that it never stores the password on the server in any form that another party could intercept or crack. And if someone compromises the user's computer or it gets stolen or seized the app promises that there is no trace of the password left behind.

This means never including the password or a hash of it on any posts to the server, neither in form submissions nor AJAX calls nor in something like ASP.NET Webforms view state. Similarly, the app must not save the password in any form to a persistent store like the browser's local storage or one of the database systems that some browsers support.

What options do you have for building such a system with a typical web system that involves form posts like ASP.NET MVC or WebForms?

There are 3 ways a web app typically persists data that the user enters from one page to another:
  1. cookies written to disk by the browser and sent back to the server with the page request
  2. a view state system that includes bits of data in hidden form fields
  3. session storage in the browser to leave data behind for the resulting page to read
The first 2 obviously send data to the server, so they are not solutions. The third one does't send data to the server, and it might seem as good as being only in memory. However different browsers treat it differently. They don't all clear it when you close the browser, and if the browser dies or the power goes out it may be left on disk.

You might be tempted to encrypt the value before writing it to view state or session storage, but who controls that encryption key? If you do, then you have to manage it and may have to reveal it under court order. If you generate a key in the user's browser, then you have the same problem of storing something between page views.

You conclude that the you can only hold the user's password in memory, meaning a JavaScript variable. The problem is that they disappear every time you visit another page. The easiest solution to program is for the app to ask the user to enter their password on every page that needs it. Not exactly a delightful experience.

Obviously you want to minimize the number of times the user enters their password. One approach would be for the app to have a field that collects the password, perhaps at the top. The rest of the page performs AJAX-y form posts that submit encrypted data, receives HTML from the server and updates the DOM with the response. Something like WebForms' UpdatePanel or Ajax.BeginForm in MVC.

If the user navigates to another part of the app that doesn't need the password, though, they have to reenter it when they return. This might be an advantage because the app only asks for the password in applicable contexts. Users would learn to do everything they need to before navigating, or they'd use 2 browser windows or tabs.

You could get the number of times the user needs to enter the password down to once per use session by using a single page app (SPA). If the app is all JavaScript and AJAX calls for data then it can keep the variable holding the password alive for the entire duration of use. If the user closes the window the password evaporates. (Well, it might still be sitting in RAM or on disk in a virtual memory file, but this seems beyond our control in a browser.)

SPAs have an entirely different development process that you may have to learn, especially if you are used to server side web development (ask me how I know :). Putting your time in to learn will be worth it, though, if you want to give the user the best possible experience of entering a secure value exactly once.

Sunday, July 17, 2016

Code Learnability vs. Feature Addability

I'd like to talk about made up words.

No, really I'd like to talk about the tension between making a code base easy to learn vs. the ease of adding things to it, specifically about web applications.

Imagine you had a slider control for your code that went from "easy to learn" to "easy to maintain".

When you slide the control all the way over to "easy to learn", most of the classes disappear along with all of the interfaces and many if statements appear. Methods balloon in size and contain duplicate logic because there are so few classes to encapsulate it. It's very easy to trace data and logic through calls up and down the stack. Anyone reading the code will come across all of the decisions it makes. The reader won't have a good sense of why the code does what it does, though. The lack of little methods like DetermineRanking or classes like RankingEngine mean that such code lives all over the place.

At "easy to learn", you can give a new developer your codebase and they'll be productive very quickly with little mentoring. Need to add a new data field when adding a customer? Just find the giant AddCustomer method or click button event handler and change line 654. A quick bit of functional testing shows that more related logic lives down on line 1233.

Oops, the same logic appears in SaveCustomer and now you can add a customer with a new field but users can't update it. QA or, even worse, customer support opens a bug, new developer finds duplicate logic, copy, paste, fixed. Again and again.

In some organizations, this is totally acceptable to devs, managers and customers. They can hire less skilled, cheaper developers and undercut the competition. Their customers are probably buying the software for some cost center in their business, so spending as little as possible is the goal. Everyone here operates at the low end of some market, which might make sense at the moment. As the dev shop or the customer grows they'll try to level up the software they make or use.

(The field of Economics contains an idea that ideally every product exists at every quality and price point so that everyone can afford the version of a product that fits precisely with their income. If there were only shacks and mansions to live in you'd have to live in a shack until you can afford a mansion. Better to have a wide range of housing options, especially within a neighbourhood.)

I'll bet the development shop's employee turnover is pretty high, though. As soon as a dev learns a little more about development they look for a more advanced employer. Again, some companies accept this and keep hiring from the ever growing pool of junior developers, which is fine.

I'll also bet that successfully adding features takes weeks, including writing code and back and forth with QA. Such code is not unit testable, so people find all bugs during end to end manual or automated testing.

At the other end of the slider, "easy to maintain", classes and interfaces abound, if statements largely disappear, but the structure obfuscates the code's flow. Frameworks and libraries, like inversion of control (IOC) containers and request pipeline handlers, automatically set up actions and dependencies. This means that for the app to know what to do with a class or method you only have to put a specific interface or attribute on it.

The experienced developer's productivity goes way up and bugs go way down because of unit testability and the inherent design (e.g. code no longer allows the string "0" or "1" for values, it uses booleans). However, the major trade off is the learning curve and cognitive load of the code base. The concept count goes through the roof. With so many classes and seemingly magic behaviour, new developers need a lot of guidance with the design. That could be in person or in a recorded video or design docs.

(I recently experienced this myself looking at code samples that rely exclusively on IOC containers to resolve dependencies. Some containers use an easy to follow "map interface IFoo to class Foo" syntax, while others use "map all of Foo's interfaces to Foo". The latter makes it harder to find the thing that maps IFoo to Foo with Visual Studio's Find All References because IFoo doesn't appear in the mapping. I even found one IOC container with a convention-based mapping method like "map all classes in assembly X to the interfaces matching their names". This meant that neither the classes nor the interfaces appear in the container setup code. Unless you find the conference video or series of blog posts where the author goes from original code to the heavily refactored version, you can end up puzzling over the approaches you see.)

Adding features now takes days and involves mostly one way trips through QA.Production bug counts due to actual code problems go down.

The code now requires more experienced developers who demand higher pay. This causes the software to cost more. Customers are happy to pay, though, because the software "just works", which they need to keep their profit centers working. Turnover drops.

There is a joke (belief?) in programming that the more complicated and obscure the code the more job security you have because only you know how to update it. This depends on what makes the code complicated, though, and what the benefit is to the company. Are you using well documented but advanced libraries that do a lot of heavy lifting, or are you writing 1000 line methods of bit shifting code with single letter variables?

Friday, April 1, 2016

It Doesn't Matter How They Get The Data

I was reading about using the SecureString class. This is a class that encrypts a string so that it is readable only on the current machine so that it does not hang around in plain text. Strings in C# are immutable, so they stick around until they are garbage collected, even if you set your reference to a string to String.Blank (""). The typical use is to collect the value from the user directly into a SecureString and shuttle it around on the current machine. As soon as it leaves the machine it must be turned back into a plain string, which is not a best practice.

I was looking into using SecureString in ASP.NET to handle a password. That password is passed in in plain text, and probably again when you actually use it, so there would be more than 0 instances in clear text in memory. Most StackOverflow answers point this out, but it would at least cut down on the number of instances as it is passed from method to method. Strings are a value type, meaning that every time you call another method with one as an argument .NET creates another copy.

In addition, I kept coming across the same kind of comment:

Yeah, no one has ever been able to read a remote, web-accessible machine's memory. Until they could.

This reminded me of a comment a co-worker made about something you could do with a value from one of our databases: "How could anyone get it?"

It doesn't matter.

How did they get the Ashley Madison data? How did they get the Sony data? How did they get the U.S. Office of Personnel Management data?

It doesn't matter.

What matters is what more they can do with it. This is called "pivoting", or using one breach to hop a level deeper. Security through obscurity works until your whole database appears on the web. Any SQL system is one SQL injection vulnerability away from giving up everything. Any software involved in serving web requests is a buffer-overflow bug away from barfing memory.

Practice defense in depth to limit a small compromise from becoming much bigger.



Friday, March 13, 2015

Assert.DoesNotThrow Does not Preserve Stack Traces

Just a short post about an issue with an NUnit feature.

It has an assertion called DoesNotThrow that takes a delegate function as an argument. If the delegate throws an exception then NUnit will throw its own exception. The method is great at pointing out exactly where the code under test is. However, if an exception does occur, NUnit reports the exception type, but looses the original stack trace. There have been enough occurrences of lost stack traces that I've stopped using it. Now I just comment the actual test line and let the exception bubble up to the test runner.

Wednesday, March 4, 2015

Make Classes not Enums

The main C# code base I work with at work is full of generic objects and methods written to use enums to drive behaviour. I'm not sure of the reason. The people who created it have all since moved on, so I don't know who to ask. All I can do is live with the giant switch statements and undo them into classes when I can. Why classes? Let's take a look.

The full example code base on which the gists are taken is available in https://github.com/claq2/MakeClassesNotEnums.

The (Evil) Enums Way


Say you have a service that creates different kinds of backup jobs. One of its arguments is an enum that indicates what kind of job to create. It returns only 1 kind of object, Job, that is not abstract. This object represents all the different possible kinds of jobs. A property, JobType, indicates whether it's for backing up files or a SQL database.

In this example, creating a job means calling a single CreateJob method that takes an enum argument that indicates the type. This might seem contrived, as in "who would build this", but it's easy for a UI to consume. Show the user a drop down with the job types and call the one method with the user's choice in the enum.

In object oriented programming there is something called the Single Responsibility Principle. It states that a class should have only 1 reason to change. Another way to put it is that a class should have exactly 1 concern. The enum approach violates this in at least 2 ways.

First, the Job class represents all possible job types. It needs properties for every job type. A change to any job type or the addition of a new one means changes to this class. Some properties apply to all types and some apply to only some types. How do upstream developers of the Job class know which properties to fill in for each type? They have to look at the documentation. I hope they don't miss one of the comments.

Second, the JobService has 3 concerns - determining which type of job to build, verifying the values and mapping input values to properties. If any of these aspects change this class must be updated. The more you need to change it, the more risk there is that you will introduce a bug. Especially if the switch has many clauses with pages of complex logic.

The cyclomatic complexity of this method is already 7, meaning you need at least 7 unit tests to completely cover it. This approach will only get harder to test fully and understand. There's already a little extra logic for Sql2014Jobs. How long before the logic becomes so tied together that you have to write tortured logic to get exactly what you want?

Think about what load and save methods in the JobService class will look like. Nearly the same - more switches with extra if statements for the special cases.

The (Righteous) Classes Way


There are 2 sources of cyclometric complexity in a system - the conditions that the business defines and those created by developers. You usually can't reduce the business's conditions, but you should be able to control the developer-created ones. The way you do this in an object oriented system is usually with classes and inheritance.

In this example one might wonder why the interface doesn't already know what kind of job it need to create in response to the user's actions. Well, it does. Outside of a text-menu driven console app, there shouldn't be any logic needed to decide what to create. The user chose to click on a button or use a drop down to pick the job type. The UI now has enough information to call a job-specific method and expect a job-specific result. Hence, separate job service classes.



No conditional logic. The logic for SQL 2014 jobs is contained in a class that only has to do with SQL 2014 jobs. Any change to File jobs does not involve anything related to any other job or the services. As written, the services don't really add any value to consumers. They could just use the job constructors themselves. Imagine they had load and save methods, though. Those methods will be specific to the job types.

The logic for constructing the SQL 2014 job's instance name and for rejecting non-root path file job is now in those classes. There is no chance that a change to file job logic could impact other job types.

A Hybrid Approach


Let's say you have a system that looks like the first example that has a single method and for non-technical reasons you cannot easily change the interface. Are you stuck? No.

Change the service to be or use a factory, as in the factory pattern. Job-specific logic is still in separate classes and the service now has just 1 responsibility - to invoke the right service and return the result.

It is a lot more code than the original enum-only version, but it more closely follows the SOLID principles that make software more maintainable. The other classes are the same as in the class version, except the AbstractJob class. It now has an abstract ToJob method that each specific job class must implement to convert itself into a generic Job object. Again, this logic is encapsulated inside each specific class.

Summary


Enums and switches make for less code, but over time it becomes harder to understand and more easily broken. In an object-oriented language classes are there to segregate logic and responsibility for better maintenance. It's a little more code, but you will introduce less bugs.