Wednesday, November 2, 2005

Unwanted Stack Trace in Web Services

At one of my clients, I've been helping them port their product to .NET. As part of the process, we've added a web service façade that exposes a good chunk of their functionality. It's been really interesting figuring out how to do that correctly. One of the things I recommended is that they put a catch (Exception) block in every [WebMethod].

 

The idea here is to control completely what goes out on the wire. If uncaught exceptions are allowed to propagate outside the WebMethod, the pipeline will call ToString() on the exception object and stick the result into the soap:fault. That means that we're projecting stuff like our call stack and the exception type onto the wire. This is neither useful nor desirable to the clients of the web service. Even if it were, the format the information is in would be a pain to parse.

 

So instead, we've done something like this:

[WebMethod]
public FooResult DoFoo(FooRequest request) {
  try {
    CheckSecurity();
    return ExecuteFoo(request);
  }
  catch (Exception e) {
    // Log the real exception, then throw only what the client should see.
    throw LogAndMapException(e);
  }
}

 

Where LogAndMapException looks something like this:

SoapException LogAndMapException(Exception e) {
  LogException(e);

  CustomException ce = e as CustomException;

  if (ce != null) {
    return CreateSoapException(ce.Code, ce.FriendlyMessage);
  }
  else {
    return CreateSoapException("generic error message");
  }
}

 

CustomException is an exception that we wrote that is thrown when application-specific error conditions occur. When we get this, we know we have information that might be useful to the client, for example, "You do not have permission to do what you tried to do." In this case, we create a SoapException that contains a code and a convenient human-readable string for that error. In all other cases, we throw a SoapException that just contains a generic error message. So we return as much information as we have, but not more than the client needs.
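For illustration, here is a minimal sketch of what CustomException and CreateSoapException might look like. None of this is the client's actual code: the FaultFactory class, the string type of Code, the "ErrorCode" element name, the empty actor, and the choice of Client versus Server fault codes are all assumptions. The only framework pieces used are the SoapException constructors, SoapException.ClientFaultCode and ServerFaultCode, and SoapException.DetailElementName.

```csharp
using System;
using System.Web.Services.Protocols;
using System.Xml;

// Hypothetical shape of the application-specific exception.
public class CustomException : ApplicationException
{
    private readonly string code;
    private readonly string friendlyMessage;

    public CustomException(string code, string friendlyMessage)
        : base(friendlyMessage)
    {
        this.code = code;
        this.friendlyMessage = friendlyMessage;
    }

    public string Code { get { return code; } }
    public string FriendlyMessage { get { return friendlyMessage; } }
}

// Hypothetical helper class; the name is an assumption.
public sealed class FaultFactory
{
    private FaultFactory() { }

    // Generic case: only a message, reported as a server-side fault.
    public static SoapException CreateSoapException(string message)
    {
        return new SoapException(message, SoapException.ServerFaultCode);
    }

    // Application-specific case: message plus a machine-readable code
    // carried in the fault's detail element.
    public static SoapException CreateSoapException(string code, string message)
    {
        XmlDocument doc = new XmlDocument();
        XmlNode detail = doc.CreateNode(
            XmlNodeType.Element,
            SoapException.DetailElementName.Name,
            SoapException.DetailElementName.Namespace);

        // "ErrorCode" is an assumed element name, not part of any standard.
        XmlElement codeElement = doc.CreateElement("ErrorCode");
        codeElement.InnerText = code;
        detail.AppendChild(codeElement);

        // The actor is left empty in this sketch; a real service might
        // pass the request URL instead.
        return new SoapException(
            message, SoapException.ClientFaultCode, string.Empty, detail);
    }
}
```

Putting the code in the detail element means a client can read it from the fault without having to parse the human-readable message.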

 

The key here is that when SoapException is thrown, the pipeline knows that you're trying to make the soap:fault look a particular way, and it'll map the properties of the SoapException into the fault message, rather than simply dumping the result of ToString() into it. Very handy. Except when it stops working.

 

Which is exactly what happened to us the other day. Since we have a UI that displays the human-readable part of the fault in a web page, it was pretty obvious to us that we were getting the stackdump style of message, rather than the nicely formatted one. To add to the mystery, we couldn't recreate the error on any of our dev machines - only on the web farm we deploy to as part of every build.

 

But wait! There's more! There are two nodes in the web farm (we'll call them A and B), and as I was playing around with the setup, I found that the problem only occurred when I pointed the web UI on A to the web service on A. If the web UI on A talked to the web service on B (and vice versa) the problem went away. Even weirder, if I set up tcpTrace as a simple call redirector so that web service calls went from machine A to machine C back to machine A, things also worked correctly. And I tried addressing the calls using the machine's full name, "localhost" and "127.0.0.1", but none of that made a difference. As long as the call didn't leave the machine, the error text was wrong.

 

After much screwing around trying to get remote debugging to work (with little success, as usual with that technology), I asked a few friends if they'd seen this before. Fortunately, the inestimable Henk had. (Henk is one of the smartest guys I know - hire him if you ever get the chance.) It turned out to simply be a matter of adding

 

<customErrors mode="On" />

 

to the web.config for the web service. Poof! Problem gone. I suppose it makes sense in retrospect that this tag would control pipeline mapping of unhandled exceptions for web services the same way it does for ASP.NET web forms, but frankly I found the resulting behavior highly unintuitive. I also can't explain why it only failed on our Windows 2003 web farm, and not on any of the other Windows 2003 or Windows XP machines we tried. But at least we're up and running again.
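For reference, a sketch of where that element sits in a web.config, assuming an otherwise empty configuration file. The mode attribute accepts On, Off, and RemoteOnly; RemoteOnly is the ASP.NET default and shows detailed errors only to requests that originate on the server itself, which would at least be consistent with the problem appearing only when the call never left the machine.

```xml
<configuration>
  <system.web>
    <!-- On: never send detailed error text, even to local callers -->
    <customErrors mode="On" />
  </system.web>
</configuration>
```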

6 comments:

  1. Craig-



    I have this suspicion that your Web farm didn't have the default ASP.NET error pages on it - when you have CustomErrors set to OFF, your web server will look for these pages and blow up with a 404 if it can't find them. I don't know if this sounds like it could have had something to do with your issues or not...



    -Dave

  2. Rather than wrapping each method with a try/catch, why not catch WebMethod exceptions declaratively using my Exception Injection technique...



    http://haacked.com/archive/2005/06/29/7392.aspx

  3. I should point out that you can change this technique to show or hide whichever information you wish. My implementation shows too much info for debugging purposes. But you'd want to change that before release.

  4. So, I went down the SoapExtension route at first with that system, and ultimately abandoned it. There were several reasons, but in the end it turned out to be trivially easy to follow the catch pattern, and we didn't have to rely on magic invisible code running. We even enforce the convention via a custom FxCop rule.



    I'm not saying it isn't a good idea sometimes, just that it wasn't worth the tradeoffs in this particular system.

  5. A New Codus Release

    [Via: breichelt ]

    ASP.NET Podcast Show #26 - The Guru of Gurus Bob

    Beauchemin...

  6. thanks a lot for your solution, I've been trying to solve this for weeks.
