Tuesday, June 21, 2005

PerformanceCounter Constructor Horribly Slow

I'm doing some work on FlexWiki, and I'm finding that this call:


PerformanceCounter answer = new PerformanceCounter(s_PerformanceCounterCategoryName, name, false);


is sometimes taking a really, really long time to come back. Like several minutes. I started tracking it down via Reflector and native debugging, and it looks as though PerformanceCounter internally spins in this loop, inside a BCL-internal class called SharedPerformanceCounter:


int num1 = SharedPerformanceCounter.MaxSpinCount;
while (spinLockPointer[0] != 0)
{
  Thread.Sleep(1);
  num1--;
  if (num1 == 0)
  {
    spinLockPointer[0] = 0;
  }
}


Aside from the fact that this looks thread-unsafe to me, I'm not entirely sure what it's waiting for at this point. But given that MaxSpinCount is 10000, on my machine that seems to work out to around four minutes, so that explains where the delay is coming from. But what I don't understand is under what conditions this occurs - the SharedPerformanceCounter code is a bit hard to wade through, and I haven't made much headway with it.


In an RTFM moment, Shawn Van Ness suggested I look over at the Product Feedback Center to see if there were any similar bugs. I'd Googled, but hadn't landed there, although I should have thought of it. At any rate, when I did look I found this rather disturbing bug. Why disturbing? Because it implies that slowness in PerformanceCounter is to be expected. Although one could also read the bug report to merely indicate that the service hanging because of a long-running operation is the expected behavior. Either way, not very helpful.


Anyway, I haven't been able to figure anything out on this, so I'm blogging it. Ideally someone will know what I'm doing wrong. As a second best, perhaps the entry will serve as a sanity check for others with the same problem. Has anyone seen this constructor take a really, really long time to come back before?

8 comments:

  1. is this the cause of the mysterious slowdowns saw a few moths ago?

    ReplyDelete
  2. Possibly. Were the symptoms that it would get really, really slow every 20 minutes or so? Because the perfcounter stuff is an initialization step, but under the default IIS6 settings a lightly used web app will reset itself every 20 minutes or so.

    ReplyDelete
  3. i used perf counters several times, but never noticed this...that's disturbing

    ReplyDelete
  4. I can shed some light on these issues. PerformanceCounters use a machine wide memory mapped file to publish perf counter data. That file contains an object graph and various places we use as spinlocks.

    The problem comes in when processes get killed unexpectedly. A process which dies while holding a spinlock won't release it, meaning the next process gets blocked. After spinning 10000 times, we assume some process died and continue on, so we won't block forever - 4 minutes on your machine. My guess is that IIS is terminating your app when it resets it.



    In Whidbey we've fixed these problems. If you're interested, I can post a blog entry with more details.



    The bug you mention actually isn't related. It has to do with reading perf counters, and that really is horribly slow. It was also specific to services.

    ReplyDelete
  5. Ah! Very good - thanks for the wonderful answer. I still need to track the problem down, of course, but your guess of it being related to IIS reset is likely a good one.



    While I think others might be interested in the Whidbey fixes, it wouldn't really help me with my FlexWiki problem. For obvious reasons, we're pretty well tied to CLR 1.1 for the near future.



    Again, many thanks.

    ReplyDelete
  6. Craig-



    I uncommented all that code in my install and it fixed the slowness problems I told you about a while back.

    ReplyDelete
  7. Excellent. I've actually got an uncommitted change that lets you disable the perf counters. It has made unit testing reasonable again.

    ReplyDelete
  8. It doesn't seem to be fixed in Whidbey. I've just hit this with beta 2. It constantly happens in my code on .NET 2.0 Beta 2 (win2k + sp4) with VSTS. All I do is trying to set:

    System.Net.ServicePointManager.set_UseNagleAlgorithm



    Here is the callstack I see it sitting in:



    0356bec8 77f8915e [HelperMethodFrame: 0356bec8] System.Threading.Thread.SleepInternal(Int32)

    0356bf1c 7a74d94c System.Diagnostics.SharedPerformanceCounter.WaitForCriticalSection(Int32*)

    0356bf28 7a74d8d8 System.Diagnostics.SharedPerformanceCounter.WaitAndEnterCriticalSection(Int32*, Boolean ByRef)

    0356bf58 7a74e9c6 System.Diagnostics.SharedPerformanceCounter.TryReuseInstance(Int32, System.String, CategoryEntry*, InstanceEntry**, System.Diagnostics.PerformanceCounterInstanceLifetime)

    0356bfa4 7a74e229 System.Diagnostics.SharedPerformanceCounter.GetCounter(System.String, System.String, Boolean, System.Diagnostics.PerformanceCounterInstanceLifetime)

    0356c01c 7a74cc0f System.Diagnostics.SharedPerformanceCounter..ctor(System.String, System.String, System.String, System.Diagnostics.PerformanceCounterInstanceLifetime)

    0356c03c 7a73d78d System.Diagnostics.PerformanceCounter.Initialize()

    0356c070 7a73cea7 System.Diagnostics.PerformanceCounter..ctor(System.String, System.String, System.String, Boolean)

    0356c090 7a6a722f System.Net.NetworkingPerfCounters.Initialize()

    0356c0c0 7a701747 System.Net.Configuration.SettingsSectionInternal..ctor(System.Net.Configuration.SettingsSection)

    0356c0d8 7a701ad1 System.Net.Configuration.SettingsSectionInternal..cctor()

    0356c44c 796c03d3 [GCFrame: 0356c44c]

    0356cd54 796c03d3 [HelperMethodFrame: 0356cd54]

    0356cdb8 7a672f3b System.Net.ServicePointManager.set_UseNagleAlgorithm(Boolean)

    [...]

    ReplyDelete