Sunday, August 21, 2005

Decade++

A few weeks ago, I blogged about a problem with SourceForge's bug/feature request tracker export feature. Specifically, they aren't escaping illegal characters in the generated XML. Very bad, particularly because it renders the output useless to conforming XML parsers like the ones in System.Xml.

The problem is, during business hours the trackers on SourceForge are so slow as to be nearly useless. I assume this is because they're overloaded. Whatever the reason, if all you want to do is scan through all the open bugs in FlexWiki, it's annoying: I'm not going to update anything, after all, and looking through 200 bugs when page loads take in excess of 30 seconds each is not an option.

So I set out to see what I could hack together quickly to fix the SourceForge export. And what I came up with is tracker-tidy. The source is at the bottom of this post. It's an ugly little program that does one thing: escapes & and < within a particular XML tag. It does this by scanning a text file one line at a time, looking for <foo>, where "foo" is whatever you specify on the command line. When it finds it, it replaces all occurrences of & and < with &amp; and &lt; (respectively) between that point and where it finds </foo>.

Like I said, it's ugly. It doesn't deal with attributes on the opening tag, it doesn't know jack about XML namespaces, and it doesn't replace the contents of more than one tag at a time (although it will do all tags with that name). Doing all that would have required a lot more than 70 lines of code. But what I have works against the SourceForge feed, although I have to run it several times, once for each tag that might have illegal content. I use wget and tracker-tidy together in this batch file:

wget http://sourceforge.net/export/sf_tracker_export.php?group_id=113273^&atid=665396 -O bugs.xml
copy /y bugs.xml input.xml
tracker-tidy summary input.xml > intermediate.xml
copy /y intermediate.xml input.xml
tracker-tidy detail input.xml > intermediate.xml
copy /y intermediate.xml input.xml
tracker-tidy old_value input.xml > intermediate.xml
copy /y intermediate.xml input.xml
tracker-tidy text input.xml > intermediate.xml
copy /y intermediate.xml bugs-tidied.xml

wget http://sourceforge.net/export/sf_tracker_export.php?group_id=113273^&atid=665399 -O rfe.xml
copy /y rfe.xml input.xml
tracker-tidy summary input.xml > intermediate.xml
copy /y intermediate.xml input.xml
tracker-tidy detail input.xml > intermediate.xml
copy /y intermediate.xml input.xml
tracker-tidy old_value input.xml > intermediate.xml
copy /y intermediate.xml input.xml
tracker-tidy text input.xml > intermediate.xml
copy /y intermediate.xml rfe-tidied.xml

del bugs.xml
del rfe.xml
del input.xml
del intermediate.xml

This simply downloads the tracker bug and feature request exports for FlexWiki and tidies up the output with multiple passes of tracker-tidy (one for each problem tag in the export).
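
The multiple passes could in principle be folded into one. Here is a rough sketch of that idea, not what tracker-tidy actually does: it swaps the line-by-line scan for a regular expression and reads the whole export into memory, which I'm assuming is fine for a tracker-sized file. The class name MultiTagTidy and the hard-coded tag list are made up for illustration, and it still ignores attributes and namespaces.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class MultiTagTidy
{
    // Escapes & and < inside every <name>...</name> pair whose name is listed
    // in tagNames. Like tracker-tidy, it assumes no attributes on the opening
    // tag and no nesting of same-named tags.
    public static string Tidy(string xml, params string[] tagNames)
    {
        // Escape each name so it is matched literally inside the pattern.
        string names = string.Join("|", Array.ConvertAll(tagNames, n => Regex.Escape(n)));

        // Group 1: opening tag, group 2: tag name, group 3: content (lazy),
        // group 4: the closing tag with the same name as group 2.
        string pattern = @"(<(" + names + @")>)(.*?)(</\2>)";

        // Singleline lets '.' match newlines, so multi-line content is covered.
        return Regex.Replace(
            xml,
            pattern,
            m => m.Groups[1].Value
                 + m.Groups[3].Value.Replace("&", "&amp;").Replace("<", "&lt;")
                 + m.Groups[4].Value,
            RegexOptions.Singleline);
    }

    public static void Main(string[] args)
    {
        // args[0] is the raw export; the tidied XML goes to standard output.
        string xml = File.ReadAllText(args[0]);
        Console.Write(Tidy(xml, "summary", "detail", "old_value", "text"));
    }
}
```

Compiled to, say, multi-tidy.exe (a hypothetical name), the four tracker-tidy passes for each export would collapse to a single "multi-tidy bugs.xml > bugs-tidied.xml".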

Once I have the tidied files, bugs-tidied.xml and rfe-tidied.xml, it's a simple matter to open them in Excel, InfoPath, or whatever your favorite XML-based tool is. From there, sorting and searching are dead easy. If I get really ambitious, one of these days I'll write an XSLT stylesheet to generate a nice little web report instead. But given how little effort went into getting something that works as well as this does, I sort of doubt it.
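
Before opening a tidied file in one of those tools, it can be worth checking that System.Xml really accepts it. The following is a minimal sketch of such a check, not part of tracker-tidy. The class name CheckWellFormed and the default file name are just for illustration, and it assumes the export has no DOCTYPE declaration, since XmlReader's default settings reject DTDs.

```csharp
using System;
using System.Xml;

public static class CheckWellFormed
{
    public static int Main(string[] args)
    {
        // Default to the tidied bug export if no file is given.
        string path = args.Length > 0 ? args[0] : "bugs-tidied.xml";
        try
        {
            using (XmlReader reader = XmlReader.Create(path))
            {
                // Reading every node forces the parser through the whole file;
                // any well-formedness error surfaces as an XmlException.
                while (reader.Read()) { }
            }
            Console.WriteLine("{0} is well-formed.", path);
            return 0;
        }
        catch (XmlException ex)
        {
            // Report where the parser gave up so the offending tag can be found.
            Console.Error.WriteLine("{0}: line {1}, position {2}: {3}",
                path, ex.LineNumber, ex.LinePosition, ex.Message);
            return 1;
        }
    }
}
```

If this reports an error inside a tag that isn't in the batch file yet, that tag probably needs its own tracker-tidy pass.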

Anyway, hopefully this will help someone else. Obviously, you'll need to replace the URLs in the batch file with the group_id and atid for your project, but other than that it should work fine. Just remember to escape any & signs in the URL with ^, the command-line escape character.

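using System;
using System.IO;
using System.Text;

namespace Wangdera
{
  public class App
  {
    // Usage: tracker-tidy <tag> <file>
    // Copies <file> to standard output, escaping & and < between <tag> and </tag>.
    public static int Main(string[] args)
    {
      if (args.Length != 2)
      {
        Console.Error.WriteLine("Usage: tracker-tidy <tag> <file>");
        return 1;
      }

      string tag = args[0];
      string startTag = string.Format("<{0}>", tag);
      string endTag = string.Format("</{0}>", tag);

      // The using block closes the file even if an exception is thrown.
      using (StreamReader inputReader = new StreamReader(args[1]))
      {
        string line;
        bool inTag = false;
        while ((line = inputReader.ReadLine()) != null)
        {
          int escapeStart = 0;
          int escapeEnd = line.Length;

          // Only look for the start tag when we aren't already inside one;
          // otherwise escaping continues from the beginning of the line.
          int tagStart = inTag ? -1 : line.IndexOf(startTag);
          if (tagStart != -1)
          {
            inTag = true;
            escapeStart = tagStart + startTag.Length;
          }

          // Search for the end tag only after the start tag, so that an
          // earlier </tag> on the same line can't yield a negative length.
          int tagEnd = line.IndexOf(endTag, escapeStart);
          if (tagEnd != -1)
          {
            escapeEnd = tagEnd;
          }

          if (!inTag)
          {
            Console.WriteLine(line);
          }
          else
          {
            string beginning = line.Substring(0, escapeStart);
            string middle = line.Substring(escapeStart, escapeEnd - escapeStart);
            string end = line.Substring(escapeEnd);
            Console.Write(beginning);
            Console.Write(Escape(middle));
            // WriteLine keeps the line break that ReadLine stripped off.
            Console.WriteLine(end);
          }

          if (tagEnd != -1)
          {
            inTag = false;
          }
        }
      }

      return 0;
    }

    // Replaces & first so the & in the generated &lt; isn't escaped again.
    // Entities already present in the input (such as &amp;) are not
    // recognized and will be double-escaped.
    public static string Escape(string input)
    {
      StringBuilder builder = new StringBuilder(input);
      builder.Replace("&", "&amp;");
      builder.Replace("<", "&lt;");
      return builder.ToString();
    }
  }
}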
