Monday, June 12, 2006

Announcing the MTPS Content Service

Over the last few months, you've seen me drop vague hints about the work I'm doing at MSDN. Well, today we launched it at a TechEd chalk talk (DEVTLC03), so I can finally talk about it: the MSDN/TechNet Publishing System (MTPS) Content Services.

 

In brief, the MTPS Content Services are a set of web services for exposing the content in MTPS. MTPS is the application I helped write a few years back that stores and processes all the content at MSDN2. With the web service, you now have programmatic access to all that data via SOAP. So if you want to embed access to the documentation for System.Xml.XmlTextReader into your application, go for it. If you want to know what the child nodes of System.DateTime.ToString() are in the table of contents, you can go and find that, too. I expect to see some fairly interesting uses of the service pop up in the near future. There's such a huge amount of good information in MTPS that I imagine lots of people will want to leverage it.

 

The web service is reasonably well-documented here (of course, I wrote the web service and the documentation, so maybe I'm not the best person to judge the quality of the docs), but let me give a brief explanation of how it works.

 

The web service consists of two operations: GetContent and GetNavigationPaths. GetContent - as you might imagine - allows you to retrieve content (XHTML, GIFs, etc.) from MTPS. GetNavigationPaths lets you get the table of contents (TOC) data for the items in the system. I imagine most people will use GetContent far more often than GetNavigationPaths.

 

The system is organized around the concept of a content item. A content item is a collection of documents identified collectively by a content key. A document has a type, a format, and some content. The document most people will probably be interested in is the one of type primary and format Mtps.Xhtml, but other documents can be associated with a content item too (for example, images can be stored in the content item). See the docs for more detail.

 

A content key consists of three parts: a content identifier, a locale, and a version. The locale is something like en-us (US English) or de-de (German as they speak it in Germany). The version is something like SQL.90 (SQL Server 2005).

 

The content identifier is a bit more complicated. It can be one of five things:

 


  1. A short ID. This is an eight-character identifier like "ms123401".

  2. A content alias. This is a "friendly name" for the content item, like "System.Xml.XmlTextReader".

  3. A content GUID. Topics can also be identified by a GUID.

  4. A content URL. To allow for easy integration with the HTML front end of MTPS, URLs like http://msdn2.microsoft.com/en-us/library/b8a5e1s5(VS.80).aspx can also be used to identify a content item.

  5. An asset ID. This is how topics are identified internally by the system, and asset IDs occasionally appear in the output. They always begin with "AssetId:".

 

With the exception of asset IDs, these are all the same pieces that you can already use in the URLs for MSDN2, so the concepts should be familiar if you've spent any time looking at that stuff.
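To make the content key and the five identifier kinds a bit more concrete, here is a minimal Python sketch. It is not part of the service or its proxy classes: `ContentKey`, `classify_identifier`, and both regular expressions are names and heuristics invented for illustration, and the GUID in the example is made up. The short ID pattern in particular is only an assumption (eight lowercase letters or digits), so an eight-character alias would be misclassified.

```python
import re
from dataclasses import dataclass

# Heuristic patterns; both are assumptions for illustration only.
GUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
SHORT_ID_RE = re.compile(r"^[a-z0-9]{8}$")


@dataclass(frozen=True)
class ContentKey:
    """A content key: content identifier plus locale plus version."""
    content_id: str
    locale: str
    version: str


def classify_identifier(content_id: str) -> str:
    """Guess which of the five identifier kinds a string is (heuristic only)."""
    if content_id.startswith("AssetId:"):
        return "asset ID"
    if content_id.lower().startswith(("http://", "https://")):
        return "content URL"
    if GUID_RE.match(content_id):
        return "content GUID"
    if SHORT_ID_RE.match(content_id):
        return "short ID"
    # Anything else is treated as a friendly name.
    return "content alias"


examples = [
    "ms123401",
    "System.Xml.XmlTextReader",
    "01234567-89ab-cdef-0123-456789abcdef",  # made-up GUID
    "http://msdn2.microsoft.com/en-us/library/b8a5e1s5(VS.80).aspx",
    "AssetId:msdn%7cMSDN_2E%7c%7eV2%5cMain.Menu.htm",
]
for identifier in examples:
    key = ContentKey(identifier, "en-us", "VS.80")
    print(classify_identifier(key.content_id), "<-", key.content_id)
```

The order of the checks matters: the asset ID and URL prefixes are tested first because they are unambiguous, and the alias is the fallback.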

 

There are two slightly funky (but highly intentional) things about what GetContent returns that you'll need to keep in mind. The first is that, by default, the bodies of the documents that make up a content item are not returned. Unless you list a document in the requestedDocuments section of the request message, you'll just get the types and formats of the available documents. This is because documents can be quite large, and it would be a waste to transmit all of them every time.
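The "no bodies unless you ask" rule can be modeled locally in a few lines of Python. This is a toy stand-in, not a client for the real service: `Document`, `STORED`, and this `get_content` function are hypothetical, and the "image" / "diagram.gif" type and format are placeholders I made up (only primary / Mtps.Xhtml comes from the description above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    doc_type: str
    doc_format: str
    body: Optional[str] = None  # None means "body not transmitted"


# Toy stand-in for the documents stored in one content item.
STORED = [
    Document("primary", "Mtps.Xhtml", "<div>topic text</div>"),
    Document("image", "diagram.gif", "GIF89a..."),  # placeholder type/format
]


def get_content(stored, requested=()):
    """List every document's type and format; include a body only if requested."""
    wanted = set(requested)
    return [
        Document(
            d.doc_type,
            d.doc_format,
            d.body if (d.doc_type, d.doc_format) in wanted else None,
        )
        for d in stored
    ]


# Default call: types and formats only.
for doc in get_content(STORED):
    print(doc.doc_type, doc.doc_format, doc.body is not None)

# Asking for the primary XHTML document returns its body as well.
for doc in get_content(STORED, [("primary", "Mtps.Xhtml")]):
    print(doc.doc_type, doc.doc_format, doc.body is not None)
```

In practice this suggests a two-step pattern: call once to see what documents exist, then call again listing only the ones you actually need.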

 

The other thing to be aware of is the idea of available versions and locales. If you send in a request for content item ms123401, locale en-us, version MSDN.10, you'll get back that content item, but you'll also receive a list that will tell you that the content item is also available for locale/version fr-fr/MSDN.10 and locale/version en-us/MSDN.20. This list is particularly valuable when the content key you request does not correspond to a known content item - in that case it represents the best guess by the MTPS system for reasonable alternatives.
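Here is a rough Python sketch of how a caller might think about that list, using the example values from the paragraph above. The `AVAILABLE` table and the `lookup` function are hypothetical, and the real service's "best guess" logic is not described here, so this toy simply returns every other locale/version pair it knows for the identifier.

```python
# Toy index of the (locale, version) pairs that exist for each short ID.
AVAILABLE = {
    "ms123401": [("en-us", "MSDN.10"), ("fr-fr", "MSDN.10"), ("en-us", "MSDN.20")],
}


def lookup(content_id, locale, version):
    """Return (found, alternatives) for a requested content key."""
    pairs = AVAILABLE.get(content_id, [])
    found = (locale, version) in pairs
    # Everything except the pair that was asked for counts as an alternative.
    alternatives = [pair for pair in pairs if pair != (locale, version)]
    return found, alternatives


# The key exists: the item is found and two alternatives are listed.
print(lookup("ms123401", "en-us", "MSDN.10"))

# The key does not exist: nothing is found, but the alternatives still help.
print(lookup("ms123401", "de-de", "MSDN.10"))
```

A client could use the second case to retry automatically, for example by falling back to en-us when the requested locale is missing.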

 

GetNavigationPaths has a few twists as well. First, there's the name. We seriously considered calling it GetToc, but it's not exactly TOC data, since it's used for other things, like that little trail of links (sometimes called "breadcrumbs" or the "eyebrow") at the top of MSDN2 pages. What it really returns is all the ways to navigate between two content items. Hence, GetNavigationPaths.

 

GetNavigationPaths accepts two content keys. In this case, the identifier in the keys must be a short ID. (If you need to, you can resolve a short ID from an alias, a GUID, a URL or an asset ID via a call to GetContent first.) The first key identifies the root, which is the content item you'd like to start at, and the second key identifies the target, which is the content item you'd like to wind up on.

 

What you get back is a list of navigation paths between the root and the target. There might be more than one path, because a content item can appear in more than one place in the TOC. A navigation path is a list of navigation nodes, where each navigation node is made up of a title, a navigation node key, and a content node key. There's also some information about something called phantoms, but I'll defer that to the docs.

 

The title is fairly self-explanatory, but the distinction between a navigation node key and a content node key is somewhat less intuitive…I had to have it explained to me more than a few times when I was writing the system. Basically, it arises out of the fact that every node in the TOC is itself a separate content item in the system, whose content consists of a reference to the content item that TOC node represents and a list of child nodes. So the navigation node key is a content key (identifier plus version plus locale) that represents the TOC node itself, and the content node key identifies the content item the TOC node corresponds to. You can tell the difference between the two because the content item identified by the navigation node key will always have a primary document of format "Mtps.Toc".

 

Another way to look at it is that the navigation node key tells you where you are in the left hand tree of MSDN2, and the content node key tells you what goes in the right hand content pane.
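The two-key idea, and the reason there can be more than one path, can be sketched with a toy TOC in Python. Everything here is invented for illustration: the `TocNode` class, the `navigation_paths` function, and all the IDs and titles. It also assumes the tree has no cycles and that the target is matched by its content ID, which may not be how the real service resolves it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TocNode:
    nav_id: str          # identifies the TOC node itself (the left-hand tree)
    content_id: str      # identifies the topic it points at (the right-hand pane)
    title: str
    children: tuple = ()


# Made-up TOC in which "topic-x" appears under two different sections.
TOC = {
    "toc-root": TocNode("toc-root", "topic-root", "Library", ("toc-a", "toc-b")),
    "toc-a": TocNode("toc-a", "topic-a", "Section A", ("toc-x1",)),
    "toc-b": TocNode("toc-b", "topic-b", "Section B", ("toc-x2",)),
    "toc-x1": TocNode("toc-x1", "topic-x", "Topic X"),
    "toc-x2": TocNode("toc-x2", "topic-x", "Topic X"),
}


def navigation_paths(toc, root_nav_id, target_content_id):
    """Depth-first search for every path from the root TOC node to the target topic."""
    paths = []

    def walk(nav_id, trail):
        node = toc[nav_id]
        trail = trail + [node]
        if node.content_id == target_content_id:
            paths.append(trail)
        for child in node.children:
            walk(child, trail)

    walk(root_nav_id, [])
    return paths


for path in navigation_paths(TOC, "toc-root", "topic-x"):
    print(" > ".join(node.title for node in path))
# prints:
# Library > Section A > Topic X
# Library > Section B > Topic X
```

Each printed line is essentially a breadcrumb trail: the nav_id values say where you are in the tree, while the shared content_id says that both trails end on the same topic.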

 

Like I said, I don't expect as many people to use GetNavigationPaths as to use GetContent, so I wouldn't lose too much sleep over the details. Of course, if you do wind up using it (or any part of the system), I'd love to hear about it, or about how we could make it or the documentation better.

 

This was a very interesting system to write for a variety of reasons, but I think I'll save the "how" for another post. I've also got what I think is a pretty cool application of the service that (at least some) people are really going to like. More on that later, too.

 

We consider the system to be roughly in beta, as we already know several things we need to improve or change. That said, we feel good enough about it to turn the world loose on it. If you come up with any cool ideas about how to use the service, or ideas about how we could improve it, drop a comment here.

29 comments:

  1. "The web service is reasonably well-documented here": The link is not accessible. Is that an MS internal link?

  2. Whoops! Link typo. Fixed. Thanks!

  3. Craig Andera has been working with my team on a set of web services exposing MSDN2 content hosted by...

  4. I like this. A lot. MSDN now has a web API into the MSDN and TechNet Publishing System (MTPS) MSDN2 (not...

  5. Microsoft has released an API that allows direct access to MSDN via a web service. A description of the service...

  6. A few Microsoft-related news items of interest that occurred over the past week: The stories of Bill


  7. Announcing the MTPS Content Service.

    Here’s an announcement from Craig Andera regarding how...

    ReplyDelete
  8. Here is yet another project I didn't know existed: the presentation, in the form of a web...

  9. After Craig Andera blogged about MTPS Content Service, Ian decided it will be cool to have a WPF App...

  10. I have only just read the post: made public on the 12th (no, really, I'm not behind on...

  11. Why do I always get 0 when calling GetNavigationPaths()?

    ContentService proxy = new ContentService();

    navigationKey root = new navigationKey();
    // This is the root of the MSDN English TOC tree - you'll
    // probably specify this as your root most often.
    root.contentId = "ms123401";
    root.locale = "en-us";
    root.version = "MSDN.10";

    navigationKey target = new navigationKey();
    target.contentId = "abhtw0f1";
    target.locale = "en-us";
    target.version = "VS.80";

    getNavigationPathsRequest request = new getNavigationPathsRequest();
    request.root = root;
    request.target = target;

    getNavigationPathsResponse response = proxy.GetNavigationPaths(request);

    Console.WriteLine("There were {0} paths between the root and the target.",
        response.navigationPaths.Length);
    int pathNumber = 1;

    // Loop through all the resultant navigation paths
    foreach (navigationPath path in response.navigationPaths)
    {
        Console.WriteLine("Path number {0}", pathNumber++);

        int nodeNumber = 1;

        // For each path, loop over all the navigationPathNodes
        foreach (navigationPathNode node in path.navigationPathNodes)
        {
            // For each node, print out the relevant information
            Console.WriteLine("Node number {0}: ", nodeNumber++);
            Console.WriteLine("Title: {0}", node.title);
            Console.WriteLine("Navigation node: {0}, {1}, {2}",
                node.navigationNodeKey.contentId,
                node.navigationNodeKey.locale,
                node.navigationNodeKey.version);
            Console.WriteLine("Content node: {0}, {1}, {2}",
                node.contentNodeKey.contentId,
                node.contentNodeKey.locale,
                node.contentNodeKey.version);
            //Console.WriteLine("Is Phantom? {0}", node.isPhantom); // does not work
        }
    }

  12. Sorry, in the above post I meant that response.navigationPaths.Length is always 0. Where am I going wrong? Thanks!

  13. Yeah, sorry for the confusion: you should be using "ms310241" as the root ID. That's the Mtps.Toc root node - "ms123401" is the corresponding Mtps.Xhtml root node. GetNavigationPaths needs the TOC one.



    This is an error in the documentation that's entirely my fault. I'll get it fixed soon.



    Sorry again!

  14. thanks, it works!

  15. We are in the process of migrating the Office Developer Center and all the MSDN Library Office developer content to MSDN2! I love to blog about anything related to Office development, but this time I will share with you some information related with MSDN that you might find interesting.

  17. I have only just read the post: made public on the 12th (no, really, I'm not behind on my

  18. A few Microsoft-related news items of interest that occurred over the past week: The stories of Bill

  19. Hi, I have some code that brings back the Xhtml documents I want, but for the imageDocument collection, while I get the name and type, I cannot get the image data back.

    Can you please help?

  20. You're going to have to be more specific about what you want and what you've tried - I don't understand your question.

  21. How might one get a link for a generic type by type name? I've tried (for example):

    System.Collections.Generic.Dictionary
    System.Collections.Generic.Dictionary
    System.Collections.Generic.Dictionary`2

    No dice. Any thoughts?

  22. Did you try retrieving it by short ID (xfhwa508) and looking at the info that comes back? Might give you some clue - I have no idea off the top of my head. I'll ask someone, but you can try that in the meantime.

  24. I did try that. It returns a response, but the contentAlias field (which is where I'm assuming that friendly name alias should be) is blank.

    I'm really hoping not to have to write a manual lookup table. Heh.

  25. OK, I've sent off a question to someone at MSDN. Will post here if I get an answer.

  26. Thanks so much!

    If you need to reach me via email, I'm at chris at programcsharp dot com.

  27. When I request 'mtps.annotations' documents by calling 'GetContent', I always get something like this (for example, for "dd831853", "en-us", "VS.100"):

    EXCEPTION:
    System.Web.Services.Protocols.SoapException
    MESSAGE:
    Server was unable to process request. ---> There was an error generating the XML document. ---> Reference to undeclared entity 'nbsp'. Line 9, position 346.
    -----


    And sometimes something like this (for example, for "AssetId:msdn%7cMSDN_2E%7c%7eV2%5cMain.Menu.htm", "en-us", "msdn.10"):
    EXCEPTION:
    System.Web.Services.Protocols.SoapException
    MESSAGE:
    content identifier not found.

    Also, a few weeks ago I found a misspelled word, "messsage" (with three s's), on this page: "http://services.msdn.microsoft.com/ContentServices/ContentService.asmx?wsdl"

    Are these problems related?

  28. So, it turns out that generics do not get aliases. The < in the name complicated the URL mapping the first time around, so the MSDN team elected to leave them out. There's a chance that they'll address it in the future, but for now: no aliases.

  29. @tsuui: Hmm. Good question. It has literally been years since I worked on that code. I will forward your question to the MSDN team and post here if I find anything out.
