Angle Brackets are Dead. Long Live XML.
If you take a look at the XML Information Set specification, you will find this near the top:
“This specification defines an abstract data set called the XML Information Set (Infoset). Its purpose is to provide a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document.”
http://www.w3.org/TR/xml-infoset/
XML is so commonplace today that it is easy to overlook the importance of this statement. One of the core concepts behind XML is the idea of a logical infoset that says nothing about angle brackets and the textual representation that you and I will instantly recognize as an XML document. The logical infoset is little more than a tree of nodes, which theoretically can be represented in any number of ways. Understanding this is one of the keys to understanding WCF at a deeper level than the rest of the world.
If you’ve worked with XML in the .NET Framework, you know that there are multiple ways to deal with XML. .NET 1.0 gave us the most commonly used XML classes: the XmlDocument, the XmlReader and the XmlWriter.
If you take a look at the methods exposed by the XmlWriter class, you will notice a lot of what appear to be helper methods. Methods like WriteBase64 and WriteValue take non-text arguments like byte arrays and dates and encode the values into XML representations. For instance, take a look at the following code:
using System.IO;
using System.Xml;

using (Stream output = File.Create("output.txt"))
{
    using (XmlWriter xmlWriter = XmlWriter.Create(output))
    {
        xmlWriter.WriteStartElement("rawBytes");
        // 1000-byte buffer: "Hello World!" in ASCII followed by zero padding
        byte[] rawBytes = new byte[1000];
        byte[] data = new byte[] { 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33 };
        data.CopyTo(rawBytes, 0);
        xmlWriter.WriteBase64(rawBytes, 0, rawBytes.Length);
        xmlWriter.WriteEndElement();
    }
}
Executing this code will produce the following text in the output XML document:
<rawBytes>SGVsbG8gV29ybGQhAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==</rawBytes>
No surprises here. Our data is base 64 encoded, just as the method name implies. However, as I said before, the XML infoset is logical and doesn’t define an encoding. What if I had a lot of binary data that I wanted to send to an endpoint, and I didn’t want to pay the overhead of base 64 encoding (four output characters for every three input bytes, so our 1000-byte array became 1336 characters)? Maybe I would want to write that binary data out without any encoding to save space.
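To put a number on that overhead, here is a small sketch of the arithmetic. The class and method names (Base64Overhead, EncodedLength) are made up for illustration; Convert.ToBase64String is the standard .NET call, used here only to cross-check the formula.

```csharp
using System;

public static class Base64Overhead
{
    // Base 64 emits 4 characters for every group of 3 input bytes,
    // padding the final group with '=' when it is incomplete.
    public static int EncodedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Cross-check the formula against the framework's own encoder.
    public static bool MatchesFramework(int byteCount)
    {
        string encoded = Convert.ToBase64String(new byte[byteCount]);
        return encoded.Length == EncodedLength(byteCount);
    }
}
```

For the 1000-byte buffer used above, EncodedLength(1000) is 1336, which is roughly a third more than the raw data.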
As it turns out, there is a specification for doing exactly that, known as MTOM (Message Transmission Optimization Mechanism). MTOM borrows some ideas from e-mail, which has been doing this type of thing forever with attachments, and brings the same concept to the XML world. MTOM splits a message up into “MIME parts”, each of which can be encoded a different way. One part of the message contains the plain XML, and the binary data is moved to its own part, where we don’t have to worry about special characters and encoding. By telling .NET to create an MTOM XML writer instead, we can see how this works:
using System.IO;
using System.Text;
using System.Xml; // XmlDictionaryWriter lives in the System.Runtime.Serialization assembly

using (Stream output = File.Create("output.txt"))
{
    // Same calls as before; only the writer implementation changes
    using (XmlWriter xmlWriter = XmlDictionaryWriter.CreateMtomWriter(output, Encoding.UTF8, Int32.MaxValue, "text/xml"))
    {
        xmlWriter.WriteStartElement("rawBytes");
        byte[] rawBytes = new byte[1000];
        byte[] data = new byte[] { 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33 };
        data.CopyTo(rawBytes, 0);
        xmlWriter.WriteBase64(rawBytes, 0, rawBytes.Length);
        xmlWriter.WriteEndElement();
    }
}
Executing this code produces the following output:
MIME-Version: 1.0
Content-Type: multipart/related;type="application/xop+xml";boundary="4f98bb98-f37b-412a-b90e-dbe8f939d214+id=1";start="<http://tempuri.org/0/634011443993914940>";start-info="text/xml"

--4f98bb98-f37b-412a-b90e-dbe8f939d214+id=1
Content-ID: <http://tempuri.org/0/634011443993914940>
Content-Transfer-Encoding: 8bit
Content-Type: application/xop+xml;charset=utf-8;type="text/xml"

<rawBytes><xop:Include href="cid:http%3A%2F%2Ftempuri.org%2F1%2F634011443993924705" xmlns:xop="http://www.w3.org/2004/08/xop/include"/></rawBytes>
--4f98bb98-f37b-412a-b90e-dbe8f939d214+id=1
Content-ID: <http://tempuri.org/1/634011443993924705>
Content-Transfer-Encoding: binary
Content-Type: application/octet-stream

Hello World!
--4f98bb98-f37b-412a-b90e-dbe8f939d214+id=1--
Notice that our XML chunk in the first part of the document has been replaced with a special “XOP” reference node which instructs the recipient that the data can be found in a MIME attachment, and the binary data is no longer base 64 encoded. This looks nothing like the first example and is certainly not the type of XML document you are used to seeing on a daily basis; however, it represents the same logical infoset as the first example. If we had an MTOM reader available (which we do), we could read this document and work with its content the same way as any other XML document.
So let’s extend this idea a little bit. Let’s suppose that we wanted to represent a raw binary file using an XML infoset. How would we represent it? The basic problem should be obvious: most files aren’t XML documents and most files aren’t encoded using MIME parts. But remember that XML and MIME are nothing more than different ways to encode the infoset. If it’s possible to write the binary data as a MIME part, there is no reason we can’t just write the binary data and exclude all the MIME wrappers. To do that, we can create a simple XmlWriter implementation (we’ll leave a lot of the methods unimplemented since our schema doesn’t support anything but our raw binary format):
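public class RawXmlWriter : XmlWriter
{
    private readonly Stream _stream;

    public RawXmlWriter(Stream stream)
    {
        _stream = stream;
    }

    // "Base 64" content goes straight to the stream with no encoding at all
    public override void WriteBase64(byte[] buffer, int index, int count)
    {
        _stream.Write(buffer, index, count);
    }

    // The element produces no output; we only check that it is the one our schema allows
    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (localName != "rawBytes")
        {
            throw new InvalidOperationException("This XML writer only supports raw binary data");
        }
    }

    public override void WriteEndElement()
    {
    }

    public override void Close()
    {
        _stream.Close();
    }

    // The remaining abstract members of XmlWriter are omitted here; each one
    // must still be overridden before this class will compile.
}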
We leave the rest of the methods unimplemented, since we are creating a writer that supports a very limited schema. Now we change our code to use our RawXmlWriter:
using (Stream output = File.Create("output.txt"))
{
    using (XmlWriter xmlWriter = new RawXmlWriter(output))
    {
        xmlWriter.WriteStartElement("rawBytes");
        byte[] rawBytes = new byte[1000];
        byte[] data = new byte[] { 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33 };
        data.CopyTo(rawBytes, 0);
        xmlWriter.WriteBase64(rawBytes, 0, rawBytes.Length);
        xmlWriter.WriteEndElement();
    }
}
And if we take a look at the file on disk, we see the raw bytes from our array written to disk. Creating a RawXmlReader would allow us to read in any file from disk and use its data anywhere we can use an XmlReader.
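For anyone who wants to compile the raw writer, here is one possible way to fill in the members the article leaves out. This is a sketch, not the original author's code: the class name StubbedRawXmlWriter, the Unsupported helper and the decision to throw NotSupportedException are all assumptions. It also tracks WriteState, because as far as I can tell XmlWriter's Dispose reads that property before calling Close, so a stub that throws there would break the using block.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical, compilable variant of the RawXmlWriter shown earlier.
public class StubbedRawXmlWriter : XmlWriter
{
    private readonly Stream _stream;
    private WriteState _state = WriteState.Start;

    public StubbedRawXmlWriter(Stream stream) { _stream = stream; }

    // The only members the "schema" allows: one element holding binary content.
    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        if (localName != "rawBytes")
            throw new InvalidOperationException("This XML writer only supports raw binary data");
        _state = WriteState.Element;
    }
    public override void WriteBase64(byte[] buffer, int index, int count)
    {
        _stream.Write(buffer, index, count); // raw bytes, no encoding
        _state = WriteState.Content;
    }
    public override void WriteEndElement() { _state = WriteState.Content; }

    // Housekeeping members that must not throw, since Dispose relies on them.
    public override WriteState WriteState { get { return _state; } }
    public override void Flush() { _stream.Flush(); }
    public override void Close() { _state = WriteState.Closed; _stream.Close(); }
    public override string LookupPrefix(string ns) { return null; }

    // Everything else falls outside the schema and is rejected.
    private static Exception Unsupported()
    {
        return new NotSupportedException("Only <rawBytes> with binary content is supported");
    }
    public override void WriteStartDocument() { throw Unsupported(); }
    public override void WriteStartDocument(bool standalone) { throw Unsupported(); }
    public override void WriteEndDocument() { throw Unsupported(); }
    public override void WriteDocType(string name, string pubid, string sysid, string subset) { throw Unsupported(); }
    public override void WriteFullEndElement() { throw Unsupported(); }
    public override void WriteStartAttribute(string prefix, string localName, string ns) { throw Unsupported(); }
    public override void WriteEndAttribute() { throw Unsupported(); }
    public override void WriteCData(string text) { throw Unsupported(); }
    public override void WriteComment(string text) { throw Unsupported(); }
    public override void WriteProcessingInstruction(string name, string text) { throw Unsupported(); }
    public override void WriteEntityRef(string name) { throw Unsupported(); }
    public override void WriteCharEntity(char ch) { throw Unsupported(); }
    public override void WriteWhitespace(string ws) { throw Unsupported(); }
    public override void WriteString(string text) { throw Unsupported(); }
    public override void WriteSurrogateCharEntity(char lowChar, char highChar) { throw Unsupported(); }
    public override void WriteChars(char[] buffer, int index, int count) { throw Unsupported(); }
    public override void WriteRaw(char[] buffer, int index, int count) { throw Unsupported(); }
    public override void WriteRaw(string data) { throw Unsupported(); }
}
```

Swapping this class into the using block above, with a MemoryStream as the target, should leave exactly the bytes passed to WriteBase64 in the stream and nothing else.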
So what does this have to do with WCF? Well, this is exactly how WCF works under the covers to support multiple forms of message encoding while using infosets to represent the data. When using bindings such as net.tcp, different implementations of XmlReader and XmlWriter are plugged in, which use an optimized binary representation of the XML infoset that significantly reduces the overhead imposed by a standard text encoder. When sending data using the REST features of WCF, the WebMessageEncoder uses the same strategy we employed in the final example to read and write raw binary data.
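The optimized representation can be tried outside of WCF as well. The sketch below assumes that the .NET binary XML format exposed by XmlDictionaryWriter.CreateBinaryWriter is representative of what the binary message encoder puts on the wire; the names BinaryInfosetDemo, Encode and Decode are made up for illustration. It writes the same rawBytes element as the earlier examples and reads it back through a matching reader.

```csharp
using System;
using System.IO;
using System.Xml; // XmlDictionaryWriter/Reader ship in the System.Runtime.Serialization assembly

public static class BinaryInfosetDemo
{
    // Writes <rawBytes>payload</rawBytes> using the .NET binary XML format.
    public static byte[] Encode(byte[] payload)
    {
        MemoryStream buffer = new MemoryStream();
        using (XmlDictionaryWriter writer = XmlDictionaryWriter.CreateBinaryWriter(buffer))
        {
            writer.WriteStartElement("rawBytes");
            writer.WriteBase64(payload, 0, payload.Length);
            writer.WriteEndElement();
        }
        // ToArray still works after the writer has closed the stream.
        return buffer.ToArray();
    }

    // Reads the same infoset back; the calling code never sees the wire format.
    public static byte[] Decode(byte[] encoded)
    {
        using (XmlDictionaryReader reader =
            XmlDictionaryReader.CreateBinaryReader(encoded, XmlDictionaryReaderQuotas.Max))
        {
            reader.ReadStartElement("rawBytes");
            byte[] payload = reader.ReadContentAsBase64();
            reader.ReadEndElement();
            return payload;
        }
    }
}
```

The writing code is identical to the text and MTOM versions apart from the factory method, and the encoded result should come out noticeably smaller than the base 64 text form because the bytes are stored without a text encoding.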
The abstract XML infoset is a powerful concept. By always remembering that the infoset itself is logical, we can enable our application to read or write any format imaginable with a common interface and a simple tree representation.


This is awesome. One time I overrode ToString() to look up its value from a central web server. That way I knew who was using my code and that they were using it right. That's why I'm a big fan of you overriding WriteBase64 and completely ignoring the need for it to actually be base64. Brilliant!
I also like the check for "rawBytes". Some people get mad at me when I override Equals so that it fails unless you put in a magic string. I'm glad that WCF uses the same logic I do.
I used to agree with OldSchool until I developed some interfaces that were completely broken by WCF sp1. I have a lot of experience writing enterprise code and for 6 years I worked for a very large software company in the Pacific Northwest you have heard of.
I can tell you a bunch of devs there would not use WCF because of this. I’ve also studied the WCF code base a little and I am horrified by the lack of if statements. One of the first things I look for in a code base is if statements because that means there is probably error handling.
Finally, you’ll notice BizTalk is not written in WCF. BizTalk is easily one of the best products Microsoft has ever created. Sure they added some WCF adapters but the core is not WCF because WCF doesn’t scale and it’s not as intuitive as the BizTalk Adapters and Pipelines which are enterprise strength because they are built with COM and use COM features like Transactions.
This is also a great example of how design patterns produce better frameworks. The XmlReader/Writer apply the "Builder" pattern which turned out to be very useful.