The W3C SOAP (Simple Object Access Protocol) specification rests on a number of other W3C specifications and on terminology typical of communications programmers. The specification language can be an obstacle if you’re just trying to understand SOAP. In this article, I present the core SOAP ideas expressed in a simple Web page and a Perl CGI program. The syntax is familiar (and wrong, in the sense that it is not real SOAP syntax), but the ideas are right. Learning the ideas first and the syntax later is an easy way to go.
What is SOAP?
SOAP, which the W3C has also worked on under the name XMLP (for XML Protocol), provides a standard way for two programs to exchange information. Such a standard is a fundamental necessity if separate organizations are to cooperate electronically.
There are many ways to exchange messages, including e-mail, chat, and Remote Procedure Calls (RPC). E-mail and chat messages aren’t generally computer-friendly. E-mail headers are computer readable, but the typed content is not understood by a silicon brain. The same applies to chat. RPC, on the other hand, is computer readable but not human readable.
Computers can parse XML reliably. SOAP describes how to bundle messages into XML. It also describes who-sends-what-message-where-and-when. That is why it’s called a protocol. SOAP does not replace e-mail protocols (SMTP), RPC (sockets and IDL), or Web protocols (HTTP). Instead, it uses those systems as the transport for its messages.
Standard form submission under the microscope
The sample files included in Listing A and Listing B (Simple.html and Simple.cgi) are about as basic as anything you’ll ever see.
The CGI program is just as simple: Read the form data, process it, and return the HTML page. The only oddity is that the arguments have been collected into a %args hash before use. In this simple case, that’s unnecessary, but the point is that all the form data can be put into one Perl data structure.
Let's examine this form submission from a protocol point of view. An HTTP POST request carries the form data to the server. The form data consists of name=value pairs. An HTTP response carries back the replacement Web page. The Web page is a document of any type, in this case type “text/html.”
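The request and response described above can be sketched in a few lines. This is an illustration in Python rather than the Perl of the listings, and the field names (name, quantity) are made up for the example; the real ones come from the NAME attributes in Simple.html.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical form fields; not taken from the article's listings.
form_fields = {"name": "Ada", "quantity": "3"}

# The body of the HTTP POST request: name=value pairs joined by "&".
request_body = urlencode(form_fields)

# Server side: decode the pairs into one data structure,
# much as the CGI program collects them into its %args hash.
args = dict(parse_qsl(request_body))

# The response is simply a document with a declared type.
response_headers = {"Content-Type": "text/html"}
response_body = "<html><body>Hello, {}</body></html>".format(args["name"])

print(request_body)   # name=Ada&quantity=3
print(response_body)
```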
None of the protocol content is very standard. The request content (form data) just contains whatever the NAME attributes of the form elements are. If the form elements change, the submitted data changes to match—the two are linked. Also, you can’t send two pieces of data with the same name without confusion, so it’s inflexible. The returned document, on the other hand, is too flexible; it can contain any rubbish.
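The duplicate-name problem is easy to reproduce. The sketch below, again in Python and with invented field names, decodes the same request body two ways: into a flat dictionary, where the second value silently replaces the first, and into lists, where both survive but every consumer has to know to expect a list.

```python
from urllib.parse import parse_qs, parse_qsl

# Hypothetical request body with two fields named "item".
body = "item=apple&item=pear&qty=2"

# Flat structure: the later "item" overwrites the earlier one.
flat = dict(parse_qsl(body))

# List-valued structure: nothing is lost, but every value is now a list.
multi = parse_qs(body)

print(flat)   # {'item': 'pear', 'qty': '2'}
print(multi)  # {'item': ['apple', 'pear'], 'qty': ['2']}
```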
It’s not a very clean messaging system either. The page that sent the original message gets replaced, instead of just sending and receiving what it needs. The CGI program can return nothing (a 204 “No Content” response), but then the response can’t carry a message back to the browser. From a pure messaging point of view, it’s all a bit unstructured, with too many side effects.
Reworked submission with SOAP-isms
Consider the reworked Simple.html, now called Complex.html (Listing C), with two frames, Hidden.html (Listing D) and Visible.html (Listing E). Simple.cgi is now Complex.cgi (Listing F). It’s still pretty simple. The service to the end user is unchanged. All that has happened is that the messages to and from the server have been cleaned up to mimic SOAP.
Some changes are necessary but trivial. Complex.html (Listing C) now has two forms. Form 1 (Listing D) has all the text fields as before. Form 2 (Listing E) has the Submit button and a hidden field. When Form 2 submits, any response will be loaded into the "fhidden" frame. In fact, it will be a variation on the initial Hidden.html. This allows Visible.html to sit before the viewer without disappearing. That is SOAP-like. SOAP does its messaging behind the scenes and doesn’t replace anything.
In Listing F (Complex.cgi), the sent Perl data structure is dug out. No matter what it contains (as long as it’s a Perl literal), the magic eval() line will extract that data. That is very SOAP-like: the server can handle any structured data. How it handles the data depends on whether it recognizes the fields in the structure. In this simple example, Complex.cgi can work only if the content is understood. In SOAP terms, the request implies that mustUnderstand is set to 1 (true). The server must understand the message, or it will fail. In real SOAP, the constraint applies only to header blocks on which you set this flag explicitly.
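The same idea can be sketched in Python. This is not the code from Listing F; it is an analogue that uses ast.literal_eval, which accepts only literals, in place of Perl's eval(). The set of known fields and the shape of the reply are assumptions made for the illustration.

```python
import ast

# Hypothetical set of fields this server knows how to process.
KNOWN_FIELDS = {"name", "quantity"}

def handle_request(request_text):
    """Extract a literal data structure from one field and act on it."""
    try:
        data = ast.literal_eval(request_text)  # literals only, no code execution
    except (ValueError, SyntaxError):
        return {"status": "fault", "reason": "request is not a literal"}
    if not isinstance(data, dict):
        return {"status": "fault", "reason": "request is not a structure"}
    # Implicit "mustUnderstand": any unrecognized field makes the request fail.
    unknown = sorted(set(data) - KNOWN_FIELDS)
    if unknown:
        return {"status": "fault", "reason": "not understood: " + ", ".join(unknown)}
    return {"status": "ok", "greeting": "Hello, {}".format(data.get("name", "stranger"))}

print(handle_request("{'name': 'Ada', 'quantity': 3}"))
print(handle_request("{'colour': 'red'}"))
```

The first call succeeds because every field is recognized; the second fails because the server has no idea what to do with colour.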
Stepping into SOAP territory
Now, let’s SOAP-ify the example. When the request is sent, it is in one form field called request. In SOAP, this is called the envelope: the outside boundary of the sent message. So use a SOAP envelope instead of a one-field form. In this simplified syntax, an envelope is just an <envelope> content </envelope> tag pair; real SOAP uses a namespace-qualified Envelope element with the content inside a Body element.
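For comparison, here is roughly what wrapping the same data in a real envelope looks like, sketched in Python with the standard library. The envelope namespace is the SOAP 1.2 one; the application namespace, the request element, and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://example.com/orders"                  # hypothetical application namespace

ET.register_namespace("env", SOAP_ENV)
ET.register_namespace("m", APP_NS)

def build_envelope(fields):
    """Wrap a flat dict of fields in an Envelope/Body pair."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    request = ET.SubElement(body, "{%s}request" % APP_NS)
    for name, value in fields.items():
        ET.SubElement(request, "{%s}%s" % (APP_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

xml_text = build_envelope({"name": "Ada", "quantity": 3})
print(xml_text)
```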
Download the files covered in this article
The files are available in one zip file: perlsoap.zip.
Next, you have to get agreement on who mustUnderstand what. So, every time a message is sent, include in it extra data: a URL to a resource that describes the message’s format. Perhaps that resource is a DTD or an XML Schema document. Now the recipient always knows what it is that must be understood without guessing. Then the recipient can say with total confidence, “I don’t know what to do” (or more likely, “I do know what to do”).
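In real SOAP, the flag lives on individual header blocks, and each block is identified by its XML namespace URI. A recipient's check might look like the Python sketch below. The sample message, the example.com namespaces, and the set of understood namespaces are all invented for the illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
MUST_UNDERSTAND = "{%s}mustUnderstand" % SOAP_ENV

# Hypothetical: the namespaces this recipient knows how to process.
UNDERSTOOD_NAMESPACES = {"http://example.com/orders"}

def blocks_not_understood(envelope_xml):
    """Return header blocks flagged mustUnderstand that this recipient cannot handle."""
    root = ET.fromstring(envelope_xml)
    header = root.find("{%s}Header" % SOAP_ENV)
    if header is None:
        return []
    rejected = []
    for block in header:
        flagged = block.get(MUST_UNDERSTAND) in ("true", "1")
        # ElementTree writes qualified names as "{namespace}local".
        namespace = block.tag[1:].split("}")[0] if block.tag.startswith("{") else ""
        if flagged and namespace not in UNDERSTOOD_NAMESPACES:
            rejected.append(block.tag)
    return rejected

SAMPLE = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header>
    <o:format xmlns:o="http://example.com/orders" env:mustUnderstand="true">v1</o:format>
    <p:priority xmlns:p="http://example.com/unknown" env:mustUnderstand="true">high</p:priority>
    <t:trace xmlns:t="http://example.com/trace">abc</t:trace>
  </env:Header>
  <env:Body/>
</env:Envelope>"""

print(blocks_not_understood(SAMPLE))  # ['{http://example.com/unknown}priority']
```

The trace block is also unknown to this recipient, but it carries no flag, so it can be ignored safely. Only the flagged priority block forces a failure.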
Finally, if you want, you can take away the browser. Just use two XML server programs talking to each other for business-to-business reasons. Keep the use of the HTTP request/response pairs, though, because SOAP often relies on them to do the grunt work even when no browser is involved.
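A browser-free exchange can be sketched with nothing but the Python standard library: one small HTTP server, one client, and an XML document in each direction. The handler below is a stand-in that returns a fixed empty envelope, not a real SOAP service; the class and function names are made up for this example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SOAP_CONTENT_TYPE = "application/soap+xml; charset=utf-8"  # SOAP 1.2 media type

class EnvelopeHandler(BaseHTTPRequestHandler):
    """Hypothetical service: reads the posted XML and replies with an empty envelope."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # a real service would parse this request
        reply = (b'<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">'
                 b'<env:Body/></env:Envelope>')
        self.send_response(200)
        self.send_header("Content-Type", SOAP_CONTENT_TYPE)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def exchange(envelope_xml):
    """Start a local server, POST one envelope to it, and return (status, reply text)."""
    server = HTTPServer(("127.0.0.1", 0), EnvelopeHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request = urllib.request.Request(
            "http://127.0.0.1:%d/" % server.server_port,
            data=envelope_xml.encode("utf-8"),  # supplying data makes this a POST
            headers={"Content-Type": SOAP_CONTENT_TYPE},
        )
        # Bypass any configured proxy so the request stays on the local machine.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as response:
            return response.status, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(exchange("<env:Envelope xmlns:env='http://www.w3.org/2003/05/soap-envelope'/>"))
```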
SOAP is a protocol for passing messages. The most common arrangement for SOAP messages is a request/response pair, basically a question and answer. HTML form submission is also a request/response pair. SOAP is to form submission a bit like Microsoft Windows is to DOS. DOS does a really basic job, which is great if that’s all you want. Windows is more complicated, but once you’ve learned the detail, it’s more powerful too. You can mimic SOAP concepts with plain HTML forms, but for the real thing, learn the SOAP syntax and buy some three-phase power tools.