You know the basics of how VoIP works: Voice signals convert to digital data that’s broken into packets and sent over the Internet or any TCP/IP network. But you may be confused by all of the protocols you hear about in connection with VoIP. What are the differences among them? How do they interact? Why are there so many? Here’s a look at common protocols used in VoIP communications.

Call-signaling protocols

The most frequently referenced VoIP protocols are the call-signaling protocols. VoIP networks use these protocols to locate the device at the other end of the communication and then negotiate the exchange between the sending and receiving devices.

Here are the two most often used call-signaling protocols:

  • Session Initiation Protocol (SIP), defined by the Internet Engineering Task Force (IETF)
  • H.323, defined by the International Telecommunication Union (ITU)

These two protocols do essentially the same job, and most VoIP devices use one or the other. Under the hood, they establish a VoIP connection in different ways: SIP messages are ASCII text, and H.323 messages are binary-encoded.

Although H.323 was by far the more popular at first — and many feel it’s superior in its ability to work with the public switched telephone network (PSTN) and to transmit video — SIP has become increasingly popular due to support from the devices of many VoIP vendors. Many users also find SIP to be easier to deploy.


SIP is an application-layer protocol that provides a means for identifying the calling and called parties, authenticating the caller and recipient, and forwarding calls. SIP addresses play the same role that phone numbers play on the PSTN, but they look a little like e-mail addresses.
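Here’s a minimal sketch of how software might pick apart a SIP address. It assumes the general sip:user@host form, optionally followed by a port; the address sip:alice@example.com and the names SipAddress and parse_sip_address are made up for illustration. Real SIP addresses can also carry passwords, parameters, and IPv6 hosts, which this sketch ignores.

```python
from typing import NamedTuple, Optional


class SipAddress(NamedTuple):
    """Parts of a simple SIP address (hypothetical helper type)."""
    user: str
    host: str
    port: Optional[int]


def parse_sip_address(address: str) -> SipAddress:
    """Split a simple sip:user@host[:port] address into its parts."""
    scheme, sep, rest = address.partition(":")
    if not sep or scheme.lower() != "sip":
        raise ValueError(f"not a SIP address: {address!r}")

    # Like an e-mail address, the user and host are separated by "@".
    user, at, hostport = rest.partition("@")
    if not at or not user or not hostport:
        raise ValueError(f"expected user@host in {address!r}")

    # An optional port follows the host after a colon.
    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon else None
    return SipAddress(user, host, port)


print(parse_sip_address("sip:alice@example.com"))
```

The resemblance to e-mail is visible in the code: apart from the sip: prefix and the optional port, the parsing is the same user-at-host split.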

Users register their addresses with SIP servers called registrars, and the caller sends a SIP request to the server. Users can send SIP messages over either TCP or User Datagram Protocol (UDP).
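Because SIP is text-based, a registration request is easy to sketch. The example below only builds the bytes of a bare-bones REGISTER message and doesn’t send anything. The user, domain, contact host, and Call-ID values are placeholders, and reusing one token for the branch, tag, and Call-ID is a simplification for illustration; a real client generates these separately and adds authentication.

```python
def build_register(user: str, domain: str, contact_host: str,
                   call_id: str, cseq: int = 1) -> bytes:
    """Build a minimal SIP REGISTER request as ASCII bytes (sketch only)."""
    address_of_record = f"sip:{user}@{domain}"
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",                      # request line
        f"Via: SIP/2.0/UDP {contact_host};branch=z9hG4bK{call_id}",
        "Max-Forwards: 70",
        f"To: <{address_of_record}>",
        f"From: <{address_of_record}>;tag={call_id}",
        f"Call-ID: {call_id}@{contact_host}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{user}@{contact_host}>",               # where to reach the user
        "Content-Length: 0",
        "",                                                    # blank line ends the headers
        "",
    ]
    # SIP lines end with CRLF, much like HTTP.
    return "\r\n".join(lines).encode("ascii")


message = build_register("alice", "example.com", "192.0.2.10", "abc123")
print(message.decode("ascii"))
```

The resulting bytes could be handed to either a UDP or a TCP socket, since the message text doesn’t change with the transport apart from the Via header.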

You can put links to SIP addresses in a Web page or other HTML document so users can click them to place a voice call. For a detailed discussion of how SIP works, check out this reference.


H.323 is a suite made up of a number of different protocols that work together, each performing a specific task. Some members of the suite include:

  • H.225.0, which establishes the connection
  • H.332, used for large conferences
  • H.235, which provides security and authentication
  • H.245, which negotiates channel usage
  • RAS, which handles registration, admission, and status messages

For a complete list of the H.323 protocols and to see what each one does, check out this reference.

Gateway protocols

A gateway, in its generic sense, is a device that provides an interface between two types of networks. A VoIP gateway connects an IP-based network to the PSTN or to a regular analog phone. VoIP gateways have two parts:

  • The media gateway controller (MGC), also known as the soft switch
  • The media gateway (MG)

Another set of protocols, called device control protocols, separate the call control logic from the media processing logic in VoIP gateways. Examples of these protocols include:

  • Media Gateway Control Protocol (MGCP)
  • H.248 (also known as Media Gateway Control, or Megaco)

Request for Comments (RFC) 3435 defines MGCP. It uses a call agent that directs and controls the MG and signaling gateway. Multiple call agents create fault tolerance. The MGC uses MGCP to find the locations and capabilities of the VoIP endpoints.
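Like SIP, MGCP is a text protocol, so a call agent’s command to a gateway can be sketched in a few lines. The example below formats a CreateConnection (CRCX) command in the general shape RFC 3435 describes: a header line with the verb, a transaction ID, the endpoint name, and the protocol version, followed by parameter lines. The endpoint name, transaction ID, and call ID are invented for illustration, and build_mgcp_command is a hypothetical helper, not part of any library.

```python
from typing import Dict


def build_mgcp_command(verb: str, transaction_id: int, endpoint: str,
                       params: Dict[str, str]) -> str:
    """Format an MGCP command as text (illustrative sketch only)."""
    if len(verb) != 4 or not verb.isalpha():
        raise ValueError("MGCP verbs are four-letter codes such as CRCX")

    # Header line: verb, transaction ID, endpoint, protocol version.
    header = f"{verb.upper()} {transaction_id} {endpoint} MGCP 1.0"

    # Each parameter goes on its own line as "code: value".
    lines = [header] + [f"{code}: {value}" for code, value in params.items()]
    return "\n".join(lines) + "\n"


command = build_mgcp_command(
    "CRCX",                      # CreateConnection
    1204,                        # placeholder transaction ID
    "aaln/1@gw.example.net",     # placeholder endpoint on the media gateway
    {
        "C": "A3C47F21456789F0",  # call ID (placeholder)
        "L": "p:10, a:PCMU",      # local connection options
        "M": "recvonly",          # connection mode
    },
)
print(command)
```

The point of the sketch is the division of labor: the call agent decides what should happen and sends short commands, while the gateway that receives them does the media work.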

The IETF uses the name Megaco, and the ITU uses H.248 to refer to the same protocol. The two organizations developed the protocol through a joint effort. This outgrowth of MGCP provides remote control of VoIP gateways and other session-aware devices. MGCP and Megaco are similar, but Megaco supports more types of networks, including ATM networks.

VoIP networks built on a centralized architecture typically use Megaco and MGCP; the MGC/call agent is the centralized device that communicates with the media gateways. Networks that rely on a distributed architecture use SIP and H.323.

For more information about MGCP and Megaco and to learn how they work, read this article.

Real-time Transport Protocol (RTP) and related protocols

Once the MG extracts the voice signal from the PSTN circuit, RTP carries it across the TCP/IP network. RTP is a standard for transmitting audio and video over IP networks. RFC 3550 defines it, and it works in conjunction with SIP or H.323. A VoIP call uses two RTP streams, one going in each direction.
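To make the packet format concrete, here’s a sketch that packs and unpacks the fixed 12-byte RTP header laid out in RFC 3550. It assumes the simplest case (no padding, no header extension, no contributing sources), and the function names and sample values are illustrative. Payload type 0 is the usual static assignment for PCMU audio.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type: int, sequence: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Pack the fixed 12-byte RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6                              # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # M bit + payload type
    return struct.pack("!BBHII", byte0, byte1,
                       sequence & 0xFFFF,                 # 16-bit sequence number
                       timestamp & 0xFFFFFFFF,            # 32-bit media timestamp
                       ssrc & 0xFFFFFFFF)                 # 32-bit stream identifier


def unpack_rtp_header(data: bytes) -> dict:
    """Read the fixed RTP header fields back out of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


header = pack_rtp_header(payload_type=0, sequence=1, timestamp=160, ssrc=0x1234ABCD)
print(len(header), unpack_rtp_header(header))
```

Each direction of a call would have its own SSRC and its own sequence numbers, which is one way to see why a call involves two separate RTP streams.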

RTP typically uses high-numbered ports (16384 to 32767), but there’s no standard port for RTP communications. RTP itself also doesn’t provide quality of service (QoS). RTP works with the RTP Control Protocol (RTCP), which provides the control information for the RTP communications; RTP handles the transmission of the data itself. RTCP can collect information (packets sent, packets lost, etc.) to report QoS issues.
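As an example of the kind of information the RTP control protocol reports, here’s a sketch of the "fraction lost" figure a receiver can compute for each reporting interval. It follows the approach in the sample code of RFC 3550: the loss is expressed as an 8-bit fixed-point value, so 256 would mean everything was lost. The function name and the packet counts are assumptions chosen for illustration.

```python
def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """Return the loss for one reporting interval as an 8-bit fixed-point value."""
    lost_interval = expected_interval - received_interval

    # Nothing expected, or duplicates made "lost" negative: report zero loss.
    if expected_interval == 0 or lost_interval <= 0:
        return 0

    # Scale the ratio lost/expected by 256 and keep the integer part.
    return (lost_interval << 8) // expected_interval


# 100 packets expected, 90 received in this interval.
value = fraction_lost(100, 90)
print(value, f"{value * 100 / 256:.1f}% lost")
```

A monitoring tool could watch this value over time and flag a call when it stays high, which is how loss statistics turn into a QoS report even though RTP itself guarantees nothing.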

The Secure Real-time Transport Protocol (SRTP) provides encryption, authentication, and integrity for RTP data. Secure RTCP (SRTCP) provides the same security services to RTCP. SRTP and SRTCP use the Advanced Encryption Standard (formerly known as Rijndael), adopted by the U.S. government to replace the Data Encryption Standard.

Proprietary protocols

Not all VoIP implementations use the standard protocols. Skype and some other VoIP services use proprietary protocols. The Skype protocols operate in a peer-to-peer setup instead of the client-server configuration used by most VoIP clients. Because Skype’s code is closed source, it’s difficult to get information about its protocols and how they work.

You may also hear about Skinny, or Skinny Client Control Protocol (SCCP), a proprietary protocol that Cisco uses for communications between its Call Manager (an H.323 proxy) and its VoIP phones. The H.323 proxy uses SCCP to communicate with Skinny clients.


It’s easy to get confused when trying to sort out the maze of protocols used for VoIP communications, but understanding the protocols is the first step toward understanding how VoIP works — and what implementation will work best for your organization.

Deb Shinder is a technology consultant, trainer, and writer who has authored a number of books on computer operating systems, networking, and security. She currently specializes in security issues and Microsoft products, and she has received Microsoft’s Most Valuable Professional (MVP) status in Windows Server Security.
