SIP 101 – What’s it for?

SIP is a term that has lost its true technical meaning in the hands of many non-technical writers. Most of the time, this doesn’t matter. Who cares if someone says “SIP carries voice”? It is much like someone who says “Big Ben” about the iconic sight on the London skyline and means the clock tower rather than the bell. Be still, learned scholars; we know what they meant!

However, I felt it might be helpful to empower the non-technical among us by demystifying some of the technical terms and uncovering their true meanings.

Let’s take the biggest offender first – Session Initiation Protocol or SIP for short.

This term, often seen in the phrase “SIP Trunking”, is sometimes used quite generically to refer to the transmission of voice communication over IP-based networks (like the Internet), otherwise known as VoIP (Voice over IP). VoIP, however, should be used as the more generic, non-technical term because SIP is something quite specific.

The clue is in its full name. SIP can be considered a set of instructions two parties exchange to set up, route (find a path between the parties) and maintain communication. It’s often called signalling: it negotiates how the audio or video data will be communicated, rather than carrying the stream itself.
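To make the “set of instructions” idea a little more concrete, here is a minimal Python sketch that assembles the text of a SIP INVITE, the request one party sends to ask another to start a call. The addresses, tag, branch and Call-ID values are all made up for illustration, and a real phone would send more headers and usually a body describing the media on offer.

```python
def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Return the text of a bare-bones SIP INVITE request (illustrative only)."""
    lines = [
        # Request line: the method, who we want to reach, and the protocol version.
        f"INVITE sip:{callee} SIP/2.0",
        # Hypothetical host name and branch value for this example.
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK-demo",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=demo-tag",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # SIP is text-based: each header ends with CRLF and a blank line ends the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_invite("alice@example.com", "bob@example.org", "demo-call-1")
print(message)
```

Notice that nothing in the message is audio. It only describes who wants to talk to whom, which is exactly the signalling role described above.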

Those who know me appreciate that I love analogies. I’ve always used one for SIP that is even more topical in our geopolitically fractious world – the international meeting. Let’s consider two trade delegations from different countries, each with its distinct language and cultural traditions.

They agree on a protocol so that communication can be as successful as possible. The parties might set rules such as the language and currencies used, the number of participants from each state, dress code, food and drink served, and even what NOT to mention. This is most of what SIP does: rules are negotiated and agreed to set a clear path for audio (or video) transmission. As in the diplomatic analogy, the negotiation protocol is separate from the actual content of the discussion.

Unlike the world of international politics, this is done in fractions of a second and unnoticed by the user, so the conflation of SIP, VoIP, RTP, Media and other terms is perhaps excusable. To help clear up some of the confusion, I’ve provided some of those definitions below.

SIP Trunk

This is a hangover from the old world of telecoms, where “trunk” referred to a bundle of phone lines shared by the users of a PBX (more on that later) or a collection of telephones. Today, a “SIP Trunk” is often bought when connecting a PBX to a VoIP provider to replace traditional ISDN or analogue lines. It usually allows several “channels”; the number of channels is the maximum number of concurrent calls that can take place between the PBX and the outside world.

SIP URI (Session Initiation Protocol Uniform Resource Identifier)

Much like in the traditional world of telephony, where you have a telephone number that identifies another person or service you want to call, the SIP URI is an identifier that allows one user to contact another. You can think of it as a bit like a “SIP phone number”. It often looks a bit like an email address. Those with skills and control over their IT systems often make their SIP URI and email address appear the same, although they do very different jobs.
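As a rough illustration, the Python sketch below splits a SIP URI into its parts. The address used is invented for this example, and the parser is deliberately simplified: it ignores anything after a “;” or “?” and does not handle IPv6 addresses.

```python
def parse_sip_uri(uri: str) -> dict:
    """Split a SIP URI into scheme, user, host and port (simplified sketch)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    # Drop any ;parameters or ?headers to keep this sketch small.
    address = rest.split(";", 1)[0].split("?", 1)[0]
    # The user part is optional; everything after the last "@" is host[:port].
    user, at, hostport = address.rpartition("@")
    host, _, port = hostport.partition(":")
    return {
        "scheme": scheme.lower(),
        "user": user if at else None,
        "host": host,
        "port": int(port) if port else None,
    }


# Hypothetical address, chosen only to show the email-like shape.
print(parse_sip_uri("sip:alice@example.com"))
```

The “sip:” prefix is what tells a system that this is a calling address rather than a mailbox, even when the rest looks identical to an email address.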

User Agent

From a typical user perspective, this is an endpoint on a network; that is to say, usually a physical or sometimes software-based phone that sends and receives SIP messages. A “User Agent” can play two roles. Firstly, as a UAC (User Agent Client), it sends SIP requests, such as an invitation to start a call. Secondly, as a UAS (User Agent Server), it receives such requests and returns a reply, often accepting, rejecting, or setting parameters for a call.
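The two roles can be sketched in a few lines of Python. This is a toy model rather than a real SIP stack: the class names and the “busy” flag are my own inventions, although the three reply lines use genuine SIP status codes.

```python
from dataclasses import dataclass


@dataclass
class Request:
    method: str   # e.g. "INVITE"
    target: str   # who the request is addressed to


class UserAgentServer:
    """The receiving role: looks at a request and returns a reply."""

    def __init__(self, busy: bool = False):
        self.busy = busy

    def handle(self, request: Request) -> str:
        if request.method != "INVITE":
            # This toy server only understands call invitations.
            return "SIP/2.0 405 Method Not Allowed"
        if self.busy:
            return "SIP/2.0 486 Busy Here"
        return "SIP/2.0 200 OK"


class UserAgentClient:
    """The sending role: creates a request and hands it to a server."""

    def call(self, server: UserAgentServer, target: str) -> str:
        return server.handle(Request("INVITE", target))


phone_a = UserAgentClient()
print(phone_a.call(UserAgentServer(), "bob@example.org"))
print(phone_a.call(UserAgentServer(busy=True), "bob@example.org"))
```

In practice a single phone contains both roles: it acts as the client when you dial out and as the server when someone calls you.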


Session

When parties agree to participate in a VoIP call, they set parameters for the call, such as quality, source and destination IDs or any restrictions. These attributes are usually maintained for the rest of the call unless updated by either side. The information about call parameters is retained as a session; that is, an agreement about a set of attributes held over some time (a call).

It would waste bandwidth and processing power if every packet of Real-Time Media (more on this later) required a complete set of call parameters to be continuously retransmitted. Instead, a “session” is given a Call ID for the overall call period. Each party involved stores the complete session attributes locally, only referring to the session by a shortened identifier after the initial agreement. This makes ongoing communication faster and more efficient.
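Here is a small Python sketch of that idea: the full details are stored once, and afterwards the Call ID alone is enough to find them. The function names and stored attributes are hypothetical, and a real SIP implementation tracks rather more than a Call ID to identify a call.

```python
# Hypothetical in-memory store: Call-ID -> agreed session attributes.
sessions = {}


def start_session(call_id, **attributes):
    """Store the full set of agreed parameters once, at the start of the call."""
    sessions[call_id] = dict(attributes)


def attributes_for(call_id):
    """Later messages only need to quote the Call-ID to find the details."""
    return sessions[call_id]


def end_session(call_id):
    """Forget the session when the call finishes."""
    sessions.pop(call_id, None)


start_session(
    "demo-call-1",
    caller="alice@example.com",
    callee="bob@example.org",
    codec="PCMU",
)
print(attributes_for("demo-call-1")["codec"])
end_session("demo-call-1")
```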

PBX (Private Branch eXchange)

PBX usually refers to the telephone system used, primarily by businesses or other large organisations, to manage internal and external telephone calls. Historically, vast amounts of cabling connected internal telephone handsets to this system, which in turn managed connections to the outside world via phone lines provided by telecoms providers.

Today the concept is the same, but externally, multiple traditional phone lines have been superseded by Internet connections and SIP Trunks. Internally, it’s increasingly common for Wireless transmission to be deployed to connect phones and computers.

The vPBX (or virtual PBX) allows the traditional power-hungry appliances often found in basements to be replaced by smaller computer servers running multiple applications. It’s even possible today to move PBX Systems into public and private clouds entirely, thanks to modern VoIP and Telecoms providers.

RTP (Real-time Transport Protocol)

The digital representation of voice and video is transmitted across networks in RTP “streams”. The packets of data in these streams need to be transmitted and received extremely quickly due to the nature of human conversation. We all know how jarring a delayed line can be on an international phone call. Even 1-2 seconds of delay can lead to confusion and a poor general experience. Email or text-based chat, by contrast, can tolerate a short delay without anyone noticing, so voice and video needed a special protocol.

RTP is typically used along with SIP, where the latter sets the rules discussed earlier and makes/cancels a call request, but the former carries the actual voice and video data involved from each party in the call.
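For the curious, every RTP packet begins with a small fixed header of 12 bytes, followed by the audio or video itself. The Python sketch below unpacks the main header fields; the sample packet is fabricated for this example, and the optional extras a real packet may carry are ignored.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Read the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet starts with a 12-byte header")
    # Network byte order: two single bytes, a 16-bit and two 32-bit integers.
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,            # top two bits; 2 for current RTP
        "marker": bool(second >> 7),      # top bit of the second byte
        "payload_type": second & 0x7F,    # which codec the payload uses
        "sequence": sequence,             # lets the receiver spot lost packets
        "timestamp": timestamp,           # lets the receiver play audio in time
        "ssrc": ssrc,                     # identifies the sender of the stream
    }


# A made-up packet: version 2, payload type 0, then 160 bytes of dummy audio.
sample = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD) + b"\x00" * 160
print(parse_rtp_header(sample))
```

The sequence number and timestamp are what allow the receiving side to reorder packets and play them back smoothly, which is why RTP suits real-time media.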


Media

In our context, this term is relatively generic. It refers to the communication content between parties, such as the voice or video. It is often used to differentiate an area of technology being discussed, especially in troubleshooting situations. For example, an engineer might suggest that a problem is either related to signalling or media, that is to say, SIP or RTP, respectively, to follow the examples above.


Codec

Codec is a combination of the words coder/decoder. For voice or video to be transmitted across computer networks, the raw audio or visuals need to be digitised, that is, converted from natural analogue signals into digital data. This process requires digital encoding on the side making a transmission, for example, from a microphone’s electrical signals, followed by decoding on the receiving side, where the data is converted back into electrical impulses and ultimately vibrational sound through a speaker.

When setting up a VoIP call, a part of the SIP negotiation agrees on which “Codec” to use, and that choice dictates the quality and bandwidth of the call. Simply put, different Codecs are better at different things, be that efficient use of network bandwidth, higher quality or a balance of both. It’s worth remembering that developers often license their Codec software, so the hardware or software provider often pays for this when implementing the technology.
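The heart of that agreement can be sketched very simply in Python: one side lists the Codecs it would like in order of preference, and the other picks the first one it also supports. The function name is my own and the real negotiation carries more detail than this, but the Codec names in the example are genuine.

```python
def choose_codec(offered, supported):
    """Return the first offered codec the other side supports, or None."""
    for codec in offered:
        if codec in supported:
            return codec
    # No codec in common: the call cannot carry media as offered.
    return None


# The caller prefers Opus, then G.722, then G.711 (PCMU).
caller_offer = ["opus", "G722", "PCMU"]
# A hypothetical older desk phone that only knows two codecs.
callee_supports = {"PCMU", "G722"}

print(choose_codec(caller_offer, callee_supports))
```

If the two lists share nothing, there is no common “language” for the audio, which is one reason most equipment supports at least one widely used Codec.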

RFC 3261

This one sounds worse than it is, although it is thankfully used sparingly in conversation. Simply put, it is the reference number for the complete technical definition of what SIP is, in the form of a document published by the IETF (Internet Engineering Task Force). This organisation defines many of the standards that make our modern technologies work today. RFC means “Request for Comments”. The name suggests a working document, but a published RFC is never edited; instead, the skilled engineers involved publish later RFCs that update or replace it.

I hope this post has dispelled many misunderstandings about SIP and has gone some way to improve understanding of the opaque terminology found within the technology. Of course, there is much more to SIP and VoIP in general, but I’ll leave those articles for our experienced engineers in the future.