WebSockets is an advanced technology that makes it possible to open an interactive communication session between the user’s browser and a server. With this API, you can send messages to a server and receive event-driven responses without having to poll the server for a reply. – Mozilla Foundation
Before going into WebSockets, I would like to talk about how WebSockets came into being and why they are such a revolutionary piece of technology. To understand that, we need to look at how systems were built previously and the problems they faced.
Client Server Communications
How does a client communicate with a server?
Well, there are hundreds of protocols available, including application protocols like HTTP, FTP, SMTP and POP3, and lower-level transport protocols like TCP and UDP (common usage: games).
I’m not going to go into a discussion about each of these protocols here, but out of all these, HTTP (and later HTTPS) became, and still is, the go-to standard for MOST applications. On top of HTTP, some other protocols and standards were built, the most common being REST and SOAP.
SOAP used to be the standard, but now REST is becoming the go-to solution for many simple web applications. This is because:
- SOAP is a heavy protocol. Not only does it carry the required data, it also carries instructions and rules on how the client and server should communicate. This means higher latency, more bandwidth usage, and longer processing times, which matters especially since the world is moving towards mobile, where processing power is a scarce resource.
- SOAP uses XML, meaning once again the size of the data transferred is much larger than it needs to be, and processing takes longer.
There are a number of scenarios where SOAP is better than REST; however, the general rule of thumb is: unless you have a solid reason to use SOAP, use REST. If you want to read more on the REST vs SOAP debate, check out this post.
So, now we have REST, the standard in a great number of web, mobile and desktop applications. If you want to know more about REST, there are plenty of resources online with in-depth tutorials. Choose one related to the language you’re familiar with and get started.
Now, the principles of REST are (on a basic level) easy enough to understand and use. You send an HTTP request, specifying an endpoint and some headers, and you get a response back from the server. You can also send some data along with the request. This works great about 90% of the time. However, what happens when:
- The data you request is not immediately available: This could be for any number of reasons, for example because some processing needs to be done first (common when requesting a report). What normally happens in a REST service is that the server returns an indication that the data is not available, along with some identifier for the task being processed; some advanced services also return a ‘Retry-After’ time estimate. The client then has to keep polling the server until the server responds with the data.
- Getting data in real time: Take the classic chat application as an example. If client 1 sends a message, how will client 2 know that client 1 has sent a message? Once again, the answer is polling. The other clients have to keep polling the server, or else they will not get the message which has been sent.
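To make the cost of polling concrete, here is a minimal sketch of such a polling loop. Everything in it is an assumption for illustration: `fetch` is a hypothetical callable standing in for the HTTP request, the 200/202 status convention is just one common way a service might signal "ready" versus "still processing", and the fake endpoint simulates a server instead of calling a real one.

```python
import time


def poll_until_ready(fetch, max_attempts=10, sleep=time.sleep):
    """Call fetch() repeatedly until it reports that the data is ready.

    fetch is a hypothetical callable returning (status, body, retry_after):
    status 200 means the data is in body, 202 means "still processing".
    Returns (body, number_of_requests_made).
    """
    for attempt in range(1, max_attempts + 1):
        status, body, retry_after = fetch()
        if status == 200:
            return body, attempt
        # Not ready yet: wait for the server's hint (or 1 second) and ask again.
        sleep(retry_after if retry_after is not None else 1)
    raise TimeoutError("data was not ready after %d attempts" % max_attempts)


def make_fake_report_endpoint(ready_after):
    """Simulate a server whose report becomes available on the Nth request."""
    calls = {"count": 0}

    def fetch():
        calls["count"] += 1
        if calls["count"] < ready_after:
            return 202, None, 0  # still processing, retry immediately
        return 200, {"report": "done"}, None

    return fetch


if __name__ == "__main__":
    body, attempts = poll_until_ready(make_fake_report_endpoint(3))
    print(body, "after", attempts, "requests")
```

Every request before the last one is a wasted round trip, and a chat client would have to run a loop like this forever just in case a message arrives.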
Now what if, instead of the client polling the server for data, the server could push the data to the client? Sadly, this is not supported by REST over plain HTTP. The server cannot initiate communication with a client, and most of the time a client request is not kept open for long periods of time (this depends on the language, the server container and the hardware).
And now, finally, we come to WebSockets, one of the biggest breakthroughs in client/server communication in the past few years. WebSockets introduce a part of the web that was missing: full-duplex communication. Or, in simpler terms, simultaneous bi-directional communication with very low overhead.
This means that the client no longer has to keep polling the server for data; the server can instead ‘push’ data back to a connected client.
Now, WebSockets aren’t the first attempt at real-time web communication. There have been a variety of technologies like LiveConnect or the forever-frame technique; however, WebSockets solve a few of the main issues that those earlier approaches did not.
- WebSockets account for proxies and firewalls, making streaming possible over any connection. This was a major issue with some of the previous attempts. The WebSocket protocol simply detects any proxies and creates a tunnel.
- Support upstream and downstream communications over a single connection.
- Less burden on servers, allowing existing machines to support more concurrent connections.
How Websockets Work
WebSockets use a single TCP connection for traffic in both directions. Earlier bidirectional techniques used HTTP as a transport layer to benefit from existing infrastructure (proxies, filtering, authentication). Such technologies were implemented as trade-offs between efficiency and reliability because HTTP was not initially meant to be used for bidirectional communication. WebSockets address the goals of those existing bidirectional HTTP technologies in the context of the existing HTTP infrastructure; as such, the protocol is designed to work over HTTP ports 80 and 443 as well as to support HTTP proxies and intermediaries, even if this implies some complexity specific to the current environment. This does not mean, however, that the protocol is bound to HTTP: future implementations could use a simpler handshake over a dedicated port.
Creating a Connection
The handshake is an ordinary HTTP request, so that it is compatible with HTTP-based server-side software and intermediaries; this means a single port can be used by both HTTP clients and WebSocket clients communicating with the server.
Initially the client will send an HTTP upgrade request to the server. This is basically a normal HTTP request except for the following headers, in no significant or particular order:
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Version: 13
Please note that a real request is not strictly limited to these WebSocket-specific headers.
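As a rough illustration of the upgrade request, the sketch below assembles one by hand. The function name, the `host`, `path` and `origin` parameters and the example values are assumptions made for this example; the header names and the 16-byte random, base64-encoded key follow RFC 6455. In practice a WebSocket library builds this for you.

```python
import base64
import os


def build_handshake_request(host, path="/", origin=None):
    """Return (request_bytes, key) for a WebSocket opening handshake."""
    # The key is a random 16-byte nonce, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Upgrade: websocket",
        "Connection: Upgrade",
        "Sec-WebSocket-Key: %s" % key,
        "Sec-WebSocket-Version: 13",
    ]
    if origin is not None:
        lines.append("Origin: %s" % origin)
    # Each header line ends with CRLF and an empty line ends the header block.
    request = "\r\n".join(lines) + "\r\n\r\n"
    return request.encode("ascii"), key


if __name__ == "__main__":
    request, key = build_handshake_request("example.com", "/chat", "http://example.com")
    print(request.decode("ascii"))
```

The client has to remember the key it sent, because it needs it later to verify the server's answer.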
The server will then process each of the headers and return a response to the client.
The response will have a status code of 101 (Switching Protocols).
The response also carries a few WebSocket-specific headers.
Key things to note:
The status code is what determines whether the handshake completed successfully. Any status other than 101 indicates the handshake DID NOT complete, and normal HTTP semantics apply.
Apart from this, the server will respond with the necessary ‘Connection’ and ‘Upgrade’ headers.
The Sec-WebSocket-Key and Sec-WebSocket-Accept headers let the server prove that it actually received and understood the WebSocket handshake. This guards against cross-protocol tricks, such as a script sending a carefully crafted request with XMLHttpRequest or a form submission. The server takes the Sec-WebSocket-Key sent by the client and appends a fixed Globally Unique Identifier (GUID), 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, to it. For the sample key above we are left with this string: dGhlIHNhbXBsZSBub25jZQ==258EAFA5-E914-47DA-95CA-C5AB0DC85B11.
The string is then hashed using SHA-1 (look here to see why this might change) and the hash is then base64-encoded, which results in: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
This is then returned to the client in the Sec-WebSocket-Accept header. The client performs the same computation on the key it sent and compares the result with the header. If the Sec-WebSocket-Accept value does not match the expected value, if the header field is missing, or if the HTTP status code is not 101, the connection will not be established, and WebSocket frames will not be sent.
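The Sec-WebSocket-Accept computation is small enough to sketch with the standard library. The function names and the assumption that response headers arrive as a dict with lower-cased names are mine, not part of any particular library; the GUID, SHA-1 and base64 steps are the ones defined in RFC 6455.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    combined = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(combined).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_succeeded(status_code, headers, sent_key):
    """Client-side check of the server's handshake response.

    headers is assumed to be a dict with lower-cased header names.
    """
    if status_code != 101:
        return False
    accept = headers.get("sec-websocket-accept")
    return accept is not None and accept == compute_accept(sent_key)


if __name__ == "__main__":
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

Feeding in the sample key dGhlIHNhbXBsZSBub25jZQ== produces s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, the same value as in the text.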
Data is transferred between the client and the server using a series of frames.
Every frame sent by the client MUST be masked. This is mainly to avoid issues with proxies (which are a HUGE part of the internet).
A server MUST close the connection if it receives an unmasked frame, and it MAY respond with a Close frame carrying status code 1002 (protocol error).
A server MUST NOT mask frames sent to a client; if the client receives a masked frame, it MUST close the connection.
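The text above does not spell out how masking works, so the following sketch relies on RFC 6455: the client picks a random 4-byte masking key and XORs each payload byte with the key byte at position `i % 4`. The function names are made up for this example.

```python
import os


def apply_mask(payload, masking_key):
    """XOR every payload byte with the 4-byte key, repeating the key.

    Masking is its own inverse, so the same function also unmasks.
    """
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


def mask_payload(payload):
    """Client side: pick a fresh random key and return (key, masked_payload)."""
    key = os.urandom(4)
    return key, apply_mask(payload, key)


if __name__ == "__main__":
    key, masked = mask_payload(b"Hello")
    print(masked.hex(), "unmasks to", apply_mask(masked, key))
```

Because XOR is symmetric, the server recovers the original payload by calling the same routine with the key that travels in the frame header.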
Control frames are used to communicate state about the WebSocket and can even be interjected in the middle of a fragmented message.
Control frames are identified by opcodes where the most significant bit of the opcode is 1. Currently defined opcodes for control frames include 0x8 (Close), 0x9 (Ping), and 0xA (Pong). Opcodes 0xB-0xF are reserved for further control frames yet to be defined.
All control frames MUST have a payload length of 125 bytes or less and MUST NOT be fragmented.
Data frames carry application-layer data, with an opcode to define what kind of data the frame contains (binary or text so far).
Data frames are identified by opcodes where the most significant bit of the opcode is 0. Currently defined opcodes for data frames include 0x1 (Text), 0x2 (Binary).
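The opcode rules above can be checked from the first two bytes of a frame. The sketch below is one way to do it; the function name and the returned dictionary are invented for illustration, and the bit layout (FIN flag and opcode in the first byte, mask flag and 7-bit length in the second) is taken from RFC 6455. Opcode 0x0 (continuation) is not mentioned in the text above but is defined by the RFC for fragmented messages.

```python
CONTROL_OPCODES = {0x8: "close", 0x9: "ping", 0xA: "pong"}
# 0x0 (continuation) comes from RFC 6455; the text above lists only 0x1 and 0x2.
DATA_OPCODES = {0x0: "continuation", 0x1: "text", 0x2: "binary"}


def describe_frame_header(first_byte, second_byte):
    """Decode the first two bytes of a frame and enforce control-frame rules."""
    fin = bool(first_byte & 0x80)          # final fragment of a message?
    opcode = first_byte & 0x0F             # low four bits
    masked = bool(second_byte & 0x80)      # set on client-to-server frames
    length_field = second_byte & 0x7F      # 126 and 127 signal an extended length
    is_control = bool(opcode & 0x8)        # most significant opcode bit is 1
    if is_control:
        if not fin:
            raise ValueError("control frames must not be fragmented")
        if length_field > 125:
            raise ValueError("control frame payload must be 125 bytes or less")
    names = CONTROL_OPCODES if is_control else DATA_OPCODES
    return {
        "fin": fin,
        "opcode": opcode,
        "kind": names.get(opcode, "reserved"),
        "is_control": is_control,
        "masked": masked,
        "length_field": length_field,
    }


if __name__ == "__main__":
    print(describe_frame_header(0x81, 0x05))  # unfragmented, unmasked text frame
```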
Before sending data the client should check if the connection state is OPEN. If the data is too large then the data may be sent in a series of frames.
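Splitting a large message into a series of frames could look roughly like this. The function and its `max_fragment_size` parameter are hypothetical; the rule it follows comes from RFC 6455 rather than from the text above: the first frame carries the real opcode, later frames use the continuation opcode 0x0, and only the last frame has the FIN flag set.

```python
def fragment_message(payload, opcode, max_fragment_size):
    """Split payload into (fin, opcode, chunk) tuples, one per frame."""
    if max_fragment_size <= 0:
        raise ValueError("max_fragment_size must be positive")
    chunks = [
        payload[i:i + max_fragment_size]
        for i in range(0, len(payload), max_fragment_size)
    ] or [b""]  # an empty message still needs one (empty) frame
    frames = []
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        # Only the first frame names the data type; the rest are continuations.
        frame_opcode = opcode if index == 0 else 0x0
        frames.append((is_last, frame_opcode, chunk))
    return frames


if __name__ == "__main__":
    for frame in fragment_message(b"HelloWorld!", 0x1, 4):
        print(frame)
```

A real sender would still have to serialize each tuple into a frame header plus a (masked, if sent by a client) payload.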
Closing a Connection
A connection is closed by sending a control frame with an opcode of 0x8.
This frame may contain a body to specify why the connection was closed.
Once an endpoint sends a close frame, it should not send any more data.
After both sending and receiving a Close message, an endpoint considers the WebSocket connection closed and MUST close the underlying TCP connection. The server MUST close the underlying TCP connection immediately; the client SHOULD wait for the server to close the connection but MAY close the connection at any time after sending and receiving a Close message.
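The close frame and its optional body can be sketched as follows. The function names are made up for this example. Per RFC 6455, the body starts with a 2-byte status code in network byte order followed by an optional UTF-8 reason, and because a close frame is a control frame the whole body has to fit in 125 bytes. The frame builder only covers the unmasked, server-to-client direction.

```python
import struct


def build_close_payload(code=1000, reason=""):
    """Body of a close frame: 2-byte status code plus optional UTF-8 reason."""
    reason_bytes = reason.encode("utf-8")
    if len(reason_bytes) > 123:  # 125-byte control frame limit minus the code
        raise ValueError("close reason is too long for a control frame")
    return struct.pack("!H", code) + reason_bytes


def build_unmasked_close_frame(code=1000, reason=""):
    """Server-to-client close frame: FIN set, opcode 0x8, no masking."""
    payload = build_close_payload(code, reason)
    return bytes([0x88, len(payload)]) + payload


if __name__ == "__main__":
    print(build_unmasked_close_frame(1000).hex())
```

Status code 1000 signals a normal closure, while 1002 is the protocol error mentioned earlier for unmasked client frames.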
So there we have it: a short (or not so short) introduction to WebSockets, why they came into being, what purpose they serve, and a BASIC idea of how they work under the hood. However, it is really unlikely that you will ever work with the underlying protocol directly, considering that there are a number of robust libraries available that handle the low-level plumbing for you. Visit the Mozilla documentation on WebSockets to find some of the suggested libraries as well as user-friendly documentation of the WebSocket API and some guides to writing WebSocket servers.
If you want in-depth knowledge of what happens under the hood, visit the official RFC 6455 documentation by the IETF to find out all you need to know and then some.