Developing real-time applications on AWS ELB without WebSocket

Recently, we started working on a mini chat application for a client support team. Though we expected a quick and easy implementation, we hit some unexpected issues. We built the UI for our real-time application, wrote unit tests, and prepared some routes and controllers on the back end. Everything went fine until we ran our first test, and that’s when our progress came to a halt. Our handshake request returned “Error during WebSocket handshake: Unexpected response code: 400.” It turned out that our client was running AWS Elastic Load Balancing (ELB) with a Classic Load Balancer (CLB), and CLB doesn’t support WebSocket on its HTTP/S listeners. That forced us to find an alternative that would be compatible with our client’s existing platform.

Diagnosing the problem

Our stack used Angular 2 on the front end and Node.js with Express.js on the back end, plus WebSocket for full-duplex communication. The main application was running on AWS EC2 instances, fronted by CLB. 

AWS offers three types of load balancer: the Classic Load Balancer (CLB), which routes traffic based on either application- or network-level information; the Network Load Balancer (NLB), which operates at the connection level and routes based on IP protocol data; and the Application Load Balancer (ALB), which routes traffic based on advanced application-level information that includes the content of the request.

Let’s compare them:

Classic load balancer (CLB)

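CLB is a key architectural component for many AWS-powered applications. It was released in 2009 and can balance traffic at Layer 4 (TCP/SSL listeners) or at Layer 7 (HTTP/HTTPS listeners). With HTTP/S listeners it doesn’t support WebSocket: the upgrade handshake gets blocked at the load balancer. With TCP listeners it simply forwards the connection and is unaware of the HTTP/S traffic inside it.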

Network load balancer (NLB)

NLB is designed to handle tens of millions of requests per second while maintaining high throughput at ultra-low latency. NLB supports long-lived TCP connections, which are ideal for WebSocket-type applications.

Application load balancer (ALB)

ALB is relatively new; it was released in August 2016. From its documentation we know that “The Application Load Balancer supports two additional protocols: WebSocket and HTTP/2,” and “it provides native support for WebSocket via the ws:// and wss:// protocols.” 

So, it would seem, we could have fixed our issue by switching from CLB to NLB or ALB – in theory.

Unfortunately, we could not make that switch because our client’s legacy platform was configured to work with CLB. The client had private proxy servers that were not configured to support WebSocket. The Classic Load Balancer covered all their current needs, and it was not worth the effort to make the switch just for a small chat application.

Almost every other option we had was unworkable as well, because each would have required major configuration changes at the platform level. For instance, HAProxy would have been a potential alternative to CLB, but with autoscaling, it’s pretty hard to dynamically add instances to HAProxy or remove them from it. Another alternative would have been to use TCP as the main protocol for CLB, but this would have required moving SSL termination to the application servers. Managing decryption on each instance is a lot more work than letting ELB handle SSL, and would have taken more effort than it was worth.

In the end, we gave up and decided that rather than use modern technology, we’d stick to old-school techniques that would work with what our client already had in place, and made a business-based decision to stay with CLB. That meant we needed an alternative to WebSocket.

Exploring alternatives

As our first possible alternative to WebSocket we tried to use SockJS, which is a rich ecosystem of client and server libraries that tries to mimic the WebSocket protocol. From the SockJS GitHub README page: “Under the hood SockJS tries to use native WebSocket first. If that fails it can use a variety of browser-specific transport protocols and presents them through WebSocket-like abstractions.”

The problem is that ELB doesn’t support session stickiness for TCP listeners, and SockJS needs that for WebSocket emulation. We tried to spread SockJS’s session across all our instances by extending the SockJS library’s capabilities to be able to connect users with any of the machines. After spending some time on that idea, we realized that trying to extend third-party library code was very time-consuming and didn’t give us the results we needed anyway.

Lesson learned: Third-party libraries often can make our lives easier and enhance the development process, but they don’t always provide the necessary flexibility for implementing customized solutions.

Our last and only hope was to build something ourselves.

Comet turned out to be the solution

To achieve real-time socket-like communication, we forced the server to push changes to the client via HTTP with Comet, an event-driven, server-push data streaming web application model.

Comet architecture has two parts:

  1. Comet client – client-side module for handling connection, reconnection, and responses
  2. Comet server – server-side module for handling timeouts and pushing data to the Comet stream

Any part of an application can subscribe to Comet events by calling the Comet client’s subscribe method. When the Comet client receives a subscription request, it starts polling the server by making an HTTP POST request to a special endpoint, typically /api/comet.

This route is processed only by the Comet server. When a new request is received, the Comet server holds it for 60 seconds by default. The time can be customized, but we recommend not exceeding one minute: with a longer hold, depending on the configuration, the request might be cut off with a 408 Request Timeout error, which is not the desired behavior. If no data arrives within 60 seconds, the Comet server responds to the client with a special reconnect key. The Comet client sees the key and makes another request. This process continues until you stop it by unsubscribing from all subscribed Comet events.

A little bit about the Comet client and how it works: 

The Comet client has four main methods:

  • subscribe – @params:
    • topic – String|Array<string> – represents a single subscription inside a Comet request
    • callback – function that will be executed on a successful data response
  • send (named poll in the code below) – makes an HTTP POST request to the Comet endpoint with a topic as endpoint fragment (`/api/comet/$`) and a payload object that contains:
    • sessionGuid – used to identify the connection (browser tab)
    • userId – the ID associated with the current user
  • reconnect – responsible for reconnection. All it does is call the send method, but before that it can do other useful work, such as error logging.
  • unsubscribe – @params:
    • topic – String|Array<string> – optional parameter. This method unsubscribes the listeners specified by a string or an array of strings. If nothing is passed, it unsubscribes all listeners.

Here’s a little bit of code to show you how to implement what we just discussed:

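// Members of the Comet client service. The surrounding class is assumed to
// define http (Angular 2 Http), options, cometUrl, sessionID, and topics = {}.

public subscribe(topic: string, callback: any): void {
    // store topics to be able to unregister
    if (!this.topics[topic]) {
        this.topics[topic] = {topic, callback};
    }

    this.poll();
}

private poll = (): Promise<void> => {
    const requestBody = {
        sessionID: this.sessionID,
        topics: Object.keys(this.topics)
    };

    // stop polling once every topic has been unsubscribed
    if (requestBody.topics.length === 0) {
        return Promise.resolve();
    }

    return this.http.post(this.cometUrl, requestBody, this.options)
        .toPromise()
        // assuming Angular 2's Http, the promise resolves with a Response,
        // so parse its JSON body first
        .then(response => response.json())
        .then(data => {
            if (data.reconnect) {
                this.reconnect();
                return;
            }

            // data must be {topic, data}; the topic may have been
            // unsubscribed while the request was in flight
            const subscription = this.topics[data.topic];
            if (subscription) {
                subscription.callback(null, data.data);
            }
            this.reconnect();
        })
        .catch(e => {
            // error handling logic
        });
}

private reconnect = (): void => {
    // any logic we would like here, e.g. logging
    this.poll();
}

public unsubscribe = (topics?: string | Array<string>): void => {
    // no argument: drop every subscription
    if (typeof topics === 'undefined') {
        this.topics = {};
        return;
    }

    if (!Array.isArray(topics)) {
        topics = [topics];
    }

    topics.forEach(topic => {
        delete this.topics[topic];
    });
}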
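For readers who want to try the polling loop outside Angular, here is a minimal, framework-free sketch of the same idea. It is not our production client: CometClient, PostFn, and CometReply are hypothetical names, the HTTP call is injected as a plain function, and, unlike the snippet above, the sketch keeps a single request in flight and simply stops on the first transport error.

```typescript
// Framework-free sketch of a Comet client loop (all names are hypothetical).
type CometReply = { reconnect?: boolean; topic?: string; data?: unknown };
type PostFn = (body: { sessionID: string; topics: string[] }) => Promise<CometReply>;
type Callback = (err: Error | null, data?: unknown) => void;

class CometClient {
  private topics: { [topic: string]: Callback } = {};
  private polling = false;
  private post: PostFn;
  private sessionID: string;

  constructor(post: PostFn, sessionID: string) {
    this.post = post;
    this.sessionID = sessionID;
  }

  subscribe(topic: string, callback: Callback): void {
    this.topics[topic] = callback;
    // Start the loop only once; a topic added later is sent with the next request.
    if (!this.polling) {
      this.polling = true;
      void this.poll();
    }
  }

  unsubscribe(topic?: string): void {
    if (topic === undefined) {
      this.topics = {};
    } else {
      delete this.topics[topic];
    }
  }

  private async poll(): Promise<void> {
    // Keep one long-poll request open for as long as anything is subscribed.
    while (Object.keys(this.topics).length > 0) {
      try {
        const reply = await this.post({
          sessionID: this.sessionID,
          topics: Object.keys(this.topics),
        });
        // A reconnect key carries no data: just issue the next request.
        if (reply.reconnect || reply.topic === undefined) continue;
        const callback = this.topics[reply.topic];
        if (callback) callback(null, reply.data);
      } catch (e) {
        // Assumption: report the failure to every subscriber and stop polling.
        const err = e instanceof Error ? e : new Error(String(e));
        for (const topic of Object.keys(this.topics)) this.topics[topic](err);
        break;
      }
    }
    this.polling = false;
  }
}
```

In a browser, the injected function could wrap a POST to /api/comet; in a test, it can be a stub that returns canned replies. Keeping one request in flight avoids delivering the same message twice, at the cost of a newly subscribed topic waiting for the next poll.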

Now let’s talk about the Comet server implementation.

When we need to push a change to a client, we call the emit method on the Comet server module:

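function createMessage(req, res, next) {
    const body = req.body;
    const newMessage = new Message(body);

    newMessage.save(function(err, message) {
        // stop here on failure so we don't emit or respond after an error
        if (err) return next(err);

        // push the saved message to every client subscribed to 'messages'
        Comet.emit('messages', message);

        res.json(message);
    });
}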

Internally, the Comet server chooses which clients need to receive the change, and sends it to the Comet client code in the front-end application.

this.registerListeners(topics, (data) => {
    // data arrived before the timeout: cancel the timer and answer the held request
    clearTimeout(timeout);
    res.json(data);
});

If the Comet server does not receive any data during the 60-second timeout period, it unregisters all listeners and sends a reconnect key to the client.

timeout = setTimeout(() => {
    this.removeListeners();
    res.json({'reconnect': true});
}, waitTime);
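Putting the two snippets above together, a Comet server module could look roughly like the following sketch. It is an illustration under stated assumptions, not our actual module: CometServer, hold, and CometResponse are hypothetical names, the held requests live in memory in a single Node.js process, and the HTTP framework is reduced to a respond callback.

```typescript
// Minimal in-memory sketch of a Comet server module (all names are hypothetical).
type CometResponse = { reconnect: true } | { topic: string; data: unknown };

type PendingRequest = {
  topics: string[];
  respond: (body: CometResponse) => void;
  timer: ReturnType<typeof setTimeout>;
};

class CometServer {
  private pending: PendingRequest[] = [];
  private waitTime: number;

  constructor(waitTime: number = 60000) {
    this.waitTime = waitTime;
  }

  // Hold one long-poll request until data arrives or the timeout fires.
  hold(topics: string[], respond: (body: CometResponse) => void): void {
    const entry: PendingRequest = {
      topics,
      respond,
      timer: setTimeout(() => {
        // Nothing arrived in time: forget the request and ask the client to reconnect.
        this.remove(entry);
        respond({ reconnect: true });
      }, this.waitTime),
    };
    this.pending.push(entry);
  }

  // Push a change to every held request subscribed to the topic.
  // Returns how many requests were answered.
  emit(topic: string, data: unknown): number {
    const matches = this.pending.filter((p) => p.topics.indexOf(topic) !== -1);
    for (const entry of matches) {
      clearTimeout(entry.timer);
      this.remove(entry);
      entry.respond({ topic, data });
    }
    return matches.length;
  }

  private remove(entry: PendingRequest): void {
    this.pending = this.pending.filter((p) => p !== entry);
  }
}

// Hypothetical Express wiring, shown as a comment only:
// app.post('/api/comet', (req, res) =>
//   comet.hold(req.body.topics, (body) => res.json(body)));
```

Each held request is answered exactly once, either with data or with the reconnect key, so the response is never written twice.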

Here’s a little visualization of the Comet server module.

[Figure: Visualization of the Comet server module]

Comet gave us a simple but powerful technique. Because it relies only on ordinary HTTP requests and responses, it works through load balancers and proxies that don’t support WebSocket.

Conclusion

While WebSocket is great technology that gives us the power of real-time communication, and is very efficient for game development as well, there are cases when you want to stick to old-school methods. When we had trouble using WebSocket, knowing and understanding how protocols and transports work and understanding the strengths and limitations of AWS ELB and Comet-like libraries let us create a solution that worked when other alternatives would have been too complex or costly.
