Why build it?
A recap of communication patterns
Interoperation of services can be organized in several ways (communication patterns):
- Remote (Procedure) Call, or RPC: the synchronous pattern, where one party calls another and then waits for an answer before continuing. It doesn't matter whether the message is stored intermediately, or whether an asynchronous protocol or direct communication sits underneath. The fact that the calling party depends on an answer makes this communication pattern synchronous.
- Asynchronous messaging: one party sends something to the other without needing an answer to continue (fire and forget). The message is stored until the receiver takes it (otherwise the communication becomes synchronous).
- Publisher / Subscriber: one party sends something to an indefinite number of others (the others control their subscriptions). The message (or its copies) is stored until all of the subscribers have taken it.
- Streaming: data is transmitted at the initiative of the sender once a receiver has requested the stream. Streaming requires direct communication protocols underneath; real time is implied.
Those patterns can be implemented using a variety of protocols, for example:
- RPC using HTTP REST, TCP-sockets, DCOM, gRPC or a message bus with AMQP (like RabbitMQ).
- Asynchronous messaging is what a message bus is perfect for.
- The Pub / Sub pattern is also based on message storage, so the technologies are the same as for unicast asynchronous messaging.
- The best choice for streaming is gRPC (which runs over HTTP/2) for reliable delivery (lost packets are retransmitted) or RTP (Real-time Transport Protocol) over UDP for video / voice (where retransmission is meaningless).
This is not a comprehensive list of implementation options, only the most reasonable of them today.
So, what patterns are covered by RabbitMQ?
There are three of them: RPC, asynchronous messaging and Pub / Sub. See the official resource.
Non-bus alternatives are more limited. HTTP REST is applicable only to RPC. gRPC is applicable to RPC and streaming. Distributed log systems (like Kafka) don’t support RPC due to the lack of built-in request-response correlation (also, Kafka requires periodic polling on the client side, which increases latency).
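To make "request-response correlation" concrete, here is a minimal sketch of the idea in Python. It is an in-memory simulation only: the queues are standard-library `queue.Queue` objects standing in for broker queues, and the function names and message fields are my assumptions for illustration. In AMQP the same roles are played by the `reply_to` and `correlation_id` message properties.

```python
import queue
import threading
import uuid

# In-memory stand-in for a broker queue (assumption: no real broker involved).
request_queue = queue.Queue()


def server_loop():
    """Hypothetical service: squares a number and replies to the caller's queue."""
    while True:
        msg = request_queue.get()
        if msg is None:  # sentinel that stops the worker
            break
        reply = {
            "correlation_id": msg["correlation_id"],  # echoed back unchanged
            "body": msg["body"] ** 2,
        }
        msg["reply_to"].put(reply)


def call(value, timeout=1.0):
    """Send a request and block until the matching reply arrives."""
    reply_queue = queue.Queue()  # private reply queue of this caller
    correlation_id = str(uuid.uuid4())
    request_queue.put(
        {"correlation_id": correlation_id, "reply_to": reply_queue, "body": value}
    )
    while True:
        reply = reply_queue.get(timeout=timeout)  # raises queue.Empty on timeout
        if reply["correlation_id"] == correlation_id:
            return reply["body"]
        # A reply carrying a foreign correlation id is ignored.


worker = threading.Thread(target=server_loop, daemon=True)
worker.start()
result = call(7)
request_queue.put(None)  # stop the worker
worker.join()
print(result)
```

The caller is synchronous (it waits for the answer) even though both directions go through queues, which is exactly why RPC counts as a synchronous pattern regardless of the transport.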
Why use RabbitMQ for more of your communications?
OK, we noticed that RabbitMQ supports 3 of the 4 patterns, but why not use only 2 of them (asynchronous messaging and Pub / Sub) and make remote calls over HTTP (with REST, SOAP or gRPC)?
There are arguments for direct (service-to-service) HTTP:
- In a fully cloud-native solution there is no place for a central element such as a message bus because, even when it is highly available and high-throughput, such an element fundamentally constrains scaling. The service mesh is the real cloud-native ideology.
- HTTP is the most widely used protocol for everything. It offers the easiest integration.
When any service can call any other service directly, it’s called a service mesh. OK, but the average business can’t afford a service mesh. That’s the substantial argument against an HTTP-centric strategy. Let’s look at what it takes to mitigate the resilience problems of the service mesh approach:
- You need to implement retries with exponential backoff on the caller side for each of its HTTP dependencies. Otherwise, any network instability costs you the processing resources already spent on a partially processed request and instantly fails the whole request chain, from the user app down to a microservice that was inaccessible only momentarily (maybe for a matter of milliseconds). The Retry pattern can be implemented at the communication library level (but your services might be written in different languages, so you may end up with many such libraries, each working differently) or in reverse proxies that run alongside the services (in the same Pod in Kubernetes).
- You need to keep those retries (see the previous point) from turning into a DoS against services that were down for a short time. Because messages are not stored intermediately and have to be re-sent, the whole mesh attacks the previously unavailable service and prevents its return to normal. The Circuit Breaker pattern solves this problem, but, again, it has to be implemented.
- When service instances are started or stopped, traffic has to be rebalanced.
- Excessive load should be shed.
Every point above stems from the nature of direct communication, like REST or gRPC over HTTP, and does not apply to messaging through an intermediary like RabbitMQ or similar.
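To give a feeling for what "has to be implemented" means for the first two points, here is a minimal sketch of the Retry and Circuit Breaker patterns in plain Python. It is not taken from any particular library: the class and function names, the thresholds and the injectable `sleep` / `clock` parameters are all my assumptions for illustration.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency is cooling down")
            self.opened_at = None  # cooling period over: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # a success closes the circuit completely
        return result


def call_with_retries(func, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Retry func with exponential backoff plus random jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # do not hammer a dependency the breaker has isolated
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


# Usage: a hypothetical dependency that fails twice, then recovers.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network failure")
    return "ok"


breaker = CircuitBreaker(failure_threshold=5)
delays = []  # record the delays instead of really sleeping
outcome = call_with_retries(lambda: breaker.call(flaky), sleep=delays.append)
print(outcome, calls["n"], len(delays))
```

Even this toy version has to be written, tuned and kept consistent for every caller and every language in the mesh, which is the cost the list above is about.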
Going the HTTP way, an enterprise eventually arrives at Kubernetes + Istio + Envoy, etc., supported by well-trained staff. How deeply trained? Well, if you have read WeChat’s paper “Overload Control for Scaling WeChat Microservices” from 18 December 2018, or learned from Netflix and similar companies, you already understand: ownership of a service mesh is expensive.
An IT solution should match the business. If your business grew faster than your IT organization, but you are not going to be the new Netflix, be honest: you need to solve existing problems, not build an infinitely scalable IT infrastructure. There are so many awkward solutions, so many huge monoliths in production, and so few real IT experts in enterprises. Efficiency is the point: an enterprise needs to pull its IT solution up to tomorrow’s demand, not overinvest in a technology that will never match its business.
RabbitMQ gives you efficiency in the following ways:
- Easy to install. An average admin can set it up in a few hours without expensive training (the same is unimaginable with Kubernetes, for instance). Alternatively, you can subscribe to it in the cloud even faster.
- Easy to shape as usage grows. As a widespread technology, it has its experts. It supports distributed scenarios and high availability. Frankly, I don’t understand why anyone still invents new message buses after RabbitMQ. RabbitMQ supports high load; it’s reliable, configurable and widely used. It’s hard to implement a RabbitMQ solution incorrectly on-premise and, again, it’s also available in the cloud.
You are able to get the following technical advantages with it:
- Like any message bus, RabbitMQ can connect your services across any DMZs or locations, no matter what shape of firewall / network / DNS solution your enterprise has. Because every service instance opens only one connection to RabbitMQ, even for bi-directional communication with many others, it’s easy to check that there are no accessibility problems just by looking at the dashboard (even while a service is silent, it stays connected to the bus). That means fast troubleshooting. It also means that RabbitMQ easily connects your on-premise, cloud and ISP-hosted IT systems, including microservices that are dockerized or not, kubernetized or not.
- Despite the ease of connection, it’s safe. Taking control of machines or containers, or gaining the ability to execute code inside one of the services, doesn’t let an attacker arrange a DoS attack against other services over the message bus. Because machines and containers don’t make direct connections for service calls, an attacker can’t know where the other services are. Excessive load at the service level is covered below. Also, RabbitMQ can throttle a message flow.
- RabbitMQ keeps messages and repeats delivery if communication breaks. That gives you resilience.
- RabbitMQ distributes load between service instances. Every instance can limit how many messages it consumes simultaneously. If an instance is working at its limit, the message bus waits until a previous message is acknowledged before dispatching the next one. It is natural to hand someone a brick when their hands are free, not when they are full, as HTTP does.
- Excessive load is shed by itself because messages have an expiration time. Set it short enough for RPC and long enough for the other communication patterns. If no service was able to take a message before it expired, you can consider it excessive load. On the other hand, when a new instance starts, it immediately takes a message from the queue.
Most of the mumbo-jumbo in WeChat’s paper is about the last two points (especially shedding) and, by the way, the authors justify (read it!) why shedding has to be based on queuing time (not CPU load or whatever). Look, RabbitMQ does exactly that.
Do you see how much you eventually save with it? Those are the reasons why we chose RabbitMQ for the microservices initiative in our small enterprise.
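The interplay of a per-consumer limit and message expiration can be illustrated with a small simulation. This is not RabbitMQ code: `SimulatedQueue`, `Consumer` and `dispatch` are hypothetical names, and time is passed in explicitly to keep the example deterministic. In RabbitMQ itself the corresponding knobs are the consumer prefetch count (`basic.qos`) and the message `expiration` property or a queue-level TTL.

```python
import collections

Message = collections.namedtuple("Message", "body enqueued_at ttl")


class SimulatedQueue:
    def __init__(self):
        self.messages = collections.deque()
        self.shed = []  # bodies of messages dropped because they expired

    def publish(self, body, now, ttl):
        self.messages.append(Message(body, now, ttl))

    def next_message(self, now):
        """Return the oldest non-expired message, dropping expired ones."""
        while self.messages:
            msg = self.messages.popleft()
            if now - msg.enqueued_at > msg.ttl:
                self.shed.append(msg.body)  # queued too long: excessive load
                continue
            return msg
        return None


class Consumer:
    def __init__(self, prefetch):
        self.prefetch = prefetch  # max number of unacknowledged messages
        self.unacked = []

    def has_capacity(self):
        return len(self.unacked) < self.prefetch

    def ack(self, body):
        self.unacked.remove(body)


def dispatch(q, consumers, now):
    """Hand messages only to consumers that still have free hands."""
    for consumer in consumers:
        while consumer.has_capacity():
            msg = q.next_message(now)
            if msg is None:
                return
            consumer.unacked.append(msg.body)


q = SimulatedQueue()
worker = Consumer(prefetch=1)
q.publish("request-0", now=0.0, ttl=2.0)
q.publish("request-1", now=0.0, ttl=2.0)
dispatch(q, [worker], now=0.0)      # worker takes request-0, request-1 waits
q.publish("request-2", now=4.0, ttl=2.0)
worker.ack("request-0")             # the worker was busy until t = 5
dispatch(q, [worker], now=5.0)      # request-1 waited 5 s > 2 s and is shed
print(q.shed, worker.unacked)
```

The shedding decision depends only on how long a message waited in the queue, not on any load metric of the consumer.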
When not to use RabbitMQ?
To stay efficient, we (the company I currently work for) don’t convert any previously built service from HTTP to RabbitMQ. It’s pointless. Also, we can do nothing about SaaS solutions or commercial products with HTTP endpoints. But we build new services with RabbitMQ.
You’ll have to integrate with HTTP services no matter what your strategy is. But you can build your new services and adapters for RabbitMQ.
How to build it?
The easiest way to learn RabbitMQ is to follow the official tutorials. It’s really simple. We started this way, built some services and then realised that we needed more security and traceability features. I’ll explain:
- Security. Not many people think about it, even when implementing HTTP REST services. An average developer designs APIs with about as much creativity in authentication and authorization as an average admin puts into a firewall configuration: everyone does it differently. Meanwhile, the most popular approach is to use tokens and claims, and the most popular token format is JWT. So we had to agree on how to pass a JWT over RabbitMQ and how to map its scopes and claims to services and permissions.
- Traceability. I wondered how we could find out why a user app fails (especially when the failure is intermittent) when it has a large number of dependencies (microservices). Any of them might be the cause, right? So do we need to write and then read a lot of logs? But the logs are still uncorrelated! There should be (and now there is) a way to keep traces of requests from a user app to an entry service / gateway, and down from each individual service to the next. I’m sure most developers don’t care about such things, whether the dependencies are REST or RabbitMQ, but we do.
AMQP has message headers, like HTTP. Headers used to set a bus apart from distributed log systems like Kafka, which only gained record headers in version 0.11 (tell me if I’m wrong). So, you have to use message headers, of course.
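As a rough illustration of what could travel in those headers, here is a sketch that builds a header dictionary carrying a token and trace identifiers. The header names (`authorization`, `trace_id`, `span_id`, `parent_span_id`) and the `build_headers` helper are assumptions made up for this example, not a standard and not necessarily the convention we ended up with. With a client such as pika, a dictionary like this would go into the `headers` field of `pika.BasicProperties`.

```python
import uuid


def build_headers(jwt_token, incoming_headers=None):
    """Build headers for an outgoing message (hypothetical convention).

    The trace id is inherited from the incoming message, if there is one,
    so that every hop of the same user request shares it.
    """
    incoming = incoming_headers or {}
    headers = {
        "authorization": "Bearer " + jwt_token,
        "trace_id": incoming.get("trace_id", uuid.uuid4().hex),
        "span_id": uuid.uuid4().hex,  # identifies this particular hop
    }
    if "span_id" in incoming:
        headers["parent_span_id"] = incoming["span_id"]
    return headers


# The entry service starts a trace; a downstream call continues it.
entry = build_headers("<jwt>")
downstream = build_headers("<jwt>", incoming_headers=entry)
print(downstream["trace_id"] == entry["trace_id"])
```

With the same trace id on every hop, log lines written by different services can be correlated back to a single user request.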