Why We’re Switching to gRPC

Original article: eng.fromatob.com

When you use a microservice-style architecture, one pretty fundamental decision you need to make is: how do your services talk to each other? The default choice seems to be to send JSON over HTTP – using so-called REST APIs, although most people don’t quite take the REST principles seriously. At fromAtoB, this is how we started, but more recently we decided to make gRPC our standard.

gRPC is a system for remote procedure calls that was developed at Google and is now open-source. Although it’s been around for several years, I haven’t found much online about why people are or aren’t using it, so I decided to write an article explaining our reasons for using gRPC.

The obvious advantage of gRPC is that it uses an efficient binary encoding, which can make it faster than JSON/HTTP. While more speed is always welcome, there are two aspects that were more important for us: clear interface specifications and support for streaming.
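To get an intuition for the size difference, here's a stdlib-only sketch. It does not use protobuf's actual wire format (which adds field tags and varint encoding), but it shows the basic reason a binary encoding is more compact: JSON spells out field names and digits as text, while a binary format packs the raw values.

```python
import json
import struct

# A coordinate like the ones our API returns.
latitude, longitude = 52.52, 13.405

# JSON encodes the field names and the numbers as text.
as_json = json.dumps({"latitude": latitude, "longitude": longitude}).encode()

# A binary encoding packs the same two doubles into raw bytes.
# (Protobuf's real wire format differs, but the size advantage is the same idea.)
as_binary = struct.pack("<dd", latitude, longitude)

print(len(as_json))    # a few dozen bytes of text
print(len(as_binary))  # exactly 16 bytes: two 8-byte doubles
```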

gRPC interface specifications

When you create a new gRPC service, the first step is always to define the interface in a .proto file. The code below shows what that looks like – it’s a simplified version of a small part of our own API. The example defines a single remote procedure call “Lookup” and types for its input and output.

syntax = "proto3";

package fromatob;

// FromAtoB is a simplified version of fromAtoB’s backend API.
service FromAtoB {
	rpc Lookup(LookupRequest) returns (Coordinate) {}
}

// A LookupRequest is a request to look up the coordinates for a city by name.
message LookupRequest {
	string name = 1;
}

// A Coordinate identifies a location on Earth by latitude and longitude.
message Coordinate {
	// Latitude is the degrees latitude of the location, in the range [-90, 90].
	double latitude = 1;

	// Longitude is the degrees longitude of the location, in the range [-180, 180].
	double longitude = 2;
}

With this file, you can then generate client and server code using the protoc compiler, and you can start writing code that provides or consumes the API.

So, why is this a good thing, and not just extra work? Take another look at the code sample above. Even if you’ve never used gRPC or Protocol Buffers, it’s pretty readable: for example, it’s clear that to make a Lookup request you should send a name, which is a string, and you’ll get back a Coordinate, which consists of latitude and longitude. In fact, once you’ve added some simple comments, like in the example, the .proto file is the API documentation for your service.

The specification for a real service can be much bigger, of course, but it won’t be much more complicated. It’ll just be more rpc statements for methods and message statements for data types.

The code generated by protoc will also make sure that the data sent by client or server corresponds to the specification. This can be a huge help for debugging. I remember two instances where the service I was working on generated JSON data in the wrong format; because that format wasn't validated anywhere, the problem only showed up in the user interface. The only way to find out what went wrong was to debug the JavaScript frontend code – not so easy when you're a backend developer who's never used the JavaScript framework used in the frontend!
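With gRPC, that validation comes for free from the generated code; with plain JSON you'd have to write it by hand at every boundary. A small hypothetical sketch of the failure mode (names invented for illustration):

```python
import json

def parse_coordinate(payload: bytes) -> dict:
    """Manually validate a JSON coordinate -- the kind of check that
    protoc-generated code performs for you automatically."""
    data = json.loads(payload)
    for field in ("latitude", "longitude"):
        if not isinstance(data.get(field), float):
            raise ValueError(f"missing or mistyped field: {field}")
    return data

# A buggy producer sends "lat" instead of "latitude".
bad = b'{"lat": 52.52, "lng": 13.405}'

# json.loads alone accepts it silently -- the bug only surfaces later,
# in whatever frontend code tries to read data["latitude"].
json.loads(bad)

# An explicit schema check catches it at the service boundary instead.
try:
    parse_coordinate(bad)
except ValueError as e:
    print(e)  # missing or mistyped field: latitude
```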

Swagger / OpenAPI

In principle, you can get the same advantages for HTTP/JSON APIs with Swagger or its successor OpenAPI. Here’s an example equivalent to the gRPC API above:

openapi: 3.0.0

info:
  title: A simplified version of fromAtoB’s backend API
  version: '1.0'

paths:
  /lookup:
    get:
      description: Look up the coordinates for a city by name.
      parameters:
        - in: query
          name: name
          schema:
            type: string
          description: City name.
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Coordinate'
        '404':
          description: Not Found
          content:
            text/plain:
              schema:
                type: string

components:
  schemas:
    Coordinate:
      type: object
      description: A Coordinate identifies a location on Earth by latitude and longitude.
      properties:
        latitude:
          type: number
          description: Latitude is the degrees latitude of the location, in the range [-90, 90].
        longitude:
          type: number
          description: Longitude is the degrees longitude of the location, in the range [-180, 180].

Compare this to the gRPC spec above. The OpenAPI one is much harder to read! It’s more verbose and has a much more complicated structure (eight levels of indentation instead of one).

Using an OpenAPI spec for validation is also more difficult than with gRPC. At least for internal services, this all means specs either don’t get written, or they don’t get updated and become useless as the API evolves.

Streaming

Earlier this year, I started designing a new API for our searches (think “give me all connections from Berlin to Paris on 1 June 2019”). After I built a first version of the API with HTTP and JSON, one of my colleagues pointed out that in some cases we needed to stream results, meaning we should start sending the first results as soon as we got them. My API just returned a single JSON array, so the server couldn’t send anything until it had collected all results.

What we do in the API used by the frontends is have clients poll for results. They send a POST request to set up the search, then send repeated GET requests to retrieve results. The response includes a field that indicates whether the search is complete. This works fine, but it's not very elegant, and it requires the server to use a datastore such as Redis to keep intermediate results. The new API would be implemented by multiple smaller services, and I didn't want to force all of them to implement this logic.
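A minimal in-memory sketch of what each service would have to implement for that polling pattern (hypothetical names; in the real API this state lives in Redis and is exposed over HTTP):

```python
class SearchStore:
    """Stands in for the Redis-backed store the polling API needs:
    it accumulates results and tracks whether the search is complete."""

    def __init__(self):
        self._results = []
        self.complete = False

    def add(self, trip):
        self._results.append(trip)

    def finish(self):
        self.complete = True

    def poll(self):
        # What a GET request returns: everything found so far, plus the
        # flag the client checks to decide whether to keep polling.
        return {"results": list(self._results), "complete": self.complete}

store = SearchStore()
store.add("Berlin -> Paris, 08:00")
print(store.poll())  # partial results, complete is False
store.add("Berlin -> Paris, 10:30")
store.finish()
print(store.poll())  # all results, complete is True
```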

That was when we decided to try out gRPC. To stream the results of a remote procedure call, you just add the stream keyword to its return type in the .proto file. Here's the definition of our Search method:

rpc Search (SearchRequest) returns (stream Trip) {}

The code generated by the protoc compiler includes an object with a Send function that our server code calls to send Trip objects one by one, and an object with a Recv function that the client code calls to retrieve them. From the programmer's point of view, this is much easier than implementing a polling API.
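The producer/consumer shape of that Send/Recv pair can be sketched language-agnostically with a Python generator (invented names, not the protoc-generated API): the server yields each result as soon as it exists, and the client consumes them one by one instead of waiting for a complete array.

```python
from typing import Iterator

def search(request: str) -> Iterator[str]:
    """Server side: yield each Trip as soon as it's found, instead of
    collecting everything into one array first (the role of Send)."""
    for provider in ("train", "bus", "flight"):
        # In the real service each provider lookup takes time; with
        # streaming, the client already has the earlier results by now.
        yield f"{provider} trip for {request!r}"

# Client side: consume results one by one (the role of Recv).
trips = []
for trip in search("Berlin -> Paris"):
    trips.append(trip)
print(trips)
```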

Caveats

There are a couple of downsides of gRPC that I want to mention. Both of them have to do with tooling, rather than the protocol itself.

When you build an API with HTTP/JSON, you can use curl, httpie or Postman for simple manual testing. There is a similar tool for gRPC called grpcurl, but it's not quite as seamless: you have to either enable the gRPC server reflection extension on the server side or specify the .proto file with every command. We've found it more convenient to include a small command-line utility with the server that lets you make simple requests. The client code generated by protoc actually makes that pretty easy.

A bigger issue for us was that the Kubernetes load balancer, which we're using for our HTTP services, doesn't work well for gRPC. Basically, gRPC requires load balancing at the application level rather than at the level of TCP connections. To solve this, we set up Linkerd following this tutorial: gRPC Load Balancing on Kubernetes without Tears.

Conclusion

Although building gRPC APIs requires a bit more work upfront, we found that having clear API specifications and good support for streaming more than makes up for that. For us, gRPC is going to be the default option for any new internal service we build.