Original article: ferd.ca

Ten Years of Erlang

I joined the Erlang community about 10 years ago, in the midst of its first major hype phase. Erlang, we were told, was the future of concurrency and parallelism: the easiest and fastest way to get it done, and you could get distribution for free too. Back then, things were just starting to get amazing. The virtual machine had recently gained SMP support; before that, you needed to run multiple VMs on the same computer to truly use all the CPUs.

I wanted to take a bit of time to reflect on most of that decade. In this post, I'll cover a few things such as hype phases and how they related to Erlang, the ladder of ideas within the language and how that can impact adoption, and what changed in my ten years here, and I'll finish up with what I think Erlang still has to bring to the programming community at large.

The Hype Phase

The hype cycle describes phases in the lifetime of a product or technology. It's a marketing concept, not a scientific one, but it often proves useful for describing how things are going. The part that interests me the most is the idea of a hype phase, a kind of gold rush that happens in programming communities. You have probably seen one or more of them, and they all seem to be attached to some killer app that forces everyone to rush in.

Examples that come to mind include Ruby on Rails and How to Build a Blog Engine in 15 Minutes ("Look at all the things I'm not doing!" is still a fun sentence), or Go with Kubernetes (Go was already seeing significant usage before, but it really exploded then). To a lesser extent, Elixir and Phoenix could fit that list as well.

During a hype phase like that, an incredible influx of newcomers drops by to see what the fuss is all about. Some will stay, most will leave. Their stay might be measured in months or years, and in the rare cases where they find a home, it could last decades. But the vast majority of them will be a continuous flow of serial early adopters who surf from tech to tech, sniffing out the best opportunity to gain a competitive advantage by being first to use a given framework, language, or toolkit.

So the idea is, often, that all you need is one true killer app, and people will come to your ecosystem. The killer app drives the rush. If you build it, they will come. If you can convince a small percentage of them to stay and remain active, you'll have a lively community for the foreseeable future. This is, in a weird way, reminiscent of "Rain follows the plow":

God speed the plow. ... By this wonderful provision, which is only man's mastery over nature, the clouds are dispensing copious rains ... [the plow] is the instrument which separates civilization from savagery; and converts a desert into a farm or garden. ... To be more concise, Rain follows the plow.

The basic premise of the theory was that human habitation and agriculture through homesteading effected a permanent change in the climate of arid and semi-arid regions, making these regions more humid. The theory was widely promoted in the 1870s as a justification for the settlement of the Great Plains, a region previously known as the "Great American Desert". It was also used to justify the expansion of wheat growing on marginal land in South Australia during the same period.

If only we could get one big project going, the devs would appear, and it would become self-sustaining. I believe this is patently false, mostly because Erlang had dozens of killer apps during its biggest hype phase, and yet the community remained small. See, for example, the following killer apps from that era:

  • ejabberd (2002, first stable release in 2005): it was by far one of the most scalable, if not the most scalable, chat servers one could host. Ejabberd was a massive success, and to some extent still is. To this day, you will still find StackOverflow questions about modules for it. Around 2011, it was forked into MongooseIM, and both projects are still maintained.
  • CouchDB (2005): one of the first popular databases written in Erlang to grapple with the trade-offs of the CAP theorem, and part of the then-new wave of multi-master document stores. While MongoDB ate most of that space, CouchDB still has spiritual children in storage engines such as BarrelDB, on top of still being maintained itself.
  • RabbitMQ (2007): the one queue implementation that pretty much ate the whole AMQP space. It's still going strong and relevant, and often gets weighed against Kafka when it comes to streaming workloads, although they have pretty distinct properties and use cases.
  • Facebook chat (2008): the initial version of Facebook's Chat was written in Erlang. Due to a number of internal decisions (stability concerns, a strong internal presence of C++ engineers with an established set of solutions), it was later rewritten in C++.
  • WhatsApp (2009, bought in 2014): once Facebook got rid of Erlang for their chat system, they ended up buying WhatsApp, which famously needed only 50 engineers for 900 million users. It is still going today, and in fact the WhatsApp folks have decided to get far more heavily involved in the Erlang and Elixir communities than they ever were before.
  • Riak (2009): one of the best examples of muscle-flexing in the distributed systems world. Riak was a really solid distributed key-value store, a Basho product that still runs in healthcare systems and other critical pieces of infrastructure. Basho struggled financially and was eventually forced into bankruptcy (in no small part due to violations of fiduciary duties that put the company "on a greased slide to failure"). The folks at Bet365 have since bought out all the IP, graciously opened it all up, and the database lives on in the open-source world, albeit with more limited support than in its better days.

Many of these came around the time Joe Armstrong's book, Programming Erlang, first came out. This created a kind of perfect storm for heavy adoption, and Erlang had a ton of onlookers. Even the day when Hacker News forced all discussions to be about the Innards of Erlang had a noticeable impact. Yet, few people stayed compared to how many took a look.

I now think that killer apps are driven by people flocking to an initial hype phase, not the other way around. There is always a smaller, earlier phase of people sniffing out interesting tech, deciding they like it, then building something, and if that something is a killer app, you get an even bigger hype phase out of it. People will cargo-cult things, and a success story breeds more copycats. The other common thing is a phase of "reinventing the world", where everyone spends their time reimplementing everything that already exists, so you get a bunch of announcements along the lines of "X, but in <language>".

But killer apps on their own are never really sufficient. One interesting consequence of this is that products like RabbitMQ and Ejabberd, for all their popularity, have communities of users far larger than their communities of contributors. The thousands and thousands of corporations that use these products do not necessarily participate in the Erlang community that much.

Part of it is no doubt due to the fact that most of Erlang's killer apps turn out to be specialized infrastructure: you create one high-reliability black-box component that everybody else can use, and if it works well enough, they never need to look inside the box. Off you go: a few dozen developers have provided the foundations for thousands of other products and services. Specialized infrastructure, by definition, is a space where you don't need a massive amount of people to have a massive impact. It will always have smaller contributor groups and communities than things that sit closer to the end product, such as web frameworks with their uncountable web developers, or more generalized infrastructure that makes sense even in small-scale deployments, where any business may find a use for it.

But even without these factors, it's easy to feel like Erlang missed out on a massive opportunity to capture a larger share of the foot traffic that came through during its hype phase.

The Ladder of Ideas

I won't get into counterfactuals by describing what could or should have been done. Instead, I want to dig into common learning patterns I've seen in the Erlang community during my years of teaching it and writing about it. Those are also patterns I see happening right now in the Elixir community, and that I feel could be signs of a similar future for it.

A pet theory of mine is that a technical topic like a programming language (and its ecosystem) has multiple layers of complexity, with various concepts to learn and discover. I first started toying with this idea in Learn You Some Erlang, with a diagram I called The Nine Circles of Erl.

Now that's a tongue-in-cheek approach, and I don't think learning a piece of tech is endless suffering (at least, it shouldn't be); I just liked the pun. But to put it simply, there is often a more "core" track or sequence of topics you'd study when learning the technology, creating a "ladder of ideas", where the more worthwhile concepts sit higher and higher, but, being harder to reach, fewer people actually make it there.

In Erlang, what I would consider the ladder might look like this:

  1. functional programming
  2. isolated processes and concurrency
  3. reliable concurrency (links, monitors, timeouts)
  4. OTP behaviours and other system abstractions
  5. how to structure OTP systems
  6. how to build releases and handle their life cycle
  7. how to never take the system down, and how to operate it

If you're starting with Erlang for the first time and grabbing a beginner's book, you'll likely spend most of your first days on the first rung: getting to be friends with functional programming, immutability, recursion, and similar concepts. Sooner or later, you then get into concurrency and parallelism, processes, and message passing. Right after that, you start to learn about links and monitors, handling failures, and what makes Erlang what it is. During Erlang's big hype phase, the second and third rungs were what was sold as truly amazing to most onlookers. If you had to learn something to carry with you in all future projects, it was one of these things.
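
To make rungs two and three concrete, here's a minimal sketch of the kind of code a beginner meets at that stage: an isolated process, message passing instead of shared memory, and a monitor plus a timeout to stay reliable when things fail. (The numbers and the five-second timeout are arbitrary, chosen only for illustration.)

    %% Spawn an isolated worker: it shares no memory with this process and
    %% can only communicate by sending a message back.
    Parent = self(),
    Worker = spawn(fun() -> Parent ! {self(), lists:sum([1, 2, 3])} end),

    %% A monitor tells us if the worker dies, without tying our fate to its own.
    Ref = erlang:monitor(process, Worker),
    receive
        {Worker, Sum}                          -> {ok, Sum};
        {'DOWN', Ref, process, Worker, Reason} -> {error, Reason}
    after 5000 ->
        {error, timeout}    %% timeouts are what keep the caller reliable
    end.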

Other rungs would follow later, but only if you stuck with the program. In particular, OTP (rung 4) would be described as what it's actually all about. Concurrency and functional programming were nice for sure, but the general development framework represented by OTP was something truly unique that you had to stick with and use. A lot of people would play with the behaviours, discover the nice abstractions they provide, but might feel a bit confused about how to structure everything right.
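
To give a feel for what the behaviours buy you, here is a minimal gen_server sketch; the module name and its tiny API are hypothetical, but the shape is the standard one: OTP owns the process loop, the call/cast protocol, and system messages, while you only fill in callbacks.

    -module(counter).                 %% hypothetical module, for illustration only
    -behaviour(gen_server).

    -export([start_link/0, bump/0, read/0]).           %% public API
    -export([init/1, handle_call/3, handle_cast/2]).   %% gen_server callbacks

    start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    bump() -> gen_server:cast(?MODULE, bump).   %% asynchronous message
    read() -> gen_server:call(?MODULE, read).   %% synchronous request/response

    init([]) -> {ok, 0}.                        %% initial state: a count of 0

    handle_call(read, _From, Count) -> {reply, Count, Count}.

    handle_cast(bump, Count) -> {noreply, Count + 1}.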

In fact, applications like Ejabberd did most of their development barely breaking past the 4th rung. The ecosystem at the time was a bit like the Wild West; OTP knowledge was a thing for folks who had worked at Ericsson and for the most motivated self-learners. Most people would never reach the 5th rung until they had something worth putting in production, started having issues, and went looking for a better way. The 6th rung was rare until probably 2015 or 2016, when Relx came along to make the whole experience easier. The 7th rung is almost never reached, and in fact a bunch of people feel you should never hot-upgrade a node, and that ideally you'd never SSH into one to debug it in production either.
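
For a sense of what the 6th rung looks like nowadays, a basic release definition is mostly a few lines of relx configuration in a rebar3 project; the application name and version below are made up for illustration.

    %% rebar.config (hypothetical application name and version)
    {relx, [
        {release, {myapp, "0.1.0"}, [myapp, sasl]},
        {dev_mode, true},       %% symlink libraries for quick local iteration
        {include_erts, false}   %% set to true to bundle the runtime for deployment
    ]}.

Running rebar3 release then assembles a self-contained release under _build/.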

In practice, not everyone will go through all of these in the same order, and some books flip them around (Erlang and OTP in Action comes to mind). That's fine; the ladder is just for illustrative purposes.

Communities tend to move in waves. Since hype phases can increase the size of a community tenfold or a hundredfold for a while, and since most people will take a curious look and then leave, most users in a community will tend to sit at the first rung and rarely make it past there. A fraction will make it a level above, an ever-shrinking fraction will make it above that one, and so on, until you have inner circles of experts at the highest levels.

I think that for Erlang, the first three rungs were probably the easy ones to get into. The fourth one took a few years to develop and to be perceived as worthwhile. The fifth one is where things became extremely hard. Erlang's tooling and ecosystem were lacking, and the people in the Erlang community had self-selected as those who could tolerate that barren environment, which made them insensitive to the plight of newcomers. To keep this post short (well, long rather than absurdly long), I'll just say that my Erlang User Conference keynote is probably the most complete rant I have on the state of the ecosystem.

In any case, if you're an Elixir user, you can probably see where you sit on this arbitrarily defined ladder, and you can get a feel for where various factions within the community are generally located on it. A lot of folks, probably those who are fine doing Phoenix and nothing else, will rarely break above the 4th rung, and many will stick to level 3 or below for the foreseeable future. This is, in many cases, fine. It's not a judgment call, just an observation. As someone who has seen a lot of the rungs (and there are possibly still a few above my own head in this environment, like "patching the VM" or something), it feels like they'd be missing out on a lot, but frankly this might never prove to be useful information to them. That's fine.

But all of this is to say: I think we, as a community, probably hamstrung ourselves by making it very difficult for people to go above the basic levels. Some of the lessons to be learned can't be rushed, and to some extent the blind were leading the blind because Erlang was so small that there were not enough people to share all the experience that was required. Things are easier today, and if you're getting in outside of a hype cycle, you're much more likely to be able to find good help because there are fewer people asking for it all at once.

I'd like to think that were Erlang to have a second hype phase tomorrow, we'd be in a better place to welcome it than when I was riding the big wave myself. And hopefully, this experience, along with the much better collaboration between the Erlang and Elixir communities, doubles our chances of success by increasing our surface area.

What Changed

Erlang didn't sit in a glass container filled with formaldehyde, waiting to be taken out into broad daylight. It has continuously evolved. Part of it was due to pressure and demands from the Elixir community, who fortunately came in expecting more of their tools than Erlang users had grown accustomed to. Part of it was also due to actual industrial needs that pushed the platform forward, and to academia driving things forward as it likes to do.

Here are a few things I can think of that people might be glad to know have changed since 2009 or earlier:

  • Multicore support is now good. It used to be that past 2-4 cores, things would start to hit all kinds of bottlenecks that were out of your control as an application developer. Then you could handle 12-16 cores fine. These days I'm not quite sure what the practical ceiling is, but I'm pretty sure I have written and operated stacks that ran on more than 32 cores without a hiccup.
  • There are line numbers in stacktraces. It's almost unthinkable to go back to the era before line numbers. Back then, "write short self-descriptive functions" was not just a question of design, it was a question of survival. You can now debug Erlang programs without otherworldly debugging skills, although having those never hurt.
  • Unicode support is now acceptable. The string module contains the most important algorithms, and the unicode module handles most conversions and normalizations fine. There are sound strategies for dealing with raw codepoints, UTF-8, UTF-16, and UTF-32. Locale support is still lacking, but things are now workable. Modules such as re (for regular expressions) and all the higher-level file-handling code can also cope with Unicode fine (see the short sketch after this list).
  • Maps (implemented as HAMTs) are supported, with explicit pattern-matching syntax. The type analysis Dialyzer does on them also lets them substitute for records in many of the use cases where records were previously used with a lot of pain (see the sketch after this list).
  • The time handling mechanisms in the virtual machine are world class and do things right when it comes to dealing with time warping, various types of clocks, and so on. Timezone and formatting handling is still mostly better done with community libraries, however.
  • High-performance primitives such as atomics, counters, and persistent terms have been added to help improve the underlying mechanisms that power observability features and lower-level core libraries (a small example follows this list).
  • All signal handling has been made asynchronous, including with ports, which massively reduced bottlenecks
  • The compiler has been and is still being rewritten for higher level analysis and performance gains through SSA
  • Dirty schedulers for NIFs now exist and make integration with C or even Rust code simple, with support for both IO- and CPU-intensive workloads. So while the language itself is faster than it used to be (if not infinitely so), it is easier than ever to drop down to native code for higher-performance libraries without impacting runtime stability too much.
  • Various improvements to memory allocation and management
  • Faster and more flexible live tracing and micro-state accounting for correctness and performance investigations
  • A more flexible gen_statem OTP behaviour to implement finite state machines that can handle selective receives
  • A new and improved logging framework, with built-in support for structured logging
  • A rewrite of the crypto module to use NIFs instead of more complex (and often slower to update) drivers
  • An entire rewrite of the file driver using NIFs for huge performance gains
  • An ongoing rewrite of the network drivers using NIFs for similar performance gains
  • A whole rewrite of the ssl application for TLS handling. Back in my days at Heroku, we managed to make it competitive with C++ solutions in terms of latency (maybe 5% slower) and a whole lot better in terms of predictability (around 10-30x lower 99th percentiles)
  • Major improvements to ETS performance
  • I wrote a manual on how to operate and debug production systems using the Erlang VM
  • An entirely new build tool (rebar3) that integrates with a unified package manager for the Erlang ecosystem
  • Multiple new programming languages are also available on the VM, with interchangeable library usage, including (but not limited to) Elixir, Efene, LFE, Luerl, Clojerl, and at least two languages with type inference, Gleam and Alpaca.
  • And a whole lot more, both inside and outside the core Erlang distribution.
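
To illustrate the Unicode point above, here is a tiny sketch of the kind of thing the string and unicode modules now handle sensibly; the sample binary is arbitrary.

    Bin = <<"héllo"/utf8>>,
    byte_size(Bin),                    %% 6: counts bytes ("é" takes two in UTF-8)
    string:length(Bin),                %% 5: counts grapheme clusters
    string:uppercase(Bin),             %% <<"HÉLLO"/utf8>>
    unicode:characters_to_list(Bin).   %% the same text as a list of codepoints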
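
For the maps point above, a short sketch of the explicit pattern-matching and update syntax; the event shapes are invented for the example.

    handle_event(#{type := click, x := X, y := Y}) ->
        {clicked_at, X, Y};
    handle_event(#{type := keypress, key := K} = Event) ->
        Mods = maps:get(modifiers, Event, []),     %% default when the key is absent
        {pressed, K, Mods, Event#{seen => true}}.  %% update syntax adds or replaces keys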
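
And for the high-performance primitives mentioned above, a minimal sketch of counters and persistent_term; the key and sizes are arbitrary.

    C = counters:new(1, [write_concurrency]),   %% a write-optimized shared counter
    counters:add(C, 1, 1),                      %% bump slot 1 by 1
    counters:get(C, 1),                         %% read it back (reads may slightly lag writes)

    persistent_term:put({myapp, config}, #{log_level => info}),   %% rarely-written global data
    persistent_term:get({myapp, config}).                         %% reads are cheap: no copying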

If you're interested in finding out more, you can just take a look at the whole list of release notes. In short, if the years around OTP releases 13 to 16 were a bit slower for the OTP team at Ericsson (we're on version 22 now!), their latest investments in using Erlang in their flagship products have really been visible. But even outside of Ericsson, things have been moving. The Erlang community, along with the Elixir community and contributors from other languages on the Erlang VM, have banded together to set up the Erlang Ecosystem Foundation, with lively working groups that now help coordinate and tackle issues regarding build and packaging tools, observability work, security, training and adoption, and more.

If, like me, you were part of the big initial hype phase but, unlike me, you didn't stick around because a lot of things felt unusable or too tricky, you might want to give it a second try. The ergonomics of the language and its ecosystem have improved drastically.

Where Erlang Goes

There haven't necessarily been big killer apps popping out of the ether the way they did around 2007 to 2009, but that does not mean there are no promising projects. Erlang is still wedged deep in the infrastructure of a lot of corporations, and most of its initial killer apps are still around. We also have plenty of interesting new players, as every BEAM conference will show. I'm really sold on concepts such as property-based testing myself, and Erlang and Elixir have some of the best frameworks in the world available to them. Despite all of this, signs point to the idea that we are not in a hype phase right now.

Is there going to be another hype phase? Maybe, maybe not. You could say Elixir was the next hype phase. The ecosystems have enough in common that the lessons learned in one place are transferable to the other; there are more similarities than differences between them. Maybe there's still a new renaissance to be had. I personally do not care that much about it anymore. I tend to like smaller communities, so I feel good about this. Erlang does not need geometric growth for me to enjoy it; it just needs to remain sustainable.

The size of the Erlang community has also never been a blocker to its worldwide impact. Erlang has been, as long as I've known it, in that odd situation where there are not enough jobs for the number of Erlang developers, and not enough developers for the number of Erlang jobs around: there's a lot of both to go around, but they're not lining up geographically. Corporations and employees that open themselves to remote work tend to do best. And where Erlang could not easily pierce the webapp market before, the whole Elixir job market is now available with rather minimal effort to adapt.

It's probably not too important, in the grand scheme of things, whether you are using a language like Erlang or not. While I do feel it's under-used and under-rated, the biggest benefit of it is not in running a system that uses it. The biggest benefit comes from learning about the fundamentals of solid system design, and internalizing its lessons in a practical context.

One type of question I've heard a lot over the years has to do with finding guidance. How can I learn about designing protocols? Is there any good reading you'd recommend on building distributed systems? How can you go the extra mile to make something very robust and fault-tolerant? How do I know that my design is modular and my abstractions aren't leaking? What is good error handling? What's a good way to know when an optimization is premature? What does it mean to make something declarative?

We like short and digestible solutions like cookbooks and best practices, but most real answers turn out to be a variation of "it's something I've learned over the years". I can honestly say that nothing in my career compares to the time I spent in the world of Erlang, absorbing the experience of its veteran community by osmosis. It's not a large community by the numbers, but it's certainly rich by most other metrics. In a few years, I went from being a junior developer to working in senior roles, speaking around the world, and finding ways to teach that experience back, and I owe most of that to the community.

Maybe I still can't write a blog engine in 15 minutes (and truth be told, I'm a slow developer anyway), but I have personally become a much more solid developer and systems architect in what I think was a very time-effective manner. Then again, what always spoke to me was not using systems, it was building them and making them work. What motivates people isn't universal anyway.

I can't imagine I'd have gotten as much out of any other community. These last 10 years have been amazing. What's interesting is that the Erlang community is still small and mostly untapped, which means there's plenty of opportunity to get involved with anything, get some one-on-one time with folks full of wisdom who are eager to share it, and make a place for yourself.