Chapters
-
Why are we really excited about introducing you to Erlang? What do we feel is really special about the language? Its lightweight concurrency model with massive process scalability independent of the underlying operating system is second to none. With its approach that avoids shared data, Erlang is the perfect fit for multicore processors, in effect solving many of the synchronization problems and bottlenecks that arise with many conventional programming languages. Its declarative nature makes Erlang programs short and compact, and its built-in features make it ideal for fault-tolerant, soft real-time systems. Erlang also comes with very strong integration capabilities, so Erlang systems can be seamlessly incorporated into larger systems. This means that gradually bringing Erlang into a system and displacing less-capable conventional languages is not at all unusual.
Although Erlang might have been around for some time, the language itself, the virtual machine, and its libraries have been keeping pace with the rapidly changing requirements of the software industry. They are constantly being improved by a competent, enthusiastic, and dedicated team, aided by computer science researchers from universities around the world.
-
This chapter is where we start covering the basics of Erlang. You may expect we'll just be covering things you have seen before in programming languages, but there will be some surprises, whether your background is in C/C++, Java, Python, or functional programming. Erlang has assignment, but not as you know it from imperative languages, because you can assign to each variable only once. Erlang has pattern matching, which not only determines control flow, but also binds variables and pulls apart complex data structures. Erlang pattern matching is different in subtle ways from other functional languages. So, you'll need to read carefully! We conclude the chapter by showing how to define Erlang functions and place them into modules to create programs, but we start by surveying the basic data types in Erlang.
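A minimal sketch of single assignment and pattern matching follows. The module name basics_demo and its functions are hypothetical, chosen only for illustration.

```erlang
-module(basics_demo).
-export([area/1, first_name/1, rebind/1]).

%% Pattern matching selects the clause (control flow) and binds
%% the variables in the same step.
area({circle, Radius}) -> 3.14159 * Radius * Radius;
area({rectangle, Width, Height}) -> Width * Height.

%% A pattern can pull apart a nested data structure.
first_name({person, {name, First, _Last}, _Age}) -> First.

%% X is bound once. "X = New" is a match, not a reassignment:
%% it succeeds only if New equals the value X already has.
rebind(New) ->
    X = 1,
    try X = New of
        _ -> same_value
    catch
        error:{badmatch, Rejected} -> {cannot_rebind, Rejected}
    end.
```

Calling basics_demo:rebind(2) shows that an attempt to give X a second value is rejected as a failed match rather than silently overwriting it.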
-
Erlang's design was heavily influenced by functional and logic programming languages. When dealing with sequential programs, those familiar with languages such as Prolog, ML, or Haskell will recognize the influence they have had on Erlang's constructs and development techniques. When working in functional programming languages, you replace iterative constructs such as while and for loops with recursive programming techniques.
Recursion is the most useful and powerful of all the techniques in a functional programmer's armory. It allows a programmer to traverse a data structure via successive calls to the same function, with the patterns of function calls mirroring the structure of the data itself. The resulting programs are more compact and easier to understand and maintain. Functional programs are, importantly, side-effect-free, unless side effects are specifically needed for printing or for access to external storage.
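As a small illustration of how the function clauses mirror the structure of the data, here is a sketch of two recursive list traversals. The module and function names are hypothetical.

```erlang
-module(recursion_demo).
-export([sum/1, len/1]).

%% A list is either empty or a head followed by a tail,
%% so each function has one clause per case.
sum([]) -> 0;
sum([Head | Tail]) -> Head + sum(Tail).

len([]) -> 0;
len([_Head | Tail]) -> 1 + len(Tail).
```

Neither function uses a loop counter or mutable state; the recursion ends when the pattern for the empty list matches.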
-
Concurrency is the ability for different functions to execute in parallel without affecting each other unless explicitly programmed to do so. Each concurrent activity in Erlang is called a process. The only way for processes to interact with each other is through message passing, where data is sent from one process to another. The philosophy behind Erlang and its concurrency model is best described by Joe Armstrong's tenets:
-
The world is concurrent.
-
Things in the world don't share data.
-
Things communicate with messages.
-
Things fail.
The concurrency model and its error-handling mechanisms were built into Erlang from the start. With lightweight processes, it is not unusual to have hundreds of thousands, even millions, of processes running in parallel, often with a small memory footprint. The ability of the runtime system to scale concurrency to these levels directly affects the way programs are developed, differentiating Erlang from other concurrent programming languages.
What if you were to use Erlang to write an instant messaging (IM) server, supporting the transmission of messages between thousands of users in a system such as Google Talk or Facebook? The Erlang design philosophy is to spawn a new process for every event so that the program structure directly reflects the concurrency of multiple users exchanging messages. In an IM system, an event could be a presence update, a message being sent or received, or a login request. Each process will service the event it handles, and terminate when the request has been completed.
You could do the same in C or Java, but you would struggle when scaling the system to hundreds of thousands of concurrent events. An option might be to have a pool of processes handling specific event types or particular users, but certainly not a new process for every event. Erlang gets away with this because it does not use native threads to represent processes. It has its own scheduler in the virtual machine (VM), making the creation of processes very efficient while at the same time minimizing their memory footprint. This efficiency is maintained regardless of the number of concurrent processes in the system. The same argument applies for message passing, where the time to send a message is negligible and constant, regardless of the number of processes. This chapter introduces concurrent programming in Erlang, letting you in on one of the most powerful concurrency models available today.
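The process-per-event idea can be sketched as follows. The event tuples and the module name event_demo are assumptions made for this illustration; they are not taken from a real IM server.

```erlang
-module(event_demo).
-export([handle_events/1]).

%% Spawn one short-lived process per event. Each process services
%% its event, sends the result back to the caller, and terminates.
handle_events(Events) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), service(Event)} end)
            || Event <- Events],
    %% Pid is already bound here, so each receive waits for the
    %% reply from that particular process.
    [receive {Pid, Result} -> Result end || Pid <- Pids].

%% Hypothetical event handlers.
service({login, User}) -> {logged_in, User};
service({message, From, To, _Text}) -> {delivered, From, To};
service({presence, User, Status}) -> {presence_updated, User, Status}.
```

The only interaction between the processes is the message sent with the ! operator; no data is shared.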
-
-
Processes in Erlang systems can act as gateways to databases, handle protocol stacks, or manage the logging of trace messages. Although these processes may handle different requests, there will be similarities in how these requests are handled. We call these similarities design patterns. In this chapter, we are going to cover the most common patterns you will come across when working with Erlang processes.
The client/server model is commonly used for processes responsible for a resource such as a list of rooms, and services that can be applied on these resources, such as booking a room or viewing its availability. Requests to this server will allow clients (usually implemented as Erlang processes) to access these resources and services.
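A minimal sketch of the room-booking server described above, assuming a hand-written receive loop. The module name room_server, the message formats, and the function names are hypothetical.

```erlang
-module(room_server).
-export([start/1, stop/1, book/2, available/1]).

%% The server process owns the resource: the list of free rooms.
start(Rooms) -> spawn(fun() -> loop(Rooms) end).

%% Client functions hide the message protocol from callers.
stop(Server) -> call(Server, stop).
book(Server, Room) -> call(Server, {book, Room}).
available(Server) -> call(Server, available).

%% A unique reference tags each request so that the client
%% accepts only the reply that belongs to it.
call(Server, Request) ->
    Ref = make_ref(),
    Server ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Reply} -> Reply
    end.

loop(Free) ->
    receive
        {request, From, Ref, {book, Room}} ->
            case lists:member(Room, Free) of
                true ->
                    From ! {reply, Ref, ok},
                    loop(lists:delete(Room, Free));
                false ->
                    From ! {reply, Ref, {error, unavailable}},
                    loop(Free)
            end;
        {request, From, Ref, available} ->
            From ! {reply, Ref, Free},
            loop(Free);
        {request, From, Ref, stop} ->
            From ! {reply, Ref, ok}
    end.
```

The state of the server is simply the argument passed to loop/1 on each iteration. This sketch does not handle a server that crashes while a client is waiting.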
-
Whatever the programming language, building distributed, fault-tolerant, and scalable systems with requirements for high availability is not for the faint of heart. Erlang's reputation for handling the fault-tolerant and high-availability aspects of these systems has its foundations in the simple but powerful constructs built into the language's concurrency model. These constructs allow processes to monitor each other's behavior and to recover from software faults. They give Erlang a competitive advantage over other programming languages, as they facilitate development of the complex architecture that provides the required fault tolerance through isolating errors and ensuring nonstop operation. Attempts to develop similar frameworks in other languages have either failed or hit a major complexity barrier due to the lack of the very constructs described in this chapter.
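One of the constructs in question is the process monitor. The sketch below assumes a worker that fails on purpose; the module name and the exit reason are hypothetical.

```erlang
-module(monitor_demo).
-export([watch_crash/0]).

%% Start a worker and monitor it in one atomic step. When the
%% worker terminates, the monitoring process receives a 'DOWN'
%% message carrying the exit reason and can decide how to recover.
watch_crash() ->
    {Pid, Ref} = spawn_monitor(fun() -> exit(simulated_fault) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            {worker_died, Reason}
    after 1000 ->
        timeout
    end.
```

The fault is isolated in the worker: the monitoring process keeps running and learns about the failure through an ordinary message.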
-
As soon as your first Erlang product reaches the market and is deployed around the world, you start working on feature enhancements for the second release. Imagine 15,000 lines of code, which incidentally happens to be the size of the code base of the first Erlang product Ericsson shipped, the Mobility Server. In your code base, you have tuples that contain data relating to the existing features and constants that have been hardcoded. When you add new features, you need to add fields to these tuples. The problem is that the fields need to be updated not only in the code base where you are adding these features, but also in the remaining 15,000 lines of code where you aren't adding them. Missing one tuple will cause a runtime error. Assuming your constants also need to be updated, you need to change the hardcoded values everywhere they are used. And even more costly than implementing these software changes is the fact that the entire code base needs to be retested to ensure that no new bugs have been introduced or fields and constant updates have been omitted.
One of the most common constructions in computing is to bring together a number of pieces of data as a single item. Erlang tuples provide the basic mechanism for collecting data, but they do have some disadvantages, particularly when a larger number of data items are collected as a single object. In the first part of this chapter, you will learn about records, which overcome most of these disadvantages and which also make code evolution easier to achieve. The key to this is the fact that records provide data abstraction by which the actual representation of the data is hidden from the programs that access it.
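A short sketch of how a record hides the position of each field. The person record, its fields, and the module name are hypothetical.

```erlang
-module(record_demo).
-export([new/2, name/1, age/1, birthday/1]).

%% Fields are accessed by name. Adding a field later (phone, say)
%% does not break code that matches only on name or age.
-record(person, {name, age = 0, phone}).

new(Name, Age) -> #person{name = Name, age = Age}.

name(#person{name = Name}) -> Name.

age(#person{age = Age}) -> Age.

%% Create an updated copy; the fields not mentioned are kept.
birthday(#person{age = Age} = Person) ->
    Person#person{age = Age + 1}.
```

With plain tuples, every pattern such as {person, Name, Age} would have to be edited when a field is added; with the record, only the definition changes.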
-
At this stage of the book, we have covered all the fundamentals: sequential programming, concurrency, fault tolerance, and error recovery. The Erlang language, and in particular its many libraries, offers more to help the programmer be as effective as possible. The various language features covered in this chapter, many of them derived from functional programming languages, are tools that will improve the productivity of a working Erlang programmer.
-
Many practical systems need to store and retrieve large amounts of data within demanding time constraints. For instance, a mobile phone application will need to access and manipulate subscriber details in handling calls as well as in billing and user support. Search times that are proportional to the amount of data being searched are not acceptable in soft real-time systems. Lookup times not only have to be constant, but also have to be very fast!
One of the main composite data types used in programming is a collection of items (or elements, or objects). Erlang lists provide one way to implement a collection, but with more than a small number of items in the list, access to elements becomes slow. On average, we need to check through 50% of the elements in a collection to confirm that a given element is present, and we need to look at all the elements to verify that a given value is absent.
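As a sketch of key-based lookup that avoids scanning a list, the example below uses an ETS table, on the assumption that this is the mechanism the chapter goes on to present. The table name and the subscriber data are hypothetical.

```erlang
-module(ets_demo).
-export([lookup_demo/0]).

lookup_demo() ->
    %% A set table stores at most one tuple per key; the key is
    %% the first element of each tuple by default.
    Table = ets:new(subscribers, [set]),
    true = ets:insert(Table, [{"0701", "Alice"}, {"0702", "Bob"}]),
    %% Lookup goes straight to the key instead of traversing
    %% the elements one by one.
    Found = ets:lookup(Table, "0702"),
    Missing = ets:lookup(Table, "0999"),
    true = ets:delete(Table),
    {Found, Missing}.
```

A lookup returns a list of matching tuples, so an absent key yields the empty list rather than an error.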
-
To write a fault-tolerant system, you need at least two computers[27] and you need to distribute your program across them. Distributed systems lie at the heart of modern computing. In server-side programming, it is the exception rather than the rule to see a single computer performing a task of any difficulty; instead, a number of computers (or processors) will together provide a robust, efficient, and scalable platform upon which applications can be built.
[27] At least two, according to Joe Armstrong, but three if you ask Leslie Lamport.
Erlang distribution is built into the language, and from the user's point of view, it can be completely transparent: processes are accessed by a pid, and this may equally well refer to a process on the local computer or a process on a system on the other side of the world. In this chapter, we will look at the theory behind distributed systems and see how it is applied to Erlang-based systems.
-
In previous chapters, we introduced patterns that recur when you program using the Erlang concurrency model. We discussed functionality common to concurrent systems, and you saw that processes will handle very different tasks in a similar way. We also emphasized special cases and potential problems that have to be handled when dealing with concurrency.
For example, picture a project with 50 developers spread across several geographic locations. If the project is not properly coordinated and no templates are provided, how many different client/server implementations might the project end up with? Even more dangerous, how many of these implementations will handle special borderline cases and concurrency-related errors correctly, if at all? Without a code review, can you be sure there is a uniform way across the system to handle server crashes that occur after clients have sent a request to the server? Or guarantee that the response from a request is indeed the response, and not just any message that conforms to the internal message protocol?
-
Try to picture a cluster of Erlang nodes, distributed over half a dozen computers, to which requests are forwarded. Data has to be accessible and up-to-date across the cluster, and destructive database operations, even if they are rare, have to be executed in a transaction to avoid inconsistent data as a result of race conditions. You need to be able to add and remove nodes during runtime and provide persistence to ensure a speedy recovery from all possible failure scenarios.
The solution is to merge the efficiency and simplicity of ETS and Dets tables with the Erlang distribution and to add a transaction layer on top. This solution, called Mnesia, is a powerful database that comes as part of the standard Erlang distribution. Mnesia is the brainchild of Claes "Klacke" Wikström[31] from the days when he was working at Ericsson's Computer Science Lab. Håkan Mattsson eventually took over and brought Mnesia to the next level, productizing it and adding lots of functionality.
[31] Klacke is the same person we need to thank for giving us the ASN.1 compiler, the first-generation garbage collector, ETS, Dets, the Erlang Distribution, bit syntax, and YAWS. I am sure he will be thrilled to receive your bug reports.
-
Programming graphical user interfaces (GUIs) is not one of Erlang's touted strengths, but ongoing work has provided Erlang with a cross-platform, state-of-the-art GUI programming system: wxErlang, an Erlang binding of the wxWidgets system.
wxWidgets consists of an extensive C++ library that provides components for building menus, buttons, interactions, text and graphical displays, and much more; wxWidgets also provides a general framework for building cross-platform applications, including support for internationalization, and lower-level facilities such as memory management. Because of the size and complexity of wxErlang, this chapter cannot provide a comprehensive overview of it. Instead, this chapter covers the principles underlying the toolkit and provides a taste of some of its most-used aspects. Our coverage should be enough to get you started and give you the base from which to explore the library in more depth.
-
Although distributed Erlang might be a first step in allowing programs on remote machines to communicate with each other, we sometimes have to rely on lower-level mechanisms and standardized protocols. Sockets allow programs written in any language to exchange data on different computers by exchanging byte streams transmitted using the protocols of the Internet Protocol (IP) Suite.
Whereas sockets are used to create a byte-oriented communication stream between programs possibly running on different machines, ports, which we cover in the next chapter, will do the same for programs running on the same machine. Byte streams, which in Erlang can be viewed as either binaries or integer lists, often follow standards and application-level protocols that allow programs written independently of each other to interact with each other.
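The following sketch sends a byte stream over a TCP socket on the local machine and reads it back, using the gen_tcp module. The module name socket_demo, the echo behavior, and the 4-byte length framing are assumptions made to keep the example self-contained.

```erlang
-module(socket_demo).
-export([echo_once/1]).

echo_once(Payload) when is_binary(Payload) ->
    %% binary: deliver received data as binaries, not integer lists.
    %% {packet, 4}: frame each message with a 4-byte length header.
    %% {active, false}: read explicitly with gen_tcp:recv.
    Opts = [binary, {packet, 4}, {active, false}],
    %% Port 0 asks the operating system for any free port.
    {ok, Listen} = gen_tcp:listen(0, [{reuseaddr, true} | Opts]),
    {ok, Port} = inet:port(Listen),
    %% Server side: accept one connection and echo one message.
    spawn(fun() ->
        {ok, Socket} = gen_tcp:accept(Listen),
        {ok, Bytes} = gen_tcp:recv(Socket, 0),
        ok = gen_tcp:send(Socket, Bytes),
        gen_tcp:close(Socket)
    end),
    %% Client side: connect, send, and wait up to 5 seconds.
    {ok, Client} = gen_tcp:connect({127, 0, 0, 1}, Port, Opts),
    ok = gen_tcp:send(Client, Payload),
    {ok, Reply} = gen_tcp:recv(Client, 0, 5000),
    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(Listen),
    Reply.
```

Replacing the binary option with list would deliver the same bytes as a list of integers. A peer written in another language would need to use the same 4-byte length framing to interoperate with this sketch.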
-
It is common for modern computer systems of any size to be built using more than one programming language. Device drivers are typically written in C, and many integrated development environments (IDEs)—such as Eclipse—and other GUI-heavy systems are written in Java or C#. Lightweight web apps can be developed in Ruby and PHP, and Erlang can provide lightweight, fault-tolerant concurrency. If you need to efficiently manipulate or parse strings, Perl or Python is the norm. The library that solves a particular problem for you may not be written in your favorite language, and you must choose whether to use the foreign library or bite the bullet and recode the whole thing in Erlang yourself.[36]
[36] This is done for the duplicate code detection algorithm in Wrangler, the Erlang refactoring tool. An existing efficient C library is used to identify candidate "clones" in Erlang software.
Interlanguage communication is never simple in natural languages or in programming. In natural languages, we must understand the different ways in which the languages work. Do they contain articles? Do they denote gender? Where do the verbs occur in a sentence? We also must understand how words translate. Does the verb ser in Portuguese mean the same as "to be" in English, for instance? (It doesn't.) It's the same for programming languages. Which paradigm do they come from? Are the languages functional, object-oriented, concurrent, or structured? Is an integer in Java the same thing as an integer in Erlang? (It isn't!)
-
Any respectable programming language that has deployments consisting of millions of lines of code running in thousands of installations worldwide must provide built-in low-level tracing mechanisms on which to build tools that can be used for live troubleshooting. Languages that don't provide these tools put a huge burden on developers and support engineers alike, as they have to either develop this infrastructure from scratch themselves or troubleshoot their systems in a black-box environment.
In Erlang, Ericsson's experiences of tracing live telephony switches are reflected in the trace BIFs, which, from being part of the first version of the language, have evolved through the years to become the foundation for a set of tools that give full visibility to the changing state of the system and, as a result, drastically reduce bug resolution times and troubleshooting efforts.
-
The basic types in Erlang (integers, floating-point numbers, atoms, strings, tuples, and lists) were introduced in Chapter 2; records were covered in Chapter 7; and further types (binaries and references) in Chapter 9. When we have declared functions and other definitions, we have also given an informal description of the types of their inputs and outputs.
This chapter shows how you can write down the types of functions as a part of their formal documentation in Erlang, using the EDoc documentation framework, written by Richard Carlsson. What you write down as the type of a function can be checked for consistency against the function definition using the TypEr tool, built by the implementers of Dialyzer. TypEr will infer types without any user input, and so it can be an essential tool for program understanding. TypEr and Dialyzer are the result of the High Performance Erlang (HiPE) team's research at Uppsala University. All of these tools are part of the standard Erlang distribution.
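A minimal sketch of a documented and typed function, assuming the -spec attribute form together with an EDoc @doc comment. The module and function are hypothetical.

```erlang
-module(spec_demo).
-export([average/1]).

%% @doc Returns the arithmetic mean of a non-empty list of numbers.
-spec average([number(), ...]) -> float().
average(Numbers) ->
    %% The / operator always returns a float in Erlang.
    lists:sum(Numbers) / length(Numbers).
```

The type [number(), ...] denotes a non-empty list, which records the assumption that the function is never called with the empty list.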
-
As you are writing a program, how do you understand how the program will behave? You might have a model in your mind of what the program will do, but you can be sure of it only when you exercise or interact with your program in some way. Chapter 18 showed you how you can use -spec to express what you think the input and output types of a function should be; TypEr can check whether this is consistent with the code itself.
Types don't tell you how a program behaves, however, and testing is one of the best ways to understand how your code will function. We have been doing this informally throughout the book; each time we have given some definitions, we have immediately gone to the Erlang shell and tried them out in practice. When you're developing in Erlang your coding and test cycles tend to be small. You write a few functions, and you test them. You add a few more, and you test them again. Repeating all of the tests in the shell every time becomes both time-consuming and error-prone.
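One way to avoid retyping the same checks in the shell is to record them as unit tests. The sketch below assumes EUnit, the unit testing framework shipped with Erlang/OTP; the module and the function under test are hypothetical.

```erlang
-module(eunit_demo).
-export([reverse/1]).

%% Including this header exports every function whose name ends
%% in _test and generates a test/0 function that runs them all.
-include_lib("eunit/include/eunit.hrl").

reverse(List) -> reverse(List, []).

reverse([], Acc) -> Acc;
reverse([Head | Tail], Acc) -> reverse(Tail, [Head | Acc]).

reverse_empty_test() ->
    ?assertEqual([], reverse([])).

reverse_order_test() ->
    ?assertEqual([3, 2, 1], reverse([1, 2, 3])).

reverse_twice_test() ->
    List = [a, b, c],
    ?assertEqual(List, reverse(reverse(List))).
```

After compiling the module, calling eunit_demo:test() in the shell reruns all three tests in a single step.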
-
Throughout this book, we have covered the dos and don'ts of Erlang programming. We have introduced good practices and efficient constructs while pointing out bad practices, inefficiencies, and bottlenecks. Some of these guidelines you will probably recognize as being relevant to computing in general; others will be Erlang-related, and some will be virtual-machine-dependent. Learning to write efficient and elegant Erlang code will not happen overnight. In this chapter, we summarize design guidelines and programming strategies to use when developing Erlang systems. We cover common mistakes and inefficiencies and look at memory handling and profiling.
