Making difficult software architecture choices

Concurrency and asynchronous PHP

Allow me to take you on a little stroll down memory lane. Looking back on the last 10 years of my career using PHP, I realize just how much things have changed on the web and yet how little of PHP's core architecture has changed in response. This is not necessarily a bad thing; it speaks volumes about how good those architectural choices were that they managed to survive all of the changes we've witnessed on the web. Changes like WebSockets and HTTP/2 have fundamentally altered the way we think about web development today. Despite these changes, however, PHP has remained one of the most dominant languages used for web development to this day.

The Back Story

So the reason I wrote this article and decided to share it with you is that I've been working on a little pet project of mine recently that really got me thinking. Years ago, when I first started out using PHP, there was no threading library or asynchronous multi-tasking library for PHP. You simply dealt with the stateless nature of HTTP as it always was… one request at a time.

When WebSockets came into the picture a few years ago, however, they changed everything. Now you could actually build stateful services in your backend over a stateful protocol, where asynchronous work was possible on both ends of the connection. This was a huge eye-opener for me.

Imagine all of the useful things you could do with this! Of course, Node.js became the poster child of this era at first. It wasn't long, though, before people realized that the architectural design of their servers and underlying back-end applications was being called into question. It proved that languages like Go and Erlang, which were built for concurrent programming, could really thrive in this space. Still, the fact that they didn't take off like a rocket to the moon isn't a direct result of any of that. There were bigger barriers. People had been building and growing their code bases in other, more popular languages for web development, like PHP and Ruby, for years. Very few people were actually willing to abandon all that work and switch to something new just because it was cool.

Down this windy road we went, and today more and more people approach me with the daunting question: how should I choose between PHP and other technologies when it comes to WebSockets and HTTP/2? It's a perfectly reasonable question. You may not have to choose between them when it comes down to it, but you do have to consider the architectural trade-offs to get some idea of the consequences of your choices.

So let's get started by exploring the issue in more depth!

What is Software Architecture?

First, one should understand what is meant by software architecture and why it plays such an important role in software design. At its core, software architecture is about making trade-offs to prevent failure in software systems. A trade-off is sacrificing one thing in order to gain another. For example, one could sacrifice a little extra memory in order to gain slightly faster computation. We make such an architectural trade-off in PHP with arrays: in PHP an array is an ordered hashmap, which requires more memory to store, but makes it possible to access the values of the array faster, thanks to hashing. This is seen as an acceptable trade-off in PHP, because most PHP requests only live for the span of a few milliseconds, so we'd rather spend a little more memory to run your request faster, and consequently make it easy to use a PHP array as both a dictionary and a list.
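As a quick illustration of that dual nature (a minimal sketch; the variable names are just for demonstration):

```php
<?php
// A PHP array is an ordered hashmap, so the same structure works
// as a list (implicit integer keys) and as a dictionary (string keys).
$list = ['a', 'b', 'c'];
$dict = ['host' => 'localhost', 'port' => 8080];

echo $list[1], "\n";      // positional access: prints "b"
echo $dict['port'], "\n"; // keyed access via hashing: prints "8080"
```

Both lookups cost roughly the same, because under the hood they go through the same hash table.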

Another example of a trade-off that PHP makes is evident in its concurrency model. PHP is able to serve hundreds of concurrent requests at the same time because it is embeddable and extensible by design. In software architecture this is commonly referred to as a quality attribute, or a non-functional requirement. Because PHP can be embedded directly in another program, like your Apache httpd server or the PHP FastCGI Process Manager (PHP-FPM), it need not worry about concurrency within your application code. The PHP core runtime itself can be duplicated either within a threaded context (typically only used on Windows, where forking isn't possible) or a multi-process server context (the typical Linux setup). This is made possible by the Server API (or SAPI), which lets PHP build up and tear down the core runtime's heap memory for each independent request that the server needs to handle.

So in Apache's httpd server, this can be done via the mod_php SAPI quite easily using the pre-fork Multi-Processing Module (MPM). Each httpd child worker process has an embedded copy of the core PHP runtime loaded via an Apache httpd module, and can handle one request per process concurrently.
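To make that concrete, here's a minimal sketch of what such a pre-fork setup might look like in an Apache configuration. The directive values and the module path are illustrative, not recommendations:

```apache
# Pre-fork MPM: one child process per concurrent request, each with
# its own embedded copy of the PHP runtime loaded via mod_php.
<IfModule mpm_prefork_module>
    StartServers            5
    MinSpareServers         5
    MaxSpareServers        10
    MaxRequestWorkers     150
    MaxConnectionsPerChild  0
</IfModule>
LoadModule php7_module modules/libphp7.so
```

MaxRequestWorkers caps the number of concurrent requests, since each request occupies one whole child process.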

What's probably surprising is that PHP has stuck with this same architectural style for the better part of the last 20 years. So I can't help but wonder: why? The fact that it still works so effectively to this day must say something about the things PHP got very right from the beginning.

And in truth, it does…

Because PHP doesn't share anything across requests, it doesn't actually matter that much if PHP fails at doing something in the middle of a request. For example, let's say there's an odd bug in PHP that causes it to crash once every 100K requests. Well, on a busy server that's getting hundreds of requests per second, that means one critical failure every 15 minutes or so. Given that one PHP request lives in a single process in the aforementioned architecture, that's quite acceptable: if that one process crashes, only a single request suffers. Note that this is the same architecture used for the vast majority of PHP applications deployed in production and used on the web every day. If you're running mod_php or PHP-FPM, you're relying on this same multi-process, shared-nothing architecture.
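The back-of-the-envelope math works out like this (the crash rate and request rate are the hypothetical figures from above):

```php
<?php
// Hypothetical figures: one crash per 100K requests, ~110 requests/second.
$requestsPerCrash  = 100000;
$requestsPerSecond = 110;

$secondsBetweenCrashes = $requestsPerCrash / $requestsPerSecond;
echo round($secondsBetweenCrashes / 60, 1), " minutes between crashes\n";
// roughly 15 minutes, and each crash costs exactly one request
```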

If you consider the alternative, where PHP lives in a single process and uses threads to serve requests, a crash in a single thread brings down all of the other requests being served by that same process as well. So you end up losing hundreds of requests every 15 minutes instead of just one (in our busy-server scenario). That would be a rather poor architectural design for PHP, and as such you don't really see anyone deploying PHP this way on Windows. It just doesn't make sense.

What is Concurrency?

If we're going to talk about concurrency, we should really try to understand what concurrency means and why it plays such a big role in PHP's architectural choices.

Concurrency really takes on two forms. There's hardware-based concurrency, which means that two things are happening at the same time at the hardware level. Then there's software-based concurrency, which means that it appears as though two things are happening at the same time in our software, except that as far as the hardware is concerned, they're happening serially (one after the other).

People very often confuse the terms asynchrony, concurrency, parallelism, and threading. These don't mean the same thing, despite being related. To clear up any confusion it usually helps to think about these things in terms of processing and execution.

A process is the object or container that defines the memory, instructions, and context of a software program. A thread is the unit of execution that occurs in relation to a process. By that contrast we can say that most processes are single-threaded by default, since most of them inherently have only a single execution thread. However, a process may define multiple execution threads, at which point it becomes multi-threaded. This takes us down the path of parallelism, since threading creates additional context within a process and can be aided by hardware concurrency.

So all of that covers how a process can create concurrency based on how many execution threads it has. When we think of synchronous and asynchronous, however, we're actually referring to the manner in which we view execution from within a process context, the process context being the thread itself. That means in a single-threaded process we can view the execution context as either synchronous or asynchronous. The same applies to a multi-threaded process: we can still choose to view the execution within each thread as being synchronous or asynchronous.



Synchronous execution is about blocking: a routine or set of instructions being executed in a given thread either blocks other routines or sets of instructions from occurring, or it doesn't. Because an execution thread is typically run by a single physical or logical CPU core, it is subject to sequential execution within the hardware itself. However, that doesn't mean we can't write code that creates additional context within the execution thread itself, making it possible to switch between execution contexts within the same process thread.

Let's take a look at a practical example of what synchronous execution looks like in code.

$numbers = [7, 4];
foreach ($numbers as $number) {
    if (isPrime($number)) {
        echo "$number is a prime number!\n";
    } else {
        echo "$number is not a prime number...\n";
    }
}

You can easily tell that iteration of this loop requires synchronous execution of each step. Here we're basically taking two numbers and trying to figure out whether each of them is a prime number or not. A prime number is any number that is evenly divisible only by 1 and itself. If you're a math person, then you're probably aware that the primality test used here (denoted by the function isPrime) is the naive trial-division approach. Each time we call the isPrime function inside our loop, we must synchronously execute on the order of n instructions to test if the number is prime, because trial division has to rule out every candidate factor between 2 and n - 1 before it can conclude that the number's only factors are 1 and itself.

function isPrime($number) {
    for ($n = $number - 1; $n > 1; $n--) {
        if (!($number % $n)) {
            return false;
        }
    }
    return true;
}

So notice that if our isPrime function requires a lot of steps, it will block any other call being made to isPrime in our earlier loop. Since the number 7 is indeed prime, it will require a total of 5 steps before this function returns, because we're checking every candidate factor from n - 1 down to 2.

So it's easy to see how each step in the earlier foreach loop can be considered a synchronous task that involves each step in the for loop found in our isPrime function.

So how could we write this code asynchronously?

function isPrime($num) {
    for ($i = 0, $n = $num - 1; $n > 1; $n--, $i++) {
        yield $i;
        if (!($num % $n)) {
            return false;
        }
    }
    return true;
}

function checkPrime(array $numbers) {
    $tasks = $results = [];
    foreach ($numbers as $task => $number) {
        $tasks[] = isPrime($number);
        echo "Task $task started\n";
    }
    $n = 0;
    $j = count($tasks);
    do {
        $t = $n % $j;
        $n++;
        if (!isset($tasks[$t])) {
            continue; // this task already finished
        }
        $task = $tasks[$t];
        $current = $task->current();
        if ($current === null) {
            // the generator has returned: collect its result
            $results[$t] = $task->getReturn();
            yield $t => [$current, $results[$t]];
            unset($tasks[$t]);
            continue;
        }
        yield $t => [$current, null];
        $task->next();
    } while ($tasks);
    return $results;
}

foreach (checkPrime([7, 4]) as $task => list($instruction, $result)) {
    if ($result !== null) {
        echo "Task $task completed with result " . ($result ? 'true' : 'false'), "\n";
    } else {
        echo "Executed instruction $instruction for task $task\n";
    }
}

Notice that it still takes the same number of steps to do this. The only difference is that we get the result for the number 4, which is not prime, sooner in the asynchronous implementation. In the synchronous implementation we actually have to wait a full 5 steps for the primality test of the number 7 to complete before we can even begin to test the number 4 for primality.

Now, again, it only seems faster if the thing waiting for the result of isPrime(4) is independent of the thing waiting for the result of isPrime(7). Otherwise, there's really no point: in terms of actual time spent on both tasks, they take the same amount of time whether you view the execution steps as synchronous or asynchronous.

Hardware Concurrency

Imagine that these two tasks were two independent requests being made by two different people to your PHP server. If PHP were just a single-threaded, single-process program, and both of these requests happened to come in at the same time, the person waiting on the result of isPrime(4) might have to wait a lot longer if the request for isPrime(7) happens to get executed first in our synchronous implementation.

This is why PHP's concurrency model is based on actors. Each request is an actor that passes a message along to PHP via the SAPI.

In this view, each running instance of the PHP interpreter is considered a separate process or program that can have its own execution thread tied to a hardware CPU core. In a multi-core system this makes it possible for request A to get a response within two hardware clock cycles while request B gets a response within five. This is what true concurrency looks like. Notice that in our earlier asynchronous view, within a single thread, the first task to finish still had to wait through the interleaved steps of both tasks; here we get real hardware concurrency and each request only waits on its own work.

Now, this doesn't mean that every single process running on your system necessarily gets true hardware concurrency. There are systems where hardware concurrency is not always possible, for example single-core or embedded systems, or any system where more processes are running at the same time than there are available hardware CPU cores. There's also the concept of logical cores, as in Intel's x86 Hyper-Threading technology, where a single physical core can produce, with multiple execution threads, the same kind of interleaved execution we demonstrated in the PHP implementation above. This is a feature of the CPU's micro-architecture, known more generally as simultaneous multi-threading.

This should help you get a better handle on the differences between hardware and software concurrency and between synchronous and asynchronous views of execution.

So if you look at how threading works, for example, it's normally aided by hardware-based concurrency in order to speed up the unit of execution that is the process thread. This is all done in coordination with your operating system and micro-architecture through something called a task scheduler. It's what makes multi-tasking possible, first physically at the hardware level and then virtually at the software level once you've fully saturated your hardware's capabilities. Because CPUs are so fast, you hardly ever notice the difference between hardware- and software-based concurrency, but you notice it the most when it fails.

Let's go back to the synchronous vs. asynchronous implementations of the primality test we looked at earlier in PHP. What happens in the synchronous version if an Exception is thrown from inside the loop in the isPrime function? Well, if we catch the Exception from inside the foreach loop, and we can retry or recover from it, we have to start all over again. That's not a big deal when we're testing the number 7 and we happen to fail at step 3 or 4, but it's a huge deal if we were doing isPrime(789343) and happened to fail at step number 789340. We'd have to rerun hundreds of thousands of steps all over again just to handle that failure case. That sucks!

With our asynchronous implementation, though, it's actually possible to recover from failure at a given step without having to restart the entire task from scratch or affect any of the other tasks executing inside our foreach loop. Because generators are two-way pipes in PHP, we can send information back into the generator at any given step in order to recover from the failure (assuming recovery is possible), and this is a direct result of our architectural choice.
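Here's a minimal sketch of that idea. The task and its 'retry' protocol are hypothetical, invented for illustration: the caller sends a recovery instruction back into the generator, which redoes only the failed step instead of restarting the whole task.

```php
<?php
// A task that yields after each step; the caller can send 'retry'
// back into the generator to redo just the current step.
function resilientTask() {
    for ($step = 0; $step < 3; $step++) {
        $instruction = yield $step;   // two-way pipe: receive via send()
        if ($instruction === 'retry') {
            $step--;                  // redo this step only
        }
    }
    return 'done';
}

$task  = resilientTask();
$log   = [];
$log[] = $task->current();     // step 0
$log[] = $task->send('ok');    // step 1
$log[] = $task->send('retry'); // step 1 again: recovered in place
$log[] = $task->send('ok');    // step 2
$task->send('ok');             // task finishes
echo implode(',', $log), ' -> ', $task->getReturn(), "\n"; // 0,1,1,2 -> done
```

Only the failed step repeats; the steps already completed, and every other task in the scheduler, are untouched.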

You see, making good architecture choices is really difficult, because you usually don't realize all of the trade-offs right away when you're making them. Think about that the next time you decide to use something like Node.js for your next project because you think it's cool or hip or whatever the parlance of the day might be. As cool or as new as something may seem in the moment, the fact that it's new just means it hasn't yet stood the test of time. The consequences of its architectural choices are yet to be seen, and since architecture is about preventing failure, you need to see all the places where it can fail in order to decide whether you're willing to put up with those kinds of failures.

While Node.js makes it easy to do asynchronous work on the server using a familiar language like JavaScript, it also makes it very hard to deal with system-wide failure. Because Node is built on a single-process, single-threaded, software-based concurrency model, your entire application running on Node is subject to all the failures of that same model. It's true that Node has a thread pool for non-blocking I/O, but it only takes one uncaught error in the event loop to bring down the entire server, taking every concurrent request it's serving along with it.

PHP doesn't actually have that problem with its actor-based model of computational concurrency. So it's possible to expand on that same model by extending PHP with a library like ZeroMQ, for example, and running multiple PHP processes in the background that can pass messages along to facilitate a multi-process, multi-actor concurrency model that utilizes both hardware system-level concurrency and asynchronous application-level concurrency.
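As a minimal sketch of that multi-process, shared-nothing idea (using ext-pcntl directly rather than ZeroMQ, so there's no message passing here; it assumes a POSIX system with the pcntl extension, and the helper function is invented for the example):

```php
<?php
// One worker process per task: a crash in a child loses only that task.
function isPrimeNumber($number) {
    for ($n = $number - 1; $n > 1; $n--) {
        if (!($number % $n)) {
            return false;
        }
    }
    return $number > 1;
}

$numbers = [7, 4, 9];
foreach ($numbers as $number) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // Child process: shares nothing with its siblings.
        exit(isPrimeNumber($number) ? 0 : 1);
    }
}

// Parent collects one exit status per worker.
$statuses = [];
while (($pid = pcntl_wait($status)) > 0) {
    $statuses[] = pcntl_wexitstatus($status);
}
sort($statuses);
echo implode(',', $statuses), "\n"; // one 0 (prime) and two 1s (not prime)
```

Swapping the exit-status channel for ZeroMQ sockets would let the workers pass richer messages back, but the process isolation, which is the point here, is the same.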

This is precisely what I did when I built the PoC for my WebSocket server.

Because we don't actually need to stray that far from what PHP has so obviously gotten right over the years, it turns out it's not nearly as complicated to do non-blocking asynchronous work in PHP as you might think. All it takes is figuring out your architectural drivers and doing a little analysis to realize that PHP's architecture has suited the web very well all along, despite its other shortcomings. So I took many of these same architectural design details and re-purposed them to get a fully functioning, reliable system up and running within a matter of days rather than months.

The result is that my system is as easy to scale horizontally as PHP itself. Not only that, but it's also possible to use this multi-tenancy architecture on different nodes and cross the locality boundary between the server and its workers. This allows for even greater concurrent computation capabilities as the system scales.