The Maximum Subarray Problem

If you're into sports this one should be pretty fun for you. Say your favorite sports team just won 3 consecutive games. They're on a winning streak! Given an array of their wins and losses for each game they've played in this seasons, how would you find out if this is their longest streak?

Wait a minute...

What am I doing? I don't like sports. Fuck that shit! Let's get back to some geeky shit us programmers
actually care about.

So you're on Github and you look at your stats... It looks like you're on a roll! You just made your one-thousandths commit. You notice Github is keeping track of your longest streak and current streak. Have you ever wondered how you'd calculate the longest streak for similar data sets in your own program? What's the most efficient way of doing so? What would the implementation look like?

It turns out the longest streak is a pretty common problem and not always one implemented efficiently,
especially on the web where data is known to grow rampantly. If left unchecked, you could let this
problem eat away viciously at your server resources. I've seen a lot of people implement this poorly,
taking up to `O(n`

, where it could actually be done linearly. In this
article, I'll attempt to go over an efficient implementation and walk you through the process of
calculating the longest streak.
^{2})

Let's dive right into an example where we have an array, `$commits`

,
representing two weeks of commit history. Each element in the array is the total number of commits for
the day. We know that the array is sorted by date, in ascending order. So the first element represents
Sunday, the second represents Monday, the third represents Tuesday, and so on...

The question is how do we figure out where our longest streak is in this array? Meaing, on which consecutive days during this two week period, did we make the largest number of commits? To answer this question we first have to calculate the deltas — differences — between each successive pair of days across this two week span.

```
$commits = [4, 3, 5, 7, 12, 15, 9, 11, 8, 2, 0, 0, 1, 0];
function getPairs(Array $array) {
reset($array);
do {
$a1 = current($array);
$key = key($array);
$a2 = next($array);
yield [$a1, $a2];
} while($key = key($array) !== null);
}
foreach(getPairs($commits) as list($a, $b)) {
$diff = $b - $a;
$deltas[] = $diff;
}
```

Here we just use a generator as a quick and dirty *spliterator* way of iterating over successive
pairs of elements in the array. Such that we get `[ [4, 3], [3, 5], [5, 7], ... ]`

and so on.
Now, calculating the delta of each set of pairs becomes trivial and we end up with an array of 13 deltas
as a result. These deltas basically tell us whether we have a positive gain or negative loss for that
day based on the previous day. It doesn't account for the first day since we have no data from its prior
day. So we'll just ignore it, or assume it's 0.

Next, we need to take these deltas and figure out where our largest positive sum of deltas lay, as one contiguous block, within this array. That is to say, we want the maximum subarray of this array and its indices.

This brings us to the maximum subarray problem. This is basically where you try to find the largest positive contiguous sum of numbers in an array.

In computer science, the maximum subarray problem is the task of finding the contiguous subarray within a one-dimensional array of numbers (containing at least one positive number) which has the largest sum. For example, for the sequence of values`[−2, 1, −3, 4, −1, 2, 1, −5, 4]`

the contiguous subarray with the largest sum is`[4, −1, 2, 1]`

, with sum`6`

.

But how do we find the maximum subarray efficiently? There is actually a linear time implementation for
this known as Kadane's algorithm (*found by Jay Kadane of Carneigie-Mellon University*). The
implementation is pretty simple. You just keep track of two scalar values as you iterate over the array.
One, is the `current_maximum`

representing the current maximum sum up to the current iteration.
The other, is the `maximum`

representing the maximum sum reached across the set. You take
the maximum of 0 or the current maximum plus the given number in the iteration and keep it as your
current maximum. Then you take the maximum of the cumulative max or current max and keep that as your
max throughout the loop. By the end you should have the largest sum in your `maximum`

.

```
function maxSubarray(Array $array) {
$c = $m = 0;
foreach($array as $int) {
$c = max(0, $c + $int);
$m = max($m, $c);
}
return $m;
}
```

As you can see the implementation is quite straight-forward and only requires `O(n)`

. Now
all we have to do is figure out the elements that added up to this largest sum in order to know on
which days we encountered our *longest streak*.

```
function maxSubarray(Array $array) {
$current = $max = $end = 0;
$start = [0];
foreach($array as $index => $value) {
$current = max(0, $value + $current);
$max = max($current, $max);
if ($current && $current == $max) {
$end = $index;
} elseif (!$current) {
$start[] = min($index + 1, count($array));
}
}
do {
$s = array_pop($start);
} while($start && $end < $s);
$start = $s;
return [[$start, $end], $max];
}
```

By modifying our `maxSubarray`

function just slightly, we are able to keep track of the
indexes of those elements in the set of deltas, which make up the largest subarray. This tells us
on which days we hit our longest streak. Now the function returns a two-dimensional array, where the
first element is an array the first and last offset of elements that represent the start and end of
our subarray. The second is the sum of those elements.

So if you do the math here it looks like our longest streak of commits was hitting 46 commits between Sunday and Friday, during our first week. Great, now we have an efficient way of doing this!

Now let's do a little clean up, bring all of our code together, and ruffle the feathers one last time,
shall we? We're going to rename our `maxSubarray`

function `longestStreak`

and
our `getPairs`

function to `getDeltas`

, since this are just more appropriate names
for what they're doing here. Then we're going to call `getDeltas`

from within our
`longestStreak`

function in order to only loop over the array once. This requires modifying
the getPairs function just slightly by having it return the actual deltas of each pair, rather than the
pairs themselves. This also means we want to modify our original maxSubarray function implementation to
adapt it to return the correct sum and indices of actual `$commits`

array, rather then the
deltas that were calculated from that array. Remember, we're now calculating the deltas as we calculate
the maximum subarray for better efficiency.

What we end up with is something that looks like this...

```
function getDeltas(Array $array) {
reset($array);
$i = 0;
do {
$a1 = current($array);
$key = key($array);
$a2 = next($array);
yield $i => $a2 - $a1;
$i++;
} while($key = key($array) !== null);
}
function longestStreak(Array $array) {
$current = $max = $end = 0;
$start = [0];
foreach(getDeltas($array) as $index => $value) {
$current = max(0, $value + $current);
$max = max($current, $max);
if ($current && $current == $max) {
$end = $index;
} elseif (!$current) {
$start[] = min($index + 1, count($array));
}
}
do {
$s = array_pop($start);
} while($start && $end < $s);
$start = max(0, $s - 1);
$end = min(count($array) - 1, $end + 1);
return [[$start, $end], array_sum(array_slice($array, $start, $end - $start + 1))];
}
$dates = [
'6/28/2015',
'6/29/2015',
'6/30/2015',
'7/1/2015',
'7/2/2015',
'7/3/2015',
'7/4/2015',
'7/5/2015',
'7/6/2015',
'7/7/2015',
'7/8/2015',
'7/9/2015',
'7/10/2015',
'7/11/2015',
];
$commits = [4, 3, 5, 7, 12, 15, 9, 11, 8, 2, 0, 0, 1, 0];
$streak = longestStreak($commits);
$start = $dates[$streak[0][0]];
$end = $dates[$streak[0][1]];
echo "Longest Streak: {$streak[1]} ($start to $end)";
```

The output from this code should read...

Longest Streak: 46 (6/28/2015 to 7/3/2015)

Notice, we've also added an array of dates to give a little more context to the example.

Now you might be wondering, *how do I get the current streak?*

Given what you've learned about finding the longest streak, I'll leave it as an exercise for the reader
to figure out how to arrive at the *current streak*. Here's your hint: it still requires building
an array of deltas from the daily commit totals, but you may want to consider building it backwards.