Drift

The personal blog of Phoenix-based developer Josh Janusch

Laravel Collection Macros: Adding a "sortByMuti" function

Two weeks ago, I was faced with a fun little task – changing the way a report was sorted to sort by 6 different data points. Now, one of the best things about SQL is you can do this very easily. Unfortunately, the data set I was working with was half-derived at run-time and couldn't be cached or stored in the database because the results change based on what range of dates are generated and what date it is generated on.

With the option to use SQL's ability to sort multiple columns at once, I was left with one option: do it at run time in PHP, ideally in a re-usable fashion using Laravel's Collection class.

Overview of the Task

So let's start off with a better explanation of what we needed to do. Let's take this super simple data set:

$collection = collect([
    [
        'name' => 'John Doe',
        'city' => 'Dallas',
        'state' => 'Texas',
        'size' => [
            'height' => 72,
            'weight' => 210
        ]
    ],
    [
        'name' => 'Jane Doe',
        'city' => 'Houston',
        'state' => 'Texas',
        'size' => [
            'height' => 60,
            'weight' => 120
        ]
    ],
    [
        'name' => 'Sam Swanson',
        'city' => 'Dallas',
        'state' => 'Texas',
        'size' => [
            'height' => 72,
            'weight' => 274
        ]
    ],
    [
        'name' => 'Jeremy Simpson',
        'city' => 'Dallas',
        'state' => 'Texas',
        'size' => [
            'height' => 71,
            'weight' => 210
        ]
    ],
    [
        'name' => 'Lois Smith',
        'city' => 'Seattle',
        'state' => 'Washington',
        'size' => [
            'height' => 65,
            'weight' => 132
        ]
    ],
]);

Please note this is not the actual dataset we were working with, or even the correct keys; it's simply an easy example. As you can see, we have a collection of 5 records representing individuals in 3 cities in 2 states, along with some descriptive data representing their height (in inches) and weight (in lbs). The goal was to sort in this order:

  1. State
  2. City
  3. Height
  4. Weight
  5. Name

Since we are using the Laravel Collection class, we do have access to an easy-to-use sorting function. If we want to sort by state, all we need to do is this:

$collection->sortBy('state');

The problem comes when you then want to sort by city. Doing this:

$collection->sortBy('state')->sortBy('city')

Sorts the Collection by state and then sorts it again by city. In the end, you end up with a Collection sorted by city and nothing else, instead of sorting by state (so Texas then Washington) and then by city (so Dallas then Houston under Texas, and Seattle under Washington). Laravel's Collections do not have a built-in way to sort by multiple keys, unfortunately, so I had two choices:

  1. Find a package or snippet that will let me do it. This is always a viable option with Laravel thanks to its large and active community. I've looked for this in the past and found a few options, but none of them satisfied me so I decided against it.
  2. Spend an hour and build it myself

Plan

The very first thing I did was draw out exactly what had to happen so I could visualize what needed to happen. Some sketches (ie poorly drawn rectangles and brackets) with a physically printed data set led me to the following process:

  1. Group by state so that all Washington records are together and all Texas records are together
  2. Sort the grouped data by its group value, or Texas -> Washington
  3. Loop through each state group, and group again by city
  4. Sort the grouped city data by within each state group, or Dallas -> Houston, Seattle
  5. Repeat ad nauseam for each remaining key
  6. Undo each level of grouping so that in the end, we are left with a single collection of data with no nested data

Attempt 1: Manual, non-reusable

With those instructions in mind, I decided the first step was to create the result the long way before making it something that could easily be used and re-used. My initial thought was "I can just use groupBy(), sortBy(), and flatten() to easily achieve this, and that was half correct. Sorting a single level (minus making it into a single level) was simple enough:

$collection->sortBy('state')->groupBy('state');

I learned quickly that the best way to handle this was to swap points 1 and 2 of the plan and sort before grouping as it allows for a more limited amount of code to be used. Adding a second level was also simple, just by adding a map() call:

$collection->sortBy('state')->groupBy('state')->map(function (Collection $collection) {
    return $collection->sortBy('city')->groupBy('city');
});

With that, I could expand and do all 5 keys:

$collection->sortBy('state')->groupBy('state')->map(function (Collection $collection) {
    return $collection->sortBy('city')->groupBy('city')->map(function (Collection $collection) {
        return $collection->sortBy('size.height')->groupBy('size.height')->map(function (Collection $collection) {
            return $collection->sortBy('size.weight')->groupBy('size.weight')->map(function (Collection $collection) {
                return $collection->sortBy('name')->groupBy('name');
            });
        });
    });
});

And, again, it worked. The data was sorted exactly how I wanted it. The only problem was that the data I was sorting was now 4 levels deeper than I needed it. So I tried my initial thought which was flatten(), a Collection method I had never really worked with. Unfortunately, all that did was create a single record in each state-group, still nested 4 levels deep. Looking at the Collections - Available Methods documentation, I noticed the collapse() method which promised:

collapses a collection of arrays into a single, flat collection

Perfect! That is exactly what I am looking for! Except after playing with it, the best I could do was to get the very first record and only that record. I didn't dig too far into it, but it seemed to work much the same way as flatten does, in that it combines keys or something to that effect. Regardless, I was looking at unacceptable data destruction.

Using Macros to create an ungroup() method

After digging some more, I realized this simply was not possible using the built-in Laravel methods. I did some googling and found nothing really relevant and then decided to just build it myself. Initially, I was going to create a static util function to ungroup a collection. It was messy, but I was sure it would work. Then I remembered something I had read about in passing in the GitHub issues once: Collection Macros. We were using one or two View Macros already, but there was no documentation on Collection Macros anywhere. I did some quick Googling and found spatie/laravel-collection-macros. Spatie is one of the biggest contributors to the community, so I figured I could use their code as reference to build my own method. In the end, I came up with this:

if (!Collection::hasMacro('ungroup')) {
    /**
     * Ungroup a previously grouped collection (grouped by {@see Collection::groupBy()})
     */
    Collection::macro('ungroup', function () {
        // create a new collection to use as the collection where the other collections are merged into
        $newCollection = Collection::make([]);
        // $this is the current collection ungroup() has been called on
        // binding $this is common in JS, but this was the first I had run across it in PHP
        $this->each(function ($item) use (&$newCollection) {
            // use merge to combine the collections
            $newCollection = $newCollection->merge($item);
        });

        return $newCollection;
    });
}

All-in-all, an extremely simple method (thanks, in large part, to Laravel's fluent design). And it worked! If I added that to a service provider and combined it with my previous code, I got a single-level collection, all ordered in the correct manner. Here is a quick look at what the final manual implementation was:

$collection->sortBy('state')->groupBy('state')->map(function (Collection $collection) {
    return $collection->sortBy('city')->groupBy('city')->map(function (Collection $collection) {
        return $collection->sortBy('size.height')->groupBy('size.height')->map(function (Collection $collection) {
            return $collection->sortBy('size.weight')->groupBy('size.weight')->map(function (Collection $collection) {
                return $collection->sortBy('name')->groupBy('name');
            })->ungroup();
        })->ungroup();
    })->ungroup();
})->ungroup();

Attempt 2: Simplify, Abstract, and make Re-usable

With the data successfully sorted, I looked to take that highly repetitive snippet, abstract it into something that will work recursively, is re-usable, and ditches the repetition. With what I had just created with the ungroup() Macro, I believed that creating another Macro was the easiest route. I then took a page out of Jeffrey Way's book, and wrote how I wanted to execute the Macro, rather than writing the Macro how I wanted to write it. This has always seemed backwards to me, and even Way has said it doesn't come naturally to many devs, but it seemed like the easiest method to take.

How I wanted this to work was to simply pass in an array with the keys to sort by in the correct order. Expanding on that, due to an application necessity, I knew I also wanted to define the order I sorted each key in. These requirements led to:

$collection->sortByMulti([
    'state'       => 'ASC',
    'city'        => 'ASC',
    'size.height' => 'DESC',
    'size.weight' => 'DESC',
    'name'        => 'ASC',
]);

Perfect, now to write the sortByMulti() macro.

Using array_reduce()

My first attempt at this involved using array_reduce(), which would have let me do this without recursion. As I alluded to above, I was confident this could be handled with a recursive function but I try to avoid them for optimization and maintenance reasons where possible. So I started writing it and I encountered an issue pretty quickly: you can't get keys in array_reduce(), which was necessary if I wanted to defined what order to sort each key in. With a bit of searching, I found this comment in the PHP docs that showed a trick for getting the keys of the array. Using that, I was able to get to the point where I had the data sorted, but 4 levels deep. I couldn't come up with an easy way to implement the ungroup() macro, since the return of the array_reduce is sent to the next iteration and I needed that data to be grouped in that iteration and not ungrouped until all grouping and sorting had completed. Not seeing an obvious solution with my limited time allotment and knowing how complicated array_reduce() can be, I abandoned this attempt.

Using Recursion

With array_reduce() out of the question, I decided to just bite the bullet and write a recursive method to handle this. With this, I ran into a few hiccups:

  1. I wanted to do this without creating any Class-level variables or methods. But an initial test showed you couldn't pass an anonymous function into itself. If I did nothing, the function did not exist. If I passed it in via use(), it was available in-scope, as expected, but set to null. I then discovered I could pass it in by reference (ie use (&$anonFunc)), which wasn't pretty but I could live with.
  2. The primary benefit of array_reduce() was that I could iterate over one array (my keys) while manipulating another. With a recursive function, I had to find a way to work with those keys
  3. Any loop iterations would have to be tracked manually (ew!)

My final solution was this:

if (!Collection::hasMacro('sortByMulti')) {
    /**
     * An extension of the {@see Collection::sortBy()} method that allows for sorting against as many different
     * keys. Uses a combination of {@see Collection::sortBy()} and {@see Collection::groupBy()} to achieve this.
     *
     * @param array $keys An associative array that uses the key to sort by (which accepts dot separated values,
     *                    as {@see Collection::sortBy()} would) and the value is the order (either ASC or DESC)
     */
    Collection::macro('sortByMulti', function (array $keys) {
        $currentIndex = 0;
        $keys = array_map(function ($key, $sort) {
            return ['key' => $key, 'sort' => $sort];
        }, array_keys($keys), $keys);

        $sortBy = function (Collection $collection) use (&$currentIndex, $keys, &$sortBy) {
            if ($currentIndex >= count($keys)) {
                return $collection;
            }

            $key = $keys[$currentIndex]['key'];
            $sort = $keys[$currentIndex]['sort'];
            $sortFunc = $sort === 'DESC' ? 'sortByDesc' : 'sortBy';
            $currentIndex++;
            return $collection->$sortFunc($key)->groupBy($key)->map($sortBy)->ungroup();
        };

        return $sortBy($this);
    });
}

So some explanations:

  1. On Line 11, I take the $keys array and remap it so that each key/value pair was stored in a sub-array. This was necessary because on lines 20 and 21, I had to access the array by a numeric key which is not possible with an associative array. This isn't the cleanest code I've written, but it works and I thought it was a clever solution.
  2. On Line 15, you'll notice the solution to a recursive anonymous function: use (...&$sortby)
  3. Lines 16-18 determine when to stop the recursion. If the index grows larger than the size of the $keys array, stop and just return the collection.
  4. On Line 22, I determine whether I want to use sortBy() or sortByDesc(). I call that function using a variable to avoid having to write line 24 twice as part of a conditional.
  5. I manually track the index of the $keys we're on in line 23. Again, this was something I was displeased with but I could not find a better solution.

In the end, it did its job marvelously. I can now run this:

$collection->sortByMulti([
    'state'       => 'ASC',
    'city'        => 'ASC',
    'size.height' => 'DESC',
    'size.weight' => 'DESC',
    'name'        => 'ASC',
]);

and get these results:

[
    [
        "name"  => "Sam Swanson",
        "city"  => "Dallas",
        "state" => "Texas",
        "size"  => [
            "height" => 72,
            "weight" => 274,
        ],
    ],
    [
        "name"  => "John Doe",
        "city"  => "Dallas",
        "state" => "Texas",
        "size"  => [
            "height" => 72,
            "weight" => 210,
        ],
    ],
    [
        "name"  => "Jeremy Simpson",
        "city"  => "Dallas",
        "state" => "Texas",
        "size"  => [
            "height" => 71,
            "weight" => 210,
        ],
    ],
    [
        "name"  => "Jane Doe",
        "city"  => "Houston",
        "state" => "Texas",
        "size"  => [
            "height" => 60,
            "weight" => 120,
        ],
    ],
    [
        "name"  => "Lois Smith",
        "city"  => "Seattle",
        "state" => "Washington",
        "size"  => [
            "height" => 65,
            "weight" => 132,
        ],
    ],
];

Potential Optimizations and Clean Ups

As mentioned in the beginning of this post, I had about an hour, hour and a half to write this. As a result, I did take some shortcuts and this can definitely be optimized:

  1. The most obvious issue is that I am creating a lot of extra collections through sortBy() and groupBy(). ungroupBy() also creates a possibly unnecessary collection. Unfortunately, there isn't a much better option available short of writing it using the native array methods, which would have been slightly more performant. The biggest issue my method risks is memory usage and perhaps memory leaks. My use case is only seeing me sort a collection of about 200 records at a time, so I wasn't overly concerned with this.
  2. The $keys argument of the sortByMulti() method requires the ordering be passed in with each key. This is a bit of a pain and I would have liked for you to only have to declare keys that are sorted in DESC, but I felt it was an all right compromise to keep it simple and limit time spent.
  3. It's generally best to avoid recursive methods as much as possible. There's nothing wrong with them from an architectural standpoint, but it's possible I did not eliminate all opportunities for infinite recursion or memory leak, even if I did keep it in mind when writing. The best alternative is the array_reduce method I attempted, but the only way I can think of to use the ungroup() method would be a separate call after the reduction that would likely have to be recursive as well.
  4. The code itself is not the cleanest or most well organized and could be simplified further (such as inlining the declaration of the $key and $sort variables).

All in all, this was a fairly simple solution to a problem I've skirted around solving many times now and only took a little over an hour to accomplish, and I am fairly satisfied with the overall result.