A mixed multi-level order list sort implementation

I'm sorry, I know the title is confusing!😖 I couldn't think of a better way to summarize the task, so I went with that. So let me explain the problem we're trying to solve:

First, the basic version: imagine we have items in a list, and we're asked to sort them based on their order in another list:

We're given an array like this:

order = [
  'Jake',
  'Amy',
  'Charles',
  'Hitchcock',
  'Scully',
]

and some other array that may contain (some of) the items in the first (and maybe other items as well) in some unknown order. For example:

items = [
  'Scully'
  'Jake'
  'Rosa'
  'Hitchcock'
  'Amy'
]

The task is to sort array items so that its items are ordered the same as in the array order.

I call the first array an "order list", because it's a list of the items in the order they should be. ¯\(ツ)/¯ (Is there a standard name for this?)

The first real challenges here are that our order list can have extra items that aren't in items, and that items can have extra items that aren't in order, in which case you need to decide what to do with them (move them to the end? move them to the start?). Typically, the task will specify what you should do with the extras. However, these are minor problems.

For the "multi-level" part, we'll take it up a notch.

  • Instead of the items to be sorted being strings, they're objects/arrays with a name property. Example:

    $items = [
      { name:  'Scully' }, 
      { name: 'Jake' },
      // ...
    ]
    
  • The items also have sub-items, which also have some name strings (these strings are independent of the parents):

    $items = [
      { 
        name:  'Scully',
        related: [
          { name: 'Norm' },
          { name: 'Kelly' },
          { name: 'Michael' },
          // ...
        ]
      }, 
      {
        name: 'Jake',
        related: [
          { name: 'Franzia' },
          { name: 'Doug' },
      },
      // ...
    ]
    
  • And the order list can specify the order for their child items as well:

    $order = [
      'Jake' => [
        'Doug',
      ],
      'Scully' => [
        'Kelly',
        'Norm',
      ],
    ]
    
  • There's still more (the "mixed" part)! These children have a category property, in which case, they should first be sorted by their category, and then their name. And the order list can contain orders for all these!

    items = [
       { 
         name:  'Scully',
         related: [
           { name: 'Norm', category: 'self' },
           { name: 'Earl', category: 'friend' },
           { name: 'Kelly', category: 'pet' },
           { name: 'Michael', category: 'friend' },
           { name: 'Merissa', category: 'friend' },
           // ...
         ]
       }, 
       {
         name: 'Jake',
         related: [
           { name: 'Franzia', category: 'nemesis' },
           { name: 'Doug' },
       },
       // ...
     ]
    
    $order = [
      'Jake' => [
        'Doug',
      ],
      'Scully' => [
        'friend' => [
          'Michael',
          'Merissa',
        ],
        'pet' => [
          'Kelly',
        ],
        'self' => [
          'Norm',
        ]
      ],
    ]
    

Whew! It's getting pretty tricky now. But one more thing: the order list can omit items, or write them as values instead of keys if we don't care about the order of their child items.

$order = [
  'Jake',
  'Scully' => [
    // We don't care about the order of pets, 
    // just that they're before friends
    'pet',
    'friend' => [
      'Michael',
      'Merissa',
    ],
    // We want Norm to always be last among all children
    'Norm', 
  ],
]

This gives us something of a mixed bag in the order list. We have parent_name => category_name in some places (Scully => pet), parent_name => category_name => child_name in some others (Scully => friend => Michael), and parent_name => child_name in others (Scully => Norm). So, while the top level is always parent item names, the second level can mix category names with children names.

Okay, that's the problem we're trying to solve. It's not an abstract problem, by the way. It's a feature I recently implemented in Scribe. The items were groups of API endpoints, the children were endpoints, and the category field was an optional subgroup the endpoint could be in.

A note on design

This is something of a strange data structure we have here. The key is that PHP arrays are "God" data structures—they can contain both numbered and string indexes. Here's what you get when you dump the order list:

[
  0 => "Jake",
  "Scully" => [
    0 => "pet",
    "friend" => [
      "Michael",
      "Merissa",
    ],
    1 => "Norm",
  ],
]

It's awful, but it's beautiful. It's a cursed data structure, but we can use it to our advantage.

I learnt this approach from tools like Laravel and Docusaurus. It's nice not having to deal with boilerplate code, and focus on the things you really care about. The "pure" alternative would be making every thing conform to a rigid structure, like we had earlier, but I prefer having less boilerplate when the meaning is clear.

Alright, let's dive in!

Normalizing items

We start with the top-level sort. The first complication is that top-level items in the order list may have children (making them array keys, like "Scully" in the above example) or may not (making them array values, like "Jake"). So we create a helper function to normalize these. It'll take a mixed array like these and return the top-level items.

function getTopLevelItemsFromMixedList(array $mixedList): array
{
  $topLevels = [];
  foreach ($mixedList as $item => $value) {
    $topLevels[] = is_int($item) ? $value : $item;
  }
  return $topLevels;
}

This gives us what we need:

$order = [
  'Jake',
  'Scully' => [
    'pet',
    // ...
  ],
]
getTopLevelItemsFromMixedList($order);
// [ 'Jake', 'Scully' ]

And now we sort.

The comparator

Unfortunately, I don't think PHP's multitude of sort functions includes one that orders items in one list based on another (actually, I'm a bit surprised I haven't seen this inbuilt in any programming languages; let me know if you come across one).

Anyway, no fear, because we've still got this! We use usort(), passing in our own comparison function.

What's a comparison function? Well, most sort algorithms will compare two items at some stage to decide which should come first. It is this comparison function (I like to call it a "comparator") that we must provide.

The comparator must satisfy one requirement: given two items A and B, return -1 if A should be considered less than B, 1 if A should be considered greater than B, or 0 if both items should be considered equal. This is the same thing PHP's "spaceship operator", <=>, does.

The comparator is really important for values where native comparison operators (<, >, =, <=>) don't work (or don't work in the way you expect), such as strings and objects. With the comparator, you can specify yourself whether "chicken" comes before "egg".

If we were sorting numbers, our comparator would be:

$comparator = fn ($a, $b) => $a <==> $b

But we aren't, so let's write ours. The comparator has to extract the name property from each object, then compare them based on their positions in the order list (using array_search() to find the index of an item in an array.). Here we go!

function getComparator(array $order) {
  return function ($a, $b) use ($order) {
    $indexOfA = array_search($a['name'], $order);
    $indexOfB = array_search($b['name'], $order);
    return $indexOfA <=> $indexOfB;
  };
}

But remember that some items may not be in the order list. So we have to account for that as well. We make a decision: if any items aren't in the order list, we want to send them to the end of the sorted array (and sort them alphabetically). So we adjust our function:

function getComparator(array $order) {
  return function ($a, $b) use ($order) {
    $indexOfA = array_search($a['name'], $order);
    $indexOfB = array_search($b['name'], $order);
    
    // If both are in the $order list, compare them based on their position in the list
    if ($indexOfA !== false && $indexOfB !== false) {
      return $indexOfA <=> $indexOfB;
    }
    
    // If only A is in the $order list, then it must come before B.
    if ($indexOfA !== false) {
      return -1;
    }
  
    // If only B is in the $order list, then it must come before A.
    if ($indexOfB !== false) {
      return 1;
    }
  
    // If neither is present, fall back to natural sort
    return strnatcmp($a['name'], $b['name']);
  };
}

Here we're falling back to strnatcmp, which is like <=> for strings. It uses "natural" comparison, which is like alphabetical comparison, but the way humans expect it, not computers. For example, regular sort will place "Part 10" before "Part 2", while natural sort places "Part 2" before "Part 10".

We could also return 0 if we didn't care about sorting the extra items. 0 means that the items are equal, so PHP will not reorder them, and they'd end up at the end of the array in the same order.

And now we try it:

$orderList = [
  'Jake',
  'Scully' => [
    'pet',
    'friend' => [
      'Michael',
      'Merissa',
    ],
    'Norm',
  ],
  'Amy',
  'Charles',
  'Hitchcock',
];

$items = [
  [ 'name' => 'Scully' ],
  [ 'name' => 'Rosa' ],
  [ 'name' => 'Bill' ],
  [ 'name' => 'Jake' ],
  [ 'name' => 'Hitchcock' ],
  [ 'name' => 'Amy' ],
];
$topLevelOrder = getTopLevelItemsFromMixedList($orderList);
usort($items, getComparator($topLevelOrder));

Voila!

>>> $items
=> [
     [ "name" => "Jake" ],
     [ "name" => "Scully" ],
     [ "name" => "Amy" ],
     [ "name" => "Hitchcock" ],
     [ "name" => "Bill" ],
     [ "name" => "Rosa" ],
   ]

Here's a playground to try it for yourself.

sort_by()

There's a different, somewhat easier approach we could have chosen here. There isn't a function to automatically sort one array using another, but rather than overriding the comparison function, we can instead tell the sort function what to compare. In this case, we can map each item's name to its index in the order list, and sort that!

First, let's create a sort_by() function that takes a resolver function. It will sort the items based on what the resolver returns for each.

function sort_by(array $array, callable $resolver, ...$sortArgs) {
 $mapped = array_map($resolver, $array);
 asort($mapped, ...$sortArgs);
  
 $sorted = [];
 foreach ($mapped as $key => $index) {
   $sorted[] = $array[$key];
 }
  
 return $sorted;
}

Our resolver will be a function that maps the item to its index in the order list. But what we do if the item isn't in the order list? One trick is to return PHP_INT_MAX, INF (infinity, ∞), or just any large number. Returning this will ensure these items are sent to the end of the sorted list.

But in this case, we don't just want them to go to the end; we'd ideally like to sort them with natural sort as well (so "Bill" should come before "Rosa"). So instead we return the large number concatenated with the item's name. The number has to be the same for both items, so they both go the back.

To clarify: Given ["Rosa", "Bill"]:

  • If we return INF (or any large number) for both, PHP will send them both to the back, but they'll remain in this order
  • If we return "9999Rosa" and "9999Bill", PHP will sort them: ["Bill", "Rosa"]

How do we pick a number? We could just use something like 99999999, but a guaranteed way is to use the length of the order list, since the number just needs to be larger than the largest index in the list. Putting this all together, we get:

$sortKeyResolver = function ($item) use ($topLevelOrder) { 
  $orderListIndex = array_search($item['name'], $topLevelOrder);
  return $orderListIndex === false
    ? count($topLevelOrder).$item['name'] 
    : $orderListIndex;
};

And sorting (see playground):

$orderList = [
  'Jake',
  'Scully' => [
    'pet',
    'friend' => [
      'Michael',
      'Merissa',
    ],
    'Norm',
  ],
  'Amy',
  'Charles',
  'Hitchcock',
];

$items = [
  [ 'name' => 'Scully' ],
  [ 'name' => 'Rosa' ],
  [ 'name' => 'Bill' ],
  [ 'name' => 'Jake' ],
  [ 'name' => 'Hitchcock' ],
  [ 'name' => 'Amy' ],
];

sort_by($items, $sortKeyResolver, SORT_NATURAL);

Result:

[
  [ "name" => "Jake" ],
  [ "name" => "Scully" ],
  [ "name" => "Amy" ],
  [ "name" => "Hitchcock" ],
  [ "name" => "Bill" ],
  [ "name" => "Rosa" ],
]

Nested sort

Moving on, let's sort the child items (in related). If you've forgotten, remember our items array can look like this:

$items = [
  [ 
    'name' => 'Scully',
    'related' => [
      [ 'name' => 'Norm', 'category' => 'self' ],
      [ 'name' => 'Earl', 'category' => 'friend' ],
      [ 'name' => 'Kelly', 'category' => 'pet' ],
      [ 'name' => 'Michael', 'category' => 'friend' ],
      [ 'name' => 'Merissa', 'category' => 'friend' ],
    ], 
  ],
  // ...
];

Assuming we didn't have to worry about the category field, we could sort the child items with the sort_by() approach:

foreach ($items as $item) {
  $children = $item['related'];
  
  if (empty($orderList[$item['name']])) {
    // No order was specified for the children, so just sort them as normal
    $item['related'] = sort_by($children, fn ($child) => $child['name'], SORT_NATURAL);
    continue;
  }

  // An order was specified; sort the children with it
  $childrenOrder = getTopLevelItemsFromMixedList($orderList[$item['name']]);
  $sortKeyResolver = function ($childItem) use ($childrenOrder) { 
    $orderListIndex = array_search($childItem['name'], $childrenOrder);
    return $orderListIndex === false 
      ? count($childrenOrder).$childItem['name'] : $orderListIndex;
    };

  $item['related'] = sort_by($children, $sortKeyResolver, SORT_NATURAL);
}

But we can't just do that. Our other requirement is to sort by category first, if it exists. And the second level of the order list may contain either name or category values. So let's rework this.

The 0.1 trick

We're still going to use sort_by(). But the logic of our resolver has to change slightly.

As a reminder, here's a subset of our order list and items:

$orderList = [
  // ...
  'Scully' => [
    'pet',
    'friend' => [
      'Michael',
      'Merissa',
    ],
    'Norm',
  ],
  // ...
];

$items = [
  [ 
    'name' => 'Scully',
    'related' => [
      [ 'name' => 'Norm', 'category' => 'self' ],
      [ 'name' => 'Earl', 'category' => 'friend' ],
      [ 'name' => 'Kelly', 'category' => 'pet' ],
      [ 'name' => 'Michael', 'category' => 'friend' ],
      [ 'name' => 'Merissa', 'category' => 'friend' ],
    ], 
  ],
  // ...
];

There are four scenarios to account for:

  1. The child item is mentioned by name in the second-level order list (example: "Norm").
  2. The child item's category is in the L2 list, but its name isn't mentioned in the category's (L3) list (example: "Earl").
  3. The child item's category is in the L2 list, and its name is mentioned in the category's (L3) list (example: "Merissa" and "Michael").
  4. The child item's name and category are both not mentioned.

For scenario 1, we can simply return the child item's index as before, and for scenario 4, we can stick with our "large number + item name" trick.

For scenario 2, we adapt that trick a bit. This time, we return 'category index + item name'. And for scenario 3, we return category index + item index * 0.1. (Note that the previous +s were string concatenation, but this is addition.)

Let's eamine what this gives us for each child item in Scully:

  • [ 'name' => 'Norm', 'category' => 'self' ]: Scenario 1, so we return 2
  • [ 'name' => 'Earl', 'category' => 'friend' ]: Scenario 2 -> "1Earl"
  • [ 'name' => 'Kelly', 'category' => 'pet' ]: Scenario 3 -> 0.0
  • [ 'name' => 'Michael', 'category' => 'friend' ]: Scenario 3 -> 1.0
  • [ 'name' => 'Merissa', 'category' => 'friend' ]: Scenario 3 -> 1.1

Putting these together, we'll get [2, "1Earl", 0.0, 1.0, 1.1]. And when PHP sorts this naturally, we get:

This works because the sort value doesn't have to be an integer. The idea behind the "0.1 trick" is to use decimals to insert items in between two positions. And the "1Earl" part works because PHP's natural sort will sort the numerical part first, then the string part (so 1.3 will come before "1a", which will come before "1earl").

All done! Here it is in code:

foreach ($items as &$item) {
  $children = $item['related'] ?? [];
  
  if (empty($orderList[$item['name']])) {
    // No order was specified for the children, so just sort them as normal
    $item['related'] = sort_by($children, fn ($child) => $child['name'], SORT_NATURAL);
    continue;
  }

  // An order was specified; it can contain children names or categories,
  // so let'sjust  call it the "Level 2" order (L2).
  $l2Order = getTopLevelItemsFromMixedList($orderList[$item['name']]);
  $sortKeyResolver = function ($childItem) use ($l2Order) {
    // First, we check if there's an entry for the item's name
    $indexOfChildInL2Order = array_search($childItem['name'], $l2Order);
    if ($indexOfChildInL2Order !== false) {
      return $indexOfChildInL2Order;
    }

    // Check if there's an entry for the item's category
    $indexOfCategoryInL2Order = array_search($childItem['category'], $l2Order);
    if ($indexOfCategoryInL2Order !== false) {
      // There's an entry for the category! 
      // Now check if there's an entry for the item within the category.
      $orderOfChildrenInCategory = $l2Order[$childItem['category']] ?? [];
      $indexOfChildInCategory = array_search($childItem['name'], $orderOfChildrenInCategory);
      
      return $indexOfChildInCategory === false 
        ? $indexOfCategoryInL2Order.$childItem['name'] 
        : ($indexOfCategoryInL2Order + ($indexOfChildInCategory * 0.1));
    }

    return count($l2Order).$childItem['name'];
  };

  $item['related'] = sort_by($children, $sortKeyResolver, SORT_NATURAL);
}

Here's a gist and a playground with the full code.



I write about my software engineering learnings and experiments. Stay updated with Tentacle: tntcl.app/blog.shalvah.me.

Powered By Swish