Rockbox mail archive
Subject: Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
From: Paul Louden <paulthenerd_at_gmail.com>
Date: Wed, 18 Mar 2009 18:13:50 -0500
Al Le wrote:
> In my understanding, the intention of natsort is to change the rules
> of how strings (file names) are sorted. As a side effect, it also
> fixes the problem with 1, 10, 2. But it's not only about that. It's
> more general.
What it does is more general. What it was intended to _fix_ was that
specific type of case. That's what prompted the initial discussion of it,
and that case was the focus of every proposal (that I remember) for
sorting to be "improved" or "changed."
Basically, the logic went "users expect a computer to know that the
series of numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 goes in that order, and
seeing them in ASCII order is unexpected."
My personal position is also that if a user adds a 0 before a number,
they expect it to change something, rather than being ignored. I think,
on average, more 0s (in lists meant to be sorted) will be intentional
than "accidental." If you want the list sorted, you either name the
files yourself or use a set of files already named to sort correctly. I
think it's exceptionally rare that a user will create a list of files
meant to be sorted, name them 3, 04, 5, and mean them to go in that
order. Meanwhile, it's exceedingly _rare_ in my opinion that
people would intend 1, 10, 2, 3, 4 as their sorting order. And in that
case we're not throwing out any data they added, just trying to "read"
what is written, rather than treat it as a string of unique characters.
I think 004 being treated as "00, then 4" is the same as 4a being
treated as "4, then a" rather than the string "4a". Otherwise we may as
well say "numbers need a space after them to denote they aren't part of
strings" or something. For example, l337-speak named files currently may
be sorted extremely awkwardly. B007Y for example. We should probably
assume zeros are intentional there (in my opinion).
I think it's just more consistent if we don't throw out any characters.
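To make the "00, then 4" idea concrete, here is a rough sketch in Python. It is not the Rockbox strnatcmp code, just one possible reading of the rule described above. The names tokenize and compare_names are made up for illustration, and the choice to compare a run of zeros against a number as plain text is my assumption. Case folding is left out.

```python
import re
from functools import cmp_to_key

def tokenize(name):
    """Split a name into text runs, runs of leading zeros, and integers.

    Hypothetical helper: "004" becomes ["00", 4], so no character is dropped.
    """
    tokens = []
    for run in re.findall(r"[0-9]+|[^0-9]+", name):
        if "0" <= run[0] <= "9":
            significant = run.lstrip("0")
            zeros = run[:len(run) - len(significant)]
            if zeros:
                tokens.append(zeros)             # leading zeros kept as text
            if significant:
                tokens.append(int(significant))  # the rest is read as a number
        else:
            tokens.append(run)
    return tokens

def compare_names(a, b):
    """Return -1, 0 or 1, comparing token by token."""
    ta, tb = tokenize(a), tokenize(b)
    for x, y in zip(ta, tb):
        if isinstance(x, int) and isinstance(y, int):
            if x != y:
                return -1 if x < y else 1        # number against number
        else:
            sx, sy = str(x), str(y)
            if sx != sy:
                return -1 if sx < sy else 1      # anything else as text
    # All shared tokens are equal: the name with fewer tokens goes first.
    return (len(ta) > len(tb)) - (len(ta) < len(tb))

natural_key = cmp_to_key(compare_names)
```

With this sketch, sorted(["1", "10", "2", "3", "4"], key=natural_key) gives 1, 2, 3, 4, 10, while sorted(["3", "04", "5"], key=natural_key) gives 04, 3, 5, because the added zero is compared as a character instead of being ignored. B007Y splits into "B", "00", 7, "Y".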
Received on 2009-03-19