Rockbox

  • Status Unconfirmed   Reopened
  • Percent Complete
    0%
  • Task Type Bugs
  • Category User Interface
  • Assigned To
    kugel.
  • Operating System All players
  • Severity Low
  • Priority Very Low
  • Reported Version Version 3.1
  • Due in Version Undecided
  • Due Date Undecided
  • Votes
  • Private
Attached to Project: Rockbox
Opened by bryan - 2009-03-17
Last edited by pondlife - 2011-09-06

FS#10030 - latest nat sort fix still sorts wrong

In the never ending saga of strnatcmp the fix for 10029 sorts as
001
002
01
02

Attached is yet another attempt to get it right.

Could you post it as a patch so we can see what is changed more readily without having to diff manually?

bryan commented on 2009-03-17 21:01

patch a is against 20339
patch b is against 20345

So you're counting the zeros? I'm thinking that could be done easier.

Anyway, I'm going to close again. We thought of special casing "real" leading zeros (those at the very beginning of the string, not those within the string when the string before is the same for both strings). But we decided on following the original implementation for now. It gives consistent results. It gives consistent behavior with other software using the algorithm.

And I think sorting
001
002
01
02
isn't a bug afterall. There's a good chance that this is intention. Also, imagine this are decimal numbers.
1.001
1.002
1.01
1.02
It would be even more correct for this case.
Or this case:
007_tomorrow-never-dies
24_S02E11
This sorting makes more sense to me too.

We've discussed this deeply, see discussion starting here: http://www.rockbox.org/irc/log-20090317#17:53:35

Due to the latest mailing list discussion, I've reopened this task.
As of now, the general consensus seems that this task is a bug, and should be fixed.

Thus, I'm attaching a patch which resolves this issue.

Before looping through the chars, strnatcmp will go through leading zeros, increasing the index. After that is finished, the index is decreased by 1 if there the current char is a non-digit, so that the leading zero counts for non-digits.

This leads to (strng → value for sorting):
00 → 0
0b → 0b
01 → 1
1 → 1

which is the behavior of nautilus and windows explorer too.

It also fixes  FS#10031 , by using tolower instead of toupper for case-insenstive sorting. tolower is used by strcmp too, so that strnatcmp sorts chars between 'Z' and 'a' like strcmp does.

bryan commented on 2009-03-19 18:02

Here's something I've been working on. It's based on the pre-revert version. It only deals with leading zeros. It does nothing with spaces.

I had to introduce a bias variable, since there's some kind of inconsistency on how the nat comp and strcmp handle the number of leading zeroes. For one file it favored more zeroes and in another it would favor less zeroes.

Also there's ifdef alternative code for a version that removes leading zeroes for number and letters. so [00, 1, 0b, 002] sorts as [00, 1, 002, 0b] instead of [00, 0b, 1, 002]

Uhm, I think my version does pretty good, have you tried it?

bryan commented on 2009-03-19 18:05

Oops bad line corrected version

bryan commented on 2009-03-19 18:13

Your's works fine.

Try it on files named.

000.txt
00.txt
001.txt
01.txt

You should see the inconsistency I ran into;

sorts as

00.txt
000.txt
001.txt
01.txt

You're removing compare_left again, which should break fractional numbers like 1.001.

Well, this is due to the strcmp fallback.

bryan commented on 2009-03-19 19:01

Sorry, I wasn't try to remove compare_left again or replace existing code. I was only trying to illustrate a different take. Basically, a run of numbers is compared as a integer, everything else is a separator. I used pre-revert code for this since it's closer to that assumption and easier to illustrate. I didn't need two functions since all number compares are treated the same.

For my code 1.001 is treated as 1[seperator]1 not One thousand and One or 1 point 1/1000th. Or whatever.

Personally, I think this is the easiest to implement and easiest to explain to non techie users.

As for the strcmp fallback, Yeah I know that's what causing it. I pointed that out in a previous comment. But, you just know somebody is going to file that as a bug, so I thought I'd give you a heads up. :D

Then try 1.002 and 1.01 as filenames.

So, your patch breaks decimal numbers (or any number after a non-digit separator).

The inconsitency you noted is only when the series of digits is the same. They sorted correctly if the series is different. (different by the means of: 001.txt and 01.txt are different, 000.txt and 00.txt are not).
I rather not break decimal numbers. And your inconsistency is at least consistently appearing, so my wishes to fix that are limited (and your version breaks more than it fixes).

Small update, no functional chage.

bryan commented on 2009-03-19 19:56

I'm not sure what you're trying to get at. They sort differently as to be expected. Different assumptions, different code.

Mine
1.01 → 1[sep]1
1.002 → 1[sep]2

Yours and current
1.002 → 1 and 2/1000
1.01 → 1 and 10/1000

If your saying 1.002 should be sorted first because it less than 1.01 because it can be interpreted as a real number. Mine is wrong, current is right.

If those are version numbers both are right and both are wrong. .01 can be interpreted as a 1 or a 10. Depending on who you ask.

If your using another countries rules they could be interpreted as 1.01 and 1002. In which case mine is wrong, but sorts right for the wrong reasons and current code is wrong.

I apologize if I'm misunderstanding you. This is obviously an easy subject for that to happen. Just look at the irc and dev list. But trying to argue what 1.002 vs 1.01 'means' just leads down a path where nobody can agree and everybody is frustrated.

I'm not suggesting my patch be included in RB. I was trying to implement the minimal version of 'sort by number' and figured somebody would be interested in looking at it. Not deal with spaces, not guess what somebody meant by add leading zeros, underscore, or whatever. Just if it's 0-9 treat it as a number.

I interpret it as "real number". That's (IIUC) what the discussion said should be done. Anything serves as a separator, so 1a002 < 1a01 is correct too, so it's not locale dependent.

This is also wanted for discnumber.tracknumber. 1.1 < 1.10, as well as 1.01 < 1.10, that's how it should sort discumber.tracknumber filenames.

fml2 commented on 2009-03-19 22:03
I interpret it as "real number"

I thought we didn't want to interpret real numbers because their interpretation depends on what char is used as a decimal point, which is country dependent. Hence I think that the logic to interpret "1.002" as "1 sep 2" is to be used.

And I repeatedly said, that this isn't dependent on locales, since any char acts as separator.

Maybe I missed it, but where did we agree on intentionally breaking decimal numbers? I heavily disagree on that.

Not to mention, that this would be against doing what the major filebrowser do again.

Project Manager

I'm strongly against trying to interpret floats. The separator _is_ locale dependent and you cannot code your way around that. Accepting "any" character just makes it worse.

A number should be a sequence of digits. Period.

It's not about interpreting floats. It's about treating any number within the string (i.e. with non-digits before) as fractional (this of course includes floats also).

Anyway, we agreed on mimic'ing major browsers, and I just double checked windows explorer and nautilus. And they do NOT interpret floats, i.e. they sort: 1.1 < 1.03 < 1.4 < 1.007. So, I guess I'm ok with it.

edit: ls -lv though interprets floats.

I might have missed something, but the discussion on the mailing list mostly was about treating leading zeros. If you consider any string a separator, how would files like "2_episode5", "2_episode_10" get sorted? I'm completely with Daniel here.

2_episode5" < "2_episode_10", because 5 comes before _.

Anyway, that's why I asked again. I may have missed it, because the discussion was almost entirely about leading zeros at the beginning of the string.

If we want to follow major browsers, we do not interpret fractional numbers.

argh, unintentionally put a stray _ in my filename examples. Question was intended to be "2_episode5", "2_episode10". I think that should have been clear in the first place but well.

Handling stuff other than leading zeros wasn't discussed in detail, so I guess there is another discussion coming up. Though I guess it will be shorter.

It would still sort 2_episode5 < 2_episode10. There's no float involved. strnatcmp identifies floats/fractional by a leading zero (leading for this particular series of digits, not for the whole string).

fix or close this…

Loading...

Available keyboard shortcuts

Tasklist

Task Details

Task Editing