This is the bug/patch tracker for Rockbox. Click here for more information.
Quick links: Bugs · Patches · Rockbox frontpage
FS#10030 - latest nat sort fix still sorts wrong
Attached to Project:
Rockbox
Opened by bryan vandyke (bryan) - Tuesday, 17 March 2009, 20:54 GMT+2
Last edited by Steve Bavin (pondlife) - Tuesday, 06 September 2011, 18:31 GMT+2
Opened by bryan vandyke (bryan) - Tuesday, 17 March 2009, 20:54 GMT+2
Last edited by Steve Bavin (pondlife) - Tuesday, 06 September 2011, 18:31 GMT+2
|
DetailsIn the never ending saga of strnatcmp the fix for 10029 sorts as
001 002 01 02 Attached is yet another attempt to get it right. |
This task depends upon
patch b is against 20345
Anyway, I'm going to close again. We thought of special casing "real" leading zeros (those at the very beginning of the string, not those within the string when the string before is the same for both strings). But we decided on following the original implementation for now. It gives consistent results. It gives consistent behavior with other software using the algorithm.
And I think sorting
001
002
01
02
isn't a bug afterall. There's a good chance that this is intention. Also, imagine this are decimal numbers.
1.001
1.002
1.01
1.02
It would be even more correct for this case.
Or this case:
007_tomorrow-never-dies
24_S02E11
This sorting makes more sense to me too.
We've discussed this deeply, see discussion starting here: http://www.rockbox.org/irc/log-20090317#17:53:35
As of now, the general consensus seems that this task is a bug, and should be fixed.
Thus, I'm attaching a patch which resolves this issue.
Before looping through the chars, strnatcmp will go through leading zeros, increasing the index. After that is finished, the index is decreased by 1 if there the current char is a non-digit, so that the leading zero counts for non-digits.
This leads to (strng -> value for sorting):
00 -> 0
0b -> 0b
01 -> 1
1 -> 1
which is the behavior of nautilus and windows explorer too.
It also fixes
FS#10031, by using tolower instead of toupper for case-insenstive sorting. tolower is used by strcmp too, so that strnatcmp sorts chars between 'Z' and 'a' like strcmp does.I had to introduce a bias variable, since there's some kind of inconsistency on how the nat comp and strcmp handle the number of leading zeroes. For one file it favored more zeroes and in another it would favor less zeroes.
Also there's ifdef alternative code for a version that removes leading zeroes for number and letters. so [00, 1, 0b, 002] sorts as [00, 1, 002, 0b] instead of [00, 0b, 1, 002]
Try it on files named.
000.txt
00.txt
001.txt
01.txt
You should see the inconsistency I ran into;
sorts as
00.txt
000.txt
001.txt
01.txt
For my code 1.001 is treated as 1[seperator]1 not One thousand and One or 1 point 1/1000th. Or whatever.
Personally, I think this is the easiest to implement and easiest to explain to non techie users.
As for the strcmp fallback, Yeah I know that's what causing it. I pointed that out in a previous comment. But, you just know somebody is going to file that as a bug, so I thought I'd give you a heads up. :D
The inconsitency you noted is only when the series of digits is the same. They sorted correctly if the series is different. (different by the means of: 001.txt and 01.txt are different, 000.txt and 00.txt are not).
I rather not break decimal numbers. And your inconsistency is at least consistently appearing, so my wishes to fix that are limited (and your version breaks more than it fixes).
Mine
1.01 -> 1[sep]1
1.002 -> 1[sep]2
Yours and current
1.002 -> 1 and 2/1000
1.01 -> 1 and 10/1000
If your saying 1.002 should be sorted first because it less than 1.01 because it can be interpreted as a real number. Mine is wrong, current is right.
If those are version numbers both are right and both are wrong. .01 can be interpreted as a 1 or a 10. Depending on who you ask.
If your using another countries rules they could be interpreted as 1.01 and 1002. In which case mine is wrong, but sorts right for the wrong reasons and current code is wrong.
I apologize if I'm misunderstanding you. This is obviously an easy subject for that to happen. Just look at the irc and dev list. But trying to argue what 1.002 vs 1.01 'means' just leads down a path where nobody can agree and everybody is frustrated.
I'm not suggesting my patch be included in RB. I was trying to implement the minimal version of 'sort by number' and figured somebody would be interested in looking at it. Not deal with spaces, not guess what somebody meant by add leading zeros, underscore, or whatever. Just if it's 0-9 treat it as a number.
This is also wanted for discnumber.tracknumber. 1.1 < 1.10, as well as 1.01 < 1.10, that's how it should sort discumber.tracknumber filenames.
I thought we didn't want to interpret real numbers because their interpretation depends on what char is used as a decimal point, which is country dependent. Hence I think that the logic to interpret "1.002" as "1 sep 2" is to be used.
Maybe I missed it, but where did we agree on intentionally breaking decimal numbers? I heavily disagree on that.
A number should be a sequence of digits. Period.
Anyway, we agreed on mimic'ing major browsers, and I just double checked windows explorer and nautilus. And they do NOT interpret floats, i.e. they sort: 1.1 < 1.03 < 1.4 < 1.007. So, I guess I'm ok with it.
edit: ls -lv though interprets floats.
Anyway, that's why I asked again. I may have missed it, because the discussion was almost entirely about leading zeros at the beginning of the string.
If we want to follow major browsers, we do not interpret fractional numbers.
Handling stuff other than leading zeros wasn't discussed in detail, so I guess there is another discussion coming up. Though I guess it will be shorter.