Rockbox

Tasklist

FS#8314 - Natural numeric sorting

Attached to Project: Rockbox
Opened by Nicolas Pennequin (nicolas_p) - Thursday, 13 December 2007, 16:48 GMT
Last edited by Thomas Martitz (kugel.) - Sunday, 01 March 2009, 18:12 GMT
Task Type Patches
Category User Interface
Status Closed
Assigned To No-one
Operating System All players
Severity Low
Priority Normal
Reported Version Daily build (which?)
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

This patch adds strnatcmp and strnatcasecmp, that are able to compare strings using natural numeric sorting. It also makes the trivial changes necessary to make use of this sorting in the file browser, without adding a setting.

This is very similar to  FS#2890 , but does several things better:
* It is not limited to leading numbers in a string
* It does not require a setting. strnatcmp and strnatcasecmp are drop-in replacements for strcmp and strcasecmp.
* It is much more generic and can be used anywhere else in the core. The file browser use is probably not the only place where we'd want to use natural sorting.

One thing that will be needed is to implement strnnatcmp and strnnatcasecmp.

This patch adds 628 bytes to a gigabeat build.

The code was written by Martin Pool and is available under the zlib license at http://sourcefrog.net/projects/natsort/

I found it and was inspired to port it to Rockbox thanks to http://www.codinghorror.com/blog/archives/001018.html
This task depends upon

Closed by  Thomas Martitz (kugel.)
Sunday, 01 March 2009, 18:12 GMT
Reason for closing:  Accepted
Additional comments about closing:  Committed in r20155. Thank you very much!
Comment by Linus Nielsen Feltzing (linusnielsen) - Tuesday, 25 November 2008, 16:42 GMT
After a discussion in the IRC channel, we came to the conclusion that it needs a separate option "Natural sorting" (or "Numerical sorting").
Comment by Thomas Martitz (kugel.) - Thursday, 15 January 2009, 23:13 GMT
Here's the setting. It's enabled by default (as per today's IRC discussion, where nobody raised his opinion against).

The settings name is "Human alphabetical", I'm not sure how nice it is, maybe someone comes up with a better name. Natural and numeric are worse imo (nature doesn't sort, and this doesn't only sort numeric files names).
As a side note: The code isn't prepared very well for 2 alphabetic sortings. That said, the other (by date etc) fall back to alphabetic sort if sorting failed. I made it falling back to human sort, as it's the default.
Another note: Are we considering renaming the classic alphabetical sorting to something that's seperating it more from "Human alphabetical"? Something like "ASCII alphabetical"? That of course means messing up most langs, so I didn't do anything in this direction yet.
Binsize increase is ~800 on my e200, of which ~300 is the setting.

I hope this can get in, I really prefer this human style sorting over ascii sorting.
Comment by Thomas Martitz (kugel.) - Thursday, 15 January 2009, 23:14 GMT
Oops, forgot to svn add the files.
Comment by Marc Guay (Marc_Guay) - Thursday, 15 January 2009, 23:29 GMT
I don't think that the term "Natural sorting" is intended to imply that Nature herself sorts things, but rather that it's natural for humans to sort things in this manner. Although it may be "natural" for a computer to sort it in the other way, I think that may be getting a little too deep into semantics for the scope of this project. :)
Comment by Steve Bavin (pondlife) - Friday, 16 January 2009, 08:41 GMT
I agree - "Natural sorting" is the standard term for this.
Comment by Thomas Martitz (kugel.) - Friday, 16 January 2009, 09:04 GMT
I don't know of a standard. It's probably de facto, but in my opinion this term doesn't fit at all.
This is what the IRC discussion yesterday showed too, i.e. that everybody knows it as natural sorting, but it's a weird term.
Comment by Dominik Riebeling (bluebrother) - Saturday, 17 January 2009, 14:17 GMT
Programmers tent to start counting with 0. Does this make it "human" for programmers to start with 0 but for non-programmers to start with 1? Some languages write from right to left (RTL). By which means should one be able calling the (from my understanding) more widely used LTR the "human" one? By which reasoning is sorting by numbers more "human" than sorting by comparing each digit individually? Btw, some languages even have different digits than the ones used in english. What sorting is "human" in that case?

I completely disagree with the naming "human sorting". Besides, there has to be a reason the imported code has the function named "strnatcmp" ... Arguing that "nature doesn't sort" doesn't make sense at all -- "natural sort" could be translated as "natürliche Sortierung", and this doesn't imply anything about mother nature. Besides (and probably most important), using a different term than the (probably de-facto) standard will only create confusion and has to be avoided IMO.
Comment by Thomas Martitz (kugel.) - Saturday, 17 January 2009, 15:32 GMT
Ok, I see the name is very controversial. It's seems very subjective which term fits best or which term doesn't fit at all.
Seeing this could be a long and wasteful discussion, sticking to an established de-facto standard seems like the best and most reasonable compromise.

The attached patch goes this compromise, and makes the database browser use this too (the database browser looks for the file sort setting, not for the dir sort setting), which should get by any disc and track number sorting issue. Give me feedback if that's unwanted.
Comment by alex wallis (alexwallis646) - Saturday, 17 January 2009, 15:42 GMT
Hi. OK, I have used 8314 now for some time. The original version of this patch before it was updated. I use it with the file browser, as I dislike the database. I don't completely understand what the latest revision of the patch does. Will this patch still sort folders and files for me? for example, on my player I have a series of folders called box 1 box 2 etc, going all the way up to box 32.

Before I used this patch, the folders were sorted incorrectly. after box 1, I would then get boxes 10 to 19 played assuming I was just going from one folder to the next. If this patch is still able to sort them into the correct numerical order then that's great. I was just a little confused with the above comment from Kugel.
Comment by Thomas Martitz (kugel.) - Saturday, 17 January 2009, 15:44 GMT
What I said in my above comment only applied to the database browser. Nothing changed for the file browser (except for the settings name which is changed globally of course).
Comment by alex wallis (alexwallis646) - Saturday, 17 January 2009, 15:57 GMT
Aah that's great then, I just wanted to be sure.
Its nice to see this patch being paid some attention as I really like it. I also really like 8630 as it means I don't have to reset the sleep timer value each time. But I think people don't like this patch for some reason.
Comment by Thomas Martitz (kugel.) - Thursday, 26 February 2009, 16:47 GMT
So, another go at getting this in finally.
This implements the sorting as a modifier, which is applied to the existing sorting. Also, it avoids controversial names by adding the menu "Sort leading numbers" with the options "1, 2, 03, 10" and "03, 1, 10, 2".

I hope this can make people happy.
Comment by Dominik Riebeling (bluebrother) - Thursday, 26 February 2009, 19:49 GMT
Just had a quick look at the patch and the following caught my eye:

"sort numerics", "1 2 03 10,03 1 10 2"

As configuration strings this looks weird and inconsistent to me (remember, some people might want to edit the configuration files manually, and the manual describes this). If the menu calls it "Sort leading numbers" then the setting should get saved as "sort leading numbers" (or similar -- "sort numerics" isn't really matching) with possible values of "on,off", not some string containing somewhat arbitrary numbers. Similar applies to the strings shown in the menu itself. A menu "sort leading numbers" could have the options "numerical" and "alphabetical" (or "non-numerical"). This also would be more in line with other settings.

Maybe "Consider leading numbers on sorting" "Yes / No" would be a suitable naming. "Sort leading numbers" is misleading as it might imply that sorting will be done *only* by leading numbers. Also, at least to me it sound somewhat that only the numbers will be sorted (thus possibly not sorting the rest of the strings ...)

Simply avoiding controversial issues is not a solution, they should be discussed and agreed upon.
Comment by Thomas Martitz (kugel.) - Thursday, 26 February 2009, 20:16 GMT
"sort numerics", "1 2 03 10,03 1 10 2"
This *is* the consensus we got in the recent discussions. Maybe you missed that, but using the examples instead of controversial names is already widely accepted. EDIT: This is NOT to avoid controversial issues. It has been agreed on that by quite a few people.

And about leading: As described in the mailing list, "leading" is already dropped. Just "Sort numbers"

I took the discussion to the mailing list, I'd appreciate if you answer there. It has more subscribers than this task.
Comment by Dominik Riebeling (bluebrother) - Thursday, 26 February 2009, 21:00 GMT
Well, it's general consensus that tracker tasks should get commented on that tasks, so why should I mail to the dev list now instead? Besides, if you write about "avoiding controversial names" you don't tell anything about a consensus that has been agreed on. Not all people do (or can -- there are people with do have a day job for a living) follow every discussion on IRC. There definitely wasn't a discussion on the ML.

I still don't think it's a good idea to move the "example values" to the saved setting. It might be consensus for the menu itself but I don't think that implies anything on how the value is saved.

Plus, my concerns about the naming still stand -- as the ML already showed confusion already begun.
Comment by Thomas Martitz (kugel.) - Thursday, 26 February 2009, 21:03 GMT
In the 3 most recent discussion there was no disagreement about using examples.

"it's general consensus that tracker tasks should get commented on that tasks" - and it's a general consensus that the ML is always appropriate, especially when it comes to discussion of patches that seek inclusion.

"So why should I mail to the dev list now instead" - Again, because it has more subscribers, thus more opinions.

"There definitely wasn't a discussion on the ML." - This is what I just started, so please join or don't complain!
Comment by Dominik Riebeling (bluebrother) - Thursday, 26 February 2009, 21:08 GMT
1. "no disagreement" doesn't make it an agreement.
2. so now commenting in the tracker is getting less appropriate, just because the ML "is always appropriate"?
3. so this task having less subscribers (can you backup this btw? How many users are actually watching this task?) does make it less appropriate to comment?
4. There wasn't a discussion on the ML. Period. You just started one, but this doesn't change the history, and there wasn't a discussion in the past.

This kinda gets stupid ...
Comment by Thomas Martitz (kugel.) - Thursday, 26 February 2009, 21:15 GMT
Indeed, and I really recommend you to read the logs of these discussions on irc. Then You'll see that we (=the people that care enough about this patch to join the discussion and have time for it) agreed on using examples.

"this task having less subscribers (can you backup this btw? How many users are actually watching this task?) does make it less appropriate to comment?" IMO, yes, when it comes to the point where the patch really needs opinions to get in. Click on history to see how many subscribers it has.
Comment by Daniel Stenberg (bagder) - Thursday, 26 February 2009, 21:18 GMT
In the IRC discussions I didn't see much, if any, concerns for the config file and what this would be called there and how it would be used so I think bluebrother raised a concern not previously dealt with very much
Comment by Thomas Martitz (kugel.) - Thursday, 26 February 2009, 21:30 GMT
The config file has the examples too, this is due to CHOICE_SETTING, I don't think this can be changed without introducing some new _SETTING macro. But this wasn't bluebrothers main concern, was it?

edit: forget what I said about CHOICE_SETTING. Yes, it is of course possible to write other values into the config file. I can change that, but that's not holding it back anyway, is it?
Comment by Dominik Riebeling (bluebrother) - Thursday, 26 February 2009, 21:44 GMT
> Indeed, and I really recommend you to read the logs of these discussions on irc.

Well, you could have pointed out when these discussions happend. But of course you don't need to ...

> "this task having less subscribers (can you backup this btw? How many users are actually watching this task?) does make it less appropriate to comment?" IMO, yes

well, and my opinion is a definite NO here. If someone trying to drag a discussion to the ML makes the tracker less appropriate anyone could make every task he wants to the tracker less appropriate, thus moving the discussion away from it. The ML doesn't hold the history of the old comments that are present here, not everyone checks the tracker entry, and the main idea behind a tracker is: to have all comments in the same place. While I'd like to see more (initial!) discussion happening on the ML starting some thread can only accompany an existing task, not replace it. Thus, comments in the task are always appropriate.

And my concern about the setting name used for saving is that it isn't self explaining anymore like other settings are. If you read "sort numerics: 1 2 03 10" in the config file, how'd you know what to use to change that setting? While not every possible value is self explaining for other settings it definitely should be for an on/off setting (plus, "sort numerics" is still a bad naming IMO). And why is there a need to introduce a new setting macro? Just adjust the string used for saving -- it doesn't match the displayed string anyway (at least as of casing, and for non-english languages almost no setting name matches the title displayed in the menu.
Comment by Thomas Martitz (kugel.) - Thursday, 26 February 2009, 21:50 GMT
Right, I was wrong (see my editted comment). I can surely change what config.cfg reads, if that's so wanted.
Comment by Thomas Martitz (kugel.) - Thursday, 26 February 2009, 22:19 GMT
Ok, another approach.

The settings now a on/off setting, the name is "Recognize numbers while sorting".

This IMO perfectly fits, since this is exactly what natsort does. It recognises numbers in strings while sorting, as opposed to strcmp, which only recognizes bytes (->digits), and treats them in a special way, but does not sort different from strcmp for strings without numbers.
Comment by Alexander Levin (fml2) - Friday, 27 February 2009, 19:11 GMT
Just a wild guess: wouldn't it make sense to use this setting everywhere where strings must be sorted? We could then have a global pointer to a function with the signature "int (char*, char*, int)". This pointer would be set either to strnsmp or to strnatcmp whenever the setting is changed. All the places in Rockbox where strings are compared would then use this pointer instead of strncmp thus using the right compare function. The setting would then better be put under System Settings.
Comment by Thomas Martitz (kugel.) - Friday, 27 February 2009, 19:14 GMT
I'm not sure about that. natsort is only meant for user-visible sorting stuff, not for internally, where strnatcmp could possibly mess everything up.
Also, the database already uses his own algorithm to sort by track number in a "natural" way, so there's no need.

Anyway, I can imagine it could work like you proposed in some places at least, I think we can think about it further when this is committed.
Comment by Steve Bavin (pondlife) - Saturday, 28 February 2009, 16:09 GMT
OK, any objections or is this ready for commit now?
Comment by Thomas Martitz (kugel.) - Saturday, 28 February 2009, 16:19 GMT
Well, given the endless tiresome discussions before I expect objections. But on the other hand, since nobody bothered to answer on the -dev ml, the objections don't seem to hold it back. I'd like to get it committed (very) soon.
Comment by Paul Louden (Llorean) - Saturday, 28 February 2009, 16:30 GMT
I don't think the name "Recognize numbers while sorting" is a very useful name. "Recognizing" them doesn't tell you what that means. Maybe "Interpret Numbers while Sorting" with the options "As digits" and "As whole numbers"?
Comment by Thomas Martitz (kugel.) - Saturday, 28 February 2009, 16:45 GMT
You mean ""Interpret numbers while sorting"? ;)

Anyway, sounds like a good idea to me. We could do that.

Loading...