This is the bug/patch tracker for Rockbox. Click here for more information.
Quick links: Bugs · Patches · Rockbox frontpage
FS#8314 - Natural numeric sorting
Attached to Project:
Rockbox
Opened by Nicolas Pennequin (nicolas_p) - Thursday, 13 December 2007, 17:48 GMT+2
Last edited by Thomas Martitz (kugel.) - Sunday, 01 March 2009, 19:12 GMT+2
Opened by Nicolas Pennequin (nicolas_p) - Thursday, 13 December 2007, 17:48 GMT+2
Last edited by Thomas Martitz (kugel.) - Sunday, 01 March 2009, 19:12 GMT+2
|
DetailsThis patch adds strnatcmp and strnatcasecmp, that are able to compare strings using natural numeric sorting. It also makes the trivial changes necessary to make use of this sorting in the file browser, without adding a setting.
This is very similar to * It is not limited to leading numbers in a string * It does not require a setting. strnatcmp and strnatcasecmp are drop-in replacements for strcmp and strcasecmp. * It is much more generic and can be used anywhere else in the core. The file browser use is probably not the only place where we'd want to use natural sorting. One thing that will be needed is to implement strnnatcmp and strnnatcasecmp. This patch adds 628 bytes to a gigabeat build. The code was written by Martin Pool and is available under the zlib license at http://sourcefrog.net/projects/natsort/ I found it and was inspired to port it to Rockbox thanks to http://www.codinghorror.com/blog/archives/001018.html |
This task depends upon
Closed by Thomas Martitz (kugel.)
Sunday, 01 March 2009, 19:12 GMT+2
Reason for closing: Accepted
Additional comments about closing: Committed in r20155. Thank you very much!
Sunday, 01 March 2009, 19:12 GMT+2
Reason for closing: Accepted
Additional comments about closing: Committed in r20155. Thank you very much!
The settings name is "Human alphabetical", I'm not sure how nice it is, maybe someone comes up with a better name. Natural and numeric are worse imo (nature doesn't sort, and this doesn't only sort numeric files names).
As a side note: The code isn't prepared very well for 2 alphabetic sortings. That said, the other (by date etc) fall back to alphabetic sort if sorting failed. I made it falling back to human sort, as it's the default.
Another note: Are we considering renaming the classic alphabetical sorting to something that's seperating it more from "Human alphabetical"? Something like "ASCII alphabetical"? That of course means messing up most langs, so I didn't do anything in this direction yet.
Binsize increase is ~800 on my e200, of which ~300 is the setting.
I hope this can get in, I really prefer this human style sorting over ascii sorting.
This is what the IRC discussion yesterday showed too, i.e. that everybody knows it as natural sorting, but it's a weird term.
I completely disagree with the naming "human sorting". Besides, there has to be a reason the imported code has the function named "strnatcmp" ... Arguing that "nature doesn't sort" doesn't make sense at all -- "natural sort" could be translated as "natürliche Sortierung", and this doesn't imply anything about mother nature. Besides (and probably most important), using a different term than the (probably de-facto) standard will only create confusion and has to be avoided IMO.
Seeing this could be a long and wasteful discussion, sticking to an established de-facto standard seems like the best and most reasonable compromise.
The attached patch goes this compromise, and makes the database browser use this too (the database browser looks for the file sort setting, not for the dir sort setting), which should get by any disc and track number sorting issue. Give me feedback if that's unwanted.
Before I used this patch, the folders were sorted incorrectly. after box 1, I would then get boxes 10 to 19 played assuming I was just going from one folder to the next. If this patch is still able to sort them into the correct numerical order then that's great. I was just a little confused with the above comment from Kugel.
Its nice to see this patch being paid some attention as I really like it. I also really like 8630 as it means I don't have to reset the sleep timer value each time. But I think people don't like this patch for some reason.
This implements the sorting as a modifier, which is applied to the existing sorting. Also, it avoids controversial names by adding the menu "Sort leading numbers" with the options "1, 2, 03, 10" and "03, 1, 10, 2".
I hope this can make people happy.
"sort numerics", "1 2 03 10,03 1 10 2"
As configuration strings this looks weird and inconsistent to me (remember, some people might want to edit the configuration files manually, and the manual describes this). If the menu calls it "Sort leading numbers" then the setting should get saved as "sort leading numbers" (or similar -- "sort numerics" isn't really matching) with possible values of "on,off", not some string containing somewhat arbitrary numbers. Similar applies to the strings shown in the menu itself. A menu "sort leading numbers" could have the options "numerical" and "alphabetical" (or "non-numerical"). This also would be more in line with other settings.
Maybe "Consider leading numbers on sorting" "Yes / No" would be a suitable naming. "Sort leading numbers" is misleading as it might imply that sorting will be done *only* by leading numbers. Also, at least to me it sound somewhat that only the numbers will be sorted (thus possibly not sorting the rest of the strings ...)
Simply avoiding controversial issues is not a solution, they should be discussed and agreed upon.
This *is* the consensus we got in the recent discussions. Maybe you missed that, but using the examples instead of controversial names is already widely accepted. EDIT: This is NOT to avoid controversial issues. It has been agreed on that by quite a few people.
And about leading: As described in the mailing list, "leading" is already dropped. Just "Sort numbers"
I took the discussion to the mailing list, I'd appreciate if you answer there. It has more subscribers than this task.
I still don't think it's a good idea to move the "example values" to the saved setting. It might be consensus for the menu itself but I don't think that implies anything on how the value is saved.
Plus, my concerns about the naming still stand -- as the ML already showed confusion already begun.
"it's general consensus that tracker tasks should get commented on that tasks" - and it's a general consensus that the ML is always appropriate, especially when it comes to discussion of patches that seek inclusion.
"So why should I mail to the dev list now instead" - Again, because it has more subscribers, thus more opinions.
"There definitely wasn't a discussion on the ML." - This is what I just started, so please join or don't complain!
2. so now commenting in the tracker is getting less appropriate, just because the ML "is always appropriate"?
3. so this task having less subscribers (can you backup this btw? How many users are actually watching this task?) does make it less appropriate to comment?
4. There wasn't a discussion on the ML. Period. You just started one, but this doesn't change the history, and there wasn't a discussion in the past.
This kinda gets stupid ...
"this task having less subscribers (can you backup this btw? How many users are actually watching this task?) does make it less appropriate to comment?" IMO, yes, when it comes to the point where the patch really needs opinions to get in. Click on history to see how many subscribers it has.
edit: forget what I said about CHOICE_SETTING. Yes, it is of course possible to write other values into the config file. I can change that, but that's not holding it back anyway, is it?
Well, you could have pointed out when these discussions happend. But of course you don't need to ...
> "this task having less subscribers (can you backup this btw? How many users are actually watching this task?) does make it less appropriate to comment?" IMO, yes
well, and my opinion is a definite NO here. If someone trying to drag a discussion to the ML makes the tracker less appropriate anyone could make every task he wants to the tracker less appropriate, thus moving the discussion away from it. The ML doesn't hold the history of the old comments that are present here, not everyone checks the tracker entry, and the main idea behind a tracker is: to have all comments in the same place. While I'd like to see more (initial!) discussion happening on the ML starting some thread can only accompany an existing task, not replace it. Thus, comments in the task are always appropriate.
And my concern about the setting name used for saving is that it isn't self explaining anymore like other settings are. If you read "sort numerics: 1 2 03 10" in the config file, how'd you know what to use to change that setting? While not every possible value is self explaining for other settings it definitely should be for an on/off setting (plus, "sort numerics" is still a bad naming IMO). And why is there a need to introduce a new setting macro? Just adjust the string used for saving -- it doesn't match the displayed string anyway (at least as of casing, and for non-english languages almost no setting name matches the title displayed in the menu.
The settings now a on/off setting, the name is "Recognize numbers while sorting".
This IMO perfectly fits, since this is exactly what natsort does. It recognises numbers in strings while sorting, as opposed to strcmp, which only recognizes bytes (->digits), and treats them in a special way, but does not sort different from strcmp for strings without numbers.
Also, the database already uses his own algorithm to sort by track number in a "natural" way, so there's no need.
Anyway, I can imagine it could work like you proposed in some places at least, I think we can think about it further when this is committed.
Anyway, sounds like a good idea to me. We could do that.