-
Notifications
You must be signed in to change notification settings - Fork 3.4k
The end of orddicts as lists? #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'd personally lean on option 4. A new data type, because the ordering in orddicts is not lineal, they are not lists of tuples in the simple sense of the expression where list implies order by position (push order). In some languages hash tables are ordered, insertion order. But they do not offer index-access. Hashes in Ruby 1.9 are one example of this. They may offer iterators, keys or values accessors that respect the order, but not an index-based access. From my perspective, it would be natural that the interface of orddicts be dominated by keys as well. It could be the case that Elixir users would expect index-based access on the sorted ordering of orddicts, don't have enough context to know if that is the case. But if that was not expected, I think I would not provide it. Then, the language can afford having ordinary tuples in orddicts, rather than special-casing them to constrain them. Just food for thought :). |
IMHO: Orddict should be implemented as binary tree. |
I prefer the option 4 too. |
Option 4 seems like the best option here. I'd feel strange not supporting the access protocol on both lists and orddicts |
I'd prefer option 4, as it is the cleanest. At this point caring too much about a possible performance decrease would be premature optimization. There will be options for reimplementing orddicts to perform better than plain lists anyway. Generally, I think that it's not a good idea to choose an implementation that would automatically put constraints on how we can use or what we can implement for other types. orddicts deserve their own type. They probably shouldn't represent just a different view upon lists (i.e., only provide an API for dealing with lists), without having their own type that can be inspected not only at compile- but also at runtime (e.g. with If the internal representation is a list, so well. However, making it easy (or even encourage) to treat orddicts as lists feels wrong to me. Aside from possibly creating that said source of confusion, changing the implementation of orddicts will very likely not be possible without breaking existing code then, which also feels wrong to me. |
I propose a mix of 2, 4, 5. This post goes kind of off-topic in the end, but since we touched on the Orddict question, I'd like to express some thoughts on the subject. Having access protocol for lists does not sound compelling to me. The main reason is that lists are linked, not contiguous. They shouldn't be used like arrays are used in other langs. Therefore, I think is it better to use As for dicts, I think d = [a: 1, b: 2, c: 3]
x = Orddict.get d, :a
modified_x = f(x)
Orddict.put d, modified_x If at some point I decide to replace Orddict with another kind of dictionary, I'm screwed. I could do On the other hand, having a single module for associative containers, we would do the following: d = [a: 1, b: 2, c: 3]
x = Dict.get d, :a
modified_x = f(x)
Dict.put d, modified_x Now, if I want to use a different kind of dict, I only need to change the constructor. Coming back to the access protocol, basically my position is as follows:
As a closing thought, I think it's definitely better to add something down the road then remove something. That is to say, limiting access protocol to be used with dicts only does not prevent you from adding it to lists later on. |
I forgot to mention the main point I was trying to make. Please consider this previous statement invalid
Here's what I really meant to say. I believe we all recognize the importance of protocols. Any type should be treated according to the functionality it provides, not the way it's implemented. If a type implements the Dict protocol, it means that the type guarantees it can associate values with keys and do faster-than-linear key lookup, and the user can rely on it. Everything else is volatile. If this concept is stated clearly throughout the documentation, users generally shouldn't care whether their Orddict is just a list. In other words, it's not implementor's problem if a user has broken his Orddict by treating it as a list. If the docs properly convey the idea that a user should rely only on those things that are promised in protocols, the user should not be tempted to treat Orddict as a list. |
@alco, good points. The decision 4 will already hide the fact an Orddict is simply a list of tuples, so we will be half way. I also think a Dict protocol will be needed somewhere down the road. About having an integer accessor for lists, I will cave in. I will remove the integer based access, because it is a bad practice, although I will likely allow the list[1..10](when we have ranges) somewhere down the road. Thanks everyone for the feedback. |
I am halfway the implementation of the proposal #4 and making orddicts no longer be lists is introducing, much, much pain in the codebase. The main reason is because a pair of tuples is a common abstraction in Erlang and if we move away from it, interacting with Erlang code becomes (much) harder. So I am leaning towards option #5. I will rename the wording from orddicts to keywords and the default Keyword (following Smalltalk parlance) module only accepts atoms as keys. We will eventually provide one of Orddict, Dict, GBTrees modules for generalized dictionaries. |
The original problem as stated in the first post:
The least confusing solution is to not implement access protocol for lists at the moment. This solves the ambiguity. I'm voting for this one. The discussion proposing different options has been raised due to the fact that we want to implement access protocol for lists after all, so now we need a new dict. One that would work seamlessly with Erlang like orddict does. José says that the problem with the What can be done? In my previous remark about protocols I was describing an implementation that could look somewhat like the following: # This syntax remains as a shortcut for creating list of tuples.
# NOT orddicts, mind you
[c: 1, b: 2, a: 3]
#=> [{:c, 1}, {:b, 2}, {:a, 3}]
:erlang.some_function_accepting_proplist [option: <value>]
#=> :ok
dict = Dict.new_orddict [c: 1, b: 2, a: 3]
#=> { :orddict, [{:a, 3}, {:b, 2}, {:c, 1}] }
dict = Dict.new_orddict [:c, :b, :a], [1, 2, 3]
#=> { :orddict, [{:a, 3}, {:b, 2}, {:c, 1}] }
Dict.get dict, :a
#=> 3
Dict.put dict, :b, 100
#=> { :orddict, [{:a, 3}, {:b, 100}, {:c, 1}] }
tree = Dict.new_gb_tree [c: 1, b: 2, a: 3]
#=> { :gb_tree, <something> }
# An alternative constructor. Basically, GBTree contains nothing
# but constructors like .from_orddict, .from_pairs, ...
tree = GBTree.from_orddict dict
#=> { :gb_tree, <something> }
Dict.get tree, :a
#=> 3
# Dict.to_orddict can already be implemented with protocols
:erlang.some_function_accepting_orddict Dict.to_orddict dict
:erlang.some_function_accepting_orddict Dict.to_orddict tree
keyword = Dict.new_keyword [{1, 1}, {2, 2}]
#=> RuntimeError: keys must be atoms
keyword = Dict.new_keyword [a: 1, b: 1]
#=> { :keyword, [{:a, 1}, {:b, 1}] }
:erlang.some_function_accepting_proplist Dict.to_orddict keyword So, yes, if we want to pass orddicts between Elixir and Erlang, we will need glue. But the described approach allows us to implement access protocol for all kinds of dicts and lists while still maintaining interoperability without glue in some cases. In the worst case, we could add a syntactic sugar to replace So, basically we can get rid of glue by not implementing access protocol for lists (#2) now or restricting orddicts to use only atoms as keys (#5). |
Implementing the access protocol has exposed a flaw in the current approach of having orddicts as lists, without wrapping them. The issue is that an orddict is simply a list of two-items tuples:
That said, when we use the access protocol as in
variable[1]
, it could either mean: getting an item from the list (in the orddict case this item would be a tuple) or getting a value in the orddict whose key is1
. We have five options to solve this problem:1. Do not implement the access protocol for orddicts
This means
variable[1]
would always be considered a list access, there is no quick access to the orddict values except byOrddict.get(orddict, key)
. I don't like this approach because it makes orddict access verbose and it is quite common in Elixir due key-value args.2. Do not implement the access protocol for lists
This means
variable[1]
would always do a key lookup in the orddict. Accessing list items could be done with the list module. This is not that bad considering developers shouldn't be accessing lists by indexes in any case (since it is linear). However, we may extend the protocol in the future to support rangesvariable[1..3]
and sometimes a reverse access is convenient (as invariable[-1]
) which makes this option not as good.3. Have a mixed protocol
One option is to have
variable[1]
meaning list access andvariable[:foo]
(with an atom) meaning orddict access. This will handle most of the cases with key-value args, but may be a source of confusion.4. Create a specific data type for Orddicts
Another option is, instead of making orddicts as lists, we could wrap them in a record, which would basically be
{ Orddict, list }
. This allows them to have their own protocols and therefore we could have both semantics applying clearly. Theoretically, wrapping orddicts in a tuple would reduce performance but this couldn't be verified in practice.5. Limit the scope of Orddicts
A final option is to limit the scope of Orddicts to only have atoms as keys. This would allows us to still represent the orddicts as lists and have the mixed access as in the option
3
. We could eventually rename theOrddict
module toKeyword
. A general purpose Orddict class could be added somewhere down the road.Conclusion
I am currently divided between options 4 and 5. More feedback is welcome.
/cc @wycats @Raynes @alco @rafaelfranca @fxn
The text was updated successfully, but these errors were encountered: