Skip to content

Add uniq() to SequenceType #183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions stdlib/public/core/Sequence.swift
Original file line number Diff line number Diff line change
Expand Up @@ -583,6 +583,18 @@ extension SequenceType where Generator.Element : Equatable {
return split(maxSplit, allowEmptySlices: allowEmptySlices,
isSeparator: { $0 == separator })
}

/// Return an `Array` containing the elements of `self`
@warn_unused_result
public func uniq() -> [Generator.Element] {
return reduce(Array<Generator.Element>()) { (arr, element) -> [Generator.Element] in
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not actually the right way to do this, because it's O(n^2). See http://airspeedvelocity.net/2015/08/03/arrays-linked-lists-and-performance/ for details. You should switch to using a temporary array and a plain old for loop, despite the ugliness.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rothomp3 beat me to it :) yes reduce can really suffer performance-wise. here's an elegant answer I stumbled upon a few month back (it's a global function in this case, but making it into a protocol extension is trivial)

func uniq<S: SequenceType, E: Hashable where E == S.Generator.Element>(source: S) -> [E] {
  var seen = [E: Bool]()
  return source.filter { seen.updateValue(true, forKey: $0) == nil }
}

http://stackoverflow.com/a/33984316/870155

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dictionary is probably unnecessary - there's no need to store the Bool values. You could do the same with just a Set:

extension SequenceType where Generator.Element : Hashable {

  public func uniq() -> [Generator.Element] {
    var prevs: Set<Generator.Element> = []
    return filter { el in
      if prevs.contains(el) { return false }
      prevs.insert(el)
      return true
    }
  }
}

Which is maybe a little more efficient. Both the Dictionary and Set versions translate easily into a lazy method:

extension LazySequenceType where Generator.Element : Hashable {

  public func uniq() -> LazyFilterSequence<Self> {
    var prevs: Set<Generator.Element> = []
    return lazy.filter { el in
      if prevs.contains(el) { return false }
      prevs.insert(el)
      return true
    }
  }
}

However, there's another problem with these implementations: they rely on side-effects from filter's closure, which is generally discouraged (I think). For instance, the function would be broken if the closure was evaluated more than once, or in a different order. So it's maybe best to have a more imperative for loop:

extension SequenceType where Generator.Element : Hashable {

  public func uniq() -> [Generator.Element] {
    var prevs: Set<Generator.Element> = []
    var uniqs: [Generator.Element] = []
    for el in self where !prevs.contains(el) {
      prevs.insert(el)
      uniqs.append(el)
    }
    return uniqs
  }
}

But then the lazy version would need to be more complex as well. (You'd probably have to write a custom struct, similar to the standard library's LazyFilterSequence. This is the kind of thing I mean)

Considering all that, I'm not sure what the function or method would add over just using the standard Set initialiser. The order of the elements is retained, and there's a possible lazy version, but it's a lot of extra complexity for maybe not much extra functionality.

if arr.contains({ $0 == element }) { return arr }

var newArr = arr
newArr.append(element)
return newArr
}
}
}

extension SequenceType {
Expand Down