Skip to content

Method 'Learner.tell' is ambiguous #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
basnijholt opened this issue Dec 19, 2018 · 7 comments
Closed

Method 'Learner.tell' is ambiguous #80

basnijholt opened this issue Dec 19, 2018 · 7 comments
Labels

Comments

@basnijholt
Copy link
Member

(original issue on GitLab)

opened by Joseph Weston (@jbweston) at 2018-05-28T12:50:24.892Z

Presently Learner.tell decides whether it has been passed a single point, or a sequence of points by checking whether both the "x" (domain) and "y" (codomain) values are iterable. If they are, then they are treated as being a sequence of points, otherwise they are treated as a single point.

This works for all the current learners because, by pure accident, we have not yet implemented learners for functions ℝ^N → ℝ^M. This recently had confusing consequences (see gitlab:#58).

We have 2 choices:

  • Have 2 methods (e.g. tell and tell_many) that treat the single point case and several point case respectively. This would be a pure rename of the previous (and badly named) methods add_point and add_data.

  • Use better logic for determining when a sequence of points is being passed, as opposed to a single point.

@basnijholt basnijholt added the bug label Dec 19, 2018
@basnijholt
Copy link
Member Author

originally posted by Joseph Weston (@jbweston) at 2018-05-28T12:57:26.367Z on GitLab

Option 2 gives better UX, but may be difficult to do.

The idea is to implement tell at the level of BaseLearner:

def tell(self, xs, ys):
    if is_sequence_of_points(xs, ys):
        for x, y in zip(xs, ys):
            self._tell(x, y)
    else:
        self._tell(xs, ys)

The difficult part is implementing is_sequence_of_points in general, because we don't know what a point looks like, for a general learner; a single point could a sequence of values scalar values (e.g. vector in ℝ^N).

@basnijholt
Copy link
Member Author

originally posted by Jorn Hoofwijk (@Jorn) at 2018-05-29T09:29:03.950Z on GitLab

As we already know the dimensionality of x based on self.bounds, we could use that to determine if you are adding a single point or multiple points at once, like this (pseudocode):

def tell(self, xs, ys):
    if self.ndim == 1 and typeof xs is number:
        self._tell(xs, ys)
    elif self.ndim == 1 and xs is iterable:
        for x, y in zip(xs, ys):
             self._tell(x, y)
    elif self.ndim == 1:  # not allowed
        throw some error
    elif xs is number:  # we know self.ndim > 1
        throw some other error
    elif len(xs.shape) == 1: # we know you are only adding a single point
        self._tell(xs, ys)
    elif len(xs.shape) == 2: # you are adding a list of points
        for x, y in zip(xs, ys):
             self.tell(x, y)
    else:
        throw some more errors

Maybe somewhere we should convert the input into a numpy array if it is an iterable. (so we can call xs.shape safely) and we should check that xs and ys have the same number of entries.

@basnijholt
Copy link
Member Author

originally posted by Joseph Weston (@jbweston) at 2018-05-29T10:25:16.095Z on GitLab

Unfortunately BaseLearner does not have bounds. We can of course provide explicit implementations of tell for each subclass, but the point was to have the common logic specified only once.

Currently subclasses should implement _tell, which only takes a single point, unless the algorithm that the learner implements can do something more efficient than "call _tell n times", in which case it should implement tell.

@basnijholt
Copy link
Member Author

originally posted by Bas Nijholt (@basnijholt) at 2018-05-29T19:40:59.308Z on GitLab

I agree, option 2 would be the best solution.

@basnijholt
Copy link
Member Author

originally posted by Anton Akhmerov (@anton-akhmerov) at 2018-07-02T13:59:40.269Z on GitLab

I am actually in favor of the first solution. My reason is that:

  • It's easy for any consumer or author to use either of these without knowing that another one exists.
  • To cover common cases we can provide a default implementation of both methods, so that any learner should just implement one of the two.

@basnijholt
Copy link
Member Author

originally posted by Joseph Weston (@jbweston) at 2018-07-02T17:32:20.080Z on GitLab

I think that solution 2 is preferable from a UX standpoint, but will implementing it properly will require too much effort for too little UX reward.

I cannot think of any actual scenario where having 2 methods would actually restrict you from doing anything.

@basnijholt
Copy link
Member Author

originally posted by Bas Nijholt (@basnijholt) at 2018-07-02T17:37:23.897Z on GitLab

OK fair enough, I agree and will close gitlab:!78.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant