Skip to content

Words: improve count consistency across datatype representation #30187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
slel opened this issue Jul 21, 2020 · 36 comments
Closed

Words: improve count consistency across datatype representation #30187

slel opened this issue Jul 21, 2020 · 36 comments

Comments

@slel
Copy link
Member

slel commented Jul 21, 2020

Before this ticket, count depends on the representation (list vs str) in a surprising way, see this question on ask:

sage: s = 'aaabaaababaaaaabbbbb'
sage: u = Word(list(s))
sage: v = Word(s)
sage: u.count('ab')
0
sage: v.count('ab')
4

because

sage: type(v)
<class 'sage.combinat.words.word.FiniteWord_str'>
sage: type(u)
<class 'sage.combinat.words.word.FiniteWord_list'>

The proposed solution is to fix count so that it behaves like list.count and tuple.count:

sage: list(u).count('ab')
0
sage: list(v).count('ab')
0

In the proposed branch, the method count is now an alias to the method number_of_letter_occurrences.

Doing so, we also deprecate few methods in favor of:

_pos_in                    -> first_occurrence
first_pos_in               -> first_occurrence
factor_occurrences_in      -> factor_occurrences_iterator
nb_factor_occurrences_in   -> number_of_factor_occurrences
nb_subword_occurrences_in  -> number_of_subword_occurrences

See also:

CC: @seblabbe @slel

Component: combinatorics

Keywords: words, count

Author: Sébastien Labbé

Branch/Commit: 4c9870d

Reviewer: Jonathan Kliem

Issue created by migration from https://trac.sagemath.org/ticket/30187

@slel slel added this to the sage-9.2 milestone Jul 21, 2020
@seblabbe
Copy link
Contributor

comment:1

According to the documentation count counts the number of letters as in the behavior of the method count for python lists (not as the method count for python str which counts factors as well!). In that sense, it is a bad usage from the user. The reason for the different behavior outside of the documented domain is that the implementation depends on the data structure.

sage: W = Words('ab', 20)
sage: u = W.random_element()
sage: v = Word('aaabaaababaaaaabbbbb')

sage: type(u)
<class 'sage.combinat.words.word.FiniteWord_list'>
sage: type(v)
<class 'sage.combinat.words.word.FiniteWord_str'>

sage: v.count
<built-in method count of FiniteWord_str object at 0x7f3853f0cc50>
sage: u.count
<built-in method count of FiniteWord_list object at 0x7f3853f12a48>

The method FiniteWord_list.count and FiniteWord_str.count do not do the same thing as both overwrites the generic method FiniteWord_class.count (copied below) and delagate their job to list.count and str.count (faster).

The generic method is:

sage: sage.combinat.words.finite_word.FiniteWord_class.count??
Signature: sage.combinat.words.finite_word.FiniteWord_class.count(self, letter)
Source:   
    def count(self, letter):
        r"""
        Count the number of occurrences of ``letter`` in ``self``.

        EXAMPLES::

            sage: Word('abbabaab').count('a')
            4
        """
        return Integer(sum(1 for a in self if a == letter))

The method FiniteWord_list.count is:

sage: sage.combinat.words.word_datatypes.WordDatatype_list.count??
Source:
    def count(self, a):
        r"""
        Returns the number of occurrences of the letter ``a`` in the word
        ``self``.

        INPUT:

        -  ``a`` - a letter

        OUTPUT:

        - integer

        EXAMPLES::

            sage: w = Word([0,1,1,0,1])
            sage: w.count(0)
            2
            sage: w.count(1)
            3
            sage: w.count(2)
            0

        """
        return self._data.count(a)

The method FiniteWord_str.count is:

sage: sage.combinat.words.word_datatypes.WordDatatype_str.count??
Source:
    def count(self, letter):
        r"""
        Count the number of occurrences of ``letter``.

        INPUT:

        - ``letter`` - a letter

        OUTPUT:

        - integer

        EXAMPLES::

            sage: w = Word("abbabaabababa")
            sage: w.count('a')
            7
            sage: w.count('b')
            6
            sage: w.count('c')
            0

        """
        return self._data.count(letter)

But, I agree that it could be more robust (provide an error message if the letter is not in the alphabet? or return zero? or advertise about nb_factor_occurrences?). Or should count works also for counting occurences of factors?

@seblabbe
Copy link
Contributor

Author: Sébastien Labbé

@seblabbe
Copy link
Contributor

Commit: 57a1323

@seblabbe
Copy link
Contributor

Branch: u/slabbe/30187

@seblabbe
Copy link
Contributor

comment:2

It also fixes #30143


New commits:

57a132330187: count -> number_of_occurrences_of_letter

@seblabbe
Copy link
Contributor

comment:3

Salut Samuel, tu veux faire le review?

@kliem
Copy link
Contributor

kliem commented Sep 4, 2020

comment:4
File "src/sage/modular/multiple_zeta.py", line 38, in sage.modular.multiple_zeta
Failed example:
    Multizeta(2)*Multizeta(3)
Expected:
    6*ζ(1,4) + 3*ζ(2,3) + ζ(3,2)
Got:
    doctest:warning
      File "/home/sagemath/sage-9.1/src/bin/sage-runtests", line 182, in <module>
        err = DC.run()
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/control.py", line 1230, in run
        self.run_doctests()
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/control.py", line 931, in run_doctests
        self.dispatcher.dispatch()
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 2046, in dispatch
        self.parallel_dispatch()
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 1941, in parallel_dispatch
        w.start()  # This might take some time
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 2213, in start
        super(DocTestWorker, self).start()
      File "/usr/lib/python3.7/multiprocessing/process.py", line 112, in start
        self._popen = self._Popen(self)
      File "/usr/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "/usr/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
        return Popen(process_obj)
      File "/usr/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
        self._launch(process_obj)
      File "/usr/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
        code = process_obj._bootstrap()
      File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
        self.run()
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 2185, in run
        task(self.options, self.outtmpfile, msgpipe, self.result_queue)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 2514, in __call__
        doctests, extras = self._run(runner, options, results)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 2561, in _run
        result = runner.run(test)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 913, in run
        return self._run(test, compileflags, out)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 715, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/doctest/forker.py", line 1139, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.modular.multiple_zeta[3]>", line 1, in <module>
        Multizeta(Integer(2))*Multizeta(Integer(3))
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/categories/magmatic_algebras.py", line 219, in _product_from_product_on_basis_multiply
        for (mon_left, coeff_left) in left.monomial_coefficients().items()
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/combinat/free_module.py", line 1025, in linear_combination
        factor_on_left=factor_on_left ),
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/combinat/free_module.py", line 1023, in <genexpr>
        return self._from_dict(blas.linear_combination( ((element._monomial_coefficients, coeff)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/categories/magmatic_algebras.py", line 220, in <genexpr>
        for (mon_right, coeff_right) in right.monomial_coefficients().items() )
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/modular/multiple_zeta.py", line 798, in product_on_basis
        return MZV_it.composition(p1p2)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/modules/with_basis/morphism.py", line 399, in __call__
        for (index, coeff) in mc.items())
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/combinat/free_module.py", line 1025, in linear_combination
        factor_on_left=factor_on_left ),
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/combinat/free_module.py", line 1023, in <genexpr>
        return self._from_dict(blas.linear_combination( ((element._monomial_coefficients, coeff)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/modules/with_basis/morphism.py", line 399, in <genexpr>
        for (index, coeff) in mc.items())
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/modular/multiple_zeta.py", line 1461, in composition_on_basis
        return (-1)**w.count(1) * codomain(iterated_to_composition(w))
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/misc/superseded.py", line 420, in __call__
        self.__name__, other))
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/misc/superseded.py", line 100, in deprecation
        warning(trac_number, message, DeprecationWarning, stacklevel)
      File "/home/sagemath/sage-9.1/local/lib/python3.7/site-packages/sage/misc/superseded.py", line 146, in warning
        warn(message, warning_class, stacklevel)
      File "/usr/lib/python3.7/warnings.py", line 110, in _showwarnmsg
        msg.file, msg.line)
    :
    DeprecationWarning: count is deprecated. Please use number_of_occurrences_of_letter instead.
    See https://trac.sagemath.org/30187 for details.
    6*ζ(1,4) + 3*ζ(2,3) + ζ(3,2)

@kliem
Copy link
Contributor

kliem commented Sep 4, 2020

comment:5

In principal the change is ok, also I would have expected count(self, 'ab') to count the number of times the pattern 'ab' appears in self. Then again 'ab' can also be a letter, so we have to distinguish.

What I don't like is the mess with the names. For v = Word('ab') there is:

  • number_of_factors
  • number_of_inversions
  • number_of_{left/right}_special_factors
  • nb_factor_occurs_in
  • nb_subword_occurs_in

Could we maybe add aliases

  • number_factor_occurs_in and number_of_occurences_of_factor (the later would cast the other into a word, if it isn't already)

and likewise for subword.

This would make it much easier to find and in my opinion would make #30143 obsolete.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 4, 2020

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

f131a7130187: count -> number_of_occurrences_of_letter
2138d0130187: do not deprecate count yet

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 4, 2020

Changed commit from 57a1323 to 2138d01

@seblabbe
Copy link
Contributor

seblabbe commented Sep 4, 2020

comment:7

I rebased the branch on top of beta11. I also removed the deprecation: it was not the original goal of the ticket. Also, maybe it is desirable to have w.count() behave the same whether w is a word or a tuple or a list. At least, replacing count by number_of_occurrences_of_letter in multiple_zeta.py shows that the input is sometime a word and sometimes a tuple.

(I saw your recent comment just now after I force-pushed the branch.)

@kliem
Copy link
Contributor

kliem commented Sep 5, 2020

Reviewer: Jonathan Kliem

@kliem
Copy link
Contributor

kliem commented Sep 5, 2020

comment:8

Works for me.

What do you think of those aliases?

I missed that you added the SEEALSO and this is great. Once you want to count something, you can look into documentation of count, which helps you find all the methods related to counting (in particular you see that there are methods nb_of_... and number_of_....

@seblabbe
Copy link
Contributor

seblabbe commented Sep 5, 2020

comment:9

I still want to fix the nb_methods. Let me add a commit on the branch.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 5, 2020

Changed commit from 2138d01 to c0fea81

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 5, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

b1d58b0deprecating _pos_in, first_pos_in, factor_occurrences_in, nb_factor_occurrences_in, nb_subword_occurrences_in
3051d03moving first_occurrence method to abstract_word.py
c0fea81number_of_occurrences_of_letter -> number_of_letter_occurrences

@seblabbe
Copy link
Contributor

seblabbe commented Sep 5, 2020

comment:11

Those methods names (nb_* and *_in) were from my master thesis back in 2007 where I just did not know what good choice to make. Today, I think the nb_ shoud be replaced by n_ or number_of, let's take number_of to be consistent with other methods in the same class. Also, I don't like the *_in like f.factor_occurrences_in(w) which I think should be better written as w.factor_occurrences(f). Here I think I was mislead by the way the method __str__.find is implemented...

Anyway, since the set of nb_ methods was strongly intersecting the set of *_in methods, I decided to deprecate them all at once.

There are still other stuff that I dislike in that module, like u.is_factor(w) should be the other way around w.is_factor(u), but let's keep those for another ticket.

@seblabbe
Copy link
Contributor

seblabbe commented Sep 5, 2020

comment:12

Ready for review!

@fchapoton
Copy link
Contributor

comment:13

doc does not build

docstring of sage.combinat.words.abstract_word.Word_class.first_occurrence:17: WARNING: Inline literal start-string without end-string.
make[1]: *** [Makefile:1871: doc-html] Error 1
make[1]: Leaving directory '/home/sagemath/sage-9.1/build/make'

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 5, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

c75439d30187: fixing doc build warning

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 5, 2020

Changed commit from c0fea81 to c75439d

@seblabbe
Copy link
Contributor

seblabbe commented Sep 5, 2020

comment:15

oups, forgot to check that. Now doc builds ok on my machine.

@kliem
Copy link
Contributor

kliem commented Sep 7, 2020

comment:16

I would suggest:

-        sage: w[1:].first_occurrence(u)
-        8
+        sage: w.first_occurrence(u, start=1)
+        9

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 8, 2020

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

704623430187: count -> number_of_occurrences_of_letter
55c68cc30187: do not deprecate count yet
182161cdeprecating _pos_in, first_pos_in, factor_occurrences_in, nb_factor_occurrences_in, nb_subword_occurrences_in
54075cdmoving first_occurrence method to abstract_word.py
c65a769number_of_occurrences_of_letter -> number_of_letter_occurrences
51cd41e30187: fixing doc build warning
b449aa330187: improved doctest after reviewer's comment

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 8, 2020

Changed commit from c75439d to b449aa3

@seblabbe
Copy link
Contributor

seblabbe commented Sep 8, 2020

comment:18

rebased on 9.2.beta12 + fixed w[1:] doctest. Needs review.

@seblabbe

This comment has been minimized.

@seblabbe
Copy link
Contributor

seblabbe commented Sep 8, 2020

comment:19

updating description of ticket

@seblabbe

This comment has been minimized.

@seblabbe seblabbe changed the title Words: improve count consistency across alphabets Words: improve count consistency across datatype representation Sep 8, 2020
@kliem
Copy link
Contributor

kliem commented Sep 8, 2020

comment:22

Sorry, two more things:

+    def first_occurrence(self, other, start=0):
+        r"""
+        Return the position of the first occurrence of ``other`` in ``self``,
+        or ``None`` if ``other`` is not a factor of ``self``.
+
+        INPUT:
+
+        - ``other`` -- a finite word

other needs no longer be finite. Maybe also a quick comment, about that the function might not terminate for infinite words. I don't think it is a problem. I tried it and one can keyboard interrupt without any problems.

@kliem
Copy link
Contributor

kliem commented Sep 8, 2020

comment:23

Sorry, I didn't read closely. Of course other does need to be finite (an AttributeError is raised, if it is not, which is fine for an error I think).

The comment about the function not terminating would still be nice.

@kliem
Copy link
Contributor

kliem commented Sep 8, 2020

comment:24

The patchbot is morally green. Once you added this comment, you can put this on positive review on my behalf.

(I think the comment is needed, as you claimed that the function returns None if the factor is not present, which is not true for infinite words.)

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 10, 2020

Branch pushed to git repo; I updated commit sha1. New commits:

4c9870d30187: add comment on possible infinite loop

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 10, 2020

Changed commit from b449aa3 to 4c9870d

@seblabbe
Copy link
Contributor

comment:26

Doc builds ok on my side.

@vbraun
Copy link
Member

vbraun commented Sep 15, 2020

Changed branch from u/slabbe/30187 to 4c9870d

@vbraun vbraun closed this as completed in 762de29 Sep 15, 2020
vbraun pushed a commit to vbraun/sage that referenced this issue May 17, 2025
sagemathgh-39262: Add script for checking for old deprecations
    
<!-- ^ Please provide a concise and informative title. -->
<!-- ^ Don't put issue numbers in the title, do this in the PR
description below. -->
<!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method
to calculate 1 + 2". -->
<!-- v Describe your changes below in detail. -->
<!-- v Why is this change required? What problem does it solve? -->
<!-- v If this PR resolves an open issue, please link to it here. For
example, "Fixes sagemath#12345". -->

To find deprecations that can now safely removed.
Example:
```
python tools/check_deprecations.py src/sage/combinat
Found 9 deprecations.
100%|
Deprecations over one year ago:
File: src\sage\combinat\permutation.py, PR:
sagemath#26810, Closed Date: 2019-01-23
File: src\sage\combinat\permutation.py, PR:
sagemath#27467, Closed Date: 2019-06-12
File: src\sage\combinat\words\finite_word.py, PR:
sagemath#30187, Closed Date: 2020-09-15
File: src\sage\combinat\designs\difference_family.py, PR:
sagemath#35211, Closed Date: 2023-04-01
```

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [ ] The title is concise and informative.
- [ ] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#39262
Reported by: Tobias Diez
Reviewer(s): Frédéric Chapoton, Tobias Diez
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants