Skip to content

BUG: Reference de-duplication isn't parallel-safe #112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
astrofrog opened this issue Sep 30, 2017 · 22 comments · Fixed by #136
Closed

BUG: Reference de-duplication isn't parallel-safe #112

astrofrog opened this issue Sep 30, 2017 · 22 comments · Fixed by #136

Comments

@astrofrog
Copy link

astrofrog commented Sep 30, 2017

Currently Numpydoc is marked as a parallel-safe sphinx plugin, but the renaming of duplicate references in rename_references is not parallel-safe because reference_offset will be [0] in each process at the start of the function call.

I'm running into conflicts when building the Astropy docs:

/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/biweight.py:docstring of astropy.stats.biweight_midcorrelation:70: WARNING: duplicate citation R11, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/modeling/index.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/biweight.py:docstring of astropy.stats.biweight_midcovariance:138: WARNING: duplicate citation R33, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.BaseCoordinateFrame.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/biweight.py:docstring of astropy.stats.biweight_midvariance:101: WARNING: duplicate citation R56, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.EarthLocation.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/biweight.py:docstring of astropy.stats.biweight_midvariance:103: WARNING: duplicate citation R66, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.EarthLocation.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/bayesian_blocks.py:docstring of astropy.stats.FitnessFunc:7: WARNING: Unknown target name: "scargle2012".
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/bayesian_blocks.py:docstring of astropy.stats.FitnessFunc:26: WARNING: Unknown target name: "scargle2012".
docstring of astropy.stats.LombScargle.false_alarm_probability:62: WARNING: duplicate citation R1313, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.SkyCoord.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/polynomial.py:docstring of astropy.modeling.polynomial.SIP:55: WARNING: duplicate citation R11, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.stats.biweight_midcorrelation.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/functional_models.py:docstring of astropy.modeling.functional_models.AiryDisk2D:91: WARNING: duplicate citation R11, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.stats.biweight_midcorrelation.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/functional_models.py:docstring of astropy.modeling.functional_models.Gaussian2D:133: WARNING: duplicate citation R33, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.stats.biweight_midcovariance.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/optimizers.py:docstring of astropy.modeling.optimizers.SLSQP:19: WARNING: duplicate citation R99, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.Galactic.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/optimizers.py:docstring of astropy.modeling.optimizers.Simplex:18: WARNING: duplicate citation R1111, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.stats.LombScargle.rst
@astrofrog astrofrog changed the title Reference de-duplication isn't parallel-safe BUG: Reference de-duplication isn't parallel-safe Sep 30, 2017
@stefanv
Copy link
Contributor

stefanv commented Sep 30, 2017

This is correct: numpydoc reference renumbering is not parallel safe.

@jnothman
Copy link
Member

Should we consider making the reference renumbering a separate extension, marked unsafe? It would seem to me to be completely separate from the rest of numpydoc.

@pv
Copy link
Member

pv commented Oct 23, 2017 via email

@stefanv
Copy link
Contributor

stefanv commented Oct 23, 2017

At the cost of some nastier looking reference names, we could hash the reference text itself to generate the link IDs.

@stefanv
Copy link
Contributor

stefanv commented Oct 23, 2017

(This should solve the parallel problem;)

@jnothman
Copy link
Member

jnothman commented Oct 23, 2017 via email

@stefanv
Copy link
Contributor

stefanv commented Oct 24, 2017

We can probably just generate internal links?

@jnothman
Copy link
Member

I.e. work around the reST citation mechanism? I suppose we could.

@jnothman
Copy link
Member

It would be hard to get citation formatting matching the default in both web and PDF using :ref: links.

@jnothman
Copy link
Member

(I.e. in PDF the bibliography appears as an endnote)

@jnothman
Copy link
Member

At the cost of some nastier looking reference names, we could hash the reference text itself to generate the link IDs.

I think this could be acceptable. If the same hash appears multiple times in a doc, the first will be the link target, but maybe we can live with this. Alternatively, we can use numbered refs prefixed by the docstring object name.

There seem to be a number of issues with reference mangling. It appears to be untested. It should be possible to disable, and perhaps should be an extension of its own with parallel_read_safe=False. Unfortunately, metadata such as parallel_read_safe cannot be a function of config, so we are forced to set parallel_read_safe=False as long as an unsafe solution is possible within the numpydoc extension.

I'm inclined to one of the parallel-safe solutions above, but perhaps still making the mangling possible to disable...

@jnothman
Copy link
Member

Alternatives for solving this:

  1. Mark numpydoc as not parallel-safe
  2. Separate out the reference renaming as a separate plugin (it already essentially runs as a post-process), losing numpydoc backwards compatibility.
  3. Prefix each reference with the fully qualified object name (only feasible once mangle_docstring is called on already-processed docstrings #134 is resolved). Verbose.
  4. Prefix each reference with a hash of the fully qualified object name (only feasible once mangle_docstring is called on already-processed docstrings #134 is resolved). Less verbose but less intelligible.
  5. Replace the reference label with a hash of the reference text. Allows for global references, but might mean local links are a bit obscure. Still relatively unintelligible.
  6. Rewrite the reST footnoting mechanism using :ref: internal links instead. This will mean we no longer get endnotes in TeX builds etc.

Opinions?

@stefanv
Copy link
Contributor

stefanv commented Oct 31, 2017

I don't like (1) and (2) much. Anything that makes the references appear in text as garble seem less than ideal. I'm OK with (6), since we mostly generate HTML, but I suspect we'll get strong resistance.

Does option (3) or (4) allow for reasonable looking links backed by hashes?

@jnothman
Copy link
Member

I'm not sure what "backed by hashes" means. I'll try to throw together some implementations and screenshots I suppose

@stefanv
Copy link
Contributor

stefanv commented Oct 31, 2017

I just mean the links can point to anything, they don't need to be legible URIs, but the text that appears on the page should be reasonable.

@jnothman
Copy link
Member

I don't think that's possible with reST's citation rendering, so that requires (6).

@pv
Copy link
Member

pv commented Oct 31, 2017

From a pragmatic point, is avoiding (1) a big issue?
Does parallel reading speed up doc builds noticeably in practice?

@jnothman
Copy link
Member

There may be a way to do this with a read-doctree hook. Firstly we need a way to mark where numpydoc's processing begins and ends. Fortunately comments are accessible in the doctree, but we can't just bookend the mangled docstring with comments because the beginning of the docstring is processed naively by autosummary to produce a snippet... :\

@stefanv
Copy link
Contributor

stefanv commented Oct 31, 2017

@pv Good question. Do we rely on users to set parallel processing to false, or can we control that from within the extension?

@jnothman
Copy link
Member

jnothman commented Nov 1, 2017 via email

@astrofrog
Copy link
Author

From my experiments with astropy, using parallel mode can speed things up by a factor of 2 or more in some cases (saving minutes of build time) so definitely worth it as a developer who wants to check what updated docs look like. It would be a shame to lose that just because of reference de-duplication. So if you wanted to go down the road of 'won't fix', at least please consider making it possible to disable de-duplication as a numpydoc option and then return 'True' for parallel_read_safe in that case.

@jnothman
Copy link
Member

jnothman commented Nov 1, 2017

I have a fix for this in #136

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants