BUG: Reference de-duplication isn't parallel-safe #112

astrofrog · 2017-09-30T16:11:47Z

Currently Numpydoc is marked as a parallel-safe sphinx plugin, but the renaming of duplicate references in rename_references is not parallel-safe because reference_offset will be [0] in each process at the start of the function call.

I'm running into conflicts when building the Astropy docs:

/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/biweight.py:docstring of astropy.stats.biweight_midcorrelation:70: WARNING: duplicate citation R11, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/modeling/index.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/biweight.py:docstring of astropy.stats.biweight_midcovariance:138: WARNING: duplicate citation R33, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.BaseCoordinateFrame.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/biweight.py:docstring of astropy.stats.biweight_midvariance:101: WARNING: duplicate citation R56, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.EarthLocation.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/biweight.py:docstring of astropy.stats.biweight_midvariance:103: WARNING: duplicate citation R66, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.EarthLocation.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/bayesian_blocks.py:docstring of astropy.stats.FitnessFunc:7: WARNING: Unknown target name: "scargle2012".
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/stats/bayesian_blocks.py:docstring of astropy.stats.FitnessFunc:26: WARNING: Unknown target name: "scargle2012".
docstring of astropy.stats.LombScargle.false_alarm_probability:62: WARNING: duplicate citation R1313, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.SkyCoord.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/polynomial.py:docstring of astropy.modeling.polynomial.SIP:55: WARNING: duplicate citation R11, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.stats.biweight_midcorrelation.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/functional_models.py:docstring of astropy.modeling.functional_models.AiryDisk2D:91: WARNING: duplicate citation R11, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.stats.biweight_midcorrelation.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/functional_models.py:docstring of astropy.modeling.functional_models.Gaussian2D:133: WARNING: duplicate citation R33, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.stats.biweight_midcovariance.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/optimizers.py:docstring of astropy.modeling.optimizers.SLSQP:19: WARNING: duplicate citation R99, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.coordinates.Galactic.rst
/Users/tom/Dropbox/Code/Astropy/astropy/astropy/modeling/optimizers.py:docstring of astropy.modeling.optimizers.Simplex:18: WARNING: duplicate citation R1111, other instance in /Users/tom/Dropbox/Code/Astropy/astropy/docs/api/astropy.stats.LombScargle.rst
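
The collisions above are what you would expect if every worker process keeps its own counter starting at zero. Here is a minimal sketch of that failure mode. It is not numpydoc's actual code: `make_worker` and `rename` are hypothetical names, and each "worker" is simulated with a closure rather than a real process.

```python
def make_worker():
    """Return a renamer with private state, standing in for a fresh process."""
    reference_offset = [0]  # assumption: each process starts counting from zero

    def rename(num_refs):
        # Give each reference in one docstring the next label from this
        # worker's own counter.
        labels = ["R%d" % (reference_offset[0] + i)
                  for i in range(1, num_refs + 1)]
        reference_offset[0] += num_refs
        return labels

    return rename


# Two simulated workers, each reading a different document.
worker_a = make_worker()
worker_b = make_worker()

labels_a = worker_a(2)  # first docstring seen by worker A
labels_b = worker_b(1)  # first docstring seen by worker B

# Both workers hand out the same label because neither sees the other's counter.
collisions = set(labels_a) & set(labels_b)
```

Within a single process the counter keeps the labels unique; the overlap only shows up once the work is split across processes that do not share it.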


stefanv · 2017-09-30T19:32:01Z

This is correct: numpydoc reference renumbering is not parallel safe.

jnothman · 2017-10-23T07:25:26Z

Should we consider making the reference renumbering a separate extension, marked unsafe? It would seem to me to be completely separate from the rest of numpydoc.

pv · 2017-10-23T08:47:37Z

IIRC the numpydoc citation format is not RST citations but RST footnotes, so it's not so clear that all mangling can be removed.

stefanv · 2017-10-23T18:36:09Z

At the cost of some nastier looking reference names, we could hash the reference text itself to generate the link IDs.

stefanv · 2017-10-23T18:36:42Z

(This should solve the parallel problem;)
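
A minimal sketch of the hashing idea, assuming SHA-1 and an 8-character prefix (both arbitrary choices for illustration; `reference_id` is a hypothetical helper, not part of numpydoc):

```python
import hashlib


def reference_id(reference_text, length=8):
    """Derive a short, deterministic label from the text of a reference."""
    # Collapse whitespace so that re-wrapped docstrings give the same ID.
    normalized = " ".join(reference_text.split())
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    # Prefix with a letter so the label never looks like a plain number.
    return "R" + digest[:length]


same_a = reference_id("Scargle, J. 2012, ApJ")
same_b = reference_id("Scargle,  J.\n   2012, ApJ")
other = reference_id("A different reference")
```

Because the label depends only on the reference text, every process computes the same ID without sharing any state. The flip side is that identical reference text in two docstrings yields the same label.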

jnothman · 2017-10-23T21:07:39Z

Ideally we wouldn't change the displayed reference names at all, but generate unique IDs for linking. I wonder if there's some way to do that.

stefanv · 2017-10-24T06:02:15Z

We can probably just generate internal links?

jnothman · 2017-10-24T06:34:59Z

I.e. work around the reST citation mechanism? I suppose we could.

jnothman · 2017-10-24T19:29:43Z

It would be hard to get citation formatting matching the default in both web and PDF using :ref: links.

jnothman · 2017-10-25T02:09:32Z

(I.e. in PDF the bibliography appears as an endnote)

jnothman · 2017-10-25T22:53:42Z

> At the cost of some nastier looking reference names, we could hash the reference text itself to generate the link IDs.

I think this could be acceptable. If the same hash appears multiple times in a doc, the first will be the link target, but maybe we can live with this. Alternatively, we can use numbered refs prefixed by the docstring object name.

There seem to be a number of issues with reference mangling. It appears to be untested. It should be possible to disable, and perhaps should be an extension of its own with parallel_read_safe=False. Unfortunately, metadata such as parallel_read_safe cannot be a function of config, so we are forced to set parallel_read_safe=False as long as an unsafe solution is possible within the numpydoc extension.

I'm inclined to one of the parallel-safe solutions above, but perhaps still making the mangling possible to disable...

jnothman · 2017-10-31T22:45:08Z

Alternatives for solving this:

1. Mark numpydoc as not parallel-safe.
2. Separate out the reference renaming as a separate plugin (it already essentially runs as a post-process), losing numpydoc backwards compatibility.
3. Prefix each reference with the fully qualified object name (only feasible once "mangle_docstring is called on already-processed docstrings" #134 is resolved). Verbose.
4. Prefix each reference with a hash of the fully qualified object name (only feasible once "mangle_docstring is called on already-processed docstrings" #134 is resolved). Less verbose but less intelligible.
5. Replace the reference label with a hash of the reference text. Allows for global references, but might mean local links are a bit obscure. Still relatively unintelligible.
6. Rewrite the reST footnoting mechanism using :ref: internal links instead. This will mean we no longer get endnotes in TeX builds etc.

Opinions?
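
To make the prefixing alternatives concrete, here is a rough sketch of the hashed-name variant. It assumes references are numeric labels written as `.. [1]` and cited as `[1]_`. `label_prefix` and `rename_numeric_references` are hypothetical names, and the SHA-1 and 8-character choices are arbitrary.

```python
import hashlib
import re

# A reference definition such as ".. [1] Some paper."
DEFINITION = re.compile(r"^(\s*\.\. )\[(\d+)\]")
# A citation of that reference such as "[1]_"
CITATION = re.compile(r"\[(\d+)\]_")


def label_prefix(qualified_name, length=8):
    """Short, deterministic prefix derived from the object's qualified name."""
    digest = hashlib.sha1(qualified_name.encode("utf-8")).hexdigest()
    return "R" + digest[:length]


def rename_numeric_references(lines, qualified_name):
    """Return a copy of lines with numeric reference labels prefixed."""
    prefix = label_prefix(qualified_name)
    renamed = []
    for line in lines:
        line = DEFINITION.sub(
            lambda m: "%s[%s-%s]" % (m.group(1), prefix, m.group(2)), line)
        line = CITATION.sub(
            lambda m: "[%s-%s]_" % (prefix, m.group(1)), line)
        renamed.append(line)
    return renamed


docstring = ["See [1]_ for details.", "x[1] = 2", "", ".. [1] Some paper."]
renamed = rename_numeric_references(docstring, "pkg.module.func")
```

Swapping `label_prefix` for the qualified name itself gives the more verbose variant. Either way the label depends only on the object being documented, so no counter has to be shared between processes.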

stefanv · 2017-10-31T22:51:52Z

I don't like (1) and (2) much. Anything that makes the references appear in text as garble seems less than ideal. I'm OK with (6), since we mostly generate HTML, but I suspect we'll get strong resistance.

Does option (3) or (4) allow for reasonable looking links backed by hashes?

jnothman · 2017-10-31T22:57:00Z

I'm not sure what "backed by hashes" means. I'll try to throw together some implementations and screenshots, I suppose.

stefanv · 2017-10-31T22:59:44Z

I just mean the links can point to anything, they don't need to be legible URIs, but the text that appears on the page should be reasonable.

jnothman · 2017-10-31T23:07:13Z

I don't think that's possible with reST's citation rendering, so that requires (6).

pv · 2017-10-31T23:28:30Z

From a pragmatic point, is avoiding (1) a big issue?
Does parallel reading speed up doc builds noticeably in practice?

jnothman · 2017-10-31T23:33:17Z

There may be a way to do this with a read-doctree hook. Firstly we need a way to mark where numpydoc's processing begins and ends. Fortunately comments are accessible in the doctree, but we can't just bookend the mangled docstring with comments because the beginning of the docstring is processed naively by autosummary to produce a snippet... :\

stefanv · 2017-10-31T23:55:00Z

@pv Good question. Do we rely on users to set parallel processing to false, or can we control that from within the extension?

jnothman · 2017-11-01T00:00:32Z

We currently emit metadata in setup that says numpydoc is parallel_read_safe.

astrofrog · 2017-11-01T00:52:55Z

From my experiments with astropy, using parallel mode can speed things up by a factor of 2 or more in some cases (saving minutes of build time) so definitely worth it as a developer who wants to check what updated docs look like. It would be a shame to lose that just because of reference de-duplication. So if you wanted to go down the road of 'won't fix', at least please consider making it possible to disable de-duplication as a numpydoc option and then return 'True' for parallel_read_safe in that case.

jnothman · 2017-11-01T11:08:28Z

I have a fix for this in #136

astrofrog changed the title ~~Reference de-duplication isn't parallel-safe~~ BUG: Reference de-duplication isn't parallel-safe Sep 30, 2017

jnothman added type: Bug fix needs-work needs-decision labels Oct 26, 2017

jnothman mentioned this issue Nov 1, 2017

Ensure reference renaming is parallel-safe #136

Merged

jnothman closed this as completed in #136 Mar 28, 2018
