Commit 90f00de

Update the PackageFinder architecture document.
1 parent aa12b20 commit 90f00de

1 file changed

+22
-17
lines changed

docs/html/development/architecture/package-finding.rst

Lines changed: 22 additions & 17 deletions
@@ -15,17 +15,16 @@ Overview
 Here is a rough description of the process that pip uses to choose what
 file to download for a package, given a requirement:
 
-1. Access the various network and file system locations configured for pip
-   that contain package files. These locations can include, for example,
-   pip's :ref:`--index-url <--index-url>` (with default
-   https://pypi.org/simple/ ) and any configured
-   :ref:`--extra-index-url <--extra-index-url>` locations.
-   Each of these locations is a `PEP 503`_ "simple repository" page, which
-   is an HTML page of anchor links.
-2. Collect together all of the links (e.g. by parsing the anchor links
-   from the HTML pages) and create ``Link`` objects from each of these.
-   The :ref:`LinkCollector <link-collector-class>` class is responsible
-   for both this step and the previous.
+1. Collect together the various network and file system locations containing
+   project package files. These locations are derived, for example, from pip's
+   :ref:`--index-url <--index-url>` (with default https://pypi.org/simple/ )
+   setting and any configured :ref:`--extra-index-url <--extra-index-url>`
+   locations. Each of the project page URL's is an HTML page of anchor links,
+   as defined in `PEP 503`_, the "Simple Repository API."
+2. For each project page URL, fetch the HTML and parse out the anchor links,
+   creating a ``Link`` object from each one. The :ref:`LinkCollector
+   <link-collector-class>` class is responsible for both the previous step
+   and fetching the HTML over the network.
 3. Determine which of the links are minimally relevant, using the
    :ref:`LinkEvaluator <link-evaluator-class>` class. Create an
    ``InstallationCandidate`` object (aka candidate for install) for each
@@ -111,6 +110,12 @@ One of ``PackageFinder``'s main top-level methods is
 class's ``compute_best_candidate()`` method on the return value of
 ``find_all_candidates()``. This corresponds to steps 4-5 of the Overview.
 
+``PackageFinder`` also has a ``process_project_url()`` method (called by
+``find_best_candidate()``) to process a `PEP 503`_ "simple repository"
+project page. This method fetches and parses the HTML from a PEP 503 project
+page URL, extracts the anchor elements and creates ``Link`` objects from
+them, and then evaluates those links.
+
 
 .. _link-collector-class:
 
@@ -119,12 +124,8 @@ The ``LinkCollector`` class
 
 The :ref:`LinkCollector <link-collector-class>` class is the class
 responsible for collecting the raw list of "links" to package files
-(represented as ``Link`` objects). An instance of the class accesses the
-various `PEP 503`_ HTML "simple repository" pages, parses their HTML,
-extracts the links from the anchor elements, and creates ``Link`` objects
-from that information. The ``LinkCollector`` class is "unintelligent" in that
-it doesn't do any evaluation of whether the links are relevant to the
-original requirement; it just collects them.
+(represented as ``Link`` objects) from file system locations, as well as the
+`PEP 503`_ project page URL's that ``PackageFinder`` should access.
 
 The ``LinkCollector`` class takes into account the user's :ref:`--find-links
 <--find-links>`, :ref:`--extra-index-url <--extra-index-url>`, and related
@@ -133,6 +134,10 @@ method is the ``collect_links()`` method. The :ref:`PackageFinder
 <package-finder-class>` class invokes this method as the first step of its
 ``find_all_candidates()`` method.
 
+``LinkCollector`` also has a ``fetch_page()`` method to fetch the HTML from a
+project page URL. This method is "unintelligent" in that it doesn't parse the
+HTML.
+
 The ``LinkCollector`` class is the only class in the ``index.py`` module that
 makes network requests and is the only class in the module that depends
 directly on ``PipSession``, which stores pip's configuration options and