Skip to content

[pyreverse] Fix duplicate arrows when class attribute is assigned more than once #10333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

pranav-data
Copy link

@pranav-data pranav-data commented Apr 9, 2025

Type of Changes

Type
βœ“ πŸ› Bug fix
✨ New feature
πŸ”¨ Refactoring
πŸ“œ Docs

Description

Postprocessing relationships and classes to find replication across relationship types / attributes.

Output for example:

classDiagram
  class A {
    var : int
  }
  class B {
    a_obj
    func()
  }
  A --* B : a_obj

Closes #9267
Closes: #8046
Fixes closed issue: #8189

@pranav-data pranav-data marked this pull request as ready for review April 9, 2025 20:24
@pranav-data pranav-data requested a review from DudeNr33 as a code owner April 9, 2025 20:24
@Pierre-Sassoulas Pierre-Sassoulas added Bug πŸͺ² pyreverse Related to pyreverse component labels Apr 9, 2025
@Pierre-Sassoulas Pierre-Sassoulas added this to the 3.3.7 milestone Apr 9, 2025
@Pierre-Sassoulas Pierre-Sassoulas changed the title Closing #9267 [pyreverse] Fix duplicate arrows when class attribute is assigned more than once Apr 9, 2025
Copy link

codecov bot commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 76.92308% with 9 lines in your changes missing coverage. Please review.

Project coverage is 95.85%. Comparing base (6faf5b4) to head (608bbc0).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pylint/pyreverse/diadefslib.py 76.92% 9 Missing ⚠️

❌ Your patch check has failed because the patch coverage (76.92%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #10333      +/-   ##
==========================================
- Coverage   95.90%   95.85%   -0.05%     
==========================================
  Files         176      176              
  Lines       19108    19145      +37     
==========================================
+ Hits        18325    18352      +27     
- Misses        783      793      +10     
Files with missing lines Coverage Ξ”
pylint/pyreverse/diadefslib.py 94.47% <76.92%> (-4.84%) ⬇️

... and 1 file with indirect coverage changes

πŸš€ New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This comment has been minimized.

@DudeNr33
Copy link
Collaborator

Hi @pranav-data, thanks for the PR! πŸ™‚
Can you remove the additions to the .gitignore and the pyenv.cfg file? These are custom to your own environment and should not be added to the repo.

This comment has been minimized.

@pranav-data
Copy link
Author

@DudeNr33 I have removed the files and pushed.

@pranav-data
Copy link
Author

I have been noticing duplicate class definitions captured, in a project I am working on, but unable to reproduce it in a simpler example for a unit test. The parts where code coverage is lacking is addressing that issue.

Copy link
Collaborator

@DudeNr33 DudeNr33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few remarks before we can merge this:

  1. The logic to filter out the duplications is better suited in diagrams.ClassDiagram. This is where the logic to extract the relationships lives. diadefslib is more about handling the configuration options provided by the user when generating the diagrams.
    Ideally, the duplicates are not added to the relationships in the first place instead of doing post-processing to get rid of them.
  2. The description links multiple issues as 'closed', but there are no testcases for the examples given in Pyreverse: Duplicated class variablesΒ #8046 - could add them to verify if this solution also solves the problems mentioned in the isse? Pay attention to this comment as there are some caveats when differentiating class and instance attributes.
  3. New code without code coverage (removal of duplicated classes) is not ideal. It is better to split this PR up into two parts: first MR that focuses on the duplicated relationship arrows (where we have reproduction examples from the linked issues). The second MR for the duplicated classes, but only once a minimal reproduction example can be provided.

Thank you for taking the time fixing bugs in pyreverse!

Copy link
Contributor

πŸ€– According to the primer, this change has no effect on the checked open source code. πŸ€–πŸŽ‰

This comment was generated for commit 608bbc0

@pranav-data
Copy link
Author

Removing the duplicate from diagrams.py could clean up the classes but messes up the relationships. From my debugging, it appears that the duplicate originates from how astroid parses the repository for dependencies, where one copy of the class is required to establish the relationship.

Breaking it down to 2 separate fixes will fix one while breaking the other.

@DudeNr33
Copy link
Collaborator

I do not see yet how moving the logic to diagrams.py would be a problem, as all the post-processing happens on instance attributes of the ClassDiagram objects.

I tried it locally and the test still passes, expand to see the diff of my changes:

diff --git a/pylint/pyreverse/diadefslib.py b/pylint/pyreverse/diadefslib.py
index f91a89eca..d3b045437 100644
--- a/pylint/pyreverse/diadefslib.py
+++ b/pylint/pyreverse/diadefslib.py
@@ -68,10 +68,7 @@ class DiaDefGenerator:
         leaf_nodes = [
             module
             for module in self.args
-            if not any(
-                other != module and other.startswith(module + ".")
-                for other in self.args
-            )
+            if not any(other != module and other.startswith(module + ".") for other in self.args)
         ]
         return leaf_nodes
 
@@ -120,12 +117,8 @@ class DiaDefGenerator:
         absolute_depth = name.count(".")
 
         # Retrieve the base depth to compare against
-        relative_depth = next(
-            (v for k, v in self.args_depths.items() if name.startswith(k)), None
-        )
-        return relative_depth is not None and bool(
-            (absolute_depth - relative_depth) <= self.config.max_depth
-        )
+        relative_depth = next((v for k, v in self.args_depths.items() if name.startswith(k)), None)
+        return relative_depth is not None and bool((absolute_depth - relative_depth) <= self.config.max_depth)
 
     def show_node(self, node: nodes.ClassDef) -> bool:
         """Determine if node should be shown based on config."""
@@ -143,9 +136,7 @@ class DiaDefGenerator:
         self.linker.visit(node)
         self.classdiagram.add_object(self.get_title(node), node)
 
-    def get_ancestors(
-        self, node: nodes.ClassDef, level: int
-    ) -> Generator[nodes.ClassDef]:
+    def get_ancestors(self, node: nodes.ClassDef, level: int) -> Generator[nodes.ClassDef]:
         """Return ancestor nodes of a class node."""
         if level == 0:
             return
@@ -154,9 +145,7 @@ class DiaDefGenerator:
                 continue
             yield ancestor
 
-    def get_associated(
-        self, klass_node: nodes.ClassDef, level: int
-    ) -> Generator[nodes.ClassDef]:
+    def get_associated(self, klass_node: nodes.ClassDef, level: int) -> Generator[nodes.ClassDef]:
         """Return associated nodes of a class node."""
         if level == 0:
             return
@@ -170,9 +159,7 @@ class DiaDefGenerator:
                     continue
                 yield node
 
-    def extract_classes(
-        self, klass_node: nodes.ClassDef, anc_level: int, association_level: int
-    ) -> None:
+    def extract_classes(self, klass_node: nodes.ClassDef, anc_level: int, association_level: int) -> None:
         """Extract recursively classes related to klass_node."""
         if self.classdiagram.has_node(klass_node) or not self.show_node(klass_node):
             return
@@ -203,9 +190,7 @@ class DefaultDiadefGenerator(LocalsVisitor, DiaDefGenerator):
         """
         mode = self.config.mode
         if len(node.modules) > 1:
-            self.pkgdiagram: PackageDiagram | None = PackageDiagram(
-                f"packages {node.name}", mode
-            )
+            self.pkgdiagram: PackageDiagram | None = PackageDiagram(f"packages {node.name}", mode)
         else:
             self.pkgdiagram = None
         self.classdiagram = ClassDiagram(f"classes {node.name}", mode)
@@ -273,99 +258,6 @@ class DiadefsHandler:
         self.config = config
         self.args = args
 
-    def _get_object_name(self, obj: Any) -> str:
-        """Get object name safely handling both title attributes and strings.
-
-        :param obj: Object to get name from
-        :return: Object name/title
-        :rtype: str
-        """
-        return obj.title if hasattr(obj, "title") else str(obj)
-
-    def _process_relationship(
-        self, relationship: Relationship, unique_classes: dict[str, Any], obj: Any
-    ) -> None:
-        """Process a single relationship for deduplication.
-
-        :param relationship: Relationship to process
-        :param unique_classes: Dict of unique classes
-        :param obj: Current object being processed
-        """
-        if relationship.from_object == obj:
-            relationship.from_object = unique_classes[obj.node.qname()]
-        if relationship.to_object == obj:
-            relationship.to_object = unique_classes[obj.node.qname()]
-
-    def _process_class_relationships(
-        self, diagram: ClassDiagram, obj: Any, unique_classes: dict[str, Any]
-    ) -> None:
-        """Merge relationships for a class.
-
-        :param diagram: Current diagram
-        :param obj: Object whose relationships to process
-        :param unique_classes: Dict of unique classes
-        """
-        for rel_type in ("specialization", "association", "aggregation"):
-            for rel in diagram.get_relationships(rel_type):
-                self._process_relationship(rel, unique_classes, obj)
-
-    def deduplicate_classes(self, diagrams: list[ClassDiagram]) -> list[ClassDiagram]:
-        """Remove duplicate classes from diagrams."""
-        for diagram in diagrams:
-            # Track unique classes by qualified name
-            unique_classes: dict[str, Any] = {}
-            duplicate_classes: Any = set()
-
-            # First pass - identify duplicates
-            for obj in diagram.objects:
-                qname = obj.node.qname()
-                if qname in unique_classes:
-                    duplicate_classes.add(obj)
-                    self._process_class_relationships(diagram, obj, unique_classes)
-                else:
-                    unique_classes[qname] = obj
-
-            # Second pass - filter out duplicates
-            diagram.objects = [
-                obj for obj in diagram.objects if obj not in duplicate_classes
-            ]
-
-        return diagrams
-
-    def _process_relationship_type(
-        self,
-        rel_list: list[Relationship],
-        seen: set[tuple[str, str, str, Any | None]],
-        unique_rels: dict[str, list[Relationship]],
-        rel_name: str,
-    ) -> None:
-        """Process a list of relationships of a single type.
-
-        :param rel_list: List of relationships to process
-        :param seen: Set of seen relationships
-        :param unique_rels: Dict to store unique relationships
-        :param rel_name: Name of relationship type
-        """
-        for rel in rel_list:
-            key = (
-                self._get_object_name(rel.from_object),
-                self._get_object_name(rel.to_object),
-                type(rel).__name__,
-                getattr(rel, "name", None),
-            )
-            if key not in seen:
-                seen.add(key)
-                unique_rels[rel_name].append(rel)
-
-    def deduplicate_relationships(self, diagram: ClassDiagram) -> None:
-        """Remove duplicate relationships between objects."""
-        seen: set[tuple[str, str, str, Any | None]] = set()
-        unique_rels: dict[str, list[Relationship]] = defaultdict(list)
-        for rel_name, rel_list in diagram.relationships.items():
-            self._process_relationship_type(rel_list, seen, unique_rels, rel_name)
-
-        diagram.relationships = dict(unique_rels)
-
     def get_diadefs(self, project: Project, linker: Linker) -> list[ClassDiagram]:
         """Get the diagram's configuration data.
 
@@ -386,5 +278,4 @@ class DiadefsHandler:
             diagrams = DefaultDiadefGenerator(linker, self).visit(project)
         for diagram in diagrams:
             diagram.extract_relationships()
-            self.deduplicate_relationships(diagram)
-        return self.deduplicate_classes(diagrams)
+        return diagrams
diff --git a/pylint/pyreverse/diagrams.py b/pylint/pyreverse/diagrams.py
index 6db056300..73ec71ba0 100644
--- a/pylint/pyreverse/diagrams.py
+++ b/pylint/pyreverse/diagrams.py
@@ -6,6 +6,7 @@
 
 from __future__ import annotations
 
+from collections import defaultdict
 from collections.abc import Iterable
 from typing import Any
 
@@ -45,9 +46,7 @@ class DiagramEntity(Figure):
 
     default_shape = ""
 
-    def __init__(
-        self, title: str = "No name", node: nodes.NodeNG | None = None
-    ) -> None:
+    def __init__(self, title: str = "No name", node: nodes.NodeNG | None = None) -> None:
         super().__init__()
         self.title = title
         self.node: nodes.NodeNG = node or nodes.NodeNG(
@@ -109,9 +108,7 @@ class ClassDiagram(Figure, FilterMixIn):
         rel = Relationship(from_object, to_object, relation_type, name)
         self.relationships.setdefault(relation_type, []).append(rel)
 
-    def get_relationship(
-        self, from_object: DiagramEntity, relation_type: str
-    ) -> Relationship:
+    def get_relationship(self, from_object: DiagramEntity, relation_type: str) -> Relationship:
         """Return a relationship or None."""
         for rel in self.relationships.get(relation_type, ()):
             if rel.from_object is from_object:
@@ -126,14 +123,11 @@ class ClassDiagram(Figure, FilterMixIn):
         properties = {
             local_name: local_node
             for local_name, local_node in node.items()
-            if isinstance(local_node, nodes.FunctionDef)
-            and decorated_with_property(local_node)
+            if isinstance(local_node, nodes.FunctionDef) and decorated_with_property(local_node)
         }
 
         # Add instance attributes to properties
-        for attr_name, attr_type in list(node.locals_type.items()) + list(
-            node.instance_attrs_type.items()
-        ):
+        for attr_name, attr_type in list(node.locals_type.items()) + list(node.instance_attrs_type.items()):
             if attr_name not in properties:
                 properties[attr_name] = attr_type
 
@@ -142,9 +136,7 @@ class ClassDiagram(Figure, FilterMixIn):
                 continue
 
             # Handle property methods differently to correctly extract return type
-            if isinstance(
-                associated_nodes, nodes.FunctionDef
-            ) and decorated_with_property(associated_nodes):
+            if isinstance(associated_nodes, nodes.FunctionDef) and decorated_with_property(associated_nodes):
                 if associated_nodes.returns:
                     type_annotation = get_annotation_label(associated_nodes.returns)
                     node_name = f"{node_name} : {type_annotation}"
@@ -184,9 +176,7 @@ class ClassDiagram(Figure, FilterMixIn):
             if isinstance(node, astroid.Instance):
                 node = node._proxied
             if (
-                isinstance(
-                    node, (nodes.ClassDef, nodes.Name, nodes.Subscript, nodes.BinOp)
-                )
+                isinstance(node, (nodes.ClassDef, nodes.Name, nodes.Subscript, nodes.BinOp))
                 and hasattr(node, "name")
                 and not self.has_node(node)
             ):
@@ -194,11 +184,7 @@ class ClassDiagram(Figure, FilterMixIn):
                     node_name = node.name
                     names.append(node_name)
         # sorted to get predictable (hence testable) results
-        return sorted(
-            name
-            for name in names
-            if all(name not in other or name == other for other in names)
-        )
+        return sorted(name for name in names if all(name not in other or name == other for other in names))
 
     def has_node(self, node: nodes.NodeNG) -> bool:
         """Return true if the given node is included in the diagram."""
@@ -237,9 +223,7 @@ class ClassDiagram(Figure, FilterMixIn):
             # associations & aggregations links
             for name, values in list(node.aggregations_type.items()):
                 for value in values:
-                    self.assign_association_relationship(
-                        value, obj, name, "aggregation"
-                    )
+                    self.assign_association_relationship(value, obj, name, "aggregation")
 
             associations = node.associations_type.copy()
 
@@ -249,9 +233,9 @@ class ClassDiagram(Figure, FilterMixIn):
 
             for name, values in associations.items():
                 for value in values:
-                    self.assign_association_relationship(
-                        value, obj, name, "association"
-                    )
+                    self.assign_association_relationship(value, obj, name, "association")
+        self.deduplicate_relationships()
+        self.deduplicate_classes()
 
     def assign_association_relationship(
         self, value: astroid.NodeNG, obj: ClassEntity, name: str, type_relationship: str
@@ -266,6 +250,92 @@ class ClassDiagram(Figure, FilterMixIn):
         except KeyError:
             return
 
+    def _get_object_name(self, obj: Any) -> str:
+        """Get object name safely handling both title attributes and strings.
+
+        :param obj: Object to get name from
+        :return: Object name/title
+        :rtype: str
+        """
+        return obj.title if hasattr(obj, "title") else str(obj)
+
+    def _process_relationship(self, relationship: Relationship, unique_classes: dict[str, Any], obj: Any) -> None:
+        """Process a single relationship for deduplication.
+
+        :param relationship: Relationship to process
+        :param unique_classes: Dict of unique classes
+        :param obj: Current object being processed
+        """
+        if relationship.from_object == obj:
+            relationship.from_object = unique_classes[obj.node.qname()]
+        if relationship.to_object == obj:
+            relationship.to_object = unique_classes[obj.node.qname()]
+
+    def _process_class_relationships(
+        self, diagram: ClassDiagram, obj: Any, unique_classes: dict[str, Any]
+    ) -> None:
+        """Merge relationships for a class.
+
+        :param diagram: Current diagram
+        :param obj: Object whose relationships to process
+        :param unique_classes: Dict of unique classes
+        """
+        for rel_type in ("specialization", "association", "aggregation"):
+            for rel in diagram.get_relationships(rel_type):
+                self._process_relationship(rel, unique_classes, obj)
+
+    def deduplicate_classes(self) -> None:
+        """Remove duplicate classes from diagrams."""
+        # Track unique classes by qualified name
+        unique_classes: dict[str, Any] = {}
+        duplicate_classes: Any = set()
+
+        # First pass - identify duplicates
+        for obj in self.objects:
+            qname = obj.node.qname()
+            if qname in unique_classes:
+                duplicate_classes.add(obj)
+                self._process_class_relationships(self, obj, unique_classes)
+            else:
+                unique_classes[qname] = obj
+
+        # Second pass - filter out duplicates
+        self.objects = [obj for obj in self.objects if obj not in duplicate_classes]
+
+    def _process_relationship_type(
+        self,
+        rel_list: list[Relationship],
+        seen: set[tuple[str, str, str, Any | None]],
+        unique_rels: dict[str, list[Relationship]],
+        rel_name: str,
+    ) -> None:
+        """Process a list of relationships of a single type.
+
+        :param rel_list: List of relationships to process
+        :param seen: Set of seen relationships
+        :param unique_rels: Dict to store unique relationships
+        :param rel_name: Name of relationship type
+        """
+        for rel in rel_list:
+            key = (
+                self._get_object_name(rel.from_object),
+                self._get_object_name(rel.to_object),
+                type(rel).__name__,
+                getattr(rel, "name", None),
+            )
+            if key not in seen:
+                seen.add(key)
+                unique_rels[rel_name].append(rel)
+
+    def deduplicate_relationships(self) -> None:
+        """Remove duplicate relationships between objects."""
+        seen: set[tuple[str, str, str, Any | None]] = set()
+        unique_rels: dict[str, list[Relationship]] = defaultdict(list)
+        for rel_name, rel_list in self.relationships.items():
+            self._process_relationship_type(rel_list, seen, unique_rels, rel_name)
+
+        self.relationships = dict(unique_rels)
+
 
 class PackageDiagram(ClassDiagram):
     """Package diagram handling."""

There is also no problem when removing the call to deduplicate_classes.

Maybe there was some misunderstanding, so can you please check the proposed diff and comment if your concerns about moving the logic and splitting the MR are still valid?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants