Commit 8a1b927

Move SourceDefinition to dbt/artifacts (#9543)
* Move `ColumnInfo` to dbt/artifacts
* Move `Quoting` resource to dbt/artifacts
* Move `TimePeriod` to `types.py` in dbt/artifacts
* Move `Time` class to `components`

  We need to move the data parts of the `Time` definition to dbt/artifacts. That is not what we're doing in this commit. In this commit we're simply moving the functional `Time` definition upstream of `unparsed` and `nodes`. This does two things:
  - Mirrors the import path that the resource `Time` definition will have in dbt/artifacts
  - Reduces the chance of circular import problems between `unparsed` and `nodes`
* Move data part of `Time` definition to dbt/artifacts
* Move `FreshnessThreshold` class to components module

  We need to move the data parts of the `FreshnessThreshold` definition to dbt/artifacts. That is not what we're doing in this commit. In this commit we're simply moving the functional `FreshnessThreshold` definition upstream of `unparsed` and `nodes`. This does two things:
  - Mirrors the import path that the resource `FreshnessThreshold` definition will have in dbt/artifacts
  - Reduces the chance of circular import problems between `unparsed` and `nodes`
* Move data part of `FreshnessThreshold` to dbt/artifacts

  Note: We had to override some of the attrs of the `FreshnessThreshold` resource because the resource version only has access to the resource version of `Time`. The overrides in the functional definition of `FreshnessThreshold` make the attrs use the functional version of `Time`.
* Move `ExternalTable` and `ExternalPartition` to `source_definition` module in dbt/artifacts
* Move `SourceConfig` to `source_definition` module in dbt/artifacts
* Move `HasRelationMetadata` to core `components` module

  This is a precursor to splitting `HasRelationMetadata` into its data and functional parts.
* Move data portion of `HasRelationMetadata` to dbt/artifacts
* Move `SourceDefinitionMandatory` to dbt/artifacts
* Move the data parts of `SourceDefinition` to dbt/artifacts

  Something interesting here is that we had to override the `freshness` property. Without the override we would only get the data parts of `FreshnessThreshold`, not its functional parts.

  Also of note, `SourceDefinition` has a lot of `@property` methods that on other classes would be actual attributes of the node. There is an argument to be made that these should be moved as well, but that's perhaps a separate discussion.

  Finally, we have not (yet) moved `NodeInfoMixin`. It is an open discussion whether we do or not. It seems primarily functional, as a means to update the source freshness information. As the artifacts primarily deal with the shape of the data, not how it should be set, it seems for now that `NodeInfoMixin` should stay in core and not move to artifacts. This thinking may change, though.
* Refactor `from_resource` to no longer use generics

  In the next commit we're going to add a `to_resource` method. As we don't want to have to pass a resource into `to_resource`, the class itself needs to expose which resource class should be built, so a type annotation is no longer enough. To solve this we've added a class method to `BaseNode` which returns the associated resource class. The method on `BaseNode` raises a `NotImplementedError` unless the inheriting class has overridden the `resource_class` method to return a resource class.

  You may be thinking "Why not a class property?" That is absolutely a valid question. We used to be able to chain `@classmethod` with `@property` to create a class property. However, this was deprecated in Python 3.11 and removed in 3.13 (details on why this happened can be found [here](python/cpython#89519)). There is an [alternate way to set up a class property](python/cpython#89519 (comment)); however, this seems a bit convoluted when a class method easily gets the job done. The drawback is that we must write `.resource_class()` instead of `.resource_class`, and classes implementing `BaseNode` have to override it with a method instead of a property specification. Additionally, making it an _instance_ property won't work because we don't want to require an _instance_ of the class to get the `resource_class`, as we might not have an instance at our disposal.
* Add `to_resource` method to `BaseNode`

  Nodes have extra attributes. We don't want these extra attributes to get serialized, so we're converting back to resources prior to serialization. There could be a CPU hit here, as we're now dictifying and undictifying right before serialization. We could do some complicated and non-straightforward things to get around this. However, we want to see how big of a performance hit we actually have before going that route.
* Drop `__post_serialize__` from `SourceDefinition` node class

  The method `__post_serialize__` on `SourceDefinition` was used to ensure that the property `_event_status` didn't make it into the serialized version of the node. Now that the resource definition of `SourceDefinition` handles serialization/deserialization, we can drop `__post_serialize__` as it is no longer needed.
* Merge functional parts of `components` into their resource counterparts

  We discussed this on the PR. It seems like a minimal lift, and minimal to support. Doing so also has the benefit of removing a lot of the overriding we were previously doing.
* Fixup: Rename variable `name` to `node_id` in `_map_nodes_to_map_resources`

  Naming is hard. That is all.
* Fixup: Ensure conversion of groups to resources for `WritableManifest`
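The `resource_class()` / `to_resource()` pattern described in the commit message can be sketched with standard-library dataclasses. This is a minimal sketch under stated assumptions, not dbt's implementation: `SourceResource`, `SourceNode`, and the node-only `_event_status` field are hypothetical stand-ins, and `dataclasses.asdict` stands in for dbt's `to_dict()`/`from_dict()` round trip.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, Type


@dataclass
class SourceResource:
    """Data-only shape that gets serialized (hypothetical stand-in for a dbt/artifacts resource)."""

    name: str
    schema: str


class BaseNode:
    @classmethod
    def resource_class(cls) -> Type[Any]:
        # A class method instead of a class property: chaining @classmethod
        # with @property was deprecated in Python 3.11 and removed in 3.13.
        raise NotImplementedError("subclasses must override resource_class()")

    @classmethod
    def from_resource(cls, resource: Any) -> "BaseNode":
        # Build a node from a resource; node-only fields keep their defaults.
        return cls(**asdict(resource))

    def to_resource(self) -> Any:
        # "Dictify" the node, keep only the fields the resource class defines,
        # then "undictify" into the resource class. No resource instance is
        # needed because resource_class() is called on the class.
        resource_cls = type(self).resource_class()
        allowed = {f.name for f in fields(resource_cls)}
        data = {k: v for k, v in asdict(self).items() if k in allowed}
        return resource_cls(**data)


@dataclass
class SourceNode(BaseNode, SourceResource):
    # Hypothetical node-only attribute that must not reach the serialized artifact.
    _event_status: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def resource_class(cls) -> Type[SourceResource]:
        return SourceResource


if __name__ == "__main__":
    node = SourceNode(name="orders", schema="raw", _event_status={"status": "fresh"})
    print(node.to_resource())  # SourceResource(name='orders', schema='raw')
```

Because `to_resource()` asks the class, not an instance, which resource to build, a subclass only has to override one small method; the extra dict round trip is where the CPU cost mentioned above would come from.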
Parent commit: 2411f93

18 files changed, with 303 additions and 220 deletions
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+kind: Under the Hood
+body: Move data parts of `SourceDefinition` class to dbt/artifacts
+time: 2024-02-08T12:06:20.696709-08:00
+custom:
+  Author: QMalcolm
+  Issue: "9384"

core/dbt/artifacts/resources/__init__.py

Lines changed: 17 additions & 1 deletion
@@ -1,7 +1,16 @@
 from dbt.artifacts.resources.base import BaseResource, GraphResource

 # alias to latest resource definitions
-from dbt.artifacts.resources.v1.components import DependsOn, NodeVersion, RefArgs
+from dbt.artifacts.resources.v1.components import (
+    ColumnInfo,
+    DependsOn,
+    FreshnessThreshold,
+    HasRelationMetadata,
+    NodeVersion,
+    Quoting,
+    RefArgs,
+    Time,
+)
 from dbt.artifacts.resources.v1.documentation import Documentation
 from dbt.artifacts.resources.v1.exposure import (
     Exposure,
@@ -50,3 +59,10 @@
     SemanticModel,
     SemanticModelConfig,
 )
+from dbt.artifacts.resources.v1.source_definition import (
+    ExternalPartition,
+    ExternalTable,
+    SourceDefinition,
+    ParsedSourceMandatory,
+    SourceConfig,
+)

core/dbt/artifacts/resources/types.py

Lines changed: 9 additions & 0 deletions
@@ -54,3 +54,12 @@ class RunHookType(StrEnum):
 class ModelLanguage(StrEnum):
     python = "python"
     sql = "sql"
+
+
+class TimePeriod(StrEnum):
+    minute = "minute"
+    hour = "hour"
+    day = "day"
+
+    def plural(self) -> str:
+        return str(self) + "s"

core/dbt/artifacts/resources/v1/components.py

Lines changed: 88 additions & 2 deletions
@@ -1,7 +1,12 @@
 from dataclasses import dataclass, field
+from datetime import timedelta
+from dbt.artifacts.resources.types import TimePeriod
 from dbt.artifacts.resources.v1.macro import MacroDependsOn
-from dbt_common.dataclass_schema import dbtClassMixin
-from typing import Dict, List, Optional, Union
+from dbt_common.contracts.config.properties import AdditionalPropertiesMixin
+from dbt_common.contracts.constraints import ColumnLevelConstraint
+from dbt_common.contracts.util import Mergeable, Replaceable
+from dbt_common.dataclass_schema import dbtClassMixin, ExtensibleDbtClassMixin
+from typing import Any, Dict, List, Optional, Union


 NodeVersion = Union[str, float]
@@ -35,3 +40,84 @@ def keyword_args(self) -> Dict[str, Optional[NodeVersion]]:
             return {"version": self.version}
         else:
             return {}
+
+
+@dataclass
+class ColumnInfo(AdditionalPropertiesMixin, ExtensibleDbtClassMixin, Replaceable):
+    """Used in all ManifestNodes and SourceDefinition"""
+
+    name: str
+    description: str = ""
+    meta: Dict[str, Any] = field(default_factory=dict)
+    data_type: Optional[str] = None
+    constraints: List[ColumnLevelConstraint] = field(default_factory=list)
+    quote: Optional[bool] = None
+    tags: List[str] = field(default_factory=list)
+    _extra: Dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass
+class Quoting(dbtClassMixin, Mergeable):
+    database: Optional[bool] = None
+    schema: Optional[bool] = None
+    identifier: Optional[bool] = None
+    column: Optional[bool] = None
+
+
+@dataclass
+class Time(dbtClassMixin, Mergeable):
+    count: Optional[int] = None
+    period: Optional[TimePeriod] = None
+
+    def exceeded(self, actual_age: float) -> bool:
+        if self.period is None or self.count is None:
+            return False
+        kwargs: Dict[str, int] = {self.period.plural(): self.count}
+        difference = timedelta(**kwargs).total_seconds()
+        return actual_age > difference
+
+    def __bool__(self):
+        return self.count is not None and self.period is not None
+
+
+@dataclass
+class FreshnessThreshold(dbtClassMixin, Mergeable):
+    warn_after: Optional[Time] = field(default_factory=Time)
+    error_after: Optional[Time] = field(default_factory=Time)
+    filter: Optional[str] = None
+
+    def status(self, age: float) -> "dbt.artifacts.schemas.results.FreshnessStatus":  # type: ignore # noqa F821
+        from dbt.artifacts.schemas.results import FreshnessStatus
+
+        if self.error_after and self.error_after.exceeded(age):
+            return FreshnessStatus.Error
+        elif self.warn_after and self.warn_after.exceeded(age):
+            return FreshnessStatus.Warn
+        else:
+            return FreshnessStatus.Pass
+
+    def __bool__(self):
+        return bool(self.warn_after) or bool(self.error_after)
+
+
+@dataclass
+class HasRelationMetadata(dbtClassMixin, Replaceable):
+    database: Optional[str]
+    schema: str
+
+    # Can't set database to None like it ought to be
+    # because it messes up the subclasses and default parameters
+    # so hack it here
+    @classmethod
+    def __pre_deserialize__(cls, data):
+        data = super().__pre_deserialize__(data)
+        if "database" not in data:
+            data["database"] = None
+        return data
+
+    @property
+    def quoting_dict(self) -> Dict[str, bool]:
+        if hasattr(self, "quoting"):
+            return self.quoting.to_dict(omit_none=True)
+        else:
+            return {}
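With `Time` and `FreshnessThreshold` now carrying their functional methods in dbt/artifacts, their behavior can be illustrated in isolation. The sketch below is a standard-library re-implementation for illustration only: the plain `(str, Enum)` classes and the three-member `FreshnessStatus` are assumed stand-ins for dbt_common's `StrEnum` and dbt's `FreshnessStatus`, and the `dbtClassMixin`/`Mergeable` bases are omitted.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from enum import Enum
from typing import Dict, Optional


class TimePeriod(str, Enum):
    minute = "minute"
    hour = "hour"
    day = "day"

    def plural(self) -> str:
        # dbt's StrEnum makes str(self) return the value; with a plain
        # (str, Enum) mixin we read .value explicitly instead.
        return self.value + "s"


class FreshnessStatus(str, Enum):
    # Assumed stand-in for dbt.artifacts.schemas.results.FreshnessStatus.
    Pass = "pass"
    Warn = "warn"
    Error = "error"


@dataclass
class Time:
    count: Optional[int] = None
    period: Optional[TimePeriod] = None

    def exceeded(self, actual_age: float) -> bool:
        # An incomplete threshold (missing count or period) is never exceeded.
        if self.period is None or self.count is None:
            return False
        # e.g. {"hours": 12} -> timedelta(hours=12) -> 43200.0 seconds
        kwargs: Dict[str, int] = {self.period.plural(): self.count}
        difference = timedelta(**kwargs).total_seconds()
        return actual_age > difference

    def __bool__(self) -> bool:
        return self.count is not None and self.period is not None


@dataclass
class FreshnessThreshold:
    warn_after: Optional[Time] = field(default_factory=Time)
    error_after: Optional[Time] = field(default_factory=Time)
    filter: Optional[str] = None

    def status(self, age: float) -> FreshnessStatus:
        # error_after is checked first, so it wins when both are exceeded.
        if self.error_after and self.error_after.exceeded(age):
            return FreshnessStatus.Error
        elif self.warn_after and self.warn_after.exceeded(age):
            return FreshnessStatus.Warn
        else:
            return FreshnessStatus.Pass

    def __bool__(self) -> bool:
        return bool(self.warn_after) or bool(self.error_after)


if __name__ == "__main__":
    # Illustrative thresholds: warn after 12 hours, error after 1 day.
    threshold = FreshnessThreshold(
        warn_after=Time(count=12, period=TimePeriod.hour),
        error_after=Time(count=1, period=TimePeriod.day),
    )
    hour = 3600.0
    for age_hours in (6, 13, 25):
        print(age_hours, threshold.status(age_hours * hour).value)
    # 6 pass / 13 warn / 25 error
```

Note that `exceeded` uses a strict `>`, so an age exactly equal to the threshold still passes, and a default `FreshnessThreshold()` is falsy because neither of its `Time` values is complete.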
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+import time
+
+from dataclasses import dataclass, field
+from dbt.artifacts.resources.base import GraphResource
+from dbt.artifacts.resources.types import NodeType
+from dbt.artifacts.resources.v1.components import (
+    ColumnInfo,
+    FreshnessThreshold,
+    HasRelationMetadata,
+    Quoting,
+)
+from dbt_common.contracts.config.base import BaseConfig
+from dbt_common.contracts.config.properties import AdditionalPropertiesAllowed
+from dbt_common.contracts.util import Mergeable, Replaceable
+from dbt_common.exceptions import CompilationError
+from typing import Any, Dict, List, Literal, Optional, Union
+
+
+@dataclass
+class ExternalPartition(AdditionalPropertiesAllowed, Replaceable):
+    name: str = ""
+    description: str = ""
+    data_type: str = ""
+    meta: Dict[str, Any] = field(default_factory=dict)
+
+    def __post_init__(self):
+        if self.name == "" or self.data_type == "":
+            raise CompilationError("External partition columns must have names and data types")
+
+
+@dataclass
+class ExternalTable(AdditionalPropertiesAllowed, Mergeable):
+    location: Optional[str] = None
+    file_format: Optional[str] = None
+    row_format: Optional[str] = None
+    tbl_properties: Optional[str] = None
+    partitions: Optional[Union[List[str], List[ExternalPartition]]] = None
+
+    def __bool__(self):
+        return self.location is not None
+
+
+@dataclass
+class SourceConfig(BaseConfig):
+    enabled: bool = True
+
+
+@dataclass
+class ParsedSourceMandatory(GraphResource, HasRelationMetadata):
+    source_name: str
+    source_description: str
+    loader: str
+    identifier: str
+    resource_type: Literal[NodeType.Source]
+
+
+@dataclass
+class SourceDefinition(ParsedSourceMandatory):
+    quoting: Quoting = field(default_factory=Quoting)
+    loaded_at_field: Optional[str] = None
+    freshness: Optional[FreshnessThreshold] = None
+    external: Optional[ExternalTable] = None
+    description: str = ""
+    columns: Dict[str, ColumnInfo] = field(default_factory=dict)
+    meta: Dict[str, Any] = field(default_factory=dict)
+    source_meta: Dict[str, Any] = field(default_factory=dict)
+    tags: List[str] = field(default_factory=list)
+    config: SourceConfig = field(default_factory=SourceConfig)
+    patch_path: Optional[str] = None
+    unrendered_config: Dict[str, Any] = field(default_factory=dict)
+    relation_name: Optional[str] = None
+    created_at: float = field(default_factory=lambda: time.time())
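The validation in `ExternalPartition.__post_init__` and the truthiness rule in `ExternalTable.__bool__` can be exercised on their own. This is a sketch assuming plain dataclasses: the `AdditionalPropertiesAllowed`/`Replaceable`/`Mergeable` bases are dropped, `CompilationError` is a hypothetical local stand-in for the dbt_common exception, and the S3 location is an illustrative value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


class CompilationError(Exception):
    """Hypothetical stand-in for dbt_common.exceptions.CompilationError."""


@dataclass
class ExternalPartition:
    name: str = ""
    description: str = ""
    data_type: str = ""
    meta: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # Runs after the generated __init__: a partition column must have
        # both a name and a data type, even though both default to "".
        if self.name == "" or self.data_type == "":
            raise CompilationError("External partition columns must have names and data types")


@dataclass
class ExternalTable:
    location: Optional[str] = None
    file_format: Optional[str] = None
    row_format: Optional[str] = None
    tbl_properties: Optional[str] = None
    partitions: Optional[Union[List[str], List[ExternalPartition]]] = None

    def __bool__(self):
        # An external table config is only truthy once it has a location.
        return self.location is not None


if __name__ == "__main__":
    part = ExternalPartition(name="event_date", data_type="date")
    table = ExternalTable(location="s3://example-bucket/events/", partitions=[part])
    print(bool(table), bool(ExternalTable()))  # True False
    try:
        ExternalPartition(name="event_date")  # no data_type
    except CompilationError as exc:
        print(exc)
```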

core/dbt/artifacts/schemas/freshness/v3/freshness.py

Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,7 @@
 from typing import Dict, Any, Sequence, List, Union, Optional
 from datetime import datetime

+from dbt.artifacts.resources import FreshnessThreshold
 from dbt.artifacts.schemas.results import ExecutionResult, FreshnessStatus, NodeResult, TimingInfo
 from dbt.artifacts.schemas.base import (
     ArtifactMixin,
@@ -12,7 +13,6 @@
 from dbt_common.dataclass_schema import dbtClassMixin, StrEnum
 from dbt_common.exceptions import DbtInternalError

-from dbt.contracts.graph.unparsed import FreshnessThreshold
 from dbt.contracts.graph.nodes import SourceDefinition


core/dbt/contracts/graph/manifest.py

Lines changed: 13 additions & 6 deletions
@@ -772,6 +772,9 @@ class ManifestStateCheck(dbtClassMixin):
     project_hashes: MutableMapping[str, FileHash] = field(default_factory=dict)


+NodeClassT = TypeVar("NodeClassT", bound="BaseNode")
+
+
 @dataclass
 class Manifest(MacroMethods, DataClassMessagePackMixin, dbtClassMixin):
     """The manifest for the full graph, after parsing and during compilation."""
@@ -1020,26 +1023,30 @@ def build_group_map(self):
                     group_map[node.group].append(node.unique_id)
         self.group_map = group_map

+    @classmethod
+    def _map_nodes_to_map_resources(cls, nodes_map: MutableMapping[str, NodeClassT]):
+        return {node_id: node.to_resource() for node_id, node in nodes_map.items()}
+
     def writable_manifest(self) -> "WritableManifest":
         self.build_parent_and_child_maps()
         self.build_group_map()
         return WritableManifest(
             nodes=self.nodes,
-            sources=self.sources,
+            sources=self._map_nodes_to_map_resources(self.sources),
             macros=self.macros,
             docs=self.docs,
-            exposures=self.exposures,
-            metrics=self.metrics,
-            groups=self.groups,
+            exposures=self._map_nodes_to_map_resources(self.exposures),
+            metrics=self._map_nodes_to_map_resources(self.metrics),
+            groups=self._map_nodes_to_map_resources(self.groups),
             selectors=self.selectors,
             metadata=self.metadata,
             disabled=self.disabled,
             child_map=self.child_map,
             parent_map=self.parent_map,
             group_map=self.group_map,
-            semantic_models=self.semantic_models,
+            semantic_models=self._map_nodes_to_map_resources(self.semantic_models),
             unit_tests=self.unit_tests,
-            saved_queries=self.saved_queries,
+            saved_queries=self._map_nodes_to_map_resources(self.saved_queries),
         )

     def write(self, path):
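`_map_nodes_to_map_resources` keeps each unique id as the key and swaps the node value for its resource before the maps are handed to `WritableManifest`. A minimal standalone sketch, assuming hypothetical `MetricResource`/`MetricNode` classes and a free function in place of the `Manifest` class method; the `TypeVar` bound mirrors the `NodeClassT` added in the diff, but is bound to the stand-in node class here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, MutableMapping, TypeVar


@dataclass
class MetricResource:
    # Hypothetical data-only resource.
    name: str


@dataclass
class MetricNode(MetricResource):
    # Hypothetical node-only attribute that should not be written out.
    _cache: Dict[str, Any] = field(default_factory=dict)

    def to_resource(self) -> MetricResource:
        return MetricResource(name=self.name)


NodeClassT = TypeVar("NodeClassT", bound=MetricNode)


def map_nodes_to_resources(nodes_map: MutableMapping[str, NodeClassT]) -> Dict[str, Any]:
    # Same shape as the comprehension in the diff: keys (unique ids) are kept,
    # and each node value is replaced by its resource.
    return {node_id: node.to_resource() for node_id, node in nodes_map.items()}


if __name__ == "__main__":
    # Illustrative unique id and node.
    nodes = {"metric.my_project.revenue": MetricNode(name="revenue", _cache={"hits": 1})}
    print(map_nodes_to_resources(nodes))
```

Maps such as `nodes`, `macros`, and `docs` are still passed through unchanged in this commit; only the node types that now have artifacts resources go through the conversion.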

core/dbt/contracts/graph/model_config.py

Lines changed: 1 addition & 5 deletions
@@ -7,6 +7,7 @@
     MetricConfig,
     SavedQueryConfig,
     SemanticModelConfig,
+    SourceConfig,
 )
 from dbt_common.contracts.config.base import BaseConfig, MergeBehavior, CompareBehavior
 from dbt_common.contracts.config.materialization import OnConfigurationChangeOption
@@ -54,11 +55,6 @@ class Hook(dbtClassMixin, Replaceable):
     index: Optional[int] = None


-@dataclass
-class SourceConfig(BaseConfig):
-    enabled: bool = True
-
-
 @dataclass
 class NodeAndTestConfig(BaseConfig):
     enabled: bool = True
