Skip to content

OM text exposition for NH #1087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

vesari
Copy link
Contributor

@vesari vesari commented Jan 26, 2025

For the sake of separation of concerns, this PR focuses only on the OM exposition format for native histograms and it does so by mocking native histograms in tests i.e. it uses the helper function custom_collector (as it is already done for gauge histograms) instead of the observe function. I will carry out the implementation of the "complete" support for native histograms in client_python in another iteration (I'm already on it).
Incidentally, in the exposition logic, the #HELP line gets appended before the #TYPE line, whereas in the parser logic is the other way around; also, in the exposition logic the value is always a float in the OM format, whereas in the parser it doesn't seem to be the norm. In this PR I follow the behaviour already existing in the exposition logic. Let me know if I should change anything in this regard.
P.S. Of course this way of arranging the tests is just temporary; in the next iteration, they will be modified to use observe accordingly.
Edit: this PR addresses part of the issue prometheus/OpenMetrics#279

Copy link
Member

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work, I have some suggestions to improve type checking and also the extra spaces need removal

@@ -30,24 +30,24 @@ def generate_latest(registry):
escape_metric_name(mname), _escape(metric.documentation)))
output.append(f'# TYPE {escape_metric_name(mname)} {metric.type}\n')
if metric.unit:
output.append(f'# UNIT {escape_metric_name(mname)} {metric.unit}\n')
output.append(f'# UNIT {escape_metric_name(mname)} {metric.unit}\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove extra spaces from end of lines. Also reduces the diff.

if s.labels:
items = sorted(s.labels.items())
items = s.labels.items()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OM does not require ordering of labels, still I wonder why change this, doesn't seem part of the scope of the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to use the exact same strings I used in the parser tests (there the labels are not sorted). Since I knew that ordering is not required, I removed the sorting, and once I saw that the pre-existing exposition tests don’t fail, I simply thought I’d kill two birds with one stone, i.e. keeping the same test strings and removing the sorting performance overhead (which I admit I haven’t quantified btw). Having said that, I realize this change is out of the scope of this PR, so of course I will revert it, if you wish.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the sorting. I'd have to go and make sure that it doesn't cause an issue anywhere not tested. Probably ok, but for example the sorting is important for multiprocess as that is how we match keys.

timestamp,
exemplarstr,
native_histogram
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be placed immediately after value, because the timestamp and exemplars come last

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say we should just store native histogram into value and keep the format string intact. There is only one value field allowed in a line so having two fields feels wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing native histograms in value was my first approach in the initial PR for the parser. That approach was discarded because, for value to support both float and nativeHistogram types, changes would have been necessary also in registry.py and in bridge/graphite.py. Those changes were deemed undesirable, thus the decision of adding native_histogram as a separate attribute of the Sample class. It was added for last (after exemplars and timestamp) as I thought that would have made it less likely to break things downstream rather than inserting it before. Please find the original conversation here: #1040 . While it’s my bad for not having noticed that supporting exemplars had turned from a non-goal to a goal by the time I had already started working on the parser, I knew I should have dealt with that at some point, but I thought to do something along the lines of client_golang, where the exemplars for native histograms (if I understand the code correctly) are represented separately from the “classic” exemplars https://github.com/prometheus/client_golang/blob/e729ba11961abb3a50910c324221d06a35bdce4f/prometheus/histogram.go#L1678 by means of a nativeExemplar type.

I’ll be happy to open a new PR on the parser first in order to support exemplars and address your change requests, but I can’t do it without reaching a consensus on how class Sample should look like. The way I see it, there are at least 3 possibilities:

  1. Class Sample stays as it is, while I add an attribute (say, native_exemplars of some NativeExemplar type) to class NativeHistogram.

  2. I revert class Sample to have the value attribute accepting both float and NativeHistogram types, (but then I wouldn’t know how to avoid changing registry.py and bridge/graphite.py).

  3. I move up the native_histogram attribute, so that class Sample becomes:

class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float
    native_histogram: Optional[NativeHistogram] = None
    timestamp: Optional[Union[float, Timestamp]] = None
    exemplar: Optional[Exemplar] = None

and change things accordingly elsewhere.

Which one shall I opt for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Go we opted for separate sample types for floats , and native histograms (actually there's two native histogram sample types in Go, but I digress) due to performance issues negatively affecting the float sample values.

I don't know how much of a concern it is in Python, I suspect it's not a big issue.

So my vote is:
let's finish this PR with option 1. My comment about the order of printing is still valid.

Let's try to implement option 2 in a separate PR . I wish we could unify the types in Go as well...

I don't understand option 3, my comments are about the printing order , not the order in the tuple which I don't care about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't thinking about changing types elsewhere, just in this small bit of code when doing the formatting. E.g. just above this line do:

value = ''    
if s.native_histogram:
    value = native_histogram
elif s.value is not None
    value = floatToGoString(s.value)

timestamp,
exemplarstr,
native_histogram
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be placed immediately after value, because the timestamp and exemplars come last

@@ -94,6 +95,125 @@ def test_histogram(self):
# EOF
""", generate_latest(self.registry))


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add test for exemplars (note that native histograms may have multiple exemplars) see example in https://github.com/prometheus/proposals/blob/main/proposals/2024-01-29_native_histograms_text_format.md

Comment on lines +96 to +98
# Signal presence for pos/neg spans/deltas
pos = False
neg = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these just shadow s.native_histogram.pos_spans and s.native_histogram.neg_spans so I'd just write that, but no strong opinion

native_histogram = nh_sample_template.format(*args)

value = ''
if s.value is not None or not s.native_histogram:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be enough to write

Suggested change
if s.value is not None or not s.native_histogram:
if not s.native_histogram:

At least that's what we do in Prometheus. Type checks should prevent having None in the value.


def test_native_histogram(self):
hfm = HistogramMetricFamily("nh", "nh")
hfm.add_sample("nh", None, None, None, None, NativeHistogram(24, 100, 0, 0.001, 4, (BucketSpan(0, 2), BucketSpan(1, 2)), (BucketSpan(0, 2), BucketSpan(1, 2)), (2, 1, -3, 3), (2, 1, -2, 3)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type annotation of add_sample says labels: Dict[str, str], which means that None is not allowed. I think adding {} should work. Same comment on the rest of the tests.

The type annotation of add_sample says value: float , which means that None is not allowed. Also valid comment for the other tests here.

Suggested change
hfm.add_sample("nh", None, None, None, None, NativeHistogram(24, 100, 0, 0.001, 4, (BucketSpan(0, 2), BucketSpan(1, 2)), (BucketSpan(0, 2), BucketSpan(1, 2)), (2, 1, -3, 3), (2, 1, -2, 3)))
hfm.add_sample("nh", None, 0, None, None, NativeHistogram(24, 100, 0, 0.001, 4, (BucketSpan(0, 2), BucketSpan(1, 2)), (BucketSpan(0, 2), BucketSpan(1, 2)), (2, 1, -3, 3), (2, 1, -2, 3)))

The reason why mypy CI check doesn't catch these is due to not having type annotation on the def line.

@@ -94,6 +95,125 @@ def test_histogram(self):
# EOF
""", generate_latest(self.registry))


def test_native_histogram(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_native_histogram(self):
def test_native_histogram(self) -> None:

To get type checks. Also for all the other tests here.

Copy link
Member

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more comments to add to krajorama's.

Comment on lines +116 to +124
# Append pos deltas if pos spans were added
if pos:
nh_sample_template += ',positive_deltas:[{}]'
args.append(positive_deltas)

# Append neg deltas if neg spans were added
if neg:
nh_sample_template += ',negative_deltas:[{}]'
args.append(negative_deltas)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't these just be included in the already present if statements above and then there would be no need for the pos and neg variables?

timestamp,
exemplarstr,
native_histogram
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say we should just store native histogram into value and keep the format string intact. There is only one value field allowed in a line so having two fields feels wrong.

Comment on lines +181 to +183



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@vesari
Copy link
Contributor Author

vesari commented Feb 12, 2025

@csmarchbanks and @krajorama , thanks a lot for your reviews! Before I proceed addressing your change requests, let me know how you think class Sample should look like, as per my comment above. Many thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants