gh-89083: add support for UUID version 8 (RFC 9562) #123224

Merged on Nov 12, 2024 (14 commits)
42 changes: 33 additions & 9 deletions Doc/library/uuid.rst
@@ -1,8 +1,8 @@
:mod:`!uuid` --- UUID objects according to :rfc:`4122`
:mod:`!uuid` --- UUID objects according to :rfc:`9562`
======================================================

.. module:: uuid
   :synopsis: UUID objects (universally unique identifiers) according to RFC 4122
   :synopsis: UUID objects (universally unique identifiers) according to RFC 9562
.. moduleauthor:: Ka-Ping Yee <[email protected]>
.. sectionauthor:: George Yoshida <[email protected]>

@@ -12,7 +12,8 @@

This module provides immutable :class:`UUID` objects (the :class:`UUID` class)
and the functions :func:`uuid1`, :func:`uuid3`, :func:`uuid4`, :func:`uuid5` for
generating version 1, 3, 4, and 5 UUIDs as specified in :rfc:`4122`.
generating version 1, 3, 4, 5, and 8 UUIDs as specified in :rfc:`9562` (which
supersedes :rfc:`4122`).

If all you want is a unique ID, you should probably call :func:`uuid1` or
:func:`uuid4`. Note that :func:`uuid1` may compromise privacy since it creates
@@ -65,7 +66,7 @@ which relays any information about the UUID's safety, using this enumeration:

   Exactly one of *hex*, *bytes*, *bytes_le*, *fields*, or *int* must be given.
   The *version* argument is optional; if given, the resulting UUID will have its
   variant and version number set according to :rfc:`4122`, overriding bits in the
   variant and version number set according to :rfc:`9562`, overriding bits in the
   given *hex*, *bytes*, *bytes_le*, *fields*, or *int*.
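The override described above amounts to two masking steps on the 128-bit integer, mirroring the `__init__` hunk in `Lib/uuid.py`. The helper `force_version` below is hypothetical (not part of the `uuid` module); the sketch checks it against `uuid.UUID(int=..., version=4)`, which every supported Python version accepts, and stamps version 8 by hand because older versions reject `version=8` in the constructor.

```python
import uuid


def force_version(n: int, version: int) -> int:
    """Hypothetical helper: stamp the RFC 4122/9562 variant and a version onto n."""
    # Variant: clear bits 62-63, then set them to the pattern 0b10.
    n &= ~(0xC000 << 48)
    n |= 0x8000 << 48
    # Version: clear bits 76-79, then store the version number there.
    n &= ~(0xF000 << 64)
    n |= version << 76
    return n


all_ones = (1 << 128) - 1
forced = force_version(all_ones, 4)
# Same bits as letting the UUID constructor perform the override.
assert forced == uuid.UUID(int=all_ones, version=4).int
print(uuid.UUID(int=forced))  # ffffffff-ffff-4fff-bfff-ffffffffffff
```

Only the six variant/version bits change; every other bit of the input survives.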

   Comparisons of UUID objects are made by way of comparing their
@@ -137,7 +138,7 @@ which relays any information about the UUID's safety, using this enumeration:

.. attribute:: UUID.urn

   The UUID as a URN as specified in :rfc:`4122`.
   The UUID as a URN as specified in :rfc:`9562`.
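For illustration (the UUID value is arbitrary), the URN form is the canonical hyphenated string with a `urn:uuid:` prefix:

```python
import uuid

u = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(u.urn)  # urn:uuid:12345678-1234-5678-1234-567812345678

# The URN is just the canonical string form with a fixed prefix.
assert u.urn == "urn:uuid:" + str(u)
```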


.. attribute:: UUID.variant
@@ -149,9 +150,13 @@ which relays any information about the UUID's safety, using this enumeration:

.. attribute:: UUID.version

   The UUID version number (1 through 5, meaningful only when the variant is
   The UUID version number (1 through 8, meaningful only when the variant is
   :const:`RFC_4122`).

   .. versionchanged:: next
      Added UUID version 8.
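A short check of when `version` is meaningful. The all-zero UUID falls under the `RESERVED_NCS` variant, so its version reads as `None`. Because the property reads bits 76-79 directly (see the `version` hunk in `Lib/uuid.py`), a hand-written UUID whose bits encode version 8 reports 8 even on Python versions that lack `uuid8()`; the specific UUID strings here are illustrative.

```python
import uuid

# All-zero UUID: the top variant bit is clear, so this is RESERVED_NCS
# and the version field is not interpreted.
nil = uuid.UUID(int=0)
assert nil.variant == uuid.RESERVED_NCS
assert nil.version is None

# uuid4() produces the RFC 4122/9562 variant, so version is meaningful.
assert uuid.uuid4().version == 4

# Hand-written UUID with version nibble 8 and variant bits 0b10.
v8 = uuid.UUID("00000000-0000-8000-8000-000000000000")
assert v8.variant == uuid.RFC_4122
print(v8.version)  # 8
```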


.. attribute:: UUID.is_safe

   An enumeration of :class:`SafeUUID` which indicates whether the platform
@@ -216,6 +221,23 @@ The :mod:`uuid` module defines the following functions:

.. index:: single: uuid5


.. function:: uuid8(a=None, b=None, c=None)

   Generate a pseudo-random UUID according to
   :rfc:`RFC 9562, §5.8 <9562#section-5.8>`.

   When specified, the parameters *a*, *b* and *c* are expected to be
   non-negative integers of 48, 12 and 62 bits respectively. If they exceed
   their expected bit count, only their least significant bits are kept;
   each unspecified argument is replaced by a pseudo-random integer of
   the appropriate size.

   .. versionadded:: next

.. index:: single: uuid8
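A sketch of the layout described above, assuming the field positions from RFC 9562 §5.8 (custom_a, ver, custom_b, var, custom_c) and the `uuid8()` implementation in `Lib/uuid.py`. The name `uuid8_sketch` is hypothetical; it sets the version and variant bits by hand so that it also runs on Python versions whose `UUID` constructor rejects `version=8`, and compares against the real `uuid.uuid8()` only when that function exists.

```python
import random
import uuid


def uuid8_sketch(a=None, b=None, c=None):
    """Hypothetical re-creation of uuid.uuid8(); a, b, c are 48, 12, 62 bits."""
    if a is None:
        a = random.getrandbits(48)
    if b is None:
        b = random.getrandbits(12)
    if c is None:
        c = random.getrandbits(62)
    n = (a & 0xFFFF_FFFF_FFFF) << 80   # custom_a: bits 80-127
    n |= 8 << 76                       # version 8: bits 76-79
    n |= (b & 0xFFF) << 64             # custom_b: bits 64-75
    n |= 0b10 << 62                    # variant 0b10: bits 62-63
    n |= c & 0x3FFF_FFFF_FFFF_FFFF     # custom_c: bits 0-61
    return uuid.UUID(int=n)


u = uuid8_sketch(0x123456789ABC, 0xDEF, 0x1234)
print(u)  # 12345678-9abc-8def-8000-000000001234

# Negative or oversized inputs keep only their low bits, as documented.
assert uuid8_sketch(-1, -1, -1) == uuid8_sketch(2**48 - 1, 2**12 - 1, 2**62 - 1)

if hasattr(uuid, "uuid8"):  # only present on Python versions that ship uuid8()
    assert uuid.uuid8(0x123456789ABC, 0xDEF, 0x1234) == u
```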


The :mod:`uuid` module defines the following namespace identifiers for use with
:func:`uuid3` or :func:`uuid5`.
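Name-based UUIDs are deterministic: hashing the same namespace and name always yields the same UUID, while a different namespace yields a different one. A minimal illustration with the predefined `NAMESPACE_DNS` and `NAMESPACE_URL` identifiers (the name `"python.org"` is just an example):

```python
import uuid

first = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
again = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
other = uuid.uuid5(uuid.NAMESPACE_URL, "python.org")

assert first == again   # same namespace and name: same UUID
assert first != other   # different namespace: different UUID
assert first.version == 5
assert uuid.uuid3(uuid.NAMESPACE_DNS, "python.org").version == 3
print(first)
```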

@@ -252,7 +274,9 @@ of the :attr:`~UUID.variant` attribute:

.. data:: RFC_4122

   Specifies the UUID layout given in :rfc:`4122`.
   Specifies the UUID layout given in :rfc:`4122`. This constant is kept
   for backward compatibility even though :rfc:`4122` has been superseded
   by :rfc:`9562`.
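The constant keeps its old name, but the bit pattern it denotes is the same variant RFC 9562 uses: the two most significant bits of octet 8 are `0b10`, so the first hex digit of the fourth group is one of `8`, `9`, `a` or `b`. A quick check on a random UUID:

```python
import uuid

u = uuid.uuid4()
assert u.variant == uuid.RFC_4122
# Bits 62-63 of the 128-bit integer hold the variant pattern 0b10.
assert (u.int >> 62) & 0b11 == 0b10
# In the canonical string, index 19 is the first digit of the fourth group.
assert str(u)[19] in "89ab"
```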


.. data:: RESERVED_MICROSOFT
@@ -267,7 +291,7 @@ of the :attr:`~UUID.variant` attribute:

.. seealso::

   :rfc:`4122` - A Universally Unique IDentifier (UUID) URN Namespace
   :rfc:`9562` - Universally Unique IDentifiers (UUIDs)
      This specification defines a Uniform Resource Name namespace for UUIDs, the
      internal format of UUIDs, and methods of generating UUIDs.

@@ -283,7 +307,7 @@ The :mod:`uuid` module can be executed as a script from the command line.

.. code-block:: sh

   python -m uuid [-h] [-u {uuid1,uuid3,uuid4,uuid5}] [-n NAMESPACE] [-N NAME]
   python -m uuid [-h] [-u {uuid1,uuid3,uuid4,uuid5,uuid8}] [-n NAMESPACE] [-N NAME]

The following options are accepted:

8 changes: 8 additions & 0 deletions Doc/whatsnew/3.14.rst
@@ -453,6 +453,14 @@ unittest
  (Contributed by Jacob Walls in :gh:`80958`.)


uuid
----

* Add support for UUID version 8 via :func:`uuid.uuid8` as specified
  in :rfc:`9562`.
  (Contributed by Bénédikt Tran in :gh:`89083`.)


.. Add improved modules above alphabetically, not here at the end.

Optimizations
35 changes: 34 additions & 1 deletion Lib/test/test_uuid.py
@@ -8,8 +8,10 @@
import io
import os
import pickle
import random
import sys
import weakref
from itertools import product
from unittest import mock

py_uuid = import_helper.import_fresh_module('uuid', blocked=['_uuid'])
@@ -267,7 +269,7 @@ def test_exceptions(self):

        # Version number out of range.
        badvalue(lambda: self.uuid.UUID('00'*16, version=0))
        badvalue(lambda: self.uuid.UUID('00'*16, version=6))
        badvalue(lambda: self.uuid.UUID('00'*16, version=42))

        # Integer value out of range.
        badvalue(lambda: self.uuid.UUID(int=-1))
@@ -681,6 +683,37 @@ def test_uuid5(self):
            equal(u, self.uuid.UUID(v))
            equal(str(u), v)

    def test_uuid8(self):
        equal = self.assertEqual
        u = self.uuid.uuid8()

        equal(u.variant, self.uuid.RFC_4122)
        equal(u.version, 8)

        for (_, hi, mid, lo) in product(
            range(10),  # repeat 10 times
            [None, 0, random.getrandbits(48)],
            [None, 0, random.getrandbits(12)],
            [None, 0, random.getrandbits(62)],
        ):
            u = self.uuid.uuid8(hi, mid, lo)
            equal(u.variant, self.uuid.RFC_4122)
            equal(u.version, 8)
            if hi is not None:
                equal((u.int >> 80) & 0xffffffffffff, hi)
            if mid is not None:
                equal((u.int >> 64) & 0xfff, mid)
            if lo is not None:
                equal(u.int & 0x3fffffffffffffff, lo)

    def test_uuid8_uniqueness(self):
        # Test that UUIDv8-generated values are unique
        # (up to a negligible probability of failure).
        u1 = self.uuid.uuid8()
        u2 = self.uuid.uuid8()
        self.assertNotEqual(u1.int, u2.int)
        self.assertEqual(u1.version, u2.version)

    @support.requires_fork()
    def testIssue8621(self):
        # On at least some versions of OSX self.uuid.uuid4 generates
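The shifts and masks in `test_uuid8` follow from the version 8 bit layout. Writing the UUID as a 128-bit integer $N$, with bit 0 least significant:

```latex
\begin{aligned}
N &= a \cdot 2^{80} + 8 \cdot 2^{76} + b \cdot 2^{64} + 2 \cdot 2^{62} + c,
  \qquad 0 \le a < 2^{48},\; 0 \le b < 2^{12},\; 0 \le c < 2^{62},\\[4pt]
a &= \left\lfloor N / 2^{80} \right\rfloor \bmod 2^{48}, \qquad
b = \left\lfloor N / 2^{64} \right\rfloor \bmod 2^{12}, \qquad
c = N \bmod 2^{62}.
\end{aligned}
```

These correspond to `(u.int >> 80) & 0xffffffffffff`, `(u.int >> 64) & 0xfff` and `u.int & 0x3fffffffffffffff` in the test. With all arguments omitted, 48 + 12 + 62 = 122 bits are random, so, treating the generator's output as uniform, two calls collide with probability $2^{-122} \approx 1.9 \times 10^{-37}$, which is the "negligible probability of failure" that `test_uuid8_uniqueness` accepts.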
41 changes: 32 additions & 9 deletions Lib/uuid.py
@@ -1,8 +1,8 @@
r"""UUID objects (universally unique identifiers) according to RFC 4122.
r"""UUID objects (universally unique identifiers) according to RFC 4122/9562.

This module provides immutable UUID objects (class UUID) and the functions
uuid1(), uuid3(), uuid4(), uuid5() for generating version 1, 3, 4, and 5
UUIDs as specified in RFC 4122.
uuid1(), uuid3(), uuid4(), uuid5(), and uuid8() for generating version 1, 3,
4, 5, and 8 UUIDs as specified in RFC 4122/9562.

If all you want is a unique ID, you should probably call uuid1() or uuid4().
Note that uuid1() may compromise privacy since it creates a UUID containing
@@ -124,12 +124,12 @@ class UUID:

    int         the UUID as a 128-bit integer

    urn         the UUID as a URN as specified in RFC 4122
    urn         the UUID as a URN as specified in RFC 4122/9562

    variant     the UUID variant (one of the constants RESERVED_NCS,
                RFC_4122, RESERVED_MICROSOFT, or RESERVED_FUTURE)

    version     the UUID version number (1 through 5, meaningful only
    version     the UUID version number (1 through 8, meaningful only
                when the variant is RFC_4122)

    is_safe     An enum indicating whether the UUID has been generated in
@@ -214,9 +214,9 @@ def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
        if not 0 <= int < 1<<128:
            raise ValueError('int is out of range (need a 128-bit value)')
        if version is not None:
            if not 1 <= version <= 5:
            if not 1 <= version <= 8:
                raise ValueError('illegal version number')
            # Set the variant to RFC 4122.
            # Set the variant to RFC 4122/9562.
            int &= ~(0xc000 << 48)
            int |= 0x8000 << 48
            # Set the version number.
@@ -355,7 +355,7 @@ def variant(self):

    @property
    def version(self):
        # The version bits are only meaningful for RFC 4122 UUIDs.
        # The version bits are only meaningful for RFC 4122/9562 UUIDs.
        if self.variant == RFC_4122:
            return int((self.int >> 76) & 0xf)

@@ -719,14 +719,37 @@ def uuid5(namespace, name):
    hash = sha1(namespace.bytes + name).digest()
    return UUID(bytes=hash[:16], version=5)

def uuid8(a=None, b=None, c=None):
    """Generate a UUID from three custom blocks.

    * 'a' is the first 48-bit chunk of the UUID (octets 0-5);
    * 'b' is the mid 12-bit chunk (octets 6-7);
    * 'c' is the last 62-bit chunk (octets 8-15).

    When a value is not specified, a pseudo-random value is generated.
    """
    if a is None:
        import random
        a = random.getrandbits(48)
    if b is None:
        import random
        b = random.getrandbits(12)
    if c is None:
        import random
        c = random.getrandbits(62)
    int_uuid_8 = (a & 0xffff_ffff_ffff) << 80
Review thread:

vstinner (Member), Sep 26, 2024:
Does it make sense / is it possible to reject values outside the expected value range? For example, reject negative numbers?

Maybe something like:

    orig_a = a
    a &= 0xffff_ffff_ffff
    if a != orig_a: raise ValueError("...")

I don't know. Would it be consistent with other uuid functions?

Member Author:
UUIDv1 does not reject them so I wouldn't worry too much about it. v3 and v5 are not based on integral inputs.

vstinner (Member), Sep 26, 2024:
I'm fine with having the same behavior as uuid1() in this case, since it's documented.
    int_uuid_8 |= (b & 0xfff) << 64
    int_uuid_8 |= c & 0x3fff_ffff_ffff_ffff
    return UUID(int=int_uuid_8, version=8)

def main():
    """Run the uuid command line interface."""
    uuid_funcs = {
        "uuid1": uuid1,
        "uuid3": uuid3,
        "uuid4": uuid4,
        "uuid5": uuid5
        "uuid5": uuid5,
        "uuid8": uuid8,
    }
    uuid_namespace_funcs = ("uuid3", "uuid5")
    namespaces = {
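The review thread on `uuid8()` weighs keeping only the low bits of an out-of-range argument against rejecting it. The sketch below contrasts the two policies for a single field; `masked` and `checked` are hypothetical helpers, not part of the `uuid` module, and `checked` only follows the shape of the suggestion made in review, not any merged code.

```python
def masked(value: int, bits: int) -> int:
    """Merged policy (hypothetical helper): keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)


def checked(value: int, bits: int) -> int:
    """Stricter policy raised in review (hypothetical helper): reject instead of truncating."""
    result = value & ((1 << bits) - 1)
    if result != value:
        raise ValueError(f"expected a non-negative {bits}-bit integer, got {value!r}")
    return result


# In-range values are unaffected by either policy.
assert masked(0xABC, 12) == checked(0xABC, 12) == 0xABC

# Negative or oversized values are silently truncated by masking...
assert masked(-1, 12) == 0xFFF
assert masked(0x1ABC, 12) == 0xABC

# ...but rejected by the stricter check.
for bad in (-1, 0x1ABC):
    try:
        checked(bad, 12)
    except ValueError as exc:
        print(exc)
    else:
        raise AssertionError("expected ValueError")
```

Masking lets `uuid8()` accept any integer, matching its documented behavior, at the cost of silently hiding a caller's mistake; rejecting would surface such mistakes but would contradict that documentation.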
@@ -0,0 +1,2 @@
Add :func:`uuid.uuid8` for generating UUIDv8 objects as specified in
:rfc:`9562`. Patch by Bénédikt Tran.