Commit 990ad27

picnixz, hugovk, and vstinner authored
gh-89083: add support for UUID version 6 (RFC 9562) (#120650)
Add support for generating UUIDv6 objects according to RFC 9562, §5.6 [1].

The functionality is provided by the `uuid.uuid6()` function which takes as inputs an optional 48-bit hardware address and an optional 14-bit clock sequence. The UUIDv6 temporal fields are ordered differently than those of UUIDv1, thereby providing improved database locality.

[1]: https://www.rfc-editor.org/rfc/rfc9562.html#section-5.6

---------

Co-authored-by: Hugo van Kemenade <[email protected]>
Co-authored-by: Victor Stinner <[email protected]>
1 parent 214562e commit 990ad27
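
The locality claim in the commit message can be illustrated with a small sketch. It packs the same increasing 60-bit timestamps into the UUIDv1 and the UUIDv6 field layouts and checks whether the resulting 128-bit integers stay in time order. The helpers `v1_int` and `v6_int` are hypothetical names introduced for this illustration (they are not part of the commit), and the node and clock sequence are fixed to 0 for simplicity.

```python
VARIANT_BITS = 0x8000 << 48  # RFC 4122/9562 variant (0b10), as in the module flags


def v1_int(timestamp):
    """Pack a 60-bit timestamp using the UUIDv1 layout (low 32 bits first)."""
    time_low = timestamp & 0xffff_ffff
    time_mid = (timestamp >> 32) & 0xffff
    time_hi = (timestamp >> 48) & 0x0fff
    return (time_low << 96) | (time_mid << 80) | (1 << 76) | (time_hi << 64) | VARIANT_BITS


def v6_int(timestamp):
    """Pack a 60-bit timestamp using the UUIDv6 layout (high 48 bits first)."""
    time_hi_and_mid = (timestamp >> 12) & 0xffff_ffff_ffff
    time_lo = timestamp & 0x0fff
    return (time_hi_and_mid << 80) | (6 << 76) | (time_lo << 64) | VARIANT_BITS


# Increasing timestamps whose low 32 bits wrap around several times.
base = 0x1ec9414c_232a_b00
timestamps = [base + k * 0x4000_0000 for k in range(8)]

v1_values = [v1_int(t) for t in timestamps]
v6_values = [v6_int(t) for t in timestamps]

# True when the integers are already sorted, i.e. follow generation order.
v1_is_time_ordered = v1_values == sorted(v1_values)
v6_is_time_ordered = v6_values == sorted(v6_values)
print(v6_is_time_ordered, v1_is_time_ordered)
```

With these inputs the UUIDv6 integers remain sorted while the UUIDv1 integers do not, which is why an index keyed on UUIDv6 values tends to append new rows near each other.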

5 files changed: +235 -17 lines changed

Doc/library/uuid.rst

Lines changed: 23 additions & 7 deletions
@@ -12,8 +12,8 @@

 This module provides immutable :class:`UUID` objects (the :class:`UUID` class)
 and the functions :func:`uuid1`, :func:`uuid3`, :func:`uuid4`, :func:`uuid5`,
-and :func:`uuid.uuid8` for generating version 1, 3, 4, 5, and 8 UUIDs as
-specified in :rfc:`9562` (which supersedes :rfc:`4122`).
+:func:`uuid6`, and :func:`uuid8` for generating version 1, 3, 4, 5, 6,
+and 8 UUIDs as specified in :rfc:`9562` (which supersedes :rfc:`4122`).

 If all you want is a unique ID, you should probably call :func:`uuid1` or
 :func:`uuid4`. Note that :func:`uuid1` may compromise privacy since it creates
@@ -153,8 +153,8 @@ which relays any information about the UUID's safety, using this enumeration:
    The UUID version number (1 through 8, meaningful only when the variant is
    :const:`RFC_4122`).

-   .. versionchanged:: 3.14
-      Added UUID version 8.
+   .. versionchanged:: next
+      Added UUID versions 6 and 8.


 .. attribute:: UUID.is_safe
@@ -212,6 +212,22 @@ The :mod:`uuid` module defines the following functions:
    that will be encoded using UTF-8).


+.. function:: uuid6(node=None, clock_seq=None)
+
+   Generate a UUID from a sequence number and the current time according to
+   :rfc:`9562`.
+   This is an alternative to :func:`uuid1` to improve database locality.
+
+   When *node* is not specified, :func:`getnode` is used to obtain the hardware
+   address as a 48-bit positive integer. When a sequence number *clock_seq* is
+   not specified, a pseudo-random 14-bit positive integer is generated.
+
+   If *node* or *clock_seq* exceed their expected bit count, only their least
+   significant bits are kept.
+
+   .. versionadded:: next
+
+
 .. function:: uuid8(a=None, b=None, c=None)

    Generate a pseudo-random UUID according to
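
The truncation rule documented for `uuid6()` above ("only their least significant bits are kept") amounts to two bit masks. The following is a minimal sketch of that rule only; `truncate_uuid6_inputs` is a hypothetical helper name used for illustration, and the sample values are the ones used in the test suite of this commit.

```python
NODE_MASK = 0xffff_ffff_ffff   # keep the 48 least significant bits
CLOCK_SEQ_MASK = 0x3fff        # keep the 14 least significant bits


def truncate_uuid6_inputs(node, clock_seq):
    """Return (node, clock_seq) reduced to their expected bit counts."""
    return node & NODE_MASK, clock_seq & CLOCK_SEQ_MASK


# A 52-bit node and a 16-bit clock sequence, as in the tests of this commit.
node, clock_seq = truncate_uuid6_inputs(0xE_1234_5678_ABCD, 0xffff)
print(hex(node), hex(clock_seq))
```

Values that already fit in 48 and 14 bits pass through unchanged.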
@@ -314,7 +330,7 @@ The :mod:`uuid` module can be executed as a script from the command line.

 .. code-block:: sh

-   python -m uuid [-h] [-u {uuid1,uuid3,uuid4,uuid5,uuid8}] [-n NAMESPACE] [-N NAME]
+   python -m uuid [-h] [-u {uuid1,uuid3,uuid4,uuid5,uuid6,uuid8}] [-n NAMESPACE] [-N NAME]

 The following options are accepted:

@@ -330,8 +346,8 @@ The following options are accepted:
    Specify the function name to use to generate the uuid. By default :func:`uuid4`
    is used.

-   .. versionadded:: 3.14
-      Allow generating UUID version 8.
+   .. versionchanged:: next
+      Allow generating UUID versions 6 and 8.

 .. option:: -n <namespace>
             --namespace <namespace>

Doc/whatsnew/3.14.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -919,8 +919,8 @@ urllib
919919
uuid
920920
----
921921

922-
* Add support for UUID version 8 via :func:`uuid.uuid8` as specified
923-
in :rfc:`9562`.
922+
* Add support for UUID versions 6 and 8 via :func:`uuid.uuid6` and
923+
:func:`uuid.uuid8` respectively, as specified in :rfc:`9562`.
924924
(Contributed by Bénédikt Tran in :gh:`89083`.)
925925

926926
* :const:`uuid.NIL` and :const:`uuid.MAX` are now available to represent the

Lib/test/test_uuid.py

Lines changed: 150 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,3 @@
1-
import unittest
2-
from test import support
3-
from test.support import import_helper
41
import builtins
52
import contextlib
63
import copy
@@ -10,10 +7,14 @@
 import pickle
 import random
 import sys
+import unittest
 import weakref
 from itertools import product
 from unittest import mock

+from test import support
+from test.support import import_helper
+
 py_uuid = import_helper.import_fresh_module('uuid', blocked=['_uuid'])
 c_uuid = import_helper.import_fresh_module('uuid', fresh=['_uuid'])

@@ -724,6 +725,152 @@ def test_uuid5(self):
             equal(u, self.uuid.UUID(v))
             equal(str(u), v)

+    def test_uuid6(self):
+        equal = self.assertEqual
+        u = self.uuid.uuid6()
+        equal(u.variant, self.uuid.RFC_4122)
+        equal(u.version, 6)
+
+        fake_nanoseconds = 0x1571_20a1_de1a_c533
+        fake_node_value = 0x54e1_acf6_da7f
+        fake_clock_seq = 0x14c5
+        with (
+            mock.patch.object(self.uuid, '_last_timestamp_v6', None),
+            mock.patch.object(self.uuid, 'getnode', return_value=fake_node_value),
+            mock.patch('time.time_ns', return_value=fake_nanoseconds),
+            mock.patch('random.getrandbits', return_value=fake_clock_seq)
+        ):
+            u = self.uuid.uuid6()
+            equal(u.variant, self.uuid.RFC_4122)
+            equal(u.version, 6)
+
+            # 32 (top) | 16 (mid) | 12 (low) == 60 (timestamp)
+            equal(u.time, 0x1e901fca_7a55_b92)
+            equal(u.fields[0], 0x1e901fca)  # 32 top bits of time
+            equal(u.fields[1], 0x7a55)  # 16 mid bits of time
+            # 4 bits of version + 12 low bits of time
+            equal((u.fields[2] >> 12) & 0xf, 6)
+            equal((u.fields[2] & 0xfff), 0xb92)
+            # 2 bits of variant + 6 high bits of clock_seq
+            equal((u.fields[3] >> 6) & 0xf, 2)
+            equal(u.fields[3] & 0x3f, fake_clock_seq >> 8)
+            # 8 low bits of clock_seq
+            equal(u.fields[4], fake_clock_seq & 0xff)
+            equal(u.fields[5], fake_node_value)
+
+    def test_uuid6_uniqueness(self):
+        # Test that UUIDv6-generated values are unique.
+
+        # Unlike UUIDv8, only 62 bits can be randomized for UUIDv6.
+        # In practice, however, it remains unlikely to generate two
+        # identical UUIDs for the same 60-bit timestamp if neither
+        # the node ID nor the clock sequence is specified.
+        uuids = {self.uuid.uuid6() for _ in range(1000)}
+        self.assertEqual(len(uuids), 1000)
+        versions = {u.version for u in uuids}
+        self.assertSetEqual(versions, {6})
+
+        timestamp = 0x1ec9414c_232a_b00
+        fake_nanoseconds = (timestamp - 0x1b21dd21_3814_000) * 100
+
+        with mock.patch('time.time_ns', return_value=fake_nanoseconds):
+            def gen():
+                with mock.patch.object(self.uuid, '_last_timestamp_v6', None):
+                    return self.uuid.uuid6(node=0, clock_seq=None)
+
+            # By the birthday paradox, sampling N = 1024 UUIDs with identical
+            # node IDs and timestamps results in duplicates with probability
+            # close to 1 (not having a duplicate happens with probability of
+            # order 1E-15) since only the 14-bit clock sequence is randomized.
+            N = 1024
+            uuids = {gen() for _ in range(N)}
+            self.assertSetEqual({u.node for u in uuids}, {0})
+            self.assertSetEqual({u.time for u in uuids}, {timestamp})
+            self.assertLess(len(uuids), N, 'collision property does not hold')
+
+    def test_uuid6_node(self):
+        # Make sure the given node ID appears in the UUID.
+        #
+        # Note: when no node ID is specified, the same logic as for UUIDv1
+        # is applied to UUIDv6. In particular, there is no need to test that
+        # getnode() correctly returns positive integers of exactly 48 bits
+        # since this is done in test_uuid1_eui64().
+        self.assertLessEqual(self.uuid.uuid6().node.bit_length(), 48)
+
+        self.assertEqual(self.uuid.uuid6(0).node, 0)
+
+        # tests with explicit values
+        max_node = 0xffff_ffff_ffff
+        self.assertEqual(self.uuid.uuid6(max_node).node, max_node)
+        big_node = 0xE_1234_5678_ABCD  # 52-bit node
+        res_node = 0x0_1234_5678_ABCD  # truncated to 48 bits
+        self.assertEqual(self.uuid.uuid6(big_node).node, res_node)
+
+        # randomized tests
+        for _ in range(10):
+            # node with > 48 bits is truncated
+            for b in [24, 48, 72]:
+                node = (1 << (b - 1)) | random.getrandbits(b)
+                with self.subTest(node=node, bitlen=b):
+                    self.assertEqual(node.bit_length(), b)
+                    u = self.uuid.uuid6(node=node)
+                    self.assertEqual(u.node, node & 0xffff_ffff_ffff)
+
+    def test_uuid6_clock_seq(self):
+        # Make sure the supplied clock sequence appears in the UUID.
+        #
+        # For UUIDv6, clock sequence bits are stored from bit 48 to bit 62,
+        # with the convention that the least significant bit is bit 0 and
+        # the most significant bit is bit 127.
+        get_clock_seq = lambda u: (u.int >> 48) & 0x3fff
+
+        u = self.uuid.uuid6()
+        self.assertLessEqual(get_clock_seq(u).bit_length(), 14)
+
+        # tests with explicit values
+        big_clock_seq = 0xffff  # 16-bit clock sequence
+        res_clock_seq = 0x3fff  # truncated to 14 bits
+        u = self.uuid.uuid6(clock_seq=big_clock_seq)
+        self.assertEqual(get_clock_seq(u), res_clock_seq)
+
+        # some randomized tests
+        for _ in range(10):
+            # clock_seq with > 14 bits is truncated
+            for b in [7, 14, 28]:
+                node = random.getrandbits(48)
+                clock_seq = (1 << (b - 1)) | random.getrandbits(b)
+                with self.subTest(node=node, clock_seq=clock_seq, bitlen=b):
+                    self.assertEqual(clock_seq.bit_length(), b)
+                    u = self.uuid.uuid6(node=node, clock_seq=clock_seq)
+                    self.assertEqual(get_clock_seq(u), clock_seq & 0x3fff)
+
+    def test_uuid6_test_vectors(self):
+        equal = self.assertEqual
+        # https://www.rfc-editor.org/rfc/rfc9562#name-test-vectors
+        # (separators are put at the 12th and 28th bits)
+        timestamp = 0x1ec9414c_232a_b00
+        fake_nanoseconds = (timestamp - 0x1b21dd21_3814_000) * 100
+        # https://www.rfc-editor.org/rfc/rfc9562#name-example-of-a-uuidv6-value
+        node = 0x9f6bdeced846
+        clock_seq = (3 << 12) | 0x3c8
+
+        with (
+            mock.patch.object(self.uuid, '_last_timestamp_v6', None),
+            mock.patch('time.time_ns', return_value=fake_nanoseconds)
+        ):
+            u = self.uuid.uuid6(node=node, clock_seq=clock_seq)
+            equal(str(u).upper(), '1EC9414C-232A-6B00-B3C8-9F6BDECED846')
+            #   32          16      4      12       2       14         48
+            # time_hi | time_mid | ver | time_lo | var | clock_seq | node
+            equal(u.time, timestamp)
+            equal(u.int & 0xffff_ffff_ffff, node)
+            equal((u.int >> 48) & 0x3fff, clock_seq)
+            equal((u.int >> 62) & 0x3, 0b10)
+            equal((u.int >> 64) & 0xfff, 0xb00)
+            equal((u.int >> 76) & 0xf, 0x6)
+            equal((u.int >> 80) & 0xffff, 0x232a)
+            equal((u.int >> 96) & 0xffff_ffff, 0x1ec9_414c)
+
     def test_uuid8(self):
         equal = self.assertEqual
         u = self.uuid.uuid8()
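
The birthday-paradox comment in `test_uuid6_uniqueness` can be checked numerically. The sketch below computes the probability that 1024 draws from a 14-bit space are all distinct, assuming the clock sequence is uniformly distributed; `probability_all_distinct` is a hypothetical helper written for this illustration and is not part of the test suite.

```python
import math


def probability_all_distinct(samples, space_size):
    """Probability that `samples` uniform draws from `space_size` values differ.

    This is the product of (1 - k / space_size) for k in range(samples),
    accumulated in log space to avoid underflow.
    """
    log_p = sum(math.log1p(-k / space_size) for k in range(samples))
    return math.exp(log_p)


# N = 1024 UUIDs sharing node and timestamp; only 14 bits are randomized.
p_no_duplicate = probability_all_distinct(1024, 1 << 14)
print(f"{p_no_duplicate:.2e}")
```

The result is a few times 1E-15, consistent with the order of magnitude quoted in the test comment, so the `assertLess(len(uuids), N)` check is expected to hold in practice.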

Lib/uuid.py

Lines changed: 58 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
r"""UUID objects (universally unique identifiers) according to RFC 4122/9562.
22
33
This module provides immutable UUID objects (class UUID) and the functions
4-
uuid1(), uuid3(), uuid4(), uuid5(), and uuid8() for generating version 1, 3,
5-
4, 5, and 8 UUIDs as specified in RFC 4122/9562.
4+
uuid1(), uuid3(), uuid4(), uuid5(), uuid6(), and uuid8() for generating
5+
version 1, 3, 4, 5, 6, and 8 UUIDs as specified in RFC 4122/9562.
66
77
If all you want is a unique ID, you should probably call uuid1() or uuid4().
88
Note that uuid1() may compromise privacy since it creates a UUID containing
@@ -101,6 +101,7 @@ class SafeUUID:
 _RFC_4122_VERSION_3_FLAGS = ((3 << 76) | (0x8000 << 48))
 _RFC_4122_VERSION_4_FLAGS = ((4 << 76) | (0x8000 << 48))
 _RFC_4122_VERSION_5_FLAGS = ((5 << 76) | (0x8000 << 48))
+_RFC_4122_VERSION_6_FLAGS = ((6 << 76) | (0x8000 << 48))
 _RFC_4122_VERSION_8_FLAGS = ((8 << 76) | (0x8000 << 48))


@@ -127,7 +128,9 @@ class UUID:

         fields      a tuple of the six integer fields of the UUID,
                     which are also available as six individual attributes
-                    and two derived attributes:
+                    and two derived attributes. The time_* attributes are
+                    only relevant to version 1, while the others are only
+                    relevant to versions 1 and 6:

             time_low                the first 32 bits of the UUID
             time_mid                the next 16 bits of the UUID
@@ -353,8 +356,19 @@ def clock_seq_low(self):

     @property
     def time(self):
-        return (((self.time_hi_version & 0x0fff) << 48) |
-                (self.time_mid << 32) | self.time_low)
+        if self.version == 6:
+            # time_hi (32) | time_mid (16) | ver (4) | time_lo (12) | ... (64)
+            time_hi = self.int >> 96
+            time_lo = (self.int >> 64) & 0x0fff
+            return time_hi << 28 | (self.time_mid << 12) | time_lo
+        else:
+            # time_lo (32) | time_mid (16) | ver (4) | time_hi (12) | ... (64)
+            #
+            # For compatibility purposes, we do not warn or raise when the
+            # version is not 1 (timestamp is irrelevant to other versions).
+            time_hi = (self.int >> 64) & 0x0fff
+            time_lo = self.int >> 96
+            return time_hi << 48 | (self.time_mid << 32) | time_lo

     @property
     def clock_seq(self):
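
The version 6 branch of `UUID.time` above can be reproduced outside the class. The sketch below decodes the timestamp of the RFC 9562 example value used in the tests and converts it to a UTC date. It reads only `UUID.int`, so it does not depend on the new `time` behaviour; `uuid6_timestamp` is a hypothetical helper name introduced for this illustration.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-ns intervals between 1582-10-15 and 1970-01-01 (from uuid.py).
UUID_EPOCH_OFFSET = 0x01b21dd213814000


def uuid6_timestamp(u):
    """Reassemble the 60-bit timestamp from the UUIDv6 field layout."""
    time_hi = u.int >> 96               # 32 most significant timestamp bits
    time_mid = (u.int >> 80) & 0xffff   # next 16 bits
    time_lo = (u.int >> 64) & 0x0fff    # last 12 bits, below the version nibble
    return (time_hi << 28) | (time_mid << 12) | time_lo


example = uuid.UUID('1EC9414C-232A-6B00-B3C8-9F6BDECED846')
timestamp = uuid6_timestamp(example)

# Convert 100-ns intervals since the UUID epoch to Unix nanoseconds.
unix_ns = (timestamp - UUID_EPOCH_OFFSET) * 100
moment = datetime.fromtimestamp(unix_ns // 10**9, tz=timezone.utc)
print(hex(timestamp), moment.isoformat())
```

The decoded timestamp matches the `timestamp` constant in `test_uuid6_test_vectors`.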
@@ -756,6 +770,44 @@ def uuid5(namespace, name):
     int_uuid_5 |= _RFC_4122_VERSION_5_FLAGS
     return UUID._from_int(int_uuid_5)

+_last_timestamp_v6 = None
+
+def uuid6(node=None, clock_seq=None):
+    """Similar to :func:`uuid1` but where fields are ordered differently
+    for improved DB locality.
+
+    More precisely, given a 60-bit timestamp value as specified for UUIDv1,
+    for UUIDv6 the first 48 most significant bits are stored first, followed
+    by the 4-bit version (same position), followed by the remaining 12 bits
+    of the original 60-bit timestamp.
+    """
+    global _last_timestamp_v6
+    import time
+    nanoseconds = time.time_ns()
+    # 0x01b21dd213814000 is the number of 100-ns intervals between the
+    # UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
+    timestamp = nanoseconds // 100 + 0x01b21dd213814000
+    if _last_timestamp_v6 is not None and timestamp <= _last_timestamp_v6:
+        timestamp = _last_timestamp_v6 + 1
+    _last_timestamp_v6 = timestamp
+    if clock_seq is None:
+        import random
+        clock_seq = random.getrandbits(14)  # instead of stable storage
+    time_hi_and_mid = (timestamp >> 12) & 0xffff_ffff_ffff
+    time_lo = timestamp & 0x0fff  # keep 12 bits and clear version bits
+    clock_s = clock_seq & 0x3fff  # keep 14 bits and clear variant bits
+    if node is None:
+        node = getnode()
+    # --- 32 + 16 ---   -- 4 --   -- 12 --  -- 2 --   -- 14 ---    48
+    # time_hi_and_mid | version | time_lo | variant | clock_seq | node
+    int_uuid_6 = time_hi_and_mid << 80
+    int_uuid_6 |= time_lo << 64
+    int_uuid_6 |= clock_s << 48
+    int_uuid_6 |= node & 0xffff_ffff_ffff
+    # by construction, the variant and version bits are already cleared
+    int_uuid_6 |= _RFC_4122_VERSION_6_FLAGS
+    return UUID._from_int(int_uuid_6)
+
 def uuid8(a=None, b=None, c=None):
     """Generate a UUID from three custom blocks.

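
For readers on an interpreter that does not yet ship `uuid.uuid6()`, the packing performed by the function above can be sketched with the public `uuid.UUID(int=...)` constructor. This is an illustrative reimplementation under stated assumptions, not the module's code: `next_timestamp` and `pack_uuid6` are hypothetical helpers, the last timestamp is passed explicitly instead of being kept in a module global, and the inputs are taken from the RFC 9562 example used in the tests.

```python
import uuid

# Number of 100-ns intervals between 1582-10-15 and 1970-01-01 (from uuid.py).
UUID_EPOCH_OFFSET = 0x01b21dd213814000
VERSION_6_FLAGS = (6 << 76) | (0x8000 << 48)  # version nibble + variant bits


def next_timestamp(unix_ns, last_timestamp=None):
    """Convert Unix nanoseconds to a UUID timestamp that never goes backwards."""
    timestamp = unix_ns // 100 + UUID_EPOCH_OFFSET
    if last_timestamp is not None and timestamp <= last_timestamp:
        timestamp = last_timestamp + 1
    return timestamp


def pack_uuid6(timestamp, clock_seq, node):
    """Assemble time_hi_and_mid | version | time_lo | variant | clock_seq | node."""
    time_hi_and_mid = (timestamp >> 12) & 0xffff_ffff_ffff
    time_lo = timestamp & 0x0fff
    value = time_hi_and_mid << 80
    value |= time_lo << 64
    value |= (clock_seq & 0x3fff) << 48
    value |= node & 0xffff_ffff_ffff
    value |= VERSION_6_FLAGS
    return uuid.UUID(int=value)


unix_ns = 1645557742 * 10**9
first = next_timestamp(unix_ns)
second = next_timestamp(unix_ns, last_timestamp=first)  # same clock reading
example = pack_uuid6(first, clock_seq=0x33c8, node=0x9f6bdeced846)
print(example, second - first)
```

Two calls with the same clock reading yield consecutive timestamps, which mirrors the role of `_last_timestamp_v6` in the function above.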
@@ -788,6 +840,7 @@ def main():
         "uuid3": uuid3,
         "uuid4": uuid4,
         "uuid5": uuid5,
+        "uuid6": uuid6,
         "uuid8": uuid8,
     }
     uuid_namespace_funcs = ("uuid3", "uuid5")
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Add :func:`uuid.uuid6` for generating UUIDv6 objects as specified in
+:rfc:`9562`. Patch by Bénédikt Tran.
