Saturday, December 21, 2024

Creating Deterministic UUIDs with UUIDv5 and X.500


Introduction

When you need consistent, repeatable identifiers in your projects, UUIDv5 can be your go-to solution. Unlike UUIDv4, which is random, or the time-based versions such as UUIDv1, which change on every call, UUIDv5 hashes a namespace together with a name (or set of attributes) and produces the same UUID for the same input pair regardless of platform or programming language. This is highly useful when you have data that should map to a fixed identifier, such as a country code or an organizational unit, and you always want to get back the exact same UUID for that data.
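
To make the difference concrete, here is a minimal Python sketch using only the standard library's uuid module. The hostname is just the sample value used elsewhere in this post.

```python
import uuid

# The same (namespace, name) pair always maps to the same UUIDv5.
first = uuid.uuid5(uuid.NAMESPACE_DNS, "threeleaf.com")
second = uuid.uuid5(uuid.NAMESPACE_DNS, "threeleaf.com")
print(first == second)  # True
print(first.version)    # 5

# UUIDv4 draws fresh random bits on every call, so two calls will not match.
print(uuid.uuid4() == uuid.uuid4())  # False
```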

RFC 4122, Appendix C, defines four standard namespaces that can be used with UUIDv5:

  1. DNS (Domain Name System)
  2. URL (Uniform Resource Locator)
  3. OID (Object Identifier)
  4. X.500 (Directory Services)
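
Each of these namespaces is itself just a fixed UUID. As a quick illustration, Python's standard uuid module exposes all four as constants, and printing them shows the values from the RFC:

```python
import uuid

# Namespace UUIDs predefined by RFC 4122, as exposed by Python's uuid module.
NAMESPACES = {
    "DNS": uuid.NAMESPACE_DNS,
    "URL": uuid.NAMESPACE_URL,
    "OID": uuid.NAMESPACE_OID,
    "X.500": uuid.NAMESPACE_X500,
}

for label, namespace in NAMESPACES.items():
    print(f"{label:<6}{namespace}")

# DNS   6ba7b810-9dad-11d1-80b4-00c04fd430c8
# URL   6ba7b811-9dad-11d1-80b4-00c04fd430c8
# OID   6ba7b812-9dad-11d1-80b4-00c04fd430c8
# X.500 6ba7b814-9dad-11d1-80b4-00c04fd430c8
```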

In this post, we’ll:

  1. Introduce the four standard UUIDv5 namespaces: DNS, URL, OID, and X.500.
  2. Show simple examples using DNS, URL, and OID, including their full names and brief definitions.
  3. Dive deeper into the X.500 namespace.
  4. Discuss how to create UUIDv5-based database keys from existing uniqueness constraints or compound keys using the X.500 namespace.

We’ll use code samples from Symfony’s Uid component (Symfony\Component\Uid\Uuid), but UUIDv5 is available in libraries on many platforms, and any conforming implementation will produce the same UUID for the same namespace and name, provided the name is encoded to the same bytes (UTF-8 in the examples here).
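
The reason every platform agrees is that UUIDv5 is nothing more than a SHA-1 hash of the namespace bytes followed by the name bytes, truncated to 128 bits, with the version and variant bits overwritten. The sketch below rebuilds that by hand in Python and compares it to the standard library. The manual_uuid5 helper is a hypothetical name used only for illustration, and it assumes the name is encoded as UTF-8.

```python
import hashlib
import uuid


def manual_uuid5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper: build a UUIDv5 by hand from a SHA-1 digest."""
    # Hash the 16 namespace bytes followed by the UTF-8 encoded name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])     # keep the first 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x50  # set the version nibble to 5
    raw[8] = (raw[8] & 0x3F) | 0x80  # set the RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(raw))


by_hand = manual_uuid5(uuid.NAMESPACE_DNS, "threeleaf.com")
by_library = uuid.uuid5(uuid.NAMESPACE_DNS, "threeleaf.com")
print(by_hand == by_library)  # True
```

Because nothing here depends on the clock, the machine, or a random source, any language with SHA-1 can reproduce the same value.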


Quick Examples: DNS, URL, and OID Namespaces

Before diving into X.500, here are three simple examples demonstrating how to generate deterministic UUIDs using the DNS, URL, and OID namespaces.

1. DNS Example

  • DNS stands for Domain Name System, a hierarchical naming system used to resolve domain names into IP addresses.
  • A DNS name typically looks like threeleaf.com or subdomain.example.org.

use Symfony\Component\Uid\Uuid;

/**
 * Generate a deterministic UUID using the DNS namespace.
 *
 * @param string $hostname The hostname to convert.
 * @return string The deterministic UUID for the given hostname.
 */
function generateDnsUuid(string $hostname): string
{
    /* Generate a deterministic UUIDv5: */
    $dnsUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_DNS),
        $hostname
    );

    return (string) $dnsUuid;
}

/* Example usage: */
$hostname = 'threeleaf.com';
echo generateDnsUuid($hostname);
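
One practical caveat: UUIDv5 hashes the name exactly as given, so two spellings of the same hostname produce different UUIDs. If that matters in your application, you may want to normalize the input first. Here is a minimal Python sketch; normalize_hostname and dns_uuid are hypothetical helpers, and the normalization rules (trim, drop a trailing dot, lowercase) are an assumption you should adapt to your own data.

```python
import uuid


def normalize_hostname(hostname: str) -> str:
    """Hypothetical helper: trim whitespace, drop a trailing dot, lowercase."""
    return hostname.strip().rstrip(".").lower()


def dns_uuid(hostname: str) -> uuid.UUID:
    """Generate a UUIDv5 in the DNS namespace from a normalized hostname."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, normalize_hostname(hostname))


# Without normalization, the two spellings hash to different UUIDs.
raw_match = uuid.uuid5(uuid.NAMESPACE_DNS, "ThreeLeaf.com") == uuid.uuid5(
    uuid.NAMESPACE_DNS, "threeleaf.com"
)
print(raw_match)  # False

# With normalization, they agree.
print(dns_uuid("ThreeLeaf.com.") == dns_uuid("threeleaf.com"))  # True
```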

2. URL Example

  • URL stands for Uniform Resource Locator, a reference (an address) to a resource on the internet.
  • A URL typically looks like https://threeleaf.com/blog.

/**
 * Generate a deterministic UUID using the URL namespace.
 *
 * @param string $url The URL to convert.
 * @return string The deterministic UUID for the given URL.
 */
function generateUrlUuid(string $url): string
{
    /* Generate a deterministic UUIDv5: */
    $urlUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_URL),
        $url
    );

    return (string) $urlUuid;
}

/* Example usage: */
$url = 'https://threeleaf.com/blog';
echo generateUrlUuid($url);

3. OID Example

  • OID stands for Object Identifier, a globally unique identifier used in various standards (e.g., SNMP, LDAP) to name an object or concept.
  • An OID typically looks like 1.3.6.1.4.1..., where each number identifies a node in a hierarchy.

/**
 * Generate a deterministic UUID using the OID namespace.
 *
 * @param string $oid The OID string to convert.
 * @return string The deterministic UUID for the given OID.
 */
function generateOidUuid(string $oid): string
{
    /* Generate a deterministic UUIDv5: */
    $oidUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_OID),
        $oid
    );

    return (string) $oidUuid;
}

/* Example usage: */
$oid = '1.3.6.1.4.1';
echo generateOidUuid($oid);

Deep Dive: X.500 Namespaces and Distinguished Names

X.500 is a suite of standards for directory services. It defines a structure for Distinguished Names (DNs), which act like the “full path” for an entry in a directory (similar in concept to a file path in a filesystem). Each DN is a concatenation of attribute-value pairs that uniquely identify an entry. Common attribute keys include:

  • CN (Common Name, Full Name)
  • OU (Organizational Unit)
  • O (Organization Name)
  • L (Locality)
  • ST (State or Province)
  • C (ISO 3166 2-Letter Country Code)
  • UID (User ID)
  • SN (Surname, Last Name)
  • givenName (First Name)
  • mail (Email Address)

For a thorough list of attributes, check out:

Because these attributes are hierarchical and fairly stable (e.g., CN=John A. Marsh), the X.500 namespace is perfectly suited for generating deterministic UUIDs when referencing directory-like entities.
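
Since a DN is just attribute-value pairs joined in a fixed order, it can help to build the string in one place. The Python sketch below uses a hypothetical build_dn helper that applies no escaping, so it assumes the values contain no commas or equals signs. It also shows that attribute order is part of the name: the same pairs in a different order give a different UUID.

```python
import uuid
from typing import Sequence, Tuple


def build_dn(pairs: Sequence[Tuple[str, str]]) -> str:
    """Hypothetical helper: join (attribute, value) pairs into a DN-style string.

    No escaping is applied, so values must not contain commas or equals signs.
    """
    return ",".join(f"{attribute}={value}" for attribute, value in pairs)


dn = build_dn(
    [("CN", "Madison Stutts"), ("OU", "Engineering"), ("O", "Example Corp"), ("C", "US")]
)
print(dn)  # CN=Madison Stutts,OU=Engineering,O=Example Corp,C=US

reordered = build_dn(
    [("C", "US"), ("O", "Example Corp"), ("OU", "Engineering"), ("CN", "Madison Stutts")]
)
# Same attributes, different order: the strings differ, so the UUIDs differ too.
same_uuid = uuid.uuid5(uuid.NAMESPACE_X500, dn) == uuid.uuid5(uuid.NAMESPACE_X500, reordered)
print(same_uuid)  # False
```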

Generating UUIDs with X.500

To generate a UUIDv5 based on a Distinguished Name, you can use the NAMESPACE_X500 constant:


/**
 * Generate a UUIDv5 based on a Distinguished Name (DN) using the X.500 namespace.
 *
 * @param string $distinguishedName The distinguished name to convert.
 * @return string The deterministic UUID for the given distinguished name.
 */
function generateX500Uuid(string $distinguishedName): string
{
    /* Generate a deterministic UUIDv5: */
    $uuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_X500),
        $distinguishedName
    );

    return (string) $uuid;
}

/* Example usage: */
$distinguishedName = 'CN=John A. Marsh';
echo generateX500Uuid($distinguishedName);

$complexDistinguishedName = 'CN=Madison Stutts,OU=Engineering,O=Example Corp,C=US';
echo generateX500Uuid($complexDistinguishedName);

Important note about distinguished name volatility!
Focus on attributes that are stable and unique for your application. If you calculate deterministic UUIDs independently in several places over a long period of time, avoid attributes that might change in that timespan. For example, an application-assigned user ID or government-assigned ID might be considered permanent, whereas a user’s personal email address or phone number might change.
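
A short Python sketch of what that volatility looks like in practice. The user ID and email addresses are made-up sample values, and x500_uuid is a hypothetical wrapper around the standard library call.

```python
import uuid


def x500_uuid(distinguished_name: str) -> uuid.UUID:
    """Hypothetical wrapper: UUIDv5 in the X.500 namespace."""
    return uuid.uuid5(uuid.NAMESPACE_X500, distinguished_name)


# A DN built only from a stable attribute keeps its UUID over time.
stable_before = x500_uuid("UID=u-1001")
stable_after = x500_uuid("UID=u-1001")
print(stable_before == stable_after)  # True

# Including a volatile attribute changes the UUID as soon as the value changes.
volatile_before = x500_uuid("UID=u-1001,mail=old@example.com")
volatile_after = x500_uuid("UID=u-1001,mail=new@example.com")
print(volatile_before == volatile_after)  # False
```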


Creating UUID Database Keys

In many applications, you already have unique identifiers—like an internal employee ID or a composite key based on a combination of columns. By converting these existing keys into UUIDv5 values (using NAMESPACE_X500 in this case), you ensure you still get deterministic UUIDs while leveraging your current uniqueness constraints.

1. Converting a Non-UUID Employee ID into a UUID

Suppose you have a table employee with a unique, non-UUID primary key employee_id, such as 'EMP12345'. Since employeeNumber is a standard LDAP attribute for an employee’s unique ID (defined in the inetOrgPerson schema, RFC 2798), you could do:


/**
 * Convert a non-UUID employee ID into a UUID.
 *
 * @param string $employeeId The employee ID to convert.
 * @return string The deterministic UUID for the given employee ID.
 */
function generateEmployeeUuid(string $employeeId): string
{
    /* Construct an X.500 "distinguished name" style string: */
    $distinguishedName = 'employeeNumber=' . $employeeId;

    /* Generate a deterministic UUIDv5: */
    $employeeUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_X500),
        $distinguishedName
    );

    return (string) $employeeUuid;
}

/* Example usage: */
$employeeId = 'EMP12345';
echo generateEmployeeUuid($employeeId);

Here, every time you use "employeeNumber=EMP12345" with NAMESPACE_X500, you’ll get the same UUID, which can be stored or used in foreign keys.
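
As a rough illustration of why this helps with database keys, the Python sketch below stores the derived UUID as a primary key in an in-memory SQLite database. The table layout and the employee_uuid helper are assumptions made for this example, not part of any particular schema.

```python
import sqlite3
import uuid


def employee_uuid(employee_id: str) -> str:
    """Hypothetical helper: derive the UUIDv5 key for an employee ID."""
    return str(uuid.uuid5(uuid.NAMESPACE_X500, f"employeeNumber={employee_id}"))


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE employee (id TEXT PRIMARY KEY, employee_id TEXT NOT NULL UNIQUE)"
)

# Two independent "imports" of the same employee compute the same key,
# so the second insert is ignored instead of creating a duplicate row.
for _ in range(2):
    connection.execute(
        "INSERT OR IGNORE INTO employee (id, employee_id) VALUES (?, ?)",
        (employee_uuid("EMP12345"), "EMP12345"),
    )

row_count = connection.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(row_count)  # 1
```

Because the key is derived rather than assigned, another service can compute the same foreign key value without first looking it up.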

2. Generating UUIDs from a Compound Key (Phone + Email)

Sometimes, uniqueness is enforced by a combination of columns—for example, phone_number and email. Two relevant LDAP attributes here are telephoneNumber and mail:


/**
 * Generate a UUID from a compound key (phone + email).
 *
 * @param string $phoneNumber The customer phone number.
 * @param string $email The customer email address.
 * @return string The deterministic UUID for the given phone number and email.
 */
function generateCustomerUuid(string $phoneNumber, string $email): string
{
    /* Create an X.500-style DN with multiple attributes: */
    $compoundDn = 'telephoneNumber=' . $phoneNumber . ',mail=' . $email;

    /* Generate the UUID using X.500 namespace: */
    $userUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_X500),
        $compoundDn
    );

    return (string) $userUuid;
}

/* Example usage: */
$phoneNumber = '+1234567890';
$email = 'john.marsh@example.com';
echo generateCustomerUuid($phoneNumber, $email);
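
One thing to watch with compound keys is the separator. If a value can itself contain a comma or an equals sign, two different (phone, email) pairs can collapse into the same string and therefore the same UUID. The Python sketch below shows the problem and one possible guard. The escape_value helper is hypothetical and deliberately simplified; it is not a full RFC 4514 implementation.

```python
import uuid


def escape_value(value: str) -> str:
    """Hypothetical, simplified escaping for backslash, comma and equals sign."""
    for special in ("\\", ",", "="):
        value = value.replace(special, "\\" + special)
    return value


def naive_dn(phone_number: str, email: str) -> str:
    return f"telephoneNumber={phone_number},mail={email}"


def escaped_dn(phone_number: str, email: str) -> str:
    return f"telephoneNumber={escape_value(phone_number)},mail={escape_value(email)}"


# Two different pairs that collapse into the same unescaped string.
pair_a = ("555", "a,mail=x")
pair_b = ("555,mail=a", "x")
print(naive_dn(*pair_a) == naive_dn(*pair_b))      # True: same UUID for both
print(escaped_dn(*pair_a) == escaped_dn(*pair_b))  # False: UUIDs stay distinct

# Ordinary values pass through unchanged, so existing UUIDs are unaffected.
customer = uuid.uuid5(uuid.NAMESPACE_X500, escaped_dn("+1234567890", "john.marsh@example.com"))
print(customer)
```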

Conclusion

Deterministic UUIDs (UUIDv5) can simplify your data by ensuring the same input always yields the same output. They’re particularly handy for X.500-style distinguished names, where hierarchical attributes remain stable over time. By combining a robust namespace like NAMESPACE_X500 with well-structured DNs, you’ll produce consistent identifiers throughout your applications.

Whether you’re generating UUIDs for DNS, URL, OID, or X.500, a UUIDv5 function makes it straightforward. Just remember to:

  1. Pick the right namespace (DNS, URL, OID, or X.500).
  2. Use suitable and stable attributes when generating UUIDs based on X.500 Distinguished Names.
  3. Leverage UUIDv5 to ensure deterministic results.

Finally, converting existing IDs or compound keys into UUIDv5 can unify how you manage references throughout your database and company. As long as every system uses the same namespace and builds the name string in exactly the same way, the same underlying data will always map to the same key.

Happy coding, and enjoy your deterministic UUIDs!



Appendix

Complete PHP Example (uuidv5-test.php)


<?php

require __DIR__ . '/../vendor/autoload.php';

use Symfony\Component\Uid\Uuid;

/**
 * Generate a deterministic UUID using the DNS namespace.
 *
 * @param string $hostname The hostname to convert.
 *
 * @return string The deterministic UUID for the given hostname.
 */
function generateDnsUuid(string $hostname): string
{
    /* Generate a deterministic UUIDv5: */
    $dnsUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_DNS),
        $hostname
    );

    return (string)$dnsUuid;
}

/**
 * Generate a deterministic UUID using the URL namespace.
 *
 * @param string $url The URL to convert.
 *
 * @return string The deterministic UUID for the given URL.
 */
function generateUrlUuid(string $url): string
{
    /* Generate a deterministic UUIDv5: */
    $urlUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_URL),
        $url
    );

    return (string)$urlUuid;
}

/**
 * Generate a deterministic UUID using the OID namespace.
 *
 * @param string $oid The OID string to convert.
 *
 * @return string The deterministic UUID for the given OID.
 */
function generateOidUuid(string $oid): string
{
    /* Generate a deterministic UUIDv5: */
    $oidUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_OID),
        $oid
    );

    return (string)$oidUuid;
}

/**
 * Generate a UUIDv5 based on a Distinguished Name (DN) using the X.500 namespace.
 *
 * @param string $distinguishedName The distinguished name to convert.
 *
 * @return string The deterministic UUID for the given distinguished name.
 */
function generateX500Uuid(string $distinguishedName): string
{
    /* Generate a deterministic UUIDv5: */
    $uuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_X500),
        $distinguishedName
    );

    return (string)$uuid;
}

/**
 * Convert a non-UUID employee ID into a UUID.
 *
 * @param string $employeeId The employee ID to convert.
 *
 * @return string The deterministic UUID for the given employee ID.
 */
function generateEmployeeUuid(string $employeeId): string
{
    /* Construct an X.500 "distinguished name" style string: */
    $distinguishedName = 'employeeNumber=' . $employeeId;

    /* Generate a deterministic UUIDv5: */
    $employeeUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_X500),
        $distinguishedName
    );

    return (string)$employeeUuid;
}

/**
 * Generate a UUID from a compound key (phone + email).
 *
 * @param string $phoneNumber The customer phone number.
 * @param string $email       The customer email address.
 *
 * @return string The deterministic UUID for the given phone number and email.
 */
function generateCustomerUuid(string $phoneNumber, string $email): string
{
    /* Create an X.500-style DN with multiple attributes: */
    $compoundDn = 'telephoneNumber=' . $phoneNumber . ',mail=' . $email;

    /* Generate the UUID using X.500 namespace: */
    $userUuid = Uuid::v5(
        Uuid::fromString(Uuid::NAMESPACE_X500),
        $compoundDn
    );

    return (string)$userUuid;
}

// Example usage section with plain-text output:
echo "Example Usage of UUID Generation Functions\n\n";

// DNS Example
$hostname = 'threeleaf.com';
$dnsResult = generateDnsUuid($hostname);
echo "generateDnsUuid('$hostname')\n";
echo "→ $dnsResult\n";

// URL Example
$url = 'https://threeleaf.com/blog';
$urlResult = generateUrlUuid($url);
echo "generateUrlUuid('$url')\n";
echo "→ $urlResult\n";

// OID Example
$oid = '1.3.6.1.4.1';
$oidResult = generateOidUuid($oid);
echo "generateOidUuid('$oid')\n";
echo "→ $oidResult\n";

// X500 Simple Example
$distinguishedName = 'CN=John A. Marsh';
$x500Result = generateX500Uuid($distinguishedName);
echo "generateX500Uuid('$distinguishedName')\n";
echo "→ $x500Result\n";

// X500 Complex Example
$complexDistinguishedName = 'CN=Madison Stutts,OU=Engineering,O=Example Corp,C=US';
$complexX500Result = generateX500Uuid($complexDistinguishedName);
echo "generateX500Uuid('$complexDistinguishedName')\n";
echo "→ $complexX500Result\n";

// Employee Example
$employeeId = 'EMP12345';
$employeeResult = generateEmployeeUuid($employeeId);
echo "generateEmployeeUuid('$employeeId')\n";
echo "→ $employeeResult\n";

// Customer Compound Key Example
$phoneNumber = '+1234567890';
$email = 'john.marsh@example.com';
$customerResult = generateCustomerUuid($phoneNumber, $email);
echo "generateCustomerUuid('$phoneNumber', '$email')\n";
echo "→ $customerResult\n";

Complete Python Example (uuidv5_test.py)


import uuid

def generate_dns_uuid(hostname: str) -> str:
    """Generate a deterministic UUID using the DNS namespace.

    Args:
        hostname: The hostname to convert.
    Returns:
        The deterministic UUID for the given hostname.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, hostname))

def generate_url_uuid(url: str) -> str:
    """Generate a deterministic UUID using the URL namespace.

    Args:
        url: The URL to convert.
    Returns:
        The deterministic UUID for the given URL.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, url))

def generate_oid_uuid(oid: str) -> str:
    """Generate a deterministic UUID using the OID namespace.

    Args:
        oid: The OID string to convert.
    Returns:
        The deterministic UUID for the given OID.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_OID, oid))

def generate_x500_uuid(distinguished_name: str) -> str:
    """Generate a UUIDv5 based on a Distinguished Name (DN) using the X.500 namespace.

    Args:
        distinguished_name: The distinguished name to convert.
    Returns:
        The deterministic UUID for the given distinguished name.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_X500, distinguished_name))

def generate_employee_uuid(employee_id: str) -> str:
    """Convert a non-UUID employee ID into a UUID.

    Args:
        employee_id: The employee ID to convert.
    Returns:
        The deterministic UUID for the given employee ID.
    """
    distinguished_name = f'employeeNumber={employee_id}'
    return str(uuid.uuid5(uuid.NAMESPACE_X500, distinguished_name))

def generate_customer_uuid(phone_number: str, email: str) -> str:
    """Generate a UUID from a compound key (phone + email).

    Args:
        phone_number: The customer phone number.
        email: The customer email address.
    Returns:
        The deterministic UUID for the given phone number and email.
    """
    compound_dn = f'telephoneNumber={phone_number},mail={email}'
    return str(uuid.uuid5(uuid.NAMESPACE_X500, compound_dn))

def print_example_header():
    print("Example Usage of UUID Generation Functions")
    print()

def print_example_result(function_name: str, args: str, result: str):
    print(f"{function_name}({args})")
    print(f"→ {result}")

def main():
    print_example_header()

    # DNS Example
    hostname = 'threeleaf.com'
    dns_result = generate_dns_uuid(hostname)
    print_example_result('generate_dns_uuid', f"'{hostname}'", dns_result)

    # URL Example
    url = 'https://threeleaf.com/blog'
    url_result = generate_url_uuid(url)
    print_example_result('generate_url_uuid', f"'{url}'", url_result)

    # OID Example
    oid = '1.3.6.1.4.1'
    oid_result = generate_oid_uuid(oid)
    print_example_result('generate_oid_uuid', f"'{oid}'", oid_result)

    # X500 Simple Example
    distinguished_name = 'CN=John A. Marsh'
    x500_result = generate_x500_uuid(distinguished_name)
    print_example_result('generate_x500_uuid', f"'{distinguished_name}'", x500_result)

    # X500 Complex Example
    complex_dn = 'CN=Madison Stutts,OU=Engineering,O=Example Corp,C=US'
    complex_result = generate_x500_uuid(complex_dn)
    print_example_result('generate_x500_uuid', f"'{complex_dn}'", complex_result)

    # Employee Example
    employee_id = 'EMP12345'
    employee_result = generate_employee_uuid(employee_id)
    print_example_result('generate_employee_uuid', f"'{employee_id}'", employee_result)

    # Customer Compound Key Example
    phone_number = '+1234567890'
    email = 'john.marsh@example.com'
    customer_result = generate_customer_uuid(phone_number, email)
    print_example_result('generate_customer_uuid', f"'{phone_number}', '{email}'", customer_result)

if __name__ == "__main__":
    main()

Output (matches between PHP running on Linux and Python running on a MacBook)


Example Usage of UUID Generation Functions

generate_dns_uuid('threeleaf.com')
→ d4a08aa5-9661-57ab-bf61-8f28be9b1f00
generate_url_uuid('https://threeleaf.com/blog')
→ b0277088-caf5-5a15-aa5d-5115112640c3
generate_oid_uuid('1.3.6.1.4.1')
→ 106dd502-8b3e-50db-80ed-1134f5c18eae
generate_x500_uuid('CN=John A. Marsh')
→ a953bc33-f538-5bb3-baa3-aaf081b5df93
generate_x500_uuid('CN=Madison Stutts,OU=Engineering,O=Example Corp,C=US')
→ b837ef27-6f4f-5a48-9419-badb502fb581
generate_employee_uuid('EMP12345')
→ 7aed46df-9f8e-5252-b340-34a0f15d33dd
generate_customer_uuid('+1234567890', 'john.marsh@example.com')
→ db23fe5d-ec32-5522-b563-f10847432d04
