Globally unique identifier

From Infogalactic: the planetary knowledge core

A globally unique identifier (GUID, /ˈɡwɪd/ or /ˈɡɪd/) is a unique reference number used as an identifier in computer software. The term "GUID" typically refers to various implementations of the universally unique identifier (UUID) standard.[1]

GUIDs are usually stored as 128-bit values, and are commonly displayed as 32 hexadecimal digits with groups separated by hyphens, such as:

21EC2020-3AEA-4069-A2DD-08002B30309D

They may or may not be generated from random (or pseudo-random) numbers. GUIDs generated from random numbers normally contain 6 fixed bits (these indicate that the GUID is random) and 122 random bits; the total number of unique such GUIDs is 2^122 (approximately 5.3×10^36). This number is so large that the probability of the same number being generated randomly twice is negligible; however, other GUID versions have different uniqueness properties and probabilities, ranging from guaranteed uniqueness to likely duplicates. Assuming uniform probability for simplicity, the probability of at least one duplicate reaches about 50% once roughly 2.7×10^18 random GUIDs exist, which is on the order of several hundred million GUIDs for every person on earth as of 2014.
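The duplicate estimate can be checked with the standard birthday approximation. The following Python sketch is illustrative only; the function names are hypothetical and it assumes all 122 random bits are uniformly distributed.

```python
import math

def collision_probability(n: float, bits: int = 122) -> float:
    """Approximate chance that at least two of n random values collide."""
    space = 2.0 ** bits
    # Birthday approximation: p ~ 1 - exp(-n^2 / (2 * space)).
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-(n * n) / (2.0 * space))

def count_for_probability(p: float, bits: int = 122) -> float:
    """Approximate number of random values needed to reach probability p."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))

n_half = count_for_probability(0.5)
print(f"possible random GUIDs: {2.0 ** 122:.2e}")
print(f"GUIDs needed for a 50% duplicate chance: {n_half:.2e}")
print(f"duplicate chance among one trillion GUIDs: {collision_probability(1e12):.2e}")
```

The second function is simply the first one solved for n, so the two can be used to cross-check each other.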

Common uses

  • Microsoft Windows uses GUIDs internally to identify the classes and interfaces of COM objects. A script can activate a specific class or object without having to know the name or location of the dynamic-link library that contains it. Because of this, ActiveX, a system for downloading and installing controls in a web browser, uses GUIDs to uniquely identify each control.
  • Intel's GUID Partition Table (GPT), a system for partitioning hard drives, uses GUIDs to identify disks, partitions, and partition types.
  • JT files use a partitioning into 4+2+2+8*1 bytes to represent nodes in the data structure and segment IDs.
  • Second Life uses GUIDs for identification of all assets in its world.[2]
  • Database developers and administrators may use GUIDs as primary keys for database tables to ensure uniqueness between database servers, at the cost of making the working set size for caching much larger for a relational database server, potentially impacting application performance.

Binary encoding

A GUID can be stored as a 16-byte (128-bit) number; Microsoft defines a format which is split into four fields,[1] defined as follows. Note that this format differs from the UUID standard[3] only in the byte order of the first three fields.

Bits  Bytes  Name   Endianness (Microsoft GUID structure)  Endianness (RFC 4122)
32    4      Data1  Native                                 Big
16    2      Data2  Native                                 Big
16    2      Data3  Native                                 Big
64    8      Data4  Big                                    Big

This endianness applies only to the way in which a GUID is stored, and not to the way in which it is represented in text. GUIDs and RFC 4122 UUIDs should be identical when displayed textually.
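As an illustration of the two storage layouts, Python's standard uuid module exposes both byte orders. This sketch assumes a little-endian machine for the "native" Microsoft layout, which is what the module's bytes_le attribute represents.

```python
import uuid

guid = uuid.UUID("3F2504E0-4F89-41D3-9A0C-0305E82C3301")

rfc_layout = guid.bytes     # all fields big-endian, as in RFC 4122
ms_layout = guid.bytes_le   # Data1, Data2, Data3 little-endian; Data4 unchanged

print("RFC 4122 :", rfc_layout.hex(" "))
print("Microsoft:", ms_layout.hex(" "))

# Both layouts decode back to the same GUID and therefore the same text form.
assert uuid.UUID(bytes=rfc_layout) == uuid.UUID(bytes_le=ms_layout)
```

Only the first eight bytes differ between the two outputs; the eight bytes of Data4 are identical.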

One to three of the most significant bits of the first byte in Data4 define the type variant of the GUID:[3]

Pattern Description
0xx Network Computing System backward compatibility
10x Standard
110 Microsoft Component Object Model backward compatibility; this includes the GUIDs for important interfaces like IUnknown and IDispatch
111 Reserved for future use

For the "standard" variant, the most significant four bits of Data3 define the version number, and the algorithm used.[3]
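The variant and version rules above can be sketched as a few bit tests. This is a minimal Python illustration; the function name and the returned labels are hypothetical, and the bytes are taken in big-endian (RFC 4122) order.

```python
import uuid
from typing import Optional, Tuple

def variant_and_version(text: str) -> Tuple[str, Optional[int]]:
    """Classify a GUID by the leading bits of Data4 and, if standard, Data3."""
    raw = uuid.UUID(text).bytes                  # big-endian byte layout
    first_data4 = raw[8]                         # first byte of Data4
    data3 = int.from_bytes(raw[6:8], "big")

    if first_data4 & 0b10000000 == 0:
        variant = "NCS backward compatibility"            # pattern 0xx
    elif first_data4 & 0b11000000 == 0b10000000:
        variant = "standard"                              # pattern 10x
    elif first_data4 & 0b11100000 == 0b11000000:
        variant = "Microsoft COM backward compatibility"  # pattern 110
    else:
        variant = "reserved"                              # pattern 111

    # The version number is only defined for the standard variant.
    version = data3 >> 12 if variant == "standard" else None
    return variant, version

print(variant_and_version("3F2504E0-4F89-41D3-9A0C-0305E82C3301"))
print(variant_and_version("00000000-0000-0000-C000-000000000046"))
```

The second call uses the IUnknown GUID, whose Data4 begins with the byte C0 and therefore falls under the 110 pattern.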

Text encoding

A GUID is most commonly written in text as a sequence of hexadecimal digits separated into five groups, such as:

3F2504E0-4F89-41D3-9A0C-0305E82C3301

This text notation contains the following fields, separated by hyphens:

Hex digits Description
8 Data1
4 Data2
4 Data3
4 Initial two bytes from Data4
12 Remaining six bytes from Data4

For the first three fields, the most significant digit is on the left. The last two fields are treated as eight separate bytes, each with its most significant digit on the left, and they follow each other from left to right. Note that the digit order of the fourth field may be unexpected, since it is treated differently from the other fields in the structure.
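The field rules above can be sketched as a small formatting routine. The function name guid_to_text is hypothetical; Data1 to Data3 are passed as integers and Data4 as eight raw bytes.

```python
def guid_to_text(data1: int, data2: int, data3: int, data4: bytes) -> str:
    """Render the four GUID fields in the usual five-group text form."""
    if len(data4) != 8:
        raise ValueError("Data4 must be exactly 8 bytes")
    # Data1..Data3 are printed as numbers, most significant digit first.
    # Data4 is printed byte by byte, split after its first two bytes.
    return "{:08X}-{:04X}-{:04X}-{}-{}".format(
        data1, data2, data3,
        data4[:2].hex().upper(),
        data4[2:].hex().upper(),
    )

print(guid_to_text(0x3F2504E0, 0x4F89, 0x41D3, bytes.fromhex("9A0C0305E82C3301")))
# prints 3F2504E0-4F89-41D3-9A0C-0305E82C3301
```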

Often braces are added to enclose the above format, such as:

{3F2504E0-4F89-41D3-9A0C-0305E82C3301}

This is sometimes known as "registry format".[4]

When printing fewer characters is desired, GUIDs are sometimes encoded into a base64 or Ascii85 string.

A base64-encoded GUID consists of 22 to 24 characters (depending on padding), for instance:

PyUE4E+JQdOaDAMF6CwzAQ
PyUE4E+JQdOaDAMF6CwzAQ==

and Ascii85 encoding gives 20 characters, for example:

5:$Hj:PhBdRLB9%kU\Lj
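A minimal Python sketch of these compact encodings, using the standard base64 module. It assumes the 16 bytes are taken in big-endian (RFC 4122) order; a different byte order would give different strings of the same length.

```python
import base64
import uuid

raw = uuid.UUID("3F2504E0-4F89-41D3-9A0C-0305E82C3301").bytes  # 16 bytes

b64_padded = base64.b64encode(raw).decode("ascii")  # 24 characters with padding
b64_short = b64_padded.rstrip("=")                  # 22 characters without it
a85 = base64.a85encode(raw).decode("ascii")         # 20 characters

print(len(b64_padded), b64_padded)
print(len(b64_short), b64_short)
print(len(a85), a85)

# Decoding restores the original 16 bytes; stripped padding must be re-added.
assert base64.b64decode(b64_short + "==") == raw
assert base64.a85decode(a85) == raw
```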

In Uniform Resource Names (URN), GUIDs have namespace identifier "uuid",[3] e.g.:

urn:uuid:3F2504E0-4F89-41D3-9A0C-0305E82C3301
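The plain, braced, and URN notations all describe the same 128-bit value. As an illustration, Python's uuid.UUID constructor accepts each of them; the variable names here are arbitrary.

```python
import uuid

forms = [
    "3F2504E0-4F89-41D3-9A0C-0305E82C3301",           # plain
    "{3F2504E0-4F89-41D3-9A0C-0305E82C3301}",         # registry format
    "urn:uuid:3F2504E0-4F89-41D3-9A0C-0305E82C3301",  # URN
]
parsed = [uuid.UUID(text) for text in forms]

# All three notations parse to the same GUID.
assert parsed[0] == parsed[1] == parsed[2]

guid = parsed[0]
print(str(guid).upper())              # plain form
print("{" + str(guid).upper() + "}")  # registry format
print(guid.urn)                       # URN form (Python emits lowercase hex)
```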

Algorithm

In the OSF-specified algorithm for generating new (V1) GUIDs, the user's network card MAC address is used as a base for the last group of GUID digits, which means, for example, that a document can be tracked back to the computer that created it. This privacy hole was used when locating the creator of the Melissa virus.[5] Most of the other digits are based on the time while generating the GUID.

The other parts of a V1 GUID make use of the time since the implementation of the Gregorian calendar in 1582. V1 GUIDs, containing a MAC address and time, can be identified by the digit "1" in the first position of the third group of digits, for example {2F1E4FC0-81FD-11DA-9156-00036A0F876A}. Version 1 GUIDs generated between about 1995 and 2010 have Data3 starting with 11D, while more recent ones have 11E.
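The privacy implication can be illustrated by decoding the example V1 GUID. This Python sketch uses the standard uuid module; the function name decode_v1 is hypothetical, and it assumes the timestamp counts 100 ns intervals from 15 October 1582, as RFC 4122 specifies.

```python
import datetime
import uuid
from typing import Tuple

GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15)

def decode_v1(text: str) -> Tuple[datetime.datetime, str]:
    """Return the creation time and MAC address embedded in a V1 GUID."""
    guid = uuid.UUID(text)
    if guid.version != 1:
        raise ValueError("not a version 1 GUID")
    # guid.time is a 60-bit count of 100 ns intervals; 10 of them make 1 microsecond.
    created = GREGORIAN_EPOCH + datetime.timedelta(microseconds=guid.time // 10)
    # guid.node is the 48-bit node field, normally the network card MAC address.
    mac = ":".join(f"{b:02X}" for b in guid.node.to_bytes(6, "big"))
    return created, mac

created, mac = decode_v1("{2F1E4FC0-81FD-11DA-9156-00036A0F876A}")
print("created:", created.isoformat())
print("MAC    :", mac)
```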

Version 4 GUIDs simply use a pseudo-random number for filling in all but six of the bits. They have a "4" in the 4-bit version position, and the first two bits of Data4 are 1 and 0 (so the first hex digit of Data4 is 8, 9, A, or B), for example {38A52BE4-9352-453E-AF97-5C3B448652F0}. More specifically, the Data3 bit pattern would be 0001xxxxxxxxxxxx in V1, and 0100xxxxxxxxxxxx in V4. Cryptanalysis of the WinAPI GUID generator shows that, since the sequence of V4 GUIDs is pseudo-random, given full knowledge of the internal state, it is possible to predict previous and subsequent values.[6]
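A minimal sketch of V4 generation in Python, setting the six fixed bits by hand. The function name is hypothetical, and drawing the random bytes from the secrets module is an assumption of this example rather than what any particular GUID generator does; in practice uuid.uuid4() performs the same job.

```python
import secrets
import uuid

def new_v4_guid() -> uuid.UUID:
    """Build a version 4 GUID from 16 random bytes plus the six fixed bits."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # top four bits of Data3 become 0100 (version 4)
    raw[8] = (raw[8] & 0x3F) | 0x80  # top two bits of Data4 become 10 (standard variant)
    return uuid.UUID(bytes=bytes(raw))

guid = new_v4_guid()
print("{" + str(guid).upper() + "}")
```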

Non-unique GUIDs

Certain GUIDs turn up again and again, both intentionally and otherwise. In a GUID Partition Table (GPT), no two disks should have the same Disk GUID, and no two partitions should have the same Unique partition GUID; however, it is appropriate for multiple partitions to share the same Partition type GUID. For example, every Linux swap partition on a GPT-formatted disk, and only such a partition, can be counted on to have the Partition type GUID 0657FD6D-A4AB-43C4-84E5-0933C84B4F4F. In that case, the GUID uniquely identifies a type of partition and is referenced by all partitions of that type, which is why it turns up again and again.

Some flawed V4 GUID-generating implementations rely on pseudo-random number generators whose seed sources turn out to be predictable. Standard V1 GUIDs are not chosen at random; they are chosen by standardized algorithms (see RFC 4122). These algorithms result in GUIDs that are more reliably unique than ones chosen using even a hypothetically perfect random number generator, for which any two GUIDs have a probability of 1 in 2^122 (about 5.3×10^36) of being identical.

Non-unique FireWire GUIDs

Operating systems (including Windows, Mac OS X and Linux) are designed on the expectation that a given disk will never have the same Disk GUID as another. However, the so-called FireWire GUIDs (which are called GUIDs but are a non-standard 64 bits long) are identical on every unit of several common models of hard drive and drive case that use the same chipset from Oxford Semiconductor. Affected brands include NewerTech, Vantec, and Cavalry, and the shared GUID is 0x30E002E0454647 (sometimes displayed in decimal form as Connection ID 13757101839304263). This causes problems when such drives are daisy-chained or otherwise connected to the same system. The manufacturers were supposed to serialize them, but many did not.[7]

Sequential algorithms

GUIDs are commonly used as the primary key of database tables, and such a table often has a clustered index on that key. This presents a performance issue when inserting records because a fully random GUID means the record may need to be inserted anywhere within the table rather than merely appended near the end of it.

As a way of mitigating this issue while still providing enough randomness to effectively prevent duplicate number collisions, several algorithms have been used to generate sequential GUIDs.

The oldest technique, present as a feature in early versions of Microsoft's GUIDGEN SDK tool, works by simply outputting the set of MAC-based version 1 GUIDs corresponding to a time interval. It takes advantage of the fact that the time field in v1 GUIDs has a resolution of 100 ns, which allows a million sequential GUIDs to be generated by locking out other GUID generators on the computer for a tenth of a second (or 10,000 GUIDs in a millisecond). These sequential GUIDs are unique, but the increment happens in the Data1 field, not at the end of the GUID.
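The effect of stepping the 100 ns timestamp can be sketched in Python by rebuilding version 1 GUIDs from their fields. This is an illustration only, not the GUIDGEN implementation; the helper name is hypothetical, and the starting values are borrowed from the V1 example GUID used earlier in this article.

```python
import uuid

def v1_from_timestamp(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Assemble a version 1 GUID from a 60-bit timestamp, clock sequence and node."""
    time_low = timestamp & 0xFFFFFFFF                        # Data1
    time_mid = (timestamp >> 32) & 0xFFFF                    # Data2
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # Data3, version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # standard variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

base = uuid.UUID("2F1E4FC0-81FD-11DA-9156-00036A0F876A")
# Three consecutive 100 ns ticks: only the low digits of Data1 change.
batch = [v1_from_timestamp(base.time + i, base.clock_seq, base.node)
         for i in range(3)]
for guid in batch:
    print(str(guid).upper())
```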

The second technique, described by Jimmy Nilsson in August 2002[8] and referred to as a "COMB" ("combined guid/timestamp"), replaces the last 6 bytes of Data4 in a random (version 4) GUID with the least-significant 6 bytes of the current system date/time. While this can result in GUIDs that are generated out of order within the same fraction of a second, his tests showed this had little real-world impact on insertion. One side effect of this approach is that the date and time of insertion can be easily extracted from the value later, if desired. The COMB technique tries to compensate for the reduced clustering in database indexes caused by switching to an OS version that uses random GUIDs rather than MAC-based GUIDs, and is useful only when it is not possible to revert to version 1 GUIDs.
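A rough Python sketch of the COMB idea follows. It is not Nilsson's original code: the function name is hypothetical, and using milliseconds since the Unix epoch as the "current system date/time" is an assumption made for this example.

```python
import time
import uuid
from typing import Optional

def new_comb_guid(now_ms: Optional[int] = None) -> uuid.UUID:
    """Random V4 GUID whose last 6 bytes are replaced by a timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000  # assumed clock: Unix time in ms
    random_part = uuid.uuid4().bytes[:10]     # keeps the version and variant bits
    # Least-significant 6 bytes (48 bits) of the timestamp, big-endian.
    time_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)

guid = new_comb_guid()
print(str(guid).upper())

# The timestamp can be read back from the last group of digits.
embedded_ms = int.from_bytes(guid.bytes[10:], "big")
print("embedded milliseconds:", embedded_ms)
```

Whether such values actually improve insert locality depends on how the database orders GUID values when comparing them.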

Starting with Microsoft SQL Server version 2005, Microsoft added a function to the Transact-SQL language called NEWSEQUENTIALID(),[9] which essentially provides access to the traditional version 1 GUIDs (or something so close it fits the same description), with all their advantages and disadvantages.

In 2006, a programmer found that the SYS_GUID function provided by Oracle was returning sequential GUIDs on some platforms, but this appears to be a bug rather than a feature.[10]

Uses

In the Microsoft Component Object Model (COM), GUIDs are used to uniquely distinguish different software component interfaces. This means that two (possibly incompatible) versions of a component can have exactly the same name but still be distinguishable by their GUIDs. For example, all COM components for Microsoft Windows must implement the IUnknown interface, which allows client code to find all other interfaces and features of that component. The IUnknown interface is identified by the GUID {00000000-0000-0000-C000-000000000046} rather than by a named entry point called "IUnknown". Every component that provides an IUnknown entry point therefore exposes the same GUID, and every program that looks for IUnknown in a component uses that GUID to find the entry point, knowing that any component exposing it must implement IUnknown in the same way.

GUIDs are also inserted into documents from Microsoft Office programs. Even audio or video streams in the Advanced Systems Format (ASF) are identified by their GUIDs.

Subtypes

There are several flavors of GUIDs used in COM:

  • IID – interface identifier; (The ones that are registered on a system are stored in the Windows Registry at [HKEY_CLASSES_ROOT\Interface][11])
  • CLSID – class identifier; (Stored at [HKEY_CLASSES_ROOT\CLSID])
  • LIBID – type library identifier; (Stored at [HKEY_CLASSES_ROOT\TypeLib][12])
  • CATID – category identifier; (its presence on a class identifies it as belonging to certain class categories, listed at [HKEY_CLASSES_ROOT\Component Categories][13])

DCOM introduces many additional GUID subtypes:

  • AppID – application identifier;
  • MID – machine identifier;
  • IPID – interface pointer identifier; (applicable to an interface engaged in RPC)
  • CID – causality identifier; (applicable to an RPC session)
  • OID – object identifier; (applicable to an object instance)
  • OXID – object exporter identifier; (applicable to an instance of the system object that performs RPC)
  • SETID – ping set identifier; (applicable to a group of objects)

These GUID subspaces may overlap, as the context of GUID usage defines its subtype. For example, there might be a class using the same GUID for its CLSID as another class is using for its IID — all without a problem. On the other hand, two classes using the same CLSID could not co-exist.

XML syndication formats

There is also a guid element in some versions of the RSS specification, and a mandatory id element in Atom, each of which should contain a unique identifier for each individual article or weblog post. In RSS the contents of the guid element can be any text, and in practice it is typically a copy of the article URL. Atom IDs need to be valid URIs (usually URLs pointing to the entry, or URNs containing any other unique identifier).

References

  1. [reference text missing from source]
  2. [reference text missing from source]
  3. A Universally Unique IDentifier (UUID) URN Namespace, RFC 4122.
  4. Registry Keys and Entries for a Type 1 Online Store. Microsoft.com.
  5. [reference text missing from source]
  6. [reference text missing from source]
  7. [reference text missing from source]
  8. [reference text missing from source]
  9. [reference text missing from source]
  10. Watch out for sequential Oracle GUIDs!, Steven Feuerstein, Oracle Professional, 19 February 2006. Retrieved 2011-12-08.
  11. [reference text missing from source]
  12. [reference text missing from source]
  13. [reference text missing from source]
