设计分布式唯一ID生成器

难度: medium

设计一个分布式唯一ID生成器,该生成器能够在多个节点上高吞吐量地生产出无冲突的唯一标识符,并且同步开销最小。

Solution

System requirements

Functional:

Id's must be monotonically increasing over time

64-bit or 128-bit limit

can use letters or numbers

Non-Functional:

Not single point of failure - must be distributed

Capacity estimation

Estimate the scale of the system you are going to design...

(1e6 id's per second) (3e7 sec/yr) -> 3e13 id's per year

3e13 * 16 bytes = ~1e15 bytes/yr -> 1 PB /yr

API design

Define what APIs are expected from the system...

generate_id() -> unique id that increases over time

we can check that it increases over time by parsing the timestamp portion of the id.

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

Database is not needed - this system is fully distributed and stateless.

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...

use an id format consisting of:

<datacenter_id>-<machine_id>-<timestamp>-<sequence_number>

aka: Twitter's Snowflake algorithm

datacenter_id = (2 bytes)

machine_id = (2 bytes)

timestamp = (10 bytes) milliseconds since the epoch (or some later date)

sequence_number = (2 bytes) the number of id's generated during that millisecond

flowchart TD
    B[client] --> C{server}
    C --> D[Database]
  

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

each request is directed to a regional datacenter, where a machine handles the request to generate and return an id. There is no centralized processing required to generate each id, thus no need to store the ids in a database.

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

load balancer - route requests to the nearest DC

request handler - use the machine's monotonically increasing counter to associate a timestamp with each time

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

There may be clock-syncing issues across machines/datacenters. It's possible to have up to a few milliseconds of drift. This drift is negligible and common in distributed architectures, and shouldn't affect the overall requirements.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

If a DC or machine goes down, we can failover to another one. Our id format handles this by using the machine's unique id in our id format.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

Improve clock drift using an atomic clock or synchronization mechanism like NTP.


得分: 8