难度: medium
Solution
System requirements
Functional:
Id's must be monotonically increasing over time
64-bit or 128-bit limit
can use letters or numbers
Non-Functional:
Not single point of failure - must be distributed
Capacity estimation
Estimate the scale of the system you are going to design...
(1e6 id's per second) (3e7 sec/yr) -> 3e13 id's per year
3e13 * 16 bytes = ~1e15 bytes/yr -> 1 PB /yr
API design
Define what APIs are expected from the system...
generate_id() -> unique id that increases over time
we can check that it increases over time by parsing the timestamp portion of the id.
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
Database is not needed - this system is fully distributed and stateless.
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...
use an id format consisting of:
<datacenter_id>-<machine_id>-<timestamp>-<sequence_number>
aka: Twitter's Snowflake algorithm
datacenter_id = (2 bytes)
machine_id = (2 bytes)
timestamp = (10 bytes) milliseconds since the epoch (or some later date)
sequence_number = (2 bytes) the number of id's generated during that millisecond
flowchart TD B[client] --> C{server} C --> D[Database]
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
each request is directed to a regional datacenter, where a machine handles the request to generate and return an id. There is no centralized processing required to generate each id, thus no need to store the ids in a database.
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
load balancer - route requests to the nearest DC
request handler - use the machine's monotonically increasing counter to associate a timestamp with each time
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
There may be clock-syncing issues across machines/datacenters. It's possible to have up to a few milliseconds of drift. This drift is negligible and common in distributed architectures, and shouldn't affect the overall requirements.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
If a DC or machine goes down, we can failover to another one. Our id format handles this by using the machine's unique id in our id format.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?
Improve clock drift using an atomic clock or synchronization mechanism like NTP.
得分: 8