Difficulty: medium
Solution
System requirements
Functional:
- A distributed counter is needed that can be incremented and decremented.
- The counter value must be accurate. Dirty reads and dirty writes must not be allowed.
Non-Functional:
- Highly available
- Scalable
- Resilient
- Secure
Capacity estimation
- One million counters operating in parallel
- Latency of increment/decrement/GET should be under one millisecond
- Maximum count value is 2^64 - 1 (about 18.4 quintillion). Minimum value is 0.
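The arithmetic behind these figures can be checked quickly. The snippet below is a minimal sketch; the 8-bytes-per-counter figure is an assumption that follows from storing each value as an unsigned 64-bit integer, and it ignores names, versions, and metadata.

```python
# Upper bound of an unsigned 64-bit counter.
MAX_COUNT = 2**64 - 1

# Assumption: each counter value is stored as one unsigned 64-bit integer (8 bytes).
BYTES_PER_VALUE = 8
NUM_COUNTERS = 1_000_000

raw_value_bytes = NUM_COUNTERS * BYTES_PER_VALUE

print(MAX_COUNT)              # 18446744073709551615, about 1.8e19
print(raw_value_bytes / 1e6)  # 8.0 (megabytes of raw counter values)
```

Under that assumption the raw values of all one million counters fit comfortably in memory on a single node, so the sizing pressure comes from throughput and replication rather than from data volume.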
API design
I will expose REST APIs for distributed counter operations.
API 1: Create a counter
POST v1/counter/create
content-type/accept-type: "application-counter-api/json"
Request: {
  "counter_name": "string",
  "counter_description": "",
  "metadata": {}
}
Response: {
  "counter_id": "<UUID of counter>",
  "Value": "Integer64"
}
API 2: Increment a counter
POST v1/counter/{counter-id}/inc
content-type/accept-type: "application-counter-api/json"
Response: {
  "count": "Integer64"
}
API 3: Decrement a counter
POST v1/counter/{counter-id}/dec
content-type/accept-type: "application-counter-api/json"
Response: {
  "count": "Integer64"
}
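The semantics of the three endpoints can be sketched with an in-memory stand-in. This is not the service itself: the class name `CounterApi`, the initial value of 0 on creation, and the choice to reject an operation that would leave the range [0, 2^64 - 1] are all assumptions made for illustration.

```python
import uuid

MAX_COUNT = 2**64 - 1  # upper bound from the capacity estimate
MIN_COUNT = 0          # lower bound from the capacity estimate


class CounterApi:
    """Hypothetical in-memory stand-in for the three REST endpoints."""

    def __init__(self):
        self._values = {}

    def create(self, counter_name, counter_description="", metadata=None):
        # Mirrors POST v1/counter/create; assumes a new counter starts at 0.
        counter_id = str(uuid.uuid4())
        self._values[counter_id] = 0
        return {"counter_id": counter_id, "Value": 0}

    def inc(self, counter_id):
        # Mirrors POST v1/counter/{counter-id}/inc
        return self._apply(counter_id, +1)

    def dec(self, counter_id):
        # Mirrors POST v1/counter/{counter-id}/dec
        return self._apply(counter_id, -1)

    def _apply(self, counter_id, delta):
        new_value = self._values[counter_id] + delta
        # Assumption: out-of-range operations are rejected rather than clamped.
        if not MIN_COUNT <= new_value <= MAX_COUNT:
            raise ValueError("counter would leave the range [0, 2^64 - 1]")
        self._values[counter_id] = new_value
        return {"count": new_value}


api = CounterApi()
created = api.create("page_views")
after_inc = api.inc(created["counter_id"])
after_dec = api.dec(created["counter_id"])
```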
Database design
I need an RDBMS for user management.
For the counters I will use custom storage.
High-level design
Client: HTTP client/app that connects to the counter service.
HTTP load balancer: Load-balances between stateless API services, which use async I/O and enqueue each request in a Kafka topic named {counter_name_ID}.
Kafka: Has one topic per counter_name_id. Each topic has multiple partitions for parallel message consumption.
Counter operation service: Runs a Kafka consumer group that reads messages, batches (aggregates) them, and updates the counter in disk storage. It commits the offset in Kafka only if the disk update is successful.
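The batch-then-commit behaviour of the counter operation service can be sketched without a Kafka client. In this minimal sketch, `storage.apply`, `commit_offset`, and `InMemoryStorage` are hypothetical stand-ins for the storage service and the Kafka offset commit; messages are modelled as `(counter_id, delta)` pairs, which is also an assumption.

```python
from collections import defaultdict


def aggregate(messages):
    """Collapse a batch of (counter_id, delta) messages into one net delta per counter."""
    net = defaultdict(int)
    for counter_id, delta in messages:
        net[counter_id] += delta
    return dict(net)


def process_batch(messages, last_offset, storage, commit_offset):
    """Apply one batch, then commit the offset only if the storage write succeeded."""
    net_deltas = aggregate(messages)
    if not storage.apply(net_deltas):
        return False  # offset stays uncommitted, so the batch will be read again
    commit_offset(last_offset)
    return True


class InMemoryStorage:
    """Toy storage used only to exercise the sketch."""

    def __init__(self):
        self.values = defaultdict(int)

    def apply(self, net_deltas):
        for counter_id, delta in net_deltas.items():
            self.values[counter_id] += delta
        return True


storage = InMemoryStorage()
committed = []
batch = [("page_views", +1), ("page_views", +1), ("likes", +1), ("page_views", -1)]
ok = process_batch(batch, last_offset=3, storage=storage, commit_offset=committed.append)
```

Four messages become two storage writes here. Because the offset is committed only after the write, a crash between the two steps causes the batch to be delivered again, so the storage layer still needs a way to reject a batch it has already applied.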
Storage service:
The storage system/service follows MVCC and maintains a version for each counter. The disk is updated only if the version stored in the file system is exactly one less than the version carried in the payload; any other request is rejected. This makes the update atomic.
The storage service is distributed: it holds the primary copy for every counter_name, and each counter value is replicated to other storage_service instances that act as secondaries.
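A version check of this kind is commonly implemented as an optimistic compare-and-set. The sketch below is one possible reading, not the author's storage engine: it assumes the payload carries the version it intends to write, and that the store accepts it only when that version is exactly one ahead of the stored one. `VersionedCounter` and `increment` are hypothetical names.

```python
import threading


class VersionedCounter:
    """One counter value plus a monotonically increasing version."""

    def __init__(self):
        self.value = 0
        self.version = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def write(self, new_value, new_version):
        """Accept the write only if it is exactly one version ahead of what is stored."""
        with self._lock:
            if new_version != self.version + 1:
                return False  # stale or out-of-order writer: reject, caller must re-read
            self.value = new_value
            self.version = new_version
            return True


def increment(counter, delta=1, max_retries=10):
    """Read, compute, and retry until the versioned write is accepted."""
    for _ in range(max_retries):
        value, version = counter.read()
        if counter.write(value + delta, version + 1):
            return value + delta
    raise RuntimeError("too many concurrent writers")


counter = VersionedCounter()
first = increment(counter)           # stored state becomes value=1, version=1
stale = counter.write(5, 1)          # version 1 is not one ahead of 1: rejected
fresh = counter.write(5, 2)          # version 2 is one ahead of 1: accepted
```

A rejected writer loses nothing: it re-reads the current value and version and tries again, which is what `increment` does in its loop.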
flowchart TD
  B[client] --> C{Load balancer}
  C --> D[API server]
  D --> F[KAFKA]
  F --> G[Counter Operation service]
  G --> H[Distributed Configuration store]
  G --> I[Storage Disk]
  G --> K[Counter Operation service 1..N replicas]
  D --> L[Counter Read replicas]
Request flows
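An increment or decrement flows end to end as follows. The client sends the HTTP request to the load balancer, which forwards it to one of the stateless API servers. The API server enqueues the request in the Kafka topic for that counter. The counter operation service consumes the messages, aggregates them into a batch, and writes the result to storage, committing the Kafka offset only after the disk update succeeds. Reads go from the API server to the counter read replicas.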
Detailed component design
Storage service:
- It is a stateful service that stores the counter values.
- It exposes increment and decrement endpoints.
- It internally implements an MVCC mechanism for atomic updates. An update is applied only if the current version in the DB is exactly one less than the version in the update request; otherwise the request is rejected.
- The storage service is also responsible for replicating the counter value after every write. Since strong consistency is needed, a write completes only after a quorum of replicas has acknowledged it.
- All writes coming to the storage service are batched to reduce the number of disk writes.
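The quorum rule in the list above can be sketched as follows. This is a simplified illustration: it assumes a quorum means a simple majority, models each replica as a hypothetical callable that returns True on a successful write, and contacts replicas one after another, whereas a real service would send the writes in parallel.

```python
def quorum_size(num_replicas):
    """Smallest majority of the replica set."""
    return num_replicas // 2 + 1


def replicate(replicas, counter_id, value, version):
    """Send the write to the replicas and succeed once a majority has acknowledged."""
    needed = quorum_size(len(replicas))
    acks = 0
    for replica in replicas:
        try:
            if replica(counter_id, value, version):
                acks += 1
        except ConnectionError:
            continue  # an unreachable replica simply does not count as an ack
        if acks >= needed:
            return True
    return False


def healthy(counter_id, value, version):
    return True


def down(counter_id, value, version):
    raise ConnectionError("replica unreachable")


one_down = replicate([healthy, down, healthy], "c1", 42, 7)
two_down = replicate([healthy, down, down], "c1", 42, 7)
```

With three replicas the write survives one failure but not two, which is the availability cost of waiting for a quorum before acknowledging.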
Trade offs/Tech choices
I could have used an RDBMS counter with read replicas, but a custom counter gives better performance at scale.
Kafka should be deployed as a cluster across multiple regions and availability zones. Otherwise it will become a bottleneck.
The DCS (distributed configuration store) should be deployed across multiple regions and availability zones to ensure storage cluster metadata is always available.
Failure scenarios/bottlenecks
Future improvements
Score: 8