Design a Distributed Counter

Difficulty: medium

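Develop a distributed counter system that can handle high-frequency increment and decrement operations in a distributed environment. The system should ensure accuracy, high availability, scalability, and eventual consistency, while keeping latency as low as possible for counter updates and reads.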

Solution

System requirements

Functional:

A distributed counter that can be incremented and decremented.
The counter value must be accurate. Dirty reads and dirty writes must not be allowed.

Non-Functional:

Highly available
Scalable
Resilient
Secure

Capacity estimation

One million counters active in parallel.
Latency of increment/decrement/GET should be under one millisecond.
Maximum count value is 2^64 - 1 (about 18.4 quintillion). Minimum value is 0.
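
The numbers above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 8 bytes per value is an assumption (one unsigned 64-bit integer per counter), and it ignores metadata, versions, and replicas.

```python
# Back-of-the-envelope numbers for the capacity estimate.
MAX_COUNT = 2**64 - 1        # largest unsigned 64-bit value
NUM_COUNTERS = 1_000_000     # counters active in parallel (from the estimate)
BYTES_PER_VALUE = 8          # assumption: one 64-bit integer per counter


def raw_value_bytes(num_counters: int = NUM_COUNTERS) -> int:
    """Bytes needed for the counter values alone (no metadata, no replicas)."""
    return num_counters * BYTES_PER_VALUE


print(MAX_COUNT)             # 18446744073709551615
print(raw_value_bytes())     # 8000000, i.e. about 8 MB
```

Under these assumptions the raw values fit comfortably in memory on a single node, so the sizing pressure comes from request rate and replication rather than from data volume.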

API design

I will expose REST APIs for the distributed counter operations.

API 1: Create a counter

POST v1/counter/create.

content-type/accept-type: "application-counter-api/json"

Request: {
  "counter_name": "string",
  "counter_description": "",
  "metadata": {}
}

Response: {
  "counter_id": "<UUID of counter>",
  "value": "Integer64"
}

API 2: Increment a counter

POST v1/counter/{counter-id}/inc.

content-type/accept-type: "application-counter-api/json"

Response: {
  "count": "Integer64"
}

API 3: Decrement a counter

POST v1/counter/{counter-id}/dec.

content-type/accept-type: "application-counter-api/json"

Response: {
  "count": "Integer64"
}
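
To make the three endpoints concrete, here is a minimal in-memory sketch of their behaviour. It is not the real service: `CounterApi` and its method names are hypothetical, there is no HTTP layer or Kafka queue, and rejecting updates that would leave the range 0 to 2^64 - 1 is an assumption, since the design does not say what happens at the bounds.

```python
import uuid

MAX_COUNT = 2**64 - 1  # upper bound from the capacity estimate


class CounterApi:
    """In-memory stand-in for the create/inc/dec endpoints (hypothetical)."""

    def __init__(self):
        self._values = {}  # counter_id -> current count

    def create(self, counter_name, counter_description="", metadata=None):
        # Mirrors POST v1/counter/create: returns a fresh id and the start value.
        counter_id = str(uuid.uuid4())
        self._values[counter_id] = 0
        return {"counter_id": counter_id, "value": 0}

    def inc(self, counter_id):
        # Mirrors POST v1/counter/{counter-id}/inc
        return self._apply(counter_id, +1)

    def dec(self, counter_id):
        # Mirrors POST v1/counter/{counter-id}/dec
        return self._apply(counter_id, -1)

    def _apply(self, counter_id, delta):
        new_value = self._values[counter_id] + delta
        # Assumption: out-of-range updates are rejected rather than clamped.
        if not 0 <= new_value <= MAX_COUNT:
            raise ValueError("counter value out of range")
        self._values[counter_id] = new_value
        return {"count": new_value}


api = CounterApi()
created = api.create("page_views")
print(api.inc(created["counter_id"]))  # {'count': 1}
print(api.dec(created["counter_id"]))  # {'count': 0}
```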

Database design

I need an RDBMS for user management.

For the counters, I will use custom storage.

High-level design

Client: An HTTP client or app that connects to the counter service.

HTTP load balancer: Balances load across the stateless API servers. The API service uses async I/O and enqueues each request in a Kafka topic named {counter_name_ID}.

Kafka: One topic per counter_name_id. Each topic has multiple partitions so messages can be consumed in parallel.

Counter operation service: Runs a Kafka consumer group that reads messages, batches (aggregates) them, and updates the counter in storage on disk. It commits the Kafka offset only after the disk update succeeds.
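
The batch-then-commit behaviour of the counter operation service can be sketched as follows. This is an illustration under stated assumptions, not a Kafka client: a plain Python list stands in for one partition, `BatchingConsumer`, `DictStore`, and `apply_deltas` are hypothetical names, and messages are assumed to be `(counter_id, delta)` pairs with `delta` of +1 or -1.

```python
from collections import defaultdict


def aggregate(batch):
    """Collapse a batch of (counter_id, delta) messages into net deltas."""
    net = defaultdict(int)
    for counter_id, delta in batch:
        net[counter_id] += delta
    return dict(net)


class DictStore:
    """Hypothetical storage stub; set fail_next to simulate a failed disk write."""

    def __init__(self):
        self.values = defaultdict(int)
        self.fail_next = False

    def apply_deltas(self, deltas):
        if self.fail_next:
            self.fail_next = False
            return False
        for counter_id, delta in deltas.items():
            self.values[counter_id] += delta
        return True


class BatchingConsumer:
    def __init__(self, log, store, batch_size=100):
        self.log = log                # list standing in for one partition
        self.store = store            # anything with apply_deltas(dict) -> bool
        self.batch_size = batch_size
        self.committed_offset = 0     # next offset to read after a restart

    def poll_once(self):
        """Read one batch, write it, and commit only if the write succeeded."""
        start = self.committed_offset
        batch = self.log[start:start + self.batch_size]
        if not batch:
            return False
        if not self.store.apply_deltas(aggregate(batch)):
            return False              # offset unchanged, so the batch is retried
        self.committed_offset = start + len(batch)
        return True


log = [("a", 1), ("a", 1), ("a", -1), ("b", 1)]
store = DictStore()
consumer = BatchingConsumer(log, store, batch_size=3)
consumer.poll_once()
print(dict(store.values), consumer.committed_offset)  # {'a': 1} 3
```

Because the offset moves only after a successful write, a failed write leads to a retry rather than a lost update. A crash between the write and the commit would replay the batch, so in this sketch it would be up to the storage layer's version check to reject the duplicate.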

Storage service:

The storage service follows MVCC and maintains a version for each counter. It updates the disk only if the version in the payload and the version in the file system are exactly one apart, meaning the update was computed against the latest stored version. This makes the update atomic.

The storage service is distributed. It holds the primary copy for every counter_name, and each counter value is replicated to other storage service instances that act as secondaries.
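
The version check can be illustrated with a small compare-and-set sketch. It assumes one particular reading of the rule: the writer reads the stored version, computes the new value, and submits it with the next version number, and the store accepts the write only if that number is exactly the stored version plus one. `VersionedCounter` and `add` are hypothetical names, and a lock stands in for whatever atomicity the real storage engine provides.

```python
import threading


class VersionedCounter:
    """One counter with a version number, updated by compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.value = 0

    def read(self):
        with self._lock:
            return self.version, self.value

    def write(self, new_version, new_value):
        """Apply the write only if new_version is the stored version + 1."""
        with self._lock:
            if new_version != self.version + 1:
                return False  # stale or out-of-order write is rejected
            self.version, self.value = new_version, new_value
            return True


def add(counter, delta, max_retries=10):
    """Read, compute, and write; retry if another writer got in first."""
    for _ in range(max_retries):
        version, value = counter.read()
        if counter.write(version + 1, value + delta):
            return True
    return False


counter = VersionedCounter()
add(counter, 5)
print(counter.read())        # (1, 5)
print(counter.write(1, 99))  # False: version 1 is already stored
```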

flowchart TD
    B[client] --> C{Load balancer}
    C --> D[API server]
    D --> F[Kafka]
    F --> G[Counter Operation service]
    G --> H[Distributed Configuration store]
    G --> I[Storage Disk]
    G --> K[Counter Operation service 1..N replicas]

    D --> L[Counter Read replicas]

Request flows


Detailed component design

Storage service:

It is a stateful service that stores counter values.
It exposes increment and decrement endpoints.
It internally implements an MVCC mechanism for atomic updates. If the version in the update request and the current version in the DB are exactly one apart, the update is applied; otherwise the request is rejected.
The storage service is also responsible for replicating the counter value after every write. Since we need good consistency, a write completes only after a quorum of replicas has acknowledged it.
All writes coming to the storage service are batched to reduce the number of disk writes.
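
The quorum rule for replicated writes can be sketched as below. This is a simplified, synchronous illustration: `Replica` and `quorum_write` are hypothetical names, the replica list is assumed to include the primary, a quorum is taken to mean a strict majority, and there is no rollback or repair of replicas that applied a write whose quorum later failed.

```python
class Replica:
    """Hypothetical storage replica holding one versioned counter value."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.version = 0
        self.value = 0

    def replicate(self, version, value):
        """Store the value if it is newer; return True as the acknowledgement."""
        if not self.healthy:
            return False              # simulates a down or unreachable replica
        if version > self.version:
            self.version, self.value = version, value
        return True


def quorum_write(replicas, version, value):
    """Send the write to every replica and report whether a majority acked.

    Note: replicas that acked keep the value even if the quorum is missed;
    a real system would need repair or rollback, which is not shown here.
    """
    needed = len(replicas) // 2 + 1   # strict majority
    acks = sum(1 for replica in replicas if replica.replicate(version, value))
    return acks >= needed


replicas = [Replica(), Replica(), Replica(healthy=False)]
print(quorum_write(replicas, 1, 10))  # True: 2 of 3 acknowledged
replicas[1].healthy = False
print(quorum_write(replicas, 2, 20))  # False: only 1 of 3 acknowledged
```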

Trade offs/Tech choices

I could have used an RDBMS-based counter with read replicas, but a custom counter gives better performance at scale.

Kafka should be deployed as a cluster across multiple regions and availability zones. Otherwise it will become a bottleneck.

The DCS (distributed configuration store) should be deployed across multiple regions and availability zones to ensure the storage cluster metadata is always available.

Failure scenarios/bottlenecks

Kafka should be deployed as a cluster across multiple regions and availability zones. Otherwise it will become a bottleneck.

The DCS (distributed configuration store) should be deployed across multiple regions and availability zones to ensure the storage cluster metadata is always available.

Future improvements


Score: 8