Difficulty: medium
Solution
System requirements
Functional:
keys can be overwritten
values can be in any format - just a byte array
"key-level transactionality"
- avoid conflicts on the same key
Non-Functional:
100K reads/sec
10K writes/sec
read-heavy system
avg value: 1kb
highly available, low latency
tunable consistency
- w + r > n for "strong consistency"
- the quorum configuration can be relaxed for "weak consistency" (stale reads acceptable) or "eventual consistency"
value size limit of 1MB
Capacity estimation
1e3 bytes/write * 1e4 writes/sec * 3e7 sec/yr => 3e14 bytes/yr => 300 TB / yr
1e3 bytes/read * 1e5 reads/sec => 1e8 bytes/sec => 100 MB/sec
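The arithmetic above can be checked with a quick script (the constants are the estimates from the non-functional requirements; names are illustrative):

```python
# Back-of-the-envelope capacity estimates from the requirements above.
WRITE_QPS = 1e4          # 10K writes/sec
READ_QPS = 1e5           # 100K reads/sec
AVG_VALUE_BYTES = 1e3    # avg value: 1 KB
SECONDS_PER_YEAR = 3e7   # ~1 year, rounded

# Storage accumulated per year from writes alone (before replication).
storage_per_year = AVG_VALUE_BYTES * WRITE_QPS * SECONDS_PER_YEAR

# Sustained read bandwidth the system must serve.
read_bandwidth = AVG_VALUE_BYTES * READ_QPS

print(f"storage/yr: {storage_per_year / 1e12:.0f} TB")     # 300 TB
print(f"read bandwidth: {read_bandwidth / 1e6:.0f} MB/s")  # 100 MB/s
```

Note that 300 TB/yr is before replication; with 3 copies of each shard the raw footprint is roughly 3x that.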
API design
write(key: []byte, value: []byte)
read(key: []byte) -> []byte
Database design
distribute load evenly across servers via consistent hashing
each shard has 2 replicas, across multiple regions
use gossip protocol and quorum consensus to detect failures and elect new leaders during failover
w + r > n
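Consistent hashing can be sketched as a sorted ring of server hashes, where each key is owned by the first server clockwise from the key's hash. This is a minimal sketch that ignores virtual nodes and replication:

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    # Stable hash onto the ring (MD5 is fine here; it is not used for security).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # Ring positions sorted by hash; each entry is (hash, server).
        self._ring = sorted((_hash(s), s) for s in servers)

    def server_for(self, key: str) -> str:
        # First server clockwise from the key's position (wraps around).
        hashes = [h for h, _ in self._ring]
        i = bisect.bisect_right(hashes, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The "minimal data movement" property falls out of the structure: when a server joins, the only keys that move are those whose clockwise successor is now the new server.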
High-level design
client: http client
coordinator: "partition-aware proxy" to direct the write/read to the appropriate partition
distributed databases: each running a hash table, flushing writes to disk
flowchart TD
  B[client] --> C{server}
  C --> D[Database]
Request flows
when a write is made, we append it to our commit log (WAL), providing REDO information, then apply it to our memory buffer, which is later flushed to disk as an SSTable (or append-only file).
when a read is made, we check the cache first. If the key is in the cache, return it from there. Otherwise, use a bloom filter to find which SSTable it's in, then return it from that table.
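The write path above (WAL append, then memtable, then flush to an SSTable) can be sketched with in-memory stand-ins for the on-disk structures; the names and the tiny flush threshold are illustrative:

```python
FLUSH_THRESHOLD = 2  # flush after this many memtable entries (tiny, for illustration)

class Partition:
    def __init__(self):
        self.wal = []        # commit log: REDO records, appended before anything else
        self.memtable = {}   # in-memory write buffer
        self.sstables = []   # flushed, immutable sorted tables (newest last)

    def write(self, key, value):
        self.wal.append((key, value))   # 1. durability: append to the WAL first
        self.memtable[key] = value      # 2. then update the memory buffer
        if len(self.memtable) >= FLUSH_THRESHOLD:
            self._flush()

    def _flush(self):
        # 3. flush the buffer to disk as a sorted, immutable SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:                 # newest data lives in the memtable
            return self.memtable[key]
        for table in reversed(self.sstables):    # then scan SSTables, newest first
            for k, v in table:
                if k == key:
                    return v
        return None
```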
Detailed component design
each partition has a cache and SSTables, with write-ahead logging to ensure durability.
Uses a write-through cache to leverage caching and ensure consistency between the cache and the persistent storage.
For reads, if the requested key is not in memory, use a bloom filter to narrow down which SSTable contains the key, then look it up in that SSTable and return the data.
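A Bloom filter answers "possibly present" or "definitely absent", so the read path only scans SSTables whose filter matches. A minimal sketch with two salted hash functions follows; real systems size the bit array from the expected key count and target false-positive rate:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1024):
        self.size = size_bits
        self.bits = [False] * size_bits

    def _positions(self, key: bytes):
        # Two independent hash functions, derived by salting SHA-256.
        for salt in (b"h1", b"h2"):
            digest = hashlib.sha256(salt + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: bytes) -> bool:
        # False => definitely not present (no false negatives).
        # True  => possibly present (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(key))
```

The key property for the read path: a negative answer lets us skip the SSTable entirely, while a (rare) false positive only costs a wasted lookup, never a wrong result.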
Trade offs/Tech choices
Tunable consistency with r + w > n
strong consistency: avoids stale reads
weak consistency: stale reads are acceptable
eventual consistency: a form of weak consistency where replicas may temporarily disagree but converge over time
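The consistency levels above are just different (n, w, r) settings; a small helper can classify a configuration ("strong" here meaning every read quorum overlaps the latest write quorum, so stale reads are impossible):

```python
def classify(n: int, w: int, r: int) -> str:
    """Classify a quorum config: n replicas, write quorum w, read quorum r."""
    if w + r > n:
        return "strong"    # read and write sets must overlap: no stale reads
    return "eventual"      # a read may miss the latest write: stale reads possible

# Typical settings for n = 3 replicas:
print(classify(3, 2, 2))  # strong: 2 + 2 > 3
print(classify(3, 1, 1))  # eventual: lowest latency, stale reads acceptable
```

Tuning is then just moving w and r: raise them toward n for stronger guarantees, lower them for latency and availability.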
Failure scenarios/bottlenecks
With replication and failover, we can promote a new leader when the current leader goes down. A majority quorum, via an algorithm like Paxos or Raft, is typically used to elect the new leader.
consistent hashing ensures minimal data is moved when a new server is added/removed to/from the ring
If using weak consistency, leverage anti-entropy processes, or read-repair, to prevent replicas from getting out of sync and minimize stale reads.
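Read repair can be sketched as: read from all replicas, pick the value with the highest version, and write it back to any replica that is behind. Versions here are plain counters for illustration; real systems use timestamps or vector clocks:

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (version, value)."""
    # Gather what each replica holds for this key (None if missing).
    held = [r.get(key) for r in replicas]
    latest = max((v for v in held if v is not None), default=None)
    if latest is None:
        return None
    # Repair: push the latest version to any replica that is stale or missing it.
    for r in replicas:
        if r.get(key) != latest:
            r[key] = latest
    return latest[1]
```

Anti-entropy does the same convergence in the background across whole key ranges, while read repair fixes only the keys that are actually read.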
Future improvements
Handling large values - compression, streaming, chunking, indexing, etc.
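Of the improvements above, chunking can be sketched as splitting a large value into fixed-size pieces stored under derived keys; the `key/meta`, `key/0`, `key/1`, ... naming scheme and the tiny chunk size are illustrative:

```python
CHUNK_SIZE = 4  # bytes per chunk; tiny for illustration (real systems use KB-MB chunks)

def chunk_write(store: dict, key: bytes, value: bytes):
    # Split the value and store each piece under a derived key.
    chunks = [value[i:i + CHUNK_SIZE] for i in range(0, len(value), CHUNK_SIZE)]
    store[key + b"/meta"] = str(len(chunks)).encode()  # remember the chunk count
    for i, chunk in enumerate(chunks):
        store[key + b"/" + str(i).encode()] = chunk

def chunk_read(store: dict, key: bytes) -> bytes:
    # Reassemble the value from its chunks in order.
    count = int(store[key + b"/meta"])
    return b"".join(store[key + b"/" + str(i).encode()] for i in range(count))
```

This keeps each stored value under the 1 MB limit while letting clients read and write larger logical values, at the cost of multi-key operations per logical value.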
Score: 9