Design a Dropbox-like service

Difficulty: hard

Let's design a file hosting service similar to Dropbox.

Solution

System requirements

Functional:

Ability to take a URL and return a shortened form of that URL

Non-Functional:

Availability over consistency: we would rather keep the service available than strictly consistent.

Low latency: the tiny-URL redirect should add as little overhead as possible.

Be available in multiple regions of the world

Capacity estimation

We estimate roughly 100 reads per second and perhaps as much as 1 write per second.

If we have 1 write per second and each encoded entry takes roughly 8 bytes, then with about 10^5 seconds in a day we accumulate about 8 * 10^5 bytes of data a day, which is under 1 MB of new data every day.
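
A quick back-of-the-envelope check of these numbers (the 8 bytes per entry is the assumption from above; a record that also stores the full original URL would be closer to a few hundred bytes):

# Back-of-the-envelope storage estimate using the assumed figures above.
WRITES_PER_SECOND = 1
BYTES_PER_ENTRY = 8              # assumed size of one encoded entry
SECONDS_PER_DAY = 24 * 60 * 60   # 86_400, roughly 10^5

daily_bytes = WRITES_PER_SECOND * BYTES_PER_ENTRY * SECONDS_PER_DAY
print(f"~{daily_bytes / 1e3:.0f} KB of new data per day")   # ~691 KB/day, i.e. under 1 MB
print(f"~{daily_bytes * 365 / 1e6:.0f} MB per year")        # ~252 MB/year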

With 100 reads per second, we can't get away with having everything on one host; I think we would need to scale beyond that.

API design

We would want to have two types of APIs. One API would serve the GET/redirect request, and the URL would look something like this:

co.me/url-link-short.

This API would redirect the caller to the actual resource, returning 302 (redirect) on success and something like 404 on failure.

The other API would let the user encode their URL through the UI portal. We may allow them to specify the short URL themselves (internally it would be translated into a hash that can be used to locate the node holding the link).

We may also want to ensure that the links being encoded aren't malicious and don't point to dark-web/denied resources. If a user tries to encode such a link, we can deny the request in the create API flow.
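
A minimal sketch of the two endpoints; Flask is used purely for illustration, and the route names, in-memory store, denylist, and generate_code helper are placeholders rather than part of the design above:

from flask import Flask, abort, jsonify, redirect, request

app = Flask(__name__)
store = {}                          # stand-in for the key-value database
BLOCKED_DOMAINS = {"bad.example"}   # hypothetical denylist for malicious/denied resources

@app.post("/api/links")
def create_short_link():
    body = request.get_json()
    long_url = body["url"]
    if any(domain in long_url for domain in BLOCKED_DOMAINS):
        abort(403)                   # deny malicious/denied resources in the create flow
    code = body.get("custom_code") or generate_code(long_url)  # user-chosen or generated
    store[code] = long_url
    return jsonify({"short_url": f"co.me/{code}"}), 201

@app.get("/<code>")
def follow_short_link(code):
    long_url = store.get(code)
    if long_url is None:
        abort(404)                   # unknown short link
    return redirect(long_url, code=302)  # 302 on success

def generate_code(long_url):
    # Placeholder; the hash-based generator is discussed under "Detailed component design".
    return format(abs(hash(long_url)) % 62**7, "x")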

Database design

A very good database choice for this would be a simple (persistent) key-value store. Since key-value stores are usually NoSQL and partition easily, we don't have to worry about scaling issues too much.
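
As a rough illustration, an entry in the key-value store could look something like this (the field names are my own and only indicative):

from dataclasses import dataclass
from typing import Optional

@dataclass
class ShortLink:
    code: str                       # key: the shortened path, e.g. "aH102"
    long_url: str                   # value: the original URL to redirect to
    created_at: int                 # unix timestamp, useful for expiry or cleanup later
    owner_id: Optional[str] = None  # optional, if links are tied to user accounts

# Stored as: key = code, value = serialized ShortLink
example = ShortLink(code="aH102", long_url="https://facebook.com/post-id/1231", created_at=1_700_000_000)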

High-level design

The client first hits a load balancer, which forwards the request to an appropriate application server using a stateless round-robin (RR) algorithm.

The application server first checks whether the URL is present in our in-memory cache. A cache-client daemon running on the application server uses consistent hashing to determine which cache server to check. The cache uses an LRU eviction policy.
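
A minimal sketch of how the cache-client daemon could pick a cache node via consistent hashing (simplified, without virtual nodes; the node names are made up):

import bisect
import hashlib

class ConsistentHashRing:
    # Maps a key to one of the cache nodes; adding or removing a node only remaps nearby keys.

    def __init__(self, nodes):
        self._ring = sorted((self._hash(node), node) for node in nodes)
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise on the ring to the first node at or after the key's hash.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("aH102"))   # which cache server to check for this short code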

If we get a hit, we return right there. Otherwise, we check the database (key-value store); the NoSQL database can route the request to the correct partition and return the result. If the key is not present, we return 404.

Overall, I am okay if, immediately after a resource is added, it takes a while for it to propagate to all replicas (eventual consistency is OK). However, we do want durability (making sure we don't lose the write) and low latency (serving the redirect as quickly as possible, ideally <20 ms).

The writing flow is similar but with some details explained below.

flowchart LR
    B[client] --> C[LoadBalancer]
    C --> F[ApplicationServer]
    F --> M[Redis Cache]
    F --> D[Database key-value store]
  

Request flows

Please see the sequence diagram attached for the read flow.

The writing flow would use a write-around cache, meaning a new entry is written only to the database and not to the cache; cache entries are then populated on the read flow after a miss. This trades off write latency for read latency, which is what we want since our system has to serve reads very fast.

sequenceDiagram

Client->>+Server: Do you have the resource for me?

Server->>+Cache: Do you have this resource for me?

Cache->>-Server: Yes, here you go.

Server-->>-Client: 302 Redirect

Server->>+Database: Do you have the resource for me?

Database-->>-Server: Yes, here you go

Server-->>Client: 302 Redirect / 404 Error
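
Putting the read and write flows together, a rough cache-aside/write-around sketch could look like this (DictStore is just an in-memory stand-in for the cache and the key-value store, and the TTL is an assumed value):

class DictStore:
    # In-memory stand-in for both the cache and the key-value database.
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value, ttl=None):   # ttl is ignored by this stand-in
        self._data[key] = value
    put = set

CACHE_TTL_SECONDS = 3600   # assumed TTL; the LRU policy still evicts cold entries

def read(code, cache, db):
    # Read flow: cache first, then the key-value store; warm the cache on a miss.
    long_url = cache.get(code)
    if long_url is not None:
        return 302, long_url                 # cache hit -> redirect immediately
    long_url = db.get(code)
    if long_url is None:
        return 404, None                     # unknown short link
    cache.set(code, long_url, ttl=CACHE_TTL_SECONDS)
    return 302, long_url

def write(code, long_url, db):
    # Write flow (write-around): persist to the database only; the cache fills on the first read.
    db.put(code, long_url)

cache, db = DictStore(), DictStore()
write("aH102", "https://example.com/some/long/path", db)
print(read("aH102", cache, db))   # first read goes to the DB, after which the cache is warm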

Detailed component design

I'd like the database to replicate synchronously within the datacenter and asynchronously across datacenters. Version conflicts aren't a big concern, since our data model doesn't really create conflicting writes (note that one URL can be mapped to two shortened URLs with no problems).

The only concern here is the distributed generator of the unique IDs. For this, we can use something like a hash algorithm that takes /Facebook/com/post-id/1231 and returns aH102. We would need to make sure that two different URLs don't end up with the same hash for the same resource, otherwise we would have a problem. To prevent that, every time we generate a hash, we would check whether it is already present in the database and mapped to another resource.
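
A hedged sketch of that generator: hash the long URL into a short base-62 code and re-check the store on collisions (the code length, alphabet, and retry bound are assumptions):

import hashlib
import string

ALPHABET = string.digits + string.ascii_letters   # base-62 alphabet (assumed)
CODE_LENGTH = 7                                   # assumed length, ~3.5 trillion combinations

def _to_base62(number, length):
    chars = []
    for _ in range(length):
        number, rem = divmod(number, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)

def generate_unique_code(long_url, db):
    # Hash the URL, then probe the store until we find a code not mapped to another resource.
    digest = int(hashlib.sha256(long_url.encode()).hexdigest(), 16)
    for attempt in range(10):                     # bounded retries on collision
        code = _to_base62(digest + attempt, CODE_LENGTH)
        existing = db.get(code)
        if existing is None or existing == long_url:
            return code
    raise RuntimeError("could not find a free short code")

print(generate_unique_code("https://facebook.com/post-id/1231", db={}))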

Trade offs/Tech choices

There are some trade-offs I am making when designing the read/write flows around the cache, for example which component populates the cache and which only reads from it. We are trading off write latency (which is OK to be slower) for read latency (which we need to be fast).

Another trade-off we are making is around consistency. We are again trading consistency for latency: we want to serve reads as fast as possible without waiting for all replication to complete. The reasoning is, again, that we care about latency, and a link most likely doesn't need to be available everywhere within seconds of creation, since users are unlikely to share a link that quickly.

Failure scenarios/bottlenecks

Well, we do have a few single points of failure (SPOFs). We can deploy many load balancers, application servers, database nodes, and cache nodes, but we need to be very careful with the algorithm that generates hashes for the shorter URLs. There are many things that can go wrong:

  1. We are not keeping track of used hashes effectively.
  2. We are making a lot of DB calls on create, increasing write latency.
  3. The hashes run out, so the short URLs aren't short anymore (see the keyspace estimate right after this list).
  4. The system can go down if someone very famous posts a URL that generates a lot of traffic.
  5. We still have a few components that introduce SPOFs.
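
For item 3, a quick estimate of when the hashes would run out, assuming a base-62 alphabet and a 7-character code (both assumptions, since the encoding wasn't pinned down above):

ALPHABET_SIZE = 62    # 10 digits + 26 lowercase + 26 uppercase letters (assumed)
CODE_LENGTH = 7       # assumed short-code length
WRITES_PER_SECOND = 1
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

keyspace = ALPHABET_SIZE ** CODE_LENGTH                 # ~3.5 * 10^12 distinct codes
years_to_exhaust = keyspace / (WRITES_PER_SECOND * SECONDS_PER_YEAR)
print(f"{keyspace:.2e} codes, exhausted after ~{years_to_exhaust:,.0f} years at 1 write/s")

At this write rate exhaustion is not a near-term risk; collisions and tracking used hashes (items 1 and 2) are the more pressing concerns.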

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

Additionally, since our system is only eventually consistent, there are a few scenarios where this may not be acceptable. Let's say a very famous person posts a link and within seconds we get hundreds of thousands of requests. There are two problems now:

the shortened URL hasn't been replicated to all DB nodes (especially the ones across the world);

we get a thundering herd problem: all of these requests miss the cache and go to disk, which is slow and can cause additional load and cost on our DB.

We can solve this problem by using a "push"-based approach in the cache for famous/reserved users: we first write to the cache and only then write (and replicate) to disk. That would solve both the replication and the thundering-herd problem, but only if the cache daemon on the application server can precisely determine which cache node should hold the resource.
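
A small sketch of that push-based write path, assuming the application can flag famous/reserved users' links and reusing illustrative cache/db interfaces like the ones sketched earlier:

def write_link(code, long_url, db, cache, is_hot=False):
    # Default writes go around the cache; flagged "hot" links are pushed into the cache first
    # so the initial burst of reads never has to hit the database.
    if is_hot:
        cache.set(code, long_url)    # pre-warm the cache node located via consistent hashing
    db.put(code, long_url)           # then persist (and replicate) in the key-value store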


Score: 9