设计简易短链接服务:TinyURL案例

难度: easy

设计一个类似于TinyURL的短链接服务系统。该服务能生成简短的别名,用来重定向到原始的、更长的URL。著名的类似服务包括bit.ly、ow.ly和short.io。

Solution

System requirements

Functional:

  • Generate tiny url
  • Redirect user from tiny url to destination
  • Tiny URL should be as short as possible
  • Tiny URL should be different for different input URL
  • We use a combination of alphanumeric characters (A-Z, a-z, 0-9) to generate tiny url

Non-Functional:

  • Support 100 million users
  • High availability

Capacity estimation

estimated QPS = 100000000 / 24 / 60 / 60 = 1157

assuming each user generate 10 urls per day

QPS: 10 * 1157 = 11570

peak QPS estimation: 11570 * 2 = 23140

If we use alphanumeric, in total there are 26+26+10 = 62 characters for a single length

The total tiny url per day is 10 * 100 M = 1 T => 10^9

Assuming we keep the tiny url in our DB for 50 years

Then the total URL would be:

10^9 * 50 * 365 = 1.8 ^ 13

So the max length of tiny url could be: 62 ^ X > 1.8 ^ 13

X = 8

We can make the tiny url length as 8

API design

  • Generate URL
  • Post
  • api/url/generate
  • data : { url }
  • return: { tinyUrl }
  • Redirect URL
  • Get
  • tiny/{generated_tiny_url}

Database design

url table

  • id
  • long_url
  • short_url

High-level design

  1. Client
  2. Client shows UI for generating tiny url for user
  3. For user uses a tiny url, no client is needed
  4. Load balancer
  5. Balance generating and redirection request to different machines
  6. API gateway
  7. Forward API request to corresponding service
  8. Rate limiting
  9. Tiny URL service
  10. Generate tiny url for a given url
  11. Redirect user from tiny url to the actual web page
  12. URL DB
  13. Store mapping date of tiny url and actual url
flowchart TD
    A[client] <--> B[Load balancer] <--> C[API Gateway]
    <--> D[Tiny URL service] <--> E[(URL DB)]
    D[Tiny URL service] <--> F[URL Cache]
  

Request flows

URL generation

  1. User inputs a url from client and send generate request from a button
  2. Load balancer direct the request to API gateway and then to a corresponding tiny URL service machine
  3. Tiny URL service checks whether the URL DB already generated a tiny url for the current url, if so, return the tiny url. If not, the service generates a tiny url for the url, store it in the URL DB and return to the user

Tiny URL redirection

  1. User enter a tiny url in browser
  2. The tiny url redirect request will go through load balancer and API gateway and then responded by a tiny url service machine
  3. Tiny url service checks whether the tiny url exists in DB or not, if exists, the service return the actual URL with a permanent redirect http request to the user's browser to the destination web page. If not exits, the service sends back a 404 not found request and redirect users to a graceful 'not found' web page
  4. To make step3 respond faster to support high availability, frequently used tiny urls can be stored in a URL cache with K-V pair: (tiny_url, actual_url), tiny url service can first check whether the tiny url is in cache or not

Detailed component design

  1. URL cache
  2. We can use redis cache, it's widely used distributed cache
  3. For cache eviction, we can use least recent used (LRU) strategy to move out the least recently used tiny url
  4. URL generation
  5. To make the tiny url unique with 8 length character, tiny url service can use UUID as the generation, we need to truncate the length of UUID to fit the tiny url as mentioned in below trade off section

Trade offs/Tech choices

  1. URL generation
  2. One common approach to generate unique strings is hashing, the common hash algorithms are MD5 and SHA-256. The shortest one we can is MD5, it generates 32 length string, it's actually too long to use as tiny url directly
  3. Another approach to generate unique strings is UUID, the generated length would be 32 characters, it's also too long to use as tiny url directly
  4. To make a trade-off, we can truncate UUID to only use the first 8 character as the tiny url. This truncation will introduce a collision if the first 8 character has already been used, we can add logic in tiny url service to check the collision and rerun the generation.
  5. URL DB Sharding
  6. We can shard by the first letter of the tiny url to improve the availability, it gives us 62 machine spaces, if it's not enough, we can use the first and second letter of the tiny url

Failure scenarios/bottlenecks

  1. Failure
  2. If one of the tiny url service machine went down, the requests it's processing would fail, user will not get generated tiny url back and cannot be redirected to the destination for current request. However, if user retries, it should work again for the user as the request is stateless and the new request can be processed by another healthy machine
  3. If user send too many requests for generating or redirection, the user will be rate limited by API gateway
  4. Bottlenecks
  5. The size of URL cache can be a bottleneck for availability

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

  1. As we used truncation of UUID as the URL generation approach, we need to handle collision. One improvement we can further make to avoid handle collision is to add a distributed unique ID generation service, the tiny url service can send request to the ID generation service to get a unique ID with length 8


得分: 10