Design an Instagram-like Platform

Difficulty: medium

Design a platform, similar to Instagram, that allows users to upload their own photos and share them with others.

Solution

System requirements

Functional:

Users can upload photos and short videos.

Users can follow other users.

Users can access their home feed, which shows photos and short videos from other users.

Non-Functional:

Read response time. When a user accesses their home feed, it should be displayed in under 500ms.

Upload response time is less important.

Availability

Consistency - eventual consistency after a user uploads media is acceptable. We don't need ACID properties for this data.

Scalability

Works on all clients (phone, laptop, desktop)

Fault Tolerance

Key problems

How do we keep response times low across all geo locations while making huge media file uploads possible?

Capacity estimation

500M DAU

Average image size: 2MB

Average video size: 100MB

Each user accesses home feed twice a day

Each user uploads an image or a video once a week, with equal likelihood of each.

Data:

500M users * 51MB (the average of 2MB images and 100MB videos) / 7 days ≈ 3.6PB every day

≈2,628PB (about 2.6EB) in 2 years.
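
A quick sanity check of these numbers in code, using only the assumptions listed above:

```python
# Back-of-the-envelope check of the storage estimate above.
DAU = 500_000_000
AVG_IMAGE_MB = 2
AVG_VIDEO_MB = 100

# Image and video uploads are equally likely, once per user per week.
avg_upload_mb = (AVG_IMAGE_MB + AVG_VIDEO_MB) / 2      # 51 MB

daily_mb = DAU * avg_upload_mb / 7                     # uploads spread over the week
daily_pb = daily_mb / 1_000_000_000                    # 1 PB = 10^9 MB (decimal units)
print(f"{daily_pb:.1f} PB/day")                        # ~3.6 PB/day
print(f"{daily_pb * 365 * 2:,.0f} PB in 2 years")      # ~2,660 PB, i.e. roughly 2.6 EB
```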

Huge amount of data, so we need a blob store in the backend.

Work hard on data compression. In some cases we may not need to keep the original-size media at all.

Bandwidth:

Skip for now, but we need to be close to users (ISP, IXP, CDN).

API design

upload(user_ID, media_contents, metadata)

follow(follower_ID, followed_ID)

show_homefeed(user_ID)

Each call returns an HTTP error code (4XX if the user tries to do something invalid, 5XX if the backend cannot handle the request, e.g. due to capacity).
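
A minimal sketch of how these calls could be typed on the server side (all field names and types here are illustrative assumptions, not part of the spec):

```python
# Hypothetical server-side signatures for the three API calls above.
from dataclasses import dataclass

@dataclass
class MediaMetadata:
    media_type: str          # "image" or "video"
    caption: str
    created_at: str          # ISO-8601 timestamp

def upload(user_id: str, media_contents: bytes, metadata: MediaMetadata) -> str:
    """Stores the blob, enqueues a conversion job, returns the new media ID."""
    raise NotImplementedError

def follow(follower_id: str, followed_id: str) -> None:
    """Records the follow edge; a 4XX is returned if either user does not exist."""
    raise NotImplementedError

def show_homefeed(user_id: str, limit: int = 20) -> list[str]:
    """Returns the IDs of the top posts to show in this user's home feed."""
    raise NotImplementedError
```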

Database design

Blob store for the media files.

Users

Follow relationships

Media metadata

Users: 1B users * 1024B ≈ 1TB

Follow relationships: ~1TB

Media metadata: ~10TB

A relational database would work for these tables, since indexes and joins support the relational queries. MongoDB can also be a good option, because media metadata will keep growing over time and is easier to scale out than in an RDB. The RDB's strength is ACID consistency, while MongoDB provides good, configurable consistency guarantees, which is enough given our eventual-consistency requirement.
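
If we go with MongoDB, the three collections could look roughly like this (field names and the blob-key scheme are illustrative assumptions):

```python
# Illustrative document shapes for the three metadata collections.
user_doc = {
    "_id": "user_123",
    "username": "alice",
    "created_at": "2024-01-01T00:00:00Z",
}

follow_doc = {
    "follower_id": "user_123",
    "followed_id": "user_456",
    # A compound index on (follower_id, followed_id) supports follow lookups.
}

media_doc = {
    "_id": "media_789",
    "owner_id": "user_123",
    "media_type": "video",
    "blob_keys": {                       # one blob per encoding the Converter produces
        "original": "blob://raw/media_789",
        "720p": "blob://converted/media_789_720p",
    },
    "created_at": "2024-06-01T12:00:00Z",
}
```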

High-level design

flowchart TD
    CL[client] --> ISP[Cache in ISP]
    ISP --> IXP[Cache in IXP]
    IXP --> CDN[CDN]
    CDN --> API[API Gateway]
    API --upload(), download()-->MD[Media Service]
    API --follow()-->US[Users Service]
    API --show_homefeed()-->HF[Homefeed Service]
    MD --media--> BB[Blob Store]
    MD --message--> MQ[Message Queue]
    CV[Converter] --pull--> MQ
    CV --convert, compress--> BB
    US --> UDB[Users DB]
    HF --> UDB
    CV --> UDB
    US --> CC[Redis Cache]
    HF --> CC
  

Request flows

This is key to good response time. A user's request goes through several layers of cache first, closest to the user first: the ISP cache, then the IXP cache, then the CDN, and only then does it hit the Media Service. We want requests that reach the Media Service to be as rare as possible.
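
A sketch of that read path, assuming each cache layer exposes simple get/put operations (the interfaces here are placeholders):

```python
# Read path: try caches from closest to farthest, only hit the Media Service on a full miss.
def fetch_media(media_key: str, caches: list, media_service):
    """`caches` is ordered closest-first: [isp_cache, ixp_cache, cdn_cache]."""
    for cache in caches:
        data = cache.get(media_key)
        if data is not None:
            return data                        # cache hit: lowest possible latency
    data = media_service.fetch(media_key)      # rare, most expensive path
    for cache in caches:
        cache.put(media_key, data)             # populate the caches on the way back
    return data
```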

The Converter is another key component. This near-real-time job is triggered by a message in the Message Queue, which is created when a user uploads a media file.

The Converter generates several different formats (e.g. high-resolution video for laptops; lower-resolution, mobile-friendly encodings for mobile apps) and writes all of them back to the Blob Store.
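
A rough sketch of such a Converter worker; the queue and blob-store interfaces are placeholders, not a specific SDK:

```python
# Converter worker: consumes upload messages and writes converted renditions back.
TARGET_FORMATS = ["1080p_h264", "720p_h264", "480p_mobile"]

def run_converter(queue, blob_store):
    while True:
        msg = queue.poll(timeout_seconds=5)        # message enqueued by the Media Service
        if msg is None:
            continue
        original = blob_store.get(msg["blob_key"])
        for fmt in TARGET_FORMATS:
            converted = transcode(original, fmt)   # compress + re-encode, e.g. via ffmpeg
            blob_store.put(f'{msg["blob_key"]}/{fmt}', converted)
        queue.ack(msg)                             # ack only after every rendition is stored

def transcode(data: bytes, fmt: str) -> bytes:
    """Placeholder for the actual transcoding/compression step."""
    raise NotImplementedError
```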

The Users DB stores user information, media metadata, and follow relationships.

Detailed component design

We will use an adaptive bitrate video streaming protocol over HTTP (e.g. HLS or MPEG-DASH). A connection is established between the client (e.g. a browser or a mobile app) and the media server (which can be in the ISP, the IXP, the CDN, or the Media Service). I should probably split the Media Service into an Upload Service and a Streaming Service.

This protocol automatically adjusts the video quality and encoding format depending on the client and the network bandwidth it is experiencing.
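
Conceptually, the client-side adaptation works like this (a simplified sketch, not any specific player's implementation):

```python
# Simplified client-side adaptive bitrate selection.
RENDITIONS = [          # (label, bandwidth needed in Mbps), highest quality first
    ("1080p", 8.0),
    ("720p", 4.0),
    ("480p", 2.0),
    ("240p", 0.5),
]

def pick_rendition(measured_mbps: float) -> str:
    """Pick the best rendition the currently measured bandwidth can sustain."""
    for label, required_mbps in RENDITIONS:
        if measured_mbps >= required_mbps * 1.2:   # keep ~20% headroom to avoid rebuffering
            return label
    return RENDITIONS[-1][0]                       # fall back to the lowest rendition
```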


I think home feed generation would present an interesting algorithm challenge.

For User A, we need to return the K most interesting posts. K can be small, like 5, on a mobile device, or larger if the user is using a browser on a laptop/desktop.

Interesting posts would be some combination of:

  • Recent posts by User A themselves.
  • Recent posts by users followed by User A.
  • Recent posts by other famous people, even if they are not followed by User A.

Mathematically, it would be:

score(post) = weight_1 * popularity_of_author + weight_2 * popularity_of_post (e.g. likes, comments) + weight_3 * followed_by_User_A

The theoretical algorithm would be to calculate this score for each candidate post and take the top K posts, using a heap data structure.
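
A sketch of that top-K computation (the weights and post fields are illustrative):

```python
import heapq

# Placeholder weights; in practice these would be tuned or learned.
W_AUTHOR_POPULARITY = 0.3
W_POST_POPULARITY = 0.5
W_FOLLOWED = 0.2

def score(post: dict, followed_by_user_a: set) -> float:
    return (W_AUTHOR_POPULARITY * post["author_popularity"]
            + W_POST_POPULARITY * (post["likes"] + post["comments"])
            + W_FOLLOWED * (1.0 if post["author_id"] in followed_by_user_a else 0.0))

def top_k_posts(candidates: list, followed_by_user_a: set, k: int) -> list:
    # heapq.nlargest keeps a heap of size k: O(n log k) instead of sorting everything.
    return heapq.nlargest(k, candidates, key=lambda p: score(p, followed_by_user_a))
```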

For posts by users that are not followed by User A, we can pre-compute this globally, because it is independent of who User A is.

For posts by users followed by User A, we would maintain a dynamic list of posts by these users. The list would be updated by a service that listens to the Message Queue, which carries a message for each new post.
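
A sketch of that listener, assuming the per-user feed lists are kept in the Redis cache from the diagram (the Redis layout and the Users DB helper are assumptions):

```python
# Fan-out listener: appends each new post to the cached feed list of every follower.
def on_new_post(message: dict, users_db, redis_client, max_feed_len: int = 1000):
    """Called once per 'new post' message pulled from the Message Queue."""
    post_id = message["post_id"]
    author_id = message["author_id"]
    for follower_id in users_db.get_followers(author_id):    # hypothetical Users DB helper
        key = f"feed:{follower_id}"
        redis_client.lpush(key, post_id)                      # newest post at the head
        redis_client.ltrim(key, 0, max_feed_len - 1)          # cap the per-user list length
```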

Trade offs/Tech choices

Fast response time for users is enabled by the adaptive bitrate video streaming protocol, served from the streaming server closest to the user, i.e., the ISP, IXP, or CDN cache.

Uploading huge files - we may need to use an SFTP server instead of uploading files over HTTP. The Converter enables adaptive bitrate video streaming, which helps read response time.

Failure scenarios/bottlenecks

I would be most worried about the huge blob store, so the question is how to partition the data. A hierarchical (tiered) storage architecture would help: videos that are accessed often should be stored on a higher-speed, more expensive storage system.

Videos that are accessed less often should be stored on a slower, less expensive storage system.
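
A rough sketch of how such a tiering policy might decide placement (the thresholds are made-up examples):

```python
# Illustrative hot/warm/cold placement decision based on recent access counts.
def choose_storage_tier(accesses_last_30_days: int) -> str:
    if accesses_last_30_days > 1000:
        return "hot"      # SSD-backed, replicated close to CDN edges
    if accesses_last_30_days > 10:
        return "warm"     # standard object storage
    return "cold"         # cheaper archival storage
```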

Geo location: I would imagine we can use the geo location of clients to optimize the storage strategy. For example, videos in a less commonly used language are more likely to be accessed from a particular region.

Still, it would be important to replicate data across regions (e.g. even Japanese videos should be backed up in some other regions) to enable disaster recovery. 3-2-1 backup would be important.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?


Score: 9