Design Facebook News Feed

Difficulty: hard

Design a news feed feature for Facebook that aggregates activity from the profiles and pages a user follows, including posts, images, videos, and status updates.

Solution

System requirements

Functional:

User visits facebook.com and gets their home feed.

User follows another user.

User follows a page (specific topic).

On the user's newsfeed, the user will see:

  • Status updates, images, and videos posted by followed users.
  • Status updates, images, and videos posted on followed pages.

Non-Functional:

Response time. After the user visits the page, the newsfeed should appear within 0.5 seconds.

Client can be a browser or a mobile app.

Availability.

Scalability. The system must handle ever-growing data volumes, especially videos and images.

We can relax consistency a little. E.g., if one user sees a post at time 0 and another sees the same post a few seconds later, that is acceptable.

Key Points:

The system receives a huge volume of media files (videos and images) while maintaining a short response time for users.

Capacity estimation

DAU: 500M users

Each user accesses their home feed twice a day.

Each user posts a status update, image, or video once a day.

Data:

Video: 10MB

Image: 1MB

Status update: 1KB

Assuming roughly one third of the daily posts are videos (the dominant contributor): 500M / 3 × 10MB ≈ 1.7PB / day
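A quick sanity check of the arithmetic, under the stated assumptions (500M DAU, one post per user per day, roughly one third of those being 10MB videos):

```python
# Back-of-envelope check of the daily video storage estimate.
# Assumptions (from the text): 500M DAU, one post per user per day,
# roughly one third of posts are 10MB videos.
DAU = 500_000_000
VIDEO_BYTES = 10 * 10**6  # 10 MB, decimal units

daily_video_posts = DAU // 3
daily_bytes = daily_video_posts * VIDEO_BYTES
daily_pb = daily_bytes / 10**15
print(f"~{daily_pb:.1f} PB/day of video alone")  # ~1.7 PB/day
```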

Key observations on data:

  • Blob store for media files
  • Cache (CDN and in-datacenter)
  • Write path: uploading lots of big files
  • Conversion steps: pre-generate formats suitable for reads
  • Read path: quick reads for the newsfeed

For relationship data, the write path is the follow APIs.

User profile - 2KB each.

2B total users.

2B × 2KB = 4TB of profile data.

Document-based NoSQL, e.g., MongoDB, would be a good choice. It has configurable consistency models, and compared to an RDBMS it allows us to scale horizontally more easily: we can trade ACID consistency for scalability and performance.

API design

RESTful API.

follow_user(follower_user_ID, followed_user_ID)

follow_page(follower_user_ID, followed_page_ID)

get_newsfeed(user_ID)

upload(user_ID, media_content)

post_update(user_ID, update_content)

Database design

Essential parts of data models:

User:

  • User_ID (primary key)

Post:

  • Post_ID (primary key)
  • Author (foreign key to User table)
  • Type (text / image / video)
  • Text
  • URL (Media URL)
  • timestamp

Page:

  • Page_ID (primary key)

# join table representing the posts made by a user

User_Post:

  • User_ID
  • Post_ID

# join table representing the posts made on a page

Page_Post:

  • Page_ID
  • Post_ID

Newsfeed Service will use the tables above to create feed. For example,

select p.Post_ID from User_Post up join Post p on p.Post_ID = up.Post_ID where up.User_ID in (the N users this user follows) and p.timestamp > now() - interval '24 hours'

and aggregate the posts with some heuristics, e.g.,

score(post) = weight1 * recency + weight2 * popularity_of_followed_user + weight3 * number_of_likes_on_post

We would store some of this calculation in cache, so that Newsfeed Service does not have to do this query and calculation on every request.

Tracking recent posts by a user

{
  User_ID: ...,
  Posts: [
    {
      Type: text/image/video,
      Text: text content of the status update,
      URL: URL pointing to the media file
    },
    ...
  ]
}

Each media item has a parent URL, e.g., "https://sth.facebook.com/media/0123". Media Service uses this as a key to look up child URLs that hold the media in different formats and qualities, e.g., "https://sth.facebook.com/media/0123_480x320_mpeg".

High-level design

flowchart TD
    C[Client] --> CDN[CDN at ISP]
    C --> IXP[CDN at IXP]
    IXP --> API[API Gateway]
    API -- "upload()" --> UP[Upload Service]
    API -- "get_newsfeed()" --> NEWS[Newsfeed Service]
    API -- "follow_user() / follow_page()" --> REL[Relationship Service]
    API -- "download()" --> MEDIA[Media Service]
    UP -- write --> BLOB[Blob Store]
    UP -- message --> MQ[Message Queue]
    JOBS[Async Jobs] -- pull --> MQ
    JOBS -- converted media --> BLOB
    JOBS -- update --> CC[Cache]
    JOBS -- update --> DB[MongoDB]
    NEWS -- read --> CC
    NEWS -- read --> DB
    MEDIA --> CC
    MEDIA --> BLOB

  

Request flows

Write flow:

  • When the client uploads a media file, the client and system should use a resumable, chunked transfer mechanism (e.g., chunked uploads over HTTPS, or a file-transfer protocol such as SFTP). A single monolithic HTTPS POST is fragile for multi-megabyte files.
  • File will be written to blob store.
  • Async jobs are triggered by a message in Message Queue. They will convert the media files to different sizes and formats. This is important because different clients in different network environments require different media formats.
  • Another async job updates the cache and MongoDB, which store the latest posts by user or page.
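The conversion step above can be sketched as a queue consumer; the message shape, the transcode placeholder, and the target variants are assumptions:

```python
import queue

# Target sizes/formats to pre-generate; an assumption for illustration.
TARGET_VARIANTS = ["480x320_mpeg", "1080x720_mpeg"]

def transcode(blob_key: str, variant: str) -> bytes:
    # Placeholder: a real job would shell out to a transcoder such as ffmpeg.
    return f"<{blob_key} as {variant}>".encode()

def conversion_worker(mq: queue.Queue, blob_store: dict) -> None:
    """Drain the queue, writing each converted variant back to the blob store."""
    while not mq.empty():
        msg = mq.get()  # e.g. {"blob_key": "0123"}
        for variant in TARGET_VARIANTS:
            key = f'{msg["blob_key"]}_{variant}'
            blob_store[key] = transcode(msg["blob_key"], variant)
```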

Read flow:

  • Client requests the newsfeed from the Newsfeed Service.
  • The Newsfeed Service returns a JSON representation of the newsfeed:

{
  user_ID: ...,
  timestamp: ...,
  posts: [
    {
      type: text/image/video,
      text: text content,
      URL: links to image/video
    },
    ...
  ]
}

  • The client tries to download media files first from the CDN at the user's Internet Service Provider, then from the CDN at the Internet Exchange Point. If the file is in neither, the request hits the datacenter, where Media Service first tries the cache and, failing that, reads from the Blob Store. This hierarchical caching keeps response times as low as possible.
  • The read path also uses an adaptive-bitrate streaming protocol (e.g., DASH) between the client and the CDN or Media Service. The server streams media to the client in a format and quality adjusted to the client's capability and network conditions. To enable this, each media file must be stored in multiple formats and qualities in the Blob Store and CDN.
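The hierarchical lookup can be sketched as follows; the tiers are modeled as dicts here, whereas real tiers would be network hops of increasing latency:

```python
def fetch_media(url: str, isp_cdn: dict, ixp_cdn: dict,
                dc_cache: dict, blob_store: dict) -> bytes:
    """Try each cache tier in order of proximity; fall back to the blob store."""
    for tier in (isp_cdn, ixp_cdn, dc_cache):
        if url in tier:
            return tier[url]
    data = blob_store[url]   # authoritative copy; the slowest path
    dc_cache[url] = data     # warm the datacenter cache for subsequent reads
    return data
```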

Detailed component design

Let's discuss Newsfeed Service more deeply.

The job of Newsfeed Service is to generate a list of posts catered for each user. Something like in JSON:

[
  {
    post_ID: 123,
    textContent: ...,
    mediaURLs: [],
    comments: []
  },
  ...
]

and so on. The list contains at least 5 posts and grows dynamically as the user scrolls down the newsfeed to see more.

The top K posts are chosen by a scoring function, e.g.,:

score(post) = weight_1 * popularity_of_author + weight_2 * number_of_likes_on_post + weight_3 * number_of_shares_on_post + weight_4 * number_of_comments_on_post
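Selecting the top K by this score is a partial sort; a sketch with illustrative (not tuned) weights:

```python
import heapq

# Illustrative weights matching the terms of the scoring function above.
WEIGHTS = {"author_popularity": 0.4, "likes": 0.3, "shares": 0.2, "comments": 0.1}

def score(post: dict) -> float:
    return sum(w * post[k] for k, w in WEIGHTS.items())

def top_k_posts(posts: list, k: int = 5) -> list:
    """Return the k highest-scoring posts without fully sorting the list."""
    return heapq.nlargest(k, posts, key=score)
```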

Trade offs/Tech choices

An interesting tradeoff is when to compute this list of recommendations (discussed in Detailed component design section).

At one extreme, the list is generated when the user accesses the newsfeed (lazy evaluation). This uses server-side resources efficiently, but it might take too long (e.g., more than a couple of seconds) and frustrate the user.

At the other extreme, we can pre-compute this list periodically (e.g., every 5 minutes) for every user and cache the result. This gives a responsive user experience but requires a huge amount of resources.

We can strike a balance between these two by categorizing users. For example, if a user actively engages with the newsfeed, it makes sense to refresh that user's list often, and to push real-time updates from followed users or famous people via a pub/sub messaging system.

But if a user is dormant - has not checked the newsfeed for months - it is fine not to pre-calculate their feed. On-demand calculation takes some time, but such a user would likely accept that. From a business perspective, these users are less critical to optimize for.
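The user-categorization heuristic might look like this; the thresholds and strategy names are assumptions:

```python
from datetime import datetime, timedelta

def feed_strategy(last_visit: datetime, now: datetime) -> str:
    """Pick a feed-generation strategy based on how recently the user visited."""
    idle = now - last_visit
    if idle < timedelta(days=1):
        return "precompute"        # active: refresh the cached feed often
    if idle < timedelta(days=90):
        return "precompute_slowly" # occasional: refresh less frequently
    return "on_demand"             # dormant: compute only when they return
```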

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?


Score: 9