Difficulty: medium
Solution
System requirements
Functional:
- User can post tweets
- User can follow other users
- User can view followed tweets on their home timeline
- User can view another user's profile home page
- Tweets are shown in reverse chronological order
- User can like a tweet
- A tweet can contain text and media files such as pictures
Non-Functional:
- Newly posted tweets should show up on timelines in near real time
- Prioritize high availability over consistency
Capacity estimation
Daily active users: 1M
Tweets sent per day: 5M => write QPS: 5M / 24 / 3600 ≈ 58
Tweet views per day: 500M => read QPS: 500M / 24 / 3600 ≈ 5800
Favorites per day: 50M => write QPS: 50M / 24 / 3600 ≈ 580
Total QPS: about 6500
Peak QPS estimation: 2 * 6500 = 13000
This can be handled by roughly 20 SQL machines
Reads far outnumber writes (about 5800 read QPS versus about 640 write QPS)
Storage estimation:
One tweet = 200 bytes of text + 5 MB of media
Assuming 20% of tweets contain media
One day: (5 MB + 200 bytes) * 1M + 200 bytes * 4M ≈ 5 TB
If we store the data for 50 years: 50 * 365 * 5 TB ≈ 91,000 TB (about 91 PB)
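The arithmetic above can be checked with a short script. This is only a sketch: the 2x peak factor, the 5 MB media size, and the 20% media ratio are the assumptions stated in the estimate, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
SECONDS_PER_DAY = 24 * 3600


def qps(events_per_day: int) -> float:
    """Average queries per second for a daily event count."""
    return events_per_day / SECONDS_PER_DAY


tweet_qps = qps(5_000_000)     # tweet writes
view_qps = qps(500_000_000)    # timeline reads
like_qps = qps(50_000_000)     # like writes
total_qps = tweet_qps + view_qps + like_qps
peak_qps = 2 * total_qps       # assumed 2x peak factor

# Storage: every tweet has ~200 bytes of text, 20% also carry ~5 MB of media.
TEXT_BYTES = 200
MEDIA_BYTES = 5 * 10**6
tweets_per_day = 5_000_000
media_tweets = tweets_per_day * 20 // 100
daily_bytes = tweets_per_day * TEXT_BYTES + media_tweets * MEDIA_BYTES
daily_tb = daily_bytes / 10**12
fifty_year_pb = daily_bytes * 365 * 50 / 10**15

if __name__ == "__main__":
    print(f"total QPS ~{total_qps:.0f}, peak QPS ~{peak_qps:.0f}")
    print(f"storage ~{daily_tb:.2f} TB/day, ~{fifty_year_pb:.0f} PB over 50 years")
```

The unrounded total is slightly below the 6500 used above, which rounds each term up before summing.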
API design
- User can post tweets
  - POST /v1/tweet
  - body {auth_token, user_id, content}
- User can follow other users
  - POST /v1/follow
  - body {auth_token, user_id, follow_user_id}
- User can view followed tweets on their home timeline
  - GET /v1/home/:user_id
- User can view another user's profile home page
  - GET /v1/profile/:user_id
- User can like a tweet
  - POST /v1/favorite
  - body {auth_token, user_id, tweet_id}
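As an illustration of the API surface, the request bodies and the gateway routing could be modelled as below. This is a hypothetical sketch: the class names, the `ROUTES` table, and the `route` helper are not part of the design, and it assumes the user id of the GET endpoints is passed as a trailing path segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostTweetRequest:
    auth_token: str
    user_id: int
    content: str


@dataclass(frozen=True)
class FollowRequest:
    auth_token: str
    user_id: int
    follow_user_id: int


@dataclass(frozen=True)
class FavoriteRequest:
    auth_token: str
    user_id: int
    tweet_id: int


# Hypothetical gateway routing table: (method, path prefix) -> backend service.
ROUTES = {
    ("POST", "/v1/tweet"): "tweet service",
    ("POST", "/v1/follow"): "follow service",
    ("POST", "/v1/favorite"): "favorite service",
    ("GET", "/v1/home"): "home/profile service",
    ("GET", "/v1/profile"): "home/profile service",
}


def route(method: str, path: str) -> str:
    """Return the service for a request, ignoring any trailing user id segment."""
    prefix = "/".join(path.split("/")[:3])  # keeps "/v1/<resource>"
    try:
        return ROUTES[(method.upper(), prefix)]
    except KeyError:
        raise ValueError(f"no route for {method} {path}") from None
```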
Database design
Tweet table - Store tweet info
- TweetId
- OwnerId
- Date
- Text
- Media link
- Like count
User table - Store user info
- UserId
- UserName
- RegisterDate
- Follower count
- Country
- Gender
- Birthday
Like table - Store like info
- TweetId
- LikedUserId
Follow table - Store follow info
- UserId
- FollowerId
Timeline table - Store each user's precomputed timeline
- UserId
- TweetId
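The five tables could be declared roughly as follows. This is a minimal sketch using an in-memory SQLite database for illustration only; the snake_case column names, the `tweet_like` table name (chosen to avoid the SQL keyword `LIKE`), the column types, and the composite primary keys are assumptions layered on the field lists above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (
    user_id        INTEGER PRIMARY KEY,
    user_name      TEXT NOT NULL,
    register_date  TEXT NOT NULL,
    follower_count INTEGER NOT NULL DEFAULT 0,
    country        TEXT,
    gender         TEXT,
    birthday       TEXT
);
CREATE TABLE tweet (
    tweet_id   INTEGER PRIMARY KEY,
    owner_id   INTEGER NOT NULL REFERENCES user(user_id),
    tweet_date TEXT NOT NULL,
    tweet_text TEXT NOT NULL,
    media_link TEXT,
    like_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tweet_like (
    tweet_id      INTEGER NOT NULL,
    liked_user_id INTEGER NOT NULL,
    PRIMARY KEY (tweet_id, liked_user_id)
);
CREATE TABLE follow (
    user_id     INTEGER NOT NULL,
    follower_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, follower_id)
);
CREATE TABLE timeline (
    user_id  INTEGER NOT NULL,
    tweet_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, tweet_id)
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the tables in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The composite keys on the like and follow tables make a repeated like or follow a constraint violation rather than a duplicate row, which keeps the derived counts easier to repair.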
High-level design
Client
- End user client
Media file CDN
- Store media files for tweets to ensure they are highly available
Load balancer
- Ensure requests are equally distributed to different servers
API Gateway / Webapp server
- Return end user web page
- Rate limiting
- Route API requests to corresponding services
Tweet service
- Handle post tweet request
Fanout service
- Fan out a newly posted tweet to followers' timelines
Message queue - Kafka
- Pub/sub channel for tweet-posted messages between the tweet service and the fanout service
Follow service
- Handle user follow request
Favorite service
- Handle tweet like request
Home/profile service
- Handle timeline request
Timeline Cache
- Cache timeline to make sure it's highly available
flowchart TD
    B[client mobile/web] <--> C[LoadBalancer] <--> D[API Gateway / WebApp Server]
    J[(Database)]
    D <-- /v1/tweet --> E[Tweet service] <--> J
    D <-- /v1/favorite --> F[Favorite service] <--> J
    D <-- /v1/follow --> G[Follow service] <--> J
    D <-- /v1/home or /v1/profile --> H[Home/Profile service] <--> J
    H --> I[Timeline cache]
    I --> H
    K --> I
    K[Fanout Service] <--> J
    E -- pub --> L[[Tweet post message queue - Kafka]]
    K -- sub --> L
    B -- fetch media --> M[(Media file CDN)]
Request flows
Post tweet
- The user posts a tweet from the client; a POST /v1/tweet request passes through the load balancer and API gateway to the tweet service
- The tweet service writes the new tweet to the tweet table
- The tweet service publishes a tweet-posted message to the message queue
- The fanout service subscribes to the message queue; when it receives a message, it adds the new tweet to the timeline table and timeline cache entries of the author and of each follower
- On the client side, the author sees the new tweet on their profile, and followers see it on their home timeline
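The post and fanout steps above can be sketched in memory as follows. This is an illustrative sketch, not the production design: a `deque` stands in for the Kafka topic, plain dictionaries stand in for the tables and cache, and all names are hypothetical.

```python
from collections import defaultdict, deque

followers = defaultdict(set)   # Follow table: user_id -> set of follower ids
tweets = {}                    # Tweet table: tweet_id -> (owner_id, text)
timelines = defaultdict(list)  # Timeline table/cache: user_id -> tweet ids
queue = deque()                # stand-in for the Kafka topic


def post_tweet(tweet_id: int, owner_id: int, text: str) -> None:
    """Tweet service: persist the tweet, then publish a tweet-posted message."""
    tweets[tweet_id] = (owner_id, text)
    queue.append(tweet_id)


def run_fanout() -> int:
    """Fanout service: drain the queue and push each tweet to every timeline.

    Returns the number of timeline entries written.
    """
    delivered = 0
    while queue:
        tweet_id = queue.popleft()
        owner_id, _ = tweets[tweet_id]
        # The author's own timeline is updated along with the followers'.
        for user_id in {owner_id} | followers[owner_id]:
            timelines[user_id].append(tweet_id)
            delivered += 1
    return delivered
```

Because the tweet service only enqueues a message, the post request returns without waiting for the fanout, whose cost grows with the number of followers.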
View home/profile timeline
- When a user lands on the home page, the home/profile service first reads tweet ids from the timeline cache, which stores the X most recent tweets, then fetches the tweet details from the tweet table and returns them so the client can render the page
- When more than X tweets are needed, the service also queries the timeline table in the DB
- Tweets are returned sorted by tweet id in reverse chronological order
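A minimal sketch of this read path, assuming the cache, timeline table, and tweet table are simple mappings and that tweet ids increase with time. The function name and parameters are hypothetical.

```python
def read_timeline(user_id, limit, cache, timeline_table, tweet_table):
    """Return up to `limit` tweets for a user, newest first.

    cache / timeline_table: mapping user_id -> list of tweet ids
    tweet_table: mapping tweet_id -> tweet payload
    """
    tweet_ids = cache.get(user_id, [])
    if len(tweet_ids) < limit:
        # Cache miss, or the cache holds fewer tweets than requested:
        # fall back to the full timeline stored in the DB.
        tweet_ids = timeline_table.get(user_id, [])
    newest_first = sorted(set(tweet_ids), reverse=True)[:limit]
    # Skip ids whose tweet is no longer in the tweet table.
    return [tweet_table[t] for t in newest_first if t in tweet_table]
```

For example, with `cache = {1: [30, 20]}` and a timeline table holding `[10, 20, 30]`, a request for two tweets is served from the cache alone, while a request for three falls through to the timeline table.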
Follow / Unfollow
- When a user follows another user, a follow API request is sent to the follow service
- The follow service updates the follower count in the user table, then writes a new row to the follow table to record the follow
- Once the request completes, the service returns OK to the client, and the follow button on the client changes to unfollow
- Unfollow is the reverse operation: the row is removed from the follow table and the follower count is decremented
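The follow and unfollow handlers might look like the sketch below. It keeps the follow table and the follower count in memory and treats repeated requests as no-ops; the names and the idempotency check are assumptions, not part of the stated design.

```python
follow_table = set()   # rows of (user_id, follower_id)
follower_count = {}    # user_id -> follower count column of the User table


def follow(user_id: int, follower_id: int) -> bool:
    """Record that follower_id follows user_id. Returns False if nothing changed."""
    if user_id == follower_id or (user_id, follower_id) in follow_table:
        return False
    follower_count[user_id] = follower_count.get(user_id, 0) + 1
    follow_table.add((user_id, follower_id))
    return True


def unfollow(user_id: int, follower_id: int) -> bool:
    """Reverse of follow. Returns False if the follow row did not exist."""
    if (user_id, follower_id) not in follow_table:
        return False
    follow_table.remove((user_id, follower_id))
    follower_count[user_id] -= 1
    return True
```

The like flow has the same shape, with the like table and the like count of the tweet in place of the follow table and the follower count.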
Like
- When a user clicks the like button on a tweet, a favorite request is sent to the favorite service
- The service updates the like count for the tweet in the tweet table and writes a new row to the like table
- Once the request completes, the service returns OK to the client, and the user sees the updated like count
Detailed component design
Home/profile service
- The service first reads tweet ids from the timeline cache, which stores the X most recent tweets, then fetches the tweet details from the tweet table and returns them so the client can render the page
- When more than X tweets are needed, the service also queries the timeline table in the DB
Timeline Cache
- The timeline cache stores the X most recent tweets for each user, sorted by tweet id in reverse chronological order
- For cache eviction, the least recently visited users' timelines are evicted first
- Users with a large follower group, such as celebrities, will most likely keep their timelines in the cache. To let other users benefit from the cache as well, we can provision separate caches for celebrities
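The per-user cap and the least-recently-visited eviction can be sketched with an `OrderedDict`. This is an in-process illustration only; the class name and the `max_users` / `max_tweets` parameters are hypothetical, and a real deployment would more likely use a shared cache service.

```python
from collections import OrderedDict


class TimelineCache:
    """Keeps the newest `max_tweets` tweet ids for at most `max_users` users."""

    def __init__(self, max_users: int, max_tweets: int):
        self.max_users = max_users
        self.max_tweets = max_tweets
        self._data = OrderedDict()  # user_id -> tweet ids, newest first

    def add(self, user_id: int, tweet_id: int) -> None:
        """Insert a tweet id into a user's timeline, trimming to max_tweets."""
        ids = self._data.setdefault(user_id, [])
        ids.append(tweet_id)
        ids.sort(reverse=True)     # larger id = newer tweet
        del ids[self.max_tweets:]  # keep only the newest entries
        while len(self._data) > self.max_users:
            self._data.popitem(last=False)  # evict least recently visited

    def get(self, user_id: int):
        """Return a copy of the cached timeline, or None on a cache miss."""
        if user_id not in self._data:
            return None
        self._data.move_to_end(user_id)  # a read counts as a visit
        return list(self._data[user_id])
```

In this sketch only reads refresh a timeline's position, so a user who receives many tweets but never opens the app is still evicted before an active reader.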
Sharding strategy
- Similarly, celebrities' tweets receive far more view requests, which creates read hotspots on most of the tables. We can place celebrities in a separate DB shard to increase availability
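One way to express this routing is a small shard-selection function. This is a sketch under assumptions: the follower threshold, the shard names, and modulo placement for regular users are all hypothetical choices, not something the design above specifies.

```python
CELEBRITY_FOLLOWER_THRESHOLD = 1_000_000  # assumed cut-off, tune per workload


def pick_shard(owner_id: int, follower_count: int, regular_shards: int) -> str:
    """Route a tweet owner to the dedicated celebrity shard or a regular shard."""
    if regular_shards <= 0:
        raise ValueError("regular_shards must be positive")
    if follower_count >= CELEBRITY_FOLLOWER_THRESHOLD:
        return "celebrity"
    # Regular users are spread across shards by owner id.
    return f"shard-{owner_id % regular_shards}"
```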
Trade offs/Tech choices
TweetId
- We can use a snowflake id as the tweet id. It is a 64-bit id that embeds useful information such as a timestamp and region info, so sorting by tweet id yields a timeline in reverse chronological order, and ids can be generated without a central coordinator
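A snowflake-style generator can be sketched as below. The bit layout (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence) follows the commonly described scheme; the epoch value, the class name, and folding region info into the worker id are assumptions, and the sketch is not thread-safe.

```python
import time

EPOCH_MS = 1_288_834_974_657  # assumed custom epoch; any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12


class SnowflakeGenerator:
    def __init__(self, worker_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id  # could encode region + machine
        self.clock = clock          # injectable for testing
        self.last_ms = -1
        self.sequence = 0

    def next_id(self) -> int:
        now = self.clock()
        if now < self.last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
            if self.sequence == 0:
                # Sequence exhausted for this millisecond: wait for the next one.
                while now <= self.last_ms:
                    now = self.clock()
        else:
            self.sequence = 0
        self.last_ms = now
        timestamp_part = (now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
        return timestamp_part | (self.worker_id << SEQUENCE_BITS) | self.sequence


def timestamp_ms(snowflake_id: int) -> int:
    """Recover the creation time (Unix milliseconds) embedded in an id."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, ids from one worker are strictly increasing, and ids from different workers sort by creation time to millisecond precision.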
Follow
- During the follow flow, if the follower count is updated but the write to the follow table fails, the user may see a follower count that does not match the actual followers. This is acceptable because it is not critical information, and we can introduce a daily worker that scans the follow table to fix the follower count
Like
- Similar to follow, the like count can become inconsistent with the actual likes if part of a like request fails, which is acceptable. We can also introduce a daily worker to fix the count
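The daily repair worker for both counters can be sketched as a single function that recounts rows and overwrites any stored count that disagrees. The function name and the in-memory inputs are hypothetical; a real worker would scan the like or follow table in batches.

```python
from collections import Counter


def reconcile_counts(rows, stored_counts):
    """Recompute counts from source rows and repair stored_counts in place.

    rows: iterable of (key, actor) pairs, e.g. (tweet_id, liked_user_id)
          or (user_id, follower_id)
    stored_counts: dict key -> denormalized count
    Returns a dict of the corrections that were applied.
    """
    actual = Counter(key for key, _ in set(rows))  # set() drops duplicate rows
    fixes = {}
    for key in set(actual) | set(stored_counts):
        true_count = actual.get(key, 0)
        if stored_counts.get(key, 0) != true_count:
            fixes[key] = true_count
    stored_counts.update(fixes)
    return fixes
```

For example, with like rows `[(100, 1), (100, 2), (200, 1)]` and stored counts `{100: 3, 200: 1, 300: 2}`, the worker corrects tweet 100 to 2 and tweet 300 to 0 and leaves tweet 200 untouched.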
Failure scenarios/bottlenecks
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?
Score: 9