Difficulty: medium
Solution
System requirements
Functional:
can send both text and media
both 1:1 and group chat; users can add members to a group chat, change the group name, update the retention policy, and delete messages
both web and mobile clients
sort the feed with conversations that have unread messages first, then by last-read timestamp
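The feed ordering requirement can be expressed as a two-part sort key. This is a minimal sketch, assuming each feed item carries an unread count and a last-read timestamp (the `FeedItem` field names are hypothetical) and that more recent timestamps come first, which the requirement does not state.

```python
from dataclasses import dataclass


@dataclass
class FeedItem:
    conversation_id: str
    unread_count: int   # assumed field: number of unread messages
    last_read_at: int   # assumed field: epoch millis of the last read


def sort_feed(items):
    """Unread conversations first, then most recently read first (assumed order)."""
    # False sorts before True, so items with unread messages come first.
    return sorted(items, key=lambda i: (i.unread_count == 0, -i.last_read_at))


feed = [
    FeedItem("a", 0, 300),
    FeedItem("b", 2, 100),
    FeedItem("c", 0, 500),
    FeedItem("d", 1, 200),
]
ordered = [item.conversation_id for item in sort_feed(feed)]
```

Here `ordered` is `["d", "b", "c", "a"]`: the two unread conversations lead, and each group is ordered by descending last-read timestamp.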
Non-Functional:
very low latency: p50 in single-digit milliseconds for sending messages
when syncing the feed and messages, clients fetch only the content that is new since their last sync
Capacity estimation
100M DAU
each user sends 10 messages per day
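The two numbers above translate into an average write rate. The short calculation below is only a back-of-the-envelope sketch: it assumes traffic is spread evenly over the day, so peak load would be higher by a factor this estimate does not cover.

```python
DAU = 100_000_000            # from the estimate above
MESSAGES_PER_USER = 10       # from the estimate above
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = DAU * MESSAGES_PER_USER
avg_writes_per_sec = messages_per_day / SECONDS_PER_DAY

print(f"{messages_per_day:,} messages/day")
print(f"~{avg_writes_per_sec:,.0f} messages/sec on average")
```

That is 1 billion messages per day, or roughly 11.6K message writes per second on average.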
API design
online api
// send new messages
send(destinations []Conversation, content string, ttl long)
// delta-sync the chat feed
syncFeed(userID)
// delta-sync the conversation
syncConversation(conversationID)
// create a new conversation
createConversation(conversationID, conversationMetadata)
// update conversation metadata (members, group name, retention policy)
updateConversation(conversationID, conversationMetadata)
// delete a conversation
deleteConversation(conversationID, conversationMetadata)
// delete a message
deleteMessage(conversationID, userID, messageID)
offline api
sendNotification()
Database design
user table
userID uuid
username string
createdAt long
privacySetting enum
notificationSetting enum
friendship table
key(uuid/uuid): fromUserID/toUserID
friendType enum
lastUpdateAt long
feed table
primary key: userID
secondary key: conversationID
lastUpdateAt: long(local secondary index)
conversation table
two types of rows
message
primary key: conversationID(uuid)
secondary key: messageID(long)
createAt: long
lastUpdateAt: long(local secondary index)
retentionPolicy enum
senderUserID uuid
messageContent []byte
conversation metadata
primary key: conversationID(uuid)
secondary key: special const
createAt: long
lastUpdateAt: long(local secondary index)
retentionPolicy enum
name: string
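The two row types share one table by reserving a sort-key value for the metadata row. The sketch below models that layout with an in-memory dictionary. It assumes the "special const" is `0` and that real message IDs start at 1; the helper names and the `"30d"` retention value are hypothetical.

```python
from dataclasses import dataclass

METADATA_SORT_KEY = 0  # assumed reserved value; real message IDs start at 1


@dataclass
class MessageRow:
    sender_user_id: str
    content: bytes
    last_update_at: int


@dataclass
class MetadataRow:
    name: str
    retention_policy: str
    last_update_at: int


# (conversationID, sort key) -> row; stands in for the conversation table.
table = {}


def put_metadata(conversation_id, row):
    table[(conversation_id, METADATA_SORT_KEY)] = row


def put_message(conversation_id, message_id, row):
    if message_id == METADATA_SORT_KEY:
        raise ValueError("message ID collides with the metadata sort key")
    table[(conversation_id, message_id)] = row


def get_metadata(conversation_id):
    return table.get((conversation_id, METADATA_SORT_KEY))


def get_messages(conversation_id):
    """All message rows of one conversation, ordered by message ID."""
    keys = sorted(
        k for k in table
        if k[0] == conversation_id and k[1] != METADATA_SORT_KEY
    )
    return [table[k] for k in keys]


put_metadata("c1", MetadataRow("team chat", "30d", 1000))
put_message("c1", 2, MessageRow("u2", b"hello back", 1002))
put_message("c1", 1, MessageRow("u1", b"hello", 1001))
```

Because both row types live under the same primary key, a single range query on `conversationID` can return the metadata and the messages together.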
High-level design
CDN to cache media
cloud object storage to persist raw media
api gateway
chat gateway (supports both web and mobile protocols) that routes messages from the sender to the receiver and pushes notifications in real time
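backend application which handles permission checks, writes to the conversation and feed tables, and hand-off to the chat gateway and the notification queue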
worker service responsible for sending notifications
flowchart TD
    B[client] --> C{server}
    C --> D[Database]
Request flows
Sync
- The user opens the web/mobile client app, which establishes a streaming connection with the chat gateway. If the user has logged in, the client presents an authenticated token to establish the connection.
- The client calls syncFeed, which fetches only the updated conversations.
- The client calls syncConversation for each updated feed item returned by the previous step, which fetches only the newly updated conversation metadata or messages of that conversation.
Send
- The client sends send/update/delete requests to the server.
- When sending new media, the client first uploads the raw media to the CDN and a separate media processing system. The raw media is eventually persisted in cloud object storage. The client gets a reference link to the raw media and attaches the link to the message.
- The backend application first checks whether the sender has permission to add/update/delete messages in the selected destinations. For 1:1 chats, it calls the Friend service to check that sender~receiver is a valid friendship. For groups, it fetches the special conversation metadata row and checks that the participants include the sender ID.
- If the permission check passes, the message is written to the conversation table.
- Once the synchronous write to the conversation table succeeds, the backend server can return 200 to the client. It then asynchronously updates the feed table entries of all participants. This makes the write much cheaper but might introduce inconsistency. We could push failed feed table updates onto a message queue and retry them aggressively.
- Another async task checks whether the receiving users are connected and sends the message to them over the websocket.
- Lastly, the backend server asynchronously calls the notification server, or pushes the notification events onto the message queue for the notification server to consume.
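The send steps above can be sketched in a few lines. This is an illustrative in-memory model, not the actual service: all names are hypothetical, only the group permission check is shown (the 1:1 Friend service call is omitted), and the work that would run asynchronously after the 200 response is executed inline for simplicity.

```python
from collections import deque

conversation_members = {"c1": {"alice", "bob"}}  # stands in for the metadata row
conversation_table = {}   # (conversationID, messageID) -> (sender, content, ts)
feed_table = {}           # (userID, conversationID) -> lastUpdateAt
retry_queue = deque()     # failed feed updates, to be retried
connected = {"bob"}       # users with a live chat gateway connection
pushed = []               # messages delivered over the gateway
notification_queue = []   # events for the notification worker


def update_feed(user_id, conversation_id, ts, fail=False):
    if fail:  # simulated storage failure
        raise IOError("feed write failed")
    feed_table[(user_id, conversation_id)] = ts


def send(sender, conversation_id, message_id, content, ts, fail_feed_for=()):
    members = conversation_members.get(conversation_id, set())
    if sender not in members:  # permission check
        return 403
    # Synchronous write; the client gets 200 once this succeeds.
    conversation_table[(conversation_id, message_id)] = (sender, content, ts)
    # Everything below would run asynchronously in the real flow.
    for user in members:
        try:
            update_feed(user, conversation_id, ts, fail=user in fail_feed_for)
        except IOError:
            retry_queue.append((user, conversation_id, ts))
        if user == sender:
            continue
        if user in connected:
            pushed.append((user, message_id))
        notification_queue.append((user, message_id))
    return 200


def drain_retries():
    """Retry every failed feed update once."""
    while retry_queue:
        update_feed(*retry_queue.popleft())
```

For example, `send("alice", "c1", 1, "hi", 100, fail_feed_for={"bob"})` returns 200 and stores the message even though bob's feed update fails. The failed update sits in `retry_queue` until `drain_retries()` applies it, which is the window of inconsistency the flow accepts.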
Detailed component design
Deltasync on Feed/Conversation table
client sync token: last synced timestamp
SELECT * FROM Feed WHERE primary_key = user_id AND last_update_ts > feed_token
This returns only the feed items that have changed since the client last synced from the server.
SELECT * FROM Conversation WHERE primary_key = conversation_id AND last_update_ts > convo_token
This returns only the changed items in the conversation: new/updated messages or participant data.
On the write path, to keep the sync token correct, we add a special row that holds the global max of all the last-updated timestamps within the same user feed or conversation. That means every write does a read-modify-write on that single row.
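The feed half of this scheme can be tried end to end with an in-memory SQLite database standing in for the real store. This is a sketch under stated assumptions: the column names follow the queries above, while `SYNC_ROW` (a reserved `conversation_id` holding the per-user max timestamp) and both function names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE feed (
        user_id TEXT,
        conversation_id TEXT,
        last_update_ts INTEGER,
        PRIMARY KEY (user_id, conversation_id)
    )
""")
SYNC_ROW = "__sync__"  # assumed reserved key for the special max-timestamp row


def read_sync_row(user_id):
    row = db.execute(
        "SELECT last_update_ts FROM feed WHERE user_id = ? AND conversation_id = ?",
        (user_id, SYNC_ROW),
    ).fetchone()
    return row[0] if row else None


def touch_feed(user_id, conversation_id, ts):
    """Write path: update the item and the special row in one transaction."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO feed VALUES (?, ?, ?)",
            (user_id, conversation_id, ts),
        )
        current = read_sync_row(user_id)  # read-modify-write on the special row
        new_max = ts if current is None else max(ts, current)
        db.execute(
            "INSERT OR REPLACE INTO feed VALUES (?, ?, ?)",
            (user_id, SYNC_ROW, new_max),
        )


def sync_feed(user_id, feed_token):
    """Read path: items changed since feed_token, plus the next token."""
    # Read the token first so it never runs ahead of the rows returned.
    current = read_sync_row(user_id)
    next_token = feed_token if current is None else current
    rows = db.execute(
        "SELECT conversation_id, last_update_ts FROM feed "
        "WHERE user_id = ? AND last_update_ts > ? AND conversation_id != ? "
        "ORDER BY last_update_ts",
        (user_id, feed_token, SYNC_ROW),
    ).fetchall()
    return rows, next_token
```

A client starts with token `0`, stores the returned token, and passes it to the next call. A second call with an unchanged token returns no rows, which is the bandwidth saving delta sync is after.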
Trade offs/Tech choices
Chat gateway routing
The chat gateway pushes one way only, from the server to clients, and never connects two clients directly. This sacrifices latency but better guarantees consistency. If messages were sent directly between two clients, a message could be delivered before it is persisted on the server, or the persistence step could fail. We would then need additional logic to handle the failure, and message ordering would also be affected.
Delta Sync
This is a complicated feature that requires a transaction on each write. But it saves a huge amount of network bandwidth, since we return only delta information instead of pulling all messages.
Failure scenarios/bottlenecks
Contention on writes
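Since a chat app is very write-intensive, we did not add a cache layer. Supporting conversations with large groups (>100 members) may be a problem: each message fans out into one feed table update per participant, and every write does a read-modify-write on the single sync-token row.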
Cold Start
The first time the app is opened (or on an initial sync without a local cache or sync token), the client fetches a tremendous amount of data. We need to paginate in this scenario.
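One way to paginate the initial sync is a cursor over the last item already received. The sketch below is a hypothetical in-memory version: the function names, the `(last_update_ts, message_id)` cursor, and the page size are all assumptions, and a real server would apply the same filter in the database query.

```python
def fetch_page(messages, cursor=None, page_size=2):
    """Return up to page_size messages after cursor, plus the next cursor.

    messages: list of (last_update_ts, message_id, content) tuples.
    cursor:   (last_update_ts, message_id) of the last item already received,
              or None for the first page.
    """
    ordered = sorted(messages)
    if cursor is not None:
        ordered = [m for m in ordered if (m[0], m[1]) > cursor]
    page = ordered[:page_size]
    # A cursor is returned only when more data remains after this page.
    next_cursor = (page[-1][0], page[-1][1]) if len(ordered) > page_size else None
    return page, next_cursor


def initial_sync(messages, page_size=2):
    """Client loop: pull pages until the server reports no more data."""
    received, cursor = [], None
    while True:
        page, cursor = fetch_page(messages, cursor, page_size)
        received.extend(page)
        if cursor is None:
            return received
```

Including the message ID in the cursor keeps paging stable when several messages share the same timestamp.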
Future improvements
add e2e encryption
regionalization, so that users in different geographic locations can send requests to local servers
split the backend application into smaller microservices: user service, friend service, chat service, notification service
other offline tasks: check message deletions, send scheduled event notifications, listen for other deletion events
Score: 9