Design a Facebook-like Instant Messenger

Difficulty: medium

Design a web and mobile instant messaging service inspired by Facebook Messenger.

Solution

System requirements

Functional:

users can send both text and media

both 1:1 and group chats; users can add members to a group chat, change the group name, update the retention policy, and delete messages

both web and mobile clients

sort the feed so that conversations with unread messages come first, then by last read timestamp
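The feed ordering rule can be expressed as a single comparator. This is a minimal Go sketch, assuming each feed item carries hypothetical `UnreadCount` and `LastReadAt` fields (neither appears in the feed table schema) and assuming more recent timestamps should come first.

```go
package main

import (
	"fmt"
	"sort"
)

// FeedItem is a hypothetical in-memory view of one feed row; the field names
// are assumptions, not part of the schema in this design.
type FeedItem struct {
	ConversationID string
	UnreadCount    int
	LastReadAt     int64 // epoch millis
}

// sortFeed puts conversations with unread messages first; within each group
// it orders by timestamp, newest first (the descending order is an assumption).
func sortFeed(items []FeedItem) {
	sort.SliceStable(items, func(i, j int) bool {
		unreadI, unreadJ := items[i].UnreadCount > 0, items[j].UnreadCount > 0
		if unreadI != unreadJ {
			return unreadI // unread sorts before read
		}
		return items[i].LastReadAt > items[j].LastReadAt
	})
}

func main() {
	feed := []FeedItem{
		{"a", 0, 300},
		{"b", 2, 100},
		{"c", 0, 500},
		{"d", 1, 200},
	}
	sortFeed(feed)
	for _, it := range feed {
		fmt.Print(it.ConversationID, " ")
	}
	fmt.Println() // d b c a
}
```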

Non-Functional:

very low latency for sending messages: p50 in the single-digit milliseconds

when syncing the feed and messages, clients only fetch content that is new since their last sync

Capacity estimation

100M DAU

each user sends 10 messages per day
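These two numbers translate into rough write throughput and storage. This Go sketch uses only the DAU and per-user figures above; the average message size (100 bytes) and the peak-to-average ratio (3x) are assumptions for illustration, and media is excluded because it lives in object storage.

```go
package main

import "fmt"

const (
	dailyActiveUsers   = 100_000_000 // from the estimate above
	messagesPerUserDay = 10          // from the estimate above
	secondsPerDay      = 24 * 60 * 60
	avgMessageBytes    = 100 // ASSUMPTION: average text payload incl. metadata
	peakToAvgRatio     = 3   // ASSUMPTION: peak traffic vs. daily average
)

func messagesPerDay() int64 { return int64(dailyActiveUsers) * messagesPerUserDay }

// avgWriteQPS uses integer division, so the result is rounded down.
func avgWriteQPS() int64 { return messagesPerDay() / secondsPerDay }

func storagePerDayBytes() int64 { return messagesPerDay() * avgMessageBytes }

func main() {
	fmt.Println("messages/day:", messagesPerDay())                        // 1000000000
	fmt.Println("avg write QPS:", avgWriteQPS())                          // 11574
	fmt.Println("peak write QPS:", avgWriteQPS()*peakToAvgRatio)          // 34722
	fmt.Println("text GB/day:", storagePerDayBytes()/1_000_000_000)       // 100
}
```

Roughly 1 billion messages per day is about 11.6K writes per second on average; the peak figure depends entirely on the assumed ratio.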

API design

online APIs

// send new messages

send(destinations []Conversation, content string, ttl long)

// delta-sync the chat feed

syncFeed(userID)

// delta-sync the conversation

syncConversation(conversationID)

// create a new conversation

createConversation(conversationID, conversationMetadata)

// update a conversation

updateConversation(conversationID, conversationMetadata)

// delete a conversation

deleteConversation(conversationID, conversationMetadata)

// delete a message

deleteMessage(conversationID, userID, messageID)

// offline APIs

sendNotification()

Database design

user table

userID uuid

username string

createdAt long

privacySetting enum

notificationSetting enum

friendship table

key(uuid/uuid): fromUserID/toUserID

friendType enum

lastUpdateAt long

feed table

primary key: userID

secondary key: conversationID

lastUpdateAt: long(local secondary index)

conversation table

It stores two types of rows:

message row

primary key: conversationID(uuid)

secondary key: messageID(long)

createdAt: long

lastUpdateAt: long(local secondary index)

retentionPolicy enum

senderUserID uuid

messageContent []byte

conversation metadata row

primary key: conversationID(uuid)

secondary key: a special constant (a reserved sort key that no messageID uses)

createdAt: long

lastUpdateAt: long(local secondary index)

retentionPolicy enum

name: string

participants: []uuid
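The two row types can share one table because the sort key tells them apart. This is a minimal Go sketch that uses a map as a stand-in for the key-value store; the reserved sort key value `0` (with positive message IDs) is an assumption, since the design only specifies a "special const", and the `Participants` field is the list that the send flow's group permission check reads.

```go
package main

import (
	"fmt"
	"sort"
)

// metadataSortKey is a HYPOTHETICAL reserved value for the "special const"
// secondary key; message IDs are assumed to be positive.
const metadataSortKey int64 = 0

type rowKey struct {
	ConversationID string // primary (partition) key
	SortKey        int64  // messageID, or metadataSortKey for the metadata row
}

type row struct {
	CreatedAt       int64
	LastUpdateAt    int64
	RetentionPolicy string
	// Message-only fields.
	SenderUserID string
	Content      []byte
	// Metadata-only fields.
	Name         string
	Participants []string
}

type conversationTable map[rowKey]row

// metadata is a point lookup on the reserved sort key.
func (t conversationTable) metadata(convID string) (row, bool) {
	r, ok := t[rowKey{convID, metadataSortKey}]
	return r, ok
}

// messages returns the message IDs of one conversation in ascending order,
// skipping the metadata row.
func (t conversationTable) messages(convID string) []int64 {
	var ids []int64
	for k := range t {
		if k.ConversationID == convID && k.SortKey != metadataSortKey {
			ids = append(ids, k.SortKey)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	t := conversationTable{
		{"c1", metadataSortKey}: {Name: "hiking", Participants: []string{"u1", "u2"}},
		{"c1", 2}:               {SenderUserID: "u2", Content: []byte("hey")},
		{"c1", 1}:               {SenderUserID: "u1", Content: []byte("hi")},
	}
	meta, _ := t.metadata("c1")
	fmt.Println(meta.Name, t.messages("c1")) // hiking [1 2]
}
```

In a real store, `messages` would be a range query within the partition rather than a full map scan.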

High-level design

CDN to cache media

cloud object storage to persist raw media

API gateway

chat gateway (supporting both web and mobile protocols), which pushes messages to receivers and delivers notifications in real time

backend application, which handles permission checks, writes to the conversation and feed tables, and hands off the async tasks (feed updates, websocket delivery, notifications)

worker service responsible for sending notifications

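flowchart TD
    B[client] --> G[API gateway / chat gateway]
    B --> M[CDN / cloud object storage]
    G --> C{backend application}
    C --> D[Database]
    C --> W[notification worker]
    C --> G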

Request flows

Sync

  1. The user opens the web/mobile client app, which establishes a streaming connection with the chat gateway. If the user is logged in, the client must present an authentication token to establish the connection.
  2. The client triggers syncFeed, which fetches only the updated conversations.
  3. The client triggers syncConversation, which fetches only the newly updated conversation metadata or messages for each conversation among the updated feed items returned in step 2.
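The sync steps above can be sketched from the client's side. This is a minimal Go sketch under assumptions: `Server`, `Client`, and `fakeServer` are hypothetical names, and the sync tokens are passed explicitly even though the API listing only shows `syncFeed(userID)` and `syncConversation(conversationID)`.

```go
package main

import "fmt"

// Server is a HYPOTHETICAL client-side view of the two sync endpoints; each
// call returns only items newer than `since`, plus the next token.
type Server interface {
	SyncFeed(userID string, since int64) (changed []string, token int64)
	SyncConversation(convID string, since int64) (msgs []string, token int64)
}

// Client keeps its sync tokens and messages locally between syncs.
type Client struct {
	UserID     string
	FeedToken  int64
	ConvTokens map[string]int64
	Messages   map[string][]string
}

// Sync runs step 2 (feed) and then step 3 for each changed conversation.
func (c *Client) Sync(s Server) {
	changed, feedToken := s.SyncFeed(c.UserID, c.FeedToken)
	for _, id := range changed {
		msgs, tok := s.SyncConversation(id, c.ConvTokens[id])
		c.Messages[id] = append(c.Messages[id], msgs...)
		c.ConvTokens[id] = tok
	}
	// Advance the feed token last, so an interrupted sync re-requests the
	// same feed delta next time.
	c.FeedToken = feedToken
}

type msg struct {
	convID, text string
	ts           int64
}

// fakeServer answers both calls from one in-memory message log.
type fakeServer struct{ log []msg }

func (f *fakeServer) SyncFeed(_ string, since int64) ([]string, int64) {
	seen := map[string]bool{}
	var ids []string
	latest := since
	for _, m := range f.log {
		if m.ts > since {
			if !seen[m.convID] {
				seen[m.convID] = true
				ids = append(ids, m.convID)
			}
			if m.ts > latest {
				latest = m.ts
			}
		}
	}
	return ids, latest
}

func (f *fakeServer) SyncConversation(id string, since int64) ([]string, int64) {
	var out []string
	latest := since
	for _, m := range f.log {
		if m.convID == id && m.ts > since {
			out = append(out, m.text)
			if m.ts > latest {
				latest = m.ts
			}
		}
	}
	return out, latest
}

func main() {
	srv := &fakeServer{log: []msg{{"c1", "hi", 1}, {"c2", "yo", 2}}}
	c := &Client{UserID: "u1", ConvTokens: map[string]int64{}, Messages: map[string][]string{}}
	c.Sync(srv)
	srv.log = append(srv.log, msg{"c1", "again", 3})
	c.Sync(srv) // the second sync only pulls "again"
	fmt.Println(c.Messages["c1"], c.Messages["c2"], c.FeedToken) // [hi again] [yo] 3
}
```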

Send

  1. The client initiates send/update/delete requests to the server.
  2. In particular, when sending new media, the client first uploads the raw media to the CDN and a separate media-processing system; the raw media is eventually persisted in cloud object storage. The client gets back a reference link to the raw media and attaches that link to the message.
  3. The backend application first checks whether the sender has permission to add/update/delete messages in the selected destinations. For 1:1 chats, it calls the Friend service to check that the sender~receiver pair has a valid friendship. For group chats, it fetches the special conversation metadata row and checks that the participants include the sender's ID.
  4. If the permission check passes, the message is written to the conversation table.
  5. Once the synchronous write to the conversation table succeeds, the backend server can return 200 to the client. It then updates the feed table entries of all participants asynchronously. This makes the write much cheaper but may introduce temporary inconsistency between the two tables; failed feed updates can be pushed to a message queue and retried aggressively.
  6. Another async task pushes the message through the chat gateway's websocket connections, after checking whether the receiving users are connected.
  7. Lastly, the backend server asynchronously calls the notification server, or enqueues notification events into a message queue consumed by the notification server.
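Steps 3 to 5 of the send flow can be sketched with in-memory stand-ins. This is a minimal Go sketch: `Backend`, `FeedQueue`, and the directional `{fromUserID, toUserID}` friendship map are hypothetical, a buffered channel plays the message queue, and steps 6 and 7 (websocket delivery and notifications) are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Conversation is a hypothetical view of the metadata row.
type Conversation struct {
	ID           string
	IsGroup      bool
	Participants []string
}

// FeedUpdate is one async write to a participant's feed row.
type FeedUpdate struct {
	UserID, ConversationID string
	Ts                     int64
}

// Backend bundles stand-ins for the Friend service, the conversation table,
// the feed table and the feed-update message queue.
type Backend struct {
	Friends   map[[2]string]bool          // {fromUserID, toUserID} -> friendship exists
	Messages  map[string][]string         // conversationID -> message contents
	Feed      map[string]map[string]int64 // userID -> conversationID -> lastUpdateAt
	FeedQueue chan FeedUpdate
}

var ErrForbidden = errors.New("sender not allowed in conversation")

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// allowed is step 3: membership for groups, plus friendship for 1:1 chats.
func (b *Backend) allowed(sender string, c Conversation) bool {
	if !contains(c.Participants, sender) {
		return false
	}
	if c.IsGroup {
		return true
	}
	for _, p := range c.Participants {
		if p != sender { // the receiver of a 1:1 chat
			return b.Friends[[2]string{sender, p}]
		}
	}
	return false
}

// Send covers steps 3-5: check, synchronous write, then enqueue feed updates
// so the client can get its 200 without waiting for the fan-out.
func (b *Backend) Send(sender string, c Conversation, text string, ts int64) error {
	if !b.allowed(sender, c) {
		return ErrForbidden
	}
	b.Messages[c.ID] = append(b.Messages[c.ID], text)
	for _, p := range c.Participants {
		b.FeedQueue <- FeedUpdate{UserID: p, ConversationID: c.ID, Ts: ts}
	}
	return nil
}

// DrainFeedQueue is the async consumer. This in-memory write cannot fail; a
// real consumer would push failed updates back onto the queue for retry.
func (b *Backend) DrainFeedQueue() {
	for {
		select {
		case u := <-b.FeedQueue:
			if b.Feed[u.UserID] == nil {
				b.Feed[u.UserID] = map[string]int64{}
			}
			b.Feed[u.UserID][u.ConversationID] = u.Ts
		default:
			return
		}
	}
}

func main() {
	b := &Backend{
		Friends:   map[[2]string]bool{{"alice", "bob"}: true},
		Messages:  map[string][]string{},
		Feed:      map[string]map[string]int64{},
		FeedQueue: make(chan FeedUpdate, 1024),
	}
	dm := Conversation{ID: "c1", Participants: []string{"alice", "bob"}}
	fmt.Println(b.Send("alice", dm, "hi", 100)) // <nil>
	fmt.Println(b.Send("bob", dm, "hey", 101))  // sender not allowed in conversation (no bob->alice row)
	b.DrainFeedQueue()
	fmt.Println(b.Feed["bob"]["c1"]) // 100
}
```

Bob's send fails only because this sketch treats friendship rows as directional, matching the `fromUserID/toUserID` key of the friendship table; a real Friend service might store or check both directions.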

Detailed component design

Delta sync on the Feed/Conversation tables

client sync token: last synced timestamp

SELECT * FROM Feed WHERE userID = :user_id AND lastUpdateAt > :feed_token

This returns only the feed items that have changed since the client last synced with the server.

SELECT * FROM Conversation WHERE conversationID = :conversation_id AND lastUpdateAt > :convo_token

This returns only the changed items in the conversation: new/updated messages or participant data.
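Both queries share one pattern: filter by partition key and by `lastUpdateAt > token`, then hand back the largest timestamp seen as the next token. This is a minimal Go sketch in which a linear scan stands in for the local-secondary-index range query; `Row` and `deltaSync` are hypothetical names.

```go
package main

import "fmt"

// Row is a HYPOTHETICAL feed/conversation row: partition key + lastUpdateAt.
type Row struct {
	PartitionKey string // userID for Feed, conversationID for Conversation
	ItemKey      string // conversationID or messageID
	LastUpdateAt int64
}

// deltaSync mirrors `WHERE partition = ? AND lastUpdateAt > token` and returns
// the next token for the client to store (the max timestamp returned).
func deltaSync(rows []Row, partition string, token int64) ([]Row, int64) {
	var out []Row
	next := token
	for _, r := range rows {
		if r.PartitionKey == partition && r.LastUpdateAt > token {
			out = append(out, r)
			if r.LastUpdateAt > next {
				next = r.LastUpdateAt
			}
		}
	}
	return out, next
}

func main() {
	feed := []Row{{"u1", "c1", 10}, {"u1", "c2", 25}, {"u2", "c3", 30}}
	changed, tok := deltaSync(feed, "u1", 15)
	fmt.Println(len(changed), changed[0].ItemKey, tok) // 1 c2 25
}
```

Calling again with the returned token yields nothing until new writes arrive, which is exactly the bandwidth saving delta sync is after.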

On the write path, to keep the sync token correct, we add a special row that maintains the global max of all lastUpdateAt timestamps belonging to the same user feed or conversation. This means every write performs a read-modify-write on that single row.
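One way to make that read-modify-write safe is optimistic concurrency: read the row with its version, compute the new max, and write only if the version is unchanged, retrying otherwise. This Go sketch assumes the database offers a conditional write (modeled by the hypothetical `putIf`), and it additionally assumes the row hands out strictly increasing timestamps, so a client token never skips a write even when two writers see the same clock value.

```go
package main

import (
	"fmt"
	"sync"
)

// maxRow is the special per-partition row holding the max lastUpdateAt.
type maxRow struct {
	MaxTs   int64
	Version int64 // checked by the conditional write
}

// store is a HYPOTHETICAL KV store with a conditional put, standing in for a
// database's compare-and-set / conditional-update feature.
type store struct {
	mu   sync.Mutex
	rows map[string]maxRow
}

func (s *store) get(key string) maxRow {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.rows[key]
}

// putIf writes only if the stored version still equals expected.
func (s *store) putIf(key string, expected int64, r maxRow) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.rows[key].Version != expected {
		return false
	}
	s.rows[key] = r
	return true
}

// bumpMax performs the read-modify-write, retrying when another writer won.
// It returns the timestamp assigned to this write, strictly above the old max.
func bumpMax(s *store, key string, now int64) int64 {
	for {
		cur := s.get(key)
		ts := now
		if ts <= cur.MaxTs {
			ts = cur.MaxTs + 1 // never move the token backwards
		}
		if s.putIf(key, cur.Version, maxRow{MaxTs: ts, Version: cur.Version + 1}) {
			return ts
		}
	}
}

func main() {
	s := &store{rows: map[string]maxRow{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			bumpMax(s, "feed:u1", 1) // every writer sees the same clock value
		}()
	}
	wg.Wait()
	fmt.Println(s.get("feed:u1").MaxTs) // 100
}
```

Every writer to the same feed or conversation serializes on this one row, which is also the source of the write contention discussed under failure scenarios.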

Trade-offs/Tech choices

Chat gateway routing

The chat gateway only pushes data one way, from the server to the clients; it never connects two clients directly. This sacrifices some latency but gives stronger consistency guarantees. If messages were sent directly between clients, a message could be delivered before it is persisted on the server, or its persistence could fail afterwards; that would require extra failure-handling logic and could also break message ordering.

Delta Sync

This is a complicated feature: every write must run in a transaction so that the special max-timestamp row stays consistent with the rows it covers. In return, it saves a huge amount of network bandwidth, because the server returns only the delta instead of all messages.

Failure scenarios/bottlenecks

Contention on writes

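Since a chat app is very write-intensive, we didn't add a cache layer. Write contention may become an issue for conversations with large groups (>100 members), where many participants write concurrently to the same conversation partition and its special max-timestamp row.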

Cold Start

On the first launch (or on an initial sync without a local cache or sync token), users would fetch a tremendous amount of data, so we need to paginate in this scenario.
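A cursor-based scheme fits this scenario: return the newest messages first in fixed-size pages, and let the client pass back the oldest ID it has received. This is a minimal Go sketch; the page size, newest-first order, positive message IDs, and `0` as the "no more pages" cursor are all assumptions, and `fetchPage` is a hypothetical name.

```go
package main

import "fmt"

// Page is one chunk of a cold-start sync.
type Page struct {
	MessageIDs []int64
	NextCursor int64 // pass back to fetch older messages; 0 means done
}

// fetchPage returns up to limit IDs strictly older than cursor (cursor <= 0
// means "start from the newest"). ids must be sorted ascending and positive.
func fetchPage(ids []int64, cursor int64, limit int) Page {
	var out []int64
	for i := len(ids) - 1; i >= 0 && len(out) < limit; i-- {
		if cursor <= 0 || ids[i] < cursor {
			out = append(out, ids[i])
		}
	}
	next := int64(0)
	// Only hand out a cursor if the page is full and older messages remain.
	if len(out) > 0 && len(out) == limit && out[len(out)-1] > ids[0] {
		next = out[len(out)-1]
	}
	return Page{MessageIDs: out, NextCursor: next}
}

func main() {
	ids := []int64{1, 2, 3, 4, 5}
	cursor := int64(0)
	for {
		p := fetchPage(ids, cursor, 2)
		fmt.Println(p.MessageIDs) // [5 4], then [3 2], then [1]
		if p.NextCursor == 0 {
			break
		}
		cursor = p.NextCursor
	}
}
```

In a real store each page would be a range query on the messageID sort key with a limit, rather than a scan from the end of a slice.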

Future improvements

add end-to-end (E2E) encryption

regionalization, so that users in different geographic locations can send requests to local servers

split the backend application into smaller microservices: user service, friend service, chat service, notification service

other offline tasks: checking message deletions, sending notifications for scheduled events, and listening to other deletion events


Score: 9