Design a Fitness Tracking App

Difficulty: easy

Build an application that lets users track their physical activities, set goals, and monitor their progress.

Solution

System requirements

Let's start with what this fitness app will do. The application tracks two main forms of exercise: endurance and strength.

Functional:

For weightlifting exercises, tracking is mostly by exercise: which exercise was done, how many repetitions, and at what weight.
For endurance exercises like running, hiking, or swimming, GPS tracks the distance travelled and the pace, and records the route taken. Your pace is shown in the application every 1-2 seconds, but the backend only keeps your speed at 10-second intervals.
If you have a heart rate monitor, your heart rate is tracked as well. During exercise, your heart rate is shown in the application every 1-2 seconds, but the backend only keeps your HR at 10-second intervals.
You can also set goals for your exercises, and you get a notification when you achieve them.
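
The split between the 1-2 second on-screen refresh and the 10-second stored interval can be sketched as below. This is a minimal illustration, assuming samples arrive as (timestamp in seconds, value) pairs and that each 10-second bucket is stored as its mean. The function name `downsample` and the choice of averaging are assumptions, not something the requirements specify.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, interval_s=10):
    """Reduce (timestamp_s, value) samples to one mean value per interval.

    Hypothetical helper: the requirements only say the backend keeps one
    reading per 10 seconds; averaging each bucket is an assumption.
    """
    buckets = defaultdict(list)
    for timestamp_s, value in samples:
        buckets[int(timestamp_s // interval_s)].append(value)
    # One (bucket_start_time, mean_value) pair per interval, in time order.
    return [(index * interval_s, mean(values))
            for index, values in sorted(buckets.items())]


# Made-up heart-rate readings taken every 2 seconds for 20 seconds.
readings = [(t, 120 + t) for t in range(0, 20, 2)]
stored = downsample(readings)
print(stored)
```

Ten on-screen readings collapse to two stored values here, which is the reduction the 10-second rule is meant to achieve.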

Non-Functional:

For GPS, location should be updated frequently and reflected on your device.
Similarly, heart rate should be updated frequently and reflected on your device.
Data should be kept for up to 10 years and cleared after the expiry date.

Capacity estimation

Database entity estimation:

Weightlifting data

UserID (4 bytes)
Exercise ID (4 bytes)
Type of exercise (10 bytes)
Repetitions (4 bytes)
Weight (4 bytes)

Running data

UserID (4 bytes)
Exercise ID (4 bytes)
Type of exercise (200 bytes) // running, cycling, swimming etc
Distance travelled (4 bytes)
Average speed (4 bytes)
Time (4 bytes)
URL to route image (200 bytes)
route image (2 MB)
Array of speed data (2KB)

HeartRate data

UserID (4 bytes)
ExerciseID (4 bytes)
Average HR (4 bytes)
Array of HR data (2 KB)

Goals data

UserID (4 bytes)
Type of exercise (200 bytes)
Goals (endurance type or strength type) (12 bytes)

Storage estimation

Let's say we have 1 million daily users, each of whom posts 2 exercises and sets 1 goal per day (say 1 weightlifting session and 1 endurance session, both with HR data).

1 user's new data per day: (4+4+10+4+4) + (4+4+200+4+4+4+200 + 2KB + 2MB) + 2*(4+4+4+2KB) + (4+200+12) = about 700 bytes + 6KB + 2MB, which is just over 2 MB. The estimate below rounds this up to 3 MB per user.
1 million users = 2.9TB a day
1 year = 1PB
10 years = 10PB
Note that most of the storage will be from the image of the route taken
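
The storage arithmetic can be checked with a short script. This is only a sketch: it assumes binary units (1 KB = 1024 bytes), sums every field in the entity lists above, and applies the same round-up to 3 MB per user. The variable names are illustrative.

```python
KB = 1024
MB = 1024 * KB
TB = 1024 * 1024 * MB
PB = 1024 * TB

# Field sizes in bytes, taken from the entity lists above.
weightlifting = 4 + 4 + 10 + 4 + 4
endurance = 4 + 4 + 200 + 4 + 4 + 4 + 200 + 2 * KB + 2 * MB
heart_rate = 4 + 4 + 4 + 2 * KB
goal = 4 + 200 + 12

# One strength session, one endurance session, HR data for both, one goal.
per_user_per_day = weightlifting + endurance + 2 * heart_rate + goal

# The estimate rounds the per-user figure up to 3 MB before scaling.
rounded_per_user = 3 * MB
daily_users = 1_000_000

per_day = rounded_per_user * daily_users
per_year = per_day * 365
per_decade = per_year * 10

print(f"per user per day: {per_user_per_day / MB:.2f} MB")
print(f"per day:          {per_day / TB:.1f} TB")
print(f"per year:         {per_year / PB:.1f} PB")
print(f"per 10 years:     {per_decade / PB:.1f} PB")
```

The route image accounts for almost all of the per-user total, so any change to image size or format dominates the estimate.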

Traffic estimation

Let's say about 50,000 users are exercising at any moment.
Let's say each of them sends one HR request and one GPS request every second.
That means about 100,000 read requests per second.
2 million write requests per day = about 83,000 write requests per hour = 23 writes per second
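
As a quick check of these figures, the same arithmetic in code. This is a sketch that counts only the two exercise saves per user per day as writes, as the estimate does; the variable names are illustrative.

```python
concurrent_users = 50_000
requests_per_user_per_second = 2  # one HR request and one GPS request

reads_per_second = concurrent_users * requests_per_user_per_second

daily_users = 1_000_000
sessions_per_user_per_day = 2
writes_per_day = daily_users * sessions_per_user_per_day
writes_per_hour = writes_per_day / 24
writes_per_second = writes_per_hour / 3600

print(reads_per_second)
print(round(writes_per_hour))
print(round(writes_per_second))
```

Reads outnumber writes by more than three orders of magnitude, which is why the live HR and location paths need the most capacity.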

Cache estimation

Let's say we want to cache 20% of each day's data.
20% of roughly 3TB a day is about 600 GB for the cache.

API design

We can use REST APIs to expose functionality of our service

request_gps(user_id) -> returns the user's current location. Our server will likely use one of the common geolocation APIs, such as Google's or Apple's. This data is also used to plot the route.
request_hr(user_id) -> calls the heart rate API and returns the reading to the client
add_exercise_data(user_id, exercise_type, hr_info, gps_info) -> adds the session data to the DB
add_goals(user_id, exercise_type, goals_info) -> stores a goal for the given exercise type
pause_exercise(user_id) -> pauses tracking
start_exercise(user_id) -> starts tracking

We have already defined the data we want to store, so what kind of database should we use? I would use a document DB like MongoDB for the endurance data, strength data, and goals data. There are few relationships between these entities, and I want to keep an array of speed data of variable size, so the data is not rigidly structured. The goals data reinforces this, since it stores different types of goals for different types of exercise. In addition, we need to scale as our user base grows, and MongoDB is easy to scale horizontally.
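
To make the "not rigidly structured" point concrete, here is a sketch of what the documents might look like, written as plain Python dicts. Every field name, value, and the example.com URL is hypothetical and chosen only for illustration. Nothing here is a schema defined by this design.

```python
import json

# Hypothetical document shapes; all field names and values are illustrative.
endurance_session = {
    "user_id": 42,
    "exercise_id": 1001,
    "exercise_type": "running",
    "distance_m": 5000,
    "average_speed_mps": 2.8,
    "duration_s": 1786,
    "route_image_url": "https://storage.example.com/routes/1001.png",
    "speed_samples_mps": [2.7, 2.9, 2.8],  # variable length, one per 10 s
}

strength_session = {
    "user_id": 42,
    "exercise_id": 1002,
    "exercise_type": "bench press",
    "repetitions": 8,
    "weight_kg": 60,
}

# Goals for different exercise types carry different fields.
goals = [
    {"user_id": 42, "exercise_type": "running",
     "goal": {"distance_m": 10000}},
    {"user_id": 42, "exercise_type": "bench press",
     "goal": {"weight_kg": 80, "repetitions": 5}},
]

for document in (endurance_session, strength_session, *goals):
    print(json.dumps(document))
```

The two session types share only a few fields, and the two goals share none inside `goal`, which is the kind of variation a document store handles without schema changes.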

To keep the route image, we will store it in cloud object storage such as Google Cloud Storage or Azure Blob Storage, and access it through a URL.

High-level design

User Interface:

The user starts and stops exercise tracking with a button, which calls the APIs that start and stop tracking.
The client frequently sends requests to our backend servers for location (if needed for endurance exercises) and HR. This information is then shown in the application interface and should be refreshed every 1-2 seconds.
Once the user stops the exercise, the data is saved to the DB.
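
The client-side bookkeeping during a session can be sketched as follows. This is a minimal example, assuming location fixes arrive as (timestamp, latitude, longitude) and that distance is accumulated with the haversine formula. The class `ExerciseSession` and its method names are hypothetical.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class ExerciseSession:
    """Hypothetical client-side state for one tracked exercise."""
    locations: list = field(default_factory=list)    # (timestamp_s, lat, lon)
    heart_rates: list = field(default_factory=list)  # beats per minute
    distance_m: float = 0.0

    def add_location(self, timestamp_s, lat, lon):
        # Add the leg from the previous fix to the running distance.
        if self.locations:
            _, prev_lat, prev_lon = self.locations[-1]
            self.distance_m += haversine_m(prev_lat, prev_lon, lat, lon)
        self.locations.append((timestamp_s, lat, lon))

    def add_heart_rate(self, bpm):
        self.heart_rates.append(bpm)

    def average_speed_mps(self):
        if len(self.locations) < 2:
            return 0.0
        elapsed_s = self.locations[-1][0] - self.locations[0][0]
        return self.distance_m / elapsed_s if elapsed_s > 0 else 0.0

    def average_heart_rate(self):
        if not self.heart_rates:
            return 0.0
        return sum(self.heart_rates) / len(self.heart_rates)


session = ExerciseSession()
session.add_location(0, 0.0, 0.0)
session.add_location(2, 0.0, 0.0001)
session.add_heart_rate(120)
session.add_heart_rate(130)
print(round(session.distance_m, 2), round(session.average_speed_mps(), 2),
      session.average_heart_rate())
```

Keeping these running totals on the client means the interface can refresh every 1-2 seconds without a round trip for each derived value.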

Servers

Heart Rate Server: Server that exclusively handles Heart Rate requests
Location Server: Server that exclusively handles Location requests
Normal servers: Save and retrieve exercise information
The client sends read and write requests to these three kinds of servers.

Database

MongoDB
Google Cloud Object Storage

flowchart TD
    B[client] --> C{server}
    C --> D[Database]

Request flows

Upon starting, the client sends location requests and heart rate requests to the location and heart rate servers respectively.
These requests are sent every 1-2 seconds and the data is kept locally on the client. The client keeps one array for location and one for HR, and appends each server response to them. It also calculates the average speed, average HR, and distance travelled, shows them on the interface, and plots the route taken on a map.
On pause, tracking stops temporarily but nothing is saved yet.
When you stop the exercise, the route image is sent to the object storage, which yields the URL for that image. The URL and the location and HR arrays are then sent to the normal servers to be stored in the document DB.
Users might also ask to retrieve data from past sessions. Here the server looks up the necessary information in the document DB, retrieves the route image from the object storage, and shows both on the interface.
When you save the data, the server also checks whether any goals were met by matching the userID and exercise type of the incoming data against the goals set. If so, the response also indicates that the goals were met.
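
The goal check on save could look roughly like this. It is a simplified sketch: the record shapes, the `targets` field, and the rule that a goal is met when every target value is reached or exceeded are all assumptions. Goals where lower is better, such as a time limit, would need a different comparison.

```python
def goals_met(session, goals):
    """Return the goals that this session satisfies.

    Hypothetical matching rule: a goal applies when the user ID and exercise
    type match, and it is met when every target value is reached or exceeded.
    """
    met = []
    for goal in goals:
        if goal["user_id"] != session["user_id"]:
            continue
        if goal["exercise_type"] != session["exercise_type"]:
            continue
        targets = goal["targets"]
        if all(session.get(name, 0) >= value
               for name, value in targets.items()):
            met.append(goal)
    return met


# Made-up session and goals for illustration.
session = {"user_id": 42, "exercise_type": "running", "distance_m": 5200}
goals = [
    {"user_id": 42, "exercise_type": "running",
     "targets": {"distance_m": 5000}},
    {"user_id": 42, "exercise_type": "running",
     "targets": {"distance_m": 10000}},
    {"user_id": 7, "exercise_type": "running",
     "targets": {"distance_m": 1000}},
]
achieved = goals_met(session, goals)
print(len(achieved))
```

The list returned here is what the save response would carry back to the client to trigger the notification.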

Detailed component design

From the traffic calculation, we see that there will be many location and HR requests per second, so we must use multiple servers for each to meet the demand. A load balancer will distribute the load across the HR and location servers. The same applies to the normal servers, where we will also run multiple instances. The load balancers sit between the client and the servers.

In terms of how to partition the data, one possible method is to divide it by UserID. There won't be a 'hot' user that can overload a server, since a user exercises at most around 3 times a day. Hence partitioning the data by UserID seems like a viable option.
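
Partitioning by UserID can be illustrated with a stable hash. This is a sketch only: hashing the ID and taking it modulo the shard count is one simple scheme, and the function name `shard_for_user` and the shard count of 4 are assumptions. In practice a database with built-in hashed sharding, which MongoDB offers, would handle this routing itself.

```python
import hashlib


def shard_for_user(user_id, shard_count):
    """Map a user ID to a shard index with a stable hash.

    Hypothetical scheme: the design only says data is partitioned by
    UserID; hash-modulo is one simple way to do that.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Count how 10,000 sequential user IDs spread over 4 shards.
counts = [0] * 4
for user_id in range(10_000):
    counts[shard_for_user(user_id, 4)] += 1
print(counts)
```

A known drawback of plain hash-modulo is that changing the shard count remaps most keys. Consistent hashing is the usual alternative when shards are added often.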

Lastly, we can add a cache for users who are more active on the application and access their past sessions more often. We can do this with an in-memory cache such as Memcached, caching 20% of the data.
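
The caching behaviour can be sketched with a small least-recently-used cache. This in-process class only stands in for a real cache server, and the LRU eviction policy, the class name `SessionCache`, and the `load_from_db` callback are assumptions made for illustration.

```python
from collections import OrderedDict


class SessionCache:
    """Tiny in-process LRU cache standing in for a real cache server."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key, load_from_db):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = load_from_db(key)           # cache miss: fall back to the DB
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


db_reads = []


def fake_db_lookup(session_id):
    """Hypothetical DB read that records which keys reached the database."""
    db_reads.append(session_id)
    return {"session_id": session_id}


cache = SessionCache(capacity=2)
for session_id in ["a", "b", "a", "c", "b"]:
    cache.get(session_id, fake_db_lookup)
print(db_reads)
```

Sessions that are read repeatedly stay in the cache, while ones that have not been touched recently fall out and cost a database read the next time.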

We should also have a server dedicated to clearing expired data from the database and cloud storage. This lets the HR, location, and normal servers keep meeting demand without having to handle expired data themselves.
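
The selection step of such a cleanup job might look like this. It is a sketch under assumptions: each record is a dict with an `exercise_id` and a timezone-aware `created_at`, and 10 years is approximated as 3650 days. For the database side alone, MongoDB's TTL indexes are a built-in alternative, but the route images in object storage would still need their own cleanup.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365 * 10)  # roughly 10 years, ignoring leap days


def expired_ids(sessions, now):
    """Return IDs of sessions older than the retention period.

    Hypothetical record shape: each session is a dict with 'exercise_id'
    and a timezone-aware 'created_at' datetime.
    """
    cutoff = now - RETENTION
    return [s["exercise_id"] for s in sessions if s["created_at"] < cutoff]


# Made-up records for illustration.
now = datetime(2030, 1, 1, tzinfo=timezone.utc)
sessions = [
    {"exercise_id": 1, "created_at": datetime(2015, 6, 1, tzinfo=timezone.utc)},
    {"exercise_id": 2, "created_at": datetime(2025, 6, 1, tzinfo=timezone.utc)},
]
to_delete = expired_ids(sessions, now)
print(to_delete)
```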

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Failure scenarios/bottlenecks

We want our location and HR servers to be reliable, so we should have multiple backup servers to keep meeting high request demand.

We should also have backup DBs to prevent data loss. The server that clears expired data can also handle duplicating data.

Future improvements

We could add a feature that lets users follow other users on the application, so that users can view the past training sessions of other users. In that case we must rethink how we partition the data, because there could be popular users whose training sessions are viewed by many people.

One possible way of dividing the data is by ExerciseID or SessionID. Data with the same ExerciseID is kept in the same DB, so when people view the sessions of popular users, the request load is spread across the databases.

Score: 9