Difficulty: easy
Solution
System requirements
Functional:
List functional requirements for the system (Ask interviewer if stuck)...
- Users can add, delete, and update tags
- Tags should support different data types
- Support searching and filtering by tag
- Support tag recommendation (optional)
- Support tag normalization
Non-Functional:
- Highly available
- Easy to scale
- High performance: fast tag retrieval and search
Capacity estimation
Estimate the scale of the system you are going to design...
Assume there are 10,000 DAU and each user has 500 tags on average, so there are 500 * 10,000 = 5,000,000 tags in total.
Assume each user adds 5 tags per day, so 5 * 10,000 = 50,000 tags are added per day.
If 20% of the data added per day is later updated, that is 50,000 * 20% = 10,000 updates per day.
Assume each tag is 80 bytes; then we need at least 5,000,000 * 80 bytes = 400,000,000 bytes = 0.4 GB.
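The arithmetic above can be checked with a short script. This is only a sketch of the same back-of-the-envelope estimate. The constant names are made up for illustration, and the daily growth and writes-per-second figures are derived values that the estimate itself does not state.

```python
# Inputs taken from the capacity estimate (names are illustrative).
DAU = 10_000
TAGS_PER_USER = 500
NEW_TAGS_PER_USER_PER_DAY = 5
UPDATE_PERCENT = 20          # share of daily additions that are updated
TAG_SIZE_BYTES = 80
SECONDS_PER_DAY = 24 * 60 * 60

total_tags = DAU * TAGS_PER_USER
tags_added_per_day = DAU * NEW_TAGS_PER_USER_PER_DAY
updates_per_day = tags_added_per_day * UPDATE_PERCENT // 100

# Storage in decimal gigabytes (1 GB = 10^9 bytes).
storage_gb = total_tags * TAG_SIZE_BYTES / 1e9

# Derived figures (not in the original estimate): daily growth and write rate.
daily_growth_mb = tags_added_per_day * TAG_SIZE_BYTES / 1e6
writes_per_second = (tags_added_per_day + updates_per_day) / SECONDS_PER_DAY

print(f"total tags:        {total_tags:,}")
print(f"tags added/day:    {tags_added_per_day:,}")
print(f"updates/day:       {updates_per_day:,}")
print(f"storage:           {storage_gb:.1f} GB")
print(f"daily growth:      {daily_growth_mb:.1f} MB")
print(f"average writes/s:  {writes_per_second:.2f}")
```

At under one write per second on average and well under 1 GB of data, a single relational database instance should handle this load comfortably. Replication is then mainly a matter of availability rather than capacity.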
API design
Define what APIs are expected from the system...
RESTful APIs:
@Create
void create(String tagName)
@Batch_Create
void batchCreate(List<String> tagNames)
@Update
void update(Tag tag)
@Delete
void delete(Tag tag)
@Get
List<Tag> getTags()
List<Tag> recommendTags(String input)
List<Tag> normalizeTags(List<Tag> tags)
List<Entity> searchByTags(List<Tag> tags)
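The design does not say what normalizeTags should do, so the following is a minimal sketch under assumed rules: apply Unicode NFKC normalization, collapse whitespace, case-fold, drop empty names, and remove duplicates while keeping first-seen order. The function names are hypothetical and operate on tag names rather than Tag objects.

```python
import unicodedata


def normalize_tag(name: str) -> str:
    """Return a canonical form of one tag name (assumed rules, see lead-in)."""
    text = unicodedata.normalize("NFKC", name)
    # split() with no arguments trims the ends and collapses inner whitespace.
    return " ".join(text.split()).casefold()


def normalize_tags(names: list[str]) -> list[str]:
    """Normalize a list of tag names, dropping blanks and duplicates."""
    seen = set()
    result = []
    for name in names:
        tag = normalize_tag(name)
        if tag and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return result


print(normalize_tags(["  Machine  Learning ", "machine learning", "AI", "", "ai"]))
# prints ['machine learning', 'ai']
```

Normalizing on write (in create and batchCreate) keeps the stored names canonical, so search and suggestions can compare names with plain equality.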
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
Data model:
Tag {
id: int (autoincrement),
name: varchar,
metadata: varchar
}
TagCategory {
id: int,
name: varchar,
}
TagToCategoryMapping {
tagId: int,
categoryId: int
}
Choice of Database:
- Relational database (e.g., PostgreSQL)
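As a rough illustration of the data model above, the sketch below creates the three tables in an in-memory SQLite database, which stands in for PostgreSQL so the example runs without a server. The snake_case table names, the UNIQUE constraints, and the foreign keys with cascading deletes are assumptions, not part of the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE tag (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    name     TEXT NOT NULL UNIQUE,   -- assumption: tag names are unique
    metadata TEXT
);
CREATE TABLE tag_category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tag_to_category_mapping (
    tag_id      INTEGER NOT NULL REFERENCES tag(id) ON DELETE CASCADE,
    category_id INTEGER NOT NULL REFERENCES tag_category(id) ON DELETE CASCADE,
    PRIMARY KEY (tag_id, category_id)
);
""")

conn.execute("INSERT INTO tag (name, metadata) VALUES (?, ?)",
             ("python", '{"color": "blue"}'))
conn.execute("INSERT INTO tag_category (id, name) VALUES (?, ?)", (1, "language"))
conn.execute("INSERT INTO tag_to_category_mapping (tag_id, category_id) VALUES (?, ?)",
             (1, 1))

# Join through the mapping table to list each tag with its category.
rows = conn.execute("""
    SELECT t.name, c.name
    FROM tag AS t
    JOIN tag_to_category_mapping AS m ON m.tag_id = t.id
    JOIN tag_category AS c ON c.id = m.category_id
""").fetchall()
print(rows)  # prints [('python', 'language')]
```

The composite primary key on the mapping table prevents the same tag from being linked to a category twice.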
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...
graph TD;
  A[User Interface] -- HTTP Requests --> B[API Gateway];
  B -- Queries --> C[Tagging Service];
  C -- Manages Data --> D[Database];
  C -- Utilizes --> E[NLP Module];
  E -- Provides Suggestions --> C;
  C -- Sends Notifications --> F[Message Broker];
  F -- Handles Messaging --> C;
  B -- Forwards Requests --> G[Microservice 1];
  G -- Interacts with --> D;
  B -- Forwards Requests --> H[Microservice 2];
  H -- Interacts with --> D;
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
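- How to efficiently store and retrieve tags in the DB? (database schema design)
We store the tags in the relational database.
Indexing:
- Index the tag name with a unique index so that a lookup by name resolves to its tagId quickly; the primary key already indexes tagId. Note that PostgreSQL does not keep a table clustered on an index automatically (its CLUSTER command reorders the table only once).
- Index itemId in the tag-to-item mapping table for fast item retrieval (this mapping table is not part of the data model above and would need to be added)
Data storage mechanism:
- Store metadata as JSON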
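The indexing and storage points above can be sketched as follows, again with in-memory SQLite standing in for PostgreSQL. The item_tag mapping table, the index name, and the search_by_tags helper are hypothetical. The query returns items that carry all of the requested tags, which is one possible reading of searchByTags.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,   -- UNIQUE also creates an index on name
    metadata TEXT                    -- JSON document stored as text
);
CREATE TABLE item_tag (              -- hypothetical tag-to-item mapping
    item_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (tag_id, item_id)    -- serves "items for a tag" lookups
);
CREATE INDEX idx_item_tag_item ON item_tag(item_id);  -- "tags for an item"
""")

conn.executemany("INSERT INTO tag (id, name, metadata) VALUES (?, ?, ?)", [
    (1, "python", json.dumps({"color": "blue"})),
    (2, "database", json.dumps({"color": "green"})),
    (3, "cache", json.dumps({})),
])
conn.executemany("INSERT INTO item_tag (item_id, tag_id) VALUES (?, ?)", [
    (100, 1), (100, 2),
    (101, 1),
    (102, 1), (102, 2), (102, 3),
])


def search_by_tags(conn, tag_names):
    """Return ids of items that carry every tag in tag_names."""
    names = list(dict.fromkeys(tag_names))  # drop duplicates, keep order
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT it.item_id
        FROM item_tag AS it
        JOIN tag AS t ON t.id = it.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY it.item_id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY it.item_id
    """
    return [row[0] for row in conn.execute(sql, (*names, len(names)))]


print(search_by_tags(conn, ["python", "database"]))  # prints [100, 102]
metadata = json.loads(
    conn.execute("SELECT metadata FROM tag WHERE name = ?", ("python",)).fetchone()[0]
)
print(metadata["color"])  # prints blue
```

In PostgreSQL the metadata column could be a jsonb column instead of text, which allows indexing and querying inside the JSON document.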
- How to implement tag suggestions?
Use typeahead search.
Create a table that stores popular or trending tags for generating suggestions:
tagFrequency {
id: int -> primary key,
tagName: varchar,
frequency: int
}
We need a tagSuggestionService that reads this table to serve the recommendations.
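One way such a service might answer typeahead queries is to keep the tag names sorted in memory, find the prefix range with binary search, and rank the matches by frequency. The sketch below assumes the contents of tagFrequency have been loaded into a dictionary. The class name, method names, and sample data are hypothetical.

```python
import bisect


class TagSuggester:
    """Prefix lookup over tag names, ranked by frequency (illustrative)."""

    def __init__(self, frequencies: dict[str, int]):
        self._frequencies = dict(frequencies)
        self._names = sorted(self._frequencies)

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        if not prefix:
            return []
        # Every name starting with prefix sorts between prefix itself and
        # prefix followed by the highest Unicode code point.
        lo = bisect.bisect_left(self._names, prefix)
        hi = bisect.bisect_right(self._names, prefix + "\U0010ffff")
        matches = self._names[lo:hi]
        # Most frequent first; ties are broken alphabetically.
        matches.sort(key=lambda name: (-self._frequencies[name], name))
        return matches[:limit]


suggester = TagSuggester({
    "python": 120, "pytorch": 80, "pandas": 95, "postgres": 60, "py": 5,
})
print(suggester.suggest("py"))           # prints ['python', 'pytorch', 'py']
print(suggester.suggest("p", limit=2))   # prints ['python', 'pandas']
```

A trie with precomputed top results per node is a common alternative when the tag set is large; the sorted-list version is easier to rebuild whenever the frequency table changes.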
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Trade-offs:
- Consistency vs Performance: Consistent normalization may require more computational resources, potentially affecting system performance
- Scalability vs flexibility
Tech Choices:
- Database:
- Relational DB vs NoSQL: NoSQL is good at handling unstructured data, which is useful for storing diverse tag structures
- Use microservices architecture: could enhance scalability and maintainability
- Utilize NLP for tag suggestions based on content analysis
A message broker such as Kafka could be used to deliver real-time updates for tag suggestions
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
SPOF (single point of failure): replicate the data and use master-slave (primary-replica) servers so that a replica can take over if the primary fails.
- Utilize in-memory data stores like Redis for caching frequently accessed tags and metadata.
- Implement indexing on tag fields to speed up search operations.
- Utilize search technologies like Elasticsearch for efficient full-text search capabilities.
- Employ sharding techniques to distribute data across multiple nodes and balance the load effectively.
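The caching suggestion above usually follows the cache-aside pattern: read from the cache first, fall back to the database on a miss, and invalidate the entry when the tag changes. The sketch below uses a small in-process LRU cache as a stand-in for Redis so that it runs on its own. The class name, the loader callback, and the capacity are assumptions.

```python
from collections import OrderedDict


class TagCache:
    """Cache-aside lookup with LRU eviction (in-process stand-in for Redis)."""

    def __init__(self, loader, capacity=1000):
        self._loader = loader            # called on a miss, e.g. a DB query
        self._capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, tag_id):
        if tag_id in self._entries:
            self._entries.move_to_end(tag_id)   # mark as most recently used
            self.hits += 1
            return self._entries[tag_id]
        self.misses += 1
        value = self._loader(tag_id)
        self._entries[tag_id] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict least recently used
        return value

    def invalidate(self, tag_id):
        """Call after an update or delete so stale data is not served."""
        self._entries.pop(tag_id, None)


# Usage: a dictionary plays the role of the database.
database = {1: "python", 2: "java", 3: "go"}
db_calls = []


def load_tag(tag_id):
    db_calls.append(tag_id)
    return database[tag_id]


cache = TagCache(load_tag, capacity=2)
cache.get(1)
cache.get(1)      # served from the cache, no second database call
print(db_calls)   # prints [1]
```

With a real Redis deployment the same flow applies, with a time-to-live on each entry as an extra guard against stale data.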
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?
Score: 8