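Design a Food Delivery Service

Difficulty: hard

Create an efficient food delivery service that connects users with restaurants and delivers orders to their doorstep. Design an intuitive interface that lets users browse menus, place orders, and track deliveries in real time. Implement features such as order customization, secure payment processing, and driver tracking to give users and restaurants a seamless, satisfying delivery experience. Examples include DoorDash and Uber Eats.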

Solution

System requirements

Functional:

For Customers:

  • Customers should be able to search for restaurants by cuisine type, menu items, and more.
  • They can build a cart, add items to it, and place their order.
  • Once an order is placed, they'll receive updates on its status.
  • They can track their order directly within the app.
  • Orders can be canceled if needed.
  • Payment for orders should be straightforward and secure.
  • Customers can create or update their account and contact details.

For Restaurants:

  • Restaurants can set up their profile, add or update their menu items, and upload photos.
  • They'll get notifications about incoming orders, which delivery driver is assigned, and can update the order's progress.
  • If a restaurant closes or stops taking online orders, they can easily exit the platform.

For DoorDash Drivers:

  • Drivers will receive alerts about available orders nearby, which they can choose to take.
  • They'll know when orders are ready for pickup.
  • They can communicate any issues during order pickup or delivery to the customer or restaurant.
  • Drivers can opt out of the service if they decide to stop working.

Non-Functional:

  • Latency: The system must be fast, especially the search and ordering processes, to keep users satisfied. Changes in restaurant or menu data should appear quickly but can have some delay.
  • Consistency: New restaurant or menu data may not need to be shown immediately, but once an order is placed, everyone involved—customer, restaurant, and driver—should see consistent order details.
  • Availability: The system must be highly reliable to ensure that it's always operational for customers, restaurants, and drivers. No one wants the system to crash, especially not the restaurants.
  • High Throughput: The system should handle large volumes of users and orders smoothly, even during peak times.

Capacity estimation

Let's imagine we're serving customers across 10 million area codes, and we expect this number to increase.

On average, there could be 100 restaurants per area code, and each restaurant might offer about 15 different dishes. This results in a total of 15 billion dish records.

With 20 million customers in this system:

If each customer places two orders daily, that totals 40 million orders per day.
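
The estimates so far can be checked with a few lines of arithmetic. This is a rough sketch: the inputs come from the figures above, while `PEAK_FACTOR` is an assumed multiplier for lunch and dinner rushes that the text does not specify.

```python
# Back-of-envelope numbers taken from the estimates above.
AREA_CODES = 10_000_000
RESTAURANTS_PER_AREA_CODE = 100
DISHES_PER_RESTAURANT = 15
CUSTOMERS = 20_000_000
ORDERS_PER_CUSTOMER_PER_DAY = 2
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption (not from the text): peak traffic is 5x the daily average.
PEAK_FACTOR = 5

restaurants = AREA_CODES * RESTAURANTS_PER_AREA_CODE
dish_records = restaurants * DISHES_PER_RESTAURANT
orders_per_day = CUSTOMERS * ORDERS_PER_CUSTOMER_PER_DAY
avg_orders_per_second = orders_per_day / SECONDS_PER_DAY
peak_orders_per_second = avg_orders_per_second * PEAK_FACTOR

print(f"restaurants:           {restaurants:,}")
print(f"dish records:          {dish_records:,}")
print(f"orders per day:        {orders_per_day:,}")
print(f"average orders/second: {avg_orders_per_second:,.0f}")
print(f"assumed peak orders/s: {peak_orders_per_second:,.0f}")
```

The average works out to roughly 463 orders per second, so the write path is modest compared with the read-heavy search traffic.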

Order peaks usually fluctuate throughout the week, with weekends typically busier than weekdays. The busiest times are likely around lunch and dinner in each region.

From a system perspective, searching for menus and restaurants will primarily involve reading data, while placing orders will require more data writing. After an order is delivered and eaten, the likelihood of customers revisiting their past orders is minimal.

API design

Ordering Food:

  • Search: Users can search using a variety of terms.
  • Add to Cart: Users can select menu items to add to their shopping cart.
  • Order: Customers can place their order from the items in their cart.
  • Status: Users can check the status of their order using the order ID.
  • Retrieve Order: Customers can look up details of an existing order.
  • Cancel Order: Allows customers to cancel an order using the order ID.
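
As an illustration, the ordering operations above could map onto an interface like the following. This is an in-memory sketch, not a definitive API: the class name `OrderingApi`, the method names, the status strings, and the rule that only orders still in the `PLACED` state can be canceled are all assumptions made for the example.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    customer_id: str
    items: dict[str, int]  # menu item ID -> quantity
    status: str = "PLACED"


class OrderingApi:
    """In-memory stand-in for the ordering endpoints (hypothetical names)."""

    def __init__(self) -> None:
        self._carts: dict[str, dict[str, int]] = {}
        self._orders: dict[int, Order] = {}
        self._ids = itertools.count(1)

    def add_to_cart(self, customer_id: str, item_id: str, quantity: int = 1) -> dict[str, int]:
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        cart = self._carts.setdefault(customer_id, {})
        cart[item_id] = cart.get(item_id, 0) + quantity
        return dict(cart)

    def place_order(self, customer_id: str) -> Order:
        # Placing an order consumes the customer's cart.
        cart = self._carts.pop(customer_id, None)
        if not cart:
            raise ValueError("cart is empty")
        order = Order(next(self._ids), customer_id, cart)
        self._orders[order.order_id] = order
        return order

    def get_order(self, order_id: int) -> Order:
        return self._orders[order_id]

    def get_status(self, order_id: int) -> str:
        return self._orders[order_id].status

    def cancel_order(self, order_id: int) -> str:
        order = self._orders[order_id]
        # Assumed rule: cancellation is only allowed before the restaurant accepts.
        if order.status != "PLACED":
            raise ValueError(f"cannot cancel an order in state {order.status}")
        order.status = "CANCELED"
        return order.status
```

In a real deployment each method would sit behind an authenticated HTTP endpoint; the sketch only shows the shape of the requests and responses.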

Managing Profiles:

  • Create Profile: Customers can set up a new profile.
  • Update Profile: Allows users to make changes to their profile details.
  • Update Address: Users can change their address details.
  • Update Payment Method: Customers can change or update their method of payment.

Database design

The selection of a database often hinges on various factors such as the volume of data to be handled, scalability, ease of partitioning, and replication capabilities. Application managers sometimes use a combination of different databases to meet specific requirements. For needs involving ACID principles (Atomicity, Consistency, Isolation, Durability), a relational database is generally preferred over a NoSQL alternative.

Both NoSQL and relational databases offer unique benefits and limitations, and the choice between them should be carefully considered based on the required functionalities.

Given the large volumes of data anticipated, including restaurant details, menu descriptions, and user and delivery personnel information, a NoSQL database such as the wide-column store Cassandra might be appropriate. This is especially relevant if the data structure varies significantly across restaurants, which might make it challenging to conform to a fixed relational schema.

Images related to restaurants and menu items are ideally stored in an object storage service like Amazon S3, due to its efficiency in handling large amounts of unstructured data.

Since ordering is a transactional process, it’s best managed using a transaction-capable relational database such as Oracle, MySQL, or Postgres.
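
To show why a transactional store suits ordering, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for Oracle, MySQL, or Postgres. The table and column names are assumptions for illustration. The order row and its line items are written in one transaction, so a failure on any item leaves no partial order behind.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema: one row per order, one row per line item.
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id TEXT NOT NULL,
            status TEXT NOT NULL
        );
        CREATE TABLE order_items (
            order_id INTEGER NOT NULL REFERENCES orders(id),
            item_id TEXT NOT NULL,
            quantity INTEGER NOT NULL CHECK (quantity > 0)
        );
        """
    )


def place_order(conn: sqlite3.Connection, customer_id: str, items: dict[str, int]) -> int:
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the order and its items are stored together or not at all.
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer_id, status) VALUES (?, 'PLACED')",
            (customer_id,),
        )
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_items (order_id, item_id, quantity) VALUES (?, ?, ?)",
            [(order_id, item_id, qty) for item_id, qty in items.items()],
        )
    return order_id


def count_orders(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

For example, calling `place_order` with a quantity of zero violates the `CHECK` constraint, raises `sqlite3.IntegrityError`, and rolls back the already-inserted order row.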

High-level design

UI Client Overview

The application will be accessible on various devices, including mobile phones, tablets, and web browsers. Depending on whether the user is a customer, restaurant staff, a delivery driver (doordasher), or an admin, the interface they interact with will vary. Each version of the interface is tailored to its specific user, connecting them to the appropriate services. For instance, the search functionality is managed by the Restaurant Search Service, while orders are processed by the Ordering Service.

Search Ecosystem

A key feature of our system is the robust search functionality which allows customers to explore menu items, cuisines, and restaurants. This entry point is vital for users who don't have a predetermined choice and rely on our personalized recommendations, which are informed by their past searches and orders. To handle the heavy reading load of this search process, we might integrate established technologies like Elasticsearch or Apache Solr, which excel in fast data retrieval and are based on Apache Lucene.

For updating search data, we'll employ a queue to manage asynchronous updates. When a restaurant's profile or menu is updated in our database, these changes are queued and then processed by a data indexer. This indexer formats the new data correctly and updates the search cluster to reflect changes. The Restaurant Search Service then uses this data to respond to user queries, potentially incorporating geospatial searches to show users nearby dining options.
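
The asynchronous update path can be sketched as follows. This is a toy model under stated assumptions: `SearchIndex` is a hand-rolled inverted index standing in for an Elasticsearch or Solr cluster, `queue.Queue` stands in for the message queue, and all names are hypothetical. Geospatial filtering is left out.

```python
import queue
from collections import defaultdict


class SearchIndex:
    """Toy inverted index standing in for a real search cluster."""

    def __init__(self) -> None:
        self._docs = {}                    # restaurant ID -> set of indexed terms
        self._postings = defaultdict(set)  # term -> set of restaurant IDs

    def upsert(self, restaurant_id: str, text: str) -> None:
        # Drop stale terms first so a menu change leaves no old matches behind.
        for term in self._docs.get(restaurant_id, set()):
            self._postings[term].discard(restaurant_id)
        terms = set(text.lower().split())
        self._docs[restaurant_id] = terms
        for term in terms:
            self._postings[term].add(restaurant_id)

    def search(self, query: str) -> set:
        # Return restaurants that match every term in the query.
        terms = query.lower().split()
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results


def run_indexer(updates: queue.Queue, index: SearchIndex) -> int:
    """Drain queued (restaurant_id, text) updates into the index."""
    processed = 0
    while True:
        try:
            restaurant_id, text = updates.get_nowait()
        except queue.Empty:
            return processed
        index.upsert(restaurant_id, text)
        processed += 1
```

Because the indexer runs behind the queue, searches may briefly return stale results, which matches the relaxed consistency requirement for restaurant and menu data.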

Ordering Service

This service manages the complete ordering process from menu selection and cart management to payment processing, which is handled through an external payment gateway. It also records all transaction details in an Orders database due to the transactional nature of ordering. Additionally, customers can view their complete order history and cancel orders if necessary.

Order Fulfillment Service

This service ensures the smooth operation from when the restaurant accepts an order to when the order is ready for pickup. It communicates any changes or delays to the customer and notifies the doordasher when the order is ready. Both customers and doordashers can check the status of the orders through this service.

User Profile Management & Preferences Service

This service manages the creation and updates of profiles for all system users including customers, restaurant staff, doordashers, and admins. Each user role has specific preferences and settings, which are managed here, allowing for personalized interactions with the system according to their role and needs.

Doordasher Dispatch Service

This service caters specifically to delivery drivers. It allows doordashers to view available pickup orders, accept them, and review past orders. It also facilitates communication between the doordasher, the customer, and the restaurant in case of issues during the pickup or delivery process.

Restaurant Profile Service

Responsible for all restaurant-related data, this service allows restaurants to onboard, update their profiles, manage menus, and upload images. It also handles the financial transactions and updates for each restaurant, ensuring they are paid for the orders processed.

External Payment Gateway

This component integrates with well-known payment providers like PayPal, Amazon Pay, Apple Pay, and major credit card companies to process payments securely and efficiently at the moment an order is confirmed.

Notification Service

This critical service manages communications across the system, sending tailored notifications to customers, restaurants, and doordashers based on their preferences. Notifications might be delivered via push notifications, emails, texts, or in-app alerts, depending on the user's settings.

This holistic component design ensures that all parts of the service ecosystem work together seamlessly to provide a robust and user-friendly experience.

flowchart TD
    A[Customer] -->  |Places Order| B(Ordering Service)
    B -->  |Processes Order| C{Order Fulfillment}
    C -->  |Notifies Customer| D(Notification Service)
    C -->  |Coordinates Delivery| E(Doordasher Dispatch)

    %% Additional Detail %%
    subgraph Services
        B(Ordering Service) --> |REST APIs| I(Inventory Management)
        B(Ordering Service) --> |Websockets| F(Payment Gateway)
        C{Order Fulfillment} --> |HTTP Request| G(Order Preparation)
        E(Doordasher Dispatch) --> |Geolocation API| H(Driver Tracking Service)
    end

Request flows

Let's clarify some terms first:

  • Ordering Service: OS
  • Order Fulfillment Service: OFS
  • Notification Service: NS
  • Doordasher Dispatch Service: DDS
  • Message: MSG

Here’s a breakdown of the order processing workflow after a customer places an order using either a mobile or web client through the OS:

  1. Order Placement:
  • OS sends a MSG (#1) to the queue signaling that an order has been placed, alerting downstream services to begin their processes.
  • NS picks up MSG (#1) and informs both the restaurant and the customer that an order is placed and awaiting acceptance.
  2. Order Acceptance:
  • Upon acceptance by the restaurant via OFS, a MSG (#2) is sent to the queue confirming the order's acceptance.
  • DDS reads MSG (#2) and dispatches a location-specific MSG (#3) to the queue. DDS might also automatically assign the order to an available doordasher, sending out a corresponding message.
  • NS notifies the customer that the restaurant has accepted the order using MSG (#2).
  • NS alerts nearby doordashers about the new order using MSG (#3) for potential pickup.
  3. Doordasher Assignment:
  • A doordasher accepts the order through DDS, which then sends a MSG (#4) to the queue regarding the doordasher’s assignment.
  • NS uses MSG (#4) to update both the customer and the restaurant about which doordasher will handle the delivery.
  4. Order Ready for Pickup:
  • When the food is ready, the restaurant updates the order's status via OFS, which issues a MSG (#5) indicating readiness for pickup.
  • NS picks up MSG (#5) and notifies both the assigned doordasher and the customer that the food is ready to be picked up.
  5. Order Pickup:
  • After the doordasher picks up the order, the restaurant staff updates the order status using OFS, which then sends a MSG (#6) noting that the order has been picked up.
  • NS reads MSG (#6) and informs the customer that the order is on its way.
  6. Delivery Completion:
  • The doordasher delivers the food and marks the order as completed. DDS publishes a MSG (#7) to record the completed delivery.
  • OFS reads MSG (#7) and updates the order status to completed.
  • NS reads MSG (#7) and notifies both the restaurant and the customer that the food has been delivered.
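
The workflow above implies an order lifecycle that can be modeled as a small state machine. The sketch below is one possible reading: the status names are assumptions, and it follows the strictly sequential order of the steps as written, although a real system might let driver assignment and food preparation finish in either order. MSG #3 is a broadcast to nearby drivers and does not change the order's state.

```python
from enum import Enum


class OrderStatus(Enum):
    PLACED = "placed"                      # after MSG #1
    ACCEPTED = "accepted"                  # after MSG #2
    DRIVER_ASSIGNED = "driver_assigned"    # after MSG #4
    READY_FOR_PICKUP = "ready_for_pickup"  # after MSG #5
    PICKED_UP = "picked_up"                # after MSG #6
    DELIVERED = "delivered"                # after MSG #7


# Each status maps to the statuses it may move to next.
ALLOWED_NEXT = {
    OrderStatus.PLACED: {OrderStatus.ACCEPTED},
    OrderStatus.ACCEPTED: {OrderStatus.DRIVER_ASSIGNED},
    OrderStatus.DRIVER_ASSIGNED: {OrderStatus.READY_FOR_PICKUP},
    OrderStatus.READY_FOR_PICKUP: {OrderStatus.PICKED_UP},
    OrderStatus.PICKED_UP: {OrderStatus.DELIVERED},
    OrderStatus.DELIVERED: set(),
}


def advance(current: OrderStatus, new: OrderStatus) -> OrderStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED_NEXT[current]:
        raise ValueError(f"illegal transition: {current.name} -> {new.name}")
    return new
```

Rejecting out-of-order transitions is one way to keep the customer, restaurant, and driver views of an order consistent when messages arrive late or twice.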

Detailed component design

The system architecture is designed around a microservices model, utilizing the publisher-subscriber pattern with a messaging technology such as Kafka, RabbitMQ, ActiveMQ, Amazon SNS, or Amazon MQ. In this setup, each microservice communicates by publishing messages to, and subscribing to, queues, channels, or topics. This decouples the services from one another, meaning that microservice A does not need to know about microservice B’s endpoint when it sends out a message. Thus, publishers are unaware of the subscribers, and vice versa. The Pub/Sub system acts as a middleman, facilitating communication between all involved services.
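
The decoupling described here can be illustrated with a tiny in-process broker. This is only a sketch of the pattern, not a substitute for Kafka or RabbitMQ: the `Broker` class and the topic name `order.accepted` are hypothetical, and delivery is synchronous with no persistence or retries.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Minimal in-process publish-subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        # The publisher only knows the topic name, never the subscribers.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Usage: the notification and dispatch services both react to one message.
broker = Broker()
notification_inbox = []
dispatch_inbox = []
broker.subscribe("order.accepted", notification_inbox.append)
broker.subscribe("order.accepted", dispatch_inbox.append)
delivered_to = broker.publish("order.accepted", {"order_id": 42})
```

Adding another subscriber later requires no change to the service that publishes `order.accepted`.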

Furthermore, each microservice should operate with its own database, adhering to the database-per-service principle, which prevents services from accessing each other's databases directly. This avoids the traditional monolithic architecture where a single large schema contains all tables. Instead, microservices architecture encourages functional partitioning, assigning specific tables or groups of tables to particular microservices. This method requires thoughtful consideration, especially when integrating both relational and non-relational databases.

Data Partitioning

As data volumes grow, storing all information in a single database instance becomes impractical. Restaurant data, for instance, can be partitioned based on different criteria such as area code, restaurant ID, or menu items. Each partitioning approach offers unique advantages and potential drawbacks, demanding careful planning to optimize performance and manage any possible negative impacts.

graph LR
    A[Restaurant Data] --> |Partitioning by Area Code| B[Partition 1]
    A --> |Partitioning by Restaurant ID| C[Partition 2]
    A --> |Partitioning by Menu Items| D[Partition 3]
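
A hash-based assignment is one simple way to implement these schemes. The sketch below assumes a fixed partition count and hypothetical key formats. It uses SHA-256 instead of Python's built-in `hash()`, because the built-in is randomized per process for strings and would route the same key differently after a restart.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a stable partition number."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


NUM_PARTITIONS = 16  # assumed cluster size for the example

# Partitioning by area code: every restaurant in one area lands on one
# partition, which suits "restaurants near me" queries but can create
# hot spots in dense areas.
by_area_a = partition_for("area:94103", NUM_PARTITIONS)
by_area_b = partition_for("area:94103", NUM_PARTITIONS)

# Partitioning by restaurant ID: load spreads more evenly, but a search
# within one area may need to query several partitions.
by_restaurant = partition_for("restaurant:r-1001", NUM_PARTITIONS)
```

Plain modulo hashing reshuffles most keys when the partition count changes; consistent hashing is a common refinement if partitions are added often.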

Caching

To enhance search response times, frequently ordered or searched items, as well as images of restaurants and dishes, can be stored in a distributed cache. This allows the Restaurant Search Service to quickly access and suggest popular items without continuously querying the main search infrastructure. Technologies like Redis, Hazelcast, and Memcached are commonly used for caching, employing strategies like Least Recently Used (LRU) or Least Frequently Used (LFU) for managing cache eviction.

While a Content Delivery Network (CDN) could be used to distribute content based on geographic location, it may be excessive for our current needs.

graph TD
    A[Main Search Infrastructure] -- Cache -->  B[Redis]
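
To make the eviction strategy concrete, here is a minimal single-process LRU cache. It is a sketch of the policy only, with a hypothetical `LRUCache` class, not a model of how Redis or Memcached implement eviction internally.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as a use, so move the entry to the "recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            # Remove the entry at the "oldest" end.
            self._items.popitem(last=False)

    def __len__(self) -> int:
        return len(self._items)
```

The search service would consult such a cache first and fall back to the search cluster on a miss, storing the result for later requests.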

Security

To secure data transmission, all communication between the web and mobile clients and the backend is encrypted with HTTPS (TLS). OAuth 2.0 is used for authorization and token management. If using Kafka, brokers can additionally be configured with SASL authentication and TLS encryption, and ACLs can restrict access per topic.

Load Balancing

Multiple instances of each service are maintained to handle varying loads, with load balancers placed in front of services to distribute requests efficiently using methods such as Least Connections or Round Robin. This setup prevents any single instance from being overwhelmed and helps improve overall response times. In environments like Kafka, the queue itself manages load balancing by distributing partitions among consumers within a consumer group, ensuring each consumer handles a fair share of the load.
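
The two balancing methods named above can be sketched in a few lines. Both classes are hypothetical illustrations of the selection logic; a production load balancer would also handle health checks and concurrency.

```python
import itertools


class RoundRobinBalancer:
    """Hands out instances in a fixed rotation."""

    def __init__(self, instances) -> None:
        instances = list(instances)
        if not instances:
            raise ValueError("at least one instance is required")
        self._cycle = itertools.cycle(instances)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Sends each request to the instance with the fewest active requests."""

    def __init__(self, instances) -> None:
        self._active = {instance: 0 for instance in instances}
        if not self._active:
            raise ValueError("at least one instance is required")

    def acquire(self) -> str:
        # Ties go to the instance that was registered first.
        instance = min(self._active, key=self._active.get)
        self._active[instance] += 1
        return instance

    def release(self, instance: str) -> None:
        # Call when the request finishes so the count stays accurate.
        if self._active[instance] > 0:
            self._active[instance] -= 1
```

Round Robin is enough when requests cost about the same; Least Connections adapts better when some requests, such as large searches, hold an instance for longer.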

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Failure scenarios/bottlenecks

Replication and Fault Tolerance

Ensuring system reliability involves identifying potential points of failure and implementing backup solutions for each critical component. It’s crucial that each service in our architecture is designed to be horizontally scalable. This includes having multiple nodes for our NoSQL databases and search systems, as well as partitioning and replication capabilities within our queuing systems.

Every component should be capable of being scaled independently to manage specific demands. For example, autoscaling should be activated to automatically increase the number of instances during peak load times, ensuring smooth operation under varying loads.

Moreover, should any node or queue partition fail, alternative instances need to be ready to immediately take over their responsibilities. The system should also support self-healing processes where failed nodes can perform cleanup operations and restart without manual intervention, maintaining continuous service availability.
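
On the client side, failover can be as simple as trying replicas in order until one responds. The sketch below is an assumption-laden illustration: the function name is hypothetical, replicas are modeled as zero-argument callables, and only `ConnectionError` is treated as a sign that a node is down.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Return the first successful replica response, trying them in order."""
    last_error = None
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:
            # This node looks unavailable; remember why and try the next one.
            last_error = exc
    raise RuntimeError("all replicas failed") from last_error


def failed_node() -> str:
    raise ConnectionError("node down")


def healthy_node() -> str:
    return "order 42: delivered"


result = call_with_failover([failed_node, healthy_node])
```

Managed platforms usually pair this with health checks so that failed nodes are removed from rotation and restarted without manual intervention.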

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?


Score: 9