设计内存文件系统

难度: Hard

Submission

运行时间: 31 ms

内存: 16.5 MB

class TrieNode:    
    def __init__(self):
        self.child = collections.defaultdict(TrieNode)
        self.contents = "" # save file's contents eg hello in "d" file of "/a/b/c/d"

class FileSystem:
    def __init__(self):
        self.root = TrieNode()
               
    def _insert(self, path, content=""):
        cur = self.root
        for c in path.split("/")[1:]:
            cur = cur.child[c]
        cur.contents += content # save "hello" in d or it can be "". 1 file only store 1 content. ''means directory
            
    def _search(self, path):    # search for path and return node at the end of path
        cur = self.root
        for c in path.split("/")[1:]:
            if c in cur.child:
                cur = cur.child[c]
        return cur # correspond to the last file name or dir name
    
    # 给定路径字符串，若对应一个目录，则输出其中包含的目录和文件（字典序）；若对应一个文件，则只输出该文件名
    def ls(self, path: str) -> List[str]:
        node = self._search(path)
        if node.contents: # if node corresponds to a file
            # return [node.contents] # wrong
            return [path.split('/')[-1]]    # 这里只输出一个[file name]
        else: # node is a folder. node.child.keys() is the next level files within the folder
            # the **list of file and directory names** in this directory.
            return sorted(node.child.keys())    # lexicographic order
        
    # 创建目录，若目录不存在，则递归创建缺少的目录
    def mkdir(self, path: str) -> None:
        self._insert(path)
        
    # 在文件中追加内容，若文件不存在，则新建
    def addContentToFile(self, filePath: str, content: str) -> None:
        self._insert(filePath, content)
        
    # 从文件中读取内容并返回
    def readContentFromFile(self, filePath: str) -> str:
        node = self._search(filePath)
        return node.contents


# [('FileSystem', [], None), 
# ('mkdir', ['/goowmfn'], None),
#  ('ls', ['/goowmfn'], []), 
#  ('ls', ['/'], ['goowmfn']), 
#  ('mkdir', ['/z'], None), 
#  ('ls', ['/'], ['goowmfn', 'z']),
#   ('ls', ['/'], ['goowmfn', 'z']), 
#   ('addContentToFile',['/goowmfn/c', 'shetopcy'], None), 
#   ('ls', ['/z'], []), 
#   ('ls', ['/goowmfn/c'], ['c']), 
#   ('ls', ['/goowmfn'], ['c'])]

"""
Trie solution: search/add/insert都是O(L)的时间复杂度，L是filePath的长度. 
像这种method里面函数很多的情况，不需要额外写一些helper funciton, 最好直接把Trie的实现在已经给定的class里面
path = "/a/b"
print(path.split("/")) -> ['', 'a', 'b']


Use Trie!

Initiate a root hashmap (work as a trie).
For folder name in the path:
The folder name is the key of the current level hashmap, the value is another hashmap.
For operation "addContentToFile" and "readContentFile":
The last variable in the path is the file name, the file name is the key of the current level hashmap, the value is a string.

Note that for operation "ls":
the last variable in the path could be a folder or a filename, so we need to do a check.

Time:
Operation "ls":
O(n + klgk), n is the length of the input string, k is the length of keys in the last folder
Operation "mkdir":
O(n)
Operation "addContentToFile" and "readContentFromFile":
O(n)

Space:
O(n)
https://leetcode.com/problems/design-in-memory-file-system/discuss/2013879/Clearly-Explained-Solution-Using-Trie-in-Python-Easy-to-Understand!
"""


# Your FileSystem object will be instantiated and called as such:
# obj = FileSystem()
# param_1 = obj.ls(path)
# obj.mkdir(path)
# obj.addContentToFile(filePath,content)
# param_4 = obj.readContentFromFile(filePath)

"""
Design an in-memory file system to simulate the following functions:

ls: Given a path in string format. If it is a file path, return a list that only contains this file’s name. If it is a directory path, return the list of file and directory names in this directory. Your output (file and directory names together) should in lexicographic order.

mkdir: Given a directory path that does not exist, you should make a new directory according to the path. If the middle directories in the path don’t exist either, you should create them as well. This function has void return type.

addContentToFile: Given a file path and file content in string format. If the file doesn’t exist, you need to create that file containing given content. If the file already exists, you need to append given content to original content. This function has void return type.

readContentFromFile: Given a file path, return its content in string format.

https://www.jasonjson.com/archivers/design-in-memory-file-system.html
https://leetcode.com/problems/design-in-memory-file-system/discuss/103359/Python-Straightforward-with-Explanation

http://bookshadow.com/weblog/2017/05/21/leetcode-design-in-memory-file-system/
https://grandyang.com/leetcode/588/

"""



# Your FileSystem object will be instantiated and called as such:
# obj = FileSystem()
# param_1 = obj.ls(path)
# obj.mkdir(path)
# obj.addContentToFile(filePath,content)
# param_4 = obj.readContentFromFile(filePath)

Explain

这道题使用前缀树 (Trie) 来实现一个内存文件系统。思路是： 1. 用一个 TrieNode 类来表示前缀树的节点，包含一个 defaultdict 类型的 child 字段存储子节点，一个 contents 字段存储文件内容。 2. FileSystem 类表示整个文件系统，包含一个 TrieNode 类型的 root 字段作为前缀树的根节点。 3. 对于不同的操作： - ls：从根节点开始，沿着给定路径逐层搜索对应的节点。如果该节点是文件，则返回文件名；如果是目录，则返回目录下的所有文件和子目录名，并按字典序排序。 - mkdir：从根节点开始，沿着给定路径逐层插入对应的节点，缺失的中间目录也会被创建。 - addContentToFile：类似 mkdir，但如果最后的节点已经存在则追加内容，否则创建新文件并写入内容。 - readContentFromFile：从根节点开始搜索给定路径对应的节点，返回其 contents 字段的内容。

时间复杂度: ls: O(L + klogk)，其中 L 是路径长度，k 是目录下子节点数量 mkdir、addContentToFile、readContentFromFile: O(L)，其中 L 是路径长度

空间复杂度: O(N)，其中 N 是文件系统中的总节点数

class TrieNode:    
    def __init__(self):
        self.child = collections.defaultdict(TrieNode)  # 存储子节点的字典
        self.contents = "" # 存储文件内容，空字符串表示目录

class FileSystem:
    def __init__(self):
        self.root = TrieNode()  # 文件系统的根节点
               
    def _insert(self, path, content=""):
        cur = self.root
        for c in path.split("/")[1:]:  # 沿路径逐层插入节点
            cur = cur.child[c]
        cur.contents += content # 更新文件内容
            
    def _search(self, path):   # 搜索给定路径对应的节点 
        cur = self.root
        for c in path.split("/")[1:]:
            if c in cur.child:
                cur = cur.child[c]
        return cur # 返回路径最后对应的节点
    
    def ls(self, path: str) -> List[str]:  
        node = self._search(path)
        if node.contents: # 如果节点是文件，则返回文件名
            return [path.split('/')[-1]]    
        else: # 如果是目录，返回目录下的所有文件和子目录名
            return sorted(node.child.keys())    # 按字典序排序
        
    def mkdir(self, path: str) -> None:  # 创建目录
        self._insert(path)
        
    def addContentToFile(self, filePath: str, content: str) -> None:  # 追加或创建文件内容
        self._insert(filePath, content)
        
    def readContentFromFile(self, filePath: str) -> str:  # 读取文件内容
        node = self._search(filePath)
        return node.contents

Explore

前缀树（Trie）适合用于存储和检索字符串数据集中的键，特别是当这些键具有共同前缀时。在文件系统中，文件路径可以视为由多个部分组成的字符串，且这些部分常常在不同路径中重复出现。使用前缀树可以高效地共享和管理这些共同前缀，从而节省空间并提高检索效率。相比之下，哈希表虽然检索快速，但不支持按前缀有序地枚举键，且每个键都需要独立存储，不利于路径的共享存储；平衡树可以保持键的有序性，但在处理具有公共前缀的长路径时，效率不如前缀树。

`ls`操作在Unix和类Unix系统中通常用于列出目录内容，包括子目录和文件名，而不是显示文件内容。模仿这种行为，当`ls`操作指向文件时返回文件名而非内容，可以保持与传统文件系统的操作习惯一致，并且使得`ls`操作的输出保持一致性（总是返回文件或目录名列表）。这样设计也有助于区分查看目录内容和读取文件内容的操作，提高用户使用的直观性。

在执行`mkdir`操作时，如果目录路径中的某个部分已经存在，该操作不会对已存在的目录进行重新创建或覆盖，而是简单地忽略这一部分，继续在该路径下创建不存在的目录。这种方式避免了数据的不必要覆盖，同时也反映了Unix和类Unix系统中`mkdir`命令的常见行为，即可以安全地重复执行而不会导致错误或数据损失。

在`addContentToFile`方法中，如果文件所在的路径中包含尚未创建的目录，这些目录会被自动创建。这种设计使得文件的添加操作更为灵活和容易，用户无需事先创建所有必需的目录结构即可直接添加或更新文件内容。这避免了操作的复杂性并增强了用户体验。