Write Issues - Design Issues - AshFS: 一個支援離線操作的輕量化行動檔案系統

Chapter 3 Design Issues

3.2 Write Issues

Figure 2: Connection status and timeout

3.2 Write Issues

We have mentioned that all file operations will be redirected to local disk first, it’s also true for the writing. AshFS uses asynchronous writeback mechanism at all bandwidth levels, which means that AshFS will not propagate the update to server when the writing has just finished. This can reduce some unnecessary communications between client and server and utilize the limited bandwidth more efficiently.

Some of MAFS’s design aspects are adopted in AshFS: we utilized asynchronous writeback at all bandwidth levels and defined priorities for each kind of AshFS’s file operations. However we didn’t intend to adopt MAFS’s invalidation-based propagation

scheme, because it needs server-side changes and consumes more resources. Since AshFS is designed for personal use, we abandoned this propagation scheme which is useful in maintaining file consistency in a multi-client system. Besides using priorities like MAFS to make bandwidth-usage more efficient, we also focused on reducing client-server network traffic when transferring files.

Please look at Figure 3. It illustrates how a write operation progresses. Here we omit the procedures of connecting to the server. In fact, when there is a write miss, AshFS will download it from the server first, and then remaining operations are redirect to the local files.

Just similar to the read operation, before writing to the local file, the validity should be verified to prove consistency. If something causes write failure (like disk full or memory leak), the write process will aborts. On the contrary, if it successfully finished, AshFS will record this write operation and push it into the write queue. Then AshFS will create another thread to start Upload manager.

Figure 3: Write flow

3.2.1 Upload manager

Upload manager is designed for uploading all the updates in write queue to the server.

Figure 4 shows how Upload manager synchronizes these modified items to the server and some important mechanisms. To be briefly, it is responsible for all the updating process, including doing optimization, checking network status, resolving conflicts, avoiding read-write contention, labeling time stamp, and marking the upload status. We’ll discuss these processes soon. When an update has been uploaded to the server successfully, write manager will remove this item from the write queue. It is deserved to be mentioned that before processing each queue item, Upload manager will check the network status first. If it is in Disconnected or Weak state, Network manager will cease current upload process until it changes back to Strong state. Because when the connection is weak, uploading will exhaust the poor and precious bandwidth; therefore cause a high latency of AshFS.

Figure 4: Upload manager

3.2.2 Write optimization and read-write contention

Item in queue can be optimized because that some file operations could be canceled out.

For example, if a user creates a new file, and after writing this file for several times, he finally removes the file. Operations to this file can all be canceled and it’s not necessary to upload it ever.

AshFS write updates asynchronously to the server. When upload manager propagates these updates in background, user can still use AshFS to do other read/write operations in foreground. So it’s usual that there are both read traffic and write traffic at the same time. This situation is called read-write contention [5]. However, most of update processes are deferrable, and they contest available bandwidth with foreground activities. Especially when the available bandwidth is poor, such competition will severely slow down the operation speed of file system. Using a network file system over such a low-bandwidth environment must consume less bandwidth, and avoid monopolizing network links in use for other purposes [26].

To smooth user’s file system operations without being interfered by minor tasks, we divided AshFS’s network operations into several classes, and sorted them based on their importance to the most users. Table 1 is the priorities of AshFS’s network operations. The most important operation is getting the metadata. Because it can let the user know quickly what contents an uncached directory has, or how large of an uncached file is. Fetching a file in background, which has been discussed in part B, we gave it a priority number 3, because the user may want to read it as soon as possible. Synchronization messages and upload are last two in our priority table. Although it may cause higher file inconsistency, it’s acceptable to do such a compromise.

AshFS operations Descriptions

Get metadata Directory contents, file attributes, file type.., etc.

Fetch file Ordinary fetch

Background fetch Fetch large file

Synchronize Check time stamps

Upload Upload file to the server

Table 2: Priorities of network accesses

3.2.3 Conflict detection and resolution

Because the server doesn’t positively notify its clients which files were changed, the consistency can only be checked by the client itself. There are two kinds of file inconsistency:

one is Stale, and the other is Conflict, which we are going to discuss in this section. “Stale”

means that the file on the server has been modified before the client writes it, so the cached file is out-of-date. We have talked about this in 3.1.2. Unfortunately, when Upload manager are trying to propagate a modified file to the server, and it finds that this file has already been changed on the server, then conflict occurs. It probably means that there are two clients have modified the same file simultaneously.

Let’s explain how AshFS detects a conflict here. We compare the time stamps. There are three kinds of time stamps which will be tracked by AshFS:

z Tm: time stamp of the file recorded by AshFS.

z T_s:time stamp of server’s file.

z Tc :time stamp of the file in AshFS’s cache now.

Case 1: T_c= T_m < T_s

Î Server’s file has been modified.

Î Our client doesn’t modify (use) it yet.

Î Stale.

Case 2: Tm < Ts and Tm < Tc

Î Server’s file has been modified.

Î Our client has also modified it but doesn’t upload it yet.

Î Conflict.

In short, when these three time stamps are all unequal, we could say there is a conflict.

To resolve a conflict, AshFS applies the “no lost update” principle [3]. It keeps both server’s and client’s new file, but rename client’s one. The new name of file is similar with old one, with a suffix and a time label appended to it. For example, the original file name is “hello.txt”, and its conflict copy could be renamed as “hello-conflict-20080521.txt”. So user can easily identify it as a conflict file. This principle may be a little conservative, but preserving both sides’ files let user can decide by himself / herself, could be the safest one. Resolution should be automatic and transparent without user intervention. In our implementation, users will not receive messages about conflicts. But they can perceive the existence of conflict files by the filename or the upload log.

在文檔中 AshFS: 一個支援離線操作的輕量化行動檔案系統 (頁 25-30)