Chapter 3 Design and Implementation
3.2 Implementation and Integration of the Approaches
3.2.1 Implementation of Three Mechanisms
As mentioned before, we implement TempFFS by modifying an existing RAM file system, RamFS (Resizable simple ram File System), and inserting it between the VFS and the underlying file system implementations. We intercept invocations of the VFS file creation function (i.e., vfs_create()) and redirect them to the file creation function in RamFS. After creation, operations on the file use the RamFS file operations, because the file now resides in RamFS rather than in the underlying file system (e.g., Ext2).
Under memory pressure, or when the size of a file exceeds the threshold, we move the file from RamFS into the underlying file system. The detailed steps of this move are given in Section 3.1.1.
After the move, the file belongs to the underlying file system, and it is accessed only through that file system's original file operations.
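The placement rule described above can be sketched in user-space C. This is only an illustrative model, not the kernel code: the type file_sketch, the enum backing_store and the functions create_file(), should_move() and write_file() are hypothetical names, and the actual move (Section 3.1.1) is reduced to a state change. The sketch also assumes that memory pressure moves whichever file is being checked, which the text does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a file currently lives (hypothetical names, for illustration). */
enum backing_store { IN_RAMFS, IN_UNDERLYING_FS };

struct file_sketch {
    enum backing_store where;
    size_t size;               /* current file size in bytes */
};

/* Models the redirected vfs_create(): every new file starts in RamFS. */
void create_file(struct file_sketch *f)
{
    assert(f != NULL);
    f->where = IN_RAMFS;
    f->size = 0;
}

/* A RamFS file is moved when memory is tight or it exceeds the threshold. */
bool should_move(const struct file_sketch *f, size_t threshold,
                 bool memory_pressure)
{
    return f->where == IN_RAMFS && (memory_pressure || f->size > threshold);
}

/* Models a write followed by the placement check. The real move is
 * described in Section 3.1.1; here it is only a state change. */
void write_file(struct file_sketch *f, size_t nbytes, size_t threshold,
                bool memory_pressure)
{
    f->size += nbytes;
    if (should_move(f, threshold, memory_pressure))
        f->where = IN_UNDERLYING_FS;
}
```

Once a file has left RamFS, should_move() never fires for it again, which mirrors the statement that the file is afterwards handled only by the underlying file system.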
To implement the intelligent write-back policy relo (recency and location), we record information about each page in a zone information table, which is shown in Figure 3.3. When a page becomes both dirty and inactive, we record it in the corresponding zone information table. To achieve this, we invoke add_to_zone() in two situations. First, when the kernel calls the function that marks a page dirty (i.e., set_page_dirty()), we check the active flag (PG_active) of the page. If the dirty page is on the inactive list (i.e., PG_active is not set), we invoke add_to_zone() from set_page_dirty(). Second, when the kernel calls the function that moves a page from the active list to the inactive list (i.e., add_page_to_inactive_list()), we check whether the page is dirty. If it is, we invoke add_to_zone() from add_page_to_inactive_list().
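The two situations above can be sketched as follows. This is a user-space model under stated assumptions: struct page_sketch stands in for the kernel's struct page and keeps only the two flags the text relies on, the hook names on_set_page_dirty() and on_add_page_to_inactive_list() are hypothetical, and add_to_zone() is reduced to a counter so that the effect of each hook can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the flags used by the text. */
struct page_sketch {
    bool dirty;          /* models the dirty state of the page */
    bool active;         /* models PG_active */
    bool in_zone_table;  /* whether the page is already recorded */
};

static int zone_table_entries;   /* number of pages currently recorded */

/* Reduced add_to_zone(): only counts the page. The invariant is that a
 * recorded page is both dirty and inactive. */
static void add_to_zone(struct page_sketch *p)
{
    assert(p->dirty && !p->active);
    if (!p->in_zone_table) {
        p->in_zone_table = true;
        zone_table_entries++;
    }
}

/* Situation 1: the page is marked dirty; record it only if inactive. */
void on_set_page_dirty(struct page_sketch *p)
{
    p->dirty = true;
    if (!p->active)
        add_to_zone(p);
}

/* Situation 2: the page moves to the inactive list; record it only if dirty. */
void on_add_page_to_inactive_list(struct page_sketch *p)
{
    p->active = false;
    if (p->dirty)
        add_to_zone(p);
}
```

Whichever of the two events happens last is the one that records the page, so a page enters the table exactly when it first satisfies both conditions.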
The pseudo code of add_to_zone() is shown in Figure 3.7(a). First, we obtain the block number of the page and compute the zone it belongs to (i.e., the block number of the page divided by the number of blocks per zone). Second, the number of dirty pages in the zone information table is increased by one. If neither the preceding block nor the following block of this page is recorded in the zone information table, the page stands alone, so recording it produces a new segment (a run of contiguous dirty pages) and the segment count of the zone is increased by one. Lastly, we calculate the ASL (average segment length), which serves as the basis for selecting the zone to write back.
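As a small worked example (the block numbers are illustrative and are not taken from the measurements in this thesis), suppose a zone records dirty pages at blocks 3, 4, 5 and 9. Blocks 3 to 5 form one segment and block 9 forms another, so:

```latex
\[
\mathrm{ASL}
  = \frac{\text{number of dirty pages in the zone}}{\text{number of segments in the zone}}
  = \frac{4}{2} = 2
\]
```

Recording block 10 next would extend the second segment and raise the ASL to 5/2 = 2.5, whereas recording block 7 would start a third segment and lower it to 5/3, about 1.67. A larger ASL therefore indicates longer runs of contiguous dirty pages.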
When a dirty page becomes clean or active, we must also remove its information from the zone information table. To achieve this, we invoke remove_from_zone() in two situations. First, when the kernel calls the function that clears the dirty state of a page (i.e., clear_page_dirty_for_io()), we invoke remove_from_zone() on each call. Second, when the kernel calls the function that moves a page from the inactive list to the active list (i.e., add_page_to_active_list()), we likewise invoke remove_from_zone() on each call. The pseudo code of remove_from_zone() is shown in Figure 3.7(b). First, we again obtain the block number of the page to compute the corresponding zone. Second, the number of dirty pages is decreased by one. If neither the preceding block nor the following block of this page is recorded in the zone information table, removing the page eliminates a segment, so the segment count of the zone is decreased by one. If both the preceding block and the following block are recorded in the zone information table, removing the page splits the original segment into two, so the segment count is increased by one. Lastly, it recalculates the ASL of the zone.
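The segment accounting described above can be sketched in user-space C. Several details are assumptions made for illustration: blocks are tracked with a simple boolean array, a zone holds 16 blocks, segments do not span zone boundaries, and the identifiers zone_info, recorded and neighbour_recorded() are hypothetical. The sketch also lets add_to_zone() decrease the segment count when a page bridges two existing segments; the text does not describe this case, so it should be read as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_ZONE 16   /* assumed zone size, for illustration only */
#define NUM_ZONES       4    /* assumed number of zones */
#define NUM_BLOCKS      (BLOCKS_PER_ZONE * NUM_ZONES)

struct zone_info {
    int dirty_pages;   /* dirty, inactive pages recorded in the zone */
    int segments;      /* runs of contiguous recorded blocks */
    double asl;        /* average segment length */
};

static struct zone_info zone_table[NUM_ZONES];
static bool recorded[NUM_BLOCKS];

/* True if the block exists, lies in the given zone and is recorded. */
static bool neighbour_recorded(long block, int zone)
{
    if (block < 0 || block >= NUM_BLOCKS)
        return false;
    return block / BLOCKS_PER_ZONE == zone && recorded[block];
}

/* ASL is defined as 0 for a zone without segments to avoid dividing by 0. */
static void update_asl(struct zone_info *z)
{
    z->asl = z->segments ? (double)z->dirty_pages / z->segments : 0.0;
}

void add_to_zone(long block)
{
    assert(block >= 0 && block < NUM_BLOCKS);
    if (recorded[block])
        return;
    int zone = (int)(block / BLOCKS_PER_ZONE);
    struct zone_info *z = &zone_table[zone];
    bool prev = neighbour_recorded(block - 1, zone);
    bool next = neighbour_recorded(block + 1, zone);

    recorded[block] = true;
    z->dirty_pages++;
    if (!prev && !next)
        z->segments++;       /* page stands alone: new segment */
    else if (prev && next)
        z->segments--;       /* assumption: page joins two segments */
    update_asl(z);
}

void remove_from_zone(long block)
{
    assert(block >= 0 && block < NUM_BLOCKS);
    if (!recorded[block])
        return;
    int zone = (int)(block / BLOCKS_PER_ZONE);
    struct zone_info *z = &zone_table[zone];
    bool prev = neighbour_recorded(block - 1, zone);
    bool next = neighbour_recorded(block + 1, zone);

    recorded[block] = false;
    z->dirty_pages--;
    if (!prev && !next)
        z->segments--;       /* the page was a segment on its own */
    else if (prev && next)
        z->segments++;       /* removal splits one segment into two */
    update_asl(z);
}
```

When exactly one neighbour is recorded, the page only lengthens or shortens an existing segment, so the segment count stays the same and only the ASL changes.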
In Linux, when the system writes dirty data in memory back to disk, it wakes up the pdflush thread, which calls background_writeout(). In background_writeout(), we replace the original function (writeback_inodes()) with writeback_segment_zone(), which selects a zone to write back, as shown in Figure 3.7(c). First, we select the zone with the largest average segment length. Second, we traverse all page lists recorded in the zone information table for that zone and write back all of their pages. Lastly, if the number of pages written back is greater than or equal to the number of pages requested for write-back, the function finishes; otherwise, it selects the next zone to write back.
Figure 3.7 Pseudo Code of Segment/Zone Algorithm
/* Adding a page to the zone info. table */
add_to_zone( page ){
get page's block number;
zone_number = page_block_number / pages_per_zone;
zone = zone_information_table[zone_number];
zone.dirty_pages++;
if (a new segment is created for this page) zone.segment++;
zone.ASL = zone.dirty_pages / zone.segment;
}
(a)
/* removing a page from the zone info. table */
remove_from_zone( page ){
get page's block number;
zone_number = page_block_number / pages_per_zone;
zone = zone_information_table[zone_number];
zone.dirty_pages--;
if (a segment is deleted due to the removal of the page) zone.segment--;
if (a new segment is produced due to the removal of the page) zone.segment++;
zone.ASL = (zone.segment > 0) ? zone.dirty_pages / zone.segment : 0;
}
(b)
/* Segment-zone writeback algorithm */
writeback_segment_zone(writeback_control wbc){
begin:
select the zone with the largest average segment length;
traverse the page lists of all segments in the zone to write back all pages;
if (number of pages written back >= wbc.nr_to_write) finish;
else goto begin; /* select the next zone */
}
(c)
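The zone selection loop of Figure 3.7(c) can be sketched as runnable user-space C. The sketch makes several assumptions: the zone information table is a plain array, writing back a zone is modelled by adding its dirty page count to a running total and clearing the entry instead of issuing I/O, and the helpers zone_asl() and select_zone() are hypothetical names. It also stops when no zone has dirty pages left, a termination case the pseudo code leaves implicit.

```c
#include <assert.h>
#include <stddef.h>

struct zone_info {
    int dirty_pages;   /* dirty, inactive pages recorded in the zone */
    int segments;      /* runs of contiguous dirty pages */
};

/* Average segment length; defined as 0 for a zone without segments. */
static double zone_asl(const struct zone_info *z)
{
    return z->segments ? (double)z->dirty_pages / z->segments : 0.0;
}

/* Index of the dirty zone with the largest ASL, or -1 if none is dirty. */
static int select_zone(const struct zone_info *zones, size_t n)
{
    int best = -1;
    double best_asl = 0.0;
    for (size_t i = 0; i < n; i++) {
        double asl = zone_asl(&zones[i]);
        if (zones[i].dirty_pages > 0 && asl > best_asl) {
            best = (int)i;
            best_asl = asl;
        }
    }
    return best;
}

/* Writes back whole zones, largest ASL first, until at least nr_to_write
 * pages have been written or nothing is left. Returns pages written. */
int writeback_segment_zone(struct zone_info *zones, size_t n, int nr_to_write)
{
    assert(zones != NULL && nr_to_write >= 0);
    int written = 0;
    for (;;) {
        int z = select_zone(zones, n);
        if (z < 0)
            break;                          /* no dirty zone remains */
        written += zones[z].dirty_pages;    /* stand-in for real I/O */
        zones[z].dirty_pages = 0;
        zones[z].segments = 0;
        if (written >= nr_to_write)
            break;                          /* request satisfied */
    }
    return written;
}
```

Because a whole zone is written at a time, the function can return more pages than were requested, which matches the "greater than or equal to" condition in the pseudo code.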
As mentioned in Section 3.1.3, we provide a transaction API for file systems that require transaction support. Figure 3.8 shows the pseudo code of the major functions in the API: transaction_start(), transaction_stop() and duplicate().
transaction_start() first creates a transaction data structure, trans, and links it to the current process. It then inserts trans into the global transaction list.
transaction_stop() first obtains the transaction data structure trans from the current process and clears the pointer in the current process that points to it. It then frees all duplicated data (replicas) of trans and removes trans from the global transaction list. duplicate() likewise first obtains the transaction data structure trans from the current process, then creates a replica data structure to store the duplicated data and inserts the replica into the corresponding trans. Lastly, it copies the data into the replica.
To demonstrate the effectiveness of the API, we augmented the ext2 file system to use it. We inserted the function pair transaction_start() and transaction_stop() into all ext2 file system operations, such as ext2_create(), ext2_link(), ext2_mkdir(), ext2_unlink() and ext2_rmdir(). Moreover, we inserted invocations of duplicate() into functions that modify metadata and data, such as ext2_new_inode(), ext2_free_inode(), ext2_new_block() and ext2_free_blocks(). We ensure that each invocation of duplicate() occurs immediately before the corresponding modification of metadata or data.
Figure 3.8 Implementation of the Transaction API (Pseudo Code)
/* Creating a transaction and inserting it to the transaction list */
transaction_start( ){
create a transaction data structure;
link this trans into current process;
insert this trans into transaction list;
}
(a)
/* removing a transaction from the transaction list */
transaction_stop( ){
get a trans from current->journal_info; // journalling filesystem info
set current->journal_info as NULL;
free all replicas in this trans;
remove this trans from transaction list;
}
(b)
/* duplicate metadata or data*/
duplicate(buffer_head *bh, size_t size){
get a trans from current process;
create a replica data structure;
insert this replica into replica list of trans;
copy data from buffer_head *bh to this replica;
}
(c)
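The three functions of Figure 3.8 can be sketched in user-space C as follows. This is a minimal model under stated assumptions, not the kernel implementation: the global pointer current_trans stands in for current->journal_info, duplicate() takes a plain buffer instead of a buffer_head, both lists are singly linked, and the integer return codes are added for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct replica {
    void *copy;              /* duplicated data */
    size_t size;
    struct replica *next;
};

struct transaction {
    struct replica *replicas;    /* replica list of this transaction */
    struct transaction *next;    /* link in the global transaction list */
};

static struct transaction *transaction_list;  /* global transaction list */
static struct transaction *current_trans;     /* models current->journal_info */

/* Creates a transaction, links it to the "current process" and to the
 * global list. Returns 0 on success, -1 if allocation fails. */
int transaction_start(void)
{
    struct transaction *t = calloc(1, sizeof *t);
    if (!t)
        return -1;
    current_trans = t;
    t->next = transaction_list;
    transaction_list = t;
    return 0;
}

/* Copies size bytes of data into a new replica of the current transaction.
 * Returns -1 if there is no transaction or allocation fails. */
int duplicate(const void *data, size_t size)
{
    assert(data != NULL && size > 0);
    struct transaction *t = current_trans;
    if (!t)
        return -1;
    struct replica *r = malloc(sizeof *r);
    if (!r)
        return -1;
    r->copy = malloc(size);
    if (!r->copy) {
        free(r);
        return -1;
    }
    memcpy(r->copy, data, size);
    r->size = size;
    r->next = t->replicas;
    t->replicas = r;
    return 0;
}

/* Detaches the current transaction, frees its replicas and removes it
 * from the global list. */
void transaction_stop(void)
{
    struct transaction *t = current_trans;
    if (!t)
        return;
    current_trans = NULL;
    while (t->replicas) {
        struct replica *r = t->replicas;
        t->replicas = r->next;
        free(r->copy);
        free(r);
    }
    for (struct transaction **pp = &transaction_list; *pp; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            break;
        }
    }
    free(t);
}
```

Because duplicate() copies the data before the caller modifies it, the replica keeps the original contents for as long as the transaction is open, which is why the text requires the call to come immediately before each modification.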