CS307&CS356: Operating Systems

(1)


Dept. of Computer Science & Engineering

Chentao Wu

(2)

Download lectures

• ftp://public.sjtu.edu.cn

• User: wuct

• Password: wuct123456

• http://www.cs.sjtu.edu.cn/~wuct/os/

(3)

Chapter 14: File System Implementation

(4)

Chapter 14: File System Implementation



File-System Structure



File-System Operations



Directory Implementation



Allocation Methods



Free-Space Management



Efficiency and Performance



Recovery



Example: WAFL File System

(5)

Objectives



Describe the details of implementing local file systems and directory structures



Discuss block allocation and free-block algorithms and trade-offs



Explore file system efficiency and performance issues



Look at recovery from file system failures



Describe the WAFL file system as a concrete example

(6)

File-System Structure

 File structure

 Logical storage unit

 Collection of related information

 File system resides on secondary storage (disks)

 Provides user interface to storage, mapping logical to physical

 Provides efficient and convenient access to disk by allowing data to be stored, located, and retrieved easily

 Disk provides in-place rewrite and random access

 I/O transfers performed in blocks of sectors (usually 512 bytes)

 File control block (FCB) – storage structure consisting of information about a file

(7)

Layered File System

(8)

File System Layers

 Device drivers manage I/O devices at the I/O control layer

 Given commands like “read drive1, cylinder 72, track 2, sector 10, into memory location 1060”, outputs low-level hardware-specific commands to the hardware controller

 Basic file system, given a command like “retrieve block 123”, translates it into commands for the device driver

 Also manages memory buffers and caches (allocation, freeing, replacement)

 Buffers hold data in transit

 Caches hold frequently used data

 File organization module understands files, logical address, and physical blocks

(9)

File System Layers (Cont.)



Logical file system manages metadata information



Translates file name into file number, file handle, location by maintaining file control blocks (inodes in UNIX)



Directory management



Protection



Layering useful for reducing complexity and redundancy, but adds overhead and can decrease performance



Logical layers can be implemented by any coding method, according to the OS designer

(10)

File System Layers (Cont.)



Many file systems, sometimes many within an operating system



Each with its own format (CD-ROM is ISO 9660; Unix has UFS, FFS; Windows has FAT, FAT32, NTFS, as well as floppy, CD, DVD, and Blu-ray formats; Linux has more than 130 types, with the extended file systems ext3 and ext4 leading; plus distributed file systems, etc.)



New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE

(11)

File-System Operations



We have system calls at the API level, but how do we implement their functions?



On-disk and in-memory structures



Boot control block contains info needed by system to boot OS from that volume



Needed if volume contains OS, usually first block of volume



Volume control block (superblock, master file table) contains volume details



Total # of blocks, # of free blocks, block size, free block pointers or array



Directory structure organizes the files



Names and inode numbers, master file table

(12)

File-System Implementation (Cont.)



Per-file File Control Block (FCB) contains many details about the file



typically inode number, permissions, size, dates



NTFS stores this information in the master file table using relational DB structures

(13)

In-Memory File System Structures

 Mount table storing file system mounts, mount points, file system types

 System-wide open-file table contains a copy of the FCB of each open file and other info

 Per-process open-file table contains pointers to appropriate entries in system-wide open-file table as well as other info

 The following figure illustrates the necessary file system structures provided by the operating system

 Figure 12-3(a) refers to opening a file

 Figure 12-3(b) refers to reading a file

 Plus buffers hold data blocks from secondary storage

 Open returns a file handle for subsequent use

 Data from read eventually copied to specified user process memory address
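The relationship between the two open-file tables can be sketched in a few lines. This is a toy model, not how any particular kernel does it: the class name `OpenFileTables`, the dictionary layout, and the `offset` field are all assumptions made for illustration.

```python
class OpenFileTables:
    """Toy model of the system-wide and per-process open-file tables."""

    def __init__(self, directory):
        # file name -> FCB dict; stands in for the on-disk directory (assumption)
        self.directory = directory
        # file name -> {"fcb": copy of the FCB, "open_count": number of opens}
        self.system_wide = {}

    def open(self, process_table, name):
        """Open `name` for one process and return its file handle (an index)."""
        entry = self.system_wide.get(name)
        if entry is None:
            # First open anywhere: copy the FCB into the system-wide table.
            entry = {"fcb": dict(self.directory[name]), "open_count": 0}
            self.system_wide[name] = entry
        entry["open_count"] += 1
        # The per-process entry points at the shared entry and adds its own state.
        process_table.append({"entry": entry, "offset": 0})
        return len(process_table) - 1


tables = OpenFileTables({"notes.txt": {"inode": 12, "size": 4096}})
proc_a, proc_b = [], []
handle_a = tables.open(proc_a, "notes.txt")
handle_b = tables.open(proc_b, "notes.txt")
print(handle_a, handle_b, tables.system_wide["notes.txt"]["open_count"])  # 0 0 2
```

Both processes get their own handle, but the handles resolve to the same system-wide entry, so the FCB is copied into memory only once.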

(14)

In-Memory File System Structures

(15)

Directory Implementation

 Linear list of file names with pointers to the data blocks



Simple to program



Time-consuming to execute



Linear search time



Could keep ordered alphabetically via linked list or use B+ tree

 Hash table – linear list with hash data structure



Decreases directory search time



Collisions – situations where two file names hash to the same location



Only good if entries are fixed size, or use chained-overflow method
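A hashed directory with chained overflow might look like the sketch below. The class name `HashedDirectory`, the bucket count, and the byte-sum hash are hypothetical choices that keep the example deterministic; real file systems use stronger hash functions.

```python
class HashedDirectory:
    """Directory as a hash table; each bucket is a chain of (name, inode) pairs."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, name):
        # Deliberately simple hash (sum of bytes) so collisions are easy to show.
        return self.buckets[sum(name.encode()) % len(self.buckets)]

    def add(self, name, inode):
        bucket = self._bucket(name)
        for i, (existing, _) in enumerate(bucket):
            if existing == name:
                bucket[i] = (name, inode)  # replace an existing entry
                return
        bucket.append((name, inode))       # chained overflow on collision

    def lookup(self, name):
        for existing, inode in self._bucket(name):
            if existing == name:
                return inode
        return None


d = HashedDirectory()
d.add("ab", 7)
d.add("ba", 9)   # same byte sum as "ab", so it lands in the same chain
print(d.lookup("ab"), d.lookup("ba"), d.lookup("zz"))  # 7 9 None
```

A lookup only scans one chain instead of the whole directory, which is where the reduced search time comes from.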

(16)

Allocation Methods - Contiguous



An allocation method refers to how disk blocks are allocated for files:



Contiguous allocation – each file occupies set of contiguous blocks



Best performance in most cases



Simple – only starting location (block #) and length (number of blocks) are required



Problems include finding space for file, knowing file size, external fragmentation, need for compaction off-line (downtime) or on-line

(17)

Contiguous Allocation

 Mapping from logical to physical

Q = LA div 512 (quotient), R = LA mod 512 (remainder)

Block to be accessed = Q + starting address

Displacement into block = R
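The contiguous mapping can be worked through as a short sketch. The function name `contiguous_map` and the bounds check are illustrative additions; the divisor 512 comes from the slide.

```python
BLOCK_SIZE = 512  # units per block, matching the divisor used on the slide


def contiguous_map(logical_address, start_block, length_blocks):
    """Map a logical address to (physical block, displacement) for a contiguous file."""
    q, r = divmod(logical_address, BLOCK_SIZE)
    if q >= length_blocks:
        raise ValueError("logical address is beyond the end of the file")
    return start_block + q, r


# A 3-block file starting at block 14: address 1300 falls in the third block.
print(contiguous_map(1300, start_block=14, length_blocks=3))  # (16, 276)
```

Only the starting block and the length are needed, which is why the directory entry for a contiguous file is so small.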

(18)

Extent-Based Systems



Many newer file systems (e.g., Veritas File System) use a modified contiguous allocation scheme



Extent-based file systems allocate disk blocks in extents



An extent is a contiguous set of disk blocks



Extents are allocated for file allocation



A file consists of one or more extents

(19)

Allocation Methods - Linked



Linked allocation – each file a linked list of blocks



File ends at nil pointer



No external fragmentation



Each block contains pointer to next block



No compaction needed, since there is no external fragmentation



Free space management system called when new block needed



Improve efficiency by clustering blocks into groups but increases internal fragmentation



Reliability can be a problem



Locating a block can take many I/Os and disk seeks

(20)

Allocation Methods – Linked (Cont.)



FAT (File Allocation Table) variation



Beginning of volume has table, indexed by block number



Much like a linked list, but faster on disk and cacheable



New block allocation simple
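Following a file through a FAT can be sketched as below. The end-of-chain marker `EOF`, the dictionary representation of the table, and the block numbers are assumptions for illustration, not the on-disk FAT format.

```python
EOF = -1  # hypothetical end-of-chain marker


def fat_chain(fat, start_block):
    """Follow FAT entries from start_block and return the file's blocks in order."""
    blocks = []
    block = start_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]  # the table entry for a block names the next block
    return blocks


# The directory entry holds the first block; the table holds the rest of the chain.
fat = {217: 618, 618: 339, 339: EOF}
print(fat_chain(fat, 217))  # [217, 618, 339]
```

Because the whole chain lives in one table rather than inside the data blocks, it can be cached in memory and walked without touching the data blocks themselves.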

(21)

Linked Allocation



Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk

Each block = pointer + data (the pointer takes 1 of the block's 512 units, leaving 511 for data)

Mapping

Q = LA div 511 (quotient), R = LA mod 511 (remainder)

Block to be accessed is the Qth block in the linked chain of blocks representing the file.

Displacement into block = R + 1
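A minimal sketch of the linked-allocation mapping follows. It assumes each block holds 512 units and that the next-block pointer occupies the first unit, which is what the divisor 511 and the "+ 1" in the displacement imply; the function name is hypothetical.

```python
BLOCK_UNITS = 512                          # units per block
POINTER_UNITS = 1                          # assumed: pointer sits in the first unit
DATA_UNITS = BLOCK_UNITS - POINTER_UNITS   # 511 units of file data per block


def linked_map(logical_address):
    """Return (position in the chain, displacement within that block)."""
    q, r = divmod(logical_address, DATA_UNITS)
    return q, r + POINTER_UNITS  # skip over the pointer at the start of the block


print(linked_map(1000))  # (1, 490)
```

Reaching chain position Q still means reading Q blocks before it, which is why random access is slow with linked allocation.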

(22)

Linked Allocation

(23)

File-Allocation Table

(24)

Allocation Methods - Indexed



Indexed allocation



Each file has its own index block(s) of pointers to its data blocks



Logical view

(25)

Example of Indexed Allocation

(26)

Indexed Allocation (Cont.)

 Need index table

 Random access

 Dynamic access without external fragmentation, but have overhead of index block

 Mapping from logical to physical in a file of maximum size of 256K bytes and block size of 512 bytes. We need only 1 block for index table

Q = LA div 512 (quotient), R = LA mod 512 (remainder)

Q = displacement into index table

R = displacement into block
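The single-level index lookup can be sketched as follows. The index block contents are made-up block numbers and `indexed_map` is a hypothetical name.

```python
BLOCK_SIZE = 512  # units per block, as on the slide


def indexed_map(logical_address, index_block):
    """Return (physical block, displacement) using a single index block."""
    q, r = divmod(logical_address, BLOCK_SIZE)
    return index_block[q], r  # q selects the pointer, r is the offset in the data block


index_block = [9, 16, 1, 10, 25]  # illustrative pointers to the file's data blocks
print(indexed_map(1030, index_block))  # (1, 6)
```

Any block of the file is reachable with one index lookup, so random access costs the same regardless of position.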

(27)

Indexed Allocation – Mapping (Cont.)

 Mapping from logical to physical in a file of unbounded length (block size of 512 words)

 Linked scheme – Link blocks of index table (no limit on size)

Q₁ = LA div (512 x 511), R₁ = LA mod (512 x 511)

Q₁ = block of index table

R₁ is used as follows:

Q₂ = R₁ div 512, R₂ = R₁ mod 512

Q₂ = displacement into block of index table

R₂ = displacement into block of file

(28)

Indexed Allocation – Mapping (Cont.)

 Two-level index (4K blocks could store 1,024 four-byte pointers in outer index -> 1,048,576 data blocks and file size of up to 4GB)

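Q₁ = LA div (512 x 512), R₁ = LA mod (512 x 512)

Q₁ = displacement into outer-index

R₁ is used as follows:

Q₂ = R₁ div 512, R₂ = R₁ mod 512

Q₂ = displacement into block of index table

R₂ = displacement into block of file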
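The two-level mapping and the capacity figure from the slide can be checked with a short sketch. Function and constant names are hypothetical; the 512-entry blocks and the 4K/four-byte-pointer example come from the slide.

```python
BLOCK_SIZE = 512          # units per data block
ENTRIES_PER_INDEX = 512   # pointers per index block


def two_level_map(logical_address):
    """Return (outer-index slot, inner-index slot, displacement in data block)."""
    q1, r1 = divmod(logical_address, ENTRIES_PER_INDEX * BLOCK_SIZE)
    q2, r2 = divmod(r1, BLOCK_SIZE)
    return q1, q2, r2


print(two_level_map(300_000))  # (1, 73, 480)

# Capacity check for the 4K-block example: 1,024 pointers per index block.
pointers = 4096 // 4
data_blocks = pointers * pointers
print(data_blocks, data_blocks * 4096 == 4 * 2**30)  # 1048576 True
```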

(29)

Indexed Allocation – Mapping (Cont.)

(30)

Combined Scheme: UNIX UFS

4K bytes per block, 32-bit addresses
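A rough capacity estimate for a combined scheme can be sketched as below. The slide gives only the block size and address width; the 12 direct pointers and the single, double, and triple indirect blocks are assumptions based on the classic UNIX inode layout, so the result is illustrative only.

```python
BLOCK_SIZE = 4096      # bytes per block (from the slide)
POINTER_SIZE = 4       # bytes per 32-bit block address (from the slide)
DIRECT_POINTERS = 12   # assumption: not stated on the slide


def max_file_bytes():
    """Largest file reachable via direct plus single/double/triple indirect blocks."""
    per_block = BLOCK_SIZE // POINTER_SIZE  # 1,024 pointers per indirect block
    blocks = DIRECT_POINTERS + per_block + per_block**2 + per_block**3
    return blocks * BLOCK_SIZE


print(max_file_bytes())  # 4402345721856, a little over 4 TiB
```

The triple indirect level dominates the total; the direct pointers matter for speed on small files rather than for capacity.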

(31)

Performance

 Best method depends on file access type

 Contiguous great for sequential and random

 Linked good for sequential, not random

 Declare access type at creation -> select either contiguous or linked

 Indexed more complex

 Single block access could require 2 index block reads then data block read

 Clustering can help improve throughput, reduce CPU overhead

 For NVM, no disk head so different algorithms and optimizations needed

 Using old algorithm uses many CPU cycles trying to avoid non-existent head movement

 With NVM goal is to reduce CPU cycles and overall path needed for I/O

(32)

Performance (Cont.)



Adding instructions to the execution path to save one disk I/O is reasonable



Intel Core i7 Extreme Edition 990x (2011) at 3.46 GHz = 159,000 MIPS



http://en.wikipedia.org/wiki/Instructions_per_second



Typical disk drive at 250 I/Os per second



159,000 MIPS / 250 = 636 million instructions during one disk I/O



Fast SSD drives provide 60,000 IOPS



159,000 MIPS / 60,000 = 2.65 million instructions during one disk I/O
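The arithmetic behind this comparison is easy to reproduce. The helper name below is hypothetical; the MIPS and IOPS figures are the ones quoted on the slide.

```python
def instructions_per_io(mips, iops):
    """Instructions the CPU can execute in the time of one I/O operation."""
    return mips * 1_000_000 / iops


CPU_MIPS = 159_000  # figure quoted for the Core i7 990x

print(instructions_per_io(CPU_MIPS, 250))     # 636000000.0 (hard disk)
print(instructions_per_io(CPU_MIPS, 60_000))  # 2650000.0 (fast SSD)
```

The budget shrinks by a factor of 240 on the SSD, which is why extra CPU work per I/O is easier to justify for disks than for fast NVM devices.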

(33)

Free-Space Management

 File system maintains free-space list to track available blocks/clusters

 (Using term “block” for simplicity)

 Bit vector or bit map (n blocks)

Blocks are numbered 0, 1, 2, …, n-1

bit[i] = 1 if block[i] is free, 0 if block[i] is occupied

Block number calculation:

(number of bits per word) * (number of 0-value words) + offset of first 1 bit

CPUs have instructions to return offset within word of first “1” bit
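The block number calculation can be sketched in software. The 32-bit word size and the convention that bit 0 of each word is its lowest-numbered block are assumptions; the bit trick stands in for the hardware "find first set bit" instruction.

```python
WORD_BITS = 32  # assumed word size


def first_free_block(words):
    """Return the number of the first free block (bit value 1), or None if full."""
    for index, word in enumerate(words):
        if word != 0:  # skip words whose blocks are all occupied
            offset = (word & -word).bit_length() - 1  # position of lowest 1 bit
            return WORD_BITS * index + offset
    return None


# Two fully occupied words, then a word whose bit 3 is set.
print(first_free_block([0, 0, 0b1000]))  # 67
```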

(34)

Free-Space Management (Cont.)

 Bit map requires extra space

 Example:

block size = 4KB = 2¹² bytes

disk size = 2⁴⁰ bytes (1 terabyte)

n = 2⁴⁰/2¹² = 2²⁸ bits (or 32MB)

if clusters of 4 blocks -> 8MB of memory

 Easy to get contiguous files
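The bit map sizes in the example can be verified with a short calculation. The function name and parameters are hypothetical; the disk, block, and cluster sizes are the ones from the example.

```python
def bitmap_bytes(disk_bytes, block_bytes, blocks_per_cluster=1):
    """Bytes needed for a bit map with one bit per allocation unit."""
    units = disk_bytes // (block_bytes * blocks_per_cluster)
    return units // 8  # 8 bits per byte


MB = 2**20
print(bitmap_bytes(2**40, 2**12) // MB)                         # 32
print(bitmap_bytes(2**40, 2**12, blocks_per_cluster=4) // MB)   # 8
```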

(35)

Linked Free Space List on Disk

 Linked list (free list)

 Cannot get contiguous space easily

 No waste of space

 No need to traverse the entire list (if # free blocks recorded)

(36)

Free-Space Management (Cont.)

 Grouping

 Modify linked list to store address of next n-1 free blocks in first free block, plus a pointer to next block that contains free-block pointers (like this one)

 Counting

 Because space is frequently contiguously used and freed, with contiguous allocation, extents, or clustering:

Keep address of first free block and count of following free blocks

Free space list then has entries containing addresses and counts
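The counting representation can be sketched as a run-length encoding of the free blocks. The function name and the sample block numbers are illustrative, and the input is assumed to contain no duplicates.

```python
def to_counting(free_blocks):
    """Collapse free block numbers into (first_block, count) entries."""
    runs = []
    for block in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == block:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)  # block extends the current run
        else:
            runs.append((block, 1))        # block starts a new run
    return runs


free = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17]
print(to_counting(free))  # [(2, 4), (8, 6), (17, 1)]
```

Eleven free blocks collapse to three entries here; the list stays short whenever free space is mostly contiguous.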

(37)

Free-Space Management (Cont.)

 Space Maps

 Used in ZFS

 Consider meta-data I/O on very large file systems

Full data structures like bit maps couldn’t fit in memory -> thousands of I/Os

 Divides device space into metaslab units and manages metaslabs

Given volume can contain hundreds of metaslabs

 Each metaslab has associated space map

Uses counting algorithm

 But records to log file rather than file system

Log of all block activity, in time order, in counting format

 Metaslab activity -> load space map into memory in balanced-tree structure, indexed by offset

Replay log into that structure

(38)

TRIMing Unused Blocks

 HDDs overwrite in place so need only free list

 Blocks not treated specially when freed

 Keeps its data but without any file pointers to it, until overwritten

 Storage devices not allowing overwrite (like NVM) suffer badly with same algorithm

 Must be erased before written, erases made in large chunks (blocks, composed of pages) and are slow

 TRIM is a newer mechanism for the file system to inform the NVM storage device that a page is free

Can be garbage collected or if block is free, now block can be erased

(39)

Efficiency and Performance



Efficiency dependent on:



Disk allocation and directory algorithms



Types of data kept in file’s directory entry



Pre-allocation or as-needed allocation of metadata structures



Fixed-size or varying-size data structures

(40)

Efficiency and Performance (Cont.)



Performance



Keeping data and metadata close together



Buffer cache – separate section of main memory for frequently used blocks



Synchronous writes sometimes requested by apps or needed by OS



No buffering / caching – writes must hit disk before acknowledgement



Asynchronous writes more common, buffer-able, faster

(41)

Page Cache



A page cache caches pages rather than disk blocks using virtual memory techniques and addresses



Memory-mapped I/O uses a page cache



Routine I/O through the file system uses the buffer (disk) cache



This leads to the following figure

(42)

I/O Without a Unified Buffer Cache

(43)

Unified Buffer Cache



A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O to avoid double caching



But which caches get priority, and what replacement algorithms to use?

(44)

I/O Using a Unified Buffer Cache

(45)

Recovery



Consistency checking – compares data in directory structure with data blocks on disk, and tries to fix inconsistencies



Can be slow and sometimes fails



Use system programs to back up data from disk to another storage device (magnetic tape, other magnetic disk, optical)



Recover lost file or disk by restoring data from backup

(46)

Log Structured File Systems

 Log structured (or journaling) file systems record each metadata update to the file system as a transaction

 All transactions are written to a log

 A transaction is considered committed once it is written to the log (sequentially)

 Sometimes to a separate device or section of disk

 However, the file system may not yet be updated

 The transactions in the log are asynchronously written to the file system structures

 When the file system structures are modified, the transaction is removed from the log

 If the file system crashes, all remaining transactions in the log must still be performed
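Crash recovery with a metadata log can be sketched as below. This is a toy model under stated assumptions: each log entry is a committed transaction represented as a dictionary of metadata updates, recovery replays the remaining entries in order, and the names `recover`, `log`, and `metadata` are hypothetical.

```python
def recover(log, metadata):
    """Apply every transaction still in the log, oldest first, then empty the log."""
    while log:
        transaction = log.pop(0)      # committed entries are replayed in log order
        metadata.update(transaction)  # write the update into the file system structures
    return metadata


# State after a crash: two committed transactions were never applied.
metadata = {"a.txt": {"size": 100}}
log = [{"a.txt": {"size": 250}}, {"b.txt": {"size": 10}}]
print(recover(log, metadata))  # {'a.txt': {'size': 250}, 'b.txt': {'size': 10}}
print(log)                     # []
```

Replaying in order matters: a later transaction may overwrite the same metadata that an earlier one touched.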

(47)

Example: WAFL File System



Used on Network Appliance “Filers” – distributed file system appliances

 “Write-anywhere file layout”



Serves up NFS, CIFS, http, ftp



Random I/O optimized, write optimized



NVRAM for write caching



Similar to Berkeley Fast File System, with extensive modifications

(48)

The WAFL File Layout

(49)

Snapshots in WAFL

(50)

The Apple File System

(51)

Homework



Exercises at the end of Chapter 14 (OS book)



14.1

(52)