Network Protocols:
Design and Analysis
Polly Huang EE NTU
http://cc.ee.ntu.edu.tw/~phuang phuang@cc.ee.ntu.edu.tw
What’s important about the web?
• easy access to information
– vs. gopher: also easy to use, but no graphics
– vs. e-mail, ftp
– and today, search engines
• supports rich data formats (text, graphics, hypertext, etc.)
• fast, cheap, ubiquitous
• huge economic impact
Why was the web successful?
• Multimedia information
– vs. gopher: web had graphics
– vs. ftp: a lot easier to use
• Multi-platform
– Really simple to implement
– My undergrad network course students are implementing one
Aside: the web and computer science
• the web was invented by a physicist; why not a computer scientist?
– computer scientists had been doing hypertext since the late ’60s
– he had info to exchange
• with non-CS experts
• (vs. CS people who knew FTP)
– web was too simple
• not interesting research, except that it found the right trade-off in complexity vs. benefit
What is the web?
• protocols
– formatting: HTML (XML, CSS)
– addressing: URLs (builds on DNS, IP, files; also URNs)
Formatting: HTML
• SGML subset: simple ASCII with tags
– structure (ex. blockquote, headings, em) – logical info
– display (ex. fonts, justification, colors, italics) – physical layout
– pro: easy to write (no special tools req’d)
• extensible (has grown [a lot!] over time)
– now includes forms, tables, frames, math, style sheets, etc.
– augmented by CSS (Cascading Style Sheets) to define display for HTML markup
Addressing: URLs
• protocol://host:port/hierarchy?search#part
– multiple protocols (http, ftp, etc.)
– host:port – use DNS, distributed
– hierarchy – locally defined
• pros:
– location independent (can move servers around, as long as DNS name stays same)
– global
• cons:
– some things don’t have URLs (sometimes data is “hidden” behind search forms and logins)
– some things are location dependent (ex. ~johnh stuff)
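A minimal sketch of how the URL components above map onto Python's standard urllib.parse module. The URL itself is a made-up example (not from the lecture), chosen only so that every component is present. Note that urlsplit follows the standard ordering, with the search string before the #part fragment.

```python
from urllib.parse import urlsplit

# Hypothetical URL chosen only to exercise every component.
url = "http://www.example.com:8080/docs/intro.html?q=cache#sec2"
parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.hostname)  # host, resolved through DNS
print(parts.port)      # port
print(parts.path)      # hierarchy, interpreted locally by the server
print(parts.query)     # search
print(parts.fragment)  # part, handled by the client and not sent to the server
```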
Data Transfer: HTTP
• stateless: no inherent notion of client
– so no shopping carts! later work added optional cookies to identify clients
• layers over TCP
• designed as a faster replacement for FTP
• many features tacked on:
– content negotiation (graphics, language)
– caching support
HTTP/0.9
• original protocol
– client: “GET /path” (close)
– server: “data” (close)
• very very simple
• still valid
HTTP/1.0
• informational RFC 1945 (60 pages)
• client
– “GET /path HTTP/1.0”, headers, body (close)
– other operations (GET, HEAD, POST)
• can send data (POST)
• can check for changes (HEAD)
• server
– headers, data (close)
– headers allow info (type, change time, etc.)
– uses MIME types, adds content negotiation, etc.
HTTP/1.1
• RFC 2068 (162 pages!)
• basically like 1.0, but adds:
– persistent connections
– identity of host (supports virtual hosts)
– detailed caching models
HTTP Request—More Detail
• Request line
– Method
• GET – return the resource identified by the URI
• HEAD – return headers only of GET response
• POST – send data to the server (forms, etc.)
– URI
• E.g. http://www.isi.edu/~govindan/index.html with a proxy
• E.g. /index.html if no proxy
HTTP Request
• Request headers
– Authorization – authentication info
– Acceptable document types/encodings
– From – user email
– If-Modified-Since
– Referer – what caused this page to be requested
– User-Agent – client software
• Blank-line
• Body
HTTP Request Example
GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: www.isi.edu
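The request layout (request line, headers, blank line, optional body) can be sketched as a small serializer. This is an illustration only: build_request is a hypothetical helper, not part of any library, and it does no validation of header names or values.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # A body needs a length so the server knows where it ends.
        lines.append(f"Content-Length: {len(body)}")
    # Lines end in CRLF; an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

request = build_request("GET", "/", "www.isi.edu", {"Accept": "*/*"})
print(request.decode("ascii"))
```

Sending these bytes over a TCP connection to port 80 of the host would issue the request; the sketch stops short of any network I/O.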
HTTP Response
• Status-line
– HTTP version
– 3 digit response code
• 1XX – informational
• 2XX – success
• 3XX – redirection
• 4XX – client error
• 5XX – server error
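Because the class of a response is carried by the first digit alone, a client can react sensibly even to a code it does not recognize. A minimal sketch (status_class is a hypothetical helper name):

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def status_class(code):
    """Map a 3-digit response code to its class using the leading digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")

for code in (200, 304, 404, 500):
    print(code, status_class(code))
```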
HTTP Response
• Headers
– Location – for redirection
– Server – server software
– WWW-Authenticate – request for authentication
– Allow – list of methods supported (get, head, etc)
– Content-Encoding – E.g. x-gzip
– Content-Length
– Content-Type
– Expires
– Last-Modified
• Blank-line
• Body
HTTP Response Example
HTTP/1.1 200 OK
Date: Tue, 27 Mar 2001 03:49:38 GMT
Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT
ETag: "7a11f-10ed-3a75ae4a"
Accept-Ranges: bytes
Content-Length: 4333
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
…..
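A response like the one above splits into status line, headers, and body at the first blank line. The sketch below is a simplified parser for illustration: parse_response_head is a hypothetical name, repeated headers are not merged, and the sample body is shortened so it does not match the advertised Content-Length.

```python
def parse_response_head(raw):
    """Split a raw HTTP response into version, code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 4333\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<html>")  # body truncated for the example
version, code, reason, headers, body = parse_response_head(raw)
print(version, code, reason, headers["content-type"])
```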
Caching
• Why cache?
– can save a lot of bandwidth
• Risk: out-of-date data
– file systems are usually strongly cache coherent, detecting when data is modified and invalidating cached copies
– the web provides only weak coherence, so you can get old data
Web Cache Coherence
• Assume page is good until time t
– t could be in the past
• First request for http://foo/
– returns data with valid period (“Expires: t”)
– clients may estimate val
• Subsequent request:
– if in valid period, replay data out of cache (no request)
– (server can prohibit this with a valid period of 0)
– if beyond valid period, issue a conditional request
• GET with “If-Modified-Since” header
• Response is either “304 Not Modified” or “200 OK” +
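The decision procedure above can be sketched as two small functions. This is an illustration under stated assumptions, not a full HTTP cache: CacheEntry, plan_request, and apply_response are hypothetical names, times are plain numbers, and only the If-Modified-Since validator is modeled.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    body: bytes
    expires: float       # time t until which the copy is assumed good
    last_modified: str   # Last-Modified header value, kept verbatim

def plan_request(entry, now):
    """Return ("fetch", None), ("hit", None) or ("revalidate", headers)."""
    if entry is None:
        return "fetch", None            # first request: nothing cached
    if now < entry.expires:
        return "hit", None              # in valid period: no request at all
    # Beyond the valid period (or t already in the past): conditional GET.
    return "revalidate", {"If-Modified-Since": entry.last_modified}

def apply_response(entry, status, body, expires, last_modified):
    """Update the cache from the server's answer to a conditional GET."""
    if status == 304:
        # Not modified: keep the old body, only extend the valid period.
        return CacheEntry(entry.body, expires, entry.last_modified)
    return CacheEntry(body, expires, last_modified)

entry = CacheEntry(b"old", 100.0, "Mon, 29 Jan 2001 17:54:18 GMT")
print(plan_request(entry, 50.0)[0], plan_request(entry, 150.0)[0])
```

A server that sets the valid period to zero makes every lookup fall through to the revalidate branch, which is how it prevents silent replay from the cache.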
Example Cache Check Request
GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT
If-None-Match: "7a11f-10ed-3a75ae4a"
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Example Cache Check Response
HTTP/1.1 304 Not Modified
Date: Tue, 27 Mar 2001 03:50:51 GMT
Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
Connection: Keep-Alive
Web Workloads
• Users make connections and request pages made up of objects
– a structural model
• user arrival: Poisson
• connection duration, number of objects per page, object size: all often heavy tailed (Pareto)
– P&M give object stats:
• median: 1946 bytes
• mean: 13767 bytes
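A mean about seven times the median is the signature of a heavy tail. As a rough illustration only, assume object sizes were exactly Pareto with shape a > 1 (the slides do not claim this). Then mean = a·xm/(a−1) and median = xm·2^(1/a), so the ratio depends on a alone and can be solved numerically. The function names below are hypothetical.

```python
def mean_over_median(a):
    """Mean/median ratio of a Pareto distribution with shape a > 1.

    mean = a*xm/(a-1) and median = xm*2**(1/a), so the scale xm cancels.
    """
    return a / ((a - 1) * 2 ** (1 / a))

def solve_shape(target, lo=1.0001, hi=10.0, steps=60):
    """Bisect for the shape whose mean/median ratio equals target."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mean_over_median(mid) > target:
            lo = mid   # ratio too large means tail too heavy: raise the shape
        else:
            hi = mid
    return (lo + hi) / 2

target = 13767 / 1946   # mean / median from the object stats above
shape = solve_shape(target)
print(round(target, 2), round(shape, 3))
```

Under that assumption the shape comes out only slightly above 1, the point below which a Pareto distribution has no finite mean.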
Page Stats
• Popularity
– Zipf distribution
– requests for the ith most popular document ~ i^(-a)
– (popular are really popular, but long tail)
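To see what "popular are really popular, but long tail" means in numbers, the sketch below normalizes i^(-a) over a hypothetical catalog. The catalog size of 1000 documents and the exponent a = 1 are assumptions chosen for illustration, not measurements from the lecture.

```python
def zipf_shares(n, a=1.0):
    """Fraction of requests going to each of n documents, ranked 1..n."""
    weights = [i ** -a for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

shares = zipf_shares(1000)       # assumed: 1000 documents, a = 1
top10 = sum(shares[:10])         # share of requests to the 10 most popular
tail = sum(shares[100:])         # share going to everything below rank 100
print(round(top10, 3), round(tail, 3))
```

With these assumed parameters a handful of documents draws a large share of the requests, while the remaining hundreds still account for a share too big for a cache to ignore.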