Network Protocols:
Design and Analysis
Polly Huang EE NTU
http://cc.ee.ntu.edu.tw/~phuang phuang@cc.ee.ntu.edu.tw
Multicast Routing
Polly Huang, NTU EE 3
Key ideas
• lays foundation for IP multicast
– defines IP service model
• e.g., best effort, packet based, anonymous groups
• compare to ISIS with explicit group membership, guaranteed ordering (partial or total ordering)
• several algorithms
– extended/bridged LANs
– distance-vector extensions
– link-state extensions
Why Multicast
• save bandwidth
Characterizing Groups
• pervasive or dense
– most LANs have a receiver
• sparse
– few LANs have receivers
• local
Service Model
• same delivery characteristics as unicast
– best effort packet delivery
– open-loop (no built-in congestion/flow control)
• scoping as control mechanism
• groups identified by a single IP address
• group membership is open
– anyone can join or leave
– do security at higher levels
Routing Algorithms
• single spanning tree
– for bridged LANs
• distance-vector based
• link-state based
Distance-vector Mcast Rtg
• Basic idea: flood and prune
• flood: send info about new sources everywhere
• prune: routers will tell us if they don’t have receivers
• routing info is soft state; periodically re-flood (and prune) to refresh this info
– if no refresh, then the info goes away => easy fault recovery
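The soft-state behaviour can be sketched in a few lines. This is an illustration only: the `PruneTable` class, its method names, and the lifetime value are hypothetical, not DVMRP's actual timers or message formats. Times are passed in explicitly so the example is deterministic.

```python
class PruneTable:
    """Soft-state prune entries for one router (hypothetical sketch)."""

    def __init__(self, lifetime):
        self.lifetime = lifetime   # seconds a prune stays valid without refresh
        self.expires = {}          # (source, group, iface) -> expiry time

    def prune(self, source, group, iface, now):
        # Receiving (or refreshing) a prune restarts its timer.
        self.expires[(source, group, iface)] = now + self.lifetime

    def graft(self, source, group, iface):
        # A graft cancels the prune immediately.
        self.expires.pop((source, group, iface), None)

    def should_forward(self, source, group, iface, now):
        # Forward unless an unexpired prune exists. Expired state simply
        # goes away, so data is flooded again until the prune is re-sent.
        key = (source, group, iface)
        expiry = self.expires.get(key)
        if expiry is not None and now >= expiry:
            del self.expires[key]
            expiry = None
        return expiry is None


table = PruneTable(lifetime=120)
table.prune("s", "g", "eth1", now=0)
print(table.should_forward("s", "g", "eth1", now=60))   # prune still fresh
print(table.should_forward("s", "g", "eth1", now=200))  # prune has timed out
```

Because nothing has to be explicitly torn down, a router that crashes and loses its prune state just starts flooding again, which is the fault-recovery property mentioned above.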
Example Topology
[Figure: example topology with source s and group members g]
Phase 1: Flood using Truncated Broadcast
[Figure: example topology with source s and group members g]
truncated broadcast: this router knows it has no groups on its LAN, so it doesn’t broadcast
Phase 2: Prune
[Figure: example topology with source s and group members g; labels: prune (s,g), prune (s,g)]
Phase 3: Graft
[Figure: example topology with source s and group members g; labels: graft (s,g), graft (s,g), report (g)]
Phase 4: Steady State
[Figure: example topology with source s and group members g]
Sending Data in DVMRP
• Data packets are sent on all branches of the tree
– send on all interfaces except the one they came in on
• RPF (Reverse Path Forwarding) Check:
– drop packets that arrive on incorrect interfaces (i.e., not from the unicast direction to the sending host)
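A minimal sketch of the forwarding rule and the RPF check, assuming a hypothetical `unicast_iface` map that gives, for each source, the interface this router would use to send unicast traffic toward it. Pruning is ignored here.

```python
def rpf_forward(in_iface, source, unicast_iface, all_ifaces):
    """Return the interfaces to forward on, or [] if the RPF check fails."""
    # The packet passes only if it arrived on the interface this router
    # would itself use to reach the source by unicast.
    if unicast_iface.get(source) != in_iface:
        return []
    # Passed: send on every interface except the one it came in on.
    return [i for i in all_ifaces if i != in_iface]


ifaces = ["eth0", "eth1", "eth2"]
routes = {"s": "eth0"}   # hypothetical unicast routing state
print(rpf_forward("eth0", "s", routes, ifaces))  # arrives on the RPF interface
print(rpf_forward("eth1", "s", routes, ifaces))  # arrives on a wrong interface
```

The check is what keeps flooded packets from looping: a copy that comes back around on any other interface is dropped.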
DVMRP Pros and Cons
• Pros:
– simple
– works well with many receivers. why? overhead is per-sender, receivers are passive
• Cons:
– works poorly with many groups (why? every sender in every group floods the nets)
– works poorly with sparse groups (why? flood data everywhere and then prune back, expensive if only
Link-state Multicast Routing
• Basic idea: treat group members (receivers) as new links
– flood info about them to everyone in LSA msg (just like LSA rtg)
• Compute next-hop for mcast routes on-demand (lazily)
– unlike for LSA unicast where all are computed as soon as LSA arrives
[Figure: topology with nodes S1, R1, R2, X, Y, Z]
Link state: Each router floods link state advertisement
Multicast: add membership information to “link state”
Each router computes multicast tree for each active source, builds forwarding entry with outgoing interface list.
[Figure: topology with nodes S1, R1, R2, X, Y, Z, W, Q]
Z has network map, including membership at X and Y
Z computes shortest path tree from S1 to X and Y (when it gets a data packet on G), puts in rtg table
W, Q, R, each do same thing as data arrives at them
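The on-demand computation at a router such as Z might look like the sketch below. It is an illustration, not MOSPF's actual data structures: the graph, link costs, `LazyMcastTable` class, and function names are all assumptions, and the topology only borrows node names from the figure.

```python
import heapq


def shortest_path_parents(graph, source):
    """Dijkstra: map each reachable node to its parent on the tree rooted at source."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (d + w, v))
    return parent


def outgoing_neighbors(graph, members, source, me):
    """Downstream neighbors of `me` on the source-rooted tree, pruned to members."""
    parent = shortest_path_parents(graph, source)
    out = set()
    for m in members:
        node = m
        # Walk from each member router back toward the source.
        while node in parent and parent[node] is not None:
            if parent[node] == me:
                out.add(node)
            node = parent[node]
    return out


class LazyMcastTable:
    """Forwarding entries computed only when the first data packet arrives."""

    def __init__(self, graph, membership, me):
        self.graph, self.membership, self.me = graph, membership, me
        self.cache = {}

    def lookup(self, source, group):
        key = (source, group)
        if key not in self.cache:  # first packet for (source, group)
            members = self.membership.get(group, set())
            self.cache[key] = outgoing_neighbors(self.graph, members, source, self.me)
        return self.cache[key]

    def on_lsa(self):
        # New topology or membership: discard entries, recompute on demand.
        self.cache.clear()


# Hypothetical topology with unit link costs.
graph = {
    "S1": {"Z": 1},
    "Z": {"S1": 1, "W": 1, "Q": 1},
    "W": {"Z": 1, "X": 1},
    "Q": {"Z": 1, "Y": 1},
    "X": {"W": 1},
    "Y": {"Q": 1},
}
z_table = LazyMcastTable(graph, {"G": {"X", "Y"}}, me="Z")
print(sorted(z_table.lookup("S1", "G")))
```

Clearing the whole cache on every LSA is the simplest choice for a sketch; an incremental update that only adds or removes the affected interface would avoid recomputing unaffected trees.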
[Figure: topology with nodes R1, R2, X, Y, Z, W, Q, R, S1]
Link state advertisement with new topology may require re-computation of tree and forwarding entry
[Figure: topology with nodes R1, R2, X, Y, Z, W, Q, R, S1, plus T and R3]
Link state advertisement (T) with new membership (R3) may require incremental computation and addition of interface
MOSPF Pros and Cons
• Pros:
– simple add-on to OSPF
– works well with many senders. why? no per-sender state
• Cons:
– works poorly with many receivers (why? per-receiver costs)
– works poorly with sparse groups (why? lots of info goes places that don’t want it)
– works poorly with large domains (why? link-state scales wrt number of links; many links cause frequent changes)
PIM
Key ideas
• want a mcast routing protocol that works well with sparse users
• use a single shared tree; fix one host as rendezvous point
Why not just DVMRP or MOSPF?
• With sparse groups, both are expensive
– DVMRP problem with many senders
– MOSPF problem with many receivers
– neither works well with sparse groups
• Solution: PIM-SM
– use rendezvous point as a place to meet
– but downside:
• single point of failure
• don’t necessarily get shortest path
• also concerned about “concentration” of all data going through rendezvous point
New Design Questions
• Where to place RP?
• How to make the RP robust?
– don’t want a single point of failure
• How to build the tree given an RP?
• How to send data with a shared tree?
• What is the overhead of going through RP (a shared tree)?
Where to place RP?
• RP is a node to which people send join messages
• place it in the core
– at the edge is more expensive since traffic must go through it
Robustness
• single RP is single point of failure, so must have backup plan
• approach:
– start with a set of cores
– hash the group name to form an ordered list
• basic idea: order RPs, hash(G) selects one, use it
• if it fails, hash(G) to find the next one
• if everyone uses the same hash function, people find the same RPs
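One way to picture the ordered-list idea is the sketch below. It is an assumption-laden illustration: the function names are hypothetical, SHA-256 is just a convenient deterministic hash, and the real PIM-SM specification defines its own hash function and candidate-RP machinery.

```python
import hashlib


def rp_order(group, candidates):
    """Order the candidate RPs for a group; every router computes the same list."""
    rps = sorted(candidates)  # agree on a base order first
    digest = hashlib.sha256(group.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(rps)
    # hash(G) picks the starting point; the rest follow in rotation.
    return rps[start:] + rps[:start]


def select_rp(group, candidates, alive):
    """Use the first RP in the group's list that is still reachable."""
    for rp in rp_order(group, candidates):
        if rp in alive:
            return rp
    return None


cores = ["rp-a", "rp-b", "rp-c"]  # hypothetical candidate set
order = rp_order("224.1.2.3", cores)
print(order)
# If the first choice fails, everyone independently falls back to the next one.
print(select_rp("224.1.2.3", cores, alive=set(cores) - {order[0]}))
```

Python's built-in `hash()` is avoided on purpose: it is randomized per process for strings, so two routers would not agree on the result.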
Building the Shared Tree
• Simply send a message towards the RP
– use the unicast routing table to get there
• Add links to the tree as you go
• Stop if you get to a router that’s already in the tree
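The join procedure can be sketched as a short loop. The names are hypothetical: `next_hop` stands in for each router's unicast next hop toward the RP, and the topology is made up for illustration.

```python
def join_shared_tree(router, rp, next_hop, on_tree, links):
    """Propagate a join from `router` toward `rp`, hop by hop."""
    on_tree.add(rp)  # the RP is always the root of the tree
    node = router
    # Stop as soon as the join reaches a router that is already on the tree.
    while node not in on_tree:
        on_tree.add(node)
        upstream = next_hop[node]     # unicast next hop toward the RP
        links.add((upstream, node))   # new tree link, parent first
        node = upstream


next_hop = {"R1": "A", "R2": "A", "R3": "RP", "A": "RP"}  # hypothetical routes
on_tree, links = set(), set()
join_shared_tree("R1", "RP", next_hop, on_tree, links)
join_shared_tree("R2", "RP", next_hop, on_tree, links)    # stops at A
print(sorted(links))
```

The second join adds only one link because router A is already on the tree, which is why join cost is proportional to the distance to the tree rather than to the RP.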
PIM Example: build Shared tree
[Figure: shared tree after R1, R2, R3 join; join messages travel toward RP; nodes RP, R1, R2, R3, R4, with (*, G) entries along the tree]
PIM: Sending Data
• If you are on the tree, you just send it as with other mcast protocols
– it follows the tree
• If you are not on the tree (say, you’re a sender but not a group member), the pkt is tunneled to the RP, which sends it down the tree
PIM Example: sending data on the tree
[Figure: nodes RP, R1, R2, R3, R4 with (*, G) entries along the shared tree; R4 sends data]
Sending data if not on the tree
[Figure: nodes RP, R1, R2, R3, R4, S1]
unicast encapsulated data packet to RP in Register
RP decapsulates, forwards down shared tree
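The encapsulate/decapsulate step can be sketched as follows. The `Packet` type and both functions are hypothetical and do not reflect the real PIM Register message format; they only show the wrapping and unwrapping.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str
    payload: object


def register(first_hop_router, rp, data):
    """Wrap a multicast data packet in a unicast packet addressed to the RP."""
    return Packet(src=first_hop_router, dst=rp, payload=data)


def rp_receive(register_pkt, shared_tree_children):
    """RP side: unwrap and return (child, packet) pairs to send down the tree."""
    inner = register_pkt.payload  # the original multicast packet
    children = shared_tree_children.get(inner.dst, [])
    return [(child, inner) for child in children]


data = Packet(src="S1", dst="G", payload="hello")
reg = register("R4", "RP", data)              # travels to the RP by unicast
print(rp_receive(reg, {"G": ["R1", "R2"]}))   # RP's (*, G) children, made up here
```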
What is the cost of the shared tree?
• Some data goes further than it should
– but latency is bounded to 2x SPT
• All data goes on one tree, rather than on many trees
– but no guarantee you get multiple paths with source-specific trees
• But to optimize things, PIM-SM supports
Build source-specific tree
[Figure: nodes RP, R1, R2, R3, R4, S1; join messages toward S1; RP distribution tree; routers labeled with (S1, G) and (*, G) entries]
Build source-specific tree for high data rate source
Forward packets on “longest-match” entry
[Figure: nodes RP, R1, R2, R3, R4, R5, S1; source (S1)-specific distribution tree and shared tree; routers labeled with (S1, G) and (*, G) entries]
Source-specific entry is “longer match” for source S1 than is shared tree entry that can be used by any source
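The longest-match rule amounts to trying the (S, G) entry before falling back to (*, G). A minimal sketch, with a hypothetical table layout that maps (source, group) to a list of outgoing interfaces:

```python
WILDCARD = "*"


def lookup(table, source, group):
    """Prefer the source-specific entry; fall back to the shared-tree entry."""
    entry = table.get((source, group))
    if entry is None:
        entry = table.get((WILDCARD, group))
    return entry  # None means no state for this group at all


table = {
    ("S1", "G"): ["eth1"],              # source-specific tree for S1
    (WILDCARD, "G"): ["eth1", "eth2"],  # shared tree, usable by any source
}
print(lookup(table, "S1", "G"))  # matches (S1, G)
print(lookup(table, "S2", "G"))  # no (S2, G) entry, uses (*, G)
```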
SPT and Shared Trees
• Many more details to be careful about
– need to handle switchover from shared-tree to SPT gracefully
– need to support pruning for both SPT and shared-tree
• and have to worry about LANs with multiple routers, multiple senders, etc.
• Uses similar protocols (soft-state, refresh, etc.), but lots of details
PIM-SM observations
• does a good job at intra-domain mcast routing that scales to
– many senders
– many receivers
– many groups
– large bandwidth
• preserves original (simple) service model
• but quite complex
Multi-AS Mcast Routing
• Fine, PIM-SM (or DVMRP or MOSPF) works inside an AS; what about between ASes?
– lots of policy questions
– and have to show ISPs why they should deploy (how they can make money :-)
– and convince them the world won’t end
• multicast, that’s for high-bandwidth video, right?
• multicast can flood all my links with data, right?
• what apps, again?
MSDP
• Support for inter-domain PIM-SM
• Temporary solution
• Basic approach:
– send all sources to all ASes (like original flood-and-prune)
– AS border routers are PIM-SM RPs for their domain
But does this seem complicated?
• some people thought so
• and commercial deployment has been slow
• if we change the service model, maybe we can greatly simplify things
– and make it easier for ISPs to understand how to change/manage mcast
Express
Key ideas
• use channels: a single sender, many subscribers
– makes mcast tree easier to config
– easier to tell who can send
• add mechanism to let you count subscribers
• easier to think about billing
Multicast Problems
• need billing mechanism
– need to know number of subscribers
• need access control
– need to limit who can send and subscribe
– ISPs concerned about mcast
• IPv4 mcast addresses too limited
• current protocols too complex
Express vs. Multicast Problems
• need billing mechanism
– record sources
– count receivers
• need access control
– only the source can send
• IPv4 mcast addresses too limited
Express Approach
• all addresses are source specific (S,E)
– 2^24 channels per source (2^32 sources)
• access control
– only source can send
– channels optionally protected by “key” (really just a secret)
• sub-cast support (encapsulate pkt to any router on the tree [if you know who they are])
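The arithmetic behind source-specific addressing can be sketched as below. This is an illustration under assumptions: the channel identifier E is treated as a plain 24-bit integer, the addresses are made up, and `channel` is a hypothetical helper rather than anything defined by Express.

```python
CHANNEL_BITS = 24  # 2^24 channels per source
SOURCE_BITS = 32   # one IPv4 address per source

channels_per_source = 2 ** CHANNEL_BITS                # 16,777,216
total_channels = 2 ** (SOURCE_BITS + CHANNEL_BITS)     # 2^56 (S, E) pairs


def channel(source, e):
    """A channel is identified by the pair (S, E), never by E alone."""
    if not 0 <= e < channels_per_source:
        raise ValueError("channel id out of range")
    return (source, e)


# Two sources can reuse the same E without any global coordination,
# because the source address is part of the key.
subscribers = {}
subscribers.setdefault(channel("10.0.0.1", 5), set()).add("hostA")
subscribers.setdefault(channel("10.0.0.2", 5), set()).add("hostB")
print(len(subscribers))
```

Keying state on (S, E) is also what makes the access-control rule cheap to enforce: a router can drop any packet for a channel whose source address is not S.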
Express Components
• ECMP: Express Count Mgt Protocol
– like IGMP, but also adds count support
– counts used to determine receivers or for other things like voting
• not clear how general
• session relays
– service at source that can relay data on to tree (similar to PIM tunneling)
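The counting idea can be pictured as each router summing what its children report and passing one number upstream. The sketch below is a simplification with hypothetical names and a made-up tree; it is not the ECMP message exchange itself.

```python
def count_subscribers(children, local, node):
    """Subscribers at or below `node` on the channel's distribution tree."""
    total = local.get(node, 0)  # hosts subscribed directly at this router
    for child in children.get(node, []):
        total += count_subscribers(children, local, child)
    return total


# Hypothetical tree rooted at the source's first-hop router.
children = {"root": ["a", "b"], "a": ["c"]}
local = {"b": 2, "c": 3}
print(count_subscribers(children, local, "root"))
```

Because each router forwards a single aggregate, the source learns the total without ever seeing the individual receivers. The same pattern would serve for voting by summing votes instead of memberships.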
Observations
• Simpler? yes
Another Alternative: Application-level Multicast
• if the ISPs won’t give us multicast, we’ll take it :-)
• just do it all at the app
• results in some duplicated data on links
• and app doesn’t have direct access to unicast routing
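The duplicated-data cost can be made concrete by counting how many copies of one packet cross each physical link. In this sketch the overlay, the router names, and the unicast paths are all made up; a real application would not know the paths, which is the point of the last bullet.

```python
from collections import Counter


def link_stress(overlay_edges, unicast_path):
    """Count copies of one packet on each physical link."""
    stress = Counter()
    for a, b in overlay_edges:
        path = unicast_path[(a, b)]  # physical hops used by this overlay edge
        for u, v in zip(path, path[1:]):
            stress[frozenset((u, v))] += 1
    return stress


# Host A unicasts one copy each to hosts B and C (a star overlay).
paths = {
    ("A", "B"): ["A", "r1", "r2", "B"],
    ("A", "C"): ["A", "r1", "r3", "C"],
}
stress = link_stress([("A", "B"), ("A", "C")], paths)
print(stress[frozenset(("A", "r1"))])  # copies on A's access link
```

With network-level multicast the same packet would cross the A to r1 link once and be replicated at r1.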
Application-level Multicast Example