
Optimizing Network Performance of Computing Pipelines in Distributed Environments


(1)


Qishi Wu¹, Yi Gu¹, Mengxia Zhu², Nageswara S.V. Rao³

¹Dept of Computer Science, University of Memphis
²Dept of Computer Science, Southern Illinois University
³Computer Science & Math Div, Oak Ridge National Laboratory

2008 IPDPS

(2)

Outline

1 Introduction

2 Cost Models and Problem Formulation

3 Algorithm Design: ELPC Algorithm, Streamline Algorithm, Greedy Algorithm

4 Implementation and Experimental Results

5 Conclusions and Future Work


(4)

Introduction


The demands of large-scale collaborative applications in various domains are beyond the capabilities of the traditional solutions based on standalone workstations.

Supporting high-performance computing pipelines over wide-area networks (WANs) is critical to enabling large-scale distributed scientific applications.

A number of large-scale computational applications require efficient execution of computing tasks that consist of a sequence of linearly arranged modules, also referred to as subtasks or stages.

These modules form a so-called computing pipeline between a data source and an end user.

(5)

Introduction

We consider two types of large-scale computing applications comprising a number of modules or subtasks to be executed sequentially in a distributed network environment:

1 Interactive applications, where a single dataset is sequentially processed along a computing pipeline.

Goal: minimize the end-to-end delay of the pipeline to provide fast response.

2 Streaming applications, where a series of datasets continuously flows through a computing pipeline.

Goal: maximize the frame rate of the pipeline to achieve smooth data flow.

(6)

Introduction


We construct analytical cost models for computing modules, network nodes, and communication links to estimate the computing times on nodes and the data transport times over connections.

Based on these time estimates, we present the Efficient Linear Pipeline Configuration (ELPC) method, which uses dynamic programming to partition the pipeline's modules into groups and map them onto a set of selected computing nodes in a network.

For comparison purposes, we also implement and test two other algorithms with the same simulation datasets on the same computing platform:

1 the Streamline algorithm (B. Agarwalla et al.)

2 a Greedy algorithm


(8)

Cost Models and Problem Formulation

Cost Models of Pipeline and Network Components

$M_i$: a computing module

$c_i$: the computational complexity of $M_i$

$m_{i-1}$: the incoming data size of $M_i$

$c_i$ and $m_{i-1}$ determine the number of CPU cycles needed to complete the subtask.

$v_i$: a network node

$p_i$: the overall computing power of $v_i$

$L_{i,j}$: the communication link between $v_i$ and $v_j$

$b_{i,j}$: the bandwidth of $L_{i,j}$

$d_{i,j}$: the minimum link delay of $L_{i,j}$

(9)

Cost Models of Pipeline and Network Components

The computing time of $M_i$ running on $v_j$:

$$T_{\text{computing}}(M_i, v_j) = \frac{c_i\, m_{i-1}}{p_j}$$

The transfer time of a message of size $m$ over $L_{i,j}$:

$$T_{\text{transport}}(m, L_{i,j}) = \frac{m}{b_{i,j}} + d_{i,j}$$
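The two cost formulas can be evaluated directly. This is an illustrative sketch only: the function names are hypothetical, and the units (cycles per byte for $c_i$, bytes for data sizes, cycles per second for $p_j$, bytes per second for $b_{i,j}$, seconds for $d_{i,j}$) are assumptions, since the slides do not fix them.

```python
def computing_time(c_i: float, m_prev: float, p_j: float) -> float:
    """T_computing(M_i, v_j) = c_i * m_{i-1} / p_j.

    c_i * m_prev is the number of CPU cycles module M_i needs for its input;
    dividing by the node's computing power p_j gives the execution time.
    """
    return c_i * m_prev / p_j


def transport_time(m: float, b_ij: float, d_ij: float) -> float:
    """T_transport(m, L_ij) = m / b_ij + d_ij: bandwidth term plus minimum link delay."""
    return m / b_ij + d_ij


# Hypothetical numbers: a 5 MB input at 2000 cycles/byte on a 2 GHz node,
# then the same 5 MB sent over a 1.25e8 B/s (1 Gb/s) link with 10 ms delay.
print(computing_time(2000, 5e6, 2e9))     # 5.0 seconds
print(transport_time(5e6, 1.25e8, 0.01))  # about 0.05 seconds
```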

(10)


Cost Models of Pipeline and Network Components

We consider a transport network consisting of $k$ geographically distributed computing nodes $v_1, v_2, \dots, v_k$.

The general pipeline consists of $n$ sequential modules $M_1, M_2, \dots, M_n$, where $M_1$ is a data source and $M_n$ is an end user.

(11)

Problem Formulation

The objective of a general mapping scheme is to decompose the pipeline into $q$ groups of modules, denoted by $g_1, g_2, \dots, g_q$, and map them onto a selected path $P$ of $q$ nodes from a source node $v_s$ to a destination node $v_d$, where $q \in [1, \min(k, n)]$.

Path $P$ consists of a sequence of not necessarily distinct nodes $v_{P[1]} = v_s, v_{P[2]}, \dots, v_{P[q]} = v_d$.

For each mapping, we consider two cases:

1 With node reuse, two or more modules are allowed to run on the same node.

2 Without node reuse, a node on the selected path P executes exactly one module.
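The two cases can be made concrete with a small validity check. This is a minimal sketch under a hypothetical representation: modules are numbered 0 to n-1 (module 0 is $M_1$), `groups[i]` lists the consecutive modules placed on `path[i]`, and nodes are plain strings. It checks only the structural rules stated above; whether consecutive nodes are actually linked in the network is not checked.

```python
def is_valid_mapping(groups, path, n, src, dst, node_reuse):
    """Check a mapping of modules 0..n-1 (module 0 = M_1) onto path nodes.

    groups[i] is the list of consecutive module indices placed on path[i].
    """
    # The groups must split the pipeline into non-empty, in-order, consecutive pieces.
    if len(groups) != len(path) or any(len(g) == 0 for g in groups):
        return False
    if [j for g in groups for j in g] != list(range(n)):
        return False
    # The first group runs on the source node and the last one on the destination node.
    if path[0] != src or path[-1] != dst:
        return False
    if node_reuse:
        return True  # nodes on P need not be distinct
    # Without node reuse, every node executes exactly one module: q = n, distinct nodes.
    return all(len(g) == 1 for g in groups) and len(set(path)) == len(path)


print(is_valid_mapping([[0], [1, 2], [3]], ["s", "a", "d"], 4, "s", "d", node_reuse=True))   # True
print(is_valid_mapping([[0], [1, 2], [3]], ["s", "a", "d"], 4, "s", "d", node_reuse=False))  # False
```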

(12)


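Minimal total delay for interactive applications

We achieve the fastest system response by minimizing the total computing and transport delay of the pipeline from the source node to the destination node.

Total delay

$$\begin{aligned}
T_{\text{total}}(\text{path } P \text{ of } q \text{ nodes}) &= T_{\text{computing}} + T_{\text{transport}} = \sum_{i=1}^{q} T_{g_i} + \sum_{i=1}^{q-1} T_{L_{P[i],P[i+1]}} \\
&= \sum_{i=1}^{q} \left( \frac{1}{p_{P[i]}} \sum_{M_j \in g_i,\, j \ge 2} c_j\, m_{j-1} \right) + \sum_{i=1}^{q-1} \frac{m(g_i)}{b_{P[i],P[i+1]}}
\end{aligned}$$

where $m(g_i)$ is the size of the data that group $g_i$ sends to the next group, and the condition $j \ge 2$ excludes the data source $M_1$.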
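The total-delay objective can be evaluated for a given mapping as follows. This is a sketch under assumed conventions: modules are 0-indexed (module 0 is $M_1$), `c[j]` and `m[j]` are the complexity and output size of module j, `power[v]` is $p_v$, and `bw[(u, v)]` is $b_{u,v}$. The network and pipeline numbers are hypothetical, and, as in the formula, the minimum link delay $d$ is not included.

```python
def total_delay(groups, path, c, m, power, bw):
    """T_total for modules split into groups[i] and run on path[i].

    Module j >= 1 consumes m[j - 1]; module 0 (the source M_1) does no computing.
    """
    # Computing part: sum over groups of (1/p_{P[i]}) * sum of c_j m_{j-1}, j >= 2.
    computing = sum(
        sum(c[j] * m[j - 1] for j in g if j >= 1) / power[v]
        for g, v in zip(groups, path)
    )
    # Transport part: m(g_i) is the output of the last module in g_i.
    transport = sum(
        m[groups[i][-1]] / bw[(path[i], path[i + 1])] for i in range(len(path) - 1)
    )
    return computing + transport


# Hypothetical 4-node network and 3-module pipeline (m[2] is the unused final output).
power = {"s": 10, "a": 100, "b": 50, "d": 10}
bw = {("s", "a"): 10, ("s", "b"): 10, ("a", "d"): 0.5, ("b", "d"): 10, ("s", "d"): 1}
c, m = [0, 5, 1], [20, 4, 0]
print(total_delay([[0], [1], [2]], ["s", "a", "d"], c, m, power, bw))  # about 11.4
print(total_delay([[0], [1], [2]], ["s", "b", "d"], c, m, power, bw))  # about 4.8
```

On this instance, node a computes faster than b, but the slow a → d link makes the path through b the better choice.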

(13)

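Maximal frame rate for streaming applications

To produce the smoothest data flow for streaming applications, we maximize the frame rate, which is achieved by identifying and minimizing the time incurred on the bottleneck link or node.

Time on bottleneck

$$\begin{aligned}
T_{\text{bottleneck}}(\text{path } P \text{ of } q \text{ nodes}) &= \max_{i=1,2,\dots,q-1} \Big( T_{\text{computing}}(g_i),\ T_{\text{transport}}(L_{P[i],P[i+1]}),\ T_{\text{computing}}(g_q) \Big) \\
&= \max_{i=1,2,\dots,q-1} \left( \frac{1}{p_{P[i]}} \sum_{M_j \in g_i,\, j \ge 2} c_j\, m_{j-1},\ \frac{m(g_i)}{b_{P[i],P[i+1]}},\ \frac{1}{p_{P[q]}} \sum_{M_j \in g_q,\, j \ge 2} c_j\, m_{j-1} \right)
\end{aligned}$$

The frame rate of the mapping is the reciprocal of this bottleneck time.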

(14)

Algorithm Design


(16)

Algorithm Design ELPC Algorithm

Minimum End-to-end Delay with Node Reuse

For interactive applications, our goal is to minimize the end-to-end delay incurred on the nodes and links from the source to the destination to achieve the fastest response.

A single dataset is processed and there is only one module being executed at any particular time.

Nodes can be reused but are not shared simultaneously among different modules.

(17)

Illustration of ELPC Mapping Scheme for Minimum End-to-end Delay

(18)


Minimum End-to-end Delay with Node Reuse

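Let $T_j(v_i)$ denote the minimal total delay with the first $j$ modules mapped to a path from the source node $v_s$ to node $v_i$. The following recursion leads to the final solution $T_n(v_d)$.

Minimal total delay

$$T_j(v_i)\Big|_{j=2,\dots,n,\ v_i \in V} = \min \left\{ \begin{array}{l}
T_{j-1}(v_i) + \dfrac{c_j\, m_{j-1}}{p_{v_i}}, \\[2ex]
\min\limits_{u \in \mathrm{adj}(v_i)} \left( T_{j-1}(u) + \dfrac{c_j\, m_{j-1}}{p_{v_i}} + \dfrac{m_{j-1}}{b_{u,v_i}} \right)
\end{array} \right.$$

Base condition

$$T_2(v_i)\Big|_{v_i \in V,\ v_i \ne v_s} = \begin{cases}
\dfrac{c_2\, m_1}{p_{v_i}} + \dfrac{m_1}{b_{v_s,v_i}}, & \forall\, e_{v_s,v_i} \in E \\[1ex]
\infty, & \text{otherwise}
\end{cases}$$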
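The recursion translates into a table-filling loop over modules and nodes. This is a minimal sketch, not the authors' implementation: the data layout (0-indexed modules, `power` dict, directed links in `bw[(u, v)]`) and the example network are assumptions. Starting from $T_1(v_s) = 0$ and $T_1(v) = \infty$ elsewhere reproduces the base condition for $v_i \ne v_s$ and also lets $M_2$ stay on $v_s$, as node reuse permits. Each module step scans every node once and every link once, which matches the $O(n \times |E|)$ bound for a connected network.

```python
import math


def elpc_min_delay(c, m, power, bw, src, dst):
    """Min end-to-end delay with node reuse via the T_j(v_i) recursion.

    c[j], m[j]: complexity and output size of module j (module 0 = M_1 on src).
    power[v]: p_v.  bw[(u, v)]: bandwidth of the directed link u -> v.
    Returns (T_n(dst), list giving the node of each module).
    """
    nodes = list(power)
    incoming = {v: [] for v in nodes}
    for (u, v), b in bw.items():
        incoming[v].append((u, b))

    # T_1: only the source holds M_1; every other node is unreachable (infinite delay).
    T = {v: (0.0 if v == src else math.inf) for v in nodes}
    back = []  # back[j-1][v]: node of module j-1 on the best path with module j on v
    for j in range(1, len(c)):
        newT, choice = {}, {}
        for v in nodes:
            comp = c[j] * m[j - 1] / power[v]
            # Option 1: reuse v for module j (no transfer needed).
            best, arg = T[v] + comp, v
            # Option 2: receive m_{j-1} from a neighbor u over link (u, v).
            for u, b in incoming[v]:
                cand = T[u] + comp + m[j - 1] / b
                if cand < best:
                    best, arg = cand, u
            newT[v], choice[v] = best, arg
        T = newT
        back.append(choice)

    if math.isinf(T[dst]):
        return math.inf, None
    # Walk the stored choices backwards from the destination.
    mapping = [dst]
    for choice in reversed(back):
        mapping.append(choice[mapping[-1]])
    mapping.reverse()
    return T[dst], mapping


# Hypothetical network and pipeline (m[2] is the unused final output).
power = {"s": 10, "a": 100, "b": 50, "d": 10}
bw = {("s", "a"): 10, ("s", "b"): 10, ("a", "d"): 0.5, ("b", "d"): 10, ("s", "d"): 1}
c, m = [0, 5, 1], [20, 4, 0]
delay, mapping = elpc_min_delay(c, m, power, bw, "s", "d")
print(delay, mapping)  # about 4.8, ['s', 'b', 'd']
```

Undirected links can be modeled by inserting both `(u, v)` and `(v, u)` into `bw`.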


Minimum End-to-end Delay with Node Reuse

The complexity of this algorithm is O(n × |E|), where

- n denotes the number of modules
- |E| is the number of edges

(22)


Maximum Frame Rate without Node Reuse

For streaming applications, our goal is to maximize frame rate.

The maximum frame rate a computing pipeline can achieve is limited by the bottleneck unit, which is the slowest transport link or computing node.

Node reuse in streaming applications causes resource sharing, and hence affects the optimality of the solutions to previous mapping subproblems.

We consider a restricted version of the mapping problem for maximum frame rate by limiting the use of each node to a single module.

(23)

Illustration of ELPC Mapping Scheme for Maximum Frame Rate

(24)


Maximum Frame Rate without Node Reuse

We attempt to find the widest network path with exactly $n$ nodes to map the $n$ modules in the pipeline on a one-to-one basis.

This problem is NP-complete.

We develop an approximate solution by adapting the method for minimum end-to-end delay with some necessary modifications.

(25)

Maximum Frame Rate without Node Reuse

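Let $1/T_j(v_i)$ denote the maximal frame rate with the first $j$ modules mapped to a path from the source node $v_s$ to node $v_i$, so that $T_j(v_i)$ is the corresponding time on the bottleneck. Likewise, the following recursion leads to the final solution $T_n(v_d)$.

Time on bottleneck

$$T_j(v_i)\Big|_{j=2,\dots,n,\ v_i \in V} = \min_{u \in \mathrm{adj}(v_i)} \left( \max \left( T_{j-1}(u),\ \frac{c_j\, m_{j-1}}{p_{v_i}},\ \frac{m_{j-1}}{b_{u,v_i}} \right) \right)$$

Base condition

$$T_2(v_i)\Big|_{v_i \in V,\ v_i \ne v_s} = \begin{cases}
\max\left( \dfrac{c_2\, m_1}{p_{v_i}},\ \dfrac{m_1}{b_{v_s,v_i}} \right), & \forall\, e_{v_s,v_i} \in E \\[1ex]
\infty, & \text{otherwise}
\end{cases}$$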
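The slides do not spell out the "necessary modifications", so the sketch below assumes one plausible choice: each state keeps the partial path that produced it, and a neighbor $u$ is skipped when its path already contains $v_i$, so no node hosts two modules. Because there is no reuse option, every step moves to a new node and the resulting path has exactly $n$ nodes. Like the recursion itself, this is a heuristic for an NP-complete problem and may miss the optimum. Names, data layout, and the example network are hypothetical.

```python
import math


def elpc_max_frame_rate(c, m, power, bw, src, dst):
    """Approximate max frame rate without node reuse via the bottleneck recursion.

    c[j], m[j]: complexity and output size of module j (module 0 = M_1 on src).
    power[v]: p_v.  bw[(u, v)]: bandwidth of the directed link u -> v.
    Returns (T_n(dst), node path); the frame rate is 1 / T_n(dst).
    """
    nodes = list(power)
    incoming = {v: [] for v in nodes}
    for (u, v), b in bw.items():
        incoming[v].append((u, b))

    T = {v: math.inf for v in nodes}
    path = {v: None for v in nodes}
    T[src], path[src] = 0.0, [src]  # M_1 sits on the source and does no computing
    for j in range(1, len(c)):
        newT = {v: math.inf for v in nodes}
        newP = {v: None for v in nodes}
        for v in nodes:
            comp = c[j] * m[j - 1] / power[v]
            for u, b in incoming[v]:
                # Assumed modification: skip u if it is unreachable or its path already uses v.
                if path[u] is None or v in path[u]:
                    continue
                cand = max(T[u], comp, m[j - 1] / b)
                if cand < newT[v]:
                    newT[v], newP[v] = cand, path[u] + [v]
        T, path = newT, newP
    return T[dst], path[dst]


# Hypothetical network and pipeline (m[2] is the unused final output).
power = {"s": 10, "a": 100, "b": 50, "d": 10}
bw = {("s", "a"): 10, ("s", "b"): 10, ("a", "d"): 0.5, ("b", "d"): 10, ("s", "d"): 1}
c, m = [0, 5, 1], [20, 4, 0]
t, p = elpc_max_frame_rate(c, m, power, bw, "s", "d")
print(t, p, 1 / t)  # 2.0 ['s', 'b', 'd'] 0.5
```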


(28)

Algorithm Design Streamline Algorithm

Streamline Algorithm

Agarwalla et al. proposed a grid scheduling algorithm for graph dataflow scheduling in a network with n resources and n × n communication links.

This algorithm considers application requirements in terms of

1 Per-stage computation and communication needs

2 Application constraints on co-location of stages

3 Availability of computation and communication resources

This scheduling heuristic aims to maximize the throughput of an application by assigning the best resources to the neediest stages at each step.

The complexity of this algorithm is O(m × n²), where

- m is the number of stages or modules
- n is the number of nodes


(30)

Algorithm Design Greedy Algorithm

Greedy Algorithm

A greedy algorithm iteratively obtains the greatest immediate gain based on certain local optimality criteria at each step.

At each step, we calculate the end-to-end delay or frame rate that results from mapping the next module onto the current node (when node reuse is allowed) or onto one of its neighbor nodes, and choose the best option.

This algorithm makes a mapping decision at each step only based on the current information without considering the effect of this local decision on the mapping performance in the later steps.

The complexity of this algorithm is O(m × n), where

- m denotes the number of modules in the linear pipeline
- n is the number of nodes in the network
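A minimal sketch of the greedy rule for end-to-end delay with node reuse, using the same hypothetical data layout as above. One detail is an assumption not stated on the slides: the last module is forced onto the destination node, and the mapping fails if the destination is neither the current node nor one of its neighbors at that step.

```python
import math


def greedy_min_delay(c, m, power, bw, src, dst):
    """Greedy mapping for end-to-end delay with node reuse.

    At each step the next module goes to the current node or one of its
    neighbors, whichever adds the least computing + transfer time right now.
    Assumption: the last module is forced onto dst (failure if unreachable).
    """
    out = {}
    for (u, v), b in bw.items():
        out.setdefault(u, []).append((v, b))

    def step_cost(j, v, transfer):
        return c[j] * m[j - 1] / power[v] + transfer

    cur, total, mapping = src, 0.0, [src]
    n = len(c)
    for j in range(1, n):
        # Candidates: stay on the current node (no transfer) or move to a neighbor.
        options = [(cur, 0.0)] + [(v, m[j - 1] / b) for v, b in out.get(cur, [])]
        if j == n - 1:
            options = [(v, t) for v, t in options if v == dst]
            if not options:
                return math.inf, None
        v, t = min(options, key=lambda o: step_cost(j, o[0], o[1]))
        total += step_cost(j, v, t)
        cur = v
        mapping.append(v)
    return total, mapping


# Hypothetical network and pipeline (m[2] is the unused final output).
power = {"s": 10, "a": 100, "b": 50, "d": 10}
bw = {("s", "a"): 10, ("s", "b"): 10, ("a", "d"): 0.5, ("b", "d"): 10, ("s", "d"): 1}
c, m = [0, 5, 1], [20, 4, 0]
total, mapping = greedy_min_delay(c, m, power, bw, "s", "d")
print(total, mapping)  # about 11.4, ['s', 'a', 'd']
```

On this instance the greedy rule prefers a at the first step (3 vs. 4 time units) and is then stuck with the slow a → d link. Evaluating the ELPC delay recursion by hand on the same data gives about 4.8 via s → b → d, which illustrates the drawback of ignoring later steps.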


(32)

Implementation and Experimental Results

Implementation

We conduct an extensive set of mapping experiments using a wide variety of simulated application pipelines and computing networks.

We generate these simulation datasets by randomly varying the pipeline and network attributes within a suitably selected range of values.

For each mapping problem, we designate a source node and a destination node to run the first module and the last module of the pipeline.

(33)

Performance comparison of the three algorithms

(34)


Performance Comparison of Minimum End-to-end Delay for the Three Algorithms

The x-axis represents the case number; there are 20 cases in total.

(35)

Performance Comparison of Maximum Frame Rate for the Three Algorithms

(36)

Conclusions and Future Work


(37)

Conclusions

We designed an ELPC scheme based on dynamic programming that strategically maps modules of computing pipelines to shared or dedicated network environments to achieve the minimum end-to-end delay and maximum frame rate.

The experimental results show that ELPC achieves better mapping performance than the other two methods (Streamline and Greedy).

(38)


Future Work

We will study the pipeline mapping problem for maximum frame rate in the case of node reuse.

We will also extend linear pipelines to graph workflows, study the complexity of graph workflow mapping problems in distributed environments, and develop efficient solutions to them.
