Cloud computing - Implementation of Ray Tracing for Massive Scenes on the Cloud

“Cloud computing” is more a commercial slogan than a strictly defined technical term. However, it is widely agreed that a cloud computing environment should have certain characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. As virtualization technologies have matured, service providers can now offer cloud services at affordable prices, enabling users to rent resources efficiently at minimal cost. To exploit the power of cloud computing, users should adapt their applications to the cloud computing environment. It is therefore necessary to analyze how the characteristics of cloud computing relate to those of the application.

The National Institute of Standards and Technology (NIST) defines five characteristics of cloud computing [22], which outline the big picture of the cloud computing environment and suggest the current business model.

On-demand self-service

A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

Broad network access

Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Resource pooling

The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.

Rapid elasticity

Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

Measured service

Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

From the user's point of view, the cloud computing environment provides resources with the following characteristics, which differ from those of traditional cluster and grid computing environments.

Elastic deployment

Resources are provisioned and released elastically on demand. With the help of virtualization technology, deploying a large-scale computing environment is efficient and much easier than before. Users can launch the needed number of instances from pre-built images or from snapshots of other instances. For example, a large-scale environment with several hundred or several thousand instances can be launched in a few minutes with minimal manpower.

Resource performance

Cloud computing services are usually built on top of virtualization technology. Therefore, the performance of processors, memory, network, and storage varies with many factors, including the underlying hardware topology and how many resources are currently occupied by neighbors on the same physical machine.

Instance types

Service providers usually try to use their resources to serve as many customers as possible in order to maximize profit. Therefore, a big instance with many processors and much memory usually gives a worse price/performance ratio than a small one with fewer processors and less memory.

We aim to design a ray tracer that runs in the cloud computing environment and supports out-of-core scenes. To fit the needs of the cloud computing environment, the rendering framework is designed with the following properties:

• Scalability Performance should improve with additional resources in the system.

• Elastic resource management Instances can be added to or removed from the system dynamically.

• Heterogeneous hosts Hosts in the system can be implemented in different languages and on different platforms. Each host can also have different resources.

Chapter 3 Design and implementation

The design and the implementation details of this work will be described in this chapter.

This implementation is based on PBRT version 2 [25]. The design of PBRT is clean, elegant, and well documented, and the code is open source. It is a popular physically based ray tracer in research and academia. The original PBRT runs on a single machine with multi-threading support.

With moderate extensions, our implementation, called cloudrenderer, runs in distributed systems such as cloud computing environments. The computing model, the network framework, and the resource utilization management are all designed as extensions to PBRT to support the cloud environment. They are described in the following sections.

3.1 Distributed computing model

In our cloud environment, the distributed system consists of many distinct hosts connected through a network. Each host is a complete system with its own processors, memory, storage, and stand-alone operating system. Hosts communicate with one another via network connections; apart from that, each host is essentially independent. The distributed computing model we designed splits the whole job into smaller tasks and dispatches them to all hosts. The model governs task computation, state communication, progress management, and so on, coordinating all hosts in the system to finish the job.

3.1.1 Roles

All hosts in the system are initially equal, but we select one host to be the manager, which coordinates the whole system. All other hosts become workers, which finish the tasks received from either the manager or other workers.

The workers receive tasks continuously and process them based on the task properties. If a worker can process a task locally, the task is added to a ready queue for further execution; otherwise, the worker forwards the task to another worker that can process it. While a task is executed on a worker, it may generate more tasks. Every new task is handled with the same strategy. The job is finished when all task executions are complete and no new tasks are being generated. The manager collects all task responses to form the final result.

While the workers are processing tasks, the manager continually splits the job into more tasks and dispatches them to the workers. The manager also monitors all workers and periodically updates the workers' statuses. With real-time monitoring, the system can gradually balance the load and tolerate accidents such as hardware failures. The details are explained in the following sections.

3.1.2 Scene partition

In PBRT, the entire scene is loaded at the beginning. However, hosts in cloud computing environments may be unable to load the entire scene because of memory limitations. Therefore, we designed a mechanism to split the scene into parts. Each host loads only some parts of the scene instead of the whole.

The PBRT scene file format is a collection of statement lines. Each statement represents a graphics command and consists of a token and zero or more parameters. The token indicates the graphics command of the statement. For example, Shape indicates a primitive definition and Light indicates a light definition. The scene file is not stateless: the current graphics context depends on all previously defined statements. For example, a transformation statement defined on line 10 affects all statements after line 10 until the end of the scene file.

PBRT also provides statements to enter or leave a new graphics context, such as AttributeBegin and AttributeEnd. When leaving the newly created graphics context, all changes made within it are discarded and the previous context is restored. PBRT also supports scene inclusion with the Include statement. With this feature, one can place some statements in another scene file and include that file in the main scene file.
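The stateful nature of the format can be illustrated with a toy interpreter. The sketch below is not PBRT's parser; it handles only a handful of statement names, and the function name `collect_shapes` and the idea of representing a graphics context as a list of transformation statements are assumptions made purely for illustration. It shows that the context seen by a Shape depends on every earlier statement, and that AttributeEnd restores the context saved by the matching AttributeBegin.

```python
def collect_shapes(lines):
    """Return (shape statement, active transform statements) pairs.

    Toy model: a graphics context is just the list of transformation
    statements currently in effect (an assumption for illustration).
    """
    saved = []      # contexts saved by AttributeBegin
    current = []    # transformation statements active right now
    shapes = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        command = tokens[0]
        if command == "AttributeBegin":
            saved.append(list(current))      # remember the outer context
        elif command == "AttributeEnd":
            current = saved.pop()            # discard inner changes
        elif command in ("Translate", "Scale", "Rotate"):
            current.append(line.strip())
        elif command == "Shape":
            shapes.append((line.strip(), tuple(current)))
    return shapes


scene = [
    "Translate 0 0 5",
    "AttributeBegin",
    "Scale 2 2 2",
    'Shape "sphere"',
    "AttributeEnd",
    'Shape "disk"',
]
result = collect_shapes(scene)
```

In this example the sphere is recorded with both the Translate and the Scale statement, while the disk sees only the Translate, because the Scale was discarded on AttributeEnd. A shape therefore cannot be interpreted correctly in isolation from the statements that precede it.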

This kind of scene format is intuitive and easy to parse, but it makes loading only parts of the scene difficult. To split the scene and support partial loading, we modified the parser module while keeping the scene format compatible. After the scene is split, one main scene file and many scene shape part files are generated. The main scene file contains all statements except the shapes; all shape statements are placed in the shape part files. When loading a partial scene, the parser ignores the parts that are not assigned to the current host. In this way, each host holds only the main scene file and a subset of the scene part files, which is much smaller than the whole scene. Some constraints must be followed when distributing the scene parts among the hosts. We call the procedure of distributing the scene parts among the hosts “scene partition.”

Suppose H₁, H₂, . . . , H_m denote the hosts and S₁, S₂, . . . , S_n denote the scene parts. The minimum replication number is an integer specified by the user that gives the least number of times each scene part should be replicated. It indicates the maximum number of host failures that can occur at the same time.

The scene partition (SP) does not have a unique solution. In our implementation, we sum the sizes of the scene parts and the main memory of the hosts to obtain the average replication number R_avg. We then sort the scene parts by size and distribute them one by one, from the largest to the smallest. When distributing a scene part, we randomly choose a host with enough remaining main memory to hold it. The distribution repeats until all the scene parts have been distributed R_avg times. The resulting distribution is maintained by the manager in a map named the scene-host map, which describes which scene parts are held by which hosts. Besides the scene-host map, the manager also maintains a host list describing the available hosts and their current loads. The manager periodically updates the scene-host map and the host list and distributes them to all hosts.
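The distribution procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cloudrenderer code: the function name `partition_scene`, the dictionary layout, and the rule that a host never receives the same part twice are all hypothetical. The replication count is passed in directly, because the text does not spell out how R_avg is derived from the two sums.

```python
import random


def partition_scene(part_sizes, host_memory, replication, seed=0):
    """Distribute scene parts to hosts, largest part first.

    part_sizes:  {part name: size}
    host_memory: {host name: available main memory, same unit as sizes}
    replication: number of copies of every part (R_avg in the text)
    Returns (scene_host_map, remaining_memory).
    """
    rng = random.Random(seed)                 # seeded only for repeatability
    remaining = dict(host_memory)
    scene_host_map = {part: [] for part in part_sizes}
    ordered = sorted(part_sizes, key=part_sizes.get, reverse=True)

    for _ in range(replication):              # one round per replica
        for part in ordered:
            size = part_sizes[part]
            # Assumption: a host holds at most one copy of a given part.
            candidates = [
                host for host in sorted(remaining)
                if remaining[host] >= size and host not in scene_host_map[part]
            ]
            if not candidates:
                raise RuntimeError(f"no host can hold another copy of {part}")
            host = rng.choice(candidates)     # random choice among fitting hosts
            scene_host_map[part].append(host)
            remaining[host] -= size
    return scene_host_map, remaining


part_sizes = {"S1": 4, "S2": 3, "S3": 1}
host_memory = {"H1": 10, "H2": 10, "H3": 10}
scene_host_map, remaining = partition_scene(part_sizes, host_memory, replication=2)
```

The returned `scene_host_map` plays the role of the scene-host map kept by the manager. Because hosts are chosen at random, different seeds give different but equally valid partitions, which matches the observation that the partition is not unique.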

3.1.3 Tasks

A Task is the basic computing object in our distributed computing model. The hosts can generate tasks, process tasks, submit tasks to other hosts or reply to other hosts with the processing results. A basic task has the following attributes and operations:

• ID a unique identifier in the whole system,

• state the task state,

• submit() a host can submit a task to another host,

• reply() a host can reply with the results to the submitting host,

• generate() a task can generate more tasks,

• collect() a task can gather the replies from the descendant tasks.

Among the operations, submit() and reply() are used between hosts, while generate() and collect() are used internally within a host.
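The host-internal operations can be sketched with a small data structure. The class below is illustrative only: the field names `parent`, `pending`, and `results` are assumptions, and submit() and reply() are omitted because they depend on the network layer. Identifiers are drawn from Python's standard `uuid` module.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Task:
    state: str = "new"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent: "Task | None" = None
    pending: set = field(default_factory=set)    # IDs of unfinished children
    results: list = field(default_factory=list)  # replies gathered so far

    def generate(self, count):
        """Create child tasks and record that this task depends on them."""
        children = [Task(parent=self) for _ in range(count)]
        self.pending.update(child.id for child in children)
        return children

    def collect(self, child_id, result):
        """Store one child's reply; return True once every child has replied."""
        self.pending.discard(child_id)
        self.results.append(result)
        return not self.pending


root = Task()
children = root.generate(2)
first_done = root.collect(children[0].id, 0.25)   # one child still pending
all_done = root.collect(children[1].id, 0.5)      # last child has replied
```

Keeping only the IDs of the unfinished children lets the parent detect completion without holding references to tasks that may be running on other hosts.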

Every host has five task queues. Tasks are moved from one queue to another according to the finite state machine shown in Figure 3.1. Initially, a new task is generated with a unique identifier. We use the Universally Unique IDentifier (UUID) [21] in our implementation. The scheduler thread then decides whether the newly generated task can be processed locally. If it can, the task is moved to the ready queue and its state is changed to ready accordingly; otherwise, it is moved to the submitting queue, where it waits to be submitted to another host that can process it.

Figure 3.1: The finite state machine of a task’s life cycle. a. the task can be processed locally, b. a running thread picks up the task, c. the task is finished, d. the task cannot be processed locally, e. the task failed to be submitted to another worker, f. the task was successfully submitted to another worker, g. the task generates new tasks, h. all descendant tasks are finished, i. the response failed to be sent to the submitter, j. the response was successfully sent to the submitter.

The tasks in the ready queue are picked up by a thread and enter the running state. While a task is being processed, it can generate() more tasks. The newly generated tasks are initialized in the new state, with their dependencies on the original task defined. The original task is moved to the waiting queue to wait for all generated tasks to finish. If the task does not generate any new tasks, it is finished and moves to the replying queue.

The communication thread continuously receives responses from other hosts and invokes collect() of the corresponding tasks in the waiting queue. When all dependent next-generation tasks of a given task are finished, the task is moved to the ready queue for further processing.

Communications may fail in the real world. Therefore, tasks in the submitting or replying queue repeat the submission or reply until success is confirmed or until the target host is removed (in which case the tasks are immediately discarded). The operations related to fault tolerance will be described later.
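The life cycle described in this section can be written down as a transition table keyed by the event letters of Figure 3.1. The sketch below is a reading of the text, not the original implementation. The state names reuse the queue names, and the target of transition f is an assumption: the text does not say where a task goes after a successful submission, so it is modeled here as leaving the local host (`end`).

```python
TRANSITIONS = {
    ("new", "a"): "ready",              # can be processed locally
    ("ready", "b"): "running",          # a running thread picks up the task
    ("running", "c"): "replying",       # finished without generating tasks
    ("new", "d"): "submitting",         # cannot be processed locally
    ("submitting", "e"): "submitting",  # submission failed, retry
    ("submitting", "f"): "end",         # ASSUMPTION: handed off to another worker
    ("running", "g"): "waiting",        # generated new tasks
    ("waiting", "h"): "ready",          # all descendant tasks finished
    ("replying", "i"): "replying",      # reply failed, retry
    ("replying", "j"): "end",           # reply delivered to the submitter
}


def step(state, event):
    """Return the next state, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


def run(events, state="new"):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state


# A local task that spawns children, resumes, and replies after one failed attempt.
local_final = run(["a", "b", "g", "h", "b", "c", "i", "j"])
# A task that must be forwarded and succeeds on the second attempt.
forwarded_final = run(["d", "e", "f"])
```

Encoding the retries e and i as self-loops keeps a task in its queue until the communication succeeds, which mirrors the repeat-until-confirmed behavior described above.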

3.1.4 Task categories

There are several types of tasks, each implementing a different function to support ray-traced rendering.

• loader/unloader load/unload the scene parts,

• renderer render a specific part of the film,

• tracer trace a specific ray,

• aggregator aggregate the rays at the position of the ray-primitive intersection,

• intersection test the ray-primitive intersection.

Initially, the manager receives a rendering job from the user. The manager parses the scene file and splits it into multiple scene parts. Then the manager submits loader tasks with different scene parts to all workers. The workers reply to the manager once the scene parts are fetched and completely loaded. The manager then starts to submit renderer tasks to the workers. When a worker processes a renderer task, the sampler samples many rays and generates the corresponding tracer tasks. The intersection and aggregator tasks are generated while processing tracer tasks. The workers may process intersection and aggregator tasks locally, or submit them to other hosts and wait for completion. When all the dependent tasks are finished, the radiance of the specific pixel is returned to the manager. The manager writes the pixel value to the film, then submits and waits for other renderer tasks until all the pixel values are drawn.

The intersection tasks test ray-object intersections, which dominate the whole ray tracing algorithm. Since every host holds only parts of the whole scene, an intersection task may involve more than one host. A subset of the hosts that together hold the entire scene is selected for an intersection task. Intuitively, the number of hosts in this subset should be as small as possible, and the load of the hosts should be as low as possible. This leads to the minimization

$$
\arg\min_{S_H} \; |S_H| + \alpha \, \mathrm{load}(S_H) \quad \text{subject to} \quad S_i \in SP(S_H) \;\; \forall i = 1, 2, \ldots, n, \tag{3.3}
$$

where

$S_H$ is the subset of hosts,

$\mathrm{load}(S_H)$ is the average load of the hosts in $S_H$,

$\alpha$ is a parameter to balance the two cost terms,

$S_i$ is the i-th scene part,

$SP(S_H)$ is the set of scene parts held by all hosts in $S_H$.

This minimization cannot be solved efficiently. Instead, we use a greedy algorithm to find a local optimum. When choosing the subset, the host with the lowest load is selected and the scene parts it holds are marked. We continue to select the lowest-load host and mark its scene parts until all the scene parts are covered. The selected hosts form the subset used for the intersection task.
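A minimal sketch of this greedy selection is given below. The names `select_hosts` and `cost` and the dictionary layout are hypothetical. The sketch also assumes that a host contributing no uncovered scene part is skipped, which the text does not state explicitly.

```python
def select_hosts(host_parts, host_load, all_parts):
    """Greedily pick low-load hosts until every scene part is covered.

    host_parts: {host name: set of scene parts held by that host}
    host_load:  {host name: current load}
    """
    uncovered = set(all_parts)
    selected = []
    # Visit hosts from the lowest load to the highest (ties broken by name).
    for host in sorted(host_parts, key=lambda h: (host_load[h], h)):
        if not uncovered:
            break
        newly_covered = uncovered & host_parts[host]
        if newly_covered:                 # ASSUMPTION: skip hosts that add nothing
            selected.append(host)
            uncovered -= newly_covered    # mark these scene parts
    if uncovered:
        raise ValueError(f"scene parts not held by any host: {sorted(uncovered)}")
    return selected


def cost(selected, host_load, alpha):
    """Objective of the minimization: subset size plus alpha times average load."""
    average_load = sum(host_load[h] for h in selected) / len(selected)
    return len(selected) + alpha * average_load


host_parts = {
    "H1": {"S1", "S2"},
    "H2": {"S2", "S3"},
    "H3": {"S1", "S3"},
    "H4": {"S3"},
}
host_load = {"H1": 0.5, "H2": 0.25, "H3": 0.75, "H4": 0.0}
selected = select_hosts(host_parts, host_load, {"S1", "S2", "S3"})
greedy_cost = cost(selected, host_load, alpha=1.0)
pair_cost = cost(["H1", "H2"], host_load, alpha=1.0)
```

In this example the greedy pass selects three hosts, whereas H1 and H2 alone would cover the scene at a lower cost. This illustrates that the result is a local optimum rather than the minimizer, in exchange for a selection that takes a single pass over the sorted host list.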
