PDF Archive

Easily share your PDF documents with your contacts, on the Web and Social Networks.

Share a file Manage my documents Convert Recover PDF Search Help Contact



multireot .pdf



Original filename: multireot.pdf

This PDF 1.4 document has been generated by PScript5.dll Version 5.2.2 / Acrobat Distiller Server 8.1.0 (Sparc Solaris, Built: 2007-09-07), and has been sent on pdf-archive.com on 20/10/2016 at 18:55, from IP address 90.190.x.x. The current document download page has been viewed 226 times.
File size: 1.5 MB (7 pages).
Privacy: public file




Download original PDF file









Document preview


ISSN 0361 7688, Programming and Computer Software, 2014, Vol. 40, No. 4, pp. 208–214. © Pleiades Publishing, Ltd., 2014.
Published in Russian in Programmirovanie, 2014, Vol. 4, No. 4.

Multiple Reference Octrees for a GPU Photon Mapping
and Irradiance Caching
V. A. Frolova*, A. A. Kharlamovb**, V. A. Galaktionova***,
and K. A. Vostryakovb****
a

Keldysh Institute of Applied Mathematics RAS, Miusskaya square 4, Moscow, 125047 Russia
b
Nvidia Moscow, Dvincev 12, Russia
e mail: *vfrolov@graphics.cs.msu.ru, **akharlamov@nvidia.com,
***vlgal@gin.keldysh.ru, ****kvostryakov@nvidia.com
Received January 15, 2014

Abstract—This paper introduces an efficient and simple algorithm for constructing Multiple Reference
(MR) octrees on a GPU in application to Photon Mapping and Irradiance Caching techniques. Although
MR octrees are hierarchical structures, we successfully ignore their hierarchical nature and present an
approach with plain construction, compact data layout and stack less traversal. Our algorithm uses only 2
parallel primitives (parallel append and parallel sort) and can be expressed in several lines of pseudo code.
DOI: 10.1134/S0361768814040033
1

1. INTRODUCTION

Octrees are popular data structures due to their
spatial regularity and relatively small memory foot
print compare to other regular spatial structures in 3D.
Each internal node has exactly eight children and
recursively subdivides it’s space into eight octants.
In Single Reference Octrees each node contains a list
of references to objects that are enclosed by the node.
In a Multiple Reference (MR) Octree each leaf
contains references to all objects that overlap with the
leaf’s AABB. Such octrees are used to simplify the tra
versal procedure and are essential in a variety of algo
rithms, for example in the final gathering pass of pho
ton mapping. A search of all objects which enclose a
certain point lead to a simple traverse from root to leaf
without a recursion and stack.
2. RELATED WORK
2.1. Multiple. Reference Octrees
MR octrees are commonly used for irradance
caching technique (IC) on CPU. It was reported that
Multiple Reference Octrees were slower to build but
are faster during look ups compared to Single Refer
ence Octrees [1]. MR Octree construction is usually
performed progressively. To insert a new record we
traverse the tree recursively from top to bottom and IC
record is passed to each node that overlaps with it until
the traversal reaches a leaf. If a leaf has too many
records, it becomes a node and is subdivided further.
This and other hierarchical approaches are quite hard
1 The article is published in the original.

to be implemented on a GPU in a straightforward
manner due to their non uniform memory allocations
and serial nature of construction algorithm (it is not
possible to construct next level of tree while the cur
rent level is not completed).
2.2. GPU Octree Construction
The interesting feature of octrees is the direct cor
respondence between an octree and 3D Morton Code
[2] (or ZCurve). Once the data is marked with a
Z index and sorted by this index, the sorted list repre
sents a certain level of an octree. Evaluating a Z index
for a custom point in 3D space and using binary search
inside the sorted list is equivalent to traversal.
[3] uses mentioned correspondence and notes the
property of parent Z index to be a prefix for child: if to
discard the least significant 3 bits from Z index value
of a cell we get Z index of the parent cell. In this work
octrees were stored in a set of regular grids and authors
do not discuss a problem of storing point lists. So their
octree can only perform queries to check if something
occupies a given region of the 3D space, but it can’t
return the associated point list.
[4] introduces an efficient GPU octree construc
tion based on Morton Codes in application to surface
reconstruction from point clouds. Zhou et al. start
with the leaf nodes and perform a series of parallel
compaction operations to determine their ancestors,
one level at a time.
[5] presents a technique for converting polygonal
models to voxel representation based on octrees and
hardware rasterisation. Leaves of the resulting octree
store RGBA8 values rather than object lists. The

208

MULTIPLE REFERENCE OCTREES FOR A GPU PHOTON MAPPING

implementations is very fast but it is not suitable for
general purpose octree construction. The papers dis
cussed above build each level of the tree sequentially,
[6] introduces a general purpose algorithm for con
structing an entire tree in parallel and extends Z curve
approach to the various types of spatial trees. The gen
eral idea of this approach is to construct a binary radix
tree, and then convert it to an octree.
The previous approaches for constructing general
purpose octrees work with point sets. When you treat a
Z index of a point as a unique node id, a list of points,
sorted according to their Z indices, is also arranged in
a way that all points from a node present a continuous
sequence. Thus in previous approaches each point
could only be included inside a single node, allowing
construction of SR data structures (BVH, kd, octree).
But it is often beneficial to store objects of non zero
size, for example triangles, splatted photons or IC
records in a MR data structure. Our algorithm extends
[6] approach to objects of non zero size for MR octrees
and handles an arbitrary primitive type.

209

The idea of using multiple references on a GPU
was originally mentioned at [7] where sparse grid data
structure was used for collision detection purposes.
The size of each grid cell was set to twice maximum size
of the biggest object and thus having N objects one can
never gain more than 8*N references. This allows to
write each reference to it’s particular memory location.
However, the existent limitations of cell size may result
to poor search performance when objects with different
sizes are presents in the input data set. Good example of
such dataset is an Irradiance Cache records set.

the photon mapping. Authors underline that different
applications require different hash table features.
To avoid collisions during look up procedure [13]
sort (key, value) pairs by key after the hash table is con
structed. To locate nearest photons in a given point the
key for that point should be evaluated and search
should be performed in 27 neighbouring cells. [13]
notes that hash tables are fast for caustic photons, but
with increasing of number of gathered photons (more
than 40) they can’t outperform kd tree for a global
photon map. We will show that our method outper
forms recursive solution for more than 640 photons.
[14] uses reduced spatial hashing technique to sto
chastically store a single photon instead of saving full
list of photons. This technique is very fast for building
hash tables and query photons, but as shown in [15]
this degrades overall quality and leads to the necessity
to trace more photons. The photon tracing time may
end up being much higher than the tree construction
time. This usually happens when a lot of photons
emitted from the light never reach visible areas (con
sider a sun shining through a window). [16–18], sug
gests to rasterize photon volumes in conjunction with
the hardware alpha blending to accumulate photons
instead of building an acceleration structure; this tech
nique works in image space and can be used for pri
mary visible surfaces.
Although in the last decade there was a lot of
research around GPU photon mapping and final gath
ering procedure in particular, a stable and fast algo
rithm for building acceleration structures for fast pho
tons query is still required. The proposed method is
close to a theoretical optimum for gather operation
and it don’t degrade the convergence of the photon
mapping algorithm.

2.4. GPU Photon Mapping

3. SUGGESTED APPROACH

GPU photon mapping is a popular topic for
research. One of the most complicated problem in this
topic is a fast gathering operation which usually leads
to building acceleration structures. The earliest work
[8] introduce 2 grid based methods. The first method
is a multipass technique that uses fragment programs
to directly sort the photons into a compact grid. The
second method uses a single rendering pass combining
a vertex program and the stencil buffer to route pho
tons to their respective grid cells, producing an
approximate photon map. The disadvantage of both
methods—high memory consumption due to using reg
ular data structure. [9] introduces GPU Kd tree con
struction algorithm and k nearest neighbour search. [10]
uses BVH with the fixed gathering radius. The main
disadvantage of these approaches is a poor gather per
formance because of high branch divergence during
recursive (stack based) tree traversal.
[12] introduces a GPU cuckoo hashing technique
but didn’t investigate their method in application to

Similar to [1] (where IC implementation relies on
MR Octrees) instead of seeking points inside a sphere
with center C and radius R, all points are transformed
to spheres with radius R and look up procedure is per
formed for all spheres that overlap point C.
The suggested approach builds MR octrees with a
fixed depth K. All nodes operated by the algorithm are
leaves and non leaf elements are not stored.
Consider a regular grid of a fixed resolution defined
by an octree depth K. Each cell is uniquely enumer
ated by a Z index of cell’s center. It is possible to vox
elize a list of objects by taking their axis aligned bound
ing boxes and checking cell object overlapping (Fig. 2,
left) of all cells in the grid. Each cell can have several
objects intersecting with it thus keeping all pairs (cell
Z index, object id) and appending them to a buffer
provides all entries of a MR octree. Sorting this list by
Z index, allows constructing a MR octree in a several
steps.
The algorithm can be defined by pseudo code:

2.3. Multiple References on a GPU

PROGRAMMING AND COMPUTER SOFTWARE

Vol. 40

No. 4

2014

210

FROLOV et al.

Fig. 1. Single Reference (left) and Multiple Reference
(right) Octrees as described at [1]. Each arrow represents
one reference.

Data: objects
Result: nodes, refs
refs ← Append(Voxelize(objects)); // 1
refs : sort(compareByZindex);
// 2
nodesAppend(Unique(refs));
// 3
nodes : sort(compareByiIndex);
// 4
return(nodes; refs);
Algorithm 1: Build MR octree.
Where 'ZIndex(cell)' is 3D Morton Code evalua
tion function. The append operations is implemented
in CUDA via atomicAdd and a global counter; sorting
is performed via thrust library.
On the third step we remove duplicates similar to
[6] to form node array. But in contrast to [6] we use
append/sort combination instead of parallel compac
tion to avoid allocation of additional array with size
equal to reference count. Data types we used during
construction presented in listing 1.
It is convenient to use Structure Of Arrays (SOA)
data layout for the Reference structure, so that when
the octree construction is complete, it is possible to
free the Z index array and reduce memory pressure.
The produced final reference fits in one 32 bit integer.
type Reference is record
zindex : int32; — Mortone Code
dataindex : int32; — data offset
end record;
type Node is record
zindex : int32; — Mortone Code
refStart : int32; — first ref
lefEnd : int32; — last ref
end record;
Listing 1
3.1. Limitations
The limitation of this approach is the fixed octree
depth K mentioned earlier. For photon mapping with
fixed gathering radius this is not a problem because it’s
possible to compute k from the current gathering

Fig. 2. Voxelizing objects on regular grid (left) arid sparse
set of voxels overlapping objects (right): the list of
appended references—bottom. Each arrow represents one
reference.

radius as the leaf size should be approximately of the
same size. But for IC a fixed leaf size may lead to poor
lookup performance (if the depth is selected too low)
or high memory consumption (if the depth value is too
high). To have the same look up performance compa
rable with a traditional MR octree (where leaves can
be located at various depth levels) we have to select a
leaf size in the range of [1/2, 1.0]*(average validity
radius size). In this case one IC record may produce
from 15 to 30 references. In comparison with the tra
ditional MR Octree is common to produce up to
10 references for one IC record, so the increased ref
erence count is within reasonable limits. With the sug
gested approach, for 100 K records an octree reference
array takes 100 K*30*sizeof(int) 12 MB.
4. RESULTS
We accelerate gathering procedure 2–5 times in
comparison with a single reference octree (Fig. 3).
The specific acceleration depends on the gathering
radius (or the same, gathered photon number). Our
algorithm builds MR octree for 1 million splatted
photons in 20–50 ms on GTX680 and consumes
100 MB on average during construction (Table 3). The
specific build time and memory consumption depend
on the real object size. It is known that the time of
building a single reference octree is limited by the perfor
mance of sorting [6], so we took the time needed for sort
ing of 1 million points as the best possible time for build
ing single reference octree by [6] method (Table 3).
Figure 2 shows comparison of our (mr gather) with a
classic (sr gather) octree approach for different number
of gathered photon. Measured for 1024 × 1024 primary
rays for Sponza scene and global photon map.
Table 1 shows time statistics for different steps of
our algorithm in per cent of total construction time.
For global photon map (with bigger gather radius) and
caustic. To have better cache coherency when gather
photons we additionally sort photons with their
Z indices before our algorithm starts (step 0). Row
“other*” shows total overhead for alloc/free opera

PROGRAMMING AND COMPUTER SOFTWARE

Vol. 40

No. 4

2014

MULTIPLE REFERENCE OCTREES FOR A GPU PHOTON MAPPING

tions, CUDA API calls, copying kernels and Z indiees
evaluation.
Figure 3 shows percentage of time taken by differ
ent stages depend on number of gather operations (in
millions). Measured for Cornell Box with water.
Table 2 shows relative time of different stages. Col
umn “alg” indicates rendering method; column “ph t”
shows photon tracing time; “octree”—tree construc
tion time; “ray t” ray tracing time and “gather” pho
tons gathering time. For each architectural scene with
“FG+IC” or “FC” technique we accumulate 1 M
photons in a buffer and then perform Final Gathering
in records of irradiance cache. Our irradiance cache
implementation is done similarly to [20] except that
we use our GPU construction algorithm for the IC
octree. For scenes with caustics we used SPPM for
caustic evaluation and the path tracing for other
effects. To speed up rendering of the original Cornell
Box (cornel1) we traced 8 paths per pixel from eye per
one photon pass and 2 paths per pixel for the Cornell
Box with water (cornel2). This is a simple and effective
balance technique for scenes with caustics when using
hybrid rendering approach (Path Tracing + SPPM for
caustics). This technique balance computational
resources between caustics and a the rest of image.
Table 3 shows per pass data for 1 M photons. Mea
sured for 1024 × 1024 and one path per pixel. The col
umn “refs” show references amount were generated
during construction; “memory” shows memory con
sumption of our construction algorithm for 1 million
photons; “garth (mr)” and “garth (mr)” are timings
taken by gather operation for multiple reference and
single reference octrees respectively; build(sr) and
build(mr) show construction time for single and mul
tiple reference octrees; “ph t” represent real photon
tracing time and the “p.t (opt)”—the lower bound for
the optimized photon tracing time as if all emitted
photons were stored in the photonmap. “p.t (opt)” =
“p.t”*(accumulated photons/emitted photons).
In comparison with the single reference octree
solution for the photon mapping, where all photons
are represented by points, our tree construction is 10–
20 times slower on average. This is not unusual
because MR octree essentially has 15–30 duplicates
of a reference. But gathering operation is 2–5 times
faster (Table 3, Fig. 3). Table 1 show detailed statistics
for different steps of our algorithm. We show below
that the tree building slowdown is not essential for the
overall render time, so gathering speed up is more
important.
In any single reference tree or hash tables gathering
consists of two repeating steps:
1. look up next cell in hash table (or traverse to next
leaf in three)
2. loop over all photons inside a cell (or leaf) to
perform gather operation
Opposite to that, the MR octree gather pass is very
lightweight on step 1, that is performed only a single
PROGRAMMING AND COMPUTER SOFTWARE

211

Time in ms
100
90
80
70
60
50
40
30
20
10
0
mr gather
sr gather

5
1.8
9.8

10
2.1
11.2

20
3.4
13.3

40
5.1
17.5

80
8.2
21.2

160
11.7
26.3

320
16.7
36.6

640
51.7
88.9

Fig. 3. Comparison of our (mr gather) with a classic (sr
gather) octree approach for different number of gathered
photon. Measured for 1024 × 1024 primary rays for Sponza
scene and global photon map.

time, and is completely limited by step 2, that is done
in a single loop with no branches. Thus we believe
MR octrees are close to the theoretical maximum of
performance on gather procedure. The finite system
performance is not bound by the octree construction
(Tables 2, 3; Fig. 4). As can be seen on the plot from
Fig. 4, as the number of gather operations (with a fixed
radius) increases (due to higher resolution or super
sampling for example) the relative cost of construction
a MR octree dramatically decreases. On the other
hand using MR octree instead of SR octree acceler
ates gathering passes in 2x.
4.1. Discussion
When using hybrid rendering approaches (simple
combination of path tracing and SPPM like our
implementation does or methods similar to [21] and
[22]) or final gathering, the rendering time depends on
many parameters and can be estimated as:
Table 1. Time statistics for different steps of our algorithm
in per cent of total construction time. For global photon
map (with bigger gather radius) and caustic. To have better
cache coherency when gather photons we additionally sort
photons with their Z indices before our algorithm starts
(step 0). Row “other*” shows total overhead for alloc/free
operations, CUDA API calls, copying kernels and Z indi
ces evaluation

Vol. 40

s. number
0
1
2
3
4

No. 4

s.name
sort photons
append refs
sort refs
append nodes
sort nodes
other*
2014

g.map

c.map

7.6%
21%
27%
20%
4.8%
20%

15%
11%
21%
5.2%
12%
35%

212

FROLOV et al.

Table 2. Relative time of different stages. Column “alg” indicates rendering method; column “ph t” shows photon tracing
time; “octree”—tree construction time; “ray t”—ray tracing time and “gather”—photons gathering time. For each archi
tectural scene with “FG + IC” or “FG” technique we accumulate 1 M photons in a buffer and then perform Final Gath
ering in records of irradiance cache. Our irradiance cache implementation is done similarly to [20] except that we use our
GPU construction algorithm for the IC octree. For scenes with caustics we used SPPM for caustic evaluation and the path
tracing for other effects. To speed up rendering of the original Cornell Box (cornell) we traced 8 paths per pixel from eye
per one photon pass and 2 paths per pixel for the Cornell Box with water (cornell2). This is a simple and effective balance
technique for scenes with caustics when using hybrid rendering approach (Path Tracing + SPPM for caustics). This tech
nique balance computational resources between caustics and a the rest of image
scn
sponza
class
museum
cornel1
cornel2
ring
water

alg

ph t

octree

ray t

gather

total

IC + FG
FG
IC + FG
SPPM
SPPM
SPPM
SPPM

0.6%
0.5%
0.5%
50%
59%
82%
78%

0.2%
0.2%
0.1%
5%
11%
6.6%
6.3%

72%
76%
83%
28%
22%
8.4%
6.7%

27%
23%
16%
17%
8%
3%
9.0%

25 s
62 s
94 s
4.0 s
1.7 s
3.3 s
4.0 s

T = N ∗ (P + B) + M ∗ (G + R)

4.2. Conclusion and Contribution

• N—total number of processed photons,
• P—photon tracing time per one photon,
• B—octree build time per one photon,
• M—number of gather paths; i.e. number of paths
traced per pixel multiplied with image resolution,
• G—time taken by gather procedure per 1 path,
• R—time taken by ray tracing per 1 path.
Our algorithm increases B and reduces G. Thus, in
comparison with single reference trees or hash tables,
it reduce rendering time when M*G becomes several
times bigger than N*B (Fig. 4). But opposite to [14]
our algorithm does not influence on N*P member that
may be significant for industrial scenes (Table 3).
In % of render time
45
40
35
30
25
20
15
10
5
0 1 2
4
8
mr gather(in %)
sr gather (in %)
mr build (in %)

3.4
8.6
12.5

5.9
14.1
10.7

9.1
20.8
8.2

12.5
27.2
5.6

We introduce a general purpose Multiple Refer
ence Octree construction algorithm for GPUs that is
capable to work with objects of a non zero size. Oppo
site to [6] we sort cell references instead of points and
we store in memory only leaves of a fixed depth K.
We don’t store any other nodes and thus don’t have to
perform parent search (step 7 of the [6] algorithm).
Unlike [7], our algorithm is capable to work with any
relations between objects and node size. Compared to
the previous approaches for building trees or other
spatial data structures on the GPU, the proposed algo
rithm is simple and straightforward to implement.
By using MR Octrees, we accelerate photon gath
ering 2–5 times in comparison with a Single Refer
ence Octree. We have shown (Table 2, Fig. 4) that for
high quality rendering it is much more important to

16
15.4
32.1
3.4

32
17.4
35.4
1.9

64
18.6
37.3
1

128
19.2
38.3
0.5

256
19.6
38.8
0.2

512
19.8
39.1
0.1

1024
19.9
39.2
0

Fig. 4. Percentage of time taken by different stages depend on number of gather operations (in millions). Measured for Cornell
Box with water.
PROGRAMMING AND COMPUTER SOFTWARE

Vol. 40

No. 4

2014

MULTIPLE REFERENCE OCTREES FOR A GPU PHOTON MAPPING

1 M photons

10 M photons

Final Gathering

213

10 M caustic photons

Fig. 5. Column A shows direct photonmap visualisation for 1 pass (i.e. 1M photons). B—the same for 10 passes(10 M photons)
of SPPM [22]. C—the resulting image we computed using Final Garthering and Irradiance Caching techniques(except class
scene where FG was without IC). Column D show images rendered with 10 passes (10 M photons) of SPPM for caustics. Both
Final Gathering and caustic map were used for the last scene (water).

Table 3. Per pass data for 1 M photons. Measured for 1024 × 1024 and one path per pixel. The column “refs” show refer
ences amount were generated during construction: “memory” shows memory consumption of our construction algorithm
for 1 million photons: “garth (mr)” and “garth (mr)” are timings taken by gather operation for multiple reference and sin
gle reference octrees respectively: build (sr) and build (mr) show construction time for single and multiple reference octrees:
“ph t” represent real photon tracing time and the “p.t (opt)”—the lower bound for the optimized photon tracing time as
if all emitted photons were stored in the photonmap. “p.t (opt)” = “p.t”* (accumulated photons/emitted photons)
scene
sponza

tri.
66 K

refs

leafes

mem.

p.t

p.t (opt) build (mr) build (sr) gath (mr) gath (sr)

10 M

418 K

115 MB

155 ms

22 ms

46 ms

3.0 ms

3.1 ms

14 ms

cry sponza

262 K

7.7 M

88 K

88 MB

410 ms

41 ms

40 ms

3.1 ms

3.6 ms

16 ms

class

653 K

9.9 M

1.1 M

110 MB

160 ms

23 ms

48 ms

3.1 ms

7.7 ms

30 ms

museum

1.4 M

11.2 M

327 K

128 MB

470 ms

26 ms

52 ms

3.2 ms

4.4 ms

16 ms

cornel1

12

2.3 M

74 K

26 MB

200 ms

20 ms

20 ms

3.0 ms

3.0 ms

6.1 ms

cornel2

19 K

2.3 M

99 K

26 MB

100 ms

37 ms

18 ms

3.0 ms

3.2 ms

7.1 ms

ring

18 K

2.0 M

57 K

23 MB

174 ms

27 ms

24 ms

3.0 ms

11 ms

29 ms

water

98 K

2.0 M

15 K

22 MB

310 ms

35 ms

25 ms

3.1 ms

36 ms

78 ms

PROGRAMMING AND COMPUTER SOFTWARE

Vol. 40

No. 4

2014

214

FROLOV et al.

speed up gathering rather than tree (or hash table)
construction.
ACKNOWLEDGMENTS
This work was supported by the Russian Foundation
for Basic Research, project no. 12 01 00560, and by the
RF Presidential Grant, project no. SP 4053.2013.5.
REFERENCES
1. Krivanek, J. and Gautron, P., Practical global illumina
tion with irradiance caching. Synthesis lectures in com
puter graphics and animation, Morgan and Claypool
Publishers, 2009, ISBN: 1598296442, 978 1598296440.
2. Morton, G., A computer oriented geodetic data base
and a new technique in file sequencing, International
Business Machines Company, 1966.
3. Ajmera, P., Goradia, R., Chandran, S., and Aluru, S.,
Fast, parallel, GPU based construction of space filling
curves and octrees, Proceedings of the 2008 Symposium
on Interactive 3D Graphics and Games, I3D '08, New
York, NY, USA: ACM, 2008, 10:1–10:1.
4. Zhou, K., Gong, M., Huang, X., and Guo, B., Data
parallel Octrees for Surface Reconstruction, IEEE
Transactions on Visualization and Computer Graphics,
2011, vol. 17, no. 5, pp. 669–681.
5. Crassin, G. and Green, S., Octree based sparse voxel
ization using the GPU hardware rasterizer, OpenGL
Insights, CRC Press, Patrick Cozzi and Christophe
Riccio, 2012.
6. Karras, T., Maximizing parallelism in the construction
of BVHs, octrees, and k–d trees, EGGH HPG'12 Pro
ceedings of the Fourth ACM SIGGRAPH, Eurographics
Conference on High Performance Graphics, 2012,
pp. 33–37.
7. Le Grand, S., Broad phase collision detection with
CUDA, GPU Gems 3, Addison Wesley, 2008, pp. 697–
721.
8. Purcell, T.J., Donner, G., Cammarano, M., et al., Pho
ton mapping on programmable graphics hardware,
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS
Conference on Graphics Hardware, Eurographics Asso
ciation, 2003, pp. 41–50.
9. Zhou, K., Hou, Q.,Wang, R., and Guo, B., Real time
KD tree construction on graphics hardware, ACM
Trans. Graph., 2008, vol. 27, no. 5, pp. 126:1–126:11.
10. Fabianowski, B. and Dingliana, J., Interactive global
photon mapping, Computer Graphics Forum, 2009,
vol. 28, no. 4, pp. 1151–1159.

11. Garanzha, K., Pantaleoni, J., and McAllister, D., Sim
pler and faster HLBVH with work queues, Proceedings
of the ACM SIGGRAPH Symposium on High Perfor
mance Graphics. HPG '11, New York, NY, USA: ACM,
2011, pp. 59–64.
12. Alcantara, D.A., Sharf, A., Abbasinejad, F., et al.,
Real time parallel hashing on the GPU, ACM Trans.
Graph., 2009, vol. 28, no. 5, pp. 154:1–154:9.
13. Fleisz, M., Photon mapping on the GPU, Master of
Science, 2009, vol. School of Informatics, University of
Edinburgh, no. 1, pp. 1–60.
14. Hachisuka, T. and Jensen, H.W., Parallel progressive
photon mapping on GPUs, ACM SIGGRAPH ASIA
2010 Sketches, SA '10, New York, NY, USA: ACM,
2010, pp. 54:1–54:1.
15. Carlberg, K., Stochastic progressive photon mapping
using parallel hashing, Master Thesis, 2011, vol. Lund
University, no. 1, pp. 1–51.
16. McGwire, M. and Luebke, D., Hardware accelerated
global illumination by image space photon mapping,
Proceedings of the 2009 ACM SIGGRAPH/EuroGraphics
Conference on High Performance Graphics, New York,
NY, USA: ACM, 2009.
17. Gautron, P., Krivanek, J., Bouatouch, K., and Pat
tanaik, S.N., Radiance cache splatting: A GPU
friendly global illumination algorithm, Rendering Tech
niques, Deussen, O., Keller, A., Bala, K., et al., Eds.,
Eurographics Association, 2005, pp. 55–64.
18. Mara, M., McGwire, M., and Luebke, D., Toward
practical real time photon mapping: Efficient GPU
density–estimation, Interactive 3D Graphics and Games
2013, 2013.
19. Aila, T. and Laine, S.,Understanding the efficiency of
ray traversal on GPUs, in Proceedings of the Conference
on High Performance Graphics 2009, New Orleans,
Louisiana, 2009, S. N.
20. Frolov, V., Vostryakov, K., Kharlamov, A., and Galak
tionov, V., Implementing irradiance cache in a GPU
realistic renderer, Trans. on Comput. Sci. XIX, LNCS
7870, 2013, vol. 7870, no. 1, pp. 17–32.
21. Doidge, I.C., Jones, M.W., and Mora, B., Mixing
Monte Carlo and progressive rendering for improved
global illumination, Vis. Comput., 2012, vol. 28, no. 68,
pp. 603–612.
22. Hachisuka, T., Pantaleoni, J., and Jensen, H.W., A path
space extension for robust light transport simulation,
ACM Trans. Graph., 2012, vol. 31, no. 6, pp. 191:1–
191:10.

PROGRAMMING AND COMPUTER SOFTWARE

Vol. 40

No. 4

2014


Related documents


PDF Document multireot
PDF Document algorithmendercomputergraphik
PDF Document photonic integrated circuit pic market
PDF Document global photonic crystals market research report 2017
PDF Document solar solar
PDF Document motorola photon 4g video converter


Related keywords