Fast Inverse Reflector Design (FIRD)

This paper presents a new inverse reflector design method using a GPU‐based computation of outgoing light distribution from reflectors. We propose a fast method to obtain the outgoing light distribution of a parametrized reflector, and then compare it with the desired illumination. The new method works completely in the GPU. We trace millions of rays using a hierarchical height‐field representation of the reflector. Multiple reflections are taken into account. The parameters that define the reflector shape are optimized in an iterative procedure in order for the resulting light distribution to be as close as possible to the desired, user‐provided one. We show that our method can calculate reflector lighting at least one order of magnitude faster than previous methods, even with millions of rays, complex geometries and light sources.


Introduction
This paper presents a new method for GPU-based computation of the outgoing light distribution for inverse reflector design. When manufacturers need to produce a desired illumination, they often do not know what shape the reflector must have. The usual solution is an iterative process, where a set of reflectors is manufactured and tested. This process is usually carried out in a very empirical way by experienced users following a trial-and-error procedure, which has a high manufacturing cost in both materials and time.
In recent years, some research has been done in this field. Some works propose local lighting solutions, or define a very restricted set of possible reflectors, such as families of parabolic reflectors. Other solutions are based on global lighting simulation, but they have high computational costs, requiring hours or days to compute a reflector that produces an illumination distribution reasonably close to the desired one. Moreover, these algorithms cannot handle complex real-world reflector shapes.
We propose a method that computes, from a family of possible reflectors, the best approximation to a given desired illumination distribution. A very fast GPU algorithm to calculate the reflected rays on the reflector is used to speed up the optimization process. We are able to compute reflector outgoing light distribution using millions of rays and highly complex reflector shapes in a couple of seconds. The set of reflectors is generated using a parametrizable basis. These parameters are optimized in an iterative process until the best solution is reached.
The rest of the paper is organized as follows. We discuss previous work in Section 2. We present an overview of our method in Section 3, we present the fast reflection method in Section 4, and explain the optimization method in Section 5. Then we show the results in Section 6 and discuss them in Section 7. Conclusions are presented in Section 8.

Previous Work
Our method is based on two main research areas: inverse reflector design and ray tracing on the GPU.
The first problem addressed in this paper can be framed in the context of inverse illumination problems, where the desired illumination is known and we have to compute some of the parameters that produce it. In our case, we have to find the reflector shape that produces the desired light distribution. This kind of problem can be classified as an inverse geometry problem (IGP) [PP05]. To solve the IGP numerically, we can use local or global illumination methods. In [CKO99] a combination of parabolic reflectors is used to compute the local illumination. In [Neu97] a simple spline combined with local illumination is used to perform a local optimization. Unfortunately, these methods are useful only for very simple configurations. In [PPV04] and [PPV07], a method that uses global illumination is presented. It starts from an initial reflector mesh and moves the mesh vertices in an iterative process until the generated light distribution is close enough to the desired one. The main disadvantage of this method is its high computational cost, which depends on the number of tested reflectors, the reflector mesh resolution and the number of rays traced for the lighting computation. To improve on this method, we need a fast way to trace millions of rays over a highly complex reflector shape.
We can employ several GPU methods to calculate ray tracing. On one hand, we do not have a complex generic scene, so we do not need a full engine [CHH02] or approximated ray tracing techniques [UPSK07]. On the other hand, acceleration methods based on space partitioning are more interesting in our case, because we can store the reflector geometry in a hierarchical subdivision structure. Several methods have been proposed to traverse rays through this kind of structure. A fast algorithm is presented in [RUL00], where the geometry space is subdivided into an octree. This is a top-down algorithm where the voxel is selected according to ray parameters in tangent space. However, this is a CPU-based algorithm, and its implementation on the GPU would require a stack for each fragment.
Other GPU approaches in hierarchical structures are presented in [SKU08], where some techniques are presented to calculate displacement mapping, and the displacement textures are transformed into hierarchical structures. Related to them, there is the quadtree relief mapping (QRM) technique [SG06], based on relief mapping [POJ05]. Relief mapping is a tangent space technique that tries to find the first intersection of a ray with a height field by walking along the ray in linear steps, until a position is found that lies beneath the surface. Then, a binary search is conducted to precisely locate the intersection point. QRM is a variation that takes larger steps along the ray without overshooting the surface. This is achieved through the use of a quadtree on a height map. This will be described in more detail in Section 4.2.

Overview
The goal of our method is to obtain a reflector shape that produces a minimum error between the desired and resulting light distributions.
The method has two components. First, we present a fast algorithm to calculate the outgoing light distribution from a given reflector. Second, we optimize a set of possible reflectors, obtaining the one that minimizes the error metric.
As input data we have the light source, the desired outgoing light distribution, and a parametric reflector space. The light source is represented by a set of rays, each composed of an origin and a direction (called a ray set). The desired outgoing light distributions used in this paper are far-field representations, which are light distributions measured far from the reflector, so that only directional distribution of light matters. However, our algorithm can deal with more complex representations (e.g. near-field) as well.
Reflector light calculation occurs in three steps. The first one transforms the reflector geometry into a hierarchical height field, in order to efficiently trace rays in the GPU. This structure is stored in the GPU as a mip-map of floating point textures that represents a quadtree, where each node contains the maximum height of its child nodes. In the second step, the set of rays is traced through the height field, searching for intersections with the reflector. The algorithm also considers multiple ray bounces (specular BRDF) inside the reflector. The third step captures all reflected rays and creates a far-field distribution that is compared with the desired far-field, and an error value is generated. Note that once the light rays leave the light source, further collisions with it are ignored.
The overall algorithm is implemented using GPU shaders, where each GPU fragment processes a light ray. This results in a very fast algorithm that is able, even for millions of rays and complex reflector geometry shapes, to calculate the reflector lighting in less than 3 seconds, as shown in Section 6.
The optimization procedure searches for a minimum error among a set of possible reflectors in an iterative process, where each reflector parameter is optimized between two bounding values. Then, for each reflector, a far-field light distribution is generated and compared with the desired light distribution. After testing all possible reflectors, the one which provides the most similar light distribution to the desired one is chosen.

Reflector Lighting
Reflector light distribution can be calculated in three steps. After the input data is preprocessed, the first step transforms the reflector geometry to a hierarchical height field model. The second one calculates the ray reflections over the reflector. Finally, the results are compared with the desired illumination. Figure 1 shows the general scheme of reflector light calculation.

Preprocessing of the input data
The user-provided data is composed of the desired far-field illumination specification, the light source characteristics and the reflector holder dimensions. The far-field is given by an IES specification. This specification is established as an industry standard (IESNA [ANS02], EULUMDAT [bCL99]), and assumes large distances from the sources to the lighting environment, so spatial information in the emission of the light can be neglected, considering it as a point light source with a non-uniform directional distribution emittance model. The provided far-field only takes into account the vectors reflected from the desired reflector, discarding the direct rays from the light source. The light source specification provides the light source position and dimensions, and the near-field emittance description. Finally, the reflector holder is used to fit the reflector shape into a bounding box.
In this preprocessing step, a ray set is obtained from the light source. Next, we discard the rays we are sure will not intersect with the reflector bounding box. The non-discarded rays are stored in two textures, one for ray directions and another for ray origin positions.
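The culling of rays against the reflector bounding box can be sketched with a standard slab test. This is an illustrative CPU version in Python/NumPy; the function name and data layout are ours, not the paper's (the actual implementation stores the surviving rays in GPU textures):

```python
import numpy as np

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: can the ray enter the reflector's axis-aligned bounding box?
    Rays that fail the test are discarded before being stored in the
    ray-set textures."""
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    with np.errstate(divide="ignore"):   # d == 0 yields +/-inf slab bounds
        t1 = (np.asarray(box_min, dtype=float) - o) / d
        t2 = (np.asarray(box_max, dtype=float) - o) / d
    tnear = np.minimum(t1, t2).max()     # latest entry over the three slabs
    tfar = np.maximum(t1, t2).min()      # earliest exit
    return bool(tfar >= max(tnear, 0.0))  # overlap in front of the origin
```

Only rays for which this test succeeds need to occupy texels in the ray origin and direction textures.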

Reflector geometry transformation
At this stage, we need to construct a representation of the hierarchical height-field of the reflector. The structure used is a quadtree represented by a mip-map height texture. Each quadtree node contains the maximum height of its child nodes (see Figure 2).
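The quadtree construction over the height field can be sketched on the CPU as a max-mipmap reduction. This is an illustrative Python/NumPy version, assuming a square, power-of-two height texture for simplicity; in the paper the structure is built by a GPU shader and also supports non-power-of-two sizes:

```python
import numpy as np

def build_max_mipmap(height):
    """Build a max-mipmap (quadtree) from a square power-of-two height field.
    Level 0 is the full-resolution map; each coarser level stores, per texel,
    the maximum height of its four children."""
    levels = [height]
    while levels[-1].shape[0] > 1:
        h = levels[-1]
        coarser = np.maximum.reduce([h[0::2, 0::2], h[1::2, 0::2],
                                     h[0::2, 1::2], h[1::2, 1::2]])
        levels.append(coarser)
    return levels
```

The last level is a single texel holding the maximum reflector height, matching the root node of the quadtree.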
As mentioned previously, the method does not depend on the complexity of the reflector geometry. The only restriction is that the reflector must be manufacturable through a press-forming process, where the reflector shape is deformed only in the vertical direction. More precisely, the shape must satisfy constructive constraints that require it to be the graph of a function defined on a subset of the plane delimited by the reflector's border. That is, in our formulation, for the shape to be buildable, it must be a function of the form z = f(x, y).
We calculate one orthogonal projection view from which all reflector geometry is visible. The view direction can be used as the pressing direction, so in our case, the Z axis matches the press-forming vertical direction. For our experiments, just fitting the viewport to the reflector front is good enough.
When the viewport is specified, the reflector is rendered, and then the hardware z-buffer is read, considering the Z component as heights. Then, a GPU shader creates the mipmap texture, where the highest map level is a texture with one texel that contains the maximum reflector height.
To avoid an excessive number of texels representing the background, we fit the mip-map texture into a tight bounding rectangle around the reflector. Therefore the mip-map texture is non-power-of-two size, which means the number of heightfield levels will depend on the largest texture dimension.
Finally, another GPU shader extracts the reflector normal vectors, and stores them in a second texture. These normals will be used later to calculate the reflection vectors.

Reflections computation
Ray tracing on the reflector is based on the QRM method [SG06], which is a variation of relief mapping in tangent space [POJ05]. QRM takes adaptive steps along the view rays in tangent space without overshooting the surface, thanks to the use of a quadtree on the height map. The goal is to advance a cursor position along the ray until we reach the lowest quadtree level, thereby obtaining the intersection point.
The method starts at the highest quadtree level, where the root node holds the maximum height. The ray cursor displacement at this point is t_cursor,0 = 0. To advance the cursor, the ray is intersected with the quadtree node bounds (see Figure 3, left-hand panel) and with the stored quadtree node height (see Figure 3, right-hand panel). There are two possible node bound intersections in tangent space, t_x and t_y; from them, we use the nearest one, called t_bound. An intersection called t_height is also obtained by intersecting the ray with the height value stored in the node. If t_bound is greater than t_height, the ray intersects the current quadtree cell, so the quadtree level is decreased and the process starts again with one of the four child nodes. In this case the cursor does not advance: t_cursor,i+1 = t_cursor,i. Otherwise, the cursor advances to the cell bound, t_cursor,i+1 = t_bound, and the process starts again with the neighbour cell. The process stops when the minimum quadtree level is reached, or when the cursor position falls outside the texture bounds. Figure 4 shows an example of this algorithm.
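The descend-or-advance loop above can be sketched for the simplest case of a ray descending onto the height field. This Python version is ours and deliberately simplified: after advancing to a cell bound it restarts from the root instead of moving to the neighbour node at the current level, and ascending rays (handled by the RQRM extension described below) are not supported:

```python
import math
import numpy as np

def build_max_pyramid(h):
    """Coarsest-first list of max-mipmap levels for a 2^K x 2^K height field."""
    levels = [h]
    while levels[-1].shape[0] > 1:
        a = levels[-1]
        levels.append(np.maximum.reduce([a[0::2, 0::2], a[1::2, 0::2],
                                         a[0::2, 1::2], a[1::2, 1::2]]))
    return levels[::-1]  # root (1x1) first

def trace_down(h, origin, direction, eps=1e-6, max_steps=256):
    """Simplified QRM-style traversal for a descending ray (dz < 0).
    Returns the (x, y, z) hit point on the height field, or None."""
    n = h.shape[0]                       # finest grid is n x n over [0, n]^2
    pyr = build_max_pyramid(h)
    ox, oy, oz = origin
    dx, dy, dz = direction
    assert dz < 0, "this sketch handles descending rays only"
    t = 0.0
    level = 0                            # start at the coarsest node (root)
    for _ in range(max_steps):
        size = n / pyr[level].shape[0]   # cell edge length at this level
        px, py = ox + t * dx, oy + t * dy
        i, j = int(px // size), int(py // size)
        if not (0 <= i < pyr[level].shape[0] and 0 <= j < pyr[level].shape[1]):
            return None                  # cursor left the height map
        node_h = pyr[level][i, j]
        # t_height: parameter where the ray crosses the node's stored height
        t_height = (node_h - oz) / dz
        # t_bound: nearest exit of the current cell in x and y (tangent space)
        tx = (((i + 1) * size - ox) / dx if dx > 0 else
              ((i * size - ox) / dx if dx < 0 else math.inf))
        ty = (((j + 1) * size - oy) / dy if dy > 0 else
              ((j * size - oy) / dy if dy < 0 else math.inf))
        t_bound = min(tx, ty)
        if t_height <= t_bound:          # ray dips below node height in-cell
            if level == len(pyr) - 1:    # finest level: intersection found
                t_hit = max(t, t_height)
                return (ox + t_hit * dx, oy + t_hit * dy, oz + t_hit * dz)
            level += 1                   # descend one quadtree level
        else:                            # advance cursor to the cell boundary
            t = t_bound + eps
            level = 0                    # restart from the root (simplification)
    return None
```

The restart-from-root on every advance costs extra iterations compared with QRM's neighbour stepping, but keeps the sketch short while preserving the core t_bound versus t_height decision.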
In the QRM algorithm, the first cursor position is found by intersecting the view ray with the geometry bounding box. In our case, the first cursor position is the light ray origin (see Figure 4). This means that one more step is processed in comparison with QRM, because we need to intersect the root quadtree node in an initial step. However, we avoid the ray-bounding box intersection calculation that QRM performs.
On the other hand, QRM only processes rays travelling downwards over the height field, and is unable to process rays going up. This is the case when the light source is inside the reflector, or when more than one ray bounce inside the reflector is considered. We propose an intersection search algorithm that also handles rays going up. The pseudo-code for the new algorithm, called RQRM, is presented in Algorithm 1. The original algorithm assumes that the cursor always advances in the direction opposite to the height map; otherwise, QRM discards the ray because it does not intersect the surface. To handle reflections, we restart the algorithm from the highest quadtree level with a new ray, composed of the current intersection point and the reflection direction. A small offset is applied as the initial cursor displacement to avoid self-intersections, so t_cursor,0 = δ (see lines 4-7 of Algorithm 1). Then we traverse the quadtree: if t_cursor,i > t_height, the height of the current cursor position is above the current node height, so we are sure that no node under the current one has a height that intersects the ray; hence we jump to the neighbour node, t_cursor,i+1 = t_bound, and increase the quadtree level. If t_cursor,i < t_height, an intersection under the current level is still possible; thus we decrease the current quadtree level and do not update t_cursor,i (see lines 8-14 of Algorithm 1). The process stops when the intersection is reached, or when the cursor position falls outside the texture bounds (see lines 6-10 of Algorithm 1). In the latter case the ray undergoes no more bounces and is stored as an outgoing ray. Figure 5 shows an example of this algorithm.
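The per-bounce restart can be illustrated with the reflection step itself: the new ray starts at the intersection point, travels along the mirrored direction, and is offset by a small delta to avoid self-intersection. A minimal Python sketch (the function name and argument layout are ours):

```python
import numpy as np

def reflect(direction, normal, hit_point, delta=1e-4):
    """Specular bounce: build the next ray from an intersection.
    The origin is pushed a small delta along the reflected direction
    to avoid immediately re-hitting the same surface point."""
    d = np.asarray(direction, dtype=float)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)            # normals come from the normal texture
    r = d - 2.0 * np.dot(d, n) * n       # mirror reflection about the normal
    origin = np.asarray(hit_point, dtype=float) + delta * r
    return origin, r
```

In the GPU pipeline this corresponds to feeding the intersection-position and reflected-direction textures back into the next shader pass.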
The algorithm is implemented in a GPU fragment shader. The ray set data is provided by the previously stored ray set textures. The textures are mapped onto a quad, so each ray corresponds to a fragment. Each fragment program runs an iterative process that ends with an intersection point and a reflection vector. These values are stored in two output textures, one for the intersection positions and the other for the reflected directions. This shader is executed as many times as the maximum number of allowed bounces, and the resulting textures are used as input textures for the next execution; thus a GPU ping-pong approach is used.

Comparison with a desired distribution
At this step we compare the obtained light distribution with the desired one. Both distributions are converted to far-field form to be compared in the same domain (see Figure 6). To convert the reflected rays to a far-field distribution, a regular grid is used to classify the ray directions. Each grid cell represents a pair of azimuth and altitude directions in the horizontal coordinate system, and contains the number of rays in that direction. The total azimuth and altitude ranges are [−π, π] and [−π/2, π/2], respectively. The grid size depends on the specified far-field directional space discretization. We use two textures to store both grids, where each texel represents a grid cell.
We classify the reflected directions by calculating a histogram, where each interval represents a grid cell. The algorithm, based on [SH07], has two steps. First, after the last reflection step the results are stored into a vertex buffer object. Next, this vertex buffer is rendered, and a vertex shader classifies the directions by calculating the fragment coordinates for each reflected direction. Then, the fragment shader gathers the directions using counters and the hardware blending.
We use the same algorithm to classify the desired distribution. In this case, we do not have to use a counter because each far-field directional component has its respective emitted energy. To use the same measurement units, both the number of reflected rays and energy (usually in candelas) for each cell are transformed to lumens.
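The direction classification can be mimicked on the CPU with a plain 2-D histogram over azimuth and altitude. This Python/NumPy sketch stands in for the GPU scatter-and-blend histogram of [SH07]; the boundary binning conventions are NumPy's, not necessarily the paper's:

```python
import numpy as np

def far_field_histogram(directions, n_az, n_alt):
    """Bin unit direction vectors into an azimuth/altitude far-field grid.
    Azimuth spans [-pi, pi] and altitude spans [-pi/2, pi/2]; each cell
    counts the rays whose direction falls inside it."""
    d = np.asarray(directions, dtype=float)
    az = np.arctan2(d[:, 1], d[:, 0])              # azimuth in [-pi, pi]
    alt = np.arcsin(np.clip(d[:, 2], -1.0, 1.0))   # altitude in [-pi/2, pi/2]
    hist, _, _ = np.histogram2d(
        az, alt, bins=[n_az, n_alt],
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    return hist
```

Scaling each cell count by the per-ray energy (so that cells hold lumens rather than ray counts) reproduces the unit conversion described above.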
The comparison between both textures is done with a shader that calculates, for each fragment (i.e. each grid cell), the per-cell term of the l2 error metric, Error = sqrt( Σ_ij (D_ij − R_ij)² ), where D_ij and R_ij are the desired and resulting values of grid cell (i, j). In addition, a reduction shader is used to calculate the summation part of the formula.
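The per-cell comparison plus reduction amounts to the following trivial CPU stand-in for the error and reduction shaders:

```python
import numpy as np

def l2_error(desired, obtained):
    """l2 distance between two far-field grids: per-cell squared
    differences (the fragment shader), summed over all cells
    (the reduction shader), then a square root."""
    d = np.asarray(desired, dtype=float)
    r = np.asarray(obtained, dtype=float)
    return float(np.sqrt(np.sum((d - r) ** 2)))
```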

Optimization
To obtain a reflector shape that produces a light distribution close to the desired one, we optimize the parameters used in the parametric reflector shape definition. For each optimization step, a new reflector shape is obtained, and the outgoing light distribution is compared with the desired one. If the difference value is below a user-specified threshold, the process stops. If no reflector produces a light distribution close enough to the objective, the best one is chosen. Figure 7 shows the overall scheme of the optimization algorithm.
We use a standard optimization method where, for each parameter, a range and a constant step are specified. The algorithm is an iterative process in which each parameter is swept over its range by its step value [PPV04]. The mip-map height texture must be regenerated at each iteration, due to reflector geometry changes; hence, for each iteration we have to recalculate the outgoing light distribution. However, we do not have to recalculate the ray set for each reflector, since the initial ray intersection step on the reflector bounding box guarantees that the ray set is valid for any reflector inside this box (see Section 4.1).
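The exhaustive sweep over parameter ranges can be sketched as follows. The `evaluate` callback is hypothetical: it stands for the whole pipeline of building the reflector from the parameters, tracing the ray set and returning the l2 error against the desired far-field:

```python
import itertools
import math

import numpy as np

def grid_search(evaluate, param_ranges):
    """Exhaustive constant-step sweep over reflector parameters.
    param_ranges maps a parameter name to (lo, hi, step); evaluate(params)
    returns the error of the reflector built from those parameters."""
    names = list(param_ranges)
    axes = [np.arange(lo, hi + 0.5 * step, step)
            for lo, hi, step in (param_ranges[n] for n in names)]
    best_params, best_err = None, math.inf
    for combo in itertools.product(*axes):
        params = dict(zip(names, combo))
        err = evaluate(params)
        if err < best_err:          # keep the reflector closest to the target
            best_params, best_err = params, err
    return best_params, best_err
```

A threshold-based early exit, as described above, would simply return as soon as the error drops below the user-specified tolerance.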

Method calibration
To test the accuracy of the method, we performed a preprocessing step where the lighting simulation was calculated several times with ray sets of increasing size. We observe that the variance of the error decreases as the ray set size increases. For the ray set of 1 million rays, the mean error is already quite good, so we can use this ray set to perform the optimizations. The last row shows the calibration values used to assess the goodness of our method.
Moreover, we are interested in knowing the minimum optimization parameter step required to consider two consecutive measures as different. For this, we use the semi-variogram [Ole99], a statistical measure that assesses the average decrease in similarity between two random variables as the distance between them increases. The measure defines a lag, called the range, at which the semi-variogram reaches a constant value. Two measures separated by a distance larger than the range can be considered stochastically independent, so the range is equivalent to the notion of the influence of an observation. That is, if we want to obtain significant measurements without being affected by statistical noise, we cannot take measurements that are closer than the range. From this, we can find a lower bound for the step size in the optimization process.
The semi-variogram is defined as follows: given two locations x and x + h inside the domain of a random function Z, the semi-variogram is γ(h) = (1 / (2 n(h))) Σ_{i=1}^{n(h)} [Z(x_i + h) − Z(x_i)]², where n(h) is the number of pairs of measurements a distance h apart. Figure 8 presents the semi-variogram for one of the parameters of Model A. From this graph we can see that the range of the semi-variogram is about 0.1, so we have to use values larger than 0.1 as the step size for this parameter in the optimization process. We computed this lower bound for every degree of freedom included in the optimization process.
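An empirical semi-variogram over 1-D parameter samples can be computed as follows (a Python/NumPy sketch; the lag tolerance `tol` is our own binning choice, not from the paper):

```python
import numpy as np

def semivariogram(x, z, lags, tol):
    """Empirical semi-variogram of measurements z at 1-D locations x:
    gamma(h) = 1/(2 n(h)) * sum of (z_i - z_j)^2 over the n(h) pairs
    whose separation |x_i - x_j| is within tol of the lag h."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])     # pairwise separations
    sq = (z[:, None] - z[None, :]) ** 2        # pairwise squared differences
    gamma = []
    for h in lags:
        mask = np.triu(np.abs(dist - h) <= tol, k=1)  # each pair once
        n = mask.sum()
        gamma.append(sq[mask].sum() / (2 * n) if n else np.nan)
    return np.array(gamma)
```

Plotting gamma against the lags and locating the lag at which the curve flattens gives the range, i.e. the lower bound on the optimization step.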

Results
We have tested our method with three cases. The first one, called Model A, uses a cylindrical light source with a cosine emittance along its surface, except for the caps, which do not emit light. The cylinder dimensions are 4.1 mm in length and , also in mm. The second case, called Model B, uses a spherical light source with a cosine emittance; it has a radius of 0.5 mm and is placed at (5, −5, −5) inside a holder bounding box located between (0, −10, −6) and (10, 0, 0). The third one, called Model C, uses a spherical light source with a cosine emittance; it has a radius of 1 mm and is placed at (5, 5, 0) inside a holder bounding box located between (0, 0, −6) and (10, 10, 0). The cross sections of the three cases and the relative light source positions are shown in Figure 9. The light sources emit 10 million rays for Models A and C, and 5 million rays for Model B. All of them have an overall energy of 1100 lumens. Also, for all cases, the mip-map height texture resolution is 1200 × 800, and a quadtree is created with nine subdivision levels. In the figures, both far-field and difference images are represented by false-colour histograms. These histograms are defined like far-field textures, so the columns of the texture grid correspond to horizontal angles and the rows to vertical angles. The directional space resolution is 1800 × 900 for horizontal and vertical angles, respectively; therefore, each histogram cell represents an angle of 0.2° × 0.2°. The colour scale represents the amount of energy in each histogram cell. Table 2 reports the lighting computation times and the number of effective rays, i.e. the rays that intersect the height map. The time needed to compute the reflector lighting depends on the number of effective rays and the maximum number of allowed bounces. All models have a similar number of effective rays, but Model A has the lowest computation time because only one bounce is specified.
The optimization time depends on the reflector lighting time and on the number of tested reflectors, which in turn depends on the number of optimizable parameters and on the ranges and steps applied in the optimization procedure. The height-field construction time is similar in all cases because all the models use the same mip-map height texture resolution. The intersection search time depends on the number of traced rays, on the maximum number of allowed bounces and on the number of height map levels. Figure 13 shows the progress of the optimization process. The results are very similar among the three models, because they have the same height map texture sizes (thus the same number of quadtree levels) and a similar number of traced rays. The GPU parallel processing gives a computational cost linear in the ray set size; therefore, the most important factor in the intersection search procedure is the maximum number of allowed bounces. Finally, the error calculation takes similar times for all cases, since the outgoing textures have the same size.

Discussion
As shown in the previous section, we cannot obtain the desired reflector with zero error. This is because the optimization algorithm tests different parametrized reflectors by changing the parameter values with a constant step size in a floating-point space. We could improve the results by optimizing with very small steps, thereby converging to a better solution, but this would strongly affect the processing times. Moreover, the semi-variogram gives us a lower bound on the step size for each parameter in the optimization.
The most time-consuming part of our method is the intersection search algorithm. With a more refined height map, more time is needed to traverse a ray through the quadtree, and very complex reflector shapes require high-resolution height maps. Therefore, a compromise must be found between time cost and quality of results.

Conclusions and Future Work
We have presented a method for the inverse reflector design problem that improves on previous approaches. From a wide set of parametrized reflectors, the one that best approximates a given desired illumination distribution is found. The method is based on a very fast GPU algorithm that calculates the reflected rays on the reflector (with one or more bounces) in 2-3 seconds, using millions of rays and highly complex reflector shapes. The reflector parameters are optimized in an iterative process until the generated light distribution is close enough to the desired one.
We consider, as future work, the use of better optimization methods, e.g. adaptive methods, so the desired reflector can be obtained faster. Another line of research is optimization based on a combination of predefined complex reflector shapes, which can be stored as texture masks.