Shaders for Artists

Introduction
As the fidelity of graphics increases, and as artists must produce more and more texture sheets, models, and polygons to reach their intended result, it is easy to feel lost in a sea of terms we don't understand. Everyone knows what a normal map is, but how does it work? What is gloss? What does a specular texture do?

Most of us know what a normal map looks like, we know what gloss does, we know that specular means highlights. But it's been my experience that actually understanding these techniques, getting under the hood of shaders (the things that drive the actual rendering, and all we are really concerned with as modelers and texture artists), has increased my ability many times over.

I'm going to ignore the traditional way of learning vertex and pixel shaders; that is, doing your rendering with a vertex shader, and then showing you how to do it so much better with a pixel shader. As artists, this is mostly pointless.

What is a Shader?
Everyone has heard of shaders, or shader-based engines or software, but most people don't know what they are, or what the craze is. The most succinct explanation I can think of for a shader is:

A shader takes something, does stuff to it, and gives you something else.

Depending on who you are, that is either the most mundane explanation, or the most intriguing explanation. Obviously, programmers find it intriguing. And I hope you will too.

Shaders come in two sorts: vertex shaders and pixel shaders. They go hand in hand, but pixel shaders are where the magic happens.

The GPU/Rendering Pipeline
Given the definition of the shader above, it would be useful to briefly visit the rendering pipeline. Since we are concerned with shaders, we will be looking at the "programmable pipeline," as opposed to the now obsolete fixed-function pipeline. The GPU is what is known as a parallel processor: it receives many pieces of information and operates on them side by side. What it can do, however, is somewhat limited. First, it is a one-way street, in that the GPU cannot backtrack and reuse info. For example, even if we light an object with the same normal map in three passes, the normal must be calculated for each pass. The second and more serious limitation is that the GPU cannot talk to other information it is processing. If we think of the GPU as a stack of tubes, information passes in one end and goes out the other; there is movement only one way, and the tubes cannot cross or interact with each other. This means that while processing one vertex or pixel, you cannot get info for other vertices or pixels.

It is important to understand the rendering pipeline at least in this summary sense. It will, hopefully, help the rest of the article make more sense. Everything that follows is simply an element used inside a single tube of many; these operations go on for every vertex and every pixel in near-unison but also near-isolation.
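The "tubes" idea can be sketched in a few lines of Python. This is only a toy model, not how a GPU is actually programmed: `toy_vertex_shader` and the vertex list are made-up names for illustration. The point it shows is that the same function runs once per vertex, and each run sees nothing but its own input.

```python
# Toy model of the "stack of tubes": one function call per vertex,
# and each call sees only the vertex it was handed.

def toy_vertex_shader(vertex):
    """Hypothetical per-vertex operation: push the vertex up by one unit."""
    x, y, z = vertex
    return (x, y + 1.0, z)

vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]

# The "GPU" applies the same function to every vertex. No call can read
# the other vertices or the results of the other calls.
shaded = [toy_vertex_shader(v) for v in vertices]
print(shaded)  # [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
```

On real hardware these calls happen at roughly the same time rather than one after another, which is exactly why they are not allowed to depend on each other.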

Integers, Floats, and Vectors

 * For more info, see Data Types

This is simple. A float is a real number with a decimal point. It contains a certain number of bits of information, which limits its accuracy, but don't worry about that. A float2 is two real numbers, a float3 is three real numbers, and a float4 is four real numbers.

Integers are whole numbers (0, -2, 1035, etc.). But we won't worry about integers; for our purposes, we will only deal with floats. Another name for a float is a scalar value. "4." and 0.0374832 are both floats. Floats are used for a variety of things.

Float2's are normally used for UV coordinates. An example of a float2 is "5.3, 2.5493", i.e., it's just a list of two numbers.

Float3's are usually called vectors. Their components are written with the suffix "name.xyz" or "name.rgb". However, they really have many sub-categories. Vectors are directions: 1 is along the positive axis and -1 along the negative axis for each component. Direction vectors are generally 'normalized' (see below), so each component falls between -1 and 1. Float3's can also be position values or colors.

Though float3's are vectors, float4's are more commonly used in shaders because graphics hardware is optimized for them. They contain a fourth value: for a position this is typically the W coordinate that the transform matrices work with, and for a color it is the alpha channel. Their components are written with the suffix "name.xyzw" or "name.rgba".

Normalization

 * For more info, see Unit Vector

Normalization is essentially setting a vector's length/magnitude to 1 without changing the direction it points in. You do this by dividing each component by the vector's current length. Afterwards, the squares of the XYZ components add up to 1 (the components themselves generally do not). This is important for multiplying and comparing vectors (a requirement for things such as lighting).
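As a quick illustration, here is a minimal Python sketch of normalization. The helper names `length` and `normalize` are chosen here for readability (shader languages provide their own built-in versions), and the example vector is arbitrary.

```python
import math

def length(v):
    """Length (magnitude) of a vector: square root of the summed squared components."""
    return math.sqrt(sum(c * c for c in v))

def normalize(v):
    """Divide every component by the length so the result has length 1."""
    n = length(v)
    return tuple(c / n for c in v)

v = (3.0, 0.0, 4.0)
unit = normalize(v)

print(length(v))  # 5.0
print(unit)       # (0.6, 0.0, 0.8)
```

Note that the components of the result add up to 1.4, not 1. It is the squares (0.36 + 0 + 0.64) that add up to 1.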

World, Object, and Tangent Space

 * For more info, see Coordinate Systems

We generally talk about shaders, and graphics in general, with regards to three different types of 'spaces': World Space, Object (or Local) Space, and Tangent Space. I will introduce these briefly.

"Spaces" work very similarly to the "Reference Coordinate System" in 3ds (or the Tool Settings panel in Maya). 3ds has two "Systems" which are of importance to us: World and Local.

To test them out, create an object, and translate and rotate it. Use "World," and your Move and Rotate axes will always stay aligned with the 3ds axis (Z up, X horizontal, Y into the screen). This is the equivalent of "World Space" in graphics programming. Now go into "Local." When you have an object selected, the Move and Rotate gizmos will adjust themselves to the "local" axis of the object... if you rotate the object 90 degrees around the Z axis, the X axis now points into the screen, and the Y axis is horizontal. Play around and experiment. This is the "Local/Object Space" in graphics programming. Finally, select a vertex of your object (still using Local). You will see that the move/scale gizmo is now aligned with the vertex's "normal" (actually it's the averaged normal of the adjacent faces, but that's not important). This is what is referred to as "Tangent Space." (A normal is a vector that is perpendicular to a surface.)

So how does this apply to Shaders? Well, Vertex Shaders will 'use' these spaces, putting different 'inputs' into the same space so they can be measured and compared. For this article, we will be mostly concerned with World Space. Object space is very similar. Tangent Space is more complicated, but conceptually you should understand it after this article. We will explore Tangent Space more when we cover Normal Mapping.

Vertex Shader

 * For more info, see Vertex Shader

Now that we know what a Shader does, let's take a look at one. This shader converts things to World Space. We will break down a simple shader line-by-line. Remember our initial definition? A shader takes something, does stuff to it, and gives you something else. Well, we first have to set up what "something" we take, and the "something else" we will eventually get.

    // input from application
    struct a2v {
        float4 position : POSITION;
        float2 texCoord : TEXCOORD0;
        float3 tangent  : TANGENT;
        float3 binormal : BINORMAL;
        float3 normal   : NORMAL;
    };

All inputs inherited from the application like this are in Object Space (many rendering engines allow you to take variables, such as light position, in world space or object space, but they are not part of the vertex input structure... they are separate variables). These are the things the application passes into the vertex shader. The application says "this vector (float3) is your tangent, this is your normal, and this is your binormal. This float2 is the UV coordinates of the vertex. And this vector (float4) is the vertex's position in Object Space."

    // output to fragment program
    struct v2f {
        float4 position      : POSITION;
        float3 lightVec      : TEXCOORD4;
        float3 eyeVec        : TEXCOORD3;
        float2 texCoord      : TEXCOORD0;
        float3 worldTangent  : TEXCOORD6;
        float3 worldBinormal : TEXCOORD7;
        float3 worldNormal   : TEXCOORD5;
    };

This is what the vertex shader outputs into the pixel shader (AKA, fragment program). The actual code of the vertex shader will show us how we calculate these outputs. TEXCOORD is just a semantic for a "register" that says to the vertex shader, "store this number in this place with the name TEXCOORD#."

Now, the actual Vertex Shader code (this is the "stuff" we do to it, going back to our original definition):

    v2f v(a2v In) {
        v2f Out = (v2f)0;                      // start with every output set to zero
        Out.position = mul(In.position, wvp);  // object space -> "screen space"
        Out.texCoord = In.texCoord;            // pass the UVs through untouched

These are just your standard things to do. The first line "zeros out" your result to make sure the calculations are correct. Then, you convert your vertex position (in object space) to "screen space" so it shows up correctly on the screen (multiplying by the "world view projection matrix"). Finally, you take your input UV coordinates and pass them through, unmodified.


 * For more info, see Transforms

        float3 worldSpacePos = mul(In.position, world).xyz;
        Out.lightVec = lightPosition - worldSpacePos;
        Out.eyeVec = eyePosition - worldSpacePos;

Matrix multiplication is a doozy... don't even try to think about it mathematically. This multiplies the object space vertex position by the "world matrix" to find the world space vertex position. Then we take the world space light and eye positions, subtract the world space vertex position, and we get a vector pointing from the vertex position to the light (or eye). In this case, our light and eye are in world space... if they weren't, we'd multiply their object space positions by the World Matrix to put them into world space before we do the subtraction.

        Out.worldNormal = mul(In.normal, worldIT).xyz;
        Out.worldBinormal = mul(In.binormal, worldIT).xyz;
        Out.worldTangent = mul(In.tangent, worldIT).xyz;

This just converts your normal, binormal, and tangent inputs into world space, by multiplying them by the World Inverse Transpose Matrix. Now, everything (vertex position, light vector, eye vector, normal, binormal, and tangent vectors) is in world space, and the shader hands the filled-in structure on to the pixel shader:

        return Out;
    }
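If you want to see the numbers behind the world space conversion, here is a small Python sketch. It is an illustration under assumptions, not the engine's code: the `mul` helper imitates the vector-times-matrix order used in the shader, and the world matrix, vertex, and light position are made-up values. The matrix layout (translation in the bottom row) is the convention assumed here; some engines store it the other way around.

```python
def mul(vec, matrix):
    """Row vector times matrix, mirroring the mul(vector, matrix) order in the shader."""
    return tuple(
        sum(vec[row] * matrix[row][col] for row in range(len(vec)))
        for col in range(len(matrix[0]))
    )

# Hypothetical world matrix: the object was moved 10 units along X and 2 along Z.
world = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [10.0, 0.0, 2.0, 1.0],
]

object_pos = (1.0, 0.0, 0.0, 1.0)   # vertex position in object space (W = 1)
light_position = (10.0, 5.0, 2.0)   # light position, already in world space

world_pos = mul(object_pos, world)[:3]  # keep XYZ, like ".xyz" in the shader
light_vec = tuple(l - p for l, p in zip(light_position, world_pos))

print(world_pos)  # (11.0, 0.0, 2.0)
print(light_vec)  # (-1.0, 5.0, 0.0)
```

The light vector comes out as (-1, 5, 0): starting at the vertex, you go one unit back along X and five units up to reach the light.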

Lighting
Before we move into Pixel Shaders, which are intertwined with normal mapping as far as we are concerned, we need to understand lighting more fully, both Vertex Lighting and Per-Pixel Lighting. Fortunately, they use exactly the same math. Lighting comes in two essential forms: diffuse lighting and specular lighting. There are different techniques for both of these, and there are also interesting ways to do ambient lighting, sub-surface scattering, anisotropic lighting, etc., but since this is an intro, we will look at the most common formulas of the most common lighting types.

Diffuse Lighting
Two inputs are of importance when we consider diffuse lighting: the normal (N) and the light vector (L). The comparison between these, called the Dot Product, determines how much illumination reaches a surface (we call this NdotL). The dot product is a mathematical function of two vectors; for normalized vectors, if they point in exactly the same direction, the result is 1. If they are perpendicular, the result is 0. If they point in exactly opposite directions, the result is -1, but we usually "clamp" any negative values to 0. So, let us look at this setup, of a single light positioned directly above the vertex of a plane. The lines pointing from the vertex are the vertex normals.

http://img177.imageshack.us/img177/8862/maya2007-03-3001-10-41-20.jpg http://img177.imageshack.us/img177/6460/maya2007-03-3001-10-24-71.jpg
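To make the dot product less abstract, here is a minimal Python sketch of NdotL for three light directions. The vectors are made-up, already-normalized examples, and `dot` is a helper written here for illustration (shader languages have it built in).

```python
def dot(a, b):
    """Dot product: multiply matching components and add the results."""
    return sum(x * y for x, y in zip(a, b))

normal = (0.0, 1.0, 0.0)          # surface facing straight up

to_light_above = (0.0, 1.0, 0.0)  # light directly overhead
to_light_side = (1.0, 0.0, 0.0)   # light grazing along the surface
to_light_below = (0.0, -1.0, 0.0) # light behind the surface

results = {}
for name, light_vec in [("above", to_light_above),
                        ("side", to_light_side),
                        ("below", to_light_below)]:
    ndotl = dot(normal, light_vec)
    results[name] = (ndotl, max(ndotl, 0.0))  # raw value, then clamped to 0
    print(name, results[name])
# above (1.0, 1.0)
# side (0.0, 0.0)
# below (-1.0, 0.0)
```

The clamp is what keeps a light behind the surface from subtracting brightness.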

The dot product of the vertex directly under the light is 1. The darker a vertex is, the lower its dot product. Exact values aren't important, only the idea is. Let's also look at the same setup, but with a plane with many more vertices.

http://img177.imageshack.us/img177/7220/maya2007-03-3000-50-43-65.jpg http://img177.imageshack.us/img177/6078/maya2007-03-3000-50-31-92.jpg http://img177.imageshack.us/img177/1664/maya2007-03-3000-50-55-82.jpg

The lighting is done exactly the same way, the NdotL is calculated per-vertex, and interpolated linearly across the plane... meaning, that if one vertex has an NdotL of 1, and an adjacent vertex has an NdotL of 0, the point half-way between the two vertices will have a dot product of .5. Because things are done per-vertex, however, this interpolation/lack of sampling creates problems, as we can see in the following image.

http://img177.imageshack.us/img177/9507/maya2007-03-3001-17-34-57.jpg

The NdotL of all vertices is exactly the same, so the surface is shaded with a solid color (the NdotL is .6 at each vertex, and thus is .6 everywhere). Enter per-pixel lighting and pixel shaders. With per-pixel lighting, we get the following result on the exact same geometry:

http://img177.imageshack.us/img177/1296/maya2007-03-3001-17-12-31.jpg  

The reason is that we are finding the NdotL at each pixel instead of vertex. This is much more accurate and precise. Instead of being concerned with the normal of each vertex, we are concerned with the normal of each pixel, which is given to us from, you guessed it, a normal map (or a normalized normal from the vertex shader, but those days of non-normal-mapped surfaces are behind us).
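The difference between the two approaches can be reproduced with a few lines of Python. This is a sketch with made-up numbers: a flat surface with a vertex at each end and a light hovering above the middle, positioned so that both vertices get an NdotL of .6. The helper names are chosen here for illustration.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

normal = (0.0, 1.0, 0.0)          # flat surface facing up
light_position = (0.0, 3.0, 0.0)  # light above the middle of the surface
left_vertex = (-4.0, 0.0, 0.0)
right_vertex = (4.0, 0.0, 0.0)

def ndotl_at(point):
    """Clamped NdotL for a point on the surface."""
    to_light = normalize(tuple(l - p for l, p in zip(light_position, point)))
    return max(dot(normal, to_light), 0.0)

# Per-vertex: light the two vertices, then interpolate linearly between them.
left = ndotl_at(left_vertex)
right = ndotl_at(right_vertex)
per_vertex_mid = left + (right - left) * 0.5

# Per-pixel: do the lighting calculation at the midpoint itself.
per_pixel_mid = ndotl_at((0.0, 0.0, 0.0))

print(per_vertex_mid)  # 0.6
print(per_pixel_mid)   # 1.0
```

Interpolating between two vertices that are both at .6 can never produce the bright spot under the light, while evaluating at the pixel finds it.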

Finally, the original simple plane with per-pixel lighting:

http://img177.imageshack.us/img177/656/maya2007-03-3000-52-17-29.jpg   http://img177.imageshack.us/img177/7296/maya2007-03-3000-52-29-31.jpg

Specular Lighting
Specular lighting is a 'fake' reflection of the light on the surface. It provides the 'central hotspot' of a highlight and determines the size of the area around it. Specularity isn't as straightforward as diffuse lighting, but it is still relatively simple. Remember that specular lighting depends on the location of the eye/camera, not just the normal and light.

Half Angle (H)
So, we start specular lighting by finding what is called the "half angle", that is, L + E, or the light vector plus the eye vector (normalized afterwards). This gives us a vector pointing half-way between the Light and Eye vectors. Imagine a vector as an arrow: to add vectors, you place the tail of one arrow at the arrowhead of the other... draw this out with two arrows of equal length, and you'll see that what we get is indeed the half-way vector. If your light and camera are in the same spot, the half angle points in the exact same direction as both. But as your camera rotates around the object, the half-angle vector rotates at half the 'rate', for all intents and purposes.
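Here is a minimal Python sketch of the half angle, assuming L and E are already normalized and do not point in exactly opposite directions (in that case L + E has zero length and there is nothing to normalize). The helper names and example vectors are made up for illustration.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def half_angle(L, E):
    """Normalized L + E. Assumes unit-length inputs that are not exact opposites."""
    return normalize(tuple(l + e for l, e in zip(L, E)))

L = (1.0, 0.0, 0.0)  # toward the light
E = (0.0, 1.0, 0.0)  # toward the eye, 90 degrees away from the light

H = half_angle(L, E)
angle_to_light = math.degrees(math.acos(dot(H, L)))
angle_to_eye = math.degrees(math.acos(dot(H, E)))
print(angle_to_light, angle_to_eye)  # both roughly 45 degrees

same_spot = half_angle(L, L)  # light and eye in the same place
print(same_spot)              # (1.0, 0.0, 0.0)
```

With the eye 90 degrees away from the light, H sits about 45 degrees from each, which is the "half the rate" behaviour in action.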

NdotH
So, we take the half angle, or H, and get NdotH, the dot product between the normal and the half angle. Once again, imagine our eye and light are in the same position and we are looking at a sphere. The shader returns an NdotH of 1 for the vertex directly in front of the camera/light. Now, let us move our camera 90 degrees around the sphere... the half angle is at 45 degrees from our start point, so if we are concerned with the NdotH of the vertex directly in front of us, it should return a value of about .71 (the dot product follows the cosine of the angle, and the cosine of 45 degrees is roughly .71). For that vertex, NdotH only falls all the way to 0 as the light ends up on one side of the sphere and the eye at the complete opposite.
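The sphere thought experiment can be run numerically. In this Python sketch the light stays put, the eye orbits the sphere, and we look at the vertex whose normal points straight at the eye. The function name and the chosen angles are just for illustration.

```python
import math

def ndoth_facing_camera(camera_angle_degrees):
    """NdotH for the vertex directly in front of a camera that has orbited
    the given number of degrees away from the light."""
    theta = math.radians(camera_angle_degrees)
    L = (0.0, 0.0, 1.0)                          # toward the light (fixed)
    E = (math.sin(theta), 0.0, math.cos(theta))  # toward the eye (orbiting)
    N = E                                        # this vertex faces the eye
    h = tuple(l + e for l, e in zip(L, E))
    h_len = math.sqrt(sum(c * c for c in h))
    H = tuple(c / h_len for c in h)              # normalized half angle
    return sum(n * c for n, c in zip(N, H))

for angle in (0, 60, 90, 120):
    print(angle, round(ndoth_facing_camera(angle), 3))
# 0 1.0
# 60 0.866
# 90 0.707
# 120 0.5
```

The values follow the cosine of half the camera angle, so they fall off slowly at first and faster later on.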

Gloss/Shininess/Specular Power
For specular lighting, we have the "specular power," which is also called gloss, shininess, etc. This controls the size of the specular highlight. The best way to explain this is mathematically. What we do is take the NdotH, and raise it to the "gloss" power. So, let's compare two gloss powers: 5, and 60. The result of NdotH^gloss (NdotH raised to the "gloss" exponent/power) we will call the Specular Level.

Where NdotH is .5, the Specular Level becomes .03 and roughly .000000000000000000867 (about 8.67 × 10^-19), respectively. Where NdotH is 1, the Specular Level becomes 1 and 1, respectively (1 to any exponent is 1). Where NdotH is .75, the Specular Level is .24 and .00000003189, respectively. What we would see happening, if we did this for every hundredth or so, is an exponential falloff (duh). Higher exponents lead to a tighter highlight, because even relatively high NdotH values are multiplied by themselves so many times that they become negligible.
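These numbers are easy to reproduce. The Python sketch below raises a few NdotH values to the two gloss powers from the text; `specular_level` is simply a name chosen here for NdotH^gloss.

```python
def specular_level(ndoth, gloss):
    """NdotH raised to the gloss power."""
    return ndoth ** gloss

for ndoth in (0.5, 0.75, 0.95, 1.0):
    low = specular_level(ndoth, 5)
    high = specular_level(ndoth, 60)
    print(f"NdotH {ndoth}: gloss 5 -> {low:.4g}, gloss 60 -> {high:.4g}")
```

The extra row for .95 is worth a look: with a gloss of 5 it is still most of the way to full brightness, while with a gloss of 60 it has already dropped to a small fraction, which is why the highlight looks so much tighter.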

Ambient
Ambient lighting is "flat" traditionally. Now many games are using what are called "diffusely convolved cube maps" that simulate Global Illumination, but we can consider ambient lighting as adding a uniform brightness to the scene.

Putting Lighting Together
Lighting is done by adding together the diffuse, specular, and ambient components. The most simple, basic, greyscale lighting can essentially be written as follows:

    float3 lighting = dot(N, L) + dot(N, (E + L)) + ambientColor;

This is just adding together the three types of lighting for that vertex or pixel. (In practice the vectors, including E + L, are normalized first, negative dot products are clamped to 0, and the NdotH term is raised to the gloss power, as described above.) The result is clamped to the 0 to 1 range for display (that popular buzzword, HDR, keeps the unclamped values and dynamically adjusts which range of them gets displayed, among other things, to achieve more realistic rendering and eye behaviour).
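Putting the pieces from the previous sections together, a greyscale version of the whole calculation might look like the Python sketch below. It is one possible arrangement under stated assumptions, not the definitive formula: the function name, the gloss of 5, and the ambient value of 0.1 are all made up, and it assumes L and E never point in exactly opposite directions.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def greyscale_lighting(N, L, E, gloss, ambient):
    """Diffuse + specular + ambient, clamped to the displayable 0..1 range."""
    N, L, E = normalize(N), normalize(L), normalize(E)
    H = normalize(tuple(l + e for l, e in zip(L, E)))  # half angle
    diffuse = max(dot(N, L), 0.0)                      # NdotL
    specular = max(dot(N, H), 0.0) ** gloss            # NdotH ^ gloss
    return min(max(diffuse + specular + ambient, 0.0), 1.0)

N = (0.0, 1.0, 0.0)

# Light and eye both straight above the surface: everything maxes out.
head_on = greyscale_lighting(N, (0.0, 1.0, 0.0), (0.0, 1.0, 0.0), 5, 0.1)

# Light grazing along the surface: no diffuse, only a faint highlight plus ambient.
grazing = greyscale_lighting(N, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 5, 0.1)

print(head_on)  # 1.0 (the raw sum is 2.1 before clamping)
print(grazing)
```

The head-on case shows why the clamp matters: diffuse and specular are both 1 there, so the raw sum overshoots what a regular display can show.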