Saturday, February 18, 2012

Performance: GPU vs CPU matrix palette skinning (WAS: Tutorial #2: Part 1: Animated character walking around. Walk cycle and animated clouds)

I had planned, for a while, to do another GLES 2.0 demo for AnimKit with multiple characters in the scene. I found some time yesterday and this is the start (video below). I'll probably cover this in multiple posts, posting the results here when I find some time to progress with it.




Note that the scene is just a work in progress. The clouds in the background are big panes that cause some performance impact, but I think it is a good illustration of what can be done in a few hours, and I'll use it as input for later performance optimization work on "AnimKit on OpenGL ES 2.0". I'll post code and more text soon; my one-year-old is about to wake up.


The first work in progress version of code is here: commit 3e7cfe2b27 WIP: Tutorial #2: Part 1: Animated character walking around.


The code can serve as an illustration and for benchmarking. I guess it would also be usable for a simple scene with fewer than 4000 matrix-skinned vertices, but for this example it is not OK: I downloaded the SDK for iOS 5.0 and tried it on an iPhone 3GS, and got only 7 frames per second (~8000 skinned vertices).
I tried a more complex model with 5 animated instances and got 1-2 frames per second (40000 skinned vertices). The model from the post "Skeleton and pose character animation using AnimKit" runs at 14 frames per second.


As assumed in previous posts, CPU matrix skinning (akGeometryDeformer::LBSkinning()), which iterates over all vertices and normals, is the bottleneck: most of the time is spent there.

On the other hand, I tried the PowerVR example of GPU matrix skinning, the POWERVR Insider MBX Chameleon Man Demo, and it runs quite smoothly... but it has only ~1000 skinned vertices in the mesh.


I guess poly-reducing the mesh and moving the calculation to another thread would help, though not significantly (based on the results above). So I plan to do that (reduce the complexity of the scene) but also to check the Chameleon Man source code. The code is available as part of the PowerVR Insider SDK; you just need to register to get it.


Update on March 2nd, after implementing GPU skinning: I didn't spend time on reducing the scene. Just after implementing GPU GLSL matrix skinning, the results already look promising (on iPhone 3GS):


Scene                                 Skinned triangles   FPS with CPU skinning   FPS with GPU skinning
gles2farm - 1 animated character      5620                8                       59
AppAnimKitGL - 5 animated characters  33700               1                       22

Code is here.


The scene with five animated characters looks like this:






In the following post, I will try to explain how to implement matrix palette skinning (character skeleton animation on the GPU).



CPU skinning is not the best choice for OpenGL ES 2.0 devices.

CPU skinning is implemented like this: on every repaint, iterate over all vertices, then calculate and apply the bone transformation to each one. In more detail, the code below is an example of CPU skinning from AnimKit's akGeometryDeformer::LBSkinningUniformScale. Apparently, on the iPhone these matrix operations affect performance significantly and are better suited to a vertex shader.

const btAlignedObjectArray<akMatrix4>& matrices = *mpalette;
for(unsigned int i=0; i<vtxCount; i++)
{
    // blend up to four bone matrices by their weights
    akMatrix4 mat(matrices[indices[0]] * weights[0]);
    if (weights[1]) mat += matrices[indices[1]] * weights[1];
    if (weights[2]) mat += matrices[indices[2]] * weights[2];
    if (weights[3]) mat += matrices[indices[3]] * weights[3];

    // position
    *vtxDst = (mat * akVector4(*vtxSrc, 1.f)).getXYZ();
    // normal
    *normDst = (mat * akVector4(*normSrc, 0.0f)).getXYZ();
    *normDst = normalize(*normDst);

    // step all source and destination streams to the next vertex
    akAdvancePointer(normSrc, normSrcStride);
    akAdvancePointer(normDst, normDstStride);
    akAdvancePointer(weights, weightsStride);
    akAdvancePointer(indices, indicesStride);
    akAdvancePointer(vtxSrc, vtxSrcStride);
    akAdvancePointer(vtxDst, vtxDstStride);
}
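To make the per-vertex cost easier to see outside of AnimKit, here is a minimal standalone sketch of the same linear blend skinning idea. The types and names (Vec3, Mat34, SkinnedVertex, skinOnCpu) are hypothetical and are not AnimKit's; it assumes at most four bones per vertex and uniform scale, so the same blended matrix is used for the normal.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal types for illustration only (not AnimKit's akVector3 / akMatrix4).
struct Vec3 { float x, y, z; };

// Affine transform stored as 3 rows of 4 floats; column 3 holds the translation.
struct Mat34 { float m[3][4]; };

struct SkinnedVertex {
    Vec3 position;
    Vec3 normal;
    std::array<int, 4> bone;      // indices into the matrix palette
    std::array<float, 4> weight;  // expected to sum to 1
};

// Weighted sum of the (up to four) palette matrices that influence one vertex.
inline Mat34 blend(const std::vector<Mat34>& palette, const SkinnedVertex& v) {
    Mat34 out{};  // zero-initialised
    for (int k = 0; k < 4; ++k) {
        if (v.weight[k] == 0.0f) continue;  // skip unused influences
        assert(v.bone[k] >= 0 && static_cast<std::size_t>(v.bone[k]) < palette.size());
        const Mat34& b = palette[static_cast<std::size_t>(v.bone[k])];
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 4; ++c)
                out.m[r][c] += b.m[r][c] * v.weight[k];
    }
    return out;
}

// w = 1 transforms a point (translation applied), w = 0 transforms a direction.
inline Vec3 transform(const Mat34& m, const Vec3& p, float w) {
    return Vec3{
        m.m[0][0] * p.x + m.m[0][1] * p.y + m.m[0][2] * p.z + m.m[0][3] * w,
        m.m[1][0] * p.x + m.m[1][1] * p.y + m.m[1][2] * p.z + m.m[1][3] * w,
        m.m[2][0] * p.x + m.m[2][1] * p.y + m.m[2][2] * p.z + m.m[2][3] * w};
}

inline Vec3 normalized(const Vec3& v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0.0f ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

// Skins every vertex on the CPU. With CPU skinning this whole loop runs on every frame.
inline void skinOnCpu(const std::vector<Mat34>& palette,
                      const std::vector<SkinnedVertex>& src,
                      std::vector<Vec3>& outPos,
                      std::vector<Vec3>& outNormal) {
    outPos.resize(src.size());
    outNormal.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const Mat34 m = blend(palette, src[i]);
        outPos[i] = transform(m, src[i].position, 1.0f);
        outNormal[i] = normalized(transform(m, src[i].normal, 0.0f));
    }
}
```

Each vertex costs up to four matrix scalings and additions, two matrix-vector products and a normalize, which adds up quickly at 8000 or 40000 vertices per frame.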

Anyway, once the bone transformation is applied and the mesh vertices and normals are updated, the vertex buffer is updated and then repaint is called:

akSubMesh* sub = m_mesh->getSubMesh(i);
UTuint32 nv = sub->getVertexCount();
void *codata = sub->getSecondPosNoDataPtr();
UTuint32 datas = sub->getPosNoDataStride();
glBindBuffer(GL_ARRAY_BUFFER_ARB, m_posnoVertexVboIds[i]);
glBufferData(GL_ARRAY_BUFFER_ARB, nv*datas, NULL, GL_STREAM_DRAW);
glBufferSubData(GL_ARRAY_BUFFER_ARB, 0, nv*datas, codata);

GPU skinning fixes this part as well, since the vertices only need to be "uploaded" to the vertex buffer once, in init(), instead of on every redraw:

glBufferData(GL_ARRAY_BUFFER_ARB, nv*posnodatas, posnodata, GL_STATIC_DRAW);
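For comparison, here is a rough sketch of what the GPU side of matrix palette skinning could look like, written as a C++ helper that returns GLSL ES vertex shader source. This is not the shader from the AnimKit commit: all names (uBones, aBoneIndices, aBoneWeights, uModelViewProjection, kMaxBones) are hypothetical, and the palette size of 24 is just an assumed value that has to fit the device's vertex uniform limit.

```cpp
#include <string>

// Assumed palette size; it must fit within the device's vertex uniform budget.
constexpr int kMaxBones = 24;

// Returns GLSL ES (OpenGL ES 2.0) vertex shader source doing the same weighted
// matrix blend as the CPU loop, but per vertex on the GPU.
inline std::string skinningVertexShaderSource() {
    return std::string("#define MAX_BONES ") + std::to_string(kMaxBones) + "\n" + R"(
uniform mat4 uModelViewProjection;
uniform mat4 uBones[MAX_BONES];   // matrix palette, re-uploaded every frame
attribute vec3 aPosition;         // static, uploaded once with GL_STATIC_DRAW
attribute vec3 aNormal;
attribute vec4 aBoneIndices;      // bone indices stored as floats
attribute vec4 aBoneWeights;
varying vec3 vNormal;

void main() {
    mat4 skin = uBones[int(aBoneIndices.x)] * aBoneWeights.x
              + uBones[int(aBoneIndices.y)] * aBoneWeights.y
              + uBones[int(aBoneIndices.z)] * aBoneWeights.z
              + uBones[int(aBoneIndices.w)] * aBoneWeights.w;
    vNormal = normalize((skin * vec4(aNormal, 0.0)).xyz);
    gl_Position = uModelViewProjection * (skin * vec4(aPosition, 1.0));
}
)";
}
```

With a shader along these lines, only the bone matrices change per frame (for example via glUniformMatrix4fv on the uBones location), while positions, normals, bone indices and weights stay in a static vertex buffer.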