Playing with pixels in depth with Kinect, part 2

07.02.12 George Profenza

As if the previous post wasn't geeky enough, here's a quick look at a project that also ties in a bit of computer vision and neural networks.

On one of the courses, related to Programming for Architecture and Design among other things, we had a lecture and tutorial on neural networks. There are multiple types of neural networks, mainly classified as supervised and unsupervised, based on how these networks learn.

Kohonen networks (which is what this post focuses on) are unsupervised networks, also known as self-organizing maps (SOMs). As opposed to supervised networks, where neurons are taught what the output should be like (what they should weigh towards), this type of network is based on competitive learning - the outputs/neurons organize themselves towards the closest inputs. This idea of competitive learning is based on how it is thought the hippocampus (the part of our brain responsible for navigation) works. In a sense, the outputs display a particle-spring like behaviour towards the inputs, which makes this type of network useful for surface fitting, dimensionality reduction, etc.
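To make the competitive learning idea a bit more concrete, here is a minimal sketch in Python. It is not the Processing source included with this post; the function name train_som and all of its parameters (epochs, lr0, radius0, seed) are made up for illustration, and the linear decay of the learning rate and neighbourhood radius is just one common choice. For each input point, the closest node "wins", and the winner and its grid neighbours are pulled towards that point, a bit like springs.

```python
import math
import random


def train_som(points, rows, cols, epochs=20, lr0=0.5, radius0=None, seed=0):
    """Fit a rows x cols grid of 3D nodes to a list of (x, y, z) points.

    Hypothetical helper for illustration: names and defaults are assumptions.
    """
    rng = random.Random(seed)
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # Start every node on a randomly chosen input point.
    nodes = [[list(rng.choice(points)) for _ in range(cols)] for _ in range(rows)]

    def dist2(node, p):
        # Squared Euclidean distance between a node and an input point.
        return sum((node[k] - p[k]) ** 2 for k in range(3))

    total_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        order = list(points)
        rng.shuffle(order)
        for p in order:
            t = step / total_steps
            lr = lr0 * (1.0 - t)                     # learning rate decays to 0
            radius = max(radius0 * (1.0 - t), 0.5)   # neighbourhood shrinks
            # Competition: find the node closest to this input (the winner).
            br, bc = min(
                ((r, c) for r in range(rows) for c in range(cols)),
                key=lambda rc: dist2(nodes[rc[0]][rc[1]], p),
            )
            # Cooperation: pull the winner and its grid neighbours towards p.
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * radius * radius))
                    node = nodes[r][c]
                    for k in range(3):
                        node[k] += lr * h * (p[k] - node[k])
            step += 1
    return nodes
```

Calling train_som(points, 10, 10) on a point cloud returns a 10 x 10 grid of 3D positions that can be used as mesh vertices; fewer rows and columns give a lower-poly surface.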

Initially a dataset of 3D points was given, but I thought it would be more fun for some reason to fit a surface on my face (or any face for that matter). This is what the video illustrates:

  • Computer vision (OpenCV's Haar cascade feature) is used to detect faces and isolate an area in the Kinect depth map
  • depth pixels belonging to the face are converted to 3D coordinates
  • once a point cloud has been selected, the points can be fed as the inputs of the neural net and the outputs are vertices of the surface. The number of outputs is variable, so a low-poly mesh can also be calculated
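The second step above can be sketched roughly as follows, in Python rather than the Processing used for the actual project. It assumes the depth map has already been converted to metres and that the face detector has returned a rectangle (x, y, w, h) in depth image coordinates. The function names are hypothetical, and the default focal length and principal point are placeholder values for a 640x480 depth image, not calibrated constants.

```python
def depth_pixel_to_point(u, v, depth_m, fx=580.0, fy=580.0, cx=320.0, cy=240.0):
    """Back-project pixel (u, v) with depth in metres through a pinhole model.

    fx, fy, cx, cy are placeholder intrinsics (assumptions, not calibration data).
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def face_point_cloud(depth_m, face_rect, max_depth_m=1.5, **intrinsics):
    """Collect 3D points for the depth pixels inside a detected face rectangle.

    depth_m:     2D list (rows) of depths in metres, 0 meaning "no reading"
    face_rect:   (x, y, w, h) as a face detector might report it
    max_depth_m: hypothetical cut-off to drop background pixels behind the face
    """
    x0, y0, w, h = face_rect
    points = []
    for v in range(y0, y0 + h):
        for u in range(x0, x0 + w):
            d = depth_m[v][u]
            if 0.0 < d <= max_depth_m:
                points.append(depth_pixel_to_point(u, v, d, **intrinsics))
    return points
```

The resulting list of (x, y, z) tuples is the point cloud that gets fed to the network as inputs.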

The mesh can also be saved to AutoCAD (.dxf) format, which is what I've used to render a creepy theatre-like mask based on Max's face. Currently the default surface is a rectangular grid, which is a good start, but not ideal for fitting on a face. If you can imagine a face unwrapped into 2D space, it would not look like a perfect rectangle, but that's something to explore at a later time.
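As a rough idea of what the export involves, here is a hedged Python sketch (again, not the project's Processing code) that turns a rectangular grid of vertices into quads and writes them as 3DFACE entities in a bare-bones ASCII DXF. The function names are hypothetical, and the file contains only an ENTITIES section; some CAD applications may expect a fuller header, so treat it as a starting point.

```python
def grid_to_quads(nodes):
    """Turn a rows x cols grid of (x, y, z) vertices into a list of quads."""
    rows, cols = len(nodes), len(nodes[0])
    quads = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            # Corners listed in order around the cell.
            quads.append((nodes[r][c], nodes[r][c + 1],
                          nodes[r + 1][c + 1], nodes[r + 1][c]))
    return quads


def quads_to_dxf(quads, layer="0"):
    """Return the text of a minimal ASCII DXF with one 3DFACE per quad."""
    lines = ["0", "SECTION", "2", "ENTITIES"]
    for quad in quads:
        lines += ["0", "3DFACE", "8", layer]
        for i, (x, y, z) in enumerate(quad):
            # Group codes 10+i / 20+i / 30+i hold x / y / z of corner i.
            lines += [str(10 + i), f"{float(x):.6f}",
                      str(20 + i), f"{float(y):.6f}",
                      str(30 + i), f"{float(z):.6f}"]
    lines += ["0", "ENDSEC", "0", "EOF"]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with a .dxf extension gives something a renderer can import, e.g. open("mask.dxf", "w").write(quads_to_dxf(grid_to_quads(nodes))), where mask.dxf is just an example file name.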

In the meantime, if you would like to have a play with the code, the source is included. If this leads to something interesting, let us know. The code is written using Processing and uses OpenKinect and OpenCV. If you think this is something you would like explained further, leave a comment below and we'll post more details on the wiki.

Max Mask Max face point cloud 1 Max face point cloud 2




Playing with pixels in depth with Kinect, part 1

07.02.12 George Profenza

I've managed to get a bit of breathing time so I thought about posting a few nerdy bits and pieces. Currently I'm doing an MA in Adaptive Architecture and Computation at UCL, which is pretty cool, but has kept me pretty busy lately. I've been picking up a lot of new skills there, among them using the Kinect sensor.

In this post I'll demo a few things I've learned.

I'll start with a quick technical demo of what I was able to achieve using Kinect and Processing. It displays the following:

  • user isolation
  • stereo calibration (matching RGB pixels with depth data)
  • hand tracking (in 2D and 3D)
  • skeleton tracking (without the 'cactus' calibration pose)

Although there is an official Microsoft driver for the Kinect, it's for Windows only (no surprise there), so I've used the open-source drivers. There are plenty of wrapper libraries for various languages, but so far I've used wrapper libraries for Processing (Daniel Shiffman's OpenKinect Processing lib and SimpleOpenNI), OpenFrameworks (ofxKinect) and MaxMSP (jit.freenect.grab). Each library has its pros and cons, but I won't go into much detail in this post.

Here's a list of the data you can get from a Kinect:

  • Depth/IR/RGB pixels
  • Accelerometer (accessible with some of the libraries)
  • Audio data (currently supported by the official Kinect SDK)

Plenty can be done with the above. Currently I'm keen to learn more about manipulating the raw data rather than relying on OpenNI, to see what sort of interactions can be achieved.

I tend to gravitate towards unusual (think Aphex Twin) ideas lately, hence the first image on the side, which displays how skeleton tracking and user isolation can be used to duplicate parts of the body. When displaying the bounding box, the gray forearm is the copied version.

One unusual idea might be turning people into trees. It seems the Greeks beat me to it (a few thousand years back), as the myth of the Heliades also portrays this idea. The second image on the side shows a tracked figure morphing into a tree by recursively copying forearms. You can see the full video here. It's split into 3 parts: context, prototyping and final piece. I'm using SimpleOpenNI and skeleton tracking, but with the unstable release of the drivers, which allows for a more responsive output as the calibration pose is not required.
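The recursive copying can be illustrated with a small Python sketch. It is a simplified 2D stand-in, not the actual piece: the real version works on tracked joints from SimpleOpenNI, while here the forearm is just a segment from an elbow point to a hand point, and the function name, branching angle and scale factor are all made-up parameters.

```python
import math


def grow_branches(start, end, depth, angle=math.radians(25), scale=0.7):
    """Return a list of (start, end) segments: the original plus recursive copies.

    Each level attaches two rotated, scaled copies of the segment to its tip.
    Hypothetical helper; angle and scale are illustrative defaults.
    """
    segments = [(start, end)]
    if depth == 0:
        return segments
    dx, dy = end[0] - start[0], end[1] - start[1]
    for sign in (-1, 1):
        a = sign * angle
        # Rotate the segment direction by +/- angle and shrink it.
        rx = (dx * math.cos(a) - dy * math.sin(a)) * scale
        ry = (dx * math.sin(a) + dy * math.cos(a)) * scale
        child_end = (end[0] + rx, end[1] + ry)
        segments += grow_branches(end, child_end, depth - 1, angle, scale)
    return segments
```

With the elbow and hand positions of a tracked user as start and end, each extra level of depth doubles the number of new "forearms", so a handful of levels is already enough for a tree-like silhouette.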

See you in part 2!

realtime forearm cloning with Kinect Heliades: man morphing into tree