instantreality 1.0

Microsoft Kinect for Windows

Keywords:
Microsoft, Depth, Sensor, Kinect, Skeleton
Author(s): Tobias Alexander Franke
Date: 2013-10-06

Summary: This tutorial describes the Kinect node, which is an InstantIO implementation of the Microsoft Kinect SDK.

Microsoft Kinect for Windows

Since Microsoft released a standalone version of the Kinect camera for Windows, it has also published its own SDK. While there are many similarities to the OpenNI implementation, the differences, especially regarding skeleton tracking, may be worth looking into. Another potential benefit is the near mode of the Kinect for Windows camera, which provides valid depth data closer to the camera. In this tutorial, we will create a Kinect node to access skeleton data.

Prerequisites

Using the Kinect node requires an installation of the Microsoft Kinect SDK for Windows. The installer will take care of everything, including driver installation. Please note that this SDK can only be used with a Microsoft Kinect for Windows. You may also want to watch out for existing installations of OpenNI and NITE (both version 1.x) and avin2 drivers, which can interfere with the regular driver installation.

Make sure your driver installation is correct by testing one of the sample apps in the Kinect for Windows Developer Toolkit.

The node

Instantiating a Kinect for Windows node

The Kinect node in its complete instantiation is accessed via InstantIO as follows:

Code: Kinect InstantIO node

<IOSensor DEF='kinect' type='Kinect' DeviceID='0' Width='640' Height='480' FPS='30' AlignViewpoints='TRUE' NormalizeDepth='TRUE' EnableNearMode='FALSE' FlipImages='FALSE'>
    <field accessType='outputOnly'  name='Image'                type='SFImage'/>
    <field accessType='outputOnly'  name='Depth'                type='SFImage'/>
    <field accessType='outputOnly'  name='UserMask'             type='SFImage'/>
    <field accessType='outputOnly'  name='TrackedUsers'         type='MFInt32'/>
    <field accessType='outputOnly'  name='NumUsers'             type='SFInt32'/>
    <field accessType='outputOnly'  name='JointPositions'       type='MFVec3f'/>   
    <field accessType='outputOnly'  name='SkeletonConfidence'   type='MFFloat'/>
    <field accessType='outputOnly'  name='ElevationAngle'        type='SFInt32'/>
    
    <field accessType='inputOnly'   name='ResetTrackedUsers'    type='SFBool'/>
    <field accessType='inputOnly'   name='NewElevationAngle'    type='SFInt32'/>
</IOSensor>

  • Since a Kinect device comes with two cameras (a depth and an image sensor), there are two fields to access these images: Image, which provides the color image of the first camera, and Depth, which provides the depth data of the sensor. Please note: the color image is a standard RGB 8-bit image, whereas the depth image is a single-channel 32-bit float image.
  • Another image provided by the IOSensor is the UserMask: a simple greyscale image in which each pixel holds the ID of the user it belongs to. For instance, to isolate user 2, write a shader that multiplies every Image pixel not belonging to user 2 by 0 and leaves the rest unmodified.
  • The Width, Height and FPS fields are self-explanatory configuration parameters for the camera capturing the scene. Note that the FPS parameter is highly device-dependent and may be fixed to a single value.
  • The AlignViewpoints parameter controls whether the node aligns the depth and color images to a common viewpoint. If this parameter is set to FALSE, you will have to match the depth image to the color image manually (since the two cameras sit slightly apart on the device).
  • The NormalizeDepth parameter controls whether the node automatically rescales the float values of the depth image to a range of 0.0 to 1.0. The scaling factor is a fixed, device-dependent number that is used internally by the Kinect SDK. Setting this parameter to FALSE leaves the values untouched so you can rescale them arbitrarily in your shader.
  • To enable the special near mode that a Kinect for Windows provides, simply set EnableNearMode to TRUE. Objects may then be closer to the camera and still yield valid depth values.
  • If FlipImages is set to TRUE, both the depth and color images will be flipped along both the x- and y-axis.
  • NumUsers is a simple counter which provides the number of humans detected in front of the device.
  • JointPositions is a field that outputs positions for each joint of every tracked user's skeleton. This array provides its joint data (please refer to the MSDN NUI specification on skeletons) in the following order: NUI_SKELETON_POSITION_HEAD, NUI_SKELETON_POSITION_SPINE, NUI_SKELETON_POSITION_SHOULDER_CENTER, NUI_SKELETON_POSITION_HIP_CENTER, NUI_SKELETON_POSITION_SHOULDER_LEFT, NUI_SKELETON_POSITION_ELBOW_LEFT, NUI_SKELETON_POSITION_WRIST_LEFT, NUI_SKELETON_POSITION_HAND_LEFT, NUI_SKELETON_POSITION_SHOULDER_RIGHT, NUI_SKELETON_POSITION_ELBOW_RIGHT, NUI_SKELETON_POSITION_WRIST_RIGHT, NUI_SKELETON_POSITION_HAND_RIGHT, NUI_SKELETON_POSITION_HIP_LEFT, NUI_SKELETON_POSITION_KNEE_LEFT, NUI_SKELETON_POSITION_ANKLE_LEFT, NUI_SKELETON_POSITION_FOOT_LEFT, NUI_SKELETON_POSITION_HIP_RIGHT, NUI_SKELETON_POSITION_KNEE_RIGHT, NUI_SKELETON_POSITION_ANKLE_RIGHT, NUI_SKELETON_POSITION_FOOT_RIGHT. An additional field, SkeletonConfidence, provides floating-point confidence values between 0 (not confident) and 1 (confident) for each joint. These values can be used to ensure proper tracking and to filter out unwanted updates.
  • TrackedUsers provides a sequence of IDs of currently tracked users. The list is updated whenever someone enters or leaves the sensor area. Note that the order of IDs matches the order in which JointPositions provides the joints of each skeleton: for instance, the sequence 4,1,3 tells you that the first 20 entries of JointPositions belong to user 4, the next 20 to user 1, and so forth (see the sketch after this list).
  • In case user and skeleton tracking produces incoherent results, gets stuck or simply stops working, you can send a TRUE value to the ResetTrackedUsers field to force an internal reset. This will wipe all user tracking data and reset the tracker.
  • The elevation angle of the Kinect camera can be read from ElevationAngle. The integer value provided is the current angle in degrees. The camera can automatically reposition itself to a new angle if you send the desired value to NewElevationAngle. This may be helpful if you notice in your application that parts of a user's skeleton have gone missing, for instance by checking the joints' confidence values.
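
To see how these fields play together, the sketch below routes TrackedUsers, JointPositions and SkeletonConfidence into a small ECMAScript Script node and slices the flat joint array into one block of 20 joints per tracked user. The Script node 'jointAccess' and its field names are made up for this example; only the IOSensor fields and the joint order come from the node described above.

Code: Accessing per-user joint data (sketch)

<Script DEF='jointAccess'>
    <field accessType='inputOnly'   name='users'        type='MFInt32'/>
    <field accessType='inputOnly'   name='positions'    type='MFVec3f'/>
    <field accessType='inputOnly'   name='confidence'   type='MFFloat'/>
    <![CDATA[ecmascript:

    // 20 joints per user, in the order listed above (HEAD ... FOOT_RIGHT)
    var JOINTS_PER_USER = 20;
    var trackedUsers = new MFInt32();

    function users(value)
    {
        // remember which user IDs are currently tracked (e.g. 4,1,3)
        trackedUsers = value;
    }

    function positions(value)
    {
        // the flat array holds one block of 20 joints per tracked user,
        // in the same order as the IDs in trackedUsers
        for (var u = 0; u < trackedUsers.length; ++u)
        {
            var offset = u * JOINTS_PER_USER;
            var head = value[offset]; // NUI_SKELETON_POSITION_HEAD of user trackedUsers[u]
            Browser.println('user ' + trackedUsers[u] + ' head at ' + head);
        }
    }

    function confidence(value)
    {
        // one value per joint, same layout as positions;
        // e.g. ignore a joint whenever its confidence drops below 0.5
    }
    ]]>
</Script>

<ROUTE fromNode='kinect' fromField='TrackedUsers'       toNode='jointAccess' toField='users'/>
<ROUTE fromNode='kinect' fromField='JointPositions'     toNode='jointAccess' toField='positions'/>
<ROUTE fromNode='kinect' fromField='SkeletonConfidence' toNode='jointAccess' toField='confidence'/>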

Accessing a skeleton

The first step is to create a device in your X3D file. We will use an IOSensor node with two fields, Depth and JointPositions. We will also set up an ImageBackground to display the depth image, a Group to hold the skeleton joints, and a Shape for the connecting lines between them.

Code: Hooking up a Kinect device to our X3D example

<IOSensor DEF='kinect' type='Kinect' FlipImages='TRUE'>
    <field accessType='outputOnly'  name='Depth'           type='SFImage'/>
    <field accessType='outputOnly'  name='JointPositions'  type='MFVec3f'/>
</IOSensor>

<ImageBackground>
    <PixelTexture2D DEF='image'/>
</ImageBackground>

<Group DEF='skeleton'>
</Group>

<Shape DEF='lines'>
    <Appearance>
        <LineProperties linewidthScaleFactor='4'/>
        <Material emissiveColor='0 1 0'/>
    </Appearance>
</Shape>

After this setup, we need a script that grabs the arrays provided by the Kinect node, creates the joints (made from boxes) and connects the appropriate ones with lines. Please have a look at the attached source file for more details!
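
As a rough orientation before opening that file, the following simplified sketch skips the script and the joint boxes entirely and only draws the connecting lines for a single tracked user: the 'lines' Shape from above is extended with an IndexedLineSet whose Coordinate node (here called 'lineCoords', a made-up name) receives the joint positions directly via a ROUTE. The coordIndex pairs are one plausible set of bone connections based on the joint order listed earlier, and the depth image is routed into the 'image' texture the same way.

Code: Drawing the skeleton lines without a script (sketch)

<Shape DEF='lines'>
    <Appearance>
        <LineProperties linewidthScaleFactor='4'/>
        <Material emissiveColor='0 1 0'/>
    </Appearance>
    <!-- one line segment per bone; indices follow the joint order listed above -->
    <IndexedLineSet coordIndex='0 2 -1  2 1 -1  1 3 -1
                                2 4 -1  4 5 -1  5 6 -1  6 7 -1
                                2 8 -1  8 9 -1  9 10 -1  10 11 -1
                                3 12 -1  12 13 -1  13 14 -1  14 15 -1
                                3 16 -1  16 17 -1  17 18 -1  18 19 -1'>
        <Coordinate DEF='lineCoords'/>
    </IndexedLineSet>
</Shape>

<ROUTE fromNode='kinect' fromField='Depth'          toNode='image'      toField='set_image'/>
<ROUTE fromNode='kinect' fromField='JointPositions' toNode='lineCoords' toField='set_point'/>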

Files: