Marker Tracking
Keywords:
vision,
tracking,
marker,
augmented reality
Author(s): Mario Becker, Sabine Webel, Svenja Kahn, Michael Zoellner
Date: 2010-02-05
Summary: This tutorial shows you how to use the basic features of instantreality's vision module for marker tracking. It also describes the virtual camera handling in Augmented Reality X3D (instantreality) applications, and explains how to apply marker tracking to your X3D scene using a Viewpoint node, which allows you to run an Augmented Reality scene with an arbitrary window aspect ratio.
Introduction
instantvision is a set of visual tracking systems offering a variety of features such as simple marker tracking or markerless tracking (for example line trackers and KLT). The true power of the system lies in the ability to combine several such tracking procedures, for instance using a line tracker for initialisation with an absolute pose and KLT for frame-to-frame tracking.
In this tutorial we will focus on simple marker tracking using the VisionLib backend. First we show how to track a single marker for a simple Augmented Reality application while keeping the window size flexible. We describe how the background image (coming from the camera) can be kept in the correct aspect ratio and how to specify whether it is fitted to the window height or width. Then we extend this example by tracking a second marker.
This example works with all cameras which support DirectShow. If your camera does not support DirectShow, have a look at the Camera Setup Tutorial and adapt the "source_url" in the instantvision configuration files of this tutorial.
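For orientation, the line to adapt is the ActionConfig of the VideoSourceAction inside the .pm configuration file (shown in full later in this tutorial). The sketch below only illustrates where the value lives; the actual source_url string for your camera must be taken from the Camera Setup Tutorial.
Code: Adapting the camera source in the configuration file (sketch)
<!-- Inside TutorialMarkerTracking_OneMarker.pm: 'ds' selects a DirectShow source.
     Replace it with the source_url documented for your camera in the Camera Setup
     Tutorial (the replacement value is camera-specific and not shown here). -->
<ActionConfig source_url='ds'/>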
IOSensor
The marker tracking is loaded like any other InstantIO device via an IOSensor. These are the instantvision fields which are necessary:
- VideoSourceImage (SFImage): Camera image
- TrackedObject1Camera_ModelView (SFMatrix4f): Camera's modelview matrix
- TrackedObject1Camera_PrincipalPoint (SFVec2f): Camera's principal point
- TrackedObject1Camera_FOV_horizontal (SFFloat): Camera's horizontal field of view
- TrackedObject1Camera_FOV_vertical (SFFloat): Camera's vertical field of view
- TrackedObject1Camera_CAM_aspect (SFFloat): The aspect of a pixel in the camera image
Code: IOSensor
<IOSensor DEF='VisionLib' type='VisionLib' configFile='TutorialMarkerTracking_OneMarker.pm'>
  <field accessType='outputOnly' name='VideoSourceImage' type='SFImage'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_ModelView' type='SFMatrix4f'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_PrincipalPoint' type='SFVec2f'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_FOV_horizontal' type='SFFloat'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_FOV_vertical' type='SFFloat'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_CAM_aspect' type='SFFloat'/>
</IOSensor>
Setting the background and the virtual camera
To set the camera's image in the background we use a PolygonBackground node. By setting the PolygonBackground's fixedImageSize field, the aspect ratio of the image can be defined. This aspect ratio is also kept when the window is resized. Depending on how you want the background image to fit in the window, you need to set the mode field to "VERTICAL" or "HORIZONTAL". When set to "VERTICAL" the image fits to the height of the window; when the mode is set to "HORIZONTAL" it fits to the width of the window.
Code: Camera image on a PolygonBackground
<PolygonBackground fixedImageSize='640,480' mode='VERTICAL'>
  <Appearance>
    <PixelTexture2D DEF='tex' autoScale='false'/>
    <TextureTransform scale='1 -1'/>
  </Appearance>
</PolygonBackground>
<ROUTE fromNode='VisionLib' fromField='VideoSourceImage' toNode='tex' toField='image'/>
Now we add a PerspectiveViewpoint node, which represents the virtual camera. To apply the real camera's field of view and principal point to the virtual camera, we route the corresponding values delivered by the IOSensor to our scene camera (the PerspectiveViewpoint). If you decide to fit the background image to the height of the window (i.e. the mode field of the PolygonBackground is set to "VERTICAL"), you need to
- connect the vertical field of view (TrackedObject1Camera_FOV_vertical) of the IOSensor with the fieldOfView field of the PerspectiveViewpoint and
- set the fovMode field of the PerspectiveViewpoint to "VERTICAL"
Otherwise you need to use the horizontal field of view (TrackedObject1Camera_FOV_horizontal) of the IOSensor and set the fovMode field of the PerspectiveViewpoint to "HORIZONTAL". In any case, it is important that the PolygonBackground and the PerspectiveViewpoint use the same (field of view) mode. To achieve an undistorted augmentation, the CAM_aspect of the IOSensor must also be routed to the aspect field of the PerspectiveViewpoint.
The position field of the PerspectiveViewpoint is set to '0 0 0' and will not be changed at runtime.
Code: Setting up and connecting the scene camera (PerspectiveViewpoint)
<PerspectiveViewpoint DEF='vp' position='0 0 0' fovMode='VERTICAL'/>
<ROUTE fromNode='VisionLib' fromField='TrackedObject1Camera_PrincipalPoint' toNode='vp' toField='principalPoint'/>
<ROUTE fromNode='VisionLib' fromField='TrackedObject1Camera_FOV_vertical' toNode='vp' toField='fieldOfView'/>
<ROUTE fromNode='VisionLib' fromField='TrackedObject1Camera_CAM_aspect' toNode='vp' toField='aspect'/>
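If you prefer to fit the background image to the window width instead, the same setup can be assembled in "HORIZONTAL" mode. The following is a minimal sketch built only from the nodes, fields and routes introduced above; it is not part of the downloadable example files.
Code: Background and scene camera in "HORIZONTAL" mode (sketch)
<!-- Sketch: both the PolygonBackground and the PerspectiveViewpoint use the
     HORIZONTAL mode, and the horizontal field of view is routed instead of
     the vertical one. -->
<PolygonBackground fixedImageSize='640,480' mode='HORIZONTAL'>
  <Appearance>
    <PixelTexture2D DEF='tex' autoScale='false'/>
    <TextureTransform scale='1 -1'/>
  </Appearance>
</PolygonBackground>
<PerspectiveViewpoint DEF='vp' position='0 0 0' fovMode='HORIZONTAL'/>
<ROUTE fromNode='VisionLib' fromField='VideoSourceImage' toNode='tex' toField='image'/>
<ROUTE fromNode='VisionLib' fromField='TrackedObject1Camera_PrincipalPoint' toNode='vp' toField='principalPoint'/>
<ROUTE fromNode='VisionLib' fromField='TrackedObject1Camera_FOV_horizontal' toNode='vp' toField='fieldOfView'/>
<ROUTE fromNode='VisionLib' fromField='TrackedObject1Camera_CAM_aspect' toNode='vp' toField='aspect'/>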
To deactivate the navigation (which is usually desired in Augmented Reality scenarios) we use a NavigationInfo node.
Code: Deactivating the navigation
<NavigationInfo type='none' />
Setting the virtual object's position and orientation relative to the camera
We add a virtual object (a yellow teapot) to the scene. Then we set its transformation relative to the camera by routing the modelview matrix from the IOSensor to a MatrixTransform, which acts as a "global transformation". A second transformation translates and rotates the teapot relative to the marker.
Code: Transforming the virtual object relative to the marker
<MatrixTransform DEF='TransfObj1RelativeToCamPosition'>
  <Transform DEF='transfObj1RelativeToMarker' translation='2.5 2.5 1.5' rotation='1 0 0 1.57'>
    <Shape>
      <Appearance>
        <Material emissiveColor='1 0.5 0'/>
      </Appearance>
      <Teapot size='5 5 5'/>
    </Shape>
  </Transform>
</MatrixTransform>
<ROUTE fromNode='VisionLib' fromField='TrackedObject1Camera_ModelView' toNode='TransfObj1RelativeToCamPosition' toField='set_matrix'/>
Print the first marker (the one which looks like an "L") to augment it with a virtual yellow teapot.
- TutorialMarkerTracking_OneMarker.x3d (Example)
- TutorialMarkerTracking_OneMarker.pm (Configuration File)
- TutorialMarkerTracking_Markers.pdf (Marker)
Tracking two markers
The InstantVision configuration file
Now let's add a second marker augmented with a blue teapot. First have a look at the InstantVision configuration file of the previous part of this tutorial (TutorialMarkerTracking_OneMarker.pm).
An InstantVision configuration file is an XML document. It has two main components: an ActionPipe with several actions and a DataSet where the data of the InstantVision module is stored.
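The overall structure looks roughly like this. Only the two main components are sketched; the surrounding root element and attributes of the .pm file are omitted here, so refer to TutorialMarkerTracking_OneMarker.pm for the complete file.
Code: Rough structure of an InstantVision configuration file (sketch)
<!-- Sketch of the two main components; their full contents are shown below. -->
<ActionPipe category='Action' name='main'>
  <!-- actions: VideoSourceAction, ImageConvertAction, MarkerTrackerAction,
       TrackedObject2CameraAction -->
</ActionPipe>
<DataSet key=''>
  <!-- IntrinsicData element and a World containing the TrackedObjects -->
</DataSet>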
We do not need to change anything in the ActionPipe to use several markers, but let's nevertheless have a look at it. There are four actions in the ActionPipe of this example:
- VideoSourceAction: gets the camera image
- ImageConvertAction: converts the camera image from RGB to GREY
- MarkerTrackerAction: the marker tracker
- TrackedObject2CameraAction: creates a camera for every tracked object in the DataSet so that the data can be transferred to InstantPlayer
An Action usually has IN- and OUT-slots for incoming and outgoing data. The data routed to these slots is identified by "keys". For example, the VideoSourceAction has an OUT-slot for which we use the key "VideoSourceImage". We use the same key for the IN-slot of the ImageConvertAction to route the image from the VideoSourceAction to the ImageConvertAction. Actions can have an ActionConfig in which several parameters of the Action can be set. For example, in the MarkerTrackerAction you can choose between a light-invariant contour detection (ContourExtractor=0) and a silhouette extractor (ContourExtractor=1), among several other parameters. You can open a configuration file in InstantVision and look at the actions in the ActionManager to see a short description of the attributes.
Code: The ActionPipe
<ActionPipe category='Action' name='main'>
  <VideoSourceAction__ImageT__RGB_Frame category='Action' enabled='1' name='VideoSourceAction'>
    <Keys size='2'>
      <key val='VideoSourceImage' what='image live, Image*, out'/>
      <key val='' what='intrinsic parameters to be modified, out'/>
    </Keys>
    <ActionConfig source_url='ds'/>
  </VideoSourceAction__ImageT__RGB_Frame>
  <ImageConvertActionT__ImageT__RGB_FrameImageT__GREY_Frame category='Action' enabled='1' name='ImageConvertActionT'>
    <Keys size='2'>
      <key val='VideoSourceImage' what='source image, in'/>
      <key val='ConvertedImage' what='target image, out'/>
    </Keys>
  </ImageConvertActionT__ImageT__RGB_FrameImageT__GREY_Frame>
  <MarkerTrackerAction category='Action'>
    <Keys size='7'>
      <key val='ConvertedImage' what='input image, ImageGREY*, in'/>
      <key val='IntrinsicDataPGRFlea8mm' what='IntrinsicDataPerspective, IntrinsicDatra*, in'/>
      <key val='World' what='World of TrackedObjects, World*, in/out'/>
      <key val='MarkerTrackerInternalContour' what='Contour, Contour*, out'/>
      <key val='MarkerTrackerInternalSquares' what='GeometryContainer of corner points, GeometryContainer*, out'/>
      <key val='MarkerTrackerInternalCorresp' what='internal use'/>
      <key val='MarkerTrackerInternalPose' what='internal use'/>
    </Keys>
    <ActionConfig ContourExtractor='0' MTASilThresh='140' MTAThresh='140' MTAcontrast='1' MTAlogbase='10' RefineCorners='0' WithPoseNlls='1'/>
  </MarkerTrackerAction>
  <TrackedObject2CameraAction category='Action' enabled='1' name='TrackedObject2Camera'>
    <Keys size='3'>
      <key val='World' what='world, World*, in'/>
      <key val='IntrinsicDataPGRFlea8mm' what='intrinsic CameraPerspective parameters, IntrinsicDataPerspective*, in'/>
      <key val='Camera' what='suffix string for the CameraPerspective, out'/>
    </Keys>
  </TrackedObject2CameraAction>
</ActionPipe>
In the DataSet you can see an IntrinsicData element and a World with one TrackedObject. The TrackedObject contains a marker and its description: four lines of marker code and the positions of the marker's corners in the world.
Code: The DataSet
<DataSet key=''>
  <IntrinsicDataPerspective calibrated='1' key='IntrinsicDataPGRFlea8mm'>
    <Image_Resolution h='480' w='640'/>
    <Normalized_Principal_Point cx='5.0037218855e-01' cy='5.0014036507e-01'/>
    <Normalized_Focal_Length_and_Skew fx='1.6826109287e+00' fy='2.2557202465e+00' s='-5.7349563803e-04'/>
    <Lens_Distortion k1='-1.6826758076e-01' k2='2.5034542035e-01' k3='-1.1740904370e-03' k4='-4.8766380599e-03' k5='0.0000000000e+00'/>
  </IntrinsicDataPerspective>
  <World key='World'>
    <TrackedObject key='TrackedObject1'>
      <ExtrinsicData calibrated='0'>
        <R rotation='1 0 0'/>
        <t translation='0 0 0'/>
        <Cov covariance='0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'/>
      </ExtrinsicData>
      <Marker BitSamples='2' MarkerSamples='6' NBPoints='4' key='MarkerOfTrackedObject1'>
        <Code Line1='1000' Line2='1000' Line3='1000' Line4='1110'/>
        <Points3D nb='4'>
          <HomgPoint3Covd Cov3x3='0 0 0 0 0 0 0 0 0 ' w='1' x='0' y='5' z='0'/>
          <HomgPoint3Covd Cov3x3='0 0 0 0 0 0 0 0 0 ' w='1' x='5' y='5' z='0'/>
          <HomgPoint3Covd Cov3x3='0 0 0 0 0 0 0 0 0 ' w='1' x='5' y='0' z='0'/>
          <HomgPoint3Covd Cov3x3='0 0 0 0 0 0 0 0 0 ' w='1' x='0' y='0' z='0'/>
        </Points3D>
      </Marker>
    </TrackedObject>
  </World>
</DataSet>
We add a second TrackedObject (TrackedObject2) with the marker code (1000), (0110), (0000), (0000). This describes the second marker which has a white field in the first row and two white fields in the second row.
Code: The second TrackedObject
<TrackedObject key='TrackedObject2'>
  <ExtrinsicData calibrated='0'>
    <R rotation='1 0 0'/>
    <t translation='0 0 0'/>
    <Cov covariance='0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0'/>
  </ExtrinsicData>
  <Marker BitSamples='2' MarkerSamples='6' NBPoints='4' key='MarkerOfTrackedObject2'>
    <Code Line1='1000' Line2='0110' Line3='0000' Line4='0000'/>
    <Points3D nb='4'>
      <HomgPoint3Covd Cov3x3='0 0 0 0 0 0 0 0 0 ' w='1' x='0' y='5' z='0'/>
      <HomgPoint3Covd Cov3x3='0 0 0 0 0 0 0 0 0 ' w='1' x='5' y='5' z='0'/>
      <HomgPoint3Covd Cov3x3='0 0 0 0 0 0 0 0 0 ' w='1' x='5' y='0' z='0'/>
      <HomgPoint3Covd Cov3x3='0 0 0 0 0 0 0 0 0 ' w='1' x='0' y='0' z='0'/>
    </Points3D>
  </Marker>
</TrackedObject>
This is all we need to change in the InstantVision configuration file. Now we open the file TutorialMarkerTracking_OneMarker.x3d and add a field for the second object's modelview matrix to the IOSensor. As mentioned before, the TrackedObject2CameraAction creates a camera object for each TrackedObject in the world description. These cameras are named KeyOfTrackedObject+Camera. The IOSensor (the gateway between InstantVision and InstantPlayer) extracts certain data from these cameras and provides it as TrackedObject2Camera_ModelView, TrackedObject2Camera_PrincipalPoint etc. So we add the field names we need to the IOSensor definition and add routes to the scene object.
Code: Adding the field for the second tracked object to the IOSensor
<field accessType='outputOnly' name='TrackedObject2Camera_ModelView' type='SFMatrix4f'/>
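For context, the extended IOSensor could then look like the sketch below: it is the IOSensor from the first part of this tutorial plus the new field. Note that the configFile value must point to your modified configuration file; the file name used here is only a placeholder, not one of the tutorial's download files.
Code: The extended IOSensor (sketch)
<!-- Sketch: same fields as in the one-marker example, plus the modelview field
     of the second tracked object. The configFile name below is a placeholder
     for your modified .pm file. -->
<IOSensor DEF='VisionLib' type='VisionLib' configFile='TutorialMarkerTracking_TwoMarkers.pm'>
  <field accessType='outputOnly' name='VideoSourceImage' type='SFImage'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_ModelView' type='SFMatrix4f'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_PrincipalPoint' type='SFVec2f'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_FOV_horizontal' type='SFFloat'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_FOV_vertical' type='SFFloat'/>
  <field accessType='outputOnly' name='TrackedObject1Camera_CAM_aspect' type='SFFloat'/>
  <field accessType='outputOnly' name='TrackedObject2Camera_ModelView' type='SFMatrix4f'/>
</IOSensor>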
Then we add the second teapot to the scene and set the routes needed for its transformation.
Code: Transforming a second virtual object relative to the second marker
<MatrixTransform DEF='TransfObj2RelativeToCamPosition'>
  <Transform DEF='transfObj2RelativeToMarker' translation='2.5 2.5 0' rotation='1 0 0 1.57'>
    <Shape>
      <Appearance>
        <Material emissiveColor='0 0.5 1'/>
      </Appearance>
      <Teapot size='5 5 5'/>
    </Shape>
  </Transform>
</MatrixTransform>
<ROUTE fromNode='VisionLib' fromField='TrackedObject2Camera_ModelView' toNode='TransfObj2RelativeToCamPosition' toField='matrix'/>
We can now track both markers and augment them with a yellow and a blue teapot.