Eye tracking refers to monitoring eye movements. There are a number of techniques for measuring eye movements; the most common and widely used is video-based eye tracking, which uses video cameras to record images of the eye and extracts information from those images.

Gaze Tracking

A gaze tracker is a device that measures eye movements and additionally estimates the user's gaze from the information obtained from the eyes. Depending on the gaze estimation technique employed, the output of a gaze tracker may be the Point of Regard (PoR) or Line of Sight (LoS) in 3D space, or a point in a 2-dimensional image (e.g., the user's field of view (scene image) or a computer display). A detailed review of recent eye models and techniques for eye detection and tracking can be found in the paper In the Eye of the Beholder: A Survey of Models for Eyes and Gaze.

Remote vs Head-Mounted Gaze Trackers

Video-based gaze trackers can be categorized into two types: head-mounted gaze trackers (HMGT) and table-mounted (a.k.a. remote) gaze trackers (RGT).

In a table-mounted tracker, the system components (camera and infrared light sources) are placed away (remote) from the user. Some RGTs have more than one camera to track the eyes and the face. In a head-mounted system, the eye and front-view cameras are mounted on the head. Some HMGTs do not have a front-view camera and estimate the gaze in 3D space. Binocular HMGTs have two eye cameras for tracking both eyes.
In terms of gaze estimation space, table-mounted systems usually only allow estimating the Point of Regard (PoR) on a fixed planar surface (fixation plane), e.g. a computer display. In contrast, HMGT systems are commonly used for estimating the Line of Sight (LoS) and the gaze point in the user's field of view. Table-mounted trackers allow only a very limited range of head movements and have a limited field of view, whereas HMGTs are mounted on the head and have a wide FoV.

Here is the problem we want to solve: the user is sitting in front of a computer display and is looking at a point inside the display. A camera is looking at the user's eyes, and we want to use the information captured by the camera to find out where the user is looking. There are eye features that can be detected and tracked in the eye image, such as the pupil center, the limbus (border of the iris), and the eye corners. Many gaze trackers use the pupil center together with the reflection of a light source (from the anterior surface of the cornea) to estimate the gaze point (also referred to as the point of regard, PoR).

One way of using these image features and relating them to the gaze point in space is to follow a geometrical method and find the person's gaze vector in space. Once we find the gaze vector relative to our world coordinate system, we can find the intersection of this vector with the planar screen in front of the user; this intersection is the gaze point. This method is a direct way of finding the gaze point in space, and it requires a calibrated setup and knowledge of the geometry of the eye model and the system components. If you are interested in the mathematical details of this method, read the paper General Theory of Remote Gaze Estimation Using the Pupil Center and Corneal Reflections.
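The final step of the geometrical method, intersecting the gaze ray with the screen plane, can be sketched in a few lines. This is a minimal illustration, assuming all quantities are already expressed in a calibrated world coordinate system; the function and parameter names are illustrative, not from any particular library.

```python
import numpy as np

def gaze_point_on_plane(eye_center, gaze_dir, plane_point, plane_normal):
    """Intersect the gaze ray (eye_center + t * gaze_dir) with the screen plane.

    All arguments are 3D vectors in the same world coordinate system.
    Returns the intersection point, or None if the ray misses the plane.
    """
    eye_center = np.asarray(eye_center, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)
    denom = np.dot(plane_normal, gaze_dir)
    if abs(denom) < 1e-9:
        return None  # gaze is parallel to the screen plane
    t = np.dot(plane_normal, np.asarray(plane_point) - eye_center) / denom
    if t < 0:
        return None  # the screen is behind the eye
    return eye_center + t * gaze_dir

# Example: eye 60 cm in front of a screen lying in the z = 0 plane,
# looking straight ahead; the gaze point is the screen origin.
por = gaze_point_on_plane(eye_center=[0.0, 0.0, 0.6],
                          gaze_dir=[0.0, 0.0, -1.0],
                          plane_point=[0.0, 0.0, 0.0],
                          plane_normal=[0.0, 0.0, 1.0])
```

In a real system, the hard part is obtaining `gaze_dir` in the first place, which is exactly what the calibrated eye model and system geometry in the paper above provide.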

Although there are many different methods for gaze estimation, low-precision gaze tracking can be achieved with a simple interpolation. The interactive figure below shows the basics of an interpolation-based method that maps the eye features extracted from the eye image to a point inside a 2D plane in front of the eye. Let's assume that the gaze point lies on a plane (e.g. a computer display) in front of the eye; we call this plane the fixation plane. The figure shows a schematic illustration of the main elements of a remote gaze tracking setup. The eye camera is shown as a triangle indicating the camera image plane and the projection center. A simplified model of the eye is shown with its optical and visual axes. The visual axis intersects the fixation plane at the gaze point (PoR). You can also see how a light ray emitted from a light source is reflected from the surface of the cornea and projected onto the camera image. You can interact with this figure and move the eye and the camera by dragging the red circles. Play around with it and see how the projections of the pupil center and the light source in the image change as you change the gaze direction.

Figure 1: Main elements of a remote gaze tracking setup. (link to applet: https://ggbm.at/R3qbcn9x)

How does a polynomial gaze estimation method work?

Now, let's see how we can map the pupil position to a gaze point in the fixation plane. In the figure, uncheck the Glint checkbox (we will get back to it later). Although the figure shows a 2-dimensional view of the setup, the results can be generalized to 3D. In Figure 1 you can see a point called input, with a horizontal vector from PoR to this point. We use this vector to show the corresponding input value (in this case, the position of the pupil center inside the eye image) for each fixation point. We call it input because it is the input of the mapping function that will estimate the position of the PoR for us. Rotate the eyeball and see how the input changes for different gaze points.

We calculate the mapping function via a calibration procedure: we ask the user to fixate on two different target points in the fixation plane. Use the buttons in the figure: take the first sample point, rotate the eye, and take the second sample. After the two calibration samples are taken, you will see an orange line in the figure that shows the relationship between the input and the output. This is our mapping function, which the gaze tracker uses for gaze estimation. By using the equation of this line, we can map any input point (corresponding to each eye orientation) to a point inside the fixation plane. This function works fine as long as the position of the eyeball (head position) is fixed relative to the camera. After the calibration, you will see a point called output along the vertical line that indicates the output of the function for the given input. That is the estimated gaze point, which is supposed to coincide with the point PoR. However, if you move the camera slightly, you will see that the output deviates significantly from its ideal position (PoR). This shows that an interpolation-based gaze estimation method that relies only on the pupil center is very sensitive to relative movements between the head and the camera.
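The two-point calibration above amounts to fitting a line through two (input, target) pairs. Here is a minimal sketch of that idea in one dimension; the pixel and screen coordinates are made-up illustrative values, not measurements from a real tracker.

```python
def calibrate(input1, target1, input2, target2):
    """Fit the line f(x) = a*x + b through two calibration samples
    and return it as a mapping function from image to screen coordinates."""
    a = (target2 - target1) / (input2 - input1)
    b = target1 - a * input1
    return lambda x: a * x + b

# Pupil-center x-positions (pixels) recorded while the user fixated
# two known targets at the left and right edges of a 1920-px display:
estimate = calibrate(input1=120, target1=0,
                     input2=200, target2=1920)

gaze_x = estimate(160)  # pupil halfway between the samples -> 960.0
```

The same idea extends to 2D by calibrating x and y separately, and to more sample points by fitting a higher-order polynomial with least squares; this is why such methods are called polynomial gaze estimation.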

Robustness Against Head Movements

Although table-mounted gaze trackers are less invasive and more comfortable for the user, they have the undeniable disadvantage of low tolerance for the user's head movements. For that reason, researchers seek techniques to improve gaze estimation so that users can move their heads during an eye tracking session. Corneal reflection is a method that compensates for head movements to some degree. It uses the reflection of an infrared light source (a glint) from the surface of the cornea. In general, glints are used as reference points in relation to eye rotation movements. Instead of the pupil center alone, this method uses the vector connecting the pupil center to the glint center as the main input of the mapping function. Let's see how this works. In the figure, tick the Glint checkbox and redo the calibration (take two sample points) in the glint-enabled condition. After calibration, move the camera in front of the eye and see how large the offset (error) is compared to the pupil-only condition.
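Why the pupil-glint vector helps can be sketched numerically: a small head or camera shift translates the pupil and the glint in the image by roughly the same amount, so their difference vector, and hence the estimate, is nearly unchanged. This is a simplified 1D illustration with made-up pixel values, not a model of a real optical setup.

```python
import numpy as np

def fit_linear(inputs, targets):
    """Least-squares line fit: targets ~ a*inputs + b, returned as a function."""
    a, b = np.polyfit(inputs, targets, 1)
    return lambda x: a * x + b

pupil   = np.array([120.0, 200.0])   # pupil x at the two calibration targets
glint   = np.array([150.0, 150.0])   # glint x (the light source is fixed)
targets = np.array([0.0, 1920.0])    # screen x of the calibration targets

# Calibrate on the pupil-glint difference rather than the raw pupil position:
f = fit_linear(pupil - glint, targets)

# A small camera shift moves pupil and glint together by the same offset,
# so the pupil-glint input, and thus the estimated gaze point, is unchanged:
shift = 5.0
shifted_estimate = f((160.0 + shift) - (150.0 + shift))  # equals f(10.0)
```

In practice the compensation is only approximate, since the pupil and the glint do not translate by exactly the same amount under real head movements, which is why the figure still shows a small residual error.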

The "Eyetracking in Virtual and Augmented Reality" tutorial was presented at ETRA 2018 and was organized by COGAIN.


to be added later!


The example code presented in the tutorial is now part of EyeMRTK, which is intended to be a toolkit for rapid development of gaze-interactive projects for mixed reality in Unity, supporting various eye trackers and VR/AR devices.

Video Tutorial