Skeleton code for eye-tracking using MediaPipe

Gaze calibration & tracker skeleton (.zip) — the code is not complete; the steps below fill in the missing pieces.

1. Add a few screen points for calibration

In gaze_calibration.py:

h, w = 1080, 1920  # screen resolution
# Example calibration targets. A degree-2 polynomial fit (step 2) has
# 6 coefficients per axis, so use at least 6 points spread across the screen.
points = [(0, 0), (0, 512), (512, 512)]
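For better screen coverage, the calibration targets can be generated as a grid instead of listed by hand — a minimal sketch (the 3×3 grid and the `margin` value are assumptions, not part of the skeleton):

```python
# Generate a 3x3 grid of calibration points covering the screen.
# 9 points gives some redundancy over the 6-coefficient minimum
# required by a degree-2 polynomial regression.
h, w = 1080, 1920  # screen resolution
margin = 50        # keep targets away from the exact screen edge
xs = [margin, w // 2, w - margin]
ys = [margin, h // 2, h - margin]
points = [(x, y) for y in ys for x in xs]
```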

2. Add regression to complete the code

pip install scikit-learn

In gaze_tracker.py:

Initialize the regression model:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# gaze_vecs: (N, 2) array of gaze vectors collected during calibration
# screen_vecs: (N, 2) array of the matching on-screen target positions
poly = PolynomialFeatures(degree=2)
gaze_features = poly.fit_transform(gaze_vecs)
model_x = LinearRegression()
model_y = LinearRegression()
model_x.fit(gaze_features, screen_vecs[:, 0])
model_y.fit(gaze_features, screen_vecs[:, 1])

… and map the gaze features to screen positions:

# vec: the current gaze vector produced by the tracker
features = poly.transform([[vec[0], vec[1]]])
pred_x = model_x.predict(features)[0]
pred_y = model_y.predict(features)[0]
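To sanity-check the fit/predict cycle end to end, the two snippets above can be run on synthetic calibration data. The variable names follow the skeleton; the synthetic linear gaze-to-screen mapping below is purely illustrative, so the regression should recover it almost exactly:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic calibration set: 9 gaze vectors and a known linear screen mapping.
gaze_vecs = np.array([(gx, gy) for gy in (-0.2, 0.0, 0.2)
                               for gx in (-0.3, 0.0, 0.3)])
screen_vecs = np.column_stack([960 + 2000 * gaze_vecs[:, 0],
                               540 + 1500 * gaze_vecs[:, 1]])

poly = PolynomialFeatures(degree=2)
gaze_features = poly.fit_transform(gaze_vecs)
model_x = LinearRegression().fit(gaze_features, screen_vecs[:, 0])
model_y = LinearRegression().fit(gaze_features, screen_vecs[:, 1])

# Map a new gaze vector to a screen position.
vec = (0.15, -0.1)
features = poly.transform([[vec[0], vec[1]]])
pred_x = model_x.predict(features)[0]
pred_y = model_y.predict(features)[0]
print(round(pred_x), round(pred_y))  # ≈ 1260 390
```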

3. Add Kalman filter for smooth movements

pip install filterpy

In gaze_tracker.py:

Initialize Kalman filter:

import numpy as np
from filterpy.kalman import KalmanFilter

# Kalman filter using filterpy
kalman_filter = KalmanFilter(dim_x=4, dim_z=2)

# Time step between measurements (frame interval assumption)
dt = 1.0
# State transition matrix: [x, y, vx, vy]
# assumes constant velocity motion model for gaze position
kalman_filter.F = np.array([[1, 0, dt, 0],
                            [0, 1, 0, dt],
                            [0, 0, 1, 0],
                            [0, 0, 0, 1]], dtype=np.float32)
# Measurement matrix maps the internal state to observed position [x, y] (probably don't need to tune)
kalman_filter.H = np.array([[1, 0, 0, 0],
                            [0, 1, 0, 0]], dtype=np.float32)
# Measurement noise covariance (increase for smoother but laggier output)
kalman_filter.R = np.eye(2, dtype=np.float32) * 0.01
# Process noise covariance (probably don't need to tune)
kalman_filter.Q = np.eye(4, dtype=np.float32) * 1e-4

# Initial state covariance and state estimate
kalman_filter.P = np.eye(4, dtype=np.float32) * 1.0
kalman_filter.x = np.zeros((4, 1), dtype=np.float32)

… and apply the filter to the predicted screen positions:

kalman_filter.predict()
measurement = np.array([[final_x], [final_y]], dtype=np.float32)
kalman_filter.update(measurement)

final_x = int(kalman_filter.x[0, 0])
final_y = int(kalman_filter.x[1, 0])
