Workout Monitoring Using Optical-Flow — Part 2
How we used Optical Flow to track weight-lifting
This is part 2 of a series of posts. In case you missed part 1, read it here.
For a list of all posts in the series, click here.
Previously on “Optical-Flow to Monitor Workout Exercises”, we introduced the project, explained how it works, and described the algorithmic process that led us to the final implementation of our system.
In this part, we’ll dive into the theoretical and technical aspects of our system, focusing mainly on optical flow and how it was used to provide feedback for dumbbell weight lifting.
As you recall from the last part, our main challenge was segmenting the trainee from the background. This was eventually achieved by using a movement estimation algorithm called optical flow.
Optical Flow
Problem Formulation
Optical flow represents the motion of a scene relative to an observation point. In layman’s terms, it’s a way to measure movement between two video frames. For a given video frame 𝐼(𝑥,𝑦,𝑡), optical flow aims to estimate movement between consecutive frames in a sequence, meaning we want an equation connecting 𝐼(𝑥,𝑦,𝑡) and 𝐼(𝑥+𝛿𝑥,𝑦+𝛿𝑦,𝑡+𝛿𝑡), where (𝛿𝑥,𝛿𝑦) is the small change in position between the frames over 𝛿𝑡, a small amount of time. This can be illustrated by the following figure:
In order to model the optical flow problem, we must first assume constant brightness, meaning that the value of each pixel remains the same over a small change in position and a short period of time. This assumption allows us to write the following:
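𝐼(𝑥, 𝑦, 𝑡) = 𝐼(𝑥+𝛿𝑥, 𝑦+𝛿𝑦, 𝑡+𝛿𝑡)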
We can now apply the first order Taylor Expansion to the right-hand side of the equation:
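𝐼(𝑥+𝛿𝑥, 𝑦+𝛿𝑦, 𝑡+𝛿𝑡) ≈ 𝐼(𝑥, 𝑦, 𝑡) + (∂𝐼/∂𝑥)𝛿𝑥 + (∂𝐼/∂𝑦)𝛿𝑦 + (∂𝐼/∂𝑡)𝛿𝑡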
Removing common terms and dividing by 𝛿𝑡 would yield:
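(∂𝐼/∂𝑥)(𝛿𝑥/𝛿𝑡) + (∂𝐼/∂𝑦)(𝛿𝑦/𝛿𝑡) + ∂𝐼/∂𝑡 = 0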
Assuming the change in location and time are small enough, this equation can be replaced with:
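(∂𝐼/∂𝑥)𝑉𝑥 + (∂𝐼/∂𝑦)𝑉𝑦 + ∂𝐼/∂𝑡 = 0, or in vector form: 𝛁𝑰 · 𝑉 = −∂𝐼/∂𝑡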
Where 𝑉=[𝑉𝑥,𝑉𝑦] is the vector of velocities we’re interested in and 𝛁𝑰 is the gradient of the frame 𝐼. The above equation is known as the gradient constraint equation: the gradient of the image can be computed directly from the frame, while the velocities (𝑉𝑥,𝑉𝑦) are the unknown variables of interest.
Solving Optical Flow Using Farneback’s Method
Now that we understand the problem at hand, we can finally try to solve it. There are many ways to solve optical flow, but for now we will focus on Farneback’s method as it is the one we used for this project.
This method provides dense optical flow solutions based on matrix polynomial expansion and image pyramids. The idea is to split the image into neighborhoods of fixed size (3x3 or 5x5 for example), which will be represented by a second-order matrix polynomial like so:
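f(x) ≈ xᵀAx + bᵀx + c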
Where x is the local coordinate vector within the image neighborhood, A is a symmetric matrix, b is a vector, and c is a scalar. The coefficients A, b, and c are estimated from a weighted least-squares fit to the image values in the neighborhood.
Using this representation, we can now model the velocity field as a translation transformation applied to the polynomial function of two frames. Given two polynomial functions of the same neighborhood in two different frames, the optical flow equation now becomes:
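f₂(x) = f₁(x − d)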
Where d is the displacement between the two frames. Plugging in the polynomial expressions we get:
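A₂ = A₁,  b₂ = b₁ − 2A₁d,  c₂ = dᵀA₁d − b₁ᵀd + c₁  ⟹  d = −½A₁⁻¹(b₂ − b₁)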
Solving for d gives us the distance each pixel moved between the two frames, and given a reference time unit between frames, this effectively represents the velocity field, thus solving optical flow. For more details, both MATLAB and OpenCV have excellent implementations of Farneback’s method.
Weight-Lifting Exercise
Ok, so we understand the optical flow problem and how to approximate solutions for it efficiently — how does this help us with weight lifting?!
Using optical flow helps us in more than one way:
- Segmenting the user from the background
- Detecting which side the trainee is facing (useful for better feedback)
- Counting repetitions and tracking the arm’s range of movement
Let’s unpack each of these uses in order to fully appreciate the power of optical flow.
Segmentation Using Optical Flow Magnitude
Optical flow helped us solve the challenge of trainee segmentation. The best way to understand how it worked is to visualize the velocity fields obtained by solving optical flow. These can be represented in Cartesian coordinates (velocity along X and Y axes), but for now let’s use Polar coordinates which give each point a magnitude of movement and a movement angle to represent direction of movement. Let’s start by visualizing the normalized magnitude of movement:
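As a sketch, the polar representation can be computed from the Cartesian velocity field with NumPy (the function name here is hypothetical):

```python
import numpy as np

def flow_to_polar(vx: np.ndarray, vy: np.ndarray):
    """Convert per-pixel velocities to (normalized magnitude, angle in degrees)."""
    mag = np.hypot(vx, vy)                       # magnitude of movement
    peak = mag.max()
    mag_norm = mag / peak if peak > 0 else mag   # normalize to [0, 1]
    ang = np.degrees(np.arctan2(vy, vx))         # angle relative to the X axis
    return mag_norm, ang
```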
This already looks very close to a good segmentation mask, since the weight-lifting arm and the torso have the largest magnitudes. We just need to use smart thresholding and filtering mechanisms to ensure we get a robust segmentation mask around the arm and the weight.
After some trial and error, we found that the best solution was to threshold the magnitude and then apply blob analysis to keep only the largest blob. This process is illustrated below:
In the final system, we visualized the mask and the optical flow vectors inside it using a quiver plot on top of the original image. Here’s an example for a downward movement:
Feedback Using Optical Flow Angle
Once we have our segmentation mask, the next step is to utilize the velocity angle in order to provide feedback to the trainee. Let’s observe the average velocity angle inside the masked area:
The angle is measured in degrees relative to the standard axes system, meaning angle 0 is when the arm is parallel to the X axis and angle 90 is when the arm is parallel to the Y axis. When observing the smoothed graph (on the right), you can easily see the sinusoidal pattern of each repetition of the exercise — in this case, 2 pairs of maximum and minimum points.
By measuring the average angle of the segmented area over time, we can generate this graph in real time and count repetitions by detecting extremum points. We can also use the angle data to provide more qualitative feedback to the user by measuring the range of arm movement. For example, a repetition ranging from roughly -50 degrees to +100 degrees is a good one, but one that ranges from -10 to +45 degrees is not.
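A simple sketch of extremum-based repetition counting; the smoothing window and peak threshold below are illustrative assumptions, not our tuned values:

```python
import numpy as np

def count_reps(angles: np.ndarray, window: int = 5, min_peak: float = 45.0) -> int:
    """Count repetitions as local maxima of the smoothed average-angle signal."""
    kernel = np.ones(window) / window
    smooth = np.convolve(angles, kernel, mode="valid")  # moving-average smoothing
    interior = smooth[1:-1]
    peaks = (interior > smooth[:-2]) & (interior >= smooth[2:]) & (interior > min_peak)
    return int(peaks.sum())
```

In the real system the signal arrives frame by frame, so the same logic would run incrementally over a sliding buffer rather than on a complete array.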
Segmentation Refinement Using Cartesian Optical Flow
One last perspective on optical flow is the use of its Cartesian representation to determine which hand is being used to lift the weight. This becomes quite helpful in refining the mask to only include the forearm and the hand of the trainee, filtering out most of the torso. A refined mask provides more accurate angle measurements, which allow for better and more robust exercise monitoring and feedback.
As previously mentioned, the Cartesian representation of the optical flow velocity field is simply a pair (Vx, Vy) — velocities per axis. Let’s take a look at two examples of such a representation:
In order to infer which arm is doing the lifting from these graphs, let’s focus on the first 10 frames. A negative spike in Vy correlates with the upward motion at the beginning of the exercise (remember that in images the Y axis is inverted).
Assuming the trainee is standing at a profile and the lifting arm is closer to the camera, we can detect which arm is lifting by looking at Vx during the first upward movement of the exercise.
Can you figure out which arm is doing the lifting in each of the graphs above? The answer is at the end of this post.
Once we know which side is doing the lifting we can crop the segmentation mask to only include pixels that are close enough to the weight. For example, if the trainee is lifting with his right hand, we can crop out all the pixels that are too far from the leftmost pixel (which is the weight).
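A hedged sketch of that cropping step; the function name and the `max_dist` parameter are illustrative, and the side convention follows the description above (right hand ⇒ weight at the leftmost mask pixel):

```python
import numpy as np

def crop_mask_to_weight(mask: np.ndarray, lifting_right: bool, max_dist: int = 80) -> np.ndarray:
    """Keep only mask pixels within max_dist columns of the weight-side extreme."""
    cols = np.where(mask.any(axis=0))[0]
    if cols.size == 0:
        return mask
    if lifting_right:                  # weight is at the leftmost mask pixel
        anchor = cols.min()
        keep = np.arange(mask.shape[1]) <= anchor + max_dist
    else:                              # weight is at the rightmost mask pixel
        anchor = cols.max()
        keep = np.arange(mask.shape[1]) >= anchor - max_dist
    return mask * keep[np.newaxis, :].astype(mask.dtype)
```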
Putting It All Together
So far we’ve used optical flow to generate a segmentation mask and to track the arm’s angle during the weight-lifting exercise. This covers the essential components of our project. The final system is illustrated in the block diagram below:
As you can see, there is a lot more to cover, but this goes beyond the scope of this article. Future parts of the series will touch on different aspects of the system by explaining how other exercises were monitored.
As for the side detection game — the graph on the left represents lifting with the left hand, and the one on the right is for the right hand. Lifting with the right hand facing the camera (just like the image at the top) means that when lifting the hand upwards (negative Vy), the movement in the X axis is in the positive direction until the angle reaches 0, and then it switches direction. This is shown in the Vx waveform of the right graph, where it starts with a positive spike and then changes sign (and vice versa for the left side).
References
Farneback, G. “Two-Frame Motion Estimation Based on Polynomial Expansion.” In Proceedings of the 13th Scandinavian Conference on Image Analysis (SCIA), 363–370. Halmstad, Sweden, 2003.
(all visualizations by author)