So I decided to just do some extremely basic computer image analysis to look at the amount of overall movement of a group of performers and then look at what stood out in this very much reduced data set. This didn't involve any fancy 3D motion capture but just a video recording of the performance - which is something the Microsoft Kinect automatically collected.
Quantity of motion has been used in musical gesture analysis on a number of occasions, but it has been calculated in various ways. One of the simplest uses image subtraction: literally subtracting the grey-scale values of video frames from one another. To calculate the quantity of motion at frame f you need to look at the previous N frames (motion exists through time), and we'll set N to 30. Summing the absolute differences between consecutive frames from f-N to f, and then counting the pixels where that sum is large, is essentially what gives your value for frame f. The best way of understanding this might be to look at some Python code* that will calculate this:
*This is not a very efficient way to program it, but it shows the logic in the way that is easiest to understand.
import cv2
import numpy as np


class load_file:
    def __init__(self, filename, window_size):
        # Read file
        self.file = cv2.VideoCapture(filename)
        self.window_size = window_size
        # Get file attributes (OpenCV returns floats, so cast to int)
        self.width = int(self.file.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.height = int(self.file.get(cv2.CAP_PROP_FRAME_HEIGHT))
        self.num_frames = int(self.file.get(cv2.CAP_PROP_FRAME_COUNT))

    def get_qom(self, frame):
        # Create a blank image (NumPy arrays are indexed rows x columns)
        img = np.zeros((self.height, self.width), dtype=np.uint8)
        # Iterate over the window, never starting before the first frame
        for i in range(max(frame - self.window_size, 0), frame):
            # Set the file to the frame we are looking at
            self.file.set(cv2.CAP_PROP_POS_FRAMES, i)
            # Read FrameA and FrameB (two consecutive frames)
            ret_a, frame_a = self.file.read()
            ret_b, frame_b = self.file.read()
            if not (ret_a and ret_b):
                break  # Ran past the end of the video
            # Convert both frames to grayscale
            A = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
            B = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
            # Perform image subtraction on the two frames
            dif = cv2.absdiff(B, A)
            # Add the difference to the accumulated image (saturates at 255)
            img = cv2.add(img, dif)
        # Threshold the final image
        r, img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
        return img


if __name__ == "__main__":
    # Load the video file
    fn = "path/to/file.avi"
    video = load_file(fn, window_size=30)
    data = []
    # Iterate over the frames (frame indices start at 0)
    for f in range(1, video.num_frames):
        # 'img' is a 2D array that has a visualisation of the quantity of motion
        img = video.get_qom(f)
        # NumPy can count any nonzero values
        val = np.count_nonzero(img)
        # Store the values. The list 'data' can then be plotted over time
        data.append(val)
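For reference, the calculation in the script can be restated compactly. This is only a reading of the code, under some assumed notation: I_i is grey-scale frame i, N is the window size (30 here), 200 is the threshold passed to cv2.threshold, and f is at least N so the whole window exists. The cap at 255 reflects the saturating 8-bit addition.

```latex
M_f(x, y) = \min\!\Bigl(255,\ \sum_{i=f-N}^{f-1} \bigl| I_{i+1}(x, y) - I_i(x, y) \bigr| \Bigr),
\qquad
\mathrm{QoM}(f) = \#\bigl\{ (x, y) : M_f(x, y) > 200 \bigr\}
```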
The Python program uses two excellent libraries, OpenCV and NumPy, which you will need to install if you don't already have them. The output of this script, the Python list called 'data', can be used to plot the quantity of motion over time:
This gave me an idea of where the most movement was occurring - it seemed periodic but not frequent enough to be at the bar level, so I added lines where the phrase boundaries occurred:
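A plot of this kind could be sketched as follows. Matplotlib is an assumption here, since the post does not say which plotting tool was used. The function name plot_qom, the frame rate, and the boundary times are hypothetical and would need to come from your own video and score.

```python
import matplotlib
matplotlib.use("Agg")  # Render to a file; remove this line for an interactive window
import matplotlib.pyplot as plt


def plot_qom(data, fps, boundaries_sec, out_path="qom.png"):
    """Plot quantity of motion over time with vertical lines at phrase boundaries."""
    # data[0] belongs to frame 1, so shift the index by one before converting to seconds
    times = [(i + 1) / fps for i in range(len(data))]
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(times, data, linewidth=1)
    # One dashed vertical line per (hand-annotated) phrase boundary
    for t in boundaries_sec:
        ax.axvline(t, color="red", linestyle="--")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Quantity of motion (pixels)")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return times
```

For example, plot_qom(data, fps=30.0, boundaries_sec=[4.0, 8.0]) would draw the curve with two boundary lines; those numbers are placeholders, not values from the performance.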
I found that the movement and the phrasing of the piece were correlated, which gave me a good platform to investigate this further. Please feel free to use the script above, or if you are interested in a much more efficient way of doing it, please get in contact and I'm happy to discuss.
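One possible way to speed things up, which is not necessarily the more efficient version mentioned above, is to read each frame only once and keep a running sum of the last N frame differences. The sketch below assumes the frames arrive as grey-scale NumPy arrays; the names qom_series and read_gray_frames are hypothetical. Because all the differences are non-negative, thresholding the running sum should give the same result as thresholding the saturated 8-bit sum in the script.

```python
from collections import deque

import numpy as np


def qom_series(gray_frames, window_size=30, threshold=200):
    """Yield one quantity-of-motion value per frame, starting at the second frame."""
    diffs = deque()   # The most recent frame differences (at most window_size)
    running = None    # Per-pixel sum of the differences currently in the window
    prev = None
    for frame in gray_frames:
        cur = frame.astype(np.int32)  # Avoid uint8 wrap-around when subtracting
        if prev is not None:
            dif = np.abs(cur - prev)
            if running is None:
                running = np.zeros_like(dif)
            running += dif
            diffs.append(dif)
            if len(diffs) > window_size:
                # Remove the difference that has just left the window
                running -= diffs.popleft()
            yield int(np.count_nonzero(running > threshold))
        prev = cur


def read_gray_frames(path):
    """Yield grey-scale frames from a video file, reading it once from start to end."""
    import cv2  # Imported here so qom_series can be used with NumPy alone
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cap.release()
```

With this, data = list(qom_series(read_gray_frames("path/to/file.avi"))) would fill the same kind of list as the script, without seeking backwards through the video for every frame.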