Tuesday, 13 September 2016

Quantity of Motion Analysis of Ensemble Performers

One of my interests when starting my PhD was motion capture and the analysis of three-dimensional time-series data. I recorded a group of singers performing songs and collected a large body of data. I didn't (and to some extent still don't) really know what I'd like to do with it, as there are almost too many avenues to pursue at this point, and I needed something to help point me in the right direction. I had a lot of questions like "what's going to be happening in the data?" and "what relationships should I start looking at first?" - this last question was quite daunting as I didn't want to waste hours on something unfruitful.

So I decided to do some extremely basic image analysis to look at the overall amount of movement of a group of performers, and then see what stood out in this much-reduced data set. This didn't involve any fancy 3D motion capture, just a video recording of the performance - something the Microsoft Kinect collects automatically.

Quantity of motion has been used in musical gesture analysis on a number of occasions, but it has been calculated in various ways. One of the simplest makes use of image subtraction - literally subtracting the grey-scale pixel values of video frames from one another. To calculate the quantity of motion at frame f you need to look at the previous N frames (motion exists through time); here we'll set N to 30. Summing the differences between consecutive frames from f-N to f is essentially what gives your value for frame f. The best way to understand this might be to look at some Python code* that calculates it:

*This is not a very efficient way to program it, but it explains the logic in the easiest-to-follow way


import cv2
import numpy as np

class load_file:
    def __init__(self, filename, window_size):
        # Read file
        self.file = cv2.VideoCapture(filename)
        self.window_size = window_size

        # Get file attributes
        self.width      = int(self.file.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.height     = int(self.file.get(cv2.CAP_PROP_FRAME_HEIGHT))
        self.num_frames = int(self.file.get(cv2.CAP_PROP_FRAME_COUNT))

    def get_qom(self, frame):

        # Create a blank image (NumPy arrays are rows x columns,
        # i.e. height x width)
        img = np.zeros((self.height, self.width), dtype=np.uint8)

        # Iterate over the window, never going before the first frame
        for i in range(max(0, frame - self.window_size), frame):

            # Set the file to the frame we are looking at
            self.file.set(cv2.CAP_PROP_POS_FRAMES, i)

            # Get frame A and convert to grayscale
            ret, frame_a = self.file.read()
            A = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)

            # Get frame B and convert to grayscale
            ret, frame_b = self.file.read()
            B = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)

            # Perform image subtraction on the two frames
            dif = cv2.absdiff(B, A)

            # Add the difference to the accumulated image
            img = cv2.add(img, dif)

        # Threshold the final image
        r, img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)

        return img

if __name__ == "__main__":

    # Load the video file
    fn = "path/to/file.mp4"
    video = load_file(fn, window_size=30)

    data = []
    # Iterate over the frames
    for f in range(1, video.num_frames):
        # 'img' is a 2D array that visualises the quantity of motion
        img = video.get_qom(f)
        # NumPy can count the nonzero (i.e. moving) pixels
        val = np.count_nonzero(img)
        # Store the values; the list 'data' can then be plotted over time
        data.append(val)


The Python program uses two excellent libraries, OpenCV and NumPy, which you'll need to install if you don't already have them. The output of this script, the Python list called 'data', can be used to plot the quantity of motion over time:


This gave me an idea of where the most movement was occurring - it seemed periodic, but not frequent enough to be at the bar level, so I added lines where the phrase boundaries occurred:


I found that the movement and the phrasing of the piece were correlated, which gave me a good platform to investigate further. Please feel free to use the script above, or if you are interested in a much more efficient way of doing it, please get in contact - I'm happy to discuss.
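To give a flavour of one possible speed-up (a sketch, not the exact code I use): rather than re-reading N frames for every output value, you can read each frame once, keep a sliding window of frame differences, and maintain a running sum. Note this sums differences without saturating at 255, so its values can differ slightly from the cv2.add version above.

```python
from collections import deque

import numpy as np

def qom_stream(frames, window_size=30, threshold=200):
    """Yield one quantity-of-motion value per frame, reading each frame once.
    'frames' is any iterable of 2D grayscale arrays (e.g. decoded by OpenCV)."""
    prev = None
    diffs = deque()   # the last `window_size` difference images
    acc = None        # running sum of everything in `diffs`
    for frame in frames:
        g = frame.astype(np.int32)
        if prev is None:
            prev = g
            continue
        d = np.abs(g - prev)
        prev = g
        diffs.append(d)
        acc = d if acc is None else acc + d
        if len(diffs) > window_size:
            # Drop the oldest difference from the running sum
            acc = acc - diffs.popleft()
        # Count pixels whose accumulated difference exceeds the threshold
        yield int(np.count_nonzero(acc > threshold))
```

This turns the cost per frame from O(N) video reads into one read and two array operations, which matters for long recordings.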



Monday, 2 May 2016

My first Algorave performance

So on Friday I performed at my first ever Algorave: an event where digital artists get together to perform music and create visual spectacles using computer code. The music is created through a form of composition called Live Coding, where music is programmed algorithmically. I'd been interested in Live Coding ever since my Masters in Computer Music, but found the domain-specific languages, such as SuperCollider and Tidal, a bit difficult to grasp and musical ideas slow to develop. This prompted me to start development on my own system, FoxDot.

FoxDot is a Python-based language that takes an object-oriented approach to Live Coding, making music by creating Player Objects that are given instructions such as the notes to play and their respective durations. The sounds themselves are synthesised in SuperCollider, for which FoxDot was originally designed as an abstraction.
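To give a flavour of the idea - and this is a toy sketch, not FoxDot's actual syntax or implementation - a Player Object is essentially something you hand instructions to, here using Python's '>>' operator:

```python
class Player:
    """Toy stand-in for a FoxDot-style Player Object: it just stores
    the notes (as scale degrees) and durations it is told to play."""

    def __init__(self):
        self.degrees = []
        self.durs = []

    def __rshift__(self, instruction):
        # '>>' hands the player a (notes, durations) instruction
        self.degrees, self.durs = instruction
        return self

p1 = Player()
p1 >> ([0, 2, 4, 2], [1, 0.5, 0.5, 1])
```

In the real system the instruction would be an instrument with its own keyword arguments, and the player would schedule messages to SuperCollider rather than just storing lists.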

The music at Algoraves comes in a variety of forms, but mainly with the intention of making people dance. I was playing alongside artists I've watched countless videos of and even written essays about, so I was very honoured to do so. I was also very nervous, as it was the first time I'd used my FoxDot language in a public setting, and I think it showed in my performance. I noticed that many performers would stand (I chose to sit) and move rhythmically with the music, even spending some time away from the keyboard. By doing this I think they could not only take a moment to enjoy the occasion, but also think about their next 'move' in terms of their sound. I was typing almost constantly and I think that had a detrimental effect on the overall performance; the set was varied but had too many lulls - though I did see some people dancing, so I can't be completely disappointed.

Here's the whole event on YouTube (I start around 10 min)!


I'm next performing at the International Conference on Live Interfaces at the end of June at the University of Sussex, and I've given myself the challenge of including a computer vision aspect in the performance - I can't wait!

Monday, 7 March 2016

TOPLAP Leeds: First public outing of "FoxDot"

Outside of my PhD research into nonverbal communication in musicians I also have a passion for Live Coding music and programming my own music-making system, FoxDot. I'm quite new on the Live Coding "scene" and have been wanting to get more involved for the last year or so but never really plucked up the courage to do something about it. Luckily for me I was asked by the Live Coding pioneer, Dr. Alex McLean, to help set up TOPLAP Leeds, a node extension of the established Live Coding group TOPLAP, along with a few other students.

What this means in practice, I don't really know, but Alex encouraged me to put on a small demonstration of the code I've been working on and get some feedback from my peers. It was the first time I had used FoxDot in a public setting that wasn't YouTube, and I was actually really nervous. I started working on the system last year for a composition module as part of my Computer Music MA, and from there it grew - but I had never really thought I would use it to perform. The feedback was really positive and the group gave me some great ideas to put into practice. My PhD comes first, though, so I'm trying to limit the time I spend on FoxDot to weekday evenings and Sundays - but it's just so fun!

FoxDot is a pre-processed Python-based language that talks to a powerful sound synthesis engine called SuperCollider and lets you create music-playing objects that can work together or independently. I'm really hoping to promote it amongst the Live Coding community over the next few years. It's generally in a working state, but I have so many ideas for it that it still seems very much in its infancy. The reason for writing this blog post is to try and make it all a bit more real. For so long I've been working on it privately, only letting the public see it in brief clips on YouTube - it's time I actually started putting it out there and letting it loose on the world. If you're interested in Live Coding, I urge you to check out http://toplap.org and also my FoxDot website https://sites.google.com/site/foxdotcode/ and see what you can do. I'm performing at the ODI Leeds on Friday 29th April and you can get tickets here.

I hope this is the first step of a long and rewarding journey.

Tuesday, 1 March 2016

How to "iterate" over Kinect files and extract RGB video stream

If you have ever tried to extract data from a Microsoft Kinect Extended Event File (XEF), you may have had some trouble getting your hands on the RGB video stream that it contains, like some of the users in this MSDN thread. As mentioned in some of my previous blog posts, I've found it possible to play an XEF file in Kinect Studio and then record data as it is being played, as if it were a live stream. This makes it relatively easy to collect data on the bodies and their joint positions but difficult with the other types of streams due to the amount of data processing that each frame requires.

Incoming RGB video frames need to be written directly to file; otherwise you heavily increase the resources needed to hold them in memory and risk crashing your computer. But not all machines can perform enough I/O operations to keep up with the 30 fps playback rate of the XEF file, which makes extracting this data very difficult for most users. Being able to extract the RGB video data would be really useful for synchronising the data with audio I am recording from a different source - something that is integral to the data analysis in my PhD studies and hard to achieve with the skeleton data alone. It wasn't until recently that I tried pressing the "step file" button in Kinect Studio to see whether each frame could be individually sent to the Kinect Service and, consequently, my data-extraction Python applications. To my surprise: it worked!

Close up of the "step file" button

It would be possible to manually iterate over the file by clicking the "step file" button, but for long files this would be very time consuming (a 5 minute file at 30 fps contains around 9,000 frames). Using the PyAutoGUI module I was able to set up an automated click every 0.5 seconds on the "step file" button - whose position could be specified by hovering over it and pressing the Return key - and so iterate over the file automatically, allowing me to extract and store the RGB video data successfully. I tried to trigger the automated click as soon as each frame was processed but got some Windows errors; I'll hopefully fix this in future to make the process faster, but right now it's at least a bit easier!
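For anyone wanting to try something similar, the automation boils down to very little code. This is a sketch rather than my exact script - the button coordinates, interval, and helper names are mine for illustration, and you would need PyAutoGUI installed for the clicking part:

```python
import time

def frames_in_file(duration_seconds, fps=30):
    """Rough number of 'step file' clicks needed for a recording."""
    return int(duration_seconds * fps)

def automate_stepping(button_x, button_y, num_clicks, interval=0.5):
    """Click the Kinect Studio 'step file' button repeatedly.
    (button_x, button_y) is whatever screen position you captured,
    e.g. by hovering over the button and reading the mouse position."""
    import pyautogui  # imported here so the helper above works without it
    for _ in range(num_clicks):
        pyautogui.click(button_x, button_y)
        time.sleep(interval)
```

At one click per 0.5 seconds, stepping through a 5 minute file (around 9,000 frames) takes about 75 minutes, which is why triggering the click on frame completion would be a worthwhile improvement.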

I am also hoping to find a bit of time to write up a simple README for the application, which is available at my GitHub here: https://github.com/FoxDot/PyKinectXEF

Please feel free to use it and give feedback - bad or good - and I look forward to hearing any suggestions!

- Ryan

Thursday, 11 February 2016

Synchronizing Externally Recorded Audio With Kinect Data

One of the biggest downsides to recording musical performances with the Microsoft Kinect V2 is the lack of a high quality microphone. It does contain six microphones, but they are very low quality: when I extracted and accumulated the SubAudioFrame data from the Kinect playback, the results were not pretty (though audible, surprisingly), as you can see...


It is possible to get a more accurate waveform, but it requires a hefty amount of noise removal and the result is almost as useless as the one you see above. To be able to compare Kinect data to a sound file, you are going to have to record the audio from a different source. I decided to try recording a few bars of "Eight Days a Week" by the Beatles with a friend using my smartphone, but any real recording should be performed with a much, much better piece of kit.

To synchronise the audio and the visuals I decided to start recording with both my smartphone and the Kinect and then clap clearly, so that I could line up the onset of the clap in the audio file with the frame in which the hands make contact. Unfortunately, doing this in Python alone (what I've been writing my scripts in so far) would be a boatload of work, so I used the Python APIs for OpenCV and PyGame to make a workaround. Instead of playing the frame data back to me with PyGame, I was able to save each pixel array as a frame of video and store that. (The code I'm working on will be on my GitHub soon - I just have to make sure there's absolutely no way any recorded data can end up there!)
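The alignment step itself is simple once the audio is loaded as a NumPy array: find the sample where the clap peaks and convert it to a Kinect frame number. A sketch, with the sample rate and frame rate as assumptions:

```python
import numpy as np

def clap_frame(audio, sample_rate=44100, fps=30):
    """Return the Kinect frame index corresponding to the loudest
    transient (assumed to be the clap) in a mono audio signal."""
    onset_sample = int(np.argmax(np.abs(audio)))
    onset_seconds = onset_sample / sample_rate
    return int(round(onset_seconds * fps))
```

Subtracting this frame index from the video stream, and the onset time from the audio stream, puts both recordings on a common clock - assuming the clap really is the loudest event near the start.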


Once I had my audio track clipped, I could compare the waveform and the recordings from the start, or from any point I chose. The next step is to automate the production of a spectrogram that will run underneath (or be overlaid by) a graph that plots the performers' movements. Here is a little mock-up using 20 seconds of data from the Beatles song.


You can see just from this graph that there are some similarities between the lines, and also between the lines and the spectrogram (created using Sonic Visualiser) underneath them. I'll need to brush up on my statistics soon to get more detailed analysis out of these sorts of graphs, but things are looking promising.

Links:

OpenCV - http://opencv.org/
Python - https://www.python.org/
PyGame - http://pygame.org/
Sonic Visualiser - http://www.sonicvisualiser.org/
PyKinect2 - https://github.com/Kinect/PyKinect2 (Not mine, but used in my code)
PyKinectXEF - https://github.com/FoxDot/PyKinectXEF (My prototype code)

Saturday, 19 December 2015

Saving the overheads on Kinect joint data

So I've spent a good part of the last month working on a couple of Python applications to get joint coordinates from the skeletons generated by a Version 2 Kinect, and finally have something to show for it. When an XEF file is played back in Kinect Studio, the Kinect Service interprets it as if the data were being captured live, and you can get data out of it using the BodyFrame object, which contains data on all the bodies in the current frame and the locations of their joints. The only problem is that the files are gigabytes in size and take up quite a bit of hard drive space. I've managed to store the location of each body's joints in a database with a concise database schema.
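The schema is nothing fancy - one row per joint per body per frame. A sketch of the kind of thing I mean (my actual table and column names may differ), using Python's built-in SQLite support:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS joint_data (
    performance_id INTEGER,   -- unique ID assigned at recording time
    frame          INTEGER,
    body           INTEGER,
    joint          INTEGER,   -- 0-24, the Kinect's 25 joint types
    x REAL, y REAL, z REAL    -- metres, in camera space
);
"""

def store_joint(db, performance_id, frame, body, joint, x, y, z):
    db.execute("INSERT INTO joint_data VALUES (?, ?, ?, ?, ?, ?, ?)",
               (performance_id, frame, body, joint, x, y, z))

db = sqlite3.connect(":memory:")  # a file path in practice
db.executescript(SCHEMA)
store_joint(db, 1, 0, 0, 3, 0.1, 1.5, 2.0)  # e.g. one head position
```

Seven small numbers per joint is how the storage drops to a fraction of a percent of the raw XEF size - the RGB and depth streams are what make the original files so big.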

I can easily access whichever "performance" I want using a unique ID that I assign when I record Kinect data, stored in a database that can be accessed from a variety of platforms using 0.1% of the original hard drive space. At the moment I am only using motion capture data for the 25 joints the Kinect can automatically detect - no RGB or sound data - but I want to find some sort of compromise early in the new year. Using this data I can play it back in a much more human-understandable way in case I need to analyse it visually or quantitatively, or even just to make sense of it. I've included a screenshot of how this looks below.



The other great thing about having this data in a database is that I don't need to iterate over the XEF files and extract joint coordinates whenever I want to create a graph, for example. I can just read the data out with a simple SELECT query and plot it. Already I'm able to generate graphs tracking joint movement, velocity, and acceleration over time - not something that was particularly easy with motion capture data even a decade ago.
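Once the positions are out of the database, velocity and acceleration fall out of successive differences. A sketch, assuming a 30 fps capture and positions in metres:

```python
import numpy as np

def derivatives(positions, fps=30):
    """positions: (num_frames, 3) array of one joint's x, y, z in metres.
    Returns per-frame speed (m/s) and the magnitude of its change (m/s^2)."""
    velocity = np.diff(positions, axis=0) * fps   # m/s along each axis
    speed = np.linalg.norm(velocity, axis=1)      # overall speed per frame
    accel = np.abs(np.diff(speed)) * fps          # change in speed per second
    return speed, accel
```

In practice raw Kinect joint positions are noisy, so you would normally smooth them (e.g. with a moving average) before differencing - differentiation amplifies noise.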

I can't wait to get recording ensembles and using the kit to get some real data and see what we can find! I've been getting some ideas for studies after my initial exploratory pilot test - watch this space!

Ryan

Monday, 7 December 2015

Extracting data from Kinect V2 .xef files

The Kinect V2 easily allows users to record and store stream data in .xef (extended event) files, but getting any data out of such a file once it has been recorded has proven to be quite a task. While traditional video data can be looped through frame by frame at your leisure, the .xef file does not allow this kind of analysis (or at least not for me, despite my best efforts). At first, it seemed that RGB, Depth, or Body Frame data could only be intercepted at run time and, therefore, if I didn't get the data I wanted I would have to record it all again.

Fortunately, playback of stored Kinect data is interpreted by the Kinect Service as a live feed, so applications designed to extract data at run time can be used with recorded data too. The only problem is that the applications must be efficient enough for my machine to cope with handling the playback and the data extraction simultaneously without dropping any frames. So far I have been able to extract the X, Y, and Z values (in metres) of each of the 25 joints in the Body Frames, which should be enough data for what I need - but there is still the issue of working out which joints belong to which body in the scene, so we know who is moving in what way.
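One workable approach to the who-is-who problem (not what the Kinect SDK does internally, just a sketch of the idea) is to match each frame's bodies to the previous frame's by the distance between their joint centroids - performers don't move far between consecutive frames at 30 fps:

```python
import numpy as np

def match_bodies(prev_centroids, curr_centroids):
    """Greedily pair each current body with the nearest unused previous body.
    Both arguments: (num_bodies, 3) arrays of mean joint positions in metres.
    Returns a list where entry i is the previous-frame index assigned to
    current body i."""
    assignment = []
    used = set()
    for c in curr_centroids:
        dists = np.linalg.norm(prev_centroids - c, axis=1)
        # Take the closest previous body not already claimed
        pick = next(int(i) for i in np.argsort(dists) if int(i) not in used)
        assignment.append(pick)
        used.add(pick)
    return assignment
```

Greedy matching can fail if two performers cross paths; a more robust version would solve the full assignment problem, but for singers standing in fixed positions this should be enough.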



I'm not a big C# fan and don't enjoy programming in Visual Studio so I was looking at some alternative ways to access the Kinect Service API and found a really nice Python package that even came with a demo game that showed off its neat implementation:


I've spent the last few days getting my head around it and seem to be getting somewhere. I'm really excited to record a group of singers and see if I can get all the data I need. I can then easily store joint co-ordinates in a SQL database (probably one for each recording) and run a few scripts on these to get some nice pretty graphs. Can't wait to get started!

Ryan