Machine Learning for Music

RESEARCH

Manchester, UK

UnSupervised is research by the Machine Learning for Music (ML4M) Working Group exploring the creative use of emerging Artificial Intelligence and Machine Learning technologies, led by PRiSM (RNCM), the NOVARS Research Centre (The University of Manchester) and the Alliance Manchester Business School.


Research Partnerships

We are expanding our network of researchers and partner institutions in areas of computer science, music, media performance and videogaming involving artificial intelligence.
ML4M provides yearly opportunities to postgraduate students from both the RNCM and the University of Manchester, as well as to appointed guest artists, supported by Machine Learning experts at the Alliance Manchester Business School, PRiSM and the Computer Science Department at the University of Manchester.

Resources: PRiSM SampleRNN

 

PRiSM SampleRNN is a project centred on the development of prism-samplernn, a computer-assisted compositional tool released on GitHub in June 2020 as part of PRiSM Future Music #2, and PRiSM’s first major contribution to the field of Machine Learning. It generates new audio by ‘learning’ the characteristics of an existing corpus of sound or music. Changing the parameters of the algorithm and the way the dataset is organised significantly changes the output, making these choices part of the creative process. The generated audio can be used directly in a composition or to inform notated work to be played by an instrumentalist. Development of the software is funded by Research England, Expanding Excellence in England (E3).

Prism-samplernn
Led by Sam Salem and Christopher Melen
Initiated by Sam Salem
prism-samplernn code by Christopher Melen

prism-samplernn code on GitHub

Prism-samplernn Google Colab Notebook
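
The repository and Colab notebook above document the full training and generation workflow. As a conceptual orientation only, the sketch below illustrates the general idea behind SampleRNN-style tools: an autoregressive network is trained to predict the next quantised audio sample from the previous ones, and is then sampled from to produce new material. It is not the prism-samplernn code; the single GRU tier, the quantisation depth and the toy sine-wave ‘corpus’ are simplifications chosen for brevity.

```python
# Minimal sketch of the idea behind SampleRNN-style generation (not the actual
# prism-samplernn implementation): train a model to predict the next audio
# sample, then feed its own predictions back in to generate new audio.
import numpy as np
import tensorflow as tf

Q = 256          # quantisation levels (8-bit mu-law, common in sample-level models)
FRAME = 1024     # length of each training excerpt, in samples

def mu_law_encode(audio, q=Q):
    """Map float audio in [-1, 1] to integer classes 0..q-1."""
    mu = q - 1
    compressed = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    return ((compressed + 1) / 2 * mu).astype(np.int32)

# Toy 'corpus': replace with mu-law-encoded samples from your own dataset.
corpus = mu_law_encode(np.sin(np.linspace(0, 2000 * np.pi, 200_000)))

# Build (input, next-sample) training pairs from random excerpts of the corpus.
starts = np.random.randint(0, len(corpus) - FRAME - 1, size=2048)
x = np.stack([corpus[s:s + FRAME] for s in starts])
y = np.stack([corpus[s + 1:s + FRAME + 1] for s in starts])

# A single GRU tier stands in for SampleRNN's hierarchical recurrent tiers.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(Q, 64),
    tf.keras.layers.GRU(256, return_sequences=True),
    tf.keras.layers.Dense(Q),          # logits over the next sample's class
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x, y, epochs=2, batch_size=64)

# Generation: sample one quantised audio value at a time from the model.
seq = list(corpus[:FRAME])
for _ in range(2000):                  # a tiny excerpt; real runs generate much more
    logits = model(np.array([seq[-FRAME:]]))[0, -1]
    seq.append(int(tf.random.categorical(logits[None, :], 1)[0, 0]))
```

In practice the dataset is a folder of audio chunks drawn from the corpus the model should learn, and training runs for many epochs on a GPU; the notebook linked above walks through those steps.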

A PRiSM Collaboration also involving David De Roure, Marcus du Sautoy, and Emily Howard.

The RNCM Centre for Practice & Research in Science & Music (PRiSM) is funded by the Research England fund Expanding Excellence in England (E3).

Resources: NOVARS, University of Manchester

 

Dedicated Machine Learning High Performance Server at NOVARS.

The NOVARS Hub Room is now home to a new PC tower dedicated to Machine Learning for NOVARS postgraduates and resident artists or scientists. The machine is integrated with, and accessible from, the four cutting-edge studios at NOVARS, each hosting up to a 32-channel surround-sound system with Genelec, ATC and PMC speakers.
We use a range of algorithms on this machine to train on existing audio and datasets and to generate new material.

High Performance Computing (HPC), the Computational Shared Facility (CSF)
Part of the increasing processing power required for UnSupervised is provided by the University of Manchester's HPC facilities.
The CSF3 is a new HPC system at the University comprising new compute hardware (CPUs, GPUs) and the existing hardware from the CSF2 and DPSF systems. The CSF (aka Danzek) is a High Performance Computing (HPC) cluster (~9,700 cores + 68 GPUs) at the University of Manchester, managed by IT Services for use by University academics, post-doctoral assistants and postgraduates to conduct academic research.

  • It is built on a shared model: the majority of compute nodes are funded by contributions from University research groups; the cost of infrastructure such as login nodes, fileservers and network equipment is, for the most part, paid for by the University.

  • Academics are encouraged to contribute financially to the CSF rather than purchase their own smaller HPC clusters. The funds are used to buy compute hardware which is pooled into the system. You are then given a proportional share of the available throughput in the system. Please see the benefits of the CSF for details on why this model is better than buying your own hardware.

  • The CSF is suitable for a variety of workloads. Small to moderate parallel jobs (2-120 cores), serial jobs (1 core), high-throughput jobs (running many copies of the same application at the same time to process many datasets) and GPU jobs (using Nvidia V100 Volta GPUs) are all supported; a sketch of a typical GPU jobscript follows this list. The number of jobs you can submit to the system is not restricted. The time it takes to run all of your jobs depends on your group’s contribution to the system.

  • There is also some limited “free at the point of use” resource available in the CSF funded by the University. Please contact us if you are interested in using this.
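
As an illustration of how such GPU jobs are typically submitted on an SGE-managed cluster like the CSF, the jobscript sketch below requests CPU cores and a V100 GPU and launches a training script. The parallel environment name (smp.pe), the GPU resource name (v100), the module name and the training command are assumptions modelled on common usage and should be checked against the current CSF documentation.

```bash
#!/bin/bash --login
# Illustrative SGE jobscript for a GPU training run on a CSF-style cluster.
# Resource and module names are placeholders -- confirm them in the CSF docs.
#$ -cwd                # run the job from the directory it was submitted from
#$ -pe smp.pe 8        # request 8 CPU cores (assumed parallel environment name)
#$ -l v100=1           # request one Nvidia V100 GPU (assumed resource name)

module load apps/anaconda3   # placeholder module providing Python and TensorFlow

# Hypothetical training command; substitute your own script and arguments.
python train.py --data_dir ./chunks --num_epochs 100
```

The script would then be submitted with `qsub jobscript.sh` and its progress monitored with `qstat`.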

For groups wishing to run larger parallel HPC jobs (128-1024 cores) the HPC Pool provides another resource (4096 cores in total). A separate, per-project application process is required to use it. For convenience, the CSF software and file-systems are available on the HPC Pool and so we document that system within these CSF online docs.

Resources: Paper Publications by the UnSupervised ML4M Group

 

New Interfaces and Approaches to Machine Learning When Classifying Gestures within Music. Entropy 2020, 22(12), 1384
Chris Rhodes, Richard Allmendinger, Ricardo Climent.
Abstract
Interactive music uses wearable sensors (i.e., gestural interfaces—GIs) and biometric datasets to reinvent traditional human–computer interaction and enhance music composition. In recent years, machine learning (ML) has been important for the artform. This is because ML helps process complex biometric datasets from GIs when predicting musical actions (termed performance gestures). ML allows musicians to create novel interactions with digital media. Wekinator is a popular ML software amongst artists, allowing users to train models through demonstration. It is built on the Waikato Environment for Knowledge Analysis (WEKA) framework, which is used to build supervised predictive models. Previous research has used biometric data from GIs to train specific ML models. However, previous research does not inform optimum ML model choice, within music, or compare model performance. Wekinator offers several ML models. Thus, we used Wekinator and the Myo armband GI and study three performance gestures for piano practice to solve this problem. Using these, we trained all models in Wekinator and investigated their accuracy, how gesture representation affects model accuracy and if optimisation can arise. Results show that neural networks are the strongest continuous classifiers, mapping behaviour differs amongst continuous models, optimisation can occur and gesture representation disparately affects model mapping behaviour; impacting music practice.

Keywords: interactive machine learning; Wekinator; Myo; HCI; performance gestures; interactive music; gestural interfaces; gesture representation; optimisation; music composition
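
Wekinator exchanges data over OSC: by default it listens for input features on port 6448 at the address /wek/inputs and sends model outputs to port 12000 at /wek/outputs. The sketch below, using the python-osc package, shows how eight-channel EMG frames could be streamed to Wekinator for training or running a model; the EMG values here are simulated stand-ins for real Myo armband readings, and the frame rate is an arbitrary choice for the example.

```python
# Sketch: forwarding (simulated) 8-channel Myo EMG frames to Wekinator over OSC.
# Wekinator defaults: inputs on port 6448 at /wek/inputs, outputs on port 12000
# at /wek/outputs. Requires `pip install python-osc`.
import time
import random
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 6448)   # Wekinator's default input port

def read_emg_frame():
    """Stand-in for a real Myo read: 8 EMG channels rescaled to [-1, 1]."""
    return [random.uniform(-1.0, 1.0) for _ in range(8)]

# Stream frames at roughly 50 Hz; set the Wekinator project to expect 8 inputs.
for _ in range(500):
    client.send_message("/wek/inputs", read_emg_frame())
    time.sleep(0.02)
```

Model outputs can be read back in the same way by listening for /wek/outputs messages on port 12000 and mapping them onto synthesis or score parameters.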

New Interfaces for Classifying Performance Gestures in Music. Conference paper at Intelligent Data Engineering and Automated Learning (IDEAL) 2019. >>IDEAL 2019 Best Student Paper Award<<
Chris Rhodes, Richard Allmendinger, Ricardo Climent.
Abstract
Interactive machine learning (ML) allows a music performer to digitally represent musical actions (via gestural interfaces) and affect their musical output in real-time. Processing musical actions (termed performance gestures) with ML is useful because it predicts and maps often-complex biometric data. ML models can therefore be used to create novel interactions with musical systems, game-engines, and networked analogue devices. Wekinator is a free open-source software for ML (based on the Waikato Environment for Knowledge Analysis – WEKA - framework) which has been widely used, since 2009, to build supervised predictive models when developing real-time interactive systems. This is because it is accessible in its format (i.e. a graphical user interface – GUI) and simplified approach to ML. Significantly, it allows model training via gestural interfaces through demonstration. However, Wekinator offers the user several models to build predictive systems with. This paper explores which ML models (in Wekinator) are the most useful for predicting an output in the context of interactive music composition. We use two performance gestures for piano, with opposing datasets, to train available ML models, investigate compositional outcomes and frame the investigation. Our results show ML model choice is important for mapping performance gestures because of disparate mapping accuracies and behaviours found between all Wekinator ML models.

Keywords: Interactive machine learning; Wekinator; Myo; HCI; Performance gestures; Interactive music; Gestural interfaces

SonOpt: Sonifying Bi-objective Population-Based Optimization Algorithms (2022) •SonOpt-1.0
Tasos Asonitis, Richard Allmendinger, Matt Benatan, Ricardo Climent

>>EvoMUSART 2022 Best Paper Award<<

Abstract
We propose SonOpt, the first (open source) data sonification application for monitoring the progress of bi-objective population-based optimization algorithms during search, to facilitate algorithm understanding. SonOpt provides insights into convergence/stagnation of search, the evolution of the approximation set shape, location of recurring points in the approximation set, and population diversity. The benefits of data sonification have been shown for various non-optimization related monitoring tasks. However, very few attempts have been made in the context of optimization and their focus has been exclusively on single-objective problems. In comparison, SonOpt is designed for bi-objective optimization problems, relies on objective function values of non-dominated solutions only, and is designed with the user (listener) in mind; avoiding convolution of multiple sounds and prioritising ease of familiarizing with the system. This is achieved using two sonification paths relying on the concepts of wavetable and additive synthesis. This paper motivates and describes the architecture of SonOpt, and then validates SonOpt for two popular multi-objective optimization algorithms (NSGA-II and MOEA/D). Experience SonOpt yourself via this https URL
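
As a rough illustration of the sonification idea (and not the SonOpt implementation itself), the sketch below maps the shape of a simulated bi-objective approximation front to the amplitudes of a harmonic stack via additive synthesis, so that changes in the front between generations become audible changes in timbre. The fundamental frequency, harmonic count and toy fronts are arbitrary choices made for the example.

```python
# Sketch of the general idea behind SonOpt's additive-synthesis path (not the
# SonOpt code): the objective values of a non-dominated front drive the
# amplitudes of a bank of harmonics. Front data here are simulated.
import numpy as np
from scipy.io import wavfile

SR = 44100
F0 = 110.0                      # fundamental frequency of the harmonic stack

def front_to_tone(front, duration=0.5, n_harmonics=16):
    """Map sorted objective-2 values of a non-dominated front to harmonic amplitudes."""
    f2 = np.sort(front[:, 1])
    # Resample the front to a fixed number of harmonics and normalise to [0, 1].
    amps = np.interp(np.linspace(0, 1, n_harmonics), np.linspace(0, 1, len(f2)), f2)
    amps = (amps - amps.min()) / (amps.max() - amps.min() + 1e-9)
    t = np.linspace(0, duration, int(SR * duration), endpoint=False)
    tone = sum(a * np.sin(2 * np.pi * F0 * (k + 1) * t) for k, a in enumerate(amps))
    return tone / (np.abs(tone).max() + 1e-9)

# Simulated fronts from successive generations: toy trade-off curves that change shape.
audio = []
for gen in range(1, 11):
    x = np.random.rand(30)
    front = np.column_stack([x, (1 - x) ** (1 + 0.2 * gen)])
    audio.append(front_to_tone(front))

wavfile.write("front_sonification.wav", SR, np.concatenate(audio).astype(np.float32))
```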



Classifying Biometric Data for Musical Interaction within Virtual Reality. April 2022
Chris Rhodes, Richard Allmendinger, Ricardo Climent.

11th International Conference on Artificial Intelligence in Music, Sound, Art and Design
Publisher: Springer Nature

Abstract

Since 2015, commercial gestural interfaces have widened accessibility for researchers and artists to use novel Electromyographic (EMG) biometric data. EMG data measures muscular amplitude and allows us to enhance Human-Computer Interaction (HCI) through providing natural gestural interaction with digital media. Virtual Reality (VR) is an immersive technology capable of simulating the real world and abstractions of it. However, current commercial VR technology is not equipped to process and use biometric information. Using biometrics within VR allows for better gestural detailing and use of complex custom gestures, such as those found within instrumental music performance, compared to using optical sensors for gesture recognition in current commercial VR equipment. However, EMG data is complex and machine learning must be used to employ it. This study uses a Myo armband to classify four custom gestures in Wekinator and observe their prediction accuracies and representations (including or omitting signal onset) to compose music within VR. Results show that specific regression and classification models, according to gesture representation type, are the most accurate when classifying four music gestures for advanced music HCI in VR. We apply and record our results, showing that EMG biometrics are promising for future interactive music composition systems in VR.
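
For readers who want a feel for the kind of experiment described, the sketch below trains and cross-validates a small classifier on synthetic ‘EMG’ windows for four gesture classes. It is not the study's pipeline: the paper uses Wekinator's models on real Myo recordings, whereas the window length, features and scikit-learn classifier here are illustrative placeholders.

```python
# Sketch: classifying four gesture classes from windows of simulated 8-channel EMG.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
N_PER_CLASS, WINDOW, CHANNELS = 100, 50, 8

def synth_window(gesture):
    """Fake EMG window: each gesture activates a different pair of channels."""
    base = rng.normal(0, 0.05, (WINDOW, CHANNELS))
    env = np.hanning(WINDOW)[:, None] * np.roll(np.eye(CHANNELS)[:2].sum(0), gesture * 2)
    return base + env

def features(window):
    """Mean absolute value and root-mean-square per channel -- common EMG features."""
    return np.concatenate([np.abs(window).mean(0), np.sqrt((window ** 2).mean(0))])

X = np.array([features(synth_window(g)) for g in range(4) for _ in range(N_PER_CLASS)])
y = np.repeat(np.arange(4), N_PER_CLASS)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```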

Resources: Conference and Festival Performances and Presentations by the UnSupervised ML4M Group

 

11th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART) 2022

PAPERS
Classifying Biometric Data for Musical Interaction within Virtual Reality
Rhodes, C.; Allmendinger, R.; Climent, R.

SonOpt: Sonifying Bi-objective Population-Based Optimization Algorithms (2022)
Tasos Asonitis, Richard Allmendinger, Matt Benatan, Ricardo Climent

The 11th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART) 2022 is a multidisciplinary conference that brings together researchers who are working on the application of Artificial Intelligence techniques in creative and artistic fields.

There is a growing interest in the application of Artificial Neural Networks, Evolutionary Computation, Swarm Intelligence, Cellular Automata, Alife, and other Artificial Intelligence techniques in fields such as: visual art and music generation, analysis, and interpretation; sound synthesis; architecture; video; poetry; design; and other creative tasks. The use of Artificial Intelligence in such creative domains has therefore become a significant and exciting area of research. EvoMUSART provides the opportunity to present, discuss and promote innovative contributions and ongoing work in the area.

Following the success of previous events and the importance of the field of Artificial Intelligence applied to music, sound, art and design, EvoMUSART has been an evo* conference with independent proceedings since 2012. The EvoMUSART proceedings have been published in Springer's Lecture Notes in Computer Science (LNCS).

BEYOND19 Conference, Edinburgh, United Kingdom
Date: 20–21 November 2019
https://beyondconference.org/

POSTER
Sonifying the Self: Biometric Data as the New Paradigm for Interactive Music Composition

Author: Christopher Rhodes

Abstract

In recent years, wearable sensors have allowed us to utilise previously inaccessible forms of biometric data for interactive music composition and live music performance; in particular, data which measures muscle tension (Electromyography, EMG). EMG data is interesting to use because it allows for better gestural control when generating a desired sonic output (via Digital Signal Processing, DSP) than other datasets, such as Electroencephalography (EEG).

New research enquiries for music composition in different modalities (physical and digital spaces) can thus be made as a result of this improved gestural control.

IDEAL 2019, The University of Manchester, UK
14–16 November 2019

PAPER
The 20th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL) is an annual international conference dedicated to emerging and challenging topics in intelligent data analysis, data mining and their associated learning systems and paradigms.

The conference provides a unique opportunity and stimulating forum for presenting and discussing the latest theoretical advances and real-world applications in Computational Intelligence and Intelligent Data Analysis.

Sponsors: The University of Manchester | Alan Turing Institute | Springer | IEEE CIS UK & Ireland Chapter


EASTN-DC CUNEO, ITALY
8-13 April 2022

METS FEST – MUSIC ON THE (W)EDGE – Festival Europeo di Creatività Digitale – Cuneo, Italy

CONCERT AND TWO PAPER PRESENTATIONS

Since November 2017, METS-Conservatorio di Cuneo has been a partner (the only one from Italy) of the European Art-Science-Technology Network for Digital Creativity (EASTN-DC), constituted by 14 European and 3 extra-European institutions (universities, research centres, festivals…) involved in research, technology development, creation and education in the field of technologies applied to artistic creation, supported by the Culture Programme of the European Union.


2019 Giga-Hertz Production Award • Production Prize Winners 2019
Performance in October 2021

Hongshuo Fan (China)

for »Handwriting · WuXing« (2019), interactive multimedia performance, duration: 13‘

Biography: Hongshuo Fan 范弘硕 (1990) is a Chinese composer and new media artist. His work encompasses a wide range of real-time interactive multimedia, including acoustic instruments, live electronics, generative visuals, and light and body movement. His artistic practice and research interests focus on the fusion of traditional culture and modern technology. His output includes chamber music, interactive live electronics, installations and audiovisual works.
Hongshuo is currently pursuing a PhD at the NOVARS Research Centre (University of Manchester), where he is also a lecturer in interactive media technologies and postgraduate technical director for the MANTIS system (a 56-loudspeaker system). He was previously a member of the electronic music faculty at the Sichuan Conservatory of Music in China and a member of the Sichuan Key Laboratory of Digital Media Arts.

Introduction and demonstration of Handwriting · WuXing

by Hongshuo Fan
Handwriting · WuXing (手书·五行) is an 8-channel interactive media composition for live performance. It uses the Leap Motion Controller (an optical hand-tracking module) for gesture recognition, drawing and controlling Chinese calligraphy strokes with Machine Learning.
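
As a purely illustrative sketch (not the implementation of Handwriting · WuXing), the code below shows one way a fingertip trajectory from a hand tracker such as the Leap Motion Controller might be resampled and matched against calligraphy-stroke templates before driving visuals or sound. The stroke templates, resampling length and nearest-template matching are assumptions made for the example.

```python
# Illustrative stroke matching for hand-tracked calligraphy gestures (toy example).
import numpy as np

def resample_stroke(points, n=32):
    """Resample a variable-length 2D trajectory to n points, normalised to [0, 1]."""
    pts = np.asarray(points, dtype=float)
    d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    u = np.linspace(0, d[-1], n)
    resampled = np.column_stack([np.interp(u, d, pts[:, i]) for i in range(2)])
    span = (resampled.max(0) - resampled.min(0)).max()
    return (resampled - resampled.min(0)) / (span if span > 0 else 1.0)

# Toy stroke templates: a horizontal stroke ('heng') and a vertical stroke ('shu').
templates = {
    "heng": resample_stroke([[x, 0.5 + 0.02 * np.sin(6 * x)] for x in np.linspace(0, 1, 60)]),
    "shu":  resample_stroke([[0.5, y] for y in np.linspace(1, 0, 60)]),
}

def classify(stroke):
    """Return the template whose resampled shape is closest to the input stroke."""
    s = resample_stroke(stroke)
    return min(templates, key=lambda name: np.linalg.norm(s - templates[name]))

# A noisy, roughly horizontal gesture should be recognised as 'heng'.
gesture = [[x, 0.52 + np.random.normal(0, 0.01)] for x in np.linspace(0.1, 0.9, 45)]
print(classify(gesture))
```

In a live setting the recognised stroke label would then trigger or shape the corresponding visual and sonic material.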