I enjoyed creating different loop configurations with the Chaosflöte and appreciated how flexible the MaxMSP and SuperCollider coding environments were in enabling them. Yet I was eventually bothered by the element of predictability that the loops carried with them. It seemed to me that the loops themselves began to dictate the form of many of my improvisations: first, the texture is very thin while the software records the live sound material to be used in the loops; then the texture begins to build as the first loop is introduced; the piece reaches some kind of climax, usually with a dense texture; then it somehow ends through a process of reduction (or, if timed well, it can even end abruptly).
Managing input controls
There was another element that bothered me: the Chaosflöte has a limited number of physical inputs, and in the vast majority of my improvisations I used these inputs to switch certain live processes on or off. Every activation and deactivation therefore had to be decided manually, which brings two disadvantages: the number of sound effects one can deliberately employ at any given time is limited by the number of input controls, and one can become preoccupied with activating and deactivating sound processes, which detracts from the finger movement required to actually play the flute.
AIYA: Deconstructing key elements in human-to-human improvisation
I wanted to rethink both the looping mechanism and the sound processing controls of the Chaosflöte, and it was during this time that AIYA was created. At its core, AIYA is a software patch made in the MaxMSP environment, which sends instructions via Open Sound Control to other programs that generate the audio and visuals of AIYA. Initially, I created AIYA under the frame of the MTR Seminar “The Experiment in Arts and Science”; I had approximately four days to create and present an experiment pertaining to one’s own artistic practice. During this time, I put away my flute and began to think about the process of improvisation at its essence. How can the process of improvisation be translated into code? Are there core aspects of the improvisation process that can be convincing for a computer to emulate and productive to perform in human-computer improvisation? I brainstormed the following musical devices I could identify in my own improvisation practice, which I categorized into analysis and behavior elements. For each list of identified musical elements in human-to-human improvisation, I propose the following translations into actionable sonic transformations that the computer could enact (note: not all of these have been implemented in my own works, but can serve as references for further exploration):
Musical elements in improvisation practice → translated into computer-actionable sonic transformations (real-time)

Analysis elements:

…of sonic entrance/exit of instrument (is someone playing or not?) → Set an audio gate; when the threshold has been reached, something is considered to be “playing.”

…of current playing dynamic level / of average playing dynamic level → Amplitude measurement of the audio signal

…of current pitch content / of pitch content over time (e.g., “what key are we playing in, is this applicable?”) → Frequency measurement of the audio signal; played pitches can be accumulated into a buffer, which can then be analyzed to estimate the “key” in which one is playing.

…of current “energy” of improvisation / of average “energy” of improvisation → Combined calculation of acceleration/velocity values of the motion sensor, velocity of changes from note to note, and velocity of changes in timbre

…of sound envelopes (soft entrances? sharp entrances? is one playing staccato?) → Amplitude envelope tracking (not yet implemented)

…of physical bodily gestures → Measurement of acceleration/velocity values of the motion sensor
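The first analysis rows above can be sketched in code. The following Python sketch assumes audio frames arrive as lists of floats in [−1, 1] and pitches as MIDI note numbers; the threshold value and the crude scale-coverage “key” guess are my own assumptions for illustration, not AIYA’s actual implementation:

```python
import math
from collections import Counter

GATE_THRESHOLD = 0.02  # assumed amplitude threshold; tune per microphone/room


def rms(frame):
    """Root-mean-square amplitude of one audio frame (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_playing(frame, threshold=GATE_THRESHOLD):
    """Audio gate: the instrument counts as 'playing' above the threshold."""
    return rms(frame) >= threshold


class PitchAccumulator:
    """Accumulate detected pitches (MIDI numbers) into a pitch-class histogram."""

    def __init__(self):
        self.histogram = Counter()

    def add(self, midi_note):
        self.histogram[midi_note % 12] += 1

    def likely_key(self):
        """Crude 'key' guess: the major scale whose pitch classes cover the
        most accumulated notes. Returns the tonic's pitch class (0 = C)."""
        major_steps = [0, 2, 4, 5, 7, 9, 11]
        return max(
            range(12),
            key=lambda tonic: sum(
                self.histogram[(tonic + step) % 12] for step in major_steps
            ),
        )
```

This keeps the analysis layer deliberately simple: a gate, an amplitude measure, and a running histogram that other behaviors can query at any time.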
Behavior elements:

Call and response → Buffer recall; buffer recall with transformation; complex delay lines; buffer recall combined with pitch shifting, time shifting, and/or timbral alteration of the original source sound

Creating foreground and background roles → Amplitude modulation; filtering; playing with repetition or holding of musical gestures/notes (possibly to indicate background textures, as an example)

Countering the human’s material → Analyze the data parameters generated in real time by the human (pitch, amplitude, motion, etc.), and generate material with qualities opposing these. Additionally, a timer and/or trigger could be set to activate/deactivate the “counter” behavior.

Following the human’s material → Analyze the data parameters generated in real time by the human (pitch, amplitude, motion, etc.), and generate material with qualities resembling these. Additionally, a timer and/or trigger could be set to activate/deactivate the “follow” behavior.

Anticipation/foreshadowing of next musical action → Create a [rhythmically] consistent visual gesture that precedes the main action (e.g., flashing the screen once to indicate a change in sound material)
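The “counter” and “follow” behaviors can be sketched as simple mappings on a normalized analysis value, plus a clock that toggles between them. The jitter amount and toggle period below are arbitrary assumptions for illustration:

```python
import random


def follow(analysis_value):
    """'Follow' behavior: a machine parameter resembling the human's analyzed
    value, with a little random spread so the imitation is not literal."""
    jitter = random.uniform(-0.05, 0.05)
    return min(1.0, max(0.0, analysis_value + jitter))


def counter(analysis_value):
    """'Counter' behavior: a machine parameter opposing the human's analyzed
    value (assuming values normalized to [0, 1])."""
    return 1.0 - analysis_value


class BehaviorClock:
    """Timer-based trigger: toggle between 'follow' and 'counter' every
    `period` seconds of performance time."""

    def __init__(self, period=20.0):
        self.period = period

    def behavior_at(self, t):
        return follow if int(t // self.period) % 2 == 0 else counter
```

In performance, the analysis layer would feed `behavior_at(now)(value)` into whichever synthesis parameter the behavior is currently steering.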
Finding hybrid forms of expression through imperfect translations
Part of me wonders if it is problematic that I try to recreate Western music practices within the machine. Hayles warns of a similar phenomenon in virtual reality: the majority of commercial and military VR projects attempt to reconstruct the mechanics of the “real world” in the virtual, and in doing so replicate some of the same problematics (Hayles, 2002). I must acknowledge that trying to replicate the language of Western musical devices in the improvisation machine might end up limiting the expressive capacity of the improvisation and discourage exploration into alternative forms of performance. On the other hand, Western musical practice has been ingrained in my training as a professional musician over the past 20+ years. This is my starting point, and perhaps through the inevitable imperfections in my translations from Western musical practice to machine behavior, I can find a hybrid form of expression and performance practice with the machine.
Indeed, I have found that not only the imperfections of the translations but also the glitches and unexpected behaviors of the machine are what create openings in the otherwise rigid options of music making. One such example occurred in an audio delay effect I had programmed for the Chaosflöte. Its intended function was to delay the original signal over five repetitions. Instead of repeating the original signal one-to-one with decreasing amplitude, each repetition shifted its pitch downwards by a semitone. Moreover, the delay sometimes spanned three repetitions instead of five, or sometimes two, or one, and the number was out of my immediate control… This version of the sound process sounded to me orders of magnitude better than my original idea, and I decided to keep it for the performance. To this day, I am not sure what causes the glitch, but I interpret it as a way of letting the machine keep its own voice.
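I cannot reconstruct the actual cause of the glitch, but the behavior it produced can be described numerically: each repetition plays back one semitone lower (a playback-rate factor of 2^(−1/12) per repeat) at a decaying level, and the repetition count varies unpredictably. A sketch of that parameter schedule, with an assumed decay factor:

```python
import random

SEMITONE_DOWN = 2.0 ** (-1.0 / 12.0)  # playback-rate factor for one semitone down
DECAY = 0.7                            # assumed per-repetition amplitude factor


def repetition_params(n_repeats=5, decay=DECAY):
    """(playback_rate, amplitude) for each delay repetition: the pitch falls
    one semitone per repeat while the level decays."""
    return [(SEMITONE_DOWN ** k, decay ** k) for k in range(1, n_repeats + 1)]


def glitched_repetition_count(max_repeats=5):
    """The glitch: the number of repetitions varies outside immediate control."""
    return random.randint(1, max_repeats)
```

This is a description of the heard result, not of the MaxMSP patch itself; the point is that the parameter schedule the glitch produced is itself musically coherent.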
I regard glitches as a way for the machine to appear as a “separate entity,” and perhaps even as an improvisation partner. This contrasts with the attribute mentioned in Section 2, which regards the machine as an extension of the body: “The expected output and the actual output of the audiovisual electronics are consistently aligned as closely as possible.” Glitches, even when there is a technically logical reason behind them, are the embodiment of the mismatch between expected and actual output, and can open a way for the machine to gain perceived agency.
Some glitches, however, I regard as problematic; they foster my distrust of the machine as an improvisation partner. Lack of control and not knowing the reason behind an action: these are the two main attributes that characterize unfavorable glitches. There is a certain vulnerability one experiences when losing control and not knowing why. Yet these aspects are shared by “favorable” glitch moments, and they occur all the time in human-to-human musical improvisation. Why can I not extend the same trust to the machine’s actions that I extend to my fellow musicians? Perhaps it is this: the machine/AIYA has, at this time, no significant history, unlike a fellow human musician who has trained for decades in a performance practice I am familiar with. I know that if my improvisation partner plays an unexpected note, I can trust them to keep going or to take the improvisation in a new direction, because I know this musician’s history of playing. Perhaps, then, I cannot trust a musician without having some idea of their history. What does this mean for the machine? Do I need to build a history for it as well?
One must also consider the other side of time: communicating the future actions of the improvisation machine. Hoffman highlights the importance of anticipation in human–robot interaction, as seen in his experiments with humans and two groups of robots: one categorized as “reactive” (making decisions based only on the currently perceived state) and the other as “anticipatory” (making decisions based on both the existing state and the predicted activity of the human teammate) (Hoffman, 2005). In the trials involving the anticipatory robots, the robots were reported to reflect more “human” characteristics and to exhibit more “intelligence” and “commitment of the robot to the team” than in the trials involving the reactive robots. I believe that refining these attributes can also contribute to a feeling of trust in the machine and foster healthier relationships with improvisation-machine partners.
As a first step, I have begun constructing AIYA’s musical history from its creation in April 2019. For each evolution of the improvisation machine, a corresponding work has been made and will be analyzed in the following sections:
3.2 AIYA for MIDI Keyboard – the first AIYA prototype
3.3 AIYA improvisations A and B – preliminary audio improvisation sketches with AIYA
3.4 bad decisions – first “full” performance work created with AIYA using a mix of live-generated audio, fixed audio, live-generated visuals, and fixed visuals
3.5 A->B – first work to use only live-generated material
3.6 diaphragma – audiovisual performance work with AIYA with redesigned buffer system
3.7 aiya meets self – audiovisual installation work using PoseNet and no physical on/off switch for the machine
4. black box fading – 360°/VR video, interactive installation, and spatial audio work
 While this can be useful for some improvisations, especially in a jazz context, many improvisations do not follow the traditional Western concept of “playing in a key”; the usefulness of this measurement is therefore debatable.
 In music, the term “energy” is often used as a qualitative assessment of a performance. Yet, I find the term “energy” to be difficult to define, especially in the quantitative sense. Is it simply a measure of velocity? Is it a measure of human intensity?
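One crude quantitative answer to the question in this footnote, following the “combined calculation” row of the analysis table: normalize each velocity measure to [0, 1] and take a weighted sum. The weights below are arbitrary assumptions:

```python
def energy(motion_velocity, note_change_rate, timbre_change_rate,
           weights=(0.4, 0.4, 0.2)):
    """One possible quantitative "energy": a weighted sum of normalized
    motion-sensor velocity, note-to-note change rate, and timbre change rate,
    each expected in [0, 1]. Returns a value in [0, 1]."""
    features = (motion_velocity, note_change_rate, timbre_change_rate)
    return sum(w * f for w, f in zip(weights, features))
```

Such a scalar cannot settle whether “energy” is velocity or human intensity, but it gives the machine something actionable to follow or counter.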