Daniel Oore (IICSI, MUN), “moocow: A Portrait of the Artist as a Young Robot,” (original musical composition using machine learning AI), NeurIPS 2020 Creativity Workshop (34th Conference and Workshop on Neural Information Processing Systems)

All raw MIDI files used (to trigger instrument/synth sounds in this piece) were generated by a machine learning system, based on the transformer architecture, that was fed the recording of the text reading (also included throughout the piece).
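The overall shape of that pipeline is: speech audio in, raw MIDI note events out, with timbre assigned later by the software instruments in the DAW. Purely as an illustration, here is a minimal, hypothetical Python sketch of that flow; the librosa and pretty_midi calls are real, but the note-generation function is only a stand-in for the actual transformer system described in the paper cited below, not the authors’ code.

    # Hypothetical sketch of the speech-to-MIDI flow described above.
    # The transformer model itself is replaced by a stand-in function so that
    # the script runs end to end; only the overall data flow is illustrative.
    import librosa
    import pretty_midi

    def generate_notes_from_speech(audio, sr):
        """Stand-in for the transformer-based speech-to-MIDI system (hypothetical).
        Returns a list of (midi_pitch, start_seconds, end_seconds) note events."""
        duration = len(audio) / sr
        # Placeholder: one middle-C note every half second for the length of the speech.
        return [(60, 0.5 * i, 0.5 * i + 0.4) for i in range(int(duration / 0.5))]

    # 1. Load the spoken-text recording (file name is a placeholder).
    audio, sr = librosa.load("OnceUponATime.wav", sr=16000)

    # 2. Generate raw note events (pitches and their relative durations) from the speech.
    note_events = generate_notes_from_speech(audio, sr)

    # 3. Write the events to a raw MIDI file; the timbre (TRUMPET, ORGAN, etc.)
    #    is chosen later by the software instrument on each Ableton Live track.
    pm = pretty_midi.PrettyMIDI()
    instrument = pretty_midi.Instrument(program=0)
    for pitch, start, end in note_events:
        instrument.notes.append(
            pretty_midi.Note(velocity=90, pitch=pitch, start=start, end=end))
    pm.instruments.append(instrument)
    pm.write("generated.mid")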

***

moocow: A Portrait of the Artist as a Young Robot
If a robot contributed a portrait of himself as a young man to an online gallery of AI art, what aspects of the robot’s sonic künstlerroman might elicit his pride or shame? Would it be the (lack of) emotional and spiritual impact of his statement? Or the (lack of) seamless integration of his inherited knowledge, demonstrated across both his statement’s small fragmentary scale and large arcing scale?

***

  • All (twenty-six) choir voices composed, arranged, and sung by Daniel Oore
  • All (audio and MIDI-triggered) sounds orchestrated, arranged, and edited by Daniel Oore
  • All raw MIDI files (notes & their relative durations) generated by a machine learning system built by Jason d’Eon, Sri Harsha Dumpala, and Sageev Oore; citation:
    • Jason d’Eon (Dalhousie, Vector), Sri Harsha Dumpala (Dalhousie, Vector), Sageev Oore (Dalhousie, Vector), Daniel Oore (IICSI, MUN), “A Speech-Based Music Composition Tool with Transformer,” NeurIPS 2020 (34th Conference and Workshop on Neural Information Processing Systems)
    • Trained using the MAESTRO dataset
  • Text reading by Sageev Oore (audio fed into machine learning system to generate MIDI files)
  • Text by James Joyce
  • Image by Daniel Oore

***

Below is a screenshot of the Ableton Live software window, showing all the raw MIDI and audio files as they are sequenced (left to right) in “tracks” (horizontal rows, each containing either a digital software instrument that plays MIDI files or an audio channel that plays audio files) from the beginning to the end of the “moocow” audio piece:

Track numbers are indicated in small yellowish boxes along the right-hand margin, e.g.:

To the left of the yellow track-number boxes are labels with the track names (in yellow, blue, and magenta), e.g.:


Each track contains fragments of audio or MIDI files…

From top to bottom of the full-screen screenshot:

  • Track 1 (yellow), named “TEXT READING AUDIO,” contains (repeating fragments of) the original audio file (yellow, labeled “OnceUponATime”) of the James Joyce text read by Sageev Oore (i.e. “Once upon a time and a very good time it was there was a moocow coming down along the road”), e.g.:

 

  • Tracks 2-11 (blue) are MIDI (synthesized or sampled) instruments, each triggered by raw MIDI files (all of which were generated by the machine learning system from the original “OnceUponATime” audio file seen in Track 1, above), e.g.:

      • A label on each MIDI track indicates the type of (synthesized/sampled) instrument being triggered by the raw MIDI files (e.g. TRUMPET, FLUTE, ORGAN, TREMOLO [strings], etc.) and, in some cases, whether the MIDI file is transposed up by one or two octaves (indicated in the track name with “+12 st” [st = semitone] or “+24 st”, respectively); a minimal sketch of this kind of transposition appears after these notes.

All together, the different triggered MIDI instruments (without the playback of the text-reading and choir-singing audio files) sound like this, e.g.:

      • The MIDI files (green- and blue-tabbed files containing tiny black rectangles whose heights and lengths represent pitch and duration information respectively, and which trigger the MIDI instruments in the tracks they populate) are each labeled along their green and blue tabs according to the manner in which the given raw MIDI file was generated, e.g.:
      • i.e.:
          • A MIDI file labeled “formant-extraction” (green) triggers a TRUMPET (track 8), e.g.:

      • MIDI files labeled “gap-fill-unconditioned” (blue) trigger an ORGAN (track 7) and electronic drum BEATS (track 6)

      • MIDI files labeled “gap-fill” (green) trigger a TREMOLO cello ensemble (track 9)

      • MIDI files labeled “overwrite-formant” (blue) trigger a MUTED TPT [trumpet] (track 3) and a FLUTE (track 4), both altered with an echo-delay effect

      • MIDI files labeled “overwrite-gap-fill-unconditioned” (green) trigger a CELLO (track 10) and BASS (track 11)

      • MIDI files labeled “overwrite-gap-fill” (green) trigger a moon guitar [aka yueqin] (labeled “MOONTAR”)

  • The different types of audio-to-MIDI generation (e.g. formant extraction, gap-fill, etc.) are described in the following paper and presentation:
    Jason d’Eon, Sri Harsha Dumpala, Sageev Oore, Daniel Oore, “A Speech-Based Music Composition Tool with Transformer,” accepted to NeurIPS 2020 (34th Conference and Workshop on Neural Information Processing Systems)
  • Tracks 12-14 (magenta), named “MOO CHOIR (17 voices)” and “MOOCOW CHOIR (9 voices)”, contain audio files of the choirs composed and sung by Daniel Oore (singing “Moocow a commin’” and “Moo, moo, moo… [etc.]”, respectively), e.g.:

 

  • Red lines running vertically and diagonally along each track are (in all but the top two tracks) volume envelopes (high/low volume), e.g. (image below):
  • Transparent boxes (labeled in the middle of the outline with “Clip Deactivated”) in some tracks indicate audio or MIDI files that were explored but then deactivated for musical/compositional reasons, e.g.:
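
As noted in the track labels above (“+12 st” / “+24 st”), some tracks play their raw MIDI transposed up by one or two octaves; in “moocow” this transposition is applied inside Ableton Live. Purely as a hypothetical illustration of the same operation performed directly on a raw MIDI file, a short sketch using the pretty_midi library might look like this (file names are placeholders):

    # Hypothetical sketch: transpose every note in a raw MIDI file up by a fixed
    # number of semitones (+12 = one octave, +24 = two octaves).
    import pretty_midi

    def transpose_midi(in_path, out_path, semitones):
        pm = pretty_midi.PrettyMIDI(in_path)
        for instrument in pm.instruments:
            if instrument.is_drum:
                continue  # percussion tracks (e.g. drum BEATS) are left untransposed
            for note in instrument.notes:
                # Clamp to the valid MIDI pitch range 0-127.
                note.pitch = min(127, max(0, note.pitch + semitones))
        pm.write(out_path)

    # Equivalent of a "+12 st" track label: raise the raw MIDI by one octave.
    transpose_midi("generated.mid", "generated_plus_12st.mid", semitones=12)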