= Programming Project #2: "Featured Artist" =
[https://ccrma.stanford.edu/courses/356-winter-2023/ Music and AI (Music356/CS470)] | Winter 2023 | by Ge Wang

<div style="text-align: left;">[[Image:Mosaiconastick.jpg|400px]]</div>

In this programming project, we will learn to work with '''audio features''' for both supervised and unsupervised tasks. These include a '''real-time genre classifier''' and a '''feature-based audio mosaic tool'''. Using the latter, create a feature-based '''musical statement or performance'''!

* Phase One (Colab + ChucK):
** understanding audio, FFT, feature extraction
** extract different sets of audio features from the GTZAN dataset
** run a real-time classifier using different feature sets
** run cross-validation to evaluate the quality of the classifier based on different features

* Phase Two:
** curate your own set of sounds and songs
** build a database mapping sound <=> feature vectors
** using similarity retrieval, build a feature-based sound explorer
** using your database and retrieval tool, design an interactive audio mosaic tool
*** real-time
*** feature-based
*** usable for performance
*** takes any audio input (mic or any unit generator)
** (optional) do this in the audiovisual domain

* Phase Three:
** use your prototype from Phase Two to create a musical statement
** (optional) do this in the audiovisual domain
 
=== Due Dates ===
* Coding tutorial: '''Thursday evening'''
* Milestone (Phase One complete + Phase Two prototype): '''webpage due Monday (2/6, 11:59pm) | in-class critique Tuesday (2/7)'''
* Final Deliverable: '''webpage due Monday (2/13, 11:59pm)'''
* In-class Presentation: '''Tuesday (2/14)'''
 
=== Discord Is Our Friend ===
* direct any questions, ruminations, and outputs/interesting mistakes to our class Discord

=== Things to Think With ===
* read/skim the classic article [https://ccrma.stanford.edu/courses/356-winter-2023/readings/2002-ieee-genre.pdf "Musical Genre Classification of Audio Signals"] (Tzanetakis and Cook, 2002)
** don't worry about the details yet; first get a general sense of what audio features are and how they can be used
 
=== Tools to Play With ===
* get the latest [https://ccrma.stanford.edu/courses/356-winter-2023/bin/chuck/ '''bleeding edge secret <code>chuck</code> build'''] (2023.01.23 or later!)
** '''macOS''': this will install both the command-line <code>chuck</code> and the graphical IDE miniAudicle, and will replace any previous ChucK installation.
** '''Windows''': you will need to download and use the bleeding-edge command-line <code>chuck</code> (for now, there is no bleeding-edge miniAudicle for Windows); you can either use the default <code>cmd</code> command prompt, or consider downloading a [https://www.puttygen.com/windows-terminal-emulators terminal emulator].
** '''Linux''': you will need to build from source, provided in the linux directory.
** '''all platforms''': for this project, you will be using the command-line version of <code>chuck</code>.
* NOTE: to return your chuck to a pre-bleeding-edge state, you can always install the latest [https://chuck.stanford.edu/ official ChucK release]
* [https://ccrma.stanford.edu/courses/356-winter-2023/code/featured-artist/ '''sample code for all phases'''] (including optional video starter code)
  
 
=== GTZAN Dataset ===
* next, you'll need to download the GTZAN dataset
** 1000 30-second music clips, labeled by humans into ten genre categories
  
=== Phase One: Extract, Classify, Validate ===
* understanding audio, audio features, FFT, feature extraction
* extract different sets of audio features from the GTZAN dataset
* run the real-time classifier using different feature sets
* run cross-validation to evaluate the quality of the classifier based on different features
* you can find the [https://ccrma.stanford.edu/courses/356-winter-2023/code/featured-artist/phase-1-classify '''relevant code here''']
** start playing with these, and read through them to get a sense of what the code is doing
** [https://ccrma.stanford.edu/courses/356-winter-2023/code/featured-artist/phase-1-classify/example-centroid.ck '''example-centroid.ck'''] -- a basic example of using ChucK's unit analyzer framework (things connected using the upchuck operator <code>=^</code>) to extract an audio feature (a minimal sketch of such a chain appears after this list):
*** generate an input (a 440 Hz sine wave) -- this can be any audio source, e.g., <code>adc</code> for the microphone
*** take a Fast Fourier Transform (FFT) on a frame of audio (the frame size is determined by the FFT size)
*** use the output of the FFT analysis to compute the Spectral Centroid for that frame of audio
*** note how ChucK timing is used to precisely control how often to do a frame of analysis
*** <code>.upchuck()</code> is used to trigger an analysis, automatically cascading up the <code>=^</code> chain
** [https://ccrma.stanford.edu/courses/356-winter-2023/code/featured-artist/phase-1-classify/example-mfcc.ck '''example-mfcc.ck'''] -- like the previous example, but now we compute a multi-dimensional feature, Mel Frequency Cepstral Coefficients (MFCCs)
** [https://ccrma.stanford.edu/courses/356-winter-2023/code/featured-artist/phase-1-classify/feature-extract.ck '''feature-extract.ck'''] -- in a "real-world" scenario, we would extract multiple features; a <code>FeatureCollector</code> is used to aggregate multiple features into a single vector (see comments in the file for more details; a sketch of this aggregation pattern also appears after this list)
** [https://ccrma.stanford.edu/courses/356-winter-2023/code/featured-artist/phase-1-classify/genre-classify.ck '''genre-classify.ck'''] -- using the output of <code>feature-extract.ck</code>, do real-time classification by performing the same feature extraction and using k-NN to predict the likelihood of each genre category (see comments in the file for more details)
** [https://ccrma.stanford.edu/courses/356-winter-2023/code/featured-artist/phase-1-classify/x-validate.ck '''x-validate.ck'''] -- using the output of <code>feature-extract.ck</code>, do cross-validation to get a sense of the classifier's quality
* experiment by choosing different features and different numbers of features, extracting them on GTZAN, trying the real-time classifier, and performing cross-validation
** available features: Centroid, Flux, RMS, RollOff, ZeroX, MFCCs, Chroma, Kurtosis
** try at least '''five''' different feature configurations and evaluate the resulting classifier using cross-validation
*** keep in mind that the baseline score is .1 (a random classifier for 10 genres), and 1 is the max
*** how do different--and different numbers of--features affect the classification results?
*** in your experiments, what configuration yielded the highest score in cross-validation?
* briefly report on your experiments
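
To make the unit analyzer chain concrete, here is a minimal sketch in the spirit of <code>example-centroid.ck</code> (this is not the course file; the sine input, 4096-sample FFT size, and one-frame hop are assumptions):

<pre>
// minimal sketch: extract spectral centroid frame by frame
// input (440 Hz sine) => FFT =^ Centroid, ending at blackhole (analysis only, no sound out)
SinOsc s => FFT fft =^ Centroid centroid => blackhole;
440 => s.freq;

// analysis parameters (assumed values)
4096 => fft.size;
Windowing.hann( fft.size() ) => fft.window;

while( true )
{
    // use ChucK time to control the analysis rate: here, one hop = one FFT frame
    fft.size()::samp => now;
    // trigger one frame of analysis; this cascades up the =^ chain
    centroid.upchuck();
    // read the result of the most recent analysis
    <<< "spectral centroid:", centroid.fval(0) >>>;
}
</pre>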
  
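Similarly, here is a minimal sketch of the <code>FeatureCollector</code> aggregation pattern behind <code>feature-extract.ck</code> (a simplified illustration, not the assignment code; the particular features and hop size are assumptions):

<pre>
// minimal sketch: aggregate several features into one vector per frame
// microphone => FFT; several unit analyzers upchuck into one FeatureCollector
adc => FFT fft =^ Centroid centroid =^ FeatureCollector fc => blackhole;
// additional features hang off the same FFT and feed the same collector
fft =^ Flux flux =^ fc;
fft =^ RMS rms =^ fc;

// analysis parameters (assumed values)
4096 => fft.size;
Windowing.hann( fft.size() ) => fft.window;

while( true )
{
    // one hop of audio (here, half a frame), then analyze
    (fft.size()/2)::samp => now;
    // one upchuck on the collector computes all upstream features
    fc.upchuck();
    // fc now holds the concatenated feature vector for this frame
    for( 0 => int i; i < fc.fvals().size(); i++ )
        <<< "feature", i, ":", fc.fval(i) >>>;
}
</pre>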
=== Phase Two: Designing an Audio Mosaic Tool ===
* you can find phase 2 [https://ccrma.stanford.edu/courses/356-winter-2023/code/featured-artist/phase-2-mosaic/ '''sample code here''']
* using what you've learned, build a database mapping sound frames (100::ms to 1::second) <=> feature vectors
** curate your own set of audio files; it can be a mixture of:
*** songs or song snippets; we will perform feature extraction on audio windows from beginning to end (in essence, each audio window is a short sound fragment with its own feature vector)
*** (optional) short sound effects (~1 second); you may wish to extract a single vector per sound effect
** use/modify/adapt the <code>feature-extract.ck</code> code from Phase One to build your database of sound frames to feature vectors:
*** instead of generating one feature vector for the entire file, output a trajectory of audio windows and their associated feature vectors
*** instead of outputting labels (e.g., "blues", "disco", etc.), output information to identify each audio window (e.g., filename and windowStartTime)
*** see the reference implementation <code>mosaic-extract.ck</code> (a rough sketch of this windowed extraction appears after this list)
** note that this does not require any labels; as with word2vec, we want to situate each sound window in an N-dimensional feature space
* play with <code>mosaic-similar.ck</code>: a feature-based sound explorer to query your database and perform similarity retrieval (using KNN2)
* using your database, your retrieval tool, [https://en.wikipedia.org/wiki/Concatenative_synthesis concatenative synthesis], and the <code>mosaic-synth-mic.ck</code> and <code>mosaic-synth-doh.ck</code> examples, design an interactive audio mosaic generator (a minimal window-playback sketch also appears after this list)
** feature-based
** real-time
** takes any audio input (mic or any unit generator)
** can be used for expressive audio mosaic creation
* there are many functionalities you can choose to incorporate into your mosaic synthesizer
** using keyboard or mouse control to affect mosaic parameters: synthesis window length, pitch shift (through <code>SndBuf.rate</code>), selecting subsets of sounds to use, etc.
** a key to making this expressive is to try different sound sources; play with them A LOT, gain understanding of the code, and experiment!
* (optional) do this in the audiovisual domain
** (idea) build an audiovisual mosaic instrument or music creation tool/toy
** (idea) build a GUI for exploring sounds by similarity; you will need to reduce dimensions (using PCA or another technique) to 3 or 2 in order to visualize
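
To give a feel for the windowed extraction described above, here is a rough sketch (not <code>mosaic-extract.ck</code> itself; the file names, single input file, window size, and plain-text output format are all assumptions -- follow the reference implementation for the actual format):

<pre>
// rough sketch: step through one audio file window by window, writing
// "filename windowStartSample feature0 feature1 ..." per line
"mySound.wav" => string filename;    // hypothetical input file
"mosaic-db.txt" => string outfile;   // hypothetical output database

SndBuf buf => FFT fft =^ Centroid centroid =^ FeatureCollector fc => blackhole;
fft =^ Flux flux =^ fc;
fft =^ RMS rms =^ fc;

filename => buf.read;
4096 => fft.size;
Windowing.hann( fft.size() ) => fft.window;

FileIO fout;
fout.open( outfile, FileIO.WRITE );

0 => int frame;
// one window = one FFT frame here (roughly 93 ms at 44.1 kHz)
while( (frame + 1) * fft.size() <= buf.samples() )
{
    // let one window of audio flow through, then analyze it
    fft.size()::samp => now;
    fc.upchuck();
    // identify this window by file and start position (in samples)
    fout <= filename <= " " <= frame * fft.size();
    for( 0 => int i; i < fc.fvals().size(); i++ )
        fout <= " " <= fc.fval(i);
    fout <= IO.newline();
    frame++;
}
fout.close();
</pre>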
  
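On the synthesis side, once similarity retrieval hands back a window (a filename plus a start position), concatenative playback of that window can be as simple as this sketch (the file name, start position, window length, and rate are placeholder values; see <code>mosaic-synth-mic.ck</code> for the full approach):

<pre>
// sketch: play back one retrieved audio window through a SndBuf
SndBuf grain => dac;

// hypothetical retrieval result: which file, and where the window starts
"mySound.wav" => string filename;
22050 => int windowStart;          // start position, in samples
4096::samp => dur windowLength;    // synthesis window length

filename => grain.read;
windowStart => grain.pos;
// pitch shift by changing playback rate (1.0 = original pitch)
1.5 => grain.rate;

// let the window sound, then the next retrieved window would follow
windowLength => now;
</pre>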
=== Phase Three: Make a Musical Mosaic! ===
* use your prototype from Phase Two to create a feature-based musical mosaic, in the form of a musical statement or performance
* (optional) do this in the audiovisual domain
  
 
=== Reflections ===
* write ~300 words of reflection on your project. It can be about your process or the product. What were the limitations, and how did you try to get around them?

=== Milestone Deliverables ===
submit a webpage for the project so far, containing:
* a brief report of what you did / tried / observed in Phase One, and a brief description of your experiments in Phase Two so far
* a demo video (it doesn't have to be polished) briefly documenting your experiments/adventures in Phase Two, and a very preliminary sketch of Phase Three (a creative statement or performance using your system)
* the code, features, and usage instructions needed to run your system
* a list acknowledging the source material (audio and any video) and the people who have helped you along the way; source audio/video do not need to be posted (you can submit these privately in Canvas)
* in class, we will view your webpage/demo video and give one another feedback for this milestone
  
=== Final Deliverables ===
* create a CCRMA webpage for this project
** the URL should live at https://ccrma.stanford.edu/~YOURUSERID/356/hw2 or https://ccrma.stanford.edu/~YOURUSERID/470/hw2
** alternately, you may use Medium or another publishing platform (but please still link to that page from your CCRMA webpage)
* your webpage is to include
** a title and description of your project (feel free to link to this wiki page)
** all relevant chuck code from all three phases
*** phase 1: all code used (extraction, classification, validation)
*** phase 2: your mosaic generator
*** phase 3: code used for your musical statement
** a video recording of your musical statement (please start early!)
** your 300-word reflection
** any acknowledgements (people, code, or other things that helped you through this)
* submit to Canvas '''only your webpage URL'''
