356-winter-2023/hw2
From CCRMA Wiki
Revision as of 16:22, 5 February 2023 by Ge
Programming Project #2: "Featured Artist"
Music and AI (Music356/CS470) | Winter 2023 | by Ge Wang
In this programming project, we will learn to work with audio features for both supervised and unsupervised tasks. These include a real-time genre-classifier and a feature-based audio mosaic tool. Using the latter, create a feature-based musical statement or performance!
Due Dates
- Coding tutorial: Thursday evening
- Milestone (Phase One complete + Phase Two prototype): webpage due Monday (2/6, 11:59pm) | in-class critique Tuesday (2/7)
- Final Deliverable: webpage due Monday (2/13, 11:59pm)
- In-class Presentation: Tuesday (2/14)
Discord Is Our Friend
- direct any questions, ruminations, and outputs/interesting mistakes to our class Discord
Things to Think With
- read/skim the classic article "Musical Genre Classification of Audio Signals" (Tzanetakis and Cook, 2002)
- don't worry about the details yet; first get a general sense of what audio features are and how they can be used
Tools to Play With
- get the latest bleeding-edge secret chuck build (2023.01.23 or later!)
- macOS: this will install both the command-line chuck and the graphical IDE miniAudicle, and replace any previous ChucK installation
- Windows: you will need to download and use the bleeding-edge command-line chuck (for now, there is no bleeding-edge miniAudicle for Windows); you can either use the default cmd command prompt, or consider downloading a terminal emulator
- Linux: you will need to build from source, provided in the linux directory
- all platforms: for this project, you will be using the command-line version of chuck
- NOTE: to return your chuck back to a pre-bleeding-edge state, you can always install the latest official ChucK release
- sample code for all phases (including optional video starter code)
GTZAN Dataset
- next, you'll need to download the GTZAN dataset
- 1000 30-second music clips, labeled by humans into ten genre categories
Phase One: Extract, Classify, Validate
- understanding audio, audio features, FFT, feature extraction
- extract different sets of audio features from GTZAN dataset
- run real-time classifier using different feature sets
- run cross-validation to evaluate the quality of the classifier based on different features
- you can find relevant code here
- start playing with these, and reading through these to get a sense of what the code is doing
- example-centroid.ck -- a basic example of using ChucK's unit analyzer framework (things connected using the upchuck operator =^) to extract an audio feature:
- generate an input (a 440 Hz sine wave) -- this can be any audio source, e.g., adc for the microphone
- take a Fast Fourier Transform (FFT) on a frame of audio (size is determined by the FFT size)
- use the output of the FFT's analysis to compute the Spectral Centroid for that frame of audio
- note how ChucK timing is used to precisely control how often to do a frame of analysis
- .upchuck() is used to trigger an analysis, automatically cascading up the =^
- example-mfcc.ck -- this is like the previous example, but now we compute a multi-dimensional feature, Mel Frequency Cepstral Coefficients (MFCCs)
- feature-extract.ck -- in a "real-world" scenario, we would extract multiple features; a FeatureCollector is used to aggregate multiple features into a single vector (see comments in the file for more details)
- genre-classify.ck -- using the output of feature-extract.ck, do real-time classification by performing the same feature extraction and using k-NN to predict the likelihood of each genre category (see comments in the file for more details)
- x-validate.ck -- using the output of feature-extract.ck, do cross-validation to get a sense of the classifier quality
- experiment by choosing different features and different numbers of features: extract them on GTZAN, try the real-time classifier, and perform cross-validation
- available features: Centroid, Flux, RMS, RollOff, ZeroX, MFCCs, Chroma, Kurtosis
- try at least five different feature configurations and evaluate the resulting classifier using cross-validation
- keep in mind that the baseline score is .1 (a random classifier for 10 genres), and 1 is the max
- how do different--and different numbers of--features affect the classification results?
- in your experiment, what configuration yielded the highest score in cross-validation?
- briefly report on your experiments
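To make the classify-then-validate loop concrete, here is a minimal sketch of k-NN prediction and k-fold cross-validation. It is written in Python purely for illustration and is not the course's ChucK code: the function names, the toy two-genre data, and the fold scheme are all assumptions, and genre-classify.ck and x-validate.ck remain the reference for how the project actually does this.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(data, k=3, folds=5, seed=0):
    """Return the mean accuracy over `folds` train/test splits of data."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    scores = []
    for i in range(folds):
        # Hold out every folds-th item (offset i); train on the rest.
        test = shuffled[i::folds]
        train = [item for j, item in enumerate(shuffled) if j % folds != i]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


# Hypothetical toy data: two well-separated "genres" in a 2-D feature space.
rng = random.Random(1)
toy = [([rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)], "blues") for _ in range(20)]
toy += [([rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)], "disco") for _ in range(20)]

score = cross_validate(toy, k=3, folds=5)
print(f"cross-validation accuracy: {score:.2f}")
```

The toy clusters are deliberately easy to separate; real GTZAN feature vectors overlap far more, which is why the score is worth comparing against the .1 baseline for each feature configuration you try.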
Phase Two: Designing an Audio Mosaic Tool
- you can find phase 2 sample code here
- using what you've learned, build a database mapping sound frames (100::ms to 1::second) <=> feature vectors
- curate your own set of audio files; this can be a mixture of
- songs or song snippets (we will perform feature extraction on audio windows from beginning to end; in essence, each audio window is a short sound fragment with its own feature vector)
- (optional) short sound effects (~1 second); you may wish to extract a single vector per sound effect
- modify the feature-extract.ck code from Phase One to build your database of sound frames to feature vectors:
- instead of generating one feature vector for the entire file, output a trajectory of audio windows and associated feature vectors
- instead of outputting labels (e.g., "blues", "disco", etc.), output information to identify each audio window (e.g., filename and windowStartTime)
- see reference implementation mosaic-extract.ck
- note this does not require any labels, and like word2vec, we want to situate each sound window in an N-dimensional feature space
- play with mosaic-similar.ck: a feature-based sound explorer to query your database and perform similarity retrieval (using KNN2)
- using your database and retrieval tool, concatenative synthesis, and mosaic-synth-mic.ck and mosaic-synth-doh.ck, design an interactive audio mosaic generator:
- feature-based
- real-time
- takes any audio input (mic or any unit generator)
- can be used for expressive audio mosaic creation
- there are many functionalities you can choose to incorporate into your mosaic synthesizer
- using a keyboard or mouse control to affect mosaic parameters: synthesis window length, pitch shift (through SndBuf.rate), selecting subsets of sounds to use, etc.
- a key to making this expressive is to try different sound sources; play with them A LOT, gain understanding of the code and experiment!
- (optional) do this in the audiovisual domain
- (idea) build an audiovisual mosaic instrument or music creation tool / toy
- (idea) build a GUI for exploring sounds by similarity; will need to reduce dimensions (using PCA or another technique) to 3 or 2 in order to visualize
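The database-and-retrieval idea in this phase can be sketched as follows. This is a Python illustration under stated assumptions, not the ChucK reference code: the two features (RMS level and zero-crossing fraction) are simplified stand-ins for the features listed in Phase One, the synthesized tones and filenames are hypothetical, and every function name is made up for this example. See mosaic-extract.ck and mosaic-similar.ck for the actual implementation.

```python
import math

SR = 8000  # assumed sample rate for the synthetic test tones


def window_features(samples):
    """Two simple features for one window: RMS level and zero-crossing fraction."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return [rms, crossings / len(samples)]


def build_database(files, sample_rate, window_sec=0.1):
    """Return one (filename, window_start_sec, feature_vector) entry per window."""
    hop = round(window_sec * sample_rate)
    database = []
    for name, samples in files.items():
        # Walk the file from beginning to end in non-overlapping windows.
        for start in range(0, len(samples) - hop + 1, hop):
            feats = window_features(samples[start:start + hop])
            database.append((name, start / sample_rate, feats))
    return database


def most_similar(database, query_features, k=1):
    """Return the k database entries nearest to query_features."""
    return sorted(database, key=lambda e: math.dist(e[2], query_features))[:k]


def tone(freq, amp, seconds):
    """Synthesize a sine tone as a list of samples (stand-in for a sound file)."""
    n = round(SR * seconds)
    return [amp * math.sin(2 * math.pi * freq * i / SR) for i in range(n)]


# Hypothetical "files": a quiet low tone and a loud high tone.
files = {"low.wav": tone(110, 0.2, 0.5), "high.wav": tone(1760, 0.8, 0.5)}
db = build_database(files, SR, window_sec=0.1)

# Query with a window that is loud and fairly high in pitch.
query = window_features(tone(1500, 0.7, 0.1))
best = most_similar(db, query, k=1)[0]
print("closest window:", best[0], "starting at", best[1], "seconds")
```

Note that no labels appear anywhere: each entry only records where its window came from, so a mosaic synthesizer can look up the matching filename and start time and play that fragment back.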
Phase Three: Make a Musical Mosaic!
- use your prototype from Phase Two to create a feature-based musical mosaic in the form of a musical statement or performance
- (optional) do this in the audiovisual domain
Reflections
- write ~300 words of reflection on your project. It can be about your process, or the product. What were the limitations, and how did you try to get around them?
Deliverables
- create a CCRMA webpage for this etude
- the URL should live at https://ccrma.stanford.edu/~YOURUSERID/356/hw2 or https://ccrma.stanford.edu/~YOURUSERID/470/hw2
- alternately, you may use Medium or another publishing platform (but please still link to that page from your CCRMA webpage)
- your webpage is to include
- a title and description of your project (feel free to link to this wiki page)
- all relevant chuck code from all three phases
- phase 1: all code used (extraction, classification, validation)
- phase 2: your mosaic generator, and database query/retrieval tool
- phase 3: code used for your musical statement
- video recording of your musical statement (please start early!)
- your 300-word reflection
- any acknowledgements (people, code, or other things that helped you through this)
- submit to Canvas only your webpage URL