356-winter-2023/hw2

From CCRMA Wiki

Revision as of 01:19, 24 January 2023

Programming Project #2: "Featured Artist"

Music and AI (Music356/CS470) | Winter 2023 | by Ge Wang

Mosaiconastick.jpg

In this programming project, we will learn to work with audio features for both supervised and unsupervised tasks. These include a real-time genre classifier and a feature-based audio mosaic tool using similarity retrieval. Create a feature-driven musical statement or performance!

Due Dates

  • Milestone: webpage due Wednesday (2/1, 11:59pm) | in-class critique Thursday (2/2)
  • Final Deliverable: webpage due Wednesday (2/8, 11:59pm)
  • In-class Presentation: Thursday (2/9)

Discord Is Our Friend

  • direct any questions, ruminations, outputs, and interesting mistakes to our class Discord

Things to Think With

Tools to Play With

  • get the latest bleeding-edge secret chuck build (2023.01.23 or later!)
    • macOS: this will install both command-line chuck and the graphical IDE miniAudicle, and replace any previous ChucK installation.
    • Windows: you will need to download and use the bleeding-edge command-line chuck (for now, there is no bleeding-edge miniAudicle for Windows); you can either use the default cmd command prompt or consider downloading a terminal emulator.
    • Linux: you will need to build from source, provided in the linux directory.
    • all platforms: for this project, you will be using the command-line version of chuck.
  • NOTE: to return chuck to a pre-bleeding-edge state, you can always install the latest official ChucK release

GTZAN Dataset

  • next, you'll need to download the GTZAN dataset
    • 1000 30-second music clips, labeled by humans into ten genre categories

Phase One: Feature Extract, Classify, Validate

  • understanding audio, FFT, feature extraction
  • extract different sets of audio features from GTZAN dataset
  • run real-time classifier using different feature sets
  • run cross-validation to evaluate the quality of the classifier based on different features
  • you can find relevant code here
  • start playing with these and reading through them to get a sense of what the code is doing
    • example-centroid.ck -- a basic example of using ChucK's unit analyzer framework (things connected using the upchuck operator =^) to extract an audio feature:
      • generate an input (a 440 Hz sine wave) -- this can be any audio source, e.g., adc for the microphone
      • take a Fast Fourier Transform (FFT) of a frame of audio (the frame size is determined by the FFT size)
      • use the output of the FFT analysis to compute the Spectral Centroid for that frame of audio
      • note how ChucK's timing is used to precisely control how often a frame of analysis is performed
      • calling .upchuck() triggers an analysis, which automatically cascades up the =^ chain
    • example-mfcc.ck -- this is like the previous example, but now we compute a multi-dimensional feature, Mel Frequency Cepstral Coefficients (MFCCs)
    • feature-extract.ck -- in a "real-world" scenario, we would extract multiple features; a FeatureCollector is used to aggregate multiple features into a single vector (see comments in the file for more details)
    • genre-classify.ck -- using the output of feature-extract.ck, perform real-time classification by running the same feature extraction and using k-NN to predict the likelihood of each genre category (see comments in the file for more details)
    • x-validate.ck -- using the output of feature-extract.ck, perform cross-validation to get a sense of the classifier's quality
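To see what example-centroid.ck is computing, here is a minimal offline sketch in Python (not ChucK) of the same idea: take one frame of a 440 Hz sine, compute its magnitude spectrum, and take the magnitude-weighted mean frequency. Assumptions: a direct DFT stands in for the FFT (same bins, just slower); the 8000 Hz sample rate and 400-sample frame are chosen so that 440 Hz falls exactly on a bin; and no window is applied (real analysis usually applies a window such as Hann first). The result is reported in Hz; ChucK's Centroid UAna may report it on a different (e.g., normalized) scale, so check its documentation.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude spectrum |X[k]| for bins 0..N/2 via a direct DFT (O(N^2)).
    An FFT computes the same values, only faster."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (in Hz) of one frame of audio."""
    mags = dft_magnitudes(frame)
    bin_hz = sample_rate / len(frame)  # frequency spacing between bins
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, report 0
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total

SAMPLE_RATE = 8000  # assumption: chosen so 440 Hz lands exactly on bin 22
FRAME_SIZE = 400    # 8000 / 400 = 20 Hz per bin

# One frame of a 440 Hz sine, like the sine-wave input in example-centroid.ck
frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(FRAME_SIZE)]
centroid = spectral_centroid(frame, SAMPLE_RATE)
print(round(centroid, 3))
```

Mixing in energy above the fundamental (for example, a second, higher sine) pulls the centroid upward, which is why the spectral centroid is often read as a rough measure of "brightness".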

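The classify and validate steps of Phase One can be sketched the same way. This hypothetical Python version (function names are illustrative, not taken from genre-classify.ck or x-validate.ck) classifies a feature vector with k-NN, reports each genre's likelihood as the fraction of the k nearest neighbors carrying that label (one simple choice; the ChucK code may compute it differently), and estimates quality with leave-one-out cross-validation (x-validate.ck may use a different fold scheme). The two-dimensional vectors and the "blues"/"metal" labels are made-up toy data.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_likelihoods(train, query, k):
    """Return {label: share of the k nearest training vectors with that label}.
    train is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}

def knn_predict(train, query, k):
    """Most likely label; ties go to the alphabetically first label."""
    likelihoods = knn_likelihoods(train, query, k)
    return max(sorted(likelihoods), key=lambda label: likelihoods[label])

def leave_one_out_accuracy(data, k):
    """Hold out each example in turn, classify it using all the others,
    and return the fraction classified correctly."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)

# Toy, made-up 2-D feature vectors (think: two normalized features per clip)
data = [
    ([0.10, 0.20], "blues"),
    ([0.15, 0.25], "blues"),
    ([0.12, 0.18], "blues"),
    ([0.80, 0.90], "metal"),
    ([0.85, 0.95], "metal"),
    ([0.78, 0.88], "metal"),
]

print(knn_likelihoods(data, [0.5, 0.5], k=3))
print(leave_one_out_accuracy(data, k=3))
```

Euclidean distance compares feature dimensions directly, so features on very different scales (say, a centroid in Hz next to MFCCs) usually need normalizing first; otherwise the largest-valued feature dominates the neighbor search.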
Phase Two: Curate Feature Database, Design Audio Mosaic Tool

  • build a database mapping sound frames (100::ms to 1::second) <=> feature vectors
    • curate your own set of audio files; these can be a mixture of
      • short sound effects (~1 second)
      • music (we will perform feature extraction on each short-time window)
  • prototype a feature-based sound explorer to query your database and perform similarity retrieval
  • using your database and retrieval tool, design an interactive audio mosaic generator
    • feature-based
    • real-time
    • takes any audio input (mic or any unit generator)
    • can be used for performance
  • (optional) do this in the audiovisual domain
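The Phase Two database and retrieval loop can also be prototyped offline before building the real-time ChucK tool. This Python sketch rests on loud assumptions: RMS level and zero-crossing rate are stand-in features (picked so the code needs only the standard library; your tool would more likely reuse the Phase One feature set), the two "sources" are synthetic sines rather than curated audio files, retrieval is a brute-force nearest-neighbor search, and mosaic() simply concatenates retrieved windows with no crossfading. All function names are hypothetical.

```python
import math

SR = 8000    # sample rate (assumption for the toy data)
SIZE = 800   # 100 ms windows at 8 kHz, the short end of the 100::ms to 1::second range
HOP = 400    # database windows overlap by 50%

def sine(freq, amp, n):
    """n samples of a sine wave: a stand-in for a curated audio file."""
    return [amp * math.sin(2 * math.pi * freq * i / SR) for i in range(n)]

def frames(signal, size, hop):
    """Yield (start_index, window) for each full window of `size` samples, every `hop` samples."""
    for start in range(0, len(signal) - size + 1, hop):
        yield start, signal[start:start + size]

def features(frame):
    """Toy feature vector: (RMS level, zero-crossing rate per sample)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(frame) - 1))

def build_database(sources, size, hop):
    """Map every window of every source to its feature vector.
    Returns a list of (feature_vector, source_name, start_index)."""
    db = []
    for name, signal in sources.items():
        for start, frame in frames(signal, size, hop):
            db.append((features(frame), name, start))
    return db

def query(db, frame):
    """Similarity retrieval: (source_name, start_index) of the closest window."""
    target = features(frame)
    best = min(db, key=lambda entry: math.dist(entry[0], target))
    return best[1], best[2]

def mosaic(db, sources, signal, size):
    """Replace each consecutive input window with its nearest database window
    (plain concatenation, no crossfading)."""
    out = []
    for _, frame in frames(signal, size, size):
        name, start = query(db, frame)
        out.extend(sources[name][start:start + size])
    return out

sources = {"low_hum": sine(100, 0.5, SR), "bright_tone": sine(1000, 0.2, SR)}
db = build_database(sources, SIZE, HOP)
print(len(db), query(db, sine(110, 0.45, SIZE)))
output = mosaic(db, sources, sine(110, 0.45, SR), SIZE)
```

Brute-force search is fine for a small database; with many thousands of windows, a spatial index such as a k-d tree is a common way to speed up the lookup, and overlap-add with a window function smooths the seams between retrieved windows.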

Phase Three

  • use your prototype from Phase Two to create a musical statement
  • (optional) do this in the audiovisual domain

Reflections

  • write ~300 words of reflection on your project. It can be about your process or the product. What were the limitations, and how did you try to get around them?

Deliverables

  • create a CCRMA webpage for this etude
  • your webpage is to include
    • a title and description of your project (feel free to link to this wiki page)
    • all relevant chuck code from all three phases
      • phase 1: all code used (extraction, classification, validation)
      • phase 2: your mosaic generator, and database query/retrieval tool
      • phase 3: code used for your musical statement
    • video recording of your musical statement (please start early!)
    • your 300-word reflection
    • any acknowledgements (people, code, or other things that helped you through this)
  • submit to Canvas only your webpage URL