Augmenting Cultural Recognition with Music

Sometimes people see an object, a place, or a reference and think of a tune that goes with it. That sound or song becomes an easy way to remember a place’s culture, and we hope to amplify that memory by building a language model that generates sounds.

Tools and References:

  • Linearly Mapping From Image to Text Space (figure: the paper's steps diagram)
  • CLIP - OpenAI (figure: CLIP from OpenAI)
  • AudioCraft - Meta AI (figure: AudioCraft architecture)
  • PyTorch (figure: PyTorch architecture)