Sometimes, when people see an object, place, or reference, they think of a tune that goes with it. That sound or song becomes an easy way to remember a place's culture, and we hope to amplify that memory by building a language model that generates sounds.
Tools and References:
- Linearly Mapping From Image to Text Space
- CLIP - OpenAI
- AudioCraft - Meta AI
- PyTorch