💭 Simon Willison on X: "Anyone got a lead on a good embedding mo...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
https://twitter.com/simonw/status/1700528222382027039
Date: September 10, 2023

Simon Willison (@simonw) on X: Anyone got a lead on a good embedding model that can embed both images and text into the same space, so you can search for "dog" and get back images most likely to contain a dog?

It looks like VisualBERT is one, what are others?

Kinda mindblown that this is even possible. This is so far outside of my current thinking that I hadn't even considered an elegant way to implement semantic search across images and text at the same time. I know it happens at Google, but I envision that as still being text search across tags and metadata about the image. Based on the number of responses, CLIP is the thing that does this.

NOTE │ This post is a thought </thoughts/>. It's a short note that I make about someone else's content online #thoughts
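
A minimal sketch of why a shared embedding space makes this kind of search simple, assuming a model such as CLIP has already turned both the text query and each image into vectors of the same length. The vectors, file names, and the `search` helper below are hypothetical and hand-written for illustration; they are not output from CLIP or any real model. The only point is that once text and images live in the same space, "search for dog" reduces to ranking images by cosine similarity to the query vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, image_vectors, top_k=2):
    """Return the names of the top_k images closest to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in image_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical image embeddings (made up, NOT real model output).
# A real model would produce hundreds of dimensions per image.
IMAGE_VECTORS = {
    "dog_in_park.jpg": [0.9, 0.1, 0.0],
    "cat_on_sofa.jpg": [0.1, 0.9, 0.0],
    "puppy_and_kitten.jpg": [0.6, 0.6, 0.1],
    "city_skyline.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the text query "dog" in the same space.
DOG_QUERY = [1.0, 0.0, 0.0]

results = search(DOG_QUERY, IMAGE_VECTORS)
print(results)
```

In a real setup the hand-written vectors would be replaced by the model's text and image encoders, and the linear scan by a vector index once the image collection grows large.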