💭 Simon Willison on X: "Anyone got a lead on a good embedding mo...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

!https://twitter.com/simonw/status/1700528222382027039

Date: September 10, 2023

Simon Willison (@simonw) on X: Anyone got a lead on a good embedding model that can embed both images and text into the same space, so you can search for "dog" and get back images most likely to contain a dog? It looks like VisualBERT is one, what are others?

Kinda mindblown that this is even possible. This is so far outside of my current thinking that I hadn’t even considered an elegant way to implement semantic search across images and text at the same time. I know it happens at Google, but I envision that as still being text search across tags and metadata about the image.

Based on the number of responses, CLIP is the thing that does this.
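
To make the "same space" idea concrete for myself, here is a minimal sketch of the search step only. It assumes some model (CLIP, going by the replies) has already turned each image and the query text into vectors of the same length. The 3-number vectors, the file names, and the `search` helper below are all made up for illustration; real embeddings have hundreds of dimensions and come from the model, not from me typing them in.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, image_vectors, top_k=2):
    """Return the top_k image names most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in image_vectors.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical image embeddings (stand-ins for what a model would output).
IMAGE_VECTORS = {
    "dog_in_park.jpg": [0.9, 0.1, 0.0],
    "cat_on_sofa.jpg": [0.1, 0.9, 0.0],
    "puppy_and_kitten.jpg": [0.6, 0.6, 0.1],
    "city_skyline.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical text embedding for the query "dog", in the same space.
DOG_QUERY = [1.0, 0.0, 0.0]

results = search(DOG_QUERY, IMAGE_VECTORS)
print(results)
```

The point is that once text and images live in one vector space, "search for dog" is just a nearest-neighbour lookup, with no tags or metadata involved.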

NOTE
│ This post is a thought </thoughts/>. It’s a short note that I make about someone else’s content online #thoughts