Simon Willison on X: "Anyone got a lead on a good embedding mo...
───────────────────────────────────────────────────────────────────
!https://twitter.com/simonw/status/1700528222382027039
Date: September 10, 2023
Simon Willison (@simonw) on X:
Anyone got a lead on a good embedding model that can embed both images and text into the same space, so you can search for "dog" and get back images most likely to contain a dog? It looks like VisualBERT is one, what are others?
Kinda mind-blown that this is even possible. This is so far outside my current thinking that I hadn't even considered an elegant way to implement semantic search across images and text at the same time. I know it happens at Google, but I picture that as still being text search across tags and metadata about the image.
Judging by the number of responses that mention it, CLIP is the model that does this.
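To make the idea concrete for myself, here is a minimal sketch of what searching a shared embedding space might look like. It assumes a model such as CLIP has already turned the text query and each image into vectors of the same length. The tiny 3-dimensional vectors, the file names, and the `search_images` helper are all made up for illustration; they are not output from any real model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical image embeddings, hand-written stand-ins for what an
# image encoder would produce. Real vectors would be much longer.
IMAGE_EMBEDDINGS = {
    "dog_in_park.jpg": [0.9, 0.1, 0.2],
    "cat_on_sofa.jpg": [0.2, 0.9, 0.1],
    "city_skyline.jpg": [0.1, 0.2, 0.9],
}

# Hypothetical text embedding for the query "dog", assumed to live in
# the same vector space as the image embeddings above.
QUERY_EMBEDDINGS = {
    "dog": [0.8, 0.2, 0.1],
}


def search_images(query_vector, image_embeddings, top_k=2):
    """Rank images by cosine similarity to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in image_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


results = search_images(QUERY_EMBEDDINGS["dog"], IMAGE_EMBEDDINGS)
print(results)
```

The point is that no tags or metadata are involved: the text query and the images are compared directly as vectors, and the nearest images win.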
NOTE
This post is a thought </thoughts/>. It's a short note that I make about someone else's content online. #thoughts