- Details
- The CLIP model can represent images, text in the same vector space
- Enhance CLIP's image understanding by augmenting the plain image
with it's text based metadata.
Specifically with any subject, description XMP tags on the image
- Improve results by combining plain image similarity score with
metadata similarity scores for the highest ranked images
- Minor Fixes
- Convert verbose to integer from bool in image_search.
It's already passed as integer from the main program entrypoint
- Process images with ".jpeg" extensions too
- Previously:
The text the model was trained on was being used to
re-create a semblance of the original org-mode entry.
- Now:
- Store raw entry as another key:value in each entry json too
Only return actual raw org entries in results
But create embeddings like before
- Also add link to entry in file:<filename>::<line_number> form
in property drawer of returned results
This can be used to jump to actual entry in it's original file