Add Vision Support (#889)

# Summary of Changes * New UI to show preview of image uploads * ChatML message changes to support gpt-4o vision based responses on images * AWS S3 image uploads for persistent image context in conversations * Database changes to have `vision_enabled` option in server admin panel while configuring models * Render previously uploaded images in the chat history, show uploaded images for pending msgs * Pass the uploaded_image_url through to subqueries * Allow image to render upon first message from the homepage * Add rendering support for images to shared chat as well * Fix some UI/functionality bugs in the share page * Convert user attached images for chat to webp format before upload * Use placeholder to attached image for data source, response mode actors * Update all clients to call /api/chat as a POST instead of GET request * Fix copying chat messages with images to clipboard TLDR; Add vision support for openai models on Khoj via the web UI! --------- Co-authored-by: sabaimran <narmiabas@gmail.com> Co-authored-by: Debanjum Singh Solanky <debanjum@gmail.com>
2026-03-06 21:29:12 +00:00 · 2024-09-09 17:22:18 -05:00
parent b553bba1d8
commit 549686a7a4
33 changed files with 740 additions and 417 deletions
--- a/src/khoj/processor/conversation/openai/gpt.py
+++ b/src/khoj/processor/conversation/openai/gpt.py
@@ -123,6 +123,8 @@ def converse(
    location_data: LocationData = None,
    user_name: str = None,
    agent: Agent = None,
+    image_url: Optional[str] = None,
+    vision_available: bool = False,
 ):
    """
    Converse with user using OpenAI's ChatGPT
@@ -178,6 +180,8 @@ def converse(
        model_name=model,
        max_prompt_size=max_prompt_size,
        tokenizer_name=tokenizer_name,
+        uploaded_image_url=image_url,
+        vision_enabled=vision_available,
    )
    truncated_messages = "\n".join({f"{message.content[:70]}..." for message in messages})
    logger.debug(f"Conversation Context for GPT: {truncated_messages}")