Time to first token Log lines were shown multiple times if new chunk
bein streamed was empty for some reason.
This change makes the logic robust to empty chunks being recieved.
Previously research iterations and conversation logs were added to a
single user message. This prevented truncating each past iteration
separately on hitting context limits. So the whole past research
context had to be dropped on hitting context limits.
This change splits each research iteration into a separate item in a
message content list.
It uses the ability for message content to be a list, that is
supported by all major ai model apis like openai, anthropic and gemini.
The change in message format seen by pick next tool chat actor:
- New Format
- System: System Message
- User/Assistant: Chat History
- User: Raw Query
- Assistant: Iteration History
- Iteration 1
- Iteration 2
- User: Query with Pick Next Tool Nudge
- Old Format
- User: System + Chat History + Previous Iterations Message
- User: Query
- Collateral Changes
The construct_structured_message function has been updated to always
return a list[dict[str, Any]].
Previously it'd only use list if attached_file_context or vision model
with images for wider compatibility with other openai compatible api
Previously the research agent would have a hard time getting
quantitative data extracted by the web page reader tool AI.
This change aims to encourage the web page reader tool to extract
relevant data in verbatim form for higher granularity research and
responses.
Code tool should see code context and webpage tool should see online
context during research runs
Fix to include code context from past conversations to answer queries.
Add all queries to tool chat history when no specific tool to limit
extracting inferred queries for provided.
- Use much larger read, connect timeout if llm served over local url
- Use larger timeout duration than default (5s) for online llms too
This matches timeout duration increase calls to gemini api
Fallback to assume not a subscribed user if user not passed.
This allows user arg to be actually optional in the async
send_message_to_model_wrapper function
Previously Deepseek reasoner couldn't be used via API for completion
because of the additional formatting constrains it required was being
applied in this function.
The formatting fix was being applied in the chat completion endpoint.
DeepSeek reasoners returns reasoning in reasoning_content field.
Create an async stream processor to parse the reasoning out when using
the deepseek reasoner model.
The Qwen3 reasoning models return thoughts within <think></think> tags
before response.
This change parses the thoughts out from final response from the
response stream and returns as structured response with thoughts.
These thoughts aren't passed to client yet
OpenAI API doesn't support thoughts via chat completion by default.
But there are thinking models served via OpenAI compatible APIs like
deepseek and qwen3.
Add stream handlers and modified response types that can contain
thoughts as well apart from content returned by a model.
This can be used to instantiate stream handlers for different model
types like deepseek, qwen3 etc served over an OpenAI compatible API.
Recent changes enabled free tier users to switch free tier chat models
per conversation or the default.
This change enables free tier users to generate responses with their
conversation specific chat model.
Related: #725, #1151
# PR Summary
This small PR resolves the deprecation warnings on `datetime` in
Python3.12+. You can find them in the [CI
logs](https://github.com/khoj-ai/khoj/actions/runs/14538833837/job/40792624987#step:9:134):
```python
/__w/khoj/khoj/src/khoj/processor/content/images/image_to_entries.py:61: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
timestamp_now = datetime.utcnow().timestamp()
```
- Update API to allow free tier users to switch between free models
- Update web app to allow model switching on agent creation, settings
chat page (via right side pane), even for free tier users.
Previously the model switching APIs and UX fields on web app were
completely disabled for free tier users
Rely on deepthought flag to control reasoning effort of low/high for
the grok model
This is different from the openai reasoning models which support
low/medium/high and for which we use low/medium effort based on the
deepthought flag
Note: grok is accessible over an openai compatible API
Disregard chart types as not using rich chart rendering
and they are duplicate of chart images that are rendered
Disregard text output associated with generated image files