- Improve overall flow of the contribute section of Readme
- Fix where to look for good first issues. The contributors board is outdated. Easier to maintain and view good-first-issue with issue tags directly.
Co-authored-by: Debanjum <debanjum@gmail.com>
Fallback to assume not a subscribed user if user not passed.
This allows user arg to be actually optional in the async
send_message_to_model_wrapper function
### Major
All reasoning models return thoughts differently due to lack of
standardization.
We normalize thoughts by reasoning models and providers to ease handling
within Khoj.
The model thoughts are parsed during research mode when generating final
response.
These model thoughts are returned by the chat API and shown in train of
thought shown on web app.
Thoughts are enabled for Deepseek, Anthropic, Grok and Qwen3 reasoning
models served via API.
Gemini and Openai reasoning models do not show their thoughts via
standard APIs.
### Minor
- Fix ability to use Deepseek reasoner for intermediate stages of chat
- Enable handling Qwen3 reasoning models
Previously Deepseek reasoner couldn't be used via API for completion
because of the additional formatting constrains it required was being
applied in this function.
The formatting fix was being applied in the chat completion endpoint.
DeepSeek reasoners returns reasoning in reasoning_content field.
Create an async stream processor to parse the reasoning out when using
the deepseek reasoner model.
The Qwen3 reasoning models return thoughts within <think></think> tags
before response.
This change parses the thoughts out from final response from the
response stream and returns as structured response with thoughts.
These thoughts aren't passed to client yet
OpenAI API doesn't support thoughts via chat completion by default.
But there are thinking models served via OpenAI compatible APIs like
deepseek and qwen3.
Add stream handlers and modified response types that can contain
thoughts as well apart from content returned by a model.
This can be used to instantiate stream handlers for different model
types like deepseek, qwen3 etc served over an OpenAI compatible API.
Recent changes enabled free tier users to switch free tier chat models
per conversation or the default.
This change enables free tier users to generate responses with their
conversation specific chat model.
Related: #725, #1151
Reason
---
- Simplify code and logic to stream chat response by solely relying on
asyncio event loop.
- Reduce overhead of managing threads to increase efficiency and
throughput (where possible).
Details
---
- Use async/await with no threading when generating chat response via
OpenAI, Gemini, Anthropic AI model APIs
- Use threading for offline chat model as llama-cpp doesn't support
async streaming yet
# PR Summary
This small PR resolves the deprecation warnings on `datetime` in
Python3.12+. You can find them in the [CI
logs](https://github.com/khoj-ai/khoj/actions/runs/14538833837/job/40792624987#step:9:134):
```python
/__w/khoj/khoj/src/khoj/processor/content/images/image_to_entries.py:61: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
timestamp_now = datetime.utcnow().timestamp()
```
Overview
---
Enable free tier users to chat with any AI model made available on free tier
of production deployments like [Khoj cloud](https://app.khoj.dev).
Previously model switching was completely disabled for users on free tier.
Details
---
- Track price tier of each Chat, Speech, Image, Voice AI model in DB
- Update API to allow free tier users to switch between free models
- Update web app to allow model switching on agent creation, settings
chat page (via right side pane), even for free tier users.
- Update API to allow free tier users to switch between free models
- Update web app to allow model switching on agent creation, settings
chat page (via right side pane), even for free tier users.
Previously the model switching APIs and UX fields on web app were
completely disabled for free tier users
Rely on deepthought flag to control reasoning effort of low/high for
the grok model
This is different from the openai reasoning models which support
low/medium/high and for which we use low/medium effort based on the
deepthought flag
Note: grok is accessible over an openai compatible API
Disregard chart types as not using rich chart rendering
and they are duplicate of chart images that are rendered
Disregard text output associated with generated image files
Added a “Troubleshooting & Tips” section to the GCP Vertex documentation.
This section provides guidance for self-hosted users on common issues
they may encounter when setting up Google Vertex AI integration in Khoj.
Topics covered include permissions, region compatibility, prompt size
limits, API key testing, and secure key management with environment
variables. The goal is to improve the onboarding experience and reduce
setup errors for contributors and self-hosters using Vertex AI models
like Claude and Gemini.
Signed off by: brightally6@gmail.com