- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
tenacity package. This is officially suggested and used by other
popular GPT based libraries
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
argument to generate_chatml_messages_with_context method
- Remove the need to split by magic string in emacs and chat interfaces
- Move compiling references into string as context for GPT to GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
- Add continuation suffix, ..., when reference definition truncated
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
- Improves color coding for readability
- Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
- Generalize the render-chat-response method to handle rendering
history or chat response from chat API reponse
- Trigger rendering of khoj chat history if Khoj chat buffer not
created for this session yet
- Use org-insert-link method to improve link rendering robustness
Previous simple mechanism to crete org-links would result in links
escaping out of formating. Use a user-facing org-mode method to
remove/reduce probability of this
- Replace newlines with space to render reference notes as links
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
- [sender-name]: *[message]*
- /[receive-date]/
- Add references as org links with context visible on hover,
but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
Use `-map-indexed' method from dash
- Reasons:
- GPT can extract date aware search queries with date filters
better than ChatGPT given the same prompt.
- Need quality more than cost savings for now.
- Need to figure ways to improve prompt for ChatGPT before using it
Update Search Actor prompt with answers, more precise primer and
two more examples for context
Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
extract complete questions
- This should improve search results quality
- Example Expected Inferred Questions from User Message using History:
1. "What is the name of Arun's daughter?"
=> "What is the name of Arun's daughter"
2. "Where does she study?" =>
=> "Where does Arun's daughter study?" OR
=> "Where does Arun's daughter, Reena study?"
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
Answer time range based questions
Limit search to specified timeframe in question using date filter
E.g "What national parks did I visit last year?" adds
dt>="2022-01-01" dt<"2023-01-01" to Khoj search
Note: Temperature set to 0. Message to search queries should be deterministic
Create Rubric to Test Chat Quality and Capabilities
### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities
### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
- Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
- Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
- Measure quality at 2 separate layers
- **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
- **Chat Director**: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message. This is what the `/api/chat` API exposes.
- Mark desired but not currently available capabilities as expected to fail <br />
This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
answering
- Make GPT be more reliable in looking at past conversations for
forming response
# Improve Khoj Chat
## Main Changes
- Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost
- Improve Prompt to answer query using indexed notes
- Previously was asking GPT to summarize the notes
- Both the chat and answer API use this new prompt
- Support Multi-Turn conversations
- Pass previous messages and associated reference notes to ChatGPT for context
- Show note snippets referenced to generate response
- Allows fact-checking, getting details
- Simplify chat interface by using only single unified chat type for now
## Miscellaneous
- Replace summarize with answer API. Summarize via API not useful for now
- Only pass Khoj search results above a threshold confidence to GPT for context
- Allows Khoj to say don't know if it can't find answer to query from notes
- Allows relying on (only) conversation history to generate response in multi-turn conversation
- Move Chat API out of beta. Update Readme
GPT still mostly says I don't know when answer not in notes or chats
But with this its more inclined to answer general questions not in
chats or notes while informing user that the information is not from
existing chats or notes
- Chat uses compiled form of search results, not the raw entries to
provide context for chat. The compiled snipped search results
themselves are unique and using multiple of them for context from
the same raw note is fine if they cross the score and rank thresholds
This should improve the context provided for chat
- Also apply score_threshold, no deduplication to the answers API
- Issue
The file path separator by khoj server and the Obsidian vault were
different on Windows
- Fix
Normalize file path to use forward slash(/) to find the matching
note file in the Obsidian vault for jump to it
Resolves#177
Answer does not rely on past conversations, just the knowledge base.
It is meant for one off interactions, like search rather than a
continuing conversation like chat
For now it is only exposed via API. Later it will be expose in the
interfaces as well
Remove ability to select different chat types from the chat web
interface as there is only a single chat type
Stop appending answers to the conversation logs
- Only use decent quality search results, if any, as context
- Pass source results used by previous chat messages as context
- Loosen prompt to allow looking at previous chats and notes to answer
- Pass current date for context
- Make GPT provide reason when it can't answer the question. Gives
user context to tune their questions