klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-03-03 21:29:08 +00:00

Author	SHA1	Message	Date
Debanjum Singh Solanky	67c850a4ac	Add retry logic to OpenAI API queries to increase Chat tenacity - Move completion and chat_completion into helper methods under utils.py - Add retry with exponential backoff on OpenAI exceptions using tenacity package. This is officially suggested and used by other popular GPT based libraries	2023-03-26 05:12:35 +07:00
Debanjum Singh Solanky	ff846f05c5	Clean-up khoj.el based on linting helpers and manual review	2023-03-25 05:47:49 +07:00
Debanjum Singh Solanky	7e36f421f9	Truncate message logs to below max supported prompt size by model - Use tiktoken to count tokens for chat models - Make conversation turns to add to prompt configurable via method argument to generate_chatml_messages_with_context method	2023-03-25 05:13:56 +07:00
Debanjum Singh Solanky	4725416fbd	Use shortcut keybindings in buffer to ease sending messages to Khoj	2023-03-25 05:06:01 +07:00
Debanjum Singh Solanky	508b2176b7	Update Chat API, Logs, Interfaces to store, use references as list - Remove the need to split by magic string in emacs and chat interfaces - Move compiling references into string as context for GPT to GPT layer - Update setup in tests to use new style of setting references - Name first argument to converse as more appropriate "references"	2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky	b08745b541	Keep chat messages at 1 empty line visible distance in khoj.el - Clean redundant concat, format string - Improve variable name to emojified sender	2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky	27217a330d	Time chat API sub-components for performance analysis Time the search query extraction, search and response generation components	2023-03-24 20:39:41 +07:00
Debanjum Singh Solanky	5e9558d39d	Stylize references shown as footnote links in chat messages - Render references as superscript - Show reference definitions on hover over reference links to ease access - Truncate reference def shown on hover to 70 char - Add continuation suffix, ..., when reference definition truncated	2023-03-24 20:38:05 +07:00
Debanjum Singh Solanky	cf28f104c7	Register separate timestamps for user query and response by Khoj Chat	2023-03-24 18:31:58 +07:00
Debanjum Singh Solanky	93e2aff786	Add references as org footnotes instead of links	2023-03-24 18:31:42 +07:00
Debanjum Singh Solanky	d78454d4ad	Load Khoj Chat buffer before asking for query to provide context	2023-03-24 13:43:46 +07:00
Debanjum Singh Solanky	863933daaa	Resolve build issues found by melpazoid	2023-03-23 02:25:34 +04:00
Debanjum Singh Solanky	e9ca04af0d	Require dash, org to run ERT tests for khoj.el	2023-03-23 01:46:26 +04:00
Debanjum Singh Solanky	06df394d6c	Style chat messages as org-mode entries in Emacs - Style Message as Org Entries instead of List - Put khoj response as child of user query entry - Improves color coding for readability - Allows folding each back-n-forth - Put timestamp of message received into property drawer - Use standardized time format for new and old chat messages	2023-03-22 12:00:43 -06:00
Debanjum Singh Solanky	364e6c11af	Render chat history from API in chat buffer on first run - Generalize the render-chat-response method to handle rendering history or chat response from chat API response - Trigger rendering of khoj chat history if Khoj chat buffer not created for this session yet	2023-03-22 12:00:35 -06:00
Debanjum Singh Solanky	36b52fdd0a	Properly escape reference links before rendering - Use org-insert-link method to improve link rendering robustness Previous simple mechanism to create org-links would result in links escaping out of formatting. Use a user-facing org-mode method to remove/reduce probability of this - Replace newlines with space to render reference notes as links	2023-03-22 11:05:38 -06:00
Debanjum Singh Solanky	72f63a6ef7	Add basic chat interface for Khoj on Emacs - Query khoj chat API to get Khoj Chat response to user message - Render chat messages as a org-mode list in format: - [sender-name]: [message] - /[receive-date]/ - Add references as org links with context visible on hover, but no jump to note - Require dash library for khoj.el to simplify list manipulation. Use `-map-indexed' method from dash	2023-03-22 10:47:55 -06:00
Debanjum Singh Solanky	e4d67694e1	Add search to method, variable names meant for khoj search in khoj.el In preparation to introduce Khoj chat in Emacs	2023-03-21 21:44:11 -06:00
Debanjum Singh Solanky	2f6284872d	Mention Khoj needs Python version 3.10 or lower in docs	2023-03-20 15:18:19 -06:00
Debanjum Singh Solanky	601ff2541b	Revert to using GPT to extract search queries from users message - Reasons: - GPT can extract date aware search queries with date filters better than ChatGPT given the same prompt. - Need quality more than cost savings for now. - Need to figure ways to improve prompt for ChatGPT before using it	2023-03-18 17:56:13 -06:00
Debanjum Singh Solanky	e28526bbc9	Extract search queries from users message using ChatGPT as Search Actor - Reasons - ChatGPT should be better at following instructions than GPT - At 1/10th the cost, it's much cheaper than using older GPT models	2023-03-18 16:33:24 -06:00
Debanjum Singh Solanky	939d7731da	Fix-up Search Actor GPT's response for decoding it as valid JSON	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	f63fd0995e	Pass more search results as context to Chat Actor to improve inference	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	10836dedee	Search should return user message if GPT response is not valid JSON Previously would throw if GPT response is not valid JSON. Better to return original message to use for search instead	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	08f5fb315f	Add answers to context for Search Actor to generate relevant queries Update Search Actor prompt with answers, more precise primer and two more examples for context Mark the 3 chat quality tests using answer as context to generate queries as expected to pass. Verify that the 3 tests pass now, unlike before when the Search Actor did not have the answers for context	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	45cb510421	Loosen search results score threshold used by chat for more context	2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky	d871e04a81	Use past user messages, inferred questions as context to extract questions - Keep inferred questions in logs - Improve prompt to GPT to try use past questions as context - Pass past user message and inferred questions as context to help GPT extract complete questions - This should improve search results quality - Example Expected Inferred Questions from User Message using History: 1. "What is the name of Arun's daughter?" => "What is the name of Arun's daughter" 2. "Where does she study?" => "Where does Arun's daughter study?" OR => "Where does Arun's daughter, Reena study?"	2023-03-18 16:30:50 -06:00
Debanjum Singh Solanky	1a5d1130f4	Generate search queries from message to answer users chat questions The Search Actor allows for 1. Looking up multiple pieces of information from the notes E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches 2. Allow date aware user queries in Khoj chat Answer time range based questions Limit search to specified timeframe in question using date filter E.g "What national parks did I visit last year?" adds dt>="2022-01-01" dt<"2023-01-01" to Khoj search Note: Temperature set to 0. Message to search queries should be deterministic	2023-03-18 16:28:51 -06:00
Debanjum	e75e13d788	Create Tests to Measure Chat Quality, Capabilities Create Rubric to Test Chat Quality and Capabilities ### Issues - Previously the improvements in quality of Khoj Chat on changes was uncertain - Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities ### Fix 1. Create an Evaluation Dataset to assess Chat Capabilities - Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋) - Add a few of Paul Graham's more personal essays. [Easy to get as markdown](https://github.com/ofou/graham-essays) 2. Write Unit Tests to Measure Chat Capabilities - Measure quality at 2 separate layers - Chat Actor: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py` - Chat Director: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message. This is what the `/api/chat` API exposes. - Mark desired but not currently available capabilities as expected to fail This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat	2023-03-16 11:30:52 -06:00
Debanjum Singh Solanky	7526a50dd4	Extract conversation processor utility funcs from gpt.py into utils.py	2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky	24ddebf3ce	Make converse prompt more precise. Fix default arg vals in gpt methods - Set conversation_log arg default to dict - Increase default temperature to 0.2 for a little creativity in answering - Make GPT be more reliable in looking at past conversations for forming response	2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky	8609e3129e	Fix, improve displaying chat messages, sources by Khoj in web interface Pretty print json in conversation logs	2023-03-14 11:24:47 -06:00
Debanjum	6c0e82b2d6	Merge Improve Khoj Chat PR #183 from debanjum/improve-chat-interface # Improve Khoj Chat ## Main Changes - Use the new [API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) for [ChatGPT](https://openai.com/blog/chatgpt) to improve conversation quality and cost - Improve Prompt to answer query using indexed notes - Previously was asking GPT to summarize the notes - Both the chat and answer API use this new prompt - Support Multi-Turn conversations - Pass previous messages and associated reference notes to ChatGPT for context - Show note snippets referenced to generate response - Allows fact-checking, getting details - Simplify chat interface by using only single unified chat type for now ## Miscellaneous - Replace summarize with answer API. Summarize via API not useful for now - Only pass Khoj search results above a threshold confidence to GPT for context - Allows Khoj to say don't know if it can't find answer to query from notes - Allows relying on (only) conversation history to generate response in multi-turn conversation - Move Chat API out of beta. Update Readme	2023-03-10 19:03:44 -06:00
Debanjum Singh Solanky	cccd225247	Deduplicate and simplify logic to render chat message with reference	2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky	b9caad458e	Type score_threshold with union, not |, to support python <3.10	2023-03-10 18:58:11 -06:00
Debanjum Singh Solanky	a71f168273	Move the chat API out of beta. Save chat sessions at 15min intervals	2023-03-10 17:20:52 -06:00
Debanjum Singh Solanky	8bb8824d0c	Bump khoj versions in obsidian, emacs files	2023-03-10 15:23:17 -06:00
Debanjum Singh Solanky	e16d0b6d7e	Open references notes used for chat on mobile too (by clicking) Requires clicking the reference as hover doesn't work on mobile	2023-03-09 17:13:07 -06:00
Debanjum Singh Solanky	c3c7b8a951	Make Khoj chat a separate Progressive Web App (PWA) for easier access	2023-03-09 13:45:06 -06:00
Debanjum Singh Solanky	3838f9d8e3	Remove explicitly asking GPT to say I don't know in prompt for now GPT still mostly says I don't know when answer not in notes or chats But with this its more inclined to answer general questions not in chats or notes while informing user that the information is not from existing chats or notes	2023-03-09 12:11:44 -06:00
Debanjum Singh Solanky	f7b8cdd02e	Log prompts being passed to GPT for debugging	2023-03-08 19:17:52 -06:00
Debanjum Singh Solanky	2739a492b4	Log message metadata along with Khoj message instead of user message References should be attached to khoj chat message rather than the user's message in the chat interface	2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky	87d1e1341d	Show reference notes used as response context in chat interface	2023-03-08 19:16:24 -06:00
Debanjum Singh Solanky	280061e1fa	Do not deduplicate search results used for chat context - Chat uses compiled form of search results, not the raw entries to provide context for chat. The compiled snipped search results themselves are unique and using multiple of them for context from the same raw note is fine if they cross the score and rank thresholds This should improve the context provided for chat - Also apply score_threshold, no deduplication to the answers API	2023-03-06 23:51:31 -06:00
Debanjum Singh Solanky	672f61529e	Make getting deduped search results configurable via Search API	2023-03-06 23:48:46 -06:00
Debanjum Singh Solanky	4fb628975c	Fix jumping to note from Khoj Obsidian search modal result on Windows - Issue The file path separator by khoj server and the Obsidian vault were different on Windows - Fix Normalize file path to use forward slash(/) to find the matching note file in the Obsidian vault for jump to it Resolves #177	2023-03-05 21:07:54 -06:00
Debanjum Singh Solanky	b6cdc5c7cb	Do not expose answer API as a chat type in chat web interface or API Answer does not rely on past conversations, just the knowledge base. It is meant for one off interactions, like search rather than a continuing conversation like chat For now it is only exposed via API. Later it will be exposed in the interfaces as well Remove ability to select different chat types from the chat web interface as there is only a single chat type Stop appending answers to the conversation logs	2023-03-05 18:21:59 -06:00
Debanjum Singh Solanky	7f994274bb	Support multi-turn conversations in chat mode - Only use decent quality search results, if any, as context - Pass source results used by previous chat messages as context - Loosen prompt to allow looking at previous chats and notes to answer - Pass current date for context - Make GPT provide reason when it can't answer the question. Gives user context to tune their questions	2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky	d73042426d	Support filtering for results above threshold score in search API	2023-03-05 18:21:39 -06:00
Debanjum Singh Solanky	45f461d175	Keep search results passed to GPT as context in conversation logs This will be useful to 1. Show source references used to arrive at answer 2. Carry out multi-turn conversations	2023-03-05 16:00:19 -06:00
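
Commit 67c850a4ac above describes retrying OpenAI API queries with exponential backoff via the tenacity package. The repository's actual code is not shown on this page, so the following is only a rough, standard-library stand-in for the same idea, not tenacity and not Khoj's implementation. The function name `retry_with_backoff` and all of its parameters are hypothetical.

```python
import time


def retry_with_backoff(func, max_attempts=6, base_delay=1.0, max_delay=10.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...)
    and is capped at max_delay. The last failure is re-raised.
    All names here are illustrative assumptions, not Khoj's API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query():
        # Hypothetical stand-in for an API call that fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("transient failure")
        return "response"

    # Use a tiny base delay so the demo finishes quickly.
    print(retry_with_backoff(flaky_query, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production setup would more likely rely on a maintained retry library, as the commit message itself indicates.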

679 Commits
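
Commit 1a5d1130f4 in the log gives an example of a date-aware query: "What national parks did I visit last year?" adds `dt>="2022-01-01" dt<"2023-01-01"` to the Khoj search. In Khoj this translation is done by GPT (the Search Actor); the sketch below only illustrates the date arithmetic behind that one example. The function name `last_year_filter` is hypothetical, and the filter syntax is copied from the commit message rather than from Khoj's source.

```python
from datetime import date


def last_year_filter(today: date) -> str:
    """Build a date filter covering the calendar year before `today`.

    Hypothetical helper: the half-open range [Jan 1 last year, Jan 1 this year)
    mirrors the example in the commit message.
    """
    start = date(today.year - 1, 1, 1)
    end = date(today.year, 1, 1)
    return f'dt>="{start.isoformat()}" dt<"{end.isoformat()}"'


if __name__ == "__main__":
    # The commit is dated 2023-03-18, so "last year" is 2022.
    print(last_year_filter(date(2023, 3, 18)))
    # -> dt>="2022-01-01" dt<"2023-01-01"
```

Using a half-open range (`>=` start, `<` end) avoids having to know the last day of the period, which is why the upper bound is the first day of the following year.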