Commit Graph

697 Commits

Author SHA1 Message Date
Debanjum Singh Solanky
6e8a40906d Allow disabling automatic server setup. Fix server start vs ready logic
- khoj-auto-setup controls whether to automatically check for and
  setup khoj server from within Emacs
- extract install, start, configure sequence into public, interactive
  method. Allows calling khoj-setup during package load via init.el

- Fix: Do not attempt to configure or wait for server ready if
  user has said no to auto-setup request
- Fix logic to mark server started vs ready
  - Previously the started/running vs ready variables defs were getting
    intertwined
  - Server started indicates server bootup has been triggered
  - Server ready indicates server API ready to accept requests
2023-03-27 17:53:08 +07:00
Debanjum Singh Solanky
526a927bce Fix org entry extraction test, variable prefixed with khoj in khoj.el
Discovered via failing build and test workflows on Github
2023-03-27 16:44:50 +07:00
Debanjum Singh Solanky
7243059507 Track index update asynchronously via moon phase progressbar in khoj.el 2023-03-27 06:01:04 +07:00
Debanjum Singh Solanky
8a9055f918 Restrict server messages show in echo area to main server files 2023-03-27 04:59:55 +07:00
Debanjum Singh Solanky
ae535a06eb Configure Khoj chat using khoj.el by setting OpenAI API key in Emacs 2023-03-27 04:59:54 +07:00
Debanjum Singh Solanky
36b17d4ae0 Generalize the directory from config extraction elisp method 2023-03-27 03:44:03 +07:00
Debanjum Singh Solanky
924424c754 Throw actionable exceptions when content types or chat not configured 2023-03-27 02:47:44 +07:00
Debanjum Singh Solanky
359a2cacef Fix khoj--server-running to work with unconfigured or external server
- If khoj server started outside emacs, khoj--server-ready should be set
to true by khoj--server-running method (instead of waiting for proc msg)

- If khoj server is unconfigured the /config/types endpoint wouldn't
return anything. Using config/data/default allows checking khoj server
running status without requiring it to be configured as well
2023-03-27 02:45:59 +07:00
Debanjum Singh Solanky
d7fb9a596e Auto configure server before loading khoj-menu
If the config hasn't changed there'll be no update. If config has
changed indexing will get triggered asynchronously. But user cannot
make query till indexing done

As easier to know when server ready to configure
2023-03-27 02:44:02 +07:00
Debanjum Singh Solanky
8a21aff438 Make khoj.el server start, stop, restart, setup methods interactive
No need to erase temporary buffers before working on them
2023-03-27 01:53:15 +07:00
Debanjum Singh Solanky
cb40a96c85 Index configured org files from khoj.el
- Set `khoj-org-files-index' to list of files to index
- Defaults to indexing org-agenda-files
- Uses khoj server api to configure org files to index
2023-03-27 01:05:26 +07:00
Debanjum Singh Solanky
50760acc37 Wait for Khoj server to get ready before opening khoj.el transient menu
- Use process filter, sentinel to mark when khoj server is ready or not
- Display server messages for visibility into server boot-up process
- Wait until server ready to open khoj transient menu in Emacs
  Until then khoj features wouldn't work anyway, so avoids confusion
2023-03-26 13:00:01 +07:00
Debanjum Singh Solanky
82eb4bfd0d Setup Khoj server on opening khoj from with Emacs
- Create helper methods to check, stop, restart, setup khoj server
- (Ask to) setup khoj server on calling khoj main entrypoint function
2023-03-26 10:12:06 +07:00
Debanjum Singh Solanky
99d19dcf43 Start Khoj server from Emacs using khoj.el 2023-03-26 09:38:46 +07:00
Debanjum Singh Solanky
c92d79118a Install Khoj server from Emacs using khoj.el 2023-03-26 08:50:03 +07:00
Debanjum Singh Solanky
e281a498b4 Style Khoj search org buffer via elisp instead of in-buffer settings 2023-03-26 06:34:18 +07:00
Debanjum Singh Solanky
4f655d20ae Style Khoj chat directly via elisp instead of via in-buffer settings 2023-03-26 06:03:30 +07:00
Debanjum Singh Solanky
f6ff7b1beb Render foonote reference links as superscript for Khoj Chat on Emacs 2023-03-26 05:33:08 +07:00
Debanjum Singh Solanky
67c850a4ac Add retry logic to OpenAI API queries to increase Chat tenacity
- Move completion and chat_completion into helper methods under utils.py
- Add retry with exponential backoff on OpenAI exceptions using
  tenacity package. This is officially suggested and used by other
  popular GPT based libraries
2023-03-26 05:12:35 +07:00
Debanjum Singh Solanky
ff846f05c5 Clean-up khoj.el based on linting helpers and manual review 2023-03-25 05:47:49 +07:00
Debanjum Singh Solanky
7e36f421f9 Truncate message logs to below max supported prompt size by model
- Use tiktoken to count tokens for chat models
- Make conversation turns to add to prompt configurable via method
  argument to generate_chatml_messages_with_context method
2023-03-25 05:13:56 +07:00
Debanjum Singh Solanky
4725416fbd Use shortcut keybindings in buffer to ease sending messages to Khoj 2023-03-25 05:06:01 +07:00
Debanjum Singh Solanky
508b2176b7 Update Chat API, Logs, Interfaces to store, use references as list
- Remove the need to split by magic string in emacs and chat interfaces
- Move compiling references into string as context for GPT to GPT layer
- Update setup in tests to use new style of setting references
- Name first argument to converse as more appropriate "references"
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
b08745b541 Keep chat messages at 1 empty line visible distance in khoj.el
- Clean redundant concat, format string
- Improve variable name to emojified sender
2023-03-24 22:10:11 +07:00
Debanjum Singh Solanky
27217a330d Time chat API sub-components for performance analysis
Time and the search query extraction, search and response generation
components
2023-03-24 20:39:41 +07:00
Debanjum Singh Solanky
5e9558d39d Stylize references shown as footnote links in chat messages
- Render references as superscript
- Show reference definitions on hover over reference links to ease access
- Truncate reference def shown on hover to 70 char
  - Add continuation suffix, ..., when reference definition truncated
2023-03-24 20:38:05 +07:00
Debanjum Singh Solanky
cf28f104c7 Register separate timestamps for user query and response by Khoj Chat 2023-03-24 18:31:58 +07:00
Debanjum Singh Solanky
93e2aff786 Add references as org footnotes instead of links 2023-03-24 18:31:42 +07:00
Debanjum Singh Solanky
d78454d4ad Load Khoj Chat buffer before asking for query to provide context 2023-03-24 13:43:46 +07:00
Debanjum Singh Solanky
863933daaa Resolve build issues found by melpazoid 2023-03-23 02:25:34 +04:00
Debanjum Singh Solanky
e9ca04af0d Require dash, org to run ERT tests for khoj.el 2023-03-23 01:46:26 +04:00
Debanjum Singh Solanky
06df394d6c Style chat messages as org-mode entries in Emacs
- Style Message as Org Entries instead of List
- Put khoj response as child of user query entry
  - Improves color coding for readability
  - Allows folding each back-n-forth
- Put timestamp of message received into property drawer
- Use standardized time format for new and old chat messages
2023-03-22 12:00:43 -06:00
Debanjum Singh Solanky
364e6c11af Render chat history from API in chat buffer on first run
- Generalize the render-chat-response method to handle rendering
  history or chat response from chat API reponse

- Trigger rendering of khoj chat history if Khoj chat buffer not
  created for this session yet
2023-03-22 12:00:35 -06:00
Debanjum Singh Solanky
36b52fdd0a Properly escape reference links before rendering
- Use org-insert-link method to improve link rendering robustness
  Previous simple mechanism to crete org-links would result in links
  escaping out of formating. Use a user-facing org-mode method to
  remove/reduce probability of this

- Replace newlines with space to render reference notes as links
2023-03-22 11:05:38 -06:00
Debanjum Singh Solanky
72f63a6ef7 Add basic chat interface for Khoj on Emacs
- Query khoj chat API to get Khoj Chat response to user message
- Render chat messages as a org-mode list in format:
  - [sender-name]: *[message]*
    - /[receive-date]/
- Add references as org links with context visible on hover,
  but no jump to note
- Require dash library for khoj.el to simplify list manipulation.
  Use `-map-indexed' method from dash
2023-03-22 10:47:55 -06:00
Debanjum Singh Solanky
e4d67694e1 Add search to method, variable names meant for khoj search in khoj.el
In preparation to introduce Khoj chat in Emacs
2023-03-21 21:44:11 -06:00
Debanjum Singh Solanky
2f6284872d Mention Khoj needs Python version 3.10 or lower in docs 2023-03-20 15:18:19 -06:00
Debanjum Singh Solanky
601ff2541b Revert to using GPT to extract search queries from users message
- Reasons:
  - GPT can extract date aware search queries with date filters
    better than ChatGPT given the same prompt.
  - Need quality more than cost savings for now.
  - Need to figure ways to improve prompt for ChatGPT before using it
2023-03-18 17:56:13 -06:00
Debanjum Singh Solanky
e28526bbc9 Extract search queries from users message using ChatGPT as Search Actor
- Reasons
  - ChatGPT should be better at following instructions than GPT
  - At 1/10th the cost, it's much cheaper than using older GPT models
2023-03-18 16:33:24 -06:00
Debanjum Singh Solanky
939d7731da Fix-up Search Actor GPT's response for decoding it as valid JSON 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
f63fd0995e Pass more search results as context to Chat Actor to improve inference 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
10836dedee Search should return user message if GPT response is not valid JSON
Previously would throw if GPT response is not valid JSON. Better to
return original message to use for search instead
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
08f5fb315f Add answers to context for Search Actor to generate relevant queries
Update Search Actor prompt with answers, more precise primer and
two more examples for context

Mark the 3 chat quality tests using answer as context to generate
queries as expected to pass. Verify that the 3 tests pass now, unlike
before when the Search Actor did not have the answers for context
2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
45cb510421 Loosen search results score thresold used by chat for more context 2023-03-18 16:30:55 -06:00
Debanjum Singh Solanky
d871e04a81 Use past user messages, inferred questions as context to extract questions
- Keep inferred questions in logs
- Improve prompt to GPT to try use past questions as context
- Pass past user message and inferred questions as context to help GPT
  extract complete questions
- This should improve search results quality

- Example Expected Inferred Questions from User Message using History:
  1. "What is the name of Arun's daughter?"
    => "What is the name of Arun's daughter"
  2. "Where does she study?" =>
    => "Where does Arun's daughter study?" OR
    => "Where does Arun's daughter, Reena study?"
2023-03-18 16:30:50 -06:00
Debanjum Singh Solanky
1a5d1130f4 Generate search queries from message to answer users chat questions
The Search Actor allows for
1. Looking up multiple pieces of information from the notes
   E.g "Is Bob older than Tom?" searches for age of Bob and Tom in 2 searches
2. Allow date aware user queries in Khoj chat
   Answer time range based questions
   Limit search to specified timeframe in question using date filter
   E.g "What national parks did I visit last year?" adds
   dt>="2022-01-01" dt<"2023-01-01" to Khoj search

Note: Temperature set to 0. Message to search queries should be deterministic
2023-03-18 16:28:51 -06:00
Debanjum
e75e13d788 Create Tests to Measure Chat Quality, Capabilities
Create Rubric to Test Chat Quality and Capabilities

### Issues
- Previously the improvements in quality of Khoj Chat on changes was uncertain
- Manual testing on my evolving set of notes was slow and didn't assess all expected, desired capabilities

### Fix
1. Create an Evaluation Dataset to assess Chat Capabilities
   - Create custom notes for a fictitious person (I'll publish a book with these soon 😅😋)
   - Add a few of Paul Graham's more personal essays. *[Easy to get as markdown](https://github.com/ofou/graham-essays)*
2. Write Unit Tests to Measure Chat Capabilities
   - Measure quality at 2 separate layers
     - **Chat Actor**: These are the narrow agents made of LLM + Prompt. E.g `summarize`, `converse` in `gpt.py`
     - **Chat Director**: This is the chat orchestration agent. It calls on required chat actors, search through user provided knowledge base (i.e notes, ledger, image) etc to respond appropriately to the users message.  This is what the `/api/chat` API exposes.
   - Mark desired but not currently available capabilities as expected to fail <br />
     This still allows measuring the chat capability score/percentage while only failing capability tests which were passing before on any changes to chat
2023-03-16 11:30:52 -06:00
Debanjum Singh Solanky
7526a50dd4 Extract conversation processor utility funcs from gpt.py into utils.py 2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
24ddebf3ce Make converse prompt more precise. Fix default arg vals in gpt methods
- Set conversation_log arg default to dict
- Increase default temperature to 0.2 for a little creativity in
  answering
- Make GPT be more reliable in looking at past conversations for
  forming response
2023-03-16 09:30:37 -06:00
Debanjum Singh Solanky
8609e3129e Fix, improve displaying chat messages, sources by Khoj in web interface
Pretty pretty json in conversation logs
2023-03-14 11:24:47 -06:00