Debanjum Singh Solanky
367d7377df
Ignore scheduled, closed, deadline time and logbook start, end in org node body
...
- Gives cleaner embeddings for semantic search
- Hopefully improves results and reduces size, compute
2022-06-17 05:13:09 +03:00
Debanjum Singh Solanky
b77ccadcba
Make property key regex more strict. Property key has to be alphanumeric
2022-06-17 05:13:09 +03:00
Debanjum Singh Solanky
ac9d746444
Fix Tags extraction in Org Node parser
...
- Previous version required two tags at least to work, not sure why
- Fixed it to extract all tags, even if only one tag in heading
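A minimal sketch of the single-tag case, assuming org headings end with tags in the `:tag1:tag2:` form. The `extract_tags` helper and its regex are hypothetical illustrations, not the parser's actual code.

```python
import re
from typing import List

# Hypothetical pattern: a trailing run of colon-delimited tags,
# e.g. ":errand:" or ":travel:2022:"
TAGS_PATTERN = re.compile(r"\s+:((?:[\w@#%]+:)+)\s*$")


def extract_tags(heading: str) -> List[str]:
    """Return all tags at the end of an org heading, even if there is only one."""
    match = TAGS_PATTERN.search(heading)
    if match is None:
        return []
    # Splitting "errand:" on ":" leaves a trailing empty string, so drop it
    return [tag for tag in match.group(1).split(":") if tag]
```

The `+` quantifier on the tag group accepts one or more tags, so a heading with a single tag is handled the same way as one with several.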
2022-06-17 04:21:22 +03:00
Debanjum Singh Solanky
fb86be8cd9
Add ID, File+Heading based Links to Org-Mode Entries
...
- Add links to property drawer
- This ensures results returned by semantic search contain these links
- This allows the user to jump to entry within original file for context
- The ID and file+heading based links are more robust for finding the relevant
entry in the original file than the line number based link,
as the user may edit the original files between embedding regenerations
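A rough sketch of the two link styles, using standard Org link syntax. The `org_links` helper and the dictionary keys are hypothetical names chosen for illustration; the commit does not show how the links are stored in the property drawer.

```python
from typing import Dict, Optional


def org_links(file: str, heading: str, entry_id: Optional[str] = None) -> Dict[str, str]:
    """Build Org links that keep working when line numbers shift."""
    # File+heading link: resolves by heading text rather than line number
    links = {"file+heading": f"[[file:{file}::*{heading}][{heading}]]"}
    # ID link: only possible when the entry has an ID property
    if entry_id:
        links["id"] = f"[[id:{entry_id}][{heading}]]"
    return links
```

A line number based link such as `[[file:notes.org::42]]` points at the wrong entry as soon as lines are added or removed above it, which is the failure mode the commit describes.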
2022-06-17 03:11:11 +03:00
Debanjum Singh Solanky
de23fc2051
Revert Add Scheduled, Deadline date to Model Embeddings for Date Aware Search
...
Sentence Transformer MSMarco Model isn't date aware
So no use of adding scheduled, deadline dates to model embeddings for consideration
This reverts commit a2a08d1354 .
2022-06-17 02:57:28 +03:00
Debanjum Singh Solanky
a2a08d1354
Add Scheduled, Deadline date to Model Embeddings for Date Aware Search
2022-06-17 02:55:27 +03:00
Debanjum Singh Solanky
cfbd5c4ecc
Update global model on regenerate via API
2022-06-17 00:49:06 +03:00
Debanjum Singh Solanky
c78bf84eef
Introduce search api endpoint that auto infers search type intent
...
- Introduce prompt for GPT to automatically extract user's search intent
- Expose new search api endpoint to use that to set SearchType being
passed to search API
- Currently meant as an experimental API to gauge usefulness,
extendability. Evaluating for phone or voice use-case
2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
8ef7917014
Fix json format passed in prompt to GPT
2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
f57b7f65ea
Wrap prompts for GPT in triple quotes to improve prompt readability
...
To improve prompt readability:
- Remove newline escape sequence and use actual newline directly
- This avoids one long line of text as prompt
- Remove escaping of double quotes
2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
1eba7b1c6f
Use empty_escape_sequence constant to strip response text from gpt
2022-02-27 23:17:49 -05:00
Debanjum Singh Solanky
1c3a1420f8
Update asymmetric extract_entries method to handle uncompressed jsonl
...
This is similar to what was done for the symmetric extract_entries
method earlier
2022-02-27 19:03:31 -05:00
Debanjum Singh Solanky
3d8a07f252
Extract empty line escape sequences var into constants file for reuse
2022-02-27 19:01:49 -05:00
Debanjum Singh Solanky
bb5d0d8908
Improve Semantic Search Buffer Names in Emacs
...
- Allow multiple semantic searches buffers to exist simultaneously
- Uniquify semantic search buffer names
- Add query and search-type to semantic search buffer name for easier
disambiguation, search and find appropriate
2022-02-26 18:30:14 -05:00
Debanjum Singh Solanky
b68558651b
Improve Extraction of Beancount Entries
...
- Only extract entries starting with YYYY-MM-DD from Beancount
- Strip Trailing Escape Sequences from Entries
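A minimal sketch of the two rules above, assuming entries in the Beancount file are separated by empty lines. The `extract_beancount_entries` helper and `DATE_PREFIX` pattern are hypothetical names, not the project's own code.

```python
import re
from typing import List

# Entries of interest start with a date in YYYY-MM-DD form
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}\s")


def extract_beancount_entries(text: str) -> List[str]:
    """Keep only dated entries and strip surrounding whitespace from each."""
    entries = []
    # Assumption: one or more empty lines separate entries
    for block in re.split(r"\n\s*\n", text):
        entry = block.strip()  # removes trailing newline escape sequences
        if DATE_PREFIX.match(entry):
            entries.append(entry)
    return entries
```

Comments, `option` directives and other undated blocks fail the date prefix check and are skipped.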
2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
b3ac2dd730
Improve Results Rendered on Emacs from Semantic Search on Ledger
...
- Add search query to top of buffer as Beancount comment
- Remove trailing ) from response
- Separate entries by empty line
- Load beancount-mode in semantic search on ledger buffer
2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
502c68d4f8
Remove trailing escape sequence in ledger search response entries
...
- Fix loading entries from jsonl in extract_entries method
- Only extract Title from jsonl of each entry
This is the only thing written to the jsonl for symmetric ledger
- This fixes the trailing escape seq in loaded entries
- Remove the need for semantic-search.el response reader to do pointless complicated cleanup
- Make symmetric_ledger:extract_entries use beancount_to_jsonl:load_jsonl
Both methods were doing similar work
- Make load_jsonl handle loading entries from both gzip and uncompressed jsonl
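One way a loader could handle both formats is sketched below. It assumes compressed files are identified by a `.gz` suffix; the actual `load_jsonl` in `beancount_to_jsonl` may detect compression differently.

```python
import gzip
import json
from pathlib import Path
from typing import List, Union


def load_jsonl(path: Union[str, Path]) -> List[dict]:
    """Load entries from a JSON Lines file, gzip-compressed or plain."""
    path = Path(path)
    # Assumption: a .gz suffix marks a gzip-compressed file
    opener = gzip.open if path.suffix == ".gz" else open
    entries = []
    with opener(path, "rt", encoding="utf-8") as jsonl_file:
        for line in jsonl_file:
            line = line.strip()  # drop trailing newline escape sequences
            if line:
                entries.append(json.loads(line))
    return entries
```

Stripping each line before parsing keeps trailing escape sequences out of the loaded entries, so callers do not need their own cleanup.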
2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
248aa632c0
Do not throw warning for beancount files with .beancount extension
2022-02-26 17:48:45 -05:00
Debanjum Singh Solanky
76cd63f4bd
Fix count of processed jsonl entries shown to user by ledger processor
...
Count lines not chars
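The difference can be illustrated with a short sketch. It assumes the processed entries are held as a single JSON Lines string; `count_jsonl_entries` is a hypothetical helper name.

```python
def count_jsonl_entries(jsonl_data: str) -> int:
    """Count entries in a JSON Lines string: one entry per non-empty line."""
    return sum(1 for line in jsonl_data.splitlines() if line.strip())


jsonl_data = '{"Title": "a"}\n{"Title": "b"}\n'
entry_count = count_jsonl_entries(jsonl_data)  # counts lines
char_count = len(jsonl_data)  # counts characters, which overstates the entries
```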
2022-02-26 17:46:06 -05:00
Saba
33bc62dc19
Fix type of use_xmp_metadata to be bool, rather than str
2022-01-24 21:53:26 -05:00
Debanjum Singh Solanky
179153dc5a
Rename RawConfig Types for Consistency
...
- Naming convention - [ContentType][ConfigType]Config
- Where [ConfigType] ~ Content, Search, Processor
- Where [ContentType] ~ Text, Image, Asymmetric, Symmetric, Conversation
- Current Configs:
- Content:
- Org Notes
- Org Music
- Image
- Ledger/Beancount
- Search:
- Asymmetric
- Symmetric
- Image
- Processor:
- Conversation
2022-01-14 20:54:38 -05:00
Debanjum Singh Solanky
c64e0c2965
Load model from HuggingFace if model_directory unset in config YAML
...
- Do not save/load the model to/from disk when model_directory unset
in config.yml
- Add symmetric search default config to cli.py
2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky
510faa1904
Save Image Search Model to Disk
2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky
934ec233b0
Add Search Config for Symmetric Model. Save Model to Disk
2022-01-14 17:36:59 -05:00
Debanjum Singh Solanky
b63026d97c
Save Asymmetric Search Model to Disk
...
- Improve application load time
- Remove dependence on internet to startup application and perform semantic search
2022-01-14 17:36:27 -05:00
Debanjum Singh Solanky
2e53fbc844
Fix the user intent extraction prompt for GPT. Clean up chatbot test
2022-01-12 10:36:01 -05:00
Debanjum Singh Solanky
ea28897cdd
Remove deprecated conversation_history field from config
2022-01-12 10:35:52 -05:00
Debanjum Singh Solanky
5a686b7be9
Add logs for chat bot in verbose mode
2022-01-12 10:35:52 -05:00
Debanjum Singh Solanky
6dc2a99d35
Merge branch 'master' of github.com:debanjum/semantic-search into add-summarize-capability-to-chat-bot
...
- Fix openai_api_key being set in ConfigProcessorConfig
- Merge addition of config UI and config instantiation updates
2021-12-20 13:30:42 +05:30
Debanjum Singh Solanky
65da7daf1f
Load, Save Conversation Session Summaries to Log. s/chat_log/chat_session
...
Conversation logs structure now has session info too instead of just chat info
Session info will allow loading past conversation summaries as context for AI in new conversations
{
  "session": [
    {
      "summary": <chat_session_summary>,
      "session-start": <session_start_index_in_chat_log>,
      "session-end": <session_end_index_in_chat_log>
    }],
  "chat": [
    {
      "intent": <intent-object>,
      "trigger-emotion": <emotion-triggered-by-message>,
      "by": <AI|Human>,
      "message": <chat_message>,
      "created": <message_created_date>
    }]
}
2021-12-15 10:17:07 +05:30
Saba
97a6dfaa1e
Use default value False for verbose parameter, and small changes
...
Pass config as parameter to initialize_search, change name of API methods to handle config CRUD operations, and initialize config to FullConfig
2021-12-11 14:13:14 -05:00
Saba
9536358d34
Fix key error model_name issue by upgrading sentence-transformers version
...
Refer to https://github.com/UKPLab/sentence-transformers/issues/1241
Also use verbose flag passed through function parameters in image_search
2021-12-11 11:58:19 -05:00
Saba
ce7a751e6b
Fix passing verbose flag down in symmetric_ledger.py
2021-12-11 11:36:32 -05:00
Saba
d65190c3ee
Update unit tests and files to remove Model suffix from config types
2021-12-09 08:50:38 -05:00
Debanjum Singh Solanky
0ac1e5f372
Summarize chat logs and notes returned by semantic search via /chat API
2021-12-08 02:34:07 +05:30
Saba
76e9e9da2f
Update unit tests to use the new BaseModel types
2021-12-05 09:31:39 -05:00
Saba
9b16cdbb41
Use past tense for verbose log
2021-12-04 11:45:44 -05:00
Saba
10e4065e05
Consolidate the search config models and pass verbose as a top level flag
2021-12-04 11:43:48 -05:00
Saba
43e647835b
Append Model suffix to config models
2021-12-04 10:51:21 -05:00
Saba
e068968b35
Update imports for raw config models in config.py
2021-12-04 10:44:55 -05:00
Saba
4d6284b0af
Remove Test suffix from Config models
2021-12-04 10:44:13 -05:00
Saba
7fcc8d2cef
Add null check for processor config
2021-12-04 10:11:00 -05:00
Saba
7ca4fc3453
Resolve merge conflicts with updated processor conversation data model
2021-11-28 16:22:52 -05:00
Saba
87a6c2d716
Use parse_obj instead of parse_raw as incoming data is in dict
2021-11-28 14:34:32 -05:00
Saba
5d50487d83
Linting
...
New line at end of config.html
Remove debug print statement
2021-11-28 13:32:56 -05:00
Saba
6f466c8d99
Use global config and add a regenerate button to the config ui
2021-11-28 13:28:22 -05:00
Saba
34d1e4199c
Use alias generator when deserializing the config file
2021-11-28 13:05:48 -05:00
Saba
19b81e82f0
Write back to the raw config.yml file on update
2021-11-28 12:34:40 -05:00
Saba
8837b02de6
dump updated config to a yaml file
2021-11-28 12:26:07 -05:00
Saba
5b80b87379
Streamline None checking in initialize_search
2021-11-28 12:05:04 -05:00