Commit Graph

23 Commits

Author SHA1 Message Date
Debanjum Singh Solanky
ebd5039bd1 Merge branch 'master' into support-incremental-updates-of-embeddings 2022-09-10 22:37:13 +03:00
Debanjum Singh Solanky
b9a6e80629 Make OrgNode tags stable sorted to find new entries for incremental updates
- Storing tags as a set returned them in a different order
  every time
- This resulted in spuriously identifying existing entries as new
  because their tag ordering changed
- Converting tags to a list fixes the issue and correctly identifies
  updated and new entries for incremental update
2022-09-10 20:59:52 +03:00
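The ordering problem described in this commit can be illustrated with a minimal Python sketch. The helper names `stable_tags` and `entry_key` are hypothetical and not taken from the OrgNode source; sorting is used here as one assumed way to get a deterministic order.

```python
def stable_tags(tags):
    """Return the tags de-duplicated and in a deterministic (sorted) order."""
    return sorted(set(tags))


def entry_key(heading, tags):
    """Hypothetical key used to compare an entry across two indexing runs."""
    return f"{heading} :{':'.join(stable_tags(tags))}:"


# The same entry produces the same key regardless of the input tag order.
first_run = entry_key("Plan trip", ["travel", "todo"])
second_run = entry_key("Plan trip", ["todo", "travel"])
```

With a stable key, an unchanged entry is not mistaken for a new one between runs.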
Debanjum Singh Solanky
976397bd82 Ignore empty #+TITLE, merge multiple #+TITLE for 0th level headings 2022-09-10 15:34:47 +03:00
Debanjum Singh Solanky
11917c6ddd Do not normalize absolute filenames for creating links in OrgNode 2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky
07b98d35f1 Use filename or #+TITLE as heading for 0th level content in org files
- Set LINE, SOURCE link properties in property drawer correctly for
  content which falls under no heading
- See Issue #83 for more details
2022-09-10 15:34:31 +03:00
Debanjum Singh Solanky
d6bd7bf3e1 Fix initializing OrgNode level to string to parse org files
- Parsed `level` argument passed to OrgNode during init is expected to
  be a string, not an integer
- This was resulting in app failure only when parsing org files with
  no headings, like in issue #83, as level is set to string of `*`s
  the moment a heading is found in the current file
2022-09-10 14:21:08 +03:00
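A minimal sketch of why the type of `level` matters, assuming the node derives its depth from the length of a string of `*`s. The `Node` class below is a hypothetical stand-in, not the actual OrgNode implementation.

```python
class Node:
    """Hypothetical stand-in for OrgNode; only the level handling is shown."""

    def __init__(self, level, heading):
        # `level` is a string of asterisks such as "**"; its length is the depth.
        self.level = len(level)
        self.heading = heading


# Content before any heading: an empty string gives depth 0.
top = Node("", "notes.org")
# A second-level heading.
child = Node("**", "Subtask")

# Passing an integer instead fails, because len() needs a sized object.
try:
    Node(0, "notes.org")
    int_level_failed = False
except TypeError:
    int_level_failed = True
```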
Debanjum Singh Solanky
150ae19660 Indent Timestamps, Drawers at Body Level in OrgNode Entry Representation 2022-08-10 18:55:37 +03:00
Debanjum Singh Solanky
fd31d339c1 Remove spurious space in Entries without Todo in OrgNode Entry Repr 2022-08-10 13:48:44 +03:00
Debanjum Singh Solanky
38df727ef4 Fix escape sequence usage in strings. Remove unneeded import of os
Rename /config API method to config to match its purpose. UI is
too generic anyway, and not what the method does
2022-08-03 18:51:55 +03:00
Debanjum Singh Solanky
d50bfb5188 Parse Logbook Entries in the OrgNode parser for Org-Mode. Update tests 2022-07-21 00:15:30 +04:00
Debanjum Singh Solanky
85fbe1c42b Normalize org notes path to be relative to home directory
- This is still clunky but it should be committable
- General enough that it'll work even when a user's notes are not in the home directory
- While solving for the special case where:
  - Notes are processed on one machine and used on another
  - But the notes directory is in the same location relative to home on both machines
2022-06-28 19:16:11 +04:00
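The normalization described above could be sketched roughly as follows. The function name `normalize_to_home` and its `home` parameter are assumptions made for illustration, and POSIX-style paths are assumed throughout.

```python
from pathlib import Path, PurePosixPath


def normalize_to_home(path, home=None):
    """Rewrite `path` as ~/... when it lies under `home`; otherwise keep it."""
    home_dir = PurePosixPath(home if home is not None else Path.home().as_posix())
    note_path = PurePosixPath(path)
    try:
        return "~/" + note_path.relative_to(home_dir).as_posix()
    except ValueError:
        # Not under the home directory: keep the absolute path as-is.
        return note_path.as_posix()
```

For example, `normalize_to_home("/home/alice/notes/todo.org", home="/home/alice")` gives `~/notes/todo.org`, which resolves correctly on any machine where the notes sit at the same place relative to home.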
Debanjum Singh Solanky
094eaf3fcc Fix minor bugs in OrgNode parser
- Bugs discovered from writing org-node tests
2022-06-17 19:14:54 +03:00
Debanjum Singh Solanky
36495038dd Fix storing parsed CLOSED date in OrgNode
The CLOSED date was getting parsed but not stored
Adding setClosed at start also fixed the issue
2022-06-17 16:33:37 +03:00
Debanjum Singh Solanky
1c5754bf95 Simplify storing Tags in OrgNode object
- Use Set for Tags instead of dictionary with empty keys
- No Need to store First Tag separately
  - Remove properties methods associated with storing first tag separately
- Simplify extraction of tags string in org_to_jsonl
- Split notes_string creation into multiple f-string in separate line
  for code readability
2022-06-17 16:33:37 +03:00
Debanjum Singh Solanky
51a43245d3 Escape square brackets in file+heading based org-mode links 2022-06-17 16:20:19 +03:00
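A small sketch of the idea, assuming backslash escaping of `[` and `]` inside the heading part of the link. The helper names are hypothetical and the exact escaping used by this commit is not shown in the log.

```python
def escape_brackets(text):
    """Backslash-escape square brackets so they do not end the link early."""
    return text.replace("[", "\\[").replace("]", "\\]")


def heading_link(filename, heading):
    """Build a file+heading based org-mode link to an entry."""
    return f"[[file:{filename}::*{escape_brackets(heading)}]]"


link = heading_link("notes.org", "Plan [1/3]")
```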
Debanjum Singh Solanky
04610f453a Include scheduled date, deadline date and close date in repr of org node
- Now that the times line is excluded from the raw body of the node,
  show it in repr so the user can see it for reference

- But the model doesn't need to see it, since it would only confuse
  its embeddings
2022-06-17 05:13:48 +03:00
Debanjum Singh Solanky
367d7377df Ignore scheduled, closed, deadline time and logbook start, end in org node body
- Gives cleaner embeddings for semantic search
- Hopefully improves results and reduces size, compute
2022-06-17 05:13:09 +03:00
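One way to picture this filtering is the sketch below. The pattern and the `clean_body` helper are assumptions for illustration, not the parser's actual code. Note that `:END:` also closes other drawers, which a real parser would need to handle separately.

```python
import re

# Lines holding planning timestamps or logbook drawer delimiters.
SKIP_LINE = re.compile(r"^\s*(SCHEDULED:|DEADLINE:|CLOSED:|:LOGBOOK:|:END:)")


def clean_body(lines):
    """Drop planning and logbook delimiter lines; keep everything else."""
    return [line for line in lines if not SKIP_LINE.match(line)]


body = [
    "SCHEDULED: <2022-06-17 Fri>",
    "Remember to pack the charger.",
    ":LOGBOOK:",
    ":END:",
]
cleaned = clean_body(body)
```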
Debanjum Singh Solanky
b77ccadcba Make property key regex more strict. Property key has to be alphanumeric 2022-06-17 05:13:09 +03:00
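A stricter property key pattern might look like the following sketch. The regex and the `parse_property` helper are illustrative assumptions. Only the "alphanumeric key" requirement comes from the commit message.

```python
import re

# Key must be one or more alphanumeric characters between two colons.
PROPERTY = re.compile(r"^\s*:([A-Za-z0-9]+):\s*(.*?)\s*$")


def parse_property(line):
    """Return (key, value) for a property line, or None if it is not one."""
    m = PROPERTY.match(line)
    return (m.group(1), m.group(2)) if m else None
```

Under this pattern a line such as `: not a key: value` is rejected, since the text between the colons is not purely alphanumeric.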
Debanjum Singh Solanky
ac9d746444 Fix Tags extraction in Org Node parser
- Previous version required two tags at least to work, not sure why
- Fixed it to extract all tags, even if only one tag in heading
2022-06-17 04:21:22 +03:00
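Tag extraction that works for a single tag as well as several could be sketched like this. The pattern and the `extract_tags` name are hypothetical and not copied from the parser.

```python
import re

# Trailing ":tag1:tag2:" group at the end of a heading line.
TAGS = re.compile(r"\s+:([^\s:]+(?::[^\s:]+)*):\s*$")


def extract_tags(heading):
    """Return the list of tags at the end of a heading, or [] if none."""
    m = TAGS.search(heading)
    return m.group(1).split(":") if m else []
```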
Debanjum Singh Solanky
fb86be8cd9 Add ID, File+Heading based Links to Org-Mode Entries
- Add links to property drawer
- This ensures results returned by semantic search contain these links
- This allows the user to jump to entry within original file for context
- The ID, file+heading based links are more robust for finding the
  relevant entry in the original file than the line number based link,
  as the user may edit the original files between embedding regenerations
2022-06-17 03:11:11 +03:00
Debanjum Singh Solanky
1832e418e5 Use raw string for regex in orgnode to fix deprecation warning 2021-10-02 17:38:31 -07:00
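For context, sequences such as `\s` or `\*` in a regular (non-raw) Python string literal are invalid escape sequences and trigger a deprecation warning. A raw string passes the backslashes through to the regex engine unchanged. The heading pattern below is an illustrative example, not the exact regex from orgnode.

```python
import re

# Raw string: backslashes reach the regex engine as written.
HEADING = re.compile(r"^(\*+)\s+(.*)$")

match = HEADING.match("** Write tests")
stars, title = match.group(1), match.group(2)
```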
Debanjum Singh Solanky
f4bde75249 Decouple results shown to user and text the model is trained on
- Previously:
  The text the model was trained on was being used to
  re-create a semblance of the original org-mode entry.

- Now:
  - Store raw entry as another key:value in each entry json too
    Only return actual raw org entries in results
    But create embeddings like before
  - Also add link to entry in file:<filename>::<line_number> form
    in property drawer of returned results
    This can be used to jump to actual entry in its original file
2021-08-29 06:06:54 -07:00
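The decoupling described above can be pictured with a small sketch in which each JSONL entry carries both the text used for embeddings and the raw org entry shown to the user. The key names `compiled` and `raw` and the helper `to_jsonl_line` are assumptions for illustration.

```python
import json


def to_jsonl_line(compiled, raw):
    """Serialize one entry, keeping embedding text and display text separate."""
    return json.dumps({"compiled": compiled, "raw": raw})


raw_entry = "* TODO Plan trip :travel:\nBook flights by Friday."
compiled_entry = "Plan trip. travel. Book flights by Friday."

line = to_jsonl_line(compiled_entry, raw_entry)
entry = json.loads(line)
```

Search would then embed `entry["compiled"]` while returning `entry["raw"]` in results.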
Debanjum Singh Solanky
af9660f28e Move application files under src directory. Update Readmes
- Remove the command for calling the asymmetric search script directly.
  It no longer works when called directly due to internal package
  import issues
2021-08-17 04:11:03 -07:00