Decouple results shown to the user from the text the model is trained on

- Previously:
  The text the model was trained on was being used to
  re-create a semblance of the original org-mode entry.

- Now:
  - Store the raw entry as another key:value pair in each entry's JSON.
    Only return the actual raw org entries in results,
    but create embeddings as before.
  - Also add a link to the entry, in file:<filename>::<line_number> form,
    to the property drawer of returned results.
    This can be used to jump to the actual entry in its original file.
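
As a rough illustration of the decoupling described above, the sketch below stores two representations per entry and returns only the raw one to the user. The key names `compiled` and `raw` and both helper functions are hypothetical, chosen for this example; the commit message only says the raw entry is stored as another key:value pair.

```python
import json


def make_entry(compiled_text: str, raw_org_entry: str) -> str:
    """Serialize one entry as a JSON line holding both representations.

    "compiled" and "raw" are assumed key names, not taken from the commit.
    """
    return json.dumps({"compiled": compiled_text, "raw": raw_org_entry})


def results_for_user(entry_lines, ranked_indices):
    """Return only the raw org text of the ranked entries."""
    entries = [json.loads(line) for line in entry_lines]
    return [entries[i]["raw"] for i in ranked_indices]


# Embeddings would be computed from "compiled"; users only ever see "raw".
lines = [
    make_entry("Buy milk. Remember oat milk.", "* TODO Buy milk\nRemember oat milk.\n"),
    make_entry("Call bank.", "* Call bank\n"),
]
shown = results_for_user(lines, [1, 0])
```

Keeping the two fields side by side means the text fed to the model can be normalized freely without affecting what is displayed.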
Debanjum Singh Solanky
2021-08-29 05:47:43 -07:00
parent 7ee3007070
commit f4bde75249
4 changed files with 20 additions and 20 deletions

@@ -77,7 +77,7 @@ def makelist(filename):
deadline_date = ''
thisNode.setProperties(propdict)
nodelist.append( thisNode )
-propdict = dict()
+propdict = {'SOURCE': f'file:{filename}::{ctr}'}
level = hdng.group(1)
heading = hdng.group(2)
bodytext = ""
@@ -325,8 +325,14 @@ class Orgnode(object):
n = n + ':' + t
closecolon = ':'
n = n + closecolon
-# Need to output Scheduled Date, Deadline Date, property tags The
-# following will output the text used to construct the object
-n = n + "\n" + self.body
+# Need to output Scheduled Date, Deadline Date, property tags The
+# following will output the text used to construct the object
+n = n + "\n"
+n = n + ":PROPERTIES:\n"
+for key, value in self.properties.items():
+n = n + f":{key}: {value}\n"
+n = n + ":END:\n"
+n = n + self.body
return n
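
The property drawer output added in this hunk can be tried in isolation with a minimal sketch like the one below. The function `render_entry` and the sample heading, file name, and line number are hypothetical; only the `:PROPERTIES:` / `:END:` layout and the `SOURCE` key in `file:<filename>::<line_number>` form come from the diff.

```python
def render_entry(heading_line: str, properties: dict, body: str) -> str:
    """Render a heading, its property drawer, and its body as org text."""
    out = heading_line + "\n"
    out += ":PROPERTIES:\n"
    for key, value in properties.items():
        out += f":{key}: {value}\n"
    out += ":END:\n"
    out += body
    return out


# Hypothetical entry found at line 12 of notes.org
source_link = "file:{}::{}".format("notes.org", 12)
rendered = render_entry("* Buy milk", {"SOURCE": source_link}, "Remember oat milk.\n")
print(rendered)
```

Because the link uses the org `file:` link form with a line number after `::`, following it from the returned result should open the original file at the entry.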