mirror of
https://github.com/khoaliber/khoj.git
synced 2026-03-05 21:29:11 +00:00
Migrate to using docusaurus, rather than docsify for documentation (#603)
* Add docusaurus documentation (to replace the docsify setup)
* Remove older docs
* Specify documentation as the gh pages build action working directory
8
documentation/docs/miscellaneous/_category_.json
Normal file
@@ -0,0 +1,8 @@
{
  "label": "Miscellaneous",
  "position": 6,
  "link": {
    "type": "generated-index",
    "description": "Additional resources for learning about Khoj"
  }
}
32
documentation/docs/miscellaneous/advanced.md
Normal file
@@ -0,0 +1,32 @@
---
sidebar_position: 3
---

# Advanced Usage

### Search across Different Languages (Self-Hosting)
To search for notes in multiple, different languages, you can use a [multi-lingual model](https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models).<br />
For example, the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) model supports [50+ languages](https://www.sbert.net/docs/pretrained_models.html#:~:text=we%20used%20the%20following%2050%2B%20languages) and offers good search quality and speed. To use it:
1. Manually update the search config in the server's admin settings page. Go to [the search config](http://localhost:42110/server/admin/database/searchmodelconfig/). Either create a new one, if none exists, or update the existing one. Set the bi_encoder to `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` and the cross_encoder to `cross-encoder/ms-marco-MiniLM-L-6-v2`.
2. Regenerate your content index from all the relevant clients. This step is very important, as you'll need to re-encode all your content with the new model.

### Query Filters

Use structured query syntax to filter the entries from your knowledge base that are used for search results or chat responses.

- **Word Filter**: Get entries that include/exclude a specified term
  - Entries that contain term_to_include: `+"term_to_include"`
  - Entries that do not contain term_to_exclude: `-"term_to_exclude"`
- **Date Filter**: Get entries containing dates in YYYY-MM-DD format from specified date (range)
  - Entries from April 1st 1984: `dt:"1984-04-01"`
  - Entries after March 31st 1984: `dt>="1984-04-01"`
  - Entries before April 2nd 1984: `dt<="1984-04-01"`
- **File Filter**: Get entries from a specified file
  - Entries from incoming.org file: `file:"incoming.org"`
- Combined Example
  - `what is the meaning of life? file:"1984.org" dt>="1984-01-01" dt<="1985-01-01" -"big" -"brother"`
  - Adds all filters to the natural language query. It should return entries
    - from the file *1984.org*
    - containing dates from the year *1984*
    - excluding words *"big"* and *"brother"*
    - that best match the natural language query *"what is the meaning of life?"*
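
To make the filter syntax concrete, here is a minimal Python sketch that splits a query into its filters and the remaining natural language text. It is an illustration only and not Khoj's own parser: the regular expressions, the `ParsedQuery` class and the `parse_query` function are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for the filter syntax described above (not Khoj's parser).
FILE_FILTER = re.compile(r'file:"([^"]+)"')
DATE_FILTER = re.compile(r'dt(:|>=|<=|>|<)"(\d{4}-\d{2}-\d{2})"')
# A word filter is +"term" or -"term" at the start of the query or after whitespace.
WORD_FILTER = re.compile(r'(?<!\S)([+-])"([^"]+)"')


@dataclass
class ParsedQuery:
    text: str = ""
    files: list = field(default_factory=list)
    dates: list = field(default_factory=list)  # (operator, date) tuples
    include_words: list = field(default_factory=list)
    exclude_words: list = field(default_factory=list)


def parse_query(raw: str) -> ParsedQuery:
    """Separate the filters from the natural language part of a query."""
    parsed = ParsedQuery()
    parsed.files = FILE_FILTER.findall(raw)
    parsed.dates = DATE_FILTER.findall(raw)
    for sign, word in WORD_FILTER.findall(raw):
        target = parsed.include_words if sign == "+" else parsed.exclude_words
        target.append(word)

    # Whatever is left after removing the filters is the natural language query.
    remaining = raw
    for pattern in (FILE_FILTER, DATE_FILTER, WORD_FILTER):
        remaining = pattern.sub(" ", remaining)
    parsed.text = " ".join(remaining.split())
    return parsed
```

Calling `parse_query` on the combined example yields the file `1984.org`, two date constraints, the excluded words `big` and `brother`, and the remaining text `what is the meaning of life?`.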
13
documentation/docs/miscellaneous/credits.md
Normal file
@@ -0,0 +1,13 @@
---
sidebar_position: 4
---

# Credits
Many Open Source projects are used to power Khoj. Here are a few of them:

- [Multi-QA MiniLM Model](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [All MiniLM Model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) for Text Search. See [SBert Documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
- [OpenAI CLIP Model](https://github.com/openai/CLIP) for Image Search. See [SBert Documentation](https://www.sbert.net/examples/applications/image-search/README.html)
- Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
- [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
- [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
- [GPT4All](https://github.com/nomic-ai/gpt4all) to chat with a local LLM
25
documentation/docs/miscellaneous/performance.md
Normal file
@@ -0,0 +1,25 @@
---
sidebar_position: 2
---

# Performance

Here are some top-level performance metrics for Khoj. These are rough estimates and will vary based on your hardware and data.

### Search performance

- Semantic search using the bi-encoder is fairly fast at \<100 ms across all content types
- Reranking using the cross-encoder is slower at \<2s on 15 results. Tweak `top_k` to trade off speed for accuracy of results
- Filters in query (e.g. by file, word or date) usually add \<20ms to query latency
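
The two-stage pattern behind these numbers can be sketched in Python as below. This is a toy illustration, not Khoj's implementation: `retrieve_and_rerank`, `word_overlap` and `jaccard` are hypothetical names, and the two scoring functions are simple stand-ins for a real bi-encoder and cross-encoder. The point is that the cheap scorer runs over every entry, while the costly scorer only runs on the `top_k` candidates, so `top_k` bounds the slow work.

```python
def retrieve_and_rerank(query, entries, cheap_score, costly_score, top_k=15):
    """Rank all entries cheaply, then rerank only the best top_k with the costly scorer."""
    # Stage 1: the cheap scorer (stand-in for bi-encoder similarity) sees every entry.
    candidates = sorted(entries, key=lambda e: cheap_score(query, e), reverse=True)[:top_k]
    # Stage 2: the costly scorer (stand-in for a cross-encoder) sees only top_k entries.
    return sorted(candidates, key=lambda e: costly_score(query, e), reverse=True)


def word_overlap(query, entry):
    """Toy cheap scorer: number of words shared by the query and the entry."""
    return len(set(query.lower().split()) & set(entry.lower().split()))


def jaccard(query, entry):
    """Toy costly scorer: shared words divided by all distinct words."""
    q, e = set(query.lower().split()), set(entry.lower().split())
    return len(q & e) / len(q | e)
```

A larger `top_k` gives the second stage more candidates to reorder, at the cost of more calls to the slow scorer. A smaller `top_k` is faster but can drop entries the second stage would have ranked highly.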

### Indexing performance

- Indexing is more strongly impacted by the size of the source data
- Indexing a 100K+ line corpus of notes takes about 10 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8 GB of RAM
- Note: *It should only take this long on the first run* as the index is incrementally updated

### Miscellaneous

- Testing done on a Mac M1 and a \>100K line corpus of notes
- Search and indexing on a GPU have not been tested yet
22
documentation/docs/miscellaneous/telemetry.md
Normal file
@@ -0,0 +1,22 @@
---
sidebar_position: 1
---

# Telemetry

We collect some high-level, anonymized metadata about usage of Khoj. This includes:
- Client (Web, Emacs, Obsidian)
- API usage (Search, Chat)
- Configured content types (GitHub, Org, etc.)
- Request metadata (e.g., host, referrer)

We don't send any personal information or any information from/about your content. We only send the above metadata. This helps us prioritize feature development and understand how people are using Khoj. Don't just take our word for it -- you can see [the code here](https://github.com/khoj-ai/khoj/tree/master/src/telemetry).

## Disable Telemetry

You can opt out of telemetry at any time. To do so,
1. Open `~/.khoj/khoj.yml`
2. Set `should-log-telemetry` to `false`
3. Save the file and restart Khoj
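
If you would rather script the edit than make it by hand, the steps above could be sketched in Python as follows. This is a hedged example and not part of Khoj: `disable_telemetry` and `disable_telemetry_in_file` are hypothetical helpers. The sketch assumes the key appears on its own line as `should-log-telemetry: <value>`, at any indentation. Back up your config before trying it.

```python
import re
from pathlib import Path

# Matches a line such as "  should-log-telemetry: true", keeping indentation
# and anything that follows the value (for example a trailing comment).
TELEMETRY_LINE = re.compile(
    r"^([ \t]*should-log-telemetry[ \t]*:[ \t]*)\S+(.*)$", re.MULTILINE
)


def disable_telemetry(config_text: str) -> str:
    """Return the config text with should-log-telemetry set to false."""
    return TELEMETRY_LINE.sub(r"\g<1>false\g<2>", config_text)


def disable_telemetry_in_file(path: Path) -> None:
    """Rewrite the given config file in place, e.g. Path.home() / ".khoj" / "khoj.yml"."""
    path.write_text(disable_telemetry(path.read_text()))
```

Restart Khoj after the file is rewritten so the new setting takes effect.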

If you have any questions or concerns, please reach out to us on [Discord](https://discord.gg/BDgyabRM6e).