diff --git a/README.md b/README.md
index 84d763ec..cd1779b2 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@
***
-Khoj is a desktop application to search and chat with your notes, documents and images.
+Khoj is a web application to search and chat with your notes, documents and images.
It is an offline-first, open source AI personal assistant accessible from your Emacs, Obsidian or Web browser.
It works with jpeg, markdown, notion, org-mode, pdf files and github repositories.
@@ -41,3 +41,12 @@ It works with jpeg, markdown, notion, org-mode, pdf files and github repositorie
| Quickly retrieve relevant documents using natural language | Get answers and create content from your existing knowledge base |
| Does not need internet | Can be configured to work without internet |
| | |
+
+## Contributors
+Cheers to our awesome contributors! 🎉
+
+
+
+
+
+Made with [contrib.rocks](https://contrib.rocks).
diff --git a/docs/setup.md b/docs/setup.md
index 7c399cae..ec60a3d3 100644
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -11,6 +11,12 @@ For Installation, you can either use Docker or install Khoj locally.
### 1. Installation (Docker)
+#### Prerequisites
+1. Install Docker Engine. See [official instructions](https://docs.docker.com/engine/install/).
+2. Ensure you have Docker Compose. See [official instructions](https://docs.docker.com/compose/install/).
+
+#### Setup
+
Use the sample docker-compose [in Github](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml) to run Khoj in Docker. Start by configuring all the environment variables to your choosing. Your admin account will automatically be created based on the admin credentials in that file, so pay attention to those. To start the container, run the following command in the same directory as the docker-compose.yml file. This will automatically setup the database and run the Khoj server.
```shell
@@ -35,7 +41,16 @@ Install [Postgres.app](https://postgresapp.com/). This comes pre-installed with
#### **Windows**
-Use the [recommended installer](https://www.postgresql.org/download/windows/)
+1. Use the [recommended installer](https://www.postgresql.org/download/windows/)
+2. Follow the instructions to [install pgvector](https://github.com/pgvector/pgvector#installation) if you need to install it manually. The Unix build steps are reproduced below for convenience; on native Windows, use the `nmake`-based steps from the pgvector README instead.
+
+```bash
+cd /tmp
+git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git
+cd pgvector
+make
+make install # may need sudo
+```
#### **Linux**
From [official instructions](https://wiki.postgresql.org/wiki/Apt)
@@ -48,38 +63,30 @@ sudo apt install postgres-16 postgresql-16-pgvector
##### **From Source**
1. Follow instructions to [Install Postgres](https://www.postgresql.org/download/)
-2. Follow instructions to [Install PgVector](https://github.com/pgvector/pgvector#installation) in case you need to manually install it. Reproduced instructions below for convenience.
-
-```bash
-cd /tmp
-git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git
-cd pgvector
-make
-make install # may need sudo
-```
+2. Follow the instructions to [install pgvector](https://github.com/pgvector/pgvector#windows) if you need to install it manually. Windows support for `pgvector` is currently experimental, so we recommend using Docker instead.
##### Create the Khoj database
-Make sure to update your environment variables to match your Postgres configuration if you're using a different name. The default values should work for most people.
+Make sure to update your environment variables to match your Postgres configuration if you're using a different name. The default values should work for most people. When prompted for a password, you can use the default password `postgres` or set one of your own. Whichever you choose, set the `POSTGRES_PASSWORD` environment variable to the same value.
#### **MacOS**
```bash
-createdb khoj -U postgres
+createdb khoj -U postgres --password
```
#### **Windows**
```bash
-createdb khoj -U postgres
+createdb khoj -U postgres --password
```
#### **Linux**
```bash
-sudo -u postgres createdb khoj
+sudo -u postgres createdb khoj --password
```
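The password wiring described above can be sketched in plain Python. The connection-URL format is standard Postgres, but the helper itself is purely illustrative, not part of Khoj:

```python
import os

def build_database_url(user="postgres", host="localhost", port=5432, dbname="khoj"):
    """Assemble a Postgres connection URL using the POSTGRES_PASSWORD env var."""
    # POSTGRES_PASSWORD must match the password chosen when creating the database
    password = os.environ.get("POSTGRES_PASSWORD", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"

os.environ["POSTGRES_PASSWORD"] = "postgres"
print(build_database_url())  # postgresql://postgres:postgres@localhost:5432/khoj
```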
@@ -139,7 +146,9 @@ You can use our desktop executables to select file paths and folders to index. Y
To use the desktop client, you need to go to your Khoj server's settings page (http://localhost:42110/config) and copy the API key. Then, paste it into the desktop client's settings page. Once you've done that, you can select files and folders to index.
### 3. Configure
-1. Go to http://localhost:42110/server/admin and login with your admin credentials. Go to the ChatModelOptions if you want to add additional models for chat.
+1. Go to http://localhost:42110/server/admin and login with your admin credentials.
+ 1. Go to [OpenAI settings](http://localhost:42110/server/admin/database/openaiprocessorconversationconfig/) in the server admin settings to add an OpenAI processor conversation config. This is where you set your API key. Alternatively, go to the [offline chat settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and create a new setting with `Enabled` set to `True`.
+ 2. Go to the ChatModelOptions if you want to add additional models for chat. For example, you can specify `gpt-4` if you're using OpenAI or `mistral-7b-instruct-v0.1.Q4_0.gguf` if you're using offline chat. Make sure to set the model type to `OpenAI` or `Offline` respectively.
1. Select files and folders to index [using the desktop client](./setup.md?id=_2-download-the-desktop-client). When you click 'Save', the files will be sent to your server for indexing.
- Select Notion workspaces and Github repositories to index using the web interface.
diff --git a/pyproject.toml b/pyproject.toml
index 63a50fac..42adf209 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -75,6 +75,7 @@ dependencies = [
"tzdata == 2023.3",
"rapidocr-onnxruntime == 1.3.8",
"stripe == 7.3.0",
+ "openai-whisper >= 20231117",
]
dynamic = ["version"]
diff --git a/src/interface/desktop/assets/icons/microphone-solid.svg b/src/interface/desktop/assets/icons/microphone-solid.svg
new file mode 100644
index 00000000..3fc4b91d
--- /dev/null
+++ b/src/interface/desktop/assets/icons/microphone-solid.svg
@@ -0,0 +1 @@
+
diff --git a/src/interface/desktop/assets/icons/stop-solid.svg b/src/interface/desktop/assets/icons/stop-solid.svg
new file mode 100644
index 00000000..a9aaba28
--- /dev/null
+++ b/src/interface/desktop/assets/icons/stop-solid.svg
@@ -0,0 +1,37 @@
+
+
diff --git a/src/interface/desktop/chat.html b/src/interface/desktop/chat.html
index 35bc7422..120f6647 100644
--- a/src/interface/desktop/chat.html
+++ b/src/interface/desktop/chat.html
@@ -292,14 +292,13 @@
.then(response => {
const reader = response.body.getReader();
const decoder = new TextDecoder();
+ let rawResponse = "";
let references = null;
function readStream() {
reader.read().then(({ done, value }) => {
if (done) {
- // Evaluate the contents of new_response_text.innerHTML after all the data has been streamed
- const currentHTML = newResponseText.innerHTML;
- newResponseText.innerHTML = formatHTMLMessage(currentHTML);
+ // Append any references after all the data has been streamed
newResponseText.appendChild(references);
document.getElementById("chat-body").scrollTop = document.getElementById("chat-body").scrollHeight;
return;
@@ -310,14 +309,15 @@
if (chunk.includes("### compiled references:")) {
const additionalResponse = chunk.split("### compiled references:")[0];
- newResponseText.innerHTML += additionalResponse;
+ rawResponse += additionalResponse;
+ newResponseText.innerHTML = "";
+ newResponseText.appendChild(formatHTMLMessage(rawResponse));
const rawReference = chunk.split("### compiled references:")[1];
const rawReferenceAsJson = JSON.parse(rawReference);
references = document.createElement('div');
references.classList.add("references");
-
let referenceExpandButton = document.createElement('button');
referenceExpandButton.classList.add("reference-expand-button");
@@ -374,7 +374,10 @@
}
} else {
// If the chunk is not a JSON object, just display it as is
- newResponseText.innerHTML += chunk;
+ rawResponse += chunk;
+ newResponseText.innerHTML = "";
+ newResponseText.appendChild(formatHTMLMessage(rawResponse));
+
readStream();
}
}
@@ -529,6 +532,18 @@
}
}
+ function flashStatusInChatInput(message) {
+ // Get chat input element and original placeholder
+ let chatInput = document.getElementById("chat-input");
+ let originalPlaceholder = chatInput.placeholder;
+ // Set placeholder to message
+ chatInput.placeholder = message;
+ // Reset placeholder after 2 seconds
+ setTimeout(() => {
+ chatInput.placeholder = originalPlaceholder;
+ }, 2000);
+ }
+
async function clearConversationHistory() {
let chatInput = document.getElementById("chat-input");
let originalPlaceholder = chatInput.placeholder;
@@ -543,17 +558,71 @@
.then(data => {
chatBody.innerHTML = "";
loadChat();
- chatInput.placeholder = "Cleared conversation history";
+ flashStatusInChatInput("🗑 Cleared conversation history");
})
.catch(err => {
- chatInput.placeholder = "Failed to clear conversation history";
+ flashStatusInChatInput("⛔️ Failed to clear conversation history");
})
- .finally(() => {
- setTimeout(() => {
- chatInput.placeholder = originalPlaceholder;
- }, 2000);
- });
}
+
+ let mediaRecorder;
+ async function speechToText() {
+ const speakButtonImg = document.getElementById('speak-button-img');
+ const chatInput = document.getElementById('chat-input');
+
+ const hostURL = await window.hostURLAPI.getURL();
+ let url = `${hostURL}/api/transcribe?client=desktop`;
+ const khojToken = await window.tokenAPI.getToken();
+ const headers = { 'Authorization': `Bearer ${khojToken}` };
+
+ const sendToServer = (audioBlob) => {
+ const formData = new FormData();
+ formData.append('file', audioBlob);
+
+ fetch(url, { method: 'POST', body: formData, headers})
+ .then(response => response.ok ? response.json() : Promise.reject(response))
+ .then(data => { chatInput.value += data.text; })
+ .catch(err => {
+ err.status == 422
+ ? flashStatusInChatInput("⛔️ Configure speech-to-text model on server.")
+ : flashStatusInChatInput("⛔️ Failed to transcribe audio")
+ });
+ };
+
+ const handleRecording = (stream) => {
+ const audioChunks = [];
+ const recordingConfig = { mimeType: 'audio/webm' };
+ mediaRecorder = new MediaRecorder(stream, recordingConfig);
+
+ mediaRecorder.addEventListener("dataavailable", function(event) {
+ if (event.data.size > 0) audioChunks.push(event.data);
+ });
+
+ mediaRecorder.addEventListener("stop", function() {
+ const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
+ sendToServer(audioBlob);
+ });
+
+ mediaRecorder.start();
+ speakButtonImg.src = './assets/icons/stop-solid.svg';
+ speakButtonImg.alt = 'Stop Transcription';
+ };
+
+ // Toggle recording
+ if (!mediaRecorder || mediaRecorder.state === 'inactive') {
+ navigator.mediaDevices
+ .getUserMedia({ audio: true })
+ .then(handleRecording)
+ .catch((e) => {
+ flashStatusInChatInput("⛔️ Failed to access microphone");
+ });
+ } else if (mediaRecorder.state === 'recording') {
+ mediaRecorder.stop();
+ speakButtonImg.src = './assets/icons/microphone-solid.svg';
+ speakButtonImg.alt = 'Transcribe';
+ }
+ }
+
@@ -582,8 +651,11 @@
-
@@ -633,7 +705,6 @@
.chat-message.you {
margin-right: auto;
text-align: right;
- white-space: pre-line;
}
/* basic style chat message text */
.chat-message-text {
@@ -650,7 +721,6 @@
color: var(--primary-inverse);
background: var(--primary);
margin-left: auto;
- white-space: pre-line;
}
/* Spinner symbol when the chat message is loading */
.spinner {
@@ -707,7 +777,7 @@
}
#input-row {
display: grid;
- grid-template-columns: auto 32px;
+ grid-template-columns: auto 32px 32px;
grid-column-gap: 10px;
grid-row-gap: 10px;
background: #f9fafc
diff --git a/src/interface/obsidian/src/chat_modal.ts b/src/interface/obsidian/src/chat_modal.ts
index fc6d5a48..16c5614f 100644
--- a/src/interface/obsidian/src/chat_modal.ts
+++ b/src/interface/obsidian/src/chat_modal.ts
@@ -1,4 +1,4 @@
-import { App, Modal, request, setIcon } from 'obsidian';
+import { App, Modal, RequestUrlParam, request, requestUrl, setIcon } from 'obsidian';
import { KhojSetting } from 'src/settings';
import fetch from "node-fetch";
@@ -51,6 +51,16 @@ export class KhojChatModal extends Modal {
})
chatInput.addEventListener('change', (event) => { this.result = (event.target).value });
+ let transcribe = inputRow.createEl("button", {
+ text: "Transcribe",
+ attr: {
+ id: "khoj-transcribe",
+ class: "khoj-transcribe khoj-input-row-button",
+ },
+ })
+ transcribe.addEventListener('click', async (_) => { await this.speechToText() });
+ setIcon(transcribe, "mic");
+
let clearChat = inputRow.createEl("button", {
text: "Clear History",
attr: {
@@ -205,9 +215,19 @@ export class KhojChatModal extends Modal {
}
}
- async clearConversationHistory() {
+ flashStatusInChatInput(message: string) {
+ // Get chat input element and original placeholder
let chatInput = this.contentEl.getElementsByClassName("khoj-chat-input")[0];
let originalPlaceholder = chatInput.placeholder;
+ // Set placeholder to message
+ chatInput.placeholder = message;
+ // Reset placeholder after 2 seconds
+ setTimeout(() => {
+ chatInput.placeholder = originalPlaceholder;
+ }, 2000);
+ }
+
+ async clearConversationHistory() {
let chatBody = this.contentEl.getElementsByClassName("khoj-chat-body")[0];
let response = await request({
@@ -224,15 +244,84 @@ export class KhojChatModal extends Modal {
// If conversation history is cleared successfully, clear chat logs from modal
chatBody.innerHTML = "";
await this.getChatHistory();
- chatInput.placeholder = result.message;
+ this.flashStatusInChatInput(result.message);
}
} catch (err) {
- chatInput.placeholder = "Failed to clear conversation history";
- } finally {
- // Reset to original placeholder text after some time
- setTimeout(() => {
- chatInput.placeholder = originalPlaceholder;
- }, 2000);
+ this.flashStatusInChatInput("Failed to clear conversation history");
+ }
+ }
+
+ mediaRecorder: MediaRecorder | undefined;
+ async speechToText() {
+ const transcribeButton = this.contentEl.getElementsByClassName("khoj-transcribe")[0];
+ const chatInput = this.contentEl.getElementsByClassName("khoj-chat-input")[0];
+
+ const generateRequestBody = async (audioBlob: Blob, boundary_string: string) => {
+ const boundary = `------${boundary_string}`;
+ const chunks: ArrayBuffer[] = [];
+
+ chunks.push(new TextEncoder().encode(`${boundary}\r\n`));
+ chunks.push(new TextEncoder().encode(`Content-Disposition: form-data; name="file"; filename="blob"\r\nContent-Type: application/octet-stream\r\n\r\n`));
+ chunks.push(await audioBlob.arrayBuffer());
+ chunks.push(new TextEncoder().encode('\r\n'));
+
+ chunks.push(new TextEncoder().encode(`${boundary}--\r\n`));
+ return await new Blob(chunks).arrayBuffer();
+ };
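Since Obsidian's `requestUrl` cannot take a `FormData` object, the helper above frames the multipart body by hand. The framing can be sketched in Python (an illustrative stand-in, not the plugin code); note the body delimiter is the `contentType` boundary with an extra `--` prepended:

```python
def build_multipart_body(payload: bytes, boundary_string: str) -> bytes:
    # The header declares boundary=----{boundary_string}; body parts prepend "--" to it
    delimiter = f"------{boundary_string}".encode()
    return b"".join([
        delimiter + b"\r\n",
        b'Content-Disposition: form-data; name="file"; filename="blob"\r\n'
        b"Content-Type: application/octet-stream\r\n\r\n",
        payload,
        b"\r\n",
        delimiter + b"--\r\n",
    ])

body = build_multipart_body(b"<audio bytes>", "Boundary123")
```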
+
+ const sendToServer = async (audioBlob: Blob) => {
+ const boundary_string = `Boundary${Math.random().toString(36).slice(2)}`;
+ const requestBody = await generateRequestBody(audioBlob, boundary_string);
+
+ const response = await requestUrl({
+ url: `${this.setting.khojUrl}/api/transcribe?client=obsidian`,
+ method: 'POST',
+ headers: { "Authorization": `Bearer ${this.setting.khojApiKey}` },
+ contentType: `multipart/form-data; boundary=----${boundary_string}`,
+ body: requestBody,
+ });
+
+ // Parse response from Khoj backend
+ if (response.status === 200) {
+ chatInput.value += response.json.text;
+ } else if (response.status === 422) {
+ throw new Error("⛔️ Configure speech-to-text model on server.");
+ } else {
+ throw new Error("⛔️ Failed to transcribe audio");
+ }
+ };
+
+ const handleRecording = (stream: MediaStream) => {
+ const audioChunks: Blob[] = [];
+ const recordingConfig = { mimeType: 'audio/webm' };
+ this.mediaRecorder = new MediaRecorder(stream, recordingConfig);
+
+ this.mediaRecorder.addEventListener("dataavailable", function(event) {
+ if (event.data.size > 0) audioChunks.push(event.data);
+ });
+
+ this.mediaRecorder.addEventListener("stop", async function() {
+ const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
+ await sendToServer(audioBlob);
+ });
+
+ this.mediaRecorder.start();
+ setIcon(transcribeButton, "mic-off");
+ };
+
+ // Toggle recording
+ if (!this.mediaRecorder || this.mediaRecorder.state === 'inactive') {
+ navigator.mediaDevices
+ .getUserMedia({ audio: true })
+ .then(handleRecording)
+ .catch((e) => {
+ this.flashStatusInChatInput("⛔️ Failed to access microphone");
+ });
+ } else if (this.mediaRecorder.state === 'recording') {
+ this.mediaRecorder.stop();
+ setIcon(transcribeButton, "mic");
}
}
}
diff --git a/src/interface/obsidian/styles.css b/src/interface/obsidian/styles.css
index 95a304f1..ff2dee8a 100644
--- a/src/interface/obsidian/styles.css
+++ b/src/interface/obsidian/styles.css
@@ -112,7 +112,7 @@ If your plugin does not need CSS, delete this file.
}
.khoj-input-row {
display: grid;
- grid-template-columns: auto 32px;
+ grid-template-columns: auto 32px 32px;
grid-column-gap: 10px;
grid-row-gap: 10px;
background: var(--background-primary);
diff --git a/src/khoj/database/adapters/__init__.py b/src/khoj/database/adapters/__init__.py
index 5bfab8d3..12a127e9 100644
--- a/src/khoj/database/adapters/__init__.py
+++ b/src/khoj/database/adapters/__init__.py
@@ -28,6 +28,7 @@ from khoj.database.models import (
OfflineChatProcessorConversationConfig,
OpenAIProcessorConversationConfig,
SearchModelConfig,
+ SpeechToTextModelOptions,
Subscription,
UserConversationConfig,
OpenAIProcessorConversationConfig,
@@ -370,6 +371,10 @@ class ConversationAdapters:
async def get_openai_chat_config():
return await OpenAIProcessorConversationConfig.objects.filter().afirst()
+ @staticmethod
+ async def get_speech_to_text_config():
+ return await SpeechToTextModelOptions.objects.filter().afirst()
+
@staticmethod
async def aget_conversation_starters(user: KhojUser):
all_questions = []
diff --git a/src/khoj/database/admin.py b/src/khoj/database/admin.py
index e1095ece..2213fb6e 100644
--- a/src/khoj/database/admin.py
+++ b/src/khoj/database/admin.py
@@ -9,6 +9,7 @@ from khoj.database.models import (
OpenAIProcessorConversationConfig,
OfflineChatProcessorConversationConfig,
SearchModelConfig,
+ SpeechToTextModelOptions,
Subscription,
ReflectiveQuestion,
)
@@ -16,6 +17,7 @@ from khoj.database.models import (
admin.site.register(KhojUser, UserAdmin)
admin.site.register(ChatModelOptions)
+admin.site.register(SpeechToTextModelOptions)
admin.site.register(OpenAIProcessorConversationConfig)
admin.site.register(OfflineChatProcessorConversationConfig)
admin.site.register(SearchModelConfig)
diff --git a/src/khoj/database/migrations/0021_speechtotextmodeloptions_and_more.py b/src/khoj/database/migrations/0021_speechtotextmodeloptions_and_more.py
new file mode 100644
index 00000000..37337791
--- /dev/null
+++ b/src/khoj/database/migrations/0021_speechtotextmodeloptions_and_more.py
@@ -0,0 +1,42 @@
+# Generated by Django 4.2.7 on 2023-11-26 13:54
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+ dependencies = [
+ ("database", "0020_reflectivequestion"),
+ ]
+
+ operations = [
+ migrations.CreateModel(
+ name="SpeechToTextModelOptions",
+ fields=[
+ ("id", models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name="ID")),
+ ("created_at", models.DateTimeField(auto_now_add=True)),
+ ("updated_at", models.DateTimeField(auto_now=True)),
+ ("model_name", models.CharField(default="base", max_length=200)),
+ (
+ "model_type",
+ models.CharField(
+ choices=[("openai", "Openai"), ("offline", "Offline")], default="offline", max_length=200
+ ),
+ ),
+ ],
+ options={
+ "abstract": False,
+ },
+ ),
+ migrations.AlterField(
+ model_name="chatmodeloptions",
+ name="chat_model",
+ field=models.CharField(default="mistral-7b-instruct-v0.1.Q4_0.gguf", max_length=200),
+ ),
+ migrations.AlterField(
+ model_name="chatmodeloptions",
+ name="model_type",
+ field=models.CharField(
+ choices=[("openai", "Openai"), ("offline", "Offline")], default="offline", max_length=200
+ ),
+ ),
+ ]
diff --git a/src/khoj/database/models/__init__.py b/src/khoj/database/models/__init__.py
index b0463df8..82348fbe 100644
--- a/src/khoj/database/models/__init__.py
+++ b/src/khoj/database/models/__init__.py
@@ -120,6 +120,15 @@ class OfflineChatProcessorConversationConfig(BaseModel):
enabled = models.BooleanField(default=False)
+class SpeechToTextModelOptions(BaseModel):
+ class ModelType(models.TextChoices):
+ OPENAI = "openai"
+ OFFLINE = "offline"
+
+ model_name = models.CharField(max_length=200, default="base")
+ model_type = models.CharField(max_length=200, choices=ModelType.choices, default=ModelType.OFFLINE)
+
+
class ChatModelOptions(BaseModel):
class ModelType(models.TextChoices):
OPENAI = "openai"
@@ -127,8 +136,8 @@ class ChatModelOptions(BaseModel):
max_prompt_size = models.IntegerField(default=None, null=True, blank=True)
tokenizer = models.CharField(max_length=200, default=None, null=True, blank=True)
- chat_model = models.CharField(max_length=200, default=None, null=True, blank=True)
- model_type = models.CharField(max_length=200, choices=ModelType.choices, default=ModelType.OPENAI)
+ chat_model = models.CharField(max_length=200, default="mistral-7b-instruct-v0.1.Q4_0.gguf")
+ model_type = models.CharField(max_length=200, choices=ModelType.choices, default=ModelType.OFFLINE)
class UserConversationConfig(BaseModel):
diff --git a/src/khoj/interface/web/assets/icons/microphone-solid.svg b/src/khoj/interface/web/assets/icons/microphone-solid.svg
new file mode 100644
index 00000000..3fc4b91d
--- /dev/null
+++ b/src/khoj/interface/web/assets/icons/microphone-solid.svg
@@ -0,0 +1 @@
+
diff --git a/src/khoj/interface/web/assets/icons/stop-solid.svg b/src/khoj/interface/web/assets/icons/stop-solid.svg
new file mode 100644
index 00000000..a9aaba28
--- /dev/null
+++ b/src/khoj/interface/web/assets/icons/stop-solid.svg
@@ -0,0 +1,37 @@
+
+
diff --git a/src/khoj/interface/web/chat.html b/src/khoj/interface/web/chat.html
index df39ca4f..de6b899c 100644
--- a/src/khoj/interface/web/chat.html
+++ b/src/khoj/interface/web/chat.html
@@ -330,15 +330,13 @@ To get started, just start typing below. You can also type / to see a list of co
.then(response => {
const reader = response.body.getReader();
const decoder = new TextDecoder();
+ let rawResponse = "";
let references = null;
function readStream() {
reader.read().then(({ done, value }) => {
if (done) {
- // Evaluate the contents of new_response_text.innerHTML after all the data has been streamed
- const currentHTML = newResponseText.innerHTML;
- newResponseText.innerHTML = "";
- newResponseText.appendChild(formatHTMLMessage(currentHTML));
+ // Append any references after all the data has been streamed
if (references != null) {
newResponseText.appendChild(references);
}
@@ -352,7 +350,9 @@ To get started, just start typing below. You can also type / to see a list of co
if (chunk.includes("### compiled references:")) {
const additionalResponse = chunk.split("### compiled references:")[0];
- newResponseText.innerHTML += additionalResponse;
+ rawResponse += additionalResponse;
+ newResponseText.innerHTML = "";
+ newResponseText.appendChild(formatHTMLMessage(rawResponse));
const rawReference = chunk.split("### compiled references:")[1];
const rawReferenceAsJson = JSON.parse(rawReference);
@@ -362,7 +362,6 @@ To get started, just start typing below. You can also type / to see a list of co
let referenceExpandButton = document.createElement('button');
referenceExpandButton.classList.add("reference-expand-button");
-
let referenceSection = document.createElement('div');
referenceSection.classList.add("reference-section");
referenceSection.classList.add("collapsed");
@@ -416,7 +415,9 @@ To get started, just start typing below. You can also type / to see a list of co
}
} else {
// If the chunk is not a JSON object, just display it as is
- newResponseText.innerHTML += chunk;
+ rawResponse += chunk;
+ newResponseText.innerHTML = "";
+ newResponseText.appendChild(formatHTMLMessage(rawResponse));
readStream();
}
}
@@ -557,6 +558,18 @@ To get started, just start typing below. You can also type / to see a list of co
}
}
+ function flashStatusInChatInput(message) {
+ // Get chat input element and original placeholder
+ let chatInput = document.getElementById("chat-input");
+ let originalPlaceholder = chatInput.placeholder;
+ // Set placeholder to message
+ chatInput.placeholder = message;
+ // Reset placeholder after 2 seconds
+ setTimeout(() => {
+ chatInput.placeholder = originalPlaceholder;
+ }, 2000);
+ }
+
function clearConversationHistory() {
let chatInput = document.getElementById("chat-input");
let originalPlaceholder = chatInput.placeholder;
@@ -567,17 +580,65 @@ To get started, just start typing below. You can also type / to see a list of co
.then(data => {
chatBody.innerHTML = "";
loadChat();
- chatInput.placeholder = "Cleared conversation history";
+ flashStatusInChatInput("🗑 Cleared conversation history");
})
.catch(err => {
- chatInput.placeholder = "Failed to clear conversation history";
- })
- .finally(() => {
- setTimeout(() => {
- chatInput.placeholder = originalPlaceholder;
- }, 2000);
+ flashStatusInChatInput("⛔️ Failed to clear conversation history");
});
}
+
+ let mediaRecorder;
+ function speechToText() {
+ const speakButtonImg = document.getElementById('speak-button-img');
+ const chatInput = document.getElementById('chat-input');
+
+ const sendToServer = (audioBlob) => {
+ const formData = new FormData();
+ formData.append('file', audioBlob);
+
+ fetch('/api/transcribe?client=web', { method: 'POST', body: formData })
+ .then(response => response.ok ? response.json() : Promise.reject(response))
+ .then(data => { chatInput.value += data.text; })
+ .catch(err => {
+ err.status == 422
+ ? flashStatusInChatInput("⛔️ Configure speech-to-text model on server.")
+ : flashStatusInChatInput("⛔️ Failed to transcribe audio")
+ });
+ };
+
+ const handleRecording = (stream) => {
+ const audioChunks = [];
+ const recordingConfig = { mimeType: 'audio/webm' };
+ mediaRecorder = new MediaRecorder(stream, recordingConfig);
+
+ mediaRecorder.addEventListener("dataavailable", function(event) {
+ if (event.data.size > 0) audioChunks.push(event.data);
+ });
+
+ mediaRecorder.addEventListener("stop", function() {
+ const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
+ sendToServer(audioBlob);
+ });
+
+ mediaRecorder.start();
+ speakButtonImg.src = '/static/assets/icons/stop-solid.svg';
+ speakButtonImg.alt = 'Stop Transcription';
+ };
+
+ // Toggle recording
+ if (!mediaRecorder || mediaRecorder.state === 'inactive') {
+ navigator.mediaDevices
+ .getUserMedia({ audio: true })
+ .then(handleRecording)
+ .catch((e) => {
+ flashStatusInChatInput("⛔️ Failed to access microphone");
+ });
+ } else if (mediaRecorder.state === 'recording') {
+ mediaRecorder.stop();
+ speakButtonImg.src = '/static/assets/icons/microphone-solid.svg';
+ speakButtonImg.alt = 'Transcribe';
+ }
+ }
@@ -598,8 +659,11 @@ To get started, just start typing below. You can also type / to see a list of co
+
+
+
-
+
@@ -763,7 +827,6 @@ To get started, just start typing below. You can also type / to see a list of co
.chat-message.you {
margin-right: auto;
text-align: right;
- white-space: pre-line;
}
/* basic style chat message text */
.chat-message-text {
@@ -780,7 +843,6 @@ To get started, just start typing below. You can also type / to see a list of co
color: var(--primary-inverse);
background: var(--primary);
margin-left: auto;
- white-space: pre-line;
}
/* Spinner symbol when the chat message is loading */
.spinner {
@@ -829,6 +891,7 @@ To get started, just start typing below. You can also type / to see a list of co
#chat-footer {
padding: 0;
+ margin: 8px;
display: grid;
grid-template-columns: minmax(70px, 100%);
grid-column-gap: 10px;
@@ -836,7 +899,7 @@ To get started, just start typing below. You can also type / to see a list of co
}
#input-row {
display: grid;
- grid-template-columns: auto 32px;
+ grid-template-columns: auto 32px 32px;
grid-column-gap: 10px;
grid-row-gap: 10px;
background: #f9fafc
diff --git a/src/khoj/processor/conversation/gpt4all/__init__.py b/src/khoj/processor/conversation/offline/__init__.py
similarity index 100%
rename from src/khoj/processor/conversation/gpt4all/__init__.py
rename to src/khoj/processor/conversation/offline/__init__.py
diff --git a/src/khoj/processor/conversation/gpt4all/chat_model.py b/src/khoj/processor/conversation/offline/chat_model.py
similarity index 100%
rename from src/khoj/processor/conversation/gpt4all/chat_model.py
rename to src/khoj/processor/conversation/offline/chat_model.py
diff --git a/src/khoj/processor/conversation/gpt4all/utils.py b/src/khoj/processor/conversation/offline/utils.py
similarity index 100%
rename from src/khoj/processor/conversation/gpt4all/utils.py
rename to src/khoj/processor/conversation/offline/utils.py
diff --git a/src/khoj/processor/conversation/offline/whisper.py b/src/khoj/processor/conversation/offline/whisper.py
new file mode 100644
index 00000000..56d2aaf5
--- /dev/null
+++ b/src/khoj/processor/conversation/offline/whisper.py
@@ -0,0 +1,17 @@
+# External Packages
+from asgiref.sync import sync_to_async
+import whisper
+
+# Internal Packages
+from khoj.utils import state
+
+
+async def transcribe_audio_offline(audio_filename: str, model: str) -> str:
+ """
+ Transcribe audio file offline using Whisper
+ """
+ # Lazily load the Whisper model, then transcribe the audio locally
+ if not state.whisper_model:
+ state.whisper_model = whisper.load_model(model)
+ response = await sync_to_async(state.whisper_model.transcribe)(audio_filename)
+ return response["text"]
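The `sync_to_async` call above pushes the blocking Whisper transcription onto a worker thread so the async server's event loop stays responsive. The same pattern using only the standard library (with a toy stand-in for the heavy Whisper call):

```python
import asyncio
import time

def blocking_transcribe(audio_path: str) -> str:
    # Stand-in for a CPU-heavy whisper_model.transcribe() call
    time.sleep(0.01)
    return f"transcript of {audio_path}"

async def transcribe(audio_path: str) -> str:
    # Offload the blocking work to a thread, analogous to sync_to_async(...)
    return await asyncio.to_thread(blocking_transcribe, audio_path)

print(asyncio.run(transcribe("voice.webm")))  # transcript of voice.webm
```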
diff --git a/src/khoj/processor/conversation/openai/whisper.py b/src/khoj/processor/conversation/openai/whisper.py
new file mode 100644
index 00000000..72834d92
--- /dev/null
+++ b/src/khoj/processor/conversation/openai/whisper.py
@@ -0,0 +1,15 @@
+# Standard Packages
+from io import BufferedReader
+
+# External Packages
+from asgiref.sync import sync_to_async
+import openai
+
+
+async def transcribe_audio(audio_file: BufferedReader, model, api_key) -> str:
+ """
+ Transcribe audio file using Whisper model via OpenAI's API
+ """
+ # Send the audio data to the Whisper API
+ response = await sync_to_async(openai.Audio.translate)(model=model, file=audio_file, api_key=api_key)
+ return response["text"]
diff --git a/src/khoj/routers/api.py b/src/khoj/routers/api.py
index 55df06d8..cb0606f1 100644
--- a/src/khoj/routers/api.py
+++ b/src/khoj/routers/api.py
@@ -3,13 +3,14 @@ import concurrent.futures
import json
import logging
import math
+import os
import time
from typing import Any, Dict, List, Optional, Union
-
-from asgiref.sync import sync_to_async
+import uuid
# External Packages
-from fastapi import APIRouter, Depends, HTTPException, Request
+from fastapi import APIRouter, Depends, File, HTTPException, Request, UploadFile
+from asgiref.sync import sync_to_async
from fastapi.requests import Request
from fastapi.responses import Response, StreamingResponse
from starlette.authentication import requires
@@ -29,8 +30,10 @@ from khoj.database.models import (
LocalPlaintextConfig,
NotionConfig,
)
-from khoj.processor.conversation.gpt4all.chat_model import extract_questions_offline
+from khoj.processor.conversation.offline.chat_model import extract_questions_offline
+from khoj.processor.conversation.offline.whisper import transcribe_audio_offline
from khoj.processor.conversation.openai.gpt import extract_questions
+from khoj.processor.conversation.openai.whisper import transcribe_audio
from khoj.processor.conversation.prompts import help_message, no_entries_found
from khoj.processor.tools.online_search import search_with_google
from khoj.routers.helpers import (
@@ -585,6 +588,59 @@ async def chat_options(
return Response(content=json.dumps(cmd_options), media_type="application/json", status_code=200)
+@api.post("/transcribe")
+@requires(["authenticated"])
+async def transcribe(request: Request, common: CommonQueryParams, file: UploadFile = File(...)):
+ user: KhojUser = request.user.object
+ audio_filename = f"{user.uuid}-{str(uuid.uuid4())}.webm"
+    user_message: Optional[str] = None
+
+ # If the file is too large, return an unprocessable entity error
+    if file.size and file.size > 10 * 1024 * 1024:
+        logger.warning(f"Audio file too large to transcribe. Audio file size: {file.size}. Exceeds 10MB limit.")
+        return Response(content="Audio size larger than 10MB limit", status_code=422)
+
+ # Transcribe the audio from the request
+    try:
+        # Store the audio from the request in a temporary file
+        audio_data = await file.read()
+        with open(audio_filename, "wb") as audio_file_writer:
+            audio_file_writer.write(audio_data)
+
+        # Send the audio data to the configured speech to text model
+        speech_to_text_config = await ConversationAdapters.get_speech_to_text_config()
+        openai_chat_config = await ConversationAdapters.get_openai_chat_config()
+        with open(audio_filename, "rb") as audio_file:
+            if not speech_to_text_config:
+                # If the user has not configured a speech to text model, return an unprocessable entity error
+                status_code = 422
+            elif openai_chat_config and speech_to_text_config.model_type == ChatModelOptions.ModelType.OPENAI:
+                api_key = openai_chat_config.api_key
+                speech2text_model = speech_to_text_config.model_name
+                user_message = await transcribe_audio(audio_file, model=speech2text_model, api_key=api_key)
+            elif speech_to_text_config.model_type == ChatModelOptions.ModelType.OFFLINE:
+                speech2text_model = speech_to_text_config.model_name
+                user_message = await transcribe_audio_offline(audio_filename, model=speech2text_model)
+    finally:
+        # Delete the temporary audio file; the with blocks above close the file handles
+        if os.path.exists(audio_filename):
+            os.remove(audio_filename)
+
+    if user_message is None:
+        return Response(status_code=422 if not speech_to_text_config else 500)
+
+ update_telemetry_state(
+ request=request,
+ telemetry_type="api",
+ api="transcribe",
+ **common.__dict__,
+ )
+
+ # Return the spoken text
+ content = json.dumps({"text": user_message})
+ return Response(content=content, media_type="application/json", status_code=200)
+
+
@api.get("/chat", response_class=Response)
@requires(["authenticated"])
async def chat(
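The handler above writes the upload to a uuid-named file in the working directory and removes it in a `finally` block. A sketch of an alternative using the stdlib `tempfile` module, which picks a collision-free path for you (the helper name is ours, not part of the diff):

```python
import os
import tempfile


def save_to_temp_audio_file(audio_data: bytes, suffix: str = ".webm") -> str:
    """Write uploaded audio bytes to a unique temporary file and return its path."""
    # delete=False so the file survives the close and can be re-opened for reading
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(audio_data)
        return f.name


path = save_to_temp_audio_file(b"fake-webm-bytes")
try:
    with open(path, "rb") as f:
        data = f.read()
    print(data == b"fake-webm-bytes")  # True
finally:
    os.remove(path)
```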
diff --git a/src/khoj/routers/helpers.py b/src/khoj/routers/helpers.py
index 273de15b..39448d1a 100644
--- a/src/khoj/routers/helpers.py
+++ b/src/khoj/routers/helpers.py
@@ -17,7 +17,7 @@ from asgiref.sync import sync_to_async
from khoj.database.adapters import ConversationAdapters, EntryAdapters
from khoj.database.models import KhojUser, Subscription
from khoj.processor.conversation import prompts
-from khoj.processor.conversation.gpt4all.chat_model import converse_offline, send_message_to_model_offline
+from khoj.processor.conversation.offline.chat_model import converse_offline, send_message_to_model_offline
from khoj.processor.conversation.openai.gpt import converse, send_message_to_model
from khoj.processor.conversation.utils import ThreadedGenerator, message_to_log
diff --git a/src/khoj/utils/config.py b/src/khoj/utils/config.py
index 7795d695..abda12b6 100644
--- a/src/khoj/utils/config.py
+++ b/src/khoj/utils/config.py
@@ -11,7 +11,7 @@ from typing import TYPE_CHECKING, List, Optional, Union, Any
import torch
# Internal Packages
-from khoj.processor.conversation.gpt4all.utils import download_model
+from khoj.processor.conversation.offline.utils import download_model
logger = logging.getLogger(__name__)
@@ -80,7 +80,7 @@ class GPT4AllProcessorConfig:
class GPT4AllProcessorModel:
def __init__(
self,
- chat_model: str = "llama-2-7b-chat.ggmlv3.q4_0.bin",
+ chat_model: str = "mistral-7b-instruct-v0.1.Q4_0.gguf",
):
self.chat_model = chat_model
self.loaded_model = None
diff --git a/src/khoj/utils/initialization.py b/src/khoj/utils/initialization.py
index ffc4d47e..313b18fc 100644
--- a/src/khoj/utils/initialization.py
+++ b/src/khoj/utils/initialization.py
@@ -6,6 +6,7 @@ from khoj.database.models import (
OfflineChatProcessorConversationConfig,
OpenAIProcessorConversationConfig,
ChatModelOptions,
+ SpeechToTextModelOptions,
)
from khoj.utils.constants import default_offline_chat_model, default_online_chat_model
@@ -73,10 +74,9 @@ def initialization():
except ModuleNotFoundError as e:
logger.warning("Offline models are not supported on this device.")
- use_openai_model = input("Use OpenAI chat model? (y/n): ")
-
+ use_openai_model = input("Use OpenAI models? (y/n): ")
if use_openai_model == "y":
- logger.info("🗣️ Setting up OpenAI chat model")
+ logger.info("🗣️ Setting up your OpenAI configuration")
api_key = input("Enter your OpenAI API key: ")
OpenAIProcessorConversationConfig.objects.create(api_key=api_key)
@@ -94,7 +94,34 @@ def initialization():
chat_model=openai_chat_model, model_type=ChatModelOptions.ModelType.OPENAI, max_prompt_size=max_tokens
)
- logger.info("🗣️ Chat model configuration complete")
+ default_speech2text_model = "whisper-1"
+ openai_speech2text_model = input(
+ f"Enter the OpenAI speech to text model you want to use (default: {default_speech2text_model}): "
+ )
+ openai_speech2text_model = openai_speech2text_model or default_speech2text_model
+ SpeechToTextModelOptions.objects.create(
+ model_name=openai_speech2text_model, model_type=SpeechToTextModelOptions.ModelType.OPENAI
+ )
+
+ if use_offline_model == "y" or use_openai_model == "y":
+ logger.info("🗣️ Chat model configuration complete")
+
+ use_offline_speech2text_model = input("Use offline speech to text model? (y/n): ")
+ if use_offline_speech2text_model == "y":
+ logger.info("🗣️ Setting up offline speech to text model")
+ # Delete any existing speech to text model options. There can only be one.
+ SpeechToTextModelOptions.objects.all().delete()
+
+ default_offline_speech2text_model = "base"
+ offline_speech2text_model = input(
+            f"Enter the Whisper model to use offline (default: {default_offline_speech2text_model}): "
+ )
+ offline_speech2text_model = offline_speech2text_model or default_offline_speech2text_model
+ SpeechToTextModelOptions.objects.create(
+ model_name=offline_speech2text_model, model_type=SpeechToTextModelOptions.ModelType.OFFLINE
+ )
+
+ logger.info(f"🗣️ Offline speech to text model configured to {offline_speech2text_model}")
admin_user = KhojUser.objects.filter(is_staff=True).first()
if admin_user is None:
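The prompts in `initialization()` fall back to a default when the user just presses Enter, via `value = input(...) or default`. A self-contained sketch of that pattern (the helper name and the injected `reader` are ours, added so the example runs without a terminal):

```python
def prompt_with_default(prompt: str, default: str, reader=input) -> str:
    """Prompt the user, returning the default when the reply is empty."""
    reply = reader(f"{prompt} (default: {default}): ").strip()
    # An empty string is falsy, so `or` falls through to the default
    return reply or default


# Simulate a user pressing Enter, then a user typing a model name
print(prompt_with_default("Whisper model", "base", reader=lambda _: ""))       # base
print(prompt_with_default("Whisper model", "base", reader=lambda _: "small"))  # small
```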
diff --git a/src/khoj/utils/state.py b/src/khoj/utils/state.py
index 91f5f0ce..b54cf4b3 100644
--- a/src/khoj/utils/state.py
+++ b/src/khoj/utils/state.py
@@ -7,6 +7,7 @@ from collections import defaultdict
# External Packages
from pathlib import Path
from khoj.processor.embeddings import CrossEncoderModel, EmbeddingsModel
+from whisper import Whisper
# Internal Packages
from khoj.utils import config as utils_config
@@ -21,6 +22,7 @@ embeddings_model: EmbeddingsModel = None
cross_encoder_model: CrossEncoderModel = None
content_index = ContentIndex()
gpt4all_processor_config: GPT4AllProcessorModel = None
+whisper_model: Whisper = None
config_file: Path = None
verbose: int = 0
host: str = None
diff --git a/src/telemetry/telemetry.py b/src/telemetry/telemetry.py
index 1a2a8f1e..fabaafa0 100644
--- a/src/telemetry/telemetry.py
+++ b/src/telemetry/telemetry.py
@@ -47,7 +47,7 @@ def v1_telemetry(telemetry_data: List[Dict[str, str]]):
# Create a table if it doesn't exist
cur.execute(
- """CREATE TABLE IF NOT EXISTS usage (id INTEGER PRIMARY KEY, time TIMESTAMP, type TEXT, server_id TEXT, os TEXT, api TEXT, client TEXT)"""
+ """CREATE TABLE IF NOT EXISTS usage (id INTEGER PRIMARY KEY, time TIMESTAMP, type TEXT, server_id TEXT, os TEXT, api TEXT, client TEXT, server_version TEXT)"""
)
# Log telemetry data
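Worth noting: `CREATE TABLE IF NOT EXISTS` is a no-op on a database that already has a `usage` table, so the new `server_version` column only appears in fresh databases. A sketch (ours, not part of the diff) of a backfill migration using stdlib `sqlite3`:

```python
import sqlite3


def ensure_server_version_column(conn: sqlite3.Connection) -> None:
    """Add the server_version column to the usage table if it is missing."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(usage)")}
    if "server_version" not in existing:
        conn.execute("ALTER TABLE usage ADD COLUMN server_version TEXT")


conn = sqlite3.connect(":memory:")
# Simulate a database created before this change (no server_version column)
conn.execute(
    "CREATE TABLE usage (id INTEGER PRIMARY KEY, time TIMESTAMP, type TEXT, "
    "server_id TEXT, os TEXT, api TEXT, client TEXT)"
)
ensure_server_version_column(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(usage)")]
print(columns)
```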
diff --git a/tests/test_gpt4all_chat_actors.py b/tests/test_gpt4all_chat_actors.py
index 782b54f2..7b59e1e3 100644
--- a/tests/test_gpt4all_chat_actors.py
+++ b/tests/test_gpt4all_chat_actors.py
@@ -19,8 +19,8 @@ except ModuleNotFoundError as e:
print("There was an error importing GPT4All. Please run pip install gpt4all in order to install it.")
# Internal Packages
-from khoj.processor.conversation.gpt4all.chat_model import converse_offline, extract_questions_offline, filter_questions
-from khoj.processor.conversation.gpt4all.utils import download_model
+from khoj.processor.conversation.offline.chat_model import converse_offline, extract_questions_offline, filter_questions
+from khoj.processor.conversation.offline.utils import download_model
from khoj.processor.conversation.utils import message_to_log