Scrape pages linked from Serper search results using Olostep (#627)

* Initialize changes to incorporate web scraping logic after getting SERP results
- Do some minor refactors to pass a system prompt to the OpenAI model when making a query
- Integrate Olostep to perform the web scraping
* Fix truncation error with newlines; fix typing in Olostep code
* Use the Authorization header for the token
* Add a small hint in the welcome prompt on how to use Khoj's other modalities
* Add a more detailed error message if an Olostep query fails
* Add unit tests that invoke Olostep in the chat director
* Add a test for the Olostep tool
sabaimran
2024-01-29 00:46:50 -08:00
committed by GitHub
parent 360b59cdb2
commit b782683e60
12 changed files with 214 additions and 30 deletions


@@ -1,3 +1,4 @@
import os
import secrets
import numpy as np
@@ -6,6 +7,7 @@ import pytest
from scipy.stats import linregress
from khoj.processor.embeddings import EmbeddingsModel
from khoj.processor.tools.online_search import search_with_olostep
from khoj.utils import helpers
@@ -80,3 +82,18 @@ def test_encode_docs_memory_leak():
    # If slope is positive memory utilization is increasing
    # Positive threshold of 2, from observing memory usage trend on MPS vs CPU device
    assert slope < 2, f"Memory leak suspected on {device}. Memory usage increased at ~{slope:.2f} MB per iteration"


@pytest.mark.skipif(os.getenv("OLOSTEP_API_KEY") is None, reason="OLOSTEP_API_KEY is not set")
def test_olostep_api():
    # Arrange
    website = "https://en.wikipedia.org/wiki/Great_Chicago_Fire"

    # Act
    response = search_with_olostep(website)

    # Assert
    assert (
        "An alarm sent from the area near the fire also failed to register at the courthouse where the fire watchmen were"
        in response
    )
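The commit message describes a flow that can be sketched roughly as follows: take the links from the Serper results, ask a scraping service for each page with the API token sent in the Authorization header, and raise a detailed error when a request fails. This is a minimal sketch under stated assumptions, not Khoj's `search_with_olostep` implementation. The endpoint URL, the `markdown_content` response field, and the helper names (`build_scrape_request`, `fetch_via_urllib`, `scrape_serp_results`) are hypothetical placeholders rather than Olostep's documented API. The `organic`/`link` result shape is assumed to be Serper-style. The `fetch` parameter can be swapped out so the logic runs without network access.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List

# Hypothetical placeholder endpoint; not Olostep's documented API URL.
SCRAPE_ENDPOINT = "https://scraper.example.com/v1/scrape"


def build_scrape_request(
    url: str, api_key: str, endpoint: str = SCRAPE_ENDPOINT
) -> urllib.request.Request:
    """Build a GET request for `url`, sending the token in the Authorization header."""
    query = urllib.parse.urlencode({"url": url})
    return urllib.request.Request(
        f"{endpoint}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_via_urllib(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request over the network and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def scrape_serp_results(
    organic_results: List[Dict[str, Any]],
    api_key: str,
    fetch: Callable[[urllib.request.Request], Dict[str, Any]] = fetch_via_urllib,
    max_results: int = 3,
) -> Dict[str, str]:
    """Scrape the pages linked from the first `max_results` Serper-style organic results."""
    scraped: Dict[str, str] = {}
    for result in organic_results[:max_results]:
        link = result.get("link")
        if not link:
            continue  # Skip results that have no URL to scrape
        request = build_scrape_request(link, api_key)
        # HTTPError subclasses URLError, so it is caught first for a more specific message
        try:
            payload = fetch(request)
        except urllib.error.HTTPError as error:
            raise RuntimeError(
                f"Scraping {link} failed with HTTP {error.code}: {error.reason}"
            ) from error
        except urllib.error.URLError as error:
            raise RuntimeError(f"Scraping {link} failed: {error.reason}") from error
        # "markdown_content" is a hypothetical response field name
        scraped[link] = payload.get("markdown_content", "")
    return scraped
```

Passing a stub `fetch` keeps unit tests offline. The real network path can stay behind an environment-variable `skipif` guard, the same way `test_olostep_api` above is skipped when `OLOSTEP_API_KEY` is not set.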