Improve data source, output mode selection

- Set output mode to single string. Specify output schema in prompt
  - Both of these should encourage the model to select only one output
    mode, instead of repeating that instruction in the prompt many times
  - Output schema should also improve schema following in general
- Standardize variable and function names of the io selector for readability
- Fix chat actors to test the io selector chat actor
- Make the chat actor return sources and output separately for better
  disambiguation, at least during tests, for now
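The shape change the bullets above describe can be sketched as a minimal parser (the helper name `parse_io_selection` is hypothetical, not from the commit): the selector's LLM response now yields a list of sources plus a single output-mode string, rather than one flat list mixing both.

```python
def parse_io_selection(response: dict) -> dict:
    """Split an LLM io-selector response into a sources list and a single output mode."""
    sources = [s.strip() for s in response.get("source", []) if s.strip()]
    # Output is now a single string, defaulting to "text", not a list.
    output = response.get("output", "text").strip()
    return {"sources": sources, "output": output}
```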
This commit is contained in:
Debanjum
2024-11-18 12:49:48 -08:00
parent e3fd51d14b
commit 653127bf1d
6 changed files with 104 additions and 88 deletions


@@ -630,37 +630,36 @@ pick_relevant_tools = PromptTemplate.from_template(
"""
You are Khoj, an extremely smart and helpful search assistant.
{personality_context}
- You have access to a variety of data sources to help you answer the user's question
- You can use the data sources listed below to collect more relevant information
- You can select certain types of output to respond to the user's question. Select just one output type to answer the user's question
- You can use any combination of these data sources and output types to answer the user's question
- You can only select one output type to answer the user's question
- You have access to a variety of data sources to help you answer the user's question.
- You can use any subset of data sources listed below to collect more relevant information.
- You can select the most appropriate output format from the options listed below to respond to the user's question.
- Both the data sources and output format should be selected based on the user's query and relevant context provided in the chat history.
Which of the tools listed below you would use to answer the user's question? You **only** have access to the following:
Which of the data sources and output formats listed below would you use to answer the user's question? You **only** have access to the following:
Inputs:
{tools}
Data Sources:
{sources}
Outputs:
Output Formats:
{outputs}
Here are some examples:
Example:
Chat History:
User: I'm thinking of moving to a new city. I'm trying to decide between New York and San Francisco.
User: I'm thinking of moving to a new city. I'm trying to decide between New York and San Francisco
AI: Moving to a new city can be challenging. Both New York and San Francisco are great cities to live in. New York is known for its diverse culture and San Francisco is known for its tech scene.
Q: What is the population of each of those cities?
Khoj: {{"source": ["online"], "output": ["text"]}}
Q: Chart the population growth of each of those cities in the last decade
Khoj: {{"source": ["online", "code"], "output": "text"}}
Example:
Chat History:
User: I'm thinking of my next vacation idea. Ideally, I want to see something new and exciting.
User: I'm thinking of my next vacation idea. Ideally, I want to see something new and exciting
AI: Excellent! Taking a vacation is a great way to relax and recharge.
Q: Where did Grandma grow up?
Khoj: {{"source": ["notes"], "output": ["text"]}}
Khoj: {{"source": ["notes"], "output": "text"}}
Example:
Chat History:
@@ -668,7 +667,7 @@ User: Good morning
AI: Good morning! How can I help you today?
Q: How can I share my files with Khoj?
Khoj: {{"source": ["default", "online"], "output": ["text"]}}
Khoj: {{"source": ["default", "online"], "output": "text"}}
Example:
Chat History:
@@ -676,17 +675,18 @@ User: What is the first element in the periodic table?
AI: The first element in the periodic table is Hydrogen.
Q: Summarize this article https://en.wikipedia.org/wiki/Hydrogen
Khoj: {{"source": ["webpage"], "output": ["text"]}}
Khoj: {{"source": ["webpage"], "output": "text"}}
Example:
Chat History:
User: I want to start a new hobby. I'm thinking of learning to play the guitar.
AI: Learning to play the guitar is a great hobby. It can be a lot of fun and a great way to express yourself.
User: I'm learning to play the guitar, so I can make a band with my friends
AI: Learning to play the guitar is a great hobby. It can be a fun way to socialize and express yourself.
Q: Draw a painting of a guitar.
Khoj: {{"source": ["general"], "output": ["image"]}}
Q: Create a painting of my recent jamming sessions
Khoj: {{"source": ["notes"], "output": "image"}}
Now it's your turn to pick the sources and output to answer the user's query. Respond with a JSON object, including both `source` and `output`. The values should be a list of strings. Do not say anything else.
Now it's your turn to pick the appropriate data sources and output format to answer the user's query. Respond with a JSON object, including both `source` and `output` in the following format. Do not say anything else.
{{"source": list[str], "output": str}}
Chat History:
{chat_history}


@@ -46,7 +46,7 @@ from khoj.routers.helpers import (
FeedbackData,
acreate_title_from_history,
agenerate_chat_response,
aget_relevant_tools_to_execute,
aget_data_sources_and_output_format,
construct_automation_created_message,
create_automation,
gather_raw_query_files,
@@ -752,7 +752,7 @@ async def chat(
attached_file_context = gather_raw_query_files(query_files)
if conversation_commands == [ConversationCommand.Default] or is_automated_task:
conversation_commands = await aget_relevant_tools_to_execute(
chosen_io = await aget_data_sources_and_output_format(
q,
meta_log,
is_automated_task,
@@ -762,6 +762,7 @@ async def chat(
query_files=attached_file_context,
tracer=tracer,
)
conversation_commands = chosen_io.get("sources") + [chosen_io.get("output")]
# If we're doing research, we don't want to do anything else
if ConversationCommand.Research in conversation_commands:

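The flattening step in the hunk above (`chosen_io.get("sources") + [chosen_io.get("output")]`) can be sketched standalone, with plain strings standing in for `ConversationCommand` enum values: the chat route folds the selector's dict back into the single `conversation_commands` list the rest of the route expects.

```python
def to_conversation_commands(chosen_io: dict) -> list:
    """Flatten a {"sources": [...], "output": ...} selection into one command list."""
    sources = chosen_io.get("sources") or []
    output = chosen_io.get("output")
    # Sources come first; the single output mode is appended at the end.
    return sources + ([output] if output else [])
```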

@@ -336,7 +336,7 @@ async def acheck_if_safe_prompt(system_prompt: str, user: KhojUser = None, lax:
return is_safe, reason
async def aget_relevant_tools_to_execute(
async def aget_data_sources_and_output_format(
query: str,
conversation_history: dict,
is_task: bool,
@@ -345,33 +345,33 @@ async def aget_relevant_tools_to_execute(
agent: Agent = None,
query_files: str = None,
tracer: dict = {},
):
) -> Dict[str, Any]:
"""
Given a query, determine which of the available tools the agent should use in order to answer appropriately.
Given a query, determine which of the available data sources and output modes the agent should use to answer appropriately.
"""
tool_options = dict()
tool_options_str = ""
source_options = dict()
source_options_str = ""
agent_tools = agent.input_tools if agent else []
agent_sources = agent.input_tools if agent else []
for tool, description in tool_descriptions_for_llm.items():
tool_options[tool.value] = description
if len(agent_tools) == 0 or tool.value in agent_tools:
tool_options_str += f'- "{tool.value}": "{description}"\n'
for source, description in tool_descriptions_for_llm.items():
source_options[source.value] = description
if len(agent_sources) == 0 or source.value in agent_sources:
source_options_str += f'- "{source.value}": "{description}"\n'
mode_options = dict()
mode_options_str = ""
output_options = dict()
output_options_str = ""
output_modes = agent.output_modes if agent else []
agent_outputs = agent.output_modes if agent else []
for mode, description in mode_descriptions_for_llm.items():
for output, description in mode_descriptions_for_llm.items():
# Do not allow tasks to schedule another task
if is_task and mode == ConversationCommand.Automation:
if is_task and output == ConversationCommand.Automation:
continue
mode_options[mode.value] = description
if len(output_modes) == 0 or mode.value in output_modes:
mode_options_str += f'- "{mode.value}": "{description}"\n'
output_options[output.value] = description
if len(agent_outputs) == 0 or output.value in agent_outputs:
output_options_str += f'- "{output.value}": "{description}"\n'
chat_history = construct_chat_history(conversation_history)
@@ -384,8 +384,8 @@ async def aget_relevant_tools_to_execute(
relevant_tools_prompt = prompts.pick_relevant_tools.format(
query=query,
tools=tool_options_str,
outputs=mode_options_str,
sources=source_options_str,
outputs=output_options_str,
chat_history=chat_history,
personality_context=personality_context,
)
@@ -403,43 +403,42 @@ async def aget_relevant_tools_to_execute(
response = clean_json(response)
response = json.loads(response)
input_tools = [q.strip() for q in response.get("source", []) if q.strip()]
output_modes = [q.strip() for q in response.get("output", ["text"]) if q.strip()] # Default to text output
selected_sources = [q.strip() for q in response.get("source", []) if q.strip()]
selected_output = response.get("output", "text").strip() # Default to text output
if not isinstance(input_tools, list) or not input_tools or len(input_tools) == 0:
if not isinstance(selected_sources, list) or not selected_sources or len(selected_sources) == 0:
raise ValueError(
f"Invalid response for determining relevant tools: {input_tools}. Raw Response: {response}"
f"Invalid response for determining relevant tools: {selected_sources}. Raw Response: {response}"
)
final_response = [] if not is_task else [ConversationCommand.AutomatedTask]
for llm_suggested_tool in input_tools:
result: Dict = {"sources": [], "output": None} if not is_task else {"output": ConversationCommand.AutomatedTask}
for selected_source in selected_sources:
# Add a double check to verify it's in the agent list, because the LLM sometimes gets confused by the tool options.
if llm_suggested_tool in tool_options.keys() and (
len(agent_tools) == 0 or llm_suggested_tool in agent_tools
if (
selected_source in source_options.keys()
and isinstance(result["sources"], list)
and (len(agent_sources) == 0 or selected_source in agent_sources)
):
# Check whether the tool exists as a valid ConversationCommand
final_response.append(ConversationCommand(llm_suggested_tool))
result["sources"].append(ConversationCommand(selected_source))
for llm_suggested_output in output_modes:
# Add a double check to verify it's in the agent list, because the LLM sometimes gets confused by the tool options.
if llm_suggested_output in mode_options.keys() and (
len(output_modes) == 0 or llm_suggested_output in output_modes
):
# Check whether the tool exists as a valid ConversationCommand
final_response.append(ConversationCommand(llm_suggested_output))
# Add a double check to verify it's in the agent list, because the LLM sometimes gets confused by the tool options.
if selected_output in output_options.keys() and (len(agent_outputs) == 0 or selected_output in agent_outputs):
# Check whether the tool exists as a valid ConversationCommand
result["output"] = ConversationCommand(selected_output)
if is_none_or_empty(final_response):
if len(agent_tools) == 0:
final_response = [ConversationCommand.Default, ConversationCommand.Text]
if is_none_or_empty(result):
if len(agent_sources) == 0:
result = {"sources": [ConversationCommand.Default], "output": ConversationCommand.Text}
else:
final_response = [ConversationCommand.General, ConversationCommand.Text]
result = {"sources": [ConversationCommand.General], "output": ConversationCommand.Text}
except Exception as e:
logger.error(f"Invalid response for determining relevant tools: {response}. Error: {e}", exc_info=True)
if len(agent_tools) == 0:
final_response = [ConversationCommand.Default, ConversationCommand.Text]
else:
final_response = agent_tools
return final_response
sources = agent_sources if len(agent_sources) > 0 else [ConversationCommand.Default]
output = agent_outputs[0] if len(agent_outputs) > 0 else ConversationCommand.Text
result = {"sources": sources, "output": output}
return result
async def infer_webpage_urls(
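The exception-path fallback in the last hunk can be sketched in isolation (plain strings stand in for `ConversationCommand` values; the helper name `fallback_io` is an assumption): prefer the agent's own configured sources and output modes when parsing fails, otherwise fall back to the generic defaults.

```python
def fallback_io(agent_sources: list, agent_outputs: list) -> dict:
    """Error-path fallback: use the agent's configured io, else generic defaults."""
    sources = agent_sources if len(agent_sources) > 0 else ["default"]
    output = agent_outputs[0] if len(agent_outputs) > 0 else "text"
    return {"sources": sources, "output": output}
```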