Encourage reasoner, grounder to work better together in binary operator

- Encourage grounder to adhere to the reasoners action instruction - Encourage reasoner to explore other actions when stuck in a loop Previously seemed to be forcing it too strongly to choose "single most important" next action. So may not have been exploring other actions to achieve objective on initial failure.
2026-04-28 00:19:25 +00:00 · 2025-05-10 16:33:44 -06:00
parent ac19f6d336
commit 9f3fbf9021
3 changed files with 5 additions and 3 deletions
@@ -1121,7 +1121,7 @@ terrarium_sandbox_context = """

 operator_execution_context = PromptTemplate.from_template(
    """
-Use the provided context from operating a browser to inform your response.
+Use the results of operating a web browser to inform your response.

 Browser Operation Results:
 {operator_results}
@@ -38,6 +38,7 @@ class GroundingAgentUitars:
    UITARS_USR_PROMPT_THOUGHT = """
    You are a GUI agent. You are given a task and a screenshot of the web browser tab you operate. You need to perform the next action to complete the task.
    You control a single tab in a Chromium browser. You cannot access the OS, filesystem, the application window or the addressbar.
+    Try fulfill the user instruction to the best of your ability, especially when the instruction is given multiple times. Do not ignore the instruction.

    ## Output Format
    ```
@@ -90,16 +90,17 @@ class BinaryOperatorAgent(OperatorAgent):
        """
        reasoning_system_prompt = f"""
 # Introduction
-* You are Khoj, a smart web browsing assistant. You help the user accomplish their task using a web browser.
+* You are Khoj, a smart and resourceful web browsing assistant. You help the user accomplish their task using a web browser.
 * You are given the user's query and screenshots of the browser's state transitions.
 * The current date is {datetime.today().strftime('%A, %B %-d, %Y')}.
 * The current URL is {current_state.url}.

 # Your Task
 * First look at the screenshots carefully to notice all pertinent information.
-* Then instruct a tool AI to perform the single most important next action to progress towards the user's goal.
+* Then instruct a tool AI to perform the next action that will help you progress towards the user's goal.
 * Make sure you scroll down to see everything before deciding something isn't available.
 * Perform web searches using DuckDuckGo. Don't use Google even if requested as the query will fail.
+* Use your creativity to find alternate ways to make progress if you get stuck at any point.

 # Tool AI Capabilities
 * The tool AI only has access to the current screenshot and your instructions. It uses your instructions to perform the next action on the page.