Only send start llm response chat event once, after thoughts streamed

A previous regression resulted in the start llm response event being
sent with every (non-thought) message chunk. It should only be sent
once, after thoughts and before the first normal message chunk is streamed.

The regression was probably introduced with the changes to stream thoughts.

This should fix the chat streaming latency logs.
Debanjum
2025-07-22 18:29:01 -05:00
parent 15c6118142
commit 70cfaf72e9

@@ -1397,6 +1397,7 @@ async def event_generator(
         )
         full_response = ""
+        message_start = True
         async for item in llm_response:
             # Should not happen with async generator. Skip.
             if item is None or not isinstance(item, ResponseWithThought):
@@ -1410,10 +1411,11 @@ async def event_generator(
                 async for result in send_event(ChatEvent.THOUGHT, item.thought):
                     yield result
                 continue
-            # Start sending response
-            async for result in send_event(ChatEvent.START_LLM_RESPONSE, ""):
-                yield result
+            elif message_start:
+                message_start = False
+                async for result in send_event(ChatEvent.START_LLM_RESPONSE, ""):
+                    yield result
             try:
                 async for result in send_event(ChatEvent.MESSAGE, message):
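
The one-shot start-event pattern applied by this fix can be sketched as a minimal, self-contained async generator. The `ChatEvent` values, `ResponseWithThought` fields, and the simulated LLM stream below are simplified stand-ins for illustration, not the project's actual API:

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class ChatEvent(Enum):
    THOUGHT = "thought"
    START_LLM_RESPONSE = "start_llm_response"
    MESSAGE = "message"


@dataclass
class ResponseWithThought:
    response: str = ""
    thought: str = ""


async def fake_llm_response():
    # Simulated LLM stream: thought chunks first, then message chunks.
    yield ResponseWithThought(thought="planning the answer...")
    yield ResponseWithThought(response="Hello")
    yield ResponseWithThought(response=" world")


async def event_generator(llm_response):
    message_start = True
    async for item in llm_response:
        if item.thought:
            # Stream thoughts as their own event type.
            yield (ChatEvent.THOUGHT, item.thought)
            continue
        elif message_start:
            # Emit the start event exactly once: after all thoughts,
            # before the first normal message chunk.
            message_start = False
            yield (ChatEvent.START_LLM_RESPONSE, "")
        yield (ChatEvent.MESSAGE, item.response)


async def collect():
    return [event async for event in event_generator(fake_llm_response())]


events = asyncio.run(collect())
```

Guarding the start event with a flag that is cleared on first use, rather than emitting it on every non-thought chunk, is what restores the latency logs: downstream consumers can time first-token latency from the single `START_LLM_RESPONSE` event.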