Fixing OpenAI API Reasoning Streaming In JetBrains Koog
In the fast-paced world of AI development, especially when working with sophisticated models like those in the GPT-5 series, timely visibility into what a model is doing is crucial. This is where the OpenAI Responses API and its reasoning summaries come into play. However, a recent discussion within the JetBrains and Koog communities highlighted a snag: the OpenAILLMClient wasn't streaming these valuable reasoning summary text deltas to the caller. Furthermore, in stateless modes, where context passing between turns is essential, encrypted reasoning content might not be accumulated correctly, hindering the flow of information across a conversation. This article digs into the issue, compares the current and expected behaviors, proposes a fix, and offers a workaround, all so you can harness the full power of OpenAI's reasoning capabilities.
Understanding the Core Issue: Why Reasoning Summaries Aren't Streaming
The heart of the problem lies in the ai.koog.prompt.executor.clients.openai.OpenAILLMClient.executeResponsesStreaming() method, the component responsible for processing the event types the OpenAI API sends back during a streaming request. Currently it handles only three of them: OpenAIStreamEvent.ResponseOutputItemDone (for tool calls), OpenAIStreamEvent.ResponseCompleted (marking the end of the stream), and OpenAIStreamEvent.ResponseOutputTextDelta (for the main text output).

The critical observation is that ResponseReasoningSummaryTextDelta events are simply ignored. As the model generates its reasoning, those insightful snippets like "Investigating error source" are never forwarded to you in real time; the complete reasoning summary only shows up after the response has finished, tucked away in the final ResponseCompleted event. That delay deprives users of the ability to watch the model's thought process unfold live, which is invaluable for debugging, for understanding complex outputs, or simply for providing a more dynamic user experience.

Compounding the issue, the client handles ResponseOutputItemDone only for tool calls: when item.type == "reasoning", the event is not captured, so the encrypted reasoning content that is vital for maintaining context in stateless interactions is never extracted. As a result, the StreamFrame.End frame, which should carry this contextual information into subsequent turns, is emitted with its messages field set to null even when reasoning was present. This directly affects applications that depend on reasoning context carrying over, such as agent frameworks that use it to inform future actions.
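To make the gap concrete, here is a minimal sketch of the event dispatch the streaming loop would need. The event names and StreamFrame.End come from the report above; everything else (the handler shape, emit, StreamFrame.Append, and the helper functions) is an illustrative assumption, not the actual Koog source:

```kotlin
// Illustrative sketch only; not the real OpenAILLMClient internals.
suspend fun dispatch(event: OpenAIStreamEvent, emit: suspend (StreamFrame) -> Unit) {
    when (event) {
        // Handled today: main answer text streams through as it arrives.
        is OpenAIStreamEvent.ResponseOutputTextDelta ->
            emit(StreamFrame.Append(event.delta))

        // Missing today: reasoning summary deltas should stream the same way
        // (or via a dedicated reasoning frame) instead of being dropped.
        is OpenAIStreamEvent.ResponseReasoningSummaryTextDelta ->
            emit(StreamFrame.Append(event.delta))

        // Handled today for tool calls only; reasoning items also need to be
        // captured so their encrypted content survives for stateless callers.
        is OpenAIStreamEvent.ResponseOutputItemDone ->
            if (event.item.type == "reasoning") accumulateReasoning(event.item)
            else handleToolCall(event.item)

        // The final frame should carry the accumulated reasoning messages
        // instead of messages = null.
        is OpenAIStreamEvent.ResponseCompleted ->
            emit(StreamFrame.End(messages = accumulatedReasoningMessages()))

        else -> Unit // other event types are out of scope for this sketch
    }
}
```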
The Expected Behavior: Real-Time Reasoning and Persistent Context
To truly leverage the OpenAI Responses API, the client's behavior should mirror what the API already offers. First and foremost, reasoning summaries should be streamed incrementally. When ReasoningConfig.summary is set to DETAILED or AUTO, the API emits events like `{"type": "response.reasoning_summary_text.delta", "summary_index": 0, "delta": "Investigating"}` as the summary is generated, and the client should forward each delta to the caller the moment it arrives. Second, once a reasoning output item completes, its encrypted content should be captured so that the final StreamFrame.End carries it in its messages field, giving stateless callers the context they need for the next turn.
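From the caller's side, the fixed behavior would look roughly like this. The entry point and frame shapes below are assumptions based on the report; Koog's actual streaming API may differ in detail:

```kotlin
// Hypothetical consumer of the fixed stream; names are assumptions.
val frames = client.executeResponsesStreaming(prompt, model) // assumed call shape
frames.collect { frame ->
    when (frame) {
        // Reasoning summary deltas and answer text both arrive live.
        is StreamFrame.Append -> print(frame.text)
        // After the fix, End.messages should include the encrypted reasoning
        // item for stateless context passing, instead of being null.
        is StreamFrame.End -> println("\ncontext messages: ${frame.messages?.size ?: 0}")
        else -> Unit
    }
}
```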