AI Module API: Detailed Specification And Usage

Nov 25, 2025 by Alex Johnson

AI Module API Specification: A Comprehensive Guide

This document specifies the AI Module API: its request and response structures, action types, and example scenarios. It is written for developers who integrate the AI module into an application, whether they are implementing new features or troubleshooting existing ones.

1. Introduction to the AI Module API

The AI Module API connects an application to the AI module, enabling features such as natural language understanding, intent recognition, and automated actions. Requests are sent with the POST method, and responses are returned as JSON. The sections below cover the structure of both messages, with examples.

1.1 Endpoint and Method

All interaction with the AI module goes through a single endpoint, /api/analyze, and every request uses the POST method. POST carries the payload in the request body rather than in the URL, which keeps user input and contextual information out of query strings. A single entry point also simplifies integration, because all AI-related functionality is reached the same way.

Endpoint: /api/analyze
Method: POST
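
As a rough illustration of the endpoint and method above, the following Python sketch prepares and sends a request using only the standard library. The base URL, the Content-Type header, and the timeout are assumptions made for this example; the specification names only the path /api/analyze and the POST method.

```python
import json
import urllib.request

# Hypothetical base URL; the specification only names the path /api/analyze.
BASE_URL = "http://localhost:8000"


def build_analyze_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to /api/analyze."""
    # ensure_ascii=False keeps Korean text readable in the encoded body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/analyze",
        data=body,
        # Assumed header; the specification does not state a content type.
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def analyze(payload: dict, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_analyze_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes the payload easy to inspect or log before any network call happens.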

2. Request Message (Request Body)

The request body is a JSON object that carries everything the AI module needs to process the user's input in context. The sample below contains session_id, user_input, ocr_texts, and dialogue_history. In it, the user says "불고기 버거 하나" ("one bulgogi burger") while the screen shows the OCR strings "추천메뉴" ("recommended menu"), "불고기버거" ("bulgogi burger"), "4500원" ("4,500 won"), "치즈버거" ("cheeseburger"), and "다음" ("next"). The subsections that follow describe each field.

{
  "session_id": "sess_001",
  "user_input": "불고기 버거 하나",
  "ocr_texts": ["추천메뉴", "불고기버거", "4500원", "치즈버거", "다음"],
  "dialogue_history": []
}

2.1 Field Definitions (Top-level)

The request body consists of several key fields, each serving a specific purpose. Let's explore these fields in detail:

Field	Type	Required	Description
`session_id`	String	Yes	ID for session tracking and logging
`user_input`	String	Yes	User utterance (STT result)
`screen_context`	Array	Yes	Screen information
`dialogue_history`	Array	Yes	List of previous dialogue turns (may be an empty array)
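
One way to catch malformed requests before they are sent is a small client-side check of the required fields. The sketch below is an assumption about how such a check could look, not part of the API itself. It follows the field names in the table above; the sample request earlier uses ocr_texts for the OCR strings, so adjust the mapping to whichever name your deployment expects.

```python
# Required top-level fields and their expected Python types, per the table above.
REQUIRED_FIELDS = {
    "session_id": str,
    "user_input": str,
    "screen_context": list,
    "dialogue_history": list,
}


def validate_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body passed the check."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in body:
            problems.append(f"missing required field: {name}")
        elif not isinstance(body[name], expected_type):
            problems.append(f"{name} must be {expected_type.__name__}")
    return problems
```

An empty dialogue_history list passes the check, which matches the table's note that an empty array is allowed.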

2.1.1 `session_id`

The session_id is a unique string used to track and log a user session. It lets the AI module associate a series of requests with one session, which supports context across multiple interactions, debugging, and features such as personalized responses and historical analysis. For instance, if a user orders multiple items, the session_id ties each later request to the earlier order so the context is not lost.

2.1.2 `user_input`

The user_input field contains the user's utterance, typically the result of speech-to-text (STT) processing. It is the core input that the AI module analyzes to determine the user's intent, so transcription errors in user_input directly reduce the accuracy of the response. For example, in a restaurant setting, the user_input might be "I want a cheeseburger and fries," which the AI module then processes to initiate the order.

2.1.3 `screen_context`

The screen_context field describes the current screen, such as text extracted via Optical Character Recognition (OCR). It typically holds an array of text strings that the AI module can use to identify actionable elements, such as buttons or menu items. (The sample request in section 2 carries these strings in a field named ocr_texts.) This context lets the AI module tie an utterance to what is actually displayed. For example, if the user says, "What are the options?" the AI can use the screen_context to list the choices shown on the screen.
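
Raw OCR output often contains stray whitespace, empty strings, or repeated labels. The helper below is a hypothetical client-side cleanup step, not something the specification requires; it sketches how the array of screen strings might be tidied before it is placed in the request.

```python
def normalize_screen_texts(raw_texts: list[str]) -> list[str]:
    """Strip whitespace, drop empty strings, and remove duplicates, keeping order."""
    seen = set()
    cleaned = []
    for text in raw_texts:
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Order is preserved on the assumption that the on-screen reading order may matter to the AI module.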

2.1.4 `dialogue_history`

The dialogue_history is an array containing the previous interactions within the session. It lets the AI module refer back to earlier statements and preferences, which is necessary for requests that span multiple turns. On the first interaction of a session, dialogue_history can be an empty array. For instance, if a user first asks for a burger and later asks to add fries, the dialogue_history lets the AI understand that the user is modifying an existing order.

3. Response Message (Response Body)

The response body is a JSON object that the AI module returns after processing the request. It includes the status, confidence, response_message, and action fields, which together report the outcome, the AI's certainty level, the message to present to the user, and the action the application should take. In the sample below, the response_message "불고기버거를 선택합니다." means "Selecting the bulgogi burger," and the action asks the application to click the on-screen text "불고기버거" ("bulgogi burger").

{
  "status": "success",
  "confidence": 0.99,
  "response_message": "불고기버거를 선택합니다.",
  "action": {
    "type": "click_text",
    "params": {
      "target_text": "불고기버거"
    }
  }
}

3.1 Field Definitions

The response body comprises several key fields that communicate the AI module's processing results. The breakdown is as follows:

Field	Type	Description
`status`	String	`"success"`, `"ambiguous"`, `"fail"`
`confidence`	Float	Confidence score (0.0 to 1.0)
`response_message`	String	Guidance message for the user (text for TTS)
`action`	Object	Action to perform (see 3. Action Type below)
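
The fields above can be loaded into a typed object on the client side. The sketch below is one possible approach under stated assumptions: the class and function names are hypothetical, a missing action is treated as an empty object, and the 0.8 confidence threshold is an arbitrary illustrative value rather than anything the specification prescribes.

```python
from dataclasses import dataclass, field

# The three status values listed in the table above.
VALID_STATUSES = {"success", "ambiguous", "fail"}


@dataclass
class AnalyzeResponse:
    status: str
    confidence: float
    response_message: str
    action: dict = field(default_factory=dict)


def parse_response(raw: dict) -> AnalyzeResponse:
    """Convert a decoded JSON response into a typed object with basic checks."""
    status = raw["status"]
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    confidence = float(raw["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return AnalyzeResponse(
        status=status,
        confidence=confidence,
        response_message=raw["response_message"],
        # Assumption: a missing or null action is treated as "no action".
        action=raw.get("action") or {},
    )


def should_execute(response: AnalyzeResponse, min_confidence: float = 0.8) -> bool:
    """Hypothetical policy: act only on successful, sufficiently confident results."""
    return response.status == "success" and response.confidence >= min_confidence
```

With the sample response shown earlier (status "success", confidence 0.99), should_execute returns True and the application would go on to perform the click_text action.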

3.1.1 `status`

The status field indicates the outcome of the AI module's processing. It can have one of three values: `