LLM Council: Minimal HTTP Server Implementation

by Alex Johnson

In this article, we'll explore the implementation of a minimal HTTP server for the LLM (Large Language Model) Council, following the principles outlined in ADR-009. The server exposes the run_full_council function through a REST API and is designed to be stateless and single-tenant. We'll cover the context, implementation details, acceptance criteria, dependencies, and references for this component.

Summary

The core objective is to create a minimal, stateless HTTP server that exposes the LLM Council's run_full_council function through a REST API. The server is the entry point for external requests and gives clients a simple way to trigger council deliberations. Being stateless means no persistent data storage is required, which simplifies the architecture and makes the server easier to scale. Being single-tenant limits the scope of access to one controlled environment, which suits development and testing. The BYOK (Bring Your Own Key) approach gives flexibility in how the API key is supplied: it can be passed in the request body or read from an environment variable, depending on the deployment scenario.

This implementation aims to provide a lightweight and easily deployable solution for developers to interact with the LLM Council. The server will handle incoming HTTP requests, validate API keys, execute the run_full_council function, and return the results in a structured format. The simplicity of the design ensures that the server remains focused on its core task: facilitating communication with the LLM Council without introducing unnecessary complexity. The choice of FastAPI as the framework further streamlines the development process, offering automatic data validation, API documentation, and asynchronous request handling capabilities.

The decision to create a minimal server aligns with the overall philosophy of the LLM Council project, which emphasizes modularity and clear separation of concerns. By keeping the HTTP server lightweight and focused, we ensure that it can be easily integrated into various environments and workflows. This approach also simplifies maintenance and future enhancements, as the codebase remains manageable and well-defined. The stateless nature of the server ensures that it can be scaled horizontally without the need for complex state management mechanisms. This is particularly important for handling increased traffic and ensuring the responsiveness of the LLM Council service. Overall, the minimal HTTP server implementation provides a crucial bridge between external clients and the core LLM Council functionality, enabling seamless interaction and efficient utilization of the council's capabilities.

Context

Understanding the context behind this implementation is crucial. Per ADR-009, the HTTP server must adhere to specific constraints:

  • Stateless: This means no databases or persistent storage. Each request should be treated independently, without relying on past interactions. This design choice simplifies the server's architecture and improves scalability.
  • Single-tenant: Authorization is limited to an optional environment token. This simplifies security for development and testing environments by not requiring complex user management.
  • BYOK (Bring Your Own Key): API keys can be passed in the request body or read from environment variables. This approach provides flexibility in how API keys are managed and supplied.

These constraints keep the server lightweight and appropriate for the LLM Council's development and testing phases. Because no request depends on stored state, any instance can serve any request, so the server can be scaled horizontally without session management or data persistence. The single-tenant model keeps access control simple: there are no user accounts or per-user permissions, only one optional token read from the environment. BYOK lets developers choose how to supply the key, either directly in the request body or through an environment variable, which makes it easier to fit the LLM Council into different environments and workflows.
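To make the BYOK lookup order concrete, here is a minimal sketch of the key resolution as a standalone function. The function name resolve_api_key and the whitespace stripping are assumptions added for illustration; the only detail taken from the article's server code is the OPENROUTER_API_KEY variable name.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    request_key: Optional[str],
    env: Mapping[str, str] = os.environ,
    env_var: str = "OPENROUTER_API_KEY",
) -> Optional[str]:
    """Return the key from the request body if present, else from the environment."""
    # Assumption: empty or whitespace-only strings count as "not provided".
    if request_key and request_key.strip():
        return request_key.strip()
    env_key = env.get(env_var, "").strip()
    # None signals that neither source supplied a usable key.
    return env_key or None
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.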

By adhering to these constraints, the HTTP server implementation focuses on its core responsibility: providing a clean and efficient interface for interacting with the LLM Council's run_full_council function. The design choices reflect a commitment to simplicity, scalability, and security, ensuring that the server remains a reliable and maintainable component of the overall LLM Council architecture. The context provided by ADR-009 serves as a guiding principle throughout the implementation process, ensuring that the server meets the specific requirements and constraints of the project.
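The optional environment token from the single-tenant constraint could be checked along these lines. This is only a sketch: ADR-009 is not quoted here, so the function name is_authorized, the "no token configured means open access" behavior, and the way the caller presents the token are all assumptions.

```python
import hmac
from typing import Optional


def is_authorized(presented: Optional[str], configured: Optional[str]) -> bool:
    """Check a caller-supplied token against an optional server-side token."""
    if not configured:
        # Assumption: with no token configured, the server accepts every request.
        return True
    if not presented:
        return False
    # compare_digest runs in time independent of where the two values differ.
    return hmac.compare_digest(presented.encode("utf-8"), configured.encode("utf-8"))
```

A handler might call it as is_authorized(token_from_request, os.getenv("LLM_COUNCIL_API_TOKEN")), where LLM_COUNCIL_API_TOKEN is a hypothetical variable name, and reject the request when it returns False.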

Implementation: src/llm_council/http_server.py

The implementation revolves around a Python file, src/llm_council/http_server.py, which uses the FastAPI framework to define the HTTP server. Let's break down the code:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import os

from llm_council import run_full_council

app = FastAPI(
    title="LLM Council",
    description="Local development server for LLM Council",
    version="1.0.0",
)


class CouncilRequest(BaseModel):
    """Request body for a council run."""

    prompt: str
    models: Optional[List[str]] = None  # accepted, but not used by the handler below
    api_key: Optional[str] = None  # BYOK: optional key supplied in the request body


@app.post("/v1/council/run")
async def council_run(request: CouncilRequest):
    """Run the full council deliberation."""
    # BYOK: prefer the key in the request body, then fall back to the environment.
    api_key = request.api_key or os.getenv("OPENROUTER_API_KEY")
    if not api_key:
        raise HTTPException(status_code=400, detail="API key required")

    # Only the prompt is forwarded here; the resolved api_key and request.models
    # are not passed to run_full_council in this snippet.
    stage1, stage2, stage3, metadata = await run_full_council(request.prompt)
    return {"stage1": stage1, "stage2": stage2, "stage3": stage3, "metadata": metadata}


@app.get("/health")
async def health():
    """Liveness check for monitoring."""
    return {"status": "ok", "service": "llm-council-local"}

This snippet shows how little code FastAPI needs to define a working HTTP server. The CouncilRequest class, built on Pydantic's BaseModel, validates incoming requests against a fixed schema: prompt is required, while models and api_key are optional. A request that does not match the schema is rejected before the handler runs. The /v1/council/run endpoint is the core of the server. It first resolves the API key from the request body or, failing that, from the OPENROUTER_API_KEY environment variable, which is the BYOK principle in practice. If neither source provides a key, it raises an HTTP 400 error. This check only confirms that a key is present; it does not verify that the key is valid. The handler then calls run_full_council with the prompt and returns the three stages and the metadata as JSON. Note that the snippet as shown passes only the prompt to run_full_council: the resolved key and the models field are not forwarded. The /health endpoint is a simple health check that returns a small JSON payload, which monitoring tools can poll to confirm the server is up.

The use of asynchronous functions (async def) lets the server handle other requests while a council run is awaiting its results, provided the awaited work does not block the event loop. FastAPI also generates interactive API documentation automatically, accessible at /docs, which helps during development and testing. The code is short enough to read in one pass, which keeps it maintainable and easy to extend.
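From the client side, calling the endpoint might look like the sketch below, which uses only the standard library. The helper names build_council_request and run_council are hypothetical, and the base URL is whatever host and port you run the server on; the path and the JSON fields come from the server code above.

```python
import json
import urllib.request
from typing import List, Optional


def build_council_request(
    base_url: str,
    prompt: str,
    models: Optional[List[str]] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request for /v1/council/run matching the CouncilRequest fields."""
    payload: dict = {"prompt": prompt}
    # Optional fields are omitted when unset, so the server-side defaults apply.
    if models is not None:
        payload["models"] = models
    if api_key is not None:
        payload["api_key"] = api_key
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/council/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_council(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request to a running server and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server listening locally, run_council(build_council_request("http://127.0.0.1:8000", "your prompt")) would return a dictionary with the stage1, stage2, stage3, and metadata keys. The generous timeout is a guess, since a full deliberation involves several model calls.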

Acceptance Criteria

The following acceptance criteria must be met to ensure the HTTP server functions correctly:

  • [ ] POST /v1/council/run executes the full council and returns results. This is the primary function of the server, so successful execution is critical.
  • [ ] GET /health returns status. A health endpoint is necessary for monitoring the server's availability.
  • [ ] BYOK enforced (key in request or env). The server must correctly handle API keys, either from the request or the environment.
  • [ ] No database or persistent state. This ensures the server remains stateless, as per ADR-009.
  • [ ] OpenAPI docs available at /docs. FastAPI's automatic documentation generation simplifies testing and integration.

These acceptance criteria provide a clear and concise checklist for validating the functionality of the HTTP server. The POST /v1/council/run endpoint is the core of the server, and its ability to execute the full council deliberation and return the results is paramount. The GET /health endpoint serves as a vital health check, allowing monitoring systems to verify the server's availability and responsiveness. The BYOK enforcement ensures that the server adheres to the security requirements, correctly handling API keys passed either in the request or retrieved from environment variables. The absence of a database or persistent state confirms that the server remains stateless, simplifying its architecture and enhancing scalability. The availability of OpenAPI documentation at /docs streamlines the testing and integration process, providing a clear and comprehensive guide to the server's API.
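Two of these criteria can be turned into small response checks. The sketch below validates decoded JSON bodies only, so it works with any HTTP client or test harness; the helper names are hypothetical, while the expected keys and the "ok" status come from the server code shown earlier.

```python
from typing import Any, List, Mapping

# Top-level keys returned by POST /v1/council/run in the article's handler.
RUN_RESPONSE_KEYS = ("stage1", "stage2", "stage3", "metadata")


def missing_run_keys(body: Mapping[str, Any]) -> List[str]:
    """Return the expected top-level keys absent from a /v1/council/run response."""
    return [key for key in RUN_RESPONSE_KEYS if key not in body]


def health_is_ok(body: Mapping[str, Any]) -> bool:
    """Return True if a /health response reports status "ok"."""
    return body.get("status") == "ok"
```

An empty list from missing_run_keys means the response has the expected shape; it says nothing about the quality of the deliberation itself.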

By meeting these acceptance criteria, the HTTP server demonstrates its adherence to the design principles outlined in ADR-009 and its readiness for deployment. Each criterion addresses a specific aspect of the server's functionality, ensuring that it is reliable, secure, and easy to use. Regular testing against these criteria will help maintain the server's quality and ensure that it continues to meet the needs of the LLM Council project.

Depends On

This implementation depends on:

  • #25 (HTTP dependencies): This likely refers to a pull request or issue related to installing the necessary HTTP libraries, such as FastAPI.

Understanding the dependencies of the HTTP server is crucial for ensuring a smooth development and deployment process. The dependency on #25, which likely addresses the installation and configuration of HTTP-related libraries like FastAPI, highlights the foundational nature of these components. FastAPI provides the core framework for building the server, handling routing, request parsing, and response generation. Without these dependencies properly resolved, the server cannot function as intended.

By explicitly stating the dependencies, the implementation documentation ensures that developers are aware of the prerequisites and can take the necessary steps to set up their environment correctly. This reduces the likelihood of encountering issues during development and deployment. The dependency on #25 also suggests that the HTTP server implementation is part of a larger effort to establish the project's HTTP infrastructure, ensuring that all components are compatible and work seamlessly together.
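A quick way to confirm the environment is ready is to check that the modules imported by http_server.py can be found. This is a sketch: the contents of #25 are not shown in this article, so the list below covers only fastapi and pydantic, which the server code imports directly, and the helper name is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def dependency_report(required: Iterable[str] = ("fastapi", "pydantic")) -> str:
    """Summarize whether the HTTP server's direct imports are installed."""
    missing = missing_modules(required)
    if missing:
        return "Missing modules: " + ", ".join(missing)
    return "All required modules found"
```

find_spec only locates a module without importing it, so the check is cheap and has no side effects from the packages themselves.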

References

  • ADR-009: This Architecture Decision Record provides the context and requirements for the HTTP server implementation.

Referencing ADR-009 is crucial for maintaining consistency and adherence to the project's architectural vision. ADR-009 serves as the guiding document for the HTTP server implementation, outlining the specific requirements and constraints that must be met. By explicitly referencing this document, the implementation documentation ensures that developers have a clear understanding of the rationale behind the design choices and can easily verify that the server aligns with the overall architecture. ADRs (Architecture Decision Records) play a vital role in documenting significant decisions made during the development process, providing a historical record of the project's evolution and ensuring that all stakeholders are aware of the underlying principles.

In conclusion, the minimal HTTP server for the LLM Council, as detailed in src/llm_council/http_server.py, provides the interface through which clients trigger the council's deliberations. By adhering to the principles of statelessness, single-tenancy, and BYOK, the server stays simple and well suited to development and testing. For further information on building APIs with FastAPI, consider exploring the official FastAPI documentation, which covers the framework's features and recommended practices in depth.