A scalable asynchronous service backend that enables configurable AI service integration and request handling. The system provides a flexible architecture for managing generative AI service requests, efficiently sharing data between pipeline steps, handling job queues, and tracking credit/resource usage across integrations.

Application Components

API (Django)

RESTful API service handling client requests and service configuration

User authentication and request management

Service configuration through admin interface

Request validation and queue management

Executor

Asynchronous handler for AI service integration

Task management and execution

Allocation and monitoring of shared resources like integration credits and maximum concurrent requests

Supports multiple AI service handlers

Recon

Request monitoring system

Ensures request completion

Resource usage tracking

System health monitoring

System Design

The system architecture is built around a request-based service model. Clients interact with services through RESTful endpoints (`/svc/<service>/<action>/`), which generate asynchronous requests. Each request is tracked through a dedicated request object that maintains state and results, accessible via `/svc/request/<id>/`.
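For illustration, a minimal client interaction might look like the following (the host, the `summarize/generate` service action, and the response fields `id`, `state`, and `results` are all assumptions; only the URL shapes come from the design above):

```python
import requests

BASE = "https://api.example.com"  # hypothetical host

# Submit an asynchronous request to a service action.
resp = requests.post(f"{BASE}/svc/summarize/generate/", json={"text": "..."})
request_id = resp.json()["id"]  # assumed response field

# Poll the dedicated request object for state and results.
status = requests.get(f"{BASE}/svc/request/{request_id}/").json()
print(status["state"], status.get("results"))
```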
 
The functional system components are:
 
Handlers: Service integration executors that process prompts against external APIs
Offers: Service interfaces that define handler interactions through templated prompts and preset parameters
Deals: Bulk job orchestrators that combine multiple service offers
 

Service Handlers

Service Handlers are modular components that manage external API integrations. Each handler:
– Implements an `execute` function as the primary integration point
– Processes method-specific API requests based on the provided parameters
– Updates request objects with integration results and status
– Manages resource allocation and API quotas
Configuration requires a ServiceHandler database entry with:
```
id: <module>.<method>   # Unique handler identifier
credits: <integer>      # Internal API usage cost
```

Resource specifications and requirements are managed through the configuration system. For implementation details, see Adding New Service Handlers.
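As a rough sketch (the module name, class shape, and helper are illustrative, not the actual codebase layout), a handler might look like this:

```python
# executor/handlers/openai_text.py (hypothetical module)

class OpenAITextHandler:
    """Illustrative handler wrapping an external text-generation API."""

    def execute(self, request_obj, params):
        """Primary integration point: process the prompt, update the request."""
        prompt = request_obj["prompt"]
        try:
            result = self._call_external_api(prompt, **params)  # assumed helper
            request_obj["results"] = result
            request_obj["status"] = "complete"
        except Exception as exc:
            request_obj["status"] = "failed"
            request_obj["error"] = str(exc)
        return request_obj
```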
 

Service Offers

Service Offers define the available actions for each service. They provide a standardized interface between clients and service handlers through configurable templates.
 
Configuration Parameters:
```
price: integer            # Platform fee for the action
prompt_template: string   # Template string with {variables}
handler: ServiceHandler   # Associated handler entry
create_qty: integer       # Number of results to generate
expiration: integer       # Optional: result expiration time in seconds
params: object            # API integration configuration parameters
```

When configured, each offer automatically exposes an endpoint at `/svc/<service>/<action>/`.
 
The variables defined in the prompt template become the required input parameters for the endpoint.
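For example, a hypothetical `summarize` service with a `generate` action might be configured as follows (all values illustrative):

```yaml
price: 5
prompt_template: "Summarize the following {content_type}: {text}"
handler: text_api.complete   # hypothetical ServiceHandler id
create_qty: 1
expiration: 86400
params: {model: standard}
```

This offer would expose `/svc/summarize/generate/` and require `content_type` and `text` in the request payload.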
 
Note: Services are virtual constructs defined by their available offers rather than standalone entities.

Service Deals

Service Deals enable bulk operations by combining multiple service offers into a single request. They support both required and optional service combinations.
Configuration:
```
Included Offers:
  offers: [Service Offers]    # Required service offers
  price: integer              # Fixed price for the deal
  price_modifier: integer     # Percentage discount on cumulative offer prices

Optional Add-ons:
  addons: [Service Offers]    # Optional service offers
  addons_modifier: integer    # Percentage discount on selected add-ons
```

Deal Execution:
– Primary offers are always executed as part of the deal
– Add-on services can be selectively included
– Pricing is calculated based on (see the sketch after this list):
  1. Primary offers: either the fixed price or the modified cumulative price
  2. Add-ons: the modified cumulative price of the selected services
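A minimal sketch of the pricing rule as described (field names follow the configuration above; treating the fixed price as taking precedence over the modifier is an assumption):

```python
def deal_price(deal, selected_addons):
    """Compute the total price of a deal per the rules above (sketch)."""
    offers_total = sum(o["price"] for o in deal["offers"])
    if deal.get("price") is not None:
        base = deal["price"]  # fixed price takes precedence (assumed)
    else:
        base = offers_total * (100 - deal["price_modifier"]) // 100
    addons_total = sum(a["price"] for a in selected_addons)
    addons = addons_total * (100 - deal["addons_modifier"]) // 100
    return base + addons
```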
 

Pipelines:

Each offer in a Service Deal can additionally be configured as a pipeline job, which passes the output of the previous job in as its input. A pipeline offer can further be configured to pass along only its own results (overwrite), its own results plus the results of the previous job (extend), or just the results of the previous job (pass through).
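The three pass-along modes could be summarized as follows (a sketch; the mode identifiers are assumptions, only the behavior comes from the description above):

```python
def combine_results(mode, job_results, previous_results):
    """Decide what a pipeline step forwards to the next job (sketch)."""
    if mode == "overwrite":     # only this job's results
        return job_results
    if mode == "extend":        # this job's results plus the previous job's
        return job_results + previous_results
    if mode == "pass_through":  # only the previous job's results
        return previous_results
    raise ValueError(f"unknown pipeline mode: {mode}")
```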

Request Workflow

A POST request made by the user to `/svc/<service>/<action>/` triggers the following steps:
  1. The API translates the service and action to a handler, method, and prompt template.
  2. Payload values submitted in the POST request are used to populate the template.
  3. A request ID is generated from a hash of the user and inputs (prevents spam requests and accidental duplicate submissions).
  4. The request object is stored in the Mongo request collection.
  5. A job message is built and sent to the queue.
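Steps 3-5 might look roughly like this (the database, collection, and queue names are assumptions):

```python
import hashlib
import json

import redis
from pymongo import MongoClient

def submit_request(user_id, handler, method, payload, prompt):
    # 3. Deterministic request ID from user + inputs deduplicates resubmits.
    digest = hashlib.sha256(
        json.dumps({"user": user_id, "payload": payload}, sort_keys=True).encode()
    ).hexdigest()

    # 4. Persist the request object in the Mongo request collection.
    requests_col = MongoClient()["svc"]["requests"]  # assumed names
    requests_col.update_one(
        {"_id": digest},
        {"$setOnInsert": {"status": "queued", "prompt": prompt, "user": user_id}},
        upsert=True,
    )

    # 5. Build the job message and push it onto the Redis queue.
    job = {"id": digest, "handler": handler, "method": method, "prompt": prompt}
    redis.Redis().lpush("jobs", json.dumps(job))  # assumed queue key
    return digest
```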

Infrastructure

MongoDB: Primary database for storing request data, configurations, and service mappings
Redis: Queue management and job data storage
Prometheus: Metric collection for system monitoring
Grafana: Metric visualization and dashboards
Fluentd: Log aggregation and management

Alerts generated by Prometheus, Grafana, and Recon are sent to a Discord server.

Platform Administration

Admin Panel Configuration
Access the Django Admin site to configure:
– service handlers
– offers
– deals
Additionally, use the management system to create, update, or roll back the current application configs. Configs are stored as YAML files with a custom protocol; see the next section.
 
 

Configuration

App configs (and all historical versions) are stored centrally in Mongo.
The system uses YAML files for configuration with special protocols:
 
```yaml
# Special Notations
~key: value            # Encrypted values
^key: value            # UTF-8 bytes string
key: !ENV ${VAR_NAME}  # Environment variable
```
When a config is uploaded, ~keys are encrypted before being stored in the DB.
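A sketch of the upload-side transform (assuming Fernet symmetric encryption, consistent with the `fernet_key` in the bootstrap config below; nested dicts omitted for brevity):

```python
from cryptography.fernet import Fernet

def encrypt_marked_keys(cfg: dict, key: bytes) -> dict:
    """Encrypt values of ~keys before the config is stored (sketch)."""
    f = Fernet(key)
    out = {}
    for k, v in cfg.items():
        if k.startswith("~"):
            out[k] = f.encrypt(str(v).encode()).decode()
        else:
            out[k] = v
    return out
```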
 
Meta Configuration
 
Each application component uses a bootstrap config.yaml:
 
```yaml
version: 1.0

meta:
  source: mongo
  collection: !ENV ${CFG_COLLECTION}
  ids:                # config: version
    api: 1.0
    service_map: 1.0
  mongo:
    uri: !ENV ${MONGO_URI}
    port: !ENV ${MONGO_PORT}
    db: !ENV ${MONGO_DB}
  # Key to decrypt configs
  fernet_key: !ENV ${decrypt_key}
```
When configs are imported from the DB, ~keys are decrypted and ^keys are cast to bytes.
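The inverse on import might look like this (a sketch; whether the `~`/`^` marker is stripped from the key on load is an assumption):

```python
from cryptography.fernet import Fernet

def resolve_marked_keys(cfg: dict, key: bytes) -> dict:
    """Decrypt ~keys and cast ^keys to bytes when loading from the DB (sketch)."""
    f = Fernet(key)
    out = {}
    for k, v in cfg.items():
        if k.startswith("~"):
            out[k[1:]] = f.decrypt(v.encode()).decode()
        elif k.startswith("^"):
            out[k[1:]] = str(v).encode("utf-8")
        else:
            out[k] = v
    return out
```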

Logging and Monitoring

The Logging and Monitoring setup leverages the EFK stack for centralized log management and observability. A Prometheus exporter is configured for the Django API, enabling real-time monitoring. Custom Prometheus metrics in the Executor track total requests processed and job processing time, providing insights into system performance. To enhance traceability, the logging context includes a request ID, ensuring logs can be correlated across services. Grafana is used for visualization, offering clear and actionable insights into system health and performance.
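The Executor metrics described above might be defined along these lines (metric and logger names are illustrative, using the standard `prometheus_client` library):

```python
import logging

from prometheus_client import Counter, Histogram

# Custom Executor metrics: total requests processed and job processing time.
REQUESTS_TOTAL = Counter(
    "executor_requests_total", "Total requests processed", ["handler", "status"]
)
JOB_SECONDS = Histogram("executor_job_seconds", "Job processing time in seconds")

def process_job(job, handler):
    # Request ID in the logging context lets logs be correlated across services.
    log = logging.LoggerAdapter(logging.getLogger("executor"), {"request_id": job["id"]})
    with JOB_SECONDS.time():
        try:
            handler.execute(job)
            REQUESTS_TOTAL.labels(handler=job["handler"], status="ok").inc()
        except Exception:
            REQUESTS_TOTAL.labels(handler=job["handler"], status="error").inc()
            log.exception("job failed")
            raise
```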

Adding New Service Handlers

To integrate a new service handler:

1. Create handler class in `executor/handlers/`
2. Configure shared resources in `executor.resource_map`
3. Register handler in `service_map.py` INSTALLED_SERVICES
4. Add handler methods to `svc_handlers` config
5. Configure Service Handler in admin panel
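Step 3 might look roughly like this (the structure of `service_map.py` and the handler name are assumptions):

```python
# service_map.py (sketch)
from executor.handlers.openai_text import OpenAITextHandler  # hypothetical handler

INSTALLED_SERVICES = {
    "openai_text": OpenAITextHandler,  # key corresponds to the handler id's <module> part
}
```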

Future enhancements

  • WebSocket connection for request processing status updates
  • Kafka topics and handler-specific Executors
