Skip to main content

Documentation Index

Fetch the complete documentation index at: https://developers.autoplay.ai/llms.txt

Use this file to discover all available pages before exploring further.

RAG context assembly: user query, real-time events, conversation history, and knowledge base flow into context assembly, system prompt, and LLM, producing a user-action-aware answer.
When a user asks a question about your software, the best answer rarely comes from a single source. To consistently deliver accurate, contextual responses, your pipeline needs to draw from four signals together:
SignalWhat it captures
User queryWhat the user is asking right now
Real-time product eventsWhat the user is actively doing in your product
Conversation historyWhat has already been discussed in this session
Knowledge baseRetrieved docs or chunks from your KB (when configured)
Weaving these together β€” rather than only querying a knowledge base in isolation β€” is what separates a generic AI response from one that feels genuinely helpful and contextually aware. autoplay_sdk.rag_query provides the framework to assemble these signals into a single, structured context block ready for any chat LLM.
This is not RagPipeline (ingestion β†’ vector store). rag_query is specifically for answering a user message using structured, multi-signal context at query time.
When to use ContextStore.enrich vs assemble_rag_chat_context:
  • ContextStore.enrich(session_id, query) returns one enriched string. Use it as retrieval input for embedding/vector search.
  • assemble_rag_chat_context(...) returns structured prompt parts (user_block, assembly). Use it to build chat LLM messages.

What it optimizes for

User query

The current message from the user β€” the question being answered right now.

Real-time events

What the user is doing in your product at this moment, including optional delta activity since their last chat message via session_activity_since.

Conversation history

Prior turns in the conversation, surfaced via ChatMemoryProvider.conversation_turns.

Knowledge base

Retrieved records from your KB via KnowledgeBaseRetriever on RagChatProviders when configured. The SDK is vendor-agnostic β€” swap in Zep, Postgres, Atlas, or any other backend behind the provided protocols.

Entry point

from autoplay_sdk.rag_query import (
    RagChatProviders,
    assemble_rag_chat_context,
    format_rag_system_prompt,
)
from autoplay_sdk.prompts import RAG_SYSTEM_PROMPT

# Implement ChatMemoryProvider + KnowledgeBaseRetriever for your stack, then:
user_block, assembly = await assemble_rag_chat_context(
    product_id="...",
    integration_config={"kb_knowledge_id": "..."},  # your KB ids
    conversation_id="...",
    user_message="How do I export?",
    email="user@example.com",
    session_id="sess_1",
    activity_since_cutoff=None,
    providers=your_rag_chat_providers,
)

system_text = format_rag_system_prompt(
    template_content=RAG_SYSTEM_PROMPT["content"],
    assembly=assembly,
    user_message="How do I export?",
)

# Pass system_text + user messages to your LLM.
# Log prompt_meta=RAG_SYSTEM_PROMPT for observability.
The assembled system_text bundles all three signals β€” query, events, and history β€” into a single prompt your LLM can reason over without additional orchestration.

Delta activity: since last chat message

To give your LLM visibility into product actions that happened after the user’s previous message, persist an inbound watermark per thread and pass its value into assembly.
1

Load the previous inbound timestamp

Before calling assemble_rag_chat_context, retrieve the watermark from your store:
previous_at = await store.get_previous_inbound_at(scope)
2

Pass the cutoff into assembly

user_block, assembly = await assemble_rag_chat_context(
    ...
    activity_since_cutoff=cutoff_for_delta_activity(previous_at),
)
3

Advance the cursor after replying

Once the assistant reply is successfully sent, move the watermark forward:
await store.set_last_inbound_at(
    scope,
    effective_inbound_timestamp(msg_created_at)
)
Use ChatWatermarkScope(conversation_id=..., product_id=...) (plus optional tenant_id) to key threads consistently across your store. For the store itself:
  • Production: implement InboundWatermarkStore backed by Redis or SQL.
  • Development / testing: use the built-in InMemoryInboundWatermarkStore.

Default prompts

The SDK ships versioned prompt dicts (each with name, description, version, and content fields):
PromptPurpose
RAG_SYSTEM_PROMPTPrimary system prompt for RAG chat assembly
REASONING_PROMPTGuides multi-step reasoning over retrieved context
RESPONSE_PROMPTShapes the final user-facing answer format
Import from autoplay_sdk.prompts or use the root package re-exports.

Observability

The SDK does not configure logging for you. Enable debug output from the assembly step:
import logging
logging.getLogger("autoplay_sdk.rag_query").setLevel(logging.DEBUG)
OutcomeLog levelWhat’s emitted
SuccessDEBUGStructured extra only: product_id, conversation_id, session_id, coarse flags (has_memory, has_kb, has_delta_activity), and character lengths β€” never full message text or prompt content
FailureWARNINGexc_info=True with the same correlation IDs, then re-raises the original exception (providers are not silently swallowed)
See Logging for full conventions covering the autoplay_sdk.* namespace, lazy % formatting, and safe extra fields.

See also

RagPipeline

Embedding and upsert from the event stream β€” the ingestion side of RAG.

ContextStore

enrich(session_id, query) for retrieval queries at the overview level.