Citation Data Flow
Overview
Backend ask() → RAGState → JSON/SSE → Conversation.aguiState (raw)
→ CitationExtractor (schema firewall) → SourceReference (domain)
→ MessageState → buildSourceReferencesMap → CitationsSection
Layer 1: Backend (Python)
File: haiku/rag/skills/rag.py
class RAGState(BaseModel):
citations: list[Citation] = [] # session-global flat list
qa_history: list[QAHistoryEntry] = [] # per-turn Q&A pairs
class QAHistoryEntry(BaseModel):
question: str
answer: str
confidence: float = 0.9
citations: list[Citation] = []
The ask() skill sets citation.index (1-based, incremental across
session), appends to both citations and qa_history.
Layer 2: AG-UI transport (JSON over SSE)
Two event types carry state:
- StateSnapshotEvent: full state as JSON
- StateDeltaEvent: RFC 6902 JSON Patch operations
Example payload:
{
"rag": {
"qa_history": [
{
"question": "What is X?",
"answer": "X is...",
"citations": [
{
"index": 1,
"chunk_id": "abc123",
"document_id": "doc456",
"document_uri": "s3://bucket/file.pdf",
"document_title": "My Document",
"content": "The relevant text...",
"headings": ["Chapter 1", "Section 2"],
"page_numbers": [5, 6]
}
]
}
],
"citations": [ ... ]
}
}
Layer 3: Frontend raw state
File: packages/soliplex_client/lib/src/domain/conversation.dart
class Conversation {
final Map<String, dynamic> aguiState; // raw JSON, not parsed
final Map<String, MessageState> messageStates;
}
aguiState stays raw for diff-based extraction and forward
compatibility.
Layer 4: Citation extraction
File:
packages/soliplex_client/lib/src/application/citation_extractor.dart
Called by RunOrchestrator._extractCitations() at run completion:
final citations = _citationExtractor.extractNew(
_preRunAguiState, // aguiState BEFORE this run
conversation.aguiState, // aguiState AFTER this run
);
Algorithm: compare qa_history lengths, extract only new entries at
indices [previousLength, currentLength), convert schema Citation
to domain SourceReference.
CitationExtractor is the schema firewall — the only file importing generated types. Backend schema changes only affect this file.
Layer 5: Domain model
// packages/soliplex_client/lib/src/domain/message_state.dart
class MessageState {
final String userMessageId;
final List<SourceReference> sourceReferences;
final String? runId;
}
// packages/soliplex_client/lib/src/domain/source_reference.dart
class SourceReference {
final String documentId; // internal ID, not displayed
final String documentUri; // file path display
final String content; // markdown preview
final String chunkId; // chunk visualization API
final String? documentTitle; // via displayTitle extension
final List<String> headings; // breadcrumb
final List<int> pageNumbers; // via formattedPageNumbers extension
final int? index; // badge number (1-based from backend)
}
Multi-run accumulation
In tool-yielding loops, _preRunAguiState tracks state before each
run segment. Each segment's new citations merge with existing ones:
Run 1: extractNew({}, state1) → [sr1, sr2]
_preRunAguiState = state1
Run 2: extractNew(state1, state2) → [sr3, sr4]
merged = [sr1, sr2, sr3, sr4]
_preRunAguiState = state2
Final MessageState carries all citations and the last segment's
runId.
Type transformation
Python Citation (snake_case) → JSON → Dart Citation (generated, camelCase)
→ SourceReference (frontend-owned, stable)
File reference
| Purpose | File |
|---|---|
| Backend RAG state | haiku/rag/skills/rag.py |
| Event processing | agui_event_processor.dart |
| Conversation storage | conversation.dart |
| Citation extraction | citation_extractor.dart |
| Orchestrator coordination | run_orchestrator.dart |
| Generated schema | rag.dart |
| Domain: citations | source_reference.dart |
| Domain: message metadata | message_state.dart |