LLM Layer
The LLM layer provides chat-model agents for query expansion, synthesis, and gap analysis. It is separate from the embedding model used by ranking, deduplication, relevance scoring, and clustering.
Source: src/models/, src/config/resolve_llm_features.py, src/config/model_selection.py.
Resolution phases
LLM behavior is determined in four phases, ending with agent creation at the call site:
flowchart TD
LOAD[Settings load] --> RESOLVE[resolve_effective_settings]
RESOLVE --> FACTORY[AgentFactory at call site]
FACTORY --> MODEL[Provider.create_model]
MODEL --> AGENT[Agent with role prompt]
Phase 1: Settings load
AppSettings merges kwargs → RA_* env vars → .env → YAML → defaults. Key LLM fields:
| Field | Default | Env override |
|---|---|---|
| llm.provider | "ollama" | RA_LLM__PROVIDER |
| llm.model | "auto" | RA_LLM__MODEL |
| llm.base_url | "http://localhost:11434" | RA_LLM__BASE_URL |
| llm.api_key | None | RA_LLM__API_KEY |
| synthesis.llm_mode | "auto" | RA_SYNTHESIS__LLM_MODE |
| query_expansion.llm_mode | "auto" | RA_QUERY_EXPANSION__LLM_MODE |
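The env override names in the table follow a prefix-plus-double-underscore pattern. The sketch below shows how such names could map onto nested settings fields. It assumes a convention like the nested delimiter in pydantic-settings; `nested_overrides` is a hypothetical helper, not part of AppSettings, and the real loader may behave differently.

```python
from typing import Any, Mapping


def nested_overrides(
    env: Mapping[str, str], prefix: str = "RA_", delimiter: str = "__"
) -> dict[str, Any]:
    """Collect prefixed env vars into a nested dict (hypothetical helper)."""
    result: dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated variables such as PATH
        # RA_LLM__PROVIDER -> ["llm", "provider"]
        path = name[len(prefix):].lower().split(delimiter)
        node = result
        for segment in path[:-1]:
            node = node.setdefault(segment, {})
        node[path[-1]] = value
    return result


example_env = {
    "RA_LLM__PROVIDER": "openai",
    "RA_SYNTHESIS__LLM_MODE": "on",
    "PATH": "/usr/bin",
}
overrides = nested_overrides(example_env)
print(overrides)
```

Single underscores inside a segment are preserved, so `RA_QUERY_EXPANSION__LLM_MODE` would map to `query_expansion.llm_mode` under this assumption.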
Phase 2: Feature flag resolution
resolve_effective_settings() runs at pipeline start (ResearchPipeline.execute()). It sets synthesis.llm_enabled and query_expansion.llm_enabled on a copy of settings passed to all stages via ctx.config.
Precedence per feature (synthesis, query_expansion):
1. RA_{SECTION}__LLM_ENABLED env override (true/false/1/0)
2. llm_mode: on → enabled; llm_mode: off → disabled
3. llm_mode: auto → rules below
Auto-mode rules:
| Provider | synthesis LLM | query_expansion LLM |
|---|---|---|
| openai, anthropic | Always enabled | Always enabled |
| ollama | Enabled if ollama_models.yaml entry has synthesis.llm_enabled: true for resolved model | Same catalog hint |
| Other | Disabled | Disabled |
Default quality
With Ollama llama3.2:3b (fallback) and llm_mode: auto, both synthesis and query expansion LLM are off. Reports use heuristics. See Heuristic vs LLM.
When provider is Ollama and model resolves from catalog, max_llm_papers may be updated from catalog hints (e.g., 8B model → 5 papers).
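The precedence and auto-mode rules above can be sketched as a single function. This is an illustration only, not the body of resolve_effective_settings(): `resolve_llm_enabled` and its parameters are hypothetical, the catalog hint is reduced to one boolean, and the handling of an unrecognized env value (falling through to `llm_mode`) is an assumption.

```python
from typing import Mapping

CLOUD_PROVIDERS = {"openai", "anthropic"}
TRUE_VALUES = {"true", "1"}
FALSE_VALUES = {"false", "0"}


def resolve_llm_enabled(
    section: str,
    provider: str,
    llm_mode: str,
    env: Mapping[str, str],
    catalog_enabled: bool = False,
) -> bool:
    """Decide whether a feature uses the LLM (hypothetical sketch)."""
    # 1. Explicit env override wins.
    raw = env.get(f"RA_{section.upper()}__LLM_ENABLED")
    if raw is not None:
        value = raw.strip().lower()
        if value in TRUE_VALUES:
            return True
        if value in FALSE_VALUES:
            return False
        # Assumption: an unrecognized value is ignored.
    # 2. Explicit mode.
    if llm_mode == "on":
        return True
    if llm_mode == "off":
        return False
    # 3. Auto mode: depends on the provider.
    if provider in CLOUD_PROVIDERS:
        return True
    if provider == "ollama":
        return catalog_enabled  # catalog hint for the resolved model
    return False


print(resolve_llm_enabled("synthesis", "ollama", "auto", {}))
```

With no env override, `auto` mode, and no catalog hint, the Ollama case resolves to disabled, which matches the default-quality note above.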
Phase 3: Model name resolution
When llm.model is "auto" or empty, resolve_llm_model_name() in src/config/model_selection.py:
- Loads config/ollama_models.yaml
- If auto_select: true, detects system resources (RAM, disk, swap)
- Picks the highest-priority model that fits resources
- Falls back to the catalog fallback model if none fit
Explicit model names (CLI, env, or YAML) skip auto-selection.
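The selection steps above can be sketched as follows. This is a simplified illustration, not the code in src/config/model_selection.py: `CatalogModel`, its fields, `pick_model`, and the `example-*` model names are all hypothetical, and only RAM is modelled (the real resolver also considers disk and swap, and honors `auto_select`).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CatalogModel:
    name: str
    priority: int      # hypothetical field: higher value is preferred
    min_ram_gb: float  # hypothetical field: RAM needed to run the model


def pick_model(
    requested: Optional[str],
    catalog: Sequence[CatalogModel],
    available_ram_gb: float,
    fallback: str,
) -> str:
    """Return an explicit name, the best fitting catalog model, or the fallback."""
    if requested and requested != "auto":
        return requested  # explicit names skip auto-selection
    fitting = [m for m in catalog if m.min_ram_gb <= available_ram_gb]
    if not fitting:
        return fallback
    return max(fitting, key=lambda m: m.priority).name


catalog = [
    CatalogModel("example-70b", priority=30, min_ram_gb=48.0),
    CatalogModel("example-8b", priority=20, min_ram_gb=8.0),
]
print(pick_model("auto", catalog, available_ram_gb=16.0, fallback="llama3.2:3b"))
```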
Phase 4: Provider and agent creation
AgentFactory (src/models/factory.py) resolves config, instantiates the provider, and creates a pydantic-ai Agent with the role system prompt.
Provider registry
| Provider key | Class | Backend |
|---|---|---|
| ollama | OllamaProvider | OpenAI-compatible API at normalized base_url |
| openai | OpenAIProviderImpl | pydantic-ai OpenAI model |
| anthropic | AnthropicProviderImpl | pydantic-ai Anthropic model |
Register custom providers with register_llm_provider().
API key resolution
| Provider | Key sources (priority) |
|---|---|
| Ollama | RA_LLM__API_KEY → OLLAMA_API_KEY → default "ollama" |
| OpenAI | RA_LLM__API_KEY → OPENAI_API_KEY (required) |
| Anthropic | RA_LLM__API_KEY → ANTHROPIC_API_KEY (required) |
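The priority order in the table can be sketched as a lookup over an environment mapping. `resolve_api_key` is a hypothetical helper written for illustration; the error type and the treatment of empty strings as missing are assumptions, not documented provider behavior.

```python
from typing import Mapping

# Provider-specific fallback variable, taken from the table above.
PROVIDER_ENV_KEYS = {
    "ollama": "OLLAMA_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_api_key(provider: str, env: Mapping[str, str]) -> str:
    """Return the first key found in priority order (hypothetical sketch)."""
    candidates = ["RA_LLM__API_KEY"]
    provider_var = PROVIDER_ENV_KEYS.get(provider)
    if provider_var:
        candidates.append(provider_var)
    for name in candidates:
        value = env.get(name)
        if value:  # assumption: empty strings count as missing
            return value
    if provider == "ollama":
        return "ollama"  # default placeholder key for local Ollama
    raise ValueError(f"No API key configured for provider {provider!r}")


print(resolve_api_key("ollama", {}))
```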
Base URL normalization
normalize_openai_base_url() appends /v1 if missing. The code default is http://localhost:11434; .env.example may show http://localhost:11434/v1 — both resolve to the same endpoint.
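A minimal sketch of this kind of normalization is shown below. It is not the source of normalize_openai_base_url(); the function name is hypothetical and the trailing-slash handling is an assumption.

```python
def normalize_base_url_sketch(base_url: str) -> str:
    """Append /v1 unless the URL already ends with it (hypothetical sketch)."""
    trimmed = base_url.rstrip("/")  # assumption: trailing slashes are dropped
    if trimmed.endswith("/v1"):
        return trimmed
    return trimmed + "/v1"


for url in ("http://localhost:11434", "http://localhost:11434/v1"):
    print(normalize_base_url_sketch(url))
```

Both inputs produce the same URL, which is why the code default and the .env.example value are interchangeable.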
Agent roles
| Role | Enum | Stage | Purpose |
|---|---|---|---|
| EXPANSION | AgentRole.EXPANSION | query_expansion | JSON variants + sub_questions |
| EXTRACTION | AgentRole.EXTRACTION | synthesis (Pass A) | Per-paper structured extraction |
| SYNTHESIS | AgentRole.SYNTHESIS | synthesis (Pass B) | Cross-paper synthesis JSON |
| GAP_ANALYSIS | AgentRole.GAP_ANALYSIS | gap_analysis | Gaps/opportunities JSON |
| ANALYSIS | AgentRole.ANALYSIS | src/analysis/llm.py only | Legacy module-level agent |
Runtime call sites
flowchart TD
RESOLVED[ctx.config after resolve_effective_settings]
RESOLVED --> QE{query_expansion.llm_enabled?}
QE -->|yes| QE_AGENT[AgentFactory EXPANSION]
QE -->|no| QE_SKIP[heuristics only]
RESOLVED --> SY{synthesis.llm_enabled?}
SY -->|yes| SY_A[EXTRACTION up to max_llm_papers]
SY_A --> SY_B[SYNTHESIS collective]
SY -->|no| SY_H[heuristic extraction + synthesis]
RESOLVED --> GA{synthesis.llm_enabled?}
GA -->|yes| GA_AGENT[GAP_ANALYSIS structured]
GA -->|no| GA_H[heuristic from synthesis fields]
Gap analysis coupling
Gap analysis LLM is gated by synthesis.llm_enabled, not a separate gap_analysis.llm_mode.
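The gating described above amounts to a small lookup from stage to resolved flag. The sketch below is illustrative only: `STAGE_GATE` and `uses_llm` are hypothetical names, and the resolved settings are reduced to a plain dict of booleans rather than the real ctx.config object.

```python
# Which resolved llm_enabled flag gates the LLM path of each stage.
STAGE_GATE = {
    "query_expansion": "query_expansion",
    "synthesis": "synthesis",
    "gap_analysis": "synthesis",  # no separate gap_analysis flag
}


def uses_llm(stage: str, llm_enabled: dict[str, bool]) -> bool:
    """Return True if the stage should take its LLM path (hypothetical sketch)."""
    section = STAGE_GATE.get(stage)
    return bool(section) and llm_enabled.get(section, False)


flags = {"synthesis": False, "query_expansion": True}
for stage in ("query_expansion", "synthesis", "gap_analysis"):
    print(stage, uses_llm(stage, flags))
```

Stages that are not in the mapping, such as ranking, never take an LLM path in this sketch.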
Embedding model (separate)
The chat LLM and embedding model are independent:
| Component | Config | Used by |
|---|---|---|
| Chat LLM | llm.* | query_expansion, synthesis, gap_analysis |
| Embeddings | embedding.* | deduplication, ranking, relevance_scoring, clustering |
Embedding provider: sentence-transformers via src/embeddings/. If not installed, embedding-dependent features degrade gracefully with warnings.
Non-pipeline LLM module
src/analysis/llm.py creates a module-level analysis_agent at import time via AgentFactory(). It is not part of the 11-stage pipeline and is used by legacy/orchestrator helper paths.
Known gaps
| Issue | Detail |
|---|---|
| llm.timeout_seconds | Configured but unused; stage timeouts are pipeline-level |
| llm.temperature | Not passed to pydantic-ai model constructors |
| expand_query_llm | Uses get_settings().llm instead of ctx.config.llm |
| analysis/llm.py | Global agent at import; separate from pipeline |
Related pages
- Ollama — auto-select, catalog, setup integration
- Cloud providers — OpenAI, Anthropic
- Heuristic vs LLM — quality tradeoffs
- Synthesis stage — two-pass LLM flow
- Environment variables