Stage: query_expansion

Generates query variants and sub-questions to improve retrieval coverage.


Class	`QueryExpansionStage`
Module	`src/research/query_expansion.py`
Registry key	`query_expansion`

Input / output

Direction	Type	Details
Input (`data`)	`str | QueryUnderstandingResult`	Uses `key_concepts` when an understanding result is provided
Output (`data`)	`ExpandedQuerySet`	Passed to retrieval
Artifacts written	`expanded_queries`	Debug visibility only

Behavior

Two-phase expansion:

Heuristics (always run): synonym variants, concept permutations, and sub-question templates.
Optional LLM pass: when `query_expansion.llm_enabled` is true, the stage calls `AgentRole.EXPANSION` for additional variants and sub-questions. The results are merged with the heuristic output, deduplicated, and capped.

When the LLM pass is disabled or fails, the stage returns the heuristic-only expansion.
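
The two-phase flow above can be sketched as follows. This is a minimal illustration, not the stage's actual code: the `ExpandedQuerySet` fields, the `heuristic_expand` stand-in, the `llm_expand` callable, and the case-insensitive deduplication rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ExpandedQuerySet:
    # Hypothetical shape; the real class may carry more fields.
    variants: list[str] = field(default_factory=list)
    sub_questions: list[str] = field(default_factory=list)


def _merge(primary: list[str], extra: list[str], cap: int) -> list[str]:
    """Concatenate, drop case-insensitive duplicates, keep order, cap length."""
    seen: set[str] = set()
    merged: list[str] = []
    for item in primary + extra:
        key = item.strip().lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(item.strip())
    return merged[:cap]


def heuristic_expand(query: str) -> ExpandedQuerySet:
    # Stand-in for the synonym, permutation, and template heuristics.
    return ExpandedQuerySet(
        variants=[query, f"{query} overview"],
        sub_questions=[f"What is {query}?"],
    )


def expand(
    query: str,
    llm_expand: Optional[Callable[[str], ExpandedQuerySet]] = None,
    max_variants: int = 5,
    max_sub_questions: int = 3,
) -> ExpandedQuerySet:
    # Phase 1: heuristics always run and form the baseline.
    base = heuristic_expand(query)

    # Phase 2: optional LLM pass; any failure falls back to the baseline.
    extra = ExpandedQuerySet()
    if llm_expand is not None:
        try:
            extra = llm_expand(query)
        except Exception:
            extra = ExpandedQuerySet()

    # Heuristic output goes first so the cap never drops the baseline
    # in favor of LLM suggestions.
    return ExpandedQuerySet(
        variants=_merge(base.variants, extra.variants, max_variants),
        sub_questions=_merge(
            base.sub_questions, extra.sub_questions, max_sub_questions
        ),
    )
```

Passing `llm_expand=None` models the disabled case. Placing heuristic results ahead of LLM results before capping is a design choice of this sketch; the source does not state the real merge order.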

Settings inconsistency

`expand_query_llm()` reads `get_settings().llm` rather than `ctx.config.llm`. If the pipeline config and the global settings singleton differ, LLM expansion may use an unexpected provider or model.
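
To make the pitfall concrete, the sketch below contrasts the two lookups. Everything here is hypothetical: `LLMConfig`, the provider and model names, and the helper functions are illustrative stand-ins, not the project's real types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str


# Hypothetical global singleton, standing in for get_settings().llm.
_GLOBAL_LLM = LLMConfig(provider="provider-a", model="model-a")


def get_global_llm() -> LLMConfig:
    return _GLOBAL_LLM


def resolve_llm_current(ctx_llm: LLMConfig) -> LLMConfig:
    # Mirrors the behavior described above: the per-run config is ignored.
    return get_global_llm()


def resolve_llm_from_context(ctx_llm: LLMConfig) -> LLMConfig:
    # One possible alternative: prefer the config carried by the context.
    return ctx_llm


def settings_mismatch(ctx_llm: LLMConfig) -> bool:
    # A cheap guard a caller could log before running the LLM pass.
    return ctx_llm != get_global_llm()


run_config = LLMConfig(provider="provider-b", model="model-b")
print(resolve_llm_current(run_config))       # global settings win
print(resolve_llm_from_context(run_config))  # per-run settings win
print(settings_mismatch(run_config))
```

A check like `settings_mismatch` is only one way to surface the divergence; whether the stage should warn, fail, or switch to the context config is a decision this sketch does not make.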

Configuration

Key	Purpose
`query_expansion.llm_enabled`	Resolved at pipeline start from `llm_mode` + env
`query_expansion.llm_mode`	`auto` / `on` / `off`
`query_expansion.max_variants`	Cap on query variants
`query_expansion.max_sub_questions`	Cap on sub-questions

Env overrides: `RA_QUERY_EXPANSION__LLM_ENABLED`, `RA_QUERY_EXPANSION__LLM_MODE`.
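
One way the resolution of `llm_enabled` from `llm_mode` and the environment could work is sketched below. The precedence (explicit `LLM_ENABLED` override first, then mode), the accepted boolean spellings, and the meaning of `auto` (enable only when an LLM is reachable, modeled by the hypothetical `llm_available` flag) are assumptions; the source only states that the value is resolved at pipeline start.

```python
import os
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def resolve_llm_enabled(
    config_mode: str = "auto",
    llm_available: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve the effective llm_enabled flag (illustrative rules only)."""
    env = os.environ if env is None else env

    # Assumed precedence: an explicit on/off override beats the mode.
    explicit = env.get("RA_QUERY_EXPANSION__LLM_ENABLED")
    if explicit is not None:
        value = explicit.strip().lower()
        if value in _TRUE:
            return True
        if value in _FALSE:
            return False
        raise ValueError(f"unrecognized boolean value: {explicit!r}")

    # Otherwise the mode decides; the env value overrides the config value.
    mode = env.get("RA_QUERY_EXPANSION__LLM_MODE", config_mode).strip().lower()
    if mode == "on":
        return True
    if mode == "off":
        return False
    if mode == "auto":
        # Assumption: "auto" follows whether an LLM is actually usable.
        return llm_available
    raise ValueError(f"unknown llm_mode: {mode!r}")
```

For example, `resolve_llm_enabled("on", env={"RA_QUERY_EXPANSION__LLM_MODE": "off"})` returns `False` under these rules, because the environment value replaces the configured mode.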

LLM

Optional. Uses `AgentRole.EXPANSION` when enabled; heuristics always produce the baseline expansion.

Timeout

Governed by `pipeline.stage_timeout_seconds` (default 300 s).

Recovery

On failure, the stage returns the prior data unchanged.
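
The pass-through behavior can be pictured with a small wrapper. This is a sketch under assumptions: `run_with_recovery` and the logger name are hypothetical, and the real pipeline may implement recovery elsewhere (for example in the stage runner).

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("query_expansion")


def run_with_recovery(stage: Callable[[T], object], data: T) -> object:
    """Run a stage; if it raises, hand back the input data untouched."""
    try:
        return stage(data)
    except Exception:
        # Record the failure, then let the pipeline continue with prior data.
        logger.exception("query_expansion failed; passing prior data through")
        return data
```

Under this model, a failed run hands the next stage the original `str` or `QueryUnderstandingResult` rather than an `ExpandedQuerySet`, so a downstream consumer would need to accept both shapes.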

Metrics

variant_count
sub_question_count