
Semantic Scholar

Semantic Scholar provides paper metadata and abstracts via the bulk search API. Enabled by default alongside OpenAlex.

Implementation: src/retrieval/providers/semantic_scholar.py

HTTP API

| Attribute | Value |
| --- | --- |
| Search URL | GET https://api.semanticscholar.org/graph/v1/paper/search/bulk |
| Query params | query, limit, fields=title,abstract,year,venue,url,externalIds |
| Authentication | Optional; S2_API_KEY env var → x-api-key header |
| Timeout | 60s (search), 15s (health) |
| Retries | 3 with exponential backoff |
| Rate limiting | HTTP 429 → sleep Retry-After (default 60s) |
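The retry and rate-limit rows above can be sketched roughly as follows. This is an illustration, not the code in semantic_scholar.py: the function name `fetch_with_retries`, the injectable `send` and `sleep` callables, the 1s base delay, and the choice to retry only 429 and 5xx responses are all assumptions. It also reads "Retries 3" as three retries after the first attempt.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def fetch_with_retries(
    send: Callable[[], Response],
    max_retries: int = 3,
    base_delay: float = 1.0,  # assumed; the docs do not state the base delay
    default_retry_after: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() once, then retry up to max_retries times on 429 or 5xx."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status == 200:
            return status, headers, body
        if status != 429 and status < 500:
            raise RuntimeError(f"non-retryable status {status}")
        if attempt == max_retries:
            break
        if status == 429:
            # Sleep for Retry-After when it is a number of seconds, else 60s.
            # (Header lookup is case-sensitive here; real clients normalize it.)
            try:
                delay = float(headers.get("Retry-After", default_retry_after))
            except ValueError:
                delay = default_retry_after
        else:
            # Exponential backoff: base_delay, 2x, 4x, ...
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError(
        f"request failed after {max_retries} retries (last status {status})"
    )
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production client would more likely wrap its HTTP library's retry hooks.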

Example request

GET https://api.semanticscholar.org/graph/v1/paper/search/bulk?query=transformer+attention&limit=8&fields=title,abstract,year,venue,url,externalIds
x-api-key: YOUR_KEY   # optional
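For reference, the same request could be built in Python with the standard library alone. This is a minimal sketch: `build_search_request` is a hypothetical helper name, and the actual provider may use a different HTTP client.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search/bulk"
FIELDS = "title,abstract,year,venue,url,externalIds"


def build_search_request(query: str, limit: int = 8) -> Request:
    """Build the GET request; attach x-api-key only when S2_API_KEY is set."""
    # safe="," keeps the commas in the fields list unescaped, as in the example.
    params = urlencode({"query": query, "limit": limit, "fields": FIELDS}, safe=",")
    headers = {}
    api_key = os.environ.get("S2_API_KEY")
    if api_key:
        headers["x-api-key"] = api_key
    return Request(f"{SEARCH_URL}?{params}", headers=headers, method="GET")
```

The request can then be sent with `urllib.request.urlopen(request, timeout=60)`, matching the 60s search timeout listed above.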

API key

| Variable | Required | Effect |
| --- | --- | --- |
| S2_API_KEY | No | Higher rate limits when set |

Without a key, anonymous rate limits apply. For heavy interactive use or batch jobs, set S2_API_KEY in .env.

Normalization

| S2 field | RetrievedPaper field |
| --- | --- |
| title | title |
| abstract | abstract |
| year | year |
| venue | venue |
| url | url |
| externalIds.DOI | doi (prefixed with https://doi.org/ if needed) |

The field set shown above does not request citationCount, so S2-sourced papers may lack a citation count; ranking for them may rely more on embedding similarity.
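The field mapping in this section can be sketched as a small function. The `RetrievedPaper` dataclass below is a stand-in containing only the fields named in the mapping; the project's real class may carry more. The `normalize` name and the handling of missing values are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

DOI_PREFIX = "https://doi.org/"


@dataclass
class RetrievedPaper:
    # Stand-in for the project's class; only the fields from the mapping.
    title: str
    abstract: Optional[str] = None
    year: Optional[int] = None
    venue: Optional[str] = None
    url: Optional[str] = None
    doi: Optional[str] = None


def normalize(record: Mapping[str, Any]) -> RetrievedPaper:
    """Map one S2 search result onto RetrievedPaper."""
    # externalIds may be missing or null, so fall back to an empty dict.
    doi = (record.get("externalIds") or {}).get("DOI")
    if doi and not doi.startswith(DOI_PREFIX):
        doi = DOI_PREFIX + doi  # add the prefix only when it is not there yet
    return RetrievedPaper(
        title=record.get("title") or "",
        abstract=record.get("abstract"),
        year=record.get("year"),
        venue=record.get("venue"),
        url=record.get("url"),
        doi=doi,
    )
```

A record with `externalIds: {"DOI": "10.1000/xyz"}` would yield `doi == "https://doi.org/10.1000/xyz"`, while a record without a DOI leaves `doi` as `None`.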

Configuration

retrieval:
  providers:
    semantic_scholar:
      enabled: true

Environment variables:

S2_API_KEY=your_api_key_here
RA_RETRIEVAL__PROVIDERS__SEMANTIC_SCHOLAR__ENABLED=true
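The name RA_RETRIEVAL__PROVIDERS__SEMANTIC_SCHOLAR__ENABLED appears to mirror the YAML path retrieval.providers.semantic_scholar.enabled, with an RA_ prefix and a double underscore between levels. The sketch below illustrates that naming convention only. It assumes the mapping works this way, and `env_to_nested` is a hypothetical helper; the project may instead rely on a settings library to do the parsing.

```python
from typing import Any, Dict, Mapping


def env_to_nested(
    environ: Mapping[str, str],
    prefix: str = "RA_",      # assumed prefix, inferred from the variable name
    delimiter: str = "__",    # assumed nesting delimiter
) -> Dict[str, Any]:
    """Turn PREFIX_A__B__C=value entries into {"a": {"b": {"c": value}}}."""
    config: Dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables such as S2_API_KEY are skipped
        keys = name[len(prefix):].lower().split(delimiter)
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value  # values stay strings; type coercion is omitted
    return config
```

Under these assumptions, the environment line above corresponds to the same nested key that the YAML snippet sets.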

Health check

Minimal search with limit=1 against the bulk endpoint, 15s timeout.
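A health check along these lines might look like the sketch below. The function name `is_healthy`, the placeholder query term, the single requested field, and the injectable `opener` are all assumptions made for illustration; only limit=1, the bulk endpoint, and the 15s timeout come from the description above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search/bulk"


def is_healthy(opener=urlopen, timeout: float = 15.0) -> bool:
    """Return True when a limit=1 bulk search answers with HTTP 200."""
    # The query term and field are placeholders chosen for this sketch.
    url = SEARCH_URL + "?" + urlencode({"query": "test", "limit": 1, "fields": "title"})
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Accepting `opener` as a parameter lets the check be exercised in tests without network access.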

Operational notes

  • Watch for 429 responses in logs; the client sleeps for the Retry-After interval (60s by default) and retries automatically
  • Bulk search is used (not single-paper lookup) for query-variant parallelism
  • Combined with OpenAlex, provides good recall for CS/ML topics

See also: Provider matrix, Environment variables.