Wikipedia

Website: https://wikipedia.org
CLI Tool: curl
Authentication: None required for reading (anonymous access allowed)

Description

Wikipedia is the world's largest free online encyclopedia, containing over 60 million articles in 300+ languages. AI agents can access Wikipedia content through the MediaWiki API, which provides structured access to articles, search functionality, and metadata. The API is designed for programmatic access and returns JSON or XML responses.

Commands

Search Articles

# Search Wikipedia articles
curl "https://en.wikipedia.org/w/api.php?action=opensearch&search=Artificial%20Intelligence&limit=10&format=json"

Search for Wikipedia articles by keyword. Returns article titles, descriptions, and URLs. Use URL encoding for spaces and special characters.

Get Article Content (Plain Text)

# Get article extract in plain text
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Artificial%20Intelligence&prop=extracts&explaintext=true&format=json"

Retrieve article content as plain text. Use explaintext=true to remove HTML formatting. Returns full article text or summary.

Get Article Content (HTML)

# Get article content with HTML formatting
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Machine%20Learning&prop=extracts&format=json"

Retrieve article content with HTML formatting preserved. Useful for preserving structure and links.

Get Article Summary

# Get article summary (first section only)
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Python&prop=extracts&exintro=true&explaintext=true&format=json"

Get just the introduction/summary of an article. Use exintro=true to limit to the first section. Ideal for quick lookups.

Get Article by Page ID

# Get article by numeric page ID
curl "https://en.wikipedia.org/w/api.php?action=query&pageids=1234567&prop=extracts&explaintext=true&format=json"

Retrieve article using its numeric page ID instead of title. Useful when you have stored page IDs.

Get Multiple Articles

# Get multiple articles in one request
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Python|JavaScript|Ruby&prop=extracts&exintro=true&explaintext=true&format=json"

Fetch multiple articles in a single API call. Separate titles with pipe character (|). Maximum 50 titles per request.
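The batched request can also be built programmatically. A minimal sketch using only the standard library: `urlencode` escapes the pipe separator to `%7C`, which the API accepts.

```python
from urllib.parse import urlencode

# Pipe-joined titles; urlencode escapes the pipe to %7C, which the API accepts.
titles = ["Python (programming language)", "JavaScript", "Ruby (programming language)"]
params = {
    "action": "query",
    "titles": "|".join(titles),
    "prop": "extracts",
    "exintro": "true",
    "explaintext": "true",
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(url)
```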

Get Article Metadata

# Get article info (page ID, last edit, length)
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Wikipedia&prop=info&format=json"

Retrieve metadata about an article including page ID, last revision timestamp, page length, and protection status.

Get Article Categories

# Get categories for an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=categories&format=json"

List all categories assigned to an article. Useful for understanding article classification.

Get Article Links

# Get all links from an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Quantum%20Computing&prop=links&pllimit=50&format=json"

Get all internal Wikipedia links from an article. Use pllimit to control number of results (max 500).

Get Article Images

# Get images used in an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Solar%20System&prop=images&format=json"

List all images included in an article. Returns image filenames.

Get Image URL

# Get actual URL of an image file
curl "https://en.wikipedia.org/w/api.php?action=query&titles=File:Example.jpg&prop=imageinfo&iiprop=url&format=json"

Get the full URL to download an image file. Prefix the title with File: for image lookups.

Search with Suggestions

# Get search suggestions (autocomplete)
curl "https://en.wikipedia.org/w/api.php?action=opensearch&search=Quantum&limit=10&format=json"

Get search suggestions for partial queries. Useful for autocomplete functionality. Includes typo correction.
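Note that opensearch returns a positional JSON array rather than an object, which is easy to mis-parse. A sketch of unpacking it, using an abridged, illustrative sample response (in current API versions the descriptions array is typically empty):

```python
# opensearch returns a positional array: [query, [titles], [descriptions], [urls]]
# Sample response (abridged, for illustration only):
sample = [
    "Quantum",
    ["Quantum", "Quantum mechanics", "Quantum computing"],
    ["", "", ""],
    ["https://en.wikipedia.org/wiki/Quantum",
     "https://en.wikipedia.org/wiki/Quantum_mechanics",
     "https://en.wikipedia.org/wiki/Quantum_computing"],
]
query, titles, _descriptions, urls = sample
suggestions = list(zip(titles, urls))  # pair each title with its URL
print(suggestions[0])
```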

Advanced Search (Full Text)

# Full-text search with snippets
curl "https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=artificial%20intelligence&srlimit=10&format=json"

Perform full-text search across Wikipedia. Returns snippets showing search term context. More detailed than opensearch.

Get Random Article

# Get random article
curl "https://en.wikipedia.org/w/api.php?action=query&list=random&rnnamespace=0&rnlimit=1&format=json"

Get a random Wikipedia article. Use rnnamespace=0 for main articles only (excludes talk pages, etc.).

Get Article Revisions

# Get revision history for an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Blockchain&prop=revisions&rvlimit=10&rvprop=timestamp|user|comment&format=json"

Get revision history showing who edited an article and when. Use rvlimit to control number of revisions returned.

Get Article in Different Language

# Get article title in other languages
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Computer&prop=langlinks&lllimit=50&format=json"

Get links to the same article in other language editions of Wikipedia. Useful for multilingual content.

Check if Article Exists

# Check if page exists
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Example%20Article&format=json"

Check if an article exists. Response includes "missing" key if page doesn't exist.
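The "missing" check can be wrapped in a small helper. A sketch against an illustrative sample response (missing pages come back with a negative page ID key):

```python
# Sample "query" response for a nonexistent title (illustrative):
data = {
    "query": {
        "pages": {
            "-1": {"ns": 0, "title": "Example Article", "missing": ""}
        }
    }
}

def page_exists(data):
    """True only if every returned page exists (no 'missing' key)."""
    pages = data["query"]["pages"]
    return all("missing" not in page for page in pages.values())

print(page_exists(data))
```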

Get Article Coordinates

# Get geographic coordinates for an article
curl "https://en.wikipedia.org/w/api.php?action=query&titles=Eiffel%20Tower&prop=coordinates&format=json"

Get GPS coordinates for articles about places. Returns latitude and longitude.

Get Page View Statistics

# Get page view count (requires different API)
curl "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Python/daily/20240101/20240131"

Get page view statistics for an article over a date range. Uses Wikimedia REST API (separate from MediaWiki API).
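Because the pageviews endpoint encodes every parameter in the URL path, it helps to build the URL from parts. A sketch (the helper name is ours, not part of any library); titles go in the path, so they must be fully percent-encoded:

```python
from urllib.parse import quote

def pageviews_url(project, article, start, end,
                  access="all-access", agents="all-agents", granularity="daily"):
    """Build a Wikimedia REST pageviews URL. start/end are YYYYMMDD strings."""
    # The title sits in the URL path, so escape everything, including slashes.
    title = quote(article.replace(" ", "_"), safe="")
    return ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
            f"{project}/{access}/{agents}/{title}/{granularity}/{start}/{end}")

url = pageviews_url("en.wikipedia", "Python (programming language)",
                    "20240101", "20240131")
print(url)
```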

Examples

Simple Article Lookup Workflow

# Search for article
SEARCH=$(curl -s "https://en.wikipedia.org/w/api.php?action=opensearch&search=Python%20programming&limit=5&format=json")
echo "$SEARCH" | jq -r '.[1][0]'  # First result title

# Get article summary
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Python%20(programming%20language)&prop=extracts&exintro=true&explaintext=true&format=json" | jq '.query.pages[].extract'

Research Topic Workflow

# Get main article
TOPIC="Artificial%20Intelligence"   # URL-encode spaces in the title
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=$TOPIC&prop=extracts|categories|links&explaintext=true&format=json" > article.json

# Extract text
jq '.query.pages[].extract' article.json

# Get related topics via links
jq '.query.pages[].links[].title' article.json | head -20

# Get categories
jq '.query.pages[].categories[].title' article.json

Multi-Language Content Access

# Get article in English
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=extracts&exintro=true&explaintext=true&format=json" | jq '.query.pages[].extract'

# Get same article in German
curl -s "https://de.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=extracts&exintro=true&explaintext=true&format=json" | jq '.query.pages[].extract'

# Get article in French
curl -s "https://fr.wikipedia.org/w/api.php?action=query&titles=Berlin&prop=extracts&exintro=true&explaintext=true&format=json" | jq '.query.pages[].extract'

Fact-Checking Workflow

# Search for topic
curl -s "https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=climate%20change&format=json" | jq '.query.search[] | {title: .title, snippet: .snippet}'

# Get full article with metadata
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Climate%20change&prop=extracts|info|revisions&explaintext=true&rvlimit=1&format=json" > climate.json

# Check when last updated
jq '.query.pages[].revisions[0].timestamp' climate.json

# Get article text
jq '.query.pages[].extract' climate.json

Image Extraction Workflow

# Get images from article
IMAGES=$(curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Mars&prop=images&format=json")
echo "$IMAGES" | jq '.query.pages[].images[].title'

# Get URL for first image (file titles often contain spaces, so let curl encode them)
IMAGE_NAME=$(echo "$IMAGES" | jq -r '.query.pages[].images[0].title')
curl -s -G "https://en.wikipedia.org/w/api.php" \
    --data-urlencode "action=query" \
    --data-urlencode "titles=$IMAGE_NAME" \
    --data-urlencode "prop=imageinfo" \
    --data-urlencode "iiprop=url" \
    --data-urlencode "format=json" | jq '.query.pages[].imageinfo[0].url'

Python Script Example

import requests

def get_wikipedia_summary(title):
    """Get Wikipedia article summary."""
    url = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "titles": title,
        "prop": "extracts",
        "exintro": True,
        "explaintext": True,
        "format": "json"
    }

    response = requests.get(url, params=params)
    data = response.json()

    # Extract the page content
    pages = data["query"]["pages"]
    page_id = list(pages.keys())[0]

    if "missing" in pages[page_id]:
        return None

    return pages[page_id]["extract"]

# Usage
summary = get_wikipedia_summary("Machine Learning")
print(summary)

Batch Article Processing

# Create list of topics
TOPICS=("Python" "JavaScript" "Ruby" "Go" "Rust")

# Fetch all articles
for topic in "${TOPICS[@]}"; do
    echo "=== $topic ==="
    curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=$topic&prop=extracts&exintro=true&explaintext=true&format=json" | jq -r '.query.pages[].extract' | head -5
    echo ""
done

Monitor Article Changes

# Get current revision info
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Bitcoin&prop=revisions&rvlimit=1&rvprop=timestamp|user|comment&format=json" > bitcoin_latest.json

# Check latest edit
jq '.query.pages[] | {
    timestamp: .revisions[0].timestamp,
    user: .revisions[0].user,
    comment: .revisions[0].comment
}' bitcoin_latest.json

Geographic Data Extraction

# Get articles with coordinates near a location
curl -s "https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gscoord=37.7749|-122.4194&gsradius=10000&gslimit=10&format=json" | jq '.query.geosearch[] | {title: .title, dist: .dist}'

# Get coordinates for specific place
curl -s "https://en.wikipedia.org/w/api.php?action=query&titles=Golden%20Gate%20Bridge&prop=coordinates&format=json" | jq '.query.pages[].coordinates[0] | {lat: .lat, lon: .lon}'

Notes

  • No Authentication Required: Wikipedia's API is fully open for reading. No API keys or registration needed for anonymous access.

  • Rate Limits:

  • No hard rate limit for anonymous reads, but excessive usage may be throttled or blocked
  • Recommended: make requests serially (wait for each response before sending the next); the Wikimedia REST API documents a 200 requests/second ceiling, but sustained traffic should stay far below that
  • Use a User-Agent header to identify your bot: curl -A "MyBot/1.0 (contact@example.com)"
  • Respectful usage is enforced by the User-Agent and API etiquette policies, not technical limits
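Both recommendations can be baked into a tiny client helper. A sketch using only the standard library; the User-Agent string is the placeholder from above, to be replaced with your own contact details:

```python
import time
import urllib.request

# Placeholder identity; replace with your own bot name and contact address.
USER_AGENT = "MyBot/1.0 (contact@example.com)"

def make_opener():
    """Opener that sends a descriptive User-Agent with every request."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener

def throttle(min_interval=1.0, _last=[0.0]):
    """Sleep just enough to keep calls at least min_interval seconds apart."""
    wait = _last[0] + min_interval - time.monotonic()
    if wait > 0:
        time.sleep(wait)
    _last[0] = time.monotonic()
```

Call `throttle()` before each `opener.open(url)` to keep request rates polite.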

  • API Endpoints:

  • Action API: https://en.wikipedia.org/w/api.php (main API, used in all examples)
  • REST API: https://en.wikipedia.org/api/rest_v1/ (newer, mobile-focused)
  • Wikimedia API: https://wikimedia.org/api/rest_v1/ (cross-wiki statistics)

  • Language Support: Change domain for different languages:

  • English: en.wikipedia.org
  • Spanish: es.wikipedia.org
  • French: fr.wikipedia.org
  • German: de.wikipedia.org
  • Full list: https://meta.wikimedia.org/wiki/List_of_Wikipedias

  • Output Formats:

  • JSON (recommended): format=json
  • XML: format=xml
  • Legacy formats (php, yaml, and others) have been removed from modern MediaWiki
  • Always use format=json for AI agent consumption

  • Text Extraction Options:

  • explaintext=true: Returns plain text without HTML
  • exintro=true: Returns only the introduction section
  • exsentences=N: Returns first N sentences
  • exchars=N: Returns first N characters (approximate)

  • API Limits Per Request:

  • Multiple titles: Max 50 per request (pipe-separated: Python|Java|C%2B%2B — URL-encode special characters such as +)
  • Links: Max 500 per request (use pllimit=500)
  • Categories: Max 500 per request
  • Images: Max 500 per request
  • Revisions: Max 500 per request (use rvlimit=500)

  • Best Practices for AI Agents:

  • Always include a descriptive User-Agent header
  • Cache responses to avoid repeated requests for same content
  • Use exintro=true for summaries instead of full articles when possible
  • Batch requests using pipe-separated titles when fetching multiple articles
  • Use continue parameter for paginated results
  • Handle "missing" pages gracefully in your code
  • Respect Wikipedia's content licenses (CC BY-SA 4.0)
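The continue-parameter pattern from the list above can be sketched as a small generator. The `fetch` callable stands in for an HTTP call (e.g. `requests.get(API, params=params).json()`); the fake fetcher and its responses are illustrative only:

```python
def paginate(fetch, params):
    """Yield each API response, following MediaWiki 'continue' tokens."""
    params = dict(params)
    while True:
        data = fetch(params)
        yield data
        if "continue" not in data:
            break
        # Merge continuation tokens into the next request's parameters.
        params.update(data["continue"])

# Offline demonstration with a fake two-page fetcher:
responses = [
    {"query": {"pages": {"1": {"title": "A"}}}, "continue": {"plcontinue": "1|0|B"}},
    {"query": {"pages": {"1": {"title": "A"}}}},
]
def fake_fetch(params):
    return responses.pop(0)

pages = list(paginate(fake_fetch, {"action": "query"}))
print(len(pages))  # 2
```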

  • Error Handling:

  • Missing page: Response includes "missing": "" key
  • Invalid title: Response includes "invalid": "" key
  • API errors: Check "error" key in response
  • Network timeouts: Implement retry logic with exponential backoff
  • Always check response structure before accessing nested fields
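The retry-with-exponential-backoff advice can be sketched as a generic wrapper (the helper name is ours); the flaky function below simulates two transient network failures:

```python
import time

def with_retries(call, attempts=4, base_delay=0.5):
    """Retry call() with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# Offline demonstration: fails twice, then succeeds.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)  # ok
```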

  • URL Encoding:

  • Spaces: Use %20 or + in URLs
  • Special characters: URL encode using standard encoding
  • Bash: Use curl with quotes to handle spaces
  • Python: Use requests library which handles encoding automatically
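When building URLs by hand in Python, `urllib.parse.quote` does the encoding; note the `safe` parameter, which defaults to leaving `/` unescaped:

```python
from urllib.parse import quote

# quote() percent-encodes everything outside `safe` (default safe="/").
print(quote("Artificial Intelligence"))  # Artificial%20Intelligence
print(quote("C++", safe=""))             # C%2B%2B
print(quote("AT&T", safe=""))            # AT%26T
```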

  • Content Parsing Tips:

  • Use jq for JSON parsing in bash scripts
  • Page ID is the key in .query.pages object (not always sequential)
  • Extract page content: .query.pages[].extract
  • Handle multiple pages: Iterate over .query.pages | to_entries
  • Remove HTML: Use explaintext=true or parse HTML with library
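The iterate-over-pages tip matters because page IDs, not titles, are the keys of .query.pages. A sketch against an abridged, illustrative two-page response:

```python
# Sample two-page extracts response (abridged); page IDs are the dict keys.
data = {
    "query": {
        "pages": {
            "23862": {"pageid": 23862, "title": "Python (programming language)",
                      "extract": "Python is a programming language..."},
            "9845": {"pageid": 9845, "title": "JavaScript",
                     "extract": "JavaScript is a programming language..."},
        }
    }
}

# Iterate over values() rather than assuming which page IDs came back.
extracts = {page["title"]: page.get("extract", "")
            for page in data["query"]["pages"].values()}
print(sorted(extracts))
```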

  • Page Namespaces:

  • 0: Main articles (default)
  • 1: Talk pages
  • 2: User pages
  • 6: Files/Images
  • 14: Categories
  • Use rnnamespace=0 to limit results to main articles

  • MediaWiki API Documentation:

  • Full API docs: https://www.mediawiki.org/wiki/API:Main_page
  • API sandbox (interactive): https://en.wikipedia.org/wiki/Special:ApiSandbox
  • Query examples: https://www.mediawiki.org/wiki/API:Query
  • Help with specific action: Add action=help&modules=query to any request

  • Content Licensing:

  • Text: Creative Commons BY-SA 4.0 and GFDL
  • Images: Various licenses (check individual file pages)
  • You must attribute Wikipedia and preserve license
  • Commercial use is allowed with proper attribution
  • See: https://en.wikipedia.org/wiki/Wikipedia:Copyrights

  • Alternative Tools:

  • wikipedia Python library: Simplified Wikipedia API wrapper
  • wptools Python library: Advanced Wikipedia/Wikidata tool
  • wtf_wikipedia JavaScript library: Wikipedia text parser
  • mwclient Python library: MediaWiki API client
  • pywikibot Python framework: Bot framework for Wikipedia

  • Advanced Features:

  • Wikidata integration: Get structured data via Wikidata API
  • Page previews: Use REST API for mobile-optimized previews
  • Nearby pages: Use geosearch for location-based queries
  • Citation extraction: Parse references from article HTML
  • Infobox data: Parse from HTML or use Wikidata API

  • Performance Optimization:

  • Use compression: Add Accept-Encoding: gzip header
  • Request only needed properties: Limit prop= parameter
  • Use page IDs when possible: Faster than title lookups
  • Enable HTTP/2: Supported on all Wikipedia domains
  • Keep-alive connections: Reuse TCP connections for multiple requests

  • Common Gotchas:

  • Page titles are case-sensitive (except first character)
  • Disambiguation pages have "(disambiguation)" suffix
  • Redirects: redirect pages carry a "redirect" key in prop=info responses; add &redirects to the query to resolve them automatically
  • Some articles have protection (can't be edited)
  • Mobile vs desktop: Different content sometimes
  • Infoboxes and tables: Difficult to parse from plain text, use HTML

  • Mobile/Summary API (Alternative):

# Simpler summary endpoint
curl "https://en.wikipedia.org/api/rest_v1/page/summary/Python_(programming_language)"

  • Returns structured summary, image, and coordinates
  • Easier to parse than main API
  • Mobile-optimized content
  • Includes thumbnail image URL
