STRING is a comprehensive database of known and predicted protein-protein interactions covering 59M proteins and 20B+ interactions across 5000+ organisms. Use it to query interaction networks, perform functional enrichment, and discover interaction partners through the REST API for systems biology and pathway analysis.
When to Use This Skill
This skill should be used when:
Retrieving protein-protein interaction networks for single or multiple proteins
Performing functional enrichment analysis (GO, KEGG, Pfam) on protein lists
Discovering interaction partners and expanding protein networks
Testing if proteins form significantly enriched functional modules
Generating network visualizations with evidence-based coloring
Analyzing homology and protein family relationships
Conducting cross-species protein interaction comparisons
Identifying hub proteins and network connectivity patterns
Quick Start
The skill provides:
Python helper functions (scripts/string_api.py) for all STRING REST API operations
Comprehensive reference documentation (references/string_reference.md) with detailed API specifications
When users request STRING data, determine which operation is needed and use the appropriate function from scripts/string_api.py.
Core Operations
1. Identifier Mapping (string_map_ids)
Convert gene names, protein names, and external IDs to STRING identifiers.
When to use: Starting any STRING analysis, validating protein names, finding canonical identifiers.
Usage:
from scripts.string_api import string_map_ids
# Map single protein
result = string_map_ids('TP53', species=9606)
# Map multiple proteins
result = string_map_ids(['TP53', 'BRCA1', 'EGFR', 'MDM2'], species=9606)
# Map with multiple matches per query
result = string_map_ids('p53', species=9606, limit=5)
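A mapping result usually needs to be reduced to a lookup from query name to STRING identifier before it is passed to other operations. The sketch below assumes the helper returns TSV text with `queryItem` and `stringId` columns and signals failures with an "Error:" prefix; the column layout, the `parse_mapping` helper, and the sample row are illustrative assumptions, so check them against the actual output of scripts/string_api.py.

```python
import csv
import io


def parse_mapping(tsv_text):
    """Return {query name: STRING ID} from TSV mapping output (assumed layout)."""
    if tsv_text.startswith("Error:"):
        raise ValueError(tsv_text)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapping = {}
    for row in reader:
        # Keep only the first match listed for each query name.
        mapping.setdefault(row["queryItem"], row["stringId"])
    return mapping


# Illustrative sample in the assumed column layout, not real API output.
sample = (
    "queryItem\tstringId\tpreferredName\n"
    "TP53\t9606.ENSP00000269305\tTP53\n"
)
mapping = parse_mapping(sample)
print(mapping)
```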
900 (highest confidence): Very stringent, experimental evidence preferred
Trade-offs:
Lower thresholds: More interactions (higher recall, more false positives)
Higher thresholds: Fewer interactions (higher precision, more false negatives)
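The trade-off can be seen by applying several thresholds to the same edge list. This is a minimal local sketch: the edges, their scores, and the `filter_by_score` helper are hypothetical, and the threshold values other than 900 are chosen only for illustration.

```python
def filter_by_score(edges, required_score):
    """Keep edges whose combined score (0-1000 scale) meets the threshold."""
    return [edge for edge in edges if edge[2] >= required_score]


# Hypothetical edges: (protein_a, protein_b, combined score on a 0-1000 scale).
edges = [
    ("A", "B", 950),
    ("A", "C", 720),
    ("B", "C", 450),
    ("C", "D", 180),
]

# Raising the threshold can only shrink the network, never grow it.
for threshold in (150, 400, 700, 900):
    kept = filter_by_score(edges, threshold)
    print(f"required_score={threshold}: {len(kept)} interaction(s) kept")
```

Filtering locally like this lets one low-threshold download be reused for several stringency levels instead of re-querying for each one.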
Network Types
Functional Networks (Default)
Includes all evidence types (experimental, computational, text-mining). Represents proteins that are functionally associated, even without direct physical binding.
When to use:
Pathway analysis
Functional enrichment studies
Systems biology
Most general analyses
Physical Networks
Only includes evidence for direct physical binding (experimental data and database annotations for physical interactions).
When to use:
Structural biology studies
Protein complex analysis
Direct binding validation
When physical contact is required
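To make the choice between the two network types concrete, the sketch below builds a request URL for either one without sending it. The endpoint path and the `identifiers`, `species`, `network_type`, and `required_score` parameter names follow the public STRING REST API as I understand it; confirm them against references/string_reference.md. `build_network_url` and its default threshold of 400 are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the public STRING REST API (TSV output, network endpoint).
BASE_URL = "https://string-db.org/api/tsv/network"


def build_network_url(identifiers, species, network_type="functional",
                      required_score=400):
    """Build (but do not send) a STRING network request URL."""
    if network_type not in ("functional", "physical"):
        raise ValueError("network_type must be 'functional' or 'physical'")
    params = {
        # Multiple identifiers are joined with carriage returns,
        # which urlencode escapes as %0D.
        "identifiers": "\r".join(identifiers),
        "species": species,
        "network_type": network_type,
        "required_score": required_score,
    }
    return BASE_URL + "?" + urlencode(params)


functional_url = build_network_url(["TP53", "MDM2"], species=9606)
physical_url = build_network_url(["TP53", "MDM2"], species=9606,
                                 network_type="physical")
print(functional_url)
print(physical_url)
```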
API Best Practices
Always map identifiers first: Use string_map_ids() before other operations for faster queries
Use STRING IDs when possible: Use format 9606.ENSP00000269305 instead of gene names
Specify species for networks >10 proteins: Required for accurate results
Respect rate limits: Wait 1 second between API calls
Use versioned URLs for reproducibility: Available in reference documentation
Handle errors gracefully: Check for "Error:" prefix in returned strings
Choose appropriate confidence thresholds: Match threshold to analysis goals
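Two of these practices, the one-second pause and the "Error:" prefix check, can be combined in a small wrapper. This is a sketch under stated assumptions: `call_with_rate_limit`, `StringApiError`, and the `fake_lookup` stand-in are hypothetical names, and the wrapper works with any callable that returns a string.

```python
import time


class StringApiError(RuntimeError):
    """Raised when a helper returns a string starting with 'Error:'."""


def call_with_rate_limit(func, arg_list, delay=1.0, sleep=time.sleep):
    """Call func once per argument tuple, pausing `delay` seconds between calls."""
    results = []
    for index, args in enumerate(arg_list):
        if index > 0:
            sleep(delay)  # Pause before every call except the first.
        result = func(*args)
        if isinstance(result, str) and result.startswith("Error:"):
            raise StringApiError(result)
        results.append(result)
    return results


def fake_lookup(name):
    """Stand-in for a real API helper so the sketch runs offline."""
    return "Error: not found" if name == "??" else f"ok:{name}"


print(call_with_rate_limit(fake_lookup, [("TP53",), ("BRCA1",)], delay=0))
```

Passing the sleep function as a parameter keeps the default behavior of a real pause while letting tests run without waiting.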
Detailed Reference
For comprehensive API documentation, complete parameter lists, output formats, and advanced usage, refer to references/string_reference.md. This includes:
Complete API endpoint specifications
All supported output formats (TSV, JSON, XML, PSI-MI)
Advanced features (bulk upload, values/ranks enrichment)
Error handling and troubleshooting
Integration with other tools (Cytoscape, R, Python libraries)
Data license and citation information
Troubleshooting
No proteins found:
Verify species parameter matches identifiers
Try mapping identifiers first with string_map_ids()
Check for typos in protein names
Empty network results:
Lower confidence threshold (required_score)
Check if proteins actually interact
Verify species is correct
Timeout or slow queries:
Reduce number of input proteins
Use STRING IDs instead of gene names
Split large queries into batches
"Species required" error:
Add species parameter for networks with >10 proteins
Always include species for consistency
Results look unexpected:
Check STRING version with string_version()
Verify network_type is appropriate (functional vs physical)
Review confidence threshold selection
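For the timeout case above, splitting a large protein list into batches can be sketched as follows. `split_into_batches` and the batch size of 3 are hypothetical; pick a size that suits the operation being called.

```python
def split_into_batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder names; in practice these would be STRING IDs or gene names.
proteins = [f"protein_{i}" for i in range(1, 8)]
batches = list(split_into_batches(proteins, 3))
for batch in batches:
    print(batch)
```

Batching suits per-protein operations such as identifier mapping. For network queries, interactions between proteins that land in different batches would not be returned, so splitting is likely to change the result there.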
Additional Resources
For proteome-scale analysis or complete species network upload: