Schemas

There are four schemas in STRING that describe different aspects of the content.

schema description
evidence contains information about the underlying evidence for interactions.
homology BLAST hits used to propagate evidence to other species by means of homology.
items info about entries (protein names, species, orthgroups, etc.).
network the interactions and their scores.

Table: evidence.abstracts

Abstracts are used for showing the text bodies in the text-mining view.

field description
abstract_id abstract identifier (e.g. "PMID019234442", "OMIM000100070", etc.).
publication_date date of publication (e.g. "2009").
publication_source source: the name of the journal (e.g. "Nature").
linkout_url link to publication.
title title of publication.
body the abstract of the publication.

Table: evidence.collections

The different data sources from which STRING imports data (for the channels 'experiments' and 'databases').

field description
collection_id the name of the data source (e.g. "dip").
pubmed_id the pubmed identification number (e.g. "14681454").
comment short description of the data and the date it was imported.

Table: evidence.evidence_transfers

Evidence of interaction propagated across species using homology.

field description
target_protein_id_a interactor A that has received evidence from the source.
target_protein_id_b interactor B that has received evidence from the source.
transfer_score_c1 fraction of the transfer score for interactor A. Depends on how ambiguous the homology is; ranges from 0 to 1, higher is better.
transfer_score_c2 fraction of the transfer score for interactor B.
source_ascore coexpression score of the source interaction.
source_escore experimental score of the source interaction.
source_dscore database score of the source interaction.
source_tscore textmining score of the source interaction.

Table: evidence.fusion_evidence

Homology propagated evidence for fusion evidence, which originates from only one source protein.

field description
target_protein_id_a interactor A that has received evidence from the source.
target_protein_id_b interactor B that has received evidence from the source.
source_protein the source from which evidence for the interaction has been transferred.
source_species species_id for the species of the source of the evidence.
transfer_score_c1 homology fraction of score.
transfer_score_c2 homology fraction of score.
fusion_score fusion score of the source interaction.

Table: evidence.items_abstracts

The abstracts in which a protein is mentioned.

field description
protein_id internal protein identifier.
abstract_id abstract identifier.
name name of gene in abstract.
abstract_length length of abstract in number of characters.
mesh_id identifier for MeSH (i.e. the controlled vocabulary of NCBI for different names that refer to the same concept).

Table: evidence.orthgroups_abstracts

The abstracts that mention any member of an orthologous group (redundant table w.r.t. evidence.items_abstracts).

field description
orthgroup_id internal orthologs group identifier.
abstract_id abstract identifier.

Table: evidence.orthgroups_sets

The sets that support evidence for interaction between orthologs groups.

field description
orthgroup_id internal orthologs group identifier.
set_id the identifier to the external repository from where evidence was gathered.
is_database_set this flag is set to 'true' when the set concerns the 'database' channel in STRING (as opposed to the 'experiments' channel).

Table: evidence.sets

The sets of evidence that support interactions.

field description
set_id the identifier to external repository from where evidence was gathered (e.g. "BCRT1314").
collection_id the external repository (e.g. "biocarta").
title type of evidence (e.g. "curated pathway").
comment auxiliary information of the set.
url link to original data.

Table: evidence.sets_items

The members in the evidence sets. An interaction exists if two lines have the same set_id.

field description
set_id identifier of a single set of proteins in an external repository (protein complex, pathway or binary pair).
protein_id internal protein identifier.
species_id taxonomy identifier.
is_database_set this flag is set to 'true' when the set concerns the 'database' channel in STRING (as opposed to the 'experiments' channel).
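
The rule that two lines sharing a set_id imply an interaction can be sketched in a few lines of Python. This is only an illustration: the rows below are hypothetical (set_id, protein_id) pairs rather than STRING data, and the function name pairs_from_sets is invented for this example.

```python
from collections import defaultdict
from itertools import combinations


def pairs_from_sets(rows):
    """Return {(protein_a, protein_b): [set_id, ...]} for proteins that share a set_id."""
    members = defaultdict(set)
    for set_id, protein_id in rows:
        members[set_id].add(protein_id)

    pairs = defaultdict(list)
    for set_id in sorted(members):
        # Every unordered pair of members of the same set counts as an interaction.
        for a, b in combinations(sorted(members[set_id]), 2):
            pairs[(a, b)].append(set_id)
    return dict(pairs)


# Hypothetical rows (set_id, protein_id); not taken from STRING.
rows = [
    ("SET1", 101), ("SET1", 102), ("SET1", 103),
    ("SET2", 101), ("SET2", 103),
    ("SET3", 104),
]
interactions = pairs_from_sets(rows)
```

A set with a single member (SET3 here) yields no pairs, and a pair found in several sets keeps the list of all supporting set_ids.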

Table: evidence.sets_pubmedrefs

Supporting papers from external repository.

field description
set_id identifier to external repository.
pubmed_id pubmed identifier.

Table: homology.best_hit_per_species

Derived homology information that is used for transfer of evidence.

field description
protein_id internal protein identifier.
species_id taxonomy identifier of the species to which the alignment is found.
nr_high_scoring_hits number of proteins in the species that have blast bit scores higher than 60.
best_hit_protein_id the id of the protein with the highest scoring alignment.
best_hit_identifier the STRING preferred name of the highest scoring alignment.
best_hit_bitscore the bitscore of the highest scoring alignment.
best_hit_normscore score normalized by the self-hit of the longer protein.
best_hit_alignment_length the length of the alignment.
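
As a rough illustration of how such a per-species summary might be derived from raw hits, the sketch below counts hits above the bit score threshold of 60 and picks the best hit for each species. It assumes the hits for one query protein are already available as (species_id, hit_protein_id, bitscore) tuples. The input values and the function name summarize_hits are hypothetical, and this is not necessarily how STRING computes the table.

```python
from collections import defaultdict

BITSCORE_CUTOFF = 60  # threshold named in the nr_high_scoring_hits description


def summarize_hits(hits):
    """Summarize (species_id, hit_protein_id, bitscore) tuples per species."""
    by_species = defaultdict(list)
    for species_id, hit_protein_id, bitscore in hits:
        by_species[species_id].append((bitscore, hit_protein_id))

    summary = {}
    for species_id, scored in by_species.items():
        # max() compares bitscore first; ties fall back to the larger protein id.
        best_bitscore, best_protein = max(scored)
        summary[species_id] = {
            "nr_high_scoring_hits": sum(1 for s, _ in scored if s > BITSCORE_CUTOFF),
            "best_hit_protein_id": best_protein,
            "best_hit_bitscore": best_bitscore,
        }
    return summary


# Hypothetical hits for a single query protein.
hits = [(9606, 11, 250.0), (9606, 12, 75.5), (9606, 13, 40.0), (10090, 21, 55.0)]
summary = summarize_hits(hits)
```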

Table: homology.blast_data

Raw data of all-against-all BLAST alignments.

field description
protein_id_a internal identifier of protein A.
protein_id_b internal identifier of protein B.
bitscore bitscore of alignment (higher is better).
start_a amino acid where the alignment starts.
end_a amino acid where the alignment stops (c.f. length_of_alignment_a = end_a
start_b amino acid where the alignment starts.
end_b amino acid where the alignment stops.
size_b length of protein B.

Table: items.funccats

The functional categories defined by the COG database.

field description
funccat_id one-letter identifier of functional category.
funccat_description description of function (e.g. "Transcription").

Table: items.genes

Information on the genes that are used for neighborhood evidence.

field description
gene_id internal gene identifier.
gene_external_id external gene identifier (e.g. "257311.BPP1623.NC_002928.1731802").
start_position_on_contig the nucleotide of the start of the ORF.
end_position_on_contig the nucleotide where the ORF ends.
protein_size size of the protein (usually the longest splice variant).

Table: items.genes_proteins

Mapping between the internal identifier of genes and proteins.

field description
protein_id internal protein identifier.
gene_id internal gene identifier.

Table: items.meshterms

MeSH (Medical Subject Headings) is a controlled vocabulary, used where names and categories cannot otherwise be distinguished.

field description
mesh_id MeSH identifier (e.g. 2826).
description description of the MeSH term (e.g. "Chorismate Mutase").

Table: items.orthgroups

Information on orthologous groups.

field description
orthgroup_id internal orthologs groups identifier.
orthgroup_external_id name of orthologs group (e.g. "COG0133").
description general description of biological functionality.
protein_count number of members in orthologs group.
species_count number of distinct species in group.

Table: items.orthgroups_funccats

The functional category of an orthologous group.

field description
orthgroup_id internal orthologs groups identifier.
funccat_id one-letter identifier of functional category.

Table: items.orthgroups_species

This describes how many genes in a given organism belong to a given orthologous group.

field description
orthgroup_id internal orthologs groups identifier.
species_id taxonomy identifier.
count number of genes.

Table: items.protein_image_match

Information about protein structure images used for the nodes in the network view.

field description
protein_id internal protein identifier.
image_id internal identifier of a protein structure image.
identity the percentage identity to the most similar protein.
source the origin of the protein structure.
start_position_on_protein from which position of the protein a structure is mapped.
end_position_on_protein to which position of the protein a structure is mapped.
annotation the name of the structure.

Table: items.proteins

Information about the proteins in STRING.

field description
protein_id internal protein identifier.
protein_external_id taxonomy identifier and name of protein concatenated.
species_id taxonomy identifier.
protein_checksum checksum of the protein sequence.
protein_size length of the protein (in amino acids).
annotation description of the functionality of protein.
preferred_name the preferred name in STRING (e.g. "amiF").
annotation_word_vectors internal use only: enables full-text searching.
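
Since protein_external_id concatenates the taxonomy identifier and the protein name, it can be split back into its two parts. The sketch below assumes the two parts are joined by a period, as in the gene_external_id example "257311.BPP1623.NC_002928.1731802". That separator is an assumption here, and both the helper name split_external_id and the sample identifier are hypothetical.

```python
def split_external_id(protein_external_id):
    """Split '<taxonomy id>.<protein name>' into (species_id, name).

    Assumes a period separates the two parts; only the first period is used,
    so protein names that themselves contain periods are kept intact.
    """
    species_part, _, name = protein_external_id.partition(".")
    if not species_part.isdigit() or not name:
        raise ValueError(f"unexpected identifier format: {protein_external_id!r}")
    return int(species_part), name


# Hypothetical identifier built from the gene example in this document.
species_id, name = split_external_id("257311.BPP1623")
```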

Table: items.proteins_meshterms

Mapping between MeSH and STRING.

field description
mesh_id MeSH id.
protein_id internal protein identifier.

Table: items.proteins_names

Mapping of various names to STRING entries.

field description
protein_name a name of the protein (e.g. "amiF", "spr1703", "AE008535", etc.).
protein_id internal protein identifier.
species_id taxonomy identifier.
source the origin of the name (e.g. "Ensembl").
is_preferred_name "true" if the name is the preferred STRING name.

Table: items.proteins_orthgroups

Description of the members in the orthologs groups.

field description
protein_id internal protein identifier.
orthgroup_id to which orthgroup the protein belongs (internal orthgroup id).
species_id taxonomy identifier of the protein.
start_position residue within the protein where the orthologous group mapping starts.
end_position residue within the protein where the orthologous group mapping ends.
preferred_name preferred name of protein (redundant w.r.t items.proteins_names).
protein_annotation annotated function of protein (redundant w.r.t items.proteins_names).

Table: items.proteins_sequences

Describes the sequence of the protein.

field description
protein_id internal protein identifier.
sequence protein sequence.

Table: items.proteins_smartlinkouts

Links to the SMART database describing the domain structures of a protein.

field description
protein_id internal protein identifier.
protein_size length of protein in amino acids.
smart_url link to SMART database entry.

Table: items.runs

Neighborhood evidence: this describes an uninterrupted group of neighboring genes (a 'run').

field description
run_id internal id.
species_id taxonomy identifier.
contig_id from genome assembly information: which chromosome or otherwise identified contig the run is on.

Table: items.runs_genes_proteins

Mapping between runs, genes and proteins.

field description
run_id internal id.
gene_id internal gene identifier.
protein_id internal protein identifier.
start_position_on_contig the nucleotide of the start of the ORF.
end_position_on_contig the nucleotide where the ORF ends.
preferred_name the preferred name of STRING.
annotation functional annotation of protein.

Table: items.runs_orthgroups

Describes which orthologous groups map to an un-interrupted group of genes on the chromosome.

field description
run_id internal id.
orthgroup_id internal orthologs groups identifier.

Table: items.species

Information on the organisms in STRING.

field description
species_id taxonomy identifier (e.g. "9606" for human).
official_name scientific name of organism.
compact_name other name, shortened version of the scientific name.
kingdom to which of the three highest-level groupings in the taxonomy the organism belongs.
type whether the organism is a core species or a periphery species. Core species are BLAST aligned all-against-all, periphery species only against the core.

Table: items.species_names

NCBI taxonomy used for organism selection on input page.

field description
species_id taxonomy identifier.
species_name species synonym.
official_name scientific name of organism.

Table: items.species_nodes

Auxiliary table to NCBI organism selection (c.f. items.species_names).

field description
species_id taxonomy identifier.
species_name name of query.
position position of the first clade member in a STRING clade.
size number of STRING species in the NCBI clade.

Table: network.actions

The type of an interaction.

field description
item_id_a internal protein identifier.
item_id_b internal protein identifier.
mode type of interaction ("reaction", "expression", "activation", "ptmod" (post-translational modifications), "binding", "catalysis").
action the effect of the action ("inhibition", "activation").
a_is_acting the directionality of the action, if applicable (1 means that item_id_a is acting upon item_id_b).
score the best combined score of all interactions in STRING.

Table: network.best_combined_scores_orthgroups

Derived table of best combined score between two orthologs groups.

field description
orthgroup_id internal orthologs group identifier.
best_score the highest score between any members of two orthologous groups.

Table: network.best_combined_scores_proteins

The highest interaction scores of a protein.

field description
protein_id internal protein identifier.
best_score the best combined score of all interactions in STRING.

Table: network.node_node_links

The interactions and their scores between proteins in a species (and orthologs groups).

field description
node_id_a internal identifier (equivalent to protein_id).
node_id_b internal identifier (equivalent to protein_id).
node_type_b taxonomy identifier (equivalent to species_id).
combined_score the combined score of all the evidence scores (including transferred scores).
evidence_score the scores of the individual channels, represented as a list of score types and their scores. For example, {{4,626}} means that the cooccurrence score (4) is 0.626. The score types can be found in table network.score_types.

The combined_score ranges from 0 to 1 and is multiplied by 1000, so it is stored as an integer from 0 to 1000.
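
The encoding of the channel scores and the scaling by 1000 can be illustrated with a small Python sketch. It assumes the value arrives as text in the form shown above (e.g. "{{4,626}}") and that the individual channel scores use the same scaling by 1000, as the 0.626 example suggests. The function names are invented for this example, and the second pair in the sample input is hypothetical.

```python
import re

# Subset of network.score_types as listed in this document (score_id -> name).
SCORE_NAMES = {4: "equiv_pscore", 8: "experimental_score", 12: "textmining_score"}


def decode_evidence_scores(text):
    """Parse a literal such as '{{4,626},{8,900}}' into {score_id: score in 0..1}."""
    pairs = re.findall(r"\{(\d+),(\d+)\}", text)
    return {int(score_id): int(value) / 1000 for score_id, value in pairs}


def scale_combined_score(stored):
    """Convert the stored integer (0 to 1000) back to a score between 0 and 1."""
    return stored / 1000


decoded = decode_evidence_scores("{{4,626},{8,900}}")
named = {SCORE_NAMES.get(score_id, str(score_id)): score for score_id, score in decoded.items()}
```

Here decoded maps score type 4 to 0.626, matching the example in the field description. The lookup through SCORE_NAMES only covers the few score types copied into the sketch, and unknown ids fall back to their number.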

Table: network.score_types

field description
score_id internal identifier
score_type the type of the score, see below.

score_type name description
1 equiv_nscore neighborhood score, (computed from the inter-gene nucleotide count).
2 equiv_nscore_transferred neighborhood score from other species (via homology).
3 equiv_fscore fusion score (derived from fused proteins in other species).
4 equiv_pscore cooccurrence score of the phyletic profile (derived from similar absence/presence patterns of genes).
5 equiv_hscore homology score, the degree of homology of the interactors (trivial and normally not reported in STRING).
6 array_score coexpression score (derived from similar patterns of mRNA expression measured by DNA arrays and similar technologies).
7 array_score_transferred coexpression score transferred by homology from other species.
8 experimental_score experimental score (derived from experimental data, such as affinity chromatography).
9 experimental_score_transferred experimental score transferred by homology from other species.
10 database_score database score (derived from curated data of various databases).
11 database_score_transferred database score transferred by homology from other species.
12 textmining_score textmining score (derived from co-occurring mentions of gene/protein names in abstracts).
13 textmining_score_transferred textmining score transferred by homology from other species.
14 neighborhood_score raw neighborhood counts for COG mode (deprecated).
15 fusion_score raw fusion score for COG mode (deprecated).
16 cooccurence_score raw cooccurrence score for COG mode (deprecated).