Statistics

The following tables show statistical information about the SUS-BAR database.

Table 1 breaks down the total number of pig proteins, as retrieved from the current releases of the databases, by annotation type: the number of sequences endowed with unique Gene Ontology (GO) terms of each of the three main roots (Molecular Function (MF); Biological Process (BP); Cellular Component (CC)), with any GO term (All-GO), with Pfam domains (Pfam), with Pfam domains and/or GO terms (Pfam & All-GO), and with a structure in the Protein Data Bank (PDB). Sequences are also listed by the source from which they were retrieved: the two UniProtKB sections (SwissProt, manually annotated and reviewed, and TrEMBL, automatically annotated) and Ensembl.


Table 1. Annotation of the pig proteome in UniProtKB and Ensembl

Source [sequences]  Count      MF      BP      CC      All-GO  Pfam°   Pfam & All-GO  PDB*
SwissProt [1,406]   Sequences  1,159   966     1,259   1,377   1,288   1,392          112
                    Terms      765     1,234   262     2,261   961     3,222          -
TrEMBL [18,170]     Sequences  9,514   4,556   5,817   11,230  13,092  14,934         0
                    Terms      895     983     247     2,125   3,712   5,837          -
Ensembl [15,805]    Sequences  12,369  11,295  10,500  13,583  12,981  13,832         77
                    Terms      2,632   6,867   947     10,446  4,150   14,596         -
Total [35,381]      Sequences  23,042  16,817  17,576  26,190  27,361  30,158         189
                    Terms      2,657   6,890   949     10,496  4,324   14,820         -


With our method, all pig protein sequences are aligned against the BAR+ database, and each may fall into a cluster, which may in turn carry statistically validated annotation (P value < 0.01) for a specific GO term or Pfam domain. In total, 26,320 pig protein sequences fall into clusters, while 9,061 remain singletons and retain their UniProtKB or Ensembl annotation (when present). Of the sequences retained in clusters, 83% align to clusters endowed with statistically validated annotation and inherit all of the cluster's statistically validated GO terms and/or Pfam domains. In Table 2, the symbol ° marks terms that are statistically validated and also have an experimental evidence code.


Table 2. Statistically validated annotation of the pig proteome in SUS-BAR

Set [sequences]     Count      MF       MF°      BP       BP°      CC       CC°      All-GO   All-GO°  Pfam     Pfam & All-GO  *PDB
Cluster [26,320]    Sequences  17,152   12,380   16,567   13,359   16,571   13,187   19,820   15,785   20,690   21,793         9,383
                    Clusters   6,929    3,973    6,482    4,599    6,442    4,497    8,578    5,974    9,212    9,941          3,528
                    Terms      3,668    3,069    10,325   9,896    1,369    1,234    15,362   14,199   3,941    19,303         -
§Singleton [9,061]  Sequences  4,596    11       3,095    8        3,084    10       5,280    16       5,697    6,406          30
                    Terms      1,090    33       2,966    184      552      43       4,608    260      2,058    6,666          -
Total [35,381]      Sequences  21,748   12,391   19,662   13,367   19,655   13,197   25,100   15,801   26,387   28,199         9,413
                    Terms      3,730    3,070    10,533   9,900    1,393    1,235    15,656   14,205   4,220    19,876         -
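
The cluster/singleton split and the 83% figure quoted before Table 2 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from Table 2, the variable and function names are invented for this example, and it assumes that the 83% refers to the "Pfam & All-GO" sequence count of the Cluster row, which this page does not state explicitly.

```python
# Counts copied from Table 2 of this page.
TOTAL_SEQUENCES = 35_381
CLUSTER_SEQUENCES = 26_320
SINGLETON_SEQUENCES = 9_061

# Assumption: the 83% figure is derived from the "Pfam & All-GO" sequence
# count of the Cluster row; the page does not say which column it uses.
CLUSTER_VALIDATED_SEQUENCES = 21_793


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Cluster and singleton sequences should add up to the whole proteome.
partition_ok = CLUSTER_SEQUENCES + SINGLETON_SEQUENCES == TOTAL_SEQUENCES

# Share of cluster sequences that inherit statistically validated annotation.
validated_share = percentage(CLUSTER_VALIDATED_SEQUENCES, CLUSTER_SEQUENCES)

print(f"clusters + singletons == total: {partition_ok}")
print(f"validated share of cluster sequences: {validated_share:.1f}%")
```

Under that assumption the share rounds to the 83% given in the text, and the two sets together account for all 35,381 sequences.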




In Table 3 the effect of our annotation procedure is shown for sequences without any annotation in UniProtKB and Ensembl.


Table 3. SUS-BAR annotation of pig protein sequences not annotated in UniProtKB and Ensembl

Source [sequences]  Count      MF      BP      CC      All-GO  Pfam    Pfam & All-GO  *PDB
UniProtKB [3,250]   Sequences  285     418     456     607     234     666            124
                    Clusters   240     358     396     526     204     580            101
                    Terms      515     2,232   426     3,173   154     3,327          -
Ensembl [2,370]     Sequences  90      104     131     175     77      202            31
                    Clusters   73      83      104     142     58      159            23
                    Terms      189     656     262     1,107   65      1,172          -
Total [5,620]       Sequences  375     522     587     782     311     868            155
                    Clusters   282     402     453     607     247     674            113
                    Terms      545     2,311   467     3,323   195     3,518          -


Table 4 shows the result of searching by Homo sapiens, Mus musculus and Bos taurus: each search retrieves all the clusters in which sequences of that organism share some annotation with pig sequences, including a structural template when available. Interestingly, a sizeable number of pig protein sequences (between 3,958 and 4,525 per organism) inherit statistically validated annotation from their cluster despite low sequence identity (SI < 30%) with the sequences that carry the information in the cluster.


Table 4. Pig sequences in clusters with other organisms

Organism      #Clusters  #Pig Sequences  #Pig Sequences (SI<30%)  #Clusters with PDB  #Pig Sequences inheriting PDBs
Homo sapiens  10,475     22,581          3,958                    3,487               9,314
Mus musculus  9,778      21,648          4,525                    3,430               9,222
Bos taurus    9,303      21,044          4,238                    3,305               9,050
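
To put the low-identity counts of Table 4 in proportion, the share of pig sequences with SI < 30% can be computed per organism. This is a minimal sketch using only the counts listed in Table 4; the dictionary layout and the helper name `low_identity_share` are assumptions made for this example and are not part of SUS-BAR.

```python
# Counts copied from Table 4:
# organism -> (#Pig Sequences, #Pig Sequences with SI < 30%)
TABLE_4 = {
    "Homo sapiens": (22_581, 3_958),
    "Mus musculus": (21_648, 4_525),
    "Bos taurus": (21_044, 4_238),
}


def low_identity_share(total, low_identity):
    """Percentage of pig sequences that share a cluster at SI < 30%."""
    return 100.0 * low_identity / total


shares = {
    organism: low_identity_share(total, low_identity)
    for organism, (total, low_identity) in TABLE_4.items()
}

for organism, share in shares.items():
    print(f"{organism}: {share:.1f}% of pig sequences have SI < 30%")
```

With these counts the share is roughly 17.5% for Homo sapiens, 20.9% for Mus musculus and 20.1% for Bos taurus.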



