<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>1471-2164-10-420.fm</title>
<meta name="Author" content="Ezhilan"/>
<meta name="Creator" content="FrameMaker 7.1"/>
<meta name="Producer" content="Acrobat Distiller 7.0 (Windows)"/>
<meta name="CreationDate" content=""/>
</head>
<body>
<pre>
BMC Genomics
BioMed Central
Open Access
Research article
Meta-coexpression conservation analysis of microarray data: a
"subset" approach provides insight into brain-derived neurotrophic
factor regulation
Tamara Aid-Pavlidis*†1, Pavlos Pavlidis†2 and Tõnis Timmusk1
Address: 1Department of Gene Technology, Tallinn University of Technology, Akadeemia tee 15, 19086 Tallinn, Estonia and 2Department of
Biology, Section of Evolutionary Biology, University of Munich, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
Email: Tamara Aid-Pavlidis* - [email protected]; Pavlos Pavlidis - [email protected];
Tõnis Timmusk - [email protected]
* Corresponding author †Equal contributors
Published: 8 September 2009
BMC Genomics 2009, 10:420
doi:10.1186/1471-2164-10-420
Received: 29 December 2008
Accepted: 8 September 2009
This article is available from: http://www.biomedcentral.com/1471-2164/10/420
© 2009 Aid-Pavlidis et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background: Alterations in brain-derived neurotrophic factor (BDNF) gene expression contribute
to serious pathologies such as depression, epilepsy, cancer, Alzheimer's, Huntington's and
Parkinson's disease. Exploring the mechanisms of BDNF regulation is therefore of great
clinical importance. Studying BDNF expression remains difficult due to its multiple neural activity-dependent
and tissue-specific promoters. Thus, microarray data could provide insight into the
regulation of this complex gene. Conventional microarray co-expression analysis is usually carried
out by merging the datasets or by confirming the re-occurrence of significant correlations across
datasets. However, co-expression patterns can differ under the various conditions that are
represented by subsets in a dataset. Therefore, assessing co-expression by measuring the correlation
coefficient across merged samples of a dataset or by merging datasets might not capture all
correlation patterns.
Results: In our study, we performed meta-coexpression analysis of publicly available microarray
data using BDNF as a "guide-gene" and introducing a "subset" approach. The key steps of the analysis
included: dividing datasets into subsets with biologically meaningful sample content (e.g. tissue,
gender or disease state subsets); analyzing co-expression with the BDNF gene in each subset
separately; and confirming co-expression links across subsets. Finally, we analyzed conservation in
co-expression with BDNF between human, mouse and rat, and searched for conserved over-represented
transcription factor binding sites (TFBSs) in BDNF and BDNF-correlated genes. Correlated genes discovered in this study
regulate nervous system development, and are associated with various types of cancer and
neurological disorders. Also, several transcription factors identified here have been reported to
regulate BDNF expression in vitro and in vivo.
Conclusion: The study demonstrates the potential of the "subset" approach in co-expression
conservation analysis for studying the regulation of single genes and proposes novel regulators of
BDNF gene expression.
Background
The accumulation of genome-wide gene expression data
has enabled biologists to investigate gene regulatory
mechanisms using systems biology approaches. Recent
developments in microarray technologies and bioinformatics have driven the progress of this field [1]. Moreover,
publicly available microarray data provide information
on human genome-wide gene expression under various
experimental conditions, which for most researchers
would be difficult to access otherwise.
BDNF (brain-derived neurotrophic factor) plays an
important role in the development of the vertebrate
nervous system [2]. BDNF supports survival and differentiation of embryonic neurons and controls various neural
processes in adulthood, including memory and learning
[3], depression [4], and drug addiction [5]. Alterations in
BDNF expression can contribute to serious pathologies
such as epilepsy, Huntington's, Alzheimer's, and Parkinson's disease [6]. Altered BDNF expression is also associated with unfavorable prognosis in neuroblastoma [7],
myeloma [8], hepatocellular carcinoma [9] and other
tumors [10]. Apart from the brain, expression of alternative
BDNF transcripts has been detected in a variety of tissues
(such as heart, muscle, testis, thymus, lung, etc.) [11,12].
Numerous studies have been conducted to unravel the
regulation of BDNF expression in rodents and humans.
Data on the structure of the human [11] and rodent
[12] BDNF genes have recently been updated. Nevertheless,
little is known about the regulation of human BDNF gene
expression in vivo. Unraveling the regulation of BDNF
expression remains difficult due to its multiple activity-dependent and tissue-specific promoters. Thus, analysis of
the gene expression under various experimental conditions using microarray data could provide insight into the
regulation of this complex gene.
Meta-coexpression analysis uses multiple experiments to
identify more reliable sets of genes than would be found
using a single data set. The rationale behind meta-coexpression analysis is that co-regulated genes should display
similar expression patterns across various conditions.
Moreover, such analysis may benefit from a vast representation of tissues and conditions [13]. A yeast study
showed that the ability to correctly identify co-regulated
genes in co-expression analysis is strongly dependent on
the number of microarray experiments used [14]. Another
study that examined 60 human microarray datasets for co-expressed gene pairs reported that the gene ontology (GO)
score for gene pairs increases steadily with the number of
confirmed links compared to the pairs confirmed by only
a single dataset [15]. Several studies have successfully
applied a meta-analysis approach to gain important insights
into various biological processes. For instance, microarray
meta-analysis of aging and cellular senescence led to the
observation that the expression pattern of cellular senescence was similar to that of aging in mice, but not in
humans [16]. Data from a variety of laboratories were integrated to identify a common host transcriptional response
to pathogens [17]. Also, meta-coexpression studies have
proven effective at predicting functional relationships between genes [18]. However, co-expression alone
does not necessarily imply that genes are co-regulated.
Thus, analysis of evolutionary conservation of co-expression coupled with the search for over-represented motifs
in the promoters of co-expressed genes is a powerful criterion to identify genes that are co-regulated from a set of
co-expressed genes [19,20].
In co-expression analysis, similarity of gene expression
profiles is measured using correlation coefficients (CC) or
other distance measures. If the correlation between two
genes is above a given threshold, then the genes can be
considered "co-expressed" [1]. Co-expression analysis
using a "guide-gene" approach involves measuring the CC
between pre-selected gene(s) and the rest of the genes in a
dataset.
It is a common practice in meta-coexpression studies to
assess co-expression by calculating the gene pair correlations after merging the datasets [20] or by confirming the
re-occurrence of significant correlations across datasets
[15]. However, it has been shown recently that genes can
reveal differential co-expression patterns across subsets in
the same dataset (e.g. gene pairs that are correlated in normal tissue might not be correlated in cancerous tissue or
might be even anti-correlated) [21]. Therefore, assessing
co-expression by measuring CC across merged samples of
a dataset or by merging datasets may miss correlation
patterns that are present only in individual subsets.
In this study, we performed co-expression analysis of publicly available microarray data using BDNF as a "guide-gene". We inferred BDNF gene co-expression links that
were conserved between human and rodents using a
novel "subset" approach. Then, we discovered new putative regulatory elements in human BDNF and in BDNF-correlated genes, and proposed potential regulators of
BDNF gene expression.
Results
We analyzed 299 subsets derived from a total of 80
human, mouse and rat microarray datasets. In order to
avoid spurious results that could arise from high-throughput microarray analysis methods, we applied successive
filtering of genes. Then, we divided datasets into subsets
with biologically meaningful sample content (e.g. tissue,
gender or disease state subsets), analyzed co-expression
with BDNF across samples separately in each subset and
confirmed the links across subsets. Finally, we analyzed
conservation in co-expression between human, mouse
and rat, and searched for conserved TFBSs in BDNF and
BDNF-correlated genes (Figure 1).
Data filtering
Gene Expression Omnibus (GEO) from NCBI and
ArrayExpress from EBI are the largest public peer-reviewed
microarray repositories, each containing about 8000
experiments. In order to avoid inaccuracies arising from
measuring expression correlation across different microarray platforms [13], we used only Affymetrix GeneChip
platforms for the analysis. Since ArrayExpress imports
Affymetrix experiments from GEO (http://www.ebi.ac.uk/microarray/doc/help/GEO_data.html), we used only the GEO
database to retrieve datasets.
A study examining the relationship between the number
of analyzed microarray experiments and the reliability of
the results reported that the accuracy of the analysis plateaus between 50 and 100 experiments [14]. Another
study demonstrated how the large amount of microarray
data can be exploited to increase the reliability of inferences about gene functions. Links that were confirmed
three or more times between different experiments had
significantly higher GO term overlaps than those seen
only once or twice (p < 10^-15) [15]. Therefore, we performed meta-coexpression analysis using multiple experiments to increase the accuracy of the prediction of the co-expression links.
Since BDNF served as a guide-gene for our microarray
study, qualitative and quantitative criteria were applied
for selection of the experiments with respect to BDNF
probe set presence on the platform [see Additional file 1:
BDNF probe sets], BDNF signal quality and expression
levels. In addition, non-specific filtering [19] was performed to eliminate noise (see Methods/Microarray
datasets). Consequently, 80 human, mouse and rat microarray experiments (datasets) from the GEO database met the selection criteria. Each
dataset was split into subsets according to the annotation
file included in the experiment [see Additional file 2:
Microarray datasets and Additional file 3: Subsets]. In
summary, 299 subsets were obtained from 38 human, 24
mouse and 18 rat datasets. Of the 38 human datasets, 8
were related to neurological diseases (epilepsy, Huntington's, Alzheimer's, aging, encephalitis, glioma and schizophrenia) and contained samples from human brain;
another 9 datasets contained samples from human "normal" (non-diseased) tissues (non-neural, such as blood,
skin, lung, and human brain tissues); 12 datasets had
samples from cancerous tissues of various origins (lung,
prostate, kidney, breast and ovarian cancer). The remaining 9
datasets contained samples from diseased non-neural tissues (HIV infection, smoking, stress, UV radiation etc.).
Of the 24 mouse datasets, 5 were related to neurological diseases (brain trauma, spinal cord injury,
amyotrophic lateral sclerosis, and aging); 15 datasets contained normal tissue samples (neural and peripheral tissues); 1 dataset contained lung cancer samples; 3 datasets
were related to diseases of non-neural tissues (muscle dystrophy, cardiac hypertrophy and asthma). Among the 18 rat
datasets, 11 were related to neurological diseases
(spinal cord injury, addiction, epilepsy, aging, ischemia
etc.), 5 datasets consisted of normal tissue samples and 2 datasets examined heart diseases [see Additional file 2: Microarray datasets].
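
The non-specific filtering step mentioned above can be illustrated with a minimal Python sketch. The three thresholds (missing values in more than 1/3 of samples, column-average imputation, two-fold change from the average in more than 5 samples) are the ones listed in Figure 1. Everything else is an assumption made for illustration: the function name nonspecific_filter, the dictionary layout, the placeholder gene names, the use of linear-scale values, and the reading of "the average" as each gene's own mean across samples. It is not the authors' implementation.

```python
def nonspecific_filter(matrix, fold=2.0, min_changed=5):
    """Filter a {gene: [value or None, ...]} expression matrix.

    Assumes linear-scale values and equal-length rows; None marks a missing value.
    """
    # Step 1: exclude genes with missing values in more than 1/3 of the samples.
    kept = {
        gene: row
        for gene, row in matrix.items()
        if 3 * sum(value is None for value in row) <= len(row)
    }
    if not kept:
        return {}
    n_samples = len(next(iter(kept.values())))

    # Step 2: column-average imputation (mean of each sample over the kept genes).
    column_means = []
    for j in range(n_samples):
        observed = [row[j] for row in kept.values() if row[j] is not None]
        column_means.append(sum(observed) / len(observed) if observed else 0.0)
    imputed = {
        gene: [column_means[j] if value is None else value
               for j, value in enumerate(row)]
        for gene, row in kept.items()
    }

    # Step 3: keep genes that deviate at least `fold`-fold from their own
    # average (assumed interpretation) in more than `min_changed` samples.
    filtered = {}
    for gene, row in imputed.items():
        average = sum(row) / len(row)
        changed = sum(
            1 for value in row
            if value >= fold * average or value * fold <= average
        )
        if average > 0 and changed > min_changed:
            filtered[gene] = row
    return filtered


# Hypothetical 12-sample dataset with placeholder gene names.
matrix = {
    "VARIABLE": [1.0] * 6 + [9.0] * 6,        # six samples at least two-fold below the mean
    "FLAT": [5.0] * 12,                       # never deviates from its mean
    "SPARSE": [None] * 5 + [3.0] * 7,         # missing in more than 1/3 of samples
    "ONE_GAP": [None] + [1.0] * 5 + [9.0] * 6,
}
filtered = nonspecific_filter(matrix)
```

With this toy input only the "VARIABLE" row passes all three steps; "ONE_GAP" is imputed but changes two-fold in only five samples.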
According to Elo and colleagues [22], the reproducibility
of the analysis of eight samples approaches 55%. Selecting
only subsets with more than eight samples would increase the reproducibility of the experiment but reduce the coverage, since subsets with fewer
samples would be excluded. Thus, we selected
subsets with a minimum of eight samples for the analysis,
in order to achieve satisfactory reproducibility and coverage. The expression information for human, mouse and
rat genes obtained from the GEO database, information
about BDNF probe names used for each dataset, information about subsets derived from each experiment, and
data on correlation of expression between BDNF and
other genes for each microarray subset have been made
available online and can be accessed using the following
link: http://www.bio.lmu.de/~pavlidis/bmc/bdnf.
Differential expression of BDNF across subsets
Since the study was based on analyzing subsets defined by
experimental conditions (gender, age, disease state etc.), it
was of biological interest to examine whether BDNF is differentially expressed across subsets within a dataset. We used
the Kruskal-Wallis test [23] to measure differential expression. The results of this analysis are given in Additional files 4, 5 and 6: Differential expression of the BDNF
gene in human, mouse and rat datasets.
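
As an illustration of the test named above, the following Python sketch computes the Kruskal-Wallis H statistic (with the standard tie correction) for BDNF expression values grouped by subset. The function names and the example values are hypothetical; the sketch stops at the H statistic, and a p-value would be obtained by comparing H with a chi-square distribution with (number of subsets - 1) degrees of freedom. It is not the authors' code.

```python
from collections import Counter
from itertools import groupby


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0  # number of values already ranked
    for _, group in groupby(order, key=lambda i: values[i]):
        tied = list(group)
        mean_rank = position + (len(tied) + 1) / 2.0
        for index in tied:
            ranks[index] = mean_rank
        position += len(tied)
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of non-empty sample groups."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        rank_term += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = Counter(pooled).values()
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Hypothetical BDNF expression values in three subsets of one dataset.
bdnf_by_subset = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_statistic = kruskal_wallis_h(bdnf_by_subset)
```

A rank-based test is a natural fit here because subsets are small and expression values are not assumed to be normally distributed.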
Co-expression analysis
Since the expression of BDNF alternative transcripts is tissue-specific and responds to a variety of stimuli, searching
for correlated genes in each subset separately could help
to reveal condition-specific co-expression. The term "subset" in this case must be understood as "a set of samples
under the same condition".
Download Affymetrix microarray datasets:
  Human, mouse and rat from GEO
  ~30 000 probe sets per platform
Check datasets for BDNF expression:
  BDNF probe set presence on the platform
  BDNF CALL = PRESENT in > 70% of samples
Non-specific filtering of data in each dataset:
  Exclude genes with missing values in > 1/3 of samples
  Column-average imputation
  Two-fold expression change from the average in > 5 samples
Dividing datasets into subsets
Co-expression analysis:
  Pearson correlation coefficient, resampling
  ~9 000 BDNF-correlated genes per species
Co-expression link confirmation:
  BDNF-correlated 3+ genes
  ~2400 in human
  ~1800 in mouse
  ~740 in rat
Co-expression conservation analysis:
  BDNF-correlated genes in human, mouse and rat
  ~80 conserved BDNF-correlated genes
Discovery of over-represented TFBSs in conserved
BDNF-correlated genes; DiRE and CONFAC
Novel potential regulators of BDNF expression
Figure 1
Microarray data analysis flowchart. Altogether, 80 human, mouse and rat Affymetrix datasets were analyzed (dataset
selection criteria: > 16 samples per dataset; BDNF detection call PRESENT in more than 70% of the samples). Data were subjected to non-specific filtering (missing values and 2-fold change filtering). Thereafter, datasets were divided into 299 corresponding subsets. Co-expression analysis in human, mouse and rat subsets allowed the detection of genes that were co-expressed
with BDNF in more than 3 subsets (~1000 genes for each species). As a result of co-expression conservation analysis, 84 genes
were found to be correlated with BDNF in all three species. Discovery of over-represented motifs in the regulatory regions of
these genes and in BDNF suggested novel regulators of BDNF gene expression.
We derived 119 human, 73 mouse and 107 rat subsets
from the corresponding datasets. The Pearson correlation
coefficient (PCC) was chosen as a similarity measure since
it is one of the most commonly used, with many publications describing its use in the analysis of Affymetrix platforms
[13,24,25]. The PCC between BDNF and other genes' probe
sets was measured across samples for each subset separately. From each subset, probe sets with PCC r > 0.6 were
selected. It was demonstrated by Elo and colleagues [22]
that in the analysis of simulated datasets a cutoff value r =
0.6 showed both high reproducibility (~0.6 for profile
length equal to 10) and low error. A "data-driven cutoff
value" approach was rejected because it is based on
the connectivity of the whole network, whereas we
focused only on the links between BDNF and other genes.
A lower threshold of 0.4 generated a list of genes that
showed no significant similarities when analyzed using
the g:Profiler tool, which retrieves the most significant GO terms,
KEGG and REACTOME pathways, and TRANSFAC motifs
for a user-specified group of genes [26]. The value r = 0.6
was chosen over more stringent PCC values because the
expression profiles were not too short
(mean profile length ~17, standard deviation ~12). Moreover, a PCC threshold higher than 0.6 was not justified
since we performed further filtering by selecting only conserved correlated genes, thus controlling for spurious
results.
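
The guide-gene step described above (PCC between BDNF and every other probe set within one subset, followed by the r > 0.6 cutoff) can be sketched in a few lines of Python. The data layout, the function names and the placeholder probe names GENE_A to GENE_C are assumptions for illustration; the resampling mentioned in Figure 1 is deliberately left out, so this is a simplified sketch rather than the procedure actually used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def guide_gene_links(subset, guide="BDNF", cutoff=0.6):
    """Return the probe sets whose PCC with the guide gene exceeds the cutoff."""
    guide_profile = subset[guide]
    return {
        probe
        for probe, profile in subset.items()
        if probe != guide and pearson(guide_profile, profile) > cutoff
    }


# Hypothetical subset with eight samples (the minimum subset size in the text).
subset = {
    "BDNF":   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "GENE_A": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 16.2],  # rises with BDNF
    "GENE_B": [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],     # anti-correlated
    "GENE_C": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],     # constant
}
links = guide_gene_links(subset)
```

Because only positive correlations above the cutoff are kept, the anti-correlated and constant profiles in the toy subset produce no link.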
Each probe set correlation with BDNF that passed the
threshold was defined as a "link". It has been previously
shown that a link must be confirmed in at least 3 experiments (3+ link) in order to be called reliable [15]. Therefore, we selected 3+ genes for evolutionary conservation
analysis, narrowing the list of correlated genes to eliminate noise. g:Profiler analysis of these genes revealed
that the results are statistically significant (low p-values)
and that the genes belong to GO categories that are relevant to
the biological functions of BDNF. For example, the list of
human genes produced the following results when analyzed with g:Profiler (p-values for the GO categories are
given in parentheses): nervous system development
(5.96·10^-21), central nervous system development
(3.29·10^-07), synaptic transmission (4.40·10^-11), generation of neurons (1.58·10^-08), neuron differentiation
(1.02·10^-06), neurite development (4.11·10^-07), heart
development (1.67·10^-09), blood vessel development
(5.51·10^-14), regulation of angiogenesis (7.16·10^-09),
response to wounding (1.32·10^-11), muscle development
(1.53·10^-10), regulation of apoptosis (1.65·10^-07), etc.
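
The link confirmation rule described above (a link counts as reliable when it re-occurs in at least 3 subsets) amounts to a simple tally. The sketch below assumes that the per-subset links have already been computed and mapped from probe sets to gene identifiers; the function name confirmed_links and the placeholder gene names are hypothetical.

```python
from collections import Counter


def confirmed_links(links_per_subset, min_subsets=3):
    """Return genes linked to the guide gene in at least `min_subsets` subsets."""
    counts = Counter()
    for links in links_per_subset:
        counts.update(set(links))  # count each gene at most once per subset
    return {gene for gene, n in counts.items() if n >= min_subsets}


# Hypothetical links found in four subsets.
links_per_subset = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_C"},
    {"GENE_B"},
]
three_plus = confirmed_links(links_per_subset)
```

In this toy input GENE_A and GENE_B each occur in three subsets and are retained, while GENE_C occurs in only two and is dropped.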
We have used r = 0.6 as a "hard" threshold value for the
CC. A disadvantage of this approach is that there will be
no connection between BDNF and other genes whose correlation with BDNF is 0.59 in a specific dataset [27]. Using
multiple datasets was expected to remedy this effect. An
alternative approach would be to use "soft" threshold
approaches [27]. According to the soft threshold
approach, a weight between 0 and 1 is assigned to the connection between each pair of genes (or nodes in a graph).
Often, the weight between the nodes A and B is represented by some power of the CC between A and B. However, other similarity measures may be used given that
they are restricted to [0, 1]. A drawback of the weighted
CC approach is that it is not clear how to define nodes that
are directly linked to a specific node [27] because the
available information is related only to how strongly two
nodes are connected. Thus, if the neighbors of a node are
requested, a threshold must be applied to the connection
strengths. Alternatively, Li and Horvath [28] have developed an approach to answer this question based on
extending the topological overlap measure (TOM), which
means that the nodes (e.g. genes) should be strongly connected and belong to the same group of nodes. However,
this analysis requires the whole network of a set of genes.
In the current analysis, we did not construct the co-expression network for all the genes of the microarray experiments.
Instead, we focused on a small part of it, i.e. the BDNF
gene and the genes linked to BDNF. Therefore, TOM analysis was not possible using our approach.
To see how the "weighted CC" method would affect the
results of our study we used a simplified approach.
Instead of applying a "hard" threshold (0.6) for the CC, we
measured the strength of all the connections between
BDNF and all the genes in a microarray experiment. The
connection strength is $s_j = [(1 + CC_j)/2]^b$, where $CC_j$ denotes
the CC between BDNF and the gene j; $s_j$ lies between 0 and 1
and b is an integer. In order to define b, analysis of the
scale-free properties of the network is required. However,
we used the value b = 6. Large values of b give lower weight to
weak connections. Then we calculated the average of
$s_j$, ave($s_j$), across all the subsets. Finally, we sorted the
genes based on their ave($s_j$) and calculated the overlap of
the top of this list with our results for each species (human,
mouse and rat). When restricting the top of the weighted
CC list to the same number of genes that we
obtained for the 3+ list for each species, we observed that
the top-weighted CC genes overlap extensively with the 3+
list (overlap > 80%) for each species. Therefore, even
though the "soft" and "hard" thresholding approaches are
considerably different, we observe quite extensive overlap
of the results. We would like to stress that we did not
apply the full weighted CC and TOM methodology since
it would require the construction of the whole network
which was beyond the aims of our study. However, such
investigation of the whole co-expression network could
contribute to the understanding of BDNF regulation and
function.
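
The simplified "soft" thresholding comparison described above can be sketched as follows. The connection strength formula and b = 6 are taken from the text; the function names, the data layout (one {gene: CC} dictionary per subset), the choice to average over the subsets in which a gene is present, and the placeholder gene names are assumptions made for illustration.

```python
def connection_strength(cc, b=6):
    """Soft-threshold weight s_j = ((1 + CC_j) / 2) ** b, which lies in [0, 1]."""
    return ((1.0 + cc) / 2.0) ** b


def mean_strength(cc_per_subset, b=6):
    """Average the connection strength of each gene over the subsets it occurs in."""
    totals, counts = {}, {}
    for ccs in cc_per_subset:
        for gene, cc in ccs.items():
            totals[gene] = totals.get(gene, 0.0) + connection_strength(cc, b)
            counts[gene] = counts.get(gene, 0) + 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


def top_overlap(average, hard_list):
    """Fraction of the hard-threshold list recovered among the top-ranked genes."""
    hard = set(hard_list)
    if not hard:
        return 0.0
    top = sorted(average, key=average.get, reverse=True)[:len(hard)]
    return len(set(top) & hard) / len(hard)


# Hypothetical correlations with BDNF in two subsets.
cc_per_subset = [
    {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": -0.5},
    {"GENE_A": 0.7, "GENE_B": 0.65, "GENE_C": 0.0},
]
average = mean_strength(cc_per_subset)
overlap = top_overlap(average, {"GENE_A"})
```

Raising (1 + CC)/2 to the sixth power maps CC = 0 to about 0.016, which is why weak connections contribute little to the average.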
Correlation conservation and g:Profiler analysis
Co-expression that is conserved between phylogenetically
distant species may reveal functional gene associations
[29]. We searched for common genes in the lists of 2436
human, 1824 mouse and 740 rat genes (3+ genes, whose
expression is correlated with BDNF). Of these genes,
490 were found to be correlated with BDNF in human
and mouse, 210 in human and rat,
and 207 in mouse and rat [see Additional file 7: Conserved BDNF-correlated genes]. We
found a total of 84 genes whose co-expression with BDNF
GO category
Conserved correlated genes
protein tyrosine
kinase PW *
ANGPT1
BAIAP2
PTPRF
FP106
dendrite
localization*
DBN1
signal
transduction*
DUSP1
EPHA4
EPHA5
EPHA7
FGFR1
GAS6
KALRN
IRS2
NTRK2
FREQ
GRIA3
KCND2
NTRK2
ANGPT1
CREM
DUSP6
EPHA5
FGFR1
IGFBP5
KALRN
NR4A2
PDE4B
PRKAG2
PTPRF
TBX3
BAIAP2
COL11A1
CXCL5
DUSP1
EGR1
EPHA4
EPHA7
FGF13
GAS6
GRIA3
IL6ST
IRS2
KLF10
MYH9
NTRK2
ODZ2
PENK
PLAUR
PRKCB
PRKCE
RGS4
SCG2
ZFP106
hsa-miR-369-3p*
COL11A1
DBC1
DCN
DUSP1
GAS6
ITF-2
KLF10
NEUROD6
PENK
TRPC4
TF:
CCCGCCCCCR
CCCC (KROX) *
ATF3
ATP1B1
CCND2
COL11A1
DBN1
DLGAP4
EPHA7
GAS6
GRIA3
IL6ST
IRS2
KCND2
KLF10
NFIA
NPTXR
PCSK2
SNCA
THRA
ATF3
CCND2
DBC1
DUSP6
FREQ
ITF-2
MBP
NPTXR
PCSK1
PTGS2
THRA
BAIAP2
BASP1
CAMK2D
COL4A5
CREM
CXCL5
DBN1
DLGAP4
DUSP1
EGR1
EPHA5
EPHA7
GRIA3
HN1
IRS2
KALRN
KLF10
LMO7
MDM2
NFIA
NPTX1
NR4A2
NTRK2
OLFM1
PDE4B
PRKCB1
PRSS23
PTPRF
PURA
TBX3
TRPC4
VCAN
NS development*
BAIAP2
DBN1
EPHA4
EPHA7
FGF13
FGFR1
IRS2
KALRN
MBP
NEFL
NEUROD6
NPTX1
NR4A2
NTRK2
OLFM1
PCSK2
PTPRF
PURA
SMARCA4
SNCA
TBX3
angiogenesis
ANGPT1
BAIAP2
CYR61
MYH9
SCG2
SERPINE1
TBX3
apoptosis/
anti-apoptosis
BIRC4
KLF10
NEFL
PLAGL1
PRKCE
SCG2
SNCA
cell cycle
CAMK2D
CORO1A
DUSP1
MDM2
MYH9
PPP3CA
synaptic
transmission/
plasticity
DBN1
KCND2
MBP
NPTX1
NR4A2
SNCA
TF:
GGGGAGGG
(MAZ/SP1) *
TBX3
GO categories marked with a star (*) have been reported as statistically significant for this gene list by g:Profiler analysis tool. Human gene names are given representing mouse and rat orthologs
whenever gene names for all three species are not the same. GO - gene ontology, PW - pathway, TF - transcription factor, NS - nervous system.
Table 1: BDNF-correlated genes conserved between human, mouse and rat.
was conserved in all three organisms (Table 1) [see also
Additional file 7: Conserved BDNF-correlated genes].
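
The conservation step described above reduces to set intersections once the 3+ gene lists of the three species are expressed in a common identifier space. The sketch below assumes that orthologs have already been mapped to shared (human) gene symbols; the function name conserved_genes and the placeholder gene lists are hypothetical and do not reproduce the study's data.

```python
def conserved_genes(human, mouse, rat):
    """Intersect per-species 3+ gene lists given in a shared ortholog namespace."""
    human, mouse, rat = set(human), set(mouse), set(rat)
    return {
        "human_mouse": human & mouse,
        "human_rat": human & rat,
        "mouse_rat": mouse & rat,
        "all_three": human & mouse & rat,
    }


# Hypothetical 3+ gene lists, already mapped to shared gene symbols.
human_3plus = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
mouse_3plus = {"GENE_A", "GENE_B", "GENE_E"}
rat_3plus = {"GENE_A", "GENE_C", "GENE_E"}
conserved = conserved_genes(human_3plus, mouse_3plus, rat_3plus)
```

Only genes present in all three lists (here GENE_A) would proceed to the search for over-represented TFBSs.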
Due to a variety of reasons (e.g. sample size of a dataset/
subset, probe set binding characteristics, sample preparation methods, etc.), when measured only in one dataset/
subset, some of the co-expression links might occur by
chance. Checking for multiple re-occurrence of a link is
expected to reduce the number of false-positive links.
More importantly, the conservation analysis should further reduce the number of artifacts. However, since our
analysis comprised a multitude of subsets it was important to estimate the statistical significance of the results.
To tackle this problem, we created randomized subsets
similarly to what was described by Lee and colleagues [15]
and calculated the distribution of correlated 3+ links for
each species separately. The results showed that our co-expression link confirmation analysis resulted in a significantly higher number of links compared to the randomized data (p-value < 0.005 for each species). However,
it should be mentioned that the number of 3+ links
remained quite high in the randomized datasets: for
human subsets it constituted about 58% of the observed
3+ links, for mouse about 43% and for rat 21%. These
results justify the subsequent co-expression conservation
analysis step. Indeed, in random human, mouse and rat
subsets the number of conserved correlated 3+ links was only about
9% of the discovered conserved BDNF-correlated links
(that is ~7.5 genes out of 84).
Analysis of the list of 84 conserved BDNF-correlated genes
using g:Profiler showed low p-values
and revealed significant GO categories related to
BDNF actions [see Additional file 8: g:Profiler analysis].
Statistically significant categories included: i) MYC-associated zinc finger protein (MAZ) targets (44 genes, p =
1.82·10^-05); ii) signal transduction (36 genes, p =
3.51·10^-06); iii) nervous system development (17 genes, p
= 5.27·10^-08); iv) Kruppel-box protein homolog (KROX)
targets (18 genes, p = 1.21·10^-04); v) transmembrane
receptor protein tyrosine kinase pathway (7 genes, p =
3.56·10^-06); vi) dendrite localization (5 genes, p =
1.82·10^-05) (Table 1).
According to the Gene Ontology database, conserved
BDNF-correlated gene products participate in axonogenesis (BAIAP2), dendrite development (DBN1), synaptic
plasticity and synaptic transmission (DBN1, KCND2,
MBP, NPTX1, NR4A2 and SNCA), regeneration (GAS6,
PLAUR), regulation of apoptosis (XIAP (known as
BIRC4), KLF10, NEFL, PLAGL1, PRKCE, SCG2, SNCA, and
TBX3), skeletal muscle development (MYH9, PPP3CA,
and TBX3) and angiogenesis (ANGPT1, BAIAP2, CYR61,
MYH9, SCG2, SERPINE1 and TBX3) (Table 1). Out of 84,
24 BDNF-correlated genes are related to cancer and 14 are
involved in neurological disorders (Table 2).
Interactions among correlated genes
We examined whether any of the correlated genes had known
interactions with BDNF using the Information Hyperlinked
over Proteins gene network (iHOP). iHOP allows navigating the literature cited in PubMed and gives as an output
all sentences that connect gene A and gene B with a verb
(http://www.ihop-net.org/) [30]. We constructed a "gene
network" using the iHOP Gene Model tool to verify
BDNF co-expression links against the experimental evidence reported in the literature (Figure 2). For the URL
links to the cited literature see Additional file 9: iHOP references.
According to the literature, 17 out of 84 conserved correlated genes have been reported to interact functionally with BDNF or to be co-regulated with it (Figure 2A). IGFBP5 [31], NR4A2, RGS4 [32] and DUSP1 [33] have previously been reported to be co-expressed with human or rodent BDNF. Other gene products, such as FGFR1 [34] and SNCA [35], are known to regulate BDNF expression. The proprotein convertase PCSK1 is implicated in the processing of pro-BDNF [36]. The tyrosine phosphatase receptor PTPRF associates with NTRK2 and modulates neurotrophic signaling pathways [37]. Thyroid hormone receptor alpha (THRA) induces expression of the BDNF receptor NTRK2 [38]. Finally, expression of genes such as EGR1 [39], MBP [40], NEFL [41], NPTX1 [42], NTRK2, SERPINE1 [43], SCG2 [44], SNCA [45] and TCF4 (also known as ITF2) [46] is known to be regulated by BDNF signaling. CCND2, DUSP1, DUSP6, EGR1 and RGS4 gene expression is altered in cortical GABA neurons in the absence of BDNF [47].
iHOP reports a total of 250 interactions with human BDNF. In order to assess the probability of observing 17/84 or more functional interactions between BDNF and other genes, we had to make an assumption regarding the total number of human genes that iHOP uses. A lower total would result in higher p-values, whereas a higher total would produce lower p-values. We therefore considered N = 5000, 10000, 20000 and 30000 as the total number of human genes. The total number of genes linked to BDNF is m = 250, based on iHOP data. The p-values were obtained from the right tail of the hypergeometric probability distribution. For N = 5000, 10000, 20000 and 30000, the p-values are 1.0 × 10^-7, 1.7 × 10^-12, 1.3 × 10^-17 and 1.18 × 10^-20, respectively.
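The right-tail hypergeometric calculation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name is hypothetical, n = 84 and k = 17 are taken from the text, and the sketch makes no claim to reproduce the reported p-values digit for digit.

```python
from math import comb


def hypergeom_right_tail(N, m, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    m of which are linked to BDNF (right tail of the hypergeometric distribution)."""
    upper = min(m, n)  # X cannot exceed either the sample size or the number of linked genes
    favourable = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic; a single division at the end avoids float underflow.
    return favourable / comb(N, n)


m, n, k = 250, 84, 17  # BDNF-linked genes in iHOP, conserved correlated genes, observed overlap
p_values = {N: hypergeom_right_tail(N, m, n, k) for N in (5000, 10000, 20000, 30000)}

for N, p in p_values.items():
    print(f"N = {N}: p = {p:.2e}")
```

Because the expected overlap n·m/N shrinks as N grows, the p-value decreases monotonically with N, which matches the direction stated in the text.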
By analyzing the iHOP network, indirect connections with BDNF could be established for genes that had no known direct interactions with BDNF (Figure 2B). For example, the SCG2 protein is found in neuroendocrine vesicles and is cleaved by PCSK1 [48], the protease that cleaves pro-BDNF. BDNF and NTRK2 signaling affect SNCA gene expression and alpha-synuclein deposition in the substantia nigra [49]. The ATF3 gene is regulated by EGR1 [50], whose expression is activated by BDNF [39]. For more interactions see Figure 2.
Table 2: Conserved correlated genes are associated with various types of cancer and neurological disorders.
Disease | Associated genes | References
Schizophrenia | BDNF, RGS4, NR4A2 | Schmidt-Kastner et al. (2006)
Parkinson's disease | BDNF, PTGS2, SNCA, NR4A2 | Murer et al. (2001); Chae et al. (2008); Pardo and van Duijn (2005)
Alzheimer's | BDNF, KALRN | Murer et al. (2001); Youn et al. (2007)
Polyglutamine neurodegeneration | NEFL, BAIAP2 | Mosaheb et al. (2005); Thomas et al. (2001)
alpha-mannosidosis | MAN1A1 | D'Hooge et al. (2005)
Ophthalmopathy | CYR61, DUSP1, EGR1, PTGS2 | Lantz et al. (2005)
Epilepsy | BDNF, DUSP6, EGR1 | Binder and Scharfman (2004); Rakhade et al. (2007)
Depression | BDNF, DUSP1 | Russo-Neustadt and Chen (2005); Rakhade et al. (2007)
Ischemia | BDNF, CD44, PTGS2 | Binder and Scharfman (2004); Murphy et al. (2005)
Ovarian carcinoma | BDNF, ITF2, DUSP1, RGS4 | Yu et al. (2008); Kolligs et al. (2002); Puiffe et al. (2007)
Breast cancer | BDNF, FGFR1, CCND2, PLAU, SERPINE1, PLAUR, MAZ, DUSP6, EGR1, KLF10, PTRF | Tozlu et al. (2006); Koziczak et al. (2004); Grebenchtchikov et al. (2005); Cui et al. (2006); Liu et al. (2007); Reinholz et al. (2004); Levea et al. (2000)
Lung cancer | BDNF, ODZ2, CCND2, GFI1 | Ricci et al. (2005); Kan et al. (2006)
Prostate cancer | BDNF, IGFBP5, PLAUR, p75NTR | Bronzetti et al. (2008); Nalbandian et al. (2005)
Pheochromocytoma | PCSK1, PCSK2, SCG2 | Guillemot et al. (2006)
Endometrial cancer | CXCL5, OLFM1 | Wong et al. (2007)
Leukemia | PKCB1, CCND2 | Hans et al. (2005)
Motif discovery
Assuming that genes with similar tissue-specific expression patterns are likely to share common regulatory elements, we clustered co-expressed genes according to their tissue-specific expression using information provided by the TiProd database [51]. Each tissue was assigned a category, and the genes expressed in the corresponding tissues were clustered into the following categories: i) CNS, ii) peripheral NS (PNS), iii) endocrine, iv) gastrointestinal, and v) genitourinary. We applied the DiRE [52] and CONFAC [53] motif-discovery tools to search for statistically over-represented TFBSs in the clusters and among all conserved BDNF-correlated genes. DiRE can detect regulatory elements outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. The
[Figure 2, panels A and B: iHOP gene network; extracted node labels: IGFBP5, Nptx1, PTPRF, CXL5, PLAUR, PCSK2]