Publication information

Individual authors: Kulakovskiy I.V., Makeev V.J., Vorontsov I.E., Kasianov A.S., Medvedeva Y.A., Yevshin I.S., Kolpakov F.A., Soboleva A.V., Ashoor H., Ba-alawi W., Bajic V.B.
Title: HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models
Bibliographic reference: Kulakovskiy I.V., Makeev V.J., Vorontsov I.E., Kasianov A.S., Medvedeva Y.A., Yevshin I.S., Kolpakov F.A., Soboleva A.V., Ashoor H., Ba-alawi W., Bajic V.B. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models // Nucleic Acids Research. - 2016. - Vol.44. - Iss. D1. - P.D116-D125. - ISSN 0305-1048. - EISSN 1362-4962.
External identifiers: DOI: 10.1093/nar/gkv1249; RSCI: 26838360; PubMed: 26586801; SCOPUS: 2-s2.0-84976873260; WoS: 000371261700016;
Abstract (in English): Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns, recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest up to date collection of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate a practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
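The abstract refers to PWM and dinucleotide PWM (diPWM) models, score thresholds and motif finding. The Python sketch below illustrates these ideas in general terms only. It is not the HOCOMOCO command-line tooling, whose interface is not described in this record. Every name (`pwm_score`, `dipwm_score`, `scan`, `threshold_for_pvalue`) is hypothetical, as are the matrix layouts: 4 columns in A, C, G, T order for a PWM, and 16 columns in AA, AC, ..., TT order for each diPWM row. The sketch also assumes an i.i.d. background, uniform by default. It assumes one common convention in which a score threshold corresponds to a target P-value. That P-value is computed exactly by rounding weights to integers and building the score distribution with dynamic programming.

```python
from collections import defaultdict
from typing import Callable, List, Optional, Sequence, Tuple

NUCS = "ACGT"
IDX = {n: i for i, n in enumerate(NUCS)}  # assumed column order: A, C, G, T
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pwm_score(pwm: List[List[float]], word: str) -> float:
    """Mononucleotide PWM: sum the weight of each letter at its position."""
    return sum(row[IDX[c]] for row, c in zip(pwm, word))


def dipwm_score(dipwm: List[List[float]], word: str) -> float:
    """Dinucleotide PWM: row k holds 16 weights for the pair word[k]word[k+1]
    (assumed order AA, AC, ..., TT), so the motif width is len(dipwm) + 1."""
    return sum(row[4 * IDX[a] + IDX[b]] for row, a, b in zip(dipwm, word, word[1:]))


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq[::-1].translate(COMPLEMENT)


def scan(score: Callable[[str], float], width: int, seq: str,
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (start, strand, score) for every window scoring >= threshold
    on either strand. Only uppercase A/C/G/T are handled (no N)."""
    hits = []
    for i in range(len(seq) - width + 1):
        word = seq[i:i + width]
        for strand, w in (("+", word), ("-", revcomp(word))):
            s = score(w)
            if s >= threshold:
                hits.append((i, strand, s))
    return hits


def threshold_for_pvalue(pwm: List[List[float]], pvalue: float,
                         background: Sequence[float] = (0.25, 0.25, 0.25, 0.25),
                         scale: int = 1000) -> Optional[float]:
    """Lowest attainable threshold t with P(score >= t) <= pvalue for a random
    word from the i.i.d. background. Weights are rounded to 1/scale so the
    score distribution can be built exactly, position by position."""
    dist = {0: 1.0}  # discretized partial score -> probability
    for row in pwm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for k in range(4):
                nxt[s + round(row[k] * scale)] += p * background[k]
        dist = nxt
    tail, best = 0.0, None
    for s in sorted(dist, reverse=True):  # accumulate the upper tail
        if tail + dist[s] > pvalue:
            break
        tail += dist[s]
        best = s
    return None if best is None else best / scale


if __name__ == "__main__":
    toy = [[2, 0, 0, 0], [2, 0, 0, 0]]  # toy 2-bp matrix favouring "AA"
    t = threshold_for_pvalue(toy, 1 / 16)
    print(t, scan(lambda w: pwm_score(toy, w), len(toy), "CAAGTT", t))
```

With the toy matrix, only "AA" falls in the 1/16 upper tail, so the threshold is 4.0. That word is then found at position 1 on the forward strand and at position 4 on the reverse strand. The same `scan` loop also accepts `dipwm_score` with width `len(dipwm) + 1`. The P-value routine above covers mononucleotide PWMs only. A diPWM version would need to carry the previous nucleotide in the dynamic-programming state.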
Keywords: MUTATIONS; MOTIFS; ELEMENTS; SEQUENCES; DATABASE; HUMAN GENOME; DNA-BINDING; CHIP-SEQ DATA; EXPRESSION; DISCOVERY;
Published: 2016
Cited references:
1. Ravasi, T., Suzuki, H., Cannistraci, C.V., Katayama, S., Bajic, V.B., Tan, K., Akalin, A., Schmeier, S., Kanamori-Katayama, M., Bertin, N. et al. (2010) An atlas of combinatorial transcriptional regulation in mouse and man. Cell, 140, 744-752.
2. Melton, C., Reuter, J.A., Spacek, D.V. and Snyder, M. (2015) Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat. Genet., 47, 710-716.
3. Kamanu, F.K., Medvedeva, Y.A., Schaefer, U., Jankovic, B.R., Archer, J.A.C. and Bajic, V.B. (2012) Mutations and binding sites of human transcription factors. Front. Genet., 3, 100.
4. Kazemian, M., Pham, H., Wolfe, S.A., Brodsky, M.H. and Sinha, S. (2013) Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res., 41, 8237-8252.
5. Stormo, G.D. (2013) Introduction to Protein-DNA Interactions: Structure, Thermodynamics, and Bioinformatics. 1st edn. Cold Spring Harbor Laboratory Press, NY.
6. Siggers, T. and Gordân, R. (2014) Protein-DNA binding: complexities and multi-protein codes. Nucleic Acids Res., 42, 2099-2111.
7. Kulakovskiy, I.V. and Makeev, V.J. (2013) DNA sequence motif: a jack of all trades for ChIP-Seq data. Adv. Protein Chem. Struct. Biol., 91, 135-171.
8. Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A. and Luscombe, N.M. (2009) A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet., 10, 252-263.
9. Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J., Buchman, S., Chen, C., Chou, A., Ienasescu, H. et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142-D147.
10. Wingender, E. (2008) The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief. Bioinform., 9, 326-332.
11. Pachkov, M., Balwierz, P.J., Arnold, P., Ozonov, E. and van Nimwegen, E. (2013) SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates. Nucleic Acids Res., 41, D214-D220.
12. Kulakovskiy, I.V., Medvedeva, Y.A., Schaefer, U., Kasianov, A.S., Vorontsov, I.E., Bajic, V.B. and Makeev, V.J. (2013) HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res., 41, D195-D202.
13. Stormo, G.D. (2015) DNA Motif Databases and Their Uses. Curr. Protoc. Bioinformatics, 51, 15.
14. Medvedeva, Y.A., Khamis, A.M., Kulakovskiy, I.V., Ba-Alawi, W., Bhuyan, M.S.I., Kawaji, H., Lassmann, T., Harbers, M., Forrest, A.R.R. and Bajic, V.B. (2014) Effects of cytosine methylation on transcription factor binding sites. BMC Genomics, 15, 119.
15. Pardo, L.M., Rizzu, P., Francescatto, M., Vitezic, M., Leday, G.G.R., Sanchez, J.S., Khamis, A., Takahashi, H., van de Berg, W.D.J., Medvedeva, Y.A. et al. (2013) Regional differences in gene expression and promoter usage in aged human brains. Neurobiol. Aging, 34, 1825-1836.
16. Alam, T., Medvedeva, Y.A., Jia, H., Brown, J.B., Lipovich, L. and Bajic, V.B. (2014) Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes. PLoS One, 9, e109443.
17. Khamis, A.M., Hamilton, A.R., Medvedeva, Y.A., Alam, T., Alam, I., Essack, M., Umylny, B., Jankovic, B.R., Naeger, N.L., Suzuki, M. et al. (2015) Insights into the Transcriptional Architecture of Behavioral Plasticity in the Honey Bee Apis mellifera. Sci. Rep., 5, 11136.
18. Wang, Q., Huang, J., Sun, H., Liu, J., Wang, J., Wang, Q., Qin, Q., Mei, S., Zhao, C., Yang, X. et al. (2014) CR Cistrome: a ChIP-Seq database for chromatin regulators and histone modification linkages in human and mouse. Nucleic Acids Res., 42, D450-D458.
19. Dunham, I., Kundaje, A., Aldred, S.F., Collins, P.J., Davis, C.A., Doyle, F., Epstein, C.B., Frietze, S., Harrow, J., Kaul, R. et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57-74.
20. Jolma, A., Yan, J., Whitington, T., Toivonen, J., Nitta, K.R., Rastas, P., Morgunova, E., Enge, M., Taipale, M., Wei, G. et al. (2013) DNA-binding specificities of human transcription factors. Cell, 152, 327-339.
21. Keilwagen, J. and Grau, J. (2015) Varying levels of complexity in transcription factor binding motifs. Nucleic Acids Res., 43, e119.
22. Langmead, B., Trapnell, C., Pop, M. and Salzberg, S.L. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol., 10, R25.
23. Jothi, R., Cuddapah, S., Barski, A., Cui, K. and Zhao, K. (2008) Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res., 36, 5221-5231.
24. Kulakovskiy, I.V., Boeva, V.A., Favorov, A.V. and Makeev, V.J. (2010) Deep and wide digging for binding motifs in ChIP-Seq data. Bioinformatics, 26, 2622-2623.
25. Kulakovskiy, I., Levitsky, V., Oshchepkov, D., Bryzgalov, L., Vorontsov, I. and Makeev, V. (2013) From binding motifs in ChIP-Seq data to improved models of transcription factor binding sites. J. Bioinform. Comput. Biol., 11, 1340004.
26. Levitsky, V.G., Kulakovskiy, I.V., Ershov, N.I., Oschepkov, D.Y., Makeev, V.J., Hodgman, T.C. and Merkulova, T.I. (2014) Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data. BMC Genomics, 15, 80.
27. Kulakovskiy, I.V. and Makeev, V.J. (2010) Discovery of DNA motifs recognized by transcription factors through integration of different experimental sources. Biophysics (Oxford), 54, 667-674.
28. Vorontsov, I.E., Kulakovskiy, I.V. and Makeev, V.J. (2013) Jaccard index based similarity measure to compare transcription factor binding site models. Algorithms Mol. Biol., 8, 23.
29. Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H. and Glass, C.K. (2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell, 38, 576-589.
30. Papatsenko, D., Darr, H., Kulakovskiy, I.V., Waghray, A., Makeev, V.J., MacArthur, B.D. and Lemischka, I.R. (2015) Single-cell analyses of ESCs reveal alternative pluripotent cell states and molecular mechanisms that control self-renewal. Stem Cell Rep., 5, 207-220.
31. Forrest, A.R.R., Kawaji, H., Rehli, M., Baillie, J.K., de Hoon, M.J.L., Lassmann, T., Itoh, M., Summers, K.M., Suzuki, H., Daub, C.O. et al. (2014) A promoter-level mammalian expression atlas. Nature, 507, 462-470.
32. Marinov, G.K., Kundaje, A., Park, P.J. and Wold, B.J. (2014) Large-scale quality analysis of published ChIP-seq data. G3 (Bethesda), 4, 209-223.
33. Wingender, E., Schoeps, T., Haubrock, M. and Dönitz, J. (2015) TFClass: a classification of human transcription factors and their rodent orthologs. Nucleic Acids Res., 43, D97-D102.
34. Dabrowski, M., Dojer, N., Krystkowiak, I., Kaminska, B. and Wilczynski, B. (2015) Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data. BMC Bioinformatics, 16, 140.
35. Touzet, H. and Varré, J.-S. (2007) Efficient and accurate P-value computation for Position Weight Matrices. Algorithms Mol. Biol., 2, 15.
36. Korhonen, J., Martinmäki, P., Pizzi, C., Rastas, P. and Ukkonen, E. (2009) MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics, 25, 3181-3182.
37. Guillon, N., Tirode, F., Boeva, V., Zynovyev, A., Barillot, E. and Delattre, O. (2009) The oncogenic EWS-FLI1 protein binds in vivo GGAA microsatellite sequences with potential transcriptional activation function. PLoS One, 4, e4932.
38. Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T.W., Greven, M.C., Pierce, B.G., Dong, X., Kundaje, A., Cheng, Y. et al. (2012) Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res., 22, 1798-1812.
39. Kheradpour, P. and Kellis, M. (2014) Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res., 42, 2976-2987.
40. Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P. et al. (2012) ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res., 22, 1813-1831.
41. Bi, Y., Kim, H., Gupta, R. and Davuluri, R.V. (2011) Tree-based position weight matrix approach to model transcription factor binding site profiles. PLoS One, 6, e24210.
42. Kuttippurathu, L., Hsing, M., Liu, Y., Schmidt, B., Maskell, D.L., Lee, K., He, A., Pu, W.T. and Kong, S.W. (2011) CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments. Bioinformatics, 27, 715-717.
43. Ma, X., Kulkarni, A., Zhang, Z., Xuan, Z., Serfling, R. and Zhang, M.Q. (2012) A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information. Nucleic Acids Res., 40, e50.
44. Sebastian, A. and Contreras-Moreira, B. (2014) footprintDB: a database of transcription factors with annotated cis elements and binding interfaces. Bioinformatics, 30, 258-265.
45. Weirauch, M.T., Yang, A., Albu, M., Cote, A.G., Montenegro-Montero, A., Drewe, P., Najafabadi, H.S., Lambert, S.A., Mann, I., Cook, K. et al. (2014) Determination and inference of eukaryotic transcription factor sequence specificity. Cell, 158, 1431-1443.
46. Dolfini, D. and Mantovani, R. (2012) YB-1 (YBX1) does not bind to Y/CCAAT boxes in vivo. Oncogene, 32, 4189-4190.