LD‐informed deep learning for Alzheimer's gene loci detection using WGS data
Taeho Jo, Paula Bice, Kwangsik Nho, Andrew J. Saykin, the Alzheimer's Disease Sequencing Project, Alzheimer & Dementia TRCI (2025) Deep‐Block is a multi‐stage deep learning framework designed to detect AD-associated genetic loci in large‐scale WGS data. It segments the genome based on linkage disequilibrium (LD), applies sparse attention to select key blocks, and evaluates SNP feature importance with TabNet and Random Forest (RF). In a study of 7416 participants, 30,218 LD blocks were identified, including novel variants and established APOE loci. The results were supported by eQTL analysis across 13 brain regions and comparisons to existing GWAS data.
Taeho Jo, Junpyo Kim, Paula Bice, Kevin Huynh, Tingting Wang, Matthias Arnold, Peter J. Meikle, Corey Giles, Rima Kaddurah-Daouk, Andrew J. Saykin, Kwangsik Nho, eBioMedicine (2023) This study introduces the Circular-Sliding Window Association Test (c-SWAT), a methodology designed to enhance the diagnostic classification of AD using serum-based metabolomics data, with a focus on lipidomics. Leveraging data from 997 participants, c-SWAT integrates feature correlation analysis, feature selection via CNN, and final classification through Random Forest, achieving an accuracy of up to 80.8% and an AUC of 0.808 in distinguishing AD from cognitively normal older adults.
Taeho Jo, Kwangsik Nho, Shannon L. Risacher, Andrew J. Saykin, AAIC (2023) This study introduces a new deep learning method using CNNs to analyze tau PET images and identify Alzheimer's Disease (AD) related patterns. The method achieved a 90.8% accuracy in classifying AD and highlighted significant tau deposition regions associated with AD. Additionally, we used the SWAT method to find AD-related SNPs, uncovering key genetic loci, including the known APOE regions, and achieved an AUC of 0.82.
Deep Learning-based SWAT-Tab Approach for Identifying Genetic Variants using Whole Genome Sequencing
Taeho Jo, Kwangsik Nho, Andrew J. Saykin, AAIC (2023) The study introduces SWAT-TAB, an evolved form of SWAT-CNN, optimized for identifying genetic variants in Alzheimer's disease (AD). It uses the TabNet algorithm, which selects relevant features through sequential attention, and was applied to ADSP WGS data, revealing pivotal genetic features. Compared with its predecessor, SWAT-TAB reduced processing time and was easier to implement.
Taeho Jo, Junpyo Kim, Paula Bice, Kevin Huynh, Tingting Wang, Peter J Meikle, Rima Kaddurah-Daouk, Kwangsik Nho, Andrew J. Saykin, AAIC (2022) We used serum-based cross-sectional lipidome data with 781 lipids from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), including 216 cognitively normal (CN), 635 MCI, and 382 dementia (AD) participants. Phenotype influence scores (PIS) were derived by the deep learning-based circling Sliding Window Association Test approach (Circling SWAT), an extension of SWAT (Jo et al., 2022) that adds correlation heatmap and dendrogram analysis for omics data with minimal features.
Taeho Jo, Kwangsik Nho, Paula Bice, Andrew J Saykin, For The Alzheimer’s Disease Neuroimaging Initiative, Briefings in Bioinformatics (2022) We propose a novel three-step deep learning approach (SWAT-CNN) for identifying phenotype-related single nucleotide polymorphisms (SNPs) that can be applied to develop accurate disease classification models. We tested our approach using GWAS data from the ADNI (N = 981; CN = 650, AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an AUC of 0.82.
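The sliding-window idea behind the SWAT family can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window and stride parameters, and the per-window scores are hypothetical, and in the published method the score for each window would come from a trained model rather than from a list supplied by hand.

```python
from typing import List, Sequence, Tuple


def sliding_windows(n_features: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index pairs that cover n_features positions.

    The last window is truncated so that it never runs past the final feature.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < n_features:
        end = min(start + window, n_features)
        spans.append((start, end))
        if end == n_features:
            break  # the final feature is covered, so stop here
        start += stride
    return spans


def top_windows(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring windows, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy usage: 10 SNP positions, windows of 4 with a stride of 2.
spans = sliding_windows(10, window=4, stride=2)
# Placeholder scores, one per window (assumed values, not real results).
scores = [0.51, 0.74, 0.62, 0.58]
best = [spans[i] for i in top_windows(scores, k=2)]
```

Overlapping windows (stride smaller than the window) let a signal near a window boundary be seen in more than one window, at the cost of more model evaluations.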
Deep learning–based genome-wide association analysis in Alzheimer’s disease
Taeho Jo, Kwangsik Nho, Andrew J. Saykin, AAIC (2021) We used genome-wide genotyping data (12,448,786 SNPs following imputation) from 916 participants in the Alzheimer’s Disease Neuroimaging Initiative (458 cognitively normal controls and 458 AD patients). A convolutional neural network (CNN) consisting of convolutional, pooling and fully connected Softmax layers was used in a two-stage approach.
Deep learning detection of informative features in tau PET for Alzheimer’s disease classification
Taeho Jo, Kwangsik Nho, Shannon L. Risacher & Andrew J. Saykin for the Alzheimer’s Neuroimaging Initiative, BMC Bioinformatics (2020) We developed a deep learning-based framework to identify informative features for AD classification using tau positron emission tomography (PET) scans. The 3D convolutional neural network (CNN)-based model for classifying AD versus cognitively normal (CN) participants yielded an average accuracy of 90.8% based on five-fold cross-validation. The layer-wise relevance propagation (LRP) model identified the brain regions in tau PET images that contributed most to distinguishing AD from CN.
Taeho Jo, Kwangsik Nho, Shannon L. Risacher, Andrew J. Saykin, AAIC (2020) We downloaded 458 tau PET images (196 CN, 196 MCI, and 66 AD) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and included only one scan per individual. SPM12 was used to process the tau PET data using standard techniques. We used a 3D convolutional neural network (CNN) method for the classification, and applied a layer-wise relevance propagation (LRP) algorithm to identify informative features and to visualize the classification results. Five-fold cross-validation was applied, where 70% of the entire data set was used for model training, 20% for model testing, and 10% for independent validation.
Taeho Jo, Kwangsik Nho, Andrew J. Saykin, Frontiers in Aging Neuroscience (2019) The application of deep learning to early detection and automated classification of AD has recently gained considerable attention, as rapid progress in neuroimaging techniques has generated large-scale multimodal neuroimaging data. A systematic review of publications using deep learning and neuroimaging data for diagnostic classification of AD was performed. A PubMed and Google Scholar search was used to identify deep learning papers on AD published between Jan 2013 and July 2018. These papers were reviewed, evaluated, and classified by algorithm and neuroimaging type, and the findings were summarized.
Taeho Jo, Kwangsik Nho, Shannon L. Risacher, Andrew J. Saykin, AAIC (2019) Demographic information, 3D MRI and PET image data, and APOE data were downloaded from the ADNI data repository (N=329; 185 CN and 144 AD). In our novel Multimodal-3DCNN approach, we first applied a 3D Convolutional Neural Network (3D-CNN) to multimodal neuroimaging (MRI and PET) and then combined the output of the 3D-CNN with APOE ε4 genotype and demographic information (age, sex, education, handedness, etc.) using a gram matrix method (mCNN; Jo et al., AAIC 2018). Finally, a Deep Neural Network (DNN) was used to distinguish individuals with AD from CN. A 5-fold cross-validation approach was employed to evaluate performance.
Taeho Jo, Kwangsik Nho, Shannon L. Risacher, Jingwen Yan, Andrew J. Saykin, AAIC (2018) Intermediate layers of the CNN were extracted, and the patient's clinical information was added by the gram matrix method. In this method, the clinical information was encoded as 2D matrices, and 2D images for the training set were extracted using hippocampal segmentations that were carried out with Surgical Navigation Technologies (SNT) and downloaded from the LONI ADNI site. The CNN with augmentation was applied to baseline scans from 103 participants with AD and 144 cognitively normal (CN) controls. Global CDR scores and the number of APOE ε4 alleles were included as clinical and genetic data.
Evaluation of Protein Structural Models Using Random Forests
Renzhi Cao, Taeho Jo, Jianlin Cheng, arXiv (2016) We propose a new protein quality assessment method which can predict both local and global quality of protein 3D structural models. Our method uses both multi-model and single-model quality assessment methods for global quality assessment, and uses chemical, physical, and geometrical features together with the global quality score for local quality assessment. CASP9 targets are used to generate the features for local quality assessment. We evaluate the performance of our local quality assessment method on CASP10, where it is comparable with two state-of-the-art QA methods based on the average absolute difference between the real and predicted distance. We blindly tested our method on CASP11, and the good performance shows that combining single-model and multi-model quality assessment methods could be a good way to improve the accuracy of model quality assessment.
Improving Protein Fold Recognition by Deep Learning Networks
Taeho Jo, Jie Hou, Jesse Eickholt & Jianlin Cheng, Scientific Reports (2015)
Improving protein fold recognition by random forest
Taeho Jo & Jianlin Cheng, BMC Bioinformatics (2014) RF-Fold consists of hundreds of decision trees that can be trained efficiently on very large datasets to make accurate predictions on a highly imbalanced dataset. We evaluated RF-Fold through cross-validation on the standard Lindahl's benchmark dataset, which comprises 976 × 975 target-template protein pairs. Compared with 17 different fold recognition methods, the performance of RF-Fold is generally comparable to the best performance at each difficulty level, from the easiest family level through the medium-hard superfamily level to the hardest fold level. Based on the top-one template protein ranked by RF-Fold, the correct recognition rate is 84.5%, 63.4%, and 40.8% at family, superfamily, and fold levels, respectively. Based on the top-five template proteins ranked by RF-Fold, the correct recognition rate increases to 91.5%, 79.3% and 58.3% at family, superfamily, and fold levels.
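The top-one and top-five recognition rates quoted above can be read as a top-k hit rate over targets. The Python sketch below is illustrative only and is not taken from RF-Fold: the function name, the toy identifiers, and the rule that a target counts as recognized when any of its k best-ranked templates is a correct match are all assumptions.

```python
from typing import Dict, List, Set


def top_k_recognition_rate(
    ranked: Dict[str, List[str]],
    correct: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of targets with at least one correct template in their top k.

    ranked  maps each target to its templates, ordered best first.
    correct maps each target to the set of templates treated as true matches.
    """
    if not ranked:
        raise ValueError("at least one target is required")
    hits = 0
    for target, templates in ranked.items():
        # Assumption: one correct template among the top k counts as a hit.
        if any(t in correct.get(target, set()) for t in templates[:k]):
            hits += 1
    return hits / len(ranked)


# Toy data with hypothetical identifiers (not from the benchmark).
ranked = {
    "t1": ["a", "b", "c"],
    "t2": ["d", "e", "f"],
    "t3": ["g", "h", "i"],
    "t4": ["x", "y", "z"],
}
correct = {"t1": {"a"}, "t2": {"e"}, "t3": {"z"}, "t4": {"z"}}

top1 = top_k_recognition_rate(ranked, correct, k=1)
top3 = top_k_recognition_rate(ranked, correct, k=3)
```

Because the top-k set always contains the top-one template, the rate can only stay the same or rise as k grows, which matches the direction of the reported numbers.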
Homology Modeling of an Algal Membrane Protein, Heterosigma Akashiwo Na+-ATPase
Taeho Jo, Mariko Shono, Masato Wada, Sayaka Ito, Junko Nomoto, Yukichi Hara, Membrane (2010) The three–dimensional structure of Heterosigma akashiwo Na+–ATPase (HANA) was predicted by means of homology modeling based on the crystal structure of the K+–bound form of shark Na+/K+–ATPase (PDB ID: 2ZXE). The overall structure of HANA appears to be similar to that of shark Na+/K+–ATPase. Both contain three characteristic cytoplasmic domains, A, N and P, which are unique to P–type ATPases. HANA has a long TM7–8 junction as a large extracellular domain, in place of the β–subunit of shark Na+/K+–ATPase. Two putative K+–binding sites in the transmembrane domain of HANA were identified by means of valence mapping based on the constructed structure. The presence of K+–binding sites and the reported ion requirements for ATPase activity and EP formation indicate that HANA may transport K+ ions in the same manner as animal Na+/K+...