Introduction

BHIT is a novel Bayesian partition method for detecting SNP interactions (epistasis). It builds a Bayesian model on both continuous and discrete data to partition multiple-phenotype data. Compared with other methods on both simulated and real data, BHIT has two key strengths: (i) Its Bayesian model, equipped with an MCMC search, lets BHIT explore high-order interactions efficiently. (ii) Its model covers both continuous and discrete data, so continuous and discrete phenotypes can be handled simultaneously, and interactions within or between phenotypes and genetic data can also be detected.

Request Download BHIT

I agree to use BHIT for academic research or teaching purposes only. I will not use it for any commercial purpose, or redistribute this program to other users. I agree that any public report or publication of results obtained using BHIT will acknowledge its use by an appropriate citation of this website.
I agree to the terms

Prerequisite

Usage:

BHIT inputfilename outputfilename iterNum burninNum observNum SNPNum PhenoNum MAF newRuningTag
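
As a rough illustration of the usage line above, the following Python sketch assembles the nine positional arguments in the documented order. The argument names are copied from the usage line; the helper `build_bhit_command`, the executable path, and every value shown are hypothetical, and the meaning of each argument (in particular MAF and newRuningTag) is not confirmed here and should be checked against the BHIT distribution.

```python
from typing import List

# Argument order copied from the usage line; nothing else is assumed about them.
BHIT_ARG_ORDER = [
    "inputfilename", "outputfilename", "iterNum", "burninNum",
    "observNum", "SNPNum", "PhenoNum", "MAF", "newRuningTag",
]


def build_bhit_command(executable: str = "BHIT", **args) -> List[str]:
    """Return the argument list for one BHIT run, in the documented order.

    Hypothetical helper: it only checks that every documented argument is
    present and that no undocumented one was passed.
    """
    missing = [name for name in BHIT_ARG_ORDER if name not in args]
    unknown = [name for name in args if name not in BHIT_ARG_ORDER]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    return [executable] + [str(args[name]) for name in BHIT_ARG_ORDER]


# All values below are placeholders chosen for illustration only.
cmd = build_bhit_command(
    inputfilename="input.txt", outputfilename="output.txt",
    iterNum=100000, burninNum=50000,
    observNum=3, SNPNum=5, PhenoNum=2,
    MAF=0.05, newRuningTag=1,
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the BHIT executable is available on the system.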

Input Example:

Example in input.txt
1 2 3 2 1 0.55 0.48
2 1 1 3 2 0.86 0.37
2 2 1 1 3 0.10 0.76
...
Each line represents one observation. In this example, an observation consists of 5 SNPs followed by 2 quantitative phenotypes. Each SNP is represented as a digit: 1 is homozygous for the major allele, 2 is heterozygous, and 3 is homozygous for the minor allele.
Note: convert.pl converts PLINK raw files to BHIT input files. (To convert your ped or bed file to a raw file, use the plink --recodeA option.)
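
The input layout described above can be checked before a run with a small parser. This is a minimal sketch, not part of BHIT: the function name `parse_bhit_line` is hypothetical, and it assumes whitespace-separated fields with the SNP columns first and the phenotype columns last, as in the example.

```python
from typing import List, Tuple

VALID_GENOTYPES = {1, 2, 3}  # 1 = homozygous major, 2 = heterozygous, 3 = homozygous minor


def parse_bhit_line(line: str, snp_num: int, pheno_num: int) -> Tuple[List[int], List[float]]:
    """Split one observation into SNP codes and quantitative phenotypes."""
    fields = line.split()
    if len(fields) != snp_num + pheno_num:
        raise ValueError(f"expected {snp_num + pheno_num} fields, got {len(fields)}")
    snps = [int(x) for x in fields[:snp_num]]
    bad = [g for g in snps if g not in VALID_GENOTYPES]
    if bad:
        raise ValueError(f"invalid genotype codes: {bad}")
    phenotypes = [float(x) for x in fields[snp_num:]]
    return snps, phenotypes


# The three example observations from input.txt: 5 SNPs, 2 phenotypes.
example = """1 2 3 2 1 0.55 0.48
2 1 1 3 2 0.86 0.37
2 2 1 1 3 0.10 0.76"""

rows = [parse_bhit_line(line, snp_num=5, pheno_num=2) for line in example.splitlines()]
for snps, phenotypes in rows:
    print(snps, phenotypes)
```

Running such a check first catches lines with a wrong column count or a genotype code outside 1, 2, 3 before the file is handed to BHIT.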

Output:

BHIT writes 3 output files. The following table shows output-file contents for the data sets above; each entry is the proportion of time a data set spent in a partition group after the burn-in period:
	0	1	
D1	0	1	
D2	0	1	
D3	0	1	
D4	1	0	
D5	1	0	
C1	0	1	
C2	0	1
Column 0 marks independent data sets (here D4 and D5). The other columns identify partitions: here D1, D2, D3, C1 and C2 fall in one partition, labelled with identifier 1.
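
To read such a table programmatically, one option is to assign each data set to the column where it spent the largest proportion of time. The sketch below is an illustration only: the function name `group_by_partition` is hypothetical, it assumes the table is tab- or whitespace-separated exactly as shown, and it does not say which of the 3 output files holds this table.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_partition(table_text: str) -> Dict[str, List[str]]:
    """Group data sets by the partition column with the highest proportion."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    partition_ids = lines[0].split()  # header row: partition identifiers
    groups: Dict[str, List[str]] = defaultdict(list)
    for ln in lines[1:]:
        name, *values = ln.split()
        proportions = [float(v) for v in values]
        # Index of the column where this data set spent most of its time.
        best = max(range(len(proportions)), key=proportions.__getitem__)
        groups[partition_ids[best]].append(name)
    return dict(groups)


# The example table from above, reproduced as text.
example_output = "\n".join([
    "\t0\t1\t",
    "D1\t0\t1\t",
    "D2\t0\t1\t",
    "D3\t0\t1\t",
    "D4\t1\t0\t",
    "D5\t1\t0\t",
    "C1\t0\t1\t",
    "C2\t0\t1",
])

groups = group_by_partition(example_output)
print(groups)
```

Data sets listed under key "0" are the independent ones; every other key collects the data sets that share a partition.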

Flowchart of BHIT pipeline

The BHIT pipeline for general species is shown in the figure below. In the preprocessing stage, if the genotype data contain missing values, an imputation method (Nputet, fastPHASE, etc.) should be applied to fill them in. SNPs with MAF less than 0.05 are then filtered out. All genotype data should be converted to the appropriate format with PLINK --recodeA. If the input phenotype includes a continuous trait, a Kolmogorov-Smirnov test should be used to check whether it follows a normal distribution. After that, the genotype and phenotype data should be combined and converted to the BHIT file format with the script provided on the BHIT website.

To deal with genome-wide SNPs, we provide three strategies for using BHIT in the pipeline. Strategy A has two stages: a feature selection method (LASSO, etc.) is first used to filter all SNPs, and BHIT is then run only on the filtered set. Strategy B splits the SNPs by chromosome and runs BHIT on each chromosome individually. Strategy C focuses on SNPs located in protein-coding regions and/or in known regions defined by the user. Finally, all results should be checked and validated.

[Figure: flowchart of the BHIT pipeline]
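
The MAF filtering step of the pipeline can be sketched on genotypes already coded 1, 2, 3 as in the BHIT input format. This is an assumption-laden illustration rather than the pipeline's own code: the function names are hypothetical, genotypes are assumed complete (already imputed), and code 3 is treated as carrying two copies of the minor allele, code 2 one copy, and code 1 none.

```python
from typing import List


def minor_allele_frequency(genotypes: List[int]) -> float:
    """Estimate the MAF of one SNP from 1/2/3 genotype codes."""
    # Code 1 -> 0 minor alleles, code 2 -> 1, code 3 -> 2.
    minor_count = sum(g - 1 for g in genotypes)
    freq = minor_count / (2 * len(genotypes))
    # Fold the frequency in case the coded "minor" allele is the common one.
    return min(freq, 1.0 - freq)


def filter_snps_by_maf(columns: List[List[int]], threshold: float = 0.05) -> List[int]:
    """Return the indices of SNP columns whose MAF is at least the threshold."""
    return [i for i, col in enumerate(columns) if minor_allele_frequency(col) >= threshold]


# Four made-up SNP columns over 20 observations each (illustrative data only).
snp_columns = [
    [1, 2, 3, 2, 1] * 4,    # 16 minor alleles out of 40
    [1] * 20,               # monomorphic
    [1] * 19 + [2],         # 1 minor allele out of 40
    [1] * 16 + [2] * 4,     # 4 minor alleles out of 40
]

kept = filter_snps_by_maf(snp_columns, threshold=0.05)
print(kept)
```

Only the SNP columns whose indices are returned would be passed on to the format conversion and to BHIT.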

Contact:

If you have questions or suggestions, please contact Juexin Wang (wangjue@missouri.edu).

Reference:

Juexin Wang, Trupti Joshi, Babu Valliyodan, Haiying Shi, Yanchun Liang, Henry T. Nguyen, Jing Zhang, and Dong Xu. "A Bayesian model for detection of high-order interactions among genetic variants in genome-wide association studies." BMC Genomics 16, no. 1 (2015): 1011.