Whole genome association analysis toolset
By default, any --recode option, and also --make-bed will preserve all genotypes exactly as they are. To set to missing Mendel errors or heterozygous haploid calls, use the options --set-me-missing and --set-hh-missing respectively.
For the former, you will also need to specify --me 1 1. It is sometimes useful to have a PED file that is tab-delimited, except that a space rather than a tab separates the two alleles of each genotype.
A file formatted in this way can be loaded into Excel, for example, as a tab-delimited file, but with one genotype per column instead of one allele per column. Use the option --tab as well as --recode or --recode12 to achieve this effect.
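As a rough illustration of the layout described above, the following Python sketch rebuilds one PED line so that columns are tab-separated while the two alleles of each genotype are separated by a space. The function name `ped_line_with_tab` and the sample line are hypothetical and serve only to show the format; this is not PLINK's own code.

```python
def ped_line_with_tab(fields):
    """Join PED tokens with tabs, keeping each genotype's two alleles together.

    `fields` is a whitespace-split PED line: six leading columns
    (family ID, individual ID, father, mother, sex, phenotype)
    followed by two allele tokens per SNP.
    """
    lead, alleles = fields[:6], fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("PED line has an odd number of allele columns")
    # Pair consecutive alleles and separate them with a single space.
    genotypes = [f"{a1} {a2}" for a1, a2 in zip(alleles[0::2], alleles[1::2])]
    return "\t".join(lead + genotypes)


# Hypothetical example line with three SNPs (the last genotype is missing).
line = "FAM1 IND1 0 0 1 2 A A G T 0 0"
out = ped_line_with_tab(line.split())
print(out)
```

Splitting the result on tabs gives one genotype per column, which is the property a spreadsheet import relies on.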
Recode and reorder a sample

A basic, but often useful, feature is to output a dataset:

Also, if --output-missing-genotype is specified (it can be given in addition to --missing-genotype), then this value will be used instead.
The --make-bed option does the same as --recode but creates binary files; these can also be filtered, etc., as described below. In contrast, plink --file data --recode12 will recode the alleles as 1 and 2, and the missing genotype will always be 0.
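The 1/2 recoding can be pictured with a small Python sketch. It assumes, purely for illustration, that the caller states which allele maps to 1 (`a1`) and which maps to 2 (`a2`); the function name `recode12` and the example alleles are hypothetical and do not describe PLINK's internal implementation.

```python
MISSING = "0"  # missing allele code used in this sketch


def recode12(genotype, a1, a2):
    """Recode a pair of alleles as "1"/"2", with missing alleles as "0".

    `a1` is mapped to "1" and `a2` to "2" (an assumption of this sketch).
    """
    coding = {a1: "1", a2: "2", MISSING: "0"}
    try:
        return tuple(coding[allele] for allele in genotype)
    except KeyError as exc:
        raise ValueError(f"unexpected allele {exc.args[0]!r}") from None


# Hypothetical SNP with alleles A and G.
het = recode12(("A", "G"), "A", "G")
hom = recode12(("G", "G"), "A", "G")
mis = recode12(("0", "0"), "A", "G")
print(het, hom, mis)
```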
These commands will create two new files plink. Unless manually specified, for all these options, the usual filters for missingness and allele frequency will be set so as not to exclude any SNPs or individuals.
By explicitly including an option, e. These flags should be used in conjunction with a data generation command. Alleles other than A,C,G,T or 1,2,3,4 will be left unchanged. To make a new file in which non-founders without both parents also in the same fileset are recoded as founders i.

Transposed genotype files

When using either --recode or --recode12, you can obtain a transposed text genotype file by adding the --transpose option.
This generates two files: The order of individuals in this file is the same as the order across the columns of the TPED file.

Additive and dominance components

The following format is often useful if one wants to use a standard, non-genetic statistical package to analyse the data, as here genotypes are coded as a single allele dosage number.
To create a file with SNP genotypes recoded in terms of additive and dominant components, use the option: The --recodeAD option produces both an additive and dominance coding: The --recodeAD option saves the data to a single file plink. This file can be easily loaded into R: The additive count of the number of common 1 alleles is therefore: The behavior of the --recodeA and --recodeAD commands can be changed with the --recode-allele command.
This allows the 0, 1, 2 count to reflect the number of a pre-specified allele type per SNP, rather than the number of the minor allele. This command takes as a single argument the name of a file that lists SNP name and allele to report.
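To make the additive and dominance coding concrete, here is a minimal Python sketch. It assumes that the additive term is the 0/1/2 count of a chosen allele and that the dominance term is a heterozygote indicator; the function name `additive_dominance` and the use of `None` for missing genotypes are choices of this sketch, not PLINK output conventions.

```python
def additive_dominance(genotype, counted_allele, missing="0"):
    """Return (additive, dominance) codes for one genotype.

    additive  : number of copies of `counted_allele` (0, 1 or 2)
    dominance : 1 for a heterozygote, 0 for a homozygote
    A genotype containing the missing code yields (None, None).
    """
    a, b = genotype
    if missing in (a, b):
        return None, None
    additive = int(a == counted_allele) + int(b == counted_allele)
    dominance = 1 if a != b else 0
    return additive, dominance


# Hypothetical SNP with alleles A and G, counting copies of G.
print(additive_dominance(("A", "G"), "G"))
print(additive_dominance(("G", "G"), "G"))
print(additive_dominance(("0", "0"), "G"))
```

Changing `counted_allele` plays the role described for --recode-allele: the same genotypes are counted with respect to a different, pre-specified allele.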
If an allele is specified in --recode-allele that is not seen in the data, similarly all individuals will receive a 0 count. NOTE For alleles that have exactly 0.

Listing by minor allele count

The command --recode-rlist will generate files plink. For example, consider a particular SNP: rs has a minor allele G seen twice (in two heterozygotes), and two individuals have a missing genotype; all other individuals are homozygous for the major allele.
In this case, we would see two rows in the plink. This command could be used in conjunction with the --reference command and --freq to list all instances of rare non-reference alleles. The --with-reference option will generate a fourth file plink.
Listing by genotype

Another format that might sometimes be useful is the --list option, which generates a file plink.
For example, if we have a file with two SNPs rs and rs both on chromosome 1: This option is often useful in conjunction with --snp, if you want an easy breakdown of which individuals have which genotypes.

Update SNP information

To automatically update either the genetic or physical positions for some or all SNPs in a dataset, use the --update-map command, which takes a single parameter of a filename.
To change genetic position (3rd column in the MAP file), add the flag --update-cm as well as --update-map. There is no way to change chromosome codes using this command.
Normally, one would want to save the new file with the changed positions, as in the example above, although one could combine other commands instead. SNPs not in this file will be left unchanged. If a SNP is listed more than once in the file, an error will be reported. If this is the case, a message will be written to the LOG file. Although the positions are updated, the order is not changed internally: For example, if the original contains Only after saving and reloading e.
This will only be an issue for commands which rely on relative SNP positions. If the LOG file does not show a message that the order of SNPs has changed after using --update-map, one need not worry. The name and chromosome code of a SNP can also be changed, by adding the modifiers --update-name or --update-chr.
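The update rules described above (SNPs not listed are unchanged, a repeated SNP is an error, and the order may no longer match the positions) can be sketched in Python as follows. The two-column update list of (SNP name, new position), the function name `update_positions`, and the SNP names are assumptions made for this illustration only.

```python
def update_positions(map_rows, updates):
    """Apply new base-pair positions to MAP-style rows.

    map_rows : list of (chrom, snp, cm, bp) tuples, in file order
    updates  : list of (snp, new_bp) pairs (assumed two-column layout)
    Returns the updated rows and a flag saying whether the rows are
    no longer sorted by position within a chromosome.
    """
    new_bp = {}
    for snp, bp in updates:
        if snp in new_bp:
            raise ValueError(f"SNP {snp} listed more than once")
        new_bp[snp] = bp

    # SNPs absent from the update list keep their original position.
    updated = [(c, s, cm, new_bp.get(s, bp)) for c, s, cm, bp in map_rows]

    # The row order itself is left untouched; only report a mismatch.
    order_changed = any(
        prev[0] == cur[0] and prev[3] > cur[3]
        for prev, cur in zip(updated, updated[1:])
    )
    return updated, order_changed


# Hypothetical three-SNP map on chromosome 1.
rows = [("1", "snpA", 0, 1000), ("1", "snpB", 0, 2000), ("1", "snpC", 0, 3000)]
updated, order_changed = update_positions(rows, [("snpB", 5000)])
print(updated, order_changed)
```

In this example the second SNP now lies beyond the third, so the flag is set while the rows stay in their original order, mirroring the point that the order is only corrected after saving and reloading.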
You cannot update more than one attribute at a time for SNPs.

Update allele information

To recode alleles, for example from A,B allele coding to A,C,G,T coding, use the command --update-alleles, for example.

Force a specific reference allele

It is possible to manually specify which allele is the A1 allele and which is A2. By default, the minor allele is assigned to be A1. All odds ratios, etc., are calculated with respect to the A1 allele. To set a particular allele as A1, which might not be the minor allele, use the command --reference-allele, which can be used with any other analysis or data generation command, e.
This command can make comparing results across studies easier, so that odds ratios reported can be made to be in the same direction as the other study, for example.

Update individual information

Rather than try to manually edit PED or FAM files (which is not advised), use these functions to change ID codes, sex and parental information for individuals in a fileset. The command plink --bfile mydata --update-ids recoded. Not all people need be listed in the file (they will not be changed); the order of the file need not match the original dataset.
Two similar commands (which cannot be run at the same time as --update-ids) are --update-sex myfile1. With all of these commands, you need to issue a data output command (--make-bed, --recode, etc.) for the changes to be preserved.
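The ID update can be thought of as a lookup applied to each individual, as in the Python sketch below. It assumes the mapping pairs an old (family ID, individual ID) with a new pair; the function name `update_ids`, the row layout and the sample IDs are hypothetical. The sketch changes only an individual's own IDs and leaves parental IDs alone.

```python
def update_ids(fam_rows, id_map):
    """Return FAM-style rows with family/individual IDs replaced.

    fam_rows : list of [fid, iid, father, mother, sex, phenotype]
    id_map   : {(old_fid, old_iid): (new_fid, new_iid)}
    Individuals not present in `id_map` are returned unchanged, and the
    mapping does not need to follow the order of `fam_rows`.
    """
    out = []
    for row in fam_rows:
        key = (row[0], row[1])
        new_fid, new_iid = id_map.get(key, key)
        out.append([new_fid, new_iid] + list(row[2:]))
    return out


# Hypothetical fileset with two individuals; only the second is renamed.
fam = [["F1", "I1", "0", "0", "1", "2"], ["F2", "I1", "0", "0", "2", "1"]]
renamed = update_ids(fam, {("F2", "I1"): ("F2", "I2")})
print(renamed)
```

As with the commands themselves, the result only persists if it is written out, which is why a data output step is needed afterwards.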
Write covariate files

If a covariate file is specified along with any of the above --recode options or with --make-bed, then that covariate file will also be written, as plink. This option is useful if the covariate file has a different number of individuals, or is ordered differently, to produce a set of covariate files that line up more easily with the newly-created genotype and phenotype files.
If you just want to create a revised version of the covariate file, without creating a new set of genotype files, then use the --write-covar option. To also include phenotype information in the plink. This can be useful, for example, when used in conjunction with --recodeA to generate the files needed to replicate an analysis in R.
To recode a categorical variable to a set of binary dummy variables, add the command --dummy-coding for example. A 1 5 0. Note that one level is automatically excluded (1 in this case). The command can operate on multiple covariates in a single file at the same time.
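Dummy coding of a single categorical covariate can be sketched in Python as below. The sketch assumes that the first level in sorted order is the excluded reference level and that -9 marks a missing value; both are assumptions of this illustration, as is the function name `dummy_code`.

```python
def dummy_code(values, missing="-9"):
    """Expand one categorical covariate into 0/1 dummy variables.

    Returns the retained levels and one row of dummy values per input.
    The first level (in sorted order) is dropped as the reference, and
    a missing input yields the missing code in every dummy column.
    """
    levels = sorted({v for v in values if v != missing})
    kept = levels[1:]  # exclude one level to avoid redundancy
    rows = []
    for v in values:
        if v == missing:
            rows.append([missing] * len(kept))
        else:
            rows.append(["1" if v == level else "0" for level in kept])
    return kept, rows


# Hypothetical covariate with levels 1, 2, 3 and one missing value.
kept, rows = dummy_code(["1", "2", "3", "-9", "2"])
print(kept)
print(rows)
```

A covariate with k observed levels therefore becomes k-1 binary columns, with the excluded level represented by zeros in all of them.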
Note that missing values are correctly handled.

Write cluster files

Similar to --write-covar, the --write-cluster option will output the single selected cluster from the file specified by --within. Unlike covariate files, this allows string labels to be used. However, --dummy-coding cannot currently be used with --write-cluster. To flip strand for just a subset of the sample e.
HINT When merging two datasets, it is clearly very important that the two sets of SNPs are concordant in terms of positive or negative strand. Whereas some mismatches will be easy to spot, as more than two alleles will be observed in the merged dataset, other instances will not be so easy to spot, i.

Using LD to identify incorrect strand assignment in a subset of the sample

If cases and controls have been genotyped separately and then the data merged, it is always possible that strand has been incorrectly or incompletely assigned to each SNP, meaning that the merged data may contain a number of SNPs for which the allele coding differs between cases and controls (or between any other grouping, such as collection site, etc.).
If the two mismatched groups correspond to cases and controls exactly, then rare SNPs will show a very strong association with disease. More common SNPs could show intermediate levels of association that might be easier to confuse with a real signal. A simple approach to detect some proportion of such SNPs uses differential patterns of LD in cases versus controls: For these SNP pairs, it counts the number of times the signed correlation is different in sign between cases and controls (a negative LD pair) versus the same (a positive LD pair).
For example, the command plink --bfile mydata --flip-scan produces the output file plink. In contrast, there is not a single SNP for which both cases and controls have a consistent pattern of LD. So, in this particular case, it would suggest that strand is flipped in either cases or controls. To display the specific sets of correlations in cases and controls for each SNP, add the option --flip-scan-verbose, which generates a file plink. This latter class of SNP would not cause problems of spurious association in single SNP analysis, but it could cause severe problems in haplotype and imputation analysis.
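The sign-comparison idea behind the LD scan can be illustrated with a short Python sketch. It is a simplified stand-in rather than PLINK's actual procedure: genotypes are taken as 0/1/2 allele counts, the correlation is a plain Pearson correlation, and the cut-off `min_abs_r` together with the names `pearson` and `flip_scan_counts` are assumptions of this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flip_scan_counts(index, neighbours, is_case, min_abs_r=0.5):
    """Count positive and negative LD pairs for one index SNP.

    index      : allele counts at the index SNP, one per individual
    neighbours : list of allele-count lists for nearby SNPs
    is_case    : list of booleans, True for cases
    A pair is "negative" when the signed correlation has opposite signs
    in cases and controls, and "positive" when the signs agree.
    """
    cases = [i for i, c in enumerate(is_case) if c]
    ctrls = [i for i, c in enumerate(is_case) if not c]
    pos = neg = 0
    for other in neighbours:
        r_case = pearson([index[i] for i in cases], [other[i] for i in cases])
        r_ctrl = pearson([index[i] for i in ctrls], [other[i] for i in ctrls])
        if max(abs(r_case), abs(r_ctrl)) < min_abs_r:
            continue  # ignore pairs with weak LD in both groups
        if r_case * r_ctrl < 0:
            neg += 1
        elif r_case * r_ctrl > 0:
            pos += 1
    return pos, neg


# Hypothetical data: 4 cases then 4 controls; the second neighbour is
# coded on the opposite allele in controls.
is_case = [True] * 4 + [False] * 4
index_snp = [0, 1, 2, 1, 0, 1, 2, 1]
consistent = [0, 1, 2, 1, 0, 1, 2, 1]
flipped = [0, 1, 2, 1, 2, 1, 0, 1]
counts = flip_scan_counts(index_snp, [consistent, flipped], is_case)
print(counts)
```

A SNP with many negative pairs and few positive pairs is the kind of pattern that would suggest a strand problem in one of the groups.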