|
| 1 | +# UCSC LiftOver Tool |
| 2 | + |
| 3 | +UCSC `liftOver` is a command-line tool for converting genomic coordinates between different genome assemblies using chain files. |
| 4 | + |
| 5 | +## Online Version |
| 6 | + |
| 7 | +UCSC also provides a **web-based liftOver tool** for quick conversions without installation: |
| 8 | + |
| 9 | +- **Online liftOver tool**: https://genome.ucsc.edu/cgi-bin/hgLiftOver |
| 10 | +- Upload a BED file or paste coordinates directly |
| 11 | +- Select source and target genome assemblies |
| 12 | +- Download results immediately |
| 13 | + |
| 14 | +The online tool is convenient for small-scale conversions, while the command-line tool is recommended for batch processing and automation. |
| 15 | + |
| 16 | +## Installation |
| 17 | + |
| 18 | +Download the `liftOver` binary from UCSC: |
| 19 | + |
| 20 | +- **Linux**: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver |
| 21 | +- **macOS**: http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/liftOver |
| 22 | + |
| 23 | +Make it executable: |
| 24 | +```bash |
| 25 | +chmod +x liftOver |
| 26 | +``` |
| 27 | + |
| 28 | +## Download Chain Files |
| 29 | + |
| 30 | +Download chain files from UCSC for your desired conversion (e.g., hg19 → hg38): |
| 31 | + |
| 32 | +```bash |
| 33 | +wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz |
| 34 | +``` |
| 35 | + |
| 36 | +Common chain files: |
| 37 | + |
| 38 | +- `hg19ToHg38.over.chain.gz` (hg19 → hg38) |
| 39 | +- `hg38ToHg19.over.chain.gz` (hg38 → hg19) |
| 40 | +- `hg18ToHg19.over.chain.gz` (hg18 → hg19) |
| 41 | + |
| 42 | +## Basic Usage |
| 43 | + |
| 44 | +```bash |
| 45 | +liftOver input.bed chain_file.chain.gz output.bed unmapped.bed |
| 46 | +``` |
| 47 | + |
| 48 | +**Arguments:** |
| 49 | + |
| 50 | +- `input.bed`: Input file in BED format (0-based, half-open intervals) |
| 51 | +- `chain_file.chain.gz`: Chain file for the conversion |
| 52 | +- `output.bed`: Successfully lifted coordinates |
| 53 | +- `unmapped.bed`: Failed coordinates with failure reasons |
| 54 | + |
| 55 | +## Input Format (BED) |
| 56 | + |
| 57 | +BED format uses **0-based, half-open intervals**: |
| 58 | + |
| 59 | +```text |
| 60 | +chr1 1000 1001 rs123 |
| 61 | +chr1 2000 2001 rs456 |
| 62 | +chr2 5000 5001 rs789 |
| 63 | +``` |
| 64 | + |
| 65 | +Columns: |
| 66 | +1. Chromosome name |
| 67 | +2. Start position (0-based) |
| 68 | +3. End position (0-based, exclusive) |
| 69 | +4. Name/ID (optional, but useful for tracking) |
| 70 | + |
| 71 | +!!! warning "Coordinate System" |
| 72 | + BED format is **0-based**. If your coordinates are **1-based** (e.g., from VCF or summary statistics files), convert them first: |
| 73 | + - 1-based position `N` → BED start: `N-1`, BED end: `N` |
| 74 | + |
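The conversion rule above can be sketched with `awk`. This is a minimal sketch, assuming a hypothetical tab-separated input file (`positions_1based.tsv`) with columns chromosome, 1-based position, and variant ID, and chromosome names without the `chr` prefix; adjust the column numbers to match your own file.

```bash
# Hypothetical input: chromosome, 1-based position, variant ID (tab-separated)
printf '1\t1000001\trs123\n1\t2000001\trs456\n2\t5000001\trs789\n' > positions_1based.tsv

# BED start = position - 1, BED end = position.
# The "chr" prefix is added because UCSC chain files use names such as chr1.
awk 'BEGIN {OFS="\t"} {print "chr" $1, $2 - 1, $2, $3}' positions_1based.tsv > positions.bed

cat positions.bed
```

The resulting `positions.bed` has the same four-column layout as the input example in this section and can be passed to `liftOver` directly.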
| 75 | +## Common Options |
| 76 | + |
| 77 | +```bash |
| 78 | +liftOver -minMatch=0.95 input.bed chain_file.chain.gz output.bed unmapped.bed |
| 79 | +``` |
| 80 | + |
| 81 | +- `-minMatch=0.95`: Minimum ratio of bases in an interval that must remap (default: 0.95) |
| 82 | +- `-multiple`: Allow an input region to map to multiple output regions (default: ambiguous mappings are not lifted) |
| 83 | + |
| 84 | +!!! example "Example" |
| 85 | + Convert SNP positions from hg19 to hg38: |
| 86 | + |
| 87 | + ```bash |
| 88 | + # Create input BED file (0-based) |
| 89 | + cat > snps_hg19.bed << EOF |
| 90 | + chr1 1000000 1000001 rs123 |
| 91 | + chr1 2000000 2000001 rs456 |
| 92 | + chr2 5000000 5000001 rs789 |
| 93 | + EOF |
| 94 | + |
| 95 | + # Run liftover |
| 96 | + liftOver snps_hg19.bed hg19ToHg38.over.chain.gz snps_hg38.bed snps_unmapped.bed |
| 97 | + |
| 98 | + # Check results |
| 99 | + echo "Successfully lifted:" |
| 100 | + wc -l snps_hg38.bed |
| 101 | + |
| 102 | + echo "Failed (excluding '#' reason lines):" |
| 103 | + grep -vc '^#' snps_unmapped.bed |
| 104 | + ``` |
| 105 | + |
| 106 | +!!! example "Simple Example: Liftover chr1 from BIM File" |
| 107 | + This example demonstrates how to extract chromosome 1 positions from a PLINK BIM file and convert them from hg19 to hg38: |
| 108 | + |
| 109 | + ```bash |
| 110 | + # Extract chr1 positions from BIM file and convert to BED format |
| 111 | + # BIM format: chr variant_id genetic_distance position(1-based) allele1 allele2 |
| 112 | + # BED format: chr start(0-based) end(exclusive, equals the 1-based position) variant_id |
| 113 | + |
| 114 | + awk '$1==1 {print "chr1\t" ($4-1) "\t" $4 "\t" $2}' \ |
| 115 | + 01_Dataset/1KG.EAS.auto.snp.norm.nodup.split.rare002.common015.missing.bim \ |
| 116 | + > chr1_hg19.bed |
| 117 | + |
| 118 | + # Run liftover |
| 119 | + liftOver chr1_hg19.bed hg19ToHg38.over.chain.gz chr1_hg38.bed chr1_unmapped.bed |
| 120 | + |
| 121 | + # Check results |
| 122 | + echo "Total input positions:" |
| 123 | + wc -l chr1_hg19.bed |
| 124 | + |
| 125 | + echo "Successfully lifted:" |
| 126 | + wc -l chr1_hg38.bed |
| 127 | + |
| 128 | + echo "Failed (excluding '#' reason lines):" |
| 129 | + grep -vc '^#' chr1_unmapped.bed |
| 130 | + |
| 131 | + # View first few successfully lifted positions |
| 132 | + echo "First 5 lifted positions:" |
| 133 | + head -5 chr1_hg38.bed |
| 134 | + ``` |
| 135 | + |
| 136 | + ``` |
| 137 | + ========================================== |
| 138 | + Liftover Example: chr1 from BIM file |
| 139 | + ========================================== |
| 140 | + |
| 141 | + Step 1: Extracting chr1 positions from BIM file... |
| 142 | + Input: ../01_Dataset/1KG.EAS.auto.snp.norm.nodup.split.rare002.common015.missing.bim |
| 143 | + Converting 1-based BIM coordinates to 0-based BED format... |
| 144 | + Extracted 97655 chr1 positions |
| 145 | + Output: chr1_hg19.bed |
| 146 | + |
| 147 | + Step 2: Running liftover (hg19 → hg38)... |
| 148 | + Chain file: hg19ToHg38.over.chain.gz |
| 149 | + This may take a few minutes... |
| 150 | + Reading liftover chains |
| 151 | + Mapping coordinates |
| 152 | + Liftover completed |
| 153 | + |
| 154 | + Step 3: Results summary |
| 155 | + ========================================== |
| 156 | + Total input positions: 97655 |
| 157 | + Successfully lifted: 97526 |
| 158 | + Success rate: 99.87% |
| 159 | + Failed: 258 |
| 160 | + |
| 161 | + First 5 failed positions: |
| 162 | + #Deleted in new |
| 163 | + chr1 1590525 1590526 1:1590526:G:C |
| 164 | + #Deleted in new |
| 165 | + chr1 1590574 1590575 1:1590575:G:A |
| 166 | + #Deleted in new |
| 167 | + ========================================== |
| 168 | + |
| 169 | + Step 4: Example lifted positions (first 5): |
| 170 | + Format: chr start(0-based) end(0-based) variant_id |
| 171 | + chr1 14929 14930 1:14930:A:G |
| 172 | + chr1 15773 15774 1:15774:G:A |
| 173 | + chr1 15776 15777 1:15777:A:G |
| 174 | + chr1 57291 57292 1:57292:C:T |
| 175 | + chr1 77873 77874 1:77874:G:A |
| 176 | + |
| 177 | + Output files: |
| 178 | + Input BED (hg19): chr1_hg19.bed |
| 179 | + Output BED (hg38): chr1_hg38.bed |
| 180 | + Unmapped positions: chr1_unmapped.bed |
| 181 | + ``` |
| 182 | + |
| 183 | + **Key points:** |
| 184 | + |
| 185 | + - BIM files use **1-based coordinates**, so we subtract 1 to convert to 0-based BED format |
| 186 | + - The BED end position is `position` (same as start+1 for single-base variants) |
| 187 | + - The variant ID from column 2 is preserved in the BED file for tracking |
| 188 | + |
| 189 | + See `liftover_chr1_example.sh` for a complete script that performs this conversion. |
| 190 | + |
| 191 | +## Output Files |
| 192 | + |
| 193 | +- **`output.bed`**: Contains successfully lifted coordinates in the target assembly |
| 194 | +- **`unmapped.bed`**: Contains failed coordinates, each preceded by a `#` comment line giving the reason (e.g., "Deleted in new", "Partially deleted in new", "Split in new") |
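To summarize the unmapped file, the failed records and their reasons can be counted separately. The following is a minimal sketch; `unmapped_sample.bed` is a hypothetical stand-in built from two of the failed records shown in the example output above, and the tab-separated layout is an assumption.

```bash
# Sample unmapped file: each failed record is preceded by a '#' reason line
printf '#Deleted in new\nchr1\t1590525\t1590526\t1:1590526:G:C\n#Deleted in new\nchr1\t1590574\t1590575\t1:1590575:G:A\n' > unmapped_sample.bed

# Count failed records, skipping the '#' reason lines
failed=$(grep -vc '^#' unmapped_sample.bed)
echo "Failed records: $failed"

# Tally how often each failure reason occurs, most frequent first
grep '^#' unmapped_sample.bed | sort | uniq -c | sort -rn
```

Counting only the non-comment lines avoids the doubling that a plain `wc -l` produces on this file.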
| 195 | + |
| 196 | +## Tips |
| 197 | + |
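| 198 | +- Always check the `unmapped.bed` file to see which positions failed and why. Each failed record is preceded by a `#` reason line, so `wc -l` counts two lines per failure (in the chr1 example, 258 lines correspond to 97655 - 97526 = 129 failed positions) |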
| 199 | +- For sumstats, convert 1-based positions to 0-based BED format before liftover |
| 200 | +- After liftover, convert back to 1-based if needed for downstream analysis |
| 201 | +- Some positions may fail due to assembly differences (centromeres, gaps, duplications); this is expected |
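Converting lifted records back to 1-based positions can be sketched as follows. This assumes single-base records (end = start + 1), for which the 1-based position equals the BED end column; `lifted_sample.bed` and `lifted_1based.tsv` are hypothetical file names, and the sample rows are taken from the example output earlier in this document.

```bash
# Sample lifted records; in practice this would be the liftOver output file
printf 'chr1\t14929\t14930\t1:14930:A:G\nchr1\t15773\t15774\t1:15774:G:A\n' > lifted_sample.bed

# Write variant ID, chromosome (without the "chr" prefix), and 1-based position.
# For single-base records the 1-based position is the BED end column ($3).
awk 'BEGIN {OFS="\t"; print "variant_id", "chr", "pos_hg38"}
     {sub(/^chr/, "", $1); print $4, $1, $3}' lifted_sample.bed > lifted_1based.tsv

cat lifted_1based.tsv
```

Note that the variant IDs still embed the original hg19 position, so they are useful as join keys but should not be read as hg38 coordinates.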
| 202 | + |
| 203 | +## References |
| 204 | + |
| 205 | +- [UCSC LiftOver web tool](https://genome.ucsc.edu/cgi-bin/hgLiftOver): converts coordinates between genome assemblies |
| 206 | +- [UCSC `liftOver` command-line binaries](http://hgdownload.soe.ucsc.edu/admin/exe/): download and documentation |