GMOD/bam-js

Install

$ npm install --save @gmod/bam

Usage

import { BamFile } from '@gmod/bam'

const t = new BamFile({
  bamPath: 'test.bam',
})

// note: it's required to first run getHeader before any getRecordsForRange
const header = await t.getHeader()

// this gets the same records as samtools view ctgA:1-50000
const records = await t.getRecordsForRange('ctgA', 0, 50000)

The bamPath argument only works in Node.js. In the browser, pass bamFilehandle with a generic-filehandle2 filehandle, e.g. RemoteFile:

import { RemoteFile } from 'generic-filehandle2'
import { BamFile } from '@gmod/bam'

const bam = new BamFile({
  bamFilehandle: new RemoteFile('yourfile.bam'), // or a full http url
  baiFilehandle: new RemoteFile('yourfile.bam.bai'), // or a full http url
})

Inputs are 0-based half-open coordinates (note: not the same as samtools view coordinate inputs, which are 1-based and inclusive!)
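To make the coordinate difference concrete, here is a minimal sketch of converting a samtools-style region string into the 0-based half-open arguments that getRecordsForRange expects. parseSamtoolsRegion and HalfOpenRegion are hypothetical names for illustration and are not part of @gmod/bam; the sketch only handles the full "ref:start-end" form.

```typescript
// Hypothetical helper, not part of @gmod/bam.
interface HalfOpenRegion {
  refName: string
  start: number // 0-based, inclusive
  end: number // 0-based, exclusive
}

function parseSamtoolsRegion(region: string): HalfOpenRegion {
  // Split on the last colon so reference names that contain ':' still parse
  const idx = region.lastIndexOf(':')
  if (idx === -1) {
    throw new Error(`expected "ref:start-end", got "${region}"`)
  }
  const refName = region.slice(0, idx)
  const match = /^([\d,]+)-([\d,]+)$/.exec(region.slice(idx + 1))
  if (!match) {
    throw new Error(`expected "ref:start-end", got "${region}"`)
  }
  const [, startText = '', endText = ''] = match
  // samtools accepts thousands separators such as 1,000-2,000
  const oneBasedStart = Number(startText.replace(/,/g, ''))
  const oneBasedEnd = Number(endText.replace(/,/g, ''))
  if (oneBasedStart < 1 || oneBasedEnd < oneBasedStart) {
    throw new Error(`invalid range in "${region}"`)
  }
  // 1-based inclusive [s, e] covers the same bases as 0-based half-open [s - 1, e)
  return { refName, start: oneBasedStart - 1, end: oneBasedEnd }
}
```

With this helper, parseSamtoolsRegion('ctgA:1-50000') gives refName 'ctgA', start 0 and end 50000, matching the getRecordsForRange('ctgA', 0, 50000) call in the usage example.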

Usage with htsget

Since 1.0.41, the htsget protocol is supported. Here is a small code snippet:

import { HtsgetFile } from '@gmod/bam'

const ti = new HtsgetFile({
  baseUrl: 'http://htsnexus.rnd.dnanex.us/v1/reads',
  trackId: 'BroadHiSeqX_b37/NA12878',
})
await ti.getHeader()
const records = await ti.getRecordsForRange('1', 2000000, 2000001)

Let us know if it doesn't work for your use case.

Documentation

BamFile constructor

The BamFile class constructor accepts the following arguments:

  • bamPath/bamUrl/bamFilehandle - a local file path, remote URL string, or a class object with a read method
  • csiPath/csiUrl/csiFilehandle - a CSI index for the BAM file, required for chromosomes longer than 2^29 bp
  • baiPath/baiUrl/baiFilehandle - a BAI index for the BAM file
  • recordClass - a custom class extending BamRecord to use for records (see Custom BamRecord class section below)

Note: filehandles implement the Filehandle interface from generic-filehandle2. The path and url arguments are convenience wrappers for LocalFile and RemoteFile.

async getRecordsForRange(refName, start, end, opts)

Note: requires calling getHeader first.

  • refName - a string for the chrom to fetch from
  • start - a 0-based half-open start coordinate
  • end - a 0-based half-open end coordinate
  • opts.signal - an AbortSignal used to stop processing
  • opts.viewAsPairs - re-dispatches requests to find mate pairs. default: false
  • opts.pairAcrossChr - control the viewAsPairs option behavior to pair across chromosomes. default: false
  • opts.maxInsertSize - control the viewAsPairs option behavior to limit distance within a chromosome to fetch. default: 200kb

async getHeader(opts?)

Fetches the header from BamFile or HtsgetFile. Must be called before getRecordsForRange.

async indexCov(refName, start, end)

  • refName - a string for the chrom to fetch from
  • start - a 0-based half-open start coordinate (optional)
  • end - a 0-based half-open end coordinate (optional)

Returns features of the form {start, end, score}, where score is the estimated feature density for each 16kb window in the genome.
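A small sketch of post-processing features of this shape: it computes the mean score and picks the densest window. summarizeCoverage and CoverageFeature are hypothetical names for illustration; only the {start, end, score} shape is taken from the description above, and the input values in the usage note are made up.

```typescript
// Hypothetical helper, not part of @gmod/bam.
interface CoverageFeature {
  start: number
  end: number
  score: number
}

interface CoverageSummary {
  meanScore: number
  densest: CoverageFeature
}

function summarizeCoverage(
  features: CoverageFeature[],
): CoverageSummary | undefined {
  const first = features[0]
  if (!first) {
    return undefined // nothing to summarize
  }
  let densest = first
  let total = 0
  for (const feature of features) {
    total += feature.score
    if (feature.score > densest.score) {
      densest = feature
    }
  }
  return { meanScore: total / features.length, densest }
}
```

For example, passing three windows with scores 2, 6 and 1 yields a meanScore of 3 and the second window as densest.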

async lineCount(refName: string)

  • refName - a string for the chrom to fetch from

Returns the number of features on refName, or 0 if refName does not exist in the sample. The count is read from the special pseudo-bin in the BAI/CSI index (e.g. bin 37450 in a BAI, which stores n_mapped as described in the SAM spec PDF).
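The pseudo-bin number 37450 and the 2^29 length limit both follow from the binning scheme in the SAM specification, assuming the standard BAI parameters (min_shift of 14 and a depth of 5). The sketch below reproduces that arithmetic; binCount, pseudoBin and maxRefLength are hypothetical helper names and are not part of @gmod/bam.

```typescript
// Hypothetical helpers, not part of @gmod/bam.

// Regular bins in an index with `depth` levels below the root:
// 1 + 8 + 8^2 + ... + 8^depth = (8^(depth + 1) - 1) / 7
function binCount(depth: number): number {
  return (2 ** (3 * (depth + 1)) - 1) / 7
}

// The metadata pseudo-bin is numbered one past the bin count
function pseudoBin(depth: number): number {
  return binCount(depth) + 1
}

// Longest reference the index can address: the smallest bins span
// 2^minShift bases and each level up is 8 times wider
function maxRefLength(minShift: number, depth: number): number {
  return 2 ** (minShift + 3 * depth)
}

// BAI is fixed at minShift = 14, depth = 5
console.log(pseudoBin(5)) // 37450
console.log(maxRefLength(14, 5)) // 536870912, i.e. 2^29
```

A CSI index stores its own min_shift and depth, which is why it can cover references longer than 2^29 bp.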

async hasRefSeq(refName: string)

  • refName - a string for the chrom to check

Returns whether refName is present in the sample.

BamRecord properties

// Core alignment fields
record.fileOffset // "file offset" based id -- not a true file offset
record.ref_id // numerical sequence id from SAM header
record.start // 0-based start coordinate
record.end // 0-based end coordinate
record.name // QNAME
record.seq // sequence string
record.qual // Uint8Array of quality scores (null if unmapped)
record.CIGAR // CIGAR string e.g. "50M2I48M"
record.flags // SAM flags integer
record.mq // mapping quality (undefined if 255)
record.strand // 1 or -1
record.template_length // TLEN

// Auxiliary data
record.tags // object with all aux tags e.g. {MD: "100", NM: 0}
record.getTag('MD') // get a single tag (more efficient than record.tags when you only need one)
record.getTagRaw('MD') // get tag as Uint8Array for string tags (avoids string conversion)
record.NUMERIC_MD // MD tag as Uint8Array (for fast mismatch rendering)
record.NUMERIC_CIGAR // Uint32Array of packed CIGAR operations
record.NUMERIC_SEQ // Uint8Array of packed sequence (4-bit encoded)

// Mate info
record.next_refid // mate reference id
record.next_pos // mate position

// Flag methods
record.isPaired()
record.isProperlyPaired()
record.isSegmentUnmapped()
record.isMateUnmapped()
record.isReverseComplemented()
record.isMateReverseComplemented()
record.isRead1()
record.isRead2()
record.isSecondary()
record.isFailedQc()
record.isDuplicate()
record.isSupplementary()

// Utility
record.seqAt(idx) // get single base at position
record.toJSON() // serialize record
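The flag methods above correspond to bits of the SAM flags integer. As a standalone sketch, the bit values below are the ones defined in the SAM specification; decodeFlags and SAM_FLAGS are hypothetical names and are not part of @gmod/bam, and the key names are chosen here only to echo the method names.

```typescript
// Hypothetical helper, not part of @gmod/bam. Bit values follow the SAM spec.
const SAM_FLAGS = {
  paired: 0x1,
  properlyPaired: 0x2,
  segmentUnmapped: 0x4,
  mateUnmapped: 0x8,
  reverseComplemented: 0x10,
  mateReverseComplemented: 0x20,
  read1: 0x40,
  read2: 0x80,
  secondary: 0x100,
  failedQc: 0x200,
  duplicate: 0x400,
  supplementary: 0x800,
} as const

type FlagName = keyof typeof SAM_FLAGS

// List the names of all bits that are set in a flags integer
function decodeFlags(flags: number): FlagName[] {
  return (Object.keys(SAM_FLAGS) as FlagName[]).filter(
    name => (flags & SAM_FLAGS[name]) !== 0,
  )
}

console.log(decodeFlags(99))
// [ 'paired', 'properlyPaired', 'mateReverseComplemented', 'read1' ]
```

A value such as record.flags could be passed to decodeFlags to get a readable summary, for instance when logging records during debugging.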

Custom BamRecord class

You can provide your own BamRecord class to add custom properties or methods:

import { BamFile, BamRecord } from '@gmod/bam'

class CustomBamRecord extends BamRecord {
  get customProperty() {
    return `custom-${this.name}`
  }

  getDoubleStart() {
    return this.start * 2
  }
}

const bam = new BamFile<CustomBamRecord>({
  bamPath: 'test.bam',
  recordClass: CustomBamRecord,
})

await bam.getHeader()
const records = await bam.getRecordsForRange('ctgA', 0, 50000)
// records are typed as CustomBamRecord[]
console.log(records[0].customProperty)
console.log(records[0].getDoubleStart())

License

MIT © Colin Diesh

Publishing

Trusted publishing via GitHub Actions.

npm version patch  # or minor/major

About

Parse BAM and BAM index files in javascript for node and the browser
