@gmod/tabix

GMOD | 10.4k | MIT | 3.1.0 | TypeScript support: included

Read Tabix-indexed files, supports both .tbi and .csi indexes

bionode, biojs, genomics

readme

@gmod/tabix

Read Tabix-indexed files using either .tbi or .csi indexes.

Install

$ npm install --save @gmod/tabix

Usage

Importing the module

// import with require in node.js
const { TabixIndexedFile } = require('@gmod/tabix')

// or with es6 imports, this will also give typescript types
import { TabixIndexedFile } from '@gmod/tabix'

Single file bundle

You can also use tabix-js without NPM by loading tabix-bundle.js with a script tag. See example/index.html in the example directory for usage.

<script src="https://unpkg.com/@gmod/tabix/dist/tabix-bundle.js"></script>

TabixIndexedFile constructor

Basic usage of TabixIndexedFile under node.js supplies a path and, optionally, a tbiPath to the constructor. If no tbiPath is supplied, the index is assumed to be located at path+'.tbi'.

// basic usage under node.js: provide the filesystem path to a bgzipped file
// the index is assumed to be at path+'.tbi' if no tbiPath is supplied
const tbiIndexed = new TabixIndexedFile({
  path: 'path/to/my/file.gz',
  tbiPath: 'path/to/my/file.gz.tbi',
})

You can also use CSI indexes. The example below also uses the renameRefSeqs callback, which lets you call file.getLines('1',0,100,...) even when the file itself contains names like 'chr1'. The reverse mapping is possible by customizing the callback.

// can also open tabix files that have a .csi index
// note also usage of renameRefSeqs callback to trim chr off the chr names
const csiIndexed = new TabixIndexedFile({
  path: 'path/to/my/file.gz',
  csiPath: 'path/to/my/file.gz.csi',
  renameRefSeqs: refSeq => refSeq.replace('chr', ''),
})
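The inline callback above uses replace('chr', ''), which removes the first occurrence of 'chr' wherever it appears in a name. A slightly stricter sketch that anchors the match to the start of the name is shown below, together with the reverse mapping. The helper names stripChrPrefix and addChrPrefix are illustrative and are not part of @gmod/tabix.

```javascript
// Hypothetical helpers; only the renameRefSeqs option itself comes from the library.

// Remove a leading 'chr' so that a file containing 'chr1' can be queried as '1'.
const stripChrPrefix = refSeq => refSeq.replace(/^chr/, '')

// Reverse direction: add 'chr' unless the name already starts with it.
const addChrPrefix = refSeq =>
  refSeq.startsWith('chr') ? refSeq : `chr${refSeq}`

console.log(stripChrPrefix('chr1')) // 1
console.log(addChrPrefix('1')) // chr1
```

Either function could then be passed to the constructor as renameRefSeqs: stripChrPrefix.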

TabixIndexedFile constructor with remote files

const remoteTbiIndexed = new TabixIndexedFile({
  url: 'http://yourhost/file.vcf.gz',
  tbiUrl: 'http://yourhost/file.vcf.gz.tbi', // can also be csiUrl
})

Alternatively, you can supply a filehandle-like object, for example one from generic-filehandle2:

// use a remote file or other filehandle, note RemoteFile comes from https://github.com/GMOD/generic-filehandle2
const { RemoteFile } = require('generic-filehandle2')
const remoteTbiIndexed = new TabixIndexedFile({
  filehandle: new RemoteFile('http://yourhost/file.vcf.gz'),
  tbiFilehandle: new RemoteFile('http://yourhost/file.vcf.gz.tbi'), // can also be csiFilehandle
})

This works in both the browser and node.js. In node.js you may also have to supply a custom fetch function to the RemoteFile constructor, for example:

// for node.js you have to manually supply a fetch function e.g. node-fetch to RemoteFile
const fetch = require('node-fetch')
const remoteTbiIndexedForNodeJs = new TabixIndexedFile({
  filehandle: new RemoteFile('http://yourhost/file.vcf.gz', { fetch }),
  tbiFilehandle: new RemoteFile('http://yourhost/file.vcf.gz.tbi', { fetch }), // can also be csiFilehandle
})

getLines

The basic function this module provides is getLines. It reads text from the tabix file (decompressing the bgzipped data) and passes it, one line at a time, to a callback that you provide.

Important: the start and end values supplied to getLines are 0-based half-open coordinates. This differs from the 1-based values supplied to the tabix command line tool.

// iterate over lines in the specified region
const lines = []
await tbiIndexed.getLines('ctgA', 200, 300, function (line, fileOffset) {
  lines.push(line)
})

After running this, the lines array contains the lines from the file that match your query range.

You can also pass an options object to getLines in place of the callback. The extra options are somewhat obscure and only needed in some circumstances.
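Because getLines takes 0-based half-open coordinates while the tabix command line tool takes 1-based values, converting a command-line style region can be a source of off-by-one errors. The following is a minimal sketch of that conversion. The function name regionToHalfOpen is hypothetical, and it assumes a plain 'name:start-end' region string with no thousands separators.

```javascript
// Hypothetical helper, not part of @gmod/tabix.
// Converts a 1-based, end-inclusive region such as 'ctgA:201-300' into the
// 0-based half-open values that getLines expects.
function regionToHalfOpen(region) {
  const match = /^(.+):(\d+)-(\d+)$/.exec(region)
  if (!match) {
    throw new Error(`unrecognized region: ${region}`)
  }
  const [, refName, start1, end1] = match
  return {
    refName,
    start: Number(start1) - 1, // 1-based inclusive start -> 0-based start
    end: Number(end1), // 1-based inclusive end equals the 0-based exclusive end
  }
}

console.log(regionToHalfOpen('ctgA:201-300'))
// { refName: 'ctgA', start: 200, end: 300 }
```

The result could then be passed on as getLines(refName, start, end, callback).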

const lines = []
const aborter = new AbortController()
await tbiIndexed.getLines('ctgA', 200, 300, {
  lineCallback: (line, fileOffset) => lines.push(line),
  signal: aborter.signal, // an optional AbortSignal from an AbortController
})

After running the above, lines is an array of strings containing the lines from the tabix file.
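One use for the signal option is to give up on a slow read. The sketch below wraps getLines with a timeout. The helper name getLinesWithTimeout is hypothetical, and it assumes that an aborted read causes the promise returned by getLines to reject, which the helper simply lets propagate to the caller.

```javascript
// Hypothetical helper, not part of @gmod/tabix.
// `file` is any object with a getLines(refName, start, end, opts) method,
// such as a TabixIndexedFile.
async function getLinesWithTimeout(file, refName, start, end, timeoutMs) {
  const aborter = new AbortController()
  // Abort the read if it has not finished within timeoutMs milliseconds.
  const timer = setTimeout(() => aborter.abort(), timeoutMs)
  const lines = []
  try {
    await file.getLines(refName, start, end, {
      lineCallback: line => lines.push(line),
      signal: aborter.signal,
    })
    return lines
  } finally {
    // Always clear the timer, whether the read succeeded or failed.
    clearTimeout(timer)
  }
}
```

A call might look like await getLinesWithTimeout(tbiIndexed, 'ctgA', 200, 300, 5000).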

Notes about the returned values of getLines:

  • commented (meta) lines are skipped.
  • line strings do not include any trailing whitespace characters.
  • the callback is also called with a fileOffset that can be used to uniquely identify lines based on their virtual file offset where the line is found in the file
  • if getLines is called with an undefined end parameter it gets all lines from start going to the end of the contig e.g.
const lines = []
await tbiIndexed.getLines('ctgA', 0, undefined, line => lines.push(line))
console.log(lines)
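Since each line arrives as a plain string along with a fileOffset that identifies it, a callback can split the line into columns and store it under that offset. The following is a minimal sketch of this idea. The name makeRecordCollector is hypothetical, and it assumes tab-separated columns.

```javascript
// Hypothetical helper, not part of @gmod/tabix.
// Builds a getLines callback that splits each line into tab-separated columns
// and stores the result in a Map keyed by the line's fileOffset.
function makeRecordCollector() {
  const records = new Map()
  const lineCallback = (line, fileOffset) => {
    records.set(fileOffset, line.split('\t'))
  }
  return { records, lineCallback }
}

const { records, lineCallback } = makeRecordCollector()
lineCallback('ctgA\t250\t260\tfeature1', 1000)
lineCallback('ctgA\t250\t260\tfeature1', 1000) // same line seen again
console.log(records.size) // 1
```

Keying on fileOffset means that a line returned by two overlapping queries is stored only once.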

API (auto-generated)

TabixIndexedFile

Table of Contents

constructor

Parameters
  • args object
    • args.path string?
    • args.filehandle filehandle?
    • args.url url?
    • args.tbiPath string?
    • args.tbiUrl tbiUrl?
    • args.tbiFilehandle filehandle?
    • args.csiPath string?
    • args.csiUrl csiUrl?
    • args.csiFilehandle filehandle?
    • args.yieldTime number? yield to main thread after N milliseconds if reading features is taking a long time to avoid hanging main thread (optional, default 500)
    • args.renameRefSeqs function? optional function with sig string => string to transform reference sequence names for the purpose of indexing and querying. note that the data that is returned is not altered, just the names of the reference sequences that are used for querying. (optional, default n=>n)
    • args.chunkCacheSize (optional, default 5*2**20)

getLines

Parameters
  • refName string name of the reference sequence
  • s (number | undefined) start of the region (in 0-based half-open coordinates)
  • e (number | undefined) end of the region (in 0-based half-open coordinates)
  • opts (GetLinesOpts | GetLinesCallback) callback called for each line in the region. Can also be an object containing obj.lineCallback, obj.signal, etc.

Returns any promise that is resolved when the whole read is finished, rejected on error

getHeaderBuffer

get a buffer containing the "header" region of the file, which are the bytes up to the first non-meta line

Parameters
  • opts Options (optional, default {})

getHeader

get a string containing the "header" region of the file, which is the portion up to the first non-meta line

Parameters
  • opts Options (optional, default {})

Returns Promise for a string
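As one illustration of working with the header string, the sketch below pulls the sample names out of a VCF header. This assumes the indexed file is a VCF; the function name vcfSampleNames is hypothetical and not part of @gmod/tabix.

```javascript
// Hypothetical helper, not part of @gmod/tabix.
// Lists the sample names from a VCF header string, such as the text that
// getHeader() resolves to for a VCF file.
function vcfSampleNames(headerText) {
  const columnLine = headerText
    .split(/\r?\n/)
    .find(line => line.startsWith('#CHROM'))
  if (!columnLine) {
    return []
  }
  // The first nine columns are fixed (CHROM through FORMAT); samples follow.
  return columnLine.split('\t').slice(9)
}

const exampleHeader =
  '##fileformat=VCFv4.2\n' +
  '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA001\tNA002\n'
console.log(vcfSampleNames(exampleHeader)) // [ 'NA001', 'NA002' ]
```

With a real file this might be called as vcfSampleNames(await tbiIndexed.getHeader()).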

getReferenceSequenceNames

get an array of reference sequence names, in the order in which they occur in the file. reference sequence renaming is not applied to these names.

Parameters
  • opts Options (optional, default {})

checkLine

Parameters
  • metadata object metadata object from the parsed index, containing columnNumbers, metaChar, and format
  • regionRefName string
  • regionStart number region start coordinate (0-based-half-open)
  • regionEnd number region end coordinate (0-based-half-open)
  • line string

Returns object like {startCoordinate, overlaps}. overlaps is boolean, true if line is a data line that overlaps the given region
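For readers unfamiliar with half-open coordinates, the overlap test between a feature and a query region can be sketched as follows. This is only an illustration of the interval arithmetic, not the library's implementation of checkLine, and the function name halfOpenOverlaps is hypothetical.

```javascript
// Illustrative only; not the implementation used by checkLine.
// Two half-open intervals [aStart, aEnd) and [bStart, bEnd) overlap when each
// one starts before the other ends.
function halfOpenOverlaps(aStart, aEnd, bStart, bEnd) {
  return aStart < bEnd && bStart < aEnd
}

console.log(halfOpenOverlaps(200, 300, 299, 350)) // true
console.log(halfOpenOverlaps(200, 300, 300, 350)) // false
```

The second call returns false because the end coordinate of a half-open interval is excluded, so a feature starting at 300 does not touch the region [200, 300).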

lineCount

return the approximate number of data lines in the given reference sequence

Parameters
  • refName string
  • opts Options (optional, default {})
  • refSeq reference sequence name

Returns any number of data lines present on that reference sequence
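Combining lineCount with getReferenceSequenceNames gives a quick per-sequence summary of a file. The following is a minimal sketch. The helper name lineCountsByRef is hypothetical, and it assumes the default renameRefSeqs, so that the names reported by the file can be passed straight back to lineCount.

```javascript
// Hypothetical helper, not part of @gmod/tabix.
// `file` is any object with getReferenceSequenceNames() and lineCount(refName)
// methods, such as a TabixIndexedFile using the default renameRefSeqs.
async function lineCountsByRef(file) {
  const counts = {}
  const refNames = await file.getReferenceSequenceNames()
  for (const refName of refNames) {
    // lineCount is documented as approximate, so treat these as estimates.
    counts[refName] = await file.lineCount(refName)
  }
  return counts
}
```

A call might look like console.log(await lineCountsByRef(tbiIndexed)).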

readChunk

read and uncompress the data in a chunk (composed of one or more contiguous bgzip blocks) of the file

Parameters
  • c Chunk
  • opts Options (optional, default {})

Academic Use

This package was written with funding from the NHGRI as part of the JBrowse project. If you use it in an academic project that you publish, please cite the most recent JBrowse paper, which will be linked from jbrowse.org.

License

MIT © Robert Buels

changelog

3.1.0 (2025-10-01)

3.0.5 (2025-05-26)

3.0.4 (2025-05-13)

3.0.3 (2025-05-13)

3.0.2 (2025-04-30)

3.0.1 (2025-04-30)

3.0.0 (2025-04-30)

2.0.5 (2025-03-18)

2.0.4 (2024-12-18)

2.0.3 (2024-12-18)

2.0.2 (2024-12-12)

2.0.0 (2024-12-12)

1.6.1 (2024-12-07)

1.6.0 (2024-11-30)

1.5.15 (2024-08-30)

1.5.14 (2024-07-23)

Reverts

  • Revert "Bump to eslint 9" (9bd49b1)

1.5.13 (2024-01-09)

  • Another fix for abort signal in getLines

1.5.12 (2024-01-09)

  • Add missing abort signal to the @gmod/abortable-promise-cache fetch for tabix chunks (#143)

1.5.11 (2023-07-10)

Features

  • explicit buffer import (#140) (fb80ac8)

  • Add explicit buffer import

1.5.10 (2023-03-30)

  • Remove stray console.log

1.5.9 (2023-03-27)

  • Revert the Buffer::slice -> Buffer::subarray change due to use with polyfills

1.5.8 (2023-03-24)

  • Make yieldTime optional

1.5.7 (2023-03-24)

  • Add yieldTime parameter
  • Improve typescripting

1.5.6 (2023-02-28)

  • Add fix for fileOffset being stable in presence of Unicode characters (#137)

1.5.5 (2022-12-17)

  • Use es2015 for nodejs build

1.5.4 (2022-07-18)

  • Bump generic-filehandle 2->3

1.5.3 (2022-04-25)

  • Fix esm module build to use ESM instead of CJS

1.5.2 (2021-12-15)

  • Change typescript signature of lineCallback from Promise<void> to void

1.5.1 (2021-12-15)

  • Add esm module with less babelification for smaller bundle size

1.5.0 (2020-12-11)

  • Use TextDecoder for chunk decoding for small speedup
  • Use canMergeChunks logic to avoid too large of chunks being used
  • Use time based yield instead of number-of-line based yield

1.4.6 (2020-04-30)

  • Fix regression with browser only version of tabix-js not being able to parse results in 1.4.5

1.4.5 (2020-04-28)

  • Remove the filehandle size() call because this is unnecessary and would indicate a corrupt index, and because it additionally has a CORS configuration overhead

1.4.4 (2020-04-06)

  • Fix usage of tabix where start column and end column are the same

1.4.3 (2020-02-04)

  • Fix optional param for constructor for typescript
  • Update method of calculating fileOffset based IDs using updated @gmod/bgzf-filehandle

1.4.2 (2020-02-01)

  • Fix usage of renameRefSeqs callback

1.4.1 (2020-02-01)

  • Remove a runtime dependency on a @types module

1.4.0 (2020-02-01)

  • Add typescripting of the codebase
  • Drop Node 6 support due to changes in our dependencies

1.3.2 (2019-11-01)

  • Make <TRA> SVs ignore the END= INFO field, since it refers to the other side of a translocation
  • Make stable fileOffset based IDs

1.3.1 (2019-10-06)

  • Small refactor of filehandle.read() to make it more robust

1.3.0 (2019-08-08)

  • Add ability to pass an AbortSignal from an AbortController to getLines()

1.2.0 (2019-07-05)

  • Add ability for getLines to be open-ended. With no end, getLines continues until the end of the sequence.

1.1.8 (2019-06-06)

  • Add a fix for a bgzf unzipping thing that could result in duplicate features being returned

1.1.7 (2019-06-04)

  • Removed chunk merging from header file parsing which now results in smaller bgzf unzip calls being streamed out to clients

1.1.6 (2019-05-31)

  • Fix issue with headerless files returning data lines in header
  • Use generic-filehandle for localFile

1.1.5 (2019-03-05)

  • Fix parsing on a tabix file that should be csi files (e.g. too long of chromosomes)

1.1.4 (2019-02-23)

  • Upgrade to babel 7

1.1.3 (2018-11-23)

  • Change to es6-promisify and quick-lru which can be babelified to IE11 (util.promisify and lru-cache used Object.defineProperty('length', ...))

1.1.2 (2018-10-26)

  • Add VCF info field END= parsing and other file offset improvements
  • Treats VCF type differently from generic type tabix files

1.1.1 (2018-10-05)

  • Trim output to avoid CRLF in output

1.1.0 (2018-09-24)

  • Use custom bgzf block unzipping function
  • Fixes to avoid duplicate lines in output

1.0.2 (2018-09-18)

  • Implement better lineCount function from tbi/csi pseudobin
  • Fix first data line finding with very large header tabix files

1.0.1 (2018-09-15)

  • Add renameRefSeqs handling
  • Fix some blocksForRange

1.0.0 (2018-09-09)

  • Initial release