Research Software

F-Seq: A Feature Density Estimator for High-Throughput Sequence Tags

Tag sequencing using high-throughput sequencing technologies are now regularly employed to identify specific sequence features such as transcription factor binding sites (ChIP-seq) or regions of open chromatin (DNase-seq). To intuitively summarize and display individual sequence data as an accurate and interpretable signal, we developed F-Seq, a software package that generates a continuous tag sequence density estimation allowing identification of biologically meaningful sites whose output can be displayed directly in the UCSC Genome Browser.

F-Seq Version 1.81
usage: fseq [options]... [file(s)]...
 -b <background dir>     background directory (default=none)
 -d <input dir>          input directory (default=current directory)
 -f <arg>                fragment size (default=estimated from data)
 -h                      print usage
 -l <arg>                feature length (default=600)
 -o <output dir>         output directory (default=current directory)
 -of <wig | bed | npf>   output format (default wig)
 -p <ploidy dir>         ploidy/input directory (default=none)
 -s <arg>                wiggle track step (default=1)
 -t <arg>                threshold (standard deviations) (default=4.0)
 -v                      verbose output
Note: DNase HS data (5' ends) - set -f 0

Newest version (1.82): Download

Bioinformatics paper version: Download

Source available: Download

Software requirements

This software requires Java version 1.5 or greater.
To see your version of java (or if it is installed), type 'java -version'

If java is not installed or you do not have the correct version, download here.

Current accepted input formats:
Bed

Currently accepted output formats:
Wiggle
Bed
narrowPeak

narrowPeak: Narrow (or Point-Source) Peaks Format

This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+3 format.

field type description
chrom string Name of the chromosome
chromStart int The starting position of the feature in the chromosome. The first base in a chromosome is numbered 0.
chromEnd int The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
name string Name given to a region (preferably unique). Use '.' if no name is assigned.
score int Indicates how dark the peak will be displayed in the browser (1-1000). If '0', the DCC will assign this based on signal value. Ideally average signalValue per base spread between 100-1000.
strand char +/- to denote strand or orientation (whenever applicable). Use '.' if no orientation is assigned.
signalValue float Measurement of overall (usually, average) enrichment for the region.
pValue float Measurement of statistical signficance (-log10). Use -1 if no pValue is assigned.
qValue float Measurement of statistical significance using false discovery rate. Use -1 if no qValue is assigned.
peak int Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called.

e.g.

chrX    9091548 9091648 .       0       .       182     5.0945  -1  50
chrX    9358722 9358822 .       0       .       91      4.6052  -1  40
chrX    9391082 9391182 .       0       .       182     9.2103  -1  75

Usage in Linux

Make sure 'bin/fseq' is executable (chmod 0755 bin/fseq)

For a list of options, type 'fseq -h'

Example: fseq -v -of wig chr1.bed chr2.bed
This takes as input the chr1.bed and chr2.bed files. Will use verbose output and outputs the densities in the wiggle format.

A perl script has been included which converts data from mapview format (MAQ alignment results) to BED format.

Usage:  mapviewToBed.pl <MIN QUAL> <MAX_HITS> <FILE>
        MIN_QUAL        = Exclude alignments with <= this mapping quality score.
        MAX_HITS        = Exclude alignments with > this number of hits.
        FILE            = Mapview file to convert.

Example: perl mapviewToBed.pl 0 2 sequence.mapview
This takes as input sequence.mapview, only uses sequences aligning in fewer than 2 locations with 0 or 1 mismatches, and only sequences with an alignment score of 0 or better.

New Features

I will provide more detail for these shortly:

Estimated fragment length.

Background directory - This directory must contain a formatted .bff tracks which represents alignability across the genome. We will shortly upload the script to generate these tracks.
bff files for sequences of 20bp: Download
bff files for sequences of 35bp: Download

Ploidy directory - This directory must contain formatted .iff tracks which represent estimated ploidy information for the cell line of interest. Again, we will shortly provide the script to generate these tracks.
iff files using 10kb windowing for GM12878: Download
iff files using 10kb windowing for K562: Download
iff files using 10kb windowing for HelaS3: Download
iff files using 10kb windowing for HepG2: Download

To generate ploidy tracks, a series of wiggle files representing ploidy information for each chromsome should be converted into our iff format using iffBuilder. Each position should be a relative ploidy level (a normal cell line would have a 1 in each position). Ploidy level should be < 65.535.

Usage: iffBuilder chr1.wig
Example wig input:
fixedStep chrom=chr22 start=14430371 step=1
0.0032
0.0032
0.0032
0.0032
0.0032

Troubleshooting

A likely cause for errors is an "OutOfMemory" exception. To increase the available memory to the java virtual machine, edit 'bin/fseq' file and change the JAVA_OPTS property to increase the heap size.