Research Software
F-Seq: A Feature Density Estimator for High-Throughput Sequence Tags
Tag sequencing using high-throughput sequencing technologies are now regularly employed to identify specific sequence features such as transcription factor binding sites (ChIP-seq) or regions of open chromatin (DNase-seq). To intuitively summarize and display individual sequence data as an accurate and interpretable signal, we developed F-Seq, a software package that generates a continuous tag sequence density estimation allowing identification of biologically meaningful sites whose output can be displayed directly in the UCSC Genome Browser.
F-Seq Version 1.81 usage: fseq [options]... [file(s)]... -b <background dir> background directory (default=none) -d <input dir> input directory (default=current directory) -f <arg> fragment size (default=estimated from data) -h print usage -l <arg> feature length (default=600) -o <output dir> output directory (default=current directory) -of <wig | bed | npf> output format (default wig) -p <ploidy dir> ploidy/input directory (default=none) -s <arg> wiggle track step (default=1) -t <arg> threshold (standard deviations) (default=4.0) -v verbose output Note: DNase HS data (5' ends) - set -f 0
Newest version (1.82): Download
Bioinformatics paper version: Download
Source available: Download
Software requirements
This software requires Java version 1.5 or greater.
To see your version of java (or if it is installed), type 'java -version'
If java is not installed or you do not have the correct version, download
here.
Current accepted input formats:
Bed
Currently accepted output formats:
Wiggle
Bed
narrowPeak
narrowPeak: Narrow (or Point-Source) Peaks Format
This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED6+3 format.
| field | type | description |
| chrom | string | Name of the chromosome |
| chromStart | int | The starting position of the feature in the chromosome. The first base in a chromosome is numbered 0. |
| chromEnd | int | The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99. |
| name | string | Name given to a region (preferably unique). Use '.' if no name is assigned. |
| score | int | Indicates how dark the peak will be displayed in the browser (1-1000). If '0', the DCC will assign this based on signal value. Ideally average signalValue per base spread between 100-1000. |
| strand | char | +/- to denote strand or orientation (whenever applicable). Use '.' if no orientation is assigned. |
| signalValue | float | Measurement of overall (usually, average) enrichment for the region. |
| pValue | float | Measurement of statistical signficance (-log10). Use -1 if no pValue is assigned. |
| qValue | float | Measurement of statistical significance using false discovery rate. Use -1 if no qValue is assigned. |
| peak | int | Point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called. |
e.g.
chrX 9091548 9091648 . 0 . 182 5.0945 -1 50 chrX 9358722 9358822 . 0 . 91 4.6052 -1 40 chrX 9391082 9391182 . 0 . 182 9.2103 -1 75
Usage in Linux
Make sure 'bin/fseq' is executable (chmod 0755 bin/fseq)
For a list of options, type 'fseq -h'
Example: fseq -v -of wig chr1.bed chr2.bed
This takes as input the chr1.bed and chr2.bed files.
Will use verbose output and outputs the densities in the wiggle format.
A perl script has been included which converts data from mapview format (MAQ alignment results) to BED format.
Usage: mapviewToBed.pl <MIN QUAL> <MAX_HITS> <FILE>
MIN_QUAL = Exclude alignments with <= this mapping quality score.
MAX_HITS = Exclude alignments with > this number of hits.
FILE = Mapview file to convert.
Example: perl mapviewToBed.pl 0 2 sequence.mapview
This takes as input sequence.mapview, only uses sequences aligning in fewer than 2 locations with 0 or 1 mismatches, and only sequences with an alignment score of 0 or better.
New Features
I will provide more detail for these shortly:
Estimated fragment length.
Background directory - This directory must contain a formatted .bff tracks which represents alignability across the genome. We will shortly upload the script to generate these tracks.
bff files for sequences of 20bp: Download
bff files for sequences of 35bp: Download
Ploidy directory - This directory must contain formatted .iff tracks which represent estimated ploidy information for the cell line of interest. Again, we will shortly provide the script to generate these tracks.
iff files using 10kb windowing for GM12878: Download
iff files using 10kb windowing for K562: Download
iff files using 10kb windowing for HelaS3: Download
iff files using 10kb windowing for HepG2: Download
To generate ploidy tracks, a series of wiggle files representing ploidy information for each chromsome should be converted into our iff format using iffBuilder. Each position should be a relative ploidy level (a normal cell line would have a 1 in each position). Ploidy level should be < 65.535.
Usage: iffBuilder chr1.wig Example wig input: fixedStep chrom=chr22 start=14430371 step=1 0.0032 0.0032 0.0032 0.0032 0.0032
Troubleshooting
A likely cause for errors is an "OutOfMemory" exception. To increase the available memory to the java virtual machine, edit 'bin/fseq' file and change the JAVA_OPTS property to increase the heap size.



