                                 jaspextract



Wiki

   The master copies of EMBOSS documentation are available at
   http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

   Please help by correcting and extending the Wiki pages.

Function

   Extract data from JASPAR

Description

   JASPAR is a collection of transcription factor DNA-binding preferences,
   modelled as matrices. These can be converted into Position Weight
   Matrices (PWMs or PSSMs), used for scanning genomic sequences.

   JASPAR is the only database with this scope where the data can be used
   with no restrictions (open-source).

   This program copies the JASPAR distribution into its component matrix
   sets (e.g. JASPAR_CORE, JASPAR_PHYLOFACTS etc) and copies them into the
   EMBOSS data directories, performing any necessary conversions

   The home page of JASPAR is: http://jaspar.genereg.net/

   The EMBOSS program jaspscan will not work unless this program is run.

   Running this program may be the job of your system manager.

Usage

   Here is a sample session with jaspextract


% jaspextract
Extract data from JASPAR
JASPAR database directory [.]: jaspar


   Go to the output files for this example

Command line arguments

Extract data from JASPAR
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-directory]         directory  The FlatFileDir directory containing the
                                  .pfm files and the matrix_list.txt file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-directory" associated qualifiers
   -extension1         string     Default file extension

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit


Input file format

   The input files are part of the uncompressed and extracted Archive.zip
   file provided in the JASPAR html/DOWNLOAD directory of the JASPAR
   homepage (http://jaspar.genereg.net). After extracting the file you
   should specify the all_data/FlatFileDir directory when running
   jasparextract. It is advisable to first delete any old data files from
   your EMBOSS data file area e.g. from the
   /usr/local/emboss/share/EMBOSS/data/JASPAR_* directories

Output file format

   The output file format is currently the same as the JASPAR distribution
   format, but with the matrix files separated into directories according
   to their type.

  Output files for usage example

  Directory: JASPAR_CNE

   This directory contains output files.

  Directory: JASPAR_CORE

   This directory contains output files, for example MA0070.1.pfm
   MA0071.1.pfm MA0072.1.pfm MA0073.1.pfm MA0074.1.pfm MA0075.1.pfm
   MA0076.1.pfm MA0077.1.pfm MA0078.1.pfm MA0079.1.pfm and
   matrix_list.txt.

  File: JASPAR_CORE/MA0070.1.pfm

 5  3 16  1  0 17 17  0  0 16 12  8
 6  9  1  1 18  1  0  0 18  1  0  2
 2  3  1  0  0  0  0  1  0  0  1  2
 5  3  0 16  0  0  1 17  0  1  5  6

  File: JASPAR_CORE/MA0071.1.pfm

15  9  6 11 21  0  0  0  0 25
 1  1 12  2  0  0  0  0 25  0
 2  0  4  5  4 25 25  0  0  0
 7 15  3  7  0  0  0 25  0  0

  File: JASPAR_CORE/MA0072.1.pfm

 9 17 15 35 23  2  0 28  0  0  0  0 36 15
 8  2  0  1  0 12  0  0  0  0  0 36  0  6
 8  7  3  0  0 13  0  8 36 36  0  0  0 10
11 10 18  0 13  9 36  0  0  0 36  0  0  5

  File: JASPAR_CORE/MA0073.1.pfm

 3  1  3  0  7  9  8  4  0 11  4  1  3  4  2  4  4  4  1  4
 8 10  8 11  4  2  3  6 11  0  7 10  8  6  9  5  5  6  7  4
 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  3  2
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  2  1  1  0  1

  File: JASPAR_CORE/MA0074.1.pfm

 3  0  0  0  0  9  4  2  2  5  0  0  1  0  7
 0  0  0  0  9  0  2  4  0  0  0  0  0  9  1
 7 10  9  0  0  1  0  2  8  5 10  0  0  0  2
 0  0  1 10  1  0  4  2  0  0  0 10  9  1  0

  File: JASPAR_CORE/MA0075.1.pfm

52 59  0  0 58
 2  0  0  0  0
 4  0  1  0  1
 1  0 58 59  0

  File: JASPAR_CORE/MA0076.1.pfm

16  0  0  0  0 20 16  4  1
 1 20 20  0  0  0  0  1  6
 2  0  0 20 20  0  0 15  0
 1  0  0  0  0  0  4  0 13

  File: JASPAR_CORE/MA0077.1.pfm

24 54 59  0 65 71  4 24  9
 7  6  4 72  4  2  0  6  9
31  7  0  2  0  1  1 38 55
14  9 13  2  7  2 71  8  3

  File: JASPAR_CORE/MA0078.1.pfm

 7  8  3 30  0  0  0  0  0
 9  8 18  0  1  0  0  0 17
 6  4  1  0  0  0 31  2 10
 9 11  9  1 30 31  0 29  4

  File: JASPAR_CORE/MA0079.1.pfm

1 2 0 0 0 2 0 0 1 2
1 1 0 0 5 0 1 0 1 0
4 4 8 8 2 4 5 6 6 0
2 1 0 0 1 2 2 2 0 6

  File: JASPAR_CORE/matrix_list.txt

MA0072.1        17.4248426117905        RORA_2  Zinc-coordinating       ; acc "N
P_599022" ; collection "CORE" ; comment "isoform type" ; family "Hormone-nuclear
 Receptor" ; medline "7926749" ; pazar_tf_id "TF0000048" ; species "9606" ; tax_
group "vertebrates" ; type "SELEX"
MA0075.1        9.06306510239134        Prrx2   Helix-Turn-Helix        ; acc "Q
06348" ; collection "CORE" ; comment "-" ; family "Homeo" ; medline "7901837" ;
pazar_tf_id "TF0000051" ; species "10090" ; tax_group "vertebrates" ; type "SELE
X"
MA0071.1        13.1897301896459        RORA_1  Zinc-coordinating       ; acc "N
P_599023" ; collection "CORE" ; comment "isoform type" ; family "Hormone-nuclear
 Receptor" ; medline "7926749" ; pazar_tf_id "TF0000047" ; species "9606" ; tax_
group "vertebrates" ; type "SELEX"
MA0073.1        22.2782723704014        RREB1   Zinc-coordinating       ; acc "Q
92766" ; collection "CORE" ; comment "-" ; family "BetaBetaAlpha-zinc finger" ;
medline "8816445" ; pazar_tf_id "TF0000049" ; species "9606" ; tax_group "verteb
rates" ; type "SELEX"
MA0070.1        14.6408952002356        PBX1    Helix-Turn-Helix        ; acc "Q
5T486" ; collection "CORE" ; comment "-" ; family "Homeo" ; medline "7910944" ;
pazar_tf_id "TF0000046" ; species "9606" ; tax_group "vertebrates" ; type "SELEX
"
MA0074.1        20.4511671987138        RXRA::VDR       Zinc-coordinating
; acc "P19793,P11473" ; collection "CORE" ; comment "heterodimer between RXRA an
d VDR" ; family "Hormone-nuclear Receptor" ; medline "8674817" ; pazar_tf_id "TF
0000050" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"
MA0078.1        10.5018372361999        Sox17   Other Alpha-Helix       ; acc "Q
61473" ; collection "CORE" ; comment "-" ; family "High Mobility Group" ; medlin
e "8636240" ; pazar_tf_id "TF0000054" ; species "10090" ; tax_group "vertebrates
" ; type "SELEX"
MA0079.1        9.7185757452318 SP1     Zinc-coordinating       ; acc "P08047" ;
 collection "CORE" ; comment "-" ; family "BetaBetaAlpha-zinc finger" ; medline
"2192357" ; pazar_tf_id "TF0000055" ; species "9606" ; tax_group "vertebrates" ;
 type "SELEX"
MA0079.2        11.1288626921664        SP1     Zinc-coordinating       ; acc "P
08047" ; collection "CORE" ; comment "Annotations from PAZAR SP1 + SP1_MOUSE + S
P1_HUMAN + SP1_RAT in the pleiades genes project (TF0000105, TF0000121, TF000013
7, TF0000146)." ; family "BetaBetaAlpha-zinc finger" ; medline "17916232" ; paza
r_tf_id "TF0000055" ; species "9606,10090,10116" ; tax_group "vertebrates" ; typ
e "COMPILED"
MA0077.1        9.07881462267178        SOX9    Other Alpha-Helix       ; acc "P
48436" ; collection "CORE" ; comment "-" ; family "High Mobility Group" ; medlin
e "9973626" ; pazar_tf_id "TF0000053" ; species "9606" ; tax_group "vertebrates"
 ; type "SELEX"
MA0076.1        14.123230134165 ELK4    Winged Helix-Turn-Helix ; acc "P28324" ;
 collection "CORE" ; comment "-" ; family "Ets" ; medline "8524663" ; pazar_tf_i
d "TF0000052" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"

  Directory: JASPAR_FAM

   This directory contains output files.

  Directory: JASPAR_PBM

   This directory contains output files.

  Directory: JASPAR_PBM_HLH

   This directory contains output files.

  Directory: JASPAR_PBM_HOMEO

   This directory contains output files.

  Directory: JASPAR_PHYLOFACTS

   This directory contains output files.

  Directory: JASPAR_POLII

   This directory contains output files.

  Directory: JASPAR_SPLICE

   This directory contains output files.

Data files

   None

Notes

   The home page of JASPAR is: http://jaspar.genereg.net Running this
   program may be the job of your system manager.

References

    1. DNA binding sites: representation and discovery Bioinformatics.
       2000 Jan;16(1):16-23
    2. Applied bioinformatics for the identification of regulatory
       elements Nat Rev Genet. 2004 Apr;5(4):276-87

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It always exits with status 0 unless an error is reported

Known bugs

   None.

See also

   Program name     Description
   aaindexextract   Extract amino acid property data from AAINDEX
   cutgextract      Extract codon usage tables from CUTG database
   printsextract    Extract data from PRINTS database for use by pscan
   prosextract      Processes the PROSITE motif database for use by
                    patmatmotifs
   rebaseextract    Process the REBASE database for use by restriction enzyme
                    applications
   tfextract        Process TRANSFAC transcription factor database for use by
                    tfscan

Author(s)

   Alan Bleasby
   European Bioinformatics Institute, Wellcome Trust Genome Campus,
   Hinxton, Cambridge CB10 1SD, UK

   Please report all bugs to the EMBOSS bug team
   (emboss-bug (c) emboss.open-bio.org) not to the original author.

History

   Completed 23rd July 2007

Target users

   This program is intended to be used by administrators responsible for
   software and database installation and maintenance.

Comments

   None
