Bio::Parser::Homologene::FileParser    v1.7

^ NAME

Bio::Parser::Homologene::FileParser - Perl extension for parsing Homologene files

^ SYNOPSIS

  use Bio::Parser::Homologene::FileParser;
  my $hgf = '/usr/local/data/homologene.data';
  my $parser = Bio::Parser::Homologene::FileParser->new( -file => $hgf );
  $parser->object_mode(1);  # return objects, not text
  while (my $hrec = $parser->next_record) {
      print "HID     ", $hrec->HID, "\n";
      print join( "\t", $_->{taxID},
                        $_->{geneID},
                        $_->{symbol},
                        $_->{prot_gi},
                        $_->{accessn} ), "\n"
            foreach @{ $hrec->members };
  }

^ DESCRIPTION

This module iterates through a Homologene text file one record at a time. It is intended mainly for scripts that never need to look at a record more than once. Programs that need to hold some or all of the records in memory can use this module for the reading-in phase, but it is up to the user to decide how to store the records returned.
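One possible way to handle the storage side is sketched below: records are collected into a hash keyed by HID. This is only an illustration, not part of the module. The subroutine name load_by_hid and the DemoRecord and DemoParser packages are hypothetical stand-ins so the sketch runs without the real module or a data file, and the member values are made up. The only assumption about the real parser is that its records offer the HID and members accessors used in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a record object (assumes HID/members accessors
# like those used in the SYNOPSIS).
{
    package DemoRecord;
    sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub HID     { return $_[0]{HID} }
    sub members { return $_[0]{members} }
}

# Hypothetical stand-in for the parser: hands out queued records one by one
# and returns undef when none are left.
{
    package DemoParser;
    sub new         { my ( $class, @recs ) = @_; return bless { queue => [@recs] }, $class }
    sub next_record { return shift @{ $_[0]{queue} } }
}

# Read every record from a parser-like object and keep its member list
# in a hash keyed by HID.
sub load_by_hid {
    my ($parser) = @_;
    my %by_hid;
    while ( my $hrec = $parser->next_record ) {
        $by_hid{ $hrec->HID } = $hrec->members;
    }
    return \%by_hid;
}

# Made-up example data, for illustration only.
my $parser = DemoParser->new(
    DemoRecord->new( HID => 1, members => [ { taxID => 1111, symbol => 'GENE_A' } ] ),
    DemoRecord->new( HID => 2, members => [ { taxID => 2222, symbol => 'GENE_B' } ] ),
);

my $store = load_by_hid($parser);
for my $hid ( sort { $a <=> $b } keys %$store ) {
    print "HID $hid: ", join( ', ', map { $_->{symbol} } @{ $store->{$hid} } ), "\n";
}
```

A hash keyed by HID suits lookups by group; a script that only needs the records in file order could push them onto an array instead.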

When creating a new Homologene FileParser object, the only argument to pass in is the name of the text file to be parsed. After that, start a loop and call next_record() until the end of the file is reached, as shown in the example above.

As of 2007-08-15, the definitive text homologene.data file is available via ftp from NCBI at: ftp://ftp.ncbi.nih.gov/pub/HomoloGene/

^ PUBLIC METHODS

new()

  my $datafile = '/usr/local/data/homologene.data';
  my $fp = Bio::Parser::Homologene::FileParser->new( -file => $datafile );

Creates a new instance of the FileParser. It must be passed the name of a file that contains one or more Homologene records in the same format as the homologene.data text file distributed on the NCBI FTP site.

next_record()

Gathers all the lines that belong to a record, using the >> lines as delimiters, and passes back a text scalar containing all of the lines in the record. The >> line itself is omitted. When object_mode(1) has been set, as in the SYNOPSIS, objects are returned instead of text.

The >> lines fall at the beginning of each record but contain only redundant information (LOCUSID), so they can be ignored.
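The delimiter-based gathering described above can be sketched as follows. This is a minimal illustration of the technique, not the module's actual implementation. The subroutine name make_record_reader is hypothetical, and the sample input is made up and read from an in-memory filehandle so that the sketch runs on its own.

```perl
use strict;
use warnings;

# Return a closure that yields one record per call from filehandle $fh.
# A record is every line between two delimiter lines (lines starting
# with '>>'); the delimiter lines themselves are left out of the text.
sub make_record_reader {
    my ($fh) = @_;
    return sub {
        my $text = '';
        while ( my $line = <$fh> ) {
            if ( $line =~ /^>>/ ) {
                next if $text eq '';    # delimiter that opens the first record
                return $text;           # delimiter that opens the next record
            }
            $text .= $line;
        }
        # End of input: hand back the final record, or undef if nothing is left.
        return $text eq '' ? undef : $text;
    };
}

# Made-up sample input, for illustration only.
my $data = ">>1\nLOCUSID: 1\nNAME: first\n>>2\nLOCUSID: 2\nNAME: second\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my $next = make_record_reader($fh);
while ( defined( my $rec = $next->() ) ) {
    print "--- record ---\n", $rec;
}
close $fh;
```

Because the reader keeps only the current record in memory, this approach fits the single-pass use the module is meant for.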

next_id()

Returns the ID of the next record.

Inherited Methods

Further methods are inherited from the Bio::Parser::FileParser parent class. See the documentation for the parent class for details of what these methods do.

^ SEE ALSO

^ AUTHORS

^ VERSION

$Id: FileParser.pm,v 1.7 2007/08/16 06:41:18 jpearson Exp $

^ COPYRIGHT

BioParser is copyright 2005 by The Translational Genomics Research Institute. All rights reserved. This License is limited to, and you may use the Software solely for, your own internal and non-commercial use for academic and research purposes. Without limiting the foregoing, you may not use the Software as part of, or in any way in connection with the production, marketing, sale or support of any commercial product or service or for any governmental purposes. For commercial or governmental use, please contact licensing@tgen.org. By installing this Software you are agreeing to the terms of the LICENSE file distributed with this software.

In any work or product derived from the use of this Software, proper attribution of the authors as the source of the software or data must be made. The following URL should be cited:

http://bioinformatics.tgen.org/software/bioparser/