Bio::Parser::Homologene::FileParser v1.7
Bio::Parser::Homologene::FileParser - Perl extension for parsing Homologene files
use strict;
use warnings;
use Bio::Parser::Homologene::FileParser;
my $hgf    = '/usr/local/data/homologene.data';
my $parser = Bio::Parser::Homologene::FileParser->new( -file => $hgf );
$parser->object_mode(1);    # return objects, not text
while ( my $hrec = $parser->next_record ) {
    print "HID ", $hrec->HID, "\n";
    # one tab-separated line per member of the homology group
    print join( "\t", $_->{taxID},
                      $_->{geneID},
                      $_->{symbol},
                      $_->{prot_gi},
                      $_->{accessn} ), "\n"
        foreach @{ $hrec->members };
}
This module can be used to iterate through a Homologene text file. Its
primary purpose is for use in scripts that need to look at each record
only once. Programs that need to hold all or some of the records in
memory can use this module in the reading-in phase, but it is up to the
user to decide how to store the records returned.
When creating a new Homologene FileParser object, the only argument
required is the name of the text file to be parsed. After that, it is
simply a matter of starting a loop and calling next_record() until the
end of the file is reached, as shown in the example above.
As of 2007-08-15, the definitive homologene.data text file is available
via FTP from NCBI at: ftp://ftp.ncbi.nih.gov/pub/HomoloGene/
new()
my $datafile = '/usr/local/data/homologene.data';
my $fp = Bio::Parser::Homologene::FileParser->new( -file => $datafile );
Creates a new instance of the FileParser.
Must be passed the name of a file that contains one or more
Homologene records in the same format as the homologene.data text file
distributed on the NCBI FTP site.
next_record()
Gathers all the lines that are part of a record, using the '>>' lines as
delimiters, and passes back a text scalar containing all of the lines
in the record (or a record object if object_mode(1) has been set).
The '>>' line is omitted.
The '>>' lines fall at the beginning of each record but contain only
redundant information (LOCUSID), so they can be ignored.
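The delimiter logic described above can be sketched in plain Perl. This is an illustration under stated assumptions, not the module's own next_record() code: the helper next_delimited_record() and the sample input are hypothetical, and the sketch assumes every record begins with a line starting with '>>'.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's next_record()): collects the lines
# between '>>' delimiter lines and returns them as one text scalar, with
# the '>>' line itself dropped. Returns undef at end of input.
sub next_delimited_record {
    my ($fh) = @_;
    my $text = '';
    while ( defined( my $line = <$fh> ) ) {
        if ( $line =~ /^>>/ ) {
            # A '>>' line starts a record. If lines are already collected,
            # the current record is complete; the delimiter is discarded
            # either way, so nothing needs to be pushed back.
            return $text if length $text;
            next;
        }
        $text .= $line;
    }
    # End of input: return the last record, if there is one.
    return $text if length $text;
    return;
}

# Made-up input with two records.
my $data = ">>1\nFIELD: a\nFIELD: b\n>>2\nFIELD: c\n";
open my $fh, '<', \$data or die "cannot open in-memory data: $!";

my @records;
while ( defined( my $rec = next_delimited_record($fh) ) ) {
    push @records, $rec;
}
```

Since the '>>' line is thrown away anyway, reading the next delimiter is enough to end the current record, so no look-ahead buffer is needed. One consequence of this sketch is that a '>>' line followed immediately by another '>>' line is treated as an empty record and skipped.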
next_id()
Returns the ID of the next record.
The following methods are inherited from the Bio::Parser::FileParser
parent class. You should look at the documentation for the parent class
to see what these methods do.
$Id: FileParser.pm,v 1.7 2007/08/16 06:41:18 jpearson Exp $
BioParser is copyright 2005 by The Translational Genomics Research
Institute. All rights reserved. This License is limited to, and you
may use the Software solely for, your own internal and non-commercial
use for academic and research purposes. Without limiting the foregoing,
you may not use the Software as part of, or in any way in connection
with the production, marketing, sale or support of any commercial
product or service or for any governmental purposes. For commercial or
governmental use, please contact licensing@tgen.org. By installing this
Software you are agreeing to the terms of the LICENSE file distributed
with this software.
In any work or product derived from the use of this Software, proper
attribution of the authors as the source of the software or data must be
made. The following URL should be cited:
http://bioinformatics.tgen.org/software/bioparser/