Bio::Parser::FileParser v1.18
Bio::Parser::FileParser - Parent class for ::FileParser
use Bio::Parser::FileParser;
use vars qw( @ISA );
@ISA = qw( Bio::Parser::FileParser );
This module is not designed for use by end users. It is a base class
that should be extended by FileParser subclasses under the BioParser
hierarchy. If you are implementing a set of Parser:: modules for a new
data source, your FileParser.pm module should inherit from this module
via @ISA. Unless you are a BioParser developer, the only part of this
document that is probably relevant is the documentation for the public
methods that are inherited by subclasses.
These methods are inherited by all subclasses and should not be
overridden by subclasses without a good reason.
new()
BioParser users should never need to invoke this method, but BioParser
developers are required to invoke it via SUPER::new() in any new
FileParser modules that they create. It in turn invokes the new()
constructor from the base Bio::Parser module.
object_mode()
$fileparser->object_mode(1);
The value of object_mode() changes the behaviour of the next_record()
method. A call to next_record() normally returns a text scalar
containing the current record, but if object_mode() is set to true, the
text record is parsed and a Bio::Parser::XXX::Record object is returned
instead. This method takes a single value that evaluates to true or
false; by convention, 0 and 1 are used. Setting object_mode() to false
causes next_record() to revert to its default behaviour and return the
record as a text scalar.
parser()
Stores a reference to the RDParser.pm module being used to parse records
into objects. A single instance of this module is used as a factory to
parse all records.
record_count()
Returns a scalar integer corresponding to the number of records
returned so far for this file.
line_count()
Returns a scalar integer corresponding to the number of lines
read so far for this file.
record_text()
Returns the current record as a single text scalar. No parsing is done.
record_object()
This method uses the class of the invoking FileParser to infer the
names of the matching RDParser and Record classes. If the FileParser
object does not already hold an RDParser instance, one is created and
used to parse the text returned by a call to record_text(). The
hashref returned by RDParser is then blessed into the appropriate Record
class and returned. If you are not using an RDParser to do the
text-to-object conversion, you will probably have to override this
method in your FileParser module.
Some of these methods are optional for subclasses to implement while
others are compulsory. If a subclass does not implement one of them,
the stub method in this class is called instead. For optional methods,
the stub prints a warning and returns undef. For compulsory methods,
the stub prints a warning and dies.
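The two kinds of stub described above might look roughly like the following. This is only a sketch: the package name My::Stub::FileParser and the warning texts are invented for illustration and are not taken from the BioParser source.

```perl
use strict;
use warnings;

package My::Stub::FileParser;    # hypothetical stand-in, not the real base class

sub new { my $class = shift; return bless {}, $class; }

# Optional method: warn and return undef if the subclass did not implement it.
sub next_id {
    my $self = shift;
    warn ref($self) . " does not implement next_id()\n";
    return undef;
}

# Compulsory method: warn, then die, if the subclass did not implement it.
sub next_record {
    my $self = shift;
    warn ref($self) . " must implement next_record()\n";
    die "next_record() is compulsory\n";
}

package main;

local $SIG{__WARN__} = sub { };    # silence the warnings for this demo
my $parser = My::Stub::FileParser->new;
my $id     = $parser->next_id;                    # optional stub: undef
my $ok     = eval { $parser->next_record; 1 };    # compulsory stub dies
print "next_id returned undef\n" unless defined $id;
print "next_record died: $@"     unless $ok;
```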
next_record()
This method must be implemented by all Bio::Parser::FileParser
subclasses.
When called, this method should read a full record from the text file,
calling the _incr_line_count() and _incr_record_count() methods as
required. It should place the record text into the FileParser object
with a call to record_text(). If it is possible to get the current
record ID from the record text, then the ID should also be set
with a call to this_id(). For example implementations, see
Bio::Parser::OMIM::FileParser and
Bio::Parser::LocusLink::FileParser.
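The contract above can be sketched as follows. So that the example runs without BioParser installed, it uses a stand-in base class, My::Base::FileParser, whose constructor arguments and accessor internals are assumptions; only the method names come from this document. The record format (records terminated by a "//" line, with an "ID" line) is also made up for the demo.

```perl
use strict;
use warnings;

# Stand-in for Bio::Parser::FileParser (assumed behaviour, for the demo only)
package My::Base::FileParser;

sub new {
    my ( $class, %args ) = @_;
    my $self = { fh => $args{fh}, lines => 0, records => 0 };
    return bless $self, $class;
}
sub _incr_line_count   { $_[0]{lines}++ }
sub _incr_record_count { $_[0]{records}++ }
sub line_count         { $_[0]{lines} }
sub record_count       { $_[0]{records} }

sub record_text {
    my $self = shift;
    $self->{text} = shift if @_;
    return $self->{text};
}

sub this_id {
    my $self = shift;
    $self->{id} = shift if @_;
    return $self->{id};
}

# Hypothetical subclass for a made-up format: each record ends with "//"
package My::Demo::FileParser;
our @ISA = ('My::Base::FileParser');

sub next_record {
    my $self = shift;
    my $fh   = $self->{fh};
    my $text = '';
    while ( defined( my $line = <$fh> ) ) {
        $self->_incr_line_count;
        last if $line =~ m{^//\s*$};    # end-of-record marker
        $text .= $line;
    }
    return undef unless length $text;    # no more records
    $self->_incr_record_count;
    $self->record_text($text);
    $self->this_id($1) if $text =~ /^ID\s+(\S+)/m;
    return $text;
}

package main;

my $data = "ID   A1\nNAME first\n//\nID   B2\nNAME second\n//\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $parser = My::Demo::FileParser->new( fh => $fh );
while ( defined( my $rec = $parser->next_record ) ) {
    printf "%s (%d lines read so far)\n", $parser->this_id, $parser->line_count;
}
```

A real subclass would inherit from Bio::Parser::FileParser and call SUPER::new() in its constructor rather than define its own accessors.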
next_id()
this_id()
These two methods are optional but useful. They usually have to be
implemented by doing a regex on the text record since the parsed record
is not necessarily available. These methods should only be implemented
if it's easy to extract the IDs of the current and next records without
doing the recursive descent parse.
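As a rough illustration of the regex approach, the helper below pulls an ID out of raw record text without any parsing. The function name id_from_text and the "ID" line layout are hypothetical; a real data source will need its own pattern.

```perl
use strict;
use warnings;

# Hypothetical record layout: a line starting with "ID" followed by the value.
sub id_from_text {
    my ($text) = @_;
    return $text =~ /^ID\s+(\S+)/m ? $1 : undef;
}

my $record = "NAME first\nID   A1\n";
my $id     = id_from_text($record);
print defined $id ? "ID: $id\n" : "no ID found\n";
```

Returning undef when no ID line is present lets the caller decide whether a missing ID is an error.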
The following 2 utility functions are not designed to be used in an
object-oriented fashion.
text_file_os()
use IO::File;
my $fh = IO::File->new( 'data_file.txt', 'r' );
my $os = text_file_os( $fh );
This routine takes an open filehandle as input and returns a string
representing the operating system of the file. The string returned is
one of 'win', 'unix', 'mac', or 'unknown'.
Fair warning - the routine is implemented using the perl seek, read
and tell functions. It stores the current position of the filehandle
using tell and reads bytes from the filehandle until it has seen 3 line
endings of the same type (win/unix/mac) or until the end of the file is
reached. This only works with text files - you'll probably get an
answer with a binary file but it will be meaningless. At the end of the
reads, the routine uses seek to reset the position of the filehandle to
whatever it was when the routine was called.
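The detection strategy described above can be approximated as shown below. This is an independent sketch, not the module's code: the name guess_text_file_os is invented, and the fallback used when the file ends before 3 matching line endings are seen (return the most frequent type) is an assumption, since this document does not say what the real routine does in that case.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );

# Hypothetical re-implementation for illustration only.
sub guess_text_file_os {
    my ($fh) = @_;
    my $start = tell $fh;    # remember the caller's position
    my %seen  = ( win => 0, unix => 0, mac => 0 );
    my $os    = 'unknown';
    my $prev  = '';
    my $char;

    while ( read( $fh, $char, 1 ) ) {
        if ( $prev eq "\r" ) {
            # A CR can only be classified once the following byte is known.
            my $type = $char eq "\n" ? 'win' : 'mac';
            if ( ++$seen{$type} >= 3 ) { $os = $type; last }
        }
        elsif ( $char eq "\n" ) {
            if ( ++$seen{unix} >= 3 ) { $os = 'unix'; last }
        }
        $prev = $char;
    }

    if ( $os eq 'unknown' ) {
        $seen{mac}++ if $prev eq "\r";    # file ended on a bare CR
        # Assumption: fall back to the most frequent type seen, if any.
        my ($best) = sort { $seen{$b} <=> $seen{$a} || $a cmp $b } keys %seen;
        $os = $best if $seen{$best} > 0;
    }

    seek $fh, $start, SEEK_SET;    # restore the caller's position
    return $os;
}

my $data = "a\r\nb\r\nc\r\nd\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print guess_text_file_os($fh), "\n";
```

On platforms that translate line endings on input, a real filehandle would likely need binmode before this kind of byte-level inspection.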
text_file_line_separator()
use IO::File;
my $fh = IO::File->new( 'data_file.txt', 'r' );
{
    # Use the detected separator only within this block
    local $/ = text_file_line_separator( $fh );
    my $ctr = 0;
    while (my $line = $fh->getline) {
        chomp $line;
        print $ctr++," [$line]\n";
    }
}
$fh->close;
This routine is the same as text_file_os() except instead of
returning a string identifying the operating system, it returns the line
separator that was found. This can be used as shown in the example
above to set a local copy of $/.
BioParser users should never need to invoke these methods but BioParser
developers may need to use them so we'll mention what they do. All 3
are primarily for use in the next_record() method that FileParser
subclasses must implement.
$Id: FileParser.pm,v 1.18 2006/07/24 00:17:17 jpearson Exp $
BioParser is copyright 2005 by The Translational Genomics Research
Institute. All rights reserved. This License is limited to, and you
may use the Software solely for, your own internal and non-commercial
use for academic and research purposes. Without limiting the foregoing,
you may not use the Software as part of, or in any way in connection
with the production, marketing, sale or support of any commercial
product or service or for any governmental purposes. For commercial or
governmental use, please contact licensing@tgen.org. By installing this
Software you are agreeing to the terms of the LICENSE file distributed
with this software.
In any work or product derived from the use of this Software, proper
attribution of the authors as the source of the software or data must be
made. The following URL should be cited:
http://bioinformatics.tgen.org/software/bioparser/