Bio::Parser::FileParser    v1.18

^ NAME

Bio::Parser::FileParser - Parent class for ::FileParser

^ SYNOPSIS

  use Bio::Parser::FileParser;
  use vars qw( @ISA );
  @ISA = qw( Bio::Parser::FileParser );

^ DESCRIPTION

This module is not designed for use by end users. It is a base class that should be extended by FileParser subclasses under the BioParser hierarchy. If you are implementing a set of Parser:: modules for a new data source, your FileParser.pm module should inherit from this module via @ISA. Unless you are a BioParser developer, the only part of this document likely to be relevant to you is the documentation for the public methods that are inherited by subclasses.

^ PUBLIC METHODS

These methods are inherited by all subclasses and should not be overloaded by subclasses without a good reason.

new()

BioParser users should never need to invoke this method but BioParser developers are required to invoke it via SUPER::new() in any new FileParser modules that they create. It in turn invokes the new() constructor from the base Bio::Parser module.

object_mode()

  $fileparser->object_mode(1);

The value of object_mode() changes the behaviour of the next_record() method. A call to next_record() normally returns a text scalar containing the current record, but if object_mode() is set to true, the text record is parsed and a Bio::Parser::XXX::Record object is returned instead. This method takes a single value that evaluates to true or false; by convention, 0 and 1 are used. Setting object_mode() to false causes next_record() to revert to its default behaviour and return the record as a text scalar.

parser()

Stores a reference to the RDParser.pm module being used to parse records into objects. A single instance of this module is used as a factory to parse all records.

record_count()

Returns a scalar integer corresponding to the number of records returned so far for this file.

line_count()

Returns a scalar integer corresponding to the number of lines read so far for this file.

record_text()

Returns the current record as a single text scalar. No parsing is done.

record_object()

This method uses the class of the invoking FileParser to guess the classes of the matching RDParser and Record classes. If the FileParser class does not already contain an RDParser instance, one is invoked and used to parse the text returned by a call to record_text(). The hashref returned by RDParser is then blessed into the appropriate Record class and returned. If you are not using an RDParser to do the text-to-object conversion then you'll probably have to provide code to override this method in your FileParser module.
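The class-name guessing described above can be pictured as a string substitution on the invoking class name. The following is only a sketch of that idea, not the module's own code: the helper name guess_sibling_classes() is hypothetical, and it assumes the RDParser and Record classes sit in the same namespace as the FileParser class.

```perl
use strict;
use warnings;

# Hypothetical helper: derive sibling class names from a FileParser class name.
sub guess_sibling_classes {
    my ($fileparser_class) = @_;

    # Strip the trailing ::FileParser to get the data-source namespace.
    ( my $namespace = $fileparser_class ) =~ s/::FileParser\z//
        or die "Not a FileParser class: $fileparser_class\n";

    return ( "${namespace}::RDParser", "${namespace}::Record" );
}

my ( $rdparser_class, $record_class )
    = guess_sibling_classes('Bio::Parser::OMIM::FileParser');
print "$rdparser_class\n$record_class\n";
```

If your data source does not follow this naming pattern, or does not use an RDParser at all, overriding record_object() in your own FileParser is the safer route.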

Overload Methods

Some of these methods may be optionally implemented by subclasses while others are compulsory. If a subclass does not implement one of these methods and it is called, the stub method in this class runs instead. For optional methods, the stub prints a warning and returns undef. For compulsory methods, the stub prints a warning and dies.
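The difference between the two kinds of stub can be illustrated with a small stand-alone sketch. The packages My::StubBase and My::EmptyParser are hypothetical stand-ins written for this example; they are not Bio::Parser::FileParser, and the warning text is invented.

```perl
use strict;
use warnings;

package My::StubBase;    # hypothetical stand-in, not Bio::Parser::FileParser

sub new { my ($class) = @_; return bless {}, $class; }

# Optional method: warn and return undef so the caller can carry on.
sub next_id {
    my ($self) = @_;
    warn ref($self), " does not implement next_id()\n";
    return undef;
}

# Compulsory method: warn, then die because parsing cannot work without it.
sub next_record {
    my ($self) = @_;
    warn ref($self), " must implement next_record()\n";
    die "next_record() not implemented\n";
}

package My::EmptyParser;    # a subclass that implements nothing
our @ISA = ('My::StubBase');

package main;

my $parser = My::EmptyParser->new;

my $id       = $parser->next_id;                    # warns, returns undef
my $survived = eval { $parser->next_record; 1 };    # warns, then dies
my $error    = $@;

print "next_id returned undef\n" unless defined $id;
print "next_record died: $error" unless $survived;
```

Wrapping a call in eval, as above, is one way for calling code to detect that a compulsory method is missing without the whole program stopping.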

next_record()

This method must be implemented by all Bio::Parser::FileParser subclasses. When called, this method should read a full record from the text file, calling the _incr_line_count() and _incr_record_count() methods as required. It should place the record text into the FileParser object with a call to record_text(). If it is possible to get the current record ID from the text object, then this parameter should also be set with a call to this_id(). For example implementations, see Bio::Parser::OMIM::FileParser and Bio::Parser::LocusLink::FileParser.
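As a rough illustration of the contract above, here is a self-contained sketch. Everything in it is an assumption made for the example: My::BaseFileParser is a simplified stand-in for the methods this document describes (the real constructor arguments and internals may differ), the record format with '//' terminator lines is invented, and returning the record text directly stands in for whatever the real subclasses return.

```perl
use strict;
use warnings;

# Hypothetical, heavily simplified stand-in for the base class methods
# described in this document; the real base class does more.
package My::BaseFileParser;

sub new {
    my ( $class, %args ) = @_;
    my $self = { fh => $args{fh}, line_count => 0, record_count => 0 };
    return bless $self, $class;
}
sub _incr_line_count   { $_[0]{line_count}++ }
sub _incr_record_count { $_[0]{record_count}++ }
sub line_count         { $_[0]{line_count} }
sub record_count       { $_[0]{record_count} }

sub record_text {
    my $self = shift;
    $self->{record_text} = shift if @_;
    return $self->{record_text};
}

sub this_id {
    my $self = shift;
    $self->{this_id} = shift if @_;
    return $self->{this_id};
}

# A subclass for an invented format: each record ends with a '//' line.
package My::Toy::FileParser;
our @ISA = ('My::BaseFileParser');

sub next_record {
    my ($self) = @_;
    my $text = '';

    while ( defined( my $line = readline( $self->{fh} ) ) ) {
        $self->_incr_line_count;
        last if $line =~ m{^//\s*$};    # record terminator
        $text .= $line;
    }
    return undef if $text eq '';        # nothing left to read

    $self->_incr_record_count;
    $self->record_text($text);

    # Set the ID only when it is easy to find in the raw text.
    $self->this_id($1) if $text =~ /^ID\s+(\S+)/m;
    return $self->record_text;
}

package main;

my $data = "ID  A001\nNAME first\n//\nID  A002\nNAME second\n//\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $parser = My::Toy::FileParser->new( fh => $fh );
my @ids;
while ( defined $parser->next_record ) {
    push @ids, $parser->this_id;
}
print "Read ", $parser->record_count, " records, ",
    $parser->line_count, " lines: @ids\n";
```

The point of the sketch is the order of the calls: count each line as it is read, count the record once it is complete, store the text with record_text(), and set this_id() only when the ID can be found cheaply.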

next_id()

this_id()

These two methods are optional but useful. They usually have to be implemented by doing a regex on the text record since the parsed record is not necessarily available. These methods should only be implemented if it's easy to extract the IDs of the current and next records without doing the recursive descent parse.
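A regex-based ID lookup of the kind described above might look like the following sketch. The helper name extract_record_id() and the record layout (an 'ID' tag at the start of a line) are both invented for illustration; a real data source will need its own pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the record ID out of raw record text.
# Assumes an invented layout where the ID follows an 'ID' tag at line start.
sub extract_record_id {
    my ($record_text) = @_;
    return $record_text =~ /^ID\s+(\S+)/m ? $1 : undef;
}

my $with_id    = extract_record_id("ID   100100\nTITLE Example entry\n");
my $without_id = extract_record_id("TITLE Entry with no ID line\n");

print defined $with_id    ? "found $with_id\n"    : "no ID\n";
print defined $without_id ? "found $without_id\n" : "no ID\n";
```

If the ID cannot be located with a simple pattern like this, that is a sign these optional methods are better left unimplemented.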

Utility methods

The following two utility methods are not designed to be used in an object-oriented fashion.

text_file_os()

  use IO::File;

  my $fh = IO::File->new( 'data_file.txt', 'r' );
  my $os = text_file_os( $fh );

This routine takes an open filehandle as input and returns a string representing the operating system of the file. The string returned is one of 'win', 'unix', 'mac', or 'unknown'.

Fair warning - the routine is implemented using the perl seek, read and tell functions. It stores the current position of the filehandle using tell and reads bytes from the filehandle until it has seen 3 line endings of the same type (win/unix/mac) or until the end of the file is reached. This only works with text files - you'll probably get an answer with a binary file but it will be meaningless. At the end of the reads, the routine uses seek to reset the position of the filehandle to whatever it was when the routine was called.
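The tell/read/seek approach described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the module's own code: the name sketch_text_file_os() is hypothetical, the 1024-byte chunk size is arbitrary, and returning 'unknown' when fewer than 3 matching line endings are found before end of file is a guess at the behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch for illustration; not the module's implementation.
sub sketch_text_file_os {
    my ($fh) = @_;
    my $start  = tell $fh;    # remember the caller's position
    my %name   = ( "\r\n" => 'win', "\n" => 'unix', "\r" => 'mac' );
    my %count  = ( win => 0, unix => 0, mac => 0 );
    my $result = 'unknown';
    my $carry  = '';

    CHUNK:
    while ( read( $fh, my $chunk, 1024 ) ) {
        $chunk = $carry . $chunk;

        # Hold back a trailing CR: it may be the first half of a CRLF pair
        # that was split across two chunks.
        $carry = $chunk =~ s/\r\z// ? "\r" : '';

        while ( $chunk =~ /(\r\n|\n|\r)/g ) {
            my $type = $name{$1};
            if ( ++$count{$type} == 3 ) {
                $result = $type;
                last CHUNK;
            }
        }
    }

    # A CR still held back at end of file was a lone (mac) line ending.
    $result = 'mac'
        if $result eq 'unknown' && length $carry && ++$count{mac} >= 3;

    seek $fh, $start, 0;      # put the filehandle back where it was
    return $result;
}

my $text = "one\r\ntwo\r\nthree\r\nfour\r\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $os       = sketch_text_file_os($fh);
my $position = tell $fh;      # unchanged by the call
print "$os (filehandle position $position)\n";
```

Because the position is saved and restored, the caller can go on to read the file from wherever it was before the call.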

text_file_line_separator()

  use IO::File;

  my $fh = IO::File->new( 'data_file.txt', 'r' );
  {
      local $/ = text_file_line_separator( $fh );
      my $ctr = 0;
      while (my $line = $fh->getline) {
          chomp $line;
          print $ctr++," [$line]\n";
      }
  }
  $fh->close;

This routine works the same way as text_file_os() except that, instead of returning a string identifying the operating system, it returns the line separator that was found. As shown in the example above, the return value can be used to set a local copy of $/.

^ PRIVATE METHODS

BioParser users should never need to invoke these methods but BioParser developers may need to use them so we'll mention what they do. All 3 are primarily for use in the next_record() method that FileParser subclasses must implement.

_incr_line_count()

Should be called in subclass next_record() methods to keep track of how many lines have been read from the text file.

_incr_record_count()

Should be called in subclass next_record() methods to keep track of how many records have been read from the text file.

_record()

Returns the current record by checking the status of object_mode() and calling either record_object() or record_text() as required.

^ SEE ALSO

^ AUTHOR

John Pearson, bioinfresearch@tgen.org

^ VERSION

$Id: FileParser.pm,v 1.18 2006/07/24 00:17:17 jpearson Exp $

^ COPYRIGHT

BioParser is copyright 2005 by The Translational Genomics Research Institute. All rights reserved. This License is limited to, and you may use the Software solely for, your own internal and non-commercial use for academic and research purposes. Without limiting the foregoing, you may not use the Software as part of, or in any way in connection with the production, marketing, sale or support of any commercial product or service or for any governmental purposes. For commercial or governmental use, please contact licensing@tgen.org. By installing this Software you are agreeing to the terms of the LICENSE file distributed with this software.

In any work or product derived from the use of this Software, proper attribution of the authors as the source of the software or data must be made. The following URL should be cited:

http://bioinformatics.tgen.org/software/bioparser/