Thursday, 27 July 2017

Parsing a moving target.

I was given a problem where the raw data was expected to change over time: columns would be added, moved around or removed, yet a report was needed that could cope with all of this. I chose Perl named captures (the (?<name>...) syntax together with the %+ hash) to solve it.
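Before the full script, here is a minimal sketch of the idea on its own. A named capture is written (?<name>...) inside the pattern, and after a successful match its text is available as $+{name}. The helper name ParseTimeAndField and the two sample lines are made up for this illustration; they are not part of the report script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10 or later

# Hypothetical helper for this sketch: returns (time, value) for one
# field name, or an empty list when the line has no leading timestamp.
sub ParseTimeAndField {
    my ($Line, $Field) = @_;

    return unless $Line =~ m/^(?<time>\d\d:\d\d:\d\d)/;
    my $Time = $+{time};

    # The field is matched on its own, so its position does not matter.
    my $Value;
    if ($Line =~ m/(?<!\S)\Q$Field\E=(?<value>\w*)/) {
        $Value = $+{value};
    }
    return ($Time, $Value);
}

# Made-up sample lines: the same field in two different positions.
foreach my $Line ("01:00:00 httpd=on sshd=on", "01:02:00 sshd=on httpd=off") {
    my ($Time, $Value) = ParseTimeAndField($Line, "httpd");
    print "$Time httpd=", ($Value // "missing"), "\n";
}
```

Run as written, this should print "01:00:00 httpd=on" and then "01:02:00 httpd=off", even though httpd sits in a different column on each line.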


#!/usr/bin/perl -w

# This is an example of using named captures to parse data from lines
# when the columns of data can move around, new columns can be added and
# existing columns can be removed.

# A single pattern that matches several columns in a fixed order cannot deal
# with columns that move or are missing.
# Each field must be matched on its own, in a separate match.
# Every time a match is made it is put into the hash for use later.

# sample3.data as input
#######################
# 01:00:00 httpd=on sshd=on crond=on ntpd=on winbind=off cups=off
# 01:01:00 [567] httpd=on sshd=on crond=on ntpd=on winbind=off cups=off
# 01:02:00 [567] crond=on ntpd=on winbind=off httpd=off cups=off sshd=on
# 01:03:00 [PID:567] ntpd=on winbind=off cups=off sshd=on named=on httpd=on crond=on 
# 01:04:00 [PID:567] named=on httpd=on crond=on

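# results from output
#######################
# Time      httpd  ntpd  sshd
# 01:00:00  on     on    on
# 01:01:00  on     on    on
# 01:02:00  off    on    on
# 01:03:00  on     on    on
# 01:04:00  on


use strict;
use 5.010;    # named captures and the // operator need Perl 5.10 or later

my %DataSet;
# The data set is gathered dynamically, so to change
# the report just add or remove column names here.
my @ReportFields = ( "httpd", "ntpd", "sshd" );

# Store one value in the data set, keyed by time and then by field name.
sub Pack {
 my ($Time, $FieldName, $DataValue) = @_;
 $DataSet{$Time}->{$FieldName} = $DataValue;
}

while (my $Line = <>) {
 chomp $Line;
 # Skip lines without a leading timestamp; when the match fails
 # there is no valid time capture for this line.
 next unless $Line =~ m/^(?<time>\d\d:\d\d:\d\d)/;
 my $NewTime = $+{time};

 foreach my $R (@ReportFields) {
  # Match each field on its own, wherever it sits on the line. (?<!\S)
  # anchors the name to the start of a word, and nothing is required after
  # the value, so a field at the very end of the line is still found.
  if ($Line =~ m/(?<!\S)\Q$R\E=(?<value>\w*)/) { Pack($NewTime, $R, $+{value}) }
 }
}

# print rather than printf: the data is not a format string.
print "Time\t\t";
foreach my $F ( sort @ReportFields ) {
 print $F."\t";
}
print "\n";

foreach my $Time (sort keys %DataSet) {
 print $Time."\t";
 foreach my $F ( sort @ReportFields ) {
  # Print an empty cell for a missing field so later columns stay aligned.
  print " ".($DataSet{$Time}->{$F} // "")."\t";
 }
 print "\n";
}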

1 comment:

  1. One more thing to note is that the data fields in my example all share the same name=value format, so I was able to use a for loop with a single regular expression to match the fields I want. A typical log file records a variety of data in different formats, so several different regular expressions will be needed, each applied as its own match.

    Happy hacking.
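    As a rough sketch of that last point, assuming a made-up log line that mixes a bracketed PID, a user name and a load figure: each format gets its own pattern with its own named capture, and every pattern is tried against the line in turn. The field names, the patterns in @Patterns and the ParseLine helper are all hypothetical and only illustrate the approach.

    ```perl
    use strict;
    use warnings;
    use 5.010;    # named captures need Perl 5.10 or later

    # Hypothetical patterns for a mixed log; each has its own named capture.
    my @Patterns = (
        qr/(?<!\S)load=(?<load>\d+\.\d+)/,
        qr/\[PID:(?<pid>\d+)\]/,
        qr/(?<!\S)user=(?<user>\w+)/,
    );

    # Apply every pattern to a line and collect whatever named captures matched.
    # Returns a reference to an empty hash when the line has no timestamp.
    sub ParseLine {
        my ($Line) = @_;
        my %Found;
        return \%Found unless $Line =~ m/^(?<time>\d\d:\d\d:\d\d)/;
        $Found{time} = $+{time};

        foreach my $P (@Patterns) {
            if ($Line =~ $P) {
                # Copy the named captures from this match into the result.
                $Found{$_} = $+{$_} for keys %+;
            }
        }
        return \%Found;
    }

    my $Row = ParseLine("01:05:00 [PID:567] user=root load=0.42");
    print join(" ", map { "$_=$Row->{$_}" } sort keys %$Row), "\n";
    ```

    With the made-up line above this should print "load=0.42 pid=567 time=01:05:00 user=root". Fields that a line does not contain are simply absent from the hash, the same way missing columns are handled in the report script.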
