File Coverage

blib/lib/Array/Compare.pm
Criterion Covered Total %
statement 90 90 100.0
branch 31 36 86.1
condition 0 6 0.0
subroutine 11 11 100.0
pod 5 5 100.0
total 137 148 92.6


line stmt bran cond sub pod time code
1             #
2             # $Id$
3             #
4              
5             =head1 NAME
6            
7             Array::Compare - Perl extension for comparing arrays.
8            
9             =head1 SYNOPSIS
10            
11             use Array::Compare;
12            
13             my $comp1 = Array::Compare->new;
14             $comp->Sep('|');
15             $comp->Skip({3 => 1, 4 => 1});
16             $comp->WhiteSpace(0);
17             $comp->Case(1);
18            
19             my $comp2 = Array::Compare->new(Sep => '|',
20             WhiteSpace => 0,
21             Case => 1,
22             Skip => {3 => 1, 4 => 1});
23            
24             my @arr1 = 0 .. 10;
25             my @arr2 = 0 .. 10;
26            
27             $comp1->compare(\@arr1, \@arr2);
28             $comp2->compare(\@arr1, \@arr2);
29            
30             =head1 DESCRIPTION
31            
32             If you have two arrays and you want to know if they are the same or
33             different, then Array::Compare will be useful to you.
34            
35             All comparisons are carried out via a comparator object. In the
36             simplest usage, you can create and use a comparator object like
37             this:
38            
39             my @arr1 = 0 .. 10;
40             my @arr2 = 0 .. 10;
41            
42             my $comp = Array::Compare->new;
43            
44             if ($comp->compare(\@arr1, \@arr2)) {
45             print "Arrays are the same\n";
46             } else {
47             print "Arrays are different\n";
48             }
49            
50             Notice that you pass references to the two arrays to the comparison
51             method.
52            
53             Internally the comparator compares the two arrays by using C<join>
54             to turn both arrays into strings and comparing the strings using
55             C<eq>. In the joined strings, the elements of the original arrays
56             are separated with the C<^G> character. This can cause problems if
57             your array data contains C<^G> characters as it is possible that
58             two different arrays can be converted to the same string.
59            
60             To avoid this, it is possible to override the default separator
61             character, either by passing and alternative to the C<new> function
62            
63             my $comp = Array::Compare->new(Sep => '|');
64            
65             or by changing the seperator for an existing comparator object
66            
67             $comp->Sep('|');
68            
69             In general you should choose a separator character that won't appear
70             in your data.
71            
72             You can also control whether or not whitespace within the elements of
73             the arrays should be considered significant when making the comparison.
74             The default is that all whitespace is significant. The alternative is
75             for all consecutive white space characters to be converted to a single
76             space for the pruposes of the comparison. Again, this can be turned on
77             when creating a comparator object:
78            
79             my $comp = Array::Compare->new(WhiteSpace => 0);
80            
81             or by altering an existing object:
82            
83             $comp->WhiteSpace(0);
84            
85             You can also control whether or not the case of the data is significant
86             in the comparison. The default is that the case of data is taken into
87             account. This can be changed in the standard ways when creating a new
88             comparator object:
89            
90             my $comp = Array::Compare->new(Case => 0);
91            
92             or by altering an existing object:
93            
94             $comp->Case(0);
95            
96             In addition to the simple comparison described above (which returns true
97             if the arrays are the same and false if they're different) there is also
98             a full comparison which returns a list containing the indexes of elements
99             which differ between the two arrays. If the arrays are the same it returns
100             an empty list. In scalar context the full comparison returns the length of
101             this list (i.e. the number of elements that differ). You can access the full
102             comparision in two ways. Firstly, there is a C<DefFull> attribute. If this
103             is C<true> then a full comparison if carried out whenever the C<compare>
104             method is called.
105            
106             my $comp = Array::Compare->new(DefFull => 1);
107             $comp->compare(\@arr1, \@arr2); # Full comparison
108            
109             $comp->DefFull(0);
110             $comp->compare(\@arr1, \@arr2); # Simple comparison
111            
112             $comp->DefFull(1);
113             $comp->compare(\@arr1, \@arr2); # Full comparison again
114            
115            
116             Secondly, you can access the full comparison method directly
117            
118             $comp->full_compare(\@arr1, \@arr2);
119            
120             For symmetry, there is also a direct method to use to call the simple
121             comparison.
122            
123             $comp->simple_compare(\@arr1, \@arr2);
124            
125             The final complication is the ability to skip elements in the comparison.
126             If you know that two arrays will always differ in a particular element
127             but want to compare the arrays I<ignoring> this element, you can do it
128             with Array::Compare without taking array slices. To do this, a
129             comparator object has an optional attribute called C<Skip> which is a
130             reference to a hash. The keys in this hash are the indexes of the array
131             elements and the values should be any true value for elements that should
132             be skipped.
133            
134             For example, if you want to compare two arrays, ignoring the values in
135             elements two and four, you can do something like this:
136            
137             my %skip = (2 => 1, 4 => 1);
138             my @a = (0, 1, 2, 3, 4, 5);
139             my @b = (0, 1, X, 3, X, 5);
140            
141             my $comp = Array::Compare->new(Skip => \%skip);
142            
143             $comp->compare(\@a, \@b);
144            
145             This should return I<true>, as we are explicitly ignoring the columns
146             which differ.
147            
148             Of course, having created a comparator object with no skip hash, it is
149             possible to add one later:
150            
151             $comp->Skip({1 => 1, 2 => 1});
152            
153             or:
154            
155             my %skip = (1 => 1, 2 => 2);
156             $comp->Skip(\%skip);
157            
158             To reset the comparator so that no longer skips elements, set the skip
159             hash to an empty hash.
160            
161             $comp->Skip({});
162            
163             You can also check to see if one array is a permutation of another, i.e.
164             they contain the same elements but in a different order.
165            
166             if ($comp->perm(\@a, \@b) {
167             print "Arrays are perms\n";
168             else {
169             print "Nope. Arrays are completely different\n";
170             }
171            
172             In this case the values of C<WhiteSpace> and C<Case> are still used,
173             but C<Skip> is ignored for, hopefully, obvious reasons.
174            
175             =head1 METHODS
176            
177             =cut
178              
179             package Array::Compare;
180              
181             require 5.006_000;
182 2     2   230642 use strict;
  2         3  
  2         37  
183 2     2   6 use warnings;
  2         2  
  2         66  
184             our ($VERSION, $AUTOLOAD);
185              
186 2     2   491 use Moose;
  2         537251  
  2         18  
187 2     2   7844 use Carp;
  2         4  
  2         1212  
188              
189             $VERSION = '2.01';
190              
191             has Sep => ( is => 'rw', isa => 'Str', default => '^G' );
192             has WhiteSpace => ( is => 'rw', isa => 'Bool', default => 1 );
193             has Case => ( is => 'rw', isa => 'Bool', default => 1 );
194             has DefFull => ( is => 'rw', isa => 'Bool', default => 0 );
195             has Skip => ( is => 'rw', isa => 'HashRef', default => sub { {} } );
196              
197             =head2 new [ %OPTIONS ]
198            
199             Constructs a new comparison object.
200            
201             Takes an optional hash containing various options that control how
202             comparisons are carried out. Any omitted options take useful defaults.
203            
204             =over 4
205            
206             =item Sep
207            
208             This is the value that is used to separate fields when the array is joined
209             into a string. It should be a value which doesn't appear in your data.
210             Default is '^G'.
211            
212             =item WhiteSpace
213            
214             Flag that indicates whether or not whitespace is significant in the
215             comparison. If this value is false then all multiple whitespace characters
216             are changed into a single space before the comparison takes place. Default
217             is 1 (whitespace is significant).
218            
219             =item Case
220            
221             Flag that indicates whther or not the case of the data should be significant
222             in the comparison. Default is 1 (case is significant).
223            
224             =item Skip
225            
226             a reference to a hash which contains the numbers of any columns that should
227             be skipped in the comparison. Default is an empty hash (all columns are
228             significant).
229            
230             =item DefFull
231            
232             Flag which indicates whether the default comparison is simple (just returns
233             true if the arrays are the same or false if they're not) or full (returns an
234             array containing the indexes of the columns that differ). Default is 0 (simple
235             comparison).
236            
237             =back
238            
239             =cut
240              
241             #
242             # Utility function to check the arguments to any of the comparison
243             # function. Ensures that there are two arguments and that they are
244             # both arrays.
245             #
246             sub _check_args {
247 53     53   74   my $self = shift;
248 53 100       247   croak('Must compare two arrays.') unless @_ == 2;
249 52 100       285   croak('Argument 1 is not an array') unless ref($_[0]) eq 'ARRAY';
250 51 100       253   croak('Argument 2 is not an array') unless ref($_[1]) eq 'ARRAY';
251              
252 50         72   return;
253             }
254              
255             =head2 compare_len \@ARR1, \@ARR2
256            
257             Very simple comparison. Just checks the lengths of the arrays are
258             the same.
259            
260             =cut
261              
262             sub compare_len {
263 25     25 1 33   my $self = shift;
264              
265 25         43   $self->_check_args(@_);
266              
267 25         31   return @{$_[0]} == @{$_[1]};
  25         39  
  25         80  
268             }
269              
270             =head2 compare \@ARR1, \@ARR2
271            
272             Compare the values in two arrays and return a data indicating whether
273             the arrays are the same. The exact return values differ depending on
274             the comparison method used. See the descriptions of L<simple_compare>
275             and L<full_compare> for details.
276            
277             Uses the value of DefFull to determine which comparison routine
278             to use.
279            
280             =cut
281              
282             sub compare {
283 25     25 1 2314   my $self = shift;
284              
285 25 100       192   if ($self->DefFull) {
286 12         116     return $self->full_compare(@_);
287               } else {
288 13         134     return $self->simple_compare(@_);
289               }
290             }
291              
292             =head2 simple_compare \@ARR1, \@ARR2
293            
294             Compare the values in two arrays and return a flag indicating whether or
295             not the arrays are the same.
296            
297             Returns true if the arrays are the same or false if they differ.
298            
299             Uses the values of 'Sep', 'WhiteSpace' and 'Skip' to influence
300             the comparison.
301            
302             =cut
303              
304             sub simple_compare {
305 16     16 1 28   my $self = shift;
306              
307 16         33   $self->_check_args(@_);
308              
309 13         19   my ($row1, $row2) = @_;
310              
311             # No point in continuing if the number of elements is different.
312 13 100       24   return unless $self->compare_len(@_);
313              
314             # @check contains the indexes into the two arrays, i.e. the numbers
315             # from 0 to one less than the number of elements.
316 12         232   my @check = 0 .. $#$row1;
317              
318 12         73   my ($pkg, $caller) = (caller(1))[0, 3];
319 12 50       29   $caller = '' unless defined $caller;
320 12         19   my $perm = $caller eq __PACKAGE__ . "::perm";
321              
322             # Filter @check so it only contains indexes that should be compared.
323             # N.B. Makes no sense to do this if we are called from 'perm'.
324 12 100       21   unless ($perm) {
325 48   0     454     @check = grep {!(exists $self->Skip->{$_}
  9         68  
326             && $self->Skip->{$_}) } @check
327 9 0       7 if keys %{$self->Skip};
328               }
329              
330             # Build two strings by taking array slices containing only the columns
331             # that we shouldn't skip and joining those array slices using the Sep
332             # character. Hopefully we can then just do a string comparison.
333             # Note: this makes the function liable to errors if your arrays
334             # contain the separator character.
335 12         123   my $str1 = join($self->Sep, @{$row1}[@check]);
  12         89  
336 12         64   my $str2 = join($self->Sep, @{$row2}[@check]);
  12         82  
337              
338             # If whitespace isn't significant, collapse it
339 12 100       62   unless ($self->WhiteSpace) {
340 1         8     $str1 =~ s/\s+/ /g;
341 1         3     $str2 =~ s/\s+/ /g;
342               }
343              
344             # If case isn't significant, change to lower case
345 12 100       106   unless ($self->Case) {
346 1         6     $str1 = lc $str1;
347 1         1     $str2 = lc $str2;
348               }
349              
350 12         128   return $str1 eq $str2;
351             }
352              
353             =head2 full_compare \@ARR1, \@ARR2
354            
355             Do a full comparison between two arrays.
356            
357             Checks each individual column. In scalar context returns the number
358             of columns that differ (zero if the arrays are the same). In list
359             context returns an list containing the indexes of the columns that
360             differ (an empty list if the arrays are the same).
361            
362             Uses the values of 'Sep' and 'WhiteSpace' to influence the comparison.
363            
364             B<Note:> If the two arrays are of different lengths then this method
365             just returns the indexes of the elements that appear in one array but
366             not the other (i.e. the indexes from the longer array that are beyond
367             the end of the shorter array). This might be a little
368             counter-intuitive.
369            
370             =cut
371              
372             sub full_compare {
373 12     12 1 22   my $self = shift;
374              
375 12         24   $self->_check_args(@_);
376              
377 12         20   my ($row1, $row2) = @_;
378              
379             # No point in continuing if the number of elements is different.
380             # Because of the expected return value from this function we can't
381             # just say 'the arrays are different'. We need to do some work to
382             # calculate a meaningful return value.
383             # If we've been called in array context we return a list containing
384             # the number of the columns that appear in the longer list and aren't
385             # in the shorter list. If we've been called in scalar context we
386             # return the difference in the lengths of the two lists.
387 12 100       26   unless ($self->compare_len(@_)) {
388 3 100       7     if (wantarray) {
389 2         3       my ($max, $min);
390 2 100       3       if ($#{$row1} > $#{$row2}) {
  2         4  
  2         3  
391 1         2 ($max, $min) = ($#{$row1}, $#{$row2} + 1);
  1         2  
  1         3  
392                   } else {
393 1         1 ($max, $min) = ($#{$row2}, $#{$row1} + 1);
  1         2  
  1         1  
394                   }
395 2         18       return ($min .. $max);
396                 } else {
397 1         2       return abs(@{$row1} - @{$row2});
  1         2  
  1         8  
398                 }
399               }
400              
401 9         19   my ($arr1, $arr2) = @_;
402              
403 9         15   my @diffs = ();
404              
405 9         10   foreach (0 .. $#{$arr1}) {
  9         34  
406 62 0 0     77     next if keys %{$self->Skip} && $self->Skip->{$_};
  62         284  
407              
408 58         403     my ($val1, $val2) = ($arr1->[$_], $arr2->[$_]);
409 58 100       249     unless ($self->WhiteSpace) {
410 2         14       $val1 =~ s/\s+/ /g;
411 2         4       $val2 =~ s/\s+/ /g;
412                 }
413              
414 58 100       487     unless ($self->Case) {
415 4         31       $val1 = lc $val1;
416 4         2       $val2 = lc $val2;
417                 }
418              
419 58 100       326     push @diffs, $_ unless $val1 eq $val2;
420               }
421              
422 9 100       45   return wantarray ? @diffs : scalar @diffs;
423             }
424              
425             =head2 perm \@ARR1, \@ARR2
426            
427             Check to see if one array is a permutation of the other (i.e. contains
428             the same set of elements, but in a different order).
429            
430             We do this by sorting the arrays and passing references to the assorted
431             versions to simple_compare. There are also some small changes to
432             simple_compare as it should ignore the Skip hash if we are called from
433             perm.
434            
435             =cut
436              
437             sub perm {
438 3     3 1 210   my $self = shift;
439              
440 3         5   return $self->simple_compare([sort @{$_[0]}], [sort @{$_[1]}]);
  3         19  
  3         15  
441             }
442              
443 2     2   8 no Moose;
  2         2  
  2         14  
444             __PACKAGE__->meta->make_immutable;
445              
446             1;
447             __END__
448            
449             =head1 AUTHOR
450            
451             Dave Cross <dave@mag-sol.com>
452            
453             =head1 SEE ALSO
454            
455             perl(1).
456            
457             =head1 COPYRIGHT AND LICENSE
458            
459             Copyright (C) 2000-2005, Magnum Solutions Ltd. All Rights Reserved.
460            
461             This script is free software; you can redistribute it and/or modify it
462             under the same terms as Perl itself.
463            
464             =cut
465