Saturday, January 29, 2011

Search and delete lines matching a pattern along with comments in previous line if any

Hi,

I need to write a shell script in csh that searches for lines matching a pattern and deletes them, along with the comment on the previous line, if any. For example, if my file has the following lines:

Shell script
#this is  a test
pattern1
format1
pattern2
format2
#format3
pattern3

If the search pattern is "pattern", the output should be as follows:

Shell script
format1
format2

To be more precise, every line that contains the pattern should be deleted, along with the previous line if that line begins with "#".

Thanks for the help

  • First of all, nobody should ever use csh for anything: it's outdated and un- (not "under") powered. Second, I doubt it's up to the task. Third, awk, sed or even Perl is much more likely to be a better tool for the job.

    awk '/^#/ {printf "%s", line; line=$0"\n"; next} /pattern/ {line=""} ! /pattern/ {printf "%s", line; print; line=""} END {printf "%s", line}'
    

    Edit: fixed script to handle comment lines correctly

    Dennis Williamson : Use awk variable passing: `awk -v pattern=$pattern '.../pattern/...'`
    Dennis Williamson : By the way, don't forget to upvote and accept answer(s) to your questions if you find them useful.
    Dennis Williamson : Sorry, I wasn't paying attention when I posted that comment about the variables. In order to use a variable for a match you have to use the match operator `~` instead of `//`. Something similar to `nawk -v pattern="$pattern" '/^#/ {printf line; line=$0"\n"; next} $0 ~ pattern {line=""}` ...
    Dennis Williamson : Did you paste that here? There's a space in the middle of the last reference to the variable "line". Also, it should be `$0 !~ pattern` instead of putting the "!" first.
    Dennis Williamson : What have you tried? This might work: `.../^#/ {line=line$0"\n"; next}...`
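
    For anyone who finds the one-liner hard to follow, here is a rough Python sketch of the same "hold the comment until the next line decides" idea, with the pattern passed in as a variable as the comments above suggest. The function name strip_matches and the sample data are made up for illustration; this is not a translation anyone in the thread has tested against real files.

```python
import re


def strip_matches(lines, pattern):
    """Drop lines matching `pattern` and any '#' line directly before them."""
    regex = re.compile(pattern)  # the pattern arrives as data, like awk -v
    held = None                  # a comment waiting for the next line's verdict
    kept = []
    for line in lines:
        if line.startswith("#"):
            if held is not None:
                kept.append(held)  # earlier comment was not followed by a match
            held = line
        elif regex.search(line):
            held = None            # discard the match and any held comment
        else:
            if held is not None:
                kept.append(held)
            kept.append(line)
            held = None
    if held is not None:
        kept.append(held)          # trailing comment at end of input
    return kept


sample = [
    "Shell script", "#this is  a test", "pattern1", "format1",
    "pattern2", "format2", "#format3", "pattern3",
]
print("\n".join(strip_matches(sample, "pattern")))
```

    As in the awk version, a line starting with "#" is always treated as a comment first, even if it happens to contain the pattern.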
  • There is probably a better way to write this logically, but I think this might do it:

    #!/usr/bin/perl
    use strict;
    use warnings;
    
    # Stay one line behind: a line is only printed once we have seen
    # the line that follows it.
    my $previous_line = '';
    while (<>) {
        if (/pattern/) {
            # Current line matches: drop a preceding comment or match.
            print $previous_line
                if $previous_line !~ /^#/ && $previous_line !~ /pattern/;
        } elsif ($previous_line !~ /pattern/) {
            print $previous_line;
        }
        $previous_line = $_;
    }
    # Flush the final line unless it matches.
    print $previous_line if $previous_line !~ /pattern/;
    

    Basically, the loop runs one line behind, holding on to the previous line. It prints the previous line if:

    1. The current line matches the pattern, and the previous line neither matched the pattern nor was a comment.
    2. The current line does not match the pattern, and the previous line did not match it either.

    You can just save the code in a file and use it like: perl thefile.pl textfile_you_want_to_filter
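
    If Perl isn't your thing, the same line-behind logic can be sketched in Python. This is only an illustration of the two rules above; the name filter_lines and the sample list are mine, and the pattern defaults to the literal "pattern" from the question.

```python
import re
from typing import Iterable, Iterator


def filter_lines(lines: Iterable[str], pattern: str = "pattern") -> Iterator[str]:
    """Yield lines, dropping matches and a '#' line directly before a match."""
    regex = re.compile(pattern)
    previous = None  # the line we are still deciding about
    for current in lines:
        if previous is not None and not regex.search(previous):
            # Keep the previous line unless it is a comment followed by a match.
            if not (previous.startswith("#") and regex.search(current)):
                yield previous
        previous = current
    # The last line has no successor, so only the match test applies.
    if previous is not None and not regex.search(previous):
        yield previous


sample = [
    "Shell script", "#this is  a test", "pattern1", "format1",
    "pattern2", "format2", "#format3", "pattern3",
]
print("\n".join(filter_lines(sample)))
# prints:
# Shell script
# format1
# format2
```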

    troyengel : Exactly how I'd do it more or less in anything (shell, perl, etc.), the logic is what matters.
  • Here is a one-liner solution in Perl (not in C shell). You can modify the /pattern/ regular expression in the middle.

    perl -ne 'if(/^#/){print $c;$c=$_}elsif(!/pattern/){print $c,$_;$c=""}else{$c=""} END{print $c}' <file.in
    
    From pts
  • Does it have to be shell scripted?

    1. open file with vi
    2. :g/<pattern>/d
    3. repeat as necessary for additional pattern-types unless you can regex the pattern
    4. :g/^#/d

    This can be reproduced with sed if it has to be scripted.

    Edit:

    1. Create a file .sedscript:

    /pattern/d
    /^#/d
    

    2. Run sed -f .sedscript <inputfile> > <outputfile>

    Note that this deletes every comment line, not only a comment directly before a matching line. It produces the expected output for your example, but a comment followed by a non-matching line would be removed as well.
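
    To make the difference concrete, here is a small Python sketch that compares "delete every comment" with "delete only a comment directly before a match". The input list is invented so that the two results differ; the example file in the question does not show the difference.

```python
import re

lines = ["#keep me", "format1", "#drop me", "pattern1", "format2"]
match = re.compile("pattern").search

# What the two sed commands do: drop every match and every comment.
delete_all = [l for l in lines if not match(l) and not l.startswith("#")]


# What the question asks for: drop a comment only when the next line matches.
def next_matches(i):
    return i + 1 < len(lines) and match(lines[i + 1]) is not None


previous_only = [
    l for i, l in enumerate(lines)
    if not match(l) and not (l.startswith("#") and next_matches(i))
]

print(delete_all)     # ['format1', 'format2']
print(previous_only)  # ['#keep me', 'format1', 'format2']
```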

  • Here's a sed version. Some versions of sed may need parts of this separated into multiple -e clauses.

    sed '/^pattern/d; $b; N; /^#.*\npattern.*$/ ! {P;D}; d' patterns
    

    Here's a script file version of that one-liner with comments:

    #!/bin/sed -f
    
    # Remove standalone lines that start with "pattern"
    /^pattern/d
    
    # Search for a combination of a comment followed by a pattern;
    # until that, print what you find.
    $b
    N
    /^#.*\npattern.*$/ ! {
    P
    D
    }
    
    # Got a comment and a pattern combination in pattern space:
    # delete both lines and start the next cycle
    d
    

    This is based on the script in info sed section 4.16 "Remove All Duplicated Lines" (uniq -u).

    Dennis Williamson : Using a variable or a literal?
    Dennis Williamson : If it's a variable, close the single quotes around it and double-quote it so the shell expands it: `sed '$b;N;/^#.*\n'"$pattern"'.*$/ ! {P;D}; ...`
