[GRASS-user] combine & edit multiple text files

Hi!

I have a number of ASCII files downloaded from the ASTR fire project at
ESA Ionia, showing monthly fire incidences from 1996-2006. I intend to
combine all these files, remove the unwanted columns, and keep only the
records from my current region/study area. All records combined come to
929,155! My guess is I need the cat, cut, and awk commands.

Challenge: the files have different record formatting.

file 1 is like this (note the space delimiter):

Date Time Lat Lon NDVI Station
020201 032428.163 -38.379 -66.334 -.-- ESR
020201 032428.163 -38.375 -66.323 -.-- ESR
020201 032428.312 -38.378 -66.359 -.-- ESR
020201 032428.312 -38.374 -66.348 -.-- ESR
020201 032428.312 -38.371 -66.337 -.-- ESR

file 2 looks like this:
    Date Orbit Time Lat Lon
    20030101 4384 81704.016 19.364 -155.103
    20030101 4384 81704.164 19.373 -155.105
    20030101 4384 81704.164 19.375 -155.096
    20030101 4385 100833.648 56.638 161.281
    20030101 4386 130756.352 -20.340 134.099

I only need the columns for date, time, lat, lon.

Here's what I did:

# combine all files (monthly)
cat 9904ESA01.FIRE 9905ESA01.FIRE 9906ESA01.FIRE 9907ESA01.FIRE
9908ESA01.FIRE ... > test

# cut only the desired columns (1-4); the delimiter is a space ' '
cut -d' ' -f1 test > 1
cut -d' ' -f2 test > 2
cut -d' ' -f3 test > 3
cut -d' ' -f4 test > 4

# combine all columns
paste 1 2 3 4 > test5

example output:

021231 223941.761 11.035 -5.016 -.-- ESR
021231 224005.303 12.226 -6.243 -.-- ESR
    20030101 4380 25934.057 -37.022 -69.589
    20030101 4382 45951.090 33.005 -110.772

The problem is that in the file 1 example, the lat and lon columns
contain extra spaces besides the delimiter, for example " -38.00" versus
"120.00". In the file 2 example there are even more extra spaces. I think
I need to process the different file formats separately, but how do I
solve the problem of spaces in the lat/lon columns?

One last question: how do I get the records for my current region only?

north: 20:00:01.49976N
south: 5:00:01.499767N
west: 115:00:01.5012E
east: 130:00:01.501193E
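(These bounds are in GRASS D:M:S notation; before any numeric comparison
against the lat/lon columns they would have to be converted to decimal
degrees. A conversion sketch; the helper name `dms_to_dec` is made up:)

```shell
# Convert a GRASS D:M:S bound such as 20:00:01.49976 to decimal degrees.
# (Hypothetical helper; the hemisphere letter must be stripped beforehand.)
dms_to_dec() {
  echo "$1" | awk -F: '{ printf "%.6f\n", $1 + $2/60 + $3/3600 }'
}

dms_to_dec 20:00:01.49976   # north bound -> 20.000417
```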

I'm starting to understand awk (I'm reading the gawk manual right now),
but it may take a while before I can do something magical.

Thanks!

Maning

--
|---------|----------------------------------------------------------|
| __.-._ |"Ohhh. Great warrior. Wars not make one great." -Yoda |
| '-._"7' |"Freedom is still the most radical idea of all" -N.Branden|
| /'.-c |Linux registered user #402901, http://counter.li.org/ |
| | /T |http://esambale.wikispaces.com|
| _)_/LI |http://www.geocities.com/esambale/philbiodivmap/philbirds.html |
|---------|----------------------------------------------------------|

The modern solution for problems like these is a scripting language such as Perl or Python.

In Python, a simple script for working with columns of data might look like this:

fin = open(infile)
for record in fin:
    fields = record.split()   # split the record on whitespace
    date = fields[0]          # pick the fields you want
    time = fields[1]
    lat = fields[2]
    lon = fields[3]

    print("%s %s %s %s" % (date, time, lat, lon))  # print to stdout, or write to a file

Run the script and capture the output to a file:
python script.py > bigfile.txt

I find cut, paste, and sed work well for quick jobs (and they would work in your case). But as soon as I need to look up the documentation on sed, I have usually reached the point where a Python script would be easier to implement. For that reason, I never use awk any more.
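For instance, one quick-job route for the second file format is to squeeze the runs of spaces before handing the line to cut; a sketch with a made-up input line:

```shell
# A made-up line in the file 2 style, with runs of spaces.
line='    20030101 4384 81704.016 19.364 -155.103'

# cut treats every single space as a field separator, so runs of spaces
# produce empty fields; tr -s squeezes each run to one space, and sed
# trims the leading one so field numbering starts at the date column.
echo "$line" | tr -s ' ' | sed 's/^ //' | cut -d' ' -f1,3,4,5
```

This prints the date, time, lat, and lon fields only.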

My 2 cents,

David

On 8/7/06, maning sambale <emmanuel.sambale@gmail.com> wrote:

grassuser mailing list
grassuser@grass.itc.it
http://grass.itc.it/mailman/listinfo/grassuser


David Finlayson

Maning,
  As David says, Python or Perl are what people use now for manipulating
text files. I have written several quick scripts for this sort of thing in
Perl (thanks, David, for the Python script; about time I learn to use it). A
basic Perl script would look like this (note that my Perl is not great,
and I am sure there are many other ways to do this):

Explanation: Files.txt is an ls/dir listing of the files to combine. The
script reads each file, strips any header information from the columns,
and writes everything into one output file. Fairly simple, and a quick web
search for file manipulation with Perl will probably turn up a better
explanation.

$in_file = "Files.txt";
$out_file = "outfile.txt";

open (INFILE, $in_file) || die "Cannot open $in_file";
open (OUTFILE, ">$out_file") || die "Cannot open $out_file";

@infiles = <INFILE>;
close(INFILE);

print OUTFILE "z,x,y\n";

foreach $in_files (@infiles)
{
  chomp($in_files);    # strip the newline from the listed filename
  open (INFILE1, $in_files) || die "Cannot open $in_files";
  while (<INFILE1>)
  {
    chomp($_);
    ($x, $y, $z) = split ',', $_;

    # keep only data rows: the first field of a header row is not numeric
    if ($x =~ /^-?\d/) {
      print OUTFILE "$z,$x,$y\n";
    }
  }

  close(INFILE1);
}

close(OUTFILE);

Kevin Slover
Coastal / GIS Specialist
2872 Woodcock Blvd Suite 230
Atlanta GA 30341
(P) 678-530-0022
(F) 678-530-0044

-----Original Message-----
From: grassuser-bounces@grass.itc.it
[mailto:grassuser-bounces@grass.itc.it] On Behalf Of maning sambale
Sent: Tuesday, August 08, 2006 12:12 AM
To: grassuser@grass.itc.it
Subject: [GRASS-user] combine & edit multiple text files


David & Kevin,

Yes, Python or Perl would be great, but what I need right now is a
quick (maybe dirty) approach. I do intend to study Python, as I've
heard a lot about it, but not this time; for now I'm trying to learn the
Linux tools the "modular way" :)

Cheers,

Maning

On 8/8/06, Slover, Kevin <kslover@dewberry.com> wrote:


Try this to print columns 1 and 3. I think it will work on all of your files no matter how many spaces are in between:

cat file | awk '{print $1, $3}'
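Building on that one-liner, a sketch of the whole job. The filenames and the in-region sample records are made up, the column positions come from the two sample formats above, and the region bounds are rounded to whole degrees (roughly 5-20 N, 115-130 E):

```shell
# Two sample records per format, taken from the thread's examples plus
# one made-up in-region record each so the filter has something to keep.
printf '020201 032428.163 -38.379 -66.334 -.-- ESR\n020201 032428.163 10.500 120.000 -.-- ESR\n' > file1.FIRE
printf '    20030101 4384 81704.016 19.364 -155.103\n    20030101 4385 100833.648 10.100 121.300\n' > file2.FIRE

# awk splits on runs of whitespace by default, so the extra spaces are
# harmless; only the column positions differ between the formats:
#   file 1: date time lat lon ndvi station -> fields 1,2,3,4
#   file 2: date orbit time lat lon        -> fields 1,3,4,5
awk '{print $1, $2, $3, $4}' file1.FIRE  >  combined.txt
awk '{print $1, $3, $4, $5}' file2.FIRE >> combined.txt

# Keep only records inside the study region (lat = field 3, lon = field 4).
awk '$3 >= 5 && $3 <= 20 && $4 >= 115 && $4 <= 130' combined.txt > region.txt
```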

David

On 8/9/06, maning sambale <emmanuel.sambale@gmail.com> wrote:


David,

thank you! That's the one I need (for now). Funny how simple it is.
Another funny anecdote: about a year ago I passed by our GIS lab and
saw a girl editing a very large ASCII file (mouse click, edit, edit,
save, next line), much like the files I'm manipulating right now.
I asked her if there might be a better way of doing this. She said it's
the only way her instructor and the lab technician taught them. :)

cheers,

Maning

On 8/10/06, David Finlayson <david.p.finlayson@gmail.com> wrote:


Simple, yes, but it took me a few minutes on Google to remember the awk syntax. Unix is powerful, but it isn't intuitive.

David

On 8/11/06, maning sambale <emmanuel.sambale@gmail.com> wrote:

David,

thank you! that’s the one I need (for now). Funny how simple it is.
Another funny anecdote, about a year ago I passed by our GIS lab and
saw a girl editing very large ascii file (mouse click, edit, edit,
save, next line) much the same as my files I’m manipulating right now.
I asked her there might be a better way in doing this. She said it’s
the only way her instructor and the lab technician thought them. :slight_smile:

cheers,

Maning

On 8/10/06, David Finlayson <david.p.finlayson@gmail.com> wrote:

Try this to print column 1 and 3. I think it will work on all of your files
no matter how many spaces are in between:

cat file | awk ‘{print $1, $3}’

David

On 8/9/06, maning sambale <emmanuel.sambale@gmail.com > wrote:

David & Kevin,

Yes, python or perl would be great. But what I need right now is a
quick (maybe dirty) approach. I do intend to study python as I’ve
heard a lot about it. But not this time, I’m trying to study Linux
tools the “modular way”:slight_smile:

Cheers,

Maning

On 8/8/06, Slover, Kevin < kslover@dewberry.com> wrote:

Maning,
As David says, python or perl are used now for manipulating text
files. I have done several quick scripts for doing this with Perl
(thanks David for the python script, bout time I learn to use it). A
basic perl script would look like this (and note, my Perl is not great,
and am sure there are many other ways to do this) :

Explanation: Files.txt is a ls/dir listing of the wanted files to
combine. Then, the script reads in each file, stripping any sort of
header information from the columns, and outputting everything into one
file. Fairly simple, and a quick search on the web for file
manipulation using Perl will come up with probably a better explanation.

$in_file = " Files.txt";
$out_file = “outfile.txt”;

open (INFILE, $in_file) || die “INFILE”;
open (OUTFILE, “>$out_file”) || die “OUTFILE”;

@infiles = ;
close(INFILE);

print OUTFILE “z,x,y\n”;

foreach $in_files (@infiles)
{

open (INFILE1, $in_files) || die “Cannot open $in_files”;
while ()
{

chomp($);
($x, $y, $z) = split ‘,’,$
;

if ($x != x) {
print OUTFILE “$z,$x,$y\n”; }
}

close(INFILE1);
}

close(OUTFILE);

Kevin Slover
Coastal / GIS Specialist
2872 Woodcock Blvd Suite 230
Atlanta GA 30341
(P) 678-530-0022
(F) 678-530-0044

-----Original Message-----
From: grassuser-bounces@grass.itc.it
[mailto: grassuser-bounces@grass.itc.it] On Behalf Of maning sambale
Sent: Tuesday, August 08, 2006 12:12 AM
To: grassuser@grass.itc.it
Subject: [GRASS-user] combine & edit multiple text files

Hi!

I have a number of ascii files downloaded from ASTR fire project from
the ESA Ionia showing monthly fire incidences from 1996-2006. I
intend to combine all these files, remove unwanted columns and get the
records from my current region/study area only. All records combined
is 929,155 records! My guess is I need to use the cat, cut, awk
commands.

Challenge: the files have different record formating

file 1 is like this (take note of the space as the delimiter):

Date Time Lat Lon NDVI Station
020201 032428.163 -38.379 -66.334 -.-- ESR
020201 032428.163 -38.375 -66.323 -.-- ESR
020201 032428.312 -38.378 -66.359 -.-- ESR
020201 032428.312 -38.374 -66.348 -.-- ESR
020201 032428.312 -38.371 -66.337 -.-- ESR

file 2 looks like this:
Date Orbit Time Lat
Lon
20030101 4384 81704.016 19.364 -155.103
20030101 4384 81704.164 19.373 -155.105
20030101 4384 81704.164 19.375 -155.096
20030101 4385 100833.648 56.638 161.281
20030101 4386 130756.352 -20.340 134.099

I only need the columns for date, time, lat, lon

Here’s what I did:

#combine all file (monthly)
cat 9904ESA01.FIRE 9905ESA01.FIRE 9906ESA01.FIRE 9907ESA01.FIRE
9908ESA01.FIRE … > test

cut only desired columns (1_4) delimeiter is spac ’ ’

cut -d’ ’ -f1 test > 1
cut -d’ ’ -f2 test > 2
cut -d’ ’ -f3 test > 3
cut -d’ ’ -f4 test > 4

combine all columns

paste 1 2 3 4 > test5

example output:

021231 223941.761 11.035 -5.016 -.-- ESR
021231 224005.303 12.226 -6.243 -.-- ESR
20030101 4380 25934.057 -37.022 -69.589
20030101 4382 45951.090 33.005 -110.772

The problem is that in the file 1 example, the lat and lon columns
contain padding spaces in addition to the delimiter, e.g. " -38.00"
versus "120.00". In the file 2 example there are even more spaces. I
think I need to process the two file formats separately, but how do I
handle the extra spaces in the lat/lon columns?
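A note on the whitespace issue: awk, unlike cut, splits each line on any run of blanks by default, so the padded lat/lon columns pose no problem. A minimal sketch on the file 2 layout (field positions taken from the sample above):

```shell
# awk's default field splitting collapses runs of spaces, so padded
# columns still land in the expected fields:
# $1=date, $2=orbit, $3=time, $4=lat, $5=lon
printf '20030101 4384  81704.016  19.364 -155.103\n' |
awk '{print $1, $3, $4, $5}'
# -> 20030101 81704.016 19.364 -155.103
```

With cut -d' ', by contrast, every single space starts a new field, which is why the padded columns came out misaligned.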

One last question: how do I get the records for my current region only?

north: 20:00:01.49976N
south: 5:00:01.499767N
west: 115:00:01.5012E
east: 130:00:01.501193E
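Since those bounds are roughly lat 5-20 N and lon 115-130 E, one way to answer the region question is a numeric filter in awk once the columns are reduced to date/time/lat/lon. A sketch, with the bounds rounded to whole degrees and hypothetical file names (combined.txt, region.txt) for illustration:

```shell
# keep only records inside the study region; assumes the reduced
# layout $1=date $2=time $3=lat $4=lon, and region bounds rounded
# to whole degrees (5..20 N, 115..130 E) for illustration
awk '$3 >= 5 && $3 <= 20 && $4 >= 115 && $4 <= 130' combined.txt > region.txt
```

The same comparison works on any whitespace-separated layout by adjusting the field numbers.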

I'm starting to understand awk (reading the gawk manual right now) but
it may take a while before I can do something magical.

Thanks!

Maning

--
|---------|----------------------------------------------------------|
| __.-._ |"Ohhh. Great warrior. Wars not make one great." -Yoda |
| '-._"7' |"Freedom is still the most radical idea of all" -N.Branden|
| /'.-c |Linux registered user #402901, http://counter.li.org/ |
| | /T |http://esambale.wikispaces.com|
| _)_/LI |http://www.geocities.com/esambale/philbiodivmap/philbirds.html |
|---------|----------------------------------------------------------|


grassuser mailing list
grassuser@grass.itc.it
http://grass.itc.it/mailman/listinfo/grassuser




Finally did it, using awk, cat, and sort. Maybe not the best way, but
it gets the job done. Thank you!

# combine the 96-02 data series
cat 0001ESA01.FIRE 0002ESA01.FIRE .... > file1

# get selected columns 1 to 4
awk '{print $1, $2, $3, $4}' file1 > file2

# combine the 03-06 data series
cat 200301ALGO1.FIRE 200302ALGO1.FIRE 200303ALGO1.FIRE .... > file3
# get selected columns 1, 3 to 5 (drop the Orbit column)
awk '{print $1, $3, $4, $5}' file3 > file4

# combine both reduced files
cat file2 file4 > bigfile

sort -k 3 -g bigfile > file5 # sort by column 3 (lat)
awk '$3 == "5.000", $3 == "21.000" { print $0 }' file5 > file6
awk 'END { print NR }' file6 # count lines

sort -k 4 -g file6 > file7 # sort by column 4 (lon)
awk '$4 == "114.795", $4 == "126.175" { print $0 }' file7 > file8
awk 'END { print NR }' file8 # count lines

# import into GRASS
cat file8 | v.in.ascii out=fire_96_to_06_astr x=4 y=3 fs=" "
columns='label_date varchar(20), label_time varchar(20), x double, y
double'
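For the record, the same job can also be collapsed into one awk pass per format, which avoids the intermediate files and the fragile string-valued range patterns (which only match if those exact values occur in the data). A sketch, assuming the file-name globs match the monthly files, that each file carries a one-line header (FNR > 1 skips it), and using the same bounds as above:

```shell
# 96-02 format: $1=date $2=time $3=lat $4=lon
awk 'FNR > 1 && $3 >= 5 && $3 <= 21 &&
     $4 >= 114.795 && $4 <= 126.175 {print $1, $2, $3, $4}' *ESA01.FIRE > fire_sel

# 03-06 format has an extra Orbit column: $1=date $3=time $4=lat $5=lon
awk 'FNR > 1 && $4 >= 5 && $4 <= 21 &&
     $5 >= 114.795 && $5 <= 126.175 {print $1, $3, $4, $5}' *ALGO1.FIRE >> fire_sel
```

fire_sel can then be fed straight to v.in.ascii as before, with no sorting step needed since the filter is a numeric comparison rather than a range match.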

Maning

On 8/12/06, David Finlayson <david.p.finlayson@gmail.com> wrote:

Simple, yes, but it took me a few minutes on Google to remember the awk
syntax. Unix is powerful, but it isn't intuitive.

David

On 8/11/06, maning sambale <emmanuel.sambale@gmail.com> wrote:
> David,
>
> thank you! That's the one I need (for now). Funny how simple it is.
> Another funny anecdote: about a year ago I passed by our GIS lab and
> saw a girl editing a very large ascii file (mouse click, edit, edit,
> save, next line), much the same as the files I'm manipulating right now.
> I asked her if there might be a better way of doing this. She said it's
> the only way her instructor and the lab technician taught them. :)
>
> cheers,
>
> Maning
>
> On 8/10/06, David Finlayson <david.p.finlayson@gmail.com> wrote:
> > Try this to print column 1 and 3. I think it will work on all of your
files
> > no matter how many spaces are in between:
> >
> > cat file | awk '{print $1, $3}'
> >
> > David
> >
> > On 8/9/06, maning sambale <emmanuel.sambale@gmail.com > wrote:
> > > David & Kevin,
> > >
> > > Yes, python or perl would be great. But what I need right now is a
> > > quick (maybe dirty) approach. I do intend to study python as I've
> > > heard a lot about it. But not this time, I'm trying to study Linux
> > > tools the "modular way" :)
> > >
> > > Cheers,
> > >
> > > Maning
> > >
> > > On 8/8/06, Slover, Kevin < kslover@dewberry.com> wrote:
> > > > Maning,
> > > > As David says, python or perl are used now for manipulating text
> > > > files. I have done several quick scripts for doing this with Perl
> > > > (thanks David for the python script, bout time I learn to use it).
A
> > > > basic perl script would look like this (and note, my Perl is not
great,
> > > > and am sure there are many other ways to do this) :
> > > >
> > > > Explanation: Files.txt is a ls/dir listing of the wanted files to
> > > > combine. Then, the script reads in each file, stripping any sort of
> > > > header information from the columns, and outputting everything into
one
> > > > file. Fairly simple, and a quick search on the web for file
> > > > manipulation using Perl will come up with probably a better
explanation.
> > > >
> > > > $in_file = "Files.txt";
> > > > $out_file = "outfile.txt";
> > > >
> > > > open (INFILE, $in_file) || die "INFILE";
> > > > open (OUTFILE, ">$out_file") || die "OUTFILE";
> > > >
> > > > @infiles = <INFILE>;
> > > > close(INFILE);
> > > >
> > > > print OUTFILE "z,x,y\n";
> > > >
> > > > foreach $in_files (@infiles)
> > > > {
> > > >
> > > > open (INFILE1, $in_files) || die "Cannot open $in_files";
> > > > while (<INFILE1>)
> > > > {
> > > >
> > > > chomp($_);
> > > > ($x, $y, $z) = split ',',$_;
> > > >
> > > > if ($x ne "x") {
> > > > print OUTFILE "$z,$x,$y\n"; }
> > > > }
> > > >
> > > > close(INFILE1);
> > > > }
> > > >
> > > > close(OUTFILE);
> >
> > --
> > David Finlayson
>
>

--
David Finlayson
