[kwlug-disc] wget and variable assignment

John Van Ostrand john at netdirect.ca
Thu Jun 3 18:53:18 EDT 2010


----- Original Message -----
> I have a simple screen-scrape to do.
> 
> From the command line it works fine
> 
> wget -q -O - http://www.openstreetmap.org/stats/data_stats.html | grep "<td>Number of users" | sed -e 's/[:a-zA-Z <>/:]//g'
> 
> it returns the plain number
> 
> 262086
> 
> Cool, now to add it to a script
> 
> This works fine
> GETTEE=`wget -q -O - http://www.openstreetmap.org/stats/data_stats.html | grep "<td>Number of users" | sed -e 's/[:a-zA-Z <>/:]//g'`
> echo "GETTEE = $GETTEE"
> 
> gives: GETTEE = 262086
> 
> But. I want to grab some other data from the same page, so I want to
> wget once, then grep / sed a couple of times. And I'm breaking it.
> The page appears to have been stripped of its \n and so grepping the
> line I want is failing.
> 
> GETTEE=`wget -q -O - http://www.openstreetmap.org/stats/data_stats.html`
> echo "GETTEE = $GETTEE"
> 
> This returns a mess.
> 
> The quick and dirty is to wget four times for four numbers, but I
> don't want to do that. How do I assign the wget to a variable and
> keep \n ?
>
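That's because the default IFS includes newline, so when the variable is expanded without double quotes the shell splits it on newlines and echo rejoins the pieces with single spaces. Quoting the expansion ("$PAGE") keeps the newlines; if you need an unquoted expansion, setting IFS to just space and tab also works: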

PAGE=`wget -q -O - http://www.openstreetmap.org/stats/data_stats.html`
export IFS="$(printf ' \t')"   # space and tab only, no newline
echo "$PAGE" | grep "<td>Number of users" | sed -e 's/[:a-zA-Z <>/:]//g'
echo "$PAGE" | grep "<td>Number of users" | sed -e 's/[:a-zA-Z <>/:]//g'
echo "$PAGE" | grep "<td>Number of users" | sed -e 's/[:a-zA-Z <>/:]//g'
echo "$PAGE" | grep "<td>Number of users" | sed -e 's/[:a-zA-Z <>/:]//g'
unset IFS
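
To see the effect without hitting the network, here is a minimal sketch that uses a made-up stand-in for the page. The sample HTML, the "Number of nodes" label, and the extract_stat helper are all assumptions for illustration, and the sed expression is simplified to "delete everything that is not a digit" rather than the character class used above.

```shell
#!/bin/sh
# Hypothetical sample standing in for the fetched page (not real site data).
PAGE='<table>
<tr><td>Number of users</td><td>262086</td></tr>
<tr><td>Number of nodes</td><td>12345</td></tr>
</table>'

# Unquoted expansion: word splitting on the default IFS drops the newlines,
# so everything ends up on a single line.
unquoted_lines=$(echo $PAGE | wc -l | tr -d ' ')

# Quoted expansion: the newlines survive, so grep can still pick out a line.
quoted_lines=$(printf '%s\n' "$PAGE" | wc -l | tr -d ' ')

# extract_stat LABEL: hypothetical helper that prints only the digits from
# the line of "$PAGE" containing "<td>LABEL".
extract_stat() {
    printf '%s\n' "$PAGE" | grep "<td>$1" | sed -e 's/[^0-9]//g'
}

users=$(extract_stat "Number of users")
nodes=$(extract_stat "Number of nodes")

echo "unquoted: $unquoted_lines line(s), quoted: $quoted_lines line(s)"
echo "users=$users nodes=$nodes"
```

With the real page you would fill PAGE from the single wget call and call the helper once per number you want.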


-- 
John Van Ostrand 
CTO, co-CEO 
Net Direct Inc. 
564 Weber St. N. Unit 12, Waterloo, ON N2L 5C6 
Ph: 866-883-1172 x5102 
Fx: 519-883-8533 

Linux Solutions / IBM Hardware 




