How to use shell functions to fetch information online

Marco Fioretti shows two examples of shell functions that you can use for web scraping when all you need is a quick way to extract text from a given website.

Even in this age of touchscreen devices, many computing activities are much faster if you know the right tricks and stick to plain old typing. In my case, this applies to retrieving certain types of information from the Internet.

Most of my work consists of typing at a prompt or in applications that, like the Kate text editor or the Dolphin file manager, have an embedded terminal (and one of the reasons I prefer such applications is exactly that they make using certain tricks faster, no matter what else I am doing).

I save a non-negligible amount of time when I'm doing system administration or just writing some text, thanks to shell functions like those I'm about to present. Please note that none of these functions does anything difficult or advanced. All they do is fetch some simple data that I often need from the Internet, in the fastest possible way, without forcing me to switch to another window. They are functions rather than standalone scripts because I also use them inside several scripts.

What are shell functions anyway?

In software programming, functions are blocks of code that perform one specific task, written in a way that can be easily reused and shared by many programs, possibly running every time with different input values.

Unix shells, that is, the command interpreters that actually execute what we type at a prompt or save in a script, have functions just as compiled languages such as C or C++ do. Shell functions can be called either at the prompt or from a script, and you only need to know a few things to start writing and using them:

  • Shell functions must be defined before you invoke them!
  • To have your functions always available at the prompt, you can save them in the $HOME/.bashrc file (or the equivalent one for non-Bash shells)
  • In Bash, the default shell on most GNU/Linux distributions, functions can be defined in these two equivalent ways:
  function my_bash_function { the function code goes here... }
  my_bash_function () {  the function code goes here...  }
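
As a minimal sketch of the points above, the snippet below defines a function and then calls it. The name greet and its message are made up for illustration; inside a function, $1 holds the first argument it was called with.

```shell
# Hypothetical example: the function must be defined first...
greet () {
    # ${1:-world} falls back to "world" when no argument is given
    echo "Hello, ${1:-world}!"
}

# ...and can then be called like any other command
greet
greet "Rome"
```

Pasting the same definition into $HOME/.bashrc would make greet available in every new Bash session.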

Weather forecast

A function I (have to) use more often than is good for me, at least in certain periods, is the one that prints the weather forecast. Yes, I too think that looking out of the closest window would be much simpler and smarter, but what about when you're in a conference or meeting room without windows? Here is how this function works:

  [marco@polaris ~]$ weather
  Weather for Rome
        4°C                 Thu    Fri    Sat    Sun
  [sun] Clear              [sun]  [sun]  [par]  [sun]
        Wind: S at 6 km/h
        Humidity: 75%      10° 3° 12° 4° 13° 5° 11° 3°
  [marco@polaris ~]$

And this is its code:

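  weather () {
    # Dump the results page as plain text (stdout and stderr) to a temporary file
    w3m -dump "${1}&btnG=Search" >& /tmp/weather
    # Print the first "Weather for" line and the five lines after it, minus the left padding
    grep -A 5 -m 1 "Weather for" /tmp/weather | cut -c28-
    rm /tmp/weather
  }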

The function uses the w3m text-based browser to ask Google for the weather forecast, saves the result in a temporary file, and then extracts the six lines starting from the one that contains the "Weather for" string, cutting away the unnecessary empty columns. If invoked without arguments, the function returns the forecast for what Google thinks is your current location, but you can also specify other places, e.g. weather "San Francisco".
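
To see what the grep and cut stage does without going online, here is a rough sketch that runs the same kind of pipeline on made-up text. Everything in it is an assumption for illustration: sample_dump stands in for the w3m output, the "[sidebar] " prefix plays the role of the columns to discard, and the numbers passed to -A and cut are scaled down to fit the sample.

```shell
# Made-up sample standing in for the page dump that w3m would produce
sample_dump () {
    printf '%s\n' \
        'Search results' \
        '[sidebar] Weather for Rome' \
        '[sidebar] Temperature: 4 C' \
        '[sidebar] Clear' \
        'More unrelated text'
}

# Read a dump on standard input and keep only the forecast:
#   -m 1      stop at the first matching line
#   -A 2      also print the two lines after the match
#   cut -c11- drop the first ten characters of every line
extract_forecast () {
    grep -A 2 -m 1 "Weather for" | cut -c11-
}

sample_dump | extract_forecast
# prints:
# Weather for Rome
# Temperature: 4 C
# Clear
```

Because extract_forecast reads from a pipe, this variant needs no temporary file; whether that matters is a matter of taste for such a small function.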

The general lesson of this function is how easy it is to get started with Web scraping. The term describes exactly what you have just seen at work in the example above: download the text version of some Web page, then slice it to extract all and only the data you really need, all automatically. The function that follows uses the same technique to fetch another kind of information I often need.

Word Definition

What exactly does that word mean? When I'm in doubt, I ask my shell:

  [marco@polaris ~]$ define weird
   weird (wîrd)
   adj. weird·er, weird·est
   1. Of, relating to, or suggestive of the
   preternatural or supernatural.
   2. Of a strikingly odd or unusual
   character; strange.
   3. Archaic Of or relating to fate or the
   a. Fate; destiny.
   b. One's assigned lot or fortune,
   especially when evil.

The answer, of course, comes from a function very similar to the one that provides weather forecasts:

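  define () {
    # Dump the page as plain text (stdout and stderr) to a temporary file
    w3m -dump >& /tmp/define_word
    # Print the "Advertisement" line and the 15 lines after it, keeping columns 20-60
    grep -A 15 ^Advertisement /tmp/define_word | cut -c20-60
    rm /tmp/define_word
  }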

If you want to know why I start extracting text from the line that begins with "Advertisement", type:

w3m -dump | more

at a prompt and look closely at the resulting text.

Doing that, you will also notice the biggest difference from the other function, besides the obvious fact that this one queries a different website. Since many words have definitions much longer than 15 lines, here I just estimate how many lines I need to read to get enough information. Extracting the whole definition and nothing else, regardless of its length, would certainly be possible, but requires more advanced text parsing than I can show you in this space. Besides, it would not be worth the effort in this particular case, when all I want is a quick idea of what some word means.
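
For readers who want to experiment with that kind of parsing anyway, here is a rough sketch of one possible approach with awk. It is not part of the define function above, and it rests on a loud assumption: that the text of interest starts after a marker line and ends at the next empty line, which may well not hold for a real page. The name extract_block is hypothetical.

```shell
# Hypothetical helper: print the block of text that follows the first
# line starting with the marker given as $1, up to the next blank line.
# Assumption: a blank line marks the end of the definition.
extract_block () {
    awk -v marker="$1" '
        # Blank line after the marker: stop if text was already printed,
        # otherwise skip it (leading blank lines before the block)
        found && NF == 0 { if (printed) exit; next }
        # Any other line after the marker belongs to the block
        found { print; printed = 1 }
        # The marker line itself is not printed
        index($0, marker) == 1 { found = 1 }
    '
}

# Made-up input standing in for a page dump
printf '%s\n' 'Advertisement' '' 'weird (adj.)' '1. strange' '' 'footer' |
    extract_block Advertisement
# prints:
# weird (adj.)
# 1. strange
```

Unlike a fixed grep -A 15, this prints as many lines as the block contains, at the price of depending on how the page happens to be laid out.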


The two functions above are my own, updated versions of those I originally fetched from David Crouse. Thanks, David! Web scraping is great, and doing it from shell functions makes it even more flexible.


Marco Fioretti is a freelance writer and teacher whose work focuses on the impact of open digital technologies on education, ethics, civil rights, and environmental issues.
