Tutorial: how to code a robot reporter

This tutorial is a simplified version of a workshop I gave in French to the Fédération professionnelle des journalistes du Québec.

It comes as no surprise to anyone anymore: robots are invading every industry, and newsrooms are not spared.

In March 2014, an algorithm coded by Los Angeles Times reporter Ken Schwencke wrote, by itself, an article about an earthquake that had just struck the area. Nowadays, sports news, business news and even political news are written by bots.

Inspired by my American colleague and his initiative, I’ll show you how to create your own robot reporter using the Python programming language! Our bot will extract data on earthquakes in Canada and write, on its own, a short article about the most recent one in the Charlevoix seismic zone, in Quebec!

If you’ve never written code before, I suggest you start with another tutorial of mine: Your first steps in programming. 🙂

*****

Breaking news articles are usually written the same way. Whether it’s a soccer game, financial results or an earthquake, the reporter does almost the same thing every time: he analyses the data, then writes an article that follows more or less the same structure.

Simply put, it’s as if the reporter had a text with holes in it. He just has to fill the holes with different data and information depending on the story. For example:

An earthquake with a magnitude of XX struck XX at XX:XX.

Therefore, if we find the corresponding data, we will easily be able to fill the holes and our article will be complete!
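In Python, a text with holes maps neatly onto a format string: each {name} is a hole, and str.format fills it. Here is a minimal sketch of the template above; the function name is just for illustration and the values passed in are made up, not real earthquake data.

```python
# A "text with holes": each {placeholder} is a hole to fill.
TEMPLATE = "An earthquake with a magnitude of {magnitude} struck {location} at {time}."


def fill_template(magnitude, location, time):
    """Fill the holes of the template with one earthquake's data."""
    return TEMPLATE.format(magnitude=magnitude, location=location, time=time)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(fill_template(2.1, "the Charlevoix area", "14:32"))
```

Once the data is extracted, writing the article is mostly a matter of passing the right values to a function like this one.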

 

I Extract the data

For an earthquake, the data is basically the date, the magnitude and the location. Authorities always present this information in the same way.

For example, in Canada, data about the most recent earthquakes is automatically published on the Earthquakes Canada website. There, you can see all the earthquakes from the last thirty days (you can even retrieve data from the past year).

[Screenshot: list of the earthquakes from the last 30 days on Earthquakes Canada]

If you look at the page’s source code, you can see that the data is displayed in a simple HTML table.

[Screenshot: HTML source of the earthquake table]

Each cell carries a headers attribute that tells which column it belongs to. That’s perfect for us: the data will be very easy to extract.
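To see why those headers are handy, here is a minimal sketch that turns one table row into a dictionary keyed by each cell’s headers attribute. The sample row and its header ids (date, mag, region) are hypothetical stand-ins with made-up values; check the page’s source for the real ids. Note that BeautifulSoup treats headers as a multi-valued attribute and normally returns it as a list, so the sketch joins it back into a string.

```python
from bs4 import BeautifulSoup

# Made-up sample row; the header ids are hypothetical.
SAMPLE_HTML = """
<table>
  <tr>
    <td headers="date">2016-02-25 21:40:00</td>
    <td headers="mag">2.4</td>
    <td headers="region">Charlevoix, QC</td>
  </tr>
</table>
"""


def row_to_dict(row):
    """Map each cell's headers attribute to the cell's text."""
    record = {}
    for cell in row.find_all("td"):
        headers = cell.get("headers", [])
        # BeautifulSoup normally returns headers as a list, e.g. ['mag'].
        if isinstance(headers, str):
            headers = headers.split()
        record[" ".join(headers)] = cell.get_text(strip=True)
    return record


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(row_to_dict(soup.find("tr")))
```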

To work with HTML code, you’re going to need the BeautifulSoup library. If you haven’t installed it yet, open your Terminal and type the command below. If you run into problems, refer to the official documentation.

pip install beautifulsoup4

And here is the complete code to extract the data with Python!

In Python, you can write comments by starting a line with “#”. The computer ignores these lines, and I use them to explain the script below. There are two exceptions: the first two lines, which tell the computer that this is a Python script and which encoding to use.
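Judging from the readers’ comments at the end of this post, the original script was written for Python 2 (print without parentheses, the urllib2 module). As a rough Python 3 sketch of the same idea, assuming the table cells carry headers attributes as described above: urllib.request replaces urllib2, and print is a function. The URL is a placeholder, not the real Earthquakes Canada address, and the function names are hypothetical.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Rough sketch of the extraction step (Python 3).
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Placeholder: replace with the address of the Earthquakes Canada list page.
LIST_URL = "https://example.org/replace-with-earthquakes-canada-url"


def fetch_html(url):
    """Download the page and return its HTML as text (UTF-8 assumed)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def cell_key(cell):
    """Return a cell's headers attribute as a single string."""
    headers = cell.get("headers", [])
    if isinstance(headers, str):
        headers = headers.split()
    return " ".join(headers)


def parse_earthquakes(html):
    """Return one dict per table row, keyed by each cell's headers attribute."""
    soup = BeautifulSoup(html, "html.parser")
    earthquakes = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if not cells:  # skip rows without data cells, e.g. the <th> header row
            continue
        earthquakes.append({cell_key(c): c.get_text(strip=True) for c in cells})
    return earthquakes


if __name__ == "__main__":
    for quake in parse_earthquakes(fetch_html(LIST_URL)):
        print(quake)
```

Because the download only happens when the file is run as a script, you can import parse_earthquakes and try it on saved HTML without hitting the website. If the page contains more than one table, you would first select the right table before looping over its rows.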

Run the script. You should see this in your terminal.

[Screenshot: the extracted earthquake data printed in the terminal]

II Write the text

Now that our data is clean and well organized, all that’s left is to write our article!

For this example, we are going to write an article about the most recent earthquake that struck the seismic area of Charlevoix, in Quebec.
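Picking the most recent Charlevoix earthquake boils down to a filter followed by a maximum on the date. The sketch below assumes each record is a dictionary with date and region keys, that dates look like 2016-02-25 21:40:00, and that the region text contains the word Charlevoix; all three are assumptions about the data, and the records themselves are made up. If the site names places differently, a latitude/longitude box around the zone would be a sturdier filter.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of the date column


def most_recent_in_region(earthquakes, keyword="Charlevoix"):
    """Return the newest earthquake whose region mentions keyword, or None."""
    matches = [q for q in earthquakes if keyword.lower() in q["region"].lower()]
    if not matches:
        return None
    return max(matches, key=lambda q: datetime.strptime(q["date"], DATE_FORMAT))


# Made-up records, for illustration only.
quakes = [
    {"date": "2016-02-10 03:15:00", "mag": "1.8", "region": "Charlevoix, QC"},
    {"date": "2016-02-25 21:40:00", "mag": "2.4", "region": "Charlevoix, QC"},
    {"date": "2016-02-28 08:05:00", "mag": "3.1", "region": "Near Ottawa, ON"},
]
print(most_recent_in_region(quakes))
```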

I told you that we would fill a text with holes. Actually, we are going to do a little more than that: our text will adapt itself to our data! For example, depending on the magnitude, our robot reporter will use different adjectives to emphasize how powerful or weak the earthquake was.
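One simple way to make the adjective depend on the magnitude is a chain of thresholds. The cut-off values below are an editorial assumption, loosely modelled on the commonly used minor/light/moderate/strong/major classes; adjust them to your own style guide.

```python
def describe_magnitude(magnitude):
    """Pick an adjective for an earthquake's magnitude.

    The thresholds are an editorial assumption, not an official scale.
    """
    if magnitude < 2.0:
        return "very weak"
    if magnitude < 4.0:
        return "minor"
    if magnitude < 5.0:
        return "light"
    if magnitude < 6.0:
        return "moderate"
    if magnitude < 7.0:
        return "strong"
    return "major"


if __name__ == "__main__":
    for mag in (1.5, 2.4, 4.3, 6.8):
        print(mag, describe_magnitude(mag))
```

Checking the thresholds from lowest to highest means each branch only needs an upper bound.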

To do so, we add these lines to our script.

With this addition, our script becomes a bot able to write by itself!
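Putting the pieces together, here is a hedged sketch of a function that turns one earthquake record into a headline and a one-paragraph article. It assumes a hypothetical record format (date, mag and region keys, dates like 2016-02-25 21:40:00) and editorial magnitude thresholds; it is not the original script’s code.

```python
from datetime import datetime

# Editorial assumption: (upper bound, adjective) pairs, checked in order.
MAGNITUDE_WORDS = [
    (2.0, "very weak"),
    (4.0, "minor"),
    (5.0, "light"),
    (6.0, "moderate"),
    (7.0, "strong"),
]


def adjective_for(magnitude):
    """Return the first adjective whose upper bound exceeds the magnitude."""
    for limit, word in MAGNITUDE_WORDS:
        if magnitude < limit:
            return word
    return "major"


def write_article(quake):
    """Turn one earthquake record (a dict of strings) into a short article."""
    magnitude = float(quake["mag"])
    when = datetime.strptime(quake["date"], "%Y-%m-%d %H:%M:%S")
    region = quake["region"]
    headline = f"A {adjective_for(magnitude)} earthquake shakes {region}"
    body = (
        f"An earthquake with a magnitude of {magnitude} struck {region} "
        f"on {when:%B %d, %Y} at {when:%H:%M}, according to Earthquakes Canada."
    )
    return headline + "\n\n" + body


if __name__ == "__main__":
    # Made-up record, for illustration only.
    sample = {"date": "2016-02-25 21:40:00", "mag": "2.4", "region": "the Charlevoix area"}
    print(write_article(sample))
```

The headline changes with the magnitude, while the body is still our text with holes, now filled from the record.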

Run the script in your terminal. You should see something like this.

[Screenshot: the article written by the bot, displayed in the terminal]

III Conclusions

Voilà! You’ve just created an algorithm that can write a short article about the most recent earthquake in the Charlevoix area!

For now, this bot is very simple, but you can add as many functions as you like: to write a longer article, to extract data from several sources, or even to create an interactive map from the coordinates.

Of course, the most interesting thing would be to add a few more lines of code so that your robot reporter monitors the Earthquakes Canada website and automatically writes and publishes an article on your website whenever an earthquake hits your area!
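A monitoring loop could look roughly like this sketch. It does not talk to any real website: get_earthquakes, publish and is_local are hypothetical callables you would supply (for instance the download, parsing and article-writing steps of this tutorial, and your own publishing system). The loop remembers which earthquakes it has already seen, using date and region as an assumed identifier, and it primes that memory on start-up so it does not publish the whole 30-day backlog at once.

```python
import time


def find_new(earthquakes, seen):
    """Return the earthquakes not seen before, and add them to seen."""
    new = []
    for quake in earthquakes:
        # Assumption: date + region is enough to identify an earthquake.
        key = (quake["date"], quake["region"])
        if key not in seen:
            seen.add(key)
            new.append(quake)
    return new


def monitor(get_earthquakes, publish, is_local, interval_seconds=600, max_polls=None):
    """Poll for earthquakes and publish an article for each new local one.

    max_polls=None means run forever; a number is handy for testing.
    """
    seen = set()
    # Prime the memory so the existing backlog is not published on start-up.
    find_new(get_earthquakes(), seen)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_seconds)
        for quake in find_new(get_earthquakes(), seen):
            if is_local(quake):
                publish(quake)
        polls += 1
```

Polling every few minutes is gentle on the website; how often you check is up to you.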

Follow me on Twitter, Facebook and LinkedIn for my next tutorials. 🙂

 

5 thoughts on “Tutorial: how to code a robot reporter”

  1. Devon

    I’m an autodidact (self learning) student data scientist, and I have to say, this is INCREDIBLE. Great post. Thank you for this extremely helpful information.

  2. Abdirahiim

    I’m getting the following error

    File “myfirstwebscrapper.py”, line 85
    print earthquake_time_date, earthquake_lat, earthquake_lon, earthquake_depth,
    earthquake_mag, earthquake_felt, earthquake_region, earthquake_province
    ^
    SyntaxError: Missing parentheses in call to ‘print’

    by the way I’m running Python 3.4

  3. Abdirahiim

    I solved the issue I just needed to use (), forgot about that. But now it’s saying that the module urllib2 doesn’t exist I tried to install it using pip but it’s also saying that it doesn’t exist. How shall I resolve this issue?

