Automated data-driven content

The project was born during the 2017 elections, with the purpose of creating news articles automatically from datasets using templates. This allowed us to cover the results in much more detail across Argentina's entire territory: 530 districts, each with its own results and local maps.

We then saw an opportunity to help the newsroom with its routine work on repetitive, data-driven news that can be supported with time series and graphs. So we began to produce daily and weekly articles, automated through the systematic collection of data from different sources and on different topics.

All these automatically generated articles are accompanied by infographics, images and interactive visualizations created from data that is loaded automatically into Google Spreadsheets.

The first subjects covered by this project were chosen for their interest to Argentine citizens: the Dollar, inflation and Argentine football, three central subjects in the everyday life of our country.

Examples of automated data-driven content in LA NACION:

Dollar: https://www.lanacion.com.ar/economia/dolar-hoy-asi-cotiza-el-3-de-abril-en-banco-nacion-y-otras-entidades-nid2234795

Inflation: https://www.lanacion.com.ar/economia/la-canasta-de-precios-de-la-nacion-tuvo-un-alza-de-187-en-las-ultimas-cuatro-s-nid2232905

Football: https://www.lanacion.com.ar/deportes/colon-san-martin-de-san-juan-el-mapa-de-los-remates-y-quien-dio-mas-pases-nid2234453


Dollar, daily currency exchange

In Argentina the price of the Dollar has an important place in people's lives. Every Argentine citizen knows the daily exchange rate, because the fluctuation and volatility of the Peso make daily monitoring necessary. Our economic history has instilled in the Argentine people a lasting distrust of the national currency. And since the threat of recession and devaluation is always latent, a large part of the population's savings is held in Dollars. In addition, high inflation discourages saving in Pesos, because the currency quickly loses value. In fact, people are so interested in this topic that it has a permanent place on the home page of every media outlet.

Before automation, a journalist tracked the exchange rate every morning and wrote several articles according to the variation of the Dollar throughout the day.

This made it a clear candidate for automation: it would ease the work of economic journalists and ensure that each article is consistent with the rates reported by the banks and comes with an infographic showing the variation over time.

The automatic article created in the morning reports the previous day's rate and is then updated with the current day's rate; in the afternoon, a new article is created with the closing rate. This article also makes the journalists' work easier: when there is a significant rise or fall, the journalist can use it as a basis and add more content, explaining the reasons and consulting experts to cite as sources.

 

Inflation is another central subject in the daily life of Argentine people. In fact, inflation in 2018 was the highest in 27 years: it reached 47.6% for the year. There are several sources of information about price variation, but for end users these reports end up being abstract numbers, because they do not reflect their daily reality or the real impact on their household economy.

For this reason, we decided to create a platform that automatically tracks, in real time, the price of products sold in supermarkets. We created “Canasta LA NACIÓN” (LA NACION Basket), a price monitor that allows weekly and monthly monitoring of the price of several products such as noodles, soft drinks and toothpaste, among others.

Since then, once a week we produce an automatic article with information on a basic food basket of 43 products, for the digital and print editions, with a corresponding interactive display.
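As a rough illustration of the arithmetic behind a weekly basket article, the sketch below compares two snapshots of a basket and computes the percentage change. The prices are made up, the function names are hypothetical, and the unweighted sum is an assumption; the source does not describe how the real basket is calculated.

```python
def basket_total(prices):
    """Sum the prices of every product in one snapshot of the basket."""
    return sum(prices.values())


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Made-up prices in pesos for three of the product types mentioned above.
last_week = {"noodles": 50.0, "soft drink": 100.0, "toothpaste": 50.0}
this_week = {"noodles": 52.0, "soft drink": 103.0, "toothpaste": 50.0}

change = pct_change(basket_total(last_week), basket_total(this_week))
print(f"The basket changed {change:+.2f}% in one week.")
```

The same two functions could be applied to a four-week window by comparing snapshots taken four weeks apart.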

 

Soccer:

In order to give more context and cover all soccer games, we automate the articles on the matches of the Argentine Soccer Association. The automated output takes three forms:
a) An interactive visualization with the passes, shots and movements of the players of both teams in each match.

b) 24 hours before a match, an article is created with minimal information and pre-loaded widgets, so that it updates itself while the match is played.

c) A summary of all the matches played throughout Argentina, which includes automatic articles on the local matches played in the different provinces.

INNOVATION

The project is innovative because we found a way to work in teams with our editors, who understood how technology and data can help them serve audiences better and reach larger ones thanks to broader coverage. We developed an in-house automation platform customized to each data source, and we also work with our graphics area to have the data visualizations ready.

Work that was repetitive and tedious became, thanks to this process and technology, a successful and efficient task that preserves the touch of quality journalism.

The choice of topics helped change the newsroom culture, since several traditional journalists are experiencing the benefits of technology and the context that data gives to support their stories.

IMPACT

The impact of the project was immediate, particularly in the increase in visits across all the topics covered. But besides helping with figures and traffic, it also developed a culture of cooperation between traditional journalists, data producers and developers. Now journalists approach LA NACIÓN DATA to request the automation of different articles, in order to add value to their work and speed up the pieces that require the most manual work.

People constantly comment on, complain about, question and give opinions on inflation, football and the Dollar, and to do so they share LA NACIÓN news articles.

Source and Methodology

The cycle for creating an automatic article flow can start in several ways. In some cases, an editor realized that the same article was being written repeatedly and asked us about the possibility of automating it to lighten the work of the journalist in charge. In others, it came from the opportunity to cover more topics and subtopics, such as all the soccer games in Argentina's top two leagues, every week.

Once we confirm the possibility and need for automation, we research the source and origin of data. The source may be:

– A database regularly and manually completed by the journalist
– An API
– Scraping and building a dataset from scratch

The next step was to create multiple text variants according to the variation in the data. These conditional texts work as templates: the figure obtained determines which one is used, and together they produce thousands of articles.
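A minimal sketch of how such conditional texts might be selected, using the Dollar as the example. The template wording, the `Rate` record, the function names and the 0.05% threshold are all hypothetical; the source does not describe the actual rules.

```python
from dataclasses import dataclass


@dataclass
class Rate:
    """Hypothetical record for one day's exchange rate, in pesos."""
    date: str
    previous: float  # previous day's closing price
    current: float   # latest price


# Hypothetical templates; the real wording would come from the newsroom.
TEMPLATES = {
    "up": "The dollar rose {pct:.2f}% on {date} and is selling at ${current:.2f}.",
    "down": "The dollar fell {pct:.2f}% on {date} and is selling at ${current:.2f}.",
    "flat": "The dollar was unchanged on {date} at ${current:.2f}.",
}


def variation_pct(rate: Rate) -> float:
    """Percentage change between the previous close and the latest price."""
    return (rate.current - rate.previous) / rate.previous * 100


def pick_template(pct: float, threshold: float = 0.05) -> str:
    """Choose a template key; moves within the threshold count as flat."""
    if pct > threshold:
        return "up"
    if pct < -threshold:
        return "down"
    return "flat"


def render(rate: Rate) -> str:
    """Fill the chosen template with the day's figures."""
    pct = variation_pct(rate)
    return TEMPLATES[pick_template(pct)].format(
        pct=abs(pct), date=rate.date, current=rate.current
    )


print(render(Rate("April 3", previous=40.0, current=41.0)))
```

Keeping the wording in a dictionary separate from the selection logic would let editors revise the texts without touching the code, which is one possible way to split the work between journalists and developers.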

The last step was to connect data collection with article generation and to define a date and time (cron) for publication, which happens automatically without going through the hands of an editor.

Technology

The project has a serverless architecture using the AWS stack. Each automatic article has its own Lambda written in Python. When the data source is a dynamic site built with JavaScript, Node.js is used with Headless Chrome to do the scraping. As a database we use PostgreSQL.
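To give an idea of the shape of one such function, here is a minimal sketch of a Python Lambda handler. Only the `(event, context)` handler signature reflects how AWS Lambda invokes Python code; `fetch_latest_figures`, `build_article` and the hard-coded figures are hypothetical stand-ins for the real data source and publishing step, which the source does not detail.

```python
import json


def fetch_latest_figures():
    """Placeholder for the real source (database query, API call or scraper)."""
    return {"topic": "dollar", "previous": 40.0, "current": 41.0}


def build_article(figures):
    """Turn raw figures into a minimal article payload."""
    change = (figures["current"] - figures["previous"]) / figures["previous"] * 100
    topic = figures["topic"].capitalize()
    return {
        "topic": figures["topic"],
        "headline": f"{topic}: {change:+.2f}% versus the previous close",
        "value": figures["current"],
    }


def lambda_handler(event, context):
    """Entry point with the (event, context) signature Lambda uses for Python."""
    article = build_article(fetch_latest_figures())
    # A real function would hand the article to the publishing system here;
    # this sketch only returns it.
    return {"statusCode": 200, "body": json.dumps(article)}


if __name__ == "__main__":
    # Local run; the sketch ignores the contents of the event.
    print(lambda_handler({}, None))
```

Having one small function per article type, as the text describes, keeps each data source and its quirks isolated from the others.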

We use the Google Docs API and present updated graphs in the front end using JavaScript and Tableau Public.