Since LA NACION Data was launched in 2011 as an open data journalism initiative, its strategy has remained the same: to do data journalism AND to open data. Our vision is that every dataset we publish releases more knowledge.
Some might think that opening data is none of our business. But in Argentina, a country with no FOI law and no significant open data initiatives, we wanted to do data journalism, and to do so we had to build our datasets from scratch and share them. We wanted to demonstrate that this can be done anywhere, by anyone, and that each dataset we build and open adds value, not only today and not only for us.
So why build datasets and open them? Because we believe in the long-term value of data. In the long run, everyone will understand why we had to make this big effort: to jumpstart, to show examples even to governments that hide information, to make it easier for journalists and hacktivists to reuse the data in useful analyses or visualizations, and, by doing all this, to show how evidence produces impact.
Making datasets “famous” and opening data can only be done through open collaboration. So, to evangelize about open data, we organized three Datafest events, focused on data mining and journalism, together with a university, where we presented and explained the datasets we had used and opened in order to encourage their reuse. Here are 10 examples of our efforts to build, update and use open datasets:
Daily “Data ready” open data series: reusable series that give context to and illustrate our stories, such as inflation (CPI), the US dollar exchange rate, Central Bank reserves in US$, monthly automobile industry sales and registered real estate (housing market) transactions.
Open Declarations of Assets of Public Servants: a manually updated dataset that feeds an application. Over three years, this project evolved to open up its source, its data and the process behind our work. This year we added 1,000 more asset declarations, bringing the total to more than 2,500.
[Image: BEFORE: scanned PDFs]
Subsidies to the Bus Transportation System in Argentina, 2005-2015: we scrape, transform and build this dataset, update it every three months and open it. It has more than 280,000 rows.
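The quarterly update can be pictured as a merge step that runs after each scrape. This is a minimal sketch in Python, assuming the scraper already yields rows as dictionaries with hypothetical `company`, `month` and `amount` fields, and that amounts arrive in Argentine number format (`.` for thousands, `,` for decimals). The helpers `parse_ar_number`, `merge_quarter` and `to_csv` are illustrative names, not LA NACION's actual code.

```python
import csv
import io

# Hypothetical schema: the real dataset's columns are not described in the post.
FIELDS = ["company", "month", "amount_ars"]


def parse_ar_number(text: str) -> float:
    """Convert an Argentine-formatted amount such as '$ 1.234.567,89' to a float.

    Assumes '.' is always a thousands separator and ',' the decimal mark.
    """
    cleaned = text.replace("$", "").replace(" ", "").strip()
    return float(cleaned.replace(".", "").replace(",", "."))


def merge_quarter(existing: list[dict], scraped: list[dict]) -> list[dict]:
    """Merge a newly scraped quarter into the normalized dataset.

    Rows are keyed by (company, month); a row scraped again replaces the old
    one, so re-running an update never duplicates records.
    """
    by_key = {(r["company"], r["month"]): r for r in existing}
    for raw in scraped:
        # Normalize case and internal spacing so the same company always matches.
        company = " ".join(raw["company"].upper().split())
        by_key[(company, raw["month"])] = {
            "company": company,
            "month": raw["month"],
            "amount_ars": parse_ar_number(raw["amount"]),
        }
    return sorted(by_key.values(), key=lambda r: (r["month"], r["company"]))


def to_csv(rows: list[dict]) -> str:
    """Serialize the merged dataset as CSV text, ready to publish."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keying on (company, month) makes each quarterly run idempotent: re-scraping a period overwrites its rows instead of appending duplicates.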
“Vozdata Telegrams”, opening election data from PDFs: 14,000 PDFs from polling stations were reviewed and classified by university students, citizens and NGOs at in-person events we organized as “civic marathons”. Each one was a full day we hosted “to build data” together and to monitor the electoral process, which was under suspicion. The process was changed between elections after we reported that 40% of the PDFs reviewed presented inconsistencies. This evidence was published and opened.
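The post does not say which checks were used to flag the 40% of telegrams with inconsistencies. As an illustration only, here is a minimal Python sketch of the kind of arithmetic check a volunteer's transcription could be run through; the `Telegram` fields and the rules in `inconsistencies` are assumptions, not the Vozdata methodology.

```python
from dataclasses import dataclass


@dataclass
class Telegram:
    """One polling-station telegram as transcribed by a volunteer (hypothetical fields)."""
    station_id: str
    party_votes: dict[str, int]
    blank: int
    null: int
    total_reported: int
    registered_voters: int


def inconsistencies(t: Telegram) -> list[str]:
    """Return a list of problems found in one telegram (empty if it looks consistent)."""
    problems = []
    # Rule 1 (assumed): party votes plus blank and null votes should add up to the total.
    counted = sum(t.party_votes.values()) + t.blank + t.null
    if counted != t.total_reported:
        problems.append(f"sum of votes {counted} != reported total {t.total_reported}")
    # Rule 2 (assumed): a station cannot report more votes than registered voters.
    if t.total_reported > t.registered_voters:
        problems.append("more votes than registered voters")
    # Rule 3 (assumed): no vote count can be negative.
    if any(v < 0 for v in t.party_votes.values()):
        problems.append("negative vote count")
    return problems


def inconsistency_rate(telegrams: list[Telegram]) -> float:
    """Share of telegrams with at least one problem."""
    if not telegrams:
        return 0.0
    flagged = sum(1 for t in telegrams if inconsistencies(t))
    return flagged / len(telegrams)
```

Keeping each rule separate means every flagged telegram carries a human-readable reason, which is what a published dataset of problems needs.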
Map of polling stations for the elections: built using machine learning and remote collaboration, then shared in open data formats.
Congresoscopio: opening legislative activity data from PDFs
Buenos Aires City Claims per Zone dataset: use of open data for a series of articles, published before the elections, on citizens' claims per zone in Buenos Aires about garbage, security, social housing and public transportation. Our team joined 2.5 million rows of data, loaded them into SQL, filtered these topics, normalized them and opened the reported series for reuse. [Go to the detailed explanation and behind the scenes, in English]
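The load, filter and normalize steps can be sketched with Python's built-in `sqlite3` module standing in for the SQL database (the post does not name the engine). The `claims` table, its `zone`/`topic`/`claim_date` columns, the English topic labels and the `claims_per_zone` and `to_csv` helpers are hypothetical; the real claims data would carry the city's own field names and categories.

```python
import csv
import io
import sqlite3

# Hypothetical topic labels matching the four subjects covered in the articles.
TOPICS = ("garbage", "security", "social housing", "public transportation")


def claims_per_zone(rows: list[tuple[str, str, str]]) -> list[tuple[str, str, int]]:
    """Load raw (zone, topic, claim_date) rows, normalize them and count claims per zone and topic."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (zone TEXT, topic TEXT, claim_date TEXT)")
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", rows)
    placeholders = ",".join("?" for _ in TOPICS)
    # The inner query normalizes spelling and spacing; the outer one filters and aggregates.
    query = f"""
        SELECT zone, topic, COUNT(*) AS claims
        FROM (SELECT UPPER(TRIM(zone)) AS zone, LOWER(TRIM(topic)) AS topic FROM claims)
        WHERE topic IN ({placeholders})
        GROUP BY zone, topic
        ORDER BY zone, topic
    """
    result = conn.execute(query, TOPICS).fetchall()
    conn.close()
    return result


def to_csv(result: list[tuple[str, str, int]]) -> str:
    """Export the aggregated series as CSV for reuse."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["zone", "topic", "claims"])
    writer.writerows(result)
    return buf.getvalue()
```

Normalizing inside a subquery keeps the raw rows untouched in the table, so the cleaning rules can be changed and re-run without reloading millions of rows.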
Buenos Aires City Budget 2013-2015 open dataset: USE of open data to build our own visualization of Buenos Aires City's budget. Using data normalized by our team and a visualization based on the open source code from Fundacion Civio in Spain, developer Marta Alonso built this site for us, which also provides the open data in CSV.
Official Advertising 2009-2015: we joined three datasets in different formats and from different origins (the Chief of Cabinet website, NGO Poder Ciudadano and NGO Led), then opened the data. The government released PDFs containing official advertising data for 2009-2012 (one PDF per semester). After that it stopped publishing, but it gave the data to an NGO; although that data was not perfect, it followed the pattern of the earlier series, so we normalized it and joined it to the series to obtain one more year. After a FOI request from LA NACION, we received two folders with many pages in small type to complete our series. The pages were overwritten by a seal and some were difficult to scan, so with the help of another NGO we were able to complete the series, once again normalizing the data. This is how the NGO and we received the printed copies in folders: http://blogs.lanacion.com.ar/data/acceso-a-la-informacion-2/publicidad-oficial-como-se-entrega-la-informacion/
And here is the result of joining these datasets: a full series, shared as an app and ready to download.
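Joining the three official advertising sources comes down to mapping each one onto a common schema, normalizing values such as outlet names, and checking that no period is counted twice. Below is a minimal Python sketch under stated assumptions: the source keys (`cabinet_pdf`, `ngo`, `foi_scan`), their column names, the (year, semester, outlet) key and the `join_series` helper are all hypothetical, and the rows are assumed to have already been extracted from the PDFs and scanned pages.

```python
import unicodedata

# Hypothetical column names for each source, mapped onto one common schema.
COLUMN_MAPS = {
    "cabinet_pdf": {"anio": "year", "semestre": "semester", "medio": "outlet", "monto": "amount"},
    "ngo": {"year": "year", "half": "semester", "media": "outlet", "total": "amount"},
    "foi_scan": {"periodo_anio": "year", "periodo_sem": "semester", "beneficiario": "outlet", "importe": "amount"},
}


def normalize_outlet(name: str) -> str:
    """Strip accents, case and extra whitespace so the same outlet matches across sources."""
    no_accents = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return " ".join(no_accents.upper().split())


def to_common(source: str, row: dict) -> dict:
    """Rename one source row's columns and coerce its values to common types."""
    rec = {target: row[col] for col, target in COLUMN_MAPS[source].items()}
    return {
        "year": int(rec["year"]),
        "semester": int(rec["semester"]),
        "outlet": normalize_outlet(rec["outlet"]),
        "amount": float(rec["amount"]),
        "source": source,
    }


def join_series(batches: list[tuple[str, list[dict]]]) -> list[dict]:
    """Concatenate (source, rows) batches into one series, refusing overlapping periods."""
    seen: dict[tuple[int, int, str], dict] = {}
    for source, rows in batches:
        for row in rows:
            rec = to_common(source, row)
            key = (rec["year"], rec["semester"], rec["outlet"])
            if key in seen:
                raise ValueError(f"{key} appears in both {seen[key]['source']} and {source}")
            seen[key] = rec
    return sorted(seen.values(), key=lambda r: (r["year"], r["semester"], r["outlet"]))
```

Raising on overlapping keys, rather than silently keeping one value, forces a human decision whenever two sources disagree about the same period.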
Open Data Catalog: using the Junar open data platform, we make our datasets available for download in CSV or via API. We want to make it easy not only for journalists and tech people but also for citizens to access and reuse our datasets.
Related reading:
http://www.niemanlab.org/2012/05/how-la-nacion-is-using-data-to-challenge-a-foia-free-culture/
https://onlinejournalismblog.com/2012/03/14/la-nacion-data-journalism-from-argentina/
http://towcenter.org/treat-data-as-a-source-and-then-open-it-to-the-public-says-momi-peralta/