Argentina's Senate Expenses 2004-2013

Extracting stories from public data that was formerly unstructured and locked in PDFs.

After finding out that the Senate had published its expenses since 2004 as raw PDFs, some of them images and all of them completely unstructured, the LA NACION data team managed to scrape, transform, normalize and structure three datasets into one. It then began an interrogation process that produced front-page stories, drew replies from current and former Argentine vice presidents (who serve as Senate presidents), and prompted a judicial investigation of Vice President Amado Boudou over these expenses. This series of front-page stories led to more stories and to different approaches for keeping the Senate accountable.

As we converted these PDFs into OCR text files, we realized that a lot of information had been lost as a consequence of very noisy PDFs. We also realized that there would be more stories if more eyes helped us classify and enter this data. So we decided to ask for help: inspired by The Guardian's MPs' Expenses and ProPublica's Free the Files, we asked our 2013 Knight-Mozilla OpenNews Fellow, Manuel Aristaran, to help us develop “Vozdata”, a platform for crowdsourcing data in a structured way.

He developed “Crowdata” working together with Gabriela Rodriguez, our 2014 OpenNews Fellow. We launched our first Vozdata project, “Senate Expenses”, with a dataset of more than 6,700 PDFs that took two months to be processed.

To accomplish that, we again asked for collaboration and activated our community by organizing two “Civic Marathons” with NGOs, universities and users. See all the details here.

At the same time, one of our journalists heard that the number of Senate employees had grown sharply. Because our data team had been scraping the lists of permanent, temporary and contracted Senate employees for 30 months (since November 2011), we could release a unique and original analysis that became a new finding backed by data and visualizations. In this period, Senate employees and contractors went from 3,700 to 5,700, a growth of 55%. Again, Vice President Amado Boudou replied to these articles through the official channel on national TV, but he could not deny any of the numbers in the reporting.
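As a quick sanity check on the growth figure, the percentage can be recomputed from the two rounded headcounts quoted above. This is only a sketch: the function name is ours, and the exact headcounts behind the published figure are not given here. With the rounded totals the result is about 54%, in line with the roughly 55% reported.

```python
def percent_growth(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100

# Rounded headcounts quoted in the text (the exact figures may differ).
growth = percent_growth(3700, 5700)
print(f"{growth:.1f}%")
```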

Here are the details of the data collection and data analysis process, along with the articles and visualizations.

Because there were many Senate Expenses stories, and some of them are still under judicial investigation, we decided to gather them all on a tag home page:

http://www.lanacion.com.ar/gastos-en-el-senado-t49163

So here is the process and how we did this, together with new stories:

Play it in HD!

Thanks to building this dataset from scratch and analysing dates, we also found that some expense claims for official trips were filed with overlapping dates, and some even covered trips that were never made.
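A date-overlap check of this kind can be sketched in a few lines of Python. This is an illustration, not the team's actual code: the record layout, the claim ids and the sample dates are all hypothetical. It flags any two claims filed by the same person whose date ranges intersect.

```python
from datetime import date
from itertools import combinations

# Hypothetical records: (claim id, official, start date, end date).
trips = [
    ("A-1", "Official X", date(2012, 3, 1), date(2012, 3, 5)),
    ("A-2", "Official X", date(2012, 3, 4), date(2012, 3, 8)),
    ("A-3", "Official X", date(2012, 4, 10), date(2012, 4, 12)),
    ("B-1", "Official Y", date(2012, 3, 2), date(2012, 3, 6)),
]

def overlapping_claims(records):
    """Return pairs of claim ids from the same person whose date ranges intersect."""
    pairs = []
    for a, b in combinations(records, 2):
        same_person = a[1] == b[1]
        # Two closed date ranges intersect when each starts on or before the other ends.
        if same_person and a[2] <= b[3] and b[2] <= a[3]:
            pairs.append((a[0], b[0]))
    return pairs

print(overlapping_claims(trips))
```

Comparing every pair is quadratic in the number of claims, which is fine for a few thousand records; for larger datasets, sorting each person's claims by start date and comparing neighbours would scale better.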

I. THE PROCESS


The 30th Anniversary of Democracy in Argentina

Argentina's recent history changed completely on Sunday, October 30, 1983. Raúl Alfonsín won the elections and became president, which meant a return to democracy and the end of the military dictatorship.

To cover this anniversary, LA NACION developed a transmedia experience that let users follow a simulated real-time election coverage on digital platforms, using the original 1983 content.

They could experience this historic election in real time, 30 years later. This required intense research in our archives, together with multimedia and other materials, to offer context such as the social movements in the weeks before the elections, the TV and radio spots and campaign ads from Raúl Alfonsín, and the newspaper's front pages.

For us it was a different way to relive this special moment, using present-day technologies but not in a traditional digital coverage, which would be a sequential, one-way production. This simulated coverage was live from October 19th to November 1st, 2013, based on a narrative that spread through social media.

Reliving history, in real time

Through the Twitter account @LNvoto83 (https://twitter.com/LNvoto83), the coverage published exclusive documents that were crucial to simulating those days in real time and transporting users through time. All these tweets were published with the hashtag #Voto83.

There, minute by minute, multimedia material, partial vote-count tables, live speeches (at the exact hour, but 30 years later) and historical TV spots were published. This process lasted two weeks.
