This data journalism project transformed, built and opened the first normalized and comprehensible dataset of official advertising in Argentina, covering a four-year period and grouped by company shareholders, who were researched in a separate dataset. It produced front-page and full-page stories in print and home page news on Lanacion.com, all of which had considerable impact. It was led by data journalist José Crettaz, working with the LA NACION data team and the LA NACION interactive and infographics departments.
So this is the story:
More than 2,000 companies and individuals received official advertising in Argentina between 2009 and 2013, but 50% of the total went to 10 media groups … the ones closest to the Government.
Only seven companies received more than 100 million pesos in this period, including three national TV channels (first, third and fourth by audience), four cable news channels and several radio stations. But that list includes none of the largest and most traditional newspapers in Argentina. Even a hairdresser (stylist) received more advertising money than these newspapers.
Discrimination against independent media puts freedom of the press at risk in Argentina, not only through the arbitrary distribution of official ads now documented by this data journalism project, but also through private advertisers being ordered by the government to stop advertising in the country’s top newspapers, in a bid to weaken independent media companies.
All these actions come on top of journalists suffering layoffs and threats from public media or from media receiving most of the official ads, as well as journalists being harassed in public.
The dataset was built from scratch with raw data published with more than two years of delay. After two FOIA-style requests (Argentina still has no FOIA law) from LA NACION and transparency NGOs in Argentina, the Jefatura de Gabinete de Ministros released one year of data in two semester PDFs of 30 or more pages each. The LA NACION data team added this new information to three years of transformed, normalized, cleaned, enriched and opened data, which is again available for everyone in Argentina to reuse.
In Argentina, the federal government is required to report how the funds for official advertising are allocated (purchase of ad space in the media and on street billboards).
Through the website of the Chief of Cabinet's office (www.jgm.gob.ar), the national government presents that information with suspicious delay, in the inflexible PDF format (with tables occupying almost 30 pages in each release), with typos and with no identification of the media in which the ads are finally published. It includes only the name of the person or the business name of whoever signs the receipt.
Information on the distribution of official advertising funds is commonly demanded by civil organizations and media through access-to-information requests and also through the courts.
Although this country still has no FOIA law, Poder Ciudadano (the local chapter of Transparency International), the Association for Civil Rights (ADC) and LA NACION, among others, have been using administrative procedures to access this public information (Decree 1172 of 2003), with little success.
Reporting only the business names of the firms that receive official advertising funds (generally trade names with no relation to owners or shareholders) hides who really receives those funds. Several media groups receive funds through several companies. To prove it, we had to do at least two things:
-convert the PDF to XLS, normalize the data and check for inconsistencies (source of the data: the forms of the Chief of Cabinet's office)
-identify the owners or shareholders that control the firms and establish the links among them (source of the data: the Official Bulletin, where corporate acts and commercial edicts are published, including the partners and directors of companies)
With that information, the firms belonging to the same owner were aggregated into media groups, and in each case the funds received by each firm were summed. Then an interactive Tableau graph was added so users can navigate the data: the media groups and the funds they received, by semester and in total.
For this work we included all the periods reported on the web by the Jefatura de Gabinete de Ministros, from the second semester of 2009 to the first semester of 2013.
Thus, LA NACION took the official data on the distribution of official advertising funds from the website of the Chief of Cabinet's office: how much money each individual or company that received national official advertising collected, with no specification of which media showed the ads. With those data, LA NACION checked the shareholders of the companies in the Official Bulletin of the Argentine Republic and grouped the companies by main shareholder, to find out how much money each business group received.
In addition, LA NACION opened the data, publishing the transformed PDFs for both new semesters as an embedded table with a search box, and presented it on the Lanacion.com home page with an option to download the data in XML or CSV. In November 2013, LA NACION DATA organized the second datafest in Argentina; here you can see how José Crettaz presented this dataset to the audience so that the data miners and journalists there could play with it. Of course, one group took this as a challenge.
How we built this database from scratch. Collection & Analysis.
First we downloaded the PDF files from the official site and then we transformed them into Excel spreadsheets.
The PDFs look like this, and each one has about 30 pages and more than 1,000 rows.
Each row represents a company or individual receiving advertising.
In many cases the different PDF conversion programs did not work. For the ones that did, we verified that the conversion was accurate by checking the totals.
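The totals check described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual procedure: the field names, the sample rows and the assumption that amounts use Argentine formatting (dots for thousands, a comma for decimals) are all hypothetical.

```python
from decimal import Decimal


def parse_amount(text):
    """Parse an amount such as '$ 1.234.567,89' into a Decimal.

    Assumes Argentine formatting: '.' for thousands, ',' for decimals.
    """
    cleaned = text.replace("$", "").replace(" ", "")
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def conversion_matches_total(rows, reported_total):
    """Return True when the converted rows add up to the total printed in the PDF."""
    converted_total = sum((parse_amount(row["amount"]) for row in rows), Decimal("0"))
    return converted_total == parse_amount(reported_total)


# Hypothetical rows, as they might come out of a PDF-to-spreadsheet conversion.
rows = [
    {"recipient": "EXAMPLE MEDIA SA", "amount": "1.250.000,50"},
    {"recipient": "SAMPLE RADIO SRL", "amount": "749.999,50"},
]

print(conversion_matches_total(rows, "2.000.000,00"))
```

Using `Decimal` instead of floats keeps the comparison exact, so a mismatch points to a real conversion problem (a dropped row or a misread digit) rather than to rounding.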
The following video explains the step-by-step process and the tools used for data collection in this project:
This same Excel spreadsheet has a sheet for each period.
One sheet holds a dictionary of equivalences that we call the “Entity business name dictionary”, because the data came without any unique identifier for each entity (the code field does not identify it).
So each time we download a new period, the same entity can appear with a different business name or with typos.
The entity business name dictionary solves this problem. To build it, we loaded the business names found in each period into OpenRefine and unified them with its clustering feature, generating a new, corrected business name.
…and we saved this information in our Excel file…
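To illustrate how clustering can unify business names, here is a minimal sketch of a "fingerprint" key-collision approach, similar in spirit to one of the clustering methods OpenRefine offers. The text does not say which OpenRefine method was used, so this choice is an assumption, the implementation is simplified, and the sample names are invented.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Build a normalized key: no accents, no punctuation, lowercase, sorted unique tokens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    no_punctuation = ascii_name.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(no_punctuation.split()))
    return " ".join(tokens)


def cluster_names(names):
    """Group spellings that share a fingerprint; keep only groups with several variants."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return [variants for variants in clusters.values() if len(set(variants)) > 1]


def build_dictionary(clusters):
    """Map every variant to one corrected name (here simply the first variant seen)."""
    return {variant: variants[0] for variants in clusters for variant in variants}


# Hypothetical business names as they might appear across periods.
names = [
    "Editorial Ejemplo S.A.",
    "EDITORIAL EJEMPLO SA",
    "Ejemplo Editorial S.A",
    "Radio Muestra SRL",
]

name_dictionary = build_dictionary(cluster_names(names))
print(name_dictionary)
```

In practice a person still reviews each proposed cluster before accepting it, since two different companies can have very similar names.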
Then, each time we include a new data period, we add a “VLOOKUP” formula that searches this dictionary and automatically corrects the problem.
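Outside Excel, the same exact-match lookup can be sketched with a plain Python dictionary. The names and the `business_name` field below are hypothetical; the one design choice worth noting is that names missing from the dictionary are collected for review instead of being silently passed through.

```python
# Hypothetical dictionary: name as published -> corrected business name.
name_dictionary = {
    "EDITORIAL EJEMPLO SA": "Editorial Ejemplo S.A.",
    "EDITORAL EJEMPLO S.A.": "Editorial Ejemplo S.A.",
}


def normalize_period(rows, dictionary):
    """Attach a corrected name to every row, like an exact-match lookup.

    Names not found in the dictionary are kept as published and also
    returned separately, so someone can research them and extend the dictionary.
    """
    normalized, unmatched = [], []
    for row in rows:
        raw_name = row["business_name"].strip()
        corrected = dictionary.get(raw_name)
        if corrected is None:
            unmatched.append(raw_name)
            corrected = raw_name
        normalized.append({**row, "normalized_name": corrected})
    return normalized, unmatched


# Hypothetical rows from a newly released period.
new_period = [
    {"business_name": "EDITORAL EJEMPLO S.A.", "amount": 1200},
    {"business_name": "NUEVA EMPRESA SRL", "amount": 300},
]

normalized_rows, names_to_review = normalize_period(new_period, name_dictionary)
print(normalized_rows)
print(names_to_review)
```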
In addition, each company can belong to a media group, so we have another “Media Group Dictionary” in a different sheet. Each company is researched in another dataset, the Official Gazette, which tells us about its shareholders, so the journalists can connect them and assign the group, and the advertising amounts can be added up later.
Both dictionaries were created as Excel tables, so the cells that refer to them refresh automatically.
As with the company names, the group names have a VLOOKUP formula that searches this dictionary and assigns a group to each researched company.
In addition, pivot tables are prepared to group totals by company name or by group of companies. The totals are sorted in descending order.
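The group assignment and the pivot-table totals can be sketched together in Python. The company names, group names and amounts are invented for illustration, and the label used for companies without a researched group is also an assumption.

```python
from collections import defaultdict

# Hypothetical media group dictionary: company -> controlling group.
group_dictionary = {
    "Editorial Ejemplo S.A.": "Grupo Ejemplo",
    "Canal Ejemplo S.A.": "Grupo Ejemplo",
    "Radio Muestra SRL": "Grupo Muestra",
}


def totals_by_group(rows, dictionary, ungrouped_label="(no group assigned)"):
    """Sum the amounts per media group and sort the totals in descending order."""
    totals = defaultdict(int)
    for row in rows:
        group = dictionary.get(row["company"], ungrouped_label)
        totals[group] += row["amount"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized rows for one period.
rows = [
    {"company": "Radio Muestra SRL", "amount": 400},
    {"company": "Editorial Ejemplo S.A.", "amount": 500},
    {"company": "Canal Ejemplo S.A.", "amount": 300},
    {"company": "Peluqueria Ficticia", "amount": 100},
]

for group, total in totals_by_group(rows, group_dictionary):
    print(group, total)
```

Keeping ungrouped companies under an explicit label makes it easy to see how much money still has to be traced to an owner.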
To analyze the monthly percentage change, we created a traffic-light color code that helps our journalists visually detect the relevant changes, with scales that go in descending order from red to green.
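As a rough sketch of this idea, the percentage change and its color can be computed as follows. The thresholds (50% and 10%) and the rule that color depends on the size of the change in either direction are assumptions made for the example; the text does not give the scales actually used.

```python
def percentage_change(previous, current):
    """Percentage change between two periods; None when there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


def traffic_light(change, high=50.0, low=10.0):
    """Color a change by its size: red for the largest moves, green for the smallest.

    The thresholds are hypothetical and would be tuned by the newsroom.
    """
    if change is None:
        return "no baseline"
    magnitude = abs(change)
    if magnitude >= high:
        return "red"
    if magnitude >= low:
        return "yellow"
    return "green"


# Hypothetical totals for one recipient in two consecutive periods.
change = percentage_change(100, 175)
print(change, traffic_light(change))
```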
Then we generate a ranking in which we can analyze the changes in position using the “RANK” formula.
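A minimal Python sketch of this ranking step follows. It imitates the behavior of Excel's RANK in descending order, where tied amounts share a position and the next position is skipped. The group names and amounts are invented.

```python
def rank_descending(totals):
    """Rank names by amount, largest first; ties share a rank, as Excel's RANK does."""
    amounts = sorted(totals.values(), reverse=True)
    return {name: amounts.index(amount) + 1 for name, amount in totals.items()}


def position_changes(previous_totals, current_totals):
    """Positions gained (positive) or lost (negative) between two periods."""
    previous_rank = rank_descending(previous_totals)
    current_rank = rank_descending(current_totals)
    return {
        name: previous_rank[name] - current_rank[name]
        for name in current_rank
        if name in previous_rank
    }


# Hypothetical group totals for two consecutive semesters.
previous_semester = {"Grupo X": 900, "Grupo Y": 700, "Grupo Z": 700}
current_semester = {"Grupo X": 600, "Grupo Y": 800, "Grupo Z": 500}

print(rank_descending(current_semester))
print(position_changes(previous_semester, current_semester))
```

Groups that appear only in the current period have no previous rank, so this sketch leaves them out of the comparison; they could be flagged as new entries instead.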
With this Excel file prepared in advance with preset formulas, detecting the main stories and reporting them is really fast. The same day the PDFs were released, José could report the first story with transformed, normalized and open data here, and he then completed the investigation in the following weeks.
Every time a new semester is released, we add a sheet to our Excel file and every formula recalculates. The same happens when we identify new groups of companies: the formulas run again backwards, refreshing not only the current period but the whole data series.
After the usual data checking processes, we extract a table with the columns that our dataviz team needs, both for print and for interactive visualizations.
Stories and impact:
>> Last updated main story published March 31, 2014 <<
April 1, 2014 – The opposition created a “Media Censorship Index” showing that private ads fell 64% year over year, on top of the official advertising distribution
April 1, 2014 – Another official advertising distribution scandal (LA NACION Editorial)
March 31, 2014 – The story of the hairdresser who earned almost 6 million pesos by distributing official advertising
March 26, 2014 – One by one, explore detailed data on two semesters of advertising distribution, searchable and open data.
January 20, 2014 – Two years without data about official advertising
September 2013 – PART I with front page story and full page in print: How Official Ads benefitted five media groups
Media Impact
– The Perfil news site published the article first thing in the morning, mentioning LA NACION; Perfil then also published the story of the hairdresser.
– Clarín also published the story, mentioning “La Nacion´s investigation”
– Clarín also reported that the hairdresser is a senator's stylist
– The Infobae news site found the hairdresser and published an interview
– TN, a cable news TV channel, included this story in its news bulletins every half hour, all day long
– Many of the main national radio stations, including Radio Mitre and Radio Continental, covered this investigation in their news services and in the editorials of their columnists.
– On Twitter, congressmen and senators retweeted links to Lanacion.com's articles, among them Laura Alonso (PRO, formerly of Poder Ciudadano, Transparency International), María Eugenia Estenssoro (UNEN, Buenos Aires City legislator) and Patricia Bullrich (Unión por Todos-PRO). Other journalists recommended the investigation, such as Carlos Montero (CNN) and local ones.
Local media in Argentina's provinces looked up their local colleagues in the Tableau visualization to extract their own local stories.