Producing evidence data through open data journalism and civic collaboration
Impact on electoral transparency after comparing digitized primary election telegrams with the handwritten forms, using technology and citizen participation in Argentina
In 2015, presidential elections were held in Argentina amid massive protests and complaints of fraud during the primary elections in some provinces. The national electoral system and some provincial ones were under suspicion.
We decided to take action and analyze the relevant public documents, to send a message to the Government that we, the people, were monitoring the process to prevent irregularities or fraud in the final election, and that this could be done using technology for accountability.
A government site had published more than 95,000 telegrams that the authorities of each polling station sent to the postal office, containing the number of votes won by each party and each candidate, how many election monitors were present, etc. The telegrams are not a complete record of the electoral process, but they are the first and original source, a handwritten form, from which all electoral results are processed.
We scraped the official electoral results site and built, from scratch, a complete dataset with the digitized data of every polling station. This let us double-check the results and use data analysis to detect outliers, so we could identify the districts to focus on and prioritize our citizen-control efforts.
Then we downloaded the 95,000 files and grouped the PDFs by province of origin, a crowdsourcing strategy to involve locals in screening and classifying the electoral information of their own province.
To build a structured database from handwritten information, we used VozData, a collaborative tool that converts documents trapped in closed formats into machine-readable open data. It is based on “Crowdata”, the open source platform we built together with our OpenNews fellows Manuel Aristaran and Gaba Rodriguez. This news application was inspired by The Guardian's “MP's Expenses” and ProPublica's “Free the Files”.
We set up a form with these questions:
- Number of citizens that have voted
- Number of envelopes in the ballot box
- Are there any empty cells? (cells left empty, without a dash or a zero, can enable mistakes or fraud by allowing votes to be added for a political party)
- Is there a signature of the President of the polling station?
- National ID number of the President (this allowed us to find extremely young authorities)
- Total number of signatures (authorities and election prosecutors/monitors). Using this, we discovered that some stations lacked any control at all.
- Other possible observations (corrections, etc)
- Do you consider this telegram normal, suspicious or unacceptable?
VozData video demo in English
CIVIC OPEN COLLABORATION: Partnering with NGOs, universities and the general audience!
To launch the site, we partnered with 3 transparency NGOs and 4 universities, and invited our audience to collaborate through our print, website and social media channels.
NGOs and universities quickly showed their volunteer effort under their organization's badge in the platform's teams feature.
Healthy competition between teams helped increase participation and the completion of some projects/provinces. As an example, this is the final team ranking for telegrams reviewed in Tucumán Province (completed).
We held 3 civic marathons (photo album 1, photo album 2, photo album 3) at LA NACION, and one took place at the participating university in Córdoba province.
While we were digitizing the original handwritten information, the government set up a username and password to access the site from which we had previously downloaded the files. As we had already uploaded each telegram to DocumentCloud for the VozData platform, the process could continue without interruption.
We completed the process of building this dataset from the PDF documents of 7 provinces, and analyzed 10 to 20% of the telegrams from the rest of the provinces of our country.
After digging into the new database and combining variables, we found that 48% of the telegrams reviewed had irregularities, ranging from minor inconsistencies to huge, unacceptable corrections.
A. Behind the scenes of analyzing the new database
After downloading the file containing all telegrams processed in VozData, we obtained a CSV file with 40,700 records, which we opened in an Excel spreadsheet.
When the VozData database is exported, all rows are included, even those that have not yet been verified by the system. Therefore, we first filtered the rows with a formula that flags the verified records, which left us with 16,311 rows of data.
With a smaller base that was easier to work with, we added a series of filters and formulas to draw our own conclusions.
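The same filtering step can be sketched outside Excel. This is only an illustration: the export's real column names are not given here, so `verified`, `document_id` and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical export: the real VozData column names are not stated in the text.
SAMPLE_EXPORT = """document_id,province,verified
1001,Tucumán,True
1002,Tucumán,False
1003,Jujuy,True
"""


def keep_verified(rows, flag_column="verified"):
    """Return only the rows whose verification flag is set."""
    return [row for row in rows if row[flag_column].strip().lower() == "true"]


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
verified_rows = keep_verified(rows)
print(len(rows), "rows exported,", len(verified_rows), "verified")
```

With the real export, `io.StringIO(SAMPLE_EXPORT)` would be replaced by the opened CSV file.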
For each field of the form, we considered a value to be a mistake if it met any of these rules:
- Voters > 350
- Envelopes > 350
- Wrong DNIs (IDs)
- Wrong amount of signatures
Rows that met any of these rules were marked with a “1” in a column titled Problems.
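As a rough sketch of how such a Problems column could be computed, the snippet below applies the four rules to one row. Only the 350 threshold comes from the rules above; what counts as a "wrong" DNI or a "wrong" number of signatures is not spelled out, so both checks, and the field names, are assumptions made for illustration.

```python
MAX_PER_STATION = 350  # threshold taken from the rules above


def dni_looks_wrong(dni):
    # Assumption: a DNI is "wrong" when it is empty or not 7 to 8 digits long.
    digits = dni.replace(".", "").strip()
    return not (digits.isdigit() and 7 <= len(digits) <= 8)


def signatures_look_wrong(count):
    # Assumption: a telegram with no signatures at all is "wrong".
    return count < 1


def problems_flag(row):
    """Return 1 if the row meets any rule, else 0 (the Problems column)."""
    checks = (
        row["voters"] > MAX_PER_STATION,
        row["envelopes"] > MAX_PER_STATION,
        dni_looks_wrong(row["president_dni"]),
        signatures_look_wrong(row["signatures"]),
    )
    return 1 if any(checks) else 0


# Hypothetical rows, not real telegram data.
clean = {"voters": 280, "envelopes": 280, "president_dni": "28.123.456", "signatures": 4}
odd = {"voters": 512, "envelopes": 280, "president_dni": "28.123.456", "signatures": 4}
print(problems_flag(clean), problems_flag(odd))
```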
The following PivotTable was created, with a summary per province, where we calculated the total number of telegrams, the number of irregularities, the percentage of irregularities over the total, the number of cases without the signature of the President, the number with empty cells, the number of cases without any data on the envelopes and the number of wrong DNIs (IDs) of the President (highest authority) of each polling station.
We also classified them according to different criteria, adding new columns:
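A per-province summary of this kind can also be approximated in a few lines of code. The sketch below only computes the total, the number of irregular telegrams and the percentage; the rows and the `problems` field are made-up examples, not the project's data.

```python
from collections import defaultdict

# Hypothetical rows: province plus a 0/1 "problems" flag per telegram.
rows = [
    {"province": "Catamarca", "problems": 1},
    {"province": "Catamarca", "problems": 1},
    {"province": "Catamarca", "problems": 0},
    {"province": "Córdoba", "problems": 0},
    {"province": "Córdoba", "problems": 1},
]


def summarize_by_province(rows):
    """Count telegrams and irregular telegrams per province, with a percentage."""
    totals = defaultdict(lambda: {"telegrams": 0, "irregular": 0})
    for row in rows:
        entry = totals[row["province"]]
        entry["telegrams"] += 1
        entry["irregular"] += row["problems"]
    return {
        province: {
            **entry,
            "pct_irregular": round(100 * entry["irregular"] / entry["telegrams"], 1),
        }
        for province, entry in totals.items()
    }


summary = summarize_by_province(rows)
for province, stats in summary.items():
    print(province, stats)
```

The other counts in the PivotTable (missing signatures, empty cells, and so on) would follow the same pattern, one counter per flag.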
Range of voters
Range of envelopes
Difference between envelopes and voters
Range of signatures
Range of DNIs
For each of these columns, a formula was applied that evaluated every row.
For example:
=+IF([@[Number of citizens who have voted]]=0;"Zero";IF([@[Number of citizens who have voted]]>350;"More 350";IF([@[Number of citizens who have voted]]<10;"Less 10";"Between 10 and 350")))
This formula made it possible to tag each record with its corresponding range of voters. The same procedure was repeated for each of the other columns.
With all these criteria, we could make a more detailed analysis by creating different PivotTables:
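For readers who prefer code to spreadsheet formulas, the nested IF above translates directly into a small function. The labels and thresholds are copied from the formula; the function name is ours.

```python
def voter_range(voters):
    """Tag a voter count with the same labels as the Excel formula."""
    if voters == 0:
        return "Zero"
    if voters > 350:
        return "More 350"
    if voters < 10:
        return "Less 10"
    return "Between 10 and 350"


for count in (0, 7, 280, 512):
    print(count, "->", voter_range(count))
```

The order of the checks matters: zero is tested first, so a count of 0 is tagged "Zero" rather than "Less 10", exactly as in the spreadsheet.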
For each table, the following fields were evaluated:
Classification
Is there a signature of the President?
Is the DNI of the President present?
Envelopes in the ballot box
Number of voters
Empty boxes
Number of signatures
Observations
Different tables were created: some containing absolute values and others containing percentages.
The percentage per province was also included. Conditional formatting was always used to highlight values on a color scale, from the highest values in red, through yellow, to the lowest values in green.
It must be taken into account that all this work was done in record time: on the day all the information was uploaded, the data analysis was also done and conclusions were drawn, so that the article could be published the following day.
Conclusions:
In total, we analyzed 16,311 telegrams from 13 provinces.
48% of the telegrams included irregularities (no signature of the President, empty boxes, no envelopes in the ballot box, or no DNI of the President).
The 3 provinces with the most irregularities:
1. Catamarca (75% of the telegrams)
2. Chaco (65%)
3. Jujuy (57%)
The 3 provinces with the fewest irregularities:
1. Córdoba (33%)
2. Santa Cruz (36%)
3. Misiones (36%)
7% of the telegrams did not mention the DNI
15% of the telegrams had 0 envelopes
37% of the telegrams had empty boxes
3% lacked the signature of the President
The provinces with all of their polling stations processed are: Tucumán, Santa Cruz, Tierra del Fuego, Catamarca, La Rioja, Formosa, Jujuy.
The provinces with their polling stations partially processed are: Córdoba, Misiones, City of Buenos Aires, Chaco, Buenos Aires Province, Entre Ríos.
A main article reporting the findings was published in our print and online editions.
Data was published in CSV format for all the provinces that were completed.
IMPACT:
The national electoral authority mentioned the findings of the VozData Primaries Telegrams Project during a press conference 2 days before the general elections. He also announced that they had decided to change the structure of the telegrams to match other electoral documents for the general elections, in order to avoid the inconsistencies and problems found in the previous process. https://youtu.be/BiPYGGaG0XY (subtitles in English)
We can't affirm that this measure was solely a direct reaction to our reporting, but the pressure of the information gathered by 3 NGOs and 4 universities teamed with LA NACION lets us conclude that it helped change the rules used in the previous elections.
And after 12 years of the internationally questioned “Kirchnerismo”, a new opposition government won the elections at the national level, in the Province of Buenos Aires and in the Capital City.
The code driving VozData was open sourced by OpenNews Fellows and named Crowdata.
B. Behind the scenes for collecting data: Side B
C. Other Vozdata Projects. Enjoy the VIDEO!