On january the 10th 2011 vicepresident Amado Boudou started his administration. From the moment that several articles were published concerning the incorporation of a large quantity of employees to the Nation´s Senate, on january 2012 we developed an Excel application in Visual Basic for Applications, that (executed every month) and that would register the evolution of the transitory, permanent and hired staff of public senate workers, as well as the details of each work force, which differ between each other.
DATA : STEP BY STEP STARTING FROM RAW
Contracted:
The application did several tasks. First, it loaded each month in different Excel pages, identifying the date of the data collection.
Then we added up by date the quantities of each type of recruitment to observe their evolution.
And finally, a spreadsheet with all the consolidated records, which allows to do several different analysis, such as category variation, recurrence identification, lenght of service, common destinations of the workforce, senators that request for employees, etc.
The first article: “Amado Boudou suma 2000 empleados a su administración”
PREPARING DATA FOR ANALYSIS
In the beginning we had the data by date and type separated in different Excel sheets. To be able to produce a better analysis, we needed to have it all together in one sheet, each record with its date and category (whether it was a member of the permanent, temporary or hired staff). In order to do that a Type column was inserted with the letters “T”, “P” or “H”, respectively.
Once we had the data consolidated in one sheet we were able to work with it so as to insert a pivot table.
Because the destiny column had repeated categorys we used Open Refine to normalize it. The “Cluster and Edit” function allowed us to consolidate destinations that were written differently but never the less corresponded to the same category.
We cleaned with Excel´s “search and replace” function the cases we could not normalize with Open Refine, creating an equivalence dictionary. This dictionary will allow us in the future normalize the data with new updates.
So, in the consolidated column we created a new cloumn called “Normalized Destiny”, and in each cell we applied the “Vlookup” formula to make each destiny correspond with its equivalent normalized in the dictionary of destinies.
Then we generated a pivot table with this data:
In the first place, to be able to visualize time evolution, by month/year and by type of hiring:
To be able to evaluate the time growth, we calculated the percentual difference of the total amount of employees on a certain date in comparison to a previous date per type of hiring.
To evaluate which type of hiring had the highest amount of employees, we calculated the percentages of the different types of hiring in comparison to the total amount on a certain date.
We made graphics out of the different series and then we used the conditional formatting as a traffic light to see the ups and downs of each analysis.
We also calculated the difference between the first date and the last one, so as to see the percentage of growth.
For another analysis, we gathered the different destinys and we created a pivot table. Then we filtered the first date of each year and the last date of all the dataset, ordering them in a ranking by the Destiny that hired more employees. We made a graphic with it, as well as by senator.
The second article: “La caja política del senado, nombran hasta 50 asesores por senador”
For a second article, we decided to concentrate on the hired employees, who´s destination corresponded to a senator. For that, we filtered the data of the last publication, and we analyzed it separately. We added additional information for each senator: their province and the political party to which they belong.
Then, we generated a pivot table and a ranking with the quantity of employees of each one. Because every senator has a different amount of benches per party, we decided to calculate the average of employees each one has, arranging them in descendant order.
With the Excel conditional formatting we were able to stand out the senators that had the higher average.
Then we presented it in a Tableau Public data visualization: