
Dialnet


Abstract of Exploratory data analysis on large data sets: The example of salary variation in Spanish Social Security Data

Catia Nicodemo, Albert Satorra Brucart

  • New challenges arise in data visualization when the research involves a sizable database. With many data points, classical scatterplots are non-informative due to the cluttering of points. By contrast, simple plots such as the boxplot, which is of limited use in small samples, offer great potential to facilitate group comparison in the case of an extensive sample. This article presents exploratory data analysis methods useful for inspecting variation across groups in crucial variables and detecting heterogeneity. The exploratory data analysis methods (introduced by Tukey in his seminal book of 1977) encompass a set of statistical tools aimed at extracting information from data using simple graphical tools. In this article, some of the exploratory data analysis methods, like the boxplot and scatterplot, are revisited and enhanced using modern graphical computational devices (for example, the heat-map), and their use is illustrated with Spanish Social Security data. We explore how earnings vary across several factors like age, gender, type of occupation, and contract; in particular, the gender gap in salaries is visualized in various dimensions relating to the type of occupation. The exploratory data analysis methods are also applied to assessing and refining competing regressions by plotting residuals-versus-fitted values. The methods discussed should help researchers assess heterogeneity in data and across-group variation, and read classical diagnostic plots of residuals from alternative model fits.
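
As a rough illustration of the two devices the abstract names, the sketch below computes the numbers a Tukey boxplot displays for each group, and the per-cell counts that a heat-map would colour in place of an overplotted scatterplot. It is not the authors' code and does not use their data: the sample is synthetic, and every name (`boxplot_stats`, `bin_counts`, `ages`, `log_wage`, `group`) and every parameter value is an assumption made for illustration.

```python
import random
from collections import Counter
from statistics import quantiles


def boxplot_stats(values):
    """Summary drawn by a Tukey boxplot, with whiskers at 1.5 * IQR."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": max(inside),
        "n_outliers": len(values) - len(inside),
    }


def bin_counts(xs, ys, x_width, y_width):
    """Count points per rectangular cell: the data behind a heat-map."""
    return Counter(
        (int(x // x_width), int(y // y_width)) for x, y in zip(xs, ys)
    )


# Synthetic sample (hypothetical; NOT the Spanish Social Security data).
rng = random.Random(0)
n = 100_000
ages = [rng.uniform(20, 65) for _ in range(n)]
log_wage = [7.0 + 0.01 * a + rng.gauss(0, 0.4) for a in ages]
group = ["A" if rng.random() < 0.5 else "B" for _ in range(n)]

# One boxplot summary per group, for side-by-side group comparison.
stats_by_group = {
    g: boxplot_stats([w for w, k in zip(log_wage, group) if k == g])
    for g in ("A", "B")
}

# Cell counts over (age, log wage); cell widths are arbitrary choices.
cells = bin_counts(ages, log_wage, x_width=5.0, y_width=0.25)
```

With a large sample, the cell counts in `cells` stay readable where 100,000 individual points would merge into a solid blot, and the per-group summaries in `stats_by_group` can be compared directly or passed to any plotting library.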

