The SDR Master Box, including the harmonized survey data file, is now available for download via Dataverse. The SDR Master Box consists of five data files and corresponding documentation: (1) the master file (MASTER) with individual-level data from cross-national surveys, (2) country-level file (PLUG-COUNTRY), (3) country-year-level file (PLUG-COUNTRY-YEAR), (4) survey-level file (PLUG-SURVEY), and (5) wave-level file (PLUG-WAVE). The MASTER file is the core of the Master Box and contains harmonized target variables, harmonization control variables, as well as flags for non-unique records, non-unique case IDs, and missing case IDs, while the other PLUG files contain contextual data, metadata, and data quality indicators.
by Claire Durand, Isabelle Valois and Luis Patricio Peña Ibarra, Department of Sociology, University of Montreal
This article presents the current progress of a research project whose aim is to develop methods to analyze combined micro-data. The most recent papers that were presented (see references) give an insight into the work that has been accomplished to date.
This project was triggered by a preceding project where we aimed at analyzing change in support for Quebec sovereignty over time taking into account that question wordings and specific constitutional choices offered in survey questions varied over time and between surveys. We had identified close to 700 questions asked in polls over a 40-year period. In order to analyze these data, we used a multilevel model where polls were embedded within months. This allowed for analyzing the impact of question characteristics at level 1. Since time itself was at level 2, we could study change in support for sovereignty over time and the impact of events that occurred during each period controlling for question wording and constitutional choices (Yale & Durand, 2011).
This research ended in frustration. We would have liked to answer questions such as whether the impact of age on support for sovereignty was fading over time. This required combining micro-data, not just poll results. Therefore, we decided to combine data sets in order to answer our research questions. However, instead of maintaining the focus on Quebec sovereignty, we shifted the focus to institutional trust.
The first project is Valois’s Trust in Canada, which involves combining survey data over a 40-year period; the second and third projects started with the objective of combining all the surveys that had questions on trust in institutions everywhere in the world. One is Durand et al.’s Trust in the World, which combined the data from all the Barometers conducted outside Europe and the Latin American Public Opinion Project (LAPOP) surveys; the other is Peña Ibarra’s Trust in Latin America, which uses a subset of these data to focus on Central and South America plus Mexico. The basic information on the three projects is presented in Table 1.
by Ilona Wysmułek, Graduate School for Social Research, Polish Academy of Sciences
In corruption research, surveys are among the major sources of our knowledge about the subject (Heath, Richards and de Graaf 2016; Karalashvili, Kraay and Murrell 2015). However, there are several methodological challenges to studying cross-national trends in corruption with public opinion data. Corruption, given its secretive nature, is a phenomenon that is hard to capture in the interview situation. Some respondents are reluctant to answer sensitive questions and some may understand the concept differently than intended by researchers (Azfar and Murrell 2009; Bertrand and Mullainathan 2001). Moreover, international survey projects dealing with corruption continue to face challenges of unequal country representation. Estimation of rare event determinants also remains problematic, given that reported corruption instances are, for most modern democracies, highly infrequent.
To overcome some of these methodological problems, I apply ex-post harmonization of cross-national survey data in corruption research. In my dissertation project, I study corruption perception and individual corruption experience of giving informal payments (as a bribe or a gift) in public schools in Europe. I use cross-national survey data on corruption in public schools in Europe combined with country-level indicators, for example from the World Bank Education Statistics and OECD’s Education at a Glance. I follow the Survey Data Recycling (SDR) framework developed by the research team of Kazimierz M. Slomczynski, which provides a blueprint for ex-post survey data harmonization and for integrating surveys and other data sources (please see corruption project for more detailed information).
by Olena Oleksiyenko, Graduate School for Social Research, Polish Academy of Sciences
This article focuses on issues of harmonizing information on ethnic minority status as part of a larger project on patterns of electoral and non-electoral political participation in post-Soviet states. Specifically, I am interested in differences in political participation between a given country’s Russian-speaking minority and the majority population in Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Tajikistan, Uzbekistan and Ukraine.
There is no single international survey project that adequately covers all the former Soviet republics from the Soviet Union’s collapse to the present. Even projects with the broadest country coverage, such as Life in Transition, do not allow for meaningful over-time comparisons. Hence, for the purpose of ex-post harmonization, I selected international projects that measure people’s electoral and non-electoral participation and ethnic identification in any of the post-Soviet countries. Table 1 presents the list of the international survey projects I included, which, taken together, span the period 1993-2015.
Table 1. International Survey Projects with Relevant Data
Cross-national comparisons of ethnic groups are not as straightforward as they may seem, since in many cases the underlying concept of “minority group” is different in each state. The literature proposes different approaches to increase comparability of the concept. The “absolutist” approach suggests that only one marker of minority status should be taken into account, e.g. citizenship or language. The advantage of such a solution is conceptual clarity, but one can argue that the complexity of minority status cannot be precisely studied with only one indicator. An alternative is the “relativist” approach to harmonization of items on minority status. This involves cross-classification of different ethnic referents to obtain a single, cross-nationally equivalent score on “ethnic minority status” (Lambert 2005). The problem with the “relativist” approach is the low availability of the same markers across all surveys.
The Harmonization Project team, in coordination with Cross-national Studies: Interdisciplinary Research and Training program (CONSIRT.osu.edu), has published the latest issue of Harmonization: Newsletter on Survey Data Harmonization in the Social Sciences.
This issue features articles on a variety of methodological topics. Tom Smith, of NORC at the University of Chicago, discusses recent projects in survey data harmonization. Claire Durand and colleagues at the University of Montreal present their projects on analyzing trust in institutions using surveys pooled across time and countries. Zbigniew Sawinski, long-time methodologist of the Polish Panel Survey POLPAN, presents a schema of inter-wave harmonization of panel data. Two graduate students at the Graduate School for Social Research of the Polish Academy of Sciences discuss their dissertation projects on harmonizing ethnic minority status in surveys of post-Soviet nations (Olena Oleksiyenko) and on harmonizing corruption items in international survey projects (Ilona Wysmułek). We also include news from IPUMS (Catherine A. Fitch) and GESIS (Kristi Winters).
The harmonization community continues to present their research at conferences and workshops around the world. In this issue, we have reports from the International Political Science Association meeting in Poland, the QDET2 in Miami, Florida, the 3MC conference in Chicago, Illinois, and the International Social Survey Programme meeting in Lithuania.
As always, we invite all scholars interested in survey data harmonization to read our newsletter and contribute their articles and news to future editions.
The Harmonization Project has released their first book:
Democratic Values and Protest Behavior: Harmonization of Data from International Survey Projects by Kazimierz M. Słomczyński, Irina Tomescu-Dubrow, and J. Craig Jenkins with Marta Kołczyńska, Przemek Powałko, Ilona Wysmułek, Olena Oleksiyenko, Marcin W. Zieliński and Joshua K. Dubrow. 2016. Warsaw: IFiS Publishers.
Across the world, mass political protest has shaped the course of modern history. Building on decades of theory, we hypothesize that the extent and intensity of political protest is a function of micro-level democratic values and socio-demographics, country-level economic development and democratic practices, and the discrepancy (i.e. cross-level interaction) between a country’s democratic practices and people’s trust in key democratic institutions – that is, political parties, the justice system, and parliament.
This book is a Technical Report on the logic of, and methodology for, creating a multi-year multi-country database needed for comparative research on political protest. It concerns both the selection and ex-post harmonization of survey information and the manner in which the multilevel structured data can be used in substantive analyses.
The database we created contains information on more than two million people from 142 countries or territories, interviewed between the 1960s and 2013. It stores individual-level variables from 1,721 national surveys stemming from 22 well-known international survey projects, including the European Social Survey, the International Social Survey Programme, and the World Values Survey. We constructed comparable measures of people’s participation in demonstrations and signing petitions, their democratic values and socio-demographic characteristics. We complemented the harmonized individual-level data with macro-level measures of democracy, economic performance, and income inequality gathered from external sources. In the process, we pulled together three strands of survey methodology – on data quality, ex-post harmonization, and multilevel modeling.
This book is funded by the (Polish) National Science Center under a three-year international cooperation grant for the Institute of Philosophy and Sociology of the Polish Academy of Sciences (IFiS PAN), and The Ohio State University (OSU) Mershon Center for International Security Studies (grant number: Harmonia-2012/06/M/HS6/00322).
“Survey Weights as Indicators of Data Quality” by Marta Kolczynska, Marcin W. Zielinski, and Przemek Powalko appears in Harmonization newsletter (Summer 2016, v2 n1)
In recent decades, more and more scholars have used weights to correct distortions in surveys. Whether weights improve the quality of the data depends on the quality of the weights themselves, as well as on their ability to correct the discrepancies between the realized sample and the population. In cross-national research, especially when combining survey data from different survey projects, the additional challenge is ensuring that, across national samples, the quality of the weights and the quality of the weighted data are comparable and allow for meaningful analyses of the combined data.
Over time, weighting data has gained popularity as a way of dealing with sampling and non-response errors.
We propose four properties of weights that can serve as indicators both of their quality and of the quality of the data, in terms of the degree of distortion between the targeted sample and the achieved sample. First, the mean value of weights in a sample should be equal to 1; otherwise weighting the data would change the sample size and thus artificially alter standard errors and confidence intervals and lead to unfounded conclusions in hypothesis testing. Second, while weights usually lead to an increase in variance in the data, weights with a smaller variance are generally preferred over weights with greater variance. Weight variance depends on the discrepancy between the achieved sample and the population, or the extent to which the raw data need to be corrected to represent the population. Thus, in some sense, the weight variance can be taken as a rough indicator of the quality of the sample. Third, to avoid case exclusion and the loss of information, weights should have values greater than 0; a case with a weight of 0 would be excluded from analyses. Fourth, extreme values should be avoided because they can introduce bias if the individuals who have been assigned very high weights are specific, unusual, or far from the average.
There is no measurement without error. However, the size of the error can vary depending on the measure used. In social science survey data in particular, the error can be very large: on average, 50 percent of the observed variance in answers to survey questions is error (Alwin 2007). The size of the error varies considerably with the exact formulation of the survey questions used to measure the concepts of interest (Saris and Gallhofer 2014), and also across languages and over time. Thus, to make meaningful comparisons across groups or time, one of the main challenges for cross-sectional and longitudinal surveys is to estimate the size of the measurement error and to correct for it.
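The four properties above can be checked mechanically for each national sample. The following is a minimal sketch, not the authors' own procedure: the function names, the dictionary keys, and the `extreme_ratio` cutoff (a weight counts as extreme when it exceeds that multiple of the mean weight) are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def weight_diagnostics(weights, extreme_ratio=5.0):
    """Summarize the four weight properties for one sample.

    extreme_ratio is a hypothetical cutoff chosen for illustration:
    a weight is flagged as extreme when it exceeds extreme_ratio
    times the mean weight of the sample.
    """
    if not weights:
        raise ValueError("weights must not be empty")
    m = mean(weights)
    return {
        # Property 1: the mean weight should equal 1.
        "mean": m,
        "mean_is_one": abs(m - 1.0) < 1e-6,
        # Property 2: smaller variance is preferred.
        "variance": pvariance(weights),
        # Property 3: weights should be greater than 0.
        "n_nonpositive": sum(1 for w in weights if w <= 0),
        # Property 4: extreme values should be avoided.
        "n_extreme": sum(1 for w in weights if w > extreme_ratio * m),
    }


def rescale_to_mean_one(weights):
    """Divide every weight by the sample mean so the new mean is 1."""
    m = mean(weights)
    if m <= 0:
        raise ValueError("mean weight must be positive to rescale")
    return [w / m for w in weights]
```

For example, `weight_diagnostics([0.5, 1.0, 1.5, 0.0, 7.0], extreme_ratio=3.0)` reports a mean of 2.0, one non-positive weight, and one extreme weight; `rescale_to_mean_one` fixes only the first property and leaves the zero and the extreme weight for the analyst to judge.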
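A short worked illustration shows why correction matters. It assumes the classical measurement model, in which each observed answer y is the sum of a true score t and a random error e, with errors uncorrelated with the true scores and with each other; the model and the symbols below are assumptions for illustration, not a specification taken from the article.

```latex
\begin{align}
y &= t + e, \qquad \operatorname{Cov}(t, e) = 0 \\
q^2 &= \frac{\operatorname{Var}(t)}{\operatorname{Var}(y)}
     = 1 - \frac{\operatorname{Var}(e)}{\operatorname{Var}(y)} \\
\rho(y_1, y_2) &= q_1 \, q_2 \, \rho(t_1, t_2)
\end{align}
```

Under these assumptions, if half of the observed variance of each of two questions is error, then q_1^2 = q_2^2 = 0.5 and q_1 q_2 = 0.5, so the observed correlation is only half the correlation between the true scores. Dividing the observed correlation by q_1 q_2 recovers the latter, which is why the quality of each question has to be estimated before comparing groups, countries, or time points.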
SQP is based on 3,700 quality estimates of questions obtained in more than 30 European countries and languages…
The new issue of Harmonization: Newsletter on Survey Data Harmonization in the Social Sciences is now available. Harmonization is a product of the Harmonization team, and organized by Cross-national Studies: Interdisciplinary Research and Training program (CONSIRT.osu.edu). Working together, we share news and communicate with the growing community of scholars, institutions and government agencies who work on harmonizing social survey data and other projects with similar focus.
Articles in this issue:
Quality of Survey Data: How to Estimate It and Why It Matters by Melanie Revilla, Willem Saris and the Survey Quality Predictor (SQP) team
Estimation Bias due to Duplicated Observations: A Monte Carlo Simulation by Francesco Sarracino and Małgorzata Mikucka
Survey Weights as Indicators of Data Quality by Marta Kołczyńska, Marcin W. Zieliński, and Przemek Powałko
Data, according to the United Nations Statistical Commission, are “the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means” (UNSC 2000: 6). In other words, for information to qualify as data, it needs to be usable. Usable survey data depend on the availability and high quality of documentation.
Survey documentation refers to information on when, where, how and by whom the study was conducted, including information on the type of the sampling, size of the sample, response rate, preparation of the questionnaire and other instruments, as well as pretesting, and fieldwork control. In the Internet age, this information should accompany the survey data set in the form of one or more documents electronically available for viewing and downloading.
The main goal of any statistical analysis using survey data is to draw inferences about the target population. The precondition is that the survey sample is representative of the population. Representativeness can be approached in different ways and met to different degrees.
The researcher ultimately has to decide whether a given survey sample is sufficiently representative to solve their research problem. This decision requires knowledge about sampling, including the sampling scheme, the sampling frame and, if such is the case, details of stratified samples or other methods. For researchers, additional aspects of the survey process, such as response rates and control of fieldwork, are also important to review in order to assess survey data quality.