SDR Data Structure and Documentation
The SDR database comprises five data files, which contain data measured at different levels:
- The Master file (MASTER) stores individual-level data from cross-national surveys: harmonized target variables, harmonization control variables, as well as flags for non-unique records, non-unique case IDs, and missing case IDs,
- Country-level file (PLUG-COUNTRY),
- Country-year-level file (PLUG-COUNTRY-YEAR),
- Survey-level file (PLUG-SURVEY),
- Wave-level file (PLUG-WAVE).
Table 2 below lists the name and contents of each file, together with its key variables and case identifiers.
The core of the SDR database is the Master file, which stores the harmonized individual-level survey data. In separate data files, we store (a) information describing the survey process (e.g., original response rates) and indicators of source data quality (e.g., inconsistencies between data and documentation, or non-unique cases), and (b) contextual data from various publicly available sources (e.g., country population or GDP). These additional data are stored in four PLUG files, one for each level of measurement: PLUG-COUNTRY, PLUG-COUNTRY-YEAR, PLUG-SURVEY, and PLUG-WAVE.
Table 2. Structure of the SDR database
|Name of file||Content description||Key variables / case identifiers|
|MASTER||Individual (respondent) level data, flags for non-unique records and missing or non-unique case IDs||
|PLUG-COUNTRY||Country-level data (names of geographical macro and micro-regions, country codes)||
|PLUG-COUNTRY-YEAR||Demographic and economic indicators, democracy and governance indicators||T_COUNTRY_L1U T_COUNTRY_YEAR|
|PLUG-SURVEY||Characteristics of national surveys (sampling method and response rate), indicators of survey quality as reflected in the documentation, availability and correctness of survey weights||T_SURVEY_NAME T_SURVEY_EDITION T_COUNTRY_L1U T_COUNTRY_SET|
|PLUG-WAVE||Discrepancies between data and documentation||T_SURVEY_NAME T_SURVEY_EDITION|
The PLUG data files are linked to the MASTER file by one or more key variables in the form of one-to-many merges.
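As a minimal sketch of such a one-to-many merge, the Python example below attaches rows of a PLUG-COUNTRY-YEAR-like table to MASTER-like records. Only the key variable names T_COUNTRY_L1U and T_COUNTRY_YEAR come from Table 2; the row values, the `case_id` and `gdp_index` fields, and the helper `merge_one_to_many` are hypothetical and serve only to illustrate the idea.

```python
# Key variables for PLUG-COUNTRY-YEAR as listed in Table 2.
KEYS = ("T_COUNTRY_L1U", "T_COUNTRY_YEAR")

# Toy rows: all values and the fields case_id / gdp_index are invented.
master = [
    {"case_id": 1, "T_COUNTRY_L1U": "PL", "T_COUNTRY_YEAR": 2002},
    {"case_id": 2, "T_COUNTRY_L1U": "PL", "T_COUNTRY_YEAR": 2002},
    {"case_id": 3, "T_COUNTRY_L1U": "DE", "T_COUNTRY_YEAR": 2004},
]
plug_country_year = [
    {"T_COUNTRY_L1U": "PL", "T_COUNTRY_YEAR": 2002, "gdp_index": 100},
    {"T_COUNTRY_L1U": "DE", "T_COUNTRY_YEAR": 2004, "gdp_index": 120},
]


def merge_one_to_many(master_rows, plug_rows, keys):
    """Attach each PLUG row to every MASTER row that shares its key values."""
    lookup = {}
    for row in plug_rows:
        key = tuple(row[k] for k in keys)
        # The PLUG side must be unique on the key, otherwise the merge
        # would no longer be one-to-many.
        if key in lookup:
            raise ValueError(f"duplicate key in PLUG file: {key}")
        lookup[key] = row

    merged = []
    for row in master_rows:
        key = tuple(row[k] for k in keys)
        # MASTER rows without a matching PLUG row are kept unchanged.
        merged.append({**lookup.get(key, {}), **row})
    return merged


merged = merge_one_to_many(master, plug_country_year, KEYS)
for row in merged:
    print(row)
```

The uniqueness check on the PLUG side is the design choice that matters here: each contextual record may be copied to many respondents, but no respondent should receive more than one record per PLUG file.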
The defining feature of the data structure is the nesting of individuals in national surveys. A national survey is identified uniquely by the combination of project*wave*country*set, which is conceptually equivalent to project*year*country. The exceptions are WVS/3/CO and WVS/4/MA, which have two samples each; for these, the additional distinguishing variable Set must be used.
For SDR database v1.0, a detailed description of each data file and its full documentation are available at Dataverse.
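The role of the set component can be illustrated with a short Python sketch. The field names `project`, `wave`, `country`, and `set`, the set values 1 and 2, and the toy respondent rows are assumptions made for illustration; they are not the actual SDR variable names or data.

```python
from collections import Counter

# Toy respondent rows (invented). WVS/3/CO and WVS/4/MA each get two
# samples, distinguished only by the hypothetical "set" field.
respondents = [
    {"project": "WVS", "wave": 3, "country": "CO", "set": 1},
    {"project": "WVS", "wave": 3, "country": "CO", "set": 1},
    {"project": "WVS", "wave": 3, "country": "CO", "set": 2},
    {"project": "WVS", "wave": 4, "country": "MA", "set": 1},
    {"project": "WVS", "wave": 4, "country": "MA", "set": 2},
]


def national_survey_id(row, use_set=True):
    """Build the project*wave*country(*set) identifier as a tuple."""
    parts = [row["project"], row["wave"], row["country"]]
    if use_set:
        parts.append(row["set"])
    return tuple(parts)


# Count respondents per national survey, with and without the set component.
with_set = Counter(national_survey_id(r) for r in respondents)
without_set = Counter(national_survey_id(r, use_set=False) for r in respondents)

print(len(with_set), "national surveys when set is included")
print(len(without_set), "national surveys when set is omitted")
```

Omitting the set component merges the two samples of each of these surveys into one group, which is why it has to be part of the identifier.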