Differences in Country Identifiers

Countries are a commonly used identifier across datasets in Demscore modules, but country definitions and names often vary, which poses a challenge for merging data. As a general rule, merge conflicts are resolved according to the chosen Output Unit. Merge scripts always prioritize preserving the data quality of the units of that Output Unit. This means that we follow the country definitions of the dataset chosen as Output Unit. When merging from other datasets, we only keep combinations that match the chosen Output Unit.
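The rule above can be illustrated with a minimal sketch. It assumes hypothetical country-year tables stored as Python dictionaries keyed by (country, year); the function name, variable names, and values are invented for illustration and are not taken from the Demscore merge scripts.

```python
# Hypothetical illustration only: not Demscore's actual merge code.
# Rows are keyed by (country, year); all names and values are invented.

def merge_to_output_unit(output_rows, other_rows):
    """Attach variables from other_rows to the rows of the Output Unit.

    Every Output Unit row is kept. Rows of the other dataset whose
    (country, year) key does not exist in the Output Unit are dropped.
    """
    merged = {}
    for key, output_vars in output_rows.items():
        combined = dict(output_vars)
        # Add the other dataset's variables only where the key matches.
        combined.update(other_rows.get(key, {}))
        merged[key] = combined
    return merged


output_unit = {
    ("France", 2000): {"var_a": 1.0},
    ("France", 2001): {"var_a": 2.0},
}
other_dataset = {
    ("France", 2000): {"var_b": 10},
    ("Atlantis", 2000): {"var_b": 99},  # not in the Output Unit, so dropped
}

merged = merge_to_output_unit(output_unit, other_dataset)
```

In this sketch the Output Unit determines which rows exist in the result: the 2001 row is kept even though the other dataset has no match for it, and the unmatched row from the other dataset does not appear at all.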

While this resolves most merge issues, some conflicts remain due to differences in which territories are included in, for example, "France" over time. Below we list the most notable differences in country definitions that users should pay attention to when using data merged on country identifiers. More detailed information can be found in the explanatory notes of the Methodology Document as well as in the Demscore Handbook.

For differences in the full country names across data sources, and for how they are adjusted for merges and translations between Output Units in Demscore, see the original Demscore translation functions in our code, which is publicly available at: https://github.com/demscore/

Cautionary Notes Regarding Country Merges

We want to emphasize that Demscore expresses no opinions on sovereign claims to disputed territories, either through the descriptions below or through any merge decisions. Nor does Demscore make any judgments concerning which territories qualify as countries.

Please also note that we do not claim the following section with cautionary notes on country merges to be complete. New cases are added continuously.

V-Dem uses its own conceptualization and definition of country units, while UCDP/VIEWS uses Gleditsch and Ward. In V-Dem, a "country" is defined as a political unit enjoying at least some degree of functional and/or formal sovereignty. In UCDP, a state is "either an internationally recognised sovereign government controlling a specified territory, or an internationally unrecognised government controlling a specified territory whose sovereignty is not disputed by another internationally recognised sovereign government previously controlling the same territory". Both projects provide very detailed definitions of their country units, but we want to point out where these different definitions and conceptualizations of countries lead to particularly tricky problems when merging data.

Former Soviet republics are included as countries in UCDP/VIEWS from 1991 onward. V-Dem includes these countries already in 1990, because coding of a country begins at the point in time when it is judged to have become an effective governance unit or has gained international recognition, whichever comes first. Separation from a larger unit (e.g., an empire) may result in a small temporal overlap between the end of one unit and the beginning of another. Thus, former Soviet republics are coded from 1990 even though the USSR endures formally until 1991.

V-Dem Cyprus does not include areas that are not under the effective control of the Republic of Cyprus during the period of division (1974-). QoG lists Cyprus before and after the division of the island separately, but does not state whether the measurement includes areas that are not under the effective control of the Republic of Cyprus. V-Dem Cyprus and QoG Cyprus are nevertheless merged for the years after 1974.

UCDP/VIEWS include Eritrea from 1993 onward. V-Dem codes Eritrea as a separate unit, even during periods of rule by Italy and Ethiopia. Data related to the conflict involving Ethiopia and Eritrea are collected under UCDP Ethiopia for all years prior to 1993. As a result, this case does not allow the merging of country identifiers from V-Dem and UCDP/VIEWS, because the mismatch in country definitions goes beyond differently named country identifiers and available combinations of country and year identifiers between the two projects. This and similar cases are marked in the merged datasets as "missing from mismatch" (code: -22222).
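As a minimal sketch of how such a marker could be applied during a merge, the example below distinguishes ordinary missing values from "missing from mismatch". Only the code -22222 comes from the text above; the constant name, the function, the placeholder unit names, and the set of flagged units are hypothetical and do not describe the actual Demscore implementation.

```python
MISSING_FROM_MISMATCH = -22222  # code documented in the text above

# Hypothetical set of Output Unit countries whose definition cannot be
# reconciled with the other dataset. "Unit X" is a placeholder name.
MISMATCHED_UNITS = {"Unit X"}


def value_for_output_row(country, year, other_rows):
    """Return the other dataset's value for one Output Unit row.

    Units flagged as definitional mismatches receive the
    "missing from mismatch" code. Rows that are simply absent from the
    other dataset receive None (an ordinary missing value).
    """
    if country in MISMATCHED_UNITS:
        return MISSING_FROM_MISMATCH
    return other_rows.get((country, year))


# Invented example data for the other dataset.
other_rows = {("Unit Y", 2000): 3}

flagged = value_for_output_row("Unit X", 2000, other_rows)
matched = value_for_output_row("Unit Y", 2000, other_rows)
absent = value_for_output_row("Unit Y", 2001, other_rows)
```

Keeping the mismatch code separate from ordinary missing values lets users filter out rows where no meaningful match exists, rather than treating them as gaps in coverage.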

COMPLAB: France includes Guadeloupe, Martinique, French Guiana, Réunion, Mayotte, Saint Pierre and Miquelon, New Caledonia, French Polynesia, and Wallis and Futuna, while Algeria is unspecified.

QoG: France includes Algeria prior to 1963; COMPLAB France is not specified in this regard. COMPLAB France and QoG France are matched nonetheless.

V-Dem & H-DATA: do not include overseas territories. These countries are merged nonetheless.

UCDP/VIEWS use Gleditsch and Ward country identifiers and include Algeria prior to 1963. COMPLAB France and UCDP/VIEWS France are matched nonetheless.

REPDEM: does not specify. These countries are merged nonetheless.

UCDP/VIEWS include Kosovo from 2008 onward. V-Dem codes Kosovo from 1999 onward. Data related to the conflict in Kosovo is collected under UCDP Serbia (Yugoslavia) for all years prior to 2008. As a result, this case does not allow the merging of country identifiers from V-Dem and UCDP/VIEWS due to the mismatch in country definitions that goes beyond differently named country identifiers and available combinations of country and year identifiers between the two projects. This and similar cases are marked in the merged datasets as "missing from mismatch" (code: -22222).

COMPLAB:
Netherlands: Bonaire, Sint Eustatius and Saba are excluded.

Portugal: Azores and Madeira are included, while Angola, Cape Verde, Guinea-Bissau, Mozambique, Sao Tome and Principe, and Macau are excluded.

Spain: Canary Islands, Balearic Islands, Ceuta, Melilla, Plazas de soberanía are included.

United States of America: Puerto Rico, American Samoa, Guam, Northern Mariana Islands, and U.S. Virgin Islands are included.

QoG: data are compiled from different sources; the exact country definitions are not always given and thus may differ from COMPLAB country definitions. These countries are merged nonetheless.

V-Dem & H-DATA: Netherlands, Portugal, Spain, United Kingdom and United States of America do not include overseas territories. These countries are merged nonetheless.

UCDP/VIEWS: use Gleditsch and Ward country identifiers. These countries are merged nonetheless.

REPDEM: does not specify. These countries are merged nonetheless.

V-Dem includes Palestine/British Mandate, Palestine/Gaza, and Palestine/West Bank. UCDP/VIEWS does not include Palestine, but collects data on events/conflicts happening on Palestinian territories under the location/country border variable for Israel. As a result, this case does not allow the merging of country identifiers from V-Dem and UCDP/VIEWS due to the mismatch in country definitions that goes beyond differently named country identifiers and available combinations of country and year identifiers between the two projects. This and similar cases are marked in the merged datasets as "missing from mismatch" (code: -22222).

V-Dem & H-DATA include "Russian Federation (the)" for the whole time period of their country-year data. QoG USSR (years before 1992) and QoG Russia (from 1993 onward) are thus both matched to V-Dem and H-DATA "Russian Federation (the)".

H-DATA refers to historical Serbia, Yugoslavia, and modern-day Serbia as 'Serbia/Yugoslavia' while QoG includes Yugoslavia (until 1991), Serbia and Montenegro (1992-2005), and Serbia (2006 onward) separately. QoG Yugoslavia, Serbia and Montenegro, and Serbia are thus all merged to H-DATA Serbia/Yugoslavia, accounting for the years.
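A year-dependent translation of this kind can be sketched as a lookup table with year bounds. The table layout and the function name below are assumptions made for illustration, not the structure of the Demscore translation functions; the country names and year ranges are those stated in the paragraph above.

```python
# Each entry: (source name, first year, last year, target name).
# None means the range is open-ended on that side.
# Hypothetical layout; names and years follow the paragraph above.
QOG_TO_HDATA = [
    ("Yugoslavia", None, 1991, "Serbia/Yugoslavia"),
    ("Serbia and Montenegro", 1992, 2005, "Serbia/Yugoslavia"),
    ("Serbia", 2006, None, "Serbia/Yugoslavia"),
]


def translate(name, year, table=QOG_TO_HDATA):
    """Return the target country name for (name, year), or None if the
    source name is not mapped for that year."""
    for source, first, last, target in table:
        if source != name:
            continue
        if first is not None and year < first:
            continue
        if last is not None and year > last:
            continue
        return target
    return None


early = translate("Yugoslavia", 1980)
middle = translate("Serbia and Montenegro", 2000)
late = translate("Serbia", 2010)
out_of_range = translate("Serbia", 1995)
```

Because the year bounds are part of the lookup, several source units can map to one target unit without producing duplicate country-year rows.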

Slovenia and Croatia are included as countries in UCDP/VIEWS from 1992 onward. V-Dem codes Slovenia from 1989 onward and Croatia from 1991 onward (and from 1941-1944 as the Formally independent state of Croatia).

QoG Viet Nam and Vietnam North are merged to V-Dem Vietnam; QoG Vietnam South is merged to V-Dem Republic of Vietnam, and vice versa.

V-Dem Yemen until 1989 is merged to QoG North Yemen.