A complete guide to using pyqog for QoG data analysis.
Make sure you have pyqog installed. If not, see the
Installation page.
import pyqog
import pandas as pd
The core function is read_qog(), which downloads a QoG dataset and returns it
as a pandas DataFrame. On the first call it downloads from the QoG servers; subsequent calls
use the local cache.
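The download-once behaviour described above can be pictured with a small, self-contained sketch. It does not reproduce pyqog's internals: `load_with_cache` and `fake_download` are hypothetical names, and the "download" is a stand-in function rather than a real network request.

```python
import tempfile
from pathlib import Path


def load_with_cache(cache_file: Path, download) -> str:
    """Return the cached text if it exists; otherwise download and store it."""
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = download()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text


calls = []


def fake_download() -> str:
    # Hypothetical stand-in for a network request; records each invocation.
    calls.append(1)
    return "cname,year\nBrazil,2020\n"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cache" / "example.csv"
    first = load_with_cache(target, fake_download)   # triggers the download
    second = load_with_cache(target, fake_download)  # served from the cache file

print(f"Downloads performed: {len(calls)}")
```

Only the first call reaches the downloader; the second reads the file written by the first.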
The Basic dataset contains a curated selection of the most commonly used governance and institutional quality indicators. It is a good starting point for most analyses.
# Download basic time-series dataset (latest version)
df = pyqog.read_qog()
# Explore the data
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()[:10]}...")
print(df.head())
By default, read_qog() downloads the Basic dataset in
time-series format, using the most recently published version (2026).
The Standard dataset is the most comprehensive QoG dataset, containing approximately 2,000 variables from hundreds of data sources. Use it when you need access to a wide range of indicators.
# Download standard time-series dataset
df_std = pyqog.read_qog(which_data="standard")
print(f"Variables: {len(df_std.columns)}")
print(f"Rows: {len(df_std)}")
Note: The Standard dataset is large (typically 50+ MB as CSV), so the first download may take some time. Once cached, it loads from disk without another download.
The OECD dataset focuses on OECD member countries with indicators particularly relevant to developed economies.
# Download OECD time-series dataset
df_oecd = pyqog.read_qog(which_data="oecd")
# Check which countries are included
print(df_oecd["cname"].unique())
QoG datasets come in two formats:
Time-series: multiple observations per country across different years. Each row is a country-year pair. Use this format for panel data analysis, trends over time, and longitudinal studies.
# Time-series (default)
df_ts = pyqog.read_qog(
    data_type="time-series"
)
# Columns: cname, year, ...
Cross-sectional: one observation per country (typically the most recent available value for each indicator). Use this format for snapshots and cross-country comparisons at a single point in time.
# Cross-sectional
df_cs = pyqog.read_qog(
    data_type="cross-sectional"
)
# Columns: cname, ...
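The relationship between the two formats can be illustrated on a toy panel. The sketch below collapses country-year rows into one row per country by taking the most recent non-missing value of each indicator. The data and the column names `indicator_a` and `indicator_b` are invented, and this only approximates the idea; it is not the procedure QoG uses to build its cross-sectional files.

```python
import pandas as pd

# Toy panel with made-up values; the indicator columns are hypothetical.
panel = pd.DataFrame({
    "cname": ["Brazil", "Brazil", "Brazil", "Chile", "Chile"],
    "year": [2018, 2019, 2020, 2019, 2020],
    "indicator_a": [1.0, 2.0, None, 5.0, 6.0],
    "indicator_b": [10.0, None, None, 40.0, None],
})

# GroupBy.last() returns the last non-missing value in each column,
# so sorting by year first picks the most recent available observation.
snapshot = (
    panel.sort_values("year")
    .drop(columns="year")
    .groupby("cname")
    .last()
    .reset_index()
)
print(snapshot)
```

Note that different indicators in the same row can then come from different years, which is worth keeping in mind when comparing countries in a snapshot built this way.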
Since read_qog() returns a pandas DataFrame, you can use all standard
pandas filtering methods.
df = pyqog.read_qog()
# Single country
brazil = df[df["cname"] == "Brazil"]
print(brazil.head())
# Multiple countries
latam = df[df["cname"].isin(["Brazil", "Argentina", "Chile", "Colombia"])]
print(latam.head())
# Single year
df_2020 = df[df["year"] == 2020]
# Year range
df_recent = df[(df["year"] >= 2010) & (df["year"] <= 2020)]
# Brazil from 2000 to 2020
brazil_2000s = df[
    (df["cname"] == "Brazil") &
    (df["year"] >= 2000) &
    (df["year"] <= 2020)
]
print(brazil_2000s[["cname", "year"]].head())
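A combined country-and-period filter can also be written with pandas' `Series.between`, which is inclusive on both ends by default. The sketch below uses a small invented frame (`df_toy`) in place of a downloaded dataset so that it runs without network access.

```python
import pandas as pd

# Toy stand-in for a QoG time-series frame; values are made up.
df_toy = pd.DataFrame({
    "cname": ["Brazil", "Brazil", "Brazil", "Chile"],
    "year": [1999, 2000, 2020, 2010],
})

# between(2000, 2020) keeps rows where 2000 <= year <= 2020.
mask = (df_toy["cname"] == "Brazil") & df_toy["year"].between(2000, 2020)
brazil_window = df_toy[mask]
print(brazil_window)
```

Building the mask separately makes it easy to reuse the same condition on several frames or to inspect how many rows it selects.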
# Select only the columns you need
cols = ["cname", "year", "ccode", "wdi_gdpcapcon2017", "vdem_corr"]
subset = df[cols].dropna()
print(subset.head())
QoG datasets contain many variables. Use search_variables() to find columns
matching a pattern.
df = pyqog.read_qog(which_data="standard")
# Search for variables related to corruption
corruption_vars = pyqog.search_variables(df, "corrupt")
print(corruption_vars)
# ['ti_cpi', 'vdem_corr', 'wbgi_cce', ...]
# Search for GDP-related variables
gdp_vars = pyqog.search_variables(df, "gdp")
print(gdp_vars)
# Search for democracy indicators
demo_vars = pyqog.search_variables(df, "demo")
print(demo_vars)
The search is case-insensitive and matches any part of the column name. For a complete description of each variable, refer to the codebooks.
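To make the matching rule concrete, here is a minimal sketch of case-insensitive substring matching on column names. `find_columns` is a hypothetical helper written for illustration, not pyqog's implementation, and `Example_GDP` is an invented column name used to show the case handling.

```python
def find_columns(columns, pattern):
    """Return the column names that contain `pattern`, ignoring case."""
    needle = pattern.lower()
    return [name for name in columns if needle in name.lower()]


# Column names taken from this guide, plus one invented mixed-case name.
example_columns = [
    "cname", "year", "ti_cpi", "vdem_corr", "wbgi_cce",
    "wdi_gdpcapcon2017", "Example_GDP",
]

gdp_matches = find_columns(example_columns, "gdp")
corr_matches = find_columns(example_columns, "CORR")
print(gdp_matches)
print(corr_matches)
```

Under plain substring matching on names, a short stem such as "corr" is what matches `vdem_corr`. Abbreviated names like `ti_cpi` would match only if the search also covered variable labels, which this sketch does not do.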
QoG maintains an archive of previous dataset versions. You can download any past version
by specifying the year parameter. This refers to the year of publication,
not the year of the data itself.
# Download the 2020 version of the basic dataset
df_2020 = pyqog.read_qog(which_data="basic", year=2020)
# Download the 2018 version of the standard dataset
df_std_2018 = pyqog.read_qog(which_data="standard", year=2018)
# List available versions for a dataset
versions = pyqog.list_versions("standard")
print(versions)
# [2026, 2025, 2024, 2023, 2022, 2021, 2020, ...]
Important: The year parameter refers to the
publication year of the dataset, not the year of the data. For example,
year=2020 downloads the dataset version published in January 2020, which
contains data up to approximately 2018-2019.
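Because the publication year and the data years differ, it can help to check the coverage of a loaded frame directly. The sketch below does this on an invented frame; `indicator_a` is a hypothetical column, and the same two expressions could be applied to a DataFrame returned by read_qog().

```python
import pandas as pd

# Toy frame standing in for one archived version; values are made up.
toy = pd.DataFrame({
    "cname": ["Brazil", "Brazil", "Brazil"],
    "year": [2017, 2018, 2019],
    "indicator_a": [1.0, 2.0, None],
})

# Latest year that appears in the frame at all.
latest_row_year = int(toy["year"].max())

# Latest year in which a given indicator actually has a value.
has_value = toy["indicator_a"].notna()
latest_value_year = int(toy.loc[has_value, "year"].max())

print(latest_row_year, latest_value_year)
```

The two numbers can differ: a row for a year may exist while a particular indicator is still missing for it.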
Since pyqog returns pandas DataFrames, you can use any Python visualization
library. Here are some examples with matplotlib and seaborn.
import matplotlib.pyplot as plt
df = pyqog.read_qog(which_data="standard")
# Filter for select countries
countries = ["Brazil", "Argentina", "Chile", "Mexico"]
subset = df[df["cname"].isin(countries)]
# Plot GDP per capita over time
fig, ax = plt.subplots(figsize=(10, 6))
for country in countries:
    data = subset[subset["cname"] == country]
    ax.plot(data["year"], data["wdi_gdpcapcon2017"], label=country)
ax.set_xlabel("Year")
ax.set_ylabel("GDP per capita (constant 2017 USD)")
ax.set_title("GDP per capita: Latin American Countries")
ax.legend()
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
# Use cross-sectional data for a snapshot
df_cs = pyqog.read_qog(which_data="standard", data_type="cross-sectional")
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(
    df_cs["wdi_gdpcapcon2017"],
    df_cs["vdem_corr"],
    alpha=0.6,
    edgecolors="white",
    linewidth=0.5
)
ax.set_xlabel("GDP per capita (constant 2017 USD)")
ax.set_ylabel("Corruption Index (V-Dem)")
ax.set_title("Corruption vs GDP per capita")
plt.tight_layout()
plt.show()
import seaborn as sns
import matplotlib.pyplot as plt
df_cs = pyqog.read_qog(which_data="standard", data_type="cross-sectional")
# Top 20 countries by GDP per capita
top20 = df_cs.nlargest(20, "wdi_gdpcapcon2017")
fig, ax = plt.subplots(figsize=(12, 6))
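# Note: seaborn >= 0.13 warns when palette is passed without hue;
# on those versions, also pass hue="cname" and legend=False.
sns.barplot(
    data=top20,
    x="wdi_gdpcapcon2017",
    y="cname",
    palette="viridis",
    ax=ax
)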
ax.set_xlabel("GDP per capita (constant 2017 USD)")
ax.set_ylabel("")
ax.set_title("Top 20 Countries by GDP per Capita")
plt.tight_layout()
plt.show()
pyqog automatically caches downloaded datasets to avoid repeated downloads.
# First call: downloads from the internet
df = pyqog.read_qog()
# Second call: loads from local cache (instant)
df = pyqog.read_qog()
# Force a fresh download, overwriting cache
df = pyqog.read_qog(update_cache=True)
# Download without saving to cache
df = pyqog.read_qog(cache=False)
# Use a custom directory for cached files
df = pyqog.read_qog(data_dir="/path/to/my/cache")
By default, cached files are stored in ~/.pyqog/cache/. Each file follows the
naming pattern qog_{dataset}_{type}_{version}.csv.
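The naming pattern can be sketched with `pathlib`. `cache_path` is a hypothetical helper, and the concrete values substituted for `{type}` and `{version}` below ("ts" and 2026) are assumptions chosen for illustration; the strings pyqog actually writes may differ.

```python
from pathlib import Path


def cache_path(dataset, data_type, version, cache_dir="~/.pyqog/cache"):
    """Build a cache file path following qog_{dataset}_{type}_{version}.csv."""
    filename = f"qog_{dataset}_{data_type}_{version}.csv"
    return Path(cache_dir).expanduser() / filename


# "ts" and 2026 are placeholder values, not confirmed pyqog strings.
path = cache_path("basic", "ts", 2026)
print(path.name)
print(path.exists())  # whether a file with that name is already cached
```

A helper like this is mainly useful for checking whether a given file is already on disk, or for clearing a single cached file by hand.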
Each QoG dataset has an accompanying codebook (PDF) that describes every variable in detail. You can get the URL programmatically:
# Get the codebook URL for the standard dataset
url = pyqog.get_codebook_url("standard", 2026)
print(url)
# https://www.qogdata.pol.gu.se/data/codebook_std_jan26.pdf
# Get codebook for an archived version
url_old = pyqog.get_codebook_url("basic", 2020)
print(url_old)
# https://www.qogdata.pol.gu.se/dataarchive/codebook_bas_jan20.pdf
For a complete list of codebook links, see the Datasets page.