Tutorial — pyqog

Getting Started

Make sure you have pyqog installed. If not, see the Installation page.

import pyqog
import pandas as pd

The core function is read_qog(), which downloads a QoG dataset and returns it as a pandas DataFrame. On the first call it downloads from the QoG servers; subsequent calls use the local cache.

Downloading the Basic Dataset

The Basic dataset contains a curated selection of the most commonly used governance and institutional quality indicators. It is a good starting point for most analyses.

# Download basic time-series dataset (latest version)
df = pyqog.read_qog()

# Explore the data
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()[:10]}...")
print(df.head())

By default, read_qog() downloads the basic dataset in time-series format for the most recent year of publication (2026).

Downloading the Standard Dataset

The Standard dataset is the most comprehensive QoG dataset, containing approximately 2,000 variables from hundreds of data sources. Use it when you need access to a wide range of indicators.

# Download standard time-series dataset
df_std = pyqog.read_qog(which_data="standard")

print(f"Variables: {len(df_std.columns)}")
print(f"Rows: {len(df_std)}")

Note: The Standard dataset is large (typically 50+ MB as CSV), so the first download may take some time. Later calls read the file from the local cache instead of downloading it again.

Downloading the OECD Dataset

The OECD dataset focuses on OECD member countries with indicators particularly relevant to developed economies.

# Download OECD time-series dataset
df_oecd = pyqog.read_qog(which_data="oecd")

# Check which countries are included
print(df_oecd["cname"].unique())

Time-Series vs Cross-Sectional

QoG datasets come in two formats:

Time-Series (TS)

Multiple observations per country across different years. Each row is a country-year pair. Use this format for panel data analysis, trends over time, and longitudinal studies.

# Time-series (default)
df_ts = pyqog.read_qog(
    data_type="time-series"
)
# Columns: cname, year, ...

Cross-Sectional (CS)

One observation per country (typically the most recent available value for each indicator). Use this for snapshots and cross-country comparisons at a single point in time.

# Cross-sectional
df_cs = pyqog.read_qog(
    data_type="cross-sectional"
)
# Columns: cname, ...
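
The relationship between the two formats can be illustrated on a toy panel. The sketch below uses made-up values and a hypothetical indicator column named score. It collapses a country-year table to one row per country by keeping the latest non-missing value, which mirrors the description above but is not necessarily how QoG builds its cross-sectional files.

```python
import pandas as pd

# Toy time-series panel: one row per country-year (values are made up)
ts = pd.DataFrame({
    "cname": ["Brazil", "Brazil", "Brazil", "Chile", "Chile", "Chile"],
    "year": [2018, 2019, 2020, 2018, 2019, 2020],
    "score": [0.40, 0.42, None, 0.70, None, 0.75],  # hypothetical indicator
})

# Sort by year, then keep the last non-missing value per country.
# GroupBy.last() skips missing values, so Brazil falls back to its 2019 value.
cs = (
    ts.sort_values("year")
      .groupby("cname")[["score"]]
      .last()
      .reset_index()
)
print(cs)
```

Note that the value kept for each country can come from a different year, which is worth remembering when comparing countries in a cross-sectional file.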

Filtering by Country and Year

Since read_qog() returns a pandas DataFrame, you can use all standard pandas filtering methods.

Filter by country

df = pyqog.read_qog()

# Single country
brazil = df[df["cname"] == "Brazil"]
print(brazil.head())

# Multiple countries
latam = df[df["cname"].isin(["Brazil", "Argentina", "Chile", "Colombia"])]
print(latam.head())

Filter by year

# Single year
df_2020 = df[df["year"] == 2020]

# Year range
df_recent = df[(df["year"] >= 2010) & (df["year"] <= 2020)]

Combine filters

# Brazil from 2000 to 2020
brazil_2000s = df[
    (df["cname"] == "Brazil") &
    (df["year"] >= 2000) &
    (df["year"] <= 2020)
]
print(brazil_2000s[["cname", "year"]].head())
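
The same combined filter can be written more compactly with standard pandas helpers. The sketch below runs on a small made-up table rather than a real QoG download, and shows Series.between() and DataFrame.query() producing the same rows.

```python
import pandas as pd

# Toy country-year table (values are made up)
df = pd.DataFrame({
    "cname": ["Brazil", "Brazil", "Brazil", "Chile", "Chile"],
    "year": [1995, 2005, 2021, 2005, 2015],
})

# Series.between() is inclusive on both ends by default
mask = (df["cname"] == "Brazil") & df["year"].between(2000, 2020)
brazil_between = df[mask]

# The same condition expressed as a query string
brazil_query = df.query("cname == 'Brazil' and year >= 2000 and year <= 2020")

print(brazil_between)
```

Which form to use is mostly a matter of taste; boolean masks compose well in code, while query strings tend to read better for long conditions.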

Select specific columns

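# Select only the columns you need.
# Some indicators may not be present in every dataset, so keep only those that exist.
wanted = ["cname", "year", "ccode", "wdi_gdpcapcon2017", "vdem_corr"]
cols = [c for c in wanted if c in df.columns]
subset = df[cols].dropna()
print(subset.head())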

Searching Variables

QoG datasets contain many variables. Use search_variables() to find columns matching a pattern.

df = pyqog.read_qog(which_data="standard")

# Search for variables related to corruption.
# Matching is on column names, so a short stem such as "corr" works best.
corruption_vars = pyqog.search_variables(df, "corr")
print(corruption_vars)
# e.g. ['vdem_corr', ...]

# Search for GDP-related variables
gdp_vars = pyqog.search_variables(df, "gdp")
print(gdp_vars)

# Search for democracy indicators
demo_vars = pyqog.search_variables(df, "demo")
print(demo_vars)

The search is case-insensitive and matches any part of the column name. For a complete description of each variable, refer to the codebooks.
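
To make the matching rule concrete, here is a minimal sketch of a case-insensitive substring search over column names. The helper find_columns and the column list are hypothetical and only illustrate the behavior described above; this is not pyqog's actual implementation.

```python
def find_columns(columns, pattern):
    """Return the column names that contain `pattern`, ignoring case."""
    needle = pattern.lower()
    return [name for name in columns if needle in name.lower()]

# Made-up column names for illustration
columns = ["cname", "year", "wdi_gdpcapcon2017", "WDI_GDPgr", "vdem_corr"]

print(find_columns(columns, "gdp"))
# ['wdi_gdpcapcon2017', 'WDI_GDPgr']
print(find_columns(columns, "CORR"))
# ['vdem_corr']
```

Because only names are matched, a search term has to appear in the variable code itself; a topic word that appears only in the codebook description will not be found this way.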

Accessing Archive Data

QoG maintains an archive of previous dataset versions. You can download any past version by specifying the year parameter.

# Download the 2020 version of the basic dataset
df_2020 = pyqog.read_qog(which_data="basic", year=2020)

# Download the 2018 version of the standard dataset
df_std_2018 = pyqog.read_qog(which_data="standard", year=2018)

# List available versions for a dataset
versions = pyqog.list_versions("standard")
print(versions)
# [2026, 2025, 2024, 2023, 2022, 2021, 2020, ...]

Important: The year parameter refers to the publication year of the dataset, not the year of the data. For example, year=2020 downloads the dataset version published in January 2020, which contains data up to approximately 2018-2019.
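
One way to see the difference between publication year and data coverage is to inspect the year column of a loaded time-series DataFrame. The helper data_year_range below is hypothetical, and the toy DataFrame stands in for an archived release; with real data you would pass the result of read_qog() instead.

```python
import pandas as pd

def data_year_range(df, year_col="year"):
    """Return (first, last) data year present in a time-series DataFrame."""
    years = df[year_col].dropna()
    return int(years.min()), int(years.max())

# Toy stand-in for an archived release (values are made up)
archived = pd.DataFrame({
    "cname": ["Brazil", "Brazil", "Chile"],
    "year": [1946, 2019, 2018],
})

first, last = data_year_range(archived)
print(f"Data covers {first}-{last}")
```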

Visualization Examples

Since pyqog returns pandas DataFrames, you can use any Python visualization library. Here are some examples with matplotlib and seaborn.

Example 1: GDP per capita over time

import matplotlib.pyplot as plt

df = pyqog.read_qog(which_data="standard")

# Filter for select countries
countries = ["Brazil", "Argentina", "Chile", "Mexico"]
subset = df[df["cname"].isin(countries)]

# Plot GDP per capita over time
fig, ax = plt.subplots(figsize=(10, 6))
for country in countries:
    data = subset[subset["cname"] == country]
    ax.plot(data["year"], data["wdi_gdpcapcon2017"], label=country)

ax.set_xlabel("Year")
ax.set_ylabel("GDP per capita (constant 2017 USD)")
ax.set_title("GDP per capita: Latin American Countries")
ax.legend()
plt.tight_layout()
plt.show()
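
As an alternative to looping over countries, the long country-year table can be reshaped so that each country becomes a column. This is a sketch on made-up numbers, not real QoG values; only the column names cname, year and wdi_gdpcapcon2017 are taken from the example above.

```python
import pandas as pd

# Toy long-format panel (values are made up)
long_df = pd.DataFrame({
    "cname": ["Brazil", "Brazil", "Chile", "Chile"],
    "year": [2019, 2020, 2019, 2020],
    "wdi_gdpcapcon2017": [14000.0, 13500.0, 24000.0, 22500.0],
})

# One row per year, one column per country
wide = long_df.pivot(index="year", columns="cname", values="wdi_gdpcapcon2017")
print(wide)

# wide.plot() would then draw one line per country (requires matplotlib)
```

The wide layout is also convenient for computing differences or ratios between countries year by year.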

Example 2: Corruption vs GDP scatter plot

import matplotlib.pyplot as plt

# Use cross-sectional data for a snapshot
df_cs = pyqog.read_qog(which_data="standard", data_type="cross-sectional")

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(
    df_cs["wdi_gdpcapcon2017"],
    df_cs["vdem_corr"],
    alpha=0.6,
    edgecolors="white",
    linewidth=0.5
)
ax.set_xlabel("GDP per capita (constant 2017 USD)")
ax.set_ylabel("Corruption Index (V-Dem)")
ax.set_title("Corruption vs GDP per capita")
plt.tight_layout()
plt.show()

Example 3: Bar chart with seaborn

import seaborn as sns
import matplotlib.pyplot as plt

df_cs = pyqog.read_qog(which_data="standard", data_type="cross-sectional")

# Top 20 countries by GDP per capita
top20 = df_cs.nlargest(20, "wdi_gdpcapcon2017")

fig, ax = plt.subplots(figsize=(12, 6))
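# Assigning hue with legend=False avoids the palette deprecation warning (seaborn 0.13+)
sns.barplot(
    data=top20,
    x="wdi_gdpcapcon2017",
    y="cname",
    hue="cname",
    palette="viridis",
    legend=False,
    ax=ax
)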
ax.set_xlabel("GDP per capita (constant 2017 USD)")
ax.set_ylabel("")
ax.set_title("Top 20 Countries by GDP per Capita")
plt.tight_layout()
plt.show()

Working with Cache

pyqog automatically caches downloaded datasets to avoid repeated downloads.

Default behavior

# First call: downloads from the internet
df = pyqog.read_qog()

# Second call: loads from local cache (instant)
df = pyqog.read_qog()

Force re-download

# Force a fresh download, overwriting cache
df = pyqog.read_qog(update_cache=True)

Disable caching

# Download without saving to cache
df = pyqog.read_qog(cache=False)

Custom cache directory

# Use a custom directory for cached files
df = pyqog.read_qog(data_dir="/path/to/my/cache")

By default, cached files are stored in ~/.pyqog/cache/. Each file follows the naming pattern qog_{dataset}_{type}_{version}.csv.
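
If you want to check whether a particular file is already cached, the naming pattern can be reproduced with pathlib. The helper cache_file is hypothetical, and the placeholder values used here ("basic", "ts", 2026) are guesses at what pyqog substitutes into the pattern; inspect your cache directory to confirm the actual names.

```python
from pathlib import Path

def cache_file(dataset, data_type, version, cache_dir=None):
    """Build a cache path following qog_{dataset}_{type}_{version}.csv."""
    base = Path(cache_dir) if cache_dir else Path.home() / ".pyqog" / "cache"
    return base / f"qog_{dataset}_{data_type}_{version}.csv"

# The placeholder values below are assumptions, not confirmed pyqog strings
path = cache_file("basic", "ts", 2026)
print(path.name)
print("cached" if path.exists() else "not cached yet")
```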

Codebooks

Each QoG dataset has an accompanying codebook (PDF) that describes every variable in detail. You can get the URL programmatically:

# Get the codebook URL for the standard dataset
url = pyqog.get_codebook_url("standard", 2026)
print(url)
# https://www.qogdata.pol.gu.se/data/codebook_std_jan26.pdf

# Get codebook for an archived version
url_old = pyqog.get_codebook_url("basic", 2020)
print(url_old)
# https://www.qogdata.pol.gu.se/dataarchive/codebook_bas_jan20.pdf
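
Once you have the URL, the PDF can be saved locally with the standard library. The sketch below is not part of pyqog: codebook_filename and download_codebook are hypothetical helpers, and the URL is the one shown in the example above.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve

def codebook_filename(url):
    """Extract the PDF file name from a codebook URL."""
    return PurePosixPath(urlparse(url).path).name

def download_codebook(url, dest_dir="."):
    """Download the codebook PDF into an existing dest_dir; return the local path."""
    dest = Path(dest_dir) / codebook_filename(url)
    urlretrieve(url, dest)  # performs a network request
    return dest

url = "https://www.qogdata.pol.gu.se/data/codebook_std_jan26.pdf"
print(codebook_filename(url))
# codebook_std_jan26.pdf

# To actually fetch the file: download_codebook(url)
```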

For a complete list of codebook links, see the Datasets page.