Rank-based Latent Causal Discovery (RLCD)

Algorithm Introduction

RLCD [1] learns causal structures with causally related hidden variables from rank constraints in partially observed linear causal models.

Usage

from causallearn.search.HiddenCausal.RLCD import RLCD

# default parameters
cg = RLCD(data)

# or customized parameters
cg = RLCD(data, ranktest_method, stage1_method, alpha_dict, maxk, node_names)

# visualization using pydot
cg.draw_pydot_graph()

# or save the graph
from causallearn.utils.GraphUtils import GraphUtils

pyd = GraphUtils.to_pydot(cg.G)
pyd.write_png('rlcd_result.png')

Visualization using pydot is recommended. If specific label names are needed, pass them through the labels argument, e.g., cg.draw_pydot_graph(labels=["A", "B", "C"]) or GraphUtils.to_pydot(cg.G, labels=["A", "B", "C"]).

Inspecting latent variables

The returned CausalGraph includes both observed variables and detected latent variables. Observed variables appear first, followed by latent variables named L1, L2, …

from causallearn.graph.NodeType import NodeType

latent_nodes = [
    node for node in cg.G.get_nodes()
    if node.get_node_type() == NodeType.LATENT
]

print([node.get_name() for node in latent_nodes])
print(cg.all_vars)

RLCD also attaches the following outputs to the returned graph:

cg.stage1_cg   # stage-1 graph over observed variables
cg.adjacency   # adjacency matrix including observed and latent variables
cg.all_vars    # observed variables followed by detected latent variables

For example, the following data has five observed variables generated from one shared latent variable. RLCD can add the detected latent variable to the returned graph.

import numpy as np
from causallearn.graph.NodeType import NodeType
from causallearn.search.HiddenCausal.RLCD import Chi2RankTest, RLCD

rng = np.random.default_rng(1)
sample_size = 3000
latent = rng.normal(size=sample_size)
data = np.column_stack([
    1.0 * latent + 0.05 * rng.normal(size=sample_size),
    1.2 * latent + 0.05 * rng.normal(size=sample_size),
    1.4 * latent + 0.05 * rng.normal(size=sample_size),
    1.6 * latent + 0.05 * rng.normal(size=sample_size),
    1.8 * latent + 0.05 * rng.normal(size=sample_size),
])
data = (data - data.mean(axis=0)) / data.std(axis=0)

cg = RLCD(
    data,
    ranktest_method=Chi2RankTest(data),
    stage1_method="all",
    maxk=2,
)

latent_nodes = [
    node for node in cg.G.get_nodes()
    if node.get_node_type() == NodeType.LATENT
]

print(cg.all_vars)
print([node.get_name() for node in latent_nodes])

This example prints ['X1', 'X2', 'X3', 'X4', 'X5', 'L1'] for cg.all_vars and ['L1'] for the detected latent variables.

Parameters

data: numpy.ndarray, shape (n_samples, n_features). Data, where n_samples is the number of samples and n_features is the number of features.

ranktest_method: rank test object, optional. The rank test object should provide a test(pcols, qcols, r, alpha) method. If not provided, Chi2RankTest(data) is used.
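Because only a test(pcols, qcols, r, alpha) method is required, a custom rank test can wrap another one. The following is a minimal sketch, not part of causal-learn: CountingRankTest is a hypothetical helper that forwards every call unchanged to a wrapped rank test object and counts the calls. It makes no assumption about what test() returns.

```python
class CountingRankTest:
    """Hypothetical wrapper: delegates to a base rank test and counts calls."""

    def __init__(self, base_test):
        # base_test is any object exposing test(pcols, qcols, r, alpha)
        self.base_test = base_test
        self.n_calls = 0

    def test(self, pcols, qcols, r, alpha):
        # Record the call, then return the wrapped test's result unchanged
        self.n_calls += 1
        return self.base_test.test(pcols, qcols, r, alpha)
```

Assuming RLCD relies only on the test method, such a wrapper could be passed as ranktest_method=CountingRankTest(Chi2RankTest(data)), and n_calls read afterwards to see how many rank tests were run.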

stage1_method: str. Stage-1 method used to partition observed variables. Default: 'ges'.

alpha_dict: dict, optional. Significance levels for rank tests by rank. Default: {0: 0.01, 1: 0.01, 2: 0.01, 3: 0.01}.
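To use one significance level for every rank, the dictionary can be built programmatically. This is a small sketch with a hypothetical helper, uniform_alpha_dict. It assumes the keys should run from 0 to maxk, which is inferred from the documented defaults (keys 0 to 3 with maxk of 3) and not confirmed by the library.

```python
def uniform_alpha_dict(alpha=0.01, maxk=3):
    """Return {0: alpha, 1: alpha, ..., maxk: alpha} (hypothetical helper)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Assumption: one entry per rank from 0 up to and including maxk
    return {rank: alpha for rank in range(maxk + 1)}
```

For example, uniform_alpha_dict(0.05, maxk=2) could be passed as alpha_dict together with maxk=2.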

maxk: int. Maximum rank-search cardinality. Default: 3.

node_names: list, optional. Names of observed variables in the returned graph. If not provided, variables are named X1, X2, … Latent variables are named L1, L2, …

Returns

cg: CausalGraph. Learned graph over observed and latent variables, where cg.G.graph[j,i] = 1 and cg.G.graph[i,j] = -1 indicate i --> j; cg.G.graph[i,j] = cg.G.graph[j,i] = -1 indicate i --- j; and cg.G.graph[i,j] = cg.G.graph[j,i] = 1 indicate i <-> j. The returned object also stores cg.stage1_cg, cg.adjacency, and cg.all_vars for inspecting the stage-1 graph, the full adjacency matrix, and the variable names including latent variables.
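The endpoint encoding of cg.G.graph can be decoded into readable edges. The sketch below uses a hypothetical helper, describe_edges, that applies exactly the three rules stated above. It takes any square matrix indexable as graph[i][j] (a NumPy array or nested lists) plus a list of names, and it is not part of causal-learn.

```python
def describe_edges(graph, names):
    """Translate an endpoint matrix into edge strings (hypothetical helper)."""
    edges = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = graph[i][j], graph[j][i]
            if a == -1 and b == 1:
                edges.append(f"{names[i]} --> {names[j]}")
            elif a == 1 and b == -1:
                edges.append(f"{names[j]} --> {names[i]}")
            elif a == -1 and b == -1:
                edges.append(f"{names[i]} --- {names[j]}")
            elif a == 1 and b == 1:
                edges.append(f"{names[i]} <-> {names[j]}")
            # Any other combination (e.g. both 0) is treated as no edge
    return edges
```

A call such as describe_edges(cg.G.graph, cg.all_vars) assumes that the order of cg.all_vars matches the row order of cg.G.graph, which should be checked against the node list before relying on the output.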