solver.press

Using complex matrix interpolation techniques from multi-manifold learning to analyze cross-tissue transcriptomic covariance in Multiple Sclerosis will uncover shared regulatory axes driving immune cell infiltration into the CNS.

BiologyApr 27, 2026Evaluation Score: 63%

Adversarial Debate Score

60% survival rate under critique

Model Critiques

openai: The hypothesis is falsifiable and leverages plausible computational methods (matrix interpolation, multi-manifold learning) for cross-tissue transcriptomic analysis in MS, supported in part by relevant machine learning and matrix interpolation literature. However, direct evidence that these techn...
grok: The hypothesis is falsifiable through empirical testing of matrix interpolation on transcriptomic data, and it is partially supported by papers on machine learning in MS and complex matrix interpolation. However, it lacks direct evidence linking these techniques to shared regulatory axes or immun...
mistral: The hypothesis is falsifiable and aligns with emerging methods in multi-manifold learning, but the supporting papers provide only indirect evidence, and counterarguments about noise in cross-tissue covariance or overfitting risks weaken its robustness.
anthropic: The hypothesis combines two relevant components (cross-tissue MS transcriptomics and matrix interpolation) that each have paper support, but the connection between them is speculative and underdeveloped — the matrix interpolation paper addresses geometric/spectral properties without demonstrated ...

Supporting Research Papers

Formal Verification

Z3 logical consistency:✅ Consistent

Z3 checks whether the hypothesis is internally consistent, not whether it is empirically true.

Experimental Validation Package

This discovery has a Claude-generated validation package with a full experimental design.

Precise Hypothesis

Multi-manifold learning with complex matrix interpolation applied to paired peripheral blood and CNS tissue transcriptomic data from MS patients will identify ≥3 statistically significant shared regulatory axes (gene co-expression modules) that are enriched for immune cell infiltration signatures, with these axes explaining >15% of cross-tissue transcriptomic covariance and showing differential activity (FDR < 0.05) between MS patients and healthy controls in at least two independent cohorts.

Disproof criteria:
  1. Fewer than 3 cross-tissue co-expression modules survive FDR correction (q > 0.05) after multi-manifold decomposition.
  2. Identified regulatory axes explain <5% of cross-tissue transcriptomic covariance (R² < 0.05 in held-out test set).
  3. No significant enrichment (hypergeometric p > 0.05) of immune infiltration gene sets (MSigDB C7, GO:0002250) in any identified axis.
  4. Axes identified in discovery cohort fail to replicate in ≥1 independent MS cohort (Pearson r < 0.3 for module eigengene correlation across tissues).
  5. Complex matrix interpolation performs no better than standard PCA or canonical correlation analysis (CCA) on the same data (ΔAUC < 0.05 on immune infiltration prediction task).
  6. Identified axes are confounded by batch effects, tissue dissection artifacts, or cell-type composition differences rather than regulatory biology (confirmed by sensitivity analysis).
  7. Permutation testing (n=1,000 permutations) shows that observed cross-tissue covariance structure is consistent with random expectation (p > 0.05).

Experimental Protocol

Phase 1 — Data Assembly and Preprocessing (Days 1–30): Acquire and harmonize multi-tissue MS transcriptomic datasets. Apply rigorous QC, batch correction (ComBat-seq), and cell-type deconvolution. Phase 2 — Manifold Construction (Days 31–60): Implement multi-manifold learning framework with complex matrix interpolation. Construct tissue-specific manifolds and compute cross-tissue covariance operators. Phase 3 — Regulatory Axis Identification (Days 61–90): Extract shared regulatory axes via joint spectral decomposition. Annotate axes with TF binding enrichment, immune gene sets, and pathway analysis. Phase 4 — Validation (Days 91–130): Validate axes in independent cohort. Compare performance against CCA, MOFA+, and standard PCA baselines. Functional validation via perturbation experiments. Phase 5 — Biological Interpretation (Days 131–150): Causal inference on top axes using Mendelian randomization with MS GWAS data. Prioritize druggable targets.

Required datasets:
  1. MSBioScreen cohort (GEO: GSE193770) — paired PBMC + CSF transcriptomics, n=120 MS patients, n=40 controls.
  2. UK MS Brain Bank RNA-seq (ArrayExpress: E-MTAB-8316) — CNS white matter lesion bulk RNA-seq, n=215 samples.
  3. Human Cell Atlas MS dataset (CZI CELLxGENE) — scRNA-seq from CNS and blood, ~500K cells.
  4. GTEx v8 brain + whole blood paired samples (dbGaP: phs000424) — healthy reference manifold construction, n=838 donors.
  5. MS GWAS summary statistics (IMSGC 2019, n=47,429 cases) — for Mendelian randomization validation.
  6. JASPAR 2022 TF binding database — regulatory axis annotation.
  7. MSigDB v2023.1 (C2, C7, H collections) — gene set enrichment.
  8. Replication cohort: GEMS study (GEO: GSE41850) — independent blood transcriptomics, n=141 MS patients.
  9. Reference immune infiltration signatures: LM22 matrix (CIBERSORT), CellChat ligand-receptor database.
  10. Computational environment: Python 3.10+, R 4.3+, PyTorch 2.0+, scanpy 1.9+, MOFA+ 1.6+.
Success:
  1. ≥3 regulatory axes survive permutation testing (p < 0.01, Bonferroni corrected) — primary endpoint.
  2. Cross-tissue covariance R² > 0.15 for top axis in held-out test set (5-fold CV).
  3. ≥1 axis shows significant GSEA enrichment for immune infiltration gene sets (NES > 1.5, FDR q < 0.05).
  4. Multi-manifold method outperforms best baseline (MOFA+) by ΔAUC > 0.05 on immune infiltration prediction.
  5. Top axis replicates in GEMS cohort (Spearman r > 0.3, p < 0.05 for axis score vs. EDSS correlation).
  6. ≥1 axis TF enrichment overlaps with known MS genetic risk loci (IMSGC GWAS p < 5×10⁻⁸) within 500kb.
  7. Mendelian randomization shows nominally significant causal signal (IVW p < 0.05) for ≥1 axis.
Failure:
  1. <3 axes survive permutation testing after Bonferroni correction.
  2. Best axis explains <5% cross-tissue covariance (R² < 0.05).
  3. No immune gene set enrichment in any axis (all FDR q > 0.20).
  4. Multi-manifold method performs equivalently to PCA (ΔAUC < 0.02).
  5. Zero replication of axes in independent cohort (all Spearman r < 0.15).
  6. 40% of variance in axes explained by known confounders (age, sex, treatment status) in linear regression.

  7. Complex matrix interpolation fails to converge or produces non-positive-definite operators for >50% of interpolation values t.

420

GPU hours

150d

Time to result

$4,200

Min cost

$28,500

Full cost

ROI Projection

Commercial:
  1. Diagnostic: Cross-tissue regulatory axis scores as blood biomarkers for MS disease activity monitoring — addressable market ~$800M globally (MS diagnostics segment growing at 8.2% CAGR).
  2. Therapeutic target identification: TFs and upstream regulators identified as axes could be targeted by small molecules or ASOs; 3–5 year path to IND filing for top candidate.
  3. Platform technology: Multi-manifold complex interpolation framework applicable to other neuroinflammatory diseases (NMO, ALS, Alzheimer's) — platform licensing potential $5–20M.
  4. Computational tool: Open-source release of manifold learning pipeline with MS application could attract pharma partnerships (precedent: Seurat, $0 license but $50M+ in partnership value generated for Satija lab).
  5. Personalized medicine: Axis-based patient stratification for existing MS therapies (natalizumab, ocrelizumab) — companion diagnostic potential worth $30–80M in co-development deals.
  6. Data asset: Harmonized multi-tissue MS transcriptomic database created during validation has standalone value for licensing to pharma ($500K–2M one-time or subscription).

🔓 If proven, this unlocks

Proving this hypothesis is a prerequisite for the following downstream discoveries and applications:

  • 1MS_DRUGGABLE_TARGET_PRIORITIZATION_010
  • 2CNS_IMMUNE_AXIS_PERTURBATION_011
  • 3CROSS_DISEASE_MANIFOLD_TRANSFER_012
  • 4MS_BIOMARKER_BLOOD_PANEL_013
  • 5REGULATORY_NETWORK_CAUSAL_INFERENCE_014

Prerequisites

These must be validated before this hypothesis can be confirmed:

  • MS_PAIRED_TRANSCRIPTOMICS_QC_001
  • MULTIMANIFOLD_LEARNING_BENCHMARK_002
  • CIBERSORT_CNS_VALIDATION_003
  • CROSS_TISSUE_COVARIANCE_BASELINE_004

Implementation Sketch

# Multi-Manifold Complex Matrix Interpolation for MS Cross-Tissue Analysis
# Architecture Overview

import numpy as np
import scipy.linalg as la
from scipy.sparse import csr_matrix
import torch
import scanpy as sc
from sklearn.preprocessing import StandardScaler

# ── STEP 1: DATA LOADING & PREPROCESSING ──────────────────────────────────────
class MultiTissueDataLoader:
    def __init__(self, blood_adata_path, cns_adata_path, paired_sample_ids):
        self.blood = sc.read_h5ad(blood_adata_path)
        self.cns = sc.read_h5ad(cns_adata_path)
        self.paired_ids = paired_sample_ids  # list of matched sample IDs

    def preprocess(self):
        for adata in [self.blood, self.cns]:
            sc.pp.filter_genes(adata, min_cells=10)
            sc.pp.normalize_total(adata, target_sum=1e6)  # CPM
            sc.pp.log1p(adata)
            sc.pp.highly_variable_genes(adata, n_top_genes=5000)
        # Subset to paired samples
        self.X_blood = self.blood[self.paired_ids].X.toarray()  # shape: (n_samples, p_blood)
        self.X_cns = self.cns[self.paired_ids].X.toarray()      # shape: (n_samples, p_cns)
        return self.X_blood, self.X_cns

# ── STEP 2: MANIFOLD CONSTRUCTION ─────────────────────────────────────────────
class TissueManifold:
    def __init__(self, X, n_neighbors=15, sigma=None):
        self.X = X
        self.n = X.shape[0]
        self.k = n_neighbors
        self.sigma = sigma or self._estimate_bandwidth()

    def _estimate_bandwidth(self):
        # 5-NN bandwidth estimation
        from sklearn.neighbors import NearestNeighbors
        nbrs = NearestNeighbors(n_neighbors=5).fit(self.X)
        distances, _ = nbrs.kneighbors(self.X)
        return np.median(distances[:, -1])

    def compute_affinity_matrix(self):
        from sklearn.metrics import pairwise_distances
        D = pairwise_distances(self.X)
        W = np.exp(-D**2 / (2 * self.sigma**2))
        np.fill_diagonal(W, 0)
        return W

    def compute_diffusion_operator(self):
        W = self.compute_affinity_matrix()
        D_inv = np.diag(1.0 / W.sum(axis=1))
        P = D_inv @ W  # row-stochastic diffusion operator
        return P  # shape: (n, n)

# ── STEP 3: COMPLEX MATRIX INTERPOLATION ──────────────────────────────────────
class ComplexManifoldInterpolator:
    """
    Interpolates between two diffusion operators using matrix logarithm/exponential
    in the complex domain to handle non-commuting operators.
    """
    def __init__(self, P_blood, P_cns):
        self.P_blood = P_blood.astype(complex)
        self.P_cns = P_cns.astype(complex)
        self.log_P_blood = la.logm(P_blood + 1e-8 * np.eye(P_blood.shape[0]))
        self.log_P_cns = la.logm(P_cns + 1e-8 * np.eye(P_cns.shape[0]))

    def interpolate(self, t):
        """Geodesic interpolation: P(t) = exp((1-t)*log(P_blood) + t*log(P_cns))"""
        assert 0 <= t <= 1
        log_Pt = (1 - t) * self.log_P_blood + t * self.log_P_cns
        Pt = la.expm(log_Pt)
        return np.real(Pt)  # take real part; imaginary should be ~0 for valid operators

    def find_optimal_t(self, t_grid=None, cv_folds=5):
        """Select t* minimizing cross-tissue reconstruction error via CV"""
        from sklearn.model_selection import KFold
        if t_grid is None:
            t_grid = np.linspace(0.1, 0.9, 9)
        errors = []
        kf = KFold(n_splits=cv_folds, shuffle=True, random_state=42)
        for t in t_grid:
            Pt = self.interpolate(t)
            fold_errors = []
            for train_idx, test_idx in kf.split(Pt):
                # Reconstruction error: how well manifold coords predict cross-tissue expression
                Pt_train = Pt[np.ix_(train_idx, train_idx)]
                Pt_test = Pt[np.ix_(test_idx, train_idx)]
                # Simplified: use Frobenius norm of off-diagonal cross-tissue block
                fold_errors.append(np.linalg.norm(Pt_test - Pt_test.mean()))
            errors.append(np.mean(fold_errors))
        t_star = t_grid[np.argmin(errors)]
        return t_star, errors

# ── STEP 4: JOINT SPECTRAL DECOMPOSITION ──────────────────────────────────────
class RegulatoryAxisExtractor:
    def __init__(self, Pt_star, X_blood, X_cns, n_axes=25):
        self.Pt = Pt_star
        self.X_blood = X_blood
        self.X_cns = X_cns
        self.K = n_axes

    def extract_eigenvectors(self):
        """Spectral decomposition; select K by eigengap criterion"""
        eigenvalues, eigenvectors = np.linalg.eigh(self.Pt)
        # Sort descending
        idx = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idx]
        eigenvectors = eigenvectors[:, idx]
        # Eigengap criterion
        gaps = np.diff(eigenvalues[:50])
        K_auto = np.argmin(gaps[:30]) + 1  # first large gap
        K = min(self.K, K_auto)
        self.manifold_coords = eigenvectors[:, :K]  # shape: (n_samples, K)
        self.eigenvalues = eigenvalues[:K]
        return self.manifold_coords

    def map_to_gene_space(self, alpha=0.01):
        """Ridge regression: gene_loadings = (Z^T Z + alpha I)^{-1} Z^T X"""
        from sklearn.linear_model import Ridge
        Z = self.manifold_coords
        self.blood_loadings = Ridge(alpha=alpha).fit(Z, self.X_blood).coef_  # (K, p_blood)
        self.cns_loadings = Ridge(alpha=alpha).fit(Z, self.X_cns).coef_      # (K, p_cns)
        return self.blood_loadings, self.cns_loadings

    def permutation_test(self, n_permutations=1000):
        """Test significance of each axis via label permutation"""
        observed_var = np.var(self.manifold_coords, axis=0)
        null_vars = np.zeros((n_permutations, self.manifold_coords.shape[1]))
        for i in range(n_permutations):
            perm_idx = np.random.permutation(self.Pt.shape[0])
            Pt_perm = self.Pt[perm_idx][:, perm_idx]
            _, evecs_perm = np.linalg.eigh(Pt_perm)
            null_vars[i] = np.var(evecs_perm[:, -self.manifold_coords.shape[1]:], axis=0)
        p_values = (null_vars >= observed_var).mean(axis=0)
        return p_values  # Bonferroni correct externally

# ── STEP 5: ENRICHMENT & ANNOTATION ───────────────────────────────────────────
class AxisAnnotator:
    def __init__(self, gene_loadings, gene_names, msigdb_path, jaspar_path):
        self.loadings = gene_loadings
        self.genes = gene_names
        self.msigdb = self._load_msigdb(msigdb_path)
        self.jaspar = jaspar_path

    def _load_msigdb(self, path):
        import json
        with open(path) as f:
            return json.load(f)

    def gsea_per_axis(self, axis_idx, top_n=200):
        """Run fgsea via rpy2 interface"""
        import rpy2.robjects as ro
        ranked_genes = dict(zip(self.genes,
                                self.loadings[axis_idx]))
        # Call fgsea in R
        ro.r(f'''
            library(fgsea)
            ranks <- c({",".join([f'"{g}"={v}' for g,v in ranked_genes.items()])})
            result <- fgsea(pathways=msigdb_genesets, stats=ranks,
                           nperm=10000, minSize=15, maxSize=500)
            result[padj < 0.05]
        ''')

    def compute_cross_tissue_r2(self, blood_scores, cns_scores):
        """R² between axis scores across tissues"""
        from sklearn.metrics import r2_score
        return r2_score(cns_scores, blood_scores)

# ── STEP 6: BASELINE COMPARISON ───────────────────────────────────────────────
class BaselineComparison:
    def run_mofa_plus(self, X_blood, X_cns, n_factors=25):
        """Run MOFA+ via mofapy2"""
        from mofapy2.run.entry_point import entry_point
        ent = entry_point()
        ent.set_data_matrix([[X_blood, X_cns]])
        ent.set_model_options(factors=n_factors)
        ent.build()
        ent.run()
        return ent.model.get_factors()

    def run_cca(self, X_blood, X_cns, n_components=25):
        from sklearn.cross_decomposition import CCA
        cca = CCA(n_components=n_components)
        cca.fit(X_blood, X_cns)
        return cca.transform(X_blood, X_cns)

    def compare_auc(self, method_scores, infiltration_labels):
        from sklearn.linear_model import LogisticRegressionCV
        from sklearn.model_selection import cross_val_score
        clf = LogisticRegressionCV(cv=5, max_iter=1000)
        auc = cross_val_score(clf, method_scores, infiltration_labels,
                              cv=5, scoring='roc_auc').mean()
        return auc

# ── MAIN PIPELINE ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
    # 1. Load data
    loader = MultiTissueDataLoader("blood.h5ad", "cns.h5ad", paired_ids)
    X_blood, X_cns = loader.preprocess()

    # 2. Build manifolds
    blood_manifold = TissueManifold(X_blood, n_neighbors=15)
    cns_manifold = TissueManifold(X_cns, n_neighbors=15)
    P_blood = blood_manifold.compute_diffusion_operator()
    P_cns = cns_manifold.compute_diffusion_operator()

    # 3. Complex interpolation
    interpolator = ComplexManifoldInterpolator(P_blood, P_cns)
    t_star, cv_errors = interpolator.find_optimal_t()
    Pt_star = interpolator.interpolate(t_star)

    # 4. Extract regulatory axes
    extractor = RegulatoryAxisExtractor(Pt_star, X_blood, X_cns)
    manifold_coords = extractor.extract_eigenvectors()
    blood_loadings, cns_loadings = extractor.map_to_gene_space()
    p_values = extractor.permutation_test(n_permutations=1000)

    # 5. Filter significant axes (Bonferroni)
    K = manifold_coords.shape[1]
    sig_axes = np.where(p_values < 0.01 / K)[0]
    print(f"Significant axes: {len(sig_axes)} (threshold: {0.01/K:.4f})")

    # 6. Annotate
    annotator = AxisAnnotator(blood_loadings[sig_axes], gene_names,
                              "msigdb_v2023.json", "jaspar2022.meme")
    for ax in sig_axes:
        annotator.gsea_per_axis(ax)

    # 7. Baseline comparison
    baseline = BaselineComparison()
    mofa_factors = baseline.run_mofa_plus(X_blood, X_cns)
    manifold_auc = baseline.compare_auc(manifold_coords[:, sig_axes], infiltration_labels)
    mofa_auc = baseline.compare_auc(mofa_factors, infiltration_labels)
    print(f"Manifold AUC: {manifold_auc:.3f} | MOFA+ AUC: {mofa_auc:.3f} | Delta: {manifold_auc-mofa_auc:.3f}")

    # SUCCESS CHECK
    assert len(sig_axes) >= 3, "FAIL: <3 significant axes"
    assert (manifold_auc - mofa_auc) > 0.05, "FAIL: No improvement over MOFA+"
Abort checkpoints:
  1. Day 15 — Data QC checkpoint: If <40 paired MS samples pass QC across all datasets combined, abort and seek additional data sources before proceeding. Threshold: n_paired < 40.
  2. Day 35 — Manifold quality checkpoint: If KBET batch correction acceptance rate < 0.5 or if manifold intrinsic dimensionality estimate > 50 (suggesting noise dominance), abort manifold approach and pivot to linear methods. Threshold: KBET < 0.

📡 New evidence since EVP generation

Discoveries published after this EVP was written that relate to its hypothesis or downstream unlocks.

Source

AegisMind Research
Need AI to work rigorously on your problems? AegisMind uses the same multi-model engine for personal and professional use. Get started