Descriptive & Comparison¶

Data Summary¶

Computes pixel intensity histograms for images or descriptive statistics for DataFrames.

Details

Inputs:

any — an image (grayscale or RGB) or a pandas DataFrame
mask — optional mask to restrict image histograms to the masked region

Outputs:

table — histogram bin counts (images) or describe() summary (DataFrames)
figure — distribution plot of the input data
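
For orientation, the two modes can be approximated outside the node with NumPy and pandas. This is a minimal sketch, not the node's implementation: the helper name `summarise`, the 256-bin layout, and the 0-255 intensity range are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def summarise(data, mask=None, bins=256):
    """Return a histogram table for an image, or describe() for a DataFrame."""
    if isinstance(data, pd.DataFrame):
        return data.describe()
    pixels = np.asarray(data)
    if mask is not None:
        # Keep only the pixels where the mask is non-zero.
        pixels = pixels[np.asarray(mask, dtype=bool)]
    # Assumed layout: one bin per 8-bit intensity value.
    counts, edges = np.histogram(pixels.ravel(), bins=bins, range=(0, 256))
    return pd.DataFrame({"bin_start": edges[:-1], "count": counts})


image = np.array([[0, 10, 10], [200, 255, 10]], dtype=np.uint8)
mask = np.array([[1, 1, 0], [0, 1, 1]], dtype=np.uint8)

hist = summarise(image, mask)                                  # image mode
stats_table = summarise(pd.DataFrame({"area": [1.0, 2.0, 3.0]}))  # DataFrame mode
```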

Direction	Port	Type
Input	`in`	in
Input	`mask`	mask
Output	`table`	table
Output	`fig`	fig

Outlier Detection¶

Detects and removes outliers in numerical data using statistical tests.

Details

Methods:

ROUT (Prism Regression) — robust nonlinear regression-based detection
ROUT (Fast Math) — faster variant of the ROUT method
Grubbs — classical single-outlier test applied iteratively

Options:

Threshold — Q value (ROUT) or alpha significance level (Grubbs).

Outputs two tables: rows kept and rows removed.
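
To make the "applied iteratively" part concrete, here is a sketch of an iterative two-sided Grubbs test built on SciPy's t distribution. The function name `grubbs_split` and the stopping rules are assumptions for illustration; the node's own implementation, and the ROUT methods in particular, are not shown here.

```python
import numpy as np
from scipy import stats


def grubbs_split(values, alpha=0.05):
    """Remove the most extreme point repeatedly while Grubbs' test rejects it."""
    kept = [float(v) for v in values]
    removed = []
    while len(kept) > 2:
        n = len(kept)
        arr = np.array(kept)
        sd = arr.std(ddof=1)
        if sd == 0:
            break  # all remaining values are identical
        idx = int(np.argmax(np.abs(arr - arr.mean())))
        g = abs(arr[idx] - arr.mean()) / sd
        # Two-sided critical value derived from the t distribution.
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if g <= g_crit:
            break  # most extreme point is not significant, so stop
        removed.append(kept.pop(idx))
    return kept, removed


kept, removed = grubbs_split([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0])
```

In this example the value 25.0 is flagged on the first pass and the loop stops on the second, which mirrors the two output tables (rows kept, rows removed).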

Direction	Port	Type
Input	`in`	table
Output	`kept`	table
Output	`removed`	table

Properties: Method

Grouped Comparison¶

Tests whether there are significant differences among two or more groups.

Details

Tests:

One-Way ANOVA — parametric, assumes normal distribution and equal variances
Kruskal-Wallis — non-parametric rank-based alternative to ANOVA

Outputs a summary table with test statistic, p-value, and significance flag.
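
The two tests map onto `scipy.stats.f_oneway` and `scipy.stats.kruskal`. The sketch below shows the kind of summary row described above; the wrapper `grouped_comparison`, the dictionary keys, and the 0.05 cut-off for the significance flag are assumptions, not the node's actual output schema.

```python
from scipy import stats

groups = {
    "control": [4.1, 3.9, 4.3, 4.0, 4.2],
    "low_dose": [4.8, 5.1, 4.9, 5.0, 5.2],
    "high_dose": [6.0, 6.3, 5.9, 6.1, 6.2],
}


def grouped_comparison(samples, method="anova", alpha=0.05):
    """Run an omnibus test across all groups and return one summary row."""
    if method == "anova":
        stat, p = stats.f_oneway(*samples)
    elif method == "kruskal":
        stat, p = stats.kruskal(*samples)
    else:
        raise ValueError(f"unknown method: {method}")
    return {
        "method": method,
        "statistic": float(stat),
        "p_value": float(p),
        "significant": bool(p < alpha),
    }


results = {
    m: grouped_comparison(list(groups.values()), method=m)
    for m in ("anova", "kruskal")
}
```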

Direction	Port	Type
Input	`in`	table
Output	`stats_table`	stat

Properties: Statistical Method

Pairwise Comparison¶

Performs pairwise comparisons between groups using parametric or non-parametric tests.

Details

Tests:

Student's T-test — parametric, assumes equal variance and normal distribution
Welch's T-test — parametric, does not assume equal variance
Mann-Whitney U — non-parametric rank-based test
Kolmogorov-Smirnov — tests whether two groups come from the same distribution
Two-sample Z-test — compares means when variance is known or n is large
Permutation test — non-parametric, no distributional assumptions, resampling-based
Tukey HSD — post-hoc test after ANOVA
Dunn — non-parametric post-hoc test (requires scikit-posthocs)
Fisher's Z — compares correlation coefficients between groups (target column = r values)

Options:

Alternative — two-sided (default), greater (group1 > group2), or less (group1 < group2). Tukey HSD and Dunn are always two-sided.
P-Adj Method — multiple comparison correction (Bonferroni, Holm, or Benjamini-Hochberg).
N Permutations — number of resampling iterations for the permutation test (default 10,000).
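
As an illustration of how the permutation test and the p-value adjustment fit together, the sketch below runs a two-sided permutation test on the difference in means for every pair of groups and then applies a Holm correction. It uses only the standard library. The function names, the choice of test statistic, the fixed seed, and the add-one p-value estimate are assumptions for this example and may differ from what the node does.

```python
import itertools
import random
from statistics import mean


def permutation_p(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the pooled values
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one estimate so the reported p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


groups = {
    "control": [4.1, 3.9, 4.3, 4.0, 4.2],
    "low_dose": [4.2, 4.0, 4.4, 3.9, 4.3],
    "high_dose": [6.0, 6.3, 5.9, 6.1, 6.2],
}
pairs = list(itertools.combinations(groups, 2))
raw = [permutation_p(groups[g1], groups[g2], n_permutations=2000)
       for g1, g2 in pairs]
results = dict(zip(pairs, holm_adjust(raw)))
```

More permutations give a finer-grained p-value at the cost of run time, which is the trade-off behind the N Permutations setting.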

Direction	Port	Type
Input	`in`	table
Output	`stats_table`	stat

Properties: Statistical Method, Alternative, N Permutations, P-Adj Method

Normality Test¶

Tests whether each numerical column in a DataFrame follows a normal distribution.

Details

Tests:

Shapiro-Wilk — recommended for small to moderate samples
Kolmogorov-Smirnov — compares against a theoretical normal CDF
Anderson-Darling — weighted variant sensitive to distribution tails

Outputs:

results — summary table with test statistic, p-value, and pass/fail per column.
qq_plot — Q-Q (quantile-quantile) plots for each column. Points that follow the red dashed reference line indicate normality; systematic curvature suggests a non-normal distribution.

Use the Group Column option to test normality per group (e.g. per treatment condition before running a t-test or ANOVA).
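
A per-group check of this kind can be sketched with `scipy.stats.shapiro` and a pandas `groupby`. The helper `shapiro_by_group`, the column names, and the 0.05 pass/fail threshold are assumptions for the example. The data are deterministic quantiles of a normal and an exponential distribution, so one group should pass and the other should fail.

```python
import numpy as np
import pandas as pd
from scipy import stats

n = 50
probs = (np.arange(1, n + 1) - 0.5) / n
df = pd.DataFrame({
    "treatment": ["a"] * n + ["b"] * n,
    # Group "a": normal-shaped sample; group "b": strongly right-skewed sample.
    "value": np.concatenate([stats.norm.ppf(probs), stats.expon.ppf(probs)]),
})


def shapiro_by_group(frame, value_col, group_col, alpha=0.05):
    """Run Shapiro-Wilk separately for each level of group_col."""
    rows = []
    for name, sub in frame.groupby(group_col):
        stat, p = stats.shapiro(sub[value_col])
        rows.append({
            "group": name,
            "statistic": float(stat),
            "p_value": float(p),
            "normal": bool(p >= alpha),
        })
    return pd.DataFrame(rows)


results = shapiro_by_group(df, "value", "treatment")
```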

Direction	Port	Type
Input	`in`	table
Output	`results`	table
Output	`qq_plot`	figure

Properties: Test(s)

Pairwise Matrix¶

Computes a pairwise correlation or distance matrix for all numeric columns and visualises it as a heatmap.

Details

Correlation methods:

Pearson — linear correlation coefficient; significance testing on it assumes approximately normal data
Spearman — rank-based, robust to outliers and non-normal distributions
Kendall — rank-based, slower to compute but more reliable for small sample sizes

Outputs a matrix table (for further analysis) and an annotated heatmap figure.
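
The matrix table is comparable to what `pandas.DataFrame.corr` returns for the numeric columns. A minimal sketch, assuming Spearman correlation and made-up column names; the heatmap step is omitted.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
    "label": list("abcde"),  # non-numeric, excluded from the matrix
})

numeric = df.select_dtypes(include="number")
matrix = numeric.corr(method="spearman")  # also accepts "pearson" or "kendall"
```

The result is a square, symmetric table with one row and one column per numeric input column, which is the shape a heatmap expects.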

Direction	Port	Type
Input	`in`	table
Output	`table`	table
Output	`figure`	figure

Properties: Metric, Colormap

Variance Test¶

Tests whether two or more groups have equal variance (homoscedasticity).

Details

Use this to decide between Student's t-test (equal variance) and Welch's t-test (unequal variance), or to check ANOVA assumptions.

Tests:

Levene's test — robust, works for non-normal data (recommended default)
Bartlett's test — more powerful but assumes normality
F-test — classical variance ratio test for exactly 2 groups (sensitive to non-normality)

Outputs a table with test statistic and p-value per group pair (F-test) or for all groups at once (Levene, Bartlett).

A significant p-value (< 0.05) is evidence that the variances are not equal; in that case use Welch's t-test rather than Student's t-test.
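
The decision rule above can be sketched with `scipy.stats.levene` followed by `scipy.stats.ttest_ind`, whose `equal_var` flag switches between Student's and Welch's t-test. The sample data and the 0.05 threshold are assumptions for illustration.

```python
from scipy import stats

a = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0]   # tight spread
b = [4.0, 16.0, 8.0, 13.0, 2.0, 18.0, 6.0, 12.0]    # wide spread

# SciPy's default centres on the median, which is robust to non-normal data.
lev_stat, lev_p = stats.levene(a, b)
equal_var = bool(lev_p >= 0.05)

# equal_var=True runs Student's t-test, equal_var=False runs Welch's t-test.
t_stat, t_p = stats.ttest_ind(a, b, equal_var=equal_var)
chosen = "Student" if equal_var else "Welch"
```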

Direction	Port	Type
Input	`in`	table
Output	`result`	table

Properties: Test

Effect Size¶

Calculates effect sizes for pairwise group comparisons.

Details

Measures how large the difference between groups is, complementing p-values from statistical tests. Journals increasingly require effect sizes alongside significance testing.

Methods:

Auto — Cohen's d for 2 groups, Eta-squared for 3+ groups
Cohen's d — standardised mean difference (pooled SD)
Hedges' g — Cohen's d with small-sample bias correction
Glass's delta — mean difference divided by the control group SD
Rank-biserial r — effect size for Mann-Whitney U (non-parametric)
Eta-squared — proportion of variance explained (ANOVA-style)
Omega-squared — bias-corrected eta-squared

Output columns: group1, group2, n1, n2, effect_size, ci_lower, ci_upper, magnitude, method.

magnitude uses conventional thresholds:

Cohen's d / Hedges' g / Glass's delta: negligible < 0.2, small < 0.5, medium < 0.8, large >= 0.8
Eta-squared / Omega-squared: negligible < 0.01, small < 0.06, medium < 0.14, large >= 0.14
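
For reference, Cohen's d, Hedges' g, and the magnitude labels can be computed with the standard library alone. This is a sketch under stated assumptions: the function names are hypothetical, Hedges' g uses the common approximation 1 - 3 / (4(n1 + n2) - 9) for the correction factor, and the label is taken from the absolute value of the effect size. Confidence intervals are not shown.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def hedges_g(a, b):
    """Cohen's d multiplied by an approximate small-sample correction."""
    n = len(a) + len(b)
    return cohens_d(a, b) * (1 - 3 / (4 * n - 9))


def magnitude(effect):
    """Label an effect size using the thresholds listed above."""
    size = abs(effect)  # assumption: the sign is ignored when labelling
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


group1 = [2, 4, 6, 8]
group2 = [1, 3, 5, 7]
d = cohens_d(group1, group2)
g = hedges_g(group1, group2)
label = magnitude(d)
```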

Direction	Port	Type
Input	`in`	table
Output	`results`	table

Properties: Method, CI Level, Bootstrap Iterations

Descriptive Stats¶

Computes comprehensive descriptive statistics for numeric columns.

Details

Calculates per-group (or overall) statistics including central tendency, dispersion, shape, and confidence intervals — everything needed for a publication-ready summary table.

Output columns: group, column, n, mean, median, std, sem, ci_lower, ci_upper, min, q1, q3, max, iqr, skewness, kurtosis, cv.

group_col — optional grouping column. If set, statistics are computed per group. Leave blank for overall stats.
value_cols — columns to summarise. Leave blank for all numeric.
ci_level — confidence interval level (default 0.95).
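
The output columns can be reproduced approximately with pandas and SciPy, as in the sketch below. The helper `describe_groups` is hypothetical, and the t-based confidence interval around the mean, the sample (ddof=1) standard deviation, and pandas' default quantile and skewness/kurtosis definitions are assumptions; the node may use different conventions.

```python
import pandas as pd
from scipy import stats


def describe_groups(df, value_col, group_col, ci_level=0.95):
    """One row of summary statistics per group for a single numeric column."""
    rows = []
    for name, s in df.groupby(group_col)[value_col]:
        n = int(s.count())
        mean, std, sem = s.mean(), s.std(), s.sem()
        # Assumption: t-based interval around the mean with n - 1 degrees of freedom.
        half_width = stats.t.ppf((1 + ci_level) / 2, n - 1) * sem
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        rows.append({
            "group": name, "column": value_col, "n": n,
            "mean": mean, "median": s.median(), "std": std, "sem": sem,
            "ci_lower": mean - half_width, "ci_upper": mean + half_width,
            "min": s.min(), "q1": q1, "q3": q3, "max": s.max(),
            "iqr": q3 - q1, "skewness": s.skew(), "kurtosis": s.kurt(),
            "cv": std / mean,
        })
    return pd.DataFrame(rows)


data = pd.DataFrame({
    "condition": ["a"] * 5 + ["b"] * 5,
    "area": [1, 2, 3, 4, 5, 10, 12, 14, 16, 18],
})
results = describe_groups(data, "area", "condition")
```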

Direction	Port	Type
Input	`in`	table
Output	`results`	table

Properties: CI Level

Distribution Fit¶

Fits data to candidate probability distributions and ranks them by goodness-of-fit (AIC / BIC / Kolmogorov-Smirnov).

Details

Select which distributions to test, or use All to try every candidate. The node outputs a ranking table with fitted parameters and a figure overlaying the best-fit PDFs on the empirical histogram.

Candidate distributions: Normal, Log-Normal, Exponential, Gamma, Weibull, Beta, Rayleigh, Uniform, Cauchy, Logistic, Pareto, Student-t, Inverse Gaussian.

Outputs:

results — one row per tested distribution with shape/loc/scale params, log-likelihood, AIC, BIC, KS statistic, and KS p-value, sorted by AIC.
figure — histogram of the data with top-N best-fit PDF curves overlaid.
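
The ranking step can be sketched with SciPy's maximum-likelihood `fit`, from which log-likelihood, AIC, and BIC follow directly. The helper `rank_fits`, the three candidate names, and the synthetic data are assumptions for the example; the overlay figure is omitted.

```python
import numpy as np
from scipy import stats


def rank_fits(data, candidates):
    """Fit each candidate by maximum likelihood and sort the rows by AIC."""
    rows = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)  # shape parameters (if any), then loc and scale
        loglik = float(np.sum(dist.logpdf(data, *params)))
        k = len(params)
        ks_stat, ks_p = stats.kstest(data, name, args=params)
        rows.append({
            "distribution": name,
            "params": params,
            "log_likelihood": loglik,
            "aic": 2 * k - 2 * loglik,
            "bic": k * np.log(len(data)) - 2 * loglik,
            "ks_statistic": float(ks_stat),
            "ks_p_value": float(ks_p),
        })
    return sorted(rows, key=lambda row: row["aic"])


rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=500)
ranking = rank_fits(sample, ["norm", "expon", "uniform"])
best = ranking[0]["distribution"]
```

Note that a KS p-value computed against parameters estimated from the same data tends to be optimistic, so it is best read as a relative indicator rather than a strict test.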

Direction	Port	Type
Input	`in`	table
Output	`results`	table

Properties: Distributions, Overlay Top-N, Histogram Bins, Fig Width, Fig Height