Descriptive & Comparison¶
Data Summary¶
Computes pixel intensity histograms for images or descriptive statistics for DataFrames.
Details
Inputs:
- any — an image (grayscale or RGB) or a pandas DataFrame
- mask — optional mask to restrict image histograms to the masked region
Outputs:
- table — histogram bin counts (images) or `describe()` summary (DataFrames)
- figure — distribution plot of the input data
| Direction | Port | Type |
|---|---|---|
| Input | in | in |
| Input | mask | mask |
| Output | table | table |
| Output | fig | fig |
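As a rough illustration of what this node computes, the sketch below builds a masked intensity histogram with NumPy and falls back to `describe()` for DataFrames. The function name `summarize`, the `bins` parameter, the 8-bit intensity range, and the pooling of RGB channels into one histogram are all assumptions made for the example, not the node's actual implementation.

```python
import numpy as np
import pandas as pd

def summarize(data, mask=None, bins=256):
    """Hypothetical helper: histogram table for images, describe() for DataFrames."""
    if isinstance(data, pd.DataFrame):
        return data.describe()
    img = np.asarray(data)
    # Boolean indexing keeps only the pixels where the mask is True.
    # For RGB input this pools all channels together (an assumption).
    pixels = img[np.asarray(mask, dtype=bool)] if mask is not None else img
    # Assumes 8-bit intensities: 256 unit-wide bins covering 0..255.
    counts, edges = np.histogram(pixels.ravel(), bins=bins, range=(0, 256))
    return pd.DataFrame({"bin_start": edges[:-1], "count": counts})

image = np.array([[0, 10], [200, 255]], dtype=np.uint8)
mask = np.array([[True, False], [True, True]])
hist = summarize(image, mask=mask)              # pixel value 10 is masked out
stats_table = summarize(pd.DataFrame({"area": [1.0, 2.0, 3.0]}))
```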
Outlier Detection¶
Detects and removes outliers in numerical data using statistical tests.
Details
Methods:
- ROUT (Prism Regression) — robust nonlinear regression-based detection
- ROUT (Fast Math) — faster variant of the ROUT method
- Grubbs — classical single-outlier test applied iteratively
Options:
- Threshold — Q value (ROUT) or alpha significance level (Grubbs).
Outputs two tables: rows kept and rows removed.
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | kept | table |
| Output | removed | table |
Properties: Method
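To make the "applied iteratively" behaviour of the Grubbs option concrete, here is a minimal sketch of a two-sided Grubbs test that removes one value at a time until no value exceeds the critical threshold. The function name `grubbs_iterative` and the return shape are hypothetical; the node's own implementation (and the ROUT variants, which are not shown) may differ.

```python
import numpy as np
from scipy import stats

def grubbs_iterative(values, alpha=0.05):
    """Hypothetical helper: return (kept, removed) using a repeated two-sided Grubbs test."""
    kept = [float(v) for v in values]
    removed = []
    while len(kept) >= 3:                      # the test needs at least 3 points
        arr = np.asarray(kept)
        n = arr.size
        sd = arr.std(ddof=1)
        if sd == 0:                            # all values identical: nothing to test
            break
        deviations = np.abs(arr - arr.mean())
        idx = int(deviations.argmax())         # most extreme point is the candidate
        g = deviations[idx] / sd
        # Critical value derived from the t distribution with n - 2 degrees of freedom.
        t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
        if g <= g_crit:
            break
        removed.append(kept.pop(idx))
    return kept, removed

kept, removed = grubbs_iterative([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0])
```

The two returned lists correspond loosely to the `kept` and `removed` output tables.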
Grouped Comparison¶
Tests whether there are significant differences among two or more groups.
Details
Tests:
- One-Way ANOVA — parametric, assumes normal distribution and equal variances
- Kruskal-Wallis — non-parametric rank-based alternative to ANOVA
Outputs a summary table with test statistic, p-value, and significance flag.
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | stats_table | stat |
Properties: Statistical Method
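Both tests are available in SciPy, so the summary row can be approximated as follows. The helper `grouped_comparison`, its dictionary keys, and the 0.05 significance cut-off are assumptions for illustration only.

```python
from scipy import stats

def grouped_comparison(groups, method="anova", alpha=0.05):
    """Hypothetical helper: one summary row for a multi-group test."""
    samples = list(groups.values())
    if method == "anova":
        statistic, p_value = stats.f_oneway(*samples)
    elif method == "kruskal":
        statistic, p_value = stats.kruskal(*samples)
    else:
        raise ValueError(f"unknown method: {method}")
    return {
        "method": method,
        "statistic": float(statistic),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),   # assumed significance flag
    }

groups = {
    "control": [5.1, 4.9, 5.0, 5.2, 4.8],
    "low": [5.0, 5.3, 4.7, 5.1, 4.9],
    "high": [7.9, 8.1, 8.0, 8.2, 7.8],
}
anova_row = grouped_comparison(groups, "anova")
kruskal_row = grouped_comparison(groups, "kruskal")
```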
Pairwise Comparison¶
Performs pairwise comparisons between groups using parametric or non-parametric tests.
Details
Tests:
- Student's T-test — parametric, assumes equal variance and normal distribution
- Welch's T-test — parametric, does not assume equal variance
- Mann-Whitney U — non-parametric rank-based test
- Kolmogorov-Smirnov — tests whether two groups come from the same distribution
- Two-sample Z-test — compares means when variance is known or n is large
- Permutation test — non-parametric, no distributional assumptions, resampling-based
- Tukey HSD — post-hoc test after ANOVA
- Dunn — non-parametric post-hoc test (requires scikit-posthocs)
- Fisher's Z — compare correlation coefficients between groups (target column = r values)
Options:
- Alternative — two-sided (default), greater (group1 > group2), or less (group1 < group2). Tukey HSD and Dunn are always two-sided.
- P-Adj Method — multiple comparison correction (Bonferroni, Holm, BH).
- N Permutations — number of resampling iterations for the permutation test (default 10,000).
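The interaction between the permutation test, the Alternative option, and N Permutations can be sketched with NumPy alone. This version uses the difference in means as the test statistic and adds one to numerator and denominator so the p-value is never exactly zero; the statistic, the correction, and the function name `permutation_test` are assumptions, since the node's exact choices are not documented here.

```python
import numpy as np

def permutation_test(group1, group2, n_permutations=10_000,
                     alternative="two-sided", seed=0):
    """Hypothetical helper: permutation test on the difference in means."""
    rng = np.random.default_rng(seed)
    x = np.asarray(group1, dtype=float)
    y = np.asarray(group2, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    diffs = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(pooled)      # random relabelling of the groups
        diffs[i] = shuffled[:x.size].mean() - shuffled[x.size:].mean()
    if alternative == "greater":                # group1 > group2
        extreme = diffs >= observed
    elif alternative == "less":                 # group1 < group2
        extreme = diffs <= observed
    elif alternative == "two-sided":
        extreme = np.abs(diffs) >= abs(observed)
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    p_value = (extreme.sum() + 1) / (n_permutations + 1)
    return float(observed), float(p_value)

a = [1, 2, 3, 4, 5]
b = [11, 12, 13, 14, 15]
diff, p_two_sided = permutation_test(a, b, n_permutations=2000)
_, p_less = permutation_test(a, b, n_permutations=2000, alternative="less")
_, p_greater = permutation_test(a, b, n_permutations=2000, alternative="greater")
```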
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | stats_table | stat |
Properties: Statistical Method, Alternative, N Permutations, P-Adj Method
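For readers unfamiliar with the three P-Adj Method choices, the sketch below shows how each one transforms a set of raw p-values. The function name `adjust_pvalues` and the lowercase method keys are hypothetical; in practice a library routine would normally be used instead.

```python
import numpy as np

def adjust_pvalues(pvalues, method="holm"):
    """Hypothetical helper: adjust p-values for multiple comparisons."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                 # work on p-values sorted ascending
    ranked = p[order]
    if method == "bonferroni":
        adjusted = ranked * m
    elif method == "holm":
        # Step-down: multiply by m, m-1, ..., 1 and enforce monotonicity.
        adjusted = np.maximum.accumulate(ranked * (m - np.arange(m)))
    elif method == "bh":
        # Benjamini-Hochberg step-up: p * m / rank, monotone from the largest down.
        scaled = ranked * m / np.arange(1, m + 1)
        adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    else:
        raise ValueError(f"unknown method: {method}")
    result = np.empty(m)
    result[order] = np.minimum(adjusted, 1.0)   # restore input order, cap at 1
    return result

raw = [0.01, 0.04, 0.03]
bonferroni = adjust_pvalues(raw, "bonferroni")
holm = adjust_pvalues(raw, "holm")
bh = adjust_pvalues(raw, "bh")
```

Bonferroni is the most conservative of the three, while BH controls the false discovery rate rather than the family-wise error rate.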
Normality Test¶
Tests whether each numerical column in a DataFrame follows a normal distribution.
Details
Tests:
- Shapiro-Wilk — recommended for small to moderate samples
- Kolmogorov-Smirnov — compares against a theoretical normal CDF
- Anderson-Darling — weighted variant sensitive to distribution tails
Outputs:
- results — summary table with test statistic, p-value, and pass/fail per column.
- qq_plot — Q-Q (quantile-quantile) plots for each column. Points following the red dashed reference line indicate normality; systematic curvature suggests non-normal distribution.
Use the Group Column option to test normality per group (e.g. per treatment condition before running a t-test or ANOVA).
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | results | table |
| Output | qq_plot | figure |
Properties: Test(s)
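The per-group use of the Group Column option can be approximated with pandas and SciPy as shown below. Only Shapiro-Wilk is covered; the helper name `shapiro_by_group`, the output column names, and the 0.05 pass threshold are assumptions for the sake of the example.

```python
import numpy as np
import pandas as pd
from scipy import stats

def shapiro_by_group(df, value_col, group_col, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk test for one column, per group."""
    rows = []
    for name, values in df.groupby(group_col)[value_col]:
        statistic, p_value = stats.shapiro(values.dropna())
        rows.append({
            "group": name,
            "statistic": float(statistic),
            "p_value": float(p_value),
            "passed": bool(p_value >= alpha),   # assumed pass/fail rule
        })
    return pd.DataFrame(rows)

# Deterministic demo data: normal quantiles versus a strongly skewed series.
df = pd.DataFrame({
    "condition": ["normal"] * 30 + ["skewed"] * 30,
    "value": np.concatenate([
        stats.norm.ppf(np.linspace(0.02, 0.98, 30)),
        np.exp(np.linspace(0, 5, 30)),
    ]),
})
results = shapiro_by_group(df, "value", "condition")
```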
Pairwise Matrix¶
Computes a pairwise correlation or distance matrix for all numeric columns and visualises it as a heatmap.
Details
Correlation methods:
- Pearson — linear correlation coefficient, assumes normality
- Spearman — rank-based, robust to outliers and non-normal distributions
- Kendall — rank-based, slower but more exact for small sample sizes
Outputs a matrix table (for further analysis) and an annotated heatmap figure.
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | table | table |
| Output | figure | figure |
Properties: Metric, Colormap
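The matrix output for the three correlation methods corresponds closely to what `DataFrame.corr` returns in pandas. The sketch below assumes non-numeric columns are simply dropped, which may or may not match the node; the helper name `correlation_matrix` is hypothetical, and the heatmap step is omitted.

```python
import pandas as pd

def correlation_matrix(df, method="pearson"):
    """Hypothetical helper: pairwise correlation of all numeric columns."""
    numeric = df.select_dtypes(include="number")   # assumption: drop non-numeric columns
    return numeric.corr(method=method)             # "pearson", "spearman" or "kendall"

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [1, 4, 9, 16, 25],        # monotonic but non-linear in x
    "label": list("abcde"),
})
pearson = correlation_matrix(df, "pearson")
spearman = correlation_matrix(df, "spearman")
```

Because `y` is a monotonic but non-linear function of `x`, the Spearman coefficient is exactly 1 while the Pearson coefficient is slightly lower.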
Variance Test¶
Tests whether two or more groups have equal variance (homoscedasticity).
Details
Use this to decide between Student's t-test (equal variance) and Welch's t-test (unequal variance), or to check ANOVA assumptions.
Tests:
- Levene's test — robust, works for non-normal data (recommended default)
- Bartlett's test — more powerful but assumes normality
- F-test — classical variance ratio test for exactly 2 groups (sensitive to non-normality)
Outputs a table with test statistic and p-value per group pair (F-test) or for all groups at once (Levene, Bartlett).
A significant p-value (< 0.05) indicates that the group variances differ; in that case, use Welch's t-test rather than Student's t-test.
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | result | table |
Properties: Test
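The decision rule described above (check variances first, then pick the t-test) might look like this when written directly against SciPy. The helper name `choose_t_test` and its return keys are hypothetical, and Levene's test is used with SciPy's default median centring.

```python
from scipy import stats

def choose_t_test(a, b, alpha=0.05):
    """Hypothetical helper: pick Student's or Welch's t-test from Levene's result."""
    _, levene_p = stats.levene(a, b)              # default center="median"
    equal_var = bool(levene_p >= alpha)
    # equal_var=False makes ttest_ind perform Welch's t-test.
    t_stat, t_p = stats.ttest_ind(a, b, equal_var=equal_var)
    return {
        "levene_p": float(levene_p),
        "test": "student" if equal_var else "welch",
        "t_statistic": float(t_stat),
        "p_value": float(t_p),
    }

tight = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9]
wide = [5.0, 15.0, 8.0, 12.0, 2.0, 18.0, 10.0, 11.0]
shifted = [v + 1.0 for v in tight]                # same spread, different mean

unequal = choose_t_test(tight, wide)
equal = choose_t_test(tight, shifted)
```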
Effect Size¶
Calculates effect sizes for pairwise group comparisons.
Details
Measures how large the difference between groups is, complementing p-values from statistical tests. Journals increasingly require effect sizes alongside significance testing.
Methods:
- Auto — Cohen's d for 2 groups, Eta-squared for 3+ groups
- Cohen's d — standardised mean difference (pooled SD)
- Hedges' g — Cohen's d with small-sample bias correction
- Glass's delta — mean difference divided by the control group SD
- Rank-biserial r — effect size for Mann-Whitney U (non-parametric)
- Eta-squared — proportion of variance explained (ANOVA-style)
- Omega-squared — bias-corrected eta-squared
Output columns: group1, group2, n1, n2, effect_size, ci_lower, ci_upper, magnitude, method.
magnitude uses conventional thresholds:
- Cohen's d / Hedges' g / Glass's delta: negligible < 0.2, small < 0.5, medium < 0.8, large >= 0.8
- Eta-squared / Omega-squared: negligible < 0.01, small < 0.06, medium < 0.14, large >= 0.14
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | results | table |
Properties: Method, CI Level, Bootstrap Iterations
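As a worked illustration of the standardised-mean-difference measures and the magnitude thresholds listed above, the following sketch uses only the Python standard library. The function names are hypothetical, Hedges' g uses the common approximate correction factor 1 - 3/(4·df - 1), and confidence intervals are left out.

```python
import math
import statistics

def cohens_d(g1, g2):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * statistics.variance(g1)
                  + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
    return (statistics.mean(g1) - statistics.mean(g2)) / math.sqrt(pooled_var)

def hedges_g(g1, g2):
    """Cohen's d with an approximate small-sample bias correction (assumed form)."""
    df = len(g1) + len(g2) - 2
    return cohens_d(g1, g2) * (1 - 3 / (4 * df - 1))

def magnitude(effect):
    """Label for d-type effect sizes using the thresholds listed above."""
    size = abs(effect)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

treated = [2, 4, 6, 8, 10]
control = [1, 3, 5, 7, 9]
d = cohens_d(treated, control)      # mean difference 1, pooled SD sqrt(10)
g = hedges_g(treated, control)
label = magnitude(d)
```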
Descriptive Stats¶
Computes comprehensive descriptive statistics for numeric columns.
Details
Calculates per-group (or overall) statistics including central tendency, dispersion, shape, and confidence intervals — everything needed for a publication-ready summary table.
Output columns: group, column, n, mean, median, std, sem, ci_lower, ci_upper, min, q1, q3, max, iqr, skewness, kurtosis, cv.
- group_col — optional grouping column. If set, statistics are computed per group. Leave blank for overall stats.
- value_cols — columns to summarise. Leave blank for all numeric.
- ci_level — confidence interval level (default 0.95).
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | results | table |
Properties: CI Level
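The output columns can be reproduced for a single series roughly as follows. Several definitions here are assumptions rather than documented behaviour: the confidence interval is a t-interval around the mean, quartiles use linear interpolation, and skewness and kurtosis are the bias-adjusted (excess) versions that pandas computes. The helper name `describe_column` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

def describe_column(values, ci_level=0.95):
    """Hypothetical helper: descriptive statistics for one numeric series."""
    x = pd.Series(values, dtype=float).dropna()
    n = int(x.size)
    mean = float(x.mean())
    std = float(x.std(ddof=1))                 # sample standard deviation
    sem = std / np.sqrt(n)
    # Assumed CI: t-interval for the mean with n - 1 degrees of freedom.
    half_width = stats.t.ppf(0.5 + ci_level / 2, n - 1) * sem
    q1, q3 = x.quantile([0.25, 0.75])
    return {
        "n": n, "mean": mean, "median": float(x.median()),
        "std": std, "sem": float(sem),
        "ci_lower": float(mean - half_width), "ci_upper": float(mean + half_width),
        "min": float(x.min()), "q1": float(q1), "q3": float(q3),
        "max": float(x.max()), "iqr": float(q3 - q1),
        "skewness": float(x.skew()), "kurtosis": float(x.kurt()),
        "cv": std / mean,                      # coefficient of variation
    }

row = describe_column([2, 4, 4, 4, 5, 5, 7, 9])
```

Per-group statistics would follow by applying the same function to each group, for example inside a `groupby` loop.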
Distribution Fit¶
Fits data to candidate probability distributions and ranks them by goodness-of-fit (AIC / BIC / Kolmogorov-Smirnov).
Details
Select which distributions to test, or use All to try every candidate. The node outputs a ranking table with fitted parameters and a figure overlaying the best-fit PDFs on the empirical histogram.
Candidate distributions: Normal, Log-Normal, Exponential, Gamma, Weibull, Beta, Rayleigh, Uniform, Cauchy, Logistic, Pareto, Student-t, Inverse Gaussian.
Outputs:
- results — one row per tested distribution with shape/loc/scale params, log-likelihood, AIC, BIC, KS statistic, and KS p-value, sorted by AIC.
- figure — histogram of the data with top-N best-fit PDF curves overlaid.
| Direction | Port | Type |
|---|---|---|
| Input | in | table |
| Output | results | table |
Properties: Distributions, Overlay Top-N, Histogram Bins, Fig Width, Fig Height
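A minimal sketch of the ranking step, assuming maximum-likelihood fits via SciPy and only three of the candidate distributions, is shown below. The names `CANDIDATES` and `rank_distributions` and the row keys are hypothetical. Note that KS p-values computed against parameters estimated from the same data tend to be optimistic, so they are best read as a relative guide.

```python
import numpy as np
from scipy import stats

# Assumed subset of the candidate list, mapped to SciPy distribution objects.
CANDIDATES = {
    "Normal": stats.norm,
    "Exponential": stats.expon,
    "Cauchy": stats.cauchy,
}

def rank_distributions(data, candidates=CANDIDATES):
    """Hypothetical helper: fit each candidate and sort the results by AIC."""
    x = np.asarray(data, dtype=float)
    n = x.size
    rows = []
    for name, dist in candidates.items():
        params = dist.fit(x)                       # maximum-likelihood estimates
        loglik = float(np.sum(dist.logpdf(x, *params)))
        k = len(params)                            # number of fitted parameters
        ks_stat, ks_p = stats.kstest(x, dist.cdf, args=params)
        rows.append({
            "distribution": name,
            "params": params,
            "log_likelihood": loglik,
            "aic": 2 * k - 2 * loglik,
            "bic": k * np.log(n) - 2 * loglik,
            "ks_statistic": float(ks_stat),
            "ks_p_value": float(ks_p),
        })
    return sorted(rows, key=lambda r: r["aic"])    # lowest AIC first

rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=5.0, size=500)
ranking = rank_distributions(sample)
```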