Distributions: Defining the Rules of Chance

Abstract

Distributions serve as the probabilistic blueprints for your simulation's uncertainty. By defining the "shape" of possible variations, they enable rigorous Monte Carlo analysis. This guide details the mechanics of Salted Seeds for deterministic reproducibility, the distinction between configuration registries (DistributionDict) and result registries (NamedValueDict), and the core workflows for both locked and repeated sampling.


A Distribution is a mathematical recipe for generating random numbers. While a NamedValue holds a single sampled value, a Distribution describes the "shape" of all possible values.

The "What" and the "Why"

Simulations are often used for Monte Carlo analysis, where the same task is run hundreds of times with slight variations to see how often it fails. Distributions define those variations.

Instead of saying "the floor is slippery," you define a UniformDistribution for friction between 0.1 and 0.4. The system will then pick a new, valid number for every trial.
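
As a rough illustration of the idea, independent of the library, the sketch below runs a small Monte Carlo loop with Python's standard random module. Only the 0.1 to 0.4 friction range comes from the text above; the function name, the 0.15 slip threshold, and the trial count are assumptions made up for this example.

```python
import random


def estimate_slip_rate(num_trials: int, seed: int, slip_threshold: float = 0.15) -> float:
    """Fraction of trials whose sampled friction falls below a hypothetical threshold."""
    rng = random.Random(seed)  # one seeded generator keeps the run repeatable
    failures = 0
    for _ in range(num_trials):
        friction = rng.uniform(0.1, 0.4)  # a fresh value for every trial
        if friction < slip_threshold:  # stand-in for a real simulation outcome
            failures += 1
    return failures / num_trials


rate = estimate_slip_rate(num_trials=1000, seed=42)
print(f"slip rate: {rate:.3f}")
```

For a uniform draw on [0.1, 0.4], about one sixth of the trials should fall below 0.15, so the printed rate should land near 0.17.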

Advanced Features

Repeatable Randomness (The "Salted" Seed)

To make sure your results are repeatable (great for debugging), we use a "salted" seeding method. If you provide a single global seed, the system automatically mixes it with other parameters to create a unique local seed for every draw.

The three ingredients in the "Salt" are:

  1. Global Seed: Controls the broad "campaign." Change this to get a totally different set of results.
  2. Distribution Name: Ensures unique draws for different parameters. Without this, two parameters with the same config (like x_offset and y_offset, if using the same dispersion) would produce identical, coupled values.
  3. Trial Number: Ensures that every iteration in your Monte Carlo run gets a unique value from the dispersion.
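
One way to picture the mixing step is a hash over the three ingredients. The sketch below is an assumption about how such a scheme could work, not the library's actual implementation; `salted_seed` and `draw` are hypothetical helpers built on the standard library.

```python
import hashlib
import random


def salted_seed(global_seed: int, dist_name: str, trial_num: int) -> int:
    """Mix the three ingredients into one 64-bit local seed (illustrative scheme)."""
    key = f"{global_seed}|{dist_name}|{trial_num}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big")


def draw(global_seed: int, dist_name: str, trial_num: int) -> float:
    """Draw one value in [0, 1) from a generator seeded with the salted seed."""
    return random.Random(salted_seed(global_seed, dist_name, trial_num)).random()


# Same ingredients give the same value, so a failing trial can be replayed exactly.
x_first = draw(42, "x_offset", 3)
x_again = draw(42, "x_offset", 3)

# Changing only the name or only the trial number yields a different local seed.
y_seed = salted_seed(42, "y_offset", 3)
next_trial_seed = salted_seed(42, "x_offset", 4)
```

A cryptographic hash is used here rather than Python's built-in hash() because string hashes are randomized per process by default, which would defeat repeatability.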

The Registry: DistributionDict

A DistributionDict acts as a centralized record of the "rules" used during a simulation. While a NamedValueDict stores the results (the numbers), the DistributionDict stores the config (the math).

This is critical for:

  • Serialization: Saving exactly what settings were used so a colleague can recreate the simulation.
  • Bulk Updates: Changing the trial_num for every distribution at once as the simulation progresses.
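
To make the two use cases concrete, here is a minimal plain-Python sketch of a config registry. `NormalRule` and the dictionary are hypothetical stand-ins for a distribution and a DistributionDict; none of the names below are part of the stochas API.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class NormalRule:
    """Hypothetical stand-in for a distribution config: the 'math', not a result."""

    name: str
    mu: float
    sigma: float
    seed: int = 0
    trial_num: int = 0


# Config registry: rules keyed by name (the role DistributionDict plays).
rules = {
    "link_mass": NormalRule(name="link_mass", mu=1.5, sigma=0.15, seed=42),
    "motor_torque": NormalRule(name="motor_torque", mu=5.0, sigma=0.2, seed=42),
}

# Bulk update: move every rule to trial 7 in one pass.
rules = {name: replace(rule, trial_num=7) for name, rule in rules.items()}

# Serialization: dump the settings so the run can be recreated elsewhere.
payload = json.dumps({name: asdict(rule) for name, rule in rules.items()}, indent=2)
restored = {name: NormalRule(**fields) for name, fields in json.loads(payload).items()}
```

Because the registry holds only configuration, the serialized payload stays small and is enough to regenerate every draw when combined with the seeding scheme.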

Supported Distribution Types

  • Normal: The classic Bell Curve for natural variation.
  • Uniform: For strict ranges where any value is equally likely.
  • Discrete Uniform: Like Uniform, but returns only integers.
  • Categorical: To pick from a fixed set of named choices (e.g., Materials).
  • Bernoulli: A simple True/False coin flip.
  • Truncated Normal: A Bell Curve with hard physical limits (e.g., mass cannot be negative).
  • Log Normal: For positive values with "long-tail" outliers (e.g., contact forces).
  • Triangular: A simpler alternative to Normal when you only know min, max, and peak.
  • Poisson / Exponential: For modeling how often random events occur (Poisson) or the time between them (Exponential).
  • Rayleigh: For modeling the magnitude of a 2D vector with independent Normal components (e.g., radial positioning error).
  • Permutation: To return a shuffled version of a master list.
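
A few of these definitions are easier to see in code. The sketch below uses only Python's standard random module, not stochas, and every parameter value in it is an arbitrary choice for illustration.

```python
import math
import random

rng = random.Random(0)

# Bernoulli: True with probability p.
p = 0.3
coin = rng.random() < p

# Triangular: only min, max, and peak (mode) are needed.
# Note that the stdlib argument order is (low, high, mode).
tri = rng.triangular(0.0, 10.0, 2.0)

# Rayleigh: magnitude of a 2D vector whose components are independent
# zero-mean Normal draws sharing the same sigma.
sigma = 1.0
radial_errors = [
    math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(20_000)
]
mean_radius = sum(radial_errors) / len(radial_errors)

# Permutation: a shuffled copy of a master list (the original is untouched).
master = ["a", "b", "c", "d"]
shuffled = rng.sample(master, k=len(master))
```

The sample mean of the radial errors should sit close to the theoretical Rayleigh mean, sigma * sqrt(pi / 2), which is about 1.25 for sigma = 1.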

Example: Sampling into Registries

When using sample_and_update_dicts, the system checks if a value with that name already exists in your NamedValueDict. If it does, it returns the existing value instead of drawing a new one. This ensures all parts of your simulation use the same "random" choice for a single trial.

Python
from numpydantic import NDArray

import stochas

# 1. Define the rule
motor_rule = stochas.NormalDistribution(
    name=stochas.DistName("motor_torque"),
    mu=5.0,
    sigma=0.2,
)

# 2. Setup the registries
rules = stochas.DistributionDict()
results = stochas.NamedValueDict[NDArray]()

# 3. Sample and Register
# This returns a NamedValue and saves it to 'results'
val_1 = motor_rule.sample_and_update_dicts(
    dist_dict=rules,
    named_value_dict=results,
).squeeze()

# 4. Subsequent calls return the SAME value
val_2 = motor_rule.sample_and_update_dicts(
    dist_dict=rules,
    named_value_dict=results,
).squeeze()

print(val_1.value == val_2.value)  # True
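
The locking behavior can be pictured as a get-or-create lookup. The sketch below mimics it with a plain dictionary and the standard library; `sample_once` and its arguments are hypothetical names, not part of stochas, and the real method may do more than this.

```python
import random


def sample_once(
    name: str,
    results: dict[str, float],
    rng: random.Random,
    mu: float,
    sigma: float,
) -> float:
    """Return the stored value for `name`, drawing a new one only if none exists."""
    if name not in results:
        results[name] = rng.gauss(mu, sigma)  # first request: draw and store
    return results[name]  # later requests: reuse the stored draw


rng = random.Random(42)
results: dict[str, float] = {}

first = sample_once("motor_torque", results, rng, mu=5.0, sigma=0.2)
second = sample_once("motor_torque", results, rng, mu=5.0, sigma=0.2)
print(first == second)  # True: the second call never touches the generator
```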

Example: Repeated Sampling

If you need many different random numbers from the same distribution without locking them into a registry (e.g., for noise injection or for redraws to meet some constraint), call a sampling method directly: .sample(), .draw(), or, as in the example below, .sample_to_named_value(). Each call advances the random number generator.

Python
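from numpydantic import NDArray

import stochas

# Empty registries for the rules (config) and the results (values)
rules = stochas.DistributionDict()
results = stochas.NamedValueDict[NDArray]()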

friction_rule = (
    stochas.UniformDistribution(
        name=stochas.DistName("friction"),
        low=0.2,
        high=0.4,
    )
    .with_seed(42)  # fix the global seed
    .with_trial_num(10)  # fix the trial number
)

# These will all be different random numbers
draw_1 = friction_rule.sample_to_named_value().squeeze()  # 0.378
draw_2 = friction_rule.sample_to_named_value().squeeze()  # 0.205
draw_3 = friction_rule.sample_to_named_value().squeeze()  # 0.216

print(draw_1.value, draw_2.value, draw_3.value)  # Three different values

# When done, register the rule and the draw you decided to keep
rules.update(friction_rule)
results.update(draw_3)
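
The "redraw to meet some constraint" case mentioned above can be sketched as a bounded retry loop. This version uses the standard library rather than stochas; `draw_with_constraint`, its limits, and the attempt cap are assumptions chosen for illustration.

```python
import random


def draw_with_constraint(
    rng: random.Random,
    mu: float,
    sigma: float,
    low: float,
    high: float,
    max_attempts: int = 1000,
) -> float:
    """Redraw from Normal(mu, sigma) until the value lies in [low, high]."""
    for _ in range(max_attempts):
        value = rng.gauss(mu, sigma)  # each call advances the generator
        if low <= value <= high:
            return value
    raise RuntimeError(f"no valid draw in {max_attempts} attempts")


rng = random.Random(42)
samples = [
    draw_with_constraint(rng, mu=0.3, sigma=0.1, low=0.2, high=0.4) for _ in range(5)
]
```

Capping the number of attempts keeps an unsatisfiable constraint from hanging the simulation; since the generator is seeded, the same sequence of accepted and rejected draws repeats on every run.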

Generating Report Tables

When a simulation uses many distributions, it is useful to export a human-readable summary for inclusion in a design document or report. DistributionDict supports this with the to_tables() method.

Categorizing Distributions

Each distribution has a category field. Setting it to a meaningful label groups related distributions together. One subdirectory is created per unique category value, and each distribution type within it gets its own CSV file.

Python
from pathlib import Path

import stochas

dist_dict = stochas.DistributionDict()
dist_dict.update(
    stochas.NormalDistribution(
        name=stochas.DistName("link_mass"),
        mu=1.5,
        sigma=0.15,
        category="link_properties",
    )
)
dist_dict.update(
    stochas.NormalDistribution(
        name=stochas.DistName("link_inertia"),
        mu=0.02,
        sigma=0.002,
        category="link_properties",
    )
)
dist_dict.update(
    stochas.TruncatedNormalDistribution(
        name=stochas.DistName("link_length"),
        mu=0.25,
        sigma=0.01,
        low=0.0,
        category="link_properties",
    )
)
dist_dict.update(
    stochas.UniformDistribution(
        name=stochas.DistName("init_joint_ang"),
        low=-0.5,
        high=0.5,
        category="initial_conditions",
    )
)
dist_dict.to_tables(Path("report_tables"))

Tip

Any distribution that does not have category set will land in an uncategorized/ subdirectory.

Output Format

to_tables(directory) creates one subdirectory per category, then writes one .csv file per distribution type within it. The filename is the distribution type name (e.g. normal.csv, truncated_normal.csv). Each file is a flat table whose columns are Name, Units, and every parameter specific to that type.

The example above produces the following layout:

report_tables/
├── link_properties/
│   ├── normal.csv
│   └── truncated_normal.csv
└── initial_conditions/
    └── uniform.csv

link_properties/normal.csv

Name          Units   mu    sigma
link_mass     kg      1.5   0.15
link_inertia  kg·m²   0.02  0.002

Raw CSV:
Name,Units,mu,sigma
link_mass,kg,1.5,0.15
link_inertia,kg·m²,0.02,0.002

link_properties/truncated_normal.csv

Name         Units  mu    sigma  low  high
link_length  m      0.25  0.01   0    inf

Raw CSV:
Name,Units,mu,sigma,low,high
link_length,m,0.25,0.01,0.0,inf
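
The layout rule (one folder per category, one CSV per distribution type, with an uncategorized fallback) can be reproduced in a few lines of plain Python. This is only a sketch of the grouping logic, not how to_tables() is implemented; the record dictionaries, the `write_tables` helper, and the "rad" unit are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical flat records standing in for distribution configs.
records = [
    {"type": "normal", "category": "link_properties",
     "Name": "link_mass", "Units": "kg", "mu": 1.5, "sigma": 0.15},
    {"type": "normal", "category": "link_properties",
     "Name": "link_inertia", "Units": "kg·m²", "mu": 0.02, "sigma": 0.002},
    {"type": "uniform", "category": None,
     "Name": "init_joint_ang", "Units": "rad", "low": -0.5, "high": 0.5},
]


def write_tables(records: list[dict], directory: Path) -> list[Path]:
    """Write one CSV per (category, type) pair and return the created paths."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for rec in records:
        category = rec["category"] or "uncategorized"  # fallback folder
        groups[(category, rec["type"])].append(rec)

    written = []
    for (category, dist_type), rows in groups.items():
        path = directory / category / f"{dist_type}.csv"
        path.parent.mkdir(parents=True, exist_ok=True)
        # Columns: everything except the two grouping keys.
        columns = [key for key in rows[0] if key not in ("type", "category")]
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written


out_dir = Path(tempfile.mkdtemp())
paths = write_tables(records, out_dir)
```

Grouping by type within each category keeps every file rectangular, since all rows in one file share the same parameter columns.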