Skip to content

Feat/seed for structure generation#336

Merged
Gitdowski merged 10 commits intomainfrom
feat/seed-for_structure-generation
May 8, 2026
Merged

Feat/seed for structure generation#336
Gitdowski merged 10 commits intomainfrom
feat/seed-for_structure-generation

Conversation

@Gitdowski
Copy link
Copy Markdown
Contributor

Following PR #332, where a random seed to the get_structure_dict() has been added, this PR adds the new input argument structure_seed to the API workflow. If now a replica simulation is started with the same composition, system size and settings, but with a different structure_seed it will trigger a new simulation.
Also, the structure_seed is also part of the hashing. Therefore, an existing database needs to be migrated.

  • added structure_seed to the MeltQuenchParams class and the generate_structure()
  • added tests

@Gitdowski Gitdowski requested a review from Copilot May 6, 2026 13:20
@github-actions github-actions Bot added the type: feature Changelog: new feature or performance improvement → bumps minor version label May 6, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@Gitdowski
Copy link
Copy Markdown
Contributor Author

For reference, at the current stage, a database can be migrated (requires write access) using:

# run once with app env
from amorphouspy_api.database import get_job_store, Job
from amorphouspy_api.models import JobSubmission
from amorphouspy_api.routers.jobs_helpers import _job_hash

store = get_job_store()
with store.session() as s:
    rows = s.query(Job).filter(Job.status == "completed").all()
    for row in rows:
        req = row.request_data or {}
        sim = req.get("simulation", {})
        if "structure_seed" not in sim:
            sim["structure_seed"] = 42
            req["simulation"] = sim
            sub = JobSubmission(**req)
            new_hash = _job_hash(sub, row.composition)
            row.request_hash = new_hash
            row.request_data = req
    s.commit()

@Gitdowski Gitdowski requested a review from ltalirz May 6, 2026 13:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the API workflow to accept a structure_seed parameter for structure generation, ensuring that identical compositions/settings can produce different initial structures (and thus distinct jobs) when the seed changes, and that the seed is included in job hashing for caching/deduplication.

Changes:

  • Add structure_seed to MeltQuenchParams and include it in the job hash via submission.simulation.model_dump().
  • Forward structure_seed to core structure generation as random_seed in generate_structure().
  • Add API and workflow tests to verify hashing, forwarding, and request acceptance.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
amorphouspy_api/src/amorphouspy_api/models.py Adds structure_seed to the simulation parameter model exposed by the API.
amorphouspy_api/src/amorphouspy_api/workflows/meltquench.py Passes the new seed through to get_structure_dict(..., random_seed=...) during structure generation.
amorphouspy_api/src/tests/test_jobs.py Adds tests covering hash changes, seed forwarding, and POST acceptance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread amorphouspy_api/src/amorphouspy_api/models.py
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@Atilaac
Copy link
Copy Markdown
Contributor

Atilaac commented May 6, 2026

Thanks @Gitdowski, can you add this to the documentation as well!

@Gitdowski
Copy link
Copy Markdown
Contributor Author

@Atilaac: Good catch. Is this another place in the documentation where you would add a word on this, or is this fine?

Comment thread docs/theory/structure.md Outdated

If specified manually, using (combinations of) an unrealistically high `density`, too high `min_distance`, too many `target_atoms` or `n_molecules`, or too few `max_attempts_per_atom` can lead to placement failures.

Internally, the random placement of atoms is controlled by the `structure_seed` parameter. This ensures reproducibility on the one hand. On the other hand, if statistics are checked and the same system is simulated several times it is recommended to use different seeds for each run to get a better sampling of the configuration space.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check: at the moment, if I specify the same structure_seed, do I get the same structure?
My suspicion would be: yes, on the same computer; maybe not on different computers.

What you may want to communicate here is: if you use a fixed structure_seed , you may get the same structure (not guaranteed). If you use a different structure_seed, you are guaranteed to get a different structure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the RNG to explicitely call the PCG64, PreComputed Generator, instead of relying on the default_rng() (which currently is the PCG64). Therefore, we are safe in case numpy decides to switch to a different RNG for a future release.
Furthermore, PCG64 is frozen and designed to be reproducible across different CPU achitectures and operating systems.

I just checked this for a small system on two different machines. With the default seed the same structures are generated on both machines. Also using a non-default seed produces the same structures on both machines (and of course different from the structures generated by the default seed).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! So is it still true that the result depends on Python/numpy version or no longer?

In any case, good to merge from my side

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if I undersand it correctly, this should be reproducible even across different python/numpy versions, as long as numpy version is 1.17 or newer. They explicitely fixed this.

From Copilot: Caveats (that we are currently not using in our code):

  1. If the call order of commands is changed, this changes the results. Eg,
rng.random()
rng.normal()

is not the same as

rng.normal()
rng.random()
  1. If we employ parallel or distributed usage. For example when using:
  • SeedSequence.spawn
  • multiprocessing
  • joblib
  • MPI

Reproducibility then depends on:

  • stable process count
  • stable spawning order

This is deterministic but configuration‑sensitive.

@Gitdowski Gitdowski merged commit 6e7a78b into main May 8, 2026
10 checks passed
@Gitdowski Gitdowski deleted the feat/seed-for_structure-generation branch May 8, 2026 07:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type: feature Changelog: new feature or performance improvement → bumps minor version

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants