Skip to content

refactor(huntsman): Unify TaskId by replacing spider-core's definition with spider-storage's.#331

Open
LinZhihao-723 wants to merge 1 commit into
y-scope:mainfrom
LinZhihao-723:task-id-fix
Open

refactor(huntsman): Unify TaskId by replacing spider-core's definition with spider-storage's.#331
LinZhihao-723 wants to merge 1 commit into
y-scope:mainfrom
LinZhihao-723:task-id-fix

Conversation

@LinZhihao-723
Copy link
Copy Markdown
Member

@LinZhihao-723 LinZhihao-723 commented May 31, 2026

Description

TaskId was defined in both spider-core and spider-storage while:

  • The one defined in spider-core is a wrapper of UUID.
  • The one defined in spider-storage is an enum of {task index, commit, cleanup}.

The definition in spider-storage is the actual definition we use in the current implementation.

Thus, we replace the definition in spider-core with spider-storage's one so that all components can use the same definition.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.

Summary by CodeRabbit

  • Refactor

    • Restructured internal task identification handling to use a more efficient enumeration-based approach instead of the previous type-wrapper pattern.
  • Tests

    • Updated test utilities to work with the new task identification structure.

@LinZhihao-723 LinZhihao-723 requested review from a team and sitaowang1998 as code owners May 31, 2026 21:35
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 31, 2026

Review Change Stack

Walkthrough

This pull request migrates the TaskId type definition from a UUID-wrapper in spider-storage to a semantic enum in spider-core, distinguishing tasks by index position or role. All dependent imports are repointed, and test fixtures are updated to use deterministic TaskId::Index(0) values.

Changes

TaskId type migration

Layer / File(s) Summary
Define TaskId enum in spider-core
components/spider-core/src/types/id.rs
Introduces TaskId as an enum with Index(TaskIndex), Commit, and Cleanup variants, replacing the previous UUID-wrapper type alias. Adds TaskIndex import, derives Serialize/Deserialize, and removes the SignedTaskId alias and associated tests.
Remove TaskId from spider-storage cache module
components/spider-storage/src/cache.rs
Cache module no longer exports TaskId or imports TaskIndex, since the canonical definition now resides in spider-core::types::id.
Update TaskId imports across modules
components/spider-storage/src/cache/job.rs, components/spider-storage/src/task_instance_pool.rs, components/spider-storage/tests/scheduling_infra.rs
All TaskId references now import from spider_core::types::id instead of crate::cache, maintaining type identity and public API signatures.
Update test fixtures to use deterministic TaskId values
components/spider-tdl/src/task.rs, components/spider-tdl/src/task_context.rs, components/spider-tdl/tests/test_task_macro.rs, tests/huntsman/task-executor/src/lib.rs, tests/huntsman/task-executor/tests/test_process_pool.rs, tests/huntsman/tdl-integration/tests/complex.rs
Test helpers and unit tests construct TaskContext and ExecuteRequest with fixed TaskId::Index(0) instead of random TaskId::new() values, ensuring deterministic test behaviour and accommodating the new enum definition.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • sitaowang1998
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the primary change: unifying the TaskId definition across components by replacing spider-core's UUID-based definition with spider-storage's enum-based definition.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
components/spider-tdl/src/task_context.rs (1)

30-40: ⚡ Quick win

Cover Commit and Cleanup in this round-trip test.

TaskContext.task_id now has three wire shapes, but this test only exercises TaskId::Index(0). A small table-driven round-trip over Index, Commit, and Cleanup would keep the termination-task serialization path covered too.

🧪 Suggested test shape
 #[test]
 fn round_trip_msgpack() -> anyhow::Result<()> {
-    let ctx = TaskContext {
-        job_id: JobId::new(),
-        task_id: TaskId::Index(0),
-        task_instance_id: 13,
-        resource_group_id: ResourceGroupId::new(),
-    };
-    let encoded = rmp_serde::to_vec(&ctx)?;
-    let decoded: TaskContext = rmp_serde::from_slice(&encoded)?;
-    assert_eq!(decoded, ctx);
+    for task_id in [TaskId::Index(0), TaskId::Commit, TaskId::Cleanup] {
+        let ctx = TaskContext {
+            job_id: JobId::new(),
+            task_id,
+            task_instance_id: 13,
+            resource_group_id: ResourceGroupId::new(),
+        };
+        let encoded = rmp_serde::to_vec(&ctx)?;
+        let decoded: TaskContext = rmp_serde::from_slice(&encoded)?;
+        assert_eq!(decoded, ctx);
+    }
     Ok(())
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-tdl/src/task_context.rs` around lines 30 - 40, Update the
round_trip_msgpack test to iterate over the three TaskId wire shapes so Commit
and Cleanup are covered: create a small table (or array) of TaskId variants
including TaskId::Index(0), TaskId::Commit(some_id) and
TaskId::Cleanup(some_id), and for each variant construct a TaskContext (using
the existing fields like job_id, task_instance_id, resource_group_id), serialize
with rmp_serde::to_vec and deserialize with rmp_serde::from_slice, and
assert_eq!(decoded, original) for every case; modify the round_trip_msgpack
function to loop over these TaskId variants so the termination-task
serialization paths (Commit and Cleanup) are exercised.
tests/huntsman/task-executor/src/lib.rs (1)

186-202: ⚡ Quick win

Update docstring to reflect fixed task_id value.

The docstring states "The id fields are fresh per call", but task_id is now fixed at TaskId::Index(0) while job_id and resource_group_id remain fresh. The docstring should clarify this distinction.

📝 Proposed docstring update
 /// # Returns
 ///
-/// A placeholder msgpack-encoded [`TaskContext`] suitable for a one-shot test invocation. The id
-/// fields are fresh per call but the executor doesn't inspect them.
+/// A placeholder msgpack-encoded [`TaskContext`] suitable for a one-shot test invocation. The
+/// `job_id` and `resource_group_id` fields are fresh per call; `task_id` is fixed at `Index(0)`.
+/// The executor doesn't inspect these values.
 ///
 /// # Panics
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/huntsman/task-executor/src/lib.rs` around lines 186 - 202, Update the
docstring for build_ctx to accurately reflect that not all id fields are fresh:
explain that job_id and resource_group_id are newly generated per call
(JobId::new(), ResourceGroupId::new()) while task_id is fixed to
TaskId::Index(0) and task_instance_id is 1; mention that encoding may panic on
failure as before and keep the must_use note.
tests/huntsman/task-executor/tests/test_process_pool.rs (1)

66-90: ⚡ Quick win

Update docstring to clarify which IDs are fresh.

The docstring states "fresh IDs", but task_id is now fixed at TaskId::Index(0) while job_id and resource_group_id remain fresh. Consider updating the docstring for clarity.

📝 Proposed docstring update
 /// # Returns
 ///
-/// A request with fresh IDs, a placeholder [`TimeoutPolicy`] (which the pool ignores — the caller
-/// supplies `hard_timeout` directly to [`ProcessPool::execute`]), and the supplied `inputs`.
+/// A request with fresh `job_id` and `resource_group_id`, fixed `task_id` at `Index(0)`, a
+/// placeholder [`TimeoutPolicy`] (which the pool ignores — the caller supplies `hard_timeout`
+/// directly to [`ProcessPool::execute`]), and the supplied `inputs`.
 fn make_request(task_func: &str, inputs: Vec<TaskInput>) -> ExecuteRequest {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/huntsman/task-executor/tests/test_process_pool.rs` around lines 66 -
90, The docstring for make_request is misleading: it says "fresh IDs" but only
job_id and resource_group_id are generated fresh while task_id is fixed to
TaskId::Index(0); update the function's docstring to explicitly state which IDs
are fresh (job_id and resource_group_id) and that task_id is intentionally fixed
to TaskId::Index(0) so callers understand the behavior; reference the
make_request function and the fields job_id, resource_group_id, and task_id in
the updated comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@components/spider-tdl/src/task_context.rs`:
- Around line 30-40: Update the round_trip_msgpack test to iterate over the
three TaskId wire shapes so Commit and Cleanup are covered: create a small table
(or array) of TaskId variants including TaskId::Index(0),
TaskId::Commit(some_id) and TaskId::Cleanup(some_id), and for each variant
construct a TaskContext (using the existing fields like job_id,
task_instance_id, resource_group_id), serialize with rmp_serde::to_vec and
deserialize with rmp_serde::from_slice, and assert_eq!(decoded, original) for
every case; modify the round_trip_msgpack function to loop over these TaskId
variants so the termination-task serialization paths (Commit and Cleanup) are
exercised.

In `@tests/huntsman/task-executor/src/lib.rs`:
- Around line 186-202: Update the docstring for build_ctx to accurately reflect
that not all id fields are fresh: explain that job_id and resource_group_id are
newly generated per call (JobId::new(), ResourceGroupId::new()) while task_id is
fixed to TaskId::Index(0) and task_instance_id is 1; mention that encoding may
panic on failure as before and keep the must_use note.

In `@tests/huntsman/task-executor/tests/test_process_pool.rs`:
- Around line 66-90: The docstring for make_request is misleading: it says
"fresh IDs" but only job_id and resource_group_id are generated fresh while
task_id is fixed to TaskId::Index(0); update the function's docstring to
explicitly state which IDs are fresh (job_id and resource_group_id) and that
task_id is intentionally fixed to TaskId::Index(0) so callers understand the
behavior; reference the make_request function and the fields job_id,
resource_group_id, and task_id in the updated comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0b475c7f-27cf-4706-8084-7473253a5015

📥 Commits

Reviewing files that changed from the base of the PR and between 27091f0 and 5650d5a.

📒 Files selected for processing (11)
  • components/spider-core/src/types/id.rs
  • components/spider-storage/src/cache.rs
  • components/spider-storage/src/cache/job.rs
  • components/spider-storage/src/task_instance_pool.rs
  • components/spider-storage/tests/scheduling_infra.rs
  • components/spider-tdl/src/task.rs
  • components/spider-tdl/src/task_context.rs
  • components/spider-tdl/tests/test_task_macro.rs
  • tests/huntsman/task-executor/src/lib.rs
  • tests/huntsman/task-executor/tests/test_process_pool.rs
  • tests/huntsman/tdl-integration/tests/complex.rs
💤 Files with no reviewable changes (1)
  • components/spider-storage/src/cache.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant