group by v2 by Honglei-Qiu · Pull Request #680 · PaddlePaddle/GraphNet

Honglei-Qiu · 2026-03-23T08:50:42Z

PR Category

Feature Enhancement

Description

Add new group grouping rules.

paddle-bot · 2026-03-23T08:50:49Z

Thanks for your contribution!

sqlite/graph_net_sample_groups_insert2.py

Xreki

Please count how many groups each grouping method produces.

Xreki · 2026-03-24T01:59:39Z

sqlite/graph_net_sample_groups_insert_v2.py

+    WHERE s.deleted = 0
+      AND s.sample_type != 'full_graph'
+) sub
+WHERE sub.rn = 1


What is the purpose of this condition?

Deduplication, I think: it prevents the same sample from being selected more than once.

Xreki · 2026-03-24T02:01:19Z

sqlite/graph_net_sample_groups_insert_v2.py

+
+def get_v2_group_members(candidates: list[CandidateGraph], num_dtypes: int):
+    # Index candidates by op_seq
+    by_op_seq = defaultdict(list)


Please improve all the variable names.

Xreki · 2026-03-24T06:13:32Z

sqlite/graph_net_sample_groups_insert.py

+        b.input_shapes_bucket_id,
+        b.input_dtypes_bucket_id,
+        s.graph_hash,
+        ROW_NUMBER() OVER (


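graph_hash is no longer needed, right? What does ROW_NUMBER do here?

Within each (op_seq, shapes, dtypes) partition, rows are numbered in order of creation time, and only rn = 1 (the earliest row) is kept. The effect is deduplication within a bucket: one bucket may contain several samples, and only one representative is kept.
That said, the code has changed a lot since then.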
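The per-bucket deduplication described in this reply can be sketched against an in-memory SQLite database. The table and column names follow the diff fragments in this thread, but the schema is a simplified assumption and the rows are made up for illustration. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Simplified, assumed schema: only the columns named in the review thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph_sample (uuid TEXT PRIMARY KEY, create_at TEXT);
CREATE TABLE graph_net_sample_buckets (
    sample_uid TEXT,
    op_seq_bucket_id TEXT,
    input_shapes_bucket_id TEXT,
    input_dtypes_bucket_id TEXT
);
INSERT INTO graph_sample VALUES
    ('a', '2026-01-01'), ('b', '2026-01-02'), ('c', '2026-01-03');
INSERT INTO graph_net_sample_buckets VALUES
    ('a', 'op1', 'sh1', 'dt1'),
    ('b', 'op1', 'sh1', 'dt1'),  -- same bucket as 'a', created later
    ('c', 'op1', 'sh1', 'dt2');
""")

# Number the rows inside each (op_seq, shapes, dtypes) bucket by creation
# time and keep only the first one as the bucket's representative.
DEDUP_SQL = """
SELECT sub.sample_uid
FROM (
    SELECT
        s.uuid AS sample_uid,
        ROW_NUMBER() OVER (
            PARTITION BY b.op_seq_bucket_id,
                         b.input_shapes_bucket_id,
                         b.input_dtypes_bucket_id
            ORDER BY s.create_at ASC, s.uuid ASC
        ) AS rn
    FROM graph_sample s
    JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid
) sub
WHERE sub.rn = 1
ORDER BY sub.sample_uid
"""

representatives = [row[0] for row in conn.execute(DEDUP_SQL)]
print(representatives)
```

Sample 'b' shares a bucket with the earlier sample 'a', so only 'a' and 'c' remain.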

Xreki · 2026-03-24T06:16:01Z

sqlite/graph_net_sample_groups_insert.py

+    """
+
+    # Index candidates by op_seq
+    by_op_seq = defaultdict(list)


A variable name like by_op_seq is too abstract.

candidates_by_op_seq, then; I'll polish the names.
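For reference, the index under discussion might look like the following with the more descriptive name. This is only a sketch: the CandidateGraph fields and the helper name index_candidates_by_op_seq are hypothetical, since the thread shows the class name but not its definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateGraph:
    # Hypothetical fields; the real class in the PR may differ.
    sample_uid: str
    op_seq_bucket_id: str


def index_candidates_by_op_seq(candidates):
    """Group candidates by their op_seq bucket id (hypothetical helper)."""
    candidates_by_op_seq = defaultdict(list)
    for candidate in candidates:
        candidates_by_op_seq[candidate.op_seq_bucket_id].append(candidate)
    return candidates_by_op_seq


candidates = [
    CandidateGraph("a", "op1"),
    CandidateGraph("b", "op1"),
    CandidateGraph("c", "op2"),
]
candidates_by_op_seq = index_candidates_by_op_seq(candidates)
print({key: len(group) for key, group in candidates_by_op_seq.items()})
```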

Xreki · 2026-03-24T06:17:03Z

sqlite/graph_net_sample_groups_insert.py

+    for c in candidates:
+        by_op_seq[c.op_seq_bucket_id].append(c)
+
+    rule3_selected_uids = set()


Please don't name variables with a ruleX_ prefix like this.

Copilot

Pull request overview

This PR enhances the SQLite grouping insertion script by adding “group by v2” rules to generate additional graph_net_sample_groups beyond the existing v1 bucket-based grouping.

Changes:

Refactors graph_net_sample_groups_insert.py into clearer query/generation/insert phases with per-rule stats output.
Adds v2 grouping logic (Rule 4 dtype coverage + Rule 3 sparse sampling) controlled by --num_dtypes.
Updates DB access to use a read-only query() path via sqlite3 alongside SQLAlchemy inserts.

Copilot · 2026-03-30T02:28:20Z

sqlite/graph_net_sample_groups_insert.py

+SELECT
+    sub.sample_uid,
+    sub.op_seq_bucket_id,
+    sub.input_shapes_bucket_id,
+    sub.sample_type,
+    group_concat(sub.sample_uid, ',') AS all_uids
+FROM (
+    SELECT
+        s.uuid AS sample_uid,
+        s.sample_type,
+        b.op_seq_bucket_id,
+        b.input_shapes_bucket_id
+    FROM graph_sample s
+    JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid
+    ORDER BY s.create_at ASC, s.uuid ASC
+) sub
+GROUP BY sub.sample_type, sub.op_seq_bucket_id, sub.input_shapes_bucket_id;


query_bucket_groups selects sub.sample_uid as the bucket "head" without aggregating it or using a deterministic window function. In SQLite, selecting a non-GROUP BY column in an aggregate query can return an arbitrary row, so head_uid (and thus Rule 2 heads) may be nondeterministic. Consider selecting the head via MIN(...)/MAX(...) on a stable key, or using a window function to pick the first row by (create_at, uuid) and aggregating the rest separately.

Suggested change

-SELECT
-    sub.sample_uid,
-    sub.op_seq_bucket_id,
-    sub.input_shapes_bucket_id,
-    sub.sample_type,
-    group_concat(sub.sample_uid, ',') AS all_uids
-FROM (
-    SELECT
-        s.uuid AS sample_uid,
-        s.sample_type,
-        b.op_seq_bucket_id,
-        b.input_shapes_bucket_id
-    FROM graph_sample s
-    JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid
-    ORDER BY s.create_at ASC, s.uuid ASC
-) sub
-GROUP BY sub.sample_type, sub.op_seq_bucket_id, sub.input_shapes_bucket_id;
+WITH buckets AS (
+    SELECT
+        s.uuid AS sample_uid,
+        s.sample_type,
+        b.op_seq_bucket_id,
+        b.input_shapes_bucket_id,
+        FIRST_VALUE(s.uuid) OVER (
+            PARTITION BY s.sample_type, b.op_seq_bucket_id, b.input_shapes_bucket_id
+            ORDER BY s.create_at ASC, s.uuid ASC
+        ) AS head_uid
+    FROM graph_sample s
+    JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid
+)
+SELECT
+    MIN(head_uid) AS head_uid,
+    op_seq_bucket_id,
+    input_shapes_bucket_id,
+    sample_type,
+    group_concat(sample_uid, ',') AS all_uids
+FROM buckets
+GROUP BY sample_type, op_seq_bucket_id, input_shapes_bucket_id;
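To see the suggested query pick a deterministic head, it can be run on a small in-memory database. The schema below is a simplified assumption built from the column names in the diff, and the sample_type value 'subgraph' and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph_sample (
    uuid TEXT PRIMARY KEY, sample_type TEXT, create_at TEXT
);
CREATE TABLE graph_net_sample_buckets (
    sample_uid TEXT, op_seq_bucket_id TEXT, input_shapes_bucket_id TEXT
);
-- 'z' is the earliest sample even though it sorts last by uuid.
INSERT INTO graph_sample VALUES
    ('m', 'subgraph', '2026-01-02'),
    ('z', 'subgraph', '2026-01-01'),
    ('k', 'subgraph', '2026-01-03');
INSERT INTO graph_net_sample_buckets VALUES
    ('m', 'op1', 'sh1'), ('z', 'op1', 'sh1'), ('k', 'op2', 'sh1');
""")

# Every row in a bucket carries the same head_uid (the earliest sample),
# so MIN(head_uid) just collapses identical values into one per group.
HEAD_SQL = """
WITH buckets AS (
    SELECT
        s.uuid AS sample_uid,
        s.sample_type,
        b.op_seq_bucket_id,
        b.input_shapes_bucket_id,
        FIRST_VALUE(s.uuid) OVER (
            PARTITION BY s.sample_type, b.op_seq_bucket_id, b.input_shapes_bucket_id
            ORDER BY s.create_at ASC, s.uuid ASC
        ) AS head_uid
    FROM graph_sample s
    JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid
)
SELECT
    MIN(head_uid) AS head_uid,
    op_seq_bucket_id,
    input_shapes_bucket_id,
    sample_type,
    group_concat(sample_uid, ',') AS all_uids
FROM buckets
GROUP BY sample_type, op_seq_bucket_id, input_shapes_bucket_id
"""

rows = conn.execute(HEAD_SQL).fetchall()
heads = {op_seq: head for head, op_seq, _shapes, _type, _uids in rows}
members = {op_seq: set(uids.split(",")) for _head, op_seq, _shapes, _type, uids in rows}
print(heads)
```

The head of the op1 bucket is 'z' because it was created first, regardless of row storage order. The order of uids inside all_uids is still not guaranteed by SQLite, which is why the sketch compares members as sets.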

Copilot · 2026-03-30T02:28:21Z

sqlite/graph_net_sample_groups_insert.py

+        b.op_seq_bucket_id,
+        b.input_shapes_bucket_id
+    FROM graph_sample s
+    JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid


query_bucket_groups doesn’t filter out deleted samples or full_graph samples (unlike query_v2_candidates). If graph_net_sample_buckets contains rows for deleted/full_graph samples (e.g., from older runs), this script will generate groups for data that is supposed to be excluded. Add WHERE s.deleted = 0 AND s.sample_type != 'full_graph' (or equivalent) to keep v1/v2 selection consistent.

Suggested change

-    JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid
+    JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid
+    WHERE s.deleted = 0
+      AND s.sample_type != 'full_graph'
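The effect of the suggested filter can be checked on a toy database. The schema is a simplified assumption based on the columns named in this thread, and the rows (including the 'subgraph' type and the NULL-typed sample) are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph_sample (
    uuid TEXT PRIMARY KEY, sample_type TEXT, deleted INTEGER
);
CREATE TABLE graph_net_sample_buckets (sample_uid TEXT, op_seq_bucket_id TEXT);
INSERT INTO graph_sample VALUES
    ('live', 'subgraph', 0),
    ('gone', 'subgraph', 1),
    ('full', 'full_graph', 0),
    ('untyped', NULL, 0);
INSERT INTO graph_net_sample_buckets VALUES
    ('live', 'op1'), ('gone', 'op1'), ('full', 'op1'), ('untyped', 'op1');
""")

# Same join as query_bucket_groups, plus the filter from the suggestion.
FILTERED_SQL = """
SELECT s.uuid
FROM graph_sample s
JOIN graph_net_sample_buckets b ON s.uuid = b.sample_uid
WHERE s.deleted = 0
  AND s.sample_type != 'full_graph'
ORDER BY s.uuid
"""

kept = [row[0] for row in conn.execute(FILTERED_SQL)]
print(kept)
```

Only 'live' survives. Under SQL's three-valued logic, a row whose sample_type is NULL also fails the != comparison, so 'untyped' is dropped as well; whether that matters depends on whether the real column can be NULL.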

group by v2

95b4b92

group by v2

6185419

Xreki reviewed Mar 23, 2026

Honglei-Qiu added 4 commits March 23, 2026 13:26

group by v2

91a7389

group by v2

e444c2e

group by all

a13a356

group by all

b4a098a

Xreki reviewed Mar 24, 2026

Honglei-Qiu added 4 commits March 24, 2026 07:05

group by all

238d966

group by all

489af43

group by all

05e5d0f

group by all

f784701

Xreki requested a review from Copilot March 30, 2026 02:24

Copilot started reviewing on behalf of Xreki March 30, 2026 02:25

Copilot AI reviewed Mar 30, 2026

group by v1 and v2

c25d8d1

3 participants