Consider adding a de-duplication step to our rankers #10463

@sjrl

Description

As a follow-up to #10441, it will be possible to connect the outputs of multiple retrievers directly to downstream components like Rankers without a Joiner in between.

However, one useful thing the DocumentJoiner did was de-duplicate documents that may have been retrieved by more than one upstream retriever (e.g. BM25 + Embedding). So we may want to add a de-duplication feature to common downstream components that may receive multiple input connections.
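To make the idea concrete, here is a minimal sketch of what such a de-duplication step could look like. It assumes documents are identified by an `id` field and that, for duplicates, keeping the copy with the highest `score` is acceptable. The `Doc` class and the `deduplicate` helper are hypothetical stand-ins for illustration, not existing Haystack code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not the Haystack class)."""
    id: str
    content: str
    score: Optional[float] = None


def _score(doc: Doc) -> float:
    # Treat a missing score as the lowest possible value so scored copies win.
    return doc.score if doc.score is not None else float("-inf")


def deduplicate(documents: list[Doc]) -> list[Doc]:
    """Keep one document per id, preferring the highest score.

    Output follows the order in which each id was first seen.
    """
    best: dict[str, Doc] = {}
    for doc in documents:
        kept = best.get(doc.id)
        if kept is None or _score(doc) > _score(kept):
            # Re-assigning an existing key keeps its original position in the dict.
            best[doc.id] = doc
    return list(best.values())


# Example: the same document "a" comes back from both retrievers.
bm25_docs = [Doc("a", "alpha", 0.2), Doc("b", "beta", 0.9)]
embedding_docs = [Doc("a", "alpha", 0.7), Doc("c", "gamma", 0.5)]
merged = deduplicate(bm25_docs + embedding_docs)
```

Whether to keep the highest-scoring copy or simply the first one seen is a design choice; scores from different retrievers are not necessarily comparable, so first-seen may be the safer default.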

We should consider how this should be added, i.e. should this become the default behavior, and should it be toggleable via an init param?

My impression is that making de-duplication the default behavior should be fine, since I don't think there is an expected use case where you want the same document returned multiple times.
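As a rough illustration of the toggle discussed above, the sketch below shows a toy ranker with a hypothetical `deduplicate` init param that defaults to `True`. `ToyRanker`, `Doc`, and the parameter name are assumptions made up for this example; this is not an existing Haystack component or API.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not the Haystack class)."""
    id: str
    content: str


class ToyRanker:
    """Hypothetical ranker that orders documents by query-term overlap."""

    def __init__(self, deduplicate: bool = True):
        # Assumed init param: de-duplication is on by default but can be disabled.
        self.deduplicate = deduplicate

    def run(self, query: str, documents: list[Doc]) -> list[Doc]:
        if self.deduplicate:
            # Keep only the first document seen for each id.
            unique: dict[str, Doc] = {}
            for doc in documents:
                unique.setdefault(doc.id, doc)
            documents = list(unique.values())

        terms = set(query.lower().split())
        # Stable sort: documents with equal overlap keep their input order.
        return sorted(
            documents,
            key=lambda d: len(terms & set(d.content.lower().split())),
            reverse=True,
        )


docs = [Doc("1", "red apple"), Doc("2", "green pear"), Doc("1", "red apple")]
deduped = ToyRanker().run("red apple", docs)
raw = ToyRanker(deduplicate=False).run("red apple", docs)
```

With the default, the duplicate of document `1` is dropped before ranking; passing `deduplicate=False` keeps the previous pass-through behavior for anyone who relies on it.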

Metadata

Assignees

Labels

P1: High priority, add to the next sprint
