Conversation
📝 Walkthrough

Adds a full embed extraction pipeline (gist, X/Twitter posts, video providers, HTML), DB schema and migration for embed entities, S3 upload/remove support, worker fetchers and tests, API response enrichment, a GitHub helper, and migrates TypeBox imports/dependencies from `@sinclair/typebox` to `typebox`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API
    participant Worker as Worker
    participant GitHub as GitHub
    participant DB as DB
    participant S3 as S3
    Client->>API: enqueue /tasks/url-metadata (gist URL)
    API->>Worker: job received
    Worker->>GitHub: GET /gists/{id}
    GitHub-->>Worker: gist metadata + files
    Worker->>S3: upload file contents (MD5 keys)
    S3-->>Worker: content keys
    Worker->>DB: upsert gist + file rows, delete stale rows
    Worker->>S3: remove deleted files
    Worker->>DB: update url_metadata (gistId, embedType)
    Worker-->>API: job complete
    API->>DB: query enriched url_metadata
    API-->>Client: enriched response (metadata + embed)
```
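The "upload file contents (MD5 keys)" step in the diagram implies content-addressed S3 keys. A minimal sketch of that idea, assuming keys are derived from an MD5 digest of each file's contents (the `gist-files/` prefix and the helper name are illustrative, not taken from the PR):

```typescript
import { createHash } from "node:crypto";

// Content-addressed S3 key: identical file contents always map to the
// same key, so unchanged gist files can be skipped on re-fetch and
// stale keys can be diffed against existing DB rows for removal.
export function contentKey(contents: string): string {
	const md5 = createHash("md5").update(contents, "utf8").digest("hex");
	return `gist-files/${md5}`;
}
```

Content addressing also makes the "remove deleted files" step safe: any key no longer referenced by a gist file row can be dropped.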
```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API
    participant Worker as Worker
    participant FX as FXTwitter
    participant Images as ImageProcessor
    participant S3 as S3
    participant DB as DB
    Client->>API: enqueue /tasks/url-metadata (tweet URL)
    API->>Worker: job received
    Worker->>FX: GET /{user}/status/{id}
    FX-->>Worker: tweet + media URLs
    Worker->>Images: process avatar -> upload
    Images->>S3: upload avatar
    S3-->>Worker: avatar key
    Worker->>Images: process media -> upload
    Images->>S3: upload media
    S3-->>Worker: media key
    Worker->>DB: upsert url_metadata_post and url_metadata (embedType=post)
    Worker-->>API: job complete
    API->>DB: query enriched url_metadata
    API-->>Client: enriched response (author, content, images)
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 15
🧹 Nitpick comments (4)
pnpm-workspace.yaml (1)
6-6: TypeBox migration is complete; a drift check would be a good preventive measure.

All source files are using the new `typebox` package correctly, and no legacy `@sinclair/typebox` references remain. A CI check to prevent reintroduction of the legacy package name is recommended as a future safeguard, but not required now.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pnpm-workspace.yaml` at line 6, add a preventive CI drift check to ensure the legacy package name '@sinclair/typebox' is never reintroduced: create a CI job (or an npm script referenced by CI) named something like "check-typebox" that searches the repository for the string '@sinclair/typebox' (excluding node_modules/build artifacts) and fails the job if any matches are found; this should be triggered on PRs and main branch updates and reference the package names "typebox" and "@sinclair/typebox" so reviewers can quickly locate the check and its intent.

apps/worker/src/tasks/url-metadata/processor.ts (1)
10-10: Consider adding an "html" embed type for completeness.

The `UrlMetadataEmbedType` union defines "gist" | "video" | "post", but the HTML fallback path leaves `embedType` as `undefined`. For clarity and future filtering/querying, you might want to include "html" in the union:

♻️ Suggested addition

```diff
-type UrlMetadataEmbedType = "gist" | "video" | "post";
+type UrlMetadataEmbedType = "gist" | "video" | "post" | "html";
```

And then set it explicitly in the else branch (lines 28-29):

```diff
 } else {
+	embedType = "html";
 	embedPromise = getEmbedDataFromHtml(jobId, inputUrl, signal);
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/processor.ts` at line 10, the UrlMetadataEmbedType union should include an "html" variant and the fallback assignment should set embedType = "html" instead of leaving it undefined; update the type alias UrlMetadataEmbedType to include "html" and modify the fallback branch in the URL metadata parsing logic (the code that currently leaves embedType undefined) to explicitly assign "html" so downstream filtering/queries can rely on a concrete embedType.

apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts (1)
3-25: Consider relaxing literal types for external API fields.

The `provider_url: "https://vimeo.com/"` field is typed as a literal string. If Vimeo's API ever returns a slightly different value (e.g., without the trailing slash), it could cause type mismatches at runtime with stricter type checking. Consider using `string` for fields that aren't critical to the application logic.

♻️ Suggested type relaxation

```diff
 export interface VimeoOEmbedResponse {
 	type: "video";
 	version: "1.0";
 	provider_name: "Vimeo";
-	provider_url: "https://vimeo.com/";
+	provider_url: string;
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts` around lines 3-25, the VimeoOEmbedResponse interface uses literal string types for external API fields (e.g., provider_url: "https://vimeo.com/") which is brittle; update the interface by changing those literal types to more permissive types (for example provider_url: string, and similarly relax provider_name, version, and type if they are not critical) so the shape accepts minor API variations while keeping strongly typed fields like title, width, height, duration, thumbnail_url, video_id, and uri unchanged.

apps/worker/src/tasks/url-metadata/common.ts (1)
17-17: Consider lazy initialization for the S3 bucket.

The top-level `await` runs at module import time, which means:

- Any failure in `s3.ensureBucket` will prevent the module from loading.
- The bucket is created even if the importing code never uses it.
- Test mocking becomes trickier since this runs before test setup.

For a worker context this may be acceptable, but lazy initialization would be more resilient:

♻️ Suggested lazy initialization pattern

```diff
-export const BUCKET = await s3.ensureBucket(env.S3_BUCKET);
+let _bucket: Awaited<ReturnType<typeof s3.ensureBucket>> | undefined;
+
+export async function getBucket() {
+	if (!_bucket) {
+		_bucket = await s3.ensureBucket(env.S3_BUCKET);
+	}
+	return _bucket;
+}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/common.ts` at line 17, replace the top-level awaited export with a lazy initializer: remove the top-level "export const BUCKET = await s3.ensureBucket(env.S3_BUCKET)" and add an async getter (e.g., getBucket or ensureBucketInitialized) that caches the bucket after calling s3.ensureBucket(env.S3_BUCKET) on first use; update callers to call this getter instead of referencing BUCKET directly so the bucket creation runs lazily, is easier to mock/test, and avoids module-import side effects.
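The lazy pattern above is easy to verify in isolation. A runnable sketch with a stub standing in for `s3.ensureBucket` (the stub and the bucket shape are hypothetical, for demonstration only):

```typescript
// Stub for s3.ensureBucket that counts invocations, so we can check
// the getter only initializes once.
let ensureBucketCalls = 0;
async function ensureBucketStub(name: string): Promise<{ name: string }> {
	ensureBucketCalls++;
	return { name };
}

let _bucket: { name: string } | undefined;

// Lazy, cached initializer: nothing runs at import time, and every
// call after the first reuses the cached bucket.
export async function getBucket(): Promise<{ name: string }> {
	if (!_bucket) {
		_bucket = await ensureBucketStub("my-bucket");
	}
	return _bucket;
}
```

One caveat: two concurrent first calls can both observe `_bucket` as undefined and invoke `ensureBucket` twice; caching the promise instead of the resolved value avoids that race.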
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api/src/routes/tasks/url-metadata.ts`:
- Around line 107-109: Replace the use of logical OR with nullish coalescing for
optional numeric fields so valid 0 values are preserved: change occurrences of
"width: width || undefined" and "height: height || undefined" (and the analogous
occurrences at the later lines) to use "width: width ?? undefined" and "height:
height ?? undefined"; keep altText as "altText || undefined" only if empty
string should be treated as missing, otherwise change altText to "altText ??
undefined". Update the assignments where these symbols (width, height, altText)
are used in the URL metadata construction and at the later lines mentioned to
use ?? accordingly.
- Around line 113-115: The mapGist function currently throws when the side-table
row is missing; instead, change the behavior in mapGist (and the analogous
functions that throw for missing side rows—e.g., the mapPost path and the other
URL metadata mappers around the later throw sites) to return undefined when
db.query.urlMetadataGist.findFirst(...) (or the equivalent db query) returns no
row, and call the existing requeue/refresh helper (use the module's
enqueue/refresh function used elsewhere for metadata refresh) to schedule an
async refresh rather than throwing; ensure callers of mapGist handle undefined
as a cache-hit fallback so the request path does not 500.
- Line 237: The code currently force-casts result.embedType using "embedType:
result.embedType as 'post' | 'gist' | 'video' | undefined" which is unsafe;
replace this with a guarded mapper: add a small type‑guard (e.g.,
isValidEmbedType(value): value is 'post'|'gist'|'video') and set embedType to
isValidEmbedType(result.embedType) ? result.embedType : undefined (or handle
invalid values explicitly), updating the assignment site where embedType is set
from result.embedType so the response contract is enforced without unsafe
casting.
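The guarded mapper described in the prompt above is small; a sketch using the suggested `isValidEmbedType` name (the surrounding names are illustrative, not from the PR):

```typescript
type EmbedType = "post" | "gist" | "video";

const EMBED_TYPES: readonly string[] = ["post", "gist", "video"];

// Type guard: narrows an arbitrary DB string to the response
// contract's union without an unsafe `as` cast.
function isValidEmbedType(value: string | null | undefined): value is EmbedType {
	return value != null && EMBED_TYPES.includes(value);
}

// At the assignment site, invalid values collapse to undefined
// instead of leaking through a forced cast.
export function mapEmbedType(raw: string | null | undefined): EmbedType | undefined {
	return isValidEmbedType(raw) ? raw : undefined;
}
```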
In `@apps/worker/src/tasks/url-metadata/data-providers/twitch.ts`:
- Line 42: twitchHosts currently lists ["clips.twitch.tv", "twitch.tv"] but
misses "www.twitch.tv", causing valid Twitch URLs to be ignored; update the
twitchHosts export (the twitchHosts constant in
apps/worker/src/tasks/url-metadata/data-providers/twitch.ts) to include
"www.twitch.tv" in the array alongside the existing entries so URL
matching/embedding recognizes both forms.
- Around line 29-38: The current fallback uses the page URL (embedUrl = url)
which produces invalid Twitch iframe src; instead detect non-clip Twitch URLs
and build a proper player endpoint (use "https://player.twitch.tv/" with either
?channel=<channelName> for channel pages or ?video=v<videoId> for video pages /
collections), preserve/add the parent query param on that player URL (use
embedUrl.searchParams.set("parent", parent)), and update the returned html to
use width="400" (minimum) and height="300"; change references to embedUrl and
url where the iframe src is constructed and handle the TODO for video/collection
paths by mapping path segments to the correct channel or video query parameter
before returning html.
In `@apps/worker/src/tasks/url-metadata/data-providers/video.ts`:
- Line 37: Exported videoHosts currently combines youtubeHosts and vimeoHosts
but omits twitchHosts, preventing Twitch URLs from being routed into the video
embed path despite getVideoDataFromUrl supporting Twitch; update the exported
videoHosts (symbol: videoHosts) to include twitchHosts alongside youtubeHosts
and vimeoHosts so Twitch URLs are recognized, and verify getVideoDataFromUrl
still handles Twitch-specific logic (symbols: getVideoDataFromUrl, twitchHosts).
In `@apps/worker/src/tasks/url-metadata/data-providers/youtube.ts`:
- Around line 27-47: The code builds youtubeUrl using the variable videoId
without ensuring it was set; update the parsing logic around videoId in
apps/worker/src/tasks/url-metadata/data-providers/youtube.ts (the videoId
variable and the youtubeUrl construction) to guard against missing/empty IDs:
after the if/else blocks that set videoId, check that videoId is a non-empty
string and if not, return null (or otherwise handle the invalid case) instead of
proceeding to construct `https://www.youtube.com/watch?v=${videoId}`.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.test.ts`:
- Around line 33-35: The test uses a simplified mock chain
((db.delete(urlMetadataGistFile).where(undefined).returning as Mock)) that hides
the real where clause used by the implementation (and(eq(...),
not(inArray(...))); update the mock in getEmbedDataFromGist.test.ts to reflect
or assert the actual where predicate: either mock
db.delete(urlMetadataGistFile).where with a matcher that checks the passed
predicate (e.g., capture the where arg and verify it composes and(eq(...),
not(inArray(...)))) or replace the .where(undefined) with a more explicit
mockImplementation that validates the incoming where function/criteria before
returning Promise.resolve([]); alternatively, if simplifying is intentional add
a brief comment explaining why the simplified mock covers the case and cannot
hide regressions, referencing db.delete and urlMetadataGistFile so reviewers can
locate the change.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.ts`:
- Around line 25-33: The function currently logs an invalid gist URL but
continues to call github.getGistById; change the invalid-path branch (when
!githubUsername || !gistId) to return immediately (e.g., return undefined) after
logging to avoid the network call and further processing instead of only setting
error = true; ensure this prevents execution of github.getGistById and
downstream code that expects valid gistId/inputUrl.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.test.ts`:
- Around line 97-101: The test incorrectly passes a URL string as the jobId to
getEmbedDataFromPost; update the test to pass a distinct mock job ID (e.g.,
"test-job-id" or similar) as the first argument while keeping the existing URL
(new URL(fakeFxTwitterUrl)) and AbortSignal, so the call mirrors production
usage and avoids conflating jobId with the input URL when generating S3 keys.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts`:
- Around line 33-37: The code in getEmbedDataFromPost uses new URL(avatarUrl)
and new URL(image.url) directly (variables avatarUrl and image.url) which can
throw synchronously; wrap the URL construction in a try/catch or pre-validate
the string before calling processImages so malformed URLs are skipped or
converted to undefined instead of throwing, and ensure any skipped images are
filtered out before calling processImages([ ... ]) and before mapping into new
URL for images so the downstream .catch handlers are not bypassed.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts`:
- Around line 37-45: In getEmbedDataFromVideo, protect parsing of
videoData.thumbnail_url before calling processImages: the expression new
URL(videoData.thumbnail_url) can throw synchronously and bypass the existing
.catch on the processImages promise; wrap the URL construction/validation in a
try/catch (or guard with a safe URL-parse helper) and only call processImages
when the URL was successfully created, and ensure any thrown error is caught and
logged/handled using the existing jobId/signal flow so banner assignment doesn't
crash the task.
In `@apps/worker/test-utils/setup.ts`:
- Around line 38-41: The mock deleteMockResponse currently sets returning =
vi.fn() which yields undefined when awaited; change returning to be a function
that returns an empty array by default (e.g., make returning an async/mock
function that returns []), so callers that await the .returning(...) can safely
iterate; update the returned object in deleteMockResponse (the where mock that
returns { returning }) accordingly so .where(...).returning(...) resolves to an
empty array by default.
In `@packages/db/src/schema/url-metadata.ts`:
- Line 24: The url_metadata_gist_file schema currently forces language as
notNull (varchar("language", { length: 16 }).notNull()), which fails when GitHub
returns null/omits files[...].language; change the column definition in
packages/db/src/schema/url-metadata.ts for the url_metadata_gist_file entry to
allow nulls (remove .notNull() or use .nullable() depending on the ORM) so
inserts succeed when language is absent, and run migrations/regen types if
applicable.
In `@packages/github-api/src/getGistById.ts`:
- Around line 26-29: The error message in getGistById incorrectly references
getCommits; update the throw in the block that checks if data is undefined or
response.error (in function getGistById) to mention getGistById instead of
getCommits and include the same response.response.status and response.error
details so the runtime message correctly identifies the failing operation.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (39)
- apps/api/package.json
- apps/api/src/routes/tasks/post-images.ts
- apps/api/src/routes/tasks/url-metadata.ts
- apps/worker/package.json
- apps/worker/src/tasks/sync-author/processor.ts
- apps/worker/src/tasks/sync-author/types.ts
- apps/worker/src/tasks/sync-collection/processor.ts
- apps/worker/src/tasks/sync-collection/types.ts
- apps/worker/src/tasks/sync-post/processor.ts
- apps/worker/src/tasks/sync-post/types.ts
- apps/worker/src/tasks/url-metadata/common.ts
- apps/worker/src/tasks/url-metadata/data-providers/fx-embed/twitter.ts
- apps/worker/src/tasks/url-metadata/data-providers/fx-embed/types.ts
- apps/worker/src/tasks/url-metadata/data-providers/twitch.ts
- apps/worker/src/tasks/url-metadata/data-providers/video.ts
- apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts
- apps/worker/src/tasks/url-metadata/data-providers/x.ts
- apps/worker/src/tasks/url-metadata/data-providers/youtube.ts
- apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.test.ts
- apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.ts
- apps/worker/src/tasks/url-metadata/getEmbedDataFromHtml.ts
- apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.test.ts
- apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts
- apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts
- apps/worker/src/tasks/url-metadata/processor.ts
- apps/worker/src/tasks/url-metadata/utils/processImage.ts
- apps/worker/test-utils/setup.ts
- packages/bullmq/package.json
- packages/bullmq/src/tasks/post-image.ts
- packages/bullmq/src/tasks/url-metadata.ts
- packages/common/package.json
- packages/common/src/env.ts
- packages/db/src/schema/url-metadata.ts
- packages/github-api/src/getGistById.ts
- packages/github-api/src/index.ts
- packages/post-images/package.json
- packages/post-images/src/fetch-post-data.ts
- packages/s3/src/utils.ts
- pnpm-workspace.yaml
apps/api/src/routes/tasks/url-metadata.ts

```ts
			width: width || undefined,
			height: height || undefined,
			altText: altText || undefined,
```

Use nullish coalescing for optional numeric fields.

Lines 107, 108, 163, and 164 use `||`, which drops valid 0 values. Use `??` so only null/undefined are omitted.

Suggested fix

```diff
-			width: width || undefined,
-			height: height || undefined,
+			width: width ?? undefined,
+			height: height ?? undefined,
@@
-			width: result.embedWidth || undefined,
-			height: result.embedHeight || undefined,
+			width: result.embedWidth ?? undefined,
+			height: result.embedHeight ?? undefined,
```

Also applies to: 163-164
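The distinction only shows up for falsy-but-valid values; a minimal illustration:

```typescript
// `||` treats every falsy value (0, "", NaN, false) as missing,
// while `??` only treats null and undefined as missing.
const width = 0;

export const widthDropped = width || undefined; // undefined: a valid 0 is lost
export const widthKept = width ?? undefined; // 0: preserved

// For altText the review keeps `||` deliberately, so that an empty
// string also counts as "no alt text".
const altText = "";
export const altDropped = altText || undefined; // undefined
```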
apps/worker/src/tasks/url-metadata/data-providers/twitch.ts

```ts
	if (!embedUrl) {
		embedUrl = url;
	}

	// Set the "parent" property for embedding - https://dev.twitch.tv/docs/embed/everything/#usage
	embedUrl.searchParams.set("parent", parent);

	return {
		html: `<iframe src="${embedUrl.toString()}" height="300"></iframe>`,
		title: "Twitch Embed",
```
🧩 Analysis chain

🏁 Script executed:

```shell
cat -n apps/worker/src/tasks/url-metadata/data-providers/twitch.ts
```

Repository: playfulprogramming/hoof

Length of output: 1678

🌐 Web query:

> Twitch embed iframe src requirements player.twitch.tv channel video

💡 Result:

For a non-interactive Twitch player iframe, the src must point to https://player.twitch.tv/ and include one content identifier plus at least one parent domain:

src formats

- Live channel: `<iframe src="https://player.twitch.tv/?channel=<channel>&parent=<yourdomain.com>"></iframe>`
- VOD / past broadcast: `<iframe src="https://player.twitch.tv/?video=<videoID>&parent=<yourdomain.com>"></iframe>`

Key requirements

- HTTPS/SSL is required for domains using Twitch embeds. [1]
- `parent` is required: add the full domain (include the subdomain, like `www`, if used). If the embed can appear on multiple domains, include multiple `parent=` entries. A missing/mismatched `parent` triggers a playback/config error. [1][2] Example: `src="https://player.twitch.tv/?channel=twitchpresents&parent=example.com&parent=www.example.com"`
- Minimum size (per Twitch embed requirements): width ≥ 400, height ≥ 300. [3]

Sources: Twitch embed requirements and examples from the Twitch Developers docs and Twitch Developer Forums. [1][2][3]
Non-clip Twitch URLs produce invalid embed endpoints.
The code falls back to embedding the page URL directly (line 30: embedUrl = url). This results in iframe sources like https://twitch.tv/somechannel?parent=..., which Twitch does not recognize as valid embed endpoints. Twitch requires embeds to use https://player.twitch.tv/ with channel or video query parameters.
Additionally, the iframe has height="300" but no width attribute. Twitch embeds require a minimum width of 400 pixels.
The TODO comment on line 25 explicitly flags this limitation for video and collection URLs, which also need proper handling.
💡 Suggested fix

```ts
	if (!embedUrl) {
		const [first, second] = url.pathname.split("/").filter(Boolean);
		embedUrl = new URL("https://player.twitch.tv/");
		if (first === "videos" && second) {
			embedUrl.searchParams.set("video", `v${second}`);
		} else if (first) {
			embedUrl.searchParams.set("channel", first);
		} else {
			return { html: "", title: "Twitch Embed" };
		}
	}
```

Also add width to the iframe: width="400" minimum.
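Run against typical inputs, that mapping produces valid player endpoints. A self-contained version of the same logic (the function name is illustrative):

```typescript
// Build a player.twitch.tv embed src from a twitch.tv page URL, per
// https://dev.twitch.tv/docs/embed/everything/ — channel pages use
// ?channel=<name>, VODs use ?video=v<id>, and `parent` is mandatory.
export function twitchPlayerSrc(pageUrl: string, parent: string): string | undefined {
	const url = new URL(pageUrl);
	const [first, second] = url.pathname.split("/").filter(Boolean);
	const embedUrl = new URL("https://player.twitch.tv/");
	if (first === "videos" && second) {
		embedUrl.searchParams.set("video", `v${second}`);
	} else if (first) {
		embedUrl.searchParams.set("channel", first);
	} else {
		return undefined; // bare twitch.tv with no path: nothing to embed
	}
	embedUrl.searchParams.set("parent", parent);
	return embedUrl.toString();
}
```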
apps/worker/src/tasks/url-metadata/data-providers/twitch.ts

```ts
	};
}

export const twitchHosts = ["clips.twitch.tv", "twitch.tv"];
```
Add www.twitch.tv to supported hosts.

Common Twitch URLs use www.twitch.tv; excluding it can skip valid embeds.

🛠️ Proposed fix

```diff
-export const twitchHosts = ["clips.twitch.tv", "twitch.tv"];
+export const twitchHosts = ["clips.twitch.tv", "twitch.tv", "www.twitch.tv"];
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```ts
export const twitchHosts = ["clips.twitch.tv", "twitch.tv", "www.twitch.tv"];
```
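The gap is visible directly from `URL.hostname`, which keeps the `www.` prefix verbatim:

```typescript
const twitchHostsBefore = ["clips.twitch.tv", "twitch.tv"];
const twitchHostsAfter = [...twitchHostsBefore, "www.twitch.tv"];

// URL.hostname returns the host exactly as written, www. included.
export const host = new URL("https://www.twitch.tv/twitchpresents").hostname;

export const matchedBefore = twitchHostsBefore.includes(host); // false: embed skipped
export const matchedAfter = twitchHostsAfter.includes(host); // true
```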
apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts

```ts
	const avatarUrl = post.author.avatar_url;
	let avatarResult: ProcessImageResult | undefined = undefined;
	if (avatarUrl) {
		avatarResult = await processImages(
			[new URL(avatarUrl)],
```
Guard provider URL parsing before image processing.

`new URL(avatarUrl)` (line 37) and `new URL(image.url)` (line 54) can throw before the downstream `.catch(...)` blocks run. A malformed upstream value can abort the whole task.

💡 Proposed fix

```diff
+	const safeParseUrl = (value: string, label: string): URL | undefined => {
+		try {
+			return new URL(value);
+		} catch (e) {
+			console.error(`Invalid ${label} URL for '${inputUrl}'`, e);
+			error = true;
+			return undefined;
+		}
+	};
+
 	const avatarUrl = post.author.avatar_url;
 	let avatarResult: ProcessImageResult | undefined = undefined;
 	if (avatarUrl) {
+		const parsedAvatarUrl = safeParseUrl(avatarUrl, "avatar");
+		if (!parsedAvatarUrl) {
+			avatarResult = undefined;
+		} else {
 		avatarResult = await processImages(
-			[new URL(avatarUrl)],
+			[parsedAvatarUrl],
 			72,
 			BUCKET,
 			"remote-posts",
 			jobId,
 			signal,
 		).catch((e) => {
 			console.error(`Error processing post image for '${inputUrl}'`, e);
 			error = true;
 			return undefined;
 		});
+		}
 	}
 	...
 	if (image) {
+		const parsedImageUrl = safeParseUrl(image.url, "post image");
+		if (!parsedImageUrl) {
+			imageResult = undefined;
+		} else {
 		imageResult = await processImages(
-			[new URL(image.url)],
+			[parsedImageUrl],
 			image.width,
 			BUCKET,
 			"remote-posts",
 			jobId,
 			signal,
 		).catch((e) => {
 			console.error(`Error processing post image for '${inputUrl}'`, e);
 			error = true;
 			return undefined;
 		});
+		}
 	}
```

Also applies to: 53-55
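The underlying hazard is that argument expressions are evaluated before the call itself, so an invalid URL throws before any promise exists and the `.catch` never runs. A standalone demonstration (the stub stands in for `processImages`):

```typescript
// Stand-in for processImages: never actually reached in the bad case.
async function processImagesStub(_urls: URL[]): Promise<string> {
	return "ok";
}

export let promiseCatchRan = false;
export let syncCatchRan = false;

try {
	// `new URL(...)` throws synchronously while the argument list is
	// being built, before processImagesStub is invoked; the .catch
	// below is attached to a promise that is never created.
	await processImagesStub([new URL("not a valid url")]).catch(() => {
		promiseCatchRan = true;
		return "fallback";
	});
} catch {
	syncCatchRan = true; // only a surrounding try/catch (or pre-validation) helps
}
```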
apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts

```ts
	const banner = videoData?.thumbnail_url
		? await processImages(
				[new URL(videoData.thumbnail_url)],
				expectedThumbnailWidth,
				BUCKET,
				"remote-video-thumbnail",
				jobId,
				signal,
			).catch((e) => {
```
Protect thumbnail URL parsing before calling processImages.

`new URL(videoData.thumbnail_url)` (line 39) can throw before the current `.catch(...)` handler executes. This can fail the task unexpectedly on malformed provider data.

💡 Proposed fix

```diff
-	const banner = videoData?.thumbnail_url
-		? await processImages(
-				[new URL(videoData.thumbnail_url)],
-				expectedThumbnailWidth,
-				BUCKET,
-				"remote-video-thumbnail",
-				jobId,
-				signal,
-			).catch((e) => {
-				console.error(`Unable upload thumbnail image for ${inputUrl}`, e);
-				error = true;
-				return undefined;
-			})
-		: undefined;
+	let banner = undefined;
+	if (videoData?.thumbnail_url) {
+		try {
+			const thumbnailUrl = new URL(videoData.thumbnail_url);
+			banner = await processImages(
+				[thumbnailUrl],
+				expectedThumbnailWidth,
+				BUCKET,
+				"remote-video-thumbnail",
+				jobId,
+				signal,
+			).catch((e) => {
+				console.error(`Unable upload thumbnail image for ${inputUrl}`, e);
+				error = true;
+				return undefined;
+			});
+		} catch (e) {
+			console.error(`Invalid thumbnail URL for ${inputUrl}`, e);
+			error = true;
+		}
+	}
```
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts` around lines 37
- 45, In getEmbedDataFromVideo, protect parsing of videoData.thumbnail_url
before calling processImages: the expression new URL(videoData.thumbnail_url)
can throw synchronously and bypass the existing .catch on the processImages
promise; wrap the URL construction/validation in a try/catch (or guard with a
safe URL-parse helper) and only call processImages when the URL was successfully
created, and ensure any thrown error is caught and logged/handled using the
existing jobId/signal flow so banner assignment doesn't crash the task.
apps/worker/test-utils/setup.ts

```ts
const deleteMockResponse = () => {
	const returning = vi.fn();
	return { where: vi.fn(() => ({ returning })) };
};
```
Make mocked .returning(...) default to an array.

The current default `vi.fn()` returns undefined. Callers that iterate over `await ...returning(...)` can crash in tests.

💡 Proposed fix

```diff
 const deleteMockResponse = () => {
-	const returning = vi.fn();
+	const returning = vi.fn(async () => []);
 	return { where: vi.fn(() => ({ returning })) };
 };
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```ts
const deleteMockResponse = () => {
	const returning = vi.fn(async () => []);
	return { where: vi.fn(() => ({ returning })) };
};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/test-utils/setup.ts` around lines 38 - 41, The mock
deleteMockResponse currently sets returning = vi.fn() which yields undefined
when awaited; change returning to be a function that returns an empty array by
default (e.g., make returning an async/mock function that returns []), so
callers that await the .returning(...) can safely iterate; update the returned
object in deleteMockResponse (the where mock that returns { returning })
accordingly so .where(...).returning(...) resolves to an empty array by default.
	.references(() => urlMetadataGist.gistId, { onDelete: "cascade" }),
filename: varchar("filename", { length: 256 }).notNull(),
contentKey: varchar("content_key", { length: 256 }).notNull(),
language: varchar("language", { length: 16 }).notNull(),
🧩 Analysis chain
🌐 Web query:
GitHub REST API GET gists gist_id files language field null omitted
💡 Result:
In the REST API “Get a gist” response, each entry under files may include a language property (the docs show it in examples), but it’s not something you can rely on always being a non-empty string. [1]
In practice, files[...].language is best treated as optional/nullable:
- Many client SDKs model it as an optional field (e.g., `*string` with `omitempty`), meaning it can be missing or null in JSON. [2]
- GitHub determines language via Linguist (used to detect languages and ignore/handle binary content), so if Linguist can't confidently classify a file (unknown/unsupported extension, binary-ish content, etc.), the API may return `null` or omit it. [3]
If you need a stable value, fall back to your own detection (e.g., by filename/extension or running Linguist yourself) when language is null/absent. [3]
Sources: GitHub REST Gists docs [1], go-github GistFile model [2], GitHub Linguist README [3].
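The fallback suggested above can be sketched as a small extension map. This is illustrative only: `resolveLanguage` and the map contents are hypothetical, not GitHub's Linguist data and not code from this PR.

```typescript
// Illustrative extension → language map; real Linguist data is far larger.
const EXT_LANGUAGES: Record<string, string> = {
	".ts": "TypeScript",
	".js": "JavaScript",
	".py": "Python",
	".md": "Markdown",
};

// Prefer the API-provided language; otherwise guess from the filename,
// defaulting to a generic "Text" label.
function resolveLanguage(
	apiLanguage: string | null | undefined,
	filename: string,
): string {
	if (apiLanguage) return apiLanguage;
	const dot = filename.lastIndexOf(".");
	const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
	return EXT_LANGUAGES[ext] ?? "Text";
}
```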
🏁 Script executed:
# Find and inspect the url-metadata.ts schema file
find . -name "url-metadata.ts" -type f
Repository: playfulprogramming/hoof
Length of output: 195
🏁 Script executed:
# Inspect the url-metadata.ts schema file to see the actual implementation
cat -n ./packages/db/src/schema/url-metadata.ts | head -50
Repository: playfulprogramming/hoof
Length of output: 2244
Make language optional in url_metadata_gist_file schema.
The language field is marked as notNull() but GitHub's REST API can omit or return null for files[...].language when Linguist cannot confidently classify the file. This causes insert failures when ingesting gist data.
Schema adjustment
- language: varchar("language", { length: 16 }).notNull(),
+ language: varchar("language", { length: 16 }),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/db/src/schema/url-metadata.ts` at line 24, The
url_metadata_gist_file schema currently forces language as notNull
(varchar("language", { length: 16 }).notNull()), which fails when GitHub returns
null/omits files[...].language; change the column definition in
packages/db/src/schema/url-metadata.ts for the url_metadata_gist_file entry to
allow nulls (remove .notNull() or use .nullable() depending on the ORM) so
inserts succeed when language is absent, and run migrations/regen types if
applicable.
if (typeof data === "undefined" || response.error) {
	throw new Error(
		`GitHub API (getCommits) returned ${response.response.status} ${response.error}`,
	);
Fix incorrect operation name in error message.
Line 28 references getCommits, but this function is getGistById. This makes runtime errors misleading.
🛠️ Proposed fix
- `GitHub API (getCommits) returned ${response.response.status} ${response.error}`,
+ `GitHub API (getGistById) returned ${response.response.status} ${response.error}`,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/github-api/src/getGistById.ts` around lines 26 - 29, The error
message in getGistById incorrectly references getCommits; update the throw in
the block that checks if data is undefined or response.error (in function
getGistById) to mention getGistById instead of getCommits and include the same
response.response.status and response.error details so the runtime message
correctly identifies the failing operation.
Actionable comments posted: 3
♻️ Duplicate comments (1)
apps/api/src/routes/tasks/url-metadata.ts (1)
132-156: ⚠️ Potential issue | 🟠 Major — Requeue cache repairs when embed side-data is missing.
These mappers now fail soft, but the request path never flips `shouldSubmitJob` when a cached gist/post row is incomplete. That leaves broken cache entries returning partial embed payloads until the normal stale window instead of repairing themselves on the next read.
Suggested direction
 async function mapEmbed(
 	result: UrlMetadataDbResult,
 ): Promise<Static<typeof UrlMetadataResponseSchema>["embed"]> {
 	if (result.embedType === "post") {
+		const post = result.postId ? await mapEmbedPost(result.postId) : undefined;
-		return {
-			type: "post",
-			post: result.postId ? await mapEmbedPost(result.postId) : undefined,
-		};
+		return post ? { type: "post", post } : undefined;
 	}
 	if (result.embedType === "gist") {
+		const gist = result.gistId ? await mapEmbedGist(result.gistId) : undefined;
-		return {
-			type: "gist",
-			gist: result.gistId ? await mapEmbedGist(result.gistId) : undefined,
-		};
+		return gist ? { type: "gist", gist } : undefined;
 	}
 	if (result.embedType === "video") {
 		return mapEmbedVideo(result);
 	}
 	return undefined;
 }
@@
-	const response = {
-		...mapUrlMetadata(result),
-		embed: await mapEmbed(result),
-	};
+	const embed = await mapEmbed(result);
+	if (
+		(result.embedType === "gist" || result.embedType === "post") &&
+		!embed
+	) {
+		shouldSubmitJob = true;
+	}
+
+	const response = {
+		...mapUrlMetadata(result),
+		embed,
+	};
Also applies to: 193-202, 276-279
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/src/routes/tasks/url-metadata.ts` around lines 132 - 156, The mappers mapEmbedGist and mapEmbedPost currently return undefined on missing side-data (e.g., when urlMetadataGistFile.findMany returns zero rows or urlMetadataPost rows are incomplete) but do not signal the request flow to requeue a cache-repair job; update these functions so that whenever they detect incomplete embed side-data (gistFiles.length === 0 or required post fields missing after urlMetadataPost.findFirst), they set the request's cache-repair flag (shouldSubmitJob) or call the existing enqueue/submit job helper before returning undefined — locate mapEmbedGist/mapEmbedPost, the calls to urlMetadataGist.findFirst, urlMetadataGistFile.findMany and urlMetadataPost.findFirst, and add the single-line mutation/invocation to trigger submission of the repair job when incomplete data is detected.
🧹 Nitpick comments (2)
apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts (1)
75-92: Consider defensive checks for optional API fields.
The `postData` object accesses `post.author.name`, `post.text`, `post.url`, etc. directly. Per the `TweetAPIResponse` types, some nested fields may be undefined or null (e.g., `avatar_url` is `string | null`). If the FXTwitter API returns incomplete data, this could insert `undefined` into NOT NULL columns and cause a database error.
💡 Suggested validation
+	// Validate required fields before insert
+	if (!post.author?.name || !post.author?.screen_name || !post.text || !post.url) {
+		console.error(`Incomplete post data for '${inputUrl}'`);
+		return { error: true };
+	}
+
 	const postData = {
 		postId: dbPostId,
 		authorName: post.author.name,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts` around lines 75 - 92, The postData construction in getEmbedDataFromPost (the postData object) reads nested optional fields like post.author.name, post.text, post.url, post.created_at, avatarResult?.key, imageResult?.key and image?.altText without guards; add defensive checks and sane defaults (or bail out) before building postData: validate post.author and required strings (name, screen_name, text, url, created_at) and coalesce nullable avatar/image fields (avatarResult, imageResult, image) to safe values or set null/empty strings, and if required NOT NULL DB columns would be missing, return an error or skip the record so you never insert undefined into the DB. Ensure checks live near the code that builds postData in getEmbedDataFromPost and reference the same identifiers (post, post.author, avatarResult, imageResult, image) when implementing.
apps/worker/src/tasks/url-metadata/processor.ts (1)
59-64: Clarify handling of `undefined` embed fields in database insert.
When destructuring `embedData`, fields like `gistId`, `postId`, `embedSrc`, etc. may be `undefined` depending on the embed type. These are then passed directly to the database insert.
This works because the schema defines these columns as nullable (per packages/db/src/schema/url-metadata.ts), but you may want to explicitly coalesce to `null` for consistency with other fields in the result object (e.g., `title: title ?? null`).
♻️ Optional: Explicit null coalescing for consistency
-	gistId,
-	postId,
-	embedSrc,
-	embedWidth,
-	embedHeight,
-	embedType,
+	gistId: gistId ?? null,
+	postId: postId ?? null,
+	embedSrc: embedSrc ?? null,
+	embedWidth: embedWidth ?? null,
+	embedHeight: embedHeight ?? null,
+	embedType: embedType ?? null,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/processor.ts` around lines 59 - 64, The destructured embed fields (gistId, postId, embedSrc, embedWidth, embedHeight, embedType) from embedData can be undefined and are being passed straight into the DB insert; update the code where embedData is mapped for insertion (the object built for the DB insert in processor.ts) to explicitly coalesce each optional embed field to null (e.g., gistId: gistId ?? null) so the insert always uses null instead of undefined, matching how other fields like title are handled.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/worker/src/tasks/url-metadata/data-providers/x.ts`:
- Around line 9-16: Accept an AbortSignal in getXPostData's signature and
forward it to fetchAsBot (pass the signal param through the options) so job
cancellation propagates; validate that the returned TweetAPIResponse.tweet is
present before returning (throw or return null/error if missing) to handle the
optional type; also update the caller getEmbedDataFromPost to pass its received
signal into getXPostData so the signal is threaded end-to-end.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts`:
- Around line 32-35: The abort signal passed into getEmbedDataFromPost is not
forwarded to getXPostData, so cancellation won't reach the underlying request;
update the call to getXPostData within getEmbedDataFromPost to include the
signal (e.g., pass { signal, userId, postId } or the existing options object
augmented with signal), keeping the existing .catch handling and error log
intact so cancellations propagate correctly.
- Line 91: The createdAt assignment in getEmbedDataFromPost.ts uses new
Date(post.created_at) without validating post.created_at; update the code that
builds the object (the createdAt property) to first check that post.created_at
exists and that new Date(post.created_at) yields a valid date (e.g., const d =
new Date(post.created_at); if (!post.created_at || isNaN(d.getTime())) then
either set createdAt to null/default or throw an error), and then assign
createdAt: d (or the fallback) so you avoid inserting "Invalid Date" into the
timestamp column.
---
Duplicate comments:
In `@apps/api/src/routes/tasks/url-metadata.ts`:
- Around line 132-156: The mappers mapEmbedGist and mapEmbedPost currently
return undefined on missing side-data (e.g., when urlMetadataGistFile.findMany
returns zero rows or urlMetadataPost rows are incomplete) but do not signal the
request flow to requeue a cache-repair job; update these functions so that
whenever they detect incomplete embed side-data (gistFiles.length === 0 or
required post fields missing after urlMetadataPost.findFirst), they set the
request's cache-repair flag (shouldSubmitJob) or call the existing
enqueue/submit job helper before returning undefined — locate
mapEmbedGist/mapEmbedPost, the calls to urlMetadataGist.findFirst,
urlMetadataGistFile.findMany and urlMetadataPost.findFirst, and add the
single-line mutation/invocation to trigger submission of the repair job when
incomplete data is detected.
---
Nitpick comments:
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts`:
- Around line 75-92: The postData construction in getEmbedDataFromPost (the
postData object) reads nested optional fields like post.author.name, post.text,
post.url, post.created_at, avatarResult?.key, imageResult?.key and
image?.altText without guards; add defensive checks and sane defaults (or bail
out) before building postData: validate post.author and required strings (name,
screen_name, text, url, created_at) and coalesce nullable avatar/image fields
(avatarResult, imageResult, image) to safe values or set null/empty strings, and
if required NOT NULL DB columns would be missing, return an error or skip the
record so you never insert undefined into the DB. Ensure checks live near the
code that builds postData in getEmbedDataFromPost and reference the same
identifiers (post, post.author, avatarResult, imageResult, image) when
implementing.
In `@apps/worker/src/tasks/url-metadata/processor.ts`:
- Around line 59-64: The destructured embed fields (gistId, postId, embedSrc,
embedWidth, embedHeight, embedType) from embedData can be undefined and are
being passed straight into the DB insert; update the code where embedData is
mapped for insertion (the object built for the DB insert in processor.ts) to
explicitly coalesce each optional embed field to null (e.g., gistId: gistId ??
null) so the insert always uses null instead of undefined, matching how other
fields like title are handled.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2e328b19-832c-4e07-8d1c-00e54b1ef4e7
📒 Files selected for processing (4)
apps/api/src/routes/tasks/url-metadata.ts
apps/worker/src/tasks/url-metadata/data-providers/x.ts
apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts
apps/worker/src/tasks/url-metadata/processor.ts
numLikes: post.likes,
numReposts: post.reposts,
numReplies: post.replies,
createdAt: new Date(post.created_at),
Validate created_at before parsing as Date.
new Date(post.created_at) will produce an "Invalid Date" if created_at is undefined or malformed. This could cause unexpected behavior when inserting into the createdAt timestamp column.
🛡️ Proposed validation
- createdAt: new Date(post.created_at),
+ createdAt: post.created_at ? new Date(post.created_at) : new Date(),
Or throw an error if created_at is required:
+ if (!post.created_at || isNaN(new Date(post.created_at).getTime())) {
+ console.error(`Invalid created_at for post '${inputUrl}'`);
+ return { error: true };
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts` at line 91, The
createdAt assignment in getEmbedDataFromPost.ts uses new Date(post.created_at)
without validating post.created_at; update the code that builds the object (the
createdAt property) to first check that post.created_at exists and that new
Date(post.created_at) yields a valid date (e.g., const d = new
Date(post.created_at); if (!post.created_at || isNaN(d.getTime())) then either
set createdAt to null/default or throw an error), and then assign createdAt: d
(or the fallback) so you avoid inserting "Invalid Date" into the timestamp
column.
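The validation this prompt describes can be sketched as a small helper. This is a sketch under the assumptions above — `parseCreatedAt` is a hypothetical name, and returning `null` (rather than throwing) is one of the two options the review offers:

```typescript
// Parse a timestamp string, falling back to null when the value is
// missing or would produce an "Invalid Date".
function parseCreatedAt(raw: string | undefined | null): Date | null {
	if (!raw) return null;
	const d = new Date(raw);
	return Number.isNaN(d.getTime()) ? null : d;
}
```

The caller can then decide whether a `null` result should abort the insert or fall back to a default timestamp, per the two alternatives shown in the proposed fix.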
Actionable comments posted: 1
🧹 Nitpick comments (2)
apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts (1)
9-20: Consider stopping traversal after finding the first iframe.
The `visit` callback continues traversing after finding an iframe, which means if there are multiple iframes, `properties` will be overwritten with the last one's attributes. This may not be intentional.
♻️ Proposed fix to return early
 function getIFrameAttributes(html: string) {
 	const tree = fromHtml(html);
 	let properties = {} as Record<string, unknown>;
 	visit(tree, { tagName: "iframe" }, (_node) => {
 		if (!_node) return;
 		const node: Element = _node;
 		properties = node.properties;
+		return false; // Stop traversal after first iframe
 	});
 	return properties;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts` around lines 9 - 20, The getIFrameAttributes function currently lets visit continue and so properties get overwritten by later iframes; update the visit callback in getIFrameAttributes to stop traversal once the first iframe is found by returning visit.EXIT (or the equivalent sentinel) from the callback after setting properties so the first match is returned immediately.
apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts (1)
42-42: Include common Vimeo subdomains in `vimeoHosts` to match hostname patterns used by YouTube and Twitch.
Vimeo embeds use `player.vimeo.com` as the standard embed URL subdomain. The codebase already includes subdomains in similar arrays (YouTube includes `www.youtube.com`, Twitch includes `clips.twitch.tv`), so `vimeoHosts` should include at least `player.vimeo.com` for consistency and to handle embeds from that domain.
export const vimeoHosts = ["vimeo.com", "player.vimeo.com"];
Verify each finding against the current code and only fix it if needed. In `@apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts` at line 42, The vimeoHosts array currently only contains "vimeo.com" so embedded Vimeo URLs like "player.vimeo.com" are missed; update the export vimeoHosts to include common Vimeo subdomains (e.g., add "player.vimeo.com" and optionally "www.vimeo.com") so hostname matching in the Vimeo data provider (export const vimeoHosts) will handle embed and subdomain patterns consistently with YouTube/Twitch host lists.
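Matching a URL against such a host list can be sketched as below. This is an illustration only — `isVimeoUrl` is a hypothetical helper, not the matcher used by this PR's data providers:

```typescript
const vimeoHosts = ["vimeo.com", "player.vimeo.com"];

// True when the URL's hostname appears in the list, either exactly
// or after stripping a leading "www." prefix.
function isVimeoUrl(raw: string): boolean {
	let host: string;
	try {
		host = new URL(raw).hostname.toLowerCase();
	} catch {
		return false; // malformed input is never a match
	}
	const bare = host.startsWith("www.") ? host.slice(4) : host;
	return vimeoHosts.includes(host) || vimeoHosts.includes(bare);
}
```

Listing each accepted subdomain explicitly (rather than suffix-matching `*.vimeo.com`) keeps the check consistent with the YouTube/Twitch host lists mentioned in the comment and avoids accidentally matching unrelated subdomains.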
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts`:
- Around line 32-39: The current fetch in vimeo.ts (inside the function using
fetchAsBot) blindly calls res.body.json() which will fail or return invalid data
for non-OK HTTP responses; update the promise callback to check res.ok (or
inspect res.status) after fetchAsBot returns, and if not OK throw or reject with
a clear error containing the status and statusText (or return a controlled
null/undefined), otherwise parse and return the JSON as VimeoOEmbedResponse;
reference the fetchAsBot call and the VimeoOEmbedResponse type so the check
occurs before calling res.body.json().
---
Nitpick comments:
In `@apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts`:
- Line 42: The vimeoHosts array currently only contains "vimeo.com" so embedded
Vimeo URLs like "player.vimeo.com" are missed; update the export vimeoHosts to
include common Vimeo subdomains (e.g., add "player.vimeo.com" and optionally
"www.vimeo.com") so hostname matching in the Vimeo data provider (export const
vimeoHosts) will handle embed and subdomain patterns consistently with
YouTube/Twitch host lists.
In `@apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts`:
- Around line 9-20: The getIFrameAttributes function currently lets visit
continue and so properties get overwritten by later iframes; update the visit
callback in getIFrameAttributes to stop traversal once the first iframe is found
by returning visit.EXIT (or the equivalent sentinel) from the callback after
setting properties so the first match is returned immediately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cab77c96-159b-47bf-bcfd-a282a8886fe4
📒 Files selected for processing (6)
apps/worker/src/tasks/url-metadata/data-providers/video.ts
apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts
apps/worker/src/tasks/url-metadata/data-providers/x.ts
apps/worker/src/tasks/url-metadata/data-providers/youtube.ts
apps/worker/src/tasks/url-metadata/getEmbedDataFromPost.ts
apps/worker/src/tasks/url-metadata/getEmbedDataFromVideo.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/worker/src/tasks/url-metadata/data-providers/youtube.ts
- apps/worker/src/tasks/url-metadata/data-providers/x.ts
return await fetchAsBot({
	url: new URL(
		`https://vimeo.com/api/oembed.json?url=${encodeURIComponent(url.toString())}`,
	),
	method: "GET",
	skipRobotsCheck: true,
	signal,
}).then((res) => res.body.json() as Promise<VimeoOEmbedResponse>);
Missing error handling for failed HTTP responses.
The function assumes the fetch always succeeds and returns valid JSON. If Vimeo returns an error status code (e.g., 404 for invalid videos), calling res.body.json() on a non-OK response may produce unexpected results or throw.
🛡️ Proposed fix to add response status check
export async function getVimeoOEmbedDataFromUrl(
url: URL,
signal: AbortSignal,
): Promise<VimeoOEmbedResponse> {
- return await fetchAsBot({
+ const res = await fetchAsBot({
url: new URL(
`https://vimeo.com/api/oembed.json?url=${encodeURIComponent(url.toString())}`,
),
method: "GET",
skipRobotsCheck: true,
signal,
- }).then((res) => res.body.json() as Promise<VimeoOEmbedResponse>);
+ });
+ if (res.statusCode < 200 || res.statusCode >= 300) {
+ throw new Error(`Vimeo oEmbed request failed with status ${res.statusCode}`);
+ }
+ return res.body.json() as Promise<VimeoOEmbedResponse>;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/worker/src/tasks/url-metadata/data-providers/vimeo.ts` around lines 32 -
39, The current fetch in vimeo.ts (inside the function using fetchAsBot) blindly
calls res.body.json() which will fail or return invalid data for non-OK HTTP
responses; update the promise callback to check res.ok (or inspect res.status)
after fetchAsBot returns, and if not OK throw or reject with a clear error
containing the status and statusText (or return a controlled null/undefined),
otherwise parse and return the JSON as VimeoOEmbedResponse; reference the
fetchAsBot call and the VimeoOEmbedResponse type so the check occurs before
calling res.body.json().
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/db/drizzle/20260307174859_furry_dark_beast/migration.sql`:
- Around line 7-13: The column size for url_metadata_gist_file.language is too
small (varchar(16)) and must be widened to varchar(64) to accept multi-word
GitHub language labels; update the migration that creates the
"url_metadata_gist_file" table to change the "language" column from varchar(16)
to varchar(64) and also update the schema definition in the DB schema for
url-metadata (the exported url_metadata_gist_file/table schema or column
descriptor) to reflect the new max length so runtime inserts from
getEmbedDataFromGist.ts succeed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e4ec19df-27b2-44b0-8ee1-be11c02b38de
📒 Files selected for processing (2)
packages/db/drizzle/20260307174859_furry_dark_beast/migration.sql
packages/db/drizzle/20260307174859_furry_dark_beast/snapshot.json
CREATE TABLE "url_metadata_gist_file" (
	"gist_id" varchar(256),
	"filename" varchar(256),
	"content_key" varchar(256) NOT NULL,
	"language" varchar(16) NOT NULL,
	CONSTRAINT "url_metadata_gist_file_pkey" PRIMARY KEY("gist_id","filename")
);
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Ingest path:"
rg -n -C2 'language:\s*file\?\.language\s*\?\?\s*"text"' \
apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.ts
echo
echo "Current schema cap:"
rg -n -C1 'varchar\("language", \{ length: 16 \}\)|"language" varchar\(16\)' \
packages/db/src/schema/url-metadata.ts \
packages/db/drizzle/20260307174859_furry_dark_beast/migration.sql
Repository: playfulprogramming/hoof
Length of output: 826
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Full url_metadata_gist_file table definition ==="
rg -n -A10 'url_metadata_gist_file' packages/db/src/schema/url-metadata.ts | head -20
echo
echo "=== Check for hardcoded length assumptions or truncation logic ==="
rg -n 'language.*16|length.*16|truncate.*language' \
apps/worker/src/tasks/url-metadata/ packages/db/src/schema/ || echo "No matches"
echo
echo "=== Verify GitHub's actual language label examples ==="
rg -n 'language|Language' apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.ts | head -15
Repository: playfulprogramming/hoof
Length of output: 843
🌐 Web query:
GitHub gist API language field examples Jupyter Notebook maximum length
💡 Result:
language field in the GitHub Gist REST API (files.*.language)
In a Gist response, each file object can include a detected language string, e.g.:
- Markdown file: `"language": "Markdown"` (example shown in GitHub’s Gists REST API docs) [1]
- Ruby file: `"language": "Ruby"` (example shown in GitHub’s Gists REST API docs) [2]
- Jupyter Notebook file (`.ipynb`): the detected language name GitHub Linguist uses for `.ipynb` is "Jupyter Notebook" [3], so that’s what you should expect in `files.*.language` when GitHub can detect it.
Notes:
- `language` is detection-derived (GitHub Linguist); it may be `null` if GitHub can’t confidently identify a language (commonly for binary/unknown content). [4]
“Maximum length”
- There’s no published maximum length for the `language` string itself in the Gist REST API schema/docs (it’s just a short language name like "Markdown"/"Jupyter Notebook").
- If you meant maximum file content length returned by the API: the Gist REST API returns up to 1 MB of `content` per file; beyond that, the file is marked `truncated: true`, and you should fetch the full file via `raw_url`. [1]
Sources: [1] [2] [3] [4]
Widen url_metadata_gist_file.language column to accommodate GitHub's multi-word language labels.
The ingest path at apps/worker/src/tasks/url-metadata/getEmbedDataFromGist.ts:80 writes GitHub's gist-file language value directly into this column without validation or truncation. GitHub's Linguist library returns multi-word labels such as "Jupyter Notebook" (17 characters), which exceeds the current varchar(16) constraint in both the migration and schema files. Inserts will fail when GitHub returns such labels.
Widen the column to varchar(64) in:
- packages/db/drizzle/20260307174859_furry_dark_beast/migration.sql (line 11)
- packages/db/src/schema/url-metadata.ts (line 24)
Proposed fix
- "language" varchar(16) NOT NULL,
+ "language" varchar(64) NOT NULL,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/db/drizzle/20260307174859_furry_dark_beast/migration.sql` around
lines 7 - 13, The column size for url_metadata_gist_file.language is too small
(varchar(16)) and must be widened to varchar(64) to accept multi-word GitHub
language labels; update the migration that creates the "url_metadata_gist_file"
table to change the "language" column from varchar(16) to varchar(64) and also
update the schema definition in the DB schema for url-metadata (the exported
url_metadata_gist_file/table schema or column descriptor) to reflect the new max
length so runtime inserts from getEmbedDataFromGist.ts succeed.
Adds embed information from playfulprogramming/hoof#121 to provide & cache the video/post/gist details

## Summary by CodeRabbit
* **New Features**
  * Unified embed handling: richer support for video (YouTube/Vimeo/Twitch), social posts, and Gist previews via metadata.
  * Gist previews now show truncated code content and filename.
* **Bug Fixes**
  * More reliable embed fallbacks and external-link behavior for video embeds.
* **Style**
  * Improved video/social embed layout: centered, responsive sizing, correct aspect-ratio handling, and better image rendering (cross-origin-safe images and thumbnails).
Fixes #82 - copies the platform-detectors implementation from the frontend to run as part of the url-metadata task to store/upload github gists, posts, and video embed info.
Summary by CodeRabbit
* New Features
* Database
* Tests
* Chores