-
Notifications
You must be signed in to change notification settings - Fork 247
feat: move rrweb event stream to client and query through /api/clickhouse-proxy #755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦋 Changeset detectedLatest commit: 454f7b2 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
validateRequest({ | ||
query: z.object({ | ||
hyperdx_connection_id: objectIdSchema, | ||
function validation() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handling both GET and POST routes now, so broke into RequestHandlers that both can use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: we don't need to wrap the validate middleware within another function, right? can just be a var
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was doing it mainly to make the PR easily reviewable (other functions would reformat and indent lines that otherwise haven't changed) but I'll just add some prettier commands for the time being
try { | ||
const { teamId } = getNonNullUserWithTeam(req); | ||
const { hyperdx_connection_id } = req.query; | ||
const connection_id = req.headers['x-hyperdx-connection-id']!; // ! because zod already validated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Header x-hyperdx-connection-id is now in place of the respective queryparam. I chose this based on the clickhouse documentation for a reverse-proxy in front of a clickhouse instance. They do have a section that allows you to send query params as well, but those are all prefixed with 'param_', which quickly gets messy with zod validating either query param 'hyperdx_connection_id' or 'param_hyperdx_connection_id', so I opted to move to using headers.
const qIdx = _req.url.indexOf('?'); | ||
if (qIdx >= 0) { | ||
qparams = _req.url.substring(qIdx); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_req.query is type ParamQs, which does not necessarily play nicely with URLSearchParams. Since we now forward all query params, prefer to just grab the string from the url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this introduce any encoding issue? I'd suggest to move all irrelevant changes to a separate PR so folks can review it easier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
packages/api/src/utils/validation.ts
Outdated
import express from 'express'; | ||
import { z } from 'zod'; | ||
|
||
export function validateRequestHeaders<T extends z.Schema>(schema: T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the same as validateRequest function, but they don't validate headers. Maybe should submit a feature request
**/ | ||
on: { | ||
proxyReq: fixRequestBody, | ||
}, | ||
// ...(IS_DEV && { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function fixRequestBody does not handle Content-Type: 'text/plain', which makes things horribly annoying. Here's what happens:
- bodyParser.text reads the incoming stream to req.body
- fixRequestBody looks at the content type and sees 'text/plain', but it doesn't handle that, so it does not write to the proxy request
- The proxy request is sent to our express server with a non-zero content-length but an empty body
- The express server's express.text middleware (bodyParser under the hood) will see text/plain and a Content-Length > 0, so it tries to read the incoming stream and parse it into req.body
- The stream never comes in, so next() is never called. The server will just wait ad infinitum.
I filed an issue and PR to fix fixRequestBody. But we don't even need bodyParser here, so just disabling it and forwarding everything works too.
const headers = {}; | ||
if (!isLocalMode && connectionId) { | ||
headers['x-hyperdx-connection-id'] = connectionId; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to sending connectionId through the header instead of query params
4883707
to
b5e07d3
Compare
packages/app/src/sessions.ts
Outdated
); | ||
|
||
const clickhouseClient = createClient({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not reuse or extend?
export class ClickhouseClient { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked with Warren, this would be a good proof of concept first to use client-web here and then introduce to ClickhouseClient when we feel good about it. I created a linear issue to do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add a TODO comment?
|
||
// OPTIMIZATION STRATEGY | ||
// | ||
// 1. Write a clickhouse query to divide a session into different chunks, where each chunk has a start time. Maybe each chunk contains 100 events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chunks can only start from a full snapshot (type 4 rrweb event), as most events are deltas that need to be applied from the last full snapshot.
// 1. Write a clickhouse query to divide a session into different chunks, where each chunk has a start time. Maybe each chunk contains 100 events. | ||
// 2. When slider advances, use the timestamp to determine which chunk you are in | ||
// 3. Fetch data associated with that chunk | ||
// 4. Probably do some prefetching for future times | ||
export function useRRWebEventStream( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering why we don't reuse useOffsetPaginatedQuery
here instead? iirc it should do all the things this function does, but better. we should avoid multiple streaming clickhouse query function implementations to keep our implementation centralized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I do think we need to refactor how rrweb events are fetched in general. I created HDX-1629 for that
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
packages/api/src/utils/validation.ts
Outdated
@@ -0,0 +1,17 @@ | |||
import express from 'express'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we should probably put this in the middleware
dir
packages/app/src/sessions.ts
Outdated
'x-hyperdx-connection-id': source.connection, | ||
}, | ||
url: window.location.origin, | ||
pathname: '/api/clickhouse-proxy', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the app is able to stream directly, the clickhouse client should request ch server without proxy under local model. check out the getClickhouseClient
method (
hyperdx/packages/app/src/clickhouse.ts
Lines 21 to 39 in 8c95b9e
export const getClickhouseClient = () => { | |
if (IS_LOCAL_MODE) { | |
const localConnections = getLocalConnections(); | |
if (localConnections.length === 0) { | |
console.warn('No local connection found'); | |
return new ClickhouseClient({ | |
host: '', | |
}); | |
} | |
return new ClickhouseClient({ | |
host: localConnections[0].host, | |
username: localConnections[0].username, | |
password: localConnections[0].password, | |
}); | |
} | |
return new ClickhouseClient({ | |
host: PROXY_CLICKHOUSE_HOST, | |
}); | |
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, added in
packages/app/pages/api/[...all].ts
Outdated
@@ -6,7 +6,8 @@ const DEFAULT_SERVER_URL = `http://127.0.0.1:${process.env.HYPERDX_API_PORT}`; | |||
export const config = { | |||
api: { | |||
externalResolver: true, | |||
bodyParser: true, | |||
bodyParser: false, | |||
responseLimit: '32mb', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on the decision behind this number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw it as good practice to implement a responseLimit online and felt it might not be best security to allow unlimited data to flow through, so I chose a number I also saw being used on the express bodyParser middleware for json and text in api-app.ts. This may not be ideal though, especially since we already allow the client to potentially send a massive query to the db, which is where the performance hit would really come from.
For now I'm removing the responseLimit. I'll look to your guidance for the right thing to do on this
9464f91
to
6801d96
Compare
packages/app/src/sessions.ts
Outdated
} | ||
: undefined, | ||
url: hostname ?? window.location.origin, | ||
pathname: pathname ?? '/api/clickhouse-proxy', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can reuse PROXY_CLICKHOUSE_HOST
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is local to the clickhouse client file in common-utils. I was thinking when we move this over to common-utils we would then swap this to PROXY_CLICKHOUSE_HOST. I'll add a todo comment for that
packages/app/src/sessions.ts
Outdated
'x-hyperdx-connection-id': source.connection, | ||
} | ||
: undefined, | ||
url: hostname ?? window.location.origin, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR moves the rrweb event stream fetching from an API route to the client and introduces a new ClickHouse proxy for queries. Key changes include:
- Updating the ClickHouse client usage and query parameter handling in common-utils and app packages.
- Refactoring the event stream logic in sessions to leverage the ClickHouse client with async iterator streaming.
- Removing the sessions API route and enhancing the ClickHouse proxy to accept both GET and POST requests with header‐based connection identification.
Reviewed Changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
packages/common-utils/src/clickhouse.ts | Removes connection query parameter and adds header-based handling. |
packages/app/src/sessions.ts | Refactors rrweb event streaming to use the ClickHouse client query. |
packages/app/pages/api/[...all].ts | Disables automatic body parsing for API requests. |
packages/api/src/routers/api/sessions.ts | Removes the sessions API endpoint. |
packages/api/src/routers/api/index.ts | Omits sessionsRouter from the API index. |
packages/api/src/routers/api/clickhouseProxy.ts | Updates header-based connection validation and extends methods. |
packages/api/src/middleware/validation.ts | Adds request header validation middleware. |
packages/api/src/api-app.ts | Removes sessions route usage from the API application. |
.changeset/neat-badgers-matter.md | Documents the version bump for the updated modules. |
Files not reviewed (1)
- docker/clickhouse/local/config.xml: Language not supported
linesFetched++; | ||
forFunc(parsed); | ||
} catch { | ||
// do noting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo detected: 'do noting' should be corrected to 'do nothing'.
// do noting | |
// do nothing |
Copilot uses AI. Check for mistakes.
} catch { | ||
// do noting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The catch block silently suppresses JSON parsing errors. Consider logging the error or handling it appropriately to aid in debugging malformed rows.
} catch { | |
// do noting | |
} catch (error) { | |
console.error('Error parsing JSON row:', error, 'Row:', row); |
Copilot uses AI. Check for mistakes.
// @ts-expect-error _req.query is type ParamQs, which doesn't play nicely with URLSearchParams. TODO: Replace with getting query params from _req.url eventually | ||
const qparams = new URLSearchParams(_req.query); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of relying on an implicit conversion of _req.query, explicitly converting it by calling toString() or retrieving query params from _req.url would improve clarity and reliability.
// @ts-expect-error _req.query is type ParamQs, which doesn't play nicely with URLSearchParams. TODO: Replace with getting query params from _req.url eventually | |
const qparams = new URLSearchParams(_req.query); | |
const url = new URL(_req.url, `http://${_req.headers.host}`); | |
const qparams = new URLSearchParams(url.search); |
Copilot uses AI. Check for mistakes.
packages/app/src/sessions.ts
Outdated
}, | ||
}); | ||
|
||
return await clickhouseClient.query({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra await?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Game changer! Can't wait to get this out 🚢 🚢 🚢
Ref: HDX-1381