Skip to content

create a kafka package #135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rybit opened this issue Jul 2, 2020 · 8 comments
Closed

create a kafka package #135

rybit opened this issue Jul 2, 2020 · 8 comments

Comments

@rybit
Copy link
Member

rybit commented Jul 2, 2020

Ok we have 3 projects that are doing kafka each slightly differently:

They all have slightly different interfaces and abstractions. I'd love to get some opinions about how we should standardize some of this.

At a minimum, I'd like to have configuration structs/opts that can be used to establish a connection. It should handle:

  • TLS
  • authentication
  • sane defaults

Here are some question I have:

  • should it expose the underlying kafka library directly, or pull a reader/writer interface?
  • which underlying library should we use?
  • is there a recommended way we should consume/ack messages?

What are your questions? Got some ideas for what it should/nt cover?

@rybit
Copy link
Member Author

rybit commented Jul 2, 2020

cc @smoya, @jose-ledesma, @mraerino, @pandemicsyn who did those implementations

@mraerino
Copy link
Contributor

mraerino commented Jul 2, 2020

which underlying library should we use?

I really liked most of the https://github.com/segmentio/kafka-go lib
all projects you linked seem to be using it

a thing to look out for is that their NewWriter can panic on invalid input: https://github.com/netlify/edge-state/blob/master/kafkaclient/kafka.go#L297-L304

is there a recommended way we should consume/ack messages?

most likely separate calls to Fetch and Commit after it has been applied internally
edge-state uses an abstraction that also handles retries and only commits if no error occured: https://github.com/netlify/edge-state/blob/master/kafkaclient/readloop.go


I think a good constructor and a thin wrapper around the library should work well for the start.
The segmentio Reader & Writer Implementations are very high-level and easy to use.

@pandemicsyn
Copy link

pandemicsyn commented Jul 2, 2020

should it expose the underlying kafka library directly, or pull a reader/writer interface?

Assuming we can sort out a nice way to handle configs - I'd be fine potentially exposing the underlying kafka lib directly since others might be doing more advanced stuff, but so far my use case has been very straight forward. I've gotten away with just using the segment packages vanilla kafka.Reader implementation.

which underlying library should we use?

I only have real word exposure to https://github.com/segmentio/kafka-go (now) and https://github.com/confluentinc/confluent-kafka-go (in the past). I didn't have any issues with confluents driver - but I know that dependency on librdkafka is a no-go for some folks.

I think its worth noting though that segment driver isn't without issues (like that time_wait bug) and is definitely still under going development. 4.0 is reworking some big parts (segmentio/kafka-go#438 , segmentio/kafka-go#461) and it still occasionally has issues (https://github.com/segmentio/kafka-go/issues?q=is%3Aissue+is%3Aclosed) but I generally liked it and be down for taking the chance on it.

is there a recommended way we should consume/ack messages?

+1 What @mraerino said ;)

I think support for emitting background kafka stats from the segment driver (like we do for mysql in metriks) would definitely be nice too.

@pandemicsyn
Copy link

Also just wanna expand on why I like defaulting and rolling with separate Fetch/Commit's as our recommended way.

Basically, at a previous gig we defaulted to auto commit in the package setup - and most folks would implement their own separate Fetch/Commit pattern like @mraerino mentioned. But as the company grew, and others at the co starting using it who where not as familiar with Kafka it sometimes led to implementation pain for them after they'd gone live (i was one of those folks).

Forcing folks to think through their fetch/commit strategy is good.

@jose-ledesma
Copy link

jose-ledesma commented Jul 3, 2020

When I was looking for kafka libraries, the first I thought about was the confluent one. But using c-bindings may make it hard for local environments, and it may be a bit cumbersome to dockerize it (what is the docker from? the golang one and configure the Confluent repository, and then maybe the golang debian changes and it breaks librdkafka? start from the ubuntu one and install golang on it?)

When I looked to the pure go implementations, the segmentio one seemed the more mature (although it is still in development)

Agree with everything said so far by @mraerino and @pandemicsyn

@rybit
Copy link
Member Author

rybit commented Jul 8, 2020

yeah I'm in alignment with the team:

  • thin config wrapper
  • segment's impl
  • expose the lib directly (we can tweak this later if we see patterns)

@mraerino
Copy link
Contributor

mraerino commented Jul 16, 2020

I really liked most of the https://github.com/segmentio/kafka-go lib
all projects you linked seem to be using it

I think our team will retract that statement.
We've been running into nasty issues with the lib:
https://github.com/netlify/edge-state/issues/257
https://github.com/netlify/edge-state/issues/243

@smoya is evaluating https://github.com/Shopify/sarama and https://github.com/confluentinc/confluent-kafka-go

@smoya
Copy link
Contributor

smoya commented Aug 6, 2020

FYI, we are now using https://github.com/confluentinc/confluent-kafka-go and it fixed all the issues we had.
It is basically a wrapper on top of librdkafka and it is very well maintained.

There is some basic work pending around error handling in order to have something deliverable that can be shared across the org. This will happen very soon (I would like next week).

@smoya smoya closed this as completed Aug 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants