Skip to content

native bindings seg faulting with node v0.7.12 #136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
muirmanders opened this issue Jun 21, 2012 · 10 comments
Closed

native bindings seg faulting with node v0.7.12 #136

muirmanders opened this issue Jun 21, 2012 · 10 comments
Labels

Comments

@muirmanders
Copy link

I am seeing segmentation faults when trying to use the native pg bindings with node v0.7.12 (pg v0.7.1, postgres 9, Mac OS X 10.7.3).

I do not see the problem using the javascript bindings, and I do not see the problem on node v0.6.19 with native or javascript bindings.

Here is a simple script which produces a seg fault for me:

var pg = require('pg').native;

var client = new pg.Client('postgres://muir@localhost/node_test');
client.connect();

client.query('select 1', function(err, data) {
  console.log("didn't segfault!");
  process.exit();
});

Here is some debugging info from gdb (notice that "loop" is null for some reason):

Reading symbols for shared libraries ..... done
WARNING: ev_io is deprecated, use uv_poll_t

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00000000000000a0
uv__io_stop (loop=0x0, handle=0x102800110) at ../deps/uv/src/unix/core.c:636
636   ev_io_stop(loop->ev, &handle->io_watcher);
(gdb) thread apply all bt

Thread 2 (process 73678):
#0  0x00007fff883106b6 in semaphore_wait_trap ()
#1  0x00000001002da71d in v8::internal::RuntimeProfiler::WaitForSomeIsolateToEnterJS () at /Users/muir/.nvm/src/node-v0.7.12/deps/v8/src/runtime-profiler.cc:443
#2  0x00000001002da71d in v8::internal::RuntimeProfilerRateLimiter::SuspendIfNecessary (this=0x1a03) at runtime-profiler.cc:490
#3  0x0000000100397dab in v8::internal::SamplerThread::Run (this=0x1a03) at platform-macos.cc:781
#4  0x00000001003970ad in ThreadEntry (arg=0x100b18310) at platform-macos.cc:518
#5  0x00007fff879828bf in _pthread_start ()
#6  0x00007fff87985b75 in thread_start ()
Current language:  auto; currently c

Thread 1 (process 73678):
#0  uv__io_stop (loop=0x0, handle=0x102800110) at ../deps/uv/src/unix/core.c:636
#1  0x0000000100050110 in uv__handle_stop [inlined] () at /Users/muir/.nvm/src/node-v0.7.12/deps/uv/src/uv-common.h:74
#2  0x0000000100050110 in uv__poll_stop [inlined] () at /Users/muir/.nvm/src/node-v0.7.12/deps/uv/src/unix/poll.c:75
#3  0x0000000100050110 in uv_poll_stop (handle=0x1028000c0) at ../deps/uv/src/unix/poll.c:81
#4  0x0000000100aa9047 in Connection::HandleConnectionIO (this=0x102800090) at binding.cc:589
#5  0x0000000100aa954b in Connection::HandleIOEvent (this=0x102800090, revents=1606415184) at binding.cc:391
#6  0x00000001000485e3 in ev_invoke_pending (loop=0x10066f8c0) at ../deps/uv/src/unix/ev/ev.c:2145
#7  0x0000000100044739 in uv__run (loop=0x10066f260) at ../deps/uv/src/unix/core.c:248
#8  0x00000001000448dc in uv_run (loop=0x10066f260) at ../deps/uv/src/unix/core.c:265
#9  0x0000000100008025 in node::Start (argc=1606415456, argv=0x100aa9001) at node.cc:2885
#10 0x0000000100001384 in start ()

Could the problem just be that pg needs to be updated to use uv instead of ev?

@mattmonson
Copy link

I am getting this too, but I, unfortunately, can't fall back to the javascript bindings because I need SSL =/

@brianc
Copy link
Owner

brianc commented Jun 29, 2012

I'm getting this too now. Just updated my OS X to v0.8.0. I'll try to get some help on this issue as soon as I can. I'm guessing I need to migrate the eio_ calls to their uv_ counter-parts and use the uv_* structures

@mattmonson
Copy link

Yeah, the 0.8.0 release notes (http://blog.nodejs.org/) had a migration
guide (https://github.com/joyent/node/wiki/API-changes-between-v0.6-and-v0.8
).

On Thu, Jun 28, 2012 at 5:24 PM, Brian Carlson <
[email protected]

wrote:

I'm getting this too now. Just updated my OS X to v0.8.0. I'll try to
get some help on this issue as soon as I can. I'm guessing I need to
migrate the eio_ calls to their uv_ counter-parts and use the uv_*
structures


Reply to this email directly or view it on GitHub:
#136 (comment)

@mouna-apperson
Copy link

Any estimate on when this will be done?

I set up a new array of node instances we are using to run API server with node 0.8.1 and it was non-obvious to me what was going wrong until I traced it out. I'll need to decide whether to rollback to 0.6.19 or wait. I hate missing out on all the bug fixes and performance enhancements in node, but we have over 200k daily uniques and I need to get this array up in the next couple days.

@brianc
Copy link
Owner

brianc commented Jul 3, 2012

@nick-apperson I'm sorry your experiencing an issue. I would recommend using the non-native bindings (pure javascript) until I have time to devote to fixing the issue. It's high on my list for node-postgres but unfortunately right now I've got a few things higher on the list. At the risk of TMI: moving into a partial demolished remodel home in a week, getting married a few weeks after that, and have a lot of client work to tend to.

some options -

  1. use pure javascript bindings
  2. pull request - the beauty of open source is you can fix it yourself
  3. contact me at [email protected] to sponsor a few hours of my time to fix this issue today or tomorrow
  4. hang tight until I get to fixing this

I would LIKE to get to this issue before the weekend, but the way things are going with me right now almost everything is in a holding pattern until my personal & professional issues are done being in the front seat.

@mouna-apperson
Copy link

I have updated to using the javascript bindings. I thought about doing the pull request, but my job requires every moment of my free programming time. I would love to contribute to the project though (and when I'm working less, I might).

I didn't mean to be unappreciative of what you're doing. I'm also not opposed to sponsoring your time working on bugs. I've sent you an email asking about your rates.

@brianc
Copy link
Owner

brianc commented Jul 3, 2012

No problem @nick-apperson. You didn't seem unappreciative -- I understand how deadlines can be. I'll get back to your email a little later today and keep you updated here about progress on the bugfix. Thanks!

@brianc
Copy link
Owner

brianc commented Jul 5, 2012

the ever amazing @booo sent a patch for initial libuv porting. Thank you so much! I've run through all the tests locally and they have passed on this set up:

osx - node v0.8.0 - postgres 9.1.4

@booo was saying he ran into some tests failing, but on my setup the entire suite passed. I'll run this through my setup on linux tomorrow, but I wanted to get this out asap to get past the major segfaults some people were experiencing. I will work on getting things more stable over the rest of the week and this weekend. Thanks again to @booo.

@booo
Copy link
Contributor

booo commented Jul 5, 2012

Ok. I'm going to push the patch in a minute. Should I create a pull request?

The failing test is test/native/connection-test.js. The problem is calling con.end() after an error occurred. The c++ code calls uv_poll_stop but the read_watcher points to nothing or random memory. It's not initialized because of the error while connecting.

https://github.com/booo/node-postgres/blob/c++/src/binding.cc#L641

Sample causing this segfault on ubuntu with node 0.8.1 and postgres 9.1.*:

pg = (require "./").native

client = new pg.Client()
client.connect (error)-> client.end()
client.connect()

I don't have a really nice solution for this. Maybe we should keep the state of the read_watcher in some variable and only call uv_poll_stop if appropriate?

Also I suggest that we move this block

https://github.com/booo/node-postgres/blob/master/src/binding.cc#L335

behind the CONNECT_BAD block

https://github.com/booo/node-postgres/blob/master/src/binding.cc#L342

For example if we provide wrong connection informatin this is the actual problem and not that we couldn't set the connection to non-blocking.

@brianc
Copy link
Owner

brianc commented Jul 5, 2012

sure - a pull request helps me identify what in your repo you want merged in.

I see what you're saying about the seg fault w/ a non-initialized read/write watchers. I think a private variable on the connection class...something like ioInitialized_ or something could be used. The same way connecting_ is used now. Probably overall better to use a simple state machine in the long term, but this would get the job done.

I totally agree with moving that block behind the CONNECT_BAD block as well. 👍

brianc pushed a commit that referenced this issue Jul 6, 2012
Test case from ticket works but some tests fail.

Signed-off-by: brianc <[email protected]>
@brianc brianc closed this as completed Jul 12, 2012
brianc pushed a commit that referenced this issue Dec 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants