Latency/Load/Overhead caused by integer indexed arrays indexOf/splice/shift etc #2199
Comments
Interesting. If this is really a problem I think it would be worth seeing some improvements.
@rauchg What are your thoughts?
@selay - hey, this sounds pretty interesting. Can you share the modifications you've made, or at least some idea of what you've done? I'm pretty interested in this fix, as I've already browsed the code and thought it was quite a strange approach. How do you use the "associative array", and what's the key? I don't understand the array either. At first I thought it was because they wanted the total number of connected sockets, but I never found anything like that in the code. Weird..
Yes, using the socket id as the key (or property, if you prefer to call it an object) in an "associative array". As you know, the socket id is the default room for each socket, so when you send something to Socket A, you actually send it to the room named after its socket id.
you can have
To check if it exists,
you can have
The original code to remove from room when socket disconnects is:
You can simply have:
The same check for whether a room exists is repeated many times during a single emit, broadcast, etc., at the top of each function called, so the already large overhead gets multiplied.
First you get the id with an exhaustive search and then access by index.
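A minimal sketch of the keyed approach described in this comment. The names (`createRooms`, `join`, `has`, `leave`, `members`) are hypothetical and this is not socket.io's actual code; it only assumes that each room can be stored as an object whose keys are socket ids.

```javascript
// Hypothetical helper for illustration; not part of socket.io.
// Each room name maps to an object whose keys are socket ids.
function createRooms() {
  const rooms = Object.create(null); // room name -> { socketId: true }

  return {
    // Adding a member is a single property write.
    join(room, id) {
      if (!rooms[room]) rooms[room] = Object.create(null);
      rooms[room][id] = true;
    },
    // Membership is a direct key lookup, with no indexOf scan.
    has(room, id) {
      return Boolean(rooms[room] && rooms[room][id]);
    },
    // Removal is a delete, with no splice and no re-indexing.
    leave(room, id) {
      if (rooms[room]) delete rooms[room][id];
    },
    // Listing members is the only operation that walks the room.
    members(room) {
      return rooms[room] ? Object.keys(rooms[room]) : [];
    }
  };
}

const rooms = createRooms();
rooms.join('lobby', 'socketA');
rooms.join('lobby', 'socketB');
rooms.leave('lobby', 'socketA');
console.log(rooms.has('lobby', 'socketB'), rooms.members('lobby'));
```

In current Node versions a `Map` of `Set`s would serve the same purpose; plain objects are used here only because that is what the comment describes.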
In several other places, there is other inefficient code:
You can change to this:
BY THE WAY, I JUST WROTE ALL THE CODE HERE FOR ILLUSTRATION WITHOUT LOOKING AT MY PROJECT, SO PLEASE DON'T ASSUME IT IS CORRECT, AND TEST BEFORE USING.
Hi! I tried to implement the suggested modifications. Is there any way to benchmark it properly?
function Socket(nsp, client){
  [...]
  this.id = nsp.name + '::' + client.id;
Note: the separator
Socket.prototype.emit = function(ev){
  [...]
  this.adapter.broadcast(packet, {
    except: [this.id],
    rooms: this._rooms,
    flags: flags
  });
PS: the Travis build has miserably failed 😱 (https://travis-ci.org/socketio/socket.io/builds/79921947), but I think it's related to the last commit on master (Error: Cannot find module 'base64id'). Maybe. Please 😇
Yes, you need to hack any component or module that references the integer-indexed array. Any conversion can be a big overhead. The modules and adapters use what socket.io uses, so dependencies may require modification. If you want to make it generic and contribute to this project, I think the better way is to change only socket.io for the next version and note the changes, so that other module developers can make their modules compatible with the next socket.io release. If I get some time, I will help to convert some modules as well. I think the next release should focus on efficiency, and it can be version 2.0 to be separate from the 1.x line, which means other modules will need to support it.
Hi! As suggested by @kinsi55, I also converted I have merged these changes in my repo, if anyone is interested. Against 1.3.5 and 1.3.7 releases (as I haven't figured why #2251 yet).
Did you test memory usage?
@darrachequesne Hi. Apparently the answer I posted to your PR after deleting my initial comment didn't go through because my mobile connection sucked, so I'll write it down again. I deleted my comment because I actually ran some tests on this in Chrome, and although it doesn't make any sense to me, it looked like a normal for loop in combination with Object.keys is slower than for..in. I couldn't believe that, to be honest, but that seems to be the case, as @selay showed too, while the jsperf gave me a result 98% slower than for, which is why I initially even criticised the code. I guess you never stop learning with JS. And as @selay wrote, I am pretty sure memory usage would be influenced by this change, so I am sorry, but I'd suggest changing it back.
Well I guess it depends on the browser 😃 (http://postimg.org/image/v1os2pfjv/)! But those results seem to vary with the number of attributes in the object (with few attributes
I just tested it right now w/ this since I was curious and I really want socket.io to be the best it can be.
Result for 0.12.7:
Same with 4.1.1:
TL;DR: There really isn't a difference in Node that you should worry about in this case, though in 4.1.1 for..in seems to be consistently slower. (Yes, I ran it more often than listed here; in this case for..in is always slower than a normal for loop on Node 4.1.1.)
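For anyone wanting to repeat this kind of comparison, here is a rough sketch of the two loop styles being discussed. The helper names (`makeSockets`, `countForIn`, `countKeysLoop`, `time`) and the object size are made up for illustration, and it is not the script used for the results above. Timings depend heavily on the engine and on the number of keys, so treat any numbers it prints as indicative only.

```javascript
// Builds an object with `n` keys shaped like socket ids (hypothetical data).
function makeSockets(n) {
  const sockets = {};
  for (let i = 0; i < n; i++) sockets['id' + i] = true;
  return sockets;
}

// Style 1: for..in over the object's own keys.
function countForIn(obj) {
  let count = 0;
  for (const key in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, key) && obj[key]) count++;
  }
  return count;
}

// Style 2: Object.keys followed by a plain indexed for loop.
function countKeysLoop(obj) {
  const keys = Object.keys(obj);
  let count = 0;
  for (let i = 0; i < keys.length; i++) {
    if (obj[keys[i]]) count++;
  }
  return count;
}

// Very crude timer: runs `fn` repeatedly and logs the elapsed wall time.
function time(label, fn, arg, runs) {
  const start = Date.now();
  let result;
  for (let r = 0; r < runs; r++) result = fn(arg);
  console.log(label + ': ' + (Date.now() - start) + ' ms');
  return result;
}

const sockets = makeSockets(10000);
time('for..in', countForIn, sockets, 200);
time('Object.keys + for', countKeysLoop, sockets, 200);
```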
Hello, we submitted a similar PR
I guess anyone who has used socket.io in a production environment has already run into this issue or soon will. Changing loop types, global to local variables, etc. will not provide much benefit, but of course optimization should eventually be done wherever possible, on the condition that it doesn't affect maintainability. For example, a for-in loop is straightforward and easier to maintain. Object.keys may use a for-in loop or another loop internally, depending on the implementation, when it iterates through the keys, so it is unlikely to be better. But even if it is better, it makes the code horrible to maintain, which can eventually lead to more bugs when an enhancement is made. Also, such things are generally optimized by a (smart) interpreter anyway. However, replacing indexOf with an "associative array" (object) is a different story. The performance difference is undeniably and significantly high. Also, fixing it does not really complicate maintainability, because [] access is very common and easy to follow. (Actually, it is more straightforward than indexOf, because the latter usually confuses novice programmers: it returns -1, not false, when nothing is found.) So I think it is better to prioritize issues which deliver axiomatic improvement.
@kapouer Hi! I hadn't seen your PR #2141; it seems they are indeed quite similar! All the changes are included in PR #2239, aren't they (please correct me if I'm wrong)? Currently, conversion from array elements to object attributes affects:
Note: The other PR is closed, we should rather take a look at socketio/socket.io-adapter#30, right? I think the latter may be interesting (converting @selay do you see any other improvement to be made on the current subject?
Well, my point, @darrachequesne, was also to make sure we don't break the API, even for adapters - the current breakage between master and the 1.3.x branches is already painful. Besides that, I think you have a much clearer view of what to do and how to do it. And no, I have no benchmarks on that. Maybe @manubb could help, but I'm not sure he has time for this.
Should be fixed in #2239, thanks to @darrachequesne 😄
Recently I was debugging to find the reason for frequent server crashes, delays in transmission and server capacity issues.
I have found several issues in socket.io and hope it might be useful to report them here. Well, I had to hack it to solve the issues, and it provided an unbelievable improvement. However, I have only changed the parts I use, which means my current changes are not mergeable and not compatible with existing modules.
Problem:
Room etc. use integer-indexed arrays to keep sockets and then use indexOf multiple times to find the index of a socket. indexOf is called many times during a single emit. Additionally, splice is used to remove a socket, which means multiple big arrays are rearranged on every socket change.
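The pattern described above can be sketched roughly as follows. `ArrayRoom` is a hypothetical class written only to illustrate where the linear scans and the re-indexing come from; it is not copied from socket.io.

```javascript
// Hypothetical array-backed room, for illustration only.
function ArrayRoom() {
  this.sockets = []; // integer-indexed list of socket ids
}

ArrayRoom.prototype.add = function (id) {
  // Linear scan to avoid duplicates before appending.
  if (this.sockets.indexOf(id) === -1) this.sockets.push(id);
};

ArrayRoom.prototype.has = function (id) {
  // Linear scan on every membership check.
  return this.sockets.indexOf(id) !== -1;
};

ArrayRoom.prototype.remove = function (id) {
  // Linear scan to find the position...
  const index = this.sockets.indexOf(id);
  // ...then splice shifts every later element down by one.
  if (index !== -1) this.sockets.splice(index, 1);
};

const room = new ArrayRoom();
room.add('a');
room.add('b');
room.add('c');
room.remove('b');
console.log(room.sockets);
```

Each of the three operations costs time proportional to the number of sockets in the room, which is the overhead this report is about.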
I did a small simulation with 10000 sockets connected and randomly leaving, emitting, connecting etc.
(Compared socket.io against my hacked version which uses "associative array" where index and value are the same. No need for indexOf and splice (because no need to rearrange array indexes).)
When 10000 sockets are connected, emit is 2040 times!!! faster in the hacked version than in the current socket.io implementation. The server load is at least halved, and memory usage is 30% lower.
The latency and delay issues disappeared.
(it apparently solved another unrelated issue - when server is busy to accept a new handshake quickly, client side continues to make new polling requests and as a result, multiple connections were happening with the same socket.id. When a message comes, it was getting echoed/duplicated many times.)
I don't understand the reason for using an integer-indexed array and then searching it each time. Because the location is not known, each search scans the entire array: 10000 sockets, multiple times, for a single emit. If 1000 sockets emit, this overhead is multiplied by 1000, and a single disconnect rearranges all 10000 sockets, again times 1000.
Actually, you can run a simple test comparing [] access against indexOf and see the performance difference, which is over 3000 times.
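One possible shape for such a test is sketched below. It is not the test the report refers to: the names (`idArray`, `idObject`, `findWithIndexOf`, `findWithKey`, `measure`) and the id format are assumptions, and the ratio it prints will vary with the engine and machine.

```javascript
const N = 10000; // number of simulated sockets, matching the figure in the report

// Same ids stored two ways: an integer-indexed array and a keyed object.
const idArray = [];
const idObject = {};
for (let i = 0; i < N; i++) {
  const id = 'socket' + i;
  idArray.push(id);
  idObject[id] = id; // index and value are the same
}

// Array version: scans until the id is found.
function findWithIndexOf(ids, id) {
  return ids.indexOf(id) !== -1;
}

// Object version: direct property access.
function findWithKey(ids, id) {
  return ids[id] !== undefined;
}

// Looks up every id once and reports hits plus elapsed wall time.
function measure(find, ids) {
  const start = Date.now();
  let hits = 0;
  for (let i = 0; i < N; i++) {
    if (find(ids, 'socket' + i)) hits++;
  }
  return { hits: hits, ms: Date.now() - start };
}

const arrayResult = measure(findWithIndexOf, idArray);
const objectResult = measure(findWithKey, idObject);
console.log('indexOf: ' + arrayResult.ms + ' ms, [] access: ' + objectResult.ms + ' ms');
```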
So, the current socket.io implementation is only suitable for a small-scale project where you don't expect more than 3000 sockets to connect. Otherwise you will unnecessarily need load balancing, multiple nodes, etc.