Please note: protobuf.js 6 will be a complete rewrite #485
Very excited about this, looking forward to it @dcodeIO!
I hope there is a high-performance toJSON method for protos. Currently, we do toRaw and fix up the Longs; toRaw is almost all that our profiles show. If performance improves with the rewrite, that will be an added plus!
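For illustration, here is a hypothetical helper of the kind used to "fix up the Longs" after a toRaw-style conversion. Names and object shapes are assumptions, not protobuf.js API; it treats any `{ low, high }` pair as a Long and stringifies it via BigInt for exactness, since 64-bit values exceed `Number.MAX_SAFE_INTEGER`:

```javascript
// Hypothetical post-processing helper, not part of protobuf.js:
// walks a plain object and converts Long-like values to strings.
function isLongLike(v) {
  return v !== null && typeof v === "object" &&
         typeof v.low === "number" && typeof v.high === "number";
}

function longToString(v) {
  // Recombine the two 32-bit halves exactly using BigInt.
  var big = (BigInt(v.high) << 32n) | BigInt(v.low >>> 0);
  return big.toString();
}

function fixupLongs(obj) {
  if (isLongLike(obj)) return longToString(obj);
  if (Array.isArray(obj)) return obj.map(fixupLongs);
  if (obj !== null && typeof obj === "object") {
    var out = {};
    for (var k in obj) out[k] = fixupLongs(obj[k]);
    return out;
  }
  return obj; // primitives pass through unchanged
}
```

A recursive walk like this is exactly the kind of per-decode overhead that shows up in profiles, which is why a fast built-in toJSON would help.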
I haven't yet touched toJSON on message instances at all and I also hate the current one. Looking forward to your ideas to get it right this time!
In case you haven't read this yet: https://www.reaktor.com/blog/javascript-performance-fundamentals-make-bluebird-fast/
You can now have a look for yourself: https://github.com/dcodeIO/protobuf.js
It looks much more readable. I haven't dug through it in detail yet, but I generally read through most of the implementations of types and the encoding/decoding portion. The reflection part doesn't look that much different with $type. What I want to check is the generated files, which I haven't yet had a chance to look at. The goal should be that any code that avoids using reflection runs very fast. Maybe some performance tests will help.
There is no code generation yet. What is there, currently, is a way to properly extend the runtime message. The idea here is that it should be possible to replace even that with a custom implementation, specifically built for the type of message you are using, through code generation. How this will look is still TBD. Also, there are multiple possible ways of doing this: you could either just wrap your custom class around reflection, either manually or generated, or generate code that uses
All this sounds good. One more wishlist item I want to throw into the ring: create the generated classes with an immutable + builder interface, so that there are no setters on the decoded objects and a builder pattern that always copies. You might want other users of the library to weigh in on this. This wish is based on some production bugs I have seen in the past (silly mistakes) that could have been loud failures. I also suspect (without proof, and second hand) that JavaScript engines can detect immutable objects and convert them to C-like structs, which could be a worthwhile performance improvement.
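A minimal sketch of what such an immutable + builder interface could look like for a generated class. This is a hypothetical shape (names like `Person` and `PersonBuilder` are made up), not a proposal for the actual codegen output:

```javascript
// Sketch of the immutable + builder pattern: decoded objects are frozen,
// and every mutation goes through a builder that always copies.
function Person(fields) {
  this.name = fields.name || "";
  this.age = fields.age || 0;
  Object.freeze(this); // silent writes become loud failures in strict mode
}

Person.prototype.toBuilder = function () {
  return new PersonBuilder(this);
};

function PersonBuilder(base) {
  this.fields = { name: base ? base.name : "", age: base ? base.age : 0 };
}

PersonBuilder.prototype.setName = function (name) {
  this.fields.name = name;
  return this; // chainable
};

PersonBuilder.prototype.setAge = function (age) {
  this.fields.age = age;
  return this;
};

PersonBuilder.prototype.build = function () {
  return new Person(this.fields); // always copies, never aliases
};
```

Usage would be something like `var p2 = p.toBuilder().setAge(30).build();`, which leaves `p` untouched.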
I see. At the moment, there are no setters and no methods on runtime message instances. These just extend Prototype and have some convenience methods on the constructor. I wanted that as the default when working with reflection, and it's up to the user to decorate their message instances. See: https://github.com/dcodeIO/protobuf.js/blob/master/src/prototype.js It does, however, use references and does not copy by default in its current state; see the constructor. You are not forced to call that constructor, though, and can initialize message instances another way.
Can we do this in TypeScript, pretty please, with proper modularity?
The library itself is plain JS, but first class typescript support (.d.ts for the library and codegen for .ts) is definitely something I'd like to have.
If you're rewriting it, it's far easier to do it in TypeScript and publish it as a compiled library with auto-generated .d.ts files.
You might have missed this: https://github.com/dcodeIO/protobuf.js/tree/master I understand your desire for a full TypeScript implementation (I thought about this too), but because of the amount of low-level stuff going on, I decided against it.
Noticed that @englercj already made a d.ts generator on top of jsdoc. First bits here: https://github.com/dcodeIO/protobuf.js/tree/master/types There are still some issues with multi-dimensional arrays and generic promises, but it looks good for a start. Edit: Added a simple post-processing step for now that makes it a valid .d.ts.
I know your pain from trying to do things like arbitrary class constructors with TypeScript in the past. In the end the code is the same; my suggestion of TypeScript was mostly about development ease, but it looks like you're doing just fine with raw JS. I like the .d.ts, looks good. Looking forward to testing out the new code soon.
One possible consideration here is first-class gRPC service support. Having the proper hooks to support use cases like https://github.com/grpc-ecosystem/grpc-gateway and/or https://github.com/tmc/grpc-websocket-proxy would be nice. With regard to code generation, is your intent to generate typed clients given a defined proto service? This seems like it would be quite useful. I have some basic work on generating JS client code here: https://github.com/tmc/grpcutils (basic elm and flow type generation).
@tmc also note http://GitHub.com/paralin/grpc-bus
If we can do anything to simplify how they use the library, yeah, then we should probably do it. But we should make sure that we don't add functionality to core that should actually be a library of its own. Regarding code generation: what I am currently trying to do is to design a minimum common interface that could then be used to generate entire projects of JS or TypeScript classes. This can be done in a couple of different ways, of course, like either just wrapping reflection or actually emitting message-specific encoders and decoders on top of the new reader and writer interface. Today, I have some good and some bad news alike: I've just added getters and setters to the runtime message prototype, as this seems to be the only working solution to handle oneof fields properly. That's because any fields that are part of a oneof have side effects on other fields of the oneof. The bad news is that, to make this work, I had to introduce a hidden property. Example: https://github.com/dcodeIO/protobuf.js/blob/master/scripts/sandbox.js#L49
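To illustrate why getters/setters (and a hidden backing store) come into play for oneofs, here is a hypothetical sketch of a oneof whose setters have side effects on their sibling fields. All names (`Shape`, `kind`, `$values`) are made up for the example and are not the library's actual internals:

```javascript
// Define oneof member fields as accessors over a hidden backing store:
// setting one member clears the others, and a virtual field reports
// which member is currently set.
function defineOneof(proto, oneofName, fieldNames) {
  fieldNames.forEach(function (name) {
    Object.defineProperty(proto, name, {
      get: function () {
        return this.$values[name];
      },
      set: function (value) {
        var self = this;
        fieldNames.forEach(function (other) { // side effect on siblings
          if (other !== name) delete self.$values[other];
        });
        this.$values[name] = value;
      }
    });
  });
  // Virtual field exposing the currently set member's name.
  Object.defineProperty(proto, oneofName, {
    get: function () {
      for (var i = 0; i < fieldNames.length; ++i)
        if (fieldNames[i] in this.$values) return fieldNames[i];
      return undefined;
    }
  });
}

function Shape() {
  this.$values = {}; // the "hidden property" backing the accessors
}
defineOneof(Shape.prototype, "kind", ["circle", "square"]);
```

With plain data properties this behavior is impossible, which is why an encoder cannot otherwise know which oneof member is the current one.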
I would vote for generating message-specific encoders and decoders on top of the new reader and writer interface (if there is proof that the performance is there). So before going too deep, maybe a prototype and benchmarks would help. From a code-size point of view, the post-gz-compression size of generated code may not be that different from that of wrapping reflection.
I too am of the opinion that this lib should focus on code-gen and not so much on reflection. Ideally it would work just like protoc (or a plugin for protoc), in that I get generated code that does all the things necessary. Even better would be support for the different
My vision for this: something like

var codegen = require("protobufjs-codegen"),
    Reader = codegen.Reader,
    Writer = codegen.Writer;

function AwesomeMessage(properties) {
    // Generated initializer specific to this message
}

AwesomeMessage.encode = function(instance) {
    // Generated encoder specific to this message
    return buffer;
};

AwesomeMessage.decode = function(buffer) {
    // Generated decoder specific to this message
    return instance;
};

But as I said, that should be hardly faster than a reflection-based encoder/decoder once the JIT has warmed up. This eliminates a couple of function calls, which is relevant in a native environment, but within JS there are other, more important bottlenecks that cannot be circumvented most of the time. If there is a relevant speed boost, then only by skipping even the reader/writer interface and intentionally not supporting specific environments. As a reference, this is how a type encodes/decodes currently:
I am currently getting an issue with the following:

syntax = "proto3";
import "google/protobuf/descriptor.proto";
//-------------------------------------------------------------------------------
extend google.protobuf.MessageOptions
{
    // The string name of the message to extend
    string extends = 21000;
    // When set, the message is only available on the server
    // (probably used for server-to-server communication)
    boolean is_internal = 21001;
}

The error is:
Are extensions not ready yet? I use custom options for messages, services, and methods. I also tried to use options without first defining them in an extension and got a different error:
Any idea when options/extensions will make it in?
There has been an issue with parsing extend fields with no rule. This should be fixed now. Side note: added code generation directly into reflection. Nearly as fast as native JSON for a simple test case identical to that of the smart guy with the idea.
I now also added a graceful fallback for when the environment throws on code generation for whatever reason. This adds some size to the library, but it's surely worth it. Still to do: MapField#encode/decode. Now, the reasons that it's still slower than JSON are probably that there is still some checking going on and that we have the reader/writer abstraction. Ideas?
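The graceful-fallback pattern could look roughly like this. It's a sketch under the assumption that the failure mode is `new Function` throwing (e.g. under a Content Security Policy without 'unsafe-eval'), not the library's actual code; `makeGetter` and its arguments are invented for the example:

```javascript
// Hypothetical: try to generate a specialized function at runtime,
// and fall back to a generic closure if code generation is forbidden.
function makeGetter(fieldName) {
  var generic = function (values) { // always-available fallback
    return values[fieldName];
  };
  try {
    // new Function compiles a fresh, monomorphic accessor; this throws
    // in environments that disallow dynamic code evaluation.
    return new Function(
      "values",
      "return values[" + JSON.stringify(fieldName) + "];"
    );
  } catch (e) {
    return generic;
  }
}
```

Both paths behave identically, so callers never need to know which one they got; the generated path merely gives the JIT a simpler function to optimize.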
Still getting an error, though a different one now.
This is thrown from here:

NamespacePrototype.add = function add(object) {
    if (!object || nestedTypes.indexOf(object.constructor) < 0)
        throw util._TypeError("object", nestedError);
Ah, currently it doesn't accept fields declared within namespaces, which is valid for extension fields, I assume. Will fix that soon. Edit: should be working now.
Here is the profile for the bench.js run as
Profiled a bit and optimized writer forking. Also added codegen for Type#decode. Run the benchmark again, guys.
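Writer forking is what lets length-delimited submessages be written without knowing their byte length in advance: fork() starts a sub-buffer, and ldelim() prefixes it with its varint length and merges it back. A minimal, hypothetical sketch of the idea (plain arrays instead of typed arrays, and nothing like the library's optimized implementation):

```javascript
// Minimal illustration of writer forking for length-delimited fields.
function MiniWriter() {
  this.bytes = [];
  this.stack = []; // suspended outer buffers
}

MiniWriter.prototype.uint32 = function (value) { // varint encoding
  while (value > 127) {
    this.bytes.push((value & 127) | 128); // low 7 bits + continuation bit
    value >>>= 7;
  }
  this.bytes.push(value);
  return this;
};

MiniWriter.prototype.fork = function () {
  this.stack.push(this.bytes); // suspend the outer buffer
  this.bytes = [];             // start writing into a fresh sub-buffer
  return this;
};

MiniWriter.prototype.ldelim = function () {
  var inner = this.bytes;
  this.bytes = this.stack.pop();           // restore the outer buffer
  this.uint32(inner.length);               // write the length prefix
  this.bytes = this.bytes.concat(inner);   // then the sub-buffer itself
  return this;
};
```

Because every submessage implies a fork/ldelim pair (or a second sizing pass), this path is hot in encode benchmarks, which is presumably why optimizing it helped.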
Now working on single-function decoders. That should speed things up even more and could be reused for codegen of custom classes.

function _Package$decode($resolvedTypes, $toHash, reader, message, limit) {
"use strict";
while (reader.pos < limit) {
var tag = reader.tag();
switch (tag.id) {
case 1:
message.$values["name"] = reader.string();
break;
case 2:
message.$values["version"] = reader.string();
break;
case 3:
message.$values["description"] = reader.string();
break;
case 4:
message.$values["author"] = reader.string();
break;
case 5:
message.$values["license"] = reader.string();
break;
case 6:
var type = $resolvedTypes[5];
message.$values["repository"] = type.decodeDelimited_(reader, type._constructor ? new type._constructor() : Object.create(type.prototype));
break;
case 7:
message.$values["bugs"] = reader.string();
break;
case 8:
message.$values["main"] = reader.string();
break;
case 9:
var length = reader.uint32(), map = {};
if (length) {
length += reader.pos;
var keys = [], values = [], ki = 0, vi = 0;
while (reader.pos < length) {
tag = reader.tag();
if (tag.id === 1)
keys[ki++] = reader.string();
else if (tag.id === 2)
values[vi++] = reader.string();
else
throw Error('illegal wire format for .Package.bin');
}
var key;
for (ki = 0; ki < vi; ++ki)
map[typeof (key = keys[ki]) === 'object' ? $toHash(key) : key] = values[ki];
}
message.$values["bin"] = map;
break;
case 10:
var length = reader.uint32(), map = {};
if (length) {
length += reader.pos;
var keys = [], values = [], ki = 0, vi = 0;
while (reader.pos < length) {
tag = reader.tag();
if (tag.id === 1)
keys[ki++] = reader.string();
else if (tag.id === 2)
values[vi++] = reader.string();
else
throw Error('illegal wire format for .Package.scripts');
}
var key;
for (ki = 0; ki < vi; ++ki)
map[typeof (key = keys[ki]) === 'object' ? $toHash(key) : key] = values[ki];
}
message.$values["scripts"] = map;
break;
case 11:
var length = reader.uint32(), map = {};
if (length) {
length += reader.pos;
var keys = [], values = [], ki = 0, vi = 0;
while (reader.pos < length) {
tag = reader.tag();
if (tag.id === 1)
keys[ki++] = reader.string();
else if (tag.id === 2)
values[vi++] = reader.string();
else
throw Error('illegal wire format for .Package.dependencies');
}
var key;
for (ki = 0; ki < vi; ++ki)
map[typeof (key = keys[ki]) === 'object' ? $toHash(key) : key] = values[ki];
}
message.$values["dependencies"] = map;
break;
case 12:
var length = reader.uint32(), map = {};
if (length) {
length += reader.pos;
var keys = [], values = [], ki = 0, vi = 0;
while (reader.pos < length) {
tag = reader.tag();
if (tag.id === 1)
keys[ki++] = reader.string();
else if (tag.id === 2)
values[vi++] = reader.string();
else
throw Error('illegal wire format for .Package.optionalDependencies');
}
var key;
for (ki = 0; ki < vi; ++ki)
map[typeof (key = keys[ki]) === 'object' ? $toHash(key) : key] = values[ki];
}
message.$values["optionalDependencies"] = map;
break;
case 13:
var length = reader.uint32(), map = {};
if (length) {
length += reader.pos;
var keys = [], values = [], ki = 0, vi = 0;
while (reader.pos < length) {
tag = reader.tag();
if (tag.id === 1)
keys[ki++] = reader.string();
else if (tag.id === 2)
values[vi++] = reader.string();
else
throw Error('illegal wire format for .Package.devDependencies');
}
var key;
for (ki = 0; ki < vi; ++ki)
map[typeof (key = keys[ki]) === 'object' ? $toHash(key) : key] = values[ki];
}
message.$values["devDependencies"] = map;
break;
case 14:
var length = reader.uint32(), map = {};
if (length) {
length += reader.pos;
var keys = [], values = [], ki = 0, vi = 0;
while (reader.pos < length) {
tag = reader.tag();
if (tag.id === 1)
keys[ki++] = reader.string();
else if (tag.id === 2)
values[vi++] = reader.bool();
else
throw Error('illegal wire format for .Package.browser');
}
var key;
for (ki = 0; ki < vi; ++ki)
map[typeof (key = keys[ki]) === 'object' ? $toHash(key) : key] = values[ki];
}
message.$values["browser"] = map;
break;
case 15:
message.$values["types"] = reader.string();
break;
default:
reader.skipType(tag.wireType);
break;
}
}
return message;
}
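The generated decoder above assumes a reader abstraction with tag() and uint32(). A minimal sketch of how such methods might decode protobuf varints and field tags over a byte array (hypothetical and simplified, not the library's actual reader):

```javascript
// Minimal reader: varint decoding plus tag splitting (id + wire type).
function MiniReader(bytes) {
  this.buf = bytes;
  this.pos = 0;
  this.len = bytes.length;
}

MiniReader.prototype.uint32 = function () { // varint decoding
  var value = 0, shift = 0, byte;
  do {
    byte = this.buf[this.pos++];
    value |= (byte & 127) << shift; // accumulate 7 bits per byte
    shift += 7;
  } while (byte & 128);             // continuation bit set: keep reading
  return value >>> 0;
};

MiniReader.prototype.tag = function () {
  var value = this.uint32();
  // A tag packs the field id in the upper bits and the wire type in
  // the lowest three, which is what the switch(tag.id) above relies on.
  return { id: value >>> 3, wireType: value & 7 };
};
```

The unknown-field `default:` branch in the decoder then only needs `wireType` to know how many bytes to skip.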
Woot! The benchmarks show about a 2x speedup for decode and about 1.25x for encode over native JSON. That's glorious! Once the library is more feature-complete with extensions and oneofs, I have some large, deeply nested binary-serialized data from our production setup that I would love to benchmark.
Theoretically, oneofs and extensions are already in, but there are not enough test cases yet to make sure it's working.
In current head (45d3e73) I am getting this error using extensions:

Using this protobuf:

syntax = "proto3";
import "google/protobuf/descriptor.proto";
//-------------------------------------------------------------------------------
extend google.protobuf.MessageOptions
{
boolean some_option = 21000;
}
extend google.protobuf.MethodOptions
{
boolean some_option = 21000;
}
extend google.protobuf.ServiceOptions
{
boolean some_option = 21000;
}
//-------------------------------------------------------------------------------
message Something {
option (some_option) = true;
}
service MyService {
option (some_option) = true;
rpc update (Something) returns (Something) {
option (some_option) = false;
}
}

Extension fields seem to be ending up on Root, but I do have the same-named extension field on different options.
As far as I know, every extension field is referenced by its unique name within the namespace it is declared in. Are you sure that this is valid?
Interesting, seems like protoc also throws for the duplicate extensions. I could've sworn that I've done this before, though (I definitely did with v5). Thanks!
Well, now that I changed that, it is giving me an error importing

This compiles:

syntax = "proto3";
// import "google/protobuf/descriptor.proto";
//-------------------------------------------------------------------------------
extend google.protobuf.MessageOptions
{
bool some_option1 = 21000;
}
extend google.protobuf.MethodOptions
{
bool some_option2 = 21000;
}
extend google.protobuf.ServiceOptions
{
bool some_option3 = 21000;
}

But this does not:

syntax = "proto3";
import "google/protobuf/descriptor.proto";
//-------------------------------------------------------------------------------
extend google.protobuf.MessageOptions
{
bool some_option1 = 21000;
}
extend google.protobuf.MethodOptions
{
bool some_option2 = 21000;
}
extend google.protobuf.ServiceOptions
{
bool some_option3 = 21000;
}

With error:
Additionally, this does not compile:

syntax = "proto3";
// import "google/protobuf/descriptor.proto";
//-------------------------------------------------------------------------------
extend google.protobuf.MessageOptions
{
bool some_option1 = 21000;
}
extend google.protobuf.MethodOptions
{
bool some_option2 = 21000;
}
extend google.protobuf.ServiceOptions
{
bool some_option3 = 21000;
}
//-------------------------------------------------------------------------------
message Something {
option (some_option1) = true;
}
service MyService {
option (some_option3) = true;
rpc update (Something) returns (Something) {
option (some_option2) = false;
}
}

With error:
Note that the final proto, with the google import enabled, works with protoc.
The google import still requires the descriptor file to be present. The internal types are google.protobuf.Any etc., but they don't contain the descriptor. Looks to me like Google is also moving away from the descriptor, as there are now types, wrappers etc. present outside of it, too. Attempted to fix the parse error with the last commit.
Well, protoc fails without

With latest head I still get an error:
The object is of type
Yep, should also be fixed now. Regarding descriptor.proto: just saying that you still need this externally, as it is not included (besides Any etc., which is included).
Awesome, it is parsing now. I'll let you know if anything else comes up, thanks!
Looking good, guys. I already do a significant Proto -> JS + TypeScript codegen step in all of my stuff, like here: https://github.com/paralin/grpc-bus/blob/master/scripts/gen_proto.bash I think I will try to adopt v6 in my development projects and see if I can get some kind of TypeScript interface generation working well. I shouldn't have to do much, assuming I can still reflect over everything like before.
Alright guys, I finally decided not to go for any hidden properties on message instances. Instead, every message instance now also is a normal object with non-enumerable getters and setters for virtual oneof fields and plain default values on the prototype, plus the usual non-enumerable
This also means that you have to take care when setting fields that are part of a oneof, because if you set multiple, the encoder won't know which one to use. In such a case, it's now recommended to set the virtual oneof field to the present field's name, which as a side effect clears the other fields.
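A simplified illustration of that layout, with made-up names (`Msg`, `which`): plain default values live on the prototype, while the virtual oneof accessor is non-enumerable, so instances enumerate like normal objects. The real virtual setter also clears the other oneof members, which is omitted here for brevity:

```javascript
// Defaults on the prototype; virtual field as a non-enumerable accessor.
function Msg() {}
Msg.prototype.name = "";  // default value, shared via the prototype
Msg.prototype.count = 0;

Object.defineProperty(Msg.prototype, "which", {
  enumerable: false,      // invisible to for..in / Object.keys / stringify
  get: function () { return this._which; },
  set: function (fieldName) {
    // Store the backing value non-enumerably as well.
    Object.defineProperty(this, "_which", {
      value: fieldName, configurable: true, writable: true
    });
  }
});

var m = new Msg();
m.name = "hello";         // own, enumerable property
m.which = "name";         // virtual field: does not show up in enumeration
```

Only fields that were actually set become own enumerable properties, so the instance still looks like a plain data object to consumers.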
Added a better benchmark test case and it turned out as expected. More precisely, it now shows slower encoding but faster decoding than native JSON. If you have any ideas how to speed up encoding a tiny bit more, let me know!
One more wishlist item: please add a method toJSONObject() that directly converts a proto to a JSON object without needing to encodeJSON and then do JSON.parse.
What do you think of this one?
That works!

+ function bench_protobuf_asjson() {
+ function TestMessage(properties) {
+ protobuf.Prototype.call(this, properties);
+ }
+ protobuf.inherits(TestMessage, Test);
+ var testPb = new TestMessage(data);
+
+ var start = Date.now();
+ for (var i = 0; i < times; ++i) {
+ var json = testPb.asJSON({ long: Number, enum: String });
+ }
+ summarize("encode protobuf." + "js asJson(Number, String)", start, '');
+ var start = Date.now();
+ for (var i = 0; i < times; ++i) {
+ var json = testPb.asJSON({ long: Number, enum: Number });
+ }
+ summarize("encode protobuf." + "js asJson(Number, Number)", start, '');
+ console.log();
+ var start = Date.now();
+ for (var i = 0; i < times; ++i) {
+ var json = testPb.asJSON({ long: String, enum: Number });
+ }
+ summarize("encode protobuf." + "js asJson(String, Number)", start, '');
+ console.log();
+ var start = Date.now();
+ for (var i = 0; i < times; ++i) {
+ var json = testPb.asJSON({ long: String, enum: String });
+ }
+ summarize("encode protobuf." + "js asJson(String, String)", start, '');
+ console.log();
+ }
+
Note that

What exactly are you trying to accomplish there again?
Ugh. I was trying 4 different variations (number longs, string longs, number enums and string enums); will update the benchmark.
Updated the benchmark (above); performance is similar. It just shows that decode + asJSON is faster than JSON.parse, which is a no-brainer for anyone who wishes to keep their data as JSON while using protobuf for transport.
asJSON basically just makes sure that you get an object that is properly JSON.stringifyable. If you do not intend to stringify it, you can just access the properties on the message without calling asJSON at all. Every message also is an object with all the fields as properties.
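For intuition, here is a rough sketch of what the `{ long, enum }` option pairs used in the benchmark above could mean, reduced to a single hand-written message shape. Everything here is hypothetical (BigInt stands in for Long, and the `Status` enum is invented), not the library's implementation:

```javascript
// Hypothetical enum tables for the example message.
var Status = { OK: 0, FAILED: 1 };
var StatusById = ["OK", "FAILED"];

// Convert a message to a plain, JSON.stringify-able object, mapping
// longs and enums according to the requested target types.
function asJSON(msg, options) {
  return {
    id: options.long === String
      ? msg.id.toString()  // exact: 64-bit values survive as strings
      : Number(msg.id),    // may lose precision beyond 2^53
    status: options.enum === String
      ? StatusById[msg.status] // enum name instead of number
      : msg.status
  };
}
```

This is why `asJSON({ long: String, ... })` is the safe default for transport, while the Number variants are cheaper when values are known to be small.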
Yes, that works in many parts of our app that are purely in JS. The backstory of the trouble we have is

So to work around these things, we had to keep the redux state in pure JSON and convert the proto before it is set on the redux store.
Please also take this into account if you consider contributing.
Key points:
Element#verifyValue and similar stuff
Why?
Will create a new branch once I have it working to some extent. Looking forward to reading your ideas, especially on code generation.