mhchem.js update - first version - request for discussion #1414
First of all, thanks so much for your work on this so far. It is great to have the MathJax and LaTeX versions more in line. I'm also glad that the MathJax usage has inspired you to take some use cases back to the LaTeX version as well. I am not worried about the differences like staggered indices. I think we can just mention that in the change log, and it will be OK. Can you make a more complete list of the differences that would affect the use of mhchem? We will need that for the change log, if nothing else. I understand the desire to enforce your own view of correct spacing, but I'm not certain I would be quite so dogmatic about that. But it is your product, and if the LaTeX version enforces spacing rules, the MathJax version should as well. It might be nice to have a backward-compatibility mode, but we could always make the old mhchem available in the 3rd-party repository for those who want to continue using the previous version. I would like this new version to become the core version when it is ready. I do have some comments on the code, which I will post separately below.
My biggest concern about the code has to do with efficiency, in particular in the It seems to me that your situation is tailor-made for prototypal inheritance for the methods and static copies of the structures, rather than dynamically creating them over and over. You are using closures in some instances, but only to get access to If you don't want to create prototypes yourself, you can use MathJax's object model itself. E.g., you could do
There are similar things happening in a number of the service routines, like in I think it would be good to reorganize the code so that the structures are created once at compile time, rather than over and over at run time.
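To make the suggestion concrete, here is a minimal sketch using a plain constructor and prototype, with the constant tables built once at load time. `Parser`, `patterns`, and `patternOrder` are hypothetical names, not code from mhchem.js, and the sketch does not use MathJax's own object model, which the comment also mentions.

```javascript
// Hypothetical sketch: Parser, patterns and patternOrder are illustrative
// names, not identifiers from mhchem.js.

// Constant tables: built exactly once, when the script is loaded.
var patterns = {
  digits: /^[0-9]+/,
  letters: /^[A-Za-z]+/
};
var patternOrder = ["digits", "letters"];

// The constructor stores only per-parse state.
function Parser(input) {
  this.input = input;
  this.pos = 0;
  this.tokens = [];
}

// Shared by every instance through the prototype chain.
Parser.prototype.patterns = patterns;

Parser.prototype.next = function () {
  var rest = this.input.slice(this.pos);
  for (var i = 0, n = patternOrder.length; i < n; i++) {
    var name = patternOrder[i];
    var m = this.patterns[name].exec(rest);
    if (m) {
      this.pos += m[0].length;
      return { type: name, text: m[0] };
    }
  }
  // Anything else is consumed as a single-character token.
  this.pos += 1;
  return { type: "other", text: rest.charAt(0) };
};

Parser.prototype.parse = function () {
  while (this.pos < this.input.length) {
    this.tokens.push(this.next());
  }
  return this.tokens;
};
```

Because `next` and `parse` live on `Parser.prototype`, every instance shares one copy of each function, and the regular expressions are compiled only once rather than on every parse.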
Because the "Files changed" tab doesn't show the diff (since it is so large), I can't comment on individual lines directly, so I'll link them here. Because object and array lookups are slower than variable lookups, it is more efficient to cache the results of object and array lookups in local variables if they are used more than once. For example, in line 97, you could use
and similarly for the other loops.
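A minimal sketch of the caching pattern, assuming a hypothetical array of rule objects (the actual code at line 97 is not reproduced here):

```javascript
// Hypothetical sketch: "rules" and its fields are illustrative only.
function countMatches(rules, text) {
  var count = 0;
  // Read rules.length once instead of on every iteration.
  for (var i = 0, n = rules.length; i < n; i++) {
    // Look rules[i] up once; the local variable is then used twice.
    var rule = rules[i];
    if (rule.enabled && rule.pattern.test(text)) {
      count++;
    }
  }
  return count;
}
```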
The lookup loops nested four layers deep seem a bit inefficient to me. In particular, something like lines 106 to 108 seem like they could be handled via direct lookup in an object rather than scanning through an array item by item, and doing a string search on the inner-most loop. Something like
where instead of
you would have
This approach will find the correct action immediately without having to loop through the options. The I suspect that some of the other loops could be handled similarly.
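The direct-lookup idea can be sketched as follows. The transition data, state names, and token types are hypothetical; the sketch only contrasts an item-by-item scan with a nested-object table that is built once at load time.

```javascript
// Hypothetical data: state names and token types are illustrative only.
// Array form: every lookup scans the list and does a string search.
var transitionList = [
  { states: "0|a", tokens: "digit|letter", action: "push" },
  { states: "b", tokens: "space", action: "flush" }
];

function findActionByScan(state, token) {
  for (var i = 0, n = transitionList.length; i < n; i++) {
    var t = transitionList[i];
    if (t.states.split("|").indexOf(state) !== -1 &&
        t.tokens.split("|").indexOf(token) !== -1) {
      return t.action;
    }
  }
  return null;
}

// Object form: a table keyed by state, then by token, built once at load time.
// Object.create(null) avoids inherited keys such as "toString".
var transitionTable = Object.create(null);
(function build() {
  for (var i = 0, n = transitionList.length; i < n; i++) {
    var t = transitionList[i];
    var states = t.states.split("|");
    var tokens = t.tokens.split("|");
    for (var s = 0; s < states.length; s++) {
      var row = transitionTable[states[s]] ||
        (transitionTable[states[s]] = Object.create(null));
      for (var k = 0; k < tokens.length; k++) {
        // First matching entry wins, mirroring the order of the linear scan.
        if (!(tokens[k] in row)) row[tokens[k]] = t.action;
      }
    }
  }
})();

function findActionByLookup(state, token) {
  var row = transitionTable[state];
  return row && token in row ? row[token] : null;
}
```

Both functions return the same action for the same inputs, but the lookup version does two property accesses instead of nested loops and string searches.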
For non-ASCII characters, like those in line 184 and line 186, you should use Unicode character references rather than explicit characters, which depend on the encoding. The encoding of the scripts loaded into a page depends on the encoding of the page itself, and we don't control that, so it is safer to use numeric references, as in
Similarly, in the error messages (like line 311), you should use
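For illustration, a small table that writes its output characters as `\u` escapes. The specific arrows are an assumption, since the characters actually used at lines 184, 186, and 311 are not shown here; `isAsciiOnly` is a hypothetical helper for checking a source string.

```javascript
// Illustrative table, not taken from mhchem.js: the output characters are
// written as \u escapes, so this source file is pure ASCII and is read the
// same way whatever encoding the including page declares.
var arrowChars = {
  "->": "\u2192",  // U+2192 RIGHTWARDS ARROW
  "<-": "\u2190",  // U+2190 LEFTWARDS ARROW
  "<->": "\u2194"  // U+2194 LEFT RIGHT ARROW
};

// Hypothetical helper: true if the text contains only ASCII characters.
function isAsciiOnly(source) {
  return /^[\x00-\x7F]*$/.test(source);
}
```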
For things like the definitions of the various structures (like
While this doesn't seem like much, those 25 characters, repeated over and over, add up to several hundred characters saved. That does help reduce the size of the final
The
to make it work in both situations.
All of that really makes sense. Thanks! I would have to learn how to do prototypal inheritance in a way that is compatible with IE6. But, ... In fact, I only instantiate a new parser to keep a local 'state' and 'buffer'. I could switch to good old procedural programming and just pass a newly created state/buffer object each time. That would solve the issue of instantiating constant objects all the time, wouldn't it? Regarding
I think we're willing to drop IE prior to IE9 at this point. In any case, the
That also works.
As long as the objects are not created within the routines you are calling, yes. That is, if they are all created outside the
That is fine. I was trying to make a version that could be used as an object literal rather than needing the extra variables like Thanks for being responsive to my comments. It is always traumatic to have someone else review your code.
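The procedural approach discussed above (a fresh state/buffer object per call, with constant tables created outside the routines that use them) might look like the following minimal sketch. `charClasses`, `parse`, `step`, and `flush` are hypothetical names.

```javascript
// Hypothetical sketch: charClasses, parse, step and flush are illustrative.

// Created once at load time and only read afterwards.
var charClasses = {
  isDigit: /^[0-9]$/
};

// A fresh state object per call replaces a per-call parser instance.
function parse(input) {
  var state = { mode: "start", buffer: "", output: [] };
  for (var i = 0, n = input.length; i < n; i++) {
    step(state, input.charAt(i));
  }
  flush(state);
  return state.output;
}

// Switch between "digits" and "text" runs, flushing the buffer on each change.
function step(state, ch) {
  var mode = charClasses.isDigit.test(ch) ? "digits" : "text";
  if (mode !== state.mode) {
    flush(state);
    state.mode = mode;
  }
  state.buffer += ch;
}

function flush(state) {
  if (state.buffer !== "") {
    state.output.push({ type: state.mode, text: state.buffer });
    state.buffer = "";
  }
}
```

Only the small `state` object is allocated per call; the functions and `charClasses` exist once for the lifetime of the page.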
The update might take me a while. (Do not worry, it wasn't traumatic. That's not the reason for my slow pace. Well, one thing: I was so proud to have my code IE6-compatible, and now you say it wasn't necessary. I'll get over it! 😄)
No problem. I'm happy your version worked with IE6, and I suspect you will be able to do that for the update as well.
I incorporated all of your suggestions.
This is the change log.
I have only taken a quick look, but I like what you have done. This looks much nicer, and the speed improvement is great to see. I'll give it a closer look next week, but my initial pass didn't spot anything, so I don't expect there to be any major issues. Thanks for your hard work on this!
OK, I've gotten to go through your code more carefully, and I like what I see overall. I only have a few comments about performance. There is one fairly important change that (for the one fairly complicated expression that I tested) reduced the parsing time by 70 to 80%. That has to do with the definition of The other changes have much smaller impacts, but are still useful, and they are in the main loop in the
and then use Another minor improvement is to cache the
and similarly for the loop at line 143. The only other thing that disturbs me is the repetition of the code from lines 147-154 in lines 159-166. One solution would be to move that into a separate function that can be called in both places. Another would be to do the following:
There is a slight performance penalty to doing the test for Other than that, I think you are in good shape. Thanks for all your work on this.
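One way to remove the duplication is to move the repeated block into a helper called from both places. In this hypothetical sketch, `emitToken` stands in for the repeated code; the real lines 147-154 and 159-166 are not reproduced here.

```javascript
// Hypothetical sketch: emitToken stands in for the duplicated block.

// Patterns are created once, outside the loop.
var DIGITS = /^[0-9]+/;
var ELEMENT = /^[A-Z][a-z]?/;

// The block that used to be written out twice now lives in one helper.
function emitToken(state, type, text) {
  state.tokens.push({ type: type, text: text });
  state.pos += text.length;
}

function tokenize(input) {
  var state = { pos: 0, tokens: [] };
  while (state.pos < input.length) {
    var rest = input.slice(state.pos);
    var m = DIGITS.exec(rest);
    if (m) {
      emitToken(state, "digits", m[0]);      // first call site
    } else if ((m = ELEMENT.exec(rest))) {
      emitToken(state, "element", m[0]);     // second call site
    } else {
      state.pos += 1;                        // skip anything else
    }
  }
  return state.tokens;
}
```

The extra function call costs a little, but any future fix to the shared block then has to be made in only one place.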
Question: I notice that you have removed
Question: Your change from requiring
@pkra, we need to decide how we want to handle the changes from the old to the new version. Do we want to simply replace the current version with this new one? Do we want to keep two versions for now? Do we want to move the old one to the contrib CDN and take this one? Do we want to move BOTH to the contrib CDN and not have it in the core at all? If we keep two versions in core, how should they be named? Should
PS: some of the breaking changes include the change from
I already have a fix for There are a few inconsistencies that I'd rather not support like Regarding A push with your code suggestions will follow later.
- support for exotic input like ^{12}_6 C
- improved adaptive spacing before macros (2 \color{red}{H2O}, 2\color{red}{H2O})
- support for amounts like 2$ H2O (2n H, 2nH, n$ H, n, 2$ H, 2)
- recognize space-comma as comma
Thanks for the additional information on the inconsistencies. It sounds like you have StackExchange covered. As long as we provide a means of continuing to use the old version (even if that takes a configuration change), I think that should be sufficient for the others. @pkra, do you have any feelings on that?
@dpvc wrote
Right. I don't know the new version yet, so maybe it is better to discuss this at the next F2F?
@pkra wrote
I don't know all the differences, either. The initial post includes a discussion of these. @mhchem, could you say what edits you had to make to the chemistry StackExchange site? Was it just changing the misuses of The one that concerns me is that
Oh, I do not have a record of what I changed at StackExchange. In our first conversation you were quite relaxed regarding incompatible changes, so I did not keep fine-grained documentation. Most of the edits were definitely about misuse, i.e., using it for non-chemical things. Some were rare unintended uses, like a Unicode character that no one else ever used. Some were bugs of the old version, e.g. And some were real incompatibilities like All my edits are listed at http://chemistry.stackexchange.com/users/24052/mhchem?tab=activity&sort=all (often, it's hard to really see the changes unless you switch to the markdown view). At the beginning, I made some edits for things I thought would become incompatibilities, but I later decided to support this kind of input. Do you think it is necessary for me to revisit all my edits and categorize them?
No, sorry, I just thought you might remember what the main issues were. If most were incorrect usage, then I'm not concerned. I just thought it would help us decide how to proceed if we knew the nature of the things that might go wrong when the new version is in place.
Just to be clear, I don't think you need to go back through your edits to categorize them.
The performance improvements from your suggestions are quite impressive. Thanks! (Regarding the 'patterns' object, I just changed the indentation. Surprisingly, the diff algorithm cannot detect that.)
Your changes look good. I think it is ready to go (though you could still cache
Superseded by mathjax/MathJax-third-party-extensions#28
And also #1537
This is a "request for discussion" pull-request, because there is still some fine-tuning I'd like to do.
I am the inventor and author of the LaTeX package mhchem. MathJax includes an mhchem implementation that, over time, got a little bit disconnected from the LaTeX package. Having talked to dpvc and pkra, I'd like to take over maintenance of MathJax/mhchem and bring it up-to-date.
It resulted in a complete rewrite of the core part. This is my first take. Please discuss.
There are some inconsistencies. The most notable change is the switch to staggered charges, which has become the standard in recent years (IUPAC, many publishers). But there are a few other differences on close inspection. For testing and comparison, I downloaded 43.5k usages of \ce from chemistry.stackexchange.com. I went through 2k of them manually and checked the rest by script. A few hundred of these would break visually, and I have already fixed them on StackExchange. (Most of them were mere misuse of \ce, using it for non-chemical stuff.)
One thing that still bothers me is the reaction arrows. I consider it an important feature of mhchem to have very readable input, not just nice output. Therefore, I would like to enforce a space before and after a reaction arrow. What keeps me from doing this is the 500 or so StackExchange posts that I would need to edit. Still thinking. I don't know of any other large consumer of mhchem and would like to get it just right before others, like Wikipedia, catch on.
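To illustrate what such a rule would reject, here is a hypothetical checker (not part of mhchem) that reports reaction arrows without whitespace on both sides. The arrow list covers the common mhchem arrows but may be incomplete, and an optional bracketed label such as `->[H2O]` is treated as part of the arrow.

```javascript
// Hypothetical checker, not part of mhchem. Longer arrows come first in the
// alternation so that "<-->" is not read as "<-" followed by "->", etc.
// Up to two bracketed labels after the arrow count as part of the arrow.
var ARROW = /(<-->|<=>>|<<=>|<=>|<->|->|<-)((?:\[[^\]]*\]){0,2})/g;

function findUnspacedArrows(input) {
  var problems = [];
  var m;
  ARROW.lastIndex = 0; // the regex is global and shared, so reset it first
  while ((m = ARROW.exec(input)) !== null) {
    var start = m.index;
    var end = start + m[0].length;
    // The start and end of the input also count as acceptable boundaries.
    var before = start === 0 || /\s/.test(input.charAt(start - 1));
    var after = end === input.length || /\s/.test(input.charAt(end));
    if (!before || !after) {
      problems.push(m[1]);
    }
  }
  return problems;
}
```

For example, `2H2 + O2 -> 2H2O` passes, while `A->B` and `A <=>B` would be flagged.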
As for this discussion, please comment on the code (quality, compatibility, style). Please discuss whether this should (with some incompatibilities!) replace the current mhchem extension or if this should become a 3rd-party extension.
BTW, it was not only a one-way street of bringing LaTeX/mhchem features to MathJax/mhchem. I also have a list of 10 or so use cases that I saw people using with MathJax that I want to backport to LaTeX.