Explain UTF-8 BOM rule in readme #1640

Open
mk-pmb opened this issue Nov 18, 2017 · 12 comments

@mk-pmb

mk-pmb commented Nov 18, 2017

Most style decisions are explained in the readme, but I couldn't find the reasoning on why a BOM is considered bad. Where I've looked:

  • the patch that enabled it: it has no commit message body.
  • the current code of the rules file:
    // require or disallow the Unicode Byte Order Mark
    // https://eslint.org/docs/rules/unicode-bom
    'unicode-bom': ['error', 'never'],
  • the eslint rule page mentioned in the code comment above: it doesn't claim a BOM is bad.
  • searched the readme for "BOM", "unicode" and "byte order"
  • searched issue tracker for "BOM", "unicode" and "byte order"

Could someone explain it, or add search keywords to make the explanation easier to find?

Update: Also, is there a recommendation on how to declare the file encoding instead? I searched the readme for "charset", "encod" and "character set" but found no matches.

@ljharb
Collaborator

ljharb commented Nov 18, 2017

The rule page says "UTF-8 does not require a BOM because byte ordering does not matter when characters are a single byte. Since UTF-8 is the dominant encoding of the web, we make "never" the default option." - the reasoning for choosing "never" is because files should always only be in UTF-8.

Are you running into issues with this rule?
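
For readers unfamiliar with what the rule actually checks: a UTF-8 BOM is the code point U+FEFF encoded as the three bytes EF BB BF at the very start of a file. The following is only a minimal sketch of how one might detect it in Node.js; `hasUtf8Bom` and the sample buffers are hypothetical names for illustration and not part of ESLint or the Airbnb config.

```javascript
// The UTF-8 encoding of U+FEFF (the byte order mark) is EF BB BF.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: returns true if the buffer starts with a UTF-8 BOM.
function hasUtf8Bom(buffer) {
  return buffer.length >= 3 && buffer.subarray(0, 3).equals(UTF8_BOM);
}

// Two in-memory "files" with identical source text, one prefixed with a BOM.
const sourceText = Buffer.from('const x = 1;\n', 'utf8');
const fileWithBom = Buffer.concat([UTF8_BOM, sourceText]);
const fileWithoutBom = sourceText;

console.log(hasUtf8Bom(fileWithBom));    // true
console.log(hasUtf8Bom(fileWithoutBom)); // false
```

With 'unicode-bom' set to 'never', a file like `fileWithBom` is what the rule would report.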

@mk-pmb
Author

mk-pmb commented Nov 19, 2017

"Since UTF-8 is the dominant encoding of the web, we make "never" the default option."

Yes, that's what I found there as well. It's ok as a default for eslint. I thought the patch might repeat this setting because there was a stronger reason for "never".
In my projects I like BOMs because Firefox trusts the BOM more than it trusts the Content-Type header, so the BOM protects my files' encoding when they are served from webspace that announces another charset (e.g. a legacy website). It's also helpful on webspace that doesn't send any charset info (e.g. python -m SimpleHTTPServer), or not even a Content-Type (file:// access).

To reproduce, run Python 2.7.6 and Firefox 57 on Ubuntu trusty with system locale en_US.UTF-8. Python's SimpleHTTPServer sends just "text/plain" without a charset, probably because it doesn't consider itself authoritative to guess more details. Both this way and when loading via file://, without a BOM, Firefox guesses "windows-1252" and consequently garbles umlauts and emoji.
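
The garbling described above can be illustrated without a browser. This is a rough sketch under one assumption: Node's 'latin1' decoding stands in for windows-1252, which is safe for this particular example because the two encodings only differ in the byte range 0x80 to 0x9F, and the bytes used here (C3 BC) lie outside it. It does not reproduce Firefox's detection logic, only the effect of decoding UTF-8 bytes with the wrong charset.

```javascript
// "ü" (U+00FC) is encoded in UTF-8 as the two bytes C3 BC.
const utf8Bytes = Buffer.from('\u00fc', 'utf8');

// Decoding with the right charset gives back the original character.
const decodedAsUtf8 = utf8Bytes.toString('utf8');

// Decoding the same bytes one byte per character (latin1, used here as a
// stand-in for windows-1252) yields two unrelated characters instead.
const decodedAsLatin1 = utf8Bytes.toString('latin1');

console.log(utf8Bytes.toString('hex')); // c3bc
console.log(decodedAsUtf8);             // ü
console.log(decodedAsLatin1);           // Ã¼
```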

@ljharb
Collaborator

ljharb commented Nov 19, 2017

In general, this config repeats all the defaults explicitly.

I'd say that your solution should be to use a build process to auto-insert the BOM for you, rather than encoding that directly in the file.

Separately, lots of web features are broken on file://, so it shouldn't be used for any reason ever anyways.
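
One possible shape for such a build step, as a sketch only: keep the source files BOM-free so they satisfy the lint rule, and prepend the BOM to the built output. `ensureUtf8Bom` is a hypothetical helper, not an existing tool or plugin; a real build would read the buffer from the output file and write the result back.

```javascript
// UTF-8 byte order mark: U+FEFF encoded as EF BB BF.
const BOM_BYTES = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical build helper: returns the content with exactly one leading
// BOM, leaving content that already starts with a BOM unchanged.
function ensureUtf8Bom(content) {
  const startsWithBom =
    content.length >= 3 && content.subarray(0, 3).equals(BOM_BYTES);
  return startsWithBom ? content : Buffer.concat([BOM_BYTES, content]);
}

// Stand-in for a built file's content (an in-memory buffer for illustration).
const builtOutput = Buffer.from('console.log("ok");\n', 'utf8');
const published = ensureUtf8Bom(builtOutput);

console.log(published.subarray(0, 3).toString('hex')); // efbbbf
console.log(ensureUtf8Bom(published).length === published.length); // true
```

Making the helper idempotent means rerunning the build does not stack multiple BOMs on the same file.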

@mk-pmb
Author

mk-pmb commented Nov 19, 2017

"your solution should be to use a build process"

I could imagine an argument for separation of concerns: The transport and compatibility issues should be solved by some other mechanism because the code files should only be concerned with behavior.
Are there other reasons to suggest a build process in general, independent of project details?

@ljharb
Collaborator

ljharb commented Nov 19, 2017

Modern web dev requires a build process anyways (for minification, babel, etc) - it has for years, and it will for the foreseeable future.

@mk-pmb
Author

mk-pmb commented Nov 19, 2017

I think those reasons are worth mentioning in the style guide. How about this?
"Your projects should use a build process so you can easily plug in a linter, transpiler, minifier etc. Dealing with encoding issues in the source files (e.g. a UTF-8 BOM to indicate Unicode) is thus a code smell for a lack of tooling."

Update: Changed the "ing"s to "er"s to match the search keywords.

@ljharb
Collaborator

ljharb commented Nov 19, 2017

I guess that's fine; almost nobody runs into this because almost everyone uses tools that assume UTF-8. Want to send a PR?

@mk-pmb
Author

mk-pmb commented Nov 19, 2017

ok, PR coming up later.

@galvarez421

@ljharb
Collaborator

ljharb commented Sep 19, 2019

Your SO link contains a link to a vscode extension that fixes the vscode bug.

@galvarez421

In my case, it has been easier to disable the rule, so that other developers working on projects that use the config don't have to install an extension or otherwise configure their tools just to satisfy the rule (granted, there may be reasons to keep the rule enabled, but I haven't run into them). I mostly mentioned the Visual Studio case in response to your question to the OP ("Are you running into issues with this rule?") and in case it's worth considering, given the popularity of Visual Studio.

I agree with @mk-pmb's suggestion that the documentation should explain why exactly BOM is disallowed. Given that the only options for the rule are "always" or "never", it's clear why the default is "never", given your explanation and the explanation in the rule page. However, I don't think it's clear why the Airbnb config enables the rule as an error. The rule page says that "UTF-8 does not require a BOM" but it's not clear why that should translate to the BOM being disallowed.
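
Disabling the rule in a project that extends the Airbnb config might look roughly like the sketch below. The `extends` value 'airbnb-base' is an assumption about which variant of the config the project uses, and `eslintConfig` is a hypothetical variable name; only the 'unicode-bom' rule name comes from the rules file quoted at the top of this issue.

```javascript
// Sketch of a project-level ESLint config that turns the rule off.
const eslintConfig = {
  // Assumption: the project extends the base variant of the Airbnb config.
  extends: ['airbnb-base'],
  rules: {
    // Overrides the inherited ['error', 'never'] setting, so files are
    // accepted with or without a BOM.
    'unicode-bom': 'off',
  },
};

// In a real .eslintrc.js this object would be exported, e.g.:
// module.exports = eslintConfig;
console.log(eslintConfig.rules['unicode-bom']); // off
```

Projects that want to require a BOM instead could set the rule to ['error', 'always'], the other option the rule supports.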

@mk-pmb
Author

mk-pmb commented Sep 19, 2019

See my PR #1643 for potential reasons.

j0pgrm mentioned this issue Aug 26, 2022