Skip to content

Wip: Add new lint "char_slices" #6529

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -1759,6 +1759,7 @@ Released 2018-09-13
[`cast_ref_to_mut`]: https://rust-lang.github.io/rust-clippy/master/index.html#cast_ref_to_mut
[`cast_sign_loss`]: https://rust-lang.github.io/rust-clippy/master/index.html#cast_sign_loss
[`char_lit_as_u8`]: https://rust-lang.github.io/rust-clippy/master/index.html#char_lit_as_u8
[`char_slices`]: https://rust-lang.github.io/rust-clippy/master/index.html#char_slices
[`chars_last_cmp`]: https://rust-lang.github.io/rust-clippy/master/index.html#chars_last_cmp
[`chars_next_cmp`]: https://rust-lang.github.io/rust-clippy/master/index.html#chars_next_cmp
[`checked_conversions`]: https://rust-lang.github.io/rust-clippy/master/index.html#checked_conversions
Expand Down
2 changes: 2 additions & 0 deletions clippy_lints/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -901,6 +901,7 @@ pub fn register_plugins(store: &mut rustc_lint::LintStore, sess: &Session, conf:
&types::CAST_REF_TO_MUT,
&types::CAST_SIGN_LOSS,
&types::CHAR_LIT_AS_U8,
&types::CHAR_SLICES,
&types::FN_TO_NUMERIC_CAST,
&types::FN_TO_NUMERIC_CAST_WITH_TRUNCATION,
&types::IMPLICIT_HASHER,
Expand Down Expand Up @@ -1336,6 +1337,7 @@ pub fn register_plugins(store: &mut rustc_lint::LintStore, sess: &Session, conf:
LintId::of(&types::CAST_PRECISION_LOSS),
LintId::of(&types::CAST_PTR_ALIGNMENT),
LintId::of(&types::CAST_SIGN_LOSS),
LintId::of(&types::CHAR_SLICES),
LintId::of(&types::IMPLICIT_HASHER),
LintId::of(&types::INVALID_UPCAST_COMPARISONS),
LintId::of(&types::LET_UNIT_VALUE),
Expand Down
67 changes: 66 additions & 1 deletion clippy_lints/src/types.rs
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,34 @@ declare_clippy_lint! {
"usage of `Box<Vec<T>>`, vector elements are already on the heap"
}

declare_clippy_lint! {
/// **What it does:** Checks for use of `&[char]` anywhere in the code.
///
/// **Why is this bad?** `char` represents a Unicode codepoint, not a
/// character. Usually users of `&[char]` want O(1) indexing on characters,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs should also say something about how to index on characters (e.g. unicode_segmentation?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the issue, the correct way to index on characters is to use &[&str]. I can try to make that more clear in the documentation if that is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not talking about the type but the code.

/// not codepoints.
///
/// **Known problems:** None.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dealing with FFI to other high-level languages where strings are represented by sequences of chars, the single correct way is to use &[char] or Vec<char>. This should be listed under "Known problems".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as we understand it, char should not be used at FFI boundaries. consider u32/i32 instead.

(from what we understand & and &mut shouldn't be used at FFI boundaries. basically any type that is rust ABI shouldn't be used at FFI boundaries?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found another counter-example (though nightly only): std::str::pattern::Pattern is implemented on &[char] (which basically means any char in the slice matches the pattern). There may be other use cases where having chars equidistantly distributed in memory may be of the essence. I feel the Problems section should reflect this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vec<char> and &[char] is also a completely reasonable way to represent UTF32 encoded strings.

///
/// **Example:**
/// ```rust,ignore
/// struct X {
/// chars: &[char]
/// }
/// ```
///
/// Better:
///
/// ```rust,ignore
/// struct X {
/// chars: &[&str]
/// }
/// ```
pub CHAR_SLICES,
pedantic,
"usage of `&[char]`, usually you want to index characters instead of codepoints"
}

declare_clippy_lint! {
/// **What it does:** Checks for use of `Vec<Box<T>>` where T: Sized anywhere in the code.
/// Check the [Box documentation](https://doc.rust-lang.org/std/boxed/index.html) for more information.
Expand Down Expand Up @@ -252,7 +280,7 @@ pub struct Types {
vec_box_size_threshold: u64,
}

impl_lint_pass!(Types => [BOX_VEC, VEC_BOX, OPTION_OPTION, LINKEDLIST, BORROWED_BOX, REDUNDANT_ALLOCATION, RC_BUFFER]);
impl_lint_pass!(Types => [BOX_VEC, VEC_BOX, CHAR_SLICES, OPTION_OPTION, LINKEDLIST, BORROWED_BOX, REDUNDANT_ALLOCATION, RC_BUFFER]);

impl<'tcx> LateLintPass<'tcx> for Types {
fn check_fn(&mut self, cx: &LateContext<'_>, _: FnKind<'_>, decl: &FnDecl<'_>, _: &Body<'_>, _: Span, id: HirId) {
Expand Down Expand Up @@ -581,6 +609,24 @@ impl Types {
"a `VecDeque` might work",
);
return; // don't recurse into the type
} else if let TyKind::Slice(s) = hir_ty.kind {
if let TyKind::Path(ref qpath) = s.kind {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably be more readable using an if_chain!. See elsewhere in the code for how to use it.

if !is_local {
let hir_id = hir_ty.hir_id;
let res = qpath_res(cx, qpath, hir_id);
if let Some(def_id) = res.opt_def_id() {
if Some(def_id) == cx.tcx.lang_items().char_impl() {
span_lint(
cx,
CHAR_SLICES,
hir_ty.span,
"consider using `&[&str]` instead of `&[char]`",
);
return; // don't recurse into the type
}
}
}
}
}
}
match *qpath {
Expand Down Expand Up @@ -714,6 +760,25 @@ impl Types {
};
self.check_ty(cx, &mut_ty.ty, is_local);
},
TyKind::Slice(ref s) => {
if let TyKind::Path(ref qpath) = s.kind {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the same as L614..629. Perhaps factor out a method?

if !is_local {
let hir_id = hir_ty.hir_id;
let res = qpath_res(cx, qpath, hir_id);
if let Some(def_id) = res.opt_def_id() {
if Some(def_id) == cx.tcx.lang_items().char_impl() {
span_lint(
cx,
CHAR_SLICES,
hir_ty.span,
"consider using `&[&str]` instead of `&[char]`",
);
return; // don't recurse into the type
}
}
}
}
},
_ => self.check_ty(cx, &mut_ty.ty, is_local),
}
}
Expand Down
15 changes: 15 additions & 0 deletions tests/ui/char_slices.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
#![warn(clippy::char_slices)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing the corresponding char_slices.stderr file. If you run the tests, you should get a message telling you how to "bless" the result of the test, and then you can commit the file. This will likely also make CI pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because it currently does not produce any output in stderr. I think something is wrong but I have no idea what it is. If you have any ideas they would be very welcome. (This is only my second PR to clippy, and my first was relatively easy because it was almost exactly the same as three other lints).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... What I usually do in this case is to comment out lint conditions or move the lint out of other checks until output appears. Then I may reorder the checks until I've found the one that precludes linting and then I try to understand why it doesn't match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@llogiq I think I found the problem. res.opt_def_id() is never equal to cx.tcx.lang_items().char_impl(). What's another or better way to see if the inner type of a slice is char?

Copy link
Contributor

@giraffate giraffate Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO how about just using rustc_middle::ty::Char to check a type in slice? I guess the hir_ty_to_ty method can convert from rustc_hir::Ty to ty::Ty.


fn main() {
let char_slice: &[char] = &['a', 'b'];
takes_char_slice(char_slice);
let char_slice2 = returns_char_slice();
}

fn takes_char_slice(char_slice: &[char]) {
println!("{:?}", char_slice)
}

fn returns_char_slice() -> &'static [char] {
&['c', 'd']
}