Ignore sentence terminators inside quotes when applying the 'BeginDocumentationCommentWithOneLineSummary' option. #687

Merged
@@ -11,6 +11,9 @@
 //===----------------------------------------------------------------------===//

 import Foundation
+#if os(macOS)
+import NaturalLanguage
Member

Please add #if os(macOS) around this import to match where it's used, since it won't exist on Linux.

Member Author

Oh, I missed it.

+#endif
 import SwiftSyntax

 /// All documentation comments must begin with a one-line summary of the declaration.
@@ -125,13 +128,31 @@ public final class BeginDocumentationCommentWithOneLineSummary: SyntaxLintRule {
     }

     var sentences = [String]()
+    var tags = [NLTag]()
     var tokenRanges = [Range<String.Index>]()
-    let tags = text.linguisticTags(
+
+    let tagger = NLTagger(tagSchemes: [.lexicalClass])
+    tagger.string = text
+    tagger.enumerateTags(
       in: text.startIndex..<text.endIndex,
-      scheme: NSLinguisticTagScheme.lexicalClass.rawValue,
-      tokenRanges: &tokenRanges)
+      unit: .word,
+      scheme: .lexicalClass
+    ) { tag, range in
+      if let tag {
+        tags.append(tag)
+        tokenRanges.append(range)
+      }
+      return true
+    }
+
+    var isInsideQuotes = false
     let sentenceTerminatorIndices = tags.enumerated().filter {
-      $0.element == "SentenceTerminator"
+      if $0.element == NLTag.openQuote {
+        isInsideQuotes = true
+      } else if $0.element == NLTag.closeQuote {
+        isInsideQuotes = false
Member

I was going to point out that this won't handle nested quotes correctly, but it looks like the Foundation API itself doesn't categorize them correctly either:

func tags(_ text: String) {
  var tokenRanges = [Range<String.Index>]()
  let tags = text.linguisticTags(in: text.startIndex..<text.endIndex, scheme: NSLinguisticTagScheme.lexicalClass.rawValue, tokenRanges: &tokenRanges)
  print(Array(tags))
}
tags("\"Hello 'world'\"")
// ["OpenQuote", "Interjection", "Whitespace", "OpenQuote", "Noun", "CloseQuote", "OpenQuote"]

So an attempt to increment/decrement a nested quote count won't work as intended. In other words, I wanted to point it out but you don't need to change anything.

Member Author

Yes, this is entirely dependent on the behavior of the Foundation API.
And since linguisticTags is a deprecated API, it's unlikely that we'll see any improvements to it 🤔

The replacement, NaturalLanguage, works the same as linguisticTags, but I've migrated to it in case it's improved in the future.
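
For reference, a minimal sketch of per-token tagging with `NLTagger`, guarded by the same `#if os(macOS)` condition the import uses in this PR. `lexicalTags(of:)` is a hypothetical helper, not part of the rule, and no expected output is shown because the tags depend on the framework version.

```swift
#if os(macOS)
import NaturalLanguage
#endif

/// Returns each token of `text` paired with the raw value of its lexical-class tag,
/// or `nil` on platforms where NaturalLanguage is not imported.
/// (`lexicalTags(of:)` is a hypothetical helper for illustration only.)
func lexicalTags(of text: String) -> [(token: String, tag: String)]? {
  #if os(macOS)
  var result = [(token: String, tag: String)]()
  let tagger = NLTagger(tagSchemes: [.lexicalClass])
  tagger.string = text
  tagger.enumerateTags(
    in: text.startIndex..<text.endIndex,
    unit: .word,
    scheme: .lexicalClass
  ) { tag, range in
    // Tokens without a tag are skipped, as in the diff.
    if let tag {
      result.append((token: String(text[range]), tag: tag.rawValue))
    }
    return true  // keep enumerating
  }
  return result
  #else
  return nil
  #endif
}

// The same input the reviewer tried with the deprecated API.
if let tagged = lexicalTags(of: "\"Hello 'world'\"") {
  for (token, tag) in tagged { print(token, tag) }
} else {
  print("NaturalLanguage is unavailable; a caller would fall back to string splitting.")
}
```

Returning `nil` rather than an empty array lets a caller distinguish "no tagger on this platform" from "no tokens", which is one way to decide when to use the non-linguistic fallback.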

+      }
+      return !isInsideQuotes && $0.element == NLTag.sentenceTerminator
     }.map {
       tokenRanges[$0.offset].lowerBound
     }
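
The quote-tracking filter in this hunk can be tried without NaturalLanguage. The sketch below is an illustration only: `Tag` and `terminatorOffsetsOutsideQuotes` are hypothetical stand-ins, and the first tag sequence is hand-written rather than real tagger output. The second sequence is the one quoted in the review comment, with a terminator appended.

```swift
// Stand-in for the few NLTag values the filter inspects. `Tag` is a hypothetical
// type so the sketch runs without the NaturalLanguage framework.
enum Tag {
  case openQuote, closeQuote, sentenceTerminator, other
}

/// Returns the offsets of sentence terminators that are not enclosed in quotes,
/// tracking quotes with a single Boolean as the diff does.
func terminatorOffsetsOutsideQuotes(_ tags: [Tag]) -> [Int] {
  var isInsideQuotes = false
  var offsets = [Int]()
  for (offset, tag) in tags.enumerated() {
    switch tag {
    case .openQuote:
      isInsideQuotes = true
    case .closeQuote:
      isInsideQuotes = false
    case .sentenceTerminator where !isInsideQuotes:
      offsets.append(offset)
    default:
      break
    }
  }
  return offsets
}

// Hand-written tags (an assumption, not tagger output) for:
//   question "was there an error?" in O(1).
let quoted: [Tag] = [
  .other, .openQuote, .other, .other, .other, .other, .sentenceTerminator,
  .closeQuote, .other, .other, .sentenceTerminator,
]
print(terminatorOffsetsOutsideQuotes(quoted))  // prints [10]

// The tag sequence the review comment reports for "Hello 'world'" (quotes included),
// with a terminator appended. The last quote is tagged as an opening quote, so the
// filter still considers itself inside quotes and drops the terminator.
let nested: [Tag] = [
  .openQuote, .other, .other, .openQuote, .other, .closeQuote, .openQuote,
  .sentenceTerminator,
]
print(terminatorOffsetsOutsideQuotes(nested))  // prints []
```

With that tagging, a nesting counter would fare no better than the Boolean: the count would go 1, 2, 1, 2 and never return to zero, which matches the reviewer's point that no change is needed here.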
@@ -152,8 +173,8 @@ public final class BeginDocumentationCommentWithOneLineSummary: SyntaxLintRule {
   /// Returns the best approximation of sentences in the given text using string splitting around
   /// periods that are followed by spaces.
   ///
-  /// This method is a fallback for platforms (like Linux, currently) where `String` does not
-  /// support `NSLinguisticTagger` and its related APIs. It will fail to catch certain kinds of
+  /// This method is a fallback for platforms (like Linux, currently) that does not
+  /// support `NaturalLanguage` and its related APIs. It will fail to catch certain kinds of
   /// sentences (such as those containing abbreviations that are followed by a period, like "Dr.")
   /// that the more advanced API can handle.
   private func nonLinguisticSentenceApproximations(in text: String) -> (
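
The body of the fallback is not shown in this diff. As a rough sketch of the idea its doc comment describes (ending a sentence at each period that is followed by a space), and of the "Dr." limitation it mentions, something like the following would behave that way. `approximateSentences(in:)` is a hypothetical helper and is not the rule's actual implementation.

```swift
/// Hypothetical helper: ends a sentence at every period that is followed by a space.
func approximateSentences(in text: String) -> [String] {
  var sentences = [String]()
  var current = ""
  for character in text {
    if character == " " && current.last == "." {
      // A period followed by a space closes the current sentence.
      sentences.append(current)
      current = ""
    } else {
      current.append(character)
    }
  }
  // Whatever remains after the last split is the final (possibly unterminated) sentence.
  if !current.isEmpty {
    sentences.append(current)
  }
  return sentences
}

print(approximateSentences(in: "Returns a value. Never throws."))
// ["Returns a value.", "Never throws."]

// The abbreviation is mistaken for the end of a sentence.
print(approximateSentences(in: "Written by Dr. Smith."))
// ["Written by Dr.", "Smith."]
```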
@@ -139,4 +139,34 @@ final class BeginDocumentationCommentWithOneLineSummaryTests: LintOrFormatRuleTe
     )
     #endif
   }
+
+  func testSentenceTerminationInsideQuotes() {
+    assertLint(
+      BeginDocumentationCommentWithOneLineSummary.self,
+      """
+      /// Creates an instance with the same raw value as `x` failing iff `x.kind != Subject.kind`.
+      struct TestBackTick {}
+
+      /// A set of `Diagnostic` that can answer the question ‘was there an error?’ in O(1).
+      struct TestSingleSmartQuotes {}
+
+      /// A set of `Diagnostic` that can answer the question 'was there an error?' in O(1).
+      struct TestSingleStraightQuotes {}
+
+      /// A set of `Diagnostic` that can answer the question “was there an error?” in O(1).
Member

Both this example and the one below use smart quotes. Can you add one with regular ASCII (straight) double quotes?

Member Author

Sure, I'll add it!

+      struct TestDoubleSmartQuotes {}
+
+      /// A set of `Diagnostic` that can answer the question "was there an error?" in O(1).
+      struct TestDoubleStraightQuotes {}
+
+      /// A set of `Diagnostic` that can answer the question “was there
+      /// an error?” in O(1).
+      struct TestTwoLinesDoubleSmartQuotes {}
+
+      /// A set of `Diagnostic` that can answer the question "was there
+      /// an error?" in O(1).
+      struct TestTwoLinesDoubleStraightQuotes {}
+      """
+    )
+  }
 }