
user_agent plugin failed parsing fields in Filebeat CI tests #48318


Closed
kaiyan-sheng opened this issue Oct 21, 2019 · 8 comments
Labels: :Data Management/Ingest Node, Team:Data Management

Comments

@kaiyan-sheng

kaiyan-sheng commented Oct 21, 2019

Beats CI Filebeat tests have started failing because of a change in the user_agent plugin or its regexes, specifically in the user_agent.os.name, user_agent.version, and user_agent.name fields.

For example, the original user agent string Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.3; WOW64; Trident/7.0; .NET4.0E; .NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729) is parsed as user_agent.version=11.0, even though 11.0 appears nowhere in the original message. Please see https://travis-ci.org/elastic/beats/jobs/600418441#L2296 for more details.

More user agent lines that failed:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0
Wget/1.13.4 (linux-gnu)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:49.0) Gecko/20100101 Firefox/49.0

More unit tests probably need to be added, especially for Windows-related user agents: https://github.com/elastic/elasticsearch/blob/master/modules/ingest-user-agent/src/test/java/org/elasticsearch/ingest/useragent/UserAgentProcessorTests.java#L93
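
The observation above, that a parsed version appears nowhere in the original string, can be turned into a rough sanity check for test fixtures. This is only a sketch: `version_in_original` is a hypothetical helper, not part of Beats or Elasticsearch, and a failed check flags a case for review rather than proving a parsing bug.

```python
def version_in_original(original: str, version: str) -> bool:
    """Heuristic: does the parsed version's major.minor prefix occur in the raw UA string?"""
    # Drop empty components so a trailing dot ("50.0.") does not affect the prefix.
    parts = [p for p in version.split(".") if p]
    # Compare only major.minor; a more detailed parsed version still matches.
    prefix = ".".join(parts[:2])
    return bool(prefix) and prefix in original
```

Run against the expected fields of each fixture, a check like this would single out the MSIE 7.0 / 11.0 case while letting the Firefox and Wget lines pass.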

kaiyan-sheng added the :Data Management/Ingest Node label Oct 21, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

@kaiyan-sheng
Author

I'm currently working on disabling the specific checks in Beats elastic/beats#14179.

@jsoriano
Member

The differences in expectations in the Beats test files can be found here: https://github.com/elastic/beats/pull/14190/files

Most cases fall into the categories below. I think the first two may be worth investigating.

  • Dots added after the version:
-        "user_agent.version": "50.0"
+        "user_agent.version": "50.0."
  • Some versions changed number:
         "user_agent.original": "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.3; WOW64; Trident/7.0; .NET4.0E; .NET4.0C; .NET CLR 3.5.30729; .NET CLR 2.0.50727; .NET CLR 3.0.30729)",
-        "user_agent.os.name": "Windows 8.1",
-        "user_agent.version": "7.0"
+        "user_agent.os.full": "Windows 8.1",
+        "user_agent.os.name": "Windows",
+        "user_agent.os.version": "8.1",
+        "user_agent.version": "11.0"
  • More details in some user agent (they look good):
-        "user_agent.os.name": "Windows 7",
+        "user_agent.os.full": "Windows 7",
+        "user_agent.os.name": "Windows",
+        "user_agent.os.version": "7",
  • More detailed versions (they look good too):
-        "user_agent.version": "54.0.2840"
+        "user_agent.version": "54.0.2840.98"
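
The version-related categories in the list above can be told apart mechanically when reviewing a large fixture diff. The following is a minimal sketch; the function name and category labels are made up for illustration and do not come from the Beats test suite.

```python
def classify_version_change(old: str, new: str) -> str:
    """Label how an expected user_agent.version changed between two fixture revisions."""
    if new == old:
        return "unchanged"
    # Check the trailing-dot case first: "50.0." also starts with "50.0.".
    if new == old + ".":
        return "trailing_dot"
    # The old value is a strict dotted prefix, e.g. 54.0.2840 -> 54.0.2840.98.
    if new.startswith(old + "."):
        return "more_detailed"
    return "changed_number"
```

With the values quoted above, `("50.0", "50.0.")` is labelled `trailing_dot`, `("54.0.2840", "54.0.2840.98")` is `more_detailed`, and `("7.0", "11.0")` is `changed_number`.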

@jakelandis
Contributor

@jsoriano @kaiyan-sheng - It looks like #47807 is the catalyst here. We use regexes from https://github.com/ua-parser/uap-core, which were recently updated. I'm hesitant to roll back the changes introduced there since UA strings are a moving target and we should move with them. If you find errors in the parsing we should contribute that back upstream and update our parser config.

Maybe the Beats tests need to be updated with more modern UA strings?

cc @spinscale

@adriansr

adriansr commented Oct 24, 2019

The one with the extra dot at the end of the version number is Firefox:

        "user_agent.original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:50.0) Gecko/20100101 Firefox/50.0",
-       "user_agent.version": "50.0"
+       "user_agent.version": "50.0."
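
The cause of the trailing dot is not established in this thread. One general way such output can arise is when version components are joined with "." and one of them is an empty string rather than missing. The sketch below shows that failure mode and a defensive join; `join_version` is a hypothetical helper and is not taken from the Elasticsearch processor.

```python
from typing import Optional


def join_version(*parts: Optional[str]) -> str:
    """Join major/minor/patch components, stopping at the first missing or empty one."""
    kept = []
    for part in parts:
        # Treat None and "" the same way so an empty capture cannot add a dangling dot.
        if part is None or part == "":
            break
        kept.append(part)
    return ".".join(kept)


# A naive join keeps the empty component and produces the dangling dot.
naive = ".".join(["50", "0", ""])
safe = join_version("50", "0", "")
```

Here `naive` ends with a dot while `safe` does not, which matches the difference between the two expected values shown above.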

@jsoriano
Member

jsoriano commented Oct 24, 2019

> I'm hesitant to roll back the changes introduced there since UA strings are a moving target and we should move with them. If you find errors in the parsing we should contribute that back upstream and update our parser config.

Agree, I wouldn't roll back the changes. But I wonder if the new values for these Windows/IE versions (from 7.0 to 11.0) are expected.
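
A possible explanation for the 7.0 to 11.0 change, not confirmed anywhere in this thread: Trident/7.0 is the engine that shipped with Internet Explorer 11, and IE 11 in compatibility view reports an older MSIE token. A parser that prefers the Trident token would therefore report 11.0 for such strings. The sketch below illustrates that mapping; the table and function are illustrative assumptions, not the actual uap-core rules.

```python
import re
from typing import Optional

# Trident engine version -> Internet Explorer release it shipped with.
TRIDENT_TO_IE = {"4.0": "8.0", "5.0": "9.0", "6.0": "10.0", "7.0": "11.0"}
_TRIDENT = re.compile(r"Trident/(\d+\.\d+)")


def ie_version_from_trident(user_agent: str) -> Optional[str]:
    """Return the IE version implied by the Trident token, or None if there is none."""
    match = _TRIDENT.search(user_agent)
    if match is None:
        return None
    return TRIDENT_TO_IE.get(match.group(1))
```

Under this reading, the MSIE 7.0 string with Trident/7.0 maps to 11.0, while the MSIE 8.0 string with Trident/4.0 still maps to 8.0.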

> Maybe the Beats tests need to be updated with more modern UA strings?

Yep, we have done that by now (elastic/beats#14190).

> The one with the extra dot at the end of the version number is Firefox:

But is the dot at the end expected?

@adriansr

> But is the dot at the end expected?

I guess not; it doesn't make much sense. I was just adding an example of what looks like another error.

@dakrone
Member

dakrone commented May 17, 2024

This has been open for quite a while, and we haven't made much progress on this due to focus in other areas. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

dakrone closed this as not planned May 17, 2024