Skip to content

Two approaches for clipping detection: clipping_score and clipping_peaks #25

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 41 commits into from
Mar 2, 2023

Conversation

claudiodsf
Copy link
Member

@claudiodsf claudiodsf commented Feb 23, 2023

This PR adds two approaches for clipping detection:

  1. Checking for the presence of peaks in the first and last few bins of the kernel density instead of only the presence of more than one peak to identify clipping (by @krisvanneste)
  2. Computing a clipping score based on the shape of the kernel density

Description of the clipping score approach:

The algorithm is based on the following steps:

  1. The trace is detrended and demeaned. Optionally, the trace baseline can be removed.

  2. A kernel density estimation is performed on the trace amplitude values.

  3. Two weighted kernel density functions are computed:
    - a full weighted kernel density, where the kernel density is weighted by the distance from the zero mean value, using a 8th order power function between 1 and 100.
    - a weighted kernel density without the central peak, where the kernel density is weighted by the distance from the zero mean value, using a 8th order power function between 0 and 100.
    In both cases, the weight gives more importance to samples far from the zero mean value. In the second case, the central peak is ignored.

  4. The score, ranging from 0 to 100, is the sum of the squared weighted kernel density without the central peak, normalized by the sum of the squared full weighted kernel density. The score is 0 if there is no additional peak beyond the central peak.

@claudiodsf claudiodsf mentioned this pull request Feb 23, 2023
@claudiodsf
Copy link
Member Author

Here's the same example discussed in this comment, but with the approach proposed here:

WI ABD 00 HHE_score

@krisvanneste
Copy link
Collaborator

Claudio,
Thank you!
I will look at your new approach and test it on my examples in the next few days.

@claudiodsf
Copy link
Member Author

I finally decided to add a remove_baseline config parameter.
This parameters has two effects:

  1. removes the trace baseline after instrument correction and before filtering
  2. removes the baseline on the raw trace when computing the clipping score

@krisvanneste
Copy link
Collaborator

Claudio,

I tested your new clip detection method on all my events that contain 1 or more clipped stations. I copied the output below.
I should study the results in more detail, but at first sight they are again mixed:

  • the results obtained with remove_baseline=True are sometimes very bad (but not always)
  • it's not possible to define a clipping score that will unambiguously separate clipped records from unclipped ones.

I noticed you use savgol_filter when remove_baseline=True. What is it doing exactly? From the name of the option, I supposed it would be detrending, but I'm not sure this is what it does. I also apply savgol_filter in my code to determine inflection points in the signal, which is quite different from removing a baseline. Is it possible to show its effect on the debug trace plot?

Note: in the output below, stations marked with * are actually clipped. For each event and phase, I also report the minimum clipping score obtained for stations that are actually clipped, and the maximum clipping score for stations that are not clipped. Ideally, the first one should be higher than the second.

Event #987, P-window, remove_baseline=False:
BE.AUL*: 88.3732
BE.BEBN*: 96.2525
BE.BOU: 18.619
BE.BRQ*: 97.5209
BE.CLA*: 97.0883
BE.DOU: 58.8019
BE.HU1: 47.6094
BE.HUM*: 90.1658
BE.LCH: 64.4281
BE.LES: 14.9836
BE.MEMS*: 92.8565
BE.RQR: 38.7125
BE.SNF*: 95.9347
BE.STI*: 95.6706
BE.UCCS: 47.1577
BE.VIA: 37.5718
BE.WIB: 45.8856
BE.WLF: 83.9605
BE.ZEV: 15.2854
Min. clipping score (clipped): 88.3732
Max. clipping score (unclipped): 83.9605

Event #987, P-window, remove_baseline=True:
BE.AUL*: 19.0193
BE.BEBN*: 92.5741
BE.BOU: 5.83271
BE.BRQ*: 72.3471
BE.CLA*: 50.8514
BE.DOU: 9.1741
BE.HU1: 9.98621
BE.HUM*: 72.0887
BE.LCH: 9.49827
BE.LES: 10.3977
BE.MEMS*: 85.7856
BE.RQR: 3.76731
BE.SNF*: 81.2068
BE.STI*: 95.2516
BE.UCCS: 12.3157
BE.VIA: 45.0971
BE.WIB: 6.65135
BE.WLF: 7.40844
BE.ZEV: 4.43014
Min. clipping score (clipped): 19.0193
Max. clipping score (unclipped): 45.0971

Event #987, S-window, remove_baseline=False:
BE.AUL*: 94.743
BE.BEBN*: 98.2564
BE.BOU: 55.4312
BE.BRQ*: 96.5371
BE.CLA*: 94.8211
BE.DOU*: 92.7378
BE.HU1: 50.9763
BE.HUM*: 94.259
BE.LCH*: 97.0442
BE.LES: 30.2255
BE.MEMS*: 96.4927
BE.RQR: 92.2359
BE.SNF*: 94.4072
BE.STI*: 95.7813
BE.UCCS: 15.7692
BE.VIA: 53.8325
BE.WIB: 33.2392
BE.WLF: 89.1979
BE.ZEV*: 92.5521
Min. clipping score (clipped): 92.5521
Max. clipping score (unclipped): 92.2359

Event #987, S-window, remove_baseline=True:
BE.AUL*: 91.2837
BE.BEBN*: 96.0008
BE.BOU: 3.98126
BE.BRQ*: 89.3783
BE.CLA*: 89.5306
BE.DOU*: 82.0006
BE.HU1: 10.9608
BE.HUM*: 88.5418
BE.LCH*: 86.2451
BE.LES: 1.43242
BE.MEMS*: 76.9253
BE.RQR: 48.1142
BE.SNF*: 90.0213
BE.STI*: 91.0632
BE.UCCS: 12.5697
BE.VIA: 1.51434
BE.WIB: 3.00229
BE.WLF: 4.64262
BE.ZEV*: 74.5816
Min. clipping score (clipped): 74.5816
Max. clipping score (unclipped): 48.1142

Event #1108, P-window, remove_baseline=False:
BE.BEBN: 60.6561
BE.BOU*: 99.3243
BE.BRQ*: 94.8579
BE.CLA: 56.0279
BE.DOU: 11.2556
BE.HUM: 8.31139
BE.KLB: 28.3073
BE.LES: 24.3511
BE.MEMB: 44.4219
BE.MEMS: 47.7056
BE.MEU: 19.799
BE.PLT: 27.8138
BE.ROB: 16.3324
BE.RQR: 37.2983
BE.SKQ: 8.84353
BE.SNF: 33.74
BE.STI: 15.1359
BE.UCCS: 15.938
BE.VIA: 12.9943
BE.WIB: 41.7449
BE.WLF: 32.805
BE.WOR: 20.7935
Min. clipping score (clipped): 94.8579
Max. clipping score (unclipped): 60.6561

Event #1108, P-window, remove_baseline=True:
BE.BEBN: 43.0158
BE.BOU*: 90.5954
BE.BRQ*: 71.313
BE.CLA: 54.7753
BE.DOU: 7.62242
BE.HUM: 5.52316
BE.KLB: 17.0524
BE.LES: 10.9834
BE.MEMB: 24.5975
BE.MEMS: 25.6547
BE.MEU: 12.6893
BE.PLT: 33.9151
BE.ROB: 11.8982
BE.RQR: 18.4199
BE.SKQ: 5.5804
BE.SNF: 11.9689
BE.STI: 9.28007
BE.UCCS: 12.8329
BE.VIA: 11.7355
BE.WIB: 43.4826
BE.WLF: 21.9869
BE.WOR: 15.2888
Min. clipping score (clipped): 71.313
Max. clipping score (unclipped): 54.7753

Event #1108, S-window, remove_baseline=False:
BE.BEBN: 61.4968
BE.BOU*: 96.5337
BE.BRQ*: 95.0996
BE.CLA: 41.7513
BE.DOU: 42.8922
BE.HUM: 7.54886
BE.KLB: 51.1515
BE.LES*: 82.8771
BE.MEMB: 80.0432
BE.MEMS: 78.2439
BE.MEU: 53.6323
BE.PLT: 39.0317
BE.ROB: 18.8713
BE.RQR*: 76.9263
BE.SKQ: 6.41371
BE.SNF*: 94.2083
BE.STI: 72.1128
BE.UCCS: 53.9521
BE.VIA: 29.3854
BE.WIB: 20.1383
BE.WLF: 24.4955
BE.WOR: 20.4675
Min. clipping score (clipped): 76.9263
Max. clipping score (unclipped): 80.0432

Event #1108, S-window, remove_baseline=True:
BE.BEBN: 69.5796
BE.BOU*: 93.2722
BE.BRQ*: 95.3383
BE.CLA: 21.2446
BE.DOU: 35.8785
BE.HUM: 4.59051
BE.KLB: 45.5904
BE.LES*: 72.7496
BE.MEMB: 1.70158
BE.MEMS: 3.6086
BE.MEU: 14.1515
BE.PLT: 36.3825
BE.ROB: 4.63713
BE.RQR*: 77.0104
BE.SKQ: 6.19097
BE.SNF*: 89.8524
BE.STI: 37.3391
BE.UCCS: 46.5831
BE.VIA: 19.7998
BE.WIB: 31.8073
BE.WLF: 3.67065
BE.WOR: 20.4312
Min. clipping score (clipped): 72.7496
Max. clipping score (unclipped): 69.5796

Event #1306, P-window, remove_baseline=False:
BE.AUL: 28.4384
BE.BEBN*: 99.0062
BE.BOU: 38.832
BE.CLA: 17.753
BE.CTH: 35.6364
BE.DOU: 47.2163
BE.GES: 18.537
BE.KLB: 22.8059
BE.LCH: 26.4853
BE.MASA: 41.6329
BE.MEMB*: 98.9157
BE.MEMS: 25.9393
BE.RCHB: 61.096
BE.RQR: 36.5128
BE.SKQ: 35.6804
BE.SNF: 21.2604
BE.STI*: 97.2425
BE.STWA: 69.9696
BE.UCCB: 54.3369
BE.WLF: 43.7346
Min. clipping score (clipped): 97.2425
Max. clipping score (unclipped): 69.9696

Event #1306, P-window, remove_baseline=True:
BE.AUL: 2.84993
BE.BEBN*: 14.623
BE.BOU: 11.5592
BE.CLA: 7.50623
BE.CTH: 3.89984
BE.DOU: 25.5506
BE.GES: 5.23492
BE.KLB: 15.1978
BE.LCH: 1.90361
BE.MASA: 17.4023
BE.MEMB*: 71.5343
BE.MEMS: 3.91574
BE.RCHB: 4.95394
BE.RQR: 19.2441
BE.SKQ: 10.001
BE.SNF: 21.7939
BE.STI*: 7.08743
BE.STWA: 55.3211
BE.UCCB: 59.2099
BE.WLF: 6.77923
Min. clipping score (clipped): 7.08743
Max. clipping score (unclipped): 59.2099

Event #1306, S-window, remove_baseline=False:
BE.AUL: 51.1673
BE.BEBN*: 96.4862
BE.BOU: 64.057
BE.BREA: 6.72344
BE.CLA: 27.953
BE.CLHA: 60.3496
BE.CTH: 16.4319
BE.DOU: 38.7155
BE.GES: 40.6198
BE.KLB: 18.9279
BE.LCH*: 98.4636
BE.LLVA*: 53.2694
BE.MASA: 21.5947
BE.MEMB*: 96.7753
BE.MEMS: 69.0191
BE.RCHB*: 78.0686
BE.RQR: 40.2522
BE.SKQ: 11.3034
BE.SNF: 78.3879
BE.STI*: 97.6516
BE.STWA: 18.5656
BE.UCCB: 37.5371
BE.WLF: 77.8152
NL.HGN: 43.7527
Min. clipping score (clipped): 53.2694
Max. clipping score (unclipped): 78.3879

Event #1306, S-window, remove_baseline=True:
BE.AUL: 4.03008
BE.BEBN*: 94.3027
BE.BOU: 3.2213
BE.BREA: 4.54774
BE.CLA: 3.51802
BE.CLHA: 51.6509
BE.CTH: 22.7864
BE.DOU: 32.069
BE.GES: 6.58176
BE.KLB: 1.54218
BE.LCH*: 5.83592
BE.LLVA*: 19.4394
BE.MASA: 20.7803
BE.MEMB*: 63.4019
BE.MEMS: 2.69266
BE.RCHB*: 1.38116
BE.RQR: 6.03936
BE.SKQ: 2.38007
BE.SNF: 17.2614
BE.STI*: 28.7306
BE.STWA: 17.6149
BE.UCCB: 36.1183
BE.WLF: 1.37119
NL.HGN: 2.36249
Min. clipping score (clipped): 1.38116
Max. clipping score (unclipped): 51.6509

For information, I also report the results obtained with my modified version of the previous clipping detection method for the same events, stations and phases:

event_ID, phase, correct_detection, false_positives, false_negatives
987, P, 7, 0, 0
987, S, 11, 2, 0
1108, P, 2, 0, 0
1108, S, 5, 1, 0
1306, S, 5, 3, 0
1306, P, 3, 1, 0

@claudiodsf
Copy link
Member Author

Hi Kris, thank you for this thorough testing!

Wow, that's a very hard problem.
My conclusion is that, for the moment, it's probably wise to keep both approaches in the code (i.e., peaks and score) and let the user choose using configuration file.

I will made some comments on your branch to ask you some improvement and, when done, I will try and merge the two approaches in master, and add relevant config options.

Concerning the savgol_filter(), I use it as a smoothing filter, to determine the baseline.
Normally you should be able to see the computed baseline in the debug plot, don't you??

I have some other idea to apply on this branch, I'll let you know 😉

@krisvanneste
Copy link
Collaborator

Sorry, I didn't turn on the debug plot because there were so many records. However, I can test it now for records that give completely different results with or without the baseline removal. I will do that in the following days.

@claudiodsf
Copy link
Member Author

claudiodsf commented Feb 27, 2023

Just pushed a new version with a simplified approach, not involving exponential fit.

Let me know, when you have time 😉

P.S. I updated the description of this PR

@claudiodsf
Copy link
Member Author

Here's the same trace as in the comment above, with the new approach:

WI ABD 00 HHE-new_approach

@krisvanneste
Copy link
Collaborator

I will look into it.

@krisvanneste
Copy link
Collaborator

Here I document an example where the clipping score dramatically drops from 88 (remove_baseline=False) to 19 (remove_baseline=True). The trace is actually clipped (but not very strongly).

987_BE AUL_P_remove_baseline=False

987_BE AUL_P_remove_baseline=True

This is using the first version. I will now pull the latest version.

@claudiodsf
Copy link
Member Author

claudiodsf commented Mar 1, 2023

The baseline is clearly not well modelled...

It should have been flat, in this case

@krisvanneste
Copy link
Collaborator

With the new version, the clipping scores are 94 and 91, respectively.

@krisvanneste
Copy link
Collaborator

Sorry, that was a mistake. It's 78 and 7...

@claudiodsf
Copy link
Member Author

Could you post a screenshot?

@krisvanneste
Copy link
Collaborator

Here are the corresponding plots:

987_BE AUL_P_remove_baseline=False2

987_BE AUL_P_remove_baseline=True2

I had to outcomment the line
ax0.yaxis.set_major_formatter(yfmt)
as it was triggering an error in my version of matplotlib (2.2.4).

@claudiodsf
Copy link
Member Author

This result makes sense to me.

The baseline is not well modeled, but if we consider the baseline-removed signal, the drop in clipping score makes sense (less samples accumulated close to the edges).

I'll try to fix the Matplotlib problem.

@krisvanneste
Copy link
Collaborator

Not sure if it is useful, but this is the last error in the stack trace:

  File "C:\Miniconda3\envs\py3\lib\site-packages\matplotlib\axis.py", line 973, in iter_ticks
    self.major.formatter.set_locs(majorLocs)

  File "C:\Miniconda3\envs\py3\lib\site-packages\matplotlib\ticker.py", line 668, in set_locs
    self._set_format(vmin, vmax)

TypeError: _set_format() takes 1 positional argument but 3 were given

@claudiodsf
Copy link
Member Author

The Matplotlib error should be fixed

@krisvanneste
Copy link
Collaborator

Claudio,

It's hard to keep up the pace!

I first tested the clipping scores using baseline removal. It's clear that these have improved:

Event #987, P-window, remove_baseline=True:
Min. clipping score (clipped): 75.6986
Max. clipping score (unclipped): 82.2174

Event #987, S-window, remove_baseline=True:
Min. clipping score (clipped): 88.4902
Max. clipping score (unclipped): 82.7535

Event #1108, P-window, remove_baseline=True:
Min. clipping score (clipped): 86.5117
Max. clipping score (unclipped): 42.1689

Event #1108, S-window, remove_baseline=True:
Min. clipping score (clipped): 55.7423
Max. clipping score (unclipped): 57.0279

Event #1306, P-window, remove_baseline=True:
Min. clipping score (clipped): 89.8852
Max. clipping score (unclipped): 46.812

Event #1306, S-window, remove_baseline=True:
Min. clipping score (clipped): 59.5651
Max. clipping score (unclipped): 72.3559

Using a clip score threshold of 75, I obtain:
32 correct, 2 false positive, 3 false negative
This is not bad.

@claudiodsf
Copy link
Member Author

It's hard to keep up the pace!

Yes, I really would like to get this PR merged and move on 😉.

We can always add more improvements in future commits / PR 😄

@krisvanneste
Copy link
Collaborator

Regarding the histogram (or peak-count?) method, I agree with the change to clip_percentile. However, I ran the test again and the results are dramatically different! I guess this is due to changed density smoothing.

@claudiodsf
Copy link
Member Author

However, I ran the test again and the results are dramatically different! I guess this is due to changed density smoothing.

Ok, can you try adding a smoothing parameter to _get_kernel_density()? So that each algorithm can use its own!

I'll get a 1h break (😄) so feel free to (pull and) push to this branch!

@krisvanneste
Copy link
Collaborator

OK, I will try.

@krisvanneste
Copy link
Collaborator

The problem was indeed with the kernel smoothing bandwidth. I modified the code as you requested, but I can't push to the branch. Can you check the permissions?

image

@claudiodsf
Copy link
Member Author

That's strange. I elevated your role to "maintainer": can you try again?

@krisvanneste
Copy link
Collaborator

No, I receive the same error.

@claudiodsf
Copy link
Member Author

Ok, can you just put a patch (diff) here and I will integrate it. Thanks

@krisvanneste
Copy link
Collaborator

Here's the patch (zipped because .patch extension not supported for upload):
patch.zip

@claudiodsf
Copy link
Member Author

Ok, done! I'm writing a bit of documentation, then I'm planning to merge this, if you agree

@krisvanneste
Copy link
Collaborator

Sure, you can go ahead!

@claudiodsf claudiodsf changed the title Two approaches for clipping detection: counting peaks and score Two approaches for clipping detection: clipping_score and clipping_peaks Mar 2, 2023
@claudiodsf claudiodsf merged commit b8dbf80 into master Mar 2, 2023
@claudiodsf claudiodsf deleted the clipping_score branch March 2, 2023 17:00
@claudiodsf
Copy link
Member Author

Merged 🥳

@claudiodsf
Copy link
Member Author

Thank you again @krisvanneste for the huge effort you put into this work!

@krisvanneste
Copy link
Collaborator

You are welcome. I'm happy to contribute!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants