Two approaches for clipping detection: clipping_score and clipping_peaks #25

claudiodsf · 2023-02-23T11:38:14Z

This PR adds two approaches for clipping detection:

Checking for the presence of peaks in the first and last few bins of the kernel density instead of only the presence of more than one peak to identify clipping (by @krisvanneste)
Computing a clipping score based on the shape of the kernel density

Description of the clipping score approach:

The algorithm is based on the following steps:

The trace is detrended and demeaned. Optionally, the trace baseline can be removed.
A kernel density estimation is performed on the trace amplitude values.
Two weighted kernel density functions are computed:
- a full weighted kernel density, where the kernel density is weighted by the distance from the zero mean value, using a 8th order power function between 1 and 100.
- a weighted kernel density without the central peak, where the kernel density is weighted by the distance from the zero mean value, using a 8th order power function between 0 and 100.
In both cases, the weight gives more importance to samples far from the zero mean value. In the second case, the central peak is ignored.
The score, ranging from 0 to 100, is the sum of the squared weighted kernel density without the central peak, normalized by the sum of the squared full weighted kernel density. The score is 0 if there is no additional peak beyond the central peak.

claudiodsf · 2023-02-23T11:49:43Z

Here's the same example discussed in this comment, but with the approach proposed here:

krisvanneste · 2023-02-23T12:54:16Z

Claudio,
Thank you!
I will look at your new approach and test it on my examples in the next few days.

claudiodsf · 2023-02-23T16:18:10Z

I finally decided to add a remove_baseline config parameter.
This parameters has two effects:

removes the trace baseline after instrument correction and before filtering
removes the baseline on the raw trace when computing the clipping score

krisvanneste · 2023-02-24T14:26:28Z

Claudio,

I tested your new clip detection method on all my events that contain 1 or more clipped stations. I copied the output below.
I should study the results in more detail, but at first sight they are again mixed:

the results obtained with remove_baseline=True are sometimes very bad (but not always)
it's not possible to define a clipping score that will unambiguously separate clipped records from unclipped ones.

I noticed you use savgol_filter when remove_baseline=True. What is it doing exactly? From the name of the option, I supposed it would be detrending, but I'm not sure this is what it does. I also apply savgol_filter in my code to determine inflection points in the signal, which is quite different from removing a baseline. Is it possible to show its effect on the debug trace plot?

Note: in the output below, stations marked with * are actually clipped. For each event and phase, I also report the minimum clipping score obtained for stations that are actually clipped, and the maximum clipping score for stations that are not clipped. Ideally, the first one should be higher than the second.

Event #987, P-window, remove_baseline=False:
BE.AUL*: 88.3732
BE.BEBN*: 96.2525
BE.BOU: 18.619
BE.BRQ*: 97.5209
BE.CLA*: 97.0883
BE.DOU: 58.8019
BE.HU1: 47.6094
BE.HUM*: 90.1658
BE.LCH: 64.4281
BE.LES: 14.9836
BE.MEMS*: 92.8565
BE.RQR: 38.7125
BE.SNF*: 95.9347
BE.STI*: 95.6706
BE.UCCS: 47.1577
BE.VIA: 37.5718
BE.WIB: 45.8856
BE.WLF: 83.9605
BE.ZEV: 15.2854
Min. clipping score (clipped): 88.3732
Max. clipping score (unclipped): 83.9605

Event #987, P-window, remove_baseline=True:
BE.AUL*: 19.0193
BE.BEBN*: 92.5741
BE.BOU: 5.83271
BE.BRQ*: 72.3471
BE.CLA*: 50.8514
BE.DOU: 9.1741
BE.HU1: 9.98621
BE.HUM*: 72.0887
BE.LCH: 9.49827
BE.LES: 10.3977
BE.MEMS*: 85.7856
BE.RQR: 3.76731
BE.SNF*: 81.2068
BE.STI*: 95.2516
BE.UCCS: 12.3157
BE.VIA: 45.0971
BE.WIB: 6.65135
BE.WLF: 7.40844
BE.ZEV: 4.43014
Min. clipping score (clipped): 19.0193
Max. clipping score (unclipped): 45.0971

Event #987, S-window, remove_baseline=False:
BE.AUL*: 94.743
BE.BEBN*: 98.2564
BE.BOU: 55.4312
BE.BRQ*: 96.5371
BE.CLA*: 94.8211
BE.DOU*: 92.7378
BE.HU1: 50.9763
BE.HUM*: 94.259
BE.LCH*: 97.0442
BE.LES: 30.2255
BE.MEMS*: 96.4927
BE.RQR: 92.2359
BE.SNF*: 94.4072
BE.STI*: 95.7813
BE.UCCS: 15.7692
BE.VIA: 53.8325
BE.WIB: 33.2392
BE.WLF: 89.1979
BE.ZEV*: 92.5521
Min. clipping score (clipped): 92.5521
Max. clipping score (unclipped): 92.2359

Event #987, S-window, remove_baseline=True:
BE.AUL*: 91.2837
BE.BEBN*: 96.0008
BE.BOU: 3.98126
BE.BRQ*: 89.3783
BE.CLA*: 89.5306
BE.DOU*: 82.0006
BE.HU1: 10.9608
BE.HUM*: 88.5418
BE.LCH*: 86.2451
BE.LES: 1.43242
BE.MEMS*: 76.9253
BE.RQR: 48.1142
BE.SNF*: 90.0213
BE.STI*: 91.0632
BE.UCCS: 12.5697
BE.VIA: 1.51434
BE.WIB: 3.00229
BE.WLF: 4.64262
BE.ZEV*: 74.5816
Min. clipping score (clipped): 74.5816
Max. clipping score (unclipped): 48.1142

Event #1108, P-window, remove_baseline=False:
BE.BEBN: 60.6561
BE.BOU*: 99.3243
BE.BRQ*: 94.8579
BE.CLA: 56.0279
BE.DOU: 11.2556
BE.HUM: 8.31139
BE.KLB: 28.3073
BE.LES: 24.3511
BE.MEMB: 44.4219
BE.MEMS: 47.7056
BE.MEU: 19.799
BE.PLT: 27.8138
BE.ROB: 16.3324
BE.RQR: 37.2983
BE.SKQ: 8.84353
BE.SNF: 33.74
BE.STI: 15.1359
BE.UCCS: 15.938
BE.VIA: 12.9943
BE.WIB: 41.7449
BE.WLF: 32.805
BE.WOR: 20.7935
Min. clipping score (clipped): 94.8579
Max. clipping score (unclipped): 60.6561

Event #1108, P-window, remove_baseline=True:
BE.BEBN: 43.0158
BE.BOU*: 90.5954
BE.BRQ*: 71.313
BE.CLA: 54.7753
BE.DOU: 7.62242
BE.HUM: 5.52316
BE.KLB: 17.0524
BE.LES: 10.9834
BE.MEMB: 24.5975
BE.MEMS: 25.6547
BE.MEU: 12.6893
BE.PLT: 33.9151
BE.ROB: 11.8982
BE.RQR: 18.4199
BE.SKQ: 5.5804
BE.SNF: 11.9689
BE.STI: 9.28007
BE.UCCS: 12.8329
BE.VIA: 11.7355
BE.WIB: 43.4826
BE.WLF: 21.9869
BE.WOR: 15.2888
Min. clipping score (clipped): 71.313
Max. clipping score (unclipped): 54.7753

Event #1108, S-window, remove_baseline=False:
BE.BEBN: 61.4968
BE.BOU*: 96.5337
BE.BRQ*: 95.0996
BE.CLA: 41.7513
BE.DOU: 42.8922
BE.HUM: 7.54886
BE.KLB: 51.1515
BE.LES*: 82.8771
BE.MEMB: 80.0432
BE.MEMS: 78.2439
BE.MEU: 53.6323
BE.PLT: 39.0317
BE.ROB: 18.8713
BE.RQR*: 76.9263
BE.SKQ: 6.41371
BE.SNF*: 94.2083
BE.STI: 72.1128
BE.UCCS: 53.9521
BE.VIA: 29.3854
BE.WIB: 20.1383
BE.WLF: 24.4955
BE.WOR: 20.4675
Min. clipping score (clipped): 76.9263
Max. clipping score (unclipped): 80.0432

Event #1108, S-window, remove_baseline=True:
BE.BEBN: 69.5796
BE.BOU*: 93.2722
BE.BRQ*: 95.3383
BE.CLA: 21.2446
BE.DOU: 35.8785
BE.HUM: 4.59051
BE.KLB: 45.5904
BE.LES*: 72.7496
BE.MEMB: 1.70158
BE.MEMS: 3.6086
BE.MEU: 14.1515
BE.PLT: 36.3825
BE.ROB: 4.63713
BE.RQR*: 77.0104
BE.SKQ: 6.19097
BE.SNF*: 89.8524
BE.STI: 37.3391
BE.UCCS: 46.5831
BE.VIA: 19.7998
BE.WIB: 31.8073
BE.WLF: 3.67065
BE.WOR: 20.4312
Min. clipping score (clipped): 72.7496
Max. clipping score (unclipped): 69.5796

Event #1306, P-window, remove_baseline=False:
BE.AUL: 28.4384
BE.BEBN*: 99.0062
BE.BOU: 38.832
BE.CLA: 17.753
BE.CTH: 35.6364
BE.DOU: 47.2163
BE.GES: 18.537
BE.KLB: 22.8059
BE.LCH: 26.4853
BE.MASA: 41.6329
BE.MEMB*: 98.9157
BE.MEMS: 25.9393
BE.RCHB: 61.096
BE.RQR: 36.5128
BE.SKQ: 35.6804
BE.SNF: 21.2604
BE.STI*: 97.2425
BE.STWA: 69.9696
BE.UCCB: 54.3369
BE.WLF: 43.7346
Min. clipping score (clipped): 97.2425
Max. clipping score (unclipped): 69.9696

Event #1306, P-window, remove_baseline=True:
BE.AUL: 2.84993
BE.BEBN*: 14.623
BE.BOU: 11.5592
BE.CLA: 7.50623
BE.CTH: 3.89984
BE.DOU: 25.5506
BE.GES: 5.23492
BE.KLB: 15.1978
BE.LCH: 1.90361
BE.MASA: 17.4023
BE.MEMB*: 71.5343
BE.MEMS: 3.91574
BE.RCHB: 4.95394
BE.RQR: 19.2441
BE.SKQ: 10.001
BE.SNF: 21.7939
BE.STI*: 7.08743
BE.STWA: 55.3211
BE.UCCB: 59.2099
BE.WLF: 6.77923
Min. clipping score (clipped): 7.08743
Max. clipping score (unclipped): 59.2099

Event #1306, S-window, remove_baseline=False:
BE.AUL: 51.1673
BE.BEBN*: 96.4862
BE.BOU: 64.057
BE.BREA: 6.72344
BE.CLA: 27.953
BE.CLHA: 60.3496
BE.CTH: 16.4319
BE.DOU: 38.7155
BE.GES: 40.6198
BE.KLB: 18.9279
BE.LCH*: 98.4636
BE.LLVA*: 53.2694
BE.MASA: 21.5947
BE.MEMB*: 96.7753
BE.MEMS: 69.0191
BE.RCHB*: 78.0686
BE.RQR: 40.2522
BE.SKQ: 11.3034
BE.SNF: 78.3879
BE.STI*: 97.6516
BE.STWA: 18.5656
BE.UCCB: 37.5371
BE.WLF: 77.8152
NL.HGN: 43.7527
Min. clipping score (clipped): 53.2694
Max. clipping score (unclipped): 78.3879

Event #1306, S-window, remove_baseline=True:
BE.AUL: 4.03008
BE.BEBN*: 94.3027
BE.BOU: 3.2213
BE.BREA: 4.54774
BE.CLA: 3.51802
BE.CLHA: 51.6509
BE.CTH: 22.7864
BE.DOU: 32.069
BE.GES: 6.58176
BE.KLB: 1.54218
BE.LCH*: 5.83592
BE.LLVA*: 19.4394
BE.MASA: 20.7803
BE.MEMB*: 63.4019
BE.MEMS: 2.69266
BE.RCHB*: 1.38116
BE.RQR: 6.03936
BE.SKQ: 2.38007
BE.SNF: 17.2614
BE.STI*: 28.7306
BE.STWA: 17.6149
BE.UCCB: 36.1183
BE.WLF: 1.37119
NL.HGN: 2.36249
Min. clipping score (clipped): 1.38116
Max. clipping score (unclipped): 51.6509

For information, I also report the results obtained with my modified version of the previous clipping detection method for the same events, stations and phases:

event_ID, phase, correct_detection, false_positives, false_negatives
987, P, 7, 0, 0
987, S, 11, 2, 0
1108, P, 2, 0, 0
1108, S, 5, 1, 0
1306, S, 5, 3, 0
1306, P, 3, 1, 0

claudiodsf · 2023-02-27T13:33:40Z

Hi Kris, thank you for this thorough testing!

Wow, that's a very hard problem.
My conclusion is that, for the moment, it's probably wise to keep both approaches in the code (i.e., peaks and score) and let the user choose using configuration file.

I will made some comments on your branch to ask you some improvement and, when done, I will try and merge the two approaches in master, and add relevant config options.

Concerning the savgol_filter(), I use it as a smoothing filter, to determine the baseline.
Normally you should be able to see the computed baseline in the debug plot, don't you??

I have some other idea to apply on this branch, I'll let you know 😉

krisvanneste · 2023-02-27T14:25:22Z

Sorry, I didn't turn on the debug plot because there were so many records. However, I can test it now for records that give completely different results with or without the baseline removal. I will do that in the following days.

claudiodsf · 2023-02-27T21:13:06Z

Just pushed a new version with a simplified approach, not involving exponential fit.

Let me know, when you have time 😉

P.S. I updated the description of this PR

claudiodsf · 2023-02-28T07:12:10Z

Here's the same trace as in the comment above, with the new approach:

krisvanneste · 2023-02-28T09:54:54Z

I will look into it.

krisvanneste · 2023-03-01T12:53:01Z

Here I document an example where the clipping score dramatically drops from 88 (remove_baseline=False) to 19 (remove_baseline=True). The trace is actually clipped (but not very strongly).

This is using the first version. I will now pull the latest version.

claudiodsf · 2023-03-01T12:55:28Z

The baseline is clearly not well modelled...

It should have been flat, in this case

krisvanneste · 2023-03-01T13:01:54Z

With the new version, the clipping scores are 94 and 91, respectively.

krisvanneste · 2023-03-01T13:02:58Z

Sorry, that was a mistake. It's 78 and 7...

claudiodsf · 2023-03-01T13:12:24Z

Could you post a screenshot?

krisvanneste · 2023-03-01T13:12:52Z

Here are the corresponding plots:

I had to outcomment the line
ax0.yaxis.set_major_formatter(yfmt)
as it was triggering an error in my version of matplotlib (2.2.4).

claudiodsf · 2023-03-01T13:16:02Z

This result makes sense to me.

The baseline is not well modeled, but if we consider the baseline-removed signal, the drop in clipping score makes sense (less samples accumulated close to the edges).

I'll try to fix the Matplotlib problem.

krisvanneste · 2023-03-01T13:24:36Z

Not sure if it is useful, but this is the last error in the stack trace:

  File "C:\Miniconda3\envs\py3\lib\site-packages\matplotlib\axis.py", line 973, in iter_ticks
    self.major.formatter.set_locs(majorLocs)

  File "C:\Miniconda3\envs\py3\lib\site-packages\matplotlib\ticker.py", line 668, in set_locs
    self._set_format(vmin, vmax)

TypeError: _set_format() takes 1 positional argument but 3 were given

claudiodsf · 2023-03-01T14:38:33Z

The Matplotlib error should be fixed

krisvanneste · 2023-03-02T11:58:01Z

Claudio,

It's hard to keep up the pace!

I first tested the clipping scores using baseline removal. It's clear that these have improved:

Event #987, P-window, remove_baseline=True:
Min. clipping score (clipped): 75.6986
Max. clipping score (unclipped): 82.2174

Event #987, S-window, remove_baseline=True:
Min. clipping score (clipped): 88.4902
Max. clipping score (unclipped): 82.7535

Event #1108, P-window, remove_baseline=True:
Min. clipping score (clipped): 86.5117
Max. clipping score (unclipped): 42.1689

Event #1108, S-window, remove_baseline=True:
Min. clipping score (clipped): 55.7423
Max. clipping score (unclipped): 57.0279

Event #1306, P-window, remove_baseline=True:
Min. clipping score (clipped): 89.8852
Max. clipping score (unclipped): 46.812

Event #1306, S-window, remove_baseline=True:
Min. clipping score (clipped): 59.5651
Max. clipping score (unclipped): 72.3559

Using a clip score threshold of 75, I obtain:
32 correct, 2 false positive, 3 false negative
This is not bad.

claudiodsf · 2023-03-02T12:32:32Z

It's hard to keep up the pace!

Yes, I really would like to get this PR merged and move on 😉.

We can always add more improvements in future commits / PR 😄

krisvanneste · 2023-03-02T13:10:11Z

Regarding the histogram (or peak-count?) method, I agree with the change to clip_percentile. However, I ran the test again and the results are dramatically different! I guess this is due to changed density smoothing.

claudiodsf · 2023-03-02T13:13:42Z

However, I ran the test again and the results are dramatically different! I guess this is due to changed density smoothing.

Ok, can you try adding a smoothing parameter to _get_kernel_density()? So that each algorithm can use its own!

I'll get a 1h break (😄) so feel free to (pull and) push to this branch!

krisvanneste · 2023-03-02T13:15:54Z

OK, I will try.

krisvanneste · 2023-03-02T13:55:07Z

The problem was indeed with the kernel smoothing bandwidth. I modified the code as you requested, but I can't push to the branch. Can you check the permissions?

claudiodsf · 2023-03-02T14:34:07Z

That's strange. I elevated your role to "maintainer": can you try again?

krisvanneste · 2023-03-02T14:37:41Z

No, I receive the same error.

claudiodsf · 2023-03-02T14:38:49Z

Ok, can you just put a patch (diff) here and I will integrate it. Thanks

krisvanneste · 2023-03-02T14:44:13Z

Here's the patch (zipped because .patch extension not supported for upload):
patch.zip

claudiodsf · 2023-03-02T14:49:47Z

Ok, done! I'm writing a bit of documentation, then I'm planning to merge this, if you agree

krisvanneste · 2023-03-02T14:51:49Z

Sure, you can go ahead!

claudiodsf · 2023-03-02T17:00:59Z

Merged 🥳

claudiodsf · 2023-03-02T17:01:30Z

Thank you again @krisvanneste for the huge effort you put into this work!

krisvanneste · 2023-03-02T17:18:01Z

You are welcome. I'm happy to contribute!

Compute a clipping score and let the user set a threshold.

dba6130

claudiodsf mentioned this pull request Feb 23, 2023

Improve clip detection #24

Closed

claudiodsf added 5 commits February 23, 2023 15:11

clipping_detection: increase minimum number of bins

1061be2

Do not check S/N if trace is already ignored beacuse of clipping

c135499

clipping_detection: accept multiple input files

9de9b85

clipping_detection: more accurate baseline modeling

63fa801

New parameter: remove_baseline

6485349

Simplified approach to compute clipping score

6776567

Fix algorithm description

3a3d98b

Fix docstring formatting

fc9e220

Fix ScalarFormatter for old versions of Matplotlib

3a3e357

Small fix in coding style

06c46dc

claudiodsf added 3 commits March 2, 2023 13:40

Rename clipping functions to match configspec

b8b2ba4

clipping_detection: fancier output :-)

dee71e5

Updated changelog

fd138f0

Another small improvement in main docstring

41de211

Add bw_method option to _get_kernel_density()

d1c68ee

claudiodsf added 6 commits March 2, 2023 16:02

Split into smaller functions

c6a7011

Trim lines

ceb4e2b

Sphinx docs for clipping detection

2e7ca69

Fix sphinx_favicon in the extensions

2202f55

Add figures to clipping detection docs

d87072f

Talk about "clipping score" first, since it's the default one

9de823c

claudiodsf force-pushed the clipping_score branch from 12d9b7f to 9de823c Compare March 2, 2023 16:41

claudiodsf changed the title ~~Two approaches for clipping detection: counting peaks and score~~ Two approaches for clipping detection: clipping_score and clipping_peaks Mar 2, 2023

claudiodsf merged commit b8dbf80 into master Mar 2, 2023

claudiodsf deleted the clipping_score branch March 2, 2023 17:00

Two approaches for clipping detection: clipping_score and clipping_peaks #25

Two approaches for clipping detection: clipping_score and clipping_peaks #25

Uh oh!

Conversation

claudiodsf commented Feb 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description of the clipping score approach:

Uh oh!

claudiodsf commented Feb 23, 2023

Uh oh!

krisvanneste commented Feb 23, 2023

Uh oh!

claudiodsf commented Feb 23, 2023

Uh oh!

krisvanneste commented Feb 24, 2023

Uh oh!

claudiodsf commented Feb 27, 2023

Uh oh!

krisvanneste commented Feb 27, 2023

Uh oh!

claudiodsf commented Feb 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claudiodsf commented Feb 28, 2023

Uh oh!

krisvanneste commented Feb 28, 2023

Uh oh!

krisvanneste commented Mar 1, 2023

Uh oh!

claudiodsf commented Mar 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

krisvanneste commented Mar 1, 2023

Uh oh!

krisvanneste commented Mar 1, 2023

Uh oh!

claudiodsf commented Mar 1, 2023

Uh oh!

krisvanneste commented Mar 1, 2023

Uh oh!

claudiodsf commented Mar 1, 2023

Uh oh!

krisvanneste commented Mar 1, 2023

Uh oh!

claudiodsf commented Mar 1, 2023

Uh oh!

krisvanneste commented Mar 2, 2023

Uh oh!

claudiodsf commented Mar 2, 2023

Uh oh!

krisvanneste commented Mar 2, 2023

Uh oh!

claudiodsf commented Mar 2, 2023

Uh oh!

krisvanneste commented Mar 2, 2023

Uh oh!

krisvanneste commented Mar 2, 2023

Uh oh!

claudiodsf commented Mar 2, 2023

Uh oh!

krisvanneste commented Mar 2, 2023

Uh oh!

claudiodsf commented Mar 2, 2023

Uh oh!

krisvanneste commented Mar 2, 2023

Uh oh!

claudiodsf commented Mar 2, 2023

Uh oh!

krisvanneste commented Mar 2, 2023

Uh oh!

claudiodsf commented Mar 2, 2023

Uh oh!

claudiodsf commented Mar 2, 2023

Uh oh!

krisvanneste commented Mar 2, 2023

Uh oh!

Uh oh!

claudiodsf commented Feb 23, 2023 •

edited

Loading

claudiodsf commented Feb 27, 2023 •

edited

Loading

claudiodsf commented Mar 1, 2023 •

edited

Loading