Two approaches for clipping detection: clipping_score and clipping_peaks #25
Here's the same example discussed in this comment, but with the approach proposed here: |
Claudio, |
I finally decided to add a
|
Claudio, I tested your new clip detection method on all my events that contain 1 or more clipped stations. I copied the output below.
I noticed you use savgol_filter when
Note: in the output below, stations marked with * are actually clipped. For each event and phase, I also report the minimum clipping score obtained for stations that are actually clipped, and the maximum clipping score for stations that are not clipped. Ideally, the first one should be higher than the second.
For information, I also report the results obtained with my modified version of the previous clipping detection method for the same events, stations and phases:
|
Hi Kris, thank you for this thorough testing! Wow, that's a very hard problem. I will make some comments on your branch to ask for some improvements and, when done, I will try to merge the two approaches in
Concerning the
I have some other ideas to apply on this branch, I'll let you know 😉 |
Sorry, I didn't turn on the debug plot because there were so many records. However, I can test it now for records that give completely different results with or without the baseline removal. I will do that in the following days. |
Just pushed a new version with a simplified approach, not involving exponential fit. Let me know, when you have time 😉 P.S. I updated the description of this PR |
I will look into it. |
The baseline is clearly not well modelled... It should have been flat, in this case |
With the new version, the clipping scores are 94 and 91, respectively. |
Sorry, that was a mistake. It's 78 and 7... |
Could you post a screenshot? |
This result makes sense to me. The baseline is not well modeled, but if we consider the baseline-removed signal, the drop in clipping score makes sense (fewer samples accumulated close to the edges). I'll try to fix the Matplotlib problem. |
Not sure if it is useful, but this is the last error in the stack trace:
|
The Matplotlib error should be fixed |
Claudio, It's hard to keep up the pace! I first tested the clipping scores using baseline removal. It's clear that these have improved:
Using a clip score threshold of 75, I obtain: |
Yes, I really would like to get this PR merged and move on 😉. We can always add more improvements in future commits / PR 😄 |
Regarding the histogram (or peak-count?) method, I agree with the change to |
Ok, can you try adding a smoothing parameter to
I'll take a 1h break (😄) so feel free to (pull and) push to this branch! |
OK, I will try. |
That's strange. I elevated your role to "maintainer": can you try again? |
No, I receive the same error. |
Ok, can you just put a patch (diff) here and I will integrate it. Thanks |
Here's the patch (zipped because .patch extension not supported for upload): |
Ok, done! I'm writing a bit of documentation, then I'm planning to merge this, if you agree |
Sure, you can go ahead! |
12d9b7f to 9de823c
Merged 🥳 |
Thank you again @krisvanneste for the huge effort you put into this work! |
You are welcome. I'm happy to contribute! |
This PR adds two approaches for clipping detection, clipping_score and clipping_peaks.
Description of the clipping score approach:
The algorithm is based on the following steps:
The trace is detrended and demeaned. Optionally, the trace baseline can be removed.
A kernel density estimation is performed on the trace amplitude values.
Two weighted kernel density functions are computed:
- a full weighted kernel density, where the kernel density is weighted by the distance from the zero mean value, using an 8th-order power function ranging from 1 to 100.
- a weighted kernel density without the central peak, where the kernel density is weighted by the distance from the zero mean value, using an 8th-order power function ranging from 0 to 100.
In both cases, the weight gives more importance to samples far from the zero mean value. In the second case, the central peak is ignored.
The score, ranging from 0 to 100, is the sum of the squared weighted kernel density without the central peak, normalized by the sum of the squared full weighted kernel density. The score is 0 if there is no additional peak beyond the central peak.
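For readers who want to experiment with the idea, the steps above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: all function names are hypothetical, the optional baseline removal is left out, the kernel density uses the default bandwidth of scipy's gaussian_kde, and the central peak is assumed to span from the first local minimum on the left to the first local minimum on the right of the density maximum closest to zero amplitude. With that last assumption, small ripples in the estimated density can count as additional peaks, so noisy but unclipped data may get a small nonzero score.

```python
import numpy as np
from scipy.signal import detrend
from scipy.stats import gaussian_kde


def distance_weight(points, order=8, min_weight=1.0, max_weight=100.0):
    """Power-function weight growing with distance from zero amplitude."""
    u = np.abs(points) / np.abs(points).max()
    return min_weight + (max_weight - min_weight) * u**order


def central_peak_bounds(density, points):
    """Return (lo, hi) indices delimiting the central peak.

    Assumption of this sketch: the central peak is the local maximum
    reached by climbing uphill from the grid point closest to zero,
    and it ends at the first local minimum on each side.
    """
    n = len(density)
    i = int(np.argmin(np.abs(points)))
    # climb to the nearest local maximum
    while True:
        if i + 1 < n and density[i + 1] > density[i]:
            i += 1
        elif i > 0 and density[i - 1] > density[i]:
            i -= 1
        else:
            break
    lo = hi = i
    # walk downhill on both sides until the density starts rising again
    while lo > 0 and density[lo - 1] <= density[lo]:
        lo -= 1
    while hi < n - 1 and density[hi + 1] <= density[hi]:
        hi += 1
    return lo, hi


def score_from_density(density, points):
    """Score (0 to 100) from a density sampled at amplitude `points`."""
    # full weighted density: 8th-order weight between 1 and 100
    full = density * distance_weight(points, min_weight=1.0)
    # weighted density without central peak: weight between 0 and 100
    no_central = density * distance_weight(points, min_weight=0.0)
    lo, hi = central_peak_bounds(density, points)
    no_central[lo:hi + 1] = 0.0
    return 100.0 * np.sum(no_central**2) / np.sum(full**2)


def clipping_score_sketch(samples, num_points=512):
    """Detrend, demean, estimate the amplitude density and score it."""
    data = detrend(np.asarray(samples, dtype=float), type="linear")
    data -= data.mean()
    points = np.linspace(data.min(), data.max(), num_points)
    density = gaussian_kde(data)(points)
    return score_from_density(density, points)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=2000)
    print("unclipped:", clipping_score_sketch(noise))
    print("clipped:  ", clipping_score_sketch(np.clip(noise, -1, 1)))
```

The split into score_from_density and clipping_score_sketch is only for convenience: the first can be tried on a hand-made density (a single bump gives 0, extra bumps near the amplitude limits push the score towards 100), while the second runs the whole chain on a trace-like array.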