[RFC]: implement a broader range of statistical distributions #119
Comments
@vivekmaurya001 Thank you for opening this RFC. A few comments/questions:
Thanks for reviewing, @kgryte! This classification into domains might not be entirely correct, but in the end I will also try to implement the distributions I have not highlighted, since they are required for the linked issue.
Also, I was thinking of highlighting those distributions which are common in
Yes, that seems reasonable.
@kgryte, any further suggestions or improvements from your side?
Nothing else on my end!
Full name
Vivek Maurya
University status
Yes
University name
Indian Institute of Technology (Banaras Hindu University), Varanasi
University program
Bachelor of Technology in Electronics and Communication Engineering
Expected graduation
Aug 07, 2027
Short biography
I'm currently a 2nd-year undergraduate student pursuing Electronics Engineering. I have been interested in software development since my first year of college and have completed several relevant courses in my academic curriculum, including C Language, Data Structures and Algorithms, and Probability and Statistics.
Apart from this, I have done extensive work in JavaScript, TypeScript, React, Node.js, and MongoDB, developing projects in these technologies and winning multiple hackathons in my college. I also have a strong interest in blockchain technology and competitive programming.
Timezone
Indian Standard Time (UTC +5:30)
Contact details
Email: [email protected], [email protected]; GitHub: vivekmaurya001
Platform
Linux
Editor
VSCode has been my preferred code editor since I began my development journey because of its ease of use, support for a large number of languages, and many useful extensions for Git and Docker. Debugging is also a lot easier in VSCode.
Programming experience
I have about 1.5 years of programming experience, during which I have developed many solo and group projects. Over this period, I have gained a strong foundation in React, Node.js, MongoDB, C/C++, JavaScript, Python, and backend development.
Here are some of my hackathon-winning projects:-
JavaScript experience
I started my backend development journey with JavaScript, and as I worked on more projects, I gradually gained confidence in the language. Initially, my focus was on building APIs and handling server-side logic using Node.js. Over time, I explored asynchronous programming, database interactions, and performance optimizations, which deepened my understanding of JavaScript beyond just scripting.
My contributions to stdlib further expanded my JavaScript expertise considerably. Working on statistical and mathematical functions and BLAS implementations helped me write efficient, structured, and performance-oriented code.
Node.js experience
While learning backend development, I was also exploring Node.js, and my understanding of it strengthened significantly through hands-on projects. Initially, I started with basic server-side scripting, but as I worked on more complex applications, I became familiar with asynchronous programming, event-driven architecture, and working with APIs.
C/Fortran experience
C was my first programming language, which I learned in my first year of college, and I became proficient in it. I also completed a course on data structures and algorithms in C and did competitive programming in C during my first year, before shifting to C++. I don't have much knowledge of Fortran, but I'm open to learning it when needed.
Interest in stdlib
I was curious about the Node modules I had used numerous times in my projects: what their structure was and how they could be built. That curiosity was answered when I found stdlib in December! Working with stdlib gave me the opportunity to contribute to a widely recognized library with millions of downloads.
What stood out to me the most was how well-organized everything is—the clear documentation, structured workflow, and attention to detail. It’s not just about writing functions; it’s about making sure they are implemented properly, run efficiently, and are thoroughly tested to maintain high quality.
After working with stdlib for over three months, I found myself focusing on quality contributions over quantity. The PR reviewing process with maintainers helped me improve a lot.
Version control
Yes
Contributions to stdlib
I have contributed to stdlib in multiple areas: adding constants in float32, adding functions in math/base/special, adding C implementations of distributions in stats/base/dists, updating native addons from C++ to C in stats/base, adding assert functions in math/base/assert, adding accessor array support to functions in stats/base, and adding wasm packages for blas functions. Below is a list of my PRs:
Merged:
• constants/float32/ln-half
• math/base/special/heavisidef
• stats/base/dists/invgamma/stdev
• stats/base/dstdev: native addon from C++ to C
• math/base/assert/is-even: to follow latest project conventions
• math/base/assert/is-probabilityf
• math/base/special/hypot: to follow latest project conventions
• math/base/special/kernel-tan
• stats/base/cumin
• blas/ext/base/wasm/dapxsum
Open:
• stats/base/dists/burr-type3/cdf
• stats/base/dists/burr-type3/pdf
• blas/ext/base/wasm/dnanasumors
• math/base/special/gammasgnf
Link to all my merged and open PRs
stdlib showcase
Signal Transform: In this project, I show how to use stdlib to efficiently perform the Discrete Fourier Transform (DFT) on time-domain signals. It demonstrates how stdlib can be used to process and visualize signals. More specifically, it includes:
• Working with complex numbers
• Handling double-precision floating-point numbers
• Generating signals using special math functions
This helps in understanding how signals can be transformed and analyzed.
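For readers unfamiliar with the transform, here is a minimal sketch of a naive DFT in plain JavaScript. It is not the showcase's code: it uses plain arrays rather than stdlib's complex number types, and the names `dft`, `magnitude`, and the 16-sample cosine test signal are assumptions chosen for illustration.

```javascript
'use strict';

// Naive O(N^2) DFT of a real-valued signal:
//   X[k] = sum_n x[n] * exp( -2*pi*i*k*n / N )
// Returns the real and imaginary parts as two plain arrays.
function dft( x ) {
    var N = x.length;
    var re = [];
    var im = [];
    var theta;
    var sr;
    var si;
    var k;
    var n;
    for ( k = 0; k < N; k++ ) {
        sr = 0.0;
        si = 0.0;
        for ( n = 0; n < N; n++ ) {
            theta = -2.0 * Math.PI * k * n / N;
            sr += x[ n ] * Math.cos( theta );
            si += x[ n ] * Math.sin( theta );
        }
        re.push( sr );
        im.push( si );
    }
    return { 're': re, 'im': im };
}

// Magnitude spectrum |X[k]|.
function magnitude( X ) {
    return X.re.map( function abs( r, k ) {
        return Math.hypot( r, X.im[ k ] );
    });
}

// Hypothetical test signal: a cosine completing 3 cycles over 16 samples.
var signal = [];
var i;
for ( i = 0; i < 16; i++ ) {
    signal.push( Math.cos( 2.0 * Math.PI * 3 * i / 16 ) );
}
var spectrum = magnitude( dft( signal ) );
console.log( spectrum.map( function fmt( v ) {
    return v.toFixed( 3 );
}));
```

For a real cosine with an integer number of cycles, the energy concentrates in bin k and its mirror bin N-k, which is what the printed spectrum should show for k = 3.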
Goals
The goal is to implement all important continuous, discrete, and multivariate statistical distributions found in scipy in stdlib, along with their APIs for random number generation. After successfully completing this project, we will have a wide variety of distributions with their parameters (PDF, CDF, mean, median, mode, logpdf, logcdf, mgf, entropy, kurtosis, skewness, variance, etc.) in stats/base/dists. Additionally, APIs will be available for generating random samples from any implemented distribution. Throughout the implementation, I will ensure that quality and performance remain a top priority.
Here is my work plan:
Parameters to be Implemented
• Core Functions: CDF, PDF, Mean, Median, Mode
• Advanced Metrics: Log PDF, Log CDF, MGF, Quantile
• Statistical Properties: Variance, Standard Deviation, Skewness, Kurtosis
• Entropy Calculation
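As an illustration of the core functions and advanced metrics listed above, here is a minimal sketch for the Burr Type III distribution in plain JavaScript. It is not the code from the burr-type3 PRs: the shape parameter names `c` and `d` follow SciPy's `burr` (standard form, no location or scale), and the convention of returning NaN for invalid parameters is an assumption.

```javascript
'use strict';

// Burr Type III with shape parameters c > 0 and d > 0 (SciPy `burr` naming).
// CDF: F(x) = (1 + x^-c)^-d for x > 0, and 0 otherwise.

// Returns false for non-positive or NaN shape parameters.
function isValidShape( c, d ) {
    return c > 0.0 && d > 0.0;
}

function cdf( x, c, d ) {
    if ( !isValidShape( c, d ) || Number.isNaN( x ) ) {
        return NaN;
    }
    if ( x <= 0.0 ) {
        return 0.0;
    }
    return Math.pow( 1.0 + Math.pow( x, -c ), -d );
}

// log F(x) = -d * log( 1 + x^-c ), using log1p for accuracy when x^-c is small.
function logcdf( x, c, d ) {
    if ( !isValidShape( c, d ) || Number.isNaN( x ) ) {
        return NaN;
    }
    if ( x <= 0.0 ) {
        return -Infinity;
    }
    return -d * Math.log1p( Math.pow( x, -c ) );
}

// PDF: f(x) = c*d * x^(-c-1) / (1 + x^-c)^(d+1), evaluated in log space.
// Extreme arguments (e.g., x very close to 0) are not handled carefully here.
function pdf( x, c, d ) {
    var lp;
    if ( !isValidShape( c, d ) || Number.isNaN( x ) ) {
        return NaN;
    }
    if ( x <= 0.0 ) {
        return 0.0;
    }
    lp = Math.log( c ) + Math.log( d );
    lp -= ( c + 1.0 ) * Math.log( x );
    lp -= ( d + 1.0 ) * Math.log1p( Math.pow( x, -c ) );
    return Math.exp( lp );
}

// Quantile: inverting p = (1 + x^-c)^-d gives x = (p^(-1/d) - 1)^(-1/c).
// The endpoints p = 0 and p = 1 evaluate to 0 and +Infinity, respectively.
function quantile( p, c, d ) {
    if ( !isValidShape( c, d ) || !( p >= 0.0 && p <= 1.0 ) ) {
        return NaN;
    }
    return Math.pow( Math.pow( p, -1.0 / d ) - 1.0, -1.0 / c );
}

function median( c, d ) {
    return quantile( 0.5, c, d );
}

console.log( cdf( 1.0, 2.0, 3.0 ), pdf( 1.0, 2.0, 3.0 ), median( 2.0, 3.0 ) );
```

Having the quantile in closed form is convenient: the median follows directly, and the same function can drive inverse transform sampling.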
I have classified all distributions in the scipy statistical functions into three categories, referred to below as type 1, type 2 (partially implementable), and type 3 (complex implementation).
In the second type, some distributions require dependencies that have not been developed yet, mainly hindering mgf and entropy calculation. For these, I will check whether a simple closed form exists or whether they can be computed using approximations. If not, the other parameters can still be implemented.
In the third type, I have found the major blockers to be in multivariate distributions, which require some basic matrix functionality: covariance matrix calculation, determinant, transpose, matrix multiplication, addition, subtraction, division, inverse calculation, trace calculation, and Kronecker product.
As there is a large number of distributions in scipy, I have highlighted those which have a broader range of usage or are very useful in physical applications. For each distribution, I have looked up the important physics domains in which it appears. For example, the Maxwell distribution appears in statistical mechanics, thermodynamics, and fluid dynamics, and the Nakagami distribution appears in wireless communications and signal processing.
A much more detailed explanation of my classification and blockers is in the PDF below:
distribution implemetation (1).pdf
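To make the matrix blockers concrete, here is a minimal sketch of a multivariate normal log-PDF in plain JavaScript. It shows where the determinant and inverse of the covariance matrix enter, here obtained through a Cholesky factorization rather than explicit inversion. The function names `cholesky` and `mvnLogpdf` and the nested-array matrix representation are assumptions for illustration, not stdlib APIs.

```javascript
'use strict';

// Cholesky factorization A = L * L^T for a symmetric positive definite matrix
// given as an array of rows. Returns null if A is not positive definite.
function cholesky( A ) {
    var n = A.length;
    var L = [];
    var s;
    var i;
    var j;
    var k;
    for ( i = 0; i < n; i++ ) {
        L.push( new Array( n ).fill( 0.0 ) );
    }
    for ( i = 0; i < n; i++ ) {
        for ( j = 0; j <= i; j++ ) {
            s = A[ i ][ j ];
            for ( k = 0; k < j; k++ ) {
                s -= L[ i ][ k ] * L[ j ][ k ];
            }
            if ( i === j ) {
                if ( !( s > 0.0 ) ) {
                    return null;
                }
                L[ i ][ j ] = Math.sqrt( s );
            } else {
                L[ i ][ j ] = s / L[ j ][ j ];
            }
        }
    }
    return L;
}

// log f(x) = -0.5 * ( k*log(2*pi) + log|Sigma| + (x-mu)^T Sigma^-1 (x-mu) )
// With Sigma = L*L^T: log|Sigma| = 2*sum(log L_ii), and the quadratic form
// equals z^T z, where z solves L*z = x - mu (forward substitution).
function mvnLogpdf( x, mu, sigma ) {
    var logdet = 0.0;
    var quad = 0.0;
    var k = x.length;
    var L = cholesky( sigma );
    var z = [];
    var s;
    var i;
    var j;
    if ( L === null ) {
        return NaN;
    }
    for ( i = 0; i < k; i++ ) {
        s = x[ i ] - mu[ i ];
        for ( j = 0; j < i; j++ ) {
            s -= L[ i ][ j ] * z[ j ];
        }
        z.push( s / L[ i ][ i ] );
        quad += z[ i ] * z[ i ];
        logdet += 2.0 * Math.log( L[ i ][ i ] );
    }
    return -0.5 * ( ( k * Math.log( 2.0 * Math.PI ) ) + logdet + quad );
}

var sigma = [ [ 2.0, 1.0 ], [ 1.0, 2.0 ] ];
console.log( mvnLogpdf( [ 1.0, 0.0 ], [ 0.0, 0.0 ], sigma ) );
```

Even this simplest multivariate case needs a factorization and a triangular solve, which is why such distributions depend on matrix functionality being available first.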
Implementation Plan:
As a rough plan, I will implement all highlighted distributions in type 1 first, then type 2, then type 3, and then their random APIs.
Before implementing any distribution, I will thoroughly read about its parameters to understand it. For reference, I will follow these sources:
1 - SciPy
2 - NumPy
3 - Julia
4 - R stats
I will start with the highlighted type 1 continuous distributions, then discrete, then multivariate, and make sure to complete them by week 4.
Moving to type 2, starting with continuous distributions, I will implement those packages for which most parameters can be implemented. If the work on the required functionality is done by that time, there is no problem; otherwise, I will discuss it with mentors and find an alternative way to proceed without compromising quality and performance. I will try to complete this in weeks 5-6.
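One possible fallback when a dependency for a quantity such as entropy is missing is numerical integration of the PDF. The sketch below approximates the differential entropy with composite Simpson's rule. The Maxwell distribution is used only because its entropy has a known closed form to check against. The integration bounds, the number of intervals, and all function names are assumptions, and whether such an approximation meets stdlib's accuracy standards would need to be discussed with mentors.

```javascript
'use strict';

// Maxwell PDF with scale a > 0:
//   f(x) = sqrt(2/pi) * x^2 * exp( -x^2 / (2*a^2) ) / a^3 for x > 0.
function maxwellPdf( x, a ) {
    if ( x <= 0.0 ) {
        return 0.0;
    }
    return Math.sqrt( 2.0 / Math.PI ) * x * x * Math.exp( -x * x / ( 2.0 * a * a ) ) / ( a * a * a );
}

// Composite Simpson's rule on [lo, hi] with n subintervals (n must be even).
function simpson( f, lo, hi, n ) {
    var h = ( hi - lo ) / n;
    var s = f( lo ) + f( hi );
    var i;
    for ( i = 1; i < n; i++ ) {
        s += ( ( i % 2 === 0 ) ? 2.0 : 4.0 ) * f( lo + ( i * h ) );
    }
    return s * h / 3.0;
}

// Differential entropy H = -integral of f(x)*log(f(x)) dx, truncated to [lo, hi].
function numericalEntropy( pdf, lo, hi, n ) {
    return simpson( function integrand( x ) {
        var p = pdf( x );
        return ( p > 0.0 ) ? -p * Math.log( p ) : 0.0;
    }, lo, hi, n );
}

// Closed form for comparison: log( a*sqrt(2*pi) ) + gamma - 1/2,
// where gamma is the Euler-Mascheroni constant.
function maxwellEntropy( a ) {
    var EULER_GAMMA = 0.5772156649015329;
    return Math.log( a * Math.sqrt( 2.0 * Math.PI ) ) + EULER_GAMMA - 0.5;
}

var a = 1.5;
var approx = numericalEntropy( function f( x ) {
    return maxwellPdf( x, a );
}, 0.0, 12.0 * a, 4000 );
var exact = maxwellEntropy( a );
console.log( approx, exact );
```

A fixed-step rule like this is simple but costs thousands of PDF evaluations per call, so a closed form or a series expansion remains preferable whenever one exists.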
For type 3, I will follow the same order, continuous then multivariate. If the work on the blockers is done by that time, there is no problem; otherwise, I will start working on the random APIs for the implemented distributions.
For the random APIs, I will first check whether a specific performant method exists for the distribution; otherwise, I will try basic methods such as inverse transform sampling. I will follow the same order as for the implemented distributions.
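Inverse transform sampling can be sketched as follows: draw a uniform variate u in (0, 1) and return the quantile function evaluated at u. The example uses Burr Type XII (SciPy's `burr12` standard form) because its quantile has a closed form. The factory `inverseTransformSampler`, the small `minstd` generator, and the parameter names are hypothetical and only for illustration; an actual stdlib package would build on the project's own PRNGs.

```javascript
'use strict';

// Quantile of Burr Type XII with shapes c > 0 and d > 0.
// CDF: F(x) = 1 - (1 + x^c)^-d for x >= 0, so x = ( (1-p)^(-1/d) - 1 )^(1/c).
function burr12Quantile( p, c, d ) {
    return Math.pow( Math.pow( 1.0 - p, -1.0 / d ) - 1.0, 1.0 / c );
}

// Small Park-Miller "minimal standard" LCG returning values in (0, 1).
// Used only to make the example reproducible; seed must be in [1, 2147483646].
function minstd( seed ) {
    var state = seed;
    return function next() {
        state = ( state * 16807 ) % 2147483647;
        return state / 2147483647;
    };
}

// Hypothetical factory: returns a function producing one sample per call.
// `params` are the distribution parameters passed to `quantile` after `p`.
function inverseTransformSampler( quantile, params, rand ) {
    var uniform = rand || Math.random;
    return function sample() {
        return quantile.apply( null, [ uniform() ].concat( params ) );
    };
}

var sample = inverseTransformSampler( burr12Quantile, [ 2.0, 1.0 ], minstd( 12345 ) );
var draws = [];
var i;
for ( i = 0; i < 20000; i++ ) {
    draws.push( sample() );
}

// Sanity check: about half of the draws should fall below the median,
// which is burr12Quantile( 0.5, 2, 1 ).
var med = burr12Quantile( 0.5, 2.0, 1.0 );
var below = draws.filter( function lte( v ) {
    return v <= med;
}).length;
console.log( below / draws.length );
```

The method needs exactly one uniform draw per sample, but it is only as fast and accurate as the quantile function, which is why a distribution-specific method is preferred when one is known.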
If the work on the random APIs is finished early, I will implement the remaining distributions.
Why this project?
The proposed project is a very exciting opportunity to delve deep into statistical computations, focusing on implementing a wide range of probability distributions. This project will not only enhance my understanding of statistical modeling but also contribute to the broader JavaScript ecosystem. The key motivations behind my proposal are:
Qualifications
For this project, I will need a good understanding of JavaScript and statistical analysis. As mentioned earlier, I have been practicing development for over 1.5 years, working on full-stack projects that have given me in-depth knowledge of JavaScript concepts. Additionally, my coursework in Probability and Statistics during college has provided me with a solid foundation, which will be valuable for this project. I also feel confident in applying what I have learned in real-world scenarios.
Contributing to stdlib for over three months has helped me improve the quality of my PRs and refine my approach to coding. It has also allowed me to focus on areas I am truly interested in, making the experience both valuable and rewarding.
Also, I have already worked on implementing the burr distribution. Here are the links:
• stats/base/dists/burr-type3/cdf
• stats/base/dists/burr-type3/pdf
• stats/base/dists/burr-type3/logcdf
Prior art
Some basic distributions have already been implemented in stdlib, and with the help of these, many other distributions can be implemented. Implementations of such statistical functions can be found in libraries such as SciPy and R's stats package, which offer a vast collection of probability distributions, along with functions for PDF, CDF, quantile functions, and random sampling.
Commitment
Additionally, once my exams are over, I plan to start working during the community bonding period to ensure the project stays on schedule and I meet all milestones.
I consider this a large project, so the project length will be around 350 hours or more.
Schedule
Assuming a 12 week schedule,
Community Bonding Period: I will start implementing highlighted independent distributions such as Burr, Burr12, dgamma, and exponweib, as listed under type 1 in the doc.
Week 1 to Week 4: In this period, I will complete all type 1 distributions. As there are many distributions in this type, it will take time to complete them properly.
Week 5 to Week 6: I will start with the highlighted type 2 distributions. As these require some functionality that is yet to be developed, more research is required, while the other parameters can still be implemented.
Week 7 to Week 10: I will finish any remaining work from previous weeks and start on the type 3 distributions. After completing them, I will start implementing the random APIs and try to finish them by week 10.
Week 11 to Week 12: After completing type 3 and the random APIs, I will start implementing the remaining distributions in the same order and do as much as I can in these weeks.
Final Week: I will try to complete any remaining distributions by this week.
Post GSoC: After these 12 weeks, I will continue contributing. If some distributions are still left, I will complete them as time permits.
Notes:
Related issues
Issue #2
Checklist
[RFC]:
and succinctly describes your proposal.