[RFC]: Implement a broader range of statistical distributions #49
Comments
@AgPriyanshu18 Thank you for opening this RFC. One suggestion I have is that you be more specific in which distributions you plan to add and when. As a start, you can investigate those mentioned on the stdlib issue tracker: https://github.com/stdlib-js/stdlib/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+distribution+label%3AStatistics. I'd also suggest exploring a bit more what SciPy has to offer, reading the source code therein, and determining which distributions will be most straightforward to implement. Importantly, you'll want to start off your work working on those distributions for which we have all the requisite functionality (e.g., special math functions, utilities, etc). Otherwise, if you start off by working on a distribution for which we don't already have the prereqs, you'll quickly be blocked and won't be able to make much progress. As such, I strongly suggest doing a bit more R&D so that your timeline is as informed as possible.
In fact, to ensure that this project is aligned with your interests, I suggest, if possible, actually trying to find a suitable distribution to add to stdlib now and trying to do so. It will be to your benefit to have a good understanding as to what you'd be signing up for in pursuing this project.
Thank you @kgryte for your review, and I am sorry for such a late reply. I have arranged the distributions into two categories based on their implementation complexity and their need for other utility functions, and I have listed them under the project goals. As you asked, I have also included them in my timeline.
Full name
Priyanshu Agarwal
University status
Yes
University name
Indian Institute of Information Technology Jabalpur
University program
Bachelors of Technology in Computer Science and Engineering
Expected graduation
01-08-2025
Short biography
I am Priyanshu Agarwal, a pre-final year Computer Science and Engineering student at the Indian Institute of Information Technology Jabalpur. My background is complemented by hands-on experience in development with languages and frameworks like JavaScript and Node.js, mobile development in Kotlin for Android, and core programming languages C/C++ and Java. Internships, freelance projects, and active contributions to college projects further strengthen this experience. My passion for continuous learning led me to explore new frameworks and technologies within these domains, solidifying my practical understanding.
Beyond web and mobile development, I hold a strong interest in data structures, algorithms, and object-oriented programming (OOPs), evident from my coursework (Advanced Probability, Data Science, Data Structures & Algorithms, Theory of computation) and participation in coding platforms like LeetCode and Codeforces. These platforms have been valuable tools to hone my problem-solving skills and enhance my proficiency in these critical areas. Additionally, I recently began contributing to STDLIB, immersing myself in their extensive codebase and actively engaging in development tasks. This experience has provided invaluable insights into a larger codebase structure and collaborative development practices, further enriching my skillset.
Timezone
Indian Standard Time (IST), GMT+5:30
Contact details
Email: [email protected] / [email protected]
Phone number: 6378228784
LinkedIn: https://www.linkedin.com/in/priyanshu-agarwal-484151220/
GitHub: https://github.com/AgPriyanshu18
Platform
Linux
Editor
My development environment is Ubuntu, with Visual Studio Code (VS Code) as my primary editor. VS Code offers extensive built-in JavaScript support along with tooling for unit testing, formatting, and other essentials. This combination ensures efficient development, high-quality code, and seamless collaboration via VS Code's Git integration. While some might prefer macOS for web development, Ubuntu provides a robust environment well suited to this project, allowing me to be productive and contribute effectively from the start.
Programming experience
I have been actively expanding my programming skillset since high school, focusing on JavaScript, C, C++, Java, Kotlin, and Node.js. This has resulted in a strong understanding of these languages and their associated libraries and frameworks. I am also a competitive programming enthusiast and have solved over 300 problems in total across several platforms. To demonstrate my abilities, I'd like to highlight two relevant projects:
Sevayu: This project focused on digitalizing hospital services via a subscription model. Sevayu offered features like online appointment booking, detailed medicine information, and personalized functionalities. Here, I delved into data science applications within JavaScript. To personalize functionalities, I implemented recommendation algorithms based on user medical history and appointment data.
Pocket Manager: A Kotlin/XML/Firebase expense tracker that simplifies accurate expense recording. Users gain insights through interactive charts and budgeting tools powered by algorithms that analyze spending data to reveal trends.
These projects showcase my proficiency in working with web APIs, real-time data, and data visualization within a JavaScript environment, all of which are relevant skills for this data analysis library project.
JavaScript experience
I've been actively honing my JavaScript skills. This journey has equipped me with a comprehensive understanding of the language's syntax, features, and best practices. I've worked on many projects, from building interactive web applications to crafting server-side functionality with Node.js, as in Sevayu. What truly excites me about JavaScript is its unique blend of flexibility and power. First-class functions allow me to treat functions like variables, promoting code reusability and a more functional programming style. Dynamic typing, while requiring careful attention, streamlines development and rapid prototyping.
Node.js experience
My passion for backend development led me to delve into Node.js. This powerful platform, combined with the popular Express framework, has become my go-to for building web applications. I've honed my skills by crafting backends for projects like Sevayu. These real-world experiences solidified my understanding of Node.js while equipping me to solve complex problems effectively.
Further strengthening my backend expertise, I've worked with various databases, including MySQL and MongoDB, to tailor data storage solutions for each project. Additionally, I've gained proficiency in writing middleware for authentication and routing, ensuring streamlined user access and website navigation.
C/Fortran experience
My programming journey began with C/C++ in high school, where I discovered the power of low-level languages. C's compactness thrived in direct hardware control, perfectly suited for projects like Arduino programming. Its lightning-fast execution speed also made it my weapon of choice in competitive programming.
While my experience with Fortran isn't as extensive as with JavaScript or C/C++, I've gained a solid understanding of its strengths in the realm of scientific computing. Fortran's optimized support for numerical operations and arrays makes it a natural choice for tasks like linear algebra and simulations.
Interest in stdlib
The Stdlib library offers a robust collection of utility functions and modules that streamline development efforts within the Node.js ecosystem. It encompasses a wide range of functionalities, including mathematical and statistical computations, and file system operations. This comprehensive toolkit empowers developers of varying skill sets to efficiently address diverse project requirements.
Furthermore, Stdlib prioritizes excellence, performance optimization, and adherence to established industry standards. This unwavering commitment fosters trust and widespread adoption within the Node.js community. This has made me interested in this organization and motivated me to contribute.
Version control
Yes
Contributions to stdlib
Merged
feat: Added utils/none-in-by #1416
feat: add array/base/count-same-value-zero #1384
Open
refactors blas/ext/base/sfill to follow current projects convention #1809
refactors blas/ext/base/snansumpw to follow current project conventions #1711
Goals
The main goal of this project is to implement a broader range of statistical distributions that are present in the SciPy library directly within the Stdlib library. Currently, Stdlib users who need to work with a wider variety of statistical distributions often rely on external libraries like SciPy. This integration project aims to eliminate that dependency by bringing the power of SciPy's stats module right into Stdlib.
By implementing all the distributions found in SciPy's stats module, Stdlib users will have access to a significantly richer toolkit for statistical analysis. They'll be able to calculate key distribution properties like PDFs, CDFs, and quantiles, all without leaving the Stdlib environment. Additionally, the ability to generate random variates based on these distributions will further enhance Stdlib's capabilities for data simulation tasks. This expanded functionality will streamline the workflow for developers working on data analysis projects in JavaScript, allowing them to focus on their core analysis tasks without worrying about managing external libraries.
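To make the kinds of functions listed above concrete, here is a minimal sketch for one distribution, the log-logistic with scale `alpha` and shape `beta`. The function names and the injectable `rand` argument are assumptions made for illustration only; they are not stdlib's actual package layout or API, and the sketch uses the built-in `Math` object rather than stdlib's own math packages.

```javascript
'use strict';

// Hypothetical helper names; NOT stdlib's API.
// Log-logistic distribution with scale alpha > 0 and shape beta > 0.

// Returns true only for strictly positive parameters (false for NaN).
function validParams(alpha, beta) {
  return alpha > 0 && beta > 0;
}

// CDF: F(x) = 1 / (1 + (x/alpha)^(-beta)) for x > 0, and 0 otherwise.
function logLogisticCDF(x, alpha, beta) {
  if (Number.isNaN(x) || !validParams(alpha, beta)) {
    return NaN;
  }
  if (x <= 0) {
    return 0;
  }
  return 1 / (1 + Math.pow(x / alpha, -beta));
}

// Quantile: the inverse of the CDF, Q(p) = alpha * (p / (1-p))^(1/beta).
function logLogisticQuantile(p, alpha, beta) {
  if (!(p >= 0 && p <= 1) || !validParams(alpha, beta)) {
    return NaN;
  }
  return alpha * Math.pow(p / (1 - p), 1 / beta);
}

// Random variate via inverse transform sampling: feed a uniform draw on
// [0,1) through the quantile function. `rand` is injectable so that a
// seeded generator can be substituted (an assumption of this sketch).
function logLogisticRandom(alpha, beta, rand = Math.random) {
  return logLogisticQuantile(rand(), alpha, beta);
}

console.log(logLogisticCDF(2, 2, 3));      // value at the median x = alpha
console.log(logLogisticRandom(2, 3));      // one pseudorandom draw
```

Because the quantile function has a closed form here, inverse transform sampling is the simplest option; distributions without a closed-form quantile would need a different sampling method.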
Approach
To accomplish this project, I have categorized the major continuous and discrete distributions by how complex they are to implement and by which utility functions they require.
First Category - This includes distributions whose implementations are straightforward and whose required dependencies are all present in stdlib/math/base/special. I have tried to order them by the complexity of the distribution formula.
Boltzmann, Bradford, Half-normal, Log-normal, ARGUS, Planck, Dagum, Gibrat, Rademacher, Inverse Weibull, Log-logistic, Anglit, Log-gamma, Gompertz, Folded Cauchy, Half-Cauchy, Half-logistic.
Second Category - The distributions in this category are somewhat more complex and require some utility and math functions to be implemented first. They may also build on first-category distributions.
Crystal Ball, Burr (III), Double Weibull, Double Laplace, Maxwell, Zipf, von Mises, Wald, Non-central chi-square, Rice, Studentized range, Skellam, Fatigue-life
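As an illustration of what "first category" means in practice, here is a minimal sketch of the Gompertz PDF and CDF in SciPy's standardized form with shape `c`. It needs only exponential and logarithm functions. The function names and the NaN-on-invalid-input handling are assumptions of this sketch, and it relies on the built-in `Math` object rather than stdlib packages.

```javascript
'use strict';

// Hypothetical helper names; NOT stdlib's API.
// Gompertz distribution (standardized form) with shape c > 0:
//   f(x; c) = c * exp(x) * exp(-c * (exp(x) - 1)),  x >= 0
//   F(x; c) = 1 - exp(-c * (exp(x) - 1)),           x >= 0

// Accept only finite, strictly positive shape values (false for NaN).
function validShape(c) {
  return c > 0 && c < Infinity;
}

function gompertzPDF(x, c) {
  if (Number.isNaN(x) || !validShape(c)) {
    return NaN;
  }
  if (x < 0 || x === Infinity) {
    return 0;
  }
  // Evaluate in log-space so that large x underflows to 0 instead of
  // producing Infinity * 0.
  return Math.exp(Math.log(c) + x - (c * Math.expm1(x)));
}

function gompertzCDF(x, c) {
  if (Number.isNaN(x) || !validShape(c)) {
    return NaN;
  }
  if (x <= 0) {
    return 0;
  }
  // -expm1(t) computes 1 - exp(t) without cancellation for small |t|.
  return -Math.expm1(-c * Math.expm1(x));
}

console.log(gompertzPDF(0.5, 1.5));
console.log(gompertzCDF(0.5, 1.5));
```

A second-category distribution would differ mainly in that one of the calls above would be a special function that has to be written first.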
Why this project?
I'm highly motivated by the opportunity to contribute to a prominent open-source organization like Stdlib. The chance to collaborate with a community developing such an important JavaScript library is a great learning opportunity and great for my resume too. I have had a knack for mathematics and data science since my high school days. I like to learn about the special functions and distributions used to derive results from data, and getting the chance to write those functions in a library for a prominent language like JavaScript is too good an opportunity to pass up.
Two reasons which excite me most are -
Empowering the JavaScript Community: Currently, JavaScript developers often rely on external libraries like SciPy for comprehensive statistical analysis. This project has the potential to transform Stdlib into a powerhouse for statistical computing within JavaScript. By integrating the vast array of distributions from SciPy's stats module, we can equip the entire JavaScript data science community with a powerful toolkit directly within their preferred environment. Imagine the possibilities!
Expanding My JavaScript and Data Science Skillset: Participating in this project presents a fantastic opportunity to deepen my understanding of both JavaScript and data science concepts. Working on the implementation of these statistical distributions will not only enhance my programming skills but also solidify my grasp of various statistical methodologies. This project is a win-win, allowing me to contribute to a valuable library while simultaneously strengthening my skillset.
Overall, the chance to contribute to a prominent community project like this, while simultaneously expanding my knowledge in both JavaScript and data science, is incredibly motivating. I'm eager to dive in and be a part of making Stdlib an even more powerful tool for web-based numerical computing.
Qualifications
I have good experience in JavaScript from working on my projects and internships, and I also know Python from developing projects for AI courses. I am well versed in object-oriented programming and in C/C++, as I have been practicing data structures and algorithms since the start of college.
My academic background has equipped me with a strong foundation for working on this project. Completing a data science course provided a solid understanding of statistical concepts and their applications. Additionally, coursework in the Theory of Computation (TOC) and advanced probability has solidified my technical foundation for the algorithmic challenges involved in integrating SciPy's distributions.
My contributions to STDLIB have given me familiarity with the codebase and the ability to do further work on it and enhance its usability.
Thus, with my programming experience and skills and the knowledge I have acquired in my academic journey, I find myself a good fit for the project. I am confident that I can complete it.
Prior art
On researching and understanding the project I have the following observations.
These references are enough to properly understand what must be done in this project and how to proceed.
Commitment
1 May - 26 May -> Bonding Period
27 May - 7 July -> 40 hours/week ( 40 * 6 )
8 July - 17 August -> 20 hours/week ( 20 * 6 )
Total = 240 + 120 = 360 hours
I don’t plan to take vacations.
Schedule
Assuming a 12-week schedule,
Community Bonding Period:
During this period, I will work on implementing distributions that are straightforward and already have the necessary utilities and mathematical functions in place. To start, I will take the Boltzmann, Bradford, half-normal, and log-normal distributions, and others if possible.
I will further implement the API for drawing random variates and consult with the mentors. I would also like to share my views on which properties of each distribution can be implemented, and on better ways to write documentation and tests using the SciPy library.
Week 1 and Week 2:
I will begin with the first-category distributions whose dependencies are already implemented. I will complete the following First Category distributions: ARGUS, Planck, Dagum, Gibrat, inverse Weibull, log-logistic, and folded Cauchy.
All of these distributions have issues already opened that remain unaddressed.
I will also implement APIs for drawing random variates. Then, I will review the prepared packages with the mentors and discuss any issues. If these are completed successfully, I will try my best to take on more distributions.
Week 3:
During this week, I will fix the bugs or issues found in the previous implementations and refine them so that they can serve as a base for all further implementations. I will also start working on further distributions.
Week 4 and Week 5:
In this period, I will work on the first-category distributions whose implementation is less straightforward. I will start by properly understanding each distribution and its properties, and will then implement it following the previously refined packages.
Week 6: (midterm)
My goal will be to complete most of the distributions in the First Category by the midterm evaluation, clear any backlog, and resolve any outstanding issues or bugs.
Week 7 and Week 8:
After gaining experience from the first-category distributions, I will start with the initial distributions of the Second Category: Crystal Ball, Burr (III), Double Weibull, Double Laplace, Maxwell, and Zipf.
Here, distributions like Maxwell need gammaincc as a utility function, which is not present, so I will add such mathematical functions as well.
I will also incorporate APIs for drawing random variates.
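For reference, the Maxwell dependency mentioned above can be written out explicitly. Assuming a scale parameter $a > 0$ (notation introduced here, not taken from the proposal), the survival function $S = 1 - F$ is the regularized upper incomplete gamma function $Q$ (the quantity SciPy exposes as `gammaincc`) with first argument $3/2$. For that particular argument it also reduces to the complementary error function, which may offer an alternative route for this one distribution:

```latex
S(x; a) \;=\; Q\!\left(\tfrac{3}{2},\, \frac{x^{2}}{2a^{2}}\right)
\;=\; \operatorname{erfc}\!\left(\frac{x}{a\sqrt{2}}\right)
\;+\; \sqrt{\frac{2}{\pi}}\,\frac{x}{a}\, e^{-x^{2}/(2a^{2})},
\qquad x \ge 0 .
```

The second equality follows from $\Gamma(\tfrac{3}{2}, z) = \tfrac{1}{2}\,\Gamma(\tfrac{1}{2}, z) + z^{1/2} e^{-z}$, together with $\Gamma(\tfrac{1}{2}, z) = \sqrt{\pi}\,\operatorname{erfc}(\sqrt{z})$ and $\Gamma(\tfrac{3}{2}) = \sqrt{\pi}/2$.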
Week 9 and Week 10:
During these weeks, I plan to finish the work on distribution functions with complete documentation and testing. I will complete the remaining distributions in the Second Category and address any issues that arise during review of my work.
Week 11 and Week 12:
In this phase, I will complete any remaining backlog, thoroughly test the complete functionality across as many scenarios as possible, and resolve any bugs. I will also add any additional distribution-related APIs identified in discussions with the mentor.
Final Week:
I will complete the project and wrap everything up during this week, submit my work, and take suggestions from the mentors.
Notes:
Related issues
No
Checklist
[RFC]:
and succinctly describes your proposal.