
pymultifit submission #233


Open · 14 of 32 tasks
syedalimohsinbukhari opened this issue Jan 21, 2025 · 20 comments

@syedalimohsinbukhari commented Jan 21, 2025

Submitting Author: Syed Ali Mohsin Bukhari (@syedalimohsinbukhari)
All current maintainers: (@syedalimohsinbukhari)
Package Name: pymultifit
One-Line Description of Package: A Python library for fitting data with multiple models.
Repository Link: https://github.com/syedalimohsinbukhari/pyMultiFit
Version submitted: v1.0.6 (originally v1.0.3)
EiC: @coatless
Editor: @Batalex
Reviewer 1: @g4brielvs
Reviewer 2: @KristinaGagalova
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

pymultifit is built primarily to solve one problem: fitting multiple models (and mixture models) to given data. Be it multiple Gaussians, Laplacians, or a mixture of such models, this package aims to handle multi-model data fitting. The package also provides easy-to-use BaseDistribution and BaseFitter classes for user-defined functions.

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization¹
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific

  • Geospatial
  • Education

Community Partnerships

If your package is associated with an existing community, please check below:

  • For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?

Researchers, data scientists, and statisticians who work with datasets requiring multi-model fitting for robust analysis and modeling.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

Apart from the general-purpose scientific packages scipy, lmfit, and scikit-learn, there is PyAutoFit, a Python-based probabilistic programming language built on Bayesian inference. Another notable library is Mixture-Models, which specializes in advanced optimization techniques for fitting various families of mixture models, including Gaussian mixture models and their variants. Both libraries are powerful tools for specific use cases; I only recently became aware of them while searching for existing options.

While these libraries offer robust solutions for hierarchical modeling (PyAutoFit) or a diverse array of pre-defined mixture models (Mixture-Models), pyMultiFit distinguishes itself through its simplicity and ease of use. Specifically, it is designed to provide a lightweight, user-friendly framework for fitting multi-model data, including custom mixture models (for example, gaussian + laplace + line). pymultifit also provides easy-to-use base classes that can be adapted for any distribution or fitter.

One of the more prominent features of pyMultiFit is the BaseFitter template class, which provides custom fitting for any definable function with minimal boilerplate code. All plotting and boundary functionality is handled inside the template class, so the user can focus on running through multiple models quickly without worrying about managing several models of the same type, or even of different types.

Additionally, the generators module provides template functions that generate N-model data with optional added noise, to mimic real-life scenarios for whatever distribution the user wants.
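
For illustration, here is a minimal sketch of the generator workflow, adapted from the gamma example quoted later in this thread (it assumes pymultifit is installed from PyPI rather than imported from the repository's src layout):

import numpy as np
from pymultifit import EPSILON
from pymultifit.generators import multi_gamma_sr

# Three gamma components, each parameterized as a tuple.
params = [(2, 2, 1, 5), (4, 6, 2, 1), (1, 3, 2, 9)]
x = np.linspace(EPSILON, 15, 1000)

# Generate noisy multi-model data mimicking a real-world signal.
y = multi_gamma_sr(x, params=params, noise_level=0.05)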

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • uses an OSI approved license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a tutorial with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration setup, such as GitHub Actions, CircleCI, and/or others.

Publication Options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option allows reviewers to open smaller issues that can then be linked to PRs, rather than submitting a denser text-based review. It also allows you to demonstrate addressing each issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Footnotes

  1. Please fill out a pre-submission inquiry before submitting a data visualization package.

@coatless

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci
review. Below are the basic checks that your package needs to pass
to begin our review. If some of these are missing, we will ask you
to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements
below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • The package imports properly into a standard Python environment (import package).
  • Fit The package meets criteria for fit and overlap.
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a CODE_OF_CONDUCT.md file.
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via a Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

I think there's enough novelty behind the multifitter and distribution approaches discussed to move forward with a full review. (For clarity, scipy provides scipy.stats.fit() for a single discrete or continuous variable, whereas pymultifit supports multiple models with different supports via MixedDataFitter.) Moreover, there are ample tutorials and a solid case study of applying the package to solve real-world problems.

My only concern is the accuracy benchmarks throwing RuntimeWarning notices. I think this feeds into a discussion from the pre-submission intake analysis regarding the numpy re-implementation.

For example, with arcsine() and beta(a=5, b=80, loc=-3, scale=5) we have:

Beta

https://pymultifit.readthedocs.io/latest/benchmarks/_bm_accuracy.html#beta(a=5,-b=80,-loc=-3,-scale=5)

/home/sarl-ws-5/PycharmProjects/pyMultiFit/src/pymultifit/distributions/utilities_d.py:168: RuntimeWarning: overflow encountered in power
  numerator = y**(alpha - 1) * (1 - y)**(beta - 1)
/home/sarl-ws-5/PycharmProjects/pyMultiFit/src/pymultifit/distributions/utilities_d.py:168: RuntimeWarning: overflow encountered in multiply
  numerator = y**(alpha - 1) * (1 - y)**(beta - 1)

Arcsine

https://pymultifit.readthedocs.io/latest/benchmarks/_bm_accuracy.html#arcsine()

/home/sarl-ws-5/PycharmProjects/pyMultiFit/src/pymultifit/distributions/utilities_d.py:71: RuntimeWarning: invalid value encountered in sqrt
  pdf_ = 1 / (np.pi * np.sqrt(y * (1 - y)))
/home/sarl-ws-5/PycharmProjects/pyMultiFit/src/pymultifit/distributions/utilities_d.py:71: RuntimeWarning: divide by zero encountered in divide
  pdf_ = 1 / (np.pi * np.sqrt(y * (1 - y)))
/home/sarl-ws-5/PycharmProjects/pyMultiFit/benchmarks/functions.py:30: RuntimeWarning: invalid value encountered in subtract
  pdf_abs_diff = np.abs(pdf_custom - pdf_scipy) + EPSILON
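
As an aside, one hedged way to localize such edge cases in numpy (illustrative only, not the package's code; arcsine_pdf here is a hypothetical stand-in) is to evaluate the expression only on the open support:

import numpy as np

def arcsine_pdf(y):
    # Evaluate 1 / (pi * sqrt(y * (1 - y))) only on the open interval (0, 1);
    # the endpoints diverge, so set them to inf explicitly instead of letting
    # numpy emit divide/invalid RuntimeWarnings.
    y = np.asarray(y, dtype=float)
    pdf = np.full_like(y, np.nan)
    inside = (y > 0) & (y < 1)
    pdf[inside] = 1.0 / (np.pi * np.sqrt(y[inside] * (1.0 - y[inside])))
    pdf[(y == 0) | (y == 1)] = np.inf
    return pdf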

@lwasser lwasser moved this from pre-review-checks to seeking-editor in peer-review-status Jan 29, 2025
@syedalimohsinbukhari (Author) commented Jan 29, 2025

Hi @coatless

Thanks for getting back. I've been meaning to address those issues for a while but was waiting for a response before proceeding. I've updated the required functions, and they shouldn't raise the same warnings now. The only remaining issue is that, for arcsine at x=1 (an edge case in testing), the pdf evaluates to np.inf in both implementations, hence the invalid-value warning.


Cheers.

@coatless

@syedalimohsinbukhari any reason for not directly using np.arcsin()? I'm not having that issue with the built-in version:

[Screenshot: a Jupyter notebook comparing a custom implementation that raises a similar warning with NumPy's built-in np.arcsin()]

Could you restore the docstring for beta_pdf_():

syedalimohsinbukhari/pyMultiFit@517e6fe#diff-901f230daf84fc7cccf4819b150e039e69d3b060a1bcc2279a55f84ce2570da6L118

@syedalimohsinbukhari (Author) commented Jan 29, 2025

Hi @coatless,

I think this image should clear things up. The ArcSineDistribution is not the np.arcsin of trigonometry; it is the arcsine probability distribution, with pdf $f(y) = \dfrac{1}{\pi\sqrt{y(1-y)}}$, which is where the square root comes from.

For reference: the Wikipedia article and the scipy docs.


Thanks for the heads-up about beta_pdf_; strangely, it shows up in my local builds but not in recent RTD builds. I'll look into it more.


@syedalimohsinbukhari (Author)

Hi @coatless

Just letting you know that the beta_pdf_() docstring is now showing up on the RTD servers as well.

https://pymultifit.readthedocs.io/latest/distributions/utilities_d.html#pymultifit.distributions.utilities_d.beta_pdf_

@coatless

@syedalimohsinbukhari thanks! I'm going to work on getting an editor assigned to move the review forward.

@syedalimohsinbukhari (Author)

@coatless Awesome!! Looking forward to it.

@syedalimohsinbukhari (Author) commented Feb 18, 2025

Hi @coatless

I'm working on some updates and wanted to ask whether it's okay to push the next version, or whether I should keep the version as-is since it's the one mentioned in the submission.

@coatless

@syedalimohsinbukhari go for it! We're still working on getting editors.

@coatless commented Mar 6, 2025

@syedalimohsinbukhari Thanks for your patience. I've secured an editor to further move the review along.

I am happy to announce that @Batalex will be the editor for your submission.

@lwasser lwasser moved this from seeking-editor to under-review in peer-review-status Mar 6, 2025
@syedalimohsinbukhari (Author)

Hi @Batalex.

Welcome, and I'm looking forward to working with you.

@Batalex (Contributor) commented Mar 8, 2025

Hello @syedalimohsinbukhari, nice to meet you.
I am excited to be part of the pymultifit review, and I'll get started on my search for reviewers.

@syedalimohsinbukhari (Author)

> Hi @coatless
>
> I'm working on some updates and wanted to ask whether it's okay to push the next version, or whether I should keep the version as-is since it's the one mentioned in the submission.
>
> @syedalimohsinbukhari go for it! We're still working on getting editors.

Hi, @coatless, @Batalex

As mentioned previously, I've updated the submission version for pymultifit, streamlining some functionality without changing much of the infrastructure. The docs, examples, and tutorials are up to date with this version, v1.0.6.

@Batalex (Contributor) commented Mar 10, 2025

Hey, thanks for letting us know. My personal policy when it comes to the reviews I lead as the editor is as follows:

  • you can work as you please on the code base; there is no need to freeze anything during the review process. This matters because finding reviewers can take a while, and there is no point in asking you to delay your work for several months
  • we'll ask the reviewers to review the version submitted. Since there is no reviewer at this time, feel free to bump to the latest release.
  • however, as soon as reviewers are assigned to the review, the version submitted in the issue header will stay the same. Making sure that reviewers' concerns raised against the older codebase are properly addressed in the newer one will be your job

@Batalex (Contributor) commented Apr 26, 2025

Hi @g4brielvs and @KristinaGagalova 👋 Thank you for volunteering to review for pyOpenSci! Please don't hesitate to introduce yourselves.

@syedalimohsinbukhari, I am pleased to report that we have our two reviewers 🎉

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please note that we will review revision v1.0.6

Please get in touch with any questions or concerns! Your review is due: May 19th

Reviewers: @g4brielvs, @KristinaGagalova
Due date: 2025/05/19

@Batalex (Contributor) commented May 10, 2025

Hey there @syedalimohsinbukhari,
Even as the editor, I like to go through submissions and give my insights on what could be improved. This is not a formal review, and my remarks are more about style (packaging, code quality) than content (the actual fitters), but I would certainly appreciate it if you implemented some of these suggestions!
I'll probably revisit this if I can find some more time; there is still some content I have not gone through.

Pyproject.toml

I would advise migrating as much as possible from setup.py to pyproject.toml.
pyproject.toml is now well established in the community, so you can find a lot of documentation for it (shamelessly linking to our own guide on the subject.)

There are several benefits to this approach: dependency groups, having the linter/formatter/test configuration in one single file, using the standard tooling, better readability…
I must admit that I am not super familiar with conda packaging and how we can avoid duplicated information there.

Automated tooling

I always recommend setting up automation to handle "the boring stuff": code format, linter, type checker, running tests, building docs, etc.

Formatter

It seems that you already have some kind of IDE built-in formatter for your project (or you spent quite some time yourself aligning those brackets and commas by hand :D).
The problem with this approach is that it is not scalable.
Should an external contributor work on your project, they might not use the same editor.
Therefore, enforcing a consistent format in your codebase will be up to you: that means low added-value commits removing blank spaces or switching single quotes to double ones.

By enforcing a format using a CLI tool such as ruff format, you make sure that everyone can apply the same standard code format.
You can add a quality check to your continuous integration on GitHub to avoid adding non-formatted code to your code base.

Task runner

Building / maintaining a package is hard.
There is only so much we can remember about the various code quality tools we use.
It's even harder when you consider that (1) you write code to solve an issue you have, not just for the sake of it and (2) other contributors might have different experiences when it comes to this stuff.

By using a task runner, you can offload part of this burden to a third-party tool that encapsulates the complex commands to run.
The most popular one for Python is tox but nox has a special place in my heart.

Here is what it can look like. Of course, this specific example is not directly applicable to pyMultiFit since it is using ruff and uv.
nox (installed on your system) takes this noxfile.py and creates a task for each decorated function. A task is executed in its own virtual environment, so you do not mess with your system or project dependencies at all.

# noxfile.py
import nox

nox.options.default_venv_backend = "uv"
nox.options.reuse_venv = "yes"
nox.options.sessions = ["fmt", "lint"]

@nox.session()
def fmt(session: nox.Session) -> None:
    """Format source code."""
    session.run(
        "uv",
        "sync",
        "--frozen",
        "--only-group",
        "fmt",
        env={"UV_PROJECT_ENVIRONMENT": session.virtualenv.location},
    )
    session.run("ruff", "format", "src", "tests")


@nox.session()
def lint(session: nox.Session) -> None:
    """Lint source code."""
    session.run(
        "uv",
        "sync",
        "--frozen",
        "--group",
        "lint",
        env={"UV_PROJECT_ENVIRONMENT": session.virtualenv.location},
    )
    session.run("ruff", "check", "--fix", "src")
    session.run("ruff", "check", "--fix", "tests")
    session.run("mypy", "src")


@nox.session()
def tests(session: nox.Session) -> None:
    """Run the unit tests."""
    session.run_install(
        "uv",
        "sync",
        "--frozen",
        "--group",
        "unit",
        env={"UV_PROJECT_ENVIRONMENT": session.virtualenv.location},
    )
    session.run("python", "-m", "pytest", *session.posargs)

Then, you get this nice user experience out of the box:

❯ nox -l
Tasks definition for the nox runner.

Sessions defined in /Users/alex/code/example/noxfile.py:

* fmt -> Format source code.
* lint -> Lint source code.
- tests -> Run the unit tests.

sessions marked with * are selected, sessions marked with - are skipped.

Want to format and lint your code? Just run nox. Run the unit tests? nox -s tests.
You can add as many sessions as you need, like building the docs, fetching an external resource, etc.

Your GitHub workflows are simplified as well:

  tests:
    name: Unit test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install nox (and other tools if you need them)
        run: pipx install nox
      - name: Run tests
        run: nox -s tests

Type checker

I commend your efforts on adding some type annotations, but while doing so you must be absolutely certain that they are correct.
My personal take is that it's better to not have any type hint rather than incorrect ones.
For instance, in the gamma fitter example:

import numpy as np
from matplotlib import pyplot as plt

from src.pymultifit import EPSILON
from src.pymultifit.fitters import GammaFitterSR
from src.pymultifit.generators import multi_gamma_sr

params = [(2, 2, 1, 5), (4, 6, 2, 1), (1, 3, 2, 9)]
x = np.linspace(EPSILON, 15, 1000)

noise_level = 0.05
y = multi_gamma_sr(x, params=params, noise_level=noise_level)

In the last line, params=params is underlined in my editor with the following error, which signals to your user trying out your package that they are doing something wrong:

error: Argument of type "list[tuple[int, int, int, int]]" cannot be assigned to parameter "params" of type "listOfTuplesOrArray" in function "multi_gamma_sr"
    [...] (reportArgumentType)

The fix is absolutely not trivial. Sure, we could change listOfTuples to List[Tuple[Union[float, int], ...]], so that the example provided would still be valid. But what about

params = [[2, 2, 1, 5], [4, 6, 2, 1], [1, 3, 2, 9]]
# or
params = np.array([[2.0, 2.0, 1.0, 5.0], [4.0, 6.0, 2.0, 1.0], [1.0, 3.0, 2.0, 9.0]], dtype=np.float16)

?
According to the source code, that should still work, but it will be underlined as well.

Unfortunately, numeric data structures are one of the hardest problems when it comes to type annotations. The main takeaways here are:

  • If you want to restrict the type annotations to just a few types, that's fine; users can live with using a list of tuples instead of a 2D np.float16 array. They just need this restriction explicitly documented, and the rest of the code base kept coherent with this design decision.
  • You can be less restrictive with types using something like params: Sequence[Sequence] (or even Any, because I am not sure whether a 2D numpy array qualifies as a Sequence[Sequence]). The typing experience would be looser, but you would defer checking the correctness of the data structure to a runtime check (see the sketch after this list).
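
As a hedged illustration of that second option (the function and alias names here are hypothetical, not pymultifit's actual signatures):

from typing import Sequence, Union

import numpy as np

# Loose annotation: accept nested sequences or an ndarray, then validate at runtime.
ParamsLike = Union[Sequence[Sequence[float]], np.ndarray]

def coerce_params(params: ParamsLike) -> np.ndarray:
    """Coerce params to a 2D float array, deferring structural checks to runtime."""
    p = np.asarray(params, dtype=float)
    if p.ndim != 2:
        raise ValueError("params must be a 2D sequence of per-model parameter tuples")
    return p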

To help you with the first point ("the rest of the code base coherent with this design decision"), I encourage you to use a type checker. Yes, another tool to get familiar with :D
The most popular ones for Python are pyright and mypy.

Be warned that running a type checker for the first time will probably highlight dozens of issues throughout your codebase. It just happens that we make plenty of assumptions while writing code. For instance, in utilities_f.py, at the very end, plotter.set_xlabel(x_label if x_label else 'X') triggers an error for the type checker because plotter is either a list or a matplotlib Axes. The utility function makes the (probably correct) assumption that we have an axes object by this point, and that we can access its set_xlabel method, but from a static analysis perspective we might as well be trying to access this method on a list.
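
One hedged way to satisfy the checker in such cases is an explicit runtime narrowing (label_x is a hypothetical wrapper, not the package's utility):

from typing import Union

from matplotlib.axes import Axes

def label_x(plotter: Union[list, Axes], x_label: str = "") -> None:
    # Narrow the union before calling an Axes method; the isinstance check
    # documents the assumption and satisfies static analysis.
    if not isinstance(plotter, Axes):
        raise TypeError("expected a single Axes at this point")
    plotter.set_xlabel(x_label if x_label else "X")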

Misc.

  • I encourage you to give the PEP 8 style guide a read. You sometimes use a format not commonly found in Python codebases (like listOfTuples mentioned above; a typical Python codebase would use ListOfTuples). Let's be clear, this is not a game changer; it will not make your code run any faster or anything like that. However, by following the stylistic conventions commonly found in Python projects, you make it easier for people to contribute to your project.
  • There are a few instances in the example folder where you pass a show_individual argument. The correct name is the plural form.
  • I saw a few instances of function signatures where one of the arguments has a default None value. You must be careful with the type annotation, as in the _plot_fit method with data_label: Union[list[str], str] = None. If data_label can be either a list of str or a str, then it cannot be None, so you should use Optional as you did for a few other args.
  • There are quite a few variables/modules assigned/imported but never used. It's not a big deal, but it's always nice to clean those up.
  • "Star imports" are usually frowned upon in Python (e.g., in distributions/backend/__init__.py with from .polynomials import *). The reason is that star imports can silently override local definitions, which can lead to hard-to-debug issues. A better approach is to explicitly list what you want to import, or just import the module (import polynomials) to keep the namespace clean; a short sketch follows.

@syedalimohsinbukhari (Author)

Hi @Batalex,

Thanks for the insight; I always welcome any and all suggestions/recommendations. I'll be sure to address these issues. Cheers.

@syedalimohsinbukhari (Author)

@Batalex,

Hi, just dropping in here to confirm something.

I've worked ahead of v1.0.6 but couldn't push it before the reviewers were assigned. That work differs significantly from the submitted version in a few functions.

Keeping in mind that the review is for v1.0.6, I've made a branch, pyopensci-review, that is just a couple of minor corrections ahead of the submitted version. If it is okay with everyone, I intend to apply the review's corrections there before merging into main.

If you have any other suggestions, please let me know.

@KristinaGagalova

Hi @syedalimohsinbukhari,
Thank you for submitting pymultifit and working on such a neat project. Please see my review; overall it's positive, and I only have a few suggestions.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

~2

Review Comments

The package is thoughtfully designed and well-documented, which reflects best practices in modularity, extensibility, and usability. Please find constructive suggestions aimed at improving the developer and user experience further.

  1. Unused normalize parameter in exp_pdf, exp_cdf, and Exponential; it does not affect the output.
     Remove it unless you plan to implement normalization logic.
     Example removal:

def exp_pdf(x, lambda_):
    return lambda_ * np.exp(-lambda_ * x)

  2. To prevent overflow in the exponential of the black-body term (especially for large x / kt), consider capping or regularizing in the black_body function:

den_ = np.expm1(np.minimum(x / kt, 700))  # avoid overflow

  3. As @Batalex suggested previously, it would be good to add a parameter-check function that evaluates whether the input is as expected. It's up to the author to decide how strict that should be.

Style suggestions:

  • It would be suitable to add type hints to functions so that the output type is stated clearly:
def exp_pdf(x: np.ndarray, lambda_: float, normalize: bool = False) -> np.ndarray:

Other suggestions:

  • It would be valuable to add a simple test with pytest to ensure that functions such as Exponential().pdf(...) return expected results for known inputs (see the sketch after this list).
  • You propose easy, out-of-the-box support for multi-fitters, namely polyfit and curve_fit. Do you have any benchmarking or performance testing on these, showing how pyMultiFit improves on them?
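
A hedged sketch of such a test (the import path and the default rate of Exponential are assumptions, not confirmed against the package):

import numpy as np
from scipy.stats import expon

from pymultifit.distributions import Exponential  # assumed import path

def test_exponential_pdf_matches_scipy():
    x = np.linspace(0.0, 5.0, 101)
    expected = expon.pdf(x, scale=1.0)  # scipy uses scale = 1 / lambda
    result = Exponential().pdf(x)  # assumes a default rate of 1
    np.testing.assert_allclose(result, expected, rtol=1e-10)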

@g4brielvs

@syedalimohsinbukhari Thank you for submitting this package for review. I'll conclude my comments this week; apologies for the delay.

Status: under-review · 6 participants