Feature: Automatic generation of StrictDoc's qualification data package #2147

stanislaw · 2025-04-13T09:23:00Z

stanislaw · 2025-04-13T09:24:42Z

Any feedback is greatly appreciated!

cc @haxtibal @johanenglund @thseiler @richardbarlow @nicpappler

richardbarlow · 2025-04-17T08:24:52Z

This looks like a great plan! Thank you for putting the time and effort in to help ensure that StrictDoc is easy to adopt.
Hannah, Philippa and I have just reviewed it and have some questions/comments/observations.

Before getting into the details I thought I'd make it clear what we (gaitQ) need as a Medical Device (MD) manufacturer:

For any tool that we use in the development of our MDs we must ensure that the tool meets our intended use of the tool.
What that means is that there is no single method of qualifying StrictDoc for use across different MD manufacturers as they may each have a different intended use for the tool.
It is up to us to create a validation plan for every tool that we use and there are no common standards across the industry for requirement capture tools.

As discussed in the office hours the other week, our preferred approach would be for us to map our requirements to StrictDoc's requirements.
We would then depend on StrictDoc having a sufficiently rigorous internal test process that is documented and produces traceable test results - as you are proposing.
Each time we update the version of StrictDoc we use, we would review the StrictDoc documentation to ensure that the processes have not changed (or assess them for acceptability if they have), download and archive a copy of the qualification data pack and finally review the qualification data pack to ensure that our requirements are still met.

Document the approach in the SDoc documentation.

In addition to the documentation of the qualification data package generation, we feel that there a couple of additional bits of documentation missing to ensure that quality is maintained in general (unless we're just looking in the wrong place?):

A defined process for how changes are reviewed and merged into StrictDoc. E.g. PRs must have n approvals from x, y, z before merge.
Requirements for overall test coverage. E.g. no PR will lower the overall coverage unless under exceptional circumstances with a documented rationale
Ideally both of these would be automatically enforced in the GitHub PR process.
We also think that adding a template for PRs that includes the checklist from the 'Contributing to StrictDoc' would be beneficial to help ensure that this process is followed.

Currently, StrictDoc only has the following artifacts generated automatically...

Something that I wanted to ask in the most recent office hours call was what do you consider to be the 'Device Under Test' when it comes to testing a StrictDoc release?
I see that there are no artifacts associated with the 'Releases' in GitHub (other than the automatically generated zip/tarball), so presumably the source distribution uploaded to PyPI is the canonical release artifact for a StrictDoc release?

We would prefer that all of the qualification steps are performed against the actual artifact that will be pushed to PyPI.
I appreciate that this unfortunately isn't the whole story due to dependencies, but it would go a long way to giving us some confidence.
I can fully understand why you would be looking to qualify a container image that contains all dependencies, but achieving that is quite a big task in itself and we would be happy with less for now.
Ultimately we may end up deciding that we have to re-run the StrictDoc qualification steps ourselves with fully pinned dependencies, but this is a decision for us based on our risk management.

Have you considered attestation of both the source release pushed to PyPI and the qualification data pack?
It would give us a great deal of comfort to be able to prove that the version of software published to PyPI and its associated qualification data pack are authentic and match the git tag.

Deploy the package under a current version or a public location (qualification repository).

If PyPI is the repository for release artifacts then to me it would make the most sense for it to live along side those.
However, this is outside of the scope of PyPI so it probably not possible.
Perhaps adding the qualification data pack to the GitHub Release record would be the next best thing?

I think it's outside the scope of this issue, but we also need the traceability of StrictDoc's requirements through to the source - at least for the requirements that we depend on.
We believe the most value can be found from tracing the requirements to the integration tests, rather than the implementation itself.

We are in the process of defining our requirements for StrictDoc and should be able to get something to you in the next week or two.

stanislaw · 2025-04-20T13:30:33Z

This looks like a great plan! Thank you for putting the time and effort in
to help ensure that StrictDoc is easy to adopt.
Hannah, Philippa and I have just reviewed it and have some questions/comments/observations.

Thanks. This is very good input. Now that you are creating requirements for StrictDoc in a dedicated document, please consider including all these points there, so that we can address them in a structured and traceable manner.

To support the first round of discussion, my comments below. Later on, I will provide my answers in SDoc documents, traceable from your document.

Before getting into the details I thought I'd make it clear what we (gaitQ) need as a Medical Device (MD) manufacturer:
For any tool that we use in the development of our MDs we must ensure that the tool meets our intended use of the tool.
What that means is that there is no single method of qualifying StrictDoc for use across different MD manufacturers as they may each have a different intended use for the tool.
It is up to us to create a validation plan for every tool that we use and there are no common standards across the industry for requirement capture tools.

Understood, and as far as I know, this aligns with how safety-related developments are generally handled. I am also familiar with how the RTEMS RTOS prepares its Qualification Data Package (QDP) for a specific minimal profile across a set of pre-qualified hardware platforms. A user then has to integrate the RTEMS QDP into their larger project and demonstrate that the intended use remains within the envelope of the QDP. I consider the RTEMS QDP a good reference for many aspects of OSS qualification.

As discussed in the office hours the other week, our preferred approach
would be for us to map our requirements to StrictDoc's requirements.
We would then depend on StrictDoc having a sufficiently rigorous internal
test process that is documented and produces traceable test results - as you are proposing.
Each time we update the version of StrictDoc we use, we would review the
StrictDoc documentation to ensure that the processes have not changed
(or assess them for acceptability if they have),
download and archive a copy of the qualification data pack and finally
review the qualification data pack to ensure that our requirements are still met.

Document the approach in the SDoc documentation.

Understood and agree with the approach. Please include these as requirements in your document.

In addition to the documentation of the qualification data package
generation, we feel that there a couple of additional bits of documentation missing to ensure that quality is maintained in general (unless we're just looking in the wrong place?):

1. A defined process for how changes are reviewed and merged into StrictDoc.
> E.g. PRs must have n approvals from x, y, z before merge.

Within the small core team (@mettta and myself), we haven't been too strict about this so far, as many things were discussed directly. However, we have always reviewed and approved contributions from users.

Now that there are more contributors and we want to move towards a more formal development process, it would be a good idea to configure the GitHub settings to require at least one approval, and to document this approach in the development plan.

2. Requirements for overall test coverage. E.g. no PR will lower the
> overall coverage unless under exceptional circumstances with a documented rationale

This is an important open point. StrictDoc has four groups of tests, and none of them individually achieves 100% code coverage. The current coverage threshold for unit tests is set at 60%, mainly because some Python classes are primarily exercised through higher-level end-to-end tests for the CLI and web interface.

I am considering a solution where a custom Python script would validate that the combined code coverage does not drop below a certain percentage. However, the challenge is that the GitHub CI jobs are split between CLI and web end-to-end tests for performance reasons. They have to be parallelized, otherwise, a single test job would take much longer for each PR.

Ideally both of these would be automatically enforced in the GitHub PR
process.

I need to think through a good solution to this. Maybe we could make an exception for GitHub CI Linux jobs because they are the fastest. If all tests on Linux are merged into one job that calculates the combined coverage and sets a limit, this would give a day-to-day validation of the code coverage threshold.

We also think that adding a template for PRs that includes the checklist
from the 'Contributing to StrictDoc' would be beneficial to help ensure that this process is followed.

This one is easy, opened an issue: #2169.

Currently, StrictDoc only has the following artifacts generated automatically...
Something that I wanted to ask in the most recent office hours call was
what do you consider to be the 'Device Under Test' when it comes to testing a StrictDoc release? I see that there are no artifacts associated with the 'Releases' in GitHub (other than the automatically generated zip/tarball), so presumably the source distribution uploaded to PyPI is the canonical release artifact for a StrictDoc release?

Yes, the plan was to use PyPI as the device under test.

We would prefer that all of the qualification steps are performed against
the actual artifact that will be pushed to PyPI.

That was the plan.

I appreciate that this unfortunately isn't the whole story due to
dependencies, but it would go a long way to giving us some confidence.
I can fully understand why you would be looking to qualify a container
image that contains all dependencies, but achieving that is quite a big task in itself and we would be happy with less for now.
Ultimately we may end up deciding that we have to re-run the StrictDoc
qualification steps ourselves with fully pinned dependencies, but this is a decision for us based on our risk management.

It is an interesting trade-off between development convenience and precise version pinning.

StrictDoc used to have all its direct dependencies pinned, but that created some overhead because it requires manual updates of all dependencies and reacting to security update notifications.

If strict pinning was required, we would need a complete list of dependencies frozen recursively. I would suggest keeping this as a separate exercise, to be done only if a more rigorous approach becomes necessary.

Have you considered attestation of both the source release pushed to PyPI
and the qualification data pack?
It would give us a great deal of comfort to be able to prove that the
version of software published to PyPI and its associated qualification data pack are authentic and match the git tag.

It is a great idea, and I need to think about how it could be implemented.

I used to have an automatic release and deployment process triggered by a GitHub release, but I had to disable it at some point because:

I encountered occasional issues when PyPI was temporarily down, causing the release job or dependency resolution to fail, often after a 30-minute run due to an unrelated network problem.
The end-to-end tests, while generally stable, have a small but nonzero percentage of flaky tests, typically caused by SeleniumBase timeouts or internal Selenium WebDriver handling. To address this, I had implemented a retry mechanism on Selenium exceptions, but some residual instability remains across the 250+ e2e tests. Failing the whole release job due to a single flaky test is not great.
Libraries like Ruff evolve quickly, which sometimes leads to CI failures when a new Ruff lint check is introduced and StrictDoc's code requires adjustment. This could be solved by pinning library versions, but again, it brings us back to the earlier trade-off of added maintenance overhead.
The current solution is to run all existing tests with a given PR, manually bump the version, manually release StrictDoc, and manually create a GitHub tag through the GitHub UI.

Ideally, the StrictDoc PIP package should already be available when the qualification tasks start running. However, I have observed that after a release to PyPI, it sometimes takes a few seconds or minutes before the new package becomes fully downloadable. A related issue: after releasing, one needs to trigger one pip install to first download the old version and then the new becomes available as if the pip install action refreshes some caches with PyPI 😕 . This suggests that tightly coupling release and qualification tasks into a single job could be tricky.

I am open to discussing the best approach for improving this. So far, solving it has not been the highest priority because the manual process has worked reliably enough.

Deploy the package under a current version or a public location
(qualification repository).
If PyPI is the repository for release artifacts then to me it would make
the most sense for it to live along side those.
However, this is outside of the scope of PyPI so it probably not possible.
Perhaps adding the qualification data pack to the GitHub Release record would be the next best thing?

We need to think about connecting the PyPI stable release and the qualification data pack. I am not sure if coupling directly the PyPI release and the qualification data pack is a good idea from the release process convenience perspective.

I think it's outside the scope of this issue, but we also need the
traceability of StrictDoc's requirements through to the source - at least for the requirements that we depend on. We believe the most value can be found from tracing the requirements to the integration tests, rather than the implementation itself.

We are planning to trace requirements to both code and tests. We have created this diagram to structure more how StrictDoc's own documentation has to be traced: #2167.

With a few exceptions, we are almost certain this structure should cover everything StrictDoc has. The rest is mechanical work of adding the traces.

We are in the process of defining our requirements for StrictDoc and should
be able to get something to you in the next week or two.

This is great, and we are looking forward to reviewing it.

Thanks for your comments!

stanislaw added the Core roadmap label Apr 15, 2025

stanislaw added this to the 2025-Q2 milestone Apr 15, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Automatic generation of StrictDoc's qualification data package #2147

Feature: Automatic generation of StrictDoc's qualification data package #2147

stanislaw commented Apr 13, 2025 •

edited

Loading

stanislaw commented Apr 13, 2025

richardbarlow commented Apr 17, 2025

stanislaw commented Apr 20, 2025

Feature: Automatic generation of StrictDoc's qualification data package #2147

Feature: Automatic generation of StrictDoc's qualification data package #2147

Comments

stanislaw commented Apr 13, 2025 • edited Loading

Description

Problem

Solution

Additional Information

stanislaw commented Apr 13, 2025

richardbarlow commented Apr 17, 2025

stanislaw commented Apr 20, 2025

stanislaw commented Apr 13, 2025 •

edited

Loading