clarify error states for time-related error codes 1402 through 1405 #97
Comments
For reference, this is what is in the body of the spec
and in the Appendix
|
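I think that there is general agreement that we should change the first three to be
1402 - Syntax error in time.min
1403 - Syntax error in time.max
1404 - time.min equal to or after time.max (no change from current)
My interpretation was that "time outside valid range" meant "a time was outside of the valid range":
1405 - time.min < startDate and/or time.max > stopDate
I'll let others state what their preference is. @aharonroberts mentioned a different interpretation that I'm not sure I can reproduce. |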
It is: The chosen time range is outside the available time range. Could just be: No data available for the chosen time range, but that is less specific.
Aaron
I think that there is general agreement that we should change the first three to be
1402 - Syntax error in time.min
1403 - Syntax error in time.max
1404 - time.min equal to or after time.max (no change from current)
My interpretation was "time outside valid range" meant "a time was outside of valid range":
1405 - time.min < startDate and/or time.max > stopDate
I'll let others state what their preference is.
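As a rough illustration of the proposed wording, the four checks could be applied in order as below. This is only a sketch: the function names are hypothetical, and it assumes times arrive as ISO 8601 strings that `datetime.fromisoformat` can parse, which is narrower than what a real server may need to accept.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 string; raises ValueError on a syntax error."""
    # Assumption: a trailing "Z" means UTC and naive times are treated as UTC.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def check_time_range(time_min: str, time_max: str,
                     start_date: str, stop_date: str) -> Optional[int]:
    """Return the proposed error code for a request, or None if it passes."""
    try:
        t_min = parse_time(time_min)
    except ValueError:
        return 1402  # syntax error in time.min
    try:
        t_max = parse_time(time_max)
    except ValueError:
        return 1403  # syntax error in time.max
    if t_min >= t_max:
        return 1404  # time.min equal to or after time.max
    if t_min < parse_time(start_date) or t_max > parse_time(stop_date):
        return 1405  # time.min < startDate and/or time.max > stopDate
    return None
```

Under this strict reading, a request that starts even slightly before startDate is rejected with 1405, which is the behavior the rest of the thread debates.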
|
We have 1201 for no data in the requested time range. The response should be empty and ideally, 1201 appears in the HTTP header if a headerless response is requested. |
I figured we had “no data in this range” but I wasn’t looking at the list. Maybe we don’t need 1405?
From: Bob Weigel <[email protected]>
Reply-To: hapi-server/data-specification <[email protected]>
Date: Monday, August 24, 2020 at 3:07 PM
To: hapi-server/data-specification <[email protected]>
Subject: [EXTERNAL] Re: [hapi-server/data-specification] clarify error states for time-related error codes 1402 through 1405 (#97)
|
I agree that we don't need 1405; this would require users who are looking at the boundaries of the time series to know exactly where those boundaries are (or do multiple requests to find those boundaries). e.g., if someone requests data for 1March2015-31March2015, and the dataset starts on 3March2015, the server should return the data for March 3-March 31 and not a 400 error. |
From a user perspective, @supervised's comment is a good argument for removing 1405. A disadvantage is that it will make caching slightly more complex. Probably not so much that it justifies keeping it. |
I just implemented the server-side checks for these, and found them clear enough. I agree with @supervised: I feel guilty putting in these precise checks, which are going to be annoying for humans talking to the server. (It would be easy, for example, to replace a start that is before the startDate limit with the startDate, rather than throwing an exception.) |
Here is a proposal for clarification: Replace
with
I think that we should keep 1405 as-is for now. It was placed there with the intention of meaning I've tried a request to the following servers with 1405
No error; serves data starting at
|
The main discussion here is about 1405. Its current form, "time outside valid range", is ambiguous. Does this mean that the whole requested time range is outside the valid range, or that just one of the request times (start or stop) is outside? Half the servers interpret this one way, half the other way. To fix the ambiguity, we have to decide what is the least surprising behavior. We will do a poll to see what hapi-dev people think.
Option 1: make 1405 very strict, so that all requests must always fall within the known time ranges of the dataset. If the start time of the request is before the data start time (as advertised by the
Option 2: make 1405 be only for when the entire requested time range is outside the valid range for the data; if a user request has any overlap with the valid range, just return the data that is present within the overlap.
Option 3: Remove 1405 altogether. Don't throw any errors if the time range is non-overlapping or has start or stop outside the valid range. If there is no data, just return an empty response (there is a no-data response already:
Principles to follow:
|
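To make the difference between the three options concrete, here is a small sketch that applies each one to the same request. The function name, the tuple return shape, and the use of `None` for "serve an empty response" are all assumptions made for illustration; they are not taken from the specification.

```python
from datetime import datetime
from typing import Optional, Tuple

Interval = Tuple[datetime, datetime]


def resolve_request(option: int, req: Interval,
                    valid: Interval) -> Tuple[Optional[int], Optional[Interval]]:
    """Return (error_code, interval_to_serve) under the given option."""
    req_start, req_stop = req
    valid_start, valid_stop = valid
    overlap = (max(req_start, valid_start), min(req_stop, valid_stop))
    has_overlap = overlap[0] < overlap[1]
    fully_inside = valid_start <= req_start and req_stop <= valid_stop

    if option == 1:
        # Strict: any request time outside the valid range is a 1405 error.
        return (None, req) if fully_inside else (1405, None)
    if option == 2:
        # 1405 only when nothing overlaps; otherwise serve the overlap.
        return (None, overlap) if has_overlap else (1405, None)
    if option == 3:
        # Never 1405; with no overlap the response is simply empty (None here).
        return (None, overlap if has_overlap else None)
    raise ValueError(f"unknown option: {option}")
```

For the example given earlier in the thread (a request for 1 to 31 March 2015 against a dataset starting 3 March 2015), option 1 yields 1405, while options 2 and 3 both serve 3 to 31 March. The options only diverge again when the request does not overlap the dataset at all.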
Voting results
My preference is still for 1. It is simpler, has far fewer side effects that we'll need to deal with, and is what was originally intended for 1405. We can have the verifier check the error message and warn if it does not return the allowed time range. We can always loosen things in the future if there is a user request, which there has not been. |
Sorry I didn't notice this thread earlier. Is an internal gap part of a dataset valid range? How precise are the begin and end times required to be for a dataset valid range? If a request begins for hour 00 of a day, but the beginning of the valid range is actually 00:00:00.003, then is that an error? (I guess this is reiterating some of the above comments.) My strong opinion is that option 2 is the only possible way to handle errors. For any time request, a server should be expected to return all available data within the requested interval and, arguably, some reasonable number of samples beyond the boundaries of the request. I see no compelling reason to treat the dataset boundaries as special cases requiring different behavior (than, say, internal gaps). The definition of 1405 should be, imho, "no data in requested interval". |
HAPI specifies that servers should strictly only return records that fall within the time requested, with the start time being inclusive and the stop time exclusive. This allows for content from multiple requests to be stitched together seamlessly. For a dataset that starts at 00:00:00.003, I think the server could go ahead and report a start time of 00:00 exactly. |
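The stitching property of the inclusive-start, exclusive-stop rule can be shown with a toy example. The record layout (a time in seconds paired with a value) and the `select` helper are hypothetical, chosen only to keep the sketch short.

```python
from typing import List, Tuple

Record = Tuple[float, float]  # (time in seconds, value); hypothetical layout


def select(records: List[Record], start: float, stop: float) -> List[Record]:
    """Return records with start <= time < stop (start inclusive, stop exclusive)."""
    return [r for r in records if start <= r[0] < stop]


records = [(0.0, 1.0), (1.0, 1.5), (2.0, 0.7), (3.0, 0.2), (4.0, 0.9)]

# Two adjacent requests share the boundary time 3.0.
first = select(records, 0.0, 3.0)
second = select(records, 3.0, 6.0)

# The record at t = 3.0 lands only in the second response, so concatenating
# the two responses reproduces the data with no duplicate and no gap.
stitched = first + second
```

Note that the second request asks for a stop time beyond the last record and still returns a well-defined result, which is relevant to the boundary discussion in this thread.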
There was a lot of discussion on this Monday's HAPI developer's telecon. The main problem with option 2 or option 4 is that server behavior would differ from server to server. If some servers are allowed to not report an error for a request outside the valid time range, then when a request is way outside (start time is off by years), users will get confused about getting no data, since the server can't report an error on the input. Having all servers strictly enforce the requirement that all time requests must be fully inside will ensure that there is consistent behavior across all servers. It does mean that clients must clip any requests to fit inside the advertised availability range of each dataset. They need to do this anyway once they get the data, so it just shifts where the careful clipping needs to be done. One concern is the moving end date situation, which is common for active missions. We probably also need a way to specify There was also the suggestion that servers be allowed to be sloppy up to a point. If there is a (We would need a separate ticket for the |
The server responds with an empty body if a request is made for a time range within start/stop with no data.
(I was writing the following when I saw Jon's summary above.)
I think that it would be useful for servers to allow a start before the actual start and a stop after the actual stop. But on Monday we discussed several complications.
I recall that one issue arises when the server does not update the stop date in the metadata but, because the database is being continuously updated, still returns data when a stop date after the one in the metadata is given. A user may notice that they can just set the stop date to a time in the distant future. Someone who looks at the date in the info response will conclude that the data are not being updated.
If servers do allow start = 00:00:00 when the actual start date is 00:00:30, what happens if the data are on a 1 ms cadence and the maxRequestDuration is 1 second? The client may conclude that the requested time range is too large unless additional logic is implemented to handle this case. The same goes for the server (or maybe not; the point is that there are many cases that would need documentation).
If someone sends me a URL that returns data for a dataset with a start=1970-01-01 and stop=2000-01-01 but data only exist in 1999, I'd be confused. Do we allow start dates of 0001-01-01T00:00:00Z and stop dates of 4000-01-01T00:00:00Z? If not, what should be allowed? This would require additional documentation and discussion, as it may need to depend on the nominal cadence.
We also only touched on the implications for caching.
Given how long this discussion is taking and that the original intent was option 1, I suggest we go with option 1 and consider a modification if there is strong demand for an alternative. This could be considered with |
(Just my opinions here.) Be careful not to impose human expectations onto time handling. Human expectations are frequently self-inconsistent and just plain wrong. (E.g., Midnight New Year's Eve occurs at the beginning of Dec 31.) We have instruments that sample at 200000 samples per second; what if I want the second 50 samples? This is complicated by the occasional practice of listing the cadence as that of the repeating records, not of the waveform capture. Since time intervals necessarily must be specified as Tbegin <= t < Tend, then Tend must be allowed to be beyond the last time tag of the last record of a data set. I don't understand why allowing requested Tbegin and Tend outside the range of a data set should be confusing any more than internal gaps would be confusing to a user. You can only return the data that exist. |
If the |
Add to the spec about how to handle real-time data that is being updated. You can have the server set the A request (even a HEAD request) would include the expiry time for the Eventually allow for a |
Also add suggestion that 1405 errors should include what valid start/stops are. |
Co-authored-by: Bob Weigel <[email protected]>
I'm running into this issue for continuously updated data now, while trying to fix some minor issues with the timeline viewer. The INTERMAGNET server currently returns a 400 / 1405, returning no data, when making a request for a view which includes the current time, which is very useful for monitoring magnetometers in real time during a storm (see e.g. https://spaceweather.knmi.nl/viewer/?layout=ground_magnetometers - as of Jan 17 2024, the data only appears if the user moves the view so that the current time indicator is out of the view to the right). I'm now testing a fix that passes the min([stop_date_in_view, stop_date_from_info]) into the data request. But I also have a setting to automatically refresh every couple of minutes, and with this fix, the data won't update, unless I make a new info request first for every data request, because it keeps using the old stopDate from the old info request. I would prefer a data request to always return the available data, even if the requested dates cover one of the edges of the dataset. Similar to how a request that covers a gap in the middle of a dataset would return a 200 OK with a partial or empty dataset, but would not return a 400 Bad Request. I also want to build a feature that uses the startDate and stopDate to give guidance to the user who doesn't see the data show up in the timeline viewer, when the view is outside the range of the data, and having these info values be accurate representations would be helpful for that. The proposed relativeStopDate would also be useful, as would having an indication of the update cadence in the info (e.g. a 3-hourly Kp forecast for the next three days can be updated at daily cadence). I'm not in favour of using some sort of 'fake' stopDate, that does not represent the actual limits of the available data. In any case, I think a better solution to the current 1405 situation, which is: |
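For reference, the client-side workaround mentioned in this comment (passing the earlier of the view's stop date and the stopDate from the info response) might look like the sketch below. The function name is hypothetical, and clamping the start against startDate as well is an added assumption; the comment itself only describes clamping the stop.

```python
from datetime import datetime
from typing import Optional, Tuple


def clamp_to_info(view_start: datetime, view_stop: datetime,
                  info_start: datetime,
                  info_stop: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Clip the viewed interval to the range advertised by the info response.

    Returns None when the view lies entirely outside the advertised range,
    in which case no data request should be sent.
    """
    start = max(view_start, info_start)  # assumption: start is clamped too
    stop = min(view_stop, info_stop)     # the fix described in the comment
    if start >= stop:
        return None
    return start, stop
```

As the comment points out, the clamped stop is only as fresh as the last info response, so an auto-refreshing client using this approach has to re-request the info before each data request to see newly added data.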
We will look at this during our in-person meeting next week. |
Issue #167 also mentions real-time data showing up |
Bob suggested a keyword for meaning the info stopDate. start=2023-01-01&stop=stopDate |
How do we communicate to HAPI clients that they should ask for new data? Maybe put an expires-at value in the HTTP header. The MQTT protocol, which is used by weather servers, is another possibility. Maybe have a fully separate server that reads from a HAPI server on the back end and makes live connections available using something like MQTT. The World Met. Org has an effort called WIS2 that uses MQTT; it's their version of HAPI. |
Email sent to [email protected]: We are looking for feedback from HAPI data providers with servers where the dataset end date/time changes. Eleco has an application where he wants to update a plot when new data becomes available (for INTERMAGNET now, but eventually for others). To do this, he needs to periodically request the info to get the current stop date and then make a data request if the stop date has changed. One issue this does not address is the case where the application wants to update the plot automatically. To do this, one would hit the server with stop=stopDate on a timer, which could lead to many useless requests. As a result, we propose that our documentation indicate that the server can set an HTTP header such as "expires" (but which header is debatable) that tells the client when it should make a new request. If you have any feedback, please respond to the issue thread (ideally) or this email thread. You can also attend our weekly HAPI developers meeting for additional discussion. |
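If a header along the lines of "Expires" were adopted, a polling client could use it to decide how long to wait before the next request. The sketch below is illustrative only: the email says the choice of header is still debatable, the function name and the fallback interval are assumptions, and the header lookup assumes a plain dict keyed exactly as "Expires".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping


def seconds_until_refresh(headers: Mapping[str, str], now: datetime,
                          fallback: float = 300.0) -> float:
    """Return how long to wait before asking the server again.

    Falls back to a fixed polling interval when the header is missing
    or cannot be parsed as an HTTP date.
    """
    raw = headers.get("Expires")
    if raw is None:
        return fallback
    try:
        expires = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return fallback
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)
    # An expiry time in the past means the client may refresh immediately.
    return max(0.0, (expires - now).total_seconds())
```

A client would pass the headers from its most recent info or data response and schedule its next request after the returned delay, instead of polling on a fixed timer.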
Discussion of the above email has moved to This issue number has been closed. |
The error messages have ambiguities in terms of syntax or times being outside the valid ranges. Especially 1405 - what if there is some overlap, for example.