This repository was archived by the owner on Jun 18, 2024. It is now read-only.

Need a way to capture a Title and Description for each resource in the Distribution array #248

Closed
cew821 opened this issue Jan 15, 2014 · 14 comments

Comments

@cew821

cew821 commented Jan 15, 2014

The required fields for the distribution type are accessURL and format (i.e., MIME type). I think we should consider adding two additional fields, which could be optional, to the distribution type: resourceTitle and resourceDescription.

Right now, if data managers create a data record using CKAN (i.e. using inventory.data.gov), the data manager is prompted to provide a Title and Description in addition to the accessURL/webService and File Format. When a user browses the inventory within that CKAN instance, they see this title and description as a part of the record. See this screenshot for an example:

image

However, since the schema doesn't have a field for these Title and Description elements, this data is lost when the data.json file is generated from the inventory. As a result, users of catalog.data.gov don't see a title or description for each resource, even if it was originally provided by a data manager in inventory.data.gov:

image

I propose that we add some optional fields to the schema that would allow each object in a distribution to include resourceTitle and resourceDescription text fields, which, if present, could be used by catalog.data.gov and other catalogs to give users more information about each resource in the distribution.

@dsmorgan77
Contributor

👍

@pschweitzerusgsgov
Contributor

Yes.

DCAT has these fields in its Distribution type, so it should be a simple matter to take notice of them if present.

@gbinal gbinal added the schema label Apr 14, 2014
@gbinal gbinal added this to the Next Version of Common Core Metadata Schema milestone Apr 14, 2014
@haleyvandyck
Contributor

Thanks for this-- looks good. We've added it to the list of things to consider with the next schema update.

@bletalien

Please!!! Here's an example of why: http://catalog.data.gov/dataset/general-schedule-and-locality-pay

@gbinal
Contributor

gbinal commented Jul 17, 2014

There seems to be a lot of support for this (I literally just heard someone say 'yes, yes, yes please').

What seems to be at issue is not necessarily adding new fields but providing guidance for how these could be used within the distribution when an agency has the complexity and feels strongly about it.

+1

@smrgeoinfo
Contributor

classic example! @gbinal are you arguing against adding link properties for the accessURLs?

@dafeder

dafeder commented Jul 17, 2014

I think the idea would be to use existing title and description field names inside the distribution array (rather than add new field names to the schema, especially using the term "resource" which is CKAN-specific). This would be consistent with DCAT: http://www.w3.org/TR/vocab-dcat/#class-distribution

@smrgeoinfo
Contributor

So the idea would be to extend the current https://project-open-data.github.io/schema/ distribution, which has an accessURL and format, to include dcat:title and dcat:description?

from the current POD distribution documentation:
"Distribution is a concatenation, as appropriate, of the following elements: accessURL and format. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entries as seen below:"

@dafeder

dafeder commented Jul 17, 2014

That would certainly make sense for CKAN and DKAN! Agree they should both be optional.

@dafeder

dafeder commented Jul 17, 2014

Except, when implementing as RDFa it would be dct:title, not dcat:title. JSON would use simply title.
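
To illustrate the naming point, here is a minimal Python sketch of how plain JSON keys could be mapped to prefixed RDF terms. The `CONTEXT` table and the `expand_key` helper are hypothetical and are not part of the Project Open Data schema; only the `dct` and `dcat` namespace IRIs are the standard ones.

```python
# Hypothetical mapping from plain JSON keys to prefixed RDF terms.
# Only the two namespace IRIs are standard; the rest is illustrative.
CONTEXT = {
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "title": "dct:title",
    "description": "dct:description",
    "accessURL": "dcat:accessURL",
}


def expand_key(key: str) -> str:
    """Expand a plain JSON key to a full IRI; unknown keys pass through unchanged."""
    term = CONTEXT.get(key, key)
    prefix, sep, local = term.partition(":")
    if sep and prefix in CONTEXT:
        return CONTEXT[prefix] + local
    return term


title_iri = expand_key("title")
description_iri = expand_key("description")
```

In this sketch the JSON key stays `title`, while the expanded term lives in the Dublin Core Terms namespace rather than the DCAT one.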

@gbinal
Contributor

gbinal commented Jul 24, 2014

There's been pretty good agreement on allowing for this, though not requiring it. Also,

@philipashlock philipashlock modified the milestone: Next Version of Common Core Metadata Schema (1.0 -> 1.1.) Jul 24, 2014
@philipashlock
Contributor

Here's an example of adding the title and description within a distribution, just as DCAT allows (see #350). Note that this example also includes the change of format to mediaType (#272) for IANA MIME types and of accessURL to downloadURL (#335) for file download URLs.

One question is what good examples of title and description would look like. Should one of them be the file name or a description of the file format? A human-readable value for the file format should already be covered by format.

This is an excerpt, but see the gist for the full data.json

 "distribution": [
    {
        "description": "Widgets data as a CSV file",
        "downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets.csv",
        "format": "CSV",
        "mediaType": "text/csv",
        "title": "widgets.csv"
    },
    {
        "description": "Widgets data as a zipped CSV file with attached data dictionary",
        "downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets-all.zip",
        "format": "Zipped CSV",
        "mediaType": "application/zip",
        "title": "widgets-all.zip"
    },
    {
        "accessURL": "https://data.agency.gov/api/widgets-statistics/",
        "description": "A fully queryable REST API with JSON and XML output",
        "format": "API",
        "title": "Widgets REST API"
    }
]
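
As a rough illustration of how a catalog might consume these optional fields, the Python sketch below builds a display label for each distribution entry. The `distribution_label` function and its fallback order (title, then format, then URL) are assumptions made for this example; they are not behavior specified by the schema or implemented by catalog.data.gov.

```python
from typing import Any, Dict


def distribution_label(dist: Dict[str, Any]) -> str:
    """Return a human-readable label for one distribution entry.

    Assumed fallback order: title, then format, then the URL.
    """
    url = dist.get("downloadURL") or dist.get("accessURL") or ""
    title = dist.get("title") or dist.get("format") or url or "Untitled resource"
    description = dist.get("description")
    # Append the description only when the publisher supplied one.
    return f"{title}: {description}" if description else title


distribution = [
    {
        "description": "Widgets data as a CSV file",
        "downloadURL": "https://data.agency.gov/datasets/widgets-statistics/widgets.csv",
        "format": "CSV",
        "mediaType": "text/csv",
        "title": "widgets.csv",
    },
    # An entry without the optional fields still gets a usable label.
    {"accessURL": "https://data.agency.gov/api/widgets-statistics/", "format": "API"},
]

labels = [distribution_label(d) for d in distribution]
```

Because both fields are optional, a consumer written this way keeps working for agencies that publish only a URL and format.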

gbinal added a commit that referenced this issue Sep 8, 2014
In response to #217, #248

I still need to update the expanded guidance
gbinal added a commit that referenced this issue Sep 8, 2014
@gbinal
Contributor

gbinal commented Sep 8, 2014

This is addressed by baa0178 and by cd7a527

rebeccawilliams pushed a commit that referenced this issue Oct 2, 2014
Still to be addressed: the changes in structure, and whether usage notes should be added here:

* Adds optional describedByType field at the dataset and distribution level (#291, #332)
* Changes contactPoint field to an object that contains the name (fn) and email address (hasEmail) (#358)
* Adds fn field as part of contactPoint replacing earlier use of contactPoint (#358)
* Changes publisher field to an object that allows multiple levels of organizations (#296)
* Changes accessURL field to represent indirect access and to exist only within distribution (#217, #335) 
* Changes format field to a human readable description and to exist only within distribution (#272, #293)
* Adds optional description field for use within distribution (#248)
* Adds optional title field for use within distribution (#248)
* Changes accrualPeriodicity field to use ISO 8601 date syntax (#292)
* Changes distribution field to become required-if-applicable and to always contain the accessURL or downloadURL fields (#217)
* Changes license field to be a URL (#196)
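
Two of the distribution rules in this list (an entry always carries accessURL or downloadURL, while title and description are optional) could be checked with a small Python sketch like the one below. The `check_distribution` helper is hypothetical and is not part of any Project Open Data validator, and treating title and description as plain strings is an assumption.

```python
from typing import Any, Dict, List


def check_distribution(dist: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one distribution entry."""
    problems: List[str] = []
    # Every entry needs at least one of the two URL fields.
    if not (dist.get("accessURL") or dist.get("downloadURL")):
        problems.append("missing accessURL or downloadURL")
    # title and description are optional; assumed to be strings when present.
    for field in ("title", "description"):
        if field in dist and not isinstance(dist[field], str):
            problems.append(f"{field} must be a string when present")
    return problems


ok = check_distribution(
    {"downloadURL": "https://data.agency.gov/widgets.csv", "title": "widgets.csv"}
)
bad = check_distribution({"title": 42})
```
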
@gbinal
Contributor

gbinal commented Nov 7, 2014

Thank you for driving the conversation around this issue and helping to assemble the v1.1 metadata update.

There appears to be strong consensus around this issue, which has been accepted in the v1.1 update and merged into Project Open Data. Project Open Data is a living project though. Please continue any conversations around how the schema can be improved with new issues and pull requests!

It's important for government staff as well as the public to continue to collaborate to make the Open Data Policy ever better. Though the v1.1 update is a substantial update, future iterations do not have to be, so whatever your ideas - big or small - please continue to work with this community to improve how government manages and opens its data.

@gbinal gbinal closed this as completed Nov 7, 2014

9 participants