
Metadata: Group discussions began with metadata and the requirements for metadata. The need for metadata, and the range of metadata required, vary with the intended application, so it is important to define the intended uses before considering which metadata standards should or could be adopted. Motivations for metadata include: to describe data (who, what, when, where, how, and data quality); to facilitate data discovery and new scientific collaborations; to reprocess and synthesize data; to exchange data, including harvesting it to one location for specialized use; and to generate user interfaces.
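As an illustration of the descriptive role listed above, a discovery record can be sketched as a small data structure holding the who/what/when/where/how and data-quality elements, with a rough completeness check. The class and field names are hypothetical and do not follow FGDC, DIF, Dublin Core, or ISO 19115.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class DiscoveryRecord:
    """Hypothetical minimal record; field names are illustrative, not taken from any standard."""
    who: str                                  # originator or principal investigator
    what: str                                 # title or short description of the data set
    when: tuple[date, date]                   # temporal coverage (start, end)
    where: tuple[float, float, float, float]  # bounding box: west, south, east, north (degrees)
    how: str                                  # instrument or method
    quality: str = ""                         # free-text data-quality statement

    def missing_fields(self) -> list[str]:
        """Names of fields left empty: a rough completeness check before publishing."""
        return [name for name, value in asdict(self).items() if value in ("", None)]


rec = DiscoveryRecord(
    who="Hypothetical PI",
    what="Multibeam bathymetry, hypothetical cruise",
    when=(date(2007, 1, 10), date(2007, 2, 5)),
    where=(-130.0, 40.0, -125.0, 45.0),
    how="multibeam echosounder",
)
print(rec.missing_fields())  # ['quality']
```

A check like this only flags empty fields; judging whether a filled-in field is adequate for reuse still needs a person or a richer standard.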

The development of a consistent community practice with respect to metadata is hindered by a wide range of problems including: (list as sidebar)

• The benefits of metadata may not be adequately understood by those who originally document a dataset, leading to minimal metadata that is inadequate for most reuse.

• Interpretations of standards differ, and there is little guidance on how to fill them out.

• Some information required for the intended use is not provided. This is an inevitable outcome of data users having different interests and needs from those who originally documented the data.

• To make metadata fully discoverable and usable by users from other scientific domains, it may be necessary to satisfy a number of sophisticated standards and vocabularies, even for a single data set. This level of sophistication is not supported by current tools and data models, and is not expected by users.

• For legacy data, it may be impossible to recover the needed metadata after the observations have taken place.

• The desire to control what information is exposed sometimes constrains the metadata that is provided (for example, the location of a ship working in an ecologically or financially sensitive area).

• Initial creation of metadata by users can be time-consuming, confusing, and unrewarding (due to the amount of metadata requested, poor tools and user interfaces, and limited infrastructure supporting metadata creation).

Common practice for how metadata is provided also varies greatly between disciplines and data types. For some data types, metadata may be embedded in formatted data (e.g. GeoTIFF, HDF, NetCDF, NITFS, SEGY, MGD77, ESRI grid (ArcASCII), GRIB). For embedded metadata, additional challenges include inconsistent metadata formats in file headers and the often inadequate models and structures for information (metadata/data) adopted in the file format. For other data types, metadata are provided externally to the data. Currently used standards include FGDC, DIF, Dublin Core, and ISO 19115 (following the implementation approach of ISO 19139).

Most data and metadata centers are moving to work with ISO 19115, but it is a fairly general-purpose standard. To become more useful for a particular community, a profile or extension (see sidebar) must be developed that meets the community's needs. Such tailored enhancements of the standard will not, of course, interoperate with those developed for other communities unless specific measures are taken to assure interoperability. In addition, ISO standards are not freely available (and are in fact somewhat costly). The workshop participants expressed concern that these issues might inhibit widespread adoption of ISO 19115.
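To make the interoperability concern concrete, the sketch below models a profile as a core element set whose obligations may only be tightened, plus community-specific extension elements. The element names, the "M"/"O" obligation codes, and the two community profiles are hypothetical; real ISO 19115 profiles are specified in the standard's own terms and encodings.

```python
# Obligations: "M" = mandatory, "O" = optional. Element names are hypothetical,
# not actual ISO 19115 element names.
CORE = {"title": "M", "abstract": "M", "date": "M", "keywords": "O", "lineage": "O"}


def make_profile(core, tighten=(), add=None):
    """Build a profile = core elements + tightened obligations + new (extension) elements.

    Obligations can only be raised from optional to mandatory and core elements
    cannot be redefined, so the profile never relaxes a core requirement.
    """
    profile = dict(core)
    for name in tighten:
        if name not in core:
            raise KeyError(f"{name!r} is not a core element")
        profile[name] = "M"
    for name, obligation in (add or {}).items():
        if name in core:
            raise ValueError(f"{name!r} is already defined by the core standard")
        profile[name] = obligation
    return profile


def conforms(record, schema):
    """True if the record supplies every element the schema marks mandatory."""
    return all(name in record for name, ob in schema.items() if ob == "M")


# Two hypothetical community profiles built on the same core.
seafloor = make_profile(CORE, tighten=["lineage"], add={"ship_name": "M"})
water_column = make_profile(CORE, add={"cast_id": "M"})

record = {"title": "t", "abstract": "a", "date": "2007-01-10",
          "lineage": "l", "ship_name": "s"}
print(conforms(record, seafloor))      # True
print(conforms(record, CORE))          # True: profile conformance implies core conformance
print(conforms(record, water_column))  # False: the other community's extension is missing
```

A record written to one profile still satisfies the shared core, which keeps core-level discovery working across communities; what does not carry over is the other community's extension elements.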

Include as side bar

“Extensions, Profiles, and Vocabularies

Extensions are additions to a metadata standard that allow users to provide information in additional fields that were not mentioned in the original standard. In standards such as ISO 19115, extensions include¹:

1. addition of a new metadata section

2. alteration of the domain of a metadata element (for example, assigning a code list to specify what responses are allowed for that metadata element)

3. addition of terms in a code list

4. addition of a new metadata element to an existing metadata entity

5. addition of a new metadata entity

6. changing the obligation of a metadata element from optional to mandatory (but not the reverse, which would break the core standard)

Constraints are considered a specialized subset of extensions, in which additional restrictions are placed on the standard. (In the above list, items 2 and 6 are constraints.) In this case the term 'extension' describes the addition of information to the standard, even though the metadata instances that follow the standard are restricted.

Profiles are the community-specific application of the metadata standard. In a sense, profile = metadata content standard + extensions.

Profiles must meet the core requirements of the metadata content standard (that is, provide the mandatory elements that the standard requires) but can include extensions (described above). Since a metadata content standard is composed of the core metadata set plus optional elements, a profile can also be thought of as

profile = core metadata set + optional elements + extensions.

The developers of most content standards expect and encourage the development of extensions and profiles, and may direct how they are to be specified and/or registered. A community that adopts a profile increases the interoperability of its metadata internally. It even increases its interoperability with communities that use other profiles, because the use of the core metadata elements is shared.

An important way that content standards may be constrained is through the use of vocabularies.

Vocabularies can be used to fill out particular fields within the standard. The vocabulary used may be specified within the standard itself (for example, some fields in ISO 19115 define possible entries); or the standard may describe how to specify the vocabulary or vocabularies used (netCDF COARDS/CF allows users to specify the "standard vocabulary"); or the standard may be silent about vocabularies (the CSDGM is fairly open about how many fields are filled out). As noted above, extensions are a common way to narrow the options for filling out fields requiring textual responses.” From MMI sensor workshop report (http://marinemetadata.org/smireportdraftpdf).
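A vocabulary constraint of the kind described in the sidebar can be checked mechanically when a record is created. The sketch below normalizes a free-text entry and looks it up in a controlled list. The three terms are a tiny hand-picked subset assumed to stand in for a full vocabulary such as the CF standard name table, and the suggestion step uses Python's `difflib` rather than any published matching rule.

```python
import difflib

# A tiny hand-picked subset standing in for a full controlled vocabulary
# (e.g. the CF standard name table); a real system would fetch the list
# from a vocabulary server.
CF_SUBSET = frozenset({
    "sea_water_temperature",
    "sea_water_practical_salinity",
    "sea_floor_depth_below_sea_surface",
})


def check_field(value, vocabulary):
    """Return (ok, term): the matched term if valid, else the closest suggestion or None."""
    normalized = value.strip().lower().replace(" ", "_")
    if normalized in vocabulary:
        return True, normalized
    close = difflib.get_close_matches(normalized, sorted(vocabulary), n=1)
    return False, (close[0] if close else None)


print(check_field("Sea Water Temperature", CF_SUBSET))  # (True, 'sea_water_temperature')
print(check_field("sea_water_temprature", CF_SUBSET))   # (False, 'sea_water_temperature')
print(check_field("depth", CF_SUBSET))                  # (False, None)
```

Offering a suggestion rather than silently correcting the entry leaves the final choice with the person documenting the data.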

Interfaces: Developing an interoperable system requires more than standardization of data and metadata formats. It also requires consideration of the interfaces to data catalogs or data servers that facilitate data transport between distributed repositories, and of the interfaces to services such as vocabulary list servers, unique reference systems (which generate unique identification numbers or strings for objects and data sets), and uniform resource name (URN) resolvers (which can translate a URN to a web site, or to other information as appropriate). The specification for these interfaces includes the transport protocols, which describe how the connection is made between systems, and is likely to include a specification of the content that is transferred using the protocol. That content specification is analogous to, and in some cases the same as, the content specifications described above.
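The URN resolver service mentioned above can be sketched as a lookup from a URN's namespace identifier to a URL template. The parsing follows the generic `urn:<NID>:<NSS>` syntax; the resolver table, its URL, and the function names are hypothetical, and a small UUID-based minting function stands in for a unique reference system.

```python
import urllib.parse
import uuid

# Hypothetical resolver table: namespace ID -> URL template. "example" is the
# namespace reserved for documentation (RFC 6963); a real service would differ.
RESOLVERS = {"example": "https://resolver.example.org/{nss}"}


def parse_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<NID>:<NSS>' into (NID, NSS). The scheme and NID are case-insensitive."""
    scheme, sep, rest = urn.partition(":")
    if scheme.lower() != "urn" or not sep:
        raise ValueError(f"not a URN: {urn!r}")
    nid, sep, nss = rest.partition(":")
    if not nid or not sep or not nss:
        raise ValueError(f"missing namespace ID or namespace-specific string: {urn!r}")
    return nid.lower(), nss


def mint_urn() -> str:
    """Unique reference: a random UUID expressed in the urn:uuid namespace."""
    return uuid.uuid4().urn


def resolve(urn: str) -> str:
    """Translate a URN into a URL using the resolver registered for its namespace."""
    nid, nss = parse_urn(urn)
    if nid not in RESOLVERS:
        raise LookupError(f"no resolver registered for namespace {nid!r}")
    return RESOLVERS[nid].format(nss=urllib.parse.quote(nss, safe=":"))


print(resolve("URN:Example:cruise:AB0101"))  # https://resolver.example.org/cruise:AB0101
print(parse_urn(mint_urn())[0])              # uuid
```

Keeping the namespace-to-URL table separate from the identifiers is what lets a resolver move data to a new web location without changing the URNs already cited in metadata.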

Just as a wide variety of data and metadata formats is currently in use, a wide range of protocols is in common use for interfaces (e.g. SOAP, REST, OAI-PMH, UDDI, WSDL, OPeNDAP, THREDDS).
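As one concrete example from the protocols listed above, an OAI-PMH harvest is an HTTP request whose query string carries a verb and its arguments. The sketch only builds the request URL for a hypothetical repository base URL; fetching and parsing the XML response, and following resumption tokens, are left out.

```python
from urllib.parse import urlencode

# The six OAI-PMH 2.0 verbs.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH request URL. Pass the 'from' argument as from_ (a Python keyword)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    if "from_" in args:
        args["from"] = args.pop("from_")
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken must be the only argument besides verb")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


url = oai_request("https://oai.example.org/provider", "ListRecords",
                  metadataPrefix="oai_dc", from_="2008-01-01")
print(url)
# https://oai.example.org/provider?verb=ListRecords&metadataPrefix=oai_dc&from=2008-01-01
```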

General needs with respect to interfaces are: a well-defined overarching architecture that is open to access by neighboring communities; consistent ways to discover data; coherent, consistent, and complete standards with respect to a science domain; better tools to work with standards; and better collaborative tools that gracefully integrate appropriate interfaces, or can be used to develop new ones. Interfaces must be chosen and implemented according to use requirements.

Registries: Registries provide searchable lists of ‘objects’, which are typically computational resources but may range from websites, to metadata, to data sets, to data systems. An overview of some existing registries relevant to marine, and more broadly geoscience, data is given in Table 2.

Registries for a variety of other kinds of ‘objects’ are currently lacking. For example, registries of Web Map Services, online KML resources, or of sensor information are all needed.

Table 2. Existing registries relevant to marine and geoscience data.

Registry        Objects   Services  Interface protocol   Metadata
GCMD            Datasets  WxS                            DIF
STD-DOI         Datasets            SOAP
OceanPortal     Websites
SESAR           Samples             WSDL/SOAP
Pangaea         Datasets            OAI-PMH              DIF, DC, ISO
WDC             Datasets
GeoNetwork      Datasets            Z39.50               ISO, FGDC, DC
GeoConnections  Datasets                                 FGDC, ISO
SEDIS           Datasets  WMS       OAI-PMH              ISO
NDG             Datasets            OAI-PMH, SOAP, REST  MOLES, FGDC, ISO, DIF, DC, CSML
OAIster         DOIs                OAI-PMH
GEON            All       WxS       WSDL/SOAP
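A registry in this sense is essentially a queryable catalog of object descriptions. The sketch below loads a few rows transcribed from Table 2 into an in-memory list and filters them by object type, interface protocol, or metadata format. The class and function names are hypothetical, and a production registry would expose such queries through a service interface rather than a Python function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    name: str
    objects: str
    protocols: frozenset[str]
    metadata: frozenset[str]


# A few rows transcribed from Table 2.
REGISTRY = [
    RegistryEntry("Pangaea", "Datasets", frozenset({"OAI-PMH"}), frozenset({"DIF", "DC", "ISO"})),
    RegistryEntry("GeoNetwork", "Datasets", frozenset({"Z39.50"}), frozenset({"ISO", "FGDC", "DC"})),
    RegistryEntry("SEDIS", "Datasets", frozenset({"OAI-PMH"}), frozenset({"ISO"})),
    RegistryEntry("NDG", "Datasets", frozenset({"OAI-PMH", "SOAP", "REST"}),
                  frozenset({"MOLES", "FGDC", "ISO", "DIF", "DC", "CSML"})),
    RegistryEntry("SESAR", "Samples", frozenset({"WSDL/SOAP"}), frozenset()),
]


def search(entries, objects=None, protocol=None, metadata=None):
    """Return names of entries matching every criterion that is given (None = any)."""
    return [e.name for e in entries
            if (objects is None or e.objects == objects)
            and (protocol is None or protocol in e.protocols)
            and (metadata is None or metadata in e.metadata)]


print(search(REGISTRY, objects="Datasets", protocol="OAI-PMH"))  # ['Pangaea', 'SEDIS', 'NDG']
print(search(REGISTRY, metadata="FGDC"))                         # ['GeoNetwork', 'NDG']
```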

Principles for selection. When selecting the protocol, content, and vocabulary specifications and tools for a community, consideration should be given to the needs of the community and the characteristics of the available resources (specifications and tools). Factors to consider include the degree of adoption of each resource (within the community, and as a whole); the degree to which the resource describes or satisfies the characteristics of interest to the community, or can be extended to do so; and the degree to which the resource will be used in automated systems. Another important consideration is whether the agreement is intended to produce a working solution as quickly as possible, or to develop a solution that can support future growth of both the community and the larger environmental cyberinfrastructure. Systems designed to support anticipated advances in cyberinfrastructure can offer, and will require, more capability.

There are several existing community-based efforts relevant to the selection and development of standards and protocols to support data exchange within the marine geoscience community. These include the SeaVox project (http://www.bodc.ac.uk/data/codes_and_formats/seavox/) and the Marine Metadata Initiative (MMI, www.marinemetadata.org). SeaVox is a Vocabulary Content Governance Group, moderated by BODC (**Roy Lowry- would you like to add further description here?).

The MMI hosts a wide range of information on specifications and tools and encourages the community to contribute information for others (in that and other communities) to use. It also encourages community projects that are developing their own standards to consider using the MMI site to host their materials and publish their deliberations.

Recommendations

T3-R1. The community must minimize the proliferation of metadata standards and work toward a uniform approach for scientific metadata.

There are two basic approaches to the problem of proliferating metadata standards: (1) develop a single uniform specification for scientific metadata, and (2) facilitate mediation or crosswalks between what is, ideally, a limited number of different metadata standards. A single universal specification is unattainable, but a coherent, consistent, science-focused approach, ideally built around a minimal set of profiles of a single standard, will limit the proliferation of profiles and keep the development of crosswalks viable.
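The case for limiting the number of standards can be made concrete. If each of n standards needs a direct one-way crosswalk to every other, n(n-1) mappings are required, whereas mapping each standard to and from one shared core needs only 2n. The sketch below shows that count together with a single one-way crosswalk; the field names are a small illustrative subset loosely modeled on DIF and Dublin Core, not a complete or authoritative mapping.

```python
def pairwise_crosswalks(n: int) -> int:
    """One-way mappings needed if every standard maps directly to every other."""
    return n * (n - 1)


def hub_crosswalks(n: int) -> int:
    """One-way mappings needed if every standard maps to and from one shared core."""
    return 2 * n


# Illustrative subset only; names loosely modeled on DIF fields and Dublin Core elements.
DIF_TO_DC = {
    "Entry_ID": "identifier",
    "Entry_Title": "title",
    "Summary": "description",
    "Data_Center": "publisher",
}


def crosswalk(record: dict, mapping: dict) -> tuple[dict, list]:
    """Rename the fields the mapping knows; return the rest as unmapped (lost in translation)."""
    converted = {mapping[k]: v for k, v in record.items() if k in mapping}
    unmapped = sorted(k for k in record if k not in mapping)
    return converted, unmapped


print(pairwise_crosswalks(6), hub_crosswalks(6))  # 30 12
dc, lost = crosswalk({"Entry_ID": "X1", "Entry_Title": "Bathymetry", "Parameters": "depth"},
                     DIF_TO_DC)
print(dc, lost)  # {'identifier': 'X1', 'title': 'Bathymetry'} ['Parameters']
```

Reporting unmapped fields, rather than dropping them silently, makes the information loss of each crosswalk visible to whoever maintains it.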

T3-R2. The community must create agreed processes for community development of standards, guidance, and profiles.

Governing structures are needed to enable the development of a community consensus about the overall standard(s) and approaches, and to establish processes to develop "official" extensions as needed for different specialized fields.

T3-R3. Community-based best practices for adoption of the ISO 19115 standard are required.

As many groups within the global geoscience community are moving to adopt the ISO 19115 standard, there is a strong desire to avoid fragmentation and to adopt a common solution to the problems of interpretation associated with this standard. To address these issues, a sub-committee of scientific data and metadata users needs to be established to produce a best-practice document with clear examples of applying the ISO 19115 standard (and ISO 19139). These guidelines would provide recommendations developed by the scientific community to resolve the interpretation ambiguities of the ISO standard, and make the current standard more portable between data and metadata centers.
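A best-practice document of this kind would likely pair each interpretation decision with an encoded example. The sketch below only shows the mechanics of writing two ISO 19115 elements in the ISO 19139 XML encoding with Python's standard library, using the `gmd` and `gco` namespaces. It is not a schema-valid record (other mandatory elements such as contact and identification information are omitted), and the function names and identifier value are hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _char_element(parent, tag, text):
    """Add <gmd:tag><gco:CharacterString>text</gco:CharacterString></gmd:tag>."""
    el = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(el, f"{{{GCO}}}CharacterString").text = text
    return el


def minimal_md(file_id: str, date_stamp: str) -> str:
    """Serialize an MD_Metadata fragment holding only fileIdentifier and dateStamp."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    _char_element(root, "fileIdentifier", file_id)
    stamp = ET.SubElement(root, f"{{{GMD}}}dateStamp")
    ET.SubElement(stamp, f"{{{GCO}}}Date").text = date_stamp
    return ET.tostring(root, encoding="unicode")


print(minimal_md("hypothetical-id-0001", "2008-06-01"))
```

Choices such as whether a value goes in a `gco:CharacterString` or a code list element are exactly the kind of interpretation questions a shared best-practice document would settle.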

T3-R4. New efforts within the marine geoscience community to develop standards and protocols to support interoperability should build upon and take advantage of existing efforts.

Community-based efforts including the SeaVox project (www.bodc.ac.uk/data/codes_and_formats/seavox/) and the Marine Metadata Initiative (MMI, www.marinemetadata.org) offer relevant services, as well as forums for participation and contribution.

3.3.2. Session II: The Low-Hanging Fruit for Data Exchange

The working group for Session II focused its discussions on identifying opportunities for interoperability in the near future, given the existing data resources within the global marine geoscience community. This group was asked to:

• Explore realistic opportunities for the implementation of international data exchange

• Define a plan for an easy start

A growing variety of data resources relevant for marine geoscience research now exist within the international community. Each provides varying levels of data discovery and data delivery through their own custom search interfaces. At present, to find data of interest across these distributed data centers, a user must first be aware of all relevant data resources, visit each site, learn how to use the particular search interfaces provided (often in a language other than their own) just to determine whether data of interest exists at that data center. In contrast to the current scenario, what users desire is the ability to discover (and then access) data of interest seamlessly across distributed centers without the need for pre-existing knowledge of each resource and how to use their individual search tools.

The general consensus was that an achievable initial goal is to develop a data discovery resource spanning, possibly, a subset of the distributed and heterogeneous data resources now available within the international community. Discussions on how to implement a resource discovery interface focused on its potential scope, as well as on organizational and technical issues.
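The discovery goal described above amounts to a federated search: send one query to several catalogs, then merge and de-duplicate the answers. The sketch below uses in-memory stand-ins for the remote catalogs, and all names and records are hypothetical. Real data centers would be reached over the network through their own protocols, with timeouts and error handling that are omitted here, and de-duplication assumes the centers share a persistent identifier for each data set.

```python
from concurrent.futures import ThreadPoolExecutor


def make_catalog(name, records):
    """Hypothetical stand-in for one data center's search interface."""
    def search(keyword):
        return [dict(r, source=name) for r in records
                if keyword.lower() in r["title"].lower()]
    return search


def federated_search(catalogs, keyword):
    """Query all catalogs concurrently; keep one copy of records held by several centers."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda search: search(keyword), catalogs))
    merged = {}
    for results in result_lists:           # pool.map preserves catalog order
        for r in results:
            merged.setdefault(r["id"], r)  # first catalog listing an id wins
    return sorted(merged.values(), key=lambda r: r["id"])


center_a = make_catalog("A", [{"id": "d1", "title": "Gravity survey"},
                              {"id": "d2", "title": "Multibeam bathymetry"}])
center_b = make_catalog("B", [{"id": "d2", "title": "Multibeam bathymetry"},
                              {"id": "d3", "title": "Bathymetry grid"}])
for hit in federated_search([center_a, center_b], "bathymetry"):
    print(hit["id"], hit["source"])
# d2 A
# d3 B
```

The user sees one merged result list without needing to know which centers exist or how each one's search interface works, which is the scenario the working group described.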

Scope: One approach for building a resource discovery-only interface would be to gather