
Further information

Package structures

Transforming transfer information packages to SIPs and AIPs


Transfer package specifications

TIB uses several levels of transfer packages, which are described here. The graphic Package structure provides a general overview of the transfer packages used:

Specifications for transfer information packages are relevant for the delivery of objects.

The graphic Transforming transfer information packages to SIPs and AIPs describes how transfer information packages are transformed to pre-ingest SIPs, post-ingest SIPs and AIPs.

Transfer information packages

Different transfer information structures are used for different scenarios:

  1. SIPs with a simple structure and one representation
    SIPs with a simple structure can be submitted as ZIP files (example: the University Publications team) or as folders (example: the German Research Report team).
  2. METS deposit
    1. Objects with multiple representations or complex data structures
    2. Objects with multiple representations or complex data structures and an externally created METS file

  3. Connection to the repository via OAI or another interface
  4. Objects with metadata from source systems or complex relations between data packages

The transfer information packages are described in the form of normalised tables. The table below explains the structure of the normalised tables.

Specification parameter

Implementation

Naming convention

Naming convention according to which the package must be named

Package structure

Structure in which the package must be available

Content data

Description of the minimum and maximum number of files expected

Permissible file formats

Description of the permissible file formats, where relevant

Representations

The number and type of representations allowed

Quality of data

Describes whether only valid and well-formed files are accepted

Metadata

Describes whether or not the object must be indexed in the Gemeinsamer Verbundkatalog (Union Catalogue, GVK)

Identifier

Identifier that uniquely identifies the object and links it to descriptive metadata, such as a PPN (identification number), an EKI (identification number given by the first cataloguing institution) or a handle

Legal metadata

Describes whether the object belongs to a collection in which several licence texts, licence text versions and access rights can be allocated. If this is the case, this must be mapped in a superordinate directory structure.


Objects with a simple structure and a representation

Example: the University Publications team – legacy data transfer


With this transfer information structure, there must be exactly one ZIP file, which may contain several files. All files in the ZIP file belong to the MASTER representation.

Specification parameter

Implementation

Naming convention

The ZIP file is named as follows: PPN__name of author

Package structure

One ZIP file per SIP, which contains all data belonging to the doctoral thesis

Content data

At least one PDF file

Permissible file formats

At least one PDF file is expected

Representations

MASTER

Quality of data

Only valid and well-formed PDF files are accepted.

Metadata

The object must be indexed in the catalogue.

Identifier

An identifier that refers to a catalogue record is expected to be found in the name of the ZIP file.

Legal metadata

No special identification necessary; all objects are subject to the same licence text and have the same access rights.
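
The rules in this table can be checked before delivery. The following is a minimal sketch only, not TIB's actual submission application: the function name `check_simple_zip` is hypothetical, and the PPN pattern (digits with an optional trailing X) is an assumption, since the text does not define the PPN syntax. It checks the naming convention and the presence of at least one PDF file; it does not check whether the PDF is valid and well-formed.

```python
import os
import re
import zipfile

# Assumption: a PPN is a run of digits, optionally ending in X, followed by
# two underscores and the author's name. The text only gives "PPN__name of author".
ZIP_NAME = re.compile(r"^(?P<ppn>[0-9]+[0-9Xx])__(?P<author>.+)\.zip$")


def check_simple_zip(zip_path):
    """Return a list of problems found in a simple-structure ZIP package."""
    problems = []
    name = os.path.basename(zip_path)
    if ZIP_NAME.match(name) is None:
        problems.append(f"name does not follow PPN__name of author: {name}")
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries end with "/"; only real files count as content data.
        files = [n for n in archive.namelist() if not n.endswith("/")]
    if not any(n.lower().endswith(".pdf") for n in files):
        problems.append("package contains no PDF file")
    return problems
```

An empty list means that the package passed both checks.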


Example: the German Research Report team

With this transfer information structure, there must be exactly one file that belongs to the MASTER representation.

Specification parameter

Implementation

Naming convention

The file is named using the PPN.

Package structure

One PDF file per SIP

Content data

Exactly one PDF file

Permissible file formats

Exactly one PDF file

Representations

MASTER

Quality of data

Non-valid and non-well-formed files are also accepted.

Metadata

The object must be indexed in the catalogue.

Identifier

An identifier that refers to a catalogue record is expected to be found in the name of the PDF file.

Legal metadata

Objects are sorted by licence agreement and access rights before ingest.


Objects with multiple representations or complex data structures

In the transfer structure for complex objects, different representations may be ingested with 1-n files each.

Specification parameter

Implementation

Naming convention

The package is named at the top directory level using the EKI (identification number given by the first cataloguing institution).

Package structure

For each SIP, there is a directory named using the EKI. It contains a directory for each representation; the directory is named based on the name vocabulary defined in the archive. The content data are available in the representation folders.

IDENTIFIER
|--MASTER (mandatory)
|	|--File1
|	|--File n
|	|--Folder 0-n
|		|--File 0-m
|--MODIFIED_MASTER (optional)
|	|--File1
|	|--File n
|	|--Folder 0-n
|		|--File 0-m
|--DERIVATIVE_COPY (optional)
|	|--File1
|	|--File n
|	|--Folder 0-n
|		|--File 0-m

Content data

At least one file per representation

Permissible file formats

No limitation

Representations

MASTER, MODIFIED_MASTER, DERIVATIVE_COPY

Quality of data

Non-valid and non-well-formed files are also accepted.

Metadata

The object must be indexed in the catalogue.

Identifier

EKI

Legal metadata

The acquisition team uses a superordinate directory structure to assign the objects to the collection groups, publications types, applicable licence texts (in different versions, where applicable) and access rights.
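
The directory rules for this package type lend themselves to an automated check. The sketch below is illustrative only: the function name `check_representation_package` is hypothetical, and it assumes that the three representation names listed in the table are the complete vocabulary. It verifies that MASTER is present, that no unknown representation directories exist, and that each representation contains at least one file, including files in subfolders.

```python
from pathlib import Path

# Assumption: the vocabulary consists of exactly these three directory names.
ALLOWED = {"MASTER", "MODIFIED_MASTER", "DERIVATIVE_COPY"}


def check_representation_package(package_dir):
    """Return a list of problems found in an EKI-named package directory."""
    problems = []
    root = Path(package_dir)
    present = {p.name for p in root.iterdir() if p.is_dir()}
    if "MASTER" not in present:
        problems.append("missing mandatory MASTER directory")
    for name in sorted(present - ALLOWED):
        problems.append(f"unexpected representation directory: {name}")
    for name in sorted(present & ALLOWED):
        # Files may sit directly in the representation or in nested folders.
        if not any(p.is_file() for p in (root / name).rglob("*")):
            problems.append(f"{name} contains no files")
    return problems
```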

Objects with multiple representations or complex data structures and an externally created METS file

In the transfer structure for complex objects with an externally created METS file, the path identifier/content/streams may contain different representations with 1-n files each. A METS file that validates against the Rosetta XSD (https://developers.exlibrisgroup.com/rosetta/integrations/mets-dnx/) must be handed over.

Specification parameter

Implementation

Naming convention

The package is named at the top directory level using a unique identifier.

Package structure

For each SIP, there is a directory named with a unique identifier. It contains one dc.xml with Dublin Core metadata and a directory named content. The content directory contains a METS file, ie1.xml, and a directory named streams, which contains the actual data. If multiple representations are handed over, the relevant files must be allocated to the corresponding representation in the METS file.

Representation names other than MASTER, PRE-INGEST_MODIFIED_MASTER and DERIVATIVE_COPY must be coordinated with TIB's digital preservation team.

IDENTIFIER
|--dc.xml
|--content
|	|--ie1.xml
|	|--streams
|		|--File1
|		|--File n
|		|--Folder 0-n
|			|--File 0-m


Content data

At least one file per representation.

Permissible file formats

No limitation

Representations

MASTER (mandatory), PRE-INGEST_MODIFIED_MASTER (optional), DERIVATIVE_COPY (optional), further representations according to prior agreement

Quality of data

Non-valid and non-well-formed files are also accepted.

Metadata

The object does not have to be indexed in the catalogue.

Identifier

Unique identifier

Legal metadata

Legal metadata are captured as follows:

1) The access right to the object, as assigned by the depositing institution, shall be documented as dcterms:accessRights (e.g. private/public or another controlled vocabulary).

2) The depositing institution's right to preserve the object shall be documented as dc:rights.

3) The submission agreement concluded between TIB's digital preservation team and the depositing institution shall be documented as dcterms:license.
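
The three legal metadata fields can be written into dc.xml along the following lines. This is a sketch under stated assumptions: the root element `dc:record`, the function name `build_dc_record` and the sample values are illustrative and are not taken from TIB's specification. Only the element names dcterms:accessRights, dc:rights and dcterms:license come from the list above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def build_dc_record(title, identifier, access_rights, rights, license_text):
    """Serialise a Dublin Core record that carries the legal metadata fields."""
    # Assumption: the root element is dc:record; the text does not name it.
    record = ET.Element(f"{{{DC}}}record")
    fields = [
        (DC, "title", title),
        (DC, "identifier", identifier),
        (DCTERMS, "accessRights", access_rights),  # 1) access right to the object
        (DC, "rights", rights),                    # 2) right to preserve the object
        (DCTERMS, "license", license_text),        # 3) submission agreement
    ]
    for namespace, tag, value in fields:
        ET.SubElement(record, f"{{{namespace}}}{tag}").text = value
    return ET.tostring(record, encoding="unicode")
```

A call such as `build_dc_record("Sample title", "obj-001", "public", "Preservation permitted", "Submission agreement 2020")` returns the record as a string that can be saved as dc.xml.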

Connection to the repository via an OAI interface using the example of Leibniz Universität Hannover Institutional Repository

With this transfer information package, records are ingested via the OAI interface of the Leibniz Universität Hannover Institutional Repository. Each record must contain metadata (at least a title and an identifier) and 1-n files. All files within a record belong to the MASTER representation.

Specification parameter

Implementation

Naming convention

The original file name is kept. No specification check is performed.

Package structure

At least one file per SIP

Content data

At least one file

At least one metadata format must include direct links to the files and supplements belonging to the record.

Permissible file formats

No limitation

Representations

MASTER

Quality of data

Non-valid and non-well-formed files are also accepted.

Metadata

The repository must contain at least the object’s title and identifier metadata. There may also be additional metadata.

Identifier

A repository-internal handle

Legal metadata

The applicable licence terms must be stated.
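
The minimum metadata requirement (title and identifier) can be tested on a harvested record before ingest. The sketch below assumes that the record's metadata is available as a Dublin Core XML string, for example the oai_dc payload of an OAI-PMH response. The function name `missing_mandatory_fields` is hypothetical, and the harvesting step itself is not shown.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def missing_mandatory_fields(metadata_xml):
    """Return the mandatory Dublin Core fields that are absent or empty."""
    root = ET.fromstring(metadata_xml)
    missing = []
    for field in ("title", "identifier"):
        # Search the whole tree so that any wrapper element is accepted.
        values = [e.text for e in root.iter(DC_NS + field) if e.text and e.text.strip()]
        if not values:
            missing.append(field)
    return missing
```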

Objects with metadata from source systems or complex relations between data packages

Specification parameter

Implementation

Naming convention

The package is named at the top directory level with a unique identifier.

Package structure

For each SIP, there is a directory named with a unique identifier. It contains a dc.xml with Dublin Core metadata and, optionally, a harvest.xml with information about fetching from a data source and a collection.xml with information about assigning the object to a collection.

The representation directories contain the content files.

MD5 checksums can optionally be submitted in one of two ways: as one checksum file for all files in all delivered identifier directories, placed at the level of the identifier directories, or as one MD5 checksum file per file in the representation directories.

Representation names other than MASTER, PRE_INGEST_MODIFIED_MASTER and DERIVATIVE_COPY must be agreed with TIB.


root
|--[checksums].[md5] (optionally a checksum file for all files in all delivered identifier directories)
|--IDENTIFIER
|	|--dc.xml (mandatory)
|	|--harvest.xml (optional)
|	|--collection.xml (optional)
|	|--MASTER (mandatory)
|		|--1-n Files
|		|--0-n Files.[md5] (optionally one checksum file per file in the representation)
|		|--0-n directories
|			|-- ...
|	|--PRE_INGEST_MODIFIED_MASTER (optional)
|		|--1-n Files
|		|--0-n Files.[md5] (optionally one checksum file per file in the representation)
|		|--0-n directories
|			|-- ...
|	|--DERIVATIVE_COPY (optional)
|		|--1-n Files
|		|--0-n Files.[md5] (optionally one checksum file per file in the representation)
|		|--0-n directories
|			|-- ...
|	|--SOURCE_MD (optional)
|		|--1-n .[xml]
|	|--more representations (optional)
|--IDENTIFIER


Content data

At least one file per representation

Permissible file formats

No limitation

Representations

MASTER (mandatory), PRE_INGEST_MODIFIED_MASTER (optional), DERIVATIVE_COPY (optional), further representations by arrangement

Quality of data

Non-valid and non-well-formed files are also accepted.

Metadata

The object does not have to be indexed in the catalogue. A dc.xml must be available.

Identifier

Unique identifier

Legal metadata

Legal metadata are captured in the CSV file as follows:

1) The access right to the object, as granted by the depositing institution, shall be documented as dcterms:accessRights (e.g. private/public or another controlled vocabulary).

2) The depositing institution's right to preserve the object shall be documented as dc:rights.

3) The submission agreement concluded between TIB's digital preservation team and the depositing institution shall be documented as dcterms:license.

4) Access rights to the object in Rosetta are recorded as Access Right.
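
Where one MD5 checksum file per content file is delivered, the checksums can be verified before ingest. The sketch below rests on two assumptions that the text does not confirm: each checksum file is named after its content file with `.md5` appended (for example `report.pdf.md5`), and its first whitespace-separated token is the hexadecimal digest. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Compute the MD5 digest of a file without loading it fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar_checksums(representation_dir):
    """Return the names of content files that do not match their .md5 file."""
    mismatches = []
    for sidecar in sorted(Path(representation_dir).rglob("*.md5")):
        target = sidecar.with_suffix("")  # "report.pdf.md5" -> "report.pdf"
        tokens = sidecar.read_text().split()
        expected = tokens[0].lower() if tokens else ""
        if not target.is_file() or md5_of(target) != expected:
            mismatches.append(target.name)
    return mismatches
```

Files without a checksum file are skipped, which matches the optional status of checksums in this package type.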

Pre-ingest SIPs

A submission application creates Rosetta-compliant pre-ingest SIPs from the various transfer information packages and, in a second step, transfers them to Rosetta.

Post-ingest SIPs

After deposit, the pre-ingest SIPs become post-ingest SIPs, which the system enriches with additional metadata.

During further processing in Rosetta, the post-ingest SIP is automatically enriched with additional metadata and transformed into an AIP. The transformation is complete, and the post-ingest SIP becomes an AIP, once the package has been transferred to permanent archival storage and saved there successfully.