How does a digital repository work?
A repository stores data that can be retrieved and modified later. Different types of repositories exist. They can be used for version control, metadata and other purposes.
What is a repository?
‘Repository’ means ‘storage’ and comes from the Latin word repositorium. In software technology, a repository is a digital archive in which data, documents, development progress, metadata and programs can be stored and shared. Many repositories also provide version control. Depending on the intended use, this technology enables large teams or communities working all over the world to collaborate on a shared project. Available types of repositories differ in terms of their approach and structure. The best-known examples include GitHub, a hosting platform for Git repositories, and the Google Repository.
The basis for a repository is usually a database, which, depending on requirements, can be set up on a local hard disk or a server, or can also be distributed across numerous servers in a content delivery network (CDN). Data catalogues are created that contain the forms and representations of various stored objects and provide information about their relationship to each other. All this information is stored in the form of metadata and can be searched for, retrieved, modified and adapted at any time with the appropriate authorisation.
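The idea of a searchable data catalogue can be illustrated with a minimal Python sketch. It is not how any particular repository product is implemented; the class names, field names and the simple `user_can_write` flag are assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """Metadata describing one stored object (field names are illustrative)."""
    object_id: str
    media_type: str
    related_to: list[str] = field(default_factory=list)


class Catalogue:
    """Hypothetical in-memory catalogue; a real repository would use a database."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def add(self, entry: CatalogueEntry) -> None:
        self._entries[entry.object_id] = entry

    def search(self, media_type: str) -> list[str]:
        """Return the IDs of all objects of the given type, sorted by ID."""
        return sorted(
            e.object_id for e in self._entries.values() if e.media_type == media_type
        )

    def related(self, object_id: str) -> list[str]:
        """Return the IDs of the objects this entry points to."""
        return list(self._entries[object_id].related_to)

    def update_media_type(self, object_id: str, new_type: str, user_can_write: bool) -> None:
        """Modify metadata, but only with the appropriate authorisation."""
        if not user_can_write:
            raise PermissionError("write access required")
        self._entries[object_id].media_type = new_type
```

In this sketch, searching, following relationships and modifying entries all operate on the metadata only, and the stored objects themselves are never touched.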
How is a repository structured?
To illustrate how a repository is structured, picture a tree. The terminology of software development reflects this image: the trunk contains the current version of a project and its source code, while the branches hold edits that are developed separately. Changes are later merged back into the trunk so that all participants have access to them. Tags mark specific states of the project, such as a release, so that they can be found again later.
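The tree picture can be sketched in a few lines of Python. This is a deliberately simplified model, assuming that each state of the project is just a string and that merging means copying the branch's latest state onto the trunk; real version control systems merge far more carefully. The class and method names are hypothetical.

```python
class TreeRepository:
    """Toy model of trunk, branches and tags (not a real version control system)."""

    def __init__(self, initial_state: str) -> None:
        # Each line of development maps to its list of states, oldest first.
        self.lines = {"trunk": [initial_state]}
        # Each tag maps a name to one fixed state.
        self.tags = {}

    def branch(self, name: str, source: str = "trunk") -> None:
        """Start a new branch as a copy of the source line's history."""
        self.lines[name] = list(self.lines[source])

    def commit(self, line: str, state: str) -> None:
        """Record a new state on the trunk or on a branch."""
        self.lines[line].append(state)

    def merge(self, branch: str, target: str = "trunk") -> None:
        """Naive merge: the branch's latest state becomes the target's latest state."""
        self.lines[target].append(self.lines[branch][-1])

    def tag(self, name: str, line: str = "trunk") -> None:
        """Give the current state of a line a permanent name."""
        self.tags[name] = self.lines[line][-1]
```

Work committed to a branch leaves the trunk untouched until `merge` is called, and a tag keeps pointing at the state it was created on even after the trunk moves ahead.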
What types of repositories are there?
Not all repositories are the same. They differ in the kind of data they archive and in how they approach storing it. The following are the best-known types.
Repository for version management
In version management, the aim is to store data clearly in a shared archive while keeping the individual work steps and the relationships between them traceable. Source code files and other data are stored and archived. Data can be copied from the repository to a local hard drive so that developers can continue working with it. This process is referred to as ‘checking out’. The developer then works with the local data, making changes or discarding previous changes. Once the work is complete, the latest state of the project is uploaded back to the repository, which is referred to as ‘checking in’. All changes and comments are logged during this process.
This approach has several advantages. Small and large teams can collaborate on the same project without overwriting older versions. Instead, every change is logged, making it possible to return to a previous version. Updates can be made simultaneously without states being overwritten or changes being lost. In theory, any user can pick up the project from any state without risk.
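The cycle of checking out, checking in and returning to an earlier version can be sketched as follows. This is a minimal illustration in Python, assuming that a project is a dictionary of file names and contents; the class and its methods are hypothetical, and the sketch does not attempt to resolve conflicting simultaneous edits the way a real version control system does.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    number: int
    author: str
    comment: str
    files: dict  # file name -> content at this revision


class VersionedRepository:
    """Toy repository that never overwrites an earlier revision."""

    def __init__(self) -> None:
        self._history = [Revision(0, "system", "initial empty state", {})]

    def check_out(self, number=None) -> dict:
        """Return a private working copy of a revision (the latest by default)."""
        revision = self._history[-1] if number is None else self._history[number]
        return copy.deepcopy(revision.files)

    def check_in(self, files: dict, author: str, comment: str) -> int:
        """Store the working copy as a new revision and return its number."""
        revision = Revision(len(self._history), author, comment, copy.deepcopy(files))
        self._history.append(revision)
        return revision.number

    def log(self) -> list:
        """Return (number, author, comment) for every revision, oldest first."""
        return [(r.number, r.author, r.comment) for r in self._history]
```

Because every check-in appends a new revision instead of replacing the previous one, `check_out(1)` still returns the first checked-in state after later revisions have been added.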
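The most popular version control systems include CVS, Git and SVN (Apache Subversion). GitHub is a hosting platform built on Git rather than a version control system in its own right.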
Repository for metadata
A repository for metadata tends to be used in highly complex IT infrastructures. Such a repository contains the data of the entire system as well as information about the infrastructure’s context and environment. The advantage of this type of repository is that changes can be made without altering the source code or needing to implement additional programs. Instead, the database table that forms the basis of the respective system is adapted directly. The metadata repository tends to be used in enterprise application integration (EAI) and data warehousing.
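The principle of changing behaviour by adapting a table rather than the source code can be shown with a small Python sketch. The mapping table, its column names and the `transform` function are assumptions chosen for illustration; in practice the table would live in a database rather than in a Python list.

```python
# Hypothetical metadata table: each row maps a field of a source system
# to the field name expected by the target system.
FIELD_MAPPING = [
    {"source": "cust_name", "target": "customer_name"},
    {"source": "cust_no", "target": "customer_id"},
]


def transform(record: dict, mapping: list) -> dict:
    """Rename the fields of a record according to the metadata table.

    Fields without a row in the table are dropped.
    """
    return {
        row["target"]: record[row["source"]]
        for row in mapping
        if row["source"] in record
    }
```

To support an additional field, only a new row has to be added to the table, and the `transform` function itself stays unchanged.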
Repository for software
A software repository is particularly important for Linux users. It contains application packages and the corresponding metadata such as descriptions, annotations, dependencies and changes. Installation and updates are performed using a package manager, so users don’t have to track down and update each application by hand. Instead, the package manager fetches available updates from the repository. The updates themselves are often provided by the community: package maintainers typically supply the updated data and look after the respective software repository.
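How a package manager might use the dependency metadata in a software repository can be sketched in Python. The package names, versions and index layout below are invented for illustration and do not correspond to any real distribution or package manager.

```python
# Hypothetical repository index: package name -> metadata.
PACKAGE_INDEX = {
    "editor": {"version": "2.1", "depends": ["libui", "libtext"]},
    "libui": {"version": "1.4", "depends": ["libcore"]},
    "libtext": {"version": "0.9", "depends": ["libcore"]},
    "libcore": {"version": "3.0", "depends": []},
}


def install_order(package: str, index: dict) -> list:
    """Return the package and its dependencies, dependencies first."""
    order = []
    visiting = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled for installation
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        if name not in index:
            raise KeyError(f"package {name!r} not found in repository")
        visiting.add(name)
        for dependency in index[name]["depends"]:
            visit(dependency)
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order


def outdated(installed: dict, index: dict) -> list:
    """Return installed packages whose version differs from the repository's."""
    return sorted(
        name
        for name, version in installed.items()
        if name in index and index[name]["version"] != version
    )
```

With this index, `install_order("editor", PACKAGE_INDEX)` returns `["libcore", "libui", "libtext", "editor"]`, so every package is installed only after the packages it depends on.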
Repository for document servers
The term repository is also applied, at least figuratively, to large online publication platforms and document servers. Not every feature of the repository principle is adopted one to one, but the approach is adapted for this use. Well-known document servers such as arXiv publish papers from fields including quantitative biology, computer science, mathematics, physics and statistics. Moderators check new submissions and approve or reject them, although this is not a full peer review. The scientific works can then be made available for download. However, in contrast to a version control repository, readers cannot edit the documents.
Repository for CASE
A repository is also frequently used in computer-aided software engineering. It’s mainly used to store project data, documentation and source code.
Which repositories are useful?
Numerous types of repositories are available for different purposes. A distinction is made between solutions that are open source and those offered commercially. The most popular platform for open-source projects is GitHub, which is itself a commercial service built on the open-source version control system Git. However, there are various GitHub alternatives such as Apache Allura, Bazaar, Gitolite, Mercurial or SourceForge. A detailed comparison of GitHub and GitLab is available in our Digital Guide. Among the best-known proprietary repositories are Alienbrain, BitKeeper, IBM Rational Synergy and MySQL Yum.
Whether a repository is suitable for your project depends on your requirements and your way of working. For teamwork, a repository can improve work processes and optimise workflow. Even if employees access a project and make changes at different times and from different locations, the trunk is always secure. Solutions can be tested without jeopardising previous progress. It’s a good idea to test an open-source solution before purchasing a commercial option.
How does a repository work?
Used correctly, a repository offers several advantages. GitHub is a great example of this. Once you’ve installed Git and set up a GitHub account, you can use the web interface to assign and process tasks. Changes are recorded as commits and exchanged with the shared repository via pushes and pulls. In this way, a team leader can track individual progress steps and members can follow the project down to the smallest detail. To learn more about working with Git and GitHub, have a look at our Git tutorial.