reprepro manual

This manual documents reprepro, a tool to generate and administer Debian package repositories.

Introduction

What reprepro does

Reprepro is a tool to take care of a repository of Debian packages (.dsc, .deb and .udeb). It installs them to the proper places, generates indices of packages (Packages and Sources and their compressed variants) and of index files (Release and optionally Release.gpg), so tools like apt know what is available and where to get it from. It keeps track of which file belongs where and removes files no longer needed (unless told not to do so). It can also make (partial) mirrors of remote repositories, including merging multiple sources and automatically (if explicitly requested) removing packages no longer available in the source. And many other things (sometimes I fear it got a few features too many).

What reprepro needs

It needs some libraries (zlib, libgpgme, libdb (Version 3, 4.3 or 4.4)) and can be compiled with some more for additional features (libarchive, libbz2). Otherwise it only needs apt's methods (only when downloading stuff), gpg (only when signing or checking signatures), and if compiled without libarchive it needs tar and ar installed.
If you tell reprepro to call scripts for you, you will of course need the interpreters for these scripts: The included example to generate pdiff files needs python. The example to extract changelogs needs dpkg-source.

What this manual aims to do

This manual aims to give an overview of the most important features, so people can use them and so that I do not implement something a second time because I forgot support is already there. For a full reference of all possible commands and config options take a look at the man page, as this manual might miss some of the more obscure options.

First steps

generate a repository with local packages

mirroring packages from other repositories

This example shows how to generate a mirror of a single architecture with all packages of etch plus security updates:

Repository basics

An apt-getable repository of Debian packages consists of two parts: the index files describing what is available and where it is, and the actual Debian binary (.deb), installer binary (.udeb), and source (.dsc together with .tar.gz or .orig.tar.gz and .diff.gz) packages.
While you do not need to know what these look like to use reprepro, it's always a good idea to know what you are creating.

Index files

All index files are in subdirectories of a directory called dists. Apt is very particular about what names those should have, including the name of dists. Including all optional and extensional files, the hierarchy looks like this:
dists
CODENAME
Each distribution has its own subdirectory here, named by its codename.
Release
This file describes what distribution this is and the checksums of all index files included.
Release.gpg
This is the optional detached gpg signature of the Release file. Take a look at the section about signing for how to activate this.
Contents-ARCHITECTURE.gz
This optional file lists all files and which packages they belong to. It's downloaded and used by tools like apt-file to allow users to determine which package to install to get a specific file.
To activate generating of these files by reprepro, you need a Contents header in your distribution declaration.
COMPONENT1
Each component has its own subdirectory here. They can be named whatever users can be bothered to write into their sources.list, but things like main, non-free and contrib are common. But funny names like bad or universe are just as possible.
source
If this distribution supports sources, this directory lists which source packages are available in this component.
Release
This file contains a copy of the information about the distribution that is applicable to this directory.
Sources
Sources.gz
Sources.bz2
These files contain the actual description of the source packages. By default only the .gz file is created. To create all three, add the following to the declarations of the distributions:
DscIndices: Sources Release . .gz .bz2
That header can also be used to name those files differently, but then apt will no longer find them...
Sources.diff
This optional directory contains diffs, so that only parts of the index file must be downloaded if it changed. While reprepro cannot generate these so-called pdiffs itself, it ships with an example python script it can call to generate those.
binary-ARCHITECTURE
Each architecture has its own directory in each component.
Release
This file contains a copy of the information about the distribution that is applicable to this directory.
Packages
Packages.gz
Packages.bz2
These files contain the actual description of the binary Packages. By default only the uncompressed and .gz files are created. To create all three, add the following to the declarations of the distributions:
DebIndices: Packages Release . .gz .bz2
That header can also be used to name those files differently, but then apt will no longer find them...
Packages.diff
This optional directory contains diffs, so that only parts of the index file must be downloaded if it changed. While reprepro cannot generate these so-called pdiffs itself, it ships with an example python script it can call to generate those.
debian-installer
This directory contains information about the .udeb modules for the Debian-Installer. Those are actually just a very stripped down form of normal .deb packages, and thus the hierarchy looks very similar:
binary-ARCHITECTURE
Packages
Packages.gz
COMPONENT2
There is one dir for every component. All look just the same.
To allow accessing distribution by function instead of by name, there are often symlinks from suite to codenames. That way users can write
deb http://some.domain.tld/debian SUITE COMPONENT1 COMPONENT2
instead of
deb http://some.domain.tld/debian CODENAME COMPONENT1 COMPONENT2
in their /etc/apt/sources.list and totally get surprised by getting something new after a release.
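The hierarchy above can be spelled out for a concrete distribution. The following is only a rough sketch: it assumes the default settings mentioned above (Sources.gz for sources, Packages and Packages.gz for binaries, a Release file per directory) and leaves out Contents, .diff and debian-installer files. The function name and the example codename, component and architecture are made up for illustration.

```python
from typing import List


def default_index_paths(codename: str, components: List[str],
                        architectures: List[str]) -> List[str]:
    """List index files of one distribution, relative to the repository base.

    The special architecture name "source" selects the source directory;
    every other name is treated as a binary architecture.
    """
    base = f"dists/{codename}"
    paths = [f"{base}/Release"]
    for component in components:
        for arch in architectures:
            if arch == "source":
                directory = f"{base}/{component}/source"
                names = ["Release", "Sources.gz"]
            else:
                directory = f"{base}/{component}/binary-{arch}"
                names = ["Release", "Packages", "Packages.gz"]
            paths.extend(f"{directory}/{name}" for name in names)
    return paths


if __name__ == "__main__":
    # Hypothetical distribution: codename etch, one component, i386 plus sources.
    for path in default_index_paths("etch", ["main"], ["i386", "source"]):
        print(path)
```

Printing the list for a small setup like this is a quick way to compare what you expect with what actually ends up below dists/.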

Package pool

While the index files have a required filename, the actual files are given just as a relative path to the base directory you specify in your sources list. That means apt can get them no matter what scheme is used to place them. The classical way Debian used till woody was to just put them in subdirectories of the binary-ARCHITECTURE directories, with the exception of the architecture-independent packages, which were put into an artificial binary-all directory. This was replaced for the official repository with package pools, which reprepro also uses. (Actually reprepro stores everything in pool a bit longer than the official repositories, that's why it recalculates all filenames without exception).
In a package pool, all package files of all distributions in that repository are stored in a common directory hierarchy starting with pool/, only separated by the component they belong to and the source package name. Like everything, this has disadvantages and advantages. Now let's look at the actual structure of a pool (there is currently no difference between the pool structure of official Debian repositories and those generated by reprepro):
pool
The directory all this resides in is normally called pool. That is not hard coded anywhere in apt, which only looks at the relative directory names in the index files. But there is also no reason to name it differently.
COMPONENT1
Each component has its own subdirectory here. They can be named whatever users can be bothered to write into their sources.list, but things like main, non-free and contrib are common. But funny names like bad or universe are just as possible.
a
As there are really many different source packages, the directory would be too full when all put here. So they are separated in different directories. Source packages starting with lib are put into a directory named after the first four letters of the source name. Everything else is put in a directory having the first letter as name.
asource
Then the source package name follows. So this directory pool/COMPONENT1/a/asource/ would contain all files of different versions of the hypothetical package asource.
asource
a-source_version.dsc
a-source_version.tar.gz
The actual source package consists of its description file (.dsc) and the files referenced by it.
binary_version_ARCH1.deb
binary_version_ARCH2.deb
binary2_version_all.deb
di-module_version_ARCH1.udeb
Binary packages are stored here too. So to know where a binary package is stored you need to know what its source package name is.
liba
As described before packages starting with lib are not stored in l but get a bit more context.
COMPONENT2
There is one dir for every component. All look just the same.
As said before, you don't need to know this hierarchy in normal operation. reprepro will put everything where it belongs, keep account of what is there and needed by which distribution or snapshot, and delete files no longer needed. (Unless told otherwise or when you are using the low-level commands).
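The naming rule for pool directories described above (first letter of the source name, or the first four letters for names starting with lib) can be written down as a small function. This is a minimal sketch for illustration only; the function name is made up and it is not how reprepro itself is implemented.

```python
def pool_directory(component: str, source_name: str) -> str:
    """Return the pool directory holding all files of one source package."""
    if not source_name:
        raise ValueError("source package name must not be empty")
    if source_name.startswith("lib"):
        # lib* packages get a bit more context: the first four letters.
        prefix = source_name[:4]
    else:
        prefix = source_name[0]
    return f"pool/{component}/{prefix}/{source_name}"


if __name__ == "__main__":
    # Hypothetical source package names, as in the hierarchy above.
    print(pool_directory("COMPONENT1", "asource"))
    print(pool_directory("COMPONENT1", "liba"))
```

Note that the input is always the source package name: to find a binary package you first have to look up which source package it was built from.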

Config files

TO BE DOCUMENTED

Generation of index files

Deciding when to generate

As reprepro stores all state in its database, you can decide when you want the index files to be written to the dists/ directory. You can always tell reprepro to generate those files with the export command:
reprepro -b $YOURBASEDIR export $CODENAMES
This can be especially useful, if you just edited conf/distributions and want to test what it generates.

While that command regenerates all files, in normal operation reprepro will only regenerate files where something just changed or that are missing. With the --export option you can control when this will happen:

never
Don't touch any index files. This can be useful for doing multiple operations in a row and not wanting to regenerate the indices all the time. Note that unless you do an explicit export or change the same parts later without that option, the generated index files may be permanently out of date.
normal
This is the default behaviour. In this mode all distributions are processed that were looked at without error. This ensures that even after an operation that had nothing to do, the distribution looked at has all the exported files needed to access it.
changed
Only look for missing files if something in a distribution actually changed. That means that if you create a distribution or change its options, you have to actually do some action adding, removing or changing a package in it before the changes will hit the dists/ directory.
force
Also try to write the current state if some error occurred. In all other modes reprepro will not write the index files if there was a problem. While this keeps the repository usable for users, it means that you will need an explicit export to write possible other changes done before that in the same run. (reprepro will tell you that at the end of the run with error, but you should not miss it).
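As an illustration of the never mode, the following sketch builds the command lines for adding several packages without touching the index files, followed by one explicit export at the end. It only constructs and prints the commands. The spelling --export=never assumes the usual long-option syntax (check the man page), and the base directory, codename and file names are made up.

```python
import shlex
from typing import List


def batch_include_commands(basedir: str, codename: str,
                           deb_files: List[str]) -> List[List[str]]:
    """Build argv lists: one includedeb per file, then a single export."""
    commands = [
        ["reprepro", "-b", basedir, "--export=never",
         "includedeb", codename, deb_file]
        for deb_file in deb_files
    ]
    # Without this final step the files in dists/ would stay out of date.
    commands.append(["reprepro", "-b", basedir, "export", codename])
    return commands


if __name__ == "__main__":
    # Hypothetical base directory, codename and package files.
    for argv in batch_include_commands("/srv/repo", "etch",
                                       ["foo_1.0_i386.deb", "bar_2.0_all.deb"]):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The argv lists could be handed to subprocess.run one by one; stopping at the first failure would be advisable, since an export after a failed step may not be what you want.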

Distribution specific fields

There are a lot of conf/distributions headers to control what index files to generate for some distribution, how to name them, how to postprocess them and so on. The most important are:

Fields for the Release files

The following headers are copied verbatim to the Release file, if they exist: Origin, Label, Codename, Suite, Architectures (excluding a possible value "source"), Components, Description, and NotAutomatic.

Choosing compression and file names

Depending on the type of the index files, different files are generated. Not specifying anything is equivalent to:
 DscIndices: Sources Release .gz
 DebIndices: Packages Release . .gz
 UDebIndices: Packages . .gz
This means to generate Release, Sources.gz for sources, Release, Packages and Packages.gz for binaries and Packages and Packages.gz for installer modules.
The format of these headers is the name of the index file to generate, followed by the optional name for a per-directory release description (when no name is specified, no file is generated). Then follows a list of compressions: a single dot (.) means generating an uncompressed index, .gz means generating a gzipped output, while .bz2 requests a bzip2ed file. (.bz2 is not available when disabled at compile time). After the compressions a script can be given that is called to generate/update additional forms, see "Additional index files".
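To make the header format more concrete, here is a small sketch that splits the value of such a header into its parts and derives the names of the index files it asks for. It is an illustration of the format as described above, not reprepro's own parser; all names in it are made up, and it simply assumes that the release name and the script name never look like one of the three compression markers.

```python
from typing import List, NamedTuple, Optional

KNOWN_COMPRESSIONS = (".", ".gz", ".bz2")


class IndexSpec(NamedTuple):
    index_name: str               # e.g. Packages or Sources
    release_name: Optional[str]   # per-directory release file, if any
    compressions: List[str]       # subset of KNOWN_COMPRESSIONS, in order
    script: Optional[str]         # optional script for additional forms


def parse_index_header(value: str) -> IndexSpec:
    """Split the value of a DscIndices/DebIndices/UDebIndices header."""
    tokens = value.split()
    if not tokens:
        raise ValueError("empty index header")
    index_name, rest = tokens[0], tokens[1:]
    release_name = None
    if rest and rest[0] not in KNOWN_COMPRESSIONS:
        release_name = rest.pop(0)
    compressions = []
    while rest and rest[0] in KNOWN_COMPRESSIONS:
        compressions.append(rest.pop(0))
    script = rest.pop(0) if rest else None
    if rest:
        raise ValueError(f"unexpected trailing values: {rest}")
    return IndexSpec(index_name, release_name, compressions, script)


def index_file_names(spec: IndexSpec) -> List[str]:
    """Names of the index files requested; '.' means the uncompressed file."""
    return [spec.index_name + ("" if c == "." else c)
            for c in spec.compressions]


if __name__ == "__main__":
    spec = parse_index_header("Packages Release . .gz")
    print(spec)
    print(index_file_names(spec))
```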

Signing

If there is a SignWith header, reprepro will try to generate a Release.gpg file using libgpgme. If the value of the header is yes it will use the first key it finds, otherwise it will pass the value to libgpgme to determine the key. (Which means fingerprints and keyids work fine, and whatever libgpgme supports, which might include most that gpg supports to select a key).
The best way to deal with keys needing passphrases is to use gpg-agent. The only way to specify which keyring to use is to set the GNUPGHOME environment variable, which will affect all distributions.

Contents files

Reprepro can generate files called dists/CODENAME/Contents-ARCHITECTURE.gz listing all files in all binary packages available for the selected architecture in that distribution and which package they belong to.
This file can either be used by humans directly or downloaded and searched with tools like apt-file.
To activate generating of these files by reprepro, you need a Contents header in that distribution's declaration in conf/distributions, like:
Contents: 1
The number is the inverse ratio of not yet looked at and cached files to process in every run. The larger the number, the more packages are missing. 1 means to list everything.
There can be a number of arguments after the number, and additional headers specifying which Architectures to generate Contents files for and which Components to include in those. For example
Contents: 1 udebs nodebs . .gz .bz2
ContentsArchitectures: ia64
ContentsComponents: none
ContentsUComponents: main
means to not skip any packages, generate Contents for .udeb files, not generating Contents for .debs. Also it is only generated for the ia64 architecture and only packages in component main are included.

Additional index files (like .diff)

Index files that reprepro cannot generate itself can be generated by telling it to call a script.
the tiffany example hook script (generates pdiff files)
This example generates Packages.diff and/or Sources.diff directories containing a set of ed-style patches, so that people do not redownload the whole index for just some small changes.
To use it, copy tiffany.example from the examples directory into your conf directory (or any other directory, then you will need to give an absolute path later). Unpack, if needed. Rename it to tiffany.py and make it executable. Make sure you have python-apt, diff and gzip installed. Then add something like the following to the headers of the distributions that should use this in conf/distributions:
 DscIndices: Sources Release . .gz tiffany.py
 DebIndices: Packages Release . .gz tiffany.py
More information can be found in the file itself. You should read it.
the bzip2 example hook script
This is a very simple example. Simple and mostly useless, as reprepro has built-in .bz2 generation support, unless you compiled it yourself with --without-libbz2 or with no libbz2-dev installed.
To use it, copy bzip.example from the examples directory into your conf directory (or any other directory, then you will need to give an absolute path later). Unpack, if needed. Rename it to bzip2.sh and make it executable. Then add something like the following to the headers of the distributions that should use this in conf/distributions:
 DscIndices: Sources Release . .gz bzip2.sh
 DebIndices: Packages Release . .gz bzip2.sh
 UDebIndices: Packages . .gz bzip2.sh
The script will compress the index file using the bzip2 program and tell reprepro which files to include in the Release file of the distribution.
internals
TO BE CONTINUED

...

TO BE CONTINUED

Local packages

There are two ways to get packages not yet in any repository into yours.
includedsc, includedeb, include
These are for including packages at the command line. Many options are available to control what actually happens. You can easily force components, section and priority and/or choose to include only some files or only in specific architectures. (This can be quite useful for architecture all packages depending on some packages you will need some time before building for some of your architectures). Files can be moved instead of copied and most sanity checks overridden. They are also optimized towards being fast and simply try things instead of checking a long time if they would succeed.
processincoming
This command checks for changes files in an incoming directory. Being optimized for automatic processing (i.e. trying to check everything before actually doing anything), it can be slower (as every file is copied at least once to make sure the owner is correct; with multiple partitions another copy can follow). Component, section and priority can only be changed via the distribution's override files. Every inclusion needs a .changes file.
This method is also relatively new (only available since 2.0.0), thus further optimisation for automatic processing is still to come.

Including via command line

There are three commands to directly include packages into your repository: includedeb, includedsc and include. Each needs the codename of the distribution you want to put your package into as first argument and a file of the appropriate type (.deb, .dsc or .changes, respectively) as second argument.
If no component is specified via --component (or short -C), it will be guessed looking at its section and the components of that distribution.
If there is no --section (or short -S) option, and it is not specified by the (binary or source, depending on the type) override file of the distribution, the value from the .changes file is used (if the command is include) or it is extracted out of the file (if it is a .deb file; future versions might also try to extract it from a .dsc's diff or tarball).
Same with the priority and the --priority (or short -P) option.
With the --architecture (or short -A) option, the scope of the command is limited to that architecture. includedeb will add an Architecture all package only to that architecture (and complain about Debian packages for other architectures). include will do the same and ignore packages for other architectures (source packages will only be included if the value for --architecture is source).
To limit the scope to a specific type of package, use the --packagetype or short -T option. Possible values are deb, udeb and dsc.
When using the --delete option, files will be moved or deleted after copying them. Repeating the --delete option will also delete unused files.
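The options described above can be combined on one command line. The sketch below assembles such a call to includedeb from optional settings and only prints it. It assumes the options are given before the command name, as in the other examples of this manual; the function name and the base directory, codename and file names are made up.

```python
import shlex
from typing import List, Optional


def includedeb_command(basedir: str, codename: str, deb_file: str,
                       component: Optional[str] = None,
                       section: Optional[str] = None,
                       priority: Optional[str] = None,
                       architecture: Optional[str] = None) -> List[str]:
    """Build the argv for including one .deb, adding only the options set."""
    argv = ["reprepro", "-b", basedir]
    options = (("--component", component), ("--section", section),
               ("--priority", priority), ("--architecture", architecture))
    for flag, value in options:
        if value is not None:
            argv += [flag, value]
    argv += ["includedeb", codename, deb_file]
    return argv


if __name__ == "__main__":
    # Hypothetical values: force the component and limit to one architecture.
    argv = includedeb_command("/srv/repo", "etch", "foo_1.0_all.deb",
                              component="main", architecture="i386")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Leaving an argument at None means the corresponding value is guessed or taken from the override file or the package, as described above.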
TO BE CONTINUED.

Processing and incoming queue

TO BE DOCUMENTED

Incoming directories

TO BE DOCUMENTED

Mirroring

TO BE DOCUMENTED

Propagation of packages

You can copy packages between distributions using the pull and copy commands.
TO BE DOCUMENTED

Snapshots

There is a gensnapshot command.
TO BE DOCUMENTED

Source package tracking

TO BE DOCUMENTED

Maintenance

This section lists some commands you can use to check and improve the health of your repository.
Normally none of this should be needed, but taking a look from time to time cannot harm.
reprepro -b $YOURBASEDIR dumpunreferenced
This lists all files reprepro knows about that are not marked as needed by anything. Unless you called reprepro with the --keepunreferenced option, those should never occur. Though if reprepro is confused or interrupted it may sometimes prefer keeping files around instead of deleting them.
reprepro -b $YOURBASEDIR deleteunreferenced
This is like the command before, only that such files are directly forgotten and deleted.
reprepro -b $YOURBASEDIR check
Look if all needed files are in fact marked needed and known.
reprepro -b $YOURBASEDIR checkpool
Make sure all known files are still there and still have the same checksum.
reprepro -b $YOURBASEDIR checkpool fast
As the command above, but do not compute checksums.
reprepro -b $YOURBASEDIR tidytracks
If you use source package tracking, check for files kept because of this that should no longer be kept under the current rules.

Internals

reprepro stores the data it collects in Berkeley DB files (.db) in a directory called db/ or whatever you specified via command line. With a few exceptions, those files are NO CACHES, but the actual data. While some of that data can be regained when you lose those files, they are better not deleted.

packages.db

This file contains the actual package information.
It contains a database for every (codename,component,architecture,packagetype) quadruple available.
Each is indexed by package name and essentially contains the information written to the Packages and Sources files.
Note that if you change your conf/distributions to no longer list some codenames, architectures or components, that will not remove the associated databases in this file. That needs an explicit call to clearvanished.

references.db

This file contains a single database that lists for every file why this file is still needed. This is either an identifier for a package database, a tracked source package, or a snapshot.
Some low level commands to access this are (take a look at the manpage for how to use them):
rereference
recreate references (i.e. forget old and create newly)
dumpreferences
print a list of all references
_removereferences
remove everything referenced by a given identifier
_addreference
manually add a reference

files.db

This file contains what reprepro knows about your pool directory, i.e. what files it thinks are there with what sizes and md5sums. If you manually put files there or remove them, you should tell reprepro about that. (It sometimes looks for files there without being told, but it never forgets files except when it would have deleted them anyway). Some low level commands (take a look at the man page for how to use them):
checkpool fast
Make sure all files are still there.
checkpool
Make sure all files are still there and correct.
dumpunreferenced
Show all known files without reference.
deleteunreferenced
Delete all known files without reference.
_listmd5sums
Dump this database
_detect
Add files to the database
_forget
Forget that some file is there
_addmd5sums
Create the database from dumped data
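Since this database records a size and an md5sum for every known file, it can be handy to compute the same two values yourself for a file in the pool, for example before comparing them by hand with what reprepro reports. The following is a minimal sketch using only the Python standard library; the function name is made up, and nothing is assumed here about the format in which reprepro stores or prints these values.

```python
import hashlib
from typing import Tuple


def md5_and_size(path: str, chunk_size: int = 1 << 20) -> Tuple[str, int]:
    """Return (hexadecimal md5sum, size in bytes) of the file at path."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so that large package files need little memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


if __name__ == "__main__":
    import sys
    # Usage: pass one or more file names, e.g. files below pool/.
    for name in sys.argv[1:]:
        md5sum, size = md5_and_size(name)
        print(name, md5sum, size)
```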

release.cache.db

In this file reprepro remembers what it already wrote to the dists directory, so that it can write their checksums (including the checksums of the uncompressed variant, even if that was never written to disk) into a newly created Release file without having to trust those files or having to unpack them.

contents.cache.db

This file contains all the lists of files of binary package files where reprepro already needed them. (which can only happen if you requested Contents files to be generated).

tracking.db

This file contains the information of the source package tracking.

Disaster recovery

TO BE DOCUMENTED (see the recovery file until then)

Paranoia

Like all software, reprepro might have bugs. It also uses libraries not written by myself, which I am thus even more sure will have bugs. Some of those bugs might be security relevant. This section contains some tips to reduce the impact of those. External stuff being used and attack vectors opened by it:
libgpgme/gpg
Almost anything is run through libgpgme and thus gpg. It will be used to check the Release.gpg file, or to read .dsc and .changes files (even when there is no key to look for specified, as that is the best way to get the data from the signed block). Avoiding this by just accepting stuff without looking for signatures on untrusted data is not really an option, so I know nothing to prevent this type of problem.
libarchive
The .tar files within .deb files are normally (unless that library was not available while compiling) read using libarchive. This happens when a .deb file is to be added (though only after deciding if it should be added, so if it does not have the correct checksum or the .changes did not have the signatures you specified, it is not) or when the file list is to be extracted (when creating Contents files). Note that they are not processed when only mirroring them (of course unless Contents files are generated), as then only the information from the Packages file is copied.
dpkg-deb/tar
If reprepro was compiled without libarchive, dpkg-deb is used instead, which most likely will call tar. Otherwise the same applies as for the previous item.
zlib
When mirroring packages, the downloaded Packages.gz and Sources.gz files are read using zlib. Also the generated .gz files are generated using it. There is no option but hoping there is no security relevant problem in that library.
libbz2
Only used to generate .bz2 files. If you fear simple blockwise writing using that library has a security problem that can be exploited by data looking harmless enough to be written to the generated index files, you can always decide not to tell reprepro to generate .bz2 files.

What reprepro cannot do

There are some things reprepro does not do:
Verbatim mirroring
Reprepro aims to put all files into a coherent pool/ hierarchy. Thus it cannot guarantee that files will have the same relative path as in the original repository (especially if those have no pool). It also creates the index files from its own indices. While this leads to a tidy repository and possible savings of disk space, the signatures of the repositories you mirror cannot be used to authenticate the mirror, but you will have to sign (or tell reprepro to sign for you) the result. While this is perfect when you only mirror some parts or specific packages or also have local packages that need local signing anyway, reprepro is not a suitable tool for creating a full mirror that can be authenticated without adding the key of this repository.
Placing your files on your own
Reprepro does all the calculation of filenames to save files as, bookkeeping what files are there and what are needed and so on. This cannot be switched off or disabled. You can place files where reprepro will expect them and reprepro will use them if their md5sum matches. But reprepro is not suited if you want those files outside of a pool or in places reprepro does not consider their canonical ones.
Having different files with the same name
Take a look in the FAQ (currently question 1.2) for why this is a problem and how to avoid it.