This page describes a way to mirror the Linux kernel BKCVS repository into a GNU Arch one. BKCVS is a CVS repository that mirrors the BitKeeper repository currently used by some of the most important Linux kernel developers.

The Linux BKCVS repository can be retrieved using rsync:

mkdir -p /path/to/cvsrepo/linux-2.6
cvs -d /path/to/cvsrepo init
rsync -az --delete rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.5/ /path/to/cvsrepo/linux-2.6
cvs -d /path/to/cvsrepo co linux-2.6

There is a ChangeSet,v file in the repository which contains the log messages of every commit. The CVS commit logs of the kernel files contain a "Logical change X" text, X being the ChangeSet,v CVS revision number.
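As a rough illustration of that link, the changeset number can be pulled out of a per-file log message with sed. The helper name logical_change and the sample log line are made up for this sketch; the pattern mirrors the one used in the script on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the ChangeSet,v revision referenced by one
# line of a CVS commit log, or nothing if the line carries no marker.
logical_change() {
    printf '%s\n' "$1" |
        sed -n 's/^.*(Logical change \([0-9.][0-9.]*\)).*$/\1/p'
}

# Sample line in the format described above (the number is invented).
logical_change "(Logical change 1.23456)"
```

Run as is, the sketch prints 1.23456; a line without the marker prints nothing.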

The script below rsyncs the Linux CVS repository, generates a log with the changeset information, generates a patch and a GNU Arch log for every new changeset, and applies and commits them one by one. The script does not pollute the Linux tree with CVS directories. It uses some files and directories named ",bkcvs*" in the tree root.
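In the script, each per-file patch is a diff between the file's CVS revision for that changeset and the revision just before it. A minimal sketch of that arithmetic, assuming trunk revisions of the form 1.N (the function name prev_rev is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: given a trunk revision 1.N, print 1.(N-1).
prev_rev() {
    minor=${1#*.}            # strip the leading "1."
    echo "1.$((minor - 1))"
}

new=1.42                     # invented revision number
old=$(prev_rev "$new")
# These are the two -r arguments a "cvs rdiff -u" call would receive.
echo "-r$old -r$new"
```

The sketch only covers the revision arithmetic; it does not call cvs itself.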

The following steps should be performed:

After each commit the script removes the previous revision from the revision library, since keeping all of them would use too much disk space (there are around 50 changesets a day). The script also uses the ,bkcvs-home directory in the current tree as $HOME, because it rewrites the .arch-params/=id file to match the author of each BKCVS commit (GNU Arch has no option for setting this).
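The $HOME substitution relies on the shell's per-command environment assignment. The sketch below shows the idea without calling tla: a stand-in command reads the identity file from a temporary home directory. The author name jdoe is invented; the address format follows the one the script writes.

```shell
#!/bin/sh
# Build a throwaway home directory holding only the identity file.
fake_home=$(mktemp -d)
mkdir -p "$fake_home/.arch-params"
# "jdoe" is a made-up author used only for this illustration.
printf '%s\n' 'jdoe <jdoe@invalid-address.com>' > "$fake_home/.arch-params/=id"

# The assignment prefix changes HOME for this one command only;
# the calling shell keeps its own HOME.
seen=$(HOME="$fake_home" sh -c 'cat "$HOME/.arch-params/=id"')
echo "$seen"
```

Because the override applies to a single command, the user's real .arch-params directory is never modified.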

The script generates some X-BKCVS* headers in the Arch log (they can easily be removed or modified). The X-BKCVS-Rev header is useful if the entire directory is lost: just check out the Linux GNU Arch repository and copy this number into the ,bkcvs-last-rev file (the only file the script needs).
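A possible way to automate that recovery step is sketched below. It assumes the text of the latest Arch log is already available on standard input; how to fetch it from the archive is left out. The helper name bkcvs_rev_from_log and all values in the sample log are invented.

```shell
#!/bin/sh
# Hypothetical helper: read an Arch log on stdin and print the value of
# its first X-BKCVS-Rev header.
bkcvs_rev_from_log() {
    sed -n 's/^X-BKCVS-Rev: \(.*\)$/\1/p' | head -n 1
}

# Sample log carrying the headers the script writes (values invented).
sample_log='Summary: example changeset
Keywords: 
X-BKCVS-Date: 2005/01/01 12:00:00
X-BKCVS-Author: jdoe
X-BKCVS-Rev: 1.23456
X-BKrev: 0123456789abcdef'

# Write the recovered number where the script expects it; a temporary
# directory stands in for the tree root here.
workdir=$(mktemp -d)
printf '%s\n' "$sample_log" | bkcvs_rev_from_log > "$workdir/,bkcvs-last-rev"
cat "$workdir/,bkcvs-last-rev"
```

In a real tree the output file would be ,bkcvs-last-rev in the tree root rather than in a temporary directory.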

The command line options:

bkcvs-arch-sync.sh \
    --working-dir|-d <dir>    cd to <dir> before running (current directory without this option)
    --dont-rsync|-n           don't rsync with the Linux repository (useful for debugging)
    --help|-h                 print this message

bkcvs-arch-sync.sh

#!/bin/bash
#
# Generates BKCVS changesets and synchronises them with the
# GNU Arch repository
#
# Copyright (C) 2005 ARM Limited
# Written by Catalin Marinas <catalin.marinas@arm.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

# Just exit if any error occurs
set -e
#set -x

# Print a help message
function print_help()
{
    echo "Usage:"
    echo "$1 \\"
    echo "    --working-dir|-d <dir>    # cd to <dir> before running"
    echo "    --dont-rsync|-n           # don't rsync with the Linux repository"
    echo "    --help|-h                 # print this message"
}

# Parse the command line options
GNUARCH_LINUX_DIR=`pwd`
DONT_RSYNC=n

while [ $# != 0 ]; do
    case $1 in
        --working-dir|-d)
            shift
            GNUARCH_LINUX_DIR=$1
            shift
            ;;
        --dont-rsync|-n)
            DONT_RSYNC=y
            shift
            ;;
        --help|-h)
            print_help $0
            exit 0
            ;;
        ?*)
            print_help $0
            exit 1
            ;;
        *)
            break
            ;;
    esac
done

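# cd to the GNU Arch Linux tree and make the path absolute (CVSROOT
# below is derived from it and must not be relative)
cd "$GNUARCH_LINUX_DIR"
GNUARCH_LINUX_DIR=`pwd`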


# Different variables
FULL_LOG=,bkcvs.log
BKCVS_LAST_REV_FILE=,bkcvs-last-rev
BKCVS_HOME=,bkcvs-home
TLA_MYID_FILE=$BKCVS_HOME/.arch-params/\=id
CVSROOT=$GNUARCH_LINUX_DIR/,bkcvs-rsync
CVSREPO=linux-2.6
PATCH_PREFIX=,patch
BKCVS_RSYNC=rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.5
TLA_TREE_VERSION=`tla tree-version $GNUARCH_LINUX_DIR`

# grep regexps
RCS_FILE="^RCS file: .*,v$"
LOG_CHANGE="^}\?(Logical change [0-9\.]\+)$"
CVS_REV="^revision [0-9\.]\+$"


# Build the patch
function build_cset()
{
    BKCVS_REV=$1
    BKCVS_REV_RE=`echo $BKCVS_REV | sed -e 's/\./\\\./g'`
    PATCH_FILE=$2

    rm -f $PATCH_FILE
    touch $PATCH_FILE

    # For too long logs, just grep-out the unimportant lines
    # (awk is slower than grep)
    cat $FULL_LOG \
        | grep -B1 -e "^file .*$" \
                   -e "^bkcvsrev $BKCVS_REV_RE$" \
        | grep -v -e "^--$" \
        | grep -B2 -e "^bkcvsrev $BKCVS_REV_RE$" \
        | grep -v -e "^--$" \
        | gawk ' \
            /^file .*$/                  { file = $2 }
            /^revision [0-9\.]+$/        { revision = $2 }
            /^bkcvsrev '$BKCVS_REV_RE'$/ { print file, revision }' \
      | while read file cvsrev; do
        # just remove the leading '1.'
        new_rev=${cvsrev#*.}
        old_rev=$((new_rev - 1))

        cvs -q -d $CVSROOT rdiff -u -r1.$old_rev -r1.$new_rev \
            $CVSREPO/$file >> $PATCH_FILE
    done
}

# Generate a tla-compatible log file
function build_log()
{
    BKCVS_REV=$1

    rlog -N -r$BKCVS_REV $CVSROOT/ChangeSet,v | gawk ' \
        BEGIN {
            search = "-"
            FS = "[ \t;]+"
        }
        search == "-" && /^----------------------------$/ {
            search = "r"
            next
        }
        search == "r" && /^revision [0-9\.]+$/ {
            search = "d"
            next
        }
        search == "d" && /^date:.*;  author:.*$/ {
            date = $2 " " $3
            author = $5
            summary = ""
            search = "s"
            next
        }
        search ==  "s" && /^.+$/ {
            if (summary == "")
                summary = $0
            else
                summary = summary " " $0
            next
        }
        search == "s" && /^$/ {
            changelog = ""
            search = "l"
            next
        }
        search == "l" && /^=============================================================================$/ {
            print "Summary: " summary
            print "Keywords: "
            print "X-BKCVS-Date: " date
            print "X-BKCVS-Author: " author
            print "X-BKCVS-Rev: '$BKCVS_REV'"
            print "X-BKrev: " bkrev
            print changelog
            exit
        }
        search == "l" && /^BKrev: .*$/ {
            bkrev = $2
        }
        search == "l" {
            changelog = changelog "\n" $0
            next
        }'
}


# Generate the temporary home directory. We use it for generating the
# author of the patch
rm -rf $BKCVS_HOME
mkdir -p $BKCVS_HOME
cp -R $HOME/.arch-params $BKCVS_HOME/.arch-params

if [ $DONT_RSYNC != y ]; then
    # rsync with the kernel BKCVS repository
    mkdir -p $CVSROOT/$CVSREPO
    cvs -q -d $CVSROOT init

    rsync -az --delete --exclude /BitKeeper/ --exclude /ChangeSet,v \
        $BKCVS_RSYNC/ $CVSROOT/$CVSREPO
    rsync -az --delete $BKCVS_RSYNC/ChangeSet,v $CVSROOT

    # generate the full BKCVS log (only keep the filename, revision number
    # and the cset number)
    # Make sure ChangeSet,v is not in the repository since it doesn't
    # follow the rules
    echo "Generating the full BKCVS log"
    cvs -q -d $CVSROOT rlog $CVSREPO | gawk ' \
        BEGIN {
            search = "f";
        }
        /^RCS file: .*,v$/ {
            print
            search = "r"
            next
        }
        search == "r" && /^revision [0-9\.]+$/ {
            print
            search = "b"
            next
        }
        search == "b" && /^\}?\(Logical change [0-9\.]+\)$/ {
            print
            search = "r"
            next
        }' \
        | sed -e "s%^RCS file: $CVSROOT/$CVSREPO/\(.*\),v$%file \1%" \
            -e "s/^}\?(Logical change \([0-9\.]\+\))$/bkcvsrev \1/" \
        > $FULL_LOG
fi

# Generate revisions one by one ("1." is removed from start and end)
start=`cat $BKCVS_LAST_REV_FILE | sed -e "s/^1\.//"`
end=`rlog -N -r1.$start $CVSROOT/ChangeSet,v \
    | grep -e "^head: 1\.[0-9]\+$" \
    | sed -e "s/^head: 1\.\([0-9]\+\)$/\1/"`

# check that the last applied changeset still reverses cleanly against
# the tree (guards against an inconsistent mirror; is rsync atomic?)
BKCVS_REV=1.$start

PATCH_FILE=$PATCH_PREFIX-$BKCVS_REV
echo "Checking the last applied changeset"
echo "-- Building the $BKCVS_REV changeset"
build_cset $BKCVS_REV $PATCH_FILE
patch --dry-run -R -s -p1 < $PATCH_FILE
rm $PATCH_FILE

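# (( start++ )) returns a non-zero status when start is 0, which would
# abort the script under "set -e"
start=$((start + 1))
if [ "$start" -gt "$end" ]; then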
    echo "No changesets to be applied"
    exit 0
fi

# add the missing changesets
echo "Adding BKCVS changesets between 1.$start and 1.$end"
while [ "$start" -le "$end" ]; do
    BKCVS_REV=1.$((start++))
    PATCH_FILE=$PATCH_PREFIX-$BKCVS_REV

    echo "-- Building the $BKCVS_REV changeset"
    build_cset $BKCVS_REV $PATCH_FILE

    TLA_LOG_FILE=`tla make-log`
    build_log $BKCVS_REV > $TLA_LOG_FILE
    echo -n "    "; head -n1 $TLA_LOG_FILE | sed -e "s/^Summary: //"

    # modify the local =id file (for tla my-id)
    cat $TLA_LOG_FILE | grep -e "^X-BKCVS-Author: .*$" \
        | sed -e "s/^X-BKCVS-Author: \([^ \t]*\)$/\1 <\1@invalid-address.com>/" \
        > $TLA_MYID_FILE

    # patch the source with the new changeset (-E is needed since cvs diff
    # does not produce a proper timestamp)
    patch -f -s -E -p1 < $PATCH_FILE

    # Add ids for the new files
    for i in `tla tree-lint -t`; do
        find ./$i -not -path "*/.arch-ids*" -a -not -path "*/{arch}*" \
            -a -not -path "*/,*" \
            -exec tla add-id {} \;
    done
    
    # Remove the ids for the missing files
    for i in `tla tree-lint -m`; do
        rm $i
    done

    # Prune the empty directories (containing maybe only .arch-ids)
    # We actually need to test them in the reverse order to cope with
    # empty subdirs
    unset STACK
    for i in `find . -type d -not -path "*/.arch-ids*" \
                             -a -not -path "*/{arch}*" \
                             -a -not -path "*/,*"`; do
        STACK="$i $STACK"
    done
    for i in $STACK; do
        ls $i/* &> /dev/null || rm -rf $i
    done

    # the actual commit (HOME changed to preserve the original author)
    HOME=$BKCVS_HOME tla commit

    # cleanup
    rm $PATCH_FILE

    # remove the previous library revision (saves a lot of space)
    tla library-remove $TLA_TREE_VERSION--`tla library-revisions | tail -n2 | head -n1`

    echo "$BKCVS_REV" > $BKCVS_LAST_REV_FILE
done

echo
echo "-- Update completed successfully --"


BKCVS to Arch Script for Linux Kernel (last edited 2008-08-13 19:35:13 by 82)