RTN-103

Procedure for creating a butler repository at FrDF for ComCam multisite campaigns

Abstract

In this note we document the required input datasets and the procedure we followed at the Rubin French Data Facility (FrDF) to create and populate a butler repository for ComCam multisite campaigns. This note is based on DM-48746.

Introduction

Input Datasets

SkyMap

The skymap used is /pbs/throng/lsst/users/byanny/skymaps/lsst_cells_v1.skymap.config. More details on this skymap can be found in ticket DM-46717.

Raw images

For the ComCam multisite butler repository we use the raw images produced during the LSSTComCam campaign (about 16000 exposures). Raw exposures are registered in Rucio in the raw scope, in datasets named Dataset/LSSTComCam/raw/<date>, where <date> is the date on which the exposure was acquired. They are automatically replicated at FrDF and are located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/raw/LSSTComCam/. To facilitate ingestion, a metadata file _index.json has been generated for each exposure using the astrometadata tool (from the astro_metadata_translator package), and uploaded to the same directory as the exposure files.

Calibration data

LSSTComCam calibration data are located at USDF in the /repo/main butler repository. The list of calibrations to ingest is the following:

* curated calibrations: DM-48650

Each item is a ticket ($TICKET) that corresponds to a calibration collection (COLLECTION=$INSTRUMENT/calib/$TICKET), and requires an export.yaml to be ingested. These files can be found at USDF in the directory /sdf/data/rubin/shared/calibration_archive:

cd /sdf/data/rubin/shared/calibration_archive
rg -l $TICKET TAXICAB-* | grep export.yaml |& head -1

For instance:

rg -l DM-48520 TAXICAB-* | grep export.yaml |& head -1
./TAXICAB-23/LSSTComCam.calibs.20250213a/export.yaml
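When many tickets have to be looked up, the rg / grep pipeline above can also be scripted. The following is a minimal Python sketch, assuming the archive layout shown in the example (TAXICAB-*/<subdirectory>/export.yaml). The function name find_export_file is hypothetical. Because candidates are sorted, the first match may differ from the one rg prints if several export files mention the same ticket.

```python
from pathlib import Path
from typing import Optional


def find_export_file(archive: Path, ticket: str) -> Optional[Path]:
    """Return the first export.yaml under archive/TAXICAB-* whose text mentions ticket.

    Python counterpart of:
        rg -l $TICKET TAXICAB-* | grep export.yaml | head -1
    """
    # "**" matches any depth, e.g. TAXICAB-23/LSSTComCam.calibs.20250213a/export.yaml
    for candidate in sorted(archive.glob("TAXICAB-*/**/export.yaml")):
        try:
            text = candidate.read_text(errors="replace")
        except OSError:
            continue  # skip unreadable files instead of aborting the search
        if ticket in text:
            return candidate
    return None
```

For example, find_export_file(Path("/sdf/data/rubin/shared/calibration_archive"), "DM-48520") is expected to return the same TAXICAB-23/LSSTComCam.calibs.20250213a/export.yaml file as the shell example.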

These files can be retrieved manually over SSH, although they will eventually be managed by Rucio. Each collection is registered in Rucio in the ancillary scope using the following command:

rucio-register data-products \
  -s 10 \
  -C /sdf/data/rubin/shared/calibration_archive/rucio/main-calib-config.yaml \
  -r /repo/main \
  -t $dstype \
  -c $COLLECTION \
  -d $DATASET

rucio did update --close ancillary:$DATASET

where $dstype is the dataset type (dstyps in our case), $COLLECTION is the collection name as defined above, and $DATASET is the dataset name: Dataset/LSSTComCam/$dstype/$TICKET.

The registered data products can then be replicated at FrDF:

rucio rule add --rses 'SLAC_BUTLER_DISK|IN2P3_RAW_DISK' --copies 2 ancillary:$DATASET

or, to request a single replica at FrDF only:

rucio rule add --rses 'IN2P3_RAW_DISK' --copies 1 ancillary:$DATASET

They are located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/ancillary/LSSTComCam/calib/.
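The register, close and replicate steps are repeated for every calibration ticket, so the command lines can be generated rather than typed. This is a sketch only. It builds the argument lists shown above from a ticket number and prints them, without running anything. The helper names (calib_names, rucio_commands) are hypothetical, and the ticket list in the loop is just an example subset.

```python
import shlex
from typing import List, Tuple

INSTRUMENT = "LSSTComCam"
DSTYPE = "dstyps"  # dataset-type component used in this note's Rucio dataset names
CALIB_CONFIG = "/sdf/data/rubin/shared/calibration_archive/rucio/main-calib-config.yaml"


def calib_names(ticket: str, dstype: str = DSTYPE) -> Tuple[str, str]:
    """Return the (butler collection, Rucio dataset) names for one calibration ticket."""
    collection = f"{INSTRUMENT}/calib/{ticket}"
    dataset = f"Dataset/{INSTRUMENT}/{dstype}/{ticket}"
    return collection, dataset


def rucio_commands(ticket: str, rse: str = "IN2P3_RAW_DISK", copies: int = 1) -> List[List[str]]:
    """Build (but do not run) the register, close and replication commands."""
    collection, dataset = calib_names(ticket)
    did = f"ancillary:{dataset}"
    return [
        ["rucio-register", "data-products", "-s", "10", "-C", CALIB_CONFIG,
         "-r", "/repo/main", "-t", DSTYPE, "-c", collection, "-d", dataset],
        ["rucio", "did", "update", "--close", did],
        ["rucio", "rule", "add", "--rses", rse, "--copies", str(copies), did],
    ]


for example_ticket in ["DM-48520", "DM-48955"]:  # example subset of the tickets
    for cmd in rucio_commands(example_ticket):
        print(shlex.join(cmd))
```

Passing rse="SLAC_BUTLER_DISK|IN2P3_RAW_DISK", copies=2 reproduces the first replication rule. shlex.join quotes the | so the printed line can be pasted into a shell.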

Reference catalogs

Two versions of “The Monster” catalog are used (see DM-46370 and DM-49042). Both are located at USDF in /sdf/data/rubin/shared/refcats and registered in Rucio, in datasets Dataset/refcats/the_monster_20240219_1 and Dataset/refcats/the_monster_20240904 (dataset names still to be confirmed).

They are replicated at FrDF with:

rucio rule add --rses 'IN2P3_RAW_DISK' --copies 1 raw:Dataset/refcats/the_monster_20240219_1

and are located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/raw/refcats/.

Pretrained-models catalog

The pretrained-models catalog is registered in Rucio in the ancillary scope, in dataset Dataset/LSSTComCam/dstyps/pretrained-models. It is replicated at FrDF with:

rucio rule add --rses IN2P3_RAW_DISK --copies 1 ancillary:Dataset/LSSTComCam/dstyps/pretrained-models

and is located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/ancillary/pretrained_models/.

FGCM calibration

The FGCM lookup table (see DM-48089) is registered in Rucio in the ancillary scope, in dataset Dataset/LSSTComCam/dstyps/fgcmLookUpTable. It is replicated at FrDF with:

rucio rule add --rses IN2P3_RAW_DISK --copies 1 ancillary:Dataset/LSSTComCam/dstyps/fgcmLookUpTable

and is located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/ancillary/LSSTComCam/calib/fgcmcal/.

Solar System Objects catalog

The Solar System Objects catalog (see DM-49977) is registered in Rucio in the ancillary scope, in dataset Dataset/LSSTComCam/dstyps/DM-49977. It is replicated at FrDF with:

rucio rule add --rses IN2P3_RAW_DISK --copies 1 ancillary:Dataset/LSSTComCam/dstyps/DM-49977

and is located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/ancillary/u/jkurla/dp1_ephem_2/.

Creating and populating the repository

We present here the procedure we used for creating and populating the repository.

The location of the repository is referred to by the environment variable $REPO:

export REPO='davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/butler/ccms1'

The location of the data to be ingested is given by the environment variable $DATA:

export DATA='davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument'

Create an empty repository

We use the seed configuration file butler-seed_ccms1.yaml shown below to create a butler repository composed of a PostgreSQL registry database and a WebDAV datastore (the default):

$ cat butler-seed_ccms1.yaml
datastore:
  name: "ccms1"
  root: "davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/butler/ccms1"
registry:
  db: postgresql://ccpglsstprod.in2p3.fr:6552/lsstprod
  namespace: ccms1

To create the repository at location $REPO we use the command:

butler create --seed-config butler-seed_ccms1.yaml --override $REPO
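Because the datastore root in the seed file must point to the same location as $REPO, the seed file can also be generated from that value. This is a minimal sketch, assuming this note's convention of sharing one label (ccms1) between the datastore name and the registry namespace. write_seed_config is a hypothetical helper. It simply writes the YAML text shown above, without needing a YAML library.

```python
from pathlib import Path

# Same layout as butler-seed_ccms1.yaml above; {name} fills both datastore name and namespace.
SEED_TEMPLATE = """\
datastore:
  name: "{name}"
  root: "{root}"
registry:
  db: {db}
  namespace: {name}
"""


def write_seed_config(path: Path, name: str, root: str, db: str) -> str:
    """Write a seed config for `butler create --seed-config` and return its text."""
    # Strip a trailing slash so the root matches the form used in $REPO above.
    text = SEED_TEMPLATE.format(name=name, root=root.rstrip("/"), db=db)
    path.write_text(text)
    return text
```

For instance, write_seed_config(Path("butler-seed_ccms1.yaml"), "ccms1", os.environ["REPO"], "postgresql://ccpglsstprod.in2p3.fr:6552/lsstprod") should produce the file listed above.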

Register instrument

To register the instrument for this repository we use the command below:

butler register-instrument $REPO lsst.obs.lsst.LsstComCam

Register SkyMap

To register the skymap configuration we use the command below:

butler register-skymap --config-file lsst_cells_v1.skymap.config $REPO

Ingest raw exposures

We ingest the raw exposures using:

butler ingest-raws --fail-fast --transfer direct $REPO $DATA/raw/LSSTComCam

Note that the ingestion was run in parallel to speed up the process. One can then check that all exposures and detectors have been ingested:

butler query-datasets $REPO raw --collections LSSTComCam/raw/all --limit 0 | wc -l
148849

Since LSSTComCam has 9 detectors, this count (148849 / 9 ≈ 16539) is consistent with the roughly 16000 exposures of the LSSTComCam campaign.
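The note does not record how the raw ingestion was split across processes. One possible approach is sketched below. It makes two hypothetical assumptions: that the raw files are grouped into one subdirectory per observing date under $DATA/raw/LSSTComCam, mirroring the per-date Rucio datasets, and that the registry database accepts several concurrent ingest processes. Each location gets its own butler ingest-raws call with the options used above. The run parameter exists only so the logic can be exercised without a butler installation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def ingest_command(repo: str, location: str) -> List[str]:
    """Same options as the single ingest-raws call above, restricted to one location."""
    return ["butler", "ingest-raws", "--fail-fast", "--transfer", "direct", repo, location]


def ingest_in_parallel(
    repo: str,
    locations: Sequence[str],
    workers: int = 8,
    run: Callable = subprocess.run,
) -> Dict[str, int]:
    """Run one ingest-raws process per location and return each location's exit code."""

    def ingest_one(location: str) -> int:
        # Threads only wait on child processes; the ingestion work happens in each child.
        return run(ingest_command(repo, location), check=False).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(ingest_one, locations))
    return dict(zip(locations, codes))
```

Locations with a non-zero exit code can then be re-ingested individually, and the count check above repeated.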

Define visits

To define visits from the exposures previously ingested into the repository we use the command below:

butler define-visits $REPO LSSTComCam --collections LSSTComCam/raw/all

Add instrument’s curated calibrations

To ingest the known calibration data for LSSTComCam (see DM-48650) we use the command below:

butler write-curated-calibrations $REPO lsst.obs.lsst.LsstComCam --label DM-48650

Ingest calibration data

For each calibration collection, we ingest the data with the command below, using the export.yaml file retrieved for that collection:

butler import $REPO $DATA/ancillary --export-file export.yaml -t direct

Once all calibrations have been ingested, a global calibration collection is defined:

butler collection-chain $REPO LSSTComCam/calib LSSTComCam/calib/DM-48955,LSSTComCam/calib/DM-48520,LSSTComCam/calib/DM-47365,LSSTComCam/calib/DM-47741,LSSTComCam/calib/DM-47547,LSSTComCam/calib/DM-47499,LSSTComCam/calib/DM-47447,LSSTComCam/calib/DM-47197,LSSTComCam/calib/DM-46360,LSSTComCam/calib/DM-47498,LSSTComCam/calib/DM-48650,LSSTComCam/calib/DM-48650/unbounded
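The per-collection imports and the chain definition can be driven from a single list of tickets. This keeps the search order of LSSTComCam/calib explicit, since a chained collection is searched in the order of its children. The sketch below only builds the command lines. It assumes that every ticket except DM-48650 (the curated calibrations written earlier) is imported from its own export file, stored under a hypothetical local layout exports/<ticket>/export.yaml.

```python
from typing import List

# Tickets imported from export.yaml files, in the search order used for LSSTComCam/calib.
IMPORTED_TICKETS = [
    "DM-48955", "DM-48520", "DM-47365", "DM-47741", "DM-47547",
    "DM-47499", "DM-47447", "DM-47197", "DM-46360", "DM-47498",
]
# Produced by `butler write-curated-calibrations ... --label DM-48650`, not imported.
CURATED = ["LSSTComCam/calib/DM-48650", "LSSTComCam/calib/DM-48650/unbounded"]


def import_commands(repo: str, data: str, export_dir: str = "exports") -> List[List[str]]:
    """One `butler import` per ticket; export_dir/<ticket>/export.yaml is a hypothetical layout."""
    return [
        ["butler", "import", repo, f"{data}/ancillary",
         "--export-file", f"{export_dir}/{ticket}/export.yaml", "-t", "direct"]
        for ticket in IMPORTED_TICKETS
    ]


def chain_command(repo: str) -> List[str]:
    """Build the collection-chain call; children are searched in list order."""
    children = [f"LSSTComCam/calib/{t}" for t in IMPORTED_TICKETS] + CURATED
    return ["butler", "collection-chain", repo, "LSSTComCam/calib", ",".join(children)]
```

The child list joined by chain_command is identical to the one in the command above. Reordering IMPORTED_TICKETS is therefore the single place where the calibration search order would change.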

Ingest reference catalogs

For the first version of “The Monster” catalog, the corresponding dataset type is registered with:

butler register-dataset-type $REPO the_monster_20240904 SimpleCatalog htm7

The catalog files are then ingested:

butler ingest-files $REPO the_monster_20240904 refcats/DM-46370/the_monster_20240904 --prefix $DATA/raw/refcats/the_monster_20240904/ -t direct the_monster_20240904.ecsv

where the file the_monster_20240904.ecsv has been provided by B. Yanny. Similarly, for the second version:

butler register-dataset-type $REPO the_monster_20250219 SimpleCatalog htm7
butler ingest-files $REPO the_monster_20250219 refcats/DM-49042/the_monster_20250219 --prefix $DATA/raw/refcats/the_monster_20250219/ -t direct the_monster_20250219.ecsv

A chained collection is then created:

butler collection-chain $REPO refcats refcats/DM-46370/the_monster_20240904,refcats/DM-49042/the_monster_20250219
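Both refcat versions follow the same three-step pattern: register the dataset type, ingest the files, and add the run to the refcats chain. The commands can therefore be derived from a small table. This sketch assumes, as is the case for the two versions above, that the catalog name is used for the dataset type, for the directory under $DATA/raw/refcats and for the stem of the .ecsv file. RefcatVersion and refcat_commands are hypothetical names.

```python
from typing import List, NamedTuple, Sequence


class RefcatVersion(NamedTuple):
    ticket: str
    name: str  # dataset type name, directory name and .ecsv stem


REFCATS = [
    RefcatVersion("DM-46370", "the_monster_20240904"),
    RefcatVersion("DM-49042", "the_monster_20250219"),
]


def refcat_commands(repo: str, data: str, versions: Sequence[RefcatVersion] = REFCATS) -> List[List[str]]:
    """Build the register, ingest and chain commands used above (nothing is executed)."""
    cmds: List[List[str]] = []
    for v in versions:
        run = f"refcats/{v.ticket}/{v.name}"
        cmds.append(["butler", "register-dataset-type", repo, v.name, "SimpleCatalog", "htm7"])
        cmds.append(["butler", "ingest-files", repo, v.name, run,
                     "--prefix", f"{data}/raw/refcats/{v.name}/", "-t", "direct", f"{v.name}.ecsv"])
    # The chain lists the runs in table order, so the first version is searched first.
    chain = ",".join(f"refcats/{v.ticket}/{v.name}" for v in versions)
    cmds.append(["butler", "collection-chain", repo, "refcats", chain])
    return cmds
```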

Ingest Pretrained-models catalog

The pretrained-models catalog is ingested with:

butler import $REPO --export-file pretrained-models-export.yaml -t direct $DATA/ancillary/

where pretrained-models-export.yaml has the following content:

description: Butler Data Repository Export
version: 1.0.2
universe_version: 7
universe_namespace: daf_butler
data:
- type: collection
  collection_type: RUN
  name: pretrained_models/tac_cnn_comcam_2025-02-18
  host: null
  timespan_begin: null
  timespan_end: null
- type: dataset_type
  name: pretrainedModelPackage
  dimensions: []
  storage_class: NNModelPackagePayload
  is_calibration: false
- type: dataset
  dataset_type: pretrainedModelPackage
  run: pretrained_models/tac_cnn_comcam_2025-02-18
  records:
  - dataset_id:
    - !uuid 'a83d850a-0094-417c-ac9c-64d0f7b98048'
    data_id:
    - {}
    path: pretrained_models/tac_cnn_comcam_2025-02-18/pretrainedModelPackage/pretrainedModelPackage_pretrained_models_tac_cnn_comcam_2025-02-18.zip
    formatter: lsst.meas.transiNet.modelPackages.formatters.NNModelPackageFormatter

A chained collection is then created:

butler collection-chain $REPO pretrained_models pretrained_models/tac_cnn_comcam_2025-02-18

Ingest FGCM calibration

FGCM calibration is ingested with:

butler import $REPO --export-file DM-48089-fgcmLookupTable-export.yaml -t direct $DATA/ancillary/

where DM-48089-fgcmLookupTable-export.yaml has the following content:

description: Butler Data Repository Export
version: 1.0.2
universe_version: 7
universe_namespace: daf_butler
data:
- type: dimension
  element: instrument
  records:
  - name: LSSTComCam
    visit_max: 7050123199999
    visit_system: 2
    exposure_max: 7050123199999
    detector_max: 1000
    class_name: lsst.obs.lsst.LsstComCam
- type: collection
  collection_type: RUN
  name: LSSTComCam/calib/fgcmcal/DM-48089
  host: null
  timespan_begin: null
  timespan_end: null
- type: dataset_type
  name: fgcmLookUpTable
  dimensions:
  - instrument
  storage_class: Catalog
  is_calibration: false
- type: dataset
  dataset_type: fgcmLookUpTable
  run: LSSTComCam/calib/fgcmcal/DM-48089
  records:
  - dataset_id:
    - !uuid 'bb573ca3-6159-45d9-88e3-866e01da4882'
    data_id:
    - instrument: LSSTComCam
    path: LSSTComCam/calib/fgcmcal/DM-48089/fgcmLookUpTable/fgcmLookUpTable_LSSTComCam_LSSTComCam_calib_fgcmcal_DM-48089.fits
    formatter: lsst.obs.base.formatters.fitsGeneric.FitsGenericFormatter

A chained collection is then created:

butler collection-chain $REPO LSSTComCam/calib/fgcmcal LSSTComCam/calib/fgcmcal/DM-48089
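With -t direct, butler import records the files in place, so each path: entry of an export file is expected to be found under the directory given on the command line ($DATA/ancillary/ here). A pre-import check can catch a file that is missing or not yet replicated. This sketch extracts the paths with a regular expression instead of a YAML parser, assuming unquoted values as in the two export files above. By default it tests existence with lsst.resources.ResourcePath, which understands davs:// URIs. The helper names are hypothetical.

```python
import re
from typing import Callable, List, Optional

# Matches lines such as "    path: LSSTComCam/calib/.../file.fits" (unquoted values only).
PATH_LINE = re.compile(r"^\s*path:\s*(\S+)\s*$")


def export_paths(export_text: str) -> List[str]:
    """Return the relative file paths listed in a butler export YAML file."""
    return [m.group(1) for line in export_text.splitlines() if (m := PATH_LINE.match(line))]


def missing_files(
    export_text: str,
    base: str,
    exists: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """List export-file paths not found under base (e.g. the $DATA/ancillary/ URI)."""
    if exists is None:
        # Imported lazily so export_paths() works without the LSST stack.
        from lsst.resources import ResourcePath

        root = ResourcePath(base, forceDirectory=True)

        def _remote_exists(rel: str) -> bool:
            return root.join(rel).exists()

        exists = _remote_exists
    return [rel for rel in export_paths(export_text) if not exists(rel)]
```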

Ingest Solar System Objects catalog

Solar System Objects catalog (see DM-49977) is ingested with:

butler import $REPO --export-file export.yaml -t direct $DATA/ancillary/

where the file export.yaml was provided by B. Yanny. A TAGGED collection containing all the ingested datasets is then created with the Python butler API:

import os
from lsst.daf.butler import Butler, CollectionType

butler = Butler(os.environ["REPO"], writeable=True)  # '$REPO' is not expanded inside a Python string
tagged = "LSSTComCam/calib/DM-49977/DP1.0/preloaded_SsObjects.20250409"
butler.registry.registerCollection(tagged, CollectionType.TAGGED)
dataset_refs = butler.registry.queryDatasets("preloaded_DRP_SsObjects", collections="u/jkurla/dp1_ephem_2*", instrument="LSSTComCam")
butler.registry.associate(tagged, dataset_refs)  # tag every dataset found by the query above

Create global collection

Of the roughly 16000 exposures ingested, about 2000 are science exposures (each with 9 detectors):

butler query-datasets $REPO raw --collections LSSTComCam/raw/all --where "exposure.observation_type='science'" --limit 0 |wc -l
19205

These 19205 rows correspond to about 2134 science exposures (19205 / 9). From these, 1792 exposures were selected for processing (see DM-49594). We therefore define a collection containing these 1792 selected LSSTComCam exposures:

python /pbs/throng/lsst/users/byanny/butler_associate_visits.py $REPO /pbs/throng/lsst/users/byanny/dp1_good_visits.txt LSSTComCam/raw/DP1-RC3/DM-49594 LSSTComCam/raw/all LSSTComCam 2000
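The internals of the butler_associate_visits.py script are not described here. For illustration only, the sketch below shows one way such a selection could be done with the same registry calls used for the Solar System Objects collection: read a list of IDs, query the matching raw datasets in LSSTComCam/raw/all, and associate them with a TAGGED collection in chunks. The input format (one exposure ID per line, optional # comments), the chunk size and all function names are assumptions, not a description of the actual script.

```python
from typing import Iterator, List, Sequence


def read_ids(path: str) -> List[int]:
    """Read one integer ID per line; blank lines and '#' comments are ignored (assumed format)."""
    ids = []
    with open(path) as handle:
        for line in handle:
            content = line.split("#", 1)[0].strip()
            if content:
                ids.append(int(content))
    return ids


def chunks(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def where_clause(ids: Sequence[int]) -> str:
    """Butler query expression selecting the given exposure IDs."""
    return f"exposure IN ({', '.join(str(i) for i in ids)})"


def tag_exposures(repo: str, ids: Sequence[int], tagged: str,
                  source: str = "LSSTComCam/raw/all", chunk_size: int = 500) -> None:
    """Associate the raw datasets of the selected exposures with a TAGGED collection."""
    from lsst.daf.butler import Butler, CollectionType  # requires the LSST stack

    butler = Butler(repo, writeable=True)
    butler.registry.registerCollection(tagged, CollectionType.TAGGED)
    for chunk in chunks(ids, chunk_size):
        refs = butler.registry.queryDatasets(
            "raw", collections=source, where=where_clause(chunk), instrument="LSSTComCam"
        )
        butler.registry.associate(tagged, refs)
```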

Finally, we define a chained collection containing all the input collections defined above:

butler collection-chain $REPO LSSTComCam/DP1/defaults LSSTComCam/raw/DP1-RC3/DM-49594,LSSTComCam/calib,refcats,skymaps,pretrained_models,LSSTComCam/calib/fgcmcal,LSSTComCam/calib/DM-49977/DP1.0/preloaded_SsObjects.20250409
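Because the order of the children in LSSTComCam/DP1/defaults determines which collection is searched first, it can be worth checking the chain after creation. In this minimal sketch, chain_problems compares an expected list with the actual one, and check_defaults (a hypothetical name) reads the actual chain with Registry.getCollectionChain.

```python
from typing import List, Sequence

# Children of LSSTComCam/DP1/defaults, in the order used in the command above.
DP1_DEFAULTS = [
    "LSSTComCam/raw/DP1-RC3/DM-49594",
    "LSSTComCam/calib",
    "refcats",
    "skymaps",
    "pretrained_models",
    "LSSTComCam/calib/fgcmcal",
    "LSSTComCam/calib/DM-49977/DP1.0/preloaded_SsObjects.20250409",
]


def chain_problems(expected: Sequence[str], actual: Sequence[str]) -> List[str]:
    """Describe how an actual collection chain differs from the expected one."""
    problems = [f"missing: {c}" for c in expected if c not in actual]
    problems += [f"unexpected: {c}" for c in actual if c not in expected]
    if not problems and list(actual) != list(expected):
        problems.append("same children, different search order")
    return problems


def check_defaults(repo: str) -> List[str]:
    """Compare the chain stored in the repository with DP1_DEFAULTS."""
    from lsst.daf.butler import Butler  # requires the LSST stack

    butler = Butler(repo)
    actual = list(butler.registry.getCollectionChain("LSSTComCam/DP1/defaults"))
    return chain_problems(DP1_DEFAULTS, actual)
```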