RTN-103

Procedure for creating a butler repository at FrDF for ComCam multisite campaigns#

Abstract

In this note we document the required input datasets and the procedure we followed at the Rubin France Data Facility (FrDF) for creating and populating a butler repository for the needs of ComCam multisite campaigns, using the middleware from the weekly w_2025_20. This note is based on DM-48746.

Introduction#

In the first par of this note, we list the input datasets, their corresponding Rucio datasets, as well as their location at FrDF. In the second part, we detail the procedure used to ingest these datasets in the Butler repository. In the last part we give procedures to inspect thhe repository and check that the content is consistent.

Input Datasets#

SkyMap#

Skymap used was /pbs/throng/lsst/users/byanny/skymaps/lsst_cells_v1.skymap.config. More details on the skymap can be found in the issue DM-46717.

Raw images#

For the ComCam multisite butler repository we use the 16000 exposures recorded during the LSSTComCam campaign. Raw exposures are registered in Rucio in the raw scope, in a dataset named Dataset/LSSTComCam/raw/<date>, where <date> is the date where the exposure has been acquired. They are automatically replicated at FrDF and are located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/raw/LSSTComCam/. To facilitate ingestion, a metadata file _index.json has been generated for each exposure using the astrometadata package, and uploaded in the same directory as the exposure files.

Calibration data#

LSSTComCam calibration data are located at USDF in the /repo/main butler repository. The list of calibrations to ingest is the following:

Each item is a ticket ($TICKET) that corresponds to a calibration collection (COLLECTION=$INSTRUMENT/calib/$TICKET), and requires an export.yaml to be ingested. These files can be found at USDF in the directory /sdf/data/rubin/shared/calibration_archive:

$ rg -l $TICKET /sdf/data/rubin/shared/calibration_archive/TAXICAB-* | grep export.yaml |& head -1

For instance:

$ rg -l DM-48520 /sdf/data/rubin/shared/calibration_archive/TAXICAB-* | grep export.yaml |& head -1
./TAXICAB-23/LSSTComCam.calibs.20250213a/export.yaml

These files can be manually retrieved through ssh, although they will eventually be managed by Rucio. Each collection is registered in Rucio in the ancillary scope using the following command:

$ rucio-register data-products \
    --chunk-size 10 \
    --rucio-register-config /sdf/data/rubin/shared/calibration_archive/rucio/main-calib-config.yaml \
    --repo /repo/main \
    --dataset-type $dstype \
    --collections $COLLECTION \
    --rucio-dataset $DATASET

$ rucio did update --close ancillary:$DATASET

where $dstype is the dataset type (dstyps in our case), $COLLECTION is the collection name as defined above, and $DATASET is the dataset name: Dataset/LSSTComCam/$dstype/$TICKET.

The registered data products can then be replicated at FrDF:

$ rucio rule add --rse-exp 'SLAC_BUTLER_DISK|IN2P3_RAW_DISK' \
    --copies 2 ancillary:$DATASET

or

$ rucio rule add --rse-exp 'IN2P3_RAW_DISK' \
    --copies 1 ancillary:$DATASET

They are located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/ancillary/LSSTComCam/calib/.

Reference catalogs#

Two versions of “The Monster” catalog are used (see DM-46370 and DM-49042). Both are located at USDF in /sdf/data/rubin/shared/refcats, and registered in Rucio, in datasets

Dataset/refcats/the_monster_20240219_1
Dataset/refcats/the_monster_20240219_2
Dataset/refcats/the_monster_20240219_3
Dataset/refcats/the_monster_20240219_4
Dataset/refcats/the_monster_20240219_5
Dataset/refcats/the_monster_20240219_6
Dataset/refcats/the_monster_20240219_7
Dataset/refcats/the_monster_20240219_8
Dataset/refcats/the_monster_20240219_9
Dataset/refcats/the_monster_20240219_10
Dataset/refcats/the_monster_20240219_11
Dataset/refcats/the_monster_20240219_12
Dataset/refcats/the_monster_20240219_13
Dataset/refcats/the_monster_20240219_14

and Dataset/refcats/the_monster_20240904. They are replicated at FRDF with:

$ rucio rule add --rse-exp 'IN2P3_RAW_DISK' \
    --copies 1 raw:Dataset/refcats/the_monster_20240219_1

and are located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/raw/refcats/.

Pretrained-models catalog#

Pretrained-models catalog is registered in Rucio in the ancillary, in dataset Dataset/LSSTComCam/dstyps/pretrained-models. It is replicated at FrDF with:

$ rucio rule add --rse-exp IN2P3_RAW_DISK \
    --copies 1 ancillary:Dataset/LSSTComCam/dstyps/pretrained-models

and is located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/ancillary/pretrained_models/.

FGCM calibration#

FGCM lookup table (see DM-48089) is registered in Rucio in the ancillary, in dataset Dataset/LSSTComCam/dstyps/fgcmLookUpTable. It is replicated at FrDF with:

$ rucio rule add --rse-exp IN2P3_RAW_DISK \
    --copies 1 ancillary:Dataset/LSSTComCam/dstyps/fgcmLookUpTable

and is located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/ancillary/LSSTComCam/calib/fgcmcal/.

Solar System Objects catalog#

Solar System Objects catalog (see DM-49977) is registered in Rucio in the ancillary, in dataset Dataset/LSSTComCam/dstyps/DM-49977. It is replicated at FrDF with:

$ rucio rule add --rse-exp IN2P3_RAW_DISK \
  --copies 1 ancillary:Dataset/LSSTComCam/dstyps/DM-49977

and is located in davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument/ancillary/u/jkurla/dp1_ephem_2/.

Creating and populating the repository#

We present here the procedure we used for creating and populating the repository.

The location of the repository is referred using the environment variable $REPO:

$ export REPO='davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/butler/ccms1'

The location of data to be ingested is defined using the environment variable $DATA:

$ export DATA='davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/instrument'

Create an empty repository#

We use the seed configuration file butler-seed_ccms1.yaml shown below to create a butler repository composed of a PostgreSQL registry database and a WebDAV datastore (the default):

$ cat butler-seed_ccms1.yaml
datastore:
  name: "ccms1"
  root: "davs://ccdavrubinint.in2p3.fr:2880/pnfs/in2p3.fr/lsst/butler/ccms1"
registry:
  db: postgresql://ccpglsstprod.in2p3.fr:6552/lsstprod
  namespace: ccms1

To create the repository at location $REPO we use the command:

$ butler create --seed-config butler-seed_ccms1.yaml --override $REPO

Register instrument#

To register the instrument for this repository we use the command below:

$ butler register-instrument $REPO lsst.obs.lsst.LsstComCam

Register SkyMap#

To register the skymap configuration we use the command below:

$ butler register-skymap --config-file lsst_cells_v1.skymap.config $REPO

Ingest raw exposures#

We ingest the raw exposures using:

$ butler ingest-raws --fail-fast --transfer direct $REPO $DATA/raw/LSSTComCam

Note that parallel ingestion was performed to speedup the process.

Define visits#

To define visits from the exposures previously ingested into the repository we use the command below:

$ butler define-visits $REPO LSSTComCam --collections LSSTComCam/raw/all

Add instrument’s curated calibrations#

To ingest the known calibration data for LSSTComCam (see DM-48650) we use the command below:

$ butler write-curated-calibrations $REPO lsst.obs.lsst.LsstComCam --label DM-48650

Ingest calibration data#

To ingest calibration data we use the command below, for each collection:

$ butler import $REPO $DATA/ancillary --export-file export.yaml --transfer direct

Once all calibrations have been ingested, a global calibration collection is defined:

$ butler collection-chain $REPO LSSTComCam/calib LSSTComCam/calib/DM-48955,LSSTComCam/calib/DM-48520,LSSTComCam/calib/DM-47365,LSSTComCam/calib/DM-47741,LSSTComCam/calib/DM-47547,LSSTComCam/calib/DM-47499,LSSTComCam/calib/DM-47447,LSSTComCam/calib/DM-47197,LSSTComCam/calib/DM-46360,LSSTComCam/calib/DM-47498,LSSTComCam/calib/DM-48650,LSSTComCam/calib/DM-48650/unbounded

Ingest reference catalogs#

For the first version of “The Monster” catalog, the corresponding dataset type is registered with:

$ butler register-dataset-type $REPO the_monster_20240904 SimpleCatalog htm7

Then the ingestion is done:

$ butler ingest-files $REPO the_monster_20240904 refcats/DM-46370/the_monster_20240904 --prefix $DATA/raw/refcats/the_monster_20240904/ --transfer direct the_monster_20240904.ecsv

where the file the_monster_20240904.ecsv has been provided by B. Yanny. Similarly, for the second version:

$ butler register-dataset-type $REPO the_monster_20250219 SimpleCatalog htm7
$ butler ingest-files $REPO the_monster_20250219 refcats/DM-49042/the_monster_20250219 --prefix $DATA/raw/refcats/the_monster_20250219/ --transfer direct the_monster_20250219.ecsv

A chained collection is then created:

$ butler collection-chain $REPO refcats refcats/DM-46370/the_monster_20240904,refcats/DM-49042/the_monster_20250219

Ingest Pretrained-models catalog#

Pretrained-models catalog is ingested with:

$ butler import $REPO --export-file pretrained-models-export.yaml --transfer direct $DATA/ancillary/

where pretrained-models-export.yaml has the following content:

description: Butler Data Repository Export
version: 1.0.2
universe_version: 7
universe_namespace: daf_butler
data:
- type: collection
  collection_type: RUN
  name: pretrained_models/tac_cnn_comcam_2025-02-18
  host: null
  timespan_begin: null
  timespan_end: null
- type: dataset_type
  name: pretrainedModelPackage
  dimensions: []
  storage_class: NNModelPackagePayload
  is_calibration: false
- type: dataset
  dataset_type: pretrainedModelPackage
  run: pretrained_models/tac_cnn_comcam_2025-02-18
  records:
  - dataset_id:
    - !uuid 'a83d850a-0094-417c-ac9c-64d0f7b98048'
    data_id:
    - {}
    path: pretrained_models/tac_cnn_comcam_2025-02-18/pretrainedModelPackage/pretrainedModelPackage_pretrained_models_tac_cnn_comcam_2025-02-18.zip
    formatter: lsst.meas.transiNet.modelPackages.formatters.NNModelPackageFormatter

A chained collection is then created:

$ butler collection-chain $REPO pretrained_models pretrained_models/tac_cnn_comcam_2025-02-18

Ingest FGCM calibration#

FGCM calibration is ingested with:

$ butler import $REPO --export-file DM-48089-fgcmLookupTable-export.yaml --transfer direct $DATA/ancillary/

where DM-48089-fgcmLookupTable-export.yaml has the following content:

description: Butler Data Repository Export
version: 1.0.2
universe_version: 7
universe_namespace: daf_butler
data:
- type: dimension
  element: instrument
  records:
  - name: LSSTComCam
    visit_max: 7050123199999
    visit_system: 2
    exposure_max: 7050123199999
    detector_max: 1000
    class_name: lsst.obs.lsst.LsstComCam
- type: collection
  collection_type: RUN
  name: LSSTComCam/calib/fgcmcal/DM-48089
  host: null
  timespan_begin: null
  timespan_end: null
- type: dataset_type
  name: fgcmLookUpTable
  dimensions:
  - instrument
  storage_class: Catalog
  is_calibration: false
- type: dataset
  dataset_type: fgcmLookUpTable
  run: LSSTComCam/calib/fgcmcal/DM-48089
  records:
  - dataset_id:
    - !uuid 'bb573ca3-6159-45d9-88e3-866e01da4882'
    data_id:
    - instrument: LSSTComCam
    path: LSSTComCam/calib/fgcmcal/DM-48089/fgcmLookUpTable/fgcmLookUpTable_LSSTComCam_LSSTComCam_calib_fgcmcal_DM-48089.fits
    formatter: lsst.obs.base.formatters.fitsGeneric.FitsGenericFormatter

A chained collection is then created:

$ butler collection-chain $REPO LSSTComCam/calib/fgcmcal LSSTComCam/calib/fgcmcal/DM-48089

Ingest Solar System Objects catalog#

Solar System Objects catalog (see DM-49977) is ingested with:

$ butler import $REPO --export-file export.yaml --transfer direct $DATA/ancillary/

where the file export.yaml has been provided by B. Yanny. A TAGGED collection is then created, including all datasets:

butler = Butler('$REPO',writeable=True)
butler.registry.registerCollection("LSSTComCam/calib/DM-49977/DP1.0/preloaded_SsObjects.20250409", CollectionType.TAGGED)
dataset_refs = butler.registry.queryDatasets("preloaded_DRP_SsObjects",collections="u/jkurla/dp1_ephem_2*",instrument="LSSTComCam")
butler.registry.associate("LSSTComCam/calib/DM-49977/DP1.0/preloaded_SsObjects.20250409", dataset_refs)

Create global collection#

Within the 16000 exposures ingested, about 2000 are Science exposures (each with 9 detectors):

$ butler query-datasets $REPO raw --collections LSSTComCam/raw/all --where "exposure.observation_type='science'" --limit 0 |wc -l
19205

From these ones, 1792 exposures have been selected to be processed (see DM-49594). We define therefore a collection containing thse 1792 selected LSSTComCam exposures:

$ python /pbs/throng/lsst/users/byanny/butler_associate_visits.py $REPO /pbs/throng/lsst/users/byanny/dp1_good_visits.txt LSSTComCam/raw/DP1-RC3/DM-49594 LSSTComCam/raw/all LSSTComCam 2000

Finally, we define a collection containing all input collections previously defined:

$ butler collection-chain $REPO LSSTComCam/DP1/defaults LSSTComCam/raw/DP1-RC3/DM-49594,LSSTComCam/calib,refcats,skymaps,pretrained_models,LSSTComCam/calib/fgcmcal,LSSTComCam/calib/DM-49977/DP1.0/preloaded_SsObjects.20250409

Inspecting and checking the Butler repository#

The LSSTComCam/DP1/defaults collection should look like this:

$ butler query-collections --chains=tree $REPO LSSTComCam/DP1/defaults
                                   Name                                        Type
-------------------------------------------------------------------------- -----------
LSSTComCam/DP1/defaults                                                    CHAINED
  LSSTComCam/raw/DP1-RC3/DM-49594                                          TAGGED
  LSSTComCam/calib                                                         CHAINED
    LSSTComCam/calib/DM-48955                                              CHAINED
      LSSTComCam/calib/DM-48955/illumCorr/illuminationCorrection.20250224a CALIBRATION
    LSSTComCam/calib/DM-48520                                              CHAINED
      LSSTComCam/calib/DM-48520/DP1/flat-y.20250207a                       CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/flat-z.20250207a                       CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/flat-i.20250207a                       CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/flat-r.20250207a                       CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/flat-g.20250207a                       CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/flat-u.20250207a                       CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/dark.20250207a                         CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/bias.20250207a                         CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/cti.20250207a                          CALIBRATION
      LSSTComCam/calib/DM-48520/DP1/defects.20250207a                      CALIBRATION
    LSSTComCam/calib/DM-47365                                              CHAINED
      LSSTComCam/calib/DM-47365/addManualDefects/defects.20241211a         CALIBRATION
    LSSTComCam/calib/DM-47741                                              CHAINED
      LSSTComCam/calib/DM-47741/twiflat/flat-y.20241120a                   CALIBRATION
    LSSTComCam/calib/DM-47547                                              CHAINED
      LSSTComCam/calib/DM-47547/twiflat/flat-z.20241113a                   CALIBRATION
      LSSTComCam/calib/DM-47547/twiflat/flat-r.20241113a                   CALIBRATION
      LSSTComCam/calib/DM-47547/twiflat/flat-g.20241113a                   CALIBRATION
    LSSTComCam/calib/DM-47499                                              CHAINED
      LSSTComCam/calib/DM-47499/twiflat/flat-u.20241110a                   CALIBRATION
    LSSTComCam/calib/DM-47447                                              CHAINED
      LSSTComCam/calib/DM-47447/gainFixup/flat-g.20241107a                 CALIBRATION
      LSSTComCam/calib/DM-47447/gainFixup/flat-i.20241107a                 CALIBRATION
      LSSTComCam/calib/DM-47447/gainFixup/flat-r.20241107a                 CALIBRATION
      LSSTComCam/calib/DM-47447/gainFixup/dark.20241107a                   CALIBRATION
      LSSTComCam/calib/DM-47447/gainFixup/bias.20241107a                   CALIBRATION
      LSSTComCam/calib/DM-47447/gainFixup/ptc.20241107a                    CALIBRATION
    LSSTComCam/calib/DM-47197                                              CHAINED
      LSSTComCam/calib/DM-47197/pseudoFlat/flat-r.20241028d                CALIBRATION
      LSSTComCam/calib/DM-47197/pseudoFlat/flat-i.20241028d                CALIBRATION
    LSSTComCam/calib/DM-46360                                              CHAINED
      LSSTComCam/calib/DM-46360/isrTaskLSST/flat-i.20240926a               CALIBRATION
      LSSTComCam/calib/DM-46360/isrTaskLSST/flat-r.20240926a               CALIBRATION
      LSSTComCam/calib/DM-46360/isrTaskLSST/flat-g.20240926a               CALIBRATION
      LSSTComCam/calib/DM-46360/isrTaskLSST/dark.20240926a                 CALIBRATION
      LSSTComCam/calib/DM-46360/isrTaskLSST/bias.20240926a                 CALIBRATION
      LSSTComCam/calib/DM-46360/isrTaskLSST/bfk.20240926a                  CALIBRATION
      LSSTComCam/calib/DM-46360/isrTaskLSST/ptc.20240926a                  CALIBRATION
      LSSTComCam/calib/DM-46360/isrTaskLSST/linearizer.20240926a           CALIBRATION
      LSSTComCam/calib/DM-46360/isrTaskLSST/defects.20240926a              CALIBRATION
    LSSTComCam/calib/DM-47498                                              CHAINED
      LSSTComCam/calib/DM-47498/fallbackFlats/flat-all.20241112a           CALIBRATION
    LSSTComCam/calib/DM-48650                                              CALIBRATION
    LSSTComCam/calib/DM-48650/unbounded                                    RUN
  refcats                                                                  CHAINED
    refcats/DM-46370/the_monster_20240904                                  RUN
    refcats/DM-49042/the_monster_20250219                                  RUN
  skymaps                                                                  RUN
  pretrained_models                                                        CHAINED
    pretrained_models/tac_cnn_comcam_2025-02-18                            RUN
  LSSTComCam/calib/fgcmcal                                                 CHAINED
    LSSTComCam/calib/fgcmcal/DM-48089                                      RUN
  LSSTComCam/calib/DM-49977/DP1.0/preloaded_SsObjects.20250409             TAGGED

One can then check that all visits / detectors have correctly been ingested:

$ butler query-datasets $REPO raw --collections LSSTComCam/raw/all --limit 0 | wc -l
148849

Since there are 9 detectors in LSSTComCam, this corresponds to the approximate number of 16000 exposures in the LSSTComCam campaign.