Index file
The index file records information about each dataset stored in the prefix directory on the parallel file system. The index file itself is also stored in the prefix directory. Internally, its data is organized as a hash. Here are the contents of an example index file.
  VERSION
    1
  CURRENT
    scr.dataset.18
  DIR
    scr.dataset.18
      DSET
        18
    scr.dataset.12
      DSET
        12
  DSET
    18
      DIR
        scr.dataset.18
          COMPLETE
            1
          DSET
            ID
              18
            NAME
              scr.dataset.18
            CREATED
              1312853507675143
            USER
              user1
            JOBNAME
              simulation123
            JOBID
              112573
            CKPT
              18
            FILES
              4
            SIZE
              2097182
            COMPLETE
              1
          FLUSHED
            2011-08-08T18:31:47
    12
      DIR
        scr.dataset.12
          FETCHED
            2011-08-08T18:31:47
          FLUSHED
            2011-08-08T18:30:30
          COMPLETE
            1
          DSET
            COMPLETE
              1
            SIZE
              2097182
            FILES
              4
            ID
              12
            NAME
              scr.dataset.12
            CREATED
              1312853406814268
            USER
              user1
            JOBNAME
              simulation123
            JOBID
              112573
            CKPT
              12
The VERSION field records the version number of the index file format. This enables future SCR implementations to change the format of the index file while still allowing SCR to read index files written by older implementations.
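As a rough illustration of how a reader might use this field, the Python sketch below rejects index data whose format version it does not recognize. It assumes the index has already been loaded into a nested dict with VERSION stored as a number or numeric string. The names SUPPORTED_VERSIONS and check_index_version are hypothetical and are not part of SCR.

```python
# Hypothetical sketch, not SCR's actual implementation.
# Assumption: this reader understands only format version 1.
SUPPORTED_VERSIONS = {1}


def check_index_version(index: dict) -> int:
    """Return the format version of a loaded index, or raise if unsupported."""
    if "VERSION" not in index:
        raise ValueError("index has no VERSION field")
    version = int(index["VERSION"])
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported index file version: {version}")
    return version
```

A reader that later learns a newer format would add its number to SUPPORTED_VERSIONS and branch on the returned value when parsing.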
The CURRENT field specifies the name of a dataset directory. When restarting a job, SCR starts with this directory. It then works backwards from this directory, searching for the most recent checkpoint (the checkpoint having the highest id) that is thought to be complete and that has not failed a previous fetch attempt.
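The backward search described above can be sketched in Python as follows. This is one possible reading, not SCR's actual code. It assumes the index is held as nested dicts with integer dataset ids, that FAILED is a list of timestamps, and that every dataset is a checkpoint. The names EXAMPLE_INDEX, is_usable, and pick_restart_dir are hypothetical.

```python
from typing import Optional

# Assumed in-memory form of the index (only the fields this sketch needs).
EXAMPLE_INDEX = {
    "CURRENT": "scr.dataset.18",
    "DIR": {
        "scr.dataset.18": {"DSET": 18},
        "scr.dataset.12": {"DSET": 12},
    },
    "DSET": {
        18: {"DIR": {"scr.dataset.18": {"COMPLETE": 1}}},
        12: {"DIR": {"scr.dataset.12": {"COMPLETE": 1}}},
    },
}


def is_usable(record: dict) -> bool:
    """A copy is usable if it is marked complete and has no failed fetch."""
    return record.get("COMPLETE") == 1 and not record.get("FAILED")


def pick_restart_dir(index: dict) -> Optional[str]:
    """Return the directory to restart from, or None if no copy qualifies."""
    current = index.get("CURRENT")
    entry = index.get("DIR", {}).get(current)
    if entry is None:
        return None  # CURRENT is missing or not listed in the DIR index
    start_id = entry["DSET"]
    # Walk dataset ids from the CURRENT one downwards (newest first).
    candidates = sorted((i for i in index["DSET"] if i <= start_id), reverse=True)
    for dset_id in candidates:
        copies = index["DSET"][dset_id]["DIR"]
        # Try the CURRENT directory before any other copy of the same id.
        for name in sorted(copies, key=lambda n: n != current):
            if is_usable(copies[name]):
                return name
    return None
```

With the example data, the search returns scr.dataset.18. If that copy had a FAILED entry, it would fall back to scr.dataset.12.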
The DIR hash is a simple index which maps a directory name to a dataset id.
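To make the mapping concrete, here is a small Python sketch of a lookup through the DIR index. It also shows how such an index could be derived from the DSET hash. That derivation is an assumption made for illustration, since the text does not say how SCR builds the index. The function names build_dir_index and dataset_id_for_dir are hypothetical.

```python
from typing import Optional


def build_dir_index(dset_hash: dict) -> dict:
    """Derive a DIR index (directory name -> dataset id) from a DSET hash."""
    dir_index = {}
    for dset_id, entry in dset_hash.items():
        for dir_name in entry.get("DIR", {}):
            dir_index[dir_name] = {"DSET": dset_id}
    return dir_index


def dataset_id_for_dir(index: dict, dir_name: str) -> Optional[int]:
    """Look up the dataset id stored in a directory, or None if unknown."""
    entry = index.get("DIR", {}).get(dir_name)
    return None if entry is None else entry["DSET"]


# Assumed in-memory form: nested dicts keyed by integer dataset id.
dset_hash = {
    18: {"DIR": {"scr.dataset.18": {"COMPLETE": 1}}},
    12: {"DIR": {"scr.dataset.12": {"COMPLETE": 1}}},
}
index = {"DIR": build_dir_index(dset_hash), "DSET": dset_hash}
print(dataset_id_for_dir(index, "scr.dataset.12"))  # prints 12
```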
The information for each dataset is indexed by dataset id under the DSET hash. There may be multiple copies of a given dataset id, each stored within a different dataset directory within the prefix directory. For a given dataset id, each copy is indexed by directory name under the DIR hash. For each directory, SCR tracks whether the set of dataset files is thought to be complete (COMPLETE), the timestamp at which the dataset was copied to the parallel file system (FLUSHED), timestamps at which the dataset (checkpoint) was fetched to restart a job (FETCHED), and timestamps at which a fetch attempt of this dataset (checkpoint) failed (FAILED).
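The per-directory bookkeeping described above might look like the following Python sketch. It assumes the index is a nested dict, that FETCHED and FAILED hold lists of timestamps (the text says "timestamps", plural), and that timestamps use the format seen in the example file. The helper names record_fetch and describe_copies are hypothetical and do not come from SCR.

```python
from datetime import datetime

# Matches the timestamp style in the example file, e.g. 2011-08-08T18:31:47.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def record_fetch(index: dict, dset_id: int, dir_name: str,
                 succeeded: bool, when: datetime) -> None:
    """Append a FETCHED or FAILED timestamp to one copy of a dataset."""
    record = index["DSET"][dset_id]["DIR"][dir_name]
    key = "FETCHED" if succeeded else "FAILED"
    record.setdefault(key, []).append(when.strftime(TIMESTAMP_FORMAT))


def describe_copies(index: dict, dset_id: int) -> list:
    """Summarize each copy of a dataset id as (directory, complete, failures)."""
    copies = index["DSET"][dset_id]["DIR"]
    return [
        (name, rec.get("COMPLETE") == 1, len(rec.get("FAILED", [])))
        for name, rec in sorted(copies.items())
    ]


# Assumed in-memory form of one dataset with a single stored copy.
example = {
    "DSET": {
        12: {"DIR": {"scr.dataset.12": {"COMPLETE": 1,
                                        "FLUSHED": "2011-08-08T18:30:30"}}},
    },
}
record_fetch(example, 12, "scr.dataset.12", True, datetime(2011, 8, 8, 18, 31, 47))
```

Keeping FAILED as a list rather than a flag preserves the history of fetch attempts, which a restart search can then test for emptiness.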