Converters
Provides a collection of routines to convert Rockstar and Consistent-Trees ascii catalogues into hdf5 files.
Recommended usage:
import uchuutools.converters as utconv
Available submodules
convert_ctrees_to_h5() - Converts ascii Consistent-Trees catalogues to hdf5
convert_halocat_to_h5() - Converts ascii Rockstar and Consistent-Trees halo catalogues to hdf5
uchuutools.converters.convert_ctrees_to_h5(filenames, standard_consistent_trees=None, outputdir='./', output_filebase='forest', write_halo_props_cont=True, fields=None, drop_fields=None, truncate=True, compression='gzip', buffersize=None, use_pread=True, max_nforests=None, comm=None, show_progressbar=False)
Convert a set of forests from Consistent-Trees ascii file(s) into an (optionally compressed) hdf5 file. Can be invoked with MPI.
Parameters:
filenames (list of strings for Consistent-Trees catalogues, required) – The input ascii files will be decompressed, if required.
standard_consistent_trees (boolean, optional, default: None) – Whether the input files were generated by the Uchuu collaboration’s parallel Consistent-Trees code. If only two files are specified in filenames, and these two filenames end with ‘forests.list’ and ‘locations.dat’, then a standard Consistent-Trees output will be inferred. If all files specified in filenames end with ‘.tree’, then parallel Consistent-Trees is inferred.
outputdir (string, optional, default: current working directory (‘./’)) – The directory where the converted hdf5 file will be written. The output filename is obtained by appending ‘.h5’ to the input_file.
output_filebase (string, optional, default: “forest”) – The output filename is constructed using ‘{outputdir}/{output_filebase}_{rank}.h5’.
write_halo_props_cont (boolean, optional, default: True) – Controls whether the individual halo properties are written as distinct datasets, such that any given property for all halos is written contiguously (structure of arrays, SOA). When set to False, only one dataset (‘halos’) is created under the group ‘Forests’, and all properties of a halo are written out contiguously (array of structures).
fields (list of strings, optional, default: None) – Describes which specific columns in the input file to carry across to the hdf5 file. Default action is to convert ALL columns.
drop_fields (list of strings, optional, default: None) – Contains a list of column names that will not be carried through to the hdf5 file. If drop_fields is not set for a parallel Consistent-Trees run, then [Tidal_Force, Tidal_ID] will be used. drop_fields is processed after fields, i.e., you can specify fields=None to create an initial list of all columns in the ascii file, and then specify drop_fields = [colname2, colname7, ...], and only those columns will not be present in the hdf5 output.
truncate (boolean, default: True) – Controls whether a new file is created on this ‘rank’. When set to True, the header info file is written out. Otherwise, the file is appended to. The code checks that the existing metadata in the hdf5 file is identical to the new metadata in the ascii files currently being converted (i.e., it tries to avoid different simulation + mergertree results being present in the same file).
compression (string, optional, default: ‘gzip’) – Controls the kind of compression applied. Valid options are anything that h5py accepts.
buffersize (integer, optional, default: 1 MB) – Controls the size of the buffer, and therefore how many halos are written out per write call to the hdf5 file. The number of halos written out is this buffersize divided by the size of the datatype for individual halos.
use_pread (boolean, optional, default: True) – Controls whether low-level i/o operations (through os.pread) are used. Otherwise, higher-level i/o operations (via io.open) are used. This option is only meaningful on linux systems (and python3+). Since pread does not change the file offset, additional parallelisation can be implemented reasonably easily.
max_nforests (integer >= 1, optional, default: None) – The maximum number of forests to convert across all tasks. If a positive value is passed, then the total number of forests converted will be min(totnforests, max_nforests). ValueError is raised if the passed parameter value is less than 1.
comm (MPI communicator, optional, default: None) – Controls whether the conversion is run in MPI parallel. Should be compatible with mpi4py.MPI.COMM_WORLD.
show_progressbar (boolean, optional, default: False) – Controls whether a progressbar is printed. The progressbar is only enabled on rank==0; the remaining ranks ignore this keyword.
Returns: True on successful completion.
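Three of the parameter rules above (fields followed by drop_fields, the buffersize division, and the max_nforests cap) can be sketched in plain Python. This is a minimal illustration of the documented behaviour, not the library's implementation: the helper names, the column names other than Tidal_Force and Tidal_ID, the 256-byte halo record, the integer rounding, and reading 1 MB as 1024*1024 bytes are all assumptions made for the example.

```python
def select_columns(all_columns, fields=None, drop_fields=None):
    """Hypothetical helper: apply `fields` first, then `drop_fields`."""
    if fields is None:
        # fields=None starts from every column present in the ascii file
        selected = list(all_columns)
    else:
        wanted = set(fields)
        selected = [col for col in all_columns if col in wanted]
    dropped = set(drop_fields or [])
    # drop_fields is processed after fields
    return [col for col in selected if col not in dropped]


def halos_per_write(buffersize, halo_itemsize):
    """Hypothetical helper: buffersize divided by the per-halo datatype size."""
    # Integer division is an assumption; the docs only say "divided by".
    return buffersize // halo_itemsize


def nforests_to_convert(totnforests, max_nforests=None):
    """Hypothetical helper: cap the number of forests as documented."""
    if max_nforests is None:
        return totnforests
    if max_nforests < 1:
        raise ValueError("max_nforests must be >= 1")
    return min(totnforests, max_nforests)


# Placeholder column list; only the two Tidal_* names come from the docs.
columns = ["id", "Mvir", "Rvir", "Tidal_Force", "Tidal_ID"]
kept = select_columns(columns, fields=None,
                      drop_fields=["Tidal_Force", "Tidal_ID"])
print(kept)

# Assuming 1 MB = 1024 * 1024 bytes and a made-up 256-byte halo record.
print(halos_per_write(1024 * 1024, 256))
print(nforests_to_convert(1000, max_nforests=10))
```

Because drop_fields is applied last, a column named in both fields and drop_fields does not reach the output in this sketch.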
uchuutools.converters.convert_halocat_to_h5(filenames, outputdir='./', write_halo_props_cont=True, fields=None, drop_fields=None, chunksize=100000, compression='gzip', comm=None, show_progressbar=False)
Converts a list of Rockstar/Consistent-Trees halo catalogues from ascii to hdf5.
Can be used with MPI, but requires the number of files to be larger than the number of MPI tasks spawned.
Parameters:
filenames (list of strings, required) – A list of filename(s) for the Rockstar/Consistent-Trees file. Can be compressed (.gz, .bz2, .xz, .zip) files.
outputdir (string, optional, default: current working directory (‘./’)) – The directory where the converted hdf5 file will be written. The output filename is obtained by appending ‘.h5’ to the input_file. If the output file already exists, then it will be truncated.
write_halo_props_cont (boolean, optional, default: True) – Controls whether the individual halo properties are written as distinct datasets, such that any given property for ALL halos is written contiguously (structure of arrays, SOA). When set to False, only one dataset (‘halos’) is created, and ALL properties of a halo are written out contiguously (array of structures).
fields (list of strings, optional, default: None) – Describes which specific columns in the input file to carry across to the hdf5 file. Default action is to convert ALL columns.
drop_fields (list of strings, optional, default: None) – Describes which columns are not carried through to the hdf5 file. Processed after fields, i.e., you can specify fields=None to create an initial list of all columns in the ascii file, and then specify drop_fields = [colname2, colname7, ...], and those columns will not be present in the hdf5 output.
chunksize (integer, optional, default: 100000) – Controls how many lines are read in from the input file before being written out to the hdf5 file.
compression (string, optional, default: ‘gzip’) – Controls the kind of compression applied. Valid options are anything that h5py accepts.
comm (MPI communicator, optional, default: None) – Controls whether the conversion is run in MPI parallel. Should be compatible with mpi4py.MPI.COMM_WORLD.
show_progressbar (boolean, optional, default: False) – Controls whether a progressbar is printed. The progressbar is only enabled on rank==0; the remaining ranks ignore this keyword.
Returns: True on successful completion.
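A call to convert_halocat_to_h5() might look like the sketch below. The keyword arguments are those listed in the signature above; the wrapper convert_catalogues(), the helper enough_files_for_mpi(), and the glob pattern are hypothetical names introduced only for this example. The imports of uchuutools and mpi4py are deferred into the wrapper so that nothing is imported until it is called.

```python
import glob


def enough_files_for_mpi(nfiles, ntasks):
    """Hypothetical check mirroring the documented requirement:
    more input files than MPI tasks."""
    return nfiles > ntasks


def convert_catalogues(pattern, outputdir="./", use_mpi=False):
    """Hypothetical wrapper: convert every ascii catalogue matching `pattern`."""
    # Deferred imports: uchuutools (and mpi4py, if requested) are only
    # needed once the wrapper is actually called.
    import uchuutools.converters as utconv

    comm = None
    ntasks = 1
    if use_mpi:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        ntasks = comm.Get_size()

    filenames = sorted(glob.glob(pattern))
    if use_mpi and not enough_files_for_mpi(len(filenames), ntasks):
        raise ValueError("need more input files than MPI tasks")

    # All keywords below appear in the documented signature; the values
    # shown are the documented defaults, apart from show_progressbar.
    return utconv.convert_halocat_to_h5(
        filenames,
        outputdir=outputdir,
        write_halo_props_cont=True,
        fields=None,
        drop_fields=None,
        chunksize=100000,
        compression="gzip",
        comm=comm,
        show_progressbar=True,
    )
```

Checking the file count before the call is a design choice of this sketch; how the library itself reacts to too few files is not stated above.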