mdsuite.database.data_manager module

MDSuite: A Zincwarecode package.

License

This program and the accompanying materials are made available under the terms of the Eclipse Public License v2.0 which accompanies this distribution, and is available at https://www.eclipse.org/legal/epl-v20.html

SPDX-License-Identifier: EPL-2.0

Copyright Contributors to the Zincwarecode Project.

Contact Information

email: zincwarecode@gmail.com
github: https://github.com/zincware
web: https://zincwarecode.com/

Citation

If you use this module please cite us with:

Summary

Module for the data manager. The data manager handles loading of data as TensorFlow generators. These generators allow full use of the TF data pipelines but can require special formatting rules.

class mdsuite.database.data_manager.DataManager(database: Optional[Database] = None, data_path: Optional[list] = None, data_range: Optional[int] = None, n_batches: Optional[int] = None, batch_size: Optional[int] = None, ensemble_loop: Optional[int] = None, correlation_time: int = 1, remainder: Optional[int] = None, atom_selection=slice(None, None, None), minibatch: bool = False, atom_batch_size: Optional[int] = None, n_atom_batches: Optional[int] = None, atom_remainder: Optional[int] = None, offset: int = 0)

Bases: object

Class for the MDS data fetcher.

Due to the amount of data that needs to be collected and the possibility of optimizing repeated loading, a separate data-fetching class is required. This class manages how data is loaded from the MDS database and optimizes processes such as pre-loading and parallel reading.

batch_generator(dictionary: bool = False, system: bool = False, remainder: bool = False, loop_array: Optional[ndarray] = None) → tuple

Build a generator object for the batch loop.

Parameters:

dictionary (bool) – If true, return a dict. This is now the default, and the parameter could be removed.
system (bool) – If true, a system parameter is being called for.
remainder (bool) – If true, a remainder batch must be computed.
loop_array (np.ndarray) –
If this is not None, the elements of this array will be looped over in the batches, which load data at their indices. For example,

loop_array = [[1, 4, 7], [10, 13, 16], [19, 21, 24]]

In this case, in the first batch, configurations 1, 4, and 7 will be loaded for the analysis. This is particularly important for the structural properties.

Return type:

Returns a generator function and its arguments
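
The loop_array example above can be sketched in plain NumPy. This is only an illustration of the indexing pattern and of returning a generator function together with its arguments; it is not the MDSuite implementation. The names batch_generator_sketch and trajectory are hypothetical, and a 1D integer array stands in for the stored configurations.

```python
import numpy as np


def batch_generator_sketch(trajectory, loop_array):
    """Hypothetical stand-in: return a generator function and its arguments."""

    def generator(trajectory, loop_array):
        # Each row of loop_array holds the configuration indices of one batch.
        for batch_indices in loop_array:
            # Integer-array indexing loads only the requested configurations.
            yield batch_indices, trajectory[batch_indices]

    return generator, (trajectory, loop_array)


# Assumed toy data: 30 configurations, each labelled by its own index.
trajectory = np.arange(30)
loop_array = np.array([[1, 4, 7], [10, 13, 16], [19, 21, 24]])

generator, args = batch_generator_sketch(trajectory, loop_array)
batches = [data.tolist() for _, data in generator(*args)]
```

Returning the function and its arguments separately, rather than an already running generator, lets the caller decide when to start iterating, which is the form a data pipeline typically expects.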

ensemble_generator(system: bool = False, glob_data: Optional[dict] = None) → tuple

Build a generator for the ensemble loop.

Parameters:

system (bool) – If true, the system generator is returned.
glob_data (dict) – Data to be loaded in ensembles from a TensorFlow generator, e.g. {b'Na/Positions': tf.Tensor}. It will usually include a b'data_size' key, which is checked in the loop and ignored. All keys are byte strings; this is what happens when a dict is passed to the TensorFlow generator.

Return type:

Ensemble loop generator
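
The handling of byte-string keys described for glob_data can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: iterate_glob_data is a hypothetical helper, and a nested list stands in for the tf.Tensor values.

```python
def iterate_glob_data(glob_data):
    """Hypothetical helper: yield (key, value) pairs, skipping b'data_size'."""
    for key, value in glob_data.items():
        # The bookkeeping key is checked and ignored, as described above.
        if key == b"data_size":
            continue
        # Keys arrive as byte strings, so decode them before further use.
        yield key.decode("utf-8"), value


# Assumed toy input: a nested list replaces the tensor of positions.
glob_data = {b"Na/Positions": [[0.0, 0.0, 0.0]], b"data_size": 1}
loaded = dict(iterate_glob_data(glob_data))
```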