mdsuite.database.simulation_database module¶

MDSuite: A Zincwarecode package.

License¶

This program and the accompanying materials are made available under the terms of the Eclipse Public License v2.0 which accompanies this distribution, and is available at https://www.eclipse.org/legal/epl-v20.html

SPDX-License-Identifier: EPL-2.0

Copyright Contributors to the Zincwarecode Project.

Contact Information¶

email: zincwarecode@gmail.com
github: https://github.com/zincware
web: https://zincwarecode.com/

Citation¶

If you use this module please cite us with:

Summary¶

class mdsuite.database.simulation_database.Database(path: Union[str, Path] = 'database')[source]¶

Bases: object

Database class.

Databases make up a large part of the functionality of MDSuite and are kept fairly consistent in structure. The database structure therefore has its own class, whose commonly used methods act as wrappers around the HDF5 database.

path¶

The name of the database in question.

Type:: str|Path

add_data(chunk: TrajectoryChunkData, start_idx: int)[source]¶

Add new data to the dataset.

Parameters:

chunk – a data chunk
start_idx – Configuration at which to start writing.

add_dataset(structure: dict)[source]¶

Add a dataset of the necessary size to the database.

Just as a separate method exists for building the group structure of the HDF5 database, a separate method exists for adding a dataset. This allows datasets to be added not only when the database is first constructed, but also later, if new data that should be stored is added. The method assumes that a group has already been built; although HDF5 does not require this, separating the two actions is good practice.

Parameters:: structure (dict) – Structure of a single property to be added to the database, e.g. {'Na': {'Forces': (200, 5000, 3)}}
Return type:: Updates the database directly.

change_key_names(mapping: dict)[source]¶

Change the names of database keys.

Parameters:: mapping (dict) – Mapping for the change of names
Return type:: Updates the database directly.

check_existence(path: str) → bool[source]¶

Check to see if a dataset is in the database.

Parameters:: path (str) – Path to the desired dataset
Returns:: response – True if the path exists, False otherwise.
Return type:: bool

database_exists() → bool[source]¶

Check if the database file already exists.

get_data_size(data_path: str) → tuple[source]¶

Return the size of a dataset as a tuple (n_rows, n_columns, n_bytes).

Parameters:: data_path (str) – Path to the data in the HDF5 database.
Returns:: dataset_properties – Tuple of information about the dataset, e.g. (n_rows, n_columns, n_bytes)
Return type:: tuple

get_database_summary()[source]¶

Get a summary of the database properties.

Returns:: summary – A list of properties that are in the database.
Return type:: list

get_load_time(database_path: Optional[str] = None)[source]¶

Calculate the open/close time of the database.

Parameters:: database_path (str) – Database path on which to test the time.
Returns:: opening time – Time taken to open and close the database
Return type:: float
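
The idea of timing an open/close cycle can be sketched with the standard library alone. This is only an illustration under stated assumptions: it times a plain temporary file rather than an HDF5 database, and the helper name time_open_close is hypothetical and not part of MDSuite.

```python
import os
import tempfile
import time


def time_open_close(path: str) -> float:
    """Return the wall-clock time in seconds needed to open and close a file.

    Hypothetical helper; a plain file stands in for the HDF5 database here.
    """
    start = time.perf_counter()
    with open(path, "rb"):
        pass  # Opening and immediately closing is all we want to measure.
    return time.perf_counter() - start


# Create a small placeholder file to time against.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"placeholder")
    tmp_path = handle.name

elapsed = time_open_close(tmp_path)
os.remove(tmp_path)
print(f"open/close took {elapsed:.6f} s")
```

A single measurement like this is noisy; averaging over several repetitions would give a more stable number.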

get_memory_information() → dict[source]¶

Get memory information from the database.

Returns:: memory_database – A dictionary of the memory information of the groups in the database
Return type:: dict

initialize_database(structure: dict)[source]¶

Build a database with a general structure.

Note that this method WILL overwrite a pre-existing database. This is because it is only to be called on the initial construction of an experiment class and the first addition of data to it.

Parameters:: structure (dict) – General structure of the dictionary with relevant dataset sizes, e.g. {'Na': {'Forces': (200, 5000, 3)}, 'Pressure': (5000, 6), 'Temperature': (5000, 1)}. In this case, the last value in each tuple corresponds to the number of components that will be passed to the database.
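
To make the nesting of the structure dictionary concrete, the following minimal sketch walks such a dictionary and lists the dataset paths and shapes it describes. The helper name flatten_structure and the '/'-joined path convention are assumptions made for illustration; they are not taken from MDSuite.

```python
from typing import Any, Dict, Tuple


def flatten_structure(structure: Dict[str, Any], prefix: str = "") -> Dict[str, Tuple[int, ...]]:
    """Map every leaf of a nested structure dict to a '/'-joined path.

    Hypothetical helper, not part of MDSuite.
    """
    flat: Dict[str, Tuple[int, ...]] = {}
    for key, value in structure.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # A nested dict plays the role of a group holding further datasets.
            flat.update(flatten_structure(value, path))
        else:
            # A tuple is the shape of a single dataset.
            flat[path] = tuple(value)
    return flat


structure = {
    "Na": {"Forces": (200, 5000, 3)},
    "Pressure": (5000, 6),
    "Temperature": (5000, 1),
}
datasets = flatten_structure(structure)
for path, shape in datasets.items():
    print(path, shape)
```

Per-species properties end up one level deeper than system-wide properties such as the pressure, which is why the example dictionary mixes nested dictionaries and bare tuples.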

load_data(path_list: Optional[list] = None, select_slice: <numpy.lib.index_tricks.IndexExpression object> = slice(None, None, None), dictionary: bool = False, scaling: Optional[list] = None, d_size: Optional[int] = None)[source]¶

Load data from the database for some operation.

Should be called by the data fetch class, as this will ensure correct loading and pre-loading.

resize_datasets(structure: dict)[source]¶

Resize a dataset so more data can be added.

Parameters:: structure (dict) – Dictionary identifying the datasets that need to be resized, e.g. {'Na': {'velocities': (32, 100, 3)}} will resize all 'x', 'y', and 'z' datasets by 100 entries.

class mdsuite.database.simulation_database.MoleculeInfo(name: str, n_particles: int, properties: List[PropertyInfo], mass: Optional[float] = None, charge: float = 0, groups: Optional[dict] = None)[source]¶

Bases: SpeciesInfo

Information about a Molecule.

Holds all the information of a species, plus groups.

groups¶

A molecule-specific dictionary for mapping the molecule to the particles. The keys of this dict are index references to a specific molecule, e.g. molecule 1, and the values are dicts of atom species and the indices of the atoms belonging to that specific molecule, e.g.

water = {"groups": {"0": {"H": [0, 1], "O": [0]}}}

This tells us that the 0th water molecule consists of the 0th and 1st hydrogen atoms in the database as well as the 0th oxygen atom.

Type:: dict
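
As a sketch of how such a groups dictionary could be assembled, the snippet below assigns consecutive atom indices of each species to consecutive molecules. Both the helper name build_groups and the assumption that atoms are stored molecule by molecule are hypothetical; real trajectories may order atoms differently.

```python
from typing import Dict, List


def build_groups(
    n_molecules: int, atoms_per_molecule: Dict[str, int]
) -> Dict[str, Dict[str, List[int]]]:
    """Build a groups mapping, assuming each molecule owns consecutive atom indices.

    Hypothetical helper, not part of MDSuite.
    """
    groups: Dict[str, Dict[str, List[int]]] = {}
    for mol_idx in range(n_molecules):
        groups[str(mol_idx)] = {
            # Molecule mol_idx owns atoms [mol_idx * count, (mol_idx + 1) * count).
            species: list(range(mol_idx * count, (mol_idx + 1) * count))
            for species, count in atoms_per_molecule.items()
        }
    return groups


water_groups = build_groups(n_molecules=2, atoms_per_molecule={"H": 2, "O": 1})
print(water_groups["0"])
print(water_groups["1"])
```

The keys are strings, matching the "0" key used in the water example.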

groups: dict = None¶

class mdsuite.database.simulation_database.PropertyInfo(name: str, n_dims: int)[source]¶

Bases: object

Information of a trajectory property.

Example: pos_info = PropertyInfo('Positions', 3), vel_info = PropertyInfo('Velocities', 3).

name¶

The name of the property

Type:: str

n_dims¶

The dimensionality of the property

Type:: int

n_dims: int¶

name: str¶

class mdsuite.database.simulation_database.SpeciesInfo(name: str, n_particles: int, properties: List[PropertyInfo], mass: Optional[float] = None, charge: float = 0)[source]¶

Bases: object

Information of a species.

name¶

Name of the species (e.g. ‘Na’)

Type:: str

n_particles¶

Number of particles of that species

Type:: int

properties¶

List of the properties that were recorded for the species. mass and charge are optional.

Type:: list of PropertyInfo

charge: float = 0¶

mass: float = None¶

n_particles: int¶

name: str¶

properties: List[PropertyInfo]¶
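
The relationship between PropertyInfo and SpeciesInfo can be illustrated with local stand-in dataclasses that mirror the documented fields and defaults. These stand-ins are assumptions written for this example and are deliberately not imported from MDSuite, so the snippet runs without the package installed.

```python
from dataclasses import dataclass
from typing import List, Optional


# Local stand-ins mirroring the documented fields; NOT the MDSuite classes.
@dataclass
class PropertyInfo:
    name: str
    n_dims: int


@dataclass
class SpeciesInfo:
    name: str
    n_particles: int
    properties: List[PropertyInfo]
    mass: Optional[float] = None
    charge: float = 0


pos_info = PropertyInfo("Positions", 3)
vel_info = PropertyInfo("Velocities", 3)
sodium = SpeciesInfo(name="Na", n_particles=42, properties=[pos_info, vel_info])

# Per-configuration array shape implied by each recorded property.
shapes = {prop.name: (sodium.n_particles, prop.n_dims) for prop in sodium.properties}
print(shapes)
```

Because mass and charge have defaults, only the name, the particle count, and the property list need to be given.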

class mdsuite.database.simulation_database.TrajectoryChunkData(species_list: List[SpeciesInfo], chunk_size: int)[source]¶

Bases: object

Class to specify the data format for transfer from the file to the database.

add_data(data: ndarray, config_idx, species_name, property_name)[source]¶

Add configuration data to the chunk.

Parameters:

data – The data to be added, with shape (n_configs, n_particles, n_dims). n_particles and n_dims relate to the species and the property that is being added.
config_idx – Start index of the configs that are being added.
species_name – Name of the species to which the data belongs.
property_name – Name of the property being added.

Example: in a loop that reads 5 configs per loop, call add_data(vel_array, 16*5, 'Na', 'Velocities'), where vel.data.shape == (5, 42, 3).
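
The start index passed as config_idx advances by the number of configs read per loop. A minimal sketch of that bookkeeping follows; the generator name chunk_start_indices is hypothetical, and the total of 100 configurations is chosen only for illustration.

```python
from typing import Iterator, Tuple


def chunk_start_indices(n_configurations: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start_index, n_configs_in_chunk) for each chunk of a trajectory.

    Hypothetical helper, not part of MDSuite.
    """
    for start in range(0, n_configurations, chunk_size):
        # The final chunk may be shorter if chunk_size does not divide the total.
        yield start, min(chunk_size, n_configurations - start)


chunks = list(chunk_start_indices(n_configurations=100, chunk_size=5))
# The loop iteration with index 16 starts at configuration 16 * 5.
print(chunks[16])
```

With 5 configs per loop, the iteration with index 16 starts at configuration 80, which corresponds to the 16*5 argument in the add_data call.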

get_data()[source]¶

class mdsuite.database.simulation_database.TrajectoryMetadata(n_configurations: int, species_list: List[SpeciesInfo], box_l: Optional[list] = None, sample_rate: int = 1, sample_step: Optional[float] = None, temperature: Optional[float] = None, simulation_data: dict = <factory>)[source]¶

Bases: object

Trajectory Metadata container.

This metadata must be extracted from trajectory files to build the database into which the trajectory will be stored.

n_configurations¶

Number of configurations of the whole trajectory.

Type:: int

species_list¶

The information about all species in the system.

Type:: list of SpeciesInfo

box_l¶

The simulation box size in three dimensions

Type:: list of float

sample_rate¶

The number of timesteps between consecutive samples. To be removed in favour of sample_step.

Type:: int, optional

sample_step¶

The time between consecutive configurations. E.g. for a simulation with time step 0.1 where the trajectory is written every 5 steps: sample_step = 0.5. Does not have to be specified (e.g. configurations from Monte Carlo scheme), but is needed for all dynamic observables.

Type:: float, optional
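
The arithmetic behind sample_step is simply the simulation time step multiplied by the number of steps between written configurations. A minimal sketch, with the hypothetical helper name compute_sample_step:

```python
from typing import List


def compute_sample_step(time_step: float, steps_between_samples: int) -> float:
    """Return the time between consecutive stored configurations.

    Hypothetical helper, not part of MDSuite.
    """
    return time_step * steps_between_samples


# Time step 0.1, trajectory written every 5 steps, as in the description above.
sample_step = compute_sample_step(0.1, 5)

# Time stamps of the first few stored configurations.
times: List[float] = [index * sample_step for index in range(4)]
print(sample_step, times)
```

Dynamic observables need this value because it converts configuration indices into physical times.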

temperature¶

The set temperature of the system. Optional because it is only applicable to MD simulations with a thermostat. Needed for certain observables.

Type:: float, optional

simulation_data¶

All other simulation data that can be extracted from the trajectory metadata. E.g. software version, pressure in NPT simulations, time step, …

Type:: dict

box_l: list = None¶

n_configurations: int¶

sample_rate: int = 1¶

sample_step: float = None¶

simulation_data: dict¶

species_list: List[SpeciesInfo]¶

temperature: float = None¶