mdsuite.database.simulation_database module

MDSuite: A Zincwarecode package.

License

This program and the accompanying materials are made available under the terms of the Eclipse Public License v2.0 which accompanies this distribution, and is available at https://www.eclipse.org/legal/epl-v20.html

SPDX-License-Identifier: EPL-2.0

Copyright Contributors to the Zincwarecode Project.

Contact Information

email: zincwarecode@gmail.com
github: https://github.com/zincware
web: https://zincwarecode.com/

Citation

If you use this module please cite us with:

Summary

class mdsuite.database.simulation_database.Database(path: Union[str, Path] = 'database')[source]

Bases: object

Database class.

Databases make up a large part of the functionality of MDSuite and are kept fairly consistent in structure. Therefore, the database structure we use has a separate class with commonly used methods that act as wrappers for the HDF5 database.

path

The name of the database in question.

Type:

str|Path

add_data(chunk: TrajectoryChunkData, start_idx: int)[source]

Add new data to the dataset.

Parameters:
  • chunk – a data chunk

  • start_idx – Configuration at which to start writing.

add_dataset(structure: dict)[source]

Add a dataset of the necessary size to the database.

Just as a separate method exists for building the group structure of the HDF5 database, a separate method exists for adding a dataset. This allows datasets to be added not only when the database is first constructed, but also later, if new data needs to be stored. This method assumes that a group has already been built. Although HDF5 does not require this, separating the two actions is good practice.

Parameters:

structure (dict) – Structure of a single property to be added to the database, e.g. {'Na': {'Forces': (200, 5000, 3)}}.

Return type:

Updates the database directly.

change_key_names(mapping: dict)[source]

Change the names of database keys.

Parameters:

mapping (dict) – Mapping for the change of names

Return type:

Updates the database directly.

check_existence(path: str) bool[source]

Check whether a dataset is in the database.

Parameters:

path (str) – Path to the desired dataset

Returns:

response – If True, the path exists; otherwise it does not.

Return type:

bool

database_exists() bool[source]

Check if the database file already exists.

get_data_size(data_path: str) tuple[source]

Return the size of a dataset as a tuple (n_rows, n_columns, n_bytes).

Parameters:

data_path (str) – Path to the data in the HDF5 database.

Returns:

dataset_properties – Tuple of values describing the dataset, e.g. (n_rows, n_columns, n_bytes).

Return type:

tuple
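
As a rough illustration of the tuple described above, the sketch below derives (n_rows, n_columns, n_bytes) from a dataset shape. It assumes that n_rows and n_columns are the first two axes of the shape and that n_bytes is the total storage of all elements. The helper `dataset_size` and the 8-byte item size are hypothetical and are not part of MDSuite.

```python
from math import prod


def dataset_size(shape, itemsize=8):
    """Return (n_rows, n_columns, n_bytes) for a dataset of the given shape.

    Assumption: itemsize is the number of bytes per element (8 for float64).
    """
    n_bytes = prod(shape) * itemsize
    return shape[0], shape[1], n_bytes


# Hypothetical dataset: 200 particles, 5000 configurations, 3 components.
size = dataset_size((200, 5000, 3))
print(size)
```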

get_database_summary()[source]

Get a summary of the database properties.

Returns:

summary – A list of properties that are in the database.

Return type:

list

get_load_time(database_path: Optional[str] = None)[source]

Calculate the open/close time of the database.

Parameters:

database_path (str) – Database path on which to test the time.

Returns:

opening time – Time taken to open and close the database.

Return type:

float

get_memory_information() dict[source]

Get memory information from the database.

Returns:

memory_database – A dictionary of the memory information of the groups in the database.

Return type:

dict

initialize_database(structure: dict)[source]

Build a database with a general structure.

Note: this method WILL overwrite a pre-existing database. This is because it is only to be called on the initial construction of an experiment class and the first addition of data to it.

Parameters:

structure (dict) – General structure of the dictionary with relevant dataset sizes, e.g. {'Na': {'Forces': (200, 5000, 3)}, 'Pressure': (5000, 6), 'Temperature': (5000, 1)}. In this case, the last value in each tuple corresponds to the number of components that will be passed to the database.
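
To make the structure format more concrete, the sketch below walks a structure dictionary and lists each dataset path with its shape and number of components. It assumes that nested dictionaries correspond to groups and that tuples correspond to dataset shapes. The helper `iter_datasets` and the "/"-joined paths are illustrative assumptions, not MDSuite functions.

```python
def iter_datasets(structure, prefix=""):
    """Yield (path, shape) for every dataset in a nested structure dict."""
    for key, value in structure.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # A nested dict is treated as a group holding further entries.
            yield from iter_datasets(value, path)
        else:
            # A tuple is treated as the shape of a single dataset.
            yield path, tuple(value)


structure = {
    "Na": {"Forces": (200, 5000, 3)},
    "Pressure": (5000, 6),
    "Temperature": (5000, 1),
}

datasets = dict(iter_datasets(structure))
# The last value of each shape is the number of components.
components = {path: shape[-1] for path, shape in datasets.items()}

for path, shape in datasets.items():
    print(path, shape, components[path])
```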

load_data(path_list: Optional[list] = None, select_slice: np.s_ = slice(None, None, None), dictionary: bool = False, scaling: Optional[list] = None, d_size: Optional[int] = None)[source]

Load data from the database for some operation.

Should be called by the data fetch class, as this will ensure correct loading and pre-loading.

resize_datasets(structure: dict)[source]

Resize a dataset so more data can be added.

Parameters:

structure (dict) – Path to the dataset that needs to be resized, e.g. {'Na': {'velocities': (32, 100, 3)}} will resize all 'x', 'y', and 'z' datasets by 100 entries.
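
The arithmetic behind such a resize can be sketched as follows. The example assumes that the second axis of the shape counts configurations and that a resize grows only that axis. The helper `grown_shape` and the starting shape (32, 500, 3) are hypothetical and only illustrate the idea.

```python
def grown_shape(current_shape, increment_shape, axis=1):
    """Return the shape after growing one axis by the increment along it."""
    new_shape = list(current_shape)
    # Only the chosen axis grows; all other axes keep their size.
    new_shape[axis] += increment_shape[axis]
    return tuple(new_shape)


# A dataset that currently holds 500 configurations grows by 100 entries.
new = grown_shape((32, 500, 3), (32, 100, 3))
print(new)
```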

class mdsuite.database.simulation_database.MoleculeInfo(name: str, n_particles: int, properties: List[PropertyInfo], mass: Optional[float] = None, charge: float = 0, groups: Optional[dict] = None)[source]

Bases: SpeciesInfo

Information about a Molecule.

All the information of a species, plus groups.

groups

A molecule-specific dictionary for mapping the molecule to the particles. The keys of this dict are index references to a specific molecule (e.g. molecule 1), and the values are dicts of atom species and the indices belonging to that specific molecule, e.g.

water = {"groups": {"0": {"H": [0, 1], "O": [0]}}}

This tells us that the 0th water molecule consists of the 0th and 1st hydrogen atoms in the database as well as the 0th oxygen atom.

Type:

dict

groups: dict = None
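
The groups mapping described for MoleculeInfo can be read with plain dictionary access. The sketch below lists the atoms that belong to one molecule. The helper `atoms_in_molecule` is a hypothetical name used only for illustration.

```python
water = {"groups": {"0": {"H": [0, 1], "O": [0]}}}


def atoms_in_molecule(groups, molecule_key):
    """Return (species, index) pairs for the atoms of one molecule."""
    return [
        (species, index)
        for species, indices in groups[molecule_key].items()
        for index in indices
    ]


atoms = atoms_in_molecule(water["groups"], "0")
print(atoms)
```
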
class mdsuite.database.simulation_database.PropertyInfo(name: str, n_dims: int)[source]

Bases: object

Information about a trajectory property.

Example: pos_info = PropertyInfo('Positions', 3); vel_info = PropertyInfo('Velocities', 3)

name

The name of the property

Type:

str

n_dims

The dimensionality of the property

Type:

int

n_dims: int
name: str
class mdsuite.database.simulation_database.SpeciesInfo(name: str, n_particles: int, properties: List[PropertyInfo], mass: Optional[float] = None, charge: float = 0)[source]

Bases: object

Information of a species.

name

Name of the species (e.g. ‘Na’)

Type:

str

n_particles

Number of particles of that species

Type:

int

properties

List of the properties that were recorded for the species. mass and charge are optional.

Type:

list of PropertyInfo

charge: float = 0
mass: float = None
n_particles: int
name: str
properties: List[PropertyInfo]
class mdsuite.database.simulation_database.TrajectoryChunkData(species_list: List[SpeciesInfo], chunk_size: int)[source]

Bases: object

Class to specify the data format for transfer from the file to the database.

add_data(data: ndarray, config_idx, species_name, property_name)[source]

Add configuration data to the chunk.

Parameters:
  • data – The data to be added, with shape (n_configs, n_particles, n_dims). n_particles and n_dims relate to the species and the property that is being added.

  • config_idx – Start index of the configs that are being added.

  • species_name – Name of the species to which the data belongs.

  • property_name – Name of the property being added.

Example

In a loop that reads 5 configs per loop: add_data(vel_array, 16*5, 'Na', 'Velocities'), where vel.data.shape == (5, 42, 3).
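
The start index 16*5 in the example follows from reading a fixed number of configurations per loop iteration. A minimal sketch of that bookkeeping is given below. The generator `chunk_start_indices` and the total of 100 configurations are assumptions made for illustration and are not part of MDSuite.

```python
def chunk_start_indices(n_configs, chunk_size):
    """Yield the start index of each chunk of configurations."""
    for start in range(0, n_configs, chunk_size):
        yield start


# Reading 5 configurations per loop iteration from 100 configurations.
starts = list(chunk_start_indices(n_configs=100, chunk_size=5))

# The iteration with zero-based index 16 starts writing at 16 * 5.
print(starts[16])
```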

get_data()[source]
class mdsuite.database.simulation_database.TrajectoryMetadata(n_configurations: int, species_list: List[SpeciesInfo], box_l: Optional[list] = None, sample_rate: int = 1, sample_step: Optional[float] = None, temperature: Optional[float] = None, simulation_data: dict = <factory>)[source]

Bases: object

Trajectory Metadata container.

This metadata must be extracted from trajectory files to build the database into which the trajectory will be stored.

n_configurations

Number of configurations of the whole trajectory.

Type:

int

species_list

The information about all species in the system.

Type:

list of SpeciesInfo

box_l

The simulation box size in three dimensions

Type:

list of float

sample_rate

The number of timesteps between consecutive samples. To be removed in favour of sample_step.

Type:

int, optional

sample_step

The time between consecutive configurations. E.g. for a simulation with time step 0.1 where the trajectory is written every 5 steps: sample_step = 0.5. Does not have to be specified (e.g. configurations from Monte Carlo scheme), but is needed for all dynamic observables.

Type:

float, optional

temperature

The set temperature of the system. Optional because it is only applicable to MD simulations with a thermostat. Needed for certain observables.

Type:

float, optional

simulation_data

All other simulation data that can be extracted from the trajectory metadata. E.g. software version, pressure in NPT simulations, time step, …

Type:

dict, optional

box_l: list = None
n_configurations: int
sample_rate: int = 1
sample_step: float = None
simulation_data: dict
species_list: List[SpeciesInfo]
temperature: float = None
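
The sketch below shows how the containers documented above might fit together. So that it runs without MDSuite installed, it defines minimal stand-in dataclasses that mirror only the documented fields. These stand-ins are assumptions for illustration and are not the library's implementation. The particle and configuration counts are made-up values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Stand-ins mirroring the documented fields (not the MDSuite classes).
@dataclass
class PropertyInfo:
    name: str
    n_dims: int


@dataclass
class SpeciesInfo:
    name: str
    n_particles: int
    properties: List[PropertyInfo]
    mass: Optional[float] = None
    charge: float = 0


@dataclass
class TrajectoryMetadata:
    n_configurations: int
    species_list: List[SpeciesInfo]
    box_l: Optional[list] = None
    sample_rate: int = 1
    sample_step: Optional[float] = None
    temperature: Optional[float] = None
    simulation_data: dict = field(default_factory=dict)


# Time step 0.1, trajectory written every 5 steps.
time_step = 0.1
write_interval = 5

positions = PropertyInfo("Positions", 3)
velocities = PropertyInfo("Velocities", 3)
sodium = SpeciesInfo("Na", n_particles=42, properties=[positions, velocities])

metadata = TrajectoryMetadata(
    n_configurations=5000,
    species_list=[sodium],
    sample_step=time_step * write_interval,
)
print(metadata.sample_step)
```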