Descriptors

wACSF

The weighted Atom-centered Symmetry Functions (wACSF) [2] descriptor can be used to represent the local environment near an atom using a fingerprint composed of the outputs of multiple two- and three-body symmetry functions. wACSF extends the Atom-centered Symmetry Functions (ACSF) [1] by applying an element-dependent weighting scheme to the symmetry functions. This lets a single function account for neighbours of different chemical species, instead of requiring separate functions for each species or species pair. As a result, Gastegger et al. [2] report that a compact set of wACSFs gives significantly better generalisation performance in machine learning potentials than a much larger set of conventional ACSFs.

[1] Jörg Behler, Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys., 134(7):074106, (2011).
[2] M. Gastegger, et al., wACSF—Weighted atom-centered symmetry functions as descriptors in machine learning potentials. J. Chem. Phys., 148 (24): 241709, (2018).
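As a rough illustration of the weighting idea, the sketch below computes radial (G2-type) wACSFs for a single atom with NumPy. It assumes a Behler-style cosine cutoff and uses the atomic number Z_j of each neighbour as its element weight; the function names, the cutoff form and the choice of weight are illustrative assumptions, not XANESNET's implementation.

```python
import numpy as np


def cosine_cutoff(r, r_cut):
    """Behler-style cosine cutoff: 1 at r = 0, decaying smoothly to 0 at r_cut."""
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)


def weighted_g2(positions, atomic_numbers, centre, mus, eta, r_cut):
    """Radial wACSF values for atom `centre`, one per Gaussian shift in `mus`.

    Neighbour j contributes Z_j * exp(-eta * (r_ij - mu)**2) * f_c(r_ij);
    using Z_j as the element weight is an illustrative assumption.
    """
    pos = np.asarray(positions, dtype=float)
    z = np.asarray(atomic_numbers, dtype=float)
    r = np.linalg.norm(pos - pos[centre], axis=1)
    others = np.arange(len(pos)) != centre           # drop the central atom itself
    r, z = r[others], z[others]
    weights = z * cosine_cutoff(r, r_cut)            # shape (n_neighbours,)
    mus = np.asarray(mus, dtype=float)
    gauss = np.exp(-eta * (r[None, :] - mus[:, None]) ** 2)  # (n_mus, n_neighbours)
    return gauss @ weights                           # shape (n_mus,)
```

For a CO molecule with the absorber at index 0, `weighted_g2([[0, 0, 0], [0, 0, 1.128]], [6, 8], 0, np.linspace(0.0, 6.0, 16), 1.0, 6.0)` returns 16 values, the largest at the shift closest to 1.128 Å. Because the weight multiplies each neighbour's contribution, an O neighbour and a C neighbour at the same distance give different values without needing separate functions per element.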

Input file:

  • type: wacsf

  • params:

    • r_min (float, default = 0.0): The minimum radial cutoff distance (in Å) around the absorption site.

    • r_max (float, default = 8.0): The maximum radial cutoff distance (in Å) around the absorption site.

    • n_g2 (int, default = 0): The number of G2 symmetry functions to use for encoding.

    • n_g4 (int, default = 0): The number of G4 symmetry functions to use for encoding.

    • l (list, default = [1.0, -1.0]): List of lambda values for G4 symmetry function encoding.

    • z (list, default = [1.0]): List of zeta values for G4 symmetry function encoding.

    • g2_parameterisation (str, default = “shifted”): The strategy to use for G2 symmetry function parameterisation; Options: “shifted” or “centred”.

    • g4_parameterisation (str, default = “centred”): The strategy to use for G4 symmetry function parameterisation; Options: “shifted” or “centred”.

    • use_charge (bool, default = False): If True, includes an additional element in the vector descriptor for the charge state of the complex.

    • use_spin (bool, default = False): If True, includes an additional element in the vector descriptor for the spin state of the complex.
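The l (lambda) and z (zeta) lists parameterise the angular G4 functions. A minimal sketch of a single weighted G4 value is shown below, assuming the standard Behler G4 form, a pair weight of Z_j * Z_k, and each unordered neighbour pair counted once. How XANESNET combines the l and z lists with its G4 grid is not described here, so the function name and these choices are illustrative only.

```python
import itertools

import numpy as np


def cosine_cutoff(r, r_cut):
    """Behler-style cosine cutoff (same form as assumed for the G2 sketch)."""
    return 0.5 * (np.cos(np.pi * r / r_cut) + 1.0) if r < r_cut else 0.0


def weighted_g4(positions, atomic_numbers, centre, eta, lam, zeta, r_cut):
    """One weighted angular (G4-type) value for atom `centre`.

    lam is a lambda value (e.g. +1 or -1) and zeta a zeta value, as in l and z.
    """
    pos = np.asarray(positions, dtype=float)
    z = np.asarray(atomic_numbers, dtype=float)
    neighbours = [j for j in range(len(pos)) if j != centre]
    total = 0.0
    for j, k in itertools.combinations(neighbours, 2):   # each unordered pair once
        v_ij, v_ik = pos[j] - pos[centre], pos[k] - pos[centre]
        r_ij, r_ik = np.linalg.norm(v_ij), np.linalg.norm(v_ik)
        r_jk = np.linalg.norm(pos[k] - pos[j])
        # clip guards against rounding pushing |cos| just above 1
        cos_theta = np.clip(v_ij @ v_ik / (r_ij * r_ik), -1.0, 1.0)
        angular = (1.0 + lam * cos_theta) ** zeta
        radial = np.exp(-eta * (r_ij**2 + r_ik**2 + r_jk**2))
        fc = (cosine_cutoff(r_ij, r_cut) * cosine_cutoff(r_ik, r_cut)
              * cosine_cutoff(r_jk, r_cut))
        total += z[j] * z[k] * angular * radial * fc      # assumed weight Z_j * Z_k
    return 2.0 ** (1.0 - zeta) * total
```

With lambda = +1 the term peaks for small bond angles and vanishes for a linear (180°) arrangement; lambda = -1 does the opposite, which is why the default l list holds both signs.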

Example:
descriptor:
  type: wacsf
  params:
    r_min: 1.0
    r_max: 6.0
    n_g2: 16
    n_g4: 32
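The example sets only four keys, so every other key falls back to its documented default. The hypothetical helper below (not part of XANESNET's API) sketches that resolution step for the wACSF defaults listed above and adds a couple of sanity checks.

```python
import copy

# Documented defaults for `type: wacsf`, copied from the parameter list above.
WACSF_DEFAULTS = {
    "r_min": 0.0,
    "r_max": 8.0,
    "n_g2": 0,
    "n_g4": 0,
    "l": [1.0, -1.0],
    "z": [1.0],
    "g2_parameterisation": "shifted",
    "g4_parameterisation": "centred",
    "use_charge": False,
    "use_spin": False,
}
PARAMETERISATIONS = {"shifted", "centred"}


def resolve_wacsf_params(user_params):
    """Hypothetical helper: merge user params over the defaults and sanity-check them."""
    unknown = set(user_params) - set(WACSF_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown wacsf params: {sorted(unknown)}")
    params = copy.deepcopy(WACSF_DEFAULTS)   # avoid sharing the default lists
    params.update(user_params)
    if not 0.0 <= params["r_min"] < params["r_max"]:
        raise ValueError("expected 0 <= r_min < r_max")
    for key in ("g2_parameterisation", "g4_parameterisation"):
        if params[key] not in PARAMETERISATIONS:
            raise ValueError(f"{key} must be 'shifted' or 'centred'")
    return params
```

Resolving the example above keeps r_min = 1.0, r_max = 6.0, n_g2 = 16 and n_g4 = 32, and fills in l = [1.0, -1.0], z = [1.0] and the default parameterisations.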

MACE

The Many-body Atomic Cluster Expansion (MACE) [1] descriptor represents the local atomic environment using many-body correlation functions.

[1] I. Batatia, et al., MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. Advances in Neural Information Processing Systems 35, (2022).

Input file:

  • type: mace

  • params:

    • invariants_only (bool, default = False): If True, only returns invariant features (rotation and permutation invariant).

    • num_layers (int, default = -1): Number of interaction layers for the underlying MACE model.

    • absorber_atom_only (bool, default = False): If True, returns descriptors only for the absorber atom.

Example:
descriptor:
  type: mace
  params:
    invariants_only: True
    num_layers: 3
    absorber_atom_only: True
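The three switches can be pictured as slicing a per-atom feature matrix. The toy sketch below assumes a layout that is not guaranteed to match MACE's own. It assumes each interaction layer contributes a block of num_scalars * (l_max + 1)**2 columns with its invariant (l = 0) channels first, and that num_layers = -1 means all layers. The function name, num_scalars, l_max and absorber_index are hypothetical.

```python
import numpy as np


def select_mace_like_features(features, num_scalars, l_max, invariants_only=False,
                              num_layers=-1, absorber_index=None):
    """Toy slicing of a per-atom feature matrix of shape (n_atoms, n_columns).

    Assumed layout: one block of num_scalars * (l_max + 1) ** 2 columns per layer,
    with the num_scalars invariant (l = 0) channels first in each block.
    """
    features = np.asarray(features)
    block = num_scalars * (l_max + 1) ** 2
    total_layers = features.shape[1] // block
    n_layers = total_layers if num_layers == -1 else num_layers  # -1 -> all layers (assumed)
    width = num_scalars if invariants_only else block            # invariants_only
    parts = [features[:, i * block : i * block + width] for i in range(n_layers)]
    selected = np.concatenate(parts, axis=1)
    # absorber_atom_only: keep only the absorber's row
    return selected if absorber_index is None else selected[absorber_index]
```

Keeping only the invariant channels shrinks the descriptor and makes it independent of the molecule's orientation, at the cost of discarding the directional (equivariant) information.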

RDC

RDC is a descriptor that transforms a molecular system into a Radial (or ‘pair’) Distribution Curve. Simplistically, the RDC is like a histogram of pairwise internuclear distances, discretised over an auxiliary real-space grid and smoothed out using Gaussians; pairs are made between the absorption site and all atoms within a defined radial cutoff.
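A minimal NumPy sketch of the absorber-centred RDC described above is shown below. It assumes an unweighted sum of Gaussians exp(-alpha * (r - r_j)**2) over neighbours j within [r_min, r_max] and an evenly spaced grid that includes both endpoints. Some RDC formulations also weight each pair (for example by nuclear charges); that is omitted here, and the function name is illustrative.

```python
import numpy as np


def absorber_rdc(positions, absorber, r_min=0.0, r_max=6.0, dr=0.01, alpha=10.0):
    """Gaussian-smoothed curve of absorber-neighbour distances on a real-space grid."""
    pos = np.asarray(positions, dtype=float)
    d = np.linalg.norm(pos - pos[absorber], axis=1)
    keep = (np.arange(len(pos)) != absorber) & (d >= r_min) & (d <= r_max)
    d = d[keep]                                    # distances inside the radial cutoff
    n_points = int(round((r_max - r_min) / dr)) + 1
    grid = np.linspace(r_min, r_max, n_points)     # auxiliary grid, endpoints included
    curve = np.exp(-alpha * (grid[:, None] - d[None, :]) ** 2).sum(axis=1)
    return grid, curve
```

Larger alpha gives narrower Gaussians and therefore a sharper (higher-resolution) curve, while dr only controls how finely that curve is sampled.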

Input file:

  • type: rdc

  • params:

    • r_min (float, default = 0.0): The minimum radial cutoff distance (in Å) around the absorption site.

    • r_max (float, default = 6.0): The maximum radial cutoff distance (in Å) around the absorption site.

    • dr (float, default = 0.01): The step size (in Å) of the auxiliary real-space grid over which the RDC is discretised.

    • alpha (float, default = 10.0): A smoothing parameter used in a Gaussian exponent that defines the effective spatial resolution of the RDC.

    • use_charge (bool, default = False): If True, includes an additional element in the vector descriptor for the charge state of the complex.

    • use_spin (bool, default = False): If True, includes an additional element in the vector descriptor for the spin state of the complex.

Example:
descriptor:
  type: rdc
  params:
    r_min: 0.0
    r_max: 8.0
    dr: 0.01
    alpha: 10.0
    use_charge: False
    use_spin: False

pDOS

The partial Density Of States (pDOS) descriptor encodes relevant electronic information for ML models that simulate X-ray spectroscopy. This approach uses a small basis set together with the guess (non-optimised) electronic configuration to extract and then discretise the density of states (DOS) of the absorbing atom, which forms the input vector.

The pDOS descriptor is aimed at capturing the electronic properties that link directly to the spectroscopic observable. To supplement it with nuclear-structure information, the pDOS descriptor can be concatenated with the wACSF descriptor.
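The broadening step can be sketched as follows. The sketch assumes you already have the absorbing atom's projected states as energies (in eV) and weights; PySCF reports orbital energies in Hartree, so they would need converting first. Following the parameter description, sigma is treated as a FWHM and converted to a standard deviation. The function name and the use of a normalised Gaussian are assumptions.

```python
import numpy as np


def broaden_pdos(energies_ev, weights, e_min=-20.0, e_max=20.0, sigma=0.7, num_points=200):
    """Project weighted discrete states onto a uniform energy grid with Gaussians.

    sigma is interpreted as a FWHM and converted to a standard deviation.
    """
    grid = np.linspace(e_min, e_max, num_points)
    std = sigma / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM -> standard deviation
    e = np.asarray(energies_ev, dtype=float)[None, :]
    w = np.asarray(weights, dtype=float)[None, :]
    gauss = np.exp(-0.5 * ((grid[:, None] - e) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))
    return grid, (w * gauss).sum(axis=1)
```

Because each Gaussian is normalised, the area under the broadened curve equals the total projected weight inside the energy window, so num_points changes only the sampling density, not the scale.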

Input file:

  • type: pdos

  • params:

    • r_min (float, default = 0.0): The minimum radial cutoff distance (in Å) around the absorption site.

    • r_max (float, default = 6.0): The maximum radial cutoff distance (in Å) around the absorption site.

    • e_min (float, default = -20.0): The minimum energy grid point for the pDOS (in eV).

    • e_max (float, default = 20.0): The maximum energy grid point for the pDOS (in eV).

    • sigma (float, default = 0.7): The FWHM of the Gaussian function used to broaden the pDOS obtained from PySCF.

    • num_points (int, default = 200): The number of points over which the broadened pDOS is projected.

    • basis (str, default = “3-21g”): The basis set used by PySCF when generating the pDOS.

    • init_guess (str, default = “minao”): The initial-guess method used by PySCF when generating the pDOS.

    • max_scf_cycles (int, default = 0): The number of SCF cycles run by PySCF when generating the pDOS. Smaller numbers stay closer to the raw initial guess, while larger numbers take longer to compute. Note that warnings are suppressed, so you will not be told whether the SCF has converged; larger numbers make convergence more likely but do not guarantee it.

    • use_wacsf (bool, default = False): If True, the wACSF descriptor for the structure is also generated and appended after the pDOS descriptor.

    • n_g2 (int, default = 0): The number of G2 symmetry functions to use for encoding.

    • n_g4 (int, default = 0): The number of G4 symmetry functions to use for encoding.

    • l (list, default = [1.0, -1.0]): List of lambda values for G4 symmetry function encoding.

    • z (list, default = [1.0]): List of zeta values for G4 symmetry function encoding.

    • g2_parameterisation (str, default = “shifted”): The strategy to use for G2 symmetry function parameterisation; Options: “shifted” or “centred”.

    • g4_parameterisation (str, default = “centred”): The strategy to use for G4 symmetry function parameterisation; Options: “shifted” or “centred”.

    • use_charge (bool, default = False): If True, includes an additional element in the vector descriptor for the charge state of the complex.

    • use_spin (bool, default = False): If True, includes an additional element in the vector descriptor for the spin state of the complex.

Example:
descriptor:
  type: pdos
  params:
    basis: 3-21G
    init_guess: minao
    orb_type: p
    max_scf_cycles: 0
    num_points: 80
    e_min: -10.0
    e_max: 30.0
    sigma: 0.8
    use_wacsf: True
    r_min: 0.5
    r_max: 6.5
    n_g2: 22
    n_g4: 10
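With use_wacsf: True, the final input vector is the pDOS followed by the wACSF block. The hypothetical helper below sketches that concatenation. Placing the optional charge and spin elements at the very end is an assumption, since the documentation does not state their position.

```python
import numpy as np


def assemble_descriptor(pdos, wacsf=None, charge=None, spin=None):
    """Hypothetical helper: concatenate descriptor pieces in an assumed fixed order."""
    parts = [np.asarray(pdos, dtype=float)]
    if wacsf is not None:                  # use_wacsf: wACSF appended after the pDOS
        parts.append(np.asarray(wacsf, dtype=float))
    if charge is not None:                 # use_charge: one extra element (position assumed)
        parts.append(np.array([charge], dtype=float))
    if spin is not None:                   # use_spin: one extra element (position assumed)
        parts.append(np.array([spin], dtype=float))
    return np.concatenate(parts)
```

If the wACSF block has n_g2 + n_g4 entries (also an assumption), the example configuration above would give a vector of 80 + 22 + 10 = 112 elements.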