

Honegumi (“ho-nay-goo-mee”), which means “skeletal framework” in Japanese, is a package for interactively creating minimal working examples for advanced Bayesian optimization topics.

Tip

If you’re new to Bayesian optimization, watch A Gentle Introduction to Bayesian Optimization

Real-world chemistry and materials science optimization tasks are complex! Here are some example features of these kinds of tasks:

| Topic | Description |
| --- | --- |
| Noise | Repeat measurements are stochastic |
| Multi-fidelity | Some measurements are higher quality but much more costly |
| Multi-objective | Almost always, tasks have multiple properties that are important |
| High-dimensional | Like finding the proverbial “needle in a haystack”, the search spaces are enormous |
| Constraints | Not all combinations of parameters are valid |
| Mixed-variable | Often there is a mixture of numerical and categorical variables |

However, applications of state-of-the-art algorithms to these materials science tasks have been limited. Advanced implementations are still challenging, even for veteran materials informatics practitioners. In addition to combining multiple algorithms, there are other logistical issues, such as using existing data, embedding physical descriptors, and modifying search spaces. To address these challenges, we present Honegumi, an interactive script generator for materials-relevant Bayesian optimization using the Ax Platform.

Note

Honegumi (骨組み), a Japanese word meaning skeletal framework, is technically pronounced “ho-nay-goo-mee”, but you can also refer to this tool as “honey gummy” to make it easy to remember 😉

As with PyTorch’s installation docs, users interactively toggle options to generate the desired code output. These scripts are unit-tested, and invalid configurations are crossed out. This means you can expect the scripts to run without throwing errors. Honegumi is not a wrapper for optimization packages; instead, think of it as an interactive tutorial generator. Honegumi is the first Bayesian optimization template generator of its kind, and we envision that this tool will reduce the barrier to entry for applying advanced Bayesian optimization to real-world materials science tasks. It also pairs well with LLMs!

Interact with Honegumi using the grid below. Select one option per row and watch the template dynamically appear. Click the corresponding Colab badge to open a self-contained Google Colab notebook for the selected script, or click the corresponding GitHub badge to view the script source code directly. For example, if you want a script that optimizes multiple objectives simultaneously (multi-objective) as a function of many parameters (high-dimensional), you would select multi from the objective row and FULLYBAYESIAN from the model row. Hover your mouse over the 🛈 icon to the right of each row to learn more about each option.
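To make the output concrete, here is a minimal sketch of the kind of script the grid generates, assuming the Ax Service API that the templates are built on. The parameter names and the toy `evaluate` function are hypothetical stand-ins for your own experiment or simulation, not part of any actual template.

```python
from ax.service.ax_client import AxClient, ObjectiveProperties


def evaluate(params):
    # Hypothetical objective; replace with your measurement or simulation.
    x1, x2 = params["x1"], params["x2"]
    return {"objective": (x1 - 0.3) ** 2 + (x2 - 0.7) ** 2}


ax_client = AxClient()
ax_client.create_experiment(
    name="minimal_example",
    parameters=[
        {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "x2", "type": "range", "bounds": [0.0, 1.0]},
    ],
    objectives={"objective": ObjectiveProperties(minimize=True)},
)

# Sequential optimization loop: ask for one candidate, measure, report back.
for _ in range(20):
    parameters, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(parameters))

best_parameters, metrics = ax_client.get_best_parameters()
```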

Interactive Grid Example

The grid’s option rows are described below:

- **Objective (single vs. multi):** Choose between single- and multi-objective optimization based on your project needs. Single-objective optimization targets one primary goal (e.g., maximize the strength of a material), while multi-objective optimization considers several objectives simultaneously (e.g., maximize the strength of a material while minimizing synthesis cost). Select the option that best aligns with your optimization goals and problem complexity (a multi-objective sketch follows this list).
- **Surrogate model (frequentist vs. fully Bayesian):** Choose between frequentist and fully Bayesian implementations of the Gaussian process (GP) surrogate model. The frequentist GP, often the default in BO packages, offers efficiency and speed. The fully Bayesian GP treats the GP hyperparameters as random variables estimated via MCMC, providing a deeper exploration of uncertainty. The fully Bayesian treatment has historically given better closed-loop Bayesian optimization performance, but comes at the cost of higher computational demand. Consider your computational resources and the complexity of your optimization task when making your selection (sketched below).
- **Existing data:** Choose whether to fit the surrogate model to previous data before starting the optimization process. Including historical data may give your model a better starting place and potentially speed up convergence. Conversely, excluding existing data means starting the optimization from scratch, which might be preferred where historical data could introduce bias or noise into the optimization process. Consider the relevance and reliability of your existing data when making your selection (sketched below).
- **Sum constraint:** Choose whether to apply a sum constraint over two or more optimization variables (e.g., ensuring a total allocation remains within an available budget). This constraint focuses generated optimization trials on feasible candidates at the cost of flexibility. Consider whether such a constraint reflects the reality of variable interactions when selecting this option (the constraint rows are sketched together below).
- **Order constraint:** Choose whether to implement an order constraint over two or more optimization variables (e.g., ensuring certain tasks precede others). This constraint focuses generated optimization trials on variable combinations that follow a specific order. Excluding the constraint offers flexibility in variable arrangements but may neglect important task sequencing or value-inequality considerations. Consider whether such a constraint reflects the reality of variable interactions when selecting this option.
- **Linear constraint:** Choose whether to implement a linear constraint over two or more optimization variables such that a linear combination of parameter values adheres to an inequality (e.g., 0.2*x_1 + x_2 < 0.1). This constraint focuses generated optimization trials on variable combinations that follow an enforced rule at the cost of flexibility. Consider whether such a constraint reflects the reality of variable interactions when selecting this option.
- **Composition constraint:** Choose whether to include a composition constraint over two or more optimization variables such that their sum does not exceed a specified total (e.g., ensuring the mole fractions of elements in a composition sum to one). This constraint is particularly relevant to fabrication-related tasks where the quantities of components must sum to a total. Consider whether such a constraint reflects the reality of variable interactions when selecting this option.
- **Categorical parameter:** Choose whether to include a categorical variable in the optimization process (e.g., dark or milk chocolate chips in a cookie recipe). Including categorical variables allows choice parameters and their interaction with continuous variables to be optimized. Note that adding categorical variables can create discontinuities in the search space that are difficult to optimize over. Consider the value of adding categorical variables to the optimization task when selecting this option (sketched below).
- **Custom threshold:** Choose whether to apply custom thresholds to objectives in a multi-objective optimization problem (e.g., a minimum acceptable strength requirement for a material). Setting a threshold on an objective guides the optimization algorithm to prioritize solutions that meet or exceed it. Excluding thresholds enables greater exploration of the design space, but may produce sub-optimal solutions. Consider whether the threshold values reflect the reality or expectations of your optimization task when selecting this option.
- **Single vs. batch evaluation:** Choose whether to perform single or batch evaluations for your Bayesian optimization campaign. Single evaluations analyze one candidate solution at a time, offering precise control and adaptability after each trial at the expense of more compute time. Batch evaluations process several solutions in parallel, significantly reducing the number of optimization cycles but potentially diluting the specificity of adjustments. Batch evaluation is helpful in scenarios where it is advantageous to test several solutions simultaneously. Consider the nature of your evaluation tool when selecting between the two options (sketched below).
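As one example of how the objective and custom-threshold rows translate into code, here is a hedged sketch using the Ax Service API. The `strength` and `cost` metric names and the threshold values are illustrative, not Honegumi defaults.

```python
from ax.service.ax_client import AxClient, ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="multi_objective_example",
    parameters=[
        {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "x2", "type": "range", "bounds": [0.0, 1.0]},
    ],
    objectives={
        # threshold marks the worst acceptable value for each objective;
        # omit it to let Ax infer reference points from the data
        "strength": ObjectiveProperties(minimize=False, threshold=50.0),
        "cost": ObjectiveProperties(minimize=True, threshold=10.0),
    },
)
```

After the optimization loop, `ax_client.get_pareto_optimal_parameters()` returns the non-dominated trade-offs rather than a single best point.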
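The model row swaps the default frequentist GP for a fully Bayesian one by changing the generation strategy. A sketch, assuming an Ax version that exposes `Models.FULLYBAYESIAN` (the MCMC-based fully Bayesian GP); the trial counts and MCMC settings are illustrative.

```python
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.service.ax_client import AxClient

gs = GenerationStrategy(
    steps=[
        # quasi-random initialization before the surrogate takes over
        GenerationStep(model=Models.SOBOL, num_trials=5),
        # fully Bayesian GP with MCMC-estimated hyperparameters;
        # num_trials=-1 means "use for all remaining trials"
        GenerationStep(
            model=Models.FULLYBAYESIAN,
            num_trials=-1,
            model_kwargs={"num_samples": 256, "warmup_steps": 512},
        ),
    ]
)
ax_client = AxClient(generation_strategy=gs)
```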
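The existing-data row corresponds to attaching historical measurements before asking for new trials. A sketch, assuming the `ax_client` experiment from the minimal example above; the historical rows are made up.

```python
# Continuing from the minimal example above.
# Hypothetical previously collected (parameters, measurement) pairs.
historical = [
    ({"x1": 0.2, "x2": 0.4}, {"objective": 1.3}),
    ({"x1": 0.8, "x2": 0.1}, {"objective": 0.7}),
]
for params, raw_data in historical:
    # attach_trial registers parameters the optimizer did not generate
    _, trial_index = ax_client.attach_trial(parameters=params)
    ax_client.complete_trial(trial_index=trial_index, raw_data=raw_data)
```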
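The four constraint rows all reduce to Ax’s `parameter_constraints` argument. A sketch with illustrative variable names; the three constraint strings below are independent examples of each form, not a set you would normally combine. Composition constraints are typically handled by optimizing n-1 fractions under a sum constraint and computing the final component inside your evaluation function.

```python
from ax.service.ax_client import AxClient, ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="constrained_example",
    parameters=[
        {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "x2", "type": "range", "bounds": [0.0, 1.0]},
    ],
    objectives={"objective": ObjectiveProperties(minimize=True)},
    parameter_constraints=[
        "x1 + x2 <= 1.0",      # sum constraint (also the usual route to a
                               # composition constraint: set the last
                               # component to 1.0 - x1 - x2 when evaluating)
        "x1 <= x2",            # order constraint
        "0.2*x1 + x2 <= 0.1",  # general linear constraint
    ],
)
```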
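The categorical row adds a `choice` parameter alongside the numeric ones. A sketch with hypothetical names, echoing the chocolate-chip example above:

```python
from ax.service.ax_client import AxClient, ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="mixed_variable_example",
    parameters=[
        {"name": "bake_temp", "type": "range", "bounds": [150.0, 220.0]},
        {
            "name": "chip_type",
            "type": "choice",
            "values": ["dark", "milk"],
            "is_ordered": False,  # treated as unordered categories
        },
    ],
    objectives={"taste": ObjectiveProperties(minimize=False)},
)
```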
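Finally, the single-vs-batch row changes how candidates are requested. A sketch, reusing the hypothetical `evaluate` function and `ax_client` from the minimal example above:

```python
# Ask Ax for several candidates at once, e.g. to fill a well plate or a
# parallel synthesis run, then report all results back before the next cycle.
trials, _ = ax_client.get_next_trials(max_trials=4)
for trial_index, parameters in trials.items():
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(parameters))
```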


Note

If you like this tool, please consider starring it on GitHub. If you’re interested in contributing, reach out to sterling.baird@utoronto.ca 😊

A Perfect Pairing with LLMs

Tip

Use Honegumi with ChatGPT to create non-hallucinatory, custom Bayesian optimization scripts. See below for more info.

LLMs are good at recognizing patterns but really bad at writing Bayesian optimization scripts from scratch. Since Honegumi is really good at programmatically generating valid Bayesian optimization scripts, you can use Honegumi to get a template and then ask an LLM to adapt it to your use case. The two-minute video below shows how a Honegumi template can be adapted using an LLM (in our case, ChatGPT Plus) to optimize cookie taste as a function of flour, sugar, and butter content.
