
Honegumi ('ho-nay-goo-mee'), which means "skeletal framework" in Japanese, is a package for interactively creating minimal working examples for advanced Bayesian optimization topics.

Honegumi: Accelerating the adoption of Bayesian optimization for science (arXiv)

Tip

New to Bayesian optimization? Start with A Gentle Introduction to Bayesian Optimization and explore our concept guides and coding tutorials on key optimization principles.

Real-world chemistry and materials science optimization tasks are complex! Noisy measurements, competing objectives, multiple tasks, mixed parameter types, and constraints riddle our optimization campaigns. However, applying state-of-the-art algorithms to these tasks isn't trivial, even for veteran materials informatics practitioners. Additionally, Python libraries can be cumbersome to learn and use, serving as a barrier to entry for interested users. To address these challenges, we present Honegumi, an interactive script generator for materials-relevant Bayesian optimization using the Ax Platform.

Create your optimization script using the grid below! Select options from each row to generate a code template. Hover over the ⓘ icons to get more information and see whether it's a good choice.

Interactive Grid Example

The grid's rows and their tooltip descriptions are reproduced below; a sketch of the kind of script the grid generates follows the list.

Objective: Choose between single and multi-objective optimization based on your project needs. Single-objective optimization targets one primary goal (e.g. maximize the strength of a material), while multi-objective optimization considers several objectives simultaneously (e.g. maximize the strength of a material while minimizing synthesis cost). Select the option that best aligns with your optimization goals and problem complexity.
Model: Choose between three surrogate model implementations: Default uses a standard Gaussian process (GP), Custom enables user-defined acquisition functions and hyperparameters, and Fully Bayesian implements MCMC estimation of GP parameters. The Default option provides robust baseline performance, Custom allows advanced users to tailor the optimization process, and Fully Bayesian offers deeper uncertainty exploration at higher computational cost. Consider your optimization needs and computational resources when selecting this option.
Task: Choose between single and multi-task optimization based on your experimental setup. Single-task optimization focuses on one specific task, while multi-task optimization leverages data from multiple related tasks simultaneously (e.g. optimizing similar manufacturing processes across different production sites). Multi-task optimization can improve efficiency by sharing information between tasks but requires related task structures. Consider whether your tasks share underlying similarities when making this selection.
Categorical Parameter: Choose whether to include a categorical variable in the optimization process (e.g. dark or milk chocolate chips in a cookie recipe). Including categorical variables allows choice parameters and their interaction with continuous variables to be optimized. Note that adding categorical variables can create discontinuities in the search space that are difficult to optimize over. Consider the value of adding categorical variables to the optimization task when selecting this option.
Sum Constraint: Choose whether to apply a sum constraint over two or more optimization variables (e.g. ensuring total allocation remains within available budget). This constraint focuses generated optimization trials on feasible candidates at the cost of flexibility. Consider whether such a constraint reflects the reality of variable interactions when selecting this option.
Order Constraint: Choose whether to implement an order constraint over two or more optimization variables (e.g. ensuring certain tasks precede others). This constraint focuses generated optimization trials on variable combinations that follow a specific order. Excluding the constraint offers flexibility in variable arrangements but may neglect important task sequencing or value inequality considerations. Consider whether such a constraint reflects the reality of variable interactions when selecting this option.
Linear Constraint: Choose whether to implement a linear constraint over two or more optimization variables such that the linear combination of parameter values adheres to an inequality (e.g. 0.2*x_1 + x_2 < 0.1). This constraint focuses generated optimization trials on variable combinations that follow an enforced rule at the cost of flexibility. Consider whether such a constraint reflects the reality of variable interactions when selecting this option.
Composition Constraint: Choose whether to include a composition constraint over two or more optimization variables such that their sum does not exceed a specified total (e.g. ensuring the mole fractions of elements in a composition sum to one). This constraint is particularly relevant to fabrication-related tasks where the quantities of components must sum to a total. Consider whether such a constraint reflects the reality of variable interactions when selecting this option.
Objective Threshold: Choose whether to apply custom thresholds to objectives in a multi-objective optimization problem (e.g. a minimum acceptable strength requirement for a material). Setting a threshold on an objective guides the optimization algorithm to prioritize solutions that meet or exceed these criteria. Excluding thresholds enables greater exploration of the design space, but may produce sub-optimal solutions. Consider whether threshold values reflect the reality or expectations of your optimization task when selecting this option.
Existing Data: Choose whether to fit the surrogate model to previous data before starting the optimization process. Including historical data may give your model a better starting place and potentially speed up convergence. Conversely, excluding existing data means starting the optimization from scratch, which might be preferred in scenarios where historical data could introduce bias or noise into the optimization process. Consider the relevance and reliability of your existing data when making your selection.
Evaluation Mode: Choose whether to perform single or batch evaluations for your Bayesian optimization campaign. Single evaluations analyze one candidate solution at a time, offering precise control and adaptability after each trial at the expense of more compute time. Batch evaluations, however, process several solutions in parallel, significantly reducing the number of optimization cycles but potentially diluting the specificity of adjustments. Batch evaluation is helpful in scenarios where it is advantageous to test several solutions simultaneously. Consider the nature of your evaluation tool when selecting between the two options.
Visualization: Choose whether to include visualization tools for tracking optimization progress. The default visualizations display key performance metrics like optimization traces and model uncertainty (e.g. objective value convergence over time). Including visualizations helps monitor optimization progress and identify potential issues, but may add minor computational overhead. Consider whether real-time performance tracking would benefit your optimization workflow when selecting this option.
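
To make the output concrete, here is a minimal sketch of the kind of script the grid generates: a single-objective campaign with two continuous parameters, one categorical parameter, and a sum constraint, written against the Ax Platform's Service API. The parameter names, bounds, constraint, and toy evaluation function are illustrative placeholders rather than Honegumi's exact output, and API details may vary across Ax versions.

from ax.service.ax_client import AxClient, ObjectiveProperties

def evaluate(params):
    # Placeholder for a real experiment or simulation measurement.
    bonus = 0.1 if params["chip_type"] == "dark" else 0.0
    return -((params["x1"] - 0.3) ** 2 + (params["x2"] - 0.6) ** 2) + bonus

ax_client = AxClient()
ax_client.create_experiment(
    name="honegumi_style_demo",
    parameters=[
        {"name": "x1", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "x2", "type": "range", "bounds": [0.0, 1.0]},
        # A categorical ("choice") parameter, as in the grid's categorical row.
        {"name": "chip_type", "type": "choice", "values": ["dark", "milk"]},
    ],
    # Sum constraint over the two continuous parameters.
    parameter_constraints=["x1 + x2 <= 1.0"],
    objectives={"score": ObjectiveProperties(minimize=False)},
)

for _ in range(20):
    parameterization, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(parameterization))

best_parameters, metrics = ax_client.get_best_parameters()

Each grid selection toggles the corresponding block of the template (the constraint entry, the objective definition, the evaluation loop) while the overall structure of the script stays the same.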


What's the scope of honegumi?

Similar to PyTorch's installation docs, users interactively toggle the options to generate the desired code output. These scripts are unit-tested, and invalid configurations are crossed out. This means you can expect the scripts to run without throwing errors. Honegumi is not a wrapper for optimization packages; instead, think of it as an interactive tutorial generator. Honegumi is the first Bayesian optimization template generator of its kind, and we envision that this tool will reduce the barrier to entry for applying advanced Bayesian optimization to real-world science tasks.

Note

If you like this tool, please consider starring it on GitHub. If you're interested in contributing, reach out to sterling.baird@utoronto.ca 😊

Concept Docs and Tutorials

Understanding Bayesian optimization requires both theoretical knowledge and practical experience. Our documentation is structured to support this dual approach. The concept guides provide in-depth explanations of fundamental principles, from the basics of single-objective optimization to advanced topics like multi-task optimization and fully Bayesian Gaussian process models. These theoretical foundations are complemented by hands-on coding tutorials that demonstrate real-world applications across various materials science domains.

The tutorials walk you through practical scenarios such as optimizing 3D printed materials, developing biodegradable polymers with specific strength requirements, and efficiently screening anti-corrosion coatings. Each tutorial bridges theory and practice, showing how to apply advanced optimization concepts to solve tangible engineering challenges. Whether you're new to Bayesian optimization or looking to implement sophisticated multi-objective strategies, our documentation provides the guidance needed to successfully apply these techniques to your specific materials science challenges.
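
As a taste of what the multi-objective tutorials cover, the sketch below sets up two competing objectives with a threshold on one of them, again using Ax's Service API. The parameter names, objective names, units, and threshold value are invented for illustration, not taken from the tutorials themselves.

from ax.service.ax_client import AxClient, ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="multi_objective_demo",
    parameters=[
        {"name": "monomer_frac", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "cure_time_min", "type": "range", "bounds": [1.0, 60.0]},
    ],
    objectives={
        # Maximize strength; the threshold marks the minimum acceptable value
        # and guides the search toward solutions that satisfy it.
        "strength_mpa": ObjectiveProperties(minimize=False, threshold=20.0),
        # Maximize degradation rate with no threshold attached.
        "degradation_rate": ObjectiveProperties(minimize=False),
    },
)

parameterization, trial_index = ax_client.get_next_trial()
# For multiple objectives, report a measurement for each metric by name.
ax_client.complete_trial(
    trial_index=trial_index,
    raw_data={"strength_mpa": 25.0, "degradation_rate": 0.4},
)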

A Perfect Pairing with LLMs

Tip

Use Honegumi with ChatGPT to create non-hallucinatory, custom Bayesian optimization scripts. See an example ChatGPT transcript and the videos below.

While Large Language Models excel at pattern recognition, they often struggle to create reliable Bayesian optimization scripts from scratch. Honegumi complements LLMs by providing validated templates that can then be customized through LLM assistance. Watch below as we demonstrate this workflow by optimizing a cookie recipe using Honegumi and ChatGPT:

Overview

Tutorial #2 walkthrough

API Usage

Have a look at our API usage tutorials.

Citing

If you find Honegumi useful, please consider citing:

Baird, Sterling G., Andrew R. Falkowski, and Taylor D. Sparks. "Honegumi: An Interface for Accelerating the Adoption of Bayesian Optimization in the Experimental Sciences." arXiv, February 4, 2025. https://doi.org/10.48550/arXiv.2502.06815.

@misc{baird_honegumi_2025,
  title = {Honegumi: {{An Interface}} for {{Accelerating}} the {{Adoption}} of {{Bayesian Optimization}} in the {{Experimental Sciences}}},
  shorttitle = {Honegumi},
  author = {Baird, Sterling G. and Falkowski, Andrew R. and Sparks, Taylor D.},
  year = {2025},
  month = feb,
  number = {arXiv:2502.06815},
  eprint = {2502.06815},
  primaryclass = {cs},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2502.06815},
  archiveprefix = {arXiv},
  keywords = {Computer Science - Machine Learning,Condensed Matter - Materials Science},
}

Zenodo snapshots of the GitHub releases (beginning with v0.3.2) are available at DOI
