Experimental Strategy: Designs for complex experimental spaces

From LabAutopedia

A LabAutopedia invited article

Authored by: James N. Cawse, Ph.D.; Cawse and Effect LLC


Designs for large, complex experimental spaces

When experimental spaces reach thousands to millions of potential points and lack the regularities required by more conventional designs, the best strategy is to use one of several families of nonlinear search algorithms to locate an optimum region. These algorithms typically incorporate stochastic (random) elements. The optimum region can then be searched further and mapped conventionally.

Genetic Algorithms (GAs)

In GAs, formulations are encoded as gene-like structures. An initial population is generated and evaluated; successive populations are then generated from “fitter” antecedents by the genetic operations of mutation and crossover.[1] Typical GAs encode their recipes in simple binary code; chemical systems are usually more complicated, and a “chromosome” may be required rather than a “gene”.[2]
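The generate-evaluate-recombine loop described above can be sketched in a few lines. This is a minimal, generic binary GA (tournament selection, one-point crossover, per-bit mutation) with a toy "one-max" fitness; a real chemical GA would instead decode the bit string into a formulation recipe and score the experimental result.

```python
import random

def evolve(fitness, n_bits=16, pop_size=20, generations=40,
           crossover_rate=0.8, mutation_rate=0.02):
    """Minimal binary GA: tournament selection, one-point crossover,
    per-bit mutation. Returns the best individual in the final population."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]

        def tournament():
            a, b = random.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        new_pop = []
        while len(new_pop) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if random.random() < crossover_rate:
                cut = random.randint(1, n_bits - 1)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # flip each bit with probability mutation_rate
            child = [bit ^ 1 if random.random() < mutation_rate else bit
                     for bit in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Toy "one-max" fitness (count of 1-bits); a chemical GA would instead
# decode the bit string into a recipe and score the experimental result.
best = evolve(sum)
```

The parameters shown (population size, crossover and mutation rates) are the operating decisions the text returns to below; good values are problem-dependent.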

GA routines are available in many of the more powerful mathematics and statistics software packages such as R, Matlab, and Mathematica. Unfortunately for chemists, these programs operate only on low-level data types such as binary strings and real numbers. Adapting them to the complexities of chemical systems typically requires writing a routine specific to the problem at hand. One such routine is Opticat, which is available online.[3] Other efforts are underway to develop a “program generator” that will take the description of a chemical system and automatically generate an appropriate GA.[4]
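The gap between the flat numeric vectors those packages handle and a chemical formulation can be bridged by a decoding layer. The gene layout and component libraries below are hypothetical, purely to illustrate how a mixed qualitative/quantitative "chromosome" maps onto a recipe:

```python
def decode_chromosome(chromosome):
    """Decode a mixed "chromosome" into a formulation recipe.

    Hypothetical encoding for illustration: three integer genes index
    qualitative choices (metal, support, promoter), two real genes carry
    loadings in weight percent. Off-the-shelf GA routines that handle
    only flat numeric vectors need a mapping layer like this before
    they can drive a chemical system."""
    metals = ["Pt", "Pd", "Rh"]
    supports = ["Al2O3", "SiO2", "TiO2"]
    promoters = ["K", "Cs", "none"]
    metal_ix, support_ix, promoter_ix, metal_wt, promoter_wt = chromosome
    return {
        "metal": metals[metal_ix],
        "support": supports[support_ix],
        "promoter": promoters[promoter_ix],
        "metal_wt_pct": metal_wt,
        "promoter_wt_pct": promoter_wt,
    }

recipe = decode_chromosome([0, 2, 1, 1.5, 0.25])
# → {'metal': 'Pt', 'support': 'TiO2', 'promoter': 'Cs',
#    'metal_wt_pct': 1.5, 'promoter_wt_pct': 0.25}
```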

Operation of a GA requires decisions on a number of operating parameters. Studies and practical experience indicate that there are preferred values of those parameters in chemical systems (Table 1).[5] One of the most powerful uses of GAs is parameter reduction; they seem to be very effective at identifying non-productive formulation components. A GA by itself will never completely eliminate a component, but tracking the component representation with a simple graph (Figure 1) allows the experimenter to eliminate such components manually when appropriate.[6]
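Tracking component representation, as in Figure 1, amounts to counting what fraction of the current population contains each component, generation by generation. A minimal sketch (the formulations-as-sets representation is an assumption for illustration):

```python
from collections import Counter

def component_frequencies(population):
    """Fraction of formulations in the population containing each
    component; `population` is a list of formulations, each a set of
    component names. Components whose fraction drifts toward zero over
    successive generations are candidates for manual elimination."""
    counts = Counter(c for formulation in population for c in formulation)
    return {comp: n / len(population) for comp, n in counts.items()}

generation = [{"Pt", "CeO2"}, {"Pt", "ZrO2"}, {"Pt", "CeO2", "K"}, {"Pd", "CeO2"}]
freqs = component_frequencies(generation)
# Pt and CeO2 appear in 3 of 4 formulations (0.75); K in only 1 (0.25).
```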

Table 1. Preferred parameters for GAs with heterogeneous catalysts
Figure 1. Differentiation of productive and non-productive catalyst components with a Genetic Algorithm[7]

Model-based technology

Figure 2. Predictive Design Technology flow diagram

New methods in this area have been shown to be more powerful than GAs. Predictive Design Technology (PDT, Figure 2) from ProtoLife Inc. is a proprietary meta-modeling technology that chooses from a variety of predictive modeling techniques to efficiently find optimal targets in complex search spaces. Proprietary algorithms make intelligent decisions at each experimental iteration. PDT chooses the optimal type and complexity of model for each specific experimental system and makes smart tradeoffs between exploration and exploitation, probing new areas of the experimental space while exploiting the data accumulating from successive experimental runs. This allows the search to converge gradually on a limited optimal region of the experimental space by drawing on clues gathered in the initial stages (or “generations”), sampling only a tiny fraction of all possible points for observation.[8] PDT has been tested both in simulations and in laboratory screening campaigns for drug formulations,[9] synergistic drug combinations, and optimization of the amount of functional protein synthesized with a commercial in vitro synthesis kit.[10]
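PDT itself is proprietary, so no implementation details are public; the following is only a generic sketch of the exploration/exploitation tradeoff the text describes, using a crude 1-nearest-neighbour "surrogate" and a distance-based novelty bonus (both stand-ins, not ProtoLife's methods):

```python
import math

def suggest_next(observed, candidates, explore_weight=1.0):
    """Pick the next experiment from `candidates`, given `observed`,
    a list of (point, measured_value) pairs.

    The 'model' is a crude 1-nearest-neighbour predictor; the novelty
    bonus rewards points far from anything already measured. Raising
    explore_weight shifts the balance from exploitation to exploration."""
    def acquisition(x):
        point, value = min(observed, key=lambda p: math.dist(x, p[0]))
        return value + explore_weight * math.dist(x, point)
    return max(candidates, key=acquisition)

observed = [((0.2, 0.2), 1.0), ((0.8, 0.8), 3.0)]
candidates = [(0.5, 0.5), (0.85, 0.8), (0.1, 0.9)]
# A low exploration weight picks near the best known point (exploitation);
# a high weight picks the unsampled far corner (exploration).
```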

Space filling designs

The “opening move” in these search algorithms is one of the critical elements determining their efficiency. The classic method is simply a random selection of points from the entire experimental space. This effectively eliminates bias but does not necessarily sample the space well: chance is lumpy, and a random selection will have excessive clusters and gaps. A low-discrepancy or “SmartRandom” selection samples the space better, as do other space filling designs (Figure 3). The Sphere Packing algorithm is probably the best choice at this time. All of the space filling algorithms except random require a distance metric spanning all the dimensions, which is difficult to define for spaces with qualitative elements.
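As an illustration of the sphere-packing idea, a greedy maximin selection (repeatedly add the candidate farthest from the points already chosen) is easy to sketch; real DOE software solves the underlying optimization more carefully:

```python
import math

def maximin_design(candidates, n_points):
    """Greedy sphere-packing-style design: start from the first candidate
    (a random start is also common), then repeatedly add the remaining
    candidate whose minimum distance to the chosen points is largest."""
    remaining = list(candidates)
    design = [remaining.pop(0)]
    while len(design) < n_points and remaining:
        farthest = max(remaining,
                       key=lambda c: min(math.dist(c, d) for d in design))
        remaining.remove(farthest)
        design.append(farthest)
    return design

# 10 x 10 grid of candidate points on the unit square
grid = [(i / 9, j / 9) for i in range(10) for j in range(10)]
design = maximin_design(grid, 5)
# The five picks spread out: the four corners, then near the center.
```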

Figure 3. Representative space filling designs

Random
  Method: simple random selection
  Pros: easy to generate
  Cons: bad tendency to cluster and leave gaps

Smart Random
  Method: adds new points randomly, biased toward maximum distance from existing points
  Pros: much better than random; usable in heterogeneous spaces
  Cons: not good for small numbers of points (some clusters and gaps remain)

Sphere Packing
  Method: maximizes the minimum distance between pairs of design points
  Pros: largest minimum distance between points
  Overall: probably the best space filling design

Latin Hypercube
  Method: maximizes the minimum distance with even spacing of factor levels
  Pros: uniform projections on the coordinate axes
  Overall: also very good
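A basic Latin hypercube sample is also simple to generate: cut each axis into n equal bins and use every bin exactly once per dimension, which yields the uniform one-dimensional projections noted above. A minimal sketch (without the maximin optimization commercial packages add on top):

```python
import random

def latin_hypercube(n_points, n_dims, seed=0):
    """Latin hypercube sample on the unit cube: each dimension is divided
    into n_points equal bins and each bin is used exactly once, with the
    point placed at a uniform random position inside its bin."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        bins = list(range(n_points))
        rng.shuffle(bins)
        columns.append([(b + rng.random()) / n_points for b in bins])
    return list(zip(*columns))

points = latin_hypercube(8, 3)
# Projected onto any single axis, the 8 points occupy all 8 bins.
```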

Lattice Designs

Lattice designs are completely non-random methods which guarantee coverage of quantitative spaces. In these spaces, the goal is:

Place experimental points in such a way that the probability of one or more points being in the desired region is maximized.

“Packing” and “covering” lattices are well known and are uniformly much better than the intuitive Cartesian grid (Figure 4). These designs are apparently the best way to search truly “rough” surfaces.[11] Unfortunately, such a search requires on the order of 10^D points in D dimensions, which makes searches beyond 4-5 dimensions untenable in the laboratory world.
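For concreteness, the face-centered cubic lattice of Figure 4 can be generated directly; counting its points at increasing resolution shows how quickly the roughly 10^D-point requirement bites (this construction is the standard FCC lattice, not taken from the cited chapter):

```python
def fcc_points(n):
    """Face-centered cubic lattice points in the unit cube at resolution n:
    integer triples (i, j, k) with an even coordinate sum, scaled by 1/(2n).
    For n = 1 this is the conventional FCC cell: 8 cube corners plus
    6 face centers."""
    step = 1.0 / (2 * n)
    return [(i * step, j * step, k * step)
            for i in range(2 * n + 1)
            for j in range(2 * n + 1)
            for k in range(2 * n + 1)
            if (i + j + k) % 2 == 0]

# Point counts grow as the cube of the resolution here - and as the D-th
# power in D dimensions, the explosion the text describes.
```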

Figure 4. Cartesian lattice (inefficient) vs. Face Centered Cubic lattice (~40% more efficient)


  2. D. Wolf, M. Baerns, Evolutionary Strategy for the Design and Evaluation of High Throughput Experiments, in: J. N. Cawse (Ed.), Experimental Design for Combinatorial and High Throughput Materials Development, John Wiley & Sons, Hoboken, NJ, 2003, pp. 147-162.
  4. M. Holena, D. Linke, U. Rodemerck, Generator Approach to Evolutionary Optimization of Catalysts and its Integration with Surrogate Modeling, Catalysis Today, accepted for publication.
  5. S. R. M. Pereira, Effect of the Genetic Algorithm Parameters on the Optimization of Heterogeneous Catalysts, QSAR Comb. Sci. 24(2):45-47 (2005).
  6. J. N. Cawse, Experimental Designs in High Throughput Systems, in: B. Narasimhan, S. K. Mallapragada, M. D. Porter (Eds.), Combinatorial Materials Science, John Wiley & Sons, Hoboken, NJ, 2007, p. 29.
  7. Reprinted, with permission, from J. N. Cawse, M. Baerns, M. Holena, J. Chem. Inf. Comput. Sci. 44 (2004); copyright 2004 American Chemical Society, Washington, DC.
  8. J. N. Cawse, G. Gazzola, N. Packard, Efficient Discovery and Optimization of Complex High-Throughput Experiments, Catal. Today, 2010, accepted for publication.
  9. F. Caschera et al., Automated Discovery of Novel Drug Formulations Using Predictive Iterated High-Throughput Experimentation, PLoS One 5(1):e8546 (2010), doi:10.1371/journal.pone.0008546.
  10. F. Caschera et al., Coping with Complexity - Machine Learning Optimization of Cell-Free Protein Synthesis, submitted for publication.
  11. F. A. Hamprecht, E. Agrell, Exploring a Space of Materials: Spatial Sampling Design and Subset Selection, in: J. N. Cawse (Ed.), Experimental Design for Combinatorial and High Throughput Materials Development, Wiley Interscience, 2002, pp. 277-306.