All published articles of this journal are available on ScienceDirect.
The Strengths and Challenges of New Approach Methods in Genetic Toxicity - Part II
Abstract
Genetic toxicology originated in 1973 with the Ames test, but it has evolved significantly since then. In the early 2000s, there was great promise for the reduction, refinement, and replacement of animal testing; however, these changes have accelerated only over the past 5-7 years. With the advent of new laboratory technologies, such as organs-on-a-chip, 3D systems, toxicogenomics, reverse dosimetry/qIVIVE, and PBPK/PBTK/mathematical modeling, along with advances such as induced pluripotent stem cell technology, CRISPR-Cas9 gene editing, automation, advanced visual imaging, high-throughput big-data analysis, and machine learning (ML), there is an increasing shift away from animal testing. Part I of the study describes the current genetic toxicity tests required by regulatory agencies for the approval of pharmaceuticals, medical devices, and industrial chemicals, as well as their limitations. This part explores how new approach methods (NAMs), already in use or in qualification/validation, can help bridge those gaps, acknowledging that such assays must meet rigorous standards for fitness for purpose, domain of applicability, and context of use. The status of regulatory acceptance and implementation is also discussed.
1. INTRODUCTION
In Part I, the currently accepted OECD Test Guideline (TG) methods for genetic toxicity testing, along with their strengths and shortcomings, were presented. It was also discussed how the linear no-threshold (LNT) concept of ‘linearity at low dose’ came into being and influenced the development of genetic toxicity testing in the 20th century. Since then, it has been assumed that there is no safe dose of a carcinogen: when the dose-response relationship is extrapolated linearly to zero, every dose carries a finite, non-zero risk [1]. Currently, the risk assessment of carcinogens takes LNT into account by asserting that a dose resulting in 1.5 cancers or fewer in one million humans is not considered a likely risk for developing cancer (“with uncertainty spanning perhaps a magnitude, for exposure occurring over a lifetime”) [2]. Although some carcinogens do have a threshold [3], LNT is, thus far, still used for risk assessment purposes [3].
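Under LNT, excess lifetime cancer risk is treated as proportional to dose, so the dose corresponding to a chosen acceptable risk level follows by inverting a straight line. A minimal sketch, assuming a hypothetical cancer slope factor in (mg/kg/day)^-1 rather than a value for any real carcinogen; the target risk is a parameter:

```python
def excess_lifetime_risk(dose_mg_per_kg_day: float, slope_factor: float) -> float:
    """LNT: excess lifetime cancer risk = slope factor x lifetime average daily dose."""
    return slope_factor * dose_mg_per_kg_day


def risk_specific_dose(target_risk: float, slope_factor: float) -> float:
    """Invert the linear model: the dose (mg/kg/day) predicted to give `target_risk`."""
    if slope_factor <= 0:
        raise ValueError("slope factor must be positive")
    return target_risk / slope_factor


if __name__ == "__main__":
    slope = 0.5  # hypothetical slope factor, (mg/kg/day)^-1
    dose = risk_specific_dose(1e-6, slope)  # one-in-a-million target risk
    print(f"risk-specific dose: {dose:.1e} mg/kg/day")
    print(f"risk at that dose: {excess_lifetime_risk(dose, slope):.1e}")
```

Because the model has no threshold, halving the dose halves the predicted risk but never reaches zero; this is exactly the assumption that threshold arguments for some carcinogens challenge.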
A pressing question for toxicologists became: which of the identified genotoxic and mutagenic compounds are carcinogens, and how might they best be identified? As reported in Part I, the Ames assay became the ‘gold standard’ for assessing the ability of a substance to cause reversion mutations. Other types of assays soon proliferated to assess additional endpoints, such as clastogenicity, sister chromatid exchange (SCE), and forward mutations. These have been codified through the standardization and harmonization procedures of the Organisation for Economic Co-operation and Development (OECD, https://www.oecd.org), whereby participating regulatory agencies accept tests performed under approved OECD Test Guidelines (TGs). Applicants thus know which tests to apply when seeking permission to market a new pharmaceutical, agricultural, or industrial chemical, and exactly how those tests must be performed. Standards are available for identifying substances by methods such as HPLC or MS, as are standard quality control measures for use in the assays. Under this system, referred to as the Mutual Acceptance of Data (MAD), regulators in all participating countries accept results submitted by any applicant, confident that they can be compared with previous results and judged objectively against criteria accepted worldwide.
Genetic toxicologists have come to realize that they cannot test all the substances they need to within the strict timelines for approving applications. The burdens of time and resources (not the least of which are animal lives) threaten to become overwhelming. Thus, there is an emerging, energetic movement away from the animal testing required by conventional tests toward a new paradigm of in vitro testing1, often referred to as new approach methodologies or NAMs, which are faster, less resource-intensive, and can be scaled up to test hundreds to thousands of compounds in record time. Skepticism about these methods persists, although they are gradually gaining wider use. Several are under review by the OECD and are expected to become approved TGs. Increasingly, data support the notion that NAMs can produce results on par with or better than those of traditional in vitro or in vivo genotoxicity tests, as examined in detail in this study.
New concepts accompany genotoxicity NAMs. One is the set of 10 key characteristics of carcinogens, which are the abilities of an agent to 1) act as an electrophile either directly or after metabolic activation; 2) be genotoxic; 3) alter DNA repair or cause genomic instability; 4) induce epigenetic alterations; 5) induce oxidative stress; 6) induce chronic inflammation; 7) be immunosuppressive; 8) modulate receptor-mediated effects; 9) cause immortalization; and 10) alter cell proliferation, cell death, or nutrient supply [4]. These characteristics were defined by consensus as a framework for evaluating mechanistic data on candidate carcinogens and their effects on human health. Another important concept is the Adverse Outcome Pathway (AOP), a conceptual framework describing the sequence of biological events that begins with a molecular initiating event (MIE), triggered by a stressor such as a xenobiotic, and leads to an adverse outcome (AO). Information about how a drug or other substance produces its effect in the body, such as the receptor or molecular pathway it targets, describes its mechanism of action (MOA). MOAs are elements of AOPs and are used to develop Integrated Approaches to Testing and Assessment (IATAs), which combine many sources of information to evaluate the safety or hazard of a substance [5]. These recently developed concepts unite all the elements of the new framework, including omics technologies, in silico technologies (e.g., benchmark dose modeling and AI-aided modeling approaches), and the comparison of internal doses derived from the literature or in vitro data with physically measured external doses.
Together, they are used in quantitative in vitro to in vivo extrapolation (qIVIVE), which proceeds from in vitro toxicity results, physiological data, and physiologically based pharmacokinetic (PBPK) modeling to derive human exposure levels that may be considered safe, without the need for additional animal testing. Lu et al. [5] provide an excellent review of this topic.
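Because an AOP, as described above, is an ordered chain running from an MIE through key events to an AO, it maps naturally onto a simple data structure. The sketch below is a hypothetical illustration: the class, its field names, and the example events are assumptions, not the OECD AOP-Wiki schema or an endorsed pathway.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdverseOutcomePathway:
    """Hypothetical minimal AOP model: MIE -> ordered key events -> adverse outcome."""
    molecular_initiating_event: str
    adverse_outcome: str
    key_events: List[str] = field(default_factory=list)

    def chain(self) -> List[str]:
        """Return the events in causal order."""
        return [self.molecular_initiating_event, *self.key_events, self.adverse_outcome]

    def describe(self, stressor: str) -> str:
        """Render the pathway as triggered by a given stressor."""
        return f"{stressor}: " + " -> ".join(self.chain())


# Illustrative chain only; not an endorsed AOP.
example_aop = AdverseOutcomePathway(
    molecular_initiating_event="DNA adduct formation",
    adverse_outcome="tumor formation",
    key_events=["insufficient DNA repair", "gene mutation"],
)
print(example_aop.describe("alkylating agent"))
```

Keeping key events as an ordered list makes it easy to attach assay results to individual events, which is how AOPs feed into IATAs.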
The purpose of this review is to survey and describe new approaches in genetic toxicity testing, providing a side-by-side comparison of old and new methods with references. This should allow interested scientists to assess which methods best suit their projected needs and to understand how the field is evolving in response to regulatory requirements and acceptance.
2. METHODS
The searches were conducted in PubMed, Scopus, Web of Science, and EMBASE. The literature was searched using the following strings:
(“new approach method*” OR “NAMs”) OR (genetox* AND genetic AND toxic*) OR (“in vitro”)
(“new approach method*” OR “NAMs”) AND (strengths OR shortcomings) AND (advantages OR disadvantages) AND genetic AND toxic* AND (“animal replacement” OR 3R’s), with or without “short term”, “mutation*”, “technology”, or “unconventional”, and variations of these terms. The following phrases were also used: strengths and weaknesses of new approach methodologies for genetic toxicity testing; challenges of new approach methodologies for genetic toxicity testing; and challenges of animal replacement in genetic toxicity testing. Afterward, the snowball technique was used to expand on the results obtained.
‘New Approach Methodologies’ were restricted to methods referenced from 2014 to 2024 that lacked internationally harmonized standardization and validation, i.e., methods not yet validated by ECVAM or adopted as OECD TGs (although several are in process). Citations from abstracts, proceedings, presentations, or white papers were not included. In vivo study methods were excluded, although some methods that combine in vitro and in vivo components were included.
This review discusses the regulatory status of NAMs, drawing on professional knowledge and experience as well as research from the literature to ensure that the information is up to date. Information about the qIVIVE process, reverse dosimetry, administered equivalent dose (AED)/bioactivity exposure ratio (BER), PBK modeling, and benchmark dose (BMD) modeling was drawn from years of experience with the current state of the art in refining, reducing, and replacing animals in toxicity testing.
The material and figure describing the ONTOX project are included by permission of ONTOX.
1 All tests of living organisms require a sample of the organism, although not all require the ultimate sacrifice.
2.1. Shortcomings and Strengths of NAMs
2.1.1. In vitro (yeast) DNA Deletion (DEL) Recombination Assay (Single Test Alternative to Genotoxicity Test Battery)
2.1.1.1. Principle of the Assay
The yeast DNA deletion (DEL) recombination assay has been proposed as a simple and rapid method to measure the reversion frequency in the HIS3 gene through intrachromosomal homologous recombination [6-9], offering a high degree of both sensitivity and specificity for carcinogens.
Ku proposed adapting it, along with a toxicogenomics add-on for MOA determination (and possibly a confirmatory in vivo assay), as an alternative to the ICH S2 genotoxicity test battery, which includes both in vivo and in vitro tests [7]. At that time, cell transformation assays were the only in vitro alternative, and they were considered inadequate and misleading. Moreover, when the ICH test battery was evaluated against large databases, it showed limited predictive power for “carcinogenicity outcomes, which have genotoxic relevance”. The argument was that, beyond the initial test set used to develop the ICH battery, the battery had little actual predictive utility, as demonstrated by retrospective analysis of marketed drugs. The frequent false positives of standard in vitro assays were also cited as a disadvantage. A single in vitro test was therefore proposed that would detect mutations of carcinogenic relevance, be widely applicable (to contaminants, industrial chemicals, drugs, and candidate biologics), and mimic human Phase I and II metabolism. To also provide MOA information, the test system should possess a genome highly similar to that of humans, and it should be amenable to high throughput. Several arguments were put forward supporting the association between DEL recombination in yeast and carcinogenesis, as well as the improved reliability of detecting true tumorigens [7].
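The DEL assay readout is essentially a ratio of selected His+ recombinants to viable cells, reported relative to a solvent control. A minimal sketch with hypothetical plate counts and a hypothetical dilution factor; the actual plating scheme and the statistical criteria for a positive call are not given in the text and are not reproduced here:

```python
def del_frequency(his_plus_colonies: int, viable_colonies: int, viable_dilution: float) -> float:
    """His+ (deletion) recombinants per viable cell.

    his_plus_colonies: colonies on histidine-free selective plates
    viable_colonies:   colonies on complete-medium plates
    viable_dilution:   hypothetical factor scaling the complete-medium count to the
                       number of viable cells plated on the selective plates
    """
    viable_cells = viable_colonies * viable_dilution
    if viable_cells <= 0:
        raise ValueError("viable cell count must be positive")
    return his_plus_colonies / viable_cells


def fold_increase(treated_frequency: float, control_frequency: float) -> float:
    """Induction of DEL recombination relative to the solvent control."""
    return treated_frequency / control_frequency


control = del_frequency(10, 200, 1000)  # hypothetical counts
treated = del_frequency(40, 160, 1000)
print(f"fold increase: {fold_increase(treated, control):.1f}")
```

Normalizing to viable cells matters because a toxic dose that kills cells would otherwise depress the apparent recombination count.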
2.1.1.2. Strengths and Weaknesses
The system's strengths include its ability to detect direct- and indirect-acting carcinogens, aneugens, and a wide variety of DNA lesions. It is sensitive, specific, simple, and fast; with add-ons, it can also yield information on the mechanism of action (MOA). At 11 days, the assay length is intermediate.
Gardner [9] emphasized that Saccharomyces cerevisiae is particularly well suited for analyzing gene function because it is easy to manipulate (deletion, mutation, and tagging by PCR) through facile homologous recombination with short stretches of sequence homology. A disadvantage, however, is that the in vitro yeast DEL recombination assay is not a human or mammalian system, so its results are an extrapolation based on analogy. Nevertheless, its metabolism is a good mimic of human Phase I and II metabolism.
Following the DEL recombination assay, transcriptomic analysis should be carried out to interpret the MOA, and potentially, an in vivo confirmatory assay could be carried out if the results are equivocal.
2.1.2. 3D Cell Culture Models
2.1.2.1. Principle of the Assay
The EpiDerm™ tissue model [10-12] consists of normal human epidermal keratinocytes (NHEK) cultured in three dimensions on tissue culture inserts; it is ECVAM-validated and accepted under OECD test guidelines. A MatTek EpiAlveolar™ 3D tissue model has also been developed (Charles River, 2024) for the detection of fibrosis-causing agents. Because fibrosis can lead to downstream cancer outcomes through epigenetic mechanisms, this model represents another viable transformation test method.
2.1.2.2. Strengths and Weaknesses
Because every facet of the experiment can be controlled, these systems offer the advantages of in vivo tests while avoiding associated problems, such as uncertainty about whether, and at what concentration, the toxicant has reached the target organ. Some researchers [13] have grown human bronchial epithelial cells (HBEC) at the air-liquid interface, without adding other cell types such as immune cells (macrophages), to study the toxicity of indoor air particulate matter. Adding multiple cell types, such as goblet cells (a secretory cell type of the respiratory airway) or Langerhans/dendritic cells (an immune component of 3D reconstructed skin), improves the functionality and predictive capability of these models. Information about the MOA of a substance can also be gleaned from these models, whose outcomes are directly visualizable, quantifiable, and comparable to traditional histopathology. The systems are versatile and can be manipulated in many ways [14].
Another distinct advantage of 3D cultures is that they may detect cellular changes leading to cancer that other types of genetic toxicity assays do not normally detect. Both direct and indirect (i.e., epigenetic) changes leading to cancer, such as those associated with phototoxicity, wound healing, fibrosis, and inflammation, can be detected and visualized.
These systems have the disadvantage of not being a high-throughput process in any respect, and are time-consuming, labor-intensive, and technologically demanding.
2.1.3. 3D Reconstructed Skin (RS) Comet Assay
2.1.3.1. Principle of the Assay
Recently accepted into the OECD TG development program, this assay was validated in a ‘round robin’ laboratory validation project led by Cosmetics Europe to address the lack of alternatives to traditional in vivo genotoxicity testing. Such alternatives are needed because EU cosmetics rules do not permit confirmatory in vivo testing, so a positive in vitro result would otherwise rule out the commercial use of a substance. This effort also supports the development of dermal genotoxicity assays [14-16]. The project aims to evaluate the performance of the test using the Phenion® Full-Thickness skin model in various regulatory, academic, and industry laboratory settings. The researchers applied chemicals three times over a 48-hour period, then isolated keratinocytes and fibroblasts, which were subjected to electrophoresis in the standard comet assay, with percent tail DNA as the recorded outcome. Thirty-two substances were tested in a blinded manner; results were evaluated by a statistician before the substances were decoded [14].
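Percent tail DNA, the outcome recorded above, is the share of a comet's total fluorescence that has migrated into the tail. A minimal sketch, assuming per-cell tail and head intensities from image analysis and a median-per-culture, mean-across-cultures summary commonly used for comet data; the RS Comet protocol's exact summary statistics are an assumption here, and the intensities are hypothetical:

```python
from statistics import mean, median
from typing import Iterable, Tuple


def percent_tail_dna(tail_intensity: float, head_intensity: float) -> float:
    """Share of total comet fluorescence that migrated into the tail, in percent."""
    total = tail_intensity + head_intensity
    if total <= 0:
        raise ValueError("total comet intensity must be positive")
    return 100.0 * tail_intensity / total


def culture_median(cells: Iterable[Tuple[float, float]]) -> float:
    """Median % tail DNA over the scored cells (tail, head intensity pairs) of one culture."""
    return median(percent_tail_dna(tail, head) for tail, head in cells)


def group_mean(culture_medians: Iterable[float]) -> float:
    """Mean of the per-culture medians for a treatment group."""
    return mean(culture_medians)


keratinocytes = [(10.0, 90.0), (20.0, 80.0), (30.0, 70.0)]  # hypothetical intensities
print(culture_median(keratinocytes))  # 20.0
```

Taking the median within a culture damps the influence of a few heavily damaged (e.g., apoptotic) cells before cultures are averaged.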
2.1.3.2. Strengths and Weaknesses
The assay showed 80% sensitivity, 97% specificity, and 92% accuracy. Intra- and inter-laboratory reproducibility were 93% and 88%, respectively. The authors asserted that the method is useful for confirming the results of standard genotoxicity assays, such as the Ames test, and can fulfill the requirements of EU Cosmetics Regulation (EC) No. 1223/2009, which bans animal testing. It can also confirm in vivo results under REACH.
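The performance figures above follow from a confusion matrix of assay calls against reference classifications, and reproducibility is the fraction of concordant calls between runs or laboratories. A minimal sketch; the counts and calls below are hypothetical and are not the published study data:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    """Counts of assay calls against the reference classification."""
    tp: int  # reference genotoxicant, assay positive
    fn: int  # reference genotoxicant, assay negative
    tn: int  # reference non-genotoxicant, assay negative
    fp: int  # reference non-genotoxicant, assay positive

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


def concordance(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Fraction of chemicals given the same call in two runs or two laboratories."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two equal-length, non-empty call lists")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)


c = Confusion(tp=8, fn=2, tn=19, fp=1)  # hypothetical counts
print(f"sens={c.sensitivity:.2f} spec={c.specificity:.2f} acc={c.accuracy:.2f}")
```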
2.1.4. Reconstructed Skin Micronucleus (RSMN)
2.1.4.1. Principle of the Assay
This assay combines the micronucleus (MN) assay with the EpiDerm™ three-dimensional in vitro reconstructed skin (RS) model. RSMN is intended for dermally applied products, not as a stand-alone assay, but rather as a follow-up to verify the results of a standard genotoxicity assay, and it is accepted by European regulatory agencies [17-19].
2.1.4.2. Strengths and Weaknesses
Validation studies demonstrated good transferability, good inter- and intra-laboratory reproducibility, 87% specificity, and 65% sensitivity. Sensitivity was increased to 80% by adding a 72-hour treatment to resolve equivocal results, and to 92% when the assay was combined with the 3D skin comet assay. Fluorescently labeled cells are scored visually for the presence of micronuclei in binucleated cells; automation may speed this process.
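Scoring reduces to counting micronucleated cells among binucleated cells and comparing treated with control cultures. The sketch below assumes a one-sided Fisher's exact test as a generic comparison of two proportions and an "either assay positive" rule for combining RSMN with the comet assay; neither is stated in the text as the validated decision criterion, and the counts are hypothetical:

```python
from math import comb


def mn_percent(mn_cells: int, binucleated_scored: int) -> float:
    """Micronucleated cells as a percentage of binucleated cells scored."""
    return 100.0 * mn_cells / binucleated_scored


def fisher_one_sided(mn_treated: int, n_treated: int, mn_control: int, n_control: int) -> float:
    """One-sided Fisher's exact test: probability, under equal MN rates, of observing
    at least `mn_treated` micronucleated cells among the treated cells scored
    (upper tail of the hypergeometric distribution)."""
    total = n_treated + n_control
    mn_total = mn_treated + mn_control
    denominator = comb(total, n_treated)
    upper = min(mn_total, n_treated)
    numerator = sum(
        comb(mn_total, x) * comb(total - mn_total, n_treated - x)
        for x in range(mn_treated, upper + 1)
    )
    return numerator / denominator


def combined_call(rsmn_positive: bool, comet_positive: bool) -> bool:
    """Hypothetical 'either assay positive' rule for an RSMN + 3D comet follow-up."""
    return rsmn_positive or comet_positive


# Hypothetical scoring: 30 MN per 1000 binucleated cells treated vs 5 per 1000 in control.
p = fisher_one_sided(30, 1000, 5, 1000)
print(f"treated {mn_percent(30, 1000):.1f}% vs control {mn_percent(5, 1000):.1f}%, p = {p:.2g}")
```

An "either positive" rule can only add positives, which is one plausible way a two-assay combination raises sensitivity, typically at some cost in specificity.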
Advantages include topical application of the test substances, the relative rapidity of the test (total treatment time of 48 hours), and its thorough validation. In addition, compounds testing negative after 48 hours can easily be retested with treatment extended to 72 hours, which was found to increase test sensitivity. These qualities are likely to lower resource requirements significantly compared with traditional animal skin testing. The method is human-based and does not use cells of animal origin, and it complements other methods that may employ animal-based components.
2.1.5. Bhas 42 Cell Transformation Assay (Bhas 42 CTA)
2.1.5.1. Principle of the Assay
Also in the OECD TG pipeline, the Bhas 42 CTA is a short-term, sensitive assay for detecting chemical carcinogenicity. It is not a genetic toxicity assay per se, but it can assess the potential of a substance to cause cellular changes that may signal non-genotoxic carcinogenesis [20]. A modification of the NIH 3T3 method, it was developed through the efforts of several laboratories [21-23] and later validated by an inter-laboratory study [22] and an international consortium [24-26].
Sasaki et al. [26] described the use of v-Ha-ras-transfected mouse BALB/c 3T3 A31-1-1 cells to detect both the initiating and the promoting (non-genotoxic) activity of a chemical. However, the method is not intended to distinguish genotoxic from non-genotoxic chemicals; rather, it detects carcinogenic potential regardless of genotoxicity. The Bhas 42 cells were developed from BALB/c 3T3 cells by transfection with plasmid pBR322 containing Ha-MuSV-DNA, clone H1 (v-Ha-ras) [26-28], and transformed using 12-O-tetradecanoylphorbol-13-acetate (TPA).
The two components of the assay were initially termed the initiation activity assay and the promotion activity assay. They are now termed the ‘proliferation phase’ test, in which cells are treated while proliferating to address the late initiation stage, and the ‘stationary phase’ test, in which cells are treated at the stationary phase, giving anomalous cells a growth advantage.
2.1.5.2. Method
The two phases differ in timing and treatment conditions. In the proliferation-phase (initiation) test, cells are seeded at 4 × 10^3 cells/well (day 0) and treated only early in the assay period (days 1-4). This allows target cells to undergo several rounds of division before contact inhibition occurs, fixing DNA mutations. In the stationary-phase (promotion) test, cells are seeded at 1.4 × 10^4 cells/well and treated at sub-confluence (days 4-14), then cultured without further treatment until day 21.
The stationary-phase test is intended to detect chemicals that act as tumor promoters, which are typically negative or equivocal in the Ames assay. For compounds positive in the proliferation phase, the Bhas 42 CTA can serve as a confirmatory assay. Compounds positive in both components are considered ‘complete carcinogens’. The assay has been commercialized and is available from multiple sources as a service or in kit form [29], and it has been undergoing OECD TG acceptance for some time.
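The interpretation described above can be written as a small decision function. A minimal sketch, assuming each phase has already been called positive or negative by the assay's own focus-counting and statistical criteria (not reproduced here); the function name and output labels are hypothetical:

```python
def bhas42_interpretation(proliferation_positive: bool, stationary_positive: bool) -> str:
    """Overall call from the two Bhas 42 CTA phases (hypothetical labels)."""
    if proliferation_positive and stationary_positive:
        return "complete carcinogen (positive in both phases)"
    if proliferation_positive:
        return "positive in proliferation phase only (initiation-type activity)"
    if stationary_positive:
        return "positive in stationary phase only (promotion-type activity)"
    return "negative in both phases"


for result in [(True, True), (True, False), (False, True), (False, False)]:
    print(result, "->", bhas42_interpretation(*result))
```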
2.1.5.3. Validation
Ohmori et al. [30] later measured gene expression over time during the TPA-induced transformation of Bhas 42 cells and described the pathways and specific gene changes observed. Guichard et al. [31] then evaluated whether a 12-gene panel could predict the cell transformation potential of tumor-promoting agents in the Bhas 42 CTA. The 12 genes had previously been shown to be altered during transformation by either silica nanoparticles or TPA. Four soluble transforming agents (mezerein, methylarsonic acid, cholic acid, and quercetin) were tested: mezerein modified all 12 genes; methylarsonic acid and cholic acid gave incomplete signatures, sharing some but not all of the gene changes; and quercetin induced no change in the 12 genes but did induce cytotoxicity. Thus, at least for these four agents, the 12-gene panel could not predict the signature of a transforming agent. The authors hypothesized that these agents act through different cellular pathways or molecular initiating events and therefore cannot be grouped by a single gene expression pattern.
Masumoto et al. [32] developed a convolutional neural network (CNN) for the automated identification of transformed foci in Bhas 42 cells. The trained CNN achieved an area under the ROC curve (AUC) of 0.95 and significantly outperformed conventional classification based on the criteria in the OECD guidance document, even on images not used for training. An important advantage is that a CNN does not require manual feature extraction; it learns the relevant features directly from the data, reducing both the time needed to classify foci as transformed or non-transformed and the classification error rate.
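The AUC reported for the CNN has a direct interpretation: the probability that a randomly chosen transformed focus receives a higher score than a randomly chosen non-transformed one. A minimal sketch of that rank-based (Mann-Whitney) computation, assuming per-focus classifier scores are available; the scores below are hypothetical and unrelated to the published model:

```python
from typing import Sequence


def roc_auc(transformed_scores: Sequence[float], non_transformed_scores: Sequence[float]) -> float:
    """AUC = P(score of a transformed focus > score of a non-transformed focus),
    counting ties as 1/2 (the normalized Mann-Whitney U statistic)."""
    if not transformed_scores or not non_transformed_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in transformed_scores:
        for neg in non_transformed_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(transformed_scores) * len(non_transformed_scores))


# Hypothetical CNN output probabilities for six foci.
print(roc_auc([0.97, 0.88, 0.35], [0.10, 0.42, 0.05]))
```

The pairwise loop is quadratic but transparent; for thousands of foci a sort-based rank computation gives the same value faster.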
2.1.5.3.1. Strengths and Weaknesses
For an in vitro method, the assay is relatively long (21 days), so repeated studies can become quite time-consuming. As with all cell culture methods, any significant deviation can require a complete restart of the procedure. The Bhas 42 CTA is not a standalone assay for detecting genetic toxicity; it is used only as a confirmatory assay for compounds that are negative or equivocal in the Ames assay. It can differentiate tumor promoters (both genotoxic and non-genotoxic) from non-promoters, which is a useful but limited application.
Advantages include its sensitivity and its ability to determine the transforming potential of a substance without a separate initiator, because the cell line already carries v-Ha-ras. In addition, CNN-based scoring reduces the time needed to classify a focus correctly as transformed or non-transformed.
2.1.6. ToxTracker®
2.1.6.1. Principle of the Assay
Originally developed by Hendriks et al. [33-37], ToxTracker is a fluorescence-based assay that measures the activation of six reporter systems in mouse embryonic stem cells (mESC), with detection by flow cytometry in a 96-well plate format [38].
2.1.6.2. Method
The first step is to determine the appropriate dose range by exposing the cells to a serial dilution of the test substance, up to a maximum concentration that produces 50-75% cytotoxicity or, if that is not reached, 1 mg/mL or the maximum soluble concentration. In a 96-well plate, five concentrations plus positive and negative (vehicle) controls are applied for 24 hours, after which the relative mean fluorescence in treated versus vehicle control wells is measured and corrected for relative cell count.
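The dose-setting and readout steps above can be sketched as a few small functions. This is a minimal sketch under stated assumptions: the 2-fold dilution factor is hypothetical, and fold induction and relative cell count are computed separately because the exact cell-count correction used by ToxTracker is not specified in the text:

```python
from typing import List, Optional


def top_concentration(cytotoxic_limit_mg_per_ml: Optional[float],
                      max_soluble_mg_per_ml: float,
                      cap_mg_per_ml: float = 1.0) -> float:
    """Highest test concentration: the one giving 50-75% cytotoxicity if the
    range-finder found it, otherwise 1 mg/mL or the maximum soluble concentration,
    whichever is lower."""
    if cytotoxic_limit_mg_per_ml is not None:
        return cytotoxic_limit_mg_per_ml
    return min(cap_mg_per_ml, max_soluble_mg_per_ml)


def serial_dilution(top: float, n: int = 5, factor: float = 2.0) -> List[float]:
    """n concentrations descending from `top`; the 2-fold factor is an assumption."""
    return [top / factor ** i for i in range(n)]


def fold_induction(mfi_treated: float, mfi_vehicle: float) -> float:
    """Reporter GFP mean fluorescence relative to the vehicle control."""
    return mfi_treated / mfi_vehicle


def relative_cell_count(cells_treated: int, cells_vehicle: int) -> float:
    """Surviving cells relative to vehicle; 1 minus this value approximates cytotoxicity."""
    return cells_treated / cells_vehicle


concs = serial_dilution(top_concentration(None, max_soluble_mg_per_ml=0.4))
print(concs)
```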
Like the Ames and in vitro MN assays, the ToxTracker assay uses rat liver S9 homogenate for metabolic activation. The Hendriks protocol specified co-treatment of cells with the compound and S9 mix for 3 to 4 hours, followed by a 17- to 24-hour recovery period before detection; this long recovery period was needed because of S9 toxicity. Others [38, 39] subsequently modified the procedure to increase sensitivity by reducing the S9 concentration, extending incubation to 24 hours, and omitting the recovery period, which apparently causes less interference with assay results.
2.1.6.2.1. Strengths and Weaknesses
ToxTracker can detect several forms of cellular damage. The two main reporters predicting genotoxicity are Bscl2-GFP, activated by bulky DNA adducts that inhibit DNA replication, and Rtkn-GFP, activated by DNA double-strand breaks. Other detectable types of damage include oxidative stress (Srxn1 and Blvrb reporters) and protein damage (Ddit3 reporter), which reflect non-genotoxic mechanisms. Btg2 reporter induction may signal cell cycle arrest or general genotoxic stress. Together, the responses can differentiate between direct and indirect DNA damage and provide information about the specific pathways involved [40].
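The reporter-to-pathway mapping above lends itself to a simple lookup. A minimal sketch: the mapping follows the text, but the 2-fold induction cut-off is a hypothetical default, and validated ToxTracker criteria (including cytotoxicity limits) should be taken from the protocol:

```python
from typing import Dict

# Reporter -> type of damage, as described in the text.
REPORTER_PATHWAY = {
    "Bscl2": "bulky DNA adducts / replication inhibition",
    "Rtkn": "DNA double-strand breaks",
    "Btg2": "cell cycle arrest / general genotoxic stress",
    "Srxn1": "oxidative stress",
    "Blvrb": "oxidative stress",
    "Ddit3": "protein damage",
}
GENOTOXICITY_REPORTERS = {"Bscl2", "Rtkn"}


def interpret(fold_by_reporter: Dict[str, float], threshold: float = 2.0) -> dict:
    """Summarize which reporters are induced at or above `threshold`-fold
    (hypothetical cut-off) and which damage types they point to."""
    induced = sorted(r for r, fold in fold_by_reporter.items() if fold >= threshold)
    return {
        "induced": induced,
        "genotoxic": any(r in GENOTOXICITY_REPORTERS for r in induced),
        "mechanisms": sorted({REPORTER_PATHWAY[r] for r in induced if r in REPORTER_PATHWAY}),
    }


print(interpret({"Bscl2": 3.1, "Rtkn": 1.2, "Srxn1": 2.5}))
```

Reporting mechanisms alongside the genotoxic flag mirrors the assay's purpose: a compound inducing only Srxn1/Blvrb points toward oxidative stress rather than direct DNA damage.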
In a recent interlaboratory validation study conducted according to OECD Guidance Document No. 34, seven laboratories tested 64 chemicals (both genotoxic and non-genotoxic) and achieved intralaboratory reproducibility of 73 to 98% and interlaboratory reproducibility of 83%. The assay's sensitivity was 84.4%, and its specificity was 91.2% [39].
The assay requires exogenous metabolic activation and uses cells derived from mouse embryos. Because the maximum concentration is 1 mg/mL, or lower for poorly soluble compounds, it may be challenging to find a concentration that is soluble and does not cause excessive cytotoxicity, yet is high enough to produce a significant, measurable effect in the assay.
2.1.7. MultiFlow® and MicroFlow® DNA Damage Assays
2.1.7.1. Principle of the Assay
Bryce et al. [41-46] developed a miniaturized flow cytometry-based assay that automates MN scoring (MicroFlow®; the MN endpoint is included in OECD TG 487) and a multiplexed flow cytometry-based assay (MultiFlow®) that measures phosphorylation of histone H3 (p-H3; a mitosis marker), phosphorylation of H2AX at serine 139 (γH2AX; a marker of DNA double-strand breaks), nuclear p53 content (a p53 translocation marker reflecting the response to DNA damage), the frequency of 8n cells (a marker of polyploidization), and nuclei counts (cell enumeration) to evaluate cellular genotoxicity.
2.1.7.2. Method
A sophisticated data analysis strategy uses multinomial logistic regression (MLR) to generate probability scores, which are then used to classify chemicals by MOA as clastogenic, aneugenic, or non-genotoxic. The same authors later extended these results to new chemicals with known genotoxic properties: models were trained on 4-hour and 24-hour MultiFlow data from a set of 83 previously studied chemicals and then applied to data from TK6 cells exposed to 103 chemicals not previously evaluated or used in training. Multinomial logistic regression (LR), artificial neural network (ANN), and random forest (RF) models were built to predict whether a chemical is genotoxic and, if so, whether its MOA is clastogenic or aneugenic. The performance of each individual model and of a ‘majority vote ensemble’ was determined. Specific criteria for the number of positive scores across successive concentrations were applied, and compounds were ranked by probability score. With this strategy, the authors aimed to enhance the throughput, predictivity, and overall generalizability of genotoxicity testing. The ANN model performed particularly well, and the majority vote ensemble added confidence to the conclusions.
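The decision layer described above (per-model calls across a concentration series, then a majority vote) can be sketched independently of how each model produces its probabilities. The 0.8 probability threshold and the two-concentration requirement below are hypothetical stand-ins for the study's criteria, which are not given in the text, and the probabilities are made up:

```python
from collections import Counter
from typing import Dict, List, Sequence

GENOTOXIC_CLASSES = ("clastogen", "aneugen")


def call_from_probabilities(prob_by_conc: Sequence[Dict[str, float]],
                            threshold: float = 0.8, min_hits: int = 2) -> str:
    """Per-model call across a concentration series.

    A class is called when its probability reaches `threshold` at `min_hits` or more
    concentrations (both defaults hypothetical). If both genotoxic classes qualify,
    the one with more hits wins; ties resolve to 'clastogen'.
    """
    hits = {cls: sum(1 for probs in prob_by_conc if probs.get(cls, 0.0) >= threshold)
            for cls in GENOTOXIC_CLASSES}
    best = max(hits, key=hits.get)
    return best if hits[best] >= min_hits else "non-genotoxic"


def majority_vote(model_calls: List[str]) -> str:
    """Ensemble call: the label shared by at least 2 of 3 models."""
    label, count = Counter(model_calls).most_common(1)[0]
    return label if count >= 2 else "no consensus"


# Hypothetical per-concentration probabilities from three models (LR, ANN, RF).
lr = [{"clastogen": 0.55}, {"clastogen": 0.85}, {"clastogen": 0.92}]
ann = [{"clastogen": 0.70}, {"clastogen": 0.90}, {"clastogen": 0.95}]
rf = [{"aneugen": 0.60}, {"aneugen": 0.65}, {"clastogen": 0.81}]
calls = [call_from_probabilities(p) for p in (lr, ann, rf)]
print(calls, "->", majority_vote(calls))
```

With three models and three possible labels, all three can disagree; the sketch reports that case as "no consensus" rather than forcing a call.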
2.1.7.2.1. Strengths and Weaknesses
In this configuration, no metabolic activation is applied, so the test detects only directly acting genotoxic substances. Any substance known or predicted to require metabolic activation would therefore, by definition, be classified as non-genotoxic. The method was cross-validated in a seven-laboratory, multi-center study of 60 chemicals. The majority vote ensemble (agreement of at least 2 of the 3 models) achieved accuracy, specificity, and sensitivity of between 90 and 95%. The assay could not test 49 of the 103 chemicals because the 1 mM limit could not be reached, the assay's cytotoxicity threshold was not met, or a precipitate formed.
Advantages of the MultiFlow® assay are its ability to screen compounds and classify them by MOA as clastogen, aneugen, or non-genotoxic, which can support de-risking of an adverse finding. It would be a suitable choice as a pre-screen or mechanistic follow-up for cosmetics under EU rules or for marketed chemicals under REACH. For non-genotoxic carcinogens, especially data-poor substances, it is useful for studying the MOA. It is a multiplexed, high-throughput assay with high sensitivity and specificity that provides mechanistic insights.
2.1.8. TGx-DDI Transcriptomic Biomarker Assay
2.1.8.1. Principle of the Assay
The TGx-DDI assay, developed by Li et al. [47], is designed to identify potentially genotoxic substances and to discriminate DNA damage from other types of damage [48]. It is based on the expression of 64 genes identified as discriminating DNA damage-inducing (DDI) from non-DDI substances. Originally, cultured TK6 mammalian cells were exposed to 28 chemical substances (one of which is a validated biomarker for aneugenicity, or a change in chromosome number), and the resulting gene expression changes were measured. Newly tested substances are then classified according to whether they produce the same changes in vitro.
2.1.8.2. Method
Genotoxicity is assessed by gene expression analysis after cells in culture have been exposed to the test substance for four hours: cells are collected and lysed, RNA is extracted, and transcriptomic analysis is performed.
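At its core, a transcriptomic biomarker assigns a new expression profile to the class whose reference profile it most resembles. The published TGx-DDI classifier uses more elaborate methods over its 64 genes (e.g., nearest shrunken centroid probability analysis); the sketch below is a plain nearest-centroid classifier using Pearson correlation, and the 4-gene profiles are toy values, not real TGx-DDI genes or data:

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("profiles must not be constant")
    return sxy / sqrt(sxx * syy)


def centroid(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Gene-wise mean of the training profiles for one class."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def classify_profile(profile: Sequence[float], ddi_centroid: Sequence[float],
                     non_ddi_centroid: Sequence[float]) -> Tuple[str, float, float]:
    """Assign the class whose centroid correlates best with the profile."""
    r_ddi = pearson(profile, ddi_centroid)
    r_non = pearson(profile, non_ddi_centroid)
    return ("DDI" if r_ddi > r_non else "non-DDI"), r_ddi, r_non


# Toy log2 fold-change profiles (hypothetical genes and values).
ddi = centroid([[2.0, 1.5, -1.0, 0.2], [1.8, 1.2, -0.8, 0.1]])
non_ddi = centroid([[-0.5, 0.1, 0.9, 1.0], [-0.3, 0.0, 1.1, 0.8]])
label, r1, r2 = classify_profile([1.9, 1.4, -0.9, 0.0], ddi, non_ddi)
print(label, round(r1, 2), round(r2, 2))
```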
Buick et al. [49] combined TGx-DDI with the MicroFlow® and MultiFlow® assays to assess the potential genotoxicity of ten data-poor compounds. Despite the lack of prior data, six of the ten were identified as genotoxins by all three assays, and their MOA was defined as clastogenic. For the remaining four compounds, the results of the three assays did not align. For two of these, MultiFlow® results indicating non-genotoxicity led the authors to conclude that the compounds were likely false positives in the MicroFlow® test. The other two were weakly DNA damage-inducing in the presence of S9 and MN-inducing by MicroFlow®, but were identified as non-genotoxic by MultiFlow®; they were therefore deemed equivocal and recommended for further definitive testing. The authors then ranked each test substance by potency using benchmark concentration (BMC) modeling.
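Potency ranking by BMC means finding, for each substance, the concentration at which the response exceeds the control by a chosen benchmark response (BMR), then sorting substances by that concentration. Real BMC modeling fits parametric concentration-response models (e.g., with US EPA BMDS or PROAST) and reports confidence bounds; the sketch below is only linear interpolation between tested concentrations, and the 50% BMR and all data are hypothetical:

```python
from typing import Dict, List, Optional, Sequence


def bmc_interpolate(concs: Sequence[float], responses: Sequence[float],
                    control_response: float, bmr: float = 0.5) -> Optional[float]:
    """Concentration at which the response first reaches control x (1 + bmr),
    by linear interpolation between tested concentrations; None if never reached."""
    target = control_response * (1.0 + bmr)
    points = sorted(zip(concs, responses))
    prev_c, prev_r = points[0]
    if prev_r >= target:
        return prev_c
    for c, r in points[1:]:
        if r >= target:
            return prev_c + (target - prev_r) / (r - prev_r) * (c - prev_c)
        prev_c, prev_r = c, r
    return None


def rank_by_potency(bmcs: Dict[str, Optional[float]]) -> List[str]:
    """Most potent (lowest BMC) first; substances without a BMC are left out."""
    return sorted((name for name, bmc in bmcs.items() if bmc is not None),
                  key=lambda name: bmcs[name])


bmc_a = bmc_interpolate([0, 1, 2, 4], [1.0, 1.2, 1.6, 2.4], control_response=1.0)
print(rank_by_potency({"compound A": bmc_a, "compound B": 0.5, "compound C": None}))
```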
2.1.8.2.1. Strengths and Weaknesses
TGx-DDI is an effective screening and confirmatory assay, as part of a battery of tests, for identifying potential genotoxins, DNA damage, other cellular damage, and MOAs. It is particularly useful for data-poor substances.
Prototypical substances have been used to confirm the assay's performance [49-51]. Multiplexing TGx-DDI with the MicroFlow® and MultiFlow® assays (above) is particularly useful because the classifiers from the two approaches can then be compared and the results corroborated. The information derived from this multiplex of assays is considerably more useful than a simple positive or negative genotoxicity call.
The authors noted that the resulting BMCs could be converted to administered equivalent doses (AEDs, as referred to in qIVIVE in the Discussion) using high-throughput toxicokinetic (HTTK) models. Because qIVIVE can be used to derive a human margin of exposure (MOE), known as the bioactivity exposure ratio (BER), the approach may be used in practice for risk assessment if toxicokinetic parameters, such as plasma protein binding and metabolic clearance, are known for the compound(s). This makes it a particularly valuable technique for human risk assessment.
This assay is amenable to high-throughput analysis and can be completed in as little as one to two days with experienced hands and automated facilities.
Disadvantages include that it is an indirect measure of damage and has not yet been fully validated (although it has been cross-tested in experiments).
2.1.9. MutaMouse™ Assays
2.1.9.1. Principle of the Assay
The in vitro FE1 version of the MutaMouse™ Transgenic Rodent Gene Mutation Assay [52-54] is a transgene mutation assay that uses the FE1 epithelial cell line derived from MutaMouse™ lung. The cells carry a shuttle vector with a lacZ mutation target that is amenable to positive selection of mutants using a galE⁻ lacZ⁻ E. coli host and the P-gal (phenyl-β-D-galactopyranoside) selection system.
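With positive selection, only lacZ mutant phage form plaques on P-gal plates, so mutant frequency is the ratio of mutant plaques to the total plaque-forming units (pfu) screened. A minimal sketch, assuming the total pfu is estimated from non-selective titer plates seeded with a known dilution of the same packaged phage (plate layouts and dilutions vary by laboratory; all counts are hypothetical):

```python
def mutant_frequency(mutant_plaques, titer_plaques, titer_dilution):
    """lacZ mutant frequency from a P-gal positive-selection experiment.

    mutant_plaques: plaques on selective (P-gal) plates, where only lacZ mutants grow
    titer_plaques: plaques on non-selective titer plates
    titer_dilution: factor by which the titer aliquot was diluted relative to the
        phage plated under selection (assumed single dilution step)
    """
    total_pfu = titer_plaques * titer_dilution
    if total_pfu <= 0:
        raise ValueError("titer plates yielded no plaque-forming units")
    return mutant_plaques / total_pfu

# Hypothetical counts for a solvent control and one treated culture
control_mf = mutant_frequency(mutant_plaques=12, titer_plaques=300, titer_dilution=1000)
treated_mf = mutant_frequency(mutant_plaques=90, titer_plaques=250, titer_dilution=1000)
print(f"control MF = {control_mf:.1e}, treated MF = {treated_mf:.1e}")
```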
2.1.9.1.1. Strengths and Weaknesses
Maertens et al. [51] demonstrated that none of nine compounds that had previously produced false-positive in vitro results was positive in the FE1 in vitro MutaMouse™ transgenic assay. Furthermore, when compared with the results of Fowler et al. [54] on micronucleus induction in three p53-deficient rodent cell lines (V79, CHO, and CHL) and three p53-competent human cell types (primary human lymphocytes (HuLy), human lymphoblastoid TK6 cells, and human hepatocellular carcinoma HepG2 cells), the FE1 MutaMouse™ cells outperformed the V79, CHO, and CHL cells in correctly identifying the false-positive chemicals and performed as well as the p53-competent human cells.
Other advantages of FE1 cells include cytogenetic stability, normal p53 function, endogenous metabolic capability (constitutive CYP1A1 and GST enzymes), and a retrievable transgene for mutation scoring.
Because the in vivo MutaMouse™ transgenic assay is accepted by the OECD (TG 488) [55], the in vitro FE1 MutaMouse™ assay serves as a complementary test and should be considered an appropriate screen for compounds that previously produced false-positive results in conventional assays, or before the in vivo MutaMouse™ assay is conducted. It has reportedly been submitted to the OECD multistep evaluation process for validation [56].
2.1.10. MutaMouse™ Primary Hepatocyte Mutagenicity Assay
2.1.10.1. Principle of the Assay
Cox et al. developed and characterized a second MutaMouse™ transgenic in vitro assay based on primary hepatocytes [57, 58]. It was intended to overcome limitations of existing in vitro genotoxicity assays, including the need for metabolically competent cells (and the attendant problems of using rodent liver S9) and karyotypic instability (deletions, duplications, translocations, impaired p53 function, genomic drift, and changing cell growth characteristics).
Thorough characterization showed that the cells had a normal phenotype, were metabolically competent, and carried the lacZ shuttle vector on chromosome 3, demonstrating that they could be used to measure mutational events after in vitro treatment with candidate compounds. Induction of cytochrome P450 by β-naphthoflavone, a canonical inducer of Cyp1a1 and Cyp1a2, was also observed.
The same authors later tested 13 compounds, comprising direct-acting mutagens, mutagens requiring metabolic activation, and non-mutagens. They detected concentration-dependent increases in mutant frequency of up to 14.4-fold over control for all but one of the mutagens, and for none of the four non-mutagens (two of which had previously elicited false-positive results). They concluded that the MutaMouse™ primary hepatocyte (PH) assay can be used as an in vitro gene mutation assay for both direct-acting mutagens and chemicals that require metabolic activation.
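A concentration-dependent increase such as the one reported above is usually summarized as fold change over the concurrent control. The sketch below, using hypothetical mutant frequencies, computes fold changes and an ordinary least-squares slope of mutant frequency against concentration as a crude check for a positive trend. It is an illustration only, not the statistical analysis used by Cox et al., which this section does not describe.

```python
def fold_changes(mf_by_conc, control_mf):
    """Fold change in mutant frequency relative to the concurrent control."""
    return {conc: mf / control_mf for conc, mf in mf_by_conc.items()}

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical data: concentration (µM) -> mutant frequency
control_mf = 4.0e-5
mf_by_conc = {10.0: 6.0e-5, 30.0: 1.2e-4, 100.0: 3.2e-4}

folds = fold_changes(mf_by_conc, control_mf)
slope = ols_slope(list(mf_by_conc), list(mf_by_conc.values()))
print({conc: round(f, 2) for conc, f in folds.items()})
print(f"max fold = {max(folds.values()):.1f}; positive trend: {slope > 0}")
```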
2.1.11. Side-by-Side Comparison of Conventional vs. New Approach Methods
Table 1 [59-95] and Fig. 1 compare conventional short-term and alternative (new approach/NAM) genetic toxicity testing methods, listed by test name, in terms of applicability, endpoints, assay length, advantages, and disadvantages, together with references and OECD TG numbers.
3. DISCUSSION OF REGULATORY STATUS AND PROGRESS ON ALTERNATIVE IN VITRO GENOTOXICITY TESTING METHODS
A paradigm shift toward non-animal testing methods is under way. The 2025 Federal budget included $5 million for a new FDA program aimed at reducing animal testing by helping to develop new product testing methods [96]. Important developments include bans on the sale of animal-tested cosmetics in Mexico and in eight U.S. states (Hawaii, Maine, Maryland, New Jersey, Virginia, California, Nevada, and Illinois), the passage of the U.S. Humane Cosmetics Act, a recent European Parliament action plan seeking to phase out all animal experiments in the EU, passage of the Korean PAAM Act, and work by PETA and HSUS to further reduce or eliminate the use of animals in experimental testing [97]. The EU has prohibited the testing of cosmetic products and ingredients on animals (2004), the marketing of finished cosmetic products and ingredients tested on animals (2009), and the requirement for animal testing in cosmetics (2013) [98]. NIEHS, in collaboration with the OECD, developed a guideline for non-animal testing to identify skin sensitizers [99]. The EPA declared a commitment to eliminate animal testing [100], followed by the Government of Canada [101, 102], and the FDA has recently followed suit.
| Test type | Test name | Applicability | Endpoint(s) | Assay Length (hrs or days) | Strengths | Disadvantages | OECD TG or regulatory status | Reference(s) |
|---|---|---|---|---|---|---|---|---|
| Conventional short-term | Ames Assay | Preliminary screening tool to evaluate the carcinogenic potential of chemicals that are direct acting or require metabolic activation; best used to rank substances with similar MOAs by relative potency | DNA frameshift or point mutations | 48 hr incubation; 2 or 5 days (fluctuation method) | Ease of performance; low cost; short duration; library of tested-compound results available for comparison; prevents unnecessary further tests; allows detection of potentially carcinogenic compounds, preventing wasted effort | Conflicting results (false −/false +); not directly concordant with human carcinogenesis or mutagenesis; exogenous S9 required (from in vivo rodent); dependent on cell culture conditions; some compounds untestable; unsuitable for non-genotoxic substances; proper concentration range must be established; complicated test conditions required to get it right | OECD 471; required under the Pesticide Act (US) and TSCA (US) | Ames 1973 [59]; OECD [60] |
| | MN | Staple guideline test; best used as part of a battery of tests to prevent misinterpretation of results | Chromosomal loss, breakage & spindle malformation | 72 hr incubation | Sensitive; can test human lymphocytes in vitro; easily scorable | 30-40% of compounds that are (−) both in vivo and in ToxTracker are (+) in the in vitro MN assay; question of whether the toxicant reaches the target tissue (false −); question of excessive doses (false +); may be detecting oxidative stress, not DNA damage | OECD 474, 487; FDA CFSAN Redbook 2000: IV.C.1.d (July 2000) | Evans 1979 [61]; Fenech 1985, 1986, 2000 [62-64]; Schlegel 1986 [65]; Heddle 1983 [66]; Countryman 1976 [67]; Ramalho 1988 [68]; Thomas 2003 [69] |
| | In Vitro Mammalian Chromosomal Aberration Test | Staple guideline test | Chromosome or chromatid damage | Exposure for 3-6 hr, followed by incubation for 1.5-2 cell cycles; if lymphocytes are used, add 48 hr for mitogenic stimulation | Simple procedure and quantitation | Cannot detect aneugens; polyploidy alone does not distinguish aneugens and may indicate only cell cycle perturbation or cytotoxicity; requires metabolic activation; requires metaphase arrest | OECD 473 | OECD 2016 [70] |
| | TK6/MLA | Staple guideline test used since the 1980s; best used as part of a battery of tests; follow-up test after a positive Ames Assay result | Broad spectrum of genotoxic effects (point mutations, frameshift mutations, small deletions, large chromosomal deletions or rearrangements, mitotic recombination (LOH)) | 3-6 hr; 24 hr without S9 if 3 hr is negative; + 48 hr culture time (MLA); 72 hr (TK6) | Heterozygosity at the TK locus makes it possible to detect point mutations, large deletions, and recombination; consistent results; very well standardized; comprehensive when combined with other assays (can detect mutagens that test negative in the Ames Assay) | Low sensitivity in some applications for detecting direct-acting agents; low specificity (MLA) | OECD 490 (July 2016); ICH4 | Honma 1999 [71]; OECD 2016 [72] |
| | HPRT | Preliminary screening assay; confirmatory assay for Ames or large-colony MLA | Limited or small genetic damage; detects any mutations | 7-8 days + incubation on selection medium | Efficient processing; catches mutations missed by Ames or TK6/MLA | Relatively long protocol; low spontaneous mutation frequency at the HGPRT locus makes it difficult to derive enough cells for quantitation | OECD 476 | Johnson 2012 [73] |
| | Comet | Used as part of a test battery or as a confirmatory assay | DNA single-strand breaks; type and amount of damage; rate of strand-break repair; alkali-labile sites | 1-3 days | Simple to perform; rapid; inexpensive; adaptable; reproducible; reliable; economical; sensitive | Caution advised in interpreting results (stain intensity depends on cell cycle phase); careful QC required; cells come from live organisms; indirect measure of DNA damage; low sensitivity for oxidative damage, crosslinks, and bulky adducts | OECD 489 | Cook 1976 [74]; Collins 2004 [75]; Karbaschi 2019 [76] |
| | ROSGlo | Used as part of a test battery or as a confirmatory assay | Oxidation of DNA, RNA, proteins, lipids | Variable incubation period with test substance; measurements 2 hr after reagent addition | Does not use HRP (which produces false-positive results); amenable to HTS; little sample preparation required; multiplexing possible; simple procedure; does not require sample manipulation; fast; sensitive | Indirect measure (epigenetic damage); short-term assay for a chronic process; not a stand-alone test | OECD 442E; OECD 425; OECD 442D | Holmstrom 2014 [77]; Promega.com [78]; Biospace.com [79] |
| | γH2AX | Clinical use to assess DNA damage in biopsies; used as part of a test battery or as a confirmatory assay | DNA double-strand breaks | ~8 hr; reaction peaks from 30 min to 12 hr (depending on substance and dose level) | Rapid; specific (91%); sensitive (98%); detects 95% of carcinogenic compounds tested; HTS possible but with reduced interpretability | Lack of standardization/harmonization; overlapping foci cannot be quantitated; signal saturation | EURL-ECVAM | Reddig 2018 [80]; Kopp 2019 [81]; Khoury 2013, 2020 [82-83]; Kirkland 2008 [84] |
| | Pig-a | Used as part of a test battery or as a confirmatory assay; monitoring humans for somatic mutation | Deletions or mutations in Pig-a | 28 days of treatment; detection within minutes | Flexible (in vitro or in vivo); low blood volume required; rapid quantification; mutation rate per cell division also determined; accurately predicts mutagens and non-mutagens; roles of DNA repair enzymes in BER and other cell functions can be investigated; HTS method | Maximum mutant frequency may occur weeks or longer after the last exposure; verification of mutants by DNA sequencing is required to confirm identity and quantitate mutant frequency; timing of measurements is key; differential organ sensitivity; a negative result should not be interpreted as absence of in vivo genotoxicity; uncertain whether the compound reaches bone marrow | OECD 470 | Araten 1999, 2005, 2010, 2013 [85-88]; Chen 2001 [89]; Olsen 2017 [90]; Dertinger 2015 [91]; Nicklas 2015 [92]; Kruger 2015, 2016 [93-94] |
| Alternative short-term | In vitro yeast DEL recombination | Proposed alternative to inadequate and misleading cell transformation assays, improving on the ICH battery, which had limited predictive power for genotoxic carcinogens | Direct- and indirect-acting carcinogens; aneugens; wide variety of DNA lesions; spontaneous breaks during replication; double-strand breaks induced by the S. cerevisiae homothallic (HO) endonuclease | 11 days | Sensitive; specific; simple; fast; MOA determined by TGx add-on; widely applicable to many substances; mimics human phase I and II metabolism; ease of manipulation of S. cerevisiae (facile homologous recombination) | Not a human or mammalian system | Alternative to ICH S2, which includes both in vitro and in vivo testing | Brennan 2004 [6]; Ku 2007 [7]; Lucas 2019 [8] |
| | 3D Cell Cultures | Proposed to detect changes leading to cancer that are not normally detectable with traditional short-term tests, and to determine the MOAs of active substances | Direct or epigenetic changes associated with phototoxicity, wound healing, fibrosis, and inflammation, and changes that lead to carcinogenesis | Time window for experimentation limited but improving | Excellent for exploring MOAs; direct visualization of cellular changes; manipulable; closely resemble in vivo tissue; reproducible; controlled; can explore different genetic backgrounds and overlaid disease conditions; can be combined with GWAS for improved discriminatory power | Time consuming; technologically demanding; labor intensive; not HTS | ECVAM validated under OECD TGs; OECD 428 | MatTek 2024 [10]; Lee 2023 [95]; Maione 2018 [12]; Nordberg 2020 [13] |
| | 3D RS Comet | Intended to confirm or refute a positive conventional assay result (in vivo testing not permitted for cosmetics in the EU); development of dermal genotoxicity assays | DNA single-strand breaks; type and amount of damage; rate of strand-break repair; alkali-labile sites | 48 hr treatment + standard comet assay protocol of 1-3 days | Sensitive (80%); specific (97%); accurate (92%); reproducible (93% intra-laboratory, 88% inter-laboratory) | Has the disadvantages of 3D cultures and of the comet assay noted above | Accepted into the OECD TG development program | Pfuhler 2021 [14] |
| | RS Skin MN | Intended for dermally applied products; not a stand-alone assay (follow-up to a conventional genotoxicity assay) | Chromosomal loss, breakage, apoptosis, necrosis | 48 hr, extendable to 72 hr, + standard MN assay protocol of 72 hr | Specific (87%); sensitive (65%, increased to 80% by adding a 72 hr treatment); sensitivity of 92% in combination with the 3D skin comet assay; rapid; topical application; validated; easy re-testing for an additional 72 hr if (−) at 48 hr; lower resource requirements; human based, no animals required | Has the disadvantages of 3D cultures and of the MN assay noted above | Accepted in the EU as a back-up or confirmatory assay | Pfuhler 2010 [15]; Hu 2009 [16]; Aardema 2010 [17]; Dahl 2011 [18] |
| | Bhas 42 CTA | Screening tool for the cell transformation potential of tumor-promoting compounds (both genotoxic and non-genotoxic); confirmatory for compounds that are (+) for initiation; confirmatory for compounds negative or equivocal in the Ames Assay | Detects initiating (genotoxic) or promoting (non-genotoxic) chemical carcinogens | 21 days | Sensitive; transforming potential can be determined directly without treatment with a tumor-initiating compound (cell line already carries the v-Ha-ras gene); reduced time to correctly classify transformed vs. non-transformed foci | Long assay; gives limited information; has limitations associated with 2D cell culture | OECD-certified test; method provided in OECD's "Guidance Document on the In Vitro Bhas 42 Cell Transformation Assay", Series on Testing and Assessment No. 231 | Ohmori 2004 [20], 2022 [30]; Asada 2005 [21]; Tanaka 2009 [22]; Sakai 2012 [23]; Sasaki 2014, 2015 [24-25]; Guichard 2023 [31]; Masumoto 2021 [32] |
| | ToxTracker® | Confirmatory for mode of action (direct vs. indirect); provides information on MOA and pathways | Formation of bulky DNA adducts and inhibition of replication; DNA double-strand breaks; oxidative stress, protein damage, cell cycle arrest | 1-2 days | HTS; internationally validated; provides MOA and pathway information; intralaboratory reproducibility 73-98%; interlaboratory reproducibility 83%; sensitivity 84.4%; specificity 91.2% | Dose-range finding necessary; for some compounds it may be difficult to find the optimal test window between cytotoxicity and the maximum soluble concentration (or 1 mg/mL); requires metabolic activation with S9; requires mouse embryo donors | OECD peer review conducted in Q3 2023; validation conducted under OECD Guidance Document 34 | Hendriks 2011, 2012, 2013, 2016, 2024 [33-37]; Czekala 2021 [39] |
| | MultiFlow® DNA Damage | Screen compounds and classify by MOA (clastogen, aneugen, non-genotoxic); support de-risking of an adverse finding in a conventional assay; prescreen or mechanistic follow-up for cosmetics in the EU; testing of marketed chemicals under REACH; obtaining MOA information for non-genotoxic carcinogens | DNA double-strand breaks; response to DNA damage; polyploidization; cell proliferation; protein misfolding; cell stress; cell cycle dysregulation | 4 weeks | Multiplex HTS assay; data analysis strategy generates probability scores used to classify substances; multiple models and a consensus voting approach strengthen results; sensitivity, accuracy, and specificity of 90-95%; provides mechanistic insights | Detects only direct-acting genotoxic agents; many compounds were not testable because they did not reach cytotoxicity or 1 mM, or formed a precipitate | Cross-validated in a 7-laboratory multicenter study | Bryce 2014, 2017, 2018 [44-46] |
| | MutaMouse FE1 | Screen compounds that produced false-positive results in conventional assays; screen prior to the in vivo MutaMouse assay | Detects mutations in any tissue with the lacZ gene as the mutational target; scores gene mutations and chromosome damage | 4-5 days | Cytogenetic stability; normal p53 function; endogenous metabolic capability; well-established protocols; retrievable transgene for mutational scoring; convenience of in vitro manipulation and sequencing; reliable; reproducible; clonal selection not required | Requires S9 metabolic activation to detect some compounds; slow, laborious scoring; high spontaneous background frequency compared with endogenous genes; scoring may require specialized reagents; transgenes are not endogenous (no transcription-coupled repair of scored loci); except for spi⁻ selection and the lacZ plasmid model, cannot detect mutations from large deletions and chromosomal aberrations; multiple systems required for comprehensive coverage of mutational MOA; selective plating and manual scoring required | OECD 488 (OECD 2011, 2013); validation in process | Maertens 2017 [51]; White 2003 [52]; Cox 2019a,b [57-58] |
| | MutaMouse PH | Screen compounds that produced false-positive results in conventional assays; screen prior to the in vivo MutaMouse assay | Detects mutations in any tissue with the lacZ gene as the mutational target; scores gene mutations and chromosome damage | 4-5 days | Same as FE1, except that primary hepatocytes are used and metabolic activation is not required | Same as FE1, except that primary hepatocytes are used and metabolic activation is not required | OECD 488 | Chen 2010 [56]; Cox 2019a, 2019b [57-58] |


For U.S. regulatory acceptance of substances added directly to food, a Food or Color Additive Petition must be submitted; for indirect (food contact) substances, a Food Contact Substance submission is required [103-106]. GRAS (Generally Recognized As Safe) status may also be sought voluntarily. FDA CFSAN (now the FDA Human Foods Program, HFP) provides guidelines for animal testing [107] (updated 2018), which are recommended but not required for regulatory acceptance. Non-animal testing methods may therefore be used to establish GRAS status or to obtain premarket approval for food ingredients.

Table 1. Features of Conventional and Alternative Genetic Toxicity Tests.
Under 21 CFR, substances intended for addition to animal feed must either undergo testing or have the drug concentration reduced to a level that causes no harm to the animal or to the population of consumers (<1 in 1 million cancer risk). The FDA Center for Veterinary Medicine (CVM) has issued guidance for veterinary drugs administered to animals [108].
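Under the linear low-dose extrapolation (LNT) described in Part I, lifetime excess cancer risk is approximated as the product of a cancer slope factor and the lifetime average daily dose, so the dose corresponding to a 1-in-1-million risk follows by simple division. A minimal sketch; the slope factor is a hypothetical value, and the sketch ignores the residue-depletion and food-consumption modeling used in actual veterinary drug assessments.

```python
def dose_at_target_risk(slope_factor, target_risk=1e-6):
    """Dose (mg/kg bw/day) giving `target_risk` under a linear no-threshold model.

    Assumes risk ≈ slope_factor × dose, with slope_factor in (mg/kg bw/day)^-1.
    """
    if slope_factor <= 0:
        raise ValueError("slope factor must be positive")
    return target_risk / slope_factor

def excess_risk(slope_factor, dose):
    """Linear low-dose approximation of lifetime excess cancer risk."""
    return slope_factor * dose

q1 = 0.5  # hypothetical slope factor, (mg/kg bw/day)^-1
vsd = dose_at_target_risk(q1)  # "virtually safe dose" at the 1e-6 risk level
print(f"Dose at 1-in-1-million risk: {vsd:.1e} mg/kg bw/day")
```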
The FDA Center for Devices and Radiological Health (CDRH) recently initiated a program, known as Medical Device Development Tools (MDDT), to qualify medical device testing methods for future use [109]. A NAM qualified under this process is deemed fit for purpose within its context of use in future submissions. If a NAM has not been pre-qualified, biocompatibility testing is performed to identify genotoxic chemicals in medical devices; this may combine two or more of the traditional in vitro methods OECD TG 471 (Ames test), 473 (in vitro chromosomal aberration assay), 476 or 490 (in vitro mammalian cell gene mutation assays; the mouse lymphoma assay, formerly covered by TG 476, is now TG 490), and 487 (in vitro micronucleus assay).
ISTAND (Innovative Science and Technology Approaches for New Drugs) is a pilot program of the FDA Center for Drug Evaluation and Research (CDER) intended to qualify innovative drug development tools, including NAMs. To date, no methods have been qualified, although several are under consideration, including organ-on-a-chip technology, AI-based digital health technology, and an off-target protein binding assessment tool. CDER accepts the six-month transgenic mouse assay as the second species in its requirement for two rodent carcinogenicity bioassays, thereby reducing both the time on test for mice and the number of animals used. CDER guidance on carcinogenicity testing of pharmaceuticals states that, in certain circumstances and using a weight-of-evidence (WoE) approach, a 2-year rat carcinogenicity study may not be necessary [110]. The FDA has clarified that it does not require animal tests for new drug applications, while acknowledging that there is currently no acceptable alternative for chronic toxicology testing (FDA Modernization Act 2.0, Dec. 29, 2022). It has also clarified that data from cell-based assays, bioprinted models, organs-on-a-chip, and computer models can be included in new drug applications. More recently, the FDA Commissioner announced that ELSA, the FDA's AI tool, will be used to shorten the application process, and that other changes, such as the use of test results and determinations from other international agencies and updates to GRAS, are forthcoming.
Nevertheless, genotoxicity testing remains an essential component of U.S. preclinical pharmaceutical safety evaluation. Investigational New Drug applications require an in vitro mutagenicity assay (OECD TG 471), an in vivo study of mitotic/chromosomal damage (micronucleus assay, OECD TG 474), and the comet assay for DNA fragmentation (OECD TG 489). However, advanced in vitro models, including organs-on-chips, have the potential to replace all three of these tests; examples include the 3D Skin Comet Assay, a liver-on-chip combined with human lymphoblastoid (TK6) cells [111], and the 3D EpiDerm® skin model combined with the micronucleus assay, as in RS MN [112]. Any genetic toxicity testing strategy should cover the possible mechanisms of genotoxicity, namely gene mutations and clastogenic and aneugenic chromosomal aberrations [113].
Recent collaborations to incorporate non-animal testing (Tox21, EU-ToxRisk, the Partnership for the Assessment of Risks from Chemicals (PARC), ONTOX, CAAT, RiskHunt3R, the 3Rs Collaborative, NC3Rs, and MPS) are gaining momentum. They include international efforts to validate and harmonize in vitro alternative test methods through the International Cooperation on Alternative Test Methods (ICATM), the European Centre for the Validation of Alternative Methods (ECVAM), the Japanese Centre for the Validation of Alternative Methods (JaCVAM), and the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM). The establishment of Health Canada's New Substances Assessment and Control Bureau is also expected to accelerate acceptance of NAMs for genetic toxicity testing. The American Society for Cellular and Computational Toxicology (ASCCT) works closely with the European Society of Toxicology In Vitro (ESTIV), and the Society for the Advancement of Adverse Outcome Pathways (SAAOP) has recently affiliated with ASCCT/ESTIV.
Often, the aim is to demonstrate that non-animal methods produce results at least as good as those of animal tests, which presumes that animal tests yield good results. It is now acknowledged, however, that they do not: animal tests often present a confusing patchwork of study conditions and results, with poor sensitivity and specificity for predicting human outcomes. The objective of non-animal testing strategies has therefore evolved into demonstrating that they can correctly categorize a result as 'toxic' or 'non-toxic', while acknowledging that they cannot yet accurately address the middle ground of 'some toxicity'. For genotoxicity, the aim is to correctly discriminate genotoxic carcinogens from non-genotoxic ones and, where possible, to delineate the MOA or even the molecular initiating event (MIE).
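Whether a method "correctly categorizes" substances is usually quantified with sensitivity, specificity, and accuracy against reference classifications, the same metrics reported for several assays in Table 1. A minimal sketch with hypothetical calls for ten substances (True = genotoxic); the function name is illustrative.

```python
def classification_performance(predicted, reference):
    """Sensitivity, specificity, and accuracy of binary calls vs. a reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    tn = sum(not p and not r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    fn = sum(not p and r for p, r in pairs)
    return {
        "sensitivity": tp / (tp + fn),  # genotoxins correctly called positive
        "specificity": tn / (tn + fp),  # non-genotoxins correctly called negative
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical reference classifications and assay calls
reference = [True] * 6 + [False] * 4
predicted = [True, True, True, True, True, False, False, False, False, True]
print(classification_performance(predicted, reference))
```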
An important final step for a purely in vitro genotoxicity test is quantitative in vitro to in vivo extrapolation (qIVIVE), which accounts for absorption, distribution, metabolism, and elimination (ADME) to derive a human-relevant margin of exposure (MOE). The administered equivalent dose (AED, in mg/kg body weight/day) is determined as the estimated dose required to reach a steady-state plasma concentration equal to the concentration inducing genotoxicity in the in vitro assay [114]. In a subsequent case study of 31 reference chemicals, 20 of the 31 qIVIVE-derived points of departure (PODs) were considered health protective (i.e., no higher than) when compared with the in vivo-derived PODs.
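The AED calculation depends on predicting the steady-state plasma concentration (Css) from toxicokinetic parameters. The sketch below assumes a simplified steady-state model of the kind used in high-throughput toxicokinetics: total clearance is renal filtration of the unbound chemical plus well-stirred hepatic clearance, with complete oral absorption by default. All parameter values are hypothetical and the function names are illustrative; full implementations (e.g., the R package httk) include further terms such as blood-to-plasma ratios. The last function expresses the "health protective" comparison of a qIVIVE-derived POD with an in vivo POD.

```python
def css_mg_per_l(dose_rate, fub, clint, q_liver, gfr, bioavailability=1.0):
    """Steady-state plasma concentration (mg/L) for a constant dose rate.

    Assumed units: dose_rate in mg/kg bw/day; clint, q_liver, gfr in L/kg bw/day.
    fub is the unbound fraction in plasma (0-1).
    """
    cl_renal = gfr * fub
    cl_hepatic = q_liver * fub * clint / (q_liver + fub * clint)  # well-stirred liver
    return bioavailability * dose_rate / (cl_renal + cl_hepatic)

def mg_per_l_to_um(conc_mg_per_l, mol_weight):
    """mg/L divided by g/mol gives mmol/L; multiply by 1000 for µmol/L."""
    return conc_mg_per_l / mol_weight * 1000.0

def aed_mg_kg_day(active_conc_um, mol_weight, **tk_params):
    """Dose whose predicted Css equals the in vitro active concentration (linear kinetics)."""
    css_per_unit_dose = mg_per_l_to_um(css_mg_per_l(1.0, **tk_params), mol_weight)
    return active_conc_um / css_per_unit_dose

def is_health_protective(qivive_pod, in_vivo_pod):
    """A NAM-derived POD is health protective if it does not exceed the in vivo POD."""
    return qivive_pod <= in_vivo_pod

# Hypothetical chemical with no hepatic metabolism (clearance is renal only)
params = dict(fub=0.5, clint=0.0, q_liver=2.0, gfr=1.0)
aed = aed_mg_kg_day(active_conc_um=5.0, mol_weight=200.0, **params)
print(f"AED = {aed:.2f} mg/kg bw/day; protective vs. in vivo POD of 1.0: "
      f"{is_health_protective(aed, 1.0)}")
```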
Other researchers [115] have studied the derivation of a genotoxicity threshold for known genotoxic substances, using dose-response modeling to determine a margin of exposure (MOE) or health-based guidance values (HBGVs). A POD is determined using the benchmark dose (BMD) approach, which can be combined with physiologically based pharmacokinetic (PBPK) modeling, and uncertainty factors (UFs) are then applied to estimate tolerable daily, weekly, or other intakes. This approach departs from the previously mentioned LNT concept, acknowledging instead that substances often show dose-response behavior indicative of a threshold below which risk is reduced to a level that will not cause cancer from lifetime exposure. Further refinement of this approach using Bayesian methods [116] is expected to improve on the traditional approach of estimating UFs. Bayesian approaches incorporate prior knowledge (informed priors) and yield probability distributions, that is, ranges describing the probability of an outcome, rather than simplistic point estimates. Their utility is further enhanced by software that can compare multiple models and choose among them, or combine the results of more than one (model averaging), for more precise and accurate estimates.
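Two of the quantitative steps mentioned above can be illustrated briefly. The sketch below is a simplification under stated assumptions, not the method of [115] or [116]: it (1) combines BMDL estimates from several candidate dose-response models using Akaike weights and (2) divides the resulting POD by default uncertainty factors (assumed here to be 10 for interspecies and 10 for intraspecies differences) to obtain a tolerable daily intake. The AIC values and BMDLs are hypothetical. Weighting BMDLs directly is a crude shortcut; established model-averaging methods average the fitted dose-response curves and then derive the BMD and its confidence bounds.

```python
import math

def akaike_weights(aics):
    """w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i = AIC_i - min(AIC)."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def weighted_pod(estimates, aics):
    """AIC-weighted mean of per-model POD estimates (crude shortcut)."""
    return sum(w * e for w, e in zip(akaike_weights(aics), estimates))

def tolerable_daily_intake(pod, uf_interspecies=10.0, uf_intraspecies=10.0):
    """TDI (same units as the POD) = POD / product of uncertainty factors."""
    return pod / (uf_interspecies * uf_intraspecies)

# Hypothetical BMDLs (mg/kg bw/day) and AICs from three candidate models
bmdls = [2.0, 3.5, 5.0]
aics = [101.2, 100.0, 104.8]
pod = weighted_pod(bmdls, aics)
print(f"weights = {[round(w, 3) for w in akaike_weights(aics)]}")
print(f"POD = {pod:.2f} mg/kg bw/day, TDI = {tolerable_daily_intake(pod):.4f} mg/kg bw/day")
```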
ONTOX has published a protocol for an AI-supported case study that will apply a standardized approach to risk assessment using the ONTOX toolbox [117]. This supports a new project ('Ontology-driven and artificial intelligence-based repeated dose toxicity testing of chemicals for next generation risk assessment') under the EU Horizon 2020 program. The objective is to create a generic protocol, applicable to any chemical, for determining systemic repeated-dose toxicity entirely without new animal experiments.
This proof-of-concept protocol focuses on six NAMs addressing specific adverse outcomes (liver steatosis and cholestasis, kidney tubular necrosis and crystallopathy, and fetal neural tube closure and cognitive function defects), using as an example a well-known chemical (PFOA) for which substantial data already exist. Each NAM will have an AI-based computational system informed by biological, mechanistic, toxicological, epidemiological, physicochemical, and kinetic data. Other elements of the system will include physiological maps, quantitative adverse outcome pathways (qAOPs), and ontology frameworks/evidence maps. Where information is lacking, in vitro and in silico testing will be undertaken. Finally, the project will collaborate with industry and regulatory stakeholders to qualify the approach for regulatory and commercial use.
Probabilistic risk assessment (PRA) will be carried out using the APROBA tool [116, 118], incorporating benchmark dose calculations for the lower and upper confidence bounds of the BMD (BMDL and BMDU) and a PRA workflow [119] with PBK models, thereby yielding a POD for risk assessment. Physiological maps will inform and enhance translation from in vitro to human endpoints. Fig. 2 depicts an example workflow that could be employed.
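APROBA-style assessments express uncertainty in the POD as a distribution rather than a single value. A minimal sketch, assuming (as a convention, labeled here as an assumption) that the BMDL and BMDU bound a two-sided 90% interval, i.e., are the 5th and 95th percentiles of a lognormal distribution. The function names and numbers are hypothetical, and the code does not reproduce the APROBA tool itself.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.95)  # standard normal 95th percentile, about 1.645

def lognormal_from_bounds(bmdl, bmdu, z=Z_95):
    """Parameters (mu, sigma) of ln(BMD), treating BMDL/BMDU as 5th/95th percentiles."""
    if not 0 < bmdl < bmdu:
        raise ValueError("require 0 < BMDL < BMDU")
    mu = (math.log(bmdl) + math.log(bmdu)) / 2
    sigma = (math.log(bmdu) - math.log(bmdl)) / (2 * z)
    return mu, sigma

def lognormal_percentile(mu, sigma, p):
    """p-th quantile (0 < p < 1) of the fitted lognormal distribution."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

# Hypothetical BMD confidence bounds (mg/kg bw/day)
mu, sigma = lognormal_from_bounds(bmdl=1.0, bmdu=100.0)
print(f"median BMD = {math.exp(mu):.1f}, "
      f"1st percentile = {lognormal_percentile(mu, sigma, 0.01):.3f} mg/kg bw/day")
```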

Fig. 2. Planned probabilistic workflow in ONTOX (reprinted with permission of ONTOX).
This workflow differs from the qIVIVE process developed in recent years, which uses reverse dosimetry. Instead, it starts from the external (measured) dose and proceeds through modeling and qIVIVE to risk assessment. Probabilistic PBK modeling of a distribution of real-world external concentrations will yield a distribution of target-site concentrations in the tissues to be studied (liver, kidney, brain). These internal concentrations will be compared with dose-response curves from in vitro studies, and their ranges of agreement (or disagreement) will be noted. Raw dose-response data will first be transformed using in silico models of in vitro kinetics. The forward direction was chosen because, in reverse dosimetry, each derived benchmark concentration requires multiple simulations, whereas in the forward direction the simulations are performed only once.
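The efficiency argument for the forward direction can be illustrated with a toy Monte Carlo model: the population distribution of internal concentrations is simulated once and can then be compared with any number of in vitro BMCs without re-running the simulation. In this sketch, exposure and clearance are assumed to be lognormally distributed, a one-compartment steady-state relation (Css = exposure / clearance) stands in for a full PBK model, and all distribution parameters, the molecular weight, and the BMC values are hypothetical.

```python
import math
import random

MOL_WEIGHT = 300.0  # g/mol, hypothetical

def simulate_internal_concentrations(n, seed=1):
    """One forward Monte Carlo run: sample individuals and return Css values in µM."""
    rng = random.Random(seed)
    concs = []
    for _ in range(n):
        exposure = rng.lognormvariate(math.log(0.01), 1.0)  # mg/kg bw/day (assumed)
        clearance = rng.lognormvariate(math.log(0.5), 0.4)  # L/kg bw/day (assumed)
        css_mg_per_l = exposure / clearance                 # one-compartment steady state
        concs.append(css_mg_per_l / MOL_WEIGHT * 1000.0)    # mg/L -> µM
    return concs

def fraction_at_or_above(concs, bmc_um):
    """Share of simulated individuals whose internal concentration reaches the BMC."""
    return sum(c >= bmc_um for c in concs) / len(concs)

internal = simulate_internal_concentrations(20_000)  # simulated only once
for bmc in (0.05, 0.5, 5.0):                          # hypothetical in vitro BMCs (µM)
    print(f"BMC {bmc} µM: {fraction_at_or_above(internal, bmc):.4f} of population at or above")
```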
The specific work products to be developed include individual external exposure assessments, population-level external exposure analysis, PBK modeling, and qIVIVE for internal exposure. The project will also identify human hazard data and animal studies, and generate in vitro and in silico predictions using QSAR, SAR, similarity-based prediction with a supervised deep-learning neural network model, a property-transformer AI model, and docking simulations. Finally, in vitro and in silico data from experiments to be performed for each endpoint will be incorporated. Animal hazard characterization will serve as the model for applying in vitro data to human hazard characterization. Risk characterization will compare the distributions of exposure and hazard and sample MOEs to establish a probability distribution of human MOEs. The project promises to be groundbreaking in that it will deliver a generic solution for probabilistic risk assessment of any chemical entity without requiring data from in vivo studies, which can serve as a model for general adoption and harmonization among scientists.
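Risk characterization by sampling MOEs can be sketched as follows, assuming independent lognormal distributions for the hazard POD and for exposure. Parameter values and the threshold of 100 are hypothetical. Because the ratio of two independent lognormal variables is itself lognormal, the Monte Carlo estimate can be checked against a closed-form result, which the sketch also computes.

```python
import math
import random
from statistics import NormalDist

def sample_moes(n, pod_mu, pod_sigma, exp_mu, exp_sigma, seed=7):
    """Monte Carlo MOE samples: a POD draw divided by an independent exposure draw."""
    rng = random.Random(seed)
    return [rng.lognormvariate(pod_mu, pod_sigma) / rng.lognormvariate(exp_mu, exp_sigma)
            for _ in range(n)]

def prob_moe_below(moes, threshold):
    """Empirical probability that the MOE falls below `threshold`."""
    return sum(m < threshold for m in moes) / len(moes)

def analytic_prob_moe_below(threshold, pod_mu, pod_sigma, exp_mu, exp_sigma):
    """Closed form: ln(MOE) ~ Normal(pod_mu - exp_mu, sqrt(pod_sigma^2 + exp_sigma^2))."""
    dist = NormalDist(pod_mu - exp_mu, math.hypot(pod_sigma, exp_sigma))
    return dist.cdf(math.log(threshold))

# Hypothetical distributions: POD median 10 and exposure median 0.05 mg/kg bw/day
params = dict(pod_mu=math.log(10.0), pod_sigma=0.5,
              exp_mu=math.log(0.05), exp_sigma=1.0)
moes = sorted(sample_moes(100_000, **params))
print(f"median MOE ~ {moes[len(moes) // 2]:.0f}; 5th percentile ~ {moes[len(moes) // 20]:.0f}")
print(f"P(MOE < 100): simulated {prob_moe_below(moes, 100):.3f}, "
      f"analytic {analytic_prob_moe_below(100, **params):.3f}")
```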
Several researchers have recently published informative case studies applying qIVIVE, for example to liver steatosis from dietary exposure to imazalil [120], to aerosols from non-combustible next-generation products [121], and to coumarin in cosmetic products [122]. These studies illustrate the principles described above in practical detail.
Elimination of the cancer bioassay has also been proposed for pesticide registration. Agrochemical sponsors are seeking waivers from the EPA for the cumbersome, lengthy, and often uninformative rodent carcinogenicity bioassay, relying instead on a weight-of-evidence (WoE) approach that combines acute, subchronic, and developmental and reproductive toxicity (DART) studies with evidence of hormone perturbation, immune suppression, genetic toxicity, and mechanistic studies supporting a proposed MOA [123, 124]. The WoE method integrates known information (key chemical properties, planned uses, and estimated exposures) with data on absorption, distribution, metabolism, and excretion (ADME), toxicokinetics (TK), the toxicity studies listed above, and related chemicals (read-across) to derive PODs for risk assessment.
In this scenario, genetic toxicity data are still part of the regulatory submission package. However, as genetic toxicity testing moves further away from in vivo studies, driven by constraints of time and materials and by the possibilities afforded by new technologies, fewer animal lives will be wasted.
CONCLUSION
In this review, several NAMs for studying the genetic toxicity of chemical substances were discussed: the in vitro (yeast) DNA deletion (DEL) recombination assay, the 3D RS/RSMN assays, the Bhas 42 CTA, ToxTracker, the MultiFlow® and MicroFlow® DNA damage assays, the TGx-DDI transcriptomic biomarker assay, and the MutaMouse™ assays. For each, the principles, methods, strengths, and weaknesses were described, including sensitivity, specificity, and progress towards OECD acceptance. Table 1 presents the standard tests and NAMs with their applicability, assay length, regulatory status, strengths, weaknesses, and references, and Fig. 1 provides a simplified comparison of various aspects of the NAMs.
Part I described how the LNT concept (linear no-threshold, or ‘linearity at low dose’) came into being, shaped subsequent research and the development of genetic toxicity testing, and eventually became the accepted paradigm. While the resulting testing requirements have advanced the science and generated substantial data on the genotoxic and mutagenic potential of substances, they have also hindered progress towards non-animal testing methods, a long-standing goal in toxicology. With advances in knowledge and technology, such as the qIVIVE paradigm and the WoE approach, this goal is now within reach. It remains for scientists to implement these methods.
AUTHORS’ CONTRIBUTIONS
The author confirms sole responsibility for the following: study conception and design, data collection, analysis and interpretation of results, and manuscript preparation.
LIST OF ABBREVIATIONS
| ADME | = Absorption, Distribution, Metabolism, Excretion |
| AED | = Administered Equivalent Dose |
| AI | = Artificial Intelligence |
| ANN | = Artificial Neural Network |
| AO | = Adverse Outcome |
| AOPs | = Adverse Outcome Pathways |
| ASCCT | = American Society for Cellular and Computational Toxicology |
| AUC | = Area Under the Curve |
| BER | = Bioactivity Exposure Ratio |
| Bhas 42 CTA | = Bhas 42 Cell Transformation Assay |
| BMC | = Benchmark Concentration |
| BMD | = Benchmark Dose |
| BMDL | = Benchmark Dose Lower Confidence Limit |
| BMDU | = Benchmark Dose Upper Confidence Limit |
| CAAT | = Center for Alternatives to Animal Testing (Johns Hopkins University) |
| CFR | = Code of Federal Regulations |
| CNN | = Convolutional Neural Network |
| CRISPR-Cas9 | = Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-Associated Protein 9 (Gene Editing) |
| CYP1A1 | = Cytochrome P450 1A1 |
| DART | = Developmental and Reproductive Toxicity |
| DEL | = Deletion |
| ECVAM | = European Centre for the Validation of Alternative Methods (now the EU Reference Laboratory for Alternatives to Animal Testing, EURL ECVAM) |
| ESTIV | = European Society for Toxicology In Vitro |
| GRAS | = Generally Recognized as Safe |
| GST | = Glutathione S-Transferase |
| HBEC | = Human Bronchial Epithelial Cells |
| HBGVs | = Health-Based Guidance Values |
| HPLC | = High Performance Liquid Chromatography |
| IATA | = Integrated Approaches to Testing and Assessment |
| ICATM | = International Cooperation on Alternative Test Methods |
| ICCVAM | = Interagency Coordinating Committee on the Validation of Alternative Methods |
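| ICH | = International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (formerly International Conference on Harmonisation) |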
| ISTAND | = Innovative Science and Technology Approaches for New Drugs |
| JaCVAM | = Japanese Center for the Validation of Alternative Methods |
| LNT | = Linear No-Threshold (‘linearity at low dose’) |
| LR | = Logistic Regression |
| MAD | = Mutual Acceptance of Data |
| MDDT | = Medical Device Development Tools |
| mESC | = Mouse Embryonic Stem Cells |
| MIE | = Molecular Initiating Event |
| ML | = Machine Learning |
| MN | = Micronucleus Test |
| MOAs | = Modes of Action |
| MOE | = Margin of Exposure |
| MPS | = Microphysiological Systems |
| MS | = Mass Spectrometry |
| NAMs | = New Approach Methodologies |
| NC3Rs | = National Centre for the Replacement, Refinement, and Reduction of Animals in Research (UK-based) |
| NHEK | = Normal Human Epidermal Keratinocytes |
| OECD | = Organisation for Economic Co-operation and Development |
| OMICS | = Proteomics, Metabolomics, Genomics, Transcriptomics |
| ONTOX | = EU-funded Horizon 2020 Project on Toxicity Testing |
| PARC | = Partnership for the Assessment of Risks from Chemicals |
| PBPK | = Physiologically Based Pharmacokinetic Modeling |
| PBTK | = Physiologically Based Toxicokinetic Modeling |
| PCR | = Polymerase Chain Reaction |
| PFOA | = Perfluorooctanoic Acid |
| PGAL | = Phenyl-β-D-Galactopyranoside |
| PH | = Primary Hepatocyte |
| PODs | = Points of Departure |
| PRA | = Probabilistic Risk Assessment |
| qAOPs | = Quantitative Adverse Outcome Pathways |
| qIVIVE | = Quantitative In Vitro to In Vivo Extrapolation |
| QSAR | = Quantitative Structure-Activity Relationships |
| REACH | = Registration, Evaluation, Authorization, and Restriction of Chemicals |
| RF | = Random Forest |
| RiskHunt3R | = Risk Assessment of Chemicals Integrating Human-Centric Next Generation Testing Strategies (Horizon 2020 Project) |
| RS | = Reconstructed Skin |
| RSMN | = Reconstructed Skin Micronucleus Assay |
| SAAOP | = Society for the Advancement of Adverse Outcome Pathways |
| SAR | = Structure-Activity Relationships |
| SCE | = Sister Chromatid Exchange |
| TG | = Test Guideline |
| TPA | = 12-O-Tetradecanoyl-Phorbol-13-Acetate |
| UF | = Uncertainty Factors |
| WoE | = Weight of Evidence |
ACKNOWLEDGEMENTS
Declared none.

