Machine Learning & Prediction at Wiley Science Solutions
As a leader in spectral and chemistry informatics with over 50 years of experience, Wiley develops highly accurate prediction, validation, and other models for Infrared (IR), Raman, NMR, Mass Spectrometry (MS), stereochemistry, and other key areas. Wiley also offers classification algorithms that categorize spectra by compound class across multiple instrumental techniques.
Wiley is engaged in developing machine validation and prediction tools for the academic, corporate, and research communities. These tools rely on large amounts of curated source data from both Wiley and partnering author teams.
Licensing/Access to Algorithms
Wiley licenses its technologies to other organizations and customers, including on a limited-use basis. An example of a successful collaboration is the retrosynthetic prediction technology developed by Wiley and licensed to the Chemical Abstracts Service (CAS), a division of the American Chemical Society (ACS), for use in SciFinder-n.
Available Algorithms and Models
Infrared Spectrum Prediction
This prediction model uses one of the largest known FTIR spectra collections in the world as its training set. Using this technology, Wiley generated and validated a high-quality database of predicted spectra. Augmenting empirical spectra with predicted spectra, within the bounds of the predictive model (the chemical space of its underlying training set), increases the density of coverage in that space and so helps in the identification of unknowns, especially novel compounds. Learn more about the Wiley Database of Predicted IR Spectra.
Examples of Available Technologies
Infrared Spectrum Classification
Wiley’s current infrared classification models are highly accurate and help identify an unknown spectrum when it falls into one of the supported classes.
Models are available to predict the following classes, with more to be added to the portfolio as we develop them:
- aminoindanes
- amphetamines
- barbiturates
- benzodiazepines
- cannabinoids
- cocaine-type substances
- fentanyl
- opioids
- piperazines
- steroids
- tryptamines
Some of Wiley’s infrared classification models have been optimized and integrated into KnowItAll’s ID Expert.
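As a purely illustrative sketch of how class-level results might be consumed (this is not Wiley's implementation, and the function name, score values, and threshold below are all hypothetical), a spectrum can be assigned to a class only when its score for that class clears a cutoff, so that spectra outside every supported class are left unassigned:

```python
def assign_classes(scores, threshold=0.9):
    """Return the class names whose score meets the threshold, highest first.

    `scores` maps a class name to a score between 0 and 1. Both the scores
    and the default threshold are made-up values for illustration only.
    """
    hits = [(name, score) for name, score in scores.items() if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hits]


# Hypothetical scores for a spectrum that matches two related classes.
assigned = assign_classes({"fentanyl": 0.97, "opioids": 0.93, "steroids": 0.02})

# Hypothetical scores for a spectrum that fits none of the classes well.
unassigned = assign_classes({"fentanyl": 0.40, "steroids": 0.10})
```

Returning an empty list rather than the best-scoring class keeps a weak match from being reported as an identification.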
Spectrum Validation Models
Wiley has developed spectrum validation models that it uses in building its databases. Contact us for details about these models.
Proposals for Machine Learning/Prediction
Wiley Science Solutions is home to many of the largest datasets of evaluated standard reference spectra in the world. Wiley does not provide datasets externally for use in machine learning or the creation of derivative products. Our end user licenses and terms of service expressly forbid their use for those purposes. Although the legal landscape in this area is evolving, we do not believe such use would qualify as fair use; it would therefore constitute a breach of license and inflict commercial damage on Wiley and its author teams.
Wiley will, however, entertain proposals for the development of machine prediction, clinical diagnostic markers, and other applications using our data, with the understanding that the development and commercialization will be done under the direction of Wiley in association with the requesting organization.
Quality Control: Avoiding Potential Issues Inherent in Machine Prediction
As we have seen in recent Natural Language Processing (NLP) news, machine prediction can appear to be right, even when it is completely wrong. This is also the case with many publicly available spectrum prediction algorithms.
Errors often occur when the input being predicted lies outside the domain covered by the model's training data. To avoid this, all prediction models and data at Wiley are tested and validated before being released commercially. We carefully restrict predictions to the limits of the models developed, while simultaneously expanding those models through targeted data acquisition. For novel compounds (not available commercially), we will further limit predictions for certain compounds that we predict could be harmful.
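One common way to keep predictions within the limits of a model is an applicability-domain check. The following is a minimal sketch of that general idea, not a description of Wiley's method: it assumes each compound is represented by a numeric descriptor vector and declines to predict when the nearest training example is farther away than a chosen cutoff. The function names, descriptors, cutoff, and toy model are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_domain(query, training_set, max_distance):
    """True if the query lies within max_distance of some training example."""
    nearest = min(euclidean(query, example) for example in training_set)
    return nearest <= max_distance


def guarded_predict(query, training_set, max_distance, predict):
    """Run `predict` only for in-domain queries; otherwise return None."""
    if not in_domain(query, training_set, max_distance):
        return None
    return predict(query)


# Hypothetical two-dimensional descriptors standing in for a training set.
training_descriptors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def toy_model(descriptor):
    """Placeholder for a real prediction model."""
    return sum(descriptor)


inside = guarded_predict((0.9, 0.1), training_descriptors, 0.5, toy_model)
outside = guarded_predict((5.0, 5.0), training_descriptors, 0.5, toy_model)
```

Here the first query sits close to a training example and receives a prediction, while the second is far from all of them and is refused. In practice the descriptor representation and the cutoff would need to be chosen and validated for the model in question.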
As with any dataset, whether experimental or predicted, these data and tools should be used only by technically qualified individuals or those under their direct supervision. Our databases and software are not a substitute for expert interpretation of analytical or clinical laboratory results and are not a substitute for the current standard of care for analytical or clinical laboratory analysis. Our software and databases are provided on an ‘as is’ basis, without warranties of any kind, either express or implied, including, but not limited to, warranties of satisfactory quality or fitness for a particular purpose or warranties as to accuracy or completeness.