
SSEF introduces a machine learning algorithm for data visualisation
by Dr. H.A.O. Wang, first published in Facette 21 (June 2021)
In January 2021, the Swiss Gemmological Institute SSEF published a scientific article in the Journal of Analytical Atomic Spectrometry (JAAS; Figure 1) on multi-element analysis of gemstones and machine-learning-assisted data visualisation, with a particular focus on the origin determination of emeralds (openly accessible via https://doi.org/10.1039/D0JA00484G).
Our latest research builds on the unique capability of our time-of-flight mass spectrometer (GemTOF, see www.gemtof.ch) to measure almost all chemical elements simultaneously, even at very low trace levels. As a result, GemTOF allows the operator to first measure the gemstone and only then decide which elements are of interest (e.g. for origin determination). This contrasts with conventional LA-ICP-MS analysis, in which the mass spectrometer measures a limited set of elements one after another, so the elements of interest have to be selected before the analysis. This requires the operator to make prior assumptions about the composition of the gemstone to be analysed. Consequently, rarely occurring elements may be missed, even though they can form part of an important and characteristic chemical signature for the origin determination of gemstones.
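The difference between the two acquisition strategies can be illustrated with a toy sketch. It is a hypothetical model, not GemTOF software: the element values and detection limits are made-up placeholders, and the names `sequential_measurement` and `select_after_measurement` are our own. A sequential measurement only ever records a preselected element list, whereas a simultaneous acquisition keeps the full record, from which the elements above their detection limits can be selected afterwards.

```python
# Made-up placeholder values, NOT real emerald data:
# element -> (measured concentration, detection limit), both in µg/g
FULL_SPECTRUM = {
    "Li": (45.0, 0.5),
    "Na": (5200.0, 5.0),
    "V": (310.0, 0.1),
    "Cr": (820.0, 0.5),
    "Cs": (12.0, 0.05),
    "Sc": (0.02, 0.05),  # below its detection limit in this example
}

# Hypothetical element list that has to be fixed before a sequential measurement
PRESELECTED = ("Na", "V", "Cr")


def sequential_measurement(spectrum, preselected):
    """Sequential acquisition: only the preselected elements are ever recorded."""
    return {el: spectrum[el][0] for el in preselected}


def select_after_measurement(spectrum):
    """Simultaneous acquisition: elements are chosen afterwards (here: all above detection limit)."""
    return {el: conc for el, (conc, lod) in spectrum.items() if conc > lod}


recorded = sequential_measurement(FULL_SPECTRUM, PRESELECTED)
available = select_after_measurement(FULL_SPECTRUM)
missed = sorted(available.keys() - recorded.keys())
print(missed)  # ['Cs', 'Li']
```

In this made-up example, the Li and Cs signals are present in the full record but never appear in the sequential run, simply because they were not on the preselected list.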
Analytical Protocol for GemTOF
Each gemstone contains a unique set of chemical elements (‘chemical fingerprint’) which is related to its geological environment (type of host rock) and formation conditions. By analysing thousands of reference samples from different gemstone deposits, SSEF has accumulated an extensive chemical ‘fingerprint’ database over many years. When applying sophisticated analytical methods such as mass spectrometry (in our case GemTOF) to trace element analysis, it is crucial to follow a strict and rigorous analytical protocol. In the peer-reviewed JAAS article, we therefore present a detailed step-by-step analytical procedure for gemstone analysis, including a discussion of how to select appropriate analytical parameters and calibration methods. We further present methods to correct artefacts and to track the stability (performance) of the instrument over time, and we discuss data integrity.
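The JAAS article discusses how to choose a calibration method; this summary does not state which one SSEF applies. As a hedged illustration only, the sketch below shows one widely used LA-ICP-MS scheme: external calibration against a reference material, combined with an internal standard element that corrects for differences in the amount of material ablated. The function and parameter names are hypothetical, and the intensities are assumed to be background-corrected.

```python
def internal_standard_concentration(
    i_x_sample, i_x_ref, c_x_ref, i_is_sample, i_is_ref, c_is_sample, c_is_ref
):
    """Concentration of analyte x in the sample, using an internal standard (is).

    i_*: background-corrected signal intensities (e.g. counts per second)
    c_*: concentrations, all in the same unit (e.g. µg/g)
    'ref' denotes the external reference material, 'sample' the gemstone.
    """
    # Sensitivity of analyte x in the reference material: signal per unit concentration
    sensitivity_x = i_x_ref / c_x_ref
    # Internal-standard sensitivity in sample relative to reference; this factor
    # corrects for a different ablated mass (ablation yield) in the two materials
    yield_correction = (i_is_sample / c_is_sample) / (i_is_ref / c_is_ref)
    return i_x_sample / (sensitivity_x * yield_correction)
```

The internal standard must be an element whose concentration in the sample is known independently (for example from stoichiometry or another technique); how to choose that element for a given gemstone is outside the scope of this sketch.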
Machine Learning Algorithm for Data Visualisation
A ‘chemical fingerprint’ database of a specific type of gemstone may contain over 50 different elements, i.e. it is a high-dimensional dataset. As human beings, we cannot visualise such a dataset directly (after all, we live in a 3D world!). Instead of relying on numerous bivariate or three-dimensional chemical plots to understand the chemical relationships between gemstones, we apply a machine learning algorithm called t-SNE (t-distributed stochastic neighbour embedding). It reduces the complexity of the dataset by embedding the gemstones in a 3D model in which stones with similar elemental compositions lie close together and thus form clusters. t-SNE is an unsupervised machine learning algorithm: it uses no a priori information about the country of origin in its calculation. The visualisation is therefore based solely on the similarity of the multi-element compositions of the gemstones.
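To make the idea more concrete, here is a minimal sketch of t-SNE (as introduced by van der Maaten & Hinton, 2008) in Python with NumPy. It is not SSEF's implementation: the perplexity, the number of iterations, the early-exaggeration factor of 12 during the first 250 iterations and the step-size heuristic are common choices that we assume here. The sketch first converts distances between samples into neighbour probabilities P, then moves the points in a 3D space by gradient descent so that their Student-t neighbour probabilities Q match P (minimising the Kullback-Leibler divergence). The function only receives the composition matrix; no origin labels are passed in, which is what makes the method unsupervised.

```python
import numpy as np


def _conditional_probabilities(sq_dist, perplexity, tol=1e-5, max_iter=50):
    """Gaussian neighbour probabilities P(j|i), each row tuned to the target perplexity."""
    n = sq_dist.shape[0]
    target_entropy = np.log(perplexity)  # perplexity = exp(entropy in nats)
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(sq_dist[i], i)  # squared distances to all other samples
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma_i**2), found by bisection
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)  # shifting by d.min() avoids underflow
            p = w / w.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:  # distribution too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # distribution too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P


def tsne(X, n_components=3, perplexity=30.0, n_iter=1000, learning_rate=None, seed=0):
    """Embed the rows of X (samples x elements) in n_components dimensions.

    perplexity must be clearly smaller than the number of samples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    sq_dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Symmetric joint probabilities p_ij in the original high-dimensional space
    P = _conditional_probabilities(sq_dist, perplexity)
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)

    if learning_rate is None:
        learning_rate = n / (4.0 * 12.0)  # assumed heuristic: step size proportional to n

    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        exaggeration = 12.0 if it < 250 else 1.0  # early exaggeration phase
        momentum = 0.5 if it < 250 else 0.8
        sq_y = np.sum(Y**2, axis=1)
        # Student-t (one degree of freedom) similarities in the embedding
        num = 1.0 / (1.0 + np.maximum(sq_y[:, None] + sq_y[None, :] - 2.0 * Y @ Y.T, 0.0))
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # Gradient of KL(P || Q): 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j)
        W = (exaggeration * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y
```

For a dataset `X` of shape (number of stones, number of elements), `tsne(X, n_components=3)` returns one 3D point per stone; origin labels, if known, would only be used afterwards to colour the plot. For real work, a tested library implementation such as `sklearn.manifold.TSNE` is preferable to this sketch.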
Case study: Emeralds and their Origins
As a case study, we compared the results of 168 emeralds from different gem deposits. Starting with the multi-element dataset of these emeralds (analysed by GemTOF), we applied the t-SNE algorithm to reduce the high-dimensional chemical dataset to a three-dimensional plot. This allowed us to visualise how t-SNE separates the selected emeralds into well-separated groups and subgroups (see Figure 2; for a 3D online clip, scan the QR code).
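This summary does not describe how the GemTOF data of the 168 emeralds were prepared before running t-SNE. The sketch below shows one common way of preparing trace-element data, offered purely as an assumption: values below the detection limit are replaced by half the detection limit (a hypothetical convention), concentrations are log-transformed because they span several orders of magnitude, and each element is standardised so that no single element dominates the distances. The function name `prepare_for_embedding` is hypothetical.

```python
import numpy as np


def prepare_for_embedding(concentrations, detection_limits):
    """Turn a (samples x elements) concentration matrix into input for t-SNE.

    concentrations: shape (n_samples, n_elements), e.g. in µg/g
    detection_limits: shape (n_elements,), same unit, assumed to be positive
    """
    C = np.asarray(concentrations, dtype=float)
    lod = np.asarray(detection_limits, dtype=float)
    # Assumed convention: values below the detection limit become half the limit
    C = np.where(C < lod, lod / 2.0, C)
    # Log scale, so that major and trace elements contribute on comparable terms
    L = np.log10(C)
    # Standardise each element (column) to zero mean and unit variance
    std = L.std(axis=0)
    std = np.where(std > 1e-12, std, 1.0)  # constant columns: avoid division by zero
    return (L - L.mean(axis=0)) / std
```

The resulting matrix could then be passed to a t-SNE implementation, for example `sklearn.manifold.TSNE(n_components=3)`, to obtain a three-dimensional plot.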
Based on our research, the unsupervised t-SNE algorithm has proven to be a versatile method for data visualisation. It thus provides our gemmologists with valuable information that assists them in the origin determination of gemstones.


Machine Learning vs Artificial Intelligence
Artificial Intelligence (AI) is the buzzword of the moment, much as nanoscience was a few years ago, and it has even made headlines in gemstone testing. Despite the buzz around the term in the media and in marketing, it must be stated that AI in its true sense cannot simply be transferred to gemmology, given the complexity of coloured gemstone testing. In fact, most AI success stories rely on simple and well-defined training datasets, for example millions of labelled photos showing readily identifiable objects such as a man, a car or a watch. When a new photo arrives, the trained AI algorithm accurately assigns the object in it to one of these known categories. But think about it: can it recognise an airplane in a new photo if no airplanes were labelled in its training data? Probably not.
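The limitation described above can be illustrated with a toy supervised classifier. The sketch is hypothetical and not tied to any particular AI system: a nearest-centroid classifier is trained on made-up 2D feature vectors for three labelled classes. Because it can only answer with a label it has seen, it assigns a sample from an unseen class (our stand-in for the airplane) to one of the known classes instead of recognising it as something new.

```python
import math

# Made-up 2-D feature vectors standing in for labelled training photos
TRAINING = {
    "man": [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)],
    "car": [(5.0, 1.0), (5.1, 1.2), (4.9, 0.8)],
    "watch": [(3.0, 5.0), (3.2, 5.1), (2.8, 4.9)],
}


def centroids(training):
    """Mean feature vector of each labelled class."""
    return {
        label: tuple(sum(values) / len(points) for values in zip(*points))
        for label, points in training.items()
    }


def classify(sample, class_centroids):
    """Nearest-centroid classifier: it can only ever answer with a known label."""
    return min(class_centroids, key=lambda label: math.dist(sample, class_centroids[label]))


known = centroids(TRAINING)
print(classify((1.1, 1.0), known))    # man
print(classify((20.0, 20.0), known))  # watch: an unseen 'airplane' still gets a known label
```

The made-up point (20, 20) lies far from every training class, yet it is still labelled "watch", because "none of the above" is not an option the classifier was trained to give.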
The same applies to gemstones. Geology is the science which investigates and describes the complex and dynamic processes of rock formation. Consequently, gemstones, which form in many different geological settings (deposits), reflect the complex local geological history as well as the dynamics of the geochemical environment in which they formed. Even if one collected as many reference samples as possible from specific gem deposits, they would be unlikely to cover the entire mining areas and mining histories of all these deposits. A simple and readily identifiable dataset of gemstones, as required for AI applications in the true sense, is therefore not available.
The future has started: Successful application of Machine Learning at SSEF
In the author’s opinion, machine learning methods are much more promising for gemstone testing. At SSEF, an unsupervised, non-linear machine learning algorithm has proven fit for purpose for gemstone testing (see the JAAS article by Wang & Krzemnicki 2021). Unsupervised in this context means that a priori knowledge about the origin of a gemstone is not taken into account in the calculation. With machine learning, our aim is to extract from the large chemical dataset the common and statistically relevant features of each gemstone, and ultimately to draw general conclusions about gemstones from specific geological and geographical origins. As our study on emeralds from different origins has shown, this approach is very successful and thus supports gemmologists in obtaining a consistent and reliable origin determination of gemstones.
Interested readers will find a detailed description of the machine learning methodology for gemstone testing in the above-mentioned JAAS article by Wang & Krzemnicki 2021. A more practical application is described in the paper on new emeralds from Afghanistan, published in 2021 by Krzemnicki et al. in the Journal of Gemmology (see also pages 10-11 of this Facette).