Paraiba Tourmaline origin determination using machine learning

By Dr. Hao Wang & Dr. Michael S. Krzemnicki, first published in Facette 30 (March 2026)

Figure 1: An exceptional Paraiba tourmaline (about 5 ct) from Brazil, set in a ring from the Asta Collection, Hong Kong. Photo by SSEF

Since its discovery in Brazil in the late 1980s, copper-bearing Paraiba tourmaline has completely reshaped the high-end gemstone market. Famous for its electric ‘neon’ blue-to-green hues—caused by traces of copper and manganese—it has established a category of its own (Figure 1).

Today, the name ‘Paraiba’ is used in the trade for copper-bearing tourmalines not just from mines in Brazil, but also from deposits in Nigeria and Mozambique. However, geography dictates value. Due to their history, rarity, and beauty, stones from Brazil can still command a significant premium over their African counterparts.

Because of this price difference, accurately determining a gem’s geographic origin is critical. However, this is easier said than done. Paraiba tourmalines are often exceptionally clean, lacking the internal inclusions that gemmologists typically use as clues. To maintain confidence in the market, scientists at the Swiss Gemmological Institute SSEF have turned to a new ally: Chemical Fingerprinting + Machine Learning.

 

The Chemical Fingerprint

At SSEF, all gemstones are analysed meticulously using a wide range of methods, from classic microscopy to advanced spectroscopy. However, modern technology allows us to look much ‘deeper’ into a gemstone than ever before. Using GemTOF, a multi-element analysis platform at SSEF, we can quantify more than 57 different chemical elements in a single tourmaline, even at very low trace-element concentrations (down to parts per billion, or ppb).

While this provides a massive amount of data, the human brain struggles to interpret 50+ variables simultaneously. This is where Machine Learning (ML), a specialised branch of Artificial Intelligence (AI), steps in. ML allows computers to analyse vast datasets to find hidden patterns that are invisible to the naked eye.

Choices of machine learning algorithms

To understand our origin determination approach, it is important to distinguish between the two main types of machine learning: Supervised and Unsupervised.

Until now, almost all gemmological studies involving machine learning have used supervised learning. Think of this like using flashcards to teach a student. You feed the computer data and tell it: ‘This is a stone from Brazil,’ or ‘This is a stone from Mozambique.’ The computer learns the differences based on these labels.

However, this method has a major flaw: Garbage in, garbage out. In the gem trade, supply chains are complex. Mining regions shift, new deposits appear, and trade documentation isn’t always traceable to country of origin. If a computer model is trained on ‘reference’ stones that were accidentally mislabelled, it will learn the wrong patterns and make incorrect predictions forever.
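The effect of a mislabelled reference set can be seen in a toy example. The sketch below is only an illustration, not the method of any published study: it uses a simple nearest-centroid classifier, and the two-element ‘fingerprints’ are invented numbers in arbitrary units. The function names (`train`, `predict`) are hypothetical.

```python
from statistics import mean

def centroid(points):
    """Mean position of a list of (x, y) points."""
    return tuple(mean(axis) for axis in zip(*points))

def train(samples):
    """Nearest-centroid 'model': one mean point per origin label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(model, key=lambda label: dist2(model[label]))

# Invented two-element "fingerprints" (arbitrary units), NOT real measurements.
clean = [
    ((1.0, 1.0), "Brazil"), ((1.2, 0.9), "Brazil"), ((0.9, 1.1), "Brazil"),
    ((3.0, 3.0), "Mozambique"), ((3.1, 2.9), "Mozambique"), ((2.9, 3.2), "Mozambique"),
]
# Same stones, but two Mozambican references carry a wrong "Brazil" label.
mislabelled = clean[:3] + [
    ((3.0, 3.0), "Brazil"), ((3.1, 2.9), "Brazil"), ((2.9, 3.2), "Mozambique"),
]

unknown = (2.4, 2.4)
pred_clean = predict(train(clean), unknown)
pred_noisy = predict(train(mislabelled), unknown)
print(pred_clean, pred_noisy)
```

With clean labels the unknown stone is assigned to Mozambique; with only two wrong labels in the training set, the same stone is assigned to Brazil, although nothing about the stone itself has changed.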

The advantage of an unsupervised approach

In 2021, the SSEF Research team pioneered an unsupervised (label-free training) approach in emerald origin determination (Wang and Krzemnicki, J. Anal. At. Spectrom., 2021). In a recent publication, this method has been further developed and applied to origin determination of Paraiba tourmaline. In contrast to a supervised approach, we reversed the logic. Instead of telling the computer where the stones came from, we simply fed it the raw chemical data of 469 tourmalines and asked it to group them based purely on their chemical similarities.

Imagine dumping a mixed bag of coins onto a table and asking a computer to sort them by size and weight without telling it the currency names. The computer naturally separates them into distinct piles. Once these mathematical clusters emerge, expert gemmologists can look at where confident reference samples are located. If a reliable Brazilian stone lands in ‘Cluster A,’ we can deduce that the other stones in that cluster are likely Brazilian, too. Figure 2 illustrates the difference in workflow between the two approaches.
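The ‘cluster first, name later’ idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm used at SSEF: it groups points with a basic distance-threshold (single-linkage style) rule, the data points are invented, and the helper names (`cluster`, `name_clusters`) and the `max_gap` parameter are hypothetical.

```python
from collections import Counter
from math import dist

def cluster(points, max_gap):
    """Group points so that any two within max_gap of each other share a cluster
    (single-linkage style, found by flood fill). No origin labels are used."""
    cluster_of = [None] * len(points)
    next_id = 0
    for start in range(len(points)):
        if cluster_of[start] is not None:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if cluster_of[j] is None and dist(points[i], points[j]) <= max_gap:
                    cluster_of[j] = next_id
                    stack.append(j)
        next_id += 1
    return cluster_of

def name_clusters(cluster_of, references):
    """Name each cluster by majority vote of the trusted reference stones in it.
    Clusters containing no reference stone stay unnamed (a possible new source)."""
    votes = {}
    for index, origin in references.items():
        votes.setdefault(cluster_of[index], Counter())[origin] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

# Invented two-element data (arbitrary units), NOT real measurements.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),   # first pile
    (3.0, 3.0), (3.1, 2.9), (2.9, 3.2),   # second pile
    (5.0, 1.0), (5.1, 1.2),               # third pile, no reference stone
]
# Only two stones have a trusted origin; the index refers to `points`.
references = {0: "Brazil", 4: "Mozambique"}

cluster_of = cluster(points, max_gap=0.5)
names = name_clusters(cluster_of, references)
origins = [names.get(c, "unassigned") for c in cluster_of]
print(origins)
```

Stones that share a pile with a trusted reference inherit its origin, while the third pile stays ‘unassigned’ instead of being forced into an existing category.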

In summary, the benefits of unsupervised ML are clear:

  • Objectivity: The computer is not biased by potentially incorrect origin labels.
  • Discovery: It allows new, unexpected sources to be recognized rather than forcing a stone into a category that doesn’t fit.

Figure 2: Comparison of the supervised vs. unsupervised approach in machine learning. The supervised approach (left) predicts origins based on a model trained with pre-defined labels. The unsupervised approach (right, used in this study) clusters data objectively based on similarities in gemstone chemical composition, and then uses reference samples to retrospectively identify the origin of sub-groups. (Illustration generated by Gemini)

Results: mapping the invisible

Using conventional Principal Component Analysis (PCA), a standard statistical method, we were unable to separate stones from Brazil, Mozambique (Mavuco), and Nigeria (Figure 3a). However, by applying advanced unsupervised ML algorithms (t-SNE and UMAP), we successfully transformed the complex chemical data into intuitive 2D maps (Figures 3b and 3c). The results are compelling: both ML algorithms clearly separate the stones by origin. This demonstrates that each region carries a distinct chemical signature, defined not by a single element but by a complex combination of copper, manganese, gallium, strontium, and many others.

When a new sample is submitted to SSEF for origin determination, we analyse its chemical composition using EDXRF and GemTOF and calculate its chemical similarities to the database. The new sample is then plotted onto the map; if it falls within a specific sub-cluster, our gemmologists at SSEF use this placement as a valuable argument supporting their origin determination conclusions.
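How a new stone might be compared with a reference database can be sketched as follows. This is a hedged stand-in, not SSEF's actual procedure: the article does not specify the similarity measure, so the sketch assumes Euclidean distance on log10-scaled concentrations and a simple k-nearest-neighbour vote instead of a t-SNE or UMAP projection. All concentrations are invented, and the names `fingerprint`, `nearest_cluster` and the parameter `k` are hypothetical.

```python
from collections import Counter
from math import dist, log10

# Four of the elements named in the article; a real fingerprint uses many more.
ELEMENTS = ["Cu", "Mn", "Ga", "Sr"]

def fingerprint(concentrations, elements):
    """Turn a dict of element concentrations (ppm) into a log10 vector,
    so that trace and minor elements contribute on a comparable scale."""
    return [log10(concentrations[e]) for e in elements]

def nearest_cluster(new_sample, database, elements, k=3):
    """Return (majority cluster among the k nearest database stones, vote share)."""
    target = fingerprint(new_sample, elements)
    ranked = sorted(
        database,
        key=lambda row: dist(target, fingerprint(row["ppm"], elements)),
    )
    votes = Counter(row["cluster"] for row in ranked[:k])
    cluster, count = votes.most_common(1)[0]
    return cluster, count / k

# Invented reference database (ppm), NOT real measurements.
database = [
    {"cluster": "A", "ppm": {"Cu": 9000, "Mn": 12000, "Ga": 80, "Sr": 1.0}},
    {"cluster": "A", "ppm": {"Cu": 8500, "Mn": 15000, "Ga": 90, "Sr": 1.5}},
    {"cluster": "A", "ppm": {"Cu": 11000, "Mn": 10000, "Ga": 70, "Sr": 0.8}},
    {"cluster": "B", "ppm": {"Cu": 3000, "Mn": 2000, "Ga": 300, "Sr": 20.0}},
    {"cluster": "B", "ppm": {"Cu": 2500, "Mn": 2500, "Ga": 350, "Sr": 30.0}},
    {"cluster": "B", "ppm": {"Cu": 3500, "Mn": 1800, "Ga": 280, "Sr": 25.0}},
]
new_stone = {"Cu": 9500, "Mn": 11000, "Ga": 85, "Sr": 1.2}

cluster, share = nearest_cluster(new_stone, database, ELEMENTS)
print(cluster, share)
```

The vote share gives a rough sense of how clearly the stone sits inside one cluster; in this sketch, as in the workflow described above, the result is one argument for the gemmologist to weigh, not a verdict on its own.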

SSEF is using state-of-the-art methodology

This study is not merely academic research; it is a practical framework that is already routinely applied in our daily gemstone testing operations. By combining cutting-edge chemical analysis with objective, data-driven algorithms, we are convinced that we are strengthening the scientific foundation of origin determination.

Figure 3: Separating geographic sources using chemical data. (a) Conventional PCA method fails to separate the stones, resulting in an overlapping cloud where origins are indistinguishable. (b) & (c) Advanced unsupervised algorithms (t-SNE and UMAP) successfully separate the complex chemical data into distinct groups. The resulting maps clearly display the isolated clusters for Brazil (blue), Nigeria (purple), and Mozambique (red and green).

This article is based on a paper recently published in the Journal of Gemmology; for detailed information, please refer to the original publication: Wang, H.A.O., Krzemnicki, M.S., Wälle, M. and Schultz-Guttler, R.A. 2025. Unsupervised Machine Learning for Geographic Origin Determination of Cu-bearing Tourmaline. Journal of Gemmology. 39, 8, 772-787.