New insights on Paraíba tourmaline origin determination

Determining the geographic origin of Cu-bearing tourmaline poses a significant challenge in gemmology, particularly when traditional microscopic methods yield inconclusive results.

This study applies a combined analytical and computational approach using 469 gem-quality samples from Brazil, Mozambique and Nigeria. A total of 57 elements (from Li to U) were quantified using full-mass-spectrum LA-ICP-TOF-MS. The high-dimensional elemental dataset was reduced to interpretable 2D maps using non-linear unsupervised machine-learning algorithms, including t-distributed stochastic neighbour embedding (t-SNE) and uniform manifold approximation and projection (UMAP).
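As a rough illustration of the dimensionality-reduction step described above, the following minimal sketch embeds a 57-element dataset in 2D with t-SNE from scikit-learn. It does not reproduce the study's pipeline. The data are synthetic (three hypothetical groups of 40 samples), and the log transform, standardisation and perplexity value are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for measured data: 3 hypothetical groups x 40 samples,
# each with 57 element concentrations (log-normal, arbitrary units).
n_groups, n_per_group, n_elements = 3, 40, 57
group_means = rng.normal(loc=2.0, scale=1.0, size=(n_groups, n_elements))
concentrations = np.vstack([
    rng.lognormal(mean=group_means[g], sigma=0.3, size=(n_per_group, n_elements))
    for g in range(n_groups)
])

# Assumed preprocessing: log-transform, then scale each element to zero mean
# and unit variance so that major elements do not dominate the distances.
features = StandardScaler().fit_transform(np.log10(concentrations))

# Non-linear projection of the 57-dimensional data to a 2D map.
# No origin labels are passed in: the embedding is unsupervised.
embedding = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(features)

print(embedding.shape)  # one (x, y) coordinate pair per sample
```

The resulting coordinates can then be plotted and coloured by any attribute of interest. UMAP could be substituted at the projection step through the separate umap-learn package.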

These methods successfully identified complex patterns and distinct subgroups, revealing compositional similarities not captured by traditional linear approaches. The resulting clusters provided a clear framework for geographic origin determination of unknown samples. The signatures of key elements (Na, Ca, Li, Ti, Fe, Mn, Cu, Ga, Sr, La and Pb) highlighted their influence on clustering and linked geochemical variations to colour and geographic origin.

Unsupervised machine-learning algorithms do not rely on predefined origin labels. This reduces errors caused by uncertain origin information and helps reveal statistical outliers that may point to new or undocumented sources. By integrating colour information with compositional clustering, the method also provides a possible framework for identifying heat treatment in high-clarity stones.

Figure 1: These Cu-bearing tourmalines (2–20 ct) are representative of the samples analysed in this study. They are all from Brazil except for the purple specimen on the right, which originates from Mavuco, Mozambique. The individual images are scaled to similar visual size for comparison, and to illustrate the typical colour range of Cu-bearing tourmaline. Composite photo by H. A. O. Wang and Julien Xaysongkham, SSEF.
Figure 2: A schematic workflow illustrates the procedure for determining the geographic origin of an unknown sample using unsupervised machine learning. LA-ICP-TOF-MS measurements generate multielement data, which are projected in 2D/3D space using algorithms such as t-SNE and UMAP. The resulting low-dimensional plots are then coloured according to geographic origin or other attributes, such as elemental concentration or sample colour, to reveal compositional relationships. Finally, a gemmologist compares the composition of an unknown sample to the reference data to give an opinion on its geographic origin.
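
The final comparison step in the workflow is a judgement made by a gemmologist. Purely as a numerical stand-in for that comparison, the sketch below lists the reference samples closest to an unknown in log-concentration space. The helper names (`log_distance`, `nearest_origins`), the neutral labels "origin A" and "origin B", and all concentration values are hypothetical and invented for illustration. A nearest-neighbour tally is a swapped-in simplification, not the method used in the study.

```python
import math
from collections import Counter


def log_distance(a, b):
    """Euclidean distance between two compositions in log10 space."""
    return math.sqrt(
        sum((math.log10(x) - math.log10(y)) ** 2 for x, y in zip(a, b))
    )


def nearest_origins(unknown, reference, k=3):
    """Tally the labels of the k reference samples closest to the unknown."""
    ranked = sorted(reference, key=lambda item: log_distance(unknown, item[1]))
    return Counter(label for label, _ in ranked[:k])


# Invented reference compositions: three element concentrations per sample,
# in arbitrary units. These are NOT measured values.
reference = [
    ("origin A", (100.0, 10.0, 1.0)),
    ("origin A", (120.0, 12.0, 1.5)),
    ("origin A", (90.0, 9.0, 0.8)),
    ("origin B", (1000.0, 200.0, 50.0)),
    ("origin B", (1100.0, 180.0, 40.0)),
    ("origin B", (900.0, 220.0, 60.0)),
]

unknown = (110.0, 11.0, 1.2)
votes = nearest_origins(unknown, reference, k=3)
print(votes.most_common())
```

A tally like this could at most support the expert's opinion. An unknown that lies far from every reference sample would merit attention as a possible outlier rather than being forced into a known group.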