HEAVY METALS IN TOPSOILS OF THE REPUBLIC OF TATARSTAN, RUSSIA: MULTIVARIATE ANALYSIS AND POLLUTION EVALUATION

Background. The study is conducted to investigate the content and spatial distribution of Fe and eight heavy metals (Cd, Co, Cr, Cu, Mn, Ni, Pb, and Zn) in soils of the industrially developed region. Materials and methods. A total of 1,170 soil samples of different land use (natural, agricultural, urban) were collected from topsoils in the territory of the Republic of Tatarstan, Russia. Heavy metals concentrations in soil samples were determined using atomic absorption spectrometry after 5M HNO3 extraction. Multivariate and geostatistical analyses were used to investigate the current state of soil heavy metal contamination and to identify spatial patterns and possible sources of heavy metals on the regional scale. Results. Zonal soil types of a natural land use were used to assess the regional background values of the heavy metals: Cd – 0.44±0.24 mg/kg, Co – 10.4±3.6 mg/kg, Cr – 23.3±12.7 mg/kg, Cu – 16.4±7.8 mg/kg, Fe – 15275.4±5178.3 mg/kg, Mn – 652.4±228.4 mg/kg, Ni – 29.8±18.8 mg/kg, Pb – 11.5±3.2 mg/kg, Zn – 43.3±12.8 mg/kg. The results of the pollution evaluation showed the absence of regionalscale contamination directly related to agriculture. Urban soils were contaminated by Cu, Pb and Zn. Geostatistical analysis revealed several patterns of regional distribution of heavy metals and suggested an anthropogenic impact to the Cu, Pb and Zn distribution. Principal component analysis allowed distinguishing three regional geochemical groups of heavy metals and showed that at the regional scale the distribution of Cu, Mn and Ni is controlled by the element richness in soil parent material, overlayed by the soil forming factors; the distribution of the Co, Cr and Fe is controlled mainly by lithology; and the distribution of the Pb, Zn and Cd is strongly influenced by the anthropogenic sources. Conclusion. This case study demonstrates that a combination of multivariate statistics and geostatistical analysis together with the pollution assessment allows comprehensive characterizing heavy metals spatial distribution and determining their sources.


Introduction
Identifying sources of heavy metals in the environment is of great importance to our understanding of their pollution patterns and natural cycles in surface reservoirs of Earth [1]. Heavy metals in the topsoil have two sources: a natural input as a result of weathering of soil-forming rocks, and anthropogenic sources such as transport, industrial enterprises and agricultural land use [2,3]. Over the past decades, there has been a decline in anthropogenic emissions of heavy metals into the environment. Thus, annual emissions of cadmium in European countries decreased from 485 t to 257 t in the period from 1990 to 2003; lead -from 25 kt to 8.6 kt; mercury -from 413 t to 195 t [4]. Despite this, the danger of soil contamination by metals is still acute in industrially developed countries, which is due to their long-term accumulation in upper soil horizons [5].
Concern about increasing levels of heavy metals in soils has led to the development and implementation of numerous programs to determine the baseline levels and contamination rates by heavy metals [6]. The study of soil quality allows us not only to assess the consequences of human activities, but also is an indispensable stage for sustainable development and conservation of depleted soil resources [7]. A combination of multivariate statistics and geostatistical analysis is successfully used for identifying pollution characteristics of heavy metals in soils and distinguishing their natural sources and anthropogenic inputs [6,[8][9][10][11].
Previous studies have been dedicated to research influence of soil types and land use types on the total content and mobility of distinct heavy metals in the topsoil, and to investigate geochemical dependence of heavy metals on soil properties (humus content, particle size distribution, pH and content of Fe and Mn) [12].
The present study has been focused on polyelemental profilation of heavy metal concentrations in the topsoil in the Republic of Tatarstan, Russia. Multivariate and geostatistical analyses were used to investigate the current state of soil heavy metal contamination in this area and to identify spatial patterns and possible sources of heavy metals on the regional scale.

Materials and methods
Study area. The study was conducted in the territory of the Republic of Tatarstan located in the eastern part of the Russian Federation (55°20'36.1"N; 50°47'31.7"E). The total area of the Republic is 67 847 sq. km. The climate of the region is moderately continental with slight differences between climate-geographical zones. The mean annual temperature is 2-3.1 °С, the mean annual precipitation amount is 460-540 mm. The relief is an elevated stepped plain dissected by a dense network of river valleys [13]. The soil cover, according to the "Classification and Diagnostics of Soils of the USSR" is represented by the following main types: Chernozems (39.7 %), Gray Forest soils (39.5 %), Sod-Podzolic soils (7 %), Sod-
The industrial profile of the republic is determined by machine-building enterprises, radio-and electrical instrument making enterprises, production and processing of oil and gas. The main industrial centers of the Republic are the cities of Kazan, Naberezhnye Chelny, Nizhnekamsk, Zelenodolsk, Elabuga and Almetyevsk [16].

Soil sampling and soil analysis.
A total of 1,170 soil samples were collected from the topsoils (0-20 cm depths) (Fig. 1). Sampling was performed at a distance of at least 200 m away from industry and transport infrastructure. The soil pH was measured in a suspension of 1:2.5 soil: distilled water using a pH meter (pH-150МИ, LLC "Izmeritel'naya tekhnika", Russia). The soil humus content was determined by the Tyurin titrimetric method in TsINAO modification [17]. The particle size distribution was determined by the Kachinskiy-Robinson-Kehl pipet method [17]. As an indicator of the particle size distribution, the sum of particles < 0.01 mm was used. According to N. A. Kachinskiy this range of particle sizes is called "physical clay", and this term is used henceforward [18]. In the soil samples the "total" content of Fe and eight heavy metals (Cd, Co, Cr, Cu, Mn, Ni, Pb, and Zn) were determined according to GD 52. 18.191-89 [19]. The soil samples were airdried, pounded and passed through a 1.0 mm sieve. Then, 2.0 g of the sieved samples were digested for 3 hours in a 10 ml of 5 M HNO 3 on a boiling water bath. After the 3 hours, flasks with the samples were pulled out from the water bath and cooled to room temperature. After cooling, extractions were filtered through ashless filters into 50 ml flasks, and brought to the volume with deionized water. The concentrations in the solution were measured by flame atomic absorption spectrometry (AAnalyst-400 "Perkin Elmer", USA). It should be considered, that the extraction of heavy metals by 5 M HNO 3 does not in all cases allow estimating absolute values of heavy metals concentrations. The extractable fraction of heavy metals depends on geochemical properties of an element and nature of an organic-mineral matrix, and may be less than 75 % of their actual content. Nevertheless, this extraction allows assessing overall soil pollution and remains relevant from the practical point of view. The term "total concentration" is used henceforward to identify this fraction of metals. Soil types were determined according to "Classification and Diagnostics of Soils of the USSR" [14]. The type of land use was established according to the site of sampling: agricultural, natural (forest and virgin soils), urban (samples collected on a territory of cities). The distribution of soil samples by soil types and land use types is presented in Table 1. Sod-Podzolic, Sod-Alluvial and Meadow Alluvial soils were mainly represented by the natural land use type. The most samples of Chernozems, Gray Forest and Sod-Calcareous soils were represented by tillable horizons of agricultural soils. All urban soils were united in the type of Urbanozems due to impossibility to determine their original genetic type as a result of anthropogenic transformation and destruction. Background levels and pollution assessment. The Median±2MAD (MAD=Median(|Xi-Me(X)|)) method was used to estimate the regional background levels of heavy metals [2]. It is a robust non-parametric method and does not depend from the data distribution. The estimated background levels are represented by a range, which is more realistic than absolute single-value estimations [20]. Only zonal soil types of natural land use were used to assess the background of heavy metals. Nonzonal soil types, such as calcareous and alluvial soils, were excluded to eliminate the influence of anomalies associated with releases of carbonate rocks and the depositions with alluvial sediments. Agricultural and urban soils were excluded from the background estimation because of a potential contamination due to a high anthropogenic load.
In this study, three indices were applied to qualify the observed degree of heavy metal enrichment and to evaluate anthropogenic influences. The single factor pollution index (P i ) was used to assess enrichment ratios of the each heavy metal against background levels [7]: where C i is the measured value of the element i at each sampling point, and S i is the regional background value of the heavy metal i. The soil pollution according to P i values is classified into three levels: if P i <1, soil is characterized as unpolluted; if the index is in the range of 1≤P i ≤3, soil is moderately polluted; if 3≤P i , soil is strongly polluted [21]. Two methods to assess overall contamination were used. The first is the Nemerow Comprehen-sive Index (P N ), which is commonly used to assess polyelemental soil pollution [22]: where P i is the single factor pollution index of heavy metal i, P̅ i is the mean of single factor index for heavy metal i, and max(P i ) is the maximum of single factor index for heavy metal i. Classification of Nemerow Comprehensive Index values is similar to P i classification. The second one is the Saet-based Comprehensive Pollution Index (Z c ), and it allows assessing ecological threat of contamination since it takes into account geometric mean of single pollution indexes and toxicity of heavy metals [23]: where P i1 -P in is the single factor pollution indexes of the elements, n is the number of heavy metals, K tn is the toxicity coefficient of heavy metal i. Use of the geometric mean in Z c calculation, instead of the arithmetic mean in the classic Saet Index, decreases estimates of overall pollution in the case of high spread of the pollution indices P i [23]. The toxicity coefficients of heavy metals depend on their hazard classes. The heavy metals hazard classes and corresponding toxicity coefficients are represented in Table 2. The Z c is classified into the following pollution classes: Z c < 16 -pollution is not dangerous, 16<Z c <32 -moderate danger of pollution, 32<Z c <128 -dangerous pol-lution, and Z c >128 -extremely dangerous pollution [23]. Table 2 Hazard classes of heavy metals and corresponding toxicity coefficients used to calculate Z c Hazard class K t E l e m e n t 1 1.5 Cd, Pb, Zn, Cr 2 1.0 Co, Ni, Cu 3 0.5 Mn Spatial interpolation and accuracy assessment. It was shown that ordinary kriging (OK) is the best method to interpolate a single variable in absence of explanatory variables [24][25][26]. However, OK estimates do not reproduce the sample histogram because of reduced variance as a consequence of a smoothing effect. In the OK estimation process low values are overestimated and high values are underestimated making the estimated histogram narrower than the sample histogram [27]. This can lead to a critical underestimation of regional soil pollution. To get accurate maps with preserved spatial variation of the heavy metals content, the interpolation was done using OK with a smoothing effect correction, as it proposed by Rezaee et al. [27]. The method is based on a normal score transformation of the original variable: where y(x i ) is the normal score, G(y) is the standard normal cumulative density function, and r(x i ) is the rank for i th z(x i ) associated with a set of n data values. Further estimate and smoothing correction are done using equation: is the mean of the transformed data, and Y * OK (x 0 ) is the ordinary kriging estimate of the transformed data.
After the estimation and correction were done, Y ** OK (x 0 ) is back-transformed into an original scale using the inverse operation: where Z ** OK (x 0 ) is back-transformed values, F -1 is the function of back-transformation.
More details on the algorithm and examples can be found in the original paper.
Accuracy of the interpolated maps was assessed using leave-one-out cross-validation (LOOCV). In LOOCV each sampling point o i is removed sequentially. A spatial interpolation model is fit on n-1 observation, and a prediction p i is made for an excluded observation, using its X values. Several error measurements were calculated using the difference o i -p i : 1) mean error is given by: 2) root mean square error is given by: 3) ratio of the observed and the predicted variances is given by: Var p RVar Var o = Li and Heap made a review of several criteria for using error measurements to judge the performance of spatial interpolation methods [25]. The model is better if ME is closer to zero and RMSE is smaller. The closer RVar is to 1, the better the ability of a spatial interpolation method to preserve an observed variance.

Statistical analysis.
Principal component analysis was commonly used to distinguish possible sources of heavy metals and pathways in soils [1,28,29]. Correlation matrix-based Principal Component Analysis was carried out to determine geochemical associations of heavy metals in a regional scale. The PCA was performed considering the nine metals as variables. Varimax rotation was applied because orthogonal rotation minimizes the number of variables with high loading on each component and facilitates the interpretation of results.
The coefficients of the Principal Vectors, which are the eigenvectors of the correlation matrix, inform which metals contribute the most for each Rotated Principal Component (RC). In this work special attention was given to the metals with loadings higher than 0.5 and smaller than -0.5. This is similar to the Factor Analysis approach, with a difference that variables with loadings in the range -0.5 -0.5 are not excluded from the analysis [30].
The Kaiser criteria was used to decide the appropriate number of rotated components to be retained (i.e., only factors with eigenvalues >1).
Software. All statistical analyses here presented were performed within the statistical environment R [31]. The spatial interpolation was performed using "gstat" package [32]. Final processing of predicted maps was performed using geoinformatical system QGIS [33].

Results and discussion
Heavy metals content in soils. The descriptive statistics of the heavy metals content in the top soils are summarized in Table 3. Table 3 Descriptive statistics and the estimated regional background values of heavy metals  The mean content of Co, Cd, Ni, and Mn in the topsoils of the RT exceeds worldwide estimates according to C. Reimann and P. de Caritat [34] -10 mg/kg, 0.3 mg/kg, 20 mg/kg and 530 mg/kg for the listed elements respectively. The content of Cu, Zn, Cr, Fe and Pb, on the contrary, is lower than the mean worldwide estimations for soils -25 mg/kg, 70 mg/kg, 80 mg/kg, 35000 mg/kg and 17 mg/kg respectively [34]. The mean content values of heavy metals in soils of natural land use, normalized by the lower background threshold (Median-2MAD), were decreased as follow: Ni>Cr>Fe>Cu>Pb>Zn>Co>Mn>Cd. Agricultural activity had a certain impact on the accumulationleaching mode of the elements in topsoil, which slightly affected the decreasing order of normalized heavy metals content: Cr>Ni>Fe>Cu>Co> >Zn>Pb>Mn>Cd. The influence of anthropogenic sources is the most powerful in urban soils where the background-normalized content of heavy met-als decreased in the following order: Cu>Pb>Ni>Cd>Cr>Zn>Fe>Mn>Co. Table 4 and the resulted variogram graphs are shown in Fig. 2,j. The spatial structures of the Cd, Co and Zn content showed the presence of the geometric anisotropy in the NW-SW direction: the autocorrelation range in a minor direction (SE-NW) was 0.6 times less. The nugget/sill ratios showed the strong spatial dependence of the Co and Ni values [35]. The medium strength of autocorrelation was observed for Cu, Fe and Mn (see Table 4). In the case of Cd, Cr, Pb and Zn the nugget/sill ratio is >0.6 showing weak spatial dependence. The high nugget effect represents an uncorrelated variation at the scale of sampling, which in this case could be an indicator of significant local anthropogenic input of Cd, Cr, Pb and Zn [36]. The distribution maps of the heavy metals are shown in Fig. 2,a-i. As it can be seen from the predicted maps, there are several clear patterns of the heavy metals spatial distribution in the topsoil. The first geochemical zone showed elevated values of Cd (0.5-1.4 mg/kg), Co (15-37 mg/kg), Cr (30-68 mg/kg) and Zn (60-104 mg/kg) in the topsoil. It is located in the northern central part of the territory under investigation, in the western part of the so-called Predkam`e climate-geographical region (Fig. 2,a,b,c,i). The second geochemical zone is located in the southeastern part of the Republic (so-called Bugulmiskaya and Eastern-Zakamskaya climate-geographical regions), and it showed an elevated content of Cr (30-50 mg/kg), Mn (825-2043 mg/kg) and Ni (45-98 mg/kg) (Fig. 2,c,f,g). Iron showed two zones of elevated values: southwestern part of the Republic (socalled Predvolj`e), where it varied in the range 19500-67600 mg/kg; and the northern part (Predkam`e) with the range of the Fe content 19500-53300 mg/kg (Fig. 2,e). In addition, elements such as Cd, Cu, Pb and Zn showed "bullseye" elevated zones around the cities with developed industry (Fig. 2,a,d,h,i).

Vol. 4 (3), 2019
Principal component analysis. Only three rotated principal components (RC) with an eigenvalue higher than 1.0 were extracted during the PCA, accounting for over 64 % of the total variance.
The results of the PCA showed that nine heavy metals in the topsoil of the investigated territory can be identified as three rotated principal components, contributing to 24, 22, and 18 % of the total variance, respectively, and the accumulating contribution of them was 64 %. The plane formed the first and second Rotated Components are presented in Fig. 3,a. The plane formed by RC1 and RC3 is shown in Fig. 3,b.
The heavy metals with the strongest loadings on RC1 (0.76, 0.90 and 0.74 for Cu, Ni, and Mn, respectively) represent a regional geochemical group of manganophile elements. According to the geological classification by Goldshmidt, copper belongs to the group of chalcophiles, and nickel to the group of siderophiles [23]. However, it was shown that in the unpolluted soils there is a strong dependence of Cu and Ni content on the content of manganese compounds [23]. High correlation coefficients (Spearman`s rho) also pointed to the strong associations of Cu and Ni to the Mn in the investigated topsoils: r (Cu-Mn) =0.56 (adj. p-value<0.01), r (Ni-Mn) =0.66 (adj. p-value<0.01). The correlations between RC1 and humus content (r (RC1-Humus) =0.42, p<0.01), physical clay content (r (RC1-Phys. clay) =0.41, p<0.01) and pH(r (RC1-Humus) =0.36, p<0.01) allowed us to conclude that RC1 is controlled by the element richness in soil parent material, overlayed by the soil forming factors (Fig. 4,a,b,c).
The elements Cd, Pb, and Zn showed the highest loadings on RC2 (0.72, 0.85 and 0.75, respectively). The absence of significant correlations between RC2 and soil properties (r (RC2-Humus) =0.09, p<0.01; r (RC2-Phys. clay) =-0.01, p=0.79; r (RC2-pH) =-0.01, p=0.95) and patterns of the interpolated maps confirmed that the RC2 represents exogenous input of heavy metals in the conditions of increased anthropogenic load (Fig. 4,d,e,f).  Fig. 4. Relationships between humus content, particle size distribution and pH and scores of the first three Rotated Components obtained from the PCA. LOESS smooth lines (red color) were added for better interpretability Pollution assessment. The summary of the calculated pollution indices is presented in Table 5. Pollution indices calculation was performed on the base of the upper background thresholds of heavy metals in topsoils (Median+2MAD).
According to the single-factor pollution indices, natural and agricultural soils in the territory under investigation were characterized as uncontaminated relatively to the regional background values. Thus, a number of soil samples with a P i <2 did not exceed a 1 % of all the natural soils for any of the heavy metals. The P i values distribution for agricultural soils was similar to that of the natural soils: 68-96 % of arable horizon samples had a pollution index P i <1 and were characterized as uncontaminated; 5-30 % were characterized as potentially contaminated (1<Pi<2) by one or several heavy metals. At the same time, there were several Co and Cu P i values that indicate a high pollution, which could be an existence evidence of local con-tamination sources and requires additional confirmation on larger scales (see Table 5).
In urban soils high P i values were observed for three heavy metals: Cu, Pb, and Zn. A degree of contamination by copper was raised in cities in the following order: Naberezhnye Chelny (P̅ i =1.12, max(P i )=3. 35  The calculated Nemerow polyelemental pollution index for topsoil samples is shown in Fig. 5,a. According to the P N index, 54.4 % of the surveyed natural soils and 60.1 % of the agricultural soils were characterized as uncontaminated (P N <1); 43.4 % of the natural soils and 37.3 % of the agricultural soils were potentially contaminated concerning the regional background levels (1<P N <2). The proportion of soil samples with P N >2 was less than 3 % of all surveyed soils of natural and agricultural land use types (see Fig. 5,a). Distribution of the Nemerow index for urban soils reflected higher anthropogenic loads: 60.8 % of samples collected on the territory of the largest cities were characterized as potentially contaminated (1<P N <2); 8.9 % of the urban samples were contaminated with heavy metals.
According to the calculated Saet-based Comprehensive Pollution Index, the contamination levels revealed by the single factor and Nemerow pollution indices were characterized as not dangerous (Fig. 5,b).

Conclusion
The concentrations, background levels and sources of Cd, Co, Cr, Cu, Fe, Mn, Ni, Pb and Zn in topsoils from the Republic of Tatarstan, Russia were investigated in this study. The results obtained offered relevant information on the distribution of heavy metals on a regional scale.
Pollution assessment using single factor pollution indices and two comprehensive polyelemental pollution indices showed the absence of heavy metals contamination directly related to an agricultural activity: proportions of pollution indices values of agricultural soils were similar to those of the natural soils. However, agricultural activity influences the physical and chemical soil properties, which was reflected on the proportions of heavy metals in the arable horizons compared with the humus horizons of the natural soils. A high anthropogenic load led to an increase of Cu, Pb and Zn content in urban soils, which was also reflected in the high values of the pollution indices.
About 40 % of the collected samples were characterized as potentially contaminated, and 3 %as contaminated by heavy metals. Although, the overall soil contamination was characterized as not dangerous according to the Saet-based Comprehensive index, a more detailed and strict contamination monitoring is required at the revealed sites of the elevated heavy metals content.
Geostatistical mapping allowed us to reveal several patterns of regional distribution of heavy metals and confirmed an anthropogenic influence to the Cu, Pb and Zn regional distribution. The PCA allowed distinguishing three regional geochemical groups of heavy metals, confirming previous findings about anthropogenic origin of Cu, Pb and Zn in the topsoils.
This case study demonstrates that a combination of multivariate statistics and geostatistical analysis together with the pollution assessment allows comprehensive characterizing heavy metals spatial distribution and determining their sources. In addition, this study provides a basis for further comparisons and assessments to monitor a longterm heavy metal accumulation.