2,5-Dimethylfuran as a validated biomarker of smoking status

INTRODUCTION
Exposure biomarkers are required in tobacco use studies to assess smoking status accurately, since self-reporting often leads to misclassification. This study uses breath analysis to evaluate several volatile organic compounds (VOCs) as potential biomarkers of tobacco smoke exposure.


METHODS
Forced-expiratory breath samples were obtained from 377 volunteers (174 smokers and 203 nonsmokers). Exhaled breath levels of different VOCs previously related to tobacco smoke were evaluated. The toluene-to-benzene ratio was also evaluated because it has been found to differ between atmospheric samples and tobacco smoke emissions. Finally, breath analyses from 64 patients attending a clinical practice were evaluated and the results were compared with their self-reported smoking status.


RESULTS
Univariate analysis shows that all compounds evaluated gave significant differences (p < .001). Receiver operating characteristic (ROC) curves suggest that xylenes and toluene are not able to accurately determine smoking status, and that benzene and the T/B ratio present potential utility in certain conditions. The highest discriminant capacity was obtained for 2,5-dimethylfuran (AUC = 0.982, 95% confidence interval [CI]: 0.969-0.995), with a cut-off value of 0.016 ppbv (sensitivity = 0.965, specificity = 0.896). Coffee drinking was the only confounder found, as it can give low breath levels of this compound. The evaluation of the results obtained from the patients attending a clinical practice showed that 8% of people who claimed to be nonsmokers had hidden their real smoking status.


CONCLUSIONS
The results obtained confirm that the determination of 2,5-dimethylfuran in breath samples is a good and simpler alternative to conventional blood or urine tests for assessing smoking status.


IMPLICATIONS
Analysis of 2,5-dimethylfuran in breath samples results in a simple and fast method for the determination of the smoking status of a person. This methodology presents multiple advantages as it is neither invasive nor embarrassing for patients attending clinical practices. Moreover, analysis of biomarkers in breath samples is simpler and faster than using conventional methods based on urine or blood analysis.


INTRODUCTION
Smoking is associated with many adverse health effects [1,2] and also has a clear economic effect on the cost of health services [3,4]. For this reason, many life insurance companies offer significant premium reductions to non-smokers. In the U.S., the Affordable Care Act (ACA) allows insurance companies to charge smokers up to 50% more than non-smokers through a tobacco surcharge [5][6][7].
Accurate assessment of smoking status is critical for determining tobacco use, population risk, and smoking-related diseases. Self-reported smoking is widely used to determine exposure to tobacco smoke and estimate prevalence of cigarette smoking.
However, this procedure tends to underestimate the true prevalence of smoking when a biological sample has also been analyzed for comparison [8]. Although self-report is a fair estimator of smoking prevalence, misclassification rates for non-smokers ranging from 1% to 10% have typically been reported [8][9][10][11][12][13][14]. These misclassification rates tend to be still larger in ex-smokers, pregnant women, and clinic-based studies [9,15], which suggests that people may deny their current smoking habits due to social stigma or fear, particularly in situations where they may have been given advice to stop smoking by a doctor.
The WHO Study Group on Tobacco Product Regulation [16] concluded that exposure biomarkers are required to support exposure reduction claims in studies defining the dependence potential of different products and in evaluating the effects of specific regulatory changes on exposures in the general population. Therefore, it is necessary to find a biomarker of tobacco exposure to confirm self-reported information.
Nicotine is the main addictive component in tobacco products, a major constituent of cigarettes, and is highly specific to tobacco smoke. However, its short half-life in biological fluids (t1/2 = 2-3 h in blood) makes it necessary to determine a nicotine metabolite. Serum cotinine (t1/2 = 15-19 h in blood, urine and saliva), the major metabolite of nicotine, is considered the standard for measuring exposure to tobacco smoke [15,17]. However, blood sampling is invasive. Urine cotinine or its glucuronide conjugate are reliable measures of nicotine uptake and are commonly used biomarkers of tobacco exposure [15,17], although it should be noted that total cotinine may also reflect nicotine exposure from tobacco substitutes, such as nicotine patches [18] and nicotine chewing gum [19].
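As a rough illustration of why the longer-lived metabolite is preferred, the sketch below compares how much of each compound would remain 24 h after exposure. It assumes simple first-order (single-exponential) elimination, which the text does not state, and uses only the half-life ranges quoted above; the function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(hours_elapsed: float, half_life_h: float) -> float:
    """Fraction of the initial amount left, ASSUMING first-order elimination."""
    return 0.5 ** (hours_elapsed / half_life_h)


# Half-life ranges (hours) as quoted in the text.
HALF_LIVES_H = {"nicotine": (2.0, 3.0), "cotinine": (15.0, 19.0)}

for compound, (shortest, longest) in HALF_LIVES_H.items():
    low = fraction_remaining(24.0, shortest)
    high = fraction_remaining(24.0, longest)
    print(f"{compound}: {low:.2%} to {high:.2%} remaining after 24 h")
```

Under this assumption, nicotine is almost entirely cleared within a day, whereas a substantial share of cotinine is still present, which is consistent with the use of cotinine as the reference marker.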
Tobacco-specific nitrosamines, particularly 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) and its major metabolite 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL), are more accurate urinary biomarkers of tobacco exposure [20]. Although urine sampling is not physically invasive, it can be viewed as psychologically invasive or embarrassing, specimen handling involves a biological hazard, and it may be hard to apply in a pediatric setting or in the case of large numbers of study subjects.
Other non-invasive methods such as saliva, hair, and sputum analysis have been used to evaluate tobacco exposure [15,17].
The least invasive and probably the simplest method to perform this study is through breath analysis. Exhaled CO has normally been used to assess recent tobacco exposure (<8h) [17,21] despite there being considerable inter-individual variability.
Tobacco smoke is an aerosol produced by complex and overlapping burning, pyrolysis, pyrosynthesis, distillation, sublimation and condensation processes during the smoking of cigarettes. It comprises a highly complex chemical mixture of nonspecific products of organic material combustion and chemicals that are specific to the combustion of tobacco and other components of the cigarette [22]. Approximately 5% (w/w) of mainstream smoke is composed of volatile organic compounds (VOCs), which are formed by the incomplete combustion of tobacco during and between puffs.
Previous studies have demonstrated that active smoking increases the levels of different VOCs in breath and blood [23][24][25][26][27][28][29][30], and active smokers can be discriminated by higher values for combustion products such as furans, as well as benzene, toluene and xylene aromatic hydrocarbons. It has been reported that 2,5-dimethylfuran plays a dominant role in distinguishing between smokers and non-smokers [25,27,29,30].
A preliminary study [29] found that 2,5-dimethylfuran is a highly selective breath biomarker of smoking status as this compound is able to differentiate between social smokers and non-smokers. In the present study, we have performed breath analysis of a large cohort of smokers and non-smokers to determine the validity of this technique as an alternative to conventional blood and urine analysis for the determination of smoking status. The cut-off value for the compound with the highest discriminant capacity for smoking status has been determined and the proposed method has been used to check the validity of self-reports at two different medical practices.

MATERIALS AND METHODS
Subjects
A total of 387 adult volunteers participated in the study. Before taking breath samples, participants were informed of the nature of the test and the aims of the study. A person was considered to be a smoker when he/she admitted to a smoking habit of at least one cigarette/day and had smoked within the previous 24 h.
Within this group, only cigarette smokers were evaluated and the exclusive use of any other tobacco product, such as e-cigarettes, was an exclusion criterion. Given that a previous study has indicated that some VOCs can detect smoking status after more than 24 h without smoking [29], we excluded ten participants from the statistical calculations because they reported being social smokers who consumed less than one cigarette/day on average and had last smoked more than 24 h before sampling. Of the 377 adult volunteers included in the study, 127 were men (33.7%) and 250 were women (66.3%), and the mean age was 29.2 years (range . No requirements related to food and drink ingestion were made prior to breath sampling, although the volunteers were asked whether they had drunk coffee in the previous hours because it has been reported that 2,5-dimethylfuran can be released by roasted coffee beans [31,32]. Thirty-six volunteers admitted to the use of cannabis and 29 of these mixed cannabis with tobacco. Under the conditions indicated for being considered smokers, 174 volunteers were included in this group. The smoking habits of these subjects were recorded: 20 people
For the validation of the proposed results, breath samples from 64 people (43 females; mean age 42.5 years, range 23-61) attending a neurology and an endocrinology practice were assessed. Smoking status was recorded during these visits and the information self-reported by patients was compared with the results obtained from breath analyses. When a disagreement was observed, patients were contacted again to confirm whether their initial self-report was incorrect.

Breath analysis
Different VOCs were selected for their evaluation as smoking biomarkers, taking into account preliminary results [23][24][25][26][27][28][29][30]. Benzene was evaluated given that it is the VOC that has most frequently been proposed as a smoking biomarker in the literature.
For the analysis of breath samples, an "in-house" capillary thermal desorption device connected to a gas chromatograph with mass spectrometry detection (GC-MS) (Thermo Scientific, Waltham, MA, USA) was used [37,38]. The microtrap used in this study has been specifically developed for the analysis of VOCs in breath samples at ppbv-pptv levels. Specific details about trap design, the GC-MS method and its validation are given in the Supplementary Materials.
Forced-expiratory breath samples were collected for each individual as follows: the first 2-3 s of the expiration were not collected in order to minimize the sampling of dead-space air, and the remaining fraction was collected until about 900 mL of breath had been introduced into a cleaned 1 L Tedlar gas-sampling bag (SKC Inc., Eighty Four, PA, USA). Each sample was analyzed no more than two hours after being collected to avoid the loss of analytes from the bags [39]. For each sample, 750 cm³ of breath were required for the chromatographic analysis (i.e. breath samples were passed through the microtrap for 25 min at a fixed flow rate of 30 cm³·min⁻¹).
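The sampled volume follows directly from the flow rate and the loading time. The short check below reproduces that arithmetic and confirms that it fits within the roughly 900 mL collected per bag; the function and variable names are illustrative only.

```python
def sampled_volume_cm3(flow_cm3_per_min: float, duration_min: float) -> float:
    """Volume drawn through the microtrap at a constant flow rate."""
    return flow_cm3_per_min * duration_min


FLOW_CM3_PER_MIN = 30.0   # fixed flow rate from the text
LOADING_TIME_MIN = 25.0   # loading time from the text
BAG_FILL_CM3 = 900.0      # approximate breath volume collected per bag

volume = sampled_volume_cm3(FLOW_CM3_PER_MIN, LOADING_TIME_MIN)
print(f"Volume analyzed: {volume:.0f} cm3")
print(f"Fits in one bag: {volume <= BAG_FILL_CM3}")
```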
Each Tedlar bag was cleaned with purified nitrogen several times before new samples were collected. In order to confirm the validity of the cleaning process, the last portion of nitrogen collected in the cleaning process was analyzed in the same conditions as breath samples to confirm that no detectable levels of any target compound were found.

Statistical Analysis
Statistical analysis was performed using SPSS for Windows Version 15.0. For calculations of statistical significance, two-sided testing was used and p<0.05 was considered significant. Shapiro-Wilk and Kolmogorov-Smirnov tests were used to study the distribution of the compounds evaluated in the samples. The results indicated that the chosen analytes do not follow a normal distribution in either smokers or non-smokers (p<0.001). Continuous variables are expressed as median [quartiles] and the Mann-Whitney U-test was used to compare the values found between smokers and non-smokers. Receiver operating characteristic (ROC) curves were used to assess the discriminant power of each individual compound and to determine the best cut-off value for the prediction of smoking status. Multivariate logistic regression analysis was used to determine the compounds that can predict smoking status.

Table 1 shows the results obtained in the determination of the target analytes in smokers and non-smokers.

In the univariate analysis (Mann-Whitney U-test), significantly higher values were detected for all compounds in the smoker group, which agrees with previous studies [29]. In the case of the toluene-to-benzene (T/B) ratio, smaller values were obtained for the smoker group. Among the patients attending the medical practices, breath analysis showed that 2 self-reported non-smokers (8%) had levels of 2,5-dimethylfuran above the cut-off limit.
Of the 377 subjects in this study, 36 admitted to being cannabis consumers. 29 of these (80.6%) reported that they consume cannabis mixed with tobacco, which agrees with a 2017 Global Survey that reported that 90% (80% in Spain) of European cannabis consumers smoke cannabis mixed with tobacco [40]. 2,5-dimethylfuran was detected, always above the cut-off limit, in 28 subjects from this group (96.6%). Of the 7 people that reported that they consume cannabis without mixing it with tobacco, 6 subjects yielded non-detectable levels of 2,5-dimethylfuran and one gave a level below the cut-off limit at 0.014 ppbv.

Chambers et al. [41] used the data from the 2003-2004 National Health and Nutrition Examination Survey (NHANES) to evaluate the blood levels of different VOCs and found that cigarette smoking is a primary source of benzene and toluene and an important source of xylene exposure, which was also confirmed with the NHANES data for the 2005-2006 period [42]. We included the T/B ratio in the present study as this ratio has been used in ambient air quality studies for estimating the ageing of air masses resulting from photochemical pollution and for characterizing the distance from vehicular emission sources, since the main anthropogenic source of VOCs in Western countries is road traffic and this ratio increases with increasing traffic volume [43,44]. In the case of cigarette smoke, different studies have indicated that the T/B ratio is relatively constant, in the 1.2-2.1 range, and without significant differences for different types of cigarettes [33][34][35].
Although both compounds increase their levels in cigarette smoke, this increase is about 1.5 times greater in the case of benzene than of toluene [42], which leads to a decrease in the T/B ratio in cigarette smoke. The results obtained in the present study indicate that the T/B ratio for the smoker group (1.908 [1.503-2.643]) agrees with conventional emissions in cigarette smoke. For non-smokers, a significantly higher exhaled T/B ratio was found (6.965 [4.357-11.478], p<0.001) (Figure 2). These results confirm that the main exposure source for toluene and benzene in the smoker group is cigarette smoke.
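The comparison above can be made concrete with a small sketch that computes a T/B ratio from two breath concentrations and checks it against the 1.2-2.1 range reported for cigarette smoke. The helper names are hypothetical, and returning `None` when benzene is not detected is an assumption about how such samples might be handled, not something stated in the text.

```python
from typing import Optional

# T/B range reported in the text for cigarette smoke emissions.
CIGARETTE_SMOKE_TB_RANGE = (1.2, 2.1)


def tb_ratio(toluene_ppbv: float, benzene_ppbv: float) -> Optional[float]:
    """Toluene-to-benzene ratio; None if benzene is not detected (assumption)."""
    if benzene_ppbv <= 0:
        return None
    return toluene_ppbv / benzene_ppbv


def within_smoke_range(ratio: float) -> bool:
    """True if the ratio falls inside the reported cigarette-smoke range."""
    low, high = CIGARETTE_SMOKE_TB_RANGE
    return low <= ratio <= high


# Median T/B ratios reported in this study.
print("smokers:", within_smoke_range(1.908))
print("non-smokers:", within_smoke_range(6.965))
```

Only the smoker median falls inside the cigarette-smoke range, which matches the interpretation given in the text.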
The univariate analysis results (Table 1) show that all VOCs evaluated and the T/B ratio gave significant differences between smokers and non-smokers, which agrees with previous studies that have demonstrated that active smoking increases the levels of different VOCs related to tobacco smoke in exhaled breath [23][24][25][26][27][28][29][30], and have suggested that this matrix can be used for the assessment of smoking status. Although these results indicate that all the target VOCs may be able to assess smoking status, some studies have indicated that 2,5-dimethylfuran plays a dominant role in distinguishing between smokers and non-smokers [25,27,29,30].
A previous study showed that xylenes and toluene seem to be adequate only in the case of heavy smokers and after short-term exposure (maximum 30-45 minutes after smoking); benzene was useful for medium and heavy smokers, and for as long as 12-13 h after smoking for heavy smokers and up to 2 h for light smokers; whereas 2,5-dimethylfuran was effective for long- and short-term exposure (up to 48 h after smoking) and for light and heavy consumption [29]. That study found a positive, although weak, significant correlation between the daily number of cigarettes smoked and the breath levels detected. It also confirmed that breath levels are time dependent and fall rapidly after smoking. In general, it was found that breath levels depended on a combination of two parameters, time span and cigarette consumption, with the time span after smoking being the more significant.
In the present study, we focused on determining the diagnostic capacity of previously proposed VOCs without differentiating between light and heavy consumers or the time elapsed since smoking, and the only requirement was a minimum consumption of one cigarette per day. For this reason, the ROC curves have been determined (Figure 1), since the AUC of the ROC curve is widely recognized as the measure of a diagnostic test's discriminatory power [45]. The results obtained confirm that, according to the arbitrary classification guidelines based on a suggestion by Swets [46], xylenes and toluene have rather low accuracy (AUC<0.75), and are therefore not useful for the accurate determination of smoking status. Benzene (AUC=0.923, CI: 0.891-0.956) and the T/B ratio (AUC=0.921, CI: 0.891-0.951) have good accuracy and present potential utility as a diagnostic test in some conditions, for example when only a short time has elapsed since smoking. Our results confirm that the compound with the highest discriminant capacity is 2,5-dimethylfuran with AUC=0.982 (CI: 0.969-0.995), a value that indicates excellent discriminatory ability.
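The AUC of a single marker can be computed without plotting the curve, because it equals the Mann-Whitney U statistic divided by the number of smoker/non-smoker pairs. The sketch below illustrates this with a handful of invented breath levels (not study data). It also selects a cut-off by maximizing Youden's J; the text does not say which criterion was used to choose the best cut-off, so that choice is an assumption, as are all function names.

```python
from typing import Sequence, Tuple


def mann_whitney_auc(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Return (U, AUC). U counts pairs where pos > neg; ties count 0.5."""
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u, u / (len(pos) * len(neg))


def youden_cutoff(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A sample is called positive when its value is >= cutoff (assumption).
    """
    best_j, best = float("-inf"), (0.0, 0.0, 0.0)
    for c in sorted(set(pos) | set(neg)):
        sens = sum(v >= c for v in pos) / len(pos)
        spec = sum(v < c for v in neg) / len(neg)
        if sens + spec - 1.0 > best_j:
            best_j, best = sens + spec - 1.0, (c, sens, spec)
    return best


# Hypothetical 2,5-dimethylfuran levels in ppbv (0.0 = not detected).
smokers = [0.020, 0.150, 0.090, 0.012, 0.300, 0.045]
non_smokers = [0.0, 0.0, 0.0, 0.0, 0.0, 0.018, 0.005, 0.0]

u, auc = mann_whitney_auc(smokers, non_smokers)
cutoff, sens, spec = youden_cutoff(smokers, non_smokers)
print(f"U = {u}, AUC = {auc:.3f}")
print(f"cut-off = {cutoff} ppbv, sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

The pairwise loop is quadratic in the sample size, which is acceptable for a cohort of a few hundred subjects; a rank-based formulation would be preferable for much larger data sets.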
The results obtained in the multivariate logistic regression analysis (Table 2) indicate that 2,5-dimethylfuran is practically the only factor that needs to be used to determine the smoking habit. The high odds ratio obtained for 2,5-dimethylfuran can be explained by the fact that this compound was not detected in 174 breath samples (85.7%) from non-smokers, and therefore the differences between the two groups are close to perfect.
2,5-dimethylfuran was detected in 29 breath samples from non-smokers (14.3%) with a median value of 0.048 [0.015-0.106] ppbv and a maximum detected level of 0.210 ppbv. In all these cases, the subjects had drunk a cup of coffee less than 1 h before the sample was taken. It has been reported that 2,5-dimethylfuran can be released by roasted coffee beans [31,32], which is due to roast defects that result in thermal degradation of D-glucose and sugar polymers [32]. To assess the effect of coffee drinking, a group of five non-smoker volunteers were asked to perform breath analysis at different times: just before drinking a coffee and at three different times afterwards (15 min, 1 h, and 3 h later). In no case was 2,5-dimethylfuran detected at the first sampling time, before drinking coffee, but the compound was detected after drinking a coffee, with a maximum level of 0.226 ppbv after 10 min. The level of 2,5-dimethylfuran decreased over time and was never detected at 3 h. This indicates that abstaining from coffee during the previous 3 h should be a requirement for a perfect discriminant detection of smoking status using breath levels of 2,5-dimethylfuran. Nevertheless, we have calculated the cut-off value for 2,5-dimethylfuran in breath from the ROC curve, 0.016 ppbv (sensitivity=0.965, specificity=0.896), for the case in which coffee drinking is not restricted before the analysis.
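A minimal sketch of how the reported cut-off might be applied to a single breath result is given below. Treating a value exactly at the cut-off as positive, representing non-detected results as `None`, and flagging positive results after recent coffee intake as "inconclusive" are all assumptions made for illustration; the study itself applies the cut-off without restricting coffee.

```python
from typing import Optional

CUTOFF_PPBV = 0.016  # cut-off for 2,5-dimethylfuran reported in the text


def classify(level_ppbv: Optional[float], coffee_last_3h: bool = False) -> str:
    """Classify one breath sample from its 2,5-dimethylfuran level.

    level_ppbv is None when the compound was not detected (assumption).
    The coffee flag is a hypothetical safeguard, not part of the study.
    """
    if level_ppbv is None or level_ppbv < CUTOFF_PPBV:
        return "non-smoker"
    if coffee_last_3h:
        return "inconclusive"
    return "smoker"


# Levels mentioned in the text, used here purely as examples.
print(classify(None))                        # not detected
print(classify(0.014))                       # below the cut-off
print(classify(0.050))                       # above the cut-off
print(classify(0.210, coffee_last_3h=True))  # above the cut-off, recent coffee
```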
The results from the ten social smokers not included in the statistical evaluation showed 2,5-dimethylfuran values ranging from non-detected (n=3) to 0.050 ppbv, and 40% of them gave 2,5-dimethylfuran levels above the indicated cut-off limit.
In the case of cannabis consumers, 2,5-dimethylfuran was only detected in breath samples from those people that indicated that they mixed cannabis with tobacco and was either not detected or was below the cut-off limit for people that reported smoking cannabis alone, suggesting the value of this compound as a biomarker for tobacco use.
Finally, breath samples from 64 patients attending two different medical practices were analyzed and the results were compared with their self-reported smoking status. Thirty-nine of the patients reported being smokers and the results obtained for 2,5-dimethylfuran in breath confirmed their smoking status. Twenty-five patients reported not being smokers and the breath analysis indicated that two of them (8%) should be classified as smokers. This percentage of misclassification agrees with previously reported percentages [8][9][10][11][12][13][14].
After a second interview with these two people, both admitted that they were in fact smokers.
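The misclassification figure can be reproduced from the counts given above. In the sketch below each patient is represented as a pair (self-reported smoker, breath test positive); this record layout and the function name are illustrative assumptions, while the counts come from the text.

```python
from typing import List, Tuple

Record = Tuple[bool, bool]  # (self_reported_smoker, breath_test_positive)


def misclassification_rate(records: List[Record]) -> float:
    """Share of self-reported non-smokers whose breath test is positive."""
    non_smokers = [r for r in records if not r[0]]
    if not non_smokers:
        raise ValueError("no self-reported non-smokers in the records")
    flagged = sum(1 for _, positive in non_smokers if positive)
    return flagged / len(non_smokers)


# 39 confirmed smokers, 23 confirmed non-smokers, 2 non-smokers testing positive.
records: List[Record] = (
    [(True, True)] * 39 + [(False, False)] * 23 + [(False, True)] * 2
)

rate = misclassification_rate(records)
print(f"{len(records)} patients, misclassification rate = {rate:.0%}")
```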
The use of breath analysis, applying GC-MS, also permitted the qualitative detection of a significant presence of some VOCs that can be related to the ingestion of gums and candies or the use of toothpaste with flavors such as menthol, eucalyptol, cymene and limonene, which are present in these products. It is interesting to note that some of these compounds were found to be present in large concentrations in the breath samples of the two people who tried to hide their smoking status.

CONCLUSIONS
The results obtained confirm that 2,5-dimethylfuran is the VOC with the highest discriminant capacity for smoking status in breath analysis. This compound was the only compound tested that was able to detect smoking status in people smoking less than 1 cigarette/day and with a time-window of more than 24 h since last smoking.
Benzene and the T/B ratio have good accuracy for assessing smoking status, but these two parameters are not able to detect social smokers or time-windows of more than 12 h.
Although urinary biomarkers such as NNAL are the best choice to accurately confirm smoking status, the analysis of a breath biomarker such as 2,5-dimethylfuran serves as a simple and quick check. The use of breath analysis presents many advantages over conventional blood and urine tests as, in addition to its simplicity, the methodology is not invasive or embarrassing and is well accepted by patients attending clinical practices.
A limitation of our study is that we only included regular tobacco smoke exposure resulting from combustible cigarette use and so we do not know whether or not our results can be extrapolated to the use of other tobacco products.

Conflict of Interest
None declared.

The sensitivity on the ordinate represents the true positive rate whereas 1-specificity represents the false positive rate, where the specificity is the true negative rate.