Application: predicting antibiotic resistance
In this part of the tutorial, we will model the resistance of 141 Mycobacterium tuberculosis isolates to the antibiotic rifampicin. We will use two learning algorithms: the Set Covering Machine (Marchand and Shawe-Taylor, 2002), which produces sparse, interpretable models, and a Support Vector Machine, which produces a black-box model.
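To give an intuition for how a Set Covering Machine arrives at a sparse model, here is a minimal sketch of its greedy procedure for learning a conjunction of boolean rules. This is an illustration only, not Kover's implementation: the function names (`train_scm_conjunction`, `predict`), the penalty parameter `p`, the `max_rules` limit, and the toy data are all assumptions made for this example.

```python
# Minimal sketch of greedy Set Covering Machine training (conjunction case).
# Illustrative only: names, parameters, and data are hypothetical.

def train_scm_conjunction(X, y, p=1.0, max_rules=3):
    """Greedily select rule indices whose conjunction separates the classes.

    X: list of rows, one per example; X[i][j] is True if rule j holds for example i.
    y: list of labels (1 = positive, 0 = negative).
    p: penalty for each positive example that a rule misclassifies.
    """
    n_rules = len(X[0])
    negatives = {i for i, label in enumerate(y) if label == 0}
    positives = {i for i, label in enumerate(y) if label == 1}
    chosen = []

    while negatives and len(chosen) < max_rules:
        best_rule, best_utility, best_covered = None, float("-inf"), 0
        for j in range(n_rules):
            if j in chosen:
                continue
            # A rule "covers" a negative example when it is False on it,
            # because a conjunction then predicts negative for that example.
            covered = sum(1 for i in negatives if not X[i][j])
            # A rule "loses" a positive example when it is False on it.
            lost = sum(1 for i in positives if not X[i][j])
            utility = covered - p * lost
            if utility > best_utility:
                best_rule, best_utility, best_covered = j, utility, covered
        if best_rule is None or best_covered == 0:
            break  # no remaining rule removes any negative example
        chosen.append(best_rule)
        # Drop the examples that the selected rule has already decided.
        negatives = {i for i in negatives if X[i][best_rule]}
        positives = {i for i in positives if X[i][best_rule]}
    return chosen


def predict(model, row):
    """A conjunction predicts positive only if every selected rule holds."""
    return int(all(row[j] for j in model))


# Toy data (made up): 5 examples described by 3 candidate boolean rules.
X = [
    [True, True, False],
    [True, True, True],
    [True, False, True],
    [False, True, False],
    [False, False, True],
]
y = [1, 1, 0, 0, 0]

model = train_scm_conjunction(X, y)
predictions = [predict(model, row) for row in X]
print("selected rules:", model)
print("predictions:", predictions)
```

The resulting model is just a short list of rule indices, which is what makes it easy to inspect. Increasing `p` makes the procedure more reluctant to pick rules that misclassify positive examples.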
To apply the Set Covering Machine algorithm, we will use Kover (Drouin et al., 2016), a disk-based implementation of this algorithm designed to learn from large genomic datasets. Kover uses reference-free genome comparisons, based on k-mers (nucleotide subsequences of length k), to learn sparse and interpretable models of phenotypes. The models produced by Kover make predictions based on the presence or absence of k-mers in a genome.
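The presence/absence representation can be sketched in a few lines of Python. This is a simplified illustration, not how Kover stores or computes k-mers: the sequences, the isolate names, the choice of k = 3, and the helper `kmers` are all made up for this example (real analyses use much longer k-mers and whole genomes).

```python
# Minimal sketch of a k-mer presence/absence representation.
# Illustrative only: sequences, names, and k are hypothetical.

def kmers(sequence, k):
    """Return the set of all substrings of length k found in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


k = 3
genomes = {
    "isolate_a": "ATGCGT",
    "isolate_b": "ATGAGT",
}

# Each genome is reduced to the set of k-mers it contains.
kmer_sets = {name: kmers(seq, k) for name, seq in genomes.items()}

# The feature space is the union of the k-mers observed in any genome.
vocabulary = sorted(set().union(*kmer_sets.values()))

# One boolean feature per k-mer: True if the k-mer is present in the genome.
presence = {
    name: [kmer in found for kmer in vocabulary]
    for name, found in kmer_sets.items()
}

for name, row in presence.items():
    print(name, dict(zip(vocabulary, row)))
```

A rule in a learned model then reads as a statement such as "the k-mer GCG is present", which can be checked directly against a genome.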
Data
Download and uncompress the data using the following command:
make applications.antibiotics.data
Now, move to the data directory using the following command:
cd kover-example
Kover example
Once this is done, follow the example given in the Kover documentation, but skip the part about downloading the data.
Comparison to SVM
Use the cd .. command to go back to the exercise directory. Then, run the following command to train a Support Vector Machine on this dataset and compare it to Kover:
make applications.antibiotics.svm
Exercise: Which model is more accurate, the SVM or the Kover model? Can you guess why?
More antibiotic resistance models
If you are interested in interpretable models of antibiotic resistance, take a look at the Kover Antimicrobial Resistance Platform, which catalogs the models obtained by applying Kover to a large number of datasets.
References
Drouin, A., Giguère, S., Déraspe, M., Marchand, M., Tyers, M., Loo, V. G., … & Corbeil, J. (2016). Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons. BMC Genomics, 17(1), 754.