TY - JOUR
T1 - Interoperability in Deep Learning: A User Survey and Failure Analysis of ONNX Model Converters
AU - Jajal, Purvish
AU - Jiang, Wenxin
AU - Tewari, Arav
AU - Kocinare, Erik
AU - Woo, Joseph
AU - Sarraf, Anusha
AU - Lu, Yung-Hsiang
AU - Thiruvathukal, George K.
AU - Davis, James C.
N1 - Jajal, P., Jiang, W., Tewari, A., Woo, J., Lu, Y., Thiruvathukal, G.K., & Davis, J.C. (2023). Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem. arXiv:2303.17708.
PY - 2024/09/11
Y1 - 2024/09/11
AB - Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ∼75% of the defects, and that 33% of reported failures are related to semantically incorrect models. The cause of semantically incorrect models is elusive, but models with behavior inconsistencies share operator sequences. Our results motivate future research on making DL interoperability software simpler to maintain, extend, and validate. Research into behavioral tolerances and architectural coverage metrics would be fruitful.
KW - deep learning
KW - empirical software engineering
KW - models
KW - model conversion
UR - https://ecommons.luc.edu/cs_facpubs/349
DO - 10.1145/3650212.3680374
M3 - Article
JO - Computer Science: Faculty Publications and Other Works
JF - Computer Science: Faculty Publications and Other Works
ER -