TY - JOUR
T1 - Crowdsourcing Detection of Sampling Biases in Image Datasets
AU - Hu, Xiao
AU - Wang, Haobo
AU - Vegesana, Anirudh
AU - Dube, Somesh
AU - Yu, Kaiwen
AU - Kao, Gore
AU - Chen, Shuo-Han
AU - Lu, Yung-Hsiang
AU - Thiruvathukal, George K.
AU - Yin, Ming
N1 - Xiao Hu, Haobo Wang, Anirudh Vegesana, Somesh Dube, Kaiwen Yu, Gore Kao, Shuo-Han Chen, Yung-Hsiang Lu, George K. Thiruvathukal, Ming Yin, Crowdsourcing Detection of Sampling Biases in Image Datasets, The Web Conference 2020.
PY - 2020/4/1
Y1 - 2020/4/1
N2 - Despite many exciting innovations in computer vision, recent studies reveal a number of risks in existing computer vision systems, suggesting that the results of such systems may be unfair and untrustworthy. Many of these risks can be attributed in part to the use of a training image dataset that exhibits sampling biases and thus does not accurately reflect the real visual world. Detecting potential sampling biases in a visual dataset before model development is therefore essential for mitigating fairness and trustworthiness concerns in computer vision. In this paper, we propose a three-step crowdsourcing workflow that brings humans into the loop to facilitate bias discovery in image datasets. Through two sets of evaluation studies, we find that the proposed workflow can effectively organize the crowd to detect sampling biases both in datasets artificially created with designed biases and in real-world image datasets widely used in computer vision research and system development.
KW - Human-centered computing
KW - Collaborative and social computing
UR - https://ecommons.luc.edu/cs_facpubs/244
U2 - 10.1145/3366423.3380063
DO - 10.1145/3366423.3380063
M3 - Article
JO - Computer Science: Faculty Publications and Other Works
JF - Computer Science: Faculty Publications and Other Works
ER -