Artificial Intelligence (AI) System Evaluation Task
An Artificial Intelligence (AI) System Evaluation Task is a software system evaluation task for assessing AI system properties (such as: AI system accuracy, AI system learning capability, AI system ethical implications).
- Context:
- input: an AI System.
- output: an AI System Evaluation Result.
- ...
- It can (often) be supported by an AI System Evaluation System.
- It can (often) utilize AI System Evaluation Methods for AI system assessment.
- ...
- It can range from being a Quantitative AI System Evaluation Task to being a Qualitative AI System Evaluation Task, depending on its assessment type.
- It can range from being an Offline AI System Evaluation Task to being a Production AI System Evaluation Task, depending on its deployment phase.
- It can range from being a Manual AI System Evaluation Task to being an Automated AI System Evaluation Task, depending on its automation level (see the sketch after this context list).
- ...
- It can require User Study-Based AI System Evaluation with human participants.
- It can include Expert Review-Based AI System Evaluation by domain specialists.
- ...
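The sketch below illustrates what an automated, offline, quantitative AI System Evaluation Task can look like in code: it takes an AI System as input (here modeled as a plain predict callable) and produces an AI System Evaluation Result as output. The `EvaluationResult` class, the `evaluate_system` function, and the toy system are hypothetical names introduced only for illustration, not part of any specific AI System Evaluation System.

```python
# Minimal sketch of an automated, offline, quantitative AI system evaluation task.
# All names (EvaluationResult, evaluate_system, the predict callable) are
# illustrative assumptions, not an established framework or API.
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class EvaluationResult:
    """The AI System Evaluation Result produced by the task."""
    metric_name: str
    score: float
    num_cases: int


def evaluate_system(
    predict: Callable[[Any], Any],      # the AI system under evaluation
    test_inputs: Sequence[Any],         # held-out evaluation inputs
    expected_outputs: Sequence[Any],    # reference (ground-truth) outputs
) -> EvaluationResult:
    """Run the AI system on each test case and score it with a simple accuracy metric."""
    correct = sum(
        1 for x, y in zip(test_inputs, expected_outputs) if predict(x) == y
    )
    return EvaluationResult(
        metric_name="accuracy",
        score=correct / len(test_inputs),
        num_cases=len(test_inputs),
    )


if __name__ == "__main__":
    # Toy AI system: classify an integer as "even" or "odd".
    toy_system = lambda n: "even" if n % 2 == 0 else "odd"
    result = evaluate_system(toy_system, [1, 2, 3, 4], ["odd", "even", "odd", "even"])
    print(result)  # EvaluationResult(metric_name='accuracy', score=1.0, num_cases=4)
```

Treating the system under evaluation as an opaque callable keeps the task independent of how the AI System is implemented, which is what allows the same evaluation task to be run as an Offline AI System Evaluation Task or adapted for a Production AI System Evaluation Task.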
- Example(s):
- Domain-Specific AI Evaluation Tasks, such as:
- Chatbot Evaluation Tasks assessing conversational ability, response accuracy, and user satisfaction
- Image Recognition System Evaluation Tasks measuring classification accuracy, processing speed, and reliability
- Recommendation System Evaluation Tasks analyzing relevance, personalization, and engagement (see the relevance sketch after this example list)
- ...
- Critical AI System Evaluation Tasks.
- Specialized AI Testing Tasks.
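As a concrete illustration of a Recommendation System Evaluation Task that analyzes relevance, the following sketch computes a simple precision@k metric; the function names (`precision_at_k`, `evaluate_recommender`) and the toy data are assumptions made for this example only.

```python
# Illustrative sketch of a recommendation system evaluation task that scores
# relevance with precision@k; names and data are hypothetical.
from typing import Dict, List, Set


def precision_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that the user actually found relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)


def evaluate_recommender(
    recommendations: Dict[str, List[str]],   # user id -> ranked recommended item ids
    ground_truth: Dict[str, Set[str]],       # user id -> item ids the user engaged with
    k: int = 5,
) -> float:
    """Average precision@k over all evaluated users (a simple relevance metric)."""
    scores = [
        precision_at_k(recommendations[user], ground_truth.get(user, set()), k)
        for user in recommendations
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage: two users, k=2.
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}
truth = {"u1": {"a", "c"}, "u2": {"z"}}
print(evaluate_recommender(recs, truth, k=2))  # (0.5 + 0.0) / 2 = 0.25
```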
- Counter-Example(s):
- Non-Technical AI System Evaluation Tasks, which assess management strategies rather than AI systems.
- Hardware AI System Evaluation Tasks, which focus on physical components rather than algorithmic aspects.
- Financial AI System Analysis Tasks, which evaluate financial performance rather than AI capabilities.
- See: AI System Development, Machine Learning Model Evaluation, User-Centered Design, Software Testing.
References
2021
- (Reddy et al., 2021) ⇒ S. Reddy, W. Rogers, V.P. Makinen, E. Coiera, et al. (2021). “Evaluation Framework to Guide Implementation of AI Systems into Healthcare Settings.” In: BMJ Health & Care Informatics. [URL: ncbi.nlm.nih.gov]
- NOTE: It provides an evaluation framework that can be applied at any development or deployment stage of AI systems, with a focus on assessing technical capabilities within healthcare contexts.
2020
- (Jin et al., 2020) ⇒ C. Jin, W. Chen, Y. Cao, Z. Xu, Z. Tan, X. Zhang, L. Deng, et al. (2020). “Development and Evaluation of an AI System for COVID-19.” [URL: pesquisa.bvsalud.org]
- NOTE: It discusses the development and evaluation of an AI system for COVID-19, detailing its comparative performance against radiologists in specific medical imaging applications.
- (McKinney et al., 2020) ⇒ S.M. McKinney, M. Sieniek, V. Godbole, J. Godwin, et al. (2020). “International Evaluation of an AI System for Breast Cancer Screening.” In: Nature. [URL: nature.com]
- NOTE: It focuses on evaluating a new AI system for breast cancer screening, emphasizing the system's development and its effectiveness in cancer detection in mammograms.