A General Framework of Automated User Intent Mapping to Complex Analytical Workflows

AL-AZAZI, ZYAD ABDULJABBAR MOQBEL
2023/2024

Abstract

The emergence of numerous AutoML (Automated Machine Learning) tools, such as Auto-sklearn and TPOT, has been driven by prior efforts to automate the creation of data pipelines. These tools aim to address the challenges that non-technical users face when applying Machine Learning and Data Science to data-related problems. Creating these ML-specialized pipelines, however, is only the tip of the iceberg on the road to finding the optimal set of pipelines. Existing solutions therefore specialize in selecting the most effective candidate pipelines (or, more accurately, in limiting the search space for these pipelines), whether by optimizing the data preprocessors, the choice of model, or the hyperparameter tuning process. Yet other challenges remain insufficiently addressed by these efforts. The first is the generation of complex analytical workflows that satisfy the needs of diverse users: AutoML tools specialize in ML-focused tasks, such as classification and regression, without considering other data-centered tasks, such as descriptive analytics or data visualization. The second is that the workflows generated by these tools are designed to run on specific execution engines, regardless of users' preferences or their limited expertise. This work generalizes the framework for generating engine-agnostic complex analytical workflows. It addresses the challenges above by developing a new generalized and extensible ontology that represents the entire process of generating analytical workflows from user intents. In addition, it adopts a more general workflow generation algorithm so that the generated workflows can satisfy user intents beyond classification, in this case data visualization.
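The generation step can be pictured as mapping an intent to an ordered sequence of abstract steps and then enumerating the candidate operators for each step. The Python sketch below is a minimal illustration under stated assumptions: the intents, step names and operator names (`INTENT_STEPS`, `IMPLEMENTATIONS`, `MeanImputer`, and so on) are hypothetical, and plain dictionaries stand in for the thesis's ontology, which this sketch does not reproduce. Each generated workflow is a sequence of abstract operator names, which is what keeps it engine-agnostic; translation to a concrete execution engine is not shown.

```python
from itertools import product

# Hypothetical toy stand-in for the ontology: each intent maps to an
# ordered list of abstract steps, and each step to its candidate operators.
INTENT_STEPS = {
    "Classification": ["Imputation", "Normalization", "Learner"],
    "Visualization": ["Imputation", "Plot"],
}
IMPLEMENTATIONS = {
    "Imputation": ["MeanImputer", "DropRows"],
    "Normalization": ["MinMaxScaler", "ZScoreScaler"],
    "Learner": ["DecisionTree", "SVM"],
    "Plot": ["Histogram", "ScatterPlot"],
}


def generate_workflows(intent):
    """Enumerate every logical (engine-agnostic) workflow for an intent.

    A workflow is a tuple of abstract operator names, one per step;
    no engine-specific code is produced at this level.
    """
    if intent not in INTENT_STEPS:
        raise ValueError(f"unknown intent: {intent}")
    # One list of candidate operators per step, in step order.
    choices = [IMPLEMENTATIONS[step] for step in INTENT_STEPS[intent]]
    # Cartesian product: every combination of one operator per step.
    return list(product(*choices))


if __name__ == "__main__":
    for intent in INTENT_STEPS:
        print(intent, len(generate_workflows(intent)))
```

Because every combination is enumerated, the number of candidate workflows grows multiplicatively with the number of operators per step (2 × 2 × 2 = 8 for the toy classification intent, 2 × 2 = 4 for visualization). This is the brute-force behaviour that the rule-based optimization is meant to avoid.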
Finally, a rule-based optimization technique is incorporated into the framework, specifically at the logical level of the workflow generator, to encode heuristic rules that help choose the best preprocessing operators instead of generating all possible combinations by brute force. The experiments compare the previous workflow generation proposal with the one proposed in this thesis and, separately for each intent, compare the workflows selected by the rule-based optimization against all other possible workflows.
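One way to picture rule-based optimization at the logical level is as a set of heuristic functions that pick a single preprocessing operator per step from dataset characteristics, rather than enumerating every combination. The Python sketch below is hypothetical: the `DatasetProfile` class, the operator names, the 0.2 missing-value threshold and the four rules are illustrative assumptions, not the rules encoded in the thesis. Two of the rules rest on well-known properties of learners: decision-tree splits are unaffected by monotonic rescaling of features, while SVMs are sensitive to feature scale.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class DatasetProfile:
    """Hypothetical dataset characteristics inspected by the rules."""
    missing_ratio: float  # fraction of rows with at least one missing value


def choose_imputer(profile):
    # Rule 1: no missing values -> omit the imputation step.
    if profile.missing_ratio == 0:
        return None
    # Rule 2: many incomplete rows -> impute rather than drop them.
    # The 0.2 threshold is an illustrative assumption.
    return "MeanImputer" if profile.missing_ratio > 0.2 else "DropRows"


def choose_scaler(learner):
    # Rule 3: tree splits are invariant to monotonic rescaling,
    # so normalization is skipped for decision trees.
    if learner == "DecisionTree":
        return None
    # Rule 4: scale-sensitive learners (e.g. SVM) get standardization.
    return "ZScoreScaler"


def rule_based_workflows(profile, learners):
    """Return one workflow per learner, with preprocessing chosen by rules."""
    workflows = []
    for learner in learners:
        steps = [choose_imputer(profile), choose_scaler(learner), learner]
        # None marks an omitted step.
        workflows.append(tuple(s for s in steps if s is not None))
    return workflows


def brute_force_workflows(learners):
    """Return every combination of imputer, scaler (or none) and learner."""
    imputers = ["MeanImputer", "DropRows", None]
    scalers = ["MinMaxScaler", "ZScoreScaler", None]
    return [
        tuple(s for s in combo if s is not None)
        for combo in product(imputers, scalers, learners)
    ]
```

For a profile with `missing_ratio=0.35` and the learners `["DecisionTree", "SVM"]`, the rules yield two workflows, `("MeanImputer", "DecisionTree")` and `("MeanImputer", "ZScoreScaler", "SVM")`, whereas `brute_force_workflows` returns 3 × 3 × 2 = 18 candidates.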
2023
Knowledge Graphs
Workflow Generation
Analytical Workflows
AutoML
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80875