Feature Influence Based ETL for Efficient Big Data Management
Abstract
The growing volume of big data introduces various challenges for its maintenance and analysis. Various approaches to the problem exist, but they fail to achieve the expected results. To improve big data management performance, this article presents an efficient real-time Extraction, Transformation, and Loading (ETL) framework based on feature influence analysis. The model fetches the big data and preprocesses the data set, analysing the features to find noisy records. The method then performs feature extraction and applies feature influence analysis to the data nodes and to the data they hold. It estimates two measures, Feature Specific Informative Influence (FSII) and Feature Specific Supportive Influence (FSSI), with the support of a data dictionary containing the class ontology of the various data classes. FSII is measured according to the presence of concrete features of a tuple with respect to a data node, whereas FSSI is measured according to the appearance of supportive features of a data point with respect to that data node. Using these measures, the method computes the Node Centric Transformation Score (NCTS). Based on the value of NCTS, the method performs map reduction and merging of data nodes. The NCTS_FIA method achieves higher performance in the ETL process. By adopting feature influence analysis in big data management, ETL performance is improved with lower time complexity.
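The abstract does not give formulas for FSII, FSSI, or NCTS, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes that each data node has a data-dictionary entry listing its concrete and supportive features, that FSII and FSSI are the fractions of those features present in a record, and that NCTS is their product. The names `NodeOntology`, `fsii`, `fssi`, `ncts`, and `assign` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeOntology:
    """Hypothetical data-dictionary entry for one data node (class)."""
    concrete: frozenset    # features that directly characterise the class
    supportive: frozenset  # features that only support membership


def fsii(record_features: set, node: NodeOntology) -> float:
    """Assumed FSII: share of the node's concrete features found in the record."""
    if not node.concrete:
        return 0.0
    return len(record_features & node.concrete) / len(node.concrete)


def fssi(record_features: set, node: NodeOntology) -> float:
    """Assumed FSSI: share of the node's supportive features found in the record."""
    if not node.supportive:
        return 0.0
    return len(record_features & node.supportive) / len(node.supportive)


def ncts(record_features: set, node: NodeOntology) -> float:
    """Assumed NCTS: product of FSII and FSSI (the combination rule is a guess)."""
    return fsii(record_features, node) * fssi(record_features, node)


def assign(record_features: set, ontology: dict) -> str:
    """Pick the data node with the highest assumed NCTS for this record."""
    return max(ontology, key=lambda name: ncts(record_features, ontology[name]))
```

With an ontology of two nodes, a record whose features mostly match one node's concrete and supportive sets would be routed to that node by `assign`; a merging step could then group records by the returned node name.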
Keyword(s)
Cloud, FSII, FSSI, FIA, NCTS, Ontology