Heterogeneous Graph Learning for Automated Data Flow Analysis in Large Software Repositories

Authors

  • Angela Stephen, Department of Computer Engineering, Tulane State University

Keywords:

Heterogeneous Graph Learning, Data Flow Analysis, Software Repositories, Graph Neural Networks, Code Dependency Graphs, Automated Software Engineering, Static and Dynamic Analysis, Software Vulnerability Detection, Meta-Path Learning, AI-Driven Code Optimization.

Abstract

Modern software development produces large, complex repositories with intricate data flow relationships among components such as source code, dependencies, function calls, and issue-tracking logs. Traditional data flow analysis (DFA) techniques struggle with the heterogeneity and dynamic nature of these repositories, which leads to inefficiencies in vulnerability detection, code optimization, and software maintenance. This paper proposes a Heterogeneous Graph Learning (HGL) framework for automated data flow analysis in large-scale software repositories. The approach constructs a heterogeneous graph in which nodes represent software artifacts (e.g., functions, APIs, libraries, commits) and edges capture their semantic, syntactic, and dependency relationships. Using Graph Neural Networks (GNNs) and meta-path-based learning, the model learns representations of software entities that support anomaly detection, impact analysis, and automated code refactoring recommendations. Experiments on GitHub repositories, open-source software (OSS) datasets, and industry-scale software projects show that the framework outperforms traditional static analysis and deep learning-based approaches in accuracy, scalability, and generalizability. Compared to baseline methods, the results indicate a 27% improvement in data flow prediction accuracy and a 34% reduction in false positives in vulnerability detection. These findings suggest that Heterogeneous Graph Learning offers an effective and scalable approach to automated data flow analysis, software quality assurance, and security assessment in large software repositories.
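
To make the graph construction concrete, the following is a minimal sketch of a typed (heterogeneous) graph and a meta-path lookup. It is not the paper's implementation: the class `HeteroGraph`, the relation names (`modifies`, `calls`, `uses`), and the node names are all hypothetical and chosen only for illustration, and no GNN is involved. It shows only how artifacts of different types can be linked and how instances of one meta-path (commit → function → function → library) can be enumerated.

```python
from collections import defaultdict


class HeteroGraph:
    """Tiny typed graph: nodes carry a type, edges carry a relation name.

    Hypothetical illustration only; not the framework described in the paper.
    """

    def __init__(self):
        self.node_type = {}            # node name -> node type
        self.adj = defaultdict(list)   # (source node, relation) -> [target nodes]

    def add_node(self, name, ntype):
        self.node_type[name] = ntype

    def add_edge(self, src, relation, dst):
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.adj[(src, relation)].append(dst)

    def metapath_instances(self, start_type, relations):
        """Return every path that starts at a node of `start_type`
        and follows the given relations in order."""
        paths = [[n] for n, t in self.node_type.items() if t == start_type]
        for rel in relations:
            # Extend each partial path by one hop along `rel`;
            # paths with no matching outgoing edge are dropped.
            paths = [p + [nxt]
                     for p in paths
                     for nxt in self.adj.get((p[-1], rel), [])]
        return paths


# Assumed toy repository: one commit, two functions, one library.
g = HeteroGraph()
g.add_node("commit_a1", "commit")
g.add_node("parse_input", "function")
g.add_node("run_query", "function")
g.add_node("sqlite3", "library")
g.add_edge("commit_a1", "modifies", "parse_input")
g.add_edge("parse_input", "calls", "run_query")
g.add_edge("run_query", "uses", "sqlite3")

# Meta-path: commit -modifies-> function -calls-> function -uses-> library
paths = g.metapath_instances("commit", ["modifies", "calls", "uses"])
print(paths)
```

In a learning setting, path instances like these would typically define which neighbors a node aggregates information from; the sketch stops at enumeration and leaves the representation learning out.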

Published

2014-08-16

How to Cite

Heterogeneous Graph Learning for Automated Data Flow Analysis in Large Software Repositories. (2014). International Journal of Machine Learning Research in Cybersecurity and Artificial Intelligence, 5(1), 44-53. https://ijmlrcai.com/index.php/Journal/article/view/367
