publications
publications by category in reverse chronological order. generated by jekyll-scholar.
2026
- Energy-Efficient Software Development: A Multi-dimensional Empirical Analysis of Stack Overflow. Bihui Jin, Heng Li, Pengyu Nie, and Ying Zou. In 2026 IEEE/ACM 48th International Conference on Software Engineering (ICSE), May 2026
Energy consumption of software applications has emerged as a critical concern for developers to contemplate in their daily development processes. Previous studies have surveyed a limited number of developers to understand their viewpoints on energy consumption. We complement these studies by analyzing a meticulously curated dataset of 1,193 Stack Overflow (SO) questions concerning energy consumption. These questions reflect real-world energy-related challenges practitioners face during development. To understand practitioners’ perceptions, we investigate the intentions behind these questions, semantic topics, and associated technologies (e.g., programming languages). Our results reveal that: (i) the most prevalent energy consumption topic is about balancing Positioning usage; (ii) efficiently handling data is particularly challenging, with these questions having the longest response times; (iii) practitioners primarily ask questions to understand a concept or API related to energy consumption; and (iv) practitioners are concerned about energy consumption across multiple levels (hardware, operating systems, and programming languages) during energy-efficient software development. Our findings raise awareness about energy consumption’s impact on software development. We also derive actionable implications for energy optimization at different levels (e.g., optimizing API usage or hardware accesses) during energy-aware software development.
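As an illustrative sketch of the kind of API-level optimization such implications point to (hypothetical, not taken from the paper), the Java snippet below contrasts writing a file line by line through a bare FileWriter with wrapping it in a BufferedWriter, which batches small writes in memory and reduces the number of costly I/O operations reaching the hardware.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedIoSketch {
    // Each small write is handed to the file writer individually,
    // which the standard library documents as potentially costly.
    static void writeUnbuffered(String path, String[] lines) throws IOException {
        try (FileWriter out = new FileWriter(path)) {
            for (String line : lines) {
                out.write(line);
                out.write(System.lineSeparator());
            }
        }
    }

    // Buffered variant: small writes are accumulated in memory and
    // flushed in larger chunks, reducing I/O work and, in turn, energy spent on it.
    static void writeBuffered(String path, String[] lines) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(path))) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String[] lines = {"energy", "efficient", "software"};
        writeUnbuffered("unbuffered.txt", lines);
        writeBuffered("buffered.txt", lines);
    }
}
```

The buffering trades a small amount of memory for fewer underlying I/O operations, a common pattern in the data-handling scenarios the abstract highlights as hardest to resolve.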
2025
- CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java Repositories. Kaihang Jiang, Bihui Jin, and Pengyu Nie. In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), Apr 2025
Modern programming languages are constantly evolving, introducing new language features and APIs to enhance software development practices. Software developers often face the tedious task of upgrading their codebase to new programming language versions. Recently, large language models (LLMs) have demonstrated potential in automating various code generation and editing tasks, suggesting their applicability in automating code upgrade. However, there exists no benchmark for evaluating the code upgrade ability of LLMs, as distilling code changes related to programming language evolution from real-world software repositories’ commit histories is a complex challenge. In this work, we introduce CoUpJava, the first large-scale dataset for code upgrade, focusing on the code changes related to the evolution of Java. CoUpJava comprises 10,697 code upgrade samples, distilled from the commit histories of 1,379 open-source Java repositories and covering Java versions 7–23. The dataset is divided into two subsets: CoUpJava-FINE, which captures fine-grained method-level refactorings towards new language features; and CoUpJava-COARSE, which includes coarse-grained repository-level changes encompassing new language features, standard library APIs, and build configurations. Our proposed dataset provides high-quality samples by filtering irrelevant and noisy changes and verifying the compilability of upgraded code. Moreover, CoUpJava reveals diversity in code upgrade scenarios, ranging from small, fine-grained refactorings to large-scale repository modifications.
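To illustrate what a fine-grained, method-level upgrade towards a new language feature might look like, the hypothetical Java example below (not drawn from CoUpJava itself) rewrites a classic switch statement as a switch expression, a feature standardized in Java 14.

```java
public class SwitchUpgradeExample {
    // Before the upgrade: classic switch statement with fall-through labels,
    // valid in all Java versions covered by the dataset.
    static int daysInMonthOld(int month) {
        int days;
        switch (month) {
            case 1: case 3: case 5: case 7: case 8: case 10: case 12:
                days = 31;
                break;
            case 4: case 6: case 9: case 11:
                days = 30;
                break;
            default:
                days = 28;
                break;
        }
        return days;
    }

    // After the upgrade: switch expression with arrow labels, available since Java 14.
    static int daysInMonthNew(int month) {
        return switch (month) {
            case 1, 3, 5, 7, 8, 10, 12 -> 31;
            case 4, 6, 9, 11 -> 30;
            default -> 28;
        };
    }

    public static void main(String[] args) {
        System.out.println(daysInMonthOld(2) == daysInMonthNew(2)); // true
    }
}
```

Both methods return the same result; this kind of behavior-preserving, feature-adopting refactoring is what the abstract describes CoUpJava-FINE as capturing at the method level.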
- Impact of Extensions on Browser Performance: An Empirical Study on Google Chrome. Bihui Jin, Heng Li, and Ying Zou. Empirical Software Engineering (EMSE), Apr 2025
Web browsers have been used widely by users to conduct various online activities, such as information seeking or online shopping. To improve user experience and extend the functionality of browsers, practitioners provide mechanisms to allow users to install third-party-provided plugins (i.e., extensions) on their browsers. However, little is known about the performance implications caused by such extensions. In this paper, we conduct an empirical study to understand the impact of extensions on the user-perceived performance (i.e., energy consumption and page load time) of Google Chrome, the most popular browser. We study a total of 72 extensions from 11 categories (e.g., Developer Tools and Sports), consisting of 61 extensions with distinct types of privacy practices used and 11 extensions without adopting any privacy practices (i.e., no privacy-related data is collected). We observe that browser performance can be negatively impacted by the use of extensions, even when the extensions are used in unintended circumstances (e.g., when logging into an extension is not granted but required, or when an extension is not used for designated websites). We also identify a set of factors that significantly influence the performance impact of extensions, such as code complexity and privacy practices (i.e., collection of user data) adopted by the extensions. Based on our empirical observations, we provide recommendations for developers and users to mitigate the performance impact of browser extensions, such as conducting performance testing and optimization for unintended usage scenarios of extensions, or adhering to proper usage practices of extensions (e.g., logging into an extension when required).
- Learning to Edit Interactive Machine Learning Notebooks. Bihui Jin, Jiayue Wang, and Pengyu Nie. In Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (FSE), Apr 2025
Machine learning (ML) developers frequently use interactive computational notebooks, such as Jupyter notebooks, to host code for data processing and model training. Notebooks provide a convenient tool for writing ML pipelines and interactively observing outputs. However, maintaining notebooks, e.g., to add new features or fix bugs, can be challenging due to the length and complexity of the ML pipeline code. Moreover, there is no existing benchmark related to developer edits on notebooks. In this paper, we present early results of the first study on learning to edit ML pipeline code in notebooks using large language models (LLMs). We collect the first dataset of 48,398 notebook edits derived from 20,095 revisions of 792 ML-related GitHub repositories. Our dataset captures granular details of file-level and cell-level modifications, offering a foundation for understanding real-world maintenance patterns in ML pipelines. We observe that the edits on notebooks are highly localized. Although LLMs have been shown to be effective on general-purpose code generation and editing, our results reveal that the same LLMs, even after finetuning, have low accuracy on notebook editing, demonstrating the complexity of real-world ML pipeline maintenance tasks. Our findings emphasize the critical role of contextual information in improving model performance and point toward promising avenues for advancing LLMs’ capabilities in engineering ML code.