Automatic Software Summarization - The State of the Art
This technical briefing focuses on automatic software summarization, an emerging and growing field of software engineering research inspired by automatic text summarization. Automatic software summarization is the process of generating a concise representation of one or more software artifacts that conveys the information a software stakeholder needs to perform a particular software engineering task. Unlike text summarization, the summarization of software artifacts sometimes involves analyzing and processing source code, a data source specific to the software domain.
The technical briefing will introduce the main types of summaries (e.g., abstractive, extractive, indicative, and informative) and the main categories of software summarization techniques: (1) Text-to-text summarization: approaches that generate text-based summaries from textual software artifacts, such as bug reports or user reviews. These approaches are the closest to automatic text summarization and build on that research; (2) Code-to-text summarization: approaches that generate text-based summaries from source code artifacts, such as methods, classes, test cases, and code changes; (3) Code-to-code summarization: approaches that generate source-code-based summaries from source code artifacts, such as code fragments or code usage examples; (4) Mixed artifact summarization: approaches that generate summaries for heterogeneous software artifacts (i.e., those containing both text and code), such as programming forum posts.
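To make the first category concrete, the following is a minimal sketch of extractive text-to-text summarization applied to a bug report. It is not a technique taken from the briefing: it assumes a simple word-frequency scoring heuristic, and the stop-word list, the function names (`tokenize`, `extractive_summary`), and the sample bug report are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "it", "on",
              "when", "i", "for", "your", "after", "every"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extractive_summary(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in original order.

    A sentence's score is the average document-wide frequency of its
    non-stop-word tokens (a simple heuristic, assumed for illustration).
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in STOP_WORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


# Hypothetical bug report text.
bug_report = (
    "The app crashes when I open the settings page. "
    "The crash happens on every launch after the update. "
    "I tried reinstalling the app. "
    "Thanks for your help."
)

summary = extractive_summary(bug_report, max_sentences=2)
print(" ".join(summary))
```

Because the summary is built only from sentences already present in the artifact, it is extractive; an abstractive approach would instead generate new text.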
In addition, the technical briefing will discuss the main forms of evaluating automatic software summarization results (e.g., online, offline, intrinsic, and extrinsic) and the challenges associated with each. Finally, we will discuss existing and potential applications of automatically generated software summaries.
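As a rough illustration of an intrinsic, offline evaluation, the sketch below compares a generated summary against a human-written reference using unigram recall, in the spirit of ROUGE-1 from text summarization. The briefing does not name a specific metric, so this choice is an assumption; the function name `unigram_recall` and the example strings are hypothetical.

```python
import re
from collections import Counter


def unigram_recall(candidate, reference):
    """Fraction of reference word occurrences that also appear in the candidate.

    Counts are clipped, so a word repeated in the reference is only credited
    as many times as it occurs in the candidate.
    """
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


# Hypothetical generated summary and human reference summary.
generated = "app crashes on launch"
reference = "the app crashes on launch"

recall = unigram_recall(generated, reference)
print(f"unigram recall: {recall:.2f}")
```

An extrinsic evaluation would instead measure how much a summary helps a stakeholder complete an actual task, which a word-overlap score like this cannot capture.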
Tue 29 May
14:00 - 15:30
16:00 - 17:30