* ICSE 2018 *
Sun 27 May - Sun 3 June 2018 Gothenburg, Sweden

The ICSE Technical Briefings program provides conference participants with the opportunity to gain new insights, knowledge, and skills in a broad range of areas of software engineering. The audience includes both academic researchers and industry practitioners. Technical Briefings offer a venue for communicating the current state of a timely topic related to software engineering.

Tuesday, May 29

Tracks: Big Data and Machine Learning, Security, Programming

09:00 - 10:30 Session I

  • Georgios Gousios, Big Data Software Analytics with Apache Spark
  • Suraj Kothari, Demystifying Cyber-Physical Malware
  • Tushar Sharma, Detecting and Managing Code Smells: Research and Practice

10:30 - 11:00 Coffee

11:00 - 12:30 Session II

  • Karl Meinke and Amel Bennaceur, Machine Learning for Software Engineering: Models, Methods, and Applications
  • Eric Bodden, State of the Systems Security
  • Thomas Ball, Judith Bishop and Joe Finney, Multi-Platform Computing for Physical Devices with MakeCode and CODAL

12:30 - 14:00 Lunch

Tracks: Natural Language Processing, Testing, Research Methods

14:00 - 15:30 Session III

  • Laura Moreno and Andrian Marcus, Automatic Software Summarization - The State of the Art
  • Peter Zimmerer, Strategies for Continuous Testing in iDevOps
  • Sira Vegas, Analyzing Software Engineering Experiments: Everything You Always Wanted to Know but Were Afraid to Ask

15:30 - 16:00 Coffee

16:00 - 17:30 Session IV

  • Alessio Ferrari, Natural Language Requirements Processing: From Research to Practice
  • Sergio Segura and Zhiquan Zhou, Metamorphic Testing 20 Years Later: A Hands-on Introduction
  • Diomidis Spinellis and Georgios Gousios, How to Analyze Git Repositories with Command Line Tools: We're not in Kansas anymore

Georgios Gousios, Big Data Software Analytics

Abstract. At the beginning of every research effort, researchers in empirical software engineering have to extract data from raw data sources and transform it into the inputs their tools expect. This step is time-consuming and error-prone, while the produced artifacts (code, intermediate datasets) are usually not of scientific value. In recent years, Apache Spark has emerged as a solid foundation for data science and has taken the big data analytics domain by storm. With Spark, researchers can map their data sources into immutable lists or data frames and transform them using a declarative API based on functional programming primitives. The primitives exposed by the Apache Spark API can help software engineering researchers create and share reproducible, high-performance data analysis pipelines that automatically scale processing to clusters of machines.

This technical briefing will cover the following topics:

  • Functional programming basics: what is map? What is fold? What do group by and join do?
  • Apache Spark in a nutshell: what are RDDs and what are DataFrames? How can we query any dataset with SQL?
  • Live demo: applying Apache Spark to a software engineering task.
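
As a rough illustration of the functional primitives named in the first topic, the sketch below implements map, fold, group by, and join in plain Python rather than in Spark. It is not taken from the briefing: the commit records, the repository-to-language table, and the helper names group_by and inner_join are all hypothetical, chosen only to give the primitives something to operate on.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical commit records: (author, repository, lines_changed).
commits = [
    ("alice", "repo-a", 10),
    ("bob", "repo-a", 5),
    ("alice", "repo-b", 7),
    ("carol", "repo-b", 3),
]
# Hypothetical lookup table: (repository, language).
languages = [("repo-a", "Java"), ("repo-b", "Scala")]

# map: apply a function to every element, producing a new collection.
lines = list(map(lambda c: c[2], commits))

# fold (called reduce in Python): combine all elements into a single
# value, starting from an initial accumulator of 0.
total_lines = reduce(lambda acc, n: acc + n, lines, 0)


def group_by(key, items):
    """group by: collect the elements that share the same key."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


def inner_join(left, right, left_key, right_key):
    """join: pair records from two datasets that agree on a key."""
    index = group_by(right_key, right)
    return [(l, r) for l in left for r in index.get(left_key(l), [])]


by_author = group_by(lambda c: c[0], commits)

# Join commits with languages on the repository name, then group the
# joined pairs by language and fold each group into a line count.
joined = inner_join(commits, languages, lambda c: c[1], lambda l: l[0])
lines_per_language = {
    lang: reduce(lambda acc, pair: acc + pair[0][2], pairs, 0)
    for lang, pairs in group_by(lambda pair: pair[1][1], joined).items()
}

print(total_lines)
print(lines_per_language)
```

Spark exposes operations with the same shape on distributed collections; the in-memory version here only shows what each primitive computes, not how Spark partitions or schedules the work.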

The speaker has extensive experience in applying big data technologies to software engineering data, and has been teaching Apache Spark to BSc and MSc students.

Georgios Gousios is an assistant professor at the Software Engineering group, Delft University of Technology. His research interests include software engineering, software analytics and programming languages. He works in the fields of distributed software development processes, software quality, software testing, dependency management and research infrastructures. His research has been published in top venues, where he has received four best paper awards and various nominations. In total, he has published more than 50 papers and also co-edited the "Beautiful Architectures" book (O'Reilly, 2009). He is the main author of the GHTorrent data collection and curation framework and the Alitheia Core repository mining platform. Georgios holds an MSc from the University of Manchester and a PhD from the Athens University of Economics and Business, both in software engineering. In addition to research, he is also active as a speaker at both research and practitioner-oriented conferences.


Suraj Kothari, Demystifying Cyber-Physical Malware

Abstract. The imminent danger of cyber-physical malware (CPM) is evident from attacks such as the power outage in Ukraine or the hijacking of a Jeep Cherokee. The traditional notion of malware is too narrow, and the prevalent characterizations (virus, worm, Trojan horse, spyware, etc.) are neither precise nor comprehensive enough to characterize CPM. Detecting sophisticated CPM is like searching for a needle in a haystack without knowing what the needle looks like. The technical briefing will bring together interdisciplinary knowledge to describe the fundamentals of CPM, the mathematical foundation for analyzing and verifying CPM, the current state of the art, the challenges, and directions for future research. Employing real-world examples, we will illustrate the challenges of analyzing and verifying CPM.

CPS security problems are often rooted in the complex CPS software. It is hard for the CPS community to understand intricacies of software analysis and verification. And for the software engineering community, the lack of adequate CPS knowledge is a major roadblock. This makes it important to demystify CPM, so that software engineers can model the CPM problems, establish the mathematical foundation, and advance the software analysis and verification techniques to effectively address the CPM problems.

The knowledge about CPM gained through this technical briefing will help participants understand the need for new modeling, analysis, and verification techniques. The real-world CPM examples from the technical briefing will bring out the limitations of current software security techniques and the need for new research directions. The experiment-discover paradigm woven through the briefing will interest those who want to innovate education in ways that arouse students' curiosity and creativity.

The briefing will be shaped from the perspective of crucial needs for modeling, analyzing, and verifying CPM. It will cover:

  • Modeling: The Confidentiality-Integrity-Availability (CIA) triad characterizes the impact of the malware but it is not meant to facilitate analysis or verification of software. Modeling research is needed to characterize the program artifacts that enact CPM.
  • Analysis: Complete automation and machine learning are emphasized in many current research approaches to analyze software for security. The technical briefing will illustrate the shortcomings of such techniques and reflect on the need for a new type of analysis to address CPM.
  • Verification: Given the complexity of CPM and the possibility of catastrophic consequences, we will discuss the need for transparent verification that enables a human to easily participate by crosschecking the tool results or completing the verification where automation falls short.

Suraj Kothari is the Richardson Chair Professor at Iowa State University, and the founding President of EnSoft. He has served as the PI on the DARPA APAC and (ongoing) STAC projects. More than 350 automotive, aerospace, and defense companies license EnSoft's model-based software products worldwide. Kothari has served as a Distinguished ACM Lecturer. He led the effort to establish a highly successful software engineering undergraduate degree program at Iowa State University. In 2012 he was awarded the Iowa State Board of Regents Professor Award for excellence in research, teaching, and service. Kothari's team has developed the Atlas platform to build software analysis, program comprehension, and software verification tools for multiple programming languages. A graph database, graph abstractions, a graphical query language, built-in commonly used static analyses, and interactive visualization capabilities in Atlas make it easy to develop powerful tools using Java and the Atlas APIs. Demo videos are available for Atlas (ICSE 2014) and for the Android Security Toolbox built on Atlas (ICSE 2015).


Tushar Sharma, Detecting and Managing Code Smells: Research and Practice

Abstract. Code smells in a software system indicate the presence of quality problems that make the software hard to maintain and evolve. Smells not only impact maintainability but also negatively affect other quality attributes such as reliability and testability. Given the importance of smells and their potential impact on software quality, software engineering researchers have explored smells and the various dimensions associated with them in great breadth and depth over the last two decades. Identifying code smells automatically and refactoring them helps software engineering practitioners keep the software maintainable. Therefore, automatic smell identification has enjoyed active interest from the research community and is appreciated by practitioners.

We divide our session into three parts. In the first part of the session, we present a comprehensive overview of the literature by covering various dimensions of the metaphor. We present defining characteristics of smells synthesized from a comprehensive set of definitions discussed in the literature. The smell metaphor has been extended to other similar domains such as configuration management, spreadsheets, and presentations. We present a summary of the types of smells described in the literature. Smells can arise from a wide variety of factors, including lack of skill or awareness and frequently changing requirements; we discuss a curated set of ten such factors that cause smells. Further, we summarize the impact of smells on people, artifacts, and processes.

In the second part, we delve into the details of the smell detection methods currently prevalent in both research prototypes and industrial tools. Traditionally, smells are detected by metrics-based and rule-based approaches. History-based and optimization-based approaches are alternatives that have also been employed by the community. Recently, machine-learning-based approaches have been used to detect smells. We aim to present a synthesized overview of the current approaches. Additionally, we present the intricacies of developing a smell detection tool that we learned from developing Designite, Puppeteer and other tools. Designite is a software design quality assessment tool that detects smells at implementation, design, and architecture granularity.

Keeping the software maintainable is a non-trivial challenge for a software development team given real-life constraints (such as time pressure). The final part of the session deals with such challenges and presents actionable and pragmatic strategies and practices for practitioners to avoid, detect, and eradicate smells from their codebase. These strategies focus on three major pillars of software development - people, process, and tools.

The session offers contributions to both research and practice. For researchers, it provides a comprehensive overview of the domain of code smells. It also reveals the intricacies of developing a smell detection tool. At the same time, practitioners can learn about the potential quality issues that may arise in their codebase so that they can avoid them. Furthermore, practitioners can apply the pragmatic strategies presented in this session to identify, interpret, and refactor smells.

Tushar Sharma is a researcher at Athens University of Economics and Business, Athens, Greece. He has more than ten years of industrial work experience including seven years at Siemens Research and Technology Center, Bangalore, India. He earned an MS degree in Computer Science from the Indian Institute of Technology-Madras, Chennai, India, on the topic of design patterns and refactoring. He co-authored the book "Refactoring for Software Design Smells: Managing Technical Debt" published by Morgan Kaufmann in 2014. He has also co-authored two Oracle Java certification books. He has developed Designite, a software design quality assessment tool that is being used by many practitioners and researchers worldwide. He has delivered talks at many academic as well as developer conferences; he co-presented a tutorial titled "Software design quality in practice: Refactoring for software design smells" at ICSE 2014. He is an IEEE Senior Member.


Karl Meinke and Amel Bennaceur, Machine Learning for Software Engineering: Models, Methods, and Applications

Abstract. Machine Learning (ML) is the discipline that studies methods for automatically inferring models from data. Machine learning has been successfully applied in many areas of software engineering ranging from behaviour extraction, to testing, to bug fixing. Many more applications are yet to be defined. However, a better understanding of ML methods, their assumptions and guarantees would help software engineers adopt and identify the appropriate methods for their desired applications. We argue that this choice can be guided by the models one seeks to infer. In this technical briefing, we review and reflect on the applications of ML for software engineering organised according to the models they produce and the methods they use. We introduce the principles of ML, give an overview of some key methods, and present examples of areas of software engineering benefiting from ML. We also discuss the open challenges for reaching the full potential of ML for software engineering and how ML can benefit from software engineering methods. In particular, we will focus on three questions that we think should be addressed before attempting any new ML solution to an existing software engineering problem. These are: (1) What class of learned models is appropriate for solving an SE problem? (2) For this class of models, are there any existing learning algorithms that will work for typical instances and sizes of my SE problem? Otherwise, is it possible to adapt any fundamental ML principles to derive new learning algorithms? (3) Has anyone considered a similar SE problem, and was it tractable to an ML solution?

Karl Meinke and his group, over the last ten years, have pioneered new applications of machine learning combined with model checking to the problem of functional requirements testing for software and systems. The group's research has studied the learning problem for both procedural and reactive systems. Numerical function approximation as well as symbolic automaton learning methods have been considered. The group's main requirements testing tool LBTest [http://www.lbtest.org] has been evaluated in sectors such as automotive, avionics, finance and web, and by major Swedish multinational companies such as SAAB Aerospace, Scania and Volvo. Meinke has a publication track record in the areas of machine learning for finite and infinite state systems, theoretical principles of learning-based testing, and practical tools and case studies for learning-based testing.

Amel Bennaceur is a Lecturer (Assistant Professor) in Computing at the Open University, UK. She received her PhD degree in Computer Science from the University of Paris VI in 2013. Her research interests include dynamic mediator synthesis for interoperability and collaborative security. She was part of the Connect and EternalS EU projects that explored synergies between machine learning and software synthesis. The results of her work have been published in leading conferences and journals such as Middleware, ECSA, and IEEE TSE. She has also been invited to present the results of this work at various scientific events such as Dagstuhl and Shonan seminars.


Eric Bodden, State of the Systems Security

Abstract. Software-intensive systems are increasingly pervading our everyday lives. As they become more and more connected, they are opened up to far-reaching cyber-attacks. Moreover, a recent study by the U.S. Department of Homeland Security shows that more than 90% of current cyber-attacks are enabled not by faulty crypto, networks or hardware but by application-level implementation vulnerabilities.

In this technical briefing I will thus discuss current challenges and solutions for the secure engineering of software-intensive systems:

  • Why are current systems as insecure as they are?
  • What does it take to implement a Secure Software Engineering Lifecycle?
  • How can one secure software architectures, and which role does code analysis play?
  • Which are current hot topics in the security community that we as software engineers should address?

I will address those questions by referring to current security incidents and by explaining a state-of-the-art secure engineering lifecycle. In doing so, I will refer to hands-on experiences that I have gained in projects during which we introduced security engineering into major engineering companies.

Eric Bodden is one of the leading experts on secure software engineering, with a specialty in building highly precise tools for automated program analysis. He is Professor for Software Engineering at Paderborn University and co-director of Fraunhofer IEM. Further, he is a member of the directorate of the Collaborative Research Center CROSSING at TU Darmstadt. At Fraunhofer IEM, Bodden heads the Attract-Group on Secure Software Engineering. In this function he develops code analysis technology for security, in collaboration with leading national and international software development companies. In 2016, Bodden's research group took first place in the German IT-Security Prize. In 2014, the DFG awarded Bodden the Heinz Maier-Leibnitz-Preis. In 2013, BITKOM elected him into their mentoring program BITKOM Management Club.


Thomas Ball, Judith Bishop and Joe Finney, Multi-Platform Computing for Physical Devices with MakeCode and CODAL

Abstract. Modern software increasingly has to encompass computing devices ranging from sensors to actuators and controllers. How can non-experts program such systems? Is it possible to reap the benefits of such devices when the software development is likely to be done by programmers who are learning or who are not aiming to be technical experts? The challenge is to hide complexity and still provide flexibility and reliability. This briefing will examine these issues via a case study of MakeCode and CODAL, two symbiotic platform-independent frameworks which exhibit many innovative engineering features for physical computing.

MakeCode is an open source web-based environment for coding physical computing devices such as the micro:bit. It has novel features all running in a single web application (web app). No software installation is required, and the system runs entirely within the browser, allowing it to work in online and offline environments: MakeCode runs in any modern web browser on almost any operating system, including desktops, laptops, tablets, and even smartphones.

CODAL, the Component Oriented Device Abstraction Layer, is an open source lightweight C/C++ runtime environment for resource-constrained physical computing devices. CODAL enables easy programming of devices in C via either synchronous or asynchronous, event-based programming. It consists of a tailored non-preemptive scheduler, asynchronous procedure calls, optimized type-safe memory allocation, strongly typed device drivers and a structured object-oriented API to device hardware.

Together, MakeCode and CODAL provide a substrate that raises the level of abstraction of physical computing devices in order to allow higher-level languages such as JavaScript to be supported. Attendees will be able to experience the system first hand with the micro:bit and Adafruit Circuit Playground Express devices provided. The briefing will have a broad appeal to researchers, educators and industrialists.

Thomas (Tom) Ball is a principal researcher and manager at Microsoft Research. Tom initiated the influential SLAM software model-checking project with Sriram Rajamani, which led to the creation of the Static Driver Verifier tool for finding defects in Windows device drivers. Tom is a 2011 ACM Fellow for "contributions to software analysis and defect detection." As a manager, he has nurtured research areas such as automated theorem proving, program testing/verification, and empirical software engineering. Most recently, he has worked on the Microsoft MakeCode project.


Judith Bishop is an Extraordinary Professor at Stellenbosch University and previously a Director of Computer Science at Microsoft Research Outreach. She focuses on making programming and programming languages easier to use and less error-prone. She has been part of the development of three multi-platform systems - Views for C#, TryF# and TouchDevelop - and has written about principles for higher-order programming with design patterns. Her 16 textbooks have been translated into 6 languages. She is an ACM Distinguished Member. She was part of the team that developed the BBC micro:bit.


Joe Finney is a senior lecturer in the School of Computing and Communications at Lancaster University. His research interests include networked mobile systems, support for lightweight embedded systems, and novel mobile applications. Early in his career, he worked with Microsoft to develop next-generation mobile internet protocols for the Windows and Linux operating systems. He also designed, developed and patented a technology known as Firefly that enables the real-time modelling and control of large-scale 3D artistic LED displays. More recently, he was a founding partner of the BBC micro:bit initiative, where he worked with colleagues at Microsoft, ARM and the open source community to develop the underlying software for the BBC micro:bit. He is now seconded into the Micro:bit Education Foundation and continues to work with these project colleagues to maximize the impact of the micro:bit and related technologies around the world.


Laura Moreno and Andrian Marcus, Automatic Software Summarization - The State of the Art

Abstract. The focus of this technical briefing is on automatic software summarization, an emerging and growing field in software engineering research, inspired by automatic text summarization. Automatic software summarization refers to the process of generating a concise representation of one or more software artifacts that conveys the information needed by a software stakeholder to perform a particular software engineering task. Different from text summarization, the summarization of software artifacts sometimes involves analysis and processing of source code, a data source specific to the software domain.

The technical briefing will introduce the main types of summaries (e.g., abstractive, extractive, indicative, and informative) and the main categories of software summarization techniques:

  • Text-to-text summarization: approaches that generate text-based summaries from textual software artifacts, such as bug reports or user reviews. These approaches are the closest to automatic text summarization and rely on that research.
  • Code-to-text summarization: approaches that generate text-based summaries from source code artifacts, such as methods, classes, test cases, and code changes.
  • Code-to-code summarization: approaches that generate source-code-based summaries from source code artifacts, such as code fragments or code usage examples.
  • Mixed artifact summarization: approaches that generate summaries for heterogeneous software artifacts (i.e., those containing text and code), such as programming forum posts.

In addition, the technical briefing will discuss the main forms of evaluation of automated software summarization results (e.g., online, offline, intrinsic, and extrinsic) and the challenges associated with each. Finally, we will discuss existing and potential applications of automatically generated software summaries.

Laura Moreno is an Assistant Professor in the Department of Computer Science at Colorado State University. She is particularly interested in software maintenance and evolution research. Her work leverages information contained in various software artifacts through techniques that are at the intersection of software engineering, information retrieval, natural language processing, data mining, and machine learning. An instance of this work is her dissertation, which focused on the generation of software documentation through summarization of source code artifacts. Papers resulting from her research have been published in top software engineering venues. She has served as organizing committee member and program committee member for several conferences in the field, and as a reviewer for several software engineering journals.

Andrian Marcus is a Professor in the Department of Computer Science at The University of Texas at Dallas. His current research interests are in software engineering, with a focus on software evolution and program comprehension. He is best known for his work on using text retrieval and analysis techniques on software corpora for supporting comprehension during software evolution. Together with his students and collaborators, he received several Best Paper Awards and Most Influential Paper Awards at software engineering conferences, and the NSF CAREER award. He served on the editorial board of the IEEE Transactions on Software Engineering (2014-2018), and currently serves on the editorial board of the Empirical Software Engineering Journal, and the Journal of Software: Evolution and Process. He delivered more than 40 invited talks at various education and research institutions. These include tutorials and technical briefings at: ICSE'17, ICSE'16, SPLASH'15, FSE'15, ICSE'15, ICSE'12, ASE'12, ASE'11, FSE'11, ASE'10.


Peter Zimmerer, Strategies for Continuous Testing in iDevOps

Abstract. Continuous testing is the key for DevOps. But what is the right strategy, especially in industrial domains ("industrial-grade DevOps" - "iDevOps") that face many additional challenges compared to the ideal world of pure IT systems? This tutorial presentation describes the state of the art in the emerging area of continuous testing in a DevOps context. It specifies the building blocks of a strategy for continuous testing in industrial-grade DevOps projects (iDevOps) and shares our motivations, achievements, and experiences on our journey to transform testing into the iDevOps world. Mindset and culture are especially important for any DevOps transformation. But just arguing that "we need motivated people with the right mindset" is not enough; therefore many concrete examples of potential conflicts, misalignment, and new priorities that require a cultural change and different behavior for iDevOps are discussed and explained ("mindset by example"). Attend this presentation to learn what industrial-grade DevOps (iDevOps) is all about and why we at Siemens are driving it forward, and to be able to use the strategies, tactics, and practices in your projects as a major lever for better testing. We are continuing our journey in this direction - that means a tremendous upgrade and empowerment of testing!

Agenda:

  • What & Why? – Continuous testing defined; Specifics of industrial-grade DevOps (iDevOps); Testing areas - What’s new and different in continuous testing?
  • How? – Prerequisites and preconditions for continuous testing; Mindset by example; Enablers for continuous testing; Practices, recommendations, and experiences.

Key Takeaways:

  • Understand why and how testing is changing dramatically in the iDevOps world – A strategy for continuous testing is the heartbeat of iDevOps; Continuous Testing is the linchpin of automated CI and CD.
  • Get to know the building blocks of a successful strategy for continuous testing in iDevOps.
  • Exploit the discussed examples to create and foster an adequate mindset and culture.

Peter Zimmerer is a Principal Key Expert Engineer at Siemens AG, Corporate Technology, in Munich, Germany. He studied Computer Science at the University of Stuttgart, Germany and received his M.Sc. degree (Diplom-Informatiker) in 1991. He is an ISTQB® Certified Tester Full Advanced Level and member of the German Testing Board (GTB). For more than 25 years he has been working in the field of software testing and quality engineering for object-oriented, component-based, distributed, service-oriented, and embedded software systems. At Siemens he performs consulting, coaching, and training on test management and test engineering practices including test strategies, test architectures, test methods, test techniques, test processes, and test automation in real-world projects and drives research and innovation in this area. He is the author of several papers and articles and a frequent speaker at international conferences.


Sira Vegas, Analyzing Software Engineering Experiments: Everything You Always Wanted to Know but Were Afraid to Ask

Abstract. Experimentation is a key issue in science and engineering. But it is one of software engineering's stumbling blocks. Quite a lot of experiments are run nowadays, but it is a risky business. Software engineering has some special features, leading to some experimentation issues being conceived of differently than in other disciplines. The aim of this technical briefing is to help participants avoid common pitfalls when analyzing the results of software engineering experiments. The technical briefing is not intended as a data analysis course, because there is already plenty of literature on this subject. It reviews several key issues that we have identified in published software engineering experiments, and addresses them based on the knowledge acquired over 19 years of running experiments. The technical briefing starts by recalling what an SE experiment is, and its distinctive features: control and causality. Then, five topics related to data analysis will be explored. The technical briefing will end with participants' questions.

The five topics covered are:

  • One-tailed vs. two-tailed tests. SE experimenters very often opt for a one-tailed hypothesis, but this can be a shortcoming in many experiments.
  • Choosing the right data analysis technique. The selected data analysis technique should match the experimental design. However, the choice is not straightforward, and several issues have to be taken into consideration.
  • Analyzing tricky designs. We will discuss two designs which are commonly not properly analyzed: blocked and crossover designs.
  • Parametric vs. non-parametric tests. We will discuss the different options that can be used when data do not meet the assumptions of parametric tests.
  • The 3 musketeers: statistical significance, effect size and power. We will discuss the meaning and implications of the three parameters, how they relate to each other, and how they should be used to properly interpret the results of an experiment.
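
To make the first and last topics concrete, the sketch below contrasts a one-tailed and a two-tailed p-value and computes an effect size on a tiny dataset. It is not taken from the briefing: the completion times are invented, the exact permutation test is only one of several possible non-parametric choices, and the function name permutation_p_values is hypothetical. Statistical power is not computed here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: task completion times (minutes) under two techniques.
treatment = [12.0, 14.0, 11.0, 13.0, 15.0]
control = [16.0, 18.0, 15.0, 17.0, 14.0]


def permutation_p_values(a, b):
    """Exact permutation test on the difference in means (a minus b).

    Returns the observed difference, a one-tailed p-value for the
    alternative "mean of a is lower than mean of b", and a two-tailed
    p-value for "the means differ".
    """
    pooled = a + b
    observed = mean(a) - mean(b)
    eps = 1e-12  # guards the comparisons against rounding noise
    lower = extreme = total = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(group_a) - mean(group_b)
        total += 1
        if diff <= observed + eps:
            lower += 1  # at least as low as observed (one direction)
        if abs(diff) >= abs(observed) - eps:
            extreme += 1  # at least as extreme in either direction
    return observed, lower / total, extreme / total


observed, p_one, p_two = permutation_p_values(treatment, control)

# Effect size: Cohen's d with a pooled standard deviation
# (the simple average of variances is valid because the groups are equal in size).
pooled_sd = sqrt((stdev(treatment) ** 2 + stdev(control) ** 2) / 2)
d = observed / pooled_sd

print(f"difference={observed:.2f} one-tailed p={p_one:.4f} "
      f"two-tailed p={p_two:.4f} Cohen's d={d:.2f}")
```

Because the two groups have the same size, the permutation distribution is symmetric and the two-tailed p-value is exactly twice the one-tailed one. This is why choosing a one-tailed hypothesis after seeing the data makes a result look more significant than it is, while the effect size is unaffected by that choice.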

Sira Vegas received her PhD degree from the Universidad Politecnica de Madrid in 2002. She is currently an associate professor of software engineering at Universidad Politecnica de Madrid. Her main research interests are experimental software engineering and software testing. She is a regular reviewer for highly ranked journals such as IEEE Transactions on Software Engineering, Empirical Software Engineering Journal, ACM Transactions on Software Engineering and Methodology and Information and Software Technology. Dr. Vegas was program chair for the International Symposium on Empirical Software Engineering and Measurement (ESEM) in 2007. She began her career as a summer student at the European Centre for Nuclear Research (CERN, Geneva) in 1995. She was a regular visiting scholar of the Experimental Software Engineering Group at the University of Maryland from 1998 to 2000, visiting scientist at the Fraunhofer Institute for Experimental Software Engineering in Germany in 2002 and regular visiting professor at the M3S group of the University of Oulu in Finland from 2015 to 2017.


Alessio Ferrari, Natural Language Requirements Processing: From Research to Practice

Abstract. Automated manipulation of natural language requirements, for classification, tracing, defect detection, information extraction, and other tasks, has been pursued by requirements engineering (RE) researchers for more than two decades. Recent technological advancements in natural language processing (NLP) have made it possible to apply this research more widely within industrial settings. This technical briefing targets researchers and practitioners, and aims to give an overview of what NLP can do today for RE problems, and what it could do if specific research challenges, some of which emerge from practical experience, are addressed. The talk will: survey current research on applications of NLP to RE problems; present representative industrially-ready techniques, with a focus on defect detection and information extraction problems; present enabling technologies in NLP that can play a role in RE research, including distributional semantics representations; discuss criteria for evaluation of NLP techniques in the RE context; and outline the main challenges for a systematic application of the techniques in industry. The crosscutting topics that will permeate the talk are the need for domain adaptation and the essential role of the human-in-the-loop.

Alessio Ferrari is a Research Scientist at Consiglio Nazionale delle Ricerche Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo" (CNR-ISTI), Italy. He received a PhD in computer engineering from the University of Florence in 2011. He has about 10 years of research experience, and 3 years of industrial experience at General Electric Transportation Systems, a world-leading railway signalling company. His main research interests are: (a) application of natural language processing technologies to requirements engineering, with a focus on detection of ambiguity and communication defects in requirements documents and requirements elicitation interviews; (b) software process improvement for safety-critical systems, with a focus on formal/semi-formal model-based development and code generation in the railway domain. He is co-author of about 40 publications in international conferences and journals. In 2015, he received the Best Paper Award at the IEEE RE conference. He recently co-organised the NLP4RE workshop, and serves on the program committee of international conferences such as RE and REFSQ. He has participated in several regional and EU projects, and he is currently Work Package leader of the H2020 EU project ASTRail, concerning formal methods in railways. He is also actively involved in technology transfer studies with railway companies, such as Alstom, SIRTI and ECM. Contact him at: alessio.ferrari@isti.cnr.it.


Sergio Segura and Zhiquan Zhou, Metamorphic Testing 20 Years Later: A Hands-on Introduction

Abstract. Two of the key challenges in software testing are the automated generation of test cases, and the identification of failures by checking test outputs. Both challenges are effectively addressed by metamorphic testing (MT), a software testing technique where failures are not revealed by checking an individual concrete output, but by checking the relations among the inputs and outputs of multiple executions of the software under test. Two decades after its introduction, MT is becoming a fully-fledged testing paradigm with successful applications in multiple domains including, among others, big data engineering, simulation and modeling, compilers, machine learning programs, autonomous cars and drones, and cybersecurity. This technical briefing will provide an introduction to MT from a double perspective. First, Sergio Segura will present the technique and the results of a novel survey outlining its main trends and lessons learned. Then, Zhi Quan Zhou will go deeper and present some of the successful applications of the technique, as well as challenges and opportunities on the topic. The briefing will be complemented with practical exercises on testing real web applications and APIs.
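
As a minimal illustration of the idea (not one of the briefing's exercises), the Python sketch below checks a metamorphic relation for a sorting routine: permuting the input must not change the output. The names `sort_under_test`, `check_permutation_relation`, and `run_metamorphic_tests` are hypothetical. Note that no expected output for any individual input is required, and the source inputs are generated automatically.

```python
import random


def sort_under_test(items):
    """Stand-in for the software under test (hypothetical)."""
    return sorted(items)


def check_permutation_relation(program, source_input, rng):
    """Metamorphic relation: shuffling the input must not change the output."""
    follow_up_input = list(source_input)
    rng.shuffle(follow_up_input)
    return program(source_input) == program(follow_up_input)


def run_metamorphic_tests(program, trials=100, seed=0):
    """Generate random source inputs and count violations of the relation."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        source_input = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if not check_permutation_relation(program, source_input, rng):
            violations += 1
    return violations
```

A call such as `run_metamorphic_tests(sort_under_test)` returns the number of violated relations; any non-zero count reveals a failure without an oracle for individual outputs.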

Sergio Segura obtained a PhD in Software Engineering in 2011 at Seville University, where he currently works as a senior lecturer. His research interests include software testing, software variability and search-based software engineering. He has co-authored some highly-cited papers as well as tools used by universities and companies in several countries. He also serves regularly as a reviewer for international journals and conferences. In 2016, he co-founded the ICSE International Workshop on Metamorphic Testing (ICSE MET'16).



Zhi Quan (George) Zhou received the BSc degree in Computer Science from Peking University, China, and the PhD degree in Software Engineering from The University of Hong Kong. He is currently an associate professor in software engineering at the University of Wollongong, Australia. His current research interests include software testing and debugging, software quality assessment, user experience improvement, and citation analysis. Zhou is one of the early pioneers who co-founded the research field of metamorphic testing. In 2016, he co-founded the ICSE International Workshop on Metamorphic Testing (ICSE MET'16). He was an invited keynote speaker at ICSE MET'17. Zhou was selected for a Virtual Earth Award by Microsoft Research, Redmond.


Diomidis Spinellis and Georgios Gousios, How to Analyze Git Repositories with Command Line Tools: We’re not in Kansas anymore

Abstract. Git repositories are an important source of empirical software engineering product and process data. Running the Git command line tool and processing its output with other Unix tools allows the incremental construction of sophisticated and highly efficient data processing pipelines. Git data analytics on the command line can be systematically presented through a pattern that involves fetching, selection, processing, summarization, and reporting. For each part of the processing pipeline, the technical briefing examines the tools and techniques that can be most effectively used to perform the task at hand.
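
The pattern can be sketched for a simple question: who has authored the most commits? On the command line this would be a pipeline along the lines of `git log --pretty=format:%an | sort | uniq -c | sort -rn | head`. The Python sketch below mirrors the same stages so that each can be inspected separately. It is an illustration under assumptions, not the speakers' material; the function names and the rule of skipping names ending in `[bot]` are hypothetical.

```python
import subprocess
from collections import Counter


def fetch_author_log(repo_path="."):
    """Fetching: one author name per commit, as printed by git log."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%an"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def select_authors(log_text):
    """Selection: keep non-empty lines, skipping names ending in '[bot]'."""
    names = (line.strip() for line in log_text.splitlines())
    return [n for n in names if n and not n.endswith("[bot]")]


def summarize(authors, top=10):
    """Processing and summarization: count commits per author, most first."""
    return Counter(authors).most_common(top)


def report(summary):
    """Reporting: format the counts in the style of uniq -c."""
    return "\n".join(f"{count:7d} {name}" for name, count in summary)
```

Inside a Git working copy, `print(report(summarize(select_authors(fetch_author_log()))))` runs the whole chain; each stage can also be fed saved text, which makes the intermediate results easy to check.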

A broad section of the software engineering community can benefit from this briefing. The presented techniques are not widely known but can be easily applied, initially to get a feel for the version control repository data at hand and then also to extract empirical results.

The speakers have applied the presented methods in empirical software engineering research settings and have written about them extensively. Importantly, they have first-hand experience of the shortcomings of more heavyweight tools and approaches, which led them to come up with the techniques presented in this workshop. They are both active toolsmiths and have been teaching students how to use the command line effectively for more than 20 years.

Diomidis Spinellis is a Professor heading the Department of Management Science and Technology at the Athens University of Economics and Business. His research interests include software engineering, IT security, and cloud systems engineering. He has written two award-winning, widely-translated books: Code Reading and Code Quality: The Open Source Perspective. His most recent book is Effective Debugging: 66 Specific Ways to Debug Software and Systems. Dr. Spinellis has also published more than 250 technical papers in journals and refereed conference proceedings, which have received more than 5000 citations. He served for a decade as a member of the IEEE Software editorial board, authoring the regular Tools of the Trade column. He has contributed code that ships with macOS and is the developer of UMLGraph, dgsh, gi, CScout, and other open-source software packages, libraries, and tools. He holds an MEng in Software Engineering and a PhD in Computer Science, both from Imperial College London. Dr. Spinellis has served as an elected member of the IEEE Computer Society Board of Governors (2013-2015), and is a senior member of the ACM and the IEEE. Since January 2015 he has been serving as the Editor-in-Chief of IEEE Software.

Georgios Gousios is an assistant professor in the Software Engineering group, Delft University of Technology. His research interests include software engineering, software analytics and programming languages. He works in the fields of distributed software development processes, software quality, software testing, dependency management and research infrastructures. His research has been published in top venues, where he has received four best paper awards and various nominations. In total, he has published more than 50 papers and also co-edited the "Beautiful Architecture" book (O'Reilly, 2009). He is the main author of the GHTorrent data collection and curation framework and the Alitheia Core repository mining platform. Georgios holds an MSc from the University of Manchester and a PhD from the Athens University of Economics and Business, both in software engineering. In addition to research, he is also active as a speaker at both research and practitioner-oriented conferences.