Publications
2011
-
Matthias Hert, Gerald Reif, Harald C. Gall, A Comparison of RDB-to-RDF Mapping Languages, Proceedings of the 7th International Conference on Semantic Systems (I-Semantics) 2011. (inproceedings)
Mapping Relational Databases (RDB) to RDF is an active field of research. The majority of data on the current Web is stored in RDBs. Therefore, bridging the conceptual gap between the relational model and RDF is needed to make the data available on the Semantic Web. In addition, recent research has shown that Semantic Web technologies are useful beyond the Web, especially if data from different sources has to be exchanged or integrated. Many mapping languages and approaches were explored, leading to the ongoing standardization effort of the World Wide Web Consortium (W3C) carried out in the RDB2RDF Working Group (WG). The goal and contribution of this paper is to provide a feature-based comparison of the state-of-the-art RDB-to-RDF mapping languages. It should act as a guide in selecting an RDB-to-RDF mapping language for a given application scenario and its requirements w.r.t. mapping features. Our comparison framework is based on use cases and requirements for mapping RDBs to RDF as identified by the RDB2RDF WG. We apply this comparison framework to the state-of-the-art RDB-to-RDF mapping languages and report the findings in this paper. As a result, our classification proposes four categories of mapping languages: direct mapping, read-only general-purpose mapping, read-write general-purpose mapping, and special-purpose mapping. We further provide recommendations for selecting a mapping language.
-
Matthias Hert, Giacomo Ghezzi, Michael Würsch, Harald C. Gall, How to 'Make a Bridge to the new Town' using OntoAccess, Proceedings of the 10th International Semantic Web Conference (ISWC) 2011. (inproceedings)
Business-critical legacy applications often rely on relational databases to sustain daily operations. Introducing Semantic Web technology in newly developed systems is often difficult, as these systems need to run in tandem with their predecessors and cooperatively read and update existing data.
A common pattern is to incrementally migrate data from a legacy system to its successor by running the new system in parallel, with a data bridge in between. Existing approaches that can in theory be deployed as a data bridge restrict Semantic Web-enabled applications to reading legacy data in practice, disallowing update operations completely.
This paper explains how our RDB-to-RDF platform OntoAccess can be used to transition legacy systems into Semantic Web-enabled applications. By means of a case study, we exemplify how we successfully made a bridge between one of our own large-scale legacy systems and its long-term replacement. We elaborate on challenges we faced during the migration process and how we were able to overcome them.
-
Giacomo Ghezzi, Harald C. Gall, SOFAS: A Lightweight Architecture for Software Analysis as a Service, Working IEEE/IFIP Conference on Software Architecture (WICSA 2011), 20-24 June 2011, Boulder, Colorado, USA 2011, IEEE Computer Society. (inproceedings)
Access to data stored in software repositories by systems such as version control, bug and issue tracking, or mailing lists is essential for assessing the quality of a software system. A myriad of analyses exploiting that data have been proposed throughout the years: source code analysis, code duplication analysis, co-change analysis, bug prediction, or detection of bug fixing patterns. However, easy and straightforward synergies between these analyses rarely exist. To tackle this problem we have developed SOFAS, a distributed and collaborative software analysis platform to enable a seamless interoperation of such analyses. In particular, software analyses are offered as RESTful web services that can be accessed and composed over the Internet. SOFAS services are accessible through a software analysis catalog where any project stakeholder can, depending on the needs or interests, pick specific analyses, combine them, let them run remotely and then fetch the final results. That way, software developers, testers, architects, or quality assurance experts are given access to quality analysis services. They are shielded from many peculiarities of tool installations and configurations, but SOFAS offers them sophisticated and easy-to-use analyses. This paper describes in detail our SOFAS architecture, its considerations and implementation aspects, and the current set of implemented and offered RESTful analysis services.
-
Matthias Hert, Sergio Marsella, Gerald Reif, Harald C. Gall, UpLink - A Linked Data Editor for RDB-to-RDF Data, Proceedings of the 7th International Conference on Semantic Systems (I-Semantics) 2011. (inproceedings/Short Paper)
Linked Data builds a machine-processable Web of Data based on a large and growing number of RDF datasets and typed links among them. For the human user, Web-based interfaces were developed to enable browsing and editing Linked Data that is stored as native RDF. However, the majority of data on the current Web is stored in Relational Databases (RDB). This is a challenge for Linked Data browsers and especially for Linked Data editors. In this paper, we present UpLink, which is, to the best of our knowledge, the first Linked Data editor for RDB-to-RDF data, i.e., RDF data that is mapped on demand from an RDB. We further present usage scenarios to demonstrate that UpLink supports the basic CRUD operations for editing Linked Data.
2010
-
Matthias Hert, Gerald Reif, Harald C. Gall, 'Semantic Web 2.0' - Write-enabling the Web of Data, Proceedings of the 6th Workshop on Semantic Web Applications and Perspectives (SWAP), September 2010. (inproceedings)
The Semantic Web today is mainly a read-only Web of Data. Many of the data sets that contribute to the Semantic Web are not stored as native RDF, but generated on demand via wrappers. Despite the fact that user contribution is the key success factor in the Web 2.0, current wrapper approaches and standardization efforts still focus on read-only data access. In this paper, we argue that the Semantic Web should learn from the evolution of the Web 2.0 and consider write-enabled semantic data wrappers.
-
Giacomo Ghezzi, Harald C. Gall, Distributed and Collaborative Software Analysis, Collaborative Software Engineering, Editor(s): Ivan Mistrik, John Grundy, Jim Whitehead, André van der Hoek, January 2010, Springer-Verlag. (incollection)
-
Michael Würsch, Gerald Reif, Serge Demeyer, Harald C. Gall, Fostering Synergies - How Semantic Web Technology could influence Software Repositories, Proceedings of the 2nd Intl. Workshop on Search-driven development: Users, Infrastructure, Tools and Evaluation (SUITE), May 2010. (inproceedings/Workshop paper)
The state-of-the-art in mining software repositories mirrors software artifacts from various sources into monolithic relational databases. This puts a lot of querying power in the hands of the software miners; however, it comes at the cost of enclosing the data and hampering cross-application reuse. In this paper we discuss four problem scenarios to illustrate that Semantic Web technology is able to overcome these limitations. However, it requires that the software engineering research community agrees on two prerequisites: (a) a common vocabulary to talk about software repositories, i.e., an ontology; (b) a strategy for generating unique and stable references to all software artifacts inside such a repository, i.e., a Uniform Resource Identifier (URI).
-
Sandro Boccuzzo, Harald C. Gall, Multi-Touch Collaboration for Software Exploration, Proceedings of the International Conference on Program Comprehension (ICPC'10) 2010. (inproceedings)
Software systems have grown so complex and their design is so intricate that no individual can grasp the whole picture. Touch screen technology combined with 3D software visualization offers a promising way for the software engineers involved in a project to share knowledge about a software system in an intuitive way. In this paper we present first results on how such emerging technologies can be combined to support software exploration tasks, such as identifying high-impact changes or revealing problematic parts of the design. As demonstrated with a scenario, this turns the collaborative environment into a vehicle usable during software reviews.
-
Emanuel Giger, Martin Pinzger, Harald C. Gall, Predicting the fix time of bugs, Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering, May 2010. (inproceedings)
Two important questions concerning the coordination of development effort are which bugs to fix first and how long it takes to fix them. In this paper we investigate empirically the relationships between bug report attributes and the time to fix. The objective is to compute prediction models that can be used to recommend whether a new bug should and will be fixed fast or will take more time for resolution. We examine in detail if attributes of a bug report can be used to build such a recommender system. We use decision tree analysis to compute and 10-fold cross-validation to test prediction models. We explore prediction models in a series of empirical studies with bug report data of six systems of the three open source projects Eclipse, Mozilla, and Gnome. Results show that our models perform significantly better than random classification. For example, fast-fixed Eclipse Platform bugs were classified correctly with a precision of 0.654 and a recall of 0.692. We also show that the inclusion of post-submission bug report data of up to one month can further improve prediction models.
-
Giacomo Ghezzi, Harald C. Gall, SOFAS Architecture, University of Zurich, Department of Informatics, Software Evolution and Architecture Lab, 01 2010. (techreport)
-
Michael Würsch, Giacomo Ghezzi, Gerald Reif, Harald C. Gall, Supporting Developers with Natural Language Queries, Proceedings of the 32nd International Conference on Software Engineering, May 2010, IEEE Computer Society. (inproceedings)
The feature list of modern IDEs is growing steadily and mastering these tools becomes more and more demanding, especially for novice programmers. Despite their remarkable capabilities, IDEs often still cannot directly answer the questions that arise during program comprehension tasks. Instead, developers have to map their questions to multiple concrete queries that can be answered only by combining several tools and examining the output of each of them manually to distill an appropriate answer. Existing approaches have in common that they are either limited to a set of predefined, hardcoded questions, or that they require learning a specific query language only suitable for that limited purpose. We present a framework to query for information about a software system using guided-input natural language resembling plain English. For that, we model data extracted by classical software analysis tools with an OWL ontology and use knowledge processing technologies from the Semantic Web to query it. We also present a case study that demonstrates how our framework can be used to answer queries about static source code information for program comprehension purposes.
-
Matthias Hert, Gerald Reif, Harald C. Gall, Updating Relational Data via SPARQL/Update, EDBT Workshop Proceedings, March 2010. (inproceedings)
Relational Databases (RDBs) are used in most current enterprise environments to store and manage data. The semantics of the data is not explicitly encoded in the relational model, but implicitly at the application level. Ontologies and Semantic Web technologies provide explicit semantics that allows data to be shared and reused across application, enterprise, and community boundaries. Converting all relational data to RDF is often not feasible, therefore we adopt a mediation approach for ontology-based access to RDBs. Existing mapping approaches focus on read-only access via SPARQL or as Linked Data but other data access interfaces exist, including approaches for updating RDF data. In this paper we present OntoAccess, an extensible platform for ontology-based read and write access to existing relational data. It encapsulates the translation logic in the core layer that provides the foundation of an extensible set of data access interfaces in the interface layer. We further present the formal definition of our RDB-to-RDF mapping, the architecture of our mediator platform, and a performance evaluation of the prototype implementation.
-
Patrick Knab, Martin Pinzger, Harald C. Gall, Visual Patterns in Issue Tracking Data, New Modeling Concepts for Today's Software Processes 2010, Springer. (inproceedings)
Software development teams gather valuable data about features and bugs in issue tracking systems. This information can be used to measure and improve the efficiency and effectiveness of the development process. In this paper we present an approach that harnesses the extraordinary capability of the human brain to detect visual patterns.
We specify generic visual process patterns that can be found in issue tracking data. With these patterns we can analyze information about effort estimation and the length and sequence of problem resolution activities.
In an industrial case study we apply our interactive tool to identify instances of these patterns and discuss our observations.
Our approach was validated through extensive discussions with multiple project managers and developers, as well as feedback from the project review board.
2009
-
Beat Fluri, Michael Würsch, Emanuel Giger, Harald C. Gall, Analyzing the co-evolution of comments and source code, Software Quality Journal Vol. 17 (4), September 2009. (article)
Source code comments are a valuable instrument to preserve design decisions and to communicate the intent of the code to programmers and maintainers. Nevertheless, commenting source code and keeping comments up to date is often neglected for reasons of time or programmers' obliviousness. In this paper, we investigate the question whether developers comment their code and to which extent they add comments or adapt them when they evolve the code. We present an approach to associate comments with source code entities to track their co-evolution over multiple versions. A set of heuristics is used to decide whether a comment is associated with its preceding or its succeeding source code entity. We analyzed the co-evolution of code and comments in eight different open source and closed source software systems. We found with statistical significance that (1) the relative amount of comments and source code grows at about the same rate; (2) the type of a source code entity, such as a method declaration or an if-statement, has a significant influence on whether or not it gets commented; (3) in six out of the eight systems, code and comments co-evolve in 90 percent of the cases; and (4) surprisingly, API changes and comments do not co-evolve but they are re-documented in a later revision. As a result, our approach enables a quantitative assessment of the commenting process in a software system. We can, therefore, leverage the results to provide feedback during development to increase the awareness when to add comments or when to adapt comments because of source code changes.
-
Sandro Boccuzzo, Harald C. Gall, Automated Comprehension Tasks in Software Exploration, ASE '09: Proceedings of the 2009 International Conference on Automated Software Engineering 2009. (inproceedings/Short Paper)
Finding issues in software usually requires a series of comprehension tasks. After every task, an engineer explores the results and decides whether further tasks are required. Software comprehension therefore is a combination of tasks and a supported exploration of the results, typically in an adequate visualization. In this paper, we describe how we simplify the combination of existing automated procedures to sequentially solve common software comprehension tasks. Beyond that, we improve the understanding of the outcomes with interactive and explorative visualization concepts in a time-efficient workflow. We validate the presented concept with basic comprehension tasks in an extended CocoViz tool implementation.
-
Harald C. Gall, Beat Fluri, Martin Pinzger, Change Analysis with Evolizer and ChangeDistiller, IEEE Software Vol. 26 (1), January/February 2009. (article)
-
Thomas Zimmermann, Nachiappan Nagappan, Harald C. Gall, Emanuel Giger, Brendan Murphy, Cross-project defect prediction: a large scale experiment on data vs. domain vs. process, ESEC/FSE '09: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering 2009, ACM. (inproceedings)
-
Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald C. Gall, Brendan Murphy, Does distributed development affect software quality? An empirical case study of Windows Vista, ICSE '09: Proceedings of the 2009 IEEE 31st International Conference on Software Engineering 2009, IEEE Computer Society. (inproceedings)
-
Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald C. Gall, Brendan Murphy, Does distributed development affect software quality?: an empirical case study of Windows Vista, Communications of the ACM Vol. 52 (8), August 2009. (article)
-
Sandro Boccuzzo, Richard Wettel, Sazzadul Alam, Philippe Dugerdil, Harald C. Gall, Michele Lanza, EvoSpaces - Multi-dimensional Navigation Spaces for Software Evolution Vol. LNCS 5440, Springer 2009. (inbook)
In software development, a major difficulty comes from the intrinsic complexity of software systems and their size, which can easily reach millions of lines of code. But software is an intangible artifact that does not have any natural visual representation. While many software visualization techniques have been proposed in the literature, they are often difficult to interpret. In fact, the user of such representations is confronted with an artificial world that contains and represents intangible objects. The goal of our EVOSPACES project was to investigate effective visual metaphors (i.e., analogies) between natural objects and software objects so that we can exploit the cognitive understanding of the user. The difficulty of the approach is that the common sense expectations about the displayed world should also apply to the world of software objects. To solve this common sense representation problem for software objects our project addressed both the small-scale (i.e., the level of individual objects) and the large-scale (i.e., the level of groups of objects). After many experiments we decided on a "city" metaphor: at the small scale we included different houses and their shapes as visual objects to cover size, structure and history. At the large-scale level we arrange the different types of houses in districts and include their history in diverse layouts. The user then is able to use the EVOSPACES virtual software city to navigate and explore all kinds of aspects of a city and its houses: size, age, historical evolution, changes, growth, restructuring, and evolution patterns such as code smells or architectural decay. For that we have developed a software environment named EVOSPACES as a plug-in to Eclipse so that visual metaphors can quickly be implemented in an easily navigable virtual space. Due to the large amount of information we complemented the flat 2D world with a full-fledged immersive 3D representation.
In this virtual software city, the dimensions and appearance of the buildings can be set according to software metrics. The user of the EVOSPACES environment can then explore a given software system by navigating through the corresponding virtual city.
-
Harald C. Gall, Gerald Reif, ICSE 2009 Tutorial - Semantic Web Technologies in Software Engineering, 31st International Conference on Software Engineering (ICSE 2009), May 18 2009. (inproceedings/tutorial)
Over the years, the software engineering community has developed various tools to support the specification, development, and maintenance of software. Many of these tools use proprietary data formats to store artifacts, which hampers interoperability. On the other hand, the Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Ontologies are used to define the concepts in the domain of discourse and their relationships and as such provide the formal vocabulary applications use to exchange data. Besides the Web, the technologies developed for the Semantic Web have proven to be useful also in other domains, especially when data is exchanged between applications from different parties. Software engineering is one of these domains in which recent research shows that Semantic Web technologies are able to reduce the barriers of proprietary data formats and enable interoperability.
In this tutorial, we present Semantic Web technologies and their application in software engineering. We discuss the current status of ontologies for software entities, bug reports, or change requests, as well as semantic representations for software and its documentation. This way, architecture, design, code, or test models can be shared across application boundaries enabling a seamless integration of engineering results.
-
Patrick Knab, Martin Pinzger, Beat Fluri, Harald C. Gall, Interactive Views for Analyzing Problem Reports, ICSM '09 Proceedings of the 25th International Conference on Software Maintenance 2009. (inproceedings)
Issue tracking repositories contain a wealth of information for reasoning about various aspects of software development processes. In this paper, we focus on bug triaging and provide visual means to explore the effort estimation quality and the bug life-cycle of reported problems.
Our approach follows the Micro/Macro reading technique and uses a combination of graphical views to investigate details of individual problem reports while maintaining the context provided by the surrounding data population. This enables the detection and detailed analysis of hidden patterns and facilitates the analysis of problem report outliers.
In an industrial study, we use our approach in various problem report analysis scenarios and answer questions related to effort estimation and resource planning.
-
Matthias Hert, Gerald Reif, Harald C. Gall, Personal Knowledge Mapping with Semantic Web Technologies, Proceedings of the 1st International Workshop on Personal Knowledge Management at the 5th Conference on Professional Knowledge Management, March 2009. (inproceedings)
Semantic Web technologies promise great benefits for Personal Knowledge Management (PKM) and Knowledge Management (KM) in general when data needs to be exchanged or integrated. However, the Semantic Web also introduces new issues rooted in its distributed nature as multiple ontologies exist to encode data in the Personal Information Management (PIM) domain. This poses problems for applications processing this data as they would need to support all current and future PIM ontologies. In this paper, we introduce an approach that decouples applications from the data representation by providing a mapping service which translates Semantic Web data between different vocabularies. Our approach consists of the RDF Data Transformation Language (RDTL) to define mappings between different but related ontologies and the prototype implementation RDFTransformer to apply mappings. This allows the definition of mappings that are more complex than simple one-to-one matches.
-
Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald C. Gall, Brendan Murphy, Putting it All Together: Using Socio-Technical Networks to Predict Failures, ISSRE '09: Proceedings of the 20th International Symposium on Software Reliability, November 2009, IEEE Computer Society. (inproceedings)
-
Patrick Knab, Martin Pinzger, Harald C. Gall, Smart views for analyzing problem reports: tool demo, ESEC/FSE '09: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering 2009, ACM. (inproceedings)
Issue tracking repositories contain a wealth of information for reasoning about various aspects of software development processes. In this paper, we focus on bug triaging and provide visual means to explore the effort estimation quality and the bug life-cycle of reported problems.
Our approach uses a combination of graphical views to investigate details of individual problem reports while maintaining the context provided by the surrounding data population. This enables the detection and detailed analysis of hidden patterns and facilitates the analysis of problem report outliers.
-
Matthias Hert, Gerald Reif, Harald C. Gall, SPARQL/Update for Relational Databases, Proceedings of the 6th European Semantic Web Conference (ESWC), June 2009. (inproceedings/Poster)
We present an approach for ontology-based read and write access to existing Relational Databases (RDBs). SPARQL/Update serves as the data manipulation language that is translated to equivalent SQL commands according to mappings between the RDBs and the Semantic Web. This addition of write support enables a full integration of existing relational data into Semantic Web applications.
-
Jayalath Ekanayake, Jonas Tappolet, Harald C. Gall, Abraham Bernstein, Tracking Concept Drift of Software Projects Using Defect Prediction Quality, Proceedings of the 6th IEEE Working Conference on Mining Software Repositories, May 2009, IEEE Computer Society. (inproceedings)
Defect prediction is an important task in the mining of software repositories, but the quality of predictions varies strongly within and across software projects. In this paper we investigate why the prediction quality fluctuates so strongly due to the changing nature of the bug (or defect) fixing process. Therefore, we adopt the notion of a concept drift, which denotes that the defect prediction model has become unsuitable as the set of influencing features has changed, usually due to a change in the underlying bug generation process (i.e., the concept). We explore four open source projects (Eclipse, OpenOffice, Netbeans and Mozilla) and construct file-level and project-level features for each of them from their respective CVS and Bugzilla repositories. We then use this data to build defect prediction models and visualize the prediction quality along the time axis. These visualizations allow us to identify concept drifts and, as a consequence, phases of stability and instability expressed in the level of defect prediction quality. Further, we identify those project features that influence the defect prediction quality, using both a tree-induction algorithm and a linear regression model. Our experiments reveal that software systems are subject to considerable concept drifts in their evolution history. Specifically, we observe that the change in the number of authors editing a file and the number of defects fixed by them contribute to a project's concept drift and therefore influence the defect prediction quality. Our findings suggest that project managers using defect prediction models for decision making should be aware of the actual phase of stability or instability due to a potential concept drift.
2008
-
Yu Zhou, Michael Würsch, Emanuel Giger, Harald C. Gall, Jian Lue, A Bayesian Network Based Approach for Change Coupling Prediction, WCRE '08: Proceedings of the 2008 15th Working Conference on Reverse Engineering 2008, IEEE Computer Society. (inproceedings)
Source code coupling and change history are two important data sources for change coupling analysis. The popularity of public open source projects in recent years makes both sources available. Based on our previous research, in this paper, we inspect different dimensions of software changes including change significance or source code dependency levels, extract a set of features from the two sources and propose a Bayesian network-based approach for change coupling prediction. By combining the features from the co-changed entities and their dependency relation, the approach can model the underlying uncertainty. The empirical case study on two medium-sized open source projects demonstrates the feasibility and effectiveness of our approach compared to previous work.
-
Martin Pinzger, Katja Gräfenhain, Patrick Knab, Harald C. Gall, A Tool for Visual Understanding of Source Code Dependencies, Proceedings of the International Conference on Program Comprehension (ICPC'08) 2008, IEEE Computer Society. (inproceedings)
Many program comprehension tools use graphs to visualize and analyze source code. The main issue is that existing approaches create graphs overloaded with too much information. Graphs contain hundreds of nodes and even more edges that cross each other. Understanding these graphs and using them for a given program comprehension task is tedious, and in the worst case developers stop using the tools. In this paper we present DA4Java, a graph-based approach for visualizing and analyzing static dependencies between Java source code entities. The main contribution of DA4Java is a set of features to incrementally compose graphs and remove irrelevant nodes and edges from graphs. This leads to graphs that contain significantly fewer nodes and edges and need less effort to understand.
-
Yi Guo, Adrian Schwaninger, Harald C. Gall, An Architecture for an Adaptive and Collaborative Learning Management System in Aviation Security, 17th IEEE International Workshop in Enabling Technologies: Infrastructures for Collaborative Enterprises, Workshop for Distributed and Mobile Collaboration (DMC 2008), June 2008, IEEE Computer Society. (inproceedings)
The importance of aviation security has increased dramatically in recent years. Frequently changing regulations and the need to adapt quickly to new and emerging threats are challenges that need to be addressed by airports, security companies and appropriate authorities across the world. Learning Management Systems (LMS) have been developed as effective tools for enhancing the management, integration and application of knowledge in organizations. In the aviation security domain, we need mechanisms to quickly adapt to new learning content, to different roles ranging from screeners to supervisors, to flexible training scenarios and solid job assessments. For that, a learning system has to be flexible and adaptive in the knowledge, organizational, and collaboration dimensions. Current LMS do not meet these requirements. In this paper we present a software architecture that is apt to support the adaptability and collaboration needs for such a system in aviation security. We discuss the requirements, roles, learning objects and course configuration in terms of adaptive and collaborative learning. We present a six-layer architecture and discuss some of its application scenarios. Our aim is to improve the quality and usefulness of LMS in aviation security by utilizing knowledge-based analysis for data analysis and integrating a process engine for collaborative learning. We briefly report on our prototype and the first feedback gained from the users.
-
Ionut Subasu, Patrick Ziegler, Klaus R. Dittrich, Harald C. Gall, Architectural Concerns for Flexible Data Management, EDBT 2008 Workshops, March 2008, ACM. (inproceedings/Workshop paper)
Evolving database management systems (DBMS) towards more flexibility in functionality, adaptation to changing requirements, and extensions with new or different components, is a challenging task. Although many approaches have tried to come up with a flexible architecture, there is no architectural framework that is generally applicable to provide tailor-made data management and can directly integrate existing application functionality. We discuss an alternative database architecture that enables more lightweight systems by decomposing the functionality into services and having the service granularity drive the functionality. We propose a service-oriented DBMS architecture which provides the necessary flexibility and extensibility for general-purpose usage scenarios. For that we present a generic storage service system to illustrate our approach.
-
Beat Fluri, Emanuel Giger, Harald C. Gall, Discovering Patterns of Change Types, Proceedings of the 23rd International Conference on Automated Software Engineering, September 2008, IEEE Computer Society. (inproceedings)
The reasons why software is changed are manifold: new features are added, bugs have to be fixed, or the consistency of coding rules has to be re-established. Since there are many types of source code changes, we want to explore whether they appear frequently together in time and whether they describe specific development activities. We describe a semi-automated approach to discover patterns of such change types using agglomerative hierarchical clustering. We extracted source code changes of one commercial and two open-source software systems and applied the clustering. We found that change type patterns do describe development activities and affect the control flow, the exception flow, or change the API.
-
Martin Pinzger, Harald C. Gall, Michael Fischer, Emerging Methods, Technologies and Process Management in Software Engineering, John Wiley 2008. (inbook)
-
Martin Pinzger, Katja Gräfenhain, Patrick Knab, Harald C. Gall, Incremental Visual Understanding of Java Source Code, Department of Informatics, University of Zurich 2008. (techreport)
-
Jacek Ratzinger, Thomas Sigmund, Harald C. Gall, On the Relation of Refactorings and Software Defect Prediction, MSR 2008. (inproceedings)
This paper analyzes the influence of evolution activities such as refactoring on software defects. In a case study of five open source projects we used attributes of software evolution to predict defects in time periods of six months. We use versioning and issue tracking systems to extract 110 data mining features, which are separated into refactoring and non-refactoring related features. These features are used as input into classification algorithms that create prediction models for software defects. We found that refactoring related features as well as non-refactoring related features lead to high quality prediction models. Additionally, we discovered that refactorings and defects have an inverse correlation: the number of software defects decreases if the number of refactorings increased in the preceding time period. As a result, refactoring should be a significant part of both bug fixes and other evolutionary changes to reduce software defects.
-
Beat Fluri, Jonas Zuberbuehler, Harald C. Gall, Recommending Method Invocation Context Changes, Proceedings of the 1st International Workshop on Recommender Systems for Software Engineering, November 2008, ACM. (inproceedings)
Our investigations of bug fixes in Eclipse showed that a significant number of bugs were fixed by moving invocations of certain methods into the then- or else-part of if-statements with similar conditions. Based on this finding, we leverage such context changes applied in the past to support developers while adding invocations of the same method. In this paper we present ChangeCommander, an Eclipse plugin that implements our approach to recommend insertions of particular if-statements before calling a method. ChangeCommander presents context change suggestions by highlighting affected method invocations in the source code and provides automated code adaptation support.
-
Amancio Bouza, Gerald Reif, Abraham Bernstein, Harald C. Gall, SemTree: Ontology-Based Decision Tree Algorithm for Recommender Systems, In Proceedings of the 7th International Semantic Web Conference, October 2008. (inproceedings/Poster)
Recommender systems play an important role in supporting people when choosing items from an overwhelmingly large number of choices. So far, no recommender system makes use of domain knowledge. We model user preferences with a machine learning approach to recommend items to people by predicting the item ratings. Specifically, we propose SemTree, an ontology-based decision tree learner that uses a reasoner and an ontology to semantically generalize item features to improve the effectiveness of the decision tree built. We show that SemTree outperforms comparable approaches by producing more accurate recommendations when considering domain knowledge.
-
Marco D'Ambros, Harald C. Gall, Michele Lanza, Martin Pinzger, Software Evolution, SpringerLink 2008. (inbook)
Software repositories such as versioning systems, defect tracking systems, and archived communication between project personnel are used to help manage the progress of software projects. Software practitioners and researchers increasingly recognize the potential benefit of mining this information to support the maintenance of software systems, improve software design or reuse, and empirically validate novel ideas and techniques. Research is now proceeding to uncover ways in which mining these repositories can help to understand software development, to support predictions about software development, and to plan various evolutionary aspects of software projects.
This chapter presents several analysis and visualization techniques to understand software evolution by exploiting the rich sources of artifacts that are available. Based on the data models that need to be developed to cover sources such as modification and bug reports, we describe how to use a Release History Database for evolution analysis. For that we present approaches to analyze developer effort for particular software entities. Further we present change coupling analyses that can reveal hidden change dependencies among software entities. Finally, we show how to investigate architectural shortcomings over many releases and to identify trends in the evolution. Kiviat graphs can be effectively used to visualize such analysis results.
-
Sandro Boccuzzo, Harald C. Gall, Software Visualization with Audio Supported Cognitive Glyphs, 24th IEEE International Conference on Software Maintenance (ICSM 2008) 2008, IEEE Computer Society. (inproceedings)
There exist numerous software visualization techniques that aim to facilitate program comprehension. One of the main concerns in every such software visualization is to identify relevant aspects fast and provide information in an effective way. In previous work, we developed a cognitive visualization technique and tool called CocoViz that uses commonplace metaphors for an intuitive understanding of software structures and evolution. In this paper, we address software comprehension by a combination of visualization and audio. Evolution and structural aspects are annotated with different audio to represent concepts such as design erosion, code smells or evolution metrics. We use audio concepts such as loudness, sharpness, tone pitch, roughness or oscillation and map those to properties of classes and packages. As such we provide an audio annotation of software entities along their version history for software analysis and software browsing. Our first results with the prototype and a small user study show that with this combination of visual and aural means we can facilitate program comprehension and provide additional information that usually is not provided by current visualization approaches.
-
Giacomo Ghezzi, Harald C. Gall, Towards Software Analysis as a Service, Proceedings of Evol'08, the 4th Intl. ERCIM Workshop on Software Evolution and Evolvability at the 23rd IEEE/ACM Intl. Conf. on Automated Software Engineering, September 2008. (inproceedings)
Throughout the years software engineers have come up with a myriad of specialized tools and techniques that focus on a certain type of analysis, such as metrics extraction, evolution tracking, co-change detection, bug prediction, all the way up to social network analysis of team dynamics.
However, easy and straightforward synergies between these analyses/tools rarely exist because of their stand-alone nature, their platform dependence, their different input and output formats and the variety of systems to analyze. This significantly hampers their usage and reduces their acceptance by other researchers and software companies.
To overcome this problem we propose a distributed and collaborative software analysis platform to enable a seamless interoperability of software analysis tools across platform, geographical and organizational boundaries. In particular, we devise software analysis tools as services that can be accessed and composed over the Internet. These distributed services shall be widely accessible through a software analysis broker where organizations and research groups can register and share their tools.
To enable (semi-)automatic use and composition of these tools, they are classified and mapped into a software analysis taxonomy and adhere to specific meta-models and ontologies for their category of analysis.
-
Harald C. Gall, Gerald Reif, Tutorial - Semantic Web Technologies in Software Engineering, 30th International Conference on Software Engineering (ICSE 2008), May 12 2008. (inproceedings/tutorial)
Over the years, the software engineering community has developed various tools to support the specification, development, and maintenance of software. Many of these tools use proprietary data formats to store artifacts, which hampers interoperability. However, the Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Ontologies are used to define the concepts in the domain of discourse and their relationships and as such provide the formal vocabulary applications use to exchange data. Besides the Web, the technologies developed for the Semantic Web have proven to be useful also in other domains, especially when data is exchanged between applications from different parties. Software engineering is one of these domains in which recent research shows that Semantic Web technologies are able to reduce the barriers of proprietary data formats and enable interoperability.
In this tutorial, we present Semantic Web technologies and their application in software engineering. We discuss the current status of ontologies for software entities, bug reports, or change requests, as well as semantic representations for software and its documentation. This way, architecture, design, code, or test models can be shared across application boundaries enabling a seamless integration of engineering results.
2007
-
Gian Marco Laube, Gerald Reif, Harald C. Gall, Architectural Issues of the Semantic Clipboard as Ontology Mediation Service, 1st Workshop on Architecture, Design, and Implementation of the Semantic Desktop (SemDeskDesign2007) at the European Semantic Web Conference ESWC2007, June 2007. (inproceedings)
When copying and pasting data between applications using the operating system clipboard, the semantics of the transferred information is usually lost. Using Semantic Web technologies these semantics can be explicitly defined in a machine-processable way. In previous research we developed a prototype to show the feasibility and benefits of a semantically enriched clipboard, which was limited in the number of ontologies it could handle and in the applications that could access it. In this paper we introduce an advanced architecture for the Semantic Clipboard that incorporates the standard communication paradigm of operating system clipboards and is able to handle RDF graphs of arbitrary domains of interest. This architecture includes a data mediation service that overcomes vocabulary heterogeneities between source and target applications.
-
Beat Fluri, Michael Würsch, Martin Pinzger, Harald C. Gall, Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction, IEEE Transactions on Software Engineering Vol. 33 (11), November 2007. (article)
A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine-grained source code change extraction. For that, we have improved the existing algorithm of Chawathe et al. for extracting changes in hierarchically structured data. Our algorithm extracts changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum edit script that can transform one tree into the other given the computed matching. As a result, we can identify fine-grained change types between program versions according to our taxonomy of source code changes. We evaluated our change distilling algorithm with a benchmark we developed that consists of 1,064 manually classified changes in 219 revisions of eight methods from three different open source projects. We achieved significant improvements in extracting types of source code changes: our algorithm approximates the minimum edit script 45% better than the original change extraction approach by Chawathe et al. We are able to find all occurring changes and almost reach the minimum conforming edit script, i.e., we reach a mean absolute percentage error of 34%, compared to 79% reached by the original algorithm. The paper describes both our change distilling algorithm and the results of our evaluation.
-
Sandro Boccuzzo, Harald C. Gall, CocoViz: Supported Cognitive Software Visualization, Proceedings of 14th Working Conference on Reverse Engineering (WCRE 2007) 2007, IEEE Computer Society. (inproceedings)
As software evolves and becomes more and more complex, program comprehension arises as a major concern in software projects. The amount of data and the complexity of relationships between the entities are unmanageable for engineers without effective tool support. In this paper, we demonstrate how CocoViz can help in understanding software in a quick and intuitive manner. Some of the implemented approaches have been presented independently before. However, in CocoViz we combine them in an intuitive and easy to use manner.
-
Sandro Boccuzzo, Harald C. Gall, CocoViz: Towards Cognitive Software Visualizations, Proceedings of IEEE International Workshop on Visualizing Software for Understanding and Analysis (VisSoft 2007) 2007, IEEE Computer Society. (inproceedings)
Understanding software projects is a complex task. There is an increasing need for visualizations that improve comprehensiveness of the evolution of a software system. This paper discusses our recent work in software visualization with respect to metaphors. Our goal is to use simple and well-known graphical elements known from daily life such as houses, spears, or tables to allow a user a quick and intuitive understanding of a given visualization via their proportions. We present a software metrics configurator that handles different metaphors and allows optimizations to their graphical representation. The results so far show that large systems can be visualized effectively with metaphor glyphs, yet more case studies and more metaphor glyphs are required to offer a simple and cognitive visual understanding of a software system.
-
Ksenia Ryndina, Jochen M. Küster, Harald C. Gall, Consistency of Business Process Models and Object Life Cycles, Models in Software Engineering 2007, Springer. (inproceedings)
-
Beat Fluri, Michael Würsch, Harald C. Gall, Do Code and Comments Co-Evolve? On the Relation Between Source Code and Comment Changes, Proceedings of the 14th Working Conference on Reverse Engineering, October 2007, IEEE Computer Society. (inproceedings)
Comments are valuable especially for program understanding and maintenance, but do developers comment their code? To what extent do they add comments or adapt them when they evolve the code? We examine the question whether source code and associated comments are really changed together along the evolutionary history of a software system.
In this paper, we describe an approach to map code and comments to observe their co-evolution over multiple versions. We investigated three open source systems (i.e., ArgoUML, Azureus, and JDT Core) and describe how comments and code co-evolved over time. Some of our findings show that: 1) newly added code, despite its growth rate, barely gets commented; 2) class and method declarations are commented most frequently, whereas method calls, for example, are commented far less; and 3) 97% of comment changes are done in the same revision as the associated source code change.
-
Jacek Ratzinger, Martin Pinzger, Harald C. Gall, EQ-Mine: Predicting Short-Term Defects for Software Evolution, Proceedings of the 10th International Conference on Fundamental Approaches to Software Engineering (FASE), April 2007, Springer. (inproceedings)
-
Jacek Ratzinger, Thomas Sigmund, Peter Vorburger, Harald C. Gall, Mining Software Evolution to Predict Refactoring, Proceedings of the International Symposium on Empirical Software Engineering and Measurement (ESEM 2007) 2007, IEEE Computer Society. (inproceedings)
Can we predict locations of future refactoring based on the development history? In an empirical study of open source projects we found that attributes of software evolution data can be used to predict the need for refactoring in the following two months of development. Information systems utilized in software projects provide a broad range of data for decision support. Versioning systems log each activity during the development, which we use to extract data mining features such as growth measures, relationships between classes, the number of authors working on a particular piece of code, etc. We use this information as input into classification algorithms to create prediction models for future refactoring activities. Different state-of-the-art classifiers are investigated such as decision trees, logistic model trees, propositional rule learners, and nearest neighbor algorithms. With both high precision and high recall we can assess the refactoring proneness of object-oriented systems. Although we investigate different domains, we discovered critical factors within the development life cycle leading to refactoring, which are common among all studied projects.
-
Jacek Ratzinger, Martin Pinzger, Harald C. Gall, Quality Assessment based on Attribute Series of Software Evolution, Proceedings of the 14th Working Conference on Reverse Engineering (WCRE), October 2007, IEEE Computer Society. (inproceedings)
-
Gerald Reif, Gian Marco Laube, Knud Möller, Harald C. Gall, SemClip - Overcoming the Semantic Gap Between Desktop Applications, 5th Semantic Web Challenge at the 6th International Semantic Web Conference (ISWC 2007), November 11-15 2007. (inproceedings/Semantic Web Challenge)
When copying and pasting data between applications using the operating system clipboard, the semantics of the transferred information is usually lost. Using Semantic Web technologies these semantics can be explicitly defined in a machine-processable way and therefore be preserved during the data transfer. In this paper we introduce SemClip, our implementation of a Semantic Clipboard that enables the exchange of semantically enriched data between desktop applications and show how such a clipboard can be used to copy and paste semantic annotations from Web pages to desktop applications.
2006
-
Gerald Reif, Harald C. Gall, An Architecture for a Semantic Portal, International Workshop on Data Integration and Semantic Web (DISWeb'06) at the 18th Conference on Advanced Information Systems Engineering (CAiSE 2006), June 2006, Springer. (inproceedings)
Current Web applications provide their information and functionalities to human users only. To make Web applications also accessible for machines, the Semantic Web proposes an extension of the current Web that describes the semantics of the content and the services explicitly with machine-processable meta-data. In this paper we introduce an architecture of a Semantic Portal that provides a unique front-end to the information and functionalities of individual Semantic Web applications. To realize the portal we use WEESA to semantically annotate Web applications and provide the annotations in a knowledge base (KB) for download and querying. Based on that, the Semantic Harvester collects the KBs from individual Semantic Web applications to build the global KB of the Semantic Portal. Finally, we use Semantic Web services to make the portal a unique interface to the services of the Web applications.
-
Beat Fluri, Harald C. Gall, Classifying Change Types for Qualifying Change Couplings, Proceedings of the 9th International Conference on Program Comprehension, June 2006, IEEE Computer Society. (inproceedings)
Current change history analysis approaches rely on information provided by versioning systems such as CVS. Therefore, changes are not related to particular source code entities such as classes or methods but rather to text lines added and/or removed. For analyzing whether some change coupling between source code entities is significant or only minor textual adjustments have been checked in, it is essential to reflect the changes to the source code entities.
We have developed an approach for analyzing and classifying change types based on code revisions. We can differentiate between several types of changes on the method or class level and assess their significance in terms of the impact of the change types on other source code entities and whether a change may be functionality-modifying or functionality-preserving.
We applied our change taxonomy to a case study and found that in many cases large numbers of lines added and/or deleted are not accompanied by significant changes but by small textual adaptations (such as indentation). Furthermore, our approach allows us to relate all change couplings to the significance of the identified change types. As a result, change couplings between code entities can be qualified and less relevant couplings can be filtered out.
-
Michael Fischer, Harald C. Gall, EvoGraph: A Lightweight Approach to Evolutionary and Structural Analysis of Large Software Systems, 13th Working Conference on Reverse Engineering (WCRE), October 2006, IEEE Computer Society. (inproceedings)
Structural analyses frequently fall short of an adequate representation of historical changes for retrospective analysis. By compounding the two underlying information spaces in a single approach, the comprehension of the interaction between evolving requirements and system development can be improved significantly. We therefore propose a lightweight approach based on release history data and source code changes, which first selects entities with evolutionary outstanding characteristics and then indicates their structural dependencies via commonly used source code entities. The resulting data sets and visualizations aim at a holistic view to point out and assess structural stability, recurring modifications, or changes in the dependencies of the file-sets under inspection. In this paper we describe our approach and its results in terms of the Mozilla case study. Our approach completes typical release history mining and source code analysis approaches; therefore, past restructuring events as well as new, shifted, and removed dependencies can be spotted easily.
-
Reto Geiger, Beat Fluri, Harald C. Gall, Martin Pinzger, Relation of Code Clones and Change Couplings, Proceedings of the 9th International Conference on Fundamental Approaches to Software Engineering, March 2006, Springer. (inproceedings)
Code clones have long been recognized as bad smells in software systems and are considered to cause maintenance problems during evolution. It is broadly assumed that the more clones two files share, the more often they have to be changed together. This relation between clones and change couplings has been postulated but neither demonstrated nor quantified yet. However, given such a relation, it would simplify the identification of restructuring candidates and reduce change couplings.
In this paper, we examine this relation and discuss whether a correlation between code clones and change couplings can be verified. For that, we propose a framework to examine code clones and relate them to change couplings taken from release history analysis.
We validated our framework with the open source project Mozilla, and the results of the validation show that although the relation is statistically unverifiable, there is a reasonable number of cases where the relation exists.
Therefore, to discover clone candidates for restructuring we additionally propose a set of metrics and a visualization technique. This allows one to spot where a correlation between cloning and change coupling exists and, as a result, which files should be restructured to ease further evolution.
-
Gerald Reif, Martin Morger, Harald C. Gall, Semantic Clipboard - Semantically Enriched Data Exchange Between Desktop Applications, Semantic Desktop and Social Semantic Collaboration Workshop at the 5th International Semantic Web Conference ISWC06, November 2006. (inproceedings)
The operating system clipboard is used to copy and paste data between applications even if the applications are from different vendors. Current clipboards only support the transfer of data or formatted data between applications. The semantics of the data, however, is lost in the transfer. The Semantic Web, on the other hand, provides a common framework that allows data to be shared across application boundaries while preserving the semantics of the data. In this paper we introduce the concept of a Semantic Clipboard and present a prototype implementation that can be used to copy and paste RDF meta-data between desktop applications. The Semantic Clipboard is based on a flexible plugin architecture that enables the easy extension of the clipboard to new ontology vocabularies and target applications. Furthermore, we show how the Semantic Clipboard is used to copy and paste the meta-data from semantically annotated Web pages to a user's desktop application.
-
Gerald Reif, Harald C. Gall, Using WEESA to Semantically Annotate Cocoon Web Applications, 1st Semantic Authoring and Annotation Workshop 2006 at the 5th International Semantic Web Conference ISWC2006, November 2006. (inproceedings)
The Semantic Web is based on the idea that Web applications provide semantically annotated Web pages. This meta-data is typically added in the semantic annotation process, which is currently not part of the Web engineering process. Web engineering, however, proposes methodologies to design, implement and maintain Web applications but lacks semantic annotation. In this paper we show how WEESA, a mapping from XML documents to ontologies, can be used in Apache Cocoon Web applications to semantically annotate Web pages. We introduce Cocoon transformer components that use the WEESA mapping definition to automatically generate RDF meta-data from XML documents. We further show how existing Cocoon Web applications can be extended to Semantic Web applications and discuss the experiences gained in an industry case study.
2005
-
Michele Lanza, Stephane Ducasse, Harald C. Gall, Martin Pinzger, CodeCrawler: An Information Visualization Tool for Program Comprehension, Proceedings of the 27th International Conference on Software Engineering 2005, ACM. (inproceedings)
CODECRAWLER is a language independent, interactive, software visualization tool. It is mainly targeted at visualizing object-oriented software, and in its newest implementation has become a general information visualization tool. It has been successfully validated in several industrial case studies over the past few years. CODECRAWLER strongly adheres to lightweight principles: it implements and visualizes polymetric views, visualizations of software enriched with information such as software metrics and other source code semantics. CODECRAWLER is built on top of Moose, an extensible language independent reengineering environment that implements the FAMIX metamodel. In its last implementation, CODECRAWLER has become a general-purpose information visualization tool.
-
Stefania Leone, Thomas Hodel, Harald C. Gall, Concept and architecture of an pervasive document editing and managing system, SIGDOC '05: Proceedings of the 23rd annual international conference on Design of communication, September 21-23 2005. (inproceedings)
Collaborative document processing has been addressed by many approaches so far, most of which focus on document versioning and collaborative editing. We address this issue from a different angle and describe the concept and architecture of a pervasive document editing and managing system. It exploits database techniques and real-time updating for sophisticated collaboration scenarios on multiple devices. Each user is always served with up-to-date documents and can organize his work based on document meta data. For this, we present our conceptual architecture for such a system and discuss it with an example.
-
Jacek Ratzinger, Michael Fischer, Harald C. Gall, EvoLens: Lens-View Visualizations of Evolution Data, Proceedings of the 8th International Workshop on Principles of Software Evolution 2005. (inproceedings)
Observing the evolution of very large software systems is difficult because of the sheer amount of information that needs to be analyzed and because the changes performed in the system are at a very low granularity level. In recent approaches software metrics have been used to compute condensed graphical visualizations of these data also reflecting metrics. However, most techniques concentrate on visualizing data of one particular release providing only insufficient support for visualizing data of several selected releases. In this paper we present the RelVis visualization approach that provides integrated condensed graphical views on source code and release history data of up to n releases of a software system. Measurements of metrics of n releases are composed to views that facilitate spectators to spot trends of metrics of source code entities and relationships. Critical trends are highlighted: This allows the user to direct perfective maintenance activities to source code entities involved. The paper provides needed background information and evaluation of the approach with a large open source software project.
-
Beat Fluri, Harald C. Gall, Martin Pinzger, Fine-Grained Analysis of Change Couplings, Proceedings of the 5th International Workshop on Source Code Analysis and Manipulation, October 2005, IEEE Computer Society. (inproceedings)
In software evolution analysis, many approaches analyze release history data available through versioning systems. Recent investigations of CVS data have shown that commonly committed files highlight their change couplings. However, CVS stores modifications on the basis of text but does not track structural changes, such as the insertion, removal, or modification of methods or classes. A detailed analysis of whether change couplings are caused by source code couplings or by other textual modifications, such as updates in license terms, is not performed by current approaches.
The focus of this paper is on adding structural change information to existing release history data. We present an approach that uses the structure compare services shipped with the Eclipse IDE to obtain the corresponding fine-grained changes between two subsequent versions of any Java class. This information supports filtering those change couplings which result from structural changes. So we can distill the causes for change couplings along releases and filter out those that are structurally relevant. The first validation of our approach with a medium-sized open source software system showed that a reasonable number of change couplings are not caused by source code changes.
-
Marco D'Ambros, Michele Lanza, Harald C. Gall, Fractal Figures: Visualizing Development Effort for CVS Entities, VISSOFT '05: Proceedings of the 3rd IEEE International Workshop on Visualizing Software for Understanding and Analysis 2005, IEEE Computer Society. (inproceedings)
Versioning systems such as CVS or Subversion exhibit a large potential to investigate the evolution of software systems. They are used to record the development steps of software systems as they make it possible to reconstruct the whole evolution of single files. However, they provide no good means to understand how much a certain file has been changed over time and by whom. In this paper we present an approach to visualize files using fractal figures, which (1) convey the overall development effort, (2) illustrate the distribution of the effort among various developers, and (3) allow files to be categorized in terms of the distribution of the effort following gestalt principles. Our approach allows us to discover files of high development efforts in terms of team size and effort intensity of individual developers. The visualizations allow an analyst or a project manager to get first insights into team structures and code ownership principles. We have analyzed Mozilla as a case study and we show some of the recovered team development patterns in this paper as a validation of our approach.
-
Jacek Ratzinger, Michael Fischer, Harald C. Gall, Improving Evolvability through Refactoring, Proceedings of the International Workshop on Mining Software Repositories 2005. (inproceedings)
Refactoring is one means of improving the structure of existing software. Locations where to apply refactoring are often based on subjective perceptions such as "bad smells", which are vague suspicions of design shortcomings. We exploit historical data extracted from repositories such as CVS and focus on change couplings: if some software parts change at the same time very often over several releases, this data can be used to point to candidates for refactoring. We adopt the concept of bad smells and provide additional change smells. Such a smell is hardly visible in the code, but easy to spot when viewing the change history. Our approach enables the detection of such smells allowing an engineer to apply refactoring on these parts of the source code to improve the evolvability of the software. For that, we analyzed the history of a large industrial system for a period of 15 months, proposed spots for refactorings based on change couplings, and performed them with the developers. After observing the system for another 15 months we finally analyzed the effectiveness of our approach. Our results support our hypothesis that the combination of change dependency analysis and refactoring is applicable and effective.
-
Michael Fischer, Johann Oberleitner, Jacek Ratzinger, Harald C. Gall, Mining Evolution Data of a Product Family, Proceedings of the International Workshop on Mining Software Repositories 2005. (inproceedings)
Diversification of software assets through evolving requirements imposes a constant challenge on the developers and maintainers of large software systems. Recent research has addressed the mining for data in software repositories of single products ranging from fine- to coarse-grained analyses. But so far, little attention has been paid to mining data about the evolution of product families. In this work, we study the evolution and commonalities of three variants of the BSD, a large open source operating system. The research questions we tackle are concerned with how to generate high level views of the system discovering and indicating evolutionary highlights. To process the large amount of data, we extended our previously developed approach for storing release history information to support the analysis of product families. In a case study we apply our approach to data from three different code repositories representing about 8.5GB of data and 10 years of active development.
-
Michael Fischer, Johann Oberleitner, Harald C. Gall, System Evolution Tracking through Execution Trace Analysis, Proceedings of the 13th International Workshop on Program Comprehension 2005. (inproceedings)
Execution traces produced from instrumented code reflect a system's actual implementation. This information can be used to recover interaction patterns between different entities such as methods, files, or modules. Some solutions for the detection of patterns and their visualization exist, but are limited to small amounts of data and are incapable of comparing data from different versions of a large software system. In this paper, we propose a methodology to analyze and compare the execution traces of different versions of a software system to provide insights into its evolution. We recover high-level module views that facilitate the comprehension of each module's evolution. Our methodology allows us to track the evolution of particular modules and present the findings in three different kinds of visualizations. Based on these graphical representations, the evolution of the concerned modules can be tracked and comprehended much more effectively. Our EvoTrace approach uses standard database technology and instrumentation facilities of development tools, so exchanging data with other analysis tools is facilitated. Further, we show the applicability of our approach using the Mozilla open source system consisting of about 2 million lines of C/C++ code.
-
Martin Pinzger, Michael Fischer, Harald C. Gall, Towards an Integrated View on Architecture and its Evolution, Electronic Notes in Theoretical Computer Science Vol. 127 (3), April 2005. (article)
Information about the evolution of a software architecture can be found in the source basis of a project and in the release history data such as modification and problem reports. Existing approaches deal with these two data sources separately and do not exploit the integration of their analyses. In this paper, we present an architecture analysis approach that provides an integration of both kinds of evolution data. The analysis applies fact extraction and generates specific directed attributed graphs; nodes represent source code entities and edges represent relationships such as accesses, includes, inherits, invokes, and coupling between certain architectural elements. The integration of data is then performed on a meta-model level to enable the generation of architectural views using binary relational algebra. These integrated architectural views show intended and unintended couplings between architectural elements, hence pointing software engineers to locations in the system that may be critical for on-going and future maintenance activities. We demonstrate our analysis approach using a large open source software system.
-
Giuliano Antoniol, Massimiliano Di Penta, Harald C. Gall, Martin Pinzger, Towards the Integration of Versioning Systems, Bug Reports and Source Code Meta-Models, Electronic Notes in Theoretical Computer Science Vol. 127 (3), April 2005. (article)
Concurrent Versions System (CVS) repositories and bug tracking systems are valuable sources of information to study the evolution of large open source software systems. However, being conceived for specific purposes, i.e., to support the development or trigger maintenance activities, they neither allow easy browsing of information nor support the study of software evolution. For example, queries such as locating and browsing the faultiest methods are not provided. This paper addresses such issues and proposes an approach and a framework to consistently merge information extracted from source code, CVS repositories and bug reports. Our information representation exploits the property concepts of the FAMIX information exchange meta-model, making it possible to represent, browse, and query the concepts of interest at different levels of abstraction. This allows the user to navigate back and forth from CVS modification reports to bug reports and to source code. This paper presents the analysis framework and approaches to populate it, tools developed and under development for it, as well as lessons learned while analyzing several releases of Mozilla.
-
Martin Pinzger, Harald C. Gall, Michael Fischer, Michele Lanza, Visualizing multiple evolution metrics, Proceedings of the ACM Symposium on Software Visualization (SoftVis'2005) 2005, ACM. (inproceedings)
Observing the evolution of very large software systems requires the analysis of large complex data models and the visualization of condensed views on the system. For visualization, software metrics have been used to compute such condensed views. However, current techniques concentrate on visualizing data of one particular release and provide insufficient support for visualizing data of several releases. In this paper we present the RelVis visualization approach that concentrates on providing integrated condensed graphical views on source code and release history data of up to n releases. Measures of metrics of source code entities and relationships are composed in Kiviat diagrams as annual rings. Diagrams highlight the good and bad times of an entity and facilitate the identification of entities and relationships with critical trends. They represent potential refactoring candidates that should be addressed first before further evolving the system. The paper provides the needed background information and an evaluation of the approach with a large open source software project.
-
Gerald Reif, Harald C. Gall, Mehdi Jazayeri, WEESA - Web Engineering for Semantic Web Applications, Proceedings of the 14th International World Wide Web Conference, May 2005. (inproceedings)
The success of the Semantic Web crucially depends on the existence of Web pages that provide machine-understandable meta-data. This meta-data is typically added in the semantic annotation process, which is currently not part of the Web engineering process. Web engineering, however, proposes methodologies to design, implement and maintain Web applications but lacks the generation of meta-data. In this paper we introduce a technique to extend existing Web engineering methodologies to develop semantically annotated Web pages. The novelty of this approach is the definition of a mapping from XML Schema to ontologies, called WEESA, that can be used to automatically generate RDF meta-data from XML content documents. We further show how we integrated the WEESA mapping into an Apache Cocoon transformer to easily extend XML based Web applications to semantically annotated Web applications.
2004
-
Martin Pinzger, Michael Fischer, Mehdi Jazayeri, Harald C. Gall, Abstracting module views from source code, Proceedings of the International Conference on Software Maintenance (ICSM'04) 2004, IEEE Computer Society. (inproceedings)
In this paper we present ArchView, an approach for abstracting and visualizing software module views from source code. ArchView computes abstraction metrics that are used to filter out architectural elements and relationships of minor interest, resulting in more reasonable and comprehensible module views on software architectures.
-
Thomas Hodel, Harald C. Gall, Klaus R. Dittrich, Dynamic Collaborative Business Processes within Documents, In Proceedings of the 22nd Annual International Conference of Communication 2004. (inproceedings)
Effective collaborative business process support is essential in today's business. In this paper, we address this aspect within documents. Often, such text documents are stored unsystematically in a rather confusing file structure with an inscrutable hierarchy and little access control. Business data, on the other hand, are stored in a systematic way in databases allowing multi-user, multi-site, user-/role-specific controlled access. We store text documents in databases and exploit these database capabilities: collaborative business processes then can be defined per document or any part of a document. In this paper, we present this dynamic collaborative business process concept and the prototype within documents for our database-based collaborative editor. We evaluate the potential of such business processes for the quality of communication and documentation.
-
Gerald Reif, Harald C. Gall, Mehdi Jazayeri, Towards Semantic Web Engineering: WEESA - Mapping XML Schema to Ontologies, Workshop on Application Design, Development and Implementation Issues in the Semantic Web at the 13th International World Wide Web Conference, May 2004, CEUR Workshop Proceedings. (inproceedings)
The existence of semantically tagged Web pages is crucial to bring the Semantic Web to life. But it is still costly to develop and maintain Web applications that offer data and meta-data. Several standard Web engineering methodologies exist for designing and implementing Web applications. In this paper we introduce a technique to extend existing Web engineering techniques to develop semantically tagged Web applications. The novelty of this technique is the definition and implementation of a mapping from XML Schema to ontologies that can be used to automatically generate RDF meta-data from XML content documents.
-
Thomas Gschwind, Martin Pinzger, Harald C. Gall, TUAnalyzer - Analyzing Templates in C++ Code, Proceedings of the 11th Working Conference on Reverse Engineering (WCRE'04), November 2004, IEEE Computer Society. (inproceedings)
In this paper, we present TUAnalyzer, a novel tool that extracts the template structure of C++ programs on the basis of the GNU C/C++ Compiler's internal representation of a C/C++ translation unit. In comparison to other such tools, our tool is capable of supporting the extraction of function invocations that depend on the particular instantiation of C++ templates and to relate them to their particular template instantiation. TUAnalyzer produces RSF format output that can be easily fed into existing visualization and analysis tools such as Rigi or Graphviz. We motivate why this kind of template analysis information is essential to understand real-world legacy C++ applications. We present how our tool extracts this kind of information to allow others to build on our results and further use the template information. The applicability of our tool has been validated on real code as proof of concept. The results obtained with TUAnalyzer enable us and other approaches and tools to perform detailed studies of large (open source) C/C++ projects in the near future.
-
Michael Fischer, Harald C. Gall, Visualizing Feature Evolution of Large-Scale Software based on Problem and Modification Report Data, Journal of Software Maintenance and Evolution: Research and Practice Vol. 16 (6) 2004. (article)
Gaining higher-level evolutionary information about large software systems is a key challenge in dealing with increasing complexity and architectural deterioration. Modification reports and problem reports (PRs) taken from systems such as the concurrent versions system (CVS) and Bugzilla contain an overwhelming amount of information about the reasons and effects of particular changes. Such reports can be analyzed to provide a clearer picture about the problems concerning a particular feature or a set of features. Hidden dependencies of structurally unrelated but over time logically coupled files exhibit a good potential to illustrate feature evolution and possible architectural deterioration. In this paper, we describe the visualization of feature evolution by taking advantage of this logical coupling introduced by changes required to fix a reported problem. We compute the proximity of PRs by applying a standard technique called multidimensional scaling (MDS). The visualization of these data enables us to depict feature evolution by projecting PR dependence onto (a) feature-connected files and (b) the project directory structure of the software system. These two different views show how PRs, features and the directory tree structure relate. As a result, our approach uncovers hidden dependencies between features and presents them in an easy to assess visual form. A visualization of interwoven features can indicate locations of design erosion in the architectural evolution of a software system. As a case study, we used Mozilla and its CVS and Bugzilla data to show the applicability and effectiveness of our approach.
-
Schahram Dustdar, Harald C. Gall, Roman Schmidt, Web services for Groupware in Distributed and Mobile Collaboration, Proceedings of the 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing 2004. (inproceedings)
While some years ago the focus of many Groupware systems has been the support of "Web computing", i.e. to support access with Web browsers, the focus today is shifting towards a programmatic access to "software services", regardless of their location and the application used to manipulate those services. Whereas the goal of "Web Computing" has been to support group work on the Web (browser), Web services support for Groupware has the goal to provide interoperability between many groupware systems. The contribution of this paper is threefold: (i) to present a framework consisting of three levels of Web services for Groupware support, (ii) to present a novel Web services management and configuration architecture with the aim of integrating various Groupware systems in one overall configurable architecture, and (iii) to provide a use case scenario and preliminary proof-of-concept implementation example. Our overall goal for this paper is to provide a sound and flexible architecture for gluing together various Groupware systems using Web services technologies.