
Open topics for Bachelor and Master Theses

VPSA for evaluating user studies

Many researchers have discussed and analyzed the effect of replicating studies in terms of data collection. The methods used to analyze the data matter as well: there is a variety of data-cleaning options and statistical tests to choose from. We want to examine how these choices affect the outcome, using a variety of datasets. How do significance and power change under different types of analysis?

Goals and tasks of this project are:

  • Collect a number of open-data user studies from the visualization and HCI community
  • Design a pipeline using, e.g., R or Python, to run a number of different statistical analyses of the data
  • Sample this pipeline and analyze the results
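As a rough sketch of what such a pipeline could look like, the snippet below crosses two outlier-handling choices with two test statistics and reports a permutation-test p-value for each combination. The data, the cleaning rule and all function names are hypothetical and only illustrate the idea; a real pipeline would cover many more analysis options.

```python
import random
import statistics
from itertools import product

def drop_outliers(values, z_cut):
    """Remove values more than z_cut standard deviations from the mean (None keeps all)."""
    if z_cut is None:
        return list(values)
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= z_cut * sd]

def permutation_p(a, b, center, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of a center statistic."""
    rng = random.Random(seed)
    observed = abs(center(a) - center(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(center(pooled[:len(a)]) - center(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def run_pipeline(a, b):
    """Run every combination of cleaning option and test statistic."""
    cleaning = {"keep all": None, "drop |z| > 2": 2.0}
    stats = {"mean": statistics.mean, "median": statistics.median}
    results = {}
    for (c_name, z_cut), (s_name, center) in product(cleaning.items(), stats.items()):
        clean_a, clean_b = drop_outliers(a, z_cut), drop_outliers(b, z_cut)
        results[(c_name, s_name)] = permutation_p(clean_a, clean_b, center)
    return results

# Hypothetical task-completion times for two study conditions.
condition_a = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7, 30.0]
condition_b = [14.2, 13.9, 14.8, 15.1, 14.0, 14.6, 14.4]
for key, p in run_pipeline(condition_a, condition_b).items():
    print(key, round(p, 3))
```

Comparing the p-values across the four combinations shows how much a single conclusion can depend on the analysis choices.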

Contact: Thomas Torsney-Weir

University Course Evaluation

Area: Visualization / HCI

The university gathers data about all its courses after each semester in the form of evaluations and other quantitative measures. The main goal is to create an interactive dashboard to find correlations/outliers in the dataset and analyze them.

  • Design a dashboard which visualizes course evaluation data and is able to provide answers to common questions regarding this data.
  • It should also be able to show changes in certain courses over time.
  • The dataset contains real (anonymized) evaluations from the university and one challenge will be analyzing the dataset.

Contact: Raphael Sahann

Study Path Visualization

Area: Visualization

Each curriculum has a "suggested path" which students are supposed to take in order to finish their studies in time. Data shows that almost no one actually does so. The main question in this project is: how do students actually complete their study and how similar are the paths they take to do so?

  • Find a suitable visualization for the path which an individual student takes through his/her study
  • compute a measure to estimate the difference between two study paths
  • create an interface which visualizes multiple student paths at once, lets the user select a group of similar paths and compare and explore individual paths.
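One possible starting point for the difference measure is an edit distance over the ordered sequence of courses a student completed. The sketch below uses the Levenshtein distance; the course codes are invented, and treating a path as a flat sequence (ignoring semesters and grades) is an assumption made for brevity.

```python
def path_distance(path_a, path_b):
    """Levenshtein distance between two course sequences.

    Counts the minimum number of insertions, deletions and
    substitutions needed to turn path_a into path_b.
    """
    prev = list(range(len(path_b) + 1))
    for i, course_a in enumerate(path_a, start=1):
        curr = [i]
        for j, course_b in enumerate(path_b, start=1):
            cost = 0 if course_a == course_b else 1
            curr.append(min(prev[j] + 1,          # delete course_a
                            curr[j - 1] + 1,      # insert course_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# Hypothetical course codes.
suggested = ["PR1", "MATH1", "PR2", "ALGO", "DB"]
student = ["PR1", "PR2", "MATH1", "DB"]
print(path_distance(suggested, student))
```

Other measures, for example ones that weight courses by ECTS credits or compare per-semester sets, may fit the data better and are worth comparing.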

Contact: Raphael Sahann

Visualization of Text Clusters

Area: Visualization

Even after successful clustering of documents, it is hard to look at the clusters and judge how meaningful they are. In this project you will explore different visualizations and find a suitable way of showing text clusterings.

  • Implement a current clustering algorithm and use it on a classical dataset
  • Explore the design space of visualizations for this problem
  • Create a simple web-based tool for the visualization

Contact: Christoph Kralj

Word vector based exploration of Music Tags

Area: Machine learning

Music tags are one interesting way of exploring music and word vectors allow a different view on those tags. This project tries to make those tags accessible through an interactive visualization which shows close neighbors of interesting tags and their songs.

  • Vectorize the tags of the Million Song Dataset
  • Create a visual exploration tool for this space
  • Implement graph/neighbor based methods for extraction of new songs

Contact: Christoph Kralj

Comparison of Bag-of-words and Bag-of-phrases as input for TF-IDF-like measures

Area: Natural Language Processing, Information Retrieval

In Information Retrieval, TF-IDF is a widely used measure for finding relevant documents. TF-IDF usually takes single words as input. In this project you will find out how using whole phrases instead changes the performance of downstream tasks that use the TF-IDF output (clustering, classification).

  • Create a simple IR pipeline using TF-IDF and similar measures
  • Implement an algorithm which extracts phrases (using libraries is allowed/recommended)
  • Extend the initial pipeline with different extraction methods and compare their performance
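To make the difference between the two inputs concrete, here is a minimal sketch that computes a plain tf * log(N / df) weighting once over single words and once over two-word phrases. Using every adjacent word pair as a "phrase" is a simplifying assumption; a proper phrase extractor would be more selective, and the example documents are made up.

```python
import math
from collections import Counter

def terms(text, n):
    """Split text into lowercase n-grams (n=1: words, n=2: two-word phrases)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def tf_idf(docs, n):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [terms(doc, n) for doc in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in tokenized for term in set(doc))
    n_docs = len(docs)
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({term: (count / len(doc)) * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights

docs = ["new york times", "new york post", "los angeles times"]
print(tf_idf(docs, 1)[0])  # bag of words
print(tf_idf(docs, 2)[0])  # bag of phrases
```

In the word-based version "times" gets a low weight because two documents share it, while the phrase "york times" is unique to the first document and therefore weighted higher.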

Contact: Christoph Kralj

Clustering of grade distributions

We have grade distributions from nearly 1000 courses at our faculty. We would like to explore visual and algorithmic ways of exploring this data set. Core questions are:

  • are there clusters of grade distributions?
  • how pronounced are these clusters?

In order to tackle this problem, there is a combination of tasks to be followed:

  • convert the distribution into a space of percentages plus number of students
  • create an interface to "view" this (4+1)-dim space
  • apply standard clustering methods and visualize the results
  • show the impact of a change of parameters to the clustering algorithm visually
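The first task can be sketched as follows, assuming five grade levels (1 to 5) per course. Because the five percentages always sum to 100, one of them is redundant, which is one way to read the "(4+1)-dim" space: four percentages plus the number of students. The function name and the sample counts are hypothetical.

```python
def to_feature_vector(grade_counts):
    """Convert counts for grades 1..5 into 4 percentages plus the student total.

    The percentage of the last grade is dropped because it follows
    from the other four (all five sum to 100).
    """
    total = sum(grade_counts)
    if total == 0:
        raise ValueError("course has no graded students")
    percentages = [100.0 * count / total for count in grade_counts]
    return percentages[:-1] + [total]

# Hypothetical course: number of students with grade 1, 2, 3, 4 and 5.
print(to_feature_vector([5, 10, 15, 10, 10]))
```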

Contact: Torsten Möller | Claudia Plant

Personal finance

Area: Visualization / HCI

Today, abundance of data and scarcity of time can make it very difficult to be well informed about specific needs. This becomes even more relevant when it is about our own finances. The main goal of this project is to create a quick snapshot of your state-of-finances.

  • Design an interface which visualizes valuable financial information that belongs to a user. The purpose is to create a visual analysis pipeline. It should allow users to grasp, in less than 60 seconds and with a handful of indicators, a full picture of their financial situation.
  • The solution can draw on a dataset which will be provided by Erste Bank. The dataset contains real (anonymized) transactions with different categories. One challenge will be to analyze this dataset.
  • The visual outcome of the process should capture a specific time frame (e.g. last month, current month, last 3 months, or a personalized time frame).

Contact: Torsten Möller

Visual Document Exploration for Journalists

Area: Visualization / Machine Learning / NLP

Typical document categorization systems use automatic clustering. There is evidence that this method does not produce human-understandable categorizations and does not match how a human would categorize documents. This project would combine machine learning with an interactive document exploration system to better support humans in classifying documents.

  • Analyze state-of-the-art in research and practice
  • Identify an interesting test case of a document collection (e.g. wikileaks data, or wikipedia articles)
  • Develop a tool that
    • allows documents to be grouped manually
    • trains and updates a classifier in the background
    • recommends other documents the journalist might be interested in
    • visually represents the data to foster overview, understanding and usability

Contact: Thomas Torsney-Weir | Elena Rudkowsky

Visually exploring neural networks

We have a collection of 100,000 different neural networks from the TensorFlow Playground. The core goal of this project is to create a visual interface to understand some of the basic properties of neural networks. Enabling a user to explore this collection should help answer questions about, for example, the relationship between the number of neurons and the number of hidden layers, or the impact of batch size, activation functions and other parameters on the quality of the network. Your tasks include:

  • fast prototyping with Tableau
  • getting familiar with the data set
  • querying neural network users on what parameters they want to explore (requirement analysis)
  • development of low-fi and high-fi prototypes

Contact: Torsten Möller

Exploration of confusion matrices

When you are creating a classifier, the error is often reduced to just one number, which is typically the average of all misclassifications. However, if you train two classes A and B, an A could be mistaken for a B and a B could be mistaken for an A. The difference is really important if A leads to surgery (for instance). Hence, given the data set of 100,000 neural networks from the TensorFlow Playground, try to understand the tradeoff of training the two classes (orange and blue). What impact do you find on the structure of the neural network? Your tasks include:

  • an interactive interface to change the percentage of the importance of classification errors A given B vs. B given A
  • understanding the tradeoff of different neural networks with regards to type of classification, but also to complexity and accuracy, etc.
  • suggest an extension to classifiers with more than three classes.
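The tradeoff can be illustrated with a weighted error in which a single slider value sets how much an "A predicted as B" error counts relative to a "B predicted as A" error. This is only one possible formulation; the function name, the weighting scheme and the two confusion matrices are invented for illustration.

```python
def weighted_error(confusion, weight_a_as_b):
    """Weighted misclassification rate for a two-class confusion matrix.

    confusion maps (true_class, predicted_class) to a count.
    weight_a_as_b in [0, 1] is the importance of "true A predicted as B";
    the remaining 1 - weight_a_as_b goes to "true B predicted as A".
    """
    weight_b_as_a = 1.0 - weight_a_as_b
    total = sum(confusion.values())
    cost = (weight_a_as_b * confusion[("A", "B")]
            + weight_b_as_a * confusion[("B", "A")])
    return cost / total

# Two hypothetical networks with the same overall error count (12 of 100).
net_1 = {("A", "A"): 40, ("A", "B"): 10, ("B", "A"): 2, ("B", "B"): 48}
net_2 = {("A", "A"): 47, ("A", "B"): 3, ("B", "A"): 9, ("B", "B"): 41}

for weight in (0.5, 0.9):
    print(weight, weighted_error(net_1, weight), weighted_error(net_2, weight))
```

With equal weights both networks look identical, but as soon as missing an A becomes more important, the second network is clearly preferable.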

Contact: Torsten Möller

Visualisation-Supported Comparison of Image Segmentation Metrics

Area: Visualization, Image Processing

Segmentation algorithms, which assign labels to each element in a 2D/3D image, need to be evaluated regarding their performance on a given dataset. The quality of an algorithm is typically determined by comparing its result to a manually labelled image. Many metrics can be used to compute a single number representing the similarity of two such segmentation results, all with specific advantages and disadvantages. The goal in this project is to:

  • Research the segmentation metrics in use in the literature.
  • Create a tool that calculates multiple segmentation quality metrics on an image.
  • With the help of this tool, analyze how the single segmentation metrics perform in detecting specific kinds of errors in the segmentation results, as well as correlations between the metrics.
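Two of the most common overlap metrics, the Dice coefficient and the Jaccard index, can be sketched in a few lines. The example assumes binary masks flattened into lists of 0/1 values; real images would typically be arrays with several labels, and the sample masks are made up.

```python
def dice(pred, truth):
    """Dice coefficient: 2 * |P and T| / (|P| + |T|) for binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if size == 0 else 2.0 * intersection / size

def jaccard(pred, truth):
    """Jaccard index: |P and T| / |P or T| for binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else intersection / union

# Hypothetical flattened masks: the prediction is shifted by one pixel.
truth = [0, 1, 1, 1, 0, 0]
pred = [0, 0, 1, 1, 1, 0]
print(dice(pred, truth), jaccard(pred, truth))
```

Both metrics are computed from the same overlap counts, so they are strongly correlated; boundary-based metrics such as the Hausdorff distance react to different kinds of errors and would make the comparison more interesting.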

Contact: Bernhard Fröhler | Torsten Möller

iTuner: Touch interfaces for high-D visualization

Area: Visualization / HCI

Motivation: It is very hard for users to build up a visual understanding of spaces with dimensionality greater than 3.

  • Develop a touch-screen interface for navigating high-dimensional spaces.
  • The user interface should be designed for a tablet (iPad) to be used in concert with a larger screen such as a monitor or television.

Contact: Thomas Torsney-Weir 

The Perception of Visual Uncertainty Representation by Non-Experts

Area: Visualization / HCI

Motivation:

  • Understanding / Communicating uncertainty and sensitivity information is difficult
  • Uncertainty is part of everyday life for any type of decision making process
  • Some of the previous studies done are unclear and could be improved

Goals and Tasks of several different projects:

  • Brainstorm about different visual encodings
  • Run and evaluate a larger Amazon Mechanical Turk study

Contact: Torsten Möller | Thomas Torsney-Weir

Histogram Design

Area: Visualization / HCI

Histograms are often used as the first method to gain a quick overview of the statistical distribution of a collection of values, such as the pixel intensities in an image. Depending, for example, on the datatype of the underlying data (categorical, ordinal or continuous) and the number of data values that are available, several visualization parameters can be considered in constructing a histogram, such as bin width, aspect ratio, tick marks, etc. The perception of a histogram might vary quite a bit depending on the exact parameters chosen, and this might also influence the interpretation. On some of the above points, you should be able to find literature already.

  • Create a web application (e.g. in d3) that allows users to enter data in a tabular format, and creates different histograms based on these values.
  • At least the parameters mentioned above should be adaptable by the user
  • Search for rules for determining above parameters automatically from the data, and implement a few
  • Research the variety of tasks that histograms are used for, for instance understanding distributions, filtering of data, finding modes in distribution (number and count)
  • Evaluate the different encodings regarding their effect on the found task.
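For the automatic parameter rules, three well-known rules from the literature for choosing the number or width of bins are Sturges' rule, the Freedman-Diaconis rule and Scott's rule. The sketch below shows them in Python for brevity (the web application itself would more likely use JavaScript); the function names are hypothetical, and the quartiles depend on the interpolation method used by `statistics.quantiles`.

```python
import math
import statistics

def sturges_bin_count(n):
    """Sturges' rule: number of bins = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2.0 * (q3 - q1) / len(values) ** (1.0 / 3.0)

def scott_width(values):
    """Scott's rule: bin width = 3.49 * standard deviation / n^(1/3)."""
    return 3.49 * statistics.stdev(values) / len(values) ** (1.0 / 3.0)

# Hypothetical data: the integers 1..100.
data = list(range(1, 101))
print(sturges_bin_count(len(data)))
print(freedman_diaconis_width(data))
print(scott_width(data))
```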

Contact: Torsten Möller | Bernhard Fröhler

Semi-Automated Data Cleansing of Time Series

Area: Visualization

Many application domains involve a large number of time series, e.g., the energy sector and industrial quality management. However, such data is often afflicted by data quality problems like missing values, outliers, and other types of anomalies. For various downstream tasks, it is not sufficient to merely detect such quality problems; the data must also be cleansed. Doing this manually for regularly acquired data may become very time-consuming. On the other hand, fully automated data cleansing may cause a lack of trust in the data by domain experts.

The goal of this work is to design and implement a software prototype that supports a semi-automated process of cleansing time series data. The key idea is to offer the user different mechanisms for cleansing data problems which are suggested by the system in a context-specific way. The flexibility of the user should range from a fully automated "cleanse everything" action to a detailed manual inspection of each detected problem and a corresponding individual choice of cleansing strategy.

Contact: Torsten Möller | Harald Piringer (VRVis)

Task-Oriented Guidance for Visualization Recommendation

Area: Visualization

In many application domains, data involves a large number of attributes and categories. In industrial manufacturing, for example, numerous quality indicators are measured for each produced item along with process information such as the order ID, the used machinery, and much more. For such complex data, manually searching for visualizations that reveal interesting patterns such as correlations, trends, and outliers may become very tedious and time-consuming.

The goal of this work is to extend well-known views such as scatterplots, histograms, or categorical views by integrating on-demand recommendations of view parameterizations which may be worth looking at. Typical examples could include “list all scatterplots showing correlations between data attributes for any data subset”, or “rank all time-series plots by how clearly they show a trend over the past weeks”. Important tasks of this work are thus to:

  • identify meaningful tasks in the context of various visualization types
  • implement corresponding quality metrics which should ideally be computed efficiently in the background without disturbing the actual analysis
  • design and implement intuitive ways to present the possible visualization options as previews to the user in a way that is not obtrusive to the analysis and which scales to a large number of possible variants (e.g., by clustering the variants into dissimilar groups).
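A very simple quality metric for the scatterplot example is the absolute Pearson correlation of each attribute pair, used to rank the candidate plots. The sketch below assumes the data is a small in-memory table; the attribute names and values are invented, and a real system would need metrics for other view types and an efficient background computation.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant attribute has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def rank_scatterplots(table):
    """Rank all attribute pairs by absolute correlation, strongest first."""
    scored = [((a, b), pearson(table[a], table[b]))
              for a, b in combinations(table, 2)]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical manufacturing data.
table = {
    "temperature": [1, 2, 3, 4, 5],
    "defects": [2, 4, 6, 8, 10],
    "speed": [5, 3, 4, 1, 2],
}
for pair, r in rank_scatterplots(table):
    print(pair, round(r, 2))
```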

Contact: Torsten Möller | Harald Piringer (VRVis)

Visual communication of the interdependencies of Bachelor and Master programs

Area: Vis / HCI

There are altogether N Bachelor programs and M Master programs at the University of Vienna. How many Master's programs are there? Which Bachelor programs give you entry to which Master's programs? Given my Bachelor degree, which Master's programs do I have access to? These are just a few of the questions that are not that easy to find an answer to. The goal of this project is to find a good visual interface to help people find these answers. In phase one, the focus is on the programs of the University of Vienna. In phase two of the project, we would like to incorporate the Master's programs of other Austrian universities. Important tasks to fulfill would be:

  • do a task and user/use-case analysis of such an interface
  • develop multiple different prototypes for such an interface
  • test and iteratively improve your prototypes with potential users

This project is in collaboration with the folks at the Student Services.

Contact: Torsten Möller | Stephan Precht | Carmen Fuchs

Contact us
Faculty of Computer Science
University of Vienna

Währinger Straße 29
A-1090 Vienna