
University of Applied Sciences and Arts of Southern Switzerland
University, Manno, Ticino, Switzerland
Research output, citation impact, and the most-cited recent papers from University of Applied Sciences and Arts of Southern Switzerland (Switzerland). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from University of Applied Sciences and Arts of Southern Switzerland
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling, achieving or exceeding state-of-the-art performance on each of four benchmark tasks. Our goal was to design a flexible architecture that can learn representations useful for the tasks, thus avoiding excessive task-specific feature engineering (and therefore disregarding a lot of prior knowledge). Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabelled training data. This work is then used as a basis for building a freely available tagging system with excellent performance while requiring minimal computational resources.
Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
The Covid-19 pandemic has raised significant challenges for the higher education community worldwide. A particular challenge has been the urgent and unexpected request for previously face-to-face university courses to be taught online. Online teaching and learning imply a certain pedagogical content knowledge (PCK), mainly related to designing and organising for better learning experiences and creating distinctive learning environments, with the help of digital technologies. With this article, we provide some expert insights into this online-learning-related PCK, with the goal of helping non-expert university teachers (i.e. those who have little experience with online learning) to navigate in these challenging times. Our findings point at the design of learning activities with certain characteristics, the combination of three types of presence (social, cognitive and facilitatory) and the need for adapting assessment to the new learning requirements. We end with a reflection on how responding to a crisis (as best we can) may precipitate enhanced teaching and learning practices in the postdigital era.
Faster resting heart rate has been shown to be associated with a higher risk of developing hypertension and a greater incidence of cardiovascular morbidity and mortality. The aim of this study was to investigate the distribution of heart rate and its relationship with blood pressure and other cardiovascular risk factors in three populations. One European general population (Belgian study), one North American general population (Tecumseh study), and one European hypertensive population (HARVEST trial) were studied. Within each population, mixture analysis was used to investigate whether a mixture of two normal distributions explained the variance in heart rate better than a single distribution. In the men of all populations, mixture analysis identified a larger subpopulation of subjects with normal heart rate and a smaller one with fast heart rate. The subgroups with tachycardia had higher blood pressure and lipid levels than those with normal heart rate. In the populations in which they were measured, fasting insulin and postload glucose were also higher in the men with faster heart rate. A subgroup with tachycardia could also be singled out among the women from Tecumseh, but no relation between heart rate and blood pressure could be found. These findings show that in Western societies, high heart rate pertains to a distinct subgroup of subjects, who are more frequently men and exhibit the characteristic features of the insulin resistance syndrome. Sympathetic overactivity is likely to be the mechanism underlying this clinical condition.
Many financial markets are characterized by strong relationships and networks, rather than arm's‐length, spot market transactions. We examine the performance consequences of this organizational structure in the context of relationships established when VCs syndicate portfolio company investments. We find that better‐networked VC firms experience significantly better fund performance, as measured by the proportion of investments that are successfully exited through an IPO or a sale to another company. Similarly, the portfolio companies of better‐networked VCs are significantly more likely to survive to subsequent financing and eventual exit. We also provide initial evidence on the evolution of VC networks.
Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-of-domain data. If we aspire to develop models with understanding beyond the detection of superficial correspondences between inputs and outputs, then it is critical to develop a unified model that can execute a range of linguistic tasks across different domains. To facilitate research in this direction, we present the General Language Understanding Evaluation (GLUE, gluebenchmark.com): a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models. For some benchmark tasks, training data is plentiful, but for others it is limited or does not match the genre of the test set. GLUE thus favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks. While none of the datasets in GLUE were created from scratch for the benchmark, four of them feature privately-held test data, which is used to ensure that the benchmark is used fairly. We evaluate baselines that use ELMo (Peters et al., 2018), a powerful transfer learning technique, as well as state-of-the-art sentence representation models. The best models still achieve fairly low absolute scores. Analysis with our diagnostic dataset yields similarly weak performance over all phenomena tested, with some exceptions.
This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.
We address a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy (EM) images. This is necessary to efficiently map 3D brain structure and connectivity. To segment biological neuron membranes, we use a special type of deep artificial neural network as a pixel classifier. The label of each pixel (membrane or non-membrane) is predicted from raw pixel values in a square window centered on it. The input layer maps each window pixel to a neuron. It is followed by a succession of convolutional and max-pooling layers which preserve 2D information and extract features with increasing levels of abstraction. The output layer produces a calibrated probability for each class. The classifier is trained by plain gradient descent on a 512 × 512 × 30 stack with known ground truth, and tested on a stack of the same size (ground truth unknown to the authors) by the organizers of the ISBI 2012 EM Segmentation Challenge. Even without problem-specific postprocessing, our approach outperforms competing techniques by a large margin in all three considered metrics, i.e. rand error, warping error and pixel error. For pixel error, our approach is the only one outperforming a second human observer.
We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way. Our deep hierarchical architectures achieve the best published results on benchmarks for object classification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with error rates of 2.53%, 19.51%, and 0.35%, respectively. Deep nets trained by simple back-propagation perform better than more shallow ones. Learning is surprisingly rapid. NORB is completely trained within five epochs. Test error rates on MNIST drop to 2.42%, 0.97% and 0.48% after 1, 3 and 17 epochs, respectively.
We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.
Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.
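The adaptive gating described in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the layer equation, so the code below assumes the commonly cited highway formulation y = T(x)·H(x) + (1 − T(x))·x, with a tanh transform H and a sigmoid gate T; the function names and the pure-Python list representation are illustrative choices, not the authors' implementation.

```python
import math


def sigmoid(z):
    """Logistic function, used here for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer (assumed form): y = T * H + (1 - T) * x.

    H is a tanh transform of the input and T is a sigmoid gate.
    When T is near 0 the input passes through unchanged; when T is
    near 1 the layer behaves like an ordinary nonlinear layer.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

With a strongly negative gate bias the layer is close to the identity map, which is one intuition for why very deep stacks of such layers remain trainable.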
OBJECTIVE: Remission in rheumatoid arthritis (RA) is an increasingly attainable goal, but there is no widely used definition of remission that is stringent but achievable and could be applied uniformly as an outcome measure in clinical trials. This work was undertaken to develop such a definition. METHODS: A committee consisting of members of the American College of Rheumatology, the European League Against Rheumatism, and the Outcome Measures in Rheumatology Initiative met to guide the process and review prespecified analyses from RA clinical trials. The committee requested a stringent definition (little, if any, active disease) and decided to use core set measures including, as a minimum, joint counts and levels of an acute-phase reactant to define remission. Members were surveyed to select the level of each core set measure that would be consistent with remission. Candidate definitions of remission were tested, including those that constituted a number of individual measures of remission (Boolean approach) as well as definitions using disease activity indexes. To select a definition of remission, trial data were analyzed to examine the added contribution of patient-reported outcomes and the ability of candidate measures to predict later good radiographic and functional outcomes. RESULTS: Survey results for the definition of remission suggested indexes at published thresholds and a count of core set measures, with each measure scored as 1 or less (e.g., tender and swollen joint counts, C-reactive protein [CRP] level, and global assessments on a 0-10 scale). Analyses suggested the need to include a patient-reported measure. Examination of 2-year followup data suggested that many candidate definitions performed comparably in terms of predicting later good radiographic and functional outcomes, although 28-joint Disease Activity Score-based measures of remission did not predict good radiographic outcomes as well as the other candidate definitions did. 
Given these and other considerations, we propose that a patient's RA can be defined as being in remission based on one of two definitions: (a) when scores on the tender joint count, swollen joint count, CRP (in mg/dl), and patient global assessment (0-10 scale) are all ≤ 1, or (b) when the score on the Simplified Disease Activity Index is ≤ 3.3. CONCLUSION: We propose two new definitions of remission, both of which can be uniformly applied and widely used in RA clinical trials. We recommend that one of these be selected as an outcome measure in each trial and that the results on both be reported for each trial.
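The two proposed definitions are simple threshold rules and can be sketched in a few lines. The function and parameter names below are hypothetical; the thresholds (all four measures ≤ 1, or a Simplified Disease Activity Index score ≤ 3.3) are taken from the abstract. The sketch takes an already computed SDAI score as input and is not a clinical tool.

```python
def boolean_remission(tender_joints, swollen_joints, crp_mg_dl, patient_global):
    """Definition (a): every measure must be <= 1.

    crp_mg_dl is C-reactive protein in mg/dl; patient_global is the
    patient global assessment on a 0-10 scale.
    """
    measures = (tender_joints, swollen_joints, crp_mg_dl, patient_global)
    return all(value <= 1 for value in measures)


def sdai_remission(sdai_score):
    """Definition (b): Simplified Disease Activity Index score <= 3.3."""
    return sdai_score <= 3.3
```

For example, `boolean_remission(1, 0, 0.4, 1)` satisfies definition (a), while a single measure above 1 (such as a tender joint count of 2) does not.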
We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate-distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer. Under certain conditions, the relaxed loss function may be interpreted as the log likelihood of a generative model, as implemented by a variational autoencoder. Unlike these models, however, the compression model must operate at any given point along the rate-distortion curve, as specified by a trade-off parameter. Across an independent set of test images, we find that the optimized method generally exhibits better rate-distortion performance than the standard JPEG and JPEG 2000 compression methods. More importantly, we observe a dramatic improvement in visual quality for all images at all bit rates, which is supported by objective quality estimates using MS-SSIM.
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
The authors provide a theoretical interpretation of two features of international data: the countercyclical movements in net exports and the tendency for the trade balance to be negatively correlated with current and future movements in terms of trade but positively correlated with past movements. They document the same properties in a two-country stochastic growth model in which trade fluctuations reflect, in large part, the dynamics of capital formation. The authors find that their general-equilibrium perspective is essential: the relation between the trade balance and the terms of trade depends critically on the source of fluctuations. Copyright 1994 by American Economic Association.
Objective methods for assessing perceptual image quality traditionally attempt to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MatLab implementation of the proposed algorithm is available online at
The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multi-scale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.
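The single-scale structural similarity index that this multi-scale method builds on can be sketched as follows. Neither abstract gives the formula, so the code assumes the widely used form SSIM = (2·μx·μy + C1)(2·σxy + C2) / ((μx² + μy² + C1)(σx² + σy² + C2)) with C1 = (0.01·L)² and C2 = (0.03·L)² for dynamic range L. As a simplification, it computes one value over a whole flattened patch with unweighted population statistics, rather than averaging over local weighted windows or combining several scales.

```python
def ssim_single_window(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM over one window, given two equal-length lists of pixel values.

    Simplified sketch: unweighted population mean, variance and
    covariance over the whole patch (no local windows, no scales).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilise the ratio when means or variances are near zero.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical patches score 1, and the score falls as luminance, contrast, or structure diverge.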
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
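Inference as described here, clamping the observed variable and minimizing the energy over the remaining one, can be sketched for a small discrete case. The energy function `toy_energy` and the candidate set are hypothetical and chosen only for illustration; practical EBMs typically need gradient-based or combinatorial search instead of exhaustive enumeration.

```python
def infer(energy, x, candidates):
    """Return the candidate y with the lowest energy E(x, y).

    x is the clamped (observed) variable; candidates enumerates the
    possible values of the unobserved variable y.
    """
    return min(candidates, key=lambda y: energy(x, y))


def toy_energy(x, y):
    """Hypothetical energy: lowest when y equals 2 * x."""
    return (y - 2 * x) ** 2
```

No normalization over `candidates` is needed: only the relative ordering of energies matters, which is the point the abstract makes about avoiding intractable partition functions.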