University of Technology Sydney
University · Sydney, New South Wales, Australia
Research output, citation impact, and the most-cited recent papers from University of Technology Sydney (Australia). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from University of Technology Sydney
Structural equation modeling (SEM) has become a quasi-standard in marketing and management research when it comes to analyzing the cause-effect relations between latent constructs. For most researchers, SEM is equivalent to carrying out covariance-based SEM (CB-SEM). While marketing researchers have a basic understanding of CB-SEM, most of them are only barely familiar with the other useful approach to SEM: partial least squares SEM (PLS-SEM). The current paper reviews PLS-SEM and its algorithm, and provides an overview of when it can be most appropriately applied, indicating its potential and limitations for future research. The authors conclude that PLS-SEM path modeling, if appropriately applied, is indeed a "silver bullet" for estimating causal models in many theoretical models and empirical data situations.
The last decade has seen a sharp increase in the number of scientific publications describing physiological and pathological functions of extracellular vesicles (EVs), a collective term covering various subtypes of cell-released, membranous structures, called exosomes, microvesicles, microparticles, ectosomes, oncosomes, apoptotic bodies, and many other names. However, specific issues arise when working with these entities, whose size and amount often make them difficult to obtain as relatively pure preparations, and to characterize properly. The International Society for Extracellular Vesicles (ISEV) proposed Minimal Information for Studies of Extracellular Vesicles ("MISEV") guidelines for the field in 2014. We now update these "MISEV2014" guidelines based on evolution of the collective knowledge in the last four years. An important point to consider is that ascribing a specific function to EVs in general, or to subtypes of EVs, requires reporting of specific information beyond mere description of function in a crude, potentially contaminated, and heterogeneous preparation. For example, claims that exosomes are endowed with exquisite and specific activities remain difficult to support experimentally, given our still limited knowledge of their specific molecular machineries of biogenesis and release, as compared with other biophysically similar EVs. The MISEV2018 guidelines include tables and outlines of suggested protocols and steps to follow to document specific EV-associated functional activities. Finally, a checklist is provided with summaries of key points.
Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications, where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on the existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this article, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art GNNs into four categories, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatial-temporal GNNs. We further discuss the applications of GNNs across various domains and summarize the open-source codes, benchmark data sets, and model evaluation of GNNs. Finally, we propose potential research directions in this rapidly growing field.
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
This paper describes a new approach for generalizing the Kalman filter to nonlinear systems. A set of samples are used to parametrize the mean and covariance of a (not necessarily Gaussian) probability distribution. The method yields a filter that is more accurate than an extended Kalman filter (EKF) and easier to implement than an EKF or a Gauss second-order filter. Its effectiveness is demonstrated using an example.
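The sampling idea in this abstract can be illustrated with a minimal scalar sketch. The abstract does not spell out the sampling scheme, so the symmetric three-point construction below (often called the unscented transform), the function name `sigma_point_transform`, and the default `kappa=2.0` are assumptions made for illustration, not the authors' exact formulation.

```python
import math

def sigma_point_transform(mean, var, f, kappa=2.0):
    """Propagate a scalar mean and variance through a nonlinear function f.

    Assumed scheme: three deterministically chosen samples (sigma points)
    whose weighted mean and variance match the input distribution.
    """
    n = 1  # state dimension of this scalar sketch
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]

    # Push every sample through the nonlinearity, then re-estimate the moments.
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var

if __name__ == "__main__":
    print(sigma_point_transform(0.0, 1.0, lambda x: x * x))
```

Unlike an EKF, this sketch needs no derivative of `f`; only function evaluations at the sample points are used, which is one reason such filters are described as easier to implement.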
Single image haze removal is a challenging ill-posed problem. Existing methods use various constraints/priors to get plausible dehazing solutions. The key to achieving haze removal is to estimate a medium transmission map for an input hazy image. In this paper, we propose a trainable end-to-end system called DehazeNet, for medium transmission estimation. DehazeNet takes a hazy image as input, and outputs its medium transmission map that is subsequently used to recover a haze-free image via the atmospheric scattering model. DehazeNet adopts a convolutional neural network-based deep architecture, whose layers are specially designed to embody the established assumptions/priors in image dehazing. Specifically, layers of Maxout units are used for feature extraction, which can generate almost all haze-relevant features. We also propose a novel nonlinear activation function in DehazeNet, called bilateral rectified linear unit, which is able to improve the quality of the recovered haze-free image. We establish connections between the components of the proposed DehazeNet and those used in existing methods. Experiments on benchmark images show that DehazeNet achieves superior performance over existing methods, yet remains efficient and easy to use.
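As a rough illustration of the recovery step, the sketch below inverts the atmospheric scattering model in its commonly stated form, I = J·t + A·(1 − t), for a single pixel intensity, and shows a clamp-style activation of the kind the name "bilateral rectified linear unit" suggests. The function names, the `t_floor` guard, and the default bounds are assumptions for illustration; the abstract itself does not give these formulas.

```python
def brelu(x, t_min=0.0, t_max=1.0):
    """Clamp an activation to [t_min, t_max] (assumed form of a bilateral ReLU)."""
    return min(max(x, t_min), t_max)

def add_haze(clear, transmission, airlight):
    """Forward scattering model: hazy = clear * t + airlight * (1 - t)."""
    return clear * transmission + airlight * (1.0 - transmission)

def recover_pixel(hazy, transmission, airlight, t_floor=0.1):
    """Invert the scattering model for one pixel intensity.

    t_floor is a hypothetical guard that avoids dividing by a
    near-zero transmission estimate in dense haze.
    """
    t = max(transmission, t_floor)
    return (hazy - airlight) / t + airlight

if __name__ == "__main__":
    hazy = add_haze(clear=0.2, transmission=0.5, airlight=0.9)
    print(hazy, recover_pixel(hazy, 0.5, 0.9))
```

In a full pipeline the transmission value would come from the network's output map, one estimate per pixel, rather than being supplied by hand as it is here.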
Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity.
In this paper, we introduce Random Erasing, a new data augmentation method for training the convolutional neural network (CNN). In training, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most of the CNN-based recognition models. Albeit simple, Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification. Code is available at: https://github.com/zhunzhong07/Random-Erasing.
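The procedure described above is simple enough to sketch in a few lines. The version below works on a grayscale image stored as a list of rows. The function name, the parameter names, and the default ranges for erase probability, area fraction, and aspect ratio are assumptions chosen for illustration; the linked repository is the reference for the authors' actual settings.

```python
import random

def random_erasing(image, p=0.5, area_range=(0.02, 0.4), min_aspect=0.3,
                   rng=None, max_tries=100):
    """Return a copy of `image` with one random rectangle filled with noise.

    `image` is a list of rows of pixel values (0..255). With probability
    1 - p the image is returned unchanged.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # never modify the caller's image
    if rng.random() >= p:
        return out

    for _ in range(max_tries):
        # Sample the rectangle's area and aspect ratio, then derive its sides.
        target_area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        h = int(round((target_area * aspect) ** 0.5))
        w = int(round((target_area / aspect) ** 0.5))
        if 0 < h < height and 0 < w < width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for y in range(top, top + h):
                for x in range(left, left + w):
                    out[y][x] = rng.randint(0, 255)  # erase with a random value
            return out
    return out  # no rectangle fitted; leave the image as it was

if __name__ == "__main__":
    img = [[128] * 16 for _ in range(16)]
    for row in random_erasing(img, p=1.0, rng=random.Random(0)):
        print(row)
```

Because the transform has no learned parameters, it can be applied per sample inside a data-loading loop alongside cropping and flipping.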
This work proposes a new meta-heuristic method called Arithmetic Optimization Algorithm (AOA) that utilizes the distribution behavior of the main arithmetic operators in mathematics, namely Multiplication (M), Division (D), Subtraction (S), and Addition (A). AOA is mathematically modeled and implemented to perform the optimization processes in a wide range of search spaces. The performance of AOA is checked on twenty-nine benchmark functions and several real-world engineering design problems to showcase its applicability. The performance, convergence behavior, and computational complexity of the proposed AOA have been evaluated in different scenarios. Experimental results show that the AOA provides very promising results in solving challenging optimization problems compared with eleven other well-known optimization algorithms. Source codes of AOA are publicly available at and .
IMPORTANCE: Cancer and other noncommunicable diseases (NCDs) are now widely recognized as a threat to global development. The latest United Nations high-level meeting on NCDs reaffirmed this observation and also highlighted the slow progress in meeting the 2011 Political Declaration on the Prevention and Control of Noncommunicable Diseases and the third Sustainable Development Goal. Lack of situational analyses, priority setting, and budgeting have been identified as major obstacles in achieving these goals. All of these have in common that they require information on the local cancer epidemiology. The Global Burden of Disease (GBD) study is uniquely poised to provide these crucial data. OBJECTIVE: To describe cancer burden for 29 cancer groups in 195 countries from 1990 through 2017 to provide data needed for cancer control planning. EVIDENCE REVIEW: We used the GBD study estimation methods to describe cancer incidence, mortality, years lived with disability, years of life lost, and disability-adjusted life-years (DALYs). Results are presented at the national level as well as by Socio-demographic Index (SDI), a composite indicator of income, educational attainment, and total fertility rate. We also analyzed the influence of the epidemiological vs the demographic transition on cancer incidence. FINDINGS: In 2017, there were 24.5 million incident cancer cases worldwide (16.8 million without nonmelanoma skin cancer [NMSC]) and 9.6 million cancer deaths. The majority of cancer DALYs came from years of life lost (97%), and only 3% came from years lived with disability. The odds of developing cancer were the lowest in the low SDI quintile (1 in 7) and the highest in the high SDI quintile (1 in 2) for both sexes. In 2017, the most common incident cancers in men were NMSC (4.3 million incident cases); tracheal, bronchus, and lung (TBL) cancer (1.5 million incident cases); and prostate cancer (1.3 million incident cases).
The most common causes of cancer deaths and DALYs for men were TBL cancer (1.3 million deaths and 28.4 million DALYs), liver cancer (572 000 deaths and 15.2 million DALYs), and stomach cancer (542 000 deaths and 12.2 million DALYs). For women in 2017, the most common incident cancers were NMSC (3.3 million incident cases), breast cancer (1.9 million incident cases), and colorectal cancer (819 000 incident cases). The leading causes of cancer deaths and DALYs for women were breast cancer (601 000 deaths and 17.4 million DALYs), TBL cancer (596 000 deaths and 12.6 million DALYs), and colorectal cancer (414 000 deaths and 8.3 million DALYs). CONCLUSIONS AND RELEVANCE: The national epidemiological profiles of cancer burden in the GBD study show large heterogeneities, which are a reflection of different exposures to risk factors, economic settings, lifestyles, and access to care and screening. The GBD study can be used by policy makers and other stakeholders to develop and improve national and local cancer control in order to achieve the global targets and improve equity in cancer care.
Spatial-temporal graph modeling is an important task to analyze the spatial relations and temporal trends of components in a system. Existing approaches mostly capture the spatial dependency on a fixed graph structure, assuming that the underlying relation between entities is pre-determined. However, the explicit graph structure (relation) does not necessarily reflect the true dependency, and genuine relations may be missing due to incomplete connections in the data. Furthermore, existing methods are ineffective at capturing temporal trends, as the RNNs or CNNs employed in these methods cannot capture long-range temporal sequences. To overcome these limitations, we propose in this paper a novel graph neural network architecture, Graph WaveNet, for spatial-temporal graph modeling. By developing a novel adaptive dependency matrix and learning it through node embedding, our model can precisely capture the hidden spatial dependency in the data. With a stacked dilated 1D convolution component whose receptive field grows exponentially as the number of layers increases, Graph WaveNet is able to handle very long sequences. These two components are integrated seamlessly in a unified framework and the whole framework is learned in an end-to-end manner. Experimental results on two public traffic network datasets, METR-LA and PEMS-BAY, demonstrate the superior performance of our algorithm.
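Two of the ideas in this abstract lend themselves to a small numeric sketch. The first function computes the receptive field of a stack of dilated 1D convolutions, assuming the dilation doubles at each layer (1, 2, 4, ...). The second builds a row-normalised adjacency matrix from two sets of node embeddings using a ReLU followed by a row-wise softmax. Both the doubling schedule and the ReLU-softmax form are assumptions made for illustration, since the abstract does not give formulas, and the function names are hypothetical.

```python
import math

def receptive_field(num_layers, kernel_size=2):
    """Receptive field of stacked dilated 1D convolutions with dilation 2**i."""
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))

def adaptive_adjacency(source_emb, target_emb):
    """Row-wise softmax(relu(E1 @ E2^T)) over lists of embedding vectors.

    In a real model the embeddings would be learned parameters;
    here they are plain lists of floats.
    """
    rows = []
    for e1 in source_emb:
        # ReLU keeps only non-negative affinities between node pairs.
        scores = [max(0.0, sum(a * b for a, b in zip(e1, e2)))
                  for e2 in target_emb]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

if __name__ == "__main__":
    print([receptive_field(n) for n in range(1, 9)])
    print(adaptive_adjacency([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]]))
```

With a kernel size of 2 the receptive field doubles with each added layer, which is the exponential growth the abstract refers to.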
Genome sequencing enhances our understanding of the biological world by providing blueprints for the evolutionary and functional diversity that shapes the biosphere. However, microbial genomes that are currently available are of limited phylogenetic breadth, owing to our historical inability to cultivate most microorganisms in the laboratory. We apply single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells from nine diverse habitats belonging to 29 major mostly uncharted branches of the tree of life, so-called ‘microbial dark matter’. With this additional genomic information, we are able to resolve many intra- and inter-phylum-level relationships and to propose two new superphyla. We uncover unexpected metabolic features that extend our understanding of biology and challenge established boundaries between the three domains of life. These include a novel amino acid use for the opal stop codon, an archaeal-type purine synthesis in Bacteria and complete sigma factors in Archaea similar to those in Bacteria. The single-cell genomes also served to phylogenetically anchor up to 20% of metagenomic reads in some habitats, facilitating organism-level interpretation of ecosystem function. This study greatly expands the genomic representation of the tree of life and provides a systematic step towards a better understanding of biological evolution on our planet. Uncultivated archaeal and bacterial cells of major uncharted branches of the tree of life are targeted and sequenced using single-cell genomics; this enables resolution of many intra- and inter-phylum-level relationships, uncovers unexpected metabolic features that challenge established boundaries between the three domains of life, and leads to the proposal of two new superphyla. Currently available genome sequences give us a narrow view of the remarkable diversity of microorganisms because the vast majority of them have never been cultivated in pure culture. 
Here Tanja Woyke and colleagues use single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells from nine diverse habitats. This information reveals numerous intra- and inter-phylum relationships and a number of unexpected metabolic features. On the basis of the new data the authors propose taxonomic revisions to the archaeal and bacterial domains, including a proposal to reorganize the Archaea into three superphyla.
Anomaly detection, a.k.a. outlier detection or novelty detection, has been a lasting yet active research area in various research communities for several decades. There are still some unique problem complexities and challenges that require advanced approaches. In recent years, deep learning enabled anomaly detection, i.e., deep anomaly detection, has emerged as a critical direction. This article surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in 3 high-level categories and 11 fine-grained categories of the methods. We review their key intuitions, objective functions, underlying assumptions, advantages, and disadvantages and discuss how they address the aforementioned challenges. We further discuss a set of possible future opportunities and new perspectives on addressing the challenges.
In recent years, mobile devices are equipped with increasingly advanced sensing and computing capabilities. Coupled with advancements in Deep Learning (DL), this opens up countless possibilities for meaningful applications, e.g., for medical purposes and in vehicular networks. Traditional cloud-based Machine Learning (ML) approaches require the data to be centralized in a cloud server or data center. However, this results in critical issues related to unacceptable latency and communication inefficiency. To this end, Mobile Edge Computing (MEC) has been proposed to bring intelligence closer to the edge, where data is produced. However, conventional enabling technologies for ML at mobile edge networks still require personal data to be shared with external parties, e.g., edge servers. Recently, in light of increasingly stringent data privacy legislations and growing privacy concerns, the concept of Federated Learning (FL) has been introduced. In FL, end devices use their local data to train an ML model required by the server. The end devices then send the model updates rather than raw data to the server for aggregation. FL can serve as an enabling technology in mobile edge networks since it enables the collaborative training of an ML model and also enables DL for mobile edge network optimization. However, in a large-scale and complex mobile edge network, heterogeneous devices with varying constraints are involved. This raises challenges of communication costs, resource allocation, and privacy and security in the implementation of FL at scale. In this survey, we begin with an introduction to the background and fundamentals of FL. Then, we highlight the aforementioned challenges of FL implementation and review existing solutions. Furthermore, we present the applications of FL for mobile edge network optimization. Finally, we discuss the important challenges and future research directions in FL.
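The training loop described in this abstract (local training on each device, then aggregation on the server) can be sketched as follows. This is a minimal illustration using sample-weighted averaging of a tiny linear model; the aggregation rule, the function names, and the hyperparameters are assumptions, and for simplicity each client returns its full updated weights rather than a delta.

```python
def local_train(weights, data, lr=0.1, epochs=5):
    """Gradient descent on mean squared error for y ~ w . x, using local data only."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def federated_average(client_updates):
    """Average client weights, weighting each client by its sample count."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(w[i] * n for w, n in client_updates) / total for i in range(dim)]

def run_round(global_weights, client_datasets):
    """One round: every client trains locally, the server only sees weights."""
    updates = [(local_train(global_weights, data), len(data))
               for data in client_datasets]
    return federated_average(updates)

if __name__ == "__main__":
    clients = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]  # both follow y = 2x
    w = [0.0]
    for _ in range(10):
        w = run_round(w, clients)
    print(w)
```

The raw `(x, y)` pairs never leave `local_train`, which mirrors the privacy motivation in the abstract; the communication cost and device heterogeneity issues it raises are not modelled here.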
The RU-AI dataset is constructed based on three large publicly available datasets: Flickr8K, COCO, and Places205, by adding their corresponding machine-generated pairs.
Semiparametric regression is concerned with the flexible incorporation of non-linear functional relationships in regression analyses. Any application area that benefits from regression analysis can also benefit from semiparametric regression. Assuming only a basic familiarity with ordinary parametric regression, this user-friendly book explains the techniques and benefits of semiparametric regression in a concise and modular fashion. The authors make liberal use of graphics and examples plus case studies taken from environmental, financial, and other applications. They include practical advice on implementation and pointers to relevant software. The 2003 book is suitable as a textbook for students with little background in regression as well as a reference book for statistically oriented scientists such as biostatisticians, econometricians, quantitative social scientists, epidemiologists, with a good working knowledge of regression and the desire to begin using more flexible semiparametric models. Even experts on semiparametric regression should find something new here.
Background subtraction is a widely used approach for detecting moving objects from static cameras. Many different methods have been proposed over the recent years and both the novice and the expert can be confused about their benefits and limitations. In order to overcome this problem, this paper provides a review of the main methods and an original categorisation based on speed, memory requirements and accuracy. Such a review can effectively guide the designer to select the most suitable method for a given application in a principled way. Methods reviewed include parametric and non-parametric background density estimates and spatial correlation approaches.
The main contribution of this paper is a simple semi-supervised pipeline that only uses the original training set without collecting extra data. The challenges are 1) how to obtain more training data only from the training set and 2) how to use the newly generated data. In this work, the generative adversarial network (GAN) is used to generate unlabeled samples. We propose the label smoothing regularization for outliers (LSRO). This method assigns a uniform label distribution to the unlabeled images, which regularizes the supervised model and improves the baseline. We verify the proposed method on a practical problem: person re-identification (re-ID). This task aims to retrieve a query person from other cameras. We adopt the deep convolutional generative adversarial network (DCGAN) for sample generation, and a baseline convolutional neural network (CNN) for representation learning. Experiments show that adding the GAN-generated data effectively improves the discriminative ability of learned CNN embeddings. On three large-scale datasets, Market-1501, CUHK03 and DukeMTMC-reID, we obtain +4.37%, +1.6% and +2.46% improvement in rank-1 precision over the baseline CNN, respectively. We additionally apply the proposed method to fine-grained bird recognition and achieve a +0.6% improvement over a strong baseline. The code is available at https://github.com/layumi/Person-reID_GAN.
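The labelling rule at the heart of LSRO can be sketched as a loss function: real images use ordinary cross-entropy against their identity label, while generated images are scored against a uniform distribution over all classes. The function names and the convention of passing `label=None` for generated samples are assumptions for illustration; the linked repository holds the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lsro_loss(logits, label=None):
    """Cross-entropy with a one-hot target for real samples (label given),
    or with a uniform target over all classes for generated samples (label=None).
    """
    probs = softmax(logits)
    if label is None:
        # Uniform target: average the negative log-probability over classes.
        return -sum(math.log(p) for p in probs) / len(probs)
    return -math.log(probs[label])

if __name__ == "__main__":
    print(lsro_loss([2.0, 0.5, -1.0], label=0))  # real image with identity 0
    print(lsro_loss([2.0, 0.5, -1.0]))           # GAN-generated image
```

For a generated sample the loss is smallest when the network's prediction is itself uniform, so confident predictions on synthetic images are penalised; this is the regularising effect the abstract describes.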
IMPORTANCE: The Global Burden of Diseases, Injuries, and Risk Factors Study 2019 (GBD 2019) provided systematic estimates of incidence, morbidity, and mortality to inform local and international efforts toward reducing cancer burden. OBJECTIVE: To estimate cancer burden and trends globally for 204 countries and territories and by Sociodemographic Index (SDI) quintiles from 2010 to 2019. EVIDENCE REVIEW: The GBD 2019 estimation methods were used to describe cancer incidence, mortality, years lived with disability, years of life lost, and disability-adjusted life years (DALYs) in 2019 and over the past decade. Estimates are also provided by quintiles of the SDI, a composite measure of educational attainment, income per capita, and total fertility rate for those younger than 25 years. Estimates include 95% uncertainty intervals (UIs). FINDINGS: In 2019, there were an estimated 23.6 million (95% UI, 22.2-24.9 million) new cancer cases (17.2 million when excluding nonmelanoma skin cancer) and 10.0 million (95% UI, 9.36-10.6 million) cancer deaths globally, with an estimated 250 million (235-264 million) DALYs due to cancer. Since 2010, these represented a 26.3% (95% UI, 20.3%-32.3%) increase in new cases, a 20.9% (95% UI, 14.2%-27.6%) increase in deaths, and a 16.0% (95% UI, 9.3%-22.8%) increase in DALYs. Among 22 groups of diseases and injuries in the GBD 2019 study, cancer was second only to cardiovascular diseases for the number of deaths, years of life lost, and DALYs globally in 2019. Cancer burden differed across SDI quintiles. The proportion of years lived with disability that contributed to DALYs increased with SDI, ranging from 1.4% (1.1%-1.8%) in the low SDI quintile to 5.7% (4.2%-7.1%) in the high SDI quintile. While the high SDI quintile had the highest number of new cases in 2019, the middle SDI quintile had the highest number of cancer deaths and DALYs. 
From 2010 to 2019, the largest percentage increase in the numbers of cases and deaths occurred in the low and low-middle SDI quintiles. CONCLUSIONS AND RELEVANCE: The results of this systematic analysis suggest that the global burden of cancer is substantial and growing, with burden differing by SDI. These results provide comprehensive and comparable estimates that can potentially inform efforts toward equitable cancer control around the world.