IBM Research - Haifa
Facility: Haifa, Israel
Research output, citation impact, and the most-cited recent papers from IBM Research - Haifa (Israel). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from IBM Research - Haifa
The general problem of estimating the a posteriori probabilities of the states and transitions of a Markov source observed through a discrete memoryless channel is considered. The decoding of linear block and convolutional codes to minimize symbol error probability is shown to be a special case of this problem. An optimal decoding algorithm is derived.
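One common way to compute a posteriori state probabilities of this kind is a forward-backward recursion. The following is a minimal sketch under stated assumptions, not the decoder derived in the paper: the names `init`, `trans`, and `emit` are hypothetical, the Markov source is given as plain probability tables, and the channel is modeled as a per-state output distribution.

```python
def posterior_state_probs(init, trans, emit, observations):
    """Return P(state at time t | all observations) for every t.

    init[s]     -- assumed prior probability of starting in state s
    trans[r][s] -- assumed probability of moving from state r to state s
    emit[s][y]  -- assumed probability that state s is observed as symbol y
    """
    n, T = len(init), len(observations)

    # Forward pass: alpha[t][s] = P(observations[0..t], state_t = s).
    alpha = [[init[s] * emit[s][observations[0]] for s in range(n)]]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append([
            emit[s][observations[t]] * sum(prev[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ])

    # Backward pass: beta[t][s] = P(observations[t+1..] | state_t = s).
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans[s][r] * emit[r][observations[t + 1]] * beta[t + 1][r]
                for r in range(n))
            for s in range(n)
        ]

    # Combine and normalize at each time step.
    posteriors = []
    for t in range(T):
        unnorm = [alpha[t][s] * beta[t][s] for s in range(n)]
        total = sum(unnorm)
        posteriors.append([u / total for u in unnorm])
    return posteriors
```

The sketch assumes the observation sequence has nonzero probability under the model; otherwise the normalization step would divide by zero.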
A major consideration we had in writing this survey was to make it accessible to mathematicians as well as to computer scientists, since expander graphs, the protagonists of our story, come up in numerous and often surprising contexts in both fields.
The paper shows how a large class of interprocedural dataflow-analysis problems can be solved precisely in polynomial time by transforming them into a special kind of graph-reachability problem. The only restrictions are that the set of dataflow facts must be a finite set, and that the dataflow functions must distribute over the confluence operator (either union or intersection). This class of problems includes, but is not limited to, the classical separable problems (also known as “gen/kill” or “bit-vector” problems), e.g., reaching definitions, available expressions, and live variables. In addition, the class of problems that our techniques handle includes many non-separable problems, including truly-live variables, copy constant propagation, and possibly-uninitialized variables.
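The distributivity requirement can be illustrated with a gen/kill transfer function over finite sets of facts. This is only a small sketch of that one property, not the graph-reachability technique itself; the helper name `make_gen_kill` and the definition labels `d1` to `d4` are hypothetical.

```python
def make_gen_kill(gen, kill):
    """Build a gen/kill transfer function: f(X) = (X - kill) | gen."""
    gen, kill = frozenset(gen), frozenset(kill)

    def transfer(facts):
        return (frozenset(facts) - kill) | gen

    return transfer


# A statement that kills definition d1 and generates definition d3.
f = make_gen_kill(gen={"d3"}, kill={"d1"})

a = frozenset({"d1", "d2"})   # facts arriving along one path
b = frozenset({"d4"})         # facts arriving along another path

# Distributivity over union: merging before or after applying f agrees.
merged_first = f(a | b)
applied_first = f(a) | f(b)
```

Here `merged_first` and `applied_first` are the same set, which is what it means for the function to distribute over the union confluence operator.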
The emerging cloud-computing paradigm is rapidly gaining momentum as an alternative to traditional IT (information technology). However, contemporary cloud-computing offerings are primarily targeted for Web 2.0-style applications. Only recently have they begun to address the requirements of enterprise solutions, such as support for infrastructure service-level agreements. To address the challenges and deficiencies in the current state of the art, we propose a modular, extensible cloud architecture with intrinsic support for business service management and the federation of clouds. The goal is to facilitate an open, service-based online economy in which resources and services are transparently provisioned and managed across clouds on an on-demand basis at competitive costs with high-quality service. The Reservoir project is motivated by the vision of implementing an architecture that would enable providers of cloud infrastructure to dynamically partner with each other to create a seemingly infinite pool of IT resources while fully preserving their individual autonomy in making technological and business management decisions. To this end, Reservoir could leverage and extend the advantages of virtualization and embed autonomous management in the infrastructure. At the same time, the Reservoir approach aims to achieve a very ambitious goal: creating a foundation for next-generation enterprise-grade cloud computing.
HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anti-coagulated plasma; and (3) created a publicly-available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS-MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS-MS datasets had 15 710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay-based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein to DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches. 
This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan. These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.
Clinical Document Architecture, Release One (CDA R1), became an American National Standards Institute (ANSI)-approved HL7 Standard in November 2000, representing the first specification derived from the Health Level 7 (HL7) Reference Information Model (RIM). CDA, Release Two (CDA R2), became an ANSI-approved HL7 Standard in May 2005 and is the subject of this article, where the focus is primarily on how the standard has evolved since CDA R1, particularly in the area of semantic representation of clinical events. CDA is a document markup standard that specifies the structure and semantics of a clinical document (such as a discharge summary or progress note) for the purpose of exchange. A CDA document is a defined and complete information object that can include text, images, sounds, and other multimedia content. It can be transferred within a message and can exist independently, outside the transferring message. CDA documents are encoded in Extensible Markup Language (XML), and they derive their machine processable meaning from the RIM, coupled with terminology. The CDA R2 model is richly expressive, enabling the formal representation of clinical statements (such as observations, medication administrations, and adverse events) such that they can be interpreted and acted upon by a computer. On the other hand, CDA R2 offers a low bar for adoption, providing a mechanism for simply wrapping a non-XML document with the CDA header or for creating a document with a structured header and sections containing only narrative content. The intent is to facilitate widespread adoption, while providing a mechanism for incremental semantic interoperability.
Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. 
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP-chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
The Autonomous Agents and MultiAgent Systems (AAMAS) conference series brings together researchers from around the world to share the latest advances in the field. It provides a high-profile and high-quality forum for research in the theory and practice of autonomous agents and multiagent systems. AAMAS 2002, the first of the series, was held in Bologna, followed by Melbourne (2003), New York (2004), Utrecht (2005), Hakodate (2006), Honolulu (2007), Estoril (2008), Budapest (2009), Toronto (2010), Taipei (2011), and Valencia (2012). You are now about to enter the proceedings of AAMAS 2013, held in Saint Paul, Minnesota, in May 2013. In addition to the general track for the AAMAS 2013 conference, submissions were invited to four special tracks: robotics, virtual agents, innovative applications, and (new this year) a special challenges and visions track. The aims of these special tracks were to give researchers from these areas a strong focus, to provide a forum for discussion and debate within the encompassing structure of AAMAS, and to ensure that the impact of both theoretical contributions and innovative applications was recognized. The tracks were chaired by leaders in the corresponding fields: Daniele Nardi and Monica Nicolescu for the robotics track, Stefan Kopp and Catherine Pelachaud for the virtual agents track, Bo An and John Thangarajah for the innovative applications track, and Jeff Rosenschein for the challenges and visions track. The special track chairs provided critical input to selection of Program Committee (PC) and Senior Program Committee (SPC) members, and to the reviewer allocation and the review process itself. Both full paper and extended abstract submissions were solicited for AAMAS 2013.
The papers were selected by means of a thorough review and discussion process which included an opportunity for authors to respond to reviewer comments, a discussion phase between SPC members and (track/PC) chairs, after which the program chairs made the final decisions. In the general track, 13 papers were withdrawn that were accepted as extended abstracts. No other papers were withdrawn after notification. Each full paper was allocated 8 pages in the proceedings, challenges and visions papers were allocated 4 pages, and extended abstracts 2 pages. Oral presentations were allocated 20 minutes in the program. Both full papers and extended abstracts were presented as posters during the conference. Of the submissions, 383 (64%) were indicated as being student papers, which indicates that AAMAS continues to be a nurturing environment for students. Submissions were assigned keywords, each of which was classified under one of 15 top-level topics (e.g., Cooperation). Representation of top-level topics (measured by first keyword) was broad, with top counts in the areas of Economic Paradigms (201 submissions), Agent Cooperation (137), Agent Reasoning (111), Learning and Adaptation (100), and Robotics (94).
A new method of farthest point strategy (FPS) for progressive image acquisition (an acquisition process that enables an approximation of the whole image at each sampling stage) is presented. Its main advantage is in retaining its uniformity with the increased density, providing efficient means for sparse image sampling and display. In contrast to previously presented stochastic approaches, the FPS guarantees the uniformity in a deterministic min-max sense. Within this uniformity criterion, the sampling points are irregularly spaced, exhibiting anti-aliasing properties comparable to those characteristic of the best available method (Poisson disk). A straightforward modification of the FPS yields an image-dependent adaptive sampling scheme. An efficient O(N log N) algorithm for both versions is introduced, and several applications of the FPS are discussed.
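The farthest-point idea can be sketched on a finite set of candidate points: each new sample is the candidate farthest from all samples chosen so far. This is a brute-force illustration under that discrete assumption, not the O(N log N) algorithm the abstract refers to; the function name and the `start` parameter are hypothetical.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Pick k indices from points, each one farthest from those already picked."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    chosen = [start]
    # dist[i] is the distance from points[i] to its nearest chosen sample.
    dist = [math.dist(p, points[start]) for p in points]

    while len(chosen) < k:
        # The next sample maximizes the distance to the current sample set;
        # ties are broken by the lowest index.
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
order = farthest_point_sampling(line, 3)
```

Any prefix of the returned order is itself a spread-out sample set, which is the progressive property the abstract describes: stopping early still gives an approximation that covers the whole domain.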
Following the events of September 11, 2001, in the United States, world public awareness for possible terrorist attacks on water supply systems has increased dramatically. Among the different threats for a water distribution system, the most difficult to address is a deliberate chemical or biological contaminant injection, due to both the uncertainty of the type of injected contaminant and its consequences, and the uncertainty of the time and location of the injection. An online contaminant monitoring system is considered as a major opportunity to protect against the impacts of a deliberate contaminant intrusion. However, although optimization models and solution algorithms have been developed for locating sensors, little is known about how these design algorithms compare to the efforts of human designers, and thus, the advantages they propose for practical design of sensor networks. To explore these issues, the Battle of the Water Sensor Networks (BWSN) was undertaken as part of the 8th Annual Water Distribution Systems Analysis Symposium, Cincinnati, Ohio, August 27–29, 2006. This paper summarizes the outcome of the BWSN effort and suggests future directions for water sensor networks research and implementation.
This paper presents the first reported 28-GHz phased-array IC for 5G communications. Implemented in 130-nm SiGe BiCMOS, the IC includes 32 TRX elements and features concurrent independent beams in two polarizations in either TX or RX operation. Circuit techniques to enable precise beam steering, orthogonal phase and amplitude control at each front end, and independent tapering and beam steering at the array level are presented. A TX/RX switch design is introduced which minimizes TX path loss resulting in 13.5 dBm/16 dBm Op1dB/Psat per front end with >20% peak power added efficiency of the power amplifier (including switch and off-mode LNA) while maintaining a 6 dB noise figure in the low noise amplifier (including switch and off-mode PA). Comprehensive on-wafer measurement results for the IC across multiple samples and temperature variation are presented. A package with four ICs and 64 dual-polarized antennas provides eight 16-element or two 64-element concurrent beams with 1.4°/step beam steering (<0.6° rms error) across a ±50° steering range without requiring calibration. A maximum saturated effective isotropic radiated power of 54 dBm is measured in the broadside direction for each polarization. Tapering control without requiring calibration achieves up to 20-dB sidelobe rejection without affecting the main lobe direction.
Cloud storage systems are becoming increasingly popular. A promising technology that keeps their cost down is deduplication, which stores only a single copy of repeating data. Client-side deduplication attempts to identify deduplication opportunities already at the client and save the bandwidth of uploading copies of existing files to the server. In this work we identify attacks that exploit client-side deduplication, allowing an attacker to gain access to arbitrary-size files of other users based on very small hash signatures of these files. More specifically, an attacker who knows the hash signature of a file can convince the storage service that it owns that file, hence the server lets the attacker download the entire file. (In parallel to our work, a subset of these attacks were recently introduced in the wild with respect to the Dropbox file synchronization service.) To overcome such attacks, we introduce the notion of proofs-of-ownership (PoWs), which lets a client efficiently prove to a server that the client holds a file, rather than just some short information about it. We formalize the concept of proof-of-ownership, under rigorous security definitions, and rigorous efficiency requirements of Petabyte scale storage systems. We then present solutions based on Merkle trees and specific encodings, and analyze their security. We implemented one variant of the scheme. Our performance measurements indicate that the scheme incurs only a small overhead compared to naive client-side deduplication.
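The Merkle-tree building block can be sketched as follows: a verifier that keeps only the root hash can check that a prover holds a particular file block, given the sibling hashes along the path to the root. This is an illustrative sketch, not the scheme from the paper, which also involves specific encodings; the function names, the SHA-256 choice, the leaf and node prefixes, and the duplicate-last-node padding are all assumptions made here.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return all tree levels, leaves first; the root is levels[-1][0]."""
    level = [_h(b"\x00" + b) for b in blocks]   # 0x00 prefix marks leaves
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # pad odd levels (assumption)
            level = level + [level[-1]]
            levels[-1] = level
        level = [_h(b"\x01" + level[i] + level[i + 1])  # 0x01 marks inner nodes
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(root, block, path):
    """Recompute the root from one block and its sibling path."""
    node = _h(b"\x00" + block)
    for sibling, sibling_is_left in path:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root


blocks = [b"block-a", b"block-b", b"block-c"]
levels = build_levels(blocks)
root = levels[-1][0]
```

In an ownership check, the server would challenge the client on block indices and accept only if `verify` succeeds for each; knowing a short hash of the whole file is not enough to answer such challenges.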
We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus: a locality that the page discusses as a whole. The tagging process is simple and fast, aimed to be applied to large collections of Web pages and to facilitate a variety of location-based applications and data analyses. Geotagging involves arbitrating two types of ambiguities: geo/non-geo and geo/geo. A geo/non-geo ambiguity occurs when a place name also has a non-geographic meaning, such as a person name (e.g., Berlin) or a common word (Turkey). Geo/geo ambiguity arises when distinct places have the same name, as in London, England vs. London, Ontario. An implementation of the tagger within the framework of the WebFountain data mining system is described, and evaluated on several corpora of real Web pages. Precision of up to 82% on individual geotags is achieved. We also evaluate the relative contribution of various heuristics the tagger employs, and evaluate the focus-finding algorithm using a corpus pretagged with localities, showing that as many as 91% of the foci reported are correct up to the country level.
As the volume of data increases, so does the demand for online storage services, from simple backup services to cloud storage infrastructures. Although deduplication is most effective when applied across multiple users, cross-user deduplication has serious privacy implications. Some simple mechanisms can enable cross-user deduplication while greatly reducing the risk of data leakage. Cloud storage refers to scalable and elastic storage capabilities delivered as a service using Internet technologies with elastic provisioning and use-based pricing that doesn't penalize users for changing their storage consumption without notice.
This paper studies people recommendations designed to help users find known, offline contacts and discover new friends on social networking sites. We evaluated four recommender algorithms in an enterprise social networking site using a personalized survey of 500 users and a field study of 3,000 users. We found all algorithms effective in expanding users' friend lists. Algorithms based on social network information were able to produce better-received recommendations and find more known contacts for users, while algorithms using similarity of user-created content were stronger in discovering new friends. We also collected qualitative feedback from our survey users and drew several meaningful design implications.
What is already known about this subject: Circulating concentrations of branched-chain amino acids (BCAAs) can affect carbohydrate metabolism in skeletal muscle, and therefore may alter insulin sensitivity. BCAAs are elevated in adults with diet-induced obesity, and are associated with their future risk of type 2 diabetes even after accounting for baseline clinical risk factors. What this study adds: Increased concentrations of BCAAs are already present in young obese children and their metabolomic profiles are consistent with increased BCAA catabolism. Elevations in BCAAs in children are positively associated with insulin resistance measured 18 months later, independent of their initial body mass index. BACKGROUND: Branched-chain amino acid (BCAA) concentrations are elevated in response to overnutrition, and can affect both insulin sensitivity and secretion. Alterations in their metabolism may therefore play a role in the early pathogenesis of type 2 diabetes in overweight children. OBJECTIVE: To determine whether paediatric obesity is associated with elevations in fasting circulating concentrations of BCAAs (isoleucine, leucine and valine), and whether these elevations predict future insulin resistance. METHODS: Sixty-nine healthy subjects, ages 8-18 years, were enrolled as a cross-sectional cohort. A subset of subjects who were pre- or early-pubertal, ages 8-13 years, were enrolled in a prospective longitudinal cohort for 18 months (n = 17 with complete data). RESULTS: Elevations in the concentrations of BCAAs were significantly associated with body mass index (BMI) Z-score (Spearman's Rho 0.27, P = 0.03) in the cross-sectional cohort. In the subset of subjects that were followed longitudinally, baseline BCAA concentrations were positively associated with homeostasis model assessment for insulin resistance measured 18 months later after controlling for baseline clinical factors including BMI Z-score, sex and pubertal stage (P = 0.046).
CONCLUSIONS: Elevations in the concentrations of circulating BCAAs are significantly associated with obesity in children and adolescents, and may independently predict future insulin resistance.
Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms. Design, Setting, and Participants: In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016. Main Outcomes and Measurements: Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve and algorithm specificity compared with radiologists' specificity with radiologists' sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated. Results: Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). 
Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity. Conclusions and Relevance: While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.
We present an efficient query evaluation method based on a two level approach: at the first level, our method iterates in parallel over query term postings and identifies candidate documents using an approximate evaluation taking into account only partial information on term occurrences and no query independent factors; at the second level, promising candidates are fully evaluated and their exact scores are computed. The efficiency of the evaluation process can be improved significantly using dynamic pruning techniques with very little cost in effectiveness. The amount of pruning can be controlled by the user as a function of time allocated for query evaluation. Experimentally, using the TREC Web Track data, we have determined that our algorithm significantly reduces the total number of full evaluations by more than 90%, almost without any loss in precision or recall. At the heart of our approach there is an efficient implementation of a new Boolean construct called WAND or Weak AND that might be of independent interest.
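A WAND-style predicate can be sketched as a weighted threshold test. The definition used here is an assumption, since the abstract does not spell it out: the predicate holds when the weights of the true inputs sum to at least a threshold. The function name and arguments are hypothetical, and the sketch leaves out the posting-list iteration and pruning.

```python
def wand(indicators, weights, threshold):
    """Weak AND: true when the weights of the true inputs reach the threshold."""
    return sum(w for x, w in zip(indicators, weights) if x) >= threshold


terms_present = [True, False, True]   # which query terms occur in a document
unit_weights = [1, 1, 1]

# With unit weights, the threshold slides the predicate between OR and AND.
as_or = wand(terms_present, unit_weights, 1)                   # any term present
as_and = wand(terms_present, unit_weights, len(unit_weights))  # all terms present

# With per-term weights (for example, score upper bounds), raising the
# threshold skips documents that cannot reach the current cutoff score.
passes = wand(terms_present, [2.0, 5.0, 1.5], 3.0)
```

Under this reading, raising the threshold during evaluation is what allows fewer candidate documents to reach the second, full-scoring level.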
A brain-computer interface (BCI) is a system for direct communication between brain and computer. The BCI developed in this work is based on a BCI described by Farwell and Donchin in 1988, which allows a subject to communicate one of 36 symbols presented on a 6 x 6 matrix. The system exploits the P300 component of event-related brain potentials (ERP) as a medium for communication. The processing methods distinguish this work from Donchin's work. In this work, independent component analysis (ICA) was used to separate the P300 source from the background noise. A matched filter was used together with averaging and threshold techniques for detecting the existence of P300s. The processing method was evaluated offline on data recorded from six healthy subjects. The method achieved a communication rate of 5.45 symbols/min with an accuracy of 92.1% compared to 4.8 symbols/min with an accuracy of 90% in Donchin's work. The online interface was tested with the same six subjects. The average communication rate achieved was 4.5 symbols/min with an accuracy of 79.5% as opposed to the 4.8 symbols/min with an accuracy of 56% in Donchin's work. The presented BCI achieves excellent performance compared to other existing BCIs, and allows a reasonable communication rate, while maintaining a low error rate.
The Internet enables connectivity between many strangers: entities that don't know each other. We present the Trust Policy Language (TPL), used to define the mapping of strangers to predefined business roles, based on certificates issued by third parties. TPL is expressive enough to allow complex policies, e.g. non-monotone (negative) certificates, while being simple enough to allow automated policy checking and processing. Issuers of certificates are either known in advance, or provide sufficient certificates to be considered a trusted authority according to the policy. This allows bottom-up, "grass roots" buildup of trust, as in the real world. We extend, rather than replace, existing role based access control mechanisms. This provides a simple, modular architecture and easy migration from existing systems. Our system automatically collects missing certificates from peer servers. In particular this allows use of standard browsers, which pass only one certificate to the server. We describe our implementation, which can be used as an extension of a Web server or as a separate server with interface to applications.