转载

arXiv Paper Daily: Fri, 23 Dec 2016

Neural and Evolutionary Computing

Highway and Residual Networks learn Unrolled Iterative Estimation

Klaus Greff , Rupesh K. Srivastava , Jürgen Schmidhuber

Comments: 10 + 3pages, under review for ICLR 2017

Subjects

:

Neural and Evolutionary Computing (cs.NE)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

The past year saw the introduction of new architectures such as Highway

networks and Residual networks which, for the first time, enabled the training

of feedforward networks with dozens to hundreds of layers using simple gradient

descent. While depth of representation has been posited as a primary reason for

their success, there are indications that these architectures defy a popular

view of deep learning as a hierarchical computation of increasingly abstract

features at each layer.

In this report, we argue that this view is incomplete and does not adequately

explain several recent findings. We propose an alternative viewpoint based on

unrolled iterative estimation—a group of successive layers iteratively refine

their estimates of the same features instead of computing an entirely new

representation. We demonstrate that this viewpoint directly leads to the

construction of Highway and Residual networks. Finally we provide preliminary

experiments to discuss the similarities and differences between the two

architectures.

Difficulty Adjustable and Scalable Constrained Multi-objective Test Problem Toolkit

Zhun Fan , Wenji Li , Xinye Cai , Hui Li , Kaiwen Hu , Qingfu Zhang , Kalyanmoy Deb , Erik D. Goodman

Comments: 15 pages, 8 figures, 7 tables

Subjects

:

Neural and Evolutionary Computing (cs.NE)

; Artificial Intelligence (cs.AI)

In order to better understand the advantages and disadvantages of a

constrained multi-objective evolutionary algorithm (CMOEA), it is important to

understand the nature of difficulty of a constrained multi-objective

optimization problem (CMOP) that the CMOEA is going to deal with. In this

paper, we first propose three primary types of difficulty to characterize the

constraints in CMOPs, including feasibility-hardness, convergence-hardness and

diversity-hardness. We then develop a general toolkit to construct difficulty

adjustable CMOPs with three types of parameterized constraint functions

according to the proposed three primary types of difficulty. In fact,

combination of the three primary constraint functions with different parameters

can lead to construct a large variety of CMOPs and CMaOPs, whose difficulty can

be uniquely defined by a triplet with each of its parameter specifying the

level of each primary difficulty type respectively. Based on this toolkit, we

suggest fifteen difficulty adjustable CMOPs named DAC-MOP1-15 with different

types and levels of difficulty. To study the effectiveness of DAC-MOP1-15, two

popular CMOEAs – MOEA/D-CDP and NSGA-II-CDP are adopted to test their

performances on them. Furthermore, this toolkit also has the ability to scale

the number of objectives. Nine difficulty adjustable constrained many-objective

optimization problems (DAC-MaOPs) named DAC-MaOP1-9 with the scalability to the

number of objectives are also proposed using this toolkit. Two constrained

many-objective evolutionary algorithms (CMaOEAs) – CNSGA-III and CMOEA/DD are

applied to test their performances on three, five, seven and ten-objective

DAC-MaOP1-9 with different difficulty levels and types.

Computer Vision and Pattern Recognition

Online Semantic Activity Forecasting with DARKO

Nicholas Rhinehart , Kris M. Kitani

Comments: this http URL

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

We address the problem of continuously observing and forecasting long-term

semantic activities of a first-person camera wearer: what the person will do,

where they will go, and what goal they are seeking. In contrast to prior work

in trajectory forecasting and short-term activity forecasting, our algorithm,

DARKO, reasons about the future position, future semantic state, and future

high-level goals of the camera-wearer at arbitrary spatial and temporal

horizons defined only by the wearer’s behaviors. DARKO learns and forecasts

online from first-person observations of the user’s daily behaviors. We derive

novel mathematical results that enable efficient forecasting of different

semantic quantities of interest. We apply our method to a dataset of 5

large-scale environments with 3 different environment types, collected from 3

different users, and experimentally validate DARKO on forecasting tasks.

Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics

Xin Li , Fuxin Li Subjects : Computer Vision and Pattern Recognition (cs.CV)

Deep learning has greatly improved visual recognition in recent years.

However, recent research has shown that there exist many adversarial examples

that can negatively impact the performance of such an architecture. This paper

focuses on detecting those adversarial examples by analyzing whether they come

from the same distribution as the normal examples. Instead of directly training

a deep neural network to detect adversarials, a much simpler approach is

proposed based on statistics on outputs from convolutional layers. A cascade

classifier is designed to efficiently detect adversarials. Furthermore, trained

from one particular adversarial generating mechanism, the resulting classifier

can successfully detect adversarials from a completely different mechanism as

well. After detecting adversarial examples, we show that many of them can be

recovered by simply performing a small average filter on the image. Those

findings should provoke us to think more about the classification mechanisms in

deep convolutional neural networks.

Internet-Based Image Retrieval Using End-to-End Trained Deep Distributions

A. Vakhitov , A. Kuzmin , V. Lempitsky Subjects : Computer Vision and Pattern Recognition (cs.CV)

Internet image search engines have long been considered as a promising tool

for handling open-vocabulary textual user queries to unannotated image

datasets. However, systems that use this tool have to deal with multi-modal and

noisy image sets returned by search engines, especially for polysemous queries.

Generally, for many queries, only a small part of the returned sets can be

relevant to the user intent.

In this work, we suggest an approach that explicitly accounts for the complex

and noisy structure of the image sets returned by Internet image search

engines. Similarly to a considerable number of previous image retrieval works,

we train a deep convolutional network that maps images to high-dimensional

descriptors. To model image sets obtained from the Internet, our approach then

fits a simple probabilistic model that accounts for multi-modality and noise

(e.g. a Gaussian mixture model) to the deep descriptors of the images in this

set. Finally, the resulting distribution model can be used to search in the

unannotated image dataset by evaluating likelihoods of individual images.

As our main contribution, we develop an end-to-end training procedure that

tunes the parameters of a deep network using an annotated training set, while

accounting for the distribution fitting and the subsequent matching. In the

experiments, we show that such an end-to-end approach boosts the accuracy of

the Internet-based image retrieval for hold-out concepts, as compared to

retrieval systems that fit similar distribution models to pre-trained features

and to simpler end-to-end trained baselines.

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

Marvin Teichmann , Michael Weber , Marius Zoellner , Roberto Cipolla , Raquel Urtasun

Comments: 9 pages, 7 tables and 9 figures; first place on Kitti Road Segmentation; Code on GitHub (coming soon)

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Robotics (cs.RO)

While most approaches to semantic reasoning have focused on improving

performance, in this paper we argue that computational times are very important

in order to enable real time applications such as autonomous driving. Towards

this goal, we present an approach to joint classification, detection and

semantic segmentation via a unified architecture where the encoder is shared

amongst the three tasks. Our approach is very simple, can be trained end-to-end

and performs extremely well in the challenging KITTI dataset, outperforming the

state-of-the-art in the road segmentation task. Our approach is also very

efficient, taking less than 100 ms to perform all tasks.

Hardware for Machine Learning: Challenges and Opportunities

Vivienne Sze , Yu-Hsin Chen , Joel Emer , Amr Suleiman , Zhengdong Zhang Subjects : Computer Vision and Pattern Recognition (cs.CV)

Machine learning plays a critical role in extracting meaningful information

out of the zetabytes of sensor data collected every day. For some applications,

the goal is to analyze and understand the data to identify trends (e.g.,

surveillance, portable/wearable electronics); in other applications, the goal

is to take immediate action based the data (e.g., robotics/drones, self-driving

cars, smart Internet of Things). For many of these applications, local embedded

processing near the sensor is preferred over the cloud due to privacy or

latency concerns, or limitations in the communication bandwidth. However, at

the sensor there are often stringent constraints on energy consumption and cost

in addition to throughput and accuracy requirements. Furthermore, flexibility

is often required such that the processing can be adapted for different

applications or environments (e.g., update the weights and model in the

classifier). In many applications, machine learning often involves transforming

the input data into a higher dimensional space, which, along with programmable

weights, increases data movement and consequently energy consumption. In this

paper, we will discuss how these challenges can be addressed at various levels

of hardware design ranging from architecture, hardware-friendly algorithms,

mixed-signal circuits, and advanced technologies (including memories and

sensors).

A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

Deng Cai Subjects : Computer Vision and Pattern Recognition (cs.CV)

Approximate Nearest Neighbor (ANN) search is a fundamental problem in many

areas of machine learning and data mining. During the past decade, numerous

hashing algorithms are proposed to solve this problem. Every proposed algorithm

claims outperforms other state-of-the-art methods. However, there are serious

drawbacks in the evaluation of existing hashing papers and most of the claims

in these papers should be re-examined. 1) Most of the existing papers failed to

correctly measure the search time which is essential for the ANN search

problem. 2) As a result, most of the papers report the performance increases as

the code length increases, which is wrong if we measure the search time

correctly. 3) The performance of some hashing algorithms (eg, LSH) can easily

be boosted if one uses multiple hash tables, which is an important factor

should be considered in the evaluation while most of the papers failed to do

so. In this paper, we carefully revisit many popular hashing algorithms and

suggest one possible promising direction. For the sake of reproducibility, all

the codes used in the paper are released on Github, which can be used as a

testing platform to fairly compare various hashing algorithms.

Cohort of LSTM and lexicon verification for handwriting recognition with gigantic lexicon

Bruno Stuner , Clément Chatelain , Thierry Paquet

Comments: 28 pages

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Handwriting recognition state of the art methods are based on Long Short Term

Memory (LSTM) recurrent neural networks (RNN) coupled with the use of

linguistic knowledge. LSTM RNN presents high raw performance and interesting

training properties that allow us to break with the standard method at the

state of the art. We present a simple and efficient way to extract from a

single training a large number of complementary LSTM RNN, called cohort,

combined in a cascade architecture with a lexical verification. This process

does not require fine tuning, making it easy to use. Our verification allow to

deal quickly and efficiently with gigantic lexicon (over 3 million words). We

achieve state of the art results for isolated word recognition with very large

lexicon and present novel results for an unprecedented gigantic lexicon.

Deep Blind Compressed Sensing

Shikha Singh , Vanika Singhal , Angshul Majumdar

Comments: DCC 2017 Poster

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

This work addresses the problem of extracting deeply learned features

directly from compressive measurements. There has been no work in this area.

Existing deep learning tools only give good results when applied on the full

signal, that too usually after preprocessing. These techniques require the

signal to be reconstructed first. In this work we show that by learning

directly from the compressed domain, considerably better results can be

obtained. This work extends the recently proposed framework of deep matrix

factorization in combination with blind compressed sensing; hence the term deep

blind compressed sensing. Simulation experiments have been carried out on

imaging via single pixel camera, under-sampled biomedical signals, arising in

wireless body area network and compressive hyperspectral imaging. In all cases,

the superiority of our proposed deep blind compressed sensing can be envisaged.

Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks

Yinda Zhang , Shuran Song , Ersin Yumer , Manolis Savva , Joon-Young Lee , Hailin Jin , Thomas Funkhouser

Comments: CVPR submission, main paper and supplementary material

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

Indoor scene understanding is central to applications such as robot

navigation and human companion assistance. Over the last years, data-driven

deep neural networks have outperformed many traditional approaches thanks to

their representation learning capabilities. One of the bottlenecks in training

for better representations is the amount of available per-pixel ground truth

data that is required for core scene understanding tasks such as semantic

segmentation, normal prediction, and object edge detection. To address this

problem, a number of works proposed using synthetic data. However, a systematic

study of how such synthetic data is generated is missing. In this work, we

introduce a large-scale synthetic dataset with 400K physically-based rendered

images from 45K realistic 3D indoor scenes. We study the effects of rendering

methods and scene lighting on training for three computer vision tasks: surface

normal prediction, semantic segmentation, and object boundary detection. This

study provides insights into the best practices for training with synthetic

data (more realistic rendering is worth it) and shows that pretraining with our

new synthetic dataset can improve results beyond the current state of the art

on all three tasks.

Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

Yi Zhu , Shawn Newsam

Comments: Accepted at WACV 2017

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Multimedia (cs.MM)

This paper studies the joint learning of action recognition and temporal

localization in long, untrimmed videos. We employ a multi-task learning

framework that performs the three highly related steps of action proposal,

action recognition, and action localization refinement in parallel instead of

the standard sequential pipeline that performs the steps in order. We develop a

novel temporal actionness regression module that estimates what proportion of a

clip contains action. We use it for temporal localization but it could have

other applications like video retrieval, surveillance, summarization, etc. We

also introduce random shear augmentation during training to simulate viewpoint

change. We evaluate our framework on three popular video benchmarks. Results

demonstrate that our joint model is efficient in terms of storage and

computation in that we do not need to compute and cache dense trajectory

features, and that it is several times faster than its sequential ConvNets

counterpart. Yet, despite being more efficient, it outperforms state-of-the-art

methods with respect to accuracy.

Automatic Identification of Scenedesmus Polymorphic Microalgae from Microscopic Images

Jhony-Heriberto Giraldo-Zuluaga , Geman Diez , Alexander Gomez , Tatiana Martinez , Mariana Peñuela Vasquez , Jesus Francisco Vargas Bonilla , Augusto Salazar Subjects : Computer Vision and Pattern Recognition (cs.CV)

Microalgae counting is used to measure biomass quantity. Usually, it is

performed in a manual way using a Neubauer chamber and expert criterion, with

the risk of a high error rate. This paper addresses the methodology for

automatic identification of Scenedesmus microalgae (used in the methane

production and food industry) and applies it to images captured by a digital

microscope. The use of contrast adaptive histogram equalization for

pre-processing, and active contours for segmentation are presented. The

calculation of statistical features (Histogram of Oriented Gradients, Hu and

Zernike moments) with texture features (Haralick and Local Binary Patterns

descriptors) are proposed for algae characterization. Scenedesmus algae can

build coenobia consisting of 1, 2, 4 and 8 cells. The amount of algae of each

coenobium helps to determine the amount of lipids, proteins, and other

substances in a given sample of a algae crop. The knowledge of the quantity of

those elements improves the quality of bioprocess applications. Classification

of coenobia achieves accuracies of 98.63% and 97.32% with Support Vector

Machine (SVM) and Artificial Neural Network (ANN), respectively. According to

the results it is possible to consider the proposed methodology as an

alternative to the traditional technique for algae counting. The database used

in this paper is publicly available for download.

Top-down Visual Saliency Guided by Captions

Vasili Ramanishka , Abir Das , Jianming Zhang , Kate Saenko Subjects : Computer Vision and Pattern Recognition (cs.CV)

Top-down saliency methods based on deep neural networks have recently been

proposed for task-driven visual search. However existing methods focus on

object or scene classification tasks and cannot be used to compute saliency

heatmaps using a natural language sentence as the top-down input. Current

state-of-the-art image and video captioning models can generate accurate

sentence captions but are difficult to understand, as they do not expose the

internal process by which spatial and temporal regions are mapped to predicted

words. In this paper, we expose this mapping and demonstrate that

spatio-temporal saliency is learned implicitly by the combination of CNN and

LSTM parts of modern encoder-decoder networks. Our approach, which we call

Caption-Guided Visual Saliency, can produce spatial or spatio-temporal heatmaps

for both given input sentences or sentences predicted by our model. Unlike

recent efforts that introduce explicit “attention” layers to selectively attend

to certain inputs while generating each word, our approach recovers saliency

without the overhead of explicit attention layers, and can be used to analyze a

variety of existing model architectures and improve their design. We evaluate

our approach on large scale video and image datasets and make several

interesting discoveries about the inner workings of captioning models. The

source code is publicly available at

github.com/VisionLearningGroup/caption-guided-saliency.

Re-evaluating Automatic Metrics for Image Captioning

Mert Kilickaya , Aykut Erdem , Nazli Ikizler-Cinbis , Erkut Erdem Subjects : Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV)

The task of generating natural language descriptions from images has received

a lot of attention in recent years. Consequently, it is becoming increasingly

important to evaluate such image captioning approaches in an automatic manner.

In this paper, we provide an in-depth evaluation of the existing image

captioning metrics through a series of carefully designed experiments.

Moreover, we explore the utilization of the recently proposed Word Mover’s

Distance (WMD) document metric for the purpose of image captioning. Our

findings outline the differences and/or similarities between metrics and their

relative robustness by means of extensive correlation, accuracy and distraction

based evaluations. Our results also demonstrate that WMD provides strong

advantages over other metrics.

Artificial Intelligence

Jointly Extracting Relations with Class Ties via Effective Deep Ranking

Hai Ye , Wenhan Chao , Zhunchen Luo Subjects : Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)

In distant supervised relation extraction, the connection between relations

of one entity tuple, which we call class ties, is common. Exploiting this

connection may be promising for relation extraction. However, this property is

seldom considered by previous work. In this work, to leverage class ties, we

propose to make joint relation extraction with a unified model that integrates

convolutional neural network with a general pairwise ranking framework, in

which two novel ranking loss functions are introduced. Besides, an effective

method is proposed to relieve the impact of relation NR (not relation) for

model training and test. Experimental results on a widely used dataset show

that: (1) Our model is much more superior than the baselines, achieving

state-of-the-art performance; (2) Leveraging class ties, joint extraction is

indeed better than separated extraction; (3) Relieving the impact of NR will

significantly boost our model performance; (4) Our model can primely deal with

wrong labeling problem.

Solving Set Optimization Problems by Cardinality Optimization via Weak Constraints with an Application to Argumentation

Wolfgang Faber , Mauro Vallati , Federico Cerutti , Massimiliano Giacomin

Comments: Informal proceedings of the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016

Subjects

:

Artificial Intelligence (cs.AI)

Optimization – minimization or maximization – in the lattice of subsets is a

frequent operation in Artificial Intelligence tasks. Examples are

subset-minimal model-based diagnosis, nonmonotonic reasoning by means of

circumscription, or preferred extensions in abstract argumentation. Finding the

optimum among many admissible solutions is often harder than finding admissible

solutions with respect to both computational complexity and methodology. This

paper addresses the former issue by means of an effective method for finding

subset-optimal solutions. It is based on the relationship between

cardinality-optimal and subset-optimal solutions, and the fact that many

logic-based declarative programming systems provide constructs for finding

cardinality-optimal solutions, for example maximum satisfiability (MaxSAT) or

weak constraints in Answer Set Programming (ASP). Clearly each

cardinality-optimal solution is also a subset-optimal one, and if the language

also allows for the addition of particular restricting constructs (both MaxSAT

and ASP do) then all subset-optimal solutions can be found by an iterative

computation of cardinality-optimal solutions. As a showcase, the computation of

preferred extensions of abstract argumentation frameworks using the proposed

method is studied.

The SP Theory of Intelligence as a Foundation for the Development of a General, Human-Level Thinking Machine

J Gerard Wolff Subjects : Artificial Intelligence (cs.AI)

This paper summarises how the “SP theory of intelligence” and its realisation

in the “SP computer model” simplifies and integrates concepts across artificial

intelligence and related areas, and thus provides a promising foundation for

the development of a general, human-level thinking machine, in accordance with

the main goal of research in artificial general intelligence.

The key to this simplification and integration is the powerful concept of

“multiple alignment”, borrowed and adapted from bioinformatics. This concept

has the potential to be the “double helix” of intelligence, with as much

significance for human-level intelligence as has DNA for biological sciences.

Strengths of the SP system include: versatility in the representation of

diverse kinds of knowledge; versatility in aspects of intelligence (including:

strengths in unsupervised learning; the processing of natural language; pattern

recognition at multiple levels of abstraction that is robust in the face of

errors in data; several kinds of reasoning (including: one-step `deductive’

reasoning; chains of reasoning; abductive reasoning; reasoning with

probabilistic networks and trees; reasoning with ‘rules’; nonmonotonic

reasoning and reasoning with default values; Bayesian reasoning with

‘explaining away’; and more); planning; problem solving; and more); seamless

integration of diverse kinds of knowledge and diverse aspects of intelligence

in any combination; and potential for application in several areas (including:

helping to solve nine problems with big data; helping to develop human-level

intelligence in autonomous robots; serving as a database with intelligence and

with versatility in the representation and integration of several forms of

knowledge; serving as a vehicle for medical knowledge and as an aid to medical

diagnosis; and several more).

Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

Wendelin Böhmer , Rong Guo , Klaus Obermayer

Comments: This paper has been presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on the 3rd and 4th of December 2016 in Barcelona, Spain

Subjects

:

Artificial Intelligence (cs.AI)

; Learning (cs.LG); Machine Learning (stat.ML)

This paper investigates a type of instability that is linked to the greedy

policy improvement in approximated reinforcement learning. We show empirically

that non-deterministic policy improvement can stabilize methods like LSPI by

controlling the improvements’ stochasticity. Additionally we show that a

suitable representation of the value function also stabilizes the solution to

some degree. The presented approach is simple and should also be easily

transferable to more sophisticated algorithms like deep reinforcement learning.

Highway and Residual Networks learn Unrolled Iterative Estimation

Klaus Greff , Rupesh K. Srivastava , Jürgen Schmidhuber

Comments: 10 + 3pages, under review for ICLR 2017

Subjects

:

Neural and Evolutionary Computing (cs.NE)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

The past year saw the introduction of new architectures such as Highway

networks and Residual networks which, for the first time, enabled the training

of feedforward networks with dozens to hundreds of layers using simple gradient

descent. While depth of representation has been posited as a primary reason for

their success, there are indications that these architectures defy a popular

view of deep learning as a hierarchical computation of increasingly abstract

features at each layer.

In this report, we argue that this view is incomplete and does not adequately

explain several recent findings. We propose an alternative viewpoint based on

unrolled iterative estimation—a group of successive layers iteratively refine

their estimates of the same features instead of computing an entirely new

representation. We demonstrate that this viewpoint directly leads to the

construction of Highway and Residual networks. Finally we provide preliminary

experiments to discuss the similarities and differences between the two

architectures.

Difficulty Adjustable and Scalable Constrained Multi-objective Test Problem Toolkit

Zhun Fan , Wenji Li , Xinye Cai , Hui Li , Kaiwen Hu , Qingfu Zhang , Kalyanmoy Deb , Erik D. Goodman

Comments: 15 pages, 8 figures, 7 tables

Subjects

:

Neural and Evolutionary Computing (cs.NE)

; Artificial Intelligence (cs.AI)

In order to better understand the advantages and disadvantages of a

constrained multi-objective evolutionary algorithm (CMOEA), it is important to

understand the nature of difficulty of a constrained multi-objective

optimization problem (CMOP) that the CMOEA is going to deal with. In this

paper, we first propose three primary types of difficulty to characterize the

constraints in CMOPs, including feasibility-hardness, convergence-hardness and

diversity-hardness. We then develop a general toolkit to construct difficulty

adjustable CMOPs with three types of parameterized constraint functions

according to the proposed three primary types of difficulty. In fact,

combination of the three primary constraint functions with different parameters

can lead to construct a large variety of CMOPs and CMaOPs, whose difficulty can

be uniquely defined by a triplet with each of its parameter specifying the

level of each primary difficulty type respectively. Based on this toolkit, we

suggest fifteen difficulty adjustable CMOPs named DAC-MOP1-15 with different

types and levels of difficulty. To study the effectiveness of DAC-MOP1-15, two

popular CMOEAs – MOEA/D-CDP and NSGA-II-CDP are adopted to test their

performances on them. Furthermore, this toolkit also has the ability to scale

the number of objectives. Nine difficulty adjustable constrained many-objective

optimization problems (DAC-MaOPs) named DAC-MaOP1-9 with the scalability to the

number of objectives are also proposed using this toolkit. Two constrained

many-objective evolutionary algorithms (CMaOEAs) – CNSGA-III and CMOEA/DD are

applied to test their performances on three, five, seven and ten-objective

DAC-MaOP1-9 with different difficulty levels and types.

Counting Answer Sets via Dynamic Programming

Johannes Fichte , Markus Hecher , Michael Morak , Stefan Woltran

Comments: Informal proceedings of the 1st Workshop on Trends and Applications of Answer Set Programming (TAASP 2016), Klagenfurt, Austria, 26 September 2016

Subjects

:

Logic in Computer Science (cs.LO)

; Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)

While the solution counting problem for propositional satisfiability (#SAT)

has received renewed attention in recent years, this research trend has not

affected other AI solving paradigms like answer set programming (ASP). Although

ASP solvers are designed to enumerate all solutions, and counting can therefore

be easily done, the involved materialization of all solutions is a clear

bottleneck for the counting problem of ASP (#ASP). In this paper we propose

dynamic programming-based #ASP algorithms that exploit the structure of the

underlying (ground) ASP program. Experimental results for a prototype

implementation show promise when compared to existing solvers.

Causal Effect Identification in Acyclic Directed Mixed Graphs and Gated Models

Jose M. Peña , Marcus Bendtsen Subjects : Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI)

We introduce a new family of graphical models that consists of graphs with

possibly directed, undirected and bidirected edges but without directed cycles.

We show that these models are suitable for representing causal models with

additive error terms. We provide a set of sufficient graphical criteria for the

identification of arbitrary causal effects when the new models contain directed

and undirected edges but no bidirected edge. We also provide a necessary and

sufficient graphical criterion for the identification of the causal effect of a

single variable on the rest of the variables. Moreover, we develop an exact

algorithm for learning the new models from observational and interventional

data via answer set programming. Finally, we introduce gated models for causal

effect identification, a new family of graphical models that exploits context

specific independences to identify additional causal effects.

Information Retrieval

Development of UMLS Based Health Care Web Services for Android Platform

Nareena Soomro , Safeeulah Soomro , Zainab Alansari , Suhni Abbasi , Mohammad Riyaz Belgaum , Abdul Baqi Khakwani

Journal-ref: Sindh Univ. Res. Jour. (Sci. Ser.) Vol. 48 (4D) 05-08 (2016)

Subjects

:

Computers and Society (cs.CY)

; Information Retrieval (cs.IR); Software Engineering (cs.SE)

In this fast developing world of information, the amount of medical knowledge

is rising at an exponential level. The UMLS (Unified Medical Language Systems),

is rich knowledge base consisting files and software that provides many health

and biomedical vocabularies and standards. A Web service is a web solution to

facilitate machine-to-machine interaction over a network. Few UMLS web services

are currently available for portable devices, but most of them lack in

efficiency and performance. It is proposed to develop Android-based web

services for healthcare systems underlying rich knowledge source of UMLS. The

experimental evaluation was made to analyse the efficiency and performance

effect with and without using the designed prototype. The understand-ability

and interaction with the prototype were greater than those who used the

alternate sources to obtain the answers to their questions. The overall

performance indicates that the system is convenient and easy to use. The result

of the evaluation clearly proved that designed system retrieves all the

pertinent information better than syntactic searches.

Computation and Language

Re-evaluating Automatic Metrics for Image Captioning

Mert Kilickaya , Aykut Erdem , Nazli Ikizler-Cinbis , Erkut Erdem Subjects : Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV)

The task of generating natural language descriptions from images has received

a lot of attention in recent years. Consequently, it is becoming increasingly

important to evaluate such image captioning approaches in an automatic manner.

In this paper, we provide an in-depth evaluation of the existing image

captioning metrics through a series of carefully designed experiments.

Moreover, we explore the utilization of the recently proposed Word Mover’s

Distance (WMD) document metric for the purpose of image captioning. Our

findings outline the differences and/or similarities between metrics and their

relative robustness by means of extensive correlation, accuracy and distraction

based evaluations. Our results also demonstrate that WMD provides strong

advantages over other metrics.

Noise Mitigation for Neural Entity Typing and Relation Extraction

Yadollah Yaghoobzadeh , Heike Adel , Hinrich Schütze

Comments: EACL 2017; the first two authors contributed equally to this work

Subjects

:

Computation and Language (cs.CL)

In this paper, we address two different types of noise in information

extraction models: noise from distant supervision and noise from pipeline input

features. Our target tasks are entity typing and relation extraction. For the

first noise type, we introduce multi-instance multi-label learning algorithms

using neural network models, and apply them to fine-grained entity typing for

the first time. This gives our models comparable performance with the

state-of-the-art supervised approach which uses global embeddings of entities.

For the second noise type, we propose ways to improve the integration of noisy

entity type predictions into relation extraction. Our experiments show that

probabilistic predictions are more robust than discrete predictions and that

joint training of the two tasks performs best.

Continuous multilinguality with language vectors

Robert Östling , Jörg Tiedemann Subjects : Computation and Language (cs.CL)

Most existing models for multilingual natural language processing (NLP) treat

language as a discrete category, and make predictions for either one language

or the other. In contrast, we propose using continuous vector representations

of language. We show that these can be learned efficiently with a

character-based neural language model, and used to improve inference about

language varieties not seen during training. In experiments with 1303 Bible

translations into 990 different languages, we empirically explore the capacity

of multilingual language models, and also show that the language vectors

capture genetic relationships between languages.

A Context-aware Attention Network for Interactive Question Answering

Huayu Li , Martin Renqiang Min , Yong Ge , Asim Kadav

Comments: 13 pages, 2 figures

Subjects

:

Computation and Language (cs.CL)

; Learning (cs.LG)

We develop a new model for Interactive Question Answering (IQA), using

Gated-Recurrent-Unit recurrent networks (GRUs) as encoders for statements and

questions, and another GRU as a decoder for outputs. Distinct from previous

work, our approach employs context-dependent word-level attention for more

accurate statement representations and question-guided sentence-level attention

for better context modeling. Employing these mechanisms, our model accurately

understands when it can output an answer or when it requires generating a

supplementary question for additional input. When available, user’s feedback is

encoded and directly applied to update sentence-level attention to infer the

answer. Extensive experiments on QA and IQA datasets demonstrate quantitatively

the effectiveness of our model with significant improvement over conventional

QA models.

Jointly Extracting Relations with Class Ties via Effective Deep Ranking

Hai Ye , Wenhan Chao , Zhunchen Luo Subjects : Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)

In distant supervised relation extraction, the connection between relations

of one entity tuple, which we call class ties, is common. Exploiting this

connection may be promising for relation extraction. However, this property is

seldom considered by previous work. In this work, to leverage class ties, we

propose to make joint relation extraction with a unified model that integrates

convolutional neural network with a general pairwise ranking framework, in

which two novel ranking loss functions are introduced. Besides, an effective

method is proposed to relieve the impact of relation NR (not relation) for

model training and test. Experimental results on a widely used dataset show

that: (1) Our model is much more superior than the baselines, achieving

state-of-the-art performance; (2) Leveraging class ties, joint extraction is

indeed better than separated extraction; (3) Relieving the impact of NR will

significantly boost our model performance; (4) Our model can primely deal with

wrong labeling problem.

Distributed, Parallel, and Cluster Computing

Pot: Deterministic transactional execution

Tiago M. Vale , João A. Silva , Ricardo J. Dias , João M. Lourenço

Comments: Published in ACM Transactions on Architecture and Code Optimization (TACO) 13, 4

Journal-ref: ACM Trans. Archit. Code Optim. 13, 4, Article 52 (December 2016)

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

This paper presents Pot, a system that leverages the concept of preordered

transactions to achieve deterministic multithreaded execution of programs that

use Transactional Memory. Preordered transactions eliminate the root cause of

nondeterminism in transactional execution: they provide the illusion of

executing in a deterministic serial order, unlike traditional transactions

which appear to execute in a nondeterministic order that can change from

execution to execution. Pot uses a new concurrency control protocol that

exploits the serialization order to distinguish between fast and speculative

transaction execution modes in order to mitigate the overhead of imposing a

deterministic order. We build two Pot prototypes: one using STM and another

using off-the-shelf HTM. To the best of our knowledge, Pot enables

deterministic execution of programs using off-the-shelf HTM for the first time.

An experimental evaluation shows that Pot achieves deterministic execution of

TM programs with low overhead, sometimes even outperforming nondeterministic

executions, and clearly outperforming the state of the art.

An efficient hybrid tridiagonal divide-and-conquer algorithm on distributed memory architectures

Shengguo Li , Francois-Henry Rouet , Jie Liu , Chun Huang , Xingyu Gao , Xuebin Chi

Comments: 20 pages, 7 figures

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Mathematical Software (cs.MS); Numerical Analysis (cs.NA)

In this paper, an efficient divide-and-conquer (DC) algorithm is proposed for

the symmetric tridiagonal matrices based on ScaLAPACK and the hierarchically

semiseparable (HSS) matrices. HSS is an important type of rank-structured

matrices.Most time of the DC algorithm is cost by computing the eigenvectors

via the matrix-matrix multiplications (MMM). In our parallel hybrid DC (PHDC)

algorithm, MMM is accelerated by using the HSS matrix techniques when the

intermediate matrix is large. All the HSS algorithms are done via the package

STRUMPACK. PHDC has been tested by using many different matrices. Compared with

the DC implementation in MKL, PHDC can be faster for some matrices with few

deflations when using hundreds of processes. However, the gains decrease as the

number of processes increases. The comparisons of PHDC with ELPA (the

Eigenvalue soLvers for Petascale Applications library) are similar. PHDC is

usually slower than MKL and ELPA when using 300 or more processes on Tianhe-2

supercomputer.

Comparative Analysis of SpatialHadoop and GeoSpark for Geospatial Big Data Analytics

Rakesh K. Lenka , Rabindra K. Barik , Noopur Gupta , Syed Mohd Ali , Amiya Rath , Harishchandra Dubey

Comments: 6 pages, 7 figures,table1

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Computers and Society (cs.CY)

In this digitalised world where every information is stored, the data a are

growing exponentially. It is estimated that data are doubles itself every two

years. Geospatial data are one of the prime contributors to the big data

scenario. There are numerous tools of the big data analytics. But not all the

big data analytics tools are capabilities to handle geospatial big data. In the

present paper, it has been discussed about the recent two popular open source

geospatial big data analytical tools i.e. Spatial- Hadoop and GeoSpark which

can be used for analysis and process the geospatial big data in efficient

manner. It has compared the architectural view of SpatialHadoop and GeoSpark.

Through the architectural comparison, it has also summarised the merits and

demerits of these tools according the execution times and volume of the data

which has been used.

Vertex-Centric Graph Processing: The Good, the Bad, and the Ugly

Arijit Khan

Comments: 6 pages, 5 figures

Subjects

:

Databases (cs.DB)

; Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

We study distributed graph algorithms that adopt an iterative vertex-centric

framework for graph processing, popularized by the Google’s Pregel system.

Since then, there are several attempts to implement many graph algorithms in a

vertex-centric framework, as well as efforts to design optimization techniques

for improving the efficiency. However, to the best of our knowledge, there has

not been any systematic study to compare these vertex-centric implementations

with their sequential counterparts. Our paper addresses this gap in two ways.

(1) We analyze the computational complexity of such implementations with the

notion of time-processor product, and benchmark several vertex-centric graph

algorithms whether they perform more work with respect to their best-known

sequential solutions. (2) Employing the concept of balanced practical Pregel

algorithms, we study if these implementations suffer from imbalanced workload

and large number of iterations. Our findings illustrate that with the exception

of Euler tour tree algorithm, all other algorithms either perform more work

than their best-known sequential approach, or suffer from imbalanced workload/

large number of iterations, or even both. We also emphasize on graph algorithms

that are fundamentally difficult to be expressed in vertex-centric frameworks,

and conclude by discussing the road ahead for distributed graph processing.

Learning

Deep Learning and Its Applications to Machine Health Monitoring: A Survey

Rui Zhao , Ruqiang Yan , Zhenghua Chen , Kezhi Mao , Peng Wang , Robert X. Gao

Comments: 14 pages, 9 figures, submitted to IEEE Transactions on Neural Networks and Learning Systems

Subjects

:

Learning (cs.LG)

; Machine Learning (stat.ML)

Since 2006, deep learning (DL) has become a rapidly growing research

direction, redefining state-of-the-art performances in a wide range of areas

such as object recognition, image segmentation, speech recognition and machine

translation. In modern manufacturing systems, data-driven machine health

monitoring is gaining in popularity due to the widespread deployment of

low-cost sensors and their connection to the Internet. Meanwhile, deep learning

provides useful tools for processing and analyzing these big machinery data.

The main purpose of this paper is to review and summarize the emerging research

work of deep learning on machine health monitoring. After the brief

introduction of deep learning techniques, the applications of deep learning in

machine health monitoring systems are reviewed mainly from the following

aspects: Auto-encoder (AE) and its variants, Restricted Boltzmann Machines and

its variants including Deep Belief Network (DBN) and Deep Boltzmann Machines

(DBM), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).

Finally, some new trends of DL-based machine health monitoring methods are

discussed.

A note on the function approximation error bound for risk-sensitive reinforcement learning

Prasenjit Karmakar , Shalabh Bhatnagar

Comments: 4 pages technical short note

Subjects

:

Learning (cs.LG)

In this paper we improve the existing function approximation error bound for

the policy evaluation algorithm when the aim is to find the risk-sensitive cost

represented using exponential utility.

On Coreset Constructions for the Fuzzy (K)-Means Problem

Johannes Blömer , Sascha Brauer , Kathrin Bujna Subjects : Learning (cs.LG) ; Data Structures and Algorithms (cs.DS)

In this paper, we present coreset constructions for the fuzzy (K)-means

problem. First, we show that one can construct a weak coresets for fuzzy

(K)-means. Second, we show that there are coresets for fuzzy (K)-means with

respect to balanced fuzzy (K)-means solutions. Third, we use these coresets to

develop a randomized approximation algorithm whose runtime is polynomial in the

number of the given points and the dimension of these points.

How to Train Your Deep Neural Network with Dictionary Learning

Vanika Singhal , Shikha Singh , Angshul Majumdar

Comments: DCC 2017 poster

Subjects

:

Learning (cs.LG)

; Machine Learning (stat.ML)

Currently there are two predominant ways to train deep neural networks. The

first one uses restricted Boltzmann machine (RBM) and the second one

autoencoders. RBMs are stacked in layers to form deep belief network (DBN); the

final representation layer is attached to the target to complete the deep

neural network. Autoencoders are nested one inside the other to form stacked

autoencoders; once the stcaked autoencoder is learnt the decoder portion is

detached and the target attached to the deepest layer of the encoder to form

the deep neural network. This work proposes a new approach to train deep neural

networks using dictionary learning as the basic building block; the idea is to

use the features from the shallower layer as inputs for training the next

deeper layer. One can use any type of dictionary learning (unsupervised,

supervised, discriminative etc.) as basic units till the pre-final layer. In

the final layer one needs to use the label consistent dictionary learning

formulation for classification. We compare our proposed framework with existing

state-of-the-art deep learning techniques on benchmark problems; we are always

within the top 10 results. In actual problems of age and gender classification,

we are better than the best known techniques.

Microstructure Representation and Reconstruction of Heterogeneous Materials via Deep Belief Network for Computational Material Design

Ruijin Cang , Yaopengxiao Xu , Shaohua Chen , Yongming Liu , Yang Jiao , Max Yi Ren

Comments: 27 pages, 17 figures

Subjects

:

Learning (cs.LG)

; Machine Learning (stat.ML)

Integrated Computational Materials Engineering (ICME) aims to accelerate

optimal design of complex material systems by integrating material science and

design automation. For tractable ICME, it is required that (1) a structural

feature space be identified to allow reconstruction of new designs, and (2) the

reconstruction process be property-preserving. The majority of existing

structural presentation schemes rely on the designer’s understanding of

specific material systems to identify geometric and statistical features, which

could be biased and insufficient for reconstructing physically meaningful

microstructures of complex material systems. In this paper, we develop a

feature learning mechanism based on convolutional deep belief network to

automate a two-way conversion between microstructures and their

lower-dimensional feature representations, and to achieves a 1000-fold

dimension reduction from the microstructure space. The proposed model is

applied to a wide spectrum of heterogeneous material systems with distinct

microstructural features including Ti-6Al-4V alloy, Pb63-Sn37 alloy,

Fontainebleau sandstone, and Spherical colloids, to produce material

reconstructions that are close to the original samples with respect to 2-point

correlation functions and mean critical fracture strength. This capability is

not achieved by existing synthesis methods that rely on the Markovian

assumption of material microstructures.

Detecting Unusual Input-Output Associations in Multivariate Conditional Data

Charmgil Hong , Milos Hauskrecht Subjects : Learning (cs.LG) ; Machine Learning (stat.ML)

Despite tremendous progress in outlier detection research in recent years,

the majority of existing methods are designed only to detect unconditional

outliers that correspond to unusual data patterns expressed in the joint space

of all data attributes. Such methods are not applicable when we seek to detect

conditional outliers that reflect unusual responses associated with a given

context or condition. This work focuses on multivariate conditional outlier

detection, a special type of the conditional outlier detection problem, where

data instances consist of multi-dimensional input (context) and output

(responses) pairs. We present a novel outlier detection framework that

identifies abnormal input-output associations in data with the help of a

decomposable conditional probabilistic model that is learned from all data

instances. Since components of this model can vary in their quality, we combine

them with the help of weights reflecting their reliability in assessment of

outliers. We study two ways of calculating the component weights: global that

relies on all data, and local that relies only on instances similar to the

target instance. Experimental results on data from various domains demonstrate

the ability of our framework to successfully identify multivariate conditional

outliers.

Highway and Residual Networks learn Unrolled Iterative Estimation

Klaus Greff , Rupesh K. Srivastava , Jürgen Schmidhuber

Comments: 10 + 3pages, under review for ICLR 2017

Subjects

:

Neural and Evolutionary Computing (cs.NE)

; Artificial Intelligence (cs.AI); Learning (cs.LG)

The past year saw the introduction of new architectures such as Highway

networks and Residual networks which, for the first time, enabled the training

of feedforward networks with dozens to hundreds of layers using simple gradient

descent. While depth of representation has been posited as a primary reason for

their success, there are indications that these architectures defy a popular

view of deep learning as a hierarchical computation of increasingly abstract

features at each layer.

In this report, we argue that this view is incomplete and does not adequately

explain several recent findings. We propose an alternative viewpoint based on

unrolled iterative estimation—a group of successive layers iteratively refine

their estimates of the same features instead of computing an entirely new

representation. We demonstrate that this viewpoint directly leads to the

construction of Highway and Residual networks. Finally we provide preliminary

experiments to discuss the similarities and differences between the two

architectures.

Stacking machine learning classifiers to identify Higgs bosons at the LHC

Alexandre Alves

Comments: 7 pages, 3 figures, 3 tables

Subjects

:

High Energy Physics – Phenomenology (hep-ph)

; Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

Machine learning (ML) algorithms have been employed in the problem of

classifying signal and background events with high accuracy in particle

physics. In this paper, we use a widespread ML technique, namely, emph{stacked

generalization}, for the task of discovering a new neutral Higgs boson in gluon

fusion. We found that, at the same time it demands far less computation

efforts, emph{stacking} ML algorithms performs almost as well as deep neural

networks (DNN) trained exclusively with kinematic distributions for the same

task by building either a highly discriminating linear model or a shallower

neural network with stacked ML outputs. We also show that it is possible to

outperform DNN in this channel by partially exploring correlations among the

classifiers outputs in a multivariate statistical analysis.

Structured Sequence Modeling with Graph Convolutional Recurrent Networks

Youngjoo Seo , Michaël Defferrard , Pierre Vandergheynst , Xavier Bresson Subjects : Machine Learning (stat.ML) ; Learning (cs.LG)

This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep

learning model able to predict structured sequences of data. Precisely, GCRN is

a generalization of classical recurrent neural networks (RNN) to data

structured by an arbitrary graph. Such structured sequences can represent

series of frames in videos, spatio-temporal measurements on a network of

sensors, or random walks on a vocabulary graph for natural language modeling.

The proposed model combines convolutional neural networks (CNN) on graphs to

identify spatial structures and RNN to find dynamic patterns. We study two

possible architectures of GCRN, and apply the models to two practical problems:

predicting moving MNIST data, and modeling natural language with the Penn

Treebank dataset. Experiments show that exploiting simultaneously graph spatial

and dynamic information about data can improve both precision and learning

speed.

Finding Statistically Significant Attribute Interactions

Andreas Henelius , Antti Ukkonen , Kai Puolamäki

Comments: 9 pages, 4 tables, 1 figure

Subjects

:

Machine Learning (stat.ML)

; Learning (cs.LG)

In many data exploration tasks it is meaningful to identify groups of

attribute interactions that are specific to a variable of interest. These

interactions are also useful in several practical applications, for example, to

gain insight into the structure of the data, in feature selection, and in data

anonymisation. We present a novel method, based on statistical significance

testing, that can be used to verify if the data set has been created by a given

factorized class-conditional joint distribution, where the distribution is

parametrized by partition of its attributes. Furthermore, we provide a method,

named ASTRID, to automatically find a partition of attributes that describes

the distribution that has generated the data. The state-of-the-art classifiers

are utilized to capture the interactions present in the data by systematically

breaking attribute interactions and observing the effect of this breaking on

classifier performance. We empirically demonstrate the utility of the proposed

method with real and synthetic data as well as with usage examples.

Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

Wendelin Böhmer , Rong Guo , Klaus Obermayer

Comments: This paper has been presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on the 3rd and 4th of December 2016 in Barcelona, Spain

Subjects

:

Artificial Intelligence (cs.AI)

; Learning (cs.LG); Machine Learning (stat.ML)

This paper investigates a type of instability that is linked to the greedy

policy improvement in approximated reinforcement learning. We show empirically

that non-deterministic policy improvement can stabilize methods like LSPI by

controlling the improvements’ stochasticity. Additionally we show that a

suitable representation of the value function also stabilizes the solution to

some degree. The presented approach is simple and should also be easily

transferable to more sophisticated algorithms like deep reinforcement learning.

Robustness of Voice Conversion Techniques Under Mismatched Conditions

Monisankha Pal , Dipjyoti Paul , Md Sahidullah , Goutam Saha

Comments: 5 pages, 2 figures

Subjects

:

Sound (cs.SD)

; Learning (cs.LG); Machine Learning (stat.ML)

Most of the existing studies on voice conversion (VC) are conducted in

acoustically matched conditions between source and target signal. However, the

robustness of VC methods in presence of mismatch remains unknown. In this

paper, we report a comparative analysis of different VC techniques under

mismatched conditions. The extensive experiments with five different VC

techniques on CMU ARCTIC corpus suggest that performance of VC methods

substantially degrades in noisy conditions. We have found that bilinear

frequency warping with amplitude scaling (BLFWAS) outperforms other methods in

most of the noisy conditions. We further explore the suitability of different

speech enhancement techniques for robust conversion. The objective evaluation

results indicate that spectral subtraction and log minimum mean square error

(logMMSE) based speech enhancement techniques can be used to improve the

performance in specific noisy conditions.

A Context-aware Attention Network for Interactive Question Answering

Huayu Li , Martin Renqiang Min , Yong Ge , Asim Kadav

Comments: 13 pages, 2 figures

Subjects

:

Computation and Language (cs.CL)

; Learning (cs.LG)

We develop a new model for Interactive Question Answering (IQA), using

Gated-Recurrent-Unit recurrent networks (GRUs) as encoders for statements and

questions, and another GRU as a decoder for outputs. Distinct from previous

work, our approach employs context-dependent word-level attention for more

accurate statement representations and question-guided sentence-level attention

for better context modeling. Employing these mechanisms, our model accurately

understands when it can output an answer or when it requires generating a

supplementary question for additional input. When available, user’s feedback is

encoded and directly applied to update sentence-level attention to infer the

answer. Extensive experiments on QA and IQA datasets demonstrate quantitatively

the effectiveness of our model with significant improvement over conventional

QA models.

Distributed Dictionary Learning

Amir Daneshmand , Gesualdo Scutari , Francisco Facchinei Subjects : Optimization and Control (math.OC) ; Learning (cs.LG)

The paper studies distributed Dictionary Learning (DL) problems where the

learning task is distributed over a multi-agent network with time-varying

(nonsymmetric) connectivity. This formulation is relevant, for instance, in

big-data scenarios where massive amounts of data are collected/stored in

different spatial locations and it is unfeasible to aggregate and/or process

all the data in a fusion center, due to resource limitations, communication

overhead or privacy considerations. We develop a general distributed

algorithmic framework for the (nonconvex) DL problem and establish its

asymptotic convergence. The new method hinges on Successive Convex

Approximation (SCA) techniques coupled with i) a gradient tracking mechanism

instrumental to locally estimate the missing global information; and ii) a

consensus step, as a mechanism to distribute the computations among the agents.

To the best of our knowledge, this is the first distributed algorithm with

provable convergence for the DL problem and, more in general, bi-convex

optimization problems over (time-varying) directed graphs.

Information Theory

Sum-networks from undirected graphs: construction and capacity analysis

Ardhendu Tripathy , Aditya Ramamoorthy

Comments: This has an extra Appendix section giving more information about Remark 2 of the original paper published in the 52nd Annual Allerton Conference on Communication, Control and Computing, 2014

Subjects

:

Information Theory (cs.IT)

We consider a directed acyclic network with multiple sources and multiple

terminals where each terminal is interested in decoding the sum of independent

sources generated at the source nodes. We describe a procedure whereby a simple

undirected graph can be used to construct such a sum-network and demonstrate an

upper bound on its computation rate. Furthermore, we show sufficient conditions

for the construction of a linear network code that achieves this upper bound.

Our procedure allows us to construct sum-networks that have any arbitrary

computation rate (frac{p}{q}) (where (p,q) are non-negative integers). Our

work significantly generalizes a previous approach for constructing

sum-networks with arbitrary capacities. Specifically, we answer an open

question in prior work by demonstrating sum-networks with significantly fewer

number of sources and terminals.

Compressed Sensing Algorithms for OFDM Channel Estimation

Jonathan Ling , Dmitry Chizhik , A. Tulino , Inaki Esnaola

Comments: Patent no: US8929390 Methods And Apparatuses For Channel Estimation In Wireless Networks

Subjects

:

Information Theory (cs.IT)

; Networking and Internet Architecture (cs.NI)

Radio channels are typically sparse in the delay domain, and ideal for

compressed sensing. A new compressed sensing algorithm called eX-OMP is

developed that yields performance similar to that of the optimal MMSE

estimator. The new algorithm relies on a small amount additional data. Both

eX-OMP and the MMSE estimator adaptively balance channel tracking and noise

reduction. They perform better than simple estimators such as the

linear-interpolator which fix this trade-off a priori. Some wideband

measurements are examined, and the channels are found to be represented by a

few delays.

Cache-induced Hierarchical Cooperation in Wireless Device-to-Device Caching Networks

An Liu , Vincent Lau , Giuseppe Caire

Comments: 45 pages, 10 figures, submitted to IEEE Transactions on Information Theory

Subjects

:

Information Theory (cs.IT)

We consider a wireless device-to-device (D2D) caching network where n nodes

are placed on a regular grid of area A(n). Each node caches L_C*F (coded) bits

from a library of size L*F bits, where L is the number of files and F is the

size of each file. Each node requests a file from the library independently

according to a popularity distribution. Under a commonly used “physical model”

and Zipf popularity distribution, we characterize the optimal per-node capacity

scaling law for extended networks (i.e., A(n). Moreover, we propose a

cache-induced hierarchical cooperation scheme and associated cache content

placement optimization algorithm to achieve the optimal per-node capacity

scaling law. When the path loss exponent alpha<3, the optimal per-node

capacity scaling law achieved by the cache-induced hierarchical cooperation can

be significantly better than that achieved by the existing state-of-the-art

schemes. To the best of our knowledge, this is the first work that completely

characterizes the per-node capacity scaling law for wireless caching networks

under the physical model and Zipf distribution with an arbitrary skewness

parameter au. While scaling law analysis yields clean results, it may not

accurately reflect the throughput performance of a large network with a finite

number of nodes. Therefore, we also analyze the throughput of the proposed

cache-induced hierarchical cooperation for networks of practical size. The

analysis and simulations verify that cache-induced hierarchical cooperation can

also achieve a large throughput gain over the cache-assisted multihop scheme

for networks of practical size.

Stopping Condition for Greedy Block Sparse Signal Recovery

Yu Luo , Ronggui Xie , Huarui Yin , Weidong Wang

Comments: 5 pages, 5 figures

Subjects

:

Information Theory (cs.IT)

For greedy block sparse recovery where the sparsity level is unknown, we

derive a stopping condition to stop the iteration process. Focused on the block

orthogonal matching pursuit (BOMP) algorithm, we model the energy of residual

signals at each iteration from a probabilistic perspective. At the iteration

when the last supporting block is detected, the resulting energy of residual

signals is supposed to suffer an obvious decrease. Based on this, we stop the

iteration process when the energy of residual signals is below a given

threshold. Compared with other approaches, our derived condition works well for

the BOMP recovery. What is more, we promote our approach to the interference

cancellation based BOMP (ICBOMP) recovery in paper [1]. Simulation results show

that our derived condition can save many unnecessary iterations and at the same

time guarantees a favorable recovery accuracy, both for the BOMP and ICBOMP

recoveries.

Delay Optimal Joint Scheduling-and-Power-Control for Cognitive Radio Uplinks

Ahmed Ewaisha , Cihan Tepedelenlioglu

Comments: Globecom 2016. arXiv admin note: text overlap with arXiv:1602.08010 , arXiv:1601.00608

Subjects

:

Information Theory (cs.IT)

An uplink cognitive radio system with a single primary user (PU) and multiple

secondary users (SUs) is considered. The SUs have an individual average delay

constraint and an aggregate average interference constraint to the PU. If the

interference channels between the SUs and the PU are statistically different

due to the different physical locations of the SUs, the SUs will experience

different delay performances. This is because SUs located closer to the PU

transmit with lower power levels. A dynamic scheduling-and-power-allocation

policy that uses the dynamic programming, is proposed. The proposed policy can

provide the required average delay guarantees to all SUs irrespective of their

location as well as protect the PU from harmful interference. The policy is

shown to be asymptotically delay optimal in the light traffic regime.

Motivated, by the high complexity of the dynamic programming algorithm in the

optimal policy we exploit the structure of the problem’s solution to present an

alternative suboptimal policy. Through simulations we show that in both light

and heavy traffic regimes, the delay performance of the suboptimal policy is

within (0.3/%) from the optimal policy and both outperforming existing methods.

Statistical limits of spiked tensor models

Amelia Perry , Alexander S. Wein , Afonso S. Bandeira

Comments: 38 pages, 5 figures

Subjects

:

Probability (math.PR)

; Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)

We study the statistical limits of both detecting and estimating a rank-one

deformation of a symmetric random Gaussian tensor. We establish upper and lower

bounds on the critical signal-to-noise ratio, under a variety of priors for the

planted vector: (i) a uniformly sampled unit vector, (ii) i.i.d. (pm 1)

entries, and (iii) a sparse vector where a constant fraction (

ho) of entries

are i.i.d. (pm 1) and the rest are zero. For each of these cases, our upper

and lower bounds match up to a (1+o(1)) factor as the order (d) of the tensor

becomes large. For sparse signals (iii), our bounds are also asymptotically

tight in the sparse limit (

ho o 0) for any fixed (d) (including the (d=2)

case of sparse PCA). Our upper bounds for (i) demonstrate a phenomenon

reminiscent of the work of Baik, Ben Arous and P’ech’e: an `eigenvalue’ of a

perturbed tensor emerges from the bulk at a strictly lower signal-to-noise

ratio than when the perturbation itself exceeds the bulk; we quantify the size

of this effect. We also provide some general results for larger classes of

priors. In particular, the large (d) asymptotics of the threshold location

differs between problems with discrete priors versus continuous priors.

Finally, for priors (i) and (ii) we carry out the replica prediction from

statistical physics, which is conjectured to give the exact

information-theoretic threshold for any fixed (d).

Of independent interest, we introduce a new improvement to the second moment

method for contiguity, on which our lower bounds are based. Our technique

conditions away from rare `bad’ events that depend on interactions between the

signal and noise. This enables us to close (sqrt{2})-factor gaps present in

several previous works.

Partial (ell_1) optimization in random linear systems — finite dimensions

Mihailo Stojnic

Comments: arXiv admin note: text overlap with arXiv:1612.06344

Subjects

:

Optimization and Control (math.OC)

; Information Theory (cs.IT); Probability (math.PR)

In this paper we provide a complementary set of results to those we present

in our companion work cite{Stojnicl1HidParasymldp} regarding the behavior of

the so-called partial (ell_1) (a variant of the standard (ell_1) heuristic

often employed for solving under-determined systems of linear equations). As is

well known through our earlier works

cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13}, the partial (ell_1)

also exhibits the phase-transition (PT) phenomenon, discovered and well

understood in the context of the standard (ell_1) through Donoho’s and our own

works cite{DonohoPol,DonohoUnsigned,StojnicCSetam09,StojnicUpper10}.

cite{Stojnicl1HidParasymldp} goes much further though and, in addition to the

determination of the partial (ell_1)’s phase-transition curves (PT curves)

(which had already been done in

cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13}), provides a

substantially deeper understanding of the PT phenomena through a study of the

underlying large deviations principles (LDPs). As the PT and LDP phenomena are

by their definitions related to large dimensional settings, both sets of our

works, cite{StojnicICASSP10knownsupp,StojnicTowBettCompSens13} and

cite{Stojnicl1HidParasymldp}, consider what is typically called the asymptotic

regime. In this paper we move things in a different direction and consider

finite dimensional scenarios. Basically, we provide explicit performance

characterizations for any given collection of systems/parameters dimensions. We

do so for two different variants of the partial (ell_1), one that we call

exactly the partial (ell_1) and another one, possibly a bit more practical,

that we call the hidden partial (ell_1).

Partial (ell_1) optimization in random linear systems — phase transitions and large deviations

Mihailo Stojnic

Comments: arXiv admin note: text overlap with arXiv:1612.06361 , arXiv:1612.06835

Subjects

:

Optimization and Control (math.OC)

; Information Theory (cs.IT); Probability (math.PR)

(ell_1) optimization is a well known heuristic often employed for solving

various forms of sparse linear problems. In this paper we look at its a variant

that we refer to as the emph{partial} (ell_1) and discuss its mathematical

properties when used for solving linear under-determined systems of equations.

We will focus on large random systems and discuss the phase transition (PT)

phenomena and how they connect to the large deviation principles (LDP). Using a

variety of probabilistic and geometric techniques that we have developed in

recent years we will first present general guidelines that conceptually fully

characterize both, the PTs and the LDPs. After that we will put an emphasis on

providing a collection of explicit analytical solutions to all of the

underlying mathematical problems. As a nice bonus to the developed concepts,

the forms of the analytical solutions will, in our view, turn out to be fairly

elegant as well.

欢迎加入我爱机器学习QQ8群:19517895

arXiv Paper Daily: Fri, 23 Dec 2016

微信扫一扫,关注我爱机器学习公众号

微博:我爱机器学习

原文  https://www.52ml.net/21435.html
正文到此结束
Loading...