
arXiv Paper Daily: Tue, 14 Feb 2017

Neural and Evolutionary Computing

Feature Space Modeling Through Surrogate Illumination

Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret

Subjects: Neural and Evolutionary Computing (cs.NE); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (stat.ML)

The MAP-Elites algorithm produces a set of high-performing solutions that

vary according to features defined by the user. This technique has the

potential to be a powerful tool for design space exploration, but is limited by

the need for numerous evaluations. The Surrogate-Assisted Illumination

algorithm (SAIL), introduced here, integrates approximate models and

intelligent sampling of the objective function to minimize the number of

evaluations required by MAP-Elites.

The ability of SAIL to efficiently produce both accurate models and diverse

high-performing solutions is illustrated on a 2D airfoil design problem. The

search space is divided into bins, each holding a design with a different

combination of features. In each bin SAIL produces a better performing solution

than MAP-Elites, and requires several orders of magnitude fewer evaluations.

The CMA-ES algorithm was used to produce an optimal design in each bin: with

the same number of evaluations required by CMA-ES to find a near-optimal

solution in a single bin, SAIL finds solutions of similar quality in every bin.
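As a rough illustration of the MAP-Elites archive that SAIL accelerates, here is a minimal Python sketch. The objective, feature descriptor, mutation scheme and bin count are toy assumptions, not the authors' setup; SAIL would additionally fit a surrogate model (e.g., a Gaussian process) and spend true evaluations only where the surrogate is informative.

```python
import numpy as np

# Illustrative MAP-Elites-style loop (not the authors' SAIL code).
N_BINS, DIM, ITERS = 20, 5, 5000
rng = np.random.default_rng(0)

def objective(x):          # toy fitness: higher is better
    return -np.sum((x - 0.5) ** 2)

def feature(x):            # toy behavior descriptor in [0, 1]
    return float(np.clip(x.mean(), 0.0, 1.0))

archive_x = [None] * N_BINS     # best solution per feature bin
archive_f = [-np.inf] * N_BINS  # its fitness

for _ in range(ITERS):
    if any(x is not None for x in archive_x):
        parents = [x for x in archive_x if x is not None]
        x = parents[rng.integers(len(parents))] + 0.1 * rng.normal(size=DIM)
    else:
        x = rng.random(DIM)
    b = min(int(feature(x) * N_BINS), N_BINS - 1)
    f = objective(x)
    if f > archive_f[b]:            # "illumination": keep each bin's elite
        archive_x[b], archive_f[b] = x, f

print(sum(x is not None for x in archive_x), "bins filled")
```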

Group Scissor: Scaling Neuromorphic Computing Design to Big Neural Networks

Yandan Wang, Wei Wen, Beiye Liu, Donald Chiarulli, Hai Li

Comments: Accepted in DAC 2017

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

The synapse crossbar is an elementary structure in Neuromorphic Computing Systems (NCS). However, the limited size of crossbars and heavy routing congestion impede the NCS implementations of big neural networks. In this paper, we propose a two-step framework (namely, group scissor) to scale NCS designs to big neural networks. The first step is rank clipping, which integrates low-rank approximation into the training to reduce total crossbar area. The second step is group connection deletion, which structurally prunes connections to reduce routing congestion between crossbars. Tested on convolutional neural networks of LeNet on the MNIST database and ConvNet on the CIFAR-10 database, our experiments show significant reduction of crossbar area and routing area in NCS designs. Without accuracy loss, rank clipping reduces total crossbar area to 13.62% and 51.81% in the NCS designs of LeNet and ConvNet, respectively. Following rank clipping, group connection deletion further reduces the routing area of LeNet and ConvNet to 8.1% and 52.06%, respectively.
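The low-rank idea behind rank clipping can be sketched with a truncated SVD: an m x n crossbar is replaced by two crossbars of sizes m x r and r x n. The sketch below is only a post-hoc factorization of a random matrix for illustration; the paper integrates the rank reduction into training rather than applying it after the fact.

```python
import numpy as np

def rank_clip(W, r):
    # Truncated SVD: W ~ U1 @ V1 with rank r, so two small crossbars
    # (m x r and r x n) replace one m x n crossbar.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]

rng = np.random.default_rng(1)
W = rng.normal(size=(512, 256))
U1, V1 = rank_clip(W, 32)
print("area ratio:", (U1.size + V1.size) / W.size)       # crossbar cells used
print("rel. error:", np.linalg.norm(W - U1 @ V1) / np.linalg.norm(W))
```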

Whale swarm algorithm for function optimization

Bing Zeng, Liang Gao, Xinyu Li

Comments: 8 pages, 5 figures

Subjects: Neural and Evolutionary Computing (cs.NE)

A growing number of nature-inspired metaheuristic algorithms are being applied to real-world optimization problems, as they have some advantages over classical methods of numerical optimization. This paper proposes a new nature-inspired metaheuristic called the Whale Swarm Algorithm for function optimization, inspired by whales' behavior of communicating with each other via ultrasound when hunting. The proposed Whale Swarm Algorithm has been compared with several popular metaheuristic algorithms on comprehensive performance metrics. According to the experimental results, the Whale Swarm Algorithm performs quite competitively compared with the other algorithms.

Computer Vision and Pattern Recognition

Cognitive Mapping and Planning for Visual Navigation

Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik

Comments: Under review for CVPR 2017. Project webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

We introduce a neural architecture for navigation in novel environments. Our

proposed architecture learns to map from first-person viewpoints and plans a

sequence of actions towards goals in the environment. The Cognitive Mapper and

Planner (CMP) is based on two key ideas: a) a unified joint architecture for

mapping and planning, such that the mapping is driven by the needs of the

planner, and b) a spatial memory with the ability to plan given an incomplete

set of observations about the world. CMP constructs a top-down belief map of

the world and applies a differentiable neural net planner to produce the next

action at each time step. The accumulated belief of the world enables the agent

to track visited regions of the environment. Our experiments demonstrate that

CMP outperforms both reactive strategies and standard memory-based

architectures and performs well in novel environments. Furthermore, we show

that CMP can also achieve semantically specified goals, such as ‘go to a

chair’.

Estimation of the volume of the left ventricle from MRI images using deep neural networks

Fangzhou Liao, Xi Chen, Xiaolin Hu, Sen Song

Comments: 10 pages, 10 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Segmenting the human left ventricle (LV) in magnetic resonance imaging (MRI)

images and calculating its volume are important for diagnosing cardiac

diseases. In 2016, Kaggle organized a competition to estimate the volume of LV

from MRI images. The dataset consisted of a large number of cases, but only

provided systole and diastole volumes as labels. We designed a system based on

neural networks to solve this problem. It began with a detector combined with a

neural network classifier for detecting regions of interest (ROIs) containing

LV chambers. Then a deep neural network named hypercolumns fully convolutional

network was used to segment LV in ROIs. The 2D segmentation results were

integrated across different images to estimate the volume. With ground-truth

volume labels, this model was trained end-to-end. To improve the result, an

additional dataset with only segmentation label was used. The model was trained

alternately on these two datasets with different types of teaching signals. We

also proposed a variance estimation method for the final prediction. Our

algorithm ranked 4th on the test set in this competition.

Online People Tracking and Identification with RFID and Kinect

Xinyu Li, Yanyi Zhang, Ivan Marsic, Randall S. Burd

Comments: 8 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce a novel, accurate and practical system for real-time people

tracking and identification. We used a Kinect V2 sensor for tracking that

generates a body skeleton for up to six people in the view. We perform

identification using both Kinect and passive RFID, by first measuring the

velocity vector of a person’s skeleton and of their RFID tag, using the position

of the RFID reader antennas as reference points and then finding the best match

between skeletons and tags. We introduce a method for synchronizing Kinect

data, which is captured regularly, with irregular or missing RFID data

readouts. Our experiments show centimeter-level people tracking resolution with

80% average identification accuracy for up to six people in indoor

environments, which meets the needs of many applications. Our system can

preserve user privacy and work under different lighting conditions.

An Efficient Decomposition Framework for Discriminative Segmentation with Supermodular Losses

Jiaqian Yu, Matthew B. Blaschko

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Several supermodular losses have been shown to improve the perceptual quality

of image segmentation in a discriminative framework such as a structured output

support vector machine (SVM). These loss functions do not necessarily have the

same structure as the one used by the segmentation inference algorithm, and in

general, we may have to resort to generic submodular minimization algorithms

for loss augmented inference. Although these come with polynomial time

guarantees, they are not practical to apply to image-scale data. Many

supermodular losses come with strong optimization guarantees, but are not

readily incorporated in a loss augmented graph cuts procedure. This motivates

our strategy of employing the alternating direction method of multipliers

(ADMM) decomposition for loss augmented inference. In doing so, we create a new

API for the structured SVM that separates the maximum a posteriori (MAP)

inference of the model from the loss augmentation during training. In this way,

we gain computational efficiency, making new choices of loss functions

practical for the first time, while simultaneously making the inference

algorithm employed during training closer to the test time procedure. We show

improvement both in accuracy and computational performance on the Microsoft

Research Grabcut database and a brain structure segmentation task, empirically

validating the use of several supermodular loss functions during training, and

the improved computational properties of the proposed ADMM approach over the

Fujishige-Wolfe minimum norm point algorithm.

Unsupervised temporal context learning using convolutional neural networks for laparoscopic workflow analysis

Sebastian Bodenstedt (1), Martin Wagner (2), Darko Katić (1), Patrick Mietkowski (2), Benjamin Mayer (2), Hannes Kenngott (2), Beat Müller-Stich (2), Rüdiger Dillmann (1), Stefanie Speidel (1) ((1) Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, (2) Department of General, Visceral and Transplant Surgery, University of Heidelberg, Heidelberg)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Computer-assisted surgery (CAS) aims to provide the surgeon with the right

type of assistance at the right moment. Such assistance systems are especially

relevant in laparoscopic surgery, where CAS can alleviate some of the drawbacks

that surgeons incur. For many assistance functions, e.g. displaying the

location of a tumor at the appropriate time or suggesting what instruments to

prepare next, analyzing the surgical workflow is a prerequisite. Since

laparoscopic interventions are performed via an endoscope, the video signal is an

obvious sensor modality to rely on for workflow analysis.

Image-based workflow analysis tasks in laparoscopy, such as phase

recognition, skill assessment, video indexing or automatic annotation, require

a temporal distinction between video frames. Generally, computer-vision-based methods that generalize from previously seen data are used. For training such

methods, large amounts of annotated data are necessary. Annotating surgical

data requires expert knowledge, therefore collecting a sufficient amount of

data is difficult, time-consuming and not always feasible.

In this paper, we address this problem by presenting an unsupervised method

for training a convolutional neural network (CNN) to differentiate between

laparoscopic video frames on a temporal basis. We extract video frames at

regular intervals from 324 unlabeled laparoscopic interventions, resulting in a

dataset of approximately 2.2 million images. From this dataset, we extract

image pairs from the same video and train a CNN to determine their temporal

order. To solve this problem, the CNN has to extract features that are relevant

for comprehending laparoscopic workflow.

Furthermore, we demonstrate that such a CNN can be adapted for surgical

workflow segmentation. We performed image-based workflow segmentation on a

publicly available dataset of 7 cholecystectomies and 9 colorectal

interventions.

Underwater Optical Image Processing: A Comprehensive Review

Huimin Lu, Yujie Li, Yudong Zhang, Min Chen, Seiichi Serikawa, Hyoungseop Kim

Comments: 14 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Underwater cameras are widely used to observe the sea floor. They are usually

included in autonomous underwater vehicles, unmanned underwater vehicles, and

in situ ocean sensor networks. Despite being an important sensor for monitoring

underwater scenes, there exist many issues with recent underwater camera

sensors. Because of the transport characteristics of light in water and the biological activity on the sea floor, acquired underwater images often suffer from scattering and large amounts of noise. Over the last five years, many

methods have been proposed to overcome traditional underwater imaging problems.

This paper aims to review the state-of-the-art techniques in underwater image

processing by highlighting the contributions and challenges presented in over

40 papers. We present an overview of various underwater image processing

approaches, such as underwater image descattering, underwater image color

restoration, and underwater image quality assessments. Finally, we summarize

the future trends and challenges in designing and processing underwater imaging

sensors.

Sparse Representation based Multi-sensor Image Fusion: A Review

Qiang Zhang, Yi Liu, Rick S. Blum, Jungong Han, Dacheng Tao

Comments: 19 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)

As a result of several successful applications in computer vision and image

processing, sparse representation (SR) has attracted significant attention in

multi-sensor image fusion. Unlike the traditional multiscale transforms (MSTs)

that presume the basis functions, SR learns an over-complete dictionary from a

set of training images for image fusion, and it achieves more stable and

meaningful representations of the source images. By doing so, the SR-based

fusion methods generally outperform the traditional MST-based image fusion

methods in both subjective and objective tests. In addition, they are less

susceptible to mis-registration among the source images, thus facilitating the

practical applications. This survey paper proposes a systematic review of the

SR-based multi-sensor image fusion literature, highlighting the pros and cons

of each category of approaches. Specifically, we start by performing a

theoretical investigation of the entire system from three key algorithmic

aspects, (1) sparse representation models; (2) dictionary learning methods; and

(3) activity levels and fusion rules. Subsequently, we show how the existing

works address these scientific problems and design the appropriate fusion rules

for each application, such as multi-focus image fusion and multi-modality

(e.g., infrared and visible) image fusion. Finally, we carry out some

experiments to evaluate the impact of these three algorithmic components on the

fusion performance when dealing with different applications. This article is

expected to serve as a tutorial and source of reference for researchers

preparing to enter the field or who desire to employ the sparse representation

theory in other fields.

A Novel Weight-Shared Multi-Stage Network Architecture of CNNs for Scale Invariance

Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Convolutional neural networks (CNNs) have demonstrated remarkable results in

image classification tasks for benchmark and practical uses. Recently, CNNs with deeper architectures have achieved higher performance thanks to their numerous parameters and the resulting high expressive ability. However, CNNs have limited robustness to geometric transformations of objects in images, such as scaling and rotation. This problem is considered to limit performance improvement of deep CNNs, but there is no established solution.

This study focuses on scale transformation and proposes a novel network

architecture called weight-shared multi-stage network (WSMS-Net), which enables

the existing deep CNNs, such as ResNet and DenseNet, to acquire robustness to

scaling of objects. The WSMS-Net architecture consists of multiple stages of

CNNs and is easily combined with existing deep CNNs. This study demonstrates

that existing deep CNNs combined with the proposed WSMS-Net achieve higher accuracy on image classification tasks with only a little increase in the number of parameters.

Crossing Nets: Dual Generative Models with a Shared Latent Space for Hand Pose Estimation

Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao

Comments: 10 pages, 5 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

State-of-the-art methods for 3D hand pose estimation from depth images

require large amounts of annotated training data. We propose to model the

statistical relationships of 3D hand poses and corresponding depth images using

two deep generative models with a shared latent space. By design, our

architecture allows for learning from unlabeled image data in a semi-supervised

manner. Assuming a one-to-one mapping between a pose and a depth map, any given

point in the shared latent space can be projected into both a hand pose and a

corresponding depth map. Regressing the hand pose can then be done by learning

a discriminator to estimate the posterior of the latent pose given some depth

map. To improve generalization and to better exploit unlabeled depth maps, we

jointly train a generator and a discriminator. At each iteration, the generator

is updated with the back-propagated gradient from the discriminator to

synthesize realistic depth maps of the articulated hand, while the

discriminator benefits from an augmented training set of synthesized and

unlabeled samples. The proposed discriminator network architecture is highly

efficient and runs at 90 FPS on the CPU, with accuracies comparable to or better than the state of the art on 3 publicly available benchmarks.

ArtGAN: Artwork Synthesis with Conditional Categorial GANs

Wei Ren Tan, Chee Seng Chan, Hernan Aguirre, Kiyoshi Tanaka

Comments: 10 pages, 10 figures, submitted to ICIP2017 (extended version)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes an extension to Generative Adversarial Networks (GANs), named ArtGAN, to synthetically generate more challenging and complex images, such as artwork, that have abstract characteristics. This is in contrast to most current solutions, which focus on generating natural images such as room interiors, birds, flowers and faces. The key innovation of our work is to allow back-propagation of the loss function w.r.t. the labels (randomly assigned to each generated image) from the discriminator to the generator. With the feedback from the label information, the generator is able to learn faster and achieve better generated image quality. Empirically, we show that the proposed ArtGAN is capable of creating realistic artwork, as well as generating compelling real-world images that globally look natural, with clear shapes, on CIFAR-10.

Reverse Classification Accuracy: Predicting Segmentation Performance in the Absence of Ground Truth

Vanya V. Valindria, Ioannis Lavdas, Wenjia Bai, Konstantinos Kamnitsas, Eric O. Aboagye, Andrea G. Rockall, Daniel Rueckert, Ben Glocker

Comments: Accepted article to appear in IEEE Transactions on Medical Imaging 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV)

When integrating computational tools such as automatic segmentation into

clinical practice, it is of utmost importance to be able to assess the level of

accuracy on new data, and in particular, to detect when an automatic method

fails. However, this is difficult to achieve due to the absence of ground truth.

Segmentation accuracy on clinical data might be different from what is found

through cross-validation because validation data is often used during

incremental method development, which can lead to overfitting and unrealistic

performance expectations. Before deployment, performance is quantified using

different metrics, for which the predicted segmentation is compared to a

reference segmentation, often obtained manually by an expert. But little is

known about the real performance after deployment when a reference is

unavailable. In this paper, we introduce the concept of reverse classification

accuracy (RCA) as a framework for predicting the performance of a segmentation

method on new data. In RCA we take the predicted segmentation from a new image

to train a reverse classifier which is evaluated on a set of reference images

with available ground truth. The hypothesis is that if the predicted

segmentation is of good quality, then the reverse classifier will perform well

on at least some of the reference images. We validate our approach on

multi-organ segmentation with different classifiers and segmentation methods.

Our results indicate that it is indeed possible to predict the quality of

individual segmentations, in the absence of ground truth. Thus, RCA is ideal

for integration into automatic processing pipelines in clinical routine and as

part of large-scale image analysis studies.
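A hedged toy sketch of the RCA loop follows, with a deliberately trivial "reverse classifier" (an intensity threshold) standing in for the real segmentation models the paper evaluates; data, threshold rule and database size are all illustrative assumptions.

```python
import numpy as np

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-9)

def train_threshold(image, pseudo_gt):
    # "Reverse classifier": a threshold fit to the predicted segmentation,
    # which RCA treats as pseudo ground truth (purely illustrative).
    return 0.5 * (image[pseudo_gt].mean() + image[~pseudo_gt].mean())

rng = np.random.default_rng(2)
new_img = rng.random((64, 64))
new_pred = new_img > 0.6                     # predicted segmentation to assess
t = train_threshold(new_img, new_pred)

scores = []
for _ in range(5):                           # reference database WITH labels
    ref_img = rng.random((64, 64))
    ref_gt = ref_img > 0.6                   # known ground truth
    scores.append(dice(ref_img > t, ref_gt)) # apply reverse classifier
print("predicted quality (max Dice over references):", max(scores))
```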

Enhanced Local Binary Patterns for Automatic Face Recognition

Pavel Král, Antonín Vrba

Comments: Submitted for ICIP 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a novel automatic face recognition approach based on

local binary patterns (LBP). The LBP descriptor considers a local neighbourhood of a pixel to compute the features. This method is not very robust to image noise, variance and differing illumination conditions. In this paper, we

address these issues and extend the original LBP operator by considering more

pixels and different neighbourhoods to compute the feature vector. The proposed

method is evaluated on two benchmark corpora, namely UFI and FERET face

datasets. We experimentally show that our approach significantly outperforms several other state-of-the-art methods and is particularly effective under real conditions where the above-mentioned issues are evident.
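For reference, the original 3x3 LBP operator that the paper extends can be written in a few lines of numpy; the proposed enhancement changes which pixels and neighbourhoods enter the code, which this sketch does not include.

```python
import numpy as np

def lbp(img):
    # Each pixel's code is an 8-bit number formed by thresholding its 8
    # neighbours against the centre, clockwise from top-left.
    offs = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offs):
        neigh = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        out |= (neigh >= centre).astype(np.uint8) << bit
    return out

img = np.arange(25, dtype=np.float64).reshape(5, 5)
codes = lbp(img)                                    # 3x3 map of LBP codes
hist = np.bincount(codes.ravel(), minlength=256)    # typical feature vector
```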

Multi-Resolution Dual-Tree Wavelet Scattering Network for Signal Classification

Amarjot Singh, Nick Kingsbury

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces a Deep Scattering network that utilizes Dual-Tree

complex wavelets to extract translation invariant representations from an input

signal. The computationally efficient Dual-Tree wavelets decompose the input

signal into densely spaced representations over scales. Translation invariance

is introduced in the representations by applying a non-linearity over a region

followed by averaging. The discriminatory information in the densely spaced,

locally smooth, signal representations aids the learning of the classifier. The

proposed network is shown to outperform Mallat’s ScatterNet in classification accuracy on four datasets with different modalities.

Distributed Mapping with Privacy and Communication Constraints: Lightweight Algorithms and Object-based Models

Siddharth Choudhary, Luca Carlone, Carlos Nieto, John Rogers, Henrik I. Christensen, Frank Dellaert

Comments: preprint for IJRR submission

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

We consider the following problem: a team of robots is deployed in an unknown

environment and it has to collaboratively build a map of the area without a

reliable infrastructure for communication. The backbone for modern mapping

techniques is pose graph optimization, which estimates the trajectory of the

robots, from which the map can be easily built. The first contribution of this

paper is a set of distributed algorithms for pose graph optimization: rather

than sending all sensor data to a remote sensor fusion server, the robots

exchange very partial and noisy information to reach an agreement on the pose

graph configuration. Our approach can be considered as a distributed

implementation of the two-stage approach of Carlone et al., where we use the

Successive Over-Relaxation (SOR) and the Jacobi Over-Relaxation (JOR) as

workhorses to split the computation among the robots. As a second contribution,

we extend and demonstrate the applicability of the proposed distributed

algorithms to work with object-based map models. The use of object-based models

avoids the exchange of raw sensor measurements (e.g., point clouds) further

reducing the communication burden. Our third contribution is an extensive

experimental evaluation of the proposed techniques, including tests in

realistic Gazebo simulations and field experiments in a military test facility.

Abundant experimental evidence suggests that one of the proposed algorithms

(the Distributed Gauss-Seidel method or DGS) has excellent performance. The DGS

requires minimal information exchange, has an anytime flavor, scales well to

large teams, is robust to noise, and is easy to implement. Our field tests show

that the combined use of our distributed algorithms and object-based models

reduces the communication requirements by several orders of magnitude and

enables distributed mapping with large teams of robots in real-world problems.
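For readers unfamiliar with the named workhorses, here is textbook Successive Over-Relaxation on a toy linear system; the paper's contribution is the distributed, per-robot block version of such iterations, which this centralised sketch omits.

```python
import numpy as np

def sor(A, b, omega=1.5, iters=100):
    # Classic SOR sweep: update each x[i] in place using the latest values.
    n = len(b)
    x = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            sigma = A[i] @ x - A[i, i] * x[i]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])          # diagonally dominant toy system
b = np.array([1.0, 2.0, 3.0])
print(sor(A, b), np.linalg.solve(A, b))  # the two should agree closely
```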

Artificial Intelligence

Bilateral Multi-Perspective Matching for Natural Language Sentences

Zhiguo Wang, Wael Hamza, Radu Florian

Comments: 7

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Natural language sentence matching is a fundamental technology for a variety

of tasks. Previous approaches either match sentences from a single direction or

only apply single granular (word-by-word or sentence-by-sentence) matching. In

this work, we propose a bilateral multi-perspective matching (BiMPM) model

under the “matching-aggregation” framework. Given two sentences (P) and (Q),

our model first encodes them with a BiLSTM encoder. Next, we match the two

encoded sentences in two directions, (P \rightarrow Q) and (P \leftarrow Q). In

each matching direction, each time step of one sentence is matched against all

time-steps of the other sentence from multiple perspectives. Then, another

BiLSTM layer is utilized to aggregate the matching results into a fixed-length

matching vector. Finally, based on the matching vector, the decision is made

through a fully connected layer. We evaluate our model on three tasks:

paraphrase identification, natural language inference and answer sentence

selection. Experimental results on standard benchmark datasets show that our

model achieves the state-of-the-art performance on all tasks.
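The "multiple perspectives" can be made concrete with a small numpy sketch of a per-time-step matching function: each of L learned perspectives re-weights the two hidden vectors element-wise before a cosine similarity. The shapes and random weights below are illustrative assumptions; encoders, matching strategies and aggregation layers are omitted.

```python
import numpy as np

def multi_perspective_match(v1, v2, W, eps=1e-8):
    a = W * v1                 # (L, d): each row is one re-weighted view of v1
    b = W * v2                 # (L, d)
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den           # (L,) matching vector of cosine similarities

rng = np.random.default_rng(3)
d, L = 100, 20                 # hidden size, number of perspectives
W = rng.normal(size=(L, d))    # learned perspective weights (random here)
h_p, h_q = rng.normal(size=d), rng.normal(size=d)   # two time-step encodings
m = multi_perspective_match(h_p, h_q, W)
print(m.shape)                 # (20,)
```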

Traditional PageRank versus Network Capacity Bound

Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Robert A. Kłopotek, Elżbieta A. Kłopotek

Subjects: Artificial Intelligence (cs.AI)

In a former paper we simplified the proof of a theorem on personalized random

walk that is fundamental to graph node clustering, and generalized it to bipartite graphs for a specific case where the probability of a random jump was proportional to the number of links of “personally preferred” nodes. In this paper we turn to the more complex issue of graphs in which the random jump follows a uniform distribution.
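For context, the uniform-jump setting referred to above is the standard PageRank iteration with a uniform teleport vector; a minimal power-iteration sketch (not code from the paper) follows, where P is the column-stochastic transition matrix and alpha the damping factor.

```python
import numpy as np

def pagerank(P, alpha=0.85, tol=1e-10, max_iter=1000):
    n = P.shape[0]
    r = np.full(n, 1.0 / n)
    jump = np.full(n, 1.0 / n)          # uniform random-jump distribution
    for _ in range(max_iter):
        r_new = alpha * P @ r + (1 - alpha) * jump
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

# tiny 3-node example: directed cycle 0 -> 1 -> 2 -> 0
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
print(pagerank(P))                      # uniform, as expected for a cycle
```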

On Seeking Consensus Between Document Similarity Measures

Mieczysław Kłopotek

Subjects: Artificial Intelligence (cs.AI)

This paper investigates the application of consensus clustering and

meta-clustering to the set of all possible partitions of a data set. We show

that when using a “complement” of Rand Index as a measure of cluster

similarity, the total-separation partition, putting each element in a separate

set, is chosen.

Genetic and Memetic Algorithm with Diversity Equilibrium based on Greedy Diversification

Andrés Herrera-Poyatos, Francisco Herrera

Comments: 27 pages, 5 figures, 11 tables

Subjects: Artificial Intelligence (cs.AI)

The lack of diversity in a genetic algorithm’s population may lead to a bad

performance of the genetic operators since there is no equilibrium between exploration and exploitation. In those cases, genetic algorithms exhibit fast, premature convergence.

In this paper we develop a novel hybrid genetic algorithm which attempts to

obtain a balance between exploration and exploitation. It confronts the

diversity problem using the proposed greedy diversification operator. Furthermore, the proposed algorithm applies a competition between parents and children so as

to exploit the high quality visited solutions. These operators are complemented

by a simple selection mechanism designed to preserve and take advantage of the

population diversity.

Additionally, we extend our proposal to the field of memetic algorithms,

obtaining an improved model with outstanding results in practice.

The experimental study shows the validity of the approach as well as how

important it is to take the concepts of exploration and exploitation into account when designing an evolutionary algorithm.

Graph Neural Networks and Boolean Satisfiability

Benedikt Bünz, Matthew Lamm

Subjects: Artificial Intelligence (cs.AI)

In this paper we explore whether or not deep neural architectures can learn

to classify Boolean satisfiability (SAT). We devote considerable time to

discussing the theoretical properties of SAT. Then, we define a graph

representation for Boolean formulas in conjunctive normal form, and train

neural classifiers over general graph structures called Graph Neural Networks,

or GNNs, to recognize features of satisfiability. To the best of our knowledge

this has never been tried before. Our preliminary findings are potentially

profound. In a weakly-supervised setting, that is, without problem specific

feature engineering, Graph Neural Networks can learn features of

satisfiability.

Similarity Preserving Representation Learning for Time Series Analysis

Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, Inderjit S. Dhillon

Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG)

A considerable amount of machine learning algorithms take matrices as their

inputs. As such, they cannot directly analyze time series data due to its

temporal nature, usually unequal lengths, and complex properties. This is a

great pity since many of these algorithms are effective, robust, efficient, and

easy to use. In this paper, we bridge this gap by proposing an efficient

representation learning framework that is able to convert a set of time series

with equal or unequal lengths to a matrix format. In particular, we guarantee

that the pairwise similarities between time series are well preserved after the

transformation. Therefore, the learned feature representation is particularly

suitable to the class of learning problems that are sensitive to data

similarities. Given a set of (n) time series, we first construct an (n \times n) partially observed similarity matrix by randomly sampling (O(n \log n)) pairs

of time series and computing their pairwise similarities. We then propose an

extremely efficient algorithm that solves a highly non-convex and NP-hard

problem to learn new features based on the partially observed similarity

matrix. We use the learned features to conduct experiments on both data

classification and clustering tasks. Our extensive experimental results

demonstrate that the proposed framework is both effective and efficient.
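A hedged sketch of the overall recipe: sample O(n log n) pairs, record their similarities, and fit k-dimensional vectors whose inner products match them. Plain SGD stands in for the authors' specialised non-convex solver, and the similarity oracle is synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 8
ground = rng.normal(size=(n, k))                  # hidden "true" geometry
sim = lambda i, j: ground[i] @ ground[j]          # stand-in similarity oracle

m = int(4 * n * np.log(n))                        # O(n log n) sampled pairs
pairs = rng.integers(0, n, size=(m, 2))
targets = np.array([sim(i, j) for i, j in pairs])

U = 0.1 * rng.normal(size=(n, k))                 # learned representations
lr = 0.01
for _ in range(50):                               # SGD over observed entries
    for (i, j), s in zip(pairs, targets):
        err = U[i] @ U[j] - s                     # inner product vs. target
        gi, gj = err * U[j], err * U[i]
        U[i] -= lr * gi
        U[j] -= lr * gj
```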

Octopus: A Framework for Cost-Quality-Time Optimization in Crowdsourcing

Karan Goel, Shreya Rajpal, Mausam

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)

Managing micro-tasks on crowdsourcing marketplaces involves balancing

conflicting objectives — the quality of work, total cost incurred and time to

completion. Previous agents have focused on cost-quality, or cost-time

tradeoffs, limiting their real-world applicability. As a step towards jointly managing all three, we present Octopus, the first AI agent to handle all three objectives in tandem. Octopus is based on a computationally tractable, multi-agent formulation consisting of three components: one that sets the price

per ballot to adjust the rate of completion of tasks, another that optimizes

each task for quality and a third that performs task selection. We demonstrate

that Octopus outperforms existing state-of-the-art approaches in both simulation and experiments with real data. We also

deploy Octopus on Amazon Mechanical Turk to establish its ability to manage

tasks in a real-world, dynamic setting.

A Minimax Algorithm Better Than Alpha-beta?: No and Yes

Aske Plaat, Jonathan Schaeffer, Wim Pijls, Arie de Bruin

Comments: Report version of the 1996 AI Journal article “Best-first fixed-depth minimax algorithms”

Subjects: Artificial Intelligence (cs.AI)

This paper has three main contributions to our understanding of fixed-depth

minimax search: (A) A new formulation for Stockman’s SSS* algorithm, based on

Alpha-Beta, is presented. It solves all the perceived drawbacks of SSS*,

finally transforming it into a practical algorithm. In effect, we show that

SSS* = Alpha-Beta + transposition tables. The crucial step is the realization

that transposition tables contain so-called solution trees, structures that are

used in best-first search algorithms like SSS*. Having created a practical

version, we present performance measurements with tournament game-playing

programs for three different minimax games, yielding results that contradict a

number of publications. (B) Based on the insights gained in our attempts at

understanding SSS*, we present a framework that facilitates the construction of

several best-first fixed-depth game-tree search algorithms, known and new. The

framework is based on depth-first null-window Alpha-Beta search, enhanced with

storage to allow for the refining of previous search results. It focuses

attention on the essential differences between algorithms. (C) We present a new

instance of the framework, MTD(f). It is well-suited for use with iterative

deepening, and performs better than algorithms that are currently used in most

state-of-the-art game-playing programs. We provide experimental evidence to

explain why MTD(f) performs better than the other fixed-depth minimax

algorithms.
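The MTD(f) driver itself is short; the transcription below assumes a memory-enhanced null-window Alpha-Beta routine is supplied by the caller (`alphabeta_with_memory` is a stub name for a depth-limited search with a transposition table).

```python
import math

def mtdf(root, first_guess, depth, alphabeta_with_memory):
    # MTD(f): converge on the minimax value via null-window searches.
    g = first_guess
    lower, upper = -math.inf, math.inf
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = alphabeta_with_memory(root, beta - 1, beta, depth)  # null window
        if g < beta:
            upper = g          # search failed low: value < beta
        else:
            lower = g          # search failed high: value >= beta
    return g
```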

Cognitive Mapping and Planning for Visual Navigation

Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik

Comments: Under review for CVPR 2017. Project webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

We introduce a neural architecture for navigation in novel environments. Our

proposed architecture learns to map from first-person viewpoints and plans a

sequence of actions towards goals in the environment. The Cognitive Mapper and

Planner (CMP) is based on two key ideas: a) a unified joint architecture for

mapping and planning, such that the mapping is driven by the needs of the

planner, and b) a spatial memory with the ability to plan given an incomplete

set of observations about the world. CMP constructs a top-down belief map of

the world and applies a differentiable neural net planner to produce the next

action at each time step. The accumulated belief of the world enables the agent

to track visited regions of the environment. Our experiments demonstrate that

CMP outperforms both reactive strategies and standard memory-based

architectures and performs well in novel environments. Furthermore, we show

that CMP can also achieve semantically specified goals, such as ‘go to a

chair’.

Offline bilingual word vectors, orthogonal transformations and the inverted softmax

Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla

Comments: Accepted to conference track at ICLR 2017

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Usually bilingual word vectors are trained “online”. Mikolov et al. showed

they can also be found “offline”, whereby two pre-trained embeddings are

aligned with a linear transformation, using dictionaries compiled from expert

knowledge. In this work, we prove that the linear transformation between two

spaces should be orthogonal. This transformation can be obtained using the

singular value decomposition. We introduce a novel “inverted softmax” for

identifying translation pairs, with which we improve the precision @1 of

Mikolov’s original mapping from 34% to 43%, when translating a test set

composed of both common and rare English words into Italian. Orthogonal

transformations are more robust to noise, enabling us to learn the

transformation without expert bilingual signal by constructing a

“pseudo-dictionary” from the identical character strings which appear in both

languages, achieving 40% precision on the same test set. Finally, we extend our

method to retrieve the true translations of English sentences from a corpus of

200k Italian sentences with a precision @1 of 68%.
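Both ingredients are easy to sketch in numpy: the orthogonal map is the Procrustes solution W = U V^T from the SVD of X^T Y, and the inverted softmax normalises similarity scores over source words rather than over target words. All vectors below are random stand-ins, and the temperature is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n_dict, n_vocab = 50, 300, 1000
X = rng.normal(size=(n_dict, d))          # source vectors (dictionary rows)
Y = rng.normal(size=(n_dict, d))          # aligned target vectors

U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt                                # best orthogonal map X -> Y

S = rng.normal(size=(n_vocab, d))         # all source vectors
T = rng.normal(size=(n_vocab, d))         # all target vectors
sims = (S @ W) @ T.T                      # similarity of each (source, target)

beta = 10.0                               # inverse temperature
logits = beta * sims
logits -= logits.max(axis=0, keepdims=True)          # numerical stability
# Inverted softmax: normalise over SOURCE words for each target column.
inv_softmax = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
best = inv_softmax.argmax(axis=1)         # translation pick per source word
```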

Reservoir Computing Using Non-Uniform Binary Cellular Automata

Stefano Nichele, Magnus S. Gundersen

Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI)

The Reservoir Computing (RC) paradigm utilizes a dynamical system, i.e., a

reservoir, and a linear classifier, i.e., a read-out layer, to process data

from sequential classification tasks. In this paper the usage of Cellular

Automata (CA) as a reservoir is investigated. The use of CA in RC has shown promising results. In this paper, selected state-of-the-art experiments

are reproduced. It is shown that some CA-rules perform better than others, and

the reservoir performance is improved by increasing the size of the CA

reservoir itself. In addition, the usage of parallel loosely coupled

CA-reservoirs, where each reservoir has a different CA-rule, is investigated.

The experiments performed on quasi-uniform CA reservoirs provide valuable insights into CA reservoir design. The results herein show that some rules do not

work well together, while other combinations work remarkably well. This

suggests that non-uniform CA could represent a powerful tool for novel CA

reservoir implementations.
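A minimal sketch of a CA reservoir, assuming an elementary (1-D, binary, radius-1) automaton, a fixed number of evolution steps, and a least-squares read-out; the rule number, sizes and input encoding are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def ca_step(state, rule):
    # One step of an elementary CA: neighbourhood code 0..7 indexes the
    # rule's 8-entry truth table.
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = (left << 2) | (state << 1) | right
    table = (rule >> np.arange(8)) & 1
    return table[idx]

def reservoir_features(bits, rule=110, width=64, steps=8):
    state = np.zeros(width, dtype=np.int64)
    state[: len(bits)] = bits                 # inject the input pattern
    history = []
    for _ in range(steps):
        state = ca_step(state, rule)
        history.append(state.copy())
    return np.concatenate(history)            # read-out sees all steps

rng = np.random.default_rng(6)
X_bits = rng.integers(0, 2, size=(200, 16))   # toy binary inputs
y = X_bits.sum(axis=1) % 2                    # toy target: parity
F = np.stack([reservoir_features(b) for b in X_bits]).astype(float)
w, *_ = np.linalg.lstsq(np.c_[F, np.ones(len(F))], y, rcond=None)  # readout
```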

Is Big Data Sufficient for a Reliable Detection of Non-Technical Losses?

Patrick Glauner, Angelo Migliosi, Jorge Meira, Eric Aislan Antonelo, Petko Valtchev, Radu State, Franck Bettinger

Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI)

Non-technical losses (NTL) occur during the distribution of electricity in

power grids and include, but are not limited to, electricity theft and faulty

meters. In emerging countries, they may range up to 40% of the total

electricity distributed. In order to detect NTLs, machine learning methods are

used that learn irregular consumption patterns from customer data and

inspection results. The Big Data paradigm followed in modern machine learning

reflects the desire of deriving better conclusions from simply analyzing more

data, without the necessity of looking at theory and models. However, the

sample of inspected customers may be biased, i.e. it does not represent the

population of all customers. As a consequence, machine learning models trained

on these inspection results are biased as well and therefore lead to unreliable

predictions of whether customers cause NTL or not. In machine learning, this

issue is called covariate shift and has not been addressed in the literature on

NTL detection yet. In this work, we present a novel framework for quantifying

and visualizing covariate shift. We apply it to a commercial data set from

Brazil that consists of 3.6M customers and 820K inspection results. We show

that some features have a stronger covariate shift than others, making

predictions less reliable. In particular, previous inspections were focused on certain neighborhoods or customer classes and were not sufficiently spread among the population of customers. This framework is about to be

deployed in a commercial product for NTL detection.
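One standard way to quantify covariate shift for a single feature, shown below for concreteness, is to train a classifier to separate the inspected sample from the overall population and read off its AUC (0.5 means no shift, values near 1.0 mean a strong shift). This is an illustrative stand-in on synthetic data and not necessarily the paper's own framework.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
inspected = rng.normal(loc=1.0, size=(2000, 1))     # biased inspection sample
population = rng.normal(loc=0.0, size=(2000, 1))    # customer population

X = np.vstack([inspected, population])
y = np.r_[np.ones(len(inspected)), np.zeros(len(population))]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# If the two groups are separable, the feature's distribution has shifted.
clf = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print("shift score (AUC):", round(auc, 3))
```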

Group Scissor: Scaling Neuromorphic Computing Design to Big Neural Networks

Yandan Wang, Wei Wen, Beiye Liu, Donald Chiarulli, Hai Li

Comments: Accepted in DAC 2017

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

The synapse crossbar is an elementary structure in Neuromorphic Computing Systems (NCS). However, the limited size of crossbars and heavy routing congestion impede the NCS implementations of big neural networks. In this paper, we propose a two-step framework (namely, group scissor) to scale NCS designs to big neural networks. The first step is rank clipping, which integrates low-rank approximation into the training to reduce total crossbar area. The second step is group connection deletion, which structurally prunes connections to reduce routing congestion between crossbars. Tested on convolutional neural networks of LeNet on the MNIST database and ConvNet on the CIFAR-10 database, our experiments show significant reduction of crossbar area and routing area in NCS designs. Without accuracy loss, rank clipping reduces total crossbar area to 13.62% and 51.81% in the NCS designs of LeNet and ConvNet, respectively. Following rank clipping, group connection deletion further reduces the routing area of LeNet and ConvNet to 8.1% and 52.06%, respectively.

Information Retrieval

Offline bilingual word vectors, orthogonal transformations and the inverted softmax

Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla

Comments: Accepted to conference track at ICLR 2017

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Usually bilingual word vectors are trained “online”. Mikolov et al. showed

they can also be found “offline”, whereby two pre-trained embeddings are

aligned with a linear transformation, using dictionaries compiled from expert

knowledge. In this work, we prove that the linear transformation between two

spaces should be orthogonal. This transformation can be obtained using the

singular value decomposition. We introduce a novel “inverted softmax” for

identifying translation pairs, with which we improve the precision @1 of

Mikolov’s original mapping from 34% to 43%, when translating a test set

composed of both common and rare English words into Italian. Orthogonal

transformations are more robust to noise, enabling us to learn the

transformation without expert bilingual signal by constructing a

“pseudo-dictionary” from the identical character strings which appear in both

languages, achieving 40% precision on the same test set. Finally, we extend our

method to retrieve the true translations of English sentences from a corpus of

200k Italian sentences with a precision @1 of 68%.

Computation and Language

Offline bilingual word vectors, orthogonal transformations and the inverted softmax

Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla

Comments: Accepted to conference track at ICLR 2017

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Usually bilingual word vectors are trained “online”. Mikolov et al. showed

they can also be found “offline”, whereby two pre-trained embeddings are

aligned with a linear transformation, using dictionaries compiled from expert

knowledge. In this work, we prove that the linear transformation between two

spaces should be orthogonal. This transformation can be obtained using the

singular value decomposition. We introduce a novel “inverted softmax” for

identifying translation pairs, with which we improve the precision @1 of

Mikolov’s original mapping from 34% to 43%, when translating a test set

composed of both common and rare English words into Italian. Orthogonal

transformations are more robust to noise, enabling us to learn the

transformation without expert bilingual signal by constructing a

“pseudo-dictionary” from the identical character strings which appear in both

languages, achieving 40% precision on the same test set. Finally, we extend our

method to retrieve the true translations of English sentences from a corpus of

200k Italian sentences with a precision @1 of 68%.

Towards speech-to-text translation without speech recognition

Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

Comments: To appear in EACL 2017 (short papers)

Subjects: Computation and Language (cs.CL)

We explore the problem of translating speech to text in low-resource

scenarios where neither automatic speech recognition (ASR) nor machine

translation (MT) are available, but we have training data in the form of audio

paired with text translations. We present the first system for this problem

applied to a realistic multi-speaker dataset, the CALLHOME Spanish-English

speech translation corpus. Our approach uses unsupervised term discovery (UTD)

to cluster repeated patterns in the audio, creating a pseudotext, which we pair

with translations to create a parallel text and train a simple bag-of-words MT

model. We identify the challenges faced by the system, finding that the

difficulty of cross-speaker UTD results in low recall, but that our system is

still able to correctly translate some content words in test data.

Multitask Learning with Deep Neural Networks for Community Question Answering

Daniele Bonadiman, Antonio Uva, Alessandro Moschitti

Subjects: Computation and Language (cs.CL)

In this paper, we developed a deep neural network (DNN) that learns to simultaneously solve the three tasks of the cQA challenge proposed by the

SemEval-2016 Task 3, i.e., question-comment similarity, question-question

similarity and new question-comment similarity. The latter is the main task,

which can exploit the previous two for achieving better results. Our DNN is

trained jointly on all the three cQA tasks and learns to encode questions and

comments into a single vector representation shared across the multiple tasks.

The results on the official challenge test set show that our approach produces

higher accuracy and faster convergence rates than the individual neural

networks. Additionally, our method, which does not use any manual feature

engineering, approaches the state of the art established with methods that make

heavy use of it.

A Morphology-aware Network for Morphological Disambiguation

Eray Yildiz, Caglar Tirkaz, H. Bahadir Sahin, Mustafa Tolga Eren, Ozan Sonmez

Comments: 6 pages, 1 figure, Thirtieth AAAI Conference on Artificial Intelligence, 2016

Subjects: Computation and Language (cs.CL)

Agglutinative languages such as Turkish, Finnish and Hungarian require

morphological disambiguation before further processing due to the complex

morphology of words. A morphological disambiguator is used to select the

correct morphological analysis of a word. Morphological disambiguation is

important because it generally is one of the first steps of natural language

processing and its performance affects subsequent analyses. In this paper, we

propose a system that uses deep learning techniques for morphological

disambiguation. Many of the state-of-the-art results in computer vision, speech

recognition and natural language processing have been obtained through deep

learning models. However, applying deep learning techniques to morphologically

rich languages is not well studied. In this work, while we focus on Turkish

morphological disambiguation we also present results for French and German in

order to show that the proposed architecture achieves high accuracy with no

language-specific feature engineering or additional resources. In the experiments, we achieve 84.12%, 88.35% and 93.78% morphological disambiguation accuracy among the ambiguous words for Turkish, German and French, respectively.

Learning to Parse and Translate Improves Neural Machine Translation

Akiko Eriguchi, Yoshimasa Tsuruoka, Kyunghyun Cho

Subjects: Computation and Language (cs.CL)

There has been relatively little attention to incorporating linguistic priors into neural machine translation. Much of the previous work was further constrained to considering linguistic priors on the source side. In this paper, we propose a hybrid model, called NMT+RG, that learns to parse and translate by incorporating the recurrent neural network grammar into the attention-based neural

machine translation. Our approach encourages the neural machine translation

model to incorporate linguistic prior during training, and lets it translate on

its own afterward. Extensive experiments with four language pairs show the

effectiveness of the proposed NMT+RG.

Vector Embedding of Wikipedia Concepts and Entities

Ehsan Sherkat, Evangelos Milios

Subjects: Computation and Language (cs.CL)

Using deep learning for different machine learning tasks such as image

classification and word embedding has recently gained much attention. Its

appealing performance reported across specific Natural Language Processing

(NLP) tasks in comparison with other approaches is the reason for its

popularity. Word embedding is the task of mapping words or phrases to a low

dimensional numerical vector. In this paper, we use deep learning to embed

Wikipedia Concepts and Entities. The English version of Wikipedia contains more

than five million pages, which suggests its capacity to cover many English

Entities, Phrases, and Concepts. Each Wikipedia page is considered as a

concept. Some concepts correspond to entities, such as a person’s name, an

organization or a place. Contrary to word embedding, Wikipedia Concepts

Embedding is not ambiguous, so there are different vectors for concepts with

similar surface form but different mentions. We proposed several approaches and

evaluated their performance based on Concept Analogy and Concept Similarity

tasks. The results show that the proposed approaches perform comparably to, and in some cases even better than, the state-of-the-art methods.

Parallel Long Short-Term Memory for Multi-stream Classification

Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori

Comments: 2016 IEEE Workshop on Spoken Language Technology

Subjects: Computation and Language (cs.CL); Learning (cs.LG)

Recently, machine learning methods have provided a broad spectrum of original

and efficient algorithms based on Deep Neural Networks (DNN) to automatically

predict an outcome with respect to a sequence of inputs. Recurrent hidden cells

allow these DNN-based models to manage long-term dependencies such as Recurrent

Neural Networks (RNN) and Long Short-Term Memory (LSTM). Nevertheless, these

RNNs process a single input stream in one (LSTM) or two (Bidirectional LSTM)

directions. But most of the information available nowadays comes from multiple streams or multimedia documents, and requires RNNs to process this information synchronously during training. This paper presents an original LSTM-based

architecture, named Parallel LSTM (PLSTM), that processes multiple parallel

synchronized input sequences in order to predict a common output. The proposed

PLSTM method could be used for parallel sequence classification purposes. The

PLSTM approach is evaluated on an automatic telecast genre sequences

classification task and compared with different state-of-the-art architectures.

Results show that the proposed PLSTM method outperforms the baseline n-gram

models as well as the state-of-the-art LSTM approach.
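A hedged PyTorch sketch of the parallel-LSTM idea: one LSTM per synchronized input stream, whose final hidden states are concatenated and fed to a shared classifier. The layer sizes and the fusion-by-concatenation choice are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ParallelLSTM(nn.Module):
    def __init__(self, n_streams, in_dim, hid_dim, n_classes):
        super().__init__()
        # one LSTM per synchronized input stream
        self.lstms = nn.ModuleList(
            [nn.LSTM(in_dim, hid_dim, batch_first=True) for _ in range(n_streams)]
        )
        self.out = nn.Linear(n_streams * hid_dim, n_classes)

    def forward(self, streams):        # streams: list of (B, T, in_dim) tensors
        finals = []
        for lstm, x in zip(self.lstms, streams):
            _, (h, _) = lstm(x)        # h: (1, B, hid_dim), final hidden state
            finals.append(h[-1])
        return self.out(torch.cat(finals, dim=1))   # fuse by concatenation

model = ParallelLSTM(n_streams=3, in_dim=10, hid_dim=32, n_classes=5)
streams = [torch.randn(4, 20, 10) for _ in range(3)]  # 3 synchronized streams
logits = model(streams)                               # (4, 5)
```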

Learning Concept Embeddings for Efficient Bag-of-Concepts Densification

Walid Shalaby, Wlodek Zadrozny

Subjects: Computation and Language (cs.CL)

Explicit concept space models have proven effective for text representation in

many natural language and text mining applications. The idea is to embed

textual structures into a semantic space of concepts which captures the main

topics of these structures. This so-called bag-of-concepts representation suffers from data sparsity, causing low similarity scores between similar texts

due to low concept overlap. In this paper we propose two neural embedding

models in order to learn continuous concept vectors. Once learned, we propose

an efficient vector aggregation method to generate fully dense bag-of-concepts

representations. Empirical results on a benchmark dataset for measuring entity

semantic relatedness show superior performance over other concept embedding

models. In addition, by utilizing our efficient aggregation method, we

demonstrate the effectiveness of the densified vector representation over the

typical sparse representations for dataless classification, where we can achieve the same or better accuracy with far fewer dimensions.

Universal Dependencies to Logical Forms with Negation Scope

Federico Fancellu, Siva Reddy, Adam Lopez, Bonnie Webber

Comments: This is a draft version of the paper. We welcome any comments you may have regarding the content and presentation

Subjects: Computation and Language (cs.CL)

Many language technology applications would benefit from the ability to

represent negation and its scope on top of widely-used linguistic resources. In

this paper, we investigate the possibility of obtaining a first-order logic

representation with negation scope marked using Universal Dependencies. To do

so, we enhance UDepLambda, a framework that converts dependency graphs to

logical forms. The resulting UDepLambda(\lnot) is able to handle phenomena related to scope by means of a higher-order type theory, relevant not only to

negation but also to universal quantification and other complex semantic

phenomena. The initial conversion we did for English is promising, in that one

can represent the scope of negation also in the presence of more complex

phenomena such as universal quantifiers.

Bilateral Multi-Perspective Matching for Natural Language Sentences

Zhiguo Wang, Wael Hamza, Radu Florian

Comments: 7

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Natural language sentence matching is a fundamental technology for a variety

of tasks. Previous approaches either match sentences from a single direction or

only apply single granular (word-by-word or sentence-by-sentence) matching. In

this work, we propose a bilateral multi-perspective matching (BiMPM) model

under the “matching-aggregation” framework. Given two sentences (P) and (Q),

our model first encodes them with a BiLSTM encoder. Next, we match the two

encoded sentences in two directions, (P \rightarrow Q) and (P \leftarrow Q). In

each matching direction, each time step of one sentence is matched against all

time-steps of the other sentence from multiple perspectives. Then, another

BiLSTM layer is utilized to aggregate the matching results into a fixed-length

matching vector. Finally, based on the matching vector, the decision is made

through a fully connected layer. We evaluate our model on three tasks:

paraphrase identification, natural language inference and answer sentence

selection. Experimental results on standard benchmark datasets show that our

model achieves the state-of-the-art performance on all tasks.

Distributed, Parallel, and Cluster Computing

Unit Commitment on the Cloud

Mushfiqur R. Sarker, Jianhui Wang

Comments: 2 pages, 1 figure, 1 table, IEEE Letter

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The advent of High Performance Computing (HPC) has provided the computational

capacity required for power system operators (SO) to obtain solutions in the

least time to highly-complex applications, i.e., Unit Commitment (UC). The UC

problem, which attempts to schedule the least-cost combination of generating

units to meet the load, is increasing in complexity and problem size due to

deployments of renewable resources and smart grid technologies. The current

approach to solving the UC problem consists of in-house HPC infrastructures,

which experience issues at scale, and demands high maintenance and capital

expenditures. On the other hand, cloud computing is an ideal substitute due to

its powerful computational capacity, rapid scalability, and high

cost-effectiveness. In this work, the benefits and challenges of outsourcing

the UC application to the cloud are explored. A quantitative analysis of the

computational performance gain is explored for a large-scale UC problem solved

on the cloud and compared to traditional in-house HPC infrastructure. The

results show substantial reduction in solve time when outsourced to the cloud.
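To make the UC problem in the text concrete, here is a toy single-period instance solved by brute force; real unit commitment is a large mixed-integer program over many periods with ramping and reserve constraints, which this sketch ignores.

```python
from itertools import product

# Toy units: (no-load cost $, marginal cost $/MWh, capacity MW). Illustrative
# numbers only; committed units pay their no-load cost even at zero output.
units = [
    (100, 20, 200),
    (200, 25, 300),
    (50, 40, 100),
]
load = 350  # MW to be met

best = (float("inf"), None)
for on in product([0, 1], repeat=len(units)):     # every on/off combination
    committed = [u for u, flag in zip(units, on) if flag]
    if sum(cap for _, _, cap in committed) < load:
        continue                                  # cannot meet the load
    remaining, cost = load, 0.0
    for noload, marg, cap in sorted(committed, key=lambda u: u[1]):
        gen = min(cap, remaining)                 # dispatch cheapest first
        cost += noload + marg * gen
        remaining -= gen
    best = min(best, (cost, on))

print("cheapest commitment:", best[1], "cost:", best[0])
```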

Random walk based in-network computation of arbitrary functions

Iqra Altaf Gillani , Pooja Vyavahare , Amitabha Bagchi Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Networking and Internet Architecture (cs.NI)

We study the in-network computation of arbitrary functions whose computation

schema is a complete binary tree, i.e., we assume that the network wants to

compute a function of (K) operands, each of which is available at a distinct

node in the network, and rather than simply collecting the (K) operands at a

single sink node and computing the function, we want to compute the function

during the process of moving the data towards the sink. Such settings have been

studied in the literature but largely only for symmetric functions, e.g.

average, parity etc., which have the specific property that the output is

invariant to permutations of the operands. To the best of our knowledge, we

present the first decentralised algorithms for arbitrary functions. We propose

two algorithms, Fixed Metropolis-Compute and Flexible Metropolis-Compute, for

this problem, both of which use random walks on the network as their basic

primitive. Assuming that time is slotted, we provide upper bounds on the time taken

to compute the function, characterising this time in terms of the fundamental

parameters of the random walk on a graph: the hitting time in the case of Fixed

Metropolis-Compute, and the mixing time in the case of Flexible

Metropolis-Compute. Assuming a stochastic model for the generation of streams

of data at each source, we also provide lower and upper bounds on the rate at

which Fixed Metropolis-Compute is able to compute the stream of associated

function values.
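
For readers unfamiliar with the primitive, a sketch of one step of a Metropolis random walk on a graph, whose stationary distribution is uniform over the nodes (our generic walk, not the paper's full Fixed or Flexible Metropolis-Compute algorithms):

```python
import random

def metropolis_step(adj, u):
    """One step of the Metropolis random walk on an undirected graph
    given as an adjacency-list dict. Proposing a uniform neighbor and
    accepting with probability min(1, deg(u)/deg(v)) makes the
    stationary distribution uniform over nodes."""
    v = random.choice(adj[u])                     # propose a neighbor
    if random.random() < min(1.0, len(adj[u]) / len(adj[v])):
        return v
    return u                                      # lazy self-loop on rejection

# Example: ten steps on the path graph 0-1-2.
adj = {0: [1], 1: [0, 2], 2: [1]}
node = 0
for _ in range(10):
    node = metropolis_step(adj, node)
print(node)
```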

System Modeling in the COSMA Environment

Wiktor B. Daszczuk , Waldemar Grabski , Jerzy Mieścicki , Jacek Wytrębowicz

Comments: 6 pages, 3 figures, 1 table

Journal-ref: Proc. Euromicro Symposium on Digital Systems Design –

Architectures, Methods and Tools, September 4-6 2001, Warsaw, Poland, pp.

152-157

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

The aim of this paper is to demonstrate how the COSMA environment can be used

for system modeling. This environment is a set of tools based on the Concurrent

State Machines paradigm, developed at the Institute of Computer Science

of the Warsaw University of Technology. Our demonstration example is a

distributed brake control system dedicated to railway transport. The paper

briefly introduces COSMA. Next, it shows how the example model can be validated

by our temporal logic analyzer.

Leader Election in Trees with Customized Advice

Barun Gorain , Andrzej Pelc Subjects : Distributed, Parallel, and Cluster Computing (cs.DC)

Leader election is a basic symmetry breaking problem in distributed

computing. All nodes of a network have to agree on a single node, called the

leader. If the nodes of the network have distinct labels, then agreeing on a

single node means that all nodes have to output the label of the elected

leader.

If the nodes are anonymous, the task of leader election is formulated as

follows: every node of the network must output a simple path starting at it,

which is coded as a sequence of port numbers, such that all these paths end at

a common node, the leader. In this paper, we study deterministic leader

election in anonymous trees.

Our goal is to establish tradeoffs between the allocated time (\tau) and the

amount of information that has to be given a priori to the nodes of a

network to enable leader election in time (\tau). Following the framework of

algorithms with advice, this information is provided to all nodes at the

start by an oracle knowing the entire tree, in the form of binary strings assigned

to all nodes. There are two possible variants of formulating this advice

assignment. Either the strings provided to all nodes are identical, or strings

assigned to different nodes may be potentially different, i.e., advice can be

customized. As opposed to previous papers on leader election with advice,

in this paper we consider the latter option.

The maximum length of all assigned binary strings is called the size of

advice.

For a given time (\tau) allocated to leader election, we give upper and lower

bounds on the minimum size of advice sufficient to perform leader election in

time (\tau). All our bounds except one pair are tight up to multiplicative

constants, and in this one exceptional case, the gap between the upper and the

lower bound is very small.

Gathering Anonymous, Oblivious Robots on a Grid

Matthias Fischer , Daniel Jung , Friedhelm Meyer auf der Heide Subjects : Distributed, Parallel, and Cluster Computing (cs.DC) ; Multiagent Systems (cs.MA); Robotics (cs.RO)

We consider a swarm of (n) autonomous mobile robots, distributed on a

2-dimensional grid. A basic task for such a swarm is the gathering process: all

robots have to gather at one (not predefined) place. The work in this paper is

motivated by the following insight: On the one hand, for swarms of robots

distributed in the 2-dimensional Euclidean space, several gathering algorithms

are known for extremely simple robots that are oblivious, have bounded viewing

radius, no compass, and no “flags” to communicate a status to others. On the

other hand, in the case of the 2-dimensional grid, the only known gathering

algorithms for robots with bounded viewing radius and no compass need to

memorize a constant number of rounds and need flags.

In this paper we contribute, to the best of our knowledge, the first

gathering algorithm on the grid that works for anonymous, oblivious robots

with bounded viewing range, without any further means of communication and

without any memory. We prove its correctness and an (O(n^2)) time bound. This

time bound matches those of the best known algorithms for the Euclidean plane

mentioned above.

Efficient Resource Allocation in Mass Customization based on Service Oriented Architecture

Ali Vatankhah Barenji , Reza Vatankhah Barenji

Comments: 6

Subjects

:

Distributed, Parallel, and Cluster Computing (cs.DC)

Mass customization denotes the ability to provide uniquely designed products

and services to every customer through high process integration and

flexibility. It has been used as a competitive approach by many companies.

Adequate resource allocation in mass customization, particularly in terms of

resource usage, is therefore important to meet customers' requirements for

responsiveness and deadlines while offering high scalability. We illustrate an

architecture for solving the resource allocation issue in a mass-customized

flexible manufacturing system, putting in place advance reservation systems

and scheduling algorithms for effective resource usage.

Training Deep Neural Networks via Optimization Over Graphs

Guoqiang Zhang , W. Bastiaan Kleijn

Comments: 5 pages

Subjects

:

Learning (cs.LG)

; Distributed, Parallel, and Cluster Computing (cs.DC)

In this work, we propose to train a deep neural network by distributed

optimization over a graph. Two nonlinear functions are considered: the

rectified linear unit (ReLU) and a linear unit with both lower and upper

cutoffs (DCutLU). The problem reformulation over a graph is realized by

explicitly representing ReLU or DCutLU using a set of slack variables. We then

apply the alternating direction method of multipliers (ADMM) to update the

weights of the network layerwise by solving subproblems of the reformulated

problem. Empirical results suggest that by proper parameter selection, the

ADMM-based method converges considerably faster than the gradient descent method.
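
For context, the layerwise updates specialize the standard scaled-form ADMM template for problems of the form minimizing f(x) + g(z) subject to Ax + Bz = c (our recap of the generic iteration, not the paper's exact subproblems):

```latex
x^{k+1} = \arg\min_x \; f(x) + \tfrac{\rho}{2}\,\lVert Ax + Bz^{k} - c + u^{k}\rVert_2^2,
z^{k+1} = \arg\min_z \; g(z) + \tfrac{\rho}{2}\,\lVert Ax^{k+1} + Bz - c + u^{k}\rVert_2^2,
u^{k+1} = u^{k} + Ax^{k+1} + Bz^{k+1} - c.
```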

Learning

Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction

Akosua Busia , Navdeep Jaitly

Comments: 11 pages, 3 figures, 4 tables, submitted to ISMB/ECCB 2017

Subjects

:

Learning (cs.LG)

; Biomolecules (q-bio.BM)

Recently developed deep learning techniques have significantly improved the

accuracy of various speech and image recognition systems. In this paper we show

how to adapt some of these techniques to create a novel chained convolutional

architecture with next-step conditioning for improving performance on protein

sequence prediction problems. We explore its value by demonstrating its ability

to improve performance on eight-class secondary structure prediction. We first

establish a state-of-the-art baseline by adapting recent advances in

convolutional neural networks which were developed for vision tasks. This model

achieves 70.0% per amino acid accuracy on the CB513 benchmark dataset without

use of standard performance-boosting techniques such as ensembling or multitask

learning. We then improve upon this state-of-the-art result using a novel

chained prediction approach which frames the secondary structure prediction as

a next-step prediction problem. This sequential model achieves 70.3% Q8

accuracy on CB513 with a single model; an ensemble of these models produces

71.4% Q8 accuracy on the same test set, improving upon the previous overall

state of the art for the eight-class secondary structure problem. Our models

are implemented using TensorFlow, an open-source machine learning software

library available at TensorFlow.org; we aim to release the code for these

experiments as part of the TensorFlow repository.

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

Maxim Raginsky , Alexander Rakhlin , Matus Telgarsky

Comments: 29 pages

Subjects

:

Learning (cs.LG)

; Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)

Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of

Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is

added to an unbiased estimate of the gradient at each iteration. This modest

change allows SGLD to escape local minima and suffices to guarantee asymptotic

convergence to global minimizers for sufficiently regular non-convex objectives

(Gelfand and Mitter, 1991).

The present work provides a nonasymptotic analysis in the context of

non-convex learning problems: SGLD requires (\tilde{O}(\varepsilon^{-4}))

iterations to sample (\tilde{O}(\varepsilon))-approximate minimizers of both

empirical and population risk, where (\tilde{O}(\cdot)) hides polynomial

dependence on a temperature parameter, the model dimension, and a certain

spectral gap parameter.

As in the asymptotic setting, our analysis relates the discrete-time SGLD

Markov chain to a continuous-time diffusion process. A new tool that drives the

results is the use of weighted transportation cost inequalities to quantify the

rate of convergence of SGLD to a stationary distribution in the Euclidean

(2)-Wasserstein distance.
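
The update rule in question is simple to state; a minimal sketch, with placeholder step size eta and inverse temperature beta rather than tuned values:

```python
import numpy as np

def sgld(grad, w0, eta=1e-3, beta=1e3, n_iter=1000, seed=0):
    """Stochastic Gradient Langevin Dynamics: gradient descent plus
    properly scaled isotropic Gaussian noise, as described above.
    `grad` returns an unbiased stochastic gradient estimate; eta and
    beta here are placeholder values, not tuned choices."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(n_iter):
        w = w - eta * grad(w)                                   # gradient step
        w = w + np.sqrt(2.0 * eta / beta) * rng.standard_normal(w.shape)  # noise
    return w

# Example: a non-convex double-well objective f(w) = (w^2 - 1)^2.
print(sgld(lambda w: 4 * w * (w**2 - 1), w0=[2.0]))
```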

Is Big Data Sufficient for a Reliable Detection of Non-Technical Losses?

Patrick Glauner , Angelo Migliosi , Jorge Meira , Eric Aislan Antonelo , Petko Valtchev , Radu State , Franck Bettinger Subjects : Learning (cs.LG) ; Artificial Intelligence (cs.AI)

Non-technical losses (NTL) occur during the distribution of electricity in

power grids and include, but are not limited to, electricity theft and faulty

meters. In emerging countries, they may range up to 40% of the total

electricity distributed. In order to detect NTLs, machine learning methods are

used that learn irregular consumption patterns from customer data and

inspection results. The Big Data paradigm followed in modern machine learning

reflects the desire to derive better conclusions from simply analyzing more

data, without the necessity of looking at theory and models. However, the

sample of inspected customers may be biased, i.e. it does not represent the

population of all customers. As a consequence, machine learning models trained

on these inspection results are biased as well and therefore lead to unreliable

predictions of whether customers cause NTL or not. In machine learning, this

issue is called covariate shift and has not been addressed in the literature on

NTL detection yet. In this work, we present a novel framework for quantifying

and visualizing covariate shift. We apply it to a commercial data set from

Brazil that consists of 3.6M customers and 820K inspection results. We show

that some features have a stronger covariate shift than others, making

predictions less reliable. In particular, previous inspections were focused on

certain neighborhoods or customer classes and were not sufficiently

spread among the population of customers. This framework is about to be

deployed in a commercial product for NTL detection.

Coresets for Kernel Regression

Yan Zheng , Jeff M. Phillips

Comments: 11 pages, 20 figures

Subjects

:

Learning (cs.LG)

; Data Structures and Algorithms (cs.DS)

Kernel regression is an essential and ubiquitous tool for non-parametric data

analysis, particularly popular among time series and spatial data. However, the

central operation which is performed many times, evaluating a kernel on the

data set, takes linear time. This is impractical for modern large data sets.

In this paper we describe coresets for kernel regression: compressed data

sets which can be used as proxy for the original data and have provably bounded

worst-case error. The size of the coresets is independent of the raw number of

data points; rather, it depends only on the error guarantee and, in some cases,

the size of the domain and the amount of smoothing. We evaluate our methods on very

large time series and spatial data, and demonstrate that they incur negligible

error, can be constructed extremely efficiently, and allow for great

computational gains.
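
To illustrate the central operation referred to above, a minimal Nadaraya-Watson estimator with a Gaussian kernel: each query scans every data point, which is the linear-time cost a coreset avoids by substituting a small weighted proxy set (the bandwidth and toy data are ours):

```python
import numpy as np

def nadaraya_watson(xq, X, y, h=0.5):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (1-D).
    Every query touches every data point, hence linear time per
    evaluation; a coreset replaces (X, y) with a much smaller proxy."""
    K = np.exp(-0.5 * ((xq[:, None] - X[None, :]) / h) ** 2)   # (m, n) weights
    return (K @ y) / K.sum(axis=1)

# Example on noisy sine data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 500)
y = np.sin(X) + 0.1 * rng.standard_normal(500)
print(nadaraya_watson(np.array([1.0, 3.0]), X, y))
```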

A Multi-model Combination Approach for Probabilistic Wind Power Forecasting

You Lin , Ming Yang , Can Wan , Jianhui Wang , Yonghua Song Subjects : Learning (cs.LG) ; Applications (stat.AP)

Short-term probabilistic wind power forecasting can provide critical

quantified uncertainty information of wind generation for power system

operation and control. Given the complicated characteristics of wind power

prediction errors, it is difficult to develop a universal forecasting

model dominating over other alternative models. Therefore, a novel multi-model

combination (MMC) approach for short-term probabilistic wind generation

forecasting is proposed in this paper to exploit the advantages of different

forecasting models. The proposed approach can combine different forecasting

models that provide different kinds of probability density functions to

improve the probabilistic forecast accuracy. Three probabilistic forecasting

models based on the sparse Bayesian learning, kernel density estimation and

beta distribution fitting are used to form the combined model. The parameters

of the MMC model are estimated within a Bayesian framework. Numerical tests

illustrate the effectiveness of the proposed MMC approach.
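
The generic form of such a combination is a weighted linear pool of the component predictive densities; a minimal sketch, with fixed placeholder weights in place of the paper's Bayesian parameter estimation:

```python
import numpy as np

def gaussian_pdf(mu, sigma):
    """Gaussian density as a callable, for use as a component model."""
    return lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def linear_pool(pdfs, weights, x):
    """Weighted linear combination ("linear pool") of component
    predictive densities -- the generic form of multi-model density
    combination; the fixed weights here are placeholders."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize to a valid mixture
    return sum(wi * p(x) for wi, p in zip(w, pdfs))

# Example: pool a narrow and a wide Gaussian forecast density at x = 0.
print(linear_pool([gaussian_pdf(0.0, 1.0), gaussian_pdf(1.0, 2.0)], [0.7, 0.3], 0.0))
```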

Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection

Lijie Chen , Jian Li , Mingda Qiao

Comments: Accepted by AISTATS 2017

Subjects

:

Learning (cs.LG)

; Data Structures and Algorithms (cs.DS)

In the Best-(k)-Arm problem, we are given (n) stochastic bandit arms, each

associated with an unknown reward distribution. We are required to identify the

(k) arms with the largest means by taking as few samples as possible. In this

paper, we make progress towards a complete characterization of the

instance-wise sample complexity bounds for the Best-(k)-Arm problem. On the

lower bound side, we obtain a novel complexity term to measure the sample

complexity that every Best-(k)-Arm instance requires. This is derived by an

interesting and nontrivial reduction from the Best-(1)-Arm problem. We also

provide an elimination-based algorithm that matches the instance-wise lower

bound within doubly-logarithmic factors. The sample complexity of our algorithm

strictly dominates the state-of-the-art for Best-(k)-Arm (modulo constant

factors).

Concept Drift Adaptation by Exploiting Historical Knowledge

Yu Sun , Ke Tang , Zexuan Zhu , Xin Yao

Comments: First version

Subjects

:

Learning (cs.LG)

Incremental learning with concept drift has often been tackled by ensemble

methods, where models built in the past can be re-trained to attain new models

for the current data. Two design questions need to be addressed in developing

ensemble methods for incremental learning with concept drift, i.e., which

historical (i.e., previously trained) models should be preserved and how to

utilize them. A novel ensemble learning method, namely Diversity and Transfer

based Ensemble Learning (DTEL), is proposed in this paper. Given newly arrived

data, DTEL uses each preserved historical model as an initial model and further

trains it with the new data via transfer learning. Furthermore, DTEL preserves

a diverse set of historical models, rather than a set of historical models that

are merely accurate in terms of classification accuracy. Empirical studies on

15 synthetic data streams and 4 real-world data streams (all with concept

drifts) demonstrate that DTEL can handle concept drift more effectively than 4

other state-of-the-art methods.

Training Deep Neural Networks via Optimization Over Graphs

Guoqiang Zhang , W. Bastiaan Kleijn

Comments: 5 pages

Subjects

:

Learning (cs.LG)

; Distributed, Parallel, and Cluster Computing (cs.DC)

In this work, we propose to train a deep neural network by distributed

optimization over a graph. Two nonlinear functions are considered: the

rectified linear unit (ReLU) and a linear unit with both lower and upper

cutoffs (DCutLU). The problem reformulation over a graph is realized by

explicitly representing ReLU or DCutLU using a set of slack variables. We then

apply the alternating direction method of multipliers (ADMM) to update the

weights of the network layerwise by solving subproblems of the reformulated

problem. Empirical results suggest that by proper parameter selection, the

ADMM-based method converges considerably faster than the gradient descent method.

Generative Mixture of Networks

Ershad Banijamali , Ali Ghodsi , Pascal Poupart

Comments: 9 pages

Subjects

:

Learning (cs.LG)

; Machine Learning (stat.ML)

A generative model based on training deep architectures is proposed. The

model consists of K networks that are trained together to learn the underlying

distribution of a given data set. The process starts with dividing the input

data into K clusters and feeding each of them into a separate network. After

a few iterations of training the networks separately, we use an EM-like algorithm to

train the networks together and update the clusters of the data. We call this

model Mixture of Networks. The proposed model is a platform that can be used

with any deep structure and trained with any conventional objective function

for distribution modeling. As the components of the model are neural networks,

it has high capability in characterizing complicated data distributions as well

as clustering data. We apply the algorithm to the MNIST handwritten digits and

Yale face datasets. We also demonstrate the clustering ability of the model

using some real-world and toy examples.

Cognitive Mapping and Planning for Visual Navigation

Saurabh Gupta , James Davidson , Sergey Levine , Rahul Sukthankar , Jitendra Malik

Comments: Under review for CVPR 2017. Project webpage: this https URL

Subjects

:

Computer Vision and Pattern Recognition (cs.CV)

; Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO)

We introduce a neural architecture for navigation in novel environments. Our

proposed architecture learns to map from first-person viewpoints and plans a

sequence of actions towards goals in the environment. The Cognitive Mapper and

Planner (CMP) is based on two key ideas: a) a unified joint architecture for

mapping and planning, such that the mapping is driven by the needs of the

planner, and b) a spatial memory with the ability to plan given an incomplete

set of observations about the world. CMP constructs a top-down belief map of

the world and applies a differentiable neural net planner to produce the next

action at each time step. The accumulated belief of the world enables the agent

to track visited regions of the environment. Our experiments demonstrate that

CMP outperforms both reactive strategies and standard memory-based

architectures and performs well in novel environments. Furthermore, we show

that CMP can also achieve semantically specified goals, such as ‘go to a

chair’.

DNN Filter Bank Cepstral Coefficients for Spoofing Detection

Hong Yu , Zheng-Hua Tan , Zhanyu Ma , Jun Guo Subjects : Sound (cs.SD) ; Cryptography and Security (cs.CR); Learning (cs.LG)

With the development of speech synthesis techniques, automatic speaker

verification systems face the serious challenge of spoofing attacks. In order to

improve the reliability of speaker verification systems, we develop a new

filter bank based cepstral feature, deep neural network filter bank cepstral

coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The

deep neural network filter bank is automatically generated by training a filter

bank neural network (FBNN) using natural and synthetic speech. By adding

restrictions on the training rules, the learned weight matrix of FBNN is

band-limited and sorted by frequency, similar to the normal filter bank. Unlike

the manually designed filter bank, the learned filter bank has different filter

shapes in different channels, which can capture the differences between natural

and synthetic speech more effectively. The experimental results on the ASVspoof

2015 database show that the Gaussian mixture model maximum-likelihood

(GMM-ML) classifier trained by the new feature performs better than the

state-of-the-art linear frequency cepstral coefficients (LFCC) based

classifier, especially on detecting unknown attacks.

Similarity Preserving Representation Learning for Time Series Analysis

Qi Lei , Jinfeng Yi , Roman Vaculin , Lingfei Wu , Inderjit S. Dhillon Subjects : Artificial Intelligence (cs.AI) ; Learning (cs.LG)

A considerable amount of machine learning algorithms take matrices as their

inputs. As such, they cannot directly analyze time series data due to its

temporal nature, usually unequal lengths, and complex properties. This is a

great pity since many of these algorithms are effective, robust, efficient, and

easy to use. In this paper, we bridge this gap by proposing an efficient

representation learning framework that is able to convert a set of time series

with equal or unequal lengths to a matrix format. In particular, we guarantee

that the pairwise similarities between time series are well preserved after the

transformation. Therefore, the learned feature representation is particularly

suitable to the class of learning problems that are sensitive to data

similarities. Given a set of (n) time series, we first construct an (n \times n)

partially observed similarity matrix by randomly sampling (O(n log n)) pairs

of time series and computing their pairwise similarities. We then propose an

extremely efficient algorithm that solves a highly non-convex and NP-hard

problem to learn new features based on the partially observed similarity

matrix. We use the learned features to conduct experiments on both data

classification and clustering tasks. Our extensive experimental results

demonstrate that the proposed framework is both effective and efficient.
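
A minimal sketch of the sampling step described above: observe roughly (n log n) random pairwise similarities and leave the rest of the matrix unobserved (the sampling constant, nan-masking, and toy similarity are our choices):

```python
import numpy as np

def sampled_similarity(series, sim, c=1.0, seed=0):
    """Partially observed n-by-n similarity matrix built from roughly
    c * n * log(n) randomly sampled pairs. `sim` is any pairwise
    time-series similarity (e.g. a DTW-based score)."""
    rng = np.random.default_rng(seed)
    n = len(series)
    S = np.full((n, n), np.nan)          # nan marks unobserved entries
    for _ in range(int(np.ceil(c * n * np.log(n)))):
        i, j = rng.integers(n), rng.integers(n)
        S[i, j] = S[j, i] = sim(series[i], series[j])
    np.fill_diagonal(S, 1.0)
    return S

# Example with an absolute-correlation similarity on toy series.
rng = np.random.default_rng(1)
data = [rng.standard_normal(50) for _ in range(20)]
S = sampled_similarity(data, lambda a, b: abs(np.corrcoef(a, b)[0, 1]))
print(np.isnan(S).mean())                # fraction still unobserved
```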

Enabling Robots to Communicate their Objectives

Sandy H. Huang , David Held , Pieter Abbeel , Anca D. Dragan Subjects : Robotics (cs.RO) ; Learning (cs.LG)

Our ultimate goal is to efficiently enable end-users to correctly anticipate

a robot’s behavior in novel situations. This behavior is often a direct result

of the robot’s underlying objective function. Our insight is that end-users

need to have an accurate mental model of this objective function in order to

understand and predict what the robot will do. While people naturally develop

such a mental model over time through observing the robot act, this

familiarization process may be lengthy. Our approach reduces this time by

having the robot model how people infer objectives from observed behavior, and

then selecting those behaviors that are maximally informative. The problem of

computing a posterior over objectives from observed behavior is known as

Inverse Reinforcement Learning (IRL), and has been applied to robots learning

human objectives. We consider the problem where the roles of human and robot

are swapped. Our main contribution is to recognize that unlike robots, humans

will not be exact in their IRL inference. We thus introduce two factors

to define candidate approximate-inference models for human learning in this

setting, and analyze them in a user study in the autonomous driving domain. We

show that certain approximate-inference models lead to the robot generating

example behaviors that better enable users to anticipate what the robot will do

in test situations. Our results also suggest, however, that additional research

is needed in modeling how humans extrapolate from examples of robot behavior.

A Collective, Probabilistic Approach to Schema Mapping: Appendix

Angelika Kimmig , Alex Memory , Renee J. Miller , Lise Getoor

Comments: This is the appendix to the paper “A Collective, Probabilistic Approach to Schema Mapping” accepted to ICDE 2017

Subjects

:

Databases (cs.DB)

; Learning (cs.LG)

In this appendix we provide additional supplementary material to “A

Collective, Probabilistic Approach to Schema Mapping.” We include an additional

extended example, supplementary experiment details, and proof for the

complexity result stated in the main paper.

Parallel Long Short-Term Memory for Multi-stream Classification

Mohamed Bouaziz , Mohamed Morchid , Richard Dufour , Georges Linarès , Renato De Mori

Comments: 2016 IEEE Workshop on Spoken Language Technology

Subjects

:

Computation and Language (cs.CL)

; Learning (cs.LG)

Recently, machine learning methods have provided a broad spectrum of original

and efficient algorithms based on Deep Neural Networks (DNN) to automatically

predict an outcome with respect to a sequence of inputs. Recurrent hidden cells,

as in Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks,

allow these DNN-based models to manage long-term dependencies. Nevertheless, these

RNNs process a single input stream in one (LSTM) or two (Bidirectional LSTM)

directions. But most of the information available nowadays comes from multiple

streams or multimedia documents, and requires RNNs to process this information

synchronously during training. This paper presents an original LSTM-based

architecture, named Parallel LSTM (PLSTM), that processes multiple parallel

synchronized input sequences in order to predict a common output. The proposed

PLSTM method could be used for parallel sequence classification purposes. The

PLSTM approach is evaluated on an automatic telecast genre sequences

classification task and compared with different state-of-the-art architectures.

Results show that the proposed PLSTM method outperforms the baseline n-gram

models as well as the state-of-the-art LSTM approach.
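
A hedged Keras sketch of the idea: one LSTM per synchronized input stream, merged before a shared classifier. Merging by concatenation and the layer sizes are our assumptions; the paper defines the exact PLSTM wiring:

```python
import tensorflow as tf

def build_plstm(n_streams, timesteps, feat_dim, n_classes, units=64):
    """One LSTM per synchronized input stream, merged before a shared
    softmax classifier (a sketch of the PLSTM idea, not the paper's
    exact architecture)."""
    inputs = [tf.keras.Input(shape=(timesteps, feat_dim)) for _ in range(n_streams)]
    encoded = [tf.keras.layers.LSTM(units)(x) for x in inputs]   # per-stream encoders
    merged = tf.keras.layers.Concatenate()(encoded)              # merge streams
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(merged)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

model = build_plstm(n_streams=3, timesteps=20, feat_dim=16, n_classes=8)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```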

Batch Policy Gradient Methods for Improving Neural Conversation Models

Kirthevasan Kandasamy , Yoram Bachrach , Ryota Tomioka , Daniel Tarlow , David Carter

Comments: International Conference on Learning Representations (ICLR) 2017

Subjects

:

Machine Learning (stat.ML)

; Learning (cs.LG)

We study reinforcement learning of chatbots with recurrent neural network

architectures when the rewards are noisy and expensive to obtain. For instance,

a chatbot used in automated customer service support can be scored by quality

assurance agents, but this process can be expensive, time consuming and noisy.

Previous reinforcement learning work for natural language processing uses

on-policy updates and/or is designed for on-line learning settings. We

demonstrate empirically that such strategies are not appropriate for this

setting and develop an off-policy batch policy gradient method (BPG). We

demonstrate the efficacy of our method via a series of synthetic experiments

and an Amazon Mechanical Turk experiment on a restaurant recommendations

dataset.

Information Theory

On the Transport Capability of LAN Cables

Syed Hassan Raza Naqvi , Andrea Matera , Lorenzo Combi , Umberto Spagnolini Subjects : Information Theory (cs.IT)

Centralized Radio Access Network (C-RAN) architecture is the only viable

solution to handle the complex interference scenario generated by massive

antenna and small-cell deployments as required by next-generation (5G) mobile

networks. In conventional C-RAN, the fronthaul links used to exchange the

signal between Base Band Units (BBUs) and Remote Antenna Units (RAUs) are based

on digital baseband (BB) signals over optical fibers due to the huge bandwidth

required. In this paper we evaluate the transport capability of copper-based

all-analog fronthaul architecture called Radio over Copper (RoC) that leverages

the pre-existing LAN cables already deployed in buildings and

enterprises. In particular, the main contribution of the paper is to evaluate

the number of independent BB signals for a multiple-antenna system that can be

transported over multi-pair Cat-5/6/7 cables under a predefined fronthauling

transparency condition in terms of maximum BB signal degradation. The MIMO-RoC

proves to be a complementary solution to optical fiber for the last 200m toward

the RAUs, mostly to reuse the existing LAN cables and to power-supply the RAUs

over the same cable.

A Study on the Impact of Locality in the Decoding of Binary Cyclic Codes

M. Nikhil Krishnan , Bhagyashree Puranik , P. Vijay Kumar , Itzhak Tamo , Alexander Barg

Comments: Extended version of a paper submitted to ISIT 2017

Subjects

:

Information Theory (cs.IT)

In this paper, we study the impact of locality on the decoding of binary

cyclic codes under two approaches, namely ordered statistics decoding (OSD) and

trellis decoding. Given a binary cyclic code having locality or availability,

we suitably modify the OSD to obtain gains in terms of the signal-to-noise

ratio (SNR), for a given reliability and essentially the same level of decoder

complexity. With regard to trellis decoding, we show that careful introduction

of locality results in the creation of cyclic subcodes having lower maximum

state complexity. We also present a simple upper-bounding technique on the

state complexity profile, based on the zeros of the code. Finally, it is shown

how the decoding speed can be significantly increased in the presence of

locality, in the moderate-to-high SNR regime, by making use of a quick-look

decoder that often returns the ML codeword.

Stealthy Secret Key Generation

Pin-Hsun Lin , Carsten Rudolf Janda , Eduard Axel Jorswieck

Comments: 22 pages, 1 figure, submitted to ISIT 2017

Subjects

:

Information Theory (cs.IT)

In this work, we consider a complete covert communication system, which

includes the source-model of a stealthy secret key generation (SSKG) as the

first phase. The generated key will be used for the covert communication in the

second phase of the current round and also in the first phase of the next

round. We investigate the stealthy SK rate performance of the first phase. The

derived results show that the SK capacity lower and upper bounds of the

source-model SKG are not affected by the additional stealth constraint. This

result implies that we can attain the SSKG capacity for free when the sequences

observed by the three terminals Alice ((X^n)), Bob ((Y^n)) and Willie ((Z^n))

follow a Markov chain relationship, i.e., (X^n-Y^n-Z^n). We then prove that the

sufficient condition to attain both the SK capacity and the SSK

capacity can be relaxed from physical to stochastic degradedness. In order to

underline the practical relevance, we also derive a sufficient condition to

attain the degradedness by the usual stochastic order for Maurer’s fast fading

Gaussian (satellite) model for the source of common randomness.

On Muting Mobile Terminals for Uplink Interference Mitigation in HetNets — System-Level Analysis via Stochastic Geometry

F. J. Martin-Vega , M. C. Aguayo-Torres , G. Gomez , M. Di Renzo

Comments: 16 pages, 12 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects

:

Information Theory (cs.IT)

We investigate the performance of a scheduling algorithm where the Mobile

Terminals (MTs) may be turned off if they cause a level of interference greater

than a given threshold. This approach, which is referred to as Interference

Aware Muting (IAM), may be regarded as an interference-aware scheme that is

aimed to reduce the level of interference. We analyze its performance with the

aid of stochastic geometry and compare it against other interference-unaware

and interference-aware schemes, where the level of interference is kept under

control in the power control scheme itself rather than in the scheduling

process. IAM is studied in terms of average transmit power, mean and variance

of the interference, coverage probability, Spectral Efficiency (SE), and Binary

Rate (BR), which accounts for the amount of resources allocated to the typical

MT. Simplified expressions of SE and BR for adaptive modulation and coding

schemes are proposed, which better characterize practical communication

systems. Our system-level analysis unveils that IAM increases the BR and

reduces the mean and variance of the interference. It is proved that an

operating regime exists, where the performance of IAM is independent of the

cell association criterion, which simplifies the joint design of uplink and

downlink transmissions.

On the Energy/Distortion Tradeoff in the IoT

Alessandro Biason , Chiara Pielli , Andrea Zanella , Michele Zorzi

Comments: 14 pages, 8 figures, submitted to IEEE Transactions on Wireless Communications, Nov. 2016

Subjects

:

Information Theory (cs.IT)

The Internet of Things paradigm envisages the presence of many

battery-powered sensors and this entails the design of energy-aware protocols.

Source coding techniques make it possible to save some energy by compressing the packets

sent over the network, but at the cost of a poorer accuracy in the

representation of the data. This paper addresses the problem of designing

efficient policies to jointly perform processing and transmission tasks. In

particular, we aim at defining an optimal scheduling strategy with the twofold

ultimate goal of extending the network lifetime and guaranteeing a low overall

distortion of the transmitted data. We propose a Time Division Multiple Access

(TDMA)-based access scheme that optimally allocates resources to heterogeneous

nodes. We use realistic rate-distortion curves to quantify the impact of

compression on the data quality and propose a complete energy model that

includes the energy spent for processing and transmitting the data, as well as

the circuitry costs. Both full knowledge and statistical knowledge of the

wireless channels are considered, and optimal policies are derived for both

cases. The overall problem is structured in a modular fashion and solved

through convex and alternate programming techniques. Finally, we thoroughly

evaluate the proposed algorithms and the influence of the design variables on

the system performance adopting parameters of real sensors.

Unified Analysis of SWIPT Relay Networks with Noncoherent Modulation

Lina Mohjazi , Sami Muhaidat , Mehrdad Dianati , Mahmoud Al-Qutayri Subjects : Information Theory (cs.IT)

Simultaneous wireless information and power transfer (SWIPT) relay networks

represent a paradigm shift in the development of wireless networks, enabling

simultaneous radio frequency (RF) energy harvesting (EH) and information

processing. Different from conventional SWIPT relaying schemes, which typically

assume the availability of perfect channel state information (CSI), here we

consider the application of noncoherent modulation in order to avoid the need

of instantaneous CSI estimation/tracking and minimise the energy consumption.

We propose a unified and comprehensive analytical framework for the analysis of

time switching (TS) and power splitting (PS) receiver architectures with the

amplify-and-forward (AF) protocol. In particular, we adopt a moments-based

approach to derive novel expressions for the outage probability, achievable

throughput, and average symbol error rate (ASER) of the considered SWIPT

system. We quantify the impact of several system parameters, involving relay

location, energy conversion efficiency, and TS and PS ratio assumptions,

imposed on the EH relay terminal. Our results reveal that the throughput

performance of the TS protocol is superior to that of the PS protocol at lower

receive signal-to-noise ratio (SNR) values, which is in contrast to the

point-to-point SWIPT systems. An extensive Monte Carlo simulation study is

presented to corroborate the proposed analysis.

Information and estimation in Fokker-Planck channels

Andre Wibisono , Varun Jog , Po-Ling Loh Subjects : Information Theory (cs.IT) ; Statistics Theory (math.ST)

We study the relationship between information- and estimation-theoretic

quantities in time-evolving systems. We focus on the Fokker-Planck channel

defined by a general stochastic differential equation, and show that the time

derivatives of entropy, KL divergence, and mutual information are characterized

by estimation-theoretic quantities involving an appropriate generalization of

the Fisher information. Our results vastly extend De Bruijn’s identity and the

classical I-MMSE relation.
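
For reference, the classical special case that these results extend: for (X_t = X + \sqrt{t}\,Z) with (Z) a standard Gaussian independent of (X), De Bruijn's identity reads

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\, h(X_t) \;=\; \frac{1}{2}\, J(X_t),
```

where (h) denotes differential entropy and (J) the Fisher information; the Fokker-Planck channel replaces the Gaussian perturbation by a general stochastic differential equation.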

On the Capacity of Signal Dependent Noise Channels

Hamid Ghourchian , Gholamali Aminian , Amin Gohari , Mahtab Mirmohseni , Masoumeh Nasiri-Kenari

Comments: 32 pages, 2 figures

Subjects

:

Information Theory (cs.IT)

In some applications, the variance of measurement noise depends on the signal

that we aim to measure. For instance, additive Gaussian signal-dependent noise

(AGSDN) channel models are used in molecular and optical communication. Herein

we provide lower and upper bounds on the capacity of additive signal dependent

noise (ASDN) channels. We also provide sufficient conditions under which the

capacity becomes infinity.

Subspace-Aware Index Codes

Bhavya Kailkhura , Lakshmi Narasimhan Theagarajan , Pramod K. Varshney Subjects : Information Theory (cs.IT)

In this letter, we extend the well-known index coding problem to exploit the

structure in the source-data to improve system throughput. In many

applications, the data to be transmitted may lie (or can be well approximated)

in a low-dimensional subspace. We exploit this low-dimensional structure of the

data using an algebraic framework to solve the index coding problem (referred

to as subspace-aware index coding) as opposed to the traditional index coding

problem which is subspace-unaware. Also, we propose an efficient algorithm

based on the alternating minimization approach to obtain near optimal index

codes for both subspace-aware and -unaware cases. Our simulations indicate that

under certain conditions, a significant throughput gain (about 90%) can be

achieved by subspace-aware index codes over conventional subspace-unaware index

codes.

On the capacity of bandlimited optical intensity channels with Gaussian noise

Jing Zhou , Wenyi Zhang

Comments: 13 pages, 9 figures, 4 tables

Subjects

:

Information Theory (cs.IT)

We determine lower and upper bounds on the capacity of bandlimited optical

intensity channels (BLOIC) with white Gaussian noise. Three types of input

power constraints are considered: 1) only an average power constraint, 2) only

a peak power constraint, and 3) an average and a peak power constraint.

Capacity lower bounds are derived by a two-step process including 1) for each

type of constraint, designing admissible pulse amplitude modulated input

waveform ensembles, and 2) lower bounding the maximum achievable information

rates of the designed input ensembles. Capacity upper bounds are derived by

exercising constraint relaxations and utilizing known results on discrete-time

optical intensity channels. We obtain degrees-of-freedom-optimal (DOF-optimal)

lower bounds which have the same pre-log factor as the upper bounds, thereby

characterizing the high SNR capacity of BLOIC to within a finite gap. We

further derive intersymbol-interference-free (ISI-free) signaling based lower

bounds, which perform well for all practical SNR values. In particular, the

ISI-free signaling based lower bounds outperform the DOF-optimal lower bound

when the SNR is below 10 dB.

Sense-and-Predict: Opportunistic MAC Based on Spatial Interference Correlation for Cognitive Radio Networks

Jeemin Kim , Seung-Woo Ko , Han Cha , Seong-Lyun Kim Subjects : Information Theory (cs.IT)

Opportunity detection at secondary transmitters (TXs) is a key technique

enabling cognitive radio (CR) networks. Such detection, however, cannot guarantee

reliable communication at secondary receivers (RXs), especially when their

association distance is long. To cope with the issue, this paper proposes a

novel MAC called sense-and-predict (SaP), where each secondary TX decides

whether to access or not based on the prediction of the interference level at

RX. Firstly, we provide the spatial interference correlation in a probabilistic

form using stochastic geometry, and utilize it to maximize the area spectral

efficiency (ASE) for secondary networks while guaranteeing the service quality

of primary networks. Through simulations and testbed experiments using USRP,

SaP is shown to always achieve ASE improvement compared with the conventional

TX-based sensing.

Classical-Quantum Arbitrarily Varying Wiretap Channel: Secret Message Transmission under Jamming Attacks

Holger Boche , Minglai Cai , Christian Deppe , Janis Nötzel Subjects : Information Theory (cs.IT)

We analyze arbitrarily varying classical-quantum wiretap channels. These

channels are subject to two attacks at the same time: one passive

(eavesdropping), and one active (jamming). We progress on previous works by

introducing a reduced class of allowed codes that fulfills a more stringent

secrecy requirement than earlier definitions. In addition, we prove that

non-symmetrizability of the legal link is sufficient for equality of the

deterministic and the common-randomness-assisted secrecy capacities. Finally,

we focus on analytic properties of both secrecy capacities: We completely

characterize their discontinuity points, and their super-activation properties.

1-bit Massive MU-MIMO Precoding in VLSI

Oscar Castañeda , Sven Jacobsson , Giuseppe Durisi , Mikael Coldrey , Tom Goldstein , Christoph Studer

Comments: 13 pages, 5 figures, 2 tables; submitted to a journal

Subjects

:

Information Theory (cs.IT)

; Hardware Architecture (cs.AR)

Massive multiuser (MU) multiple-input multiple-output (MIMO) will be a core

technology in fifth-generation (5G) wireless systems as it offers significant

improvements in spectral efficiency compared to existing multi-antenna

technologies. The presence of hundreds of antenna elements at the base station

(BS), however, results in excessively high hardware costs and power

consumption, and requires high interconnect throughput between the

baseband-processing unit and the radio unit. Massive MU-MIMO that uses

low-resolution analog-to-digital and digital-to-analog converters (DACs) has

the potential to address all these issues. In this paper, we focus on downlink

precoding for massive MU-MIMO systems with 1-bit DACs at the BS. The objective

is to design precoders that simultaneously mitigate multi-user interference

(MUI) and quantization artifacts. We propose two nonlinear 1-bit precoding

algorithms and corresponding very-large scale integration (VLSI) designs. Our

algorithms rely on biconvex relaxation, which enables the design of efficient

1-bit precoding algorithms that achieve superior error-rate performance

compared to that of linear precoding algorithms followed by quantization. To

showcase the efficacy of our algorithms, we design VLSI architectures that

enable efficient 1-bit precoding for massive MU-MIMO systems in which hundreds

of antennas serve tens of user equipments. We present corresponding

field-programmable gate array (FPGA) implementations to demonstrate that 1-bit

precoding enables reliable and high-rate downlink data transmission in

practical systems.
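
For contrast, the quantized-linear baseline mentioned above is easy to state: apply a zero-forcing precoder, then quantize each antenna's in-phase and quadrature outputs to one bit. A minimal sketch of that baseline (not the proposed biconvex-relaxation algorithms):

```python
import numpy as np

def one_bit_zf(H, s):
    """Zero-forcing (linear) precoding followed by 1-bit quantization of
    the per-antenna in-phase and quadrature outputs -- the quantized
    linear baseline the proposed precoders are compared against.

    H: (U, B) downlink channel, U users, B BS antennas; s: (U,) symbols."""
    P = H.conj().T @ np.linalg.inv(H @ H.conj().T)   # ZF precoder, shape (B, U)
    x = P @ s
    return (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)

# Example: 128 antennas serving 8 users with QPSK symbols.
rng = np.random.default_rng(0)
H = (rng.standard_normal((8, 128)) + 1j * rng.standard_normal((8, 128))) / np.sqrt(2)
s = (2 * rng.integers(0, 2, 8) - 1 + 1j * (2 * rng.integers(0, 2, 8) - 1)) / np.sqrt(2)
print(one_bit_zf(H, s)[:4])
```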

On the Global-Local Dichotomy in Sparsity Modeling

Dmitry Batenkov , Yaniv Romano , Michael Elad Subjects : Information Theory (cs.IT) ; Machine Learning (stat.ML)

The traditional sparse modeling approach, when applied to inverse problems

with large data such as images, essentially assumes a sparse model for small

overlapping data patches. While producing state-of-the-art results, this

methodology is suboptimal, as it does not attempt to model the entire global

signal in any meaningful way – a nontrivial task by itself. In this paper we

propose a way to bridge this theoretical gap by constructing a global model

from the bottom up. Given local sparsity assumptions in a dictionary, we show

that the global signal representation must satisfy a constrained

underdetermined system of linear equations, which can be solved efficiently by

modern optimization methods such as the Alternating Direction Method of Multipliers

(ADMM). We investigate conditions for unique and stable recovery, and provide

numerical evidence corroborating the theory.

The Connectivity of Millimeter-Wave Networks in Urban Environments Modeled Using Random Lattices

Kaifeng Han , Kaibin Huang , Ying Cui , Yueping Wu

Comments: 28 pages, 10 figures, submitted to IEEE Trans. on Wireless Communications

Subjects

:

Information Theory (cs.IT)

Millimeter-wave (mmWave) communication opens up tens of giga-hertz (GHz)

spectrum in the mmWave band for use by next-generation wireless systems,

thereby solving the problem of spectrum scarcity. Maintaining connectivity

stands out to be a key design challenge for mmWave networks deployed in urban

regions due to the blockage effect characterizing mmWave propagation.

Specifically, mmWave signals can be blocked by buildings and other large urban

objects. In this paper, we set out to investigate the blockage effect on the

connectivity of mmWave networks in a Manhattan-type urban region modeled using

a random regular lattice while base stations (BSs) are Poisson distributed in

the plane. In particular, we analyze the connectivity probability that a

typical user is within the transmission range of a BS and connected by a

line-of-sight. Using random lattice and stochastic geometry theories, different

lower bounds on the connectivity probability are derived as functions of the

buildings’ size and probability of a lattice cell being occupied by a building

as well as BS density and transmission range. The asymptotic connectivity

probability is also derived for cases of dense buildings. Last, the results are

extended to heterogeneous networks. Our study yields closed-form relations

between the parameters of the building process and the BS process, providing

useful guidelines for practical mmWave network deployment.

Joint Precoding and RRH selection for User-centric Green MIMO C-RAN

Cunhua Pan , Huiling Zhu , Nathan J. Gomes , Jiangzhou Wang

Comments: This work has been accepted in IEEE TWC

Subjects

:

Information Theory (cs.IT)

This paper jointly optimizes the precoding matrices and the set of active

remote radio heads (RRHs) to minimize the network power consumption (NPC) for a

user-centric cloud radio access network (C-RAN), where both the RRHs and users

have multiple antennas and each user is served by its nearby RRHs. Both users’

rate requirements and per-RRH power constraints are considered. Due to these

conflicting constraints, this optimization problem may be infeasible. In this

paper, we propose to solve this problem in two stages. In Stage I, a

low-complexity user selection algorithm is proposed to find the largest subset

of feasible users. In Stage II, a low-complexity algorithm is proposed to solve

the optimization problem with the users selected from Stage I. Specifically,

the re-weighted (l_1)-norm minimization method is used to transform the

original problem with non-smooth objective function into a series of weighted

power minimization (WPM) problems, each of which can be solved by the weighted

minimum mean square error (WMMSE) method. The solution obtained by the WMMSE

method is proved to satisfy the Karush-Kuhn-Tucker (KKT) conditions of the WPM

problem. Moreover, a low-complexity algorithm based on Newton’s method and the

gradient descent method is developed to update the precoder matrices in each

iteration of the WMMSE method. Simulation results demonstrate the rapid

convergence of the proposed algorithms and the benefits of equipping multiple

antennas at the user side. Moreover, the proposed algorithm is shown to achieve

near-optimal performance in terms of NPC.

Globally convergent Jacobi-type algorithms for simultaneous orthogonal symmetric tensor diagonalization

Jianze Li , Konstantin Usevich , Pierre Comon

Comments: 19 pages, 5 figures

Subjects

:

Numerical Analysis (math.NA)

; Information Theory (cs.IT); Optimization and Control (math.OC)

In this paper, we consider a family of Jacobi-type algorithms for

simultaneous orthogonal diagonalization problem of symmetric tensors. For the

Jacobi-based algorithm of [SIAM J. Matrix Anal. Appl., 2(34):651–672, 2013],

we prove its global convergence for simultaneous orthogonal diagonalization of

symmetric matrices and 3rd-order tensors. We also propose a new Jacobi-based

algorithm in the general setting and prove its global convergence for a wide

range of tensor problems (including Tucker approximation).

Original: https://www.52ml.net/21531.html