## Neural and Evolutionary Computing

### Neural Trajectory Analysis of Recurrent Neural Network In Handwriting Synthesis

Osamu Shouno

Comments: 4 pages, 3 figures

**Subjects**

:

Neural and Evolutionary Computing (cs.NE)

Recurrent neural networks (RNNs) are capable of learning to generate highly

realistic, online handwritings in a wide variety of styles from a given text

sequence. Furthermore, the networks can generate handwritings in the style of a

particular writer when the network states are primed with a real sequence of

pen movements from the writer. However, how populations of neurons in the RNN

collectively achieve such performance still remains poorly understood. To

tackle this problem, we investigated learned representations in RNNs by

extracting low-dimensional, neural trajectories that summarize the activity of

a population of neurons in the network during individual syntheses of

handwritings. The neural trajectories show that different writing styles are

encoded in different subspaces inside an internal space of the network. Within

each subspace, different characters of the same style are represented as

different state dynamics. These results demonstrate the effectiveness of

analyzing the neural trajectory for intuitive understanding of how the RNNs

work.

### The unreasonable effectiveness of the forget gate

Joan Lasenby

Comments: 15 pages, 5 figures

**Subjects**

:

Neural and Evolutionary Computing (cs.NE)

; Learning (cs.LG); Machine Learning (stat.ML)

Given the success of the gated recurrent unit, a natural question is whether

all the gates of the long short-term memory (LSTM) network are necessary.

Previous research has shown that the forget gate is one of the most important

gates in the LSTM. Here we show that a forget-gate-only version of the LSTM

with chrono-initialized biases not only provides computational savings but also

outperforms the standard LSTM on multiple benchmark datasets and competes with

some of the best contemporary models. Our proposed network, the JANET, achieves

accuracies of 99% and 92.5% on the MNIST and pMNIST datasets, outperforming the

standard LSTM which yields accuracies of 98.5% and 91%.
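
The forget-gate-only update described above is compact enough to sketch. Below is a minimal NumPy illustration of such a cell with chrono-initialized forget biases; the layer sizes, weight initialization, and variable names are our own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ForgetGateOnlyCell:
    """Sketch of a forget-gate-only recurrent cell (JANET-style).

    The candidate update is gated by f and (1 - f); there is no input or
    output gate. The forget-gate bias is chrono-initialized so that initial
    memory time scales are spread over roughly [1, t_max].
    """
    def __init__(self, input_size, hidden_size, t_max=100, rng=None):
        rng = rng or np.random.default_rng(0)
        scale = 1.0 / np.sqrt(hidden_size)
        self.U_f = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.W_f = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        self.U_c = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.W_c = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        # chrono initialization: b_f ~ log(Uniform(1, t_max - 1))
        self.b_f = np.log(rng.uniform(1.0, t_max - 1.0, hidden_size))
        self.b_c = np.zeros(hidden_size)

    def step(self, x, h_prev):
        f = sigmoid(self.U_f @ x + self.W_f @ h_prev + self.b_f)
        c_tilde = np.tanh(self.U_c @ x + self.W_c @ h_prev + self.b_c)
        return f * h_prev + (1.0 - f) * c_tilde  # a single gate does all the mixing

cell = ForgetGateOnlyCell(input_size=8, hidden_size=16)
h = np.zeros(16)
for x in np.random.randn(20, 8):  # a toy length-20 input sequence
    h = cell.step(x, h)
```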

### Representing smooth functions as compositions of near-identity functions with implications for deep network optimization

Peter L. Bartlett , Steven N. Evans , Philip M. Long **Subjects** : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Statistics Theory (math.ST); Machine Learning (stat.ML)

We show that any smooth bi-Lipschitz \(h\) can be represented exactly as a

composition \(h_m \circ \dots \circ h_1\) of functions \(h_1, \dots, h_m\) that are close

to the identity in the sense that each \(\left(h_i - \mathrm{Id}\right)\) is

Lipschitz, and the Lipschitz constant decreases inversely with the number \(m\)

of functions composed. This implies that \(h\) can be represented to any accuracy

by a deep residual network whose nonlinear layers compute functions with a

small Lipschitz constant. Next, we consider nonlinear regression with a

composition of near-identity nonlinear maps. We show that, regarding Fréchet

derivatives with respect to the \(h_1, \dots, h_m\), any critical point of a

quadratic criterion in this near-identity region must be a global minimizer. In

contrast, if we consider derivatives with respect to parameters of a fixed-size

residual network with sigmoid activation functions, we show that there are

near-identity critical points that are suboptimal, even in the realizable case.

Informally, this means that functional gradient methods for residual networks

cannot get stuck at suboptimal critical points corresponding to near-identity

layers, whereas parametric gradient methods for sigmoidal residual networks

suffer from suboptimal critical points in the near-identity region.
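
In symbols, the representation result can be restated as follows (our paraphrase of the abstract; the constant \(c\) is unspecified and depends on \(h\)):

```latex
% Paraphrase: h splits into m maps, each within O(1/m) of the identity
% in Lipschitz norm; c is an unspecified constant depending on h.
h \;=\; h_m \circ \cdots \circ h_1,
\qquad
\bigl\| h_i - \mathrm{Id} \bigr\|_{\mathrm{Lip}} \;\le\; \frac{c}{m},
\qquad i = 1, \dots, m.
```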

### μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

Tal Ben-Nun ,

Torsten Hoefler ,

Satoshi Matsuoka

Comments: 11 pages, 14 figures. Part of the content has been published in IPSJ SIG Technical Report, Vol. 2017-HPC-162, No. 22, pp. 1-9, 2017. (DOI: this http URL )

**Subjects**

:

Learning (cs.LG)

; Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used

in deep learning. Specifically, cuDNN implements several equivalent convolution

algorithms, whose performance and memory footprint may vary considerably,

depending on the layer dimensions. When an algorithm is automatically selected

by cuDNN, the decision is performed on a per-layer basis, and thus it often

resorts to slower algorithms that fit the workspace size constraints. We

present μ-cuDNN, a transparent wrapper library for cuDNN, which divides

layers’ mini-batch computation into several micro-batches. Based on Dynamic

Programming and Integer Linear Programming, μ-cuDNN enables faster

algorithms by decreasing the workspace requirements. At the same time,

μ-cuDNN keeps the computational semantics unchanged, so that it decouples

statistical efficiency from the hardware efficiency safely. We demonstrate the

effectiveness of μ-cuDNN over two frameworks, Caffe and TensorFlow,

achieving speedups of 1.63x for AlexNet and 1.21x for ResNet-18 on a P100-SXM2

GPU. These results indicate that using micro-batches can seamlessly increase

the performance of deep learning, while maintaining the same memory footprint.
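
The workspace-constrained algorithm choice can be illustrated with a small dynamic program over micro-batch sizes. The sketch below is a simplified reading of the idea: the benchmark table, micro-batch sizes, and workspace limit are made up for illustration, whereas the real library obtains such figures from cuDNN itself.

```python
from functools import lru_cache

# Hypothetical per-micro-batch measurements: for a micro-batch of size b,
# candidate convolution algorithms as (runtime_ms, workspace_bytes).
BENCH = {
    1: [(1.0, 0), (0.7, 64 << 20)],
    2: [(1.8, 0), (1.1, 128 << 20)],
    4: [(3.4, 0), (1.9, 256 << 20)],
    8: [(6.6, 0), (3.5, 512 << 20)],
}
WORKSPACE_LIMIT = 256 << 20  # 256 MiB

def best_time(b):
    """Fastest algorithm for micro-batch size b that fits the workspace limit."""
    feasible = [t for t, ws in BENCH[b] if ws <= WORKSPACE_LIMIT]
    return min(feasible) if feasible else None

@lru_cache(maxsize=None)
def best_split(n):
    """Minimal total time to process n samples as a sequence of micro-batches."""
    if n == 0:
        return 0.0, ()
    best = (float("inf"), ())
    for b in BENCH:
        t_b = best_time(b)
        if b <= n and t_b is not None:
            t_rest, split = best_split(n - b)
            best = min(best, (t_b + t_rest, (b,) + split))
    return best

print(best_split(8))  # e.g. (3.8, (4, 4)): two micro-batches of 4 beat one batch of 8
```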

### Per-Corpus Configuration of Topic Modelling for GitHub and Stack Overflow Collections

Christoph Treude , Markus Wagner **Subjects** : Computation and Language (cs.CL) ; Neural and Evolutionary Computing (cs.NE)

To make sense of large amounts of textual data, topic modelling is frequently

used as a text-mining tool for the discovery of hidden semantic structures in

text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model

that aims to explain the structure of a corpus by grouping texts. LDA requires

multiple parameters to work well, and there are only rough and sometimes

conflicting guidelines available on how these parameters should be set. In this

paper, we contribute (i) a broad study of parameters to arrive at good local

optima, (ii) an a-posteriori characterisation of text corpora related to eight

programming languages from GitHub and Stack Overflow, and (iii) an analysis of

corpus feature importance via per-corpus LDA configuration.
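
A per-corpus parameter sweep of this kind can be sketched with gensim. The toy documents, parameter grid, and coherence score below are illustrative assumptions and stand in for the paper's much larger corpora and tuning procedure.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Toy tokenized documents standing in for GitHub / Stack Overflow text.
docs = [
    ["python", "list", "comprehension", "loop"],
    ["java", "nullpointerexception", "stack", "trace"],
    ["python", "pandas", "dataframe", "merge"],
    ["java", "spring", "bean", "injection"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

best = None
for num_topics in (2, 3, 4):
    for alpha in ("symmetric", "asymmetric"):
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=num_topics, alpha=alpha,
                       passes=10, random_state=0)
        score = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                               coherence="c_v").get_coherence()
        if best is None or score > best[0]:
            best = (score, num_topics, alpha)

print("best per-corpus configuration:", best)
```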

## Computer Vision and Pattern Recognition

### Unsupervised Sparse Dirichlet-Net for Hyperspectral Image Super-Resolution

Hairong Qi ,

Chiman Kwan

Comments: Accepted by The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

In many computer vision applications, obtaining images of high resolution in

both the spatial and spectral domains is equally important. However, due to

hardware limitations, one can only expect to acquire images of high resolution

in either the spatial or spectral domains. This paper focuses on hyperspectral

image super-resolution (HSI-SR), where a hyperspectral image (HSI) with low

spatial resolution (LR) but high spectral resolution is fused with a

multispectral image (MSI) with high spatial resolution (HR) but low spectral

resolution to obtain HR HSI. Existing deep learning-based solutions are all

supervised that would need a large training set and the availability of HR HSI,

which is unrealistic. Here, we make the first attempt to solve the HSI-SR

problem using an unsupervised encoder-decoder architecture that carries the

following unique properties. First, it is composed of two encoder-decoder networks,

coupled through a shared decoder, in order to preserve the rich spectral

information from the HSI network. Second, the network encourages the

representations from both modalities to follow a sparse Dirichlet distribution

which naturally incorporates the two physical constraints of HSI and MSI.

Third, the angular difference between representations is minimized in order to

reduce the spectral distortion. We refer to the proposed architecture as

unsupervised Sparse Dirichlet-Net, or uSDN. Extensive experimental results

demonstrate the superior performance of uSDN as compared to the

state-of-the-art.
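
The angular-difference term mentioned above can be written down directly. The snippet below is a minimal sketch of a spectral-angle measure between two representation vectors; it reflects our reading of the constraint, not the paper's exact loss.

```python
import numpy as np

def spectral_angle(a, b, eps=1e-8):
    """Angle between two representation vectors; driving it toward zero
    keeps the two branches spectrally consistent."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

a = np.array([0.20, 0.50, 0.30])
b = np.array([0.25, 0.45, 0.30])
print(spectral_angle(a, b))  # small angle -> little spectral distortion
```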

### Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision

Ionut-Teodor Sorodoc ,

Raffaella Bernardi

Comments: 12 pages (references included). To appear in the Proceedings of NAACL-HLT 2018

Journal-ref: Proceedings of NAACL-HLT 2018

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Machine Learning (stat.ML)

The present work investigates whether different quantification mechanisms

(set comparison, vague quantification, and proportional estimation) can be

jointly learned from visual scenes by a multi-task computational model. The

motivation is that, in humans, these processes underlie the same cognitive,

non-symbolic ability, which allows an automatic estimation and comparison of

set magnitudes. We show that when information about lower-complexity tasks is

available, the higher-level proportional task becomes more accurate than when

performed in isolation. Moreover, the multi-task model is able to generalize to

unseen combinations of target/non-target objects. Consistently with behavioral

evidence showing the interference of absolute number in the proportional task,

the multi-task model no longer works when asked to provide the number of target

objects in the scene.

### Convolutional Neural Networks for Skull-stripping in Brain MR Imaging using Consensus-based Silver standard Masks

Oeslle Lucena , Roberto Souza , Leticia Rittner , Richard Frayne , Roberto Lotufo **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Convolutional neural networks (CNN) for medical imaging are constrained by

the amount of annotated data required in the training stage. Usually, manual

annotation is considered to be the “gold standard”. However, medical imaging

datasets that include expert manual segmentation are scarce as this step is

time-consuming, and therefore expensive. Moreover, single-rater manual

annotation is most often used in data-driven approaches, making the network

optimal with respect to only that single expert. In this work, we propose a CNN

for brain extraction in magnetic resonance (MR) imaging, that is fully trained

with what we refer to as silver standard masks. Our method consists of 1)

developing a dataset with “silver standard” masks as input, and implementing

both 2) a tri-planar method using parallel 2D U-Net-based CNNs (referred to as

CONSNet) and 3) an auto-context implementation of CONSNet. The term CONSNet

refers to our integrated approach, i.e., training with silver standard masks

and using a 2D U-Net-based architecture. Our results showed that we

outperformed (i.e., larger Dice coefficients) the current state-of-the-art SS

methods. Our use of silver standard masks reduced the cost of manual

annotation, decreased inter-intra-rater variability, and avoided CNN

segmentation super-specialization towards one specific manual annotation

guideline that can occur when gold standard masks are used. Moreover, the usage

of silver standard masks greatly enlarges the volume of input annotated data

because we can relatively easily generate labels for unlabeled data. In

addition, our method has the advantage that, once trained, it takes only a few

seconds to process a typical brain image volume using modern hardware, such as

a high-end graphics processing unit. In contrast, many of the other competitive

methods have processing times in the order of minutes.

### An efficient deep convolutional Laplacian pyramid architecture for CS reconstruction at low sampling ratios

Heyao Xu ,

Xinwei Gao ,

Shengping Zhang ,

Feng Jiang ,

Debin Zhao

Comments: 5 pages. Accepted by ICASSP2018

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Compressed sensing (CS) has been successfully applied to image

compression in the past few years as most image signals are sparse in a certain

domain. Several CS reconstruction models have been proposed and obtained

superior performance. However, these methods suffer from blocking artifacts or

ringing effects at low sampling ratios in most cases. To address this problem,

we propose a deep convolutional Laplacian Pyramid Compressed Sensing Network

(LapCSNet) for CS, which consists of a sampling sub-network and a

reconstruction sub-network. In the sampling sub-network, we utilize a

convolutional layer to mimic the sampling operator. In contrast to the fixed

sampling matrices used in traditional CS methods, the filters used in our

convolutional layer are jointly optimized with the reconstruction sub-network.

In the reconstruction sub-network, two branches are designed to reconstruct

multi-scale residual images and multi-scale target images progressively using a

Laplacian pyramid architecture. The proposed LapCSNet not only integrates

multi-scale information to achieve better performance but also reduces

computational cost dramatically. Experimental results on benchmark datasets

demonstrate that the proposed method is capable of reconstructing more details

and sharper edges than the state-of-the-art methods.
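
The idea of replacing a fixed CS sampling matrix with a convolutional sampling operator can be sketched as block-wise projections. The block size, filter count, and random filters below are placeholders for the jointly optimized filters described in the abstract.

```python
import numpy as np

def conv_sampling(image, filters, block=32):
    """Mimic CS sampling with a convolution: every non-overlapping block is
    projected onto a small set of filters, giving measurements at a
    sampling ratio of len(filters) / block**2."""
    h, w = image.shape
    meas = np.zeros((len(filters), h // block, w // block))
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            patch = image[i:i + block, j:j + block].ravel()
            meas[:, i // block, j // block] = filters @ patch
    return meas

rng = np.random.default_rng(0)
image = rng.random((64, 64))
filters = rng.standard_normal((10, 32 * 32))  # ~1% sampling ratio (10 / 1024)
print(conv_sampling(image, filters).shape)    # (10, 2, 2)
```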

### CNN-based Landmark Detection in Cardiac CTA Scans

Bob D. de Vos ,

Jelmer M. Wolterink ,

Tim Leiner ,

Ivana Išgum

Comments: This work was submitted to MIDL 2018 Conference

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Fast and accurate anatomical landmark detection can benefit many medical

image analysis methods. Here, we propose a method to automatically detect

anatomical landmarks in medical images. Automatic landmark detection is

performed with a patch-based fully convolutional neural network (FCNN) that

combines regression and classification. For any given image patch, regression

is used to predict the 3D displacement vector from the image patch to the

landmark. Simultaneously, classification is used to identify patches that

contain the landmark. Under the assumption that patches close to a landmark can

determine the landmark location more precisely than patches farther from it,

only those patches that contain the landmark according to classification are

used to determine the landmark location. The landmark location is obtained by

calculating the average landmark location using the computed 3D displacement

vectors. The method is evaluated using detection of six clinically relevant

landmarks in coronary CT angiography (CCTA) scans: the right and left ostium,

the bifurcation of the left main coronary artery (LM) into the left anterior

descending and the left circumflex artery, and the origin of the right,

non-coronary, and left aortic valve commissure. The proposed method achieved an

average Euclidean distance error of 2.19 mm and 2.88 mm for the right and left

ostium respectively, 3.78 mm for the bifurcation of the LM, and 1.82 mm, 2.10

mm and 1.89 mm for the origin of the right, non-coronary, and left aortic valve

commissure respectively, demonstrating accurate performance. The proposed

combination of regression and classification can be used to accurately detect

landmarks in CCTA scans.
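
The way the classification and regression outputs are combined can be written as a simple voting step. The sketch below follows the abstract's description; the array shapes, threshold, and fallback rule are our own assumptions.

```python
import numpy as np

def estimate_landmark(patch_centers, displacements, contains_prob, threshold=0.5):
    """Average the landmark votes of patches classified as containing it.

    patch_centers : (N, 3) voxel coordinates of patch centres
    displacements : (N, 3) predicted offsets from patch centre to landmark
    contains_prob : (N,)  classification output per patch
    """
    keep = contains_prob >= threshold
    if not keep.any():
        keep = contains_prob == contains_prob.max()  # fall back to the most confident patch
    votes = patch_centers[keep] + displacements[keep]
    return votes.mean(axis=0)

centers = np.array([[10., 10., 10.], [12., 9., 11.], [40., 40., 40.]])
disps   = np.array([[ 1., 0.5, 0.2], [-1., 1.5, -0.8], [0., 0., 0.]])
probs   = np.array([0.9, 0.8, 0.1])
print(estimate_landmark(centers, disps, probs))  # [11. 10.5 10.2]; the far patch is ignored
```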

### Pose estimation of a single circle using default intrinsic calibration

Mariyanayagam Damien , Gurdjos Pierre , Chambon Sylvie , Brunet Florent , Charvillat Vincent **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Circular markers are planar markers which offer great performances for

detection and pose estimation. For an uncalibrated camera with an unknown focal

length, the images of at least two coplanar circles are generally

required to recover their poses. Unfortunately, detecting more than one ellipse

in the image can be tricky and time-consuming, especially for concentric

circles. On the other hand, when the camera is calibrated, one circle suffices

but the solution is twofold and can hardly be disambiguated. Our contribution

is to go beyond this limit by dealing with the uncalibrated case of a camera

seeing one circle and discussing how to remove the ambiguity. We propose a new

problem formulation that enables us to show how to detect geometric configurations

in which the ambiguity can be removed. Furthermore, we introduce the notion of

default camera intrinsics and show, through extensive empirical work, the

surprising observation that very approximate calibration can lead to accurate

circle pose estimation.

### Learning to Exploit the Prior Network Knowledge for Weakly-Supervised Semantic Segmentation

Carolina Redondo-Cabrera , Roberto J. López-Sastre **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Training a Convolutional Neural Network (CNN) for semantic segmentation

typically requires collecting a large amount of accurate pixel-level

annotations, a hard and expensive task. In contrast, simple image tags are

easier to gather. With this paper we introduce a novel weakly-supervised

semantic segmentation model able to learn from image labels, and just image

labels. Our model uses the prior knowledge of a network trained for image

recognition, employing these image annotations, as an attention mechanism to

identify semantic regions in the images. We then present a methodology that

builds accurate class-specific segmentation masks from these regions, where

neither external objectness nor saliency algorithms are required. We describe

how to incorporate this mask generation strategy into a fully end-to-end

trainable process where the network jointly learns to classify and segment

images. Our experiments on the PASCAL VOC 2012 dataset show that exploiting these

generated class-specific masks in conjunction with our novel end-to-end

learning process outperforms several recent weakly-supervised semantic

segmentation methods that use image tags only, and even some models that

leverage additional supervision or training data.
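
One common way to obtain such attention from an image-recognition network is a class activation map. The sketch below uses CAM-style weighting as a stand-in for the paper's prior-network attention; the shapes and threshold are illustrative assumptions.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weight the last conv feature maps by the classifier weights of one
    class to highlight the regions driving that class prediction.
    features: (C, H, W) conv features; fc_weights: (num_classes, C)."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=(0, 0))  # (H, W)
    cam -= cam.min()
    return cam / (cam.max() + 1e-8)

def seed_mask(cam, threshold=0.3):
    """Binarize the attention map into a rough class-specific seed mask."""
    return (cam >= threshold).astype(np.uint8)
```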

### Group Anomaly Detection using Deep Generative Models

Edward Toth (School of Information Technologies, The University of Sydney),

Sanjay Chawla (Qatar Computing Research Institute, HBKU)

Comments: Under review at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML 2018), Dublin, Ireland, 10-14 September 2018

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Unlike conventional anomaly detection research that focuses on point

anomalies, our goal is to detect anomalous collections of individual data

points. In particular, we perform group anomaly detection (GAD) with an

emphasis on irregular group distributions (e.g. irregular mixtures of image

pixels). GAD is an important task in detecting unusual and anomalous phenomena

in real-world applications such as high energy particle physics, social media,

and medical imaging. In this paper, we take a generative approach by proposing

deep generative models: Adversarial autoencoder (AAE) and variational

autoencoder (VAE) for group anomaly detection. Both AAE and VAE detect group

anomalies using point-wise input data where group memberships are known a

priori. We conduct extensive experiments to evaluate our models on real-world

datasets. The empirical results demonstrate that our approach is effective and

robust in detecting group anomalies.
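
Given a trained AAE or VAE, a natural group score is an aggregate of point-wise reconstruction errors. The sketch below illustrates that scoring step; the aggregation choice and the toy "autoencoder" are our assumptions.

```python
import numpy as np

def group_anomaly_score(group_points, reconstruct):
    """Score a group by the mean reconstruction error of its points;
    groups whose points are poorly reconstructed score as anomalous."""
    errors = [np.sum((x - reconstruct(x)) ** 2) for x in group_points]
    return float(np.mean(errors))

rng = np.random.default_rng(0)
normal_group    = rng.normal(0.0, 1.0, size=(50, 8))
anomalous_group = rng.normal(5.0, 1.0, size=(50, 8))
reconstruct = lambda x: np.clip(x, -2, 2)  # toy stand-in for a trained autoencoder
print(group_anomaly_score(normal_group, reconstruct),
      group_anomaly_score(anomalous_group, reconstruct))  # the second score is far larger
```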

### BodyNet: Volumetric Inference of 3D Human Body Shapes

Gül Varol , Duygu Ceylan , Bryan Russell , Jimei Yang , Ersin Yumer , Ivan Laptev , Cordelia Schmid **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Human shape estimation is an important task for video editing, animation and

the fashion industry. Predicting 3D human body shape from natural images, however,

is highly challenging due to factors such as variation in human bodies,

clothing and viewpoint. Prior methods addressing this problem typically attempt

to fit parametric body models with certain priors on pose and shape. In this

work we argue for an alternative representation and propose BodyNet, a neural

network for direct inference of volumetric body shape from a single image.

BodyNet is an end-to-end trainable network that benefits from (i) a volumetric

3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate

supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them

results in performance improvement as demonstrated by our experiments. To

evaluate the method, we fit the SMPL model to our network output and show

state-of-the-art results on the SURREAL and Unite the People datasets,

outperforming recent approaches. Besides achieving state-of-the-art

performance, our method also enables volumetric body-part segmentation.

### Learning Warped Guidance for Blind Face Restoration

Ming Liu ,

Yuting Ye ,

Wangmeng Zuo ,

Liang Lin ,

Ruigang Yang

Comments: 25 pages, 14 figures and 1 table

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

This paper studies the problem of blind face restoration from an

unconstrained blurry, noisy, low-resolution, or compressed image (i.e.,

degraded observation). For better recovery of fine facial details, we modify

the problem setting by taking both the degraded observation and a high-quality

guided image of the same identity as input to our guided face restoration

network (GFRNet). However, the degraded observation and guided image generally

are different in pose, illumination and expression, thereby making plain CNNs

(e.g., U-Net) fail to recover fine and identity-aware facial details. To tackle

this issue, our GFRNet model includes both a warping subnetwork (WarpNet) and a

reconstruction subnetwork (RecNet). The WarpNet is introduced to predict flow

field for warping the guided image to correct pose and expression (i.e., warped

guidance), while the RecNet takes the degraded observation and warped guidance

as input to produce the restoration result. Because the ground-truth flow

field is unavailable, a landmark loss together with total variation

regularization is incorporated to guide the learning of WarpNet. Furthermore,

to make the model applicable to blind restoration, our GFRNet is trained on the

synthetic data with versatile settings on blur kernel, noise level,

downsampling scale factor, and JPEG quality factor. Experiments show that our

GFRNet not only performs favorably against the state-of-the-art image and face

restoration methods, but also generates visually photo-realistic results on

real degraded facial images.
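
The WarpNet guidance terms can be sketched directly from the abstract: a landmark loss on the warped guidance plus a total-variation penalty on the flow field. The weighting and exact norms below are our assumptions.

```python
import numpy as np

def total_variation(flow):
    """TV regularizer on an (H, W, 2) flow field: penalizes non-smooth warps
    where no ground-truth flow is available."""
    return np.abs(np.diff(flow, axis=0)).sum() + np.abs(np.diff(flow, axis=1)).sum()

def landmark_loss(warped_landmarks, target_landmarks):
    """Mean squared distance between landmarks of the warped guidance image
    and those of the degraded observation."""
    return np.mean(np.sum((warped_landmarks - target_landmarks) ** 2, axis=1))

def warpnet_objective(flow, warped_lm, target_lm, tv_weight=1e-4):
    return landmark_loss(warped_lm, target_lm) + tv_weight * total_variation(flow)
```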

### Spline Error Weighting for Robust Visual-Inertial Fusion

Per-Erik Forssén

Comments: To appear in CVPR 2018

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

In this paper we derive and test a probability-based weighting that can

balance residuals of different types in spline fitting. In contrast to previous

formulations, the proposed spline error weighting scheme also incorporates a

prediction of the approximation error of the spline fit. We demonstrate the

effectiveness of the prediction in a synthetic experiment, and apply it to

visual-inertial fusion on rolling shutter cameras. This results in a method

that can estimate 3D structure with metric scale on generic first-person

videos. We also propose a quality measure for spline fitting, that can be used

to automatically select the knot spacing. Experiments verify that the obtained

trajectory quality corresponds well with the requested quality. Finally, by

linearly scaling the weights, we show that the proposed spline error weighting

minimizes the estimation errors on real sequences, in terms of scale and

end-point errors.

### Offline and Online calibration of Mobile Robot and SLAM Device for Navigation

Ryoichi Ishikawa , Takeshi Oishi , Katsushi Ikeuchi **Subjects** : Computer Vision and Pattern Recognition (cs.CV) ; Robotics (cs.RO)

Robot navigation technology is required to accomplish difficult tasks in

various environments. For navigation, it is necessary to know both the external

environment and the state of the robot within that environment. Meanwhile,

various studies have been done on SLAM technology, which is used not only for

navigation but also in devices for Mixed Reality and the like.

In this paper, we propose a robot-device calibration method for navigation

with a device using SLAM technology on a robot. The calibration is performed by

using the position and orientation information given by the robot and the

device. During calibration, the most efficient way of moving is determined

according to the constraints on the robot's movement. Furthermore, we also show a

method to dynamically correct the position and orientation of the robot so that

the information of the external environment and the shape information of the

robot maintain consistency in order to reduce the dynamic error occurring

during navigation.

Our method can be easily used for various kinds of robots and localization

with sufficient precision for navigation is possible with offline calibration

and online position correction. In the experiments, we confirm the parameters

obtained by two types of offline calibration according to the degree of freedom

of robot movement and validate the effectiveness of online correction method by

plotting localized position error during robot’s intense movement. Finally, we

show the demonstration of navigation using SLAM device.

### MSnet: Mutual Suppression Network for Disentangled Video Representations

Jangho Lee ,

Sungmin Lee ,

Sungroh Yoon

Comments: 17 pages, 7 figures

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

The extraction of meaningful features from videos is important as they can be

used in various applications. Despite its importance, video representation

learning has not been studied much, because it is challenging to deal with both

content and motion information. We present a Mutual Suppression network (MSnet)

to learn disentangled motion and content features in videos. The MSnet is

trained in such a way that content features do not contain motion information and

motion features do not contain content information; this is done by suppressing

each other with adversarial training. We utilize the disentangled features from

the MSnet for several tasks, such as frame reproduction, pixel-level video

frame prediction, and dense optical flow estimation, to demonstrate the

strength of MSnet. The proposed model outperforms the state-of-the-art methods

in pixel-level video frame prediction. The source code will be publicly

available.

### Learning Deep Sketch Abstraction

Yongxin Yang ,

Yi-Zhe Song ,

Tao Xiang ,

Timothy M. Hospedales

Comments: This paper is accepted at CVPR 2018 as poster

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Human free-hand sketches have been studied in various contexts including

sketch recognition, synthesis and fine-grained sketch-based image retrieval

(FG-SBIR). A fundamental challenge for sketch analysis is to deal with

drastically different human drawing styles, particularly in terms of

abstraction level. In this work, we propose the first stroke-level sketch

abstraction model based on the insight of sketch abstraction as a process of

trading off between the recognizability of a sketch and the number of strokes

used to draw it. Concretely, we train a model for abstract sketch generation

through reinforcement learning of a stroke removal policy that learns to

predict which strokes can be safely removed without affecting recognizability.

We show that our abstraction model can be used for various sketch analysis

tasks including: (1) modeling stroke saliency and understanding the decision of

sketch recognition models, (2) synthesizing sketches of variable abstraction

for a given category, or reference object instance in a photo, and (3) training

a FG-SBIR model with photos only, bypassing the expensive photo-sketch pair

collection step.

### Precise Temporal Action Localization by Evolving Temporal Proposals

Haonan Qiu , Yingbin Zheng , Hao Ye , Yao Lu , Feng Wang , Liang He **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Locating actions in long untrimmed videos has been a challenging problem in

video content analysis. The performances of existing action localization

approaches remain unsatisfactory in precisely determining the beginning and the

end of an action. Imitating the human perception procedure with observations

and refinements, we propose a novel three-phase action localization framework.

Our framework is embedded with an Actionness Network to generate initial

proposals through frame-wise similarity grouping, and then a Refinement Network

to conduct boundary adjustment on these proposals. Finally, the refined

proposals are sent to a Localization Network for further fine-grained location

regression. The whole process can be deemed as multi-stage refinement using a

novel non-local pyramid feature under various temporal granularities. We

evaluate our framework on THUMOS14 benchmark and obtain a significant

improvement over the state-of-the-art approaches. Specifically, the

performance gain is remarkable under precise localization with high IoU

thresholds. Our proposed framework achieves mAP@IoU=0.5 of 34.2%.

### Talking Face Generation by Conditional Recurrent Adversarial Network

Jingwen Zhu ,

Xiaolong Wang ,

Hairong Qi

Comments: Project Page: this http URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Given an arbitrary face image and an arbitrary speech clip, the proposed work

attempts to generate a talking face video with accurate lip synchronization

while maintaining smooth transition of both lip and facial movement over the

entire video clip. Existing works either do not consider temporal dependency on

face images across different video frames, thus easily yielding

noticeable/abrupt facial and lip movement, or are limited to the generation

of talking face video for a specific person thus lacking generalization

capacity. We propose a novel conditional video generation network where the

audio input is treated as a condition for the recurrent adversarial network

such that temporal dependency is incorporated to realize smooth transition for

the lip and facial movement. In addition, we deploy a multi-task adversarial

training scheme in the context of video generation to improve both

photo-realism and the accuracy for lip synchronization. Finally, based on the

phoneme distribution information extracted from the audio clip, we develop a

sample selection method that effectively reduces the size of the training

dataset without sacrificing the quality of the generated video. Extensive

experiments on both controlled and uncontrolled datasets demonstrate the

superiority of the proposed approach in terms of visual quality, lip sync

accuracy, and smooth transition of lip and facial movement, as compared to the

state-of-the-art.

### Deep Motion Boundary Detection

Xiyang Dai ,

Xinchao Wang ,

Maojun Zhang ,

Dacheng Tao ,

Larry Davis

Comments: 17 pages, 5 figures

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Motion boundary detection is a crucial yet challenging problem. Prior methods

focus on analyzing the gradients and distributions of optical flow fields, or

use hand-crafted features for motion boundary learning. In this paper, we

propose the first dedicated end-to-end deep learning approach for motion

boundary detection, which we term MoBoNet. We introduce a refinement network

structure which takes source input images, initial forward and backward optical

flows as well as corresponding warping errors as inputs and produces

high-resolution motion boundaries. Furthermore, we show that the obtained

motion boundaries, through a fusion sub-network we design, can in turn guide

the optical flows for removing the artifacts. The proposed MoBoNet is generic

and works with any optical flows. Our motion boundary detection and the refined

optical flow estimation achieve results superior to the state of the art.

### FishEyeRecNet: A Multi-Context Collaborative Deep Network for Fisheye Image Rectification

Xinchao Wang ,

Jun Yu ,

Maojun Zhang ,

Pascal Fua ,

Dacheng Tao

Comments: 16 pages, 5 figures

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Images captured by fisheye lenses violate the pinhole camera assumption and

suffer from distortions. Rectification of fisheye images is therefore a crucial

preprocessing step for many computer vision applications. In this paper, we

propose an end-to-end multi-context collaborative deep network for removing

distortions from single fisheye images. In contrast to conventional approaches,

which focus on extracting hand-crafted features from input images, our method

learns high-level semantics and low-level appearance features simultaneously to

estimate the distortion parameters. To facilitate training, we construct a

synthesized dataset that covers various scenes and distortion parameter

settings. Experiments on both synthesized and real-world datasets show that the

proposed model significantly outperforms current state-of-the-art methods. Our

code and synthesized dataset will be made publicly available.

### A Hybrid Model for Identity Obfuscation by Face Replacement

Ayush Tewari ,

Weipeng Xu ,

Mario Fritz ,

Christian Theobalt ,

Bernt Schiele

Comments: 17 pages of main paper and 5 pages of supplementary materials

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Cryptography and Security (cs.CR)

As more and more personal photos are shared and tagged in social media,

avoiding privacy risks such as unintended recognition becomes increasingly

challenging. We propose a new hybrid approach to obfuscate identities in photos

by head replacement. Our approach combines state of the art parametric face

synthesis with latest advances in Generative Adversarial Networks (GAN) for

data-driven image synthesis. On the one hand, the parametric part of our method

gives us control over the facial parameters and allows for explicit

manipulation of the identity. On the other hand, the data-driven aspects allow

for adding fine details and overall realism as well as seamless blending into

the scene context. In our experiments, we show highly realistic output of our

system that improves over the previous state of the art in obfuscation rate

while preserving a higher similarity to the original image content.

### Multimodal Unsupervised Image-to-Image Translation

Ming-Yu Liu ,

Serge Belongie ,

Jan Kautz

Comments: Code: this https URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Machine Learning (stat.ML)

Unsupervised image-to-image translation is an important and challenging

problem in computer vision. Given an image in the source domain, the goal is to

learn the conditional distribution of corresponding images in the target

domain, without seeing any pairs of corresponding images. While this

conditional distribution is inherently multimodal, existing approaches make an

overly simplified assumption, modeling it as a deterministic one-to-one

mapping. As a result, they fail to generate diverse outputs from a given source

domain image. To address this limitation, we propose a Multimodal Unsupervised

Image-to-image Translation (MUNIT) framework. We assume that the image

representation can be decomposed into a content code that is domain-invariant,

and a style code that captures domain-specific properties. To translate an

image to another domain, we recombine its content code with a random style code

sampled from the style space of the target domain. We analyze the proposed

framework and establish several theoretical results. Extensive experiments with

comparisons to the state-of-the-art approaches further demonstrate the

advantage of the proposed framework. Moreover, our framework allows users to

control the style of translation outputs by providing an example style image.

Code and pretrained models are available at this https URL

### A Variational U-Net for Conditional Appearance and Shape Generation

Ekaterina Sutter ,

Björn Ommer

Comments: CVPR 2018 (Spotlight). Project Page at this https URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Deep generative models have demonstrated great performance in image

synthesis. However, results deteriorate in case of spatial deformations, since

they generate images of objects directly, rather than modeling the intricate

interplay of their inherent shape and appearance. We present a conditional

U-Net for shape-guided image generation, conditioned on the output of a

variational autoencoder for appearance. The approach is trained end-to-end on

images, without requiring samples of the same object with varying pose or

appearance. Experiments show that the model enables conditional image

generation and transfer. Therefore, either shape or appearance can be retained

from a query image, while freely altering the other. Moreover, appearance can

be sampled due to its stochastic latent representation, while preserving shape.

In quantitative and qualitative experiments on COCO, DeepFashion, shoes,

Market-1501 and handbags, the approach demonstrates significant improvements

over the state-of-the-art.

### Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning

Jingjing Zheng ,

Azadeh Alavi ,

Rama Chellappa

Comments: Submitted to IEEE TIP Journal

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

In real-world visual recognition problems, the assumption that the training

data (source domain) and test data (target domain) are sampled from the same

distribution is often violated. This is known as the domain adaptation problem.

In this work, we propose a novel domain-adaptive dictionary learning framework

for cross-domain visual recognition. Our method generates a set of intermediate

domains. These intermediate domains form a smooth path and bridge the gap

between the source and target domains. Specifically, we not only learn a common

dictionary to encode the domain-shared features, but also learn a set of

domain-specific dictionaries to model the domain shift. The separation of the

common and domain-specific dictionaries enables us to learn more compact and

reconstructive dictionaries for domain adaptation. These dictionaries are

learned by alternating between domain-adaptive sparse coding and dictionary

updating steps. Meanwhile, our approach gradually recovers the feature

representations of both source and target data along the domain path. By

aligning all the recovered domain data, we derive the final domain-adaptive

features for cross-domain visual recognition. Extensive experiments on three

public datasets demonstrate that our approach outperforms most

state-of-the-art methods.

### Geometric Consistency for Self-Supervised End-to-End Visual Odometry

Ganesh Iyer , J. Krishna Murthy , Gunshi Gupta , K. Madhava Krishna , Liam Paull **Subjects** : Robotics (cs.RO) ; Computer Vision and Pattern Recognition (cs.CV)

With the success of deep learning based approaches in tackling challenging

problems in computer vision, a wide range of deep architectures have recently

been proposed for the task of visual odometry (VO) estimation. Most of these

proposed solutions rely on supervision, which requires the acquisition of

precise ground-truth camera pose information, collected using expensive motion

capture systems or high-precision IMU/GPS sensor rigs. In this work, we propose

an unsupervised paradigm for deep visual odometry learning. We show that using

a noisy teacher, which could be a standard VO pipeline, and by designing a loss

term that enforces geometric consistency of the trajectory, we can train

accurate deep models for VO that do not require ground-truth labels. We

leverage geometry as a self-supervisory signal and propose “Composite

Transformation Constraints (CTCs)”, that automatically generate supervisory

signals for training and enforce geometric consistency in the VO estimate. We

also present a method of characterizing the uncertainty in VO estimates thus

obtained. To evaluate our VO pipeline, we present exhaustive ablation studies

that demonstrate the efficacy of end-to-end, self-supervised methodologies to

train deep models for monocular VO. We show that leveraging concepts from

geometry and incorporating them into the training of a recurrent neural network

results in performance competitive to supervised deep VO methods.
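
One way to read the Composite Transformation Constraints is as a consistency residual between a directly predicted transform and the composition of intermediate predictions. The sketch below expresses this on 4x4 homogeneous matrices; the exact residual norm used in the paper may differ.

```python
import numpy as np

def composite_transformation_constraint(T_01, T_12, T_02):
    """Residual between the directly predicted frame0->frame2 transform and
    the composition of the two single-step predictions; usable as a
    self-supervised loss term."""
    residual = (T_12 @ T_01) @ np.linalg.inv(T_02) - np.eye(4)
    return np.linalg.norm(residual, ord="fro")

I = np.eye(4)
print(composite_transformation_constraint(I, I, I))  # perfectly consistent -> 0.0
```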

### CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks

Karnik Ram R. ,

J. Krishna Murthy ,

K. Madhava Krishna

Comments: Submitted to IEEE International Conference on Intelligent Robots and Systems (IROS) 2018

**Subjects**

:

Robotics (cs.RO)

; Computer Vision and Pattern Recognition (cs.CV)

3D LiDARs and 2D cameras are increasingly being used alongside each other in

sensor rigs for perception tasks. Before these sensors can be used to gather

meaningful data, however, their extrinsics (and intrinsics) need to be

accurately calibrated, as the performance of the sensor rig is extremely

sensitive to these calibration parameters. A vast majority of existing

calibration techniques require significant amounts of data and/or calibration

targets and human effort, severely impacting their applicability in large-scale

production systems. We address this gap with CalibNet: a self-supervised deep

network capable of automatically estimating the 6-DoF rigid body transformation

between a 3D LiDAR and a 2D camera in real-time. CalibNet alleviates the need

for calibration targets, thereby resulting in significant savings in

calibration efforts. During training, the network only takes as input a LiDAR

point cloud, the corresponding monocular image, and the camera calibration

matrix K. At train time, we do not impose direct supervision (i.e., we do not

directly regress to the calibration parameters, for example). Instead, we train

the network to predict calibration parameters that maximize the geometric and

photometric consistency of the input images and point clouds. CalibNet learns

to iteratively solve the underlying geometric problem and accurately predicts

extrinsic calibration parameters for a wide range of mis-calibrations, without

requiring retraining or domain adaptation. The project page is hosted at

this https URL

## Artificial Intelligence

### Monitoring and Executing Workflows in Linked Data Environments

Tobias Käfer , Andreas Harth **Subjects** : Artificial Intelligence (cs.AI) ; Software Engineering (cs.SE)

The W3C’s Web of Things working group is aimed at addressing the

interoperability problem on the Internet of Things using Linked Data as a uniform

interface. While Linked Data paves the way towards combining such devices into

integrated applications, traditional solutions for specifying the control flow

of applications do not work seamlessly with Linked Data. We therefore tackle

the problem of the specification, execution, and monitoring of applications in

the context of Linked Data. We present a novel approach that combines

workflows, semantic reasoning, and RESTful interaction into one integrated

solution. We contribute to the state of the art by (1) defining an ontology for

describing workflow models and instances, (2) providing operational semantics

for the ontology that allows for the execution and monitoring of workflow

instances, (3) presenting a benchmark to evaluate our solution. Moreover, we

showcase how we used the ontology and the operational semantics to monitor

pilots executing workflows in virtual aircraft cockpits.

### Roster Evaluation Based on Classifiers for the Nurse Rostering Problem

Roman Václavík , Přemysl Šůcha , Zdeněk Hanzálek **Subjects** : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Optimization and Control (math.OC)

The personnel scheduling problem is a well-known NP-hard combinatorial

problem. Due to the complexity of this problem and the size of the real-world

instances, it is not possible to use exact methods, and thus heuristics,

meta-heuristics, or hyper-heuristics must be employed. The majority of

heuristic approaches are based on iterative search, where the quality of

intermediate solutions must be calculated. Unfortunately, this is

computationally highly expensive because these problems have many constraints

and some are very complex. In this study, we propose a machine learning

technique as a tool to accelerate the evaluation phase in heuristic approaches.

The solution is based on a simple classifier, which is able to determine

whether the changed solution (more precisely, the changed part of the solution)

is better than the original or not. This decision is made much faster than a

standard cost-oriented evaluation process. However, the classification process

cannot guarantee 100% correctness. Therefore, our approach, which is

illustrated using a tabu search algorithm in this study, includes a filtering

mechanism, where the classifier rejects the majority of the potentially bad

solutions and the remaining solutions are then evaluated in a standard manner.

We also show how the boosting algorithms can improve the quality of the final

solution compared with a simple classifier. We verified our proposed approach

and premises, based on standard and real-world benchmark instances, to

demonstrate the significant speedup obtained with comparable solution quality.
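
The classifier-as-filter idea can be sketched independently of the full tabu search. Below is a simplified local-search loop where a cheap classifier screens neighbours before the expensive roster evaluation; the tabu list and boosting details from the paper are omitted, and the toy usage is purely illustrative.

```python
def classifier_filtered_search(initial, neighbours, evaluate, is_promising, iterations=100):
    """Local search with a learned filter.

    neighbours(s)      -> candidate solutions reachable from s
    evaluate(s)        -> expensive, exact cost of solution s
    is_promising(s, n) -> cheap classifier: is n likely better than s?
    Only candidates accepted by the classifier are evaluated exactly.
    """
    best = current = initial
    best_cost = evaluate(initial)
    for _ in range(iterations):
        survivors = [n for n in neighbours(current) if is_promising(current, n)]
        if not survivors:
            continue
        scored = [(evaluate(n), n) for n in survivors]  # expensive step, now smaller
        current_cost, current = min(scored, key=lambda t: t[0])
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost

# Toy usage: states are integers, the true cost is distance to 42,
# and the "classifier" is a crude check of whether a move gets closer.
neigh = lambda s: [s - 1, s + 1]
cost = lambda s: abs(s - 42)
promising = lambda s, n: abs(n - 42) <= abs(s - 42) + 1
print(classifier_filtered_search(0, neigh, cost, promising, iterations=60))  # (42, 0)
```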

### Representing smooth functions as compositions of near-identity functions with implications for deep network optimization

Peter L. Bartlett , Steven N. Evans , Philip M. Long **Subjects** : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Statistics Theory (math.ST); Machine Learning (stat.ML)

We show that any smooth bi-Lipschitz \(h\) can be represented exactly as a

composition \(h_m \circ \dots \circ h_1\) of functions \(h_1, \dots, h_m\) that are close

to the identity in the sense that each \(\left(h_i - \mathrm{Id}\right)\) is

Lipschitz, and the Lipschitz constant decreases inversely with the number \(m\)

of functions composed. This implies that \(h\) can be represented to any accuracy

by a deep residual network whose nonlinear layers compute functions with a

small Lipschitz constant. Next, we consider nonlinear regression with a

composition of near-identity nonlinear maps. We show that, regarding Fréchet

derivatives with respect to the \(h_1, \dots, h_m\), any critical point of a

quadratic criterion in this near-identity region must be a global minimizer. In

contrast, if we consider derivatives with respect to parameters of a fixed-size

residual network with sigmoid activation functions, we show that there are

near-identity critical points that are suboptimal, even in the realizable case.

Informally, this means that functional gradient methods for residual networks

cannot get stuck at suboptimal critical points corresponding to near-identity

layers, whereas parametric gradient methods for sigmoidal residual networks

suffer from suboptimal critical points in the near-identity region.

### Affective Recommendation System for Tourists by Using Emotion Generating Calculations

Issei Tachibana

Comments: 6 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1804.02657 and arXiv:1804.03994

Journal-ref: Proc. of IEEE 7th International Workshop on Computational

Intelligence and Applications (IWCIA2014)

**Subjects**

:

Human-Computer Interaction (cs.HC)

; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

An emotion-oriented intelligent interface consists of Emotion Generating

Calculations (EGC) and Mental State Transition Network (MSTN). We have

developed the Android EGC application software, in which an agent

evaluates the feelings in a conversation. In this paper, we develop a

tourist information system that can estimate the user's feelings at a

sightseeing spot. The system can recommend sightseeing spots and local

food corresponding to the user's feelings. The system calculates the

recommendation list with an estimation function that combines Google search

results, the importance of a term on the sightseeing website, and the

emotion aroused by EGC. To show the effectiveness, this paper

describes the experimental results for some situations during Hiroshima

sightseeing.

### Successful Nash Equilibrium Agent for a 3-Player Imperfect-Information Game

Sam Ganzfried , Austin Nowak , Joannier Pinales **Subjects** : Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI)

Creating strong agents for games with more than two players is a major open

problem in AI. Common approaches are based on approximating game-theoretic

solution concepts such as Nash equilibrium, which have strong theoretical

guarantees in two-player zero-sum games, but no guarantees in non-zero-sum

games or in games with more than two players. We describe an agent that is able

to defeat a variety of realistic opponents using an exact Nash equilibrium

strategy in a 3-player imperfect-information game. This shows that, despite a

lack of theoretical guarantees, agents based on Nash equilibrium strategies can

be successful in multiplayer games after all.

### Efficient Model Identification for Tensegrity Locomotion

Shaojun Zhu , David Surovik , Kostas E. Bekris , Abdeslam Boularias **Subjects** : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

This paper aims to identify in a practical manner unknown physical

parameters, such as mechanical models of actuated robot links, which are

critical in dynamical robotic tasks. Key features include the use of an

off-the-shelf physics engine and the Bayesian optimization framework. The task

being considered is locomotion with a high-dimensional, compliant Tensegrity

robot. A key insight, in this case, is the need to project the model

identification challenge into an appropriate lower dimensional space for

efficiency. Comparisons with alternatives indicate that the proposed method can

identify the parameters more accurately within the given time budget, which

also results in more precise locomotion control.

## Information Retrieval

### DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction

Ruiming Tang ,

Yunming Ye ,

Zhenguo Li ,

Xiuqiang He ,

Zhenhua Dong

Comments: 14 pages. arXiv admin note: text overlap with arXiv:1703.04247

**Subjects**

:

Information Retrieval (cs.IR)

; Learning (cs.LG); Machine Learning (stat.ML)

Learning sophisticated feature interactions behind user behaviors is critical

in maximizing CTR for recommender systems. Despite great progress, existing

methods have a strong bias towards low- or high-order interactions, or rely on

expert feature engineering. In this paper, we show that it is possible to

derive an end-to-end learning model that emphasizes both low- and high-order

feature interactions. The proposed framework, DeepFM, combines the power of

factorization machines for recommendation and deep learning for feature

learning in a new neural network architecture. Compared to the latest Wide &

Deep model from Google, DeepFM has a shared raw feature input to both its

“wide” and “deep” components, with no need for feature engineering besides raw

features. DeepFM, as a general learning framework, can incorporate various

network architectures in its deep component. In this paper, we study two

instances of DeepFM where its “deep” component is a DNN or a PNN, respectively,

which we denote as DeepFM-D and DeepFM-P. Comprehensive experiments are

conducted to demonstrate the effectiveness of DeepFM-D and DeepFM-P over the

existing models for CTR prediction, on both benchmark data and commercial data.

We conduct online A/B test in Huawei App Market, which reveals that DeepFM-D

leads to more than 10% improvement of click-through rate in the production

environment, compared to a well-engineered LR model. We also describe our

practice in deploying the framework in the Huawei App Market.
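
The way the FM term and the deep component share the same embeddings can be sketched in a few lines. The shapes, the toy MLP, and the sigmoid output below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fm_second_order(embeddings):
    """Second-order factorization-machine term over field embeddings,
    via the sum-square minus square-sum trick. embeddings: (num_fields, k)."""
    sum_sq = np.square(embeddings.sum(axis=0))
    sq_sum = np.square(embeddings).sum(axis=0)
    return 0.5 * float((sum_sq - sq_sum).sum())

def deepfm_score(embeddings, w_linear, x_linear, deep_fn):
    """DeepFM-style score: linear term + FM interactions + a deep component
    applied to the same shared embeddings (deep_fn is any callable MLP)."""
    linear = float(np.dot(w_linear, x_linear))
    deep = float(deep_fn(embeddings.reshape(-1)))
    logit = linear + fm_second_order(embeddings) + deep
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 4))     # 5 fields, embedding size 4 (shared input)
w   = rng.standard_normal(5)
x   = np.ones(5)                      # one active feature per field
toy_mlp = lambda v: np.tanh(v).sum()  # stand-in for the DNN / PNN component
print(deepfm_score(emb, w, x, toy_mlp))
```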

### RIPEx: Extracting malicious IP addresses from security forums using cross-forum learning

Evangelos E. Papalexakis ,

Michalis Faloutsos

Comments: 12 pages, Accepted in the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018

**Subjects**

:

Information Retrieval (cs.IR)

; Learning (cs.LG)

Is it possible to extract malicious IP addresses reported in security forums

in an automatic way? This is the question at the heart of our work. We focus on

security forums, where security professionals and hackers share knowledge and

information, and often report misbehaving IP addresses. So far, there have only

been a few efforts to extract information from such security forums. We propose

RIPEx, a systematic approach to identify and label IP addresses in security

forums by utilizing a cross-forum learning method. In more detail, the

challenge is twofold: (a) identifying IP addresses from other numerical

entities, such as software version numbers, and (b) classifying the IP address

as benign or malicious. We propose an integrated solution that tackles both

these problems. A novelty of our approach is that it does not require training

data for each new forum. Our approach performs knowledge transfer across forums: we

use a classifier from our source forums to identify seed information for

training a classifier on the target forum. We evaluate our method using data

collected from five security forums with a total of 31K users and 542K posts.

First, RIPEx can distinguish IP addresses from other numeric expressions with 95%

precision and above 93% recall on average. Second, RIPEx identifies malicious

IP addresses with an average precision of 88% and over 78% recall, using our

cross-forum learning. Our work is a first step towards harnessing the wealth of

useful information that can be found in security forums.
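
The first sub-problem (telling IP addresses apart from other dotted numbers) can be sketched with a regular expression plus a range check. The regex, example post, and filter below are illustrative and only stand in for the paper's learned, cross-forum classifier, which also handles the genuinely ambiguous cases.

```python
import re

CANDIDATE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

def looks_like_ip(token):
    """Reject dotted numbers whose parts fall outside the 0-255 octet range,
    e.g. build or version identifiers; ambiguous quads would be left to a
    trained classifier."""
    return all(0 <= int(part) <= 255 for part in token.split("."))

def extract_candidate_ips(post):
    return [m for m in CANDIDATE.findall(post) if looks_like_ip(m)]

post = "Block 203.0.113.42, it keeps probing port 22. We run build 10.4.301.7 on the box."
print(extract_candidate_ips(post))  # ['203.0.113.42'] - the build number is filtered out
```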

### Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

Andre Greiner-Petter ,

Philipp Scharpf ,

Norman Meuschke ,

Howard Cohl ,

Bela Gipp

Comments: 10 pages, 4 figures

Journal-ref: Proceedings of the ACM/IEEE-CS Joint Conference on Digital

Libraries (JCDL), Jun. 2018, Fort Worth, USA

**Subjects**

:

Digital Libraries (cs.DL)

; Information Retrieval (cs.IR)

Mathematical formulae represent complex semantic information in a concise

form. Especially in Science, Technology, Engineering, and Mathematics,

mathematical formulae are crucial to communicate information, e.g., in

scientific papers, and to perform computations using computer algebra systems.

Enabling computers to access the information encoded in mathematical formulae

requires machine-readable formats that can represent both the presentation and

content, i.e., the semantics, of formulae. Exchanging such information between

systems additionally requires conversion methods for mathematical

representation formats. We analyze how the semantic enrichment of formulae

improves the format conversion process and show that considering the textual

context of formulae reduces the error rate of such conversions. Our main

contributions are: (1) providing an openly available benchmark dataset for the

mathematical format conversion task consisting of a newly created test

collection, an extensive, manually curated gold standard and task-specific

evaluation metrics; (2) performing a quantitative evaluation of

state-of-the-art tools for mathematical format conversions; (3) presenting a

new approach that considers the textual context of formulae to reduce the error

rate for mathematical format conversions. Our benchmark dataset facilitates

future research on mathematical format conversions as well as research on many

problems in mathematical information retrieval. Because we annotated and linked

all components of formulae, e.g., identifiers, operators and other entities, to

Wikidata entries, the gold standard can, for instance, be used to train methods

for formula concept discovery and recognition. Such methods can then be applied

to improve mathematical information retrieval systems, e.g., for semantic

formula search, recommendation of mathematical content, or detection of

mathematical plagiarism.

### Affective Recommendation System for Tourists by Using Emotion Generating Calculations

Issei Tachibana

Comments: 6 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1804.02657 and arXiv:1804.03994

Journal-ref: Proc. of IEEE 7th International Workshop on Computational

Intelligence and Applications (IWCIA2014)

**Subjects**

:

Human-Computer Interaction (cs.HC)

; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

An emotion-oriented intelligent interface consists of Emotion Generating
Calculations (EGC) and a Mental State Transition Network (MSTN). We have
developed an Android EGC application in which an agent evaluates the feelings
expressed in a conversation. In this paper, we develop a tourist information
system that estimates the user's feelings at a sightseeing spot and recommends
sightseeing spots and local food corresponding to those feelings. The system
computes the recommendation list with an estimate function that combines Google
search results, the importance of a term on the sightseeing website, and the
emotion aroused by EGC. To show the effectiveness of the system, this paper
reports experimental results for several situations during sightseeing in
Hiroshima.
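
A minimal Python sketch of how such an estimate function could combine the three signals named in the abstract. The weights, field names, and example spots are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): combine the three signals the
# abstract mentions -- search-result counts, term importance on the sightseeing
# website, and the emotion aroused by EGC -- into one recommendation score.

def estimate_score(search_hits, term_importance, egc_emotion, weights=(0.3, 0.3, 0.4)):
    """Weighted combination of three normalized signals, each assumed in [0, 1]."""
    w_search, w_term, w_emotion = weights
    return w_search * search_hits + w_term * term_importance + w_emotion * egc_emotion

spots = {
    "Itsukushima Shrine": estimate_score(0.9, 0.8, 0.7),
    "Shukkei-en Garden":  estimate_score(0.4, 0.6, 0.9),
}
recommendations = sorted(spots, key=spots.get, reverse=True)
print(recommendations)
```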

### Distributed Collaborative Hashing and Its Applications in Ant Financial

Chaochao Chen , Ziqi Liu , Peilin Zhao , Longfei Li , Jun Zhou , Xiaolong Li **Subjects** : Learning (cs.LG) ; Information Retrieval (cs.IR); Machine Learning (stat.ML)

Collaborative filtering, especially latent factor model, has been popularly

used in personalized recommendation. Latent factor model aims to learn user and

item latent factors from user-item historic behaviors. To apply it to real
big data scenarios, efficiency becomes the first concern, including offline
model training efficiency and online recommendation efficiency. In this paper,
we propose a Distributed Collaborative Hashing (DCH) model which can
significantly improve both efficiencies. Specifically, we first propose a
distributed learning framework, following the state-of-the-art parameter server
paradigm, to learn the offline collaborative model. Our model can be learnt
efficiently by distributedly computing subgradients in minibatches on workers
and updating model parameters on servers asynchronously. We then adopt a
hashing technique to speed up the online recommendation procedure.
Recommendations can be quickly made through exploiting lookup hash tables. We
conduct thorough experiments on two real large-scale datasets. The experimental
results demonstrate that, compared with the classic and state-of-the-art
(distributed) latent factor models, DCH has comparable performance in terms of
recommendation accuracy while offering both fast convergence in the offline
model training procedure and real-time efficiency in the online recommendation
procedure.

Furthermore, the encouraging performance of DCH is also shown for several

real-world applications in Ant Financial.
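
A conceptual Python sketch of hashing-based retrieval of the kind the abstract describes: latent factors are binarized and items are ranked by Hamming distance. DCH itself builds lookup hash tables; the brute-force scan and sign-based binarization below are simplifying assumptions for illustration.

```python
import numpy as np

# Conceptual sketch (not the DCH implementation): binarize learned user/item
# latent factors by sign, then rank items for a user by Hamming distance
# between binary codes. DCH uses lookup hash tables; brute force is used here.

rng = np.random.default_rng(0)
user_factors = rng.standard_normal((100, 32))    # toy latent factors
item_factors = rng.standard_normal((5000, 32))

user_codes = (user_factors > 0).astype(np.uint8)
item_codes = (item_factors > 0).astype(np.uint8)

def recommend(user_id, top_k=10):
    # Hamming distance = number of differing bits between the binary codes.
    dists = (user_codes[user_id] ^ item_codes).sum(axis=1)
    return np.argsort(dists)[:top_k]

print(recommend(0))
```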

## Computation and Language

### Pieces of Eight: 8-bit Neural Machine Translation

Miguel Ballesteros

Comments: To appear at NAACL 2018 Industry Track

**Subjects**

:

Computation and Language (cs.CL)

Neural machine translation has achieved levels of fluency and adequacy that

would have been surprising a short time ago. Output quality is extremely
relevant for industry purposes; however, it is equally important to produce

results in the shortest time possible, mainly for latency-sensitive

applications and to control cloud hosting costs. In this paper we show the

effectiveness of translating with 8-bit quantization for models that have been

trained using 32-bit floating point values. Results show that 8-bit translation

makes a non-negligible impact in terms of speed with no degradation in accuracy

and adequacy.
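
A small Python sketch of generic post-training 8-bit quantization of float32 weights. This is a standard symmetric scheme given for orientation; the paper's exact quantizer may differ.

```python
import numpy as np

# Sketch of post-training 8-bit quantization of float32 weights (a generic
# symmetric scheme; not necessarily the scheme used in the paper).

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```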

### Incorporating Dictionaries into Deep Neural Networks for the Chinese Clinical Named Entity Recognition

Yuhang Xia ,

Yangming Zhou ,

Tong Ruan ,

Daqi Gao ,

Ping He

Comments: 21 pages, 6 figures

**Subjects**

:

Computation and Language (cs.CL)

Clinical Named Entity Recognition (CNER) aims to identify and classify

clinical terms such as diseases, symptoms, treatments, exams, and body parts in

electronic health records, which is a fundamental and crucial task for clinical

and translational research. In recent years, deep neural networks have achieved

significant success in named entity recognition and many other Natural Language

Processing (NLP) tasks. Most of these algorithms are trained end to end, and

can automatically learn features from large scale labeled datasets. However,

these data-driven methods typically lack the capability of processing rare or

unseen entities. Previous statistical methods and feature engineering practice

have demonstrated that human knowledge can provide valuable information for

handling rare and unseen cases. In this paper, we address the problem by

incorporating dictionaries into deep neural networks for the Chinese CNER task.

Two different architectures that extend the Bi-directional Long Short-Term

Memory (Bi-LSTM) neural network and five different feature representation

schemes are proposed to handle the task. Computational results on the CCKS-2017

Task 2 benchmark dataset show that the proposed method achieves highly
competitive performance compared with state-of-the-art deep learning
methods.
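
A Python sketch of one simple dictionary feature of the kind that could be concatenated to character embeddings before a Bi-LSTM: per-character flags for beginning or lying inside a dictionary entry. The toy dictionary and the specific scheme are assumptions; the paper's five feature representation schemes are not reproduced here.

```python
# Sketch of a simple dictionary feature for character-level Chinese NER:
# flag whether each character begins or lies inside a dictionary entry.
# One generic scheme, not a reproduction of the paper's five schemes.

dictionary = {"高血压", "糖尿病", "胸痛"}     # toy clinical dictionary
max_len = max(len(w) for w in dictionary)

def dict_features(sentence):
    feats = [[0, 0] for _ in sentence]       # [begins_entry, inside_entry]
    for i in range(len(sentence)):
        for j in range(i + 1, min(len(sentence), i + max_len) + 1):
            if sentence[i:j] in dictionary:
                feats[i][0] = 1
                for k in range(i + 1, j):
                    feats[k][1] = 1
    return feats

print(dict_features("患者有高血压病史"))
```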

### An Ontology-Based Dialogue Management System for Banking and Finance Dialogue Systems

Comments: 9 pages, 27 figures, goes to 1st Financial Narrative Processing Workshop @ LREC 7-12 May 2018, Miyazaki, Japan

**Subjects**

:

Computation and Language (cs.CL)

Keeping the dialogue state in dialogue systems is a notoriously difficult

task. We introduce an ontology-based dialogue manager (OntoDM), a dialogue

manager that keeps the state of the conversation, provides a basis for anaphora

resolution and drives the conversation via domain ontologies. The banking and

finance area promises great potential for disambiguating the context via a rich

set of products and specificity of proper nouns, named entities and verbs. We

used ontologies both as a knowledge base and a basis for the dialogue manager;

the knowledge base component and dialogue manager components coalesce in a

sense. Domain knowledge is used to track Entities of Interest, i.e. nodes

(classes) of the ontology which happen to be products and services. In this way

we also introduced conversation memory and attention in a sense. We finely

blended linguistic methods, domain-driven keyword ranking and domain ontologies

to create ways of domain-driven conversation. The proposed framework is used in our

in-house German language banking and finance chatbots. General challenges of

German language processing and finance-banking domain chatbot language models

and lexicons are also introduced. This work is still in progress, hence no

success metrics have been introduced yet.

### Per-Corpus Configuration of Topic Modelling for GitHub and Stack Overflow Collections

Christoph Treude , Markus Wagner **Subjects** : Computation and Language (cs.CL) ; Neural and Evolutionary Computing (cs.NE)

To make sense of large amounts of textual data, topic modelling is frequently

used as a text-mining tool for the discovery of hidden semantic structures in

text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model

that aims to explain the structure of a corpus by grouping texts. LDA requires

multiple parameters to work well, and there are only rough and sometimes

conflicting guidelines available on how these parameters should be set. In this

paper, we contribute (i) a broad study of parameters to arrive at good local

optima, (ii) an a-posteriori characterisation of text corpora related to eight

programming languages from GitHub and Stack Overflow, and (iii) an analysis of

corpus feature importance via per-corpus LDA configuration.
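
A short Python sketch of a per-corpus LDA parameter sweep using scikit-learn, scored here by held-out-free training perplexity. The toy corpus, grid, and quality measure are assumptions for illustration and do not reproduce the study's actual corpora or search space.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Sketch of a per-corpus LDA parameter sweep (toy corpus and grid only).
docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:500]
X = CountVectorizer(max_features=2000, stop_words="english").fit_transform(docs)

best = None
for n_topics in (10, 20, 40):
    for alpha in (0.1, 0.5):
        lda = LatentDirichletAllocation(n_components=n_topics, doc_topic_prior=alpha,
                                        random_state=0).fit(X)
        score = lda.perplexity(X)            # lower is better
        if best is None or score < best[0]:
            best = (score, n_topics, alpha)
print("best (perplexity, n_topics, alpha):", best)
```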

## Distributed, Parallel, and Cluster Computing

### On the Efficiency of Localized Work Stealing

Charles E. Leiserson ,

Tao B. Schardl

Comments: 13 pages, 1 figure

Journal-ref: Information Processing Letters, 116(2):100-106 (2016)

**Subjects**

:

Distributed, Parallel, and Cluster Computing (cs.DC)

; Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

This paper investigates a variant of the work-stealing algorithm that we call

the localized work-stealing algorithm. The intuition behind this variant is

that because of locality, processors can benefit from working on their own

work. Consequently, when a processor is free, it makes a steal attempt to get

back its own work. We call this type of steal a steal-back. We show that the

expected running time of the algorithm is \(T_1/P+O(T_\infty P)\), and that under
the “even distribution of free agents assumption”, the expected running time of
the algorithm is \(T_1/P+O(T_\infty \lg P)\). In addition, we obtain another
running-time bound based on ratios between the sizes of serial tasks in the
computation. If \(M\) denotes the maximum ratio between the largest and the
smallest serial tasks of a processor after removing a total of \(O(P)\) serial
tasks across all processors from consideration, then the expected running time
of the algorithm is \(T_1/P+O(T_\infty M)\).

### A Scalable Shared-Memory Parallel Simplex for Large-Scale Linear Programming

Demetrios Coutinho , Samuel Xavier-de-Souza , Daniel Aloise **Subjects** : Distributed, Parallel, and Cluster Computing (cs.DC)

We present a shared-memory parallel implementation of the Simplex tableau

algorithm for dense large-scale Linear Programming (LP) problems. We present

the general scheme and explain each parallelization step of the standard

simplex algorithm, emphasizing important solutions for solving performance

bottlenecks. We analyzed the speedup and the parallel efficiency for the

proposed implementation relative to the standard Simplex algorithm using a

shared-memory system with 64 processing cores. The experiments were performed

for several different problems, with up to 8192 variables and constraints, in

their primal and dual formulations. The results show that performance is
generally much better when we use the formulation with more variables than
inequality constraints. They also show that the parallelization strategies
applied to avoid bottlenecks allowed the implementation to scale well with the
problem size and the core count, up to a certain problem size. Further
analysis showed that this limit was an effect of resource limitation. Even so,
our implementation was able to reach speedups on the order of 19x.

### Mitigating Docker Security Issues

Comments: 11 pages

**Subjects**

:

Cryptography and Security (cs.CR)

; Distributed, Parallel, and Cluster Computing (cs.DC)

It is very easy to run applications in Docker. Docker offers an ecosystem that
provides a platform for packaging, distributing and managing applications
within containers. However, the Docker platform is not yet mature. Presently,
Docker is less secure than virtual machines (VMs) and most other cloud
technologies. A key reason for Docker's inadequate security is that containers
share the Linux kernel, which creates a risk of privilege escalation. This
research outlines major security vulnerabilities in Docker and countermeasures
to neutralize such attacks. Security attacks come in several varieties,
including insider and outsider attacks; this research outlines both types and
their mitigation strategies, since taking precautionary measures can avert
serious incidents. This research also presents Docker secure deployment
guidelines, which suggest different configurations for deploying Docker
containers in a more secure way.

### μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

Tal Ben-Nun ,

Torsten Hoefler ,

Satoshi Matsuoka

Comments: 11 pages, 14 figures. Part of the content have been published in IPSJ SIG Technical Report, Vol. 2017-HPC-162, No. 22, pp. 1-9, 2017. (DOI: this http URL )

**Subjects**

:

Learning (cs.LG)

; Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used

in deep learning. Specifically, cuDNN implements several equivalent convolution

algorithms, whose performance and memory footprint may vary considerably,

depending on the layer dimensions. When an algorithm is automatically selected

by cuDNN, the decision is performed on a per-layer basis, and thus it often

resorts to slower algorithms that fit the workspace size constraints. We

present μ-cuDNN, a transparent wrapper library for cuDNN, which divides
layers’ mini-batch computation into several micro-batches. Based on Dynamic
Programming and Integer Linear Programming, μ-cuDNN enables faster
algorithms by decreasing the workspace requirements. At the same time,
μ-cuDNN keeps the computational semantics unchanged, so that it decouples
statistical efficiency from the hardware efficiency safely. We demonstrate the
effectiveness of μ-cuDNN over two frameworks, Caffe and TensorFlow,

achieving speedups of 1.63x for AlexNet and 1.21x for ResNet-18 on P100-SXM2

GPU. These results indicate that using micro-batches can seamlessly increase

the performance of deep learning, while maintaining the same memory footprint.
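
A toy Python sketch of the core idea for a single layer: choose a micro-batch size and convolution algorithm that minimize time while fitting a workspace limit. The algorithm names, timings, and workspace sizes are made up; the real tool optimizes all layers jointly with dynamic programming and ILP.

```python
# Toy sketch of micro-batching: for one layer, pick a micro-batch size and
# convolution algorithm minimizing time under a workspace limit. All numbers
# are invented; this is not the library's actual optimization.

MINI_BATCH = 256
WORKSPACE_LIMIT = 64 << 20           # 64 MiB

# (name, time per sample in microseconds, workspace bytes per sample)
ALGOS = [("implicit_gemm", 9.0, 0), ("fft_tiling", 4.0, 1 << 20), ("winograd", 3.0, 4 << 20)]

best = None
for micro in (256, 128, 64, 32):
    for name, t, ws_per_sample in ALGOS:
        if micro * ws_per_sample > WORKSPACE_LIMIT:
            continue                  # algorithm does not fit at this micro-batch size
        total_time = MINI_BATCH * t   # smaller micro-batches only help by unlocking faster algorithms
        if best is None or total_time < best[0]:
            best = (total_time, micro, name)
print("chosen (time_us, micro_batch, algorithm):", best)
```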

### MPSM: Multi-prospective PaaS Security Model

Robail Yasrab **Subjects** : Cryptography and Security (cs.CR) ; Distributed, Parallel, and Cluster Computing (cs.DC)

Cloud computing has brought a revolution in the field of information

technology and improved the efficiency of computational resources. It offers

computing as a service enabling huge cost and resource efficiency. Despite its

advantages, certain security issues still hinder organizations and enterprises

from adopting it. This study mainly focused on the security of

Platform-as-a-Service (PaaS) as well as the most critical security issues that

were documented regarding PaaS infrastructure. The prime outcome of this study

was a security model proposed to mitigate security vulnerabilities of PaaS.

This security model consists of a number of tools, techniques and guidelines to

mitigate and neutralize security issues of PaaS. The security vulnerabilities

along with mitigation strategies were discussed to offer a deep insight into

PaaS security for both vendor and client that may facilitate future design to

implement secure PaaS platforms.

### Asynchronous Parallel Sampling Gradient Boosting Decision Tree

Cheng Daning , Xia Fen , Li Shigang , Zhang Yunquan **Subjects** : Learning (cs.LG) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

With the development of big data technology, the Gradient Boosting Decision
Tree (GBDT) has become one of the most important machine learning algorithms
owing to its accurate output. However, training a GBDT requires substantial
computational resources and time. To accelerate GBDT training, this paper
proposes the asynchronous parallel sampling gradient boosting decision tree
(asynch-SGBDT). By introducing sampling, we recast the numerical optimization
of traditional GBDT training as a stochastic optimization process and use
asynchronous parallel stochastic gradient descent to accelerate training. We
also provide a theoretical analysis of asynch-SGBDT. Experimental results show
that asynch-SGBDT accelerates GBDT training, and our asynchronous parallel
strategy achieves an almost linear speedup, especially for high-dimensional
sparse datasets.

## Learning

### Representing smooth functions as compositions of near-identity functions with implications for deep network optimization

Peter L. Bartlett , Steven N. Evans , Philip M. Long **Subjects** : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Statistics Theory (math.ST); Machine Learning (stat.ML)

We show that any smooth bi-Lipschitz \(h\) can be represented exactly as a
composition \(h_m \circ \ldots \circ h_1\) of functions \(h_1,\ldots,h_m\) that are close
to the identity in the sense that each \(\left(h_i - \mathrm{Id}\right)\) is
Lipschitz, and the Lipschitz constant decreases inversely with the number \(m\)
of functions composed. This implies that \(h\) can be represented to any accuracy
by a deep residual network whose nonlinear layers compute functions with a
small Lipschitz constant. Next, we consider nonlinear regression with a
composition of near-identity nonlinear maps. We show that, regarding Fréchet
derivatives with respect to the \(h_1,\ldots,h_m\), any critical point of a
quadratic criterion in this near-identity region must be a global minimizer. In
contrast, if we consider derivatives with respect to parameters of a fixed-size
residual network with sigmoid activation functions, we show that there are
near-identity critical points that are suboptimal, even in the realizable case.

Informally, this means that functional gradient methods for residual networks

cannot get stuck at suboptimal critical points corresponding to near-identity

layers, whereas parametric gradient methods for sigmoidal residual networks

suffer from suboptimal critical points in the near-identity region.
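
A small numerical Python sketch of the near-identity picture: a residual network whose per-layer residual maps have Lipschitz constant on the order of \(1/m\). This is illustrative only and is not the paper's construction or proof.

```python
import numpy as np

# Numerical sketch of near-identity residual layers x -> x + g_i(x), where
# each g_i is forced to have Lipschitz constant about 1/m. Illustrative only.

rng = np.random.default_rng(0)
m, d = 50, 3                                 # number of layers, input dimension

def make_layer(scale):
    W = rng.standard_normal((d, d))
    W *= scale / np.linalg.norm(W, 2)        # set the spectral norm (Lipschitz constant) to `scale`
    b = 0.1 * rng.standard_normal(d)
    return lambda x: x + np.tanh(W @ x + b)  # near-identity residual layer

layers = [make_layer(scale=1.0 / m) for _ in range(m)]

x = rng.standard_normal(d)
y = x
for h in layers:
    y = h(y)
print("input:", x, "\noutput of 50 near-identity layers:", y)
```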

### Distributed Collaborative Hashing and Its Applications in Ant Financial

Chaochao Chen , Ziqi Liu , Peilin Zhao , Longfei Li , Jun Zhou , Xiaolong Li **Subjects** : Learning (cs.LG) ; Information Retrieval (cs.IR); Machine Learning (stat.ML)

Collaborative filtering, especially latent factor model, has been popularly

used in personalized recommendation. Latent factor model aims to learn user and

item latent factors from user-item historic behaviors. To apply it to real
big data scenarios, efficiency becomes the first concern, including offline
model training efficiency and online recommendation efficiency. In this paper,
we propose a Distributed Collaborative Hashing (DCH) model which can
significantly improve both efficiencies. Specifically, we first propose a
distributed learning framework, following the state-of-the-art parameter server
paradigm, to learn the offline collaborative model. Our model can be learnt
efficiently by distributedly computing subgradients in minibatches on workers
and updating model parameters on servers asynchronously. We then adopt a
hashing technique to speed up the online recommendation procedure.
Recommendations can be quickly made through exploiting lookup hash tables. We
conduct thorough experiments on two real large-scale datasets. The experimental
results demonstrate that, compared with the classic and state-of-the-art
(distributed) latent factor models, DCH has comparable performance in terms of
recommendation accuracy while offering both fast convergence in the offline
model training procedure and real-time efficiency in the online recommendation
procedure.

Furthermore, the encouraging performance of DCH is also shown for several

real-world applications in Ant Financial.

### Scalable and Interpretable One-class SVMs with Deep Learning and Random Fourier features

Ngo Anh Vien

Comments: Submitted to ECML-PKDD 2018

**Subjects**

:

Learning (cs.LG)

; Machine Learning (stat.ML)

One-class Support Vector Machine (OC-SVM) has long been one of the most
effective anomaly detection methods and is widely adopted in both research and
industrial applications. Its biggest issue, however, is its limited ability to
operate on large, high-dimensional datasets due to inefficient features and
optimization complexity. Those problems might be mitigated via dimensionality
reduction techniques such as manifold learning or autoencoders. However,
previous work often treats representation learning and anomaly prediction
separately. In this paper, we propose an autoencoder-based one-class SVM
(AE-1SVM) that brings OC-SVM into the deep learning context: random Fourier
features approximate the radial basis kernel, the model is combined with a
representation learning architecture, and stochastic gradient descent is used
for end-to-end training. Interestingly, this also opens up the possible use of
gradient-based attribution methods to explain the decision making for anomaly
detection, which has long been challenging as a result of the implicit mappings
between the input space and the kernel space.

To the best of our knowledge, this is the first work to study the

interpretability of deep learning in anomaly detection. We evaluate our method

on a wide range of unsupervised anomaly detection tasks in which our end-to-end

training architecture achieves a performance significantly better than the

previous work using separate training.
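
A short Python sketch of the random Fourier feature approximation to the RBF kernel, which the abstract names as the bridge between OC-SVM and gradient-based training. The dimensions and bandwidth below are arbitrary choices, and this does not reproduce the AE-1SVM architecture.

```python
import numpy as np

# Sketch of random Fourier features approximating the RBF kernel
# k(x, y) = exp(-gamma * ||x - y||^2). Dimensions and gamma are arbitrary.

rng = np.random.default_rng(0)
d, D, gamma = 20, 2000, 0.5

W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))   # spectral density of the RBF kernel
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.standard_normal(d), rng.standard_normal(d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = rff(x) @ rff(y)
print(f"exact kernel {exact:.4f} vs RFF approximation {approx:.4f}")
```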

### μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

Tal Ben-Nun ,

Torsten Hoefler ,

Satoshi Matsuoka

Comments: 11 pages, 14 figures. Part of the content have been published in IPSJ SIG Technical Report, Vol. 2017-HPC-162, No. 22, pp. 1-9, 2017. (DOI: this http URL )

**Subjects**

:

Learning (cs.LG)

; Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used

in deep learning. Specifically, cuDNN implements several equivalent convolution

algorithms, whose performance and memory footprint may vary considerably,

depending on the layer dimensions. When an algorithm is automatically selected

by cuDNN, the decision is performed on a per-layer basis, and thus it often

resorts to slower algorithms that fit the workspace size constraints. We

present μ-cuDNN, a transparent wrapper library for cuDNN, which divides
layers’ mini-batch computation into several micro-batches. Based on Dynamic
Programming and Integer Linear Programming, μ-cuDNN enables faster
algorithms by decreasing the workspace requirements. At the same time,
μ-cuDNN keeps the computational semantics unchanged, so that it decouples
statistical efficiency from the hardware efficiency safely. We demonstrate the
effectiveness of μ-cuDNN over two frameworks, Caffe and TensorFlow,

achieving speedups of 1.63x for AlexNet and 1.21x for ResNet-18 on P100-SXM2

GPU. These results indicate that using micro-batches can seamlessly increase

the performance of deep learning, while maintaining the same memory footprint.

### Distribution Regression Network

Connie Kou , Hwee Kuan Lee , Teck Khim Ng **Subjects** : Learning (cs.LG) ; Machine Learning (stat.ML)

We introduce our Distribution Regression Network (DRN) which performs

regression from input probability distributions to output probability

distributions. Compared to existing methods, DRN learns with fewer model

parameters and easily extends to multiple input and multiple output

distributions. On synthetic and real-world datasets, DRN performs similarly or

better than the state-of-the-art. Furthermore, DRN generalizes the conventional

multilayer perceptron (MLP). In the framework of MLP, each node encodes a real

number, whereas in DRN, each node encodes a probability distribution.

### MOVI: A Model-Free Approach to Dynamic Fleet Management

Takuma Oda , Carlee Joe-Wong **Subjects** : Learning (cs.LG) ; Machine Learning (stat.ML)

Modern vehicle fleets, e.g., for ridesharing platforms and taxi companies,

can reduce passengers’ waiting times by proactively dispatching vehicles to

locations where pickup requests are anticipated in the future. Yet it is

unclear how to best do this: optimal dispatching requires optimizing over

several sources of uncertainty, including vehicles’ travel times to their

dispatched locations, as well as coordinating between vehicles so that they do

not attempt to pick up the same passenger. While prior works have developed

models for this uncertainty and used them to optimize dispatch policies, in

this work we introduce a model-free approach. Specifically, we propose MOVI, a

Deep Q-network (DQN)-based framework that directly learns the optimal vehicle

dispatch policy. Since DQNs scale poorly with a large number of possible

dispatches, we streamline our DQN training and suppose that each individual

vehicle independently learns its own optimal policy, ensuring scalability at

the cost of less coordination between vehicles. We then formulate a centralized

receding-horizon control (RHC) policy to compare with our DQN policies. To

compare these policies, we design and build MOVI as a large-scale realistic

simulator based on 15 million taxi trip records that simulates policy-agnostic

responses to dispatch decisions. We show that the DQN dispatch policy reduces

the number of unserviced requests by 76% compared to no dispatching and by 20%

compared to the RHC approach, emphasizing the benefits of a model-free approach

and suggesting that there is limited value to coordinating vehicle actions.

This finding may help to explain the success of ridesharing platforms, for

which drivers make individual decisions.

### Asynchronous Parallel Sampling Gradient Boosting Decision Tree

Cheng Daning , Xia Fen , Li Shigang , Zhang Yunquan **Subjects** : Learning (cs.LG) ; Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

With the development of big data technology, the Gradient Boosting Decision
Tree (GBDT) has become one of the most important machine learning algorithms
owing to its accurate output. However, training a GBDT requires substantial
computational resources and time. To accelerate GBDT training, this paper
proposes the asynchronous parallel sampling gradient boosting decision tree
(asynch-SGBDT). By introducing sampling, we recast the numerical optimization
of traditional GBDT training as a stochastic optimization process and use
asynchronous parallel stochastic gradient descent to accelerate training. We
also provide a theoretical analysis of asynch-SGBDT. Experimental results show
that asynch-SGBDT accelerates GBDT training, and our asynchronous parallel
strategy achieves an almost linear speedup, especially for high-dimensional
sparse datasets.

### 3D G-CNNs for Pulmonary Nodule Detection

Marysia Winkels , Taco S. Cohen **Subjects** : Learning (cs.LG) ; Machine Learning (stat.ML)

Convolutional Neural Networks (CNNs) require a large amount of annotated data

to learn from, which is often difficult to obtain in the medical domain. In

this paper we show that the sample complexity of CNNs can be significantly

improved by using 3D roto-translation group convolutions (G-Convs) instead of

the more conventional translational convolutions. These 3D G-CNNs were applied

to the problem of false positive reduction for pulmonary nodule detection, and

proved to be substantially more effective in terms of performance, sensitivity

to malignant nodules, and speed of convergence compared to a strong and

comparable baseline architecture with regular convolutions, data augmentation

and a similar number of parameters. For every dataset size tested, the G-CNN

achieved a FROC score close to the CNN trained on ten times more data.

### Machine Learning in Astronomy: A Case Study in Quasar-Star Classification

Suryoday Basak ,

Ariruna Dasgupta ,

Surbhi Agrawal ,

Snehanshu Saha

Comments: 10 pages, 8 figures

**Subjects**

:

Instrumentation and Methods for Astrophysics (astro-ph.IM)

; Learning (cs.LG)

We present the results of various automated classification methods, based on

machine learning (ML), of objects from data releases 6 and 7 (DR6 and DR7) of

the Sloan Digital Sky Survey (SDSS), primarily distinguishing stars from

quasars. We provide a careful scrutiny of approaches available in the

literature and have highlighted the pitfalls in those approaches based on the

nature of data used for the study. The aim is to investigate the

appropriateness of the application of certain ML methods. The manuscript argues

convincingly in favor of the efficacy of asymmetric AdaBoost to classify

photometric data. The paper presents a critical review of existing studies and
puts forward an application of asymmetric AdaBoost as an outcome of that
exercise.

### A Deep Learning Approach to Fast, Format-Agnostic Detection of Malicious Web Content

Joshua Saxe , Richard Harang , Cody Wild , Hillary Sanders **Subjects** : Cryptography and Security (cs.CR) ; Learning (cs.LG); Machine Learning (stat.ML)

Malicious web content is a serious problem on the Internet today. In this

paper we propose a deep learning approach to detecting malevolent web pages.

While past work on web content detection has relied on syntactic parsing or on

emulation of HTML and Javascript to extract features, our approach operates

directly on a language-agnostic stream of tokens extracted directly from static

HTML files with a simple regular expression. This makes it fast enough to

operate in high-frequency data contexts like firewalls and web proxies, and

allows it to avoid the attack surface exposure of complex parsing and emulation

code. Unlike well-known approaches such as bag-of-words models, which ignore

spatial information, our neural network examines content at hierarchical

spatial scales, allowing our model to capture locality and yielding superior

accuracy compared to bag-of-words baselines. Our proposed architecture achieves

a 97.5% detection rate at a 0.1% false positive rate, and classifies

small-batched web pages at a rate of over 100 per second on commodity hardware.

The speed and accuracy of our approach makes it appropriate for deployment to

endpoints, firewalls, and web proxies.
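
A Python sketch of the language-agnostic front end the abstract describes: tokens are pulled from raw HTML with a simple regular expression and hashed into a fixed-size bag. The regular expression, hash function, and bin count are illustrative assumptions, not the authors' choices, and the hierarchical network itself is not shown.

```python
import re
import hashlib
import numpy as np

# Sketch of regex-based token extraction from static HTML plus the hashing
# trick to produce fixed-size inputs. Regex and bin count are assumptions.

TOKEN_RE = re.compile(rb"[\w.%+-]+")     # alphanumeric-ish runs in the raw bytes
NUM_BINS = 1024

def featurize(html_bytes):
    counts = np.zeros(NUM_BINS, dtype=np.float32)
    for tok in TOKEN_RE.findall(html_bytes):
        h = int.from_bytes(hashlib.md5(tok).digest()[:4], "little")
        counts[h % NUM_BINS] += 1.0
    return counts

page = b"<html><script src='http://evil.example/x.js'></script></html>"
print(featurize(page).sum(), "tokens hashed into", NUM_BINS, "bins")
```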

### Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision

Ionut-Teodor Sorodoc ,

Raffaella Bernardi

Comments: 12 pages (references included). To appear in the Proceedings of NAACL-HLT 2018

Journal-ref: Proceedings of NAACL-HLT 2018

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Machine Learning (stat.ML)

The present work investigates whether different quantification mechanisms

(set comparison, vague quantification, and proportional estimation) can be

jointly learned from visual scenes by a multi-task computational model. The

motivation is that, in humans, these processes underlie the same cognitive,

non-symbolic ability, which allows an automatic estimation and comparison of

set magnitudes. We show that when information about lower-complexity tasks is

available, the higher-level proportional task becomes more accurate than when

performed in isolation. Moreover, the multi-task model is able to generalize to

unseen combinations of target/non-target objects. Consistently with behavioral

evidence showing the interference of absolute number in the proportional task,

the multi-task model no longer works when asked to provide the number of target

objects in the scene.

### Connectivity in Random Annulus Graphs and the Geometric Block Model

Sainyam Galhotra , Arya Mazumdar , Soumyabrata Pal , Barna Saha **Subjects** : Discrete Mathematics (cs.DM) ; Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)

Random geometric graphs are the simplest, and perhaps the earliest possible

random graph model of spatial networks, introduced by Gilbert in 1961. In the

most basic setting, a random geometric graph \(G(n,r)\) has \(n\) vertices. Each
vertex of the graph is assigned a real number in \([0,1]\) randomly and
uniformly. There is an edge between two vertices if the corresponding two
random numbers differ by at most \(r\) (to mitigate the boundary effect, let us
consider the Lee distance here, \(d_L(u,v) = \min\{|u-v|, 1-|u-v|\}\)). It is
well-known that the connectivity threshold regime for random geometric graphs
is at \(r \approx \frac{\log n}{n}\). In particular, if \(r = \frac{a\log n}{n}\),
then a random geometric graph is connected with high probability if and only if
\(a > 1\). Consider \(G(n,\frac{(1+\epsilon)\log{n}}{n})\) for any \(\epsilon > 0\) to
satisfy the connectivity requirement and delete half of its edges which have
distance at most \(\frac{\log{n}}{2n}\). It is natural to believe that the
resultant graph will be disconnected. Surprisingly, we show that the graph
still remains connected!

Formally, generalizing random geometric graphs, we define a random annulus
graph \(G(n, [r_1, r_2])\), \(r_1 < r_2\), with \(n\) vertices. Each vertex of the graph
is assigned a real number in \([0,1]\) randomly and uniformly as before. There is
an edge between two vertices if the Lee distance between the corresponding two
random numbers is between \(r_1\) and \(r_2\), \(0 < r_1 < r_2\). Let us assume \(r_1 =
\frac{b \log n}{n}\) and \(r_2 = \frac{a \log n}{n}\), \(0 < b < a\). We show that this
graph is connected with high probability if and only if \(a - b > \frac12\) and \(a
> 1\). That is, \(G(n, [0,\frac{0.99\log n}{n}])\) is not connected but
\(G(n,[\frac{0.50 \log n}{n},\frac{(1+\epsilon)\log n}{n}])\) is.

This result is then used to give improved lower and upper bounds on the

recovery threshold of the geometric block model.
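
A small Python simulation sketch of the object studied here: sample a random annulus graph on \([0,1]\) with the Lee distance and test connectivity with a union-find. The parameters are small and chosen only to illustrate the connectivity regime, not to verify the theorem.

```python
import math
import random

# Simulation sketch: sample a random annulus graph G(n, [r1, r2]) on [0, 1]
# with the Lee distance and test connectivity via union-find. Illustration only.

def lee(u, v):
    d = abs(u - v)
    return min(d, 1 - d)

def connected(n, r1, r2, seed=0):
    random.seed(seed)
    pts = [random.random() for _ in range(n)]
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if r1 <= lee(pts[i], pts[j]) <= r2:
                parent[find(i)] = find(j)   # union the two components
    return len({find(i) for i in range(n)}) == 1

n = 2000
# a = 1.1, b = 0.5 satisfies a > 1 and a - b > 1/2, so connectivity is expected whp.
print("annulus graph connected:", connected(n, 0.5 * math.log(n) / n, 1.1 * math.log(n) / n))
```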

### Roster Evaluation Based on Classifiers for the Nurse Rostering Problem

Roman Václavík , Přemysl Šůcha , Zdeněk Hanzálek **Subjects** : Artificial Intelligence (cs.AI) ; Learning (cs.LG); Optimization and Control (math.OC)

The personnel scheduling problem is a well-known NP-hard combinatorial

problem. Due to the complexity of this problem and the size of the real-world

instances, it is not possible to use exact methods, and thus heuristics,

meta-heuristics, or hyper-heuristics must be employed. The majority of

heuristic approaches are based on iterative search, where the quality of

intermediate solutions must be calculated. Unfortunately, this is

computationally highly expensive because these problems have many constraints

and some are very complex. In this study, we propose a machine learning

technique as a tool to accelerate the evaluation phase in heuristic approaches.

The solution is based on a simple classifier, which is able to determine

whether the changed solution (more precisely, the changed part of the solution)

is better than the original or not. This decision is made much faster than a

standard cost-oriented evaluation process. However, the classification process

cannot guarantee 100% correctness. Therefore, our approach, which is

illustrated using a tabu search algorithm in this study, includes a filtering

mechanism, where the classifier rejects the majority of the potentially bad

solutions and the remaining solutions are then evaluated in a standard manner.

We also show how the boosting algorithms can improve the quality of the final

solution compared with a simple classifier. We verified our proposed approach

and premises, based on standard and real-world benchmark instances, to

demonstrate the significant speedup obtained with comparable solution quality.
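
A schematic Python sketch of the filtering idea inside a local-search loop: a cheap classifier rejects most candidate moves, and only the survivors receive the expensive exact evaluation. The classifier, features, and cost function below are placeholders, not the paper's trained model or roster constraints.

```python
import random

# Schematic sketch of classifier-filtered evaluation in local search.
# `cheap_classifier`, `features`, and `exact_cost` are placeholder stand-ins.

def exact_cost(solution):                       # expensive, constraint-heavy evaluation
    return sum(x * x for x in solution)

def features(old, new):                         # features of the changed part of the solution
    return [n - o for o, n in zip(old, new)]

def cheap_classifier(feats):                    # predicts "new is better than old"
    return sum(feats) < 0                       # toy stand-in for a trained classifier

def local_search(solution, iters=1000):
    cost = exact_cost(solution)
    for _ in range(iters):
        candidate = list(solution)
        i = random.randrange(len(candidate))
        candidate[i] += random.choice((-1, 1))  # a small "move"
        if not cheap_classifier(features(solution, candidate)):
            continue                            # filtered out: skip the expensive evaluation
        c = exact_cost(candidate)               # survivors get the standard evaluation
        if c < cost:
            solution, cost = candidate, c
    return solution, cost

print(local_search([5, -3, 7, 2]))
```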

### Online Fall Detection using Recurrent Neural Networks

Daniele De Martini ,

Nicola Blago ,

Tullio Facchinetti ,

Marco Piastra

Comments: 6 pages, ICRA 2018

**Subjects**

:

Computers and Society (cs.CY)

; Learning (cs.LG); Machine Learning (stat.ML)

Unintentional falls can cause severe injuries and even death, especially if

no immediate assistance is given. The aim of Fall Detection Systems (FDSs) is

to detect an occurring fall. This information can be used to trigger the

necessary assistance in case of injury. This can be done by using either

ambient-based sensors, e.g. cameras, or wearable devices. The aim of this work

is to study the technical aspects of FDSs based on wearable devices and

artificial intelligence techniques, in particular Deep Learning (DL), to

implement an effective algorithm for on-line fall detection. The proposed

classifier is based on a Recurrent Neural Network (RNN) model with underlying

Long Short-Term Memory (LSTM) blocks. The method is tested on the publicly

available SisFall dataset, with extended annotation, and compared with the

results obtained by the SisFall authors.

### DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction

Ruiming Tang ,

Yunming Ye ,

Zhenguo Li ,

Xiuqiang He ,

Zhenhua Dong

Comments: 14 pages. arXiv admin note: text overlap with arXiv:1703.04247

**Subjects**

:

Information Retrieval (cs.IR)

; Learning (cs.LG); Machine Learning (stat.ML)

Learning sophisticated feature interactions behind user behaviors is critical

in maximizing CTR for recommender systems. Despite great progress, existing

methods have a strong bias towards low- or high-order interactions, or rely on

expertise feature engineering. In this paper, we show that it is possible to

derive an end-to-end learning model that emphasizes both low- and high-order

feature interactions. The proposed framework, DeepFM, combines the power of

factorization machines for recommendation and deep learning for feature

learning in a new neural network architecture. Compared to the latest Wide &

Deep model from Google, DeepFM has a shared raw feature input to both its

“wide” and “deep” components, with no need of feature engineering besides raw

features. DeepFM, as a general learning framework, can incorporate various

network architectures in its deep component. In this paper, we study two

instances of DeepFM where its “deep” component is DNN and PNN respectively,

which we denote as DeepFM-D and DeepFM-P. Comprehensive experiments are

conducted to demonstrate the effectiveness of DeepFM-D and DeepFM-P over the

existing models for CTR prediction, on both benchmark data and commercial data.

We conduct online A/B test in Huawei App Market, which reveals that DeepFM-D

leads to more than 10% improvement of click-through rate in the production

environment, compared to a well-engineered LR model. We also covered related

practice in deploying our framework in Huawei App Market.
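
A Python sketch of the factorization-machine second-order interaction term computed over shared feature embeddings, which is the "wide" side of a DeepFM-style model; it uses the standard efficient FM identity. Shapes and data are toy values, and the deep component is omitted.

```python
import numpy as np

# Sketch of the FM second-order term over shared embeddings (the "wide" side
# of a DeepFM-style model). Toy shapes and random data; deep part omitted.

rng = np.random.default_rng(0)
n_features, k = 1000, 8
V = rng.normal(scale=0.01, size=(n_features, k))   # shared embedding table

def fm_second_order(active_idx):
    """0.5 * sum_f [ (sum_i v_if)^2 - sum_i v_if^2 ] over active one-hot features."""
    emb = V[active_idx]                            # (num_active, k)
    sum_sq = emb.sum(axis=0) ** 2
    sq_sum = (emb ** 2).sum(axis=0)
    return 0.5 * (sum_sq - sq_sum).sum()

sample = [3, 57, 912]                              # indices of active categorical features
print("FM interaction logit:", fm_second_order(sample))
```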

### Learning Contracting Vector Fields For Stable Imitation Learning

Vikas Sindhwani , Stephen Tu , Mohi Khansari **Subjects** : Robotics (cs.RO) ; Learning (cs.LG); Machine Learning (stat.ML)

We propose a new non-parametric framework for learning incrementally stable

dynamical systems x’ = f(x) from a set of sampled trajectories. We construct a

rich family of smooth vector fields induced by certain classes of matrix-valued

kernels, whose equilibria are placed exactly at a desired set of locations and

whose local contraction and curvature properties at various points can be

explicitly controlled using convex optimization. With curl-free kernels, our

framework may also be viewed as a mechanism to learn potential fields and

gradient flows. We develop large-scale techniques using randomized kernel

approximations in this context. We demonstrate our approach, called contracting

vector fields (CVF), on imitation learning tasks involving complex

point-to-point human handwriting motions.

### The unreasonable effectiveness of the forget gate

Joan Lasenby

Comments: 15 pages, 5 figures

**Subjects**

:

Neural and Evolutionary Computing (cs.NE)

; Learning (cs.LG); Machine Learning (stat.ML)

Given the success of the gated recurrent unit, a natural question is whether

all the gates of the long short-term memory (LSTM) network are necessary.

Previous research has shown that the forget gate is one of the most important

gates in the LSTM. Here we show that a forget-gate-only version of the LSTM
with chrono-initialized biases not only provides computational savings but

outperforms the standard LSTM on multiple benchmark datasets and competes with

some of the best contemporary models. Our proposed network, the JANET, achieves

accuracies of 99% and 92.5% on the MNIST and pMNIST datasets, outperforming the

standard LSTM which yields accuracies of 98.5% and 91%.
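
A NumPy sketch of a forget-gate-only recurrent cell with a chrono-initialized forget bias, following the description in the abstract. The weight shapes and the exact update below are a plausible reading of such a cell, not necessarily the paper's precise equations.

```python
import numpy as np

# Sketch of a forget-gate-only recurrent cell with chrono-initialized forget
# bias. A plausible reading of the abstract, not the paper's exact equations.

rng = np.random.default_rng(0)
n_in, n_hid, T_max = 16, 32, 100

Uf, Wf = rng.normal(scale=0.1, size=(n_hid, n_in)), rng.normal(scale=0.1, size=(n_hid, n_hid))
Uc, Wc = rng.normal(scale=0.1, size=(n_hid, n_in)), rng.normal(scale=0.1, size=(n_hid, n_hid))
bf = np.log(rng.uniform(1, T_max - 1, size=n_hid))   # chrono initialization of the forget bias
bc = np.zeros(n_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, h):
    f = sigmoid(Uf @ x + Wf @ h + bf)                 # forget gate (the only gate)
    c = np.tanh(Uc @ x + Wc @ h + bc)                 # candidate update
    return f * h + (1.0 - f) * c                      # new hidden state

h = np.zeros(n_hid)
for x in rng.standard_normal((T_max, n_in)):          # run over a toy input sequence
    h = step(x, h)
print("final hidden state norm:", np.linalg.norm(h))
```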

### Fast, Parameter free Outlier Identification for Robust PCA

Sheetal Kalyani

Comments: 13 pages. Submitted to IEEE JSTSP Special Issue on Data Science: Robust Subspace Learning and Tracking: Theory, Algorithms, and Applications

**Subjects**

:

Machine Learning (stat.ML)

; Learning (cs.LG)

Robust PCA, the problem of PCA in the presence of outliers, has been
extensively investigated in the last few years. Here we focus on Robust PCA in
the column sparse outlier model. The existing methods for the column sparse
outlier model assume either knowledge of the dimension of the lower-dimensional
subspace or of the fraction of outliers in the system. However, in many

applications knowledge of these parameters is not available. Motivated by this

we propose a parameter free outlier identification method for robust PCA which

a) does not require the knowledge of outlier fraction, b) does not require the

knowledge of the dimension of the underlying subspace, c) is computationally

simple and fast. Further, analytical guarantees are derived for outlier

identification and the performance of the algorithm is compared with the

existing state of the art methods.

### Adversarial Clustering: A Grid Based Clustering Algorithm Against Active Adversaries

Wutao Wei , Bowei Xi , Murat Kantarcioglu **Subjects** : Machine Learning (stat.ML) ; Learning (cs.LG)

Nowadays more and more data are gathered for detecting and preventing cyber

attacks. In cyber security applications, data analytics techniques have to deal

with active adversaries that try to deceive the data analytics models and avoid

being detected. The existence of such adversarial behavior motivates the

development of robust and resilient adversarial learning techniques for various

tasks. Most of the previous work focused on adversarial classification

techniques, which assumed the existence of a reasonably large amount of

carefully labeled data instances. However, in practice, labeling the data

instances often requires costly and time-consuming human expertise and becomes

a significant bottleneck. Meanwhile, a large number of unlabeled instances can

also be used to understand the adversaries’ behavior. To address the above

mentioned challenges, in this paper, we develop a novel grid based adversarial

clustering algorithm. Our adversarial clustering algorithm is able to identify

the core normal regions, and to draw defensive walls around the centers of the

normal objects utilizing game theoretic ideas. Our algorithm also identifies

sub-clusters of attack objects, the overlapping areas within clusters, and

outliers which may be potential anomalies.

### Understanding Community Structure in Layered Neural Networks

Chihiro Watanabe , Kaoru Hiramatsu , Kunio Kashino **Subjects** : Machine Learning (stat.ML) ; Learning (cs.LG)

A layered neural network is now one of the most common choices for the

prediction of high-dimensional practical data sets, where the relationship

between input and output data is complex and cannot be represented well by

simple conventional models. Its effectiveness has been shown in various tasks;
however, the lack of interpretability of the trained result of a layered neural
network has limited its application area.

In our previous studies, we proposed methods for extracting a simplified

global structure of a trained layered neural network by classifying the units

into communities according to their connection patterns with adjacent layers.

These methods provided us with knowledge about the strength of the relationship

between communities from the existence of bundled connections, which are

determined by threshold processing of the connection ratio between pairs of

communities.

However, it has been difficult to understand the role of each community

quantitatively by observing the modular structure. We could only know to which

sets of the input and output dimensions each community was mainly connected, by

tracing the bundled connections from the community to the input and output

layers. Another problem is that the finally obtained modular structure is

changed greatly depending on the setting of the threshold hyperparameter used

for determining bundled connections.

In this paper, we propose a new method for interpreting quantitatively the

role of each community in inference, by defining the effect of each input

dimension on a community, and the effect of a community on each output

dimension. We show experimentally that our proposed method can reveal the role

of each part of a layered neural network by applying the neural networks to

three types of data sets, extracting communities from the trained network, and

applying the proposed method to the community structure.

### RIPEx: Extracting malicious IP addresses from security forums using cross-forum learning

Evangelos E. Papalexakis ,

Michalis Faloutsos

Comments: 12 pages, Accepted in n 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018

**Subjects**

:

Information Retrieval (cs.IR)

; Learning (cs.LG)

Is it possible to extract malicious IP addresses reported in security forums

in an automatic way? This is the question at the heart of our work. We focus on

security forums, where security professionals and hackers share knowledge and

information, and often report misbehaving IP addresses. So far, there have only

been a few efforts to extract information from such security forums. We propose

RIPEx, a systematic approach to identify and label IP addresses in security

forums by utilizing a cross-forum learning method. In more detail, the

challenge is twofold: (a) identifying IP addresses from other numerical

entities, such as software version numbers, and (b) classifying the IP address

as benign or malicious. We propose an integrated solution that tackles both

these problems. A novelty of our approach is that it does not require training

data for each new forum. Our approach does knowledge transfer across forums: we

use a classifier from our source forums to identify seed information for

training a classifier on the target forum. We evaluate our method using data

collected from five security forums with a total of 31K users and 542K posts.

First, RIPEx can distinguish IP addresses from other numeric expressions with 95%

precision and above 93% recall on average. Second, RIPEx identifies malicious

IP addresses with an average precision of 88% and over 78% recall, using our

cross-forum learning. Our work is a first step towards harnessing the wealth of

useful information that can be found in security forums.
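
A Python sketch of the cross-forum seeding loop described above: a classifier trained on labeled source forums labels the most confident target-forum examples, which then seed a classifier trained on the target forum. The features, thresholds, and data below are random stand-ins, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of cross-forum seeding: a source-forum classifier provides confident
# seed labels on the target forum, which train the target-forum classifier.
# Features, thresholds, and data are illustrative stand-ins.

rng = np.random.default_rng(0)
X_src = rng.standard_normal((500, 10))                                   # labeled source forums
y_src = (X_src[:, 0] + 0.3 * rng.standard_normal(500) > 0).astype(int)
X_tgt = rng.standard_normal((300, 10))                                   # unlabeled target forum

source_clf = LogisticRegression(max_iter=1000).fit(X_src, y_src)

proba = source_clf.predict_proba(X_tgt)[:, 1]
confident = (proba > 0.9) | (proba < 0.1)           # keep only high-confidence seed labels
X_seed, y_seed = X_tgt[confident], (proba[confident] > 0.5).astype(int)

target_clf = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)
print("seed examples used on the target forum:", int(confident.sum()))
```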

### Multimodal Unsupervised Image-to-Image Translation

Ming-Yu Liu ,

Serge Belongie ,

Jan Kautz

Comments: Code: this https URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Machine Learning (stat.ML)

Unsupervised image-to-image translation is an important and challenging

problem in computer vision. Given an image in the source domain, the goal is to

learn the conditional distribution of corresponding images in the target

domain, without seeing any pairs of corresponding images. While this

conditional distribution is inherently multimodal, existing approaches make an

overly simplified assumption, modeling it as a deterministic one-to-one

mapping. As a result, they fail to generate diverse outputs from a given source

domain image. To address this limitation, we propose a Multimodal Unsupervised

Image-to-image Translation (MUNIT) framework. We assume that the image

representation can be decomposed into a content code that is domain-invariant,

and a style code that captures domain-specific properties. To translate an

image to another domain, we recombine its content code with a random style code

sampled from the style space of the target domain. We analyze the proposed

framework and establish several theoretical results. Extensive experiments with

comparisons to the state-of-the-art approaches further demonstrate the

advantage of the proposed framework. Moreover, our framework allows users to

control the style of translation outputs by providing an example style image.

Code and pretrained models are available at this https URL

### Network-based protein structural classification

Arash Rahnama , Khalique Newaz , Panos J. Antsaklis , Tijana Milenkovic **Subjects** : Molecular Networks (q-bio.MN) ; Learning (cs.LG); Machine Learning (stat.ML)

Experimental determination of protein function is resource-consuming. As an

alternative, computational prediction of protein function has received

attention. In this context, protein structural classification (PSC) can help,

by allowing for determining structural classes of currently unclassified

proteins based on their features, and then relying on the fact that proteins

with similar structures have similar functions. Existing PSC approaches rely on

sequence-based or direct (“raw”) 3-dimensional (3D) structure-based protein

features. Instead, we first model 3D structures as protein structure networks

(PSNs). Then, we use (“processed”) network-based features for PSC. We are the

first ones to do so. We propose the use of graphlets, state-of-the-art features

in many domains of network science, in the task of PSC. Moreover, because

graphlets can deal only with unweighted PSNs, and because accounting for edge

weights when constructing PSNs could improve PSC accuracy, we also propose a

deep learning framework that automatically learns network features from the

weighted PSNs. When evaluated on a large set of 9,509 CATH and 11,451 SCOP

protein domains, our proposed approaches are superior to existing PSC

approaches in terms of both accuracy and running time.

### Efficient Model Identification for Tensegrity Locomotion

Shaojun Zhu , David Surovik , Kostas E. Bekris , Abdeslam Boularias **Subjects** : Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)

This paper aims to identify in a practical manner unknown physical

parameters, such as mechanical models of actuated robot links, which are

critical in dynamical robotic tasks. Key features include the use of an

off-the-shelf physics engine and the Bayesian optimization framework. The task

being considered is locomotion with a high-dimensional, compliant Tensegrity

robot. A key insight, in this case, is the need to project the model

identification challenge into an appropriate lower dimensional space for

efficiency. Comparisons with alternatives indicate that the proposed method can

identify the parameters more accurately within the given time budget, which

also results in more precise locomotion control.

## Information Theory

### Shifted Coded Slotted ALOHA

Takayuki Nozaki

Comments: 5 pages, 7 figures, submitted to ISITA 2018

**Subjects**

:

Information Theory (cs.IT)

The random access scheme is a fundamental scenario in which users transmit

through a shared channel and cannot coordinate with each other. In recent years,
successive interference cancellation (SIC) was introduced into the random
access scheme. With SIC, it is possible to decode transmitted packets from
collided packets. The coded slotted ALOHA (CSA) is a random access scheme
using SIC. The CSA encodes each packet using a local code prior to
transmission. It is known that the CSA achieves excellent throughput. On the
other hand, it has been reported in coding theory that time shifts improve the
decoding performance of packet-oriented erasure correcting codes. In this

paper, we propose a random access scheme which applies the time shift to the

CSA in order to achieve better throughput. Numerical examples show that our

proposed random access scheme achieves better throughput and packet loss rate

than the CSA.

### Erasure Correcting Codes by Using Shift Operation and Exclusive OR

Takayuki Nozaki

Comments: 6 pages, 1 figure, 3 tables, submitted to ISITA 2018

**Subjects**

:

Information Theory (cs.IT)

This paper proposes an erasure correcting code and its systematic form for

the distributed storage system.

The proposed codes are encoded by exclusive OR and bit-level shift operation.

By the shift operation, the encoded packets are slightly longer than the

source packets.

This paper evaluates the extra length of encoded packets, called overhead,

and shows that the proposed codes have smaller overheads than the zigzag

decodable code, which is an existing code using exclusive OR and bit-level

shift operation.
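
A toy Python sketch of shift-and-XOR encoding: two source packets are combined into parities using bit-level shifts and exclusive OR, and the shifted parity is slightly longer than the sources, which is the overhead the abstract evaluates. The shift amounts and structure below are illustrative and do not reproduce the proposed code or the zigzag decodable code.

```python
# Toy sketch of shift-and-XOR encoding. The shift schedule is illustrative,
# not the paper's construction; note the slightly longer shifted parity.

def shift_xor(packets, shifts):
    """XOR the packets after left-shifting packet i by shifts[i] bits."""
    length = max(len(p) * 8 + s for p, s in zip(packets, shifts))   # parity length in bits
    acc = 0
    for p, s in zip(packets, shifts):
        acc ^= int.from_bytes(p, "big") << s
    return acc.to_bytes((length + 7) // 8, "big")

s1, s2 = b"\xde\xad\xbe\xef", b"\x12\x34\x56\x78"
p1 = shift_xor([s1, s2], [0, 0])     # plain XOR parity (same length as the sources)
p2 = shift_xor([s1, s2], [0, 3])     # shifted parity: 3 extra bits of overhead
print(len(s1), "byte sources ->", len(p1), "and", len(p2), "byte parities")
```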

### Non-binary Code Correcting Single b-Burst of Insertions or Deletions

Takayuki Nozaki

Comments: 5 pages, submitted to ISITA 2018

**Subjects**

:

Information Theory (cs.IT)

This paper constructs a non-binary code correcting a single \(b\)-burst of

insertions or deletions. This paper also proposes a decoding algorithm of this

code and evaluates a lower bound of the cardinality of this code. Moreover, we

evaluate an asymptotic upper bound on the cardinality of codes which can

correct a single burst of insertions or deletions.

### Cooperative Strategies for {UAV}-Enabled Small Cell Networks Sharing Unlicensed Spectrum

Sung Hoon Lim ,

Sang-Woon Jeon ,

Seungjae Baek

Comments: 26 pages, 10 figures

**Subjects**

:

Information Theory (cs.IT)

In this paper, we study an aerial drone base station (DBS) assisted cellular

network that consists of a single ground macro base station (MBS), multiple

DBSs, and multiple ground terminals (GT). We assume that the MBS transmits to

the DBSs and the GTs in the licensed band while the DBSs use a separate

unlicensed band (e.g. Wi-Fi) to transmit to the GTs. For the utilization of the

DBSs, we propose a cooperative decode–forward (DF) protocol in which multiple

DBSs assist the terminals simultaneously while maintaining a predetermined

interference level on the coexisting unlicensed band users. For our network

setup, we formulate a joint optimization problem for minimizing the aggregate

gap between the target rates and the throughputs of terminals by optimizing

over the 3D positions of the DBSs and the resources (power, time, bandwidth) of

the network. To solve the optimization problem, we propose an efficient nested

structured algorithm based on particle swarm optimization and convex

optimization methods. Extensive numerical evaluations of the proposed algorithm
are performed considering various aspects to demonstrate the performance of our
algorithm and the gain from utilizing DBSs.

### 5G Wireless Network Slicing for eMBB, URLLC, and mMTC: A Communication-Theoretic View

Kasper F. Trillingsgaard ,

Osvaldo Simeone ,

Giuseppe Durisi

Comments: Submitted to IEEE

**Subjects**

:

Networking and Internet Architecture (cs.NI)

; Information Theory (cs.IT)

The grand objective of 5G wireless technology is to support services with

vastly heterogeneous requirements. Network slicing, in which each service

operates within an exclusive slice of allocated resources, is seen as a way to

cope with this heterogeneity. However, the shared nature of the wireless

channel allows non-orthogonal slicing, where services use overlapping slices of

resources at the cost of interference. This paper investigates the performance

of orthogonal and non-orthogonal slicing of radio resources for the

provisioning of the three generic services of 5G: enhanced mobile broadband

(eMBB), massive machine-type communications (mMTC), and ultra-reliable

low-latency communications (URLLC). We consider uplink communications from a

set of eMBB, mMTC and URLLC devices to a common base station. A

communication-theoretic model is proposed that accounts for the heterogeneous

requirements and characteristics of the three services. For non-orthogonal

slicing, different decoding architectures are considered, such as puncturing

and successive interference cancellation. The concept of reliability diversity

is introduced here as a design principle that takes advantage of the vastly

different reliability requirements across the services. This study reveals that

non-orthogonal slicing can lead, in some regimes, to significant gains in terms

of performance trade-offs among the three generic services compared to

orthogonal slicing.

### Connectivity in Random Annulus Graphs and the Geometric Block Model

Sainyam Galhotra , Arya Mazumdar , Soumyabrata Pal , Barna Saha **Subjects** : Discrete Mathematics (cs.DM) ; Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Learning (cs.LG)

Random geometric graphs are the simplest, and perhaps the earliest possible random graph model of spatial networks, introduced by Gilbert in 1961. In the most basic setting, a random geometric graph \(G(n,r)\) has \(n\) vertices. Each vertex of the graph is assigned a real number in \([0,1]\) randomly and uniformly. There is an edge between two vertices if the corresponding two random numbers differ by at most \(r\) (to mitigate the boundary effect, let us consider the Lee distance here, \(d_L(u,v) = \min\{|u-v|, 1-|u-v|\}\)). It is well-known that the connectivity threshold regime for random geometric graphs is at \(r \approx \frac{\log n}{n}\). In particular, if \(r = \frac{a\log n}{n}\), then a random geometric graph is connected with high probability if and only if \(a > 1\). Consider \(G(n,\frac{(1+\epsilon)\log n}{n})\) for any \(\epsilon > 0\) to satisfy the connectivity requirement and delete half of its edges which have distance at most \(\frac{\log n}{2n}\). It is natural to believe that the resultant graph will be disconnected. Surprisingly, we show that the graph still remains connected!

Formally, generalizing random geometric graphs, we define a random annulus graph \(G(n, [r_1, r_2])\), \(r_1 < r_2\), with \(n\) vertices. Each vertex of the graph is assigned a real number in \([0,1]\) randomly and uniformly as before. There is an edge between two vertices if the Lee distance between the corresponding two random numbers is between \(r_1\) and \(r_2\), \(0 < r_1 < r_2\). Let us assume \(r_1 = \frac{b \log n}{n}\) and \(r_2 = \frac{a \log n}{n}\), \(0 < b < a\). We show that this graph is connected with high probability if and only if \(a - b > \frac{1}{2}\) and \(a > 1\). That is, \(G(n, [0, \frac{0.99\log n}{n}])\) is not connected but \(G(n, [\frac{0.50 \log n}{n}, \frac{(1+\epsilon)\log n}{n}])\) is.

This result is then used to give improved lower and upper bounds on the recovery threshold of the geometric block model.
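
The model is easy to simulate: sample \(n\) points uniformly on \([0,1)\), connect pairs whose Lee distance falls in \([r_1, r_2]\), and test connectivity. The sketch below does this for the two parameter choices quoted above; at finite \(n\) the simulation only approximates the asymptotic with-high-probability statements.

```python
# Sample a random annulus graph on the unit circle (Lee distance) and check
# connectivity with a breadth-first search.
import math
import random
from collections import deque

def lee(u, v):
    d = abs(u - v)
    return min(d, 1 - d)

def random_annulus_graph(n, r1, r2, seed=0):
    random.seed(seed)
    pts = [random.random() for _ in range(n)]
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if r1 <= lee(pts[i], pts[j]) <= r2:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def is_connected(adj):
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

n = 2000
r = math.log(n) / n
print("G(n, [0, 0.99 log n / n]) connected:",
      is_connected(random_annulus_graph(n, 0.0, 0.99 * r)))
print("G(n, [0.50 log n / n, 1.1 log n / n]) connected:",
      is_connected(random_annulus_graph(n, 0.5 * r, 1.1 * r)))
```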

### On the Minimal Overcompleteness Allowing Universal Sparse Representation

Rotem Mulayoff , Tomer Michaeli **Subjects** : Signal Processing (eess.SP) ; Information Theory (cs.IT)

Sparse representation over redundant dictionaries constitutes a good model

for many classes of signals (e.g., patches of natural images, segments of

speech signals, etc.). However, despite its popularity, very little is known

about the representation capacity of this model. In this paper, we study how

redundant a dictionary must be so as to allow any vector to admit a sparse

approximation with a prescribed sparsity and a prescribed level of accuracy. We

address this problem both in a worst-case setting and in an average-case one.

For each scenario we derive lower and upper bounds on the minimal required

overcompleteness. Our bounds have simple closed-form expressions that make it easy to deduce the asymptotic behavior in large dimensions. In particular, we

find that the required overcompleteness grows exponentially with the sparsity

level and polynomially with the allowed representation error. This implies that

universal sparse representation is practical only at moderate sparsity levels,

but can be achieved at relatively high accuracy. As a side effect of our

analysis, we obtain a tight lower bound on the regularized incomplete beta

function, which may be interesting in its own right. We illustrate the validity

of our results through numerical simulations, which support our findings.
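
For readers unfamiliar with the quantity mentioned above, the regularized incomplete beta function is \(I_x(a,b) = \frac{1}{B(a,b)}\int_0^x t^{a-1}(1-t)^{b-1}\,dt\). The snippet below evaluates it with SciPy and cross-checks against direct numerical integration of the definition; the arguments are arbitrary, and the paper's actual lower bound is not reproduced here.

```python
# Evaluate the regularized incomplete beta function I_x(a, b) two ways.
from scipy.special import betainc, beta
from scipy.integrate import quad

def regularized_incomplete_beta(x, a, b):
    integral, _ = quad(lambda t: t**(a - 1) * (1 - t)**(b - 1), 0.0, x)
    return integral / beta(a, b)

a, b, x = 3.0, 5.0, 0.4
print("scipy betainc:      ", betainc(a, b, x))
print("direct integration: ", regularized_incomplete_beta(x, a, b))
```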

### Robust 1-Bit Compressed Sensing via Hinge Loss Minimization

Martin Genzel , Alexander Stollenwerk **Subjects** : Statistics Theory (math.ST) ; Information Theory (cs.IT)

This work theoretically studies the problem of estimating a structured

high-dimensional signal \(x_0 \in \mathbb{R}^n\) from noisy \(1\)-bit Gaussian

measurements. Our recovery approach is based on a simple convex program which

uses the hinge loss function as data fidelity term. While such a risk

minimization strategy is very natural to learn binary output models, such as in

classification, its capacity to estimate a specific signal vector is largely

unexplored. A major difficulty is that the hinge loss is just piecewise linear,

so that its “curvature energy” is concentrated in a single point. This is

substantially different from other popular loss functions considered in signal

estimation, e.g., the square or logistic loss, which are at least locally

strongly convex. It is therefore somewhat unexpected that we can still prove

very similar types of recovery guarantees for the hinge loss estimator, even in

the presence of strong noise. More specifically, our non-asymptotic error

bounds show that stable and robust reconstruction of \(x_0\) can be achieved with the optimal oversampling rate \(O(m^{-1/2})\) in terms of the number of measurements \(m\). Moreover, we permit a wide class of structural assumptions on the ground truth signal, in the sense that \(x_0\) can belong to an arbitrary bounded convex set \(K \subset \mathbb{R}^n\). The proofs of our main results

rely on some recent advances in statistical learning theory due to Mendelson.

In particular, we invoke an adapted version of Mendelson’s small ball method

that allows us to establish a quadratic lower bound on the error of the first

order Taylor approximation of the empirical hinge loss function.
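
A minimal sketch of the estimator being analyzed, under simplifying assumptions: generate noisy one-bit Gaussian measurements \(y_i = \mathrm{sign}(\langle a_i, x_0\rangle + \text{noise})\) and minimize the empirical hinge loss over a convex set by projected subgradient descent. Here the constraint set is the unit \(\ell_2\) ball purely for simplicity (the paper allows an arbitrary bounded convex set \(K\)), and the problem sizes, noise level, and step sizes are illustrative.

```python
# Hinge-loss-based recovery from noisy 1-bit Gaussian measurements via
# projected subgradient descent over the unit l2 ball.
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 200, 1500, 5

# sparse ground truth on the unit sphere
x0 = np.zeros(n)
x0[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x0 /= np.linalg.norm(x0)

A = rng.standard_normal((m, n))
y = np.sign(A @ x0 + 0.1 * rng.standard_normal(m))   # noisy 1-bit measurements

def hinge_subgradient(x):
    margins = y * (A @ x)
    active = margins < 1.0                    # hinge loss max(0, 1 - y <a, x>)
    return -(A[active] * y[active, None]).sum(axis=0) / m

x = np.zeros(n)
for t in range(1, 2001):
    x -= (1.0 / np.sqrt(t)) * hinge_subgradient(x)
    norm = np.linalg.norm(x)
    if norm > 1.0:                            # project back onto the unit l2 ball
        x /= norm

# 1-bit measurements lose the scale of x0, so compare directions only
x_hat = x / (np.linalg.norm(x) + 1e-12)
print("direction error ||x_hat - x0||:", np.linalg.norm(x_hat - x0))
```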

### On Deep Learning-based Massive MIMO Indoor User Localization

Sebastian Dörner ,

Sebastian Cammerer ,

Stephan ten Brink

Comments: submitted to SPAWC 2018

**Subjects**

:

Signal Processing (eess.SP)

; Information Theory (cs.IT)

We examine the usability of deep neural networks for multiple-input

multiple-output (MIMO) user positioning solely based on the orthogonal

frequency division multiplex (OFDM) complex channel coefficients. In contrast

to other indoor positioning systems (IPSs), the proposed method does not

require any additional piloting overhead or any other changes in the

communications system itself as it is deployed on top of an existing OFDM MIMO

system. Supported by actual measurements, we are mainly interested in the more

challenging non-line of sight (NLoS) scenario. However, gradient descent

optimization is known to require a large number of data points for training, i.e., the required database would be too large when compared to conventional methods. Thus, we propose a two-step training procedure, with training on simulated line of sight (LoS) data in the first step, and fine-tuning on measured NLoS positions in the second step. This turns out to reduce the number of measured training positions required and thus reduces the effort for data acquisition.
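
The two-step procedure can be sketched as ordinary pre-training followed by fine-tuning. The network, feature layout, learning rates, and the random stand-in data below are assumptions made for illustration; they are not the authors' architecture or measurement data.

```python
# Pre-train a position-regression network on (simulated) LoS channel features,
# then fine-tune it on a much smaller set of measured NLoS samples.
import torch
import torch.nn as nn

torch.manual_seed(0)
N_FEAT = 2 * 64        # real + imag parts of 64 OFDM channel coefficients (hypothetical)

model = nn.Sequential(
    nn.Linear(N_FEAT, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 2),                 # 2-D user position estimate
)
loss_fn = nn.MSELoss()

def train(features, positions, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(features), positions)
        loss.backward()
        opt.step()
    return loss.item()

# Step 1: plenty of cheap simulated LoS samples (random placeholders here).
los_x, los_y = torch.randn(10000, N_FEAT), torch.rand(10000, 2)
print("LoS pre-training loss:", train(los_x, los_y, lr=1e-3, epochs=50))

# Step 2: fine-tune on a small measured NLoS set with a lower learning rate.
nlos_x, nlos_y = torch.randn(500, N_FEAT), torch.rand(500, 2)
print("NLoS fine-tuning loss:", train(nlos_x, nlos_y, lr=1e-4, epochs=100))
```

Keeping the fine-tuning learning rate lower than the pre-training rate is the usual way to preserve what was learned on the simulated data while adapting to the measured NLoS distribution.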

