Reposted

arXiv Paper Daily: Mon, 23 Apr 2018

Neural and Evolutionary Computing

An Investigation of Environmental Influence on the Benefits of Adaptation Mechanisms in Evolutionary Swarm Robotics

Andreas Steyven , Emma Hart , Ben Paechter

Comments: In GECCO 2017

Subjects

Neural and Evolutionary Computing (cs.NE)

A robotic swarm that is required to operate for long periods in a potentially

unknown environment can use both evolution and individual learning methods in

order to adapt. However, the role played by the environment in influencing the

effectiveness of each type of learning is not well understood. In this paper,

we address this question by analysing the performance of a swarm in a range of

simulated, dynamic environments where a distributed evolutionary algorithm for

evolving a controller is augmented with a number of different individual

learning mechanisms. The learning mechanisms themselves are defined by

parameters which can be either fixed or inherited. We conduct experiments in a

range of dynamic environments whose characteristics are varied so as to present

different opportunities for learning. Results enable us to map environmental

characteristics to the most effective learning algorithm.

Evolution of a Functionally Diverse Swarm via a Novel Decentralised Quality-Diversity Algorithm

Emma Hart , Andreas S.W. Steyven , Ben Paechter

Comments: In GECCO 2018

Subjects

Neural and Evolutionary Computing (cs.NE)

The presence of functional diversity within a group has been demonstrated to

lead to greater robustness, higher performance and increased problem-solving

ability in a broad range of studies that includes insect groups, human groups

and swarm robotics. Evolving group diversity however has proved challenging

within Evolutionary Robotics, requiring reproductive isolation and careful

attention to population size and selection mechanisms. To tackle this issue, we

introduce a novel, decentralised, variant of the MAP-Elites illumination

algorithm which is hybridised with a well-known distributed evolutionary

algorithm (mEDEA). The algorithm simultaneously evolves multiple diverse

behaviours for multiple robots, with respect to a simple token-gathering task.

Each robot in the swarm maintains a local archive defined by two pre-specified

functional traits which is shared with robots it comes into contact with. We

investigate four different strategies for sharing, exploiting and combining

local archives and compare results to mEDEA. Experimental results show that in

contrast to previous claims, it is possible to evolve a functionally diverse

swarm without geographical isolation, and that the new method outperforms mEDEA

in terms of the diversity, coverage and precision of the evolved swarm.
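
The per-robot archive described in the abstract above can be pictured as a small MAP-Elites grid. The sketch below is illustrative only and is not the authors' implementation: the class name LocalArchive, the 5x5 grid, the assumption that both traits are normalised to [0, 1], and the "keep the fitter genome per cell" merge rule are all assumptions made for this example.

```python
class LocalArchive:
    """Hypothetical MAP-Elites style grid keyed by two discretised traits.

    Each cell stores the fittest genome seen so far for that trait combination.
    """

    def __init__(self, bins=5):
        self.bins = bins
        self.cells = {}  # (i, j) -> (fitness, genome)

    def _cell(self, trait_a, trait_b):
        # Assumption: traits are normalised to [0, 1]; clamp to a valid bin index.
        def to_bin(t):
            return min(self.bins - 1, max(0, int(t * self.bins)))
        return to_bin(trait_a), to_bin(trait_b)

    def add(self, genome, fitness, trait_a, trait_b):
        """Insert a genome if its cell is empty or it beats the current elite."""
        key = self._cell(trait_a, trait_b)
        if key not in self.cells or fitness > self.cells[key][0]:
            self.cells[key] = (fitness, genome)
            return True
        return False

    def merge(self, other):
        """Absorb an encountered robot's archive, keeping the better elite per cell."""
        for key, (fitness, genome) in other.cells.items():
            if key not in self.cells or fitness > self.cells[key][0]:
                self.cells[key] = (fitness, genome)

    def coverage(self):
        """Fraction of grid cells that hold an elite."""
        return len(self.cells) / (self.bins ** 2)
```

Two robots that meet would call merge on each other's archives; coverage then gives a rough measure of how much of the trait space the swarm has filled.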

Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression

Shihui Yin , Gaurav Srivastava , Shreyas K. Venkataramanaiah , Chaitali Chakrabarti , Visar Berisha , Jae-sun Seo

Comments: 2017 Asilomar Conference on Signals, Systems and Computers

Subjects

Neural and Evolutionary Computing (cs.NE)

Deep learning algorithms have shown tremendous success in many recognition

tasks; however, these algorithms typically include a deep neural network (DNN)

structure and a large number of parameters, which makes it challenging to

implement them on power/area-constrained embedded platforms. To reduce the

network size, several studies investigated compression by introducing

element-wise or row-/column-/block-wise sparsity via pruning and

regularization. In addition, many recent works have focused on reducing

precision of activations and weights with some reducing down to a single bit.

However, combining various sparsity structures with binarized or

very-low-precision (2-3 bit) neural networks has not been comprehensively

explored. In this work, we present design techniques for minimum-area/-energy

DNN hardware with minimal degradation in accuracy. During training, both

binarization/low-precision and structured sparsity are applied as constraints

to find the smallest memory footprint for a given deep learning algorithm. The

DNN model for CIFAR-10 dataset with weight memory reduction of 50X exhibits

accuracy comparable to that of the floating-point counterpart. Area,

performance and energy results of DNN hardware in 40nm CMOS are reported for

the MNIST dataset. The optimized DNN that combines 8X structured compression

and 3-bit weight precision showed 98.4% accuracy at 20nJ per classification.
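
To make "3-bit weight precision" concrete, here is a minimal sketch of uniform weight quantisation. It is not the training-time constraint used in the paper: the function name quantize_weights and the symmetric uniform grid over [-max|w|, max|w|] are assumptions chosen for illustration.

```python
import numpy as np

def quantize_weights(w, bits=3):
    """Snap weights to 2**bits evenly spaced levels over [-max|w|, max|w|].

    Hypothetical post-hoc quantiser for illustration; the paper applies
    low precision as a constraint during training instead.
    """
    w = np.asarray(w, dtype=float)
    levels = 2 ** bits
    scale = np.max(np.abs(w))
    if scale == 0:
        return w.copy()
    step = 2 * scale / (levels - 1)        # spacing between adjacent levels
    codes = np.round((w + scale) / step)   # integer codes in 0..levels-1
    return codes * step - scale            # back to real-valued weights
```

With bits=3 every weight is stored as one of 8 codes, so the rounding error per weight is at most half the level spacing.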

A Simple Quantum Neural Net with a Periodic Activation Function

Ammar Daskin

Comments: conference paper

Subjects

Quantum Physics (quant-ph)

; Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In this paper, we propose a simple neural net that requires only \(O(n \log_2 k)\)

numbers of quantum gates and qubits: Here, \(n\) is the number of input

parameters, and \(k\) is the number of weights applied to these input parameters

in the proposed neural net. We describe the network in terms of a quantum

circuit, and then draw its equivalent classical neural net which involves

\(O(k^n)\) nodes in the hidden layer. Then, we show that the network uses a

periodic activation function of cosine values of the linear combinations of the

inputs and weights. The steps of the gradient descent are described, and then

Iris and Breast cancer datasets are used for the numerical simulations. The

numerical results indicate the network can be used in machine learning problems

and it may provide exponential speedup over the same structured classical

neural net.

Computer Vision and Pattern Recognition

Synthesizing Images of Humans in Unseen Poses

Guha Balakrishnan , Amy Zhao , Adrian V. Dalca , Fredo Durand , John Guttag

Comments: CVPR 2018

Subjects

Computer Vision and Pattern Recognition (cs.CV)

We address the computational problem of novel human pose synthesis. Given an

image of a person and a desired pose, we produce a depiction of that person in

that pose, retaining the appearance of both the person and background. We

present a modular generative neural network that synthesizes unseen poses using

training pairs of images and poses taken from human action videos. Our network

separates a scene into different body part and background layers, moves body

parts to new locations and refines their appearances, and composites the new

foreground with a hole-filled background. These subtasks, implemented with

separate modules, are trained jointly using only a single target image as a

supervised label. We use an adversarial discriminator to force our network to

synthesize realistic details conditioned on pose. We demonstrate image

synthesis results on three action classes: golf, yoga/workouts and tennis, and

show that our method produces accurate results within action classes as well as

across action classes. Given a sequence of desired poses, we also produce

coherent videos of actions.

ADef: an Iterative Algorithm to Construct Adversarial Deformations

Rima Alaifari , Giovanni S. Alberti , Tandri Gauksson

Comments: 10 pages, 5 figures

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Cryptography and Security (cs.CR); Learning (cs.LG); Machine Learning (stat.ML)

While deep neural networks have proven to be a powerful tool for many

recognition and classification tasks, their stability properties are still not

well understood. In the past, image classifiers have been shown to be

vulnerable to so-called adversarial attacks, which are created by additively

perturbing the correctly classified image.

In this paper, we propose the ADef algorithm to construct a different kind of

adversarial attack created by iteratively applying small deformations to the

image, found through a gradient descent step. We demonstrate our results on

MNIST with a convolutional neural network and on ImageNet with Inception-v3 and

ResNet-101.

Image Inpainting for Irregular Holes Using Partial Convolutions

Guilin Liu , Fitsum A. Reda , Kevin J. Shih , Ting-Chun Wang , Andrew Tao , Bryan Catanzaro

Comments: 23 pages, includes appendix

Subjects

Computer Vision and Pattern Recognition (cs.CV)

Existing deep learning based image inpainting methods use a standard

convolutional network over the corrupted image, using convolutional filter

responses conditioned on both valid pixels as well as the substitute values in

the masked holes (typically the mean value). This often leads to artifacts such

as color discrepancy and blurriness. Post-processing is usually used to reduce

such artifacts, but is expensive and may fail. We propose the use of partial

convolutions, where the convolution is masked and renormalized to be

conditioned on only valid pixels. We further include a mechanism to

automatically generate an updated mask for the next layer as part of the

forward pass. Our model outperforms other methods for irregular masks. We show

qualitative and quantitative comparisons with other methods to validate our

approach.
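
The masking and renormalisation described in the abstract above can be sketched for a single channel as follows. This is a simplified NumPy illustration, not the authors' code: the function name partial_conv2d, the "valid" (no padding) window handling, and the window_size / valid_count scaling factor are assumptions consistent with "masked and renormalized to be conditioned on only valid pixels".

```python
import numpy as np

def partial_conv2d(image, mask, kernel, bias=0.0):
    """Single-channel partial convolution sketch (cross-correlation, no padding).

    mask uses 1 for valid pixels and 0 for holes. Returns the output feature
    map and the updated mask for the next layer.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros((out_h, out_w))
    window_size = kh * kw
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                x = image[i:i + kh, j:j + kw]
                # Hole pixels are zeroed by the mask, then the response is
                # rescaled by the fraction of the window that was valid.
                out[i, j] = (kernel * x * m).sum() * (window_size / valid) + bias
                # Any window with at least one valid pixel becomes valid.
                new_mask[i, j] = 1.0
    return out, new_mask
```

For a constant image with a hole, an averaging kernel returns the same constant everywhere, regardless of what placeholder value sits in the hole, which is the behaviour a standard convolution lacks.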

Rethinking the Faster R-CNN Architecture for Temporal Action Localization

Yu-Wei Chao , Sudheendra Vijayanarasimhan , Bryan Seybold , David A. Ross , Jia Deng , Rahul Sukthankar

Comments: Accepted in CVPR 2018

Subjects

Computer Vision and Pattern Recognition (cs.CV)

We propose TAL-Net, an improved approach to temporal action localization in

video that is inspired by the Faster R-CNN object detection framework. TAL-Net

addresses three key shortcomings of existing approaches: (1) we improve

receptive field alignment using a multi-scale architecture that can accommodate

extreme variation in action durations; (2) we better exploit the temporal

context of actions for both proposal generation and action classification by

appropriately extending receptive fields; and (3) we explicitly consider

multi-stream feature fusion and demonstrate that fusing motion late is

important. We achieve state-of-the-art performance for both action proposal and

localization on THUMOS’14 detection benchmark and competitive performance on

ActivityNet challenge.

One-Shot Learning using Mixture of Variational Autoencoders: a Generalization Learning approach

Decebal Constantin Mocanu , Elena Mocanu

Journal-ref: 17th International Conference on Autonomous Agents and Multiagent

Systems (AAMAS 2018)

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Machine Learning (stat.ML)

Deep learning, even if it is very successful nowadays, traditionally needs

very large amounts of labeled data to perform excellently on the classification

task. In an attempt to solve this problem, the one-shot learning paradigm,

which makes use of just one labeled sample per class and prior knowledge,

becomes increasingly important. In this paper, we propose a new one-shot

learning method, dubbed MoVAE (Mixture of Variational AutoEncoders), to perform

classification. Complementary to prior studies, MoVAE represents a shift of

paradigm in comparison with the usual one-shot learning methods, as it does not

use any prior knowledge. Instead, it starts from zero knowledge and one labeled

sample per class. Afterward, by using unlabeled data and the generalization

learning concept (in a way, more as humans do), it is capable of gradually

improving its performance by itself. Even more, if there are no unlabeled data

available MoVAE can still perform well in one-shot learning classification. We

demonstrate empirically the efficiency of our proposed approach on three

datasets, i.e. the handwritten digits (MNIST), fashion products

(Fashion-MNIST), and handwritten characters (Omniglot), showing that MoVAE

outperforms state-of-the-art one-shot learning algorithms.

MobileFaceNets: Efficient CNNs for Accurate Real-time Face Verification on Mobile Devices

Sheng Chen , Yang Liu , Xiang Gao , Zhen Han

Comments: To be submitted to SPL

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG)

In this paper, we present a class of extremely efficient CNN models called

MobileFaceNets, which use no more than 1 million parameters and are specifically

tailored for high-accuracy real-time face verification on mobile and embedded

devices. We also make a simple analysis on the weakness of common mobile

networks for face verification. The weakness has been well overcome by our

specifically designed MobileFaceNets. Under the same experimental conditions,

our MobileFaceNets achieve significantly superior accuracy as well as more than

2 times actual speedup over MobileNetV2. After being trained with ArcFace loss on the

refined MS-Celeb-1M from scratch, our single MobileFaceNet model of 4.0MB size

achieves 99.55% face verification accuracy on LFW and 92.59% TAR (FAR1e-6) on

MegaFace Challenge 1, which is even comparable to state-of-the-art big CNN

models of hundreds of MB in size. The fastest one of our MobileFaceNets has an actual

inference time of 18 milliseconds on a mobile phone. Our experiments on LFW,

AgeDB, and MegaFace show that our MobileFaceNets achieve significantly improved

efficiency compared with the state-of-the-art lightweight and mobile CNNs for

face verification.

An Approximate Shading Model with Detail Decomposition for Object Relighting

Zicheng Liao , Kevin Karsch , Hongyi Zhang , David Forsyth

Subjects

Computer Vision and Pattern Recognition (cs.CV)

We present an object relighting system that allows an artist to select an

object from an image and insert it into a target scene. Through simple

interactions, the system can adjust illumination on the inserted object so that

it appears naturally in the scene. To support image-based relighting, we build

an object model from the image, and propose a perceptually-inspired

approximate shading model for the relighting. It decomposes the shading field

into (a) a rough shape term that can be reshaded, (b) a parametric shading

detail that encodes missing features from the first term, and (c) a geometric

detail term that captures fine-scale material properties. With this

decomposition, the shading model combines 3D rendering and image-based

composition and allows more flexible compositing than image-based methods.

Quantitative evaluation and a set of user studies suggest our method is a

promising alternative to existing methods of object insertion.

Residual-Guide Feature Fusion Network for Single Image Deraining

Zhiwen Fan , Huafeng Wu , Xueyang Fu , Yue Huang , Xinghao Ding

Subjects

Computer Vision and Pattern Recognition (cs.CV)

Single image rain streaks removal is extremely important since rainy images

adversely affect many computer vision systems. Deep learning based methods have

found great success in image deraining tasks. In this paper, we propose a novel

residual-guide feature fusion network, called ResGuideNet, for single image

deraining that progressively predicts high-quality reconstruction. Specifically,

we propose a cascaded network and adopt residuals generated from shallower

blocks to guide deeper blocks. By using this strategy, we can obtain a coarse

to fine estimation of negative residual as the blocks go deeper. The outputs of

different blocks are merged into the final reconstruction. We adopt recursive

convolution to build each block and apply supervision to all intermediate

results, which enables our model to achieve promising performance on synthetic

and real-world data while using fewer parameters than previously required.

ResGuideNet is detachable to meet different rainy conditions. For images with

light rain streaks and limited computational resource at test time, we can

obtain a decent performance even with several building blocks. Experiments

validate that ResGuideNet can benefit other low- and high-level vision tasks.

Graph-based Hypothesis Generation for Parallax-tolerant Image Stitching

Jing Chen , Nan Li , Tianli Liao

Comments: 3 pages, 3 figures, 2 tables

Subjects

Computer Vision and Pattern Recognition (cs.CV)

The seam-driven approach has been proven fairly effective for

parallax-tolerant image stitching, whose strategy is to search for an invisible

seam from finite representative hypotheses of local alignment. In this paper,

we propose a graph-based hypothesis generation and a seam-guided local

alignment for improving the effectiveness and the efficiency of the seam-driven

approach. The experiment demonstrates the significant reduction of number of

hypotheses and the improved quality of naturalness of final stitching results,

compared to the state-of-the-art method SEAGULL.

Accurate Deep Direct Geo-Localization from Ground Imagery and Phone-Grade GPS

Shaohui Sun , Ramesh Sarukkai , Jack Kwok , Vinay Shet

Comments: To appear in CVPR 2018 Workshops

Subjects

Computer Vision and Pattern Recognition (cs.CV)

One of the most critical topics in autonomous driving or ride-sharing

technology is to accurately localize vehicles in the world frame. In addition

to common multi-view camera systems, it usually also relies on industrial grade

sensors, such as LiDAR, differential GPS, high precision IMU, etc. In this

paper, we develop an approach to provide an effective solution to this problem.

We propose a method to train a geo-spatial deep neural network (CNN+LSTM) to

predict accurate geo-locations (latitude and longitude) using only ordinary

ground imagery and low accuracy phone-grade GPS. We evaluate our approach on

the open dataset released during ACM Multimedia 2017 Grand Challenge. Having

ground truth locations for training, we are able to reach nearly lane-level

accuracy. We also evaluate the proposed method on our own collected images in

San Francisco downtown area often described as “downtown canyon” where consumer

GPS signals are extremely inaccurate. The results show the model can predict

quality locations that suffice in real business applications, such as

ride-sharing, only using phone-grade GPS. Unlike classic visual localization or

recent PoseNet-like methods that may work well in indoor environments or

small-scale outdoor environments, we avoid using a map or an SFM

(structure-from-motion) model at all. More importantly, the proposed method can

be scaled up without concerns over the potential failure of 3D reconstruction.

A Complementary Tracking Model with Multiple Features

Peng Gao , Yipeng Ma , Ke Song , Chao Li , Fei Wang , Liyi Xiao

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Graphics (cs.GR)

Discriminative Correlation Filters (DCF)-based tracking algorithms exploiting

conventional handcrafted features have achieved impressive results both in

terms of accuracy and robustness. Template handcrafted features have shown

excellent performance, but they perform poorly when the appearance of the target

changes rapidly, as with fast motions and fast deformations. In contrast,

statistical handcrafted features are insensitive to fast state changes, but

they yield inferior performance in the scenarios of illumination variations and

background clutters. In this work, to achieve an efficient tracking

performance, we propose a novel visual tracking algorithm, named MFCMT, based

on a complementary ensemble model with multiple features, including Histogram

of Oriented Gradients (HOGs), Color Names (CNs) and Color Histograms (CHs).

Additionally, to improve tracking results and prevent target drift, we

introduce an effective fusion method by exploiting relative entropy to coalesce

all basic response maps and get an optimal response. Furthermore, we suggest a

simple but efficient update strategy to boost tracking performance.

Comprehensive evaluations are conducted on two tracking benchmarks,

and the experimental results demonstrate that our method is competitive with

numerous state-of-the-art trackers. Our tracker achieves impressive performance

with faster speed on these benchmarks.

Generating a Fusion Image: One's Identity and Another's Shape

Donggyu Joo , Doyeon Kim , Junmo Kim

Comments: To appear in CVPR 2018

Subjects

Computer Vision and Pattern Recognition (cs.CV)

Generating a novel image by manipulating two input images is an interesting

research problem in the study of generative adversarial networks (GANs). We

propose a new GAN-based network that generates a fusion image with the identity

of input image x and the shape of input image y. Our network can simultaneously

train on more than two image datasets in an unsupervised manner. We define an

identity loss LI to catch the identity of image x and a shape loss LS to get

the shape of y. In addition, we propose a novel training method called

Min-Patch training to focus the generator on crucial parts of an image, rather

than its entirety. We show qualitative results on the VGG Youtube Pose dataset,

Eye dataset (MPIIGaze and UnityEyes), and the Photo-Sketch-Cartoon dataset.

View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition

Pengfei Zhang , Cuiling Lan , Junliang Xing , Wenjun Zeng , Jianru Xue , Nanning Zheng

Subjects

Computer Vision and Pattern Recognition (cs.CV)

Skeleton-based human action recognition has recently attracted increasing

attention thanks to the accessibility and the popularity of 3D skeleton data.

One of the key challenges in skeleton-based action recognition lies in the

large view variations when capturing data. In order to alleviate the effects of

view variations, this paper introduces a novel view adaptation scheme, which

automatically determines the virtual observation viewpoints in a learning based

data driven manner. We design two view adaptive neural networks, i.e., VA-RNN

based on RNN, and VA-CNN based on CNN. For each network, a novel view

adaptation module learns and determines the most suitable observation

viewpoints, and transforms the skeletons to those viewpoints for the end-to-end

recognition with a main classification network. Ablation studies find that the

proposed view adaptive models are capable of transforming the skeletons of

various viewpoints to much more consistent virtual viewpoints which largely

eliminates the viewpoint influence. In addition, we design a two-stream scheme

(referred to as VA-fusion) that fuses the scores of the two networks to provide

the fused prediction. Extensive experimental evaluations on five challenging

benchmarks demonstrate the effectiveness of the proposed view-adaptive

networks and superior performance over state-of-the-art approaches.

Vision Meets Drones: A Challenge

Pengfei Zhu , Longyin Wen , Xiao Bian , Haibing Ling , Qinghua Hu

Comments: 11 pages, 11 figures

Subjects

Computer Vision and Pattern Recognition (cs.CV)

In this paper we present a large-scale visual object detection and tracking

benchmark, named VisDrone2018, aiming at advancing visual understanding tasks

on the drone platform. The images and video sequences in the benchmark were

captured over various urban/suburban areas of 14 different cities across China

from north to south. Specifically, VisDrone2018 consists of 263 video clips and

10,209 images (no overlap with video clips) with rich annotations, including

object bounding boxes, object categories, occlusion, truncation ratios, etc.

With an intensive amount of effort, our benchmark has more than 2.5 million

annotated instances in 179,264 images/video frames. Being the largest such

dataset ever published, the benchmark enables extensive evaluation and

investigation of visual analysis algorithms on the drone platform. In

particular, we design four popular tasks with the benchmark, including object

detection in images, object detection in videos, single object tracking, and

multi-object tracking. All these tasks are extremely challenging in the

proposed dataset due to factors such as occlusion, large scale and pose

variation, and fast motion. We hope the benchmark will greatly boost the research

and development in visual analysis on drone platforms.

Calibration-free B0 correction of EPI data using structured low rank matrix recovery

Arvind Balachandrasekaran , Merry Mani , Mathews Jacob

Subjects

Computer Vision and Pattern Recognition (cs.CV)

We introduce a structured low rank algorithm for the calibration-free

compensation of field inhomogeneity artifacts in Echo Planar Imaging (EPI) MRI

data. We acquire the data using two EPI readouts that differ in echo-time (TE).

Using time segmentation, we reformulate the field inhomogeneity compensation

problem as the recovery of an image time series from highly undersampled

Fourier measurements. The temporal profile at each pixel is modeled as a single

exponential, which is exploited to fill in the missing entries. We show that

the exponential behavior at each pixel, along with the spatial smoothness of

the exponential parameters, can be exploited to derive a 3D annihilation

relation in the Fourier domain. This relation translates to a low rank property

on a structured multi-fold Toeplitz matrix, whose entries correspond to the

measured k-space samples. We introduce a fast two-step algorithm for the

completion of the Toeplitz matrix from the available samples. In the first

step, we estimate the null space vectors of the Toeplitz matrix using only its

fully sampled rows. The null space is then used to estimate the signal

subspace, which facilitates the efficient recovery of the time series of

images. We finally demonstrate the proposed approach on spherical MR phantom

data and human data and show that the artifacts are significantly reduced. The

proposed approach could potentially be used to compensate for time varying

field map variations in dynamic applications such as functional MRI.
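
The link between a single-exponential temporal profile and a low rank Toeplitz matrix can be checked numerically in the simplest one-dimensional, fully sampled case. The sketch below is only that toy case, not the paper's 3D multi-fold construction or its two-step completion algorithm; the helper names toeplitz_from_series and estimate_decay are hypothetical. If x[n] = a r^n, then x[n+1] - r x[n] = 0, so the filter [1, -r] annihilates the series and the Toeplitz matrix built from it has rank 1.

```python
import numpy as np

def toeplitz_from_series(x, width):
    """Build a Toeplitz matrix with T[i, j] = x[i + width - 1 - j]."""
    rows = len(x) - width + 1
    return np.array([[x[i + width - 1 - j] for j in range(width)]
                     for i in range(rows)])

def estimate_decay(x):
    """Recover r from x[n] = a * r**n via the null space of a 2-column Toeplitz matrix."""
    T = toeplitz_from_series(x, 2)      # each row is [x[i+1], x[i]]
    _, _, vh = np.linalg.svd(T)
    v = vh[-1].conj()                   # right singular vector with the smallest singular value
    # T @ v = 0 means x[i+1]*v[0] + x[i]*v[1] = 0, hence r = -v[1] / v[0].
    return -v[1] / v[0]
```

For example, with r = 0.9 exp(0.3i) and eight samples, the three-column Toeplitz matrix has numerical rank 1 and estimate_decay returns r up to floating-point error.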

High Dynamic Range SLAM with Map-Aware Exposure Time Control

Sergey V. Alexandrov , Johann Prankl , Michael Zillich , Markus Vincze

Comments: 3DV 2017

Subjects

Computer Vision and Pattern Recognition (cs.CV)

The research in dense online 3D mapping is mostly focused on the geometrical

accuracy and spatial extent of the reconstructions. Their color appearance is

often neglected, leading to inconsistent colors and noticeable artifacts. We

rectify this by extending a state-of-the-art SLAM system to accumulate colors

in HDR space. We replace the simplistic pixel intensity averaging scheme with

HDR color fusion rules tailored to the incremental nature of SLAM and a noise

model suitable for off-the-shelf RGB-D cameras. Our main contribution is a

map-aware exposure time controller. It makes decisions based on the global

state of the map and predicted camera motion, attempting to maximize the

information gain of each observation. We report a set of experiments

demonstrating the improved texture quality and advantages of using the custom

controller that is tightly integrated in the mapping loop.

Survey of Face Detection on Low-quality Images

Yuqian Zhou , Ding Liu , Thomas Huang

Subjects

Computer Vision and Pattern Recognition (cs.CV)

Face detection is a well-explored problem. Many challenges on face detectors

like extreme pose, illumination, low resolution and small scales are studied in

the previous work. However, previously proposed models are mostly trained and

tested on good-quality images, which is not always the case for practical

applications like surveillance systems. In this paper, we first review the

current state-of-the-art face detectors and their performance on benchmark

dataset FDDB, and compare the design protocols of the algorithms. Secondly, we

investigate their performance degradation while testing on low-quality images

with different levels of blur, noise, and contrast. Our results demonstrate

that both hand-crafted and deep-learning based face detectors are not robust

enough for low-quality images. It inspires researchers to produce more robust

design for face detection in the wild.

Unsupervised Representation Adversarial Learning Network: from Reconstruction to Generation

Yuqian Zhou , Kuangxiao Gu , Thomas Huang

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Machine Learning (stat.ML)

A good representation for arbitrarily complicated data should have the

capability of semantic generation, clustering and reconstruction. Previous

research has already achieved impressive performance on either one. This paper

aims at learning a disentangled representation effective for all of them in an

unsupervised way. To achieve all the three tasks together, we learn the forward

and inverse mapping between data and representation on the basis of a symmetric

adversarial process. In theory, we minimize the upper bound of the two

conditional entropy losses between the latent variables and the observations

together to achieve the cycle consistency. The newly proposed RepGAN is tested

on MNIST, fashionMNIST, CelebA, and SVHN datasets to perform unsupervised or

semi-supervised classification, generation and reconstruction tasks. The result

demonstrates that RepGAN is able to learn a useful and competitive

representation. To the authors’ knowledge, our work is the first one to achieve

both a high unsupervised classification accuracy and low reconstruction error

on MNIST.

Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events

Sanjeel Parekh , Slim Essid , Alexey Ozerov , Ngoc Q. K. Duong , Patrick Pérez , Gaël Richard

Subjects

Computer Vision and Pattern Recognition (cs.CV)

; Sound (cs.SD); Audio and Speech Processing (eess.AS)

Audio-visual representation learning is an important task from the

perspective of designing machines with the ability to understand complex

events. To this end, we propose a novel multimodal framework that instantiates

multiple instance learning. We show that the learnt representations are useful

for classifying events and localizing their characteristic audio-visual

elements. The system is trained using only video-level event labels without any

timing information. An important feature of our method is its capacity to learn

from unsynchronized audio-visual events. We achieve state-of-the-art results on

a large-scale dataset of weakly-labeled audio event videos. Visualizations of

localized visual regions and audio segments substantiate our system’s efficacy,

especially when dealing with noisy situations where modality-specific cues

appear asynchronously.

Super-resolution Ultrasound Localization Microscopy through Deep Learning

Ruud J.G. van Sloun , Oren Solomon , Matthew Bruce , Zin Z. Khaing , Hessel Wijkstra , Yonina C. Eldar , Massimo Mischi

Comments: 14 pages

Subjects

Signal Processing (eess.SP)

; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Ultrasound localization microscopy has enabled super-resolution vascular

imaging in laboratory environments through precise localization of individual

ultrasound contrast agents across numerous imaging frames. However, analysis of

high-density regions with significant overlaps among the agents’ point spread

responses yields high localization errors, constraining the technique to

low-concentration conditions. As such, long acquisition times are required to

sufficiently cover the vascular bed. In this work, we present a fast and

precise method for obtaining super-resolution vascular images from high-density

contrast-enhanced ultrasound imaging data. This method, which we term Deep

Ultrasound Localization Microscopy (Deep-ULM), exploits modern deep learning

strategies and employs a convolutional neural network to perform localization

microscopy in dense scenarios. This end-to-end fully convolutional neural

network architecture is trained effectively using on-line synthesized data,

enabling robust inference in-vivo under a wide variety of imaging conditions.

We show that deep learning attains super-resolution with challenging

contrast-agent concentrations (microbubble densities), both in-silico as well

as in-vivo, as we go from ultrasound scans of a rodent spinal cord in an

experimental setting to standard clinically-acquired recordings in a human

prostate. Deep-ULM achieves high quality sub-diffraction recovery, and is

suitable for real-time applications, resolving about 135 high-resolution

64×64-patches per second on a standard PC. Exploiting GPU computation, this

number increases to 2500 patches per second.

Analyzing Solar Irradiance Variation From GPS and Cameras

Shilpa Manandhar , Soumyabrata Dev , Yee Hui Lee , Yu Song Meng

Comments: Published in IEEE AP-S Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, 2018

Subjects

Instrumentation and Methods for Astrophysics (astro-ph.IM)

; Computer Vision and Pattern Recognition (cs.CV)

The total amount of solar irradiance falling on the earth’s surface is an

important area of study amongst the photo-voltaic (PV) engineers and remote

sensing analysts. The received solar irradiance impacts the total amount of

generated solar energy. However, this generation is often hindered by the high

degree of solar irradiance variability. In this paper, we study the main

factors behind such variability with the assistance of Global Positioning

System (GPS) and ground-based, high-resolution sky cameras. This analysis will

also be helpful for understanding cloud phenomena and other events in the

earth’s atmosphere.

Revisiting Small Batch Training for Deep Neural Networks

Dominic Masters , Carlo Luschi Subjects : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Modern deep neural network training is typically based on mini-batch

stochastic gradient optimization. While the use of large mini-batches increases

the available computational parallelism, small batch training has been shown to

provide improved generalization performance and allows a significantly smaller

memory footprint, which might also be exploited to improve machine throughput.

In this paper, we review common assumptions on learning rate scaling and

training duration, as a basis for an experimental comparison of test

performance for different mini-batch sizes. We adopt a learning rate that

corresponds to a constant average weight update per gradient calculation (i.e.,

per unit cost of computation), and point out that this results in a variance of

the weight updates that increases linearly with the mini-batch size m.

The collected experimental results for the CIFAR-10, CIFAR-100 and ImageNet

datasets show that increasing the mini-batch size progressively reduces the

range of learning rates that provide stable convergence and acceptable test

performance. On the other hand, small mini-batch sizes provide more up-to-date

gradient calculations, which yields more stable and reliable training. The best

performance has been consistently obtained for mini-batch sizes between m = 2

and m = 32, which contrasts with recent work advocating the use of mini-batch

sizes in the thousands.
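
The scaling argument in this abstract can be checked numerically. The sketch below is an illustration only, not the paper's experimental setup: it assumes scalar per-example gradients drawn i.i.d. from a normal distribution, and the names update_stats, base_lr, grad_mean and grad_std are hypothetical. With the learning rate scaled as lr = base_lr * m, each update equals base_lr times the sum of m gradients, so the mean update per gradient calculation stays constant while the variance of the update grows roughly linearly with m.

```python
import numpy as np

def update_stats(m, base_lr=0.01, grad_mean=1.0, grad_std=2.0, trials=20000, seed=0):
    """Simulate scalar SGD weight updates for mini-batch size m.

    Assumption (illustrative): per-example gradients are i.i.d. normal.
    The learning rate is scaled linearly, lr = base_lr * m, so one update
    equals base_lr times the sum of the m per-example gradients.
    """
    rng = np.random.default_rng(seed)
    grads = rng.normal(grad_mean, grad_std, size=(trials, m))
    lr = base_lr * m
    updates = -lr * grads.mean(axis=1)        # one weight update per trial
    mean_per_gradient = updates.mean() / m    # average update per gradient calculation
    return mean_per_gradient, updates.var()

for m in (2, 8, 32):
    mean_per_gradient, variance = update_stats(m)
    print(f"m={m:2d}  mean update per gradient={mean_per_gradient:.4f}  update variance={variance:.5f}")
```

Under these assumptions the first printed column should stay near -base_lr * grad_mean for every m, while the second grows in proportion to m.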

Video based Contextual Question Answering

Akash Ganesan , Divyansh Pal , Karthik Muthuraman , Shubham Dash Subjects : Computation and Language (cs.CL) ; Computer Vision and Pattern Recognition (cs.CV)

The primary aim of this project is to build a contextual Question-Answering

model for videos. The current methodologies provide a robust model for image

based Question-Answering, but we aim to generalize this approach to

videos. We propose a graphical representation of video which is able to handle

several types of queries across the whole video. For example, if a frame has an

image of a man and a cat sitting, it should be able to handle queries like

“where is the cat sitting with respect to the man?” or “what is the man holding

in his hand?” It should be able to answer queries relating to temporal

relationships also.

Sampling-free Uncertainty Estimation in Gated Recurrent Units with Exponential Families

Seong Jae Hwang , Ronak Mehta , Vikas Singh

Comments: First version. Submitted to ECCV 2018

Subjects

Learning (cs.LG)

; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

There has recently been a concerted effort to derive mechanisms in vision and

machine learning systems to offer uncertainty estimates of the predictions they

make. Clearly, there are enormous benefits to a system that is not only

accurate but also has a sense for when it is not sure. Existing proposals

center around Bayesian interpretations of modern deep architectures — these

are effective but can often be computationally demanding. We show how classical

ideas in the literature on exponential families on probabilistic networks

provide an excellent starting point to derive uncertainty estimates in Gated

Recurrent Units (GRU). Our proposal directly quantifies uncertainty

deterministically, without the need for costly sampling-based estimation. We

demonstrate how our model can be used to quantitatively and qualitatively

measure uncertainty in unsupervised image sequence prediction. To our

knowledge, this is the first result describing sampling-free uncertainty

estimation for powerful sequential models such as GRUs.

Artificial Intelligence

Delegating via Quitting Games

Juan Afanador , Nir Oren , Murilo S. Baptista Subjects : Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)

Delegation allows an agent to request that another agent completes a task. In

many situations the task may be delegated onwards, and this process can repeat

until it is eventually, successfully or unsuccessfully, performed. We consider

policies to guide an agent in choosing who to delegate to when such recursive

interactions are possible. These policies, based on quitting games and

multi-armed bandits, were empirically tested for effectiveness. Our results

indicate that the quitting game based policies outperform those which do not

explicitly account for the recursive nature of delegation.

Preference-Guided Planning: An Active Elicitation Approach

Mayukh Das , Phillip Odom , Md. Rakibul Islam , Janardhan Rao (Jana) Doppa , Dan Roth , Sriraam Natarajan

Comments: Under Review at Knowledge-Based Systems (Elsevier); “Extended Abstract” accepted and to appear at AAMAS 2018

Subjects

Artificial Intelligence (cs.AI)

Planning with preferences has been employed extensively to quickly generate

high-quality plans. However, it may be difficult for the human expert to supply

this information without knowledge of the reasoning employed by the planner and

the distribution of planning problems. We consider the problem of actively

eliciting preferences from a human expert during the planning process.

Specifically, we study this problem in the context of the Hierarchical Task

Network (HTN) planning framework as it allows easy interaction with the human.

Our experimental results on several diverse planning domains show that the

preferences gathered using the proposed approach improve the quality and speed

of the planner, while reducing the burden on the human expert.

Cross-domain Dialogue Policy Transfer via Simultaneous Speech-act and Slot Alignment

Kaixiang Mo , Yu Zhang , Qiang Yang , Pascale Fung

Comments: v7

Subjects

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI)

Dialogue policy transfer enables us to build dialogue policies in a target

domain with little data by leveraging knowledge from a source domain with

plenty of data. Dialogue sentences are usually represented by speech-acts and

domain slots, and the dialogue policy transfer is usually achieved by assigning

a slot mapping matrix based on human heuristics. However, existing dialogue

policy transfer methods cannot transfer across dialogue domains with different

speech-acts, for example, between systems built by different companies. Also,

they depend on either common slots or slot entropy, which are not available

when the source and target slots are totally disjoint and no database is

available to calculate the slot entropy. To solve this problem, we propose a

Policy tRansfer across dOMaIns and SpEech-acts (PROMISE) model, which is able

to transfer dialogue policies across domains with different speech-acts and

disjoint slots. The PROMISE model can learn to align different speech-acts and

slots simultaneously, and it does not require common slots or the calculation

of the slot entropy. Experiments on both real-world dialogue data and

simulations demonstrate that the PROMISE model can effectively transfer dialogue

policies across domains with different speech-acts and disjoint slots.

An Ensemble Generation Method Based on Instance Hardness

Felipe N. Walmsley , George D. C. Cavalcanti , Dayvid V. R. Oliveira , Rafael M. O. Cruz , Robert Sabourin

Comments: Paper accepted for publication on IJCNN 2018

Subjects

Learning (cs.LG)

; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In Machine Learning, ensemble methods have been receiving a great deal of

attention. Techniques such as Bagging and Boosting have been successfully

applied to a variety of problems. Nevertheless, such techniques are still

susceptible to the effects of noise and outliers in the training data. We

propose a new method for the generation of pools of classifiers based on

Bagging, in which the probability of an instance being selected during the

resampling process is inversely proportional to its instance hardness, which

can be understood as the likelihood of an instance being misclassified,

regardless of the choice of classifier. The goal of the proposed method is to

remove noisy data without sacrificing the hard instances which are likely to be

found on class boundaries. We evaluate the performance of the method in

nineteen public data sets, and compare it to the performance of the Bagging and

Random Subspace algorithms. Our experiments show that in high noise scenarios

the accuracy of our method is significantly better than that of Bagging.
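
The resampling step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hardness scores are assumed to be given in [0, 1] (the abstract does not say how they are estimated), the weight 1 / (hardness + eps) is one possible reading of "inversely proportional", and the names hardness_weighted_bootstrap and eps are hypothetical.

```python
import numpy as np

def hardness_weighted_bootstrap(hardness, n_bags, eps=0.05, seed=0):
    """Draw bootstrap samples in which harder instances are picked less often.

    Assumptions (illustrative): hardness values lie in [0, 1] and are given;
    eps is a hypothetical smoothing constant that keeps weights finite
    when hardness is 0.
    """
    rng = np.random.default_rng(seed)
    hardness = np.asarray(hardness, dtype=float)
    weights = 1.0 / (hardness + eps)          # inverse relation to hardness
    probs = weights / weights.sum()           # normalise to a distribution
    n = len(hardness)
    # Each bag holds n indices sampled with replacement, as in plain Bagging.
    bags = [rng.choice(n, size=n, replace=True, p=probs) for _ in range(n_bags)]
    return probs, bags

hardness = [0.0, 0.1, 0.5, 0.9, 1.0]
probs, bags = hardness_weighted_bootstrap(hardness, n_bags=3)
print(np.round(probs, 3))
print(bags)
```

Each bag of indices would then train one base classifier of the pool; setting all hardness values equal recovers uniform Bagging.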

Stylistic Variation in Social Media Part-of-Speech Tagging

Murali Raghu Babu Balusu , Taha Merghani , Jacob Eisenstein

Comments: 9 pages, Published in Proceedings of NAACL workshop on stylistic variation (2018)

Subjects

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI)

Social media features substantial stylistic variation, raising new challenges

for syntactic analysis of online writing. However, this variation is often

aligned with author attributes such as age, gender, and geography, as well as

more readily-available social network metadata. In this paper, we report new

evidence on the link between language and social networks in the task of

part-of-speech tagging. We find that tagger error rates are correlated with

network structure, with high accuracy in some parts of the network, and lower

accuracy elsewhere. As a result, tagger accuracy depends on training from a

balanced sample of the network, rather than training on texts from a narrow

subcommunity. We also describe our attempts to add robustness to stylistic

variation, by building a mixture-of-experts model in which each expert is

associated with a region of the social network. While prior work found that

similar approaches yield performance improvements in sentiment analysis and

entity linking, we were unable to obtain performance improvements in

part-of-speech tagging, despite strong evidence for the link between

part-of-speech error rates and social network structure.

Information Retrieval

The Role-Relevance Model for Enhanced Semantic Targeting in Unstructured Text

Christopher A. George , Onur Ozdemir , Connie E. Fournelle , Kendra E. Moore

Comments: 10 pages, 3 figures, 6 tables, presented at SPIE Defense + Commercial Sensing: Next Generation Analyst (2018)

Subjects

Information Retrieval (cs.IR)

Personalized search provides a potentially powerful tool; however, it is

limited due to the large number of roles that a person has: parent, employee,

consumer, etc. We present the role-relevance algorithm: a search technique that

favors search results relevant to the user’s current role. The role-relevance

algorithm uses three factors to score documents: (1) the number of keywords

each document contains; (2) each document’s geographic relevance to the user’s

role (if applicable); and (3) each document’s topical relevance to the user’s

role (if applicable). Topical relevance is assessed using a novel extension to

Latent Dirichlet Allocation (LDA) that allows standard LDA to score document

relevance to user-defined topics. Overall results on a pre-labeled corpus show

an average improvement in search precision of approximately 20% compared to

keyword search alone.

twAwler: A lightweight twitter crawler

Polyvios Pratikakis

Comments: 8 pages, 7 figures, about to submit for review

Subjects

Social and Information Networks (cs.SI)

; Information Retrieval (cs.IR)

This paper presents twAwler, a lightweight twitter crawler that targets

language-specific communities of users. twAwler takes advantage of multiple

endpoints of the twitter API to explore user relations and quickly recognize

users belonging to the targeted set. It performs a complete crawl for all

users, discovering many standard user relations, including the retweet graph,

mention graph, reply graph, quote graph, follow graph, etc. twAwler respects

all twitter policies and rate limits, while being able to monitor large communities

of active users.

twAwler was used between August 2016 and March 2018 to generate an extensive

dataset of close to all Greek-speaking twitter accounts (about 330 thousand)

and their tweets and relations. In total, the crawler has gathered 750 million

tweets of which 424 million are in Greek; 750 million follow relations;

information about 300 thousand lists, their members (119 million member

relations) and subscribers (27 thousand subscription relations); 705 thousand

trending topics; information on 52 million users in total of which 292 thousand

have since been suspended, 141 thousand have deleted their account, and 3.5

million are protected and cannot be crawled. twAwler mines the collected tweets

for the retweet, quote, reply, and mention graphs, which, in addition to the

follow relation crawled, offer vast opportunities for analysis and further

research.

The FactChecker: Verifying Text Summaries of Relational Data Sets

Saehan Jo , Immanuel Trummer , Weicheng Yu , Daniel Liu , Niyati Mehta

Comments: 13 pages, 11 figures, 6 tables

Subjects

Databases (cs.DB)

; Information Retrieval (cs.IR)

We present a novel natural language query interface, the FactChecker, aimed

at text summaries of relational data sets. The tool focuses on natural language

claims that translate into an SQL query and a claimed query result. Similar in

spirit to a spell checker, the FactChecker marks up text passages that seem to

be inconsistent with the actual data. At the heart of the system is a

probabilistic model that reasons about the input document in a holistic

fashion. Based on claim keywords and the document structure, it maps each text

claim to a probability distribution over associated query translations. By

efficiently executing tens to hundreds of thousands of candidate translations

for a typical input document, the system maps text claims to correctness

probabilities. This process becomes practical via a specialized processing

backend, avoiding redundant work via query merging and result caching.

Verification is an interactive process in which users are shown tentative

results, enabling them to take corrective actions if necessary.

Our system was tested on a set of 53 public articles containing 392 claims.

Our test cases include articles from major newspapers, summaries of survey

results, and Wikipedia articles. Our tool revealed erroneous claims in roughly

a third of test cases. A detailed user study shows that users using our tool

are on average six times faster at checking text summaries, compared to generic

SQL interfaces. In fully automated verification, our tool achieves

significantly higher recall and precision than baselines from the areas of

natural language query interfaces and fact checking.

Approaches for Enriching and Improving Textual Knowledge Bases

Besnik Fetahu

Comments: PhD thesis, 2017

Subjects

Computation and Language (cs.CL)

; Information Retrieval (cs.IR)

Verifiability is one of the core editing principles in Wikipedia, where

editors are encouraged to provide citations for the added statements.

Statements can be any arbitrary piece of text, ranging from a sentence up to a

paragraph. However, in many cases, citations are either outdated, missing, or

link to non-existing references (e.g. dead URL, moved content etc.). In total,

in 20% of the cases such citations refer to news articles and represent the

second most cited source. Even in cases where citations are provided, there are

no explicit indicators for the span of a citation for a given piece of text. In

addition to issues related to the verifiability principle, many Wikipedia

entity pages are incomplete, with relevant information that is already

available in online news sources missing. Even for the already existing

citations, there is often a delay between the news publication time and the

reference time.

In this thesis, we address the aforementioned issues and propose automated

approaches that enforce the verifiability principle in Wikipedia, and suggest

relevant and missing news references for further enriching Wikipedia entity

pages.

Benchmarking Top-K Keyword and Top-K Document Processing with T²K² and T²K²D²

Ciprian-Octavian Truica (UPB), Jérôme Darmont (ERIC), Alexandru Boicea (UPB), Florin Radulescu (UPB)

Journal-ref: Future Generation Computer Systems, Elsevier, 2018, 85, pp.60-75.

https://www.sciencedirect.com/science/article/pii/S0167739X17323580

Subjects

Databases (cs.DB)

; Information Retrieval (cs.IR)

Top-k keyword and top-k document extraction are very popular text analysis

techniques. Top-k keywords and documents are often computed on-the-fly, but

they exploit weighted vocabularies that are costly to build. To compare

competing weighting schemes and database implementations, benchmarking is

customary. To the best of our knowledge, no benchmark currently addresses these

problems. Hence, in this paper, we present T²K², a top-k keywords and

documents benchmark, and its decision support-oriented evolution

T²K²D². Both benchmarks feature a real tweet dataset and queries

with various complexities and selectivities. They help evaluate weighting

schemes and database implementations in terms of computing performance. To

illustrate our benchmarks’ relevance and genericity, we successfully ran

performance tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand,

and on different relational (Oracle, PostgreSQL) and document-oriented

(MongoDB) database implementations, on the other hand.
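
To make the top-k keyword task concrete, here is a small sketch of ranking keywords by TF-IDF over an in-memory corpus. It is an illustration under stated assumptions, not the benchmark's query workload: it uses one common TF-IDF variant (term frequency normalised by document length, idf = log(N / df)), whitespace tokenisation, and a hypothetical function name top_k_keywords.

```python
import math
from collections import Counter

def top_k_keywords(docs, k):
    """Return the k terms with the highest summed TF-IDF weight.

    Assumptions (illustrative): whitespace tokenisation, tf = count / length,
    idf = log(N / df). Other weighting variants are equally valid.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(k)]

docs = ["the cat sat", "the dog sat", "the cat ran fast"]
print(top_k_keywords(docs, 3))
```

A term that occurs in every document, such as "the" here, gets idf = 0 and therefore sinks to the bottom of the ranking.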

Computation and Language

Phrase-Based & Neural Unsupervised Machine Translation

Guillaume Lample , Myle Ott , Alexis Conneau , Ludovic Denoyer , Marc'Aurelio Ranzato Subjects : Computation and Language (cs.CL)

Machine translation systems achieve near human-level performance on some

languages, yet their effectiveness strongly relies on the availability of large

amounts of bitexts, which hinders their applicability to the majority of

language pairs. This work investigates how to learn to translate when having

access to only large monolingual corpora in each language. We propose two model

variants, a neural and a phrase-based model. Both versions leverage automatic

generation of parallel data by backtranslating with a backward model operating

in the other direction, and the denoising effect of a language model trained on

the target side. These models are significantly better than methods from the

literature, while being simpler and having fewer hyper-parameters. On the

widely used WMT14 English-French and WMT16 German-English benchmarks, our

models respectively obtain 27.1 and 23.6 BLEU points without using a single

parallel sentence, outperforming the state of the art by more than 11 BLEU

points.

Learning Semantic Textual Similarity from Conversations

Yinfei Yang , Steve Yuan , Daniel Cer , Sheng-yi Kong , Noah Constant , Petr Pilar , Heming Ge , Yun-Hsuan Sung , Brian Strope , Ray Kurzweil

Comments: 10 pages, 8 Figures, 6 Tables

Subjects

Computation and Language (cs.CL)

We present a novel approach to learn representations for sentence-level

semantic similarity using conversational data. Our method trains an

unsupervised model to predict conversational input-response pairs. The

resulting sentence embeddings perform well on the semantic textual similarity

(STS) benchmark and SemEval 2017’s Community Question Answering (CQA) question

similarity subtask. Performance is further improved by introducing multitask

training combining the conversational input-response prediction task and a

natural language inference task. Extensive experiments show the proposed model

achieves the best performance among all neural models on the STS benchmark and

is competitive with the state-of-the-art feature engineered and mixed systems

in both tasks.

Improving Supervised Bilingual Mapping of Word Embeddings

Armand Joulin , Piotr Bojanowski , Tomas Mikolov , Edouard Grave Subjects : Computation and Language (cs.CL) ; Learning (cs.LG)

Continuous word representations, learned on different languages, can be

aligned with remarkable precision. Using a small bilingual lexicon as training

data, learning the linear transformation is often formulated as a regression

problem using the square loss. The obtained mapping is known to suffer from the

hubness problem, when used for retrieval tasks (e.g. for word translation). To

address this issue, we propose to use a retrieval criterion instead of the

square loss for learning the mapping. We evaluate our method on word

translation, showing that our loss function leads to state-of-the-art results,

with the biggest improvements observed for distant language pairs such as

English-Chinese.
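
The square-loss baseline that this abstract starts from can be sketched as below. This is a toy illustration on synthetic vectors, not the authors' code, and it does not implement the retrieval criterion the paper proposes; the function names fit_linear_map and translate and the synthetic data are assumptions.

```python
import numpy as np

def fit_linear_map(src, tgt):
    """Least-squares mapping W that minimises ||src @ W - tgt||^2."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

def translate(src_vecs, W, tgt_matrix):
    """Map source vectors and return the index of the nearest target by cosine."""
    mapped = src_vecs @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    return (mapped @ tgt.T).argmax(axis=1)

# Synthetic "embeddings": the target space is a noisy linear image of the source.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 8))
true_map = rng.normal(size=(8, 8))
tgt = src @ true_map + 0.01 * rng.normal(size=(50, 8))

W = fit_linear_map(src[:30], tgt[:30])      # first 30 pairs act as the lexicon
pred = translate(src[30:], W, tgt)          # retrieve translations for the rest
print((pred == np.arange(30, 50)).mean())   # fraction of held-out words retrieved correctly
```

On real embeddings, plain nearest-neighbour retrieval of this kind is where the hubness problem mentioned in the abstract shows up.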

Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension

Minjoon Seo , Tom Kwiatkowski , Ankur P. Parikh , Ali Farhadi , Hannaneh Hajishirzi

Comments: 6 pages

Subjects

Computation and Language (cs.CL)

The current trend of extractive question answering (QA) heavily relies on the

joint encoding of the document and the question. In this paper, we formalize a

new modular variant of extractive QA, Phrase-Indexed Question Answering

(PI-QA), that enforces complete independence of the document encoder from the

question by building the standalone representation of the document discourse, a

key research goal in machine reading comprehension. That is, the document

encoder generates an index vector for each answer candidate phrase in the

document; at inference time, each question is mapped to the same vector space

and the answer with the nearest index vector is obtained. The formulation also

implies a significant scalability advantage since the index vectors can be

pre-computed and hashed offline for efficient retrieval. We experiment with

baseline models for the new task, which achieve a reasonable accuracy but

significantly underperform unconstrained QA models. We invite the QA research

community to engage in PI-QA for closing the gap.
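
The inference step described in this abstract amounts to a nearest-neighbour lookup over pre-computed phrase vectors. The sketch below is an illustration only: the document and question encoders are replaced by hand-written toy vectors, Euclidean distance is an assumed choice of "nearest", and the names build_index and answer are hypothetical.

```python
import numpy as np

def build_index(phrase_vectors):
    """Pre-compute the phrase index offline, independently of any question."""
    return np.asarray(phrase_vectors, dtype=float)

def answer(question_vector, index, phrases):
    """Return the candidate phrase whose index vector is nearest to the question."""
    distances = np.linalg.norm(index - np.asarray(question_vector, dtype=float), axis=1)
    return phrases[int(distances.argmin())]

# Toy stand-ins for encoder outputs (assumed values, not from a trained model).
phrases = ["in 1998", "Paris", "a convolutional network"]
index = build_index([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

print(answer([0.1, 0.9, 0.2], index, phrases))
```

Because the index never depends on the question, it can be built once per document collection and then served by an approximate nearest-neighbour structure for larger corpora.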

Generating syntactically varied realisations from AMR graphs

Kris Cao , Stephen Clark Subjects : Computation and Language (cs.CL)

Generating from Abstract Meaning Representation (AMR) is an underspecified

problem, as many syntactic decisions are not specified by the semantic graph.

We learn a sequence-to-sequence model that generates possible constituency

trees for an AMR graph, and then train another model to generate text

realisations conditioned on both an AMR graph and a constituency tree. We show

that factorising the model this way lets us effectively use parse information,

obtaining competitive BLEU scores on self-generated parses and impressive BLEU

scores with oracle parses. We also demonstrate that we can generate

meaning-preserving syntactic paraphrases of the same AMR graph.

Lightweight Adaptive Mixture of Neural and N-gram Language Models

Anton Bakhtin , Arthur Szlam , Marc'Aurelio Ranzato , Edouard Grave Subjects : Computation and Language (cs.CL)

It is often the case that the best performing language model is an ensemble

of a neural language model with n-grams. In this work, we propose a method to

improve how these two models are combined. By using a small network which

predicts the mixture weight between the two models, we adapt their relative

importance at each time step. Because the gating network is small, it trains

quickly on small amounts of held out data, and does not add overhead at scoring

time. Our experiments carried out on the One Billion Word benchmark show a

significant improvement over the state of the art ensemble without retraining

of the basic modules.
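
The adaptive mixture described in this abstract can be written as p(w) = lam * p_neural(w) + (1 - lam) * p_ngram(w), with lam predicted at each time step. The sketch below is a minimal illustration, not the authors' architecture: the gate is reduced to a single logistic unit, and the feature vector, weights and the function name mix are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mix(p_neural, p_ngram, gate_features, gate_weights, gate_bias=0.0):
    """Combine two next-word distributions with a context-dependent weight.

    Assumption (illustrative): the gating network is a single logistic unit
    over hand-picked context features; both base models stay frozen.
    """
    lam = sigmoid(np.dot(gate_features, gate_weights) + gate_bias)
    mixture = lam * np.asarray(p_neural) + (1.0 - lam) * np.asarray(p_ngram)
    return mixture, lam

p_neural = np.array([0.7, 0.2, 0.1])   # toy next-word distribution from the neural model
p_ngram = np.array([0.1, 0.6, 0.3])    # toy next-word distribution from the n-gram model
mixture, lam = mix(p_neural, p_ngram,
                   gate_features=np.array([1.0, -0.5]),
                   gate_weights=np.array([0.8, 0.4]))
print(lam, mixture, mixture.sum())
```

Since lam lies strictly between 0 and 1 and both inputs are valid distributions, the mixture is itself a valid distribution, and only the gate parameters need to be trained on held-out data.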