© LAMDA 2005-2015

Title: Learning Sequences: image caption with region-based attention and scene factorization
Speaker: Prof. Zhang Changshui (张长水), Tsinghua University
Abstract: Sequence learning is a challenging task. Recent progress on automatic generation of image captions has shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. In this talk, we first introduce some models for sequence modeling. We then introduce our image captioning system, which exploits the parallel structures between images and sentences. In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience, in which attention shifting among the visual regions imposes a thread of visual ordering. This alignment characterizes the flow of "abstract meaning", encoding what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. These contexts adapt the language models used for word generation to specific scene types. We benchmark our system and compare it with published results on several popular datasets. We show that using either region-based attention or scene-specific contexts improves over systems without those components. Furthermore, combining these two modeling ingredients attains state-of-the-art performance.
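
To make the two ingredients named in the abstract concrete, here is a minimal, illustrative sketch of one decoding step: soft attention over region features, followed by word scoring shifted by a scene-specific bias. It is not the speaker's model. The dot-product scoring, the additive scene bias, the toy vectors, and all names (`attend`, `next_word_probs`, `scene_bias`) are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(regions, hidden):
    """Soft attention: weight each region by its similarity to the decoder state.

    Dot-product similarity is an assumption; the abstract does not specify
    how regions are scored.
    """
    weights = softmax([dot(r, hidden) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, context


def next_word_probs(context, word_vectors, scene_bias):
    """Score each word against the attended context, shifted by a per-scene bias.

    The additive bias is a hypothetical stand-in for a scene-specific context
    that adapts the language model.
    """
    logits = [dot(context, v) + b for v, b in zip(word_vectors, scene_bias)]
    return softmax(logits)


# Toy data: two image regions, a decoder state, and a two-word vocabulary.
regions = [[1.0, 0.0], [0.0, 1.0]]
hidden = [2.0, 0.0]
vocab = ["dog", "beach"]
word_vectors = [[1.0, 0.0], [0.0, 1.0]]
scene_bias = [0.0, 0.5]  # hypothetical bias for a "seaside" scene type

weights, context = attend(regions, hidden)
probs = next_word_probs(context, word_vectors, scene_bias)
for word, p in zip(vocab, probs):
    print(word, round(p, 3))
```

In this toy setup the decoder state is closest to the first region, so that region receives most of the attention weight. The scene bias raises the score of "beach" without changing which region is attended, which is the sense in which the two components act independently and can be combined.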