
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

2018-09-23 05:05

  Bryan A. Plummer¹, Liwei Wang¹, Christopher M. Cervantes¹, Juan C. Caicedo²,

  Julia Hockenmaier¹, Svetlana Lazebnik¹

  ¹University of Illinois at Urbana-Champaign  ²Fundación Universitaria Konrad Lorenz

  The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects. While our baseline rivals more complex state-of-the-art models in accuracy, we show that its gains cannot be easily parlayed into improvements on such tasks as image-sentence retrieval, thus underlining the limitations of current methods and the need for further research.
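  The baseline above picks a box for a phrase by combining several cues. The following is a minimal sketch of that idea, assuming a plain weighted sum over per-region scores. The `Region` type, the cue field names, the weights, and the area-based size bias are all hypothetical illustrations, not the formulation used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A candidate box with hypothetical per-cue scores for one phrase."""
    box: tuple              # (xmin, ymin, xmax, ymax) in pixels
    embedding_score: float  # image-text embedding similarity (assumed scale)
    detector_score: float   # object detector confidence (assumed scale)
    color_score: float      # color classifier confidence (assumed scale)


def box_area(box):
    """Area of an (xmin, ymin, xmax, ymax) box; zero if degenerate."""
    xmin, ymin, xmax, ymax = box
    return max(0, xmax - xmin) * max(0, ymax - ymin)


def combined_score(region, image_area, weights):
    """Weighted sum of the cues; the size term favors larger boxes."""
    size_bias = box_area(region.box) / image_area
    return (weights["embedding"] * region.embedding_score
            + weights["detector"] * region.detector_score
            + weights["color"] * region.color_score
            + weights["size"] * size_bias)


def localize(regions, image_area, weights):
    """Return the candidate region with the highest combined score."""
    return max(regions, key=lambda r: combined_score(r, image_area, weights))
```

  With equal weights and otherwise identical cue scores, `localize` returns the larger of two candidate boxes, which is the effect the size bias is meant to have.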

  Dataset Examples:

  In each group of captions describing the same image, coreferent mentions (coreference chains) and their corresponding bounding boxes are marked with the same color. In the left example, each chain points to a single entity (bounding box). Scenes and events like "outside" or "parade" have no box. In the middle example, the people (red) and flags (blue) chains point to multiple boxes each. On the right, blue phrases refer to the bride, and red phrases refer to the groom. The dark purple phrases ("a couple") refer to both of these entities, and their corresponding bounding boxes are identical to the red and blue ones.
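  The relationships described above (one chain to one box, one chain to several boxes, chains with no box, and a plural chain that reuses other chains' boxes) can be sketched as a small data structure. This is an illustrative model only: the class name, field names, chain IDs, and coordinates are made up for the example and are not the format of the released annotations.

```python
from dataclasses import dataclass, field


@dataclass
class CoreferenceChain:
    """Mentions of one entity across an image's captions, plus its boxes."""
    chain_id: str
    mentions: list = field(default_factory=list)  # (caption_index, phrase)
    boxes: list = field(default_factory=list)     # (xmin, ymin, xmax, ymax)

    @property
    def is_groundable(self):
        """Scenes and events such as "outside" or "parade" have no box."""
        return len(self.boxes) > 0


# Hypothetical chains mirroring the wedding and parade examples.
bride = CoreferenceChain("1", [(0, "the bride")], [(120, 40, 260, 380)])
groom = CoreferenceChain("2", [(0, "the groom")], [(250, 30, 400, 390)])
# "a couple" refers to both entities, so it reuses both of their boxes.
couple = CoreferenceChain("3", [(1, "a couple")], bride.boxes + groom.boxes)
# An event mention with no corresponding box.
parade = CoreferenceChain("4", [(2, "parade")])
```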

  

  You can browse additional examples from our dataset at:

  [Examples] [Browse by phrase]

  Dataset:

  Please fill in this form to request access to the Flickr30k Entities dataset. The annotations are in XML format and the size of the archive is 11 MB. Instructions for obtaining access are emailed automatically as soon as a request is made.
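  Since the annotations are distributed as XML, reading them might look like the sketch below. The element names (`annotation`, `object`, `name`, `bndbox`, and the coordinate tags) are an assumed PASCAL VOC-style layout chosen for illustration; the page does not specify the schema, so check the files in the archive before relying on any of these names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in an assumed VOC-style layout (not taken from the archive).
SAMPLE_XML = """<annotation>
  <object>
    <name>101</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>101</name>
    <bndbox><xmin>150</xmin><ymin>30</ymin><xmax>240</xmax><ymax>210</ymax></bndbox>
  </object>
  <object>
    <name>102</name>
  </object>
</annotation>"""


def boxes_by_chain(xml_text):
    """Group bounding boxes by chain ID; chains without a box map to []."""
    root = ET.fromstring(xml_text)
    chains = {}
    for obj in root.iter("object"):
        chain_id = obj.findtext("name")
        if chain_id is None:
            continue  # skip entries without an ID
        boxes = chains.setdefault(chain_id, [])
        bndbox = obj.find("bndbox")
        if bndbox is not None:
            boxes.append(tuple(
                int(bndbox.findtext(tag))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            ))
    return chains
```

  Calling `boxes_by_chain(SAMPLE_XML)` on the sample groups both boxes under chain `"101"` and leaves chain `"102"` with an empty list, matching the case of mentions that have no box.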

  Please visit the website for the original Flickr30k dataset to obtain the images for the dataset.

  [Flickr30k]

  The Flickr30k dataset website is temporarily down. You can access the images and captions for a limited time from the following link (4.1 GB):

  [Flickr30k]

  Reference:

  We have a journal version of our paper with a stronger baseline on the phrase localization task:

  Bryan A. Plummer, Liwei Wang, Christopher M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik, Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models, IJCV, 123(1):74-93, 2017. [paper]
