1. Summary
This package contains two parts:
- The "original" part contains 2,000 natural scene images. It is fairly
  large: about 35.9Mb (24.2Mb after compression).
- The "processed" part contains data sets for multi-instance multi-label
  learning. It is small: about 618Kb (608Kb after compression).
The data set has been used in:
ATTN: You are free to use the package (for academic purposes only) at your
own risk. An acknowledgement of or citation to the above paper is required.
For other purposes, please contact Prof. Zhi-Hua Zhou (zhouzh@nju.edu.cn).
Download: [datafile] (24.7Mb)
2. Details
The image data set consists of 2,000 natural scene images, each of which has
been manually assigned a set of labels. The possible class labels are desert,
mountains, sea, sunset and trees. Table 1 gives the number of images
associated with each label set. Images belonging to more than one class
(e.g. sea+sunset) make up over 22% of the data set, while many combined
classes (e.g. mountains+sunset+trees) are extremely rare. On average, each
image is associated with 1.24 class labels.
Table 1. Characteristics of the natural scene image data
----------------------------------------------------------------------------------------
Label Set         #Images | Label Set         #Images | Label Set                #Images
----------------------------------------------------------------------------------------
desert                340 | desert+sunset          21 | sunset+trees                  28
mountains             268 | desert+trees           20 | desert+mountains+sunset        1
sea                   341 | mountains+sea          38 | desert+sunset+trees            3
sunset                216 | mountains+sunset       19 | mountains+sea+trees            6
trees                 378 | mountains+trees       106 | mountains+sunset+trees         1
desert+mountains       19 | sea+sunset            172 | sea+sunset+trees               4
desert+sea              5 | sea+trees              14 | Total                      2,000
----------------------------------------------------------------------------------------
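The two summary figures quoted for this data set (over 22% multi-label
images, 1.24 labels per image on average) can be re-derived from the counts
in Table 1. The short Python sketch below does so; the names
LABEL_SET_COUNTS and summarize are illustrative only and are not part of
the package.

```python
# Image counts per label set, copied from Table 1.
LABEL_SET_COUNTS = {
    "desert": 340,
    "mountains": 268,
    "sea": 341,
    "sunset": 216,
    "trees": 378,
    "desert+mountains": 19,
    "desert+sea": 5,
    "desert+sunset": 21,
    "desert+trees": 20,
    "mountains+sea": 38,
    "mountains+sunset": 19,
    "mountains+trees": 106,
    "sea+sunset": 172,
    "sea+trees": 14,
    "sunset+trees": 28,
    "desert+mountains+sunset": 1,
    "desert+sunset+trees": 3,
    "mountains+sea+trees": 6,
    "mountains+sunset+trees": 1,
    "sea+sunset+trees": 4,
}


def summarize(counts):
    """Return (total images, multi-label fraction, mean labels per image)."""
    total = sum(counts.values())
    # A label set with "+" in its name contains more than one class.
    multi = sum(n for name, n in counts.items() if "+" in name)
    # Each image contributes as many labels as its label set has classes.
    labels = sum(n * len(name.split("+")) for name, n in counts.items())
    return total, multi / total, labels / total


total, multi_fraction, mean_labels = summarize(LABEL_SET_COUNTS)
print(total, round(multi_fraction, 4), round(mean_labels, 3))
# prints: 2000 0.2285 1.236
```

The result agrees with the text: 457 of the 2,000 images (22.85%) carry more
than one label, and the mean of 1.236 labels per image rounds to 1.24.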
The "original" part of this package contains all 2,000 natural scene images,
which are named by number from 1 to 2,000.
The "processed" part of this package contains the multi-instance multi-label
data (in MATLAB format) obtained from the natural scene images. Specifically,
each image is represented as a bag of nine instances generated by the SBN
(single blob with neighbors) method [1]. Concretely, each image is smoothed
by a Gaussian filter and subsampled to an 8x8 matrix of color blobs, where
each blob is a 2x2 set of pixels within the 8x8 matrix. An SBN is defined as
the combination of a single blob with its four neighboring blobs (up, down,
left, right). Each SBN sub-image is described by a 15-dimensional vector: the
first three attributes are the mean R, G, B values of the central blob, and
the remaining twelve attributes are the differences in mean color values
between the central blob and each of its four neighboring blobs (three color
channels times four neighbors). Therefore, each image bag is represented by a
collection of nine 15-dimensional feature vectors, obtained by using each of
the nine blobs not along the border as the central blob. In addition, each
image is manually assigned a set of labels.
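The feature construction described above can be sketched in Python. This is
only an illustration under stated assumptions, not the code used to produce
the processed data. The function name sbn_features, the 5x5 toy grid (chosen
so that exactly nine blobs lie off the border), the neighbor order and the
sign of the differences (neighbor minus center) are all assumptions, since
the description does not fix them. Gaussian smoothing and subsampling are
omitted; the input is taken to be a grid of per-blob mean colors.

```python
def sbn_features(blobs):
    """Build one 15-dimensional vector per blob that is not on the border.

    `blobs` is a grid (list of rows) of mean (R, G, B) triples, one per blob.
    """
    rows, cols = len(blobs), len(blobs[0])
    bag = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = blobs[r][c]
            # Attributes 1-3: mean R, G, B of the central blob.
            vector = list(center)
            # Attributes 4-15: color differences to the up, down, left and
            # right neighbors (assumed order; assumed sign: neighbor - center).
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                neighbor = blobs[r + dr][c + dc]
                vector.extend(n - m for n, m in zip(neighbor, center))
            bag.append(vector)
    return bag


# Toy 5x5 grid of blob colors: its 3x3 interior yields a bag of nine instances.
grid = [[(r * 10.0, c * 10.0, 0.0) for c in range(5)] for r in range(5)]
bag = sbn_features(grid)
print(len(bag), len(bag[0]))  # prints: 9 15
```

Each returned vector has 3 + 4 x 3 = 15 entries, matching the dimensionality
stated above.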
After the processed data is read into the MATLAB environment, the image bag
corresponding to the i-th natural scene image in the "original" part is
stored in bags{i,1}, while its associated labels are stored in target(:,i).
For illustration, suppose target(:,i)' equals [1 -1 -1 1 -1]; this means that
the i-th image belongs to the 1st and 4th classes but does not belong to the
2nd, 3rd and 5th classes. The variable "class_name" gives the name of each
class.
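The +1/-1 label encoding can be made concrete with a small Python sketch that
turns one label column into class names. The helper decode_labels and the
class order used in the example are hypothetical; in the real data the order
should be taken from "class_name".

```python
def decode_labels(target_column, class_names):
    """Return the names of the classes marked +1 in one column of `target`."""
    if len(target_column) != len(class_names):
        raise ValueError("label vector and class list differ in length")
    return [name for flag, name in zip(target_column, class_names) if flag == 1]


# Hypothetical class order, for illustration only.
class_names = ["desert", "mountains", "sea", "sunset", "trees"]
print(decode_labels([1, -1, -1, 1, -1], class_names))
# prints: ['desert', 'sunset']
```

If the MATLAB file is read from Python (for instance with scipy.io.loadmat),
note that MATLAB indices are 1-based, so target(:,i) would correspond to
column i-1 of the loaded array.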
[1] O. Maron and A. L. Ratan. Multiple-instance learning for natural scene
classification. In: Proceedings of the 15th International Conference on
Machine Learning, pp. 341-349, Madison, WI, 1998.