AIOZ AI / Sep 21st / 6 min read

Large-Scale Coarse-to-Fine Object Retrieval Ontology and Deep Local Multitask Learning (Part 6)

Fashion Ontology.

In this section, we will mention about ontology, fashion ontology, and its related information and present the contributions of object ontology to the CFOR system.

1. Ontology Definition for CFOR System

As described by Guarino, ontology is a "formal, explicit specification of a shared conceptualization." Typically, ontologies consist of concepts and their hierarchical structure, aiding in organizing information within a domain. A complete ontology typically includes concepts, relations, and axioms. Additionally, ontologies offer several key advantages:

Describing domain knowledge through a semantic hierarchical tree, with concepts represented as nodes identified by words or phrases.
Bridging the semantic gap in various tasks, including those in computer vision and other disciplines.
Enhancing software engineering practices by improving flexibility, reliability, specification, and reusability.
Supporting multitask problem-solving capabilities.

Any proposed ontology should satisfy two fundamental criteria:

Wide recognition within the community.
Feasibility for formalization using mathematical expressions, enabling digitization.

In our approach, we employ ontological engineering to facilitate communication and information sharing across different levels of data abstraction involved in image fashion retrieval, detection, and information tagging.

The object ontology comprises two primary levels: coarse-grained and fine-grained.

At the coarse level, the object ontology includes regions, categories, or high-level conceptual entities, which leverage global features extracted by deep networks for similarity retrieval. However, these deep features are treated as black boxes, lacking explicit semantic information to aid users in their search process.
At the fine-grained level, the object ontology encompasses attributes that provide detailed descriptions of objects.

In our experiment, we focus on describing the object "Fashion." The fashion ontology is constructed using prior knowledge and information from the DeepFashion dataset, along with ontology definitions introduced by Guarino. See Figure.1 for an illustration of the fashion ontology.

Figure 1. Fashion ontology in general and a version of ontology for clothes.

The fashion ontology developed comprises three primary semantic levels: 1. Regions: Representing areas such as Top, Bottom, and Body. 2. Categories: Specific objects associated with each region, such as dresses or robes for the Body region. 3. Attributes: Describing detailed visual concepts like denim or fur.

To streamline the discussion, our investigation focuses on the object fashion across three regions (Top, Body, and Bottom), select categories within these regions, and their respective attributes.

Within the CFOR system, a query image undergoes processing starting from the coarse level of the object ontology to identify the region and category of the corresponding object. Subsequently, the object proceeds to the fine-grained concept ontology to ascertain attributes. Once all necessary information is obtained, the object undergoes indexing and similarity distance computation to identify similar images in the database, ranked by a cumulative score derived from similarity scores of global features and attribute learning between the query image and target database images. For a detailed illustration, refer to Figure.2.

Figure 2. An example of a relationship between the query image and semantic information from the coarse-grained level to the fine-grained level of the fashion ontology.

2. Fashion Object Ontology

In this section, we introduce the fashion object ontology. Within the fashion domain, we categorize semantic fashion concepts based on regions. Each region encompasses a detailed ontology comprising categories and attributes. To facilitate experiments using the DeepFashion dataset, we extend the fashion ontology within the "Clothes" branch (refer to Figure 9). It's essential to emphasize that the proposed ontology is not specific to any application and should be viewed as a flexible foundation.

The fashion object ontology consists of multiple levels of concepts, with relations between each level to articulate their associations. Two primary relations are employed: 1. "Part of": This relation specifies that the concepts are components of the main concept. 2. "Has a": This relation describes the main concept in detail.

For this study, we concentrate solely on the Clothes branch to ensure equitable comparisons with other methodologies. The Clothes taxonomy comprises 50 distinct categories. A clothing region taxonomy has been established (refer to Figure.3), organizing all clothing categories hierarchically. The first level of this hierarchy represents the most general clothing region, with three primary regions defined: 1. Top (e.g., tee and tank) 2. Bottom (e.g., skirt and jeans) 3. Body (e.g., dress and robe)

Figure 3. Excerpt from the “Clothes” taxonomy defined in the fashion ontology.

3. Fine-Grained Object Ontology

Fine-grained object ontology is used to describe objects at the attribute level. Semantic information such as attributes can be useful for a customer to retrieve (see Figure.4). It is important to note that the proposed ontology is not application dependent and should be considered as an extensible basis.

Figure 4. Fine-grained group at the attribute level.

Cloth attributes vary across different levels—some attributes, like color, are common across all cloth regions, while others are specific to certain regions or categories. Our ontology is structured into two main parts, each detailed in the following sections: 1. Specific fashion concepts—pertaining to particular characteristics of clothes such as fabric, part, and style. 2. Visual concepts—related to popular visual characteristics like color, shape, and texture, not exclusive to fashion.

Rudd et al. demonstrated in a study that a multitask learning-based model outperforms a combination of single-task learning-based models in face attribute prediction. While this approach shows promising results for fashion attributes as well, there's a significant difference in the quantity of attributes between faces and fashion items. This disparity can pose challenges in scaling the system, such as in training and storage requirements. To address this, we propose applying local multitask learning to attribute learning, providing more flexibility. Further explanation is provided in the subsequent sections.

In the next post, we will discover Attribute Learning and its correlation with multitask learning.