# Skills duplicate detection

### **1. What is AI Skill duplicate detection?**

The **AI Duplicate Detection** feature helps you clean and standardize your skills database by identifying potential duplicate skills.

Over time, skill libraries can become inconsistent due to:

* Slight spelling differences
* Different naming conventions
* Abbreviations vs. full names
* Similar skills created by different teams

This AI-powered tool scans your skills database and groups potential duplicates into clusters so you can review and resolve them efficiently.

### **2. What is the purpose of duplicate detection?** <a href="#h_ba9f231322" id="h_ba9f231322"></a>

AI duplicate detection helps you maintain a single, consistent source of truth and ensures:

* accurate reporting
* clear workforce insights
* reliable compliance tracking
* reduced administrative clutter

### **3. How do you use AI Skill duplicate detection?** <a href="#h_9c88230f8e" id="h_9c88230f8e"></a>

To access the AI duplicate detection feature:

1. Open your **Profile flyout**
2. Click the **AI Assistant** button
3. You’ll be redirected to the AI duplicate detection page

When you open the page for the first time, it may be empty until you run a detection.

<h4 align="center"><img src="/files/Elhm35dPRb1qVuB9M6qk" alt=""></h4>

#### Step 1: Configure detection

At the top of the page, you’ll see the detection configuration.

You can choose a **similarity threshold**:

* **Strict** – Only very close matches are detected
* **Balanced** – Moderate similarity (recommended starting point)
* **Loose** – Broader matching, more potential duplicates

Each setting controls how sensitive the AI is when comparing skill names and related data. We recommend starting with **Balanced**: it surfaces meaningful duplicates without flooding you with loosely related matches, which makes it the most effective option for an initial review.
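
The page does not document how the three presets translate into numbers. As a rough mental model only, the sketch below assumes each preset maps to a minimum cosine-similarity score between two skill vectors. The cutoff values and the `THRESHOLDS`, `cosine_similarity` and `is_candidate` names are hypothetical illustrations, not AG5's actual settings or code.

```python
import math

# Hypothetical cutoffs for illustration only; AG5 does not publish the real values.
THRESHOLDS = {"strict": 0.95, "balanced": 0.85, "loose": 0.75}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat empty vectors as unrelated
    return dot / (norm_a * norm_b)


def is_candidate(a: list[float], b: list[float], preset: str = "balanced") -> bool:
    """True if two skill vectors are similar enough to be flagged under the preset."""
    return cosine_similarity(a, b) >= THRESHOLDS[preset]


# Toy 2-D vectors; real skill embeddings have many more dimensions.
a, b = [1.0, 0.0], [0.8, 0.6]  # cosine similarity of about 0.8
for preset in ("strict", "balanced", "loose"):
    print(preset, is_candidate(a, b, preset))
# strict False
# balanced False
# loose True
```

In this model a looser preset simply lowers the cutoff, so more pairs qualify, which matches the description of Loose as producing more potential duplicates.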

***

#### Step 2: Run detection

Click **Run Detection**.

* The process duration depends on the size of your database.
* It runs in the background.
* You’ll see a **completion percentage**.
* You can stop the process if needed.

> During the first run, **processing may take several hours** because none of the skills have been vectorized yet. Vectorization enables the AI to compare skills based on their meaning and similarity.
>
> On subsequent runs, only newly created or updated skills are processed. This significantly **reduces the runtime.**
>
> Once all skills have been vectorized, rerunning the process with a different similarity setting (e.g., stricter or more lenient) typically takes only a few minutes.
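
The note above says that later runs only vectorize new or updated skills. A minimal sketch of that caching idea follows, assuming each skill has an id, a name and an `updated_at` timestamp. The `embed` function is a hypothetical stand-in (toy vowel counts) for whatever model AG5 actually uses, and the other names are illustrative too.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Skill:
    id: str
    name: str
    updated_at: datetime


def embed(text: str) -> list[float]:
    """Hypothetical placeholder for a real embedding model (toy vowel counts)."""
    return [float(text.lower().count(c)) for c in "aeiou"]


# Cache: skill id -> (timestamp the vector was computed from, vector)
VectorCache = dict[str, tuple[datetime, list[float]]]


def refresh_vectors(skills: list[Skill], cache: VectorCache) -> list[str]:
    """Embed only skills that are new or changed since they were last vectorized.

    Returns the ids that were (re)embedded during this run."""
    processed = []
    for skill in skills:
        cached = cache.get(skill.id)
        if cached is None or cached[0] < skill.updated_at:
            cache[skill.id] = (skill.updated_at, embed(skill.name))
            processed.append(skill.id)
    return processed


t0, t1 = datetime(2024, 1, 1), datetime(2024, 2, 1)
cache: VectorCache = {}
# First run: nothing is cached yet, so every skill is embedded.
first = refresh_vectors([Skill("s1", "Welding", t0), Skill("s2", "Forklift driving", t0)], cache)
# Later run: only the skill whose timestamp moved forward is embedded again.
second = refresh_vectors([Skill("s1", "Welding", t0), Skill("s2", "Forklift operation", t1)], cache)
print(first, second)  # ['s1', 's2'] ['s2']
```

This is why the first run is the slow one: its cost grows with the whole library, while later runs scale with the number of changes.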

When completed, you’ll see:

* Number of skills analyzed
* Number of potential duplicates found

<figure><img src="/files/Bpr3booyBSxmdG5X0r8e" alt=""><figcaption></figcaption></figure>

**What happens when you run detection again?**

To keep your skill library clean and well-structured, we recommend running duplicate detection after large imports or updates.

Each time you complete a new detection run, the system rebuilds the duplicate clusters based on the current state of your database.

* Existing clusters from the previous run are **replaced.**
* All newly **created and updated** skills are analyzed.
* The system **compares** those skills **against the rest of your skills library** to identify duplicates based on the selected similarity settings.
* If **new potential duplicates** are found, they are included in the new clusters.
* Skills that were previously grouped together may no longer appear in the same cluster if the underlying data has changed.

This means the results always reflect the latest version of your skills library, and any new or changed duplicates are captured in the updated clusters.
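
One way to picture the rebuild is to take every pair of skills judged similar enough and group them into connected clusters. The sketch below uses a union-find (disjoint-set) structure for that grouping. This is a common technique for forming duplicate clusters but an assumption here, not AG5's documented algorithm, and `build_clusters` is a hypothetical name. Skills left on their own correspond to what the Non-duplicate skills tab lists.

```python
def build_clusters(
    ids: list[str], similar_pairs: list[tuple[str, str]]
) -> tuple[list[list[str]], list[str]]:
    """Group skills into clusters from pairs already judged similar.

    Returns (clusters with two or more skills, skills with no duplicate)."""
    parent = {i: i for i in ids}  # fresh structure on every run

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)

    clusters = [g for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, singletons


ids = ["Excel", "MS Excel", "Microsoft Excel", "Welding"]
pairs = [("Excel", "MS Excel"), ("MS Excel", "Microsoft Excel")]
clusters, alone = build_clusters(ids, pairs)
print(clusters)  # [['Excel', 'MS Excel', 'Microsoft Excel']]
print(alone)     # ['Welding']
```

Because the function starts from an empty structure on every call, the output of a previous run is simply replaced, and a skill whose data changed can land in a different cluster, as described above.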

### 4. Understanding the AI skill duplicate detection results

#### Duplicate Clusters

Potential duplicates are grouped into **clusters**, which are shown in the **Suggested duplicates** tab.

Each cluster contains two or more skills that the AI suggests as possible duplicates.

You can:

* Expand a cluster to see details
* Click on a skill name to open its full skill page
* Review the similarities before deciding whether to merge the skills or to mark the cluster as not duplicates and keep the skills intact

#### Reasoning Filter

You can filter results by **AI reasoning**. The reasoning explains why the AI classifies skills as potential duplicates.

There are six possible reasoning types:

1. Exact match
2. Misspelling
3. Multi language
4. Paraphrase
5. Special characters
6. Miscellaneous

The reasoning filter helps you:

* Understand the AI’s logic
* Prioritize review
* Focus on certain types of similarities
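
To make the filter concrete, the sketch below models the six reasoning types as an enum and keeps only clusters tagged with the selected types. The `Cluster` shape, the enum member names and the sample clusters are hypothetical; the product's internal data model is not documented here.

```python
from dataclasses import dataclass
from enum import Enum


class Reasoning(Enum):
    EXACT_MATCH = "Exact match"
    MISSPELLING = "Misspelling"
    MULTI_LANGUAGE = "Multi language"
    PARAPHRASE = "Paraphrase"
    SPECIAL_CHARACTERS = "Special characters"
    MISCELLANEOUS = "Miscellaneous"


@dataclass
class Cluster:
    skills: list[str]
    reasoning: Reasoning


def filter_by_reasoning(clusters: list[Cluster], wanted: set[Reasoning]) -> list[Cluster]:
    """Keep only clusters whose AI reasoning is one of the selected types."""
    return [c for c in clusters if c.reasoning in wanted]


clusters = [
    Cluster(["Forklift", "Fork-lift"], Reasoning.SPECIAL_CHARACTERS),
    Cluster(["Welding", "Soudage"], Reasoning.MULTI_LANGUAGE),
    Cluster(["Pyhton", "Python"], Reasoning.MISSPELLING),
]
quick_wins = filter_by_reasoning(clusters, {Reasoning.MISSPELLING, Reasoning.SPECIAL_CHARACTERS})
for c in quick_wins:
    print(c.reasoning.value, c.skills)
```

Reviewing misspelling and special-character clusters first is one possible way to prioritize, since such pairs are often quick to confirm.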

#### Feedback History Tab

This tab shows:

* Clusters that were reviewed and marked as not duplicates
* Notes added by reviewers
* Who marked them as not duplicates, and when

You can reverse a decision to mark skills as not duplicates if needed.

#### Non-duplicate Skills Tab

This tab shows:

* Skills that were analyzed and for which no duplicates were detected

<figure><img src="/files/JedgHN27vJtD5HFhM5Yc" alt=""><figcaption></figcaption></figure>

### 5. Reviewing and Managing Duplicates

#### Option 1: Mark as not duplicates

If skills are clearly different:

1. Click **Mark as not duplicates**
2. Add a note explaining your reasoning
   1. Explain the key differences between the skills (e.g., different business contexts, distinct target audiences, or unique technical requirements).
3. Confirm

Your note will:

* Help the AI make better duplicate decisions in future runs and avoid repeated clusters
* Be visible to other reviewers in your organization
* Provide transparency

The cluster will move to the **Feedback History** tab.
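
How the AI actually uses your notes is not documented here. As a simplified, hypothetical model of the bookkeeping side, the sketch below stores each "not duplicates" decision with its note, reviewer and timestamp, hides a newly suggested cluster only if exactly the same set of skills was already marked, and supports reversing a decision as the Feedback History tab allows. All class and method names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Feedback:
    skills: frozenset[str]
    note: str
    marked_by: str
    marked_at: datetime


class FeedbackHistory:
    def __init__(self) -> None:
        # Keyed by the skill set so the order of skills does not matter.
        self._entries: dict[frozenset[str], Feedback] = {}

    def mark_not_duplicates(self, skills: list[str], note: str, user: str) -> Feedback:
        """Record who marked the cluster, when, and why."""
        entry = Feedback(frozenset(skills), note, user, datetime.now())
        self._entries[entry.skills] = entry
        return entry

    def reverse(self, skills: list[str]) -> None:
        """Undo a 'not duplicates' decision so the cluster can be suggested again."""
        self._entries.pop(frozenset(skills), None)

    def suppress(self, suggested: list[list[str]]) -> list[list[str]]:
        """Drop suggested clusters whose exact skill set was already reviewed."""
        return [c for c in suggested if frozenset(c) not in self._entries]


history = FeedbackHistory()
history.mark_not_duplicates(["Java", "JavaScript"], "Different programming languages.", "reviewer@example.com")
remaining = history.suppress([["Java", "JavaScript"], ["Excel", "MS Excel"]])
print(remaining)  # [['Excel', 'MS Excel']]
history.reverse(["JavaScript", "Java"])  # order of skills does not matter
```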

#### Option 2: Merge duplicate skills

When the skills in a cluster are indeed duplicates, you can merge them into a single skill.

For now, this has to be done manually: deactivate the duplicate skill(s) and reassign their related objects to the surviving skill. AG5 can help you with this.

{% hint style="info" %}
Built-in merge functionality will be available soon.
{% endhint %}
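
Until the built-in merge is available, the manual steps amount to: choose a surviving skill, point every related object that references a duplicate at the survivor, then deactivate the duplicates. The sketch below models this on in-memory data; `SkillRecord`, `merge_skills` and the `(object_id, skill_id)` link shape are hypothetical and do not correspond to an AG5 API.

```python
from dataclasses import dataclass


@dataclass
class SkillRecord:
    id: str
    name: str
    active: bool = True


def merge_skills(
    survivor_id: str,
    duplicate_ids: list[str],
    skills: dict[str, SkillRecord],
    links: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Reassign related objects to the survivor and deactivate the duplicates.

    `links` holds (related_object_id, skill_id) pairs. Returns the updated
    links, dropping any that would now link the same object twice."""
    if survivor_id in duplicate_ids:
        raise ValueError("The surviving skill cannot also be a duplicate")
    dupes = set(duplicate_ids)
    seen: set[tuple[str, str]] = set()
    updated: list[tuple[str, str]] = []
    for obj_id, skill_id in links:
        new_link = (obj_id, survivor_id if skill_id in dupes else skill_id)
        if new_link not in seen:
            seen.add(new_link)
            updated.append(new_link)
    for dup in dupes:
        skills[dup].active = False  # deactivate rather than delete
    return updated


skills = {s.id: s for s in [SkillRecord("1", "Excel"), SkillRecord("2", "MS Excel")]}
links = [("emp-a", "1"), ("emp-a", "2"), ("emp-b", "2")]
links = merge_skills("1", ["2"], skills, links)
print(links)  # [('emp-a', '1'), ('emp-b', '1')]
```

Reassigning the links before deactivating the duplicates ensures that no related object is left pointing at an inactive skill.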

---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.ag5.com/ag5-ai-skills-management/copy-of-skills-duplicate-detection.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
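
The `<question>` placeholder must be URL-encoded when it contains spaces or punctuation. A minimal sketch using only the Python standard library to build the request URL; the `build_ask_url` name and the sample question are illustrative.

```python
from urllib.parse import urlencode

PAGE_URL = "https://docs.ag5.com/ag5-ai-skills-management/copy-of-skills-duplicate-detection.md"


def build_ask_url(question: str) -> str:
    """Return the page URL with the question encoded in the `ask` query parameter."""
    return f"{PAGE_URL}?{urlencode({'ask': question})}"


url = build_ask_url("How long does the first detection run take?")
print(url)
# To send the GET request: urllib.request.urlopen(url).read().decode()
```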
