Skills duplicate detection
1. What is AI Skill duplicate detection?
The AI Duplicate Detection feature helps you clean and standardize your skills database by identifying potential duplicate skills.
Over time, skills libraries can grow inconsistent due to:
Slight spelling differences
Different naming conventions
Abbreviations vs. full names
Similar skills created by different teams
This AI-powered tool scans your skills database and groups potential duplicates into clusters so you can review and resolve them efficiently.
2. What is the purpose of duplicate detection?
AI duplicate detection helps you maintain a single, consistent source of truth and ensures:
accurate reporting
clear workforce insights
reliable compliance tracking
reduced administrative clutter
3. How do you use AI Skill duplicate detection?
To access the AI duplicate detection feature:
Open your Profile flyout
Click the AI Assistant button
You’ll be redirected to the AI duplicate detection page
When you open the page for the first time, it may be empty until you run a detection.

Step 1: Configure detection
At the top of the page, you’ll see the detection configuration.
You can choose a similarity threshold:
Strict – Only very close matches are detected
Balanced – Moderate similarity (recommended starting point)
Loose – Broader matching, more potential duplicates
Each setting affects how sensitive the AI is when comparing skill names and related data. We recommend starting with Balanced: it surfaces meaningful duplicates without overwhelming you with loosely related matches, which makes it the most effective option for an initial review.
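To make the effect of the threshold concrete, here is a minimal sketch of how a similarity setting could translate into a duplicate decision. It assumes skills are compared as numeric vectors using cosine similarity; the cut-off values and the names THRESHOLDS and is_potential_duplicate are hypothetical and chosen only for illustration, since the values behind Strict, Balanced, and Loose are not documented here.

```python
import math

# Hypothetical cut-offs: the real values behind each setting are not published.
THRESHOLDS = {"strict": 0.95, "balanced": 0.85, "loose": 0.70}


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_potential_duplicate(vec_a, vec_b, setting="balanced"):
    """Flag a pair when its similarity reaches the cut-off of the chosen setting."""
    return cosine_similarity(vec_a, vec_b) >= THRESHOLDS[setting]
```

With these assumed cut-offs, a pair that scores about 0.89 would be flagged under Balanced and Loose but not under Strict, which is the trade-off the three settings describe.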
Step 2: Run Detection
Click Run Detection.
The process duration depends on the size of your database.
It runs in the background.
You’ll see a completion percentage.
You can stop the process if needed.
During the first run, processing may take several hours because none of the skills have been vectorized yet. Vectorization enables the AI to compare skills based on their meaning and similarity.
On subsequent runs, only newly created or updated skills are processed. This significantly reduces the runtime.
Once all skills have been vectorized, rerunning the process with a different similarity setting (e.g., stricter or more lenient) typically takes only a few minutes.
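The difference between the first run and later runs can be pictured with a small caching sketch. It is an illustration under stated assumptions, not the product's implementation: toy_embed is a stand-in that counts character trigrams instead of calling a real embedding model, and the cache layout and function names are hypothetical.

```python
import hashlib
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: counts of character trigrams."""
    normalized = text.lower().strip()
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def content_hash(text):
    """Fingerprint of the skill text, used to detect updates."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def vectorize_incrementally(skills, cache):
    """Vectorize only new or changed skills.

    skills: {skill_id: name}
    cache:  {skill_id: (hash, vector)}, updated in place
    Returns the ids that were processed in this run.
    """
    processed = []
    for skill_id, name in skills.items():
        digest = content_hash(name)
        cached = cache.get(skill_id)
        if cached is not None and cached[0] == digest:
            continue  # unchanged since the last run, reuse the stored vector
        cache[skill_id] = (digest, toy_embed(name))
        processed.append(skill_id)
    return processed
```

On the first call every skill is processed; on later calls only new or renamed skills are. Changing the similarity setting does not invalidate the cached vectors, which is consistent with reruns being much faster.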
When completed, you’ll see:
Number of skills analyzed
Number of potential duplicates found

What happens when you run detection again?
To keep your skill library clean and well-structured, we recommend running duplicate detection after large imports or updates.
Each time you complete a new detection run, the system rebuilds the duplicate clusters based on the current state of your database.
Existing clusters from the previous run are replaced.
All newly created and updated skills are analyzed.
The system compares those skills against the rest of your skills library to identify duplicates based on the selected similarity settings.
If new potential duplicates are found, they are included in the new clusters.
Skills that were previously grouped together may no longer appear in the same cluster if the underlying data has changed.
This means the results always reflect the latest version of your skills library and any new or changed duplicates are captured in the updated clusters.
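One way to picture why clusters can change between runs is to treat each pair of similar skills as a link and rebuild the groups from scratch every time. The sketch below does this with a union-find structure. This is an assumption made for illustration: the article does not state which clustering method the feature uses, and build_clusters is a hypothetical name.

```python
def build_clusters(skill_ids, similar_pairs):
    """Group skills into clusters from pairwise 'similar' links.

    Clusters are rebuilt from scratch on every call, so the result reflects
    only the pairs passed in for the current run.
    """
    parent = {s: s for s in skill_ids}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for s in skill_ids:
        groups.setdefault(find(s), []).append(s)

    # Only groups with two or more skills count as potential duplicates.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)
```

If a skill is renamed so that it no longer links to its former neighbours, the next call simply returns a cluster without it, matching the behaviour described above.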
4. Understanding the AI skill duplicate detection results
Duplicate Clusters
Potential duplicates are grouped into clusters and shown in the Suggested duplicates tab.
Each cluster contains two or more skills that the AI suggests as possible duplicates.
You can:
Expand a cluster to see details
Click on a skill name to open its full skill page
Review the similarities before deciding whether to merge the skills or to mark the cluster as not duplicates and keep the skills intact
Reasoning Filter
You can filter results by AI reasoning. The reasoning explains why the AI classifies skills as potential duplicates.
There are six possible reasoning types:
Exact match
Misspelling
Multi language
Paraphrase
Special characters
Miscellaneous
The reasoning filter helps you:
Understand the AI’s logic
Prioritize review
Focus on certain types of similarities
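If you export or model the results yourself, the six reasoning types map naturally onto an enumeration. The sketch below is illustrative only: the Cluster shape and the filter_by_reasoning helper are hypothetical and do not describe a documented API.

```python
from dataclasses import dataclass
from enum import Enum


class Reasoning(Enum):
    """The six reasoning types listed above."""
    EXACT_MATCH = "Exact match"
    MISSPELLING = "Misspelling"
    MULTI_LANGUAGE = "Multi language"
    PARAPHRASE = "Paraphrase"
    SPECIAL_CHARACTERS = "Special characters"
    MISCELLANEOUS = "Miscellaneous"


@dataclass
class Cluster:
    """Hypothetical shape of one suggested-duplicates cluster."""
    skills: list
    reasoning: Reasoning


def filter_by_reasoning(clusters, wanted):
    """Keep only the clusters whose reasoning is in the wanted set."""
    wanted = set(wanted)
    return [c for c in clusters if c.reasoning in wanted]
```

For example, filtering on Reasoning.EXACT_MATCH and Reasoning.MISSPELLING first would let a reviewer clear the least ambiguous clusters before moving on to paraphrases.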
Feedback History Tab
This tab shows:
Clusters that were previously reviewed and marked as not duplicates
Notes that were added
Who marked them as not duplicates, and when
You can reverse a decision to mark skills as not duplicates if needed.
Non-duplicate Skills Tab
This tab shows:
Skills that were analyzed but for which no duplicates were detected

5. Reviewing and Managing Duplicates
Option 1: Mark as not duplicates
If skills are clearly different:
Click Mark as not duplicates
Add a note explaining your reasoning
Explain the key differences between skills (e.g., different business contexts, distinct target audiences, unique technical requirements...)
Confirm
Your note will:
Help the AI make better duplicate decisions in future runs and prevent the same clusters from reappearing
Be visible to other reviewers in your organization
Provide transparency
The cluster will move to the Feedback History tab.
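The review flow above can be modelled as a small feedback log: each decision stores the skills involved, the note, the reviewer, and a timestamp, and a decision can be reversed later. The sketch below is a hypothetical data model for illustration; FeedbackEntry and FeedbackHistory are assumed names, not part of the product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FeedbackEntry:
    """One 'not duplicates' decision (hypothetical shape)."""
    skill_ids: frozenset
    note: str
    reviewer: str
    marked_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class FeedbackHistory:
    """In-memory log of clusters marked as not duplicates."""

    def __init__(self):
        self._entries = {}

    def mark_not_duplicates(self, skill_ids, note, reviewer):
        key = frozenset(skill_ids)
        self._entries[key] = FeedbackEntry(key, note, reviewer)
        return self._entries[key]

    def reverse(self, skill_ids):
        """Undo a decision; returns True if there was one to undo."""
        return self._entries.pop(frozenset(skill_ids), None) is not None

    def is_dismissed(self, skill_ids):
        """True if this exact group was already marked as not duplicates."""
        return frozenset(skill_ids) in self._entries
```

A later detection run could call is_dismissed to avoid suggesting the same group again, which is one simple way to read "prevent repeated clusters".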
Option 2: Merge duplicate skills
When the skills in a cluster are indeed duplicates, you can merge them into one skill.
For now, this step has to be performed manually: deactivate the duplicate skill(s) and reassign the related objects to the surviving skill. AG5 can help you with that.
Built-in merge functionality will be available soon.
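Until then, the manual merge comes down to two operations: pointing related records at the surviving skill and deactivating the duplicates. The sketch below shows that logic on plain Python data. The record layout and the merge_skills function are assumptions made for illustration and do not represent an AG5 API.

```python
def merge_skills(survivor_id, duplicate_ids, assignments, active):
    """Reassign records to the survivor and deactivate the duplicates.

    assignments: list of dicts, each with a 'skill_id' key (hypothetical layout)
    active:      {skill_id: bool}, updated in place
    Returns the number of records that were reassigned.
    """
    # Never treat the survivor itself as a duplicate.
    duplicates = set(duplicate_ids) - {survivor_id}

    reassigned = 0
    for record in assignments:
        if record["skill_id"] in duplicates:
            record["skill_id"] = survivor_id
            reassigned += 1

    for skill_id in duplicates:
        active[skill_id] = False  # deactivate rather than delete

    return reassigned
```

After reassigning, one person may hold two records for the surviving skill, so a real merge would also need a rule for which record to keep.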