AI Link building agency glossary — Entity coverage, internal linking, schema hints.

The vocabulary of SEO (search engine optimization) has changed. Ten years ago, a link building glossary was a simple list of terms like "PageRank," "NoFollow," and "Anchor Text." Today, that is insufficient.

With the rise of Large Language Models (LLMs) and Google's shift toward a semantic search engine (powered by systems like BERT and RankBrain), the modern link builder must speak a new language. It is no longer enough to understand keywords; one must understand concepts, connections, and machine-readable context.

This glossary is more than a dictionary; it is an operational manual. It breaks down the terminology required to execute an AI-driven link building campaign in 2025. The terms are grouped into four pillars (Entity Coverage, Internal Linking Architecture, Schema Markup, and AI-Specific Operations), followed by a short section on success metrics.

Part 1: Entity Coverage & Semantic Connectivity

The era of "strings" (keywords) is over. We are now in the era of "things" (entities). Google’s Knowledge Graph does not just look for matching words; it looks for the relationships between real-world objects, people, places, and concepts.

1. Entity (The Atomic Unit)

Definition:

An entity is a distinct, independent, and identifiable concept. It can be a person (Elon Musk), a place (Budapest), an object (iPhone 15), or an abstract concept (Inflation). In the eyes of a search engine, an entity is defined by its ID in the Knowledge Graph, not just its name.

The Agency Application:

Modern link building isn't about getting a link from a site that says "Best Shoes." It is about getting a link from a site that Google identifies as an authority on the Entity of "Footwear." AI tools now analyze potential donor sites to ensure they hold "Entity Authority" before outreach begins.

2. Named Entity Recognition (NER)

Definition:

NER is a sub-task of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories.

The Agency Application:

Agencies use NLP (Natural Language Processing) tools to scan a client's content and a prospect's content. If the Client’s site is heavy on the entity "Sustainable Energy," but the prospect’s site only contains entities related to "Fossil Fuels," an AI tool will flag this as a "Semantic Mismatch," saving the agency from sending a low-value pitch.
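The mismatch check described above can be sketched in a few lines. This is a minimal illustration, assuming an NLP tool has already extracted the entity names for each site; the function names, the sample entities, and the 0.1 threshold are hypothetical and not taken from any specific product.

```python
def entity_overlap(client_entities, prospect_entities):
    """Jaccard overlap between two collections of entity names (0.0 to 1.0)."""
    a = {e.lower() for e in client_entities}
    b = {e.lower() for e in prospect_entities}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_semantic_mismatch(client_entities, prospect_entities, threshold=0.1):
    """Flag a prospect whose entities barely overlap with the client's.

    The threshold is an assumed value and would need tuning on real data.
    """
    return entity_overlap(client_entities, prospect_entities) < threshold


# Hypothetical entity lists, as an NER tool might return them.
client = ["Sustainable Energy", "Solar Panel", "Wind Turbine"]
prospect_a = ["Fossil Fuels", "Oil Drilling", "Coal"]
prospect_b = ["Solar Panel", "Wind Turbine", "Battery Storage"]

print(is_semantic_mismatch(client, prospect_a))  # no shared entities
print(is_semantic_mismatch(client, prospect_b))  # two shared entities
```

Set overlap is deliberately crude: it ignores how often an entity appears and how closely related two different entities are, so it works best as a first-pass filter before a human review.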

3. Salience Score

Definition:

Salience refers to the importance or relevance of an entity within a specific document. Google's Cloud Natural Language API, for example, returns a salience score between 0.0 and 1.0 for each entity it finds in a text. If "Apple Inc." is mentioned once in the footer, it has low salience. If it is the subject of the headline and first paragraph, it has high salience.

The Agency Application:

When securing a guest post or a link insertion, simply getting the link is not enough. You must ensure the surrounding text increases the Salience Score of your target entity.

  • Old Way: Insert a link in the author bio.

  • AI Way: Analyze the main body text and insert the link where the semantic relevance (Salience) of the topic is highest.
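As a rough sketch of the "AI Way," the snippet below picks the paragraph where the target entity is mentioned most densely. Term density is only a crude stand-in for a real salience score, and the function name and sample paragraphs are hypothetical.

```python
import re


def best_paragraph_for_link(paragraphs, entity_terms):
    """Return the index of the paragraph with the densest entity mentions.

    Density = (number of term matches) / (number of words). Returns None
    when no paragraph mentions any of the terms.
    """
    terms = [t.lower() for t in entity_terms]
    best_index, best_score = None, 0.0
    for i, paragraph in enumerate(paragraphs):
        words = re.findall(r"[a-z0-9']+", paragraph.lower())
        if not words:
            continue
        text = " ".join(words)  # normalized text so multi-word terms match
        hits = sum(text.count(term) for term in terms)
        score = hits / len(words)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


paragraphs = [
    "About the author: Jane writes about travel and food.",
    "Running shoes need good cushioning. The best running shoes balance weight and support.",
    "Contact us for advertising rates.",
]
print(best_paragraph_for_link(paragraphs, ["running shoes", "cushioning"]))
```

In practice the density score would be replaced by the salience value returned by an entity analysis service; the selection logic stays the same.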

4. Co-Occurrence (The "Bad Neighborhood" Check)

Definition:

This refers to the frequency with which two terms or entities appear together in a dataset. Google learns relationships based on co-occurrence. If "Diet Pills" and "Scam" frequently appear together, they become semantically linked.

The Agency Application:

AI tools scan the backlink profile of a potential partner site. If the site frequently links to entities associated with gambling, pharma, or low-quality loans (bad co-occurrence), the agency blacklists the site to protect the client’s "Trust Rank."
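A minimal sketch of both ideas follows: counting how often entity pairs appear together, and measuring what share of a partner site's linked entities fall into risky categories. It assumes the entities have already been extracted per document; the function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each pair of entities appears in together."""
    counts = Counter()
    for entities in documents:
        unique = sorted({e.lower() for e in entities})
        counts.update(combinations(unique, 2))
    return counts


def risky_share(linked_entities, risky_entities):
    """Fraction of a site's linked entities that belong to a risky set."""
    linked = [e.lower() for e in linked_entities]
    if not linked:
        return 0.0
    risky = {e.lower() for e in risky_entities}
    return sum(e in risky for e in linked) / len(linked)


# Hypothetical documents, each reduced to the entities it mentions.
docs = [
    ["Diet Pills", "Scam"],
    ["Diet Pills", "Scam", "Refund"],
    ["Diet Pills", "Nutrition"],
]
counts = cooccurrence_counts(docs)
print(counts[("diet pills", "scam")])

# Hypothetical outbound-link topics of a prospect site.
linked = ["Gambling", "Pharma", "Travel", "Gambling"]
print(risky_share(linked, {"gambling", "pharma", "payday loans"}))
```

The cut-off at which a site gets blacklisted is a policy decision; the share is just the input to it.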

5. Knowledge Graph

Definition:

Google’s massive database of billions of facts about entities and their relationships. It allows the search engine to answer questions directly (e.g., "How tall is the Eiffel Tower?") without relying solely on website clicks.

The Agency Application:

The ultimate goal of high-level link building is not just ranking, but Knowledge Graph Reconciliation. By building links from authoritative sources (like Wikipedia, Wikidata, or major news outlets) that use specific Schema markup, agencies help Google "connect the dots" and verify their client as a legitimate entity.

Part 2: Internal Linking & Site Architecture

External links provide authority (Vote of Confidence), but internal links provide context (The Roadmap). In an AI-driven strategy, internal linking is calculated mathematically to distribute "link juice" exactly where it is needed.

6. Semantic Distance

Definition:

A metric that measures how closely related two pieces of content are in meaning.

  • "Dog" and "Puppy" have a short semantic distance.

  • "Dog" and "Carburetor" have a long semantic distance.

The Agency Application:

We use vector databases (like Pinecone or Weaviate) to analyze a client’s blog. We convert every article into a vector (a numerical representation). We then calculate the distance between these vectors. The strategy is to automatically suggest internal links only between pages with a short semantic distance, creating tight "Topical Clusters."
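The distance calculation itself is simple enough to sketch without a vector database. The example below uses cosine distance on tiny hand-written vectors; real embeddings have hundreds of dimensions and would come from an embedding model. The URLs, vectors, and the 0.2 threshold are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance: 0.0 for identical direction, 1.0 for unrelated vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def suggest_internal_links(page_vectors, max_distance=0.2):
    """Return (url_a, url_b, distance) for page pairs closer than max_distance."""
    suggestions = []
    for (url_a, vec_a), (url_b, vec_b) in combinations(page_vectors.items(), 2):
        distance = cosine_distance(vec_a, vec_b)
        if distance <= max_distance:
            suggestions.append((url_a, url_b, round(distance, 3)))
    return sorted(suggestions, key=lambda s: s[2])


# Toy 3-dimensional "embeddings" (hypothetical values).
pages = {
    "/dog-training": [0.9, 0.1, 0.0],
    "/puppy-care": [0.8, 0.2, 0.0],
    "/carburetor-repair": [0.0, 0.1, 0.9],
}
for url_a, url_b, distance in suggest_internal_links(pages):
    print(url_a, "<->", url_b, distance)
```

Comparing every pair is fine for a few hundred pages. For larger sites, this is where a vector database earns its place, because it answers "nearest neighbors of this page" without scanning all pairs.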

7. Orphan Page

Definition:

A page on a website that has no internal links pointing to it. Search engine crawlers (spiders) often fail to find or index these pages.

The Agency Application:

AI crawlers (like customized Python scripts using Screaming Frog data) identify orphan pages instantly. The "Fix" is automated: the AI finds the most semantically relevant existing post and drafts a snippet of text to add, creating an internal link to the orphan page.
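Detecting orphans comes down to a set comparison. The sketch below assumes you already have two inputs: a list of known URLs (for example from a sitemap) and a list of internal links exported from a crawl as (source, target) pairs. The function name and sample data are hypothetical, and the export step itself is not shown.

```python
def find_orphan_pages(all_pages, internal_links, entry_points=("/",)):
    """Return pages that no other page links to.

    all_pages: iterable of known URLs (e.g. from the sitemap).
    internal_links: iterable of (source, target) pairs from a crawl export.
    entry_points: pages that are reachable without internal links,
                  such as the homepage.
    """
    # Self-links do not count as inbound links.
    linked_to = {target for source, target in internal_links if source != target}
    return sorted(
        page for page in all_pages
        if page not in linked_to and page not in entry_points
    )


all_pages = ["/", "/blog/a", "/blog/b", "/blog/c"]
internal_links = [
    ("/", "/blog/a"),
    ("/blog/a", "/blog/b"),
    ("/blog/c", "/blog/c"),  # a page linking only to itself
]
print(find_orphan_pages(all_pages, internal_links))
```

Choosing which existing post should link to each orphan is a separate step; a semantic similarity ranking between the orphan and the candidate posts is one way to do it.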

8. Link Equity (formerly Link Juice)

Definition:

The value or authority passed from one page to another through a hyperlink.

The Agency Application:

SEO agencies use algorithms to simulate the flow of equity. If a client buys a high-power external link pointing to their Homepage, an AI model calculates the optimal internal linking structure to ensure that authority flows down to the "Money Pages" (product or service pages) rather than getting stuck on the "About Us" page.
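One way to simulate equity flow is a simplified PageRank-style iteration over the internal link graph. The sketch below is an approximation for comparing link structures, not a reproduction of how any search engine scores pages; the damping factor of 0.85, the page paths, and the function name are assumptions.

```python
def simulate_link_equity(links, damping=0.85, iterations=50):
    """Simplified PageRank over (source, target) internal links.

    Returns a dict of page -> score, with scores summing to 1.0.
    """
    pages = sorted({page for edge in links for page in edge})
    outgoing = {p: [t for s, t in links if s == p] for p in pages}
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_score = {p: (1.0 - damping) / len(pages) for p in pages}
        for p in pages:
            # Pages without outgoing links spread their score evenly.
            targets = outgoing[p] or pages
            share = damping * score[p] / len(targets)
            for t in targets:
                new_score[t] += share
        score = new_score
    return score


# Structure A: the homepage links only to "About Us".
structure_a = [("/", "/about"), ("/about", "/"), ("/product", "/")]
# Structure B: the homepage also links to the money page.
structure_b = structure_a + [("/", "/product")]

before = simulate_link_equity(structure_a)
after = simulate_link_equity(structure_b)
print(round(before["/product"], 3), "->", round(after["/product"], 3))
```

Running candidate structures through the same simulation and comparing the money page's score gives a quick, relative answer to "which internal link should we add next?"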

9. Anchor Text Optimization (The N-Gram Strategy)

Definition:

The clickable text in a hyperlink.

  • Exact Match: "Buy SEO Services"

  • Partial Match: "services for link building"

  • Branded: "AgencyName"

The Agency Application:

Over-optimized anchor text risks penalties or devaluation (the pattern Google's Penguin update was built to target). AI tools analyze the "Anchor Text Profile" of top-ranking competitors and determine a safe ratio of Exact vs. Branded anchors. The agency then uses this data to instruct link builders on exactly what text to use for the next 10 links to maintain a natural profile.
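The ratio logic can be sketched as follows: compute the current share of each anchor type, compare it with a target profile, and pick the type that is furthest below target. The target ratios here are invented for illustration; in practice they would come from the competitor analysis.

```python
from collections import Counter


def anchor_profile(anchors):
    """anchors: list of (anchor_text, anchor_type). Returns type -> share."""
    counts = Counter(kind for _, kind in anchors)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}


def next_anchor_type(current_anchors, target_profile):
    """Pick the anchor type whose current share is furthest below target."""
    profile = anchor_profile(current_anchors)
    return max(
        target_profile,
        key=lambda kind: target_profile[kind] - profile.get(kind, 0.0),
    )


current = [
    ("Buy SEO Services", "exact"),
    ("Buy SEO Services", "exact"),
    ("AgencyName", "branded"),
    ("services for link building", "partial"),
]
# Hypothetical target derived from competitor profiles.
target = {"exact": 0.1, "partial": 0.3, "branded": 0.6}

print(anchor_profile(current))
print(next_anchor_type(current, target))
```

Calling `next_anchor_type` again after each new link is recorded yields the sequence of anchor types for the next batch of links.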

10. Pillar Page & Topic Clusters

Definition:

  • Pillar Page: A comprehensive, long-form page covering a broad topic (e.g., "The Ultimate Guide to Coffee").

  • Cluster Content: Smaller posts covering sub-topics (e.g., "French Press vs. Drip," "Best Beans for Espresso").

The Agency Application:

This is the standard architecture for modern SEO. The agency builds external links to the Pillar Page to build massive authority, and uses internal links to funnel that authority to the specific Cluster pages that drive specific long-tail traffic.

Part 3: Schema Markup & Structured Data Hints

Schema is code (usually JSON-LD) that helps machines understand the content of a page explicitly. It removes ambiguity. For a link building agency, Schema is a secret weapon for establishing entity identity.

11. JSON-LD (JavaScript Object Notation for Linked Data)

Definition:

Google's recommended format for structured data. It is a script placed in the <head> or <body> of a page that provides metadata to search engines.

The Agency Application:

When an agency secures a guest post, they might ask the webmaster to include specific JSON-LD markup. This is rare and high-value. It ensures Google understands exactly who the article is about.

12. sameAs Property

Definition:

A Schema property that tells search engines that two URLs represent the exact same entity. This is critical for disambiguation.

The Agency Application:

On a client's "About" page, the agency implements the sameAs schema to link to the client's Wikipedia page, Crunchbase profile, and LinkedIn profile. This explicitly ties the website to trusted third-party data sources.

Code Example:

JSON

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Brand",
  "sameAs": [
    "https://www.facebook.com/your-brand",
    "https://en.wikipedia.org/wiki/Your_Brand",
    "https://twitter.com/your_brand"
  ]
}

13. mentions Property

Definition:

A schema property that indicates that a CreativeWork (like a blog post) mentions a specific entity.

The Agency Application:

This is the "Linkless Backlink." Even if a publisher refuses to give a dofollow hyperlink, an agency can ask them to include a mentions schema tag pointing to the client's Wikidata entry. This provides machine-readable validation of the brand's relevance to the topic.
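For publishers who are open to it, the markup can be generated rather than written by hand. The sketch below builds an Article object with a `mentions` property and wraps it in a JSON-LD script tag. The helper name, headline, brand name, and Wikidata URL (note the placeholder ID) are all hypothetical.

```python
import json


def article_with_mentions(headline, mentioned_name, wikidata_url):
    """Build schema.org Article markup that mentions an organization."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mentions": {
            "@type": "Organization",
            "name": mentioned_name,
            # Placeholder identifier; use the client's real Wikidata entry.
            "sameAs": wikidata_url,
        },
    }


markup = article_with_mentions(
    "The Future of Green Energy",
    "Your Brand",
    "https://www.wikidata.org/wiki/Q0000000",
)
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```

Handing the publisher a ready-to-paste script tag lowers the effort on their side, which matters when the request is already unusual.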

14. citation Property

Definition:

Used to reference another creative work (article, book, study) that supports the current content.

The Agency Application:

When an agency publishes a study or a whitepaper for a client, they encourage other bloggers to use the citation schema when referencing the data. This creates a formal, structured connection between the source and the reference, which may help search systems attribute the data to its original publisher.

15. Organization vs. Person Schema

Definition:

Distinguishing whether a site belongs to a corporate entity or an individual.

The Agency Application:

For "E-E-A-T" (Experience, Expertise, Authoritativeness, Trustworthiness), Google needs to know who is behind the content. Agencies ensure that all guest posts authored by the client have proper Person schema attached to the author bio, linking back to the client’s main site.
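A minimal sketch of the author markup for a guest post is shown below. It attaches a Person object to an Article and points back to the client's site through `url` and `sameAs`. The helper name, author name, and URLs are placeholders.

```python
import json


def guest_post_markup(headline, author_name, author_url, profile_urls):
    """Build schema.org Article markup with a Person as the author."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,  # the author's page on the client's site
            "sameAs": list(profile_urls),
        },
    }


post = guest_post_markup(
    headline="How Middleware Shapes SaaS Integrations",
    author_name="Jane Doe",
    author_url="https://www.example.com/team/jane-doe",
    profile_urls=["https://www.linkedin.com/in/jane-doe-example"],
)
print(json.dumps(post, indent=2))
```

If the site belongs to a company rather than an individual, the same structure applies with an Organization object as the publisher and the Person kept as the author.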

Part 4: AI Tools & Algorithmic Concepts

This section covers the tools and concepts used to execute the strategies above. These are the terms you will hear in a developer meeting at an AI SEO agency.

16. Vector Embeddings

Definition:

A method of converting words, sentences, or entire documents into a list of numbers (vectors) in a multi-dimensional space. In this space, similar concepts are located close to each other.

The Agency Application:

  • Outreach: Agencies use embeddings to match a client’s article with thousands of potential donor sites instantly. If the vector of the client's pitch matches the vector of the prospect's editorial style, the "Pitch Success Score" is high.

  • Relevance: It addresses the ambiguity (polysemy) problem. With contextual embeddings, "jaguar" (the car) sits mathematically far away from "jaguar" (the animal), which helps prevent irrelevant link placement.
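To make the matching step concrete, the sketch below ranks prospect sites by cosine similarity to a pitch vector. The two-dimensional vectors (roughly "automotive" and "wildlife") are hand-written stand-ins for real embeddings, and the domain names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity: 1.0 for the same direction, 0.0 for unrelated vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def rank_prospects(pitch_vector, prospect_vectors, top_k=3):
    """Return the top_k (site, similarity) pairs, most similar first."""
    scored = [
        (site, cosine_similarity(pitch_vector, vector))
        for site, vector in prospect_vectors.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Pitch: a review of the Jaguar car. Dimensions: (automotive, wildlife).
pitch = [0.9, 0.1]
prospects = {
    "wildlife-magazine.example": [0.1, 0.9],
    "auto-blog.example": [0.8, 0.2],
    "cooking-site.example": [0.0, 0.0],
}
for site, similarity in rank_prospects(pitch, prospects):
    print(site, round(similarity, 3))
```

Even though both the pitch and the wildlife magazine would contain the word "jaguar," the automotive blog ranks first because the vectors encode context rather than the string.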

17. RAG (Retrieval-Augmented Generation)

Definition:

An AI framework where an LLM (like GPT-4) retrieves specific data from an external trusted source (like a client's technical documentation) before generating an answer.

The Agency Application:

When writing guest posts, agencies use RAG to reduce the risk that the AI hallucinates facts.

  • Workflow: The AI is instructed: "Read this PDF about my client's new solar panel technology. Then, write a blog post about 'The Future of Green Energy' using only the specs found in that PDF." This ensures technical accuracy in link building content.
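The workflow above has two parts: retrieve the relevant passages, then build a prompt that restricts the model to them. The sketch below shows both with a deliberately simple keyword-overlap retriever (production systems typically use embeddings instead). The passages are invented sample data, and the call to the LLM itself is left out.

```python
import re


def tokenize(text):
    """Lowercase word set used for the overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, top_k=2):
    """Return the top_k passages sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages, top_k=2):
    """Assemble a prompt that limits the model to the retrieved context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, top_k))
    return (
        "Use only the facts in the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nTask: {question}"
    )


# Invented passages standing in for text extracted from the client's PDF.
passages = [
    "The panel efficiency is rated at 22 percent.",
    "The warranty covers 25 years.",
    "Our office is in Budapest.",
]
print(build_prompt("What is the panel efficiency?", passages, top_k=1))
```

The instruction to admit missing information matters as much as the retrieval: without it, a model may still fill gaps from its general training data.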

18. Tokenization

Definition:

The process of breaking down text into smaller units (tokens), which can be words or parts of words. LLMs process text in tokens, not words.

The Agency Application:

Understanding token limits is crucial for automated outreach. If an agency feeds a 5,000-word competitor analysis into a prompt that exceeds the model's context window, the request is either rejected or truncated, so part of the text never reaches the model. Agencies optimize their prompts to be token-efficient to save API costs and improve accuracy.
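A simple budget check can catch oversized prompts before they are sent. The sketch below estimates token counts with a rule of thumb of about four characters per token for English text. That ratio is an assumption and varies by model and language, so use the model's own tokenizer when exact counts matter.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; chars_per_token is an assumed average."""
    return math.ceil(len(text) / chars_per_token)


def split_into_chunks(text, max_tokens, chars_per_token=4):
    """Split text on whitespace into chunks that fit the token budget.

    A single word longer than the budget becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and estimate_tokens(candidate, chars_per_token) > max_tokens:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


analysis = "competitor backlink analysis " * 200  # stand-in for a long report
chunks = split_into_chunks(analysis, max_tokens=500)
print(len(chunks), [estimate_tokens(c) for c in chunks])
```

Each chunk can then be summarized separately and the summaries combined, which keeps every request inside the window.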

19. Sentiment Analysis

Definition:

Using NLP to determine the emotional tone of a piece of text (Positive, Negative, Neutral).

The Agency Application:

  • Brand Monitoring: Before asking for a link, the agency scans the prospect site. If the site has recently written negative articles about the client’s industry, the sentiment analysis tool flags it as "Hostile."

  • Unlinked Mentions: The tool finds places where the brand is mentioned but not linked. It filters for Positive sentiment mentions first, as these are the easiest to convert into backlinks with a simple "Thank you" email.
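The prioritization of unlinked mentions can be sketched with a tiny word-list classifier. Real tools use trained sentiment models; the word lists, URLs, and mention texts here are hypothetical and only illustrate the sorting logic.

```python
import re

# Tiny illustrative lexicons, not a real sentiment model.
POSITIVE = {"great", "excellent", "love", "recommend", "reliable"}
NEGATIVE = {"scam", "terrible", "avoid", "broken", "worst"}


def sentiment_label(text):
    """Classify text as positive, negative, or neutral by word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def prioritize_mentions(mentions):
    """Sort mentions so that positive ones come first, negative ones last."""
    order = {"positive": 0, "neutral": 1, "negative": 2}
    return sorted(mentions, key=lambda m: order[sentiment_label(m["text"])])


mentions = [
    {"url": "https://blog-a.example/post", "text": "Avoid AgencyName, terrible support."},
    {"url": "https://blog-b.example/post", "text": "AgencyName released a new report."},
    {"url": "https://blog-c.example/post", "text": "We love AgencyName and recommend it."},
]
for mention in prioritize_mentions(mentions):
    print(sentiment_label(mention["text"]), mention["url"])
```

The same labels can drive the "Hostile" flag for prospect sites: a prospect whose recent articles about the client's industry are mostly negative is moved to the bottom of the outreach list.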

20. E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness)

Definition:

Google's framework for assessing the quality of content. It is not a direct ranking factor, but a component of the Search Quality Rater Guidelines.

The Agency Application:

AI link building is heavily focused on bolstering E-E-A-T.

  • Experience: Building links from forums or community discussions (User Generated Content) to show real usage.

  • Expertise: Securing bylines for the client on high-level industry journals.

  • Trustworthiness: Ensuring the technical security (HTTPS) and business transparency (Schema) are perfect.

Part 5: Metrics Reimagined (Measuring Success)

The glossary of success metrics has also evolved. We are moving away from vanity metrics toward performance metrics.

| Old Metric | New Metric (AI Era) | Definition |
| --- | --- | --- |
| DA (Domain Authority) | Traffic Value (TV) | Instead of a theoretical score (DA), we look at how much the site's organic traffic would cost if bought via Google Ads. A high TV indicates a valuable site. |
| Number of Backlinks | Referring Domains (RD) Velocity | The speed and consistency of new domains linking to the site. Spikes look suspicious; steady growth looks natural. |
| Keyword Position | Share of Voice (SOV) | The percentage of the market your client owns for a specific topic cluster, not just a single keyword. |
| Spam Score | Link Toxicity | An algorithmic assessment of how likely a link is to trigger a manual action or algorithmic devaluation. |
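Three of these metrics reduce to simple arithmetic, sketched below. The formulas follow the definitions in the table; the spike rule (latest gain more than three times the earlier average) and all sample numbers are assumptions for illustration.

```python
def traffic_value(keywords):
    """keywords: list of (monthly_clicks, cost_per_click). Returns ad-equivalent cost."""
    return sum(clicks * cpc for clicks, cpc in keywords)


def rd_velocity(monthly_referring_domains):
    """Month-over-month change in the referring domain count."""
    counts = monthly_referring_domains
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def has_spike(monthly_referring_domains, factor=3.0):
    """True when the latest gain exceeds factor times the earlier average gain."""
    gains = rd_velocity(monthly_referring_domains)
    if len(gains) < 2:
        return False
    baseline = sum(gains[:-1]) / len(gains[:-1])
    return baseline > 0 and gains[-1] > factor * baseline


def share_of_voice(client_clicks, total_clicks):
    """Client's share of all clicks available in a topic cluster."""
    return client_clicks / total_clicks if total_clicks else 0.0


print(traffic_value([(1000, 0.5), (200, 2.0)]))
print(rd_velocity([100, 110, 120, 200]))
print(has_spike([100, 110, 120, 200]))
print(share_of_voice(250, 1000))
```

Link Toxicity is left out on purpose: it depends on a scoring model rather than a formula, so it cannot be reduced to a few lines without inventing the model.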

Implementation Guide: Using This Glossary

How should a marketing manager or agency owner use this glossary?

  1. Audit Your Reports: Look at the monthly reports you send to clients. Are you still talking about "Keyword Density"? Update your language to discuss "Entity Salience" and "Semantic Coverage."

  2. Train Your Writers: Give this glossary to your content team. They need to understand that when they write, they are feeding a Knowledge Graph, not just filling space.

  3. Refine Your Outreach: Use the "Entity" concept to find better partners. Don't just look for "Tech Blogs." Look for blogs that have high authority on the entity of "SaaS Middleware" if that is what your client sells.

The Future is Semantic

The divide between "Technical SEO" and "Link Building" is disappearing. The modern link builder must be part data scientist, part PR professional.

Understanding these terms is the first step. Mastering the tools that utilize them is the competitive advantage that will separate successful agencies from the obsolete ones in the coming years.

© Copyright thuletetobox