Let's discuss search ranking factors. To start, search ranking factors are the same thing as search ranking signals, and the two terms are used interchangeably below.
They can be hand-crafted (manually adjustable) or LLM-based.
Google takes relevant data and performs regression to arrive at signals.
Neither Google nor any other search engine discloses how its ranking system works, on the pretext of safeguarding it from manipulation. What we know about search ranking signals has therefore become evident from other sources.
Importantly, Google's ranking signals are nowhere spelled out in explicit form.
[Image: Google antitrust proceeding documents]
Google's ranking signals may be divided into 'hand-crafted' (manually adjustable) signals and LLM-based signals.
Manually adjustable signals can be analyzed and tuned by engineers, whereas LLM-based signals revolve around natural language processing and AI-powered learning. Almost every signal, aside from RankBrain and DeepRank (which are LLM-based), is hand-crafted and can therefore be inspected and adjusted by engineers.
Search ranking engineers work with two main ingredients: data and signals. Data comes first; Google pairs that data with regression to arrive at a signal.
To develop a signal, engineers look at a function and figure out what sensitivity threshold to use. The function is a rule describing the relationship between sets of data; Google uses sigmoids and other functions. The threshold is the midpoint at which the relationship becomes statistically significant, and it can be picked manually or arrived at by regression, as is most often the case at Google.
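As a minimal sketch of the idea (not Google's actual code), assume the raw input is some count, say clicks or matching links, and the signal is a sigmoid with a hand-picked midpoint. The function name and numbers below are invented for illustration.

```python
import math

def sigmoid_signal(x: float, midpoint: float, slope: float = 1.0) -> float:
    """Map a raw measurement x (e.g. a count of clicks or links) to a 0..1 signal.

    The midpoint is the threshold at which the signal crosses 0.5; the slope
    controls how sharply the curve rises around that threshold.
    """
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

# Hypothetical illustration: with a manually picked midpoint of 20,
# a page with 5 qualifying events barely registers, while 40 nearly saturates.
for raw in (5, 20, 40):
    print(raw, round(sigmoid_signal(raw, midpoint=20, slope=0.3), 3))
```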
[Image: Google antitrust proceeding documents]
The "hand crafting" of signals means that Google takes all those sigmoids (and other functions) and figures out the thresholds
Google takes the relevant data and performs regression to confidently determine which factors matter most.
Google engineers plot ranking signal curves.
Curve fitting happens at every level of the signal stack. The purpose of curve fitting is to find a function that best explains the mathematical relationship between the parameters, i.e. the one that leaves the smallest residual.
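To illustrate what "smallest residual" means in practice, here is a hedged sketch in Python: it fits the midpoint and slope of a sigmoid to made-up observations by least squares, the way a threshold might be arrived at by regression rather than picked by hand. The data and parameter names are assumptions for the example, not values from the leaked documents.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, midpoint, slope):
    # Candidate function relating a raw parameter to an outcome in [0, 1].
    return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

# Hypothetical observations: a raw metric vs. an observed relevance outcome.
raw_metric = np.array([1, 3, 5, 8, 12, 18, 25, 35, 50], dtype=float)
observed = np.array([0.02, 0.05, 0.1, 0.2, 0.45, 0.7, 0.85, 0.95, 0.99])

# Least-squares fit: pick the midpoint and slope that leave the smallest residual.
params, _ = curve_fit(sigmoid, raw_metric, observed, p0=[10.0, 0.1])
midpoint, slope = params
residual = np.sum((observed - sigmoid(raw_metric, midpoint, slope)) ** 2)
print(f"fitted midpoint={midpoint:.2f}, slope={slope:.3f}, residual={residual:.4f}")
```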
NavBoost is a re-ranking module that follows a "pair of dice" metaphor. As inferred from the leaked documents, the module uses clicks and impressions (and their proportions) as a "winning" dice combination for a specific position in the SERP: if a document gets a better combination for a position than another document does, it gets a boost. People who navigate the search results and choose a specific document are called "voters", the whole process is "voting", and the voters' data is tokenized and stored. This Twiddler, or re-ranking algorithm, works to boost (promote) or demote sites.
Overall, Twiddlers are responsible for re-ranking results from a single corpus. They act on a ranked sequence of results rather than on individual results, and they may operate on a per-device, per-location, or per-topic basis, among others. Google has boost (and demote) functions that are part of the Twiddler framework; for example, the boost functions identified in the leaked docs include NavBoost, QualityBoost, RealTimeBoost, WebImageBoost, and more.
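The leaked material describes the framework, not the code, so the following is only a conceptual sketch of a NavBoost-like twiddler: it takes an already-ranked list, boosts documents whose click-through rate beats the page average, and demotes the rest. All names, weights, and numbers are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    url: str
    base_score: float          # score from the primary ranking pass
    clicks: int = 0            # hypothetical aggregated click data
    impressions: int = 0
    adjustments: list = field(default_factory=list)

def navboost_like_twiddler(ranked: list[Result]) -> list[Result]:
    """Re-rank an already-ordered list of results (conceptual sketch only)."""
    ctrs = [r.clicks / r.impressions if r.impressions else 0.0 for r in ranked]
    avg_ctr = sum(ctrs) / len(ctrs) if ctrs else 0.0
    for r, ctr in zip(ranked, ctrs):
        delta = 0.2 * (ctr - avg_ctr)          # boost or demote; 0.2 is arbitrary
        r.adjustments.append(("navboost_like", delta))
        r.base_score += delta
    return sorted(ranked, key=lambda r: r.base_score, reverse=True)

# Document B outperforms A on clicks per impression, so it overtakes A.
results = [
    Result("https://example.com/a", 1.00, clicks=20, impressions=1000),
    Result("https://example.com/b", 0.95, clicks=300, impressions=1000),
]
for r in navboost_like_twiddler(results):
    print(r.url, round(r.base_score, 3))
```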
Anchors are the oldest, probably the most basic, ranking signal. An anchor is a source page pointing to a target page via a link. If we take the number of anchors and analyze the text used in them, we can tell whether a page covers a certain topic. For example, if 10 links (internal or external) point to your page with anchor texts like "apple", "red apple", "green apple", and so on, then the page probably has the topic of apple, so the document is relevant to queries like those.
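As a toy illustration of that idea (not Google's implementation), the snippet below tallies the terms used in anchor texts pointing at a page; the anchor_texts list is invented for the example.

```python
from collections import Counter

# Hypothetical anchor texts of links (internal or external) pointing to one page.
anchor_texts = ["apple", "red apple", "green apple", "apple varieties",
                "buy apples", "fruit", "apple", "apple orchard"]

# Count the terms used across all anchors; dominant terms suggest the page's topic.
term_counts = Counter(word for text in anchor_texts for word in text.lower().split())
print(term_counts.most_common(3))   # e.g. [('apple', 6), ('red', 1), ('green', 1)]
```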
Body refers to the terms in the document itself. This ranking signal analyzes how relevant the terms used in the document are.
Clicks measure how long a user stayed on the page before bouncing back to search, which determines whether this vote, in the form of a click, should be counted toward relevance and topicality.
Topicality expresses how relevant the document is to the query: it answers the question of how well the page matches the query term and thus whether it deserves to be showcased in the search results.
These ABC components (anchors, body, and clicks) are the key inputs to topicality; together they allow Google to decide whether to show a page high or low for a search term.
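How exactly Google combines A, B, and C is not public; the sketch below simply takes a weighted sum of three already-normalized component scores to show the shape of such a combination. The weights are arbitrary assumptions.

```python
def topicality(anchor_score: float, body_score: float, click_score: float,
               weights=(0.3, 0.4, 0.3)) -> float:
    """Combine three normalized components (each in 0..1) into one topicality score.

    The weights are illustrative only; the real combination Google uses is not public.
    """
    wa, wb, wc = weights
    return wa * anchor_score + wb * body_score + wc * click_score

# A page with strong anchor and body relevance but weak click evidence:
print(round(topicality(anchor_score=0.8, body_score=0.9, click_score=0.2), 3))  # 0.66
```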
Quality is the notion of trustworthiness, and it is an important signal. It has to do with the authority of the links pointing to the website, the age of the domain, and so on. In other words, Google wants to know whether users can actually trust the page and its content.
PageRank arguably exists in several layers, including one that implies a "distance" from a gold-standard set of "seed" websites.
Google arguably has a collection of trusted articles on all topics, the gold standard of trust. All selected links form a link graph, and the rank of each page is calculated from its distance to the trusted documents using a standard graph algorithm. This is called the "NearestSeeds" method.
For example, if a trusted article from The New York Times links to an article on site X, an article from site X links to an article on site Y, and an article on site Y links to wlw, the distance is 3. Distance in the graph is counted not in nodes but in links (edges). The smaller the distance, the better for this signal.
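A standard way to compute such link distances is a breadth-first search outward from the seed set; the sketch below assumes a toy link graph mirroring the example above. The page names are placeholders, and nothing here is taken from Google's code.

```python
from collections import deque

def seed_distances(link_graph: dict[str, list[str]], seeds: list[str]) -> dict[str, int]:
    """Breadth-first search from all seed pages at once.

    Distance is counted in links (edges), not nodes; smaller is better.
    """
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

# Hypothetical link graph mirroring the example above.
graph = {
    "nytimes.com/article": ["siteX.com/article"],
    "siteX.com/article": ["siteY.com/article"],
    "siteY.com/article": ["wlw"],
}
print(seed_distances(graph, seeds=["nytimes.com/article"]))
# {'nytimes.com/article': 0, 'siteX.com/article': 1, 'siteY.com/article': 2, 'wlw': 3}
```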
Summary
This is how search ranking works, as inferred from analysis of the leaked documentation and the court testimony.