Meta issue: source and communication path analysis

Source analysis

Most of the propositions we believe are based on information provided to us by other people. So when analyzing the truth of a proposition, a clear recording of the source and method of transmission of the information is often critical.

The next stage, of course, is deciding how to analyze the information after we collect it. Some issues to consider include: Are we missing needed communication path information? If so, what can we do to find it? Do we have enough reliable sources to qualify the truth of the information? How do we detect bias?

Deception patterns

single malicious source that spreads false information to real repeaters (identifiable as a single source)
single malicious source that pretends to be many different original sources (more work required to identify that the false information originates from a single source)
interceptor that receives information and then spreads it in distorted form (resolvable by checking the information against the original source)

Useful things to think about:

We want to prevent users from creating a new source record when a record for the source already exists.
We need a way to link sources that are suspected of being under the control of other sources, and potentially to describe the mechanism by which the control is suspected to take place, e.g. a person with multiple fake identities, or one person paying another to publish information verbatim or to bias it.
How many real “original” sources of the information are there, and what are their reputations? For example, if it is a one-time event, how many witnesses are there? If it is an experiment, how many people have successfully repeated it? What are their qualifications as witnesses or experimenters?
Average number of links between the origin and the recipients of the information. Note that this isn’t necessarily the same as the number of links provided by the user, because a user-provided path doesn’t necessarily reach all the way to the origin. This means we probably want a way to mark whether or not a path terminates at an original source.
What are methods to find missing original sources? For example, if A reports B told them X, the software could prompt B for their source path. Then we would presumably want to link the path reported by B to A’s path in some way.
“Quality” of communication links in the path
How to find undocumented origins? e.g. a user reports hearing it on TrueNews, but doesn’t know how TrueNews received the information.
Identification of biased sources
Identification of biased sources in communication paths
Analyze total set of original sources across a set of propositions
Original sources add real evidence to a proposition. What are ways to increase the number of original sources?
Analyze votes on a proposition in terms of how many votes are "repeaters" (and from which source).
Could be useful to match up full and partial paths (e.g. one path has B told A and another has C told B told A). Note the final receiver would generally be different, but there are a couple of useful things that could be derived here: 1) we get more evidence that a path was correctly reported and 2) we can correctly trace cases where the original source was actually the same for two voters. Various characteristics of the links can be used to ensure proper matching, but often it will still be a fuzzy match.
Frequent original sources
Frequently cited sources in communication paths
Highly trusted sources (based on proposition tag votes?)
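A few of the path statistics listed above (average number of links to the origin, distinct original sources, and how many receivers are "repeaters" of each original source) could be computed roughly as follows. This is a minimal Python sketch under assumed names: `CommPath`, its fields, and the helper functions are hypothetical and do not come from the actual schema. Each path is an ordered list of sources from the earliest known source to the final receiver, plus a user-supplied flag marking whether the first entry is believed to be the original source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommPath:
    """Hypothetical path: sources ordered from earliest known source to final receiver."""
    sources: list[str]
    reaches_origin: bool  # True if sources[0] is believed to be the original source


def average_links(paths: list[CommPath]) -> Optional[float]:
    """Average hops from origin to receiver, over paths marked as reaching the origin.

    Paths that stop short of the origin are skipped, because their link count
    is only a lower bound on the real distance.
    """
    complete = [p for p in paths if p.reaches_origin]
    if not complete:
        return None
    return sum(len(p.sources) - 1 for p in complete) / len(complete)


def original_sources(paths: list[CommPath]) -> set[str]:
    """Distinct original sources among paths marked as reaching the origin."""
    return {p.sources[0] for p in paths if p.reaches_origin}


def repeater_counts(paths: list[CommPath]) -> dict[str, int]:
    """Number of receivers (e.g. voters) whose path traces back to each original source."""
    counts: dict[str, int] = {}
    for p in paths:
        if p.reaches_origin:
            counts[p.sources[0]] = counts.get(p.sources[0], 0) + 1
    return counts
```

For example, with the paths C told B told A, C told X (both marked as reaching the origin) and D told Y (not marked), the average is 1.5 links, C is the only known original source, and two receivers are repeaters of C.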

Verifying a communication path

The "source" has a special position in verifying a communication path because the source is asserting that the receivers properly understood the source (at least as far as the wording of the proposition goes). For lack of a better term, let's call this source confirmation.

Sources should be able to assert that they are properly referenced in a communication path, but even then, their assertion isn’t proof that it is true. Similarly, a source can deny being a source, but that denial can also be false. Still, such assertions provide useful information, since they can affect the source’s future reputation. So we can allow sources listed on any given communication path to assert whether or not they provided the information contained in the proposition (or whether they provided similar but different information that has since been distorted). Other witnesses can also assert that they got the information from a source.

One issue that comes up is that a source may not be a user. In this case, the source cannot directly assert in the software that they transmitted the information, but other users who heard the source confirm the communication could provide indirect confirmation. What would be the best way to do this? This clearly isn’t ideal, so it would generally be much better to invite the source to create an account (either as a user or an organization).

Communication confirmation via virtual propositions (a workaround for the case where the source is not a user)

A user could create a proposition such as "Source A confirms their source position is correct on communication path B". Implicit propositions could also be created for communication links. But this would create a lot of propositions that are likely never used, so we should probably have these exist only as "virtual" propositions until at least one user votes on them.
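The "virtual until first vote" idea amounts to lazy materialization: the implicit proposition is generated from a template whenever it is displayed, but nothing is stored until someone votes. A minimal Python sketch, assuming a hypothetical in-memory `PropositionStore` (the class, `confirm_text`, and `vote` are illustrative names, not the real data model):

```python
from dataclasses import dataclass, field


@dataclass
class PropositionStore:
    """Hypothetical store: implicit propositions stay virtual until the first vote."""
    # proposition text -> {user: vote}; only materialized propositions appear here
    materialized: dict[str, dict[str, bool]] = field(default_factory=dict)

    @staticmethod
    def confirm_text(source: str, path_id: str) -> str:
        # Template for the implicit "source confirmation" proposition
        return f"Source {source} confirms their source position is correct on communication path {path_id}"

    def votes_for(self, text: str) -> dict[str, bool]:
        # Reading a virtual proposition returns an empty tally without storing anything
        return self.materialized.get(text, {})

    def vote(self, text: str, user: str, value: bool) -> None:
        # The first vote materializes the proposition; later votes update it
        self.materialized.setdefault(text, {})[user] = value
```

Displaying a communication path can then call `confirm_text` for every source on it at no storage cost; only propositions someone has actually voted on end up in `materialized`.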

Original sources

How to mark an original source? Each communication path will have exactly one original source (even if it isn't recorded in the existing path description).

How do we qualify the reliability that a source is the original source for a path and how do we deal with cases where there is dispute about the original source?

We also have a similar issue to communication path confirmation, where the source can confirm they are not just a source, but the original source (clear definitions of original source need to be provided to avoid a wrong assertion by the source).

Original source is also a bit complex in that there can be multiple original sources for a proposition (with specific original sources being tied to different communication paths pointing to the same proposition). For example, someone could assert that they came up with information originally, but not know how that information actually got transmitted on specific communication paths. And this gets messier in the case where the same information could be independently discovered by two or more sources. In other words, the source may not be asserting they are the original source for a path, they are asserting they originated an idea or reported an event (and can't even be sure that someone else didn't originate the idea or report the event earlier).

The above generally doesn't apply to documents, since a document can usually be assumed to have a single source (which could, of course, be an organization). But the individual pieces of information in a document can come from many sources, which again raises the need for a citation format in documents. One way to handle this could be to create subdocuments for individual pieces of information in the overall document and provide original sources for those pieces, but here the "original source" may be better represented by a document than by a "source".

One interesting thing to note is that documents, videos, audio recordings, etc. are "recordings", which don't suffer from memory degradation as quickly as ephemeral mediums do.

What kind of information can an original source provide?

A witness can provide the time, location, and bystanders for an event they witnessed, plus their "certainty" about their recollection.
A researcher can provide details about the experiment or analysis performed in terms of data and analysis method, plus background references, motivations, etc.

Apart from its general analytical value, such information can also be useful in deciding original sources in the case of disputes about who is the original source (or providing evidence that there are multiple original sources).

Extending communication paths

We need a way for a user to link to an existing communication path to virtually extend it. For example, A creates the communication path B told A. A separate user X creates the path C told B told X. This user may or may not know about the B told A path. But there’s an opportunity here to show a longer path for the A path. For example: A reports A was told by B (X reports B was told by C).

In the case of two communication paths to a single predicate or document, this should be pretty trivial to do.
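For two paths attached to the same proposition or document, the basic matching step could look like the Python sketch below. It is only a sketch: paths are assumed to be lists of source names ordered from earliest source to receiver, `extend_path` is a hypothetical helper, and it matches names exactly rather than attempting the fuzzy matching on link characteristics mentioned earlier.

```python
def extend_path(path: list[str], others: list[list[str]]) -> list[tuple[list[str], list[str]]]:
    """Return candidate upstream extensions for `path` (earliest source first).

    If another path contains path[0] somewhere after its own first entry, the
    sources before that occurrence are a possible virtual extension of `path`.
    Each result is (upstream_prefix, other_path), so the UI can show who reported
    which part of the extended chain.
    """
    head = path[0]
    candidates = []
    for other in others:
        # Skip the path itself and paths where the head never appears past position 0
        if other is path or head not in other[1:]:
            continue
        idx = other.index(head, 1)
        candidates.append((other[:idx], other))
    return candidates
```

With A's path `["B", "A"]` and X's path `["C", "B", "X"]`, the result is `[(["C"], ["C", "B", "X"])]`, which the UI could render as "A reports A was told by B (X reports B was told by C)".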

Another case could be something like A read proposition P1 in document D by S1. And X reports he learned about proposition P2 in document D which was written by source S2.

So we would want to show that proposition P1 was told to A in document D by S1, and that X reports the author of D is S2. This could be further extended by Y reporting that the author of D is actually S3. In short form:

A<S1(D)
X<S2(D)
Y<S3(D)

This suggests that all mediums have an associated document, but the recording “document” may only be the recollection of the receiver (e.g. an extremely poor recording that isn’t easily accessible and is easily subject to alteration). Of course, higher weight will typically be given to documents where the information doesn’t degrade as quickly over time and where the document’s contents are readily available and not subject to easy change.

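-- Links two communication paths so that the earlier path virtually extends the later one.
-- Assumes the id column is inherited from user_created.
CREATE TABLE communication_path_links (
    earlier_communication_path_id INT4 NOT NULL REFERENCES communication_paths(id),
    later_communication_path_id INT4 NOT NULL REFERENCES communication_paths(id),
    -- Proposition asserting that this link between the two paths is correctly reported
    correctly_reports_earlier_source_predicate_id INT4 NOT NULL REFERENCES propositions(id),
    PRIMARY KEY (id)
) INHERITS (user_created);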

Voting on the truth of paths

Should we create an implicit proposition about the truth of each path?

Voting true on a path would be voting true on each link in the path. To express agreement with only some of the links, a user should instead create a separate “corrected” path.
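One way to read "voting true on a path is voting true on each link" is as a rule that expands a path vote into implied link votes. The Python sketch below assumes links are (source, receiver) pairs and that a false path vote only means at least one link is wrong, so it implies nothing about any particular link; that treatment of false votes is an assumption, not something the notes specify. The function names are hypothetical.

```python
def path_links(path: list[str]) -> list[tuple[str, str]]:
    """Split a path (earliest source first) into (source, receiver) links."""
    return list(zip(path, path[1:]))


def link_votes_from_path_vote(path: list[str], vote: bool) -> dict[tuple[str, str], bool]:
    """Implied per-link votes from a single vote on a whole path.

    A true vote applies to every link. A false vote only says that some link
    is wrong, so no per-link vote is implied; a user who agrees with some links
    would instead create and vote on a corrected path.
    """
    if not vote:
        return {}
    return {link: True for link in path_links(path)}
```

For the path C told B told A, a true vote implies true votes on both C→B and B→A.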

Additional paths versus contradicting paths

Imagine the two paths below:

C told B told A
D told A

These are not necessarily mutually exclusive: each path can be individually true or false (e.g. both B and D told A).

But they could be mutually exclusive: let’s say there’s a single conversation between A, B, and D. In this case A thinks it was B that told A the info, but D claims they told A. And perhaps B remembers that they told A, but D also told A.

Even if someone believes the two paths are mutually exclusive (e.g. A was only told once), it still doesn’t seem like we need any new representation added to the database: if a user thinks only one communication path took place, they can vote true on one path and vote false on the other paths.

Methods to identify original sources

Users need a way to mark if they believe a path reaches the original source. We can add an option in the communication path UI to create a proposition that claims any source in the path is the real original source. When viewing a communication path, any user should be allowed to create additional such original source claims and vote on such predicates.

For example, user C creates a path A told B told C. C thinks that A is the original source of the information, so C creates the template proposition “A is the original source of communication path P”.

When viewing this communication path, this proposition will also be shown.

Later, some other user creates a predicate claiming that, actually, “B is the original source of communication path P”.

Lastly, another user creates a new communication link
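The claim-and-vote mechanism for original sources could be tallied per path as in the minimal Python sketch below. `OriginalSourceClaims` is a hypothetical class; following the description above, it only accepts claims about sources that already appear on the path, and it scores each claim as true votes minus false votes (a deliberately simple scoring assumption).

```python
class OriginalSourceClaims:
    """Hypothetical tally of claims "S is the original source of communication path P"."""

    def __init__(self, path: list[str]) -> None:
        self.path = path  # earliest source first
        self.votes: dict[str, dict[str, bool]] = {}  # claimed source -> {user: vote}

    def add_claim(self, source: str) -> None:
        # Only a source already listed on the path can be claimed as its original source
        if source not in self.path:
            raise ValueError(f"{source} is not on this communication path")
        self.votes.setdefault(source, {})

    def vote(self, source: str, user: str, value: bool) -> None:
        if source not in self.votes:
            raise KeyError(f"no original source claim exists for {source}")
        self.votes[source][user] = value

    def net_support(self) -> dict[str, int]:
        # True votes minus false votes for each claim
        return {s: sum(1 if v else -1 for v in vs.values()) for s, vs in self.votes.items()}
```

In the example above, C's claim that A is the original source and the later claim for B would both be shown on path P, each with its own tally.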

Rating sources

Does the source do a good job of identifying original sources
Does the source do a good job of filtering false and true information
Is the source biased
Does the source intentionally spread false information

Rating communication links (Likelihood that the communication link may have corrupted the original information)

Because of communication medium (e.g. recall of old conversation, noisy communication channel, etc)
Unreliable intermediate source

Consider the "meaning" of these paths:

(Prop1) 11 → 21 → 31
(Prop1) 21 → 31
(Prop1) 12 → 21 → 31

If 21 is a document, 11 and 12 could represent a dispute about the author of the document, or 11 and 12 could have collaborated on the document. Even in the collaborative case, it’s not necessarily clear whether both 11 and 12 collaborated on Prop1 in the document, or whether only one of them wrote it and Prop1 is being wrongly attributed.

We need to separate the cases of collaborative document creation versus assertion of exclusive authorship.

There’s a difference between providing a document and authoring a document. For example, how should we represent a case where a friend points us to a document on the web? It seems like it is useful to know the friend as an intermediary source.

So we need to represent the path:

Source Unknown1 authored Document1 and published on the web at URL1 which is found by Friend1 via web search. Source Friend1 sends URL1 link (or copy of document) to Receiver1.

This can be represented in our current DB schema: the medium in the first link is the document, the medium in the second link is the method (e.g. email or IM) used by friend1 to provide the link to the document.

The only issue is to make sure it’s clear to users that they should refer to the actual medium used in each link, as this could easily be a point of confusion. The key thing we want to distinguish, of course, is authorship of the document versus providing a reference to it. We should be able to handle this through the design of the associated UI.
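To illustrate how the medium on each link separates authorship from referral, here is a minimal Python sketch. The `Link` fields and the rule "a document medium means the source authored the content" are assumptions for illustration, not the real schema or UI logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical link: one hop of a communication path."""
    source: str
    receiver: str
    medium: str       # medium actually used on this hop, e.g. "document", "email", "IM"
    detail: str = ""  # e.g. the document name or the URL that was shared


def describe(link: Link) -> str:
    # A document medium means the source authored the content; any other
    # medium means the source only passed the information (or a reference) along.
    if link.medium == "document":
        return f"{link.source} authored {link.detail}, read by {link.receiver}"
    return f"{link.source} sent {link.detail} to {link.receiver} via {link.medium}"


path = [
    Link("Unknown1", "Friend1", "document", "Document1 (URL1)"),
    Link("Friend1", "Receiver1", "email", "URL1"),
]
```

Rendering this path yields "Unknown1 authored Document1 (URL1), read by Friend1" followed by "Friend1 sent URL1 to Receiver1 via email", so Friend1 appears as an intermediary rather than an author.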

Source Propositions and Links

Proposition: Source A is good at filtering out false information.

Issues: “good at” is a vague term, so different raters could have different opinions about what the standard is.

Computing a filtering metric for sources

On a related note, we could compute a metric based on all the information attributed to a source. For example, we could compute the percentage of the information reported by the source that is currently rated true versus false (with some cutoffs for the point at which information counts as voted true or false).
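A minimal Python sketch of such a ratio with cutoffs. It assumes each attributed proposition has a single truth rating in [0, 1]; the rating scale and the 0.7/0.3 cutoffs are placeholder assumptions. Propositions rated between the cutoffs count as undecided and are left out.

```python
from typing import Optional


def filtering_metric(ratings: list[float], true_cutoff: float = 0.7,
                     false_cutoff: float = 0.3) -> Optional[float]:
    """Fraction of a source's decided propositions that are currently rated true.

    `ratings` are assumed truth ratings in [0, 1] for propositions attributed to
    the source. Ratings strictly between the cutoffs are treated as undecided.
    Returns None when nothing attributed to the source is decided yet.
    """
    decided_true = sum(1 for r in ratings if r >= true_cutoff)
    decided_false = sum(1 for r in ratings if r <= false_cutoff)
    decided = decided_true + decided_false
    return decided_true / decided if decided else None
```

For ratings `[0.9, 0.8, 0.5, 0.1]`, two propositions count as true, one as false, and one is undecided, giving 2/3.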

Potential gaming of the filtering metric

But this type of metric could be gamed to some extent by someone selectively linking to just the false or just the true information reported by a source (or simply creating false attributions). So when reporting such a metric, it would be good to analyze the accuracy of the attributions (e.g. filtering attributions based on their own “truth” ratings), as well as who is doing the attributing and whether the attributor has a history of selective attribution. The latter should mostly matter only when there are few attributors associated with the source.
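Both countermeasures can be sketched in Python: filter and weight each attributed proposition by the truth rating of the attribution itself, and report how concentrated the attributions are among attributors. Everything here is hypothetical: the `Attribution` fields, the 0.5 threshold, and the use of the top attributor's share as a concentration signal are illustrative choices, not a settled design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribution:
    """Hypothetical record: `attributor` claims the source reported some proposition."""
    attributor: str
    proposition_truth: float  # current truth rating of the proposition, in [0, 1]
    attribution_truth: float  # current truth rating of the attribution itself, in [0, 1]


def weighted_filtering_metric(attrs: list[Attribution],
                              min_attribution: float = 0.5) -> Optional[float]:
    # Drop attributions that are themselves rated doubtful, then weight the rest
    # by how strongly the attribution is believed.
    kept = [a for a in attrs if a.attribution_truth >= min_attribution]
    total = sum(a.attribution_truth for a in kept)
    if total == 0:
        return None
    return sum(a.attribution_truth * a.proposition_truth for a in kept) / total


def top_attributor_share(attrs: list[Attribution]) -> float:
    # Share of attributions made by the most active attributor; a high share
    # means selective attribution by one user could skew the metric.
    if not attrs:
        return 0.0
    counts: dict[str, int] = {}
    for a in attrs:
        counts[a.attributor] = counts.get(a.attributor, 0) + 1
    return max(counts.values()) / len(attrs)
```

Reporting the concentration figure alongside the metric lets a reader discount results that rest mostly on one attributor.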

Qualitative problems with filtering metric

Another problem with the truth/falsehood ratio as a metric is that some falsehoods are more significant than others. A similar problem is that some falsehoods are more difficult to determine than others.

Source filtering methodology

Another way to analyze a source could be by how it reports the steps it takes to separate truth from falsehood. Similarly, a source may choose to report most of the information it receives, but also provide information about how it tried to verify that information. Of course, a source could also directly rate the truth of the propositions it reports.

In other words, knowing the general stance of a source on when it does and doesn’t propagate information is useful (e.g. source reports everything or filters out falsehood via methods X, Y, Z).

Biased sources

Source A is biased about tag X. Will biases match up well with tags? Seems like they can match up ok if tags are appropriately defined and used.

But bias also needs to be characterized in terms of the nature of the bias and the degree of the bias (just saying source is “biased” about a tag doesn’t seem sufficient). For example, if the issue is abortion, is the source “for” or “against” abortion. Also, how does the bias manifest? Does the source just selectively filter what it reports? Does it “spin” the information? And in either case, to what degree?
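The dimensions named above (tag, direction, how the bias manifests, and degree) suggest a structured bias claim rather than a bare "biased" flag. A minimal Python sketch, where `BiasClaim`, `Manifestation`, and the 0 to 1 degree scale are all assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum


class Manifestation(Enum):
    SELECTIVE_FILTERING = "selectively filters what it reports"
    SPIN = "spins the information"


@dataclass(frozen=True)
class BiasClaim:
    """Hypothetical structured form of "Source A is biased about tag X"."""
    source: str
    tag: str
    direction: str                          # e.g. "for" or "against"
    manifestations: frozenset[Manifestation]
    degree: float                           # assumed scale: 0 = none, 1 = extreme

    def __post_init__(self) -> None:
        if not 0.0 <= self.degree <= 1.0:
            raise ValueError("degree must be between 0 and 1")

    def as_proposition(self) -> str:
        # Render the claim as proposition text that users could vote on
        ways = " and ".join(sorted(m.value for m in self.manifestations))
        return (f"Source {self.source} is biased {self.direction} {self.tag} "
                f"(degree {self.degree:.1f}); it {ways}")
```

Because each field is explicit, two raters who agree that a source is biased but disagree on its direction or degree would be voting on different propositions.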

Malicious source

Primary purpose of Source A is to spread misinformation.

Potential reasons for spreading misinformation:

source directly benefits from the spread of the misinformation (reputation, wealth, etc).
source is being paid to spread misinformation.

Fake sources and source-source links

Two separate cases: identified source doesn’t exist at all OR source exists but is misidentified as a source of the specific information.

Possible wordings for fake source predicates

Source A is not a real person.
Source A is not a real organization.
Source A is a fake source. (combines first two)
Source A doesn’t exist. (combines first two)

How to link a real source to a fake source?

This claim isn’t tied to any particular path, it’s a claim that all information from a source is coming from another source.

There are also at least two distinct cases for this type of direct source linking: “source doesn’t exist at all”, or a claim that a source is “controlled” by another source.

In other words:

Source A is actually Source B. Source A → Link type “is” → Source B
Source A is controlled by Source B. Source A → Link type “controlled by” → Source B
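If both "is" and "controlled by" links are treated as grouping sources together, a union-find structure can count how many independent original sources a proposition really has. That helps with the "single malicious source pretending to be many" pattern. This Python sketch is hypothetical: `SourceGroups` is an illustrative name, and treating "controlled by" the same as "is" for counting purposes is an assumption.

```python
class SourceGroups:
    """Union-find over sources joined by "is" or "controlled by" links (hypothetical)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, s: str) -> str:
        # Return the representative of s's group, halving the path as we go
        self.parent.setdefault(s, s)
        while self.parent[s] != s:
            self.parent[s] = self.parent[self.parent[s]]
            s = self.parent[s]
        return s

    def link(self, a: str, link_type: str, b: str) -> None:
        if link_type not in ("is", "controlled by"):
            raise ValueError(f"unsupported link type: {link_type}")
        # Merge A's group into B's group, so B's representative stands for both
        self.parent[self.find(a)] = self.find(b)

    def independent_count(self, original_sources: list[str]) -> int:
        # Number of distinct groups among the claimed original sources
        return len({self.find(s) for s in original_sources})
```

For example, after linking "A is B" and "C controlled by B", the claimed original sources A, B, C, and D reduce to two independent groups (B and D).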

The sources of a source

Another potentially useful link between sources is to identify typical sources for a source (e.g. favorite “standard” news sources of a user). For example:

Source A → Link type “common source” → Source B

Common sources can also be computed from individual communication paths provided by users, but it seems possible a user might be consuming a lot of information from a source that influences their thinking but that doesn’t get directly attributed to any given proposition. This link type would allow for a compact way to communicate common sources used by a source. It could be especially useful for new users who haven’t generated many ratings reasons, or users who don’t like to provide ratings reasons.
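Computing common sources from individual paths could be as simple as counting each source's immediate upstream sources. A minimal Python sketch, assuming paths are lists of source names ordered from earliest source to final receiver; `common_sources` and its `top_n` parameter are hypothetical.

```python
from collections import Counter


def common_sources(source: str, paths: list[list[str]], top_n: int = 3) -> list[tuple[str, int]]:
    """Most frequent immediate upstream sources of `source` across paths.

    Each path is ordered earliest source first, so the entry just before
    `source` in a path is the one that passed information to it.
    """
    upstream = Counter()
    for path in paths:
        for earlier, later in zip(path, path[1:]):
            if later == source:
                upstream[earlier] += 1
    return upstream.most_common(top_n)
```

For paths TrueNews told A, X told TrueNews told A, B told A, and TrueNews told C, the common sources of A are TrueNews (2 paths) and B (1 path). Explicit "common source" links would still be needed for influences that never show up in recorded paths.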

Of course, it would be better if we reached a state where users are less reliant on a small number of information sources, but at initial rollout of the software we can expect most users/sources to be primarily reliant on a few sources in many areas (which raises a related question of specifying which types of information come from which common sources, but in practice I think we can ignore this issue for now).

Highlighting of user’s rating of propositions about themselves

For predicates that reference a user, we should probably have a way to highlight the rating provided by the user in question. For example, in the case of the implicit proposition associated with the “common source” link for B, it’s especially useful to know if B agrees that A is one of their common sources.

Confirmation that information wasn’t modified in a communication path

Generally, it seems the fastest way to go about finding the original information is to start at the source closest to the information’s origin in the communication path. But there are other considerations: some sources may be easier to contact than others. And any intermediate source may potentially be able to contact the original source, not necessarily just the one closest to the original source in the communication path.

It’s also worth pointing out that an intermediate source may be able to provide enough information to determine that the original information has undergone significant change. When the information hasn’t changed at the intermediate point, it doesn’t guarantee that there wasn’t an alteration earlier in the path, but if information has changed significantly at an intermediate point in the path, it is unlikely that the original information would match the final received form.

Of course, just knowing the information has changed is usually not as useful as being able to obtain the original information, so being able to contact the original source is still generally the most desirable outcome, especially since the original source can potentially supplement with additional information.

Contacting sources

Generally, users of the rating system should be easier to contact than non-users. This should be even more true for frequent users.
Sources can be filtered based on whether or not there is contact information for them in the database. Contact information can be rated for accuracy.
When reliable contact information is available, sources can be rated based on their willingness to confirm and/or discuss information they’ve provided.
Links where information is available in the form of a recording (e.g. a document) can be reviewed without necessarily needing to directly contact a source, although original source contact can still provide potentially useful information not recorded in the document.

Another potential “market” for the rating software: news agencies

Would need to support confidential sources for some types of news agencies where the information comes from whistle-blowers, etc. I think all we would need is to allow a source to be marked as “confidential”. The creator of the source can provide as much detail as they desire about the nature of the confidential source.
Support for “shielded” sources: some sources may not want people to directly contact them because of concerns about time-wasting or harassment, but may be willing to entertain questions after the questions have been filtered for relevancy by an intermediary source. In theory, no special feature needs to be added to the code to support this, since the contact information could just point to the intermediary, but it’s probably worth flagging such contact information as a filter contact rather than the actual source’s contact. Since there are multiple people involved, the intermediary contact could perhaps be considered to represent a filtering organization.

"Original information" for source pages

Source pages can have lists of information for which the source is believed to be the original source. We should probably have multiple such lists for different types of information. For example:

event witness (non-reproducible event)
experiment witness (reproducible event)
scientific ideas, math proofs, etc
art documents containing information (song, novel, etc)

"Transmitted information" for source pages

Similar to original information, these lists could contain information where the source is repeating information from other sources.

What is "original" information?

One issue that arises in determining originality is that curation and compilation of info can also be considered an act of creating information. Math proofs demonstrate this clearly: the proof is "just" selecting from a set of axioms and theorems to prove something is true, but it is clearly also creating new information that wasn't previously available.

In terms of the rating system, the original source for the overall proof would be the prover, but other sources would be the original sources for the contained axioms and theorems.

Edited Aug 28, 2025 by Dan Notestein