Update Notes on using Lem's algorithm interface, authored by Peter Menegay
......@@ -22,22 +22,6 @@ Thereafter everything worked fine. This discussion will now focus on the details
Lem provided two algorithms to go with the interface (in algorithms.py), `sapienza_bayes_v1` and `trust_weighted_average_v1`. They both appear to work and give correct results. There is some concern that `sapienza_bayes_v1` has an error because it gives two peaks rather than one when running the [Binned and Continuous Distributions](Binned and Continuous Distributions) example. We will investigate the error, if it is that, and fix it. For now we are concerned with the process of getting algorithms into the interface and uploading them to the main server code.
<h2>Trust-weighted histogram algorithm</h2>
Also added was Eric's [trust_weighted_histogram algorithm](Dan's-proposal-for-trust-weighted-histograms). This algorithm requires an additional input, the number of bins, so a `misc_input` field was added to `AlgorithmInput` to include this easily:
```python
class AlgorithmInput:
    components: list[ComponentData]
    misc_input: dict
```
The user (or algorithm developer) then passes this dictionary as the second argument when creating an `AlgorithmInput`, e.g.:
```python
alginp = AlgorithmInput([comp1, comp2, comp3, comp4], {'Nbins':10})
```
<h2>straight_average and straight_average_intermediate algorithm</h2>
......@@ -66,7 +50,7 @@ Calculations now proceed with the input `intermediate_results` and result in new
In essence, for the case of straight averaging, the `intermediate_results` are just the Sapienza trust-modified probabilities of every source in the sub-group to be analyzed. The sub-group in this case is a node and its direct descendants (children). The average of the group is calculated and that plus the list of probabilities (modified by trust) become the results for the next level. Two examples of a multi-level tree were created manually to test this concept (just run custom_algo.py). In real life transferring results between nodes will presumably be handled by the server code, so this is really just an experiment to see that things are conceptually ok.
<h2>Example of using straight_average_intermediate</h2>
<h3>Example of using straight_average_intermediate</h3>
Let's start by using the case from [A simple averaging technique to supplement the Bayes equation](A simple averaging technique to supplement the Bayes equation):
......@@ -172,3 +156,112 @@ comp0.intermediate_results = output012.intermediate_results
Running the snippet XXX for this case gives an overall $P_{ave} = 0.616$, same as in [A-simple-averaging-technique-to-supplement-the-Bayes-equation](A-simple-averaging-technique-to-supplement-the-Bayes-equation).
<h2>Trust-weighted histogram algorithm</h2>
Also added was Eric's [trust_weighted_histogram algorithm](Dan's-proposal-for-trust-weighted-histograms). This algorithm requires an additional input, the number of bins, so a `misc_input` field was added to `AlgorithmInput` to include this easily:
```python
class AlgorithmInput:
    components: list[ComponentData]
    misc_input: dict
```
The user (or algorithm developer) then passes this dictionary as the second argument when creating an `AlgorithmInput`, e.g.:
```python
alginp = AlgorithmInput([comp1, comp2, comp3, comp4], {'Nbins':10})
```
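A minimal sketch of how an algorithm might read this extra input. It assumes `AlgorithmInput` is a dataclass, which the positional constructor call suggests but the text does not confirm. `ComponentData` here is an empty stand-in, and `get_nbins` with its default of 10 is a hypothetical helper, not part of Lem's interface:

```python
from dataclasses import dataclass, field


@dataclass
class ComponentData:
    """Empty stand-in for the interface's ComponentData (assumption)."""


@dataclass
class AlgorithmInput:
    components: list[ComponentData]
    # Free-form extras such as {'Nbins': 10}; defaulting to {} is an assumption.
    misc_input: dict = field(default_factory=dict)


def get_nbins(alg_input: AlgorithmInput, default: int = 10) -> int:
    """Hypothetical helper: read 'Nbins' from misc_input and validate it."""
    nbins = alg_input.misc_input.get('Nbins', default)
    if not isinstance(nbins, int) or nbins < 1:
        raise ValueError(f"Nbins must be a positive integer, got {nbins!r}")
    return nbins


alginp = AlgorithmInput([ComponentData()], {'Nbins': 10})
nbins = get_nbins(alginp)
```

Keeping such parameters in a dictionary means algorithms that need no extra input can ignore `misc_input` entirely, at the cost of each algorithm having to validate its own keys.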
<h3>Example problem for Trust-weighted Histogram Algorithm</h3>
The example problem is the same as the one solved by Eric in [Dan's-proposal-for-trust-weighted-histograms](Dan's-proposal-for-trust-weighted-histograms):
```graphviz
digraph G {
fontname="Helvetica,Arial,sans-serif"
node [fontname="Helvetica,Arial,sans-serif"]
edge [fontname="Helvetica,Arial,sans-serif"]
layout=dot
1 [label="1, P=20%"]
2 [label="2, P=30%"]
3 [label="3, P=40%"]
4 [label="4, P=45%"]
5 [label="5, P=60%"]
6 [label="6, P=90%"]
7 [label="7, P=55%"]
8 [label="8, P=65%"]
9 [label="9, P=70%"]
10 [label="10, P=80%"]
1 -> 2 [label="T=0.9"];
1 -> 3 [label="T=0.9"];
1 -> 4 [label="T=0.9"];
2 -> 5 [label="T=0.9"];
2 -> 6 [label="T=0.9"];
3 -> 7 [label="T=0.9"];
3 -> 8 [label="T=0.9"];
4 -> 9 [label="T=0.9"];
4 -> 10 [label="T=0.9"];
}
```
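The order in which the sub-groups of this graph have to be computed can be sketched with a small post-order walk. This is only an illustration: the `children` mapping is transcribed from the graph, and `groups_bottom_up` is a hypothetical helper that is not part of the interface.

```python
# Children of each node, copied from the graph above.
children = {1: [2, 3, 4], 2: [5, 6], 3: [7, 8], 4: [9, 10]}


def groups_bottom_up(node: int) -> list[list[int]]:
    """Return [parent, *children] groups, deepest groups first (post-order)."""
    kids = children.get(node, [])
    if not kids:
        return []  # a leaf forms no group of its own
    groups = []
    for kid in kids:
        groups.extend(groups_bottom_up(kid))
    groups.append([node, *kids])
    return groups


order = groups_bottom_up(1)
```

Each group lists the parent first, followed by its direct children, so every group is computed only after the groups beneath it.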
We start with Nodes 2, 5 and 6. None of them has `intermediate_results`, since this group sits at the bottom level of the tree. Node 2 trusts itself completely, so its `trust_factor` is 1. Nodes 5 and 6 have `trust_factor` set to 0.9, which is Node 2's trust for them. The opinions and components are created as usual:
```python
intermediate_results = []
opinion2 = OpinionData([0.3,0.7], 1)
opinion5 = OpinionData([0.6,0.4], 1)
opinion6 = OpinionData([0.9,0.1], 1)
comp2 = ComponentData(opinion2, 1.0, intermediate_results)
comp5 = ComponentData(opinion5, 0.9, intermediate_results)
comp6 = ComponentData(opinion6, 0.9, intermediate_results)
```
Next the `AlgorithmInput` is created. We choose 10 bins for the histogram and pass that choice through the `misc_input` field discussed above. The calculation then proceeds:
```python
alginp256 = AlgorithmInput([comp2, comp5, comp6], {'Nbins':10})
output256 = trust_weighted_histogram(alginp256)
```
Since Node 2 will be used for the next level, we set its `intermediate_results`:
```python
comp2.intermediate_results = output256.intermediate_results
```
In similar fashion, Nodes 3, 7, 8 and Nodes 4, 9, 10 are computed. Once this is done, we can move to the next level up and do Nodes 1, 2, 3, 4. First we change the `trust_factor` of Nodes 2, 3 and 4 to 0.9, which is Node 1's trust for them rather than their trust for themselves:
```python
comp2.trust_factor = 0.9
comp3.trust_factor = 0.9
comp4.trust_factor = 0.9
```
Next, `comp1` needs to be created, starting with no `intermediate_results`:
```python
intermediate_results = []
opinion1 = OpinionData([0.2,0.8], 1)
comp1 = ComponentData(opinion1, 1.0, intermediate_results)
```
Next is the creation of the `AlgorithmInput` and the calculation itself:
```python
alginp1234 = AlgorithmInput([comp1, comp2, comp3, comp4], {'Nbins':10})
output1234 = trust_weighted_histogram(alginp1234)
```
At this point we're done. Running the snippet XXX yields the expected result for Node 1's computed histogram:
[0.0, 0.0, 0.5555555555555556, 0.5, 1.0, 0.45, 0.9, 0.45, 0.45, 0.45]
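As a cross-check, the histogram can be rebuilt independently of the interface. The sketch below assumes three rules inferred from the numbers themselves, not taken from the `trust_weighted_histogram` source: each node is weighted by the product of the trust values on the path from Node 1 down to it, each node is placed in the bin containing its probability, and the histogram is scaled so the tallest bin equals 1. All names are illustrative.

```python
# Node probabilities in percent and parent links, copied from the example graph.
prob_pct = {1: 20, 2: 30, 3: 40, 4: 45, 5: 60, 6: 90, 7: 55, 8: 65, 9: 70, 10: 80}
parent = {2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 3, 9: 4, 10: 4}
EDGE_TRUST = 0.9  # every edge in the example has T=0.9
NBINS = 10


def path_trust(node: int) -> float:
    """Product of edge trusts from Node 1 down to `node` (1.0 for Node 1)."""
    trust = 1.0
    while node in parent:
        trust *= EDGE_TRUST
        node = parent[node]
    return trust


hist = [0.0] * NBINS
for node, pct in prob_pct.items():
    # Integer arithmetic avoids float rounding at bin edges; P=100% goes in the last bin.
    bin_index = min(pct * NBINS // 100, NBINS - 1)
    hist[bin_index] += path_trust(node)

peak = max(hist)
hist = [h / peak for h in hist]  # scale so the tallest bin is 1.0
```

Under these assumptions `hist` agrees with the list above to floating-point precision: Nodes 3 and 4 share the tallest bin (0.9 + 0.9), Node 1 alone contributes 1.0/1.8, and each leaf contributes 0.81/1.8 = 0.45.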
If Node 1 needed to feed its results to a next higher level, its `intermediate_results` would need to be updated:
```python
comp1.intermediate_results = output1234.intermediate_results
```
The way we update `intermediate_results` here could be improved by doing it automatically, within the calculation. However, to do that, we would need to assume that the first component passed to `AlgorithmInput` is the parent component.
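One possible shape for that automation is a thin wrapper around any algorithm, sketched below under the assumption that `components[0]` is always the parent. The dataclasses are simplified stand-ins for the interface's types, and `AlgorithmOutput`, `run_and_propagate` and `dummy_algorithm` are hypothetical names introduced only for illustration:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ComponentData:  # stand-in with only the fields used here
    trust_factor: float
    intermediate_results: list = field(default_factory=list)


@dataclass
class AlgorithmInput:  # stand-in
    components: list[ComponentData]
    misc_input: dict = field(default_factory=dict)


@dataclass
class AlgorithmOutput:  # stand-in; only the attribute used in the text
    intermediate_results: list


def run_and_propagate(algorithm: Callable[[AlgorithmInput], AlgorithmOutput],
                      alg_input: AlgorithmInput) -> AlgorithmOutput:
    """Run `algorithm`, then copy its intermediate_results onto the parent.

    Assumes components[0] is the parent component.
    """
    if not alg_input.components:
        raise ValueError("AlgorithmInput has no components")
    output = algorithm(alg_input)
    alg_input.components[0].intermediate_results = output.intermediate_results
    return output


def dummy_algorithm(alg_input: AlgorithmInput) -> AlgorithmOutput:
    """Placeholder algorithm: just reports each component's trust_factor."""
    return AlgorithmOutput([c.trust_factor for c in alg_input.components])


parent = ComponentData(1.0)
out = run_and_propagate(dummy_algorithm, AlgorithmInput([parent, ComponentData(0.9)]))
```

Doing the update in a wrapper rather than inside each algorithm keeps the parent-first convention in one place, so existing algorithms would not need to change.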