U.S. Congress Scorecard

Examining Consistency of AI Analysis

During the initial phase of AI analysis of these bills, certain anomalies were observed in the results. To rule out the possibility that these anomalies were one-off occurrences, and to verify the reliability of the AI analysis, further tests were conducted by performing the same AI analysis 3 times (each test/analysis is hereafter called an 'iteration') on a sample set of 100 bills.

These 100 bills were randomly selected from the more than 2350 bills that were analyzed by AI and are included on this site. The 3 AI analysis iterations were separated from each other by a period of 1 to 2 months.
Exactly the same text of each of these 100 bills and the same prompt (i.e., instructions to AI model) were used in all 3 iterations.
The expectation from this test was that the qualitative data points, i.e., the 5 classification metrics for each bill [4 beneficiaries (Common Citizens, Small Businesses, Large Businesses, Government) and materiality], would yield exactly the same result in all 3 iterations.
Although a zero bias/temperature setting was used throughout this AI analysis, the nature of generative AI meant that parts of the textual content (including illustrative examples) of the AI analysis of the same bill were expected to vary from one iteration to another.
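
The consistency expectation described above can be sketched in code. The following Python example is only an illustration, not the method actually used for this site: the field names (common_citizens, materiality, etc.), the label values, and the bill identifiers are all hypothetical. It assumes each iteration's result for a bill is stored as a dictionary holding the 5 classification metrics, and it counts how many bills returned identical classifications in every iteration.

```python
from collections import Counter

# Hypothetical field names for the 5 classification metrics
# (4 beneficiaries plus materiality); the real data may use other names.
METRICS = (
    "common_citizens",
    "small_businesses",
    "large_businesses",
    "government",
    "materiality",
)


def inconsistent_metrics(iterations):
    """Return the metrics whose value differs across the iterations of one bill."""
    return [m for m in METRICS if len({it[m] for it in iterations}) > 1]


def consistency_summary(results):
    """Summarize consistency; `results` maps a bill id to its per-iteration dicts."""
    mismatches_by_metric = Counter()
    fully_consistent = 0
    for iterations in results.values():
        diffs = inconsistent_metrics(iterations)
        mismatches_by_metric.update(diffs)
        if not diffs:
            fully_consistent += 1
    return {
        "bills": len(results),
        "fully_consistent": fully_consistent,
        "mismatches_by_metric": dict(mismatches_by_metric),
    }


# Made-up sample data: two bills, three iterations each.
base = {
    "common_citizens": "benefit",
    "small_businesses": "neutral",
    "large_businesses": "neutral",
    "government": "benefit",
    "materiality": "high",
}
results = {
    "bill-1": [base, base, base],
    "bill-2": [base, dict(base, materiality="low"), base],
}
summary = consistency_summary(results)
print(summary)
```

In this made-up sample, only the second bill changes its materiality label in one iteration, so it is the only bill counted as inconsistent. The free-text portion of each analysis is deliberately left out of the comparison, since it was expected to vary between iterations.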

Iteration result:

Comparison of AI Analysis Iterations: