
Voices in Time

MosesTheTank

Full Member
Minuteman
Jan 28, 2011
3,154
1,838
SC
Just for the hell of it, I entered a hackathon hosted by one of our most prominent universities. Funding was provided by a Wall Street firm. The problem to be solved was analyzing language with a branch of AI known as natural language processing (NLP). Basically, we took a huge volume of text and used a computer to "decipher" it. The text at hand was political speech and the press coverage that followed it. Participants used all sorts of technologies to tackle the problem; I used some common Python packages. By an amazing stroke of luck, I was one of four finalists who advanced to a final round.

Part of the reason I was a finalist was because I knew what the judges wanted to hear. Much like an application to this school, you compose it to create the image that they are looking for. Given the environment surrounding this contest we had to have a few "AI ethicists" in the peanut gallery. Generally, these people are thugs who browbeat everyone into political conformance. The ethicists were pleased with my results, because of course I ensured that they conformed.

Mind you, at least two of the other participants are far more experienced than I am. Their skills in this area are first rate; mine certainly are not. But I was able to make up for it by telling a better story.

For the final round I let AI speak the truth. Speeches by Putin, Obama, Trump, and several others were analyzed and compared with the press coverage that followed. The unvarnished results laid bare a revolting media bias. My final entry was immediately squashed and sealed under the rubric of intellectual property of the sponsoring institution. I knew something like this would happen.

The academic institution involved is lauded for many things, but perhaps chief amongst them is their school of journalism. They kept a watchful eye on things. I would like to take note of one of these "journalists" as she sums up the relationship between the Trump presidency and the MMM.

Margaret Sullivan hates Trump. She has posed as a fact checker and babbles endlessly about Trump in several publications, most prominently the WaPo. Here is a sample article: "Trump has sown hatred of the press for years."

While Trump is certainly not a cerebral man and his list of faults is long, the gist of what he says about many things political, especially the press, is very much aligned with the likes of Cicero, Edward R. Murrow, Carl Sagan, James Madison, Will Rogers, Lysander Spooner, Ray Bradbury... or at least that is what AI unearthed.

Both Frau Sullivan and the WaPo love Mark Twain. A casual perusal found more than 1,100 positive Mark Twain references in the WaPo, and Sullivan has quoted him many times. But the moment I pointed out the parallels between Trump's speech and Twain's, Mr. Twain was quite suddenly cast as a racist (see Huck Finn), and I became a pariah.

Running "!pip install spacy" begat this piece by Mark Twain (1):

The press has scoffed at religion till it has made scoffing popular. It has defended official criminals, on party pretexts, until it has created a U.S. Senate whose members are incapable of determining what crime against law and the dignity of their own body is; they are so morally blind, and it has made light of dishonesty till we have as a result a Congress which contracts to work for a certain sum and then deliberately steals additional wages out of the public pocket and is pained and surprised that anybody should worry about a little thing like that.

I am putting all this odious state of things upon the newspaper, and I believe it belongs there—chiefly, at any rate. It is a free press, a press that is more than free, a press which is licensed to say any infamous thing it chooses about a private or a public man, or advocate any outrageous doctrine it pleases. It is tied in no way. The public opinion which should hold it in bounds it has itself degraded to its own level. There are laws to protect the freedom of the press’ speech but none that are worth anything to protect the people from the press. A libel suit simply brings the plaintiff before a vast newspaper court to be tried before the law tries him, and reviled and ridiculed without mercy.

It seems to me that just in the ratio that our newspapers increase, our morals decay. The more newspapers, the worse morals. Where we have one newspaper that does good, I think we have fifty that do harm. We ought to look upon the establishment of a newspaper of the average pattern in a virtuous village as a calamity.

The difference between the tone and conduct of newspapers today and those of thirty or forty years ago is very noteworthy and very sad—I mean the average newspaper (for they had bad ones then, too). In those days the average newspaper was the champion of right and morals, and it dealt conscientiously in the truth. It is not the case now. The other day a reputable New York daily had an editorial defending the salary steal and justifying it on the grounds that congressmen were not paid enough—as if that were an all-sufficient excuse for stealing. That editorial put the matter in a new and perfectly satisfactory light with many a leather-headed reader, without a doubt. It has become a sarcastic proverb that a thing must be true if you saw it in a newspaper. That is the opinion intelligent people have of that lying vehicle in a nutshell. But the trouble is that the stupid people—who constitute the grand overwhelming majority of this and all other nations—do believe and are molded and convinced by what they get out of a newspaper, and there is where the harm lies.

That awful power, the public opinion of a nation, is created in America by a horde of ignorant, self-complacent simpletons who failed at ditching and shoemaking and fetched up in journalism on their way to the poorhouse. I am personally acquainted with hundreds of journalists, and the opinion of the majority of them would not be worth tuppence in private, but when they speak in print, it is the newspaper that is talking (the pygmy scribe is not visible), and then their utterances shake the community like the thunders of prophecy.

I know from personal experience the proneness of journalists to lie. I once started a peculiar and picturesque fashion of lying myself on the Pacific Coast, and it is not dead there to this day. Whenever I hear of a shower of blood and frogs combined in California, or a sea serpent found in some desert there, or a cave frescoed with diamonds and emeralds (always found by an Injun who died before he could finish telling where it was), I say to myself I am the father of this child—I have got to answer for this lie. And habit is everything—to this day I am liable to lie if I don’t watch all the time.

In a town in Michigan, I declined to dine with an editor who was drunk, and he said in his paper that my lecture was profane, indecent, and calculated to encourage intemperance. And yet that man never heard it. It might have reformed him if he had.

A Detroit paper once said that I was in the constant habit of beating my wife and that I still kept this recreation up, although I had crippled her for life and she was no longer able to keep out of my way when I came home in my usual frantic frame of mind. Now, scarcely the half of that was true. Perhaps I ought to have sued that man for libel—but I knew better. All the papers in America, with a few creditable exceptions, would have found out then, to their satisfaction, that I was a wife beater, and they would have given it a pretty general airing, too.

Why, I have published vicious libels upon people myself—and ought to have been hanged before my time for it, too—if I do say it myself, that shouldn’t.

But I will not continue these remarks. I have a sort of vague general idea that there is too much liberty of the press in this country, and that through the absence of all wholesome restraint, the newspaper has become in a large degree a national curse and will probably damn the republic yet.


(1) Mark Twain, "License of the Press"
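For anyone who wants to try this kind of comparison, here is a minimal sketch. The post doesn't say which spaCy features surfaced the Twain passage, and spaCy's vector similarity needs a separately downloaded model, so this stand-in uses only the Python standard library: it scores two passages by the cosine similarity of their word counts. The sample strings (a shortened paraphrase of the Twain line above, a hypothetical "speech", and an unrelated sentence) are made up, and a real pipeline would at least drop stopwords like "the".

```python
import math
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two bag-of-words count vectors (0.0 to 1.0)."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

# Hypothetical sample texts for illustration only.
twain = "There are laws to protect the freedom of the press, but none to protect the people from the press."
speech = "The press is free to say anything, and the people have no protection from the press."
unrelated = "Gradient descent updates weights in the direction of the negative gradient."

print(round(cosine_similarity(twain, speech), 3))
print(round(cosine_similarity(twain, unrelated), 3))
```

The parallel pair should score noticeably higher than the unrelated pair; at scale you would compare each speech against a whole corpus and look at the top matches.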
 
The real power of what you have done is not the conclusion but, if possible, scripting it as a run command so others can reproduce it and see for themselves... This bias is persistent, and it will surely be amplified and scaled across topics as we head into the new administration. Imagine creating a 'Perspectives In Voices Over Time' (PIVOT).

The recent historic commentary about data has been that freedom lies in its democratization. I've always felt that statement to be premature. Freedom lies in the ability to democratize the analysis of the data, with transparency of source and curation, whatever its volume and structure.
 
I couldn't agree more. Numeracy is the mathematical analogue to literacy. It seems most people are happy to proclaim themselves not good with numbers, but almost no one is happy to say they are illiterate. There is no excuse for this, no matter how much explanation may be offered.

Right now there are gobs of tools people without advanced training can learn to use to gather and analyze vast amounts of data. A decade or so ago I learned how the NYT editorial board carefully curates the entire content of its publications to project a seamless narrative that paints the world in a way that aligns with their collective psyche. I learned this through some lengthy investigative conversation. Today I can prove it with analysis performed on free, open source tools. I can also prove collusion.

“The limits of my language mean the limits of my world.”
― Ludwig Wittgenstein
 
Try curating some of the stoplight data that the City of San Diego makes available for free. Some pretty interesting patterns show up. 🤔🤔🤔
 
You know, Moses, the greatest threat to a democracy is the censorship of free speech, and the approach you used in your submission process, effectively a 'placebo' followed by a 'real dose', shows just how attuned and easily triggered this censorship mechanism is becoming. There's nothing like real sunshine to cast light on the disease and perhaps even cure it...

In the same way that Twitter flags posts with 'this claim is disputed', I wonder whether a bias flag could be created, with a bias metric derived from analyzing the sum total of the online content a poster has uploaded.

I would be interested to know whether any of the judges in your competition asked how much of the signal you were getting was confirmation bias, as opposed to deliberate intent on the part of the content creators. One can pander to an audience, or one can direct an audience; it would be interesting to see whether the two could be parsed and whether a correlation can be shown. E.g., have we come to believe in white privilege because it's true and is simply being reported on, or was there a body of content pushing and 'exposing' it, which the audience then adopted?

There's nothing like using the opposition's own tools and methods as the rope by which they're snared...
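One hedged way to start parsing pandering from directing: given two time series, say daily counts of articles pushing a topic and a daily measure of audience adoption (both hypothetical here), check at which time lag they correlate best. If coverage consistently peaks before adoption, that points toward directing; if adoption moves first, toward pandering. This is only a lead-lag sketch in plain Python, and correlation at a lag shows which series moves first, not causation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(coverage, adoption, max_lag):
    """Return (lag, r) for the lag with the highest correlation.

    A positive lag k pairs coverage[t] with adoption[t + k] (coverage moves first).
    A negative lag -k pairs adoption[t] with coverage[t + k] (the audience moves first).
    """
    n = len(coverage)
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = coverage[: n - lag], adoption[lag:]
        else:
            x, y = coverage[-lag:], adoption[: n + lag]
        scores[lag] = pearson(x, y)
    return max(scores.items(), key=lambda item: item[1])

# Hypothetical data: adoption is constructed to trail coverage by exactly two days.
coverage = [1, 3, 2, 5, 4, 8, 6, 9, 7, 10, 12, 11]
adoption = [0, 0] + coverage[:-2]
lag, r = best_lag(coverage, adoption, max_lag=3)
print(lag, round(r, 3))
```

With real data the peak would be far less clean, and you would want many topics before drawing any conclusion.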
 
Another NLP practitioner! WOOT WOOT!

I myself have noted the recent influx of "AI Ethics" into the field, which is to say people with a lot of sociology background and zero data science/artificial intelligence background or knowledge. Entire conferences are devoted to the subject. To those not familiar with AI it seems legit. For those of us in the field, it's like discussing the ethics of math.

2 + 2 = 4 no matter what your ethics, bias, prejudices, etc.

All of AI is based on math. But the bullshit flows freely and these idiots get huge salaries doing zip.

The real leaders of AI and machine learning won't touch the topic with a 10-foot pole. That should tell you something.
 
Actually, bias is a huge deal in AI and machine learning, with a very specific meaning.

Bias is how well your model fits the training set. On the one hand, you want a model that is somewhat biased towards your training set, since your training set hopefully represents the real world to some degree. However, if you have too much bias, your model does not perform well on new data. So the "ethics" people are displaying the very trait a data scientist is trained to try to avoid....

Irony indeed......
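Since this technical sense of "bias" is the crux here, a small sketch may help. In the usual bias-variance framing, a high-bias model is too simple to capture the pattern even in its training data (underfitting), while a model that hugs its training set so closely that it fails on new data is said to have high variance (overfitting). The toy below uses made-up data from an assumed line y = 2x + 1 plus noise and compares a constant predictor (high bias), a least-squares line, and a 1-nearest-neighbour memorizer (high variance).

```python
import random

def make_data(n, rng):
    """n noisy samples of an assumed ground truth y = 2x + 1, with x drawn from [0, 10]."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        points.append((x, 2 * x + 1 + rng.gauss(0, 1)))
    return points

def mse(model, data):
    """Mean squared error of a model (a function x -> prediction) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_constant(train):
    """High bias: ignores x and always predicts the training mean, so it underfits."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Ordinary least-squares line, which matches the shape of the assumed truth."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return lambda x: my + slope * (x - mx)

def fit_memorizer(train):
    """High variance: 1-nearest-neighbour lookup that reproduces the training set exactly."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

rng = random.Random(42)
train, test = make_data(200, rng), make_data(200, rng)
models = {name: fit(train) for name, fit in
          [("constant", fit_constant), ("line", fit_line), ("1-NN", fit_memorizer)]}
for name, model in models.items():
    print(f"{name:8s} train MSE = {mse(model, train):6.2f}   test MSE = {mse(model, test):6.2f}")
```

On this made-up data the constant model should do poorly on both sets, the memorizer should show zero training error but a worse test error than the line, and the line should generalize best.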
 
In the real world, data is very messy. "Training sets" by their very nature of being curated, are inherently biased.
 
The presence of bias itself is not a problem (too much) as long as it is transparently declared and noted in the analysis. Too often, bias is not declared, and the analysis is presented as 'fact' rather than limited to the parameters of the data.

What's interesting about the NLP aspect is whether editorial bias can be shown with data/analytics, and therefore predicted and modeled, even when there is no data bias... In effect, what Moses said he did with his different submissions was show exactly that: editorial bias and bias-based decisions. It would be a marvelous tool for auditing public policy if such biases could be shown in how government works at any level...
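A minimal way to make that "notation" of bias concrete, assuming you know the labels in a curated training set and have some reference distribution you trust: report each label's share next to the reference, so any skew is declared up front rather than buried. The label names, counts, and reference proportions below are invented for illustration.

```python
from collections import Counter

def label_report(labels, reference):
    """Compare label shares in a training set with a reference distribution.

    Returns {label: (train_share, reference_share, difference)}.
    """
    counts = Counter(labels)
    total = len(labels)
    report = {}
    for label in sorted(set(counts) | set(reference)):
        share = counts[label] / total
        ref = reference.get(label, 0.0)
        report[label] = (share, ref, share - ref)
    return report

# Hypothetical curated set that leans heavily negative.
train_labels = ["negative"] * 60 + ["neutral"] * 25 + ["positive"] * 15
reference = {"negative": 0.33, "neutral": 0.34, "positive": 0.33}
for label, (share, ref, diff) in label_report(train_labels, reference).items():
    print(f"{label:9s} train={share:.2f} reference={ref:.2f} skew={diff:+.2f}")
```

A table like this does not remove the bias; it simply states it, which is the point being made above.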
 
It could also be used to regulate/litigate some of the big tech's censorship...
 
Hmm... that litigation would be interesting. I wonder what the hurdle would be for an AI output to meet the legal requirements of fact or evidence.
 
Which kind of proves my point of curated data sets being biased, no? If it can't pass a legal test...remind me again why I'd trust "AI/ML"?

True AI/ML, as in sentience, is a fallacy at this point in time. In most cases, "AI/ML" is just using brute force calculations and rulesets to achieve the appearance of "AI". Hence the coined term "Reflective Intelligence" (RI) as a true description of today's technology (and the danger of it as well).

Again, JMHO...
 
And still, sooo many corps are making huge bets on AI driving insight and decisions... I work in this field, and it's worrying how much reliance is being placed on it without an understanding of its limitations in both data and method.

Interesting times...
 
Ahh
The presence of bias itself is not a problem (too much) as long as it is transparently declared and noted in the analysis. Too often, bias is not declared, and the analysis is presented as 'fact' rather than limited to the parameters of the data.

What's interesting about the NLP aspect is whether editorial bias can be shown with data/analytics, and therefore predicted and modeled, even when there is no data bias... In effect, what Moses said he did with his different submissions was show exactly that: editorial bias and bias-based decisions. It would be a marvelous tool for auditing public policy if such biases could be shown in how government works at any level...

You touch on the essence of the problem.

The training sets reflect the choices and decisions of those who label them, and the algorithm learns EXACTLY as it is taught.

There are some interesting semi-supervised models out there that take a giant dump of the internet (say, all of Wikipedia) and train over it on tasks like predicting the missing word. With approaches like that you can get a very neutral view of language, compared with trying to classify a bunch of tweets based on one individual's feedback ("That tweet is fake news...").

That's where the ol' SJWs start to take their chunk of the pie with "The English language itself is biased."

I really need to get one of those gigs where I can just make stuff up and get paid a couple hundred k. Just today I had to submit a whitepaper examining how we combat and monitor bias for a model that tells people how to change their password....
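The "predict the missing word" training mentioned above can be illustrated with a toy: count which word appears between each pair of neighbouring words in a corpus, then fill a blank with the most frequent candidate. This is only a counting stand-in for what large masked language models learn with neural networks, and the four-sentence corpus is made up.

```python
from collections import Counter, defaultdict

def train_fill_in(sentences):
    """Map each (left, right) neighbour pair to a Counter of the words seen between them."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words) - 1):
            table[(words[i - 1], words[i + 1])][words[i]] += 1
    return table

def fill_blank(table, left, right):
    """Return the most frequent word seen between left and right, or None if never seen."""
    candidates = table.get((left.lower(), right.lower()))
    return candidates.most_common(1)[0][0] if candidates else None

corpus = [
    "the senate passed the bill",
    "the house passed the bill",
    "the senate rejected the bill",
    "the senate passed the budget",
]
table = train_fill_in(corpus)
print(fill_blank(table, "senate", "the"))  # the word most often found in "senate ___ the"
```

Note that nobody labels anything here: the text supplies its own training signal, which is why this style of training is less exposed to one annotator's opinions than hand-labeled tweets.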
 
So in the modeling I used to support from a software enablement perspective, I always had to deal with quants who'd wax lyrical about their stats skills etc. and wanted to see how well my software would support backtesting etc. I knew these guys would fudge things to make their models correct in both direction and amplitude...

Many times, modelers are really doing regression analysis. I wanted to see 'X', therefore, I have curated data and methods that would lead to 'X' such that when 'X' finally does happen I can say I predicted it...

Putting political affiliations aside, I do wonder whether fractal analysis could be used to identify good 'themes' to push, along with some prediction of how likely they are to become accepted. E.g., Epstein (pedo) --> Bill Clinton (philanderer and easily believed pedo) --> Hillary C (wife of Bill C and enabler of Bill C) --> therefore --> PizzaGate!!
 
RTX 3090s, if you can get them, are great for offloading and crunching that Linear Algebra and Gradient Descent ;)
 
When you see the resources that Google or Facebook can bring, a 3090 is a drop in the bucket. There is an actual industry around making models "green," in which they calculate the cost and carbon footprint of training a model. While I tend to frown on 'carbon footprints,' the resources are, shall we say... impressive. Most impressive.
 
Yup. We spin up VMs with A100s on GCP all the time. I've seen a few instances where the tab was as much as an average house. That is one of the problems with GCP: you may have no idea what it is going to cost. This is one of several reasons I love Snowflake, as it offers a great deal more control.
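The cost and carbon arithmetic behind those "green model" estimates is simple enough to sketch. Every rate below is a hypothetical placeholder, not a quoted GCP price or a measured A100 power draw; substitute your own numbers. PUE (power usage effectiveness) is the datacenter overhead multiplier applied to the hardware's own draw.

```python
def training_estimate(num_gpus, hours, usd_per_gpu_hour, gpu_kw, pue, kg_co2_per_kwh):
    """Rough cost and emissions for a training run.

    cost (USD)         = GPUs x hours x price per GPU-hour
    energy (kWh)       = GPUs x average power per GPU (kW) x hours x PUE
    emissions (kg CO2) = energy x grid carbon intensity
    """
    cost = num_gpus * hours * usd_per_gpu_hour
    energy_kwh = num_gpus * gpu_kw * hours * pue
    return cost, energy_kwh, energy_kwh * kg_co2_per_kwh

# Hypothetical run: 8 GPUs for two weeks (336 hours) at placeholder rates.
cost, kwh, co2 = training_estimate(
    num_gpus=8, hours=336, usd_per_gpu_hour=3.0, gpu_kw=0.4, pue=1.1, kg_co2_per_kwh=0.4
)
print(f"${cost:,.0f}   {kwh:,.0f} kWh   {co2:,.0f} kg CO2")
```

Scaling the GPU count or the hours up by a couple of orders of magnitude is how a single experiment ends up costing as much as a house.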
 
...

What's interesting about the NLP aspect is whether editorial bias can be shown with data/analytics, and therefore predicted and modeled, even when there is no data bias... In effect, what Moses said he did with his different submissions was show exactly that: editorial bias and bias-based decisions. It would be a marvelous tool for auditing public policy if such biases could be shown in how government works at any level...

Yes. The question is, who cares?

Using the coin toss example, do you really need to put eyes on a coin to see whether it is a trick coin (two heads or two tails), or can you just look at the results of the tosses and determine whether it is a legit coin? Even though every toss is an independent event, as the number of consecutive heads goes from 3 to 5 to 11 to 100... at what point do you call bullshit? If the same result happens an absurd number of times, I have zero need to lay eyes on the coin.
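The coin-toss intuition can be made exact. A fair coin lands heads with probability 1/2 and independent tosses multiply, so n heads in a row has probability (1/2)^n. Bayes' rule then says how convinced you should be that the coin is two-headed after n straight heads, given some prior suspicion; the prior of 1 in 1,000 used below is an arbitrary assumption.

```python
from fractions import Fraction

def p_all_heads(n):
    """Probability that a fair coin lands heads n times in a row: (1/2)**n."""
    return Fraction(1, 2) ** n

def p_trick_given_heads(n, prior_trick):
    """Posterior probability the coin is two-headed after n straight heads.

    A two-headed coin produces n heads with probability 1; a fair coin with (1/2)**n.
    """
    prior = Fraction(prior_trick)
    return prior / (prior + (1 - prior) * p_all_heads(n))

for n in (3, 5, 11, 100):
    print(n, float(p_all_heads(n)), float(p_trick_given_heads(n, Fraction(1, 1000))))
```

Even starting from strong trust in the coin, the posterior climbs toward certainty as the streak grows, which is the "zero need to lay eyes on the coin" point in numbers.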

Case 1
Sentiment analysis is probably the simplest form of NLP. You can boil it down to classifying some bit of text as positive, negative, or neutral. Charles Blow and Chris Cuomo almost always display the same sentiment on the same political topics. Not a shock. What if we look at thousands of strings of text from a dozen media outlets and they all display the same sentiment on the same topics at about the same point in time?
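A bare-bones version of that classification, assuming a tiny hand-made word list (real work would use a trained model or a published lexicon): score each snippet by counting positive and negative words, then measure how often outlets land on the same label for the same topic. The outlet names, headlines, and lexicon are all invented.

```python
import re
from itertools import combinations

POSITIVE = {"strong", "success", "historic", "win", "praised"}
NEGATIVE = {"chaos", "failure", "dangerous", "scandal", "slammed"}

def sentiment(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def agreement(labels_by_outlet):
    """Share of (outlet pair, shared topic) comparisons where both outlets gave the same label."""
    same = total = 0
    for a, b in combinations(labels_by_outlet, 2):
        for topic in labels_by_outlet[a].keys() & labels_by_outlet[b].keys():
            total += 1
            same += labels_by_outlet[a][topic] == labels_by_outlet[b][topic]
    return same / total if total else 0.0

coverage = {
    "outlet_a": {"summit": "Summit ends in chaos", "economy": "Jobs report a historic success"},
    "outlet_b": {"summit": "Critics slammed the dangerous summit", "economy": "Strong jobs numbers"},
    "outlet_c": {"summit": "Leaders meet at summit", "economy": "Economy praised"},
}
labels = {outlet: {t: sentiment(s) for t, s in topics.items()} for outlet, topics in coverage.items()}
print(labels)
print(f"pairwise agreement: {agreement(labels):.2f}")
```

With thousands of snippets per outlet you would also bucket by time, so "same sentiment at about the same point in time" becomes a measurable agreement rate rather than an impression.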

Case 2
I've created word clouds many times but I never thought of forming the clouds into the shape of their point of origin. Here is a clever example of someone who put the clouds into the shape of the US Presidents who spoke them. Of course most of us could look at a transcript of a president's speech and automatically visualize the face of the speaker. Point here is that it is often easy to tie word composition to its origin.

To drive this home, here is an example of someone who removed all words from books, leaving only the punctuation, and was able to identify the author with a surprising degree of accuracy.
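A toy version of the punctuation-only idea, assuming a few known writing samples per author: strip everything except punctuation, turn each text into relative frequencies of punctuation marks, and attribute an unknown text to the author whose profile is closest. The samples and author names are invented, and the study referred to above may well have used a different method.

```python
import math
from collections import Counter

MARKS = ".,;:!?'\"-()"

def punctuation_profile(text):
    """Relative frequency of each punctuation mark; all other characters are discarded."""
    counts = Counter(ch for ch in text if ch in MARKS)
    total = sum(counts.values()) or 1
    return [counts[m] / total for m in MARKS]

def closest_author(unknown, samples):
    """Attribute text to the author whose profile is nearest by Euclidean distance."""
    target = punctuation_profile(unknown)
    return min(samples, key=lambda author: math.dist(target, punctuation_profile(samples[author])))

samples = {
    "semicolon_fan": "I came; I saw; I left; that was all; nothing more.",
    "exclaimer": "Wow! Amazing! Can you believe it? Incredible!",
}
print(closest_author("Fine; I agree; we proceed; no objections.", samples))
```

Real attribution needs far longer samples per author, but the principle is the same: habits survive even when the words are gone.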

These are forms of fingerprints that can be detected at massive scale.

Would it shock anyone that numerous writers from multiple MMM outlets have virtually identical word clouds, published sometimes within minutes of each other, day in and day out, over many thousands of transcripts?
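The raw material of a word cloud is just a frequency table, so comparing clouds across writers can be sketched in a few lines: count content words, keep the top n, and measure how much two writers' top-n lists overlap (Jaccard similarity). The stopword list and sample sentences are made up; one matching pair proves nothing, and the argument above rests on consistently high overlap across many thousands of pieces.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; real work would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "that", "this", "with", "it"}

def top_words(text, n=5):
    """The n most frequent non-stopword tokens, i.e. the data behind a word cloud."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(n)]

def jaccard(a, b):
    """Overlap of two word lists: |intersection| / |union| of their sets (0.0 to 1.0)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

writer_1 = "The reckless tax plan is a threat to democracy, a reckless threat to norms."
writer_2 = "Experts warn the reckless tax plan is a threat to democracy and to norms."
writer_3 = "The bakery on Main Street won a regional award for its sourdough."
print(jaccard(top_words(writer_1), top_words(writer_2)))
print(jaccard(top_words(writer_1), top_words(writer_3)))
```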

Case 3
What if you could mimic what Robert Reich or Chris Cillizza write, but before they write it, or even before a given event takes place? On the one hand, someone could argue that we already know how they feel about certain topics. Many of our significant others easily finish our sentences with pinpoint accuracy. But if that significant other started finishing someone else's sentences, much less hundreds of other people's sentences, I might start to get a little suspicious.
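To make "finishing someone's sentences" concrete, here is about the simplest text-mimicking model there is: a first-order Markov chain that learns which word follows which in a writer's past text and then continues a prompt. Modern language models are far more capable; this only illustrates the principle, and the "past columns" text is made up.

```python
import random
from collections import defaultdict

def train_chain(text):
    """Map each word to the list of words that followed it in the training text."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def continue_text(chain, start, length, seed=0):
    """Extend `start` by sampling successors; stop early at a word with no known successor."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

past_columns = "this is a threat to democracy and this is not normal and this is a dangerous moment"
chain = train_chain(past_columns)
print(continue_text(chain, "this", 6))
```

If a model trained on one writer's archive predicts another writer's next column unusually well, that is the kind of suspicious uniformity described above.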

Like the coin toss, at some point you have to start calling bullshit even if you have no idea what the mathematical odds are that such unseemly uniformity can occur simply by chance.

The work I referenced at the top of the thread did all three of these things and it got squashed. So to answer my own question, at least a few people definitely care. And at least some of those people are in a position to control the narrative.

As I make my way through a cup of coffee here on the couch I'm bitching and moaning about something many of us have seen in one form or another... just wanted to acknowledge that.

Carry on...

 
Sometimes, the showing is more impactful than the telling. In the video clip, an interesting comment was 'this is what happens when 5 people own all the media outlets'.

Everyone who pays a bit of attention knows there's concentration of ownership, but this video shows why that's bad. I think using AI the way you did, to show bias of representation regardless of context, helps take the emotion out of criticism. Labeling a newspaper as 'liberal shit', while accurate, will rarely change minds. Showing the illiberal behaviour of indifference to facts in order to drive a specific agenda presents data and evidence that replace heat with cold fact.

I think most Americans are conservative in the original sense of the word, not the modern version of corporate Republicans. I think most Americans care about right and wrong and care about their country, the state of the world, and the environment. But all the subcategories within those terms have become politicized because it suits the few who own the media, the lobbyists who want to channel those sentiments into political discussions, and the politicians who want to use those sentiments and channels for their own gain. I have to believe that a better way of showing these manipulations, and their beneficiaries, to those being manipulated would bring about change...
 