Artificial Intelligence and the Broadband Business: A Black Box

AI may be perpetuating and supercharging faults that already exist in researchers’ use of statistics.

Let me start with something a good manual search technique or an “AI-augmented” search (as now promised by Bing, Google and others) can find. I found it “manually” because any time a big funding bill hits Congress, I attempt to see if broadband issues are affected. I recently searched the 99 pages of the compromise final debt ceiling bill just before passage to see if any broadband funds were at risk.

The debt reduction bill – now law – mentions broadband once:

The unobligated balances of amounts made available under the heading “Rural Development Programs—Rural Utilities Service—Distance Learning, Telemedicine, and Broadband Program” in title I of division B of Public Law 116–136 are hereby permanently rescinded.

For those of us who do not obsessively memorize these things, Public Law 116–136 is the Coronavirus Aid, Relief, and Economic Security Act (aka CARES Act). It passed three years ago. What could still be unobligated with regard to broadband? The CARES Act, among many other things, provided the USDA with money for its ReConnect program, which provides grants and loans to rural broadband providers.

Applications for the last round, ReConnect Round 4, closed early last November. I knew, of course, that by spring at least some and probably most, if not all, grant and loan winners were already chosen but had not been announced. Normally, USDA would have finished its reviews and negotiations by early April at the latest. It was now late May. There was more than $1 billion to give away. That amounted to about 1 percent of the “savings” in this year’s budget that the debt reduction law projected.

But how much of that, if any, was not yet legally “obligated”? Finding out required a technique called “reporting.” No current AI program of any kind could have handled the task. My contacts at USDA said it had all been obligated. My contacts in Congress – staff of members representing rural districts, in this case – said they had not been sent notifications of any grant awards. Usually, USDA faxes those notices. Faxes? Oh, never mind.

In January, the Federal Reserve Bank had told all branches of government to conserve cash. This kind of spending is easily “conserved” by doing the work but not handing out the money. Otherwise, the federal government would probably have run out of cash in February. The new, final deadline was June 1.

Spoiler alert: The faxes started flowing that week after the debt ceiling bill became law.

The spending “delay” also allowed members of Congress – of all party flavors – to claim more cuts in the debt ceiling bill than really existed. Republican members from rural districts were happy. Their constituents will be getting broadband. Democratic members were also happy. Fake cuts are better than real cuts, especially when a public good such as broadband is on the line – and they will have a moral bargaining chip when renewal of those $30 per month grants for broadband services to poor families becomes necessary, around this time in 2024.

Renewal is not assured. But the folks who finance broadband deployment love those grants. They go to the deployer and are guaranteed cash flow. Folks who care about low-income families, especially families with kids and telemedicine needs, also look kindly. A two-year renewal of these grants would come to around $15 billion of assured revenue for broadband deployers.

I’ll be watching all that, and also analyzing this final ReConnect round. As I have noted in great detail in past issues, the first three rounds (Rounds 1 and 2 under the 2018 agricultural appropriations bill and Round 3 under the CARES Act, all signed by President Trump) seem to have been money well spent.

But what is all that AI talk about? Here’s an explanation from my point of view.

When AI Mainly Just Calculated, Hardly Anyone Noticed. But Who Checks the Calculations?

Artificial intelligence is not new. What is new is that AI routines, seemingly suddenly, can interact in plain language. They can write, read and create or modify images. As long as AI was a math tool, most people did not notice.

Almost every industry and every company already has some specific AI-based processes. The term, often coupled with machine learning (ML), refers to data mining followed by statistical analysis in such fields as sports, medicine, finance and, of course, broadband network traffic management.

Each organization’s risk and use profile is unique. Will organizations that already have resources in place race ahead without their top managers even knowing that the race has begun? Will the current mess in academic research get messier?

In this column, I want to share some background information and delve into AI’s specific relevance to the “math” behind broadband deployment and use.

Some readers know that I have long used some AI-based analysis in a data-heavy subset of my economic articles in Broadband Communities. What’s the difference between my old AI standbys (mainly based on random forest routines) and the smarty-pants new kids on the block, such as ChatGPT, based on neural networking?

Most modern broadband networks have been using domain-specific AI for more than a decade. Remember writing patch code to instruct “dumb” routers and switches where to send stuff? How often do you still have to do that? Could you imagine new applications, such as having 50,000 football fans streaming their experience all at once at the stadium, without dynamic routing?

Other fields have used domain-specific AI with considerable but mixed success. Just before the COVID-19 pandemic, for instance, AI run on IBM’s Watson super-computer hardware helped guide Boston-area medical professionals to suggest diagnoses and treatments. The idea was to overcome doctors’ tendency to undertreat Black and Hispanic people.

But the AI, trained on existing treatment history, perpetuated the bias. Watson had its good days in med school, but flunked out. Yes, humans noted (after a while) what was going on and cured the problem. Likewise, self-driving vehicles are getting better. Give them five more years and maybe they’ll get their own driver’s licenses, and even apply for the licenses themselves.

Once this article is published online, ChatGPT will be able to use it to learn how to “write” similar stories in the future, maybe with suggestions on whom to call – or it could just make up sources.

But will humans be able to detect flaws in AI reasoning in the future? Maybe not. Will the computers go outside the data they trained on and produce unsupportable, wild-eyed results? They already have.

Will AI users care? Based on current (sometimes mindless) use of near-automatic statistical software, my guess is no, many will not care. AI, however, as it is evolving in the near term, does seem to make fewer mistakes than humans, as you will see.

Image by DALL-E on Microsoft Bing Image Creator, asked to create artwork that evoked “math, statistics, AI, and the internet” in the style of Edward Hopper (1882-1967), an American painter also known for his etchings. The first internet message was sent in 1969, two years after Hopper died.

The Problem with AI-Driven Statistics

My biggest concern is that AI is supercharging a fault that already exists in researchers’ use of statistics, especially use of automated statistical routines.

You should know that according to the Bureau of Labor Statistics, there are about 50,000 statisticians in the U.S. I am not one of them, although I have taken many post-graduate math courses and written a successful but dated statistics book (“Spreadstat: How to Build Statistics into Your Lotus 1-2-3 Spreadsheets,” McGraw-Hill, 1988). Microsoft quickly added the same statistical functionality to Excel, for more use with less thinking.

With degrees in physics and journalism, I have often written about quality control issues in construction (“Construction Disasters: Design Failures, Causes and Prevention,” McGraw-Hill 1984, with numerous TV spinoffs), safety issues inherent in various products and services (“Product Safety and Liability: A Desk Reference,” McGraw-Hill, 1977), and even COVID-19 data and testing, including acknowledgement in an early 2021 Lancet medical paper on the value of “quick” home COVID-19 tests.

I may have been the first to computerize an engineering text, “The Handbook of Engineering Calculations.” But I’m probably best known outside Broadband Communities for quality control and quality assurance in data – helping pollsters, market surveyors, medical professionals and Broadband Communities readers. (Read key economic articles at and a story about those studies, on an American Statistical Association website:

Half of all medical research published in peer-reviewed journals cannot be replicated. When it comes to toxic or possibly toxic substances, environmental studies, and even studies on strategies to help slow global warming, industry often – maybe even almost always – shapes the research it funds to avoid answers it does not like. A huge cadre of junior faculty, often at small colleges, does underfunded, underpowered research that begets crazy, peer-reviewed (and thus perversely newsworthy) results.

My sister oversaw the Environmental Protection Agency’s toxics research program – and spent a great deal of time breaking up rings of “you favorably review my research and I’ll favorably review yours” folks in what was supposed to be a confidential process, before she retired almost two decades ago.

Humans Also Make Mistakes

Readers may have heard that ChatGPT sometimes just makes stuff up. But so do experienced, live journalists. Here’s the start of New York Times opinion columnist Bret Stephens’ article on COVID-19 masking on February 21 of this year:

“The most rigorous and comprehensive analysis of scientific studies conducted on the efficacy of masks for reducing the spread of respiratory illnesses – including Covid-19 – was published late last month. Its conclusions, said Tom Jefferson, the Oxford epidemiologist who is its lead author, were unambiguous.

“There is just no evidence that they” – masks – “make any difference,” he told the journalist Maryanne Demasi. “Full stop.”

Seems pretty solid, doesn’t it? But here’s what the study itself said: “The high risk of bias in the trials, variation in outcome measurement, and relatively low adherence with the interventions during the studies hampers drawing firm conclusions. There were additional RCTs during the pandemic related to physical interventions but a relative paucity given the importance of the question of masking and its relative effectiveness and the concomitant measures of mask adherence which would be highly relevant to the measurement of effectiveness, especially in the elderly and in young children.”

RCTs refer to randomized controlled trials, which in this case are experiments in which the population adhering to a policy is chosen at random from the eligible population, and a control group is also chosen at random from the same eligible population.

Aside from all that, use of masks was meant mainly to slow the spread of COVID-19, not stop it. Public health leaders were trying to avoid overwhelming medical facilities all at once.

Similarly, a widely quoted study released late in 2022 on gas stoves and asthma was problematic. I sympathize with the motivation – gas stoves will have to be phased out. It is unlikely to happen in the 20 or 25 years the “electrification” crowd wants, but the crowd in this country is doing everything it can to ensure there is no room for temporarily mixing green hydrogen produced by off-peak wind and solar generators with natural gas, where the infrastructure exists to do so.

The gas stove study was worthless. The authors themselves managed to ignore almost every confounder EPA researchers identified with regard to indoor air pollution, starting in 1973, and the new “research” could not resolve the huge state-to-state asthma differences the authors found. Recent studies are better at controlling for confounders – facts that can skew the data. But there are very few such studies and they are also small.

The study is a compilation of small, generally old, often poorly conducted studies from a dozen different countries and all U.S. states. The authors’ search terms for finding studies to select were “gas cooking and children” OR “gas appliance and children” OR “unvented and children” OR “gas heating and children” OR “gas heater and children.”

In the study, “gas cooking” could mean anything from high-efficiency natural gas cooktops to propane grills. “Gas appliance” could mean anything from modern high-efficiency natural gas cooktops to leaky 1970s stoves with pilot lights. “Unvented” is a problem with infrastructure, not a specific problem with using gas of any type. The cooking of food emits allergens that can cause asthma, and there is a lot more cooking when families are large or can’t afford to eat out very often.

Although typical journalists and politicians apparently believe a compilation of multiple small, underpowered studies somehow gains power and statistical significance, such a compilation emphatically does not. Statistical AI routines seem to know this. The large language models (LLMs) such as ChatGPT and Bard apparently do not.
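A minimal sketch of one reason pooled results can be weaker than they look, under stated assumptions. The effect sizes and variances below are made up for illustration and do not come from any study mentioned here. The DerSimonian-Laird random-effects method is a standard textbook technique, swapped in for the purpose of the example; the function name `pool_studies` is hypothetical.

```python
import math


def pool_studies(effects, variances):
    """Pool study effects two ways: fixed-effect and DerSimonian-Laird random-effects."""
    k = len(effects)
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se_fixed = math.sqrt(1.0 / sum_w)

    # Cochran's Q measures how much the studies disagree with the pooled value.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # estimated between-study variance

    # Random-effects weights add tau2 to each study's own variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    random_effect = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se_random = math.sqrt(1.0 / sum_re)

    return {
        "fixed": fixed,
        "se_fixed": se_fixed,
        "random": random_effect,
        "se_random": se_random,
        "tau2": tau2,
    }


# Hypothetical effect estimates from five small studies that disagree widely.
effects = [0.10, 0.90, -0.30, 0.60, 0.05]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]

result = pool_studies(effects, variances)
z_fixed = result["fixed"] / result["se_fixed"]
z_random = result["random"] / result["se_random"]
print(f"fixed-effect z = {z_fixed:.2f}, random-effects z = {z_random:.2f}")
```

With these invented numbers, naive pooling that treats every study as measuring the same thing clears the conventional 1.96 threshold, while the pooling that accounts for the disagreement among studies does not. Real compilations raise further problems, such as inconsistent definitions and uncontrolled confounders, that no pooling formula fixes.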

Broadband Test Case

I’m in the business of giving advice to the people who deploy broadband. If I give bad advice, I could bankrupt them. So, I did an AI test, which finished last month.

I used familiar (to me) random forest AI routines (mine are mainly adapted, with help, from R code sources) to clean a big broadband database of 10 million premises. That’s about half of all rural premises that could be served, unserved or underserved by broadband now. I knocked out what random forest thinks are 220,000-plus questionable sites in this rural piece of the first iteration of the new FCC data fabric.

I used a cluster routine to divide the 220,000 into smaller groups of premises so I could review them more closely, mainly by separating plain bad data such as corporate billing addresses several states away from the premises being served, from outliers due to local issues.

I then ran a neural network review of all data, a review of data without the 20,000 “outlier candidates,” and another with all 220,000 “questionable” data points removed. Finally, I added some of the 200,000 back in from my manual review after clustering. My theory is that if neural networking is so smart, I should get roughly the same answers for all runs. It should detect the outliers and account for them.
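The comparison described above can be sketched roughly as follows. This is an illustration under assumptions, not the routines actually used: a plain least-squares slope stands in for the neural network, the flagged record IDs are assumed to come from an earlier screening step, and every name, record and tolerance here is made up.

```python
def ols_slope(rows):
    """Least-squares slope of y on x for a list of {"x": ..., "y": ...} rows."""
    n = len(rows)
    mean_x = sum(r["x"] for r in rows) / n
    mean_y = sum(r["y"] for r in rows) / n
    sxx = sum((r["x"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["x"] - mean_x) * (r["y"] - mean_y) for r in rows)
    return sxy / sxx


def sensitivity_runs(rows, outlier_ids, questionable_ids, fit=ols_slope):
    """Fit the same model on three versions of the data and return each estimate."""
    subsets = {
        "all_data": rows,
        "minus_outlier_candidates": [r for r in rows if r["id"] not in outlier_ids],
        "minus_all_questionable": [r for r in rows if r["id"] not in questionable_ids],
    }
    return {name: fit(subset) for name, subset in subsets.items()}


def spread(estimates):
    """Gap between the largest and smallest estimate across runs."""
    values = list(estimates.values())
    return max(values) - min(values)


# Made-up records: ids 1-19 lie exactly on y = 2x + 1.
rows = [{"id": i, "x": float(i), "y": 2.0 * i + 1.0} for i in range(1, 20)]
rows.append({"id": 20, "x": 20.0, "y": 60.0})  # hypothetical outlier candidate
rows.append({"id": 21, "x": 2.0, "y": 90.0})   # hypothetical bad record
rows.append({"id": 22, "x": 3.0, "y": 95.0})   # hypothetical bad record

estimates = sensitivity_runs(rows, outlier_ids={20}, questionable_ids={20, 21, 22})
robust = spread(estimates) <= 0.1  # tolerance chosen arbitrarily for illustration
print(estimates, "robust:", robust)
```

A model that truly detected and discounted the bad records would return nearly the same estimate in every run. A large spread, as with the simple stand-in here, means the answer depends on which records were left in.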

I did not get the same answers. The neural network routine, which has been tested on census, school and property tax data among other things, was not as smart as I was, or as random forest was, at handling obvious bad data (those billing addresses, for instance, or the address where service is provided to a farmhouse rather than the bunkhouse, or in the field itself), or true outliers. For example, rural Utah (especially) is unlike any other state because median household size is so much larger. The data has household size and number of kids (at census block level, not necessarily premises level), so Utah should have stood out.

Neural networking seems more forgiving than random forest at including data in the final analysis, especially when I can’t define a starting set of “facts on the ground,” so-called statistical “priors.” Right now, with the new FCC broadband coverage maps still evolving and new federal funding programs in place, priors are more speculative.

Bottom line: I can’t trust the results and I really can’t trace them easily. The neural network process begins with data going through (in this case) four procedures – least-squares fits, not necessarily linear, and various polynomial fits and tests, not necessarily limited to two dimensions. Then it fits curves and tries to explain the curves – what researchers would do in a complex systems analysis after using educated guesses as to what moves the data most, or most quickly or whatever.

I could publish almost any results, dazzle and maybe bankrupt my readers with bad advice in many geographic areas, and no one would (or could) ever trace why. In other words, AI for this is currently a black box. I can mention results, with caveats. I can fold them into my thinking, but I can’t base an entire article on them.

Even random forest routines give me odd results – I used random forest to show that the old FCC data fabric was worthless, and Congress gave the FCC money to vastly improve it. But the typical non-specialized researcher would have taken those uncertain results and rushed to publication.

LLM Issues Are Different

Note that this has little to do with ChatGPT, which invokes a word/phrase frequency table with something like 1.5 billion entries on top of more proven “search” techniques, mainly on pre-2022 sources used to generate the frequency table in the first place. The output of an LLM can at least be judged somewhat.

My personal concern with LLMs is that they have the potential to unmask “confidential” survey or article-source verbal responses. That means we’re likely to do fewer such surveys on controversial topics, or we will structure responses to generate very short, unidentifiable or less-identifiable answers. I note that all journalists now need to face this issue when quoting “confidential” sources.

I have long planned to update a humanitarian aid survey I did in 2002, to include more advice on social media use by aid organizations. But the update is on hold as I and my research partners rethink how to shorten answers to open-ended questions to preserve privacy. My first try two years ago created a 120-question survey instrument out of an already bloated 50-question instrument. Embarrassing.

On the other hand, AI might make it easier to disaggregate and recombine some of the data from the little, sloppy studies I complained about earlier, and group comparable data together for a true meta-analysis with greater statistical power.

Some Parting Advice

Right now, many vendors are using FCC and other data in creative, fast, accurate ways to make it easier for Broadband Communities readers to design new deployments, model customer usage and needs, and even predict the effects of delays and inflation. The methods that may be best for one deployment might not be as good for others. Yes, it is all AI. But you buy a motor vehicle that fits your needs. Do the same for the AI you drive, or that might drive you out of business.


Steven S. Ross

Steve Ross is the founding editor and now editor-at-large for Broadband Communities. He can be reached at

Bandwidth Hawk was judged second-best staff-written column for 2022 by the American Society of Business Publication Editors. It has placed among the top three nationally for five of the last six years.



© 2023 Broadband Properties, LLC
