AI in Health Programs: APMARGIN Perspective

March 23, 2026

Artificial intelligence is increasingly used in health systems for reporting, analysis, and decision support. Most discussion focuses on hallucination, but real-world use shows that the risks extend well beyond it.

An article by Nate Miller, published on Beehiiv, describes six types of AI errors observed across 37 health program use cases. APMARGIN reviewed these findings and considered how they apply to actual program implementation.

What stands out is that many of these errors are not obvious. The affected outputs look reasonable, structured, and usable, which makes the errors harder to detect in day-to-day work.

Six Types of AI Errors Observed in Practice

The article identifies six specific error types that appeared in real use, all of which are directly relevant to health programs.

Fabrication
This is the classic hallucination: the AI generates something that has no basis in any source. In program work, this can appear as a named intervention, system, or activity that looks legitimate but cannot be verified. While this type of error is often easier to spot than the others, it still requires checking, especially when outputs are used in reports or proposals.
Miscounting
AI may miscount even when all the necessary information is present. This usually happens when data is embedded in narrative text rather than structured tables. In health programs, this can affect totals such as the number of facilities, clients, or activities. Because the resulting numbers are often close to correct, the error can easily go unnoticed.
Misclassification
AI can assign incorrect labels based on how information is described. Programs described using terms like “scaling” or “expanding” may be classified as fully implemented, even when they are still in pilot stages. In health systems, this affects how programs are interpreted and can lead to incorrect assumptions about coverage or readiness.
Citation errors
AI-generated citations may look structured but can be misleading. Multiple references may point to the same source, or a cited source may not actually support the claim. In some cases, sources may be incorrectly attributed or missing altogether. This creates a false sense that a statement is well-supported.
Content that looks real but is not substantiated
Some outputs are based on content that sounds credible but lacks real-world evidence. This often comes from websites or materials that use appropriate terminology but do not provide proof of actual implementation. In program settings, this can lead to the inclusion of examples that have not been validated.
Inference stated as fact
AI may take statements that are exploratory or conditional and present them as confirmed facts. For example, something described as being tested or assessed may be rewritten as if it is already established. This subtle shift can affect how results or program intentions are understood.

Why These Errors Matter in Health Programs

These errors are not always easy to detect, because the outputs are usually clear, organized, and written with confidence.

In real program settings, this increases the risk of accepting information without verification. A small issue in counting, classification, or interpretation can affect planning, reporting, and decision-making.

Because these errors are subtle, they are more likely to be overlooked than obviously false information.

Implications for Implementation

AI can support health programs, but it should not be treated as a primary source of truth.

Outputs need to be checked, especially when used for summaries, classifications, or evidence. Structured data can help reduce certain errors, particularly miscounting.

Clear identification of sources is also important. Naming sources directly makes validation easier.

Users should also be aware that well-written output does not always mean accurate output.

Practical Use in Program Settings

AI remains useful, especially for handling large volumes of information. In actual program work, this includes summarizing reports, consolidating field data, drafting documents, and supporting rapid analysis.

From an APMARGIN implementation perspective, the value of AI comes from how it is embedded in the workflow, not from using it as a standalone tool.

Validation should be built into each step where AI is used. Outputs need to be checked against original data sources, not just reviewed for readability. For example, if AI is used to summarize multiple LGU reports, the totals should be verified against the source documents, especially when the inputs are narrative rather than structured.
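As a rough illustration of that check, the sketch below compares per-LGU facility totals taken from an AI summary against counts rebuilt from the source records. The record layout, the field names `lgu` and `facility`, and the sample values are all hypothetical; the only point is that the comparison runs against the source data, not against the summary's own wording.

```python
from collections import Counter

# Hypothetical source records: one row per facility, taken from LGU reports.
source_rows = [
    {"lgu": "LGU A", "facility": "RHU 1"},
    {"lgu": "LGU A", "facility": "BHS 3"},
    {"lgu": "LGU B", "facility": "RHU 2"},
]

# Hypothetical per-LGU totals copied from an AI-generated summary.
ai_totals = {"LGU A": 2, "LGU B": 2}

def reconcile_totals(rows, reported):
    """Return {lgu: (reported, actual)} for every LGU whose counts differ."""
    # Deduplicate (lgu, facility) pairs so a facility mentioned twice counts once.
    actual = Counter(lgu for lgu, _facility in {(r["lgu"], r["facility"]) for r in rows})
    mismatches = {}
    for lgu in set(actual) | set(reported):
        expected = actual.get(lgu, 0)
        claimed = reported.get(lgu, 0)
        if claimed != expected:
            mismatches[lgu] = (claimed, expected)
    return mismatches

print(reconcile_totals(source_rows, ai_totals))  # {'LGU B': (2, 1)}
```

An empty result means the summary agrees with the source records; anything else goes back to a reviewer rather than into the report.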

Source confirmation is equally important. Any claim about a program, intervention, or outcome should be traceable to a clear and identifiable reference. Naming the actual source, such as a report, facility, or partner, is more reliable than relying on generic or numbered citations.
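Part of this traceability check can be automated, assuming the team keeps a registry of the named sources it actually holds. The sketch below flags claims with no source, citations that do not resolve to any known document, and several citations that resolve to the same document. The registry, the citation IDs, and the claims are hypothetical. It cannot tell whether a source actually supports a claim; that still needs a person reading the source.

```python
# Hypothetical registry of named, verifiable sources the team actually holds.
source_registry = {
    "R1": "Province X Q4 health report",
    "R2": "Facility survey, RHU 1",
    "R3": "Province X Q4 health report",  # same document under a second ID
}

# Hypothetical AI-generated claims with the citation IDs the model attached.
claims = [
    {"text": "Program reached 12 facilities", "cites": ["R1", "R3"]},
    {"text": "Telehealth pilot launched", "cites": []},
    {"text": "Referral system implemented", "cites": ["R9"]},
]

def audit_claim(claim, registry):
    """Return a list of citation problems found for one claim."""
    problems = []
    if not claim["cites"]:
        problems.append("no source")
    unknown = [c for c in claim["cites"] if c not in registry]
    if unknown:
        problems.append(f"unknown source ids: {unknown}")
    # Resolve IDs to document names to catch duplicate citations.
    named = [registry[c] for c in claim["cites"] if c in registry]
    if len(set(named)) < len(named):
        problems.append("multiple citations resolve to the same document")
    return problems

for claim in claims:
    print(claim["text"], "->", audit_claim(claim, source_registry) or "ok")
```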

Classification also needs careful review. When AI assigns labels such as “pilot,” “scaling,” or “implemented,” these should be validated against actual deployment data, not just descriptive language. This is particularly relevant in monitoring and evaluation, where classification affects how programs are interpreted.
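One way to make this concrete is to derive a status from deployment figures and compare it with the label the AI assigned. The sketch below assumes deployment data records sites where a program is live and sites in the full rollout plan. The coverage thresholds (under 25% as "pilot", under 100% as "scaling") are purely illustrative assumptions; each program would substitute its own definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Deployment:
    program: str
    sites_live: int     # sites where the program is actually operating
    sites_planned: int  # sites in the full rollout plan

def status_from_data(d: Deployment) -> str:
    """Hypothetical rule: derive a status from coverage, not from wording."""
    if d.sites_live <= 0:
        return "not started"
    if d.sites_planned <= 0:
        return "unknown"  # no plan to compare against
    coverage = d.sites_live / d.sites_planned
    if coverage < 0.25:  # illustrative threshold, not a standard
        return "pilot"
    if coverage < 1.0:
        return "scaling"
    return "implemented"

def check_label(ai_label: str, d: Deployment) -> Optional[str]:
    """Return a message if the AI label disagrees with the data, else None."""
    derived = status_from_data(d)
    if ai_label.strip().lower() != derived:
        return f"{d.program}: AI label '{ai_label}' but data suggests '{derived}'"
    return None

telehealth = Deployment("Telehealth", sites_live=3, sites_planned=40)
print(check_label("implemented", telehealth))
```

A mismatch does not prove the AI is wrong, since the deployment data may be out of date, but it marks the label for review before it reaches monitoring and evaluation outputs.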

In practice, this means assigning responsibility for review. AI-generated outputs should not move directly into final reports or dashboards without a second layer of checking, ideally by someone familiar with the program context.

It also helps to structure inputs where possible. Using tables, standardized formats, and defined fields reduces errors such as miscounting and improves the consistency of outputs.
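A minimal sketch of what defined fields buy in practice: with a fixed reporting template, totals are computed directly from the table instead of being inferred from narrative, and incomplete rows are set aside for follow-up instead of being silently guessed at. The template columns (`lgu`, `facility`, `clients_served`) and the sample report are hypothetical.

```python
import csv
import io

# Hypothetical standardized reporting template with defined fields.
REQUIRED_FIELDS = ("lgu", "facility", "clients_served")

report_csv = """lgu,facility,clients_served
LGU A,RHU 1,120
LGU A,BHS 3,45
LGU B,RHU 2,
"""

def load_rows(text):
    """Parse a CSV report into valid rows and rows that need follow-up."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            rejected.append((row, f"missing: {missing}"))
            continue
        try:
            row["clients_served"] = int(row["clients_served"])
        except ValueError:
            rejected.append((row, "clients_served is not a whole number"))
            continue
        valid.append(row)
    return valid, rejected

valid, rejected = load_rows(report_csv)
print(sum(r["clients_served"] for r in valid))  # 165
print(len(rejected))  # 1
```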

At the user level, teams should be trained to treat AI outputs as drafts rather than final answers. The focus should be on verification and interpretation, not just acceptance.

When these practices are in place, AI becomes a practical support tool. It can speed up routine tasks and improve efficiency, while the review process keeps outputs accurate and reliable.