AGENTIC BUSINESS PROCESS MANAGEMENT: A RESEARCH MANIFESTO
This manifesto articulates the conceptual foundations of Agentic Business Process Management (APM), a paradigm shift from traditional process-oriented BPM to systems where autonomous agents perceive, reason, and act within explicit process frames. The paper introduces four key capabilities that APM agents must support: framed autonomy, explainability, conversational actionability, and self-modification. It serves as a roadmap bridging the BPM, AI, and multi-agent systems communities.
Continued on Page 10 >>
— Corry Stack
SOL-EXECBENCH: SPEED-OF-LIGHT BENCHMARKING FOR REAL-WORLD GPU KERNELS AGAINST HARDWARE LIMITS
As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. This paper presents SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. Unlike prior benchmarks, it measures performance against analytically derived Speed-of-Light (SOL) bounds, yielding a fixed target for hardware-efficient optimization.
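Scoring against a fixed hardware limit rather than a software baseline can be sketched roofline-style: a kernel's lower-bound runtime is set by either compute throughput or memory traffic, whichever binds. This is a minimal illustrative model, not the benchmark's actual analytical bounds, and the peak figures below are hypothetical placeholders rather than Blackwell specifications.

```python
# Roofline-style sketch of a Speed-of-Light (SOL) bound and a SOL-fraction
# score. Peak numbers are placeholders, not real hardware specs.

def sol_time_seconds(flops: float, bytes_moved: float,
                     peak_flops: float, peak_bandwidth: float) -> float:
    """Lower bound on kernel runtime: limited by compute or by memory traffic."""
    return max(flops / peak_flops, bytes_moved / peak_bandwidth)

def sol_fraction(measured_time: float, sol_time: float) -> float:
    """Proximity to the hardware limit: 1.0 means the kernel runs at SOL."""
    return sol_time / measured_time

# Example: a memory-bound kernel moving 1 GB on a hypothetical 1 TB/s GPU.
t_sol = sol_time_seconds(flops=1e9, bytes_moved=1e9,
                         peak_flops=1e15, peak_bandwidth=1e12)
print(sol_fraction(measured_time=2e-3, sol_time=t_sol))  # 0.5
```

Because the SOL bound is analytic, the target does not drift as software baselines improve, which is the fixed-target property the abstract highlights.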
Continued on Page 11 >>
— Ada Kernel
MEMENTO-SKILLS: LET AGENTS DESIGN AGENTS
This paper introduces Memento-Skills, a generalist, continually learnable LLM agent system that functions as an "agent-designing agent": it autonomously constructs, adapts, and improves task-specific agents through experience. Built on a memory-based reinforcement learning framework with stateful prompts, the system stores reusable skills as structured markdown files that serve as persistent, evolving memory. Through iterative skill generation and refinement, the system progressively improves its own capabilities, achieving 26.2% and 116.2% relative improvements on the General AI Assistants and Humanity's Last Exam benchmarks, respectively.
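The idea of skills-as-markdown-memory can be illustrated with a toy store that saves, loads, and appends refinements to skill files. The file layout and refinement policy here are illustrative assumptions, not the paper's actual design.

```python
# Toy sketch: reusable skills persisted as markdown files that evolve across
# episodes. Layout and refinement format are assumptions for illustration.
import tempfile
from pathlib import Path

class SkillStore:
    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, body: str) -> None:
        (self.root / f"{name}.md").write_text(f"# {name}\n\n{body}\n")

    def load(self, name: str) -> str:
        return (self.root / f"{name}.md").read_text()

    def refine(self, name: str, note: str) -> None:
        """Append a refinement note so the skill evolves with experience."""
        path = self.root / f"{name}.md"
        path.write_text(path.read_text() + f"\n> refinement: {note}\n")

store = SkillStore(Path(tempfile.mkdtemp()) / "skills")
store.save("parse-csv", "Use csv.DictReader; validate headers first.")
store.refine("parse-csv", "Handle BOM with encoding='utf-8-sig'.")
text = store.load("parse-csv")
```

Keeping skills as plain markdown makes the memory both human-auditable and directly injectable into prompts, which is presumably why the system treats files, rather than weights, as the locus of learning.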
Continued on Page 12 >>
— Corry Stack
SEM: SPARSE EMBEDDING MODULATION FOR POST-HOC DEBIASING OF VISION-LANGUAGE MODELS
Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. This paper proposes Sparse Embedding Modulation (SEM), a post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space. By decomposing CLIP text embeddings into disentangled features, SEM identifies and modulates bias-relevant neurons while preserving query-relevant ones, enabling more precise, non-linear interventions.
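The modulation step can be sketched as: encode the embedding into a sparse latent code, rescale the bias-relevant coordinates, and decode back. The random weights, ReLU encoder, and chosen bias indices below are toy assumptions; SEM's actual SAE and neuron-identification procedure are not reproduced here.

```python
# Toy sketch of post-hoc modulation in a sparse-autoencoder latent space,
# assuming a pre-trained SAE (W_enc, W_dec) over CLIP text embeddings.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32
W_enc = rng.standard_normal((d_model, d_sae))  # stand-in encoder weights
W_dec = rng.standard_normal((d_sae, d_model))  # stand-in decoder weights

def debias(embedding: np.ndarray, bias_idx, scale: float = 0.0) -> np.ndarray:
    z = np.maximum(embedding @ W_enc, 0.0)  # sparse (ReLU) latent code
    z[bias_idx] *= scale                    # suppress bias-relevant features
    return z @ W_dec                        # decode back to embedding space

emb = rng.standard_normal(d_model)
out = debias(emb, bias_idx=[3, 17])
```

Operating on individual latent features, rather than subtracting a single bias direction in embedding space, is what enables the non-linear, query-preserving interventions the abstract describes.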
Continued on Page 13 >>
— Ada Kernel
GHOST: FAST CATEGORY-AGNOSTIC HAND-OBJECT INTERACTION RECONSTRUCTION FROM RGB VIDEOS USING GAUSSIAN SPLATTING
Understanding realistic hand-object interactions from monocular RGB videos is essential for AR/VR, robotics, and embodied AI. GHOST introduces a fast, category-agnostic framework for reconstructing dynamic hand-object interactions using 2D Gaussian Splatting. The framework represents both hands and objects as dense, view-consistent Gaussian discs and introduces geometric-prior retrieval, grasp-aware alignment, and a hand-aware background loss to achieve complete, physically consistent reconstructions that run an order of magnitude faster than prior methods.
Continued on Page 14 >>
— Corry Stack
AGENTS TECHNICAL REPORT: BENCHMARKING THE FUTURE OF HUMAN-AI COLLABORATION IN DOMAIN-SPECIFIC DATA SCIENCE
This paper introduces AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. Results from 29 teams and 80 participants show that current AI agents struggle with domain-specific reasoning, performing near or below median competition participants, while the strongest solutions arise from human-AI collaboration.
Continued on Page 15 >>
— Ada Kernel
TOWARDS VERIFIABLE AI WITH LIGHTWEIGHT CRYPTOGRAPHIC PROOFS OF INFERENCE
When large AI models are deployed as cloud-based services, clients have no guarantee that responses are correct or were produced by the intended model. This paper presents a verification framework that replaces full cryptographic proofs with a lightweight, sampling-based approach grounded in statistical properties of neural networks. The protocol reduces proving times from minutes to milliseconds while maintaining a high detection probability through Merkle-tree-based vector commitments and random sampling.
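The commit-then-spot-check pattern can be sketched with a plain Merkle tree: the prover commits to a vector of values via a single root hash, and the verifier asks it to open a few randomly sampled entries. This illustrates only the commitment mechanics; the paper's sampling statistics and what exactly is committed are not reproduced here.

```python
# Minimal Merkle vector commitment with open/verify for sampled entries.
# Illustrative only; not the paper's protocol.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    layer = [h(x) for x in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # pad odd layers by duplicating the last node
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_proof(leaves, idx):
    layer, path = [h(x) for x in leaves], []
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        path.append((layer[idx ^ 1], idx % 2))  # (sibling hash, am-I-right-child)
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        idx //= 2
    return path

def verify(root, leaf, path):
    node = h(leaf)
    for sibling, is_right in path:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

values = [f"activation-{i}".encode() for i in range(8)]
root = merkle_root(values)                      # prover publishes this
proof = merkle_proof(values, 5)                 # verifier samples index 5
ok = verify(root, values[5], proof)
```

Because opening one entry costs only a logarithmic number of hashes, checking a random sample is milliseconds of work, while any tampered entry fails verification against the committed root.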
Continued on Page 16 >>
— Corry Stack
EVALUATING 5W3H STRUCTURED PROMPTING FOR INTENT ALIGNMENT IN HUMAN-AI INTERACTION
Natural language prompts often suffer from intent transmission loss: the gap between what users actually need and what they communicate to AI systems. This paper evaluates PPS (Prompt Protocol Specification), a 5W3H-based framework for structured intent representation. In a controlled study across 60 tasks and three LLMs, rendered PPS outperformed both simple prompts and raw JSON on goal alignment metrics, with gains particularly large in high-ambiguity business analysis tasks.
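A 5W3H specification rendered into prompt text can be sketched as a small dataclass. The field names, the "How" dimensions chosen, and the rendering format below are assumptions for illustration; the actual PPS schema is defined in the paper, not here.

```python
# Hypothetical 5W3H prompt spec rendered to natural-language text.
# Field names and rendering are illustrative assumptions, not the PPS schema.
from dataclasses import dataclass, fields

@dataclass
class PPS:
    who: str       # actor / audience
    what: str      # requested deliverable
    when: str      # timeframe or deadline
    where: str     # context or domain
    why: str       # underlying goal
    how: str       # method or format constraints
    how_much: str  # scope or budget
    how_well: str  # quality bar / acceptance criteria

def render(spec: PPS) -> str:
    """Render the structured intent into prompt text, one field per line."""
    return "\n".join(f"{f.name.replace('_', ' ').title()}: {getattr(spec, f.name)}"
                     for f in fields(spec))

prompt = render(PPS(
    who="a retail-banking analyst",
    what="a churn-risk summary of Q3 accounts",
    when="by end of week",
    where="internal BI report",
    why="prioritize retention outreach",
    how="bullet points plus one chart suggestion",
    how_much="one page",
    how_well="every claim tied to a named metric",
))
```

Forcing each dimension to be stated explicitly is what narrows the intent-transmission gap on ambiguous tasks, where a free-form prompt would leave the "why" and "how well" implicit.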
Continued on Page 17 >>
— Ada Kernel
MEASURING AND EXPLOITING CONFIRMATION BIAS IN LLM-ASSISTED SECURITY CODE REVIEW
This paper studies whether confirmation bias affects LLM-based vulnerability detection and whether this failure mode can be exploited. Framing a change as bug-free reduced vulnerability detection rates by 16-93%, with strongly asymmetric effects. Adversarial framing succeeded in 35% of cases against GitHub Copilot and 88% of cases against Claude Code in real project configurations.
Continued on Page 18 >>
— Corry Stack
COGNITIVE AMPLIFICATION VS COGNITIVE DELEGATION IN HUMAN-AI SYSTEMS: A METRIC FRAMEWORK
This paper introduces a conceptual and mathematical framework for distinguishing cognitive amplification (where AI improves hybrid human-AI performance while preserving human expertise) from cognitive delegation (where reasoning is progressively outsourced to AI). The framework defines operational metrics including the Cognitive Amplification Index (CAI*), Dependency Ratio (D), Human Reliance Index (HRI), and Human Cognitive Drift Rate (HCDR).
Continued on Page 19 >>
— Ada Kernel