Agentic AI Weekly | Berkeley RDI | July 1, 2026

1 Week Left of Early-Bird Pricing (Prices Go Up by $100)! | Research Highlight: OpenSage: Self‑Programming Agent Generation Engine | Agentic AI Summit Agenda Now Live + New Startup Spotlight Session


We are excited to highlight OpenAI’s recently released GPT-5.5-Cyber model and its progress on CyberGym!

CyberGym is part of the newly launched Frontier AI Cybersecurity Observatory, an effort to provide realistic, reproducible evaluations and continuous public measurements of frontier AI systems on real-world cybersecurity tasks.

Alongside CyberGym, the Observatory includes benchmarks such as ExploitGym and CyberGym-E2E, offering a more comprehensive view of frontier AI cybersecurity capabilities across vulnerability discovery, exploitation, patching, and end-to-end security workflows.

It is exciting to see CyberGym and the Frontier AI Cybersecurity Observatory become important guideposts for frontier AI cybersecurity development. As systems like GPT-5.5-Cyber continue to advance, these benchmarks help the community track progress, better understand emerging capabilities, build AI systems that strengthen defensive security, and identify and mitigate the risks posed by increasingly powerful offensive cyber capabilities.

Learn More About CyberGym

Research Highlight: OpenSage: Self‑Programming Agent Generation Engine

AI agents are experiencing explosive growth, but building them still depends almost entirely on human expertise. Every major agent development kit (ADK) today requires developers to manually design agent topology, tooling systems, and memory architecture from scratch. This human-centric paradigm does not scale, cannot dynamically adapt across tasks, and may not be optimal for AI reasoning. It mirrors early ML’s reliance on hand-crafted features, right before end-to-end learning took over.

New research from teams at UC Berkeley, UC Santa Barbara, Columbia, Duke, Google DeepMind, and UCLA argues that agent development is now at the same inflection point: instead of manually designing agent structures and capabilities, we should move toward an AI-centric paradigm, where a base “agent scaffold” is provided and the AI itself learns how to organize topology, tools, and memory from experience and feedback.

What OpenSage Does

OpenSage is an Agent Development Kit built around three capabilities that no existing ADK provides:

Self-generating agent topology: The AI dynamically creates and manages sub-agents at runtime, automatically arranging them into vertical (sequential) or horizontal (parallel) topologies.
Dynamic tool synthesis: Agents write and register their own tools (Python modules, Bash scripts, and more), execute them in isolated dependency-aware sandboxes, and run tools asynchronously in the background while continuing to reason. Agents can also create reusable skills beyond tools.
Hierarchical graph-based memory: Short- and long-term memory are stored as typed Neo4j graphs supporting both graph-based traversal and semantic retrieval, managed by a dedicated memory agent.
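To make the vertical/horizontal distinction concrete, here is a minimal Python sketch using asyncio. It is not OpenSage’s API: the names make_agent, vertical, and horizontal, and the string-returning sub-agents, are hypothetical stand-ins. In OpenSage, the model itself decides at runtime which sub-agents to spawn and how to arrange them.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical stand-in for a sub-agent: takes a task string, returns a result string.
SubAgent = Callable[[str], Awaitable[str]]


def make_agent(name: str) -> SubAgent:
    """Build a toy sub-agent that just tags its input with its own name."""
    async def run(task: str) -> str:
        await asyncio.sleep(0)  # placeholder for real model calls and tool use
        return f"{name}({task})"
    return run


async def vertical(agents: List[SubAgent], task: str) -> str:
    """Vertical (sequential) topology: each sub-agent consumes the previous one's output."""
    result = task
    for agent in agents:
        result = await agent(result)
    return result


async def horizontal(agents: List[SubAgent], task: str) -> List[str]:
    """Horizontal (parallel) topology: all sub-agents work on the same task concurrently."""
    # asyncio.gather returns results in the order the awaitables were passed in.
    return list(await asyncio.gather(*(agent(task) for agent in agents)))


if __name__ == "__main__":
    agents = [make_agent(n) for n in ("analyze", "fuzz", "patch")]
    print(asyncio.run(vertical(agents, "crash.c")))    # patch(fuzz(analyze(crash.c)))
    print(asyncio.run(horizontal(agents, "crash.c")))  # ['analyze(crash.c)', 'fuzz(crash.c)', 'patch(crash.c)']
```

A vertical chain suits pipelines where each step depends on the last (e.g. triage, then reproduce, then patch), while a horizontal fan-out suits independent subtasks that can run concurrently.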

Outperforms Across Selected Benchmarks

Evaluated on four diverse, widely used benchmarks, SageAgent (built on OpenSage) consistently ranks first when all agents use the same backbone model:

60.2% on CyberGym vs. 39.4% for OpenHands.
78.4% on Terminal-Bench 2.0.
59.0% on SWE-Bench Pro Python vs. 40.2% for SWE-agent.
46.8% on DevOps-Gym, where it is the only agent to complete end-to-end tasks.

Self-Generating Topology Makes a Difference

On a 300-instance CyberGym subset, removing vertical or horizontal topology drops performance to 60.3% and 62.3% respectively; disabling all features collapses it to 33.7%.

Tooling System Powers Complex Tasks

The tooling system is equally critical: without it, agents fall back to a raw terminal interface, and performance collapses.

Memory System Enables Long-Horizon Reasoning

On SWE-Bench Pro, full hierarchical memory achieves 59.0%, vs. 56.4% with a baseline graph memory and 56.2% with no memory.

Emergent Capabilities and What’s Next

Across experiments, agents exhibit novel self-programming behaviors: spawning debugging sub-agents, constructing syntax-aware fuzzers, and selectively persisting high-signal memory. Current models don’t yet exploit these capabilities optimally. In some cases they invoke them infrequently, and they occasionally forget to reuse existing agents or memory, hallucinate tools, or create sub-agents with mismatched toolsets. To help train stronger models, OpenSage now includes integrated RL training support (AReaL, slime) and sandbox infrastructure for large-scale parallel rollouts.

The longer-term goal is for OpenSage to be not only an agent construction framework, but also a training scaffold for next-generation reasoning models, where AI can design, coordinate, and refine agents through interaction and feedback, unifying agent construction and problem solving within a single AI-centric loop.

Agentic AI Summit 2026: Early-Bird Tickets Go Up by $100 Next Week!

Save the date! The Agentic AI Summit returns to Berkeley on August 1–2, 2026, with 5,000+ in-person attendees expected for two days of insights and innovation. Building on last year’s sold-out success (2,000+ in‑person attendees and 40,000+ global livestream participants), the summit will bring together researchers, builders, industry leaders, and the global agentic AI community for keynotes, technical talks and panels, hands-on workshops, live demos, and more!

Full Summit Agenda Now Available!

We are thrilled to announce that the full agenda for the Summit is now available on our website! The Summit is set to feature 200+ speakers across 4 stages, plus 200+ poster presentations during the event! If you’ve been waiting for the schedule to come out before grabbing your tickets, now is the time to take a look and secure your spot!

View the Agenda

In addition, we are excited to showcase our expanded list of speakers for the Summit! We are honored to have such a great group of academics, founders, executives, and investors participate in this year’s event, and more will be announced soon!

🎟️ Early‑Bird Pricing (Limited Capacity)
A limited number of early‑bird tickets are still available, and prices go up by $100 starting on July 6!

Standard Early-Bird: $399 ($499 next week!)

Get your tickets

Join Us Virtually

Can't attend the Summit in person? Join us virtually—register for the livestream to access all sessions from anywhere in the world.

Apply to the Startup Spotlight

Building the future of Agentic AI? We’re looking for you.

We are excited to announce that applications are now open for the Startup Spotlight at the Agentic AI Summit! Selected startups will have the opportunity to showcase their products and innovations to researchers, founders, engineers, investors, industry leaders, and policymakers from across the AI ecosystem. Apply to showcase your product and innovation!

Applications will be reviewed on a rolling basis, so we encourage interested teams to apply as soon as possible. The deadline to apply is Friday, July 10, at 11:59 PM PT.

Apply Now

You can also learn more about the Summit, the full agenda, and our event sponsors by reading Professor Dawn Song’s recent LinkedIn and Twitter/X updates:

Twitter/X

Trends This Week

Last week, OpenAI previewed GPT-5.6 Sol, its next-generation flagship model, alongside Terra, a balanced model for everyday work, and Luna, a faster, lower-cost model. According to OpenAI, GPT-5.6 Sol is its strongest model yet, with improved agentic capabilities across coding, biology, and cybersecurity. The release introduces a new “max” reasoning effort for deeper reasoning and an “ultra” mode that uses subagents to accelerate complex work. OpenAI says Sol sets a new state of the art on Terminal-Bench 2.1, improves on GPT-5.5 in GeneBench v1 while using fewer tokens, and advances performance on long-horizon cybersecurity tasks. The company emphasized that Sol is better at helping users find and fix vulnerabilities than at reliably carrying out end-to-end attacks, and said the model does not cross the Cyber Critical threshold under its Preparedness Framework. OpenAI is beginning with a limited preview for select trusted partners through the API and Codex, with broader availability for ChatGPT, Codex, and API users planned in the coming weeks.
Google introduced built-in computer-use capabilities in Gemini 3.5 Flash, allowing the lightweight model to see, reason, and act across browser, mobile, and desktop environments. The feature, previously available through a standalone Gemini 2.5 computer-use model, is now integrated directly into Flash for developers building agents that can click, scroll, type, and complete longer-horizon tasks. Google says the capability is available through the Gemini API and Gemini Enterprise Agent Platform, with safeguards such as user confirmation for sensitive actions and protections against indirect prompt injection.
Last week, OpenAI and Broadcom unveiled Jalapeño, OpenAI’s first custom AI accelerator designed specifically for large-language-model inference. OpenAI says the chip was built from the ground up around its model, kernel, memory, networking, and serving needs, with early testing showing substantially better performance per watt than current state-of-the-art systems. The chip was developed from design to tape-out in nine months, with OpenAI models helping accelerate parts of the design process. OpenAI says Jalapeño is part of a broader multi-generation compute platform with Broadcom and data center partners, with deployments planned at gigawatt scale beginning in 2026.
Anthropic launched Claude Tag, a Slack-based beta for Claude Enterprise and Team customers that lets teams tag @Claude in selected channels and delegate work across connected tools, data sources, and codebases. Claude Tag can build context from the channels it has access to, remember relevant information within scoped permissions, and work asynchronously on tasks over hours or days. Anthropic says the system is already widely used internally, with 65% of its product team’s code created by an internal version of Claude Tag. The company described the product as a more collaborative and proactive evolution of Claude Code, with admin controls for tool access, channel-specific memory, token spend limits, and activity logs.

Don’t miss the developments shaping Agentic AI. Subscribe for weekly coverage of groundbreaking research, emerging trends, and critical insights across Agentic AI and the broader AI landscape.
