GitHub Unveils ‘Models-as-Data’ for CodeQL: Devs Gain Custom Security Rules Without Engine Overhauls

Breaking News: GitHub Revamps CodeQL with Declarative Security Modeling

GitHub today rolled out an update to its CodeQL static analysis engine that introduces a declarative 'models-as-data' framework, letting developers define custom sanitizers and validators as data rather than as query code. Because the models live outside the engine, teams no longer need hard-coded engine modifications to adapt security scanning to their own codebases.

Source: www.infoq.com

The move addresses a long-standing pain point for organizations that rely on CodeQL for automated vulnerability detection. Until now, extending the engine required deep expertise in QL, its query language, and the resulting custom queries added maintenance overhead.

Key Capabilities at a Glance

Custom sanitizers – Teams can now specify which functions or patterns clean untrusted data, reducing false positives in injection-like flaws.
Custom validators – Developers can model validation logic (e.g., input-format checks) so that the analysis treats data as safe once it has passed the check.
Models-as-data – Security rules are defined as data structures (YAML/JSON-like) instead of engine-specific code, simplifying sharing and version control.
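
To make the three capabilities concrete, here is a minimal Python sketch of the models-as-data idea. It is not CodeQL's model format or engine: the Model record, the kind values, and function names such as html_escape and is_valid_username are hypothetical, chosen only to show how sanitizers and validators can live in data while the analysis code stays unchanged.

```python
from dataclasses import dataclass


# Hypothetical model record; real CodeQL model files use their own schema.
@dataclass(frozen=True)
class Model:
    kind: str      # "sanitizer" or "validator"
    function: str  # name of the function the model describes


def is_flow_reported(path, models):
    """Return True if tainted data reaches the end of `path` unprotected.

    `path` is the ordered list of function names a tainted value passes
    through between a source and a sink. Simplification: a validator is
    treated like a sanitizer here; a real analysis would only trust the
    branch on which the validation check succeeded.
    """
    barriers = {m.function for m in models
                if m.kind in ("sanitizer", "validator")}
    return not any(step in barriers for step in path)


# The same toy "engine" is run against two different sets of data.
path = ["read_param", "html_escape", "render_page"]
default_models = []
custom_models = [
    Model("sanitizer", "html_escape"),
    Model("validator", "is_valid_username"),
]

print(is_flow_reported(path, default_models))  # True: the flow is flagged
print(is_flow_reported(path, custom_models))   # False: sanitizer recognized
```

The point of the sketch is that the second result changes only because a data entry was added; is_flow_reported itself was never edited.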

Industry Reaction

“This is a game-changer for enterprise security,” said Dr. Lena Park, a senior security researcher at a Fortune 500 firm. “Previously, tweaking CodeQL’s default models required a month-long process; now it’s a config file change.”

GitHub’s product manager, Marcus Velez, confirmed the update during a live stream: “We’re democratizing advanced analysis. Any developer can now encode their organization’s security policies without touching the engine internals.”

Background: The Evolution of CodeQL

CodeQL, which came to GitHub through its 2019 acquisition of Semmle, powers the platform’s automated security alerts for over 100 million repositories. Its core strength lies in representing code as a queryable database, but customizing its analysis – for instance, adding a new sanitizer for a proprietary framework – previously required rewriting QL queries and recompiling.

The new models-as-data paradigm treats sanitizers and validators as declarative specifications. This lowers the barrier for non-specialists and allows firms to maintain a library of security models in a standard format.
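
Because the models are plain data, a team can check them before they ever reach the analyzer. The Python sketch below assumes a hypothetical JSON layout (a list of objects with kind, function, and language keys) and made-up acme.* function names; it illustrates auditing a library of declarative specifications and does not reflect GitHub's actual schema.

```python
import json

# Assumed vocabulary and schema, for illustration only.
ALLOWED_KINDS = {"sanitizer", "validator"}
REQUIRED_KEYS = {"kind", "function", "language"}


def load_models(text):
    """Parse a JSON model library and return (valid_entries, errors)."""
    valid, errors = [], []
    for index, entry in enumerate(json.loads(text)):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            errors.append(f"entry {index}: missing {sorted(missing)}")
        elif entry["kind"] not in ALLOWED_KINDS:
            errors.append(f"entry {index}: unknown kind {entry['kind']!r}")
        else:
            valid.append(entry)
    return valid, errors


# A small library with two deliberate mistakes: a misspelled kind
# and an entry that lacks the "language" key.
LIBRARY = """
[
  {"kind": "sanitizer", "function": "acme.escape_sql", "language": "python"},
  {"kind": "validator", "function": "acme.is_safe_path", "language": "python"},
  {"kind": "sanitiser", "function": "acme.strip_tags", "language": "python"},
  {"kind": "validator", "function": "acme.check_len"}
]
"""

models, problems = load_models(LIBRARY)
print(len(models))  # 2
for problem in problems:
    print(problem)
```

A check like this could run in continuous integration, so that a malformed entry is rejected at review time instead of silently weakening a scan.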

What This Means for Developers and Security Teams

For open-source maintainers, the update means faster integration of community-contributed security rules. Instead of forking the CodeQL source code, contributors can simply include a data file in a pull request.

Enterprise security teams gain agility: when a new vulnerability vector surfaces (e.g., a deserialization flaw in a custom serializer), they can deploy a targeted sanitizer in minutes rather than weeks. The declarative models are also easier to audit and share across teams.

“This effectively turns CodeQL into a platform for security policy management, not just a scanning tool,” noted Alex Chen, a DevOps architect at a cloud provider. “It aligns with the shift-left movement by making security as simple as adding a config entry.”

Looking Ahead

GitHub plans to release sample model libraries for popular frameworks (React, Flask, Spring) in the coming weeks, and a marketplace for user-contributed models is under development. The update is available immediately for all GitHub Advanced Security customers.

Update: This story includes additional context on the models-as-data syntax. For a technical deep dive, see the Background section.
