Converting an existing project to use dg
dg and Dagster Components are under active development. You may encounter feature gaps, and the APIs may change. To report issues or give feedback, please join the #dg-components channel in the Dagster Community Slack.
Suppose we have an existing Dagster project. Our project defines a Python package with a single Dagster asset. The asset is exposed in a top-level Definitions object in my_existing_project/definitions.py. We'll consider two cases: one where we have been using uv with pyproject.toml, and one where we have been using pip with setup.py.
- uv
- pip
tree
.
├── my_existing_project
│ ├── __init__.py
│ ├── assets.py
│ ├── definitions.py
│ └── py.typed
├── pyproject.toml
└── uv.lock
2 directories, 6 files
tree
.
├── my_existing_project
│ ├── __init__.py
│ ├── assets.py
│ ├── definitions.py
│ └── py.typed
└── setup.py
2 directories, 5 files
Before proceeding, we'll make sure we have an activated and up-to-date virtual environment in the project root. Having the virtual environment located in the project root is recommended (particularly when using uv) but not required.
- uv
- pip
If you don't have a virtual environment yet, run:
uv sync
Then activate it:
source .venv/bin/activate
If you don't have a virtual environment yet, run:
python -m venv .venv
Now activate it:
source .venv/bin/activate
And install the project package as an editable install:
pip install --editable .
Install dependencies
Install the dg command line tool into your project virtual environment.
- uv
- pip
uv add dagster-dg-cli
pip install dagster-dg-cli
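Before updating the project structure, you may want to confirm that the environment matches what this guide assumes. The following optional sanity check is a sketch using only the Python standard library, not a dg feature; check_environment is a hypothetical name, and it assumes the virtual environment lives at .venv in the project root, as recommended above. Run it with the project's Python from the project root.

```python
import shutil
import sys
from pathlib import Path


def check_environment(project_root: Path) -> dict[str, bool]:
    """Hypothetical sanity check: is a venv active, does it live at
    <project_root>/.venv, and is a `dg` executable on PATH?"""
    # Inside a venv (created by `python -m venv` or uv), sys.prefix points at
    # the venv while sys.base_prefix points at the base interpreter.
    venv_active = sys.prefix != sys.base_prefix
    venv_in_root = venv_active and (
        Path(sys.prefix).resolve() == (project_root / ".venv").resolve()
    )
    return {
        "venv_active": venv_active,
        "venv_in_project_root": venv_in_root,
        # Finds `dg` anywhere on PATH, not necessarily inside the venv.
        "dg_on_path": shutil.which("dg") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_environment(Path.cwd()).items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```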
Update project structure
Add dg configuration
The dg command recognizes Dagster projects through the presence of TOML configuration. This may be either a pyproject.toml file with a tool.dg section or a dg.toml file. Let's add this configuration:
- uv
- pip
Since our project already has a pyproject.toml file, we can just add the requisite tool.dg section to the file:
...
[tool.dg]
directory_type = "project"
[tool.dg.project]
root_module = "my_existing_project"
code_location_target_module = "my_existing_project.definitions"
Since our sample project has a setup.py and no pyproject.toml, we'll create a dg.toml file:
directory_type = "project"
[project]
root_module = "my_existing_project"
code_location_target_module = "my_existing_project.definitions"
There are three settings:
- directory_type = "project": This is how dg identifies your package as a Dagster project. This is required.
- project.root_module = "my_existing_project": This points to the root module of your project. This is also required.
- project.code_location_target_module = "my_existing_project.definitions": This tells dg where to find the top-level Definitions object in your project. This actually defaults to [root_module].definitions, so it is not strictly necessary for us to set it here, but we are including this setting in order to be explicit. Existing projects might have the top-level Definitions object defined in a different module, in which case this setting is required.
Now that these settings are in place, you can interact with your project using dg. If we run dg list defs, we can see the sole existing asset in our project:
dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets │ ┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┓ │
│ │ ┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃ │
│ │ ┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━┩ │
│ │ │ my_asset │ default │ │ │ │ │
│ │ └──────────┴─────────┴──────┴───────┴─────────────┘ │
└─────────┴─────────────────────────────────────────────────────┘
Add a dagster_dg_cli.plugin entry point
We're not quite done adding configuration. dg uses the Python entry point API to expose custom component types and other scaffoldable objects from user projects. Our entry point declaration will specify a submodule as the location where our project exposes plugin objects. By convention, this submodule is named <root_module>.components. In our case, it will be my_existing_project.components.
Let's create this submodule now:
mkdir my_existing_project/components && touch my_existing_project/components/__init__.py
See the plugin guide for more on dg plugins.
We'll need to add a dagster_dg_cli.plugin entry point to our project and then reinstall the project package into our virtual environment. The reinstallation step is crucial: Python entry points are registered at package installation time, so if you simply add a new entry point to an existing editable-installed package, it won't be picked up.
Entry points can be declared in either pyproject.toml or setup.py:
- uv
- pip
Since our package metadata is in pyproject.toml, we'll add the entry point declaration there:
...
[project.entry-points]
"dagster_dg_cli.plugin" = { my_existing_project = "my_existing_project.components"}
...
Then we'll reinstall the package. Note that uv sync will not reinstall our package, so we'll use uv pip install instead:
uv pip install --editable .
Our package metadata is in setup.py. While it is possible to add entry point declarations to setup.py directly, we want dg to be able to read the entry point declaration, and there is no reliable way to read setup.py (since it is arbitrary Python code). So we'll instead add the entry point to a new setup.cfg, which can be used alongside setup.py. Create setup.cfg with the following contents (if your package has existing entry points declared in setup.py, you'll want to move their definitions to setup.cfg as well):
[options.entry_points]
dagster_dg_cli.plugin =
    my_existing_project = my_existing_project.components
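The reason setup.cfg works where setup.py does not is that it is a static INI file: a tool can read it without running any code. The minimal sketch below illustrates this with the standard library's configparser. read_cfg_entry_points is a hypothetical helper, not part of dg or setuptools, and it assumes the usual setup.cfg layout where each entry point sits on its own indented line under the group name.

```python
import configparser


def read_cfg_entry_points(cfg_text: str, group: str) -> dict[str, str]:
    """Hypothetical helper: statically read the entry points declared for
    `group` in setup.cfg text. Nothing is executed, unlike with setup.py."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # The group's value is a multi-line string of "name = target" lines;
    # fallback="" covers both a missing section and a missing group.
    raw = parser.get("options.entry_points", group, fallback="")
    entries: dict[str, str] = {}
    for line in raw.strip().splitlines():
        name, sep, target = line.partition("=")
        if sep:  # skip any line that is not a "name = target" pair
            entries[name.strip()] = target.strip()
    return entries
```

For a setup.cfg containing the declaration above (with the entry point line indented under the group name, as setup.cfg requires), read_cfg_entry_points(Path("setup.cfg").read_text(), "dagster_dg_cli.plugin") returns {"my_existing_project": "my_existing_project.components"}.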
Then we'll reinstall the package:
pip install --editable .
If we've done everything correctly, we should now be able to run dg list plugin-modules and see the module my_existing_project.components, which we have registered as an entry point, listed in the output.
dg list plugin-modules
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Module ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ dagster │
│ my_existing_project.components │
└────────────────────────────────┘
We can now scaffold a new component type in our project, and it will be available to dg commands. First, create the component type:
dg scaffold component Foo
Creating a Dagster component type at /.../my-existing-project/my_existing_project/components/foo.py.
Scaffolded files for Dagster component type at /.../my-existing-project/my_existing_project/components/foo.py.
Then run dg list components to confirm that the new component type is available:
dg list components
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Key ┃ Summary ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ dagster.DefinitionsComponent │ An arbitrary set of dagster definitions. │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ dagster.DefsFolderComponent │ A folder which may contain multiple submodules, each │
│ │ which define components. │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ dagster.PipesSubprocessScriptCollectionComponent │ Assets that wrap Python scripts executed with Dagster's │
│ │ PipesSubprocessClient. │
├──────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤
│ my_existing_project.components.Foo │ COMPONENT SUMMARY HERE. │
└──────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────┘
You should see my_existing_project.components.Foo listed in the output.
Create a defs directory
Part of the dg experience is autoloading definitions: automatically picking up any definitions that exist in a particular module. We are going to create a new submodule named my_existing_project.defs (defs is the conventional name of the module where definitions live in dg), from which we will autoload definitions.
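The general idea behind autoloading can be illustrated with the standard library alone. The sketch below walks a package, imports each submodule, and keeps the module-level objects that match a predicate. It is a simplified illustration, not how dg's load_defs is implemented; collect_from_package is a hypothetical name, and in a real setup the predicate would recognize Dagster definition objects, which we leave abstract here.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Any, Callable


def collect_from_package(
    package: ModuleType, predicate: Callable[[Any], bool]
) -> dict[str, Any]:
    """Import every submodule of `package` and gather module-level objects
    for which `predicate` returns True, keyed by fully qualified name."""
    found: dict[str, Any] = {}
    # walk_packages recurses into subpackages; the prefix makes names absolute.
    for info in pkgutil.walk_packages(package.__path__, prefix=package.__name__ + "."):
        module = importlib.import_module(info.name)
        for attr_name, obj in list(vars(module).items()):
            # An object imported into several modules is reported once per module.
            if predicate(obj):
                found[f"{info.name}.{attr_name}"] = obj
    return found
```

In this project, the equivalent call would pass the my_existing_project.defs package, with a predicate that recognizes Dagster definitions.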
mkdir my_existing_project/defs
Modify top-level definitions
Autoloading is provided by a function that returns a Definitions object. Because we already have some other definitions in our project, we'll combine those with the autoloaded ones from my_existing_project.defs.
To do so, you'll need to modify your definitions.py file, or whichever file contains your top-level Definitions object.
You'll autoload definitions using load_defs, then merge them with your existing definitions using Definitions.merge. Pass load_defs the defs module you just created:
- Before
- After
import dagster as dg
from my_existing_project.assets import my_asset
defs = dg.Definitions(
assets=[my_asset],
)
import my_existing_project.defs
from my_existing_project.assets import my_asset
import dagster as dg
defs = dg.Definitions.merge(
dg.Definitions(assets=[my_asset]),
dg.components.load_defs(my_existing_project.defs),
)
Now let's add an asset to the new defs module. Create my_existing_project/defs/autoloaded_asset.py with the following contents:
import dagster as dg
@dg.asset
def autoloaded_asset(): ...
Finally, let's confirm the new asset is being autoloaded. Run dg list defs again, and you should see both the new autoloaded_asset and the old my_asset:
dg list defs
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets │ ┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┓ │
│ │ ┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃ │
│ │ ┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━┩ │
│ │ │ autoloaded_asset │ default │ │ │ │ │
│ │ ├──────────────────┼─────────┼──────┼───────┼─────────────┤ │
│ │ │ my_asset │ default │ │ │ │ │
│ │ └──────────────────┴─────────┴──────┴───────┴─────────────┘ │
└─────────┴─────────────────────────────────────────────────────────────┘
Now your project is fully compatible with dg!