How to migrate your existing Python package with scikit-package

Installation

To get started, install scikit-package, black, and pre-commit in a new conda environment. Follow the steps below:

Create a new environment named scikit-package_env:

conda create -n scikit-package_env

Activate the environment:

conda activate scikit-package_env

Install packages:

conda install scikit-package black pre-commit
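Once the environment is active, it can help to confirm that the command-line tools are actually reachable. The following is a minimal sketch, not part of scikit-package itself: the helper name find_missing_tools is hypothetical, and the command names (black, pre-commit, and the package command used later in this guide) are assumed from the text.

```python
import shutil

# Command names assumed from this guide: `black`, `pre-commit`, and the
# `package` command used later in "Type package create".
REQUIRED_TOOLS = ["black", "pre-commit", "package"]


def find_missing_tools(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools are available.")
```

If a tool is reported as missing, check that the conda environment is activated before reinstalling.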

Prerequisites

This guide is for developers who have an existing Python package and want to migrate it to the Billinge group’s project structure using the scikit-package library. We assume you have a basic understanding of Python, Git, and GitHub workflows. If you are not familiar with GitHub workflows, please refer to our brief guide provided here.

Tips and how to receive support

We understand that your migration journey can be challenging. We offer the following ways to help you migrate your package to scikit-package:

  1. You may cross-check with the Billinge group’s up-to-date package, diffpy.utils: https://github.com/diffpy/diffpy.utils.

  2. If you have any questions, first read the FAQ for how to customize your package and certain design decisions in the scikit-package template.

  3. After you’ve cross-checked and searched through the FAQ, please feel free to ask questions by creating an issue on the scikit-package repository here.

Migration overview and expected outcome

By the end of the migration process, you will have a package that is structured according to the Billinge group’s project structure shown here: https://github.com/diffpy/diffpy.utils. The migration process is divided into four main steps.

  1. During the first step of the pre-commit workflow, you will use automatic formatting tools to standardize your package with PEP8 before migrating it to the Billinge group’s project structure with scikit-package.

  2. In the migration workflow, you will use the scikit-package library to generate a new project inside the original directory. The new project contains templates that are dynamically filled in based on your package information, along with the configuration for GitHub CI and Codecov.

  3. In the API documentation build workflow, you will use our Python script to automatically generate and build API documentation for your package and render the documentation locally.

  4. In the final clean-up workflow, you will host your package documentation online. Your package will be in good shape for PyPI, GitHub, and conda-forge release!

1. Pre-commit workflow

Here, let’s first standardize your package so that it is PEP8 and PEP257 compliant, using automatic formatting tools together with manual edits.

1.1. Run black in your codebase

  1. Fork the repository that you want to sk-package from the GitHub website under your account.

    If you are the owner of the repository, you can skip this step.

  2. Type git clone https://github.com/<username>/<project-name> and cd <project-name>.

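  3. Type git pull upstream main to sync with the main branch. If you forked the repository and have not yet added the original repository as a remote, first run git remote add upstream https://github.com/<org-name>/<project-name>.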

    If your default branch is called master, run git pull upstream master instead. However, main is the new default branch name for GitHub.

  4. Type git checkout -b black to create a new branch called black.

  5. Create pyproject.toml at the top project level.

  6. Copy and paste the following content into pyproject.toml:

    [tool.black]
    line-length = 79
    include = '\.pyi?$'
    exclude = '''
    /(
        \.git
      | \.hg
      | \.mypy_cache
      | \.tox
      | \.venv
      | \.rst
      | \.txt
      | _build
      | buck-out
      | build
      | dist
      # The following are specific to Black, you probably don't want those.
      | blib2to3
      | tests/data
    )/
    '''
    
  7. Type black src. If your source code is in a different directory, replace src with the appropriate directory path. This automatically formats your code to PEP8 standards using the line-length set in pyproject.toml. If you want to ignore specific files or directories, add them to the exclude section in pyproject.toml.

  8. Add and commit the automatic changes by black. The commit message can be git commit -m "skpkg: apply black to src directory with black configured in pyproject.toml".

  9. Type black . to run black across the entire package directory. Then run pytest to test locally.

  10. Type git add . and git commit -m "skpkg: apply black to all files in the project directory".

  11. Push the branch by typing git push origin black, then create a pull request into main. The pull request title can be skpkg: Apply black to project directory with no manual edits.

  12. Wait for the PR to be merged to main.
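To get a feel for what the exclude setting in pyproject.toml matches, the pattern can be tried out directly with Python's re module. This is a sketch under two assumptions: that black compiles a multi-line pattern in verbose mode (so whitespace and newlines inside it are ignored), and that it matches against project-relative paths that start with /. The pattern below is a shortened copy of the one above, and is_excluded is a hypothetical helper.

```python
import re

# A shortened copy of the exclude pattern from pyproject.toml above.
EXCLUDE = r"""
/(
    \.git
  | \.tox
  | _build
  | build
  | dist
)/
"""

# Assumption: black compiles a multi-line pattern in verbose mode, so
# whitespace and newlines inside the pattern are ignored.
exclude_re = re.compile(EXCLUDE, re.VERBOSE)


def is_excluded(path):
    """Return True if a '/'-prefixed, project-relative path matches."""
    return exclude_re.search(path) is not None


for path in [
    "/src/my_package/file.py",
    "/build/lib/file.py",
    "/doc/_build/conf.py",
]:
    print(path, "->", "excluded" if is_excluded(path) else "formatted")
```

Because each alternative is wrapped in slashes, only whole directory names match: a directory called rebuild is not excluded by the build entry.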

1.2. Apply pre-commit hooks without manual edits

Here, you will use automatic formatting tools to standardize your package with PEP8, PEP257, etc. In this section, PRs are not created directly to main but to a branch called package.

  1. Type git checkout main && git pull upstream main, then git checkout -b precommit to create a new branch called precommit.

  2. Copy the three files .flake8, .isort.cfg, and .pre-commit-config.yaml from https://github.com/Billingegroup/scikit-package/tree/main/%7B%7B%20cookiecutter.github_repo_name%20%7D%7D to your project directory.

  3. Type pre-commit run --all-files. This will attempt to fix issues such as docstring formatting and extra whitespace across all file types, including .yml, .md, and .rst files.

  4. Type git status to get an overview of the modified files, and then run git diff <file-or-directory-path> to see the specific changes.

  5. If you do not want the new changes, you can run git restore <file-or-directory-path> to revert the changes done by pre-commit.

  6. If you want to prevent prettier from being applied to specific files, create a .prettierignore file at the top project level, alongside .flake8, and add the file paths to be ignored, one file path per line.

  7. If you are satisfied with the automatic changes made by pre-commit run --all-files, run pytest, then type git add <file-path(s)> and git commit -m "style: apply pre-commit hooks with no manual edits".

    Attention

    At this point, you may still have failed hooks when you run pre-commit run --all-files. Don’t worry! We will fix them in the next section.

  8. Push the changes to the precommit branch by typing git push origin precommit.

  9. Create a PR from precommit to the package branch. The PR title can be skpkg: Apply pre-commit to project directory with no manual edits.

  10. Wait for the PR to be merged to package.

1.3. Apply manual edits to pass pre-commit hooks

Your package will most likely have pre-commit hooks that are not automatically fixed by pre-commit. Here, you will manually fix the errors raised by flake8, codespell, etc.

  1. Type git checkout package && git pull upstream package to sync with the package branch.

  2. Type git checkout -b flake8-length to create a new branch. In this branch, fix all flake8 errors related to line lengths, if there are any. If you want flake8 to ignore certain files, add their file paths to the exclude section in the .flake8 file.

  3. Create a pull request to package. Since you are fixing flake8 errors, the commit message can be skpkg: fix flake8 line-length errors and the pull request title can be skpkg: Fix flake8 line-length errors.

  4. If you have codespell errors, create a new branch called codespell and fix all of the spelling errors.

    To ignore a word, add it to .codespell/ignore_words.txt. See an example here.

    To ignore a specific line, add it to .codespell/ignore_lines.txt. See an example below:

    ;; src/translation.py
    ;; The following single-line comment is written in German.
    # Hallo Welt
    

    To ignore a specific file extension, add *.ext to the skip section under [tool.codespell] in pyproject.toml. Please see an example here.

  5. If you want to suppress a flake8 error, add # noqa: <error-code> at the end of the line, for example, import numpy as np # noqa: E000. Make sure you create an issue for each suppressed error so that you can revisit it later.

  6. For each flake8 branch, create a pull request to package. Since you are fixing flake8 errors, the commit message can be skpkg: Fix flake8 <readable-error-type> errors and the pull request title can be skpkg: Fix flake8 <readable-error-type> errors.

Congratulations if you have successfully passed all the pre-commit hooks! You can now proceed to the next section.
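As a concrete illustration of the # noqa suppression described in this section, here is a small sketch using two flake8 codes chosen as examples: F401 (module imported but unused) and E501 (line too long). The variable name REFERENCE_URL is made up for this example; the URL is the template location quoted earlier in this guide.

```python
# F401 is the flake8 code for "imported but unused". The import is kept on
# purpose here, so the warning is silenced on this line only.
import json  # noqa: F401

# E501 is the flake8 code for "line too long". A long URL is a typical case
# where wrapping the line would hurt readability.
REFERENCE_URL = "https://github.com/Billingegroup/scikit-package/tree/main/%7B%7B%20cookiecutter.github_repo_name%20%7D%7D"  # noqa: E501

print(len(REFERENCE_URL))
```

Listing the specific code after # noqa keeps the suppression narrow, so other errors on the same line are still reported.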

2. Migration workflow

Here, you will create a new Python project using scikit-package. Then you will migrate existing files from the old project to the new project directory.

Attention

Please read the following carefully before proceeding:

  • Do NOT delete/remove any files before confirming that they are absolutely unnecessary. If in doubt, create an issue or contact the maintainer.

  • Do NOT delete project-specific content such as project descriptions in README, license information, authors, tutorials, examples.

2.1. Setup correct folder structure

  1. Sync with the main branch by typing git checkout main && git pull upstream main.

  2. Before migration, we want to make sure your existing package follows the standard recommended structure for a Python package.

    For a standard package, it should be structured as follows:

    my-package/
    ├── src/
    │   ├── my_package/
    │   │   ├── __init__.py
    │   │   ├── file.py
    │   │   ├── ...
    ├── tests/
    │   ├── test_file.py
    │   ├── ...
    ├── ...
    

    For a namespace package, it should be structured as follows:

    diffpy.utils/
    ├── src
    │   ├── diffpy
    │   │   ├── __init__.py
    │   │   └── utils
    │   │       ├── __init__.py
    │   │       ├── file.py
    │   │       ├── ...
    ├── tests/
    │   ├── test_file.py
    │   ├── ...
    ├── ...
    
  3. Is your package structured as above? If yes, skip to the next section on starting a new project with scikit-package here.

  4. Type git checkout -b structure to create a new branch. In this branch, you will ensure src and tests are correctly structured.

  5. If your project is structured as my-package/my-package/<code>, run git mv <package-name> src. Your project should now be structured as my-package/src/<code>.

  6. Run pytest locally to ensure the tests are running as expected.

  7. Run git add src and git commit -m "skpkg: src to the top level of the package directory"

  8. You can run git mv my-package src to rename the directory.

  9. You will now move tests to the top level of the package directory, ../my-package/tests/<code>. If your test files are located inside src, use git mv src/tests . to move them (note the trailing . for the current directory).

  10. Type git add tests and git commit -m "skpkg: tests to the top level of the package directory".

  11. Push the changes to a new branch and create a PR to sk-package.
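The structure check in this section can also be scripted. The sketch below is only an illustration of the layout rules shown above (an __init__.py under src/<package>/ and a top-level tests/ directory); find_layout_problems is a hypothetical helper, and the demonstration runs on a throwaway temporary directory rather than a real project.

```python
import tempfile
from pathlib import Path


def find_layout_problems(project_root, import_name):
    """List what is missing from the src/ and tests/ layout shown above.

    `import_name` is the dotted name used in `import`, e.g. "my_package",
    or "diffpy.utils" for a namespace package.
    """
    root = Path(project_root)
    package_dir = root.joinpath("src", *import_name.split("."))
    problems = []
    if not (package_dir / "__init__.py").is_file():
        relative = package_dir.relative_to(root).as_posix()
        problems.append(f"missing {relative}/__init__.py")
    if not (root / "tests").is_dir():
        problems.append("missing tests/")
    return problems


# Demonstration on a throwaway directory: first the old
# my-package/my-package/<code> structure, then the corrected one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my-package").mkdir()
    before = find_layout_problems(root, "my_package")

    (root / "src" / "my_package").mkdir(parents=True)
    (root / "src" / "my_package" / "__init__.py").touch()
    (root / "tests").mkdir()
    after = find_layout_problems(root, "my_package")

print(before)
print(after)
```

An empty list means the project matches the expected layout and you can move on to the next section.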

2.2. Start a new project

  1. Type package create inside the project directory.

  2. Answer the questions as follows.

proj stands for “project” and gh for “GitHub”.

proj_owner_name:

e.g., Simon J. L. Billinge.

proj_owner_email:

e.g., sbillinge@columbia.edu.

proj_owner_gh_username:

e.g., sbillinge.

contributors:

e.g., Billinge Group members and community contributors.

license_holders:

e.g., The Trustees of Columbia University in the City of New York.

project_name:

e.g., my-package. For a namespace package, use e.g., diffpy.my-package.

github_org:

The GitHub organization name or owner’s GitHub username. e.g., diffpy or sbillinge.

github_repo_name:

e.g., my-package. The repository name of the project displayed on GitHub.

package_dist_name:

The name in the package distribution in PyPI and conda-forge. If your package name contains _, replace it with -. e.g., my-package. For a namespace package, use e.g., diffpy.my-package.

package_dir_name:

The name of the package directory. e.g., src/my_package. Unlike project_name, it must be lowercase so that it can be imported as import my_package.

proj_short_description:

e.g., Python package for doing science.

keywords:

Each word is separated by a comma and a space. e.g., pdf, diffraction, neutron, x-ray. The keywords may be found in pyproject.toml or setup.py.

min_python_version:

The minimum Python version for package distribution.

max_python_version:

The maximum Python version for package distribution.

needs_c_code_compiled:

Whether the package contains C/C++ code that must be compiled when the package is built. For pure Python packages, type 1 to select No.

has_gui_tests:

Whether the package runs headless testing in GitHub CI. If your package does not contain a GUI, type 1 to select No.

  3. Type ls to see the project directory.

  4. Type cd <package_dir_name> to change the directory to the re-packaged directory.
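The naming rules for package_dist_name and package_dir_name above can be summarized in a few lines of Python. This is a sketch of the rules as stated in this guide, not code from scikit-package; all three helper names are hypothetical.

```python
def to_dist_name(name):
    """Distribution name: underscores are replaced with hyphens."""
    return name.replace("_", "-")


def to_import_name(name):
    """Importable name: lowercase, with hyphens replaced by underscores."""
    return name.lower().replace("-", "_")


def is_importable(import_name):
    """Check that every dotted part is a valid, lowercase identifier."""
    parts = import_name.split(".")
    return all(part.isidentifier() and part == part.lower() for part in parts)


for project_name in ["my-package", "diffpy.my-package"]:
    import_name = to_import_name(project_name)
    print(project_name, "->", import_name, is_importable(import_name))
```

A name such as my-package fails the check because a hyphen is not allowed in an import statement, which is why the package directory uses an underscore.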

2.3. Move src, tests, requirements to setup GitHub CI in PR

  1. Type ls. Notice there is a new directory named <package-name>. We will call this new directory the sk-packaged directory.

  2. Type cd <package-name>. Type pwd to confirm that you are inside the new directory, e.g., ~/dev/diffpy.pdfmorph/diffpy.pdfmorph.

  3. Type mv ../.git . to move .git to the re-packaged directory created by scikit-package. Please note the trailing . (the current directory) at the end of the command.

  4. Type git status to see a list of files that have been (1) untracked, (2) deleted, (3) modified.

    • untracked are new files created by scikit-package.

    • deleted are files that exist in the original directory but not in the re-packaged directory. Most of the src, tests, and doc files will be in this category. We will move them from the original to the re-packaged directory in the next few steps.

    • modified are files that exist in both the original and the re-packaged directory, but to which scikit-package has made changes.

  5. Type git checkout -b setup-CI to create a new branch.

  6. Notice there is a requirements folder containing pip.txt, test.txt, docs.txt, and conda.txt. Follow the instructions provided in requirements/README.txt.

  7. Type git add requirements && git commit -m "skpkg: create requirements folder".

  8. Now you will move src and tests folders in the following steps.

  9. Type cp -n -r ../src . to copy the source code from the original directory to the sk-packaged directory, without overwriting existing files in the destination.

  10. Type cp -n -r ../tests . to copy the tests in the same way.

  11. Run git diff and review the differences.

  12. Then run pytest locally to ensure the tests are running as expected.

  13. Type git add src && git commit -m "skpkg: move src folder".

  14. Type git add tests && git commit -m "skpkg: move tests folder".

  15. Type git add .github && git commit -m "skpkg: move and create github CI and issue templates".

    Attention

    If your package does not support Python 3.13, you will need to specify the Python version supported by your package. Follow the instructions here to set the Python version under .github/workflows.

  16. Follow the steps below to ensure that your package can be installed and tested locally:

    # Create a new environment, specify the Python version and install packages
    conda create -n <package_name>_env python=3.13 \
        --file requirements/test.txt \
        --file requirements/conda.txt \
        --file requirements/build.txt
    
    # Activate the environment
    conda activate <package_name>_env
    
    # Install your package locally
    # `--no-deps` to NOT install packages again from `requirements/pip.txt`
    pip install -e . --no-deps
    
    # Run pytest locally
    pytest
    
    # ... run example tutorials
    
  17. Push the changes to the setup-CI branch by typing git push origin setup-CI.

  18. Create a PR from setup-CI to sk-package. The pull request title can be skpkg: move src, tests and setup requirements folder to setup CI.

  19. Notice there is a CI running in the PR. Once the CI is successful, review the PR and merge it to sk-package.
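The three git status categories described in step 4 (untracked, deleted, modified) can be grouped programmatically from the output of git status --porcelain, which prints a two-character status code followed by the path. The sketch below parses a made-up sample string rather than a real repository; group_git_status is a hypothetical helper and the file names are illustrative only.

```python
def group_git_status(porcelain_output):
    """Group paths from `git status --porcelain` by the categories above."""
    groups = {"untracked": [], "deleted": [], "modified": []}
    for line in porcelain_output.splitlines():
        # Each line is "XY <path>": two status characters, a space, the path.
        status, path = line[:2], line[3:]
        if status == "??":
            groups["untracked"].append(path)
        elif "D" in status:
            groups["deleted"].append(path)
        elif "M" in status:
            groups["modified"].append(path)
    return groups


# Made-up sample output; the file names are illustrative only.
sample = (
    "?? requirements/pip.txt\n"
    " D src/diffpy/pdfmorph/morph.py\n"
    " M README.rst\n"
)
groups = group_git_status(sample)
for category, paths in groups.items():
    print(category, paths)
```

To use it on a real repository, the string could come from subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout.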

2.4. Move configuration files

  1. Sync with the package branch by typing git checkout package && git pull upstream package.

  2. Copy all configuration files, namely .codecov.yml, .flake8, .isort.cfg, and .pre-commit-config.yaml, from the main repo to the scikit-package repo.

2.5. Move rest of text files

  1. Files showing as (2) “deleted” upon git status are in the main repo but not in the scikit-package repo. We took care of most of these by moving over the src tree, but let’s do the rest now. Go down the list and, for each <filename> among the “deleted” files in git status, type cp -n ../<filepath>/<filename> ./<target_filepath>. Do not copy files that you do not want to keep. If you are unsure, please confirm with the project owner.

  2. Files that have been (3) modified exist in both places and need to be merged manually. Do these one at a time. Differences will show up. Select anything you want to inherit from the file in the main repo. For example, you want to copy useful information such as LICENSE and README files.
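When merging a modified file by hand, a unified diff of the two versions makes it easier to see what to carry over. The following is a minimal sketch using Python's standard difflib module; the helper name unified_diff_text and the two README strings are invented for illustration.

```python
import difflib


def unified_diff_text(old_text, new_text, filename):
    """Return a unified diff between the original and re-packaged file."""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"original/{filename}",
        tofile=f"re-packaged/{filename}",
    )
    return "".join(lines)


# Invented file contents, for illustration only.
old_readme = "my-package\n==========\nTools for analyzing diffraction data.\n"
new_readme = "my-package\n==========\nPython package for doing science.\n"

diff = unified_diff_text(old_readme, new_readme, "README.rst")
print(diff)
```

Lines starting with - exist only in the original file and are the candidates to copy into the re-packaged version.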

3. Documentation workflow

3.1. Move documentation files

  1. We want to copy over everything in the doc/<path>/source directory from the old repo to the doc/source directory in the new repo.

  2. If the old repo has an extra manual directory (doc/manual/source), run cp -n -r ../doc/manual/source/* ./doc/source.

  3. If files are moved to a different path, open the project in PyCharm and do a global search (ctrl + shift + f) for ../ or .. and modify all relative path instances.

  4. Any files that we copied over from the old location but put into a new location in the new repo need to be deleted from git. For example, for files that were in doc/manual/source/ in the old repo but are now in doc/source, we correct this by typing git add doc/manual/source.

3.2. Render API documentation

When you see files with .. automodule:: within them, these are API documentation files. However, they are not yet populated. We will populate them using our release scripts.

  1. Make sure you have our release scripts repository. Go to dev and run git clone https://github.com/Billingegroup/release-scripts.git.

  2. Enter your scikit-package package directory. For example, I would run cd ./diffpy.pdfmorph/diffpy.pdfmorph.

  3. Build the package using python -m build. You may have to install python-build first.

  4. Get the path of the package directory proper. In the case of diffpy.pdfmorph, this is ./src/diffpy/pdfmorph. In general, for a.b.c, this is ./src/a/b/c.

  5. Run the API script. This is done by running python <path_to_auto_api> <package_name> <path_to_package_proper> <path_to_api_directory>.

    If you have followed the steps above, the command is python ../../release-scripts/auto_api.py <package_name> <path_to_package_proper> ./doc/source/api.

    Make sure you build the documentation by going to /doc and running make html. The error “No module named” (e.g. WARNING: autodoc: failed to import module 'tools' from module 'diffpy.pdfmorph'; the following exception was raised: No module named 'diffpy.utils') can be resolved by adding autodoc_mock_imports = [<pkg>] to your conf.py right under imports. This file is located in /doc/source/conf.py. In the case of PDFmorph, this was done by adding autodoc_mock_imports = ["diffpy.utils",].

Congratulations! You may now commit the changes made by auto_api.py (and yourself) and push this commit. Create a PR to the package branch.
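The path rule from step 4 (a.b.c maps to ./src/a/b/c) and the command from step 5 can be assembled in a short script. This sketch only builds the argument list described in the text; it does not run auto_api.py, and both helper names are hypothetical. The default script location assumes release-scripts was cloned into dev as described above.

```python
from pathlib import PurePosixPath


def package_source_path(package_name):
    """Map a dotted package name a.b.c to ./src/a/b/c."""
    return "./" + PurePosixPath("src", *package_name.split(".")).as_posix()


def auto_api_command(package_name, script="../../release-scripts/auto_api.py"):
    """Build the auto_api.py command line described in step 5."""
    return [
        "python",
        script,
        package_name,
        package_source_path(package_name),
        "./doc/source/api",
    ]


print(" ".join(auto_api_command("diffpy.pdfmorph")))
```

Printing the command before running it is a cheap way to catch a wrong relative path to release-scripts.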

3.3. Build documentation locally

Follow these steps sequentially:

# Create a new environment, specify the Python version and install packages
conda create -n <project-name>_env \
    --file requirements/test.txt \
    --file requirements/conda.txt \
    --file requirements/build.txt

# Activate the environment
conda activate <project-name>_env

cd doc
make html
open build/html/index.html

To run as a single command:

cd doc && make html && open build/html/index.html && cd ..

Your default browser will open the documentation in a new window.
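The open command used above is specific to macOS. On other platforms, one possible alternative is Python's standard webbrowser module. This is a sketch under the assumption that the documentation was built into doc/build/html as shown above; docs_index_uri and open_docs are hypothetical helper names.

```python
import webbrowser
from pathlib import Path


def docs_index_uri(doc_dir="doc"):
    """Return a file:// URI for the built index page under doc_dir."""
    index = Path(doc_dir, "build", "html", "index.html").resolve()
    return index.as_uri()


def open_docs(doc_dir="doc"):
    """Open the built documentation in the default browser."""
    return webbrowser.open(docs_index_uri(doc_dir))


# Only print the location here; call open_docs() to launch the browser.
print(docs_index_uri())
```

Run it from the top level of the re-packaged directory so that the relative doc path resolves correctly.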

4. Clean up

4.1. Check LICENSE and README

  1. For the package branch, make a <branchname>.rst file by copying TEMPLATE.rst in the news folder and, under “fixed”, put “Repo structure modified to the new diffpy standard”.

  2. Check the README and make sure that all parts have been filled in and all links resolve correctly.

  3. Run through the documentation online and do the same: fix grammar and make sure all links work.

  4. Recall that, on your local machine, you are currently in the re-packaged directory.

4.2. Clean up the old directory

  1. Rename the old directory by typing mv ../../<package-name> ../../<package-name>-old. You will then have user/dev/<package-name>/<package-name> and user/dev/<package-name>-old/<package-name>.

  2. Type cd ../.. to go back to the dev directory.

  3. Type git clone https://github.com/<org-name>/<project-name>.

  4. Type cd <project-name> and test your package by following the steps below:

    # Create a new environment, specify the Python version and install packages
    conda create -n <package_name>_env python=3.13 \
        --file requirements/test.txt \
        --file requirements/conda.txt \
        --file requirements/build.txt
    
    # Activate the environment
    conda activate <package_name>_env
    
    # Install your package locally
    # `--no-deps` to NOT install packages again from `requirements/pip.txt`
    pip install -e . --no-deps
    
    # Run pytest locally
    pytest
    
    # ... run example tutorials
    
  5. Good to go! Once the test is successful, you can delete the old directory by typing rm -rf <package-name>-old.

What’s next?

Congratulations! Your package has been successfully migrated. This has been the most challenging step. To distribute and build your doc locally, follow the instructions in the release guide next.