Skip to content

feat: return uploaded file URL from Repo.upload and upload_files#688

Open
singhvishalkr wants to merge 3 commits intoDagsHub:mainfrom
singhvishalkr:feat/upload-return-url
Open

feat: return uploaded file URL from Repo.upload and upload_files#688
singhvishalkr wants to merge 3 commits intoDagsHub:mainfrom
singhvishalkr:feat/upload-return-url

Conversation

@singhvishalkr
Copy link
Copy Markdown

Summary

Both Repo.upload() and the module-level upload_files() now return the DagsHub UI URL where the uploaded file(s) can be viewed.

Fixes #238

Changes

  • Repo.upload() returns str — the browsable DagsHub URL constructed via the existing get_files_ui_url() helper
  • upload_files() (module-level convenience wrapper) passes through that return value
  • Added Returns section to both docstrings

Motivation

Currently Repo.upload() returns None, forcing callers to manually reconstruct the URL if they want to link to the uploaded content. The get_files_ui_url() helper was already used internally for directory upload logging — this change exposes the same URL to callers.

Testing

Verified that get_files_ui_url produces the correct {owner}/{reponame}/src/{branch}/{path} URL by tracing the existing directory upload log message. No new dependencies introduced.

Made with Cursor

Fixes DagsHub#238

Both Repo.upload() and the module-level upload_files() now return the
DagsHub UI URL where the uploaded file(s) can be viewed. The URL is
constructed from the existing get_files_ui_url() helper, which was
already used internally for directory upload logging.

Made-with: Cursor
@dagshub
Copy link
Copy Markdown

dagshub Bot commented Apr 21, 2026

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

Warning

Rate limit exceeded

@singhvishalkr has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 45 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 18 minutes and 45 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 99955438-c924-4aeb-8bd5-7bd6b27ccf15

📥 Commits

Reviewing files that changed from the base of the PR and between cee12b6 and 2a191ec.

📒 Files selected for processing (1)
  • dagshub/upload/wrapper.py
📝 Walkthrough

Walkthrough

Updated dagshub/upload/wrapper.py: upload_files() and Repo.upload() now declare -> Optional[str] and return the upload result. For bucket uploads the method returns None; for non-bucket uploads it returns the computed DagsHub file-browser URL (using kwargs["new_branch"] or self.branch).

Changes

Cohort / File(s) Summary
Upload wrapper changes
dagshub/upload/wrapper.py
Added explicit return type -> Optional[str] to upload_files(...) and Repo.upload(...). Repo.upload now returns None for bucket=True and returns a computed DagsHub file-browser URL for non-bucket uploads (uses kwargs["new_branch"] or self.branch). Docstrings updated to document Returns:. Upload mechanics and lower-level methods unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped along the upload trail,
A URL caught on my whiskered tail,
For buckets I nod and quietly sleep,
For files I point where they live and keep —
Hop, click, and find your data tale.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely describes the main change: adding return values (file URLs) to Repo.upload() and upload_files().
Description check ✅ Passed The description accurately explains the changes, motivation, and testing approach, directly relating to the changeset modifications.
Linked Issues check ✅ Passed The PR fully addresses issue #238 by implementing URL returns in Repo.upload() and upload_files() as required, using the existing get_files_ui_url() helper.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issue requirements: return type updates, URL computation, docstring updates, and conditional None returns for bucket uploads.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dagshub/upload/wrapper.py (1)

331-341: ⚠️ Potential issue | 🟠 Major

Return the correct URL for each upload target.

Line 341 unconditionally returns a repo source URL using self.branch, which is incorrect for two cases:

  1. When bucket=True, the method uploads to the S3 bucket but returns a repo URL that won't show the uploaded files.
  2. When new_branch is passed in kwargs, the files are uploaded to that branch but the returned URL points to self.branch instead.

However, a bucket UI URL helper method does not currently exist in the codebase. Determine whether bucket uploads should return a URL and what that should be before implementing. For the new_branch case, use kwargs.get("new_branch") with get_repo_url(FILES_UI_URL, remote_path, branch=...) to return the correct branch URL.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ae242e7d-88ef-4c8e-8b96-0c7a530b682e

📥 Commits

Reviewing files that changed from the base of the PR and between 474f305 and 1b1b2e3.

📒 Files selected for processing (1)
  • dagshub/upload/wrapper.py
📜 Review details
🔇 Additional comments (1)
dagshub/upload/wrapper.py (1)

187-213: LGTM: wrapper now propagates the upload URL.

The module-level upload_files(...) signature, docstring, and return repo_obj.upload(...) change align with the new public contract.

@singhvishalkr
Copy link
Copy Markdown
Author

Thanks @coderabbitai -- both catches are valid. Pushed cee12b6 addressing them:

  1. new_branch kwarg: when Repo.upload(..., new_branch="branch-x") is called, the returned URL now uses branch-x instead of self.branch. Implementation threads kwargs.get("new_branch") or self.branch into get_repo_url(FILES_UI_URL, remote_path, branch=...).
  2. bucket=True: there is no canonical file-browser URL for S3 bucket uploads, so the function now returns None in that case rather than a misleading repo source URL. Signature changed to Optional[str] and the docstring calls this out explicitly.

Ready for re-review.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me pull up the latest version of the file to verify the implementation.


🏁 Script executed:

#!/bin/bash
fd -e py wrapper.py --full-path | xargs cat -n

Length of output: 45737


@singhvishalkr: Both fixes look correct in cee12b6. Here's what I verified:

1. new_branch kwarg (Lines 345–346):

target_branch = kwargs.get("new_branch") or self.branch
return self.get_repo_url(FILES_UI_URL, remote_path, branch=target_branch)

The resolution order is correct — explicit new_branch takes priority over self.branch, and kwargs are already forwarded through both code paths (single-file and directory) before reaching this line. The edge case where Repo.upload_files internally promotes a branch on BranchNotFoundError is also handled correctly since kwargs.get("new_branch") falls back to self.branch which is already set.

2. bucket=True (Lines 334–336):

if bucket:
    self.upload_files_to_bucket(local_path, remote_path, **kwargs)
    return None

Clean early return. Both signatures correctly reflect -> Optional[str] and the docstrings call out the None-for-bucket behaviour explicitly. ✅

3. Type consistency: Optional[str] is correctly applied to both the module-level upload_files (Line 194) and Repo.upload (Line 300). ✅

No issues found. The implementation is solid.

[approve]

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
dagshub/upload/wrapper.py (1)

334-336: Optional: consider returning the S3 URI for bucket uploads instead of None.

Returning None for bucket uploads is consistent with the docstring, but callers that want a pointer to the uploaded data still have to reconstruct it themselves (the same motivation that led to this PR). A canonical s3://<bucket>/<remote_path> URI (or the DagsHub storage bucket browser URL, if one exists) would make the return type more useful and avoid the asymmetry where one branch of upload() returns a locator and the other returns nothing. Not blocking — just worth considering as a follow-up.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d5c79e2-3a87-49bd-a5a1-bf09775e9431

📥 Commits

Reviewing files that changed from the base of the PR and between 1b1b2e3 and cee12b6.

📒 Files selected for processing (1)
  • dagshub/upload/wrapper.py

Comment thread dagshub/upload/wrapper.py
@singhvishalkr
Copy link
Copy Markdown
Author

@coderabbitai thanks for the thorough scan. Pushed \2a191ec\ addressing the major issue:

Fixed (in-scope):

  • Slash preservation in URL — switched \get_repo_url\ to \urllib.parse.quote(directory, safe="/"). Nested paths like \data/images/foo.png\ now render as path separators instead of %2F, matching DagsHub's file-browser URL format (/{owner}/{repo}/src/{branch}/{path}). This also corrects the \CONTENT_UPLOAD_URL\ caller, which was silently relying on the server to decode %2F\ in REST paths.
  • Path normalization consistency
    emote_path\ is now normalized once via \posixpath.normpath(Path(remote_path).as_posix())\ at the entry of \Repo.upload. This matches \DataSet._clean_directory_name\ exactly, so inputs like \ oo/../bar\ or ./data\ resolve to the same path used by \self.directory()\ and therefore the returned URL.

Declined (nit): returning \s3:///<remote_path>\ for bucket uploads. The docstring commits specifically to a file-browser URL, and \s3://\ URIs aren't browser-navigable — they're storage identifiers that most callers would have to translate anyway. If DagsHub later exposes a bucket browser URL (e.g. /{owner}/{repo}/bucket/...), a follow-up PR can return that and remove the \None\ branch; happy to file a tracking issue if that UI helper is on the roadmap. Until then, \None\ is the honest signal.

Ready for another pass.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me pull the latest state of the file to verify both fixes.


🏁 Script executed:

#!/bin/bash
fd -e py wrapper.py --full-path | xargs cat -n

Length of output: 45758


@singhvishalkr: Both fixes in 2a191ec are verified and correct.

1. Slash preservation — get_repo_url (Lines 596–607):

path=urllib.parse.quote(directory, safe="/"),

Applied centrally in get_repo_url, so it benefits both FILES_UI_URL (the new return value) and CONTENT_UPLOAD_URL (the existing REST upload path) callers in one shot. Clean.

2. Path normalization — Repo.upload (Line 332):

remote_path = posixpath.normpath(cast(str, Path(remote_path).as_posix()))

Placed after the if not remote_path: resolution block and before the bucket / non-bucket branch split, so it normalizes once for all paths. Matches DataSet._clean_directory_name (Line 797) exactly. ✅

3. Bucket test compatibility: The test_bucket_upload parametrized case with remote_path="new_data/" still passes — after normpath it becomes "new_data", and os.path.join("new_data", relative_path) in upload_files_to_bucket produces the same expected S3 paths. ✅

4. All prior fixes intact: new_branch kwarg resolution (Lines 345–346), bucket=True early return None (Lines 334–336), and Optional[str] signatures (Lines 194, 300). ✅

The declination of s3:// URI is well-reasoned — keeping None as an honest signal until a proper bucket browser URL is available is the right call.

No issues found. The implementation is solid.

[approve]

Addresses CodeRabbit review on DagsHub#688:
- Repo.upload(..., new_branch=...) now returns the URL for the target branch instead of self.branch.
- Repo.upload(..., bucket=True) and upload_files(..., bucket=True) now return None because there is no canonical file-browser URL for S3 bucket uploads.
- Signature + docstring updated to Optional[str].
Address CodeRabbit review on DagsHub#688:

1. get_repo_url now uses urllib.parse.quote(path, safe="/") so nested
   paths like data/images/foo.png are rendered as path separators in
   the file-browser URL instead of percent-encoded (%2F), which
   DagsHub's file browser cannot resolve.

2. remote_path is normalized with posixpath.normpath() at entry of
   Repo.upload, matching the normalization applied by
   DataSet._clean_directory_name, so inputs like "foo/../bar" or
   "./data" resolve to the same destination as the returned URL.

Signed-off-by: Vishal Kumar Singh <vishal.kr.singh2021@gmail.com>
@singhvishalkr
Copy link
Copy Markdown
Author

Coderabbit has re-scanned and the inline suggestions from 2a191ec are all resolved on its side. Ready for a human review whenever someone has cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Repo.upload method will return url of that uploaded file in dagshub/upload/wrapper.py

1 participant