close
The Wayback Machine - https://web.archive.org/web/20200716010401/https://github.com/denoland/deno/issues/4188
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: ops/fs.rs should use tokio::fs where applicable #4188

Open
bartlomieju opened this issue Feb 29, 2020 · 6 comments
Open

refactor: ops/fs.rs should use tokio::fs where applicable #4188

bartlomieju opened this issue Feb 29, 2020 · 6 comments

Comments

@bartlomieju
Copy link
Contributor

@bartlomieju bartlomieju commented Feb 29, 2020

There are many functions available in tokio::fs that can be used to replace current blocking fs function used in cli/ops/fs.rs.

Example:

deno/cli/ops/fs.rs

Lines 189 to 214 in a29343c

fn op_copy_file(
state: &State,
args: Value,
_zero_copy: Option<ZeroCopyBuf>,
) -> Result<JsonOp, OpError> {
let args: CopyFileArgs = serde_json::from_value(args)?;
let from = deno_fs::resolve_from_cwd(Path::new(&args.from))?;
let to = deno_fs::resolve_from_cwd(Path::new(&args.to))?;
state.check_read(&from)?;
state.check_write(&to)?;
debug!("op_copy_file {} {}", from.display(), to.display());
let is_sync = args.promise_id.is_none();
blocking_json(is_sync, move || {
// On *nix, Rust deem non-existent path as invalid input
// See https://github.com/rust-lang/rust/issues/54800
// Once the issue is reolved, we should remove this workaround.
if cfg!(unix) && !from.is_file() {
return Err(OpError::not_found("File not found".to_string()));
}
fs::copy(&from, &to)?;
Ok(json!({}))
})
}

It can be rewritten as follows:

fn op_copy_file(
  state: &State,
  args: Value,
  _zero_copy: Option<ZeroCopyBuf>,
) -> Result<JsonOp, OpError> {
  let args: CopyFileArgs = serde_json::from_value(args)?;
  let from = deno_fs::resolve_from_cwd(Path::new(&args.from))?;
  let to = deno_fs::resolve_from_cwd(Path::new(&args.to))?;

  state.check_read(&from)?;
  state.check_write(&to)?;

  debug!("op_copy_file {} {}", from.display(), to.display());
  let fut = async move {
    // On *nix, Rust deem non-existent path as invalid input
    // See https://github.com/rust-lang/rust/issues/54800
    // Once the issue is resolved, we should remove this workaround.
    if cfg!(unix) && !from.is_file() {
      return Err(OpError::not_found("File not found".to_string()));
    }

    tokio::fs::copy(&from, &to).await?;
    Ok(json!({}))
  }.boxed_local();

  if args.promise_id.is_none() {
    let result = futures::executor::block_on(fut)?;
    Ok(JsonOp::Sync(result))
  } else {
    Ok(JsonOp::Async(fut))
  }
}

Last bit could be factored out as maybe_sync_op similar to blocking_json.
This change should remove as many calls to as possible blocking_json - it's better to use async functions where possible - spawn_blocking requires separate threadpool for running blocking ops and we want to avoid that if possible.

@dubiousjim
Copy link
Contributor

@dubiousjim dubiousjim commented Mar 14, 2020

I'm working on this issue now.

Old version of op_realpath:

fn op_realpath(
  state: &State,
  args: Value,
  _zero_copy: Option<ZeroCopyBuf>,
) -> Result<JsonOp, OpError> {
  let args: RealpathArgs = serde_json::from_value(args)?;
  let path = resolve_from_cwd(Path::new(&args.path))?;

  state.check_read(&path)?;

  let is_sync = args.promise_id.is_none();
  blocking_json(is_sync, move || {
    debug!("op_realpath {}", path.display());
    // corresponds to the realpath on Unix and
    // CreateFile and GetFinalPathNameByHandle on Windows
    let realpath = fs::canonicalize(&path)?;
    let mut realpath_str =
      realpath.to_str().unwrap().to_owned().replace("\\", "/");
    if cfg!(windows) {
      realpath_str = realpath_str.trim_start_matches("//?/").to_string();
    }
    Ok(json!(realpath_str))
  })
}

New version, passes tests:

fn op_realpath(
  state: &State,
  args: Value,
  _zero_copy: Option<ZeroCopyBuf>,
) -> Result<JsonOp, OpError> {
  let args: RealpathArgs = serde_json::from_value(args)?;
  let path = resolve_from_cwd(Path::new(&args.path))?;

  state.check_read(&path)?;

  let is_sync = args.promise_id.is_none();
  let fut = async move {
    debug!("op_realpath {}", path.display());
    // corresponds to the realpath on Unix and
    // CreateFile and GetFinalPathNameByHandle on Windows
    let realpath = tokio::fs::canonicalize(&path).await?;
    let mut realpath_str =
      realpath.to_str().unwrap().to_owned().replace("\\", "/");
    if cfg!(windows) {
      realpath_str = realpath_str.trim_start_matches("//?/").to_string();
    }
    Ok(json!(realpath_str))
  };

  if is_sync {
    let buf = futures::executor::block_on(fut)?;
    Ok(JsonOp::Sync(buf))
  } else {
    Ok(JsonOp::Async(fut.boxed_local()))
  }
}

Would like to refactor into a shorthand like this, but can't get it to work atm.

fn op_realpath(
  state: &State,
  args: Value,
  _zero_copy: Option<ZeroCopyBuf>,
) -> Result<JsonOp, OpError> {
  let args: RealpathArgs = serde_json::from_value(args)?;
  let path = resolve_from_cwd(Path::new(&args.path))?;

  state.check_read(&path)?;

  let is_sync = args.promise_id.is_none();
  tokio_json(is_sync, async || {
    debug!("op_realpath {}", path.display());
    // corresponds to the realpath on Unix and
    // CreateFile and GetFinalPathNameByHandle on Windows
    let realpath = tokio::fs::canonicalize(&path).await?;
    let mut realpath_str =
      realpath.to_str().unwrap().to_owned().replace("\\", "/");
    if cfg!(windows) {
      realpath_str = realpath_str.trim_start_matches("//?/").to_string();
    }
    Ok(json!(realpath_str))
  })
}

with a helper function in cli/ops/dispatch_json.rs:

pub fn tokio_json<F>(is_sync: bool, f: F) -> Result<JsonOp, OpError>
where
  F: 'static + Send + FnOnce() -> JsonResult,
{
  let fut = async move { f() }.boxed_local();
  if is_sync {
    let result = futures::executor::block_on(fut)?;
    Ok(JsonOp::Sync(result))
  } else {
    Ok(JsonOp::Async(fut))
  }
}
@dubiousjim
Copy link
Contributor

@dubiousjim dubiousjim commented Mar 15, 2020

@bartlomieju (cc @ry),

In some cases we can move to a tokio version, but I'm not sure whether it's an improvement to do so.

The existing implementation of op_mkdir atomically applies a supplied mode (which it applies the process's current umask to), and leaves nothing behind if it fails. Here's how it looks (after some refactoring I did to master):

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct MkdirArgs {
  promise_id: Option<u64>,
  path: String,
  recursive: bool,
  mode: Option<u32>,
}

fn op_mkdir(
  state: &State,
  args: Value,
  _zero_copy: Option<ZeroCopyBuf>,
) -> Result<JsonOp, OpError> {
  let args: MkdirArgs = serde_json::from_value(args)?;
  let path = resolve_from_cwd(Path::new(&args.path))?;
  let mode = args.mode.unwrap_or(0o777) & 0o777;

  state.check_write(&path)?;

  let is_sync = args.promise_id.is_none();
  blocking_json(is_sync, move || {
    debug!("op_mkdir {} {:o} {}", path.display(), mode, args.recursive);
    let mut builder = std::fs::DirBuilder::new();
    builder.recursive(args.recursive);
    #[cfg(unix)]
    {
      use std::os::unix::fs::DirBuilderExt;
      builder.mode(mode);
    }
    builder.create(path)?;
    Ok(json!({}))
  })
}

Possible downsides of that approach are: (1) it uses blocking_json; and (2) the way DirBuilder works with a supplied mode, any intermediate directories that get created with a recursive call also get the mode applied. (So say the docs, I haven't verified.) This is arguably not the best behavior. (Python first did things this way, then changed to making the intermediate directories just have default modes --- that is, 0o777 & !umask.)

Here is a tokio-ized version of op_mkdir. It solves (1) and in passing also solves (2): if one deals with the umask at all, it's easiest here to only apply it to the target directory, and not any intermediate ones.

fn op_mkdir(
  state: &State,
  args: Value,
  _zero_copy: Option<ZeroCopyBuf>,
) -> Result<JsonOp, OpError> {
  let args: MkdirArgs = serde_json::from_value(args)?;
  let path = resolve_from_cwd(Path::new(&args.path))?;
  let mode = args.mode.unwrap_or(0o777) & 0o777;

  state.check_write(&path)?;

  let is_sync = args.promise_id.is_none();
  let fut = async move {
    debug!("op_mkdir {} {:o} {}", path.display(), mode, args.recursive);
    if args.recursive {
      // exit early if dir already exists, so that we don't
      // try to apply mode and remove it on failure
      if path.is_dir() {
        return Ok(json!({}))
      }
      tokio::fs::create_dir_all(&path).await?;
    } else {
      tokio::fs::create_dir(&path).await?;
    }
    if let Some(_) = args.mode {
      #[cfg(unix)]
      {
        use std::os::unix::fs::PermissionsExt;
        // we have to query (takes 2 syscalls) and apply umask by hand
        let permissions = PermissionsExt::from_mode(mode & !umask(None));
        match tokio::fs::set_permissions(&path, permissions).await {
          Ok(()) => (),
          Err(e) => {
            // couldn't apply mode, so remove_dir then propagate error
            // if dir already existed (and so might not be empty)
            // we'll already have exited (if args.recursive) or failed
            tokio::fs::remove_dir(path).await?;
            return Err(OpError::from(e));
          }
        }
      }
    }
    Ok(json!({}))
  };

  if is_sync {
    let buf = futures::executor::block_on(fut)?;
    Ok(JsonOp::Sync(buf))
  } else {
    Ok(JsonOp::Async(fut.boxed_local()))
  }
}

The downsides of this implementation is that it requires a lot more syscalls, in order to be as robust as the previous version. (For example, if the attempt to apply a mode fails for some reason, we shouldn't leave a directory behind with different mode than requested. But if we did a recursive mkdir and the directory already existed, we shouldn't in that case try to apply the mode.)

It may be useful to know that the blocking_json calls can't be completely eliminated. There's no tokio version of (at least) chown and utime.

Given that, what's your preference between these versions of op_mkdir? Should we use the tokio version? Or let this be another case where we stick to blocking_json.

Similar issues will apply to the op for makeTempDir.

@dubiousjim
Copy link
Contributor

@dubiousjim dubiousjim commented Mar 15, 2020

I've got this issue 95% implemented now. There's a lot of subtleties, so it wouldn't have been such a great "first issue." :-)

Am waiting feedback on what to do with mkdir and makeTempDir, and for some other PRs I've already pushed or have queued (related to #4017) to be merged, before pushing.

@bartlomieju
Copy link
Contributor Author

@bartlomieju bartlomieju commented Mar 16, 2020

@dubiousjim how many of ops from "fs" you could rewrite to Tokio version without problems described above? We could move gradually as tokio adds more APIs to fs module. If some op requires a lot of refactor just for sake of moving to Tokio then it's not worth it

@dubiousjim
Copy link
Contributor

@dubiousjim dubiousjim commented Mar 16, 2020

@bartlomieju I was able to do all of them except the ones I mentioned (chown, utime, with mkdir/makeTemp up in the air).

My ftruncate etc ops (still in the queue for #4017, waiting for other PRs to be merged first) use the nix crate, so they mostly don't have tokio correlates, so they're written instead like this:

  let fut = async move {
    #[cfg(unix)]
    ...
    Ok(json!({}))
  };

  if is_sync {
    let buf = futures::executor::block_on(fut)?;
    Ok(JsonOp::Sync(buf))
  } else {
    Ok(JsonOp::Async(fut.boxed_local()))
  }

I ended up doing a lot of refactoring of cli/fs.rs and cli/ops/fs.rs while doing this, but it was all worthwhile anyway. I'll push the refactoring + the tokio-izing soon (as either 1 or 2 PRs, I welcome input about which would be preferred).

For mkdir, it's just up to you and/or @ry and/or other core devs which implementation you prefer. I have them both worked out. And once you've chosen for mkdir, I'll do makeTempDir on the same model.

dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 17, 2020
This helps refactoring, especially when std::fs calls are replaced with tokio::fs calls (denoland#4188)
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 17, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 17, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 17, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 19, 2020
This helps refactoring, especially when std::fs calls are replaced with tokio::fs calls (denoland#4188)
ry pushed a commit that referenced this issue Mar 20, 2020
This a complex boring PR that shifts around code (primarily) in cli/fs.rs and
cli/ops/fs.rs. The gain of this refactoring is to ease the way for #4188 and
#4017, and also to avoid some future development pain.

Mostly there is no change in functionality. Except:
* squashed bugs where op_utime and op_chown weren't using `resolve_from_cwd`
* eliminated the use of the external `remove_dir_all` crate.
* op_chmod now only queries metadata to verify file/dir exists on Windows (it
  will already fail on Unix if it doesn't)
* op_chown now verifies the file/dir's existence on Windows like chmod does.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 21, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 23, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 24, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 26, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 26, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
dubiousjim added a commit to dubiousjim/deno that referenced this issue Mar 26, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
@dubiousjim
Copy link
Contributor

@dubiousjim dubiousjim commented Mar 26, 2020

It'd help to strategize together with some core devs more about this.

As described above, I've already re-implemented most of cli/ops/fs.rs in terms of tokio::fs. (The make_temp functions are the only ones where this could still be done; with chown and utime it's not currently possible.) I've written a bunch more tests and so on. This is all just waiting in a queue of PRs I have. (They're blocked by some others that are now waiting for review. Even once those are decided on, the queue still has to be reviewed and decided on in stages. It requires too much rebasing and conflict fixing if they're merged in any old order.)

The tokio-izing isn't straightforward. There are subtle differences in behavior and exceptional cases between std::fs and tokio::fs. This is worth keeping in mind for what I'm about to say.

While waiting for my queue of PRs to advance, I was looking ahead to what other file-system changes might be needed soon. I was interested in adding openat and similar operations (this interacts with #4440), and the option to supply O_NOFOLLOW flags more generally. (Currently we only distinguish between stat and lstat.) The latter would be central to the strategy @kevinkassimo proposed (in #2319) for dealing with #2318.

There's no technical obstacle to doing that, in lots of cases. The nix crate is getting pretty good support for these things. We'd just have to upgrade to nix 0.17. (Currently we're using nix 0.14, since that's what the version of rustyline we use already depends on. But rustyline upgraded to nix 0.17 in their master branch, so as soon as they make a new release we won't need to be pulling in two versions of nix.)

However, the strategy of (1) replacing std::fs calls with async tokio::fs calls, and thereby avoiding blocking_json; and the strategy of (2) replacing std::fs calls (on Unix) with calls from the nix crate, for richer Unix functionality like fchmodat, lchmod, and so on, are exclusive. We can't do both. So --- although a well-tested switchover to tokio::fs is available --- one can envisage us soon thereafter shifting back again to blocking calls to functions from nix inside blocking_json.

I wanted to invite other dev's thoughts/feedback about this.

dubiousjim added a commit to dubiousjim/deno that referenced this issue Apr 1, 2020
When we shift from std::fs::read_dir to the tokio::fs version (denoland#4188), the processing of entries will be an async function, and Rust doesn't currently have a stable implementation of async closures. So instead of using `filter_map` with a closure, we iterate and build the Vec by hand. This makes the shift to tokio::fs more straightforward.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.