dask

We're trying to introduce Parquet in our team, and the largest blocker we've seen is the dreaded "schemas are inconsistent" error message:

RuntimeError: Schemas are inconsistent, try using to_parquet(..., schema="infer"), or pass an explicit pyarrow schema. Such as to_parquet(..., schema={"column1": pa.string()})

This error message is super unhelpful: surely Dask knows what th
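The error names no columns. Below is a minimal sketch of a more informative check. It assumes each partition's schema is available as a plain {column: type name} dict, which is a simplification because the real schemas are pyarrow objects. The names `diff_partition_schemas` and `format_schema_error` are hypothetical. One plausible trigger is a column that is entirely null in some partition, so its type is inferred as null there:

```python
from typing import Dict, List


def diff_partition_schemas(schemas: List[Dict[str, str]]) -> Dict[str, Dict[int, str]]:
    """Return {column: {partition_index: type}} for columns whose type varies.

    A column absent from a partition is reported as "<missing>".
    """
    columns = sorted({col for schema in schemas for col in schema})
    differences: Dict[str, Dict[int, str]] = {}
    for col in columns:
        types = {i: schema.get(col, "<missing>") for i, schema in enumerate(schemas)}
        if len(set(types.values())) > 1:
            differences[col] = types
    return differences


def format_schema_error(schemas: List[Dict[str, str]]) -> str:
    """Build an error message that names the offending columns and partitions."""
    lines = ["Schemas are inconsistent:"]
    for col, types in diff_partition_schemas(schemas).items():
        detail = ", ".join(f"partition {i}: {t}" for i, t in types.items())
        lines.append(f"  column {col!r}: {detail}")
    return "\n".join(lines)


# Hypothetical example: column "b" is all-null in the second partition.
schemas = [
    {"a": "int64", "b": "string"},
    {"a": "int64", "b": "null"},
]
print(format_schema_error(schemas))
```

Until the message improves, the error's own suggestions still apply: `to_parquet(..., schema="infer")` or an explicit pyarrow schema.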

Is your feature request related to a problem? Please describe.
Our Python docstrings have various style violations when compared against standards like pep257. Not only does this impact readability (which may be subjective), it also reduces the effectiveness of tools like Sphinx or numpydoc that rely on specific formatting in order to parse docstrings.
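The function below is a hypothetical illustration of the target style. Its docstring follows PEP 257 (a one-line summary ending in a period, a blank line, then the details) and uses numpydoc's section layout. A checker such as pydocstyle can flag PEP 257 violations automatically:

```python
def clip(value: float, lower: float, upper: float) -> float:
    """Clip a value to the closed interval [lower, upper].

    Parameters
    ----------
    value : float
        The number to clip.
    lower : float
        Lower bound of the interval.
    upper : float
        Upper bound of the interval; must not be smaller than `lower`.

    Returns
    -------
    float
        `value` if it lies within the bounds, otherwise the nearest bound.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(lower, min(value, upper))
```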

Is your feature request related to a problem? Please describe.
Implements classification_report for classification metrics (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html).
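For reference, here is a minimal pure-Python sketch of what such a report computes per class. It is modelled on scikit-learn's precision, recall, f1-score and support columns. The function is illustrative only: it is not scikit-learn's implementation, it omits the averaged rows, and it assumes that zero-division cases yield 0.0:

```python
from typing import Dict, Hashable, Sequence


def classification_report(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, float]]:
    """Per-class precision, recall, F1 and support (labels must be sortable)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)  # tp + false positives
        support = sum(1 for t in y_true if t == label)    # tp + false negatives
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1-score": f1,
            "support": support,
        }
    return report


print(classification_report([0, 1, 1, 2], [0, 1, 0, 2]))
```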

The stumpy.snippets feature is now complete in #283, which follows this work:

We have a rough notebook t

tornado.IOLoop.run_sync is deprecated and must be removed from our code base.

The CLI scripts all call this, and replacing it with asyncio.run should be possible.

Caveats

The way we handle signals needs to be adjusted
Once asyncio.run finishes, we need to ensure the tornado loop is also closed
Behaviour of preload modules may be affected if they are using loops about whe
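Here is a minimal sketch of the replacement that uses only asyncio. `run_cli` is a hypothetical stand-in for a CLI's main coroutine. Signal handlers are attached to the running loop with `loop.add_signal_handler`, which works only on Unix and in the main thread, so the sketch falls back silently elsewhere. It does not address the tornado-loop closing or preload-module caveats:

```python
import asyncio
import signal
from typing import Optional


async def run_cli(stop_after: Optional[float] = None) -> str:
    """Hypothetical CLI body: run until SIGINT/SIGTERM (or an optional timeout)."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    installed = []
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            # The callback runs inside the event loop, so it may touch asyncio objects.
            loop.add_signal_handler(sig, stop.set)
            installed.append(sig)
        except (NotImplementedError, RuntimeError, ValueError):
            # Not supported on Windows event loops or outside the main thread.
            pass
    if stop_after is not None:
        loop.call_later(stop_after, stop.set)
    try:
        await stop.wait()  # the real CLI would start and await its server here
        return "stopped"
    finally:
        for sig in installed:
            loop.remove_signal_handler(sig)


def main() -> None:
    # Before (deprecated): IOLoop.current().run_sync(run_cli)
    # After: asyncio.run creates a fresh loop, runs the coroutine, then closes the loop.
    print(asyncio.run(run_cli(stop_after=0.1)))


if __name__ == "__main__":
    main()
```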

Feature Request

Is your feature request related to a problem? Please describe.

Whenever I report a bug, I need to confirm what satpy version I am using. This is of course important, but it's also an extra step that could be semi-automated.

Describe the solution you'd like

I would like debug_on() to print the relevant versions. When we report bugs, we anyway call `debu
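Here is a rough sketch of the kind of output debug_on() could emit, using the standard library's importlib.metadata. The helper name `format_versions` and the package list are hypothetical and are not satpy's API:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable


def format_versions(packages: Iterable[str]) -> str:
    """Return one 'name: version' line per package, marking missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}: {version(name)}")
        except PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


# Hypothetical selection of packages relevant to a bug report.
print(format_versions(["satpy", "pyresample", "numpy", "xarray", "dask"]))
```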

Is your feature request related to a problem? Please describe.
Look here

If we take just one row with our sorting, we can use GROUP BY and FIRST instead, which can be a lot faster. Let's add this special handling.
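This sketch assumes the request refers to taking the first row of each group after a presort. Under that reading, the two approaches below return the same rows. The first sorts everything (O(n log n)). The second keeps a running best row per group in a single pass (O(n)), which is roughly what a GROUP BY with a FIRST/MIN-style aggregate lets the engine do. This is a pure-Python illustration with hypothetical function names, not the project's implementation:

```python
from operator import itemgetter
from typing import Any, Callable, Dict, Hashable, Iterable

Row = Dict[str, Any]


def take_first_sorted(
    rows: Iterable[Row], key: Callable[[Row], Hashable], sort_key: Callable[[Row], Any]
) -> Dict[Hashable, Row]:
    """Reference: sort all rows, then keep the first row seen for each group."""
    result: Dict[Hashable, Row] = {}
    for row in sorted(rows, key=sort_key):  # stable sort: ties keep input order
        result.setdefault(key(row), row)
    return result


def take_first_grouped(
    rows: Iterable[Row], key: Callable[[Row], Hashable], sort_key: Callable[[Row], Any]
) -> Dict[Hashable, Row]:
    """Single pass: keep the row with the smallest sort key per group."""
    result: Dict[Hashable, Row] = {}
    for row in rows:
        k = key(row)
        # Strict '<' keeps the earliest row among ties, matching the stable sort.
        if k not in result or sort_key(row) < sort_key(result[k]):
            result[k] = row
    return result


rows = [
    {"user": "a", "ts": 3, "event": "logout"},
    {"user": "b", "ts": 1, "event": "login"},
    {"user": "a", "ts": 1, "event": "login"},
    {"user": "b", "ts": 2, "event": "click"},
]
by_user, by_ts = itemgetter("user"), itemgetter("ts")
print(take_first_grouped(rows, by_user, by_ts))
```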

Code Sample, a minimal, complete, and verifiable piece of code

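import numpy as np
from pyresample.boundary import Boundary
# 2D lon/lat grids, which Boundary accepts without complaint
my_lons, my_lats = np.meshgrid(np.linspace(0, 10, 5), np.linspace(40, 50, 5))
b = Boundary(my_lons, my_lats)
print(b.contour_poly.area())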

Problem description

The above code doesn't fail if the provided lons/lats are 2D (not sure about 3D+), but the class and all the functions and utilities underneath it assume 1D arrays. The end results are incor
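One option is to fail early instead of returning incorrect results. Here is a minimal sketch of such a guard. `check_1d_lonlats` is a hypothetical helper that Boundary's constructor could call, not an existing pyresample function:

```python
import numpy as np


def check_1d_lonlats(lons, lats):
    """Return lons/lats as arrays, raising ValueError unless both are 1D and equal-length."""
    lons = np.asarray(lons)
    lats = np.asarray(lats)
    if lons.ndim != 1 or lats.ndim != 1:
        raise ValueError(
            f"Boundary expects 1D lons/lats, got shapes {lons.shape} and {lats.shape}"
        )
    if lons.shape != lats.shape:
        raise ValueError(f"lons and lats differ in shape: {lons.shape} vs {lats.shape}")
    return lons, lats
```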

@romainr

The ML implementation is still somewhat experimental; we can improve it in these areas:

SHOW MODELS and DESCRIBE MODEL
Hyperparameter optimizations, AutoML-like behaviour
Exporting models, an idea brought up by @romainr (#191; ONNX export is still missing, see the discussion in the PR by @rajagurunath)
More showcases and examples

Does HyperGBM's make_experiment return the best model?
How does its parameter tuning work? That is, what is its search space (e.g. for XGBoost)?

from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=1, memory='1GB')
print(cluster.job_script())

#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=954M
#SBATCH -t 00:30:00

/home/lesteve/miniconda3/bin/python -m distributed.cli.dask_worker tcp://192.168.0.11:44065 --nthreads 1 --memory-limit 1000.00MB -
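The two memory values in the generated script are consistent with the same 1 GB expressed in different units. This rests on three assumptions: memory='1GB' is parsed as 10**9 bytes, SLURM's M suffix means mebibytes, and the mebibyte value is rounded up. The exact rounding dask-jobqueue applies is not confirmed here. Under those assumptions, the arithmetic is:

```python
import math

requested_bytes = 10**9  # memory="1GB", assuming a decimal (SI) gigabyte

# SLURM's --mem "M" suffix, assumed here to mean mebibytes (2**20 bytes), rounded up
slurm_mem_mib = math.ceil(requested_bytes / 2**20)

# The worker's --memory-limit, shown in decimal megabytes (10**6 bytes)
worker_limit_mb = requested_bytes / 10**6

print(f"--mem={slurm_mem_mib}M")                 # --mem=954M
print(f"--memory-limit {worker_limit_mb:.2f}MB")  # --memory-limit 1000.00MB
```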

Problem description

Reading a dataset with kartothek's eager read functionality raises a ValueError when columns are provided.

Example code (ideally copy-pastable)

import pandas as pd

from tempfile import TemporaryDirectory
from functools import partial
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset, read_dataset_as_data

Example for numerical weather prediction

To be added to the initialised datasets.

Data sources implemented or to be implemented:

Relates to #600.

Without thinking, I passed resampling="bilinear" and got an error when I called .compute():

Traceback (most recent call last):
  File "carajas.py", line 92, in <module>
    band_medianNP = band_median.compute()
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/dataarray.py", line 899, in compute
    return new.load(**kwargs)
  File "/home/ubuntu/anaco
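This assumes stackstac's `resampling` argument expects a `rasterio.enums.Resampling` member rather than a string. If so, the usual fix is `from rasterio.enums import Resampling` followed by `resampling=Resampling.bilinear`. The sketch below shows how a string could be validated up front, so the mistake surfaces at call time rather than at .compute(). It uses a local stand-in enum instead of rasterio's, and `coerce_resampling` is hypothetical:

```python
from enum import Enum
from typing import Union


class Resampling(Enum):
    """Local stand-in for rasterio.enums.Resampling (only a few members shown)."""

    nearest = 0
    bilinear = 1
    cubic = 2


def coerce_resampling(value: Union[str, Resampling]) -> Resampling:
    """Accept an enum member or its name, failing early on anything else."""
    if isinstance(value, Resampling):
        return value
    if isinstance(value, str):
        try:
            return Resampling[value]
        except KeyError:
            valid = ", ".join(m.name for m in Resampling)
            raise ValueError(f"Unknown resampling {value!r}; expected one of: {valid}") from None
    raise TypeError(f"resampling must be a str or Resampling, not {type(value).__name__}")
```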

https://forum.image.sc/t/imsave-tifffile-imwrite-returns-tiff-files-of-wrong-shape-in-imagej-cellprofiler/65332/9?u=jacksonmaxfield

The dim_order parameter should be passed to aicsimageio.transforms.reshape_data as the given dimension order, with TCZYX (optionally including S) as the return order.
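Here is a minimal NumPy sketch of the reordering, assuming `dim_order` describes the axes of the incoming array. `reorder_dims` is a hypothetical helper, not aicsimageio's reshape_data. To include S, pass `return_dims="TCZYXS"`:

```python
import numpy as np


def reorder_dims(data: np.ndarray, given_dims: str, return_dims: str = "TCZYX") -> np.ndarray:
    """Transpose `data` from `given_dims` order to `return_dims` order.

    Dimensions in `return_dims` but absent from `given_dims` are added as size-1 axes.
    Dropping dimensions is not handled in this sketch.
    """
    given_dims, return_dims = given_dims.upper(), return_dims.upper()
    if data.ndim != len(given_dims):
        raise ValueError(f"data has {data.ndim} dims but given_dims is {given_dims!r}")
    dropped = [d for d in given_dims if d not in return_dims]
    if dropped:
        raise ValueError(f"dimensions {dropped} are not in return_dims {return_dims!r}")
    missing = [d for d in return_dims if d not in given_dims]
    # Append size-1 axes for the missing dimensions, then permute into place.
    expanded = data.reshape(data.shape + (1,) * len(missing))
    current = given_dims + "".join(missing)
    return expanded.transpose([current.index(d) for d in return_dims])


img = np.arange(24).reshape(3, 4, 2)  # hypothetical array with dim_order "YXZ"
print(reorder_dims(img, "YXZ").shape)
```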

Currently, all computed metrics are independent of any target variable or column. If lens.summarise took the name of a column to use as the target variable, the output of some metrics could be more interpretable, even if the target variable is not used in any kind of predictive modelling.

A good example of this could be PCA (see #14), which could plot the different categories of the target va
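The sketch below is a hypothetical illustration, not lens's API. It breaks one metric down by a target column: the mean of every other column, computed separately for each target value:

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Hashable, List


def mean_by_target(rows: List[Dict[str, Any]], target: str) -> Dict[Hashable, Dict[str, float]]:
    """Mean of each non-target column, computed separately for every target value."""
    groups: Dict[Hashable, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[target]].append(row)
    summary = {}
    for value, members in groups.items():
        columns = [c for c in members[0] if c != target]
        summary[value] = {c: mean([r[c] for r in members]) for c in columns}
    return summary


rows = [
    {"height": 1.0, "weight": 2.0, "label": "a"},
    {"height": 3.0, "weight": 4.0, "label": "a"},
    {"height": 10.0, "weight": 20.0, "label": "b"},
]
print(mean_by_target(rows, target="label"))
```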



Here are 290 public repositories matching this topic...

dask / dask

rapidsai / cudf

pydata / xarray

mars-project / mars

TDAmeritrade / stumpy

jmcarpenter2 / swifter

ibis-project / ibis

dask / distributed

hi-primus / optimus

itamarst / eliot

pytroll / satpy

fugue-project / fugue

ranaroussi / pystore

polyaxon / datatile

pytroll / pyresample

timkpaine / paperboy

JiaweiZhuang / xESMF

dask-contrib / dask-sql

DataCanvasIO / HyperGBM

dask / dask-jobqueue

Ouranosinc / xclim

JDASoftwareGroup / kartothek

pangeo-data / climpred

aws-samples / amazon-sagemaker-local-mode

LDO-CERT / orochi

hi-primus / bumblebee

gjoseph92 / stackstac

AllenCellModeling / aicsimageio

dask / dask-ec2

facultyai / lens
