pygpu package¶
pygpu.gpuarray module¶
-
class
pygpu.gpuarray.
GpuArray
¶ Device array
To create instances of this class use
zeros()
, empty()
, or array()
. It cannot be instantiated directly.
You can also subclass this class and have the module create your instances by passing the cls argument to any method that returns a new GpuArray. This way of creating the class will NOT call your
__init__()
method.
You can also implement your own
__init__()
method, but you must take care to properly initialize the GpuArray C fields before using it or you will most likely crash the interpreter.
-
T
¶
-
astype
(dtype, order='A', copy=True)¶ Cast the elements of this array to a new type.
This function returns a new array with all elements cast to the supplied dtype, but otherwise unchanged.
If copy is False and the type and order already match, self is returned.
Parameters: - dtype (str or numpy.dtype or int) – type of the elements of the result
- order ({'A', 'C', 'F'}) – memory layout of the result
- copy (bool) – Always return a copy?
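The copy=False behaviour described above mirrors numpy.ndarray.astype. A minimal numpy-only sketch of those semantics (the GpuArray version behaves the same way, but on device memory):

```python
import numpy as np

# astype returns a new array unless copy=False and the dtype and
# order already match, in which case the object itself is returned.
a = np.arange(4, dtype='float32')

b = a.astype('float64')              # dtype differs: always a new array
c = a.astype('float32', copy=False)  # dtype and order match: no copy

print(b.dtype)   # float64
print(c is a)    # True
```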
-
base
¶
-
base_data
¶ Return a pointer to the backing OpenCL object.
-
context
¶
-
copy
(order='C')¶ Return a copy of this array.
Parameters: order ({'C', 'A', 'F'}) – memory layout of the copy
-
data
¶ Return a pointer to the raw OpenCL buffer object.
This will fail for arrays that have an offset.
-
dtype
¶ The dtype of the elements
-
flags
¶ Return a flags object describing the properties of this array.
- This is mostly numpy-compatible with some exceptions:
- Flags are always constant (numpy allows modification of certain flags in certain circumstances).
- OWNDATA is always True, since the data is refcounted in libgpuarray.
- UPDATEIFCOPY is not supported, therefore always False.
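The numpy-compatible part of this interface can be previewed on an ordinary ndarray; only the exceptions listed above differ on a GpuArray:

```python
import numpy as np

# The same attribute names exist on a GpuArray's flags object; on a
# GpuArray, OWNDATA is always True and UPDATEIFCOPY is always False.
a = np.zeros((2, 3))
print(a.flags.c_contiguous)    # True: freshly allocated C-order array
print(a.flags.f_contiguous)    # False for a 2-d C-order array
print(a.T.flags.f_contiguous)  # True: the transpose is Fortran-contiguous
```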
-
get_ipc_handle
()¶
-
gpudata
¶ Return a pointer to the raw backend object.
-
itemsize
¶ The size of the base element.
-
ndim
¶ The number of dimensions in this object
-
offset
¶ Return the offset into the gpudata pointer for this array.
-
read
(dst)¶ Reads from this GpuArray into host’s Numpy array.
This method is as fast as, or faster than, the __array__() method (and thus numpy.asarray()), because it skips the allocation of a new buffer in host memory to hold the device's GpuArray: it uses the existing NumPy ndarray dst as the destination buffer. The GpuArray and the NumPy array must be compatible in byte size, contiguity and data type; dst must be writeable and properly aligned in host memory; and self must be contiguous. This GpuArray and dst may have different shapes.
Parameters: dst (numpy.ndarray) – destination array in host Raises: ValueError
– If this GpuArray is not compatible with dst or if dst is not well behaved.
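The pattern read() exploits can be sketched host-side with numpy alone: reuse a preallocated destination buffer rather than allocating a new one. Here np.copyto merely stands in for the device-to-host transfer; this is an illustration of the buffer-reuse idea, not pygpu code:

```python
import numpy as np

src = np.arange(6, dtype='float32')      # stands in for the device array
dst = np.empty((2, 3), dtype='float32')  # preallocated, same byte size

# Shapes may differ as long as byte size, dtype and contiguity agree;
# the copy goes into the existing buffer, no new allocation happens.
np.copyto(dst.reshape(-1), src)
print(dst.shape)  # (2, 3)
```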
-
reshape
(shape, order='C')¶ Returns a new array with the given shape and order.
The new shape must have the same size (total number of elements) as the current one.
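The same-size requirement is identical to numpy's; a quick numpy demonstration of both the accepted and the rejected case:

```python
import numpy as np

a = np.arange(12)
print(a.reshape((3, 4)).shape)   # (3, 4): 12 elements either way

try:
    a.reshape((5, 3))            # 15 != 12 elements: rejected
except ValueError as e:
    print("rejected:", e)
```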
-
shape
¶ shape of this ndarray (tuple)
-
size
¶ The number of elements in this object.
-
strides
¶ data pointer strides (in bytes)
-
sync
()¶ Wait for all pending operations on this array.
This is done automatically when reading or writing from it, but can be useful as a separate operation for timings.
-
take1
(idx)¶
-
transfer
(new_ctx)¶
-
transpose
(*params)¶
-
typecode
¶ The gpuarray typecode for the data type of the array
-
view
(cls=GpuArray)¶ Return a view of this array.
The returned array shares device data with this one; changes made through either will be visible in both.
Parameters: cls (type) – class of the view (must inherit from GpuArray)
-
write
(src)¶ Writes host’s Numpy array to device’s GpuArray.
This method is as fast as, or faster than, asarray(), because it skips any allocation of a buffer in device memory: it writes src into this already-allocated GpuArray buffer. The GpuArray and the NumPy array must be compatible in byte size and data type, and the GpuArray must be well behaved and contiguous. If src is not properly aligned or compatible in contiguity, it will first be copied to a new NumPy array. This GpuArray and src may have different shapes.
Parameters: src (numpy.ndarray) – source array in host Raises: ValueError
– If this GpuArray is not compatible with src or if it is not well behaved or contiguous.
-
-
exception
pygpu.gpuarray.
GpuArrayException
¶ Exception used for most errors related to libgpuarray.
-
class
pygpu.gpuarray.
GpuContext
¶ Class that holds all the information pertaining to a context.
The currently implemented modules (for the kind parameter) are “cuda” and “opencl”; which of these are available depends on the build options for libgpuarray.
The flag values are defined in the gpuarray/buffer.h header and are in the “Context flags” group. If you want to use more than one value you must bitwise OR them together.
If you want an alternative interface check
init()
.
Parameters: - kind (str) – module name for the context
- devno (int) – device number
- flags (int) – context flags
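Combining several context flags is plain bitwise arithmetic. The flag names and values below are hypothetical placeholders (the real ones live in gpuarray/buffer.h, in the "Context flags" group); only the OR-combination pattern is the point:

```python
# Hypothetical flag values; the real constants are defined in
# gpuarray/buffer.h and are not invented here.
GA_CTX_HYPOTHETICAL_A = 0x01
GA_CTX_HYPOTHETICAL_B = 0x04

# To use more than one flag, bitwise-OR them together.
flags = GA_CTX_HYPOTHETICAL_A | GA_CTX_HYPOTHETICAL_B
print(hex(flags))                            # 0x5
print(bool(flags & GA_CTX_HYPOTHETICAL_B))   # True: B is set
```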
-
bin_id
¶ Binary compatibility id
-
devname
¶ Device name for this context
-
free_gmem
¶ Size of free global memory on the device
-
kind
¶
-
largest_memblock
¶ Size of the largest memory block you can allocate
-
lmemsize
¶ Size of the local (shared) memory, in bytes, for this context
-
maxgsize0
¶ Maximum global size for dimension 0
-
maxgsize1
¶ Maximum global size for dimension 1
-
maxgsize2
¶ Maximum global size for dimension 2
-
maxlsize0
¶ Maximum local size for dimension 0
-
maxlsize1
¶ Maximum local size for dimension 1
-
maxlsize2
¶ Maximum local size for dimension 2
-
numprocs
¶ Number of compute units for this context
-
ptr
¶ Raw pointer value for the context object
-
total_gmem
¶ Total size of global memory on the device
-
unique_id
¶ Device PCI Bus ID for this context
-
class
pygpu.gpuarray.
GpuKernel
(source, name, types, context=None, have_double=False, have_small=False, have_complex=False, have_half=False, cuda=False, opencl=False)¶ Compile a kernel on the device
The kernel function is retrieved using the provided name which must match what you named your kernel in source. You can safely reuse the same name multiple times.
The have_* parameters are there to tell libgpuarray that we need the particular type or feature to work for this kernel. If the request can’t be satisfied a
UnsupportedException
will be raised in the constructor.
Once you have the kernel object you can simply call it like so:
k = GpuKernel(...)
k(param1, param2, n=n)
where n is the minimum number of threads to run. libgpuarray will try to stay close to this number but may run a few more threads to match the hardware preferred multiple and stay efficient. You should watch out for this in your code and make sure to test against the size of your data.
If you want more control over thread allocation you can use the gs and ls parameters like so:
k = GpuKernel(...)
k(param1, param2, gs=gs, ls=ls)
If you choose to use this interface, make sure to stay within the limits of k.maxlsize or the call will fail.
Parameters: - source (str) – complete kernel source code
- name (str) – function name of the kernel
- types (list or tuple) – list of argument types
- context (GpuContext) – device on which the kernel is compiled
- have_double (bool) – ensure working doubles?
- have_small (bool) – ensure types smaller than float will work?
- have_complex (bool) – ensure complex types will work?
- have_half (bool) – ensure half-floats will work?
- cuda (bool) – kernel is cuda code?
- opencl (bool) – kernel is opencl code?
Notes
With the cuda backend, unless you use the cluda include, you must either pass the mangled name of your kernel or declare the function ‘extern “C”’, because cuda uses a C++ compiler unconditionally.
Warning
If you do not set the have_ flags properly, you will either get a device-specific error (the good case) or silently get completely bogus data (the bad case).
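The "a few more threads" behaviour described above is simple round-up arithmetic against the preferred multiple (exposed as the preflsize attribute below). A pure-Python sketch of that rounding, and why kernels must bounds-check against the real data size:

```python
def launched_threads(n, preflsize):
    """Round n up to the next multiple of preflsize (a sketch of the
    sizing rule described in the text, not libgpuarray's exact code)."""
    return ((n + preflsize - 1) // preflsize) * preflsize

# Asking for 1000 threads with a preferred multiple of 32 launches 1024,
# i.e. 24 extra threads.  This is why the kernel body must test its
# global id against the actual data size before touching memory.
print(launched_threads(1000, 32))  # 1024
```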
-
context
¶
-
maxlsize
¶ Maximum local size for this kernel
-
numargs
¶ Number of arguments to kernel
-
preflsize
¶ Preferred multiple for local size for this kernel
-
exception
pygpu.gpuarray.
UnsupportedException
¶
-
pygpu.gpuarray.
abi_version
()¶
-
pygpu.gpuarray.
api_version
()¶
-
pygpu.gpuarray.
array
(obj, dtype='float64', copy=True, order=None, ndmin=0, context=None, cls=None)¶ Create a GpuArray from existing data
This function creates a new GpuArray from the data provided in obj except if obj is already a GpuArray and all the parameters match its properties and copy is False.
The properties of the resulting array depend on the input data except if overridden by other parameters.
This function is similar to
numpy.array()
except that it returns GpuArrays.
Parameters: - obj (array-like) – data to initialize the result
- dtype (string or numpy.dtype or int) – data type of the result elements
- copy (bool) – return a copy?
- order (str) – memory layout of the result
- ndmin (int) – minimum number of result dimensions
- context (GpuContext) – allocation context
- cls (type) – result class (must inherit from GpuArray)
-
pygpu.gpuarray.
asarray
(a, dtype=None, order='A', context=None)¶ Returns a GpuArray from the data in a
If a is already a GpuArray and all other parameters match, then the object itself is returned. If a is an instance of a subclass of GpuArray then a view of the base class will be returned. Otherwise a new object is created and the data is copied into it.
context is optional if a is a GpuArray (but must match exactly the context of a if specified) and is mandatory otherwise.
Parameters: - a (array-like) – data
- dtype (str, numpy.dtype or int) – type of the elements
- order ({'A', 'C', 'F'}) – layout of the data in memory, one of ‘A’ny, ‘C’ or ‘F’ortran
- context (GpuContext) – context in which to do the allocation
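The return-the-object-itself behaviour matches numpy.asarray, which can be demonstrated without a GPU:

```python
import numpy as np

# Like pygpu's asarray, numpy.asarray returns the input object itself
# when the dtype and layout already match; a cast forces a copy.
a = np.arange(4, dtype='float32')
print(np.asarray(a) is a)                   # True: nothing to convert
print(np.asarray(a, dtype='float64') is a)  # False: cast made a copy
```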
-
pygpu.gpuarray.
ascontiguousarray
(a, dtype=None, context=None)¶ Returns a contiguous array in device memory (C order).
context is optional if a is a GpuArray (but must match exactly the context of a if specified) and is mandatory otherwise.
Parameters: - a (array-like) – input
- dtype (str, numpy.dtype or int) – type of the return array
- context (GpuContext) – context to use for a new array
-
pygpu.gpuarray.
asfortranarray
(a, dtype=None, context=None)¶ Returns a contiguous array in device memory (Fortran order)
context is optional if a is a GpuArray (but must match exactly the context of a if specified) and is mandatory otherwise.
Parameters: - a (array-like) – input
- dtype (str, numpy.dtype or int) – type of the elements
- context (GpuContext) – context in which to do the allocation
-
pygpu.gpuarray.
cl_wrap_ctx
(ptr)¶ Wrap an existing OpenCL context (the cl_context struct) into a GpuContext class.
-
pygpu.gpuarray.
count_devices
(kind, platform)¶ Returns the number of devices in the host’s platform that are compatible with kind.
-
pygpu.gpuarray.
count_platforms
(kind)¶ Return the number of the host’s platforms compatible with kind.
-
pygpu.gpuarray.
cuda_wrap_ctx
(ptr)¶ Wrap an existing CUDA driver context (CUcontext) into a GpuContext class.
If own is true, libgpuarray is now responsible for the context and it will be destroyed once there are no references to it. Otherwise, the context will not be destroyed and it is the calling code’s responsibility.
-
pygpu.gpuarray.
dtype_to_ctype
(dtype)¶ Return the C name for a type.
Parameters: dtype (numpy.dtype) – type to get the name for
-
pygpu.gpuarray.
dtype_to_typecode
(dtype)¶ Get the internal typecode for a type.
Parameters: dtype (numpy.dtype) – type to get the code for
-
pygpu.gpuarray.
empty
(shape, dtype='float64', order='C', context=None, cls=None)¶ Returns an empty (uninitialized) array of the requested shape, type and order.
Parameters: - shape (iterable of ints) – number of elements in each dimension
- dtype (str, numpy.dtype or int) – type of the elements
- order ({'A', 'C', 'F'}) – layout of the data in memory, one of ‘A’ny, ‘C’ or ‘F’ortran
- context (GpuContext) – context in which to do the allocation
- cls (type) – class of the returned array (must inherit from GpuArray)
-
class
pygpu.gpuarray.
flags
¶ -
aligned
¶
-
behaved
¶
-
c_contiguous
¶
-
carray
¶
-
contiguous
¶
-
f_contiguous
¶
-
farray
¶
-
fnc
¶
-
forc
¶
-
fortran
¶
-
num
¶
-
owndata
¶
-
updateifcopy
¶
-
writeable
¶
-
-
pygpu.gpuarray.
from_gpudata
(data, offset, dtype, shape, context=None, strides=None, writable=True, base=None, cls=None)¶ Build a GpuArray from pre-allocated gpudata
Parameters: - data (int) – pointer to a gpudata structure
- offset (int) – offset to the data location inside the gpudata
- dtype (numpy.dtype) – data type of the gpudata elements
- shape (iterable of ints) – shape to use for the result
- context (GpuContext) – context of the gpudata
- strides (iterable of ints) – strides for the results (C contiguous if not specified)
- writable (bool) – is the data writable?
- base (object) – base object that keeps gpudata alive
- cls (type) – view type of the result
Notes
This function might be deprecated in a later release since the only way to create gpudata pointers is through libgpuarray functions that aren’t exposed at the Python level. It can be used with the value of the gpudata attribute of an existing GpuArray.
Warning
This function is intended for advanced use and will crash the interpreter if used improperly.
-
pygpu.gpuarray.
get_default_context
()¶ Return the currently defined default context (or None).
-
pygpu.gpuarray.
init
()¶ init(dev, sched='default', single_stream=False, kernel_cache_path=None, max_cache_size=sys.maxsize, initial_cache_size=0)
Creates a context from a device specifier.
Device specifiers are composed of the type string and the device id like so:
"cuda0" "opencl0:1"
For cuda the device id is the numeric identifier. You can see what devices are available by running nvidia-smi on the machine. Be aware that the ordering in nvidia-smi might not correspond to the ordering in this library. This is due to how cuda enumerates devices. If you don’t specify a number (e.g. ‘cuda’) the first available device will be selected according to the backend order.
For opencl the device id is the platform number, a colon (:) and the device number. On Debian, the clinfo package can list available platforms and devices. Alternatively, you can experiment with the values; unavailable ones will just raise an error, and there are no gaps in the valid numbers.
Parameters: - dev (str) – device specifier
- sched ({'default', 'single', 'multi'}) – optimize scheduling for which type of operation
- disable_alloc_cache (bool) – disable allocation cache (if any)
- single_stream (bool) – enable single stream mode
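The specifier strings decompose into a backend name plus optional numeric ids. A small illustrative parser (an assumption for clarity, not pygpu's actual parsing code; parse_spec is a hypothetical helper):

```python
import re

def parse_spec(spec):
    """Hypothetical sketch: split 'cuda0' / 'opencl0:1' style specifiers
    into (kind, ids...).  Not pygpu's real parser."""
    m = re.fullmatch(r'(cuda|opencl)(\d+(?::\d+)?)?', spec)
    if m is None:
        raise ValueError('bad device specifier: %r' % spec)
    kind, ids = m.group(1), m.group(2)
    if ids is None:
        return kind, None                     # first available device
    return (kind,) + tuple(int(i) for i in ids.split(':'))

print(parse_spec('cuda0'))      # ('cuda', 0)
print(parse_spec('opencl0:1'))  # ('opencl', 0, 1)  platform 0, device 1
```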
-
pygpu.gpuarray.
open_ipc_handle
(c, hpy, l)¶ Open an IPC handle to get a new GpuArray from it.
Parameters: - c (GpuContext) – context
- hpy (bytes) – binary handle data received
- l (int) – size of the referred memory block
-
pygpu.gpuarray.
register_dtype
(dtype, cname)¶ Make a new type known to the cluda machinery.
This function returns the associated internal typecode for the new type.
Parameters: - dtype (numpy.dtype) – new type
- cname (str) – C name for the type declarations
-
pygpu.gpuarray.
set_default_context
(ctx)¶ Set the default context for the module.
The provided context will be used as a default value for all the other functions in this module which take a context as parameter. Call with None to clear the default value.
If you don’t call this function, the context argument to all other functions in this module is mandatory.
This can be helpful to reduce clutter when working with only one context. It is strongly discouraged to use this function when working with multiple contexts at once.
Parameters: ctx (GpuContext) – default context
-
pygpu.gpuarray.
zeros
(shape, dtype='float64', order='C', context=None, cls=None)¶ Returns an array of zero-initialized values of the requested shape, type and order.
Parameters: - shape (iterable of ints) – number of elements in each dimension
- dtype (str, numpy.dtype or int) – type of the elements
- order ({'A', 'C', 'F'}) – layout of the data in memory, one of ‘A’ny, ‘C’ or ‘F’ortran
- context (GpuContext) – context in which to do the allocation
- cls (type) – class of the returned array (must inherit from GpuArray)
pygpu.elemwise module¶
-
class
pygpu.elemwise.
GpuElemwise
¶
-
pygpu.elemwise.
as_argument
(o, name, read=False, write=False)¶
-
pygpu.elemwise.
elemwise1
(a, op, oper=None, op_tmpl='res = %(op)sa', out=None, convert_f16=True)¶
-
pygpu.elemwise.
elemwise2
(a, op, b, ary, odtype=None, oper=None, op_tmpl='res = (%(out_t)s)a %(op)s (%(out_t)s)b', broadcast=False, convert_f16=True)¶
-
pygpu.elemwise.
ielemwise2
(a, op, b, oper=None, op_tmpl='a = a %(op)s b', broadcast=False, convert_f16=True)¶
-
pygpu.elemwise.
compare
(a, op, b, broadcast=False, convert_f16=True)¶
pygpu.operations module¶
-
pygpu.operations.
array_split
(ary, indices_or_sections, axis=0)¶
-
pygpu.operations.
atleast_1d
(*arys)¶
-
pygpu.operations.
atleast_2d
(*arys)¶
-
pygpu.operations.
atleast_3d
(*arys)¶
-
pygpu.operations.
concatenate
(arys, axis=0, context=None)¶
-
pygpu.operations.
dsplit
(ary, indices_or_sections)¶
-
pygpu.operations.
dstack
(tup, context=None)¶
-
pygpu.operations.
hsplit
(ary, indices_or_sections)¶
-
pygpu.operations.
hstack
(tup, context=None)¶
-
pygpu.operations.
split
(ary, indices_or_sections, axis=0)¶
-
pygpu.operations.
vsplit
(ary, indices_or_sections)¶
-
pygpu.operations.
vstack
(tup, context=None)¶
pygpu.reduction module¶
-
class
pygpu.reduction.
ReductionKernel
(context, dtype_out, neutral, reduce_expr, redux, map_expr=None, arguments=None, preamble='', init_nd=None)¶
-
pygpu.reduction.
massage_op
(operation)¶
-
pygpu.reduction.
parse_c_args
(arguments)¶
-
pygpu.reduction.
reduce1
(ary, op, neutral, out_type, axis=None, out=None, oper=None)¶
pygpu.blas module¶
-
pygpu.blas.
dot
(X, Y, Z=None, overwrite_z=False)¶
-
pygpu.blas.
gemm
(alpha, A, B, beta, C=None, trans_a=False, trans_b=False, overwrite_c=False)¶
-
pygpu.blas.
gemmBatch_3d
(alpha, A, B, beta, C=None, trans_a=False, trans_b=False, overwrite_c=False)¶
-
pygpu.blas.
gemv
(alpha, A, X, beta=0.0, Y=None, trans_a=False, overwrite_y=False)¶
-
pygpu.blas.
ger
(alpha, X, Y, A=None, overwrite_a=False)¶
pygpu.collectives module¶
-
class
pygpu.collectives.
GpuComm
(cid, ndev, rank)¶ Represents a communicator which participates in a multi-gpu clique.
It is used to invoke collective operations across the gpus inside its clique.
Parameters: - cid (GpuCommCliqueId) – Unique id shared among participating communicators.
- ndev (int) – Number of communicators inside the clique.
- rank (int) – User-defined rank of this communicator inside the clique. It influences order of collective operations.
-
all_gather
(self, src, dest=None, nd_up=1)¶ AllGather collective operation for ranks in a communicator world.
Parameters: - src (GpuArray) – Array to be gathered.
- dest (GpuArray) – Array to receive all gathered arrays from ranks in GpuComm.
- nd_up (int) – Used when creating result array. Indicates how many extra dimensions user wants result to have. Default is 1, which means that the result will store each rank’s gathered array in one extra new dimension.
Notes
- Providing nd_up == 0 means that gathered arrays will be appended to the dimension with the largest stride.
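The effect of nd_up on the result shape can be sketched in pure Python for the two cases described above. This is a guess at the shape rule from the prose, labelled as such; gathered_shape is a hypothetical helper, not part of pygpu:

```python
def gathered_shape(src_shape, ndev, nd_up):
    """Sketch of the all_gather result shape for nd_up in (0, 1).
    Hypothetical: inferred from the description, not pygpu's code."""
    if nd_up == 0:
        # Gathered arrays are appended along the dimension with the
        # largest stride (dimension 0 for a C-contiguous source).
        return (src_shape[0] * ndev,) + tuple(src_shape[1:])
    # One new leading dimension holds each rank's gathered array.
    return (ndev,) + tuple(src_shape)

print(gathered_shape((2, 3), ndev=4, nd_up=1))  # (4, 2, 3)
print(gathered_shape((2, 3), ndev=4, nd_up=0))  # (8, 3)
```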
-
all_reduce
(self, src, op, dest=None)¶ AllReduce collective operation for ranks in a communicator world.
Parameters: Notes
- Not providing the dest argument will result in creating a new compatible
GpuArray
and returning the result in it.
-
broadcast
(self, array, root=-1)¶ Broadcast collective operation for ranks in a communicator world.
Parameters: - array (GpuArray) – Array to be reduced.
- root (int) – Rank in GpuComm which broadcasts its array.
Notes
- root is necessary when invoking from a non-root rank; the root caller does not need to provide the root argument.
-
count
¶ Total number of communicators inside the clique
-
rank
¶ User-defined rank of this communicator inside the clique
-
reduce
(self, src, op, dest=None, root=-1)¶ Reduce collective operation for ranks in a communicator world.
Parameters: Notes
- root is necessary when invoking from a non-root rank; the root caller does not need to provide the root argument.
- Not providing the dest argument for a root caller will result in creating a new compatible
GpuArray
and returning the result in it.
-
reduce_scatter
(self, src, op, dest=None)¶ ReduceScatter collective operation for ranks in a communicator world.
Parameters: Notes
- Not providing the dest argument will result in creating a new compatible
GpuArray
and returning the result in it.
-
class
pygpu.collectives.
GpuCommCliqueId
(context=None, comm_id=None)¶ Represents a unique id shared among
GpuComm
communicators which participate in a multi-gpu clique.
Parameters: - context (GpuContext) – The gpu context to which this GpuCommCliqueId object belongs.
- comm_id (bytes) – Existing unique id to be passed in this object.
-
context
¶
pygpu.dtypes module¶
Type mapping helpers.
-
pygpu.dtypes.
dtype_to_ctype
(dtype)¶ Return the C type that corresponds to dtype.
Parameters: dtype (data type) – a numpy dtype
-
pygpu.dtypes.
get_common_dtype
(obj1, obj2, allow_double)¶ Returns the proper output type for a numpy operation involving the two provided objects. This may not be suitable for certain obscure numpy operations.
If allow_double is False, a return type of float64 will be forced to float32 and complex128 will be forced to complex64.
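The demotion rule just described can be sketched with numpy's own promotion machinery. This is an illustration of the rule, not pygpu's exact implementation; common_dtype is a hypothetical helper:

```python
import numpy as np

def common_dtype(obj1, obj2, allow_double):
    """Sketch: numpy-style common type, with doubles demoted when
    allow_double is False.  Not pygpu's exact code."""
    dt = np.result_type(obj1, obj2)
    if not allow_double:
        if dt == np.float64:
            dt = np.dtype('float32')
        elif dt == np.complex128:
            dt = np.dtype('complex64')
    return dt

print(common_dtype(np.float32(1), np.float64(1), True))   # float64
print(common_dtype(np.float32(1), np.float64(1), False))  # float32
```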
-
pygpu.dtypes.
get_np_obj
(obj)¶ Returns a numpy object of the same dtype and behaviour as the source, suitable for output dtype determination.
This is used since the casting rules of numpy are rather obscure and the best way to imitate them is to try an operation and see what it does.
-
pygpu.dtypes.
parse_c_arg_backend
(c_arg, scalar_arg_class, vec_arg_class)¶
-
pygpu.dtypes.
register_dtype
(dtype, c_names)¶ Associate a numpy dtype with its C equivalents.
Will register dtype for use with the gpuarray module. If the c_names argument is a list then the first element of that list is taken as the primary association and will be used for generated C code. The other types will be mapped to the provided dtype when going in the other direction.
Parameters: - dtype (numpy.dtype or string) – type to associate
- c_names (str or list) – list of C type names
-
pygpu.dtypes.
upcast
(*args)¶
pygpu.tools module¶
-
pygpu.tools.
as_argument
(obj, name)¶
-
pygpu.tools.
check_args
(args, collapse=False, broadcast=False)¶ Returns the properties of the arguments and checks that they all match (i.e. are all the same shape).
If collapse is True, dimension collapsing will be performed; if it is False, it will not.
If broadcast is True, array broadcasting will be performed, meaning that dimensions of size 1 in some arrays but not others will be repeated to match the size of the other arrays. If broadcast is False, no broadcasting takes place.
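The broadcast=True behaviour follows numpy's broadcasting rules, which can be checked without a GPU:

```python
import numpy as np

# Size-1 dimensions are stretched to match the other arguments.
print(np.broadcast_shapes((3, 1), (1, 4)))  # (3, 4)

try:
    np.broadcast_shapes((2, 3), (3, 4))     # 3 vs 4: incompatible
except ValueError as e:
    print("rejected:", e)
```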
-
pygpu.tools.
lru_cache
(maxsize=20)¶
-
pygpu.tools.
prod
(iterable)¶