Class CacheableData<T extends CacheBlock>
- java.lang.Object
-
- org.apache.sysds.runtime.instructions.cp.Data
-
- org.apache.sysds.runtime.controlprogram.caching.CacheableData<T>
-
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
FrameObject,MatrixObject,TensorObject
public abstract class CacheableData<T extends CacheBlock> extends Data
Each object of this class is a cache envelope for some large piece of data called "cache block". For example, the body of a matrix can be the cache block. The term cache block refers strictly to the cacheable portion of the data object, often excluding metadata and auxiliary parameters, as defined in the subclasses. Under the protection of the envelope, the data blob may be evicted to the file system; then the subclass must set its reference tonullto allow Java garbage collection. If other parts of the system continue keep references to the cache block, its eviction will not release any memory.- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classCacheableData.CacheStatusDefines all possible cache status types for a data blob.
-
Field Summary
Fields Modifier and Type Field Description static StringcacheEvictionLocalFilePathstatic StringcacheEvictionLocalFilePrefixstatic booleanCACHING_ASYNC_FILECLEANUPstatic booleanCACHING_ASYNC_SERIALIZEstatic booleanCACHING_BUFFER_PAGECACHEstatic LazyWriteBuffer.RPolicyCACHING_BUFFER_POLICYstatic StringCACHING_COUNTER_GROUP_NAMEstatic StringCACHING_EVICTION_FILEEXTENSIONstatic longCACHING_THRESHOLDstatic booleanCACHING_WRITE_CACHE_ON_READ
-
Method Summary
All Methods Static Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description TacquireModify(T newData)Acquires the exclusive "write" lock for a thread that wants to throw away the old cache block data and link up with new cache block data.TacquireRead()Acquires a shared "read-only" lock, produces the reference to the cache block, restores the cache block to main memory, reads from HDFS if needed.TacquireReadAndRelease()static voidaddBroadcastSize(long size)static voidcleanupCacheDir()static voidcleanupCacheDir(boolean withDir)Deletes the DML-script-specific caching working dir.voidclearData()voidclearData(long tid)Sets the cache block reference tonull, abandons the old block.static voiddisableCaching()static voidenableCaching()voidenableCleanup(boolean flag)Enables or disables the cleanup of the associated data object on clearData().voidexportData()voidexportData(int replication)Writes, or flushes, the cache block data to HDFS.voidexportData(String fName, String outputFormat)voidexportData(String fName, String outputFormat, int replication, FileFormatProperties formatProperties)Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop).voidexportData(String fName, String outputFormat, FileFormatProperties formatProperties)voidfreeEvictedBlob()Low-level cache I/O method that deletes the file containing the evicted data blob, without reading it.longgetBlocksize()BroadcastObject<T>getBroadcastHandle()static longgetBroadcastSize()LineageItemgetCacheLineage()longgetCompressedSize()DataCharacteristicsgetDataCharacteristics()longgetDataSize()StringgetDebugName()longgetDim(int dim)FederationMapgetFedMapping()Gets the mapping of indices ranges to federated objects.FileFormatPropertiesgetFileFormatProperties()StringgetFileName()GPUObjectgetGPUObject(GPUContext gCtx)MetaDatagetMetaData()longgetNumColumns()longgetNumRows()RDDObjectgetRDDHandle()CacheableData.CacheStatusgetStatus()longgetUniqueID()booleanhasValidLineage()static voidinitCaching()Inits caching with the default uuid of DMLScriptstatic voidinitCaching(String uuid)Creates the DML-script-specific caching working dir.static booleanisBelowCachingThreshold(CacheBlock data)booleanisCached(boolean inclCachedNoWrite)static booleanisCachingActive()booleanisCleanupEnabled()Indicates if cleanup of the associated data object is enabled on clearData().booleanisCompressed()booleanisDirty()trueif the in-memory or evicted matrix may be different from the matrix located at_hdfsFileName;falseif the two matrices are supposed to be the same.booleanisFederated()Check if object is federated.booleanisFederated(FTypes.FType type)booleanisFederatedExcept(FTypes.FType type)booleanisHDFSFileExists()booleanisPendingRDDOps()booleanmoveData(String fName, String outputFormat)abstract voidrefreshMetaData()voidrelease()Releases the shared ("read-only") or exclusive ("write") lock.voidremoveGPUObject(GPUContext gCtx)voidremoveMetaData()voidsetBroadcastHandle(BroadcastObject bc)voidsetCacheLineage(LineageItem li)voidsetCompressedSize(long size)voidsetDirty(boolean flag)voidsetEmptyStatus()voidsetFedMapping(FederationMap fedMapping)Sets the mapping of indices ranges to federated objects.voidsetFileFormatProperties(FileFormatProperties props)voidsetFileName(String file)voidsetGPUObject(GPUContext gCtx, GPUObject gObj)voidsetHDFSFileExists(boolean flag)voidsetMetaData(MetaData md)voidsetRDDHandle(RDDObject rdd)StringtoString()-
Methods inherited from class org.apache.sysds.runtime.instructions.cp.Data
getDataType, getPrivacyConstraint, getValueType, setPrivacyConstraints, updateDataCharacteristics
-
-
-
-
Field Detail
-
CACHING_THRESHOLD
public static final long CACHING_THRESHOLD
-
CACHING_BUFFER_POLICY
public static final LazyWriteBuffer.RPolicy CACHING_BUFFER_POLICY
-
CACHING_BUFFER_PAGECACHE
public static final boolean CACHING_BUFFER_PAGECACHE
- See Also:
- Constant Field Values
-
CACHING_WRITE_CACHE_ON_READ
public static final boolean CACHING_WRITE_CACHE_ON_READ
- See Also:
- Constant Field Values
-
CACHING_COUNTER_GROUP_NAME
public static final String CACHING_COUNTER_GROUP_NAME
- See Also:
- Constant Field Values
-
CACHING_EVICTION_FILEEXTENSION
public static final String CACHING_EVICTION_FILEEXTENSION
- See Also:
- Constant Field Values
-
CACHING_ASYNC_FILECLEANUP
public static final boolean CACHING_ASYNC_FILECLEANUP
- See Also:
- Constant Field Values
-
CACHING_ASYNC_SERIALIZE
public static final boolean CACHING_ASYNC_SERIALIZE
- See Also:
- Constant Field Values
-
cacheEvictionLocalFilePath
public static String cacheEvictionLocalFilePath
-
cacheEvictionLocalFilePrefix
public static String cacheEvictionLocalFilePrefix
-
-
Method Detail
-
enableCleanup
public void enableCleanup(boolean flag)
Enables or disables the cleanup of the associated data object on clearData().- Parameters:
flag- true if cleanup
-
isCleanupEnabled
public boolean isCleanupEnabled()
Indicates if cleanup of the associated data object is enabled on clearData().- Returns:
- true if cleanup enabled
-
getStatus
public CacheableData.CacheStatus getStatus()
-
isHDFSFileExists
public boolean isHDFSFileExists()
-
setHDFSFileExists
public void setHDFSFileExists(boolean flag)
-
getFileName
public String getFileName()
-
getUniqueID
public long getUniqueID()
-
setFileName
public void setFileName(String file)
-
isDirty
public boolean isDirty()
trueif the in-memory or evicted matrix may be different from the matrix located at_hdfsFileName;falseif the two matrices are supposed to be the same.- Returns:
- true if dirty
-
setDirty
public void setDirty(boolean flag)
-
getFileFormatProperties
public FileFormatProperties getFileFormatProperties()
-
setFileFormatProperties
public void setFileFormatProperties(FileFormatProperties props)
-
setMetaData
public void setMetaData(MetaData md)
- Overrides:
setMetaDatain classData
-
setCompressedSize
public void setCompressedSize(long size)
-
isCompressed
public boolean isCompressed()
-
getCompressedSize
public long getCompressedSize()
-
getMetaData
public MetaData getMetaData()
- Overrides:
getMetaDatain classData
-
removeMetaData
public void removeMetaData()
- Overrides:
removeMetaDatain classData
-
getDataCharacteristics
public DataCharacteristics getDataCharacteristics()
-
getDim
public long getDim(int dim)
-
getNumRows
public long getNumRows()
-
getNumColumns
public long getNumColumns()
-
getBlocksize
public long getBlocksize()
-
refreshMetaData
public abstract void refreshMetaData()
-
getCacheLineage
public LineageItem getCacheLineage()
-
setCacheLineage
public void setCacheLineage(LineageItem li)
-
hasValidLineage
public boolean hasValidLineage()
-
isFederated
public boolean isFederated()
Check if object is federated.- Returns:
- true if federated else false
-
isFederated
public boolean isFederated(FTypes.FType type)
-
isFederatedExcept
public boolean isFederatedExcept(FTypes.FType type)
-
getFedMapping
public FederationMap getFedMapping()
Gets the mapping of indices ranges to federated objects.- Returns:
- fedMapping mapping
-
setFedMapping
public void setFedMapping(FederationMap fedMapping)
Sets the mapping of indices ranges to federated objects.- Parameters:
fedMapping- mapping
-
getRDDHandle
public RDDObject getRDDHandle()
-
setRDDHandle
public void setRDDHandle(RDDObject rdd)
-
getBroadcastHandle
public BroadcastObject<T> getBroadcastHandle()
-
setBroadcastHandle
public void setBroadcastHandle(BroadcastObject bc)
-
getGPUObject
public GPUObject getGPUObject(GPUContext gCtx)
-
setGPUObject
public void setGPUObject(GPUContext gCtx, GPUObject gObj)
-
removeGPUObject
public void removeGPUObject(GPUContext gCtx)
-
acquireReadAndRelease
public T acquireReadAndRelease()
-
acquireRead
public T acquireRead()
Acquires a shared "read-only" lock, produces the reference to the cache block, restores the cache block to main memory, reads from HDFS if needed. Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). In-Status: EMPTY, EVICTABLE, EVICTED, READ; Out-Status: READ(+1).- Returns:
- cacheable data
-
acquireModify
public T acquireModify(T newData)
Acquires the exclusive "write" lock for a thread that wants to throw away the old cache block data and link up with new cache block data. Abandons the old data without reading it and sets the new data reference. In-Status: EMPTY, EVICTABLE, EVICTED; Out-Status: MODIFY.- Parameters:
newData- new data- Returns:
- cacheable data
-
release
public void release()
Releases the shared ("read-only") or exclusive ("write") lock. Updates size information, last-access time, metadata, etc. Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). In-Status: READ, MODIFY; Out-Status: READ(-1), EVICTABLE, EMPTY.
-
clearData
public void clearData()
-
clearData
public void clearData(long tid)
Sets the cache block reference tonull, abandons the old block. Makes the "envelope" empty. Run it to finalize the object (otherwise the evicted cache block file may remain undeleted). In-Status: EMPTY, EVICTABLE, EVICTED; Out-Status: EMPTY.- Parameters:
tid- thread ID
-
exportData
public void exportData()
-
exportData
public void exportData(int replication)
Writes, or flushes, the cache block data to HDFS. In-Status: EMPTY, EVICTABLE, EVICTED, READ; Out-Status: EMPTY, EVICTABLE, EVICTED, READ.- Parameters:
replication- ?
-
exportData
public void exportData(String fName, String outputFormat, FileFormatProperties formatProperties)
-
exportData
public void exportData(String fName, String outputFormat, int replication, FileFormatProperties formatProperties)
Synchronized because there might be parallel threads (parfor local) that access the same object (in case it was created before the loop). If all threads export the same data object concurrently it results in errors because they all write to the same file. Efficiency for loops and parallel threads is achieved by checking if the in-memory block is dirty. NOTE: MB: we do not use dfs copy from local (evicted) to HDFS because this would ignore the output format and most importantly would bypass reblocking during write (which effects the potential degree of parallelism). However, we copy files on HDFS if certain criteria are given.- Parameters:
fName- file nameoutputFormat- formatreplication- ?formatProperties- file format properties
-
freeEvictedBlob
public final void freeEvictedBlob()
Low-level cache I/O method that deletes the file containing the evicted data blob, without reading it. Must be defined by a subclass, never called by users.
-
isBelowCachingThreshold
public static boolean isBelowCachingThreshold(CacheBlock data)
-
getDataSize
public long getDataSize()
-
getDebugName
public String getDebugName()
- Specified by:
getDebugNamein classData
-
isCached
public boolean isCached(boolean inclCachedNoWrite)
-
setEmptyStatus
public void setEmptyStatus()
-
isPendingRDDOps
public boolean isPendingRDDOps()
-
addBroadcastSize
public static void addBroadcastSize(long size)
-
getBroadcastSize
public static long getBroadcastSize()
-
cleanupCacheDir
public static void cleanupCacheDir()
-
cleanupCacheDir
public static void cleanupCacheDir(boolean withDir)
Deletes the DML-script-specific caching working dir.- Parameters:
withDir- if true, delete directory
-
initCaching
public static void initCaching() throws IOExceptionInits caching with the default uuid of DMLScript- Throws:
IOException- if IOException occurs
-
initCaching
public static void initCaching(String uuid) throws IOException
Creates the DML-script-specific caching working dir. Takes the UUID in order to allow for custom uuid, e.g., for remote parfor caching- Parameters:
uuid- ID- Throws:
IOException- if IOException occurs
-
isCachingActive
public static boolean isCachingActive()
-
disableCaching
public static void disableCaching()
-
enableCaching
public static void enableCaching()
-
-