.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cluster/plot_face_compress.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_cluster_plot_face_compress.py>`
        to download the full example code or to run this example in your browser via JupyterLite or Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_face_compress.py:


===========================
Vector Quantization Example
===========================

This example shows how :class:`~sklearn.preprocessing.KBinsDiscretizer` can be used
to perform vector quantization on a toy image, the raccoon face.

.. GENERATED FROM PYTHON SOURCE LINES 9-13

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause


.. GENERATED FROM PYTHON SOURCE LINES 14-24

Original image
--------------

We start by loading the raccoon face image from SciPy. We also check some
information about the image, such as its shape and the data type used to
store it.

Note that, depending on the SciPy version, the import has to be adapted
because the function returning the image does not live in the same module.
Also, SciPy >= 1.10 requires the package `pooch` to be installed.

.. GENERATED FROM PYTHON SOURCE LINES 24-37

.. code-block:: Python

    import matplotlib.pyplot as plt
    from sklearn.preprocessing import KBinsDiscretizer

    try:  # SciPy >= 1.10
        from scipy.datasets import face
    except ImportError:  # older SciPy versions
        from scipy.misc import face

    raccoon_face = face(gray=True)

    print(f"The dimension of the image is {raccoon_face.shape}")
    print(f"The data used to encode the image is of type {raccoon_face.dtype}")
    print(f"The number of bytes taken in RAM is {raccoon_face.nbytes}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The dimension of the image is (768, 1024)
    The data used to encode the image is of type uint8
    The number of bytes taken in RAM is 786432


.. GENERATED FROM PYTHON SOURCE LINES 38-44

Thus, the image is a 2D array of 768 pixels in height and 1024 pixels in width.
Each value is an 8-bit unsigned integer, which means that the image is encoded
using 8 bits per pixel. The total memory usage of the image is 786,432 bytes,
about 786 kilobytes (1 byte is equal to 8 bits).
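These figures follow from the shape and the data type alone. A quick check with
a zero-filled array of the same shape and dtype (a stand-in, so that no dataset
download is needed) reproduces them:

.. code-block:: Python

    import numpy as np

    # Synthetic stand-in with the same shape and dtype as the raccoon face image.
    image = np.zeros((768, 1024), dtype=np.uint8)

    n_pixels = image.shape[0] * image.shape[1]
    print(n_pixels)  # 786432 pixels
    print(image.nbytes)  # 1 byte per pixel, so 786432 bytes
    print(2**8)  # 256 distinct gray levels for an 8-bit pixel
    print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # value range: 0 255
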

Using 8-bit unsigned integers means that the image is encoded with at most 256
different shades of gray. We can check the distribution of these values.

.. GENERATED FROM PYTHON SOURCE LINES 44-56

.. code-block:: Python


    fig, ax = plt.subplots(ncols=2, figsize=(12, 4))

    ax[0].imshow(raccoon_face, cmap=plt.cm.gray)
    ax[0].axis("off")
    ax[0].set_title("Rendering of the image")
    ax[1].hist(raccoon_face.ravel(), bins=256)
    ax[1].set_xlabel("Pixel value")
    ax[1].set_ylabel("Count of pixels")
    ax[1].set_title("Distribution of the pixel values")
    _ = fig.suptitle("Original image of a raccoon face")


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_001.png
   :alt: Original image of a raccoon face, Rendering of the image, Distribution of the pixel values
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_face_compress_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 57-72

Compression via vector quantization
-----------------------------------

The idea behind compression via vector quantization is to reduce the number of
gray levels used to represent an image. For instance, we can use 8 values
instead of 256 values. This means that we could use 3 bits instead of 8 bits to
encode a single pixel and thereby reduce the memory usage by a factor of 8/3,
roughly 2.7. We will discuss this memory usage later.
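The 3-bit figure and the 8/3 factor are plain arithmetic: indexing :math:`k`
gray levels needs :math:`\lceil \log_2 k \rceil` bits. The snippet below spells
this out; the variable names are illustrative only:

.. code-block:: Python

    import math

    n_levels_original = 256
    n_levels_quantized = 8

    # Minimum number of bits needed to index each set of gray levels.
    bits_original = math.ceil(math.log2(n_levels_original))  # 8
    bits_quantized = math.ceil(math.log2(n_levels_quantized))  # 3

    # Theoretical reduction if each pixel could be stored with exactly that many bits.
    reduction = bits_original / bits_quantized
    print(f"{bits_original} bits -> {bits_quantized} bits, reduction factor {reduction:.2f}")

This is a theoretical bound: it assumes pixels can be stored with exactly 3 bits
each, which standard array data types do not provide.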

Encoding strategy
"""""""""""""""""""

The compression can be done using a
:class:`~sklearn.preprocessing.KBinsDiscretizer`. We need to choose a strategy
to define the 8 gray values to quantize to. The simplest strategy is to space
them equally, which corresponds to setting `strategy="uniform"`. From the
previous histogram, we know that this strategy is not necessarily optimal.
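To make the uniform strategy concrete, here is a minimal NumPy sketch of
equal-width binning applied to a few hypothetical pixel values. The helper
`uniform_codes` is hypothetical: it assumes edges spanning the minimum and the
maximum of the data, and it illustrates the idea rather than reproducing
scikit-learn's implementation.

.. code-block:: Python

    import numpy as np


    def uniform_codes(values, n_bins):
        """Hypothetical helper: map values to equal-width bin indices 0..n_bins-1."""
        values = np.asarray(values, dtype=np.float64)
        # n_bins + 1 equally spaced edges from the minimum to the maximum value.
        bin_edges = np.linspace(values.min(), values.max(), n_bins + 1)
        # Each value gets the number of inner edges lower than or equal to it,
        # so the maximum value falls into the last bin rather than a bin of its own.
        codes = np.searchsorted(bin_edges[1:-1], values, side="right")
        return codes, bin_edges


    pixels = np.array([0, 10, 31, 32, 100, 200, 249, 250], dtype=np.uint8)
    codes, bin_edges = uniform_codes(pixels, n_bins=8)
    print(bin_edges)  # 9 edges, 31.25 apart, from 0.0 to 250.0
    print(codes)  # [0 0 0 1 3 6 7 7]

Note how pixels 31 and 32, although nearly identical, land in different bins
because an edge sits at 31.25.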

.. GENERATED FROM PYTHON SOURCE LINES 72-95

.. code-block:: Python


    n_bins = 8
    encoder = KBinsDiscretizer(
        n_bins=n_bins,
        encode="ordinal",
        strategy="uniform",
        random_state=0,
    )
    compressed_raccoon_uniform = encoder.fit_transform(raccoon_face.reshape(-1, 1)).reshape(
        raccoon_face.shape
    )

    fig, ax = plt.subplots(ncols=2, figsize=(12, 4))
    ax[0].imshow(compressed_raccoon_uniform, cmap=plt.cm.gray)
    ax[0].axis("off")
    ax[0].set_title("Rendering of the image")
    ax[1].hist(compressed_raccoon_uniform.ravel(), bins=256)
    ax[1].set_xlabel("Pixel value")
    ax[1].set_ylabel("Count of pixels")
    ax[1].set_title("Distribution of the quantized pixel values")
    _ = fig.suptitle("Raccoon face compressed using 3 bits and a uniform strategy")


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_002.png
   :alt: Raccoon face compressed using 3 bits and a uniform strategy, Rendering of the image, Distribution of the quantized pixel values
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_face_compress_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 96-102

Qualitatively, we can spot some small regions where the effect of compression
is visible (e.g. the leaves in the bottom right corner). Overall, however, the
resulting image still looks good.

We observe that the pixel values have been mapped to 8 different values. We can
check the correspondence between these values and the original pixel values.
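Conversely, each ordinal code can be turned back into an approximate gray level
by looking up the center of its bin. A minimal sketch, assuming 8 equal-width
bins between 0 and 250 and a few hypothetical codes (stored as floats, like the
encoder output); as far as we know, `KBinsDiscretizer.inverse_transform` performs
a similar bin-center lookup for ordinal codes:

.. code-block:: Python

    import numpy as np

    # Assumed edges: 8 equal-width bins between 0 and 250.
    bin_edges = np.linspace(0, 250, 9)
    # The center of each bin is the midpoint of its two edges.
    bin_center = (bin_edges[:-1] + bin_edges[1:]) / 2
    print(bin_center)

    # Decode hypothetical ordinal codes by indexing the array of bin centers.
    codes = np.array([0.0, 3.0, 7.0])
    decoded = bin_center[codes.astype(np.intp)]
    print(decoded)

Decoding loses information: every pixel in a bin is replaced by the same
center, so the reconstruction error is at most half a bin width.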

.. GENERATED FROM PYTHON SOURCE LINES 102-107

.. code-block:: Python


    bin_edges = encoder.bin_edges_[0]
    bin_center = bin_edges[:-1] + (bin_edges[1:] - bin_edges[:-1]) / 2
    bin_center


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    array([ 15.625,  46.875,  78.125, 109.375, 140.625, 171.875, 203.125,
           234.375])


.. GENERATED FROM PYTHON SOURCE LINES 108-116

.. code-block:: Python

    _, ax = plt.subplots()
    ax.hist(raccoon_face.ravel(), bins=256)
    color = "tab:orange"
    for center in bin_center:
        ax.axvline(center, color=color)
        ax.text(center - 10, ax.get_ybound()[1] + 100, f"{center:.1f}", color=color)


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_003.png
   :alt: plot face compress
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_face_compress_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 117-122

As previously stated, the uniform quantization strategy is not optimal. Notice,
for instance, that the pixels mapped to the value 7 encode a rather small amount
of information, whereas the value 3 represents a large number of pixels. We can
instead use a clustering strategy such as k-means to find a better-suited
mapping.
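A minimal NumPy sketch of 1-D k-means (Lloyd's algorithm) on gray values shows
where such a mapping comes from. The helper `kmeans_1d` is hypothetical, and two
of its choices are assumptions of this sketch: initializing from the centers of
equal-width bins, and placing the bin edges halfway between consecutive sorted
centers. This matches the idea that values in a bin share the same nearest 1-D
k-means center, but it is not scikit-learn's code.

.. code-block:: Python

    import numpy as np


    def kmeans_1d(values, n_clusters, n_iter=100):
        """Hypothetical helper: Lloyd's algorithm on a 1-D array of values."""
        values = np.asarray(values, dtype=np.float64)
        # Assumption: start from the centers of equal-width bins.
        edges = np.linspace(values.min(), values.max(), n_clusters + 1)
        centers = (edges[:-1] + edges[1:]) / 2
        for _ in range(n_iter):
            # Assignment step: index of the closest center for every value.
            labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
            # Update step: move each center to the mean of its values,
            # keeping a center in place if no value was assigned to it.
            new_centers = np.array(
                [
                    values[labels == k].mean() if np.any(labels == k) else centers[k]
                    for k in range(n_clusters)
                ]
            )
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        centers = np.sort(centers)
        # Bin edges halfway between consecutive centers, closed by min and max.
        inner_edges = (centers[:-1] + centers[1:]) / 2
        bin_edges = np.concatenate(([values.min()], inner_edges, [values.max()]))
        return centers, bin_edges


    values = np.array([0, 1, 2, 50, 51, 52, 98, 99, 100])
    centers, bin_edges = kmeans_1d(values, n_clusters=3)
    print(centers)  # [ 1. 51. 99.]
    print(bin_edges)  # [  0.  26.  75. 100.]

Unlike equal-width bins, the centers follow where the values actually
concentrate. Lloyd's algorithm only finds a local optimum, so the initialization
matters.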

.. GENERATED FROM PYTHON SOURCE LINES 122-143

.. code-block:: Python


    encoder = KBinsDiscretizer(
        n_bins=n_bins,
        encode="ordinal",
        strategy="kmeans",
        random_state=0,
    )
    compressed_raccoon_kmeans = encoder.fit_transform(raccoon_face.reshape(-1, 1)).reshape(
        raccoon_face.shape
    )

    fig, ax = plt.subplots(ncols=2, figsize=(12, 4))
    ax[0].imshow(compressed_raccoon_kmeans, cmap=plt.cm.gray)
    ax[0].axis("off")
    ax[0].set_title("Rendering of the image")
    ax[1].hist(compressed_raccoon_kmeans.ravel(), bins=256)
    ax[1].set_xlabel("Pixel value")
    ax[1].set_ylabel("Count of pixels")
    ax[1].set_title("Distribution of the pixel values")
    _ = fig.suptitle("Raccoon face compressed using 3 bits and a K-means strategy")


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_004.png
   :alt: Raccoon face compressed using 3 bits and a K-means strategy, Rendering of the image, Distribution of the pixel values
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_face_compress_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 144-148

.. code-block:: Python

    bin_edges = encoder.bin_edges_[0]
    bin_center = bin_edges[:-1] + (bin_edges[1:] - bin_edges[:-1]) / 2
    bin_center


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    array([ 18.90885631,  53.34346583,  82.64447187, 109.28225276,
           134.70763101, 159.78681467, 185.17226834, 224.02069427])


.. GENERATED FROM PYTHON SOURCE LINES 149-157

.. code-block:: Python

    _, ax = plt.subplots()
    ax.hist(raccoon_face.ravel(), bins=256)
    color = "tab:orange"
    for center in bin_center:
        ax.axvline(center, color=color)
        ax.text(center - 10, ax.get_ybound()[1] + 100, f"{center:.1f}", color=color)


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_face_compress_005.png
   :alt: plot face compress
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_face_compress_005.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 158-166

The counts in the bins are now more balanced and their centers are no longer
equally spaced. Note that we could enforce the same number of pixels per bin by
using `strategy="quantile"` instead of `strategy="kmeans"`.
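The quantile idea can be sketched in a few lines of NumPy: place the edges at
evenly spaced quantiles of the data. The helper `quantile_codes` is
hypothetical, and the toy data are distinct integers; on an 8-bit image, many
pixels share the same value, so the counts per bin would only be approximately
equal.

.. code-block:: Python

    import numpy as np


    def quantile_codes(values, n_bins):
        """Hypothetical helper: bin edges at evenly spaced quantiles of the data."""
        values = np.asarray(values, dtype=np.float64)
        bin_edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
        # Same inner-edge lookup as for equal-width bins.
        codes = np.searchsorted(bin_edges[1:-1], values, side="right")
        return codes, bin_edges


    values = np.arange(800)
    codes, bin_edges = quantile_codes(values, n_bins=8)
    print(np.bincount(codes))  # [100 100 100 100 100 100 100 100]
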

Memory footprint
""""""""""""""""

We previously stated that encoding each pixel with 3 bits instead of 8 should
reduce the memory usage. Let's verify it.

.. GENERATED FROM PYTHON SOURCE LINES 166-171

.. code-block:: Python


    print(f"The number of bytes taken in RAM is {compressed_raccoon_kmeans.nbytes}")
    print(f"Compression ratio: {compressed_raccoon_kmeans.nbytes / raccoon_face.nbytes}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The number of bytes taken in RAM is 6291456
    Compression ratio: 8.0


.. GENERATED FROM PYTHON SOURCE LINES 172-175

It is quite surprising to see that our compressed image takes 8 times more
memory than the original image. This is exactly the opposite of what we
expected. The reason lies mainly in the data type used to encode the image.
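Since both arrays hold the same number of pixels, the factor of 8 is exactly the
ratio of the element sizes of the two data types. Zero-filled stand-ins with the
raccoon image's shape make the arithmetic explicit:

.. code-block:: Python

    import numpy as np

    shape = (768, 1024)
    original = np.zeros(shape, dtype=np.uint8)  # stand-in for the uint8 image
    compressed = np.zeros(shape, dtype=np.float64)  # stand-in for the encoder output

    print(original.itemsize, compressed.itemsize)  # 1 8 (bytes per element)
    print(compressed.nbytes)  # 6291456
    print(compressed.nbytes / original.nbytes)  # 8.0
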

.. GENERATED FROM PYTHON SOURCE LINES 175-178

.. code-block:: Python


    print(f"Type of the compressed image: {compressed_raccoon_kmeans.dtype}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Type of the compressed image: float64


.. GENERATED FROM PYTHON SOURCE LINES 179-189

Indeed, the output of :class:`~sklearn.preprocessing.KBinsDiscretizer` is an
array of 64-bit floats, which takes 8 times more memory per value than 8-bit
unsigned integers. Yet we use this 64-bit float representation to encode only
8 distinct values. We would save memory only by casting the compressed image
into an array of 3-bit integers, for instance with the method
`numpy.ndarray.astype`. However, no 3-bit integer representation exists, and
to encode the 8 values we would need to use the 8-bit unsigned integer
representation as well.
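Casting to `uint8` with `numpy.ndarray.astype` brings the compressed image back
to the original footprint without losing any code. In the sketch below, random
codes between 0 and 7 stand in for the encoder output (an assumption, so that no
dataset download is needed):

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical float64 codes 0..7, shaped like the compressed raccoon face.
    codes_float = rng.integers(0, 8, size=(768, 1024)).astype(np.float64)
    codes_uint8 = codes_float.astype(np.uint8)

    print(codes_float.nbytes, codes_uint8.nbytes)  # 6291456 786432
    print(np.array_equal(codes_uint8, codes_float))  # True: no code is lost

The `uint8` version takes exactly as many bytes as the original 8-bit image, so
the cast removes the overhead but yields no gain.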

In practice, we would only observe a memory gain if the original image were
stored as 64-bit floats.
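With NumPy's built-in data types, reaching the theoretical budget of 3 bits per
pixel would require packing the bits by hand. The following is a minimal sketch
built on `np.unpackbits` and `np.packbits`; the helpers `pack_3bit` and
`unpack_3bit` are hypothetical (not part of NumPy or scikit-learn), and random
codes stand in for the compressed image:

.. code-block:: Python

    import numpy as np


    def pack_3bit(codes):
        """Hypothetical helper: store integer codes 0..7 using 3 bits each."""
        codes = np.asarray(codes, dtype=np.uint8).ravel()
        # unpackbits yields 8 bits per byte, most significant first; keep the last 3.
        bits = np.unpackbits(codes[:, None], axis=1)[:, -3:]
        return np.packbits(bits.ravel())


    def unpack_3bit(packed, n_codes):
        """Hypothetical inverse of pack_3bit for the first n_codes codes."""
        bits = np.unpackbits(packed)[: 3 * n_codes].reshape(n_codes, 3)
        # Rebuild each code from its 3 bits: b2 * 4 + b1 * 2 + b0.
        return (bits * np.array([4, 2, 1], dtype=np.uint8)).sum(axis=1).astype(np.uint8)


    rng = np.random.default_rng(0)
    codes = rng.integers(0, 8, size=(768, 1024)).astype(np.uint8)
    packed = pack_3bit(codes)
    print(packed.nbytes)  # 294912 bytes, i.e. 3/8 of 786432
    restored = unpack_3bit(packed, codes.size).reshape(codes.shape)
    print(np.array_equal(restored, codes))  # True

The packed array is 8/3 times smaller than the `uint8` codes, at the cost of
unpacking the bits before every use of the image.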


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.235 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_face_compress.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/cluster/plot_face_compress.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../../lite/lab/index.html?path=auto_examples/cluster/plot_face_compress.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_face_compress.ipynb <plot_face_compress.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_face_compress.py <plot_face_compress.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_face_compress.zip <plot_face_compress.zip>`


.. include:: plot_face_compress.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_